Tuesday 6 November 2018

Bloom Filter

A bloom filter is a rapid and memory efficient data structure, used to test and tell whether an element is present in a set or not. However, the efficiency comes for a price i.e. it is a probabilistic data structure. It can help us identify that either element is not in the set or may be present in the set.
The base data structure of a Bloom Filter is a Bit Vector. A Bit Vector looks like this,



It has empty cells in the table, each representing a bit, and the number below it is its index like indices in array. To add an element to the Bloom Filter, we hash the element a few times, which gives us hashes. Then we set the bits in the Bit Vector to 1, which correspond to the hashes.
Illustrating an example, we will use Fnv and Murmur hash functions here.
When we hash a string “hello”, fnv results 6 and murmur results 0. So, the Bloom Filter looks like the one below.In reality these cells are not colored but the bits are set to 1.


Now to test whether a string is already in the Bloom Filter, we simply hash the string with the same hash functions. Then we check if those values are set in the bit vector. If they aren't, we know that the element isn't in the set. If they are, we only know that it might be, because another element or some combination of other elements could have set the same bits.
Choosing Hash functions for Bloom Filters
The hash functions used in a Bloom filter should be independent, fast and uniformly distributed. Cryptographic hashes such as sha1, though widely used therefore are not very good choices.  Examples of fast, simple hashes that are independent enough3 include murmur, the fnv series of hashes, and HashMix.
How big should I make my Bloom filter?
The false positive of a Bloom filter can be modified. A larger filter will have less false positives, and a smaller one will have more.

How many hash functions should I use?
The more hash functions we use, the slower is the bloom filter, and the quicker it fills up. On the other hand, if we have too few, we may suffer too many false positives. So, we need to maintain the right balance.

Merkle Tree

Merkle Trees
Merkle trees are data structure trees where each non-leaf node is a hash of its respective child node. The leaf nodes are the lowest tier of nodes in the tree. It was named after Ralph Merkle who patented this concept in 1979.



Merkle Trees help in efficient and secure verification of large data structures. These are a fundamental component of blockchains because in the case of blockchains, potentially boundless data sets are used. The implementation of Merkle trees in blockchains has multiple effects. It allows them to scale and provide the hash-based architecture for them to maintain data integrity. Cryptographic hash functions are the underlying technology that allows for Merkle trees to work.
It is easier to understand the concept using the above pictorial view. Notice the non-leaf nodes or branches are represented by hashes of their respective children. The pictorial example is the most common and simple form of a Merkle Tree known as a Binary Merkle Tree. The top hash of the tree is known as the root hash. So, we can also understand Merkle Tree as a data structure which can take ‘n’ number of elements and represent it with a single hash.

How is this convenient for data verification?
The tree structure allows efficient mapping of an arbitrarily large amount of data. And helps in easy identification of where changes in data occur. This is also the underlying basis of Merkle proofs. It can be verified that the data chunk is consistent with the root hash by checking only a small subset of the hashes rather than the entire data set.
Source: https://blockonomi.com/merkle-tree/

Trust Trail in blockchain

  1. Validate Transaction -->
  2. Verify Gas and Resources -->
  3. Select set of transactions to create a block -->
  4. Execute transaction to get a new state -->
  5. Form a new block -->
  6. Work toward consensus -->
  7. New block added to chain and confirmed (finalize the block by the winner)


Consensus Protocol
At step 5 i.e. forming a new block, A PoW is computed. PoW stands for Proof of Work, and it uses hashing as well.
PoW is a puzzle to be solved by the miners.
PoW is used as consensus protocol by Bitcoin and Ethereum Byzantium Metropolis Blockchain.




Exceptions in the blockchain:
  1. More than one miner solves the consensus puzzle - Fork
They are asked to form further blocks. It is a very low probability that both will form another block at the same time. So this breaks the tie and the chain of new blocks are added to the original chain.
Ethereum gives runner-up incentives to the miners who built the block. Runners-up are called Ommers.

  1. More than one transaction references as input, the same digital asset - Double Spending.
In Bitcoin, the transaction referring to the asset first is considered valid transaction and rest of the transactions referencing the asset are rejected.
In Ethereum, a combination of account number and global nonce is used to address the double spending issue.
Every time a transaction is initiated by the account, a global nonce is included in the transaction. After that nonce is incremented. Timestamp and nonce in the transaction should be unique and verified to prevent the double spending problem.

Fork

Well managed forks help build credibility in the blockchain by providing approaches to manage unexpected faults and planned improvements.

Soft fork is like the release of software patches to existing software
Hard fork is like the release of a new version of the operating systems

Recent hard fork in Ethereum which was a change from Homestead to Metropolis Byzantium version. Oct 17, 2017.
PoW protocol still stays, except that, every 100 blocks, PoS consensus protocol is applied for evaluating the latter.
And Miner incentive was reduced from 5 Ethers to 3 Ethers for block creation.

After a hard fork, the emerging two chains are incompatible with each other.

Unplanned hard fork in Ethereum protocol -  Ethereum Core and Ethereum Classic split to address critical software issue in DAO application that resulted in a 150 Million dollar heist.



Hashing in blockchain

SHA-3, SHA-256, and Keccak-256 are important hashing algorithms in BlockChains

Tree hashing is used wherever a variable number of arguments is involved in the block.

  
In Ethereum, hashing is used to generate:
  1. Account addresses
  2. Digital Signatures
  3. Transaction Hash
  4. State Hash
  5. Receipt Hash
  6. Block Header Hash


To manage the integrity of the blockchain transaction
  1. Secure and Unique Account Addresses
  2. Authorization of the transaction by the sender through digital signing
  3. Verifying that the content of the transaction is not modified

Generating address of the accounts
Address of the blockchain accounts are generated through use of public key-private key pair
Here are the steps involved:
  1. 256 bit random number is generated and designated as private key
    • It is usually kept secure using a password or passphrase
  2. ECC algorithm is applied to the private key to generate a unique public key. This completes the private-public key pair. (ECC: Elliptic-curve cryptography)
  3. Hashing technique is applied to public key to obtain an account address (20 bytes)

Transaction for transferring assets has to be:
  1. Authorized
  2. Non - repudiable
  3. Unmodifiable

Verification of a blockchain transaction
  1. Find the hash of the data fields of the transaction
  2. Encrypt the hash using the private key of the participant originating the transaction, thus, digitally signing the transaction to authorize and making the transaction as non-repudiable.
  3. This hash is added to the transaction and other participants can verify it by decrypting it using the public key of the sender of the transaction and recomputing the hash of the transaction.
  4. Then they compare the re-computed hash and the hash received with the signature.
  5. If there is a match, accept the transaction, otherwise, reject it.
  6. Note that this is not the only verification required for the transaction acceptance. Transaction timestamp, nonce, account balances and sufficiency of fees are also verified.

What is non-repudiable?
What is nonce?

Securing the blockchain
Main components of the Ethereum Block are
  1. Block header
  2. Transactions including the transaction hash
  3. Transaction Root
  4. State variables: State hash and State root


In Ethereum Block hash is a block of all the elements in the block header including the transaction root and state root hashes. It is computed by applying a variant of the SHA-3 algorithm called Keccuk on all the items of the block header.

A typical Bitcoin block has about 2000 transactions
A typical Ethereum block has about 100 transactions



What is the advantage of Tree hashing over flat hashing?
If any transaction has to be verified, only one pathway to the tree has to be checked. You don't have to go through the entire set of transactions.

So, smart contract execution in Ethereum results in state transitions. Every state change requires state root (hash) re-computation. Instead of re-computing the hash for all set of states, Only the affected path in the Merkle Tree needs to be recomputed and not the entire tree path.


Block hash computation
Block hash in Ethereum is computed by first computing the State root hash,  Transaction root hash and the Receipt root hash shown at the bottom row of the block header diagram.

These roots and all the other items in the header are hashed together along with the variable nonce, using Keccak-256 to solve the Pow (Proof of Work) puzzle.
Block Hash solves two important purposes:
  1. Verification of the integrity of the block and the transaction
  2. Formation of the "Chain link" by embedding the previous block hash in the current block header.

If any participant node tampers with the block, its hash value changes resulting in the mismatch of the hash values and rendering the local chain of the blocks into an invalid state. Any future blocks proposed by the node will be rejected by other miners due to hash mismatch. This enforces the immutability of the chain.

Public key Cryptography

For blockchains validation and verification, two techniques are used predominantly.
These are Hashing and Asymmetric key Encryption.

Blockchain security domain, 4 topics:
  1. Public key cryptography
  2. Secure Hashing
  3. Transaction Integrity
  4. Block Integrity

We will describe the concept of asymmetric key encryption.
Then hashing and what hashing algorithms are used by Blockchain. Then explain Techniques which use these algorithms to manage the integrity of the transaction and blockchain

Public Key Cryptography

Blockchain participants are not known to each other. Participants can join and leave the chain as they wish. They cannot be identified with conventional means like Driver's License.  In this context, how do you identify the participants?
How do you authorize the transactions? How do you detect the forged or faulty transactions?

This can be accomplished using public key cryptography algorithm.
Let us first understand simple key encryption. The same key is used for encryption and decryption, therefore it is called simple key encryption.
For example: Caesar Encryption
Alphabets in a message are shifted by a fixed number of positions. This number is called a key.
Consider the message: I AM GOING
Encrypted: G CO IQGPI
Letters shifted by 2 positions. Hence the key is 2.

Drawbacks: 1. Even the encryption key or mechanism is much more complex, it is easier to derive the secret key from the encrypted data.
  1. The key distribution to other participants is a challenge.
These challenges grow in magnitude in case of blockchains where participants are unknown to each other.


Solution to these issues: Public key encryption
  1. Two keys instead of a single key. These keys form a public key and private key pair.
  2. Public keys are published. Private keys are kept safe and secret using a passphrase.
  3. When a data is encrypted with a private key, it can be decrypted by a corresponding public key and vice versa.

Example of public key encryption: Popular implementation: RSA algorithm (Rivest Shamir Adelman)
It is used in an application for password less user authentication like in Amazon Virtual machine access.

But Blockchains need more secure algorithms. So it uses  Elliptic Curve Cryptography (ECC). Both Bitcoin and Ethereum use these algorithms to generate the blockchain. ECC is stronger than RSA.
256 bits ECC key pair is ~ (equivalent in strength) to 3072 bits RSA key pair


Crypto-currency Incentive Model

Ethereum uses an incentive-based model for the creation of block.

The Proof of Work (PoW) puzzle winner or the miner that creates a new block is incentivized with the base fees of 3 Ethers and the transaction fees in the Ethereum blockchain.
The winning miner also gets the fees or the gas points for execution of the smart contract. There also may be other miners who also solved the puzzle besides the winner. These miners who solved the puzzle but didn't win the block are called Ommers. The blocks created by them are called Ommers blocks. These are added as Ommers blocks or the side blocks to the main chain. Ommer miners also get a small percentage of the total gas point as a constellation prize, and for network security.
Summary: Any transaction in Ethereum including the transfer of Ethers or executing Smart Contracts require some fee or gas points to be specified in the transaction. Miners are paid fees for security, validation, execution of smart contracts as well as the creation of blocks.

Mining is the process to secure the network by validating the computations, collecting them to form a block, verifying them and broadcasting.



What are Gas points?
Gas points in a transaction
Every operation in Ethereum requires fuel or Gas. Gas points are used to communicate fees instead of Ether for ease if computation using standard values. Gas points allow crypto-currency independent valuation of the transaction fee and computation fee.  Ether in value varies due to market swings but Gas points do not vary. Ethereum has specified Gas points for each type of operation. Mining process computes Gas points required for execution of a transaction. If the fee specified in the Gas points in the transaction is not sufficient, it is rejected. This is similar to mailing a letter with insufficient postage. The letter will not be delivered if it has insufficient postage. The gas points specified in the transaction must be present in the account for execution to happen. Any amount left over after the execution of the transaction is returned to the originating account.

Gas points in a block
Gas limit = the number of gas points available to a block to spend.
Gas spent = Actual amount of Gas spent at the creation of the block.


Ether Transfer

For a simple ether transfer,  the amount of transfer, target address, along with fees or gas points.

The amount in the fees is transferred to the respective accounts.




An Ethereum node represents the computational business unit or individual participant in the blockchain network.

An Ethereum full node holds the software for transaction initiation, Validation, Mining, Block Creation, Contract Execution, and EVM ( Ethereum Virtual Machine)

The smart contract is Object Oriented code written in a programming language like Solidity.
The Smart Contract is compiled and deployed to EVM.

When the target address in a transaction is a smart contract. The execution code corresponding to the smart contract is executed on the EVM. The input required (if any) to execute the smart contract is extracted from the payload of the transaction.

State of the smart contract may be changed by this execution. Results of the executions are stored in the receipts.
A blockchain maintains both, the state hash and the receipt hash.

All the transactions in the blockchain are validated. Validation includes the timestamp and the nonce of the transaction and also that the required fee is available in the account.

Miner nodes in the network receive, verify, gather and execute transactions. The invoked smart contract code are executed by all miners. Validated transactions are broadcasted and gathered for block creation. The consensus protocol used here is a memory based rather than a CPU based PoW (Proof of Work)

Who pays for verification, validation, and consensus?
The incentive model in the next lesson will answer this question.