Download the full Jupyter/iPython notebook from Github here
This tutorial will walk you through the basics of how to build a blockchain from scratch. Focusing on the details of a concrete example will provide a deeper understanding of the strengths and limitations of blockchains. For a higher-level overview, I’d recommend this excellent article from BitsOnBlocks.
At its core, a blockchain is a distributed database with a set of rules for verifying new additions to the database. We’ll start off by tracking the accounts of two imaginary people: Alice and Bob, who will trade virtual money with each other.
We’ll need to create a transaction pool of incoming transactions, validate those transactions, and make them into a block.
We’ll be using a hash function to create a ‘fingerprint’ for each of our transactions- this hash function links each of our blocks to each other. To make this easier to use, we’ll define a helper function to wrap the python hash function that we’re using.
import hashlib, json, sys
def hashMe(msg=""):
# For convenience, this is a helper function that wraps our hashing algorithm
if type(msg)!=str:
msg = json.dumps(msg,sort_keys=True) # If we don't sort keys, we can't guarantee repeatability!
if sys.version_info.major == 2:
return unicode(hashlib.sha256(msg).hexdigest(),'utf-8')
else:
return hashlib.sha256(str(msg).encode('utf-8')).hexdigest()
Next, we want to create a function to generate exchanges between Alice and Bob. We’ll indicate withdrawals with negative numbers, and deposits with positive numbers. We’ll construct our transactions to always be between the two users of our system, and make sure that the deposit is the same magnitude as the withdrawal- i.e. that we’re neither creating nor destroying money.
import random
random.seed(0)
def makeTransaction(maxValue=3):
# This will create valid transactions in the range of (1,maxValue)
sign = int(random.getrandbits(1))*2 - 1 # This will randomly choose -1 or 1
amount = random.randint(1,maxValue)
alicePays = sign * amount
bobPays = -1 * alicePays
# By construction, this will always return transactions that respect the conservation of tokens.
# However, note that we have not done anything to check whether these overdraft an account
return {u'Alice':alicePays,u'Bob':bobPays}
Now let’s create a large set of transactions, then chunk them into blocks.
txnBuffer = [makeTransaction() for i in range(30)]
Next step: making our very own blocks! We’ll take the first k transactions from the transaction buffer, and turn them into a block. Before we do that, we need to define a method for checking the valididty of the transactions we’ve pulled into the block.
For bitcoin, the validation function checks that the input values are valid unspent transaction outputs (UTXOs), that the outputs of the transaction are no greater than the input, and that the keys used for the signatures are valid. In Ethereum, the validation function checks that the smart contracts were faithfully executed and respect gas limits.
No worries, though- we don’t have to build a system that complicated. We’ll define our own, very simple set of rules which make sense for a basic token system:
If either of these conditions are violated, we’ll reject the transaction.
def updateState(txn, state):
# Inputs: txn, state: dictionaries keyed with account names, holding numeric values for transfer amount (txn) or account balance (state)
# Returns: Updated state, with additional users added to state if necessary
# NOTE: This does not not validate the transaction- just updates the state!
# If the transaction is valid, then update the state
state = state.copy() # As dictionaries are mutable, let's avoid any confusion by creating a working copy of the data.
for key in txn:
if key in state.keys():
state[key] += txn[key]
else:
state[key] = txn[key]
return state
def isValidTxn(txn,state):
# Assume that the transaction is a dictionary keyed by account names
# Check that the sum of the deposits and withdrawals is 0
if sum(txn.values()) is not 0:
return False
# Check that the transaction does not cause an overdraft
for key in txn.keys():
if key in state.keys():
acctBalance = state[key]
else:
acctBalance = 0
if (acctBalance + txn[key]) < 0:
return False
return True
Here are a set of sample transactions, some of which are fraudulent- but we can now check their validity!
state = {u'Alice':5,u'Bob':5}
print(isValidTxn({u'Alice': -3, u'Bob': 3},state)) # Basic transaction- this works great!
print(isValidTxn({u'Alice': -4, u'Bob': 3},state)) # But we can't create or destroy tokens!
print(isValidTxn({u'Alice': -6, u'Bob': 6},state)) # We also can't overdraft our account.
print(isValidTxn({u'Alice': -4, u'Bob': 2,'Lisa':2},state)) # Creating new users is valid
print(isValidTxn({u'Alice': -4, u'Bob': 3,'Lisa':2},state)) # But the same rules still apply!
Each block contains a batch of transactions, a reference to the hash of the previous block (if block number is greater than 1), and a hash of its contents and the header
We’re ready to start making our blockchain! Right now, there’s nothing on the blockchain, but we can get things started by defining the ‘genesis block’ (the first block in the system). Because the genesis block isn’t linked to any prior block, it gets treated a bit differently, and we can arbitrarily set the system state. In our case, we’ll create accounts for our two users (Alice and Bob) and give them 50 coins each.
state = {u'Alice':50, u'Bob':50} # Define the initial state
genesisBlockTxns = [state]
genesisBlockContents = {u'blockNumber':0,u'parentHash':None,u'txnCount':1,u'txns':genesisBlockTxns}
genesisHash = hashMe( genesisBlockContents )
genesisBlock = {u'hash':genesisHash,u'contents':genesisBlockContents}
genesisBlockStr = json.dumps(genesisBlock, sort_keys=True)
Great! This becomes the first element from which everything else will be linked.
chain = [genesisBlock]
For each block, we want to collect a set of transactions, create a header, hash it, and add it to the chain
def makeBlock(txns,chain):
parentBlock = chain[-1]
parentHash = parentBlock[u'hash']
blockNumber = parentBlock[u'contents'][u'blockNumber'] + 1
txnCount = len(txns)
blockContents = {u'blockNumber':blockNumber,u'parentHash':parentHash,
u'txnCount':len(txns),'txns':txns}
blockHash = hashMe( blockContents )
block = {u'hash':blockHash,u'contents':blockContents}
return block
Let’s use this to process our transaction buffer into a set of blocks:
blockSizeLimit = 5 # Arbitrary number of transactions per block-
# this is chosen by the block miner, and can vary between blocks!
while len(txnBuffer) > 0:
bufferStartSize = len(txnBuffer)
## Gather a set of valid transactions for inclusion
txnList = []
while (len(txnBuffer) > 0) & (len(txnList) < blockSizeLimit):
newTxn = txnBuffer.pop()
validTxn = isValidTxn(newTxn,state) # This will return False if txn is invalid
if validTxn: # If we got a valid state, not 'False'
txnList.append(newTxn)
state = updateState(newTxn,state)
else:
print("ignored transaction")
sys.stdout.flush()
continue # This was an invalid transaction; ignore it and move on
## Make a block
myBlock = makeBlock(txnList,chain)
chain.append(myBlock)
chain[0]
chain[1]
As expected, the genesis block includes an invalid transaction which initiates account balances (creating tokens out of thin air). The hash of the parent block is referenced in the child block, which contains a set of new transactions which affect system state. We can now see the state of the system, updated to include the transactions:
state
Now that we know how to create new blocks and link them together into a chain, let’s define functions to check that new blocks are valid- and that the whole chain is valid.
On a blockchain network, this becomes important in two ways:
We will need three functions to facilitate in this:
def checkBlockHash(block):
# Raise an exception if the hash does not match the block contents
expectedHash = hashMe( block['contents'] )
if block['hash']!=expectedHash:
raise Exception('Hash does not match contents of block %s'%
block['contents']['blockNumber'])
return
def checkBlockValidity(block,parent,state):
# We want to check the following conditions:
# - Each of the transactions are valid updates to the system state
# - Block hash is valid for the block contents
# - Block number increments the parent block number by 1
# - Accurately references the parent block's hash
parentNumber = parent['contents']['blockNumber']
parentHash = parent['hash']
blockNumber = block['contents']['blockNumber']
# Check transaction validity; throw an error if an invalid transaction was found.
for txn in block['contents']['txns']:
if isValidTxn(txn,state):
state = updateState(txn,state)
else:
raise Exception('Invalid transaction in block %s: %s'%(blockNumber,txn))
checkBlockHash(block) # Check hash integrity; raises error if inaccurate
if blockNumber!=(parentNumber+1):
raise Exception('Hash does not match contents of block %s'%blockNumber)
if block['contents']['parentHash'] != parentHash:
raise Exception('Parent hash not accurate at block %s'%blockNumber)
return state
def checkChain(chain):
# Work through the chain from the genesis block (which gets special treatment),
# checking that all transactions are internally valid,
# that the transactions do not cause an overdraft,
# and that the blocks are linked by their hashes.
# This returns the state as a dictionary of accounts and balances,
# or returns False if an error was detected
## Data input processing: Make sure that our chain is a list of dicts
if type(chain)==str:
try:
chain = json.loads(chain)
assert( type(chain)==list)
except: # This is a catch-all, admittedly crude
return False
elif type(chain)!=list:
return False
state = {}
## Prime the pump by checking the genesis block
# We want to check the following conditions:
# - Each of the transactions are valid updates to the system state
# - Block hash is valid for the block contents
for txn in chain[0]['contents']['txns']:
state = updateState(txn,state)
checkBlockHash(chain[0])
parent = chain[0]
## Checking subsequent blocks: These additionally need to check
# - the reference to the parent block's hash
# - the validity of the block number
for block in chain[1:]:
state = checkBlockValidity(block,parent,state)
parent = block
return state
We can now check the validity of the state:
checkChain(chain)
And even if we are loading the chain from a text file, e.g. from backup or loading it for the first time, we can check the integrity of the chain and create the current state:
chainAsText = json.dumps(chain,sort_keys=True)
checkChain(chainAsText)
In an actual blockchain network, new nodes would download a copy of the blockchain and verify it (as we just did above), then announce their presence on the peer-to-peer network and start listening for transactions. Bundling transactions into a block, they then pass their proposed block on to other nodes.
We’ve seen how to verify a copy of the blockchain, and how to bundle transactions into a block. If we recieve a block from somewhere else, verifying it and adding it to our blockchain is easy.
Let’s say that the following code runs on Node A, which mines the block:
import copy
nodeBchain = copy.copy(chain)
nodeBtxns = [makeTransaction() for i in range(5)]
newBlock = makeBlock(nodeBtxns,nodeBchain)
Now assume that the newBlock
is transmitted to our node, and we want to check it and update our state if it is a valid block:
print("Blockchain on Node A is currently %s blocks long"%len(chain))
try:
print("New Block Received; checking validity...")
state = checkBlockValidity(newBlock,chain[-1],state) # Update the state- this will throw an error if the block is invalid!
chain.append(newBlock)
except:
print("Invalid block; ignoring and waiting for the next block...")
print("Blockchain on Node A is now %s blocks long"%len(chain))
We’ve created all the basic architecture for a blockchain, from a set of state transition rules to a method for creating blocks, to mechanisms for checking the validity of transactions, blocks, and the full chain. We can derive the system state from a downloaded copy of the blockchain, validate new blocks that we recieve from the network, and create our own blocks.
The system state that we’ve created is effectively a distributed ledger or database- the core of many blockchains. We could extend this to include special transaction types or full smart contracts.
We haven’t explored the network architecture, the proof-of-work or proof-of-state validation step, and the consensus mechanism which provides blockchains with security against attack. We also haven’t discussed public key cryptography, privacy, and verification steps. More on that in the future!