[Cross-posted at blog.chain.com.]
This is part two in a series. In part one, we learned that the big idea behind blockchains is this: I give the rest of the world a signed statement saying I paid you.
In this article we’ll take a closer look at just how this is done. That is, we’ll look at how:
- I give the rest of the world
- A signed statement
- Saying I paid you
Let’s take these one at a time, in reverse order.
Step 3: …Saying I paid you
Suppose I want to pay you ten dollars on a blockchain. To “say” that I paid you, I have to construct a message called a transaction that combines information about what I’m paying with where I’m sending it.
| what | “ten dollars” | 
| where | “to you” | 
The ten dollars is called the input to the transaction. Where it’s going is called the output. Ultimately this message will be incorporated into a blockchain, which we learned last time is a ledger – a record of transactions – that is immutable, distributed, and cryptographically secure. More about this below.
Of course I have to have ten dollars before I can pay it to you. It has to come from somewhere. So the input needs to be some earlier transaction saying that someone paid me ten dollars.[1] This means each transaction has to have some sort of unique name, or number, or other identifier, so later transactions can refer back to earlier ones.
| transaction-id | “unique identifier for this transaction” | 
| input | “unique identifier for some earlier ten-dollar transaction” | 
| output | “you” | 
What if the only earlier transaction I have is one where I received twelve dollars? Since I only want to send you ten, and since I have to use up all of the earlier transaction (for reasons that will become clear), my new transaction must send you your ten dollars and must also send me two dollars as change. This means that a transaction must be able to have multiple outputs.
| transaction-id | “unique identifier for this transaction” | 
| input | “unique identifier for some earlier $12 transaction” | 
| output1 | “$10 to you” | 
| output2 | “$2 to me” | 
Now that we’ve decided transactions can have multiple outputs, it’s necessary to say which output of an earlier transaction you’re using as the input.
| transaction-id | “unique identifier for this transaction” | 
| input | “unique identifier for output1 of some earlier $12 transaction” | 
| output1 | “$10 to you” | 
| output2 | “$2 to me” | 
And what if I don’t have a single $10 or $12 transaction to draw on, but I do have a $5 one and a $7 one? Let’s let transactions have multiple inputs as well as multiple outputs.
| transaction-id | “unique identifier for this transaction” | 
| input1 | “unique identifier for output1 of some earlier $5 transaction” | 
| input2 | “unique identifier for output1 of some earlier $7 transaction” | 
| output1 | “$10 to you” | 
| output2 | “$2 to me” | 
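The shape we've arrived at can be sketched as a small data structure. This is only an illustration: the class and field names (`Input`, `Output`, `Transaction`, `prev_tx_id`, and so on) are my own inventions, not part of any real blockchain, and the identifiers are still plain placeholder strings at this point.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Input:
    prev_tx_id: str    # identifier of the earlier transaction being spent
    output_index: int  # which output of that earlier transaction (1 = output1)


@dataclass(frozen=True)
class Output:
    amount: int        # whole dollars, to keep the example simple
    recipient: str     # just "you" or "me" for now


@dataclass(frozen=True)
class Transaction:
    tx_id: str
    inputs: Tuple[Input, ...]
    outputs: Tuple[Output, ...]


payment = Transaction(
    tx_id="unique identifier for this transaction",
    inputs=(
        Input("some earlier $5 transaction", 1),
        Input("some earlier $7 transaction", 1),
    ),
    outputs=(Output(10, "you"), Output(2, "me")),
)

# The outputs hand out exactly the $12 that the two inputs bring in.
total_out = sum(output.amount for output in payment.outputs)
print(total_out)  # 12
```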
Let’s now focus on those unique transaction identifiers. How should they be chosen so that:
- Distinct transactions have distinct identifiers, and
- Anyone in the world can construct his or her own transaction, and
- No one needs to coordinate with anyone else, or with any central authority, in order to construct a transaction?
The main problem is to prevent “collisions” – two different transactions having the same identifier. If you and I both construct a transaction at the same time, on opposite sides of the world, and don’t coordinate with each other or anyone else, what’s to stop us from accidentally choosing MYCOOLTRANSACTION17 as the identifier for both transactions?
Blockchains solve this problem using a technique called hashing. This is a process that transforms a message of any length, such as the transactions we’re constructing, into a single number of a predetermined size, called a hash. There are several different recipes for computing the hash of a message; they have names like MD5, SHA-1, and SHA-256 (the first two are older recipes that are no longer considered safe against deliberately engineered collisions). But good hashing recipes all have the same goals:
- Given a message, it must be easy to compute the hash (well, easy for a computer);
- Given only the hash, it must be close to impossible to come up with a message that produces it (even for a computer!);
- Two identical messages always produce the same hash;
- Even a tiny difference between two messages must produce wildly different hashes.
The ease of going from message to hash, and the difficulty of going from hash to message, makes this a so-called one-way function, an idea that will be important a little later on.
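Two of these goals are easy to see for yourself. The sketch below uses SHA-256 from Python's standard library; that choice of recipe is mine, made for illustration, and the helper name `hash_hex` is hypothetical.

```python
import hashlib


def hash_hex(message: str) -> str:
    """Return the SHA-256 hash of a text message as 64 hexadecimal digits."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


a = hash_hex("$10 to you")
b = hash_hex("$10 to you")
c = hash_hex("$11 to you")

print(a == b)  # True: identical messages always produce the same hash
print(a == c)  # False: a one-character change produces a different hash

# Count how many of the 64 hex digits differ between the two hashes.
differing = sum(x != y for x, y in zip(a, c))
print(f"{differing} of {len(a)} hex digits differ")
```

The one-way property is the one thing this can't demonstrate directly: the only known way to go from a hash back to a message is to guess messages and hash them.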
Now, when squashing a long message down to a number of a predetermined size, it’s unavoidable that different messages will collide – i.e., produce the same hash. But if the predetermined size is big enough – 32 bytes, say – and if the recipe is very good at scattering hashes evenly throughout all 2^(32×8) = 2^256 possible values (that’s 100 quadrillion-quadrillion-quadrillion-quadrillion-quadrillion, give or take a few quadrillion-quadrillion-quadrillion-quadrillion-quadrillions), then the odds of a collision are so low as to be effectively impossible.[2]
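To put a number on "effectively impossible," here is a back-of-the-envelope sketch. It assumes hashes are scattered perfectly evenly and uses a simple upper bound: with n hashes there are n(n-1)/2 pairs, each colliding with probability 1/2^256. The figure of a billion billion transactions is an arbitrary assumption of mine.

```python
from fractions import Fraction

HASH_BYTES = 32
possible = 2 ** (HASH_BYTES * 8)  # 2^256 distinct hash values


def collision_bound(n: int) -> Fraction:
    """Upper bound on the chance that any two of n evenly scattered hashes collide."""
    pairs = n * (n - 1) // 2
    return Fraction(pairs, possible)


print(f"{possible:.3e}")  # 1.158e+77

# Assumed workload: a billion billion (10^18) transactions.
n = 10 ** 18
print(float(collision_bound(n)))
```

Even at that scale the bound comes out far below one in 10^40.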
So when you and I construct our transactions, we don’t choose identifiers at all. Instead, we compute identifiers that are nothing more or less than a hash of each transaction’s contents.
| input1 | hash of the earlier $5 transaction, output1 | 
| input2 | hash of the earlier $7 transaction, output1 | 
| output1 | $10 to you | 
| output2 | $2 to me | 
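Computing an identifier from a transaction's contents might look like the sketch below. The JSON serialization, the field names, and the `transaction_id` helper are all assumptions of mine; real blockchains define their own exact byte formats. The two earlier transactions are stand-ins with no inputs of their own.

```python
import hashlib
import json


def transaction_id(inputs, outputs) -> str:
    """Hash a transaction's contents to get its identifier."""
    # Serialize deterministically (sorted keys, fixed separators) so that the
    # same contents always produce the same bytes, and therefore the same hash.
    payload = json.dumps(
        {"inputs": inputs, "outputs": outputs},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Stand-ins for the earlier transactions that paid me $5 and $7.
earlier_5 = transaction_id([], [{"amount": 5, "to": "me"}])
earlier_7 = transaction_id([], [{"amount": 7, "to": "me"}])

payment_inputs = [{"tx": earlier_5, "output": 1}, {"tx": earlier_7, "output": 1}]
payment_outputs = [{"amount": 10, "to": "you"}, {"amount": 2, "to": "me"}]
payment_id = transaction_id(payment_inputs, payment_outputs)
print(payment_id)
```

Nobody had to pick a name or ask permission: anyone who builds the same transaction computes the same identifier, and any different transaction gets a different one.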
When you are deciding whether to accept this transaction as payment for something, you can consult the complete history of transactions on the blockchain to make sure that the inputs of this transaction really do exist, and that they haven’t already been spent in some other transaction. Later on, when you want to spend this money you’re now receiving, someone else will look at this transaction to make sure you own it.
Using a transaction’s hash as its unique identifier also explains why one must consume all of a transaction’s output at the same time (as when, in an earlier example above, I had to consume a $12 transaction output and return $2 to myself as change). If I could consume only part of an old transaction, that would alter the amount available from that old transaction. Altering the transaction would change its hash, which cannot be allowed if hashes are permanent, unchanging unique identifiers for transactions. Once published on a blockchain, a transaction can never change; it can only be referenced by newer transactions.
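The use-it-all-and-make-change rule is simple enough to sketch in a few lines. The function name `build_outputs` and its parameters are hypothetical, and amounts are whole dollars for simplicity.

```python
def build_outputs(input_amounts, pay_amount, payee, payer):
    """Consume the inputs in full; pay the payee and return any excess as change."""
    total = sum(input_amounts)
    if total < pay_amount:
        raise ValueError("inputs do not cover the payment")
    outputs = [(pay_amount, payee)]
    change = total - pay_amount
    if change > 0:
        # The old outputs are used up entirely, so the leftover must be
        # paid back to the payer as a brand-new output.
        outputs.append((change, payer))
    return outputs


print(build_outputs([12], 10, "you", "me"))    # [(10, 'you'), (2, 'me')]
print(build_outputs([5, 7], 10, "you", "me"))  # [(10, 'you'), (2, 'me')]
print(build_outputs([10], 10, "you", "me"))    # [(10, 'you')]
```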
Step 2: …A signed statement…
Remember that this transaction, like all others on a blockchain, is a message that’s going to everyone in the world. My earlier $5 and $7 transactions, the source of the funds I’m paying to you, are sitting out there on the blockchain for everyone to see, like all unspent transaction outputs, just waiting to be used. What prevents someone else from using them in a payment of their own?
This is where the “to you” and “to me” parts of the transaction outputs come into play. I need to be able to write “to you” in such a way that no one but you can construct a new transaction claiming that $10.
This is done using so-called public-private keypairs. You choose a very (very, very) large random number and keep it secret. This is your “private key.” This number can be transformed with some fancy arithmetic into another number, the “public key,” that you publish for everyone to see. The fancy arithmetic is a one-way function akin to hashing, so no one with only your public key can figure out your private key.
Public-private keypairs have some amazing superpowers. One of them is that you can digitally sign a message so that everyone in the world can be sure it’s you signing it. You do this by combining your private key in a particular way with the message you’re signing (or, more typically, a hash of the message you’re signing). The resulting “signature” has some special properties:
- It was created using another one-way function, so no one looking at just the signature can discover either your private key or the message you’ve signed;
- There remains a mathematical relationship between the signature and your public key, so if someone has that and the message you signed, they can verify that the signature is genuine. Even without knowing your private key, they can be sure the signature was made from it, and from that particular message and no other. (So no one can take your valid signature from one transaction and stick it on another one in the hope that it’ll be valid there – it won’t.)
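To make the sign-then-verify idea concrete without any outside libraries, here is a sketch of a Lamport one-time signature, which is built entirely from a one-way hash function. To be clear, this is a teaching stand-in that I've swapped in: real blockchains typically use elliptic-curve signatures instead, and a Lamport key must never sign more than one message, because each signature reveals half of the private key. All function names here are my own.

```python
import hashlib
import secrets


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    # Private key: 256 pairs of random secrets, one pair per bit of a message hash.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    # Public key: the hash of every secret. Hashing is one-way, so publishing
    # these hashes does not reveal the secrets behind them.
    public = [(sha256(zero), sha256(one)) for zero, one in private]
    return private, public


def message_bits(message: bytes):
    """The 256 bits of the message's hash, most significant bit first."""
    digest = sha256(message)
    return [(byte >> (7 - shift)) & 1 for byte in digest for shift in range(8)]


def sign(private, message: bytes):
    # For each bit of the message hash, reveal the matching secret of that pair.
    return [private[i][bit] for i, bit in enumerate(message_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    # Each revealed secret must hash to the public value chosen by that bit.
    return all(
        sha256(secret) == public[i][bit]
        for i, (bit, secret) in enumerate(zip(message_bits(message), signature))
    )


private_key, public_key = generate_keypair()
signature = sign(private_key, b"$10 to you")

print(verify(public_key, b"$10 to you", signature))    # True
print(verify(public_key, b"$1000 to you", signature))  # False
```

The second check is the "no sticking it on another message" property: the signature only verifies against the exact message it was made from.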
So to make sure that only you can access the $10 I’m paying you, I secure the output of my transaction by attaching your public key. I also secure the $2 in change that I’m paying to myself by attaching my public key.
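| input1 | hash of the earlier $5 transaction, output1 | 
| input2 | hash of the earlier $7 transaction, output1 | 
| output1 | $10, secured with your public key | 
| output2 | $2, secured with my public key | 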
In order to redeem one transaction’s output for use as the input to another transaction, the payee supplies a digital signature made from the new transaction’s hash and his or her private key. My transaction paying you $10 redeems $5 and $7 from two earlier transactions, which were paid to my public key, so I redeem them like so:
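| input1 | hash of the earlier $5 transaction, output1, plus my signature | 
| input2 | hash of the earlier $7 transaction, output1, plus my signature | 
| output1 | $10, secured with your public key | 
| output2 | $2, secured with my public key | 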
Anyone can look at this transaction and verify that my signature on the inputs matches the public key attached to the earlier transactions’ outputs. As long as I’ve kept my private key secret, no one else can produce a valid signature that matches both this transaction and my public key.
The balance of money that I own on the blockchain is simply the sum of all unspent transaction outputs that have my public key attached.
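In code, that balance calculation might look something like the sketch below. The ledger layout (dictionaries with `id`, `inputs`, and `outputs`) and the key strings are placeholders I made up for illustration; signatures are left out entirely.

```python
def balance(transactions, public_key):
    """Sum every output locked to public_key that no later input has spent."""
    # Every (transaction id, output number) pair consumed by some input.
    spent = {ref for tx in transactions for ref in tx["inputs"]}
    total = 0
    for tx in transactions:
        # Outputs are numbered from 1, matching output1, output2 in the tables.
        for number, (amount, owner) in enumerate(tx["outputs"], start=1):
            if owner == public_key and (tx["id"], number) not in spent:
                total += amount
    return total


ledger = [
    {"id": "tx-a", "inputs": [], "outputs": [(5, "my-public-key")]},
    {"id": "tx-b", "inputs": [], "outputs": [(7, "my-public-key")]},
    {
        "id": "tx-c",
        "inputs": [("tx-a", 1), ("tx-b", 1)],
        "outputs": [(10, "your-public-key"), (2, "my-public-key")],
    },
]

print(balance(ledger, "my-public-key"))    # 2
print(balance(ledger, "your-public-key"))  # 10
```

Before the payment I held $12 across two unspent outputs; afterward all that remains under my key is the $2 in change.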
Step 1: I give the rest of the world…
These transactions must be distributed to be useful, meaning that everyone in the world has, or can get, the data they need to validate transactions.[3] If I create a transaction sending you $10, in principle you’ll need the entire history of earlier transactions leading up to that one in order to validate it (i.e., to believe that you’re really receiving $10), including all the unrelated transactions in the system to ensure I haven’t spent that same $10 somewhere else. When you want to spend the $10 I send you, your payee will need the same thing.[4]
It’s easy to imagine a system in which each new transaction is broadcast to all blockchain participants that are somehow subscribed to new-transaction notices. But the reality of network delays means that different subscribers will receive these notices in different orders. (Transactions that originate closer on the network will arrive sooner, in general, than transactions that need more “hops” to get to you.) The system only works if everyone has a consistent view of the transaction history: if I see A, then B, and you see B, then A, we might disagree about the validity of C, and a distributed ledger (or any ledger, really) can’t work if there’s disagreement about a transaction’s validity. Here’s why: if I were dishonest,[5] I might try to exploit network delays to spend the same $10 twice, to two different people, each of whom might believe (thanks to differences in ordering) that theirs is the valid $10 and the other is the invalid double-spend. No one would be willing to accept either person’s (purported) $10 as payment for anything, and confidence in the whole scheme would go out the window.
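The ordering problem is easy to reproduce in miniature. In this sketch each observer accepts a transaction only if the output it spends hasn't already been spent by something that observer saw earlier. The payees "Alice" and "Bob" and the `accepted` function are hypothetical names of mine.

```python
def accepted(arrival_order):
    """Return the names of the transactions one observer considers valid."""
    spent = set()
    valid = []
    for name, spent_output in arrival_order:
        if spent_output not in spent:
            # First spend of this output that this observer has seen.
            spent.add(spent_output)
            valid.append(name)
        # Otherwise: a double-spend from this observer's point of view.
    return valid


# Two transactions that both try to spend output1 of the same earlier transaction.
pay_alice = ("pay Alice", ("tx-c", 1))
pay_bob = ("pay Bob", ("tx-c", 1))

print(accepted([pay_alice, pay_bob]))  # ['pay Alice']
print(accepted([pay_bob, pay_alice]))  # ['pay Bob']
```

Same two transactions, same rule, opposite conclusions: the only thing that differed was the order of arrival.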
What’s needed is some authority that everyone can trust to put a stamp on the official correct ordering of transactions; and once the order is set, to publish the sequence for all to see. The published sequence could, in principle, consist of a list of individual, timestamped transactions, digitally signed by the timestamping authority; but if there are more than just a few transactions each second, the processing and communication overhead of this approach is prohibitive. For efficiency, it’s better to group transactions into blocks, certifying and publishing a block containing many transactions every so often, with each block linked to the block before it (by including the earlier block’s unchangeable hash, in the same way transactions refer to other transactions by their hashes) in an ever-lengthening blockchain.
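The linking of each block to the one before it can be sketched as follows. As before, the JSON serialization, SHA-256, and the field and function names are my own assumptions; the sketch also ignores who certifies each block, which is the subject of the next few paragraphs.

```python
import hashlib
import json


def block_hash(block) -> str:
    """Hash a block's full contents, including its link to the previous block."""
    payload = json.dumps(block, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    # Each new block records the hash of the block before it (None for the first).
    previous = block_hash(chain[-1]) if chain else None
    chain.append({"previous_hash": previous, "transactions": transactions})


def chain_is_consistent(chain) -> bool:
    """Check that every block still points at the true hash of its predecessor."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["tx-a", "tx-b"])
append_block(chain, ["tx-c"])
append_block(chain, ["tx-d"])
print(chain_is_consistent(chain))  # True

# Quietly rewriting history in the first block breaks the link from the second.
chain[0]["transactions"].append("tx-forged")
print(chain_is_consistent(chain))  # False
```

Because each block's hash covers the previous block's hash, changing any old block would require recomputing every block after it, which everyone holding the published chain would notice.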
Whom to trust for generating blocks in the chain? That depends on how a particular blockchain is going to be used. If it’s for managing an anti-authoritarian global cryptocurrency, the answer is “no one.” If it’s for managing the loyalty-reward points of a national coffee-shop chain, the answer is probably the corporate parent of the coffee shops. Other use cases require in-between levels of trust.
There are techniques for concentrating trust or spreading it around to match different use cases. The just-trust-headquarters case is easy, of course: everyone sends their proposed transactions there, and listens for the blocks that occasionally emerge, confirming their transactions. The trust-no-one case has everyone broadcasting their proposed transactions to as many others as they can, and everyone racing to collect them up and be the one that produces the next valid block in exchange for some small reward (a process called “mining,” designed so no one person or group can control the contents of the blockchain). The in-between case of trusting a group of independent authorities can require that, if one of that group proposes a block, all or a majority of the others must endorse it by adding their digital signatures.
In most cases, the simple existence of a transaction in a block of the blockchain is the transfer of money: final and authoritative, with no further steps required before the recipient can spend what they’ve just received – by adding a transaction of their own.
Sounds great, but…
Transferring money (or other kinds of value) on a blockchain is as fast and easy as handing someone cash – easier, since you don’t have to be in the same place to do it.
But cash isn’t the right answer for every type of transaction. Sometimes you need a delay, and sometimes you need to cancel or reclaim your payment. And what about this everyone-can-see-every-transaction business? Do you really want to give everyone in the world the ability to look at your whole purchase history?[6]
There are ways to preserve privacy on a blockchain, as well as ways to delay payment until a certain time elapses or other conditions are met, and even ways to eliminate “counterparty risk” (the risk that you pay for something and then don’t get what you paid for), but I’ve gone on long enough for now and discussion of those will have to wait until part three.
1. And that transaction had to have a source too, and so on, and so on. Where do the dollars on a blockchain ultimately come from? It’s a good question with a complicated answer that we won’t get to in this article. The short version is that participants can “buy in” to a blockchain in the same way one converts dollars to chips in order to play at a casino (among other options).
2. Many newcomers to hashing worry about the difference between “effectively impossible” and “actually impossible” and waste a lot of energy in a vain attempt to eliminate even the tiny remaining possibility of a hash collision. But that’s only because our ape brains are bad at understanding really, really, really tiny possibilities. When it’s likelier that your blockchain system will be disrupted by simultaneous drunken-rhinoceros stampedes at multiple datacenters than by even one hash collision, your efforts are better directed elsewhere (like putting up rhino fencing).
3. Who is “everyone in the world”? It would be more accurate to say “everyone participating in a particular blockchain.” A blockchain managing consumer dollars, as in the examples in this article, would necessarily be global, and “everyone in the world” would literally mean everyone in the world. Other blockchains managing other kinds of asset might confine participants to particular companies’ customers, or particular traders, investors, or institutions.
4. If you’re thinking that’s a tremendous data requirement, you’re not wrong. In a future article we’ll discuss clever ways to mitigate this and even make it fast.
5. I’m not. But if I were, that’s just what I would say.
6. Millennials: this is a rhetorical question. The answer is “no.”