The generation of random numbers is too important to be left to chance!
Random number generation has always been important in computation. Although local and cloud computing have had reliable pseudorandom number generators (PRNGs) for decades, blockchain presents new challenges.
First, a quick primer on pseudorandom number generation. Although there are many algorithms, most PRNGs start with a “seed” — a sequence of 0s and 1s chosen based on some form of entropy, e.g. how you move your mouse around the screen. A PRNG uses that seed as a starting point on a special curve and the algorithm inside will bounce around that curve, outputting a series of numbers that is deterministic but appears random (i.e. based on knowledge of the last n values, there is no way to reliably predict the (n+1)th value beyond the probability distribution function of the curve).
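As a toy illustration of that idea (a linear congruential generator; the constants and function name are illustrative, not an algorithm named in this article, and not something to use in production), note how the same seed always reproduces the same "random-looking" sequence:

```python
def lcg(seed: int):
    """Yield a deterministic but random-looking stream of 32-bit numbers.

    Uses the classic Numerical Recipes LCG constants, purely for illustration.
    """
    state = seed
    while True:
        state = (1664525 * state + 1013904223) % 2**32
        yield state

# The same seed always reproduces the same sequence...
stream = lcg(seed=42)
print([next(stream) for _ in range(5)])

# ...while a different seed gives a completely different one.
stream = lcg(seed=43)
print([next(stream) for _ in range(5)])
```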
One of the objectives of blockchain and Web 3 in general is that computers in the network don’t need to trust each other. In a basic sense, the way the computers in the network operate without trusting each other is by having many computers execute the same software and only accepting the result if some majority (usually but not always 1/2 or 2/3, for reasons beyond the scope of this article) agree on the results of the execution.
This is not too difficult for programs like updating account balances. Subtract a number from the balance of account A, add the same number minus some transaction fee to account B. Everyone can agree on that. But what about when there is randomness?
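A balance update is a pure, deterministic state transition that every node can re-execute and agree on. A minimal sketch (the dict-based ledger, account names, and fee are illustrative, not any particular chain's format):

```python
def transfer(balances: dict, sender: str, receiver: str, amount: int, fee: int) -> dict:
    """Apply a transfer and return the new ledger state."""
    new = dict(balances)
    new[sender] -= amount
    new[receiver] += amount - fee
    return new

ledger = {"A": 1000, "B": 500}
# Every node that re-executes this transaction gets exactly the same result.
print(transfer(ledger, "A", "B", 100, fee=1))  # {'A': 900, 'B': 599}
```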
Why is this important?
This article will not cover specific curves or algorithms, but will focus on the desired properties of random number generation, the specific challenges to blockchain, and the best time and place to compute random numbers.
We want randomness that satisfies two properties:

1. Unpredictability: nobody can predict the output before it is generated.
2. Unbiasability: nobody can influence the output in their favor, for example by choosing what to reveal after seeing everyone else’s contribution.
Because computers cannot generate truly random numbers, we either need a way of agreeing on a PRNG seed or some external form of entropy that satisfies unpredictability and unbiasability.
One approach that has already been used is to use the output of hash functions. They are deterministic, but there’s no way to tell what the output will be based on the input until you actually execute it. Why don’t we agree to use some set of bits from the hash of a current or future block header as the random number?[2] Let’s say you have a coin flip game. You could just take the least significant bit (LSB) of the block header’s hash and use that 0 or 1 as your result. With all the electricity being spent hashing, why not use the hash, right?
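A minimal sketch of that naive idea, assuming a SHA-256 block-header hash (the header bytes here are a stand-in; a real implementation would hash an actual serialized header):

```python
import hashlib

def coin_flip(block_header: bytes) -> int:
    """Return the least significant bit of the block header's hash: 0 or 1."""
    digest = hashlib.sha256(block_header).digest()
    return digest[-1] & 1

print(coin_flip(b"example header at height 1000000"))
```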
Yet, how do you get the block hash producer to publish this block? In a proof of work blockchain, the incentive to publish a valid block is the block reward. But let’s imagine that the block producer is implicated in the bet. For example, the block producer will win $1000 if the LSB of the hash is a 1, but his valid block ends in a 0, and the current block reward is equal to $300. In this case, he could choose to not propagate this block, because the expected value of the bet is $500 if he lets another miner find the block.[3]
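Worked out numerically, using the article's figures and assuming the next miner's block hash is a fair coin:

```python
# Publish: the bet is lost for sure (LSB is 0), but the block reward is certain.
# Withhold: forfeit the reward for a 50% chance that the next block ends in 1.
block_reward = 300
bet_payout = 1000

ev_publish  = block_reward      # $300, guaranteed
ev_withhold = 0.5 * bet_payout  # $500 expected

print(ev_publish, ev_withhold)  # 300 vs 500.0 -> withholding pays more
```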
So the miner here has a probability of winning the bet greater than 50% (viz. 50% of the span between his percent of network hash power and 1). This may only yield a final probability of 50.001%, but in high frequency trading and other probabilistic endeavors, an advantage of that size can yield millions in profit over competitors.[4]
(I would actually expand this and say that any information related to consensus should exclusively be used for consensus. Proof of work is routinely criticized for “wasting” energy, and some solutions have been to put those CPU cycles to use for a dual and usually altruistic purpose, e.g. protein folding. The energy wasted is in fact what keeps the chain secure: if protein folding suddenly becomes valuable, then chain security would be second priority. As soon as you introduce other incentives like the outcome of applications into consensus parameters, one of the two — consensus or application — will be manipulated.)
Our first objective (unpredictability) is an easier target and has already been tackled in distributed computing. Multi-player gaming is an application of this: you want unpredictability in your game, but it needs to be executed on many local machines, and the overall state of the game needs to be in agreement between the participants.[5] The second is more difficult, because how can you force someone to reveal a result they don’t like?
Ideally, we would have a PRNG that is verifiable (we assume that a PRNG is unpredictable). A verifiable function is one that is expensive to compute but cheap to verify, in the spirit of NP problems, whose solutions can be checked in polynomial time. This essentially means that computing the solution may have been highly intensive, but verifying whether it is correct is easy.
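As a toy stand-in for this asymmetry (integer factoring rather than a PRNG, purely for illustration): finding the factors takes a search, but checking a claimed answer is a single multiplication.

```python
def factor(n: int):
    """Slow: O(sqrt(n)) trial division to find a nontrivial factorization."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return None

def verify(n: int, p: int, q: int) -> bool:
    """Fast: one multiplication checks the claimed answer."""
    return p * q == n and 1 < p and 1 < q

n = 1_000_003 * 999_983   # product of two primes
p, q = factor(n)          # takes ~a million division attempts
print(verify(n, p, q))    # True, checked instantly
```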
A good application of this would be a validator selection scheme where every participant checks whether she is the validator, and if she is, she computes and proposes a block. Everyone can check their own validator status and verify the actual validator’s legitimacy faster than anyone could compute from scratch who the validator is (and try to bribe or attack her before she proposes a block).
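A minimal sketch of such a lottery, with a plain hash standing in for a real verifiable random function (the threshold, key, and function names are all assumptions; a real VRF would also produce a proof checkable against the public key):

```python
import hashlib

THRESHOLD = 2**256 // 10   # roughly a 1-in-10 chance per validator per round

def am_i_validator(secret_key: bytes, round_number: int) -> bool:
    """Each node evaluates this locally with its own key."""
    ticket = hashlib.sha256(secret_key + round_number.to_bytes(8, "big")).digest()
    return int.from_bytes(ticket, "big") < THRESHOLD

print(am_i_validator(b"my-secret-key", round_number=42))
```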
More recent work on solving this came from Dan Boneh in a paper on Verifiable Delay Functions. VDFs propose a few functions (usually based on squaring a number many times) that can only be computed on a single CPU thread to prevent adversaries with more CPUs from using parallel processing to compute the solution.[6] The solution to a VDF that took n steps to compute can be verified in log(n) steps, so honest nodes can verify that a solution is correct faster than an adversary could compute a new one.
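A sketch of just the evaluation half of such a VDF, based on repeated squaring modulo an RSA-style composite (the modulus and delay parameter are toy values; the short proof that makes verification exponentially cheaper than re-evaluation, as in the Wesolowski and Pietrzak constructions, is omitted here):

```python
def vdf_eval(x: int, T: int, N: int) -> int:
    """Compute x^(2^T) mod N via T sequential squarings.

    Each squaring needs the previous result, so extra CPUs don't help.
    """
    y = x % N
    for _ in range(T):
        y = (y * y) % N
    return y

N = 100003 * 100019       # toy RSA-style modulus (factors must stay secret)
print(vdf_eval(x=5, T=10_000, N=N))
```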
Boneh’s paper on VDFs was only released in 2018, and even the authors admit that their example functions need work. This is cutting-edge work, and we can expect (hope for) a lot of advances.
Solving the second trait (unbiasability) is difficult because you can’t force someone to perform a computation or to reveal the result. The leading method right now is a commit-reveal scheme. Our goal here is to agree on a PRNG seed publicly so that nobody can bias the results by secretly changing the seed or refusing to propagate a solution.
First, everyone agrees on a pseudorandom function to use. Later, it will be easy to verify that everyone used the correct function, because we can all run the function ourselves and verify the results. Now we need to agree on what seed to use. We don’t want anyone to know the seed in advance (it would be trivial to make a winning bet if everyone knew the seed), and we certainly don’t want any individual to select the seed by seeing everyone else’s and revealing last.
As an example, let’s do a commit-reveal scheme with three parties: an application owner (e.g. an online gambling site), a user, and a node that will perform the computation. Each party generates a seed, but does not reveal it. Instead, each party reveals a hash of the seed.
At this point in time, each party has their own seed and the hashes of the other parties’ seeds. Next, everyone reveals his or her actual seed. The final seed for the pseudorandom function is
S = S1 XOR S2 XOR S3
If you are the party holding S3, and you receive S1 and S2, you could now create a new S3’ that flips whatever bits are needed to get the final seed that you desire.[7] But you’ve already sent the hash of S3, and everyone will know if you try to change S3 after the fact! Everyone is committed to the seed he or she created before knowing the other seeds, and even if one party doesn’t reveal the computation result, everyone else has the information necessary to compute the function output themselves and come to agreement.
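Putting the whole three-party scheme together in a sketch (the hash choice, seed sizes, and in-memory message passing are all illustrative):

```python
import hashlib
import secrets

def commit(seed: bytes) -> bytes:
    return hashlib.sha256(seed).digest()

# Commit phase: each party publishes only the hash of its secret seed.
seeds = [secrets.token_bytes(32) for _ in range(3)]
commitments = [commit(s) for s in seeds]

# Reveal phase: seeds are published and checked against the commitments,
# so no party can swap in a new seed after seeing the others.
assert all(commit(s) == c for s, c in zip(seeds, commitments))

# Final seed: the XOR of all three, unpredictable unless all parties collude.
final_seed = bytes(a ^ b ^ c for a, b, c in zip(*seeds))
print(final_seed.hex())
```

Without the commit phase, the last party to reveal could compute S3’ as the XOR of her desired S with S1 and S2 and force any final seed she wanted; the published hashes are what close that door.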
The commit-reveal scheme still has a flaw: the last person to reveal can see the final result before everyone else and decide not to reveal if the outcome will not be favorable. In principle, there are two ways to approach this: technical and economic. As examples, a technical solution could employ a smart contract to hold the individual seeds and then reveal all of them once the parties confirm that they have each received a hash of each seed. Economically, you could create some punishment for actors who do not reveal.
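The economic variant could look something like the following sketch, where committed parties escrow a deposit that is forfeited on failure to reveal (the amounts and the in-memory "contract" are assumptions, not a real smart-contract interface):

```python
from dataclasses import dataclass, field

@dataclass
class RevealGame:
    deposit: int = 100
    escrow: dict = field(default_factory=dict)   # party -> locked deposit
    revealed: set = field(default_factory=set)

    def commit(self, party: str):
        self.escrow[party] = self.deposit        # locked until reveal

    def reveal(self, party: str):
        self.revealed.add(party)

    def settle(self) -> dict:
        # Revealers get their deposit back; withholders lose theirs.
        return {p: (d if p in self.revealed else 0) for p, d in self.escrow.items()}

game = RevealGame()
for p in ("owner", "user", "node"):
    game.commit(p)
game.reveal("owner")
game.reveal("user")                              # "node" withholds its seed
print(game.settle())  # {'owner': 100, 'user': 100, 'node': 0}
```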
This is all quite high level and this remains an open problem; new solutions are likely to have weaknesses. Randomness and determinism are at odds, so convincing people to trust random numbers generated by someone they don’t trust will be an evolving challenge in decentralized computation.
[7] In an XOR, a 0 does nothing and a 1 toggles the bit, i.e. 0 XOR 0 = 0, 0 XOR 1 = 1, 1 XOR 0 = 1, 1 XOR 1 = 0. Let’s say the application will use the first 100 random numbers and an attacker wants to gain an advantage by manipulating S. To select this S, the attacker would just try billions of seeds and use the one that gave the most favorable first 100 outputs. To select S3’, compare the desired S with S1 XOR S2: S3’ is 0 wherever they agree and 1 wherever they do not.

Verifiable Random Functions, Micali, Rabin, Vadhan, 1999 https://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Pseudo%20Randomness/Verifiable_Random_Functions.pdf
Verifiable Delay Functions, Boneh, Bonneau, Bunz, Fisch, 2018 https://eprint.iacr.org/2018/601.pdf
Introduction to Verifiable Delay Functions, Trail of Bits, 2018 https://blog.trailofbits.com/2018/10/12/introduction-to-verifiable-delay-functions-vdfs/
Meta-Analysis of Alternative Consensus Protocols, Gauba, Krishnan, Koticha, Uddin, Movahedi, 2018 https://github.com/Mechanism-Labs/MetaAnalysis-of-Alternative-Consensus-Protocols