Random numbers are often useful during programming - they can be used for rendering pretty animations, generating interesting content in computer games, load balancing, executing a randomized algorithm, etc. Unfortunately, CPUs are deterministic machines, and (controversial RDRAND instruction aside) cannot just generate random numbers out of thin air. This left programmers and computer designers with few options:
- Invest in additional devices (Hardware Random Number Generators).
- Use existing hardware in an unintended way (for example, by collecting lowest bits of audio input from a microphone, measuring hard disk seek times or timing keystrokes).
- Fake it - generate numbers that "look" random, but aren't.
All these options were explored, but dedicated devices never went mainstream, and other ways of gathering entropy have too small throughput to be used exclusively. This left programmers with the third option - numbers that look random but are in fact generated by a completely deterministic algorithm. These algorithms are called "Pseudo Random Number Generators", or PRNGs in short.
PRNGs are usually really good at generating statistically random numbers. A quality of generator can be measured by one of few standardized tests, like TestU01 or DIEHARD test suite - and good PRNGs are often as good as true random number generators (TRNG).
Unfortunately, there is one problem with PRNGs that cannot be fixed - they are still deterministic "in heart", and knowing a full internal state of PRNG allows an attacker to predict all future (and, usually, previous) values. This usually isn't a problem, unless PRNGs are used for security sensitive things - like generating certificates, encryption keys, secrets, etc 1. In this post, I'll show how easily PRNGs can be cracked (cracking PRNG means recovering its internal state and predicting future values).
This time I'll focus on one specific kind of PRNGs - Linear Congruential Generators. They are defined by three integers, "multiplier", "increment" and "modulus", and can be implemented in three lines of Python code:
class prng_lcg: m = 672257317069504227 # the "multiplier" c = 7382843889490547368 # the "increment" n = 9223372036854775783 # the "modulus" def __init__(self, seed): self.state = seed # the "seed" def next(self): self.state = (self.state * self.m + self.c) % self.n return self.state def test(): gen = prng_lcg(123) # seed = 123 print gen.next() # generate first value print gen.next() # generate second value print gen.next() # generate third value
LCGs are one of the most popular pseudo-random number generators. There are reasons for that: they're mathematically elegant, very easy to understand/implement and very fast, especially when a modulus is a power of two (because slow modular division can be replaced with binary AND in this case). Unfortunately, they're not perfect at being statistically random (depending on chosen constants, resulting bits often have varying level of "randomness") and, as we'll see soon, are dramatically weak at being cryptographically secure.
Challenge 0: everything known
After a short theoretical introduction, let's focus on the attacks. This is not actually a challenge yet, just explanation of what we're trying to achieve. Let's say that we are observing one LCG, and it generated three consecutive values:
s0 = 2300417199649672133 s1 = 2071270403368304644 s2 = 5907618127072939765
And we want to learn the next value that will be generated, without actually calling PRNG again. In this case, we have all necessary information (state, m, c, n) so the problem is trivial - we just plug the values into the formula. Let's check:
In : m = 672257317069504227 # the "multiplier" ...: c = 7382843889490547368 # the "increment" ...: n = 9223372036854775783 # the "modulus" ...: s0 = 2300417199649672133 # seed In : s1 = (s0*m + c) % n In : s2 = (s1*m + c) % n In : s3 = (s2*m + c) % n In : s4 = (s3*m + c) % n In : s1 Out: 2071270403368304644L # correct In : s2 Out: 5907618127072939765L # correct In : s3 Out: 5457707446309988294L # predicted!
We know a full internal state of our LCG, and we can easily generate all future values. In fact, we can even go back and get all previously generated values, which is a security problem too.
Challenge 1: unknown increment
Ok, let's move to the first challenge. What if we don't know "increment"? i.e:
m = 81853448938945944 c = # unknown n = 9223372036854775783
And let's say that we know two consecutive values generated by this LCG:
s0 = 4501678582054734753 s1 = 4371244338968431602
Can we still attack this? Again, in this case, all values are 64bit, too much to bruteforce it. Let's use some basic math instead.
s1 = s0*m + c (mod n) c = s1 - s0*m (mod n)
Easy. Now we can implement our attack in Python and plug concrete values in:
def crack_unknown_increment(states, modulus, multiplier): increment = (states - states*multiplier) % modulus return modulus, multiplier, increment print crack_unknown_increment([4501678582054734753, 4371244338968431602], 9223372036854775783, 81853448938945944)
That's it - challenge solved.
Challenge 2: unknown increment and multiplier
Previous two levels were rather trivial, time for something more interesting. Now we know neither multiplier nor increment:
m = # unknown c = # unknown n = 9223372036854775783
Now we don't know increment and multiplier. At least we get to know three consecutive values from LCG:
s0 = 6473702802409947663 s1 = 6562621845583276653 s2 = 4483807506768649573
This looks much harder, but really isn't - we still have two linear equations, and two unknowns, so everything should go smoothly:
s_1 = s0*m + c (mod n) s_2 = s1*m + c (mod n) s_2 - s_1 = s1*m - s0*m (mod n) s_2 - s_1 = m*(s1 - s0) (mod n) m = (s_2 - s_1)/(s_1 - s_0) (mod n)
And when we know multiplier, problem is reduced to the one we already solved in chellenge 1. Let's implement this in Python:
def crack_unknown_multiplier(states, modulus): multiplier = (states - states) * modinv(states - states, modulus) % modulus return crack_unknown_increment(states, modulus, multiplier) print crack_unknown_multiplier([6473702802409947663, 6562621845583276653, 4483807506768649573], 9223372036854775783)
This algorithm uses modular division, so we'll need modular inverse too. We can use this one.
def egcd(a, b): if a == 0: return (b, 0, 1) else: g, x, y = egcd(b % a, a) return (g, y - (b // a) * x, x) def modinv(b, n): g, x, _ = egcd(b, n) if g == 1: return x % n
Challenge 3: unknown increment, multiplier and modulus
Now a list of values that we know doesn't look very interesting:
m = # unknown c = # unknown n = # unknown
But we have a lot of generated integers:
s0 = 2818206783446335158 s1 = 3026581076925130250 s2 = 136214319011561377 s3 = 359019108775045580 s4 = 2386075359657550866 s5 = 1705259547463444505 s6 = 2102452637059633432
Unfortunatelly, this time we can't solve this with simple linear equations - we don't know modulus, so every equation we'll form will introduce new unknown:
s1 = s0*m + c (mod n) s2 = s1*m + c (mod n) s3 = s2*m + c (mod n)
This doesn't look bad - three equations and three unknowns. At least until we remember, that by definition this really is equivalent to:
s1 - (s0*m + c) = k_1 * n s2 - (s1*m + c) = k_2 * n s3 - (s2*m + c) = k_3 * n
Six unknowns and three equations. And it's clear that no number of equations will help us because every new equation introduces new unknown. Fortunately, there is a mathematical trick that usually comes in handy in situations like this. Namely, interesting number theory fact: if we have few random multiples of
n, with large probability their gcd will be equal to n. For example:
In : n = 123456789 In : reduce(gcd, [randint(1, 1000000)*n, randint(1, 1000000)*n, randint(1, 1000000)*n]) Out: 123456789
Why is this useful? Because if we can think of some modular operations that will give something congruent to zero, for example:
X = 0 (mod n)
Then, by definition, this is equivalent to:
X = k*n
This is only interesting if
X != 0, but
X = 0 (mod n). We just need to take a gcd from few such values, and we can recover n. This is a really generic method, that can often be used when modulus that we're using is unknown.
Ok, now how can we generate something like this for above LCG? We can introduce sequence T(n) = S(n+1) - S(n):
t0 = s1 - s0 t1 = s2 - s1 = (s1*m + c) - (s0*m + c) = m*(s1 - s0) = m*t0 (mod n) t2 = s3 - s2 = (s2*m + c) - (s1*m + c) = m*(s2 - s1) = m*t1 (mod n) t3 = s4 - s3 = (s3*m + c) - (s2*m + c) = m*(s3 - s2) = m*t2 (mod n)
And now we can use this trick to generate our desired operation:
t2*t0 - t1*t1 = (m*m*t0 * t0) - (m*t0 * m*t0) = 0 (mod n)
Using this method we can generate few values congruent to 0, and crack LCG with mentioned "trick". Attack in Python, again:
def crack_unknown_modulus(states): diffs = [s1 - s0 for s0, s1 in zip(states, states[1:])] zeroes = [t2*t0 - t1*t1 for t0, t1, t2 in zip(diffs, diffs[1:], diffs[2:])] modulus = abs(reduce(gcd, zeroes)) return crack_unknown_multiplier(states, modulus) print crack_unknown_modulus([2818206783446335158, 3026581076925130250, 136214319011561377, 359019108775045580, 2386075359657550866, 1705259547463444505])
Looks like it works!
So we have just cracked LCG, without bruteforcing, with basically zero knowledge - only by observing its output. So can we crack every LCG in the world now? Unfortunately, no. The reason we were able to mount the attacks so easily is that all the operations are linear, and we get a complete state of LCG every time. This is basically a cryptologist's dream.
Unfortunately, in a real world, things aren't always that easy. Most importantly, all these attacks can be disrupted by truncating the output - for example, using 64bit integers for computation, but returning
state % 2**32. This method is often used in practice (not for improved security, but because it's improving number distribution). Alternatively, introducing nonlinear operations somewhere will complicate attacks greatly (for example
xoring state with something).
Of course, this doesn't mean that simple truncation will ruin our chances of cracking LCG - we'll just have to use more complicated math, like all-powerful LLL Algorithm. I'll come back to these attacks sooner or later - but this blog post is getting long anyway, and I've heard that nobody reads more than few pages. For now, the main takeaway from this article is:
- If you need random numbers for anything crypto related, use secure random number generator, not just any PRNG.
- LCG is not a secure RNG.
Although CSPRNGs (Cryptographically Secure Pseudo Random Number Generators) can be used in such case ↩