Collision attack for StreamHash5
StreamHash is an alternative family of cryptographic hash functions, designed for maximum speed, while being reasonably secure at the same time.
As far as I know, it’s not used in any production software, but is not completely fringe too - it was submitted to the SHA-3 competition and prompted a few academic publications.
StreamHash5 is the newest member of the family. It was created as an improved version of StreamHash4, after I implemented a practical collision attack for it 1.
In this blog post, I will describe my impractical collision attack on StreamHash5 (and a practical partial attack). I stress that to manage expectations, but it’s enough to consider StreamHash5 broken cryptographically:
- Hash size is 512 bits. Hence, the generic birthday attack complexity is
2**256
. - My attack complexity is roughly
2**128 * 2**38 = 2**166
.
While the project’s readme clearly states that There are currently no known collision attacks easier than the generic birthday attack
2.
I’m also confident the attack complexity can be improved. But this research spent the last 2 years in my drawer and I think it’s time to finally publish it.
Partial collision example (line breaks added for clarity):
$ cat data.{0,1}.hex
626974636820646f20317420616c6f6ec0402c500c6570577b3c09fb4e966d2215ea0bb4a024bab440e907c9cc7e3bd0
696620796f752077616e6e61206372794b6dfb042aed03d14e8dbd4936536c3500000000000000000000000000000000
$ streamhash5sum data.0.bin
65bf759586d6b23e3ba7b36b9150920b # collision
6f1daa35f658c4181fc25960d2b0ac61 # collision
71df8e636e38beecbf1d36ae7f9c26e4 # collision
e4121fda76468f48ae542cb70a31875e
$ streamhash5sum data.1.bin
65bf759586d6b23e3ba7b36b9150920b # collision
6f1daa35f658c4181fc25960d2b0ac61 # collision
71df8e636e38beecbf1d36ae7f9c26e4 # collision
82ef6e3e544e910fc99b69c77ca58ceb
StreamHash5
Like one wise man with anger issues once said, “talk is cheap, show me
the code”. StreamHash5 processes data in 16-byte blocks. The internal state
has 4*16 bytes. After every round, the state is xor-ed with the number of
bits processed so far. Simplified version (works when len(data)
is
divisible by 16) below:
def sh5(data):
offset = 0
state = list(consts)
while offset + 16 <= len(data):
offset += 16
magic = struct.pack("<I", offset * 8).rjust(8, "\x00")
state[3] = xor(state[3], magic * 2)
block(data[offset-16:offset], state)
return state[0] + state[1] + state[2] + state[3]
Of course, the real magic is in the block
compression function.
StreamHash can be described as four “almost-AES-CBC” encryptions
going on in parallel.
The state is divided into four 16 byte chunks. First, every chunk is xored with a new block of hashed data. After that, every block is encrypted with two rounds of AES (every block uses a different but constant key). In Python:
def xorstate(data, state):
state[0] = xor(data, state[0])
state[1] = xor(data, state[1])
state[2] = xor(data, state[2])
state[3] = xor(data, state[3])
def halfblock(state):
state[0] = aes_round(state[0], consts[0])
state[1] = aes_round(state[1], consts[1])
state[2] = aes_round(state[2], consts[2])
state[3] = aes_round(state[3], consts[3])
def block(data, state):
xorstate(data, state)
halfblock(state)
halfblock(state)
In StreamHash5, the halfblock function is called twice per block - this is the improvement over StreamHash4 introduced in the new version.
This can also be described with the following ASCII-art:
data bits
| |
v |
S0 -> xor -> aes_round(C0) -> aes_round(C0) --- | --> S0
S1 -> xor -> aes_round(C1) -> aes_round(C1) --- | --> S1
S2 -> xor -> aes_round(C2) -> aes_round(C2) --- v --> S2
S3 -> xor -> aes_round(C3) -> aes_round(C3) -> xor -> S3
Attack description
The attack itself is just a smart brute-force with a lot of precomputing.
The problem we’re trying to solve can be stated as follows:
AES(AES(A00 ^ A1D, C0), C0) = AES(AES(B00 ^ B1D, C0), C0)
AES(AES(A01 ^ A1D, C1), C1) = AES(AES(B01 ^ B1D, C1), C1)
AES(AES(A02 ^ A1D, C2), C2) = AES(AES(B02 ^ B1D, C2), C2)
AES(AES(A03 ^ A1D, C3), C3) = AES(AES(B03 ^ B1D, C3), C3)
Excuse my notation. C0
, C1
, C2
and C3
. Are algorithm constants.
AES
is a single round of AES encryption. AnD
and BnD
describe n
th data blocks (in message A
and B
respectively). And finally, Anm
/Bnm
describe m
th state after processing n
th block
(remember, there are four states).
In other words, the relationship between the state variables is:
# state 0 in message A
A10 = AES(A00 ^ A1D, C0)
A20 = AES(A10 ^ A2D, C0)
A30 = AES(A20 ^ A3D, C0)
...
# state 1 in message B
B11 = AES(B00 ^ B1D, C1)
B21 = AES(B10 ^ B2D, C1)
B31 = AES(B20 ^ B3D, C1)
...
# etc
Finally, an example with ASCII art again:
A1D bits
| |
v |
A00 -> xor -> aes_round(C0) -> aes_round(C0) --- | --> A10
A01 -> xor -> aes_round(C1) -> aes_round(C1) --- | --> A11
A02 -> xor -> aes_round(C2) -> aes_round(C2) --- v --> A12
A03 -> xor -> aes_round(C3) -> aes_round(C3) -> xor -> A13
So, back to the problem again. I won’t try to solve all four equations.
We’ll create a collision for three equations in 2**38
and observe that we can
repeat this 2**128
times to get a collision for the fourth block 3.
So we start with:
AES(AES(A00 ^ A1D, C0), C0) = AES(AES(B00 ^ B1D, C0), C0)
AES(AES(A01 ^ A1D, C1), C1) = AES(AES(B01 ^ B1D, C1), C1)
AES(AES(A02 ^ A1D, C2), C2) = AES(AES(B02 ^ B1D, C2), C2)
We expand AES
round into suboperations - SubBytes
, AddRoundKey
,
MixColumns
and ShiftRows
, and simplify:
SB(AR(MX(SR(SB(A00 ^ A1D))), C0)) ^ A2D = SB(AR(MX(SR(SB(B00 ^ B1D))), C0)) ^ B2D
SB(AR(MX(SR(SB(A01 ^ A1D))), C1)) ^ A2D = SB(AR(MX(SR(SB(B01 ^ B1D))), C1)) ^ B2D
SB(AR(MX(SR(SB(A02 ^ A1D))), C2)) ^ A2D = SB(AR(MX(SR(SB(B02 ^ B1D))), C2)) ^ B2D
Observe that SR(SB(A00 ^ A1D))
don’t mix bytes in any way. In fact, it’s
equivalent to SR(SB(A00)) ^ SR(SB(A1D))
. Use this trick to simplify equatoin
again (Let A00sr = SR(SB(A00))
etc):
SB(AR(MX(A00sr ^ A1Dsr), C0)) ^ A2D = SB(AR(MX(B00sr ^ B1Dsr), C0)) ^ B2D
SB(AR(MX(A01sr ^ A1Dsr), C1)) ^ A2D = SB(AR(MX(B01sr ^ B1Dsr), C1)) ^ B2D
SB(AR(MX(A02sr ^ A1Dsr), C2)) ^ A2D = SB(AR(MX(B02sr ^ B1Dsr), C2)) ^ B2D
Move variables around, and let d = A2D ^ B2D
d ^ SB(AR(MX(A00sr ^ A1Dsr), C0))= SB(AR(MX(B00sr ^ B1Dsr), C0))
d ^ SB(AR(MX(A01sr ^ A1Dsr), C1))= SB(AR(MX(B01sr ^ B1Dsr), C1))
d ^ SB(AR(MX(A02sr ^ A1Dsr), C2))= SB(AR(MX(B02sr ^ B1Dsr), C2))
Now, introduce helper functions for four constant:
let SSB_C0(x) = SB(AR(MX(x), C0))
d ^ SSB_C0(A00sr ^ A1Dsr) = SSB_C0(B00sr ^ B1Dsr)
d ^ SSB_C1(A01sr ^ A1Dsr) = SSB_C1(B01sr ^ B1Dsr)
d ^ SSB_C2(A02sr ^ A1Dsr) = SSB_C2(B02sr ^ B1Dsr)
Why? Because SB(AR(MX(x), C0))
is in a sweet spot, where it’s already hard to
analyse mathematically, but there is still only a single MixColumns step - so we
can operate on message dwords (32bit chunks) separately. This means we can
precompute inverse functions like:
SSB_C0(x) = y
<=>
x ∈ INVSSB_C0(y) # many inputs may give the same result
This will come in handy.
Now we can just brute-force A1Dsr
quickly. After plugging in a specific
value for A1D
(remember, the first data block) we get:
d0 = SB_C0(B00sr ^ B1Dsr)
d1 = SB_C1(B01sr ^ B1Dsr)
d2 = SB_C2(B02sr ^ B1Dsr)
This means that:
d0 ^ d1 = SB_C0(B00sr ^ B1Dsr) ^ SB_C1(B01sr ^ B1Dsr)
Observe that B00sr
and B01sr
don’t depend on A1D
(or even B1D
) . This means
that we can define a function:
SB_C0xC1(x) = SB_C0(B00sr ^ x) ^ SB_C1(B01sr ^ x)
And precompute its inverse:
SB_C0xC1(x) = y
<=>
x ∈ INVSB_C0xC1(y)
Going back to our equation, we can find solutions immediately. Pseudo-code:
# solve d0 ^ d1 = SB_C0(B00sr ^ B1Dsr) ^ SB_C1(B01sr ^ B1Dsr)
for B1Dsr in INVSB_C0xC1(d0 ^ d1):
... # do more crypto magic
This takes care of the first two equations. But remember, we need to solve three of them at once:
d0 = SB_C0(B00sr ^ B1Dsr)
d1 = SB_C1(B01sr ^ B1Dsr)
d2 = SB_C2(B02sr ^ B1Dsr)
Sadly, this is as far as clever precomputing gets us. But the good thing is
that we can work on 32bit input fragments independently. By picking a
random A1D
we have 1 / 2*32
chance of satisfying the third equation by pure
chance. And when we do it’s over - the attack succeeded.
So the idea is to split a message block into four 32bit parts, find a collision for each part separately, and then combine them into a 128bit block.
The only bad news is that sometimes we won’t find any A1D
that works, even
after exhaustive checking of all 2**32
uint32 values. In this case, we have to
pick other initial states and redo the attack.
That’s really it. I’ve glossed over a few details, but the gist of the attack and the most interesting part is just that - a big precomputed inverse for xor of two AES supersboxes.
Conclusions
Even though I didn’t present a full practical collision, I’m confident StreamHash5 shouldn’t be used in its current form. I see a few ways to improve my attack, and certainly focused work by experts and three-letter agencies would improve it even more.
The usual takeaway is: cryptography is hard - even when you’re a professional cryptographer, rolling your own crypto is risky.