# Collision attack for StreamHash5

StreamHash is an alternative family of cryptographic hash functions, designed for maximum speed, while being reasonably secure at the same time.

As far as I know, it’s not used in any production software, but is not completely fringe too - it was submitted to the SHA-3 competition and prompted a few academic publications.

StreamHash5 is the newest member of
the family. It was created as an improved
version of StreamHash4, after I implemented a practical collision attack for
it ^{1}.

In this blog post, I will describe my **impractical** collision attack on
StreamHash5 (and a practical partial attack). I stress that to manage
expectations, but it’s enough to consider StreamHash5 broken cryptographically:

- Hash size is 512 bits. Hence, the generic birthday attack complexity is
`2**256`

. - My attack complexity is roughly
`2**128 * 2**38 = 2**166`

.

While the project’s readme clearly states that `There are currently no known collision attacks easier than the generic birthday attack`

^{2}.

I’m also confident the attack complexity can be improved. But this research spent the last 2 years in my drawer and I think it’s time to finally publish it.

Partial collision example (line breaks added for clarity):

```
$ cat data.{0,1}.hex
626974636820646f20317420616c6f6ec0402c500c6570577b3c09fb4e966d2215ea0bb4a024bab440e907c9cc7e3bd0
696620796f752077616e6e61206372794b6dfb042aed03d14e8dbd4936536c3500000000000000000000000000000000
$ streamhash5sum data.0.bin
65bf759586d6b23e3ba7b36b9150920b # collision
6f1daa35f658c4181fc25960d2b0ac61 # collision
71df8e636e38beecbf1d36ae7f9c26e4 # collision
e4121fda76468f48ae542cb70a31875e
$ streamhash5sum data.1.bin
65bf759586d6b23e3ba7b36b9150920b # collision
6f1daa35f658c4181fc25960d2b0ac61 # collision
71df8e636e38beecbf1d36ae7f9c26e4 # collision
82ef6e3e544e910fc99b69c77ca58ceb
```

## StreamHash5

Like one wise man with anger issues once said, “talk is cheap, show me
the code”. StreamHash5 processes data in 16-byte blocks. The internal state
has 4*16 bytes. After every round, the state is xor-ed with the number of
bits processed so far. Simplified version (works when `len(data)`

is
divisible by 16) below:

```
def sh5(data):
offset = 0
state = list(consts)
while offset + 16 <= len(data):
offset += 16
magic = struct.pack("<I", offset * 8).rjust(8, "\x00")
state[3] = xor(state[3], magic * 2)
block(data[offset-16:offset], state)
return state[0] + state[1] + state[2] + state[3]
```

Of course, the real magic is in the `block`

compression function.
StreamHash can be described as four “almost-AES-CBC” encryptions
going on in parallel.

The state is divided into four 16 byte chunks. First, every chunk is xored with a new block of hashed data. After that, every block is encrypted with two rounds of AES (every block uses a different but constant key). In Python:

```
def xorstate(data, state):
state[0] = xor(data, state[0])
state[1] = xor(data, state[1])
state[2] = xor(data, state[2])
state[3] = xor(data, state[3])
def halfblock(state):
state[0] = aes_round(state[0], consts[0])
state[1] = aes_round(state[1], consts[1])
state[2] = aes_round(state[2], consts[2])
state[3] = aes_round(state[3], consts[3])
def block(data, state):
xorstate(data, state)
halfblock(state)
halfblock(state)
```

In StreamHash5, the halfblock function is called twice per block - this is the improvement over StreamHash4 introduced in the new version.

This can also be described with the following ASCII-art:

```
data bits
| |
v |
S0 -> xor -> aes_round(C0) -> aes_round(C0) --- | --> S0
S1 -> xor -> aes_round(C1) -> aes_round(C1) --- | --> S1
S2 -> xor -> aes_round(C2) -> aes_round(C2) --- v --> S2
S3 -> xor -> aes_round(C3) -> aes_round(C3) -> xor -> S3
```

## Attack description

The attack itself is just a smart brute-force with a lot of precomputing.

The problem we’re trying to solve can be stated as follows:

```
AES(AES(A00 ^ A1D, C0), C0) = AES(AES(B00 ^ B1D, C0), C0)
AES(AES(A01 ^ A1D, C1), C1) = AES(AES(B01 ^ B1D, C1), C1)
AES(AES(A02 ^ A1D, C2), C2) = AES(AES(B02 ^ B1D, C2), C2)
AES(AES(A03 ^ A1D, C3), C3) = AES(AES(B03 ^ B1D, C3), C3)
```

Excuse my notation. `C0`

, `C1`

, `C2`

and `C3`

. Are algorithm constants.
`AES`

is a single round of AES encryption. `AnD`

and `BnD`

describe `n`

th data blocks (in message `A`

and `B`

respectively). And finally, `Anm`

/`Bnm`

describe `m`

th state after processing `n`

th block
(remember, there are four states).

In other words, the relationship between the state variables is:

```
# state 0 in message A
A10 = AES(A00 ^ A1D, C0)
A20 = AES(A10 ^ A2D, C0)
A30 = AES(A20 ^ A3D, C0)
...
# state 1 in message B
B11 = AES(B00 ^ B1D, C1)
B21 = AES(B10 ^ B2D, C1)
B31 = AES(B20 ^ B3D, C1)
...
# etc
```

Finally, an example with ASCII art again:

```
A1D bits
| |
v |
A00 -> xor -> aes_round(C0) -> aes_round(C0) --- | --> A10
A01 -> xor -> aes_round(C1) -> aes_round(C1) --- | --> A11
A02 -> xor -> aes_round(C2) -> aes_round(C2) --- v --> A12
A03 -> xor -> aes_round(C3) -> aes_round(C3) -> xor -> A13
```

So, back to the problem again. I won’t try to solve all four equations.
We’ll create a collision for three equations in `2**38`

and observe that we can
repeat this `2**128`

times to get a collision for the fourth block ^{3}.
So we start with:

```
AES(AES(A00 ^ A1D, C0), C0) = AES(AES(B00 ^ B1D, C0), C0)
AES(AES(A01 ^ A1D, C1), C1) = AES(AES(B01 ^ B1D, C1), C1)
AES(AES(A02 ^ A1D, C2), C2) = AES(AES(B02 ^ B1D, C2), C2)
```

We expand `AES`

round into suboperations - `SubBytes`

, `AddRoundKey`

,
`MixColumns`

and `ShiftRows`

, and simplify:

```
SB(AR(MX(SR(SB(A00 ^ A1D))), C0)) ^ A2D = SB(AR(MX(SR(SB(B00 ^ B1D))), C0)) ^ B2D
SB(AR(MX(SR(SB(A01 ^ A1D))), C1)) ^ A2D = SB(AR(MX(SR(SB(B01 ^ B1D))), C1)) ^ B2D
SB(AR(MX(SR(SB(A02 ^ A1D))), C2)) ^ A2D = SB(AR(MX(SR(SB(B02 ^ B1D))), C2)) ^ B2D
```

Observe that `SR(SB(A00 ^ A1D))`

don’t mix bytes in any way. In fact, it’s
equivalent to `SR(SB(A00)) ^ SR(SB(A1D))`

. Use this trick to simplify equatoin
again (Let `A00sr = SR(SB(A00))`

etc):

```
SB(AR(MX(A00sr ^ A1Dsr), C0)) ^ A2D = SB(AR(MX(B00sr ^ B1Dsr), C0)) ^ B2D
SB(AR(MX(A01sr ^ A1Dsr), C1)) ^ A2D = SB(AR(MX(B01sr ^ B1Dsr), C1)) ^ B2D
SB(AR(MX(A02sr ^ A1Dsr), C2)) ^ A2D = SB(AR(MX(B02sr ^ B1Dsr), C2)) ^ B2D
```

Move variables around, and let `d = A2D ^ B2D`

```
d ^ SB(AR(MX(A00sr ^ A1Dsr), C0))= SB(AR(MX(B00sr ^ B1Dsr), C0))
d ^ SB(AR(MX(A01sr ^ A1Dsr), C1))= SB(AR(MX(B01sr ^ B1Dsr), C1))
d ^ SB(AR(MX(A02sr ^ A1Dsr), C2))= SB(AR(MX(B02sr ^ B1Dsr), C2))
```

Now, introduce helper functions for four constant:
`let SSB_C0(x) = SB(AR(MX(x), C0))`

```
d ^ SSB_C0(A00sr ^ A1Dsr) = SSB_C0(B00sr ^ B1Dsr)
d ^ SSB_C1(A01sr ^ A1Dsr) = SSB_C1(B01sr ^ B1Dsr)
d ^ SSB_C2(A02sr ^ A1Dsr) = SSB_C2(B02sr ^ B1Dsr)
```

Why? Because `SB(AR(MX(x), C0))`

is in a sweet spot, where it’s already hard to
analyse mathematically, but there is still only a single MixColumns step - so we
can operate on message dwords (32bit chunks) separately. This means we can
precompute inverse functions like:

```
SSB_C0(x) = y
<=>
x ∈ INVSSB_C0(y) # many inputs may give the same result
```

This will come in handy.

Now we can just brute-force `A1Dsr`

quickly. After plugging in a specific
value for `A1D`

(remember, the first data block) we get:

```
d0 = SB_C0(B00sr ^ B1Dsr)
d1 = SB_C1(B01sr ^ B1Dsr)
d2 = SB_C2(B02sr ^ B1Dsr)
```

This means that:

```
d0 ^ d1 = SB_C0(B00sr ^ B1Dsr) ^ SB_C1(B01sr ^ B1Dsr)
```

Observe that `B00sr`

and `B01sr`

don’t depend on `A1D`

(or even `B1D`

) . This means
that we can define a function:

```
SB_C0xC1(x) = SB_C0(B00sr ^ x) ^ SB_C1(B01sr ^ x)
```

And precompute its inverse:

```
SB_C0xC1(x) = y
<=>
x ∈ INVSB_C0xC1(y)
```

Going back to our equation, we can find solutions immediately. Pseudo-code:

```
# solve d0 ^ d1 = SB_C0(B00sr ^ B1Dsr) ^ SB_C1(B01sr ^ B1Dsr)
for B1Dsr in INVSB_C0xC1(d0 ^ d1):
... # do more crypto magic
```

This takes care of the first two equations. But remember, we need to solve three of them at once:

```
d0 = SB_C0(B00sr ^ B1Dsr)
d1 = SB_C1(B01sr ^ B1Dsr)
d2 = SB_C2(B02sr ^ B1Dsr)
```

Sadly, this is as far as clever precomputing gets us. But the good thing is
that we can work on 32bit input fragments independently. By picking a
random `A1D`

we have `1 / 2*32`

chance of satisfying the third equation by pure
chance. And when we do it’s over - the attack succeeded.

So the idea is to split a message block into four 32bit parts, find a collision for each part separately, and then combine them into a 128bit block.

The only bad news is that sometimes we won’t find any `A1D`

that works, even
after exhaustive checking of all `2**32`

uint32 values. In this case, we have to
pick other initial states and redo the attack.

That’s really it. I’ve glossed over a few details, but the gist of the attack and the most interesting part is just that - a big precomputed inverse for xor of two AES supersboxes.

## Conclusions

Even though I didn’t present a full practical collision, I’m confident **StreamHash5 shouldn’t be used** in its current form. I see a few ways to improve my attack, and certainly focused work by experts and three-letter agencies would improve it even more.

The usual takeaway is: **cryptography is hard** - even when you’re a professional cryptographer, rolling your own crypto is risky.