StreamHash is an alternative family of cryptographic hash functions, designed for maximum speed, while being reasonably secure at the same time.

As far as I know, it’s not used in any production software, but is not completely fringe too - it was submitted to the SHA-3 competition and prompted a few academic publications.

StreamHash5 is the newest member of the family. It was created as an improved version of StreamHash4, after I implemented a practical collision attack for it 1.

In this blog post, I will describe my impractical collision attack on StreamHash5 (and a practical partial attack). I stress that to manage expectations, but it’s enough to consider StreamHash5 broken cryptographically:

  • Hash size is 512 bits. Hence, the generic birthday attack complexity is 2**256.
  • My attack complexity is roughly 2**128 * 2**38 = 2**166.

While the project’s readme clearly states that There are currently no known collision attacks easier than the generic birthday attack 2.

I’m also confident the attack complexity can be improved. But this research spent the last 2 years in my drawer and I think it’s time to finally publish it.

Partial collision example (line breaks added for clarity):

$ cat data.{0,1}.hex
626974636820646f20317420616c6f6ec0402c500c6570577b3c09fb4e966d2215ea0bb4a024bab440e907c9cc7e3bd0
696620796f752077616e6e61206372794b6dfb042aed03d14e8dbd4936536c3500000000000000000000000000000000

$ streamhash5sum data.0.bin
65bf759586d6b23e3ba7b36b9150920b  # collision
6f1daa35f658c4181fc25960d2b0ac61  # collision
71df8e636e38beecbf1d36ae7f9c26e4  # collision
e4121fda76468f48ae542cb70a31875e

$ streamhash5sum data.1.bin
65bf759586d6b23e3ba7b36b9150920b  # collision
6f1daa35f658c4181fc25960d2b0ac61  # collision
71df8e636e38beecbf1d36ae7f9c26e4  # collision
82ef6e3e544e910fc99b69c77ca58ceb

StreamHash5

Like one wise man with anger issues once said, “talk is cheap, show me the code”. StreamHash5 processes data in 16-byte blocks. The internal state has 4*16 bytes. After every round, the state is xor-ed with the number of bits processed so far. Simplified version (works when len(data) is divisible by 16) below:

def sh5(data):
    offset = 0
    state = list(consts)

    while offset + 16 <= len(data):
        offset += 16
        magic = struct.pack("<I", offset * 8).rjust(8, "\x00")
        state[3] = xor(state[3], magic * 2)
        block(data[offset-16:offset], state)

    return state[0] + state[1] + state[2] + state[3]

Of course, the real magic is in the block compression function. StreamHash can be described as four “almost-AES-CBC” encryptions going on in parallel.

The state is divided into four 16 byte chunks. First, every chunk is xored with a new block of hashed data. After that, every block is encrypted with two rounds of AES (every block uses a different but constant key). In Python:

def xorstate(data, state):
    state[0] = xor(data, state[0])
    state[1] = xor(data, state[1])
    state[2] = xor(data, state[2])
    state[3] = xor(data, state[3])


def halfblock(state):
    state[0] = aes_round(state[0], consts[0])
    state[1] = aes_round(state[1], consts[1])
    state[2] = aes_round(state[2], consts[2])
    state[3] = aes_round(state[3], consts[3])


def block(data, state):
    xorstate(data, state)
    halfblock(state)
    halfblock(state)

In StreamHash5, the halfblock function is called twice per block - this is the improvement over StreamHash4 introduced in the new version.

This can also be described with the following ASCII-art:

     data                                     bits
       |                                        |
       v                                        |
S0 -> xor -> aes_round(C0) -> aes_round(C0) --- | --> S0
S1 -> xor -> aes_round(C1) -> aes_round(C1) --- | --> S1
S2 -> xor -> aes_round(C2) -> aes_round(C2) --- v --> S2
S3 -> xor -> aes_round(C3) -> aes_round(C3) -> xor -> S3

Attack description

The attack itself is just a smart brute-force with a lot of precomputing.

The problem we’re trying to solve can be stated as follows:

AES(AES(A00 ^ A1D, C0), C0) = AES(AES(B00 ^ B1D, C0), C0)
AES(AES(A01 ^ A1D, C1), C1) = AES(AES(B01 ^ B1D, C1), C1)
AES(AES(A02 ^ A1D, C2), C2) = AES(AES(B02 ^ B1D, C2), C2)
AES(AES(A03 ^ A1D, C3), C3) = AES(AES(B03 ^ B1D, C3), C3)

Excuse my notation. C0, C1, C2 and C3. Are algorithm constants. AES is a single round of AES encryption. AnD and BnD describe nth data blocks (in message A and B respectively). And finally, Anm/Bnm describe mth state after processing nth block (remember, there are four states).

In other words, the relationship between the state variables is:

# state 0 in message A
A10 = AES(A00 ^ A1D, C0)
A20 = AES(A10 ^ A2D, C0)
A30 = AES(A20 ^ A3D, C0)
...

# state 1 in message B
B11 = AES(B00 ^ B1D, C1)
B21 = AES(B10 ^ B2D, C1)
B31 = AES(B20 ^ B3D, C1)
...

# etc

Finally, an example with ASCII art again:

       A1D                                     bits
        |                                        |
        v                                        |
A00 -> xor -> aes_round(C0) -> aes_round(C0) --- | --> A10
A01 -> xor -> aes_round(C1) -> aes_round(C1) --- | --> A11
A02 -> xor -> aes_round(C2) -> aes_round(C2) --- v --> A12
A03 -> xor -> aes_round(C3) -> aes_round(C3) -> xor -> A13

So, back to the problem again. I won’t try to solve all four equations. We’ll create a collision for three equations in 2**38 and observe that we can repeat this 2**128 times to get a collision for the fourth block 3. So we start with:

AES(AES(A00 ^ A1D, C0), C0) = AES(AES(B00 ^ B1D, C0), C0)
AES(AES(A01 ^ A1D, C1), C1) = AES(AES(B01 ^ B1D, C1), C1)
AES(AES(A02 ^ A1D, C2), C2) = AES(AES(B02 ^ B1D, C2), C2)

We expand AES round into suboperations - SubBytes, AddRoundKey, MixColumns and ShiftRows, and simplify:

SB(AR(MX(SR(SB(A00 ^ A1D))), C0)) ^ A2D = SB(AR(MX(SR(SB(B00 ^ B1D))), C0)) ^ B2D
SB(AR(MX(SR(SB(A01 ^ A1D))), C1)) ^ A2D = SB(AR(MX(SR(SB(B01 ^ B1D))), C1)) ^ B2D
SB(AR(MX(SR(SB(A02 ^ A1D))), C2)) ^ A2D = SB(AR(MX(SR(SB(B02 ^ B1D))), C2)) ^ B2D

Observe that SR(SB(A00 ^ A1D)) don’t mix bytes in any way. In fact, it’s equivalent to SR(SB(A00)) ^ SR(SB(A1D)). Use this trick to simplify equatoin again (Let A00sr = SR(SB(A00)) etc):

SB(AR(MX(A00sr ^ A1Dsr), C0)) ^ A2D = SB(AR(MX(B00sr ^ B1Dsr), C0)) ^ B2D
SB(AR(MX(A01sr ^ A1Dsr), C1)) ^ A2D = SB(AR(MX(B01sr ^ B1Dsr), C1)) ^ B2D
SB(AR(MX(A02sr ^ A1Dsr), C2)) ^ A2D = SB(AR(MX(B02sr ^ B1Dsr), C2)) ^ B2D

Move variables around, and let d = A2D ^ B2D

d ^ SB(AR(MX(A00sr ^ A1Dsr), C0))= SB(AR(MX(B00sr ^ B1Dsr), C0))
d ^ SB(AR(MX(A01sr ^ A1Dsr), C1))= SB(AR(MX(B01sr ^ B1Dsr), C1))
d ^ SB(AR(MX(A02sr ^ A1Dsr), C2))= SB(AR(MX(B02sr ^ B1Dsr), C2))

Now, introduce helper functions for four constant: let SSB_C0(x) = SB(AR(MX(x), C0))

d ^ SSB_C0(A00sr ^ A1Dsr) = SSB_C0(B00sr ^ B1Dsr)
d ^ SSB_C1(A01sr ^ A1Dsr) = SSB_C1(B01sr ^ B1Dsr)
d ^ SSB_C2(A02sr ^ A1Dsr) = SSB_C2(B02sr ^ B1Dsr)

Why? Because SB(AR(MX(x), C0)) is in a sweet spot, where it’s already hard to analyse mathematically, but there is still only a single MixColumns step - so we can operate on message dwords (32bit chunks) separately. This means we can precompute inverse functions like:

SSB_C0(x) = y
<=>
x ∈ INVSSB_C0(y)  # many inputs may give the same result

This will come in handy.

Now we can just brute-force A1Dsr quickly. After plugging in a specific value for A1D (remember, the first data block) we get:

d0 = SB_C0(B00sr ^ B1Dsr)
d1 = SB_C1(B01sr ^ B1Dsr)
d2 = SB_C2(B02sr ^ B1Dsr)

This means that:

d0 ^ d1 = SB_C0(B00sr ^ B1Dsr) ^ SB_C1(B01sr ^ B1Dsr)

Observe that B00sr and B01sr don’t depend on A1D (or even B1D) . This means that we can define a function:

SB_C0xC1(x) = SB_C0(B00sr ^ x) ^ SB_C1(B01sr ^ x)

And precompute its inverse:

SB_C0xC1(x) = y
<=>
x ∈ INVSB_C0xC1(y)

Going back to our equation, we can find solutions immediately. Pseudo-code:

# solve d0 ^ d1 = SB_C0(B00sr ^ B1Dsr) ^ SB_C1(B01sr ^ B1Dsr)
for B1Dsr in INVSB_C0xC1(d0 ^ d1):
    ...  # do more crypto magic

This takes care of the first two equations. But remember, we need to solve three of them at once:

d0 = SB_C0(B00sr ^ B1Dsr)
d1 = SB_C1(B01sr ^ B1Dsr)
d2 = SB_C2(B02sr ^ B1Dsr)

Sadly, this is as far as clever precomputing gets us. But the good thing is that we can work on 32bit input fragments independently. By picking a random A1D we have 1 / 2*32 chance of satisfying the third equation by pure chance. And when we do it’s over - the attack succeeded.

So the idea is to split a message block into four 32bit parts, find a collision for each part separately, and then combine them into a 128bit block.

The only bad news is that sometimes we won’t find any A1D that works, even after exhaustive checking of all 2**32 uint32 values. In this case, we have to pick other initial states and redo the attack.

That’s really it. I’ve glossed over a few details, but the gist of the attack and the most interesting part is just that - a big precomputed inverse for xor of two AES supersboxes.

Conclusions

Even though I didn’t present a full practical collision, I’m confident StreamHash5 shouldn’t be used in its current form. I see a few ways to improve my attack, and certainly focused work by experts and three-letter agencies would improve it even more.

The usual takeaway is: cryptography is hard - even when you’re a professional cryptographer, rolling your own crypto is risky.


  1. You can read my slides or watch the video. Both are PL-only. ↩︎

  2. https://github.com/mtrojnar/StreamHash5 ↩︎

  3. Can we use the birthday paradox to bring it down to 2**64 somehow? I Don’t know. Maybe? But we don’t control the first three blocks, so it’s not obvious how. ↩︎