*Jon Maiga, 2020-08-03*

Earlier I explored the structure of bit mixers and found that the common 'multiply and xor-shift' pattern is a good choice. In that search I used preselected constants from well-known bit mixers. In this post I will try to search for new constants using a simple algorithm from machine learning. For example, consider the following bit mixer that was found:

```
uint64_t xmx(uint64_t x) {
    x ^= x >> 23;
    x *= 0xff51afd7ed558ccdull;
    x ^= x >> 23;
    return x;
}
```

What are the optimal constants? The shift constants are small ($<2^6=64$) while the multiplier is a full 64-bit word. Combinations of those change the mixing and its performance. In the example there are $2^{64}\cdot2^6\cdot2^6=2^{76}$ possible configurations, so already for this simple hash function exhaustive search is out of the question.

To reduce the dimensionality I am going to employ a simple technique from machine learning. In machine learning you sometimes have a lot of features (parameters) and for some reason you need to select a subset. What is a good subset? Given a fitness function it’s possible to find well performing candidates via different algorithms.

In the example above I will consider each bit a 'feature', so we have $76$ features that will either be included ($1$) or excluded ($0$). Now that we have a single array of $76$ bits, we use an algorithm to decide which bits to include. The algorithm has no explicit knowledge that the bit vector consists of three parts (6-6-64 bits) or what it represents.
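To make this concrete, here is a hypothetical sketch (not my actual search code; the real encoding and bit order may well differ) of how a 76-bit feature vector could be decoded back into the two 6-bit shift constants and the 64-bit multiplier:

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical decoding of a 76-bit feature vector into the three constants
// of the xmx mixer: two 6-bit shifts followed by a 64-bit multiplier.
struct XmxConstants {
    uint64_t shift1;     // 6 bits
    uint64_t shift2;     // 6 bits
    uint64_t multiplier; // 64 bits
};

XmxConstants decode(const std::bitset<76>& bits) {
    XmxConstants c{0, 0, 0};
    for (int i = 0; i < 6; ++i)  c.shift1 |= uint64_t(bits[i]) << i;
    for (int i = 0; i < 6; ++i)  c.shift2 |= uint64_t(bits[6 + i]) << i;
    for (int i = 0; i < 64; ++i) c.multiplier |= uint64_t(bits[12 + i]) << i;
    return c;
}
```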

SFFS (sequential floating forward selection) works by starting with no features selected, then testing each feature alone and computing its score. In the forward step we keep the feature with the best score. Then we enter a backtracking loop: we remove one feature at a time and compute a new score for the resulting smaller state (removing the worst feature). If the score is better than the best previously seen at that size, we step back and continue to backtrack; once nothing better is found, we restart the forward loop. This procedure is repeated until all states have stabilized. For a more formal description see, for example, the last slide here.
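The loop above can be condensed into a toy sketch (this is a simplified illustration with a generic fitness callback, not the actual search code; the bookkeeping and termination condition are pared down):

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Simplified SFFS over a vector of binary features; lower fitness is better.
using Bits = std::vector<int>;
using Fitness = std::function<double(const Bits&)>;

Bits sffs(size_t n, const Fitness& fitness, size_t max_features) {
    Bits cur(n, 0);
    std::vector<double> best_at(n + 1, 1e100); // best score seen per feature count
    size_t k = 0;
    while (k < max_features) {
        // forward: add the single feature that gives the best score
        int add_i = -1;
        double add_s = 1e100;
        for (size_t i = 0; i < n; ++i) {
            if (cur[i]) continue;
            cur[i] = 1;
            double s = fitness(cur);
            if (s < add_s) { add_s = s; add_i = int(i); }
            cur[i] = 0;
        }
        if (add_i < 0) break;
        cur[add_i] = 1;
        ++k;
        best_at[k] = std::min(best_at[k], add_s);
        // backward: while removing the worst feature beats the best score
        // previously seen at the smaller size, step back
        while (k > 1) {
            int rm_i = -1;
            double rm_s = 1e100;
            for (size_t i = 0; i < n; ++i) {
                if (!cur[i]) continue;
                cur[i] = 0;
                double s = fitness(cur);
                if (s < rm_s) { rm_s = s; rm_i = int(i); }
                cur[i] = 1;
            }
            if (rm_s >= best_at[k - 1]) break;
            cur[rm_i] = 0;
            --k;
            best_at[k] = rm_s;
        }
    }
    return cur;
}
```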

I modified the algorithm to make it possible to start from a seed; that way we can start with the constants of an existing program and see if they can be tuned.

The process itself is pretty fast; what takes time is computing the fitness, which in our case will be an avalanche score over N samples.

Here is an example, starting with no bits set. In this run it searches for a multiplier constant, so it starts by making it odd (the bit output happens to be big-endian). You can see how it goes back (bwd) and forth (fwd), flipping bits to try to minimize the fitness function.

```
seed@0 0000000000000000000000000000000000000000000000000000000000000000 (5.00803)
fwd: 1: (1): 1000000000000000000000000000000000000000000000000000000000000000 [ 1 ] : 4.77487 vs 1e+08*
fwd: 2: (2): 1000000000000000000001000000000000000000000000000000000000000000 [ 2097153 ] : 3.20621 vs 1e+08*
fwd: 3: (3): 1000000000000001000001000000000000000000000000000000000000000000 [ 2129921 ] : 2.16687 vs 1e+08*
fwd: 4: (4): 1000000000000001000001000000100000000000000000000000000000000000 [ 270565377 ] : 1.59203 vs 1e+08*
bwd: 3: (3): 1000000000000001000001000000000000000000000000000000000000000000 [ 2129921 ] : 2.16519 < 2.16687
fwd: 4: (4): 1000000000000001000001000000100000000000000000000000000000000000 [ 270565377 ] : 1.59278 vs 1.59203
fwd: 5: (5): 1000100000000001000001000000100000000000000000000000000000000000 [ 270565393 ] : 1.28264 vs 1e+08*
fwd: 6: (6): 1000100000000001000001000000100000001000000000000000000000000000 [ 68990042129 ] : 1.07799 vs 1e+08*
bwd: 5: (5): 1000100000000001000000000000100000001000000000000000000000000000 [ 68987944977 ] : 1.24125 < 1.28264
fwd: 6: (6): 1000100000000001000000001000100000001000000000000000000000000000 [ 69004722193 ] : 1.06216 vs 1.07799*
bwd: 5: (5): 1000100000000001000000000000100000001000000000000000000000000000 [ 68987944977 ] : 1.24106 < 1.24125
fwd: 6: (6): 1000100000000001000000001000100000001000000000000000000000000000 [ 69004722193 ] : 1.06407 vs 1.06216
fwd: 7: (7): 1000100000001001000000001000100000001000000000000000000000000000 [ 69004726289 ] : 0.699706 vs 1e+08*
bwd: 6: (6): 1000100000001000000000001000100000001000000000000000000000000000 [ 69004693521 ] : 0.815164 < 1.06216
fwd: 7: (7): 1000100000001000000010001000100000001000000000000000000000000000 [ 69005742097 ] : 0.598486 vs 0.699706*
fwd: 8: (8): 1000100000001010000010001000100000001000000000000000000000000000 [ 69005758481 ] : 0.437632 vs 1e+08*
bwd: 7: (7): 1000100000001000000010001000100000001000000000000000000000000000 [ 69005742097 ] : 0.596503 < 0.598486
fwd: 8: (8): 1000100000001001000010001000100000001000000000000000000000000000 [ 69005774865 ] : 0.43684 vs 0.437632*
fwd: 9: (9): 1000100000001001000010001000100000001001000000000000000000000000 [ 618761588753 ] : 0.33495 vs 1e+08*
bwd: 8: (8): 1000100000001001000000001000100000001001000000000000000000000000 [ 618760540177 ] : 0.349883 < 0.43684
bwd: 7: (7): 1000100000001001000000001000000000001001000000000000000000000000 [ 618492104721 ] : 0.379048 < 0.596503
bwd: 6: (6): 1000100000001000000000001000000000001001000000000000000000000000 [ 618492071953 ] : 0.79068 < 0.815164
bwd: 5: (5): 1000100000001000000000000000000000001001000000000000000000000000 [ 618475294737 ] : 1.18159 < 1.24106
fwd: 6: (6): 1000100000101000000000000000000000001001000000000000000000000000 [ 618475295761 ] : 0.621712 vs 0.79068
...
```

The fitness function is as important as it is hard to specify. After some testing I went with minimizing the maximum bias. For each permutation of the bit array, the mixer is called N times with random numbers, for which avalanche scores are computed. Since the strict avalanche criterion (SAC) is easier to satisfy, it guides the search until the bit independence criterion (BIC) starts to move (it remains at 100% in the beginning). Without spending too much time on this, I set the fitness to minimizing

$f=mean\_bias + max\_bias$

This keeps the mean low and avoids the worst cases, which in turn lowers the standard deviation. The mean is more stable than the max bias, which is perhaps why it seems to help in the searches.
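A minimal sketch of how such a SAC-based fitness could be computed (this version measures only the strict avalanche criterion on a 0-1 scale and uses a placeholder xorshift generator for samples; the real fitness also incorporates the bit independence criterion):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Minimal SAC fitness sketch: for each input bit, count how often each output
// bit flips over n random samples. An unbiased mixer flips every output bit
// 50% of the time; bias is the deviation from that, scaled to [0,1].
// Fitness = mean bias + max bias, as in the text.
template <typename Mixer>
double sac_fitness(Mixer mix, uint64_t n, uint64_t seed) {
    static uint64_t counts[64][64];
    for (int i = 0; i < 64; ++i)
        for (int j = 0; j < 64; ++j) counts[i][j] = 0;
    uint64_t s = seed; // must be nonzero
    auto next = [&]() { // simple xorshift64 as a placeholder sample generator
        s ^= s << 13; s ^= s >> 7; s ^= s << 17; return s;
    };
    for (uint64_t k = 0; k < n; ++k) {
        uint64_t x = next();
        uint64_t h = mix(x);
        for (int i = 0; i < 64; ++i) {
            uint64_t d = h ^ mix(x ^ (1ull << i)); // flip input bit i
            for (int j = 0; j < 64; ++j)
                counts[i][j] += (d >> j) & 1;
        }
    }
    double mean = 0, mx = 0;
    for (int i = 0; i < 64; ++i)
        for (int j = 0; j < 64; ++j) {
            double bias = std::fabs(double(counts[i][j]) / double(n) - 0.5) * 2.0;
            mean += bias / (64.0 * 64.0);
            if (bias > mx) mx = bias;
        }
    return mean + mx;
}
```

With the identity function every output bit either always flips or never flips, so both mean and max bias are 1 and the fitness is exactly 2, while a decent mixer scores far lower.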

To narrow down the scope I will focus on mixers with (m)ultipliers and (x)or-shift-rights. When I write `xmx` I mean xor-shift-right by some constant, multiply by a constant, and then xor-shift-right by some constant (like the example function above).

This experiment will gradually create more complex mixers; the purpose is to see how many operations we need to make a good hash function. By good, I mean something that has low bias and passes at least $2^{40}$ bytes through PractRand.

Just as before I used `RNG_test.exe stdin64 -tf 2 -te 0 -multithreaded -tlmax 1TB -tlmin 1KB` for PractRand, where the mixer is used as a random number generator whose state is increased by one in each step.
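The glue between the mixer and PractRand can be sketched like this (`stream_mixed_counter` is an illustrative name of my own, not part of the original harness); in practice the raw 64-bit words are written to stdout in an endless loop and piped into `RNG_test`:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

uint64_t xmx(uint64_t x) {
    x ^= x >> 23;
    x *= 0xff51afd7ed558ccdull;
    x ^= x >> 23;
    return x;
}

// Writes n mixed counter values as raw 64-bit words to the given stream,
// returning the number of words actually written. For PractRand, out would
// be stdout and the loop would run until the pipe is closed.
size_t stream_mixed_counter(FILE* out, uint64_t start, uint64_t n) {
    size_t written = 0;
    for (uint64_t i = 0; i < n; ++i) {
        uint64_t v = xmx(start + i);
        written += fwrite(&v, sizeof v, 1, out);
    }
    return written;
}
```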

My first question is: if our mixer function consists of a single multiplication, what is the constant that minimizes the bias? First we need an odd constant `c` to make the function bijective, but which `c` is optimal?

```
uint64_t m(uint64_t x) {
    return x * c;
}
```

Intuitively it feels like it should have about as many zeros as ones, distributed randomly. Using this hypothesis, let's start by creating an odd constant with 32 binary ones and 32 zeros, interleaved:

```
c_base=0b0101010101010101010101010101010101010101010101010101010101010101
```

Then we run the avalanche test on `m` (passing 1,000,000 random numbers through it). The function is very biased: the average is around 31% with a worst case of 97% (strict avalanche criterion). The question is whether we can find another constant that is less biased. To give a hint, we can run the SFFS algorithm and additionally seed it with `c_base`. The results are pretty conclusive: if we start with the seed `c_base`, not a single bit is changed by the SFFS algorithm. When starting SFFS at random seeds or 0, no better constant was found, but all end up close to `c_base` in terms of their construction (1's interleaved with 0's).

```
c_found=0b1101010101010101010110100110100110101010101010101010101010101010
```

The average bias is slightly worse (33%) than for our base case, with the same max bias.

Maybe `c_base` is theoretically optimal with respect to the strict avalanche criterion. However, the repeating `10` pattern makes this constant really bad when used in more realistic mixers, as we shall see.

I will spare you most of the details, but after running searches I started to suspect that as long as the multiplier is somewhat random, we are fine in terms of PractRand results. Further, I am not entirely convinced that using different multipliers automatically makes things better, although it certainly feels like it should. What does seem somewhat important is that the least significant bits look decent (not too many repeating 0's or 1's, and not too unbalanced).

Here I found something interesting: it seems those are only semi-dependent on the multiplier constants (which is nice from a search perspective). If you find a better configuration for one multiplier, it's likely to improve all the other multipliers too! In the tables below, I've used a bunch of well-known constants from mixers, `c_base` and `c_found`, and a random constant (literally the first I generated from random.org).

With the hypothesis that the xor constants are fairly independent of the multiplier constants, I fixed the multiplier constant while SFFS looked for good xor-shift constants.

We fix `mc` and look for good `xc1` constants.

```
uint64_t mx(uint64_t x) {
    x *= mc;
    x ^= x >> xc1;
    return x;
}
```

mc | xc1 | sac_max | PR |
---|---|---|---|
c_base | 33 | 34.41 | 10 |
c_found | 32 | 33.47 | 10 |
splitmix_1 | 31 | 42.82 | 14 |
splitmix_2 | 31 | 44.46 | 13 |
murmur_1 | 30 | 39.86 | 12 |
murmur_2 | 32 | 45.41 | 12 |
fasthash | 30 | 45.41 | 14 |
random | 32 | 45.61 | 11 |

SFFS suggests that shifting around 32 is a good choice.

We fix `mc` and look for good `xc1` constants.

```
uint64_t xm(uint64_t x) {
    x ^= x >> xc1;
    x *= mc;
    return x;
}
```

mc | xc1 | sac_max | PR |
---|---|---|---|
c_base | 36 | 93.75 | 10 |
c_found | 36 | 93.75 | 10 |
splitmix_1 | 31 | 93.75 | 12 |
splitmix_2 | 27 | 93.75 | 12 |
murmur_1 | 30 | 93.75 | 10 |
murmur_2 | 32 | 93.75 | 10 |
fasthash | 29 | 93.75 | 10 |
random | 32 | 93.75 | 11 |

Same here: it looks like shifting around 32 is a good choice. `xm` seems weaker than `mx`, both for the avalanche criterion and PractRand-wise.

We fix `mc` and look for good `xc1` constants.

```
uint64_t mxm(uint64_t x) {
    x *= mc;
    x ^= x >> xc1;
    x *= mc;
    return x;
}
```

mc | xc1 | max_sac | PR |
---|---|---|---|
c_base | 59 | 31.25 | 10 |
c_found | 53 | 11.63 | 12 |
splitmix_1 | 57 | 10.91 | 13 |
splitmix_2 | 57 | 5.25 | 14 |
murmur_1 | 57 | 13.97 | 12 |
murmur_2 | 57 | 10.93 | 14 |
fasthash | 57 | 4.13 | 13 |
random | 57 | 4.79 | 14 |

For the `mxm` construction it seems like a high value of `57` is the best choice to keep the avalanche bias down. This is close to the constant that was used in the experiment in my previous article (56).

We fix `mc` and look for good `xc1` and `xc2` constants.

```
uint64_t xmx(uint64_t x) {
    x ^= x >> xc1;
    x *= mc;
    x ^= x >> xc2;
    return x;
}
```

mc | xc1/xc2 | max_sac | PR |
---|---|---|---|
c_base | 21/35 | 13.21 | 10 |
c_found | 17/32 | 10.64 | 10 |
splitmix_1 | 26/27 | 7.41 | 14 |
splitmix_2 | 26/31 | 6.73 | 13 |
murmur_1 | 21/35 | 6.47 | 10 |
murmur_2 | 30/27 | 4.89 | 14 |
fasthash | 26/30 | 6.48 | 14 |
random | 26/32 | 7.27 | 11 |

Here the found constants are a bit lower than in the `mxm` case.

We fix `mc` and look for good `xc1` and `xc2` constants.

```
uint64_t mxmx(uint64_t x) {
    x *= mc;
    x ^= x >> xc1;
    x *= mc;
    x ^= x >> xc2;
    return x;
}
```

mc | xc1/xc2 | max_sac | PR | PR 34/28 | max_sac 34/28 |
---|---|---|---|---|---|
c_base | 47/46 | 6.01 | 10 | 10 | 8.86 |
c_found | 38/35 | 1.57 | 15 | 16 | 1.80 |
splitmix_1 | 39/28 | 1.75 | 27 | 32 | 1.71 |
splitmix_2 | 47/31 | 1.61 | 19 | 31 | 1.72 |
murmur_1 | 32/32 | 2.61 | 15 | 25 | 2.46 |
murmur_2 | 36/32 | 1.77 | 15 | 29 | 1.56 |
fasthash | 53/12 | 1.74 | 18 | 30 | 6.40 |
random | 56/15 | 2.60 | 17 | 21 | 2.14 |

This table shows something I found in my earlier article: measuring the avalanche alone is an insufficient indicator of how random a function is. This was a bit of a downer, since I don't have time to explore a better fitness function. But then I realized something interesting that put me back on track.

In the fifth column I manually found a pair of `xc1` and `xc2` that seemed good with one of the multiplier constants; when I used the same pair on the others, it seemed to improve the PractRand result for all of them without changing the avalanche bias too much! This is good news for two reasons: 1) if true, it allows much quicker iterations when tuning xor constants, and 2) once found, those constants will work everywhere.

We fix `mc` and look for good `xc1` and `xc2` constants.

```
uint64_t xmxm(uint64_t x) {
    x ^= x >> xc1;
    x *= mc;
    x ^= x >> xc2;
    x *= mc;
    return x;
}
```

mc | xc1/xc2 | max_sac | PR |
---|---|---|---|
c_base | 33/33 | 32.32 | 10 |
c_found | 14/50 | 5.00 | 10 |
splitmix_1 | 18/48 | 2.82 | 14 |
splitmix_2 | 26/47 | 2.90 | 14 |
murmur_1 | 12/52 | 2.06 | 14 |
murmur_2 | 36/36 | 1.90 | 14 |
fasthash | 17/51 | 2.71 | 14 |
random | 13/51 | 2.59 | 14 |

I was not able to find any xor constants that pushed any of the configurations above PractRand 14. Setting the constants to around `34/18` made `c_found` reach 14 as well.

Although the xor constants found by SFFS gave good avalanche scores, they did not guarantee good PractRand scores. So from here on I will find xor constants manually, and test the hypothesis that if they improve one multiplier constant, they are likely to improve the others as well.

I continued in the same manner, going through `mxmxm`, `xmxmx`, `xmxmxm` and `mxmxmx`, finding good constants for each. The pattern seemed to hold: you can optimize the xor constants for any multiplier, and if it improves that one, it will improve the others as well. I wish I had time to compose proper tables for this, but my vacation is coming to an end so I need to finish up. I will summarize the findings using less complete tables.

```
uint64_t mxmxm(uint64_t x) {
    x *= mc;
    x ^= x >> 32;
    x *= mc;
    x ^= x >> 33;
    x *= mc;
    return x;
}
```

mc | PR 32/33 | max_sac |
---|---|---|
c_base | 10 | 24.95 |
c_found | 20 | 6.59 |
splitmix_1 | 38 | 2.17 |
splitmix_2 | 36 | 0.63 |
murmur_1 | 36 | 3.54 |
murmur_2 | 39 | 0.61 |
fasthash | 36 | 1.11 |
random | 38 | 2.00 |

For `mxmxm` good xor-shift constants seem to be `33/29`; this will fail PractRand around 40 for most of the constants. However, as we saw in my previous article, the bit independence criterion bias is way too high to make these a competitor to Murmur3 etc.

```
uint64_t xmxmx(uint64_t x) {
    x ^= x >> 34;
    x *= mc;
    x ^= x >> 18;
    x *= mc;
    x ^= x >> 28;
    return x;
}
```

For `xmxmx` (the same construction as Murmur3 and SplitMix) good constants seem to be around `34/18/28` (I admit this can probably be optimized). It takes most of the constants past PractRand 30, and here we can get low bias in the bit independence test.

Note, as a baseline: SplitMix and Murmur3 in their original form fail around PractRand 19 when using a counter-based state (gamma=1).

mc | PR 34/18/28 | max_bic |
---|---|---|
c_base | 10 | 20.5 |
c_found | 17 | 2.05 |
splitmix_1 | 36 | 0.78 |
splitmix_2 | 36 | 1.39 |
murmur_1 | 25 | 1.37 |
murmur_2 | 30 | 1.22 |
fasthash | 30 | 4.18 |
random | 32 | 4.52 |

```
uint64_t mxmxmx(uint64_t x) {
    x *= mc;
    x ^= x >> 33;
    x *= mc;
    x ^= x >> 29;
    x *= mc;
    x ^= x >> 39;
    return x;
}
```

For `mxmxmx` we get really good mixers with low avalanche bias and high 'randomness': we can push it past PractRand >44 (>16TB), though the tests take a lot of time at this level. As a note, the random constant seems to have good PractRand properties, but its max bias for the bit independence criterion is not excellent. The best constants I've found with respect to PractRand are `33/29/39`.

mc | PR 32/28/33 | PR 33/29/39 | max_bic |
---|---|---|---|
c_base | 11 | 11 | 19.32 |
c_found | 30 | 38 | 4.710 |
splitmix_1 | 41 | >44 | 0.378 |
splitmix_2 | 43? | >44 | 0.440 |
murmur_1 | 37 | >44 | 0.390 |
murmur_2 | 42 | >44 | 0.365 |
fasthash | 38 | >44 | 0.394 |
random | 44 | >44 | 3.719 |

To illustrate my point about quicker iterations, I kept the not-so-good constants in the table: `32/28/33` is what I first found when tuning the xor-shift constants for `c_found`, after which I tuned them further so that `c_found` reached 38. Note how all the other constants improved along with it. A really promising construction!

Finally, I found `32/28/33` to work well with `xmxmxm`:

```
uint64_t xmxmxm(uint64_t x) {
    x ^= x >> 32;
    x *= mc;
    x ^= x >> 28;
    x *= mc;
    x ^= x >> 33;
    x *= mc;
    return x;
}
```

mc | PR 32/28/33 | max_bic |
---|---|---|
c_base | 10 | 12.17 |
c_found | 25 | 0.375 |
splitmix_1 | 39 | 0.360 |
splitmix_2 | 39 | 0.378 |
murmur_1 | 40 | 0.372 |
murmur_2 | 44 | 0.348 |
fasthash | 38 | 0.399 |
random | 44 | 0.380 |

This doesn't seem as good as `mxmxmx`; at least I was not able to get past PractRand $2^{44}$.

The following table summarizes the tables above; it displays the lowest bias and best PractRand result from each table. The purpose is to give an idea of how well the different constructions can work.

mixer | xor constants | max_sac | max_bic | max PR $2^{k}$ |
---|---|---|---|---|
m | 31 | 33.47 | 100 | 12 |
mx | 31 | 33.47 | 100 | 14 |
xm | 31 | 93.75 | 100 | 12 |
mxm | 57 | 4.13 | 100 | 14 |
xmx | 32/33 | 4.89 | 100 | 14 |
mxmx | 34/28 | 1.57 | 100 | 32 |
xmxm | 34/18 | 1.90 | 100 | 14 |
mxmxm | 32/33 | 0.61 | 50.12 | 39 |
xmxmx | 34/18/28 | 0.378 | 0.78 | 36 |
mxmxmx | 33/29/39 | 0.0235 | 0.365 | >44 |
xmxmxm | 32/28/33 | 0.0333 | 0.348 | 44 |

It seems that to pass PractRand 40 we need either `mxmxmx` or `xmxmxm`. The common construction found in SplitMix and Murmur3, `xmxmx`, is the first that gets the bit independence bias down; however, I was not able to push it further than PractRand 36. Remember that I've only tested these with a counter state (0,1,2,3,4…), which usually fails earlier than other gammas.

The use of SFFS didn't work out as I hoped, since the fitness function doesn't capture enough (optimizing for avalanche doesn't imply good PractRand results). But during the process I stumbled upon an interesting observation:

It seems like you can tune the xor-shift constants for any multiplier constant, and if doing so improves the PractRand result, it will probably improve it for all the other multiplier constants as well. This would have at least two nice consequences:

- We can perhaps find a single best set of xor-shift constants for a given construction.
- We can deliberately pick 'bad' multiplier constants that fail PractRand early (say at 30) and tweak the xor-shift constants there for quicker iteration (this worked great for me, tweaking for example `c_found`).

With those experiments, I think I have enough intuition to assemble a hash function or PRNG based on the findings.

**Posts in series**

- The construct of a bit mixer
- Tuning bit mixers
- The mx3 mix/prng/hash functions
