Tuning bit mixers

Jon Maiga, 2020-08-03

Earlier I explored the structure of bit mixers and found that the common pattern ‘multiply and xor-shift’ is a good choice. In that search I used preselected constants from well-known bit mixers. In this post I will try to search for new constants using a simple algorithm from machine learning. For example, consider the following bit mixer that was found earlier:

uint64_t xmx(uint64_t x) {
	x ^= x >> 23;
	x *= 0xff51afd7ed558ccdull;
	x ^= x >> 23;
	return x;
}

What are the optimal constants? The shift constants are small ($<2^6=64$) while the multiplier is a full 64-bit word. Combinations of those change the mixing and its performance. In the example there are $2^{64}\cdot2^6\cdot2^6=2^{76}$ possible configurations, so already for this simple hash function exhaustive search is out of the question.

To reduce the dimensionality I am going to employ a simple technique from machine learning. In machine learning you sometimes have a lot of features (parameters) and need to select a subset. What is a good subset? Given a fitness function it is possible to find well-performing candidates via different algorithms.

In the example above I will consider each bit as a ‘feature’, so we have 76 features that will either be included (1) or excluded (0). Now that we have one array of 76 bits we will use an algorithm to decide which bits to include. The algorithm has no explicit knowledge of the fact that the bit vector consists of three parts, 6-6-64 bits, or what they represent.
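To make the setup concrete, here is one way such a 76-bit feature vector could be decoded back into the two shift constants and the multiplier of the xmx mixer above. The layout (shift1, shift2, multiplier, least significant bit first) is my own illustration; the actual search harness may arrange the bits differently.

#include <cstdint>
#include <vector>

struct xmx_params { uint64_t shift1, mul, shift2; };

// decode a 76-entry 0/1 vector into (shift1, shift2, multiplier)
xmx_params decode(const std::vector<int>& bits) {
	xmx_params p{0, 0, 0};
	for (int i = 0; i < 6; ++i)  p.shift1 |= uint64_t(bits[i]) << i;      // first 6 bits
	for (int i = 0; i < 6; ++i)  p.shift2 |= uint64_t(bits[6 + i]) << i;  // next 6 bits
	for (int i = 0; i < 64; ++i) p.mul    |= uint64_t(bits[12 + i]) << i; // remaining 64 bits
	return p;
}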

Sequential Forward Floating Selection

SFFS works by starting with no features selected, then testing each feature alone and computing its score. In the next (forward) step we keep the feature with the best score. Now we enter a backtracking loop, removing one feature at a time and computing a new score for the previous state (removing the worst feature). If the score is better than before, we step back and continue to backtrack until we find nothing better, after which we restart our forward loop. This procedure is repeated until all states have stabilized. For a more formal description see for example the last slide here.

I modified the algorithm to make it possible to start from a seed; that way we can start with the constants of an existing mixer and see if they can be tuned.

The process itself is pretty fast; what takes time is computing the fitness, which in our case will be an avalanche score over N samples.
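For reference, here is a minimal sketch of SFFS over a plain bit vector, in the spirit of the search described above. The fitness callback, the lower-is-better convention and the stopping condition are my own simplifications, not the exact implementation used here.

#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

using bits = std::vector<int>;	// one 0/1 entry per selectable bit

bits sffs(size_t n, const std::function<double(const bits&)>& fitness) {
	bits cur(n, 0);
	// best[k] = best (lowest) fitness seen with exactly k bits selected
	std::vector<double> best(n + 1, std::numeric_limits<double>::max());
	size_t k = 0;
	best[0] = fitness(cur);
	while (k < n) {
		// forward step: add the single bit that minimizes the fitness
		size_t add = n;
		double add_f = std::numeric_limits<double>::max();
		for (size_t i = 0; i < n; ++i) {
			if (cur[i]) continue;
			cur[i] = 1;
			double f = fitness(cur);
			if (f < add_f) { add_f = f; add = i; }
			cur[i] = 0;
		}
		cur[add] = 1; ++k;
		if (add_f < best[k]) best[k] = add_f;
		// backward (floating) steps: drop bits as long as removing the worst
		// one beats the best score previously seen at the smaller size
		while (k > 1) {
			size_t rem = n;
			double rem_f = std::numeric_limits<double>::max();
			for (size_t i = 0; i < n; ++i) {
				if (!cur[i]) continue;
				cur[i] = 0;
				double f = fitness(cur);
				if (f < rem_f) { rem_f = f; rem = i; }
				cur[i] = 1;
			}
			if (rem_f >= best[k - 1]) break;
			cur[rem] = 0; --k;
			best[k] = rem_f;
		}
	}
	return cur;
}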

Here is an example, starting with no bits set. The search is for a multiplier constant, so it starts by making it odd (the bit output happens to be big endian). You can see how it goes back (bwd) and forth (fwd), flipping bits, trying to minimize the fitness function.

seed@0       0000000000000000000000000000000000000000000000000000000000000000 (5.00803)
fwd: 1: (1): 1000000000000000000000000000000000000000000000000000000000000000 [  1 ] : 4.77487 vs 1e+08*
fwd: 2: (2): 1000000000000000000001000000000000000000000000000000000000000000 [  2097153 ] : 3.20621 vs 1e+08*
fwd: 3: (3): 1000000000000001000001000000000000000000000000000000000000000000 [  2129921 ] : 2.16687 vs 1e+08*
fwd: 4: (4): 1000000000000001000001000000100000000000000000000000000000000000 [  270565377 ] : 1.59203 vs 1e+08*
bwd: 3: (3): 1000000000000001000001000000000000000000000000000000000000000000 [  2129921 ] : 2.16519 < 2.16687
fwd: 4: (4): 1000000000000001000001000000100000000000000000000000000000000000 [  270565377 ] : 1.59278 vs 1.59203
fwd: 5: (5): 1000100000000001000001000000100000000000000000000000000000000000 [  270565393 ] : 1.28264 vs 1e+08*
fwd: 6: (6): 1000100000000001000001000000100000001000000000000000000000000000 [  68990042129 ] : 1.07799 vs 1e+08*
bwd: 5: (5): 1000100000000001000000000000100000001000000000000000000000000000 [  68987944977 ] : 1.24125 < 1.28264
fwd: 6: (6): 1000100000000001000000001000100000001000000000000000000000000000 [  69004722193 ] : 1.06216 vs 1.07799*
bwd: 5: (5): 1000100000000001000000000000100000001000000000000000000000000000 [  68987944977 ] : 1.24106 < 1.24125
fwd: 6: (6): 1000100000000001000000001000100000001000000000000000000000000000 [  69004722193 ] : 1.06407 vs 1.06216
fwd: 7: (7): 1000100000001001000000001000100000001000000000000000000000000000 [  69004726289 ] : 0.699706 vs 1e+08*
bwd: 6: (6): 1000100000001000000000001000100000001000000000000000000000000000 [  69004693521 ] : 0.815164 < 1.06216
fwd: 7: (7): 1000100000001000000010001000100000001000000000000000000000000000 [  69005742097 ] : 0.598486 vs 0.699706*
fwd: 8: (8): 1000100000001010000010001000100000001000000000000000000000000000 [  69005758481 ] : 0.437632 vs 1e+08*
bwd: 7: (7): 1000100000001000000010001000100000001000000000000000000000000000 [  69005742097 ] : 0.596503 < 0.598486
fwd: 8: (8): 1000100000001001000010001000100000001000000000000000000000000000 [  69005774865 ] : 0.43684 vs 0.437632*
fwd: 9: (9): 1000100000001001000010001000100000001001000000000000000000000000 [  618761588753 ] : 0.33495 vs 1e+08*
bwd: 8: (8): 1000100000001001000000001000100000001001000000000000000000000000 [  618760540177 ] : 0.349883 < 0.43684
bwd: 7: (7): 1000100000001001000000001000000000001001000000000000000000000000 [  618492104721 ] : 0.379048 < 0.596503
bwd: 6: (6): 1000100000001000000000001000000000001001000000000000000000000000 [  618492071953 ] : 0.79068 < 0.815164
bwd: 5: (5): 1000100000001000000000000000000000001001000000000000000000000000 [  618475294737 ] : 1.18159 < 1.24106
fwd: 6: (6): 1000100000101000000000000000000000001001000000000000000000000000 [  618475295761 ] : 0.621712 vs 0.79068
...

Fitness function

The fitness function is as important as it is hard to specify. After testing a bit I went with trying to minimize the max bias. For each configuration of the bit array, the mixer is called N times with random numbers, for which avalanche scores are computed. Since the strict avalanche criterion (sac) is easier to satisfy it will guide the search until the bit independence criterion (bic) can be computed (which will remain at 100% in the beginning). Without spending too much time on this I set the fitness to minimizing

$f = mean\_bias + max\_bias$

This will keep the mean low and avoid the worst cases, which in turn lowers the standard deviation. The mean is more stable than the max bias, which is perhaps the reason it seems to help in the searches.
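Below is a minimal sketch, in my own simplified form, of such a fitness: it estimates the strict avalanche criterion bias of a mixer over a number of random samples and returns mean_bias + max_bias. The bias definition (percent deviation from the ideal 50% flip probability) is an assumption; the exact measurement used here, and the bic part, may differ.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

double avalanche_fitness(const std::function<uint64_t(uint64_t)>& mix, int samples) {
	std::mt19937_64 rng(1);
	std::vector<int> flips(64 * 64, 0);	// flips[i*64+j]: output bit j flipped when input bit i was flipped
	for (int s = 0; s < samples; ++s) {
		uint64_t x = rng();
		uint64_t h = mix(x);
		for (int i = 0; i < 64; ++i) {
			uint64_t d = h ^ mix(x ^ (1ull << i));	// flip input bit i
			for (int j = 0; j < 64; ++j)
				flips[i * 64 + j] += (d >> j) & 1;
		}
	}
	double mean_bias = 0, max_bias = 0;
	for (int c : flips) {
		double p = double(c) / samples;            // ideally 0.5
		double bias = std::fabs(2 * p - 1) * 100;  // percent deviation from 50%
		mean_bias += bias / (64 * 64);
		max_bias = std::max(max_bias, bias);
	}
	return mean_bias + max_bias;
}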

Experiment

To narrow down the scope I will focus on mixers with (m)ultipliers and (x)or-shift-right. When I write xmx I mean xor-shift-right by some constant, multiply by a constant and then xor-shift-right by some constant (like the example function above).

This experiment gradually creates more complex mixers; the purpose is to see how many operations we need to make a good hash function. By good, I mean something that has a low bias and passes at least $2^{40}$ bytes through PractRand.

Just as before I used RNG_test.exe stdin64 -tf 2 -te 0 -multithreaded -tlmax 1TB -tlmin 1KB for PractRand, with the mixer used as a random number generator whose state is increased by one in each step.
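A minimal sketch of such a generator is shown below; its output is meant to be piped into RNG_test's stdin64 mode. The buffering and the use of the xmx example mixer as the mixer under test are my own choices, not necessarily the harness used for the tables below.

#include <cstdint>
#include <cstdio>

static uint64_t xmx(uint64_t x) {	// the example mixer from the top of the post
	x ^= x >> 23;
	x *= 0xff51afd7ed558ccdull;
	x ^= x >> 23;
	return x;
}

int main() {
	// on Windows, remember to put stdout into binary mode first
	uint64_t state = 0;
	uint64_t buf[1024];
	for (;;) {
		for (auto& v : buf)
			v = xmx(state++);	// counter-based state, increased by one each step
		if (std::fwrite(buf, sizeof(uint64_t), 1024, stdout) != 1024)
			return 0;	// stop when the pipe is closed
	}
}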

Constant for a single multiplication mixer

My first question is: if our mixer function consists of a single multiplication, what is the constant that minimizes the bias? First we need an odd constant c to make the function bijective, but which c is optimal?

uint64_t m(uint64_t x) {
   return x * c;
}

Intuitively it feels like it should have about as many zeros as ones, distributed randomly. Using this hypothesis, let's start by creating an odd constant with 32 binary ones and 32 zeros that interleave each other,

c_base=0b0101010101010101010101010101010101010101010101010101010101010101

Then we run the avalanche test on m (passing 1,000,000 random numbers through it). The function is very biased, with an average around 31% and a worst case of 97% (strict avalanche criterion). The question is whether we can find another constant that is less biased. To give a hint we can run the SFFS algorithm and additionally seed it with c_base. The results are pretty conclusive: if we start with the seed c_base, not a single bit is changed by the SFFS algorithm. When starting SFFS at random seeds or 0, no better constant was found, but all end up close to c_base in regard to their construction (1's interleaved with 0's).

c_found=0b1101010101010101010110100110100110101010101010101010101010101010

The average bias is slightly worse (33%) than our base case with the same max bias.

Maybe c_base is theoretically optimal with respect to the strict avalanche criterion. The repeating 10 pattern makes this constant really bad when used in more realistic mixers as we shall see.

Multiplier constants for more complex mixers

I will spare you most of the details, but after running searches I started to suspect that as long as the multiplier is somewhat random we are fine in terms of PractRand results. Further, I am not entirely convinced that using different multipliers automatically makes things better, although it certainly feels like it should. What seems somewhat important is that the least significant bits look decent (not too many repeating 0's or 1's and not too unbalanced).

The xor-shift constants

Here I found something interesting: it seems like the xor-shift constants are only semi-dependent on the multiplier constants (which is nice from a search perspective). If you find a better configuration for one multiplier, it is likely to improve all the other multipliers too! In the tables below, I've used a bunch of well-known constants from mixers, c_base and c_found, and a random constant (literally the first I generated from random.org).

With the hypothesis that the xor-shift constants are pretty independent of the multiplier constants, I fixed the multiplier constant while SFFS looks for good xor-shift constants.
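As an illustration of how this restriction could look in code, the sketch below fixes mc and only exposes the 6 bits of xc1 to the search. It reuses the hypothetical avalanche_fitness helper sketched in the fitness-function section; the names and the sample count are my own.

#include <cstdint>
#include <vector>

uint64_t decode_shift(const std::vector<int>& b) {	// 6 bits -> shift amount in [0, 63]
	uint64_t s = 0;
	for (int i = 0; i < 6; ++i)
		s |= uint64_t(b[i]) << i;
	return s;
}

double mx_fitness(const std::vector<int>& b, uint64_t mc) {
	uint64_t xc1 = decode_shift(b);
	return avalanche_fitness([=](uint64_t x) {
		x *= mc;
		x ^= x >> xc1;	// note: a shift of 0 is a no-op and should score poorly
		return x;
	}, 100000);
}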

mx

We fix mc and look for good xc1 constants.

uint64_t mx(uint64_t x) {
	x *= mc;
	x ^= x >> xc1;
	return x;
}
mc xc1 max_sac PR
c_base 33 34.41 10
c_found 32 33.47 10
splitmix_1 31 42.82 14
splitmix_2 31 44.46 13
murmur_1 30 39.86 12
murmur_2 32 45.41 12
fasthash 30 45.41 14
random 32 45.61 11

SFFS suggests that shifting around 32 is a good choice.

xm

We fix mc and look for good xc1 constants.

uint64_t xm(uint64_t x) {
	x ^= x >> xc1;
	x *= mc;
	return x;
}
mc xc1 sac_max PR
c_base 36 93.75 10
c_found 36 93.75 10
splitmix_1 31 93.75 12
splitmix_2 27 93.75 12
murmur_1 30 93.75 10
murmur_2 32 93.75 10
fasthash 29 93.75 10
random 32 93.75 11

Same here: it looks like around 32 is a good choice. xm seems weaker than mx, both for the avalanche criterion and PractRand-wise.

mxm

We fix mc and look for good xc1 constants.

uint64_t mxm(uint64_t x) {
	x *= mc;
	x ^= x >> xc1;
	x *= mc;
	return x;
}
mc xc1 max_sac PR
c_base 59 31.25 10
c_found 53 11.63 12
splitmix_1 57 10.91 13
splitmix_2 57 5.25 14
murmur_1 57 13.97 12
murmur_2 57 10.93 14
fasthash 57 4.13 13
random 57 4.79 14

For the mxm construction it seems like a high value of 57 is the best choice to keep the avalanche bias down. This is close to the constant that was used in the experiment in my previous article (56).

xmx

We fix mc and look for good xc1 and xc2 constants.

uint64_t xmx(uint64_t x) {
	x ^= x >> xc1;
	x *= mc;
	x ^= x >> xc2;
	return x;
}
mc xc1/xc2 max_sac PR
c_base 21/35 13.21 10
c_found 17/32 10.64 10
splitmix_1 26/27 7.41 14
splitmix_2 26/31 6.73 13
murmur_1 21/35 6.47 10
murmur_2 30/27 4.89 14
fasthash 26/30 6.48 14
random 26/32 7.27 11

Here the found constants are a bit lower than in the mxm case.

mxmx

We fix mc and look for good xc1 and xc2 constants.

uint64_t mxmx(uint64_t x) {
	x *= mc;
	x ^= x >> xc1;
	x *= mc;
	x ^= x >> xc2;
	return x;
}
mc xc1/xc2 max_sac PR PR (34/28) max_sac (34/28)
c_base 47/46 6.01 10 10 8.86
c_found 38/35 1.57 15 16 1.80
splitmix_1 39/28 1.75 27 32 1.71
splitmix_2 47/31 1.61 19 31 1.72
murmur_1 32/32 2.61 15 25 2.46
murmur_2 36/32 1.77 15 29 1.56
fasthash 53/12 1.74 18 30 6.40
random 56/15 2.60 17 21 2.14

This table confirms something I found in my earlier article: measuring the avalanche alone is an insufficient indicator of how random a function is. This was a bit of a downer, since I don't have time to explore a better fitness function. But then I realised something interesting that put me back on track.

In the fifth column I manually found a pair of xc1 and xc2 that seemed good with one of the multiplier constants; when I used the same pair on the others, it seemed to improve the PractRand result for all of them without changing the avalanche bias too much! This is good news for two reasons: 1) if true, it allows much quicker iterations for tuning the xor-shift constants, and 2) once found, those constants will work everywhere.

xmxm

We fix mc and look for good xc1 and xc2 constants.

uint64_t xmxm(uint64_t x) {
	x ^= x >> xc1;
	x *= mc;
	x ^= x >> xc2;
	x *= mc;
	return x;
}
mc xc1/xc2 max_sac PR
c_base 33/33 32.32 10
c_found 14/50 5.00 10
splitmix_1 18/48 2.82 14
splitmix_2 26/47 2.90 14
murmur_1 12/52 2.06 14
murmur_2 36/36 1.90 14
fasthash 17/51 2.71 14
random 13/51 2.59 14

I was not able to find any xor constants that pushed any of the configurations above PractRand 14. Setting the constants around 34/18 made c_found go to 14 as well.

Although the SFFS-found xor constants gave good avalanche scores, they did not guarantee good PractRand scores. So from here on I will manually find xor constants and test the hypothesis that if they improve one multiplier constant, they are likely to improve the others as well.

I continued in the same manner, going through mxmxm, xmxmx, xmxmxm and mxmxmx, finding good constants for each. The pattern seemed to hold: you can optimize the xor-shift constants for any multiplier constant, and if it improves that one, it will improve the others as well. I wish I had time to compose proper tables for this, but my vacation is coming to an end so I need to finish this up. I will summarize the findings and use less complete tables.

mxmxm

uint64_t mxmxm(uint64_t x) {
	x *= mc;
	x ^= x >> 32;
	x *= mc;
	x ^= x >> 33;
	x *= mc;
	return x;
}
mc PR 32/33 max_sac
c_base 10 24.95
c_found 20 6.59
splitmix_1 38 2.17
splitmix_2 36 0.63
murmur_1 36 3.54
murmur_2 39 0.61
fasthash 36 1.11
random 38 2.00

For mxmxm good xor-shift constants seem to be 33/29; this will fail PractRand around 40 for most of the constants. However, as we saw in my previous article, the bit independence criterion bias is way too high to make those a competitor to MurMur3 etc.

xmxmx

uint64_t xmxmx(uint64_t x) {
	x ^= x >> 34;
	x *= mc;
	x ^= x >> 18;
	x *= mc;
	x ^= x >> 28;    	
	return x;
}

For xmxmx (the same construction as MurMur3 and SplitMix) good constants seem to be around 34/18/28 (I admit that this can probably be optimized). It will take most of the constants over PractRand 30, and here we can get low bias in the bit independence test.

Note, as a baseline, that SplitMix and MurMur3 in their original form fail around PractRand 19 when using a counter-based state (gamma=1).

mc PR 34/18/28 max_bic
c_base 10 20.5
c_found 17 2.05
splitmix_1 36 0.78
splitmix_2 36 1.39
murmur_1 25 1.37
murmur_2 30 1.22
fasthash 30 4.18
random 32 4.52

mxmxmx

uint64_t mxmxmx(uint64_t x) {
	x *= mc;    
	x ^= x >> 33;
	x *= mc;
	x ^= x >> 29;
	x *= mc;
	x ^= x >> 39;    	
	return x;
}

For mxmxmx we get really good mixers with low avalanche bias and high ‘randomness’, as we can push it past PractRand 44 (>16 TB); it takes a lot of time to run the tests at this level. As a note, the random constant seems to have good PractRand properties, but its max bias for the bit independence criterion is not excellent.

The best constants I’ve found are 33/29/39 with respect to PractRand.

mc PR 32/28/33 PR 33/29/39 max_bic
c_base 11 11 19.32
c_found 30 38 4.710
splitmix_1 41 >44 0.378
splitmix_2 43? >44 0.440
murmur_1 37 >44 0.390
murmur_2 42 >44 0.365
fasthash 38 >44 0.394
random 44 >44 3.719

To illustrate my point about quicker iterations, I kept the not-so-good constants 32/28/33, which I first found when tuning the xor-shift constants for c_found; I then tuned them further until c_found reached 38. Note how all the other multiplier constants improved as well. A really promising construction!
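For concreteness, here is the mxmxmx construction with the 33/29/39 shifts instantiated with a specific multiplier. I am assuming that murmur_1 refers to the constant 0xff51afd7ed558ccd used in the xmx example at the top of the post; the constant values behind the names are not spelled out above.

uint64_t mxmxmx_murmur1(uint64_t x) {
	const uint64_t mc = 0xff51afd7ed558ccdull;	// assumed value of murmur_1
	x *= mc;
	x ^= x >> 33;
	x *= mc;
	x ^= x >> 29;
	x *= mc;
	x ^= x >> 39;
	return x;
}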

xmxmxm

Finally, I found 32/28/33 to work well with xmxmxm:

uint64_t xmxmxm(uint64_t x) { 
	x ^= x >> 32;
	x *= mc;
	x ^= x >> 28;
	x *= mc;
	x ^= x >> 33;
	x *= mc;
	return x;
}
mc PR 32/28/33 max_bic
c_base 10 12.17
c_found 25 0.375
splitmix_1 39 0.360
splitmix_2 39 0.378
murmur_1 40 0.372
murmur_2 44 0.348
fasthash 38 0.399
random 44 0.380

It doesn't seem as good as mxmxmx; at least I was not able to get past PractRand $2^{44}$.

Conclusion

The following table summarizes the tables from above; it displays the lowest bias and best PractRand result for each construction. The purpose is to provide an idea of how well the different constructions can work.

mixer xor constants max_sac max_bic max PR $2^k$
m 31 33.47 100 12
mx 31 33.47 100 14
xm 31 93.75 100 12
mxm 57 4.13 100 14
xmx 32/33 4.89 100 14
mxmx 34/28 1.57 100 32
xmxm 34/18 1.90 100 14
mxmxm 32/33 0.61 50.12 39
xmxmx 34/18/28 0.378 0.78 36
mxmxmx 33/29/39 0.0235 0.365 >44
xmxmxm 32/28/33 0.0333 0.348 44

It seems that to pass PractRand 40 we need either mxmxmx or xmxmxm. The common construction found in SplitMix and Murmur3, xmxmx, is the first good one that gets the bit independence bias down; however, I was not able to push it further than PractRand 36. Remember that I've only tested those with a counter state (0,1,2,3,4…), which usually fails earlier than other gammas.

The use of SFFS didn't work out as I had hoped, since the fitness function doesn't capture enough (optimizing for avalanche doesn't imply good PractRand results). But during the process I stumbled onto an interesting observation:

It seems like you can tune the xor-shift constants for any multiplier constant, and if it improves the PractRand result it will probably improve it for all the other multiplier constants. This would have at least two nice consequences: much quicker iterations when tuning the xor-shift constants, and constants that, once found, work for any multiplier.

With these experiments, I think I have enough intuition to assemble a hash function or PRNG based on the findings.

Posts in series

  1. The construct of a bit mixer
  2. Tuning bit mixers
  3. The mx3 mix/prng/hash functions

Written with StackEdit.