Tuning bit mixers

Jon Maiga, 2020-08-03

Earlier I explored the structure of bit mixers and found that the common pattern ‘multiply and xor-shift’ is a good choice. In that search I used preselected constants from well-known bit mixers. In this post I will try to search for new constants using a simple algorithm from machine learning. For example, consider the following bit mixer that was found earlier:

uint64_t xmx(uint64_t x) {
	x ^= x >> 23;
	x *= 0xff51afd7ed558ccdull;
	x ^= x >> 23;
	return x;
}

What are the optimal constants? The shift constants are small ($<2^6=64$) while the multiplier is a full 64-bit word. Combinations of those change the mixing and its performance. In the example there are $2^{64}\cdot2^6\cdot2^6=2^{76}$ possible configurations, so already for this simple hash function exhaustive search is out of the question.

To reduce the dimensionality I am going to employ a simple technique from machine learning. In machine learning you sometimes have a lot of features (parameters) and need to select a subset. What is a good subset? Given a fitness function it is possible to find well-performing candidates via different algorithms.

In the example above I will consider each bit as a ‘feature’, so we have 76 features that will either be included (1) or excluded (0). Now that we have one array of 76 bits we will use an algorithm to decide which bits to include. The algorithm has no explicit knowledge of the fact that the bit vector consists of three parts, 6-6-64 bits, or what they represent.
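To make the setup concrete, here is one way such a 76-bit feature vector could be decoded back into the two shift constants and the multiplier of the xmx mixer above. The layout (shift1, shift2, multiplier, least significant bit first) is my own illustration; the actual search harness may arrange the bits differently.

#include <cstdint>
#include <vector>

struct xmx_params { uint64_t shift1, mul, shift2; };

// decode a 76-entry 0/1 vector into (shift1, shift2, multiplier)
xmx_params decode(const std::vector<int>& bits) {
	xmx_params p{0, 0, 0};
	for (int i = 0; i < 6; ++i)  p.shift1 |= uint64_t(bits[i]) << i;      // first 6 bits
	for (int i = 0; i < 6; ++i)  p.shift2 |= uint64_t(bits[6 + i]) << i;  // next 6 bits
	for (int i = 0; i < 64; ++i) p.mul    |= uint64_t(bits[12 + i]) << i; // remaining 64 bits
	return p;
}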

Sequential Forward Floating Selection

SFFS works by starting with no features selected, then testing each feature alone and computing its score. In the next (forward) step we keep the feature with the best score. Now we enter a backtracking loop, removing one feature at a time and computing a new score for the previous state (removing the worst feature). If the score is better than before, we step back and continue to backtrack until we find nothing better, after which we restart our forward loop. This procedure is repeated until all states have stabilized. For a more formal description see for example the last slide here.

I modified the algorithm to make it possible to start from a seed; that way we can start with the constants of an existing mixer and see if they can be tuned.

The process itself is pretty fast; what takes time is computing the fitness, which in our case will be an avalanche score over N samples.
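For reference, here is a minimal sketch of SFFS over a plain bit vector, in the spirit of the search described above. The fitness callback, the lower-is-better convention and the stopping condition are my own simplifications, not the exact implementation used here.

#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

using bits = std::vector<int>;	// one 0/1 entry per selectable bit

bits sffs(size_t n, const std::function<double(const bits&)>& fitness) {
	bits cur(n, 0);
	// best[k] = best (lowest) fitness seen with exactly k bits selected
	std::vector<double> best(n + 1, std::numeric_limits<double>::max());
	size_t k = 0;
	best[0] = fitness(cur);
	while (k < n) {
		// forward step: add the single bit that minimizes the fitness
		size_t add = n;
		double add_f = std::numeric_limits<double>::max();
		for (size_t i = 0; i < n; ++i) {
			if (cur[i]) continue;
			cur[i] = 1;
			double f = fitness(cur);
			if (f < add_f) { add_f = f; add = i; }
			cur[i] = 0;
		}
		cur[add] = 1; ++k;
		if (add_f < best[k]) best[k] = add_f;
		// backward (floating) steps: drop bits as long as removing the worst
		// one beats the best score previously seen at the smaller size
		while (k > 1) {
			size_t rem = n;
			double rem_f = std::numeric_limits<double>::max();
			for (size_t i = 0; i < n; ++i) {
				if (!cur[i]) continue;
				cur[i] = 0;
				double f = fitness(cur);
				if (f < rem_f) { rem_f = f; rem = i; }
				cur[i] = 1;
			}
			if (rem_f >= best[k - 1]) break;
			cur[rem] = 0; --k;
			best[k] = rem_f;
		}
	}
	return cur;
}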

Here is an example, starting with no bits set. The search is for a multiplier constant, so it starts by making it odd (the bit output happens to be big endian). You can see how it goes back (bwd) and forth (fwd), flipping bits, trying to minimize the fitness function.

seed@0       0000000000000000000000000000000000000000000000000000000000000000 (5.00803)
fwd: 1: (1): 1000000000000000000000000000000000000000000000000000000000000000 [  1 ] : 4.77487 vs 1e+08*
fwd: 2: (2): 1000000000000000000001000000000000000000000000000000000000000000 [  2097153 ] : 3.20621 vs 1e+08*
fwd: 3: (3): 1000000000000001000001000000000000000000000000000000000000000000 [  2129921 ] : 2.16687 vs 1e+08*
fwd: 4: (4): 1000000000000001000001000000100000000000000000000000000000000000 [  270565377 ] : 1.59203 vs 1e+08*
bwd: 3: (3): 1000000000000001000001000000000000000000000000000000000000000000 [  2129921 ] : 2.16519 < 2.16687
fwd: 4: (4): 1000000000000001000001000000100000000000000000000000000000000000 [  270565377 ] : 1.59278 vs 1.59203
fwd: 5: (5): 1000100000000001000001000000100000000000000000000000000000000000 [  270565393 ] : 1.28264 vs 1e+08*
fwd: 6: (6): 1000100000000001000001000000100000001000000000000000000000000000 [  68990042129 ] : 1.07799 vs 1e+08*
bwd: 5: (5): 1000100000000001000000000000100000001000000000000000000000000000 [  68987944977 ] : 1.24125 < 1.28264
fwd: 6: (6): 1000100000000001000000001000100000001000000000000000000000000000 [  69004722193 ] : 1.06216 vs 1.07799*
bwd: 5: (5): 1000100000000001000000000000100000001000000000000000000000000000 [  68987944977 ] : 1.24106 < 1.24125
fwd: 6: (6): 1000100000000001000000001000100000001000000000000000000000000000 [  69004722193 ] : 1.06407 vs 1.06216
fwd: 7: (7): 1000100000001001000000001000100000001000000000000000000000000000 [  69004726289 ] : 0.699706 vs 1e+08*
bwd: 6: (6): 1000100000001000000000001000100000001000000000000000000000000000 [  69004693521 ] : 0.815164 < 1.06216
fwd: 7: (7): 1000100000001000000010001000100000001000000000000000000000000000 [  69005742097 ] : 0.598486 vs 0.699706*
fwd: 8: (8): 1000100000001010000010001000100000001000000000000000000000000000 [  69005758481 ] : 0.437632 vs 1e+08*
bwd: 7: (7): 1000100000001000000010001000100000001000000000000000000000000000 [  69005742097 ] : 0.596503 < 0.598486
fwd: 8: (8): 1000100000001001000010001000100000001000000000000000000000000000 [  69005774865 ] : 0.43684 vs 0.437632*
fwd: 9: (9): 1000100000001001000010001000100000001001000000000000000000000000 [  618761588753 ] : 0.33495 vs 1e+08*
bwd: 8: (8): 1000100000001001000000001000100000001001000000000000000000000000 [  618760540177 ] : 0.349883 < 0.43684
bwd: 7: (7): 1000100000001001000000001000000000001001000000000000000000000000 [  618492104721 ] : 0.379048 < 0.596503
bwd: 6: (6): 1000100000001000000000001000000000001001000000000000000000000000 [  618492071953 ] : 0.79068 < 0.815164
bwd: 5: (5): 1000100000001000000000000000000000001001000000000000000000000000 [  618475294737 ] : 1.18159 < 1.24106
fwd: 6: (6): 1000100000101000000000000000000000001001000000000000000000000000 [  618475295761 ] : 0.621712 vs 0.79068
...

Fitness function

The fitness function is as important as it is hard to specify. After testing a bit I went with trying to minimize the max bias. For each configuration of the bit array, the mixer is called N times with random numbers, for which avalanche scores are computed. Since the strict avalanche criterion (sac) is easier to satisfy it will guide the search until the bit independence criterion (bic) can be computed (which will remain at 100% in the beginning). Without spending too much time on this I set the fitness to minimizing

$f = mean\_bias + max\_bias$

This will keep the mean low and avoid the worst cases, which in turn lowers the standard deviation. The mean is more stable than the max bias, which is perhaps the reason it seems to help in the searches.
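Below is a minimal sketch, in my own simplified form, of such a fitness: it estimates the strict avalanche criterion bias of a mixer over a number of random samples and returns mean_bias + max_bias. The bias definition (percent deviation from the ideal 50% flip probability) is an assumption; the exact measurement used here, and the bic part, may differ.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

double avalanche_fitness(const std::function<uint64_t(uint64_t)>& mix, int samples) {
	std::mt19937_64 rng(1);
	std::vector<int> flips(64 * 64, 0);	// flips[i*64+j]: output bit j flipped when input bit i was flipped
	for (int s = 0; s < samples; ++s) {
		uint64_t x = rng();
		uint64_t h = mix(x);
		for (int i = 0; i < 64; ++i) {
			uint64_t d = h ^ mix(x ^ (1ull << i));	// flip input bit i
			for (int j = 0; j < 64; ++j)
				flips[i * 64 + j] += (d >> j) & 1;
		}
	}
	double mean_bias = 0, max_bias = 0;
	for (int c : flips) {
		double p = double(c) / samples;            // ideally 0.5
		double bias = std::fabs(2 * p - 1) * 100;  // percent deviation from 50%
		mean_bias += bias / (64 * 64);
		max_bias = std::max(max_bias, bias);
	}
	return mean_bias + max_bias;
}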

Experiment

To narrow down the scope I will focus on mixers with (m)ultipliers and (x)or-shift-right. When I write xmx I mean xor-shift-right by some constant, multiply by a constant and then xor-shift-right by some constant (like the example function above).

This experiment gradually creates more complex mixers; the purpose is to see how many operations we need to make a good hash function. By good, I mean something that has a low bias and passes at least $2^{40}$ bytes through PractRand.

Just as before I used RNG_test.exe stdin64 -tf 2 -te 0 -multithreaded -tlmax 1TB -tlmin 1KB for PractRand, with the mixer used as a random number generator whose state is increased by one in each step.
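A minimal sketch of such a generator is shown below; its output is meant to be piped into RNG_test's stdin64 mode. The buffering and the use of the xmx example mixer as the mixer under test are my own choices, not necessarily the harness used for the tables below.

#include <cstdint>
#include <cstdio>

static uint64_t xmx(uint64_t x) {	// the example mixer from the top of the post
	x ^= x >> 23;
	x *= 0xff51afd7ed558ccdull;
	x ^= x >> 23;
	return x;
}

int main() {
	// on Windows, remember to put stdout into binary mode first
	uint64_t state = 0;
	uint64_t buf[1024];
	for (;;) {
		for (auto& v : buf)
			v = xmx(state++);	// counter-based state, increased by one each step
		if (std::fwrite(buf, sizeof(uint64_t), 1024, stdout) != 1024)
			return 0;	// stop when the pipe is closed
	}
}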

Constant for a single multiplication mixer

My first question is: if our mixer function consists of a single multiplication, what is the constant that minimizes the bias? First we need an odd constant c to make the function bijective, but which c is optimal?

uint64_t m(uint64_t x) {
   return x * c;
}

Intuitively it feels like it should have about as many zeros as ones, distributed randomly. Using this hypothesis, let's start by creating an odd constant with 32 binary ones and 32 zeros that interleave each other,

c_base=0b0101010101010101010101010101010101010101010101010101010101010101

Then we run the avalanche test on m (passing 1,000,000 random numbers through it). The function is very biased, with an average around 31% and a worst case of 97% (strict avalanche criterion). The question is whether we can find another constant that is less biased. To give a hint we can run the SFFS algorithm and additionally seed it with c_base. The results are pretty conclusive: if we start with the seed c_base, not a single bit is changed by the SFFS algorithm. When starting SFFS at random seeds or 0, no better constant was found, but all end up close to c_base in regard to their construction (1's interleaved with 0's).

c_found=0b1101010101010101010110100110100110101010101010101010101010101010

The average bias is slightly worse (33%) than our base case with the same max bias.

Maybe c_base is theoretically optimal with respect to the strict avalanche criterion. The repeating 10 pattern makes this constant really bad when used in more realistic mixers as we shall see.

Multiplier constants for more complex mixers

I will spare you most of the details, but after running searches I started to suspect that as long as the multiplier is somewhat random we are fine in terms of PractRand results. Further, I am not entirely convinced that using different multipliers automatically makes things better, although it certainly feels like it should. What seems somewhat important is that the least significant bits look decent (not too many repeating 0's or 1's and not too unbalanced).

The xor-shift constants

Here I found something interesting: it seems like the xor-shift constants are only semi-dependent on the multiplier constants (which is nice from a search perspective). If you find a better configuration for one multiplier, it is likely to improve all the other multipliers too! In the tables below, I've used a bunch of well-known constants from mixers, c_base and c_found, and a random constant (literally the first I generated from random.org).

With the hypothesis that the xor-shift constants are pretty independent of the multiplier constants, I fixed the multiplier constant while SFFS looks for good xor-shift constants.
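As an illustration of how this restriction could look in code, the sketch below fixes mc and only exposes the 6 bits of xc1 to the search. It reuses the hypothetical avalanche_fitness helper sketched in the fitness-function section; the names and the sample count are my own.

#include <cstdint>
#include <vector>

uint64_t decode_shift(const std::vector<int>& b) {	// 6 bits -> shift amount in [0, 63]
	uint64_t s = 0;
	for (int i = 0; i < 6; ++i)
		s |= uint64_t(b[i]) << i;
	return s;
}

double mx_fitness(const std::vector<int>& b, uint64_t mc) {
	uint64_t xc1 = decode_shift(b);
	return avalanche_fitness([=](uint64_t x) {
		x *= mc;
		x ^= x >> xc1;	// note: a shift of 0 is a no-op and should score poorly
		return x;
	}, 100000);
}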

mx

We fix mc and look for good xc1 constants.

uint64_t mx(uint64_t x) {
	x *= mc;
	x ^= x >> xc1;
	return x;
}
mc xc1 max_sac PR
c_base 33 34.41 10
c_found 32 33.47 10
splitmix_1 31 42.82 14
splitmix_2 31 44.46 13
murmur_1 30 39.86 12
murmur_2 32 45.41 12
fasthash 30 45.41 14
random 32 45.61 11

SFFS suggests that shifting around 32 is a good choice.

xm

We fix mc and look for good xc1 constants.

uint64_t xm(uint64_t x) {
	x ^= x >> xc1;
	x *= mc;
	return x;
}
mc xc1 sac_max PR
c_base 36 93.75 10
c_found 36 93.75 10
splitmix_1 31 93.75 12
splitmix_2 27 93.75 12
murmur_1 30 93.75 10
murmur_2 32 93.75 10
fasthash 29 93.75 10
random 32 93.75 11

Same here: it looks like around 32 is a good choice. xm seems weaker than mx, both for the avalanche criterion and PractRand-wise.

mxm

We fix mc and look for good xc1 constants.

uint64_t mxm(uint64_t x) {
	x *= mc;
	x ^= x >> xc1;
	x *= mc;
	return x;
}
mc xc1 max_sac PR
c_base 59 31.25 10
c_found 53 11.63 12
splitmix_1 57 10.91 13
splitmix_2 57 5.25 14
murmur_1 57 13.97 12
murmur_2 57 10.93 14
fasthash 57 4.13 13
random 57 4.79 14

For the mxm construction it seems like a high value of 57 is the best choice to keep the avalanche bias down. This is close to the constant that was used in the experiment in my previous article (56).

xmx

We fix mc and look for good xc1 and xc2 constants.

uint64_t xmx(uint64_t x) {
	x ^= x >> xc1;
	x *= mc;
	x ^= x >> xc2;
	return x;
}
mc xc1/xc2 max_sac PR
c_base 21/35 13.21 10
c_found 17/32 10.64 10
splitmix_1 26/27 7.41 14
splitmix_2 26/31 6.73 13
murmur_1 21/35 6.47 10
murmur_2 30/27 4.89 14
fasthash 26/30 6.48 14
random 26/32 7.27 11

Here the found constants are a bit lower than in the mxm case.

mxmx

We fix mc and look for good xc1 and xc2 constants.

uint64_t mxmx(uint64_t x) {
	x *= mc;
	x ^= x >> xc1;
	x *= mc;
	x ^= x >> xc2;
	return x;
}
mc xc1/xc2 max_sac PR PR (34/28) max_sac (34/28)
c_base 47/46 6.01 10 10 8.86
c_found 38/35 1.57 15 16 1.80
splitmix_1 39/28 1.75 27 32 1.71
splitmix_2 47/31 1.61 19 31 1.72
murmur_1 32/32 2.61 15 25 2.46
murmur_2 36/32 1.77 15 29 1.56
fasthash 53/12 1.74 18 30 6.40
random 56/15 2.60 17 21 2.14

This table confirms something I found in my earlier article: measuring the avalanche alone is an insufficient indicator of how random a function is. This was a bit of a downer, since I don't have time to explore a better fitness function. But then I realised something interesting that put me back on track.

In the fifth column I manually found a pair of xc1 and xc2 that seemed good with one of the multiplier constants; when I used the same pair on the others, it seemed to improve the PractRand result for all of them without changing the avalanche bias too much! This is good news for two reasons: 1) if true, it allows much quicker iterations for tuning the xor-shift constants, and 2) once found, those constants will work everywhere.

xmxm

We fix mc and look for good xc1 and xc2 constants.

uint64_t xmxm(uint64_t x) {
	x ^= x >> xc1;
	x *= mc;
	x ^= x >> xc2;
	x *= mc;
	return x;
}
mc xc1/xc2 max_sac PR
c_base 33/33 32.32 10
c_found 14/50 5.00 10
splitmix_1 18/48 2.82 14
splitmix_2 26/47 2.90 14
murmur_1 12/52 2.06 14
murmur_2 36/36 1.90 14
fasthash 17/51 2.71 14
random 13/51 2.59 14

I was not able to find any xor constants that pushed any of the configurations above PractRand 14. Setting the constants around 34/18 made c_found go to 14 as well.

Although the SFFS-found xor constants gave good avalanche scores, they did not guarantee good PractRand scores. So from here on I will manually find xor constants and test the hypothesis that if they improve one multiplier constant, they are likely to improve the others as well.

I continued in the same manner, going through mxmxm, xmxmx, xmxmxm and mxmxmx, finding good constants for each. The pattern seemed to hold: you can optimize the xor-shift constants for any multiplier constant, and if it improves that one, it will improve the others as well. I wish I had time to compose proper tables for this, but my vacation is coming to an end so I need to finish this up. I will summarize the findings and use less complete tables.

mxmxm

uint64_t mxmxm(uint64_t x) {
	x *= mc;
	x ^= x >> 32;
	x *= mc;
	x ^= x >> 33;
	x *= mc;
	return x;
}
mc PR 32/33 max_sac
c_base 10 24.95
c_found 20 6.59
splitmix_1 38 2.17
splitmix_2 36 0.63
murmur_1 36 3.54
murmur_2 39 0.61
fasthash 36 1.11
random 38 2.00

For mxmxm good xor-shift constants seem to be 33/29; this will fail PractRand around 40 for most of the constants. However, as we saw in my previous article, the bit independence criterion bias is way too high to make those a competitor to MurMur3 etc.

xmxmx

uint64_t xmxmx(uint64_t x) {
	x ^= x >> 34;
	x *= mc;
	x ^= x >> 18;
	x *= mc;
	x ^= x >> 28;    	
	return x;
}

For xmxmx (the same construction as MurMur3 and SplitMix) good constants seem to be around 34/18/28 (I admit that this can probably be optimized). It will take most of the constants over PractRand 30, and here we can get low bias in the bit independence test.

Note, as a baseline, that SplitMix and MurMur3 in their original form fail around PractRand 19 when using a counter-based state (gamma=1).

mc PR 34/18/28 max_bic
c_base 10 20.5
c_found 17 2.05
splitmix_1 36 0.78
splitmix_2 36 1.39
murmur_1 25 1.37
murmur_2 30 1.22
fasthash 30 4.18
random 32 4.52

mxmxmx

uint64_t mxmxmx(uint64_t x) {
	x *= mc;    
	x ^= x >> 33;
	x *= mc;
	x ^= x >> 29;
	x *= mc;
	x ^= x >> 39;    	
	return x;
}

For mxmxmx we get really good mixers with low avalanche bias and high ‘randomness’, as we can push it past PractRand 44 (>16 TB); it takes a lot of time to run the tests at this level. As a note, the random constant seems to have good PractRand properties, but its max bias for the bit independence criterion is not excellent.

The best constants I’ve found are 33/29/39 with respect to PractRand.

mc PR 32/28/33 PR 33/29/39 max_bic
c_base 11 11 19.32
c_found 30 38 4.710
splitmix_1 41 >44 0.378
splitmix_2 43? >44 0.440
murmur_1 37 >44 0.390
murmur_2 42 >44 0.365
fasthash 38 >44 0.394
random 44 >44 3.719

To illustrate my point about quicker iterations, I kept the not-so-good constants 32/28/33, which I first found when tuning the xor-shift constants for c_found; I then tuned them further until c_found reached 38. Note how all the other multiplier constants improved as well. A really promising construction!
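For concreteness, here is the mxmxmx construction with the 33/29/39 shifts instantiated with a specific multiplier. I am assuming that murmur_1 refers to the constant 0xff51afd7ed558ccd used in the xmx example at the top of the post; the constant values behind the names are not spelled out above.

uint64_t mxmxmx_murmur1(uint64_t x) {
	const uint64_t mc = 0xff51afd7ed558ccdull;	// assumed value of murmur_1
	x *= mc;
	x ^= x >> 33;
	x *= mc;
	x ^= x >> 29;
	x *= mc;
	x ^= x >> 39;
	return x;
}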

xmxmxm

Finally, I found 32/28/33 to work well with xmxmxm:

uint64_t xmxmxm(uint64_t x) { 
	x ^= x >> 32;
	x *= mc;
	x ^= x >> 28;
	x *= mc;
	x ^= x >> 33;
	x *= mc;
	return x;
}
mc PR 32/28/33 max_bic
c_base 10 12.17
c_found 25 0.375
splitmix_1 39 0.360
splitmix_2 39 0.378
murmur_1 40 0.372
murmur_2 44 0.348
fasthash 38 0.399
random 44 0.380

It doesn't seem as good as mxmxmx; at least I was not able to get past PractRand $2^{44}$.

Conclusion

The following table summarizes the tables from above; it displays the lowest bias and best PractRand result for each construction. The purpose is to provide an idea of how well the different constructions can work.

mixer xor constants max_sac max_bic max PR $2^k$
m 31 33.47 100 12
mx 31 33.47 100 14
xm 31 93.75 100 12
mxm 57 4.13 100 14
xmx 32/33 4.89 100 14
mxmx 34/28 1.57 100 32
xmxm 34/18 1.90 100 14
mxmxm 32/33 0.61 50.12 39
xmxmx 34/18/28 0.378 0.78 36
mxmxmx 33/29/39 0.0235 0.365 >44
xmxmxm 32/28/33 0.0333 0.348 44

It seems that to pass PractRand 40 we need either mxmxmx or xmxmxm. The common construction found in SplitMix and Murmur3, xmxmx, is the first good one that gets the bit independence bias down; however, I was not able to push it further than PractRand 36. Remember that I've only tested those with a counter state (0,1,2,3,4…), which usually fails earlier than other gammas.

The use of SFFS didn't work out as I had hoped, since the fitness function doesn't capture enough (optimizing for avalanche doesn't imply good PractRand results). But during the process I stumbled onto an interesting observation:

It seems like you can tune the xor-shift constants for any multiplier constant, and if it improves the PractRand result it will probably improve it for all the other multiplier constants. This would have at least two nice consequences: much quicker iterations when tuning the xor-shift constants, and constants that, once found, work for any multiplier.

With these experiments, I think I have enough intuition to assemble a hash function or PRNG based on the findings.

Posts in series

  1. The construct of a bit mixer
  2. Tuning bit mixers
  3. The mx3 mix/prng/hash functions

Written with StackEdit.