The mx3 mix/prng/hash functions

Jon Maiga, 2020-08-10

As I am already completely off track of what I was going to do (b-spline noise) I might just as well round up this little hashing side project with a hash function based on the previous findings.

This is the third post in the series, here are the two earlier the construct of a bit mixer and tuning bit mixers.

EDIT: 2020-10-14: I’ve updated mx3 with newly found constants, Improved mx3 and the RRC test.

Goals

• Not sensitive to the choice of multiplier constant
• Native c++ code (no machine specific instructions for now)
• A counter based pseudo random number generator that passes PractRand $2^{44}$ (>16TB)
• For hashing pass all SMHasher tests

Design ideas

After reviewing the findings in my earlier experiments I decided to try using the mxmxmx construction, hence the name mx3, multiply and xor-shift, repeated three times. The compelling argument to use this is

• It’s the simplest construction I found that is possible to push pass PractRand $>2^{44}$ bytes.
• For hashing, the mxm part alone of the construction looks promising for mixing the data stream. Also in my earlier speed tests it seems efficient.

The choice of multiplier constant

All constants (except deliberate bad baselines c_base, c_found) passed PractRand $2^{44}$ bytes (>16TB) for the mxmxmx construction. This is very reassuring that we aren’t too dependent on finding a needle in a haystack. However it’s noticeable that the random does have worse avalanche behaviour than the others (see xmxmx and mxmxmx in the previous post), so one has to pay some attention in the selection.

To obtain an multiplier I employed the SFFS algorithm starting without any seed. After running for a couple of hours it came back with

C = 0xbea225f9eb34556d


To verify this is a good constant, I ranked it for the mx3 construction against about 25 other constants with respect to avalanche behavior - and sure enough it ranked among the top ones. Finally I verified it had expected PractRand and SMHasher results.

mx3::mix

EDIT 2020-10-14: Updated code with revision 2.

Here is the revision 2 64-bit mixer

inline uint64_t mix(uint64_t x) {
x ^= x >> 32;
x *= 0xbea225f9eb34556d;
x ^= x >> 29;
x *= 0xbea225f9eb34556d;
x ^= x >> 32;
x *= 0xbea225f9eb34556d;
x ^= x >> 29;
return x;
}


Nothing revolutionary here, it uses the familiar multiply and xor-shift as seen in Murmur3 and SplitMix, but with an extra xor-multiplication on beforehand, different xor-shift constants and only one multiplier constant, C.

mx3::random

The PRNG just uses the mixer and feeds it with a counter like this:

class random {
public:
explicit random(uint64_t  seed) : _counter(seed) {}
uint64_t operator()() { return  mix(_counter++); }
private:
uint64_t _counter;
};


This PRNG has a cycle of $2^{64}$.

mx3::hash outline

The hash would look something like this (pseudo code)

uint64_t hash(data,end) {
uint64_t h = 0;
while (data != end)
h += mxm(data++);
return mix(h);
}


For performance reasons, I think we don’t need the full mix for every 8 bytes of data in the stream. Here mxm refers to the first three operations in the mix function. But we needed to make a modification in mxm to make this work, more on this a bit down.

SMHasher is a testsuite designed especially for testing hash functions. I went ahead and cloned a fork by Reini Urban. To get started quickly I used fast-hash as a template to implement the new hash function.

At first, the idea I outlined above did not work as it failed some differential and permutation tests (among other). However, once I started to get a feeling for why the tests failed I it was pretty straightforward to fix. For example in the stream mixer I had to not only sum up the hash but also mix it up bit

while (data != end) {
h += mxm(*data++);
h *= C;
}


One thing that I needed to make the stream mixer mxm a bit stronger. In my previous experiments I had made the observation that the avalanche bias doesn’t imply a good PractRand result and vice-versa. The first xor-constant 33, chosen in mxmxmx was one that favored the PractRand test, while the optimal constant to minimize avalanche bias is 57 (found by SFFS in previous article mxm construction). By exchanging 33 with 57 it passes the BIC test, but be worsen the PractRand result. I was not able to find any reasonable middleground which was very discouraging. Then I realized: why not use both? Here is the modified hash mixer.

uint64_t mxm(uint64_t x) {
x *= C;
x ^= (x >> 57) ^ (x >> 33);
x *= C;
return x;
}


With this it passed all the SMHasher tests (but for performance reasons I don’t use the modification in the mx3::random).

Github

Here is the mx3 code on GitHub.

Earlier posts in series