Jon Maiga, 2020-08-10
As I am already completely off track of what I was going to do (b-spline noise) I might just as well round up this little hashing side project with a hash function based on the previous findings.
This is the third post in the series, here are the two earlier the construct of a bit mixer and tuning bit mixers.
EDIT: 2020-10-14: I’ve updated mx3 with newly found constants, Improved mx3 and the RRC test.
After reviewing the findings in my earlier experiments I decided to try using the mxmxmx
construction, hence the name mx3
, multiply and xor-shift, repeated three times. The compelling argument to use this is
mxm
part alone of the construction looks promising for mixing the data stream. Also in my earlier speed tests it seems efficient.All constants (except deliberate bad baselines c_base
, c_found
) passed PractRand bytes (>16TB) for the mxmxmx
construction. This is very reassuring that we aren’t too dependent on finding a needle in a haystack. However it’s noticeable that the random
does have worse avalanche behaviour than the others (see xmxmx
and mxmxmx
in the previous post), so one has to pay some attention in the selection.
To obtain an multiplier I employed the SFFS algorithm starting without any seed. After running for a couple of hours it came back with
C = 0xbea225f9eb34556d
To verify this is a good constant, I ranked it for the mx3 construction against about 25 other constants with respect to avalanche behavior - and sure enough it ranked among the top ones. Finally I verified it had expected PractRand and SMHasher results.
EDIT 2020-10-14: Updated code with revision 2.
Here is the revision 2 64-bit mixer
inline uint64_t mix(uint64_t x) {
x ^= x >> 32;
x *= 0xbea225f9eb34556d;
x ^= x >> 29;
x *= 0xbea225f9eb34556d;
x ^= x >> 32;
x *= 0xbea225f9eb34556d;
x ^= x >> 29;
return x;
}
Nothing revolutionary here, it uses the familiar multiply and xor-shift as seen in Murmur3 and SplitMix, but with an extra xor-multiplication on beforehand, different xor-shift constants and only one multiplier constant, C
.
The PRNG just uses the mixer and feeds it with a counter like this:
class random {
public:
explicit random(uint64_t seed) : _counter(seed) {}
uint64_t operator()() { return mix(_counter++); }
private:
uint64_t _counter;
};
This PRNG has a cycle of .
The hash would look something like this (pseudo code)
uint64_t hash(data,end) {
uint64_t h = 0;
while (data != end)
h += mxm(data++);
return mix(h);
}
For performance reasons, I think we don’t need the full mix
for every 8 bytes of data in the stream. Here mxm
refers to the first three operations in the mix
function. But we needed to make a modification in mxm
to make this work, more on this a bit down.
SMHasher is a testsuite designed especially for testing hash functions. I went ahead and cloned a fork by Reini Urban. To get started quickly I used fast-hash
as a template to implement the new hash function.
At first, the idea I outlined above did not work as it failed some differential and permutation tests (among other). However, once I started to get a feeling for why the tests failed I it was pretty straightforward to fix. For example in the stream mixer I had to not only sum up the hash but also mix it up bit
while (data != end) {
h += mxm(*data++);
h *= C;
}
One thing that I needed to make the stream mixer mxm
a bit stronger. In my previous experiments I had made the observation that the avalanche bias doesn’t imply a good PractRand result and vice-versa. The first xor-constant 33, chosen in mxmxmx
was one that favored the PractRand test, while the optimal constant to minimize avalanche bias is 57 (found by SFFS in previous article mxm
construction). By exchanging 33 with 57 it passes the BIC test, but be worsen the PractRand result. I was not able to find any reasonable middleground which was very discouraging. Then I realized: why not use both? Here is the modified hash mixer.
uint64_t mxm(uint64_t x) {
x *= C;
x ^= (x >> 57) ^ (x >> 33);
x *= C;
return x;
}
With this it passed all the SMHasher tests (but for performance reasons I don’t use the modification in the mx3::random).
Here is the mx3 code on GitHub.
Earlier posts in series