Repository: vnmakarov/mum-hash
Branch: master
Commit: 31b2cc31970c
Files: 70
Total size: 760.6 KB

Directory structure:
gitextract_28zcd061/

├── .clang-format
├── ChangeLog
├── README.md
├── benchmarks/
│   ├── City.cpp
│   ├── City.h
│   ├── SpookyV2.cpp
│   ├── SpookyV2.h
│   ├── bbs-prng.h
│   ├── bench-crypto.c
│   ├── bench-crypto.sh
│   ├── bench-prng.c
│   ├── bench-prng.sh
│   ├── bench.c
│   ├── bench.sh
│   ├── blake2-config.h
│   ├── blake2-impl.h
│   ├── blake2.h
│   ├── blake2b-load-sse2.h
│   ├── blake2b-load-sse41.h
│   ├── blake2b-round.h
│   ├── blake2b.c
│   ├── byte_order.c
│   ├── byte_order.h
│   ├── chacha-prng.h
│   ├── gen-table.rb
│   ├── meow_hash.h
│   ├── meow_intrinsics.h
│   ├── metrohash64.cpp
│   ├── metrohash64.h
│   ├── mum512-prng.h
│   ├── platform.h
│   ├── rapidhash.h
│   ├── sha3.c
│   ├── sha3.h
│   ├── sha512.c
│   ├── sha512.h
│   ├── sip24-prng.h
│   ├── siphash24.c
│   ├── splitmix64.c
│   ├── t1ha/
│   │   ├── src/
│   │   │   ├── t1ha0.c
│   │   │   ├── t1ha0_ia32aes_a.h
│   │   │   ├── t1ha0_ia32aes_avx.c
│   │   │   ├── t1ha0_ia32aes_avx2.c
│   │   │   ├── t1ha0_ia32aes_b.h
│   │   │   ├── t1ha0_ia32aes_noavx.c
│   │   │   ├── t1ha0_selfcheck.c
│   │   │   ├── t1ha1.c
│   │   │   ├── t1ha1_selfcheck.c
│   │   │   ├── t1ha2.c
│   │   │   ├── t1ha2_selfcheck.c
│   │   │   ├── t1ha_bits.h
│   │   │   ├── t1ha_selfcheck.c
│   │   │   ├── t1ha_selfcheck.h
│   │   │   └── t1ha_selfcheck_all.c
│   │   └── t1ha.h
│   ├── ustd.h
│   ├── xoroshiro128plus.c
│   ├── xoroshiro128starstar.c
│   ├── xoseed.c
│   ├── xoshiro256plus.c
│   ├── xoshiro256starstar.c
│   ├── xoshiro512plus.c
│   ├── xoshiro512starstar.c
│   ├── xxh3.h
│   ├── xxhash.c
│   └── xxhash.h
├── mum-prng.h
├── mum.h
├── mum512.h
└── vmum.h

================================================
FILE CONTENTS
================================================

================================================
FILE: .clang-format
================================================
BasedOnStyle: google
SpaceBeforeParens: Always
IndentCaseLabels: false
AllowShortIfStatementsOnASingleLine: true
AllowShortLoopsOnASingleLine: true
SpaceAfterCStyleCast: true
PointerAlignment: Right
BreakBeforeBinaryOperators: All
ConstructorInitializerIndentWidth: 2
ContinuationIndentWidth: 2
PenaltyBreakBeforeFirstCallParameter: 10000
SortIncludes: false
BreakStringLiterals: true
BreakBeforeTernaryOperators: true
AllowShortCaseLabelsOnASingleLine: true
#AllowShortEnumsOnASingleLine: true
ColumnLimit: 100
MaxEmptyLinesToKeep: 1
#StatementMacros: [ 'REP2', 'REP3', 'REP4', 'REP5', 'REP6', 'REP7', 'REP8' ]
#TypenameMacros: [ 'VARR', 'DLIST', 'HTAB' ]


================================================
FILE: ChangeLog
================================================
2018-11-02  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Add update about mum-prng.  Correct typo for
	xoshiro512** result.
	* mum-prng.h (_mum_prng_state): Change avx2_support onto
	update_func.
	(_mum_prng_setup_avx2): Setup update_func.  Move below.
	(_start_mum_prng): Setup update_func for non x86-64.  Move below.
	(init_mum_prng, set_mum_prng_seed): Move below.
	(_mum_prng_update_avx2, _mum_prng_update): Update a state word
	from the next word.
	(get_mum_prn): Simplify.
	* src/bench-prng: Use env. variable MUM_ONLY.

2018-10-31  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Minor editions.  Add RAND failure on practrand.

2018-10-31  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Update for PRNGs and MUM-512.
	* mum-512.h (_MC_FRESH_GCC, _mc_hash_avx2): Removed.
	(_mc_hash_default): Don't check _MC_UNALIGNED_ACCESS.
	(mum512_keyed_hash): Remove avx2 probe.  Use _mc_hash_aligned.
	* mum-prng.h (_mum_avx2): New.
	(_mum_prng_update_avx2): Use it.
	* src/bench-prng: Add new xo[ro]shiro tests.
	* src/bench-prng.c: Ditto.  Add code for PRNs output.
	* src/xoroshiro128plus.c, src/xoroshiro128starstar.c:  New files.
	* src/xoshiro256plus.c, src/xoshiro256starstar.c: New files.
	* src/xoshiro512plus.c, src/xoshiro512starstar.c: New files.
	* src/splitmix64.c, src/xoseed.c: New files.

2018-10-30  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Description of new version of mum-hash.  New results
	for modern CPUs.
	* mum-hash (_MUM_FRESH_GCC): Remove.
	(_mum_rotl): New.
	(_mum_hash_aligned, _mum_final): Add new version.  Use MUM_V1 for
	the old code.
	(_mum_hash_avx2): Remove.
	(_mum_hash_default, mum_hash): Modify and simplify.
	* src/bench: Use environment variable MUM_ONLY for running mum-hash
	only.  Add Meow Hash runs.  Use 16MB keys instead of 1KB ones.
	* src/bench.c (meowhash_test): New.
	(main): Use 16MB keys instead of 1KB ones.
	* meow_hash.h: New.

2016-08-10  Aras Pranckevičius <nearaz@gmail.com>
	    Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* mum.h: Permit unaligned access for _M_AMD64 and _M_IX86
	(Windows).
	* mum512.h: Ditto.
	* bench: Use CC instead of CXX for siphash.

2016-07-13  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* mum.h (_mum_hash_aligned, mum_hash_randomize): Make i type of
	size_t.
	* mum512.h (_mc_hash_aligned): Ditto.
	(_mc_init_state, _mc_hash_avx2, _mc_hash_default): Use cont for
	seed.

2016-06-14  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Add results for Blake2.
	* src/{blake2b.c, blake2b-load-sse2.h, blake2b-load-sse41.h}: New.
	* src/{blake2b-round.h, blake2-config.h, blake2.h, blake2-impl.h}:
	New.
	* src/bench-crypto.c: Add code for testing Blake2.
	* src/bench-crypto: Ditto.

2016-06-07  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Update speed numbers for all functions for aarch64.

2016-06-06  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* src/bench.c: Use faster interface for xxHash.
	* README.md: Update xxHash speed numbers for x86-64 and ppc64.

2016-05-18  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Update speed data for MUM and MUM512 and add info
	about testing MUM PRNG on NIST bigger data.

2016-05-18  Vladimir Makarov  <vmakarov@gcc.gnu.org>
	    Vsevolod Stakhov  <vsevolod@highsecure.ru>

	* mum512.h (_MC_FRESH_GCC): New. Use it as a guard for avx2 version.
	* mum-prng.h (_MUM_PRNG_FRESH_GCC): New. Use it as a guard for
	avx2 version.
	* mum.h (_MUM_FRESH_GCC): New. Use it as a guard for avx2 version.
	Remove clang guard for _MUM_OPTIMIZE etc.

2016-05-18  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* src/mum-prng.h: Move it to parent directory

2016-05-13  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Some minor changes.

2016-05-13  Vladimir Makarov  <vmakarov@gcc.gnu.org>
	    Vsevolod Stakhov  <vsevolod@highsecure.ru>

	* src/bench.c: Include test for metro hash.
	* src/bench (COPTFLAGS, CC, CXX, LTO): New.  Use them.  Add runs for
	metro hash.
	* README.md: Update benchmark results, add results for MetroHash.
	* metrohash64.cpp: New.
	* metrohash64.h: New.
	* platform.h: New.

2016-05-13  Vsevolod Stakhov  <vsevolod@highsecure.ru>

	* mum.h: Add support for LLVM.

2016-05-13  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* mum.h (_mum_le32): New.
	(_mum_hash_aligned): Change the code to deal with uint64_t shifts
	and endianness.

2016-05-12  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Add results for 5-byte string tests.
	* mum.h (uint16_t): New.
	(_mum_hash_aligned): Modify code to process a tail < 8 bytes.
	(_mum_hash_default): Use memmove instead of memcpy.
	* src/bench.c: Add test for 5-byte strings.
	* src/bench: Ditto.

2016-05-10  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Highlight the new MUM PRNG speed.

2016-05-10  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Describe a new version of MUM PRNG and update its
	speed.
	* src/mum-prng.h (MUM_PRNG_UNROLL): New.
	(EXPECT): New.
	(mum_prng_state): Rename to _mum_prng_state.  Add fields count and
	avx2_support.  Make state an array.
	(_mum_prng_setup_avx2, _mum_prn_update, _mum_prn_avx2_update):
	New.
	(_start_mum_prng): New.
	(init_mum_prng): Use _start_mum_prng.
	(set_mum_seed): Rename to set_mum_prng_seed.  Use _start_mum_prng.
	(get_mum_prn): Rewrite.
	* src/bench-prng.c (init_prng): Randomize multiplication
	constants.

2016-05-09  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* src/mum-prng.h (init_mum_prng): Use seed == 1.
	(get_mum_prn): Fix prns generation.
	* README.md: Update result for MUM PRNG.

2016-05-09  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Add results for xxHash64 on AARCH64 and PPC64 and for
	xoroshiro128+.
	* src/xoroshiro128plus.c: New.
	* src/bench-prng.c: Add a code to test xoroshiro128+.
	* src/bench-prng: Add a run to test speed of xoroshiro128+.

2016-05-09  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Add results for xxHash64.
	* src/xxhash.[ch]: New.
	* src/bench.c: Add code for xxHash64.
	(state, xxHash64_test): New.
	* src/bench: Add runs for xxHash64.

2016-05-08  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* README.md: Some editing.

2016-05-08  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* mum512.h (_mc_rotr): Decrease sh for sh >= 64.

2016-05-08  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* mum512.h (_mc_ti): Define depending on endianness.

2016-05-08  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* mum512.h (_mc_mul64): Change multiplication result names.
	(_mc_permute): Use _mc_xor.

2016-05-08  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* src/bench.c (main): Change input every iteration.

2016-05-08  Vladimir Makarov  <vmakarov@gcc.gnu.org>

	* mum.h: New file.
	* mum512.h: New file.
	* README.md: New file.
	* src/bbs-prng.h: Ditto.
	* src/bench: Ditto.
	* src/bench.c: Ditto.
	* src/bench-crypto: Ditto.
	* src/bench-crypto.c: Ditto.
	* src/bench-prng: Ditto.
	* src/bench-prng.c: Ditto.
	* src/byte_order.[ch]: Ditto.
	* src/chacha-prng.h: Ditto.
	* src/City.cpp: Ditto.
	* src/City.h: Ditto.
	* src/mum512-prng.h: Ditto.
	* src/mum-prng.h: Ditto.
	* src/sha3.[ch]: Ditto.
	* src/sha512.[ch]: Ditto.
	* src/sip24-prng.h: Ditto.
	* src/siphash24.c: Ditto.
	* src/Spooky.cpp: Ditto.
	* src/Spooky.h: Ditto.
	* src/ustd.h: Ditto.



================================================
FILE: README.md
================================================
# **Update (Nov. 28, 2025): Implemented collision attack prevention in VMUM and MUM-V3**
* The attack is described in Issue#18
* The code in question looks like
```
    state ^= _vmum (data[i] ^ _vmum_factors[i], data[i + 1] ^ _vmum_factors[i + 1]);
```
  It is easy to generate data that makes the first operand of `_vmum`
  zero.  In that case the product is zero whatever the second operand
  is, so the hash stays the same.  An adversary can therefore generate
  a lot of data with the same hash
* This is a pretty common coding mistake in fast hash functions. At
  least I found the same vulnerability in [wyhash](https://github.com/wangyi-fudan/wyhash/blob/46cebe9dc4e51f94d0dca287733bc5a94f76a10d/wyhash.h#L130) and [rapidhash](https://github.com/Nicoshev/rapidhash/blob/d60698faa10916879f85b2799bfdc6996b94c2b7/rapidhash.h#L383)
* After the code change, the safe variants of VMUM and MUM hashes are
  switched on by default.  If you want the previous variants, please use
  macros VMUM_V1 and MUM_V3 respectively.  I believe there are still
  cases when they can be used, e.g. for hash tables in compilers.
* The fix consists of checking the `_vmum` operands for zero and using a nonzero value instead
  * all checks are implemented without generating branch instructions, to keep the hash calculation pipeline going
  * still, the checks increase the length of the critical paths of the calculation
  * in most cases, the new versions of VMUM and MUM generate the same hashes as the previous versions
* The fix results in slowing down hash speeds by about **10%** according to my benchmarks
  * I updated all benchmark data related to the new versions of VMUM and MUM below
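
The zero-operand problem and a branch-free guard can be illustrated with a small sketch.  It is not the code from `vmum.h`: the names `toy_mul_mix`, `toy_nonzero`, `toy_step_unsafe`, `toy_step_safe`, and the two factor constants are hypothetical, and the sketch assumes a compiler with the `unsigned __int128` extension (GCC or Clang).

```c
#include <stdint.h>

/* Hypothetical factors for illustration; the real tables live in vmum.h. */
static const uint64_t toy_factors[2] = {0x9e3779b97f4a7c15ULL, 0xc2b2ae3d27d4eb4fULL};

/* 64x64->128-bit multiplication, then mixing the high and low halves. */
static uint64_t toy_mul_mix(uint64_t a, uint64_t b) {
  unsigned __int128 r = (unsigned __int128)a * b;
  return (uint64_t)(r >> 64) + (uint64_t)r;
}

/* Unguarded step: if d0 == toy_factors[0], the first operand is zero and
   the result is zero whatever d1 is. */
static uint64_t toy_step_unsafe(uint64_t d0, uint64_t d1) {
  return toy_mul_mix(d0 ^ toy_factors[0], d1 ^ toy_factors[1]);
}

/* Replace a zero operand by 1.  The comparison result is used as a value,
   so compilers typically emit a compare-and-set instead of a branch. */
static uint64_t toy_nonzero(uint64_t x) { return x | (uint64_t)(x == 0); }

/* Guarded step: a zero operand can no longer cancel the other one. */
static uint64_t toy_step_safe(uint64_t d0, uint64_t d1) {
  return toy_mul_mix(toy_nonzero(d0 ^ toy_factors[0]), toy_nonzero(d1 ^ toy_factors[1]));
}
```

With these definitions `toy_step_unsafe (toy_factors[0], d1)` is 0 for every `d1`, while `toy_step_safe` still depends on `d1`.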
	
# MUM Hash
* MUM hash is a **fast non-cryptographic hash function**
  suitable for different hash table implementations
* MUM means **MU**ltiply and **M**ix
  * It is a name of the base transformation on which hashing is implemented
  * Modern processors have fast logic for long-number multiplication
  * It is very attractive to use it for fast hashing
    * For example, 64x64-bit multiplication can do the same work as 32
      shifts and additions
  * I'd like to call it Multiply and Reduce.  Unfortunately, MUR
    (MUltiply and Rotate) is already taken by the famous hashing
    technique designed by Austin Appleby
  * I've also chosen the name because the first release happened on Mother's Day
* To use mum you just need one header file (mum.h)
* MUM hash passes **all** [SMHasher](https://github.com/aappleby/smhasher) tests
  * For comparison, only 4 out of 15 non-cryptographic hash functions
    in SMHasher pass the tests, e.g. the well-known FNV, Murmur2,
    Lookup, and Superfast hashes fail the tests
* MUM V3 hash does not pass all tests of a more rigorous
  version of [SMHasher](https://github.com/rurban/smhasher):
  * It fails only the Perlin noise and bad seeds tests, so its quality
    is still good enough for most applications
  * To make MUM V3 pass the Rurban SMHasher, the macro `MUM_QUALITY` has been
    added.  Compiling with this macro defined makes MUM V3 pass
    all tests of the Rurban SMHasher.  The slowdown is about 5% on average
    or 10% at most on keys of length 8.  It also results in generating
    a target-independent hash
* For historical reasons mum.h contains code for the older versions V1 and
  V2.  You can switch them on by defining macros **MUM_V1** and **MUM_V2**
* MUM algorithm is **simpler** than the VMUM one
* MUM is specifically **designed to be fast on 64-bit CPUs**
  * Still MUM will work for 32-bit CPUs and it will be sometimes
    faster than Spooky and City
* MUM has a **fast startup**.  It is particularly good at hashing small keys,
  which are prevalent in hash table applications

# MUM implementation details

* Input 64-bit data is randomized by 64x64->128 bit multiplication and mixing
  high- and low-parts of the multiplication result by using addition.
  The result is mixed with the current internal state by using XOR
  * Instead of addition for mixing high- and low- parts, XOR could be
    used
    * Using addition instead of XOR improves performance by about
      10% on Haswell and Power7
* Factor numbers, randomly generated with an equal probability of their
  bit values, are used for the multiplication
* When all factors are used once, the internal state is randomized, and the same
  factors are used again for subsequent data randomization
* The main loop is formed to be **unrolled** by the compiler to benefit from
  the compiler instruction scheduling optimization and OOO
  (out-of-order) instruction execution in modern CPUs
* MUM code does not contain assembly (asm) code anymore. This makes MUM less
  machine-dependent.  To have an efficient mum implementation, the
  compiler should support the 128-bit integer
  extension (true for GCC and Clang on many targets)
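
The steps above can be sketched as a toy word-at-a-time hash.  This is only an illustration of the multiply-and-mix idea, not the code in `mum.h`: the names `toy_mum` and `toy_hash_words`, the factor values, and the number of factors are hypothetical, and the `unsigned __int128` extension is assumed.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NFACTORS 4

/* Hypothetical constants; mum.h has its own randomly generated factors. */
static const uint64_t toy_factors[TOY_NFACTORS] = {
  0x9e3779b97f4a7c15ULL, 0xc2b2ae3d27d4eb4fULL,
  0x165667b19e3779f9ULL, 0xd6e8feb86659fd93ULL};
static const uint64_t toy_block_factor = 0xff51afd7ed558ccdULL;

/* Base transformation: 64x64->128-bit multiply, add high and low halves. */
static uint64_t toy_mum(uint64_t v, uint64_t p) {
  unsigned __int128 r = (unsigned __int128)v * p;
  return (uint64_t)(r >> 64) + (uint64_t)r;
}

/* Hash N 64-bit words.  Each word is randomized by its own factor and
   XORed into the state; after all factors are used once, the state itself
   is randomized and the factors are reused for the next block. */
static uint64_t toy_hash_words(uint64_t seed, const uint64_t *w, size_t n) {
  uint64_t state = seed;
  size_t i = 0;
  while (n - i >= TOY_NFACTORS) {
    for (size_t j = 0; j < TOY_NFACTORS; j++) state ^= toy_mum(w[i + j], toy_factors[j]);
    state = toy_mum(state, toy_block_factor);
    i += TOY_NFACTORS;
  }
  /* Tail: fewer than TOY_NFACTORS words remain. */
  for (size_t j = 0; i < n; i++, j++) state ^= toy_mum(w[i], toy_factors[j]);
  return toy_mum(state, toy_block_factor);
}
```

The fixed-count inner loop is the part a compiler can unroll and schedule freely, because the multiplications in one block do not depend on each other.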

# VMUM Hash
* VMUM is a vector variant of mum hashing (see below)
  * It uses target SIMD instructions (insns)
  * In comparison with mum v3, vmum considerably (up to 3 times) improves the speed
    of hashing mid-range (32 to 256 bytes) to long-range (more than 256 bytes) length keys
  * As with previous mum hashing, to use vmum you just need one header
    file (vmum.h)
  * vmum source code is considerably smaller than that of extremely
    fast xxHash3 and t1ha2 and competes with them on hashing speed
  * vmum passes a more rigorous version of
    [SMHasher](https://github.com/rurban/smhasher)
   
# VMUM implementation details
* For long keys vmum uses vector insns:
  * AVX2 256-bit vector insns on x86-64
  * Neon 128-bit vector insns on aarch64
  * Altivec 128-bit vector insns on ppc64
  * There is a scalar emulation of the vector insns, too, for other targets
    * This could be useful for understanding the vector
      operations used
* You can add usage of vector insns for other targets.  For this you
    just need to add small functions `_vmum_update_block`,
    `_vmum_zero_block`, and `_vmum_fold_block`
  * For the beneficial usage of vector insns the target should have unsigned `32 x 32-bit ->
    64-bit` vector multiplication
* To run vector insns in parallel on OOO CPUs, two vmum code loops are formed
  to be **unrolled** by the compiler into one basic block
* I experimented a lot with other vector insns and found that the usage of
  carry-less (sometimes called polynomial) vector multiplication insns does not work
  well enough for hashing
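
To make the required vector primitive concrete, here is a scalar sketch of an unsigned `32 x 32-bit -> 64-bit` lane-wise multiplication on a 256-bit block.  The type `toy_block` and the functions `toy_mul32x32`, `toy_update`, and `toy_fold` are hypothetical; in particular `toy_update` and `toy_fold` only guess at the kind of work such block functions do and are not the definitions from `vmum.h`.

```c
#include <stdint.h>

/* A 256-bit block viewed as four 64-bit lanes (AVX2-like layout). */
typedef struct { uint64_t lane[4]; } toy_block;

/* Scalar emulation of an unsigned 32 x 32-bit -> 64-bit vector multiply:
   each lane multiplies the low 32 bits of its two inputs. */
static toy_block toy_mul32x32(toy_block a, toy_block b) {
  toy_block r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = (uint64_t)(uint32_t)a.lane[i] * (uint32_t)b.lane[i];
  return r;
}

/* Hypothetical update: mix data with factors, multiply the low half of
   each lane by its high half, and XOR the products into the state. */
static toy_block toy_update(toy_block state, toy_block data, toy_block factors) {
  toy_block x, hi;
  for (int i = 0; i < 4; i++) {
    x.lane[i] = data.lane[i] ^ factors.lane[i];
    hi.lane[i] = x.lane[i] >> 32;
  }
  toy_block p = toy_mul32x32(x, hi);
  for (int i = 0; i < 4; i++) state.lane[i] ^= p.lane[i];
  return state;
}

/* Hypothetical fold: combine the four lanes into one 64-bit value. */
static uint64_t toy_fold(toy_block s) {
  return s.lane[0] ^ s.lane[1] ^ s.lane[2] ^ s.lane[3];
}
```

On a target with real vector insns each `for` loop above would become one or two instructions working on all lanes at once.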

# VMUM and MUM benchmarking vs other famous hash functions

* Here are the results of benchmarking VMUM and MUM with the fastest
  non-cryptographic hash functions I know:
  * Google City64 (sources are taken from SMHasher)
  * Bob Jenkins Spooky (sources are taken from SMHasher)
  * Yann Collet's xxHash3 (sources are taken from the
    [original repository](https://github.com/Cyan4973/xxHash))
* I also added J. Aumasson and D. Bernstein's
  [SipHash24](https://github.com/veorq/SipHash) for the comparison as it
  is a popular choice for hash table implementation these days
* A [metro hash](https://github.com/jandrewrogers/MetroHash)
  was added as people asked and as metro hash is
  claimed to be the fastest hash function
    * metro hash is not as portable as other functions as it does not deal
      with the unaligned access problem on some targets
    * metro hash will produce different hashes on LE/BE targets
* Measurements were done on 4 different architecture machines:
  * AMD Ryzen 9900X
  * Intel i5-1300K
  * IBM Power10
  * Apple M4 10 cores (mac mini)
* Hashing 10,000 16MB keys (bulk)
* Hashing 1,280M keys for all other key lengths
* Each test was run 3 times and the minimal time was taken
  * GCC-14.2.1 was used on AMD and M4 machine, GCC-12.3.1 on Intel
    machine, GCC-11.5.0 was used on Power10
  * `-O3` was used for all compilations
  * The keys were generated by `rand` calls
  * The keys were aligned to show the hashing speed better and to permit runs for Metro
  * Some people complained that my comparison is unfair as most hash functions are not inlined
    * I believe that the interface is a part of the implementation.  So when
      the interface does not provide an easy way for inlining, it is an
      implementation pitfall
    * Still, to address the complaints I added `-flto` for benchmarking all hash
      functions excluding MUM and VMUM.  This option enables cross-file inlining
* Here are graphs summarizing the measurements:

![AMD](./benchmarks/amd.png)

![INTEL](./benchmarks/intel.png)

![M4](./benchmarks/m4.png)

![Power10](./benchmarks/power10.png)

* Exact numbers are given in the last section

# SMhasher Speed Measurements

* SMhasher also measures hash speeds.  It uses the CPU cycle counter (`__rdtsc`)
  * `__rdtsc`-based measurements might be inaccurate for a small number of
    executed insns as the process can migrate, not all insns may have
    retired, and the CPU frequency can vary.  That is why I prefer
    long-running benchmarks
* Here are the results on AMD Ryzen 9900X for the fastest quality hashes
  (chosen according to SMhasher bulk speed results from https://github.com/rurban/smhasher)
* More GB/sec is better.  Fewer cycles/hash is better
* Some hashes are based on the use of x86\_64 AES insns and are less portable.
  They are marked by "Yes" in the AES column
* The SLOC column gives the number of source code lines needed to implement the hash
  
| Hash            | AES  | Bulk Speed (256KB): GB/s |Av. Speed on keys (1-32 bytes): cycles/hash| SLOC|
|:----------------|:----:|-------------------------:|------------------------------------------:|----:|
|VMUM-V2          |  -   |  103.7                   | 16.4                                      |459  |
|VMUM-V1          |  -   |  143.5                   | 16.8                                      |459  |
|MUM-V4           |  -   |   28.6                   | 15.8                                      |291  |
|MUM-V3           |  -   |   40.4                   | 16.3                                      |291  |
|xxh3             |  -   |   66.6                   | 17.6                                      |965  |
|umash64          |  -   |   63.1                   | 25.4                                      |1097 |
|FarmHash32       |  -   |   39.8                   | 32.6                                      |1423 |
|wyhash           |  -   |   39.3                   | 18.3                                      | 194 |
|clhash           |  -   |   38.4                   | 51.7                                      | 366 |
|t1ha2\_atonce    |  -   |   34.7                   | 25.5                                      |2262 |
|t1ha0\_aes\_avx2 | Yes  |  128.9                   | 25.0                                      |2262 |
|gxhash64         | Yes  |  197.1                   | 27.9                                      | 274 |
|aesni            | Yes  |   38.7                   | 28.5                                      | 132 |


# Using cryptographic vs. non-cryptographic hash function
  * People worrying about denial-of-service attacks based on generating hash
    collisions started to use cryptographic hash functions in hash tables
  * Cryptographic functions are very slow
    * *sha1* is about 20-30 times slower than MUM and City on the bulk speed tests
    * The new fastest cryptographic hash function *SipHash* is up to 10
      times slower
  * MUM and VMUM are also *resistant* to preimage attack (finding a
    key with a given hash)
    * To make it hard to move back to previous state values we use a mostly 1-to-1 one-way
      function `lo(x*C) + hi(x*C)` where C is a constant.  A brute-force
      solution of the equation `f(x) = a` probably requires `2^63` tries.
      The other equation used, `x ^ y = a`, has `2^64`
      solutions.  This complicates finding the overall solution further
  * If somebody is not convinced, you can use **randomly chosen
    multiplication constants** (see functions `mum_hash_randomize` and
    `vmum_hash_randomize`).
    Finding a key with a given hash even if you know a key with such
    a hash probably will be close to finding two or more solutions of
    *Diophantine* equations
  * If somebody is still not convinced, you can implement hash tables
    to **recognize the attack and rebuild** the table using the MUM function
    with the new multiplication constants
  * An analogous approach can be used if you use a weak hash function such as
    MurMur or City.  Instead of using cryptographic hash functions
    **all the time**, hash tables can be implemented to recognize the
    attack and rebuild the table and start using a cryptographic hash
    function
  * This approach solves the speed problem and permits us to switch easily to a new
    cryptographic hash function if a flaw is found in the old one, e.g., switching from
    SipHash to SHA2
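
The "recognize the attack and rebuild" idea can be sketched with a tiny fixed-size open-addressing table.  Everything here is hypothetical and chosen for brevity: the `toy_` names, the capacity, the probe limit used as the attack signal, and the single-multiplication hash that stands in for MUM with randomized constants.  The `unsigned __int128` extension is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CAP 64u       /* fixed power-of-two capacity keeps the sketch short */
#define TOY_MAX_PROBES 8u /* hypothetical "this looks like an attack" threshold */

typedef struct {
  uint64_t keys[TOY_CAP];
  unsigned char used[TOY_CAP];
  uint64_t factor;   /* current multiplication constant */
  unsigned rebuilds; /* how many times the table was rebuilt */
} toy_table;

/* Multiply-and-mix of a key with the table's current constant. */
static uint64_t toy_hash(uint64_t key, uint64_t factor) {
  unsigned __int128 r = (unsigned __int128)key * factor;
  return (uint64_t)(r >> 64) + (uint64_t)r;
}

/* Linear probing limited to TOY_MAX_PROBES slots.  Returns the slot holding
   KEY, the first free slot, or TOY_CAP when the probe limit is hit. */
static size_t toy_find_slot(const toy_table *t, uint64_t key) {
  size_t i = (size_t)(toy_hash(key, t->factor) & (TOY_CAP - 1));
  for (unsigned n = 0; n < TOY_MAX_PROBES; n++, i = (i + 1) & (TOY_CAP - 1))
    if (!t->used[i] || t->keys[i] == key) return i;
  return TOY_CAP;
}

static int toy_contains(const toy_table *t, uint64_t key) {
  size_t i = toy_find_slot(t, key);
  return i != TOY_CAP && t->used[i];
}

static int toy_place(toy_table *t, uint64_t key) {
  size_t i = toy_find_slot(t, key);
  if (i == TOY_CAP) return 0;
  t->used[i] = 1;
  t->keys[i] = key;
  return 1;
}

/* Insert KEY.  When probing gets suspiciously long, re-place all keys with
   NEW_FACTOR, which the caller should draw from an unpredictable source.
   Returns 0 if the rebuild itself fails; a real table would then grow or
   retry with another constant. */
static int toy_insert(toy_table *t, uint64_t key, uint64_t new_factor) {
  if (toy_place(t, key)) return 1;
  toy_table old = *t;
  memset(t->used, 0, sizeof t->used);
  t->factor = new_factor;
  t->rebuilds++;
  for (size_t i = 0; i < TOY_CAP; i++)
    if (old.used[i] && !toy_place(t, old.keys[i])) return 0;
  return toy_place(t, key);
}
```

The common case pays only for the fast hash; the cost of a rebuild is paid only when the probe limit signals that keys are clustering abnormally.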
  
# How to use [V]MUM
* Please just include file `[v]mum.h` into your C/C++ program and use the following functions:
  * optional `[v]mum_hash_randomize` for choosing multiplication constants randomly
  * `[v]mum_hash_init`, `[v]mum_hash_step`, and `[v]mum_hash_finish` for hashing complex data structures
  * `[v]mum_hash64` for hashing 64-bit data
  * `[v]mum_hash` for hashing any continuous block of data
  * Compile `vmum.h` with other code using options switching on vector
    insns if necessary (e.g. -mavx2 for x86\_64)
* To compare MUM and VMUM speed with other hash functions on your machine go to
  the directory `benchmarks` and run a script `./bench.sh`
* The script will compile source files and run the tests printing the
  results as a markdown table

# Crypto-hash function MUM512
  * [V]MUM is not designed to be a crypto-hash
    * The key (seed) and state are only 64-bit which are not crypto-level ones
    * The result can be different for different targets (BE/LE
      machines, 32- and 64-bit machines) as for other hash functions, e.g. City (hash can be
      different on SSE4.2 and non-SSE4.2 targets) or Spooky (BE/LE machines)
      * If you need the same MUM hash independent of the target, please
        define macro `[V]MUM_TARGET_INDEPENDENT_HASH`.  Defining the
        macro affects the performance only on big-endian targets or
        targets without int128 support
  * There is a variant of MUM called MUM512 which can be a **candidate**
    for a crypto-hash function and keyed crypto-hash function and
    might be interesting for researchers
    * The **key** is **256**-bit
    * The **state** and the **output** are **512**-bit
    * The **block** size is **512**-bit
    * It uses 128x128->256-bit multiplication which is analogous to about
      64 shifts and additions for 128-bit block word instead of 80
      rounds of shifts, additions, logical operations for 512-bit block
      in sha2-512.
  * It is **only a candidate** for a crypto hash function
    * I did not do any differential cryptanalysis or investigate
      the probabilities of different attacks on the hash function (sorry, it
      is too big a job)
      * I might do this in the future as I am interested in the
        differential characteristics of the MUM512 base transformation
        step (128x128-bit multiplications with addition of high and
        low 128-bit parts)
      * I am also interested in the right choice of the multiplication constants
      * Maybe somebody will do the analysis.  I will be glad to hear anything.
        Who knows, maybe it can be broken as easily as the Nimbus cipher.
    * The current code might be also vulnerable to timing attack on
      systems with varying multiplication instruction latency time.
      There is no code for now to prevent it
  * To compare the MUM512 speed with the speed of SHA-2 (SHA512) and
    SHA-3 (SHA3-512) go to the directory `benchmarks` and run a script `./bench-crypto.sh`
    * SHA-2 and SHA-3 code is taken from [RHash](https://github.com/rhash/RHash.git)
  * Blake2 crypto-hash from [github.com/BLAKE2/BLAKE2](https://github.com/BLAKE2/BLAKE2)
    was added for comparison.  I use sse version of 64-bit Blake2 (blake2b).
  * Here is the speed of the crypto hash functions on AMD 9900X:

|                        | MUM512 | SHA2  |  SHA3  | Blake2B|
:------------------------|-------:|------:|-------:|-------:|
10 bytes (20 M texts)    | 0.27s  | 0.27s |  0.44s |  0.81s |
100 bytes (20 M texts)   | 0.36s  | 0.25s |  0.84s |  0.84s |
1000 bytes (20 M texts)  | 1.21s  | 2.08s | 5.63s  |  3.70s |
10000 bytes (5 M texts)  | 5.60s  | 5.05s | 14.07s |  7.99s |
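
The MUM512 base transformation step described above (a 128x128->256-bit multiplication followed by adding the high and low 128-bit parts) can be sketched with four 64x64->128-bit products.  The names `toy_u256`, `toy_mul128`, and `toy_mum128` are hypothetical, the code in `mum512.h` may be organized differently, and the `unsigned __int128` extension is assumed.

```c
#include <stdint.h>

typedef unsigned __int128 toy_u128;
typedef struct { toy_u128 hi, lo; } toy_u256;

/* Schoolbook 128x128->256-bit multiplication built from 64-bit halves. */
static toy_u256 toy_mul128(toy_u128 a, toy_u128 b) {
  uint64_t a0 = (uint64_t)a, a1 = (uint64_t)(a >> 64);
  uint64_t b0 = (uint64_t)b, b1 = (uint64_t)(b >> 64);
  toy_u128 p00 = (toy_u128)a0 * b0;
  toy_u128 p01 = (toy_u128)a0 * b1;
  toy_u128 p10 = (toy_u128)a1 * b0;
  toy_u128 p11 = (toy_u128)a1 * b1;
  /* Middle column: high half of p00 plus the low halves of the cross
     products.  The sum is below 3 * 2^64, so it cannot overflow. */
  toy_u128 mid = (p00 >> 64) + (uint64_t)p01 + (uint64_t)p10;
  toy_u256 r;
  r.lo = (mid << 64) | (uint64_t)p00;
  r.hi = p11 + (p01 >> 64) + (p10 >> 64) + (mid >> 64);
  return r;
}

/* Multiply and mix: add the high and low 128-bit parts of the product. */
static toy_u128 toy_mum128(toy_u128 v, toy_u128 p) {
  toy_u256 r = toy_mul128(v, p);
  return r.hi + r.lo;
}
```

For example, squaring the all-ones 128-bit value gives a low part of 1 and a high part of `2^128 - 2`, so `toy_mum128` returns the all-ones value again.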

# Pseudo-random generators
  * Files `mum-prng.h` and `mum512-prng.h` provide pseudo-random
    functions based on MUM and MUM512 hash functions
  * All PRNGs passed *NIST Statistical Test Suite for Random and
    Pseudorandom Number Generators for Cryptographic Applications*
    (version 2.2.1) with 1000 bitstreams each containing 1M bits
    * Although MUM PRNG passed the test, it is not a cryptographically
      secure PRNG as is the hash function used for it
  * To compare the PRNG speeds go to
    the directory `benchmarks` and run a script `./bench-prng.sh`
  * For the comparison I wrote a crypto-secure Blum Blum Shub PRNG
    (file `bbs-prng.h`) and PRNGs based on fast crypto-level hash
    functions in ChaCha stream cipher (file `chacha-prng.h`) and
    SipHash24 (file `sip24-prng.h`).
    * The additional PRNGs also pass the Statistical Test Suite
  * For the comparison I also added the fastest PRNGs
    * [xoroshiro128+](http://xoroshiro.di.unimi.it/xoroshiro128plus.c)
    * [xoroshiro128**](http://xoroshiro.di.unimi.it/xoroshiro128starstar.c)
    * [xoshiro256+](http://xoroshiro.di.unimi.it/xoshiro256plus.c)
    * [xoshiro256**](http://xoroshiro.di.unimi.it/xoshiro256starstar.c)
    * [xoshiro512**](http://xoroshiro.di.unimi.it/xoshiro512starstar.c)
    * As recommended the first numbers generated by splitmix64 were used as a seed
  * I had no intention of tuning the MUM based PRNG at first but
    after adding xoroshiro128+ and finding how fast it is, I've decided
    to speed up MUM PRNG
    * I added code to calculate a few PRNs at once to calculate them in parallel
    * I added AVX2 version functions to use the faster `MULX` instruction
    * The new version also passed NIST Statistical Test Suite.  It was
      tested even on bigger data (10K bitstreams each containing 10M
      bits).  The test took several days on i7-4790K
    * The new version is **almost 2 times** faster than the old one and MUM PRN
      speed became almost the same as xoroshiro/xoshiro ones
      * All xoroshiro/xoshiro and MUM PRNG functions are inlined in the benchmark program
      * Without inlining both kinds of code will be visibly slower and the speed
        difference will be negligible as one PRN calculation takes
        only about **3-4 machine cycles** for xoroshiro/xoshiro and MUM PRN.
  * **Update Nov.2 2019**: I found that MUM PRNG fails practrand on 512GB.  So I modified it.
    Instead of basically 16 independent PRNGs with 64-bit state, I made it one PRNG with 1024-bit state.
    I also managed to speed up MUM PRNG by 15%.
  * All PRNGs were tested by [practrand](http://pracrand.sourceforge.net/) with
    a 4TB PRNG-generated stream (it took a few days)
      * **GLIBC RAND, xoroshiro128+, xoshiro256+, and xoshiro512+ failed** on the first stages of practrand
      * The rest of the PRNGs passed
      * BBS PRNG was tested with only a 64GB stream because it is too slow
  * Here is the speed of the PRNGs in millions of generated PRNs
    per second:

|  M prns/sec  | AMD 9900X   |Intel i5-1360K| Apple M4    | Power10  |
:--------------|------------:|-------------:|------------:|---------:|
BBS            | 0.0886      | 0.0827       | 0.122       | 0.021    |
ChaCha         | 357.68      | 184.80       | 262.81      |  83.20   |
SipHash24      | 702.10      | 567.43       | 760.13      | 231.48   |
MUM512         |  91.54      | 179.62       | 268.04      |  44.28   |
MUM            |1947.27      |1620.65       |2263.68      | 694.42   |
XOROSHIRO128** |1797.02      |1386.87       |1095.37      | 477.67   |
XOSHIRO256**   |1866.35      |1364.85       |1466.15      | 607.65   |
XOSHIRO512**   |1663.86      |1235.15       |1423.90      | 631.90   |
GLIBC RAND     | 115.57      | 101.48       | 228.99      |  33.66   |
XOROSHIRO128+  |1786.62      |1299.59       |1296.48      | 549.85   |
XOSHIRO256+    |2321.99      |1720.67       |1690.96      | 711.41   |
XOSHIRO512+    |1808.81      |1525.18       |1659.76      | 717.12   |
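
The seeding recipe mentioned above (filling a generator's state with the first numbers produced by splitmix64) can be sketched as follows.  The `toy_` names are hypothetical, and the update steps follow the commonly published splitmix64 and xoshiro256** reference algorithms rather than the benchmark sources in this repository.

```c
#include <stdint.h>

typedef struct { uint64_t s[4]; } toy_xoshiro256;

/* splitmix64: advance the 64-bit state X and return a mixed output. */
static uint64_t toy_splitmix64(uint64_t *x) {
  uint64_t z = (*x += 0x9e3779b97f4a7c15ULL);
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

/* Fill the 256-bit state with the first four splitmix64 outputs. */
static void toy_seed(toy_xoshiro256 *g, uint64_t seed) {
  for (int i = 0; i < 4; i++) g->s[i] = toy_splitmix64(&seed);
}

static uint64_t toy_rotl(uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }

/* One xoshiro256** step. */
static uint64_t toy_next(toy_xoshiro256 *g) {
  uint64_t *s = g->s;
  uint64_t result = toy_rotl(s[1] * 5, 7) * 9;
  uint64_t t = s[1] << 17;
  s[2] ^= s[0];
  s[3] ^= s[1];
  s[1] ^= s[2];
  s[0] ^= s[3];
  s[2] ^= t;
  s[3] = toy_rotl(s[3], 45);
  return result;
}
```

Seeding through a separate mixer avoids starting the generator from a state with very few set bits, which a small user-supplied seed would otherwise produce.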

# Table results for hash speed measurements
* Here are table variants of my measurements for people wanting the
  exact numbers.  The tables also contain time spent for hashing.
  
* AMD Ryzen 9900X:

| Length    |  VMUM-V2  |  VMUM-V1  |  MUM-V4   |  MUM-V3   |  Spooky   |   City    |  xxHash3  |   t1ha2   | SipHash24 |   Metro   |
|:----------|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
|   3 bytes |1.00  4.57s|0.98  4.67s|0.98  4.64s|0.97  4.73s|0.76  6.01s|0.60  7.61s|0.94  4.84s|0.61  7.47s|0.61  7.51s|0.69  6.67s|
|   4 bytes |1.00  2.77s|1.00  2.78s|1.09  2.55s|1.09  2.55s|0.55  5.08s|0.39  7.15s|0.71  3.92s|0.69  4.03s|0.44  6.24s|0.75  3.69s|
|   5 bytes |1.00  4.63s|1.00  4.64s|1.00  4.62s|1.00  4.63s|0.80  5.78s|0.65  7.07s|0.88  5.28s|0.62  7.41s|0.61  7.59s|0.88  5.24s|
|   6 bytes |1.00  4.57s|1.00  4.56s|1.00  4.57s|1.00  4.56s|0.77  5.93s|0.65  7.06s|0.87  5.24s|0.62  7.38s|0.58  7.87s|0.87  5.24s|
|   7 bytes |1.00  4.84s|1.01  4.80s|1.01  4.80s|1.01  4.79s|0.79  6.15s|0.69  7.06s|0.92  5.24s|0.66  7.38s|0.60  8.10s|0.76  6.38s|
|   8 bytes |1.00  2.74s|1.00  2.74s|1.09  2.51s|1.09  2.51s|0.54  5.03s|0.39  7.06s|0.52  5.24s|0.69  3.97s|0.33  8.29s|0.75  3.67s|
|   9 bytes |1.00  3.01s|1.08  2.78s|1.07  2.82s|1.06  2.83s|0.59  5.06s|0.27 11.03s|0.45  6.66s|0.41  7.37s|0.36  8.29s|0.60  5.04s|
|  10 bytes |1.00  3.01s|1.08  2.78s|1.08  2.79s|1.04  2.89s|0.59  5.08s|0.27 11.02s|0.45  6.66s|0.41  7.36s|0.36  8.34s|0.60  5.05s|
|  11 bytes |1.00  3.01s|1.08  2.79s|1.08  2.79s|1.08  2.78s|0.59  5.08s|0.27 11.04s|0.45  6.67s|0.41  7.37s|0.36  8.32s|0.49  6.20s|
|  12 bytes |1.00  3.01s|1.09  2.77s|1.07  2.81s|1.07  2.82s|0.59  5.06s|0.27 11.03s|0.45  6.66s|0.41  7.35s|0.36  8.30s|0.60  5.02s|
|  13 bytes |1.00  2.98s|1.08  2.77s|1.05  2.83s|1.08  2.77s|0.59  5.02s|0.27 10.94s|0.45  6.61s|0.41  7.28s|0.36  8.21s|0.48  6.15s|
|  14 bytes |1.00  2.96s|1.08  2.74s|1.08  2.75s|1.08  2.74s|0.59  5.01s|0.27 10.95s|0.45  6.60s|0.41  7.29s|0.36  8.21s|0.48  6.16s|
|  15 bytes |1.00  2.98s|1.09  2.74s|1.08  2.77s|1.06  2.80s|0.59  5.01s|0.27 10.93s|0.45  6.61s|0.41  7.29s|0.36  8.21s|0.41  7.28s|
|  16 bytes |1.00  2.98s|1.09  2.74s|1.09  2.73s|1.09  2.73s|0.27 10.94s|0.41  7.28s|0.93  3.19s|0.59  5.08s|0.29 10.31s|0.62  4.78s|
|  32 bytes |1.00  3.28s|1.08  3.05s|1.00  3.27s|0.98  3.34s|0.30 10.95s|0.40  8.19s|1.03  3.19s|0.44  7.50s|0.23 14.39s|0.33  9.82s|
|  64 bytes |1.00  3.89s|1.09  3.58s|0.87  4.47s|0.85  4.58s|0.23 16.63s|0.47  8.36s|1.22  3.19s|0.68  5.69s|0.17 22.59s|0.37 10.49s|
|  96 bytes |1.00  4.55s|1.08  4.20s|0.79  5.79s|0.78  5.83s|0.20 22.31s|0.36 12.54s|1.43  3.19s|0.66  6.88s|0.15 31.11s|0.40 11.27s|
| 128 bytes |1.00  6.32s|1.40  4.52s|0.91  6.92s|0.90  7.06s|0.23 27.99s|0.50 12.54s|1.98  3.19s|0.83  7.57s|0.16 39.35s|0.53 11.85s|
| 192 bytes |1.00  8.55s|1.29  6.63s|0.90  9.46s|0.99  8.65s|0.36 23.81s|0.59 14.48s|1.25  6.83s|0.88  9.74s|0.15 55.96s|0.65 13.21s|
| 256 bytes |1.00 10.98s|1.38  7.95s|0.92 11.98s|1.06 10.32s|0.45 24.22s|0.66 16.71s|0.79 13.86s|0.92 11.89s|0.15 74.12s|0.75 14.57s|
| 512 bytes |1.00 14.42s|1.12 12.91s|0.65 22.33s|0.79 18.16s|0.41 35.30s|0.53 26.98s|0.91 15.92s|0.69 21.04s|0.10 140.39s|0.63 22.76s|
|1024 bytes |1.00 17.13s|1.09 15.75s|0.37 46.54s|0.50 34.26s|0.32 53.76s|0.38 45.34s|0.88 19.50s|0.44 38.78s|0.06 272.94s|0.44 39.07s|
| Bulk      |1.00  1.70s|1.36  1.25s|0.31  5.57s|0.44  3.85s|0.33  5.13s|0.34  4.94s|1.18  1.44s|0.37  4.62s|0.05 33.88s|0.40  4.26s|
| Average   |1.00       |1.11       |0.93       |0.96       |0.50       |0.43       |0.85       |0.58       |0.31       |0.59       |
| Geomean   |1.00       |1.10       |0.90       |0.93       |0.46       |0.41       |0.77       |0.55       |0.26       |0.56       |

* Intel i5-13600K:

| Length    |  VMUM-V2  |  VMUM-V1  |  MUM-V4   |  MUM-V3   |  Spooky   |   City    |  xxHash3  |   t1ha2   | SipHash24 |   Metro   |
|:----------|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
|   3 bytes |1.00  4.67s|1.02  4.59s|1.03  4.53s|1.03  4.53s|0.67  6.94s|0.58  8.05s|0.93  5.03s|0.50  9.31s|0.47  9.93s|0.69  6.79s|
|   4 bytes |1.00  4.27s|1.00  4.27s|1.06  4.02s|1.06  4.02s|0.73  5.86s|0.51  8.37s|0.75  5.70s|0.77  5.58s|0.52  8.17s|0.81  5.28s|
|   5 bytes |1.00  4.66s|1.02  4.59s|1.03  4.53s|1.03  4.53s|0.73  6.37s|0.57  8.17s|0.82  5.70s|0.50  9.31s|0.44 10.49s|0.69  6.79s|
|   6 bytes |1.00  4.67s|1.02  4.56s|1.03  4.53s|1.03  4.53s|0.69  6.76s|0.57  8.17s|0.82  5.69s|0.50  9.31s|0.43 10.95s|0.69  6.79s|
|   7 bytes |1.00  4.87s|1.00  4.89s|1.01  4.83s|1.01  4.83s|0.69  7.02s|0.60  8.17s|0.85  5.70s|0.52  9.31s|0.45 10.89s|0.69  7.04s|
|   8 bytes |1.00  4.27s|1.00  4.28s|1.55  2.76s|1.55  2.76s|0.76  5.60s|0.53  8.08s|0.75  5.69s|0.99  4.30s|0.39 10.88s|1.06  4.02s|
|   9 bytes |1.00  4.53s|1.06  4.29s|1.50  3.02s|1.50  3.01s|0.81  5.60s|0.33 13.59s|0.53  8.58s|0.48  9.51s|0.42 10.88s|0.82  5.53s|
|  10 bytes |1.00  4.54s|1.06  4.29s|1.50  3.02s|1.51  3.01s|0.81  5.60s|0.33 13.59s|0.53  8.58s|0.48  9.51s|0.42 10.88s|0.82  5.53s|
|  11 bytes |1.00  4.52s|1.06  4.27s|1.50  3.02s|1.50  3.02s|0.81  5.60s|0.33 13.59s|0.53  8.58s|0.48  9.51s|0.42 10.88s|0.67  6.79s|
|  12 bytes |1.00  4.54s|1.06  4.29s|1.50  3.02s|1.51  3.01s|0.81  5.60s|0.33 13.59s|0.53  8.58s|0.48  9.51s|0.42 10.88s|0.82  5.53s|
|  13 bytes |1.00  4.52s|1.06  4.28s|1.49  3.03s|1.50  3.02s|0.81  5.60s|0.33 13.59s|0.53  8.59s|0.48  9.51s|0.42 10.88s|0.67  6.79s|
|  14 bytes |1.00  4.52s|1.06  4.27s|1.49  3.03s|1.50  3.02s|0.81  5.60s|0.33 13.59s|0.53  8.59s|0.48  9.51s|0.42 10.88s|0.67  6.79s|
|  15 bytes |1.00  4.53s|1.06  4.29s|1.50  3.02s|1.50  3.02s|0.81  5.60s|0.33 13.59s|0.53  8.58s|0.48  9.51s|0.42 10.88s|0.56  8.05s|
|  16 bytes |1.00  4.52s|1.06  4.28s|1.50  3.02s|1.50  3.01s|0.37 12.13s|0.56  8.05s|0.89  5.07s|0.85  5.29s|0.34 13.43s|0.83  5.46s|
|  32 bytes |1.00  4.79s|1.05  4.58s|1.30  3.69s|1.33  3.59s|0.39 12.38s|0.51  9.39s|0.94  5.10s|0.67  7.15s|0.25 18.92s|0.43 11.07s|
|  64 bytes |1.00  5.46s|1.06  5.15s|1.14  4.78s|1.11  4.91s|0.29 18.66s|0.58  9.36s|1.06  5.13s|0.88  6.22s|0.17 31.57s|0.46 11.83s|
|  96 bytes |1.00  6.43s|1.10  5.83s|0.84  7.68s|0.84  7.67s|0.26 25.17s|0.46 13.88s|1.23  5.23s|0.85  7.60s|0.15 42.71s|0.51 12.70s|
| 128 bytes |1.00  7.92s|1.25  6.36s|0.84  9.42s|0.86  9.19s|0.25 31.62s|0.57 13.87s|1.51  5.24s|0.91  8.67s|0.15 53.88s|0.59 13.51s|
| 192 bytes |1.00 11.52s|1.43  8.06s|1.05 11.02s|1.08 10.68s|0.39 29.49s|0.71 16.25s|1.23  9.34s|1.03 11.18s|0.15 76.23s|0.76 15.07s|
| 256 bytes |1.00 14.26s|1.60  8.89s|1.04 13.65s|1.18 12.11s|0.48 29.86s|0.75 19.06s|0.91 15.64s|1.03 13.82s|0.15 97.67s|0.85 16.68s|
| 512 bytes |1.00 15.93s|1.04 15.31s|0.62 25.67s|0.81 19.65s|0.35 45.39s|0.46 34.70s|0.96 16.68s|0.58 27.38s|0.09 186.04s|0.53 29.86s|
|1024 bytes |1.00 20.96s|1.08 19.32s|0.42 49.61s|0.56 37.74s|0.29 71.66s|0.34 61.75s|0.84 24.92s|0.42 49.86s|0.06 362.59s|0.36 58.67s|
| Bulk      |1.00  3.13s|1.09  2.86s|0.40  7.77s|0.61  5.17s|0.40  7.78s|0.42  7.37s|0.87  3.60s|0.49  6.38s|0.07 45.99s|0.45  6.88s|
| Average   |1.00       |1.10       |1.15       |1.18       |0.58       |0.48       |0.83       |0.65       |0.31       |0.67       |
| Geomean   |1.00       |1.09       |1.08       |1.13       |0.54       |0.46       |0.79       |0.62       |0.26       |0.65       |

* Apple M4:

| Length    |  VMUM-V2  |  VMUM-V1  |  MUM-V4   |  MUM-V3   |  Spooky   |   City    |  xxHash3  |   t1ha2   | SipHash24 |   Metro   |
|:----------|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
|   3 bytes |1.00  5.15s|1.02  5.03s|1.03  5.02s|1.02  5.03s|0.69  7.50s|0.52  9.91s|0.87  5.92s|1.08  4.77s|0.51 10.01s|0.63  8.17s|
|   4 bytes |1.00  4.70s|1.01  4.66s|1.08  4.36s|1.08  4.36s|0.69  6.78s|0.49  9.64s|0.70  6.71s|0.99  4.77s|0.52  9.03s|0.73  6.41s|
|   5 bytes |1.00  5.05s|1.00  5.03s|1.01  5.01s|1.01  5.01s|0.71  7.08s|0.52  9.64s|0.74  6.78s|1.06  4.77s|0.49 10.38s|0.62  8.17s|
|   6 bytes |1.00  5.15s|1.02  5.03s|1.04  4.95s|1.04  4.95s|0.69  7.49s|0.53  9.64s|0.76  6.78s|1.08  4.76s|0.49 10.51s|0.63  8.17s|
|   7 bytes |1.00  5.52s|1.03  5.34s|1.04  5.32s|1.04  5.32s|0.71  7.82s|0.57  9.64s|0.81  6.79s|1.16  4.76s|0.51 10.77s|0.65  8.46s|
|   8 bytes |1.00  3.07s|1.02  3.02s|1.16  2.65s|1.16  2.65s|0.47  6.49s|0.32  9.64s|0.46  6.71s|0.68  4.53s|0.26 11.69s|0.66  4.66s|
|   9 bytes |1.00  3.26s|1.07  3.04s|1.03  3.17s|1.03  3.18s|0.50  6.48s|0.27 11.96s|0.56  5.85s|0.55  5.98s|0.28 11.69s|0.51  6.41s|
|  10 bytes |1.00  3.27s|1.08  3.03s|1.08  3.02s|1.08  3.02s|0.50  6.48s|0.27 11.96s|0.56  5.85s|0.55  5.98s|0.28 11.69s|0.51  6.41s|
|  11 bytes |1.00  3.38s|1.10  3.07s|1.11  3.05s|1.07  3.15s|0.52  6.48s|0.28 11.96s|0.58  5.85s|0.57  5.98s|0.29 11.69s|0.43  7.87s|
|  12 bytes |1.00  3.39s|1.08  3.13s|1.13  3.00s|1.13  3.00s|0.52  6.48s|0.28 11.96s|0.58  5.85s|0.57  5.98s|0.29 11.69s|0.53  6.41s|
|  13 bytes |1.00  3.33s|1.08  3.08s|1.10  3.04s|1.10  3.04s|0.51  6.48s|0.28 11.95s|0.57  5.85s|0.56  5.98s|0.28 11.69s|0.42  7.87s|
|  14 bytes |1.00  3.31s|1.09  3.05s|1.10  3.02s|1.09  3.03s|0.51  6.48s|0.28 11.96s|0.56  5.86s|0.55  5.98s|0.28 11.69s|0.42  7.87s|
|  15 bytes |1.00  3.32s|1.06  3.12s|1.08  3.07s|1.08  3.07s|0.51  6.48s|0.28 11.96s|0.57  5.85s|0.56  5.98s|0.28 11.69s|0.36  9.33s|
|  16 bytes |1.00  3.30s|1.10  3.00s|1.07  3.07s|1.07  3.08s|0.23 14.08s|0.35  9.33s|0.85  3.87s|0.55  5.98s|0.23 14.58s|0.54  6.12s|
|  32 bytes |1.00  3.66s|1.07  3.42s|1.01  3.64s|1.00  3.65s|0.26 14.07s|0.35 10.51s|0.96  3.80s|0.41  9.03s|0.18 20.66s|0.29 12.57s|
|  64 bytes |1.00  4.42s|1.09  4.07s|0.89  4.99s|0.89  4.99s|0.21 21.37s|0.41 10.84s|1.17  3.79s|0.62  7.11s|0.13 33.90s|0.33 13.44s|
|  96 bytes |1.00  5.16s|1.07  4.82s|0.82  6.27s|0.84  6.17s|0.18 28.70s|0.32 16.19s|1.36  3.80s|0.60  8.57s|0.11 45.54s|0.36 14.34s|
| 128 bytes |1.00  6.87s|1.13  6.08s|0.91  7.55s|0.93  7.40s|0.19 35.99s|0.42 16.19s|1.81  3.80s|0.72  9.53s|0.12 59.49s|0.45 15.22s|
| 192 bytes |1.00  8.65s|1.12  7.69s|0.82 10.60s|0.88  9.86s|0.27 31.55s|0.46 18.64s|0.88  9.86s|0.71 12.16s|0.10 84.13s|0.51 16.99s|
| 256 bytes |1.00  9.39s|1.30  7.20s|0.71 13.24s|0.76 12.29s|0.29 32.11s|0.44 21.28s|0.71 13.19s|0.63 14.87s|0.09 106.82s|0.50 18.77s|
| 512 bytes |1.00 14.79s|1.07 13.76s|0.69 21.33s|0.92 15.99s|0.29 50.41s|0.40 37.15s|0.91 16.28s|0.49 30.25s|0.07 204.80s|0.46 31.88s|
|1024 bytes |1.00 27.83s|1.56 17.79s|0.65 43.07s|1.04 26.70s|0.35 78.46s|0.44 62.77s|1.14 24.39s|0.51 54.28s|0.07 399.61s|0.55 50.90s|
| Bulk      |1.00  3.45s|1.39  2.49s|0.58  6.00s|1.13  3.06s|0.44  7.83s|0.51  6.70s|1.19  2.89s|0.54  6.36s|0.07 50.68s|0.67  5.13s|
| Average   |1.00       |1.11       |0.96       |1.02       |0.45       |0.39       |0.84       |0.68       |0.26       |0.51       |
| Geomean   |1.00       |1.10       |0.95       |1.01       |0.41       |0.38       |0.79       |0.66       |0.21       |0.50       |

* IBM Power10:

| Length    |  VMUM-V2  |  VMUM-V1  |  MUM-V4   |  MUM-V3   |  Spooky   |   City    |  xxHash3  |   t1ha2   | SipHash24 |   Metro   |
|:----------|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|
|   3 bytes |1.00 11.52s|1.00 11.53s|1.03 11.18s|1.03 11.21s|0.68 16.87s|0.61 18.95s|0.95 12.16s|1.09 10.57s|0.52 22.13s|0.66 17.35s|
|   4 bytes |1.00 10.86s|1.00 10.87s|1.07 10.19s|1.07 10.19s|0.72 15.18s|0.54 20.22s|0.89 12.27s|1.03 10.58s|0.51 21.13s|0.88 12.40s|
|   5 bytes |1.00 11.53s|0.98 11.73s|1.03 11.17s|1.03 11.17s|0.71 16.24s|0.55 20.91s|0.90 12.76s|1.09 10.58s|0.49 23.74s|0.66 17.35s|
|   6 bytes |1.00 11.52s|0.98 11.73s|1.03 11.17s|1.03 11.18s|0.68 16.87s|0.55 20.92s|0.90 12.76s|1.09 10.58s|0.48 23.91s|0.66 17.36s|
|   7 bytes |1.00 12.23s|1.02 11.96s|0.98 12.51s|1.03 11.84s|0.69 17.69s|0.58 20.92s|0.96 12.76s|1.16 10.56s|0.50 24.38s|0.56 22.01s|
|   8 bytes |1.00 10.85s|1.00 10.86s|1.06 10.19s|1.07 10.18s|0.76 14.27s|0.52 20.92s|0.85 12.75s|1.05 10.32s|0.37 29.14s|1.10  9.86s|
|   9 bytes |1.00 11.54s|1.06 10.87s|1.06 10.85s|1.06 10.85s|0.79 14.55s|0.40 28.92s|0.72 16.11s|0.84 13.73s|0.40 29.08s|0.84 13.80s|
|  10 bytes |1.00 11.53s|1.06 10.87s|1.06 10.85s|1.06 10.85s|0.79 14.54s|0.40 28.92s|0.72 16.10s|0.84 13.72s|0.40 29.17s|0.84 13.79s|
|  11 bytes |1.00 11.22s|1.03 10.87s|1.03 10.85s|1.03 10.85s|0.77 14.55s|0.39 28.90s|0.70 16.11s|0.82 13.72s|0.38 29.21s|0.66 17.08s|
|  12 bytes |1.00 11.53s|1.06 10.86s|1.06 10.86s|1.06 10.85s|0.79 14.54s|0.40 28.92s|0.72 16.11s|0.84 13.72s|0.40 29.13s|0.84 13.80s|
|  13 bytes |1.00 11.23s|1.03 10.87s|1.04 10.85s|1.04 10.85s|0.77 14.56s|0.39 28.91s|0.70 16.11s|0.82 13.72s|0.39 29.14s|0.66 17.09s|
|  14 bytes |1.00 11.23s|1.03 10.88s|1.03 10.87s|1.04 10.84s|0.77 14.55s|0.39 28.92s|0.70 16.11s|0.82 13.71s|0.38 29.20s|0.66 17.09s|
|  15 bytes |1.00 11.53s|1.06 10.88s|1.06 10.89s|1.06 10.89s|0.79 14.56s|0.40 28.91s|0.72 16.11s|0.84 13.72s|0.40 29.17s|0.57 20.38s|
|  16 bytes |1.00 12.20s|1.12 10.91s|1.12 10.85s|1.12 10.89s|0.44 27.70s|0.61 20.05s|1.36  8.96s|0.89 13.72s|0.31 39.95s|1.00 12.16s|
|  32 bytes |1.00 12.92s|1.11 11.59s|1.06 12.22s|1.06 12.22s|0.45 28.63s|0.58 22.34s|1.57  8.23s|0.64 20.32s|0.24 54.90s|0.52 24.97s|
|  64 bytes |1.00 14.54s|1.11 13.09s|0.97 15.02s|0.96 15.07s|0.35 41.29s|0.62 23.31s|1.70  8.54s|0.90 16.20s|0.16 88.17s|0.54 26.78s|
|  96 bytes |1.00 15.93s|1.13 14.07s|0.82 19.44s|0.96 16.64s|0.29 54.32s|0.45 35.33s|1.93  8.24s|0.81 19.57s|0.13 119.66s|0.55 28.81s|
| 128 bytes |1.00 16.71s|1.08 15.47s|0.75 22.21s|0.86 19.52s|0.24 68.72s|0.47 35.31s|2.03  8.22s|0.77 21.78s|0.11 156.55s|0.54 30.78s|
| 192 bytes |1.00 21.98s|1.16 18.99s|0.83 26.51s|0.83 26.60s|0.36 60.97s|0.53 41.32s|0.76 28.78s|0.76 29.01s|0.11 195.56s|0.62 35.17s|
| 256 bytes |1.00 22.31s|1.25 17.89s|0.73 30.47s|0.76 29.34s|0.36 62.61s|0.47 47.41s|0.67 33.50s|0.64 35.02s|0.09 252.22s|0.57 39.10s|
| 512 bytes |1.00 33.65s|1.19 28.16s|0.71 47.60s|0.74 45.60s|0.33 100.76s|0.46 73.34s|0.73 46.38s|0.56 60.20s|0.07 483.42s|0.60 55.93s|
|1024 bytes |1.00 56.70s|1.41 40.15s|0.69 81.61s|0.73 78.10s|0.34 167.11s|0.45 126.45s|0.80 70.61s|0.49 116.16s|0.06 944.76s|0.45 127.01s|
| Bulk      |1.00  6.75s|1.42  4.75s|0.70  9.62s|0.67 10.01s|0.38 17.80s|0.49 13.74s|0.93  7.27s|0.49 13.91s|0.06 118.69s|0.46 14.52s|
| Average   |1.00       |1.10       |0.95       |0.97       |0.58       |0.49       |1.00       |0.84       |0.30       |0.67       |
| Geomean   |1.00       |1.09       |0.94       |0.96       |0.54       |0.48       |0.93       |0.82       |0.24       |0.65       |
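
The relative speeds and the summary rows in the tables above can be reproduced from the raw times. The sketch below is not taken from the repository's benchmark scripts. It assumes that each ratio is the VMUM-V2 time divided by the hash's time, that `Average` is the arithmetic mean of the per-row ratios, and that `Geomean` is their geometric mean. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Relative speed of a hash against the baseline (assumed: baseline time
// divided by hash time).  Values above 1.00 mean faster than the baseline.
double relative_speed(double baseline_seconds, double hash_seconds) {
  assert(baseline_seconds > 0.0 && hash_seconds > 0.0);
  return baseline_seconds / hash_seconds;
}

// Arithmetic mean of the per-row ratios (assumed meaning of "Average").
double average(const std::vector<double>& ratios) {
  assert(!ratios.empty());
  double sum = 0.0;
  for (double r : ratios) sum += r;
  return sum / static_cast<double>(ratios.size());
}

// Geometric mean of the per-row ratios (assumed meaning of "Geomean").
// Summing logarithms avoids overflow or underflow of a long product.
double geomean(const std::vector<double>& ratios) {
  assert(!ratios.empty());
  double log_sum = 0.0;
  for (double r : ratios) {
    assert(r > 0.0);
    log_sum += std::log(r);
  }
  return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

For example, `relative_speed(4.57, 4.67)` gives about 0.98, which agrees with the VMUM-V1 entry in the 3-byte row of the AMD table.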



================================================
FILE: benchmarks/City.cpp
================================================
// Copyright (c) 2011 Google, Inc.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
//
// CityHash, by Geoff Pike and Jyrki Alakuijala
//
// This file provides CityHash64() and related functions.
//
// It's probably possible to create even faster hash functions by
// writing a program that systematically explores some of the space of
// possible hash functions, by using SIMD instructions, or by
// compromising on hash quality.

#include "City.h"

#include <algorithm>
#include <string.h>  // for memcpy and memset

using namespace std;

static uint64 UNALIGNED_LOAD64(const char *p) {
  uint64 result;
  memcpy(&result, p, sizeof(result));
  return result;
}

static uint32 UNALIGNED_LOAD32(const char *p) {
  uint32 result;
  memcpy(&result, p, sizeof(result));
  return result;
}

#ifndef __BIG_ENDIAN__

#define uint32_in_expected_order(x) (x)
#define uint64_in_expected_order(x) (x)

#else

#ifdef _MSC_VER
#include <stdlib.h>
#define bswap_32(x) _byteswap_ulong(x)
#define bswap_64(x) _byteswap_uint64(x)

#elif defined(__APPLE__)
// Mac OS X / Darwin features
#include <libkern/OSByteOrder.h>
#define bswap_32(x) OSSwapInt32(x)
#define bswap_64(x) OSSwapInt64(x)

#else
#include <byteswap.h>
#endif

#define uint32_in_expected_order(x) (bswap_32(x))
#define uint64_in_expected_order(x) (bswap_64(x))

#endif  // __BIG_ENDIAN__

#if !defined(LIKELY)
#if defined(__GNUC__) || defined(__INTEL_COMPILER)
#define LIKELY(x) (__builtin_expect(!!(x), 1))
#else
#define LIKELY(x) (x)
#endif
#endif

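// Read 8 bytes from a possibly unaligned address.  On big-endian builds the
// bytes are swapped so the value matches the little-endian result.
static uint64 Fetch64(const char *p) {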
  return uint64_in_expected_order(UNALIGNED_LOAD64(p));
}

static uint32 Fetch32(const char *p) {
  return uint32_in_expected_order(UNALIGNED_LOAD32(p));
}

// Some primes between 2^63 and 2^64 for various uses.
static const uint64 k0 = 0xc3a5c85c97cb3127ULL;
static const uint64 k1 = 0xb492b66fbe98f273ULL;
static const uint64 k2 = 0x9ae16a3b2f90404fULL;
static const uint64 k3 = 0xc949d7c7509e6557ULL;

// Bitwise right rotate.  Normally this will compile to a single
// instruction, especially if the shift is a manifest constant.
static uint64 Rotate(uint64 val, int shift) {
  // Avoid shifting by 64: doing so yields an undefined result.
  return shift == 0 ? val : ((val >> shift) | (val << (64 - shift)));
}

// Equivalent to Rotate(), but requires the second arg to be non-zero.
// On x86-64, and probably others, it's possible for this to compile
// to a single instruction if both args are already in registers.
static uint64 RotateByAtLeast1(uint64 val, int shift) {
  return (val >> shift) | (val << (64 - shift));
}

// Fold the high bits of val into its low bits.
static uint64 ShiftMix(uint64 val) {
  return val ^ (val >> 47);
}

// Hash two 64-bit values down to one using Hash128to64() from City.h.
static uint64 HashLen16(uint64 u, uint64 v) {
  return Hash128to64(uint128(u, v));
}

static uint64 HashLen0to16(const char *s, size_t len) {
  if (len > 8) {
    uint64 a = Fetch64(s);
    uint64 b = Fetch64(s + len - 8);
    return HashLen16(a, RotateByAtLeast1(b + len, len)) ^ b;
  }
  if (len >= 4) {
    uint64 a = Fetch32(s);
    return HashLen16(len + (a << 3), Fetch32(s + len - 4));
  }
  if (len > 0) {
    uint8 a = s[0];
    uint8 b = s[len >> 1];
    uint8 c = s[len - 1];
    uint32 y = static_cast<uint32>(a) + (static_cast<uint32>(b) << 8);
    uint32 z = len + (static_cast<uint32>(c) << 2);
    return ShiftMix(y * k2 ^ z * k3) * k2;
  }
  return k2;
}

// This probably works well for 16-byte strings as well, but it may be overkill
// in that case.
static uint64 HashLen17to32(const char *s, size_t len) {
  uint64 a = Fetch64(s) * k1;
  uint64 b = Fetch64(s + 8);
  uint64 c = Fetch64(s + len - 8) * k2;
  uint64 d = Fetch64(s + len - 16) * k0;
  return HashLen16(Rotate(a - b, 43) + Rotate(c, 30) + d,
                   a + Rotate(b ^ k3, 20) - c + len);
}

// Return a 16-byte hash for 48 bytes.  Quick and dirty.
// Callers do best to use "random-looking" values for a and b.
static pair<uint64, uint64> WeakHashLen32WithSeeds(
    uint64 w, uint64 x, uint64 y, uint64 z, uint64 a, uint64 b) {
  a += w;
  b = Rotate(b + a + z, 21);
  uint64 c = a;
  a += x;
  a += y;
  b += Rotate(a, 44);
  return make_pair(a + z, b + c);
}

// Return a 16-byte hash for s[0] ... s[31], a, and b.  Quick and dirty.
static pair<uint64, uint64> WeakHashLen32WithSeeds(
    const char* s, uint64 a, uint64 b) {
  return WeakHashLen32WithSeeds(Fetch64(s),
                                Fetch64(s + 8),
                                Fetch64(s + 16),
                                Fetch64(s + 24),
                                a,
                                b);
}

// Return an 8-byte hash for 33 to 64 bytes.
static uint64 HashLen33to64(const char *s, size_t len) {
  uint64 z = Fetch64(s + 24);
  uint64 a = Fetch64(s) + (len + Fetch64(s + len - 16)) * k0;
  uint64 b = Rotate(a + z, 52);
  uint64 c = Rotate(a, 37);
  a += Fetch64(s + 8);
  c += Rotate(a, 7);
  a += Fetch64(s + 16);
  uint64 vf = a + z;
  uint64 vs = b + Rotate(a, 31) + c;
  a = Fetch64(s + 16) + Fetch64(s + len - 32);
  z = Fetch64(s + len - 8);
  b = Rotate(a + z, 52);
  c = Rotate(a, 37);
  a += Fetch64(s + len - 24);
  c += Rotate(a, 7);
  a += Fetch64(s + len - 16);
  uint64 wf = a + z;
  uint64 ws = b + Rotate(a, 31) + c;
  uint64 r = ShiftMix((vf + ws) * k2 + (wf + vs) * k0);
  return ShiftMix(r * k0 + vs) * k2;
}

uint64 CityHash64(const char *s, size_t len) {
  if (len <= 32) {
    if (len <= 16) {
      return HashLen0to16(s, len);
    } else {
      return HashLen17to32(s, len);
    }
  } else if (len <= 64) {
    return HashLen33to64(s, len);
  }

  // For strings over 64 bytes we hash the end first, and then as we
  // loop we keep 56 bytes of state: v, w, x, y, and z.
  uint64 x = Fetch64(s + len - 40);
  uint64 y = Fetch64(s + len - 16) + Fetch64(s + len - 56);
  uint64 z = HashLen16(Fetch64(s + len - 48) + len, Fetch64(s + len - 24));
  pair<uint64, uint64> v = WeakHashLen32WithSeeds(s + len - 64, len, z);
  pair<uint64, uint64> w = WeakHashLen32WithSeeds(s + len - 32, y + k1, x);
  x = x * k1 + Fetch64(s);

  // Decrease len to the nearest multiple of 64, and operate on 64-byte chunks.
  len = (len - 1) & ~static_cast<size_t>(63);
  do {
    x = Rotate(x + y + v.first + Fetch64(s + 8), 37) * k1;
    y = Rotate(y + v.second + Fetch64(s + 48), 42) * k1;
    x ^= w.second;
    y += v.first + Fetch64(s + 40);
    z = Rotate(z + w.first, 33) * k1;
    v = WeakHashLen32WithSeeds(s, v.second * k1, x + w.first);
    w = WeakHashLen32WithSeeds(s + 32, z + w.second, y + Fetch64(s + 16));
    std::swap(z, x);
    s += 64;
    len -= 64;
  } while (len != 0);
  return HashLen16(HashLen16(v.first, w.first) + ShiftMix(y) * k1 + z,
                   HashLen16(v.second, w.second) + x);
}

uint64 CityHash64WithSeed(const char *s, size_t len, uint64 seed) {
  return CityHash64WithSeeds(s, len, k2, seed);
}

uint64 CityHash64WithSeeds(const char *s, size_t len,
                           uint64 seed0, uint64 seed1) {
  return HashLen16(CityHash64(s, len) - seed0, seed1);
}

// A subroutine for CityHash128().  Returns a decent 128-bit hash for strings
// of any length representable in signed long.  Based on City and Murmur.
static uint128 CityMurmur(const char *s, size_t len, uint128 seed) {
  uint64 a = Uint128Low64(seed);
  uint64 b = Uint128High64(seed);
  uint64 c = 0;
  uint64 d = 0;
  signed long l = len - 16;
  if (l <= 0) {  // len <= 16
    a = ShiftMix(a * k1) * k1;
    c = b * k1 + HashLen0to16(s, len);
    d = ShiftMix(a + (len >= 8 ? Fetch64(s) : c));
  } else {  // len > 16
    c = HashLen16(Fetch64(s + len - 8) + k1, a);
    d = HashLen16(b + len, c + Fetch64(s + len - 16));
    a += d;
    do {
      a ^= ShiftMix(Fetch64(s) * k1) * k1;
      a *= k1;
      b ^= a;
      c ^= ShiftMix(Fetch64(s + 8) * k1) * k1;
      c *= k1;
      d ^= c;
      s += 16;
      l -= 16;
    } while (l > 0);
  }
  a = HashLen16(a, c);
  b = HashLen16(d, b);
  return uint128(a ^ b, HashLen16(b, a));
}

uint128 CityHash128WithSeed(const char *s, size_t len, uint128 seed) {
  if (len < 128) {
    return CityMurmur(s, len, seed);
  }

  // We expect len >= 128 to be the common case.  Keep 56 bytes of state:
  // v, w, x, y, and z.
  pair<uint64, uint64> v, w;
  uint64 x = Uint128Low64(seed);
  uint64 y = Uint128High64(seed);
  uint64 z = len * k1;
  v.first = Rotate(y ^ k1, 49) * k1 + Fetch64(s);
  v.second = Rotate(v.first, 42) * k1 + Fetch64(s + 8);
  w.first = Rotate(y + z, 35) * k1 + x;
  w.second = Rotate(x + Fetch64(s + 88), 53) * k1;

  // This is the same inner loop as CityHash64(), manually unrolled.
  do {
    x = Rotate(x + y + v.first + Fetch64(s + 8), 37) * k1;
    y = Rotate(y + v.second + Fetch64(s + 48), 42) * k1;
    x ^= w.second;
    y += v.first + Fetch64(s + 40);
    z = Rotate(z + w.first, 33) * k1;
    v = WeakHashLen32WithSeeds(s, v.second * k1, x + w.first);
    w = WeakHashLen32WithSeeds(s + 32, z + w.second, y + Fetch64(s + 16));
    std::swap(z, x);
    s += 64;
    x = Rotate(x + y + v.first + Fetch64(s + 8), 37) * k1;
    y = Rotate(y + v.second + Fetch64(s + 48), 42) * k1;
    x ^= w.second;
    y += v.first + Fetch64(s + 40);
    z = Rotate(z + w.first, 33) * k1;
    v = WeakHashLen32WithSeeds(s, v.second * k1, x + w.first);
    w = WeakHashLen32WithSeeds(s + 32, z + w.second, y + Fetch64(s + 16));
    std::swap(z, x);
    s += 64;
    len -= 128;
  } while (LIKELY(len >= 128));
  x += Rotate(v.first + z, 49) * k0;
  z += Rotate(w.first, 37) * k0;
  // If 0 < len < 128, hash up to 4 chunks of 32 bytes each from the end of s.
  for (size_t tail_done = 0; tail_done < len; ) {
    tail_done += 32;
    y = Rotate(x + y, 42) * k0 + v.second;
    w.first += Fetch64(s + len - tail_done + 16);
    x = x * k0 + w.first;
    z += w.second + Fetch64(s + len - tail_done);
    w.second += v.first;
    v = WeakHashLen32WithSeeds(s + len - tail_done, v.first + z, v.second);
  }
  // At this point our 56 bytes of state should contain more than
  // enough information for a strong 128-bit hash.  We use two
  // different 56-byte-to-8-byte hashes to get a 16-byte final result.
  x = HashLen16(x, v.first);
  y = HashLen16(y + z, w.first);
  return uint128(HashLen16(x + v.second, w.second) + y,
                 HashLen16(x + w.second, y + v.second));
}

uint128 CityHash128(const char *s, size_t len) {
  if (len >= 16) {
    return CityHash128WithSeed(s + 16,
                               len - 16,
                               uint128(Fetch64(s) ^ k3,
                                       Fetch64(s + 8)));
  } else if (len >= 8) {
    return CityHash128WithSeed(NULL,
                               0,
                               uint128(Fetch64(s) ^ (len * k0),
                                       Fetch64(s + len - 8) ^ k1));
  } else {
    return CityHash128WithSeed(s, len, uint128(k0, k1));
  }
}

#if defined(__SSE4_2__) && defined(__x86_64__)
#include <nmmintrin.h>

// Requires len >= 240.
static void CityHashCrc256Long(const char *s, size_t len,
                               uint32 seed, uint64 *result) {
  uint64 a = Fetch64(s + 56) + k0;
  uint64 b = Fetch64(s + 96) + k0;
  uint64 c = result[0] = HashLen16(b, len);
  uint64 d = result[1] = Fetch64(s + 120) * k0 + len;
  uint64 e = Fetch64(s + 184) + seed;
  uint64 f = seed;
  uint64 g = 0;
  uint64 h = 0;
  uint64 i = 0;
  uint64 j = 0;
  uint64 t = c + d;

  // 240 bytes of input per iter.
  size_t iters = len / 240;
  len -= iters * 240;
  do {
#define CHUNK(multiplier, z)                                    \
    {                                                           \
      uint64 old_a = a;                                         \
      a = Rotate(b, 41 ^ z) * multiplier + Fetch64(s);          \
      b = Rotate(c, 27 ^ z) * multiplier + Fetch64(s + 8);      \
      c = Rotate(d, 41 ^ z) * multiplier + Fetch64(s + 16);     \
      d = Rotate(e, 33 ^ z) * multiplier + Fetch64(s + 24);     \
      e = Rotate(t, 25 ^ z) * multiplier + Fetch64(s + 32);     \
      t = old_a;                                                \
    }                                                           \
    f = _mm_crc32_u64(f, a);                                    \
    g = _mm_crc32_u64(g, b);                                    \
    h = _mm_crc32_u64(h, c);                                    \
    i = _mm_crc32_u64(i, d);                                    \
    j = _mm_crc32_u64(j, e);                                    \
    s += 40

    CHUNK(1, 1); CHUNK(k0, 0);
    CHUNK(1, 1); CHUNK(k0, 0);
    CHUNK(1, 1); CHUNK(k0, 0);
  } while (--iters > 0);

  while (len >= 40) {
    CHUNK(k0, 0);
    len -= 40;
  }
  if (len > 0) {
    s = s + len - 40;
    CHUNK(k0, 0);
  }
  j += i << 32;
  a = HashLen16(a, j);
  h += g << 32;
  b += h;
  c = HashLen16(c, f) + i;
  d = HashLen16(d, e + result[0]);
  j += e;
  i += HashLen16(h, t);
  e = HashLen16(a, d) + j;
  f = HashLen16(b, c) + a;
  g = HashLen16(j, i) + c;
  result[0] = e + f + g + h;
  a = ShiftMix((a + g) * k0) * k0 + b;
  result[1] += a + result[0];
  a = ShiftMix(a * k0) * k0 + c;
  result[2] = a + result[1];
  a = ShiftMix((a + e) * k0) * k0;
  result[3] = a + result[2];
}

// Requires len < 240.
static void CityHashCrc256Short(const char *s, size_t len, uint64 *result) {
  char buf[240];
  memcpy(buf, s, len);
  memset(buf + len, 0, 240 - len);
  CityHashCrc256Long(buf, 240, ~static_cast<uint32>(len), result);
}

void CityHashCrc256(const char *s, size_t len, uint64 *result) {
  if (LIKELY(len >= 240)) {
    CityHashCrc256Long(s, len, 0, result);
  } else {
    CityHashCrc256Short(s, len, result);
  }
}

uint128 CityHashCrc128WithSeed(const char *s, size_t len, uint128 seed) {
  if (len <= 900) {
    return CityHash128WithSeed(s, len, seed);
  } else {
    uint64 result[4];
    CityHashCrc256(s, len, result);
    uint64 u = Uint128High64(seed) + result[0];
    uint64 v = Uint128Low64(seed) + result[1];
    return uint128(HashLen16(u, v + result[2]),
                   HashLen16(Rotate(v, 32), u * k0 + result[3]));
  }
}

uint128 CityHashCrc128(const char *s, size_t len) {
  if (len <= 900) {
    return CityHash128(s, len);
  } else {
    uint64 result[4];
    CityHashCrc256(s, len, result);
    return uint128(result[2], result[3]);
  }
}

#endif


================================================
FILE: benchmarks/City.h
================================================
// Copyright (c) 2011 Google, Inc.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
//
// CityHash, by Geoff Pike and Jyrki Alakuijala
//
// This file provides a few functions for hashing strings. On x86-64
// hardware in 2011, CityHash64() is faster than other high-quality
// hash functions, such as Murmur.  This is largely due to higher
// instruction-level parallelism.  CityHash64() and CityHash128() also perform
// well on hash-quality tests.
//
// CityHash128() is optimized for relatively long strings and returns
// a 128-bit hash.  For strings more than about 2000 bytes it can be
// faster than CityHash64().
//
// Functions in the CityHash family are not suitable for cryptography.
//
// WARNING: This code has not been tested on big-endian platforms!
// It is known to work well on little-endian platforms that have a small penalty
// for unaligned reads, such as current Intel and AMD moderate-to-high-end CPUs.
//
// By the way, for some hash functions, given strings a and b, the hash
// of a+b is easily derived from the hashes of a and b.  This property
// doesn't hold for any hash functions in this file.

#ifndef CITY_HASH_H_
#define CITY_HASH_H_

#include <stdlib.h>  // for size_t.
#include <utility>

// Microsoft Visual Studio may not have stdint.h.
#if defined(_MSC_VER) && (_MSC_VER < 1600)
typedef unsigned char uint8_t;
typedef unsigned int uint32_t;
typedef unsigned __int64 uint64_t;
#else  // defined(_MSC_VER)
#include <stdint.h>
#endif // !defined(_MSC_VER)

typedef uint8_t uint8;
typedef uint32_t uint32;
typedef uint64_t uint64;
typedef std::pair<uint64, uint64> uint128;

inline uint64 Uint128Low64(const uint128& x) { return x.first; }
inline uint64 Uint128High64(const uint128& x) { return x.second; }

// Hash function for a byte array.
uint64 CityHash64(const char *buf, size_t len);

// Hash function for a byte array.  For convenience, a 64-bit seed is also
// hashed into the result.
uint64 CityHash64WithSeed(const char *buf, size_t len, uint64 seed);

// Hash function for a byte array.  For convenience, two seeds are also
// hashed into the result.
uint64 CityHash64WithSeeds(const char *buf, size_t len,
                           uint64 seed0, uint64 seed1);

// Hash function for a byte array.
uint128 CityHash128(const char *s, size_t len);

// Hash function for a byte array.  For convenience, a 128-bit seed is also
// hashed into the result.
uint128 CityHash128WithSeed(const char *s, size_t len, uint128 seed);

// Hash 128 input bits down to 64 bits of output.
// This is intended to be a reasonably good hash function.
inline uint64 Hash128to64(const uint128& x) {
  // Murmur-inspired hashing.
  const uint64 kMul = 0x9ddfea08eb382d69ULL;
  uint64 a = (Uint128Low64(x) ^ Uint128High64(x)) * kMul;
  a ^= (a >> 47);
  uint64 b = (Uint128High64(x) ^ a) * kMul;
  b ^= (b >> 47);
  b *= kMul;
  return b;
}
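
/* Usage sketch (not part of the original CityHash sources).

   Hash128to64 above can also serve as a general way to fold two 64-bit
   values into one.  The block below restates the same arithmetic in plain
   C so it can be tried in isolation.  The names u128_sketch,
   hash128to64_sketch and combine_hashes_sketch are hypothetical and exist
   only for this illustration.

```c
#include <stdint.h>

// Hypothetical C stand-in for the C++ uint128 (std::pair) used above.
typedef struct {
  uint64_t low;
  uint64_t high;
} u128_sketch;

// Same arithmetic as Hash128to64 above, restated for plain C.
static inline uint64_t hash128to64_sketch(u128_sketch x) {
  const uint64_t kMul = 0x9ddfea08eb382d69ULL;
  uint64_t a = (x.low ^ x.high) * kMul;
  a ^= (a >> 47);
  uint64_t b = (x.high ^ a) * kMul;
  b ^= (b >> 47);
  b *= kMul;
  return b;
}

// Hypothetical helper: fold two independent 64-bit hashes into one.
static inline uint64_t combine_hashes_sketch(uint64_t h1, uint64_t h2) {
  u128_sketch x = {h1, h2};
  return hash128to64_sketch(x);
}
```

   The combination is order sensitive: swapping h1 and h2 changes the
   result, because the high half is mixed in a second time after the
   first multiplication.  */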

// Conditionally include declarations for versions of City that require SSE4.2
// instructions to be available.
#if defined(__SSE4_2__) && defined(__x86_64__)

// Hash function for a byte array.
uint128 CityHashCrc128(const char *s, size_t len);

// Hash function for a byte array.  For convenience, a 128-bit seed is also
// hashed into the result.
uint128 CityHashCrc128WithSeed(const char *s, size_t len, uint128 seed);

// Hash function for a byte array.  Sets result[0] ... result[3].
void CityHashCrc256(const char *s, size_t len, uint64 *result);

#endif  // __SSE4_2__

#endif  // CITY_HASH_H_


================================================
FILE: benchmarks/SpookyV2.cpp
================================================
// Spooky Hash
// A 128-bit noncryptographic hash, for checksums and table lookup
// By Bob Jenkins.  Public domain.
//   Oct 31 2010: published framework, disclaimer ShortHash isn't right
//   Nov 7 2010: disabled ShortHash
//   Oct 31 2011: replace End, ShortMix, ShortEnd, enable ShortHash again
//   April 10 2012: buffer overflow on platforms without unaligned reads
//   July 12 2012: was passing out variables in final to in/out in short
//   July 30 2012: I reintroduced the buffer overflow
//   August 5 2012: SpookyV2: d = should be d += in short hash, and remove extra mix from long hash

#include <string.h>  // memcpy, memset
#include "SpookyV2.h"

#define ALLOW_UNALIGNED_READS 1
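
/* Sketch (not part of the original SpookyHash sources).

   With ALLOW_UNALIGNED_READS set to 1 the code below dereferences a
   uint64 pointer at whatever address the message starts.  Where that is
   not acceptable, a memcpy-based load is one common portable
   alternative; compilers typically reduce it to a single load on
   targets that tolerate unaligned access.  The name load64_sketch is
   hypothetical.

```c
#include <stdint.h>
#include <string.h>

// Read 8 bytes from any address, aligned or not, in native byte order.
static inline uint64_t load64_sketch(const void *p) {
  uint64_t v;
  memcpy(&v, p, sizeof v);
  return v;
}
```

   The value is read in native byte order, so, like the rest of this
   file, the result differs between little-endian and big-endian
   machines.  */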

//
// short hash ... it could be used on any message, 
// but it's used by Spooky just for short messages.
//
void SpookyHash::Short(
    const void *message,
    size_t length,
    uint64 *hash1,
    uint64 *hash2)
{
    uint64 buf[2*sc_numVars];
    union 
    { 
        const uint8 *p8; 
        uint32 *p32;
        uint64 *p64; 
        size_t i; 
    } u;

    u.p8 = (const uint8 *)message;
    
    if (!ALLOW_UNALIGNED_READS && (u.i & 0x7))
    {
        memcpy(buf, message, length);
        u.p64 = buf;
    }

    size_t remainder = length%32;
    uint64 a=*hash1;
    uint64 b=*hash2;
    uint64 c=sc_const;
    uint64 d=sc_const;

    if (length > 15)
    {
        const uint64 *end = u.p64 + (length/32)*4;
        
        // handle all complete sets of 32 bytes
        for (; u.p64 < end; u.p64 += 4)
        {
            c += u.p64[0];
            d += u.p64[1];
            ShortMix(a,b,c,d);
            a += u.p64[2];
            b += u.p64[3];
        }
        
        // Handle the case of 16+ remaining bytes.
        if (remainder >= 16)
        {
            c += u.p64[0];
            d += u.p64[1];
            ShortMix(a,b,c,d);
            u.p64 += 2;
            remainder -= 16;
        }
    }
    
    // Handle the last 0..15 bytes, and fold the message length into the
    // top byte of d
    d += ((uint64)length) << 56;
    switch (remainder)
    {
    case 15:
        d += ((uint64)u.p8[14]) << 48;
    case 14:
        d += ((uint64)u.p8[13]) << 40;
    case 13:
        d += ((uint64)u.p8[12]) << 32;
    case 12:
        d += u.p32[2];
        c += u.p64[0];
        break;
    case 11:
        d += ((uint64)u.p8[10]) << 16;
    case 10:
        d += ((uint64)u.p8[9]) << 8;
    case 9:
        d += (uint64)u.p8[8];
    case 8:
        c += u.p64[0];
        break;
    case 7:
        c += ((uint64)u.p8[6]) << 48;
    case 6:
        c += ((uint64)u.p8[5]) << 40;
    case 5:
        c += ((uint64)u.p8[4]) << 32;
    case 4:
        c += u.p32[0];
        break;
    case 3:
        c += ((uint64)u.p8[2]) << 16;
    case 2:
        c += ((uint64)u.p8[1]) << 8;
    case 1:
        c += (uint64)u.p8[0];
        break;
    case 0:
        c += sc_const;
        d += sc_const;
    }
    ShortEnd(a,b,c,d);
    *hash1 = a;
    *hash2 = b;
}




// do the whole hash in one call
void SpookyHash::Hash128(
    const void *message, 
    size_t length, 
    uint64 *hash1, 
    uint64 *hash2)
{
    if (length < sc_bufSize)
    {
        Short(message, length, hash1, hash2);
        return;
    }

    uint64 h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11;
    uint64 buf[sc_numVars];
    uint64 *end;
    union 
    { 
        const uint8 *p8; 
        uint64 *p64; 
        size_t i; 
    } u;
    size_t remainder;
    
    h0=h3=h6=h9  = *hash1;
    h1=h4=h7=h10 = *hash2;
    h2=h5=h8=h11 = sc_const;
    
    u.p8 = (const uint8 *)message;
    end = u.p64 + (length/sc_blockSize)*sc_numVars;

    // handle all whole sc_blockSize blocks of bytes
    if (ALLOW_UNALIGNED_READS || ((u.i & 0x7) == 0))
    {
        while (u.p64 < end)
        { 
            Mix(u.p64, h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
            u.p64 += sc_numVars;
        }
    }
    else
    {
        while (u.p64 < end)
        {
            memcpy(buf, u.p64, sc_blockSize);
            Mix(buf, h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
            u.p64 += sc_numVars;
        }
    }

    // handle the last partial block of sc_blockSize bytes
    remainder = (length - ((const uint8 *)end-(const uint8 *)message));
    memcpy(buf, end, remainder);
    memset(((uint8 *)buf)+remainder, 0, sc_blockSize-remainder);
    ((uint8 *)buf)[sc_blockSize-1] = (uint8)remainder;
    
    // do some final mixing 
    End(buf, h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
    *hash1 = h0;
    *hash2 = h1;
}
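
/* Sketch (not part of the original SpookyHash sources).

   The tail handling at the end of Hash128 copies the leftover bytes,
   zero-fills the rest of the block, and stores the leftover count in
   the last byte.  The block below isolates that padding step in plain
   C.  BLOCK_SIZE_SKETCH and pad_final_block_sketch are hypothetical
   names; the value 96 mirrors sc_blockSize.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK_SIZE_SKETCH = 96 };  // mirrors sc_blockSize (12 * 8 bytes)

// Build the padded final block: tail bytes, zero fill, then the tail
// length in the last byte.  Requires remainder < BLOCK_SIZE_SKETCH.
static inline void pad_final_block_sketch(uint8_t out[BLOCK_SIZE_SKETCH],
                                          const uint8_t *tail,
                                          size_t remainder) {
  memcpy(out, tail, remainder);
  memset(out + remainder, 0, BLOCK_SIZE_SKETCH - remainder);
  out[BLOCK_SIZE_SKETCH - 1] = (uint8_t)remainder;
}
```

   Storing the count in the last byte keeps two tails that differ only
   in trailing zero bytes from producing the same padded block.  */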



// init spooky state
void SpookyHash::Init(uint64 seed1, uint64 seed2)
{
    m_length = 0;
    m_remainder = 0;
    m_state[0] = seed1;
    m_state[1] = seed2;
}


// add a message fragment to the state
void SpookyHash::Update(const void *message, size_t length)
{
    uint64 h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11;
    size_t newLength = length + m_remainder;
    uint8  remainder;
    union 
    { 
        const uint8 *p8; 
        uint64 *p64; 
        size_t i; 
    } u;
    const uint64 *end;
    
    // Is this message fragment too short?  If it is, stuff it away.
    if (newLength < sc_bufSize)
    {
        memcpy(&((uint8 *)m_data)[m_remainder], message, length);
        m_length = length + m_length;
        m_remainder = (uint8)newLength;
        return;
    }
    
    // init the variables
    if (m_length < sc_bufSize)
    {
        h0=h3=h6=h9  = m_state[0];
        h1=h4=h7=h10 = m_state[1];
        h2=h5=h8=h11 = sc_const;
    }
    else
    {
        h0 = m_state[0];
        h1 = m_state[1];
        h2 = m_state[2];
        h3 = m_state[3];
        h4 = m_state[4];
        h5 = m_state[5];
        h6 = m_state[6];
        h7 = m_state[7];
        h8 = m_state[8];
        h9 = m_state[9];
        h10 = m_state[10];
        h11 = m_state[11];
    }
    m_length = length + m_length;
    
    // if we've got anything stuffed away, use it now
    if (m_remainder)
    {
        uint8 prefix = sc_bufSize-m_remainder;
        memcpy(&(((uint8 *)m_data)[m_remainder]), message, prefix);
        u.p64 = m_data;
        Mix(u.p64, h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
        Mix(&u.p64[sc_numVars], h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
        u.p8 = ((const uint8 *)message) + prefix;
        length -= prefix;
    }
    else
    {
        u.p8 = (const uint8 *)message;
    }
    
    // handle all whole blocks of sc_blockSize bytes
    end = u.p64 + (length/sc_blockSize)*sc_numVars;
    remainder = (uint8)(length-((const uint8 *)end-u.p8));
    if (ALLOW_UNALIGNED_READS || (u.i & 0x7) == 0)
    {
        while (u.p64 < end)
        { 
            Mix(u.p64, h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
            u.p64 += sc_numVars;
        }
    }
    else
    {
        while (u.p64 < end)
        { 
            memcpy(m_data, u.p8, sc_blockSize);
            Mix(m_data, h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
            u.p64 += sc_numVars;
        }
    }

    // stuff away the last few bytes
    m_remainder = remainder;
    memcpy(m_data, end, remainder);
    
    // stuff away the variables
    m_state[0] = h0;
    m_state[1] = h1;
    m_state[2] = h2;
    m_state[3] = h3;
    m_state[4] = h4;
    m_state[5] = h5;
    m_state[6] = h6;
    m_state[7] = h7;
    m_state[8] = h8;
    m_state[9] = h9;
    m_state[10] = h10;
    m_state[11] = h11;
}


// report the hash for the concatenation of all message fragments so far
void SpookyHash::Final(uint64 *hash1, uint64 *hash2)
{
    // init the variables
    if (m_length < sc_bufSize)
    {
        *hash1 = m_state[0];
        *hash2 = m_state[1];
        Short( m_data, m_length, hash1, hash2);
        return;
    }
    
    const uint64 *data = (const uint64 *)m_data;
    uint8 remainder = m_remainder;
    
    uint64 h0 = m_state[0];
    uint64 h1 = m_state[1];
    uint64 h2 = m_state[2];
    uint64 h3 = m_state[3];
    uint64 h4 = m_state[4];
    uint64 h5 = m_state[5];
    uint64 h6 = m_state[6];
    uint64 h7 = m_state[7];
    uint64 h8 = m_state[8];
    uint64 h9 = m_state[9];
    uint64 h10 = m_state[10];
    uint64 h11 = m_state[11];

    if (remainder >= sc_blockSize)
    {
        // m_data can contain two blocks; handle any whole first block
        Mix(data, h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
        data += sc_numVars;
        remainder -= sc_blockSize;
    }

    // mix in the last partial block, and the length mod sc_blockSize
    memset(&((uint8 *)data)[remainder], 0, (sc_blockSize-remainder));

    ((uint8 *)data)[sc_blockSize-1] = remainder;
    
    // do some final mixing
    End(data, h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);

    *hash1 = h0;
    *hash2 = h1;
}



================================================
FILE: benchmarks/SpookyV2.h
================================================
//
// SpookyHash: a 128-bit noncryptographic hash function
// By Bob Jenkins, public domain
//   Oct 31 2010: alpha, framework + SpookyHash::Mix appears right
//   Oct 31 2011: alpha again, Mix only good to 2^^69 but rest appears right
//   Dec 31 2011: beta, improved Mix, tested it for 2-bit deltas
//   Feb  2 2012: production, same bits as beta
//   Feb  5 2012: adjusted definitions of uint* to be more portable
//   Mar 30 2012: 3 bytes/cycle, not 4.  Alpha was 4 but wasn't thorough enough.
//   August 5 2012: SpookyV2 (different results)
// 
// Up to 3 bytes/cycle for long messages.  Reasonably fast for short messages.
// All 1 or 2 bit deltas achieve avalanche within 1% bias per output bit.
//
// This was developed for and tested on 64-bit x86-compatible processors.
// It assumes the processor is little-endian.  There is a macro
// controlling whether unaligned reads are allowed (by default they are).
// This should be an equally good hash on big-endian machines, but it will
// compute different results on them than on little-endian machines.
//
// Google's CityHash has similar specs to SpookyHash, and CityHash is faster
// on new Intel boxes.  MD4 and MD5 also have similar specs, but they are orders
// of magnitude slower.  CRCs are two or more times slower, but unlike 
// SpookyHash, they have nice math for combining the CRCs of pieces to form 
// the CRCs of wholes.  There are also cryptographic hashes, but those are even 
// slower than MD5.
//

#include <stddef.h>

#ifdef _MSC_VER
# define INLINE __forceinline
  typedef  unsigned __int64 uint64;
  typedef  unsigned __int32 uint32;
  typedef  unsigned __int16 uint16;
  typedef  unsigned __int8  uint8;
#else
# include <stdint.h>
# define INLINE inline
  typedef  uint64_t  uint64;
  typedef  uint32_t  uint32;
  typedef  uint16_t  uint16;
  typedef  uint8_t   uint8;
#endif


class SpookyHash
{
public:
    //
    // SpookyHash: hash a single message in one call, produce 128-bit output
    //
    static void Hash128(
        const void *message,  // message to hash
        size_t length,        // length of message in bytes
        uint64 *hash1,        // in/out: in seed 1, out hash value 1
        uint64 *hash2);       // in/out: in seed 2, out hash value 2

    //
    // Hash64: hash a single message in one call, return 64-bit output
    //
    static uint64 Hash64(
        const void *message,  // message to hash
        size_t length,        // length of message in bytes
        uint64 seed)          // seed
    {
        uint64 hash1 = seed;
        Hash128(message, length, &hash1, &seed);
        return hash1;
    }

    //
    // Hash32: hash a single message in one call, produce 32-bit output
    //
    static uint32 Hash32(
        const void *message,  // message to hash
        size_t length,        // length of message in bytes
        uint32 seed)          // seed
    {
        uint64 hash1 = seed, hash2 = seed;
        Hash128(message, length, &hash1, &hash2);
        return (uint32)hash1;
    }

    //
    // Init: initialize the context of a SpookyHash
    //
    void Init(
        uint64 seed1,       // any 64-bit value will do, including 0
        uint64 seed2);      // different seeds produce independent hashes
    
    //
    // Update: add a piece of a message to a SpookyHash state
    //
    void Update(
        const void *message,  // message fragment
        size_t length);       // length of message fragment in bytes


    //
    // Final: compute the hash for the current SpookyHash state
    //
    // This does not modify the state; you can keep updating it afterward
    //
    // The result is the same as if Hash128() had been called, with the
    // same seeds, on all the pieces concatenated into one message.
    //
    void Final(
        uint64 *hash1,    // out only: first 64 bits of hash value.
        uint64 *hash2);   // out only: second 64 bits of hash value.

    //
    // left rotate a 64-bit value by k bits (k must be in 1..63)
    //
    static INLINE uint64 Rot64(uint64 x, int k)
    {
        return (x << k) | (x >> (64 - k));
    }
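
/* Sketch (not part of the original SpookyHash sources).

   Rot64 above is only ever called with constant rotation counts between
   1 and 63, so the plain shift pair is fine there.  If a rotation count
   could be 0, the expression x >> (64 - k) would shift by the full
   width, which C and C++ leave undefined.  A masked variant, shown here
   in plain C under the hypothetical name rotl64_sketch, avoids that.

```c
#include <stdint.h>

// Rotate x left by k bits.  Masking keeps both shift counts in 0..63,
// so k == 0 (or any multiple of 64) is well defined as well.
static inline uint64_t rotl64_sketch(uint64_t x, unsigned k) {
  k &= 63u;
  return (x << k) | (x >> ((64u - k) & 63u));
}
```

   For k in 1..63 this computes the same value as Rot64.  */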

    //
    // This is used if the input is 192 bytes (sc_bufSize) long or longer.
    // Each call consumes one 96-byte block (sc_blockSize).
    //
    // The internal state is fully overwritten every 96 bytes.
    // Every input bit appears to cause at least 128 bits of entropy
    // before 96 other bytes are combined, when run forward or backward
    //   For every input bit,
    //   Two inputs differing in just that input bit
    //   Where "differ" means xor or subtraction
    //   And the base value is random
    //   When run forward or backwards one Mix
    // I tried 3 pairs of each; they all differed by at least 212 bits.
    //
    static INLINE void Mix(
        const uint64 *data, 
        uint64 &s0, uint64 &s1, uint64 &s2, uint64 &s3,
        uint64 &s4, uint64 &s5, uint64 &s6, uint64 &s7,
        uint64 &s8, uint64 &s9, uint64 &s10,uint64 &s11)
    {
      s0 += data[0];    s2 ^= s10;    s11 ^= s0;    s0 = Rot64(s0,11);    s11 += s1;
      s1 += data[1];    s3 ^= s11;    s0 ^= s1;    s1 = Rot64(s1,32);    s0 += s2;
      s2 += data[2];    s4 ^= s0;    s1 ^= s2;    s2 = Rot64(s2,43);    s1 += s3;
      s3 += data[3];    s5 ^= s1;    s2 ^= s3;    s3 = Rot64(s3,31);    s2 += s4;
      s4 += data[4];    s6 ^= s2;    s3 ^= s4;    s4 = Rot64(s4,17);    s3 += s5;
      s5 += data[5];    s7 ^= s3;    s4 ^= s5;    s5 = Rot64(s5,28);    s4 += s6;
      s6 += data[6];    s8 ^= s4;    s5 ^= s6;    s6 = Rot64(s6,39);    s5 += s7;
      s7 += data[7];    s9 ^= s5;    s6 ^= s7;    s7 = Rot64(s7,57);    s6 += s8;
      s8 += data[8];    s10 ^= s6;    s7 ^= s8;    s8 = Rot64(s8,55);    s7 += s9;
      s9 += data[9];    s11 ^= s7;    s8 ^= s9;    s9 = Rot64(s9,54);    s8 += s10;
      s10 += data[10];    s0 ^= s8;    s9 ^= s10;    s10 = Rot64(s10,22);    s9 += s11;
      s11 += data[11];    s1 ^= s9;    s10 ^= s11;    s11 = Rot64(s11,46);    s10 += s0;
    }

    //
    // Mix all 12 inputs together so that h0, h1 are a hash of them all.
    //
    // For two inputs differing in just the input bits
    // Where "differ" means xor or subtraction
    // And the base value is random, or a counting value starting at that bit
    // The final result will have each bit of h0, h1 flip
    // For every input bit,
    // with probability 50 +- .3%
    // For every pair of input bits,
    // with probability 50 +- 3%
    //
    // This does not rely on the last Mix() call having already mixed some.
    // Two iterations was almost good enough for a 64-bit result, but a
    // 128-bit result is reported, so End() does three iterations.
    //
    static INLINE void EndPartial(
        uint64 &h0, uint64 &h1, uint64 &h2, uint64 &h3,
        uint64 &h4, uint64 &h5, uint64 &h6, uint64 &h7, 
        uint64 &h8, uint64 &h9, uint64 &h10,uint64 &h11)
    {
        h11+= h1;    h2 ^= h11;   h1 = Rot64(h1,44);
        h0 += h2;    h3 ^= h0;    h2 = Rot64(h2,15);
        h1 += h3;    h4 ^= h1;    h3 = Rot64(h3,34);
        h2 += h4;    h5 ^= h2;    h4 = Rot64(h4,21);
        h3 += h5;    h6 ^= h3;    h5 = Rot64(h5,38);
        h4 += h6;    h7 ^= h4;    h6 = Rot64(h6,33);
        h5 += h7;    h8 ^= h5;    h7 = Rot64(h7,10);
        h6 += h8;    h9 ^= h6;    h8 = Rot64(h8,13);
        h7 += h9;    h10^= h7;    h9 = Rot64(h9,38);
        h8 += h10;   h11^= h8;    h10= Rot64(h10,53);
        h9 += h11;   h0 ^= h9;    h11= Rot64(h11,42);
        h10+= h0;    h1 ^= h10;   h0 = Rot64(h0,54);
    }

    static INLINE void End(
        const uint64 *data, 
        uint64 &h0, uint64 &h1, uint64 &h2, uint64 &h3,
        uint64 &h4, uint64 &h5, uint64 &h6, uint64 &h7, 
        uint64 &h8, uint64 &h9, uint64 &h10,uint64 &h11)
    {
        h0 += data[0];   h1 += data[1];   h2 += data[2];   h3 += data[3];
        h4 += data[4];   h5 += data[5];   h6 += data[6];   h7 += data[7];
        h8 += data[8];   h9 += data[9];   h10 += data[10]; h11 += data[11];
        EndPartial(h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
        EndPartial(h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
        EndPartial(h0,h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11);
    }

    //
    // The goal is for each bit of the input to expand into 128 bits of 
    //   apparent entropy before it is fully overwritten.
    // n trials both set and cleared at least m bits of h0 h1 h2 h3
    //   n: 2   m: 29
    //   n: 3   m: 46
    //   n: 4   m: 57
    //   n: 5   m: 107
    //   n: 6   m: 146
    //   n: 7   m: 152
    // when run forwards or backwards
    // for all 1-bit and 2-bit diffs
    // with diffs defined by either xor or subtraction
    // with a base of all zeros plus a counter, or plus another bit, or random
    //
    static INLINE void ShortMix(uint64 &h0, uint64 &h1, uint64 &h2, uint64 &h3)
    {
        h2 = Rot64(h2,50);  h2 += h3;  h0 ^= h2;
        h3 = Rot64(h3,52);  h3 += h0;  h1 ^= h3;
        h0 = Rot64(h0,30);  h0 += h1;  h2 ^= h0;
        h1 = Rot64(h1,41);  h1 += h2;  h3 ^= h1;
        h2 = Rot64(h2,54);  h2 += h3;  h0 ^= h2;
        h3 = Rot64(h3,48);  h3 += h0;  h1 ^= h3;
        h0 = Rot64(h0,38);  h0 += h1;  h2 ^= h0;
        h1 = Rot64(h1,37);  h1 += h2;  h3 ^= h1;
        h2 = Rot64(h2,62);  h2 += h3;  h0 ^= h2;
        h3 = Rot64(h3,34);  h3 += h0;  h1 ^= h3;
        h0 = Rot64(h0,5);   h0 += h1;  h2 ^= h0;
        h1 = Rot64(h1,36);  h1 += h2;  h3 ^= h1;
    }

    //
    // Mix all 4 inputs together so that h0, h1 are a hash of them all.
    //
    // For two inputs differing in just the input bits
    // Where "differ" means xor or subtraction
    // And the base value is random, or a counting value starting at that bit
    // The final result will have each bit of h0, h1 flip
    // For every input bit,
    // with probability 50 +- .3% (it is probably better than that)
    // For every pair of input bits,
    // with probability 50 +- .75% (the worst case is approximately that)
    //
    static INLINE void ShortEnd(uint64 &h0, uint64 &h1, uint64 &h2, uint64 &h3)
    {
        h3 ^= h2;  h2 = Rot64(h2,15);  h3 += h2;
        h0 ^= h3;  h3 = Rot64(h3,52);  h0 += h3;
        h1 ^= h0;  h0 = Rot64(h0,26);  h1 += h0;
        h2 ^= h1;  h1 = Rot64(h1,51);  h2 += h1;
        h3 ^= h2;  h2 = Rot64(h2,28);  h3 += h2;
        h0 ^= h3;  h3 = Rot64(h3,9);   h0 += h3;
        h1 ^= h0;  h0 = Rot64(h0,47);  h1 += h0;
        h2 ^= h1;  h1 = Rot64(h1,54);  h2 += h1;
        h3 ^= h2;  h2 = Rot64(h2,32);  h3 += h2;
        h0 ^= h3;  h3 = Rot64(h3,25);  h0 += h3;
        h1 ^= h0;  h0 = Rot64(h0,63);  h1 += h0;
    }
    
private:

    //
    // Short is used for messages under 192 bytes in length
    // Short has a low startup cost, the normal mode is good for long
    // keys, the cost crossover is at about 192 bytes.  The two modes were
    // held to the same quality bar.
    // 
    static void Short(
        const void *message,  // message (array of bytes, not necessarily aligned)
        size_t length,        // length of message (in bytes)
        uint64 *hash1,        // in/out: in the seed, out the hash value
        uint64 *hash2);       // in/out: in the seed, out the hash value

    // number of uint64's in internal state
    static const size_t sc_numVars = 12;

    // size of the internal state, in bytes
    static const size_t sc_blockSize = sc_numVars*8;

    // size of buffer of unhashed data, in bytes
    static const size_t sc_bufSize = 2*sc_blockSize;

    //
    // sc_const: a constant which:
    //  * is not zero
    //  * is odd
    //  * is a not-very-regular mix of 1's and 0's
    //  * does not need any other special mathematical properties
    //
    static const uint64 sc_const = 0xdeadbeefdeadbeefULL;

    uint64 m_data[2*sc_numVars];   // unhashed data, for partial messages
    uint64 m_state[sc_numVars];  // internal state of the hash
    size_t m_length;             // total length of the input so far
    uint8  m_remainder;          // length of unhashed data stashed in m_data
};





================================================
FILE: benchmarks/bbs-prng.h
================================================
/* Copyright (c) 2016 Vladimir Makarov <vmakarov@gcc.gnu.org>

   Permission is hereby granted, free of charge, to any person
   obtaining a copy of this software and associated documentation
   files (the "Software"), to deal in the Software without
   restriction, including without limitation the rights to use, copy,
   modify, merge, publish, distribute, sublicense, and/or sell copies
   of the Software, and to permit persons to whom the Software is
   furnished to do so, subject to the following conditions:

   The above copyright notice and this permission notice shall be
   included in all copies or substantial portions of the Software.

   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
   NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
   BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
   ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
   CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
   SOFTWARE.
*/

/* Blum-Blum-Shub Pseudo Random Number Generator (PRNG).  It is a
   crypto level PRNG: predicting the next generated number is
   believed to be computationally hard, as it is as hard as solving
   the quadratic residuosity problem modulo N.

   The PRNG equation is simple x[n+1] = (x[n] * x[n]) mod N, where N
   is a product of two large prime numbers.

   The PRNG finds the two prime numbers during the PRNG
   initialization.  The prime number search is a probabilistic one.
   The numbers might be composite but the probability of this is very
   small (about 4^(-100)).

   Big number arithmetic is implemented by GMP, so you need to link
   the GMP library (-lgmp) if you are going to use the PRNG.

   To use the generator, call `init_bbs_prng` first, then call
   `get_bbs_prn` as many times as you need to get a new PRN.  At the
   end of the PRNG use, call `finish_bbs_prng`.  You can change the
   default seed by calling `set_bbs_seed`.

   The PRNG passes NIST Statistical Test Suite for Random and
   Pseudorandom Number Generators for Cryptographic Applications
   (version 2.2.1) with 1000 bitstreams each containing 1M bits.

   The generation of a new number takes about 73K CPU cycles on x86_64
   (Intel 4.2GHz i7-4790K), i.e. the generation speed is about 58K
   numbers per second. */

#ifndef __BBS_PRNG__
#define __BBS_PRNG__

#ifdef _MSC_VER
typedef unsigned __int32 uint32_t;
typedef unsigned __int64 uint64_t;
#else
#include <stdint.h>
#endif

#include <stdlib.h>
#include <gmp.h>

/* What size the numbers should be to make predicting the next number
   hard, and how hard the prediction can be for a given size, is a
   tricky question.  Please read the Blum Blum Shub article and the
   numerous discussions on the Internet.  I believe the default value
   is good for my purposes.  */
#ifndef BBS_PRIME_BITS
#define BBS_PRIME_BITS 512
#endif

/* BBS N, which is a product of two primes, and the last generated
   pseudo random number, which can also be considered the current
   state of the PRNG.  */
static MP_INT _BBS_N, _BBS_xn;

/* Set up _BBS_N and _BBS_xn. */
static inline void
init_bbs_prng (void) {
  int i, n;
  MP_INT start;
  
  mpz_init (&start);
  mpz_init (&_BBS_N);
  mpz_init (&_BBS_xn);
  for (n = 0; n != 3;) {
    mpz_set_ui (&start, 0);
    for (i = 0; i < BBS_PRIME_BITS; i++) {
      mpz_mul_ui (&start, &start, 2);
      mpz_add_ui (&start, &start, rand () % 2);
    }
    if (mpz_probab_prime_p (&start, 100) == 0
        /* The following is a BBS requirement (the prime must be
           congruent to 3 mod 4) so that each quadratic residue has
           exactly one square root that is itself a quadratic residue.  */
        || mpz_tdiv_ui (&start, 4) != 3)
      continue;
    if (n == 0) mpz_set (&_BBS_N, &start);
    else if (n == 1) mpz_mul (&_BBS_N, &_BBS_N, &start);
    else mpz_set (&_BBS_xn, &start);
    n++;
  }
  mpz_clear (&start);
}

/* Make _BBS_xn equal to SEED.  */
static inline void
set_bbs_seed (uint32_t seed) {
  mpz_set_ui (&_BBS_xn, seed);
}


/* Update _BBS_xn.  */
static inline void
_update_bbs_prng (void) {
  mpz_mul (&_BBS_xn, &_BBS_xn, &_BBS_xn);
  mpz_tdiv_r (&_BBS_xn, &_BBS_xn, &_BBS_N);
}

/* Return the next pseudo random number.  */
static inline uint64_t
get_bbs_prn (void) {
  int i;
  uint64_t res = 0;
  
  for (i = 0; i < 32; i++) {
    _update_bbs_prng ();
    /* We conservatively use only 2 LS bits.  Depending on the
       BBS_PRIME_BITS value it could be more.  */
    res = res << 2 | mpz_tdiv_ui (&_BBS_xn, 4);
  }
  return res;
}
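
/* Toy sketch (not part of the original header, and not secure).

   The recurrence x[n+1] = (x[n] * x[n]) mod N and the 2-bit extraction
   used by get_bbs_prn can be followed by hand with tiny numbers.  The
   block below uses the hypothetical primes 11 and 23 (both congruent to
   3 mod 4) so that everything fits in 64-bit integers without GMP.  All
   names ending in _sketch, and the TOY_ constants, are made up for this
   illustration.

```c
#include <stdint.h>

// Toy Blum primes, both congruent to 3 mod 4.  Far too small for any
// real use; chosen only so the arithmetic fits in 64 bits.
enum { TOY_P = 11, TOY_Q = 23, TOY_N = TOY_P * TOY_Q };

// One BBS step: x[n+1] = (x[n] * x[n]) mod N.  Requires x < TOY_N.
static inline uint64_t toy_bbs_step_sketch(uint64_t x) {
  return (x * x) % TOY_N;
}

// Advance the state `count` times (count <= 32) and pack the 2 least
// significant bits of each new state into the result, as get_bbs_prn
// does with the real state.
static inline uint64_t toy_bbs_bits_sketch(uint64_t *state, int count) {
  uint64_t res = 0;
  for (int i = 0; i < count; i++) {
    *state = toy_bbs_step_sketch(*state);
    res = res << 2 | (*state % 4);
  }
  return res;
}
```

   Starting from the state 3, the first four states are 9, 81, 236 and
   36, whose low two bits are 01, 01, 00 and 00, so four steps pack to
   0x50.  */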

/* Finish work with the PRNG.  */
static inline void
finish_bbs_prng (void) {
  mpz_clear (&_BBS_N);
  mpz_clear (&_BBS_xn);
}

#endif


================================================
FILE: benchmarks/bench-crypto.c
================================================
#if defined(SHA2)

#include "sha512.h"
void sha512_test (const void *msg, int len, void *out) {
  sha512_ctx ctx;
  
  rhash_sha512_init (&ctx);
  rhash_sha512_update (&ctx, msg, len);
  rhash_sha512_final (&ctx, out);
}

#define test sha512_test

#elif defined(SHA3)

#include "sha3.h"
void sha3_512_test (const void *msg, int len, void *out) {
  sha3_ctx ctx;
  
  rhash_sha3_512_init (&ctx);
  rhash_sha3_update (&ctx, msg, len);
  rhash_sha3_final (&ctx, out);
}

#define test sha3_512_test

#elif defined(BLAKE2B)

#include "blake2.h"
void blake2b_test (const void *msg, int len, void *out) {
  static const uint64_t key[4] = {0, 0, 0, 0};
  blake2b ((uint8_t *) out, msg, key, 64, len, 32);
}

#define test blake2b_test

#elif defined(MUM512)

#include "mum512.h"

void mum512_test (const void *msg, int len, void *out) {
  mum512_hash (msg, len, out);
}

#define test mum512_test

#else
#error "I don't know what to test"
#endif

static const char msg[] =
"SATURDAY morning was come, and all the summer world was bright and\n\
fresh, and brimming with life. There was a song in every heart; and if\n\
the heart was young the music issued at the lips. There was cheer in\n\
every face and a spring in every step. The locust-trees were in bloom\n\
and the fragrance of the blossoms filled the air. Cardiff Hill, beyond\n\
the village and above it, was green with vegetation and it lay just far\n\
enough away to seem a Delectable Land, dreamy, reposeful, and inviting.\n\
\n\
Tom appeared on the sidewalk with a bucket of whitewash and a\n\
long-handled brush. He surveyed the fence, and all gladness left him and\n\
a deep melancholy settled down upon his spirit. Thirty yards of board\n\
fence nine feet high. Life to him seemed hollow, and existence but a\n\
burden. Sighing, he dipped his brush and passed it along the topmost\n\
plank; repeated the operation; did it again; compared the insignificant\n\
whitewashed streak with the far-reaching continent of unwhitewashed\n\
fence, and sat down on a tree-box discouraged. Jim came skipping out at\n\
the gate with a tin pail, and singing Buffalo Gals. Bringing water from\n\
the town pump had always been hateful work in Tom's eyes, before, but\n\
now it did not strike him so. He remembered that there was company at\n\
the pump. White, mulatto, and negro boys and girls were always there\n\
waiting their turns, resting, trading playthings, quarrelling, fighting,\n\
skylarking. And he remembered that although the pump was only a hundred\n\
and fifty yards off, Jim never got back with a bucket of water under an\n\
hour--and even then somebody generally had to go after him. Tom said:\n\
\n\
\"Say, Jim, I'll fetch the water if you'll whitewash some.\"\n\
\n\
Jim shook his head and said:\n\
\n\
\"Can't, Mars Tom. Ole missis, she tole me I got to go an' git dis water\n\
an' not stop foolin' roun' wid anybody. She say she spec' Mars Tom gwine\n\
to ax me to whitewash, an' so she tole me go 'long an' 'tend to my own\n\
business--she 'lowed _she'd_ 'tend to de whitewashin'.\"\n\
\n\
\"Oh, never you mind what she said, Jim. That's the way she always talks.\n\
Gimme the bucket--I won't be gone only a a minute. _She_ won't ever\n\
know.\"\n\
\n\
\"Oh, I dasn't, Mars Tom. Ole missis she'd take an' tar de head off'n me.\n\
'Deed she would.\"\n\
\n\
\"_She_! She never licks anybody--whacks 'em over the head with her\n\
thimble--and who cares for that, I'd like to know. She talks awful, but\n\
talk don't hurt--anyways it don't if she don't cry. Jim, I'll give you a\n\
marvel. I'll give you a white alley!\"\n\
\n\
Jim began to waver.\n\
\n\
\"White alley, Jim! And it's a bully taw.\"\n\
\n\
\"My! Dat's a mighty gay marvel, I tell you! But Mars Tom I's powerful\n\
'fraid ole missis--\"\n\
\n\
\"And besides, if you will I'll show you my sore toe.\"\n\
\n\
Jim was only human--this attraction was too much for him. He put down\n\
his pail, took the white alley, and bent over the toe with absorbing\n\
interest while the bandage was being unwound. In another moment he\n\
was flying down the street with his pail and a tingling rear, Tom was\n\
whitewashing with vigor, and Aunt Polly was retiring from the field with\n\
a slipper in her hand and triumph in her eye.\n\
\n\
But Tom's energy did not last. He began to think of the fun he had\n\
planned for this day, and his sorrows multiplied. Soon the free boys\n\
would come tripping along on all sorts of delicious expeditions, and\n\
they would make a world of fun of him for having to work--the very\n\
thought of it burnt him like fire. He got out his worldly wealth and\n\
examined it--bits of toys, marbles, and trash; enough to buy an exchange\n\
of _work_, maybe, but not half enough to buy so much as half an hour\n\
of pure freedom. So he returned his straitened means to his pocket, and\n\
gave up the idea of trying to buy the boys. At this dark and hopeless\n\
moment an inspiration burst upon him! Nothing less than a great,\n\
magnificent inspiration.\n\
\n\
He took up his brush and went tranquilly to work. Ben Rogers hove in\n\
sight presently--the very boy, of all boys, whose ridicule he had been\n\
dreading. Ben's gait was the hop-skip-and-jump--proof enough that his\n\
heart was light and his anticipations high. He was eating an apple, and\n\
giving a long, melodious whoop, at intervals, followed by a deep-toned\n\
ding-dong-dong, ding-dong-dong, for he was personating a steamboat. As\n\
he drew near, he slackened speed, took the middle of the street, leaned\n\
far over to starboard and rounded to ponderously and with laborious pomp\n\
and circumstance--for he was personating the Big Missouri, and considered\n\
himself to be drawing nine feet of water. He was boat and captain and\n\
engine-bells combined, so he had to imagine himself standing on his own\n\
hurricane-deck giving the orders and executing them:\n\
\n\
\"Stop her, sir! Ting-a-ling-ling!\" The headway ran almost out, and he\n\
drew up slowly toward the sidewalk.\n\
\n\
\"Ship up to back! Ting-a-ling-ling!\" His arms straightened and stiffened\n\
down his sides.\n\
\n\
\"Set her back on the stabboard! Ting-a-ling-ling! Chow! ch-chow-wow!\n\
Chow!\" His right hand, mean-time, describing stately circles--for it was\n\
representing a forty-foot wheel.\n\
\n\
\"Let her go back on the labboard! Ting-a-ling-ling! Chow-ch-chow-chow!\"\n\
The left hand began to describe circles.\n\
\n\
\"Stop the stabboard! Ting-a-ling-ling! Stop the labboard! Come ahead on\n\
the stabboard! Stop her! Let your outside turn over slow! Ting-a-ling-ling!\n\
Chow-ow-ow! Get out that head-line! _lively_ now! Come--out with\n\
your spring-line--what're you about there! Take a turn round that stump\n\
with the bight of it! Stand by that stage, now--let her go! Done with\n\
the engines, sir! Ting-a-ling-ling! SH'T! S'H'T! SH'T!\" (trying the\n\
gauge-cocks).\n\
\n\
Tom went on whitewashing--paid no attention to the steamboat. Ben stared\n\
a moment and then said: \"_Hi-Yi! You're_ up a stump, ain't you!\"\n\
\n\
No answer. Tom surveyed his last touch with the eye of an artist, then\n\
he gave his brush another gentle sweep and surveyed the result, as\n\
before. Ben ranged up alongside of him. Tom's mouth watered for the\n\
apple, but he stuck to his work. Ben said:\n\
\n\
\"Hello, old chap, you got to work, hey?\"\n\
\n\
Tom wheeled suddenly and said:\n\
\n\
\"Why, it's you, Ben! I warn't noticing.\"\n\
\n\
\"Say--I'm going in a-swimming, I am. Don't you wish you could? But of\n\
course you'd druther _work_--wouldn't you? Course you would!\"\n\
\n\
Tom contemplated the boy a bit, and said:\n\
\n\
\"What do you call work?\"\n\
\n\
\"Why, ain't _that_ work?\"\n\
\n\
Tom resumed his whitewashing, and answered carelessly:\n\
\n\
\"Well, maybe it is, and maybe it ain't. All I know, is, it suits Tom\n\
Sawyer.\"\n\
\n\
\"Oh come, now, you don't mean to let on that you _like_ it?\"\n\
\n\
The brush continued to move.\n\
\n\
\"Like it? Well, I don't see why I oughtn't to like it. Does a boy get a\n\
chance to whitewash a fence every day?\"\n\
\n\
That put the thing in a new light. Ben stopped nibbling his apple.\n\
Tom swept his brush daintily back and forth--stepped back to note the\n\
effect--added a touch here and there--criticised the effect again--Ben\n\
watching every move and getting more and more interested, more and more\n\
absorbed. Presently he said:\n\
\n\
\"Say, Tom, let _me_ whitewash a little.\"\n\
\n\
Tom considered, was about to consent; but he altered his mind:\n\
\n\
\"No--no--I reckon it wouldn't hardly do, Ben. You see, Aunt Polly's awful\n\
particular about this fence--right here on the street, you know--but if it\n\
was the back fence I wouldn't mind and _she_ wouldn't. Yes, she's awful\n\
particular about this fence; it's got to be done very careful; I reckon\n\
there ain't one boy in a thousand, maybe two thousand, that can do it\n\
the way it's got to be done.\"\n\
\n\
\"No--is that so? Oh come, now--lemme just try. Only just a little--I'd let\n\
_you_, if you was me, Tom.\"\n\
\n\
\"Ben, I'd like to, honest injun; but Aunt Polly--well, Jim wanted to do\n\
it, but she wouldn't let him; Sid wanted to do it, and she wouldn't let\n\
Sid. Now don't you see how I'm fixed? If you was to tackle this fence\n\
and anything was to happen to it--\"\n\
\n\
\"Oh, shucks, I'll be just as careful. Now lemme try. Say--I'll give you\n\
the core of my apple.\"\n\
\n\
\"Well, here--No, Ben, now don't. I'm afeard--\"\n\
\n\
\"I'll give you _all_ of it!\"\n\
\n\
Tom gave up the brush with reluctance in his face, but alacrity in his\n\
heart. And while the late steamer Big Missouri worked and sweated in the\n\
sun, the retired artist sat on a barrel in the shade close by,\n\
dangled his legs, munched his apple, and planned the slaughter of more\n\
innocents. There was no lack of material; boys happened along every\n\
little while; they came to jeer, but remained to whitewash. By the time\n\
Ben was fagged out, Tom had traded the next chance to Billy Fisher for\n\
a kite, in good repair; and when he played out, Johnny Miller bought in\n\
for a dead rat and a string to swing it with--and so on, and so on, hour\n\
after hour. And when the middle of the afternoon came, from being a\n\
poor poverty-stricken boy in the morning, Tom was literally rolling in\n\
wealth. He had besides the things before mentioned, twelve marbles, part\n\
of a jews-harp, a piece of blue bottle-glass to look through, a spool\n\
cannon, a key that wouldn't unlock anything, a fragment of chalk, a\n\
glass stopper of a decanter, a tin soldier, a couple of tadpoles,\n\
six fire-crackers, a kitten with only one eye, a brass door-knob, a\n\
dog-collar--but no dog--the handle of a knife, four pieces of orange-peel,\n\
and a dilapidated old window sash.\n\
\n\
He had had a nice, good, idle time all the while--plenty of company--and\n\
the fence had three coats of whitewash on it! If he hadn't run out of\n\
whitewash he would have bankrupted every boy in the village.\n\
\n\
Tom said to himself that it was not such a hollow world, after all. He\n\
had discovered a great law of human action, without knowing it--namely,\n\
that in order to make a man or a boy covet a thing, it is only necessary\n\
to make the thing difficult to attain. If he had been a great and\n\
wise philosopher, like the writer of this book, he would now have\n\
comprehended that Work consists of whatever a body is _obliged_ to do,\n\
and that Play consists of whatever a body is not obliged to do. And\n\
this would help him to understand why constructing artificial flowers or\n\
performing on a tread-mill is work, while rolling ten-pins or climbing\n\
Mont Blanc is only amusement. There are wealthy gentlemen in England\n\
who drive four-horse passenger-coaches twenty or thirty miles on a\n\
daily line, in the summer, because the privilege costs them considerable\n\
money; but if they were offered wages for the service, that would turn\n\
it into work and then they would resign.\n\
\n\
The boy mused awhile over the substantial change which had taken place\n\
in his worldly circumstances, and then wended toward headquarters to\n\
report.\n\
";

#include <stdlib.h>
#include <stdio.h>
static void
print512 (unsigned char digest[64]) {
  int i;
  
  for (i = 0; i < 64; i++)
    printf ("%02x", digest[i]);
  printf ("\n");
}

#ifdef SPEED1
int main () {
  int i; uint64_t k = rand (); unsigned char out[64];
  
  for (i = 0; i < 2000000; i++)
    test (msg, 10, &out);
  printf ("10 byte: %s:", (size_t)&k & 0x7 ? "unaligned" : "aligned");
  print512 (out);
  return 0;
}
#endif

#ifdef SPEED2
int main () {
  int i; uint64_t k = rand (); unsigned char out[64];
  
  for (i = 0; i < 2000000; i++)
    test (msg, 100, &out);
  printf ("100 byte: %s:", (size_t)&k & 0x7 ? "unaligned" : "aligned");
  print512 (out);
  return 0;
}
#endif

#ifdef SPEED3
int main () {
  int i; uint64_t k = rand (); unsigned char out[64];
  
  for (i = 0; i < 2000000; i++)
    test (msg, 1000, &out);
  printf ("1000 byte: %s:", (size_t)&k & 0x7 ? "unaligned" : "aligned");
  print512 (out);
  return 0;
}
#endif

#ifdef SPEED4
int main () {
  int i; uint64_t k = rand (); unsigned char out[64];

  for (i = 0; i < 500000; i++)
    test (msg, 10000, &out);
  printf ("10000 byte: %s:", (size_t)&k & 0x7 ? "unaligned" : "aligned");
  print512 (out);
  return 0;
}
#endif



================================================
FILE: benchmarks/bench-crypto.sh
================================================
#!/bin/bash

# Benchmarking different crypto hash functions.

IFS='%'
temp=__temp


print() {
    s=`grep -E 'user[ 	]*[0-9]' $2 | sed s/.*user// | sed s/\\t//`
    echo $1 "$s"s
}

echo compiling sha512
gcc -O3 -w -c sha512.c byte_order.c || exit 1
echo compiling sha3
gcc -O3 -w -c sha3.c || exit 1

str86_64="`uname -a|grep x86_64`"
if test -n "$str86_64"; then
    echo compiling blake2b
    gcc -O3 -w -c -I. -std=gnu99 blake2b.c || exit 1
fi

echo +++10-byte speed '(20M texts)':
gcc -DSPEED1 -O3 -w -DSHA2 sha512.o byte_order.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "sha512:" $temp
gcc -DSPEED1 -O3 -w -DSHA3 sha3.o byte_order.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "sha3  :" $temp
if test -n "$str86_64"; then
    gcc -DSPEED1 -O3 -w -DBLAKE2B blake2b.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "blake2:" $temp
fi
gcc -DSPEED1 -O3 -w -DMUM512 -I../ bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "mum512:" $temp

echo +++100-byte speed '(20M texts)':
gcc -DSPEED2 -O3 -w -DSHA2 sha512.o byte_order.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "sha512:" $temp
gcc -DSPEED2 -O3 -w -DSHA3 sha3.o byte_order.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "sha3  :" $temp
if test -n "$str86_64"; then
    gcc -DSPEED2 -O3 -w -DBLAKE2B blake2b.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "blake2:" $temp
fi
gcc -DSPEED2 -O3 -w -DMUM512 -I../ bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "mum512:" $temp

echo +++1000-byte speed '(20M texts)':
gcc -DSPEED3 -O3 -w -DSHA2 sha512.o byte_order.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "sha512:" $temp
gcc -DSPEED3 -O3 -w -DSHA3 sha3.o byte_order.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "sha3  :" $temp
if test -n "$str86_64"; then
    gcc -DSPEED3 -O3 -w -DBLAKE2B blake2b.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "blake2:" $temp
fi
gcc -DSPEED3 -O3 -w -DMUM512 -I../ bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "mum512:" $temp

echo +++10000-byte speed '(5M texts)':
gcc -DSPEED4 -O3 -w -DSHA2 sha512.o byte_order.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "sha512:" $temp
gcc -DSPEED4 -O3 -w -DSHA3 sha3.o byte_order.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "sha3  :" $temp
if test -n "$str86_64"; then
    gcc -DSPEED4 -O3 -w -DBLAKE2B blake2b.o bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "blake2:" $temp
fi
gcc -DSPEED4 -O3 -w -DMUM512 -I../ bench-crypto.c && (time -p ./a.out) >$temp 2>&1 && print "mum512:" $temp

rm -rf ./a.out $temp sha512.o sha3.o byte_order.o blake2b.o


================================================
FILE: benchmarks/bench-prng.c
================================================
#define N1 100000
#if defined(BBS)
#include "bbs-prng.h"
#define N2 2
static void init_prng (void) { init_bbs_prng (); }
static uint64_t get_prn (void) { return get_bbs_prn (); }
static void finish_prng (void) { finish_bbs_prng (); }
#elif defined(CHACHA)
#include "chacha-prng.h"
#define N2 5000
static void init_prng (void) { init_chacha_prng (); }
static uint64_t get_prn (void) { return get_chacha_prn (); }
static void finish_prng (void) { finish_chacha_prng (); }
#elif defined(SIP24)
#include "sip24-prng.h"
#define N2 5000
static void init_prng (void) { init_sip24_prng (); }
static uint64_t get_prn (void) { return get_sip24_prn (); }
static void finish_prng (void) { finish_sip24_prng (); }
#elif defined(MUM)
#include "mum-prng.h"
#define N2 30000
static void init_prng (void) { mum_hash_randomize (0); init_mum_prng (); }
static uint64_t get_prn (void) { return get_mum_prn (); }
static void finish_prng (void) { finish_mum_prng (); }
#elif defined(MUM512)
#include "mum512-prng.h"
#define N2 2000
static void init_prng (void) { init_mum512_prng (); }
static uint64_t get_prn (void) { return get_mum512_prn (); }
static void finish_prng (void) { finish_mum512_prng (); }
#elif defined(XOROSHIRO128P)
#include "xoroshiro128plus.c"
#define N2 30000
static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789e6aa1b965f4; }
static uint64_t get_prn (void) { return next (); }
static void finish_prng (void) { }
#elif defined(XOROSHIRO128STARSTAR)
#include "xoroshiro128starstar.c"
#define N2 30000
static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789e6aa1b965f4; }
static uint64_t get_prn (void) { return next (); }
static void finish_prng (void) { }
#elif defined(XOSHIRO256P)
#include "xoshiro256plus.c"
#define N2 30000
static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789e6aa1b965f4; s[2] = 0x6c45d188009454f; s[3] = 0xf88bb8a8724c81ec; }
static uint64_t get_prn (void) { return next (); }
static void finish_prng (void) { }
#elif defined(XOSHIRO256STARSTAR)
#include "xoshiro256starstar.c"
#define N2 30000
static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789e6aa1b965f4; s[2] = 0x6c45d188009454f; s[3] = 0xf88bb8a8724c81ec; }
static uint64_t get_prn (void) { return next (); }
static void finish_prng (void) { }
#elif defined(XOSHIRO512P)
#include "xoshiro512plus.c"
#define N2 30000
static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789e6aa1b965f4; s[2] = 0x6c45d188009454f; s[3] = 0xf88bb8a8724c81ec; s[4] = 0x1b39896a51a8749b; s[5] = 0x53cb9f0c747ea2ea; s[6] = 0x2c829abe1f4532e1; s[7] = 0xc584133ac916ab3c; }
static uint64_t get_prn (void) { return next (); }
static void finish_prng (void) { }
#elif defined(XOSHIRO512STARSTAR)
#include "xoshiro512starstar.c"
#define N2 30000
static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789e6aa1b965f4; s[2] = 0x6c45d188009454f; s[3] = 0xf88bb8a8724c81ec; s[4] = 0x1b39896a51a8749b; s[5] = 0x53cb9f0c747ea2ea; s[6] = 0x2c829abe1f4532e1; s[7] = 0xc584133ac916ab3c; }
static uint64_t get_prn (void) { return next (); }
static void finish_prng (void) { }
#elif defined(RAND)
#include <stdlib.h>
#include <stdint.h>
#define N2 7000
static void init_prng (void) { }
static uint64_t get_prn (void) { return rand (); }
static void finish_prng (void) { }
#else
#error "I don't know what PRNG to test"
#endif

uint64_t dummy;
#include <stdio.h>
#include <time.h>

#ifdef OUTPUT
#include <stdint.h>

int main(void)
{
    init_prng ();
    while (1) {
        uint64_t value = get_prn();
        fwrite((void*) &value, sizeof(value), 1, stdout);
    }
}

#else
int main (void) {
  int i, j; double d; uint64_t res = 0;
  clock_t t = clock ();
  
  init_prng ();
  for (i = 0; i < N2; i++)
    for (j = 0; j < N1; j++)
      res ^= get_prn ();
  finish_prng ();
  t = clock () - t;
  d = (N1 + 0.0) * N2 * CLOCKS_PER_SEC / t / 1000;
  if (d > 1000)
    printf ("%7.2fM prns/sec\n", d / 1000);
  else
    printf ("%7.2fK prns/sec\n", d);
  dummy = res;
  return 0;
}
#endif


================================================
FILE: benchmarks/bench-prng.sh
================================================
#!/bin/bash

# Benchmarking different pseudo-random number generators.

echo +++pseudo random number generation speed '(PRNs/sec)':
if test x${MUM_ONLY} == x; then
    gcc -DBBS -O3 -w bench-prng.c -lgmp && echo -n 'BBS           : ' && ./a.out 2>&1
    gcc -DCHACHA -O3 -w bench-prng.c && echo -n 'ChaCha        : ' && ./a.out 2>&1
    gcc -DSIP24 -O3 -w bench-prng.c && echo -n 'Sip24         : ' && ./a.out 2>&1
fi
gcc -DMUM512 -DMUM512_ROUNDS=2 -I../ -O3 -w bench-prng.c && echo -n 'MUM512        : ' && ./a.out 2>&1
gcc -DMUM -I../ -O3 -w bench-prng.c && echo -n 'MUM           : ' && ./a.out 2>&1
if test x${MUM_ONLY} == x; then
    gcc -DXOROSHIRO128STARSTAR -I../ -std=c99 -O3 -w bench-prng.c && echo -n 'XOROSHIRO128**: ' && ./a.out 2>&1
    gcc -DXOSHIRO256STARSTAR -I../ -std=c99 -O3 -w bench-prng.c && echo -n 'XOSHIRO256**  : ' && ./a.out 2>&1
    gcc -DXOSHIRO512STARSTAR -I../ -std=c99 -O3 -w bench-prng.c && echo -n 'XOSHIRO512**  : ' && ./a.out 2>&1
    gcc -DRAND -I../ -O3 -w bench-prng.c && echo -n 'RAND          : ' && ./a.out 2>&1
    gcc -DXOROSHIRO128P -I../ -std=c99 -O3 -w bench-prng.c && echo -n 'XOROSHIRO128+ : ' && ./a.out 2>&1
    gcc -DXOSHIRO256P -I../ -std=c99 -O3 -w bench-prng.c && echo -n 'XOSHIRO256+   : ' && ./a.out 2>&1
    gcc -DXOSHIRO512P -I../ -std=c99 -O3 -w bench-prng.c && echo -n 'XOSHIRO512+   : ' && ./a.out 2>&1
fi

rm -rf ./a.out


================================================
FILE: benchmarks/bench.c
================================================
#if defined(Spooky)

#include "SpookyV2.h"
static void SpookyHash64_test (const void *key, int len, uint32_t seed, void *out) {
  *(uint64_t *) out = SpookyHash::Hash64 (key, len, seed);
}

#define test SpookyHash64_test
#define test64 test

#elif defined(City)

#include "City.h"
static void CityHash64_test (const void *key, int len, uint32_t seed, void *out) {
  *(uint64 *) out = CityHash64WithSeed ((const char *) key, len, seed);
}

#define test CityHash64_test
#define test64 test

#elif defined(SipHash)

#include <stdint.h>

extern int siphash (uint8_t *out, const uint8_t *in, uint64_t inlen, const uint8_t *k);

static void siphash_test (const void *key, int len, uint32_t seed, void *out) {
  uint64_t s[2];

  s[0] = seed;
  s[1] = 0;
  siphash (out, (const uint8_t *) key, len, (const uint8_t *) s);
}

#define test siphash_test
#define test64 test

#elif defined(xxHash)

#ifdef _MSC_VER
typedef unsigned __int32 uint32_t;
typedef unsigned __int64 uint64_t;
#else
#include <stdint.h>
#endif

#include "xxhash.c"

static void xxHash64_test (const void *key, int len, uint32_t seed, void *out) {
  *(uint64_t *) out = XXH64 (key, len, seed);
}

#define test xxHash64_test
#define test64 test

#elif defined(xxh3)

#include "xxh3.h"

static void xxh3_test (const void *key, int len, uint32_t seed, void *out) {
  *(uint64_t *) out = XXH3_64bits (key, len);
}

#define test xxh3_test
#define test64 test

#elif defined(T1HA2)

#include "t1ha.h"
static void t1ha_test (const void *key, int len, uint32_t seed, void *out) {
  *(uint64_t *) out = t1ha2_atonce (key, len, seed);
}

#define test t1ha_test
#define test64 test

#elif defined(METRO)

#include "metrohash64.h"
static void metro_test (const void *key, int len, uint32_t seed, void *out) {
  MetroHash64::Hash ((const uint8_t *) key, len, (uint8_t *) out, seed);
}

#define test metro_test
#define test64 test

#elif defined(MeowHash)

#ifdef _MSC_VER
typedef unsigned __int32 uint32_t;
typedef unsigned __int64 uint64_t;
#else
#include <stdint.h>
#endif

#include "meow_intrinsics.h"
#include "meow_hash.h"

static void meowhash_test (const void *key, int len, uint32_t seed, void *out) {
  *(uint64_t *) out = MeowU64From (MeowHash_Accelerated (seed, len, key), 0);
}

#define test meowhash_test
#define test64 test

#elif defined(MUM)

#include "mum.h"
static void mum_test (const void *key, int len, uint32_t seed, void *out) {
  *(uint64_t *) out = mum_hash (key, len, seed);
}

static void mum_test64 (const void *key, int len, uint32_t seed, void *out) {
  *(uint64_t *) out = mum_hash64 (*(uint64_t *) key, seed);
}

#define test mum_test
#define test64 mum_test64

#elif defined(VMUM)

#include "vmum.h"
static void mum_test (const void *key, int len, uint32_t seed, void *out) {
  *(uint64_t *) out = vmum_hash (key, len, seed);
}

static void mum_test64 (const void *key, int len, uint32_t seed, void *out) {
  *(uint64_t *) out = vmum_hash64 (*(uint64_t *) key, seed);
}

#define test mum_test
#define test64 mum_test64

#elif defined(RAPID)

#include "rapidhash.h"
static void rapid_test (const void *key, int len, uint32_t seed, void *out) {
  *(uint64_t *) out = rapidhash_withSeed (key, len, seed);
}

#define test rapid_test
#define test64 rapid_test

#else
#error "I don't know what to test"
#endif

#if DATA_LEN == 0

#include <stdlib.h>
#include <stdio.h>
uint32_t arr[16 * 256 * 1024];
int main () {
  int i;
  uint64_t out;

  for (i = 0; i < 16 * 256 * 1024; i++) {
    arr[i] = rand ();
  }
  for (i = 0; i < 10000; i++) test (arr, 16 * 256 * 1024 * 4, 2, &out), arr[0] = out;
  printf ("%s:%llx\n", (size_t) arr & 0x7 ? "unaligned" : "aligned", (unsigned long long) out);
  return 0;
}

#else

int len = DATA_LEN;
uint64_t k[(DATA_LEN + 7) / 8];
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
/* The key buffer and its length are external (global) variables so that
   the compiler cannot specialize the hash for them after inlining.
   Otherwise MUM results would be unrealistically good.  */
int main () {
  int i, j, n;
  uint64_t out;

  assert (len <= 1024);
  printf ("%d-byte: %s:\n", len, (size_t) k & 0x7 ? "unaligned" : "aligned");
  for (i = 0; i < sizeof (k) / sizeof (uint64_t); i++) k[i] = i;
  for (j = 0; j < 128; j++)
    for (n = i = 0; i < 10000000; i++) test (k, len, 2, &out), k[0] = out;
  printf ("%llx\n", (unsigned long long) out);
  return 0;
}

#endif


================================================
FILE: benchmarks/bench.sh
================================================
#!/bin/bash

# Benchmarking different hash functions.

temp=__hash-temp.out
temp2=__hash-temp2.out
temp3=__hash-temp3.out

COPTFLAGS=${COPTFLAGS:--O3}
if test `uname -m` == x86_64; then
    COPTFLAGS=`echo $COPTFLAGS -march=native`
elif test `uname -m` == ppc64; then
    COPTFLAGS=`echo $COPTFLAGS -mcpu=native`
fi
LTO=${LTO:--flto}
CC=${CC:-cc}
CXX=${CXX:-c++}

echo Using ${CC} and ${CXX}

if test x${MUM_ONLY} == x; then
    echo compiling Spooky
    ${CXX} ${COPTFLAGS} ${LTO} -w -c SpookyV2.cpp || exit 1
    echo compiling City
    ${CXX} ${COPTFLAGS} ${LTO} -w -c City.cpp || exit 1
    echo compiling t1ha
    ${CC} ${COPTFLAGS} ${LTO} -w -c t1ha/src/t1ha*.c || exit 1
    echo compiling metrohash64
    ${CXX} ${COPTFLAGS} ${LTO} -w -c metrohash64.cpp || exit 1
    echo compiling SipHash24
    ${CC} ${COPTFLAGS} ${LTO} -w -c siphash24.c || exit 1
fi

rm -f $temp3

percent () {
    val=`awk "BEGIN {if ($2==0) print \"Inf\"; else printf \"%.2f\n\", $1/$2;}"`
    echo "$val"
    echo "$3:$val" >>$temp3
}

skip () {
    l=$1
    n=$2
    while test $l -le $n; do echo -n " "; l=`expr $l + 1`; done
}

print_time() {
    title="$1"
    secs=$2
    printf '%-.2f %5.2fs|' `percent $base_time $secs "$title"` $secs
}

TASKSET=""
if type taskset >/dev/null 2>&1;then TASKSET="taskset -c 0";fi
echo $TASKSET

run () {
  title=$1
  program=$2
  flag=$3
  ok=
  if (time -p $TASKSET $program) >$temp 2>$temp2; then
      ok=y
      (time -p $TASKSET $program) >$temp 2>>$temp2
      (time -p $TASKSET $program) >$temp 2>>$temp2
  fi
  if test x$ok = x;then echo $program: FAILED; return 1; fi
  secs=`grep -E 'user[ 	]*[0-9]' $temp2 | grep -F -v : | sed s/.*user// | sed s/\\t// | sort -n | head -1`
  if test x$flag != x;then base_time=$secs;fi
  print_time "$title" $secs
}

mach=`uname -m`
check_meow=`(test $mach == x86_64 || test $mach == aarch64) && echo yes`

check_meow=
check_xxHash=

echo -n '| Length    |  VMUM-V2  |  VMUM-V1  |  MUM-V4   |  MUM-V3   |  Spooky   |   City    |'
if test "$check_xxHash" == yes; then echo -n '  xxHash   |';fi
if test "$check_rapid" == yes; then echo -n '  Rapidh   |';fi
echo -n '  xxHash3  |   t1ha2   | SipHash24 |   Metro   |'
if test "$check_meow" == yes; then echo ' MeowHash  |'; else echo; fi
echo -n '|:----------|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|'
if test "$check_xxHash" == yes; then echo -n ':---------:|';fi
if test "$check_rapid" == yes; then echo -n ':---------:|';fi
echo -n ':---------:|:---------:|:---------:|:---------:|'
if test "$check_meow" == yes; then echo ':---------:|'; else echo; fi

for i in 3 4 5 6 7 8 9 10 11 12 13 14 15 16 32 64 96 128 192 256 512 1024 0;do
    if test $i == 0; then echo -n '| Bulk      |'; else printf '|%4d bytes |' $i;fi
    ${CXX} -DDATA_LEN=$i ${COPTFLAGS} -w -fpermissive -DVMUM -I../ bench.c && run "00vMUM-V2" "./a.out" first
    ${CXX} -DDATA_LEN=$i ${COPTFLAGS} -w -fpermissive -DVMUM -DVMUM_V1 -I../ bench.c && run "01vMUM-V1" "./a.out"
    ${CXX} -DDATA_LEN=$i ${COPTFLAGS} -w -fpermissive -DMUM -I../ bench.c && run "02MUM-V4" "./a.out"
    ${CXX} -DDATA_LEN=$i ${COPTFLAGS} -w -fpermissive -DMUM -DMUM_V3 -I../ bench.c && run "03MUM-V3" "./a.out"
    if test x${MUM_ONLY} == x; then
	${CXX} -DDATA_LEN=$i ${COPTFLAGS} ${LTO} -w -fpermissive -DSpooky SpookyV2.o bench.c && run "04Spooky" "./a.out"
	${CXX} -DDATA_LEN=$i ${COPTFLAGS} ${LTO} -w -fpermissive -DCity City.o bench.c && run "05City" "./a.out"
	if test "$check_xxHash" == yes;then
	    ${CXX} -DDATA_LEN=$i ${COPTFLAGS} ${LTO} -w -fpermissive -DxxHash bench.c && run "06xxHash" "./a.out"
	fi
	if test "$check_rapid" == yes;then
            ${CXX} -DDATA_LEN=$i ${COPTFLAGS} -w -fpermissive -DRAPID -I../ bench.c && run "07vRAPID" "./a.out"
	fi
	${CXX} -DDATA_LEN=$i ${COPTFLAGS} ${LTO} -w -fpermissive -Dxxh3 bench.c && run "08xxh3" "./a.out"
	${CC} -DDATA_LEN=$i ${COPTFLAGS} ${LTO} -w -fpermissive -It1ha -DT1HA2 t1ha*.o bench.c && run "09t1ha2" "./a.out"
	${CC} -DDATA_LEN=$i ${COPTFLAGS} ${LTO} -w -fpermissive -DSipHash siphash24.o bench.c && run "10Siphash24" "./a.out"
	${CXX} -DDATA_LEN=$i ${COPTFLAGS} ${LTO} -w -fpermissive -DMETRO metrohash64.o -I../ bench.c && run "11Metro" "./a.out"
	if test "$check_meow" == yes && test $mach == x86_64; then
	    ${CXX} -DDATA_LEN=$i ${COPTFLAGS} -w -mavx2 -maes -fpermissive -DMeowHash -I../ bench.c && run "12Meowhash" "./a.out"
	elif test "$check_meow" == yes && test $mach == aarch64; then
	    ${CXX} -DDATA_LEN=$i ${COPTFLAGS} -w -march=native -fpermissive -DMeowHash -I../ bench.c && run "13Meowhash" "./a.out"
	fi
    fi
    echo
done

echo -n '| Average   |'
for i in `awk -F: '{print $1}' $temp3|sort|uniq`; do
    printf '%-10.2f |' `awk -F: -v name="$i" 'name==$1 {f = f + $2; n++} END {printf "%0.2f\n", f / n}' $temp3`
done
echo

echo -n '| Geomean   |'
for i in `awk -F: '{print $1}' $temp3|sort|uniq`; do
    printf '%-10.2f |' `awk -F: -v name="$i" 'BEGIN{f=1.0} name==$1 {f = f * $2; n++} END {printf "%0.2f\n", exp (log(f)/n)}' $temp3`
done
echo

rm -rf ./a.out $temp $temp2 $temp3 SpookyV2.o City.o siphash24.o t1ha*.o


================================================
FILE: benchmarks/blake2-config.h
================================================
/*
   BLAKE2 reference source code package - optimized C implementations

   Copyright 2012, Samuel Neves <sneves@dei.uc.pt>.  You may use this under the
   terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
   your option.  The terms of these licenses can be found at:

   - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
   - OpenSSL license   : https://www.openssl.org/source/license.html
   - Apache 2.0        : http://www.apache.org/licenses/LICENSE-2.0

   More information about the BLAKE2 hash function can be found at
   https://blake2.net.
*/
#pragma once
#ifndef __BLAKE2_CONFIG_H__
#define __BLAKE2_CONFIG_H__

/* These don't work everywhere */
#if defined(__SSE2__) || defined(__x86_64__) || defined(__amd64__)
#define HAVE_SSE2
#endif

#if defined(__SSSE3__)
#define HAVE_SSSE3
#endif

#if defined(__SSE4_1__)
#define HAVE_SSE41
#endif

#if defined(__AVX__)
#define HAVE_AVX
#endif

#if defined(__XOP__)
#define HAVE_XOP
#endif


#ifdef HAVE_AVX2
#ifndef HAVE_AVX
#define HAVE_AVX
#endif
#endif

#ifdef HAVE_XOP
#ifndef HAVE_AVX
#define HAVE_AVX
#endif
#endif

#ifdef HAVE_AVX
#ifndef HAVE_SSE41
#define HAVE_SSE41
#endif
#endif

#ifdef HAVE_SSE41
#ifndef HAVE_SSSE3
#define HAVE_SSSE3
#endif
#endif

#ifdef HAVE_SSSE3
#define HAVE_SSE2
#endif

#if !defined(HAVE_SSE2)
#error "This code requires at least SSE2."
#endif

#endif



================================================
FILE: benchmarks/blake2-impl.h
================================================
/*
   BLAKE2 reference source code package - optimized C implementations
  
   Copyright 2012, Samuel Neves <sneves@dei.uc.pt>.  You may use this under the
   terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
   your option.  The terms of these licenses can be found at:
  
   - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
   - OpenSSL license   : https://www.openssl.org/source/license.html
   - Apache 2.0        : http://www.apache.org/licenses/LICENSE-2.0
  
   More information about the BLAKE2 hash function can be found at
   https://blake2.net.
*/
#pragma once
#ifndef __BLAKE2_IMPL_H__
#define __BLAKE2_IMPL_H__

#include <stdint.h>
#include <string.h>

BLAKE2_LOCAL_INLINE(uint32_t) load32( const void *src )
{
#if defined(NATIVE_LITTLE_ENDIAN)
  uint32_t w;
  memcpy(&w, src, sizeof w);
  return w;
#else
  const uint8_t *p = ( const uint8_t * )src;
  uint32_t w = *p++;
  w |= ( uint32_t )( *p++ ) <<  8;
  w |= ( uint32_t )( *p++ ) << 16;
  w |= ( uint32_t )( *p++ ) << 24;
  return w;
#endif
}

BLAKE2_LOCAL_INLINE(uint64_t) load64( const void *src )
{
#if defined(NATIVE_LITTLE_ENDIAN)
  uint64_t w;
  memcpy(&w, src, sizeof w);
  return w;
#else
  const uint8_t *p = ( const uint8_t * )src;
  uint64_t w = *p++;
  w |= ( uint64_t )( *p++ ) <<  8;
  w |= ( uint64_t )( *p++ ) << 16;
  w |= ( uint64_t )( *p++ ) << 24;
  w |= ( uint64_t )( *p++ ) << 32;
  w |= ( uint64_t )( *p++ ) << 40;
  w |= ( uint64_t )( *p++ ) << 48;
  w |= ( uint64_t )( *p++ ) << 56;
  return w;
#endif
}

BLAKE2_LOCAL_INLINE(void) store32( void *dst, uint32_t w )
{
#if defined(NATIVE_LITTLE_ENDIAN)
  memcpy(dst, &w, sizeof w);
#else
  uint8_t *p = ( uint8_t * )dst;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w;
#endif
}

BLAKE2_LOCAL_INLINE(void) store64( void *dst, uint64_t w )
{
#if defined(NATIVE_LITTLE_ENDIAN)
  memcpy(dst, &w, sizeof w);
#else
  uint8_t *p = ( uint8_t * )dst;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w;
#endif
}

BLAKE2_LOCAL_INLINE(uint64_t) load48( const void *src )
{
  const uint8_t *p = ( const uint8_t * )src;
  uint64_t w = *p++;
  w |= ( uint64_t )( *p++ ) <<  8;
  w |= ( uint64_t )( *p++ ) << 16;
  w |= ( uint64_t )( *p++ ) << 24;
  w |= ( uint64_t )( *p++ ) << 32;
  w |= ( uint64_t )( *p++ ) << 40;
  return w;
}

BLAKE2_LOCAL_INLINE(void) store48( void *dst, uint64_t w )
{
  uint8_t *p = ( uint8_t * )dst;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w; w >>= 8;
  *p++ = ( uint8_t )w;
}

BLAKE2_LOCAL_INLINE(uint32_t) rotl32( const uint32_t w, const unsigned c )
{
  return ( w << c ) | ( w >> ( 32 - c ) );
}

BLAKE2_LOCAL_INLINE(uint64_t) rotl64( const uint64_t w, const unsigned c )
{
  return ( w << c ) | ( w >> ( 64 - c ) );
}

BLAKE2_LOCAL_INLINE(uint32_t) rotr32( const uint32_t w, const unsigned c )
{
  return ( w >> c ) | ( w << ( 32 - c ) );
}

BLAKE2_LOCAL_INLINE(uint64_t) rotr64( const uint64_t w, const unsigned c )
{
  return ( w >> c ) | ( w << ( 64 - c ) );
}

/* prevents compiler optimizing out memset() */
BLAKE2_LOCAL_INLINE(void) secure_zero_memory(void *v, size_t n)
{
  static void *(*const volatile memset_v)(void *, int, size_t) = &memset;
  memset_v(v, 0, n);
}

#endif



================================================
FILE: benchmarks/blake2.h
================================================
/*
   BLAKE2 reference source code package - reference C implementations
  
   Copyright 2012, Samuel Neves <sneves@dei.uc.pt>.  You may use this under the
   terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
   your option.  The terms of these licenses can be found at:
  
   - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
   - OpenSSL license   : https://www.openssl.org/source/license.html
   - Apache 2.0        : http://www.apache.org/licenses/LICENSE-2.0
  
   More information about the BLAKE2 hash function can be found at
   https://blake2.net.
*/
#pragma once
#ifndef __BLAKE2_H__
#define __BLAKE2_H__

#include <stddef.h>
#include <stdint.h>

#ifdef BLAKE2_NO_INLINE
#define BLAKE2_LOCAL_INLINE(type) static type
#endif

#ifndef BLAKE2_LOCAL_INLINE
#define BLAKE2_LOCAL_INLINE(type) static inline type
#endif

#if defined(__cplusplus)
extern "C" {
#endif

  enum blake2s_constant
  {
    BLAKE2S_BLOCKBYTES = 64,
    BLAKE2S_OUTBYTES   = 32,
    BLAKE2S_KEYBYTES   = 32,
    BLAKE2S_SALTBYTES  = 8,
    BLAKE2S_PERSONALBYTES = 8
  };

  enum blake2b_constant
  {
    BLAKE2B_BLOCKBYTES = 128,
    BLAKE2B_OUTBYTES   = 64,
    BLAKE2B_KEYBYTES   = 64,
    BLAKE2B_SALTBYTES  = 16,
    BLAKE2B_PERSONALBYTES = 16
  };

  typedef struct __blake2s_state
  {
    uint32_t h[8];
    uint32_t t[2];
    uint32_t f[2];
    uint8_t  buf[2 * BLAKE2S_BLOCKBYTES];
    size_t   buflen;
    uint8_t  last_node;
  } blake2s_state;

  typedef struct __blake2b_state
  {
    uint64_t h[8];
    uint64_t t[2];
    uint64_t f[2];
    uint8_t  buf[2 * BLAKE2B_BLOCKBYTES];
    size_t   buflen;
    uint8_t  last_node;
  } blake2b_state;

  typedef struct __blake2sp_state
  {
    blake2s_state S[8][1];
    blake2s_state R[1];
    uint8_t buf[8 * BLAKE2S_BLOCKBYTES];
    size_t  buflen;
  } blake2sp_state;

  typedef struct __blake2bp_state
  {
    blake2b_state S[4][1];
    blake2b_state R[1];
    uint8_t buf[4 * BLAKE2B_BLOCKBYTES];
    size_t  buflen;
  } blake2bp_state;


#pragma pack(push, 1)
  typedef struct __blake2s_param
  {
    uint8_t  digest_length; /* 1 */
    uint8_t  key_length;    /* 2 */
    uint8_t  fanout;        /* 3 */
    uint8_t  depth;         /* 4 */
    uint32_t leaf_length;   /* 8 */
    uint8_t  node_offset[6]; /* 14 */
    uint8_t  node_depth;    /* 15 */
    uint8_t  inner_length;  /* 16 */
    /* uint8_t  reserved[0]; */
    uint8_t  salt[BLAKE2S_SALTBYTES]; /* 24 */
    uint8_t  personal[BLAKE2S_PERSONALBYTES];  /* 32 */
  } blake2s_param;

  typedef struct __blake2b_param
  {
    uint8_t  digest_length; /* 1 */
    uint8_t  key_length;    /* 2 */
    uint8_t  fanout;        /* 3 */
    uint8_t  depth;         /* 4 */
    uint32_t leaf_length;   /* 8 */
    uint64_t node_offset;   /* 16 */
    uint8_t  node_depth;    /* 17 */
    uint8_t  inner_length;  /* 18 */
    uint8_t  reserved[14];  /* 32 */
    uint8_t  salt[BLAKE2B_SALTBYTES]; /* 48 */
    uint8_t  personal[BLAKE2B_PERSONALBYTES];  /* 64 */
  } blake2b_param;
#pragma pack(pop)

  /* Streaming API */
  int blake2s_init( blake2s_state *S, const uint8_t outlen );
  int blake2s_init_key( blake2s_state *S, const uint8_t outlen, const void *key, const uint8_t keylen );
  int blake2s_init_param( blake2s_state *S, const blake2s_param *P );
  int blake2s_update( blake2s_state *S, const uint8_t *in, uint64_t inlen );
  int blake2s_final( blake2s_state *S, uint8_t *out, uint8_t outlen );

  int blake2b_init( blake2b_state *S, const uint8_t outlen );
  int blake2b_init_key( blake2b_state *S, const uint8_t outlen, const void *key, const uint8_t keylen );
  int blake2b_init_param( blake2b_state *S, const blake2b_param *P );
  int blake2b_update( blake2b_state *S, const uint8_t *in, uint64_t inlen );
  int blake2b_final( blake2b_state *S, uint8_t *out, uint8_t outlen );

  int blake2sp_init( blake2sp_state *S, const uint8_t outlen );
  int blake2sp_init_key( blake2sp_state *S, const uint8_t outlen, const void *key, const uint8_t keylen );
  int blake2sp_update( blake2sp_state *S, const uint8_t *in, uint64_t inlen );
  int blake2sp_final( blake2sp_state *S, uint8_t *out, uint8_t outlen );

  int blake2bp_init( blake2bp_state *S, const uint8_t outlen );
  int blake2bp_init_key( blake2bp_state *S, const uint8_t outlen, const void *key, const uint8_t keylen );
  int blake2bp_update( blake2bp_state *S, const uint8_t *in, uint64_t inlen );
  int blake2bp_final( blake2bp_state *S, uint8_t *out, uint8_t outlen );

  /* Simple API */
  int blake2s( uint8_t *out, const void *in, const void *key, const uint8_t outlen, const uint64_t inlen, uint8_t keylen );
  int blake2b( uint8_t *out, const void *in, const void *key, const uint8_t outlen, const uint64_t inlen, uint8_t keylen );

  int blake2sp( uint8_t *out, const void *in, const void *key, const uint8_t outlen, const uint64_t inlen, uint8_t keylen );
  int blake2bp( uint8_t *out, const void *in, const void *key, const uint8_t outlen, const uint64_t inlen, uint8_t keylen );

  static inline int blake2( uint8_t *out, const void *in, const void *key, const uint8_t outlen, const uint64_t inlen, uint8_t keylen )
  {
    return blake2b( out, in, key, outlen, inlen, keylen );
  }

#if defined(__cplusplus)
}
#endif

#endif



================================================
FILE: benchmarks/blake2b-load-sse2.h
================================================
/*
   BLAKE2 reference source code package - optimized C implementations
  
   Copyright 2012, Samuel Neves <sneves@dei.uc.pt>.  You may use this under the
   terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
   your option.  The terms of these licenses can be found at:
  
   - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
   - OpenSSL license   : https://www.openssl.org/source/license.html
   - Apache 2.0        : http://www.apache.org/licenses/LICENSE-2.0
  
   More information about the BLAKE2 hash function can be found at
   https://blake2.net.
*/
#pragma once
#ifndef __BLAKE2B_LOAD_SSE2_H__
#define __BLAKE2B_LOAD_SSE2_H__

#define LOAD_MSG_0_1(b0, b1) b0 = _mm_set_epi64x(m2, m0); b1 = _mm_set_epi64x(m6, m4)
#define LOAD_MSG_0_2(b0, b1) b0 = _mm_set_epi64x(m3, m1); b1 = _mm_set_epi64x(m7, m5)
#define LOAD_MSG_0_3(b0, b1) b0 = _mm_set_epi64x(m10, m8); b1 = _mm_set_epi64x(m14, m12)
#define LOAD_MSG_0_4(b0, b1) b0 = _mm_set_epi64x(m11, m9); b1 = _mm_set_epi64x(m15, m13)
#define LOAD_MSG_1_1(b0, b1) b0 = _mm_set_epi64x(m4, m14); b1 = _mm_set_epi64x(m13, m9)
#define LOAD_MSG_1_2(b0, b1) b0 = _mm_set_epi64x(m8, m10); b1 = _mm_set_epi64x(m6, m15)
#define LOAD_MSG_1_3(b0, b1) b0 = _mm_set_epi64x(m0, m1); b1 = _mm_set_epi64x(m5, m11)
#define LOAD_MSG_1_4(b0, b1) b0 = _mm_set_epi64x(m2, m12); b1 = _mm_set_epi64x(m3, m7)
#define LOAD_MSG_2_1(b0, b1) b0 = _mm_set_epi64x(m12, m11); b1 = _mm_set_epi64x(m15, m5)
#define LOAD_MSG_2_2(b0, b1) b0 = _mm_set_epi64x(m0, m8); b1 = _mm_set_epi64x(m13, m2)
#define LOAD_MSG_2_3(b0, b1) b0 = _mm_set_epi64x(m3, m10); b1 = _mm_set_epi64x(m9, m7)
#define LOAD_MSG_2_4(b0, b1) b0 = _mm_set_epi64x(m6, m14); b1 = _mm_set_epi64x(m4, m1)
#define LOAD_MSG_3_1(b0, b1) b0 = _mm_set_epi64x(m3, m7); b1 = _mm_set_epi64x(m11, m13)
#define LOAD_MSG_3_2(b0, b1) b0 = _mm_set_epi64x(m1, m9); b1 = _mm_set_epi64x(m14, m12)
#define LOAD_MSG_3_3(b0, b1) b0 = _mm_set_epi64x(m5, m2); b1 = _mm_set_epi64x(m15, m4)
#define LOAD_MSG_3_4(b0, b1) b0 = _mm_set_epi64x(m10, m6); b1 = _mm_set_epi64x(m8, m0)
#define LOAD_MSG_4_1(b0, b1) b0 = _mm_set_epi64x(m5, m9); b1 = _mm_set_epi64x(m10, m2)
#define LOAD_MSG_4_2(b0, b1) b0 = _mm_set_epi64x(m7, m0); b1 = _mm_set_epi64x(m15, m4)
#define LOAD_MSG_4_3(b0, b1) b0 = _mm_set_epi64x(m11, m14); b1 = _mm_set_epi64x(m3, m6)
#define LOAD_MSG_4_4(b0, b1) b0 = _mm_set_epi64x(m12, m1); b1 = _mm_set_epi64x(m13, m8)
#define LOAD_MSG_5_1(b0, b1) b0 = _mm_set_epi64x(m6, m2); b1 = _mm_set_epi64x(m8, m0)
#define LOAD_MSG_5_2(b0, b1) b0 = _mm_set_epi64x(m10, m12); b1 = _mm_set_epi64x(m3, m11)
#define LOAD_MSG_5_3(b0, b1) b0 = _mm_set_epi64x(m7, m4); b1 = _mm_set_epi64x(m1, m15)
#define LOAD_MSG_5_4(b0, b1) b0 = _mm_set_epi64x(m5, m13); b1 = _mm_set_epi64x(m9, m14)
#define LOAD_MSG_6_1(b0, b1) b0 = _mm_set_epi64x(m1, m12); b1 = _mm_set_epi64x(m4, m14)
#define LOAD_MSG_6_2(b0, b1) b0 = _mm_set_epi64x(m15, m5); b1 = _mm_set_epi64x(m10, m13)
#define LOAD_MSG_6_3(b0, b1) b0 = _mm_set_epi64x(m6, m0); b1 = _mm_set_epi64x(m8, m9)
#define LOAD_MSG_6_4(b0, b1) b0 = _mm_set_epi64x(m3, m7); b1 = _mm_set_epi64x(m11, m2)
#define LOAD_MSG_7_1(b0, b1) b0 = _mm_set_epi64x(m7, m13); b1 = _mm_set_epi64x(m3, m12)
#define LOAD_MSG_7_2(b0, b1) b0 = _mm_set_epi64x(m14, m11); b1 = _mm_set_epi64x(m9, m1)
#define LOAD_MSG_7_3(b0, b1) b0 = _mm_set_epi64x(m15, m5); b1 = _mm_set_epi64x(m2, m8)
#define LOAD_MSG_7_4(b0, b1) b0 = _mm_set_epi64x(m4, m0); b1 = _mm_set_epi64x(m10, m6)
#define LOAD_MSG_8_1(b0, b1) b0 = _mm_set_epi64x(m14, m6); b1 = _mm_set_epi64x(m0, m11)
#define LOAD_MSG_8_2(b0, b1) b0 = _mm_set_epi64x(m9, m15); b1 = _mm_set_epi64x(m8, m3)
#define LOAD_MSG_8_3(b0, b1) b0 = _mm_set_epi64x(m13, m12); b1 = _mm_set_epi64x(m10, m1)
#define LOAD_MSG_8_4(b0, b1) b0 = _mm_set_epi64x(m7, m2); b1 = _mm_set_epi64x(m5, m4)
#define LOAD_MSG_9_1(b0, b1) b0 = _mm_set_epi64x(m8, m10); b1 = _mm_set_epi64x(m1, m7)
#define LOAD_MSG_9_2(b0, b1) b0 = _mm_set_epi64x(m4, m2); b1 = _mm_set_epi64x(m5, m6)
#define LOAD_MSG_9_3(b0, b1) b0 = _mm_set_epi64x(m9, m15); b1 = _mm_set_epi64x(m13, m3)
#define LOAD_MSG_9_4(b0, b1) b0 = _mm_set_epi64x(m14, m11); b1 = _mm_set_epi64x(m0, m12)
#define LOAD_MSG_10_1(b0, b1) b0 = _mm_set_epi64x(m2, m0); b1 = _mm_set_epi64x(m6, m4)
#define LOAD_MSG_10_2(b0, b1) b0 = _mm_set_epi64x(m3, m1); b1 = _mm_set_epi64x(m7, m5)
#define LOAD_MSG_10_3(b0, b1) b0 = _mm_set_epi64x(m10, m8); b1 = _mm_set_epi64x(m14, m12)
#define LOAD_MSG_10_4(b0, b1) b0 = _mm_set_epi64x(m11, m9); b1 = _mm_set_epi64x(m15, m13)
#define LOAD_MSG_11_1(b0, b1) b0 = _mm_set_epi64x(m4, m14); b1 = _mm_set_epi64x(m13, m9)
#define LOAD_MSG_11_2(b0, b1) b0 = _mm_set_epi64x(m8, m10); b1 = _mm_set_epi64x(m6, m15)
#define LOAD_MSG_11_3(b0, b1) b0 = _mm_set_epi64x(m0, m1); b1 = _mm_set_epi64x(m5, m11)
#define LOAD_MSG_11_4(b0, b1) b0 = _mm_set_epi64x(m2, m12); b1 = _mm_set_epi64x(m3, m7)


#endif



================================================
FILE: benchmarks/blake2b-load-sse41.h
================================================
/*
   BLAKE2 reference source code package - optimized C implementations
  
   Copyright 2012, Samuel Neves <sneves@dei.uc.pt>.  You may use this under the
   terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
   your option.  The terms of these licenses can be found at:
  
   - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
   - OpenSSL license   : https://www.openssl.org/source/license.html
   - Apache 2.0        : http://www.apache.org/licenses/LICENSE-2.0
  
   More information about the BLAKE2 hash function can be found at
   https://blake2.net.
*/
#pragma once
#ifndef __BLAKE2B_LOAD_SSE41_H__
#define __BLAKE2B_LOAD_SSE41_H__

#define LOAD_MSG_0_1(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m0, m1); \
b1 = _mm_unpacklo_epi64(m2, m3); \
} while(0)


#define LOAD_MSG_0_2(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m0, m1); \
b1 = _mm_unpackhi_epi64(m2, m3); \
} while(0)


#define LOAD_MSG_0_3(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m4, m5); \
b1 = _mm_unpacklo_epi64(m6, m7); \
} while(0)


#define LOAD_MSG_0_4(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m4, m5); \
b1 = _mm_unpackhi_epi64(m6, m7); \
} while(0)


#define LOAD_MSG_1_1(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m7, m2); \
b1 = _mm_unpackhi_epi64(m4, m6); \
} while(0)


#define LOAD_MSG_1_2(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m5, m4); \
b1 = _mm_alignr_epi8(m3, m7, 8); \
} while(0)


#define LOAD_MSG_1_3(b0, b1) \
do \
{ \
b0 = _mm_shuffle_epi32(m0, _MM_SHUFFLE(1,0,3,2)); \
b1 = _mm_unpackhi_epi64(m5, m2); \
} while(0)


#define LOAD_MSG_1_4(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m6, m1); \
b1 = _mm_unpackhi_epi64(m3, m1); \
} while(0)


#define LOAD_MSG_2_1(b0, b1) \
do \
{ \
b0 = _mm_alignr_epi8(m6, m5, 8); \
b1 = _mm_unpackhi_epi64(m2, m7); \
} while(0)


#define LOAD_MSG_2_2(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m4, m0); \
b1 = _mm_blend_epi16(m1, m6, 0xF0); \
} while(0)


#define LOAD_MSG_2_3(b0, b1) \
do \
{ \
b0 = _mm_blend_epi16(m5, m1, 0xF0); \
b1 = _mm_unpackhi_epi64(m3, m4); \
} while(0)


#define LOAD_MSG_2_4(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m7, m3); \
b1 = _mm_alignr_epi8(m2, m0, 8); \
} while(0)


#define LOAD_MSG_3_1(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m3, m1); \
b1 = _mm_unpackhi_epi64(m6, m5); \
} while(0)


#define LOAD_MSG_3_2(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m4, m0); \
b1 = _mm_unpacklo_epi64(m6, m7); \
} while(0)


#define LOAD_MSG_3_3(b0, b1) \
do \
{ \
b0 = _mm_blend_epi16(m1, m2, 0xF0); \
b1 = _mm_blend_epi16(m2, m7, 0xF0); \
} while(0)


#define LOAD_MSG_3_4(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m3, m5); \
b1 = _mm_unpacklo_epi64(m0, m4); \
} while(0)


#define LOAD_MSG_4_1(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m4, m2); \
b1 = _mm_unpacklo_epi64(m1, m5); \
} while(0)


#define LOAD_MSG_4_2(b0, b1) \
do \
{ \
b0 = _mm_blend_epi16(m0, m3, 0xF0); \
b1 = _mm_blend_epi16(m2, m7, 0xF0); \
} while(0)


#define LOAD_MSG_4_3(b0, b1) \
do \
{ \
b0 = _mm_blend_epi16(m7, m5, 0xF0); \
b1 = _mm_blend_epi16(m3, m1, 0xF0); \
} while(0)


#define LOAD_MSG_4_4(b0, b1) \
do \
{ \
b0 = _mm_alignr_epi8(m6, m0, 8); \
b1 = _mm_blend_epi16(m4, m6, 0xF0); \
} while(0)


#define LOAD_MSG_5_1(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m1, m3); \
b1 = _mm_unpacklo_epi64(m0, m4); \
} while(0)


#define LOAD_MSG_5_2(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m6, m5); \
b1 = _mm_unpackhi_epi64(m5, m1); \
} while(0)


#define LOAD_MSG_5_3(b0, b1) \
do \
{ \
b0 = _mm_blend_epi16(m2, m3, 0xF0); \
b1 = _mm_unpackhi_epi64(m7, m0); \
} while(0)


#define LOAD_MSG_5_4(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m6, m2); \
b1 = _mm_blend_epi16(m7, m4, 0xF0); \
} while(0)


#define LOAD_MSG_6_1(b0, b1) \
do \
{ \
b0 = _mm_blend_epi16(m6, m0, 0xF0); \
b1 = _mm_unpacklo_epi64(m7, m2); \
} while(0)


#define LOAD_MSG_6_2(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m2, m7); \
b1 = _mm_alignr_epi8(m5, m6, 8); \
} while(0)


#define LOAD_MSG_6_3(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m0, m3); \
b1 = _mm_shuffle_epi32(m4, _MM_SHUFFLE(1,0,3,2)); \
} while(0)


#define LOAD_MSG_6_4(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m3, m1); \
b1 = _mm_blend_epi16(m1, m5, 0xF0); \
} while(0)


#define LOAD_MSG_7_1(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m6, m3); \
b1 = _mm_blend_epi16(m6, m1, 0xF0); \
} while(0)


#define LOAD_MSG_7_2(b0, b1) \
do \
{ \
b0 = _mm_alignr_epi8(m7, m5, 8); \
b1 = _mm_unpackhi_epi64(m0, m4); \
} while(0)


#define LOAD_MSG_7_3(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m2, m7); \
b1 = _mm_unpacklo_epi64(m4, m1); \
} while(0)


#define LOAD_MSG_7_4(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m0, m2); \
b1 = _mm_unpacklo_epi64(m3, m5); \
} while(0)


#define LOAD_MSG_8_1(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m3, m7); \
b1 = _mm_alignr_epi8(m0, m5, 8); \
} while(0)


#define LOAD_MSG_8_2(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m7, m4); \
b1 = _mm_alignr_epi8(m4, m1, 8); \
} while(0)


#define LOAD_MSG_8_3(b0, b1) \
do \
{ \
b0 = m6; \
b1 = _mm_alignr_epi8(m5, m0, 8); \
} while(0)


#define LOAD_MSG_8_4(b0, b1) \
do \
{ \
b0 = _mm_blend_epi16(m1, m3, 0xF0); \
b1 = m2; \
} while(0)


#define LOAD_MSG_9_1(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m5, m4); \
b1 = _mm_unpackhi_epi64(m3, m0); \
} while(0)


#define LOAD_MSG_9_2(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m1, m2); \
b1 = _mm_blend_epi16(m3, m2, 0xF0); \
} while(0)


#define LOAD_MSG_9_3(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m7, m4); \
b1 = _mm_unpackhi_epi64(m1, m6); \
} while(0)


#define LOAD_MSG_9_4(b0, b1) \
do \
{ \
b0 = _mm_alignr_epi8(m7, m5, 8); \
b1 = _mm_unpacklo_epi64(m6, m0); \
} while(0)


#define LOAD_MSG_10_1(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m0, m1); \
b1 = _mm_unpacklo_epi64(m2, m3); \
} while(0)


#define LOAD_MSG_10_2(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m0, m1); \
b1 = _mm_unpackhi_epi64(m2, m3); \
} while(0)


#define LOAD_MSG_10_3(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m4, m5); \
b1 = _mm_unpacklo_epi64(m6, m7); \
} while(0)


#define LOAD_MSG_10_4(b0, b1) \
do \
{ \
b0 = _mm_unpackhi_epi64(m4, m5); \
b1 = _mm_unpackhi_epi64(m6, m7); \
} while(0)


#define LOAD_MSG_11_1(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m7, m2); \
b1 = _mm_unpackhi_epi64(m4, m6); \
} while(0)


#define LOAD_MSG_11_2(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m5, m4); \
b1 = _mm_alignr_epi8(m3, m7, 8); \
} while(0)


#define LOAD_MSG_11_3(b0, b1) \
do \
{ \
b0 = _mm_shuffle_epi32(m0, _MM_SHUFFLE(1,0,3,2)); \
b1 = _mm_unpackhi_epi64(m5, m2); \
} while(0)


#define LOAD_MSG_11_4(b0, b1) \
do \
{ \
b0 = _mm_unpacklo_epi64(m6, m1); \
b1 = _mm_unpackhi_epi64(m3, m1); \
} while(0)


#endif



================================================
FILE: benchmarks/blake2b-round.h
================================================
/*
   BLAKE2 reference source code package - optimized C implementations
  
   Copyright 2012, Samuel Neves <sneves@dei.uc.pt>.  You may use this under the
   terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
   your option.  The terms of these licenses can be found at:
  
   - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
   - OpenSSL license   : https://www.openssl.org/source/license.html
   - Apache 2.0        : http://www.apache.org/licenses/LICENSE-2.0
  
   More information about the BLAKE2 hash function can be found at
   https://blake2.net.
*/
#pragma once
#ifndef __BLAKE2B_ROUND_H__
#define __BLAKE2B_ROUND_H__

#define LOADU(p)  _mm_loadu_si128( (const __m128i *)(p) )
#define STOREU(p,r) _mm_storeu_si128((__m128i *)(p), r)

#define TOF(reg) _mm_castsi128_ps((reg))
#define TOI(reg) _mm_castps_si128((reg))

#define LIKELY(x) __builtin_expect((x),1)


/* Microarchitecture-specific macros */
#ifndef HAVE_XOP
#ifdef HAVE_SSSE3
#define _mm_roti_epi64(x, c) \
    (-(c) == 32) ? _mm_shuffle_epi32((x), _MM_SHUFFLE(2,3,0,1))  \
    : (-(c) == 24) ? _mm_shuffle_epi8((x), r24) \
    : (-(c) == 16) ? _mm_shuffle_epi8((x), r16) \
    : (-(c) == 63) ? _mm_xor_si128(_mm_srli_epi64((x), -(c)), _mm_add_epi64((x), (x)))  \
    : _mm_xor_si128(_mm_srli_epi64((x), -(c)), _mm_slli_epi64((x), 64-(-(c))))
#else
#define _mm_roti_epi64(r, c) _mm_xor_si128(_mm_srli_epi64( (r), -(c) ),_mm_slli_epi64( (r), 64-(-(c)) ))
#endif
#else
/* ... */
#endif



#define G1(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1) \
  row1l = _mm_add_epi64(_mm_add_epi64(row1l, b0), row2l); \
  row1h = _mm_add_epi64(_mm_add_epi64(row1h, b1), row2h); \
  \
  row4l = _mm_xor_si128(row4l, row1l); \
  row4h = _mm_xor_si128(row4h, row1h); \
  \
  row4l = _mm_roti_epi64(row4l, -32); \
  row4h = _mm_roti_epi64(row4h, -32); \
  \
  row3l = _mm_add_epi64(row3l, row4l); \
  row3h = _mm_add_epi64(row3h, row4h); \
  \
  row2l = _mm_xor_si128(row2l, row3l); \
  row2h = _mm_xor_si128(row2h, row3h); \
  \
  row2l = _mm_roti_epi64(row2l, -24); \
  row2h = _mm_roti_epi64(row2h, -24); \
 
#define G2(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1) \
  row1l = _mm_add_epi64(_mm_add_epi64(row1l, b0), row2l); \
  row1h = _mm_add_epi64(_mm_add_epi64(row1h, b1), row2h); \
  \
  row4l = _mm_xor_si128(row4l, row1l); \
  row4h = _mm_xor_si128(row4h, row1h); \
  \
  row4l = _mm_roti_epi64(row4l, -16); \
  row4h = _mm_roti_epi64(row4h, -16); \
  \
  row3l = _mm_add_epi64(row3l, row4l); \
  row3h = _mm_add_epi64(row3h, row4h); \
  \
  row2l = _mm_xor_si128(row2l, row3l); \
  row2h = _mm_xor_si128(row2h, row3h); \
  \
  row2l = _mm_roti_epi64(row2l, -63); \
  row2h = _mm_roti_epi64(row2h, -63); \
 
#if defined(HAVE_SSSE3)
#define DIAGONALIZE(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h) \
  t0 = _mm_alignr_epi8(row2h, row2l, 8); \
  t1 = _mm_alignr_epi8(row2l, row2h, 8); \
  row2l = t0; \
  row2h = t1; \
  \
  t0 = row3l; \
  row3l = row3h; \
  row3h = t0;    \
  \
  t0 = _mm_alignr_epi8(row4h, row4l, 8); \
  t1 = _mm_alignr_epi8(row4l, row4h, 8); \
  row4l = t1; \
  row4h = t0;

#define UNDIAGONALIZE(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h) \
  t0 = _mm_alignr_epi8(row2l, row2h, 8); \
  t1 = _mm_alignr_epi8(row2h, row2l, 8); \
  row2l = t0; \
  row2h = t1; \
  \
  t0 = row3l; \
  row3l = row3h; \
  row3h = t0; \
  \
  t0 = _mm_alignr_epi8(row4l, row4h, 8); \
  t1 = _mm_alignr_epi8(row4h, row4l, 8); \
  row4l = t1; \
  row4h = t0;
#else

#define DIAGONALIZE(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h) \
  t0 = row4l;\
  t1 = row2l;\
  row4l = row3l;\
  row3l = row3h;\
  row3h = row4l;\
  row4l = _mm_unpackhi_epi64(row4h, _mm_unpacklo_epi64(t0, t0)); \
  row4h = _mm_unpackhi_epi64(t0, _mm_unpacklo_epi64(row4h, row4h)); \
  row2l = _mm_unpackhi_epi64(row2l, _mm_unpacklo_epi64(row2h, row2h)); \
  row2h = _mm_unpackhi_epi64(row2h, _mm_unpacklo_epi64(t1, t1))

#define UNDIAGONALIZE(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h) \
  t0 = row3l;\
  row3l = row3h;\
  row3h = t0;\
  t0 = row2l;\
  t1 = row4l;\
  row2l = _mm_unpackhi_epi64(row2h, _mm_unpacklo_epi64(row2l, row2l)); \
  row2h = _mm_unpackhi_epi64(t0, _mm_unpacklo_epi64(row2h, row2h)); \
  row4l = _mm_unpackhi_epi64(row4l, _mm_unpacklo_epi64(row4h, row4h)); \
  row4h = _mm_unpackhi_epi64(row4h, _mm_unpacklo_epi64(t1, t1))

#endif

#if defined(HAVE_SSE41)
#include "blake2b-load-sse41.h"
#else
#include "blake2b-load-sse2.h"
#endif

#define ROUND(r) \
  LOAD_MSG_ ##r ##_1(b0, b1); \
  G1(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1); \
  LOAD_MSG_ ##r ##_2(b0, b1); \
  G2(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1); \
  DIAGONALIZE(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h); \
  LOAD_MSG_ ##r ##_3(b0, b1); \
  G1(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1); \
  LOAD_MSG_ ##r ##_4(b0, b1); \
  G2(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h,b0,b1); \
  UNDIAGONALIZE(row1l,row2l,row3l,row4l,row1h,row2h,row3h,row4h);

#endif



================================================
FILE: benchmarks/blake2b.c
================================================
/*
   BLAKE2 reference source code package - optimized C implementations
  
   Copyright 2012, Samuel Neves <sneves@dei.uc.pt>.  You may use this under the
   terms of the CC0, the OpenSSL Licence, or the Apache Public License 2.0, at
   your option.  The terms of these licenses can be found at:
  
   - CC0 1.0 Universal : http://creativecommons.org/publicdomain/zero/1.0
   - OpenSSL license   : https://www.openssl.org/source/license.html
   - Apache 2.0        : http://www.apache.org/licenses/LICENSE-2.0
  
   More information about the BLAKE2 hash function can be found at
   https://blake2.net.
*/

#include <stdint.h>
#include <string.h>
#include <stdio.h>

#include "blake2.h"
#include "blake2-impl.h"

#include "blake2-config.h"

#ifdef _MSC_VER
#include <intrin.h> /* for _mm_set_epi64x */
#endif
#include <emmintrin.h>
#if defined(HAVE_SSSE3)
#include <tmmintrin.h>
#endif
#if defined(HAVE_SSE41)
#include <smmintrin.h>
#endif
#if defined(HAVE_AVX)
#include <immintrin.h>
#endif
#if defined(HAVE_XOP)
#include <x86intrin.h>
#endif

#include "blake2b-round.h"

static const uint64_t blake2b_IV[8] =
{
  0x6a09e667f3bcc908ULL, 0xbb67ae8584caa73bULL,
  0x3c6ef372fe94f82bULL, 0xa54ff53a5f1d36f1ULL,
  0x510e527fade682d1ULL, 0x9b05688c2b3e6c1fULL,
  0x1f83d9abfb41bd6bULL, 0x5be0cd19137e2179ULL
};

static const uint8_t blake2b_sigma[12][16] =
{
  {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 } ,
  { 14, 10,  4,  8,  9, 15, 13,  6,  1, 12,  0,  2, 11,  7,  5,  3 } ,
  { 11,  8, 12,  0,  5,  2, 15, 13, 10, 14,  3,  6,  7,  1,  9,  4 } ,
  {  7,  9,  3,  1, 13, 12, 11, 14,  2,  6,  5, 10,  4,  0, 15,  8 } ,
  {  9,  0,  5,  7,  2,  4, 10, 15, 14,  1, 11, 12,  6,  8,  3, 13 } ,
  {  2, 12,  6, 10,  0, 11,  8,  3,  4, 13,  7,  5, 15, 14,  1,  9 } ,
  { 12,  5,  1, 15, 14, 13,  4, 10,  0,  7,  6,  3,  9,  2,  8, 11 } ,
  { 13, 11,  7, 14, 12,  1,  3,  9,  5,  0, 15,  4,  8,  6,  2, 10 } ,
  {  6, 15, 14,  9, 11,  3,  0,  8, 12,  2, 13,  7,  1,  4, 10,  5 } ,
  { 10,  2,  8,  4,  7,  6,  1,  5, 15, 11,  9, 14,  3, 12, 13 , 0 } ,
  {  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 } ,
  { 14, 10,  4,  8,  9, 15, 13,  6,  1, 12,  0,  2, 11,  7,  5,  3 }
};


/* Some helper functions, not necessarily useful */
BLAKE2_LOCAL_INLINE(int) blake2b_set_lastnode( blake2b_state *S )
{
  S->f[1] = -1;
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_clear_lastnode( blake2b_state *S )
{
  S->f[1] = 0;
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_is_lastblock( const blake2b_state *S )
{
  return S->f[0] != 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_set_lastblock( blake2b_state *S )
{
  if( S->last_node ) blake2b_set_lastnode( S );

  S->f[0] = -1;
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_clear_lastblock( blake2b_state *S )
{
  if( S->last_node ) blake2b_clear_lastnode( S );

  S->f[0] = 0;
  return 0;
}


BLAKE2_LOCAL_INLINE(int) blake2b_increment_counter( blake2b_state *S, const uint64_t inc )
{
#if defined(__x86_64__)
  /* ADD/ADC chain */
  __uint128_t t = ( ( __uint128_t )S->t[1] << 64 ) | S->t[0];
  t += inc;
  S->t[0] = ( uint64_t )( t >>  0 );
  S->t[1] = ( uint64_t )( t >> 64 );
#else
  S->t[0] += inc;
  S->t[1] += ( S->t[0] < inc );
#endif
  return 0;
}
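
/*
   The non-x86_64 branch above relies on unsigned wrap-around: after the low
   word is incremented, it is smaller than inc exactly when the addition
   overflowed, so the comparison yields the carry for the high word. A
   minimal standalone sketch of that idea follows. The type counter128 and
   the function counter128_add are hypothetical names used only for this
   illustration.

```c
#include <stdint.h>

// Hypothetical standalone counter: t[0] is the low word, t[1] the high word.
typedef struct { uint64_t t[2]; } counter128;

// Portable add-with-carry: once the low word has been updated, it is
// smaller than inc exactly when the unsigned addition wrapped around.
static void counter128_add( counter128 *c, uint64_t inc )
{
  c->t[0] += inc;
  c->t[1] += ( c->t[0] < inc );
}
```

   For example, adding 3 to a counter whose low word is UINT64_MAX - 1 wraps
   the low word to 1 and carries 1 into the high word.
*/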


/* Parameter-related functions */
BLAKE2_LOCAL_INLINE(int) blake2b_param_set_digest_length( blake2b_param *P, const uint8_t digest_length )
{
  P->digest_length = digest_length;
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_param_set_fanout( blake2b_param *P, const uint8_t fanout )
{
  P->fanout = fanout;
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_param_set_max_depth( blake2b_param *P, const uint8_t depth )
{
  P->depth = depth;
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_param_set_leaf_length( blake2b_param *P, const uint32_t leaf_length )
{
  P->leaf_length = leaf_length;
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_param_set_node_offset( blake2b_param *P, const uint64_t node_offset )
{
  P->node_offset = node_offset;
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_param_set_node_depth( blake2b_param *P, const uint8_t node_depth )
{
  P->node_depth = node_depth;
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_param_set_inner_length( blake2b_param *P, const uint8_t inner_length )
{
  P->inner_length = inner_length;
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_param_set_salt( blake2b_param *P, const uint8_t salt[BLAKE2B_SALTBYTES] )
{
  memcpy( P->salt, salt, BLAKE2B_SALTBYTES );
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_param_set_personal( blake2b_param *P, const uint8_t personal[BLAKE2B_PERSONALBYTES] )
{
  memcpy( P->personal, personal, BLAKE2B_PERSONALBYTES );
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_init0( blake2b_state *S )
{
  memset( S, 0, sizeof( blake2b_state ) );

  for( int i = 0; i < 8; ++i ) S->h[i] = blake2b_IV[i];

  return 0;
}

/* init xors IV with input parameter block */
int blake2b_init_param( blake2b_state *S, const blake2b_param *P )
{
  /*blake2b_init0( S ); */
  const uint8_t * v = ( const uint8_t * )( blake2b_IV );
  const uint8_t * p = ( const uint8_t * )( P );
  uint8_t * h = ( uint8_t * )( S->h );
  /* IV XOR ParamBlock */
  memset( S, 0, sizeof( blake2b_state ) );

  for( int i = 0; i < BLAKE2B_OUTBYTES; ++i ) h[i] = v[i] ^ p[i];

  return 0;
}


/* Default parameter block initialization for sequential BLAKE2b */
int blake2b_init( blake2b_state *S, const uint8_t outlen )
{
  const blake2b_param P =
  {
    outlen,
    0,
    1,
    1,
    0,
    0,
    0,
    0,
    {0},
    {0},
    {0}
  };

  if ( ( !outlen ) || ( outlen > BLAKE2B_OUTBYTES ) ) return -1;

  return blake2b_init_param( S, &P );
}

int blake2b_init_key( blake2b_state *S, const uint8_t outlen, const void *key, const uint8_t keylen )
{
  const blake2b_param P =
  {
    outlen,
    keylen,
    1,
    1,
    0,
    0,
    0,
    0,
    {0},
    {0},
    {0}
  };

  if ( ( !outlen ) || ( outlen > BLAKE2B_OUTBYTES ) ) return -1;

  if ( ( !keylen ) || keylen > BLAKE2B_KEYBYTES ) return -1;

  if( blake2b_init_param( S, &P ) < 0 )
    return -1; /* propagate the failure instead of reporting success */

  {
    uint8_t block[BLAKE2B_BLOCKBYTES];
    memset( block, 0, BLAKE2B_BLOCKBYTES );
    memcpy( block, key, keylen );
    blake2b_update( S, block, BLAKE2B_BLOCKBYTES );
    secure_zero_memory( block, BLAKE2B_BLOCKBYTES ); /* Burn the key from stack */
  }
  return 0;
}

BLAKE2_LOCAL_INLINE(int) blake2b_compress( blake2b_state *S, const uint8_t block[BLAKE2B_BLOCKBYTES] )
{
  __m128i row1l, row1h;
  __m128i row2l, row2h;
  __m128i row3l, row3h;
  __m128i row4l, row4h;
  __m128i b0, b1;
  __m128i t0, t1;
#if defined(HAVE_SSSE3) && !defined(HAVE_XOP)
  const __m128i r16 = _mm_setr_epi8( 2, 3, 4, 5, 6, 7, 0, 1, 10, 11, 12, 13, 14, 15, 8, 9 );
  const __m128i r24 = _mm_setr_epi8( 3, 4, 5, 6, 7, 0, 1, 2, 11, 12, 13, 14, 15, 8, 9, 10 );
#endif
#if defined(HAVE_SSE41)
  const __m128i m0 = LOADU( block + 00 );
  const __m128i m1 = LOADU( block + 16 );
  const __m128i m2 = LOADU( block + 32 );
  const __m128i m3 = LOADU( block + 48 );
  const __m128i m4 = LOADU( block + 64 );
  const __m128i m5 = LOADU( block + 80 );
  const __m128i m6 = LOADU( block + 96 );
  const __m128i m7 = LOADU( block + 112 );
#else
  /* Keep the const qualifier of block when reading the message words */
  const uint64_t  m0 = ( ( const uint64_t * )block )[ 0];
  const uint64_t  m1 = ( ( const uint64_t * )block )[ 1];
  const uint64_t  m2 = ( ( const uint64_t * )block )[ 2];
  const uint64_t  m3 = ( ( const uint64_t * )block )[ 3];
  const uint64_t  m4 = ( ( const uint64_t * )block )[ 4];
  const uint64_t  m5 = ( ( const uint64_t * )block )[ 5];
  const uint64_t  m6 = ( ( const uint64_t * )block )[ 6];
  const uint64_t  m7 = ( ( const uint64_t * )block )[ 7];
  const uint64_t  m8 = ( ( const uint64_t * )block )[ 8];
  const uint64_t  m9 = ( ( const uint64_t * )block )[ 9];
  const uint64_t m10 = ( ( const uint64_t * )block )[10];
  const uint64_t m11 = ( ( const uint64_t * )block )[11];
  const uint64_t m12 = ( ( const uint64_t * )block )[12];
  const uint64_t m13 = ( ( const uint64_t * )block )[13];
  const uint64_t m14 = ( ( const uint64_t * )block )[14];
  const uint64_t m15 = ( ( const uint64_t * )block )[15];
#endif
  row1l = LOADU( &S->h[0] );
  row1h = LOADU( &S->h[2] );
  row2l = LOADU( &S->h[4] );
  row2h = LOADU( &S->h[6] );
  row3l = LOADU( &blake2b_IV[0] );
  row3h = LOADU( &blake2b_IV[2] );
  row4l = _mm_xor_si128( LOADU( &blake2b_IV[4] ), LOADU( &S->t[0] ) );
  row4h = _mm_xor_si128( LOADU( &blake2b_IV[6] ), LOADU( &S->f[0] ) );
  ROUND( 0 );
  ROUND( 1 );
  ROUND( 2 );
  ROUND( 3 );
  ROUND( 4 );
  ROUND( 5 );
  ROUND( 6 );
  ROUND( 7 );
  ROUND( 8 );
  ROUND( 9 );
  ROUND( 10 );
  ROUND( 11 );
  row1l = _mm_xor_si128( row3l, row1l );
  row1h = _mm_xor_si128( row3h, row1h );
  STOREU( &S->h[0], _mm_xor_si128( LOADU( &S->h[0] ), row1l ) );
  STOREU( &S->h[2], _mm_xor_si128( LOADU( &S->h[2] ), row1h ) );
  row2l = _mm_xor_si128( row4l, row2l );
  row2h = _mm_xor_si128( row4h, row2h );
  STOREU( &S->h[4], _mm_xor_si128( LOADU( &S->h[4] ), row2l ) );
  STOREU( &S->h[6], _mm_xor_si128( LOADU( &S->h[6] ), row2h ) );
  return 0;
}


int blake2b_update( blake2b_state *S, const uint8_t *in, uint64_t inlen )
{
  while( inlen > 0 )
  {
    size_t left = S->buflen;
    size_t fill = 2 * BLAKE2B_BLOCKBYTES - left;

    if( inlen > fill )
    {
      memcpy( S->buf + left, in, fill ); /* Fill buffer */
      S->buflen += fill;
      blake2b_increment_counter( S, BLAKE2B_BLOCKBYTES );
      blake2b_compress( S, S->buf ); /* Compress */
      memcpy( S->buf, S->buf + BLAKE2B_BLOCKBYTES, BLAKE2B_BLOCKBYTES ); /* Shift buffer left */
      S->buflen -= BLAKE2B_BLOCKBYTES;
      in += fill;
      inlen -= fill;
    }
    else /* inlen <= fill */
    {
      memcpy( S->buf + left, in, inlen );
      S->buflen += inlen; /* Be lazy, do not compress */
      in += inlen;
      inlen -= inlen;
    }
  }

  return 0;
}
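
/*
   blake2b_update above buffers up to two blocks and compresses only when
   more input arrives than the buffer can hold, so some data always stays
   buffered for blake2b_final to process as the last block. The sketch
   below models just that bookkeeping and does no hashing. The names
   sketch_update and SKETCH_BLOCKBYTES are hypothetical and exist only for
   this illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_BLOCKBYTES = 128 };  // same value as BLAKE2B_BLOCKBYTES

// Hypothetical model of the buffer bookkeeping only: no hashing is done.
// Returns how many blocks would be compressed while absorbing inlen bytes.
static size_t sketch_update( size_t *buflen, uint64_t inlen )
{
  size_t compressed = 0;

  while( inlen > 0 )
  {
    size_t fill = 2 * SKETCH_BLOCKBYTES - *buflen;

    if( inlen > fill )
    {
      // Buffer fills up and more input follows: compress one block and
      // keep the second block buffered.
      *buflen += fill;
      ++compressed;
      *buflen -= SKETCH_BLOCKBYTES;
      inlen -= fill;
    }
    else
    {
      // Everything fits: buffer it and compress nothing.
      *buflen += ( size_t )inlen;
      inlen = 0;
    }
  }

  return compressed;
}
```

   Starting from an empty buffer, absorbing exactly 256 bytes compresses
   nothing; one further byte triggers a single compression and leaves 129
   bytes buffered.
*/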


int blake2b_final( blake2b_state *S, uint8_t *out, uint8_t outlen )
{
  if( outlen > BLAKE2B_OUTBYTES )
    return -1;

  if( blake2b_is_lastblock( S ) )
    return -1;

  if( S->buflen > BLAKE2B_BLOCKBYTES )
  {
    blake2b_increment_counter( S, BLAKE2B_BLOCKBYTES );
    blake2b_compress( S, S->buf );
    S->buflen -= BLAKE2B_BLOCKBYTES;
    memcpy( S->buf, S->buf + BLAKE2B_BLOCKBYTES, S->buflen );
  }

  blake2b_increment_counter( S, S->buflen );
  blake2b_set_lastblock( S );
  memset( S->buf + S->buflen, 0, 2 * BLAKE2B_BLOCKBYTES - S->buflen ); /* Padding */
  blake2b_compress( S, S->buf );
  memcpy( out, &S->h[0], outlen );
  return 0;
}


int blake2b( uint8_t *out, const void *in, const void *key, const uint8_t outlen, const uint64_t inlen, uint8_t keylen )
{
  blake2b_state S[1];

  /* Verify parameters */
  if ( NULL == in && inlen > 0 ) return -1;

  if ( NULL == out ) return -1;

  if( NULL == key && keylen > 0 ) return -1;

  if( !outlen || outlen > BLAKE2B_OUTBYTES ) return -1;

  if( keylen > BLAKE2B_KEYBYTES ) return -1;

  if( keylen )
  {
    if( blake2b_init_key( S, outlen, key, keylen ) < 0 ) return -1;
  }
  else
  {
    if( blake2b_init( S, outlen ) < 0 ) return -1;
  }

  blake2b_update( S, ( const uint8_t * )in, inlen );
  blake2b_final( S, out, outlen );
  return 0;
}

#if defined(SUPERCOP)
int crypto_hash( unsigned char *out, unsigned char *in, unsigned long long inlen )
{
  return blake2b( out, in, NULL, BLAKE2B_OUTBYTES, inlen, 0 );
}
#endif

#if defined(BLAKE2B_SELFTEST)
#include <string.h>
#include "blake2-kat.h"
int main( int argc, char **argv )
{
  uint8_t key[BLAKE2B_KEYBYTES];
  uint8_t buf[KAT_LENGTH];

  for( size_t i = 0; i < BLAKE2B_KEYBYTES; ++i )
    key[i] = ( uint8_t )i;

  for( size_t i = 0; i < KAT_LENGTH; ++i )
    buf[i] = ( uint8_t )i;

  for( size_t i = 0; i < KAT_LENGTH; ++i )
  {
    uint8_t hash[BLAKE2B_OUTBYTES];
    blake2b( hash, buf, key, BLAKE2B_OUTBYTES, i, BLAKE2B_KEYBYTES );

    if( 0 != memcmp( hash, blake2b_keyed_kat[i], BLAKE2B_OUTBYTES ) )
    {
      puts( "error" );
      return -1;
    }
  }

  puts( "ok" );
  return 0;
}
#endif



================================================
FILE: benchmarks/byte_order.c
================================================
/* byte_order.c - byte order related platform dependent routines,
 *
 * Copyright: 2008-2012 Aleksey Kravchenko <rhash.admin@gmail.com>
 *
 * Permission is hereby granted,  free of charge,  to any person  obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction,  including without limitation
 * the rights to  use, copy, modify,  merge, publish, distribute, sublicense,
 * and/or sell copies  of  the Software,  and to permit  persons  to whom the
 * Software is furnished to do so.
 *
 * This program  is  distributed  in  the  hope  that it will be useful,  but
 * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
 * or FITNESS FOR A PARTICULAR PURPOSE.  Use this program  at  your own risk!
 */
#include "byte_order.h"

#if !(__GNUC__ >= 4 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4)) /* if !GCC or GCC < 3.4 */

#  if _MSC_VER >= 1300 && (_M_IX86 || _M_AMD64 || _M_IA64) /* if MSVC++ >= 2002 on x86/x64 */
#  include <intrin.h>
#  pragma intrinsic(_BitScanForward)

/**
 * Returns index of the trailing bit of x.
 *
 * @param x the number to process
 * @return zero-based index of the trailing bit
 */
unsigned rhash_ctz(unsigned x)
{
	unsigned long index;
	unsigned char isNonzero = _BitScanForward(&index, x); /* MSVC intrinsic */
	return (isNonzero ? (unsigned)index : 0);
}
#  else /* _MSC_VER >= 1300... */

/**
 * Returns index of the trailing bit of a 32-bit number.
 * This is a plain C equivalent for GCC __builtin_ctz() bit scan.
 *
 * @param x the number to process
 * @return zero-based index of the trailing bit
 */
unsigned rhash_ctz(unsigned x)
{
	/* array for conversion to bit position */
	static unsigned char bit_pos[32] =  {
		0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
		31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
	};

	/* According to Donald Knuth, the De Bruijn bit-scan was devised in 1997
	 * by Martin Läuter. The constant 0x077CB531UL is a De Bruijn sequence:
	 * multiplying it by an isolated single bit (x & -x) moves a unique
	 * 5-bit pattern into the high 5 bits for each possible bit position,
	 * and that pattern indexes the bit_pos table.
	 * See http://graphics.stanford.edu/~seander/bithacks.html
	 * and http://chessprogramming.wikispaces.com/BitScan */
	return (unsigned)bit_pos[((uint32_t)((x & -x) * 0x077CB531U)) >> 27];
}
#  endif /* _MSC_VER >= 1300... */
#endif /* !(GCC >= 3.4) */
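
/* Illustrative sketch, not part of the original RHash source: the De Bruijn
   lookup used by the plain C rhash_ctz() fallback can be exercised on its
   own.  The name example_ctz32_debruijn is hypothetical; the table and the
   constant are copied from the fallback, and x == 0 simply yields table
   entry 0.

```c
#include <stdint.h>

// Index of the lowest set bit of x via a De Bruijn multiply-and-lookup.
static inline unsigned example_ctz32_debruijn(uint32_t x)
{
	static const unsigned char bit_pos[32] = {
		0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
		31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
	};
	// Isolate the lowest set bit; 0u - x is the unsigned form of -x.
	uint32_t lowest = x & (uint32_t)(0u - x);
	// The multiply shifts the De Bruijn sequence left by the bit index,
	// leaving a unique 5-bit pattern in the top bits.
	return bit_pos[(uint32_t)(lowest * 0x077CB531U) >> 27];
}
```

   For example, example_ctz32_debruijn(8) gives 3 and
   example_ctz32_debruijn(0x80000000) gives 31. */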

/**
 * Copy a memory block with simultaneous exchanging byte order.
 * The byte order is changed from little-endian 32-bit integers
 * to big-endian (or vice-versa).
 *
 * @param to the pointer where to copy memory block
 * @param index the index to start writing from
 * @param from  the source block to copy
 * @param length length of the memory block
 */
void rhash_swap_copy_str_to_u32(void* to, int index, const void* from, size_t length)
{
	/* if all pointers and length are 32-bits aligned */
	if ( 0 == (( (int)((char*)to - (char*)0) | ((char*)from - (char*)0) | index | length ) & 3) ) {
		/* copy memory as 32-bit words */
		const uint32_t* src = (const uint32_t*)from;
		const uint32_t* end = (const uint32_t*)((const char*)src + length);
		uint32_t* dst = (uint32_t*)((char*)to + index);
		while (src < end) *(dst++) = bswap_32( *(src++) );
	} else {
		const char* src = (const char*)from;
		for (length += index; (size_t)index < length; index++) ((char*)to)[index ^ 3] = *(src++);
	}
}
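
/* Illustrative sketch, not part of the original RHash source: the unaligned
   branch above relies on the fact that XOR-ing a byte index with 3 mirrors
   it inside its 32-bit group (0<->3, 1<->2).  The helper name
   example_swap_copy32 is hypothetical, and the sketch assumes that len is a
   multiple of 4 so every written index stays inside the buffer.

```c
#include <stddef.h>

// Copy len bytes while reversing the byte order inside each 4-byte group:
// source byte i lands at destination index i ^ 3.
static inline void example_swap_copy32(unsigned char *dst, const unsigned char *src, size_t len)
{
	size_t i;
	for (i = 0; i < len; i++) dst[i ^ 3] = src[i];
}
```

   Copying the bytes 11 22 33 44 55 66 77 88 this way produces
   44 33 22 11 88 77 66 55. */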

/**
 * Copy a memory block with changed byte order.
 * The byte order is changed from little-endian 64-bit integers
 * to big-endian (or vice-versa).
 *
 * @param to     the pointer where to copy memory block
 * @param index  the index to start writing from
 * @param from   the source block to copy
 * @param length length of the memory block
 */
void rhash_swap_copy_str_to_u64(void* to, int index, const void* from, size_t length)
{
	/* if all pointers and length are 64-bits aligned */
	if ( 0 == (( (int)((char*)to - (char*)0) | ((char*)from - (char*)0) | index | length ) & 7) ) {
		/* copy aligned memory block as 64-bit integers */
		const uint64_t* src = (const uint64_t*)from;
		const uint64_t* end = (const uint64_t*)((const char*)src + length);
		uint64_t* dst = (uint64_t*)((char*)to + index);
		while (src < end) *(dst++) = bswap_64( *(src++) );
	} else {
		const char* src = (const char*)from;
		for (length += index; (size_t)index < length; index++) ((char*)to)[index ^ 7] = *(src++);
	}
}

/**
 * Copy data from a sequence of 64-bit words to a binary string of given length,
 * while changing byte order.
 *
 * @param to     the binary string to receive data
 * @param from   the source sequence of 64-bit words
 * @param length the size in bytes of the data being copied
 */
void rhash_swap_copy_u64_to_str(void* to, const void* from, size_t length)
{
	/* if all pointers and length are 64-bits aligned */
	if ( 0 == (( (int)((char*)to - (char*)0) | ((char*)from - (char*)0) | length ) & 7) ) {
		/* copy aligned memory block as 64-bit integers */
		const uint64_t* src = (const uint64_t*)from;
		const uint64_t* end = (const uint64_t*)((const char*)src + length);
		uint64_t* dst = (uint64_t*)to;
		while (src < end) *(dst++) = bswap_64( *(src++) );
	} else {
		size_t index;
		char* dst = (char*)to;
		for (index = 0; index < length; index++) *(dst++) = ((char*)from)[index ^ 7];
	}
}

/**
 * Exchange byte order in the given array of 32-bit integers.
 *
 * @param arr    the array to process
 * @param length array length
 */
void rhash_u32_mem_swap(unsigned *arr, int length)
{
	unsigned* end = arr + length;
	for (; arr < end; arr++) {
		*arr = bswap_32(*arr);
	}
}


================================================
FILE: benchmarks/byte_order.h
================================================
/* byte_order.h */
#ifndef BYTE_ORDER_H
#define BYTE_ORDER_H
#include "ustd.h"
#include <stdlib.h>

#ifdef IN_RHASH
#include "config.h"
#endif

#ifdef __GLIBC__
# include <endian.h>
#endif

#ifdef __cplusplus
extern "C" {
#endif

/* if x86 compatible cpu */
#if defined(i386) || defined(__i386__) || defined(__i486__) || \
	defined(__i586__) || defined(__i686__) || defined(__pentium__) || \
	defined(__pentiumpro__) || defined(__pentium4__) || \
	defined(__nocona__) || defined(prescott) || defined(__core2__) || \
	defined(__k6__) || defined(__k8__) || defined(__athlon__) || \
	defined(__amd64) || defined(__amd64__) || \
	defined(__x86_64) || defined(__x86_64__) || defined(_M_IX86) || \
	defined(_M_AMD64) || defined(_M_IA64) || defined(_M_X64)
/* detect if x86-64 instruction set is supported */
# if defined(_LP64) || defined(__LP64__) || defined(__x86_64) || \
	defined(__x86_64__) || defined(_M_AMD64) || defined(_M_X64)
#  define CPU_X64
# else
#  define CPU_IA32
# endif
#endif


/* detect CPU endianness */
#if (defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && \
		__BYTE_ORDER == __LITTLE_ENDIAN) || \
	defined(CPU_IA32) || defined(CPU_X64) || \
	defined(__ia64) || defined(__ia64__) || defined(__alpha__) || defined(_M_ALPHA) || \
	defined(vax) || defined(MIPSEL) || defined(_ARM_) || defined(__arm__)
# define CPU_LITTLE_ENDIAN
# define IS_BIG_ENDIAN 0
# define IS_LITTLE_ENDIAN 1
#elif (defined(__BYTE_ORDER) && defined(__BIG_ENDIAN) && \
		__BYTE_ORDER == __BIG_ENDIAN) || \
	defined(__sparc) || defined(__sparc__) || defined(sparc) || \
	defined(_ARCH_PPC) || defined(_ARCH_PPC64) || defined(_POWER) || \
	defined(__POWERPC__) || defined(POWERPC) || defined(__powerpc) || \
	defined(__powerpc__) || defined(__powerpc64__) || defined(__ppc__) || \
	defined(__hpux)  || defined(_MIPSEB) || defined(mc68000) || \
	defined(__s390__) || defined(__s390x__) || defined(sel)
# define CPU_BIG_ENDIAN
# define IS_BIG_ENDIAN 1
# define IS_LITTLE_ENDIAN 0
#else
# error "Can't detect CPU architecture"
#endif

#define IS_ALIGNED_32(p) (0 == (3 & ((const char*)(p) - (const char*)0)))
#define IS_ALIGNED_64(p) (0 == (7 & ((const char*)(p) - (const char*)0)))

#if defined(_MSC_VER)
#define ALIGN_ATTR(n) __declspec(align(n))
#elif defined(__GNUC__)
#define ALIGN_ATTR(n) __attribute__((aligned (n)))
#else
#define ALIGN_ATTR(n) /* nothing */
#endif


#if defined(_MSC_VER) || defined(__BORLANDC__)
#define I64(x) x##ui64
#else
#define I64(x) x##LL
#endif

/* convert a hash flag to index */
#if __GNUC__ >= 4 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4) /* GCC >= 3.4 */
# define rhash_ctz(x) __builtin_ctz(x)
#else
unsigned rhash_ctz(unsigned); /* define as function */
#endif

void rhash_swap_copy_str_to_u32(void* to, int index, const void* from, size_t length);
void rhash_swap_copy_str_to_u64(void* to, int index, const void* from, size_t length);
void rhash_swap_copy_u64_to_str(void* to, const void* from, size_t length);
void rhash_u32_mem_swap(unsigned *p, int length_in_u32);

/* define bswap_32 */
#if defined(__GNUC__) && defined(CPU_IA32) && !defined(__i386__)
/* for intel x86 CPU */
static inline uint32_t bswap_32(uint32_t x) {
	__asm("bswap\t%0" : "=r" (x) : "0" (x));
	return x;
}
#elif defined(__GNUC__)  && (__GNUC__ >= 4) && (__GNUC__ > 4 || __GNUC_MINOR__ >= 3)
/* for GCC >= 4.3 */
# define bswap_32(x) __builtin_bswap32(x)
#elif (_MSC_VER > 1300) && (defined(CPU_IA32) || defined(CPU_X64)) /* MS VC */
# define bswap_32(x) _byteswap_ulong((unsigned long)x)
#elif !defined(__STRICT_ANSI__)
/* general bswap_32 definition */
static inline uint32_t bswap_32(uint32_t x) {
	x = ((x << 8) & 0xFF00FF00) | ((x >> 8) & 0x00FF00FF);
	return (x >> 16) | (x << 16);
}
#else
#define bswap_32(x) ((((x) & 0xff000000) >> 24) | (((x) & 0x00ff0000) >>  8) | \
	(((x) & 0x0000ff00) <<  8) | (((x) & 0x000000ff) << 24))
#endif /* bswap_32 */

#if defined(__GNUC__) && (__GNUC__ >= 4) && (__GNUC__ > 4 || __GNUC_MINOR__ >= 3)
# define bswap_64(x) __builtin_bswap64(x)
#elif (_MSC_VER > 1300) && (defined(CPU_IA32) || defined(CPU_X64)) /* MS VC */
# define bswap_64(x) _byteswap_uint64((__int64)x)
#elif !defined(__STRICT_ANSI__)
static inline uint64_t bswap_64(uint64_t x) {
	union {
		uint64_t ll;
		uint32_t l[2];
	} w, r;
	w.ll = x;
	r.l[0] = bswap_32(w.l[1]);
	r.l[1] = bswap_32(w.l[0]);
	return r.ll;
}
#else
#error "bswap_64 unsupported"
#endif

#ifdef CPU_BIG_ENDIAN
# define be2me_32(x) (x)
# define be2me_64(x) (x)
# define le2me_32(x) bswap_32(x)
# define le2me_64(x) bswap_64(x)

# define be32_copy(to, index, from, length) memcpy((to) + (index), (from), (length))
# define le32_copy(to, index, from, length) rhash_swap_copy_str_to_u32((to), (index), (from), (length))
# define be64_copy(to, index, from, length) memcpy((to) + (index), (from), (length))
# define le64_copy(to, index, from, length) rhash_swap_copy_str_to_u64((to), (index), (from), (length))
# define me64_to_be_str(to, from, length) memcpy((to), (from), (length))
# define me64_to_le_str(to, from, length) rhash_swap_copy_u64_to_str((to), (from), (length))

#else /* CPU_BIG_ENDIAN */
# define be2me_32(x) bswap_32(x)
# define be2me_64(x) bswap_64(x)
# define le2me_32(x) (x)
# define le2me_64(x) (x)

# define be32_copy(to, index, from, length) rhash_swap_copy_str_to_u32((to), (index), (from), (length))
# define le32_copy(to, index, from, length) memcpy((to) + (index), (from), (length))
# define be64_copy(to, index, from, length) rhash_swap_copy_str_to_u64((to), (index), (from), (length))
# define le64_copy(to, index, from, length) memcpy((to) + (index), (from), (length))
# define me64_to_be_str(to, from, length) rhash_swap_copy_u64_to_str((to), (from), (length))
# define me64_to_le_str(to, from, length) memcpy((to), (from), (length))
#endif /* CPU_BIG_ENDIAN */

/* ROTL/ROTR macros rotate a 32/64-bit word left/right by n bits */
#define ROTL32(dword, n) ((dword) << (n) ^ ((dword) >> (32 - (n))))
#define ROTR32(dword, n) ((dword) >> (n) ^ ((dword) << (32 - (n))))
#define ROTL64(qword, n) ((qword) << (n) ^ ((qword) >> (64 - (n))))
#define ROTR64(qword, n) ((qword) >> (n) ^ ((qword) << (64 - (n))))

#ifdef __cplusplus
} /* extern "C" */
#endif /* __cplusplus */

#endif /* BYTE_ORDER_H */


================================================
FILE: benchmarks/chacha-prng.h
================================================
/* Copyright (c) 2016 Vladimir Makarov <vmakarov@gcc.gnu.org>

   Permission is hereby granted, free of charge, to any person
   obtaining a copy of this software and associated documentation
   files (the "Software"), to deal in the Software without
   restriction, including without limitation the rights to use, copy,
   modify, merge, publish, distribute, sublicense, and/or sell copies
   of the Software, and to permit persons to whom the Software is
   furnished to do so, subject to the following conditions:

   The above copyright notice and this permission notice shall be
   included in all copies or substantial portions of the Software.

   THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
   NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
   BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
   ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
   CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
   SOFTWARE.
*/

/* Pseudo Random Number Generator (PRNG) based on ChaCha stream cipher
   designed by D.J. Bernstein.  The code is ChaCha reference code
   (please see https://cr.yp.to/chacha.html) adapted for our purposes.

   It is a cryptographic-level PRNG, like the stream cipher on which
   it is based.

   To use the generator, call `init_chacha_prng` first, then call
   `get_chacha_prn` as many times as you want to get a new PRN.  At
   the end of the PRNG use, call `finish_chacha_prng`.  You can change
   the default seed by calling `set_chacha_prng_seed`.

   The PRNG passes NIST Statistical Test Suite for Random and
   Pseudorandom Number Generators for Cryptographic Applications
   (version 2.2.1) with 1000 bitstreams each containing 1M bits.

   The generation of a new number takes about 40 CPU cycles on x86_64
   (Intel 4.2GHz i7-4790K), or speed of the generation is about 106M
   numbers per sec.  */

#ifndef __CHACHA_PRNG__
#define __CHACHA_PRNG__

#ifdef _MSC_VER
typedef unsigned __int32 uint32_t;
typedef unsigned __int64 uint64_t;
#else
#include <stdint.h>
#endif

#include <stdlib.h>

static inline uint32_t
_chacha_prng_rotl (uint32_t v, int c) {
  return (v << c) | (v >> (32 - c));
}

/* ChaCha state transformation step.  */
static inline void
_chacha_prng_quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
  *a += *b; *d = _chacha_prng_rotl (*d ^ *a, 16);
  *c += *d; *b = _chacha_prng_rotl (*b ^ *c, 12);
  *a += *b; *d = _chacha_prng_rotl (*d ^ *a, 8);
  *c += *d; *b = _chacha_prng_rotl (*b ^ *c, 7);
}
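
/* Illustrative sketch, not part of the original header: one way to
   sanity-check the add-xor-rotate sequence of the quarter round is
   to run it on the quarter-round test vector from RFC 7539 (section
   2.1.1).  All example_* names are hypothetical, and the two helpers
   are copies of the rotation and quarter round above so that the
   sketch compiles on its own.

```c
#include <stdint.h>

// Rotate a 32-bit word left by c bits (0 < c < 32).
static inline uint32_t
example_rotl32 (uint32_t v, int c) {
  return (v << c) | (v >> (32 - c));
}

// The same add-xor-rotate sequence as the quarter round above.
static inline void
example_quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
  *a += *b; *d = example_rotl32 (*d ^ *a, 16);
  *c += *d; *b = example_rotl32 (*b ^ *c, 12);
  *a += *b; *d = example_rotl32 (*d ^ *a, 8);
  *c += *d; *b = example_rotl32 (*b ^ *c, 7);
}

// Return 1 if the quarter round reproduces the RFC 7539 test vector.
static inline int
example_quarter_round_selfcheck (void) {
  uint32_t a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567;

  example_quarter_round (&a, &b, &c, &d);
  return a == 0xea2a92f4 && b == 0xcb1cf8ce
         && c == 0x4581472e && d == 0x5881c4bb;
}
```

   A caller could assert that example_quarter_round_selfcheck ()
   returns 1 before trusting a modified rotation schedule.  */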

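/* Major ChaCha state transformation: 4 double rounds (8 rounds in
   total) followed by adding the input words to the result.  Despite
   its name, the function applies ChaCha quarter rounds.  */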
static inline void
_chacha_prng_salsa20 (uint32_t output[16], const uint32_t input[16]) {
  int i;

  for (i = 0; i < 16; i++)
    output[i] = input[i];
  for (i = 8; i > 0; i -= 2) {
    _chacha_prng_quarter_round (&output[0], &output[4], &output[8],&output[12]);
    _chacha_prng_quarter_round (&output[1], &output[5], &output[9],&output[13]);
    _chacha_prng_quarter_round (&output[2], &output[6],&output[10],&output[14]);
    _chacha_prng_quarter_round (&output[3], &output[7],&output[11],&output[15]);
    _chacha_prng_quarter_round (&output[0], &output[5],&output[10],&output[15]);
    _chacha_prng_quarter_round (&output[1], &output[6],&output[11],&output[12]);
    _chacha_prng_quarter_round (&output[2], &output[7], &output[8],&output[13]);
    _chacha_prng_quarter_round (&output[3], &output[4], &output[9],&output[14]);
  }
  for (i = 0; i < 16; i++)
    output[i] += input[i];
}

/* Internal state of the PRNG.  */
static struct {
  int ind; /* position in the output */
  /* The current PRNG state and parts of the recently generated
     numbers.  */
  uint32_t input[16], output[16];
} _chacha_prng_state;

/* Some random prime numbers.  */
static const uint32_t chacha_prng_primes[4] = {0xfa835867, 0x2086ca69, 0x1467c0fb, 0x638e2b99};

/* Internal function to set ChaCha PRNG seed by K and IV.  */
static inline void
_set_chacha_prng_key_iv (uint32_t k[8], uint32_t iv[2]) {
  int i;
  
  for (i = 0; i < 8; i++)
    _chacha_prng_state.input[i + 4] = k[i];
  for (i = 0; i < 4; i++)
    _chacha_prng_state.input[i] = chacha_prng_primes[i];
  _chacha_prng_state.input[12] = 0;
  _chacha_prng_state.input[13] = 0;
  _chacha_prng_state.input[14] = iv[0];
  _chacha_prng_state.input[15] = iv[1];
}

/* Internal function to initiate ChaCha PRNG with seed given by K and
   IV.  */
static inline void
_init_chacha_prng_with_key_iv (uint32_t k[8], uint32_t iv[2]) {
  _set_chacha_prng_key_iv (k, iv);
  _chacha_prng_state.ind = 16;
}

/* Initiate the PRNG with some random seed.  */
static inline void
init_chacha_prng (void) {
  int i;
  uint32_t k[8], iv[2];

  for (i = 0; i < 8; i++)
    k[i] = rand ();
  iv[0] = rand ();
  iv[1] = rand ();
  _init_chacha_prng_with_key_iv (k, iv);
}

/* Set ChaCha PRNG SEED.  */
static inline void
set_chacha_prng_seed (uint32_t seed) {
  static uint32_t k[8] = {0, 0, 0, 0};
  static uint32_t iv[2] = {0, 0};
  
  k[0] = seed;
  _set_chacha_prng_key_iv (k, iv);
}

/* Return the next pseudo-random number.  */
static inline uint64_t
get_chacha_prn (void)
{
  uint64_t res;

  for (;;) {
    if (_chacha_prng_state.ind < 16) {
      res = ((uint64_t) _chacha_prng_state.output[_chacha_prng_state.ind] << 32
	     | _chacha_prng_state.output[_chacha_prng_state.ind + 1]);
      _chacha_prng_state.ind += 2;
      return res;
    }
    _chacha_prng_state.ind = 0;
    _chacha_prng_salsa20 (_chacha_prng_state.output, _chacha_prng_state.input);
    _chacha_prng_state.input[12]++;
    if (! _chacha_prng_state.input[12])
      /* The low counter word wrapped around to zero, so carry into
	 the high counter word.  */
      _chacha_prng_state.input[13]++;
  }
}

/* Empty function for our PRNGs interface.  */
static inline void
finish_chacha_prng (void) {
}

#endif


================================================
FILE: benchmarks/gen-table.rb
================================================
#!/usr/bin/ruby
# Take stdin and output the table
rows = []
cols = []
tab = {}
cur = ""
n = 0
STDIN.each_line do |line|
  puts line
  if md = /[+]+([0-9a-zA-Z-]+)/.match(line)
     rows.push(cur=line[md.begin(1)...md.end(1)])
     n += 1
  elsif md = /([0-9a-zA-Z-]+)\s*:\s*(\d+\.\d+)s/.match(line)
     name=line[md.begin(1)...md.end(1)]
     cols.push(name) if n == 1
     tab[cur + name] = line[md.begin(2)...md.end(2)]
  end
end


def print_header(cols, mr, mc)
  print "|".ljust(mr, " ")
  cols.each do |e|
    print " | ", e.ljust(mc, " ")
  end
  print " |\n", ":".ljust(mr + 1, "-")
  cols.each do |e|
    print "|", ":".rjust(mc + 2, "-")
  end
  print "|\n"
end

mr = rows.map { |e| e.length}.max
mc = cols.map { |e| e.length}.max
mc = 11 if mc < 11

print_header(cols, mr, mc)

rows.each { |r|
  print "|", r.ljust(mr, " "), "| "
  min = 100000.0;
  cols.each { |c|
    min = tab[r + c].to_f if tab.has_key?(r + c) && tab[r + c].to_f < min
  }
  cols.each { |c|
    if ! tab.has_key?(r + c)
      print "-".ljust(mc - 4, " "), " | "
      next
    end
    v = tab[r + c]
    print v.to_f == min ? "**" : "  "
    print v
    print v.to_f == min ? "**" : "  "
    print " ".ljust(mc - 4 - v.length, " ")
    print " | "
  }
  print "\n"
}


================================================
FILE: benchmarks/meow_hash.h
================================================
/* ========================================================================

   Meow - A Fast Non-cryptographic Hash
   (C) Copyright 2018 by Molly Rocket, Inc. (https://mollyrocket.com)
   
   See https://mollyrocket.com/meowhash for details.
   
   ========================================================================
   
   zlib License
   
   (C) Copyright 2018 Molly Rocket, Inc.
   
   This software is provided 'as-is', without any express or implied
   warranty.  In no event will the authors be held liable for any damages
   arising from the use of this software.
   
   Permission is granted to anyone to use this software for any purpose,
   including commercial applications, and to alter it and redistribute it
   freely, subject to the following restrictions:
   
   1. The origin of this software must not be misrepresented; you must not
      claim that you wrote the original software. If you use this software
      in a product, an acknowledgment in the product documentation would be
      appreciated but is not required.
   2. Altered source versions must be plainly marked as such, and must not be
      misrepresented as being the original software.
   3. This notice may not be removed or altered from any source distribution.
   
   ========================================================================
   
   FAQ
   
   Q: What is it?
   
   A: Meow is a 128-bit non-cryptographic hash that operates at high speeds
      on x64 and ARM processors that provide AES instructions.  It is
      designed to be truncatable to 64 and 32-bit hash values and still
      retain good collision resistance.
      
   Q: What is it GOOD for?
   
   A: Quickly hashing any amount of data for comparison purposes such as
      block deduplication or change detection.  It is extremely fast on
      all buffer sizes, from one byte to one gigabyte and up.
      
   Q: What is it BAD for?
   
   A: Anything security-related.  It should be assumed that it provides
      no protection from adversaries whatsoever.  It is also not particularly
      fast on processors that don't support AES instructions (e.g., non-x64/ARM
      processors).
      
   Q: Why is it called the "Meow hash"?
   
   A: It is named after a character in Meow the Infinite
      (https://meowtheinfinite.com)
      
   Q: Who wrote it?
   
   A: CASEY MURATORI (https://caseymuratori.com) wrote the original
      implementation for use in processing large-footprint assets for
      the game 1935 (https://molly1935.com).
      
      After the initial version, the hash was refined via collaboration
      with several great programmers who contributed suggestions and
      modifications:
      
      JEFF ROBERTS (https://radgametools.com) provided a super slick
      way to handle the residual end-of-buffer bytes that dramatically
      improved Meow's small hash performance.
      
      MARTINS MOZEIKO (https://matrins.ninja) ported Meow to ARM and
      ANSI-C, and added the proper preprocessor dressing for clean
      compilation on a variety of compiler configurations.
      
      FABIAN GIESEN (https://fgiesen.wordpress.com) provided support
      for getting the benchmarking working properly across a number
      of platforms.
      
      ARAS PRANCKEVICIUS (https://aras-p.info) provided the allocation
      shim for compilation on Mac OS X.
      
   ========================================================================
   
   USAGE
   
   For a complete working example, see meow_example.cpp.  Briefly:
   
       // Include meow_intrinsics if you want it to detect platforms
       // and define types and intrinsics for you.  Omit it if you
       // want to define them yourself.
       #include "meow_intrinsics.h"
       
       // Include meow_hash for the Meow hash function
       #include "meow_hash.h"
       
       // Hash a block of data using CPU-specific acceleration
       meow_u128 MeowHash_Accelerated(u64 Seed, u64 Len, void *Source);
       
       // Check if two Meow hashes are the same
       // (returns zero if they aren't, non-zero if they are)
       int MeowHashesAreEqual(meow_u128 A, meow_u128 B);
       
       // Truncate a Meow hash to 64 bits
       meow_u64 MeowU64From(meow_u128 Hash);
       
       // Truncate a Meow hash to 32 bits
       meow_u32 MeowU32From(meow_u128 Hash);
       
   **** VERY IMPORTANT X64 COMPILATION NOTES ****
   
   On x64, Meow uses the AESDEC instruction, which comes in two flavors:
   SSE (aesdec) and AVX (vaesdec).  If you are compiling _with_ AVX support,
   your compiler will probably emit the AVX variant, which means your code
   WILL NOT RUN on computers that do not have AVX.  If you need to deploy
   this hash on computers that do not have AVX, you must take care to
   TURN OFF support for AVX in your compiler for the file that includes
   the Meow hash!
   
   ======================================================================== */

//
// NOTE(casey): This version is EXPERIMENTAL.  The Meow hash is still
// undergoing testing and finalization.
//
// **** EXPECT HASHES/APIs TO CHANGE UNTIL THE VERSION NUMBER HITS 1.0. ****
//
// You have been warned.
//

static const unsigned char MeowShiftAdjust[31] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128};
static const unsigned char MeowMaskLen[32] = {255,255,255,255, 255,255,255,255, 255,255,255,255, 255,255,255,255, 0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0};

// TODO(casey): These constants are loaded to initialize the lanes.  Jacob should
// give us some feedback on what they should _actually_ be set to.
#define MEOW_S0_INIT { 0, 1, 2, 3,  4, 5, 6, 7,  8, 9,10,11, 12,13,14,15}
#define MEOW_S1_INIT {16,17,18,19, 20,21,22,23, 24,25,26,27, 28,29,30,31}
#define MEOW_S2_INIT {32,33,34,35, 36,37,38,39, 40,41,42,43, 44,45,46,47}
#define MEOW_S3_INIT {48,49,50,51, 52,53,54,55, 56,57,58,59, 60,61,62,63}
static const unsigned char MeowS0Init[] = MEOW_S0_INIT;
static const unsigned char MeowS1Init[] = MEOW_S1_INIT;
static const unsigned char MeowS2Init[] = MEOW_S2_INIT;
static const unsigned char MeowS3Init[] = MEOW_S3_INIT;

//
// NOTE(casey): 128-wide AES-NI Meow (maximum of 16 bytes/clock single threaded)
//

static meow_hash
MeowHash_Accelerated(meow_u64 Seed, meow_u64 TotalLengthInBytes, void *SourceInit)
{
    //
    // NOTE(casey): Initialize the four AES streams and the mixer
    //
    
    meow_aes_128 S0 = Meow128_GetAESConstant(MeowS0Init);
    meow_aes_128 S1 = Meow128_GetAESConstant(MeowS1Init);
    meow_aes_128 S2 = Meow128_GetAESConstant(MeowS2Init);
    meow_aes_128 S3 = Meow128_GetAESConstant(MeowS3Init);
    
    meow_u128 Mixer = Meow128_Set64x2(Seed - TotalLengthInBytes,
                                      Seed + TotalLengthInBytes + 1);
    
    //
    // NOTE(casey): Handle as many full 64-byte blocks as possible
    //
    
    meow_u8 *Source = (meow_u8 *)SourceInit;
    meow_u64 Len = TotalLengthInBytes;
    int unsigned Len8 = Len & 15;
    int unsigned Len128 = Len & 48;
    
    while(Len >= 64)
    {
        S0 = Meow128_AESDEC_Mem(S0, Source);
        S1 = Meow128_AESDEC_Mem(S1, Source + 16);
        S2 = Meow128_AESDEC_Mem(S2, Source + 32);
        S3 = Meow128_AESDEC_Mem(S3, Source + 48);
        
        Len -= 64;
        Source += 64;
    }
    
    //
    // NOTE(casey): Overhanging individual bytes
    //
    
    if(Len8)
    {
        meow_u8 *Overhang = Source + Len128;
        int Align = ((int)(meow_umm)Overhang) & 15;
        if(Align)
        {
            int End = ((int)(meow_umm)Overhang) & (MEOW_PAGESIZE - 1);
        
            // NOTE(jeffr): If we are nowhere near the page end, use full unaligned load (cmov to set)
            if (End <= (MEOW_PAGESIZE - 16))
            {
                Align = 0;
            }
            
            // NOTE(jeffr): If we will read over the page end, use a full unaligned load (cmov to set)
            if ((End + Len8) > MEOW_PAGESIZE)
            {
                Align = 0;
            }
            
            meow_u128 Partial = Meow128_Shuffle_Mem(Overhang - Align, &MeowShiftAdjust[Align]);
            
            Partial = Meow128_And_Mem( Partial, &MeowMaskLen[16 - Len8] );
            S3 = Meow128_AESDEC(S3, Partial);
        }
        else
        {
            // NOTE(casey): We don't have to do Jeff's heroics when we know the
            // buffer is aligned, since we cannot span a memory page (by definition).
            meow_u128 Partial = Meow128_And_Mem(*(meow_u128 *)Overhang, &MeowMaskLen[16 - Len8]);
            S3 = Meow128_AESDEC(S3, Partial);
        }
    }
    
    //
    // NOTE(casey): Overhanging full 128-bit lanes
    //
    
    switch(Len128)
    {
        case 48: S2 = Meow128_AESDEC_Mem(S2, Source + 32); // intentional fall-through
        case 32: S1 = Meow128_AESDEC_Mem(S1, Source + 16); // intentional fall-through
        case 16: S0 = Meow128_AESDEC_Mem(S0, Source);
    }
    
    //
    // NOTE(casey): Mix the four lanes down to one 128-bit hash
    //
    
    S3 = Meow128_AESDEC(S3, Mixer);
    S2 = Meow128_AESDEC(S2, Mixer);
    S1 = Meow128_AESDEC(S1, Mixer);
    S0 = Meow128_AESDEC(S0, Mixer);
    
    S2 = Meow128_AESDEC(S2, Meow128_AESDEC_Finalize(S3));
    S0 = Meow128_AESDEC(S0, Meow128_AESDEC_Finalize(S1));
    
    S2 = Meow128_AESDEC(S2, Mixer);
    
    S0 = Meow128_AESDEC(S0, Meow128_AESDEC_Finalize(S2));
    S0 = Meow128_AESDEC(S0, Mixer);
    
    meow_hash Result;
    Meow128_CopyToHash(Meow128_AESDEC_Finalize(S0), Result);
    
    return(Result);
}


================================================
FILE: benchmarks/meow_intrinsics.h
================================================
/* ========================================================================

   meow_intrinsics.h
   (C) Copyright 2018 by Molly Rocket, Inc. (https://mollyrocket.com)
   
   See https://mollyrocket.com/meowhash for details.
   
   This is the default way to define all of the types and operations that
   meow_hash.h needs.  However, if you've got your _own_ equivalent type
   definitions and intrinsics, you can _omit_ this header file and just
   #define/typedef all the Meow ops to map to your own ops, keeping things
   nice and uniform in your codebase.
   
   ======================================================================== */

#if !defined(MEOW_HASH_INTRINSICS_H)

//
// NOTE(casey): Try to guess the source file for compiler intrinsics
//
#if _MSC_VER

#if _M_AMD64 || _M_IX86
#include <intrin.h>
#elif _M_ARM64
#include <arm64_neon.h>
#endif

#else

#if __x86_64__ || __i386__
#include <x86intrin.h>
#elif __aarch64__
#include <arm_neon.h>
#endif

#endif

//
// NOTE(casey): Set #define's to their defaults
//

#if !defined(MEOW_HASH_INTEL) || !defined(MEOW_HASH_ARMV8)
#if __x86_64__ || _M_AMD64
#define MEOW_HASH_INTEL 1
#define MEOW_64BIT 1
#define MEOW_PAGESIZE 4096
#elif __i386__  || _M_IX86
#define MEOW_HASH_INTEL 1
#define MEOW_64BIT 0
#define MEOW_PAGESIZE 4096
#elif __aarch64__ || _M_ARM64
#define MEOW_HASH_ARMV8 1
#define MEOW_64BIT 1
#define MEOW_PAGESIZE 4096
#else
#error Cannot determine architecture to use!
#endif
#endif

//
// NOTE(casey): Define basic types
//

#define meow_u8 char unsigned
#define meow_u16 short unsigned
#define meow_u32 int unsigned
#define meow_u64 long long unsigned

#if MEOW_64BIT
#define meow_umm long long unsigned
#else
#define meow_umm int unsigned
#endif

//
// NOTE(casey): Operations for x64 processors
//

#if MEOW_HASH_INTEL

#define meow_u128 __m128i
#define meow_aes_128 __m128i
#define meow_u256 __m256i
#define meow_aes_256 __m256i
#define meow_u512 __m512i
#define meow_aes_512 __m512i

#define MeowU32From(A, I) (_mm_extract_epi32((A), (I)))
#define MeowU64From(A, I) (_mm_extract_epi64((A), (I)))
#define MeowHashesAreEqual(A, B) (_mm_movemask_epi8(_mm_cmpeq_epi8((A), (B))) == 0xFFFF)

#define Meow128_AESDEC(Prior, Xor) _mm_aesdec_si128((Prior), (Xor))
#define Meow128_AESDEC_Mem(Prior, Xor) _mm_aesdec_si128((Prior), _mm_loadu_si128((meow_u128 *)(Xor)))
#define Meow128_AESDEC_Finalize(A) (A)
#define Meow128_Set64x2(Low64, High64) _mm_set_epi64x((High64), (Low64))
#define Meow128_Set64x2_State(Low64, High64) Meow128_Set64x2(Low64, High64)
#define Meow128_GetAESConstant(Ptr) (*(meow_u128 *)(Ptr))

#define Meow128_And_Mem(A,B) _mm_and_si128((A),_mm_loadu_si128((meow_u128 *)(B)))
#define Meow128_Shuffle_Mem(Mem,Control) _mm_shuffle_epi8(_mm_loadu_si128((meow_u128 *)(Mem)),_mm_loadu_si128((meow_u128 *)(Control)))

// TODO(casey): Not sure if this should actually be Meow128_Zero(A) ((A) = _mm_setzero_si128()), maybe
#define Meow128_Zero() _mm_setzero_si128()

#define Meow256_AESDEC(Prior, XOr) _mm256_aesdec_epi128((Prior), (XOr))
#define Meow256_AESDEC_Mem(Prior, XOr) _mm256_aesdec_epi128((Prior), *(meow_u256 *)(XOr))
#define Meow256_Zero() _mm256_setzero_si256()
#define Meow256_PartialLoad(A, B) _mm256_mask_loadu_epi8(_mm256_setzero_si256(), _cvtu32_mask32((1UL<<(B)) - 1), (A))
#define Meow128_FromLow(A) _mm256_extracti128_si256((A), 0)
#define Meow128_FromHigh(A) _mm256_extracti128_si256((A), 1)

#define Meow512_AESDEC(Prior, XOr) _mm512_aesdec_epi128((Prior), (XOr))
#define Meow512_AESDEC_Mem(Prior, XOr) _mm512_aesdec_epi128((Prior), *(meow_u512 *)(XOr))
#define Meow512_Zero() _mm512_setzero_si512()
#define Meow512_PartialLoad(A, B) _mm512_mask_loadu_epi8(_mm512_setzero_si512(), _cvtu64_mask64((1ULL<<(B)) - 1), (A))
#define Meow256_FromLow(A) _mm512_extracti64x4_epi64((A), 0)
#define Meow256_FromHigh(A) _mm512_extracti64x4_epi64((A), 1)

//
// NOTE(casey): Operations for ARM processors
//

#elif MEOW_HASH_ARMV8

#define meow_u128 uint8x16_t

// NOTE(mmozeiko): AES opcodes on ARMv8 work a bit differently than on Intel.
// On Intel, "x = AESDEC(x, m)" does the following:
//   x = InvMixColumns(SubBytes(ShiftRows(x))) ^ m
// But on ARMv8, "x = AESDEC(x, m)" does the following:
//   x = SubBytes(ShiftRows(x ^ m))
// Thus ARMv8 requires an extra InvMixColumns call and a delayed Xor operation.
// On iteration N it needs to use m[N-1] as input, and remember m[N] for the next iteration.
// This structure stores the memory operand in member B, which will be used in the
// next AESDEC opcode. Remember to do one more XOR(A,B) when finishing AES
// operations in a loop.
typedef struct {
    meow_u128 A;
    meow_u128 B;
} meow_aes_128;

#define MeowU32From(A, I) (vgetq_lane_u32(vreinterpretq_u32_u8((A)), (I)))
#define MeowU64From(A, I) (vgetq_lane_u64(vreinterpretq_u64_u8((A)), (I)))

static int
MeowHashesAreEqualImpl(meow_u128 A, meow_u128 B)
{
    uint8x16_t Powers = {
        1, 2, 4, 8, 16, 32, 64, 128, 1, 2, 4, 8, 16, 32, 64, 128,
    };

    uint8x16_t Input = vceqq_u8(A, B);
    uint64x2_t Mask = vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(vandq_u8(Input, Powers))));

    meow_u16 Output;
    vst1q_lane_u8((meow_u8*)&Output + 0, vreinterpretq_u8_u64(Mask), 0);
    vst1q_lane_u8((meow_u8*)&Output + 1, vreinterpretq_u8_u64(Mask), 8);
    return Output == 0xFFFF;
}

#define MeowHashesAreEqual(A, B) MeowHashesAreEqualImpl((A), (B))

static meow_aes_128
Meow128_AESDEC(meow_aes_128 Prior, meow_u128 Xor)
{
    meow_aes_128 R;
    R.A = vaesimcq_u8(vaesdq_u8(Prior.A, Prior.B));
    R.B = Xor;
    return(R);
}

static meow_aes_128
Meow128_AESDEC_Mem(meow_aes_128 Prior, void *Xor)
{
    meow_aes_128 R;
    R.A = vaesimcq_u8(vaesdq_u8(Prior.A, Prior.B));
    R.B = vld1q_u8((meow_u8*)Xor);
    return(R);
}

static meow_u128
Meow128_AESDEC_Finalize(meow_aes_128 Value)
{
    meow_u128 R = veorq_u8(Value.A, Value.B);
    return(R);
}

static meow_u128
Meow128_Zero()
{
    meow_u128 R = vdupq_n_u8(0);
    return(R);
}

static meow_aes_128
Meow128_GetAESConstant(const meow_u8 *Ptr)
{
    meow_aes_128 R;
    R.A = vld1q_u8(Ptr);
    R.B = vdupq_n_u8(0);
    return(R);
}

static meow_u128
Meow128_Set64x2(meow_u64 Low64, meow_u64 High64)
{
   meow_u128 R = vreinterpretq_u8_u64(vcombine_u64(vcreate_u64(Low64), vcreate_u64(High64)));
   return(R);
}

static meow_aes_128
Meow128_Set64x2_State(meow_u64 Low64, meow_u64 High64)
{
   meow_aes_128 R;
   R.A = Meow128_Set64x2(Low64, High64);
   R.B = Meow128_Zero();
   return(R);
}

#define Meow128_And_Mem(A,B) vandq_u8((A), vld1q_u8((meow_u8 *)(B)))
#define Meow128_Shuffle_Mem(Mem,Control) vqtbl1q_u8(vld1q_u8((meow_u8 *)(Mem)),vld1q_u8((meow_u8 *)(Control)))

#endif

#define MEOW_HASH_VERSION 4
#define MEOW_HASH_VERSION_NAME "0.4/himalayan"

#if MEOW_INCLUDE_C

// NOTE(casey): Unfortunately, if you want an ANSI-C version, we have to slow everyone
// else down because you can't return 128-bit values by register anymore (in case the
// CPU doesn't support that)
union meow_hash
{
    meow_u128 u128;
    meow_u64 u64[2];
    meow_u32 u32[4];
};
#define Meow128_CopyToHash(A, B) ((B).u128 = (A))

#undef MeowU64From
#undef MeowU32From
#undef MeowHashesAreEqual
#define MeowU32From(A, I) ((A).u32[I])
#define MeowU64From(A, I) ((A).u64[I])
#define MeowHashesAreEqual(A, B) (((A).u32[0] == (B).u32[0]) && ((A).u32[1] == (B).u32[1]) && ((A).u32[2] == (B).u32[2]) && ((A).u32[3] == (B).u32[3]))

#else

typedef meow_u128 meow_hash;
#define Meow128_CopyToHash(A, B) ((B) = (A))

#endif

typedef struct meow_hash_state meow_hash_state;
typedef meow_hash meow_hash_implementation(meow_u64 Seed, meow_u64 Len, void *Source);
typedef void meow_absorb_implementation(struct meow_hash_state *State, meow_u64 Len, void *Source);

#define MEOW_HASH_INTRINSICS_H
#endif
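
The ARMv8 note in this header describes deferring the XOR operand by one step, so that a primitive which XORs before the round can reproduce an Intel-style step which XORs after it. The bookkeeping can be sketched portably as below. This is only an illustration: `toy_round` is a made-up stand-in for the key-independent part of the AES round (it is not AES), and every name in the block is hypothetical rather than part of Meow hash.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up stand-in for the key-independent round transform; NOT AES. */
static uint64_t toy_round(uint64_t x)
{
    x *= 0x9E3779B97F4A7C15ull;
    return (x << 23) | (x >> 41);
}

/* Intel-style step: transform first, then XOR the operand. */
static uint64_t intel_style_step(uint64_t x, uint64_t m)
{
    return toy_round(x) ^ m;
}

/* ARM-style primitive: XOR the operand first, then transform. */
static uint64_t arm_style_primitive(uint64_t x, uint64_t k)
{
    return toy_round(x ^ k);
}

/* Pair (a, b) whose logical value is a ^ b; b is the operand not yet applied. */
typedef struct {
    uint64_t a;
    uint64_t b;
} deferred_state;

static deferred_state deferred_init(uint64_t x)
{
    deferred_state s = { x, 0 };
    return s;
}

/* Apply the previously stored operand now and remember m for the next step. */
static deferred_state deferred_step(deferred_state prior, uint64_t m)
{
    deferred_state r;
    r.a = arm_style_primitive(prior.a, prior.b);
    r.b = m;
    return r;
}

/* The one extra XOR needed when the chain of steps ends. */
static uint64_t deferred_finalize(deferred_state s)
{
    return s.a ^ s.b;
}

/* Returns 1 when both formulations produce the same value after n steps. */
static int formulations_agree(uint64_t seed, const uint64_t *m, unsigned n)
{
    assert(n == 0 || m != 0);

    uint64_t x = seed;
    deferred_state s = deferred_init(seed);
    for (unsigned i = 0; i < n; ++i) {
        x = intel_style_step(x, m[i]);
        s = deferred_step(s, m[i]);
    }
    return x == deferred_finalize(s);
}
```

The invariant is that `s.a ^ s.b` always equals the Intel-style value `x`: each deferred step computes `toy_round(a ^ b)` and stores the new operand, so the pending XOR is resolved either by the next step or by the finalize call.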


================================================
FILE: benchmarks/metrohash64.cpp
================================================
// metrohash64.cpp
//
// Copyright 2015-2018 J. Andrew Rogers
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

#include "platform.h"
#include "metrohash64.h"

#include <cstring>

const char * MetroHash64::test_string = "012345678901234567890123456789012345678901234567890123456789012";

const uint8_t MetroHash64::test_seed_0[8] =   { 0x6B, 0x75, 0x3D, 0xAE, 0x06, 0x70, 0x4B, 0xAD };
const uint8_t MetroHash64::test_seed_1[8] =   { 0x3B, 0x0D, 0x48, 0x1C, 0xF4, 0xB9, 0xB8, 0xDF };



MetroHash64::MetroHash64(const uint64_t seed)
{
    Initialize(seed);
}


void MetroHash64::Initialize(const uint64_t seed)
{
    vseed = (static_cast<uint64_t>(seed) + k2) * k0;

    // initialize internal hash registers
    state.v[0] = vseed;
    state.v[1] = vseed;
    state.v[2] = vseed;
    state.v[3] = vseed;

    // initialize total length of input
    bytes = 0;
}


void MetroHash64::Update(const uint8_t * const buffer, const uint64_t length)
{
    const uint8_t * ptr = reinterpret_cast<const uint8_t*>(buffer);
    const uint8_t * const end = ptr + length;

    // input buffer may be partially filled
    if (bytes % 32)
    {
        uint64_t fill = 32 - (bytes % 32);
        if (fill > length)
            fill = length;

        memcpy(input.b + (bytes % 32), ptr, static_cast<size_t>(fill));
        ptr   += fill;
        bytes += fill;
        
        // input buffer is still partially filled
        if ((bytes % 32) != 0) return;

        // process full input buffer
        state.v[0] += read_u64(&input.b[ 0]) * k0; state.v[0] = rotate_right(state.v[0],29) + state.v[2];
        state.v[1] += read_u64(&input.b[ 8]) * k1; state.v[1] = rotate_right(state.v[1],29) + state.v[3];
        state.v[2] += read_u64(&input.b[16]) * k2; state.v[2] = rotate_right(state.v[2],29) + state.v[0];
        state.v[3] += read_u64(&input.b[24]) * k3; state.v[3] = rotate_right(state.v[3],29) + state.v[1];
    }
    
    // bulk update
    bytes += static_cast<uint64_t>(end - ptr);
    while (ptr <= (end - 32))
    {
        // process directly from the source, bypassing the input buffer
        state.v[0] += read_u64(ptr) * k0; ptr += 8; state.v[0] = rotate_right(state.v[0],29) + state.v[2];
        state.v[1] += read_u64
SYMBOL INDEX (654 symbols across 52 files)

FILE: benchmarks/City.cpp
  function uint64 (line 37) | static uint64 UNALIGNED_LOAD64(const char *p) {
  function uint32 (line 43) | static uint32 UNALIGNED_LOAD32(const char *p) {
  function uint64 (line 84) | static uint64 Fetch64(const char *p) {
  function uint32 (line 88) | static uint32 Fetch32(const char *p) {
  function uint64 (line 100) | static uint64 Rotate(uint64 val, int shift) {
  function uint64 (line 108) | static uint64 RotateByAtLeast1(uint64 val, int shift) {
  function uint64 (line 112) | static uint64 ShiftMix(uint64 val) {
  function uint64 (line 116) | static uint64 HashLen16(uint64 u, uint64 v) {
  function uint64 (line 120) | static uint64 HashLen0to16(const char *s, size_t len) {
  function uint64 (line 143) | static uint64 HashLen17to32(const char *s, size_t len) {
  function WeakHashLen32WithSeeds (line 154) | static pair<uint64, uint64> WeakHashLen32WithSeeds(
  function WeakHashLen32WithSeeds (line 166) | static pair<uint64, uint64> WeakHashLen32WithSeeds(
  function uint64 (line 177) | static uint64 HashLen33to64(const char *s, size_t len) {
  function uint64 (line 200) | uint64 CityHash64(const char *s, size_t len) {
  function uint64 (line 238) | uint64 CityHash64WithSeed(const char *s, size_t len, uint64 seed) {
  function uint64 (line 242) | uint64 CityHash64WithSeeds(const char *s, size_t len,
  function uint128 (line 249) | static uint128 CityMurmur(const char *s, size_t len, uint128 seed) {
  function uint128 (line 279) | uint128 CityHash128WithSeed(const char *s, size_t len, uint128 seed) {
  function uint128 (line 338) | uint128 CityHash128(const char *s, size_t len) {
  function CityHashCrc256Long (line 358) | static void CityHashCrc256Long(const char *s, size_t len,
  function CityHashCrc256Short (line 427) | static void CityHashCrc256Short(const char *s, size_t len, uint64 *resul...
  function CityHashCrc256 (line 434) | void CityHashCrc256(const char *s, size_t len, uint64 *result) {
  function uint128 (line 442) | uint128 CityHashCrc128WithSeed(const char *s, size_t len, uint128 seed) {
  function uint128 (line 455) | uint128 CityHashCrc128(const char *s, size_t len) {

FILE: benchmarks/City.h
  type uint8 (line 58) | typedef uint8_t uint8;
  type uint32 (line 59) | typedef uint32_t uint32;
  type uint64 (line 60) | typedef uint64_t uint64;
  type std (line 61) | typedef std::pair<uint64, uint64> uint128;
  function uint64 (line 63) | inline uint64 Uint128Low64(const uint128& x) { return x.first; }
  function uint64 (line 64) | inline uint64 Uint128High64(const uint128& x) { return x.second; }
  function uint64 (line 87) | inline uint64 Hash128to64(const uint128& x) {

FILE: benchmarks/SpookyV2.h
  type uint64 (line 33) | typedef  unsigned __int64 uint64;
  type uint32 (line 34) | typedef  unsigned __int32 uint32;
  type uint16 (line 35) | typedef  unsigned __int16 uint16;
  type uint8 (line 36) | typedef  unsigned __int8  uint8;
  type uint64 (line 40) | typedef  uint64_t  uint64;
  type uint32 (line 41) | typedef  uint32_t  uint32;
  type uint16 (line 42) | typedef  uint16_t  uint16;
  type uint8 (line 43) | typedef  uint8_t   uint8;
  function class (line 47) | class SpookyHash

FILE: benchmarks/bbs-prng.h
  function init_bbs_prng (line 81) | static inline void
  function set_bbs_seed (line 109) | static inline void
  function _update_bbs_prng (line 116) | static inline void
  function get_bbs_prn (line 123) | static inline uint64_t
  function finish_bbs_prng (line 138) | static inline void

FILE: benchmarks/bench-crypto.c
  function sha512_test (line 4) | void sha512_test (const void *msg, int len, void *out) {
  function sha3_512_test (line 17) | void sha3_512_test (const void *msg, int len, void *out) {
  function blake2b_test (line 30) | void blake2b_test (const void *msg, int len, void *out) {
  function mum512_test (line 41) | void mum512_test (const void *msg, int len, void *out) {
  function print512 (line 272) | static void
  function main (line 282) | int main () {
  function main (line 294) | int main () {
  function main (line 306) | int main () {
  function main (line 318) | int main () {

FILE: benchmarks/bench-prng.c
  function init_prng (line 5) | static void init_prng (void) { init_bbs_prng (); }
  function get_prn (line 6) | static uint64_t get_prn (void) { return get_bbs_prn (); }
  function finish_prng (line 7) | static void finish_prng (void) { finish_bbs_prng (); }
  function init_prng (line 11) | static void init_prng (void) { init_chacha_prng (); }
  function get_prn (line 12) | static uint64_t get_prn (void) { return get_chacha_prn (); }
  function finish_prng (line 13) | static void finish_prng (void) { finish_chacha_prng (); }
  function init_prng (line 17) | static void init_prng (void) { init_sip24_prng (); }
  function get_prn (line 18) | static uint64_t get_prn (void) { return get_sip24_prn (); }
  function finish_prng (line 19) | static void finish_prng (void) { finish_sip24_prng (); }
  function init_prng (line 23) | static void init_prng (void) { mum_hash_randomize (0); init_mum_prng (); }
  function get_prn (line 24) | static uint64_t get_prn (void) { return get_mum_prn (); }
  function finish_prng (line 25) | static void finish_prng (void) { finish_mum_prng (); }
  function init_prng (line 29) | static void init_prng (void) { init_mum512_prng (); }
  function get_prn (line 30) | static uint64_t get_prn (void) { return get_mum512_prn (); }
  function finish_prng (line 31) | static void finish_prng (void) { finish_mum512_prng (); }
  function init_prng (line 35) | static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789...
  function get_prn (line 36) | static uint64_t get_prn (void) { return next (); }
  function finish_prng (line 37) | static void finish_prng (void) { }
  function init_prng (line 41) | static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789...
  function get_prn (line 42) | static uint64_t get_prn (void) { return next (); }
  function finish_prng (line 43) | static void finish_prng (void) { }
  function init_prng (line 47) | static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789...
  function get_prn (line 48) | static uint64_t get_prn (void) { return next (); }
  function finish_prng (line 49) | static void finish_prng (void) { }
  function init_prng (line 53) | static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789...
  function get_prn (line 54) | static uint64_t get_prn (void) { return next (); }
  function finish_prng (line 55) | static void finish_prng (void) { }
  function init_prng (line 59) | static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789...
  function get_prn (line 60) | static uint64_t get_prn (void) { return next (); }
  function finish_prng (line 61) | static void finish_prng (void) { }
  function init_prng (line 65) | static void init_prng (void) { s[0] = 0xe220a8397b1dcdaf; s[1] = 0x6e789...
  function get_prn (line 66) | static uint64_t get_prn (void) { return next (); }
  function finish_prng (line 67) | static void finish_prng (void) { }
  function init_prng (line 72) | static void init_prng (void) { }
  function get_prn (line 73) | static uint64_t get_prn (void) { return rand (); }
  function finish_prng (line 74) | static void finish_prng (void) { }
  function main (line 84) | int main(void)
  function main (line 94) | int main (void) {

FILE: benchmarks/bench.c
  function SpookyHash64_test (line 4) | static void SpookyHash64_test (const void *key, int len, uint32_t seed, ...
  function CityHash64_test (line 14) | static void CityHash64_test (const void *key, int len, uint32_t seed, vo...
  function siphash_test (line 27) | static void siphash_test (const void *key, int len, uint32_t seed, void ...
  function xxHash64_test (line 49) | static void xxHash64_test (const void *key, int len, uint32_t seed, void...
  function xxh3_test (line 60) | static void xxh3_test (const void *key, int len, uint32_t seed, void *ou...
  function t1ha_test (line 70) | static void t1ha_test (const void *key, int len, uint32_t seed, void *ou...
  function CityHash64_test (line 80) | static void CityHash64_test (const void *key, int len, uint32_t seed, vo...
  function metro_test (line 90) | static void metro_test (const void *key, int len, uint32_t seed, void *o...
  function meowhash_test (line 109) | static void meowhash_test (const void *key, int len, uint32_t seed, void...
  function mum_test (line 119) | static void mum_test (const void *key, int len, uint32_t seed, void *out) {
  function mum_test64 (line 123) | static void mum_test64 (const void *key, int len, uint32_t seed, void *o...
  function mum_test (line 133) | static void mum_test (const void *key, int len, uint32_t seed, void *out) {
  function mum_test64 (line 137) | static void mum_test64 (const void *key, int len, uint32_t seed, void *o...
  function rapid_test (line 147) | static void rapid_test (const void *key, int len, uint32_t seed, void *o...
  function main (line 163) | int main () {
  function main (line 184) | int main () {

FILE: benchmarks/blake2-impl.h
  function load32 (line 22) | BLAKE2_LOCAL_INLINE(uint32_t) load32( const void *src )
  function load64 (line 38) | BLAKE2_LOCAL_INLINE(uint64_t) load64( const void *src )
  function store32 (line 58) | BLAKE2_LOCAL_INLINE(void) store32( void *dst, uint32_t w )
  function store64 (line 71) | BLAKE2_LOCAL_INLINE(void) store64( void *dst, uint64_t w )
  function load48 (line 88) | BLAKE2_LOCAL_INLINE(uint64_t) load48( const void *src )
  function store48 (line 100) | BLAKE2_LOCAL_INLINE(void) store48( void *dst, uint64_t w )
  function rotl32 (line 111) | BLAKE2_LOCAL_INLINE(uint32_t) rotl32( const uint32_t w, const unsigned c )
  function rotl64 (line 116) | BLAKE2_LOCAL_INLINE(uint64_t) rotl64( const uint64_t w, const unsigned c )
  function rotr32 (line 121) | BLAKE2_LOCAL_INLINE(uint32_t) rotr32( const uint32_t w, const unsigned c )
  function rotr64 (line 126) | BLAKE2_LOCAL_INLINE(uint64_t) rotr64( const uint64_t w, const unsigned c )
  function secure_zero_memory (line 132) | BLAKE2_LOCAL_INLINE(void) secure_zero_memory(void *v, size_t n)

FILE: benchmarks/blake2.h
  type blake2s_constant (line 34) | enum blake2s_constant
  type blake2b_constant (line 43) | enum blake2b_constant
  type blake2s_state (line 52) | typedef struct __blake2s_state
  type blake2b_state (line 62) | typedef struct __blake2b_state
  type blake2sp_state (line 72) | typedef struct __blake2sp_state
  type blake2bp_state (line 80) | typedef struct __blake2bp_state
  type blake2s_param (line 90) | typedef struct __blake2s_param
  type blake2b_param (line 105) | typedef struct __blake2b_param
  function blake2 (line 151) | static inline int blake2( uint8_t *out, const void *in, const void *key,...

FILE: benchmarks/blake2b.c
  function blake2b_set_lastnode (line 70) | BLAKE2_LOCAL_INLINE(int) blake2b_set_lastnode( blake2b_state *S )
  function blake2b_clear_lastnode (line 76) | BLAKE2_LOCAL_INLINE(int) blake2b_clear_lastnode( blake2b_state *S )
  function blake2b_is_lastblock (line 82) | BLAKE2_LOCAL_INLINE(int) blake2b_is_lastblock( const blake2b_state *S )
  function blake2b_set_lastblock (line 87) | BLAKE2_LOCAL_INLINE(int) blake2b_set_lastblock( blake2b_state *S )
  function blake2b_clear_lastblock (line 95) | BLAKE2_LOCAL_INLINE(int) blake2b_clear_lastblock( blake2b_state *S )
  function blake2b_increment_counter (line 104) | BLAKE2_LOCAL_INLINE(int) blake2b_increment_counter( blake2b_state *S, co...
  function blake2b_param_set_digest_length (line 121) | BLAKE2_LOCAL_INLINE(int) blake2b_param_set_digest_length( blake2b_param ...
  function blake2b_param_set_fanout (line 127) | BLAKE2_LOCAL_INLINE(int) blake2b_param_set_fanout( blake2b_param *P, con...
  function blake2b_param_set_max_depth (line 133) | BLAKE2_LOCAL_INLINE(int) blake2b_param_set_max_depth( blake2b_param *P, ...
  function blake2b_param_set_leaf_length (line 139) | BLAKE2_LOCAL_INLINE(int) blake2b_param_set_leaf_length( blake2b_param *P...
  function blake2b_param_set_node_offset (line 145) | BLAKE2_LOCAL_INLINE(int) blake2b_param_set_node_offset( blake2b_param *P...
  function blake2b_param_set_node_depth (line 151) | BLAKE2_LOCAL_INLINE(int) blake2b_param_set_node_depth( blake2b_param *P,...
  function blake2b_param_set_inner_length (line 157) | BLAKE2_LOCAL_INLINE(int) blake2b_param_set_inner_length( blake2b_param *...
  function blake2b_param_set_salt (line 163) | BLAKE2_LOCAL_INLINE(int) blake2b_param_set_salt( blake2b_param *P, const...
  function blake2b_param_set_personal (line 169) | BLAKE2_LOCAL_INLINE(int) blake2b_param_set_personal( blake2b_param *P, c...
  function blake2b_init0 (line 175) | BLAKE2_LOCAL_INLINE(int) blake2b_init0( blake2b_state *S )
  function blake2b_init_param (line 185) | int blake2b_init_param( blake2b_state *S, const blake2b_param *P )
  function blake2b_init (line 201) | int blake2b_init( blake2b_state *S, const uint8_t outlen )
  function blake2b_init_key (line 223) | int blake2b_init_key( blake2b_state *S, const uint8_t outlen, const void...
  function blake2b_compress (line 257) | BLAKE2_LOCAL_INLINE(int) blake2b_compress( blake2b_state *S, const uint8...
  function blake2b_update (line 328) | int blake2b_update( blake2b_state *S, const uint8_t *in, uint64_t inlen )
  function blake2b_final (line 359) | int blake2b_final( blake2b_state *S, uint8_t *out, uint8_t outlen )
  function blake2b (line 384) | int blake2b( uint8_t *out, const void *in, const void *key, const uint8_...
  function crypto_hash (line 414) | int crypto_hash( unsigned char *out, unsigned char *in, unsigned long lo...
  function main (line 423) | int main( int argc, char **argv )

FILE: benchmarks/byte_order.c
  function rhash_ctz (line 30) | unsigned rhash_ctz(unsigned x)
  function rhash_ctz (line 45) | unsigned rhash_ctz(unsigned x)
  function rhash_swap_copy_str_to_u32 (line 74) | void rhash_swap_copy_str_to_u32(void* to, int index, const void* from, s...
  function rhash_swap_copy_str_to_u64 (line 99) | void rhash_swap_copy_str_to_u64(void* to, int index, const void* from, s...
  function rhash_swap_copy_u64_to_str (line 122) | void rhash_swap_copy_u64_to_str(void* to, const void* from, size_t length)
  function rhash_u32_mem_swap (line 144) | void rhash_u32_mem_swap(unsigned *arr, int length)

FILE: benchmarks/byte_order.h
  function bswap_32 (line 95) | static inline uint32_t bswap_32(uint32_t x) {
  function bswap_32 (line 106) | static inline uint32_t bswap_32(uint32_t x) {
  function bswap_64 (line 120) | static inline uint64_t bswap_64(uint64_t x) {

FILE: benchmarks/chacha-prng.h
  function _chacha_prng_rotl (line 56) | static inline uint32_t
  function _chacha_prng_quarter_round (line 62) | static inline void
  function _chacha_prng_salsa20 (line 71) | static inline void
  function _set_chacha_prng_key_iv (line 103) | static inline void
  function _init_chacha_prng_with_key_iv (line 119) | static inline void
  function init_chacha_prng (line 126) | static inline void
  function set_chacha_prng_seed (line 139) | static inline void
  function get_chacha_prn (line 149) | static inline uint64_t
  function finish_chacha_prng (line 172) | static inline void

FILE: benchmarks/gen-table.rb
  function print_header (line 21) | def print_header(cols, mr, mc)

FILE: benchmarks/meow_hash.h
  function meow_hash (line 150) | static meow_hash

FILE: benchmarks/meow_intrinsics.h
  type meow_aes_128 (line 138) | typedef struct {
  function MeowHashesAreEqualImpl (line 146) | static int
  function meow_aes_128 (line 164) | static meow_aes_128
  function meow_aes_128 (line 173) | static meow_aes_128
  function meow_u128 (line 182) | static meow_u128
  function meow_u128 (line 189) | static meow_u128
  function meow_aes_128 (line 196) | static meow_aes_128
  function meow_u128 (line 205) | static meow_u128
  function meow_aes_128 (line 212) | static meow_aes_128
  type meow_u128 (line 251) | typedef meow_u128 meow_hash;
  type meow_hash_state (line 256) | typedef struct meow_hash_state meow_hash_state;
  type meow_hash (line 257) | typedef meow_hash meow_hash_implementation(meow_u64 Seed, meow_u64 Len, ...
  type meow_hash_state (line 258) | struct meow_hash_state

FILE: benchmarks/metrohash64.cpp
  function metrohash64_1 (line 256) | void metrohash64_1(const uint8_t * key, uint64_t len, uint32_t seed, uin...
  function metrohash64_2 (line 334) | void metrohash64_2(const uint8_t * key, uint64_t len, uint32_t seed, uin...

FILE: benchmarks/metrohash64.h
  function class (line 22) | class MetroHash64

FILE: benchmarks/mum512-prng.h
  function init_mum512_prng (line 64) | static inline void
  function set_mum512_seed (line 74) | static inline void
  function get_mum512_prn (line 84) | static inline uint64_t
  function finish_mum512_prng (line 98) | static inline void

FILE: benchmarks/platform.h
  function rotate_right (line 32) | inline static uint64_t rotate_right(uint64_t v, unsigned k)
  function read_u64 (line 38) | inline static uint64_t read_u64(const void * const ptr)
  function read_u32 (line 43) | inline static uint64_t read_u32(const void * const ptr)
  function read_u16 (line 48) | inline static uint64_t read_u16(const void * const ptr)
  function read_u8 (line 53) | inline static uint64_t read_u8 (const void * const ptr)

FILE: benchmarks/rapidhash.h
  function RAPIDHASH_INLINE_CONSTEXPR (line 156) | RAPIDHASH_INLINE_CONSTEXPR void rapid_mum(uint64_t *A, uint64_t *B) RAPI...
  function RAPIDHASH_INLINE_CONSTEXPR (line 208) | RAPIDHASH_INLINE_CONSTEXPR uint64_t rapid_mix(uint64_t A, uint64_t B) RA...
  function RAPIDHASH_INLINE (line 214) | RAPIDHASH_INLINE uint64_t rapid_read64(const uint8_t *p) RAPIDHASH_NOEXC...
  function RAPIDHASH_INLINE (line 215) | RAPIDHASH_INLINE uint64_t rapid_read32(const uint8_t *p) RAPIDHASH_NOEXC...
  function RAPIDHASH_INLINE (line 217) | RAPIDHASH_INLINE uint64_t rapid_read64(const uint8_t *p) RAPIDHASH_NOEXC...
  function RAPIDHASH_INLINE (line 218) | RAPIDHASH_INLINE uint64_t rapid_read32(const uint8_t *p) RAPIDHASH_NOEXC...
  function RAPIDHASH_INLINE (line 220) | RAPIDHASH_INLINE uint64_t rapid_read64(const uint8_t *p) RAPIDHASH_NOEXC...
  function RAPIDHASH_INLINE (line 221) | RAPIDHASH_INLINE uint64_t rapid_read32(const uint8_t *p) RAPIDHASH_NOEXC...
  function RAPIDHASH_INLINE (line 223) | RAPIDHASH_INLINE uint64_t rapid_read64(const uint8_t *p) RAPIDHASH_NOEXC...
  function RAPIDHASH_INLINE (line 227) | RAPIDHASH_INLINE uint64_t rapid_read32(const uint8_t *p) RAPIDHASH_NOEXC...
  function RAPIDHASH_INLINE_CONSTEXPR (line 243) | RAPIDHASH_INLINE_CONSTEXPR uint64_t rapidhash_internal(const void *key, ...
  function RAPIDHASH_INLINE_CONSTEXPR (line 356) | RAPIDHASH_INLINE_CONSTEXPR uint64_t rapidhashMicro_internal(const void *...
  function RAPIDHASH_INLINE_CONSTEXPR (line 426) | RAPIDHASH_INLINE_CONSTEXPR uint64_t rapidhashNano_internal(const void *k...
  function RAPIDHASH_INLINE_CONSTEXPR (line 486) | RAPIDHASH_INLINE_CONSTEXPR uint64_t rapidhash_withSeed(const void *key, ...
  function RAPIDHASH_INLINE_CONSTEXPR (line 500) | RAPIDHASH_INLINE_CONSTEXPR uint64_t rapidhash(const void *key, size_t le...
  function RAPIDHASH_INLINE_CONSTEXPR (line 519) | RAPIDHASH_INLINE_CONSTEXPR uint64_t rapidhashMicro_withSeed(const void *...
  function RAPIDHASH_INLINE_CONSTEXPR (line 533) | RAPIDHASH_INLINE_CONSTEXPR uint64_t rapidhashMicro(const void *key, size...
  function RAPIDHASH_INLINE_CONSTEXPR (line 548) | RAPIDHASH_INLINE_CONSTEXPR uint64_t rapidhashNano_withSeed(const void *k...
  function RAPIDHASH_INLINE_CONSTEXPR (line 566) | RAPIDHASH_INLINE_CONSTEXPR uint64_t rapidhashNano(const void *key, size_...

FILE: benchmarks/sha3.c
  function rhash_keccak_init (line 39) | static void rhash_keccak_init(sha3_ctx *ctx, unsigned bits)
  function rhash_sha3_224_init (line 54) | void rhash_sha3_224_init(sha3_ctx *ctx)
  function rhash_sha3_256_init (line 64) | void rhash_sha3_256_init(sha3_ctx *ctx)
  function rhash_sha3_384_init (line 74) | void rhash_sha3_384_init(sha3_ctx *ctx)
  function rhash_sha3_512_init (line 84) | void rhash_sha3_512_init(sha3_ctx *ctx)
  function keccak_theta (line 90) | static void keccak_theta(uint64_t *A)
  function keccak_pi (line 114) | static void keccak_pi(uint64_t *A)
  function keccak_chi (line 146) | static void keccak_chi(uint64_t *A)
  function rhash_sha3_permutation (line 159) | static void rhash_sha3_permutation(uint64_t *state)
  function rhash_sha3_process_block (line 207) | static void rhash_sha3_process_block(uint64_t hash[25], const uint64_t *...
  function rhash_sha3_update (line 263) | void rhash_sha3_update(sha3_ctx *ctx, const unsigned char *msg, size_t s...
  function rhash_sha3_final (line 308) | void rhash_sha3_final(sha3_ctx *ctx, unsigned char* result)
  function rhash_keccak_final (line 336) | void rhash_keccak_final(sha3_ctx *ctx, unsigned char* result)

FILE: benchmarks/sha3.h
  type sha3_ctx (line 20) | typedef struct sha3_ctx

FILE: benchmarks/sha512.c
  function rhash_sha512_init (line 84) | void rhash_sha512_init(sha512_ctx *ctx)
  function rhash_sha384_init (line 107) | void rhash_sha384_init(struct sha512_ctx *ctx)
  function rhash_sha512_process_block (line 130) | static void rhash_sha512_process_block(uint64_t hash[8], uint64_t block[...
  function rhash_sha512_update (line 189) | void rhash_sha512_update(sha512_ctx *ctx, const unsigned char *msg, size...
  function rhash_sha512_final (line 231) | void rhash_sha512_final(sha512_ctx *ctx, unsigned char* result)

FILE: benchmarks/sha512.h
  type sha512_ctx (line 35) | typedef struct sha512_ctx

FILE: benchmarks/sip24-prng.h
  function _sip24_prng_rotl (line 55) | static inline uint64_t
  function _sip24_prng_round (line 61) | static inline void
  function _sip24_prng_gen (line 86) | static inline void
  function set_sip24_prng_seed (line 114) | static inline void
  function init_sip24_prng_with_seed (line 123) | static inline void
  function init_sip24_prng (line 131) | static inline void
  function get_sip24_prn (line 142) | static inline uint64_t
  function finish_sip24_prng (line 160) | static inline void

FILE: benchmarks/siphash24.c
  function siphash (line 77) | int siphash(uint8_t *out, const uint8_t *in, uint64_t inlen, const uint8...

FILE: benchmarks/splitmix64.c
  function next (line 23) | uint64_t next() {

FILE: benchmarks/t1ha/src/t1ha0.c
  function tail32_le_aligned (line 48) | uint32_t tail32_le_aligned(const void *v,
  function tail32_le_unaligned (line 93) | uint32_t
  function tail32_be_aligned (line 148) | uint32_t tail32_be_aligned(const void *v,
  function tail32_be_unaligned (line 186) | uint32_t
  function rot32 (line 237) | uint32_t rot32(uint32_t v, unsigned s) {
  function __always_inline (line 242) | static __always_inline void mixup32(uint32_t *a, uint32_t *b, uint32_t v,
  function __always_inline (line 249) | static __always_inline uint64_t final32(uint32_t a, uint32_t b) {
  function __cold (line 361) | __cold uint64_t t1ha_ia32cpu_features(void) {
  function t1ha0_resolve (line 391) | t1ha0_resolve(void) {
  function t1ha0_init (line 445) | __attribute__((__constructor__)) t1ha0_init(void) {
  function __cold (line 451) | static __cold uint64_t t1ha0_proxy(const void *data, size_t len,

FILE: benchmarks/t1ha/src/t1ha0_ia32aes_a.h
  function T1HA_IA32AES_NAME (line 49) | uint64_t T1HA_IA32AES_NAME(const void *data, size_t len, uint64_t seed) {

FILE: benchmarks/t1ha/src/t1ha0_ia32aes_b.h
  function T1HA_IA32AES_NAME (line 49) | uint64_t T1HA_IA32AES_NAME(const void *data, size_t len, uint64_t seed) {

FILE: benchmarks/t1ha/src/t1ha0_selfcheck.c
  function __cold (line 148) | __cold int t1ha_selfcheck__t1ha0_32le(void) {
  function __cold (line 152) | __cold int t1ha_selfcheck__t1ha0_32be(void) {
  function __cold (line 157) | __cold int t1ha_selfcheck__t1ha0_ia32aes_noavx(void) {
  function __cold (line 161) | __cold int t1ha_selfcheck__t1ha0_ia32aes_avx(void) {
  function __cold (line 166) | __cold int t1ha_selfcheck__t1ha0_ia32aes_avx2(void) {
  function __cold (line 172) | __cold int t1ha_selfcheck__t1ha0(void) {

FILE: benchmarks/t1ha/src/t1ha1.c
  function mix64 (line 49) | static __inline uint64_t mix64(uint64_t v, uint64_t p) {
  function final_weak_avalanche (line 54) | static __inline uint64_t final_weak_avalanche(uint64_t a, uint64_t b) {

FILE: benchmarks/t1ha/src/t1ha1_selfcheck.c
  function __cold (line 100) | __cold int t1ha_selfcheck__t1ha1_le(void) {
  function __cold (line 104) | __cold int t1ha_selfcheck__t1ha1_be(void) {
  function __cold (line 108) | __cold int t1ha_selfcheck__t1ha1(void) {

FILE: benchmarks/t1ha/src/t1ha2.c
  function __always_inline (line 48) | static __always_inline void init_ab(t1ha_state256_t *s, uint64_t x,
  function __always_inline (line 54) | static __always_inline void init_cd(t1ha_state256_t *s, uint64_t x,
  function __always_inline (line 77) | static __always_inline void squash(t1ha_state256_t *s) {

FILE: benchmarks/t1ha/src/t1ha2_selfcheck.c
  function __cold (line 146) | __cold int t1ha_selfcheck__t1ha2_atonce(void) {
  function __cold (line 150) | __cold static uint64_t thunk_atonce128(const void *data, size_t len,
  function __cold (line 156) | __cold int t1ha_selfcheck__t1ha2_atonce128(void) {
  function __cold (line 160) | __cold static uint64_t thunk_stream(const void *data, size_t len,
  function __cold (line 168) | __cold static uint64_t thunk_stream128(const void *data, size_t len,
  function __cold (line 177) | __cold int t1ha_selfcheck__t1ha2_stream(void) {
  function __cold (line 182) | __cold int t1ha_selfcheck__t1ha2(void) {

FILE: benchmarks/t1ha/src/t1ha_bits.h
  function e2k_add64carry_first (line 200) | unsigned
  function e2k_add64carry_next (line 208) | unsigned
  function e2k_add64carry_last (line 217) | void e2k_add64carry_last(unsigned carry,
  function msvc32_add64carry_first (line 288) | static __forceinline char
  function msvc32_add64carry_next (line 301) | static __forceinline char msvc32_add64carry_next(char carry, uint64_t base,
  function msvc32_add64carry_last (line 315) | static __forceinline void msvc32_add64carry_last(char carry, uint64_t base,
  function __always_inline (line 368) | static __always_inline uint64_t bswap64(uint64_t v) {
  function __always_inline (line 385) | static __always_inline uint32_t bswap32(uint32_t v) {
  function __always_inline (line 398) | static __always_inline uint16_t bswap16(uint16_t v) { return v << 8 | v ...
  type t1ha_unaligned_proxy (line 414) | typedef struct {
  type t1ha_unaligned_proxy (line 431) | typedef struct {
  function fetch16_le_aligned (line 525) | uint16_t
  function fetch16_le_unaligned (line 537) | uint16_t
  function fetch32_le_aligned (line 552) | uint32_t
  function fetch32_le_unaligned (line 564) | uint32_t
  function fetch64_le_aligned (line 579) | uint64_t
  function fetch64_le_unaligned (line 591) | uint64_t
  function tail64_le_aligned (line 604) | uint64_t tail64_le_aligned(const void *v,
  function tail64_le_unaligned (line 683) | uint64_t
  function fetch16_be_aligned (line 770) | uint16_t
  function fetch16_be_unaligned (line 782) | uint16_t
  function fetch32_be_aligned (line 796) | uint32_t
  function fetch32_be_unaligned (line 808) | uint32_t
  function fetch64_be_aligned (line 822) | uint64_t
  function fetch64_be_unaligned (line 834) | uint64_t
  function tail64_be_aligned (line 847) | uint64_t tail64_be_aligned(const void *v,
  function tail64_be_unaligned (line 907) | uint64_t
  function rot64 (line 983) | uint64_t rot64(uint64_t v, unsigned s) {
  function mul_32x32_64 (line 989) | uint64_t mul_32x32_64(uint32_t a,
  function add64carry_first (line 996) | unsigned
  function add64carry_next (line 1010) | unsigned
  function add64carry_last (line 1024) | void
  function mul_64x64_128 (line 1037) | uint64_t mul_64x64_128(uint64_t a,
  function mul_64x64_high (line 1070) | uint64_t mul_64x64_high(uint64_t a,
  function mux64 (line 1090) | uint64_t mux64(uint64_t v,
  function final64 (line 1097) | uint64_t final64(uint64_t a, uint64_t b) {
  function mixup64 (line 1103) | void mixup64(uint64_t *__restrict a,
  type t1ha_uint128_t (line 1113) | typedef union t1ha_uint128 {
  function t1ha_uint128_t (line 1127) | t1ha_uint128_t
  function t1ha_uint128_t (line 1140) | t1ha_uint128_t
  function t1ha_uint128_t (line 1154) | t1ha_uint128_t
  function t1ha_uint128_t (line 1168) | t1ha_uint128_t or128(t1ha_uint128_t x,
  function t1ha_uint128_t (line 1181) | t1ha_uint128_t xor128(t1ha_uint128_t x,
  function t1ha_uint128_t (line 1194) | t1ha_uint128_t rot128(t1ha_uint128_t v,
  function t1ha_uint128_t (line 1206) | t1ha_uint128_t add128(t1ha_uint128_t x,
  function t1ha_uint128_t (line 1218) | t1ha_uint128_t mul128(t1ha_uint128_t x,
  function t1ha_ia32_AESNI_avail (line 1236) | bool
  function t1ha_ia32_AVX_avail (line 1242) | bool
  function t1ha_ia32_AVX2_avail (line 1248) | bool

FILE: benchmarks/t1ha/src/t1ha_selfcheck.c
  function probe (line 55) | static __inline bool probe(uint64_t (*hash)(const void *, size_t, uint64...
  function __cold (line 63) | __cold int t1ha_selfcheck(uint64_t (*hash)(const void *, size_t, uint64_t),

FILE: benchmarks/t1ha/src/t1ha_selfcheck_all.c
  function __cold (line 47) | __cold int t1ha_selfcheck__all_enabled(void) {

FILE: benchmarks/t1ha/t1ha.h
  type t1ha_state256 (line 406) | typedef union T1HA_ALIGN_PREFIX t1ha_state256
  type t1ha_context_t (line 415) | typedef struct t1ha_context {
  function __force_inline (line 637) | static __force_inline uint64_t t1ha0(const void *data, size_t length,
  function __force_inline (line 660) | static __force_inline uint64_t t1ha0(const void *data, size_t length,
  function __force_inline (line 693) | static __force_inline uint64_t t1ha0(const void *data, size_t length,

FILE: benchmarks/xoroshiro128plus.c
  function rotl (line 32) | static inline uint64_t rotl(const uint64_t x, int k) {
  function next (line 36) | uint64_t next(void) {
  function jump (line 53) | void jump(void) {

FILE: benchmarks/xoroshiro128starstar.c
  function rotl (line 23) | static inline uint64_t rotl(const uint64_t x, int k) {
  function next (line 30) | uint64_t next(void) {
  function jump (line 47) | void jump(void) {
  function long_jump (line 71) | void long_jump(void) {

FILE: benchmarks/xoseed.c
  function main (line 5) | void main (int argc, char *argv[]) {

FILE: benchmarks/xoshiro256plus.c
  function rotl (line 27) | static inline uint64_t rotl(const uint64_t x, int k) {
  function next (line 34) | uint64_t next(void) {
  function jump (line 56) | void jump(void) {
  function long_jump (line 86) | void long_jump(void) {

FILE: benchmarks/xoshiro256starstar.c
  function rotl (line 21) | static inline uint64_t rotl(const uint64_t x, int k) {
  function next (line 28) | uint64_t next(void) {
  function jump (line 50) | void jump(void) {
  function long_jump (line 81) | void long_jump(void) {

FILE: benchmarks/xoshiro512plus.c
  function rotl (line 27) | static inline uint64_t rotl(const uint64_t x, int k) {
  function next (line 34) | uint64_t next(void) {
  function jump (line 60) | void jump(void) {

FILE: benchmarks/xoshiro512starstar.c
  function rotl (line 23) | static inline uint64_t rotl(const uint64_t x, int k) {
  function next (line 30) | uint64_t next(void) {
  function jump (line 56) | void jump(void) {

FILE: benchmarks/xxhash.h
  type XXH_errorcode (line 572) | typedef enum {
  type XXH32_hash_t (line 587) | typedef uint32_t XXH32_hash_t;
  type XXH32_hash_t (line 597) | typedef uint32_t XXH32_hash_t;
  type XXH32_hash_t (line 602) | typedef unsigned int XXH32_hash_t;
  type XXH32_hash_t (line 604) | typedef unsigned long XXH32_hash_t;
  type XXH32_state_t (line 653) | typedef struct XXH32_state_s XXH32_state_t;
  type XXH32_canonical_t (line 754) | typedef struct {
  type XXH64_hash_t (line 866) | typedef uint64_t XXH64_hash_t;
  type XXH64_hash_t (line 875) | typedef uint64_t XXH64_hash_t;
  type XXH64_hash_t (line 880) | typedef unsigned long XXH64_hash_t;
  type XXH64_hash_t (line 883) | typedef unsigned long long XXH64_hash_t;
  type XXH64_state_t (line 927) | typedef struct XXH64_state_s XXH64_state_t;
  type XXH64_canonical_t (line 1028) | typedef struct { unsigned char digest[sizeof(XXH64_hash_t)]; } XXH64_can...
  type XXH3_state_t (line 1243) | typedef struct XXH3_state_s XXH3_state_t;
  type XXH128_hash_t (line 1382) | typedef struct {
  type XXH128_canonical_t (line 1605) | typedef struct { unsigned char digest[sizeof(XXH128_hash_t)]; } XXH128_c...
  type XXH32_state_s (line 1672) | struct XXH32_state_s {
  type XXH64_state_s (line 1696) | struct XXH64_state_s {
  type XXH3_state_s (line 1770) | struct XXH3_state_s {
  function XXH_CONSTF (line 2369) | static XXH_CONSTF void* XXH_malloc(size_t s) { (void)s; return NULL; }
  function XXH_free (line 2370) | static void XXH_free(void* p) { (void)p; }
  function XXH_MALLOCF (line 2384) | static XXH_MALLOCF void* XXH_malloc(size_t s) { return malloc(s); }
  function XXH_free (line 2390) | static void XXH_free(void* p) { free(p); }
  type xxh_u8 (line 2550) | typedef uint8_t xxh_u8;
  type xxh_u8 (line 2552) | typedef unsigned char xxh_u8;
  type XXH32_hash_t (line 2554) | typedef XXH32_hash_t xxh_u32;
  function xxh_u32 (line 2626) | static xxh_u32 XXH_read32(const void* memPtr) { return *(const xxh_u32*)...
  type unalign (line 2638) | typedef union { xxh_u32 u32; } __attribute__((__packed__)) unalign;
  function xxh_u32 (line 2640) | static xxh_u32 XXH_read32(const void* ptr)
  function xxh_u32 (line 2652) | static xxh_u32 XXH_read32(const void* memPtr)
  function XXH_isLittleEndian (line 2699) | static int XXH_isLittleEndian(void)
  function xxh_u32 (line 2813) | static xxh_u32 XXH_swap32 (xxh_u32 x)
  type XXH_alignment (line 2831) | typedef enum {
  function XXH_FORCE_INLINE (line 2843) | XXH_FORCE_INLINE xxh_u32 XXH_readLE32(const void* memPtr)
  function XXH_FORCE_INLINE (line 2852) | XXH_FORCE_INLINE xxh_u32 XXH_readBE32(const void* memPtr)
  function XXH_FORCE_INLINE (line 2862) | XXH_FORCE_INLINE xxh_u32 XXH_readLE32(const void* ptr)
  function xxh_u32 (line 2867) | static xxh_u32 XXH_readBE32(const void* ptr)
  function XXH_FORCE_INLINE (line 2873) | XXH_FORCE_INLINE xxh_u32
  function XXH_versionNumber (line 2888) | XXH_PUBLIC_API unsigned XXH_versionNumber (void) { return XXH_VERSION_NU...
  function xxh_u32 (line 2928) | static xxh_u32 XXH32_round(xxh_u32 acc, xxh_u32 input)
  function xxh_u32 (line 2985) | static xxh_u32 XXH32_avalanche(xxh_u32 hash)
  function XXH_FORCE_INLINE (line 3001) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 3017) | XXH_FORCE_INLINE const xxh_u8 *
  function xxh_u32 (line 3044) | xxh_u32
  function XXH_PUREF (line 3067) | static XXH_PUREF xxh_u32
  function xxh_u32 (line 3156) | xxh_u32
  function XXH_PUBLIC_API (line 3180) | XXH_PUBLIC_API XXH32_hash_t XXH32 (const void* input, size_t len, XXH32_...
  function XXH_PUBLIC_API (line 3203) | XXH_PUBLIC_API XXH32_state_t* XXH32_createState(void)
  function XXH_PUBLIC_API (line 3208) | XXH_PUBLIC_API XXH_errorcode XXH32_freeState(XXH32_state_t* statePtr)
  function XXH_PUBLIC_API (line 3215) | XXH_PUBLIC_API void XXH32_copyState(XXH32_state_t* dstState, const XXH32...
  function XXH_PUBLIC_API (line 3221) | XXH_PUBLIC_API XXH_errorcode XXH32_reset(XXH32_state_t* statePtr, XXH32_...
  function XXH_PUBLIC_API (line 3231) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 3278) | XXH_PUBLIC_API XXH32_hash_t XXH32_digest(const XXH32_state_t* state)
  function XXH_PUBLIC_API (line 3297) | XXH_PUBLIC_API void XXH32_canonicalFromHash(XXH32_canonical_t* dst, XXH3...
  function XXH_PUBLIC_API (line 3304) | XXH_PUBLIC_API XXH32_hash_t XXH32_hashFromCanonical(const XXH32_canonica...
  type XXH64_hash_t (line 3322) | typedef XXH64_hash_t xxh_u64;
  function xxh_u64 (line 3336) | static xxh_u64 XXH_read64(const void* memPtr)
  type unalign64 (line 3351) | typedef union { xxh_u32 u32; xxh_u64 u64; } __attribute__((__packed__)) ...
  function xxh_u64 (line 3353) | static xxh_u64 XXH_read64(const void* ptr)
  function xxh_u64 (line 3365) | static xxh_u64 XXH_read64(const void* memPtr)
  function xxh_u64 (line 3379) | static xxh_u64 XXH_swap64(xxh_u64 x)
  function XXH_FORCE_INLINE (line 3396) | XXH_FORCE_INLINE xxh_u64 XXH_readLE64(const void* memPtr)
  function XXH_FORCE_INLINE (line 3409) | XXH_FORCE_INLINE xxh_u64 XXH_readBE64(const void* memPtr)
  function XXH_FORCE_INLINE (line 3423) | XXH_FORCE_INLINE xxh_u64 XXH_readLE64(const void* ptr)
  function xxh_u64 (line 3428) | static xxh_u64 XXH_readBE64(const void* ptr)
  function XXH_FORCE_INLINE (line 3434) | XXH_FORCE_INLINE xxh_u64
  function xxh_u64 (line 3469) | static xxh_u64 XXH64_round(xxh_u64 acc, xxh_u64 input)
  function xxh_u64 (line 3494) | static xxh_u64 XXH64_mergeRound(xxh_u64 acc, xxh_u64 val)
  function xxh_u64 (line 3503) | static xxh_u64 XXH64_avalanche(xxh_u64 hash)
  function XXH_FORCE_INLINE (line 3520) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 3536) | XXH_FORCE_INLINE const xxh_u8 *
  function xxh_u64 (line 3572) | xxh_u64
  function xxh_u64 (line 3610) | xxh_u64
  function xxh_u64 (line 3654) | xxh_u64
  function XXH_PUBLIC_API (line 3678) | XXH_PUBLIC_API XXH64_hash_t XXH64 (XXH_NOESCAPE const void* input, size_...
  function XXH_PUBLIC_API (line 3700) | XXH_PUBLIC_API XXH64_state_t* XXH64_createState(void)
  function XXH_PUBLIC_API (line 3705) | XXH_PUBLIC_API XXH_errorcode XXH64_freeState(XXH64_state_t* statePtr)
  function XXH_PUBLIC_API (line 3712) | XXH_PUBLIC_API void XXH64_copyState(XXH_NOESCAPE XXH64_state_t* dstState...
  function XXH_PUBLIC_API (line 3718) | XXH_PUBLIC_API XXH_errorcode XXH64_reset(XXH_NOESCAPE XXH64_state_t* sta...
  function XXH_PUBLIC_API (line 3727) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 3773) | XXH_PUBLIC_API XXH64_hash_t XXH64_digest(XXH_NOESCAPE const XXH64_state_...
  function XXH_PUBLIC_API (line 3792) | XXH_PUBLIC_API void XXH64_canonicalFromHash(XXH_NOESCAPE XXH64_canonical...
  function XXH_PUBLIC_API (line 3800) | XXH_PUBLIC_API XXH64_hash_t XXH64_hashFromCanonical(XXH_NOESCAPE const X...
  type uint64x2_t (line 4090) | typedef uint64x2_t xxh_aliasing_uint64x2_t
  function XXH_FORCE_INLINE (line 4106) | XXH_FORCE_INLINE uint64x2_t XXH_vld1q_u64(void const* ptr) /* silence -W...
  function XXH_FORCE_INLINE (line 4111) | XXH_FORCE_INLINE uint64x2_t XXH_vld1q_u64(void const* ptr)
  function XXH_FORCE_INLINE (line 4126) | XXH_FORCE_INLINE uint64x2_t
  function XXH_FORCE_INLINE (line 4133) | XXH_FORCE_INLINE uint64x2_t
  function XXH_FORCE_INLINE (line 4141) | XXH_FORCE_INLINE uint64x2_t
  function XXH_FORCE_INLINE (line 4148) | XXH_FORCE_INLINE uint64x2_t
  type xxh_u64x2 (line 4239) | typedef __vector unsigned long long xxh_u64x2;
  type xxh_u8x16 (line 4240) | typedef __vector unsigned char xxh_u8x16;
  type xxh_u32x4 (line 4241) | typedef __vector unsigned xxh_u32x4;
  type xxh_u64x2 (line 4246) | typedef xxh_u64x2 xxh_aliasing_u64x2
  function XXH_FORCE_INLINE (line 4267) | XXH_FORCE_INLINE xxh_u64x2 XXH_vec_revb(xxh_u64x2 val)
  function XXH_FORCE_INLINE (line 4279) | XXH_FORCE_INLINE xxh_u64x2 XXH_vec_loadu(const void *ptr)
  function XXH_FORCE_INLINE (line 4307) | XXH_FORCE_INLINE xxh_u64x2 XXH_vec_mulo(xxh_u32x4 a, xxh_u32x4 b)
  function XXH_FORCE_INLINE (line 4313) | XXH_FORCE_INLINE xxh_u64x2 XXH_vec_mule(xxh_u32x4 a, xxh_u32x4 b)
  function XXH_FORCE_INLINE (line 4404) | XXH_FORCE_INLINE xxh_u64
  function XXH128_hash_t (line 4431) | static XXH128_hash_t
  function xxh_u64 (line 4565) | static xxh_u64
  function xxh_u64 (line 4573) | xxh_u64 XXH_xorshift64(xxh_u64 v64, int shift)
  function XXH64_hash_t (line 4583) | static XXH64_hash_t XXH3_avalanche(xxh_u64 h64)
  function XXH64_hash_t (line 4596) | static XXH64_hash_t XXH3_rrmxmx(xxh_u64 h64, xxh_u64 len)
  function XXH64_hash_t (line 4640) | XXH64_hash_t
  function XXH64_hash_t (line 4662) | XXH64_hash_t
  function XXH64_hash_t (line 4678) | XXH64_hash_t
  function XXH64_hash_t (line 4695) | XXH64_hash_t
  function XXH_FORCE_INLINE (line 4732) | XXH_FORCE_INLINE xxh_u64 XXH3_mix16B(const xxh_u8* XXH_RESTRICT input,
  function XXH64_hash_t (line 4765) | XXH64_hash_t
  function XXH64_hash_t (line 4801) | XXH64_hash_t
  function XXH_FORCE_INLINE (line 4913) | XXH_FORCE_INLINE void XXH_writeLE64(void* dst, xxh_u64 v64)
  type xxh_i64 (line 4927) | typedef int64_t xxh_i64;
  type xxh_i64 (line 4930) | typedef long long xxh_i64;
  function XXH3_accumulate_512_avx512 (line 4964) | void
  function XXH_FORCE_INLINE (line 4991) | XXH_FORCE_INLINE XXH_TARGET_AVX512 XXH3_ACCUMULATE_TEMPLATE(avx512)
  function XXH3_initCustomSecret_avx512 (line 5037) | void
  function XXH3_accumulate_512_avx2 (line 5067) | void
  function XXH_FORCE_INLINE (line 5100) | XXH_FORCE_INLINE XXH_TARGET_AVX2 XXH3_ACCUMULATE_TEMPLATE(avx2)
  function XXH3_initCustomSecret_avx2 (line 5131) | void XXH3_initCustomSecret_avx2(void* XXH_RESTRICT customSecret, xxh_u64...
  function XXH3_accumulate_512_sse2 (line 5173) | void
  function XXH_FORCE_INLINE (line 5207) | XXH_FORCE_INLINE XXH_TARGET_SSE2 XXH3_ACCUMULATE_TEMPLATE(sse2)
  function XXH3_initCustomSecret_sse2 (line 5238) | void XXH3_initCustomSecret_sse2(void* XXH_RESTRICT customSecret, xxh_u64...
  function XXH_FORCE_INLINE (line 5308) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 5434) | XXH_FORCE_INLINE XXH3_ACCUMULATE_TEMPLATE(neon)
  function XXH_FORCE_INLINE (line 5495) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 5529) | XXH_FORCE_INLINE XXH3_ACCUMULATE_TEMPLATE(vsx)
  function XXH_FORCE_INLINE (line 5565) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 5605) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 5675) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 5706) | XXH_FORCE_INLINE XXH3_ACCUMULATE_TEMPLATE(lsx)
  function XXH_FORCE_INLINE (line 5754) | XXH_FORCE_INLINE xxh_u64
  function XXH_FORCE_INLINE (line 5763) | XXH_FORCE_INLINE xxh_u64
  function XXH_FORCE_INLINE (line 5777) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 5800) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 5817) | XXH_FORCE_INLINE XXH3_ACCUMULATE_TEMPLATE(scalar)
  function XXH_FORCE_INLINE (line 5849) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 5858) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 5987) | XXH_FORCE_INLINE void
  function XXH_FORCE_INLINE (line 6020) | XXH_FORCE_INLINE xxh_u64
  function XXH_PUREF (line 6028) | static XXH_PUREF XXH64_hash_t
  function XXH_PUREF (line 6058) | static XXH_PUREF XXH64_hash_t
  function XXH_FORCE_INLINE (line 6067) | XXH_FORCE_INLINE XXH64_hash_t
  function XXH3_WITH_SECRET_INLINE (line 6090) | XXH3_WITH_SECRET_INLINE XXH64_hash_t
  function XXH64_hash_t (line 6104) | XXH64_hash_t
  function XXH_FORCE_INLINE (line 6123) | XXH_FORCE_INLINE XXH64_hash_t
  function XXH_NO_INLINE (line 6146) | XXH_NO_INLINE XXH64_hash_t
  type XXH64_hash_t (line 6156) | typedef XXH64_hash_t (*XXH3_hashLong64_f)(const void* XXH_RESTRICT, size_t,
  function XXH_FORCE_INLINE (line 6159) | XXH_FORCE_INLINE XXH64_hash_t
  function XXH_PUBLIC_API (line 6185) | XXH_PUBLIC_API XXH64_hash_t XXH3_64bits(XXH_NOESCAPE const void* input, ...
  function XXH_PUBLIC_API (line 6191) | XXH_PUBLIC_API XXH64_hash_t
  function XXH_PUBLIC_API (line 6198) | XXH_PUBLIC_API XXH64_hash_t
  function XXH_PUBLIC_API (line 6204) | XXH_PUBLIC_API XXH64_hash_t
  function XXH_MALLOCF (line 6238) | static XXH_MALLOCF void* XXH_alignedMalloc(size_t s, size_t align)
  function XXH_alignedFree (line 6269) | static void XXH_alignedFree(void* p)
  function XXH_PUBLIC_API (line 6291) | XXH_PUBLIC_API XXH3_state_t* XXH3_createState(void)
  function XXH_PUBLIC_API (line 6311) | XXH_PUBLIC_API XXH_errorcode XXH3_freeState(XXH3_state_t* statePtr)
  function XXH_PUBLIC_API (line 6318) | XXH_PUBLIC_API void
  function XXH3_reset_internal (line 6324) | static void
  function XXH_PUBLIC_API (line 6352) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 6361) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 6372) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 6384) | XXH_PUBLIC_API XXH_errorcode
  function XXH_FORCE_INLINE (line 6412) | XXH_FORCE_INLINE const xxh_u8 *
  function XXH_PUBLIC_API (line 6536) | XXH_PUBLIC_API XXH_errorcode
  function XXH_FORCE_INLINE (line 6544) | XXH_FORCE_INLINE void
  function XXH_PUBLIC_API (line 6582) | XXH_PUBLIC_API XXH64_hash_t XXH3_64bits_digest (XXH_NOESCAPE const XXH3_...
  function XXH128_hash_t (line 6616) | XXH128_hash_t
  function XXH128_hash_t (line 6645) | XXH128_hash_t
  function XXH128_hash_t (line 6672) | XXH128_hash_t
  function XXH128_hash_t (line 6747) | XXH128_hash_t
  function XXH_FORCE_INLINE (line 6766) | XXH_FORCE_INLINE XXH128_hash_t
  function XXH128_hash_t (line 6778) | XXH128_hash_t
  function XXH128_hash_t (line 6822) | XXH128_hash_t
  function XXH_PUREF (line 6880) | static XXH_PUREF XXH128_hash_t
  function XXH_FORCE_INLINE (line 6891) | XXH_FORCE_INLINE XXH128_hash_t
  function XXH128_hash_t (line 6910) | XXH128_hash_t
  function XXH3_WITH_SECRET_INLINE (line 6927) | XXH3_WITH_SECRET_INLINE XXH128_hash_t
  function XXH_FORCE_INLINE (line 6937) | XXH_FORCE_INLINE XXH128_hash_t
  function XXH_NO_INLINE (line 6958) | XXH_NO_INLINE XXH128_hash_t
  type XXH128_hash_t (line 6967) | typedef XXH128_hash_t (*XXH3_hashLong128_f)(const void* XXH_RESTRICT, si...
  function XXH_FORCE_INLINE (line 6970) | XXH_FORCE_INLINE XXH128_hash_t
  function XXH_PUBLIC_API (line 6995) | XXH_PUBLIC_API XXH128_hash_t XXH3_128bits(XXH_NOESCAPE const void* input...
  function XXH_PUBLIC_API (line 7003) | XXH_PUBLIC_API XXH128_hash_t
  function XXH_PUBLIC_API (line 7012) | XXH_PUBLIC_API XXH128_hash_t
  function XXH_PUBLIC_API (line 7021) | XXH_PUBLIC_API XXH128_hash_t
  function XXH_PUBLIC_API (line 7030) | XXH_PUBLIC_API XXH128_hash_t
  function XXH_PUBLIC_API (line 7045) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 7052) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 7059) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 7066) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 7073) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 7080) | XXH_PUBLIC_API XXH128_hash_t XXH3_128bits_digest (XXH_NOESCAPE const XXH...
  function XXH_PUBLIC_API (line 7102) | XXH_PUBLIC_API int XXH128_isEqual(XXH128_hash_t h1, XXH128_hash_t h2)
  function XXH_PUBLIC_API (line 7113) | XXH_PUBLIC_API int XXH128_cmp(XXH_NOESCAPE const void* h128_1, XXH_NOESC...
  function XXH_PUBLIC_API (line 7126) | XXH_PUBLIC_API void
  function XXH_PUBLIC_API (line 7139) | XXH_PUBLIC_API XXH128_hash_t
  function XXH_FORCE_INLINE (line 7156) | XXH_FORCE_INLINE void XXH3_combine16(void* dst, XXH128_hash_t h128)
  function XXH_PUBLIC_API (line 7163) | XXH_PUBLIC_API XXH_errorcode
  function XXH_PUBLIC_API (line 7208) | XXH_PUBLIC_API void

FILE: mum-prng.h
  function _mum_prng_update (line 93) | static void __attribute__ ((noinline))
  function _mum_prng_setup_avx2 (line 104) | static inline void
  function _start_mum_prng (line 115) | static inline void
  function init_mum_prng (line 129) | static inline void
  function set_mum_prng_seed (line 134) | static inline void
  function get_mum_prn (line 139) | static inline uint64_t
  function finish_mum_prng (line 149) | static inline void

FILE: mum.h
  function _MUM_INLINE (line 95) | static _MUM_INLINE uint64_t _mum (uint64_t v, uint64_t p) {
  function _MUM_INLINE (line 144) | static _MUM_INLINE uint64_t _mum_le (uint64_t v) {
  function _MUM_INLINE (line 154) | static _MUM_INLINE uint32_t _mum_le32 (uint32_t v) {
  function _MUM_INLINE (line 164) | static _MUM_INLINE uint64_t _mum_le16 (uint16_t v) {
  function _MUM_INLINE (line 201) | static _MUM_INLINE uint64_t _mum_rotl (uint64_t v, int sh) { return v <<...
  function _MUM_INLINE (line 203) | static _MUM_INLINE uint64_t _mum_xor (uint64_t a, uint64_t b) {
  function _MUM_INLINE (line 288) | static _MUM_INLINE uint64_t _mum_final (uint64_t h) {
  function _MUM_INLINE (line 346) | static _MUM_INLINE uint64_t _mum_next_factor (void) {
  function _MUM_INLINE (line 357) | static _MUM_INLINE void mum_hash_randomize (uint64_t seed) {
  function _MUM_INLINE (line 373) | static _MUM_INLINE uint64_t mum_hash_init (uint64_t seed) { return seed; }
  function _MUM_INLINE (line 376) | static _MUM_INLINE uint64_t mum_hash_step (uint64_t h, uint64_t key) {
  function _MUM_INLINE (line 381) | static _MUM_INLINE uint64_t mum_hash_finish (uint64_t h) { return _mum_f...
  function _MUM_INLINE (line 384) | static _MUM_INLINE size_t mum_hash64 (uint64_t key, uint64_t seed) {
  function _MUM_INLINE (line 390) | static _MUM_INLINE uint64_t mum_hash (const void *key, size_t len, uint6...

FILE: mum512.h
  type __uint128_t (line 94) | typedef __uint128_t _mc_ti;
  type _mc_ti (line 97) | typedef struct {uint64_t lo, hi;} _mc_ti;
  type _mc_ti (line 99) | typedef struct {uint64_t hi, lo;} _mc_ti;
  function _mc_lo64 (line 106) | static inline uint64_t _mc_lo64 (_mc_ti a) {return a;}
  function _mc_hi64 (line 107) | static inline uint64_t _mc_hi64 (_mc_ti a) {return a >> 64;}
  function _mc_ti (line 108) | static inline _mc_ti _mc_lo2hi (_mc_ti a) {return a << 64;}
  function _mc_ti (line 109) | static inline _mc_ti _mc_hi2lo (_mc_ti a) {return a >> 64;}
  function _mc_ti (line 110) | static inline _mc_ti _mc_add (_mc_ti a, _mc_ti b) {return a + b;}
  function _mc_ti (line 111) | static inline _mc_ti _mc_xor (_mc_ti a, _mc_ti b) {return a ^ b;}
  function _mc_lt (line 112) | static inline uint32_t _mc_lt (_mc_ti a, _mc_ti b) { return a < b;}
  function _mc_ti (line 113) | static inline _mc_ti _mc_const (uint64_t hi, uint64_t lo) {
  function _mc_ti (line 116) | static inline _mc_ti _mc_rotr (_mc_ti a, int sh) {
  function _mc_ti (line 120) | static inline _mc_ti _mc_mul64 (uint64_t a, uint64_t b) {
  function _mc_ti (line 135) | static inline _mc_ti _mc_swap (_mc_ti v) {
  function _mc_lo64 (line 141) | static inline uint64_t _mc_lo64 (_mc_ti a) {return a.lo;}
  function _mc_hi64 (line 142) | static inline uint64_t _mc_hi64 (_mc_ti a) {return a.hi;}
  function _mc_ti (line 143) | static inline _mc_ti _mc_lo2hi (_mc_ti a) {a.hi = a.lo; a.lo = 0; return...
  function _mc_ti (line 144) | static inline _mc_ti _mc_hi2lo (_mc_ti a) {a.lo = a.hi; a.hi = 0; return...
  function _mc_ti (line 145) | static inline _mc_ti _mc_const (uint64_t hi, uint64_t lo) {
  function _mc_ti (line 149) | static inline _mc_ti _mc_add (_mc_ti a, _mc_ti b) {
  function _mc_lt (line 152) | static inline uint32_t _mc_lt (_mc_ti a, _mc_ti b) {
  function _mc_ti (line 155) | static inline _mc_ti _mc_xor (_mc_ti a, _mc_ti b) {
  function _mc_ti (line 158) | static inline _mc_ti _mc_rotr (_mc_ti a, int sh) {
  function _mc_ti (line 172) | static inline _mc_ti _mc_mul64 (uint64_t a, uint64_t b) {
  function _mc_ti (line 192) | static inline _mc_ti _mc_swap (_mc_ti v) {
  function _mc_ti (line 199) | static inline _mc_ti _mc_mum (_mc_ti a, _mc_ti b) {
  function _mc_ti (line 218) | static inline _mc_ti _mc_2le (_mc_ti v) {
  function _mc_ti (line 228) | static inline _mc_ti _mc_get (const uint64_t a[2]) {return _mc_const (a[...
  function _mc_permute (line 394) | static inline void
  function _mc_mix (line 413) | static inline void
  function _mc_init_state (line 446) | static inline void
  function mum512_keyed_hash (line 513) | static inline void
  function mum512_hash (line 528) | static inline void

FILE: vmum.h
  function _VMUM_INLINE (line 121) | static _VMUM_INLINE uint64_t _vmum (uint64_t v, uint64_t p) {
  function _VMUM_INLINE (line 170) | static _VMUM_INLINE uint64_t _vmum_le (uint64_t v) {
  function _VMUM_INLINE (line 180) | static _VMUM_INLINE uint32_t _vmum_le32 (uint32_t v) {
  function _VMUM_INLINE (line 190) | static _VMUM_INLINE uint64_t _vmum_le16 (uint16_t v) {
  function _VMUM_INLINE (line 200) | static _VMUM_INLINE uint64_t _vmum_xor (uint64_t a, uint64_t b) {
  function _VMUM_INLINE (line 208) | static _VMUM_INLINE uint64_t _vmum_plus (uint64_t a, uint64_t b) {
  function _VMUM_INLINE (line 223) | static _VMUM_INLINE _vmum_block_t _vmum_block (_vmum_block_t v, _vmum_bl...
  function _VMUM_INLINE (line 230) | static _VMUM_INLINE _vmum_block_t _vmum_nonzero (_vmum_block_t v) {
  function _VMUM_INLINE (line 235) | static _VMUM_INLINE void _vmum_update_block (_vmum_block_t *s, const _vm...
  function _VMUM_INLINE (line 243) | static _VMUM_INLINE void _vmum_factor_block (_vmum_block_t *s, const _vm...
  function _VMUM_INLINE (line 246) | static _VMUM_INLINE void _vmum_zero_block (_vmum_block_t *b) { *b = (_vm...
  function _VMUM_INLINE (line 247) | static _VMUM_INLINE uint64_t _vmum_fold_block (const _vmum_block_t *b) {
  type _vmum_block_t (line 252) | typedef struct {
  function _VMUM_INLINE (line 255) | static _VMUM_INLINE uint64x2_t _vmum_val (uint64x2_t v, uint64x2_t p) {
  function _VMUM_INLINE (line 260) | static _VMUM_INLINE uint64x2_t _vmum_nonzero (uint64x2_t v) {
  function _VMUM_INLINE (line 264) | static _VMUM_INLINE void _vmum_update_block (_vmum_block_t *s, const _vm...
  function _VMUM_INLINE (line 275) | static _VMUM_INLINE void _vmum_factor_block (_vmum_block_t *s, const _vm...
  function _VMUM_INLINE (line 279) | static _VMUM_INLINE void _vmum_zero_block (_vmum_block_t *b) { *b = (_vm...
  function _VMUM_INLINE (line 280) | static _VMUM_INLINE uint64_t _vmum_fold_block (_vmum_block_t *b) {
  type _vmum_block_t (line 286) | typedef struct {
  function _VMUM_INLINE (line 289) | static _VMUM_INLINE _vmum_v2di _vmum_val (_vmum_v2di v, _vmum_v2di p) {
  function _VMUM_INLINE (line 293) | static _VMUM_INLINE void _vmum_zero_block (_vmum_block_t *b) { *b = (_vm...
  function _VMUM_INLINE (line 294) | static _VMUM_INLINE _vmum_v2di _vmum_nonzero (_vmum_v2di v) {
  function _VMUM_INLINE (line 299) | static _VMUM_INLINE void _vmum_update_block (_vmum_block_t *s, const _vm...
  function _VMUM_INLINE (line 310) | static _VMUM_INLINE void _vmum_factor_block (_vmum_block_t *s, const _vm...
  function _VMUM_INLINE (line 314) | static _VMUM_INLINE uint64_t _vmum_fold_block (_vmum_block_t *b) {
  type _vmum_block_t (line 318) | typedef struct {
  function _VMUM_INLINE (line 321) | static _VMUM_INLINE uint64_t _vmum_val (uint64_t v, uint64_t p) {
  function _VMUM_INLINE (line 325) | static _VMUM_INLINE void _vmum_update_block (_vmum_block_t *s, const _vm...
  function _VMUM_INLINE (line 337) | static _VMUM_INLINE void _vmum_factor_block (_vmum_block_t *s, const _vm...
  function _VMUM_INLINE (line 343) | static _VMUM_INLINE void _vmum_zero_block (_vmum_block_t *b) { *b = (_vm...
  function _VMUM_INLINE (line 344) | static _VMUM_INLINE uint64_t _vmum_fold_block (_vmum_block_t *b) {
  function _VMUM_INLINE (line 466) | static _VMUM_INLINE uint64_t _vmum_final (uint64_t h) {
  function _VMUM_INLINE (line 517) | static _VMUM_INLINE uint64_t _vmum_next_factor (void) {
  function _VMUM_INLINE (line 528) | static _VMUM_INLINE void vmum_hash_randomize (uint64_t seed) {
  function _VMUM_INLINE (line 545) | static _VMUM_INLINE uint64_t vmum_hash_init (uint64_t seed) { return see...
  function _VMUM_INLINE (line 548) | static _VMUM_INLINE uint64_t vmum_hash_step (uint64_t h, uint64_t key) {
  function _VMUM_INLINE (line 553) | static _VMUM_INLINE uint64_t vmum_hash_finish (uint64_t h) { return _vmu...
  function _VMUM_INLINE (line 557) | static _VMUM_INLINE size_t vmum_hash64 (uint64_t key, uint64_t seed) {
  function _VMUM_INLINE (line 563) | static _VMUM_INLINE uint64_t vmum_hash (const void *key, size_t len, uin...
Condensed preview: 70 files, each showing path, character count, and a content snippet. The full structured content is 810K chars.
[
  {
    "path": ".clang-format",
    "chars": 657,
    "preview": "BasedOnStyle: google\nSpaceBeforeParens: Always\nIndentCaseLabels: false\nAllowShortIfStatementsOnASingleLine: true\nAllowSh"
  },
  {
    "path": "ChangeLog",
    "chars": 7233,
    "preview": "2018-11-02  Vladimir Makarov  <vmakarov@gcc.gnu.org>\n\n\t* README.md: Add update about mum-prng.  Correct typo for\n\txoshir"
  },
  {
    "path": "README.md",
    "chars": 35101,
    "preview": "# **Update (Nov. 28, 2025): Implemented collision attack prevention in VMUM and MUM-V3**\n* The attack is described in Is"
  },
  {
    "path": "benchmarks/City.cpp",
    "chars": 15200,
    "preview": "// Copyright (c) 2011 Google, Inc.\n//\n// Permission is hereby granted, free of charge, to any person obtaining a copy\n//"
  },
  {
    "path": "benchmarks/City.h",
    "chars": 4495,
    "preview": "// Copyright (c) 2011 Google, Inc.\n//\n// Permission is hereby granted, free of charge, to any person obtaining a copy\n//"
  },
  {
    "path": "benchmarks/SpookyV2.cpp",
    "chars": 8529,
    "preview": "// Spooky Hash\n// A 128-bit noncryptographic hash, for checksums and table lookup\n// By Bob Jenkins.  Public domain.\n// "
  },
  {
    "path": "benchmarks/SpookyV2.h",
    "chars": 11893,
    "preview": "//\n// SpookyHash: a 128-bit noncryptographic hash function\n// By Bob Jenkins, public domain\n//   Oct 31 2010: alpha, fra"
  },
  {
    "path": "benchmarks/bbs-prng.h",
    "chars": 4702,
    "preview": "/* Copyright (c) 2016 Vladimir Makarov <vmakarov@gcc.gnu.org>\n\n   Permission is hereby granted, free of charge, to any p"
  },
  {
    "path": "benchmarks/bench-crypto.c",
    "chars": 13382,
    "preview": "#if defined(SHA2)\n\n#include \"sha512.h\"\nvoid sha512_test (const void *msg, int len, void *out) {\n  sha512_ctx ctx;\n  \n  r"
  },
  {
    "path": "benchmarks/bench-crypto.sh",
    "chars": 2647,
    "preview": "#!/bin/bash\n\n# Benchmarking different crypto hash functions.\n\nIFS='%'\ntemp=__temp\n\n\nprint() {\n    s=`grep -E 'user[ \t]*["
  },
  {
    "path": "benchmarks/bench-prng.c",
    "chars": 3985,
    "preview": "#define N1 100000\n#if defined(BBS)\n#include \"bbs-prng.h\"\n#define N2 2\nstatic void init_prng (void) { init_bbs_prng (); }"
  },
  {
    "path": "benchmarks/bench-prng.sh",
    "chars": 1376,
    "preview": "#!/bin/bash\n\n# Benchmarking different Pseudo Random Generators\n\necho +++pseudo random number generation speed '(PRNs/sec"
  },
  {
    "path": "benchmarks/bench.c",
    "chars": 4511,
    "preview": "#if defined(Spooky)\n\n#include \"SpookyV2.h\"\nstatic void SpookyHash64_test (const void *key, int len, uint32_t seed, void "
  },
  {
    "path": "benchmarks/bench.sh",
    "chars": 5100,
    "preview": "#!/bin/bash\n\n# Benchmarking different hash functions.\n\ntemp=__hash-temp.out\ntemp2=__hash-temp2.out\ntemp3=__hash-temp3.ou"
  },
  {
    "path": "benchmarks/blake2-config.h",
    "chars": 1390,
    "preview": "/*\n   BLAKE2 reference source code package - optimized C implementations\n\n   Copyright 2012, Samuel Neves <sneves@dei.uc"
  },
  {
    "path": "benchmarks/blake2-impl.h",
    "chars": 3564,
    "preview": "/*\n   BLAKE2 reference source code package - optimized C implementations\n  \n   Copyright 2012, Samuel Neves <sneves@dei."
  },
  {
    "path": "benchmarks/blake2.h",
    "chars": 5239,
    "preview": "/*\n   BLAKE2 reference source code package - reference C implementations\n  \n   Copyright 2012, Samuel Neves <sneves@dei."
  },
  {
    "path": "benchmarks/blake2b-load-sse2.h",
    "chars": 4903,
    "preview": "/*\n   BLAKE2 reference source code package - optimized C implementations\n  \n   Copyright 2012, Samuel Neves <sneves@dei."
  },
  {
    "path": "benchmarks/blake2b-load-sse41.h",
    "chars": 6658,
    "preview": "/*\n   BLAKE2 reference source code package - optimized C implementations\n  \n   Copyright 2012, Samuel Neves <sneves@dei."
  },
  {
    "path": "benchmarks/blake2b-round.h",
    "chars": 5013,
    "preview": "/*\n   BLAKE2 reference source code package - optimized C implementations\n  \n   Copyright 2012, Samuel Neves <sneves@dei."
  },
  {
    "path": "benchmarks/blake2b.c",
    "chars": 11861,
    "preview": "/*\n   BLAKE2 reference source code package - optimized C implementations\n  \n   Copyright 2012, Samuel Neves <sneves@dei."
  },
  {
    "path": "benchmarks/byte_order.c",
    "chars": 5600,
    "preview": "/* byte_order.c - byte order related platform dependent routines,\n *\n * Copyright: 2008-2012 Aleksey Kravchenko <rhash.a"
  },
  {
    "path": "benchmarks/byte_order.h",
    "chars": 6196,
    "preview": "/* byte_order.h */\n#ifndef BYTE_ORDER_H\n#define BYTE_ORDER_H\n#include \"ustd.h\"\n#include <stdlib.h>\n\n#ifdef IN_RHASH\n#inc"
  },
  {
    "path": "benchmarks/chacha-prng.h",
    "chars": 5916,
    "preview": "/* Copyright (c) 2016 Vladimir Makarov <vmakarov@gcc.gnu.org>\n\n   Permission is hereby granted, free of charge, to any p"
  },
  {
    "path": "benchmarks/gen-table.rb",
    "chars": 1252,
    "preview": "#!/usr/bin/ruby\n# Take stdin and output the table\nrows = []\ncols = []\ntab = {}\ncur = \"\"\nn = 0\nSTDIN.each_line do |line|\n"
  },
  {
    "path": "benchmarks/meow_hash.h",
    "chars": 9532,
    "preview": "/* ========================================================================\n\n   Meow - A Fast Non-cryptographic Hash\n   "
  },
  {
    "path": "benchmarks/meow_intrinsics.h",
    "chars": 7722,
    "preview": "/* ========================================================================\n\n   meow_intrinsics.h\n   (C) Copyright 2018 "
  },
  {
    "path": "benchmarks/metrohash64.cpp",
    "chars": 12817,
    "preview": "// metrohash64.cpp\n//\n// Copyright 2015-2018 J. Andrew Rogers\n//\n// Licensed under the Apache License, Version 2.0 (the "
  },
  {
    "path": "benchmarks/metrohash64.h",
    "chars": 2638,
    "preview": "// metrohash64.h\n//\n// Copyright 2015-2018 J. Andrew Rogers\n//\n// Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "benchmarks/mum512-prng.h",
    "chars": 3529,
    "preview": "/* Copyright (c) 2016 Vladimir Makarov <vmakarov@gcc.gnu.org>\n\n   Permission is hereby granted, free of charge, to any p"
  },
  {
    "path": "benchmarks/platform.h",
    "chars": 2056,
    "preview": "// platform.h\n//\n// The MIT License (MIT)\n//\n// Copyright (c) 2015 J. Andrew Rogers\n//\n// Permission is hereby granted, "
  },
  {
    "path": "benchmarks/rapidhash.h",
    "chars": 21261,
    "preview": "/*\r\n * rapidhash V3 - Very fast, high quality, platform-independent hashing algorithm.\r\n *\r\n * Based on 'wyhash', by Wan"
  },
  {
    "path": "benchmarks/sha3.c",
    "chars": 10021,
    "preview": "/* sha3.c - an implementation of Secure Hash Algorithm 3 (Keccak).\n * based on the\n * The Keccak SHA-3 submission. Submi"
  },
  {
    "path": "benchmarks/sha3.h",
    "chars": 1461,
    "preview": "/* sha3.h */\n#ifndef RHASH_SHA3_H\n#define RHASH_SHA3_H\n#include \"ustd.h\"\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n#defin"
  },
  {
    "path": "benchmarks/sha512.c",
    "chars": 9960,
    "preview": "/* sha512.c - an implementation of SHA-384/512 hash functions\n * based on FIPS 180-3 (Federal Information Processing Sta"
  },
  {
    "path": "benchmarks/sha512.h",
    "chars": 1355,
    "preview": "/* sha.h sha512 and sha384 hash functions */\n#ifndef SHA512_H\n#define SHA512_H\n\n#if _MSC_VER >= 1300\n\n# define int64_t _"
  },
  {
    "path": "benchmarks/sip24-prng.h",
    "chars": 5235,
    "preview": "/* Copyright (c) 2016 Vladimir Makarov <vmakarov@gcc.gnu.org>\n\n   Permission is hereby granted, free of charge, to any p"
  },
  {
    "path": "benchmarks/siphash24.c",
    "chars": 5400,
    "preview": "/*\n   SipHash reference C implementation\n\n   Copyright (c) 2012-2014 Jean-Philippe Aumasson\n   <jeanphilippe.aumasson@gm"
  },
  {
    "path": "benchmarks/splitmix64.c",
    "chars": 1091,
    "preview": "/*  Written in 2015 by Sebastiano Vigna (vigna@acm.org)\n\nTo the extent possible under law, the author has dedicated all "
  },
  {
    "path": "benchmarks/t1ha/src/t1ha0.c",
    "chars": 16821,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha0_ia32aes_a.h",
    "chars": 5784,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha0_ia32aes_avx.c",
    "chars": 124,
    "preview": "#ifndef T1HA0_DISABLED\n#define T1HA_IA32AES_NAME t1ha0_ia32aes_avx\n#include \"t1ha0_ia32aes_a.h\"\n#endif /* T1HA0_DISABLED"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha0_ia32aes_avx2.c",
    "chars": 125,
    "preview": "#ifndef T1HA0_DISABLED\n#define T1HA_IA32AES_NAME t1ha0_ia32aes_avx2\n#include \"t1ha0_ia32aes_b.h\"\n#endif /* T1HA0_DISABLE"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha0_ia32aes_b.h",
    "chars": 5231,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha0_ia32aes_noavx.c",
    "chars": 126,
    "preview": "#ifndef T1HA0_DISABLED\n#define T1HA_IA32AES_NAME t1ha0_ia32aes_noavx\n#include \"t1ha0_ia32aes_a.h\"\n#endif /* T1HA0_DISABL"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha0_selfcheck.c",
    "chars": 10479,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha1.c",
    "chars": 8473,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha1_selfcheck.c",
    "chars": 5715,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha2.c",
    "chars": 18615,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha2_selfcheck.c",
    "chars": 10131,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha_bits.h",
    "chars": 41604,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha_selfcheck.c",
    "chars": 3772,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha_selfcheck.h",
    "chars": 2990,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/src/t1ha_selfcheck_all.c",
    "chars": 2230,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/t1ha/t1ha.h",
    "chars": 27272,
    "preview": "/*\n *  Copyright (c) 2016-2020 Positive Technologies, https://www.ptsecurity.com,\n *  Fast Positive Hash.\n *\n *  Portion"
  },
  {
    "path": "benchmarks/ustd.h",
    "chars": 713,
    "preview": "/* ustd.h common macros and includes */\n#ifndef LIBRHASH_USTD_H\n#define LIBRHASH_USTD_H\n\n#if _MSC_VER >= 1300\n\n# define "
  },
  {
    "path": "benchmarks/xoroshiro128plus.c",
    "chars": 2242,
    "preview": "/*  Written in 2016 by David Blackman and Sebastiano Vigna (vigna@acm.org)\n\nTo the extent possible under law, the author"
  },
  {
    "path": "benchmarks/xoroshiro128starstar.c",
    "chars": 2393,
    "preview": "/*  Written in 2018 by David Blackman and Sebastiano Vigna (vigna@acm.org)\n\nTo the extent possible under law, the author"
  },
  {
    "path": "benchmarks/xoseed.c",
    "chars": 218,
    "preview": "#include <stdlib.h>\n#include <stdio.h>\n#include \"splitmix64.c\"\n\nvoid main (int argc, char *argv[]) {\n  int n = atoi(argv"
  },
  {
    "path": "benchmarks/xoshiro256plus.c",
    "chars": 2886,
    "preview": "/*  Written in 2018 by David Blackman and Sebastiano Vigna (vigna@acm.org)\n\nTo the extent possible under law, the author"
  },
  {
    "path": "benchmarks/xoshiro256starstar.c",
    "chars": 2616,
    "preview": "/*  Written in 2018 by David Blackman and Sebastiano Vigna (vigna@acm.org)\n\nTo the extent possible under law, the author"
  },
  {
    "path": "benchmarks/xoshiro512plus.c",
    "chars": 2251,
    "preview": "/*  Written in 2018 by David Blackman and Sebastiano Vigna (vigna@acm.org)\n\nTo the extent possible under law, the author"
  },
  {
    "path": "benchmarks/xoshiro512starstar.c",
    "chars": 1988,
    "preview": "/*  Written in 2018 by David Blackman and Sebastiano Vigna (vigna@acm.org)\n\nTo the extent possible under law, the author"
  },
  {
    "path": "benchmarks/xxh3.h",
    "chars": 2394,
    "preview": "/*\n * xxHash - Extremely Fast Hash algorithm\n * Development source file for `xxh3`\n * Copyright (C) 2019-2021 Yann Colle"
  },
  {
    "path": "benchmarks/xxhash.c",
    "chars": 1855,
    "preview": "/*\n * xxHash - Extremely Fast Hash algorithm\n * Copyright (C) 2012-2023 Yann Collet\n *\n * BSD 2-Clause License (https://"
  },
  {
    "path": "benchmarks/xxhash.h",
    "chars": 268319,
    "preview": "/*\n * xxHash - Extremely Fast Hash algorithm\n * Header File\n * Copyright (C) 2012-2023 Yann Collet\n *\n * BSD 2-Clause Li"
  },
  {
    "path": "mum-prng.h",
    "chars": 4649,
    "preview": "/* Copyright (c) 2016, 2017, 2018\n   Vladimir Makarov <vmakarov@gcc.gnu.org>\n\n   Permission is hereby granted, free of c"
  },
  {
    "path": "mum.h",
    "chars": 14417,
    "preview": "/* Copyright (c) 2016-2025\n   Vladimir Makarov <vmakarov@gcc.gnu.org>\n\n   Permission is hereby granted, free of charge, "
  },
  {
    "path": "mum512.h",
    "chars": 18708,
    "preview": "/* Copyright (c) 2016, 2017, 2018\n   Vladimir Makarov <vmakarov@gcc.gnu.org>\n\n   Permission is hereby granted, free of c"
  },
  {
    "path": "vmum.h",
    "chars": 23242,
    "preview": "/* Copyright (c) 2025\n   Vladimir Makarov <vmakarov@gcc.gnu.org>\n\n   Permission is hereby granted, free of charge, to an"
  }
]

About this extraction

This page contains the full source code of the vnmakarov/mum-hash GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 70 files (760.6 KB), approximately 255.0k tokens, and a symbol index with 654 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
