[
  {
    "path": "LICENSE",
    "content": "This is free and unencumbered software released into the public domain.\n\nAnyone is free to copy, modify, publish, use, compile, sell, or\ndistribute this software, either in source code form or as a compiled\nbinary, for any purpose, commercial or non-commercial, and by any\nmeans.\n\nIn jurisdictions that recognize copyright laws, the author or authors\nof this software dedicate any and all copyright interest in the\nsoftware to the public domain. We make this dedication for the benefit\nof the public at large and to the detriment of our heirs and\nsuccessors. We intend this dedication to be an overt act of\nrelinquishment in perpetuity of all present and future rights to this\nsoftware under copyright law.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\nEXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\nMERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.\nIN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR\nOTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,\nARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR\nOTHER DEALINGS IN THE SOFTWARE.\n\nFor more information, please refer to <http://unlicense.org>\n"
  },
  {
    "path": "README.md",
    "content": "Intro\n-----\n\nThis document describes a stable adaptive hybrid bucket / quick / merge / drop sort named wolfsort.\nThe bucket sort, forming the core of wolfsort, is not a comparison sort, so wolfsort can be considered\na member of the radix-sort family. Quicksort and mergesort are well known. Dropsort gained popularity\nafter it was reinvented as Stalin sort. A [benchmark](https://github.com/scandum/wolfsort#benchmark-for-wolfsort-v1154-dripsort) is available at the bottom.\n\nWhy a hybrid?\n-------------\nWhile an adaptive merge sort is very fast at sorting ordered data, its inability to effectively\npartition is its greatest weakness. A radix-like bucket sort, on the other hand, is unable to take advantage of\nsorted data. While quicksort is fast at partitioning, a bucket sort is faster on medium-sized\narrays in the 1K - 1M element range. Dropsort in turn hybridizes surprisingly well with bucket\nand sample sorts.\n\nHistory\n-------\nWolfsort 1, codename: quantumsort, started out with the concept that memory is in abundance on\nmodern systems. I theorized that by allocating 8n memory performance could be increased by allowing\na bucket sort to partition in one pass.\n\nNot all the memory would be used or ever accessed however, which is why I envisioned it as a type\nof poor-man's quantum computing. The extra memory only serves to simplify computations. The concept\nkind of worked, except that large memory allocations in C can be either very fast or very slow. I\ndidn't investigate why.\n\nI also learned people don't like it when you use the term quantum computing outside of the proper\ncontext, or perhaps they were upset about wolfsort's voracious appetite for memory. Hence it was named.\n\nWolfsort 2, codename: flowsort, is when I reinvented counting sort. 
Instead of making one pass and\nusing extra memory to deal with fluctuations in the data, flowsort makes one pass to calculate the\nbucket sizes, then makes a second pass to neatly fill the buckets.\n\nWolfsort 3, codename: dripsort, was inspired by the work of M. Lochbaum on [rhsort](https://github.com/mlochbaum/rhsort)\nto use a method similar to dropsort to deal with bucket overflow, and to calculate the minimum and\nmaximum value to optimize for distributions with a small range of values. Dripsort once again makes\none pass and uses around 4n memory to deal with fluctuations in the data. Compared to v1 this is a\n50% reduction in memory allocation, while at the same time significantly increasing robustness.\n\nAnalyzer\n--------\nWolfsort uses the same analyzer as [fluxsort](https://github.com/scandum/fluxsort) to sort fully\nin-order and fully reverse-order distributions in n comparisons. The array is split into 4 segments\nfor which a measure of presortedness is calculated. Mostly ordered segments are sorted with\n[quadsort](https://github.com/scandum/quadsort), while mostly random segments are sorted with wolfsort.\n\nIn addition, the minimum and maximum values in the distribution are obtained.\n\nSetting the bucket size\n-----------------------\nFor optimal performance wolfsort needs at least 8 buckets and between 1 and 16 elements\nper bucket, so the bucket size is set to hold 8 elements on average. However, the buckets should remain\nin the L1 cache, so the maximum number of buckets is set at 65536.\n\nThis sets the optimal range for wolfsort between 8 * 8 (64) and 8 * 65536 (524,288) elements. Beyond\nthe optimal range performance will degrade steadily. Once the average bucket size reaches the threshold\nof 18 elements (1,179,648 total elements) the sort becomes less optimal than quicksort, though it retains\na computational advantage for a little while longer. 
However, by recursing once, wolfsort increases the\noptimal range to 1 trillion elements.\n\nBy computing the minimum and maximum values in the data distribution, the number of buckets is optimized\nfurther to target the sweet spot.\n\nDropsort\n--------\nDropsort was first proposed as an alternative sorting algorithm by David Morgan in 2006; it makes one pass\nand is lossy. The algorithm was reinvented in 2018 as Stalin sort. The concept of dropping hash entries in\na non-lossy manner was independently developed by Marshall Lochbaum in 2018 and is utilized in his 2022\nrelease of rhsort (Robin Hood Sort).\n\nWolfsort allocates 4n memory to allow some deviancy in the data distribution and minimize bucket overflow.\nIf an element is too deviant and overflows its bucket, it is copied in-place to the input\narray. In near-optimal cases this results in a minimal drip; in the worst case it will result in a downpour\nof elements being copied to the input array.\n\nWhile a centrally planned partitioning system has its weaknesses, the worst case is mostly alleviated by using\nfluxsort on the deviant elements once partitioning finishes. Fluxsort is adaptive and is generally\nstrong against distributions where wolfsort is weak.\n\nThe overall performance gain from incorporating dropsort into wolfsort is approximately 20%, but can reach\nan order of magnitude when the fallback is synergetic with fluxsort. Deviant distributions can deceive\nwolfsort for a time, but not a very long time.\n\nSmall number sorting\n--------------------\nSince wolfsort uses auxiliary memory, each partition is stable once partitioning completes. The next\nstep is to sort the content of each bucket using fluxsort. 
If the number of elements in a bucket is\nbelow 32, fluxsort defaults to quadsort, which is highly optimized for sorting small arrays using a\ncombination of branchless parity merges and twice-unguarded insertion.\n\nOnce each bucket is sorted, all that remains is merging the two distributions of compliant and deviant\nelements, and wolfsort is finished.\n\nMemory overhead\n---------------\nWolfsort requires 4n memory for the partitioning process and n / 4 memory (up to a maximum of 65536)\nfor the buckets.\n\nIf not enough memory is available wolfsort falls back on fluxsort, which requires exactly 1n swap memory,\nand if that's not sufficient fluxsort falls back on quadsort, which can sort in-place. It is an\noption to fall back on blitsort instead of quadsort, but since this would be an atypical case,\nand would increase dependencies, I didn't implement this.\n\n64 bit integers\n---------------\nWith the advent of fluxsort and crumsort, radix sorts no longer dominate 64 bit territory. Increased memory-level parallelism in future hardware, or algorithmic optimizations, might make radix sorts competitive again for 64 bit types. Wolfsort has a commented-out default to fluxsort.\n\n128 bit floats\n--------------\nWolfsort defaults to fluxsort for 128 bit floats. Keep in mind that in the real world you'll typically be sorting tables instead of arrays, so the benchmark isn't indicative of real world performance, as the sort will likely be copying 64 bit pointers instead of 128 bit floats.\n\nGod Mode\n--------\nWolfsort supports a cheat mode where the sort becomes unstable. This trick was taken from rhsort. Since wolfsort aspires to have some utility as a stable sort, this method is disabled by default, including in the benchmark.\n\nIn the benchmark rhsort does use this optimization, but it's only relevant for the random % 100 distribution. 
For 32 bit random integers rhsort easily beats wolfsort without an unfair advantage.\n\nLLVM\n----\nWhen compiling with Clang, quadsort and fluxsort will take advantage of branchless ternary operations, which gives a 15-30% performance gain. While not an algorithmic improvement, it's relevant to keep in mind, particularly when it comes to LLVM-compiled Rust sorts with similar optimizations.\n\nInterface\n---------\nWolfsort uses the same interface as qsort, which is described in [man qsort](https://man7.org/linux/man-pages/man3/qsort.3p.html).\n\nWolfsort also comes with the `wolfsort_prim(void *array, size_t nmemb, size_t size)` function to perform primitive comparisons on arrays of 32 and 64 bit integers. `nmemb` is the number of elements, while `size` should be either `sizeof(int)` or `sizeof(long long)` for signed integers, and `sizeof(int) + 1` or `sizeof(long long) + 1` for unsigned integers. Support for the char and short types can be easily added in wolfsort.h.\n\nWolfsort can only sort arrays of primitive integers by default. Wolfsort should be able to sort tables with some minor changes, but it'll require a different interface than qsort() provides.\n\nProof of concept\n----------------\nWolfsort is primarily a proof of concept for a hybrid bucket / comparison sort. It only supports non-negative integers.\n\nI'll briefly mention other sorting algorithms listed in the benchmark code / graphs. They can all be considered the fastest algorithms currently available in their particular class.\n\nBlitsort\n--------\n[Blitsort](https://github.com/scandum/blitsort) is a hybrid in-place stable adaptive rotate quick / merge sort.\n\nCrumsort\n--------\n[Crumsort](https://github.com/scandum/crumsort) is a hybrid in-place unstable adaptive quick / rotate merge sort.\n\nQuadsort\n--------\n[Quadsort](https://github.com/scandum/quadsort) is an adaptive mergesort. It supports rotations as a fall-back to sort in-place. 
It has very good performance when it comes to sorting tables and generally outperforms timsort.\n\nGridsort\n--------\n[Gridsort](https://github.com/scandum/gridsort) is a stable comparison sort which stores data in a 2 dimensional self-balancing grid. It has some interesting properties and was the fastest comparison sort for random data for a brief period of time.\n\nFluxsort\n--------\n[Fluxsort](https://github.com/scandum/fluxsort) is a hybrid stable branchless out-of-place quick / merge sort.\n\nPiposort\n--------\n[Piposort](https://github.com/scandum/piposort) is a simplified branchless quadsort with a much smaller code size and complexity while still being very fast. Piposort might be of use to people who want to port quadsort. This is a lot easier when you start out small.\n\nrhsort\n------\n[rhsort](https://github.com/mlochbaum/rhsort) is a hybrid stable out-of-place counting / radix / drop / insertion sort. It has exceptional performance on random and generic data for medium array sizes.\n\nSka sort\n--------\n[Ska sort](https://github.com/skarupke/ska_sort) is an advanced radix sort that can sort strings and floats as well. 
It offers both an in-place and out-of-place version, but since the in-place unstable version is not very competitive with wolfsort, I only benchmark the stable and faster ska_sort_copy variant.\n\nBig O\n-----\n```\n                 ┌───────────────────────┐┌────────────────────┐\n                 │comparisons            ││swap memory         │\n┌───────────────┐├───────┬───────┬───────┤├──────┬──────┬──────┤┌──────┐┌─────────┐┌─────────┐┌─────────┐\n│name           ││min    │avg    │max    ││min   │avg   │max   ││stable││partition││adaptive ││compares │\n├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤\n│blitsort       ││n      │n log n│n log n││1     │1     │1     ││yes   ││yes      ││yes      ││yes      │\n├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤\n│crumsort       ││n      │n log n│n log n││1     │1     │1     ││no    ││yes      ││yes      ││yes      │\n├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤\n│fluxsort       ││n      │n log n│n log n││n     │n     │n     ││yes   ││yes      ││yes      ││yes      │\n├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤\n│gridsort       ││n      │n log n│n log n││n     │n     │n     ││yes   ││yes      ││yes      ││yes      │\n├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤\n│quadsort       ││n      │n log n│n log n││1     │n     │n     ││yes   ││no       ││yes      ││yes      │\n├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤\n│wolfsort       ││n      │n log n│n log n││n     │n     │n     ││yes   ││yes      ││yes      ││hybrid   │\n├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤\n│rhsort         ││n      │n log n│n log 
n││n     │n     │n     ││yes   ││yes      ││semi     ││hybrid   │\n├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤\n│skasort_copy   ││n k    │n k    │n k    ││n     │n     │n     ││yes   ││yes      ││no       ││no       │\n└───────────────┘└───────┴───────┴───────┘└──────┴──────┴──────┘└──────┘└─────────┘└─────────┘└─────────┘\n```\n\nBenchmark for Wolfsort v1.2.1.3\n-------------------------------\n\nrhsort vs wolfsort vs ska_sort_copy on 100K elements\n----------------------------------------------------\nThe following benchmark was on WSL gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1) on 100,000 32 bit integers.\nThe source code was compiled using g++ -O3 -fpermissive bench.c. All comparisons are inlined through the cmp macro.\nA table with the best and average time in seconds can be uncollapsed below the bar graph.\n\n![Graph](/images/radix1.png)\n\n<details><summary><b>data table</b></summary>\n\n|      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  wolfsort |   100000 |   64 | 0.003006 | 0.003063 |         0 |     100 |     random order |\n|   skasort |   100000 |   64 | 0.001818 | 0.001842 |         0 |     100 |     random order |\n\n|      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|    rhsort |   100000 |   32 | 0.000706 | 0.000729 |         0 |     100 |     random order |\n|  wolfsort |   100000 |   32 | 0.001000 | 0.001026 |         0 |     100 |     random order |\n|   skasort |   100000 |   32 | 0.000626 | 0.000640 |         0 |     100 |     random order |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000115 | 0.000118 |         
0 |     100 |     random % 100 |\n|  wolfsort |   100000 |   32 | 0.000376 | 0.000382 |         0 |     100 |     random % 100 |\n|   skasort |   100000 |   32 | 0.000780 | 0.000793 |         0 |     100 |     random % 100 |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000302 | 0.000317 |         0 |     100 |  ascending order |\n|  wolfsort |   100000 |   32 | 0.000086 | 0.000088 |         0 |     100 |  ascending order |\n|   skasort |   100000 |   32 | 0.000709 | 0.000720 |         0 |     100 |  ascending order |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000615 | 0.000633 |         0 |     100 |    ascending saw |\n|  wolfsort |   100000 |   32 | 0.000379 | 0.000407 |         0 |     100 |    ascending saw |\n|   skasort |   100000 |   32 | 0.000624 | 0.000637 |         0 |     100 |    ascending saw |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000591 | 0.000615 |         0 |     100 |       pipe organ |\n|  wolfsort |   100000 |   32 | 0.000248 | 0.000258 |         0 |     100 |       pipe organ |\n|   skasort |   100000 |   32 | 0.000624 | 0.000639 |         0 |     100 |       pipe organ |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000400 | 0.000420 |         0 |     100 | descending order |\n|  wolfsort |   100000 |   32 | 0.000097 | 0.000101 |         0 |     100 | descending order |\n|   skasort |   100000 |   32 | 0.000684 | 0.000693 |         0 |     100 | descending order |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000612 | 0.000629 |         0 |     100 |   descending saw |\n|  wolfsort |   100000 |   32 | 0.000389 | 
0.000393 |         0 |     100 |   descending saw |\n|   skasort |   100000 |   32 | 0.000627 | 0.000639 |         0 |     100 |   descending saw |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000633 | 0.000664 |         0 |     100 |      random tail |\n|  wolfsort |   100000 |   32 | 0.000467 | 0.000473 |         0 |     100 |      random tail |\n|   skasort |   100000 |   32 | 0.000622 | 0.000636 |         0 |     100 |      random tail |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000671 | 0.000685 |         0 |     100 |      random half |\n|  wolfsort |   100000 |   32 | 0.000689 | 0.000706 |         0 |     100 |      random half |\n|   skasort |   100000 |   32 | 0.000628 | 0.000641 |         0 |     100 |      random half |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.002019 | 0.002052 |         0 |     100 |  ascending tiles |\n|  wolfsort |   100000 |   32 | 0.000683 | 0.000691 |         0 |     100 |  ascending tiles |\n|   skasort |   100000 |   32 | 0.001096 | 0.001113 |         0 |     100 |  ascending tiles |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000837 | 0.000871 |         0 |     100 |     bit reversal |\n|  wolfsort |   100000 |   32 | 0.000887 | 0.000928 |         0 |     100 |     bit reversal |\n|   skasort |   100000 |   32 | 0.000775 | 0.000782 |         0 |     100 |     bit reversal |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000118 | 0.000123 |         0 |     100 |       random % 4 |\n|  wolfsort |   100000 |   32 | 0.000368 | 0.000371 |         0 |     100 |       random % 4 |\n|   skasort |   100000 |   
32 | 0.000785 | 0.000809 |         0 |     100 |       random % 4 |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.001278 | 0.001465 |         0 |     100 |      semi random |\n|  wolfsort |   100000 |   32 | 0.000792 | 0.000811 |         0 |     100 |      semi random |\n|   skasort |   100000 |   32 | 0.000805 | 0.000821 |         0 |     100 |      semi random |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.000198 | 0.000202 |         0 |     100 |    random signal |\n|  wolfsort |   100000 |   32 | 0.000815 | 0.000829 |         0 |     100 |    random signal |\n|   skasort |   100000 |   32 | 0.001099 | 0.001118 |         0 |     100 |    random signal |\n\n</details>\n\nThe following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04).\nThe source code was compiled using `g++ -O3 -w -fpermissive bench.c`. It measures the performance on random data with array sizes\nranging from 10 to 10,000,000. It's generated by running the benchmark using 10000000 0 0 as the argument. The benchmark is weighted, meaning the number of repetitions\nhalves each time the number of items doubles. 
A table with the best and average time in seconds can be uncollapsed below the bar graph.\n\n![Graph](/images/radix2.png)\n\n<details><summary><b>data table</b></summary>\n\n|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|    rhsort |       10 |   32 | 0.135095 | 0.137011 |       0.0 |      10 |        random 10 |\n|  wolfsort |       10 |   32 | 0.052087 | 0.052986 |       0.0 |      10 |        random 10 |\n|   skasort |       10 |   32 | 0.099853 | 0.100198 |       0.0 |      10 |        random 10 |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |      100 |   32 | 0.069252 | 0.070421 |       0.0 |      10 |       random 100 |\n|  wolfsort |      100 |   32 | 0.132208 | 0.132824 |       0.0 |      10 |       random 100 |\n|   skasort |      100 |   32 | 0.232007 | 0.232507 |       0.0 |      10 |       random 100 |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |     1000 |   32 | 0.055916 | 0.056130 |       0.0 |      10 |      random 1000 |\n|  wolfsort |     1000 |   32 | 0.101611 | 0.101913 |       0.0 |      10 |      random 1000 |\n|   skasort |     1000 |   32 | 0.054757 | 0.055050 |       0.0 |      10 |      random 1000 |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |    10000 |   32 | 0.057062 | 0.057359 |       0.0 |      10 |     random 10000 |\n|  wolfsort |    10000 |   32 | 0.118598 | 0.119373 |       0.0 |      10 |     random 10000 |\n|   skasort |    10000 |   32 | 0.059786 | 0.060189 |       0.0 |      10 |     random 10000 |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |   100000 |   32 | 0.071273 | 0.073310 |       0.0 |      10 |    random 100000 |\n| 
 wolfsort |   100000 |   32 | 0.102639 | 0.103917 |       0.0 |      10 |    random 100000 |\n|   skasort |   100000 |   32 | 0.064120 | 0.064615 |       0.0 |      10 |    random 100000 |\n|           |          |      |          |          |           |         |                  |\n|    rhsort |  1000000 |   32 | 0.181059 | 0.187563 |       0.0 |      10 |   random 1000000 |\n|  wolfsort |  1000000 |   32 | 0.146630 | 0.147598 |       0.0 |      10 |   random 1000000 |\n|   skasort |  1000000 |   32 | 0.070250 | 0.071571 |       0.0 |      10 |   random 1000000 |\n|           |          |      |          |          |           |         |                  |\n|    rhsort | 10000000 |   32 | 0.412107 | 0.425066 |         0 |      10 |  random 10000000 |\n|  wolfsort | 10000000 |   32 | 0.193120 | 0.200947 |         0 |      10 |  random 10000000 |\n|   skasort | 10000000 |   32 | 0.115721 | 0.116621 |         0 |      10 |  random 10000000 |\n\n</details>\n\nBenchmark for Wolfsort v1.2.1.3\n-------------------------------\n\nfluxsort vs gridsort vs quadsort vs wolfsort on 100K elements\n-------------------------------------------------------------\nThe following benchmark was on WSL gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1).\nThe source code was compiled using g++ -O3 -fpermissive bench.c. 
All comparisons are inlined through the cmp macro.\nA table with the best and average time in seconds can be uncollapsed below the bar graph.\n\n![Graph](/images/graph1.png)\n\n<details><summary><b>data table</b></summary>\n\n|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  fluxsort |   100000 |  128 | 0.008328 | 0.008424 |         0 |     100 |     random order |\n|  gridsort |   100000 |  128 | 0.007823 | 0.007932 |         0 |     100 |     random order |\n|  quadsort |   100000 |  128 | 0.008260 | 0.008353 |         0 |     100 |     random order |\n|  wolfsort |   100000 |  128 | 0.008330 | 0.008415 |         0 |     100 |     random order |\n\n|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  fluxsort |   100000 |   64 | 0.001971 | 0.001991 |         0 |     100 |     random order |\n|  gridsort |   100000 |   64 | 0.002370 | 0.002398 |         0 |     100 |     random order |\n|  quadsort |   100000 |   64 | 0.002230 | 0.002254 |         0 |     100 |     random order |\n|  wolfsort |   100000 |   64 | 0.003023 | 0.003068 |         0 |     100 |     random order |\n\n|      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  fluxsort |   100000 |   32 | 0.001868 | 0.001901 |         0 |     100 |     random order |\n|  gridsort |   100000 |   32 | 0.002324 | 0.002357 |         0 |     100 |     random order |\n|  quadsort |   100000 |   32 | 0.002149 | 0.002174 |         0 |     100 |     random order |\n|  wolfsort |   100000 |   32 | 0.000988 | 0.001019 |         0 |     100 |     random order |\n|           |          |      |          
|          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.000733 | 0.000740 |         0 |     100 |     random % 100 |\n|  gridsort |   100000 |   32 | 0.001921 | 0.001941 |         0 |     100 |     random % 100 |\n|  quadsort |   100000 |   32 | 0.001627 | 0.001645 |         0 |     100 |     random % 100 |\n|  wolfsort |   100000 |   32 | 0.000374 | 0.000378 |         0 |     100 |     random % 100 |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.000043 | 0.000044 |         0 |     100 |  ascending order |\n|  gridsort |   100000 |   32 | 0.000264 | 0.000271 |         0 |     100 |  ascending order |\n|  quadsort |   100000 |   32 | 0.000052 | 0.000053 |         0 |     100 |  ascending order |\n|  wolfsort |   100000 |   32 | 0.000087 | 0.000089 |         0 |     100 |  ascending order |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.000305 | 0.000314 |         0 |     100 |    ascending saw |\n|  gridsort |   100000 |   32 | 0.000621 | 0.000641 |         0 |     100 |    ascending saw |\n|  quadsort |   100000 |   32 | 0.000411 | 0.000417 |         0 |     100 |    ascending saw |\n|  wolfsort |   100000 |   32 | 0.000379 | 0.000384 |         0 |     100 |    ascending saw |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.000193 | 0.000203 |         0 |     100 |       pipe organ |\n|  gridsort |   100000 |   32 | 0.000446 | 0.000486 |         0 |     100 |       pipe organ |\n|  quadsort |   100000 |   32 | 0.000252 | 0.000260 |         0 |     100 |       pipe organ |\n|  wolfsort |   100000 |   32 | 0.000248 | 0.000259 |         0 |     100 |       pipe organ |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 | 
  32 | 0.000054 | 0.000055 |         0 |     100 | descending order |\n|  gridsort |   100000 |   32 | 0.000284 | 0.000295 |         0 |     100 | descending order |\n|  quadsort |   100000 |   32 | 0.000068 | 0.000070 |         0 |     100 | descending order |\n|  wolfsort |   100000 |   32 | 0.000097 | 0.000100 |         0 |     100 | descending order |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.000315 | 0.000325 |         0 |     100 |   descending saw |\n|  gridsort |   100000 |   32 | 0.000652 | 0.000667 |         0 |     100 |   descending saw |\n|  quadsort |   100000 |   32 | 0.000440 | 0.000446 |         0 |     100 |   descending saw |\n|  wolfsort |   100000 |   32 | 0.000389 | 0.000393 |         0 |     100 |   descending saw |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.000607 | 0.000619 |         0 |     100 |      random tail |\n|  gridsort |   100000 |   32 | 0.000847 | 0.000860 |         0 |     100 |      random tail |\n|  quadsort |   100000 |   32 | 0.000685 | 0.000694 |         0 |     100 |      random tail |\n|  wolfsort |   100000 |   32 | 0.000464 | 0.000471 |         0 |     100 |      random tail |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.001074 | 0.001081 |         0 |     100 |      random half |\n|  gridsort |   100000 |   32 | 0.001332 | 0.001355 |         0 |     100 |      random half |\n|  quadsort |   100000 |   32 | 0.001230 | 0.001243 |         0 |     100 |      random half |\n|  wolfsort |   100000 |   32 | 0.000686 | 0.000696 |         0 |     100 |      random half |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.000317 | 0.000324 |         0 |     100 |  ascending tiles |\n|  
gridsort |   100000 |   32 | 0.000665 | 0.000693 |         0 |     100 |  ascending tiles |\n|  quadsort |   100000 |   32 | 0.000789 | 0.000802 |         0 |     100 |  ascending tiles |\n|  wolfsort |   100000 |   32 | 0.000686 | 0.000693 |         0 |     100 |  ascending tiles |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.001714 | 0.001751 |         0 |     100 |     bit reversal |\n|  gridsort |   100000 |   32 | 0.002045 | 0.002060 |         0 |     100 |     bit reversal |\n|  quadsort |   100000 |   32 | 0.002083 | 0.002100 |         0 |     100 |     bit reversal |\n|  wolfsort |   100000 |   32 | 0.000888 | 0.000912 |         0 |     100 |     bit reversal |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.000215 | 0.000223 |         0 |     100 |       random % 4 |\n|  gridsort |   100000 |   32 | 0.001283 | 0.001305 |         0 |     100 |       random % 4 |\n|  quadsort |   100000 |   32 | 0.001080 | 0.001090 |         0 |     100 |       random % 4 |\n|  wolfsort |   100000 |   32 | 0.000369 | 0.000371 |         0 |     100 |       random % 4 |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.001072 | 0.001098 |         0 |     100 |      semi random |\n|  gridsort |   100000 |   32 | 0.001355 | 0.001377 |         0 |     100 |      semi random |\n|  quadsort |   100000 |   32 | 0.001062 | 0.001074 |         0 |     100 |      semi random |\n|  wolfsort |   100000 |   32 | 0.000789 | 0.000803 |         0 |     100 |      semi random |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort |   100000 |   32 | 0.001079 | 0.001099 |         0 |     100 |    random signal |\n|  gridsort |   100000 |   32 | 0.001296 | 0.001306 |         0 |     100 |    
random signal |\n|  quadsort |   100000 |   32 | 0.001014 | 0.001027 |         0 |     100 |    random signal |\n|  wolfsort |   100000 |   32 | 0.000816 | 0.000828 |         0 |     100 |    random signal |\n\n</details>\n\nfluxsort vs gridsort vs quadsort vs wolfsort on 10M elements\n------------------------------------------------------------\n\n![Graph](/images/graph2.png)\n<details><summary><b>data table</b></summary>\n\n|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  fluxsort | 10000000 |  128 | 1.242395 | 1.264809 |         0 |      10 |     random order |\n|  gridsort | 10000000 |  128 | 1.048748 | 1.110490 |         0 |      10 |     random order |\n|  quadsort | 10000000 |  128 | 1.407639 | 1.418088 |         0 |      10 |     random order |\n|  wolfsort | 10000000 |  128 | 1.239099 | 1.241608 |         0 |      10 |     random order |\n\n|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  fluxsort | 10000000 |   64 | 0.317327 | 0.318203 |         0 |      10 |     random order |\n|  gridsort | 10000000 |   64 | 0.332430 | 0.334392 |         0 |      10 |     random order |\n|  quadsort | 10000000 |   64 | 0.438257 | 0.439139 |         0 |      10 |     random order |\n|  wolfsort | 10000000 |   64 | 0.481977 | 0.484055 |         0 |      10 |     random order |\n\n|      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  fluxsort | 10000000 |   32 | 0.269351 | 0.271460 |         0 |      10 |     random order |\n|  gridsort | 10000000 |   32 | 0.322099 | 0.323899 |         0 |      10 |     random order |\n|  quadsort | 10000000 |   32 
| 0.364457 | 0.365617 |         0 |      10 |     random order |\n|  wolfsort | 10000000 |   32 | 0.189780 | 0.190911 |         0 |      10 |     random order |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.089973 | 0.090849 |         0 |      10 |     random % 100 |\n|  gridsort | 10000000 |   32 | 0.172222 | 0.173147 |         0 |      10 |     random % 100 |\n|  quadsort | 10000000 |   32 | 0.248361 | 0.250615 |         0 |      10 |     random % 100 |\n|  wolfsort | 10000000 |   32 | 0.086473 | 0.087067 |         0 |      10 |     random % 100 |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.006437 | 0.006574 |         0 |      10 |  ascending order |\n|  gridsort | 10000000 |   32 | 0.032321 | 0.032798 |         0 |      10 |  ascending order |\n|  quadsort | 10000000 |   32 | 0.011736 | 0.012125 |         0 |      10 |  ascending order |\n|  wolfsort | 10000000 |   32 | 0.010888 | 0.011015 |         0 |      10 |  ascending order |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.074940 | 0.075525 |         0 |      10 |    ascending saw |\n|  gridsort | 10000000 |   32 | 0.067478 | 0.067893 |         0 |      10 |    ascending saw |\n|  quadsort | 10000000 |   32 | 0.097133 | 0.098004 |         0 |      10 |    ascending saw |\n|  wolfsort | 10000000 |   32 | 0.081797 | 0.082794 |         0 |      10 |    ascending saw |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.064577 | 0.065593 |         0 |      10 |       pipe organ |\n|  gridsort | 10000000 |   32 | 0.048932 | 0.049336 |         0 |      10 |       pipe organ |\n|  quadsort | 10000000 |   32 | 0.082533 | 0.083781 |         0 |      10 |       pipe organ |\n|  wolfsort | 
10000000 |   32 | 0.070334 | 0.071158 |         0 |      10 |       pipe organ |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.009807 | 0.010104 |         0 |      10 | descending order |\n|  gridsort | 10000000 |   32 | 0.034583 | 0.034814 |         0 |      10 | descending order |\n|  quadsort | 10000000 |   32 | 0.011396 | 0.011639 |         0 |      10 | descending order |\n|  wolfsort | 10000000 |   32 | 0.014198 | 0.014544 |         0 |      10 | descending order |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.078279 | 0.079071 |         0 |      10 |   descending saw |\n|  gridsort | 10000000 |   32 | 0.069702 | 0.070109 |         0 |      10 |   descending saw |\n|  quadsort | 10000000 |   32 | 0.101826 | 0.102801 |         0 |      10 |   descending saw |\n|  wolfsort | 10000000 |   32 | 0.085101 | 0.085973 |         0 |      10 |   descending saw |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.121948 | 0.122561 |         0 |      10 |      random tail |\n|  gridsort | 10000000 |   32 | 0.109341 | 0.110117 |         0 |      10 |      random tail |\n|  quadsort | 10000000 |   32 | 0.153324 | 0.153797 |         0 |      10 |      random tail |\n|  wolfsort | 10000000 |   32 | 0.103558 | 0.104152 |         0 |      10 |      random tail |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.181347 | 0.183186 |         0 |      10 |      random half |\n|  gridsort | 10000000 |   32 | 0.185691 | 0.186592 |         0 |      10 |      random half |\n|  quadsort | 10000000 |   32 | 0.225265 | 0.225897 |         0 |      10 |      random half |\n|  wolfsort | 10000000 |   32 | 0.159819 | 0.160569 |         0 |      10 |      random half 
|\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.073673 | 0.074755 |         0 |      10 |  ascending tiles |\n|  gridsort | 10000000 |   32 | 0.126309 | 0.126626 |         0 |      10 |  ascending tiles |\n|  quadsort | 10000000 |   32 | 0.165332 | 0.167541 |         0 |      10 |  ascending tiles |\n|  wolfsort | 10000000 |   32 | 0.093424 | 0.094040 |         0 |      10 |  ascending tiles |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.271679 | 0.272589 |         0 |      10 |     bit reversal |\n|  gridsort | 10000000 |   32 | 0.296563 | 0.297984 |         0 |      10 |     bit reversal |\n|  quadsort | 10000000 |   32 | 0.369105 | 0.370652 |         0 |      10 |     bit reversal |\n|  wolfsort | 10000000 |   32 | 0.251209 | 0.252891 |         0 |      10 |     bit reversal |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.056011 | 0.056552 |         0 |      10 |       random % 4 |\n|  gridsort | 10000000 |   32 | 0.191179 | 0.194017 |         0 |      10 |       random % 4 |\n|  quadsort | 10000000 |   32 | 0.192466 | 0.193967 |         0 |      10 |       random % 4 |\n|  wolfsort | 10000000 |   32 | 0.081668 | 0.082543 |         0 |      10 |       random % 4 |\n|           |          |      |          |          |           |         |                  |\n|  fluxsort | 10000000 |   32 | 0.054231 | 0.054571 |         0 |      10 |      semi random |\n|  gridsort | 10000000 |   32 | 0.146534 | 0.146907 |         0 |      10 |      semi random |\n|  quadsort | 10000000 |   32 | 0.197462 | 0.200010 |         0 |      10 |      semi random |\n|  wolfsort | 10000000 |   32 | 0.192603 | 0.194365 |         0 |      10 |      semi random |\n|           |          |      |          |          |           |         |  
                |\n|  fluxsort | 10000000 |   32 | 0.173080 | 0.176575 |         0 |      10 |    random signal |\n|  gridsort | 10000000 |   32 | 0.137590 | 0.137932 |         0 |      10 |    random signal |\n|  quadsort | 10000000 |   32 | 0.180939 | 0.181778 |         0 |      10 |    random signal |\n|  wolfsort | 10000000 |   32 | 0.161181 | 0.161714 |         0 |      10 |    random signal |\n\n</details>\n\n\nblitsort vs crumsort vs pdqsort vs wolfsort on 100K elements\n-------------------------------------------------------------\nThe following benchmark was on WSL gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1).\nThe source code was compiled using g++ -O3 -fpermissive bench.c. All comparisons are inlined through the cmp macro.\nA table with the best and average time in seconds can be uncollapsed below the bar graph.\n\nBlitsort uses 512 elements of auxiliary memory, crumsort 512, pdqsort 64, and wolfsort 100000.\n![Graph](/images/graph3.png)\n\n<details><summary><b>data table</b></summary>\n\n|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  blitsort |   100000 |  128 | 0.010864 | 0.010994 |         0 |     100 |     random order |\n|  crumsort |   100000 |  128 | 0.008143 | 0.008222 |         0 |     100 |     random order |\n|   pdqsort |   100000 |  128 | 0.005954 | 0.006063 |         0 |     100 |     random order |\n|  wolfsort |   100000 |  128 | 0.008308 | 0.008396 |         0 |     100 |     random order |\n\n|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  blitsort |   100000 |   64 | 0.002326 | 0.002354 |         0 |     100 |     random order |\n|  crumsort |   100000 |   64 | 0.001835 | 0.001848 |         0 |     100 |     random order |\n|   pdqsort |   
100000 |   64 | 0.002752 | 0.002806 |         0 |     100 |     random order |\n|  wolfsort |   100000 |   64 | 0.003014 | 0.003069 |         0 |     100 |     random order |\n\n|      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  blitsort |   100000 |   32 | 0.002094 | 0.002117 |         0 |     100 |     random order |\n|  crumsort |   100000 |   32 | 0.001764 | 0.001779 |         0 |     100 |     random order |\n|   pdqsort |   100000 |   32 | 0.002747 | 0.002770 |         0 |     100 |     random order |\n|  wolfsort |   100000 |   32 | 0.000983 | 0.001016 |         0 |     100 |     random order |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.000880 | 0.000891 |         0 |     100 |     random % 100 |\n|  crumsort |   100000 |   32 | 0.000602 | 0.000641 |         0 |     100 |     random % 100 |\n|   pdqsort |   100000 |   32 | 0.000795 | 0.000805 |         0 |     100 |     random % 100 |\n|  wolfsort |   100000 |   32 | 0.000376 | 0.000381 |         0 |     100 |     random % 100 |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.000043 | 0.000045 |         0 |     100 |  ascending order |\n|  crumsort |   100000 |   32 | 0.000043 | 0.000044 |         0 |     100 |  ascending order |\n|   pdqsort |   100000 |   32 | 0.000084 | 0.000088 |         0 |     100 |  ascending order |\n|  wolfsort |   100000 |   32 | 0.000086 | 0.000088 |         0 |     100 |  ascending order |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.000440 | 0.000450 |         0 |     100 |    ascending saw |\n|  crumsort |   100000 |   32 | 0.000410 | 0.000419 |         0 |     100 |    ascending saw 
|\n|   pdqsort |   100000 |   32 | 0.003222 | 0.003246 |         0 |     100 |    ascending saw |\n|  wolfsort |   100000 |   32 | 0.000379 | 0.000382 |         0 |     100 |    ascending saw |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.000242 | 0.000251 |         0 |     100 |       pipe organ |\n|  crumsort |   100000 |   32 | 0.000229 | 0.000243 |         0 |     100 |       pipe organ |\n|   pdqsort |   100000 |   32 | 0.002842 | 0.002864 |         0 |     100 |       pipe organ |\n|  wolfsort |   100000 |   32 | 0.000249 | 0.000257 |         0 |     100 |       pipe organ |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.000054 | 0.000055 |         0 |     100 | descending order |\n|  crumsort |   100000 |   32 | 0.000054 | 0.000055 |         0 |     100 | descending order |\n|   pdqsort |   100000 |   32 | 0.000190 | 0.000198 |         0 |     100 | descending order |\n|  wolfsort |   100000 |   32 | 0.000097 | 0.000100 |         0 |     100 | descending order |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.000452 | 0.000466 |         0 |     100 |   descending saw |\n|  crumsort |   100000 |   32 | 0.000421 | 0.000431 |         0 |     100 |   descending saw |\n|   pdqsort |   100000 |   32 | 0.004200 | 0.004245 |         0 |     100 |   descending saw |\n|  wolfsort |   100000 |   32 | 0.000383 | 0.000402 |         0 |     100 |   descending saw |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.000782 | 0.000829 |         0 |     100 |      random tail |\n|  crumsort |   100000 |   32 | 0.000714 | 0.000755 |         0 |     100 |      random tail |\n|   pdqsort |   100000 |   32 | 0.002638 | 0.002759 |         0 |     100 |  
    random tail |\n|  wolfsort |   100000 |   32 | 0.000463 | 0.000483 |         0 |     100 |      random tail |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.001210 | 0.001275 |         0 |     100 |      random half |\n|  crumsort |   100000 |   32 | 0.001063 | 0.001096 |         0 |     100 |      random half |\n|   pdqsort |   100000 |   32 | 0.002738 | 0.002780 |         0 |     100 |      random half |\n|  wolfsort |   100000 |   32 | 0.000685 | 0.000712 |         0 |     100 |      random half |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.001105 | 0.001278 |         0 |     100 |  ascending tiles |\n|  crumsort |   100000 |   32 | 0.001393 | 0.001435 |         0 |     100 |  ascending tiles |\n|   pdqsort |   100000 |   32 | 0.002367 | 0.002398 |         0 |     100 |  ascending tiles |\n|  wolfsort |   100000 |   32 | 0.000682 | 0.000689 |         0 |     100 |  ascending tiles |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.001956 | 0.001988 |         0 |     100 |     bit reversal |\n|  crumsort |   100000 |   32 | 0.001762 | 0.001794 |         0 |     100 |     bit reversal |\n|   pdqsort |   100000 |   32 | 0.002731 | 0.002758 |         0 |     100 |     bit reversal |\n|  wolfsort |   100000 |   32 | 0.000890 | 0.000921 |         0 |     100 |     bit reversal |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.000328 | 0.000341 |         0 |     100 |       random % 4 |\n|  crumsort |   100000 |   32 | 0.000206 | 0.000216 |         0 |     100 |       random % 4 |\n|   pdqsort |   100000 |   32 | 0.000382 | 0.000391 |         0 |     100 |       random % 4 |\n|  wolfsort |   100000 |   32 | 0.000367 | 0.000378 |        
 0 |     100 |       random % 4 |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.001209 | 0.001244 |         0 |     100 |      semi random |\n|  crumsort |   100000 |   32 | 0.000309 | 0.000319 |         0 |     100 |      semi random |\n|   pdqsort |   100000 |   32 | 0.000479 | 0.000500 |         0 |     100 |      semi random |\n|  wolfsort |   100000 |   32 | 0.000791 | 0.000828 |         0 |     100 |      semi random |\n|           |          |      |          |          |           |         |                  |\n|  blitsort |   100000 |   32 | 0.001893 | 0.001926 |         0 |     100 |    random signal |\n|  crumsort |   100000 |   32 | 0.001714 | 0.001742 |         0 |     100 |    random signal |\n|   pdqsort |   100000 |   32 | 0.002950 | 0.002976 |         0 |     100 |    random signal |\n|  wolfsort |   100000 |   32 | 0.000814 | 0.000834 |         0 |     100 |    random signal |\n\n</details>\n\nblitsort vs crumsort vs pdqsort vs wolfsort on 10M elements\n-----------------------------------------------------------\nBlitsort uses 512 elements of auxiliary memory, crumsort 512, pdqsort 64, and wolfsort 10000000.\n\n![Graph](/images/graph4.png)\n<details><summary><b>data table</b></summary>\n\n|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  blitsort | 10000000 |  128 | 2.172622 | 2.191956 |         0 |      10 |     random order |\n|  crumsort | 10000000 |  128 | 1.134328 | 1.135821 |         0 |      10 |     random order |\n|   pdqsort | 10000000 |  128 | 0.805620 | 0.808041 |         0 |      10 |     random order |\n|  wolfsort | 10000000 |  128 | 1.237174 | 1.238863 |         0 |      10 |     random order |\n\n|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |\n| --------- 
| -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  blitsort | 10000000 |   64 | 0.434356 | 0.443134 |         0 |      10 |     random order |\n|  crumsort | 10000000 |   64 | 0.250065 | 0.251453 |         0 |      10 |     random order |\n|   pdqsort | 10000000 |   64 | 0.359586 | 0.360388 |         0 |      10 |     random order |\n|  wolfsort | 10000000 |   64 | 0.480904 | 0.482835 |         0 |      10 |     random order |\n\n|      Name |    Items | Type |     Best |  Average |     Loops | Samples |     Distribution |\n| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\n|  blitsort | 10000000 |   32 | 0.332071 | 0.339524 |         0 |      10 |     random order |\n|  crumsort | 10000000 |   32 | 0.231584 | 0.232056 |         0 |      10 |     random order |\n|   pdqsort | 10000000 |   32 | 0.347793 | 0.348437 |         0 |      10 |     random order |\n|  wolfsort | 10000000 |   32 | 0.189250 | 0.189762 |         0 |      10 |     random order |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.126792 | 0.128439 |         0 |      10 |     random % 100 |\n|  crumsort | 10000000 |   32 | 0.060683 | 0.061353 |         0 |      10 |     random % 100 |\n|   pdqsort | 10000000 |   32 | 0.079284 | 0.079891 |         0 |      10 |     random % 100 |\n|  wolfsort | 10000000 |   32 | 0.086577 | 0.087157 |         0 |      10 |     random % 100 |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.006581 | 0.006784 |         0 |      10 |  ascending order |\n|  crumsort | 10000000 |   32 | 0.006690 | 0.006801 |         0 |      10 |  ascending order |\n|   pdqsort | 10000000 |   32 | 0.011712 | 0.011851 |         0 |      10 |  ascending order |\n|  wolfsort | 10000000 |   32 | 0.010958 | 0.011520 |         0 |      10 |  ascending 
order |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.070514 | 0.071260 |         0 |      10 |    ascending saw |\n|  crumsort | 10000000 |   32 | 0.064829 | 0.066035 |         0 |      10 |    ascending saw |\n|   pdqsort | 10000000 |   32 | 0.560995 | 0.561774 |         0 |      10 |    ascending saw |\n|  wolfsort | 10000000 |   32 | 0.081644 | 0.082279 |         0 |      10 |    ascending saw |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.041220 | 0.041924 |         0 |      10 |       pipe organ |\n|  crumsort | 10000000 |   32 | 0.039335 | 0.040018 |         0 |      10 |       pipe organ |\n|   pdqsort | 10000000 |   32 | 0.363633 | 0.364187 |         0 |      10 |       pipe organ |\n|  wolfsort | 10000000 |   32 | 0.070536 | 0.071400 |         0 |      10 |       pipe organ |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.010271 | 0.010549 |         0 |      10 | descending order |\n|  crumsort | 10000000 |   32 | 0.010254 | 0.010499 |         0 |      10 | descending order |\n|   pdqsort | 10000000 |   32 | 0.023129 | 0.023708 |         0 |      10 | descending order |\n|  wolfsort | 10000000 |   32 | 0.014583 | 0.015316 |         0 |      10 | descending order |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.073410 | 0.074402 |         0 |      10 |   descending saw |\n|  crumsort | 10000000 |   32 | 0.068284 | 0.069154 |         0 |      10 |   descending saw |\n|   pdqsort | 10000000 |   32 | 0.942142 | 0.958606 |         0 |      10 |   descending saw |\n|  wolfsort | 10000000 |   32 | 0.085338 | 0.086014 |         0 |      10 |   descending saw |\n|           |          |      |          |          |           |      
   |                  |\n|  blitsort | 10000000 |   32 | 0.124089 | 0.130327 |         0 |      10 |      random tail |\n|  crumsort | 10000000 |   32 | 0.103030 | 0.104337 |         0 |      10 |      random tail |\n|   pdqsort | 10000000 |   32 | 0.337862 | 0.342594 |         0 |      10 |      random tail |\n|  wolfsort | 10000000 |   32 | 0.103381 | 0.108048 |         0 |      10 |      random tail |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.191479 | 0.193036 |         0 |      10 |      random half |\n|  crumsort | 10000000 |   32 | 0.146732 | 0.147742 |         0 |      10 |      random half |\n|   pdqsort | 10000000 |   32 | 0.342803 | 0.343424 |         0 |      10 |      random half |\n|  wolfsort | 10000000 |   32 | 0.159515 | 0.160787 |         0 |      10 |      random half |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.182256 | 0.190378 |         0 |      10 |  ascending tiles |\n|  crumsort | 10000000 |   32 | 0.188875 | 0.195063 |         0 |      10 |  ascending tiles |\n|   pdqsort | 10000000 |   32 | 0.285777 | 0.286996 |         0 |      10 |  ascending tiles |\n|  wolfsort | 10000000 |   32 | 0.093709 | 0.094315 |         0 |      10 |  ascending tiles |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.324983 | 0.326345 |         0 |      10 |     bit reversal |\n|  crumsort | 10000000 |   32 | 0.230872 | 0.231599 |         0 |      10 |     bit reversal |\n|   pdqsort | 10000000 |   32 | 0.343915 | 0.344677 |         0 |      10 |     bit reversal |\n|  wolfsort | 10000000 |   32 | 0.250331 | 0.251319 |         0 |      10 |     bit reversal |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.061197 | 0.062058 |  
       0 |      10 |       random % 4 |\n|  crumsort | 10000000 |   32 | 0.030134 | 0.030564 |         0 |      10 |       random % 4 |\n|   pdqsort | 10000000 |   32 | 0.043492 | 0.043673 |         0 |      10 |       random % 4 |\n|  wolfsort | 10000000 |   32 | 0.081548 | 0.082020 |         0 |      10 |       random % 4 |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.066686 | 0.067764 |         0 |      10 |      semi random |\n|  crumsort | 10000000 |   32 | 0.045479 | 0.046088 |         0 |      10 |      semi random |\n|   pdqsort | 10000000 |   32 | 0.060253 | 0.060612 |         0 |      10 |      semi random |\n|  wolfsort | 10000000 |   32 | 0.190505 | 0.191946 |         0 |      10 |      semi random |\n|           |          |      |          |          |           |         |                  |\n|  blitsort | 10000000 |   32 | 0.272456 | 0.274928 |         0 |      10 |    random signal |\n|  crumsort | 10000000 |   32 | 0.224115 | 0.225966 |         0 |      10 |    random signal |\n|   pdqsort | 10000000 |   32 | 0.382742 | 0.384505 |         0 |      10 |    random signal |\n|  wolfsort | 10000000 |   32 | 0.160946 | 0.161769 |         0 |      10 |    random signal |\n\n</details>\n"
  },
  {
    "path": "src/bench.c",
    "content": "/*\n\tTo compile use either:\n\n\tgcc -O3 bench.c\n\n\tor\n\n\tclang -O3 bench.c\n\n\tor\n\n\tg++ -O3 bench.c\n*/\n\n#include <stdlib.h>\n#include <stdio.h>\n#include <string.h>\n#include <sys/time.h>\n#include <time.h>\n#include <errno.h>\n#include <math.h>\n\n#define cmp(a,b) (*(a) > *(b)) // uncomment for faster primitive comparisons\n\nconst char *sorts[] = { \"*\", \"quadsort\", \"gridsort\", \"blitsort\", \"fluxsort\", \"skipsort\", \"crumsort\", \"wolfsort\", \"sort::std\" };\n\n//#define SKIP_STRINGS\n//#define SKIP_DOUBLES\n//#define SKIP_LONGS\n\n#if __has_include(\"blitsort.h\")\n  #include \"blitsort.h\" // curl \"https://raw.githubusercontent.com/scandum/blitsort/master/src/blitsort.{c,h}\" -o \"blitsort.#1\"\n#endif\n#if __has_include(\"crumsort.h\")\n  #include \"crumsort.h\" // curl \"https://raw.githubusercontent.com/scandum/crumsort/master/src/crumsort.{c,h}\" -o \"crumsort.#1\"\n#endif\n#if __has_include(\"dripsort.h\")\n  #include \"dripsort.h\"\n#endif\n#if __has_include(\"flowsort.h\")\n  #include \"flowsort.h\"\n#endif\n#if __has_include(\"fluxsort.h\")\n  #include \"fluxsort.h\" // curl \"https://raw.githubusercontent.com/scandum/fluxsort/master/src/fluxsort.{c,h}\" -o \"fluxsort.#1\"\n#endif\n#if __has_include(\"gridsort.h\")\n  #include \"gridsort.h\" // curl \"https://raw.githubusercontent.com/scandum/gridsort/master/src/gridsort.{c,h}\" -o \"gridsort.#1\"\n#endif\n#if __has_include(\"octosort.h\")\n  #include \"octosort.h\" // curl \"https://raw.githubusercontent.com/scandum/octosort/master/src/octosort.{c,h}\" -o \"octosort.#1\"\n#endif\n#if __has_include(\"piposort.h\")\n  #include \"piposort.h\" // curl \"https://raw.githubusercontent.com/scandum/piposort/master/src/piposort.{c,h}\" -o \"piposort.#1\"\n#endif\n#if __has_include(\"quadsort.h\")\n  #include \"quadsort.h\" // curl \"https://raw.githubusercontent.com/scandum/quadsort/master/src/quadsort.{c,h}\" -o \"quadsort.#1\"\n#endif\n#if 
__has_include(\"skipsort.h\")\n  #include \"skipsort.h\"\n#endif\n#if __has_include(\"wolfsort.h\")\n  #include \"wolfsort.h\" // curl \"https://raw.githubusercontent.com/scandum/wolfsort/master/src/wolfsort.{c,h}\" -o \"wolfsort.#1\"\n#endif\n\n#if __has_include(\"rhsort.c\")\n    #define RHSORT_C\n    #include \"rhsort.c\" // curl https://raw.githubusercontent.com/mlochbaum/rhsort/master/rhsort.c > rhsort.c\n#endif\n\n#ifdef __GNUG__\n  #include <algorithm>\n  #if __has_include(\"pdqsort.h\")\n    #include \"pdqsort.h\" // curl https://raw.githubusercontent.com/orlp/pdqsort/master/pdqsort.h > pdqsort.h\n  #endif\n  #if __has_include(\"ska_sort.hpp\")\n    #define SKASORT_HPP\n    #include \"ska_sort.hpp\" // curl https://raw.githubusercontent.com/skarupke/ska_sort/master/ska_sort.hpp > ska_sort.hpp\n  #endif\n  #if __has_include(\"timsort.hpp\")\n    #include \"timsort.hpp\" // curl https://raw.githubusercontent.com/timsort/cpp-TimSort/master/include/gfx/timsort.hpp > timsort.hpp\n  #endif\n#endif\n\n#if __has_include(\"antiqsort.c\")\n  #include \"antiqsort.c\"\n#endif\n\n//typedef int CMPFUNC (const void *a, const void *b);\n\ntypedef void SRTFUNC(void *array, size_t nmemb, size_t size, CMPFUNC *cmpf);\n\n\n// Remove __attribute__ ((noinline)) and comment out comparisons++ for full\n// throttle. 
Like so: #define COMPARISON_PP //comparisons++ \n\nsize_t comparisons;\n\n#define COMPARISON_PP comparisons++\n\n#define NO_INLINE __attribute__ ((noinline))\n\n// primitive type comparison functions\n\nNO_INLINE int cmp_int(const void * a, const void * b)\n{\n\tCOMPARISON_PP;\n\n\treturn *(int *) a - *(int *) b;\n\n//\tconst int l = *(const int *)a;\n//\tconst int r = *(const int *)b;\n\n//\treturn l - r;\n//\treturn l > r;\n//\treturn (l > r) - (l < r);\n}\n\nNO_INLINE int cmp_rev(const void * a, const void * b)\n{\n\tint fa = *(int *)a;\n\tint fb = *(int *)b;\n\n\tCOMPARISON_PP;\n\n\treturn fb - fa;\n}\n\nNO_INLINE int cmp_stable(const void * a, const void * b)\n{\n\tint fa = *(int *)a;\n\tint fb = *(int *)b;\n\n\tCOMPARISON_PP;\n\n\treturn fa / 100000 - fb / 100000;\n}\n\nNO_INLINE int cmp_long(const void * a, const void * b)\n{\n\tconst long long fa = *(const long long *) a;\n\tconst long long fb = *(const long long *) b;\n\n\tCOMPARISON_PP;\n\n\treturn (fa > fb) - (fa < fb);\n//\treturn (fa > fb);\n}\n\nNO_INLINE int cmp_float(const void * a, const void * b)\n{\n\tconst float fa = *(const float *) a;\n\tconst float fb = *(const float *) b;\n\n\treturn (fa > fb) - (fa < fb);\n}\n\nNO_INLINE int cmp_long_double(const void * a, const void * b)\n{\n\tconst long double fa = *(const long double *) a;\n\tconst long double fb = *(const long double *) b;\n\n\tCOMPARISON_PP;\n\n\treturn (fa > fb) - (fa < fb);\n\n/*\tif (isnan(fa) || isnan(fb))\n\t{\n\t\treturn isnan(fa) - isnan(fb);\n\t}\n\n\treturn (fa > fb);\n*/\n}\n\n// pointer comparison functions\n\nNO_INLINE int cmp_str(const void * a, const void * b)\n{\n\tCOMPARISON_PP;\n\n\treturn strcmp(*(const char **) a, *(const char **) b);\n}\n\nNO_INLINE int cmp_int_ptr(const void * a, const void * b)\n{\n\tconst int *fa = *(const int **) a;\n\tconst int *fb = *(const int **) b;\n\n\tCOMPARISON_PP;\n\n\treturn (*fa > *fb) - (*fa < *fb);\n}\n\nNO_INLINE int cmp_long_ptr(const void * a, const void * b)\n{\n\tconst long long *fa = *(const long long **) a;\n\tconst long long *fb = *(const long long 
**) b;\n\n\tCOMPARISON_PP;\n\n\treturn (*fa > *fb) - (*fa < *fb);\n}\n\nNO_INLINE int cmp_long_double_ptr(const void * a, const void * b)\n{\n\tconst long double *fa = *(const long double **) a;\n\tconst long double *fb = *(const long double **) b;\n\n\tCOMPARISON_PP;\n\n\treturn (*fa > *fb) - (*fa < *fb);\n}\n\n// c++ comparison functions\n\n#ifdef __GNUG__\n\nNO_INLINE bool cpp_cmp_int(const int &a, const int &b)\n{\n\tCOMPARISON_PP;\n\n\treturn a < b;\n}\n\nNO_INLINE bool cpp_cmp_str(char const* const a, char const* const b)\n{\n\tCOMPARISON_PP;\n\n\treturn strcmp(a, b) < 0;\n}\n\n#endif\n\nlong long utime()\n{\n\tstruct timeval now_time;\n\n\tgettimeofday(&now_time, NULL);\n\n\treturn now_time.tv_sec * 1000000LL + now_time.tv_usec;\n}\n\nvoid seed_rand(unsigned long long seed)\n{\n\tsrand(seed);\n}\n\nvoid test_sort(void *array, void *unsorted, void *valid, int minimum, int maximum, int samples, int repetitions, SRTFUNC *srt, const char *name, const char *desc, size_t size, CMPFUNC *cmpf)\n{\n\tlong long start, end, total, best, average_time, average_comp;\n\tchar temp[100];\n\tstatic char compare = 0;\n\tlong long *ptla = (long long *) array, *ptlv = (long long *) valid;\n\tlong double *ptda = (long double *) array, *ptdv = (long double *) valid;\n\tint *pta = (int *) array, *ptv = (int *) valid, rep, sam, max, cnt, name32;\n\n#ifdef SKASORT_HPP\n\tvoid *swap;\n#endif\n\n\tif (*name == '*')\n\t{\n\t\tif (!strcmp(desc, \"random order\") || !strcmp(desc, \"random 1-4\") || !strcmp(desc, \"random 4\") || !strcmp(desc, \"random string\") || !strcmp(desc, \"random 10\"))\n\t\t{\n\t\t\tif (comparisons)\n\t\t\t{\n\t\t\t\tcompare = 1;\n\t\t\t\tprintf(\"%s\\n\", \"|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |\");\n\t\t\t\tprintf(\"%s\\n\", \"| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\");\n\t\t\t}\n\t\t\telse\n\t\t\t{\n\t\t\t\tprintf(\"%s\\n\", \"|      Name |    Items | 
Type |     Best |  Average |     Loops | Samples |     Distribution |\");\n\t\t\t\tprintf(\"%s\\n\", \"| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |\");\n\t\t\t}\n\t\t}\n\t\telse\n\t\t{\n\t\t\t\tprintf(\"%s\\n\", \"|           |          |      |          |          |           |         |                  |\");\n\t\t}\n\t\treturn;\n\t}\n\n\tname32 = name[0] + (name[1] ? name[1] * 32 : 0) + (name[2] ? name[2] * 1024 : 0);\n\n\tbest = average_time = average_comp = 0;\n\n\tif (minimum == 7 && maximum == 7)\n\t{\n\t\tpta = (int *) unsorted;\n\t\tprintf(\"\\e[1;32m%10d %10d %10d %10d %10d %10d %10d\\e[0m\\n\", pta[0], pta[1], pta[2], pta[3], pta[4], pta[5], pta[6]);\n\t\tpta = (int *) array;\n\t}\n\n\tfor (sam = 0 ; sam < samples ; sam++)\n\t{\n\t\ttotal = average_comp = 0;\n\t\tmax = minimum;\n\n\t\tstart = utime();\n\n\t\tfor (rep = repetitions - 1 ; rep >= 0 ; rep--)\n\t\t{\n\t\t\tmemcpy(array, (char *) unsorted + maximum * rep * size, max * size);\n\n\t\t\tcomparisons = 0;\n\n\t\t\t// edit char *sorts to add / remove sorts\n\n\t\t\tswitch (name32)\n\t\t\t{\n#ifdef BLITSORT_H\n\t\t\t\tcase 'b' + 'l' * 32 + 'i' * 1024: blitsort(array, max, size, cmpf); break;\n#endif\n#ifdef CRUMSORT_H\n\t\t\t\tcase 'c' + 'r' * 32 + 'u' * 1024: crumsort(array, max, size, cmpf); break;\n#endif\n#ifdef DRIPSORT_H\n\t\t\t\tcase 'd' + 'r' * 32 + 'i' * 1024: dripsort(array, max, size, cmpf); break;\n#endif\n#ifdef FLOWSORT_H\n\t\t\t\tcase 'f' + 'l' * 32 + 'o' * 1024: flowsort(array, max, size, cmpf); break;\n#endif\n#ifdef FLUXSORT_H\n\t\t\t\tcase 'f' + 'l' * 32 + 'u' * 1024: fluxsort(array, max, size, cmpf); break;\n\t\t\t\tcase 's' + '_' * 32 + 'f' * 1024: fluxsort_size(array, max, size, cmpf); break;\n\n#endif\n#ifdef GRIDSORT_H\n\t\t\t\tcase 'g' + 'r' * 32 + 'i' * 1024: gridsort(array, max, size, cmpf); break;\n#endif\n#ifdef OCTOSORT_H\n\t\t\t\tcase 'o' + 'c' * 32 + 't' * 1024: octosort(array, max, size, cmpf); break;\n#endif\n#ifdef 
PIPOSORT_H\n\t\t\t\tcase 'p' + 'i' * 32 + 'p' * 1024: piposort(array, max, size, cmpf); break;\n#endif\n#ifdef QUADSORT_H\n\t\t\t\tcase 'q' + 'u' * 32 + 'a' * 1024: quadsort(array, max, size, cmpf); break;\n\t\t\t\tcase 's' + '_' * 32 + 'q' * 1024: quadsort_size(array, max, size, cmpf); break;\n#endif\n#ifdef SKIPSORT_H\n\t\t\t\tcase 's' + 'k' * 32 + 'i' * 1024: skipsort(array, max, size, cmpf); break;\n#endif\n#ifdef WOLFSORT_H\n\t\t\t\tcase 'w' + 'o' * 32 + 'l' * 1024: wolfsort(array, max, size, cmpf); break;\n#endif\n\t\t\t\tcase 'q' + 's' * 32 + 'o' * 1024: qsort(array, max, size, cmpf); break;\n\n#ifdef RHSORT_C\n\t\t\t\tcase 'r' + 'h' * 32 + 's' * 1024: if (size == sizeof(int)) rhsort32(pta, max); else return; break;\n#endif\n\n#ifdef __GNUG__\n\t\t\t\tcase 's' + 'o' * 32 + 'r' * 1024: if (size == sizeof(int)) std::sort(pta, pta + max); else if (size == sizeof(long long)) std::sort(ptla, ptla + max); else std::sort(ptda, ptda + max); break;\n\t\t\t\tcase 's' + 't' * 32 + 'a' * 1024: if (size == sizeof(int)) std::stable_sort(pta, pta + max); else if (size == sizeof(long long)) std::stable_sort(ptla, ptla + max); else std::stable_sort(ptda, ptda + max); break;\n\n  #ifdef PDQSORT_H\n\t\t\t\tcase 'p' + 'd' * 32 + 'q' * 1024: if (size == sizeof(int)) pdqsort(pta, pta + max); else if (size == sizeof(long long)) pdqsort(ptla, ptla + max); else pdqsort(ptda, ptda + max); break;\n  #endif\n  #ifdef SKASORT_HPP\n\t\t\t\tcase 's' + 'k' * 32 + 'a' * 1024: swap = malloc(max * size); if (size == sizeof(int)) ska_sort_copy(pta, pta + max, (int *) swap); else if (size == sizeof(long long)) ska_sort_copy(ptla, ptla + max, (long long *) swap); else repetitions = 0; free(swap); break;\n  #endif\n  #ifdef GFX_TIMSORT_HPP\n\t\t\t\tcase 't' + 'i' * 32 + 'm' * 1024: if (size == sizeof(int)) gfx::timsort(pta, pta + max, cpp_cmp_int); else if (size == sizeof(long long)) gfx::timsort(ptla, ptla + max); else gfx::timsort(ptda, ptda + max); break;\n  
#endif\n#endif\n\t\t\t\tdefault:\n\t\t\t\t\tswitch (name32)\n\t\t\t\t\t{\n\t\t\t\t\t\tcase 's' + 'o' * 32 + 'r' * 1024:\n\t\t\t\t\t\tcase 's' + 't' * 32 + 'a' * 1024:\n\t\t\t\t\t\tcase 'p' + 'd' * 32 + 'q' * 1024: \n\t\t\t\t\t\tcase 'r' + 'h' * 32 + 's' * 1024:\n\t\t\t\t\t\tcase 's' + 'k' * 32 + 'a' * 1024:\n\t\t\t\t\t\tcase 't' + 'i' * 32 + 'm' * 1024:\n\t\t\t\t\t\t\tprintf(\"unknown sort: %s (compile with g++ instead of gcc?)\\n\", name);\n\t\t\t\t\t\t\treturn;\n\t\t\t\t\t\tdefault:\n\t\t\t\t\t\t\tprintf(\"unknown sort: %s\\n\", name);\n\t\t\t\t\t\t\treturn;\n\t\t\t\t\t}\n\t\t\t}\n\t\t\taverage_comp += comparisons;\n\n\t\t\tif (minimum < maximum && ++max > maximum)\n\t\t\t{\n\t\t\t\tmax = minimum;\n\t\t\t}\n\t\t}\n\t\tend = utime();\n\n\t\ttotal = end - start;\n\n\t\tif (!best || total < best)\n\t\t{\n\t\t\tbest = total;\n\t\t}\n\t\taverage_time += total;\n\t}\n\n\tif (minimum == 7 && maximum == 7)\n\t{\n\t\tprintf(\"\\e[1;32m%10d %10d %10d %10d %10d %10d %10d\\e[0m\\n\", pta[0], pta[1], pta[2], pta[3], pta[4], pta[5], pta[6]);\n\t}\n\n\tif (repetitions == 0)\n\t{\n\t\treturn;\n\t}\n\n\taverage_time /= samples;\n\n\tif (cmpf == cmp_stable)\n\t{\n\t\tfor (cnt = 1 ; cnt < maximum ; cnt++)\n\t\t{\n\t\t\tif (pta[cnt - 1] > pta[cnt])\n\t\t\t{\n\t\t\t\tsprintf(temp, \"\\e[1;31m%16s\\e[0m\", \"unstable\");\n\t\t\t\tdesc = temp;\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\t}\n\n\tif (compare)\n\t{\n\t\tif (repetitions <= 1)\n\t\t{\n\t\t\tprintf(\"|%10s |%9d | %4d |%9f |%9f |%10d | %7d | %16s |\\e[0m\\n\", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, (int) comparisons, samples, desc);\n\t\t}\n\t\telse\n\t\t{\n\t\t\tprintf(\"|%10s |%9d | %4d |%9f |%9f |%10.1f | %7d | %16s |\\e[0m\\n\", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, (float) average_comp / repetitions, samples, desc);\n\t\t}\n\t}\n\telse\n\t{\n\t\tprintf(\"|%10s | %8d | %4d | %f | %f | %9d | %7d | %16s |\\e[0m\\n\", name, maximum, (int) size * 8, best / 
1000000.0, average_time / 1000000.0, repetitions, samples, desc);\n\t}\n\n\tif (minimum != maximum || cmpf == cmp_stable)\n\t{\n\t\treturn;\n\t}\n\n\tfor (cnt = 1 ; cnt < maximum ; cnt++)\n\t{\n\t\tif (cmpf == cmp_str)\n\t\t{\n\t\t\tchar **ptsa = (char **) array;\n\t\t\tif (strcmp((char *) ptsa[cnt - 1], (char *) ptsa[cnt]) > 0)\n\t\t\t{\n\t\t\t\tprintf(\"%17s: not properly sorted at index %d. (%s vs %s\\n\", name, cnt, (char *) ptsa[cnt - 1], (char *) ptsa[cnt]);\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\t\telse if (size == sizeof(int *) && cmpf == cmp_long_double_ptr)\n\t\t{\n\t\t\tlong double **pptda = (long double **) array;\n\n\t\t\tif (cmp_long_double_ptr(&pptda[cnt - 1], &pptda[cnt]) > 0)\n\t\t\t{\n\t\t\t\tprintf(\"%17s: not properly sorted at index %d. (%Lf vs %Lf\\n\", name, cnt, *pptda[cnt - 1], *pptda[cnt]);\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\t\telse if (cmpf == cmp_long_ptr)\n\t\t{\n\t\t\tlong long **pptla = (long long **) array;\n\n\t\t\tif (cmp_long_ptr(&pptla[cnt - 1], &pptla[cnt]) > 0)\n\t\t\t{\n\t\t\t\tprintf(\"%17s: not properly sorted at index %d. (%lld vs %lld\\n\", name, cnt, *pptla[cnt - 1], *pptla[cnt]);\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\t\telse if (cmpf == cmp_int_ptr)\n\t\t{\n\t\t\tint **pptia = (int **) array;\n\n\t\t\tif (cmp_int_ptr(&pptia[cnt - 1], &pptia[cnt]) > 0)\n\t\t\t{\n\t\t\t\tprintf(\"%17s: not properly sorted at index %d. (%d vs %d\\n\", name, cnt, *pptia[cnt - 1], *pptia[cnt]);\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\t\telse if (size == sizeof(int))\n\t\t{\n\t\t\tif (pta[cnt - 1] > pta[cnt])\n\t\t\t{\n\t\t\t\tprintf(\"%17s: not properly sorted at index %d. (%d vs %d\\n\", name, cnt, pta[cnt - 1], pta[cnt]);\n\t\t\t\tbreak;\n\t\t\t}\n\t\t\tif (pta[cnt - 1] == pta[cnt])\n\t\t\t{\n//\t\t\t\tprintf(\"%17s: Found a repeat value at index %d. (%d)\\n\", name, cnt, pta[cnt]);\n\t\t\t}\n\t\t}\n\t\telse if (size == sizeof(long long))\n\t\t{\n\t\t\tif (ptla[cnt - 1] > ptla[cnt])\n\t\t\t{\n\t\t\t\tprintf(\"%17s: not properly sorted at index %d. 
(%lld vs %lld\\n\", name, cnt, ptla[cnt - 1], ptla[cnt]);\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\t\telse if (size == sizeof(long double))\n\t\t{\n\t\t\tif (cmp_long_double(&ptda[cnt - 1], &ptda[cnt]) > 0)\n\t\t\t{\n\t\t\t\tprintf(\"%17s: not properly sorted at index %d. (%Lf vs %Lf\\n\", name, cnt, ptda[cnt - 1], ptda[cnt]);\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\t}\n\n\tfor (cnt = 1 ; cnt < maximum ; cnt++)\n\t{\n\t\tif (size == sizeof(int))\n\t\t{\n\t\t\tif (pta[cnt] != ptv[cnt])\n\t\t\t{\n\t\t\t\tprintf(\"         validate: array[%d] != valid[%d]. (%d vs %d\\n\", cnt, cnt, pta[cnt], ptv[cnt]);\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\t\telse if (size == sizeof(long long))\n\t\t{\n\t\t\tif (ptla[cnt] != ptlv[cnt])\n\t\t\t{\n\t\t\t\tif (cmpf == cmp_str)\n\t\t\t\t{\n\t\t\t\t\tchar **ptsa = (char **) array;\n\t\t\t\t\tchar **ptsv = (char **) valid;\n\n\t\t\t\t\tprintf(\"         validate: array[%d] != valid[%d]. (%s vs %s) %s\\n\", cnt, cnt, (char *) ptsa[cnt], (char *) ptsv[cnt], !strcmp((char *) ptsa[cnt], (char *) ptsv[cnt]) ? \"\\e[1;31munstable\\e[0m\" : \"\");\n\t\t\t\t\tbreak;\n\t\t\t\t}\n\t\t\t\tif (cmpf == cmp_long_ptr)\n\t\t\t\t{\n\t\t\t\t\tlong long **ptla = (long long **) array;\n\t\t\t\t\tlong long **ptlv = (long long **) valid;\n\n\t\t\t\t\tprintf(\"         validate: array[%d] != valid[%d]. (%lld vs %lld) %s\\n\", cnt, cnt, *ptla[cnt], *ptlv[cnt], (*ptla[cnt] == *ptlv[cnt]) ? \"\\e[1;31munstable\\e[0m\" : \"\");\n\t\t\t\t\tbreak;\n\t\t\t\t}\n\t\t\t\tif (cmpf == cmp_int_ptr)\n\t\t\t\t{\n\t\t\t\t\tint **ptia = (int **) array;\n\t\t\t\t\tint **ptiv = (int **) valid;\n\n\t\t\t\t\tprintf(\"         validate: array[%d] != valid[%d]. (%d vs %d) %s\\n\", cnt, cnt, *ptia[cnt], *ptiv[cnt], (*ptia[cnt] == *ptiv[cnt]) ? \"\\e[1;31munstable\\e[0m\" : \"\");\n\t\t\t\t\tbreak;\n\t\t\t\t}\n\n\t\t\t\tprintf(\"         validate: array[%d] != valid[%d]. 
(%lld vs %lld\\n\", cnt, cnt, ptla[cnt], ptlv[cnt]);\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\t\telse if (size == sizeof(long double))\n\t\t{\n\t\t\tif (ptda[cnt] != ptdv[cnt])\n\t\t\t{\n\t\t\t\tprintf(\"         validate: array[%d] != valid[%d]. (%Lf vs %Lf\\n\", cnt, cnt, ptda[cnt], ptdv[cnt]);\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\t}\n}\n\nvoid validate()\n{\n\tint seed = time(NULL);\n\tint cnt, val, max = 1000;\n\n\tint *a_array, *r_array, *v_array;\n\n\tseed_rand(seed);\n\n\ta_array = (int *) malloc(max * sizeof(int));\n\tr_array = (int *) malloc(max * sizeof(int));\n\tv_array = (int *) malloc(max * sizeof(int));\n\n\tfor (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand();\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tmemcpy(a_array, r_array, cnt * sizeof(int));\n\t\tmemcpy(v_array, r_array, cnt * sizeof(int));\n\n\t\tquadsort_prim(a_array, cnt, sizeof(int));\n\t\tqsort(v_array, cnt, sizeof(int), cmp_int);\n\n\t\tfor (val = 0 ; val < cnt ; val++)\n\t\t{\n\t\t\tif (val && v_array[val - 1] > v_array[val]) {printf(\"\\e[1;31mvalidate rand: seed %d: size: %d Not properly sorted at index %d.\\n\", seed, cnt, val); return;}\n\t\t\tif (a_array[val] != v_array[val])           {printf(\"\\e[1;31mvalidate rand: seed %d: size: %d Not verified at index %d.\\n\", seed, cnt, val); return;}\n\t\t}\n\t}\n\n\t// ascending saw\n\n\tfor (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = cnt % (max / 5);\n\n\tfor (cnt = 0 ; cnt < max ; cnt += 7)\n\t{\n\t\tmemcpy(a_array, r_array, cnt * sizeof(int));\n\t\tmemcpy(v_array, r_array, cnt * sizeof(int));\n\n\t\tquadsort(a_array, cnt, sizeof(int), cmp_int);\n\t\tqsort(v_array, cnt, sizeof(int), cmp_int);\n\n\t\tfor (val = 0 ; val < cnt ; val++)\n\t\t{\n\t\t\tif (val && v_array[val - 1] > v_array[val]) {printf(\"\\e[1;31mvalidate ascending saw: seed %d: size: %d Not properly sorted at index %d.\\n\", seed, cnt, val); return;}\n\t\t\tif (a_array[val] != v_array[val])           {printf(\"\\e[1;31mvalidate ascending saw: seed %d: size: %d Not verified 
at index %d.\\n\", seed, cnt, val); return;}\n\t\t}\n\t}\n\n\t// descending saw\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tr_array[cnt] = (max - cnt + 1) % (max / 11);\n\t}\n\n\tfor (cnt = 1 ; cnt < max ; cnt += 7)\n\t{\n\t\tmemcpy(a_array, r_array, cnt * sizeof(int));\n\t\tmemcpy(v_array, r_array, cnt * sizeof(int));\n\n\t\tquadsort(a_array, cnt, sizeof(int), cmp_int);\n\t\tqsort(v_array, cnt, sizeof(int), cmp_int);\n\n\t\tfor (val = 0 ; val < cnt ; val++)\n\t\t{\n\t\t\tif (val && v_array[val - 1] > v_array[val]) {printf(\"\\e[1;31mvalidate descending saw: seed %d: size: %d Not properly sorted at index %d.\\n\\n\", seed, cnt, val); return;}\n\t\t\tif (a_array[val] != v_array[val])           {printf(\"\\e[1;31mvalidate descending saw: seed %d: size: %d Not verified at index %d.\\n\\n\", seed, cnt, val); return;}\n\t\t}\n\t}\n\n\t// random half\n\n\tfor (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = (cnt < max / 2) ? cnt : rand();\n\n\tfor (cnt = 1 ; cnt < max ; cnt += 7)\n\t{\n\t\tmemcpy(a_array, r_array, cnt * sizeof(int));\n\t\tmemcpy(v_array, r_array, cnt * sizeof(int));\n\n\t\tquadsort(a_array, cnt, sizeof(int), cmp_int);\n\t\tqsort(v_array, cnt, sizeof(int), cmp_int);\n\n\t\tfor (val = 0 ; val < cnt ; val++)\n\t\t{\n\t\t\tif (val && v_array[val - 1] > v_array[val]) {printf(\"\\e[1;31mvalidate rand tail: seed %d: size: %d Not properly sorted at index %d.\\n\", seed, cnt, val); return;}\n\t\t\tif (a_array[val] != v_array[val])           {printf(\"\\e[1;31mvalidate rand tail: seed %d: size: %d Not verified at index %d.\\n\", seed, cnt, val); return;}\n\t\t}\n\t}\n\tfree(a_array);\n\tfree(r_array);\n\tfree(v_array);\n}\n\n// reverse the bit order of a 32-bit value: swap adjacent bits, pairs, nibbles, bytes, then halves\n\nunsigned int bit_reverse(unsigned int x)\n{\n    x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));\n    x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));\n    x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));\n    x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));\n\n    return((x >> 16) | (x << 16));\n}\n\nvoid 
run_test(void *a_array, void *r_array, void *v_array, int minimum, int maximum, int samples, int repetitions, int copies, const char *desc, size_t size, CMPFUNC *cmpf)\n{\n\tint cnt, rep;\n\n\tmemcpy(v_array, r_array, maximum * size);\n\n\tfor (rep = 0 ; rep < copies ; rep++)\n\t{\n\t\tmemcpy((char *) r_array + rep * maximum * size, v_array, maximum * size);\n\t}\n\tquadsort(v_array, maximum, size, cmpf);\n\n\tfor (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)\n\t{\n\t\ttest_sort(a_array, r_array, v_array, minimum, maximum, samples, repetitions, qsort, sorts[cnt], desc, size, cmpf);\n\t}\n}\n\nvoid range_test(int max, int samples, int repetitions, int seed)\n{\n\tint cnt, last;\n\tint mem = max * 10 > 32768 * 64 ? max * 10 : 32768 * 64;\n\tchar dist[40];\n\n\tint *a_array = (int *) malloc(max * sizeof(int));\n\tint *r_array = (int *) malloc(mem * sizeof(int));\n\tint *v_array = (int *) malloc(max * sizeof(int));\n\n\tsrand(seed);\n\n\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tr_array[cnt] = rand();\n\t}\n\n\tif (max <= 4096)\n\t{\n\t\tfor (last = 1, samples = 32768*4, repetitions = 4 ; repetitions <= max ; repetitions *= 2, samples /= 2)\n\t\t{\n\t\t\tif (max >= repetitions)\n\t\t\t{\n\t\t\t\tsprintf(dist, \"random %d-%d\", last, repetitions);\n\n\t\t\t\tmemcpy(v_array, r_array, repetitions * sizeof(int));\n\t\t\t\tquadsort(v_array, repetitions, sizeof(int), cmp_int);\n\n\t\t\t\tfor (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)\n\t\t\t\t{\n\t\t\t\t\ttest_sort(a_array, r_array, v_array, last, repetitions, 50, samples, qsort, sorts[cnt], dist, sizeof(int), cmp_int);\n\t\t\t\t}\n\t\t\t\tlast = repetitions + 1;\n\t\t\t}\n\t\t}\n\t\tfree(a_array);\n\t\tfree(r_array);\n\t\tfree(v_array);\n\t\treturn;\n\t}\n\n\tif (max == 10000000)\n\t{\n\t\trepetitions = 10000000;\n\n\t\tfor (max = 10 ; max <= 10000000 ; max *= 10)\n\t\t{\n\t\t\trepetitions /= 10;\n\n\t\t\tmemcpy(v_array, r_array, max * sizeof(int));\n\t\t\tquadsort_prim(v_array, 
max, sizeof(int));\n\n\t\t\tsprintf(dist, \"random %d\", max);\n\n\t\t\tfor (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)\n\t\t\t{\n\t\t\t\ttest_sort(a_array, r_array, v_array, max, max, 10, repetitions, qsort, sorts[cnt], dist, sizeof(int), cmp_int);\n\t\t\t}\n\t\t}\n\t}\n\telse\n\t{\n\t\tfor (samples = 32768*4, repetitions = 4 ; samples > 0 ; repetitions *= 2, samples /= 2)\n\t\t{\n\t\t\tif (max >= repetitions)\n\t\t\t{\n\t\t\t\tmemcpy(v_array, r_array, repetitions * sizeof(int));\n\t\t\t\tquadsort(v_array, repetitions, sizeof(int), cmp_int);\n\n\t\t\t\tsprintf(dist, \"random %d\", repetitions);\n\n\t\t\t\tfor (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)\n\t\t\t\t{\n\t\t\t\t\ttest_sort(a_array, r_array, v_array, repetitions, repetitions, 100, samples, qsort, sorts[cnt], dist, sizeof(int), cmp_int);\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n\tfree(a_array);\n\tfree(r_array);\n\tfree(v_array);\n\treturn;\n}\n\n#define VAR int\n\nint main(int argc, char **argv)\n{\n\tint max = 100000;\n\tint samples = 10;\n\tint repetitions = 1;\n\tint seed = 0;\n\tint cnt, mem;\n\tVAR *a_array, *r_array, *v_array, sum;\n\n\tif (argc >= 1 && argv[1] && *argv[1])\n\t{\n\t\tmax = atoi(argv[1]);\n\t}\n\n\tif (argc >= 2 && argv[2] && *argv[2])\n\t{\n\t\tsamples = atoi(argv[2]);\n\t}\n\n\tif (argc >= 3 && argv[3] && *argv[3])\n\t{\n\t\trepetitions = atoi(argv[3]);\n\t}\n\n\tif (argc >= 4 && argv[4] && *argv[4])\n\t{\n\t\tseed = atoi(argv[4]);\n\t}\n\n\tvalidate();\n\n\tseed = seed ? 
seed : time(NULL);\n\n\tprintf(\"Info: int = %lu, long long = %lu, long double = %lu\\n\\n\", sizeof(int) * 8, sizeof(long long) * 8, sizeof(long double) * 8);\n\n\tprintf(\"Benchmark: array size: %d, samples: %d, repetitions: %d, seed: %d\\n\\n\", max, samples, repetitions, seed);\n\n\tif (repetitions == 0)\n\t{\n\t\trange_test(max, samples, repetitions, seed);\n\t\treturn 0;\n\t}\n\n\tmem = max * repetitions;\n\n#ifndef SKIP_STRINGS\n#ifndef cmp\n\n\t// C string\n\n\t{\n\t\tchar **sa_array = (char **) malloc(max * sizeof(char **));\n\t\tchar **sr_array = (char **) malloc(mem * sizeof(char **));\n\t\tchar **sv_array = (char **) malloc(max * sizeof(char **));\n\n\t\tchar *buffer = (char *) malloc(mem * 16);\n\n\t\tseed_rand(seed);\n\n\t\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t\t{\n\t\t\tsprintf(buffer + cnt * 16, \"%X\", rand() % 1000000);\n\n\t\t\tsr_array[cnt] = buffer + cnt * 16;\n\t\t}\n\t\trun_test(sa_array, sr_array, sv_array, max, max, samples, repetitions, 0, \"random string\", sizeof(char **), cmp_str);\n\n\t\tfree(sa_array);\n\t\tfree(sr_array);\n\t\tfree(sv_array);\n\n\t\tfree(buffer);\n\t}\n\n\t// long double table\n\n\t{\n\t\tlong double **da_array = (long double **) malloc(max * sizeof(long double *));\n\t\tlong double **dr_array = (long double **) malloc(mem * sizeof(long double *));\n\t\tlong double **dv_array = (long double **) malloc(max * sizeof(long double *));\n\n\t\tlong double *buffer = (long double *) malloc(mem * sizeof(long double));\n\n\t\tif (da_array == NULL || dr_array == NULL || dv_array == NULL)\n\t\t{\n\t\t\tprintf(\"main(%d,%d,%d): malloc: %s\\n\", max, samples, repetitions, strerror(errno));\n\n\t\t\treturn 0;\n\t\t}\n\n\t\tseed_rand(seed);\n\n\t\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t\t{\n\t\t\tbuffer[cnt] = (long double) rand();\n\t\t\tbuffer[cnt] += (long double) ((unsigned long long) rand() << 32ULL);\n\n\t\t\tdr_array[cnt] = buffer + cnt;\n\t\t}\n\t\trun_test(da_array, dr_array, dv_array, max, max, samples, repetitions, 0, 
\"random double\", sizeof(long double *), cmp_long_double_ptr);\n\n\t\tfree(da_array);\n\t\tfree(dr_array);\n\t\tfree(dv_array);\n\n\t\tfree(buffer);\n\t}\n\n\t// long long table\n\n\t{\n\t\tlong long **la_array = (long long **) malloc(max * sizeof(long long *));\n\t\tlong long **lr_array = (long long **) malloc(mem * sizeof(long long *));\n\t\tlong long **lv_array = (long long **) malloc(max * sizeof(long long *));\n\n\t\tlong long *buffer = (long long *) malloc(mem * sizeof(long long));\n\n\t\tif (la_array == NULL || lr_array == NULL || lv_array == NULL)\n\t\t{\n\t\t\tprintf(\"main(%d,%d,%d): malloc: %s\\n\", max, samples, repetitions, strerror(errno));\n\n\t\t\treturn 0;\n\t\t}\n\n\t\tseed_rand(seed);\n\n\t\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t\t{\n\t\t\tbuffer[cnt] = (long long) rand();\n\t\t\tbuffer[cnt] += (long long) ((unsigned long long) rand() << 32ULL);\n\n\t\t\tlr_array[cnt] = buffer + cnt;\n\t\t}\n\t\trun_test(la_array, lr_array, lv_array, max, max, samples, repetitions, 0, \"random long\", sizeof(long long *), cmp_long_ptr);\n\n\n\t\tfree(la_array);\n\t\tfree(lr_array);\n\t\tfree(lv_array);\n\n\t\tfree(buffer);\n\t}\n\n\t// int table\n\n\t{\n\t\tint **la_array = (int **) malloc(max * sizeof(int *));\n\t\tint **lr_array = (int **) malloc(mem * sizeof(int *));\n\t\tint **lv_array = (int **) malloc(max * sizeof(int *));\n\n\t\tint *buffer = (int *) malloc(mem * sizeof(int));\n\n\t\tif (la_array == NULL || lr_array == NULL || lv_array == NULL)\n\t\t{\n\t\t\tprintf(\"main(%d,%d,%d): malloc: %s\\n\", max, samples, repetitions, strerror(errno));\n\n\t\t\treturn 0;\n\t\t}\n\n\t\tseed_rand(seed);\n\n\t\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t\t{\n\t\t\tbuffer[cnt] = rand();\n\n\t\t\tlr_array[cnt] = buffer + cnt;\n\t\t}\n\t\trun_test(la_array, lr_array, lv_array, max, max, samples, repetitions, 0, \"random int\", sizeof(int *), 
cmp_int_ptr);\n\n\t\tfree(la_array);\n\t\tfree(lr_array);\n\t\tfree(lv_array);\n\n\t\tfree(buffer);\n\n\t\tprintf(\"\\n\");\n\t}\n#endif\n#endif\n\t// 128 bit\n\n#ifndef SKIP_DOUBLES\n\tlong double *da_array = (long double *) malloc(max * sizeof(long double));\n\tlong double *dr_array = (long double *) malloc(mem * sizeof(long double));\n\tlong double *dv_array = (long double *) malloc(max * sizeof(long double));\n\n\tif (da_array == NULL || dr_array == NULL || dv_array == NULL)\n\t{\n\t\tprintf(\"main(%d,%d,%d): malloc: %s\\n\", max, samples, repetitions, strerror(errno));\n\n\t\treturn 0;\n\t}\n\n\tseed_rand(seed);\n\n\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tdr_array[cnt] = (long double) rand();\n\t\tdr_array[cnt] += (long double) ((unsigned long long) rand() << 32ULL);\n\t\tdr_array[cnt] += 1.0L / 3.0L;\n\t}\n\n\tmemcpy(dv_array, dr_array, max * sizeof(long double));\n\tquadsort(dv_array, max, sizeof(long double), cmp_long_double);\n\n\tfor (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)\n\t{\n\t\ttest_sort(da_array, dr_array, dv_array, max, max, samples, repetitions, qsort, sorts[cnt], \"random order\", sizeof(long double), cmp_long_double);\n\t}\n#ifndef cmp\n#ifdef QUADSORT_H\n\ttest_sort(da_array, dr_array, dv_array, max, max, samples, repetitions, qsort, \"s_quadsort\", \"random order\", sizeof(long double), cmp_long_double_ptr);\n#endif\n#endif\n\tfree(da_array);\n\tfree(dr_array);\n\tfree(dv_array);\n\n\tprintf(\"\\n\");\n#endif\n\t// 64 bit\n\n#ifndef SKIP_LONGS\n\tlong long *la_array = (long long *) malloc(max * sizeof(long long));\n\tlong long *lr_array = (long long *) malloc(mem * sizeof(long long));\n\tlong long *lv_array = (long long *) malloc(max * sizeof(long long));\n\n\tif (la_array == NULL || lr_array == NULL || lv_array == NULL)\n\t{\n\t\tprintf(\"main(%d,%d,%d): malloc: %s\\n\", max, samples, repetitions, strerror(errno));\n\n\t\treturn 0;\n\t}\n\n\tseed_rand(seed);\n\n\tfor (cnt = 0 ; cnt < mem ; 
cnt++)\n\t{\n\t\tlr_array[cnt] = rand();\n\t\tlr_array[cnt] += (unsigned long long) rand() << 32ULL;\n\t}\n\n\tmemcpy(lv_array, lr_array, max * sizeof(long long));\n\tquadsort(lv_array, max, sizeof(long long), cmp_long);\n\n\tfor (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++)\n\t{\n\t\ttest_sort(la_array, lr_array, lv_array, max, max, samples, repetitions, qsort, sorts[cnt], \"random order\", sizeof(long long), cmp_long);\n\t}\n\n\tfree(la_array);\n\tfree(lr_array);\n\tfree(lv_array);\n\n\tprintf(\"\\n\");\n#endif\n\t// 32 bit\n\n\ta_array = (VAR *) malloc(max * sizeof(VAR));\n\tr_array = (VAR *) malloc(mem * sizeof(VAR));\n\tv_array = (VAR *) malloc(max * sizeof(VAR));\n\n\tint quad0 = 0;\n\tint nmemb = max;\n\tint half1 = nmemb / 2;\n\tint half2 = nmemb - half1;\n\tint quad1 = half1 / 2;\n\tint quad2 = half1 - quad1;\n\tint quad3 = half2 / 2;\n\tint quad4 = half2 - quad3;\n\n\tint span3 = quad1 + quad2 + quad3;\n\n\t// random\n\n\tseed_rand(seed);\n\n\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tr_array[cnt] = rand();\n\t}\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, \"random order\", sizeof(VAR), cmp_int);\n\n\t// random % 100\n\n\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tr_array[cnt] = rand() % 100;\n\t}\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, \"random % 100\", sizeof(VAR), cmp_int);\n\n\t// ascending\n\n\tfor (cnt = sum = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tr_array[cnt] = sum; sum += rand() % 5;\n\t}\n\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, \"ascending order\", sizeof(VAR), cmp_int);\n\n\t// ascending saw\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tr_array[cnt] = rand();\n\t}\n\n\tquadsort(r_array + quad0, quad1, sizeof(VAR), cmp_int);\n\tquadsort(r_array + quad1, quad2, sizeof(VAR), cmp_int);\n\tquadsort(r_array + half1, quad3, sizeof(VAR), cmp_int);\n\tquadsort(r_array + span3, quad4, sizeof(VAR), cmp_int);\n\n\trun_test(a_array, r_array, 
v_array, max, max, samples, repetitions, repetitions, \"ascending saw\", sizeof(VAR), cmp_int);\n\n\t// pipe organ\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tr_array[cnt] = rand();\n\t}\n\n\tquadsort(r_array + quad0, half1, sizeof(VAR), cmp_int);\n\tqsort(r_array + half1, half2, sizeof(VAR), cmp_rev);\n\n\tfor (cnt = half1 + 1 ; cnt < max ; cnt++)\n\t{\n\t\tif (r_array[cnt] >= r_array[cnt - 1])\n\t\t{\n\t\t\tr_array[cnt] = r_array[cnt - 1] - 1; // guarantee the run is strictly descending\n\t\t}\n\t}\n\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"pipe organ\", sizeof(VAR), cmp_int);\n\n\t// descending\n\n\tfor (cnt = 0, sum = mem * 10 ; cnt < mem ; cnt++)\n\t{\n\t\tr_array[cnt] = sum; sum -= 1 + rand() % 5;\n\t}\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, \"descending order\", sizeof(VAR), cmp_int);\n\n\t// descending saw\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tr_array[cnt] = rand();\n\t}\n\n\tqsort(r_array + quad0, quad1, sizeof(VAR), cmp_rev);\n\tqsort(r_array + quad1, quad2, sizeof(VAR), cmp_rev);\n\tqsort(r_array + half1, quad3, sizeof(VAR), cmp_rev);\n\tqsort(r_array + span3, quad4, sizeof(VAR), cmp_rev);\n\n\tfor (cnt = 1 ; cnt < max ; cnt++)\n\t{\n\t\tif (cnt == quad1 || cnt == half1 || cnt == span3) continue;\n\n\t\tif (r_array[cnt] >= r_array[cnt - 1])\n\t\t{\n\t\t\tr_array[cnt] = r_array[cnt - 1] - 1; // guarantee the run is strictly descending\n\t\t}\n\t}\n\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"descending saw\", sizeof(VAR), cmp_int);\n\n\n\t// random tail 25%\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tr_array[cnt] = rand();\n\t}\n\tquadsort(r_array, span3, sizeof(VAR), cmp_int);\n\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"random tail\", sizeof(VAR), cmp_int);\n\n\t// random 50%\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tr_array[cnt] = rand();\n\t}\n\tquadsort(r_array, 
half1, sizeof(VAR), cmp_int);\n\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"random half\", sizeof(VAR), cmp_int);\n\n\t// tiles\n\n\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tif (cnt % 2 == 0)\n\t\t{\n\t\t\tr_array[cnt] = 16777216 + cnt;\n\t\t}\n\t\telse\n\t\t{\n\t\t\tr_array[cnt] = 33554432 + cnt;\n\t\t}\n\t}\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, \"ascending tiles\", sizeof(VAR), cmp_int);\n\n\t// bit-reversal\n\n\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tr_array[cnt] = bit_reverse(cnt);\n\t}\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, \"bit reversal\", sizeof(VAR), cmp_int);\n\n#ifndef cmp\n  #ifdef ANTIQSORT\n    test_antiqsort;\n  #endif\n#endif\n\n#define QUAD_DEBUG\n#if __has_include(\"extra_tests.c\")\n  #include \"extra_tests.c\"\n#endif\n\n\tfree(a_array);\n\tfree(r_array);\n\tfree(v_array);\n\n\treturn 0;\n}\n"
  },
  {
    "path": "src/blitsort.c",
    "content": "// blitsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n#define BLIT_AUX 512 // set to 0 for sqrt(n) cache size\n#define BLIT_OUT  96 // should be smaller or equal to BLIT_AUX\n\nvoid FUNC(blit_partition)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp);\n\nvoid FUNC(blit_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tunsigned char loop, asum, bsum, csum, dsum;\n\tunsigned int astreaks, bstreaks, cstreaks, dstreaks;\n\tsize_t quad1, quad2, quad3, quad4, half1, half2;\n\tsize_t cnt, abalance, bbalance, cbalance, dbalance;\n\tVAR *pta, *ptb, *ptc, *ptd;\n\n\thalf1 = nmemb / 2;\n\tquad1 = half1 / 2;\n\tquad2 = half1 - quad1;\n\thalf2 = nmemb - half1;\n\tquad3 = half2 / 2;\n\tquad4 = half2 - quad3;\n\n\tpta = array;\n\tptb = array + quad1;\n\tptc = array + half1;\n\tptd = array + half1 + quad3;\n\n\tastreaks = bstreaks = cstreaks = dstreaks = 0;\n\tabalance = bbalance = cbalance = dbalance = 0;\n\n\tfor (cnt = nmemb ; cnt > 132 ; cnt -= 128)\n\t{\n\t\tfor (asum = bsum = csum = dsum = 0, loop = 32 ; loop ; loop--)\n\t\t{\n\t\t\tasum += cmp(pta, pta + 1) > 0; pta++;\n\t\t\tbsum += cmp(ptb, ptb + 1) > 0; ptb++;\n\t\t\tcsum += cmp(ptc, ptc + 1) > 0; ptc++;\n\t\t\tdsum += cmp(ptd, ptd + 1) > 0; ptd++;\n\t\t}\n\t\tabalance += asum; astreaks += asum = (asum == 0) | (asum == 32);\n\t\tbbalance += bsum; bstreaks += bsum = (bsum == 0) | (bsum == 32);\n\t\tcbalance += csum; cstreaks += csum = (csum == 0) | (csum == 32);\n\t\tdbalance += dsum; dstreaks += dsum = (dsum == 0) | (dsum == 32);\n\n\t\tif (cnt > 516 && asum + bsum + csum + dsum == 0)\n\t\t{\n\t\t\tabalance += 48; pta += 96;\n\t\t\tbbalance += 48; ptb += 96;\n\t\t\tcbalance += 48; ptc += 96;\n\t\t\tdbalance += 48; ptd += 96;\n\t\t\tcnt -= 384;\n\t\t}\n\t}\n\n\tfor ( ; cnt > 7 ; cnt -= 4)\n\t{\n\t\tabalance += cmp(pta, pta + 1) > 0; pta++;\n\t\tbbalance += cmp(ptb, ptb + 1) > 0; ptb++;\n\t\tcbalance += cmp(ptc, ptc + 1) > 0; 
ptc++;\n\t\tdbalance += cmp(ptd, ptd + 1) > 0; ptd++;\n\t}\n\n\tif (quad1 < quad2) {bbalance += cmp(ptb, ptb + 1) > 0; ptb++;}\n\tif (quad1 < quad3) {cbalance += cmp(ptc, ptc + 1) > 0; ptc++;}\n\tif (quad1 < quad4) {dbalance += cmp(ptd, ptd + 1) > 0; ptd++;}\n\n\tcnt = abalance + bbalance + cbalance + dbalance;\n\n\tif (cnt == 0)\n\t{\n\t\tif (cmp(pta, pta + 1) <= 0 && cmp(ptb, ptb + 1) <= 0 && cmp(ptc, ptc + 1) <= 0)\n\t\t{\n\t\t\treturn;\n\t\t}\n\t}\n\n\tasum = quad1 - abalance == 1;\n\tbsum = quad2 - bbalance == 1;\n\tcsum = quad3 - cbalance == 1;\n\tdsum = quad4 - dbalance == 1;\n\n\tif (asum | bsum | csum | dsum)\n\t{\n\t\tunsigned char span1 = (asum && bsum) * (cmp(pta, pta + 1) > 0);\n\t\tunsigned char span2 = (bsum && csum) * (cmp(ptb, ptb + 1) > 0);\n\t\tunsigned char span3 = (csum && dsum) * (cmp(ptc, ptc + 1) > 0);\n\n\t\tswitch (span1 | span2 * 2 | span3 * 4)\n\t\t{\n\t\t\tcase 0: break;\n\t\t\tcase 1: FUNC(quad_reversal)(array, ptb);   abalance = bbalance = 0; break;\n\t\t\tcase 2: FUNC(quad_reversal)(pta + 1, ptc); bbalance = cbalance = 0; break;\n\t\t\tcase 3: FUNC(quad_reversal)(array, ptc);   abalance = bbalance = cbalance = 0; break;\n\t\t\tcase 4: FUNC(quad_reversal)(ptb + 1, ptd); cbalance = dbalance = 0; break;\n\t\t\tcase 5: FUNC(quad_reversal)(array, ptb);\n\t\t\t\tFUNC(quad_reversal)(ptb + 1, ptd); abalance = bbalance = cbalance = dbalance = 0; break;\n\t\t\tcase 6: FUNC(quad_reversal)(pta + 1, ptd); bbalance = cbalance = dbalance = 0; break;\n\t\t\tcase 7: FUNC(quad_reversal)(array, ptd); return;\n\t\t}\n\n\t\tif (asum && abalance) {FUNC(quad_reversal)(array,   pta); abalance = 0;}\n\t\tif (bsum && bbalance) {FUNC(quad_reversal)(pta + 1, ptb); bbalance = 0;}\n\t\tif (csum && cbalance) {FUNC(quad_reversal)(ptb + 1, ptc); cbalance = 0;}\n\t\tif (dsum && dbalance) {FUNC(quad_reversal)(ptc + 1, ptd); dbalance = 0;}\n\t}\n\n#ifdef cmp\n\tcnt = nmemb / 256; // more than 50% ordered\n#else\n\tcnt = nmemb / 512; // more than 25% 
ordered\n#endif\n\tasum = astreaks > cnt;\n\tbsum = bstreaks > cnt;\n\tcsum = cstreaks > cnt;\n\tdsum = dstreaks > cnt;\n\n#ifndef cmp\n\tif (quad1 > QUAD_CACHE)\n\t{\n\t\tasum = bsum = csum = dsum = 1;\n\t}\n#endif\n\tswitch (asum + bsum * 2 + csum * 4 + dsum * 8)\n\t{\n\t\tcase 0:\n\t\t\tFUNC(blit_partition)(array, swap, swap_size, nmemb, cmp);\n\t\t\treturn;\n\t\tcase 1:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tFUNC(blit_partition)(pta + 1, swap, swap_size, quad2 + half2, cmp);\n\t\t\tbreak;\n\t\tcase 2:\n\t\t\tFUNC(blit_partition)(array, swap, swap_size, quad1, cmp);\n\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\tFUNC(blit_partition)(ptb + 1, swap, swap_size, half2, cmp);\n\t\t\tbreak;\n\t\tcase 3:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\tFUNC(blit_partition)(ptb + 1, swap, swap_size, half2, cmp);\n\t\t\tbreak;\n\t\tcase 4:\n\t\t\tFUNC(blit_partition)(array, swap, swap_size, half1, cmp);\n\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\tFUNC(blit_partition)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 8:\n\t\t\tFUNC(blit_partition)(array, swap, swap_size, half1 + quad3, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 9:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tFUNC(blit_partition)(pta + 1, swap, swap_size, quad2 + quad3, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 12:\n\t\t\tFUNC(blit_partition)(array, swap, swap_size, half1, cmp);\n\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 5:\n\t\tcase 
6:\n\t\tcase 7:\n\t\tcase 10:\n\t\tcase 11:\n\t\tcase 13:\n\t\tcase 14:\n\t\tcase 15:\n\t\t\tif (asum)\n\t\t\t{\n\t\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\t}\n\t\t\telse FUNC(blit_partition)(array, swap, swap_size, quad1, cmp);\n\t\t\tif (bsum)\n\t\t\t{\n\t\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\t}\n\t\t\telse FUNC(blit_partition)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\tif (csum)\n\t\t\t{\n\t\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\t}\n\t\t\telse FUNC(blit_partition)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\tif (dsum)\n\t\t\t{\n\t\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\t}\n\t\t\telse FUNC(blit_partition)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t}\n\n\tif (cmp(pta, pta + 1) <= 0)\n\t{\n\t\tif (cmp(ptc, ptc + 1) <= 0)\n\t\t{\n\t\t\tif (cmp(ptb, ptb + 1) <= 0)\n\t\t\t{\n\t\t\t\treturn;\n\t\t\t}\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(rotate_merge_block)(array + half1, swap, swap_size, quad3, quad4, cmp);\n\t\t}\n\t}\n\telse\n\t{\n\t\tFUNC(rotate_merge_block)(array, swap, swap_size, quad1, quad2, cmp);\n\n\t\tif (cmp(ptc, ptc + 1) > 0)\n\t\t{\n\t\t\tFUNC(rotate_merge_block)(array + half1, swap, swap_size, quad3, quad4, cmp);\n\t\t}\n\t}\n\tFUNC(rotate_merge_block)(array, swap, swap_size, half1, half2, cmp);\n}\n\n// The next 4 functions are used for pivot selection\n\nVAR FUNC(blit_binary_median)(VAR *pta, VAR *ptb, size_t len, CMPFUNC *cmp)\n{\n\twhile (len /= 2)\n\t{\n\t\tif (cmp(pta + len, ptb + len) <= 0) pta += len; else ptb += len;\n\t}\n\treturn cmp(pta, ptb) > 0 ? 
*pta : *ptb;\n}\n\nvoid FUNC(blit_trim_four)(VAR *pta, CMPFUNC *cmp)\n{\n\tVAR swap;\n\tsize_t x;\n\n\tx = cmp(pta, pta + 1)  > 0; swap = pta[!x]; pta[0] = pta[x]; pta[1] = swap; pta += 2;\n\tx = cmp(pta, pta + 1)  > 0; swap = pta[!x]; pta[0] = pta[x]; pta[1] = swap; pta -= 2;\n\n\tx = (cmp(pta, pta + 2) <= 0) * 2; pta[2] = pta[x]; pta++;\n\tx = (cmp(pta, pta + 2)  > 0) * 2; pta[0] = pta[x];\n}\n\nVAR FUNC(blit_median_of_nine)(VAR *array, VAR *swap, size_t nmemb, CMPFUNC *cmp)\n{\n\tVAR *pta;\n\tsize_t x, y, z;\n\n\tz = nmemb / 9;\n\n\tpta = array;\n\n\tfor (x = 0 ; x < 9 ; x++)\n\t{\n\t\tswap[x] = *pta;\n\n\t\tpta += z;\n\t}\n\n\tFUNC(blit_trim_four)(swap, cmp);\n\tFUNC(blit_trim_four)(swap + 4, cmp);\n\n\tswap[0] = swap[5];\n\tswap[3] = swap[8];\n\n\tFUNC(blit_trim_four)(swap, cmp);\n\n\tswap[0] = swap[6];\n\n\tx = cmp(swap + 0, swap + 1) > 0;\n\ty = cmp(swap + 0, swap + 2) > 0;\n\tz = cmp(swap + 1, swap + 2) > 0;\n\n\treturn swap[(x == y) + (y ^ z)];\n}\n\nVAR FUNC(blit_median_of_cbrt)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, int *generic, CMPFUNC *cmp)\n{\n\tVAR *pta, *pts;\n\tsize_t cnt, div, cbrt;\n\n\tfor (cbrt = 32 ; nmemb > cbrt * cbrt * cbrt && cbrt < swap_size ; cbrt *= 2) {}\n\n\tdiv = nmemb / cbrt;\n\n\tpta = array; // + (size_t) &div / 16 % div; // for a non-deterministic offset\n\tpts = swap;\n\n\tfor (cnt = 0 ; cnt < cbrt ; cnt++)\n\t{\n\t\tpts[cnt] = *pta;\n\n\t\tpta += div;\n\t}\n\tcbrt /= 2;\n\n\tFUNC(quadsort_swap)(pts, pts + cbrt * 2, cbrt, cbrt, cmp);\n\tFUNC(quadsort_swap)(pts + cbrt, pts + cbrt * 2, cbrt, cbrt, cmp);\n\n\t*generic = (cmp(pts + cbrt * 2 - 1, pts) <= 0) & (cmp(pts + cbrt - 1, pts) <= 0);\n\n\treturn FUNC(blit_binary_median)(pts, pts + cbrt, cbrt, cmp);\n}\n\n// As per suggestion by Marshall Lochbaum to improve generic data handling\n\nsize_t FUNC(blit_reverse_partition)(VAR *array, VAR *swap, VAR *piv, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb > swap_size)\n\t{\n\t\tsize_t l, r, h = nmemb / 
2;\n\n\t\tl = FUNC(blit_reverse_partition)(array + 0, swap, piv, swap_size, h, cmp);\n\t\tr = FUNC(blit_reverse_partition)(array + h, swap, piv, swap_size, nmemb - h, cmp);\n\n\t\tFUNC(trinity_rotation)(array + l, swap, swap_size, h - l + r, h - l);\n\n\t\treturn l + r;\n\t}\n#if !defined __clang__\n\tsize_t cnt, val, m = 0;\n\tVAR *pta = array;\n\n\tfor (cnt = nmemb / 4 ; cnt ; cnt--)\n\t{\n\t\tval = cmp(piv, pta) > 0; swap[-m] = array[m] = *pta++; m += val; swap++;\n\t\tval = cmp(piv, pta) > 0; swap[-m] = array[m] = *pta++; m += val; swap++;\n\t\tval = cmp(piv, pta) > 0; swap[-m] = array[m] = *pta++; m += val; swap++;\n\t\tval = cmp(piv, pta) > 0; swap[-m] = array[m] = *pta++; m += val; swap++;\n\t}\n\n\tfor (cnt = nmemb % 4 ; cnt ; cnt--)\n\t{\n\t\tval = cmp(piv, pta) > 0; swap[-m] = array[m] = *pta++; m += val; swap++;\n\t}\n\tswap -= nmemb;\n#else\n\tsize_t cnt, m;\n\tVAR *tmp, *ptx = array, *pta = array, *pts = swap;\n\n\tfor (cnt = nmemb / 4 ; cnt ; cnt--)\n\t{\n\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t}\n\n\tfor (cnt = nmemb % 4 ; cnt ; cnt--)\n\t{\n\t\ttmp = cmp(piv, ptx) > 0 ? 
pta++ : pts++; *tmp = *ptx++;\n\t}\n\tm = pta - array;\n#endif\n\tmemcpy(array + m, swap, (nmemb - m) * sizeof(VAR));\n\n\treturn m;\n}\n\nsize_t FUNC(blit_default_partition)(VAR *array, VAR *swap, VAR *piv, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb > swap_size)\n\t{\n\t\tsize_t l, r, h = nmemb / 2;\n\n\t\tl = FUNC(blit_default_partition)(array + 0, swap, piv, swap_size, h, cmp);\n\t\tr = FUNC(blit_default_partition)(array + h, swap, piv, swap_size, nmemb - h, cmp);\n\n\t\tFUNC(trinity_rotation)(array + l, swap, swap_size, h - l + r, h - l);\n\n\t\treturn l + r;\n\t}\n#if !defined __clang__\n\tsize_t cnt, val, m = 0;\n\tVAR *pta = array;\n\n\tfor (cnt = nmemb / 4 ; cnt ; cnt--)\n\t{\n\t\tval = cmp(pta, piv) <= 0; swap[-m] = array[m] = *pta++; m += val; swap++;\n\t\tval = cmp(pta, piv) <= 0; swap[-m] = array[m] = *pta++; m += val; swap++;\n\t\tval = cmp(pta, piv) <= 0; swap[-m] = array[m] = *pta++; m += val; swap++;\n\t\tval = cmp(pta, piv) <= 0; swap[-m] = array[m] = *pta++; m += val; swap++;\n\t}\n\n\tfor (cnt = nmemb % 4 ; cnt ; cnt--)\n\t{\n\t\tval = cmp(pta, piv) <= 0; swap[-m] = array[m] = *pta++; m += val; swap++;\n\t}\n\tswap -= nmemb;\n#else\n\tsize_t cnt, m;\n\tVAR *tmp, *ptx = array, *pta = array, *pts = swap;\n\n\tfor (cnt = nmemb / 4 ; cnt ; cnt--)\n\t{\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\t}\n\n\tfor (cnt = nmemb % 4 ; cnt ; cnt--)\n\t{\n\t\ttmp = cmp(ptx, piv) <= 0 ? 
pta++ : pts++; *tmp = *ptx++;\n\t}\n\tm = pta - array;\n#endif\t\n\tmemcpy(array + m, swap, sizeof(VAR) * (nmemb - m));\n\n\treturn m;\n}\n\nvoid FUNC(blit_partition)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tsize_t a_size = 0, s_size;\n\tVAR piv, max = 0;\n\tint generic = 0;\n\n\twhile (1)\n\t{\n\t\tif (nmemb <= 2048)\n\t\t{\n\t\t\tpiv = FUNC(blit_median_of_nine)(array, swap, nmemb, cmp);\n\t\t}\n\t\telse\n\t\t{\n\t\t\tpiv = FUNC(blit_median_of_cbrt)(array, swap, swap_size, nmemb, &generic, cmp);\n\n\t\t\tif (generic) break;\n\t\t}\n\n\t\tif (a_size && cmp(&max, &piv) <= 0)\n\t\t{\n\t\t\ta_size = FUNC(blit_reverse_partition)(array, swap, &piv, swap_size, nmemb, cmp);\n\t\t\ts_size = nmemb - a_size;\n\t\t\tnmemb = a_size;\n\n\t\t\tif (s_size <= a_size / 16 || a_size <= BLIT_OUT) break;\n\n\t\t\ta_size = 0;\n\t\t\tcontinue;\n\t\t}\n\n\t\ta_size = FUNC(blit_default_partition)(array, swap, &piv, swap_size, nmemb, cmp);\n\t\ts_size = nmemb - a_size;\n\n\t\tif (a_size <= s_size / 16 || s_size <= BLIT_OUT)\n\t\t{\n\t\t\tif (s_size == 0)\n\t\t\t{\n\t\t\t\ta_size = FUNC(blit_reverse_partition)(array, swap, &piv, swap_size, a_size, cmp);\n\t\t\t\ts_size = nmemb - a_size;\n\t\t\t\tnmemb = a_size;\n\n\t\t\t\tif (s_size <= a_size / 16 || a_size <= BLIT_OUT) break;\n\n\t\t\t\ta_size = 0;\n\t\t\t\tcontinue;\n\t\t\t}\n\t\t\tFUNC(quadsort_swap)(array + a_size, swap, swap_size, s_size, cmp);\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(blit_partition)(array + a_size, swap, swap_size, s_size, cmp);\n\t\t}\n\t\tnmemb = a_size;\n\n\t\tif (s_size <= a_size / 16 || a_size <= BLIT_OUT) break;\n\n\t\tmax = piv;\n\t}\n\tFUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp);\n}\n\nvoid FUNC(blitsort)(void *array, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb <= 132)\n\t{\n\t\tFUNC(quadsort)(array, nmemb, cmp);\n\t}\n\telse\n\t{\n\t\tVAR *pta = (VAR *) array;\n#if BLIT_AUX\n\t\tsize_t swap_size = BLIT_AUX;\n#else\n\t\tsize_t swap_size = 1 << 19;\n\n\t\twhile (nmemb / 
swap_size < swap_size / 128)\n\t\t{\n\t\t\tswap_size /= 4;\n\t\t}\n#endif\n\t\tVAR swap[swap_size];\n\n\t\tFUNC(blit_analyze)(pta, swap, swap_size, nmemb, cmp);\n\t}\n}\n\nvoid FUNC(blitsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb <= 132)\n\t{\n\t\tFUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp);\n\t}\n\telse\n\t{\n\t\tVAR *pta = (VAR *) array;\n\t\tVAR *pts = (VAR *) swap;\n\n\t\tFUNC(blit_analyze)(pta, pts, swap_size, nmemb, cmp);\n\t}\n}\n\n#undef BLIT_AUX\n#undef BLIT_OUT\n"
  },
  {
    "path": "src/blitsort.h",
    "content": "// blitsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n#ifndef BLITSORT_H\n#define BLITSORT_H\n\n#include <stdlib.h>\n#include <stdio.h>\n#include <assert.h>\n#include <errno.h>\n#include <stdalign.h>\n#include <float.h>\n#include <string.h>\n\ntypedef int CMPFUNC (const void *a, const void *b);\n\n//#define cmp(a,b) (*(a) > *(b))\n\n#ifndef QUADSORT_H\n  #include \"quadsort.h\"\n#endif\n\n// When sorting an array of pointers, like a string array, the QUAD_CACHE needs\n// to be set for proper performance when sorting large arrays.\n// quadsort_prim() can be used to sort arrays of 32 and 64 bit integers\n// without a comparison function or cache restrictions.\n\n// With a 6 MB L3 cache a value of 262144 works well.\n\n#ifdef cmp\n  #define QUAD_CACHE 4294967295\n#else\n//#define QUAD_CACHE 131072\n  #define QUAD_CACHE 262144\n//#define QUAD_CACHE 524288\n//#define QUAD_CACHE 4294967295\n#endif\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │       ██████┐ ██████┐    ██████┐ ██████┐████████┐ │//\n// │       └────██┐└────██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │        █████┌┘ █████┌┘   ██████┌┘  ██│     ██│    │//\n// │        └───██┐██┌───┘    ██┌──██┐  ██│     ██│    │//\n// │       ██████┌┘███████┐   ██████┌┘██████┐   ██│    │//\n// │       └─────┘ └──────┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR int\n#define FUNC(NAME) NAME##32\n\n#include \"blitsort.c\"\n\n#undef VAR\n#undef FUNC\n\n// blitsort_prim\n\n#define VAR int\n#define FUNC(NAME) NAME##_int32\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"blitsort.c\"\n  #undef cmp\n#else\n  #include \"blitsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n#define VAR unsigned int\n#define FUNC(NAME) NAME##_uint32\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"blitsort.c\"\n  
#undef cmp\n#else\n  #include \"blitsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │        █████┐ ██┐  ██┐   ██████┐ ██████┐████████┐ │//\n// │       ██┌───┘ ██│  ██│   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │       ██████┐ ███████│   ██████┌┘  ██│     ██│    │//\n// │       ██┌──██┐└────██│   ██┌──██┐  ██│     ██│    │//\n// │       └█████┌┘     ██│   ██████┌┘██████┐   ██│    │//\n// │        └────┘      └─┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR long long\n#define FUNC(NAME) NAME##64\n\n#include \"blitsort.c\"\n\n#undef VAR\n#undef FUNC\n\n// blitsort_prim\n\n#define VAR long long\n#define FUNC(NAME) NAME##_int64\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"blitsort.c\"\n  #undef cmp\n#else\n  #include \"blitsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n#define VAR unsigned long long\n#define FUNC(NAME) NAME##_uint64\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"blitsort.c\"\n  #undef cmp\n#else\n  #include \"blitsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n// This section is outside of 32/64 bit pointer territory, so no cache checks\n// necessary, unless sorting 32+ byte structures.\n\n#undef QUAD_CACHE\n#define QUAD_CACHE 4294967295\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│                █████┐    ██████┐ ██████┐████████┐  │//\n//│               ██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘  │//\n//│               └█████┌┘   ██████┌┘  ██│     ██│     │//\n//│               ██┌──██┐   ██┌──██┐  ██│     ██│     │//\n//│               └█████┌┘   ██████┌┘██████┐   ██│     │//\n//│                └────┘    └─────┘ └─────┘   └─┘     
│//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR char\n#define FUNC(NAME) NAME##8\n\n#include \"blitsort.c\"\n\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│           ▄██┐   █████┐    ██████┐ ██████┐████████┐│//\n//│          ████│  ██┌───┘    ██┌──██┐└─██┌─┘└──██┌──┘│//\n//│          └─██│  ██████┐    ██████┌┘  ██│     ██│   │//\n//│            ██│  ██┌──██┐   ██┌──██┐  ██│     ██│   │//\n//│          ██████┐└█████┌┘   ██████┌┘██████┐   ██│   │//\n//│          └─────┘ └────┘    └─────┘ └─────┘   └─┘   │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR short\n#define FUNC(NAME) NAME##16\n\n#include \"blitsort.c\"\n\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│  ▄██┐  ██████┐  █████┐    ██████┐ ██████┐████████┐ │//\n//│ ████│  └────██┐██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n//│ └─██│   █████┌┘└█████┌┘   ██████┌┘  ██│     ██│    │//\n//│   ██│  ██┌───┘ ██┌──██┐   ██┌──██┐  ██│     ██│    │//\n//│ ██████┐███████┐└█████┌┘   ██████┌┘██████┐   ██│    │//\n//│ └─────┘└──────┘ └────┘    └─────┘ └─────┘   └─┘    │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n// 128 reflects the name, though the actual size is 80, 96, or 128 bits,\n// depending on platform.\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n  #define VAR long double\n  #define FUNC(NAME) NAME##128\n    #include \"blitsort.c\"\n  #undef VAR\n  #undef FUNC\n#endif\n\n///////////////////////////////////////////////////////////\n//┌─────────────────────────────────────────────────────┐//\n//│ ██████┐██┐   ██┐███████┐████████┐ ██████┐ ███┐  ███┐│//\n//│██┌────┘██│  
 ██│██┌────┘└──██┌──┘██┌───██┐████┐████││//\n//│██│     ██│   ██│███████┐   ██│   ██│   ██│██┌███┌██││//\n//│██│     ██│   ██│└────██│   ██│   ██│   ██│██│└█┌┘██││//\n//│└██████┐└██████┌┘███████│   ██│   └██████┌┘██│ └┘ ██││//\n//│ └─────┘ └─────┘ └──────┘   └─┘    └─────┘ └─┘    └─┘│//\n//└─────────────────────────────────────────────────────┘//\n///////////////////////////////////////////////////////////\n\n/*\ntypedef struct {char bytes[32];} struct256;\n#define VAR struct256\n#define FUNC(NAME) NAME##256\n\n#include \"blitsort.c\"\n\n#undef VAR\n#undef FUNC\n*/\n\n /////////////////////////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────────────────────────┐//\n//│   ██████┐ ██┐     ██████┐████████┐███████┐ ██████┐ ██████┐ ████████┐   │//\n//│   ██┌──██┐██│     └─██┌─┘└──██┌──┘██┌────┘██┌───██┐██┌──██┐└──██┌──┘   │//\n//│   ██████┌┘██│       ██│     ██│   ███████┐██│   ██│██████┌┘   ██│      │//\n//│   ██┌──██┐██│       ██│     ██│   └────██│██│   ██│██┌──██┐   ██│      │//\n//│   ██████┌┘███████┐██████┐   ██│   ███████│└██████┌┘██│  ██│   ██│      │//\n//│   └─────┘ └──────┘└─────┘   └─┘   └──────┘ └─────┘ └─┘  └─┘   └─┘      │//\n//└────────────────────────────────────────────────────────────────────────┘//\n/////////////////////////////////////////////////////////////////////////////\n\nvoid blitsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase sizeof(char):\n\t\t\tblitsort8(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(short):\n\t\t\tblitsort16(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(int):\n\t\t\tblitsort32(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(long long):\n\t\t\tblitsort64(array, nmemb, cmp);\n\t\t\treturn;\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n\t\tcase sizeof(long double):\n\t\t\tblitsort128(array, nmemb, cmp);\n\t\t\treturn;\n#endif\n//\t\tcase 
sizeof(struct256):\n//\t\t\tblitsort256(array, nmemb, cmp);\n//\t\t\treturn;\n\n\t\tdefault:\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double));\n#else\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long));\n#endif\n//\t\t\tqsort(array, nmemb, size, cmp);\n\t}\n}\n\n// suggested size values for primitives:\n\n//\t\tcase  0: unsigned char\n//\t\tcase  1: signed char\n//\t\tcase  2: signed short\n//\t\tcase  3: unsigned short\n//\t\tcase  4: signed int\n//\t\tcase  5: unsigned int\n//\t\tcase  6: float\n//\t\tcase  7: double\n//\t\tcase  8: signed long long\n//\t\tcase  9: unsigned long long\n//\t\tcase  ?: long double, use sizeof(long double):\n\nvoid blitsort_prim(void *array, size_t nmemb, size_t size)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase 4:\n\t\t\tblitsort_int32(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 5:\n\t\t\tblitsort_uint32(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 8:\n\t\t\tblitsort_int64(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 9:\n\t\t\tblitsort_uint64(array, nmemb, NULL);\n\t\t\treturn;\n\t\tdefault:\n\t\t\tassert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1);\n\t\t\treturn;\n\t}\n}\n\n#undef QUAD_CACHE\n\n#endif\n"
  },
  {
    "path": "src/crumsort.c",
    "content": "// crumsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n#define CRUM_AUX  512\n#define CRUM_OUT   96\n\nvoid FUNC(fulcrum_partition)(VAR *array, VAR *swap, VAR *max, size_t swap_size, size_t nmemb, CMPFUNC *cmp);\n\nvoid FUNC(crum_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tunsigned char loop, asum, bsum, csum, dsum;\n\tunsigned int astreaks, bstreaks, cstreaks, dstreaks;\n\tsize_t quad1, quad2, quad3, quad4, half1, half2;\n\tsize_t cnt, abalance, bbalance, cbalance, dbalance;\n\tVAR *pta, *ptb, *ptc, *ptd;\n\n\thalf1 = nmemb / 2;\n\tquad1 = half1 / 2;\n\tquad2 = half1 - quad1;\n\thalf2 = nmemb - half1;\n\tquad3 = half2 / 2;\n\tquad4 = half2 - quad3;\n\n\tpta = array;\n\tptb = array + quad1;\n\tptc = array + half1;\n\tptd = array + half1 + quad3;\n\n\tastreaks = bstreaks = cstreaks = dstreaks = 0;\n\tabalance = bbalance = cbalance = dbalance = 0;\n\n\tfor (cnt = nmemb ; cnt > 132 ; cnt -= 128)\n\t{\n\t\tfor (asum = bsum = csum = dsum = 0, loop = 32 ; loop ; loop--)\n\t\t{\n\t\t\tasum += cmp(pta, pta + 1) > 0; pta++;\n\t\t\tbsum += cmp(ptb, ptb + 1) > 0; ptb++;\n\t\t\tcsum += cmp(ptc, ptc + 1) > 0; ptc++;\n\t\t\tdsum += cmp(ptd, ptd + 1) > 0; ptd++;\n\t\t}\n\t\tabalance += asum; astreaks += asum = (asum == 0) | (asum == 32);\n\t\tbbalance += bsum; bstreaks += bsum = (bsum == 0) | (bsum == 32);\n\t\tcbalance += csum; cstreaks += csum = (csum == 0) | (csum == 32);\n\t\tdbalance += dsum; dstreaks += dsum = (dsum == 0) | (dsum == 32);\n\n\t\tif (cnt > 516 && asum + bsum + csum + dsum == 0)\n\t\t{\n\t\t\tabalance += 48; pta += 96;\n\t\t\tbbalance += 48; ptb += 96;\n\t\t\tcbalance += 48; ptc += 96;\n\t\t\tdbalance += 48; ptd += 96;\n\t\t\tcnt -= 384;\n\t\t}\n\t}\n\n\tfor ( ; cnt > 7 ; cnt -= 4)\n\t{\n\t\tabalance += cmp(pta, pta + 1) > 0; pta++;\n\t\tbbalance += cmp(ptb, ptb + 1) > 0; ptb++;\n\t\tcbalance += cmp(ptc, ptc + 1) > 0; ptc++;\n\t\tdbalance += cmp(ptd, ptd + 1) > 0; ptd++;\n\t}\n\n\tif (quad1 < 
quad2) {bbalance += cmp(ptb, ptb + 1) > 0; ptb++;}\n\tif (quad1 < quad3) {cbalance += cmp(ptc, ptc + 1) > 0; ptc++;}\n\tif (quad1 < quad4) {dbalance += cmp(ptd, ptd + 1) > 0; ptd++;}\n\n\tcnt = abalance + bbalance + cbalance + dbalance;\n\n\tif (cnt == 0)\n\t{\n\t\tif (cmp(pta, pta + 1) <= 0 && cmp(ptb, ptb + 1) <= 0 && cmp(ptc, ptc + 1) <= 0)\n\t\t{\n\t\t\treturn;\n\t\t}\n\t}\n\n\tasum = quad1 - abalance == 1;\n\tbsum = quad2 - bbalance == 1;\n\tcsum = quad3 - cbalance == 1;\n\tdsum = quad4 - dbalance == 1;\n\n\tif (asum | bsum | csum | dsum)\n\t{\n\t\tunsigned char span1 = (asum && bsum) * (cmp(pta, pta + 1) > 0);\n\t\tunsigned char span2 = (bsum && csum) * (cmp(ptb, ptb + 1) > 0);\n\t\tunsigned char span3 = (csum && dsum) * (cmp(ptc, ptc + 1) > 0);\n\n\t\tswitch (span1 | span2 * 2 | span3 * 4)\n\t\t{\n\t\t\tcase 0: break;\n\t\t\tcase 1: FUNC(quad_reversal)(array, ptb);   abalance = bbalance = 0; break;\n\t\t\tcase 2: FUNC(quad_reversal)(pta + 1, ptc); bbalance = cbalance = 0; break;\n\t\t\tcase 3: FUNC(quad_reversal)(array, ptc);   abalance = bbalance = cbalance = 0; break;\n\t\t\tcase 4: FUNC(quad_reversal)(ptb + 1, ptd); cbalance = dbalance = 0; break;\n\t\t\tcase 5: FUNC(quad_reversal)(array, ptb);\n\t\t\t\tFUNC(quad_reversal)(ptb + 1, ptd); abalance = bbalance = cbalance = dbalance = 0; break;\n\t\t\tcase 6: FUNC(quad_reversal)(pta + 1, ptd); bbalance = cbalance = dbalance = 0; break;\n\t\t\tcase 7: FUNC(quad_reversal)(array, ptd); return;\n\t\t}\n\n\t\tif (asum && abalance) {FUNC(quad_reversal)(array,   pta); abalance = 0;}\n\t\tif (bsum && bbalance) {FUNC(quad_reversal)(pta + 1, ptb); bbalance = 0;}\n\t\tif (csum && cbalance) {FUNC(quad_reversal)(ptb + 1, ptc); cbalance = 0;}\n\t\tif (dsum && dbalance) {FUNC(quad_reversal)(ptc + 1, ptd); dbalance = 0;}\n\t}\n\n#ifdef cmp\n\tcnt = nmemb / 256; // switch to quadsort if at least 50% ordered\n#else\n\tcnt = nmemb / 512; // switch to quadsort if at least 25% ordered\n#endif\n\tasum = astreaks > cnt;\n\tbsum = 
bstreaks > cnt;\n\tcsum = cstreaks > cnt;\n\tdsum = dstreaks > cnt;\n\n#ifndef cmp\n\tif (quad1 > QUAD_CACHE)\n\t{\n\t\tasum = bsum = csum = dsum = 1;\n\t}\n#endif\n\tswitch (asum + bsum * 2 + csum * 4 + dsum * 8)\n\t{\n\t\tcase 0:\n\t\t\tFUNC(fulcrum_partition)(array, swap, NULL, swap_size, nmemb, cmp);\n\t\t\treturn;\n\t\tcase 1:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tFUNC(fulcrum_partition)(pta + 1, swap, NULL, swap_size, quad2 + half2, cmp);\n\t\t\tbreak;\n\t\tcase 2:\n\t\t\tFUNC(fulcrum_partition)(array, swap, NULL, swap_size, quad1, cmp);\n\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\tFUNC(fulcrum_partition)(ptb + 1, swap, NULL, swap_size, half2, cmp);\n\t\t\tbreak;\n\t\tcase 3:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\tFUNC(fulcrum_partition)(ptb + 1, swap, NULL, swap_size, half2, cmp);\n\t\t\tbreak;\n\t\tcase 4:\n\t\t\tFUNC(fulcrum_partition)(array, swap, NULL, swap_size, half1, cmp);\n\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\tFUNC(fulcrum_partition)(ptc + 1, swap, NULL, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 8:\n\t\t\tFUNC(fulcrum_partition)(array, swap, NULL, swap_size, half1 + quad3, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 9:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tFUNC(fulcrum_partition)(pta + 1, swap, NULL, swap_size, quad2 + quad3, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 12:\n\t\t\tFUNC(fulcrum_partition)(array, swap, NULL, swap_size, half1, cmp);\n\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, 
cmp);\n\t\t\tbreak;\n\t\tcase 5:\n\t\tcase 6:\n\t\tcase 7:\n\t\tcase 10:\n\t\tcase 11:\n\t\tcase 13:\n\t\tcase 14:\n\t\tcase 15:\n\t\t\tif (asum)\n\t\t\t{\n\t\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\t}\n\t\t\telse FUNC(fulcrum_partition)(array, swap, NULL, swap_size, quad1, cmp);\n\t\t\tif (bsum)\n\t\t\t{\n\t\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\t}\n\t\t\telse FUNC(fulcrum_partition)(pta + 1, swap, NULL, swap_size, quad2, cmp);\n\t\t\tif (csum)\n\t\t\t{\n\t\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\t}\n\t\t\telse FUNC(fulcrum_partition)(ptb + 1, swap, NULL, swap_size, quad3, cmp);\n\t\t\tif (dsum)\n\t\t\t{\n\t\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\t}\n\t\t\telse FUNC(fulcrum_partition)(ptc + 1, swap, NULL, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t}\n\n\tif (cmp(pta, pta + 1) <= 0)\n\t{\n\t\tif (cmp(ptc, ptc + 1) <= 0)\n\t\t{\n\t\t\tif (cmp(ptb, ptb + 1) <= 0)\n\t\t\t{\n\t\t\t\treturn;\n\t\t\t}\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(rotate_merge_block)(array + half1, swap, swap_size, quad3, quad4, cmp);\n\t\t}\n\t}\n\telse\n\t{\n\t\tFUNC(rotate_merge_block)(array, swap, swap_size, quad1, quad2, cmp);\n\n\t\tif (cmp(ptc, ptc + 1) > 0)\n\t\t{\n\t\t\tFUNC(rotate_merge_block)(array + half1, swap, swap_size, quad3, quad4, cmp);\n\t\t}\n\t}\n\tFUNC(rotate_merge_block)(array, swap, swap_size, half1, half2, cmp);\n}\n\n// The next 4 functions are used for pivot selection\n\nVAR *FUNC(crum_binary_median)(VAR *pta, VAR *ptb, size_t len, CMPFUNC *cmp)\n{\n\twhile (len /= 2)\n\t{\n\t\tif (cmp(pta + len, ptb + len) <= 0) pta += len; else ptb += len;\n\t}\n\treturn cmp(pta, ptb) > 0 ? 
pta : ptb;\n}\n\nVAR *FUNC(crum_median_of_cbrt)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, int *generic, CMPFUNC *cmp)\n{\n\tVAR *pta, *piv;\n\tsize_t cnt, cbrt, div;\n\n\tfor (cbrt = 32 ; nmemb > cbrt * cbrt * cbrt && cbrt < swap_size ; cbrt *= 2) {}\n\n\tdiv = nmemb / cbrt;\n\n\tpta = array + nmemb - 1 - (size_t) &div / 64 % div;\n\tpiv = array + cbrt;\n\n\tfor (cnt = cbrt ; cnt ; cnt--)\n\t{\n\t\tswap[0] = *--piv; *piv = *pta; *pta = swap[0];\n\n\t\tpta -= div;\n\t}\n\n\tcbrt /= 2;\n\n\tFUNC(quadsort_swap)(piv, swap, swap_size, cbrt, cmp);\n\tFUNC(quadsort_swap)(piv + cbrt, swap, swap_size, cbrt, cmp);\n\n\t*generic = (cmp(piv + cbrt * 2 - 1, piv) <= 0) & (cmp(piv + cbrt - 1, piv) <= 0);\n\n\treturn FUNC(crum_binary_median)(piv, piv + cbrt, cbrt, cmp);\n}\n\nsize_t FUNC(crum_median_of_three)(VAR *array, size_t v0, size_t v1, size_t v2, CMPFUNC *cmp)\n{\n\tsize_t v[3] = {v0, v1, v2};\n\tchar x, y, z;\n\n\tx = cmp(array + v0, array + v1) > 0;\n\ty = cmp(array + v0, array + v2) > 0;\n\tz = cmp(array + v1, array + v2) > 0;\n\n\treturn v[(x == y) + (y ^ z)];\n}\n\nVAR *FUNC(crum_median_of_nine)(VAR *array, size_t nmemb, CMPFUNC *cmp)\n{\n\tsize_t x, y, z, div = nmemb / 16;\n\n\tx = FUNC(crum_median_of_three)(array, div * 2, div * 1, div * 4, cmp);\n\ty = FUNC(crum_median_of_three)(array, div * 8, div * 6, div * 10, cmp);\n\tz = FUNC(crum_median_of_three)(array, div * 14, div * 12, div * 15, cmp);\n\n\treturn array + FUNC(crum_median_of_three)(array, x, y, z, cmp);\n}\n\nsize_t FUNC(fulcrum_default_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tsize_t i, cnt, val, m = 0;\n\tVAR *ptl, *ptr, *pta, *tpa;\n\n\tmemcpy(swap, array, 32 * sizeof(VAR));\n\tmemcpy(swap + 32, array + nmemb - 32, 32 * sizeof(VAR));\n\n\tptl = array;\n\tptr = array + nmemb - 1;\n\n\tpta = array + 32;\n\ttpa = array + nmemb - 33;\n\n\tcnt = nmemb / 16 - 4;\n\n\twhile (1)\n\t{\n\t\tif (pta - ptl - m <= 48)\n\t\t{\n\t\t\tif (cnt-- 
== 0) break;\n\n\t\t\tfor (i = 16 ; i ; i--)\n\t\t\t{\n\t\t\t\tval = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t\t\t}\n\t\t}\n\t\tif (pta - ptl - m >= 16)\n\t\t{\n\t\t\tif (cnt-- == 0) break;\n\n\t\t\tfor (i = 16 ; i ; i--)\n\t\t\t{\n\t\t\t\tval = cmp(tpa, piv) <= 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--;\n\t\t\t}\n\t\t}\n\t}\n\n\tif (pta - ptl - m <= 48)\n\t{\n\t\tfor (cnt = nmemb % 16 ; cnt ; cnt--)\n\t\t{\n\t\t\tval = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t\t}\n\t}\n\telse\n\t{\n\t\tfor (cnt = nmemb % 16 ; cnt ; cnt--)\n\t\t{\n\t\t\tval = cmp(tpa, piv) <= 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--;\n\t\t}\n\t}\n\tpta = swap;\n\n\tfor (cnt = 16 ; cnt ; cnt--)\n\t{\n\t\tval = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t\tval = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t\tval = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t\tval = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t}\n\treturn m;\n}\n\n// As per suggestion by Marshall Lochbaum to improve generic data handling by mimicking dual-pivot quicksort\n\nsize_t FUNC(fulcrum_reverse_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tsize_t i, cnt, val, m = 0;\n\tVAR *ptl, *ptr, *pta, *tpa;\n\n\tmemcpy(swap, array, 32 * sizeof(VAR));\n\tmemcpy(swap + 32, array + nmemb - 32, 32 * sizeof(VAR));\n\n\tptl = array;\n\tptr = array + nmemb - 1;\n\n\tpta = array + 32;\n\ttpa = array + nmemb - 33;\n\n\tcnt = nmemb / 16 - 4;\n\n\twhile (1)\n\t{\n\t\tif (pta - ptl - m <= 48)\n\t\t{\n\t\t\tif (cnt-- == 0) break;\n\n\t\t\tfor (i = 16 ; i ; i--)\n\t\t\t{\n\t\t\t\tval = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t\t\t}\n\t\t}\n\t\tif (pta - ptl - m >= 16)\n\t\t{\n\t\t\tif (cnt-- == 0) break;\n\n\t\t\tfor (i = 16 ; i ; i--)\n\t\t\t{\n\t\t\t\tval = cmp(piv, tpa) > 0; ptl[m] = ptr[m] = *tpa--; m += val; 
ptr--;\n\t\t\t}\n\t\t}\n\t}\n\n\tif (pta - ptl - m <= 48)\n\t{\n\t\tfor (cnt = nmemb % 16 ; cnt ; cnt--)\n\t\t{\n\t\t\tval = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t\t}\n\t}\n\telse\n\t{\n\t\tfor (cnt = nmemb % 16 ; cnt ; cnt--)\n\t\t{\n\t\t\tval = cmp(piv, tpa) > 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--;\n\t\t}\n\t}\n\tpta = swap;\n\n\tfor (cnt = 16 ; cnt ; cnt--)\n\t{\n\t\tval = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t\tval = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t\tval = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t\tval = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;\n\t}\n\treturn m;\n}\n\nvoid FUNC(fulcrum_partition)(VAR *array, VAR *swap, VAR *max, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tsize_t a_size, s_size;\n\tVAR *ptp, piv;\n\tint generic = 0;\n\n\twhile (1)\n\t{\n\t\tif (nmemb <= 2048)\n\t\t{\n\t\t\tptp = FUNC(crum_median_of_nine)(array, nmemb, cmp);\n\t\t}\n\t\telse\n\t\t{\n\t\t\tptp = FUNC(crum_median_of_cbrt)(array, swap, swap_size, nmemb, &generic, cmp);\n\n\t\t\tif (generic) break;\n\t\t}\n\t\tpiv = *ptp;\n\n\t\tif (max && cmp(max, &piv) <= 0)\n\t\t{\n\t\t\ta_size = FUNC(fulcrum_reverse_partition)(array, swap, array, &piv, swap_size, nmemb, cmp);\n\t\t\ts_size = nmemb - a_size;\n\t\t\tnmemb = a_size;\n\n\t\t\tif (s_size <= a_size / 32 || a_size <= CRUM_OUT) break;\n\n\t\t\tmax = NULL;\n\t\t\tcontinue;\n\t\t}\n\t\t*ptp = array[--nmemb];\n\n\t\ta_size = FUNC(fulcrum_default_partition)(array, swap, array, &piv, swap_size, nmemb, cmp);\n\t\ts_size = nmemb - a_size;\n\n\t\tptp = array + a_size; array[nmemb] = *ptp; *ptp = piv;\n\n\t\tif (a_size <= s_size / 32 || s_size <= CRUM_OUT)\n\t\t{\n\t\t\tFUNC(quadsort_swap)(ptp + 1, swap, swap_size, s_size, cmp);\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(fulcrum_partition)(ptp + 1, swap, max, swap_size, s_size, cmp);\n\t\t}\n\t\tnmemb = a_size;\n\n\t\tif (s_size <= a_size / 32 || a_size <= 
CRUM_OUT)\n\t\t{\n\t\t\tif (a_size <= CRUM_OUT) break;\n\n\t\t\ta_size = FUNC(fulcrum_reverse_partition)(array, swap, array, &piv, swap_size, nmemb, cmp);\n\t\t\ts_size = nmemb - a_size;\n\t\t\tnmemb = a_size;\n\n\t\t\tif (s_size <= a_size / 32 || a_size <= CRUM_OUT) break;\n\n\t\t\tmax = NULL;\n\t\t\tcontinue;\n\t\t}\n\t\tmax = ptp;\n\t}\n\tFUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp);\n}\n\nvoid FUNC(crumsort)(void *array, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb <= 256)\n\t{\n\t\tVAR swap[nmemb];\n\n\t\tFUNC(quadsort_swap)(array, swap, nmemb, nmemb, cmp);\n\n\t\treturn;\n\t}\n\tVAR *pta = (VAR *) array;\n#if CRUM_AUX\n\tsize_t swap_size = CRUM_AUX;\n#else\n\tsize_t swap_size = 128;\n\n\twhile (swap_size * swap_size <= nmemb)\n\t{\n\t\tswap_size *= 4;\n\t}\n#endif\n\tVAR swap[swap_size];\n\n\tFUNC(crum_analyze)(pta, swap, swap_size, nmemb, cmp);\n}\n\nvoid FUNC(crumsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb <= 256)\n\t{\n\t\tFUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp);\n\t}\n\telse\n\t{\n\t\tVAR *pta = (VAR *) array;\n\t\tVAR *pts = (VAR *) swap;\n\n\t\tFUNC(crum_analyze)(pta, pts, swap_size, nmemb, cmp);\n\t}\n}\n"
  },
  {
    "path": "src/crumsort.h",
    "content": "// crumsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n#ifndef CRUMSORT_H\n#define CRUMSORT_H\n\n#include <stdlib.h>\n#include <stdio.h>\n#include <assert.h>\n#include <errno.h>\n#include <stdalign.h>\n#include <float.h>\n#include <string.h>\n\ntypedef int CMPFUNC (const void *a, const void *b);\n\n//#define cmp(a,b) (*(a) > *(b))\n\n#ifndef QUADSORT_H\n  #include \"quadsort.h\"\n#endif\n\n// When sorting an array of pointers, like a string array, the QUAD_CACHE needs\n// to be set for proper performance when sorting large arrays.\n// crumsort_prim() can be used to sort arrays of 32 and 64 bit integers\n// without a comparison function or cache restrictions.\n\n// With a 6 MB L3 cache a value of 262144 works well.\n\n#ifdef cmp\n  #define QUAD_CACHE 4294967295\n#else\n//#define QUAD_CACHE 131072\n  #define QUAD_CACHE 262144\n//#define QUAD_CACHE 524288\n//#define QUAD_CACHE 4294967295\n#endif\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │       ██████┐ ██████┐    ██████┐ ██████┐████████┐ │//\n// │       └────██┐└────██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │        █████┌┘ █████┌┘   ██████┌┘  ██│     ██│    │//\n// │        └───██┐██┌───┘    ██┌──██┐  ██│     ██│    │//\n// │       ██████┌┘███████┐   ██████┌┘██████┐   ██│    │//\n// │       └─────┘ └──────┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR int\n#define FUNC(NAME) NAME##32\n\n#include \"crumsort.c\"\n\n#undef VAR\n#undef FUNC\n\n// crumsort_prim\n\n#define VAR int\n#define FUNC(NAME) NAME##_int32\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"crumsort.c\"\n  #undef cmp\n#else\n  #include \"crumsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n#define VAR unsigned int\n#define FUNC(NAME) NAME##_uint32\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"crumsort.c\"\n  
#undef cmp\n#else\n  #include \"crumsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │        █████┐ ██┐  ██┐   ██████┐ ██████┐████████┐ │//\n// │       ██┌───┘ ██│  ██│   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │       ██████┐ ███████│   ██████┌┘  ██│     ██│    │//\n// │       ██┌──██┐└────██│   ██┌──██┐  ██│     ██│    │//\n// │       └█████┌┘     ██│   ██████┌┘██████┐   ██│    │//\n// │        └────┘      └─┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR long long\n#define FUNC(NAME) NAME##64\n\n#include \"crumsort.c\"\n\n#undef VAR\n#undef FUNC\n\n// crumsort_prim\n\n#define VAR long long\n#define FUNC(NAME) NAME##_int64\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"crumsort.c\"\n  #undef cmp\n#else\n  #include \"crumsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n#define VAR unsigned long long\n#define FUNC(NAME) NAME##_uint64\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"crumsort.c\"\n  #undef cmp\n#else\n  #include \"crumsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n// This section is outside of 32/64 bit pointer territory, so no cache checks\n// necessary, unless sorting 32+ byte structures.\n\n#undef QUAD_CACHE\n#define QUAD_CACHE 4294967295\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│                █████┐    ██████┐ ██████┐████████┐  │//\n//│               ██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘  │//\n//│               └█████┌┘   ██████┌┘  ██│     ██│     │//\n//│               ██┌──██┐   ██┌──██┐  ██│     ██│     │//\n//│               └█████┌┘   ██████┌┘██████┐   ██│     │//\n//│                └────┘    └─────┘ └─────┘   └─┘     
│//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR char\n#define FUNC(NAME) NAME##8\n\n#include \"crumsort.c\"\n\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│           ▄██┐   █████┐    ██████┐ ██████┐████████┐│//\n//│          ████│  ██┌───┘    ██┌──██┐└─██┌─┘└──██┌──┘│//\n//│          └─██│  ██████┐    ██████┌┘  ██│     ██│   │//\n//│            ██│  ██┌──██┐   ██┌──██┐  ██│     ██│   │//\n//│          ██████┐└█████┌┘   ██████┌┘██████┐   ██│   │//\n//│          └─────┘ └────┘    └─────┘ └─────┘   └─┘   │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR short\n#define FUNC(NAME) NAME##16\n\n#include \"crumsort.c\"\n\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│  ▄██┐  ██████┐  █████┐    ██████┐ ██████┐████████┐ │//\n//│ ████│  └────██┐██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n//│ └─██│   █████┌┘└█████┌┘   ██████┌┘  ██│     ██│    │//\n//│   ██│  ██┌───┘ ██┌──██┐   ██┌──██┐  ██│     ██│    │//\n//│ ██████┐███████┐└█████┌┘   ██████┌┘██████┐   ██│    │//\n//│ └─────┘└──────┘ └────┘    └─────┘ └─────┘   └─┘    │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n// 128 reflects the name, though the actual size of a long double is 64, 80,\n// 96, or 128 bits, depending on platform.\n\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n  #define VAR long double\n  #define FUNC(NAME) NAME##128\n    #include \"crumsort.c\"\n  #undef VAR\n  #undef FUNC\n#endif\n\n///////////////////////////////////////////////////////////\n//┌─────────────────────────────────────────────────────┐//\n//│ ██████┐██┐   ██┐███████┐████████┐ ██████┐ ███┐  
███┐│//\n//│██┌────┘██│   ██│██┌────┘└──██┌──┘██┌───██┐████┐████││//\n//│██│     ██│   ██│███████┐   ██│   ██│   ██│██┌███┌██││//\n//│██│     ██│   ██│└────██│   ██│   ██│   ██│██│└█┌┘██││//\n//│└██████┐└██████┌┘███████│   ██│   └██████┌┘██│ └┘ ██││//\n//│ └─────┘ └─────┘ └──────┘   └─┘    └─────┘ └─┘    └─┘│//\n//└─────────────────────────────────────────────────────┘//\n///////////////////////////////////////////////////////////\n\n/*\ntypedef struct {char bytes[32];} struct256;\n#define VAR struct256\n#define FUNC(NAME) NAME##256\n\n#include \"crumsort.c\"\n\n#undef VAR\n#undef FUNC\n*/\n\n //////////////////////////////////////////////////////////////////////////\n//┌─────────────────────────────────────────────────────────────────────┐//\n//│ ██████┐██████┐ ██┐   ██┐███┐  ███┐███████┐ ██████┐ ██████┐ ████████┐│//\n//│██┌────┘██┌──██┐██│   ██│████┐████│██┌────┘██┌───██┐██┌──██┐└──██┌──┘│//\n//│██│     ██████┌┘██│   ██│██┌███┌██│███████┐██│   ██│██████┌┘   ██│   │//\n//│██│     ██┌──██┐██│   ██│██│└█┌┘██│└────██│██│   ██│██┌──██┐   ██│   │//\n//│└██████┐██│  ██│└██████┌┘██│ └┘ ██│███████│└██████┌┘██│  ██│   ██│   │//\n//│ └─────┘└─┘  └─┘ └─────┘ └─┘    └─┘└──────┘ └─────┘ └─┘  └─┘   └─┘   │//\n//└─────────────────────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////////////////////\n\nvoid crumsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase sizeof(char):\n\t\t\tcrumsort8(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(short):\n\t\t\tcrumsort16(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(int):\n\t\t\tcrumsort32(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(long long):\n\t\t\tcrumsort64(array, nmemb, cmp);\n\t\t\treturn;\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n\t\tcase sizeof(long double):\n\t\t\tcrumsort128(array, nmemb, cmp);\n\t\t\treturn;\n#endif\n//\t\tcase 
sizeof(struct256):\n//\t\t\tcrumsort256(array, nmemb, cmp);\n//\t\t\treturn;\n\n\t\tdefault:\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double));\n#else\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long));\n#endif\n//\t\t\tqsort(array, nmemb, size, cmp);\n\t}\n}\n\n// suggested size values for primitives:\n\n//\t\tcase  0: unsigned char\n//\t\tcase  1: signed char\n//\t\tcase  2: signed short\n//\t\tcase  3: unsigned short\n//\t\tcase  4: signed int\n//\t\tcase  5: unsigned int\n//\t\tcase  6: float\n//\t\tcase  7: double\n//\t\tcase  8: signed long long\n//\t\tcase  9: unsigned long long\n//\t\tcase  ?: long double, use sizeof(long double):\n\nvoid crumsort_prim(void *array, size_t nmemb, size_t size)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase 4:\n\t\t\tcrumsort_int32(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 5:\n\t\t\tcrumsort_uint32(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 8:\n\t\t\tcrumsort_int64(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 9:\n\t\t\tcrumsort_uint64(array, nmemb, NULL);\n\t\t\treturn;\n\t\tdefault:\n\t\t\tassert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1);\n\t\t\treturn;\n\t}\n}\n\n#undef QUAD_CACHE\n\n#endif\n"
  },
  {
    "path": "src/extra_tests.c",
    "content": "#ifdef QUAD_DEBUG\n\n\t// random % 4\n\n\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tr_array[cnt] = rand() % 4;\n\t}\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, \"random % 4\", sizeof(VAR), cmp_int);\n\n\t// semi random\n\n\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tr_array[cnt] = rand() % 8 / 7 * rand();\n\t}\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, \"semi random\", sizeof(VAR), cmp_int);\n\n\t// random signal\n\n\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tif (cnt < mem / 2)\n\t\t{\n\t\t\tr_array[cnt] = cnt + rand() % 16;\n\t\t}\n\t\telse\n\t\t{\n\t\t\tr_array[cnt] = mem - cnt + rand() % 16;\n\t\t}\n\t}\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, \"random signal\", sizeof(VAR), cmp_int);\n\n\t// exponential\n\n\tfor (cnt = 0 ; cnt < mem ; cnt++)\n\t{\n\t\tr_array[cnt] = (size_t) (cnt * cnt) % 10000; //(1 << 30);\n\t}\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, \"exponential\", sizeof(VAR), cmp_int);\n\n\t// random fragments -- Make array 92% sorted\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tr_array[cnt] = rand();\n\t}\n\tquadsort(r_array + quad0, quad1 / 100 * 98, sizeof(VAR), cmp_int);\n\tquadsort(r_array + quad1, quad1 / 100 * 98, sizeof(VAR), cmp_int);\n\tquadsort(r_array + half1, quad1 / 100 * 98, sizeof(VAR), cmp_int);\n\tquadsort(r_array + span3, quad1 / 100 * 98, sizeof(VAR), cmp_int);\n\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"chaos fragments\", sizeof(VAR), cmp_int);\n\n\t// Make array 12% sorted, this tends to make timsort/powersort slower than fully random\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tr_array[cnt] = rand();\n\t}\n\tquadsort(r_array + quad0 / 1, quad1 * 2 / 100, sizeof(VAR), cmp_int);\n\tquadsort(r_array + quad1 / 2, quad1 * 2 / 100, sizeof(VAR), cmp_int);\n\tquadsort(r_array + quad1 / 1, quad1 * 2 / 100, sizeof(VAR), 
cmp_int);\n\tquadsort(r_array + half1 / 1, quad1 * 2 / 100, sizeof(VAR), cmp_int);\n\tquadsort(r_array + span3 / 2, quad1 * 2 / 100, sizeof(VAR), cmp_int);\n\tquadsort(r_array + span3 / 1, quad1 * 2 / 100, sizeof(VAR), cmp_int);\n\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"order fragments\", sizeof(VAR), cmp_int);\n\n\t// Make array 95% generic\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tif (rand() % 20 == 0)\n\t\t{\n\t\t\tr_array[cnt] = rand();\n\t\t}\n\t\telse\n\t\t{\n\t\t\tr_array[cnt] = 1000000000;\n\t\t}\n\t}\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"95% generic\", sizeof(VAR), cmp_int);\n\n\t// Three saws\n\n\tfor (cnt = 0 ; cnt < max ; cnt++)\n\t{\n\t\tr_array[cnt] = rand();\n\t}\n\tquadsort(r_array, max / 3, sizeof(VAR), cmp_int);\n\tquadsort(r_array + max / 3, max / 3, sizeof(VAR), cmp_int);\n\tquadsort(r_array + max / 3 * 2, max / 3, sizeof(VAR), cmp_int);\n\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"three saws\", sizeof(VAR), cmp_int);\n\n\t// various combinations of reverse and ascending order data\n/*\n\tfor (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand();\n\tquadsort(r_array + quad0, half1, sizeof(VAR), cmp_int);\n\tquadsort(r_array + half1, half2, sizeof(VAR), cmp_int);\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"aaaaa aaaaa\", sizeof(VAR), cmp_int);\n\n\tfor (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand();\n\tquadsort(r_array + quad1 / 2, nmemb - quad1 / 2, sizeof(VAR), cmp_int);\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"raaaaaaaaaa\", sizeof(VAR), cmp_int);\n\n\tsize_t span2 = quad2 + quad3 + quad4;\n\n\tfor (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand();\n\tquadsort(r_array + quad1, span2, sizeof(VAR), cmp_int);\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"rr aaaaaaaa\", 
sizeof(VAR), cmp_int);\n\n\tfor (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand();\n\tquadsort(r_array + quad0, quad1, sizeof(VAR), cmp_int);\n\tquadsort(r_array + half1, half2, sizeof(VAR), cmp_int);\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"aa rr aaaaa\", sizeof(VAR), cmp_int);\n\n\tfor (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand();\n\tquadsort(r_array + quad0, half1, sizeof(VAR), cmp_int);\n\tquadsort(r_array + span3, quad4, sizeof(VAR), cmp_int);\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"aaaaa rr aa\", sizeof(VAR), cmp_int);\n\n\tfor (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand();\n\tquadsort(r_array + quad0, nmemb, sizeof(VAR), cmp_int);\n\tqsort(r_array + quad0, half1, sizeof(VAR), cmp_rev);\n\tqsort(r_array + half1, half2, sizeof(VAR), cmp_rev);\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"rrrrr rrrrr\", sizeof(VAR), cmp_int);\n\n\tfor (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand();\n\tquadsort(r_array + quad0, nmemb, sizeof(VAR), cmp_int);\n\tqsort(r_array + quad0, quad1, sizeof(VAR), cmp_rev);\n\tqsort(r_array + quad1, quad2, sizeof(VAR), cmp_rev);\n\tqsort(r_array + half1, quad3, sizeof(VAR), cmp_rev);\n\tqsort(r_array + span3, quad4, sizeof(VAR), cmp_rev);\n\trun_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, \"rr rr rr rr\", sizeof(VAR), cmp_int);\n*/\n#endif\n"
  },
  {
    "path": "src/fluxsort.c",
    "content": "// fluxsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n#define FLUX_OUT 96\n\nvoid FUNC(flux_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *ptp, size_t nmemb, CMPFUNC *cmp);\n\n// Determine whether to use mergesort or quicksort\n\nvoid FUNC(flux_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tunsigned char loop, asum, bsum, csum, dsum;\n\tunsigned int astreaks, bstreaks, cstreaks, dstreaks;\n\tsize_t quad1, quad2, quad3, quad4, half1, half2;\n\tsize_t cnt, abalance, bbalance, cbalance, dbalance;\n\tVAR *pta, *ptb, *ptc, *ptd;\n\n\thalf1 = nmemb / 2;\n\tquad1 = half1 / 2;\n\tquad2 = half1 - quad1;\n\thalf2 = nmemb - half1;\n\tquad3 = half2 / 2;\n\tquad4 = half2 - quad3;\n\n\tpta = array;\n\tptb = array + quad1;\n\tptc = array + half1;\n\tptd = array + half1 + quad3;\n\n\tastreaks = bstreaks = cstreaks = dstreaks = 0;\n\tabalance = bbalance = cbalance = dbalance = 0;\n\n\tif (quad1 < quad2) {bbalance += cmp(ptb, ptb + 1) > 0; ptb++;}\n\tif (quad1 < quad3) {cbalance += cmp(ptc, ptc + 1) > 0; ptc++;}\n\tif (quad1 < quad4) {dbalance += cmp(ptd, ptd + 1) > 0; ptd++;}\n\n\tfor (cnt = nmemb ; cnt > 132 ; cnt -= 128)\n\t{\n\t\tfor (asum = bsum = csum = dsum = 0, loop = 32 ; loop ; loop--)\n\t\t{\n\t\t\tasum += cmp(pta, pta + 1) > 0; pta++;\n\t\t\tbsum += cmp(ptb, ptb + 1) > 0; ptb++;\n\t\t\tcsum += cmp(ptc, ptc + 1) > 0; ptc++;\n\t\t\tdsum += cmp(ptd, ptd + 1) > 0; ptd++;\n\t\t}\n\t\tabalance += asum; astreaks += asum = (asum == 0) | (asum == 32);\n\t\tbbalance += bsum; bstreaks += bsum = (bsum == 0) | (bsum == 32);\n\t\tcbalance += csum; cstreaks += csum = (csum == 0) | (csum == 32);\n\t\tdbalance += dsum; dstreaks += dsum = (dsum == 0) | (dsum == 32);\n\n\t\tif (cnt > 516 && asum + bsum + csum + dsum == 0)\n\t\t{\n\t\t\tabalance += 48; pta += 96;\n\t\t\tbbalance += 48; ptb += 96;\n\t\t\tcbalance += 48; ptc += 96;\n\t\t\tdbalance += 48; ptd += 96;\n\t\t\tcnt -= 384;\n\t\t}\n\t}\n\n\tfor ( ; cnt > 7 ; cnt -= 
4)\n\t{\n\t\tabalance += cmp(pta, pta + 1) > 0; pta++;\n\t\tbbalance += cmp(ptb, ptb + 1) > 0; ptb++;\n\t\tcbalance += cmp(ptc, ptc + 1) > 0; ptc++;\n\t\tdbalance += cmp(ptd, ptd + 1) > 0; ptd++;\n\t}\n\n\tcnt = abalance + bbalance + cbalance + dbalance;\n\n\tif (cnt == 0)\n\t{\n\t\tif (cmp(pta, pta + 1) <= 0 && cmp(ptb, ptb + 1) <= 0 && cmp(ptc, ptc + 1) <= 0)\n\t\t{\n\t\t\treturn;\n\t\t}\n\t}\n\n\tasum = quad1 - abalance == 1;\n\tbsum = quad2 - bbalance == 1;\n\tcsum = quad3 - cbalance == 1;\n\tdsum = quad4 - dbalance == 1;\n\n\tif (asum | bsum | csum | dsum)\n\t{\n\t\tunsigned char span1 = (asum && bsum) * (cmp(pta, pta + 1) > 0);\n\t\tunsigned char span2 = (bsum && csum) * (cmp(ptb, ptb + 1) > 0);\n\t\tunsigned char span3 = (csum && dsum) * (cmp(ptc, ptc + 1) > 0);\n\n\t\tswitch (span1 | span2 * 2 | span3 * 4)\n\t\t{\n\t\t\tcase 0: break;\n\t\t\tcase 1: FUNC(quad_reversal)(array, ptb);   abalance = bbalance = 0; break;\n\t\t\tcase 2: FUNC(quad_reversal)(pta + 1, ptc); bbalance = cbalance = 0; break;\n\t\t\tcase 3: FUNC(quad_reversal)(array, ptc);   abalance = bbalance = cbalance = 0; break;\n\t\t\tcase 4: FUNC(quad_reversal)(ptb + 1, ptd); cbalance = dbalance = 0; break;\n\t\t\tcase 5: FUNC(quad_reversal)(array, ptb);\n\t\t\t\tFUNC(quad_reversal)(ptb + 1, ptd); abalance = bbalance = cbalance = dbalance = 0; break;\n\t\t\tcase 6: FUNC(quad_reversal)(pta + 1, ptd); bbalance = cbalance = dbalance = 0; break;\n\t\t\tcase 7: FUNC(quad_reversal)(array, ptd); return;\n\t\t}\n\t\tif (asum && abalance) {FUNC(quad_reversal)(array,   pta); abalance = 0;}\n\t\tif (bsum && bbalance) {FUNC(quad_reversal)(pta + 1, ptb); bbalance = 0;}\n\t\tif (csum && cbalance) {FUNC(quad_reversal)(ptb + 1, ptc); cbalance = 0;}\n\t\tif (dsum && dbalance) {FUNC(quad_reversal)(ptc + 1, ptd); dbalance = 0;}\n\t}\n\n#ifdef cmp\n\tcnt = nmemb / 256; // switch to quadsort if at least 50% ordered\n#else\n\tcnt = nmemb / 512; // switch to quadsort if at least 25% ordered\n#endif\n\tasum = astreaks > 
cnt;\n\tbsum = bstreaks > cnt;\n\tcsum = cstreaks > cnt;\n\tdsum = dstreaks > cnt;\n\n#ifndef cmp\n\tif (quad1 > QUAD_CACHE)\n\t{\n\t\tasum = bsum = csum = dsum = 1;\n\t}\n#endif\n\n\tswitch (asum + bsum * 2 + csum * 4 + dsum * 8)\n\t{\n\t\tcase 0:\n\t\t\tFUNC(flux_partition)(array, swap, array, swap + nmemb, nmemb, cmp);\n\t\t\treturn;\n\t\tcase 1:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tFUNC(flux_partition)(pta + 1, swap, pta + 1, swap + quad2 + half2, quad2 + half2, cmp);\n\t\t\tbreak;\n\t\tcase 2:\n\t\t\tFUNC(flux_partition)(array, swap, array, swap + quad1, quad1, cmp);\n\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\tFUNC(flux_partition)(ptb + 1, swap, ptb + 1, swap + half2, half2, cmp);\n\t\t\tbreak;\n\t\tcase 3:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\tFUNC(flux_partition)(ptb + 1, swap, ptb + 1, swap + half2, half2, cmp);\n\t\t\tbreak;\n\t\tcase 4:\n\t\t\tFUNC(flux_partition)(array, swap, array, swap + half1, half1, cmp);\n\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\tFUNC(flux_partition)(ptc + 1, swap, ptc + 1, swap + quad4, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 8:\n\t\t\tFUNC(flux_partition)(array, swap, array, swap + half1 + quad3, half1 + quad3, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 9:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tFUNC(flux_partition)(pta + 1, swap, pta + 1, swap + quad2 + quad3, quad2 + quad3, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 12:\n\t\t\tFUNC(flux_partition)(array, swap, array, swap + half1, half1, cmp);\n\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\tif (dbalance) 
FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 5:\n\t\tcase 6:\n\t\tcase 7:\n\t\tcase 10:\n\t\tcase 11:\n\t\tcase 13:\n\t\tcase 14:\n\t\tcase 15:\n\t\t\tif (asum)\n\t\t\t{\n\t\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\t}\n\t\t\telse FUNC(flux_partition)(array, swap, array, swap + quad1, quad1, cmp);\n\t\t\tif (bsum)\n\t\t\t{\n\t\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\t}\n\t\t\telse FUNC(flux_partition)(pta + 1, swap, pta + 1, swap + quad2, quad2, cmp);\n\t\t\tif (csum)\n\t\t\t{\n\t\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\t}\n\t\t\telse FUNC(flux_partition)(ptb + 1, swap, ptb + 1, swap + quad3, quad3, cmp);\n\t\t\tif (dsum)\n\t\t\t{\n\t\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\t}\n\t\t\telse FUNC(flux_partition)(ptc + 1, swap, ptc + 1, swap + quad4, quad4, cmp);\n\t\t\tbreak;\n\t}\n\n\tif (cmp(pta, pta + 1) <= 0)\n\t{\n\t\tif (cmp(ptc, ptc + 1) <= 0)\n\t\t{\n\t\t\tif (cmp(ptb, ptb + 1) <= 0)\n\t\t\t{\n\t\t\t\treturn;\n\t\t\t}\n\t\t\tmemcpy(swap, array, nmemb * sizeof(VAR));\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(cross_merge)(swap + half1, array + half1, quad3, quad4, cmp);\n\t\t\tmemcpy(swap, array, half1 * sizeof(VAR));\n\t\t}\n\t}\n\telse\n\t{\n\t\tif (cmp(ptc, ptc + 1) <= 0)\n\t\t{\n\t\t\tmemcpy(swap + half1, array + half1, half2 * sizeof(VAR));\n\t\t\tFUNC(cross_merge)(swap, array, quad1, quad2, cmp);\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(cross_merge)(swap + half1, ptb + 1, quad3, quad4, cmp);\n\t\t\tFUNC(cross_merge)(swap, array, quad1, quad2, cmp);\n\t\t}\n\t}\n\tFUNC(cross_merge)(array, swap, half1, half2, cmp);\n}\n\n// The next 4 functions are used for pivot selection\n\nVAR FUNC(binary_median)(VAR *pta, VAR *ptb, size_t len, CMPFUNC *cmp)\n{\n\twhile (len /= 2)\n\t{\n\t\tif (cmp(pta + len, ptb + len) <= 0) pta += len; else ptb += len;\n\t}\n\treturn 
cmp(pta, ptb) > 0 ? *pta : *ptb;\n}\n\nvoid FUNC(trim_four)(VAR *pta, CMPFUNC *cmp)\n{\n\tVAR swap;\n\tsize_t x;\n\n\tx = cmp(pta, pta + 1)  > 0; swap = pta[!x]; pta[0] = pta[x]; pta[1] = swap; pta += 2;\n\tx = cmp(pta, pta + 1)  > 0; swap = pta[!x]; pta[0] = pta[x]; pta[1] = swap; pta -= 2;\n\n\tx = (cmp(pta, pta + 2) <= 0) * 2; pta[2] = pta[x]; pta++;\n\tx = (cmp(pta, pta + 2)  > 0) * 2; pta[0] = pta[x];\n}\n\nVAR FUNC(median_of_nine)(VAR *array, size_t nmemb, CMPFUNC *cmp)\n{\n\tVAR *pta, swap[9];\n\tsize_t x, y, z;\n\n\tz = nmemb / 9;\n\n\tpta = array;\n\n\tfor (x = 0 ; x < 9 ; x++)\n\t{\n\t\tswap[x] = *pta;\n\n\t\tpta += z;\n\t}\n\n\tFUNC(trim_four)(swap, cmp);\n\tFUNC(trim_four)(swap + 4, cmp);\n\n\tswap[0] = swap[5];\n\tswap[3] = swap[8];\n\n\tFUNC(trim_four)(swap, cmp);\n\n\tswap[0] = swap[6];\n\n\tx = cmp(swap + 0, swap + 1) > 0;\n\ty = cmp(swap + 0, swap + 2) > 0;\n\tz = cmp(swap + 1, swap + 2) > 0;\n\n\treturn swap[(x == y) + (y ^ z)];\n}\n\nVAR FUNC(median_of_cbrt)(VAR *array, VAR *swap, VAR *ptx, size_t nmemb, int *generic, CMPFUNC *cmp)\n{\n\tVAR *pta, *pts;\n\tsize_t cnt, div, cbrt;\n\n\tfor (cbrt = 32 ; nmemb > cbrt * cbrt * cbrt ; cbrt *= 2) {}\n\n\tdiv = nmemb / cbrt;\n\n\tpta = ptx + (size_t) &div / 16 % div;\n\tpts = ptx == array ? 
swap : array;\n\n\tfor (cnt = 0 ; cnt < cbrt ; cnt++)\n\t{\n\t\tpts[cnt] = *pta;\n\n\t\tpta += div;\n\t}\n\tcbrt /= 2;\n\n\tFUNC(quadsort_swap)(pts, pts + cbrt * 2, cbrt, cbrt, cmp);\n\tFUNC(quadsort_swap)(pts + cbrt, pts + cbrt * 2, cbrt, cbrt, cmp);\n\n\t*generic = (cmp(pts + cbrt * 2 - 1, pts) <= 0) & (cmp(pts + cbrt - 1, pts) <= 0);\n\n\treturn FUNC(binary_median)(pts, pts + cbrt, cbrt, cmp);\n}\n\n// As per suggestion by Marshall Lochbaum to improve generic data handling by mimicking dual-pivot quicksort\n\nvoid FUNC(flux_reverse_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t nmemb, CMPFUNC *cmp)\n{\n\tsize_t a_size, s_size;\n\n#if !defined __clang__\n\t{\n\t\tsize_t cnt, m, val;\n\t\tVAR *pts = swap;\n\n\t\tfor (m = 0, cnt = nmemb / 8 ; cnt ; cnt--)\n\t\t{\n\t\t\tval = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++;\n\t\t\tval = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++;\n\t\t\tval = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++;\n\t\t\tval = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++;\n\t\t\tval = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++;\n\t\t\tval = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++;\n\t\t\tval = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++;\n\t\t\tval = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++;\n\t\t}\n\n\t\tfor (cnt = nmemb % 8 ; cnt ; cnt--)\n\t\t{\n\t\t\tval = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++;\n\t\t}\n\t\ta_size = m;\n\t\ts_size = nmemb - a_size;\n\t}\n#else\n\t{\n\t\tsize_t cnt;\n\t\tVAR *tmp, *pta = array, *pts = swap;\n\n\t\tfor (cnt = nmemb / 8 ; cnt ; cnt--)\n\t\t{\n\t\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\t\ttmp = cmp(piv, ptx) > 0 ? 
pta++ : pts++; *tmp = *ptx++;\n\t\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\t}\n\n\t\tfor (cnt = nmemb % 8 ; cnt ; cnt--)\n\t\t{\n\t\t\ttmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\t}\n\t\ta_size = pta - array;\n\t\ts_size = pts - swap;\n\t}\n#endif\n\tmemcpy(array + a_size, swap, s_size * sizeof(VAR));\n\n\tif (s_size <= a_size / 16 || a_size <= FLUX_OUT)\n\t{\n\t\tFUNC(quadsort_swap)(array, swap, a_size, a_size, cmp);\n\t\treturn;\n\t}\n\tFUNC(flux_partition)(array, swap, array, piv, a_size, cmp);\n}\n\nsize_t FUNC(flux_default_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t nmemb, CMPFUNC *cmp)\n{\n\tsize_t run = 0, a = 0, m = 0;\n\n#if !defined __clang__\n\tsize_t val;\n\n\tfor (a = 8 ; a <= nmemb ; a += 8)\n\t{\n\t\tval = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++;\n\t\tval = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++;\n\t\tval = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++;\n\t\tval = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++;\n\t\tval = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++;\n\t\tval = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++;\n\t\tval = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++;\n\t\tval = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++;\n\n\t\tif (m == a) run = a;\n\t}\n\n\tfor (a = nmemb % 8 ; a ; a--)\n\t{\n\t\tval = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++;\n\t}\n\tswap -= nmemb;\n#else\n\tVAR *tmp, *pta = array, *pts = swap;\n\n\tfor (a = 8 ; a <= nmemb ; a += 8)\n\t{\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(ptx, piv) <= 0 ? 
pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\n\t\tif (pta == array || pts == swap) run = a;\n\t}\n\n\tfor (a = nmemb % 8 ; a ; a--)\n\t{\n\t\ttmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++;\n\t}\n\tm = pta - array;\n#endif\n\n\tif (run <= nmemb / 4)\n\t{\n\t\treturn m;\n\t}\n\n\tif (m == nmemb)\n\t{\n\t\treturn m;\n\t}\n\n\ta = nmemb - m;\n\n\tmemcpy(array + m, swap, a * sizeof(VAR));\n\n\tFUNC(quadsort_swap)(array + m, swap, a, a, cmp);\n\tFUNC(quadsort_swap)(array, swap, m, m, cmp);\n\n\treturn 0;\n}\n\nvoid FUNC(flux_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t nmemb, CMPFUNC *cmp)\n{\n\tsize_t a_size = 0, s_size;\n\tint generic = 0;\n\n\twhile (1)\n\t{\n\t\t--piv;\n\n\t\tif (nmemb <= 2048)\n\t\t{\n\t\t\t*piv = FUNC(median_of_nine)(ptx, nmemb, cmp);\n\t\t}\n\t\telse\n\t\t{\n\t\t\t*piv = FUNC(median_of_cbrt)(array, swap, ptx, nmemb, &generic, cmp);\n\n\t\t\tif (generic)\n\t\t\t{\n\t\t\t\tif (ptx == swap)\n\t\t\t\t{\n\t\t\t\t\tmemcpy(array, swap, nmemb * sizeof(VAR));\n\t\t\t\t}\n\t\t\t\tFUNC(quadsort_swap)(array, swap, nmemb, nmemb, cmp);\n\t\t\t\treturn;\n\t\t\t}\n\t\t}\n\n\t\tif (a_size && cmp(piv + 1, piv) <= 0)\n\t\t{\n\t\t\tFUNC(flux_reverse_partition)(array, swap, array, piv, nmemb, cmp);\n\t\t\treturn;\n\t\t}\n\t\ta_size = FUNC(flux_default_partition)(array, swap, ptx, piv, nmemb, cmp);\n\t\ts_size = nmemb - a_size;\n\n\t\tif (a_size <= s_size / 32 || s_size <= FLUX_OUT)\n\t\t{\n\t\t\tif (a_size == 0)\n\t\t\t{\n\t\t\t\treturn;\n\t\t\t}\n\t\t\tif (s_size == 0)\n\t\t\t{\n\t\t\t\tFUNC(flux_reverse_partition)(array, swap, array, piv, a_size, 
cmp);\n\t\t\t\treturn;\n\t\t\t}\n\t\t\tmemcpy(array + a_size, swap, s_size * sizeof(VAR));\n\t\t\tFUNC(quadsort_swap)(array + a_size, swap, s_size, s_size, cmp);\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(flux_partition)(array + a_size, swap, swap, piv, s_size, cmp);\n\t\t}\n\n\t\tif (s_size <= a_size / 32 || a_size <= FLUX_OUT)\n\t\t{\n\t\t\tif (a_size <= FLUX_OUT)\n\t\t\t{\n\t\t\t\tFUNC(quadsort_swap)(array, swap, a_size, a_size, cmp);\n\t\t\t}\n\t\t\telse\n\t\t\t{\n\t\t\t\tFUNC(flux_reverse_partition)(array, swap, array, piv, a_size, cmp);\n\t\t\t}\n\t\t\treturn;\n\t\t}\n\t\tnmemb = a_size;\n\t\tptx = array;\n\t}\n}\n\nvoid FUNC(fluxsort)(void *array, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb <= 132)\n\t{\n\t\tFUNC(quadsort)(array, nmemb, cmp);\n\t}\n\telse\n\t{\n\t\tVAR *pta = (VAR *) array;\n\t\tVAR *swap = (VAR *) malloc(nmemb * sizeof(VAR));\n\n\t\tif (swap == NULL)\n\t\t{\n\t\t\tFUNC(quadsort)(array, nmemb, cmp);\n\t\t\treturn;\n\t\t}\n\t\tFUNC(flux_analyze)(pta, swap, nmemb, nmemb, cmp);\n\n\t\tfree(swap);\n\t}\n}\n\nvoid FUNC(fluxsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb <= 132)\n\t{\n\t\tFUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp);\n\t}\n\telse\n\t{\n\t\tVAR *pta = (VAR *) array;\n\t\tVAR *pts = (VAR *) swap;\n\n\t\tFUNC(flux_analyze)(pta, pts, swap_size, nmemb, cmp);\n\t}\n}\n"
  },
  {
    "path": "src/fluxsort.h",
    "content": "// fluxsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n#ifndef FLUXSORT_H\n#define FLUXSORT_H\n\n#include <stdlib.h>\n#include <stdio.h>\n#include <assert.h>\n#include <errno.h>\n#include <float.h>\n#include <string.h>\n\ntypedef int CMPFUNC (const void *a, const void *b);\n\n//#define cmp(a,b) (*(a) > *(b))\n\n#ifndef QUADSORT_H\n  #include \"quadsort.h\"\n#endif\n\n// When sorting an array of 32/64 bit pointers, like a string array, QUAD_CACHE\n// needs to be adjusted in quadsort.h and here for proper performance when\n// sorting large arrays.\n\n#ifdef cmp\n  #define QUAD_CACHE 4294967295\n#else\n//#define QUAD_CACHE 131072\n  #define QUAD_CACHE 262144\n//#define QUAD_CACHE 524288\n//#define QUAD_CACHE 4294967295\n#endif\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │       ██████┐ ██████┐    ██████┐ ██████┐████████┐ │//\n// │       └────██┐└────██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │        █████┌┘ █████┌┘   ██████┌┘  ██│     ██│    │//\n// │        └───██┐██┌───┘    ██┌──██┐  ██│     ██│    │//\n// │       ██████┌┘███████┐   ██████┌┘██████┐   ██│    │//\n// │       └─────┘ └──────┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR int\n#define FUNC(NAME) NAME##32\n\n#include \"fluxsort.c\"\n\n#undef VAR\n#undef FUNC\n\n// fluxsort_prim\n\n#define VAR int\n#define FUNC(NAME) NAME##_int32\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"fluxsort.c\"\n  #undef cmp\n#else\n  #include \"fluxsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n#define VAR unsigned int\n#define FUNC(NAME) NAME##_uint32\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"fluxsort.c\"\n  #undef cmp\n#else\n  #include \"fluxsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n// 
┌───────────────────────────────────────────────────┐//\n// │        █████┐ ██┐  ██┐   ██████┐ ██████┐████████┐ │//\n// │       ██┌───┘ ██│  ██│   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │       ██████┐ ███████│   ██████┌┘  ██│     ██│    │//\n// │       ██┌──██┐└────██│   ██┌──██┐  ██│     ██│    │//\n// │       └█████┌┘     ██│   ██████┌┘██████┐   ██│    │//\n// │        └────┘      └─┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR long long\n#define FUNC(NAME) NAME##64\n\n#include \"fluxsort.c\"\n\n#undef VAR\n#undef FUNC\n\n// fluxsort_prim\n\n#define VAR long long\n#define FUNC(NAME) NAME##_int64\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"fluxsort.c\"\n  #undef cmp\n#else\n  #include \"fluxsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n#define VAR unsigned long long\n#define FUNC(NAME) NAME##_uint64\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"fluxsort.c\"\n  #undef cmp\n#else\n  #include \"fluxsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n// This section is outside of 32/64 bit pointer territory, so no cache checks\n// necessary, unless sorting 32+ byte structures.\n\n#undef QUAD_CACHE\n#define QUAD_CACHE 4294967295\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│                █████┐    ██████┐ ██████┐████████┐  │//\n//│               ██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘  │//\n//│               └█████┌┘   ██████┌┘  ██│     ██│     │//\n//│               ██┌──██┐   ██┌──██┐  ██│     ██│     │//\n//│               └█████┌┘   ██████┌┘██████┐   ██│     │//\n//│                └────┘    └─────┘ └─────┘   └─┘     │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR char\n#define FUNC(NAME) NAME##8\n\n#include \"fluxsort.c\"\n\n#undef VAR\n#undef 
FUNC\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│           ▄██┐   █████┐    ██████┐ ██████┐████████┐│//\n//│          ████│  ██┌───┘    ██┌──██┐└─██┌─┘└──██┌──┘│//\n//│          └─██│  ██████┐    ██████┌┘  ██│     ██│   │//\n//│            ██│  ██┌──██┐   ██┌──██┐  ██│     ██│   │//\n//│          ██████┐└█████┌┘   ██████┌┘██████┐   ██│   │//\n//│          └─────┘ └────┘    └─────┘ └─────┘   └─┘   │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR short\n#define FUNC(NAME) NAME##16\n\n#include \"fluxsort.c\"\n\n#undef VAR\n#undef FUNC\n\n\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│  ▄██┐  ██████┐  █████┐    ██████┐ ██████┐████████┐ │//\n//│ ████│  └────██┐██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n//│ └─██│   █████┌┘└█████┌┘   ██████┌┘  ██│     ██│    │//\n//│   ██│  ██┌───┘ ██┌──██┐   ██┌──██┐  ██│     ██│    │//\n//│ ██████┐███████┐└█████┌┘   ██████┌┘██████┐   ██│    │//\n//│ └─────┘└──────┘ └────┘    └─────┘ └─────┘   └─┘    │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n  #define VAR long double\n  #define FUNC(NAME) NAME##128\n    #include \"fluxsort.c\"\n  #undef VAR\n  #undef FUNC\n#endif\n\n//////////////////////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────────────────────┐//\n//│███████┐██┐     ██┐   ██┐██┐  ██┐███████┐ ██████┐ ██████┐ ████████┐ │//\n//│██┌────┘██│     ██│   ██│└██┐██┌┘██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │//\n//│█████┐  ██│     ██│   ██│ └███┌┘ ███████┐██│   ██│██████┌┘   ██│    │//\n//│██┌──┘  ██│     ██│   ██│ ██┌██┐ └────██│██│   ██│██┌──██┐   ██│    │//\n//│██│     ███████┐└██████┌┘██┌┘ ██┐███████│└██████┌┘██│  
██│   ██│    │//\n//│└─┘     └──────┘ └─────┘ └─┘  └─┘└──────┘ └─────┘ └─┘  └─┘   └─┘    │//\n//└────────────────────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////////////////////\n\nvoid fluxsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase sizeof(char):\n\t\t\tfluxsort8(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(short):\n\t\t\tfluxsort16(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(int):\n\t\t\tfluxsort32(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(long long):\n\t\t\tfluxsort64(array, nmemb, cmp);\n\t\t\treturn;\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n\t\tcase sizeof(long double):\n\t\t\tfluxsort128(array, nmemb, cmp);\n\t\t\treturn;\n#endif\n\n\t\tdefault:\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double));\n#else\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long));\n#endif\n\t}\n}\n\n// This must match quadsort_prim()\n\nvoid fluxsort_prim(void *array, size_t nmemb, size_t size)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase 4:\n\t\t\tfluxsort_int32(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 5:\n\t\t\tfluxsort_uint32(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 8:\n\t\t\tfluxsort_int64(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 9:\n\t\t\tfluxsort_uint64(array, nmemb, NULL);\n\t\t\treturn;\n\t\tdefault:\n\t\t\tassert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1);\n\t\t\treturn;\n\t}\n}\n\n// Sort arrays of structures, the comparison function must be by reference.\n\nvoid fluxsort_size(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)\n{\n\tchar **pti, *pta, *pts;\n\tsize_t index, offset;\n\n\tpta 
= (char *) array;\n\tpti = (char **) malloc(nmemb * sizeof(char *));\n\n\tassert(pti != NULL);\n\n\t// build an index with one pointer per element\n\n\tfor (index = offset = 0 ; index < nmemb ; index++)\n\t{\n\t\tpti[index] = pta + offset;\n\n\t\toffset += size;\n\t}\n\n\t// sort the pointers, cmp receives the addresses of two pointers\n\n\tswitch (sizeof(size_t))\n\t{\n\t\tcase 4: fluxsort32(pti, nmemb, cmp); break;\n\t\tcase 8: fluxsort64(pti, nmemb, cmp); break;\n\t}\n\n\tpts = (char *) malloc(nmemb * size);\n\n\tassert(pts != NULL);\n\n\t// gather the elements in sorted order, then copy them back\n\n\tfor (index = 0 ; index < nmemb ; index++)\n\t{\n\t\tmemcpy(pts, pti[index], size);\n\n\t\tpts += size;\n\t}\n\tpts -= nmemb * size;\n\n\tmemcpy(array, pts, nmemb * size);\n\n\tfree(pti);\n\tfree(pts);\n}\n\n#undef QUAD_CACHE\n\n#endif\n"
  },
  {
    "path": "src/gridsort.c",
    "content": "// gridsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\nSTRUCT(x_node)\n{\n\tVAR *swap;\n\tsize_t y_size;\n\tsize_t y;\n\tVAR *y_base;\n\tSTRUCT(y_node) **y_axis;\n};\n\nSTRUCT(y_node)\n{\n\tsize_t z_size;\n\tVAR *z_axis1;\n\tVAR *z_axis2;\n};\n\nSTRUCT(x_node) *FUNC(create_grid)(VAR *array, size_t nmemb, CMPFUNC *cmp)\n{\n\tSTRUCT(x_node) *x_node = (STRUCT(x_node) *) malloc(sizeof(STRUCT(x_node)));\n\tSTRUCT(y_node) *y_node;\n\n\tfor (BSC_Z = BSC_X ; BSC_Z * BSC_Z / 4 < nmemb ; BSC_Z *= 4);\n\n\tx_node->swap = (VAR *) malloc(BSC_Z * 2 * sizeof(VAR));\n\n\tx_node->y_base = (VAR *) malloc(BSC_Z * sizeof(VAR));\n\n\tx_node->y_axis = (STRUCT(y_node) **) malloc(BSC_Z * sizeof(STRUCT(y_node) *));\n\n\tFUNC(quadsort_swap)(array, x_node->swap, BSC_Z * 2, BSC_Z * 2, cmp);\n\n\tfor (int cnt = 0 ; cnt < 2 ; cnt++)\n\t{\n\t\ty_node = (STRUCT(y_node) *) malloc(sizeof(STRUCT(y_node)));\n\n\t\ty_node->z_axis1 = (VAR *) malloc(BSC_Z * sizeof(VAR));\n\t\tmemcpy(y_node->z_axis1, array + cnt * BSC_Z, BSC_Z * sizeof(VAR));\n\n\t\ty_node->z_axis2 = (VAR *) malloc(BSC_Z * sizeof(VAR));\n\n\t\ty_node->z_size = 0;\n\n\t\tx_node->y_axis[cnt] = y_node;\n\t\tx_node->y_base[cnt] = y_node->z_axis1[0];\n\t}\n\tx_node->y_size = 2;\n\tx_node->y = 0;\n\n\treturn x_node;\n}\n\n// used by destroy_grid\n\n// y_node->z_axis1 should be sorted and of BSC_Z size.\n// y_node->z_axis2 should be unsorted and of y_node->z_size size.\n\nvoid FUNC(twin_merge_cpy)(STRUCT(x_node) *x_node, VAR *dest, STRUCT(y_node) *y_node, CMPFUNC *cmp)\n{\n\tVAR *ptl = y_node->z_axis1;\n\tVAR *ptr = y_node->z_axis2;\n\tsize_t nmemb1 = BSC_Z;\n\tsize_t nmemb2 = y_node->z_size;\n\tVAR *tpl = y_node->z_axis1 + nmemb1 - 1;\n\tVAR *tpr = y_node->z_axis2 + nmemb2 - 1;\n\tVAR *ptd = dest;\n\tVAR *tpd = dest + nmemb1 + nmemb2 - 1;\n\tsize_t loop, x, y;\n\n\tFUNC(quadsort_swap)(ptr, x_node->swap, nmemb2, nmemb2, cmp);\n\n\twhile (1)\n\t{\n\t\tif (tpl - ptl > 8)\n\t\t{\n\t\t\tptl8_ptr: if (cmp(ptl + 7, ptr) <= 
0)\n\t\t\t{\n\t\t\t\tmemcpy(ptd, ptl, 8 * sizeof(VAR)); ptd += 8; ptl += 8;\n\n\t\t\t\tif (tpl - ptl > 8) {goto ptl8_ptr;} continue;\n\t\t\t}\n\n\t\t\ttpl8_tpr: if (cmp(tpl - 7, tpr) > 0)\n\t\t\t{\n\t\t\t\ttpd -= 7; tpl -= 7; memcpy(tpd--, tpl--, 8 * sizeof(VAR));\n\n\t\t\t\tif (tpl - ptl > 8) {goto tpl8_tpr;} continue;\n\t\t\t}\n\t\t}\n\n\t\tif (tpr - ptr > 8)\n\t\t{\n\t\t\tptl_ptr8: if (cmp(ptl, ptr + 7) > 0)\n\t\t\t{\n\t\t\t\tmemcpy(ptd, ptr, 8 * sizeof(VAR)); ptd += 8; ptr += 8;\n\n\t\t\t\tif (tpr - ptr > 8) {goto ptl_ptr8;} continue;\n\t\t\t}\n\n\t\t\ttpl_tpr8: if (cmp(tpl, tpr - 7) <= 0)\n\t\t\t{\n\t\t\t\ttpd -= 7; tpr -= 7; memcpy(tpd--, tpr--, 8 * sizeof(VAR));\n\n\t\t\t\tif (tpr - ptr > 8) {goto tpl_tpr8;} continue;\n\t\t\t}\n\t\t}\n\n\t\tif (tpd - ptd < 16)\n\t\t{\n\t\t\tbreak;\n\t\t}\n\n\t\tloop = 8; do\n\t\t{\n\t\t\thead_branchless_merge(ptd, x, ptl, ptr, cmp);\n\t\t\ttail_branchless_merge(tpd, y, tpl, tpr, cmp);\n\t\t}\n\t\twhile (--loop);\n\t}\n\n\twhile (tpl - ptl > 1 && tpr - ptr > 1)\n\t{\n\t\tif (cmp(ptl + 1, ptr) <= 0)\n\t\t{\n\t\t\t*ptd++ = *ptl++; *ptd++ = *ptl++;\n\t\t}\n\t\telse if (cmp(ptl, ptr + 1) > 0)\n\t\t{\n\t\t\t*ptd++ = *ptr++; *ptd++ = *ptr++;\n\t\t}\n\t\telse \n\t\t{\n\t\t\tx = cmp(ptl, ptr) <= 0; y = !x; ptd[x] = *ptr; ptr += 1; ptd[y] = *ptl; ptl += 1; ptd += 2;\n\t\t\tx = cmp(ptl, ptr) <= 0; y = !x; ptd[x] = *ptr; ptr += y; ptd[y] = *ptl; ptl += x; ptd++;\n\t\t}\n\t}\n\n\twhile (ptl <= tpl && ptr <= tpr)\n\t{\n\t\t*ptd++ = cmp(ptl, ptr) <= 0 ? 
*ptl++ : *ptr++;\n\t}\n\twhile (ptl <= tpl)\n\t{\n\t\t*ptd++ = *ptl++;\n\t}\n\twhile (ptr <= tpr)\n\t{\n\t\t*ptd++ = *ptr++;\n\t}\n}\n\nvoid FUNC(parity_twin_merge)(VAR *ptl, VAR *ptr, VAR *ptd, VAR *tpd, size_t block, CMPFUNC *cmp)\n{\n\tVAR *tpl, *tpr;\n#if !defined __clang__\n\tunsigned char x, y;\n#endif\n\ttpl = ptl + block - 1;\n\ttpr = ptr + block - 1;\n\n\tfor (block-- ; block ; block--)\n\t{\n\t\thead_branchless_merge(ptd, x, ptl, ptr, cmp);\n\t\ttail_branchless_merge(tpd, y, tpl, tpr, cmp);\n\t}\n\t*ptd = cmp(ptl, ptr) <= 0 ? *ptl : *ptr;\n\t*tpd = cmp(tpl, tpr)  > 0 ? *tpl : *tpr;\n}\n\n// merge two sorted arrays across two buckets\n// [AB][AB] --> [AA][  ] + [BB][  ]\n\nvoid FUNC(twin_merge)(STRUCT(x_node) *x_node, STRUCT(y_node) *y_node1, STRUCT(y_node) *y_node2, CMPFUNC *cmp)\n{\n\tVAR *pta, *ptb, *tpa, *tpb, *pts;\n\n\tFUNC(quadsort_swap)(y_node1->z_axis2, x_node->swap, BSC_Z, BSC_Z, cmp);\n\n\tpta = y_node1->z_axis1;\n\tptb = y_node1->z_axis2;\n\ttpa = pta + BSC_Z - 1;\n\ttpb = ptb + BSC_Z - 1;\n\n\tif (cmp(tpa, ptb) <= 0)\n\t{\n\t\tpts = y_node1->z_axis2;\n\t\ty_node1->z_axis2 = y_node2->z_axis1;\n\t\ty_node2->z_axis1 = pts;\n\n\t\treturn;\n\t}\n\n\tif (cmp(pta, tpb) > 0)\n\t{\n\t\tpts = y_node1->z_axis1;\n\t\ty_node1->z_axis1 = y_node1->z_axis2;\n\t\ty_node1->z_axis2 = y_node2->z_axis1;\n\t\ty_node2->z_axis1 = pts;\n\n\t\treturn;\n\t}\n\n\tFUNC(parity_twin_merge)(pta, ptb, y_node2->z_axis2, y_node2->z_axis1 + BSC_Z - 1, BSC_Z, cmp);\n\n\tpta = y_node1->z_axis1; y_node1->z_axis1 = y_node2->z_axis2; y_node2->z_axis2 = pta;\n}\n\nvoid FUNC(destroy_grid)(STRUCT(x_node) *x_node, VAR *array, CMPFUNC *cmp)\n{\n\tSTRUCT(y_node) *y_node;\n\tsize_t y, z;\n\n\tfor (y = z = 0 ; y < x_node->y_size ; y++)\n\t{\n\t\ty_node = x_node->y_axis[y];\n\n\t\tif (y_node->z_size)\n\t\t{\n\t\t\tFUNC(twin_merge_cpy)(x_node, &array[z], y_node, cmp);\n\t\t}\n\t\telse\n\t\t{\n\t\t\tmemcpy(&array[z], y_node->z_axis1, BSC_Z * sizeof(VAR));\n\t\t}\n\t\tz += BSC_Z + 
y_node->z_size;\n\n\t\tfree(y_node->z_axis1);\n\t\tfree(y_node->z_axis2);\n\n\t\tfree(y_node);\n\t}\n\tfree(x_node->y_axis);\n\tfree(x_node->y_base);\n\tfree(x_node->swap);\n\n\tfree(x_node);\n}\n\nsize_t FUNC(adaptive_binary_search)(STRUCT(x_node) *x_node, VAR *array, VAR key, CMPFUNC *cmp)\n{\n\tstatic unsigned int run;\n\tsize_t top, mid;\n\tVAR *base = array;\n\n\tif (!run)\n\t{\n\t\ttop = x_node->y_size;\n\n\t\tgoto monobound;\n\t}\n\n\tif (x_node->y == x_node->y_size - 1)\n\t{\n\t\tif (cmp(base + x_node->y, &key) <= 0)\n\t\t{\n\t\t\treturn x_node->y;\n\t\t}\n\t\ttop = x_node->y;\n\n\t\tgoto monobound;\n\t}\n\n\tif (x_node->y == 0)\n\t{\n\t\tbase++;\n\n\t\tif (cmp(base, &key) > 0)\n\t\t{\n\t\t\treturn 0;\n\t\t}\n\t\ttop = x_node->y_size - 1;\n\n\t\tgoto monobound;\n\t}\n\n\tbase += x_node->y;\n\n\tif (cmp(base, &key) <= 0)\n\t{\n\t\tif (cmp(base + 1, &key) > 0)\n\t\t{\n\t\t\tgoto end;\n\t\t}\n\t\tbase++;\n\t\ttop = x_node->y_size - x_node->y - 1;\n\t\t\n\t}\n\telse\n\t{\n\t\tbase--;\n\n\t\tif (cmp(base, &key) <= 0)\n\t\t{\n\t\t\tgoto end;\n\t\t}\n\t\ttop = x_node->y - 1;\n\t\tbase = array;\n\t}\n\n\tmonobound:\n\n\twhile (top > 1)\n\t{\n\t\tmid = top / 2;\n\n\t\tif (cmp(base + mid, &key) <= 0)\n\t\t{\n\t\t\tbase += mid;\n\t\t}\n\t\ttop -= mid;\n\t}\n\n\tend:\n\n\ttop = base - array;\n\n\trun = x_node->y == top;\n\n\treturn x_node->y = top;\n}\n\nvoid FUNC(insert_y_node)(STRUCT(x_node) *x_node, size_t y)\n{\n\tsize_t end = ++x_node->y_size;\n\n\tif (x_node->y_size % BSC_Z == 0)\n\t{\n\t\tx_node->y_base = (VAR *) realloc(x_node->y_base, (x_node->y_size + BSC_Z) * sizeof(VAR));\n\t\tx_node->y_axis = (STRUCT(y_node) **) realloc(x_node->y_axis, (x_node->y_size + BSC_Z) * sizeof(STRUCT(y_node) *));\n\t}\n\n\twhile (y < --end)\n\t{\n\t\tx_node->y_axis[end] = x_node->y_axis[end - 1];\n\t\tx_node->y_base[end] = x_node->y_base[end - 1];\n\t}\n\tx_node->y_axis[y] = (STRUCT(y_node) *) malloc(sizeof(STRUCT(y_node)));\n\n\tx_node->y_axis[y]->z_axis1 = (VAR *) malloc(BSC_Z * 
sizeof(VAR));\n\tx_node->y_axis[y]->z_axis2 = (VAR *) malloc(BSC_Z * sizeof(VAR));\n}\n\nvoid FUNC(split_y_node)(STRUCT(x_node) *x_node, size_t y1, size_t y2, CMPFUNC *cmp)\n{\n\tSTRUCT(y_node) *y_node1, *y_node2;\n\n\tFUNC(insert_y_node)(x_node, y2);\n\n\ty_node1 = x_node->y_axis[y1];\n\ty_node2 = x_node->y_axis[y2];\n\n\tFUNC(twin_merge)(x_node, y_node1, y_node2, cmp);\n\n\ty_node1->z_size = y_node2->z_size = 0;\n\n\tx_node->y_base[y1] = y_node1->z_axis1[0];\n\tx_node->y_base[y2] = y_node2->z_axis1[0];\n}\n\nvoid FUNC(insert_z_node)(STRUCT(x_node) *x_node, VAR key, CMPFUNC *cmp)\n{\n\tSTRUCT(y_node) *y_node;\n\tsize_t y;\n\n\ty = FUNC(adaptive_binary_search)(x_node, x_node->y_base, key, cmp);\n\n\ty_node = x_node->y_axis[y];\n\n\ty_node->z_axis2[y_node->z_size++] = key;\n\n\tif (y_node->z_size == BSC_Z)\n\t{\n\t\tFUNC(split_y_node)(x_node, y, y + 1, cmp);\n\t}\n}\n\n\n/////////////////////////////////////////////////////////////////////////////\n//┌───────────────────────────────────────────────────────────────────────┐//\n//│    ██████┐ ██████┐ ██████┐██████┐ ███████┐ ██████┐ ██████┐ ████████┐  │//\n//│   ██┌────┘ ██┌──██┐└─██┌─┘██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘  │//\n//│   ██│  ███┐██████┌┘  ██│  ██│  ██│███████┐██│   ██│██████┌┘   ██│     │//\n//│   ██│   ██│██┌──██┐  ██│  ██│  ██│└────██│██│   ██│██┌──██┐   ██│     │//\n//│   └██████┌┘██│  ██│██████┐██████┌┘███████│└██████┌┘██│  ██│   ██│     │//\n//│    └─────┘ └─┘  └─┘└─────┘└─────┘ └──────┘ └─────┘ └─┘  └─┘   └─┘     │//\n//└───────────────────────────────────────────────────────────────────────┘//\n/////////////////////////////////////////////////////////////////////////////\n\nvoid FUNC(gridsort)(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)\n{\n\tsize_t cnt = nmemb;\n\tVAR *pta = (VAR *) array;\n\n\tSTRUCT(x_node) *grid = FUNC(create_grid)(pta, cnt, cmp);\n\n\tpta += BSC_Z * 2;\n\tcnt -= BSC_Z * 2;\n\n\twhile (cnt--)\n\t{\n\t\tFUNC(insert_z_node)(grid, *pta++, 
cmp);\n\t}\n\n\tFUNC(destroy_grid)(grid, (VAR *) array, cmp);\n}\n"
  },
  {
    "path": "src/gridsort.h",
    "content": "// gridsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n#ifndef GRIDSORT_H\n#define GRIDSORT_H\n\n//#define cmp(a,b) (*(a) > *(b))\n\n#ifndef QUADSORT_H\n  #include \"quadsort.h\"\n#endif\n\n#include <stdlib.h>\n#include <stdio.h>\n#include <assert.h>\n#include <errno.h>\n\ntypedef int CMPFUNC (const void *a, const void *b);\n\n#define BSC_X 32\n#define BSC_Y 2\n\nsize_t  BSC_Z;\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│                █████┐    ██████┐ ██████┐████████┐  │//\n//│               ██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘  │//\n//│               └█████┌┘   ██████┌┘  ██│     ██│     │//\n//│               ██┌──██┐   ██┌──██┐  ██│     ██│     │//\n//│               └█████┌┘   ██████┌┘██████┐   ██│     │//\n//│                └────┘    └─────┘ └─────┘   └─┘     │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#undef VAR\n#undef FUNC\n#undef STRUCT\n\n#define VAR char\n#define FUNC(NAME) NAME##8\n#define STRUCT(NAME) struct NAME##8\n\n#include \"gridsort.c\"\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│           ▄██┐   █████┐    ██████┐ ██████┐████████┐│//\n//│          ████│  ██┌───┘    ██┌──██┐└─██┌─┘└──██┌──┘│//\n//│          └─██│  ██████┐    ██████┌┘  ██│     ██│   │//\n//│            ██│  ██┌──██┐   ██┌──██┐  ██│     ██│   │//\n//│          ██████┐└█████┌┘   ██████┌┘██████┐   ██│   │//\n//│          └─────┘ └────┘    └─────┘ └─────┘   └─┘   │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#undef VAR\n#undef FUNC\n#undef STRUCT\n\n#define VAR short\n#define FUNC(NAME) NAME##16\n#define STRUCT(NAME) struct NAME##16\n\n#include \"gridsort.c\"\n\n//////////////////////////////////////////////////////////\n// 
┌───────────────────────────────────────────────────┐//\n// │       ██████┐ ██████┐    ██████┐ ██████┐████████┐ │//\n// │       └────██┐└────██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │        █████┌┘ █████┌┘   ██████┌┘  ██│     ██│    │//\n// │        └───██┐██┌───┘    ██┌──██┐  ██│     ██│    │//\n// │       ██████┌┘███████┐   ██████┌┘██████┐   ██│    │//\n// │       └─────┘ └──────┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#undef VAR\n#undef FUNC\n#undef STRUCT\n\n#define VAR int\n#define FUNC(NAME) NAME##32\n#define STRUCT(NAME) struct NAME##32\n\n#include \"gridsort.c\"\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │        █████┐ ██┐  ██┐   ██████┐ ██████┐████████┐ │//\n// │       ██┌───┘ ██│  ██│   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │       ██████┐ ███████│   ██████┌┘  ██│     ██│    │//\n// │       ██┌──██┐└────██│   ██┌──██┐  ██│     ██│    │//\n// │       └█████┌┘     ██│   ██████┌┘██████┐   ██│    │//\n// │        └────┘      └─┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#undef VAR\n#undef FUNC\n#undef STRUCT\n\n#define VAR long long\n#define FUNC(NAME) NAME##64\n#define STRUCT(NAME) struct NAME##64\n\n#include \"gridsort.c\"\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│  ▄██┐  ██████┐  █████┐    ██████┐ ██████┐████████┐ │//\n//│ ████│  └────██┐██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n//│ └─██│   █████┌┘└█████┌┘   ██████┌┘  ██│     ██│    │//\n//│   ██│  ██┌───┘ ██┌──██┐   ██┌──██┐  ██│     ██│    │//\n//│ ██████┐███████┐└█████┌┘   ██████┌┘██████┐   ██│    │//\n//│ └─────┘└──────┘ └────┘    └─────┘ └─────┘   └─┘    
│//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#undef VAR\n#undef FUNC\n#undef STRUCT\n\n#define VAR long double\n#define FUNC(NAME) NAME##128\n#define STRUCT(NAME) struct NAME##128\n\n#include \"gridsort.c\"\n\n/////////////////////////////////////////////////////////////////////////////\n//┌───────────────────────────────────────────────────────────────────────┐//\n//│    ██████┐ ██████┐ ██████┐██████┐ ███████┐ ██████┐ ██████┐ ████████┐  │//\n//│   ██┌────┘ ██┌──██┐└─██┌─┘██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘  │//\n//│   ██│  ███┐██████┌┘  ██│  ██│  ██│███████┐██│   ██│██████┌┘   ██│     │//\n//│   ██│   ██│██┌──██┐  ██│  ██│  ██│└────██│██│   ██│██┌──██┐   ██│     │//\n//│   └██████┌┘██│  ██│██████┐██████┌┘███████│└██████┌┘██│  ██│   ██│     │//\n//│    └─────┘ └─┘  └─┘└─────┘└─────┘ └──────┘ └─────┘ └─┘  └─┘   └─┘     │//\n//└───────────────────────────────────────────────────────────────────────┘//\n/////////////////////////////////////////////////////////////////////////////\n\nvoid gridsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)\n{\n\tif (nmemb < BSC_X * BSC_X)\n\t{\n\t\treturn quadsort(array, nmemb, size, cmp);\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase sizeof(char):\n\t\t\treturn gridsort8(array, nmemb, size, cmp);\n\n\t\tcase sizeof(short):\n\t\t\treturn gridsort16(array, nmemb, size, cmp);\n\n\t\tcase sizeof(int):\n\t\t\treturn gridsort32(array, nmemb, size, cmp);\n\n\t\tcase sizeof(long long):\n\t\t\treturn gridsort64(array, nmemb, size, cmp);\n\n\t\tcase sizeof(long double):\n\t\t\treturn gridsort128(array, nmemb, size, cmp);\n\n\t\tdefault:\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double));\n\t}\n}\n\n#undef VAR\n#undef FUNC\n#undef STRUCT\n\n#endif\n"
  },
  {
    "path": "src/quadsort.c",
    "content": "// quadsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n// the next seven functions are used for sorting 0 to 31 elements\n\nvoid FUNC(parity_swap_four)(VAR *array, CMPFUNC *cmp)\n{\n\tVAR tmp, *pta = array;\n\tsize_t x;\n\n\tbranchless_swap(pta, tmp, x, cmp); pta += 2;\n\tbranchless_swap(pta, tmp, x, cmp); pta--;\n\n\tif (cmp(pta, pta + 1) > 0)\n\t{\n\t\ttmp = pta[0]; pta[0] = pta[1]; pta[1] = tmp; pta--;\n\n\t\tbranchless_swap(pta, tmp, x, cmp); pta += 2;\n\t\tbranchless_swap(pta, tmp, x, cmp); pta--;\n\t\tbranchless_swap(pta, tmp, x, cmp);\n\t}\n}\n\nvoid FUNC(parity_swap_five)(VAR *array, CMPFUNC *cmp)\n{\n\tVAR tmp, *pta = array;\n\tsize_t x, y;\n\n\tbranchless_swap(pta, tmp, x, cmp); pta += 2;\n\tbranchless_swap(pta, tmp, x, cmp); pta -= 1;\n\tbranchless_swap(pta, tmp, x, cmp); pta += 2;\n\tbranchless_swap(pta, tmp, y, cmp); pta = array;\n\n\tif (x + y)\n\t{\n\t\tbranchless_swap(pta, tmp, x, cmp); pta += 2;\n\t\tbranchless_swap(pta, tmp, x, cmp); pta -= 1;\n\t\tbranchless_swap(pta, tmp, x, cmp); pta += 2;\n\t\tbranchless_swap(pta, tmp, x, cmp); pta = array;\n\t\tbranchless_swap(pta, tmp, x, cmp); pta += 2;\n\t\tbranchless_swap(pta, tmp, x, cmp); pta -= 1;\n\t}\n}\n\nvoid FUNC(parity_swap_six)(VAR *array, VAR *swap, CMPFUNC *cmp)\n{\n\tVAR tmp, *pta = array, *ptl, *ptr;\n\tsize_t x, y;\n\n\tbranchless_swap(pta, tmp, x, cmp); pta++;\n\tbranchless_swap(pta, tmp, x, cmp); pta += 3;\n\tbranchless_swap(pta, tmp, x, cmp); pta--;\n\tbranchless_swap(pta, tmp, x, cmp); pta = array;\n\n\tif (cmp(pta + 2, pta + 3) <= 0)\n\t{\n\t\tbranchless_swap(pta, tmp, x, cmp); pta += 4;\n\t\tbranchless_swap(pta, tmp, x, cmp);\n\t\treturn;\n\t}\n\tx = cmp(pta, pta + 1) > 0; y = !x; swap[0] = pta[x]; swap[1] = pta[y]; swap[2] = pta[2]; pta += 4;\n\tx = cmp(pta, pta + 1) > 0; y = !x; swap[4] = pta[x]; swap[5] = pta[y]; swap[3] = pta[-1];\n\n\tpta = array; ptl = swap; ptr = swap + 3;\n\n\thead_branchless_merge(pta, x, ptl, ptr, cmp);\n\thead_branchless_merge(pta, 
x, ptl, ptr, cmp);\n\thead_branchless_merge(pta, x, ptl, ptr, cmp);\n\n\tpta = array + 5; ptl = swap + 2; ptr = swap + 5;\n\n\ttail_branchless_merge(pta, y, ptl, ptr, cmp);\n\ttail_branchless_merge(pta, y, ptl, ptr, cmp);\n\t*pta = cmp(ptl, ptr)  > 0 ? *ptl : *ptr;\n}\n\nvoid FUNC(parity_swap_seven)(VAR *array, VAR *swap, CMPFUNC *cmp)\n{\n\tVAR tmp, *pta = array, *ptl, *ptr;\n\tsize_t x, y;\n\n\tbranchless_swap(pta, tmp, x, cmp); pta += 2;\n\tbranchless_swap(pta, tmp, x, cmp); pta += 2;\n\tbranchless_swap(pta, tmp, x, cmp); pta -= 3;\n\tbranchless_swap(pta, tmp, y, cmp); pta += 2;\n\tbranchless_swap(pta, tmp, x, cmp); pta += 2; y += x;\n\tbranchless_swap(pta, tmp, x, cmp); pta -= 1; y += x;\n\n\tif (y == 0) return;\n\n\tbranchless_swap(pta, tmp, x, cmp); pta = array;\n\n\tx = cmp(pta, pta + 1) > 0; swap[0] = pta[x]; swap[1] = pta[!x]; swap[2] = pta[2]; pta += 3;\n\tx = cmp(pta, pta + 1) > 0; swap[3] = pta[x]; swap[4] = pta[!x]; pta += 2;\n\tx = cmp(pta, pta + 1) > 0; swap[5] = pta[x]; swap[6] = pta[!x];\n\n\tpta = array; ptl = swap; ptr = swap + 3;\n\n\thead_branchless_merge(pta, x, ptl, ptr, cmp);\n\thead_branchless_merge(pta, x, ptl, ptr, cmp);\n\thead_branchless_merge(pta, x, ptl, ptr, cmp);\n\n\tpta = array + 6; ptl = swap + 2; ptr = swap + 6;\n\n\ttail_branchless_merge(pta, y, ptl, ptr, cmp);\n\ttail_branchless_merge(pta, y, ptl, ptr, cmp);\n\ttail_branchless_merge(pta, y, ptl, ptr, cmp);\n\t*pta = cmp(ptl, ptr) > 0 ? 
*ptl : *ptr;\n}\n\nvoid FUNC(tiny_sort)(VAR *array, VAR *swap, size_t nmemb, CMPFUNC *cmp)\n{\n\tVAR tmp;\n\tsize_t x;\n\n\tswitch (nmemb)\n\t{\n\t\tcase 0:\n\t\tcase 1:\n\t\t\treturn;\n\t\tcase 2:\n\t\t\tbranchless_swap(array, tmp, x, cmp);\n\t\t\treturn;\n\t\tcase 3:\n\t\t\tbranchless_swap(array, tmp, x, cmp); array++;\n\t\t\tbranchless_swap(array, tmp, x, cmp); array--;\n\t\t\tbranchless_swap(array, tmp, x, cmp);\n\t\t\treturn;\n\t\tcase 4:\n\t\t\tFUNC(parity_swap_four)(array, cmp);\n\t\t\treturn;\n\t\tcase 5:\n\t\t\tFUNC(parity_swap_five)(array, cmp);\n\t\t\treturn;\n\t\tcase 6:\n\t\t\tFUNC(parity_swap_six)(array, swap, cmp);\n\t\t\treturn;\n\t\tcase 7:\n\t\t\tFUNC(parity_swap_seven)(array, swap, cmp);\n\t\t\treturn;\n\t}\n}\n\n// left must be equal or one smaller than right\n\nvoid FUNC(parity_merge)(VAR *dest, VAR *from, size_t left, size_t right, CMPFUNC *cmp)\n{\n\tVAR *ptl, *ptr, *tpl, *tpr, *tpd, *ptd;\n#if !defined __clang__\n\tsize_t x, y;\n#endif\n\tptl = from;\n\tptr = from + left;\n\tptd = dest;\n\ttpl = ptr - 1;\n\ttpr = tpl + right;\n\ttpd = dest + left + right - 1;\n\n\tif (left < right)\n\t{\n\t\t*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;\n\t}\n\t*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;\n\n#if !defined cmp && !defined __clang__ // cache limit workaround for gcc\n\tif (left > QUAD_CACHE)\n\t{\n\t\twhile (--left)\n\t\t{\n\t\t\t*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;\n\t\t\t*tpd-- = cmp(tpl, tpr)  > 0 ? *tpl-- : *tpr--;\n\t\t}\n\t}\n\telse\n#endif\n\t{\n\t\twhile (--left)\n\t\t{\n\t\t\thead_branchless_merge(ptd, x, ptl, ptr, cmp);\n\t\t\ttail_branchless_merge(tpd, y, tpl, tpr, cmp);\n\t\t}\n\t}\n\t*tpd = cmp(tpl, tpr)  > 0 ? 
*tpl : *tpr;\n}\n\nvoid FUNC(tail_swap)(VAR *array, VAR *swap, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb < 8)\n\t{\n\t\tFUNC(tiny_sort)(array, swap, nmemb, cmp);\n\t\treturn;\n\t}\n\tsize_t quad1, quad2, quad3, quad4, half1, half2;\n\n\thalf1 = nmemb / 2;\n\tquad1 = half1 / 2;\n\tquad2 = half1 - quad1;\n\thalf2 = nmemb - half1;\n\tquad3 = half2 / 2;\n\tquad4 = half2 - quad3;\n\n\tVAR *pta = array;\n\n\tFUNC(tail_swap)(pta, swap, quad1, cmp); pta += quad1;\n\tFUNC(tail_swap)(pta, swap, quad2, cmp); pta += quad2;\n\tFUNC(tail_swap)(pta, swap, quad3, cmp); pta += quad3;\n\tFUNC(tail_swap)(pta, swap, quad4, cmp);\n\n\tif (cmp(array + quad1 - 1, array + quad1) <= 0 && cmp(array + half1 - 1, array + half1) <= 0 && cmp(pta - 1, pta) <= 0)\n\t{\n\t\treturn;\n\t}\n\tFUNC(parity_merge)(swap, array, quad1, quad2, cmp);\n\tFUNC(parity_merge)(swap + half1, array + half1, quad3, quad4, cmp);\n\tFUNC(parity_merge)(array, swap, half1, half2, cmp);\n}\n\n// the next three functions create sorted blocks of 32 elements\n\nvoid FUNC(quad_reversal)(VAR *pta, VAR *ptz)\n{\n\tVAR *ptb, *pty, tmp1, tmp2;\n\n\tsize_t loop = (ptz - pta) / 2;\n\n\tptb = pta + loop;\n\tpty = ptz - loop;\n\n\tif (loop % 2 == 0)\n\t{\n\t\ttmp2 = *ptb; *ptb-- = *pty; *pty++ = tmp2; loop--;\n\t}\n\n\tloop /= 2;\n\n\tdo\n\t{\n\t\ttmp1 = *pta; *pta++ = *ptz; *ptz-- = tmp1;\n\t\ttmp2 = *ptb; *ptb-- = *pty; *pty++ = tmp2;\n\t}\n\twhile (loop--);\n}\n\nvoid FUNC(quad_swap_merge)(VAR *array, VAR *swap, CMPFUNC *cmp)\n{\n\tVAR *pts, *ptl, *ptr;\n#if !defined __clang__\n\tsize_t x;\n#endif\n\tparity_merge_two(array + 0, swap + 0, x, ptl, ptr, pts, cmp);\n\tparity_merge_two(array + 4, swap + 4, x, ptl, ptr, pts, cmp);\n\n\tparity_merge_four(swap, array, x, ptl, ptr, pts, cmp);\n}\n\nvoid FUNC(tail_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp);\n\nsize_t FUNC(quad_swap)(VAR *array, size_t nmemb, CMPFUNC *cmp)\n{\n\tVAR tmp, swap[32];\n\tsize_t count;\n\tVAR *pta, 
*pts;\n\tunsigned char v1, v2, v3, v4, x;\n\tpta = array;\n\n\tcount = nmemb / 8;\n\n\twhile (count--)\n\t{\n\t\tv1 = cmp(pta + 0, pta + 1) > 0;\n\t\tv2 = cmp(pta + 2, pta + 3) > 0;\n\t\tv3 = cmp(pta + 4, pta + 5) > 0;\n\t\tv4 = cmp(pta + 6, pta + 7) > 0;\n\n\t\tswitch (v1 + v2 * 2 + v3 * 4 + v4 * 8)\n\t\t{\n\t\t\tcase 0:\n\t\t\t\tif (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0)\n\t\t\t\t{\n\t\t\t\t\tgoto ordered;\n\t\t\t\t}\n\t\t\t\tFUNC(quad_swap_merge)(pta, swap, cmp);\n\t\t\t\tbreak;\n\n\t\t\tcase 15:\n\t\t\t\tif (cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0)\n\t\t\t\t{\n\t\t\t\t\tpts = pta;\n\t\t\t\t\tgoto reversed;\n\t\t\t\t}\n\n\t\t\tdefault:\n\t\t\tnot_ordered:\n\t\t\t\tx = !v1; tmp = pta[x]; pta[0] = pta[v1]; pta[1] = tmp; pta += 2;\n\t\t\t\tx = !v2; tmp = pta[x]; pta[0] = pta[v2]; pta[1] = tmp; pta += 2;\n\t\t\t\tx = !v3; tmp = pta[x]; pta[0] = pta[v3]; pta[1] = tmp; pta += 2;\n\t\t\t\tx = !v4; tmp = pta[x]; pta[0] = pta[v4]; pta[1] = tmp; pta -= 6;\n\n\t\t\t\tFUNC(quad_swap_merge)(pta, swap, cmp);\n\t\t}\n\t\tpta += 8;\n\n\t\tcontinue;\n\n\t\tordered:\n\n\t\tpta += 8;\n\n\t\tif (count--)\n\t\t{\n\t\t\tif ((v1 = cmp(pta + 0, pta + 1) > 0) | (v2 = cmp(pta + 2, pta + 3) > 0) | (v3 = cmp(pta + 4, pta + 5) > 0) | (v4 = cmp(pta + 6, pta + 7) > 0))\n\t\t\t{\n\t\t\t\tif (v1 + v2 + v3 + v4 == 4 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0)\n\t\t\t\t{\n\t\t\t\t\tpts = pta;\n\t\t\t\t\tgoto reversed;\n\t\t\t\t}\n\t\t\t\tgoto not_ordered;\n\t\t\t}\n\t\t\tif (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0)\n\t\t\t{\n\t\t\t\tgoto ordered;\n\t\t\t}\n\t\t\tFUNC(quad_swap_merge)(pta, swap, cmp);\n\t\t\tpta += 8;\n\t\t\tcontinue;\n\t\t}\n\t\tbreak;\n\n\t\treversed:\n\n\t\tpta += 8;\n\n\t\tif (count--)\n\t\t{\n\t\t\tif ((v1 = cmp(pta + 0, pta + 1) <= 0) | (v2 = cmp(pta + 2, pta + 3) <= 0) | (v3 = cmp(pta + 4, pta 
+ 5) <= 0) | (v4 = cmp(pta + 6, pta + 7) <= 0))\n\t\t\t{\n\t\t\t\t// not reversed\n\t\t\t}\n\t\t\telse\n\t\t\t{\n\t\t\t\tif (cmp(pta - 1, pta) > 0 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0)\n\t\t\t\t{\n\t\t\t\t\tgoto reversed;\n\t\t\t\t}\n\t\t\t}\n\t\t\tFUNC(quad_reversal)(pts, pta - 1);\n\n\t\t\tif (v1 + v2 + v3 + v4 == 4 && cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0)\n\t\t\t{\n\t\t\t\tgoto ordered;\n\t\t\t}\n\t\t\tif (v1 + v2 + v3 + v4 == 0 && cmp(pta + 1, pta + 2)  > 0 && cmp(pta + 3, pta + 4)  > 0 && cmp(pta + 5, pta + 6)  > 0)\n\t\t\t{\n\t\t\t\tpts = pta;\n\t\t\t\tgoto reversed;\n\t\t\t}\n\n\t\t\tx = !v1; tmp = pta[v1]; pta[0] = pta[x]; pta[1] = tmp; pta += 2;\n\t\t\tx = !v2; tmp = pta[v2]; pta[0] = pta[x]; pta[1] = tmp; pta += 2;\n\t\t\tx = !v3; tmp = pta[v3]; pta[0] = pta[x]; pta[1] = tmp; pta += 2;\n\t\t\tx = !v4; tmp = pta[v4]; pta[0] = pta[x]; pta[1] = tmp; pta -= 6;\n\n\t\t\tif (cmp(pta + 1, pta + 2) > 0 || cmp(pta + 3, pta + 4) > 0 || cmp(pta + 5, pta + 6) > 0)\n\t\t\t{\n\t\t\t\tFUNC(quad_swap_merge)(pta, swap, cmp);\n\t\t\t}\n\t\t\tpta += 8;\n\t\t\tcontinue;\n\t\t}\n\n\t\tswitch (nmemb % 8)\n\t\t{\n\t\t\tcase 7: if (cmp(pta + 5, pta + 6) <= 0) break;\n\t\t\tcase 6: if (cmp(pta + 4, pta + 5) <= 0) break;\n\t\t\tcase 5: if (cmp(pta + 3, pta + 4) <= 0) break;\n\t\t\tcase 4: if (cmp(pta + 2, pta + 3) <= 0) break;\n\t\t\tcase 3: if (cmp(pta + 1, pta + 2) <= 0) break;\n\t\t\tcase 2: if (cmp(pta + 0, pta + 1) <= 0) break;\n\t\t\tcase 1: if (cmp(pta - 1, pta + 0) <= 0) break;\n\t\t\tcase 0:\n\t\t\t\tFUNC(quad_reversal)(pts, pta + nmemb % 8 - 1);\n\n\t\t\t\tif (pts == array)\n\t\t\t\t{\n\t\t\t\t\treturn 1;\n\t\t\t\t}\n\t\t\t\tgoto reverse_end;\n\t\t}\n\t\tFUNC(quad_reversal)(pts, pta - 1);\n\t\tbreak;\n\t}\n\tFUNC(tail_swap)(pta, swap, nmemb % 8, cmp);\n\n\treverse_end:\n\n\tpta = array;\n\n\tfor (count = nmemb / 32 ; count-- ; pta += 32)\n\t{\n\t\tif (cmp(pta + 7, pta + 8) 
<= 0 && cmp(pta + 15, pta + 16) <= 0 && cmp(pta + 23, pta + 24) <= 0)\n\t\t{\n\t\t\tcontinue;\n\t\t}\n\t\tFUNC(parity_merge)(swap, pta, 8, 8, cmp);\n\t\tFUNC(parity_merge)(swap + 16, pta + 16, 8, 8, cmp);\n\t\tFUNC(parity_merge)(pta, swap, 16, 16, cmp);\n\t}\n\n\tif (nmemb % 32 > 8)\n\t{\n\t\tFUNC(tail_merge)(pta, swap, 32, nmemb % 32, 8, cmp);\n\t}\n\treturn 0;\n}\n\n// The next six functions are quad merge support routines\n\nvoid FUNC(cross_merge)(VAR *dest, VAR *from, size_t left, size_t right, CMPFUNC *cmp)\n{\n\tVAR *ptl, *tpl, *ptr, *tpr, *ptd, *tpd;\n\tsize_t loop;\n#if !defined __clang__\n\tsize_t x, y;\n#endif\n\tptl = from;\n\tptr = from + left;\n\ttpl = ptr - 1;\n\ttpr = tpl + right;\n\n\tif (left + 1 >= right && right >= left && left >= 32)\n\t{\n\t\tif (cmp(ptl + 15, ptr) > 0 && cmp(ptl, ptr + 15) <= 0 && cmp(tpl, tpr - 15) > 0 && cmp(tpl - 15, tpr) <= 0)\n\t\t{\n\t\t\tFUNC(parity_merge)(dest, from, left, right, cmp);\n\t\t\treturn;\n\t\t}\n\t}\n\tptd = dest;\n\ttpd = dest + left + right - 1;\n\n\twhile (1)\n\t{\n\t\tif (tpl - ptl > 8)\n\t\t{\n\t\t\tptl8_ptr: if (cmp(ptl + 7, ptr) <= 0)\n\t\t\t{\n\t\t\t\tmemcpy(ptd, ptl, 8 * sizeof(VAR)); ptd += 8; ptl += 8;\n\n\t\t\t\tif (tpl - ptl > 8) {goto ptl8_ptr;} continue;\n\t\t\t}\n\n\t\t\ttpl8_tpr: if (cmp(tpl - 7, tpr) > 0)\n\t\t\t{\n\t\t\t\ttpd -= 7; tpl -= 7; memcpy(tpd--, tpl--, 8 * sizeof(VAR));\n\n\t\t\t\tif (tpl - ptl > 8) {goto tpl8_tpr;} continue;\n\t\t\t}\n\t\t}\n\n\t\tif (tpr - ptr > 8)\n\t\t{\n\t\t\tptl_ptr8: if (cmp(ptl, ptr + 7) > 0)\n\t\t\t{\n\t\t\t\tmemcpy(ptd, ptr, 8 * sizeof(VAR)); ptd += 8; ptr += 8;\n\n\t\t\t\tif (tpr - ptr > 8) {goto ptl_ptr8;} continue;\n\t\t\t}\n\n\t\t\ttpl_tpr8: if (cmp(tpl, tpr - 7) <= 0)\n\t\t\t{\n\t\t\t\ttpd -= 7; tpr -= 7; memcpy(tpd--, tpr--, 8 * sizeof(VAR));\n\n\t\t\t\tif (tpr - ptr > 8) {goto tpl_tpr8;} continue;\n\t\t\t}\n\t\t}\n\n\t\tif (tpd - ptd < 16)\n\t\t{\n\t\t\tbreak;\n\t\t}\n\n#if !defined cmp && !defined __clang__\n\t\tif (left > 
QUAD_CACHE)\n\t\t{\n\t\t\tloop = 8; do\n\t\t\t{\n\t\t\t\t*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;\n\t\t\t\t*tpd-- = cmp(tpl, tpr)  > 0 ? *tpl-- : *tpr--;\n\t\t\t}\n\t\t\twhile (--loop);\n\t\t}\n\t\telse\n#endif\n\t\t{\n\t\t\tloop = 8; do\n\t\t\t{\n\t\t\t\thead_branchless_merge(ptd, x, ptl, ptr, cmp);\n\t\t\t\ttail_branchless_merge(tpd, y, tpl, tpr, cmp);\n\t\t\t}\n\t\t\twhile (--loop);\n\t\t}\n\t}\n\n\twhile (ptl <= tpl && ptr <= tpr)\n\t{\n\t\t*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;\n\t}\n\twhile (ptl <= tpl)\n\t{\n\t\t*ptd++ = *ptl++;\n\t}\n\twhile (ptr <= tpr)\n\t{\n\t\t*ptd++ = *ptr++;\n\t}\n}\n\nvoid FUNC(quad_merge_block)(VAR *array, VAR *swap, size_t block, CMPFUNC *cmp)\n{\n\tVAR *pt1, *pt2, *pt3;\n\tsize_t block_x_2 = block * 2;\n\n\tpt1 = array + block;\n\tpt2 = pt1 + block;\n\tpt3 = pt2 + block;\n\n\tswitch ((cmp(pt1 - 1, pt1) <= 0) | (cmp(pt3 - 1, pt3) <= 0) * 2)\n\t{\n\t\tcase 0:\n\t\t\tFUNC(cross_merge)(swap, array, block, block, cmp);\n\t\t\tFUNC(cross_merge)(swap + block_x_2, pt2, block, block, cmp);\n\t\t\tbreak;\n\t\tcase 1:\n\t\t\tmemcpy(swap, array, block_x_2 * sizeof(VAR));\n\t\t\tFUNC(cross_merge)(swap + block_x_2, pt2, block, block, cmp);\n\t\t\tbreak;\n\t\tcase 2:\n\t\t\tFUNC(cross_merge)(swap, array, block, block, cmp);\n\t\t\tmemcpy(swap + block_x_2, pt2, block_x_2 * sizeof(VAR));\n\t\t\tbreak;\n\t\tcase 3:\n\t\t\tif (cmp(pt2 - 1, pt2) <= 0)\n\t\t\t\treturn;\n\t\t\tmemcpy(swap, array, block_x_2 * 2 * sizeof(VAR));\n\t}\n\tFUNC(cross_merge)(array, swap, block_x_2, block_x_2, cmp);\n}\n\nsize_t FUNC(quad_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp)\n{\n\tVAR *pta, *pte;\n\n\tpte = array + nmemb;\n\n\tblock *= 4;\n\n\twhile (block <= nmemb && block <= swap_size)\n\t{\n\t\tpta = array;\n\n\t\tdo\n\t\t{\n\t\t\tFUNC(quad_merge_block)(pta, swap, block / 4, cmp);\n\n\t\t\tpta += block;\n\t\t}\n\t\twhile (pta + block <= pte);\n\n\t\tFUNC(tail_merge)(pta, swap, swap_size, pte - pta, block / 
4, cmp);\n\n\t\tblock *= 4;\n\t}\n\n\tFUNC(tail_merge)(array, swap, swap_size, nmemb, block / 4, cmp);\n\n\treturn block / 2;\n}\n\nvoid FUNC(partial_forward_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp)\n{\n\tVAR *ptl, *ptr, *tpl, *tpr;\n\tsize_t x;\n\n\tif (nmemb == block)\n\t{\n\t\treturn;\n\t}\n\n\tptr = array + block;\n\ttpr = array + nmemb - 1;\n\n\tif (cmp(ptr - 1, ptr) <= 0)\n\t{\n\t\treturn;\n\t}\n\n\tmemcpy(swap, array, block * sizeof(VAR));\n\n\tptl = swap;\n\ttpl = swap + block - 1;\n\n\twhile (ptl < tpl - 1 && ptr < tpr - 1)\n\t{\n\t\tptr2: if (cmp(ptl, ptr + 1) > 0)\n\t\t{\n\t\t\t*array++ = *ptr++; *array++ = *ptr++;\n\n\t\t\tif (ptr < tpr - 1) {goto ptr2;} break;\n\t\t}\n\t\tif (cmp(ptl + 1, ptr) <= 0)\n\t\t{\n\t\t\t*array++ = *ptl++; *array++ = *ptl++;\n\n\t\t\tif (ptl < tpl - 1) {goto ptl2;} break;\n\t\t}\n\n\t\tgoto cross_swap;\n\n\t\tptl2: if (cmp(ptl + 1, ptr) <= 0)\n\t\t{\n\t\t\t*array++ = *ptl++; *array++ = *ptl++;\n\n\t\t\tif (ptl < tpl - 1) {goto ptl2;} break;\n\t\t}\n\n\t\tif (cmp(ptl, ptr + 1) > 0)\n\t\t{\n\t\t\t*array++ = *ptr++; *array++ = *ptr++;\n\n\t\t\tif (ptr < tpr - 1) {goto ptr2;} break;\n\t\t}\n\n\t\tcross_swap:\n\n\t\tx = cmp(ptl, ptr) <= 0; array[x] = *ptr; ptr += 1; array[!x] = *ptl; ptl += 1; array += 2;\n\t\thead_branchless_merge(array, x, ptl, ptr, cmp);\n\t}\n\n\twhile (ptl <= tpl && ptr <= tpr)\n\t{\n\t\t*array++ = cmp(ptl, ptr) <= 0 ? 
*ptl++ : *ptr++;\n\t}\n\n\twhile (ptl <= tpl)\n\t{\n\t\t*array++ = *ptl++;\n\t}\n}\n\nvoid FUNC(partial_backward_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp)\n{\n\tVAR *tpl, *tpa, *tpr;\n\tsize_t right, loop, x;\n\n\tif (nmemb == block)\n\t{\n\t\treturn;\n\t}\n\n\ttpl = array + block - 1;\n\ttpa = array + nmemb - 1;\n\n\tif (cmp(tpl, tpl + 1) <= 0)\n\t{\n\t\treturn;\n\t}\n\n\tright = nmemb - block;\n\n\tif (nmemb <= swap_size && right >= 64)\n\t{\n\t\tFUNC(cross_merge)(swap, array, block, right, cmp);\n\n\t\tmemcpy(array, swap, nmemb * sizeof(VAR));\n\n\t\treturn;\n\t}\n\n\tmemcpy(swap, array + block, right * sizeof(VAR));\n\n\ttpr = swap + right - 1;\n\n\twhile (tpl > array + 16 && tpr > swap + 16)\n\t{\n\t\ttpl_tpr16: if (cmp(tpl, tpr - 15) <= 0)\n\t\t{\n\t\t\tloop = 16; do *tpa-- = *tpr--; while (--loop);\n\n\t\t\tif (tpr > swap + 16) {goto tpl_tpr16;} break;\n\t\t}\n\n\t\ttpl16_tpr: if (cmp(tpl - 15, tpr) > 0)\n\t\t{\n\t\t\tloop = 16; do *tpa-- = *tpl--; while (--loop);\n\t\t\t\n\t\t\tif (tpl > array + 16) {goto tpl16_tpr;} break;\n\t\t}\n\t\tloop = 8; do\n\t\t{\n\t\t\tif (cmp(tpl, tpr - 1) <= 0)\n\t\t\t{\n\t\t\t\t*tpa-- = *tpr--; *tpa-- = *tpr--;\n\t\t\t}\n\t\t\telse if (cmp(tpl - 1, tpr) > 0)\n\t\t\t{\n\t\t\t\t*tpa-- = *tpl--; *tpa-- = *tpl--;\n\t\t\t}\n\t\t\telse\n\t\t\t{\n\t\t\t\tx = cmp(tpl, tpr) <= 0; tpa--; tpa[x] = *tpr; tpr -= 1; tpa[!x] = *tpl; tpl -= 1; tpa--;\n\t\t\t\ttail_branchless_merge(tpa, x, tpl, tpr, cmp);\n\t\t\t}\n\t\t}\n\t\twhile (--loop);\n\t}\n\n\twhile (tpr > swap + 1 && tpl > array + 1)\n\t{\n\t\ttpr2: if (cmp(tpl, tpr - 1) <= 0)\n\t\t{\n\t\t\t*tpa-- = *tpr--; *tpa-- = *tpr--;\n\t\t\t\n\t\t\tif (tpr > swap + 1) {goto tpr2;} break;\n\t\t}\n\n\t\tif (cmp(tpl - 1, tpr) > 0)\n\t\t{\n\t\t\t*tpa-- = *tpl--; *tpa-- = *tpl--;\n\n\t\t\tif (tpl > array + 1) {goto tpl2;} break;\n\t\t}\n\t\tgoto cross_swap;\n\n\t\ttpl2: if (cmp(tpl - 1, tpr) > 0)\n\t\t{\n\t\t\t*tpa-- = *tpl--; *tpa-- = 
*tpl--;\n\n\t\t\tif (tpl > array + 1) {goto tpl2;} break;\n\t\t}\n\n\t\tif (cmp(tpl, tpr - 1) <= 0)\n\t\t{\n\t\t\t*tpa-- = *tpr--; *tpa-- = *tpr--;\n\t\t\t\n\t\t\tif (tpr > swap + 1) {goto tpr2;} break;\n\t\t}\n\t\tcross_swap:\n\n\t\tx = cmp(tpl, tpr) <= 0; tpa--; tpa[x] = *tpr; tpr -= 1; tpa[!x] = *tpl; tpl -= 1; tpa--;\n\t\ttail_branchless_merge(tpa, x, tpl, tpr, cmp);\n\t}\n\n\twhile (tpr >= swap && tpl >= array)\n\t{\n\t\t*tpa-- = cmp(tpl, tpr) > 0 ? *tpl-- : *tpr--;\n\t}\n\n\twhile (tpr >= swap)\n\t{\n\t\t*tpa-- = *tpr--;\n\t}\n}\n\nvoid FUNC(tail_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp)\n{\n\tVAR *pta, *pte;\n\n\tpte = array + nmemb;\n\n\twhile (block < nmemb && block <= swap_size)\n\t{\n\t\tfor (pta = array ; pta + block < pte ; pta += block * 2)\n\t\t{\n\t\t\tif (pta + block * 2 < pte)\n\t\t\t{\n\t\t\t\tFUNC(partial_backward_merge)(pta, swap, swap_size, block * 2, block, cmp);\n\n\t\t\t\tcontinue;\n\t\t\t}\n\t\t\tFUNC(partial_backward_merge)(pta, swap, swap_size, pte - pta, block, cmp);\n\n\t\t\tbreak;\n\t\t}\n\t\tblock *= 2;\n\t}\n}\n\n// the next four functions provide in-place rotate merge support\n\nvoid FUNC(trinity_rotation)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t left)\n{\n\tVAR temp;\n\tsize_t bridge, right = nmemb - left;\n\n\tif (swap_size > 65536)\n\t{\n\t\tswap_size = 65536;\n\t}\n\n\tif (left < right)\n\t{\n\t\tif (left <= swap_size)\n\t\t{\n\t\t\tmemcpy(swap, array, left * sizeof(VAR));\n\t\t\tmemmove(array, array + left, right * sizeof(VAR));\n\t\t\tmemcpy(array + right, swap, left * sizeof(VAR));\n\t\t}\n\t\telse\n\t\t{\n\t\t\tVAR *pta, *ptb, *ptc, *ptd;\n\n\t\t\tpta = array;\n\t\t\tptb = pta + left;\n\n\t\t\tbridge = right - left;\n\n\t\t\tif (bridge <= swap_size && bridge > 3)\n\t\t\t{\n\t\t\t\tptc = pta + right;\n\t\t\t\tptd = ptc + left;\n\n\t\t\t\tmemcpy(swap, ptb, bridge * sizeof(VAR));\n\n\t\t\t\twhile (left--)\n\t\t\t\t{\n\t\t\t\t\t*--ptc = *--ptd; *ptd = 
*--ptb;\n\t\t\t\t}\n\t\t\t\tmemcpy(pta, swap, bridge * sizeof(VAR));\n\t\t\t}\n\t\t\telse\n\t\t\t{\n\t\t\t\tptc = ptb;\n\t\t\t\tptd = ptc + right;\n\n\t\t\t\tbridge = left / 2;\n\n\t\t\t\twhile (bridge--)\n\t\t\t\t{\n\t\t\t\t\ttemp = *--ptb; *ptb = *pta; *pta++ = *ptc; *ptc++ = *--ptd; *ptd = temp;\n\t\t\t\t}\n\n\t\t\t\tbridge = (ptd - ptc) / 2;\n\n\t\t\t\twhile (bridge--)\n\t\t\t\t{\n\t\t\t\t\ttemp = *ptc; *ptc++ = *--ptd; *ptd = *pta; *pta++ = temp;\n\t\t\t\t}\n\n\t\t\t\tbridge = (ptd - pta) / 2;\n\n\t\t\t\twhile (bridge--)\n\t\t\t\t{\n\t\t\t\t\ttemp = *pta; *pta++ = *--ptd; *ptd = temp;\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n\telse if (right < left)\n\t{\n\t\tif (right <= swap_size)\n\t\t{\n\t\t\tmemcpy(swap, array + left, right * sizeof(VAR));\n\t\t\tmemmove(array + right, array, left * sizeof(VAR));\n\t\t\tmemcpy(array, swap, right * sizeof(VAR));\n\t\t}\n\t\telse\n\t\t{\n\t\t\tVAR *pta, *ptb, *ptc, *ptd;\n\n\t\t\tpta = array;\n\t\t\tptb = pta + left;\n\n\t\t\tbridge = left - right;\n\n\t\t\tif (bridge <= swap_size && bridge > 3)\n\t\t\t{\n\t\t\t\tptc = pta + right;\n\t\t\t\tptd = ptc + left;\n\n\t\t\t\tmemcpy(swap, ptc, bridge * sizeof(VAR));\n\n\t\t\t\twhile (right--)\n\t\t\t\t{\n\t\t\t\t\t*ptc++ = *pta; *pta++ = *ptb++;\n\t\t\t\t}\n\t\t\t\tmemcpy(ptd - bridge, swap, bridge * sizeof(VAR));\n\t\t\t}\n\t\t\telse\n\t\t\t{\n\t\t\t\tptc = ptb;\n\t\t\t\tptd = ptc + right;\n\n\t\t\t\tbridge = right / 2;\n\n\t\t\t\twhile (bridge--)\n\t\t\t\t{\n\t\t\t\t\ttemp = *--ptb; *ptb = *pta; *pta++ = *ptc; *ptc++ = *--ptd; *ptd = temp;\n\t\t\t\t}\n\n\t\t\t\tbridge = (ptb - pta) / 2;\n\n\t\t\t\twhile (bridge--)\n\t\t\t\t{\n\t\t\t\t\ttemp = *--ptb; *ptb = *pta; *pta++ = *--ptd; *ptd = temp;\n\t\t\t\t}\n\n\t\t\t\tbridge = (ptd - pta) / 2;\n\n\t\t\t\twhile (bridge--)\n\t\t\t\t{\n\t\t\t\t\ttemp = *pta; *pta++ = *--ptd; *ptd = temp;\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n\telse\n\t{\n\t\tVAR *pta, *ptb;\n\n\t\tpta = array;\n\t\tptb = pta + left;\n\n\t\twhile (left--)\n\t\t{\n\t\t\ttemp = *pta; 
*pta++ = *ptb; *ptb++ = temp;\n\t\t}\n\t}\n}\n\nsize_t FUNC(monobound_binary_first)(VAR *array, VAR *value, size_t top, CMPFUNC *cmp)\n{\n\tVAR *end;\n\tsize_t mid;\n\n\tend = array + top;\n\n\twhile (top > 1)\n\t{\n\t\tmid = top / 2;\n\n\t\tif (cmp(value, end - mid) <= 0)\n\t\t{\n\t\t\tend -= mid;\n\t\t}\n\t\ttop -= mid;\n\t}\n\n\tif (cmp(value, end - 1) <= 0)\n\t{\n\t\tend--;\n\t}\n\treturn (end - array);\n}\n\nvoid FUNC(rotate_merge_block)(VAR *array, VAR *swap, size_t swap_size, size_t lblock, size_t right, CMPFUNC *cmp)\n{\n\tsize_t left, rblock, unbalanced;\n\n\tif (cmp(array + lblock - 1, array + lblock) <= 0)\n\t{\n\t\treturn;\n\t}\n\n\trblock = lblock / 2;\n\tlblock -= rblock;\n\n\tleft = FUNC(monobound_binary_first)(array + lblock + rblock, array + lblock, right, cmp);\n\n\tright -= left;\n\n\t// [ lblock ] [ rblock ] [ left ] [ right ]\n\n\tif (left)\n\t{\n\t\tif (lblock + left <= swap_size)\n\t\t{\n\t\t\tmemcpy(swap, array, lblock * sizeof(VAR));\n\t\t\tmemcpy(swap + lblock, array + lblock + rblock, left * sizeof(VAR));\n\t\t\tmemmove(array + lblock + left, array + lblock, rblock * sizeof(VAR));\n\n\t\t\tFUNC(cross_merge)(array, swap, lblock, left, cmp);\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(trinity_rotation)(array + lblock, swap, swap_size, rblock + left, rblock);\n\n\t\t\tunbalanced = (left * 2 < lblock) | (lblock * 2 < left);\n\n\t\t\tif (unbalanced && left <= swap_size)\n\t\t\t{\n\t\t\t\tFUNC(partial_backward_merge)(array, swap, swap_size, lblock + left, lblock, cmp);\n\t\t\t}\n\t\t\telse if (unbalanced && lblock <= swap_size)\n\t\t\t{\n\t\t\t\tFUNC(partial_forward_merge)(array, swap, swap_size, lblock + left, lblock, cmp);\n\t\t\t}\n\t\t\telse\n\t\t\t{\n\t\t\t\tFUNC(rotate_merge_block)(array, swap, swap_size, lblock, left, cmp);\n\t\t\t}\n\t\t}\n\t}\n\n\tif (right)\n\t{\n\t\tunbalanced = (right * 2 < rblock) | (rblock * 2 < right);\n\n\t\tif ((unbalanced && right <= swap_size) || right + rblock <= 
swap_size)\n\t\t{\n\t\t\tFUNC(partial_backward_merge)(array + lblock + left, swap, swap_size, rblock + right, rblock, cmp);\n\t\t}\n\t\telse if (unbalanced && rblock <= swap_size)\n\t\t{\n\t\t\tFUNC(partial_forward_merge)(array + lblock + left, swap, swap_size, rblock + right, rblock, cmp);\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(rotate_merge_block)(array + lblock + left, swap, swap_size, rblock, right, cmp);\n\t\t}\n\t}\n}\n\nvoid FUNC(rotate_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp)\n{\n\tVAR *pta, *pte;\n\n\tpte = array + nmemb;\n\n\tif (nmemb <= block * 2 && nmemb - block <= swap_size)\n\t{\n\t\tFUNC(partial_backward_merge)(array, swap, swap_size, nmemb, block, cmp);\n\n\t\treturn;\n\t}\n\n\twhile (block < nmemb)\n\t{\n\t\tfor (pta = array ; pta + block < pte ; pta += block * 2)\n\t\t{\n\t\t\tif (pta + block * 2 < pte)\n\t\t\t{\n\t\t\t\tFUNC(rotate_merge_block)(pta, swap, swap_size, block, block, cmp);\n\n\t\t\t\tcontinue;\n\t\t\t}\n\t\t\tFUNC(rotate_merge_block)(pta, swap, swap_size, block, pte - pta - block, cmp);\n\n\t\t\tbreak;\n\t\t}\n\t\tblock *= 2;\n\t}\n}\n\n///////////////////////////////////////////////////////////////////////////////\n//┌─────────────────────────────────────────────────────────────────────────┐//\n//│    ██████┐ ██┐   ██┐ █████┐ ██████┐ ███████┐ ██████┐ ██████┐ ████████┐  │//\n//│   ██┌───██┐██│   ██│██┌──██┐██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘  │//\n//│   ██│   ██│██│   ██│███████│██│  ██│███████┐██│   ██│██████┌┘   ██│     │//\n//│   ██│▄▄ ██│██│   ██│██┌──██│██│  ██│└────██│██│   ██│██┌──██┐   ██│     │//\n//│   └██████┌┘└██████┌┘██│  ██│██████┌┘███████│└██████┌┘██│  ██│   ██│     │//\n//│    └──▀▀─┘  └─────┘ └─┘  └─┘└─────┘ └──────┘ └─────┘ └─┘  └─┘   └─┘     │//\n//└─────────────────────────────────────────────────────────────────────────┘//\n///////////////////////////////////////////////////////////////////////////////\n\nvoid FUNC(quadsort)(void *array, size_t nmemb, CMPFUNC 
*cmp)\n{\n\tVAR *pta = (VAR *) array;\n\n\tif (nmemb < 32)\n\t{\n\t\tVAR swap[nmemb];\n\n\t\tFUNC(tail_swap)(pta, swap, nmemb, cmp);\n\t}\n\telse if (FUNC(quad_swap)(pta, nmemb, cmp) == 0)\n\t{\n\t\tVAR *swap = NULL;\n\t\tsize_t block, swap_size = nmemb;\n\n\t\tif (nmemb > 4194304) for (swap_size = 4194304 ; swap_size * 8 <= nmemb ; swap_size *= 4) {}\n\n\t\tswap = (VAR *) malloc(swap_size * sizeof(VAR));\n\n\t\tif (swap == NULL)\n\t\t{\n\t\t\tVAR stack[512];\n\n\t\t\tblock = FUNC(quad_merge)(pta, stack, 512, nmemb, 32, cmp);\n\n\t\t\tFUNC(rotate_merge)(pta, stack, 512, nmemb, block, cmp);\n\n\t\t\treturn;\n\t\t}\n\t\tblock = FUNC(quad_merge)(pta, swap, swap_size, nmemb, 32, cmp);\n\n\t\tFUNC(rotate_merge)(pta, swap, swap_size, nmemb, block, cmp);\n\n\t\tfree(swap);\n\t}\n}\n\nvoid FUNC(quadsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tVAR *pta = (VAR *) array;\n\tVAR *pts = (VAR *) swap;\n\n\tif (nmemb <= 96)\n\t{\n\t\tFUNC(tail_swap)(pta, pts, nmemb, cmp);\n\t}\n\telse if (FUNC(quad_swap)(pta, nmemb, cmp) == 0)\n\t{\n\t\tsize_t block = FUNC(quad_merge)(pta, pts, swap_size, nmemb, 32, cmp);\n\n\t\tFUNC(rotate_merge)(pta, pts, swap_size, nmemb, block, cmp);\n\t}\n}\n"
  },
  {
    "path": "src/quadsort.h",
    "content": "// quadsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n#ifndef QUADSORT_H\n#define QUADSORT_H\n\n#include <stdlib.h>\n#include <stdio.h>\n#include <assert.h>\n#include <errno.h>\n#include <float.h>\n#include <string.h>\n\n//#include <stdalign.h>\n\ntypedef int CMPFUNC (const void *a, const void *b);\n\n//#define cmp(a,b) (*(a) > *(b))\n\n\n// When sorting an array of pointers, like a string array, the QUAD_CACHE needs\n// to be set for proper performance when sorting large arrays.\n// quadsort_prim() can be used to sort arrays of 32 and 64 bit integers\n// without a comparison function or cache restrictions.\n\n// With a 6 MB L3 cache a value of 262144 works well.\n\n#ifdef cmp\n  #define QUAD_CACHE 4294967295\n#else\n//#define QUAD_CACHE 131072\n  #define QUAD_CACHE 262144\n//#define QUAD_CACHE 524288\n//#define QUAD_CACHE 4294967295\n#endif\n\n// utilize branchless ternary operations in clang\n\n#if !defined __clang__\n#define head_branchless_merge(ptd, x, ptl, ptr, cmp)  \\\n\tx = cmp(ptl, ptr) <= 0;  \\\n\t*ptd = *ptl;  \\\n\tptl += x;  \\\n\tptd[x] = *ptr;  \\\n\tptr += !x;  \\\n\tptd++;\n#else\n#define head_branchless_merge(ptd, x, ptl, ptr, cmp)  \\\n\t*ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++;\n#endif\n\n#if !defined __clang__\n#define tail_branchless_merge(tpd, y, tpl, tpr, cmp)  \\\n\ty = cmp(tpl, tpr) <= 0;  \\\n\t*tpd = *tpl;  \\\n\ttpl -= !y;  \\\n\ttpd--;  \\\n\ttpd[y] = *tpr;  \\\n\ttpr -= y;\n#else\n#define tail_branchless_merge(tpd, x, tpl, tpr, cmp)  \\\n\t*tpd-- = cmp(tpl, tpr) > 0 ? *tpl-- : *tpr--;\n#endif\n\n// guarantee small parity merges are inlined with minimal overhead\n\n#define parity_merge_two(array, swap, x, ptl, ptr, pts, cmp)  \\\n\tptl = array; ptr = array + 2; pts = swap;  \\\n\thead_branchless_merge(pts, x, ptl, ptr, cmp);  \\\n\t*pts = cmp(ptl, ptr) <= 0 ? 
*ptl : *ptr;  \\\n  \\\n\tptl = array + 1; ptr = array + 3; pts = swap + 3;  \\\n\ttail_branchless_merge(pts, x, ptl, ptr, cmp);  \\\n\t*pts = cmp(ptl, ptr)  > 0 ? *ptl : *ptr;\n\n#define parity_merge_four(array, swap, x, ptl, ptr, pts, cmp)  \\\n\tptl = array + 0; ptr = array + 4; pts = swap;  \\\n\thead_branchless_merge(pts, x, ptl, ptr, cmp);  \\\n\thead_branchless_merge(pts, x, ptl, ptr, cmp);  \\\n\thead_branchless_merge(pts, x, ptl, ptr, cmp);  \\\n\t*pts = cmp(ptl, ptr) <= 0 ? *ptl : *ptr;  \\\n  \\\n\tptl = array + 3; ptr = array + 7; pts = swap + 7;  \\\n\ttail_branchless_merge(pts, x, ptl, ptr, cmp);  \\\n\ttail_branchless_merge(pts, x, ptl, ptr, cmp);  \\\n\ttail_branchless_merge(pts, x, ptl, ptr, cmp);  \\\n\t*pts = cmp(ptl, ptr)  > 0 ? *ptl : *ptr;\n\n\n#if !defined __clang__\n#define branchless_swap(pta, swap, x, cmp)  \\\n\tx = cmp(pta, pta + 1) > 0;  \\\n\tswap = pta[!x];  \\\n\tpta[0] = pta[x];  \\\n\tpta[1] = swap;\n#else\n#define branchless_swap(pta, swap, x, cmp)  \\\n\tx = 0;  \\\n\tswap = cmp(pta, pta + 1) > 0 ? 
pta[x++] : pta[1];  \\\n\tpta[0] = pta[x];  \\\n\tpta[1] = swap;\n#endif\n\n#define swap_branchless(pta, swap, x, y, cmp)  \\\n\tx = cmp(pta, pta + 1) > 0;  \\\n\ty = !x;  \\\n\tswap = pta[y];  \\\n\tpta[0] = pta[x];  \\\n\tpta[1] = swap;\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │       ██████┐ ██████┐    ██████┐ ██████┐████████┐ │//\n// │       └────██┐└────██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │        █████┌┘ █████┌┘   ██████┌┘  ██│     ██│    │//\n// │        └───██┐██┌───┘    ██┌──██┐  ██│     ██│    │//\n// │       ██████┌┘███████┐   ██████┌┘██████┐   ██│    │//\n// │       └─────┘ └──────┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR int\n#define FUNC(NAME) NAME##32\n\n#include \"quadsort.c\"\n\n#undef VAR\n#undef FUNC\n\n// quadsort_prim\n\n#define VAR int\n#define FUNC(NAME) NAME##_int32\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"quadsort.c\"\n  #undef cmp\n#else\n  #include \"quadsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n#define VAR unsigned int\n#define FUNC(NAME) NAME##_uint32\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"quadsort.c\"\n  #undef cmp\n#else\n  #include \"quadsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │        █████┐ ██┐  ██┐   ██████┐ ██████┐████████┐ │//\n// │       ██┌───┘ ██│  ██│   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │       ██████┐ ███████│   ██████┌┘  ██│     ██│    │//\n// │       ██┌──██┐└────██│   ██┌──██┐  ██│     ██│    │//\n// │       └█████┌┘     ██│   ██████┌┘██████┐   ██│    │//\n// │        └────┘      └─┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define 
VAR long long\n#define FUNC(NAME) NAME##64\n\n#include \"quadsort.c\"\n\n#undef VAR\n#undef FUNC\n\n// quadsort_prim\n\n#define VAR long long\n#define FUNC(NAME) NAME##_int64\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"quadsort.c\"\n  #undef cmp\n#else\n  #include \"quadsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n#define VAR unsigned long long\n#define FUNC(NAME) NAME##_uint64\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"quadsort.c\"\n  #undef cmp\n#else\n  #include \"quadsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n// This section is outside of 32/64 bit pointer territory, so no cache checks\n// necessary, unless sorting 32+ byte structures.\n\n#undef QUAD_CACHE\n#define QUAD_CACHE 4294967295\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│                █████┐    ██████┐ ██████┐████████┐  │//\n//│               ██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘  │//\n//│               └█████┌┘   ██████┌┘  ██│     ██│     │//\n//│               ██┌──██┐   ██┌──██┐  ██│     ██│     │//\n//│               └█████┌┘   ██████┌┘██████┐   ██│     │//\n//│                └────┘    └─────┘ └─────┘   └─┘     │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR char\n#define FUNC(NAME) NAME##8\n\n#include \"quadsort.c\"\n\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│           ▄██┐   █████┐    ██████┐ ██████┐████████┐│//\n//│          ████│  ██┌───┘    ██┌──██┐└─██┌─┘└──██┌──┘│//\n//│          └─██│  ██████┐    ██████┌┘  ██│     ██│   │//\n//│            ██│  ██┌──██┐   ██┌──██┐  ██│     ██│   │//\n//│          ██████┐└█████┌┘   ██████┌┘██████┐   ██│   │//\n//│          └─────┘ └────┘    └─────┘ └─────┘   └─┘   
│//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR short\n#define FUNC(NAME) NAME##16\n\n#include \"quadsort.c\"\n\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│  ▄██┐  ██████┐  █████┐    ██████┐ ██████┐████████┐ │//\n//│ ████│  └────██┐██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n//│ └─██│   █████┌┘└█████┌┘   ██████┌┘  ██│     ██│    │//\n//│   ██│  ██┌───┘ ██┌──██┐   ██┌──██┐  ██│     ██│    │//\n//│ ██████┐███████┐└█████┌┘   ██████┌┘██████┐   ██│    │//\n//│ └─────┘└──────┘ └────┘    └─────┘ └─────┘   └─┘    │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n// 128 reflects the name, though the actual size of a long double is 64, 80,\n// 96, or 128 bits, depending on platform.\n\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n  #define VAR long double\n  #define FUNC(NAME) NAME##128\n  #include \"quadsort.c\"\n  #undef VAR\n  #undef FUNC\n#endif\n\n///////////////////////////////////////////////////////////\n//┌─────────────────────────────────────────────────────┐//\n//│ ██████┐██┐   ██┐███████┐████████┐ ██████┐ ███┐  ███┐│//\n//│██┌────┘██│   ██│██┌────┘└──██┌──┘██┌───██┐████┐████││//\n//│██│     ██│   ██│███████┐   ██│   ██│   ██│██┌███┌██││//\n//│██│     ██│   ██│└────██│   ██│   ██│   ██│██│└█┌┘██││//\n//│└██████┐└██████┌┘███████│   ██│   └██████┌┘██│ └┘ ██││//\n//│ └─────┘ └─────┘ └──────┘   └─┘    └─────┘ └─┘    └─┘│//\n//└─────────────────────────────────────────────────────┘//\n///////////////////////////////////////////////////////////\n\n/*\ntypedef struct {char bytes[32];} struct256;\n#define VAR struct256\n#define FUNC(NAME) NAME##256\n\n#include \"quadsort.c\"\n\n#undef VAR\n#undef 
FUNC\n*/\n\n///////////////////////////////////////////////////////////////////////////////\n//┌─────────────────────────────────────────────────────────────────────────┐//\n//│    ██████┐ ██┐   ██┐ █████┐ ██████┐ ███████┐ ██████┐ ██████┐ ████████┐  │//\n//│   ██┌───██┐██│   ██│██┌──██┐██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘  │//\n//│   ██│   ██│██│   ██│███████│██│  ██│███████┐██│   ██│██████┌┘   ██│     │//\n//│   ██│▄▄ ██│██│   ██│██┌──██│██│  ██│└────██│██│   ██│██┌──██┐   ██│     │//\n//│   └██████┌┘└██████┌┘██│  ██│██████┌┘███████│└██████┌┘██│  ██│   ██│     │//\n//│    └──▀▀─┘  └─────┘ └─┘  └─┘└─────┘ └──────┘ └─────┘ └─┘  └─┘   └─┘     │//\n//└─────────────────────────────────────────────────────────────────────────┘//\n///////////////////////////////////////////////////////////////////////////////\n\n\nvoid quadsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase sizeof(char):\n\t\t\tquadsort8(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(short):\n\t\t\tquadsort16(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(int):\n\t\t\tquadsort32(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(long long):\n\t\t\tquadsort64(array, nmemb, cmp);\n\t\t\treturn;\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n\t\tcase sizeof(long double):\n\t\t\tquadsort128(array, nmemb, cmp);\n\t\t\treturn;\n#endif\n//\t\tcase sizeof(struct256):\n//\t\t\tquadsort256(array, nmemb, cmp);\n//\t\t\treturn;\n\n\t\tdefault:\n#if (DBL_MANT_DIG < LDBL_MANT_DIG)\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double));\n#else\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long));\n#endif\n//\t\t\tqsort(array, nmemb, size, cmp);\n\t}\n}\n\n// suggested size values for primitives:\n\n//\t\tcase  0: unsigned char\n//\t\tcase  1: signed char\n//\t\tcase  2: signed 
short\n//\t\tcase  3: unsigned short\n//\t\tcase  4: signed int\n//\t\tcase  5: unsigned int\n//\t\tcase  6: float\n//\t\tcase  7: double\n//\t\tcase  8: signed long long\n//\t\tcase  9: unsigned long long\n//\t\tcase  ?: long double, use sizeof(long double):\n\nvoid quadsort_prim(void *array, size_t nmemb, size_t size)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase 4:\n\t\t\tquadsort_int32(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 5:\n\t\t\tquadsort_uint32(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 8:\n\t\t\tquadsort_int64(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 9:\n\t\t\tquadsort_uint64(array, nmemb, NULL);\n\t\t\treturn;\n\t\tdefault:\n\t\t\tassert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1);\n\t\t\treturn;\n\t}\n}\n\n// Sort arrays of structures, the comparison function must be by reference.\n\nvoid quadsort_size(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)\n{\n\tchar **pti, *pta, *pts;\n\tsize_t index, offset;\n\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\tpta = (char *) array;\n\tpti = (char **) malloc(nmemb * sizeof(char *));\n\n\tassert(pti != NULL);\n\n\tfor (index = offset = 0 ; index < nmemb ; index++)\n\t{\n\t\tpti[index] = pta + offset;\n\n\t\toffset += size;\n\t}\n\n\tswitch (sizeof(size_t))\n\t{\n\t\tcase 4: quadsort32(pti, nmemb, cmp); break;\n\t\tcase 8: quadsort64(pti, nmemb, cmp); break;\n\t}\n\n\tpts = (char *) malloc(nmemb * size);\n\n\tassert(pts != NULL);\n\t\n\tfor (index = 0 ; index < nmemb ; index++)\n\t{\n\t\tmemcpy(pts, pti[index], size);\n\n\t\tpts += size;\n\t}\n\tpts -= nmemb * size;\n\n\tmemcpy(array, pts, nmemb * size);\n\n\tfree(pti);\n\tfree(pts);\n}\n\n#undef QUAD_CACHE\n\n#endif\n"
  },
  {
    "path": "src/skipsort.c",
    "content": "// skipsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\nvoid FUNC(skip_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *ptp, size_t nmemb, CMPFUNC *cmp);\n\n// Similar to quadsort, but detect both random and reverse order runs\n\nint FUNC(skip_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tsize_t count, span;\n\tVAR *pta, *pts;\n\tunsigned char v1, v2, v3, v4, x;\n\tpta = array;\n\n\tcount = nmemb / 8;\n\n\twhile (count--)\n\t{\n\t\t// granular\n\n\t\tv1 = cmp(pta + 0, pta + 1) > 0;\n\t\tv2 = cmp(pta + 2, pta + 3) > 0;\n\t\tv3 = cmp(pta + 4, pta + 5) > 0;\n\t\tv4 = cmp(pta + 6, pta + 7) > 0;\n\n\t\tswitch (v1 + v2 * 2 + v3 * 4 + v4 * 8)\n\t\t{\n\t\t\tcase 0:\n\t\t\t\tif (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0)\n\t\t\t\t{\n\t\t\t\t\tgoto ordered;\n\t\t\t\t}\n\t\t\t\tpts = pta;\n\t\t\t\tgoto random;\n\n\t\t\tcase 15:\n\t\t\t\tif (cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0)\n\t\t\t\t{\n\t\t\t\t\tpts = pta;\n\t\t\t\t\tgoto reversed;\n\t\t\t\t}\n\n\t\t\tdefault:\n\t\t\t\tpts = pta;\n\t\t\t\tgoto random;\n\t\t}\n\n\t\trandom: // random\n\n\t\tpta += 8;\n\n\t\tif (count--)\n\t\t{\n\t\t\tv1 = cmp(pta + 0, pta + 1) > 0;\n\t\t\tv2 = cmp(pta + 2, pta + 3) > 0;\n\t\t\tv3 = cmp(pta + 4, pta + 5) > 0;\n\t\t\tv4 = cmp(pta + 6, pta + 7) > 0;\n\n\t\t\tswitch (v1 + v2 * 2 + v3 * 4 + v4 * 8)\n\t\t\t{\n\t\t\t\tcase 0:\n\t\t\t\t\tif (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0)\n\t\t\t\t\t{\n\t\t\t\t\t\tif (count)\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tpta += 8;\n\t\t\t\t\t\t\tif (cmp(pta + 0, pta + 1) <= 0 && cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 2, pta + 3) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 4, pta + 5) <= 0 && cmp(pta + 5, pta + 6) <= 0 && cmp(pta + 6, pta + 7) <= 0)\n\t\t\t\t\t\t\t{\n\t\t\t\t\t\t\t\tpta -= 
8;\n\t\t\t\t\t\t\t\tbreak;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\tcount--;\n\t\t\t\t\t\t}\n\t\t\t\t\t}\n\t\t\t\t\tgoto randomc;\n\n\t\t\t\tcase 15:\n\t\t\t\t\tif (cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0)\n\t\t\t\t\t{\n\t\t\t\t\t\tbreak;\n\t\t\t\t\t}\n\n\t\t\t\tdefault:\n\t\t\t\trandomc:\n\t\t\t\t\tif (count >= 6)\n\t\t\t\t\t{\n\t\t\t\t\t\tcount -= 6;\n\t\t\t\t\t\tpta += 48;\n\t\t\t\t\t}\n\t\t\t\t\tgoto random;\n\t\t\t}\n\t\t\tspan = (pta - pts);\n\n\t\t\tif (span <= 96)\n\t\t\t{\n\t\t\t\tFUNC(tail_swap)(pts, swap, span, cmp);\n\t\t\t}\n\t\t\telse\n\t\t\t{\n\t\t\t\tFUNC(flux_partition)(pts, swap, pts, swap + span, span, cmp);\n\t\t\t}\n\n\t\t\tif (v1 | v2 | v3 | v4)\n\t\t\t{\n\t\t\t\tpts = pta;\n\t\t\t\tgoto reversed;\n\t\t\t}\n\t\t\tpta += 8;\n\t\t\tcount--;\n\t\t\tgoto ordered;\n\t\t}\n\t\tspan = (pta - pts);\n\n\t\tif (span <= 96)\n\t\t{\n\t\t\tFUNC(tail_swap)(pts, swap, span, cmp);\n\t\t\tbreak;\n\t\t}\n\t\tif (pts == array)\n\t\t{\n\t\t\tFUNC(flux_partition)(array, swap, pts, swap + nmemb, nmemb, cmp);\n\t\t\treturn 1;\n\t\t}\n\t\tFUNC(flux_partition)(pts, swap, pts, swap + span, span, cmp);\n\t\tbreak;\n\n\t\tordered: // ordered \n\n\t\tpta += 8;\n\n\t\tif (count--)\n\t\t{\n\t\t\tif ((v1 = cmp(pta + 0, pta + 1) > 0) | (v2 = cmp(pta + 2, pta + 3) > 0) | (v3 = cmp(pta + 4, pta + 5) > 0) | (v4 = cmp(pta + 6, pta + 7) > 0))\n\t\t\t{\n\t\t\t\tpts = pta;\n\t\t\t\tgoto random;\n\t\t\t}\n\t\t\tif (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0)\n\t\t\t{\n\t\t\t\tgoto ordered;\n\t\t\t}\n\t\t\tFUNC(quad_swap_merge)(pta, swap, cmp);\n\t\t\tpta += 8;\n\t\t\tcontinue;\n\t\t}\n\t\tbreak;\n\n\t\treversed: // reversed\n\n\t\tpta += 8;\n\n\t\tif (count--)\n\t\t{\n\t\t\tif ((v1 = cmp(pta + 0, pta + 1) <= 0) | (v2 = cmp(pta + 2, pta + 3) <= 0) | (v3 = cmp(pta + 4, pta + 5) <= 0) | (v4 = cmp(pta + 6, pta + 7) <= 0))\n\t\t\t{\n\t\t\t\tnot_reversed:\n\n\t\t\t\tx = !v1; swap[0] = pta[v1]; pta[0] = pta[x]; 
pta[1] = swap[0]; pta += 2;\n\t\t\t\tx = !v2; swap[0] = pta[v2]; pta[0] = pta[x]; pta[1] = swap[0]; pta += 2;\n\t\t\t\tx = !v3; swap[0] = pta[v3]; pta[0] = pta[x]; pta[1] = swap[0]; pta += 2;\n\t\t\t\tx = !v4; swap[0] = pta[v4]; pta[0] = pta[x]; pta[1] = swap[0]; pta -= 6;\n\n\t\t\t\tif (cmp(pta + 1, pta + 2) > 0 || cmp(pta + 3, pta + 4) > 0 || cmp(pta + 5, pta + 6) > 0)\n\t\t\t\t{\n\t\t\t\t\tFUNC(quad_swap_merge)(pta, swap, cmp);\n\t\t\t\t}\n\t\t\t}\n\t\t\telse\n\t\t\t{\n\t\t\t\tif (cmp(pta - 1, pta) > 0 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0)\n\t\t\t\t{\n\t\t\t\t\tgoto reversed;\n\t\t\t\t}\n\t\t\t\tgoto not_reversed;\n\t\t\t}\n\t\t\tFUNC(quad_reversal)(pts, pta - 1);\n\t\t\tpta += 8;\n\t\t\tcontinue;\n\t\t}\n\n\t\tswitch (nmemb % 8)\n\t\t{\n\t\t\tcase 7: if (cmp(pta + 5, pta + 6) <= 0) break;\n\t\t\tcase 6: if (cmp(pta + 4, pta + 5) <= 0) break;\n\t\t\tcase 5: if (cmp(pta + 3, pta + 4) <= 0) break;\n\t\t\tcase 4: if (cmp(pta + 2, pta + 3) <= 0) break;\n\t\t\tcase 3: if (cmp(pta + 1, pta + 2) <= 0) break;\n\t\t\tcase 2: if (cmp(pta + 0, pta + 1) <= 0) break;\n\t\t\tcase 1: if (cmp(pta - 1, pta + 0) <= 0) break;\n\t\t\tcase 0:\n\t\t\t\tFUNC(quad_reversal)(pts, pta + nmemb % 8 - 1);\n\n\t\t\t\tif (pts == array)\n\t\t\t\t{\n\t\t\t\t\treturn 1;\n\t\t\t\t}\n\t\t\t\tgoto reverse_end;\n\t\t}\n\t\tFUNC(quad_reversal)(pts, pta - 1);\n\t\tbreak;\n\t}\n\tFUNC(tail_swap)(pta, swap, nmemb % 8, cmp);\n\n\treverse_end:\n\n\tpta = array;\n\n\tfor (count = nmemb / 32 ; count-- ; pta += 32)\n\t{\n\t\tif (cmp(pta + 7, pta + 8) <= 0 && cmp(pta + 15, pta + 16) <= 0 && cmp(pta + 23, pta + 24) <= 0)\n\t\t{\n\t\t\tcontinue;\n\t\t}\n\t\tFUNC(parity_merge)(swap, pta, 8, 8, cmp);\n\t\tFUNC(parity_merge)(swap + 16, pta + 16, 8, 8, cmp);\n\t\tFUNC(parity_merge)(pta, swap, 16, 16, cmp);\n\t}\n\n\tif (nmemb % 32 > 8)\n\t{\n\t\tFUNC(tail_merge)(pta, swap, 32, nmemb % 32, 8, cmp);\n\t}\n\treturn 0;\n}\n\nvoid FUNC(skipsort)(void *array, size_t 
nmemb, CMPFUNC *cmp)\n{\n\tVAR *pta = (VAR *) array;\n\n\tif (nmemb <= 96)\n\t{\n\t\tVAR swap[nmemb];\n\n\t\tFUNC(tail_swap)(pta, swap, nmemb, cmp);\n\t}\n\telse\n\t{\n\t\tVAR *swap = (VAR *) malloc(nmemb * sizeof(VAR));\n\n\t\tif (swap == NULL)\n\t\t{\n\t\t\tFUNC(quadsort)(pta, nmemb, cmp);\n\t\t\treturn;\n\t\t}\n\t\tif (FUNC(skip_analyze)(pta, swap, nmemb, nmemb, cmp) == 0)\n\t\t{\n\t\t\tFUNC(quad_merge)(pta, swap, nmemb, nmemb, 32, cmp);\n\t\t}\n\t\tfree(swap);\n\t}\n}\n\nvoid FUNC(skipsort_swap)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb <= 96)\n\t{\n\t\tFUNC(tail_swap)(array, swap, nmemb, cmp);\n\t}\n\telse if (swap_size < nmemb)\n\t{\n\t\tFUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp);\n\t}\n\telse\n\t{\n\t\tFUNC(skip_analyze)(array, swap, swap_size, nmemb, cmp);\n\t}\n}\n"
  },
  {
    "path": "src/skipsort.h",
    "content": "// skipsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n#ifndef SKIPSORT_H\n#define SKIPSORT_H\n\n#include <stdlib.h>\n#include <stdio.h>\n#include <assert.h>\n#include <errno.h>\n\ntypedef int CMPFUNC (const void *a, const void *b);\n\n//#define cmp(a,b) (*(a) > *(b))\n\n#ifndef QUADSORT_H\n  #include \"quadsort.h\"\n#endif\n#ifndef FLUXSORT_H\n  #include \"fluxsort.h\"\n#endif\n\n// When sorting an array of pointers, like a string array, QUAD_CACHE needs to\n// be adjusted in quadsort.h for proper performance when sorting large arrays.\n\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│                █████┐    ██████┐ ██████┐████████┐  │//\n//│               ██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘  │//\n//│               └█████┌┘   ██████┌┘  ██│     ██│     │//\n//│               ██┌──██┐   ██┌──██┐  ██│     ██│     │//\n//│               └█████┌┘   ██████┌┘██████┐   ██│     │//\n//│                └────┘    └─────┘ └─────┘   └─┘     │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR char\n#define FUNC(NAME) NAME##8\n\n#include \"skipsort.c\"\n\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│           ▄██┐   █████┐    ██████┐ ██████┐████████┐│//\n//│          ████│  ██┌───┘    ██┌──██┐└─██┌─┘└──██┌──┘│//\n//│          └─██│  ██████┐    ██████┌┘  ██│     ██│   │//\n//│            ██│  ██┌──██┐   ██┌──██┐  ██│     ██│   │//\n//│          ██████┐└█████┌┘   ██████┌┘██████┐   ██│   │//\n//│          └─────┘ └────┘    └─────┘ └─────┘   └─┘   │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR short\n#define FUNC(NAME) NAME##16\n\n#include \"skipsort.c\"\n\n#undef VAR\n#undef 
FUNC\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │       ██████┐ ██████┐    ██████┐ ██████┐████████┐ │//\n// │       └────██┐└────██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │        █████┌┘ █████┌┘   ██████┌┘  ██│     ██│    │//\n// │        └───██┐██┌───┘    ██┌──██┐  ██│     ██│    │//\n// │       ██████┌┘███████┐   ██████┌┘██████┐   ██│    │//\n// │       └─────┘ └──────┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR int\n#define FUNC(NAME) NAME##32\n\n#include \"skipsort.c\"\n\n#undef VAR\n#undef FUNC\n\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n\n  #define VAR int\n  #define FUNC(NAME) NAME##_int32\n\n  #include \"skipsort.c\"\n\n  #undef VAR\n  #undef FUNC\n\n  #undef cmp\n#endif\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │        █████┐ ██┐  ██┐   ██████┐ ██████┐████████┐ │//\n// │       ██┌───┘ ██│  ██│   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │       ██████┐ ███████│   ██████┌┘  ██│     ██│    │//\n// │       ██┌──██┐└────██│   ██┌──██┐  ██│     ██│    │//\n// │       └█████┌┘     ██│   ██████┌┘██████┐   ██│    │//\n// │        └────┘      └─┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR long long\n#define FUNC(NAME) NAME##64\n\n#include \"skipsort.c\"\n\n#undef VAR\n#undef FUNC\n\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n\n  #define VAR long long\n  #define FUNC(NAME) NAME##_int64\n\n  #include \"skipsort.c\"\n\n  #undef VAR\n  #undef FUNC\n\n  #undef cmp\n#endif\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│  ▄██┐  ██████┐  █████┐    ██████┐ ██████┐████████┐ │//\n//│ ████│  
└────██┐██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n//│ └─██│   █████┌┘└█████┌┘   ██████┌┘  ██│     ██│    │//\n//│   ██│  ██┌───┘ ██┌──██┐   ██┌──██┐  ██│     ██│    │//\n//│ ██████┐███████┐└█████┌┘   ██████┌┘██████┐   ██│    │//\n//│ └─────┘└──────┘ └────┘    └─────┘ └─────┘   └─┘    │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR long double\n#define FUNC(NAME) NAME##128\n\n#include \"skipsort.c\"\n\n#undef VAR\n#undef FUNC\n\n////////////////////////////////////////////////////////////////////////\n//┌──────────────────────────────────────────────────────────────────┐//\n//│███████┐██┐  ██┐██████┐██████┐ ███████┐ ██████┐ ██████┐ ████████┐ │//\n//│██┌────┘██│ ██┌┘└─██┌─┘██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │//\n//│███████┐█████┌┘   ██│  ██████┌┘███████┐██│   ██│██████┌┘   ██│    │//\n//│└────██│██┌─██┐   ██│  ██┌───┘ └────██│██│   ██│██┌──██┐   ██│    │//\n//│███████│██│  ██┐██████┐██│     ███████│└██████┌┘██│  ██│   ██│    │//\n//│└──────┘└─┘  └─┘└─────┘└─┘     └──────┘ └─────┘ └─┘  └─┘   └─┘    │//\n//└──────────────────────────────────────────────────────────────────┘//\n////////////////////////////////////////////////////////////////////////\n\nvoid skipsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n#ifndef cmp\n\tif (cmp == NULL)\n\t{\n\t\tswitch (size)\n\t\t{\n\t\t\tcase sizeof(int):\n\t\t\t\treturn skipsort_int32(array, nmemb, cmp);\n\t\t\tcase sizeof(long long):\n\t\t\t\treturn skipsort_int64(array, nmemb, cmp);\n\t\t}\n\t\treturn assert(size == sizeof(int));\n\t}\n#endif\n\n\tswitch (size)\n\t{\n\t\tcase sizeof(char):\n\t\t\treturn skipsort8(array, nmemb, cmp);\n\n\t\tcase sizeof(short):\n\t\t\treturn skipsort16(array, nmemb, cmp);\n\n\t\tcase sizeof(int):\n\t\t\treturn skipsort32(array, nmemb, cmp);\n\n\t\tcase sizeof(long long):\n\t\t\treturn skipsort64(array, nmemb, cmp);\n\n\t\tcase sizeof(long 
double):\n\t\t\treturn skipsort128(array, nmemb, cmp);\n\n\t\tdefault:\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double));\n\t\t\treturn;\n\t}\n}\n\n#endif\n"
  },
  {
    "path": "src/wolfsort.c",
    "content": "// wolfsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n//#define GODMODE \n\n#ifdef GODMODE // inspired by rhsort, technically unstable.\n\nvoid FUNC(unstable_count)(VAR *array, size_t nmemb, size_t buckets, VAR min, CMPFUNC *cmp)\n{\n\tVAR *pta;\n\tsize_t index;\n\tsize_t *count = (size_t *) calloc(sizeof(size_t), buckets), loop;\n\n\tpta = array;\n\n\tfor (index = nmemb / 16 ; index ; index--)\n\t{\n\t\tfor (loop = 16 ; loop ; loop--)\n\t\t{\n\t\t\tcount[*pta++ - min]++;\n\t\t}\n\t}\n\n\tfor (index = nmemb % 16 ; index ; index--)\n\t{\n\t\tcount[*pta++ - min]++;\n\t}\n\n\tpta = array;\n\n\tfor (index = 0 ; index < buckets ; index++)\n\t{\n\t\tfor (loop = count[index] ; loop ; loop--)\n\t\t{\n\t\t\t*pta++ = index + min;\n\t\t}\n\t}\n\n\tfree(count);\n\n\treturn;\n}\n#endif\n\ninline void FUNC(wolf_unguarded_insert)(VAR *array, size_t offset, size_t nmemb, CMPFUNC *cmp)\n{\n\tVAR key, *pta, *end;\n\tsize_t i, top, x, y;\n\n\tfor (i = offset ; i < nmemb ; i++)\n\t{\n\t\tpta = end = array + i;\n\n\t\tif (cmp(--pta, end) <= 0)\n\t\t{\n\t\t\tcontinue;\n\t\t}\n\n\t\tkey = *end;\n\n\t\tif (cmp(array + 1, &key) > 0)\n\t\t{\n\t\t\ttop = i - 1;\n\n\t\t\tdo\n\t\t\t{\n\t\t\t\t*end-- = *pta--;\n\t\t\t}\n\t\t\twhile (--top);\n\n\t\t\t*end-- = key;\n\t\t}\n\t\telse\n\t\t{\n\t\t\tdo\n\t\t\t{\n\t\t\t\t*end-- = *pta--;\n\t\t\t\t*end-- = *pta--;\n\t\t\t}\n\t\t\twhile (cmp(pta, &key) > 0);\n\n\t\t\tend[0] = end[1];\n\t\t\tend[1] = key;\n\t\t}\n\t\tx = cmp(end, end + 1) > 0; y = !x; key = end[y]; end[0] = end[x]; end[1] = key;\n\t}\n}\n\nvoid FUNC(wolfsort_swap)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp);\n\nvoid FUNC(wolf_partition)(VAR *array, VAR *aux, size_t aux_size, size_t nmemb, VAR min, VAR max, CMPFUNC *cmp)\n{\n\tVAR *swap, *pta, *pts, *ptd, range, moduler;\n\tsize_t index, cnt, loop, dmemb, buckets;\n\tunsigned int *count, limit;\n\n\tif (nmemb < 32)\n\t{\n\t\treturn FUNC(quadsort)(array, nmemb, cmp);\n\t}\n\n\trange = max - 
min;\n\n\tif (range >> 16 == 0 || (size_t) range <= nmemb / 4)\n\t{\n\t\tbuckets = range + 1;\n\t\tmoduler = 1;\n\t}\n\telse\n\t{\n\t\tbuckets = nmemb <= 4 * 65536 ? nmemb / 4 : 1024;\n\n\t\tfor (moduler = 4 ; (size_t) moduler <= range / buckets ; moduler *= 2) {}\n\n\t\tbuckets = range / moduler + 1;\n\t}\n\n\tlimit = (nmemb / buckets) * 4;\n\n\tcount = (unsigned int *) calloc(buckets, sizeof(unsigned int));\n\n\tswap = aux;\n\n\tif (limit * buckets > aux_size)\n\t{\n\t\tswap = (VAR *) malloc(limit * buckets * sizeof(VAR));\n\t}\n\n\tif (count == NULL || swap == NULL)\n\t{\n\t\t// Release whichever allocation succeeded before falling back; free(NULL) is a no-op.\n\t\tfree(count);\n\n\t\tif (limit * buckets > aux_size)\n\t\t{\n\t\t\tfree(swap);\n\t\t}\n\t\tFUNC(fluxsort_swap)(array, aux, aux_size, nmemb, cmp);\n\t\treturn;\n\t}\n\n\tptd = pta = array;\n\n\tfor (loop = nmemb ; loop ; loop--)\n\t{\n\t\tmax = *pta++;\n\n\t\tindex = (unsigned int) (max - min) / moduler;\n\n\t\tif (count[index] < limit)\n\t\t{\n\t\t\tswap[index * limit + count[index]++] = max;\n\t\t\tcontinue;\n\t\t}\n\t\t// The element doesn't fit, so we drop it to the main array. 
Inspired by rhsort.\n\t\t*ptd++ = max;\n\t}\n\n\tdmemb = ptd - array;\n\n\tif (dmemb)\n\t{\n\t\tptd = array + nmemb - dmemb;\n\n\t\tmemmove(ptd, array, dmemb * sizeof(VAR));\n\t}\n\tpta = array;\n\tpts = swap;\n\n\tfor (index = 0 ; index < buckets ; index++)\n\t{\n\t\tcnt = count[index];\n\n\t\tif (cnt)\n\t\t{\n\t\t\tmemcpy(pta, pts, cnt * sizeof(VAR));\n\n\t\t\tif (moduler > 1)\n\t\t\t{\n\t\t\t\tFUNC(wolfsort_swap)(pta, swap, limit + pts - swap, cnt, cmp);\n\t\t\t}\n\t\t\tpta += cnt;\n\t\t}\n\t\tpts += limit;\n\t}\n\n\tif (dmemb)\n\t{\n\t\tFUNC(fluxsort_swap)(ptd, swap, dmemb, dmemb, cmp);\n\n\t\tFUNC(partial_backward_merge)(array, swap, nmemb, nmemb, nmemb - dmemb, cmp);\n\t}\n\tif (limit * buckets > aux_size)\n\t{\n\t\tfree(swap);\n\t}\n\tfree(count);\n}\n\nvoid FUNC(wolf_minmax)(VAR *min, VAR *max, VAR *pta, VAR *ptb, VAR *ptc, VAR *ptd, CMPFUNC *cmp)\n{\n\tif (cmp(min, pta) > 0) *min = *pta;\n\tif (cmp(pta, max) > 0) *max = *pta;\n\tif (cmp(min, ptb) > 0) *min = *ptb;\n\tif (cmp(ptb, max) > 0) *max = *ptb;\n\tif (cmp(min, ptc) > 0) *min = *ptc;\n\tif (cmp(ptc, max) > 0) *max = *ptc;\n\tif (cmp(min, ptd) > 0) *min = *ptd;\n\tif (cmp(ptd, max) > 0) *max = *ptd;\n}\n\nvoid FUNC(wolf_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tunsigned char loop, asum, bsum, csum, dsum;\n\tunsigned int astreaks, bstreaks, cstreaks, dstreaks;\n\tsize_t quad1, quad2, quad3, quad4, half1, half2;\n\tsize_t cnt, abalance, bbalance, cbalance, dbalance;\n\tVAR min, max, *pta, *ptb, *ptc, *ptd;\n\n\thalf1 = nmemb / 2;\n\tquad1 = half1 / 2;\n\tquad2 = half1 - quad1;\n\thalf2 = nmemb - half1;\n\tquad3 = half2 / 2;\n\tquad4 = half2 - quad3;\n\n\tmin = max = array[nmemb - 1];\n\n\tpta = array;\n\tptb = array + quad1;\n\tptc = array + half1;\n\tptd = array + half1 + quad3;\n\n\tastreaks = bstreaks = cstreaks = dstreaks = 0;\n\tabalance = bbalance = cbalance = dbalance = 0;\n\n\tfor (cnt = nmemb ; cnt > 132 ; cnt -= 128)\n\t{\n\t\tfor (asum = bsum = csum 
= dsum = 0, loop = 32 ; loop ; loop--)\n\t\t{\n\t\t\tFUNC(wolf_minmax)(&min, &max, pta, ptb, ptc, ptd, cmp);\n\n\t\t\tasum += cmp(pta, pta + 1) > 0; pta++;\n\t\t\tbsum += cmp(ptb, ptb + 1) > 0; ptb++;\n\t\t\tcsum += cmp(ptc, ptc + 1) > 0; ptc++;\n\t\t\tdsum += cmp(ptd, ptd + 1) > 0; ptd++;\n\t\t}\n\t\tabalance += asum; astreaks += (asum == 0) | (asum == 32);\n\t\tbbalance += bsum; bstreaks += (bsum == 0) | (bsum == 32);\n\t\tcbalance += csum; cstreaks += (csum == 0) | (csum == 32);\n\t\tdbalance += dsum; dstreaks += (dsum == 0) | (dsum == 32);\n\t}\n\n\tfor ( ; cnt > 7 ; cnt -= 4)\n\t{\n\t\tFUNC(wolf_minmax)(&min, &max, pta, ptb, ptc, ptd, cmp);\n\n\t\tabalance += cmp(pta, pta + 1) > 0; pta++;\n\t\tbbalance += cmp(ptb, ptb + 1) > 0; ptb++;\n\t\tcbalance += cmp(ptc, ptc + 1) > 0; ptc++;\n\t\tdbalance += cmp(ptd, ptd + 1) > 0; ptd++;\n\t}\n\n\tif (quad1 < quad2)\n\t{\n\t\tif (cmp(&min, ptb) > 0) min = *ptb; else if (cmp(ptb, &max) > 0) max = *ptb;\n\t\tbbalance += cmp(ptb, ptb + 1) > 0; ptb++;\n\t}\n\tif (quad1 < quad3)\n\t{\n\t\tif (cmp(&min, ptc) > 0) min = *ptc; else if (cmp(ptc, &max) > 0) max = *ptc;\n\t\tcbalance += cmp(ptc, ptc + 1) > 0; ptc++;\n\t}\n\tif (quad1 < quad4)\n\t{\n\t\tif (cmp(&min, ptd) > 0) min = *ptd; else if (cmp(ptd, &max) > 0) max = *ptd;\n\t\tdbalance += cmp(ptd, ptd + 1) > 0; ptd++;\n\t}\n\tFUNC(wolf_minmax)(&min, &max, pta, ptb, ptc, ptd, cmp);\n\n\tcnt = abalance + bbalance + cbalance + dbalance;\n\n\tif (cnt == 0)\n\t{\n\t\tif (cmp(pta, pta + 1) <= 0 && cmp(ptb, ptb + 1) <= 0 && cmp(ptc, ptc + 1) <= 0)\n\t\t{\n\t\t\treturn;\n\t\t}\n\t}\n\n#ifdef GODMODE\n\t{\n\t\tVAR range = max - min;\n\n\t\tif (range < 65536 || range <= nmemb / 4)\n\t\t{\n\t\t\tFUNC(unstable_count)(array, nmemb, range + 1, min, cmp);\n\t\t\treturn;\n\t\t}\n\t}\n#endif\n\n\tasum = quad1 - abalance == 1;\n\tbsum = quad2 - bbalance == 1;\n\tcsum = quad3 - cbalance == 1;\n\tdsum = quad4 - dbalance == 1;\n\n\tif (asum | bsum | csum | dsum)\n\t{\n\t\tunsigned char span1 = 
(asum && bsum) * (cmp(pta, pta + 1) > 0);\n\t\tunsigned char span2 = (bsum && csum) * (cmp(ptb, ptb + 1) > 0);\n\t\tunsigned char span3 = (csum && dsum) * (cmp(ptc, ptc + 1) > 0);\n\n\t\tswitch (span1 | span2 * 2 | span3 * 4)\n\t\t{\n\t\t\tcase 0: break;\n\t\t\tcase 1: FUNC(quad_reversal)(array, ptb);   abalance = bbalance = 0; break;\n\t\t\tcase 2: FUNC(quad_reversal)(pta + 1, ptc); bbalance = cbalance = 0; break;\n\t\t\tcase 3: FUNC(quad_reversal)(array, ptc);   abalance = bbalance = cbalance = 0; break;\n\t\t\tcase 4: FUNC(quad_reversal)(ptb + 1, ptd); cbalance = dbalance = 0; break;\n\t\t\tcase 5: FUNC(quad_reversal)(array, ptb);\n\t\t\t\tFUNC(quad_reversal)(ptb + 1, ptd); abalance = bbalance = cbalance = dbalance = 0; break;\n\t\t\tcase 6: FUNC(quad_reversal)(pta + 1, ptd); bbalance = cbalance = dbalance = 0; break;\n\t\t\tcase 7: FUNC(quad_reversal)(array, ptd); return;\n\t\t}\n\n\t\tif (asum && abalance) {FUNC(quad_reversal)(array,   pta); abalance = 0;}\n\t\tif (bsum && bbalance) {FUNC(quad_reversal)(pta + 1, ptb); bbalance = 0;}\n\t\tif (csum && cbalance) {FUNC(quad_reversal)(ptb + 1, ptc); cbalance = 0;}\n\t\tif (dsum && dbalance) {FUNC(quad_reversal)(ptc + 1, ptd); dbalance = 0;}\n\t}\n\n#ifdef cmp\n\tcnt = nmemb / 256; // switch to quadsort if more than 50% ordered\n#else\n\tcnt = nmemb / 512; // switch to quadsort if more than 25% ordered\n#endif\n\tasum = astreaks > cnt;\n\tbsum = bstreaks > cnt;\n\tcsum = cstreaks > cnt;\n\tdsum = dstreaks > cnt;\n\n#ifndef cmp\n\tif (quad1 > QUAD_CACHE)\n\t{\n\t\tasum = bsum = csum = dsum = 1;\n\t}\n#endif\n\tswitch (asum + bsum * 2 + csum * 4 + dsum * 8)\n\t{\n\t\tcase 0:\n\t\t\tFUNC(wolf_partition)(array, swap, swap_size, nmemb, min, max, cmp);\n\t\t\treturn;\n\t\tcase 1:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tFUNC(wolf_partition)(pta + 1, swap, swap_size, quad2 + half2, min, max, cmp);\n\t\t\tbreak;\n\t\tcase 2:\n\t\t\tFUNC(wolf_partition)(array, swap, swap_size, 
quad1, min, max, cmp);\n\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\tFUNC(wolf_partition)(ptb + 1, swap, swap_size, half2, min, max, cmp);\n\t\t\tbreak;\n\t\tcase 3:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\tFUNC(wolf_partition)(ptb + 1, swap, swap_size, half2, min, max, cmp);\n\t\t\tbreak;\n\t\tcase 4:\n\t\t\tFUNC(wolf_partition)(array, swap, swap_size, half1, min, max, cmp);\n\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\tFUNC(wolf_partition)(ptc + 1, swap, swap_size, quad4, min, max, cmp);\n\t\t\tbreak;\n\t\tcase 8:\n\t\t\tFUNC(wolf_partition)(array, swap, swap_size, half1 + quad3, min, max, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 9:\n\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\tFUNC(wolf_partition)(pta + 1, swap, swap_size, quad2 + quad3, min, max, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 12:\n\t\t\tFUNC(wolf_partition)(array, swap, swap_size, half1, min, max, cmp);\n\t\t\tif (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\tbreak;\n\t\tcase 5:\n\t\tcase 6:\n\t\tcase 7:\n\t\tcase 10:\n\t\tcase 11:\n\t\tcase 13:\n\t\tcase 14:\n\t\tcase 15:\n\t\t\tif (asum)\n\t\t\t{\n\t\t\t\tif (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);\n\t\t\t}\n\t\t\telse FUNC(wolf_partition)(array, swap, swap_size, quad1, min, max, cmp);\n\t\t\tif (bsum)\n\t\t\t{\n\t\t\t\tif (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);\n\t\t\t}\n\t\t\telse FUNC(wolf_partition)(pta + 1, swap, swap_size, quad2, min, max, cmp);\n\t\t\tif (csum)\n\t\t\t{\n\t\t\t\tif (cbalance) 
FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);\n\t\t\t}\n\t\t\telse FUNC(wolf_partition)(ptb + 1, swap, swap_size, quad3, min, max, cmp);\n\t\t\tif (dsum)\n\t\t\t{\n\t\t\t\tif (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);\n\t\t\t}\n\t\t\telse FUNC(wolf_partition)(ptc + 1, swap, swap_size, quad4, min, max, cmp);\n\t\t\tbreak;\n\t}\n\n\tif (cmp(pta, pta + 1) <= 0)\n\t{\n\t\tmemcpy(swap, array, half1 * sizeof(VAR));\n\n\t\tif (cmp(ptc, ptc + 1) <= 0)\n\t\t{\n\t\t\tif (cmp(ptb, ptb + 1) <= 0)\n\t\t\t{\n\t\t\t\treturn;\n\t\t\t}\n\t\t\tmemcpy(swap + half1, array + half1, half2 * sizeof(VAR));\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(cross_merge)(swap + half1, array + half1, quad3, quad4, cmp);\n\t\t}\n\t}\n\telse\n\t{\n\t\tFUNC(cross_merge)(swap, array, quad1, quad2, cmp);\n\n\t\tif (cmp(ptc, ptc + 1) <= 0)\n\t\t{\n\t\t\tmemcpy(swap + half1, array + half1, half2 * sizeof(VAR));\n\t\t}\n\t\telse\n\t\t{\n\t\t\tFUNC(cross_merge)(swap + half1, ptb + 1, quad3, quad4, cmp);\n\t\t}\n\t}\n\tFUNC(cross_merge)(array, swap, half1, half2, cmp);\n}\n\nvoid FUNC(wolfsort)(void *array, size_t nmemb, CMPFUNC *cmp)\n{\n\tVAR *pta = (VAR *) array;\n\n\tif (nmemb <= 132)\n\t{\n\t\tFUNC(quadsort)(pta, nmemb, cmp);\n\t}\n\telse\n\t{\n\t\tVAR *swap = (VAR *) malloc(nmemb * sizeof(VAR));\n\n\t\tif (swap == NULL)\n\t\t{\n\t\t\tFUNC(quadsort)(pta, nmemb, cmp);\n\t\t\treturn;\n\t\t}\n\n\t\tFUNC(wolf_analyze)(pta, swap, nmemb, nmemb, cmp);\n\n\t\tfree(swap);\n\t}\n}\n\nvoid FUNC(wolfsort_swap)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)\n{\n\tif (nmemb <= 132)\n\t{\n\t\tFUNC(quadsort_swap)(array, swap, nmemb, nmemb, cmp);\n\t}\n\telse\n\t{\n\t\tFUNC(wolf_analyze)(array, swap, swap_size, nmemb, cmp);\n\t}\n}\n"
  },
  {
    "path": "src/wolfsort.h",
    "content": "// wolfsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com\n\n#ifndef WOLFSORT_H\n#define WOLFSORT_H\n\n#include <stdlib.h>\n#include <stdio.h>\n#include <assert.h>\n#include <errno.h>\n#include <stdalign.h>\n\ntypedef int CMPFUNC (const void *a, const void *b);\n\n//#define cmp(a,b) (*(a) > *(b))\n\n// When sorting an array of pointers, like a string array, the QUAD_CACHE needs\n// to be set for proper performance when sorting large arrays.\n// wolfsort_prim() can be used to sort 32 and 64 bit primitives.\n\n// With a 6 MB L3 cache a value of 262144 works well.\n\n#ifdef cmp\n  #define QUAD_CACHE 4294967295\n#else\n//#define QUAD_CACHE 131072\n  #define QUAD_CACHE 262144\n//#define QUAD_CACHE 524288\n//#define QUAD_CACHE 4294967295\n#endif\n\n#ifndef FLUXSORT_H\n  #include \"fluxsort.h\"\n#endif\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │       ██████┐ ██████┐    ██████┐ ██████┐████████┐ │//\n// │       └────██┐└────██┐   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │        █████┌┘ █████┌┘   ██████┌┘  ██│     ██│    │//\n// │        └───██┐██┌───┘    ██┌──██┐  ██│     ██│    │//\n// │       ██████┌┘███████┐   ██████┌┘██████┐   ██│    │//\n// │       └─────┘ └──────┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n/*\n#define VAR int\n#define FUNC(NAME) NAME##32\n\n#include \"wolfsort.c\"\n\n#undef VAR\n#undef FUNC\n*/\n// wolfsort_prim\n\n#define VAR int\n#define FUNC(NAME) NAME##_int32\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"wolfsort.c\"\n  #undef cmp\n#else\n  #include \"wolfsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n#define VAR unsigned int\n#define FUNC(NAME) NAME##_uint32\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"wolfsort.c\"\n  #undef cmp\n#else\n  #include \"wolfsort.c\"\n#endif\n#undef VAR\n#undef 
FUNC\n\n//////////////////////////////////////////////////////////\n// ┌───────────────────────────────────────────────────┐//\n// │        █████┐ ██┐  ██┐   ██████┐ ██████┐████████┐ │//\n// │       ██┌───┘ ██│  ██│   ██┌──██┐└─██┌─┘└──██┌──┘ │//\n// │       ██████┐ ███████│   ██████┌┘  ██│     ██│    │//\n// │       ██┌──██┐└────██│   ██┌──██┐  ██│     ██│    │//\n// │       └█████┌┘     ██│   ██████┌┘██████┐   ██│    │//\n// │        └────┘      └─┘   └─────┘ └─────┘   └─┘    │//\n// └───────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n/*\n#define VAR long long\n#define FUNC(NAME) NAME##64\n\n#include \"wolfsort.c\"\n\n#undef VAR\n#undef FUNC\n*/\n// wolfsort_prim\n\n#define VAR long long\n#define FUNC(NAME) NAME##_int64\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"wolfsort.c\"\n  #undef cmp\n#else\n  #include \"wolfsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n#define VAR unsigned long long\n#define FUNC(NAME) NAME##_uint64\n#ifndef cmp\n  #define cmp(a,b) (*(a) > *(b))\n  #include \"wolfsort.c\"\n  #undef cmp\n#else\n  #include \"wolfsort.c\"\n#endif\n#undef VAR\n#undef FUNC\n\n// This section is outside of 32/64 bit pointer territory, so no cache checks\n// necessary, unless sorting 32+ byte structures.\n\n#undef QUAD_CACHE\n#define QUAD_CACHE 4294967295\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│                █████┐    ██████┐ ██████┐████████┐  │//\n//│               ██┌──██┐   ██┌──██┐└─██┌─┘└──██┌──┘  │//\n//│               └█████┌┘   ██████┌┘  ██│     ██│     │//\n//│               ██┌──██┐   ██┌──██┐  ██│     ██│     │//\n//│               └█████┌┘   ██████┌┘██████┐   ██│     │//\n//│                └────┘    └─────┘ └─────┘   └─┘     │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR char\n#define 
FUNC(NAME) NAME##8\n\n#include \"wolfsort.c\"\n\n#undef VAR\n#undef FUNC\n\n//////////////////////////////////////////////////////////\n//┌────────────────────────────────────────────────────┐//\n//│           ▄██┐   █████┐    ██████┐ ██████┐████████┐│//\n//│          ████│  ██┌───┘    ██┌──██┐└─██┌─┘└──██┌──┘│//\n//│          └─██│  ██████┐    ██████┌┘  ██│     ██│   │//\n//│            ██│  ██┌──██┐   ██┌──██┐  ██│     ██│   │//\n//│          ██████┐└█████┌┘   ██████┌┘██████┐   ██│   │//\n//│          └─────┘ └────┘    └─────┘ └─────┘   └─┘   │//\n//└────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////\n\n#define VAR short\n#define FUNC(NAME) NAME##16\n\n#include \"wolfsort.c\"\n\n#undef VAR\n#undef FUNC\n\n///////////////////////////////////////////////////////////\n//┌─────────────────────────────────────────────────────┐//\n//│ ██████┐██┐   ██┐███████┐████████┐ ██████┐ ███┐  ███┐│//\n//│██┌────┘██│   ██│██┌────┘└──██┌──┘██┌───██┐████┐████││//\n//│██│     ██│   ██│███████┐   ██│   ██│   ██│██┌███┌██││//\n//│██│     ██│   ██│└────██│   ██│   ██│   ██│██│└█┌┘██││//\n//│└██████┐└██████┌┘███████│   ██│   └██████┌┘██│ └┘ ██││//\n//│ └─────┘ └─────┘ └──────┘   └─┘    └─────┘ └─┘    └─┘│//\n//└─────────────────────────────────────────────────────┘//\n///////////////////////////////////////////////////////////\n\n/*\ntypedef struct {char bytes[32];} struct256;\n#define VAR struct256\n#define FUNC(NAME) NAME##256\n\n#include \"wolfsort.c\"\n\n#undef VAR\n#undef FUNC\n*/\n\n //////////////////////////////////////////////////////////////////////////\n//┌─────────────────────────────────────────────────────────────────────┐//\n//│██┐    ██┐ ██████┐ ██┐     ███████┐███████┐ ██████┐ ██████┐ ████████┐│//\n//│██│    ██│██┌───██┐██│     ██┌────┘██┌────┘██┌───██┐██┌──██┐└──██┌──┘│//\n//│██│ █┐ ██│██│   ██│██│     █████┐  ███████┐██│   ██│██████┌┘   ██│   │//\n//│██│███┐██│██│   ██│██│     ██┌──┘  └────██│██│   
██│██┌──██┐   ██│   │//\n//│└███┌███┌┘└██████┌┘███████┐██│     ███████│└██████┌┘██│  ██│   ██│   │//\n//│ └──┘└──┘  └─────┘ └──────┘└─┘     └──────┘ └─────┘ └─┘  └─┘   └─┘   │//\n//└─────────────────────────────────────────────────────────────────────┘//\n//////////////////////////////////////////////////////////////////////////\n\nvoid wolfsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase sizeof(char):\n\t\t\twolfsort8(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(short):\n\t\t\twolfsort16(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(int):\n\t\t\twolfsort_uint32(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tcase sizeof(long long):\n\t\t\twolfsort_uint64(array, nmemb, cmp);\n//\t\t\tfluxsort64(array, nmemb, cmp); // fluxsort generally beats wolfsort for 64+ bit types\n\t\t\treturn;\n\n\t\tcase sizeof(long double):\n\t\t\tfluxsort128(array, nmemb, cmp);\n\t\t\treturn;\n\n//\t\tcase sizeof(struct256):\n//\t\t\twolfsort256(array, nmemb, cmp);\n\t\t\treturn;\n\n\t\tdefault:\n\t\t\tassert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double));\n//\t\t\tqsort(array, nmemb, size, cmp);\n\t}\n}\n\n// suggested size values for primitives:\n\n//\t\tcase  0: unsigned char\n//\t\tcase  1: signed char\n//\t\tcase  2: signed short\n//\t\tcase  3: unsigned short\n//\t\tcase  4: signed int\n//\t\tcase  5: unsigned int\n//\t\tcase  6: float\n//\t\tcase  7: double\n//\t\tcase  8: signed long long\n//\t\tcase  9: unsigned long long\n//\t\tcase 16: long double\n\nvoid wolfsort_prim(void *array, size_t nmemb, size_t size)\n{\n\tif (nmemb < 2)\n\t{\n\t\treturn;\n\t}\n\n\tswitch (size)\n\t{\n\t\tcase 4:\n\t\t\tfluxsort_int32(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 8:\n\t\t\tfluxsort_int64(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 5:\n\t\t\twolfsort_uint32(array, nmemb, NULL);\n\t\t\treturn;\n\t\tcase 
9:\n\t\t\twolfsort_uint64(array, nmemb, NULL);\n\t\t\treturn;\n\t\tdefault:\n\t\t\tassert(size == sizeof(int) || size == sizeof(long long) || size == sizeof(int) + 1 || size == sizeof(long long) + 1);\n\t\t\treturn;\n\t}\n}\n\n#undef QUAD_CACHE\n\n#endif\n"
  }
]