Repository: scandum/wolfsort
Branch: master
Commit: 56ad38959aee
Files: 18
Total size: 238.6 KB

Directory structure:
gitextract_sm3bx4qr/
├── LICENSE
├── README.md
└── src/
    ├── bench.c
    ├── blitsort.c
    ├── blitsort.h
    ├── crumsort.c
    ├── crumsort.h
    ├── extra_tests.c
    ├── fluxsort.c
    ├── fluxsort.h
    ├── gridsort.c
    ├── gridsort.h
    ├── quadsort.c
    ├── quadsort.h
    ├── skipsort.c
    ├── skipsort.h
    ├── wolfsort.c
    └── wolfsort.h

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE
================================================
This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.

In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to

================================================
FILE: README.md
================================================
Intro
-----

This document describes a stable adaptive hybrid bucket / quick / merge / drop sort named wolfsort. The bucket sort, forming the core of wolfsort, is not a comparison sort, so wolfsort can be considered a member of the radix-sort family. Quicksort and mergesort are well known. Dropsort gained popularity after it was reinvented as Stalin sort. A [benchmark](https://github.com/scandum/wolfsort#benchmark-for-wolfsort-v1154-dripsort) is available at the bottom.

Why a hybrid?
-------------
While an adaptive merge sort is very fast at sorting ordered data, its inability to effectively partition is its greatest weakness. A radix-like bucket sort, on the other hand, is unable to take advantage of sorted data. While quicksort is fast at partitioning, a bucket sort is faster on medium-sized arrays in the 1K - 1M element range. Dropsort in turn hybridizes surprisingly well with bucket and sample sorts.

History
-------
Wolfsort 1, codename: quantumsort, started out with the concept that memory is in abundance on modern systems. I theorized that by allocating 8n memory performance could be increased by allowing a bucket sort to partition in one pass. Not all the memory would be used or ever accessed however, which is why I envisioned it as a type of poor-man's quantum computing. The extra memory only serves to simplify computations. The concept kind of worked, except that large memory allocations in C can be either very fast or very slow. I didn't investigate why.

I also learned people don't like it when you use the term quantum computing outside of the proper context, or perhaps they were upset about wolfsort's voracious appetite for memory. Hence it was named.

Wolfsort 2, codename: flowsort, is when I reinvented counting sort.
Instead of making 1 pass and using extra memory to deal with fluctuations in the data, flowsort makes one pass to calculate the bucket sizes, then makes a second pass to neatly fill the buckets.

Wolfsort 3, codename: dripsort, was inspired by the work of M. Lochbaum on [rhsort](https://github.com/mlochbaum/rhsort) to use a method similar to dropsort to deal with bucket overflow, and to calculate the minimum and maximum value to optimize for distributions with a small range of values. Dripsort once again makes one pass and uses around 4n memory to deal with fluctuations in the data. Compared to v1 this is a 50% reduction in memory allocation, while at the same time significantly increasing robustness.

Analyzer
--------
Wolfsort uses the same analyzer as [fluxsort](https://github.com/scandum/fluxsort) to sort fully in-order and fully reverse-order distributions in n comparisons. The array is split into 4 segments for which a measure of presortedness is calculated. Mostly ordered segments are sorted with [quadsort](https://github.com/scandum/quadsort), while mostly random segments are sorted with wolfsort.

In addition, the minimum and maximum values in the distribution are obtained.

Setting the bucket size
-----------------------
For optimal performance wolfsort needs at least 8 buckets and should end up with between 1 and 16 elements per bucket, so the bucket size is set to hold 8 elements on average. However, the buckets should remain in the L1 cache, so the maximum number of buckets is set at 65536. This sets the optimal range for wolfsort between 8 * 8 (64) and 8 * 65536 (524,288) elements.

Beyond the optimal range performance will degrade steadily. Once the average bucket size reaches the threshold of 18 elements (1,179,648 total elements) the sort becomes less optimal than quicksort, though it retains a computational advantage for a little while longer.

However, by recursing once, wolfsort increases the optimal range to 1 trillion elements.

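The figures quoted in this section can be checked with a few lines of C. This is a minimal sketch, assuming the bucket count is simply the element count divided by the target load and clamped to the quoted limits; the constant names and the helpers `bucket_count_for` and `average_load` are made up for illustration and are not taken from wolfsort.h.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the text. The names are illustrative assumptions. */
#define TARGET_PER_BUCKET 8      /* average number of elements per bucket */
#define MIN_BUCKETS       8      /* fewest buckets considered worthwhile  */
#define MAX_BUCKETS       65536  /* cap on the number of buckets          */

/* Hypothetical helper: bucket count for nmemb elements, clamped to the limits. */
size_t bucket_count_for(size_t nmemb)
{
    size_t buckets = nmemb / TARGET_PER_BUCKET;

    if (buckets < MIN_BUCKETS) buckets = MIN_BUCKETS;
    if (buckets > MAX_BUCKETS) buckets = MAX_BUCKETS;

    assert(buckets >= MIN_BUCKETS && buckets <= MAX_BUCKETS);
    return buckets;
}

/* Hypothetical helper: average number of elements per bucket, rounded down. */
size_t average_load(size_t nmemb)
{
    return nmemb / bucket_count_for(nmemb);
}
```

Under these assumptions, 64 elements map to 8 buckets, 524,288 elements reach the 65536 bucket cap at a load of 8, and 1,179,648 elements give the load of 18 at which the text says quicksort starts to win.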
By computing the minimum and maximum value in the data distribution, the number of buckets is optimized further to target the sweet spot.

Dropsort
--------
Dropsort was first proposed as an alternative sorting algorithm by David Morgan in 2006; it makes one pass and is lossy. The algorithm was reinvented in 2018 as Stalin sort. The concept of dropping hash entries in a non-lossy manner was independently developed by Marshall Lochbaum in 2018 and is utilized in his 2022 release of rhsort (Robin Hood Sort).

Wolfsort allocates 4n memory to allow some deviancy in the data distribution and minimize bucket overflow. In the case an element is too deviant and overflows the bucket, it is copied in-place to the input array. In near-optimal cases this results in a minimal drip; in the worst case it will result in a downpour of elements being copied to the input array.

While a centrally planned partitioning system has its weaknesses, the worst case is mostly alleviated by using fluxsort on the deviant elements once partitioning finishes. Fluxsort is adaptive and is generally strong against distributions where wolfsort is weak.

The overall performance gain from incorporating dropsort into wolfsort is approximately 20%, but can reach an order of magnitude when the fallback is synergetic with fluxsort. Deviant distributions can deceive wolfsort for a time, but not a very long time.

Small number sorting
--------------------
Since wolfsort uses auxiliary memory, each partition is stable once partitioning completes. The next step is to sort the content of each bucket using fluxsort. If the number of elements in a bucket is below 32, fluxsort defaults to quadsort, which is highly optimized for sorting small arrays using a combination of branchless parity merges and twice-unguarded insertion.

Once each bucket is sorted, all that remains is merging the two distributions of compliant and deviant elements, and wolfsort is finished.
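The partition, drip, and merge steps described in the Dropsort and Small number sorting sections can be sketched as follows. This is a simplified illustration, not wolfsort's actual code, and it makes several assumptions. Keys are `uint32_t`. Every bucket gets a fixed capacity of 32 slots (`BUCKET_CAP`, a made-up name, chosen as four times the average load of 8). An insertion sort stands in for quadsort, and `qsort` stands in for fluxsort on the deviant elements. The function name `drip_sort_sketch` is hypothetical, and the recursion wolfsort uses for large arrays is not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUCKET_CAP 32   /* assumed: 4x the average load of 8 per bucket */

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Insertion sort, standing in for the small-array sort used on each bucket. */
static void insertion_sort(uint32_t *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        uint32_t key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) { a[j] = a[j - 1]; j--; }
        a[j] = key;
    }
}

/* Returns 0 on success, -1 if the auxiliary memory could not be allocated. */
int drip_sort_sketch(uint32_t *array, size_t n)
{
    if (n < 2) return 0;
    assert(array != NULL);

    uint32_t min = array[0], max = array[0];
    for (size_t i = 1; i < n; i++) {
        if (array[i] < min) min = array[i];
        if (array[i] > max) max = array[i];
    }
    if (min == max) return 0;               /* all keys equal: nothing to do */

    size_t buckets = n / 8 ? n / 8 : 1;
    if (buckets > 65536) buckets = 65536;
    uint64_t range = (uint64_t)max - min + 1;

    uint32_t *swap = malloc(buckets * BUCKET_CAP * sizeof *swap);
    size_t *count = calloc(buckets, sizeof *count);
    if (!swap || !count) { free(swap); free(count); return -1; }

    /* One pass: compliant elements go to their bucket, deviant ones drip back
       to the front of the input. drip <= i, so no unread element is lost. */
    size_t drip = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = array[i];
        size_t b = (size_t)(((uint64_t)(v - min) * buckets) / range);
        if (count[b] < BUCKET_CAP) swap[b * BUCKET_CAP + count[b]++] = v;
        else array[drip++] = v;
    }

    /* Sort each bucket, then compact the buckets into one sorted run. */
    size_t m = 0;
    for (size_t b = 0; b < buckets; b++) {
        insertion_sort(swap + b * BUCKET_CAP, count[b]);
        memmove(swap + m, swap + b * BUCKET_CAP, count[b] * sizeof *swap);
        m += count[b];
    }

    /* qsort stands in for fluxsort on the deviant elements. */
    qsort(array, drip, sizeof *array, cmp_u32);

    /* Merge both runs from the back so the deviants can stay in place. */
    size_t i = drip, j = m, k = n;
    while (j > 0) {
        if (i > 0 && array[i - 1] >= swap[j - 1]) array[--k] = array[--i];
        else array[--k] = swap[--j];
    }

    free(swap);
    free(count);
    return 0;
}
```

An input dominated by a handful of distinct values forces most elements through the overflow path, which is the "downpour" case the text describes; the result is still sorted because the deviant elements are sorted separately and merged at the end.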

Memory overhead
---------------
Wolfsort requires 4n memory for the partitioning process and n / 4 memory (up to a maximum of 65536) for the buckets.

If not enough memory is available wolfsort falls back on fluxsort, which requires exactly 1n swap memory, and if that's not sufficient fluxsort falls back on quadsort which can sort in-place. It is an option to fall back on blitsort instead of quadsort, but since this would be an atypical case, and would increase dependencies, I didn't implement this.

64 bit integers
---------------
With the advent of fluxsort and crumsort the dominance of radix sorts has been pushed out of 64 bit territory. Increased memory-level-parallelism in future hardware, or algorithmic optimizations, might make radix sorts competitive again for 64 bit types. Wolfsort has a commented-out default to fluxsort.

128 bit floats
--------------
Wolfsort defaults to fluxsort for 128 bit floats. Keep in mind that in the real world you'll typically be sorting tables instead of arrays, so the benchmark isn't indicative of real world performance, as the sort will likely be copying 64 bit pointers instead of 128 bit floats.

God Mode
--------
Wolfsort supports a cheat mode where the sort becomes unstable. This trick was taken from rhsort. Since wolfsort aspires to have some utility as a stable sort, this method is disabled by default, including in the benchmark.

In the benchmark rhsort does use this optimization, but it's only relevant for the random % 100 distribution. For 32 bit random integers rhsort easily beats wolfsort without an unfair advantage.

LLVM
----
When compiling with Clang, quadsort and fluxsort will take advantage of branchless ternary operations, which gives a 15-30% performance gain. While not an algorithmic improvement, it's relevant to keep in mind, particularly when it comes to LLVM compiled Rust sorts with similar optimizations.
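To illustrate the kind of construct involved, the following sketch merges two sorted runs with a ternary selecting the next element. Whether a compiler lowers the ternary to a conditional move instead of a branch depends on the compiler, its version, and the flags used, so treat this as a shape to experiment with and not as the merge routine from quadsort or fluxsort. The function name `merge_ternary` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Merge the sorted runs left[0..nl) and right[0..nr) into dest.
   dest must have room for nl + nr elements and must not overlap the inputs. */
void merge_ternary(int *dest, const int *left, size_t nl, const int *right, size_t nr)
{
    const int *left_end = left + nl;
    const int *right_end = right + nr;

    assert(dest != NULL || nl + nr == 0);

    /* The loop condition is still a branch; only the choice between the two
       sources is written as a ternary. Taking the left element on ties keeps
       equal elements in their original order. */
    while (left < left_end && right < right_end)
        *dest++ = (*left <= *right) ? *left++ : *right++;

    /* Copy whatever remains of either run. */
    while (left < left_end) *dest++ = *left++;
    while (right < right_end) *dest++ = *right++;
}
```

Comparing the generated assembly from GCC and Clang for the main loop (for instance with `-O3 -S`) is one way to see whether the selection was compiled without a branch on a given setup.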

Interface
---------
Wolfsort uses the same interface as qsort, which is described in [man qsort](https://man7.org/linux/man-pages/man3/qsort.3p.html).

Wolfsort also comes with the `wolfsort_prim(void *array, size_t nmemb, size_t size)` function to perform primitive comparisons on arrays of 32 and 64 bit integers. Nmemb is the number of elements, while size should be either `sizeof(int)` or `sizeof(long long)` for signed integers, and `sizeof(int) + 1` or `sizeof(long long) + 1` for unsigned integers. Support for the char and short types can be easily added in wolfsort.h.

Wolfsort can only sort arrays of primitive integers by default. Wolfsort should be able to sort tables with some minor changes, but it'll require a different interface than qsort() provides.

Proof of concept
----------------
Wolfsort is primarily a proof of concept for a hybrid bucket / comparison sort. It only supports non-negative integers.

I'll briefly mention other sorting algorithms listed in the benchmark code / graphs. They can all be considered the fastest algorithms currently available in their particular class.

Blitsort
--------
[Blitsort](https://github.com/scandum/blitsort) is a hybrid in-place stable adaptive rotate quick / merge sort.

Crumsort
--------
[Crumsort](https://github.com/scandum/crumsort) is a hybrid in-place unstable adaptive quick / rotate merge sort.

Quadsort
--------
[Quadsort](https://github.com/scandum/quadsort) is an adaptive mergesort. It supports rotations as a fall-back to sort in-place. It has very good performance when it comes to sorting tables and generally outperforms timsort.

Gridsort
--------
[Gridsort](https://github.com/scandum/gridsort) is a stable comparison sort which stores data in a 2 dimensional self-balancing grid. It has some interesting properties and was the fastest comparison sort for random data for a brief period of time.

Fluxsort
--------
[Fluxsort](https://github.com/scandum/fluxsort) is a hybrid stable branchless out-of-place quick / merge sort.

Piposort
--------
[Piposort](https://github.com/scandum/piposort) is a simplified branchless quadsort with a much smaller code size and complexity while still being very fast. Piposort might be of use to people who want to port quadsort. This is a lot easier when you start out small.

rhsort
------
[rhsort](https://github.com/mlochbaum/rhsort) is a hybrid stable out-of-place counting / radix / drop / insertion sort. It has exceptional performance on random and generic data for medium array sizes.

Ska sort
--------
[Ska sort](https://github.com/skarupke/ska_sort) is an advanced radix sort that can sort strings and floats as well. It offers both an in-place and out-of-place version, but since the in-place unstable version is not very competitive with wolfsort, I only benchmark the stable and faster ska_sort_copy variant.

Big O
-----
```
                 ┌───────────────────────┐┌────────────────────┐
                 │comparisons            ││swap memory         │
┌───────────────┐├───────┬───────┬───────┤├──────┬──────┬──────┤┌──────┐┌─────────┐┌─────────┐┌─────────┐
│name           ││min    │avg    │max    ││min   │avg   │max   ││stable││partition││adaptive ││compares │
├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤
│blitsort       ││n      │n log n│n log n││1     │1     │1     ││yes   ││yes      ││yes      ││yes      │
├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤
│crumsort       ││n      │n log n│n log n││1     │1     │1     ││no    ││yes      ││yes      ││yes      │
├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤
│fluxsort       ││n      │n log n│n log n││n     │n     │n     ││yes   ││yes      ││yes      ││yes      │
├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤
│gridsort       ││n      │n log n│n log n││n     │n     │n     ││yes   ││yes      ││yes      ││yes      │
├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤
│quadsort       ││n      │n log n│n log n││1     │n     │n     ││yes   ││no       ││yes      ││yes      │
├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤
│wolfsort       ││n      │n log n│n log n││n     │n     │n     ││yes   ││yes      ││yes      ││hybrid   │
├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤
│rhsort         ││n      │n log n│n log n││n     │n     │n     ││yes   ││yes      ││semi     ││hybrid   │
├───────────────┤├───────┼───────┼───────┤├──────┼──────┼──────┤├──────┤├─────────┤├─────────┤├─────────┤
│skasort_copy   ││n k    │n k    │n k    ││n     │n     │n     ││yes   ││yes      ││no       ││no       │
└───────────────┘└───────┴───────┴───────┘└──────┴──────┴──────┘└──────┘└─────────┘└─────────┘└─────────┘
```

Benchmark for Wolfsort v1.2.1.3
-------------------------------

rhsort vs wolfsort vs ska_sort_copy on 100K elements
----------------------------------------------------
The following benchmark was run on WSL gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1) on 100,000 32 bit integers. The source code was compiled using `g++ -O3 -fpermissive bench.c`. All comparisons are inlined through the cmp macro.

A table with the best and average time in seconds can be uncollapsed below the bar graph.

![Graph](/images/radix1.png)
data table | Name | Items | Type | Best | Average | Loops | Samples | Distribution | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- | | wolfsort | 100000 | 64 | 0.003006 | 0.003063 | 0 | 100 | random order | | skasort | 100000 | 64 | 0.001818 | 0.001842 | 0 | 100 | random order | | Name | Items | Type | Best | Average | Loops | Samples | Distribution | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- | | rhsort | 100000 | 32 | 0.000706 | 0.000729 | 0 | 100 | random order | | wolfsort | 100000 | 32 | 0.001000 | 0.001026 | 0 | 100 | random order | | skasort | 100000 | 32 | 0.000626 | 0.000640 | 0 | 100 | random order | | | | | | | | | | | rhsort | 100000 | 32 | 0.000115 | 0.000118 | 0 | 100 | random % 100 | | wolfsort | 100000 | 32 | 0.000376 | 0.000382 | 0 | 100 | random % 100 | | skasort | 100000 | 32 | 0.000780 | 0.000793 | 0 | 100 | random % 100 | | | | | | | | | | | rhsort | 100000 | 32 | 0.000302 | 0.000317 | 0 | 100 | ascending order | | wolfsort | 100000 | 32 | 0.000086 | 0.000088 | 0 | 100 | ascending order | | skasort | 100000 | 32 | 0.000709 | 0.000720 | 0 | 100 | ascending order | | | | | | | | | | | rhsort | 100000 | 32 | 0.000615 | 0.000633 | 0 | 100 | ascending saw | | wolfsort | 100000 | 32 | 0.000379 | 0.000407 | 0 | 100 | ascending saw | | skasort | 100000 | 32 | 0.000624 | 0.000637 | 0 | 100 | ascending saw | | | | | | | | | | | rhsort | 100000 | 32 | 0.000591 | 0.000615 | 0 | 100 | pipe organ | | wolfsort | 100000 | 32 | 0.000248 | 0.000258 | 0 | 100 | pipe organ | | skasort | 100000 | 32 | 0.000624 | 0.000639 | 0 | 100 | pipe organ | | | | | | | | | | | rhsort | 100000 | 32 | 0.000400 | 0.000420 | 0 | 100 | descending order | | wolfsort | 100000 | 32 | 0.000097 | 0.000101 | 0 | 100 | descending order | | skasort | 100000 | 32 | 0.000684 | 0.000693 | 0 | 100 | descending order | | | | | | | | | | | rhsort | 100000 | 32 | 0.000612 | 0.000629 | 0 | 100 | 
descending saw | | wolfsort | 100000 | 32 | 0.000389 | 0.000393 | 0 | 100 | descending saw | | skasort | 100000 | 32 | 0.000627 | 0.000639 | 0 | 100 | descending saw | | | | | | | | | | | rhsort | 100000 | 32 | 0.000633 | 0.000664 | 0 | 100 | random tail | | wolfsort | 100000 | 32 | 0.000467 | 0.000473 | 0 | 100 | random tail | | skasort | 100000 | 32 | 0.000622 | 0.000636 | 0 | 100 | random tail | | | | | | | | | | | rhsort | 100000 | 32 | 0.000671 | 0.000685 | 0 | 100 | random half | | wolfsort | 100000 | 32 | 0.000689 | 0.000706 | 0 | 100 | random half | | skasort | 100000 | 32 | 0.000628 | 0.000641 | 0 | 100 | random half | | | | | | | | | | | rhsort | 100000 | 32 | 0.002019 | 0.002052 | 0 | 100 | ascending tiles | | wolfsort | 100000 | 32 | 0.000683 | 0.000691 | 0 | 100 | ascending tiles | | skasort | 100000 | 32 | 0.001096 | 0.001113 | 0 | 100 | ascending tiles | | | | | | | | | | | rhsort | 100000 | 32 | 0.000837 | 0.000871 | 0 | 100 | bit reversal | | wolfsort | 100000 | 32 | 0.000887 | 0.000928 | 0 | 100 | bit reversal | | skasort | 100000 | 32 | 0.000775 | 0.000782 | 0 | 100 | bit reversal | | | | | | | | | | | rhsort | 100000 | 32 | 0.000118 | 0.000123 | 0 | 100 | random % 4 | | wolfsort | 100000 | 32 | 0.000368 | 0.000371 | 0 | 100 | random % 4 | | skasort | 100000 | 32 | 0.000785 | 0.000809 | 0 | 100 | random % 4 | | | | | | | | | | | rhsort | 100000 | 32 | 0.001278 | 0.001465 | 0 | 100 | semi random | | wolfsort | 100000 | 32 | 0.000792 | 0.000811 | 0 | 100 | semi random | | skasort | 100000 | 32 | 0.000805 | 0.000821 | 0 | 100 | semi random | | | | | | | | | | | rhsort | 100000 | 32 | 0.000198 | 0.000202 | 0 | 100 | random signal | | wolfsort | 100000 | 32 | 0.000815 | 0.000829 | 0 | 100 | random signal | | skasort | 100000 | 32 | 0.001099 | 0.001118 | 0 | 100 | random signal |
The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using `g++ -O3 -w -fpermissive bench.c`. It measures the performance on random data with array sizes ranging from 10 to 10,000,000. It's generated by running the benchmark using 10000000 0 0 as the argument. The benchmark is weighted, meaning the number of repetitions halves each time the number of items doubles. A table with the best and average time in seconds can be uncollapsed below the bar graph. ![Graph](/images/radix2.png)
data table

|      Name |    Items | Type |     Best |  Average |  Compares | Samples |     Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
|    rhsort |       10 |   32 | 0.135095 | 0.137011 |       0.0 |      10 |        random 10 |
|  wolfsort |       10 |   32 | 0.052087 | 0.052986 |       0.0 |      10 |        random 10 |
|   skasort |       10 |   32 | 0.099853 | 0.100198 |       0.0 |      10 |        random 10 |
|           |          |      |          |          |           |         |                  |
|    rhsort |      100 |   32 | 0.069252 | 0.070421 |       0.0 |      10 |       random 100 |
|  wolfsort |      100 |   32 | 0.132208 | 0.132824 |       0.0 |      10 |       random 100 |
|   skasort |      100 |   32 | 0.232007 | 0.232507 |       0.0 |      10 |       random 100 |
|           |          |      |          |          |           |         |                  |
|    rhsort |     1000 |   32 | 0.055916 | 0.056130 |       0.0 |      10 |      random 1000 |
|  wolfsort |     1000 |   32 | 0.101611 | 0.101913 |       0.0 |      10 |      random 1000 |
|   skasort |     1000 |   32 | 0.054757 | 0.055050 |       0.0 |      10 |      random 1000 |
|           |          |      |          |          |           |         |                  |
|    rhsort |    10000 |   32 | 0.057062 | 0.057359 |       0.0 |      10 |     random 10000 |
|  wolfsort |    10000 |   32 | 0.118598 | 0.119373 |       0.0 |      10 |     random 10000 |
|   skasort |    10000 |   32 | 0.059786 | 0.060189 |       0.0 |      10 |     random 10000 |
|           |          |      |          |          |           |         |                  |
|    rhsort |   100000 |   32 | 0.071273 | 0.073310 |       0.0 |      10 |    random 100000 |
|  wolfsort |   100000 |   32 | 0.102639 | 0.103917 |       0.0 |      10 |    random 100000 |
|   skasort |   100000 |   32 | 0.064120 | 0.064615 |       0.0 |      10 |    random 100000 |
|           |          |      |          |          |           |         |                  |
|    rhsort |  1000000 |   32 | 0.181059 | 0.187563 |       0.0 |      10 |   random 1000000 |
|  wolfsort |  1000000 |   32 | 0.146630 | 0.147598 |       0.0 |      10 |   random 1000000 |
|   skasort |  1000000 |   32 | 0.070250 | 0.071571 |       0.0 |      10 |   random 1000000 |
|           |          |      |          |          |           |         |                  |
|    rhsort | 10000000 |   32 | 0.412107 | 0.425066 |         0 |      10 |  random 10000000 |
|  wolfsort | 10000000 |   32 | 0.193120 | 0.200947 |         0 |      10 |  random 10000000 |
|   skasort | 10000000 |   32 | 0.115721 | 0.116621 |         0 |      10 |  random 10000000 |

Benchmark for Wolfsort v1.2.1.3
-------------------------------

fluxsort vs gridsort vs quadsort vs wolfsort on 100K elements
-------------------------------------------------------------
The following benchmark was run on WSL gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1). The source code was compiled using `g++ -O3 -fpermissive bench.c`. All comparisons are inlined through the cmp macro.

A table with the best and average time in seconds can be uncollapsed below the bar graph.

![Graph](/images/graph1.png)
data table | Name | Items | Type | Best | Average | Compares | Samples | Distribution | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- | | fluxsort | 100000 | 128 | 0.008328 | 0.008424 | 0 | 100 | random order | | gridsort | 100000 | 128 | 0.007823 | 0.007932 | 0 | 100 | random order | | quadsort | 100000 | 128 | 0.008260 | 0.008353 | 0 | 100 | random order | | wolfsort | 100000 | 128 | 0.008330 | 0.008415 | 0 | 100 | random order | | Name | Items | Type | Best | Average | Compares | Samples | Distribution | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- | | fluxsort | 100000 | 64 | 0.001971 | 0.001991 | 0 | 100 | random order | | gridsort | 100000 | 64 | 0.002370 | 0.002398 | 0 | 100 | random order | | quadsort | 100000 | 64 | 0.002230 | 0.002254 | 0 | 100 | random order | | wolfsort | 100000 | 64 | 0.003023 | 0.003068 | 0 | 100 | random order | | Name | Items | Type | Best | Average | Loops | Samples | Distribution | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- | | fluxsort | 100000 | 32 | 0.001868 | 0.001901 | 0 | 100 | random order | | gridsort | 100000 | 32 | 0.002324 | 0.002357 | 0 | 100 | random order | | quadsort | 100000 | 32 | 0.002149 | 0.002174 | 0 | 100 | random order | | wolfsort | 100000 | 32 | 0.000988 | 0.001019 | 0 | 100 | random order | | | | | | | | | | | fluxsort | 100000 | 32 | 0.000733 | 0.000740 | 0 | 100 | random % 100 | | gridsort | 100000 | 32 | 0.001921 | 0.001941 | 0 | 100 | random % 100 | | quadsort | 100000 | 32 | 0.001627 | 0.001645 | 0 | 100 | random % 100 | | wolfsort | 100000 | 32 | 0.000374 | 0.000378 | 0 | 100 | random % 100 | | | | | | | | | | | fluxsort | 100000 | 32 | 0.000043 | 0.000044 | 0 | 100 | ascending order | | gridsort | 100000 | 32 | 0.000264 | 0.000271 | 0 | 100 | ascending order | | quadsort | 100000 | 32 | 0.000052 | 0.000053 | 0 | 100 | ascending order | | wolfsort | 100000 
| 32 | 0.000087 | 0.000089 | 0 | 100 | ascending order | | | | | | | | | | | fluxsort | 100000 | 32 | 0.000305 | 0.000314 | 0 | 100 | ascending saw | | gridsort | 100000 | 32 | 0.000621 | 0.000641 | 0 | 100 | ascending saw | | quadsort | 100000 | 32 | 0.000411 | 0.000417 | 0 | 100 | ascending saw | | wolfsort | 100000 | 32 | 0.000379 | 0.000384 | 0 | 100 | ascending saw | | | | | | | | | | | fluxsort | 100000 | 32 | 0.000193 | 0.000203 | 0 | 100 | pipe organ | | gridsort | 100000 | 32 | 0.000446 | 0.000486 | 0 | 100 | pipe organ | | quadsort | 100000 | 32 | 0.000252 | 0.000260 | 0 | 100 | pipe organ | | wolfsort | 100000 | 32 | 0.000248 | 0.000259 | 0 | 100 | pipe organ | | | | | | | | | | | fluxsort | 100000 | 32 | 0.000054 | 0.000055 | 0 | 100 | descending order | | gridsort | 100000 | 32 | 0.000284 | 0.000295 | 0 | 100 | descending order | | quadsort | 100000 | 32 | 0.000068 | 0.000070 | 0 | 100 | descending order | | wolfsort | 100000 | 32 | 0.000097 | 0.000100 | 0 | 100 | descending order | | | | | | | | | | | fluxsort | 100000 | 32 | 0.000315 | 0.000325 | 0 | 100 | descending saw | | gridsort | 100000 | 32 | 0.000652 | 0.000667 | 0 | 100 | descending saw | | quadsort | 100000 | 32 | 0.000440 | 0.000446 | 0 | 100 | descending saw | | wolfsort | 100000 | 32 | 0.000389 | 0.000393 | 0 | 100 | descending saw | | | | | | | | | | | fluxsort | 100000 | 32 | 0.000607 | 0.000619 | 0 | 100 | random tail | | gridsort | 100000 | 32 | 0.000847 | 0.000860 | 0 | 100 | random tail | | quadsort | 100000 | 32 | 0.000685 | 0.000694 | 0 | 100 | random tail | | wolfsort | 100000 | 32 | 0.000464 | 0.000471 | 0 | 100 | random tail | | | | | | | | | | | fluxsort | 100000 | 32 | 0.001074 | 0.001081 | 0 | 100 | random half | | gridsort | 100000 | 32 | 0.001332 | 0.001355 | 0 | 100 | random half | | quadsort | 100000 | 32 | 0.001230 | 0.001243 | 0 | 100 | random half | | wolfsort | 100000 | 32 | 0.000686 | 0.000696 | 0 | 100 | random half | | | | | | | | | | | fluxsort | 100000 | 32 | 
0.000317 | 0.000324 | 0 | 100 | ascending tiles | | gridsort | 100000 | 32 | 0.000665 | 0.000693 | 0 | 100 | ascending tiles | | quadsort | 100000 | 32 | 0.000789 | 0.000802 | 0 | 100 | ascending tiles | | wolfsort | 100000 | 32 | 0.000686 | 0.000693 | 0 | 100 | ascending tiles | | | | | | | | | | | fluxsort | 100000 | 32 | 0.001714 | 0.001751 | 0 | 100 | bit reversal | | gridsort | 100000 | 32 | 0.002045 | 0.002060 | 0 | 100 | bit reversal | | quadsort | 100000 | 32 | 0.002083 | 0.002100 | 0 | 100 | bit reversal | | wolfsort | 100000 | 32 | 0.000888 | 0.000912 | 0 | 100 | bit reversal | | | | | | | | | | | fluxsort | 100000 | 32 | 0.000215 | 0.000223 | 0 | 100 | random % 4 | | gridsort | 100000 | 32 | 0.001283 | 0.001305 | 0 | 100 | random % 4 | | quadsort | 100000 | 32 | 0.001080 | 0.001090 | 0 | 100 | random % 4 | | wolfsort | 100000 | 32 | 0.000369 | 0.000371 | 0 | 100 | random % 4 | | | | | | | | | | | fluxsort | 100000 | 32 | 0.001072 | 0.001098 | 0 | 100 | semi random | | gridsort | 100000 | 32 | 0.001355 | 0.001377 | 0 | 100 | semi random | | quadsort | 100000 | 32 | 0.001062 | 0.001074 | 0 | 100 | semi random | | wolfsort | 100000 | 32 | 0.000789 | 0.000803 | 0 | 100 | semi random | | | | | | | | | | | fluxsort | 100000 | 32 | 0.001079 | 0.001099 | 0 | 100 | random signal | | gridsort | 100000 | 32 | 0.001296 | 0.001306 | 0 | 100 | random signal | | quadsort | 100000 | 32 | 0.001014 | 0.001027 | 0 | 100 | random signal | | wolfsort | 100000 | 32 | 0.000816 | 0.000828 | 0 | 100 | random signal |
fluxsort vs gridsort vs quadsort vs wolfsort on 10M elements
------------------------------------------------------------

![Graph](/images/graph2.png)
data table | Name | Items | Type | Best | Average | Compares | Samples | Distribution | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- | | fluxsort | 10000000 | 128 | 1.242395 | 1.264809 | 0 | 10 | random order | | gridsort | 10000000 | 128 | 1.048748 | 1.110490 | 0 | 10 | random order | | quadsort | 10000000 | 128 | 1.407639 | 1.418088 | 0 | 10 | random order | | wolfsort | 10000000 | 128 | 1.239099 | 1.241608 | 0 | 10 | random order | | Name | Items | Type | Best | Average | Compares | Samples | Distribution | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- | | fluxsort | 10000000 | 64 | 0.317327 | 0.318203 | 0 | 10 | random order | | gridsort | 10000000 | 64 | 0.332430 | 0.334392 | 0 | 10 | random order | | quadsort | 10000000 | 64 | 0.438257 | 0.439139 | 0 | 10 | random order | | wolfsort | 10000000 | 64 | 0.481977 | 0.484055 | 0 | 10 | random order | | Name | Items | Type | Best | Average | Loops | Samples | Distribution | | --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- | | fluxsort | 10000000 | 32 | 0.269351 | 0.271460 | 0 | 10 | random order | | gridsort | 10000000 | 32 | 0.322099 | 0.323899 | 0 | 10 | random order | | quadsort | 10000000 | 32 | 0.364457 | 0.365617 | 0 | 10 | random order | | wolfsort | 10000000 | 32 | 0.189780 | 0.190911 | 0 | 10 | random order | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.089973 | 0.090849 | 0 | 10 | random % 100 | | gridsort | 10000000 | 32 | 0.172222 | 0.173147 | 0 | 10 | random % 100 | | quadsort | 10000000 | 32 | 0.248361 | 0.250615 | 0 | 10 | random % 100 | | wolfsort | 10000000 | 32 | 0.086473 | 0.087067 | 0 | 10 | random % 100 | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.006437 | 0.006574 | 0 | 10 | ascending order | | gridsort | 10000000 | 32 | 0.032321 | 0.032798 | 0 | 10 | ascending order | | quadsort | 10000000 | 32 | 0.011736 | 0.012125 | 0 | 10 | ascending order | | 
wolfsort | 10000000 | 32 | 0.010888 | 0.011015 | 0 | 10 | ascending order | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.074940 | 0.075525 | 0 | 10 | ascending saw | | gridsort | 10000000 | 32 | 0.067478 | 0.067893 | 0 | 10 | ascending saw | | quadsort | 10000000 | 32 | 0.097133 | 0.098004 | 0 | 10 | ascending saw | | wolfsort | 10000000 | 32 | 0.081797 | 0.082794 | 0 | 10 | ascending saw | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.064577 | 0.065593 | 0 | 10 | pipe organ | | gridsort | 10000000 | 32 | 0.048932 | 0.049336 | 0 | 10 | pipe organ | | quadsort | 10000000 | 32 | 0.082533 | 0.083781 | 0 | 10 | pipe organ | | wolfsort | 10000000 | 32 | 0.070334 | 0.071158 | 0 | 10 | pipe organ | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.009807 | 0.010104 | 0 | 10 | descending order | | gridsort | 10000000 | 32 | 0.034583 | 0.034814 | 0 | 10 | descending order | | quadsort | 10000000 | 32 | 0.011396 | 0.011639 | 0 | 10 | descending order | | wolfsort | 10000000 | 32 | 0.014198 | 0.014544 | 0 | 10 | descending order | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.078279 | 0.079071 | 0 | 10 | descending saw | | gridsort | 10000000 | 32 | 0.069702 | 0.070109 | 0 | 10 | descending saw | | quadsort | 10000000 | 32 | 0.101826 | 0.102801 | 0 | 10 | descending saw | | wolfsort | 10000000 | 32 | 0.085101 | 0.085973 | 0 | 10 | descending saw | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.121948 | 0.122561 | 0 | 10 | random tail | | gridsort | 10000000 | 32 | 0.109341 | 0.110117 | 0 | 10 | random tail | | quadsort | 10000000 | 32 | 0.153324 | 0.153797 | 0 | 10 | random tail | | wolfsort | 10000000 | 32 | 0.103558 | 0.104152 | 0 | 10 | random tail | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.181347 | 0.183186 | 0 | 10 | random half | | gridsort | 10000000 | 32 | 0.185691 | 0.186592 | 0 | 10 | random half | | quadsort | 10000000 | 32 | 0.225265 | 0.225897 | 0 | 10 | random half | | wolfsort | 10000000 | 32 | 0.159819 | 0.160569 | 0 | 10 | random half | | | 
| | | | | | | | fluxsort | 10000000 | 32 | 0.073673 | 0.074755 | 0 | 10 | ascending tiles | | gridsort | 10000000 | 32 | 0.126309 | 0.126626 | 0 | 10 | ascending tiles | | quadsort | 10000000 | 32 | 0.165332 | 0.167541 | 0 | 10 | ascending tiles | | wolfsort | 10000000 | 32 | 0.093424 | 0.094040 | 0 | 10 | ascending tiles | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.271679 | 0.272589 | 0 | 10 | bit reversal | | gridsort | 10000000 | 32 | 0.296563 | 0.297984 | 0 | 10 | bit reversal | | quadsort | 10000000 | 32 | 0.369105 | 0.370652 | 0 | 10 | bit reversal | | wolfsort | 10000000 | 32 | 0.251209 | 0.252891 | 0 | 10 | bit reversal | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.056011 | 0.056552 | 0 | 10 | random % 4 | | gridsort | 10000000 | 32 | 0.191179 | 0.194017 | 0 | 10 | random % 4 | | quadsort | 10000000 | 32 | 0.192466 | 0.193967 | 0 | 10 | random % 4 | | wolfsort | 10000000 | 32 | 0.081668 | 0.082543 | 0 | 10 | random % 4 | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.054231 | 0.054571 | 0 | 10 | semi random | | gridsort | 10000000 | 32 | 0.146534 | 0.146907 | 0 | 10 | semi random | | quadsort | 10000000 | 32 | 0.197462 | 0.200010 | 0 | 10 | semi random | | wolfsort | 10000000 | 32 | 0.192603 | 0.194365 | 0 | 10 | semi random | | | | | | | | | | | fluxsort | 10000000 | 32 | 0.173080 | 0.176575 | 0 | 10 | random signal | | gridsort | 10000000 | 32 | 0.137590 | 0.137932 | 0 | 10 | random signal | | quadsort | 10000000 | 32 | 0.180939 | 0.181778 | 0 | 10 | random signal | | wolfsort | 10000000 | 32 | 0.161181 | 0.161714 | 0 | 10 | random signal |
blitsort vs crumsort vs pdqsort vs wolfsort on 100K elements
-------------------------------------------------------------
The following benchmark was run on WSL with gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1). The source code was compiled using g++ -O3 -fpermissive bench.c. All comparisons are inlined through the cmp macro. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Blitsort uses 512 elements of auxiliary memory, crumsort 512, pdqsort 64, and wolfsort 100000.

![Graph](/images/graph3.png)

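To illustrate what "inlined through the cmp macro" means, here is a minimal sketch. It assumes the sort routines compare elements as `cmp(a, b) > 0`, so that defining `cmp` as a macro replaces the call through the function pointer with a direct comparison the compiler can inline. The `insertion_sort_int` helper and the fallback definition are hypothetical and only mirror the idea, not the library's actual code.

```c
#include <stddef.h>

typedef int CMPFUNC(const void *a, const void *b);

// Same definition as at the top of bench.c: a direct primitive comparison.
// Delete this line to fall back to the function pointer below.
#define cmp(a,b) (*(a) > *(b))

// Assumed fallback: without the macro, every comparison goes through cmpf.
#ifndef cmp
  #define cmp(a,b) cmpf(a,b)
#endif

// Hypothetical helper: a plain insertion sort on ints that only ever
// compares through cmp(), so the macro decides how comparisons are made.
void insertion_sort_int(int *array, size_t nmemb, CMPFUNC *cmpf)
{
	(void) cmpf; // unused while the cmp macro is defined

	for (size_t i = 1 ; i < nmemb ; i++)
	{
		int key = array[i];
		size_t j = i;

		while (j > 0 && cmp(&array[j - 1], &key) > 0)
		{
			array[j] = array[j - 1];
			j--;
		}
		array[j] = key;
	}
}
```

With the macro in place the comparison function argument is ignored, which is why macro-inlined timings are not directly comparable to runs that pay for an indirect call per comparison.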
data table

| Name | Items | Type | Best | Average | Compares | Samples | Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
| blitsort | 100000 | 128 | 0.010864 | 0.010994 | 0 | 100 | random order |
| crumsort | 100000 | 128 | 0.008143 | 0.008222 | 0 | 100 | random order |
| pdqsort | 100000 | 128 | 0.005954 | 0.006063 | 0 | 100 | random order |
| wolfsort | 100000 | 128 | 0.008308 | 0.008396 | 0 | 100 | random order |

| Name | Items | Type | Best | Average | Compares | Samples | Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
| blitsort | 100000 | 64 | 0.002326 | 0.002354 | 0 | 100 | random order |
| crumsort | 100000 | 64 | 0.001835 | 0.001848 | 0 | 100 | random order |
| pdqsort | 100000 | 64 | 0.002752 | 0.002806 | 0 | 100 | random order |
| wolfsort | 100000 | 64 | 0.003014 | 0.003069 | 0 | 100 | random order |

| Name | Items | Type | Best | Average | Loops | Samples | Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
| blitsort | 100000 | 32 | 0.002094 | 0.002117 | 0 | 100 | random order |
| crumsort | 100000 | 32 | 0.001764 | 0.001779 | 0 | 100 | random order |
| pdqsort | 100000 | 32 | 0.002747 | 0.002770 | 0 | 100 | random order |
| wolfsort | 100000 | 32 | 0.000983 | 0.001016 | 0 | 100 | random order |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.000880 | 0.000891 | 0 | 100 | random % 100 |
| crumsort | 100000 | 32 | 0.000602 | 0.000641 | 0 | 100 | random % 100 |
| pdqsort | 100000 | 32 | 0.000795 | 0.000805 | 0 | 100 | random % 100 |
| wolfsort | 100000 | 32 | 0.000376 | 0.000381 | 0 | 100 | random % 100 |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.000043 | 0.000045 | 0 | 100 | ascending order |
| crumsort | 100000 | 32 | 0.000043 | 0.000044 | 0 | 100 | ascending order |
| pdqsort | 100000 | 32 | 0.000084 | 0.000088 | 0 | 100 | ascending order |
| wolfsort | 100000 | 32 | 0.000086 | 0.000088 | 0 | 100 | ascending order |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.000440 | 0.000450 | 0 | 100 | ascending saw |
| crumsort | 100000 | 32 | 0.000410 | 0.000419 | 0 | 100 | ascending saw |
| pdqsort | 100000 | 32 | 0.003222 | 0.003246 | 0 | 100 | ascending saw |
| wolfsort | 100000 | 32 | 0.000379 | 0.000382 | 0 | 100 | ascending saw |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.000242 | 0.000251 | 0 | 100 | pipe organ |
| crumsort | 100000 | 32 | 0.000229 | 0.000243 | 0 | 100 | pipe organ |
| pdqsort | 100000 | 32 | 0.002842 | 0.002864 | 0 | 100 | pipe organ |
| wolfsort | 100000 | 32 | 0.000249 | 0.000257 | 0 | 100 | pipe organ |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.000054 | 0.000055 | 0 | 100 | descending order |
| crumsort | 100000 | 32 | 0.000054 | 0.000055 | 0 | 100 | descending order |
| pdqsort | 100000 | 32 | 0.000190 | 0.000198 | 0 | 100 | descending order |
| wolfsort | 100000 | 32 | 0.000097 | 0.000100 | 0 | 100 | descending order |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.000452 | 0.000466 | 0 | 100 | descending saw |
| crumsort | 100000 | 32 | 0.000421 | 0.000431 | 0 | 100 | descending saw |
| pdqsort | 100000 | 32 | 0.004200 | 0.004245 | 0 | 100 | descending saw |
| wolfsort | 100000 | 32 | 0.000383 | 0.000402 | 0 | 100 | descending saw |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.000782 | 0.000829 | 0 | 100 | random tail |
| crumsort | 100000 | 32 | 0.000714 | 0.000755 | 0 | 100 | random tail |
| pdqsort | 100000 | 32 | 0.002638 | 0.002759 | 0 | 100 | random tail |
| wolfsort | 100000 | 32 | 0.000463 | 0.000483 | 0 | 100 | random tail |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.001210 | 0.001275 | 0 | 100 | random half |
| crumsort | 100000 | 32 | 0.001063 | 0.001096 | 0 | 100 | random half |
| pdqsort | 100000 | 32 | 0.002738 | 0.002780 | 0 | 100 | random half |
| wolfsort | 100000 | 32 | 0.000685 | 0.000712 | 0 | 100 | random half |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.001105 | 0.001278 | 0 | 100 | ascending tiles |
| crumsort | 100000 | 32 | 0.001393 | 0.001435 | 0 | 100 | ascending tiles |
| pdqsort | 100000 | 32 | 0.002367 | 0.002398 | 0 | 100 | ascending tiles |
| wolfsort | 100000 | 32 | 0.000682 | 0.000689 | 0 | 100 | ascending tiles |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.001956 | 0.001988 | 0 | 100 | bit reversal |
| crumsort | 100000 | 32 | 0.001762 | 0.001794 | 0 | 100 | bit reversal |
| pdqsort | 100000 | 32 | 0.002731 | 0.002758 | 0 | 100 | bit reversal |
| wolfsort | 100000 | 32 | 0.000890 | 0.000921 | 0 | 100 | bit reversal |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.000328 | 0.000341 | 0 | 100 | random % 4 |
| crumsort | 100000 | 32 | 0.000206 | 0.000216 | 0 | 100 | random % 4 |
| pdqsort | 100000 | 32 | 0.000382 | 0.000391 | 0 | 100 | random % 4 |
| wolfsort | 100000 | 32 | 0.000367 | 0.000378 | 0 | 100 | random % 4 |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.001209 | 0.001244 | 0 | 100 | semi random |
| crumsort | 100000 | 32 | 0.000309 | 0.000319 | 0 | 100 | semi random |
| pdqsort | 100000 | 32 | 0.000479 | 0.000500 | 0 | 100 | semi random |
| wolfsort | 100000 | 32 | 0.000791 | 0.000828 | 0 | 100 | semi random |
| | | | | | | | |
| blitsort | 100000 | 32 | 0.001893 | 0.001926 | 0 | 100 | random signal |
| crumsort | 100000 | 32 | 0.001714 | 0.001742 | 0 | 100 | random signal |
| pdqsort | 100000 | 32 | 0.002950 | 0.002976 | 0 | 100 | random signal |
| wolfsort | 100000 | 32 | 0.000814 | 0.000834 | 0 | 100 | random signal |

blitsort vs crumsort vs pdqsort vs wolfsort on 10M elements
-----------------------------------------------------------
Blitsort uses 512 elements of auxiliary memory, crumsort 512, pdqsort 64, and wolfsort 100000000.

![Graph](/images/graph4.png)

data table

| Name | Items | Type | Best | Average | Compares | Samples | Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
| blitsort | 10000000 | 128 | 2.172622 | 2.191956 | 0 | 10 | random order |
| crumsort | 10000000 | 128 | 1.134328 | 1.135821 | 0 | 10 | random order |
| pdqsort | 10000000 | 128 | 0.805620 | 0.808041 | 0 | 10 | random order |
| wolfsort | 10000000 | 128 | 1.237174 | 1.238863 | 0 | 10 | random order |

| Name | Items | Type | Best | Average | Compares | Samples | Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
| blitsort | 10000000 | 64 | 0.434356 | 0.443134 | 0 | 10 | random order |
| crumsort | 10000000 | 64 | 0.250065 | 0.251453 | 0 | 10 | random order |
| pdqsort | 10000000 | 64 | 0.359586 | 0.360388 | 0 | 10 | random order |
| wolfsort | 10000000 | 64 | 0.480904 | 0.482835 | 0 | 10 | random order |

| Name | Items | Type | Best | Average | Loops | Samples | Distribution |
| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |
| blitsort | 10000000 | 32 | 0.332071 | 0.339524 | 0 | 10 | random order |
| crumsort | 10000000 | 32 | 0.231584 | 0.232056 | 0 | 10 | random order |
| pdqsort | 10000000 | 32 | 0.347793 | 0.348437 | 0 | 10 | random order |
| wolfsort | 10000000 | 32 | 0.189250 | 0.189762 | 0 | 10 | random order |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.126792 | 0.128439 | 0 | 10 | random % 100 |
| crumsort | 10000000 | 32 | 0.060683 | 0.061353 | 0 | 10 | random % 100 |
| pdqsort | 10000000 | 32 | 0.079284 | 0.079891 | 0 | 10 | random % 100 |
| wolfsort | 10000000 | 32 | 0.086577 | 0.087157 | 0 | 10 | random % 100 |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.006581 | 0.006784 | 0 | 10 | ascending order |
| crumsort | 10000000 | 32 | 0.006690 | 0.006801 | 0 | 10 | ascending order |
| pdqsort | 10000000 | 32 | 0.011712 | 0.011851 | 0 | 10 | ascending order |
| wolfsort | 10000000 | 32 | 0.010958 | 0.011520 | 0 | 10 | ascending order |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.070514 | 0.071260 | 0 | 10 | ascending saw |
| crumsort | 10000000 | 32 | 0.064829 | 0.066035 | 0 | 10 | ascending saw |
| pdqsort | 10000000 | 32 | 0.560995 | 0.561774 | 0 | 10 | ascending saw |
| wolfsort | 10000000 | 32 | 0.081644 | 0.082279 | 0 | 10 | ascending saw |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.041220 | 0.041924 | 0 | 10 | pipe organ |
| crumsort | 10000000 | 32 | 0.039335 | 0.040018 | 0 | 10 | pipe organ |
| pdqsort | 10000000 | 32 | 0.363633 | 0.364187 | 0 | 10 | pipe organ |
| wolfsort | 10000000 | 32 | 0.070536 | 0.071400 | 0 | 10 | pipe organ |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.010271 | 0.010549 | 0 | 10 | descending order |
| crumsort | 10000000 | 32 | 0.010254 | 0.010499 | 0 | 10 | descending order |
| pdqsort | 10000000 | 32 | 0.023129 | 0.023708 | 0 | 10 | descending order |
| wolfsort | 10000000 | 32 | 0.014583 | 0.015316 | 0 | 10 | descending order |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.073410 | 0.074402 | 0 | 10 | descending saw |
| crumsort | 10000000 | 32 | 0.068284 | 0.069154 | 0 | 10 | descending saw |
| pdqsort | 10000000 | 32 | 0.942142 | 0.958606 | 0 | 10 | descending saw |
| wolfsort | 10000000 | 32 | 0.085338 | 0.086014 | 0 | 10 | descending saw |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.124089 | 0.130327 | 0 | 10 | random tail |
| crumsort | 10000000 | 32 | 0.103030 | 0.104337 | 0 | 10 | random tail |
| pdqsort | 10000000 | 32 | 0.337862 | 0.342594 | 0 | 10 | random tail |
| wolfsort | 10000000 | 32 | 0.103381 | 0.108048 | 0 | 10 | random tail |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.191479 | 0.193036 | 0 | 10 | random half |
| crumsort | 10000000 | 32 | 0.146732 | 0.147742 | 0 | 10 | random half |
| pdqsort | 10000000 | 32 | 0.342803 | 0.343424 | 0 | 10 | random half |
| wolfsort | 10000000 | 32 | 0.159515 | 0.160787 | 0 | 10 | random half |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.182256 | 0.190378 | 0 | 10 | ascending tiles |
| crumsort | 10000000 | 32 | 0.188875 | 0.195063 | 0 | 10 | ascending tiles |
| pdqsort | 10000000 | 32 | 0.285777 | 0.286996 | 0 | 10 | ascending tiles |
| wolfsort | 10000000 | 32 | 0.093709 | 0.094315 | 0 | 10 | ascending tiles |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.324983 | 0.326345 | 0 | 10 | bit reversal |
| crumsort | 10000000 | 32 | 0.230872 | 0.231599 | 0 | 10 | bit reversal |
| pdqsort | 10000000 | 32 | 0.343915 | 0.344677 | 0 | 10 | bit reversal |
| wolfsort | 10000000 | 32 | 0.250331 | 0.251319 | 0 | 10 | bit reversal |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.061197 | 0.062058 | 0 | 10 | random % 4 |
| crumsort | 10000000 | 32 | 0.030134 | 0.030564 | 0 | 10 | random % 4 |
| pdqsort | 10000000 | 32 | 0.043492 | 0.043673 | 0 | 10 | random % 4 |
| wolfsort | 10000000 | 32 | 0.081548 | 0.082020 | 0 | 10 | random % 4 |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.066686 | 0.067764 | 0 | 10 | semi random |
| crumsort | 10000000 | 32 | 0.045479 | 0.046088 | 0 | 10 | semi random |
| pdqsort | 10000000 | 32 | 0.060253 | 0.060612 | 0 | 10 | semi random |
| wolfsort | 10000000 | 32 | 0.190505 | 0.191946 | 0 | 10 | semi random |
| | | | | | | | |
| blitsort | 10000000 | 32 | 0.272456 | 0.274928 | 0 | 10 | random signal |
| crumsort | 10000000 | 32 | 0.224115 | 0.225966 | 0 | 10 | random signal |
| pdqsort | 10000000 | 32 | 0.382742 | 0.384505 | 0 | 10 | random signal |
| wolfsort | 10000000 | 32 | 0.160946 | 0.161769 | 0 | 10 | random signal |

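The Best and Average columns in the tables above can be read as the fastest and the mean wall-clock time over the listed number of samples. The following is a rough sketch of that measurement, modeled on the timing loop in `test_sort()` in src/bench.c but not identical to it. The names `bench_result`, `utime_now`, `time_samples` and `spin_work` are hypothetical, and `gettimeofday()` is assumed to be available (POSIX systems).

```c
#include <stddef.h>
#include <sys/time.h>

// Hypothetical result type: both values are in seconds.
typedef struct
{
	double best;
	double average;
} bench_result;

// Current wall-clock time in microseconds.
long long utime_now(void)
{
	struct timeval now;

	gettimeofday(&now, NULL);

	return now.tv_sec * 1000000LL + now.tv_usec;
}

// Run work(arg) 'samples' times, keeping the fastest run and the mean.
bench_result time_samples(void (*work)(void *), void *arg, int samples)
{
	bench_result result = { 0.0, 0.0 };
	long long best = 0, sum = 0;

	for (int sam = 0 ; sam < samples ; sam++)
	{
		long long start = utime_now();

		work(arg);

		long long total = utime_now() - start;

		if (sam == 0 || total < best)
		{
			best = total;
		}
		sum += total;
	}

	if (samples > 0)
	{
		result.best = best / 1000000.0;
		result.average = ((double) sum / samples) / 1000000.0;
	}
	return result;
}

// Example workload: a loop the compiler cannot remove.
void spin_work(void *arg)
{
	volatile long sink = 0;
	long n = *(long *) arg;

	for (long i = 0 ; i < n ; i++)
	{
		sink += i;
	}
}
```

Calling `time_samples(spin_work, &n, 10)` with a `long n` yields one Best/Average pair comparable to a table row. The best time is the less noisy figure, since background activity can only make a sample slower.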
================================================
FILE: src/bench.c
================================================
/*
	To compile use either:

	gcc -O3 bench.c

	or

	clang -O3 bench.c

	or

	g++ -O3 bench.c
*/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>
#include <errno.h>
#include <math.h>

#define cmp(a,b) (*(a) > *(b)) // uncomment for faster primitive comparisons

const char *sorts[] = { "*", "quadsort", "gridsort", "blitsort", "fluxsort", "skipsort", "crumsort", "wolfsort", "sort::std" };

//#define SKIP_STRINGS
//#define SKIP_DOUBLES
//#define SKIP_LONGS

#if __has_include("blitsort.h")
  #include "blitsort.h" // curl "https://raw.githubusercontent.com/scandum/blitsort/master/src/blitsort.{c,h}" -o "blitsort.#1"
#endif
#if __has_include("crumsort.h")
  #include "crumsort.h" // curl "https://raw.githubusercontent.com/scandum/crumsort/master/src/crumsort.{c,h}" -o "crumsort.#1"
#endif
#if __has_include("dripsort.h")
  #include "dripsort.h"
#endif
#if __has_include("flowsort.h")
  #include "flowsort.h"
#endif
#if __has_include("fluxsort.h")
  #include "fluxsort.h" // curl "https://raw.githubusercontent.com/scandum/fluxsort/master/src/fluxsort.{c,h}" -o "fluxsort.#1"
#endif
#if __has_include("gridsort.h")
  #include "gridsort.h" // curl "https://raw.githubusercontent.com/scandum/gridsort/master/src/gridsort.{c,h}" -o "gridsort.#1"
#endif
#if __has_include("octosort.h")
  #include "octosort.h" // curl "https://raw.githubusercontent.com/scandum/octosort/master/src/octosort.{c,h}" -o "octosort.#1"
#endif
#if __has_include("piposort.h")
  #include "piposort.h" // curl "https://raw.githubusercontent.com/scandum/piposort/master/src/piposort.{c,h}" -o "piposort.#1"
#endif
#if __has_include("quadsort.h")
  #include "quadsort.h" // curl "https://raw.githubusercontent.com/scandum/quadsort/master/src/quadsort.{c,h}" -o "quadsort.#1"
#endif
#if __has_include("skipsort.h")
  #include "skipsort.h"
#endif
#if __has_include("wolfsort.h")
  #include "wolfsort.h" // curl "https://raw.githubusercontent.com/scandum/wolfsort/master/src/wolfsort.{c,h}" -o "wolfsort.#1"
#endif
#if __has_include("rhsort.c")
  #define RHSORT_C
  #include "rhsort.c" // curl https://raw.githubusercontent.com/mlochbaum/rhsort/master/rhsort.c > rhsort.c
#endif

#ifdef __GNUG__
  #include <algorithm>
  #if __has_include("pdqsort.h")
    #include "pdqsort.h" // curl https://raw.githubusercontent.com/orlp/pdqsort/master/pdqsort.h > pdqsort.h
  #endif
  #if __has_include("ska_sort.hpp")
    #define SKASORT_HPP
    #include "ska_sort.hpp" // curl https://raw.githubusercontent.com/skarupke/ska_sort/master/ska_sort.hpp > ska_sort.hpp
  #endif
  #if __has_include("timsort.hpp")
    #include "timsort.hpp" // curl https://raw.githubusercontent.com/timsort/cpp-TimSort/master/include/gfx/timsort.hpp > timsort.hpp
  #endif
#endif

#if __has_include("antiqsort.c")
  #include "antiqsort.c"
#endif

//typedef int CMPFUNC (const void *a, const void *b);

typedef void SRTFUNC(void *array, size_t nmemb, size_t size, CMPFUNC *cmpf);

// Remove __attribute__ ((noinline)) and comment out comparisons++ for full
// throttle. Like so: #define COMPARISON_PP //comparisons++

size_t comparisons;

#define COMPARISON_PP comparisons++

#define NO_INLINE __attribute__ ((noinline))

// primitive type comparison functions

NO_INLINE int cmp_int(const void * a, const void * b)
{
	COMPARISON_PP;

	return *(int *) a - *(int *) b;

//	const int l = *(const int *)a;
//	const int r = *(const int *)b;

//	return l - r;
//	return l > r;
//	return (l > r) - (l < r);
}

NO_INLINE int cmp_rev(const void * a, const void * b)
{
	int fa = *(int *)a;
	int fb = *(int *)b;

	COMPARISON_PP;

	return fb - fa;
}

NO_INLINE int cmp_stable(const void * a, const void * b)
{
	int fa = *(int *)a;
	int fb = *(int *)b;

	COMPARISON_PP;

	return fa / 100000 - fb / 100000;
}

NO_INLINE int cmp_long(const void * a, const void * b)
{
	const long long fa = *(const long long *) a;
	const long long fb = *(const long long *) b;

	COMPARISON_PP;

	return (fa > fb) - (fa < fb);
//	return (fa > fb);
}

NO_INLINE int cmp_float(const void * a, const void * b)
{
	return *(float *) a - *(float *) b;
}

NO_INLINE int cmp_long_double(const void * a, const void * b)
{
	const long double fa = *(const long double *) a;
	const long double fb = *(const long double *) b;

	COMPARISON_PP;

	return (fa > fb) - (fa < fb);

/*	if (isnan(fa) || isnan(fb))
	{
		return isnan(fa) - isnan(fb);
	}

	return (fa > fb);
*/
}

// pointer comparison functions

NO_INLINE int cmp_str(const void * a, const void * b)
{
	COMPARISON_PP;

	return strcmp(*(const char **) a, *(const char **) b);
}

NO_INLINE int cmp_int_ptr(const void * a, const void * b)
{
	const int *fa = *(const int **) a;
	const int *fb = *(const int **) b;

	COMPARISON_PP;

	return (*fa > *fb) - (*fa < *fb);
}

NO_INLINE int cmp_long_ptr(const void * a, const void * b)
{
	const long long *fa = *(const long long **) a;
	const long long *fb = *(const long long **) b;

	COMPARISON_PP;

	return (*fa > *fb) - (*fa < *fb);
}

NO_INLINE int cmp_long_double_ptr(const void * a, const void * b)
{
	const long double *fa = *(const long double **) a;
	const long double *fb = *(const long double **) b;

	COMPARISON_PP;

	return (*fa > *fb) - (*fa < *fb);
}

// c++ comparison functions

#ifdef __GNUG__

NO_INLINE bool cpp_cmp_int(const int &a, const int &b)
{
	COMPARISON_PP;

	return a < b;
}

NO_INLINE bool cpp_cmp_str(char const* const a, char const* const b)
{
	COMPARISON_PP;

	return strcmp(a, b) < 0;
}

#endif

long long utime()
{
	struct timeval now_time;

	gettimeofday(&now_time, NULL);

	return now_time.tv_sec * 1000000LL + now_time.tv_usec;
}

void seed_rand(unsigned long long seed)
{
	srand(seed);
}

void test_sort(void *array, void *unsorted, void *valid, int minimum, int maximum, int samples, int repetitions, SRTFUNC *srt, const char *name, const char *desc, size_t size, CMPFUNC *cmpf)
{
	long long start, end, total, best, average_time, average_comp;
	char temp[100];
	static char compare = 0;
	long long *ptla = (long long *) array, *ptlv = (long long *) valid;
	long double *ptda = (long double *) array, *ptdv = (long double *) valid;
	int *pta = (int *) array, *ptv = (int *) valid, rep, sam, max, cnt, name32;

#ifdef SKASORT_HPP
	void *swap;
#endif

	if (*name == '*')
	{
		if (!strcmp(desc, "random order") || !strcmp(desc, "random 1-4") || !strcmp(desc, "random 4") || !strcmp(desc, "random string") || !strcmp(desc, "random 10"))
		{
			if (comparisons)
			{
				compare = 1;
				printf("%s\n", "| Name | Items | Type | Best | Average | Compares | Samples | Distribution |");
				printf("%s\n", "| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |");
			}
			else
			{
				printf("%s\n", "| Name | Items | Type | Best | Average | Loops | Samples | Distribution |");
				printf("%s\n", "| --------- | -------- | ---- | -------- | -------- | --------- | ------- | ---------------- |");
			}
		}
		else
		{
			printf("%s\n", "| | | | | | | | |");
		}
		return;
	}

	name32 = name[0] + (name[1] ? name[1] * 32 : 0) + (name[2] ? name[2] * 1024 : 0);

	best = average_time = average_comp = 0;

	if (minimum == 7 && maximum == 7)
	{
		pta = (int *) unsorted;

		printf("\e[1;32m%10d %10d %10d %10d %10d %10d %10d\e[0m\n", pta[0], pta[1], pta[2], pta[3], pta[4], pta[5], pta[6]);

		pta = (int *) array;
	}

	for (sam = 0 ; sam < samples ; sam++)
	{
		total = average_comp = 0;
		max = minimum;

		start = utime();

		for (rep = repetitions - 1 ; rep >= 0 ; rep--)
		{
			memcpy(array, (char *) unsorted + maximum * rep * size, max * size);

			comparisons = 0;

			// edit char *sorts to add / remove sorts

			switch (name32)
			{
#ifdef BLITSORT_H
				case 'b' + 'l' * 32 + 'i' * 1024: blitsort(array, max, size, cmpf); break;
#endif
#ifdef CRUMSORT_H
				case 'c' + 'r' * 32 + 'u' * 1024: crumsort(array, max, size, cmpf); break;
#endif
#ifdef DRIPSORT_H
				case 'd' + 'r' * 32 + 'i' * 1024: dripsort(array, max, size, cmpf); break;
#endif
#ifdef FLOWSORT_H
				case 'f' + 'l' * 32 + 'o' * 1024: flowsort(array, max, size, cmpf); break;
#endif
#ifdef FLUXSORT_H
				case 'f' + 'l' * 32 + 'u' * 1024: fluxsort(array, max, size, cmpf); break;
				case 's' + '_' * 32 + 'f' * 1024: fluxsort_size(array, max, size, cmpf); break;
#endif
#ifdef GRIDSORT_H
				case 'g' + 'r' * 32 + 'i' * 1024: gridsort(array, max, size, cmpf); break;
#endif
#ifdef OCTOSORT_H
				case 'o' + 'c' * 32 + 't' * 1024: octosort(array, max, size, cmpf); break;
#endif
#ifdef PIPOSORT_H
				case 'p' + 'i' * 32 + 'p' * 1024: piposort(array, max, size, cmpf); break;
#endif
#ifdef QUADSORT_H
				case 'q' + 'u' * 32 + 'a' * 1024: quadsort(array, max, size, cmpf); break;
				case 's' + '_' * 32 + 'q' * 1024: quadsort_size(array, max, size, cmpf); break;
#endif
#ifdef SKIPSORT_H
				case 's' + 'k' * 32 + 'i' * 1024: skipsort(array, max, size, cmpf); break;
#endif
#ifdef WOLFSORT_H
				case 'w' + 'o' * 32 + 'l' * 1024: wolfsort(array, max, size, cmpf); break;
#endif
				case 'q' + 's' * 32 + 'o' * 1024: qsort(array, max, size, cmpf); break;

#ifdef RHSORT_C
				case 'r' + 'h' * 32 + 's' * 1024:
					if (size == sizeof(int)) rhsort32(pta, max); else return;
					break;
#endif

#ifdef __GNUG__
				case 's' + 'o' * 32 + 'r' * 1024:
					if (size == sizeof(int)) std::sort(pta, pta + max);
					else if (size == sizeof(long long)) std::sort(ptla, ptla + max);
					else std::sort(ptda, ptda + max);
					break;

				case 's' + 't' * 32 + 'a' * 1024:
					if (size == sizeof(int)) std::stable_sort(pta, pta + max);
					else if (size == sizeof(long long)) std::stable_sort(ptla, ptla + max);
					else std::stable_sort(ptda, ptda + max);
					break;

  #ifdef PDQSORT_H
				case 'p' + 'd' * 32 + 'q' * 1024:
					if (size == sizeof(int)) pdqsort(pta, pta + max);
					else if (size == sizeof(long long)) pdqsort(ptla, ptla + max);
					else pdqsort(ptda, ptda + max);
					break;
  #endif
  #ifdef SKASORT_HPP
				case 's' + 'k' * 32 + 'a' * 1024:
					swap = malloc(max * size);
					if (size == sizeof(int)) ska_sort_copy(pta, pta + max, (int *) swap);
					else if (size == sizeof(long long)) ska_sort_copy(ptla, ptla + max, (long long *) swap);
					else repetitions = 0;
					free(swap);
					break;
  #endif
  #ifdef GFX_TIMSORT_HPP
				case 't' + 'i' * 32 + 'm' * 1024:
					if (size == sizeof(int)) gfx::timsort(pta, pta + max, cpp_cmp_int);
					else if (size == sizeof(long long)) gfx::timsort(ptla, ptla + max);
					else gfx::timsort(ptda, ptda + max);
					break;
  #endif
#endif
				default:
					switch (name32)
					{
						case 's' + 'o' * 32 + 'r' * 1024:
						case 's' + 't' * 32 + 'a' * 1024:
						case 'p' + 'd' * 32 + 'q' * 1024:
						case 'r' + 'h' * 32 + 's' * 1024:
						case 's' + 'k' * 32 + 'a' * 1024:
						case 't' + 'i' * 32 + 'm' * 1024:
							printf("unknown sort: %s (compile with g++ instead of gcc?)\n", name);
							return;
						default:
							printf("unknown sort: %s\n", name);
							return;
					}
			}
			average_comp += comparisons;

			if (minimum < maximum && ++max > maximum)
			{
				max = minimum;
			}
		}
		end = utime();

		total = end - start;

		if (!best || total < best)
		{
			best = total;
		}
		average_time += total;
	}

	if (minimum == 7 && maximum == 7)
	{
		printf("\e[1;32m%10d %10d %10d %10d %10d %10d %10d\e[0m\n", pta[0], pta[1], pta[2], pta[3], pta[4], pta[5], pta[6]);
	}

	if (repetitions == 0)
	{
		return;
	}

	average_time /= samples;

	if (cmpf == cmp_stable)
	{
		for (cnt = 1 ; cnt < maximum ; cnt++)
		{
			if (pta[cnt - 1] > pta[cnt])
			{
				sprintf(temp, "\e[1;31m%16s\e[0m", "unstable");
				desc = temp;
				break;
			}
		}
	}

	if (compare)
	{
		if (repetitions <= 1)
		{
			printf("|%10s |%9d | %4d |%9f |%9f |%10d | %7d | %16s |\e[0m\n", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, (int) comparisons, samples, desc);
		}
		else
		{
			printf("|%10s |%9d | %4d |%9f |%9f |%10.1f | %7d | %16s |\e[0m\n", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, (float) average_comp / repetitions, samples, desc);
		}
	}
	else
	{
		printf("|%10s | %8d | %4d | %f | %f | %9d | %7d | %16s |\e[0m\n", name, maximum, (int) size * 8, best / 1000000.0, average_time / 1000000.0, repetitions, samples, desc);
	}

	if (minimum != maximum || cmpf == cmp_stable)
	{
		return;
	}

	for (cnt = 1 ; cnt < maximum ; cnt++)
	{
		if (cmpf == cmp_str)
		{
			char **ptsa = (char **) array;

			if (strcmp((char *) ptsa[cnt - 1], (char *) ptsa[cnt]) > 0)
			{
				printf("%17s: not properly sorted at index %d. (%s vs %s\n", name, cnt, (char *) ptsa[cnt - 1], (char *) ptsa[cnt]);
				break;
			}
		}
		else if (size == sizeof(int *) && cmpf == cmp_long_double_ptr)
		{
			long double **pptda = (long double **) array;

			if (cmp_long_double_ptr(&pptda[cnt - 1], &pptda[cnt]) > 0)
			{
				printf("%17s: not properly sorted at index %d. (%Lf vs %Lf\n", name, cnt, *pptda[cnt - 1], *pptda[cnt]);
				break;
			}
		}
		else if (cmpf == cmp_long_ptr)
		{
			long long **pptla = (long long **) array;

			if (cmp_long_ptr(&pptla[cnt - 1], &pptla[cnt]) > 0)
			{
				printf("%17s: not properly sorted at index %d. (%lld vs %lld\n", name, cnt, *pptla[cnt - 1], *pptla[cnt]);
				break;
			}
		}
		else if (cmpf == cmp_int_ptr)
		{
			int **pptia = (int **) array;

			if (cmp_int_ptr(&pptia[cnt - 1], &pptia[cnt]) > 0)
			{
				printf("%17s: not properly sorted at index %d. (%d vs %d\n", name, cnt, *pptia[cnt - 1], *pptia[cnt]);
				break;
			}
		}
		else if (size == sizeof(int))
		{
			if (pta[cnt - 1] > pta[cnt])
			{
				printf("%17s: not properly sorted at index %d. (%d vs %d\n", name, cnt, pta[cnt - 1], pta[cnt]);
				break;
			}
			if (pta[cnt - 1] == pta[cnt])
			{
//				printf("%17s: Found a repeat value at index %d. (%d)\n", name, cnt, pta[cnt]);
			}
		}
		else if (size == sizeof(long long))
		{
			if (ptla[cnt - 1] > ptla[cnt])
			{
				printf("%17s: not properly sorted at index %d. (%lld vs %lld\n", name, cnt, ptla[cnt - 1], ptla[cnt]);
				break;
			}
		}
		else if (size == sizeof(long double))
		{
			if (cmp_long_double(&ptda[cnt - 1], &ptda[cnt]) > 0)
			{
				printf("%17s: not properly sorted at index %d. (%Lf vs %Lf\n", name, cnt, ptda[cnt - 1], ptda[cnt]);
				break;
			}
		}
	}

	for (cnt = 1 ; cnt < maximum ; cnt++)
	{
		if (size == sizeof(int))
		{
			if (pta[cnt] != ptv[cnt])
			{
				printf(" validate: array[%d] != valid[%d]. (%d vs %d\n", cnt, cnt, pta[cnt], ptv[cnt]);
				break;
			}
		}
		else if (size == sizeof(long long))
		{
			if (ptla[cnt] != ptlv[cnt])
			{
				if (cmpf == cmp_str)
				{
					char **ptsa = (char **) array;
					char **ptsv = (char **) valid;

					printf(" validate: array[%d] != valid[%d]. (%s vs %s) %s\n", cnt, cnt, (char *) ptsa[cnt], (char *) ptsv[cnt], !strcmp((char *) ptsa[cnt], (char *) ptsv[cnt]) ? "\e[1;31munstable\e[0m" : "");
					break;
				}
				if (cmpf == cmp_long_ptr)
				{
					long long **ptla = (long long **) array;
					long long **ptlv = (long long **) valid;

					printf(" validate: array[%d] != valid[%d]. (%lld vs %lld) %s\n", cnt, cnt, *ptla[cnt], *ptlv[cnt], (*ptla[cnt] == *ptlv[cnt]) ? "\e[1;31munstable\e[0m" : "");
					break;
				}
				if (cmpf == cmp_int_ptr)
				{
					int **ptia = (int **) array;
					int **ptiv = (int **) valid;

					printf(" validate: array[%d] != valid[%d]. (%d vs %d) %s\n", cnt, cnt, *ptia[cnt], *ptiv[cnt], (*ptia[cnt] == *ptiv[cnt]) ? "\e[1;31munstable\e[0m" : "");
					break;
				}
				printf(" validate: array[%d] != valid[%d]. (%lld vs %lld\n", cnt, cnt, ptla[cnt], ptlv[cnt]);
				break;
			}
		}
		else if (size == sizeof(long double))
		{
			if (ptda[cnt] != ptdv[cnt])
			{
				printf(" validate: array[%d] != valid[%d]. (%Lf vs %Lf\n", cnt, cnt, ptda[cnt], ptdv[cnt]);
				break;
			}
		}
	}
}

void validate()
{
	int seed = time(NULL);
	int cnt, val, max = 1000;

	int *a_array, *r_array, *v_array;

	seed_rand(seed);

	a_array = (int *) malloc(max * sizeof(int));
	r_array = (int *) malloc(max * sizeof(int));
	v_array = (int *) malloc(max * sizeof(int));

	for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand();

	for (cnt = 0 ; cnt < max ; cnt++)
	{
		memcpy(a_array, r_array, cnt * sizeof(int));
		memcpy(v_array, r_array, cnt * sizeof(int));

		quadsort_prim(a_array, cnt, sizeof(int));
		qsort(v_array, cnt, sizeof(int), cmp_int);

		for (val = 0 ; val < cnt ; val++)
		{
			if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate rand: seed %d: size: %d Not properly sorted at index %d.\n", seed, cnt, val); return;}

			if (a_array[val] != v_array[val]) {printf("\e[1;31mvalidate rand: seed %d: size: %d Not verified at index %d.\n", seed, cnt, val); return;}
		}
	}

	// ascending saw

	for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = cnt % (max / 5);

	for (cnt = 0 ; cnt < max ; cnt += 7)
	{
		memcpy(a_array, r_array, cnt * sizeof(int));
		memcpy(v_array, r_array, cnt * sizeof(int));

		quadsort(a_array, cnt, sizeof(int), cmp_int);
		qsort(v_array, cnt, sizeof(int), cmp_int);

		for (val = 0 ; val < cnt ; val++)
		{
			if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate ascending saw: seed %d: size: %d Not properly sorted at index %d.\n", seed, cnt, val); return;}

			if (a_array[val] != v_array[val]) {printf("\e[1;31mvalidate ascending saw: seed %d: size: %d Not verified at index %d.\n", seed, cnt, val); return;}
		}
	}

	// descending saw

	for (cnt = 0 ; cnt < max ; cnt++)
	{
		r_array[cnt] = (max - cnt + 1) % (max / 11);
	}

	for (cnt = 1 ; cnt < max ; cnt += 7)
	{
		memcpy(a_array, r_array, cnt * sizeof(int));
		memcpy(v_array, r_array, cnt * sizeof(int));

		quadsort(a_array, cnt, sizeof(int), cmp_int);
		qsort(v_array, cnt, sizeof(int), cmp_int);

		for (val = 0 ; val < cnt ; val++)
		{
			if (val && v_array[val - 1] > v_array[val])
{printf("\e[1;31mvalidate descending saw: seed %d: size: %d Not properly sorted at index %d.\n\n", seed, cnt, val); return;} if (a_array[val] != v_array[val]) {printf("\e[1;31mvalidate descending saw: seed %d: size: %d Not verified at index %d.\n\n", seed, cnt, val); return;} } } // random half for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = (cnt < max / 2) ? cnt : rand(); for (cnt = 1 ; cnt < max ; cnt += 7) { memcpy(a_array, r_array, cnt * sizeof(int)); memcpy(v_array, r_array, cnt * sizeof(int)); quadsort(a_array, cnt, sizeof(int), cmp_int); qsort(v_array, cnt, sizeof(int), cmp_int); for (val = 0 ; val < cnt ; val++) { if (val && v_array[val - 1] > v_array[val]) {printf("\e[1;31mvalidate rand tail: seed %d: size: %d Not properly sorted at index %d.\n", seed, cnt, val); return;} if (a_array[val] != v_array[val]) {printf("\e[1;31mvalidate rand tail: seed %d: size: %d Not verified at index %d.\n", seed, cnt, val); return;} } } free(a_array); free(r_array); free(v_array); } unsigned int bit_reverse(unsigned int x) { x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); return((x >> 16) | (x << 15)); } void run_test(void *a_array, void *r_array, void *v_array, int minimum, int maximum, int samples, int repetitions, int copies, const char *desc, size_t size, CMPFUNC *cmpf) { int cnt, rep; memcpy(v_array, r_array, maximum * size); for (rep = 0 ; rep < copies ; rep++) { memcpy((char *) r_array + rep * maximum * size, v_array, maximum * size); } quadsort(v_array, maximum, size, cmpf); for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) { test_sort(a_array, r_array, v_array, minimum, maximum, samples, repetitions, qsort, sorts[cnt], desc, size, cmpf); } } void range_test(int max, int samples, int repetitions, int seed) { int cnt, last; int mem = max * 10 > 32768 * 64 ? 
max * 10 : 32768 * 64; char dist[40]; int *a_array = (int *) malloc(max * sizeof(int)); int *r_array = (int *) malloc(mem * sizeof(int)); int *v_array = (int *) malloc(max * sizeof(int)); srand(seed); for (cnt = 0 ; cnt < mem ; cnt++) { r_array[cnt] = rand(); } if (max <= 4096) { for (last = 1, samples = 32768*4, repetitions = 4 ; repetitions <= max ; repetitions *= 2, samples /= 2) { if (max >= repetitions) { sprintf(dist, "random %d-%d", last, repetitions); memcpy(v_array, r_array, repetitions * sizeof(int)); quadsort(v_array, repetitions, sizeof(int), cmp_int); for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) { test_sort(a_array, r_array, v_array, last, repetitions, 50, samples, qsort, sorts[cnt], dist, sizeof(int), cmp_int); } last = repetitions + 1; } } free(a_array); free(r_array); free(v_array); return; } if (max == 10000000) { repetitions = 10000000; for (max = 10 ; max <= 10000000 ; max *= 10) { repetitions /= 10; memcpy(v_array, r_array, max * sizeof(int)); quadsort_prim(v_array, max, sizeof(int)); sprintf(dist, "random %d", max); for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) { test_sort(a_array, r_array, v_array, max, max, 10, repetitions, qsort, sorts[cnt], dist, sizeof(int), cmp_int); } } } else { for (samples = 32768*4, repetitions = 4 ; samples > 0 ; repetitions *= 2, samples /= 2) { if (max >= repetitions) { memcpy(v_array, r_array, repetitions * sizeof(int)); quadsort(v_array, repetitions, sizeof(int), cmp_int); sprintf(dist, "random %d", repetitions); for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) { test_sort(a_array, r_array, v_array, repetitions, repetitions, 100, samples, qsort, sorts[cnt], dist, sizeof(int), cmp_int); } } } } free(a_array); free(r_array); free(v_array); return; } #define VAR int int main(int argc, char **argv) { int max = 100000; int samples = 10; int repetitions = 1; int seed = 0; int cnt, mem; VAR *a_array, *r_array, *v_array, sum; if (argc >= 1 && argv[1] 
&& *argv[1])
	{
		max = atoi(argv[1]);
	}

	if (argc >= 2 && argv[2] && *argv[2])
	{
		samples = atoi(argv[2]);
	}

	if (argc >= 3 && argv[3] && *argv[3])
	{
		repetitions = atoi(argv[3]);
	}

	if (argc >= 4 && argv[4] && *argv[4])
	{
		seed = atoi(argv[4]);
	}

	validate();

	seed = seed ? seed : time(NULL);

	// sizeof yields a size_t, which is printed with %zu
	printf("Info: int = %zu, long long = %zu, long double = %zu\n\n", sizeof(int) * 8, sizeof(long long) * 8, sizeof(long double) * 8);

	printf("Benchmark: array size: %d, samples: %d, repetitions: %d, seed: %d\n\n", max, samples, repetitions, seed);

	if (repetitions == 0)
	{
		range_test(max, samples, repetitions, seed);
		return 0;
	}

	mem = max * repetitions;

#ifndef SKIP_STRINGS
#ifndef cmp

	// C string

	{
		char **sa_array = (char **) malloc(max * sizeof(char **));
		char **sr_array = (char **) malloc(mem * sizeof(char **));
		char **sv_array = (char **) malloc(max * sizeof(char **));

		char *buffer = (char *) malloc(mem * 16);

		seed_rand(seed);

		for (cnt = 0 ; cnt < mem ; cnt++)
		{
			sprintf(buffer + cnt * 16, "%X", rand() % 1000000);

			sr_array[cnt] = buffer + cnt * 16;
		}

		run_test(sa_array, sr_array, sv_array, max, max, samples, repetitions, 0, "random string", sizeof(char **), cmp_str);

		free(sa_array);
		free(sr_array);
		free(sv_array);

		free(buffer);
	}

	// long double table

	{
		long double **da_array = (long double **) malloc(max * sizeof(long double *));
		long double **dr_array = (long double **) malloc(mem * sizeof(long double *));
		long double **dv_array = (long double **) malloc(max * sizeof(long double *));

		long double *buffer = (long double *) malloc(mem * sizeof(long double));

		if (da_array == NULL || dr_array == NULL || dv_array == NULL)
		{
			printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno));

			return 0;
		}

		seed_rand(seed);

		for (cnt = 0 ; cnt < mem ; cnt++)
		{
			buffer[cnt] = (long double) rand();
			buffer[cnt] += (long double) ((unsigned long long) rand() << 32ULL);

			dr_array[cnt] = buffer + cnt;
		}

		run_test(da_array, dr_array, dv_array, max, max, samples, repetitions, 0, "random 
double", sizeof(long double *), cmp_long_double_ptr); free(da_array); free(dr_array); free(dv_array); free(buffer); } // long long table { long long **la_array = (long long **) malloc(max * sizeof(long long *)); long long **lr_array = (long long **) malloc(mem * sizeof(long long *)); long long **lv_array = (long long **) malloc(max * sizeof(long long *)); long long *buffer = (long long *) malloc(mem * sizeof(long long)); if (la_array == NULL || lr_array == NULL || lv_array == NULL) { printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno)); return 0; } seed_rand(seed); for (cnt = 0 ; cnt < mem ; cnt++) { buffer[cnt] = (long long) rand(); buffer[cnt] += (long long) ((unsigned long long) rand() << 32ULL); lr_array[cnt] = buffer + cnt; } run_test(la_array, lr_array, lv_array, max, max, samples, repetitions, 0, "random long", sizeof(long long *), cmp_long_ptr); free(la_array); free(lr_array); free(lv_array); free(buffer); } // int table { int **la_array = (int **) malloc(max * sizeof(int *)); int **lr_array = (int **) malloc(mem * sizeof(int *)); int **lv_array = (int **) malloc(max * sizeof(int *)); int *buffer = (int *) malloc(mem * sizeof(int)); if (la_array == NULL || lr_array == NULL || lv_array == NULL) { printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno)); return 0; } seed_rand(seed); for (cnt = 0 ; cnt < mem ; cnt++) { buffer[cnt] = rand(); lr_array[cnt] = buffer + cnt; } run_test(la_array, lr_array, lv_array, max, max, samples, repetitions, 0, "random int", sizeof(int *), cmp_int_ptr); free(la_array); free(lr_array); free(lv_array); free(buffer); printf("\n"); } #endif #endif // 128 bit #ifndef SKIP_DOUBLES long double *da_array = (long double *) malloc(max * sizeof(long double)); long double *dr_array = (long double *) malloc(mem * sizeof(long double)); long double *dv_array = (long double *) malloc(max * sizeof(long double)); if (da_array == NULL || dr_array == NULL || dv_array == NULL) { 
printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno)); return 0; } seed_rand(seed); for (cnt = 0 ; cnt < mem ; cnt++) { dr_array[cnt] = (long double) rand(); dr_array[cnt] += (long double) ((unsigned long long) rand() << 32ULL); dr_array[cnt] += 1.0L / 3.0L; } memcpy(dv_array, dr_array, max * sizeof(long double)); quadsort(dv_array, max, sizeof(long double), cmp_long_double); for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) { test_sort(da_array, dr_array, dv_array, max, max, samples, repetitions, qsort, sorts[cnt], "random order", sizeof(long double), cmp_long_double); } #ifndef cmp #ifdef QUADSORT_H test_sort(da_array, dr_array, dv_array, max, max, samples, repetitions, qsort, "s_quadsort", "random order", sizeof(long double), cmp_long_double_ptr); #endif #endif free(da_array); free(dr_array); free(dv_array); printf("\n"); #endif // 64 bit #ifndef SKIP_LONGS long long *la_array = (long long *) malloc(max * sizeof(long long)); long long *lr_array = (long long *) malloc(mem * sizeof(long long)); long long *lv_array = (long long *) malloc(max * sizeof(long long)); if (la_array == NULL || lr_array == NULL || lv_array == NULL) { printf("main(%d,%d,%d): malloc: %s\n", max, samples, repetitions, strerror(errno)); return 0; } seed_rand(seed); for (cnt = 0 ; cnt < mem ; cnt++) { lr_array[cnt] = rand(); lr_array[cnt] += (unsigned long long) rand() << 32ULL; } memcpy(lv_array, lr_array, max * sizeof(long long)); quadsort(lv_array, max, sizeof(long long), cmp_long); for (cnt = 0 ; (size_t) cnt < sizeof(sorts) / sizeof(char *) ; cnt++) { test_sort(la_array, lr_array, lv_array, max, max, samples, repetitions, qsort, sorts[cnt], "random order", sizeof(long long), cmp_long); } free(la_array); free(lr_array); free(lv_array); printf("\n"); #endif // 32 bit a_array = (VAR *) malloc(max * sizeof(VAR)); r_array = (VAR *) malloc(mem * sizeof(VAR)); v_array = (VAR *) malloc(max * sizeof(VAR)); int quad0 = 0; int nmemb = max; int half1 = 
nmemb / 2; int half2 = nmemb - half1; int quad1 = half1 / 2; int quad2 = half1 - quad1; int quad3 = half2 / 2; int quad4 = half2 - quad3; int span3 = quad1 + quad2 + quad3; // random seed_rand(seed); for (cnt = 0 ; cnt < mem ; cnt++) { r_array[cnt] = rand(); } run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "random order", sizeof(VAR), cmp_int); // random % 100 for (cnt = 0 ; cnt < mem ; cnt++) { r_array[cnt] = rand() % 100; } run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "random % 100", sizeof(VAR), cmp_int); // ascending for (cnt = sum = 0 ; cnt < mem ; cnt++) { r_array[cnt] = sum; sum += rand() % 5; } run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "ascending order", sizeof(VAR), cmp_int); // ascending saw for (cnt = 0 ; cnt < max ; cnt++) { r_array[cnt] = rand(); } quadsort(r_array + quad0, quad1, sizeof(VAR), cmp_int); quadsort(r_array + quad1, quad2, sizeof(VAR), cmp_int); quadsort(r_array + half1, quad3, sizeof(VAR), cmp_int); quadsort(r_array + span3, quad4, sizeof(VAR), cmp_int); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "ascending saw", sizeof(VAR), cmp_int); // pipe organ for (cnt = 0 ; cnt < max ; cnt++) { r_array[cnt] = rand(); } quadsort(r_array + quad0, half1, sizeof(VAR), cmp_int); qsort(r_array + half1, half2, sizeof(VAR), cmp_rev); for (cnt = half1 + 1 ; cnt < max ; cnt++) { if (r_array[cnt] >= r_array[cnt - 1]) { r_array[cnt] = r_array[cnt - 1] - 1; // guarantee the run is strictly descending } } run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "pipe organ", sizeof(VAR), cmp_int); // descending for (cnt = 0, sum = mem * 10 ; cnt < mem ; cnt++) { r_array[cnt] = sum; sum -= 1 + rand() % 5; } run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "descending order", sizeof(VAR), cmp_int); // descending saw for (cnt = 0 ; cnt < max ; cnt++) { r_array[cnt] = rand(); } qsort(r_array + quad0, 
quad1, sizeof(VAR), cmp_rev); qsort(r_array + quad1, quad2, sizeof(VAR), cmp_rev); qsort(r_array + half1, quad3, sizeof(VAR), cmp_rev); qsort(r_array + span3, quad4, sizeof(VAR), cmp_rev); for (cnt = 1 ; cnt < max ; cnt++) { if (cnt == quad1 || cnt == half1 || cnt == span3) continue; if (r_array[cnt] >= r_array[cnt - 1]) { r_array[cnt] = r_array[cnt - 1] - 1; // guarantee the run is strictly descending } } run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "descending saw", sizeof(VAR), cmp_int); // random tail 25% for (cnt = 0 ; cnt < max ; cnt++) { r_array[cnt] = rand(); } quadsort(r_array, span3, sizeof(VAR), cmp_int); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "random tail", sizeof(VAR), cmp_int); // random 50% for (cnt = 0 ; cnt < max ; cnt++) { r_array[cnt] = rand(); } quadsort(r_array, half1, sizeof(VAR), cmp_int); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "random half", sizeof(VAR), cmp_int); // tiles for (cnt = 0 ; cnt < mem ; cnt++) { if (cnt % 2 == 0) { r_array[cnt] = 16777216 + cnt; } else { r_array[cnt] = 33554432 + cnt; } } run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "ascending tiles", sizeof(VAR), cmp_int); // bit-reversal for (cnt = 0 ; cnt < mem ; cnt++) { r_array[cnt] = bit_reverse(cnt); } run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "bit reversal", sizeof(VAR), cmp_int); #ifndef cmp #ifdef ANTIQSORT test_antiqsort; #endif #endif #define QUAD_DEBUG #if __has_include("extra_tests.c") #include "extra_tests.c" #endif free(a_array); free(r_array); free(v_array); return 0; } ================================================ FILE: src/blitsort.c ================================================ // blitsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com #define BLIT_AUX 512 // set to 0 for sqrt(n) cache size #define BLIT_OUT 96 // should be smaller or equal to BLIT_AUX void FUNC(blit_partition)(VAR 
*array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp); void FUNC(blit_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { unsigned char loop, asum, bsum, csum, dsum; unsigned int astreaks, bstreaks, cstreaks, dstreaks; size_t quad1, quad2, quad3, quad4, half1, half2; size_t cnt, abalance, bbalance, cbalance, dbalance; VAR *pta, *ptb, *ptc, *ptd; half1 = nmemb / 2; quad1 = half1 / 2; quad2 = half1 - quad1; half2 = nmemb - half1; quad3 = half2 / 2; quad4 = half2 - quad3; pta = array; ptb = array + quad1; ptc = array + half1; ptd = array + half1 + quad3; astreaks = bstreaks = cstreaks = dstreaks = 0; abalance = bbalance = cbalance = dbalance = 0; for (cnt = nmemb ; cnt > 132 ; cnt -= 128) { for (asum = bsum = csum = dsum = 0, loop = 32 ; loop ; loop--) { asum += cmp(pta, pta + 1) > 0; pta++; bsum += cmp(ptb, ptb + 1) > 0; ptb++; csum += cmp(ptc, ptc + 1) > 0; ptc++; dsum += cmp(ptd, ptd + 1) > 0; ptd++; } abalance += asum; astreaks += asum = (asum == 0) | (asum == 32); bbalance += bsum; bstreaks += bsum = (bsum == 0) | (bsum == 32); cbalance += csum; cstreaks += csum = (csum == 0) | (csum == 32); dbalance += dsum; dstreaks += dsum = (dsum == 0) | (dsum == 32); if (cnt > 516 && asum + bsum + csum + dsum == 0) { abalance += 48; pta += 96; bbalance += 48; ptb += 96; cbalance += 48; ptc += 96; dbalance += 48; ptd += 96; cnt -= 384; } } for ( ; cnt > 7 ; cnt -= 4) { abalance += cmp(pta, pta + 1) > 0; pta++; bbalance += cmp(ptb, ptb + 1) > 0; ptb++; cbalance += cmp(ptc, ptc + 1) > 0; ptc++; dbalance += cmp(ptd, ptd + 1) > 0; ptd++; } if (quad1 < quad2) {bbalance += cmp(ptb, ptb + 1) > 0; ptb++;} if (quad1 < quad3) {cbalance += cmp(ptc, ptc + 1) > 0; ptc++;} if (quad1 < quad4) {dbalance += cmp(ptd, ptd + 1) > 0; ptd++;} cnt = abalance + bbalance + cbalance + dbalance; if (cnt == 0) { if (cmp(pta, pta + 1) <= 0 && cmp(ptb, ptb + 1) <= 0 && cmp(ptc, ptc + 1) <= 0) { return; } } asum = quad1 - abalance == 1; bsum = quad2 - bbalance == 1; 
csum = quad3 - cbalance == 1; dsum = quad4 - dbalance == 1; if (asum | bsum | csum | dsum) { unsigned char span1 = (asum && bsum) * (cmp(pta, pta + 1) > 0); unsigned char span2 = (bsum && csum) * (cmp(ptb, ptb + 1) > 0); unsigned char span3 = (csum && dsum) * (cmp(ptc, ptc + 1) > 0); switch (span1 | span2 * 2 | span3 * 4) { case 0: break; case 1: FUNC(quad_reversal)(array, ptb); abalance = bbalance = 0; break; case 2: FUNC(quad_reversal)(pta + 1, ptc); bbalance = cbalance = 0; break; case 3: FUNC(quad_reversal)(array, ptc); abalance = bbalance = cbalance = 0; break; case 4: FUNC(quad_reversal)(ptb + 1, ptd); cbalance = dbalance = 0; break; case 5: FUNC(quad_reversal)(array, ptb); FUNC(quad_reversal)(ptb + 1, ptd); abalance = bbalance = cbalance = dbalance = 0; break; case 6: FUNC(quad_reversal)(pta + 1, ptd); bbalance = cbalance = dbalance = 0; break; case 7: FUNC(quad_reversal)(array, ptd); return; } if (asum && abalance) {FUNC(quad_reversal)(array, pta); abalance = 0;} if (bsum && bbalance) {FUNC(quad_reversal)(pta + 1, ptb); bbalance = 0;} if (csum && cbalance) {FUNC(quad_reversal)(ptb + 1, ptc); cbalance = 0;} if (dsum && dbalance) {FUNC(quad_reversal)(ptc + 1, ptd); dbalance = 0;} } #ifdef cmp cnt = nmemb / 256; // more than 50% ordered #else cnt = nmemb / 512; // more than 25% ordered #endif asum = astreaks > cnt; bsum = bstreaks > cnt; csum = cstreaks > cnt; dsum = dstreaks > cnt; #ifndef cmp if (quad1 > QUAD_CACHE) { asum = bsum = csum = dsum = 1; } #endif switch (asum + bsum * 2 + csum * 4 + dsum * 8) { case 0: FUNC(blit_partition)(array, swap, swap_size, nmemb, cmp); return; case 1: if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); FUNC(blit_partition)(pta + 1, swap, swap_size, quad2 + half2, cmp); break; case 2: FUNC(blit_partition)(array, swap, swap_size, quad1, cmp); if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); FUNC(blit_partition)(ptb + 1, swap, swap_size, half2, cmp); break; case 3: if (abalance) 
FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); FUNC(blit_partition)(ptb + 1, swap, swap_size, half2, cmp); break; case 4: FUNC(blit_partition)(array, swap, swap_size, half1, cmp); if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); FUNC(blit_partition)(ptc + 1, swap, swap_size, quad4, cmp); break; case 8: FUNC(blit_partition)(array, swap, swap_size, half1 + quad3, cmp); if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); break; case 9: if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); FUNC(blit_partition)(pta + 1, swap, swap_size, quad2 + quad3, cmp); if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); break; case 12: FUNC(blit_partition)(array, swap, swap_size, half1, cmp); if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); break; case 5: case 6: case 7: case 10: case 11: case 13: case 14: case 15: if (asum) { if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); } else FUNC(blit_partition)(array, swap, swap_size, quad1, cmp); if (bsum) { if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); } else FUNC(blit_partition)(pta + 1, swap, swap_size, quad2, cmp); if (csum) { if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); } else FUNC(blit_partition)(ptb + 1, swap, swap_size, quad3, cmp); if (dsum) { if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); } else FUNC(blit_partition)(ptc + 1, swap, swap_size, quad4, cmp); break; } if (cmp(pta, pta + 1) <= 0) { if (cmp(ptc, ptc + 1) <= 0) { if (cmp(ptb, ptb + 1) <= 0) { return; } } else { FUNC(rotate_merge_block)(array + half1, swap, swap_size, quad3, quad4, cmp); } } else { FUNC(rotate_merge_block)(array, swap, swap_size, quad1, quad2, cmp); if (cmp(ptc, ptc + 1) > 0) { 
FUNC(rotate_merge_block)(array + half1, swap, swap_size, quad3, quad4, cmp); } } FUNC(rotate_merge_block)(array, swap, swap_size, half1, half2, cmp); } // The next 4 functions are used for pivot selection VAR FUNC(blit_binary_median)(VAR *pta, VAR *ptb, size_t len, CMPFUNC *cmp) { while (len /= 2) { if (cmp(pta + len, ptb + len) <= 0) pta += len; else ptb += len; } return cmp(pta, ptb) > 0 ? *pta : *ptb; } void FUNC(blit_trim_four)(VAR *pta, CMPFUNC *cmp) { VAR swap; size_t x; x = cmp(pta, pta + 1) > 0; swap = pta[!x]; pta[0] = pta[x]; pta[1] = swap; pta += 2; x = cmp(pta, pta + 1) > 0; swap = pta[!x]; pta[0] = pta[x]; pta[1] = swap; pta -= 2; x = (cmp(pta, pta + 2) <= 0) * 2; pta[2] = pta[x]; pta++; x = (cmp(pta, pta + 2) > 0) * 2; pta[0] = pta[x]; } VAR FUNC(blit_median_of_nine)(VAR *array, VAR *swap, size_t nmemb, CMPFUNC *cmp) { VAR *pta; size_t x, y, z; z = nmemb / 9; pta = array; for (x = 0 ; x < 9 ; x++) { swap[x] = *pta; pta += z; } FUNC(blit_trim_four)(swap, cmp); FUNC(blit_trim_four)(swap + 4, cmp); swap[0] = swap[5]; swap[3] = swap[8]; FUNC(blit_trim_four)(swap, cmp); swap[0] = swap[6]; x = cmp(swap + 0, swap + 1) > 0; y = cmp(swap + 0, swap + 2) > 0; z = cmp(swap + 1, swap + 2) > 0; return swap[(x == y) + (y ^ z)]; } VAR FUNC(blit_median_of_cbrt)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, int *generic, CMPFUNC *cmp) { VAR *pta, *pts; size_t cnt, div, cbrt; for (cbrt = 32 ; nmemb > cbrt * cbrt * cbrt && cbrt < swap_size ; cbrt *= 2) {} div = nmemb / cbrt; pta = array; // + (size_t) &div / 16 % div; // for a non-deterministic offset pts = swap; for (cnt = 0 ; cnt < cbrt ; cnt++) { pts[cnt] = *pta; pta += div; } cbrt /= 2; FUNC(quadsort_swap)(pts, pts + cbrt * 2, cbrt, cbrt, cmp); FUNC(quadsort_swap)(pts + cbrt, pts + cbrt * 2, cbrt, cbrt, cmp); *generic = (cmp(pts + cbrt * 2 - 1, pts) <= 0) & (cmp(pts + cbrt - 1, pts) <= 0); return FUNC(blit_binary_median)(pts, pts + cbrt, cbrt, cmp); } // As per suggestion by Marshall Lochbaum to improve 
generic data handling size_t FUNC(blit_reverse_partition)(VAR *array, VAR *swap, VAR *piv, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { if (nmemb > swap_size) { size_t l, r, h = nmemb / 2; l = FUNC(blit_reverse_partition)(array + 0, swap, piv, swap_size, h, cmp); r = FUNC(blit_reverse_partition)(array + h, swap, piv, swap_size, nmemb - h, cmp); FUNC(trinity_rotation)(array + l, swap, swap_size, h - l + r, h - l); return l + r; } #if !defined __clang__ size_t cnt, val, m = 0; VAR *pta = array; for (cnt = nmemb / 4 ; cnt ; cnt--) { val = cmp(piv, pta) > 0; swap[-m] = array[m] = *pta++; m += val; swap++; val = cmp(piv, pta) > 0; swap[-m] = array[m] = *pta++; m += val; swap++; val = cmp(piv, pta) > 0; swap[-m] = array[m] = *pta++; m += val; swap++; val = cmp(piv, pta) > 0; swap[-m] = array[m] = *pta++; m += val; swap++; } for (cnt = nmemb % 4 ; cnt ; cnt--) { val = cmp(piv, pta) > 0; swap[-m] = array[m] = *pta++; m += val; swap++; } swap -= nmemb; #else size_t cnt, m; VAR *tmp, *ptx = array, *pta = array, *pts = swap; for (cnt = nmemb / 4 ; cnt ; cnt--) { tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; } for (cnt = nmemb % 4 ; cnt ; cnt--) { tmp = cmp(piv, ptx) > 0 ? 
pta++ : pts++; *tmp = *ptx++; } m = pta - array; #endif memcpy(array + m, swap, (nmemb - m) * sizeof(VAR)); return m; } size_t FUNC(blit_default_partition)(VAR *array, VAR *swap, VAR *piv, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { if (nmemb > swap_size) { size_t l, r, h = nmemb / 2; l = FUNC(blit_default_partition)(array + 0, swap, piv, swap_size, h, cmp); r = FUNC(blit_default_partition)(array + h, swap, piv, swap_size, nmemb - h, cmp); FUNC(trinity_rotation)(array + l, swap, swap_size, h - l + r, h - l); return l + r; } #if !defined __clang__ size_t cnt, val, m = 0; VAR *pta = array; for (cnt = nmemb / 4 ; cnt ; cnt--) { val = cmp(pta, piv) <= 0; swap[-m] = array[m] = *pta++; m += val; swap++; val = cmp(pta, piv) <= 0; swap[-m] = array[m] = *pta++; m += val; swap++; val = cmp(pta, piv) <= 0; swap[-m] = array[m] = *pta++; m += val; swap++; val = cmp(pta, piv) <= 0; swap[-m] = array[m] = *pta++; m += val; swap++; } for (cnt = nmemb % 4 ; cnt ; cnt--) { val = cmp(pta, piv) <= 0; swap[-m] = array[m] = *pta++; m += val; swap++; } swap -= nmemb; #else size_t cnt, m; VAR *tmp, *ptx = array, *pta = array, *pts = swap; for (cnt = nmemb / 4 ; cnt ; cnt--) { tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; } for (cnt = nmemb % 4 ; cnt ; cnt--) { tmp = cmp(ptx, piv) <= 0 ? 
pta++ : pts++; *tmp = *ptx++; } m = pta - array; #endif memcpy(array + m, swap, sizeof(VAR) * (nmemb - m)); return m; } void FUNC(blit_partition)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { size_t a_size = 0, s_size; VAR piv, max = 0; int generic = 0; while (1) { if (nmemb <= 2048) { piv = FUNC(blit_median_of_nine)(array, swap, nmemb, cmp); } else { piv = FUNC(blit_median_of_cbrt)(array, swap, swap_size, nmemb, &generic, cmp); if (generic) break; } if (a_size && cmp(&max, &piv) <= 0) { a_size = FUNC(blit_reverse_partition)(array, swap, &piv, swap_size, nmemb, cmp); s_size = nmemb - a_size; nmemb = a_size; if (s_size <= a_size / 16 || a_size <= BLIT_OUT) break; a_size = 0; continue; } a_size = FUNC(blit_default_partition)(array, swap, &piv, swap_size, nmemb, cmp); s_size = nmemb - a_size; if (a_size <= s_size / 16 || s_size <= BLIT_OUT) { if (s_size == 0) { a_size = FUNC(blit_reverse_partition)(array, swap, &piv, swap_size, a_size, cmp); s_size = nmemb - a_size; nmemb = a_size; if (s_size <= a_size / 16 || a_size <= BLIT_OUT) break; a_size = 0; continue; } FUNC(quadsort_swap)(array + a_size, swap, swap_size, s_size, cmp); } else { FUNC(blit_partition)(array + a_size, swap, swap_size, s_size, cmp); } nmemb = a_size; if (s_size <= a_size / 16 || a_size <= BLIT_OUT) break; max = piv; } FUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp); } void FUNC(blitsort)(void *array, size_t nmemb, CMPFUNC *cmp) { if (nmemb <= 132) { FUNC(quadsort)(array, nmemb, cmp); } else { VAR *pta = (VAR *) array; #if BLIT_AUX size_t swap_size = BLIT_AUX; #else size_t swap_size = 1 << 19; while (nmemb / swap_size < swap_size / 128) { swap_size /= 4; } #endif VAR swap[swap_size]; FUNC(blit_analyze)(pta, swap, swap_size, nmemb, cmp); } } void FUNC(blitsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { if (nmemb <= 132) { FUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp); } else { VAR *pta = (VAR *) array; VAR *pts = (VAR *) 
swap;

		FUNC(blit_analyze)(pta, pts, swap_size, nmemb, cmp);
	}
}

#undef BLIT_AUX
#undef BLIT_OUT

================================================
FILE: src/blitsort.h
================================================
// blitsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com

#ifndef BLITSORT_H
#define BLITSORT_H

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <errno.h>
#include <float.h>
#include <stdalign.h>

typedef int CMPFUNC (const void *a, const void *b);

//#define cmp(a,b) (*(a) > *(b))

#ifndef QUADSORT_H
  #include "quadsort.h"
#endif

// When sorting an array of pointers, like a string array, the QUAD_CACHE needs
// to be set for proper performance when sorting large arrays.
// quadsort_prim() can be used to sort arrays of 32 and 64 bit integers
// without a comparison function or cache restrictions.

// With a 6 MB L3 cache a value of 262144 works well.

#ifdef cmp
  #define QUAD_CACHE 4294967295
#else
//#define QUAD_CACHE 131072
  #define QUAD_CACHE 262144
//#define QUAD_CACHE 524288
//#define QUAD_CACHE 4294967295
#endif

//////////////////////////////////////////////////////////
// ┌───────────────────────────────────────────────────┐//
// │ ██████┐ ██████┐ ██████┐ ██████┐████████┐ │//
// │ └────██┐└────██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │//
// │ █████┌┘ █████┌┘ ██████┌┘ ██│ ██│ │//
// │ └───██┐██┌───┘ ██┌──██┐ ██│ ██│ │//
// │ ██████┌┘███████┐ ██████┌┘██████┐ ██│ │//
// │ └─────┘ └──────┘ └─────┘ └─────┘ └─┘ │//
// └───────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

#define VAR int
#define FUNC(NAME) NAME##32

#include "blitsort.c"

#undef VAR
#undef FUNC

// blitsort_prim

#define VAR int
#define FUNC(NAME) NAME##_int32
#ifndef cmp
  #define cmp(a,b) (*(a) > *(b))
  #include "blitsort.c"
  #undef cmp
#else
  #include "blitsort.c"
#endif
#undef VAR
#undef FUNC

#define VAR unsigned int
#define FUNC(NAME) NAME##_uint32
#ifndef cmp
  #define cmp(a,b) (*(a) > *(b))
  #include "blitsort.c"
  #undef cmp
#else
  #include "blitsort.c"
#endif
#undef VAR
#undef FUNC
////////////////////////////////////////////////////////// // ┌───────────────────────────────────────────────────┐// // │ █████┐ ██┐ ██┐ ██████┐ ██████┐████████┐ │// // │ ██┌───┘ ██│ ██│ ██┌──██┐└─██┌─┘└──██┌──┘ │// // │ ██████┐ ███████│ ██████┌┘ ██│ ██│ │// // │ ██┌──██┐└────██│ ██┌──██┐ ██│ ██│ │// // │ └█████┌┘ ██│ ██████┌┘██████┐ ██│ │// // │ └────┘ └─┘ └─────┘ └─────┘ └─┘ │// // └───────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR long long #define FUNC(NAME) NAME##64 #include "blitsort.c" #undef VAR #undef FUNC // blitsort_prim #define VAR long long #define FUNC(NAME) NAME##_int64 #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #include "blitsort.c" #undef cmp #else #include "blitsort.c" #endif #undef VAR #undef FUNC #define VAR unsigned long long #define FUNC(NAME) NAME##_uint64 #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #include "blitsort.c" #undef cmp #else #include "blitsort.c" #endif #undef VAR #undef FUNC // This section is outside of 32/64 bit pointer territory, so no cache checks // necessary, unless sorting 32+ byte structures. 
#undef QUAD_CACHE #define QUAD_CACHE 4294967295 ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ █████┐ ██████┐ ██████┐████████┐ │// //│ ██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// //│ └█████┌┘ ██████┌┘ ██│ ██│ │// //│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ └█████┌┘ ██████┌┘██████┐ ██│ │// //│ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR char #define FUNC(NAME) NAME##8 #include "blitsort.c" #undef VAR #undef FUNC ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ ▄██┐ █████┐ ██████┐ ██████┐████████┐│// //│ ████│ ██┌───┘ ██┌──██┐└─██┌─┘└──██┌──┘│// //│ └─██│ ██████┐ ██████┌┘ ██│ ██│ │// //│ ██│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ ██████┐└█████┌┘ ██████┌┘██████┐ ██│ │// //│ └─────┘ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR short #define FUNC(NAME) NAME##16 #include "blitsort.c" #undef VAR #undef FUNC ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ ▄██┐ ██████┐ █████┐ ██████┐ ██████┐████████┐ │// //│ ████│ └────██┐██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// //│ └─██│ █████┌┘└█████┌┘ ██████┌┘ ██│ ██│ │// //│ ██│ ██┌───┘ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ ██████┐███████┐└█████┌┘ ██████┌┘██████┐ ██│ │// //│ └─────┘└──────┘ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// // 128 reflects the name, though the actual size is 80, 96, or 128 bits, // depending on platform. 
#if (DBL_MANT_DIG < LDBL_MANT_DIG) #define VAR long double #define FUNC(NAME) NAME##128 #include "blitsort.c" #undef VAR #undef FUNC #endif /////////////////////////////////////////////////////////// //┌─────────────────────────────────────────────────────┐// //│ ██████┐██┐ ██┐███████┐████████┐ ██████┐ ███┐ ███┐│// //│██┌────┘██│ ██│██┌────┘└──██┌──┘██┌───██┐████┐████││// //│██│ ██│ ██│███████┐ ██│ ██│ ██│██┌███┌██││// //│██│ ██│ ██│└────██│ ██│ ██│ ██│██│└█┌┘██││// //│└██████┐└██████┌┘███████│ ██│ └██████┌┘██│ └┘ ██││// //│ └─────┘ └─────┘ └──────┘ └─┘ └─────┘ └─┘ └─┘│// //└─────────────────────────────────────────────────────┘// /////////////////////////////////////////////////////////// /* typedef struct {char bytes[32];} struct256; #define VAR struct256 #define FUNC(NAME) NAME##256 #include "blitsort.c" #undef VAR #undef FUNC */ ///////////////////////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────────────────────────┐// //│ ██████┐ ██┐ ██████┐████████┐███████┐ ██████┐ ██████┐ ████████┐ │// //│ ██┌──██┐██│ └─██┌─┘└──██┌──┘██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │// //│ ██████┌┘██│ ██│ ██│ ███████┐██│ ██│██████┌┘ ██│ │// //│ ██┌──██┐██│ ██│ ██│ └────██│██│ ██│██┌──██┐ ██│ │// //│ ██████┌┘███████┐██████┐ ██│ ███████│└██████┌┘██│ ██│ ██│ │// //│ └─────┘ └──────┘└─────┘ └─┘ └──────┘ └─────┘ └─┘ └─┘ └─┘ │// //└────────────────────────────────────────────────────────────────────────┘// ///////////////////////////////////////////////////////////////////////////// void blitsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) { if (nmemb < 2) { return; } switch (size) { case sizeof(char): blitsort8(array, nmemb, cmp); return; case sizeof(short): blitsort16(array, nmemb, cmp); return; case sizeof(int): blitsort32(array, nmemb, cmp); return; case sizeof(long long): blitsort64(array, nmemb, cmp); return; #if (DBL_MANT_DIG < LDBL_MANT_DIG) case sizeof(long double): blitsort128(array, nmemb, cmp); return; #endif 
// case sizeof(struct256): // blitsort256(array, nmemb, cmp); return; default: #if (DBL_MANT_DIG < LDBL_MANT_DIG) assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double)); #else assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long)); #endif // qsort(array, nmemb, size, cmp); } } // suggested size values for primitives: // case 0: unsigned char // case 1: signed char // case 2: signed short // case 3: unsigned short // case 4: signed int // case 5: unsigned int // case 6: float // case 7: double // case 8: signed long long // case 9: unsigned long long // case ?: long double, use sizeof(long double): void blitsort_prim(void *array, size_t nmemb, size_t size) { if (nmemb < 2) { return; } switch (size) { case 4: blitsort_int32(array, nmemb, NULL); return; case 5: blitsort_uint32(array, nmemb, NULL); return; case 8: blitsort_int64(array, nmemb, NULL); return; case 9: blitsort_uint64(array, nmemb, NULL); return; default: assert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1); return; } } #undef QUAD_CACHE #endif ================================================ FILE: src/crumsort.c ================================================ // crumsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com #define CRUM_AUX 512 #define CRUM_OUT 96 void FUNC(fulcrum_partition)(VAR *array, VAR *swap, VAR *max, size_t swap_size, size_t nmemb, CMPFUNC *cmp); void FUNC(crum_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { unsigned char loop, asum, bsum, csum, dsum; unsigned int astreaks, bstreaks, cstreaks, dstreaks; size_t quad1, quad2, quad3, quad4, half1, half2; size_t cnt, abalance, bbalance, cbalance, dbalance; VAR *pta, *ptb, *ptc, *ptd; half1 = nmemb / 2; quad1 = half1 / 2; quad2 = half1 - quad1; half2 = nmemb - half1; quad3 = half2 / 2; quad4 = half2 - quad3; pta = 
array; ptb = array + quad1; ptc = array + half1; ptd = array + half1 + quad3; astreaks = bstreaks = cstreaks = dstreaks = 0; abalance = bbalance = cbalance = dbalance = 0; for (cnt = nmemb ; cnt > 132 ; cnt -= 128) { for (asum = bsum = csum = dsum = 0, loop = 32 ; loop ; loop--) { asum += cmp(pta, pta + 1) > 0; pta++; bsum += cmp(ptb, ptb + 1) > 0; ptb++; csum += cmp(ptc, ptc + 1) > 0; ptc++; dsum += cmp(ptd, ptd + 1) > 0; ptd++; } abalance += asum; astreaks += asum = (asum == 0) | (asum == 32); bbalance += bsum; bstreaks += bsum = (bsum == 0) | (bsum == 32); cbalance += csum; cstreaks += csum = (csum == 0) | (csum == 32); dbalance += dsum; dstreaks += dsum = (dsum == 0) | (dsum == 32); if (cnt > 516 && asum + bsum + csum + dsum == 0) { abalance += 48; pta += 96; bbalance += 48; ptb += 96; cbalance += 48; ptc += 96; dbalance += 48; ptd += 96; cnt -= 384; } } for ( ; cnt > 7 ; cnt -= 4) { abalance += cmp(pta, pta + 1) > 0; pta++; bbalance += cmp(ptb, ptb + 1) > 0; ptb++; cbalance += cmp(ptc, ptc + 1) > 0; ptc++; dbalance += cmp(ptd, ptd + 1) > 0; ptd++; } if (quad1 < quad2) {bbalance += cmp(ptb, ptb + 1) > 0; ptb++;} if (quad1 < quad3) {cbalance += cmp(ptc, ptc + 1) > 0; ptc++;} if (quad1 < quad4) {dbalance += cmp(ptd, ptd + 1) > 0; ptd++;} cnt = abalance + bbalance + cbalance + dbalance; if (cnt == 0) { if (cmp(pta, pta + 1) <= 0 && cmp(ptb, ptb + 1) <= 0 && cmp(ptc, ptc + 1) <= 0) { return; } } asum = quad1 - abalance == 1; bsum = quad2 - bbalance == 1; csum = quad3 - cbalance == 1; dsum = quad4 - dbalance == 1; if (asum | bsum | csum | dsum) { unsigned char span1 = (asum && bsum) * (cmp(pta, pta + 1) > 0); unsigned char span2 = (bsum && csum) * (cmp(ptb, ptb + 1) > 0); unsigned char span3 = (csum && dsum) * (cmp(ptc, ptc + 1) > 0); switch (span1 | span2 * 2 | span3 * 4) { case 0: break; case 1: FUNC(quad_reversal)(array, ptb); abalance = bbalance = 0; break; case 2: FUNC(quad_reversal)(pta + 1, ptc); bbalance = cbalance = 0; break; case 3: 
FUNC(quad_reversal)(array, ptc); abalance = bbalance = cbalance = 0; break; case 4: FUNC(quad_reversal)(ptb + 1, ptd); cbalance = dbalance = 0; break; case 5: FUNC(quad_reversal)(array, ptb); FUNC(quad_reversal)(ptb + 1, ptd); abalance = bbalance = cbalance = dbalance = 0; break; case 6: FUNC(quad_reversal)(pta + 1, ptd); bbalance = cbalance = dbalance = 0; break; case 7: FUNC(quad_reversal)(array, ptd); return; } if (asum && abalance) {FUNC(quad_reversal)(array, pta); abalance = 0;} if (bsum && bbalance) {FUNC(quad_reversal)(pta + 1, ptb); bbalance = 0;} if (csum && cbalance) {FUNC(quad_reversal)(ptb + 1, ptc); cbalance = 0;} if (dsum && dbalance) {FUNC(quad_reversal)(ptc + 1, ptd); dbalance = 0;} } #ifdef cmp cnt = nmemb / 256; // switch to quadsort if at least 50% ordered #else cnt = nmemb / 512; // switch to quadsort if at least 25% ordered #endif asum = astreaks > cnt; bsum = bstreaks > cnt; csum = cstreaks > cnt; dsum = dstreaks > cnt; #ifndef cmp if (quad1 > QUAD_CACHE) { asum = bsum = csum = dsum = 1; } #endif switch (asum + bsum * 2 + csum * 4 + dsum * 8) { case 0: FUNC(fulcrum_partition)(array, swap, NULL, swap_size, nmemb, cmp); return; case 1: if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); FUNC(fulcrum_partition)(pta + 1, swap, NULL, swap_size, quad2 + half2, cmp); break; case 2: FUNC(fulcrum_partition)(array, swap, NULL, swap_size, quad1, cmp); if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); FUNC(fulcrum_partition)(ptb + 1, swap, NULL, swap_size, half2, cmp); break; case 3: if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); FUNC(fulcrum_partition)(ptb + 1, swap, NULL, swap_size, half2, cmp); break; case 4: FUNC(fulcrum_partition)(array, swap, NULL, swap_size, half1, cmp); if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); FUNC(fulcrum_partition)(ptc + 1, swap, NULL, swap_size, quad4, cmp); 
break; case 8: FUNC(fulcrum_partition)(array, swap, NULL, swap_size, half1 + quad3, cmp); if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); break; case 9: if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); FUNC(fulcrum_partition)(pta + 1, swap, NULL, swap_size, quad2 + quad3, cmp); if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); break; case 12: FUNC(fulcrum_partition)(array, swap, NULL, swap_size, half1, cmp); if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); break; case 5: case 6: case 7: case 10: case 11: case 13: case 14: case 15: if (asum) { if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); } else FUNC(fulcrum_partition)(array, swap, NULL, swap_size, quad1, cmp); if (bsum) { if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); } else FUNC(fulcrum_partition)(pta + 1, swap, NULL, swap_size, quad2, cmp); if (csum) { if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); } else FUNC(fulcrum_partition)(ptb + 1, swap, NULL, swap_size, quad3, cmp); if (dsum) { if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); } else FUNC(fulcrum_partition)(ptc + 1, swap, NULL, swap_size, quad4, cmp); break; } if (cmp(pta, pta + 1) <= 0) { if (cmp(ptc, ptc + 1) <= 0) { if (cmp(ptb, ptb + 1) <= 0) { return; } } else { FUNC(rotate_merge_block)(array + half1, swap, swap_size, quad3, quad4, cmp); } } else { FUNC(rotate_merge_block)(array, swap, swap_size, quad1, quad2, cmp); if (cmp(ptc, ptc + 1) > 0) { FUNC(rotate_merge_block)(array + half1, swap, swap_size, quad3, quad4, cmp); } } FUNC(rotate_merge_block)(array, swap, swap_size, half1, half2, cmp); } // The next 4 functions are used for pivot selection VAR *FUNC(crum_binary_median)(VAR *pta, VAR *ptb, size_t len, CMPFUNC *cmp) { while (len /= 2) { if (cmp(pta + len, ptb + len) <= 0) pta += len; 
else ptb += len; } return cmp(pta, ptb) > 0 ? pta : ptb; } VAR *FUNC(crum_median_of_cbrt)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, int *generic, CMPFUNC *cmp) { VAR *pta, *piv; size_t cnt, cbrt, div; for (cbrt = 32 ; nmemb > cbrt * cbrt * cbrt && cbrt < swap_size ; cbrt *= 2) {} div = nmemb / cbrt; pta = array + nmemb - 1 - (size_t) &div / 64 % div; piv = array + cbrt; for (cnt = cbrt ; cnt ; cnt--) { swap[0] = *--piv; *piv = *pta; *pta = swap[0]; pta -= div; } cbrt /= 2; FUNC(quadsort_swap)(piv, swap, swap_size, cbrt, cmp); FUNC(quadsort_swap)(piv + cbrt, swap, swap_size, cbrt, cmp); *generic = (cmp(piv + cbrt * 2 - 1, piv) <= 0) & (cmp(piv + cbrt - 1, piv) <= 0); return FUNC(crum_binary_median)(piv, piv + cbrt, cbrt, cmp); } size_t FUNC(crum_median_of_three)(VAR *array, size_t v0, size_t v1, size_t v2, CMPFUNC *cmp) { size_t v[3] = {v0, v1, v2}; char x, y, z; x = cmp(array + v0, array + v1) > 0; y = cmp(array + v0, array + v2) > 0; z = cmp(array + v1, array + v2) > 0; return v[(x == y) + (y ^ z)]; } VAR *FUNC(crum_median_of_nine)(VAR *array, size_t nmemb, CMPFUNC *cmp) { size_t x, y, z, div = nmemb / 16; x = FUNC(crum_median_of_three)(array, div * 2, div * 1, div * 4, cmp); y = FUNC(crum_median_of_three)(array, div * 8, div * 6, div * 10, cmp); z = FUNC(crum_median_of_three)(array, div * 14, div * 12, div * 15, cmp); return array + FUNC(crum_median_of_three)(array, x, y, z, cmp); } size_t FUNC(fulcrum_default_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { size_t i, cnt, val, m = 0; VAR *ptl, *ptr, *pta, *tpa; memcpy(swap, array, 32 * sizeof(VAR)); memcpy(swap + 32, array + nmemb - 32, 32 * sizeof(VAR)); ptl = array; ptr = array + nmemb - 1; pta = array + 32; tpa = array + nmemb - 33; cnt = nmemb / 16 - 4; while (1) { if (pta - ptl - m <= 48) { if (cnt-- == 0) break; for (i = 16 ; i ; i--) { val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; } } if (pta - ptl - m >= 16) { if 
(cnt-- == 0) break; for (i = 16 ; i ; i--) { val = cmp(tpa, piv) <= 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--; } } } if (pta - ptl - m <= 48) { for (cnt = nmemb % 16 ; cnt ; cnt--) { val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; } } else { for (cnt = nmemb % 16 ; cnt ; cnt--) { val = cmp(tpa, piv) <= 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--; } } pta = swap; for (cnt = 16 ; cnt ; cnt--) { val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; val = cmp(pta, piv) <= 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; } return m; } // As per suggestion by Marshall Lochbaum to improve generic data handling by mimicking dual-pivot quicksort size_t FUNC(fulcrum_reverse_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { size_t i, cnt, val, m = 0; VAR *ptl, *ptr, *pta, *tpa; memcpy(swap, array, 32 * sizeof(VAR)); memcpy(swap + 32, array + nmemb - 32, 32 * sizeof(VAR)); ptl = array; ptr = array + nmemb - 1; pta = array + 32; tpa = array + nmemb - 33; cnt = nmemb / 16 - 4; while (1) { if (pta - ptl - m <= 48) { if (cnt-- == 0) break; for (i = 16 ; i ; i--) { val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; } } if (pta - ptl - m >= 16) { if (cnt-- == 0) break; for (i = 16 ; i ; i--) { val = cmp(piv, tpa) > 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--; } } } if (pta - ptl - m <= 48) { for (cnt = nmemb % 16 ; cnt ; cnt--) { val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; } } else { for (cnt = nmemb % 16 ; cnt ; cnt--) { val = cmp(piv, tpa) > 0; ptl[m] = ptr[m] = *tpa--; m += val; ptr--; } } pta = swap; for (cnt = 16 ; cnt ; cnt--) { val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--; val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; 
m += val; ptr--;
		val = cmp(piv, pta) > 0; ptl[m] = ptr[m] = *pta++; m += val; ptr--;
	}
	return m;
}

void FUNC(fulcrum_partition)(VAR *array, VAR *swap, VAR *max, size_t swap_size, size_t nmemb, CMPFUNC *cmp)
{
	size_t a_size, s_size;
	VAR *ptp, piv;
	int generic = 0;

	while (1)
	{
		if (nmemb <= 2048)
		{
			ptp = FUNC(crum_median_of_nine)(array, nmemb, cmp);
		}
		else
		{
			ptp = FUNC(crum_median_of_cbrt)(array, swap, swap_size, nmemb, &generic, cmp);

			if (generic) break;
		}
		piv = *ptp;

		if (max && cmp(max, &piv) <= 0)
		{
			a_size = FUNC(fulcrum_reverse_partition)(array, swap, array, &piv, swap_size, nmemb, cmp);
			s_size = nmemb - a_size;
			nmemb = a_size;

			if (s_size <= a_size / 32 || a_size <= CRUM_OUT) break;

			max = NULL;
			continue;
		}
		*ptp = array[--nmemb];

		a_size = FUNC(fulcrum_default_partition)(array, swap, array, &piv, swap_size, nmemb, cmp);
		s_size = nmemb - a_size;

		ptp = array + a_size; array[nmemb] = *ptp; *ptp = piv;

		if (a_size <= s_size / 32 || s_size <= CRUM_OUT)
		{
			FUNC(quadsort_swap)(ptp + 1, swap, swap_size, s_size, cmp);
		}
		else
		{
			FUNC(fulcrum_partition)(ptp + 1, swap, max, swap_size, s_size, cmp);
		}
		nmemb = a_size;

		if (s_size <= a_size / 32 || a_size <= CRUM_OUT)
		{
			if (a_size <= CRUM_OUT) break;

			a_size = FUNC(fulcrum_reverse_partition)(array, swap, array, &piv, swap_size, nmemb, cmp);
			s_size = nmemb - a_size;
			nmemb = a_size;

			if (s_size <= a_size / 32 || a_size <= CRUM_OUT) break;

			max = NULL;
			continue;
		}
		max = ptp;
	}
	FUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp);
}

void FUNC(crumsort)(void *array, size_t nmemb, CMPFUNC *cmp)
{
	if (nmemb <= 256)
	{
		VAR swap[nmemb];

		FUNC(quadsort_swap)(array, swap, nmemb, nmemb, cmp);

		return;
	}
	VAR *pta = (VAR *) array;
#if CRUM_AUX
	size_t swap_size = CRUM_AUX;
#else
	size_t swap_size = 128;

	while (swap_size * swap_size <= nmemb)
	{
		swap_size *= 4;
	}
#endif
	VAR swap[swap_size];

	FUNC(crum_analyze)(pta, swap, swap_size, nmemb, cmp);
}

void FUNC(crumsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)
{
	if (nmemb <=
256)
	{
		FUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp);
	}
	else
	{
		VAR *pta = (VAR *) array;
		VAR *pts = (VAR *) swap;

		FUNC(crum_analyze)(pta, pts, swap_size, nmemb, cmp);
	}
}

================================================
FILE: src/crumsort.h
================================================
// crumsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com

#ifndef CRUMSORT_H
#define CRUMSORT_H

#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <float.h>

typedef int CMPFUNC (const void *a, const void *b);

//#define cmp(a,b) (*(a) > *(b))

#ifndef QUADSORT_H
  #include "quadsort.h"
#endif

// When sorting an array of pointers, like a string array, the QUAD_CACHE needs
// to be set for proper performance when sorting large arrays.
// crumsort_prim() can be used to sort arrays of 32 and 64 bit integers
// without a comparison function or cache restrictions.

// With a 6 MB L3 cache a value of 262144 works well.

#ifdef cmp
  #define QUAD_CACHE 4294967295
#else
//#define QUAD_CACHE 131072
  #define QUAD_CACHE 262144
//#define QUAD_CACHE 524288
//#define QUAD_CACHE 4294967295
#endif

//////////////////////////////////////////////////////////
// ┌───────────────────────────────────────────────────┐//
// │ ██████┐ ██████┐ ██████┐ ██████┐████████┐ │//
// │ └────██┐└────██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │//
// │ █████┌┘ █████┌┘ ██████┌┘ ██│ ██│ │//
// │ └───██┐██┌───┘ ██┌──██┐ ██│ ██│ │//
// │ ██████┌┘███████┐ ██████┌┘██████┐ ██│ │//
// │ └─────┘ └──────┘ └─────┘ └─────┘ └─┘ │//
// └───────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

#define VAR int
#define FUNC(NAME) NAME##32

#include "crumsort.c"

#undef VAR
#undef FUNC

// crumsort_prim

#define VAR int
#define FUNC(NAME) NAME##_int32
#ifndef cmp
  #define cmp(a,b) (*(a) > *(b))
  #include "crumsort.c"
  #undef cmp
#else
  #include "crumsort.c"
#endif
#undef VAR
#undef FUNC

#define VAR unsigned int
#define FUNC(NAME) NAME##_uint32
#ifndef cmp
  #define cmp(a,b) (*(a) > *(b))
  #include "crumsort.c"
  #undef cmp
#else
  #include "crumsort.c"
#endif
#undef VAR
#undef FUNC

//////////////////////////////////////////////////////////
// ┌───────────────────────────────────────────────────┐//
// │ █████┐ ██┐ ██┐ ██████┐ ██████┐████████┐ │//
// │ ██┌───┘ ██│ ██│ ██┌──██┐└─██┌─┘└──██┌──┘ │//
// │ ██████┐ ███████│ ██████┌┘ ██│ ██│ │//
// │ ██┌──██┐└────██│ ██┌──██┐ ██│ ██│ │//
// │ └█████┌┘ ██│ ██████┌┘██████┐ ██│ │//
// │ └────┘ └─┘ └─────┘ └─────┘ └─┘ │//
// └───────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

#define VAR long long
#define FUNC(NAME) NAME##64

#include "crumsort.c"

#undef VAR
#undef FUNC

// crumsort_prim

#define VAR long long
#define FUNC(NAME) NAME##_int64
#ifndef cmp
  #define cmp(a,b) (*(a) > *(b))
  #include "crumsort.c"
  #undef cmp
#else
  #include "crumsort.c"
#endif
#undef VAR
#undef FUNC

#define VAR unsigned long long
#define FUNC(NAME) NAME##_uint64
#ifndef cmp
  #define cmp(a,b) (*(a) > *(b))
  #include "crumsort.c"
  #undef cmp
#else
  #include "crumsort.c"
#endif
#undef VAR
#undef FUNC

// This section is outside of 32/64 bit pointer territory, so no cache checks
// necessary, unless sorting 32+ byte structures.

#undef QUAD_CACHE
#define QUAD_CACHE 4294967295

//////////////////////////////////////////////////////////
//┌────────────────────────────────────────────────────┐//
//│ █████┐ ██████┐ ██████┐████████┐ │//
//│ ██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │//
//│ └█████┌┘ ██████┌┘ ██│ ██│ │//
//│ ██┌──██┐ ██┌──██┐ ██│ ██│ │//
//│ └█████┌┘ ██████┌┘██████┐ ██│ │//
//│ └────┘ └─────┘ └─────┘ └─┘ │//
//└────────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

#define VAR char
#define FUNC(NAME) NAME##8

#include "crumsort.c"

#undef VAR
#undef FUNC

//////////////////////////////////////////////////////////
//┌────────────────────────────────────────────────────┐//
//│ ▄██┐ █████┐ ██████┐ ██████┐████████┐│//
//│ ████│ ██┌───┘ ██┌──██┐└─██┌─┘└──██┌──┘│//
//│ └─██│ ██████┐ ██████┌┘ ██│ ██│ │//
//│ ██│ ██┌──██┐ ██┌──██┐ ██│ ██│ │//
//│ ██████┐└█████┌┘ ██████┌┘██████┐ ██│ │//
//│ └─────┘ └────┘ └─────┘ └─────┘ └─┘ │//
//└────────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

#define VAR short
#define FUNC(NAME) NAME##16

#include "crumsort.c"

#undef VAR
#undef FUNC

//////////////////////////////////////////////////////////
//┌────────────────────────────────────────────────────┐//
//│ ▄██┐ ██████┐ █████┐ ██████┐ ██████┐████████┐ │//
//│ ████│ └────██┐██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │//
//│ └─██│ █████┌┘└█████┌┘ ██████┌┘ ██│ ██│ │//
//│ ██│ ██┌───┘ ██┌──██┐ ██┌──██┐ ██│ ██│ │//
//│ ██████┐███████┐└█████┌┘ ██████┌┘██████┐ ██│ │//
//│ └─────┘└──────┘ └────┘ └─────┘ └─────┘ └─┘ │//
//└────────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

// 128 reflects the name, though the actual size of a long double is 64, 80,
// 96, or 128 bits, depending on platform.

#if (DBL_MANT_DIG < LDBL_MANT_DIG)
  #define VAR long double
  #define FUNC(NAME) NAME##128
    #include "crumsort.c"
  #undef VAR
  #undef FUNC
#endif

///////////////////////////////////////////////////////////
//┌─────────────────────────────────────────────────────┐//
//│ ██████┐██┐ ██┐███████┐████████┐ ██████┐ ███┐ ███┐│//
//│██┌────┘██│ ██│██┌────┘└──██┌──┘██┌───██┐████┐████││//
//│██│ ██│ ██│███████┐ ██│ ██│ ██│██┌███┌██││//
//│██│ ██│ ██│└────██│ ██│ ██│ ██│██│└█┌┘██││//
//│└██████┐└██████┌┘███████│ ██│ └██████┌┘██│ └┘ ██││//
//│ └─────┘ └─────┘ └──────┘ └─┘ └─────┘ └─┘ └─┘│//
//└─────────────────────────────────────────────────────┘//
///////////////////////////////////////////////////////////

/*
typedef struct {char bytes[32];} struct256;
#define VAR struct256
#define FUNC(NAME) NAME##256

#include "crumsort.c"

#undef VAR
#undef FUNC
*/

//////////////////////////////////////////////////////////////////////////
//┌─────────────────────────────────────────────────────────────────────┐//
//│ ██████┐██████┐ ██┐ ██┐███┐ ███┐███████┐ ██████┐ ██████┐ ████████┐│//
//│██┌────┘██┌──██┐██│ ██│████┐████│██┌────┘██┌───██┐██┌──██┐└──██┌──┘│//
//│██│ ██████┌┘██│ ██│██┌███┌██│███████┐██│ ██│██████┌┘ ██│ │//
//│██│ ██┌──██┐██│ ██│██│└█┌┘██│└────██│██│ ██│██┌──██┐ ██│ │//
//│└██████┐██│ ██│└██████┌┘██│ └┘ ██│███████│└██████┌┘██│ ██│ ██│ │//
//│ └─────┘└─┘ └─┘ └─────┘ └─┘ └─┘└──────┘ └─────┘ └─┘ └─┘ └─┘ │//
//└─────────────────────────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////////////////////

void crumsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)
{
	if (nmemb < 2)
	{
		return;
	}

	switch (size)
	{
		case sizeof(char):
			crumsort8(array, nmemb, cmp);
			return;

		case sizeof(short):
			crumsort16(array, nmemb, cmp);
			return;

		case sizeof(int):
			crumsort32(array, nmemb, cmp);
			return;

		case sizeof(long long):
			crumsort64(array, nmemb, cmp);
			return;
#if (DBL_MANT_DIG < LDBL_MANT_DIG)
		case sizeof(long double):
			crumsort128(array, nmemb,
cmp);
			return;
#endif
//		case sizeof(struct256):
//			crumsort256(array, nmemb, cmp);
//			return;

		default:
#if (DBL_MANT_DIG < LDBL_MANT_DIG)
			assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double));
#else
			assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long));
#endif
//			qsort(array, nmemb, size, cmp);
	}
}

// suggested size values for primitives:

// case 0: unsigned char
// case 1: signed char
// case 2: signed short
// case 3: unsigned short
// case 4: signed int
// case 5: unsigned int
// case 6: float
// case 7: double
// case 8: signed long long
// case 9: unsigned long long
// case ?: long double, use sizeof(long double):

void crumsort_prim(void *array, size_t nmemb, size_t size)
{
	if (nmemb < 2)
	{
		return;
	}

	switch (size)
	{
		case 4:
			crumsort_int32(array, nmemb, NULL);
			return;

		case 5:
			crumsort_uint32(array, nmemb, NULL);
			return;

		case 8:
			crumsort_int64(array, nmemb, NULL);
			return;

		case 9:
			crumsort_uint64(array, nmemb, NULL);
			return;

		default:
			assert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1);
			return;
	}
}

#undef QUAD_CACHE

#endif

================================================
FILE: src/extra_tests.c
================================================
#ifdef QUAD_DEBUG

	// random % 4

	for (cnt = 0 ; cnt < mem ; cnt++)
	{
		r_array[cnt] = rand() % 4;
	}
	run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "random % 4", sizeof(VAR), cmp_int);

	// semi random

	for (cnt = 0 ; cnt < mem ; cnt++)
	{
		r_array[cnt] = rand() % 8 / 7 * rand();
	}
	run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "semi random", sizeof(VAR), cmp_int);

	// random signal

	for (cnt = 0 ; cnt < mem ; cnt++)
	{
		if (cnt < mem / 2)
		{
			r_array[cnt] = cnt + rand() % 16;
		}
		else
		{
			r_array[cnt] = mem - cnt + rand() % 16;
		}
	}
	run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0,
"random signal", sizeof(VAR), cmp_int); // exponential for (cnt = 0 ; cnt < mem ; cnt++) { r_array[cnt] = (size_t) (cnt * cnt) % 10000; //(1 << 30); } run_test(a_array, r_array, v_array, max, max, samples, repetitions, 0, "exponential", sizeof(VAR), cmp_int); // random fragments -- Make array 92% sorted for (cnt = 0 ; cnt < max ; cnt++) { r_array[cnt] = rand(); } quadsort(r_array + quad0, quad1 / 100 * 98, sizeof(VAR), cmp_int); quadsort(r_array + quad1, quad1 / 100 * 98, sizeof(VAR), cmp_int); quadsort(r_array + half1, quad1 / 100 * 98, sizeof(VAR), cmp_int); quadsort(r_array + span3, quad1 / 100 * 98, sizeof(VAR), cmp_int); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "chaos fragments", sizeof(VAR), cmp_int); // Make array 12% sorted, this tends to make timsort/powersort slower than fully random for (cnt = 0 ; cnt < max ; cnt++) { r_array[cnt] = rand(); } quadsort(r_array + quad0 / 1, quad1 * 2 / 100, sizeof(VAR), cmp_int); quadsort(r_array + quad1 / 2, quad1 * 2 / 100, sizeof(VAR), cmp_int); quadsort(r_array + quad1 / 1, quad1 * 2 / 100, sizeof(VAR), cmp_int); quadsort(r_array + half1 / 1, quad1 * 2 / 100, sizeof(VAR), cmp_int); quadsort(r_array + span3 / 2, quad1 * 2 / 100, sizeof(VAR), cmp_int); quadsort(r_array + span3 / 1, quad1 * 2 / 100, sizeof(VAR), cmp_int); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "order fragments", sizeof(VAR), cmp_int); // Make array 95% generic for (cnt = 0 ; cnt < max ; cnt++) { if (rand() % 20 == 0) { r_array[cnt] = rand(); } else { r_array[cnt] = 1000000000; } } run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "95% generic", sizeof(VAR), cmp_int); // Three saws for (cnt = 0 ; cnt < max ; cnt++) { r_array[cnt] = rand(); } quadsort(r_array, max / 3, sizeof(VAR), cmp_int); quadsort(r_array + max / 3, max / 3, sizeof(VAR), cmp_int); quadsort(r_array + max / 3 * 2, max / 3, sizeof(VAR), cmp_int); run_test(a_array, r_array, 
v_array, max, max, samples, repetitions, repetitions, "three saws", sizeof(VAR), cmp_int); // various combinations of reverse and ascending order data /* for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand(); quadsort(r_array + quad0, half1, sizeof(VAR), cmp_int); quadsort(r_array + half1, half2, sizeof(VAR), cmp_int); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "aaaaa aaaaa", sizeof(VAR), cmp_int); for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand(); quadsort(r_array + quad1 / 2, nmemb - quad1 / 2, sizeof(VAR), cmp_int); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "raaaaaaaaaa", sizeof(VAR), cmp_int); size_t span2 = quad2 + quad3 + quad4; for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand(); quadsort(r_array + quad1, span2, sizeof(VAR), cmp_int); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "rr aaaaaaaa", sizeof(VAR), cmp_int); for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand(); quadsort(r_array + quad0, quad1, sizeof(VAR), cmp_int); quadsort(r_array + half1, half2, sizeof(VAR), cmp_int); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "aa rr aaaaa", sizeof(VAR), cmp_int); for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand(); quadsort(r_array + quad0, half1, sizeof(VAR), cmp_int); quadsort(r_array + span3, quad4, sizeof(VAR), cmp_int); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "aaaaa rr aa", sizeof(VAR), cmp_int); for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand(); quadsort(r_array + quad0, nmemb, sizeof(VAR), cmp_int); qsort(r_array + quad0, half1, sizeof(VAR), cmp_rev); qsort(r_array + half1, half2, sizeof(VAR), cmp_rev); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "rrrrr rrrrr", sizeof(VAR), cmp_int); for (cnt = 0 ; cnt < max ; cnt++) r_array[cnt] = rand(); quadsort(r_array + quad0, nmemb, sizeof(VAR), cmp_int); 
qsort(r_array + quad0, quad1, sizeof(VAR), cmp_rev); qsort(r_array + quad1, quad2, sizeof(VAR), cmp_rev); qsort(r_array + half1, quad3, sizeof(VAR), cmp_rev); qsort(r_array + span3, quad4, sizeof(VAR), cmp_rev); run_test(a_array, r_array, v_array, max, max, samples, repetitions, repetitions, "rr rr rr rr", sizeof(VAR), cmp_int); */ #endif ================================================ FILE: src/fluxsort.c ================================================ // fluxsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com #define FLUX_OUT 96 void FUNC(flux_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *ptp, size_t nmemb, CMPFUNC *cmp); // Determine whether to use mergesort or quicksort void FUNC(flux_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { unsigned char loop, asum, bsum, csum, dsum; unsigned int astreaks, bstreaks, cstreaks, dstreaks; size_t quad1, quad2, quad3, quad4, half1, half2; size_t cnt, abalance, bbalance, cbalance, dbalance; VAR *pta, *ptb, *ptc, *ptd; half1 = nmemb / 2; quad1 = half1 / 2; quad2 = half1 - quad1; half2 = nmemb - half1; quad3 = half2 / 2; quad4 = half2 - quad3; pta = array; ptb = array + quad1; ptc = array + half1; ptd = array + half1 + quad3; astreaks = bstreaks = cstreaks = dstreaks = 0; abalance = bbalance = cbalance = dbalance = 0; if (quad1 < quad2) {bbalance += cmp(ptb, ptb + 1) > 0; ptb++;} if (quad1 < quad3) {cbalance += cmp(ptc, ptc + 1) > 0; ptc++;} if (quad1 < quad4) {dbalance += cmp(ptd, ptd + 1) > 0; ptd++;} for (cnt = nmemb ; cnt > 132 ; cnt -= 128) { for (asum = bsum = csum = dsum = 0, loop = 32 ; loop ; loop--) { asum += cmp(pta, pta + 1) > 0; pta++; bsum += cmp(ptb, ptb + 1) > 0; ptb++; csum += cmp(ptc, ptc + 1) > 0; ptc++; dsum += cmp(ptd, ptd + 1) > 0; ptd++; } abalance += asum; astreaks += asum = (asum == 0) | (asum == 32); bbalance += bsum; bstreaks += bsum = (bsum == 0) | (bsum == 32); cbalance += csum; cstreaks += csum = (csum == 0) | (csum == 32); dbalance += dsum; dstreaks += dsum = 
(dsum == 0) | (dsum == 32); if (cnt > 516 && asum + bsum + csum + dsum == 0) { abalance += 48; pta += 96; bbalance += 48; ptb += 96; cbalance += 48; ptc += 96; dbalance += 48; ptd += 96; cnt -= 384; } } for ( ; cnt > 7 ; cnt -= 4) { abalance += cmp(pta, pta + 1) > 0; pta++; bbalance += cmp(ptb, ptb + 1) > 0; ptb++; cbalance += cmp(ptc, ptc + 1) > 0; ptc++; dbalance += cmp(ptd, ptd + 1) > 0; ptd++; } cnt = abalance + bbalance + cbalance + dbalance; if (cnt == 0) { if (cmp(pta, pta + 1) <= 0 && cmp(ptb, ptb + 1) <= 0 && cmp(ptc, ptc + 1) <= 0) { return; } } asum = quad1 - abalance == 1; bsum = quad2 - bbalance == 1; csum = quad3 - cbalance == 1; dsum = quad4 - dbalance == 1; if (asum | bsum | csum | dsum) { unsigned char span1 = (asum && bsum) * (cmp(pta, pta + 1) > 0); unsigned char span2 = (bsum && csum) * (cmp(ptb, ptb + 1) > 0); unsigned char span3 = (csum && dsum) * (cmp(ptc, ptc + 1) > 0); switch (span1 | span2 * 2 | span3 * 4) { case 0: break; case 1: FUNC(quad_reversal)(array, ptb); abalance = bbalance = 0; break; case 2: FUNC(quad_reversal)(pta + 1, ptc); bbalance = cbalance = 0; break; case 3: FUNC(quad_reversal)(array, ptc); abalance = bbalance = cbalance = 0; break; case 4: FUNC(quad_reversal)(ptb + 1, ptd); cbalance = dbalance = 0; break; case 5: FUNC(quad_reversal)(array, ptb); FUNC(quad_reversal)(ptb + 1, ptd); abalance = bbalance = cbalance = dbalance = 0; break; case 6: FUNC(quad_reversal)(pta + 1, ptd); bbalance = cbalance = dbalance = 0; break; case 7: FUNC(quad_reversal)(array, ptd); return; } if (asum && abalance) {FUNC(quad_reversal)(array, pta); abalance = 0;} if (bsum && bbalance) {FUNC(quad_reversal)(pta + 1, ptb); bbalance = 0;} if (csum && cbalance) {FUNC(quad_reversal)(ptb + 1, ptc); cbalance = 0;} if (dsum && dbalance) {FUNC(quad_reversal)(ptc + 1, ptd); dbalance = 0;} } #ifdef cmp cnt = nmemb / 256; // switch to quadsort if at least 50% ordered #else cnt = nmemb / 512; // switch to quadsort if at least 25% ordered #endif asum = astreaks 
> cnt; bsum = bstreaks > cnt; csum = cstreaks > cnt; dsum = dstreaks > cnt; #ifndef cmp if (quad1 > QUAD_CACHE) { asum = bsum = csum = dsum = 1; } #endif switch (asum + bsum * 2 + csum * 4 + dsum * 8) { case 0: FUNC(flux_partition)(array, swap, array, swap + nmemb, nmemb, cmp); return; case 1: if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); FUNC(flux_partition)(pta + 1, swap, pta + 1, swap + quad2 + half2, quad2 + half2, cmp); break; case 2: FUNC(flux_partition)(array, swap, array, swap + quad1, quad1, cmp); if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); FUNC(flux_partition)(ptb + 1, swap, ptb + 1, swap + half2, half2, cmp); break; case 3: if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); FUNC(flux_partition)(ptb + 1, swap, ptb + 1, swap + half2, half2, cmp); break; case 4: FUNC(flux_partition)(array, swap, array, swap + half1, half1, cmp); if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); FUNC(flux_partition)(ptc + 1, swap, ptc + 1, swap + quad4, quad4, cmp); break; case 8: FUNC(flux_partition)(array, swap, array, swap + half1 + quad3, half1 + quad3, cmp); if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); break; case 9: if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); FUNC(flux_partition)(pta + 1, swap, pta + 1, swap + quad2 + quad3, quad2 + quad3, cmp); if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); break; case 12: FUNC(flux_partition)(array, swap, array, swap + half1, half1, cmp); if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); break; case 5: case 6: case 7: case 10: case 11: case 13: case 14: case 15: if (asum) { if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp); } else FUNC(flux_partition)(array, swap, array, 
swap + quad1, quad1, cmp); if (bsum) { if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp); } else FUNC(flux_partition)(pta + 1, swap, pta + 1, swap + quad2, quad2, cmp); if (csum) { if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp); } else FUNC(flux_partition)(ptb + 1, swap, ptb + 1, swap + quad3, quad3, cmp); if (dsum) { if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp); } else FUNC(flux_partition)(ptc + 1, swap, ptc + 1, swap + quad4, quad4, cmp); break; } if (cmp(pta, pta + 1) <= 0) { if (cmp(ptc, ptc + 1) <= 0) { if (cmp(ptb, ptb + 1) <= 0) { return; } memcpy(swap, array, nmemb * sizeof(VAR)); } else { FUNC(cross_merge)(swap + half1, array + half1, quad3, quad4, cmp); memcpy(swap, array, half1 * sizeof(VAR)); } } else { if (cmp(ptc, ptc + 1) <= 0) { memcpy(swap + half1, array + half1, half2 * sizeof(VAR)); FUNC(cross_merge)(swap, array, quad1, quad2, cmp); } else { FUNC(cross_merge)(swap + half1, ptb + 1, quad3, quad4, cmp); FUNC(cross_merge)(swap, array, quad1, quad2, cmp); } } FUNC(cross_merge)(array, swap, half1, half2, cmp); } // The next 4 functions are used for pivot selection VAR FUNC(binary_median)(VAR *pta, VAR *ptb, size_t len, CMPFUNC *cmp) { while (len /= 2) { if (cmp(pta + len, ptb + len) <= 0) pta += len; else ptb += len; } return cmp(pta, ptb) > 0 ? 
*pta : *ptb; } void FUNC(trim_four)(VAR *pta, CMPFUNC *cmp) { VAR swap; size_t x; x = cmp(pta, pta + 1) > 0; swap = pta[!x]; pta[0] = pta[x]; pta[1] = swap; pta += 2; x = cmp(pta, pta + 1) > 0; swap = pta[!x]; pta[0] = pta[x]; pta[1] = swap; pta -= 2; x = (cmp(pta, pta + 2) <= 0) * 2; pta[2] = pta[x]; pta++; x = (cmp(pta, pta + 2) > 0) * 2; pta[0] = pta[x]; } VAR FUNC(median_of_nine)(VAR *array, size_t nmemb, CMPFUNC *cmp) { VAR *pta, swap[9]; size_t x, y, z; z = nmemb / 9; pta = array; for (x = 0 ; x < 9 ; x++) { swap[x] = *pta; pta += z; } FUNC(trim_four)(swap, cmp); FUNC(trim_four)(swap + 4, cmp); swap[0] = swap[5]; swap[3] = swap[8]; FUNC(trim_four)(swap, cmp); swap[0] = swap[6]; x = cmp(swap + 0, swap + 1) > 0; y = cmp(swap + 0, swap + 2) > 0; z = cmp(swap + 1, swap + 2) > 0; return swap[(x == y) + (y ^ z)]; } VAR FUNC(median_of_cbrt)(VAR *array, VAR *swap, VAR *ptx, size_t nmemb, int *generic, CMPFUNC *cmp) { VAR *pta, *pts; size_t cnt, div, cbrt; for (cbrt = 32 ; nmemb > cbrt * cbrt * cbrt ; cbrt *= 2) {} div = nmemb / cbrt; pta = ptx + (size_t) &div / 16 % div; pts = ptx == array ? 
swap : array; for (cnt = 0 ; cnt < cbrt ; cnt++) { pts[cnt] = *pta; pta += div; } cbrt /= 2; FUNC(quadsort_swap)(pts, pts + cbrt * 2, cbrt, cbrt, cmp); FUNC(quadsort_swap)(pts + cbrt, pts + cbrt * 2, cbrt, cbrt, cmp); *generic = (cmp(pts + cbrt * 2 - 1, pts) <= 0) & (cmp(pts + cbrt - 1, pts) <= 0); return FUNC(binary_median)(pts, pts + cbrt, cbrt, cmp); } // As per suggestion by Marshall Lochbaum to improve generic data handling by mimicking dual-pivot quicksort void FUNC(flux_reverse_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t nmemb, CMPFUNC *cmp) { size_t a_size, s_size; #if !defined __clang__ { size_t cnt, m, val; VAR *pts = swap; for (m = 0, cnt = nmemb / 8 ; cnt ; cnt--) { val = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++; val = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++; val = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++; val = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++; val = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++; val = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++; val = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++; val = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++; } for (cnt = nmemb % 8 ; cnt ; cnt--) { val = cmp(piv, ptx) > 0; pts[-m] = array[m] = *ptx++; m += val; pts++; } a_size = m; s_size = nmemb - a_size; } #else { size_t cnt; VAR *tmp, *pta = array, *pts = swap; for (cnt = nmemb / 8 ; cnt ; cnt--) { tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(piv, ptx) > 0 ? 
pta++ : pts++; *tmp = *ptx++; } for (cnt = nmemb % 8 ; cnt ; cnt--) { tmp = cmp(piv, ptx) > 0 ? pta++ : pts++; *tmp = *ptx++; } a_size = pta - array; s_size = pts - swap; } #endif memcpy(array + a_size, swap, s_size * sizeof(VAR)); if (s_size <= a_size / 16 || a_size <= FLUX_OUT) { FUNC(quadsort_swap)(array, swap, a_size, a_size, cmp); return; } FUNC(flux_partition)(array, swap, array, piv, a_size, cmp); } size_t FUNC(flux_default_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t nmemb, CMPFUNC *cmp) { size_t run = 0, a = 0, m = 0; #if !defined __clang__ size_t val; for (a = 8 ; a <= nmemb ; a += 8) { val = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++; val = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++; val = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++; val = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++; val = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++; val = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++; val = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++; val = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++; if (m == a) run = a; } for (a = nmemb % 8 ; a ; a--) { val = cmp(ptx, piv) <= 0; swap[-m] = array[m] = *ptx++; m += val; swap++; } swap -= nmemb; #else VAR *tmp, *pta = array, *pts = swap; for (a = 8 ; a <= nmemb ; a += 8) { tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; tmp = cmp(ptx, piv) <= 0 ? pta++ : pts++; *tmp = *ptx++; if (pta == array || pts == swap) run = a; } for (a = nmemb % 8 ; a ; a--) { tmp = cmp(ptx, piv) <= 0 ? 
pta++ : pts++; *tmp = *ptx++; } m = pta - array; #endif if (run <= nmemb / 4) { return m; } if (m == nmemb) { return m; } a = nmemb - m; memcpy(array + m, swap, a * sizeof(VAR)); FUNC(quadsort_swap)(array + m, swap, a, a, cmp); FUNC(quadsort_swap)(array, swap, m, m, cmp); return 0; } void FUNC(flux_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *piv, size_t nmemb, CMPFUNC *cmp) { size_t a_size = 0, s_size; int generic = 0; while (1) { --piv; if (nmemb <= 2048) { *piv = FUNC(median_of_nine)(ptx, nmemb, cmp); } else { *piv = FUNC(median_of_cbrt)(array, swap, ptx, nmemb, &generic, cmp); if (generic) { if (ptx == swap) { memcpy(array, swap, nmemb * sizeof(VAR)); } FUNC(quadsort_swap)(array, swap, nmemb, nmemb, cmp); return; } } if (a_size && cmp(piv + 1, piv) <= 0) { FUNC(flux_reverse_partition)(array, swap, array, piv, nmemb, cmp); return; } a_size = FUNC(flux_default_partition)(array, swap, ptx, piv, nmemb, cmp); s_size = nmemb - a_size; if (a_size <= s_size / 32 || s_size <= FLUX_OUT) { if (a_size == 0) { return; } if (s_size == 0) { FUNC(flux_reverse_partition)(array, swap, array, piv, a_size, cmp); return; } memcpy(array + a_size, swap, s_size * sizeof(VAR)); FUNC(quadsort_swap)(array + a_size, swap, s_size, s_size, cmp); } else { FUNC(flux_partition)(array + a_size, swap, swap, piv, s_size, cmp); } if (s_size <= a_size / 32 || a_size <= FLUX_OUT) { if (a_size <= FLUX_OUT) { FUNC(quadsort_swap)(array, swap, a_size, a_size, cmp); } else { FUNC(flux_reverse_partition)(array, swap, array, piv, a_size, cmp); } return; } nmemb = a_size; ptx = array; } } void FUNC(fluxsort)(void *array, size_t nmemb, CMPFUNC *cmp) { if (nmemb <= 132) { FUNC(quadsort)(array, nmemb, cmp); } else { VAR *pta = (VAR *) array; VAR *swap = (VAR *) malloc(nmemb * sizeof(VAR)); if (swap == NULL) { FUNC(quadsort)(array, nmemb, cmp); return; } FUNC(flux_analyze)(pta, swap, nmemb, nmemb, cmp); free(swap); } } void FUNC(fluxsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC 
*cmp)
{
	if (nmemb <= 132)
	{
		FUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp);
	}
	else
	{
		VAR *pta = (VAR *) array;
		VAR *pts = (VAR *) swap;

		FUNC(flux_analyze)(pta, pts, swap_size, nmemb, cmp);
	}
}

================================================
FILE: src/fluxsort.h
================================================
// fluxsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com

#ifndef FLUXSORT_H
#define FLUXSORT_H

#include
#include
#include
#include
#include
#include

typedef int CMPFUNC (const void *a, const void *b);

//#define cmp(a,b) (*(a) > *(b))

#ifndef QUADSORT_H
#include "quadsort.h"
#endif

// When sorting an array of 32/64 bit pointers, like a string array, QUAD_CACHE
// needs to be adjusted in quadsort.h and here for proper performance when
// sorting large arrays.

#ifdef cmp
#define QUAD_CACHE 4294967295
#else
//#define QUAD_CACHE 131072
#define QUAD_CACHE 262144
//#define QUAD_CACHE 524288
//#define QUAD_CACHE 4294967295
#endif

//////////////////////////////////////////////////////////
// ┌───────────────────────────────────────────────────┐//
// │ ██████┐ ██████┐ ██████┐ ██████┐████████┐ │//
// │ └────██┐└────██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │//
// │ █████┌┘ █████┌┘ ██████┌┘ ██│ ██│ │//
// │ └───██┐██┌───┘ ██┌──██┐ ██│ ██│ │//
// │ ██████┌┘███████┐ ██████┌┘██████┐ ██│ │//
// │ └─────┘ └──────┘ └─────┘ └─────┘ └─┘ │//
// └───────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

#define VAR int
#define FUNC(NAME) NAME##32

#include "fluxsort.c"

#undef VAR
#undef FUNC

// fluxsort_prim

#define VAR int
#define FUNC(NAME) NAME##_int32
#ifndef cmp
#define cmp(a,b) (*(a) > *(b))
#include "fluxsort.c"
#undef cmp
#else
#include "fluxsort.c"
#endif
#undef VAR
#undef FUNC

#define VAR unsigned int
#define FUNC(NAME) NAME##_uint32
#ifndef cmp
#define cmp(a,b) (*(a) > *(b))
#include "fluxsort.c"
#undef cmp
#else
#include "fluxsort.c"
#endif
#undef VAR
#undef FUNC
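/*
The VAR / FUNC(NAME) pairs above stamp out one copy of fluxsort.c per element
type through token pasting. What follows is a minimal single-file sketch of
that idea, for illustration only; it is not code from this repository. It
swaps the repeated #include for a function-defining macro, and DEFINE_MAX,
max_of32 and max_of64 are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

// Defines max_of<SUFFIX>() for element type TYPE, much like FUNC(NAME)
// expands to NAME##32 or NAME##64 in the header.
#define DEFINE_MAX(TYPE, SUFFIX)                                \
static TYPE max_of##SUFFIX(const TYPE *array, size_t nmemb)     \
{                                                               \
	TYPE best;                                              \
	size_t i;                                               \
	assert(nmemb > 0);                                      \
	best = array[0];                                        \
	for (i = 1 ; i < nmemb ; i++)                           \
	{                                                       \
		if (array[i] > best)                            \
		{                                               \
			best = array[i];                        \
		}                                               \
	}                                                       \
	return best;                                            \
}

DEFINE_MAX(int, 32)        // defines max_of32()
DEFINE_MAX(long long, 64)  // defines max_of64()
```

Calling max_of32() on an int array and max_of64() on a long long array then
uses two separately compiled bodies generated from one template. The header
gets the same effect by redefining VAR and FUNC before each #include of
fluxsort.c.
*/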
//////////////////////////////////////////////////////////
// ┌───────────────────────────────────────────────────┐//
// │ █████┐ ██┐ ██┐ ██████┐ ██████┐████████┐ │//
// │ ██┌───┘ ██│ ██│ ██┌──██┐└─██┌─┘└──██┌──┘ │//
// │ ██████┐ ███████│ ██████┌┘ ██│ ██│ │//
// │ ██┌──██┐└────██│ ██┌──██┐ ██│ ██│ │//
// │ └█████┌┘ ██│ ██████┌┘██████┐ ██│ │//
// │ └────┘ └─┘ └─────┘ └─────┘ └─┘ │//
// └───────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

#define VAR long long
#define FUNC(NAME) NAME##64

#include "fluxsort.c"

#undef VAR
#undef FUNC

// fluxsort_prim

#define VAR long long
#define FUNC(NAME) NAME##_int64
#ifndef cmp
#define cmp(a,b) (*(a) > *(b))
#include "fluxsort.c"
#undef cmp
#else
#include "fluxsort.c"
#endif
#undef VAR
#undef FUNC

#define VAR unsigned long long
#define FUNC(NAME) NAME##_uint64
#ifndef cmp
#define cmp(a,b) (*(a) > *(b))
#include "fluxsort.c"
#undef cmp
#else
#include "fluxsort.c"
#endif
#undef VAR
#undef FUNC

// This section is outside of 32/64 bit pointer territory, so no cache checks
// necessary, unless sorting 32+ byte structures.
#undef QUAD_CACHE
#define QUAD_CACHE 4294967295

//////////////////////////////////////////////////////////
//┌────────────────────────────────────────────────────┐//
//│ █████┐ ██████┐ ██████┐████████┐ │//
//│ ██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │//
//│ └█████┌┘ ██████┌┘ ██│ ██│ │//
//│ ██┌──██┐ ██┌──██┐ ██│ ██│ │//
//│ └█████┌┘ ██████┌┘██████┐ ██│ │//
//│ └────┘ └─────┘ └─────┘ └─┘ │//
//└────────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

#define VAR char
#define FUNC(NAME) NAME##8

#include "fluxsort.c"

#undef VAR
#undef FUNC

//////////////////////////////////////////////////////////
//┌────────────────────────────────────────────────────┐//
//│ ▄██┐ █████┐ ██████┐ ██████┐████████┐│//
//│ ████│ ██┌───┘ ██┌──██┐└─██┌─┘└──██┌──┘│//
//│ └─██│ ██████┐ ██████┌┘ ██│ ██│ │//
//│ ██│ ██┌──██┐ ██┌──██┐ ██│ ██│ │//
//│ ██████┐└█████┌┘ ██████┌┘██████┐ ██│ │//
//│ └─────┘ └────┘ └─────┘ └─────┘ └─┘ │//
//└────────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

#define VAR short
#define FUNC(NAME) NAME##16

#include "fluxsort.c"

#undef VAR
#undef FUNC

//////////////////////////////////////////////////////////
//┌────────────────────────────────────────────────────┐//
//│ ▄██┐ ██████┐ █████┐ ██████┐ ██████┐████████┐ │//
//│ ████│ └────██┐██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │//
//│ └─██│ █████┌┘└█████┌┘ ██████┌┘ ██│ ██│ │//
//│ ██│ ██┌───┘ ██┌──██┐ ██┌──██┐ ██│ ██│ │//
//│ ██████┐███████┐└█████┌┘ ██████┌┘██████┐ ██│ │//
//│ └─────┘└──────┘ └────┘ └─────┘ └─────┘ └─┘ │//
//└────────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////

#if (DBL_MANT_DIG < LDBL_MANT_DIG)
#define VAR long double
#define FUNC(NAME) NAME##128
#include "fluxsort.c"
#undef VAR
#undef FUNC
#endif

//////////////////////////////////////////////////////////////////////////
//┌────────────────────────────────────────────────────────────────────┐//
//│███████┐██┐ ██┐ ██┐██┐ ██┐███████┐ ██████┐ ██████┐ ████████┐ │//
//│██┌────┘██│ ██│ ██│└██┐██┌┘██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │//
//│█████┐ ██│ ██│ ██│ └███┌┘ ███████┐██│ ██│██████┌┘ ██│ │//
//│██┌──┘ ██│ ██│ ██│ ██┌██┐ └────██│██│ ██│██┌──██┐ ██│ │//
//│██│ ███████┐└██████┌┘██┌┘ ██┐███████│└██████┌┘██│ ██│ ██│ │//
//│└─┘ └──────┘ └─────┘ └─┘ └─┘└──────┘ └─────┘ └─┘ └─┘ └─┘ │//
//└────────────────────────────────────────────────────────────────────┘//
//////////////////////////////////////////////////////////////////////////

void fluxsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)
{
	if (nmemb < 2)
	{
		return;
	}

	switch (size)
	{
		case sizeof(char):
			fluxsort8(array, nmemb, cmp);
			return;

		case sizeof(short):
			fluxsort16(array, nmemb, cmp);
			return;

		case sizeof(int):
			fluxsort32(array, nmemb, cmp);
			return;

		case sizeof(long long):
			fluxsort64(array, nmemb, cmp);
			return;

#if (DBL_MANT_DIG < LDBL_MANT_DIG)
		case sizeof(long double):
			fluxsort128(array, nmemb, cmp);
			return;
#endif

		default:
#if (DBL_MANT_DIG < LDBL_MANT_DIG)
			assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double));
#else
			assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long));
#endif
	}
}

// This must match quadsort_prim()

void fluxsort_prim(void *array, size_t nmemb, size_t size)
{
	if (nmemb < 2)
	{
		return;
	}

	switch (size)
	{
		case 4:
			fluxsort_int32(array, nmemb, NULL);
			return;

		case 5:
			fluxsort_uint32(array, nmemb, NULL);
			return;

		case 8:
			fluxsort_int64(array, nmemb, NULL);
			return;

		case 9:
			fluxsort_uint64(array, nmemb, NULL);
			return;

		default:
			assert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1);
			return;
	}
}

// Sort arrays of structures, the comparison function must be by reference.
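/*
"By reference" here means that fluxsort_size() sorts an array of pointers to
the elements rather than the elements themselves, so the comparison function
receives pointers to those pointers. A minimal sketch of the index, sort and
gather steps follows, under stated assumptions: qsort() stands in for
fluxsort64(), and record, cmp_record_ref and sort_by_index are hypothetical
names that do not exist in this repository.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int key; char tag; } record; // hypothetical element type

// Each argument points at a char * slot of the index, so it is dereferenced
// once to reach the element pointer and then cast to the element type.
static int cmp_record_ref(const void *a, const void *b)
{
	const record *ra = (const record *) *(char * const *) a;
	const record *rb = (const record *) *(char * const *) b;

	return (ra->key > rb->key) - (ra->key < rb->key);
}

// Hypothetical helper: build a pointer index, sort the pointers, then gather
// the elements into a scratch buffer and copy them back.
static int sort_by_index(void *array, size_t nmemb, size_t size, int (*cmp)(const void *, const void *))
{
	char *base = (char *) array;
	char **index, *copy;
	size_t i;

	assert(size > 0);

	index = (char **) malloc(nmemb * sizeof(char *));
	copy = (char *) malloc(nmemb * size);

	if (index == NULL || copy == NULL)
	{
		free(index);
		free(copy);
		return -1;
	}

	for (i = 0 ; i < nmemb ; i++)
	{
		index[i] = base + i * size;
	}

	qsort(index, nmemb, sizeof(char *), cmp); // only pointers move here

	for (i = 0 ; i < nmemb ; i++)
	{
		memcpy(copy + i * size, index[i], size);
	}
	memcpy(array, copy, nmemb * size);

	free(index);
	free(copy);

	return 0;
}
```

The sketch reports allocation failure through its return value, whereas the
function below asserts on it. Note also that qsort() makes no stability
guarantee, so the stand-in only mirrors the data movement, not the ordering
of equal keys.
*/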
void fluxsort_size(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) { char **pti, *pta, *pts; size_t index, offset; pta = (char *) array; pti = (char **) malloc(nmemb * sizeof(char *)); assert(pti != NULL); for (index = offset = 0 ; index < nmemb ; index++) { pti[index] = pta + offset; offset += size; } switch (sizeof(size_t)) { case 4: fluxsort32(pti, nmemb, cmp); break; case 8: fluxsort64(pti, nmemb, cmp); break; } pts = (char *) malloc(nmemb * size); assert(pts != NULL); for (index = 0 ; index < nmemb ; index++) { memcpy(pts, pti[index], size); pts += size; } pts -= nmemb * size; memcpy(array, pts, nmemb * size); free(pti); free(pts); } #undef QUAD_CACHE #endif ================================================ FILE: src/gridsort.c ================================================ // gridsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com STRUCT(x_node) { VAR *swap; size_t y_size; size_t y; VAR *y_base; STRUCT(y_node) **y_axis; }; STRUCT(y_node) { size_t z_size; VAR *z_axis1; VAR *z_axis2; }; STRUCT(x_node) *FUNC(create_grid)(VAR *array, size_t nmemb, CMPFUNC *cmp) { STRUCT(x_node) *x_node = (STRUCT(x_node) *) malloc(sizeof(STRUCT(x_node))); STRUCT(y_node) *y_node; for (BSC_Z = BSC_X ; BSC_Z * BSC_Z / 4 < nmemb ; BSC_Z *= 4); x_node->swap = (VAR *) malloc(BSC_Z * 2 * sizeof(VAR)); x_node->y_base = (VAR *) malloc(BSC_Z * sizeof(VAR)); x_node->y_axis = (STRUCT(y_node) **) malloc(BSC_Z * sizeof(STRUCT(y_node) *)); FUNC(quadsort_swap)(array, x_node->swap, BSC_Z * 2, BSC_Z * 2, cmp); for (int cnt = 0 ; cnt < 2 ; cnt++) { y_node = (STRUCT(y_node) *) malloc(sizeof(STRUCT(y_node))); y_node->z_axis1 = (VAR *) malloc(BSC_Z * sizeof(VAR)); memcpy(y_node->z_axis1, array + cnt * BSC_Z, BSC_Z * sizeof(VAR)); y_node->z_axis2 = (VAR *) malloc(BSC_Z * sizeof(VAR)); y_node->z_size = 0; x_node->y_axis[cnt] = y_node; x_node->y_base[cnt] = y_node->z_axis1[0]; } x_node->y_size = 2; x_node->y = 0; return x_node; } // used by destroy_grid // y_node->z_axis1 should be sorted and of 
BSC_Z size. // y_node->z_axis2 should be unsorted and of y_node->z_size size. void FUNC(twin_merge_cpy)(STRUCT(x_node) *x_node, VAR *dest, STRUCT(y_node) *y_node, CMPFUNC *cmp) { VAR *ptl = y_node->z_axis1; VAR *ptr = y_node->z_axis2; size_t nmemb1 = BSC_Z; size_t nmemb2 = y_node->z_size; VAR *tpl = y_node->z_axis1 + nmemb1 - 1; VAR *tpr = y_node->z_axis2 + nmemb2 - 1; VAR *ptd = dest; VAR *tpd = dest + nmemb1 + nmemb2 - 1; size_t loop, x, y; FUNC(quadsort_swap)(ptr, x_node->swap, nmemb2, nmemb2, cmp); while (1) { if (tpl - ptl > 8) { ptl8_ptr: if (cmp(ptl + 7, ptr) <= 0) { memcpy(ptd, ptl, 8 * sizeof(VAR)); ptd += 8; ptl += 8; if (tpl - ptl > 8) {goto ptl8_ptr;} continue; } tpl8_tpr: if (cmp(tpl - 7, tpr) > 0) { tpd -= 7; tpl -= 7; memcpy(tpd--, tpl--, 8 * sizeof(VAR)); if (tpl - ptl > 8) {goto tpl8_tpr;} continue; } } if (tpr - ptr > 8) { ptl_ptr8: if (cmp(ptl, ptr + 7) > 0) { memcpy(ptd, ptr, 8 * sizeof(VAR)); ptd += 8; ptr += 8; if (tpr - ptr > 8) {goto ptl_ptr8;} continue; } tpl_tpr8: if (cmp(tpl, tpr - 7) <= 0) { tpd -= 7; tpr -= 7; memcpy(tpd--, tpr--, 8 * sizeof(VAR)); if (tpr - ptr > 8) {goto tpl_tpr8;} continue; } } if (tpd - ptd < 16) { break; } loop = 8; do { head_branchless_merge(ptd, x, ptl, ptr, cmp); tail_branchless_merge(tpd, y, tpl, tpr, cmp); } while (--loop); } while (tpl - ptl > 1 && tpr - ptr > 1) { if (cmp(ptl + 1, ptr) <= 0) { *ptd++ = *ptl++; *ptd++ = *ptl++; } else if (cmp(ptl, ptr + 1) > 0) { *ptd++ = *ptr++; *ptd++ = *ptr++; } else { x = cmp(ptl, ptr) <= 0; y = !x; ptd[x] = *ptr; ptr += 1; ptd[y] = *ptl; ptl += 1; ptd += 2; x = cmp(ptl, ptr) <= 0; y = !x; ptd[x] = *ptr; ptr += y; ptd[y] = *ptl; ptl += x; ptd++; } } while (ptl <= tpl && ptr <= tpr) { *ptd++ = cmp(ptl, ptr) <= 0 ? 
*ptl++ : *ptr++; } while (ptl <= tpl) { *ptd++ = *ptl++; } while (ptr <= tpr) { *ptd++ = *ptr++; } } void FUNC(parity_twin_merge)(VAR *ptl, VAR *ptr, VAR *ptd, VAR *tpd, size_t block, CMPFUNC *cmp) { VAR *tpl, *tpr; #if !defined __clang__ unsigned char x, y; #endif tpl = ptl + block - 1; tpr = ptr + block - 1; for (block-- ; block ; block--) { head_branchless_merge(ptd, x, ptl, ptr, cmp); tail_branchless_merge(tpd, y, tpl, tpr, cmp); } *ptd = cmp(ptl, ptr) <= 0 ? *ptl : *ptr; *tpd = cmp(tpl, tpr) > 0 ? *tpl : *tpr; } // merge two sorted arrays across two buckets // [AB][AB] --> [AA][ ] + [BB][ ] void FUNC(twin_merge)(STRUCT(x_node) *x_node, STRUCT(y_node) *y_node1, STRUCT(y_node) *y_node2, CMPFUNC *cmp) { VAR *pta, *ptb, *tpa, *tpb, *pts; FUNC(quadsort_swap)(y_node1->z_axis2, x_node->swap, BSC_Z, BSC_Z, cmp); pta = y_node1->z_axis1; ptb = y_node1->z_axis2; tpa = pta + BSC_Z - 1; tpb = ptb + BSC_Z - 1; if (cmp(tpa, ptb) <= 0) { pts = y_node1->z_axis2; y_node1->z_axis2 = y_node2->z_axis1; y_node2->z_axis1 = pts; return; } if (cmp(pta, tpb) > 0) { pts = y_node1->z_axis1; y_node1->z_axis1 = y_node1->z_axis2; y_node1->z_axis2 = y_node2->z_axis1; y_node2->z_axis1 = pts; return; } FUNC(parity_twin_merge)(pta, ptb, y_node2->z_axis2, y_node2->z_axis1 + BSC_Z - 1, BSC_Z, cmp); pta = y_node1->z_axis1; y_node1->z_axis1 = y_node2->z_axis2; y_node2->z_axis2 = pta; } void FUNC(destroy_grid)(STRUCT(x_node) *x_node, VAR *array, CMPFUNC *cmp) { STRUCT(y_node) *y_node; size_t y, z; for (y = z = 0 ; y < x_node->y_size ; y++) { y_node = x_node->y_axis[y]; if (y_node->z_size) { FUNC(twin_merge_cpy)(x_node, &array[z], y_node, cmp); } else { memcpy(&array[z], y_node->z_axis1, BSC_Z * sizeof(VAR)); } z += BSC_Z + y_node->z_size; free(y_node->z_axis1); free(y_node->z_axis2); free(y_node); } free(x_node->y_axis); free(x_node->y_base); free(x_node->swap); free(x_node); } size_t FUNC(adaptive_binary_search)(STRUCT(x_node) *x_node, VAR *array, VAR key, CMPFUNC *cmp) { static unsigned int run; 
size_t top, mid; VAR *base = array; if (!run) { top = x_node->y_size; goto monobound; } if (x_node->y == x_node->y_size - 1) { if (cmp(base + x_node->y, &key) <= 0) { return x_node->y; } top = x_node->y; goto monobound; } if (x_node->y == 0) { base++; if (cmp(base, &key) > 0) { return 0; } top = x_node->y_size - 1; goto monobound; } base += x_node->y; if (cmp(base, &key) <= 0) { if (cmp(base + 1, &key) > 0) { goto end; } base++; top = x_node->y_size - x_node->y - 1; } else { base--; if (cmp(base, &key) <= 0) { goto end; } top = x_node->y - 1; base = array; } monobound: while (top > 1) { mid = top / 2; if (cmp(base + mid, &key) <= 0) { base += mid; } top -= mid; } end: top = base - array; run = x_node->y == top; return x_node->y = top; } void FUNC(insert_y_node)(STRUCT(x_node) *x_node, size_t y) { size_t end = ++x_node->y_size; if (x_node->y_size % BSC_Z == 0) { x_node->y_base = (VAR *) realloc(x_node->y_base, (x_node->y_size + BSC_Z) * sizeof(VAR)); x_node->y_axis = (STRUCT(y_node) **) realloc(x_node->y_axis, (x_node->y_size + BSC_Z) * sizeof(STRUCT(y_node) *)); } while (y < --end) { x_node->y_axis[end] = x_node->y_axis[end - 1]; x_node->y_base[end] = x_node->y_base[end - 1]; } x_node->y_axis[y] = (STRUCT(y_node) *) malloc(sizeof(STRUCT(y_node))); x_node->y_axis[y]->z_axis1 = (VAR *) malloc(BSC_Z * sizeof(VAR)); x_node->y_axis[y]->z_axis2 = (VAR *) malloc(BSC_Z * sizeof(VAR)); } void FUNC(split_y_node)(STRUCT(x_node) *x_node, size_t y1, size_t y2, CMPFUNC *cmp) { STRUCT(y_node) *y_node1, *y_node2; FUNC(insert_y_node)(x_node, y2); y_node1 = x_node->y_axis[y1]; y_node2 = x_node->y_axis[y2]; FUNC(twin_merge)(x_node, y_node1, y_node2, cmp); y_node1->z_size = y_node2->z_size = 0; x_node->y_base[y1] = y_node1->z_axis1[0]; x_node->y_base[y2] = y_node2->z_axis1[0]; } void FUNC(insert_z_node)(STRUCT(x_node) *x_node, VAR key, CMPFUNC *cmp) { STRUCT(y_node) *y_node; size_t y; y = FUNC(adaptive_binary_search)(x_node, x_node->y_base, key, cmp); y_node = x_node->y_axis[y]; 
y_node->z_axis2[y_node->z_size++] = key; if (y_node->z_size == BSC_Z) { FUNC(split_y_node)(x_node, y, y + 1, cmp); } } ///////////////////////////////////////////////////////////////////////////// //┌───────────────────────────────────────────────────────────────────────┐// //│ ██████┐ ██████┐ ██████┐██████┐ ███████┐ ██████┐ ██████┐ ████████┐ │// //│ ██┌────┘ ██┌──██┐└─██┌─┘██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │// //│ ██│ ███┐██████┌┘ ██│ ██│ ██│███████┐██│ ██│██████┌┘ ██│ │// //│ ██│ ██│██┌──██┐ ██│ ██│ ██│└────██│██│ ██│██┌──██┐ ██│ │// //│ └██████┌┘██│ ██│██████┐██████┌┘███████│└██████┌┘██│ ██│ ██│ │// //│ └─────┘ └─┘ └─┘└─────┘└─────┘ └──────┘ └─────┘ └─┘ └─┘ └─┘ │// //└───────────────────────────────────────────────────────────────────────┘// ///////////////////////////////////////////////////////////////////////////// void FUNC(gridsort)(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) { size_t cnt = nmemb; VAR *pta = (VAR *) array; STRUCT(x_node) *grid = FUNC(create_grid)(pta, cnt, cmp); pta += BSC_Z * 2; cnt -= BSC_Z * 2; while (cnt--) { FUNC(insert_z_node)(grid, *pta++, cmp); } FUNC(destroy_grid)(grid, (VAR *) array, cmp); } ================================================ FILE: src/gridsort.h ================================================ // gridsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com #ifndef GRIDSORT_H #define GRIDSORT_H //#define cmp(a,b) (*(a) > *(b)) #ifndef QUADSORT_H #include "quadsort.h" #endif #include #include #include #include typedef int CMPFUNC (const void *a, const void *b); #define BSC_X 32 #define BSC_Y 2 size_t BSC_Z; ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ █████┐ ██████┐ ██████┐████████┐ │// //│ ██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// //│ └█████┌┘ ██████┌┘ ██│ ██│ │// //│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ └█████┌┘ ██████┌┘██████┐ ██│ │// //│ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// 
////////////////////////////////////////////////////////// #undef VAR #undef FUNC #undef STRUCT #define VAR char #define FUNC(NAME) NAME##8 #define STRUCT(NAME) struct NAME##8 #include "gridsort.c" ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ ▄██┐ █████┐ ██████┐ ██████┐████████┐│// //│ ████│ ██┌───┘ ██┌──██┐└─██┌─┘└──██┌──┘│// //│ └─██│ ██████┐ ██████┌┘ ██│ ██│ │// //│ ██│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ ██████┐└█████┌┘ ██████┌┘██████┐ ██│ │// //│ └─────┘ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #undef VAR #undef FUNC #undef STRUCT #define VAR short #define FUNC(NAME) NAME##16 #define STRUCT(NAME) struct NAME##16 #include "gridsort.c" ////////////////////////////////////////////////////////// // ┌───────────────────────────────────────────────────┐// // │ ██████┐ ██████┐ ██████┐ ██████┐████████┐ │// // │ └────██┐└────██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// // │ █████┌┘ █████┌┘ ██████┌┘ ██│ ██│ │// // │ └───██┐██┌───┘ ██┌──██┐ ██│ ██│ │// // │ ██████┌┘███████┐ ██████┌┘██████┐ ██│ │// // │ └─────┘ └──────┘ └─────┘ └─────┘ └─┘ │// // └───────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #undef VAR #undef FUNC #undef STRUCT #define VAR int #define FUNC(NAME) NAME##32 #define STRUCT(NAME) struct NAME##32 #include "gridsort.c" ////////////////////////////////////////////////////////// // ┌───────────────────────────────────────────────────┐// // │ █████┐ ██┐ ██┐ ██████┐ ██████┐████████┐ │// // │ ██┌───┘ ██│ ██│ ██┌──██┐└─██┌─┘└──██┌──┘ │// // │ ██████┐ ███████│ ██████┌┘ ██│ ██│ │// // │ ██┌──██┐└────██│ ██┌──██┐ ██│ ██│ │// // │ └█████┌┘ ██│ ██████┌┘██████┐ ██│ │// // │ └────┘ └─┘ └─────┘ └─────┘ └─┘ │// // └───────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #undef VAR 
#undef FUNC #undef STRUCT #define VAR long long #define FUNC(NAME) NAME##64 #define STRUCT(NAME) struct NAME##64 #include "gridsort.c" ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ ▄██┐ ██████┐ █████┐ ██████┐ ██████┐████████┐ │// //│ ████│ └────██┐██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// //│ └─██│ █████┌┘└█████┌┘ ██████┌┘ ██│ ██│ │// //│ ██│ ██┌───┘ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ ██████┐███████┐└█████┌┘ ██████┌┘██████┐ ██│ │// //│ └─────┘└──────┘ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #undef VAR #undef FUNC #undef STRUCT #define VAR long double #define FUNC(NAME) NAME##128 #define STRUCT(NAME) struct NAME##128 #include "gridsort.c" ///////////////////////////////////////////////////////////////////////////// //┌───────────────────────────────────────────────────────────────────────┐// //│ ██████┐ ██████┐ ██████┐██████┐ ███████┐ ██████┐ ██████┐ ████████┐ │// //│ ██┌────┘ ██┌──██┐└─██┌─┘██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │// //│ ██│ ███┐██████┌┘ ██│ ██│ ██│███████┐██│ ██│██████┌┘ ██│ │// //│ ██│ ██│██┌──██┐ ██│ ██│ ██│└────██│██│ ██│██┌──██┐ ██│ │// //│ └██████┌┘██│ ██│██████┐██████┌┘███████│└██████┌┘██│ ██│ ██│ │// //│ └─────┘ └─┘ └─┘└─────┘└─────┘ └──────┘ └─────┘ └─┘ └─┘ └─┘ │// //└───────────────────────────────────────────────────────────────────────┘// ///////////////////////////////////////////////////////////////////////////// void gridsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) { if (nmemb < BSC_X * BSC_X) { return quadsort(array, nmemb, size, cmp); } switch (size) { case sizeof(char): return gridsort8(array, nmemb, size, cmp); case sizeof(short): return gridsort16(array, nmemb, size, cmp); case sizeof(int): return gridsort32(array, nmemb, size, cmp); case sizeof(long long): return gridsort64(array, nmemb, size, cmp); case sizeof(long double): return 
gridsort128(array, nmemb, size, cmp); default: assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double)); } } #undef VAR #undef FUNC #undef STRUCT #endif ================================================ FILE: src/quadsort.c ================================================ // quadsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com // the next seven functions are used for sorting 0 to 31 elements void FUNC(parity_swap_four)(VAR *array, CMPFUNC *cmp) { VAR tmp, *pta = array; size_t x; branchless_swap(pta, tmp, x, cmp); pta += 2; branchless_swap(pta, tmp, x, cmp); pta--; if (cmp(pta, pta + 1) > 0) { tmp = pta[0]; pta[0] = pta[1]; pta[1] = tmp; pta--; branchless_swap(pta, tmp, x, cmp); pta += 2; branchless_swap(pta, tmp, x, cmp); pta--; branchless_swap(pta, tmp, x, cmp); } } void FUNC(parity_swap_five)(VAR *array, CMPFUNC *cmp) { VAR tmp, *pta = array; size_t x, y; branchless_swap(pta, tmp, x, cmp); pta += 2; branchless_swap(pta, tmp, x, cmp); pta -= 1; branchless_swap(pta, tmp, x, cmp); pta += 2; branchless_swap(pta, tmp, y, cmp); pta = array; if (x + y) { branchless_swap(pta, tmp, x, cmp); pta += 2; branchless_swap(pta, tmp, x, cmp); pta -= 1; branchless_swap(pta, tmp, x, cmp); pta += 2; branchless_swap(pta, tmp, x, cmp); pta = array; branchless_swap(pta, tmp, x, cmp); pta += 2; branchless_swap(pta, tmp, x, cmp); pta -= 1; } } void FUNC(parity_swap_six)(VAR *array, VAR *swap, CMPFUNC *cmp) { VAR tmp, *pta = array, *ptl, *ptr; size_t x, y; branchless_swap(pta, tmp, x, cmp); pta++; branchless_swap(pta, tmp, x, cmp); pta += 3; branchless_swap(pta, tmp, x, cmp); pta--; branchless_swap(pta, tmp, x, cmp); pta = array; if (cmp(pta + 2, pta + 3) <= 0) { branchless_swap(pta, tmp, x, cmp); pta += 4; branchless_swap(pta, tmp, x, cmp); return; } x = cmp(pta, pta + 1) > 0; y = !x; swap[0] = pta[x]; swap[1] = pta[y]; swap[2] = pta[2]; pta += 4; x = cmp(pta, pta + 1) > 0; y = !x; swap[4] = pta[x]; 
swap[5] = pta[y]; swap[3] = pta[-1]; pta = array; ptl = swap; ptr = swap + 3; head_branchless_merge(pta, x, ptl, ptr, cmp); head_branchless_merge(pta, x, ptl, ptr, cmp); head_branchless_merge(pta, x, ptl, ptr, cmp); pta = array + 5; ptl = swap + 2; ptr = swap + 5; tail_branchless_merge(pta, y, ptl, ptr, cmp); tail_branchless_merge(pta, y, ptl, ptr, cmp); *pta = cmp(ptl, ptr) > 0 ? *ptl : *ptr; } void FUNC(parity_swap_seven)(VAR *array, VAR *swap, CMPFUNC *cmp) { VAR tmp, *pta = array, *ptl, *ptr; size_t x, y; branchless_swap(pta, tmp, x, cmp); pta += 2; branchless_swap(pta, tmp, x, cmp); pta += 2; branchless_swap(pta, tmp, x, cmp); pta -= 3; branchless_swap(pta, tmp, y, cmp); pta += 2; branchless_swap(pta, tmp, x, cmp); pta += 2; y += x; branchless_swap(pta, tmp, x, cmp); pta -= 1; y += x; if (y == 0) return; branchless_swap(pta, tmp, x, cmp); pta = array; x = cmp(pta, pta + 1) > 0; swap[0] = pta[x]; swap[1] = pta[!x]; swap[2] = pta[2]; pta += 3; x = cmp(pta, pta + 1) > 0; swap[3] = pta[x]; swap[4] = pta[!x]; pta += 2; x = cmp(pta, pta + 1) > 0; swap[5] = pta[x]; swap[6] = pta[!x]; pta = array; ptl = swap; ptr = swap + 3; head_branchless_merge(pta, x, ptl, ptr, cmp); head_branchless_merge(pta, x, ptl, ptr, cmp); head_branchless_merge(pta, x, ptl, ptr, cmp); pta = array + 6; ptl = swap + 2; ptr = swap + 6; tail_branchless_merge(pta, y, ptl, ptr, cmp); tail_branchless_merge(pta, y, ptl, ptr, cmp); tail_branchless_merge(pta, y, ptl, ptr, cmp); *pta = cmp(ptl, ptr) > 0 ? 
*ptl : *ptr; } void FUNC(tiny_sort)(VAR *array, VAR *swap, size_t nmemb, CMPFUNC *cmp) { VAR tmp; size_t x; switch (nmemb) { case 0: case 1: return; case 2: branchless_swap(array, tmp, x, cmp); return; case 3: branchless_swap(array, tmp, x, cmp); array++; branchless_swap(array, tmp, x, cmp); array--; branchless_swap(array, tmp, x, cmp); return; case 4: FUNC(parity_swap_four)(array, cmp); return; case 5: FUNC(parity_swap_five)(array, cmp); return; case 6: FUNC(parity_swap_six)(array, swap, cmp); return; case 7: FUNC(parity_swap_seven)(array, swap, cmp); return; } } // left must be equal or one smaller than right void FUNC(parity_merge)(VAR *dest, VAR *from, size_t left, size_t right, CMPFUNC *cmp) { VAR *ptl, *ptr, *tpl, *tpr, *tpd, *ptd; #if !defined __clang__ size_t x, y; #endif ptl = from; ptr = from + left; ptd = dest; tpl = ptr - 1; tpr = tpl + right; tpd = dest + left + right - 1; if (left < right) { *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; } *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; #if !defined cmp && !defined __clang__ // cache limit workaround for gcc if (left > QUAD_CACHE) { while (--left) { *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; *tpd-- = cmp(tpl, tpr) > 0 ? *tpl-- : *tpr--; } } else #endif { while (--left) { head_branchless_merge(ptd, x, ptl, ptr, cmp); tail_branchless_merge(tpd, y, tpl, tpr, cmp); } } *tpd = cmp(tpl, tpr) > 0 ? 
*tpl : *tpr; } void FUNC(tail_swap)(VAR *array, VAR *swap, size_t nmemb, CMPFUNC *cmp) { if (nmemb < 8) { FUNC(tiny_sort)(array, swap, nmemb, cmp); return; } size_t quad1, quad2, quad3, quad4, half1, half2; half1 = nmemb / 2; quad1 = half1 / 2; quad2 = half1 - quad1; half2 = nmemb - half1; quad3 = half2 / 2; quad4 = half2 - quad3; VAR *pta = array; FUNC(tail_swap)(pta, swap, quad1, cmp); pta += quad1; FUNC(tail_swap)(pta, swap, quad2, cmp); pta += quad2; FUNC(tail_swap)(pta, swap, quad3, cmp); pta += quad3; FUNC(tail_swap)(pta, swap, quad4, cmp); if (cmp(array + quad1 - 1, array + quad1) <= 0 && cmp(array + half1 - 1, array + half1) <= 0 && cmp(pta - 1, pta) <= 0) { return; } FUNC(parity_merge)(swap, array, quad1, quad2, cmp); FUNC(parity_merge)(swap + half1, array + half1, quad3, quad4, cmp); FUNC(parity_merge)(array, swap, half1, half2, cmp); } // the next three functions create sorted blocks of 32 elements void FUNC(quad_reversal)(VAR *pta, VAR *ptz) { VAR *ptb, *pty, tmp1, tmp2; size_t loop = (ptz - pta) / 2; ptb = pta + loop; pty = ptz - loop; if (loop % 2 == 0) { tmp2 = *ptb; *ptb-- = *pty; *pty++ = tmp2; loop--; } loop /= 2; do { tmp1 = *pta; *pta++ = *ptz; *ptz-- = tmp1; tmp2 = *ptb; *ptb-- = *pty; *pty++ = tmp2; } while (loop--); } void FUNC(quad_swap_merge)(VAR *array, VAR *swap, CMPFUNC *cmp) { VAR *pts, *ptl, *ptr; #if !defined __clang__ size_t x; #endif parity_merge_two(array + 0, swap + 0, x, ptl, ptr, pts, cmp); parity_merge_two(array + 4, swap + 4, x, ptl, ptr, pts, cmp); parity_merge_four(swap, array, x, ptl, ptr, pts, cmp); } void FUNC(tail_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp); size_t FUNC(quad_swap)(VAR *array, size_t nmemb, CMPFUNC *cmp) { VAR tmp, swap[32]; size_t count; VAR *pta, *pts; unsigned char v1, v2, v3, v4, x; pta = array; count = nmemb / 8; while (count--) { v1 = cmp(pta + 0, pta + 1) > 0; v2 = cmp(pta + 2, pta + 3) > 0; v3 = cmp(pta + 4, pta + 5) > 0; v4 = cmp(pta + 6, pta + 7) > 
0; switch (v1 + v2 * 2 + v3 * 4 + v4 * 8) { case 0: if (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0) { goto ordered; } FUNC(quad_swap_merge)(pta, swap, cmp); break; case 15: if (cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0) { pts = pta; goto reversed; } default: not_ordered: x = !v1; tmp = pta[x]; pta[0] = pta[v1]; pta[1] = tmp; pta += 2; x = !v2; tmp = pta[x]; pta[0] = pta[v2]; pta[1] = tmp; pta += 2; x = !v3; tmp = pta[x]; pta[0] = pta[v3]; pta[1] = tmp; pta += 2; x = !v4; tmp = pta[x]; pta[0] = pta[v4]; pta[1] = tmp; pta -= 6; FUNC(quad_swap_merge)(pta, swap, cmp); } pta += 8; continue; ordered: pta += 8; if (count--) { if ((v1 = cmp(pta + 0, pta + 1) > 0) | (v2 = cmp(pta + 2, pta + 3) > 0) | (v3 = cmp(pta + 4, pta + 5) > 0) | (v4 = cmp(pta + 6, pta + 7) > 0)) { if (v1 + v2 + v3 + v4 == 4 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0) { pts = pta; goto reversed; } goto not_ordered; } if (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0) { goto ordered; } FUNC(quad_swap_merge)(pta, swap, cmp); pta += 8; continue; } break; reversed: pta += 8; if (count--) { if ((v1 = cmp(pta + 0, pta + 1) <= 0) | (v2 = cmp(pta + 2, pta + 3) <= 0) | (v3 = cmp(pta + 4, pta + 5) <= 0) | (v4 = cmp(pta + 6, pta + 7) <= 0)) { // not reversed } else { if (cmp(pta - 1, pta) > 0 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0) { goto reversed; } } FUNC(quad_reversal)(pts, pta - 1); if (v1 + v2 + v3 + v4 == 4 && cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0) { goto ordered; } if (v1 + v2 + v3 + v4 == 0 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0) { pts = pta; goto reversed; } x = !v1; tmp = pta[v1]; pta[0] = pta[x]; pta[1] = tmp; pta += 2; x = !v2; tmp = pta[v2]; pta[0] = pta[x]; pta[1] = tmp; pta += 2; x = !v3; 
tmp = pta[v3]; pta[0] = pta[x]; pta[1] = tmp; pta += 2; x = !v4; tmp = pta[v4]; pta[0] = pta[x]; pta[1] = tmp; pta -= 6; if (cmp(pta + 1, pta + 2) > 0 || cmp(pta + 3, pta + 4) > 0 || cmp(pta + 5, pta + 6) > 0) { FUNC(quad_swap_merge)(pta, swap, cmp); } pta += 8; continue; } switch (nmemb % 8) { case 7: if (cmp(pta + 5, pta + 6) <= 0) break; case 6: if (cmp(pta + 4, pta + 5) <= 0) break; case 5: if (cmp(pta + 3, pta + 4) <= 0) break; case 4: if (cmp(pta + 2, pta + 3) <= 0) break; case 3: if (cmp(pta + 1, pta + 2) <= 0) break; case 2: if (cmp(pta + 0, pta + 1) <= 0) break; case 1: if (cmp(pta - 1, pta + 0) <= 0) break; case 0: FUNC(quad_reversal)(pts, pta + nmemb % 8 - 1); if (pts == array) { return 1; } goto reverse_end; } FUNC(quad_reversal)(pts, pta - 1); break; } FUNC(tail_swap)(pta, swap, nmemb % 8, cmp); reverse_end: pta = array; for (count = nmemb / 32 ; count-- ; pta += 32) { if (cmp(pta + 7, pta + 8) <= 0 && cmp(pta + 15, pta + 16) <= 0 && cmp(pta + 23, pta + 24) <= 0) { continue; } FUNC(parity_merge)(swap, pta, 8, 8, cmp); FUNC(parity_merge)(swap + 16, pta + 16, 8, 8, cmp); FUNC(parity_merge)(pta, swap, 16, 16, cmp); } if (nmemb % 32 > 8) { FUNC(tail_merge)(pta, swap, 32, nmemb % 32, 8, cmp); } return 0; } // The next six functions are quad merge support routines void FUNC(cross_merge)(VAR *dest, VAR *from, size_t left, size_t right, CMPFUNC *cmp) { VAR *ptl, *tpl, *ptr, *tpr, *ptd, *tpd; size_t loop; #if !defined __clang__ size_t x, y; #endif ptl = from; ptr = from + left; tpl = ptr - 1; tpr = tpl + right; if (left + 1 >= right && right >= left && left >= 32) { if (cmp(ptl + 15, ptr) > 0 && cmp(ptl, ptr + 15) <= 0 && cmp(tpl, tpr - 15) > 0 && cmp(tpl - 15, tpr) <= 0) { FUNC(parity_merge)(dest, from, left, right, cmp); return; } } ptd = dest; tpd = dest + left + right - 1; while (1) { if (tpl - ptl > 8) { ptl8_ptr: if (cmp(ptl + 7, ptr) <= 0) { memcpy(ptd, ptl, 8 * sizeof(VAR)); ptd += 8; ptl += 8; if (tpl - ptl > 8) {goto ptl8_ptr;} continue; } tpl8_tpr: 
if (cmp(tpl - 7, tpr) > 0) { tpd -= 7; tpl -= 7; memcpy(tpd--, tpl--, 8 * sizeof(VAR)); if (tpl - ptl > 8) {goto tpl8_tpr;} continue; } } if (tpr - ptr > 8) { ptl_ptr8: if (cmp(ptl, ptr + 7) > 0) { memcpy(ptd, ptr, 8 * sizeof(VAR)); ptd += 8; ptr += 8; if (tpr - ptr > 8) {goto ptl_ptr8;} continue; } tpl_tpr8: if (cmp(tpl, tpr - 7) <= 0) { tpd -= 7; tpr -= 7; memcpy(tpd--, tpr--, 8 * sizeof(VAR)); if (tpr - ptr > 8) {goto tpl_tpr8;} continue; } } if (tpd - ptd < 16) { break; } #if !defined cmp && !defined __clang__ if (left > QUAD_CACHE) { loop = 8; do { *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; *tpd-- = cmp(tpl, tpr) > 0 ? *tpl-- : *tpr--; } while (--loop); } else #endif { loop = 8; do { head_branchless_merge(ptd, x, ptl, ptr, cmp); tail_branchless_merge(tpd, y, tpl, tpr, cmp); } while (--loop); } } while (ptl <= tpl && ptr <= tpr) { *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; } while (ptl <= tpl) { *ptd++ = *ptl++; } while (ptr <= tpr) { *ptd++ = *ptr++; } } void FUNC(quad_merge_block)(VAR *array, VAR *swap, size_t block, CMPFUNC *cmp) { VAR *pt1, *pt2, *pt3; size_t block_x_2 = block * 2; pt1 = array + block; pt2 = pt1 + block; pt3 = pt2 + block; switch ((cmp(pt1 - 1, pt1) <= 0) | (cmp(pt3 - 1, pt3) <= 0) * 2) { case 0: FUNC(cross_merge)(swap, array, block, block, cmp); FUNC(cross_merge)(swap + block_x_2, pt2, block, block, cmp); break; case 1: memcpy(swap, array, block_x_2 * sizeof(VAR)); FUNC(cross_merge)(swap + block_x_2, pt2, block, block, cmp); break; case 2: FUNC(cross_merge)(swap, array, block, block, cmp); memcpy(swap + block_x_2, pt2, block_x_2 * sizeof(VAR)); break; case 3: if (cmp(pt2 - 1, pt2) <= 0) return; memcpy(swap, array, block_x_2 * 2 * sizeof(VAR)); } FUNC(cross_merge)(array, swap, block_x_2, block_x_2, cmp); } size_t FUNC(quad_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp) { VAR *pta, *pte; pte = array + nmemb; block *= 4; while (block <= nmemb && block <= swap_size) { pta = array; do { 
FUNC(quad_merge_block)(pta, swap, block / 4, cmp); pta += block; } while (pta + block <= pte); FUNC(tail_merge)(pta, swap, swap_size, pte - pta, block / 4, cmp); block *= 4; } FUNC(tail_merge)(array, swap, swap_size, nmemb, block / 4, cmp); return block / 2; } void FUNC(partial_forward_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp) { VAR *ptl, *ptr, *tpl, *tpr; size_t x; if (nmemb == block) { return; } ptr = array + block; tpr = array + nmemb - 1; if (cmp(ptr - 1, ptr) <= 0) { return; } memcpy(swap, array, block * sizeof(VAR)); ptl = swap; tpl = swap + block - 1; while (ptl < tpl - 1 && ptr < tpr - 1) { ptr2: if (cmp(ptl, ptr + 1) > 0) { *array++ = *ptr++; *array++ = *ptr++; if (ptr < tpr - 1) {goto ptr2;} break; } if (cmp(ptl + 1, ptr) <= 0) { *array++ = *ptl++; *array++ = *ptl++; if (ptl < tpl - 1) {goto ptl2;} break; } goto cross_swap; ptl2: if (cmp(ptl + 1, ptr) <= 0) { *array++ = *ptl++; *array++ = *ptl++; if (ptl < tpl - 1) {goto ptl2;} break; } if (cmp(ptl, ptr + 1) > 0) { *array++ = *ptr++; *array++ = *ptr++; if (ptr < tpr - 1) {goto ptr2;} break; } cross_swap: x = cmp(ptl, ptr) <= 0; array[x] = *ptr; ptr += 1; array[!x] = *ptl; ptl += 1; array += 2; head_branchless_merge(array, x, ptl, ptr, cmp); } while (ptl <= tpl && ptr <= tpr) { *array++ = cmp(ptl, ptr) <= 0 ? 
*ptl++ : *ptr++; } while (ptl <= tpl) { *array++ = *ptl++; } } void FUNC(partial_backward_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp) { VAR *tpl, *tpa, *tpr; size_t right, loop, x; if (nmemb == block) { return; } tpl = array + block - 1; tpa = array + nmemb - 1; if (cmp(tpl, tpl + 1) <= 0) { return; } right = nmemb - block; if (nmemb <= swap_size && right >= 64) { FUNC(cross_merge)(swap, array, block, right, cmp); memcpy(array, swap, nmemb * sizeof(VAR)); return; } memcpy(swap, array + block, right * sizeof(VAR)); tpr = swap + right - 1; while (tpl > array + 16 && tpr > swap + 16) { tpl_tpr16: if (cmp(tpl, tpr - 15) <= 0) { loop = 16; do *tpa-- = *tpr--; while (--loop); if (tpr > swap + 16) {goto tpl_tpr16;} break; } tpl16_tpr: if (cmp(tpl - 15, tpr) > 0) { loop = 16; do *tpa-- = *tpl--; while (--loop); if (tpl > array + 16) {goto tpl16_tpr;} break; } loop = 8; do { if (cmp(tpl, tpr - 1) <= 0) { *tpa-- = *tpr--; *tpa-- = *tpr--; } else if (cmp(tpl - 1, tpr) > 0) { *tpa-- = *tpl--; *tpa-- = *tpl--; } else { x = cmp(tpl, tpr) <= 0; tpa--; tpa[x] = *tpr; tpr -= 1; tpa[!x] = *tpl; tpl -= 1; tpa--; tail_branchless_merge(tpa, x, tpl, tpr, cmp); } } while (--loop); } while (tpr > swap + 1 && tpl > array + 1) { tpr2: if (cmp(tpl, tpr - 1) <= 0) { *tpa-- = *tpr--; *tpa-- = *tpr--; if (tpr > swap + 1) {goto tpr2;} break; } if (cmp(tpl - 1, tpr) > 0) { *tpa-- = *tpl--; *tpa-- = *tpl--; if (tpl > array + 1) {goto tpl2;} break; } goto cross_swap; tpl2: if (cmp(tpl - 1, tpr) > 0) { *tpa-- = *tpl--; *tpa-- = *tpl--; if (tpl > array + 1) {goto tpl2;} break; } if (cmp(tpl, tpr - 1) <= 0) { *tpa-- = *tpr--; *tpa-- = *tpr--; if (tpr > swap + 1) {goto tpr2;} break; } cross_swap: x = cmp(tpl, tpr) <= 0; tpa--; tpa[x] = *tpr; tpr -= 1; tpa[!x] = *tpl; tpl -= 1; tpa--; tail_branchless_merge(tpa, x, tpl, tpr, cmp); } while (tpr >= swap && tpl >= array) { *tpa-- = cmp(tpl, tpr) > 0 ? 
*tpl-- : *tpr--; } while (tpr >= swap) { *tpa-- = *tpr--; } } void FUNC(tail_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp) { VAR *pta, *pte; pte = array + nmemb; while (block < nmemb && block <= swap_size) { for (pta = array ; pta + block < pte ; pta += block * 2) { if (pta + block * 2 < pte) { FUNC(partial_backward_merge)(pta, swap, swap_size, block * 2, block, cmp); continue; } FUNC(partial_backward_merge)(pta, swap, swap_size, pte - pta, block, cmp); break; } block *= 2; } } // the next four functions provide in-place rotate merge support void FUNC(trinity_rotation)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t left) { VAR temp; size_t bridge, right = nmemb - left; if (swap_size > 65536) { swap_size = 65536; } if (left < right) { if (left <= swap_size) { memcpy(swap, array, left * sizeof(VAR)); memmove(array, array + left, right * sizeof(VAR)); memcpy(array + right, swap, left * sizeof(VAR)); } else { VAR *pta, *ptb, *ptc, *ptd; pta = array; ptb = pta + left; bridge = right - left; if (bridge <= swap_size && bridge > 3) { ptc = pta + right; ptd = ptc + left; memcpy(swap, ptb, bridge * sizeof(VAR)); while (left--) { *--ptc = *--ptd; *ptd = *--ptb; } memcpy(pta, swap, bridge * sizeof(VAR)); } else { ptc = ptb; ptd = ptc + right; bridge = left / 2; while (bridge--) { temp = *--ptb; *ptb = *pta; *pta++ = *ptc; *ptc++ = *--ptd; *ptd = temp; } bridge = (ptd - ptc) / 2; while (bridge--) { temp = *ptc; *ptc++ = *--ptd; *ptd = *pta; *pta++ = temp; } bridge = (ptd - pta) / 2; while (bridge--) { temp = *pta; *pta++ = *--ptd; *ptd = temp; } } } } else if (right < left) { if (right <= swap_size) { memcpy(swap, array + left, right * sizeof(VAR)); memmove(array + right, array, left * sizeof(VAR)); memcpy(array, swap, right * sizeof(VAR)); } else { VAR *pta, *ptb, *ptc, *ptd; pta = array; ptb = pta + left; bridge = left - right; if (bridge <= swap_size && bridge > 3) { ptc = pta + right; ptd = ptc + left; memcpy(swap, 
ptc, bridge * sizeof(VAR)); while (right--) { *ptc++ = *pta; *pta++ = *ptb++; } memcpy(ptd - bridge, swap, bridge * sizeof(VAR)); } else { ptc = ptb; ptd = ptc + right; bridge = right / 2; while (bridge--) { temp = *--ptb; *ptb = *pta; *pta++ = *ptc; *ptc++ = *--ptd; *ptd = temp; } bridge = (ptb - pta) / 2; while (bridge--) { temp = *--ptb; *ptb = *pta; *pta++ = *--ptd; *ptd = temp; } bridge = (ptd - pta) / 2; while (bridge--) { temp = *pta; *pta++ = *--ptd; *ptd = temp; } } } } else { VAR *pta, *ptb; pta = array; ptb = pta + left; while (left--) { temp = *pta; *pta++ = *ptb; *ptb++ = temp; } } } size_t FUNC(monobound_binary_first)(VAR *array, VAR *value, size_t top, CMPFUNC *cmp) { VAR *end; size_t mid; end = array + top; while (top > 1) { mid = top / 2; if (cmp(value, end - mid) <= 0) { end -= mid; } top -= mid; } if (cmp(value, end - 1) <= 0) { end--; } return (end - array); } void FUNC(rotate_merge_block)(VAR *array, VAR *swap, size_t swap_size, size_t lblock, size_t right, CMPFUNC *cmp) { size_t left, rblock, unbalanced; if (cmp(array + lblock - 1, array + lblock) <= 0) { return; } rblock = lblock / 2; lblock -= rblock; left = FUNC(monobound_binary_first)(array + lblock + rblock, array + lblock, right, cmp); right -= left; // [ lblock ] [ rblock ] [ left ] [ right ] if (left) { if (lblock + left <= swap_size) { memcpy(swap, array, lblock * sizeof(VAR)); memcpy(swap + lblock, array + lblock + rblock, left * sizeof(VAR)); memmove(array + lblock + left, array + lblock, rblock * sizeof(VAR)); FUNC(cross_merge)(array, swap, lblock, left, cmp); } else { FUNC(trinity_rotation)(array + lblock, swap, swap_size, rblock + left, rblock); unbalanced = (left * 2 < lblock) | (lblock * 2 < left); if (unbalanced && left <= swap_size) { FUNC(partial_backward_merge)(array, swap, swap_size, lblock + left, lblock, cmp); } else if (unbalanced && lblock <= swap_size) { FUNC(partial_forward_merge)(array, swap, swap_size, lblock + left, lblock, cmp); } else { 
FUNC(rotate_merge_block)(array, swap, swap_size, lblock, left, cmp); } } } if (right) { unbalanced = (right * 2 < rblock) | (rblock * 2 < right); if ((unbalanced && right <= swap_size) || right + rblock <= swap_size) { FUNC(partial_backward_merge)(array + lblock + left, swap, swap_size, rblock + right, rblock, cmp); } else if (unbalanced && rblock <= swap_size) { FUNC(partial_forward_merge)(array + lblock + left, swap, swap_size, rblock + right, rblock, cmp); } else { FUNC(rotate_merge_block)(array + lblock + left, swap, swap_size, rblock, right, cmp); } } } void FUNC(rotate_merge)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, size_t block, CMPFUNC *cmp) { VAR *pta, *pte; pte = array + nmemb; if (nmemb <= block * 2 && nmemb - block <= swap_size) { FUNC(partial_backward_merge)(array, swap, swap_size, nmemb, block, cmp); return; } while (block < nmemb) { for (pta = array ; pta + block < pte ; pta += block * 2) { if (pta + block * 2 < pte) { FUNC(rotate_merge_block)(pta, swap, swap_size, block, block, cmp); continue; } FUNC(rotate_merge_block)(pta, swap, swap_size, block, pte - pta - block, cmp); break; } block *= 2; } } /////////////////////////////////////////////////////////////////////////////// //┌─────────────────────────────────────────────────────────────────────────┐// //│ ██████┐ ██┐ ██┐ █████┐ ██████┐ ███████┐ ██████┐ ██████┐ ████████┐ │// //│ ██┌───██┐██│ ██│██┌──██┐██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │// //│ ██│ ██│██│ ██│███████│██│ ██│███████┐██│ ██│██████┌┘ ██│ │// //│ ██│▄▄ ██│██│ ██│██┌──██│██│ ██│└────██│██│ ██│██┌──██┐ ██│ │// //│ └██████┌┘└██████┌┘██│ ██│██████┌┘███████│└██████┌┘██│ ██│ ██│ │// //│ └──▀▀─┘ └─────┘ └─┘ └─┘└─────┘ └──────┘ └─────┘ └─┘ └─┘ └─┘ │// //└─────────────────────────────────────────────────────────────────────────┘// /////////////////////////////////////////////////////////////////////////////// void FUNC(quadsort)(void *array, size_t nmemb, CMPFUNC *cmp) { VAR *pta = (VAR *) array; if (nmemb < 32) { VAR 
swap[nmemb]; FUNC(tail_swap)(pta, swap, nmemb, cmp); } else if (FUNC(quad_swap)(pta, nmemb, cmp) == 0) { VAR *swap = NULL; size_t block, swap_size = nmemb; if (nmemb > 4194304) for (swap_size = 4194304 ; swap_size * 8 <= nmemb ; swap_size *= 4) {} swap = (VAR *) malloc(swap_size * sizeof(VAR)); if (swap == NULL) { VAR stack[512]; block = FUNC(quad_merge)(pta, stack, 512, nmemb, 32, cmp); FUNC(rotate_merge)(pta, stack, 512, nmemb, block, cmp); return; } block = FUNC(quad_merge)(pta, swap, swap_size, nmemb, 32, cmp); FUNC(rotate_merge)(pta, swap, swap_size, nmemb, block, cmp); free(swap); } } void FUNC(quadsort_swap)(void *array, void *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { VAR *pta = (VAR *) array; VAR *pts = (VAR *) swap; if (nmemb <= 96) { FUNC(tail_swap)(pta, pts, nmemb, cmp); } else if (FUNC(quad_swap)(pta, nmemb, cmp) == 0) { size_t block = FUNC(quad_merge)(pta, pts, swap_size, nmemb, 32, cmp); FUNC(rotate_merge)(pta, pts, swap_size, nmemb, block, cmp); } } ================================================ FILE: src/quadsort.h ================================================ // quadsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com #ifndef QUADSORT_H #define QUADSORT_H #include #include #include #include #include #include //#include typedef int CMPFUNC (const void *a, const void *b); //#define cmp(a,b) (*(a) > *(b)) // When sorting an array of pointers, like a string array, the QUAD_CACHE needs // to be set for proper performance when sorting large arrays. // quadsort_prim() can be used to sort arrays of 32 and 64 bit integers // without a comparison function or cache restrictions. // With a 6 MB L3 cache a value of 262144 works well. 
#ifdef cmp #define QUAD_CACHE 4294967295 #else //#define QUAD_CACHE 131072 #define QUAD_CACHE 262144 //#define QUAD_CACHE 524288 //#define QUAD_CACHE 4294967295 #endif // utilize branchless ternary operations in clang #if !defined __clang__ #define head_branchless_merge(ptd, x, ptl, ptr, cmp) \ x = cmp(ptl, ptr) <= 0; \ *ptd = *ptl; \ ptl += x; \ ptd[x] = *ptr; \ ptr += !x; \ ptd++; #else #define head_branchless_merge(ptd, x, ptl, ptr, cmp) \ *ptd++ = cmp(ptl, ptr) <= 0 ? *ptl++ : *ptr++; #endif #if !defined __clang__ #define tail_branchless_merge(tpd, y, tpl, tpr, cmp) \ y = cmp(tpl, tpr) <= 0; \ *tpd = *tpl; \ tpl -= !y; \ tpd--; \ tpd[y] = *tpr; \ tpr -= y; #else #define tail_branchless_merge(tpd, x, tpl, tpr, cmp) \ *tpd-- = cmp(tpl, tpr) > 0 ? *tpl-- : *tpr--; #endif // guarantee small parity merges are inlined with minimal overhead #define parity_merge_two(array, swap, x, ptl, ptr, pts, cmp) \ ptl = array; ptr = array + 2; pts = swap; \ head_branchless_merge(pts, x, ptl, ptr, cmp); \ *pts = cmp(ptl, ptr) <= 0 ? *ptl : *ptr; \ \ ptl = array + 1; ptr = array + 3; pts = swap + 3; \ tail_branchless_merge(pts, x, ptl, ptr, cmp); \ *pts = cmp(ptl, ptr) > 0 ? *ptl : *ptr; #define parity_merge_four(array, swap, x, ptl, ptr, pts, cmp) \ ptl = array + 0; ptr = array + 4; pts = swap; \ head_branchless_merge(pts, x, ptl, ptr, cmp); \ head_branchless_merge(pts, x, ptl, ptr, cmp); \ head_branchless_merge(pts, x, ptl, ptr, cmp); \ *pts = cmp(ptl, ptr) <= 0 ? *ptl : *ptr; \ \ ptl = array + 3; ptr = array + 7; pts = swap + 7; \ tail_branchless_merge(pts, x, ptl, ptr, cmp); \ tail_branchless_merge(pts, x, ptl, ptr, cmp); \ tail_branchless_merge(pts, x, ptl, ptr, cmp); \ *pts = cmp(ptl, ptr) > 0 ? *ptl : *ptr; #if !defined __clang__ #define branchless_swap(pta, swap, x, cmp) \ x = cmp(pta, pta + 1) > 0; \ swap = pta[!x]; \ pta[0] = pta[x]; \ pta[1] = swap; #else #define branchless_swap(pta, swap, x, cmp) \ x = 0; \ swap = cmp(pta, pta + 1) > 0 ? 
pta[x++] : pta[1]; \ pta[0] = pta[x]; \ pta[1] = swap; #endif #define swap_branchless(pta, swap, x, y, cmp) \ x = cmp(pta, pta + 1) > 0; \ y = !x; \ swap = pta[y]; \ pta[0] = pta[x]; \ pta[1] = swap; ////////////////////////////////////////////////////////// // ┌───────────────────────────────────────────────────┐// // │ ██████┐ ██████┐ ██████┐ ██████┐████████┐ │// // │ └────██┐└────██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// // │ █████┌┘ █████┌┘ ██████┌┘ ██│ ██│ │// // │ └───██┐██┌───┘ ██┌──██┐ ██│ ██│ │// // │ ██████┌┘███████┐ ██████┌┘██████┐ ██│ │// // │ └─────┘ └──────┘ └─────┘ └─────┘ └─┘ │// // └───────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR int #define FUNC(NAME) NAME##32 #include "quadsort.c" #undef VAR #undef FUNC // quadsort_prim #define VAR int #define FUNC(NAME) NAME##_int32 #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #include "quadsort.c" #undef cmp #else #include "quadsort.c" #endif #undef VAR #undef FUNC #define VAR unsigned int #define FUNC(NAME) NAME##_uint32 #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #include "quadsort.c" #undef cmp #else #include "quadsort.c" #endif #undef VAR #undef FUNC ////////////////////////////////////////////////////////// // ┌───────────────────────────────────────────────────┐// // │ █████┐ ██┐ ██┐ ██████┐ ██████┐████████┐ │// // │ ██┌───┘ ██│ ██│ ██┌──██┐└─██┌─┘└──██┌──┘ │// // │ ██████┐ ███████│ ██████┌┘ ██│ ██│ │// // │ ██┌──██┐└────██│ ██┌──██┐ ██│ ██│ │// // │ └█████┌┘ ██│ ██████┌┘██████┐ ██│ │// // │ └────┘ └─┘ └─────┘ └─────┘ └─┘ │// // └───────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR long long #define FUNC(NAME) NAME##64 #include "quadsort.c" #undef VAR #undef FUNC // quadsort_prim #define VAR long long #define FUNC(NAME) NAME##_int64 #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #include "quadsort.c" #undef cmp #else #include "quadsort.c" #endif #undef VAR #undef FUNC 
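The pattern used throughout this header (define VAR and FUNC(NAME), include quadsort.c, then undefine both) stamps out one copy of every routine per element type. The single-file sketch below illustrates the same idea under simplifying assumptions: the repeated #include is replaced by a hypothetical DEFINE_INSERTION_SORT macro, the body is a plain insertion sort rather than quadsort, and the names insertion_sort_32, insertion_sort_64, and sort_by_size are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef int CMPFUNC(const void *a, const void *b);

/* Hypothetical stand-in for one '#include "quadsort.c"' pass: expands to a
   type-specific insertion sort whose name is built by token pasting. */
#define DEFINE_INSERTION_SORT(VAR, SUFFIX) \
static void insertion_sort_##SUFFIX(VAR *array, size_t nmemb, CMPFUNC *cmp) \
{ \
	size_t i, j; \
	VAR key; \
	for (i = 1 ; i < nmemb ; i++) \
	{ \
		key = array[i]; \
		for (j = i ; j > 0 && cmp(array + j - 1, &key) > 0 ; j--) \
		{ \
			array[j] = array[j - 1]; \
		} \
		array[j] = key; \
	} \
}

/* One instantiation per element type, like the VAR / FUNC blocks above. */
DEFINE_INSERTION_SORT(int, 32)
DEFINE_INSERTION_SORT(long long, 64)

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *) a, y = *(const int *) b;

	return (x > y) - (x < y);
}

static int cmp_long_long(const void *a, const void *b)
{
	long long x = *(const long long *) a, y = *(const long long *) b;

	return (x > y) - (x < y);
}

/* Dispatch on element size, in the spirit of the quadsort() wrapper.
   Returns 0 on success and -1 for an unsupported size. */
static int sort_by_size(void *array, size_t nmemb, size_t size, CMPFUNC *cmp)
{
	assert(array != NULL || nmemb == 0);

	switch (size)
	{
		case sizeof(int):
			insertion_sort_32(array, nmemb, cmp);
			return 0;

		case sizeof(long long):
			insertion_sort_64(array, nmemb, cmp);
			return 0;

		default:
			return -1;
	}
}
```

As with the real header, the size switch assumes that int and long long have different sizes; identical sizes would produce duplicate case labels.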
#define VAR unsigned long long #define FUNC(NAME) NAME##_uint64 #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #include "quadsort.c" #undef cmp #else #include "quadsort.c" #endif #undef VAR #undef FUNC // This section is outside of 32/64 bit pointer territory, so no cache checks // necessary, unless sorting 32+ byte structures. #undef QUAD_CACHE #define QUAD_CACHE 4294967295 ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ █████┐ ██████┐ ██████┐████████┐ │// //│ ██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// //│ └█████┌┘ ██████┌┘ ██│ ██│ │// //│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ └█████┌┘ ██████┌┘██████┐ ██│ │// //│ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR char #define FUNC(NAME) NAME##8 #include "quadsort.c" #undef VAR #undef FUNC ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ ▄██┐ █████┐ ██████┐ ██████┐████████┐│// //│ ████│ ██┌───┘ ██┌──██┐└─██┌─┘└──██┌──┘│// //│ └─██│ ██████┐ ██████┌┘ ██│ ██│ │// //│ ██│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ ██████┐└█████┌┘ ██████┌┘██████┐ ██│ │// //│ └─────┘ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR short #define FUNC(NAME) NAME##16 #include "quadsort.c" #undef VAR #undef FUNC ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ ▄██┐ ██████┐ █████┐ ██████┐ ██████┐████████┐ │// //│ ████│ └────██┐██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// //│ └─██│ █████┌┘└█████┌┘ ██████┌┘ ██│ ██│ │// //│ ██│ ██┌───┘ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ ██████┐███████┐└█████┌┘ ██████┌┘██████┐ ██│ │// //│ └─────┘└──────┘ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// 
////////////////////////////////////////////////////////// // 128 reflects the name, though the actual size of a long double is 64, 80, // 96, or 128 bits, depending on platform. #if (DBL_MANT_DIG < LDBL_MANT_DIG) #define VAR long double #define FUNC(NAME) NAME##128 #include "quadsort.c" #undef VAR #undef FUNC #endif /////////////////////////////////////////////////////////// //┌─────────────────────────────────────────────────────┐// //│ ██████┐██┐ ██┐███████┐████████┐ ██████┐ ███┐ ███┐│// //│██┌────┘██│ ██│██┌────┘└──██┌──┘██┌───██┐████┐████││// //│██│ ██│ ██│███████┐ ██│ ██│ ██│██┌███┌██││// //│██│ ██│ ██│└────██│ ██│ ██│ ██│██│└█┌┘██││// //│└██████┐└██████┌┘███████│ ██│ └██████┌┘██│ └┘ ██││// //│ └─────┘ └─────┘ └──────┘ └─┘ └─────┘ └─┘ └─┘│// //└─────────────────────────────────────────────────────┘// /////////////////////////////////////////////////////////// /* typedef struct {char bytes[32];} struct256; #define VAR struct256 #define FUNC(NAME) NAME##256 #include "quadsort.c" #undef VAR #undef FUNC */ /////////////////////////////////////////////////////////////////////////////// //┌─────────────────────────────────────────────────────────────────────────┐// //│ ██████┐ ██┐ ██┐ █████┐ ██████┐ ███████┐ ██████┐ ██████┐ ████████┐ │// //│ ██┌───██┐██│ ██│██┌──██┐██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │// //│ ██│ ██│██│ ██│███████│██│ ██│███████┐██│ ██│██████┌┘ ██│ │// //│ ██│▄▄ ██│██│ ██│██┌──██│██│ ██│└────██│██│ ██│██┌──██┐ ██│ │// //│ └██████┌┘└██████┌┘██│ ██│██████┌┘███████│└██████┌┘██│ ██│ ██│ │// //│ └──▀▀─┘ └─────┘ └─┘ └─┘└─────┘ └──────┘ └─────┘ └─┘ └─┘ └─┘ │// //└─────────────────────────────────────────────────────────────────────────┘// /////////////////////////////////////////////////////////////////////////////// void quadsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) { if (nmemb < 2) { return; } switch (size) { case sizeof(char): quadsort8(array, nmemb, cmp); return; case sizeof(short): quadsort16(array, nmemb, cmp); return; case 
sizeof(int): quadsort32(array, nmemb, cmp); return; case sizeof(long long): quadsort64(array, nmemb, cmp); return; #if (DBL_MANT_DIG < LDBL_MANT_DIG) case sizeof(long double): quadsort128(array, nmemb, cmp); return; #endif // case sizeof(struct256): // quadsort256(array, nmemb, cmp); // return; default: #if (DBL_MANT_DIG < LDBL_MANT_DIG) assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double)); #else assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long)); #endif // qsort(array, nmemb, size, cmp); } } // suggested size values for primitives: // case 0: unsigned char // case 1: signed char // case 2: signed short // case 3: unsigned short // case 4: signed int // case 5: unsigned int // case 6: float // case 7: double // case 8: signed long long // case 9: unsigned long long // case ?: long double, use sizeof(long double): void quadsort_prim(void *array, size_t nmemb, size_t size) { if (nmemb < 2) { return; } switch (size) { case 4: quadsort_int32(array, nmemb, NULL); return; case 5: quadsort_uint32(array, nmemb, NULL); return; case 8: quadsort_int64(array, nmemb, NULL); return; case 9: quadsort_uint64(array, nmemb, NULL); return; default: assert(size == sizeof(int) || size == sizeof(int) + 1 || size == sizeof(long long) || size == sizeof(long long) + 1); return; } } // Sort arrays of structures, the comparison function must be by reference. 
void quadsort_size(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) { char **pti, *pta, *pts; size_t index, offset; if (nmemb < 2) { return; } pta = (char *) array; pti = (char **) malloc(nmemb * sizeof(char *)); assert(pti != NULL); for (index = offset = 0 ; index < nmemb ; index++) { pti[index] = pta + offset; offset += size; } switch (sizeof(size_t)) { case 4: quadsort32(pti, nmemb, cmp); break; case 8: quadsort64(pti, nmemb, cmp); break; } pts = (char *) malloc(nmemb * size); assert(pts != NULL); for (index = 0 ; index < nmemb ; index++) { memcpy(pts, pti[index], size); pts += size; } pts -= nmemb * size; memcpy(array, pts, nmemb * size); free(pti); free(pts); } #undef QUAD_CACHE #endif ================================================ FILE: src/skipsort.c ================================================ // skipsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com void FUNC(skip_partition)(VAR *array, VAR *swap, VAR *ptx, VAR *ptp, size_t nmemb, CMPFUNC *cmp); // Similar to quadsort, but detect both random and reverse order runs int FUNC(skip_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp) { size_t count, span; VAR *pta, *pts; unsigned char v1, v2, v3, v4, x; pta = array; count = nmemb / 8; while (count--) { // granular v1 = cmp(pta + 0, pta + 1) > 0; v2 = cmp(pta + 2, pta + 3) > 0; v3 = cmp(pta + 4, pta + 5) > 0; v4 = cmp(pta + 6, pta + 7) > 0; switch (v1 + v2 * 2 + v3 * 4 + v4 * 8) { case 0: if (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0) { goto ordered; } pts = pta; goto random; case 15: if (cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0) { pts = pta; goto reversed; } default: pts = pta; goto random; } random: // random pta += 8; if (count--) { v1 = cmp(pta + 0, pta + 1) > 0; v2 = cmp(pta + 2, pta + 3) > 0; v3 = cmp(pta + 4, pta + 5) > 0; v4 = cmp(pta + 6, pta + 7) > 0; switch (v1 + v2 * 2 + v3 * 4 + v4 * 8) { case 0: if (cmp(pta + 1, pta + 2) <= 
0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0) { if (count) { pta += 8; if (cmp(pta + 0, pta + 1) <= 0 && cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 2, pta + 3) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 4, pta + 5) <= 0 && cmp(pta + 5, pta + 6) <= 0 && cmp(pta + 6, pta + 7) <= 0) { pta -= 8; break; } count--; } } goto randomc; case 15: if (cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0) { break; } default: randomc: if (count >= 6) { count -= 6; pta += 48; } goto random; } span = (pta - pts); if (span <= 96) { FUNC(tail_swap)(pts, swap, span, cmp); } else { FUNC(flux_partition)(pts, swap, pts, swap + span, span, cmp); } if (v1 | v2 | v3 | v4) { pts = pta; goto reversed; } pta += 8; count--; goto ordered; } span = (pta - pts); if (span <= 96) { FUNC(tail_swap)(pts, swap, span, cmp); break; } if (pts == array) { FUNC(flux_partition)(array, swap, pts, swap + nmemb, nmemb, cmp); return 1; } FUNC(flux_partition)(pts, swap, pts, swap + span, span, cmp); break; ordered: // ordered pta += 8; if (count--) { if ((v1 = cmp(pta + 0, pta + 1) > 0) | (v2 = cmp(pta + 2, pta + 3) > 0) | (v3 = cmp(pta + 4, pta + 5) > 0) | (v4 = cmp(pta + 6, pta + 7) > 0)) { pts = pta; goto random; } if (cmp(pta + 1, pta + 2) <= 0 && cmp(pta + 3, pta + 4) <= 0 && cmp(pta + 5, pta + 6) <= 0) { goto ordered; } FUNC(quad_swap_merge)(pta, swap, cmp); pta += 8; continue; } break; reversed: // reversed pta += 8; if (count--) { if ((v1 = cmp(pta + 0, pta + 1) <= 0) | (v2 = cmp(pta + 2, pta + 3) <= 0) | (v3 = cmp(pta + 4, pta + 5) <= 0) | (v4 = cmp(pta + 6, pta + 7) <= 0)) { not_reversed: x = !v1; swap[0] = pta[v1]; pta[0] = pta[x]; pta[1] = swap[0]; pta += 2; x = !v2; swap[0] = pta[v2]; pta[0] = pta[x]; pta[1] = swap[0]; pta += 2; x = !v3; swap[0] = pta[v3]; pta[0] = pta[x]; pta[1] = swap[0]; pta += 2; x = !v4; swap[0] = pta[v4]; pta[0] = pta[x]; pta[1] = swap[0]; pta -= 6; if (cmp(pta + 1, pta + 2) > 0 || cmp(pta + 3, pta + 4) > 0 || cmp(pta + 5, pta 
+ 6) > 0)
				{
					FUNC(quad_swap_merge)(pta, swap, cmp);
				}
			}
			else
			{
				if (cmp(pta - 1, pta) > 0 && cmp(pta + 1, pta + 2) > 0 && cmp(pta + 3, pta + 4) > 0 && cmp(pta + 5, pta + 6) > 0)
				{
					goto reversed;
				}
				goto not_reversed;
			}
			FUNC(quad_reversal)(pts, pta - 1);

			pta += 8;
			continue;
		}

		switch (nmemb % 8)
		{
			case 7: if (cmp(pta + 5, pta + 6) <= 0) break;
			case 6: if (cmp(pta + 4, pta + 5) <= 0) break;
			case 5: if (cmp(pta + 3, pta + 4) <= 0) break;
			case 4: if (cmp(pta + 2, pta + 3) <= 0) break;
			case 3: if (cmp(pta + 1, pta + 2) <= 0) break;
			case 2: if (cmp(pta + 0, pta + 1) <= 0) break;
			case 1: if (cmp(pta - 1, pta + 0) <= 0) break;
			case 0:
				FUNC(quad_reversal)(pts, pta + nmemb % 8 - 1);

				if (pts == array)
				{
					return 1;
				}
				goto reverse_end;
		}
		FUNC(quad_reversal)(pts, pta - 1);
		break;
	}
	FUNC(tail_swap)(pta, swap, nmemb % 8, cmp);

	reverse_end:

	pta = array;

	for (count = nmemb / 32 ; count-- ; pta += 32)
	{
		if (cmp(pta + 7, pta + 8) <= 0 && cmp(pta + 15, pta + 16) <= 0 && cmp(pta + 23, pta + 24) <= 0)
		{
			continue;
		}
		FUNC(parity_merge)(swap, pta, 8, 8, cmp);
		FUNC(parity_merge)(swap + 16, pta + 16, 8, 8, cmp);
		FUNC(parity_merge)(pta, swap, 16, 16, cmp);
	}

	if (nmemb % 32 > 8)
	{
		FUNC(tail_merge)(pta, swap, 32, nmemb % 32, 8, cmp);
	}
	return 0;
}

void FUNC(skipsort)(void *array, size_t nmemb, CMPFUNC *cmp)
{
	VAR *pta = (VAR *) array;

	if (nmemb <= 96)
	{
		VAR swap[nmemb];

		FUNC(tail_swap)(pta, swap, nmemb, cmp);
	}
	else
	{
		VAR *swap = (VAR *) malloc(nmemb * sizeof(VAR));

		if (swap == NULL)
		{
			FUNC(quadsort)(pta, nmemb, cmp);
			return;
		}

		if (FUNC(skip_analyze)(pta, swap, nmemb, nmemb, cmp) == 0)
		{
			FUNC(quad_merge)(pta, swap, nmemb, nmemb, 32, cmp);
		}
		free(swap);
	}
}

void FUNC(skipsort_swap)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)
{
	if (nmemb <= 96)
	{
		FUNC(tail_swap)(array, swap, nmemb, cmp);
	}
	else if (swap_size < nmemb)
	{
		FUNC(quadsort_swap)(array, swap, swap_size, nmemb, cmp);
	}
	else if (FUNC(skip_analyze)(array, swap, swap_size, nmemb, cmp) == 0)
	{
		/* skip_analyze() returns 0 after merging only up to 32-element
		   blocks, so finish the merge here as skipsort() does. */
		FUNC(quad_merge)(array, swap, swap_size, nmemb, 32, cmp);
	}
}
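The analyzer above inspects eight elements at a time: four pair comparisons produce the flags v1 to v4, and three further comparisons across the pair boundaries decide whether the block is fully ordered or fully reversed. The standalone sketch below shows only that classification step, assuming plain int elements and direct comparisons instead of cmp(); the names block_kind and classify_block are hypothetical and do not appear in the source.

```c
#include <assert.h>
#include <stddef.h>

enum block_kind { BLOCK_ORDERED, BLOCK_REVERSED, BLOCK_RANDOM };

/* Classify 8 consecutive ints. The caller must guarantee that a points at
   no fewer than 8 readable elements. */
static enum block_kind classify_block(const int *a)
{
	unsigned char v1, v2, v3, v4;

	assert(a != NULL);

	/* One flag per adjacent pair: 1 means the pair is out of order. */
	v1 = a[0] > a[1];
	v2 = a[2] > a[3];
	v3 = a[4] > a[5];
	v4 = a[6] > a[7];

	switch (v1 + v2 * 2 + v3 * 4 + v4 * 8)
	{
		case 0:
			/* All pairs ordered: check the three pair boundaries. */
			if (a[1] <= a[2] && a[3] <= a[4] && a[5] <= a[6])
			{
				return BLOCK_ORDERED;
			}
			return BLOCK_RANDOM;

		case 15:
			/* All pairs reversed: the run must be strictly descending,
			   so equal elements never count as reversed. */
			if (a[1] > a[2] && a[3] > a[4] && a[5] > a[6])
			{
				return BLOCK_REVERSED;
			}
			return BLOCK_RANDOM;

		default:
			return BLOCK_RANDOM;
	}
}
```

Requiring a strict descent for the reversed case matches the `> 0` tests in the source and keeps a later reversal from reordering equal elements, which is what a stable sort needs.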
================================================ FILE: src/skipsort.h ================================================ // skipsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com #ifndef SKIPSORT_H #define SKIPSORT_H #include #include #include #include typedef int CMPFUNC (const void *a, const void *b); //#define cmp(a,b) (*(a) > *(b)) #ifndef QUADSORT_H #include "quadsort.h" #endif #ifndef FLUXSORT_H #include "fluxsort.h" #endif // When sorting an array of pointers, like a string array, QUAD_CACHE needs to // be adjusted in quadsort.h for proper performance when sorting large arrays. ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ █████┐ ██████┐ ██████┐████████┐ │// //│ ██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// //│ └█████┌┘ ██████┌┘ ██│ ██│ │// //│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ └█████┌┘ ██████┌┘██████┐ ██│ │// //│ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR char #define FUNC(NAME) NAME##8 #include "skipsort.c" #undef VAR #undef FUNC ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ ▄██┐ █████┐ ██████┐ ██████┐████████┐│// //│ ████│ ██┌───┘ ██┌──██┐└─██┌─┘└──██┌──┘│// //│ └─██│ ██████┐ ██████┌┘ ██│ ██│ │// //│ ██│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ ██████┐└█████┌┘ ██████┌┘██████┐ ██│ │// //│ └─────┘ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR short #define FUNC(NAME) NAME##16 #include "skipsort.c" #undef VAR #undef FUNC ////////////////////////////////////////////////////////// // ┌───────────────────────────────────────────────────┐// // │ ██████┐ ██████┐ ██████┐ ██████┐████████┐ │// // │ └────██┐└────██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// // │ █████┌┘ █████┌┘ ██████┌┘ ██│ ██│ │// // │ 
└───██┐██┌───┘ ██┌──██┐ ██│ ██│ │// // │ ██████┌┘███████┐ ██████┌┘██████┐ ██│ │// // │ └─────┘ └──────┘ └─────┘ └─────┘ └─┘ │// // └───────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR int #define FUNC(NAME) NAME##32 #include "skipsort.c" #undef VAR #undef FUNC #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #define VAR int #define FUNC(NAME) NAME##_int32 #include "skipsort.c" #undef VAR #undef FUNC #undef cmp #endif ////////////////////////////////////////////////////////// // ┌───────────────────────────────────────────────────┐// // │ █████┐ ██┐ ██┐ ██████┐ ██████┐████████┐ │// // │ ██┌───┘ ██│ ██│ ██┌──██┐└─██┌─┘└──██┌──┘ │// // │ ██████┐ ███████│ ██████┌┘ ██│ ██│ │// // │ ██┌──██┐└────██│ ██┌──██┐ ██│ ██│ │// // │ └█████┌┘ ██│ ██████┌┘██████┐ ██│ │// // │ └────┘ └─┘ └─────┘ └─────┘ └─┘ │// // └───────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR long long #define FUNC(NAME) NAME##64 #include "skipsort.c" #undef VAR #undef FUNC #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #define VAR long long #define FUNC(NAME) NAME##_int64 #include "skipsort.c" #undef VAR #undef FUNC #undef cmp #endif ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ ▄██┐ ██████┐ █████┐ ██████┐ ██████┐████████┐ │// //│ ████│ └────██┐██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// //│ └─██│ █████┌┘└█████┌┘ ██████┌┘ ██│ ██│ │// //│ ██│ ██┌───┘ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ ██████┐███████┐└█████┌┘ ██████┌┘██████┐ ██│ │// //│ └─────┘└──────┘ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR long double #define FUNC(NAME) NAME##128 #include "skipsort.c" #undef VAR #undef FUNC //////////////////////////////////////////////////////////////////////// 
//┌──────────────────────────────────────────────────────────────────┐// //│███████┐██┐ ██┐██████┐██████┐ ███████┐ ██████┐ ██████┐ ████████┐ │// //│██┌────┘██│ ██┌┘└─██┌─┘██┌──██┐██┌────┘██┌───██┐██┌──██┐└──██┌──┘ │// //│███████┐█████┌┘ ██│ ██████┌┘███████┐██│ ██│██████┌┘ ██│ │// //│└────██│██┌─██┐ ██│ ██┌───┘ └────██│██│ ██│██┌──██┐ ██│ │// //│███████│██│ ██┐██████┐██│ ███████│└██████┌┘██│ ██│ ██│ │// //│└──────┘└─┘ └─┘└─────┘└─┘ └──────┘ └─────┘ └─┘ └─┘ └─┘ │// //└──────────────────────────────────────────────────────────────────┘// //////////////////////////////////////////////////////////////////////// void skipsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) { if (nmemb < 2) { return; } #ifndef cmp if (cmp == NULL) { switch (size) { case sizeof(int): return skipsort_int32(array, nmemb, cmp); case sizeof(long long): return skipsort_int64(array, nmemb, cmp); } return assert(size == sizeof(int)); } #endif switch (size) { case sizeof(char): return skipsort8(array, nmemb, cmp); case sizeof(short): return skipsort16(array, nmemb, cmp); case sizeof(int): return skipsort32(array, nmemb, cmp); case sizeof(long long): return skipsort64(array, nmemb, cmp); case sizeof(long double): return skipsort128(array, nmemb, cmp); default: return assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double)); } } #endif ================================================ FILE: src/wolfsort.c ================================================ // wolfsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com //#define GODMODE #ifdef GODMODE // inspired by rhsort, technically unstable. 
void FUNC(unstable_count)(VAR *array, size_t nmemb, size_t buckets, VAR min, CMPFUNC *cmp)
{
	VAR *pta;
	size_t index;
	size_t *count = (size_t *) calloc(sizeof(size_t), buckets), loop;

	pta = array;

	for (index = nmemb / 16 ; index ; index--)
	{
		for (loop = 16 ; loop ; loop--)
		{
			count[*pta++ - min]++;
		}
	}
	for (index = nmemb % 16 ; index ; index--)
	{
		count[*pta++ - min]++;
	}

	pta = array;

	for (index = 0 ; index < buckets ; index++)
	{
		for (loop = count[index] ; loop ; loop--)
		{
			*pta++ = index + min;
		}
	}
	free(count);
	return;
}
#endif

inline void FUNC(wolf_unguarded_insert)(VAR *array, size_t offset, size_t nmemb, CMPFUNC *cmp)
{
	VAR key, *pta, *end;
	size_t i, top, x, y;

	for (i = offset ; i < nmemb ; i++)
	{
		pta = end = array + i;

		if (cmp(--pta, end) <= 0)
		{
			continue;
		}

		key = *end;

		if (cmp(array + 1, &key) > 0)
		{
			top = i - 1;

			do
			{
				*end-- = *pta--;
			}
			while (--top);

			*end-- = key;
		}
		else
		{
			do
			{
				*end-- = *pta--;
				*end-- = *pta--;
			}
			while (cmp(pta, &key) > 0);

			end[0] = end[1];
			end[1] = key;
		}
		x = cmp(end, end + 1) > 0;
		y = !x;
		key = end[y];
		end[0] = end[x];
		end[1] = key;
	}
}

void FUNC(wolfsort_swap)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp);

void FUNC(wolf_partition)(VAR *array, VAR *aux, size_t aux_size, size_t nmemb, VAR min, VAR max, CMPFUNC *cmp)
{
	VAR *swap, *pta, *pts, *ptd, range, moduler;
	size_t index, cnt, loop, dmemb, buckets;
	unsigned int *count, limit;

	if (nmemb < 32)
	{
		return FUNC(quadsort)(array, nmemb, cmp);
	}

	range = max - min;

	if (range >> 16 == 0 || (size_t) range <= nmemb / 4)
	{
		buckets = range + 1;
		moduler = 1;
	}
	else
	{
		buckets = nmemb <= 4 * 65536 ?
			nmemb / 4 : 1024;

		for (moduler = 4 ; (size_t) moduler <= range / buckets ; moduler *= 2) {}

		buckets = range / moduler + 1;
	}

	limit = (nmemb / buckets) * 4;

	count = (unsigned int *) calloc(sizeof(int), buckets);
	swap = aux;

	if (limit * buckets > aux_size)
	{
		swap = (VAR *) malloc(limit * buckets * sizeof(VAR));
	}

	if (count == NULL || swap == NULL)
	{
		if (count)
		{
			free(count);
		}
		if (swap != aux)
		{
			free(swap); // release the private buffer if only the calloc failed
		}
		FUNC(fluxsort_swap)(array, aux, aux_size, nmemb, cmp);
		return;
	}

	ptd = pta = array;

	for (loop = nmemb ; loop ; loop--)
	{
		max = *pta++;

		index = (unsigned int) (max - min) / moduler;

		if (count[index] < limit)
		{
			swap[index * limit + count[index]++] = max;
			continue;
		}

		// The element doesn't fit, so we drop it to the main array. Inspired by rhsort.

		*ptd++ = max;
	}

	dmemb = ptd - array;

	if (dmemb)
	{
		ptd = array + nmemb - dmemb;

		memmove(ptd, array, dmemb * sizeof(VAR));
	}

	pta = array;
	pts = swap;

	for (index = 0 ; index < buckets ; index++)
	{
		cnt = count[index];

		if (cnt)
		{
			memcpy(pta, pts, cnt * sizeof(VAR));

			if (moduler > 1)
			{
				FUNC(wolfsort_swap)(pta, swap, limit + pts - swap, cnt, cmp);
			}
			pta += cnt;
		}
		pts += limit;
	}

	if (dmemb)
	{
		FUNC(fluxsort_swap)(ptd, swap, dmemb, dmemb, cmp);

		FUNC(partial_backward_merge)(array, swap, nmemb, nmemb, nmemb - dmemb, cmp);
	}

	if (limit * buckets > aux_size)
	{
		free(swap);
	}
	free(count);
}

void FUNC(wolf_minmax)(VAR *min, VAR *max, VAR *pta, VAR *ptb, VAR *ptc, VAR *ptd, CMPFUNC *cmp)
{
	if (cmp(min, pta) > 0) *min = *pta;
	if (cmp(pta, max) > 0) *max = *pta;
	if (cmp(min, ptb) > 0) *min = *ptb;
	if (cmp(ptb, max) > 0) *max = *ptb;
	if (cmp(min, ptc) > 0) *min = *ptc;
	if (cmp(ptc, max) > 0) *max = *ptc;
	if (cmp(min, ptd) > 0) *min = *ptd;
	if (cmp(ptd, max) > 0) *max = *ptd;
}

void FUNC(wolf_analyze)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)
{
	unsigned char loop, asum, bsum, csum, dsum;
	unsigned int astreaks, bstreaks, cstreaks, dstreaks;
	size_t quad1, quad2, quad3, quad4, half1, half2;
	size_t cnt, abalance, bbalance, cbalance, dbalance;
	VAR min,
		max, *pta, *ptb, *ptc, *ptd;

	half1 = nmemb / 2;
	quad1 = half1 / 2;
	quad2 = half1 - quad1;
	half2 = nmemb - half1;
	quad3 = half2 / 2;
	quad4 = half2 - quad3;

	min = max = array[nmemb - 1];

	pta = array;
	ptb = array + quad1;
	ptc = array + half1;
	ptd = array + half1 + quad3;

	astreaks = bstreaks = cstreaks = dstreaks = 0;
	abalance = bbalance = cbalance = dbalance = 0;

	for (cnt = nmemb ; cnt > 132 ; cnt -= 128)
	{
		for (asum = bsum = csum = dsum = 0, loop = 32 ; loop ; loop--)
		{
			FUNC(wolf_minmax)(&min, &max, pta, ptb, ptc, ptd, cmp);

			asum += cmp(pta, pta + 1) > 0; pta++;
			bsum += cmp(ptb, ptb + 1) > 0; ptb++;
			csum += cmp(ptc, ptc + 1) > 0; ptc++;
			dsum += cmp(ptd, ptd + 1) > 0; ptd++;
		}
		abalance += asum; astreaks += (asum == 0) | (asum == 32);
		bbalance += bsum; bstreaks += (bsum == 0) | (bsum == 32);
		cbalance += csum; cstreaks += (csum == 0) | (csum == 32);
		dbalance += dsum; dstreaks += (dsum == 0) | (dsum == 32);
	}

	for ( ; cnt > 7 ; cnt -= 4)
	{
		FUNC(wolf_minmax)(&min, &max, pta, ptb, ptc, ptd, cmp);

		abalance += cmp(pta, pta + 1) > 0; pta++;
		bbalance += cmp(ptb, ptb + 1) > 0; ptb++;
		cbalance += cmp(ptc, ptc + 1) > 0; ptc++;
		dbalance += cmp(ptd, ptd + 1) > 0; ptd++;
	}

	if (quad1 < quad2)
	{
		if (cmp(&min, ptb) > 0) min = *ptb; else if (cmp(ptb, &max) > 0) max = *ptb;
		bbalance += cmp(ptb, ptb + 1) > 0; ptb++;
	}
	if (quad1 < quad3)
	{
		if (cmp(&min, ptc) > 0) min = *ptc; else if (cmp(ptc, &max) > 0) max = *ptc;
		cbalance += cmp(ptc, ptc + 1) > 0; ptc++;
	}
	if (quad1 < quad4)
	{
		if (cmp(&min, ptd) > 0) min = *ptd; else if (cmp(ptd, &max) > 0) max = *ptd;
		dbalance += cmp(ptd, ptd + 1) > 0; ptd++;
	}
	FUNC(wolf_minmax)(&min, &max, pta, ptb, ptc, ptd, cmp);

	cnt = abalance + bbalance + cbalance + dbalance;

	if (cnt == 0)
	{
		if (cmp(pta, pta + 1) <= 0 && cmp(ptb, ptb + 1) <= 0 && cmp(ptc, ptc + 1) <= 0)
		{
			return;
		}
	}

#ifdef GODMODE
	{
		VAR range = max - min;

		if (range < 65536 || range <= nmemb / 4)
		{
			FUNC(unstable_count)(array, nmemb, range + 1, min, cmp);
			return;
		}
	}
#endif

	asum = quad1 - abalance ==
		1;
	bsum = quad2 - bbalance == 1;
	csum = quad3 - cbalance == 1;
	dsum = quad4 - dbalance == 1;

	if (asum | bsum | csum | dsum)
	{
		unsigned char span1 = (asum && bsum) * (cmp(pta, pta + 1) > 0);
		unsigned char span2 = (bsum && csum) * (cmp(ptb, ptb + 1) > 0);
		unsigned char span3 = (csum && dsum) * (cmp(ptc, ptc + 1) > 0);

		switch (span1 | span2 * 2 | span3 * 4)
		{
			case 0: break;
			case 1: FUNC(quad_reversal)(array, ptb); abalance = bbalance = 0; break;
			case 2: FUNC(quad_reversal)(pta + 1, ptc); bbalance = cbalance = 0; break;
			case 3: FUNC(quad_reversal)(array, ptc); abalance = bbalance = cbalance = 0; break;
			case 4: FUNC(quad_reversal)(ptb + 1, ptd); cbalance = dbalance = 0; break;
			case 5: FUNC(quad_reversal)(array, ptb);
				FUNC(quad_reversal)(ptb + 1, ptd); abalance = bbalance = cbalance = dbalance = 0; break;
			case 6: FUNC(quad_reversal)(pta + 1, ptd); bbalance = cbalance = dbalance = 0; break;
			case 7: FUNC(quad_reversal)(array, ptd); return;
		}

		if (asum && abalance) {FUNC(quad_reversal)(array, pta); abalance = 0;}
		if (bsum && bbalance) {FUNC(quad_reversal)(pta + 1, ptb); bbalance = 0;}
		if (csum && cbalance) {FUNC(quad_reversal)(ptb + 1, ptc); cbalance = 0;}
		if (dsum && dbalance) {FUNC(quad_reversal)(ptc + 1, ptd); dbalance = 0;}
	}

#ifdef cmp
	cnt = nmemb / 256; // switch to quadsort if more than 50% ordered
#else
	cnt = nmemb / 512; // switch to quadsort if more than 25% ordered
#endif
	asum = astreaks > cnt;
	bsum = bstreaks > cnt;
	csum = cstreaks > cnt;
	dsum = dstreaks > cnt;

#ifndef cmp
	if (quad1 > QUAD_CACHE)
	{
		asum = bsum = csum = dsum = 1;
	}
#endif

	switch (asum + bsum * 2 + csum * 4 + dsum * 8)
	{
		case 0:
			FUNC(wolf_partition)(array, swap, swap_size, nmemb, min, max, cmp);
			return;
		case 1:
			if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);
			FUNC(wolf_partition)(pta + 1, swap, swap_size, quad2 + half2, min, max, cmp);
			break;
		case 2:
			FUNC(wolf_partition)(array, swap, swap_size, quad1, min, max, cmp);
			if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size,
				quad2, cmp);
			FUNC(wolf_partition)(ptb + 1, swap, swap_size, half2, min, max, cmp);
			break;
		case 3:
			if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);
			if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);
			FUNC(wolf_partition)(ptb + 1, swap, swap_size, half2, min, max, cmp);
			break;
		case 4:
			FUNC(wolf_partition)(array, swap, swap_size, half1, min, max, cmp);
			if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);
			FUNC(wolf_partition)(ptc + 1, swap, swap_size, quad4, min, max, cmp);
			break;
		case 8:
			FUNC(wolf_partition)(array, swap, swap_size, half1 + quad3, min, max, cmp);
			if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);
			break;
		case 9:
			if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);
			FUNC(wolf_partition)(pta + 1, swap, swap_size, quad2 + quad3, min, max, cmp);
			if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);
			break;
		case 12:
			FUNC(wolf_partition)(array, swap, swap_size, half1, min, max, cmp);
			if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);
			if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);
			break;
		case 5:
		case 6:
		case 7:
		case 10:
		case 11:
		case 13:
		case 14:
		case 15:
			if (asum)
			{
				if (abalance) FUNC(quadsort_swap)(array, swap, swap_size, quad1, cmp);
			}
			else FUNC(wolf_partition)(array, swap, swap_size, quad1, min, max, cmp);
			if (bsum)
			{
				if (bbalance) FUNC(quadsort_swap)(pta + 1, swap, swap_size, quad2, cmp);
			}
			else FUNC(wolf_partition)(pta + 1, swap, swap_size, quad2, min, max, cmp);
			if (csum)
			{
				if (cbalance) FUNC(quadsort_swap)(ptb + 1, swap, swap_size, quad3, cmp);
			}
			else FUNC(wolf_partition)(ptb + 1, swap, swap_size, quad3, min, max, cmp);
			if (dsum)
			{
				if (dbalance) FUNC(quadsort_swap)(ptc + 1, swap, swap_size, quad4, cmp);
			}
			else FUNC(wolf_partition)(ptc + 1, swap, swap_size, quad4, min, max, cmp);
			break;
	}

	if (cmp(pta, pta + 1) <= 0)
	{
		memcpy(swap, array, half1 * sizeof(VAR));

		if (cmp(ptc, ptc + 1) <= 0)
		{
			if (cmp(ptb, ptb + 1) <= 0)
			{
				return;
			}
			memcpy(swap + half1, array + half1, half2 * sizeof(VAR));
		}
		else
		{
			FUNC(cross_merge)(swap + half1, array + half1, quad3, quad4, cmp);
		}
	}
	else
	{
		FUNC(cross_merge)(swap, array, quad1, quad2, cmp);

		if (cmp(ptc, ptc + 1) <= 0)
		{
			memcpy(swap + half1, array + half1, half2 * sizeof(VAR));
		}
		else
		{
			FUNC(cross_merge)(swap + half1, ptb + 1, quad3, quad4, cmp);
		}
	}
	FUNC(cross_merge)(array, swap, half1, half2, cmp);
}

void FUNC(wolfsort)(void *array, size_t nmemb, CMPFUNC *cmp)
{
	VAR *pta = (VAR *) array;

	if (nmemb <= 132)
	{
		FUNC(quadsort)(pta, nmemb, cmp);
	}
	else
	{
		VAR *swap = (VAR *) malloc(nmemb * sizeof(VAR));

		if (swap == NULL)
		{
			FUNC(quadsort)(pta, nmemb, cmp);
			return;
		}
		FUNC(wolf_analyze)(pta, swap, nmemb, nmemb, cmp);

		free(swap);
	}
}

void FUNC(wolfsort_swap)(VAR *array, VAR *swap, size_t swap_size, size_t nmemb, CMPFUNC *cmp)
{
	if (nmemb <= 132)
	{
		FUNC(quadsort_swap)(array, swap, nmemb, nmemb, cmp);
	}
	else
	{
		FUNC(wolf_analyze)(array, swap, swap_size, nmemb, cmp);
	}
}

================================================
FILE: src/wolfsort.h
================================================
// wolfsort 1.2.1.3 - Igor van den Hoven ivdhoven@gmail.com

#ifndef WOLFSORT_H
#define WOLFSORT_H

#include
#include
#include
#include
#include

typedef int CMPFUNC (const void *a, const void *b);

//#define cmp(a,b) (*(a) > *(b))

// When sorting an array of pointers, like a string array, the QUAD_CACHE needs
// to be set for proper performance when sorting large arrays.
// wolfsort_prim() can be used to sort 32 and 64 bit primitives.

// With a 6 MB L3 cache a value of 262144 works well.
#ifdef cmp #define QUAD_CACHE 4294967295 #else //#define QUAD_CACHE 131072 #define QUAD_CACHE 262144 //#define QUAD_CACHE 524288 //#define QUAD_CACHE 4294967295 #endif #ifndef FLUXSORT_H #include "fluxsort.h" #endif ////////////////////////////////////////////////////////// // ┌───────────────────────────────────────────────────┐// // │ ██████┐ ██████┐ ██████┐ ██████┐████████┐ │// // │ └────██┐└────██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// // │ █████┌┘ █████┌┘ ██████┌┘ ██│ ██│ │// // │ └───██┐██┌───┘ ██┌──██┐ ██│ ██│ │// // │ ██████┌┘███████┐ ██████┌┘██████┐ ██│ │// // │ └─────┘ └──────┘ └─────┘ └─────┘ └─┘ │// // └───────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// /* #define VAR int #define FUNC(NAME) NAME##32 #include "wolfsort.c" #undef VAR #undef FUNC */ // wolfsort_prim #define VAR int #define FUNC(NAME) NAME##_int32 #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #include "wolfsort.c" #undef cmp #else #include "wolfsort.c" #endif #undef VAR #undef FUNC #define VAR unsigned int #define FUNC(NAME) NAME##_uint32 #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #include "wolfsort.c" #undef cmp #else #include "wolfsort.c" #endif #undef VAR #undef FUNC ////////////////////////////////////////////////////////// // ┌───────────────────────────────────────────────────┐// // │ █████┐ ██┐ ██┐ ██████┐ ██████┐████████┐ │// // │ ██┌───┘ ██│ ██│ ██┌──██┐└─██┌─┘└──██┌──┘ │// // │ ██████┐ ███████│ ██████┌┘ ██│ ██│ │// // │ ██┌──██┐└────██│ ██┌──██┐ ██│ ██│ │// // │ └█████┌┘ ██│ ██████┌┘██████┐ ██│ │// // │ └────┘ └─┘ └─────┘ └─────┘ └─┘ │// // └───────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// /* #define VAR long long #define FUNC(NAME) NAME##64 #include "wolfsort.c" #undef VAR #undef FUNC */ // wolfsort_prim #define VAR long long #define FUNC(NAME) NAME##_int64 #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #include "wolfsort.c" #undef cmp #else #include "wolfsort.c" #endif 
#undef VAR #undef FUNC #define VAR unsigned long long #define FUNC(NAME) NAME##_uint64 #ifndef cmp #define cmp(a,b) (*(a) > *(b)) #include "wolfsort.c" #undef cmp #else #include "wolfsort.c" #endif #undef VAR #undef FUNC // This section is outside of 32/64 bit pointer territory, so no cache checks // necessary, unless sorting 32+ byte structures. #undef QUAD_CACHE #define QUAD_CACHE 4294967295 ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ █████┐ ██████┐ ██████┐████████┐ │// //│ ██┌──██┐ ██┌──██┐└─██┌─┘└──██┌──┘ │// //│ └█████┌┘ ██████┌┘ ██│ ██│ │// //│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ └█████┌┘ ██████┌┘██████┐ ██│ │// //│ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR char #define FUNC(NAME) NAME##8 #include "wolfsort.c" #undef VAR #undef FUNC ////////////////////////////////////////////////////////// //┌────────────────────────────────────────────────────┐// //│ ▄██┐ █████┐ ██████┐ ██████┐████████┐│// //│ ████│ ██┌───┘ ██┌──██┐└─██┌─┘└──██┌──┘│// //│ └─██│ ██████┐ ██████┌┘ ██│ ██│ │// //│ ██│ ██┌──██┐ ██┌──██┐ ██│ ██│ │// //│ ██████┐└█████┌┘ ██████┌┘██████┐ ██│ │// //│ └─────┘ └────┘ └─────┘ └─────┘ └─┘ │// //└────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////// #define VAR short #define FUNC(NAME) NAME##16 #include "wolfsort.c" #undef VAR #undef FUNC /////////////////////////////////////////////////////////// //┌─────────────────────────────────────────────────────┐// //│ ██████┐██┐ ██┐███████┐████████┐ ██████┐ ███┐ ███┐│// //│██┌────┘██│ ██│██┌────┘└──██┌──┘██┌───██┐████┐████││// //│██│ ██│ ██│███████┐ ██│ ██│ ██│██┌███┌██││// //│██│ ██│ ██│└────██│ ██│ ██│ ██│██│└█┌┘██││// //│└██████┐└██████┌┘███████│ ██│ └██████┌┘██│ └┘ ██││// //│ └─────┘ └─────┘ └──────┘ └─┘ └─────┘ └─┘ └─┘│// 
//└─────────────────────────────────────────────────────┘// /////////////////////////////////////////////////////////// /* typedef struct {char bytes[32];} struct256; #define VAR struct256 #define FUNC(NAME) NAME##256 #include "wolfsort.c" #undef VAR #undef FUNC */ ////////////////////////////////////////////////////////////////////////// //┌─────────────────────────────────────────────────────────────────────┐// //│██┐ ██┐ ██████┐ ██┐ ███████┐███████┐ ██████┐ ██████┐ ████████┐│// //│██│ ██│██┌───██┐██│ ██┌────┘██┌────┘██┌───██┐██┌──██┐└──██┌──┘│// //│██│ █┐ ██│██│ ██│██│ █████┐ ███████┐██│ ██│██████┌┘ ██│ │// //│██│███┐██│██│ ██│██│ ██┌──┘ └────██│██│ ██│██┌──██┐ ██│ │// //│└███┌███┌┘└██████┌┘███████┐██│ ███████│└██████┌┘██│ ██│ ██│ │// //│ └──┘└──┘ └─────┘ └──────┘└─┘ └──────┘ └─────┘ └─┘ └─┘ └─┘ │// //└─────────────────────────────────────────────────────────────────────┘// ////////////////////////////////////////////////////////////////////////// void wolfsort(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) { if (nmemb < 2) { return; } switch (size) { case sizeof(char): wolfsort8(array, nmemb, cmp); return; case sizeof(short): wolfsort16(array, nmemb, cmp); return; case sizeof(int): wolfsort_uint32(array, nmemb, cmp); return; case sizeof(long long): wolfsort_uint64(array, nmemb, cmp); // fluxsort64(array, nmemb, cmp); // fluxsort generally beats wolfsort for 64+ bit types return; case sizeof(long double): fluxsort128(array, nmemb, cmp); return; // case sizeof(struct256): // wolfsort256(array, nmemb, cmp); return; default: assert(size == sizeof(char) || size == sizeof(short) || size == sizeof(int) || size == sizeof(long long) || size == sizeof(long double)); // qsort(array, nmemb, size, cmp); } } // suggested size values for primitives: // case 0: unsigned char // case 1: signed char // case 2: signed short // case 3: unsigned short // case 4: signed int // case 5: unsigned int // case 6: float // case 7: double // case 8: signed long long // case 9: 
unsigned long long
// case 16: long double

void wolfsort_prim(void *array, size_t nmemb, size_t size)
{
	if (nmemb < 2)
	{
		return;
	}

	switch (size)
	{
		case 4:
			fluxsort_int32(array, nmemb, NULL);
			return;
		case 8:
			fluxsort_int64(array, nmemb, NULL);
			return;
		case 5:
			wolfsort_uint32(array, nmemb, NULL);
			return;
		case 9:
			wolfsort_uint64(array, nmemb, NULL);
			return;
		default:
			assert(size == sizeof(int) || size == sizeof(long long) || size == sizeof(int) + 1 || size == sizeof(long long) + 1);
			return;
	}
}

#undef QUAD_CACHE

#endif
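The partitioning step in `wolf_partition` gives every bucket a fixed capacity and drops elements that do not fit, sorting the dropped elements separately and merging them back at the end. The standalone sketch below illustrates that idea only. It assumes `unsigned int` keys, uses the standard `qsort()` where the repository calls its own quadsort/fluxsort routines, and allocates a separate drop buffer instead of reusing the front of the input array. `bucket_drop_sort` and `cmp_uint` are hypothetical names that do not exist in the repository.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() callback; it stands in for the repository's own sort calls. */
static int cmp_uint(const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *) a;
	unsigned int y = *(const unsigned int *) b;

	return (x > y) - (x < y);
}

/* Hypothetical sketch. Returns 0 on success, -1 if an allocation fails. */
int bucket_drop_sort(unsigned int *array, size_t nmemb)
{
	unsigned int min, max, range, *store, *drops;
	unsigned long long divisor;
	size_t buckets, limit, i, b, out = 0, dropped = 0, *count;

	if (nmemb < 2)
	{
		return 0;
	}

	min = max = array[0];

	for (i = 1 ; i < nmemb ; i++)
	{
		if (array[i] < min) min = array[i];
		if (array[i] > max) max = array[i];
	}
	range = max - min;

	/* Aim for about nmemb / 4 buckets and grow the divisor in powers of two. */
	buckets = nmemb / 4 + 1;

	for (divisor = 1 ; range / divisor >= buckets ; divisor *= 2) {}

	buckets = (size_t) (range / divisor) + 1;
	limit = (nmemb / buckets) * 4; /* capacity of each bucket */

	count = calloc(buckets, sizeof(*count));
	store = malloc(buckets * limit * sizeof(*store));
	drops = malloc(nmemb * sizeof(*drops));

	if (count == NULL || store == NULL || drops == NULL)
	{
		free(count); free(store); free(drops);
		return -1;
	}

	for (i = 0 ; i < nmemb ; i++)
	{
		b = (size_t) ((array[i] - min) / divisor);

		if (count[b] < limit)
		{
			store[b * limit + count[b]++] = array[i];
		}
		else
		{
			drops[dropped++] = array[i]; /* bucket is full: drop the element */
		}
	}

	/* Sort each bucket and pack the buckets back to back at the front. */
	for (b = 0 ; b < buckets ; b++)
	{
		qsort(store + b * limit, count[b], sizeof(*store), cmp_uint);
		memcpy(array + out, store + b * limit, count[b] * sizeof(*store));
		out += count[b];
	}

	/* Sort the dropped elements, then merge them in from the back. */
	qsort(drops, dropped, sizeof(*drops), cmp_uint);

	for (i = nmemb ; dropped ; )
	{
		if (out && array[out - 1] > drops[dropped - 1])
		{
			array[--i] = array[--out];
		}
		else
		{
			array[--i] = drops[--dropped];
		}
	}
	free(count); free(store); free(drops);
	return 0;
}
```

The capacity limit bounds the bucket storage at roughly four times the input size, which is why heavily duplicated keys overflow into the drop list instead of forcing a larger allocation. The backward merge works in place because the write position never falls below the read position of the packed bucket data.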