[
  {
    "path": ".travis.yml",
    "content": "language: go\ndist: trusty\nsudo: false\ngo:\n  - \"1.8.x\"\n  - \"1.9.x\"\n  - \"1.10.x\"\n  - master\nbefore_script:  \n  - \"go get -u gopkg.in/alecthomas/gometalinter.v2\"\n  - \"gometalinter.v2 --install\"\nscript:\n  - \"go test -v -cover -benchmem -bench=. $(go list ./... | grep -v /vendor/ | sed \\\"s&_${PWD}&.&\\\")\"\n  - \"gometalinter.v2 --enable-all ./...\"\n"
  },
  {
    "path": "MIT-LICENSE.txt",
    "content": "The MIT License (MIT)\nCopyright © 2014, 2015 Barry Allard\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "**Important**: Zeroth, [consider](https://bdupras.github.io/filter-tutorial/) whether a [Cuckoo filter](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf) could be [right for your use case](https://github.com/seiflotfy/cuckoofilter).\n\n\n[![GoDoc](https://godoc.org/github.com/steakknife/bloomfilter?status.png)](https://godoc.org/github.com/steakknife/bloomfilter) [![travis](https://img.shields.io/travis/steakknife/bloomfilter.svg)](https://travis-ci.org/steakknife/bloomfilter)\n\n# Face-meltingly fast, thread-safe, marshalable, unionable, probability- and optimal-size-calculating Bloom filter in go\n\nCopyright © 2014-2016,2018 Barry Allard\n\n[MIT license](MIT-LICENSE.txt)\n\n## WTF is a Bloom filter\n\n**TL;DR:** A probabilistic, extra lookup table that tracks a set of elements kept elsewhere, so that expensive, unnecessary element retrieval and/or iteration can be skipped **when an element is not present in the set.** It's a classic time-storage tradeoff algorithm.\n\n### Properties\n\n#### [See Wikipedia](https://en.wikipedia.org/wiki/Bloom_filter) for algorithm details\n\n|Impact|What|Description|\n|---|---|---|\n|Good|No false negatives|know for certain when a given element is NOT in the set|\n|Bad|False positives|a positive answer only means the element is *probably* in the set|\n|Bad|Theoretical potential for hash collisions|in very large systems and/or with poor hash.Hash64 implementations|\n|Bad|Add only|Cannot remove an element; clearing its bits would destroy information about other elements|\n|Good|Constant storage|uses a fixed amount of memory, no matter how many elements are added|\n\n## Naming conventions\n\n(Following the standard notation for the algorithm)\n\n|Variable/function|Description|Range|\n|---|---|---|\n|m/M()|number of bits in the Bloom filter (the bit array takes about m/8 bytes of memory)|>=2|\n|n/N()|number of elements added (every `Add()` call counts, including duplicates)|>=0|\n|k/K()|number of keys to use (the keys are private to user code, but are included in marshaled and file I/O output)|>=1|\n|maxN|maximum capacity of intended 
structure|>0|\n|p|maximum allowed false-positive probability (used to compute the optimal m and k)|>0..<1|\n\n- Memory representation should be `64 + 8*(k + (m+63)/64) + unsafe.Sizeof(sync.RWMutex{})` bytes on 64-bit platforms (the struct's two slice headers plus `m` and `n`, plus the key and bit arrays), ignoring allocator overhead.\n- Serialized (`BinaryMarshaler`) representation should be exactly `72 + 8*(k + (m+63)/64)` bytes: a 24-byte header and a 48-byte SHA384 trailer around the keys and bits. (The on-disk format is gzip-compressed, so its size differs; sparsely filled filters compress well.)\n\n## Binary serialization format\n\nAll values are little-endian.\n\n|Offset|Offset (Hex)|Length (bytes)|Name|Type|\n|---|---|---|---|---|\n|0|00|8|k|`uint64`|\n|8|08|8|n|`uint64`|\n|16|10|8|m|`uint64`|\n|24|18|8\\*k|(keys)|`[k]uint64`|\n|24+8\\*k|...|8\\*((m+63)/64)|(Bloom filter bits)|`[(m+63)/64]uint64`|\n|24+8\\*k+8\\*((m+63)/64)|...|48|(SHA384 of all previous fields, hashed in order)|`[48]byte`|\n\n- `bloomfilter.Filter` conforms to `encoding.BinaryMarshaler` and `encoding.BinaryUnmarshaler`.\n\n## Usage\n\n```go\n\nimport \"github.com/steakknife/bloomfilter\"\n\nconst (\n  maxElements = 100000\n  probCollide = 0.0000001 // target false-positive probability\n)\n\nbf, err := bloomfilter.NewOptimal(maxElements, probCollide)\nif err != nil {\n  panic(err)\n}\n\nsomeValue := ... // must conform to hash.Hash64\n\nbf.Add(someValue)\nif bf.Contains(someValue) { // always true: a Bloom filter has no false negatives\n  // whatever\n}\n\nanotherValue := ... // must also conform to hash.Hash64; never added to bf\n\nif bf.Contains(anotherValue) {\n  // false positive: rare (about probCollide once maxElements values are added), but possible\n}\n\n_, err = bf.WriteFile(\"1.bf.gz\") // saves this BF to a gzip-compressed file\nif err != nil {\n  panic(err)\n}\n\nbf2, _, err := bloomfilter.ReadFile(\"1.bf.gz\") // reads the BF into another var\nif err != nil {\n  panic(err)\n}\n_ = bf2 // use bf2 just like bf\n```\n\n\n## Design\n\nWhere possible, branch-free operations are used to avoid deep pipeline / execution unit stalls on branch mispredictions.\n\n## Get\n\n    go get -u github.com/steakknife/bloomfilter  # master is always stable\n\n## Source\n\n- On the web: [https://github.com/steakknife/bloomfilter](https://github.com/steakknife/bloomfilter)\n\n- Git: `git clone https://github.com/steakknife/bloomfilter`\n\n## Contact\n\n- [Feedback](mailto:barry.allard@gmail.com)\n\n- [Issues](https://github.com/steakknife/bloomfilter/issues)\n\n## License\n\n[MIT license](MIT-LICENSE.txt)\n\nCopyright © 2014-2016 Barry Allard\n"
  },
  {
    "path": "binarymarshaler.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"bytes\"\n\t\"crypto/sha512\"\n\t\"encoding/binary\"\n)\n\n// marshal serializes f in this binary layout (little-endian):\n//\n//\t k\t1 uint64\n//\t n\t1 uint64\n//\t m\t1 uint64\n//\t keys\t[k]uint64\n//\t bits\t[(m+63)/64]uint64\n//\t hash\tsha384 (384 bits == 48 bytes)\n//\n//\t size = (3 + k + (m+63)/64) * 8 + 48 bytes\nfunc (f *Filter) marshal() (buf *bytes.Buffer,\n\thash [sha512.Size384]byte,\n\terr error,\n) {\n\tf.lock.RLock()\n\tdefer f.lock.RUnlock()\n\n\tdebug(\"write bf k=%d n=%d m=%d\\n\", f.K(), f.n, f.m)\n\n\tbuf = new(bytes.Buffer)\n\n\terr = binary.Write(buf, binary.LittleEndian, f.K())\n\tif err != nil {\n\t\treturn nil, hash, err\n\t}\n\n\terr = binary.Write(buf, binary.LittleEndian, f.n)\n\tif err != nil {\n\t\treturn nil, hash, err\n\t}\n\n\terr = binary.Write(buf, binary.LittleEndian, f.m)\n\tif err != nil {\n\t\treturn nil, hash, err\n\t}\n\n\terr = binary.Write(buf, binary.LittleEndian, f.keys)\n\tif err != nil {\n\t\treturn nil, hash, err\n\t}\n\n\terr = binary.Write(buf, binary.LittleEndian, f.bits)\n\tif err != nil {\n\t\treturn nil, hash, err\n\t}\n\n\thash = sha512.Sum384(buf.Bytes())\n\terr = binary.Write(buf, binary.LittleEndian, hash)\n\treturn buf, hash, err\n}\n\n// MarshalBinary converts a Filter into []byte\n// (conforms to encoding.BinaryMarshaler)\nfunc (f *Filter) MarshalBinary() (data []byte, err error) {\n\tbuf, hash, err := f.marshal()\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\n\tdebug(\n\t\t\"bloomfilter.MarshalBinary: Successfully wrote %d byte(s), sha384 %v\",\n\t\tbuf.Len(), hash,\n\t)\n\tdata = buf.Bytes()\n\treturn data, nil\n}\n"
  },
  {
    "path": "binaryunmarshaler.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"bytes\"\n\t\"crypto/hmac\"\n\t\"crypto/sha512\"\n\t\"encoding/binary\"\n\t\"io\"\n)\n\nfunc unmarshalBinaryHeader(r io.Reader) (k, n, m uint64, err error) {\n\terr = binary.Read(r, binary.LittleEndian, &k)\n\tif err != nil {\n\t\treturn k, n, m, err\n\t}\n\n\tif k < KMin {\n\t\treturn k, n, m, errK()\n\t}\n\n\terr = binary.Read(r, binary.LittleEndian, &n)\n\tif err != nil {\n\t\treturn k, n, m, err\n\t}\n\n\terr = binary.Read(r, binary.LittleEndian, &m)\n\tif err != nil {\n\t\treturn k, n, m, err\n\t}\n\n\tif m < MMin {\n\t\treturn k, n, m, errM()\n\t}\n\n\tdebug(\"read bf k=%d n=%d m=%d\\n\", k, n, m)\n\n\treturn k, n, m, err\n}\n\nfunc unmarshalBinaryBits(r io.Reader, m uint64) (bits []uint64, err error) {\n\tbits, err = newBits(m)\n\tif err != nil {\n\t\treturn bits, err\n\t}\n\terr = binary.Read(r, binary.LittleEndian, bits)\n\treturn bits, err\n\n}\n\nfunc unmarshalBinaryKeys(r io.Reader, k uint64) (keys []uint64, err error) {\n\tkeys = make([]uint64, k)\n\terr = binary.Read(r, binary.LittleEndian, keys)\n\treturn keys, err\n}\n\nfunc checkBinaryHash(r io.Reader, data []byte) (err error) {\n\texpectedHash := make([]byte, sha512.Size384)\n\terr = binary.Read(r, binary.LittleEndian, expectedHash)\n\tif err != nil {\n\t\treturn err\n\t}\n\n\tactualHash := sha512.Sum384(data[:len(data)-sha512.Size384])\n\n\tif !hmac.Equal(expectedHash, actualHash[:]) {\n\t\tdebug(\"bloomfilter.UnmarshalBinary() sha384 hash failed:\"+\n\t\t\t\" actual %v  expected %v\", actualHash, expectedHash)\n\t\treturn errHash()\n\t}\n\n\tdebug(\"bloomfilter.UnmarshalBinary() successfully read\"+\n\t\t\" %d byte(s), sha384 %v\", len(data), actualHash)\n\treturn 
nil\n}\n\n// UnmarshalBinary converts []byte into a Filter\n// (conforms to encoding.BinaryUnmarshaler)\nfunc (f *Filter) UnmarshalBinary(data []byte) (err error) {\n\tf.lock.Lock()\n\tdefer f.lock.Unlock()\n\n\tbuf := bytes.NewBuffer(data)\n\n\tvar k uint64\n\tk, f.n, f.m, err = unmarshalBinaryHeader(buf)\n\tif err != nil {\n\t\treturn err\n\t}\n\n\tf.keys, err = unmarshalBinaryKeys(buf, k)\n\tif err != nil {\n\t\treturn err\n\t}\n\n\tf.bits, err = unmarshalBinaryBits(buf, f.m)\n\tif err != nil {\n\t\treturn err\n\t}\n\n\treturn checkBinaryHash(buf, data)\n}\n"
  },
  {
    "path": "bloomfilter.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"hash\"\n\t\"sync\"\n)\n\n// Filter is an opaque Bloom filter type\ntype Filter struct {\n\tlock sync.RWMutex\n\tbits []uint64\n\tkeys []uint64\n\tm    uint64 // number of bits the \"bits\" field should recognize\n\tn    uint64 // number of Add() calls performed\n}\n\n// hash derives one value per key from v.Sum64() by XORing the raw\n// 64-bit hash with each key\nfunc (f *Filter) hash(v hash.Hash64) []uint64 {\n\trawHash := v.Sum64()\n\tn := len(f.keys)\n\thashes := make([]uint64, n)\n\tfor i := 0; i < n; i++ {\n\t\thashes[i] = rawHash ^ f.keys[i]\n\t}\n\treturn hashes\n}\n\n// M is the size of the Bloom filter, in bits\nfunc (f *Filter) M() uint64 {\n\treturn f.m\n}\n\n// K is the count of keys\nfunc (f *Filter) K() uint64 {\n\treturn uint64(len(f.keys))\n}\n\n// Add a hashable item, v, to the filter\nfunc (f *Filter) Add(v hash.Hash64) {\n\tf.lock.Lock()\n\tdefer f.lock.Unlock()\n\n\tfor _, i := range f.hash(v) {\n\t\t// f.setBit(i)\n\t\ti %= f.m\n\t\tf.bits[i>>6] |= 1 << uint(i&0x3f)\n\t}\n\tf.n++\n}\n\n// Contains tests if f contains v\n// false: f definitely does not contain value v\n// true:  f maybe contains value v\nfunc (f *Filter) Contains(v hash.Hash64) bool {\n\tf.lock.RLock()\n\tdefer f.lock.RUnlock()\n\n\tr := uint64(1)\n\tfor _, i := range f.hash(v) {\n\t\t// r &= f.getBit(i)\n\t\ti %= f.m\n\t\tr &= (f.bits[i>>6] >> uint(i&0x3f)) & 1\n\t}\n\treturn uint64ToBool(r)\n}\n\n// Copy f to a new Bloom filter\nfunc (f *Filter) Copy() (*Filter, error) {\n\tf.lock.RLock()\n\tdefer f.lock.RUnlock()\n\n\tout, err := f.NewCompatible()\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\tcopy(out.bits, f.bits)\n\tout.n = f.n\n\treturn out, nil\n}\n\n// UnionInPlace merges Bloom filter f2 into f\nfunc (f *Filter) 
UnionInPlace(f2 *Filter) error {\n\tif !f.IsCompatible(f2) {\n\t\treturn errIncompatibleBloomFilters()\n\t}\n\n\tf.lock.Lock()\n\tdefer f.lock.Unlock()\n\n\tfor i, bitword := range f2.bits {\n\t\tf.bits[i] |= bitword\n\t}\n\treturn nil\n}\n\n// Union merges f and f2 into a new Filter out\nfunc (f *Filter) Union(f2 *Filter) (out *Filter, err error) {\n\tif !f.IsCompatible(f2) {\n\t\treturn nil, errIncompatibleBloomFilters()\n\t}\n\n\tf.lock.RLock()\n\tdefer f.lock.RUnlock()\n\n\tout, err = f.NewCompatible()\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\tfor i, bitword := range f2.bits {\n\t\tout.bits[i] = f.bits[i] | bitword\n\t}\n\treturn out, nil\n}\n"
  },
  {
    "path": "bloomfilter_test.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"math/rand\"\n\t\"testing\"\n)\n\n// a read-only type that conforms to hash.Hash64, but only Sum64() works.\n// Its value is set by conversion, e.g. hashableUint64(42); Write() panics.\ntype hashableUint64 uint64\n\nfunc (h hashableUint64) Write([]byte) (int, error) {\n\tpanic(\"Unimplemented\")\n}\n\nfunc (h hashableUint64) Sum([]byte) []byte {\n\tpanic(\"Unimplemented\")\n}\n\nfunc (h hashableUint64) Reset() {\n\tpanic(\"Unimplemented\")\n}\n\nfunc (h hashableUint64) BlockSize() int {\n\tpanic(\"Unimplemented\")\n}\n\nfunc (h hashableUint64) Size() int {\n\tpanic(\"Unimplemented\")\n}\n\nfunc (h hashableUint64) Sum64() uint64 {\n\treturn uint64(h)\n}\n\nfunc hashableUint64Values() []hashableUint64 {\n\treturn []hashableUint64{\n\t\t0,\n\t\t7,\n\t\t0x0c0ffee0,\n\t\t0xdeadbeef,\n\t\t0xffffffff,\n\t}\n}\n\nfunc hashableUint64NotValues() []hashableUint64 {\n\treturn []hashableUint64{\n\t\t1,\n\t\t5,\n\t\t42,\n\t\t0xa5a5a5a5,\n\t\t0xfffffffe,\n\t}\n}\n\nfunc Test0(t *testing.T) {\n\tbf, _ := New(10000, 5)\n\n\tt.Log(\"Filled ratio before adds:\", bf.PreciseFilledRatio())\n\tfor _, x := range hashableUint64Values() {\n\t\tbf.Add(x)\n\t}\n\tt.Log(\"Filled ratio after adds:\", bf.PreciseFilledRatio())\n\n\t// these must all be true: a Bloom filter has no false negatives\n\tfor _, y := range hashableUint64Values() {\n\t\tif bf.Contains(y) {\n\t\t\tt.Log(\"value in set queries: may contain \", y)\n\t\t} else {\n\t\t\tt.Fatal(\"value in set queries: definitely does not contain \", y,\n\t\t\t\t\", but it should\")\n\t\t}\n\t}\n\n\t// these may or may not be true: false positives are possible\n\tfor _, z := range hashableUint64NotValues() {\n\t\tif bf.Contains(z) {\n\t\t\tt.Log(\"value not in set queries: may or may not contain \", z)\n\t\t} 
else {\n\t\t\tt.Log(\"value not in set queries: definitely does not contain \", z,\n\t\t\t\t\" which is correct\")\n\t\t}\n\t}\n}\n\nfunc BenchmarkAddX10kX5(b *testing.B) {\n\tb.StopTimer()\n\tbf, _ := New(10000, 5)\n\tb.StartTimer()\n\tfor i := 0; i < b.N; i++ {\n\t\tbf.Add(hashableUint64(rand.Uint32()))\n\t}\n}\n\nfunc BenchmarkContains1kX10kX5(b *testing.B) {\n\tb.StopTimer()\n\tbf, _ := New(10000, 5)\n\tfor i := 0; i < 1000; i++ {\n\t\tbf.Add(hashableUint64(rand.Uint32()))\n\t}\n\tb.StartTimer()\n\tfor i := 0; i < b.N; i++ {\n\t\tbf.Contains(hashableUint64(rand.Uint32()))\n\t}\n}\n\nfunc BenchmarkContains100kX10BX20(b *testing.B) {\n\tb.StopTimer()\n\tbf, _ := New(10*1000*1000*1000, 20)\n\tfor i := 0; i < 100*1000; i++ {\n\t\tbf.Add(hashableUint64(rand.Uint32()))\n\t}\n\tb.StartTimer()\n\tfor i := 0; i < b.N; i++ {\n\t\tbf.Contains(hashableUint64(rand.Uint32()))\n\t}\n}\n"
  },
  {
    "path": "conformance.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"encoding\"\n\t\"encoding/gob\"\n\t\"io\"\n)\n\n// compile-time conformance tests\nvar (\n\t_ encoding.BinaryMarshaler   = (*Filter)(nil)\n\t_ encoding.BinaryUnmarshaler = (*Filter)(nil)\n\t_ encoding.TextMarshaler     = (*Filter)(nil)\n\t_ encoding.TextUnmarshaler   = (*Filter)(nil)\n\t_ io.ReaderFrom              = (*Filter)(nil)\n\t_ io.WriterTo                = (*Filter)(nil)\n\t_ gob.GobDecoder             = (*Filter)(nil)\n\t_ gob.GobEncoder             = (*Filter)(nil)\n)\n"
  },
  {
    "path": "debug.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"log\"\n\t\"os\"\n)\n\nconst debugVar = \"GOLANG_STEAKKNIFE_BLOOMFILTER_DEBUG\"\n\n// EnableDebugging permits debug() logging of details to stderr\nfunc EnableDebugging() {\n\terr := os.Setenv(debugVar, \"1\")\n\tif err != nil {\n\t\tpanic(\"Unable to Setenv \" + debugVar)\n\t}\n}\n\nfunc debugging() bool {\n\treturn os.Getenv(debugVar) != \"\"\n}\n\n// debug printing when debugging() is true\nfunc debug(format string, a ...interface{}) {\n\tif debugging() {\n\t\tlog.Printf(format, a...)\n\t}\n}\n"
  },
  {
    "path": "errors.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport \"fmt\"\n\nfunc errHash() error {\n\treturn fmt.Errorf(\n\t\t\"hash mismatch: the Bloom filter is probably corrupt\")\n}\nfunc errK() error {\n\treturn fmt.Errorf(\n\t\t\"keys must have length %d or greater\", KMin)\n}\nfunc errM() error {\n\treturn fmt.Errorf(\n\t\t\"m (number of bits in the Bloom filter) must be >= %d\", MMin)\n}\nfunc errUniqueKeys() error {\n\treturn fmt.Errorf(\n\t\t\"Bloom filter keys must be unique\")\n}\nfunc errIncompatibleBloomFilters() error {\n\treturn fmt.Errorf(\n\t\t\"cannot perform union on two incompatible Bloom filters\")\n}\n"
  },
  {
    "path": "fileio.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"compress/gzip\"\n\t\"io\"\n\t\"io/ioutil\"\n\t\"os\"\n)\n\n// ReadFrom reads a lossless-compressed (gzip) Bloom filter from r\n// and overwrites f with it\nfunc (f *Filter) ReadFrom(r io.Reader) (n int64, err error) {\n\tf2, n, err := ReadFrom(r)\n\tif err != nil {\n\t\treturn -1, err\n\t}\n\tf.lock.Lock()\n\tdefer f.lock.Unlock()\n\tf.m = f2.m\n\tf.n = f2.n\n\tf.bits = f2.bits\n\tf.keys = f2.keys\n\treturn n, nil\n}\n\n// ReadFrom reads a lossless-compressed (gzip) Bloom filter from r into\n// a new Filter f; n is the number of uncompressed bytes read\nfunc ReadFrom(r io.Reader) (f *Filter, n int64, err error) {\n\trawR, err := gzip.NewReader(r)\n\tif err != nil {\n\t\treturn nil, -1, err\n\t}\n\tdefer func() {\n\t\t// keep the first error; only report a Close error if nothing else failed\n\t\tif cerr := rawR.Close(); err == nil {\n\t\t\terr = cerr\n\t\t}\n\t}()\n\n\tcontent, err := ioutil.ReadAll(rawR)\n\tif err != nil {\n\t\treturn nil, -1, err\n\t}\n\n\tf = new(Filter)\n\tn = int64(len(content))\n\terr = f.UnmarshalBinary(content)\n\tif err != nil {\n\t\treturn nil, -1, err\n\t}\n\treturn f, n, nil\n}\n\n// ReadFile reads a lossless-compressed (gzip) Bloom filter from filename\n// Suggested file extension: .bf.gz\nfunc ReadFile(filename string) (f *Filter, n int64, err error) {\n\tr, err := os.Open(filename)\n\tif err != nil {\n\t\treturn nil, -1, err\n\t}\n\tdefer func() {\n\t\tif cerr := r.Close(); err == nil {\n\t\t\terr = cerr\n\t\t}\n\t}()\n\n\treturn ReadFrom(r)\n}\n\n// WriteTo writes f to w as a lossless-compressed (gzip) Bloom filter;\n// n is the number of uncompressed bytes written to the gzip stream\nfunc (f *Filter) WriteTo(w io.Writer) (n int64, err error) {\n\t// no f.lock here: MarshalBinary takes the read lock itself, and\n\t// recursive read locking can deadlock with a waiting writer\n\trawW := gzip.NewWriter(w)\n\tdefer func() {\n\t\t// Close flushes the compressed data, so its error must not be lost\n\t\tif cerr := rawW.Close(); err == nil {\n\t\t\terr = cerr\n\t\t}\n\t}()\n\n\tcontent, err := f.MarshalBinary()\n\tif err != nil {\n\t\treturn -1, err\n\t}\n\n\tintN, err := rawW.Write(content)\n\tn = int64(intN)\n\treturn n, err\n}\n\n// WriteFile writes f to filename as a lossless-compressed (gzip) Bloom filter\n// Suggested file extension: .bf.gz\nfunc (f *Filter) WriteFile(filename string) (n int64, err error) {\n\tw, err := os.Create(filename)\n\tif err != nil {\n\t\treturn -1, err\n\t}\n\tdefer func() {\n\t\tif cerr := w.Close(); err == nil {\n\t\t\terr = cerr\n\t\t}\n\t}()\n\n\treturn f.WriteTo(w)\n}\n"
  },
  {
    "path": "fileio_test.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"bytes\"\n\t\"testing\"\n)\n\nfunc TestWriteRead(t *testing.T) {\n\t// minimal filter\n\tf, err := New(2, 1)\n\tif err != nil {\n\t\tt.Fatal(err)\n\t}\n\n\tv := hashableUint64(0)\n\tf.Add(v)\n\n\tvar b bytes.Buffer\n\n\t_, err = f.WriteTo(&b)\n\tif err != nil {\n\t\tt.Fatal(err)\n\t}\n\n\tf2, _, err := ReadFrom(&b)\n\tif err != nil {\n\t\tt.Fatal(err)\n\t}\n\n\tif !f2.Contains(v) {\n\t\tt.Error(\"filter read back does not contain the added value\")\n\t}\n}\n"
  },
  {
    "path": "gob.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport _ \"encoding/gob\" // make sure gob is available\n\n// GobDecode conforms to interface gob.GobDecoder\nfunc (f *Filter) GobDecode(data []byte) error {\n\treturn f.UnmarshalBinary(data)\n}\n\n// GobEncode conforms to interface gob.GobEncoder\nfunc (f *Filter) GobEncode() ([]byte, error) {\n\treturn f.MarshalBinary()\n}\n"
  },
  {
    "path": "iscompatible.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\n// uint64ToBool reports whether x is non-zero. The comparison typically\n// compiles to a flag-setting instruction rather than a branch and, unlike\n// reinterpreting the bytes of x via unsafe, it does not depend on byte order\n// and never produces an invalid bool.\nfunc uint64ToBool(x uint64) bool {\n\treturn x != 0\n}\n\n// returns 0 if equal; the caller must ensure len(b1) >= len(b0)\nfunc noBranchCompareUint64s(b0, b1 []uint64) uint64 {\n\tr := uint64(0)\n\tfor i, b0i := range b0 {\n\t\tr |= b0i ^ b1[i]\n\t}\n\treturn r\n}\n\n// IsCompatible is true if f and f2 can be Union()ed together\nfunc (f *Filter) IsCompatible(f2 *Filter) bool {\n\tf.lock.RLock()\n\tdefer f.lock.RUnlock()\n\n\tf2.lock.RLock()\n\tdefer f2.lock.RUnlock()\n\n\t// 0 means equal, non-0 means different\n\tcompat := f.M() ^ f2.M()\n\tcompat |= f.K() ^ f2.K()\n\tif compat != 0 {\n\t\t// m or k differ, so the keys cannot be compared element-wise\n\t\treturn false\n\t}\n\treturn noBranchCompareUint64s(f.keys, f2.keys) == 0\n}\n"
  },
  {
    "path": "new.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"crypto/rand\"\n\t\"encoding/binary\"\n\t\"log\"\n)\n\nconst (\n\t// MMin is the minimum Bloom filter bits count\n\tMMin = 2\n\t// KMin is the minimum number of keys\n\tKMin = 1\n\t// Uint64Bytes is the number of bytes in type uint64\n\tUint64Bytes = 8\n)\n\n// New Filter with CSPRNG keys\n//\n// m is the size of the Bloom filter, in bits, >= 2\n//\n// k is the number of random keys, >= 1\nfunc New(m, k uint64) (*Filter, error) {\n\treturn NewWithKeys(m, newRandKeys(k))\n}\n\nfunc newRandKeys(k uint64) []uint64 {\n\tkeys := make([]uint64, k)\n\terr := binary.Read(rand.Reader, binary.LittleEndian, keys)\n\tif err != nil {\n\t\tlog.Panicf(\n\t\t\t\"Cannot read %d bytes from CSPRNG crypto/rand.Reader (err=%v)\",\n\t\t\tUint64Bytes*k, err,\n\t\t)\n\t}\n\treturn keys\n}\n\n// NewCompatible Filter compatible with f\nfunc (f *Filter) NewCompatible() (*Filter, error) {\n\treturn NewWithKeys(f.m, f.keys)\n}\n\n// NewOptimal Bloom filter with random CSPRNG keys\nfunc NewOptimal(maxN uint64, p float64) (*Filter, error) {\n\tm := OptimalM(maxN, p)\n\tk := OptimalK(m, maxN)\n\tdebug(\"New optimal Bloom filter ::\"+\n\t\t\" requested max elements (n):%d,\"+\n\t\t\" false positive probability (p):%1.10f \"+\n\t\t\"-> recommends -> bits (m): %d (%f GiB), \"+\n\t\t\"number of keys (k): %d\",\n\t\tmaxN, p, m, float64(m)/(gigabitsPerGiB), k)\n\treturn New(m, k)\n}\n\n// UniqueKeys is true if all keys are unique\n// (compares every pair: O(k^2), which is fine for small k)\nfunc UniqueKeys(keys []uint64) bool {\n\tfor j := 0; j < len(keys); j++ {\n\t\tfor i := j + 1; i < len(keys); i++ {\n\t\t\tif keys[i] == keys[j] {\n\t\t\t\treturn false\n\t\t\t}\n\t\t}\n\t}\n\treturn true\n}\n\n// NewWithKeys creates a new Filter 
from user-supplied origKeys\nfunc NewWithKeys(m uint64, origKeys []uint64) (f *Filter, err error) {\n\tbits, err := newBits(m)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\tkeys, err := newKeysCopy(origKeys)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\treturn &Filter{\n\t\tm:    m,\n\t\tn:    0,\n\t\tbits: bits,\n\t\tkeys: keys,\n\t}, nil\n}\n\nfunc newBits(m uint64) ([]uint64, error) {\n\tif m < MMin {\n\t\treturn nil, errM()\n\t}\n\treturn make([]uint64, (m+63)/64), nil\n}\n\nfunc newKeysBlank(k uint64) ([]uint64, error) {\n\tif k < KMin {\n\t\treturn nil, errK()\n\t}\n\treturn make([]uint64, k), nil\n}\n\nfunc newKeysCopy(origKeys []uint64) (keys []uint64, err error) {\n\tif !UniqueKeys(origKeys) {\n\t\treturn nil, errUniqueKeys()\n\t}\n\tkeys, err = newKeysBlank(uint64(len(origKeys)))\n\tif err != nil {\n\t\treturn keys, err\n\t}\n\tcopy(keys, origKeys)\n\treturn keys, err\n}\n\nfunc newWithKeysAndBits(m uint64, keys []uint64, bits []uint64, n uint64) (\n\tf *Filter, err error,\n) {\n\tf, err = NewWithKeys(m, keys)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\tcopy(f.bits, bits)\n\tf.n = n\n\treturn f, nil\n}\n"
  },
  {
    "path": "optimal.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport \"math\"\n\n// gigabitsPerGiB is the number of bits in one GiB (used when reporting sizes)\nconst gigabitsPerGiB float64 = 8.0 * 1024 * 1024 * 1024\n\n// OptimalK calculates the optimal k value for creating a new Bloom filter\n// m is the number of bits; maxN is the maximum anticipated number of elements\n// optimal k = ceiling( m * ln(2) / maxN )\nfunc OptimalK(m, maxN uint64) uint64 {\n\treturn uint64(math.Ceil(float64(m) * math.Ln2 / float64(maxN)))\n}\n\n// OptimalM calculates the optimal m value for creating a new Bloom filter\n// maxN is the maximum anticipated number of elements\n// p is the desired false positive probability\n// optimal m = ceiling( - maxN * ln(p) / ln(2)**2 )\nfunc OptimalM(maxN uint64, p float64) uint64 {\n\treturn uint64(math.Ceil(-float64(maxN) * math.Log(p) / (math.Ln2 * math.Ln2)))\n}\n"
  },
  {
    "path": "optimal_test.go",
    "content": "package bloomfilter\n\nimport (\n\t\"testing\"\n)\n\nfunc TestOptimal(t *testing.T) {\n\ttests := []struct {\n\t\tn    uint64\n\t\tp    float64\n\t\tk, m uint64\n\t}{\n\t\t{\n\t\t\tn: 1000,\n\t\t\tp: 0.01 / 100,\n\t\t\tk: 14,\n\t\t\tm: 19171,\n\t\t},\n\t\t{\n\t\t\tn: 10000,\n\t\t\tp: 0.01 / 100,\n\t\t\tk: 14,\n\t\t\tm: 191702,\n\t\t},\n\t\t{\n\t\t\tn: 1000,\n\t\t\tp: 0.001 / 100,\n\t\t\tk: 17,\n\t\t\tm: 23963,\n\t\t},\n\t}\n\n\tfor _, test := range tests {\n\t\tm := OptimalM(test.n, test.p)\n\t\tk := OptimalK(m, test.n)\n\n\t\tif k != test.k || m != test.m {\n\t\t\tt.Errorf(\n\t\t\t\t\"n=%d p=%f: expected (m=%d, k=%d), got (m=%d, k=%d)\",\n\t\t\t\ttest.n,\n\t\t\t\ttest.p,\n\t\t\t\ttest.m,\n\t\t\t\ttest.k,\n\t\t\t\tm,\n\t\t\t\tk,\n\t\t\t)\n\t\t}\n\t}\n}\n"
  },
  {
    "path": "statistics.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"math\"\n\n\t\"github.com/steakknife/hamming\"\n)\n\n// PreciseFilledRatio is the fraction of the m bits that are set to 1,\n// computed by an exhaustive popcount\nfunc (f *Filter) PreciseFilledRatio() float64 {\n\tf.lock.RLock()\n\tdefer f.lock.RUnlock()\n\n\treturn float64(hamming.CountBitsUint64s(f.bits)) / float64(f.M())\n}\n\n// N is how many elements have been inserted\n// (more precisely, how many Add()s have been performed, including duplicates)\nfunc (f *Filter) N() uint64 {\n\tf.lock.RLock()\n\tdefer f.lock.RUnlock()\n\n\treturn f.n\n}\n\n// FalsePosititveProbability is the upper-bound probability of false positives\n//  (1 - exp(-k*(n+0.5)/(m-1))) ** k\nfunc (f *Filter) FalsePosititveProbability() float64 {\n\tk := float64(f.K())\n\tn := float64(f.N())\n\tm := float64(f.M())\n\treturn math.Pow(1.0-math.Exp(-k*(n+0.5)/(m-1)), k)\n}\n"
  },
  {
    "path": "textmarshaler.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport \"fmt\"\n\n// MarshalText conforms to encoding.TextMarshaler\nfunc (f *Filter) MarshalText() (text []byte, err error) {\n\tf.lock.RLock()\n\tdefer f.lock.RUnlock()\n\n\ts := fmt.Sprintln(\"k\")\n\ts += fmt.Sprintln(f.K())\n\ts += fmt.Sprintln(\"n\")\n\ts += fmt.Sprintln(f.n)\n\ts += fmt.Sprintln(\"m\")\n\ts += fmt.Sprintln(f.m)\n\n\t// range over values, not indices, for keys, bits and hash bytes\n\ts += fmt.Sprintln(\"keys\")\n\tfor _, key := range f.keys {\n\t\ts += fmt.Sprintf(keyFormat, key) + nl()\n\t}\n\n\ts += fmt.Sprintln(\"bits\")\n\tfor _, w := range f.bits {\n\t\ts += fmt.Sprintf(bitsFormat, w) + nl()\n\t}\n\n\t_, hash, err := f.marshal()\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\ts += fmt.Sprintln(\"sha384\")\n\tfor _, b := range hash {\n\t\ts += fmt.Sprintf(\"%02x\", b)\n\t}\n\ts += nl()\n\n\ttext = []byte(s)\n\treturn text, nil\n}\n"
  },
  {
    "path": "textunmarshaler.go",
    "content": "// Package bloomfilter is face-meltingly fast, thread-safe,\n// marshalable, unionable, probability- and\n// optimal-size-calculating Bloom filter in go\n//\n// https://github.com/steakknife/bloomfilter\n//\n// Copyright © 2014, 2015, 2018 Barry Allard\n//\n// MIT license\n//\npackage bloomfilter\n\nimport (\n\t\"bytes\"\n\t\"crypto/hmac\"\n\t\"crypto/sha512\"\n\t\"fmt\"\n\t\"io\"\n)\n\nconst (\n\tkeyFormat  = \"%016x\"\n\tbitsFormat = \"%016x\"\n)\n\nfunc nl() string {\n\treturn fmt.Sprintln()\n}\n\n// Fscanf requires every newline in the input to be matched by a newline\n// in the format, so each line-oriented format below ends with nl().\n\nfunc unmarshalTextHeader(r io.Reader) (k, n, m uint64, err error) {\n\tformat := \"k\" + nl() + \"%d\" + nl()\n\tformat += \"n\" + nl() + \"%d\" + nl()\n\tformat += \"m\" + nl() + \"%d\" + nl()\n\tformat += \"keys\" + nl()\n\n\t_, err = fmt.Fscanf(r, format, &k, &n, &m)\n\treturn k, n, m, err\n}\n\nfunc unmarshalTextKeys(r io.Reader, keys []uint64) (err error) {\n\tfor i := range keys {\n\t\t_, err = fmt.Fscanf(r, keyFormat+nl(), &keys[i])\n\t\tif err != nil {\n\t\t\treturn err\n\t\t}\n\t}\n\treturn nil\n}\n\nfunc unmarshalTextBits(r io.Reader, bits []uint64) (err error) {\n\t_, err = fmt.Fscanf(r, \"bits\"+nl())\n\tif err != nil {\n\t\treturn err\n\t}\n\n\tfor i := range bits {\n\t\t_, err = fmt.Fscanf(r, bitsFormat+nl(), &bits[i])\n\t\tif err != nil {\n\t\t\treturn err\n\t\t}\n\t}\n\n\treturn nil\n}\n\nfunc unmarshalAndCheckTextHash(r io.Reader, f *Filter) (err error) {\n\t_, err = fmt.Fscanf(r, \"sha384\"+nl())\n\tif err != nil {\n\t\treturn err\n\t}\n\n\t// the hash stored in the text, as 48 two-digit hex bytes\n\tstoredHash := [sha512.Size384]byte{}\n\n\tfor i := range storedHash {\n\t\t_, err = fmt.Fscanf(r, \"%02x\", &storedHash[i])\n\t\tif err != nil {\n\t\t\treturn err\n\t\t}\n\t}\n\n\t// the hash recomputed from the decoded filter\n\t_, computedHash, err := f.marshal()\n\tif err != nil {\n\t\treturn err\n\t}\n\n\tif !hmac.Equal(computedHash[:], storedHash[:]) {\n\t\treturn errHash()\n\t}\n\n\treturn nil\n}\n\n// UnmarshalText decodes text produced by MarshalText into a new Filter\nfunc UnmarshalText(text []byte) (f *Filter, err error) {\n\tr := bytes.NewBuffer(text)\n\tk, n, m, err := 
unmarshalTextHeader(r)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\n\tkeys, err := newKeysBlank(k)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\n\terr = unmarshalTextKeys(r, keys)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\n\tbits, err := newBits(m)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\n\terr = unmarshalTextBits(r, bits)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\n\tf, err = newWithKeysAndBits(m, keys, bits, n)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\n\terr = unmarshalAndCheckTextHash(r, f)\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\n\treturn f, nil\n}\n\n// UnmarshalText overwrites f with data decoded from text\n// (conforms to encoding.TextUnmarshaler)\nfunc (f *Filter) UnmarshalText(text []byte) error {\n\tf2, err := UnmarshalText(text)\n\tif err != nil {\n\t\treturn err\n\t}\n\n\tf.lock.Lock()\n\tdefer f.lock.Unlock()\n\n\t// assign rather than copy(): a zero-value or differently sized f\n\t// would otherwise keep stale or missing bits and keys\n\tf.m = f2.m\n\tf.n = f2.n\n\tf.bits = f2.bits\n\tf.keys = f2.keys\n\n\treturn nil\n}\n"
  }
]