Full Code of xrash/smetrics for AI

Repository: xrash/smetrics
Branch: master
Commit: 55b8f293f342
Files: 19
Total size: 19.5 KB

Directory structure:
gitextract_fu9bok7z/

├── .travis.yml
├── LICENSE
├── README.md
├── doc.go
├── go.mod
├── hamming.go
├── jaro-winkler.go
├── jaro.go
├── soundex.go
├── tests/
│   ├── Makefile
│   ├── hamming_test.go
│   ├── jaro-winkler_test.go
│   ├── jaro_test.go
│   ├── soundex_test.go
│   ├── testcases.go
│   ├── ukkonen_test.go
│   └── wagner-fischer_test.go
├── ukkonen.go
└── wagner-fischer.go

================================================
FILE CONTENTS
================================================

================================================
FILE: .travis.yml
================================================
language: go
go:
    - 1.11
    - 1.12
    - 1.13
    - 1.14.x
    - master
script:
    - cd tests && make


================================================
FILE: LICENSE
================================================
Copyright (C) 2016 Felipe da Cunha Gonçalves
All Rights Reserved.

MIT LICENSE

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


================================================
FILE: README.md
================================================
[![Build Status](https://travis-ci.org/xrash/smetrics.svg?branch=master)](http://travis-ci.org/xrash/smetrics)

# smetrics

`smetrics` is "string metrics".

Package smetrics provides a bunch of algorithms for calculating the distance between strings.

There are implementations for calculating the popular Levenshtein distance (aka Edit Distance or Wagner-Fischer), as well as the Jaro distance, the Jaro-Winkler distance, and more.
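
To make the Jaro and Jaro-Winkler scores concrete, here is a small worked sketch of the arithmetic for `MARTHA` and `MARHTA`. It is not the package's code: `jaroFromCounts` and `winklerBoost` are hypothetical helpers written for this illustration, and the match and transposition counts are supplied by hand instead of being computed from the strings.

```go
package main

import "fmt"

// jaroFromCounts is a hypothetical helper for this example. It applies the
// Jaro formula to already-counted matches (m) and transpositions (t) for
// two strings of lengths la and lb.
func jaroFromCounts(m, t, la, lb float64) float64 {
	if m == 0 {
		return 0
	}
	return (m/la + m/lb + (m-t)/m) / 3
}

// winklerBoost is a hypothetical helper for this example. It adds the prefix
// bonus to a Jaro score j, using a scaling factor of 0.1 per matching
// prefix character.
func winklerBoost(j float64, prefixLen int) float64 {
	return j + 0.1*float64(prefixLen)*(1-j)
}

func main() {
	// MARTHA vs MARHTA: all 6 characters match, and the swapped T/H pair
	// counts as 1 transposition. The common prefix is "MAR" (3 characters).
	j := jaroFromCounts(6, 1, 6, 6)
	fmt.Printf("%.3f\n", j)                  // 0.944
	fmt.Printf("%.3f\n", winklerBoost(j, 3)) // 0.961
}
```

Both values agree with the expected results for this pair in `tests/testcases.go`.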

# How to import

```go
import "github.com/xrash/smetrics"
```

# Documentation

Go to [https://pkg.go.dev/github.com/xrash/smetrics](https://pkg.go.dev/github.com/xrash/smetrics) for complete documentation.

# Example

```go
package main

import (
	"github.com/xrash/smetrics"
)

func main() {
	smetrics.WagnerFischer("POTATO", "POTATTO", 1, 1, 2)
	smetrics.WagnerFischer("MOUSE", "HOUSE", 2, 2, 4)

	smetrics.Ukkonen("POTATO", "POTATTO", 1, 1, 2)
	smetrics.Ukkonen("MOUSE", "HOUSE", 2, 2, 4)

	smetrics.Jaro("AL", "AL")
	smetrics.Jaro("MARTHA", "MARHTA")

	smetrics.JaroWinkler("AL", "AL", 0.7, 4)
	smetrics.JaroWinkler("MARTHA", "MARHTA", 0.7, 4)

	smetrics.Soundex("Euler")
	smetrics.Soundex("Ellery")

	smetrics.Hamming("aaa", "aaa")
	smetrics.Hamming("aaa", "aab")
}
```
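
The calls above discard their results. As a rough illustration of what the three cost arguments (insertion, deletion, substitution) do, the following self-contained sketch computes a weighted edit distance with two rolling rows. `weightedDistance` is a hypothetical stand-in written for this example so that it runs without the package; it compares bytes, not runes.

```go
package main

import "fmt"

// weightedDistance is a hypothetical helper for this example. It computes a
// weighted edit distance between a and b, byte by byte, keeping only the
// previous and the current row of the dynamic-programming table.
func weightedDistance(a, b string, icost, dcost, scost int) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)

	// Turning the empty string into b[:j] takes j insertions.
	for j := 1; j <= len(b); j++ {
		prev[j] = j * icost
	}

	for i := 1; i <= len(a); i++ {
		// Turning a[:i] into the empty string takes i deletions.
		curr[0] = i * dcost

		for j := 1; j <= len(b); j++ {
			if a[i-1] == b[j-1] {
				// Equal bytes cost nothing extra.
				curr[j] = prev[j-1]
				continue
			}

			best := curr[j-1] + icost
			if del := prev[j] + dcost; del < best {
				best = del
			}
			if sub := prev[j-1] + scost; sub < best {
				best = sub
			}
			curr[j] = best
		}

		prev, curr = curr, prev
	}

	// After the final swap, the last computed row is in prev.
	return prev[len(b)]
}

func main() {
	fmt.Println(weightedDistance("POTATO", "POTATTO", 1, 1, 2)) // 1
	fmt.Println(weightedDistance("MOUSE", "HOUSE", 2, 2, 4))    // 4
	fmt.Println(weightedDistance("abc", "xy", 2, 3, 5))         // 13
}
```

With costs 1, 1 and 2, a substitution is priced the same as a deletion followed by an insertion, which is why `MOUSE` and `HOUSE` are 4 apart when all three costs are doubled.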


================================================
FILE: doc.go
================================================
/*
Package smetrics provides a bunch of algorithms for calculating
the distance between strings.

There are implementations for calculating the popular Levenshtein
distance (aka Edit Distance or Wagner-Fischer), as well as the Jaro
distance, the Jaro-Winkler distance, and more.

For the Levenshtein distance, you can use the functions WagnerFischer()
and Ukkonen(). Read the documentation on these functions.

For the Jaro and Jaro-Winkler algorithms, check the functions
Jaro() and JaroWinkler(). Read the documentation on these functions.

For the Soundex algorithm, check the function Soundex().

For the Hamming distance algorithm, check the function Hamming().
*/
package smetrics


================================================
FILE: go.mod
================================================
module github.com/xrash/smetrics

go 1.15


================================================
FILE: hamming.go
================================================
package smetrics

import (
	"fmt"
)

// Hamming computes the Hamming distance between a and b, which is the minimum number of substitutions required to change string a into string b. Both strings must have the same length in bytes. If the lengths differ, the function returns -1 and an error.
func Hamming(a, b string) (int, error) {
	al := len(a)
	bl := len(b)

	if al != bl {
		return -1, fmt.Errorf("strings are not equal (len(a)=%d, len(b)=%d)", al, bl)
	}

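	difference := 0

	// Compare byte by byte. Ranging over the string would step rune by rune
	// and skip the continuation bytes of multi-byte characters.
	for i := 0; i < len(a); i++ {
		if a[i] != b[i] {
			difference++
		}
	}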

	return difference, nil
}


================================================
FILE: jaro-winkler.go
================================================
package smetrics

import (
	"math"
)

// JaroWinkler computes the Jaro-Winkler distance. The result is 1 for equal strings, and 0 for completely different strings. It is commonly used in record linkage, so it tries to be accurate for common typos in real-world names such as person names and street names.
// Jaro-Winkler is a modification of the Jaro algorithm. It works by first running Jaro, then boosting the score when the strings share an exact prefix. The boost is applied only when the Jaro score is above boostThreshold, and at most prefixSize leading bytes are compared. These two parameters are commonly set to 0.7 and 4, respectively.
func JaroWinkler(a, b string, boostThreshold float64, prefixSize int) float64 {
	j := Jaro(a, b)

	if j <= boostThreshold {
		return j
	}

	prefixSize = int(math.Min(float64(len(a)), math.Min(float64(prefixSize), float64(len(b)))))

	var prefixMatch float64
	for i := 0; i < prefixSize; i++ {
		if a[i] == b[i] {
			prefixMatch++
		} else {
			break
		}
	}

	return j + 0.1*prefixMatch*(1.0-j)
}


================================================
FILE: jaro.go
================================================
package smetrics

import (
	"math"
)

// Jaro computes the Jaro distance. The result is 1 for equal strings, and 0 for completely different strings.
func Jaro(a, b string) float64 {
	// If both strings are zero-length, they are completely equal,
	// therefore return 1.
	if len(a) == 0 && len(b) == 0 {
		return 1
	}

	// If one string is zero-length, strings are completely different,
	// therefore return 0.
	if len(a) == 0 || len(b) == 0 {
		return 0
	}

	// Define the necessary variables for the algorithm.
	la := float64(len(a))
	lb := float64(len(b))
	matchRange := int(math.Max(0, math.Floor(math.Max(la, lb)/2.0)-1))
	matchesA := make([]bool, len(a))
	matchesB := make([]bool, len(b))
	var matches float64 = 0

	// Step 1: Matches
	// Loop through each character of the first string,
	// looking for a matching character in the second string.
	for i := 0; i < len(a); i++ {
		start := int(math.Max(0, float64(i-matchRange)))
		end := int(math.Min(lb-1, float64(i+matchRange)))

		for j := start; j <= end; j++ {
			if matchesB[j] {
				continue
			}

			if a[i] == b[j] {
				matchesA[i] = true
				matchesB[j] = true
				matches++
				break
			}
		}
	}

	// If there are no matches, strings are completely different,
	// therefore return 0.
	if matches == 0 {
		return 0
	}

	// Step 2: Transpositions
	// Loop through the matches' arrays, looking for
	// unaligned matches. Count the number of unaligned matches.
	unaligned := 0
	j := 0
	for i := 0; i < len(a); i++ {
		if !matchesA[i] {
			continue
		}

		for !matchesB[j] {
			j++
		}

		if a[i] != b[j] {
			unaligned++
		}

		j++
	}

	// The number of unaligned matches divided by two is the number of transpositions.
	transpositions := math.Floor(float64(unaligned) / 2)

	// The Jaro distance is the average of these three numbers:
	// 1. matches / length of string A
	// 2. matches / length of string B
	// 3. (matches - transpositions) / matches
	// So, their sum divided by three is the final result.
	return ((matches / la) + (matches / lb) + ((matches - transpositions) / matches)) / 3.0
}


================================================
FILE: soundex.go
================================================
package smetrics

import (
	"strings"
)

// Soundex computes the Soundex encoding of s. It is a phonetic algorithm that considers how words sound in English. Soundex maps a string to a 4-byte code consisting of the first letter of the original string and three digits. Strings that sound similar should map to the same code. The string s must not be empty: the function reads s[0] and panics on an empty string.
func Soundex(s string) string {
	b := strings.Builder{}
	b.Grow(4)

	p := s[0]
	if p <= 'z' && p >= 'a' {
		p -= 32 // convert to uppercase
	}
	b.WriteByte(p)

	n := 0
	for i := 1; i < len(s); i++ {
		c := s[i]

		if c <= 'z' && c >= 'a' {
			c -= 32 // convert to uppercase
		} else if c < 'A' || c > 'Z' {
			continue
		}

		if c == p {
			continue
		}

		p = c

		switch c {
		case 'B', 'P', 'F', 'V':
			c = '1'
		case 'C', 'S', 'K', 'G', 'J', 'Q', 'X', 'Z':
			c = '2'
		case 'D', 'T':
			c = '3'
		case 'L':
			c = '4'
		case 'M', 'N':
			c = '5'
		case 'R':
			c = '6'
		default:
			continue
		}

		b.WriteByte(c)
		n++
		if n == 3 {
			break
		}
	}

	for i := n; i < 3; i++ {
		b.WriteByte('0')
	}

	return b.String()
}


================================================
FILE: tests/Makefile
================================================
.PHONY : test
test :
	go test -v

.PHONY : gdb
gdb :
	go test -c -s -N -l
	gdb ./tests.test


================================================
FILE: tests/hamming_test.go
================================================
package tests

import (
	"fmt"
	"github.com/xrash/smetrics"
	"testing"
)

func TestHamming(t *testing.T) {
	cases := []hammingcase{
		{"a", "a", 0},
		{"a", "b", 1},
		{"AAAA", "AABB", 2},
		{"BAAA", "AAAA", 1},
		{"BAAA", "CCCC", 4},
		{"karolin", "kathrin", 3},
		{"karolin", "kerstin", 3},
		{"1011101", "1001001", 2},
		{"2173896", "2233796", 3},
	}

	for _, c := range cases {
		r, err := smetrics.Hamming(c.a, c.b)
		if err != nil {
			t.Fatalf("got error from hamming err=%s", err)
		}
		if r != c.diff {
			fmt.Println(r, "instead of", c.diff)
			t.Fail()
		}
	}
}

func TestHammingError(t *testing.T) {
	res, err := smetrics.Hamming("a", "bbb")
	if err == nil {
		t.Fatalf("expected error from 'a' and 'bbb' on hamming")
	}
	if res != -1 {
		t.Fatalf("erroring response wasn't -1, but %d", res)
	}
}


================================================
FILE: tests/jaro-winkler_test.go
================================================
package tests

import (
	"fmt"
	"github.com/xrash/smetrics"
	"testing"
)

func TestJaroWinkler(t *testing.T) {
	for _, c := range __jaro_winkler_cases {
		r := smetrics.JaroWinkler(c.a, c.b, 0.7, 4)
		result := fmt.Sprintf("%.3f", r)
		expected := fmt.Sprintf("%.3f", c.r)
		if result != expected {
			fmt.Println(c.a, c.b, result, "instead of", expected)
			t.Fail()
		}
	}
}


================================================
FILE: tests/jaro_test.go
================================================
package tests

import (
	"fmt"
	"github.com/xrash/smetrics"
	"testing"
)

func TestJaro(t *testing.T) {
	for _, c := range __jaro_cases {
		r := smetrics.Jaro(c.a, c.b)
		result := fmt.Sprintf("%.3f", r)
		expected := fmt.Sprintf("%.3f", c.r)
		if result != expected {
			fmt.Println(c.a, c.b, result, "instead of", expected)
			t.Fail()
		}
	}
}


================================================
FILE: tests/soundex_test.go
================================================
package tests

import (
	"fmt"
	"github.com/xrash/smetrics"
	"testing"
)

func TestSoundex(t *testing.T) {
	cases := []soundexcase{
		{"Euler", "E460"},
		{"Ellery", "E460"},
		{"Gauss", "G200"},
		{"Ghosh", "G200"},
		{"Hilbert", "H416"},
		{"Heilbrohn", "H416"},
		{"Knuth", "K530"},
		{"Kant", "K530"},
		{"Lloyd", "L300"},
		{"Ladd", "L300"},
		{"Lukasiewicz", "L222"},
		{"Lissjous", "L222"},
		{"Ravi", "R100"},
		{"Ravee", "R100"},
	}

	for _, c := range cases {
		if r := smetrics.Soundex(c.s); r != c.t {
			fmt.Println(r, "instead of", c.t, "for", c.s)
			t.Fail()
		}
	}
}


================================================
FILE: tests/testcases.go
================================================
package tests

type levenshteincase struct {
	s     string
	t     string
	icost int
	dcost int
	scost int
	r     int
}

type soundexcase struct {
	s string
	t string
}

type hammingcase struct {
	a    string
	b    string
	diff int
}

type jarocase struct {
	a string
	b string
	r float64
}

var __jaro_cases = []*jarocase{
	{a: "SHACKLEFORD", b: "SHACKELFORD", r: 0.970},
	{a: "DUNNINGHAM", b: "CUNNIGHAM", r: 0.896},
	{a: "NICHLESON", b: "NICHULSON", r: 0.926},
	{a: "JONES", b: "JOHNSON", r: 0.790},
	{a: "MASSEY", b: "MASSIE", r: 0.889},
	{a: "ABROMS", b: "ABRAMS", r: 0.889},
	{a: "HARDIN", b: "MARTINEZ", r: 0.722},
	{a: "ITMAN", b: "SMITH", r: 0.467},
	{a: "JERALDINE", b: "GERALDINE", r: 0.926},
	{a: "MARHTA", b: "MARTHA", r: 0.944},
	{a: "MICHELLE", b: "MICHAEL", r: 0.869},
	{a: "JULIES", b: "JULIUS", r: 0.889},
	{a: "TANYA", b: "TONYA", r: 0.867},
	{a: "DWAYNE", b: "DUANE", r: 0.822},
	{a: "SEAN", b: "SUSAN", r: 0.783},
	{a: "JON", b: "JOHN", r: 0.917},
	//	{a: "JON", b: "JAN", r: 0.000},
	{a: "BROOKHAVEN", b: "BRROKHAVEN", r: 0.933},
	{a: "BROOK HALLOW", b: "BROOK HLLW", r: 0.944},
	{a: "DECATUR", b: "DECATIR", r: 0.905},
	{a: "FITZRUREITER", b: "FITZENREITER", r: 0.856},
	{a: "HIGBEE", b: "HIGHEE", r: 0.889},
	{a: "HIGBEE", b: "HIGVEE", r: 0.889},
	{a: "LACURA", b: "LOCURA", r: 0.889},
	{a: "IOWA", b: "IONA", r: 0.833},
	//	{a: "1ST", b: "IST", r: 0.000},

	// Equal strings.
	{a: "", b: "", r: 1.000},
	{a: "A", b: "A", r: 1.000},
	{a: "AA", b: "AA", r: 1.000},
	{a: "AAA", b: "AAA", r: 1.000},
	{a: "AAAA", b: "AAAA", r: 1.000},
	{a: "AAAAA", b: "AAAAA", r: 1.000},
	{a: "AAAAAA", b: "AAAAAA", r: 1.000},
	{
		a: "Legend of the Galactic Heroes",
		b: "Legend of the Galactic Heroes",
		r: 1.000,
	},
	{
		a: "Home is the place where, when you have to go there, they have to take you in.",
		b: "Home is the place where, when you have to go there, they have to take you in.",
		r: 1.000,
	},
	{
		a: "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael Gonzaga de Habsburgo-Lorena e Bragança",
		b: "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael Gonzaga de Habsburgo-Lorena e Bragança",
		r: 1.000,
	},
	{
		a: "Et tu, Brute",
		b: "Et tu, Brute",
		r: 1.000,
	},

	// Completely different strings.
	{a: "", b: "A", r: 0.000},
	{a: "", b: "AA", r: 0.000},
	{a: "", b: "AAA", r: 0.000},
	{a: "", b: "AAAA", r: 0.000},
	{a: "", b: "AAAAA", r: 0.000},
	{a: "A", b: "", r: 0.000},
	{a: "AA", b: "", r: 0.000},
	{a: "AAA", b: "", r: 0.000},
	{a: "AAAA", b: "", r: 0.000},
	{a: "AAAAA", b: "", r: 0.000},
	{a: "A", b: "B", r: 0.000},
	{a: "AA", b: "BB", r: 0.000},
	{a: "AAA", b: "BBB", r: 0.000},
	{a: "AAAA", b: "BBBB", r: 0.000},
	{a: "AAAAa", b: "BBBBB", r: 0.000},
}

var __jaro_winkler_cases = []*jarocase{
	{a: "SHACKLEFORD", b: "SHACKELFORD", r: 0.982},
	{a: "DUNNINGHAM", b: "CUNNIGHAM", r: 0.896},
	{a: "NICHLESON", b: "NICHULSON", r: 0.956},
	{a: "JONES", b: "JOHNSON", r: 0.832},
	{a: "MASSEY", b: "MASSIE", r: 0.933},
	{a: "ABROMS", b: "ABRAMS", r: 0.922},
	{a: "HARDIN", b: "MARTINEZ", r: 0.722},
	{a: "ITMAN", b: "SMITH", r: 0.467},
	{a: "JERALDINE", b: "GERALDINE", r: 0.926},
	{a: "MARHTA", b: "MARTHA", r: 0.961},
	{a: "MICHELLE", b: "MICHAEL", r: 0.921},
	{a: "JULIES", b: "JULIUS", r: 0.933},
	{a: "TANYA", b: "TONYA", r: 0.880},
	{a: "DWAYNE", b: "DUANE", r: 0.840},
	{a: "SEAN", b: "SUSAN", r: 0.805},
	{a: "JON", b: "JOHN", r: 0.933},
	//	{a: "JON", b: "JAN", r: 0.000},
	{a: "BROOKHAVEN", b: "BRROKHAVEN", r: 0.947},
	{a: "BROOK HALLOW", b: "BROOK HLLW", r: 0.967},
	{a: "DECATUR", b: "DECATIR", r: 0.943},
	{a: "FITZRUREITER", b: "FITZENREITER", r: 0.913},
	{a: "HIGBEE", b: "HIGHEE", r: 0.922},
	{a: "HIGBEE", b: "HIGVEE", r: 0.922},
	{a: "LACURA", b: "LOCURA", r: 0.900},
	{a: "IOWA", b: "IONA", r: 0.867},
	//	{a: "1ST", b: "IST", r: 0.000},
	{a: "w", b: "w", r: 1.000},

	// Equal strings.
	{a: "", b: "", r: 1.000},
	{a: "A", b: "A", r: 1.000},
	{a: "AA", b: "AA", r: 1.000},
	{a: "AAA", b: "AAA", r: 1.000},
	{a: "AAAA", b: "AAAA", r: 1.000},
	{a: "AAAAA", b: "AAAAA", r: 1.000},
	{a: "AAAAAA", b: "AAAAAA", r: 1.000},
	{
		a: "Legend of the Galactic Heroes",
		b: "Legend of the Galactic Heroes",
		r: 1.000,
	},
	{
		a: "Home is the place where, when you have to go there, they have to take you in.",
		b: "Home is the place where, when you have to go there, they have to take you in.",
		r: 1.000,
	},
	{
		a: "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael Gonzaga de Habsburgo-Lorena e Bragança",
		b: "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael Gonzaga de Habsburgo-Lorena e Bragança",
		r: 1.000,
	},
	{
		a: "Et tu, Brute",
		b: "Et tu, Brute",
		r: 1.000,
	},

	// Completely different strings.
	{a: "", b: "A", r: 0.000},
	{a: "", b: "AA", r: 0.000},
	{a: "", b: "AAA", r: 0.000},
	{a: "", b: "AAAA", r: 0.000},
	{a: "", b: "AAAAA", r: 0.000},
	{a: "A", b: "", r: 0.000},
	{a: "AA", b: "", r: 0.000},
	{a: "AAA", b: "", r: 0.000},
	{a: "AAAA", b: "", r: 0.000},
	{a: "AAAAA", b: "", r: 0.000},
	{a: "A", b: "B", r: 0.000},
	{a: "AA", b: "BB", r: 0.000},
	{a: "AAA", b: "BBB", r: 0.000},
	{a: "AAAA", b: "BBBB", r: 0.000},
	{a: "AAAAa", b: "BBBBB", r: 0.000},
}


================================================
FILE: tests/ukkonen_test.go
================================================
package tests

import (
	"fmt"
	"github.com/xrash/smetrics"
	"testing"
)

func TestUkkonen(t *testing.T) {
	cases := []levenshteincase{
		{"RASH", "RASH", 1, 1, 2, 0},
		{"POTATO", "POTTATO", 1, 1, 2, 1},
		{"POTTATO", "POTATO", 1, 1, 2, 1},
		{"HOUSE", "MOUSE", 1, 1, 2, 2},
		{"MOUSE", "HOUSE", 2, 2, 4, 4},
		{"abc", "xy", 2, 3, 5, 13},
		{"xy", "abc", 2, 3, 5, 12},
	}

	for _, c := range cases {
		if r := smetrics.Ukkonen(c.s, c.t, c.icost, c.dcost, c.scost); r != c.r {
			fmt.Println(r, "instead of", c.r)
			t.Fail()
		}
	}
}


================================================
FILE: tests/wagner-fischer_test.go
================================================
package tests

import (
	"fmt"
	"github.com/xrash/smetrics"
	"testing"
)

func TestWagnerFischer(t *testing.T) {
	cases := []levenshteincase{
		{"RASH", "RASH", 1, 1, 2, 0},
		{"POTATO", "POTTATO", 1, 1, 2, 1},
		{"POTTATO", "POTATO", 1, 1, 2, 1},
		{"HOUSE", "MOUSE", 1, 1, 2, 2},
		{"MOUSE", "HOUSE", 2, 2, 4, 4},
		{"abc", "xy", 2, 3, 5, 13},
		{"xy", "abc", 2, 3, 5, 12},
	}

	for _, c := range cases {
		if r := smetrics.WagnerFischer(c.s, c.t, c.icost, c.dcost, c.scost); r != c.r {
			fmt.Println(r, "instead of", c.r)
			t.Fail()
		}
	}
}


================================================
FILE: ukkonen.go
================================================
package smetrics

import (
	"math"
)

// Ukkonen computes the Levenshtein distance using the Ukkonen algorithm. The algorithm is described in http://www.cs.helsinki.fi/u/ukkonen/InfCont85.PDF, or in docs/InfCont85.PDF. It runs in O(t * min(m, n)) time, where t is the actual distance between strings a and b. It needs O(min(t, m, n)) space. This function might be preferred over WagnerFischer() for *very* similar strings, but benchmark it on your own data.
// The first two parameters are the two strings to be compared. The last three parameters are the insertion cost, the deletion cost and the substitution cost. These are normally defined as 1, 1 and 2 respectively.
func Ukkonen(a, b string, icost, dcost, scost int) int {
	var lowerCost int

	if icost < dcost && icost < scost {
		lowerCost = icost
	} else if dcost < scost {
		lowerCost = dcost
	} else {
		lowerCost = scost
	}

	infinite := math.MaxInt32 / 2

	var r []int
	var k, kprime, p, t int
	var ins, del, sub int

	if len(a) > len(b) {
		t = (len(a) - len(b) + 1) * lowerCost
	} else {
		t = (len(b) - len(a) + 1) * lowerCost
	}

	for {
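		if (t / lowerCost) < (len(b) - len(a)) {
			// Double the threshold before retrying. A bare continue
			// would loop forever because t would never change.
			t *= 2
			continue
		}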

		// The commented-out line below mirrors the expression from the
		// original Ukkonen paper. The integer version that follows it
		// avoids floating-point math, so it is faster.
		// p = int(math.Floor(0.5*((float64(t)/float64(lowerCost)) - float64(len(b) - len(a)))))
		p = ((t / lowerCost) - (len(b) - len(a))) / 2

		k = -p
		kprime = k

		rowlength := (len(b) - len(a)) + (2 * p)

		r = make([]int, rowlength+2)

		for i := 0; i < rowlength+2; i++ {
			r[i] = infinite
		}

		for i := 0; i <= len(a); i++ {
			for j := 0; j <= rowlength; j++ {
				if i == j+k && i == 0 {
					r[j] = 0
				} else {
					if j-1 < 0 {
						ins = infinite
					} else {
						ins = r[j-1] + icost
					}

					del = r[j+1] + dcost
					sub = r[j] + scost

					if i-1 < 0 || i-1 >= len(a) || j+k-1 >= len(b) || j+k-1 < 0 {
						sub = infinite
					} else if a[i-1] == b[j+k-1] {
						sub = r[j]
					}

					if ins < del && ins < sub {
						r[j] = ins
					} else if del < sub {
						r[j] = del
					} else {
						r[j] = sub
					}
				}
			}
			k++
		}

		if r[(len(b)-len(a))+(2*p)+kprime] <= t {
			break
		} else {
			t *= 2
		}
	}

	return r[(len(b)-len(a))+(2*p)+kprime]
}


================================================
FILE: wagner-fischer.go
================================================
package smetrics

// WagnerFischer computes the Levenshtein distance using the Wagner-Fischer algorithm.
// The first two parameters are the two strings to be compared. The last three parameters are the insertion cost, the deletion cost and the substitution cost. These are normally defined as 1, 1 and 2 respectively.
func WagnerFischer(a, b string, icost, dcost, scost int) int {

	// Allocate both rows.
	row1 := make([]int, len(b)+1)
	row2 := make([]int, len(b)+1)
	var tmp []int

	// Initialize the first row.
	for i := 1; i <= len(b); i++ {
		row1[i] = i * icost
	}

	// For each row...
	for i := 1; i <= len(a); i++ {
		row2[0] = i * dcost

		// For each column...
		for j := 1; j <= len(b); j++ {
			if a[i-1] == b[j-1] {
				row2[j] = row1[j-1]
			} else {
				ins := row2[j-1] + icost
				del := row1[j] + dcost
				sub := row1[j-1] + scost

				if ins < del && ins < sub {
					row2[j] = ins
				} else if del < sub {
					row2[j] = del
				} else {
					row2[j] = sub
				}
			}
		}

		// Swap the rows at the end of each row.
		tmp = row1
		row1 = row2
		row2 = tmp
	}

	// Because we swapped the rows, the final result is in row1 instead of row2.
	return row1[len(row1)-1]
}
SYMBOL INDEX (17 symbols across 13 files)

FILE: hamming.go
  function Hamming (line 8) | func Hamming(a, b string) (int, error) {

FILE: jaro-winkler.go
  function JaroWinkler (line 9) | func JaroWinkler(a, b string, boostThreshold float64, prefixSize int) fl...

FILE: jaro.go
  function Jaro (line 8) | func Jaro(a, b string) float64 {

FILE: soundex.go
  function Soundex (line 8) | func Soundex(s string) string {

FILE: tests/hamming_test.go
  function TestHamming (line 9) | func TestHamming(t *testing.T) {
  function TestHammingError (line 34) | func TestHammingError(t *testing.T) {

FILE: tests/jaro-winkler_test.go
  function TestJaroWinkler (line 9) | func TestJaroWinkler(t *testing.T) {

FILE: tests/jaro_test.go
  function TestJaro (line 9) | func TestJaro(t *testing.T) {

FILE: tests/soundex_test.go
  function TestSoundex (line 9) | func TestSoundex(t *testing.T) {

FILE: tests/testcases.go
  type levenshteincase (line 3) | type levenshteincase struct
  type soundexcase (line 12) | type soundexcase struct
  type hammingcase (line 17) | type hammingcase struct
  type jarocase (line 23) | type jarocase struct

FILE: tests/ukkonen_test.go
  function TestUkkonen (line 9) | func TestUkkonen(t *testing.T) {

FILE: tests/wagner-fischer_test.go
  function TestWagnerFischer (line 9) | func TestWagnerFischer(t *testing.T) {

FILE: ukkonen.go
  function Ukkonen (line 9) | func Ukkonen(a, b string, icost, dcost, scost int) int {

FILE: wagner-fischer.go
  function WagnerFischer (line 5) | func WagnerFischer(a, b string, icost, dcost, scost int) int {
