Repository: xrash/smetrics
Branch: master
Commit: 55b8f293f342
Files: 19
Total size: 19.5 KB
Directory structure:
gitextract_fu9bok7z/
├── .travis.yml
├── LICENSE
├── README.md
├── doc.go
├── go.mod
├── hamming.go
├── jaro-winkler.go
├── jaro.go
├── soundex.go
├── tests/
│ ├── Makefile
│ ├── hamming_test.go
│ ├── jaro-winkler_test.go
│ ├── jaro_test.go
│ ├── soundex_test.go
│ ├── testcases.go
│ ├── ukkonen_test.go
│ └── wagner-fischer_test.go
├── ukkonen.go
└── wagner-fischer.go
================================================
FILE CONTENTS
================================================
================================================
FILE: .travis.yml
================================================
language: go
go:
- 1.11
- 1.12
- 1.13
- 1.14.x
- master
script:
- cd tests && make
================================================
FILE: LICENSE
================================================
Copyright (C) 2016 Felipe da Cunha Gonçalves
All Rights Reserved.
MIT LICENSE
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
================================================
FILE: README.md
================================================
# smetrics
`smetrics` is "string metrics".
Package smetrics provides a bunch of algorithms for calculating the distance between strings.
There are implementations for calculating the popular Levenshtein distance (aka Edit Distance or Wagner-Fischer), as well as the Jaro distance, the Jaro-Winkler distance, and more.
# How to import
```go
import "github.com/xrash/smetrics"
```
# Documentation
Go to [https://pkg.go.dev/github.com/xrash/smetrics](https://pkg.go.dev/github.com/xrash/smetrics) for complete documentation.
# Example
```go
package main
import (
"github.com/xrash/smetrics"
)
func main() {
smetrics.WagnerFischer("POTATO", "POTATTO", 1, 1, 2)
smetrics.WagnerFischer("MOUSE", "HOUSE", 2, 2, 4)
smetrics.Ukkonen("POTATO", "POTATTO", 1, 1, 2)
smetrics.Ukkonen("MOUSE", "HOUSE", 2, 2, 4)
smetrics.Jaro("AL", "AL")
smetrics.Jaro("MARTHA", "MARHTA")
smetrics.JaroWinkler("AL", "AL", 0.7, 4)
smetrics.JaroWinkler("MARTHA", "MARHTA", 0.7, 4)
smetrics.Soundex("Euler")
smetrics.Soundex("Ellery")
smetrics.Hamming("aaa", "aaa")
smetrics.Hamming("aaa", "aab")
}
```
================================================
FILE: doc.go
================================================
/*
Package smetrics provides a bunch of algorithms for calculating
the distance between strings.
There are implementations for calculating the popular Levenshtein
distance (aka Edit Distance or Wagner-Fischer), as well as the Jaro
distance, the Jaro-Winkler distance, and more.
For the Levenshtein distance, you can use the functions WagnerFischer()
and Ukkonen(). Read the documentation on these functions.
For the Jaro and Jaro-Winkler algorithms, check the functions
Jaro() and JaroWinkler(). Read the documentation on these functions.
For the Soundex algorithm, check the function Soundex().
For the Hamming distance algorithm, check the function Hamming().
*/
package smetrics
================================================
FILE: go.mod
================================================
module github.com/xrash/smetrics
go 1.15
================================================
FILE: hamming.go
================================================
package smetrics
import (
"fmt"
)
// Hamming returns the Hamming distance between a and b: the minimum number of substitutions required to change string a into string b. The comparison is byte-wise. Both strings must have the same length in bytes; otherwise the function returns -1 and an error.
func Hamming(a, b string) (int, error) {
al := len(a)
bl := len(b)
if al != bl {
return -1, fmt.Errorf("strings are not equal (len(a)=%d, len(b)=%d)", al, bl)
}
var difference = 0
// Iterate by byte index. Ranging over the string would step rune by rune
// and skip the continuation bytes of multi-byte characters.
for i := 0; i < al; i++ {
if a[i] != b[i] {
difference = difference + 1
}
}
return difference, nil
}
================================================
FILE: jaro-winkler.go
================================================
package smetrics
import (
"math"
)
// JaroWinkler returns the Jaro-Winkler distance between a and b. The result is 1 for equal strings, and 0 for completely different strings. It is commonly used in record linkage, so it aims to be tolerant of common typos in real-world names such as person names and street names.
// Jaro-Winkler is a modification of the Jaro algorithm. It first runs Jaro and, if the score is above boostThreshold, boosts it according to the number of exactly matching characters at the beginning of the strings, looking at no more than prefixSize characters. These two extra parameters are commonly set to 0.7 and 4, respectively.
func JaroWinkler(a, b string, boostThreshold float64, prefixSize int) float64 {
j := Jaro(a, b)
if j <= boostThreshold {
return j
}
prefixSize = int(math.Min(float64(len(a)), math.Min(float64(prefixSize), float64(len(b)))))
var prefixMatch float64
for i := 0; i < prefixSize; i++ {
if a[i] == b[i] {
prefixMatch++
} else {
break
}
}
return j + 0.1*prefixMatch*(1.0-j)
}
================================================
FILE: jaro.go
================================================
package smetrics
import (
"math"
)
// The Jaro distance. The result is 1 for equal strings, and 0 for completely different strings.
func Jaro(a, b string) float64 {
// If both strings are zero-length, they are completely equal,
// therefore return 1.
if len(a) == 0 && len(b) == 0 {
return 1
}
// If one string is zero-length, strings are completely different,
// therefore return 0.
if len(a) == 0 || len(b) == 0 {
return 0
}
// Define the necessary variables for the algorithm.
la := float64(len(a))
lb := float64(len(b))
matchRange := int(math.Max(0, math.Floor(math.Max(la, lb)/2.0)-1))
matchesA := make([]bool, len(a))
matchesB := make([]bool, len(b))
var matches float64 = 0
// Step 1: Matches
// Loop through each character of the first string,
// looking for a matching character in the second string.
for i := 0; i < len(a); i++ {
start := int(math.Max(0, float64(i-matchRange)))
end := int(math.Min(lb-1, float64(i+matchRange)))
for j := start; j <= end; j++ {
if matchesB[j] {
continue
}
if a[i] == b[j] {
matchesA[i] = true
matchesB[j] = true
matches++
break
}
}
}
// If there are no matches, strings are completely different,
// therefore return 0.
if matches == 0 {
return 0
}
// Step 2: Transpositions
// Loop through the matches' arrays, looking for
// unaligned matches. Count the number of unaligned matches.
unaligned := 0
j := 0
for i := 0; i < len(a); i++ {
if !matchesA[i] {
continue
}
for !matchesB[j] {
j++
}
if a[i] != b[j] {
unaligned++
}
j++
}
// The number of unaligned matches divided by two is the number of transpositions.
transpositions := math.Floor(float64(unaligned) / 2)
// Jaro distance is the average of these three numbers:
// 1. matches / length of string A
// 2. matches / length of string B
// 3. (matches - transpositions) / matches
// So the sum of the three, divided by three, is the final result.
return ((matches / la) + (matches / lb) + ((matches - transpositions) / matches)) / 3.0
}
================================================
FILE: soundex.go
================================================
package smetrics
import (
"strings"
)
// Soundex returns the Soundex encoding of s. Soundex is a phonetic algorithm that considers how words sound in English. It maps a string to a 4-byte code consisting of the first letter of the original string followed by three digits. Strings that sound similar should map to the same code. s must not be empty: the function reads s[0] and panics on an empty string.
func Soundex(s string) string {
b := strings.Builder{}
b.Grow(4)
p := s[0]
if p <= 'z' && p >= 'a' {
p -= 32 // convert to uppercase
}
b.WriteByte(p)
n := 0
for i := 1; i < len(s); i++ {
c := s[i]
if c <= 'z' && c >= 'a' {
c -= 32 // convert to uppercase
} else if c < 'A' || c > 'Z' {
continue
}
if c == p {
continue
}
p = c
switch c {
case 'B', 'P', 'F', 'V':
c = '1'
case 'C', 'S', 'K', 'G', 'J', 'Q', 'X', 'Z':
c = '2'
case 'D', 'T':
c = '3'
case 'L':
c = '4'
case 'M', 'N':
c = '5'
case 'R':
c = '6'
default:
continue
}
b.WriteByte(c)
n++
if n == 3 {
break
}
}
for i := n; i < 3; i++ {
b.WriteByte('0')
}
return b.String()
}
================================================
FILE: tests/Makefile
================================================
.PHONY : test
test :
go test -v
.PHONY : gdb
gdb :
go test -c -s -N -l
gdb ./tests.test
================================================
FILE: tests/hamming_test.go
================================================
package tests
import (
"fmt"
"github.com/xrash/smetrics"
"testing"
)
func TestHamming(t *testing.T) {
cases := []hammingcase{
{"a", "a", 0},
{"a", "b", 1},
{"AAAA", "AABB", 2},
{"BAAA", "AAAA", 1},
{"BAAA", "CCCC", 4},
{"karolin", "kathrin", 3},
{"karolin", "kerstin", 3},
{"1011101", "1001001", 2},
{"2173896", "2233796", 3},
}
for _, c := range cases {
r, err := smetrics.Hamming(c.a, c.b)
if err != nil {
t.Fatalf("got error from hamming err=%s", err)
}
if r != c.diff {
fmt.Println(r, "instead of", c.diff)
t.Fail()
}
}
}
func TestHammingError(t *testing.T) {
res, err := smetrics.Hamming("a", "bbb")
if err == nil {
t.Fatalf("expected error from 'a' and 'bbb' on hamming")
}
if res != -1 {
t.Fatalf("erroring response wasn't -1, but %d", res)
}
}
================================================
FILE: tests/jaro-winkler_test.go
================================================
package tests
import (
"fmt"
"github.com/xrash/smetrics"
"testing"
)
func TestJaroWinkler(t *testing.T) {
for _, c := range __jaro_winkler_cases {
r := smetrics.JaroWinkler(c.a, c.b, 0.7, 4)
result := fmt.Sprintf("%.3f", r)
expected := fmt.Sprintf("%.3f", c.r)
if result != expected {
fmt.Println(c.a, c.b, result, "instead of", expected)
t.Fail()
}
}
}
================================================
FILE: tests/jaro_test.go
================================================
package tests
import (
"fmt"
"github.com/xrash/smetrics"
"testing"
)
func TestJaro(t *testing.T) {
for _, c := range __jaro_cases {
r := smetrics.Jaro(c.a, c.b)
result := fmt.Sprintf("%.3f", r)
expected := fmt.Sprintf("%.3f", c.r)
if result != expected {
fmt.Println(c.a, c.b, result, "instead of", expected)
t.Fail()
}
}
}
================================================
FILE: tests/soundex_test.go
================================================
package tests
import (
"fmt"
"github.com/xrash/smetrics"
"testing"
)
func TestSoundex(t *testing.T) {
cases := []soundexcase{
{"Euler", "E460"},
{"Ellery", "E460"},
{"Gauss", "G200"},
{"Ghosh", "G200"},
{"Hilbert", "H416"},
{"Heilbrohn", "H416"},
{"Knuth", "K530"},
{"Kant", "K530"},
{"Lloyd", "L300"},
{"Ladd", "L300"},
{"Lukasiewicz", "L222"},
{"Lissjous", "L222"},
{"Ravi", "R100"},
{"Ravee", "R100"},
}
for _, c := range cases {
if r := smetrics.Soundex(c.s); r != c.t {
fmt.Println(r, "instead of", c.t, "for", c.s)
t.Fail()
}
}
}
================================================
FILE: tests/testcases.go
================================================
package tests
type levenshteincase struct {
s string
t string
icost int
dcost int
scost int
r int
}
type soundexcase struct {
s string
t string
}
type hammingcase struct {
a string
b string
diff int
}
type jarocase struct {
a string
b string
r float64
}
var __jaro_cases = []*jarocase{
{a: "SHACKLEFORD", b: "SHACKELFORD", r: 0.970},
{a: "DUNNINGHAM", b: "CUNNIGHAM", r: 0.896},
{a: "NICHLESON", b: "NICHULSON", r: 0.926},
{a: "JONES", b: "JOHNSON", r: 0.790},
{a: "MASSEY", b: "MASSIE", r: 0.889},
{a: "ABROMS", b: "ABRAMS", r: 0.889},
{a: "HARDIN", b: "MARTINEZ", r: 0.722},
{a: "ITMAN", b: "SMITH", r: 0.467},
{a: "JERALDINE", b: "GERALDINE", r: 0.926},
{a: "MARHTA", b: "MARTHA", r: 0.944},
{a: "MICHELLE", b: "MICHAEL", r: 0.869},
{a: "JULIES", b: "JULIUS", r: 0.889},
{a: "TANYA", b: "TONYA", r: 0.867},
{a: "DWAYNE", b: "DUANE", r: 0.822},
{a: "SEAN", b: "SUSAN", r: 0.783},
{a: "JON", b: "JOHN", r: 0.917},
// {a: "JON", b: "JAN", r: 0.000},
{a: "BROOKHAVEN", b: "BRROKHAVEN", r: 0.933},
{a: "BROOK HALLOW", b: "BROOK HLLW", r: 0.944},
{a: "DECATUR", b: "DECATIR", r: 0.905},
{a: "FITZRUREITER", b: "FITZENREITER", r: 0.856},
{a: "HIGBEE", b: "HIGHEE", r: 0.889},
{a: "HIGBEE", b: "HIGVEE", r: 0.889},
{a: "LACURA", b: "LOCURA", r: 0.889},
{a: "IOWA", b: "IONA", r: 0.833},
// {a: "1ST", b: "IST", r: 0.000},
// Equal strings.
{a: "", b: "", r: 1.000},
{a: "A", b: "A", r: 1.000},
{a: "AA", b: "AA", r: 1.000},
{a: "AAA", b: "AAA", r: 1.000},
{a: "AAAA", b: "AAAA", r: 1.000},
{a: "AAAAA", b: "AAAAA", r: 1.000},
{a: "AAAAAA", b: "AAAAAA", r: 1.000},
{
a: "Legend of the Galactic Heroes",
b: "Legend of the Galactic Heroes",
r: 1.000,
},
{
a: "Home is the place where, when you have to go there, they have to take you in.",
b: "Home is the place where, when you have to go there, they have to take you in.",
r: 1.000,
},
{
a: "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael Gonzaga de Habsburgo-Lorena e Bragança",
b: "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael Gonzaga de Habsburgo-Lorena e Bragança",
r: 1.000,
},
{
a: "Et tu, Brute",
b: "Et tu, Brute",
r: 1.000,
},
// Completely different strings.
{a: "", b: "A", r: 0.000},
{a: "", b: "AA", r: 0.000},
{a: "", b: "AAA", r: 0.000},
{a: "", b: "AAAA", r: 0.000},
{a: "", b: "AAAAA", r: 0.000},
{a: "A", b: "", r: 0.000},
{a: "AA", b: "", r: 0.000},
{a: "AAA", b: "", r: 0.000},
{a: "AAAA", b: "", r: 0.000},
{a: "AAAAA", b: "", r: 0.000},
{a: "A", b: "B", r: 0.000},
{a: "AA", b: "BB", r: 0.000},
{a: "AAA", b: "BBB", r: 0.000},
{a: "AAAA", b: "BBBB", r: 0.000},
{a: "AAAAa", b: "BBBBB", r: 0.000},
}
var __jaro_winkler_cases = []*jarocase{
{a: "SHACKLEFORD", b: "SHACKELFORD", r: 0.982},
{a: "DUNNINGHAM", b: "CUNNIGHAM", r: 0.896},
{a: "NICHLESON", b: "NICHULSON", r: 0.956},
{a: "JONES", b: "JOHNSON", r: 0.832},
{a: "MASSEY", b: "MASSIE", r: 0.933},
{a: "ABROMS", b: "ABRAMS", r: 0.922},
{a: "HARDIN", b: "MARTINEZ", r: 0.722},
{a: "ITMAN", b: "SMITH", r: 0.467},
{a: "JERALDINE", b: "GERALDINE", r: 0.926},
{a: "MARHTA", b: "MARTHA", r: 0.961},
{a: "MICHELLE", b: "MICHAEL", r: 0.921},
{a: "JULIES", b: "JULIUS", r: 0.933},
{a: "TANYA", b: "TONYA", r: 0.880},
{a: "DWAYNE", b: "DUANE", r: 0.840},
{a: "SEAN", b: "SUSAN", r: 0.805},
{a: "JON", b: "JOHN", r: 0.933},
// {a: "JON", b: "JAN", r: 0.000},
{a: "BROOKHAVEN", b: "BRROKHAVEN", r: 0.947},
{a: "BROOK HALLOW", b: "BROOK HLLW", r: 0.967},
{a: "DECATUR", b: "DECATIR", r: 0.943},
{a: "FITZRUREITER", b: "FITZENREITER", r: 0.913},
{a: "HIGBEE", b: "HIGHEE", r: 0.922},
{a: "HIGBEE", b: "HIGVEE", r: 0.922},
{a: "LACURA", b: "LOCURA", r: 0.900},
{a: "IOWA", b: "IONA", r: 0.867},
// {a: "1ST", b: "IST", r: 0.000},
{a: "w", b: "w", r: 1.000},
// Equal strings.
{a: "", b: "", r: 1.000},
{a: "A", b: "A", r: 1.000},
{a: "AA", b: "AA", r: 1.000},
{a: "AAA", b: "AAA", r: 1.000},
{a: "AAAA", b: "AAAA", r: 1.000},
{a: "AAAAA", b: "AAAAA", r: 1.000},
{a: "AAAAAA", b: "AAAAAA", r: 1.000},
{
a: "Legend of the Galactic Heroes",
b: "Legend of the Galactic Heroes",
r: 1.000,
},
{
a: "Home is the place where, when you have to go there, they have to take you in.",
b: "Home is the place where, when you have to go there, they have to take you in.",
r: 1.000,
},
{
a: "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael Gonzaga de Habsburgo-Lorena e Bragança",
b: "Pedro de Alcântara João Carlos Leopoldo Salvador Bibiano Francisco Xavier de Paula Leocádio Miguel Gabriel Rafael Gonzaga de Habsburgo-Lorena e Bragança",
r: 1.000,
},
{
a: "Et tu, Brute",
b: "Et tu, Brute",
r: 1.000,
},
// Completely different strings.
{a: "", b: "A", r: 0.000},
{a: "", b: "AA", r: 0.000},
{a: "", b: "AAA", r: 0.000},
{a: "", b: "AAAA", r: 0.000},
{a: "", b: "AAAAA", r: 0.000},
{a: "A", b: "", r: 0.000},
{a: "AA", b: "", r: 0.000},
{a: "AAA", b: "", r: 0.000},
{a: "AAAA", b: "", r: 0.000},
{a: "AAAAA", b: "", r: 0.000},
{a: "A", b: "B", r: 0.000},
{a: "AA", b: "BB", r: 0.000},
{a: "AAA", b: "BBB", r: 0.000},
{a: "AAAA", b: "BBBB", r: 0.000},
{a: "AAAAa", b: "BBBBB", r: 0.000},
}
================================================
FILE: tests/ukkonen_test.go
================================================
package tests
import (
"fmt"
"github.com/xrash/smetrics"
"testing"
)
func TestUkkonen(t *testing.T) {
cases := []levenshteincase{
{"RASH", "RASH", 1, 1, 2, 0},
{"POTATO", "POTTATO", 1, 1, 2, 1},
{"POTTATO", "POTATO", 1, 1, 2, 1},
{"HOUSE", "MOUSE", 1, 1, 2, 2},
{"MOUSE", "HOUSE", 2, 2, 4, 4},
{"abc", "xy", 2, 3, 5, 13},
{"xy", "abc", 2, 3, 5, 12},
}
for _, c := range cases {
if r := smetrics.Ukkonen(c.s, c.t, c.icost, c.dcost, c.scost); r != c.r {
fmt.Println(r, "instead of", c.r)
t.Fail()
}
}
}
================================================
FILE: tests/wagner-fischer_test.go
================================================
package tests
import (
"fmt"
"github.com/xrash/smetrics"
"testing"
)
func TestWagnerFischer(t *testing.T) {
cases := []levenshteincase{
{"RASH", "RASH", 1, 1, 2, 0},
{"POTATO", "POTTATO", 1, 1, 2, 1},
{"POTTATO", "POTATO", 1, 1, 2, 1},
{"HOUSE", "MOUSE", 1, 1, 2, 2},
{"MOUSE", "HOUSE", 2, 2, 4, 4},
{"abc", "xy", 2, 3, 5, 13},
{"xy", "abc", 2, 3, 5, 12},
}
for _, c := range cases {
if r := smetrics.WagnerFischer(c.s, c.t, c.icost, c.dcost, c.scost); r != c.r {
fmt.Println(r, "instead of", c.r)
t.Fail()
}
}
}
================================================
FILE: ukkonen.go
================================================
package smetrics
import (
"math"
)
// Ukkonen calculates the Levenshtein distance using the Ukkonen algorithm. The algorithm is described in http://www.cs.helsinki.fi/u/ukkonen/InfCont85.PDF, or in docs/InfCont85.PDF. It runs in O(t · min(m, n)) time, where t is the actual distance between strings a and b, and needs O(min(t, m, n)) space. This function might be preferred over WagnerFischer() for *very* similar strings, but benchmark it on your own data.
// The first two parameters are the two strings to be compared. The last three parameters are the insertion cost, the deletion cost and the substitution cost. These are normally defined as 1, 1 and 2 respectively.
func Ukkonen(a, b string, icost, dcost, scost int) int {
var lowerCost int
if icost < dcost && icost < scost {
lowerCost = icost
} else if dcost < scost {
lowerCost = dcost
} else {
lowerCost = scost
}
infinite := math.MaxInt32 / 2
var r []int
var k, kprime, p, t int
var ins, del, sub int
if len(a) > len(b) {
t = (len(a) - len(b) + 1) * lowerCost
} else {
t = (len(b) - len(a) + 1) * lowerCost
}
for {
if (t / lowerCost) < (len(b) - len(a)) {
continue
}
// The commented-out expression below is the floating-point form, closer to
// the original Ukkonen paper. The integer version used instead avoids
// floats, so it is faster.
// p = int(math.Floor(0.5*((float64(t)/float64(lowerCost)) - float64(len(b) - len(a)))))
p = ((t / lowerCost) - (len(b) - len(a))) / 2
k = -p
kprime = k
rowlength := (len(b) - len(a)) + (2 * p)
r = make([]int, rowlength+2)
for i := 0; i < rowlength+2; i++ {
r[i] = infinite
}
for i := 0; i <= len(a); i++ {
for j := 0; j <= rowlength; j++ {
if i == j+k && i == 0 {
r[j] = 0
} else {
if j-1 < 0 {
ins = infinite
} else {
ins = r[j-1] + icost
}
del = r[j+1] + dcost
sub = r[j] + scost
if i-1 < 0 || i-1 >= len(a) || j+k-1 >= len(b) || j+k-1 < 0 {
sub = infinite
} else if a[i-1] == b[j+k-1] {
sub = r[j]
}
if ins < del && ins < sub {
r[j] = ins
} else if del < sub {
r[j] = del
} else {
r[j] = sub
}
}
}
k++
}
if r[(len(b)-len(a))+(2*p)+kprime] <= t {
break
} else {
t *= 2
}
}
return r[(len(b)-len(a))+(2*p)+kprime]
}
================================================
FILE: wagner-fischer.go
================================================
package smetrics
// The Wagner-Fischer algorithm for calculating the Levenshtein distance.
// The first two parameters are the two strings to be compared. The last three parameters are the insertion cost, the deletion cost and the substitution cost. These are normally defined as 1, 1 and 2 respectively.
func WagnerFischer(a, b string, icost, dcost, scost int) int {
// Allocate both rows.
row1 := make([]int, len(b)+1)
row2 := make([]int, len(b)+1)
var tmp []int
// Initialize the first row.
for i := 1; i <= len(b); i++ {
row1[i] = i * icost
}
// For each row...
for i := 1; i <= len(a); i++ {
row2[0] = i * dcost
// For each column...
for j := 1; j <= len(b); j++ {
if a[i-1] == b[j-1] {
row2[j] = row1[j-1]
} else {
ins := row2[j-1] + icost
del := row1[j] + dcost
sub := row1[j-1] + scost
if ins < del && ins < sub {
row2[j] = ins
} else if del < sub {
row2[j] = del
} else {
row2[j] = sub
}
}
}
// Swap the rows at the end of each iteration.
tmp = row1
row1 = row2
row2 = tmp
}
// Because we swapped the rows, the final result is in row1 instead of row2.
return row1[len(row1)-1]
}
SYMBOL INDEX (17 symbols across 13 files)
FILE: hamming.go
function Hamming (line 8) | func Hamming(a, b string) (int, error) {
FILE: jaro-winkler.go
function JaroWinkler (line 9) | func JaroWinkler(a, b string, boostThreshold float64, prefixSize int) fl...
FILE: jaro.go
function Jaro (line 8) | func Jaro(a, b string) float64 {
FILE: soundex.go
function Soundex (line 8) | func Soundex(s string) string {
FILE: tests/hamming_test.go
function TestHamming (line 9) | func TestHamming(t *testing.T) {
function TestHammingError (line 34) | func TestHammingError(t *testing.T) {
FILE: tests/jaro-winkler_test.go
function TestJaroWinkler (line 9) | func TestJaroWinkler(t *testing.T) {
FILE: tests/jaro_test.go
function TestJaro (line 9) | func TestJaro(t *testing.T) {
FILE: tests/soundex_test.go
function TestSoundex (line 9) | func TestSoundex(t *testing.T) {
FILE: tests/testcases.go
type levenshteincase (line 3) | type levenshteincase struct
type soundexcase (line 12) | type soundexcase struct
type hammingcase (line 17) | type hammingcase struct
type jarocase (line 23) | type jarocase struct
FILE: tests/ukkonen_test.go
function TestUkkonen (line 9) | func TestUkkonen(t *testing.T) {
FILE: tests/wagner-fischer_test.go
function TestWagnerFischer (line 9) | func TestWagnerFischer(t *testing.T) {
FILE: ukkonen.go
function Ukkonen (line 9) | func Ukkonen(a, b string, icost, dcost, scost int) int {
FILE: wagner-fischer.go
function WagnerFischer (line 5) | func WagnerFischer(a, b string, icost, dcost, scost int) int {