Full Code of chewxy/lingo for AI

master 491e816b48d4 cached

128 files

278.9 KB

93.4k tokens

992 symbols

1 requests

Download .txt

Showing preview only (306K chars total). Download the full file or copy to clipboard to get everything.

Repository: chewxy/lingo
Branch: master
Commit: 491e816b48d4
Files: 128
Total size: 278.9 KB

Directory structure:
gitextract_whqjv2y6/

├── .gitignore
├── .travis.yml
├── CONTRIBUTING.md
├── CONTRIBUTORS.md
├── LICENSE
├── POSTag.go
├── POSTag_stanford.go
├── POSTag_stanford_string.go
├── POSTag_universal.go
├── POSTag_universal_string.go
├── README.md
├── annotation.go
├── annotationSet.go
├── annotationSet_bench_test.go
├── browncluster.go
├── cmd/
│   ├── demo/
│   │   ├── io.go
│   │   ├── main.go
│   │   └── nlp.go
│   ├── dep/
│   │   ├── fixer.go
│   │   ├── io.go
│   │   ├── main.go
│   │   ├── pipeline.go
│   │   └── train.go
│   ├── lexer/
│   │   └── main.go
│   └── pos/
│       ├── crossvalidation.go
│       ├── fixer.go
│       └── main.go
├── const.go
├── corpus/
│   ├── consopt.go
│   ├── corpus.go
│   ├── corpus_test.go
│   ├── functions.go
│   ├── functions_test.go
│   ├── inflection.go
│   ├── inflection_test.go
│   ├── io.go
│   ├── io_test.go
│   ├── lda.go
│   ├── test_test.go
│   └── utils.go
├── dep/
│   ├── README.md
│   ├── arcStandard.go
│   ├── arcStandard_test.go
│   ├── configuration.go
│   ├── configuration_test.go
│   ├── debug.go
│   ├── dependencyParser.go
│   ├── documentation/
│   │   ├── iamhuman.dot
│   │   └── thecatsatonthemat.dot
│   ├── errors.go
│   ├── evaluation.go
│   ├── example.go
│   ├── example_test.go
│   ├── featureExtraction.go
│   ├── features.go
│   ├── features_string.go
│   ├── fix.go
│   ├── init.go
│   ├── models.go
│   ├── models_test.go
│   ├── move.go
│   ├── move_string.go
│   ├── nn2.go
│   ├── nn2_io.go
│   ├── nn2_io_test.go
│   ├── nn2_test.go
│   ├── nnconfig.go
│   ├── release.go
│   ├── span.go
│   ├── test_test.go
│   ├── train.go
│   ├── train_test.go
│   ├── transition.go
│   └── util.go
├── dependency.go
├── dependencyTree.go
├── dependencyType.go
├── dependencyType_stanford.go
├── dependencyType_stanford_string.go
├── dependencyType_universal.go
├── dependencyType_universal_string.go
├── errors.go
├── go.mod
├── go.sum
├── interfaces.go
├── io.go
├── io_test.go
├── lexeme.go
├── lexemetype_string.go
├── lexer/
│   ├── lexer.go
│   ├── lexer_test.go
│   └── stateFn.go
├── lingo.go
├── pos/
│   ├── allinone_test.go
│   ├── context.go
│   ├── context_test.go
│   ├── contexttype_string.go
│   ├── debug.go
│   ├── errors.go
│   ├── features.go
│   ├── features_test.go
│   ├── featuretype_string.go
│   ├── models.go
│   ├── models_test.go
│   ├── perceptron.go
│   ├── perceptron_io.go
│   ├── perceptron_io_test.go
│   ├── postagger.go
│   ├── release.go
│   ├── sentence.go
│   ├── test_test.go
│   ├── util.go
│   └── util_test.go
├── sentence.go
├── sets.go
├── shape.go
├── stopwords.go
├── treebank/
│   ├── const_postag_stanford.go
│   ├── const_postag_universal.go
│   ├── const_rel_stanford.go
│   ├── const_rel_universal.go
│   ├── sentenceTag.go
│   ├── sentenceTag_test.go
│   ├── treebank.go
│   ├── treebank_test.go
│   └── util.go
├── utils.go
└── wordFlags.go

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# Compiled Object files, Static and Dynamic libs (Shared Objects)
*.o
*.a
*.so

# Folders
_obj
_test

# Architecture specific extensions/prefixes
*.[568vq]
[568vq].out

*.cgo1.go
*.cgo2.c
_cgo_defun.c
_cgo_gotypes.go
_cgo_export.*

_testmain.go

*.exe
*.test
*.prof


================================================
FILE: .travis.yml
================================================
language: go

branches:
  only:
    - master

go:
  - 1.11.x
  - 1.12.x
  - 1.13.x
  - tip

env:
  - GO111MODULE=on

matrix:
  allow_failures:
    - go: tip


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing #

Contributors are welcome! We want to make contributing as easy as possible, and the process is very Github-centric. [Github Issues](https://github.com/chewxy/lingo/issues) are used to manage any contributions and changes. If you don't have a github account, please feel free to email me (my  user name [at] gmail.com), and I'll gladly open an issue on your behalf.

# Process #

Say you have a change you want to make, this is the process:

1. Open an issue.
2. I'll have a brief discussion with you. If you don't feel comfortable with a public discussion, I'm okay to email. 
3. Fork this project on Github, and clone it to your local machine.
4. Make your changes
5. Make sure you have tests. If you foresee breaking any API, it is vital that it be discussed beforehand.
6. Make sure your tests pass.
7. `gofmt` your code
8. Send a Pull Request.

Say you instead saw one of the [many issues](https://github.com/chewxy/lingo/issues) and want to solve one of them. This is the process:

1. Comment on the issue saying you'll pick it up. (Alternatively, email me)
2. Fork the project on Github, clone to your local drive.
3. Fork this project on Github, and clone it to your local machine.
4. Make your changes
5. Make sure you have tests. If you foresee breaking any API, it is vital that it be discussed beforehand.
6. Make sure your tests pass.
7. `gofmt` your code
8. Send a Pull Request.

## Pull Requests ##

I'll review every pull request. I may request some changes, or delve into further discussions. After that, once I'm satisfied everything passes, I'll merge the pull request. Then I'll add your name into the CONTRIBUTORS list.

# Debugging #

This package comes with a debug tag option. Most subpackages will have a `debug.go` which contain a `logf` function for logging any traces you wish to trace. 

================================================
FILE: CONTRIBUTORS.md
================================================
# Contributors #

* Xuanyi Chew (@chewxy) - initial package

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2017 Chewxy

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: POSTag.go
================================================
package lingo

import (
	"fmt"
	"strings"
)

// POSTag represents a Part of Speech Tag.
type POSTag byte

var posTagLookup map[string]POSTag

func init() {
	posTagLookup = make(map[string]POSTag)
	for t := X; t < MAXTAG; t++ {
		s := t.String()
		posTagLookup[s] = POSTag(t)
		posTagLookup[strings.ToLower(s)] = POSTag(t)
	}
}

func (p POSTag) MarshalText() ([]byte, error) {
	return []byte(fmt.Sprintf("%v", p)), nil // add quotes back
}

func (p *POSTag) UnmarshalText(text []byte) error {
	str := strings.Trim(string(text), `"`) // for JSON use, if any
	tag, _ := posTagLookup[str]
	*p = tag
	return nil
}

// POSTag related functions
func InPOSTags(x POSTag, set []POSTag) bool {
	for _, v := range set {
		if v == x {
			return true
		}
	}
	return false
}

func IsAdjective(x POSTag) bool     { return InPOSTags(x, Adjectives) }
func IsNoun(x POSTag) bool          { return InPOSTags(x, Nouns) }
func IsProperNoun(x POSTag) bool    { return InPOSTags(x, ProperNouns) }
func IsVerb(x POSTag) bool          { return InPOSTags(x, Verbs) }
func IsAdverb(x POSTag) bool        { return InPOSTags(x, Adverbs) }
func IsInterrogative(x POSTag) bool { return InPOSTags(x, Interrogatives) }
func IsDeterminer(x POSTag) bool    { return InPOSTags(x, Determiners) }
func IsNumber(x POSTag) bool        { return InPOSTags(x, Numbers) }
func IsSymbol(x POSTag) bool        { return InPOSTags(x, Symbols) }


================================================
FILE: POSTag_stanford.go
================================================
// +build stanfordtags

package lingo

//go:generate stringer -type=POSTag -output=POSTag_stanford_string.go

const BUILD_TAGSET = "stanfordtags"

const (
	X           POSTag = iota // aka NULLTAG
	UNKNOWN_TAG               // Unknown
	ROOT_TAG                  // For Root
	CC                        // Coordinating conjunction
	CD                        // Cardinal number
	DT                        // Determiner
	EX                        // Existential there
	FW                        // Foreign word
	IN                        // Preposition or subordinating conjunction
	JJ                        // Adjective
	JJR                       // Adjective, comparative
	JJS                       // Adjective, superlative
	LS                        // List item marker
	MD                        // Modal
	NN                        // Noun, singular or mass
	NNS                       // Noun, plural
	NNP                       // Proper noun, singular
	NNPS                      // Proper noun, plural
	PDT                       // Predeterminer
	POS                       // Possessive ending
	PRP                       // Personal pronoun
	PPRP                      // Possessive pronoun (PRP$)
	RB                        // Adverb
	RBR                       // Adverb, comparative
	RBS                       // Adverb, superlative
	RP                        // Particle
	SYM                       // Symbol
	TO                        // to
	UH                        // Interjection
	VB                        // Verb, base form
	VBD                       // Verb, past tense
	VBG                       // Verb, gerund or present participle
	VBN                       // Verb, past participle
	VBP                       // Verb, non-3rd person singular present
	VBZ                       // Verb, 3rd person singular present
	WDT                       // Wh-determiner
	WP                        // Wh-pronoun
	PWP                       // Possessive wh-pronoun (WP$)
	WRB                       // Wh-adverb

	// Punctuation related stuff: http://stackoverflow.com/a/21546294
	COMMA      // Obvious isn't it?
	FULLSTOP   // fullstop
	OPENQUOTE  // Penn Treebank uses ``
	CLOSEQUOTE // Penn Treebank uses ''
	COLON
	DOLLAR
	HASHSIGN
	LEFTBRACE
	RIGHTBRACE

	// Extensions for web shit: https://www.ldc.upenn.edu/sites/www.ldc.upenn.edu/files/etb-supplementary-guidelines-2009-addendum.pdf
	// http://clear.colorado.edu/compsem/documents/treebank_guidelines.pdf
	HYPH // Hyphen in split compounds
	AFX  // affix
	ADD  // url or email addy
	NFP  // superfluous (non final) puncutation
	GW   // Goes WIth
	XX   // deidentified data (aka giberish)

	MAXTAG
)

// POSTagShortcut is a shortcut function to help the POSTagger shortcircuit some decisions about what the tag is
func POSTagShortcut(l Lexeme) (POSTag, bool) {
	switch l.LexemeType {
	case Number:
		return CD, true
	case Punctuation:
		switch l.Value {
		case ",":
			return COMMA, true
		case ".":
			return FULLSTOP, true
		case "``":
			return OPENQUOTE, true
		case "''":
			return CLOSEQUOTE, true
		case ":":
			return COLON, true
		case "#":
			return HASHSIGN, true
		case "(":
			return LEFTBRACE, true
		case ")":
			return RIGHTBRACE, true
		default:
			return X, false
		}
	case Symbol:
		return SYM, true
	case URI:
		return ADD, true
	case Date:
		return CD, true
	case Time:
		return CD, true
	case EOF:
		return X, true
	}
	return X, false
}

// sets

var Adjectives = []POSTag{JJ, JJR, JJS}
var Nouns = []POSTag{NN, NNP, NNS, NNPS}
var ProperNouns = []POSTag{NNP, NNPS}
var Verbs = []POSTag{VB, VBD, VBG, VBN, VBP, VBZ}
var Adverbs = []POSTag{RB, RBR, RBS}
var Determiners = []POSTag{DT, PDT}
var Interrogatives = []POSTag{WDT, WP, PWP, WRB}
var Numbers = []POSTag{CD}
var Symbols = []POSTag{SYM, FULLSTOP, COMMA, OPENQUOTE, COLON, DOLLAR, HASHSIGN, LEFTBRACE, RIGHTBRACE, HYPH, NFP}

// IsIN returns true if the POSTag is a subordinating conjunction.
// The reason why this exists is because in the stanford tag, IN is the POSTag
// while in the universal dependencies, it's the SCONJ POSTag
func IsIN(x POSTag) bool { return x == IN }


================================================
FILE: POSTag_stanford_string.go
================================================
// +build stanfordtags

// Code generated by "stringer -type=POSTag -output=POSTag_stanford_string.go"; DO NOT EDIT

package lingo

import "fmt"

const _POSTag_name = "XUNKNOWN_TAGROOT_TAGCCCDDTEXFWINJJJJRJJSLSMDNNNNSNNPNNPSPDTPOSPRPPPRPRBRBRRBSRPSYMTOUHVBVBDVBGVBNVBPVBZWDTWPPWPWRBCOMMAFULLSTOPOPENQUOTECLOSEQUOTECOLONDOLLARHASHSIGNLEFTBRACERIGHTBRACEHYPHAFXADDNFPGWXXMAXTAG"

var _POSTag_index = [...]uint8{0, 1, 12, 20, 22, 24, 26, 28, 30, 32, 34, 37, 40, 42, 44, 46, 49, 52, 56, 59, 62, 65, 69, 71, 74, 77, 79, 82, 84, 86, 88, 91, 94, 97, 100, 103, 106, 108, 111, 114, 119, 127, 136, 146, 151, 157, 165, 174, 184, 188, 191, 194, 197, 199, 201, 207}

func (i POSTag) String() string {
	if i >= POSTag(len(_POSTag_index)-1) {
		return fmt.Sprintf("POSTag(%d)", i)
	}
	return _POSTag_name[_POSTag_index[i]:_POSTag_index[i+1]]
}


================================================
FILE: POSTag_universal.go
================================================
// +build !stanfordtags

package lingo

//go:generate stringer -type=POSTag -output=POSTag_universal_string.go

const BUILD_TAGSET = "universaltags"

const (
	X POSTag = iota // aka NULLTAG
	UNKNOWN_TAG
	ROOT_TAG
	ADJ
	ADP
	ADV
	AUX
	CONJ
	DET
	INTJ
	NOUN
	NUM
	PART
	PRON
	PROPN
	PUNCT
	SCONJ
	SYM
	VERB

	MAXTAG // MAXTAG is provided here as index support
)

// POSTagShortcut is a shortcut function to help the POSTagger shortcircuit some decisions about what the tag is
func POSTagShortcut(l Lexeme) (POSTag, bool) {
	switch l.LexemeType {
	case Number:
		return NUM, true
	case Punctuation:
		return PUNCT, true
	case Symbol:
		return SYM, true
	case URI:
		return X, true
	case Date:
		return NUM, true
	case Time:
		return NUM, true
	case EOF:
		return X, true
	}
	return X, false
}

var Adjectives = []POSTag{ADJ}
var Nouns = []POSTag{NOUN, PROPN}
var ProperNouns = []POSTag{PROPN}
var Verbs = []POSTag{VERB}
var Adverbs = []POSTag{ADV}
var Determiners = []POSTag{DET}
var Interrogatives = []POSTag{PRON, DET, ADV}
var Numbers = []POSTag{NUM}
var Symbols = []POSTag{SYM, PUNCT}

// IsIN returns true if the POSTag is a subordinating conjunction.
// The reason why this exists is because in the stanford tag, IN is the POSTag
// while in the universal dependencies, it's the SCONJ POSTag
func IsIN(x POSTag) bool { return x == SCONJ }


================================================
FILE: POSTag_universal_string.go
================================================
// +build !stanfordtags

// Code generated by "stringer -type=POSTag -output=POSTag_universal_string.go"; DO NOT EDIT

package lingo

import "fmt"

const _POSTag_name = "XUNKNOWN_TAGROOT_TAGADJADPADVAUXCONJDETINTJNOUNNUMPARTPRONPROPNPUNCTSCONJSYMVERBMAXTAG"

var _POSTag_index = [...]uint8{0, 1, 12, 20, 23, 26, 29, 32, 36, 39, 43, 47, 50, 54, 58, 63, 68, 73, 76, 80, 86}

func (i POSTag) String() string {
	if i >= POSTag(len(_POSTag_index)-1) {
		return fmt.Sprintf("POSTag(%d)", i)
	}
	return _POSTag_name[_POSTag_index[i]:_POSTag_index[i+1]]
}


================================================
FILE: README.md
================================================
# lingo #

<img src="https://raw.githubusercontent.com/chewxy/lingo/master/media/gopher_small.png" align="right" />

[![Build Status](https://travis-ci.org/chewxy/lingo.svg?branch=master)](https://travis-ci.org/chewxy/lingo)

package `lingo` provides the data structures and algorithms required for natural language processing.

Specifically, it provides a POS Tagger (`lingo/pos`), a Dependency Parser (`lingo/dep`), and a basic tokenizer (`lingo/lexer`) for English. It also provides data structures for holding corpuses (`lingo/corpus`), and treebanks (`lingo/treebank`).

The aim of this package is to provide a production quality pipeline for natural language processing.

# Install #

The package is go-gettable: `go get -u github.com/chewxy/lingo`

This package and its subpackages depend on very few external packages. Here they are:

| Package | Used For | Vitality | Notes | Licence |
|---------|----------|----------|-------|---------|
| [gorgonia](https://github.com/chewxy/gorgonia) | Machine learning | Vital. It won't be hard to rewrite them, but why? | Same author | [Gorgonia Licence](https://github.com/chewxy/gorgonia/blob/master/LICENSE) (Apache 2.0-like) |
| [gographviz](https://github.com/awalterschulze/gographviz) | Visualization of annotations, and other graph-related visualizations | Vital for visualizations, which are a nice-to-have feature | API last changed 12th April 2017 | [gographviz licence](https://github.com/awalterschulze/gographviz/blob/master/LICENSE) (Apache 2.0) |
| [errors](https://github.com/pkg/errors)  | Errors   | The package won't die without it, but it's a very nice to have | Stable API for the past year | [errors licence](https://github.com/pkg/errors/blob/master/LICENSE) (MIT/BSD like) |
| [set](https://github.com/xtgo/set) | Set operations | Can be easily replaced | Stable API for the past year | [set licence](https://github.com/xtgo/set/blob/master/LICENSE) (MIT/BSD-like) |

# Usage #

See the individual packages for usage. There is also a bunch of executables in the `cmd` directory. They're meant to be examples as to how a natural language processing pipeline can be set up.

A natural language pipeline with this package is heavily channels driven. Here's is an example for dependency parsing:

```go
func main() {
	inputString: `The cat sat on the mat`
	lx := lexer.New("dummy", strings.NewReader(inputString)) // lexer - required to break a sentence up into words.
	pt := pos.New(pos.WithModel(posModel))                   // POS Tagger - required to tag the words with a part of speech tag.
	dp := dep.New(depModel)                                  // Creates a new parser

	// set up a pipeline
	pt.Input = lx.Output
	dp.Input = pt.Output

	// run all
	go lx.Run()
	go pt.Run()
	go dp.Run()

	// wait to receive:
	for {
		select {
		case d := <- dp.Output:
			// do something
		case err:= <-dp.Error:
			// handle error
		}
	}

}

```



# How It Works #
For specific tasks (POS tagging, parsing, named entity recognition etc), refer to the README of each subpackage. This package on its own mainly provides the data structures that the subpackages will use.

Perhaps the most important data structure is the `*Annotation` structure. It basically holds a word and the associated metadata for the word.

For dependency parses, the graph takes three forms: `*Dependency`, `*DependencyTree` and `*Annotation`. All three forms are convertable from one to another. TODO: explain rationale behind each data type.

## Quirks ##

### Very Oddly Specific POS Tags and Dependency Rel Types ###

A particular quirk you may have noticed is that the `POSTag` and `DependencyType` are hard coded in as constants. This package does in fact provide two variations of each: one from Stanford/Penn Treebank and one from [UniversalDependencies](http://universaldependencies.org/).

The main reason for hardcoding these are mainly for performance reasons - knowing ahead how much to allocate reduces a lot of additional work the program has to do. It also reduces the chances of mutating a global variable.

Of course this comes as a tradeoff - programs are limited to these two options. Thankfully there are only a limited number of POS Tag and Dependency Relation types. Two of the most popular ones (Stanford/PTB and Universal Dependencies) have been implemented.

The following build tags are supported:

* stanfordtags
* universaltags
* stanfordrel
* universalrel

To use a specific tagset or relset, build your program thusly: `go build -tags='stanfordtags'`.

The default tag and dependency rel types are the universal dependencies version.

### Lexer ###

You should also note that the tokenizer, `lingo/lexer` is not your usual run-of-the-mill NLP tokenizer. It's a tokenizer that tokenizes by space, with some specific rules for English. It was inspired by Rob Pike's talk on lexers. I thought it'd be cool to write something like that for NLP.

The test cases in package `lingo/lexer` showcases how it handles unicode, and other pathalogical english.

# Contributing #
see CONTRIBUTING.md for more info

# Licence #

This package is licenced under the MIT licence.


================================================
FILE: annotation.go
================================================
package lingo

import (
	"errors"
	"fmt"
	"strings"
)

// Annotation is the word and it's metadata.
// This includes the position, its dependency head (if available), its lemma, POSTag, etc
//
// A collection of Annoations - AnnotatedSentence is also a representation of a dependency parse
//
// Every field is exported for easy gobbing. be very careful with setting stuff
type Annotation struct {
	Lexeme
	POSTag
	// NER

	// fields to do with an annotation being in a collection
	DependencyType
	ID       int
	Head     *Annotation
	children AnnotationSet //will not be serialized

	// info about the annotation itself
	Lemma   string
	Lowered string
	Stem    string

	// auxiliary data for processing
	Cluster
	Shape
	WordFlag
}

func NewAnnotation() *Annotation {
	return &Annotation{
		Lexeme: nullLexeme,
		Lemma:  "",
		Shape:  Shape(""),
	}
}

// AnnotationFromLexTag is only ever used in tests. Fixer is optional
func AnnotationFromLexTag(l Lexeme, t POSTag, f AnnotationFixer) *Annotation {
	a := &Annotation{
		Lexeme:         l,
		POSTag:         t,
		DependencyType: NoDepType,
		Lemma:          "",
		Lowered:        strings.ToLower(l.Value),
	}

	// it's ok to panic - it will cause the tests to fail
	if err := a.Process(f); err != nil {
		panic(err)
	}

	return a
}

func (a *Annotation) Clone() *Annotation {
	b := *a
	b.ID = -1
	b.Head = nil
	b.children = nil
	b.DependencyType = NoDepType

	return &b
}

func (a *Annotation) SetHead(headAnn *Annotation) {
	a.Head = headAnn
	if headAnn != rootAnnotation && headAnn != startAnnotation && headAnn != nullAnnotation {
		headAnn.children = append(headAnn.children, a)
	}
}

func (a *Annotation) HeadID() int {
	if a.Head != nil {
		return a.Head.ID
	}
	return -1
}

func (a *Annotation) IsNumber() bool {
	return IsNumber(a.POSTag) && (a.LexemeType != Date && a.LexemeType != Time && a.LexemeType != URI)
}

func (a *Annotation) String() string {
	return a.Value
}

func (a *Annotation) GoString() string {
	s := fmt.Sprintf("%q/%s", a.Lexeme.Value, a.POSTag)

	if a.Head != nil {
		return fmt.Sprintf("(%v) <-%v- (%q/%s) ", s, a.DependencyType, a.Head.Value, a.Head.POSTag)
	}
	return s
}

func (a *Annotation) Process(f AnnotationFixer) error {
	if a.Lexeme != nullLexeme {
		a.Lowered = strings.ToLower(a.Value)
		a.Shape = a.Lexeme.Shape()
		a.WordFlag = a.Lexeme.Flags()

		var err error
		if f != nil {
			var stem string
			if stem, err = f.Stem(a.Lowered); err != nil {
				if _, ok := err.(componentUnavailable); !ok {
					return err
				}
			}
			a.Stem = stem

			var clust map[string]Cluster
			if clust, err = f.Clusters(); err == nil {
				a.Cluster = clust[a.Value]
			}
		}

		return nil
	}
	return errors.New("No Lexeme!")
}

var rootAnnotation = &Annotation{
	Lexeme:         rootLexeme,
	POSTag:         ROOT_TAG,
	DependencyType: Root,
	ID:             0,
	Head:           nil,
	Lemma:          "",
	Lowered:        "",
	Cluster:        0,
	Shape:          "",
	WordFlag:       NoFlag,
}

var startAnnotation = &Annotation{
	Lexeme:         startLexeme,
	POSTag:         ROOT_TAG,
	DependencyType: NoDepType,
	ID:             -1,
	Head:           nil,
	Lemma:          "",
	Lowered:        "",
	Cluster:        0,
	Shape:          "",
	WordFlag:       NoFlag,
}

var nullAnnotation = &Annotation{
	Lexeme:         nullLexeme,
	POSTag:         X,
	DependencyType: NoDepType,
	ID:             -1,
	Head:           nil,
	Lemma:          "",
	Lowered:        "",
	Cluster:        0,
	Shape:          "",
	WordFlag:       NoFlag,
}

func RootAnnotation() *Annotation  { return rootAnnotation }
func StartAnnotation() *Annotation { return startAnnotation }
func NullAnnotation() *Annotation  { return nullAnnotation }

func StringToAnnotation(s string, f AnnotationFixer) *Annotation {
	l := MakeLexeme(s, Word)
	a := NewAnnotation()
	a.Lexeme = l
	if err := a.Process(f); err != nil {
		panic(err.Error())
	}
	return a
}

type AnnotationFixer interface {
	Lemmatizer
	Stemmer
	Clusters() (map[string]Cluster, error)
}


================================================
FILE: annotationSet.go
================================================
package lingo

import (
	"sort"
	"unsafe"

	"github.com/xtgo/set"
)

type AnnotationSet []*Annotation

func (as AnnotationSet) Len() int      { return len(as) }
func (as AnnotationSet) Swap(i, j int) { as[i], as[j] = as[j], as[i] }
func (as AnnotationSet) Less(i, j int) bool {
	return uintptr(unsafe.Pointer(as[i])) < uintptr(unsafe.Pointer(as[j]))
}

func (as AnnotationSet) Set() AnnotationSet {
	sort.Sort(as)
	n := set.Uniq(as)
	return as[:n]
}

func (as AnnotationSet) Contains(a *Annotation) bool {
	if as.Index(a) == len(as) {
		return false
	}
	return true
}

func (as AnnotationSet) Index(a *Annotation) int {
	for i, an := range as {
		if an == a {
			return i
		}
	}
	return len(as)
}

func (as AnnotationSet) Add(a *Annotation) AnnotationSet {
	if as.Contains(a) {
		return as
	}
	as = append(as, a)
	return as
}


================================================
FILE: annotationSet_bench_test.go
================================================
package lingo

import (
	"sort"
	"testing"
)

func (as AnnotationSet) index2(a *Annotation) int {
	sort.Sort(as)
	f := func(i int) bool { return as[i] == a }
	return sort.Search(len(as), f)
}

var benchIndexRes int

func benchASIndex(size int, b *testing.B) {
	var as AnnotationSet
	for i := 0; i < size; i++ {
		as = append(as, new(Annotation))
	}

	doesntcontain := new(Annotation)
	contains := as[0]

	for n := 0; n < b.N; n++ {
		benchIndexRes = as.Index(doesntcontain)
		benchIndexRes = as.Index(contains)
	}
}

func benchASIndex2(size int, b *testing.B) {
	var as AnnotationSet
	for i := 0; i < size; i++ {
		as = append(as, new(Annotation))
	}

	doesntcontain := new(Annotation)
	contains := as[0]

	for n := 0; n < b.N; n++ {
		benchIndexRes = as.index2(doesntcontain)
		benchIndexRes = as.index2(contains)
	}
}

func BenchmarkAnnotationSetIndex_1(b *testing.B)    { benchASIndex(1, b) }
func BenchmarkAnnotationSetIndex_2(b *testing.B)    { benchASIndex(2, b) }
func BenchmarkAnnotationSetIndex_8(b *testing.B)    { benchASIndex(8, b) }
func BenchmarkAnnotationSetIndex_16(b *testing.B)   { benchASIndex(16, b) }
func BenchmarkAnnotationSetIndex_32(b *testing.B)   { benchASIndex(32, b) }
func BenchmarkAnnotationSetIndex_64(b *testing.B)   { benchASIndex(64, b) }
func BenchmarkAnnotationSetIndex_128(b *testing.B)  { benchASIndex(128, b) }
func BenchmarkAnnotationSetIndex_256(b *testing.B)  { benchASIndex(256, b) }
func BenchmarkAnnotationSetIndex_512(b *testing.B)  { benchASIndex(512, b) }
func BenchmarkAnnotationSetIndex_1024(b *testing.B) { benchASIndex(1024, b) }

func BenchmarkAnnotationSetIndex2_1(b *testing.B)    { benchASIndex2(1, b) }
func BenchmarkAnnotationSetIndex2_2(b *testing.B)    { benchASIndex2(2, b) }
func BenchmarkAnnotationSetIndex2_8(b *testing.B)    { benchASIndex2(8, b) }
func BenchmarkAnnotationSetIndex2_16(b *testing.B)   { benchASIndex2(16, b) }
func BenchmarkAnnotationSetIndex2_32(b *testing.B)   { benchASIndex2(32, b) }
func BenchmarkAnnotationSetIndex2_64(b *testing.B)   { benchASIndex2(64, b) }
func BenchmarkAnnotationSetIndex2_128(b *testing.B)  { benchASIndex2(128, b) }
func BenchmarkAnnotationSetIndex2_256(b *testing.B)  { benchASIndex2(256, b) }
func BenchmarkAnnotationSetIndex2_512(b *testing.B)  { benchASIndex2(512, b) }
func BenchmarkAnnotationSetIndex2_1024(b *testing.B) { benchASIndex2(1024, b) }


================================================
FILE: browncluster.go
================================================
package lingo

import (
	"bufio"
	"io"
	"strconv"
	"strings"
)

// this file provides IO support and type safety for brown clusters.
// The creation of brownclusters is not done here.
// Right now lingo does not generate clusters - use PercyLiang's excellent tool for that

// Cluster represents a brown cluster
type Cluster int

// ReadCluster reads PercyLiang's cluster file format and returns a map of strings to Cluster
func ReadCluster(r io.Reader) map[string]Cluster {
	scanner := bufio.NewScanner(r)
	clusters := make(map[string]Cluster)

	for scanner.Scan() {
		line := scanner.Text()

		splits := strings.Split(line, "\t")
		var word string
		var cluster, freq int

		word = splits[1]

		var i64 int64
		var err error
		if i64, err = strconv.ParseInt(splits[0], 2, 64); err != nil {
			panic(err)
		}
		cluster = int(i64)

		if freq, err = strconv.Atoi(splits[2]); err != nil {
			panic(err)
		}

		// if clusterer has only seen a word a few times, then the cluster is not reliable
		if freq >= 3 {
			clusters[word] = Cluster(cluster)
		} else {
			clusters[word] = Cluster(0)
		}
	}

	// expand clusters with recasing
	for word, clust := range clusters {
		lowered := strings.ToLower(word)
		if _, ok := clusters[lowered]; !ok {
			clusters[lowered] = clust
		}

		titled := strings.ToTitle(word)
		if _, ok := clusters[titled]; !ok {
			clusters[titled] = clust
		}

		uppered := strings.ToUpper(word)
		if _, ok := clusters[uppered]; !ok {
			clusters[uppered] = clust
		}
	}

	return clusters
}


================================================
FILE: cmd/demo/io.go
================================================
package main

import (
	"log"
	"os"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/dep"
	"github.com/chewxy/lingo/pos"
)

const (
	posModelFile = `model/pos_stanfordtags_universalrel.final.model`
	depModelFile = `model/dep_stanfordtags_universalrel.final.model`
	brownCluster = `clusters.txt`
)

func io() {
	var err error
	log.Println("loading POS Tagger model")
	if posModel, err = pos.Load(posModelFile); err != nil {
		log.Fatal(err)
	}

	log.Println("loading Dependency Parser model")
	if depModel, err = dep.Load(depModelFile); err != nil {
		log.Fatal(err)
	}
	var f *os.File
	if f, err = os.Open(brownCluster); err != nil {
		log.Fatal(err)
	}
	clusters = lingo.ReadCluster(f)
}


================================================
FILE: cmd/demo/main.go
================================================
package main

import (
	"io/ioutil"
	"os"
	"os/exec"

	"github.com/abiosoft/ishell"
	"github.com/chewxy/lingo"
	"github.com/pkg/browser"
)

func main() {
	io()
	shell := ishell.New()

	var d *lingo.Dependency
	// var sent lingo.AnnotatedSentence
	var err error
	shell.AddCmd(&ishell.Cmd{
		Name: "dep",
		Help: "perform dependency parsing",
		Func: func(c *ishell.Context) {
			c.ShowPrompt(false)
			defer c.ShowPrompt(true)

			c.Print("Query: ")
			query := c.ReadLine()

			if d, err = pipeline(query); err != nil {
				c.Printf("Error: %v", err)
			}

			c.Printf("%v\n", d)
		},
	})

	shell.AddCmd(&ishell.Cmd{
		Name: "show",
		Help: "show dependency parse on browser",
		Func: func(c *ishell.Context) {
			var tmp *os.File
			if tmp, err = ioutil.TempFile("", "dep"); err != nil {
				c.Printf("Cannot open file %v\n", err)
				return
			}
			defer os.Remove(tmp.Name())

			c.Printf("%v\n", tmp.Name())

			dot := d.Tree().Dot()
			tmp.Write([]byte(dot))
			if err := tmp.Close(); err != nil {
				c.Printf("Error closing file %v", err)
			}
			cmd := exec.Command("dot", "-Tpng", "-O", tmp.Name())
			if err = cmd.Run(); err != nil {
				c.Printf("Cannot execute dot: %v\n", err)
			}

			browser.OpenFile(tmp.Name() + ".png")

		},
	})
	shell.Start()
}


================================================
FILE: cmd/demo/nlp.go
================================================
package main

import (
	"fmt"
	"strings"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/dep"
	"github.com/chewxy/lingo/lexer"
	"github.com/chewxy/lingo/pos"
	"github.com/kljensen/snowball"
	"github.com/pkg/errors"
)

var posModel *pos.Model
var depModel *dep.Model

var clusters map[string]lingo.Cluster

type stemmer struct{}

func (stemmer) Stem(a string) (string, error) {
	return snowball.Stem(a, "english", true)
}

type fixer struct {
	stemmer
}

func (f fixer) Clusters() (map[string]lingo.Cluster, error) { return clusters, nil }
func (f fixer) Lemmatize(a string, pt lingo.POSTag) ([]string, error) {
	return nil, nocomp("lemmatizer")
}

type nocomp string

func (e nocomp) Error() string     { return fmt.Sprintf("no %v", string(e)) }
func (e nocomp) Component() string { return string(e) }

func pipeline(s string) (d *lingo.Dependency, err error) {
	if posModel == nil || depModel == nil {
		return nil, errors.Errorf("Unable to create a pipeline")
	}
	lx := lexer.New(s, strings.NewReader(s))
	pt := pos.New(pos.WithModel(posModel), pos.WithStemmer(stemmer{}))
	dp := dep.New(depModel)

	// pipeline
	pt.Input = lx.Output
	dp.Input = pt.Output

	go lx.Run()
	go pt.Run()
	go dp.Run()

	var ok bool
	for {
		select {
		case d, ok = <-dp.Output:
			if !ok {
				continue
			}
			return
		case err = <-dp.Error:
			return
		}
	}
}


================================================
FILE: cmd/dep/fixer.go
================================================
package main

import (
	"fmt"

	"github.com/chewxy/lingo"
	"github.com/kljensen/snowball"
)

type stemmer struct{}

func (stemmer) Stem(a string) (string, error) {
	return snowball.Stem(a, "english", true)
}

type fixer struct {
	stemmer
}

func (f fixer) Clusters() (map[string]lingo.Cluster, error) { return clusters, nil }
func (f fixer) Lemmatize(a string, pt lingo.POSTag) ([]string, error) {
	return nil, nocomp("lemmatizer")
}

type nocomp string

func (e nocomp) Error() string     { return fmt.Sprintf("no %v", string(e)) }
func (e nocomp) Component() string { return string(e) }


================================================
FILE: cmd/dep/io.go
================================================
package main

import (
	"log"

	"github.com/chewxy/lingo/dep"
	"github.com/chewxy/lingo/pos"
	"github.com/chewxy/lingo/treebank"
)

func validateFlags() {
	if *load == "" && *trainFile == "" {
		log.Fatal("Must either load a model or pass in a training file")
	}

	if *epoch < 0 {
		log.Fatal("epochs must only be positive numbers")
	}

	if *load != "" {
		toLoad = true
	}

	if *trainFile != "" {
		toTrain = true
	}

	if *testFile != "" {
		*cv = true
	}

	// warnings
	if *load == "" && *save == "" {
		log.Println("WARNING: Models that have been trained will NOT be saved")
	}
}

func loadTreebanks() {
	if *trainFile != "" {
		trainTB = treebank.LoadUniversal(*trainFile)
	}

	if *testFile != "" {
		testTB = treebank.LoadUniversal(*testFile)
	}
}

func loadPOSModel() {
	var err error
	if *loadPOS == "" {
		log.Fatal("Cannot proceed without having a POS model")
	}
	if POSModel, err = pos.Load(*loadPOS); err != nil {
		log.Fatal(err)
	}
}

func loadDepModel() {
	var err error

	if DepModel, err = dep.Load(*load); err != nil {
		log.Fatal(err)
	}
}

func saveModel() {
	if *save != "" && DepModel != nil {
		DepModel.Save(*save)
	}
}


================================================
FILE: cmd/dep/main.go
================================================
package main

import (
	"flag"
	"log"
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/dep"
	"github.com/chewxy/lingo/pos"
)

var save = flag.String("save", "", "save as...")
var load = flag.String("load", "", "load a model")
var loadPOS = flag.String("PTmodel", "", "load a POS Tagger model")
var clusterFiles = flag.String("cluster", "", "Brown Cluster files. If nothing is passed in, then the brown cluster won't be used")
var trainFile = flag.String("train", "", "Training on... (Only CONLLU formatted training files are accepted)")
var testFile = flag.String("test", "", "Test on... (Only CONLLU formatted training files are accepted). If this is not provided, the model will be trained without crossvalidation")
var cv = flag.Bool("cv", false, "Cross validate training model? Defaults to false.")
var epoch = flag.Int("epoch", 10, "Training epochs. Defaults to 10")
var format = flag.String("f", "", "Format to output. Default is none. Accepts: {json, dot}")

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
var memprofile = flag.String("memprofile", "", "write memory profile to this file")

var clusters map[string]lingo.Cluster
var POSModel *pos.Model
var DepModel *dep.Model
var toLoad, toTrain bool

func init() {
	if lingo.BUILD_TAGSET != "stanfordtags" && lingo.BUILD_TAGSET != "universaltags" {
		log.Fatalf("Tagset %q unsupported", lingo.BUILD_TAGSET)
	}

	if lingo.BUILD_RELSET != "stanfordrel" && lingo.BUILD_RELSET != "universalrel" {
		log.Fatalf("Relset %q unsupported", lingo.BUILD_RELSET)
	}
}

func cleanup(sigChan chan os.Signal, cpuprofiling, memprofiling bool) {
	select {
	case <-sigChan:
		log.Println("EMERGENCY EXIT")
		if cpuprofiling {
			pprof.StopCPUProfile()

		}
		if memprofiling {
			f, err := os.Create(*memprofile)
			if err != nil {
				log.Fatal(err)
			}
			pprof.WriteHeapProfile(f)
			f.Close()
		}
		saveModel()
		os.Exit(1)
	}
}

func main() {
	flag.Parse()
	validateFlags()

	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
	var cpuprofiling, memprofiling bool
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		cpuprofiling = true
		pprof.StartCPUProfile(f)
		defer pprof.StopCPUProfile()
	}

	if *memprofile != "" {
		memprofiling = true
	}

	go cleanup(sigChan, cpuprofiling, memprofiling)

	loadPOSModel()
	if toLoad {
		loadDepModel()
	}

	if toTrain {
		loadTreebanks()
		train()
	}

	saveModel()
}


================================================
FILE: cmd/dep/pipeline.go
================================================
package main

import (
	"encoding/json"
	"fmt"
	"strings"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/dep"
	"github.com/chewxy/lingo/lexer"
	"github.com/chewxy/lingo/pos"
)

func receive(deps chan *lingo.Dependency, errs, errChan chan error) {
	defer close(errChan)
	for {
		select {
		case dep, ok := <-deps:
			if !ok {
				continue
			}
			switch *format {
			case "json":
				bs, _ := json.MarshalIndent(dep, "", "\t")
				fmt.Printf("%s\n", string(bs))
			case "dot":
				fmt.Printf("%v\n", dep.Tree().Dot())
			}

		case err := <-errs:
			errChan <- err
		}
	}
}

func pipeline(s string) error {
	lx := lexer.New(s, strings.NewReader(s))
	pt := pos.New(pos.WithModel(POSModel))
	dp := dep.New(DepModel)

	pt.Input = lx.Output
	dp.Input = pt.Output

	errChan := make(chan error)
	go lx.Run()
	go pt.Run()
	go receive(dp.Output, dp.Error, errChan)
	dp.Run()

	return <-errChan
}


================================================
FILE: cmd/dep/train.go
================================================
package main

import (
	"log"

	"github.com/chewxy/lingo/dep"
	"github.com/chewxy/lingo/treebank"
	"gorgonia.org/tensor"
)

var trainTB []treebank.SentenceTag
var testTB []treebank.SentenceTag

func train() {
	conf := dep.DefaultNNConfig
	conf.Dtype = tensor.Float32
	var trainer *dep.Trainer

	if testTB != nil {
		log.Printf("TRAINING WITH CROSSVALIDATION")
		trainer = dep.NewTrainer(dep.WithGeneratedCorpus(trainTB...), dep.WithTrainingSet(trainTB), dep.WithCrossValidationSet(testTB), dep.WithConfig(conf))
		trainer.SaveBest = "TMP.model"
		if err := trainer.Init(); err != nil {
			log.Fatalf("Unable to initialize trainer: \n%+v", err)
		}

		prog := trainer.Perf()
		cost := trainer.Cost()
		go func() {
			for {
				select {
				case p := <-prog:
					log.Printf("%v\n", p)
				case c := <-cost:
					log.Printf("Cost %v\n", c)
				}
			}
		}()

	} else {
		trainer = dep.NewTrainer(dep.WithGeneratedCorpus(trainTB...), dep.WithTrainingSet(trainTB), dep.WithConfig(conf))
		if err := trainer.Init(); err != nil {
			log.Fatalf("Unable to initialize trainer: \n%+v", err)
		}

		prog := trainer.Cost()
		go func() {
			for cost := range prog {
				log.Printf("Cost %v\n", cost)
			}
		}()
	}

	if err := trainer.Train(*epoch); err != nil {
		log.Fatal(err)
	}

	DepModel = trainer.Model
}


================================================
FILE: cmd/lexer/main.go
================================================
package main

import (
	"flag"
	"fmt"
	"strings"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/lexer"
)

var input = flag.String("input", "", "input string to lex")
var output = make(chan lingo.Lexeme)

func receieve() {
	for l := range output {
		fmt.Printf("%v\n", l)
	}
}

func main() {
	flag.Parse()

	s := *input

	go receieve()
	l := lexer.New(s, strings.NewReader(s))
	l.Output = output
	l.Run()
}


================================================
FILE: cmd/pos/crossvalidation.go
================================================
package main

import (
	"bytes"
	"fmt"
	"log"
	"os"
	"strings"
	"sync"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/lexer"
	"github.com/chewxy/lingo/pos"
	"github.com/chewxy/lingo/treebank"
)

type testResult struct {
	tagged lingo.AnnotatedSentence
	actual lingo.AnnotatedSentence
}

func (tr testResult) compare() (int, bool) {
	tagged := tr.tagged
	actual := tr.actual

	var sameLength bool = true

	if len(tagged) != len(actual) {
		sameLength = false
	}

	var counter int
	for i, v := range actual {
		if i >= len(tagged) {
			break
		}
		if v.POSTag == tagged[i].POSTag {
			counter++
		}
	}
	return counter, sameLength
}

func crossValidate(resultChan chan testResult) {
	diffLengthCount := 0
	totalLength := 0
	correctCount := 0
	sentences := 0

	var wrongResults []testResult

	for res := range resultChan {
		sentences++
		length := len(res.actual)
		cc, sl := res.compare()
		if !sl {
			diffLengthCount++
		}
		correctCount += cc
		totalLength += length

		if cc != length && *inspect != "" {
			wrongResults = append(wrongResults, res)
		}
	}

	if *inspect != "" {
		f, err := os.OpenFile(*inspect, os.O_WRONLY|os.O_CREATE, 0666)
		if err != nil {
			log.Fatal(err)
		}

		// can write directly to f
		var buf bytes.Buffer
		for _, res := range wrongResults {
			fmt.Fprintf(&buf, "Sentence: \nW:%v\nG:%v\nTags:\nW: %v\nG: %v\n\n", res.actual.StringSlice(), res.tagged.StringSlice(), res.actual.Tags(), res.tagged.Tags())
		}

		f.WriteString(buf.String())
		f.Close()
	}

	fmt.Printf("CrossValidation: %d/%d = %f. Differing Lengths : %d/%d = %f\n", correctCount, totalLength, float64(correctCount)/float64(totalLength), diffLengthCount, sentences, float64(diffLengthCount)/float64(sentences))
}

func collect(ch chan lingo.AnnotatedSentence, correct lingo.AnnotatedSentence, outCh chan testResult, wg *sync.WaitGroup) {
	defer wg.Done()

	for sentence := range ch {
		outCh <- testResult{sentence, correct}
	}
}

func testModel(sentences []treebank.SentenceTag) {
	resultChan := make(chan testResult)

	go func() {
		defer close(resultChan)
		var wg sync.WaitGroup
		for _, sentence := range sentences {
			wg.Add(1)
			input := sentence.String()
			correct := sentence.AnnotatedSentence(fixer{stemmer{}})
			ch := make(chan lingo.AnnotatedSentence)
			go collect(ch, correct, resultChan, &wg)
			go cvpipeline(input, ch)
		}
		wg.Wait()
	}()

	crossValidate(resultChan)

}

func cvpipeline(s string, output chan lingo.AnnotatedSentence) {
	l := lexer.New(s, strings.NewReader(s))
	pt := pos.New(pos.WithModel(model))

	pt.Input = l.Output
	pt.Output = output

	go l.Run()
	pt.Run()
}


================================================
FILE: cmd/pos/fixer.go
================================================
// +build !chewxy

package main

import (
	"fmt"

	"github.com/chewxy/lingo"
	"github.com/kljensen/snowball"
)

type stemmer struct{}

func (stemmer) Stem(a string) (string, error) {
	return snowball.Stem(a, "english", true)
}

type fixer struct {
	stemmer
}

func (f fixer) Clusters() (map[string]lingo.Cluster, error) { return clusters, nil }
func (f fixer) Lemmatize(a string, pt lingo.POSTag) ([]string, error) {
	return nil, nocomp("lemmatizer")
}

type nocomp string

func (e nocomp) Error() string     { return fmt.Sprintf("no %v", string(e)) }
func (e nocomp) Component() string { return string(e) }


================================================
FILE: cmd/pos/main.go
================================================
package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"os/signal"
	"runtime/pprof"
	"strings"
	"sync"
	"syscall"
	"time"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/lexer"
	"github.com/chewxy/lingo/pos"
	"github.com/chewxy/lingo/treebank"
)

var save = flag.String("save", "", "save as...")
var load = flag.String("load", "", "load a model")
var clusterFiles = flag.String("cluster", "", "Brown Cluster files. If nothing is passed in, then the brown cluster won't be used")
var trainFile = flag.String("train", "", "Training on... files that end with '.conllu' will be treated as CONLLU formatted files. Files ending with '.zip' will be treted as EWT files")
var testFile = flag.String("test", "", "Test on... Files to cross validate the model on. If this is provided, automatic crossvalidation will be done")
var cv = flag.Bool("cv", false, "Cross validate training model? Defaults to false.")
var epoch = flag.Int("epoch", 1500, "Training epochs. Defaults to 1500")
var inspect = flag.String("inpect", "", "Inspect all the wrong outputs to figure out what went wrong in the POSTagging. This is useful for debugging")
var input = flag.String("input", "", "Input sentence to tag")

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
var memprofile = flag.String("memprofile", "", "write memory profile to this file")

var clusters map[string]lingo.Cluster
var model *pos.Model

func receive(sentences chan lingo.AnnotatedSentence, wg *sync.WaitGroup) {
	defer wg.Done()
	for sent := range sentences {
		for _, a := range sent {
			fmt.Printf("%#v: %s| %s | %s | %d\n", a, a.POSTag, a.Lemma, a.WordFlag, a.Cluster)
		}
	}
}

func pipeline(s string) {
	l := lexer.New(s, strings.NewReader(s))
	pt := pos.New(pos.WithModel(model))

	pt.Input = l.Output
	var wg sync.WaitGroup

	go l.Run()
	go receive(pt.Output, &wg)

	wg.Add(1)

	pt.Run()
	wg.Wait()
}

func validateFlags() {
	if *load == "" && *trainFile == "" {
		log.Fatal("Must either load a model or pass in a training file")
	}

	if *epoch < 0 {
		log.Fatal("epochs must be positive numbers only!")
	}

	if *testFile != "" {
		*cv = true
	}

	// warnings

	if *load == "" && *save == "" {
		log.Println("WARNING: Models that are trained will NOT be saved")
	}
}

func loadOrTrain() {
	var trained *pos.Tagger
	if *clusterFiles != "" {
		f, err := os.Open(*clusterFiles)
		if err != nil {
			log.Fatal(err)
		}
		clusters = lingo.ReadCluster(f)

		trained = pos.New(pos.WithCluster(clusters), pos.WithStemmer(stemmer{}))
	} else {
		trained = pos.New()
	}

	if *load != "" {
		start := time.Now()
		var err error
		if model, err = pos.Load(*load); err != nil {
			log.Fatal(err)
		}
		log.Printf("Loading model from %q took %v", *load, time.Since(start))
		return
	}

	var sentences []treebank.SentenceTag
	switch {
	case strings.HasSuffix(*trainFile, ".zip"):
		sentences = treebank.LoadEWT(*trainFile)

		// TODO split sentences for crossvalidation

	case strings.HasSuffix(*trainFile, ".conllu"):
		sentences = treebank.LoadUniversal(*trainFile)
	default:
		f, err := os.Open(*trainFile)
		if err != nil {
			log.Fatal(err)
		}

		sentences = treebank.ReadConllu(f)
	}

	log.Printf("Start training for %d epochs...", *epoch)
	start := time.Now()
	trained.Train(sentences, *epoch)
	log.Printf("End Training. Training took %v minutes", time.Since(start).Minutes())

	if *save != "" {
		trained.Save(*save)
		log.Printf("Model saved as: %v", *save)
	}
}

func cleanup(sigChan chan os.Signal, profiling bool) {
	select {
	case <-sigChan:
		log.Println("EMERGENCY EXIT")
		if profiling {
			pprof.StopCPUProfile()
		}
		os.Exit(1)
	}
}

func main() {
	flag.Parse()

	if lingo.BUILD_TAGSET != "stanfordtags" && lingo.BUILD_TAGSET != "universaltags" {
		log.Fatalf("Tagset: %v is unsupported", lingo.BUILD_TAGSET)
	}

	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)

	var profiling bool
	if *cpuprofile != "" {
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		profiling = true
		pprof.StartCPUProfile(f)
		defer pprof.StopCPUProfile()
	}

	go cleanup(sigChan, profiling)

	validateFlags()
	loadOrTrain()

	if *memprofile != "" {
		f, err := os.Create(*memprofile)
		if err != nil {
			log.Fatal(err)
		}
		pprof.WriteHeapProfile(f)
		f.Close()
	}

	if *input != "" {
		pipeline(*input)
	}

	if *cv {
		log.Printf("Cross Validating now")
		testSentences := treebank.LoadUniversal(*testFile)
		testModel(testSentences)
	}

}


================================================
FILE: const.go
================================================
package lingo

// constants that are not pertaining to build tags

var empty struct{}

// NumberWords was generated with this python code
/*
	numberWords = {}

	simple = '''zero one two three four five six seven eight nine ten eleven twelve
	        thirteen fourteen fifteen sixteen seventeen eighteen nineteen
	        twenty'''.split()
	for i, word in zip(xrange(0, 20+1), simple):
	    numberWords[word] = i

	tense = '''thirty forty fifty sixty seventy eighty ninety hundred'''.split()
	for i, word in zip(xrange(30, 100+1, 10), tense):
		numberWords[word] = i

	larges = '''thousand million billion trillion quadrillion quintillion sextillion septillion'''.split()
	for i, word in zip(xrange(3, 24+1, 3), larges):
		numberWords[word] = 10**i
*/
var NumberWords = map[string]int{
	"zero":        0,
	"one":         1,
	"two":         2,
	"three":       3,
	"four":        4,
	"five":        5,
	"six":         6,
	"seven":       7,
	"eight":       8,
	"nine":        9,
	"ten":         10,
	"eleven":      11,
	"twelve":      12,
	"thirteen":    13,
	"fourteen":    14,
	"fifteen":     15,
	"sixteen":     16,
	"nineteen":    19,
	"seventeen":   17,
	"eighteen":    18,
	"twenty":      20,
	"thirty":      30,
	"forty":       40,
	"fifty":       50,
	"sixty":       60,
	"seventy":     70,
	"eighty":      80,
	"ninety":      90,
	"hundred":     100,
	"thousand":    1000,
	"million":     1000000,
	"billion":     1000000000,
	"trillion":    1000000000000,
	"quadrillion": 1000000000000000,
	// "quintillion": 1000000000000000000,
	// "sextillion": 1000000000000000000000,
	// "septillion": 1000000000000000000000000,
}


================================================
FILE: corpus/consopt.go
================================================
package corpus

import (
	"log"
	"sort"
	"sync/atomic"
	"unicode/utf8"

	"github.com/pkg/errors"
	"github.com/xtgo/set"
)

// ConsOpt is a construction option for manual creation of a Corpus
type ConsOpt func(c *Corpus) error

// WithWords creates a corpus from a word list. It may have repeated words
func WithWords(a []string) ConsOpt {
	f := func(c *Corpus) error {
		s := set.Strings(a)
		c.words = s
		c.frequencies = make([]int, len(s))

		ids := make(map[string]int)
		maxID := len(s)

		var totalFreq, maxWL int
		// NOTE: here we're iterating over the set of words
		for i, w := range s {
			runeCount := utf8.RuneCountInString(w)
			if runeCount > c.maxWordLength {
				maxWL = runeCount
			}

			ids[w] = i
		}

		// NOTE: here we're iterating over the original word list.
		for _, w := range a {
			c.frequencies[ids[w]]++
			totalFreq++
		}

		c.ids = ids
		atomic.AddInt64(&c.maxid, int64(maxID))
		c.totalFreq = totalFreq
		c.maxWordLength = maxWL
		return nil
	}
	return f
}

// WithOrderedWords creates a Corpus with the given word order
func WithOrderedWords(a []string) ConsOpt {
	f := func(c *Corpus) error {
		s := a
		c.words = s
		c.frequencies = make([]int, len(s))
		for i := range c.frequencies {
			c.frequencies[i] = 1
		}

		ids := make(map[string]int)
		maxID := len(s)
		totalFreq := len(s)
		var maxWL int
		for i, w := range a {
			runeCount := utf8.RuneCountInString(w)
			if runeCount > c.maxWordLength {
				maxWL = runeCount
			}
			ids[w] = i
		}

		c.ids = ids
		atomic.AddInt64(&c.maxid, int64(maxID))
		c.totalFreq = totalFreq
		c.maxWordLength = maxWL
		return nil
	}
	return f
}

// WithSize preallocates all the things in Corpus
func WithSize(size int) ConsOpt {
	return func(c *Corpus) error {
		c.words = make([]string, 0, size)
		c.frequencies = make([]int, 0, size)
		return nil
	}
}

// FromDict is a construction option to take a map[string]int where the int represents the word ID.
// This is useful for constructing corpuses from foreign sources where the ID mappings are important
func FromDict(d map[string]int) ConsOpt {
	return func(c *Corpus) error {
		var a sortutil
		for k, v := range d {
			a.words = append(a.words, k)
			a.ids = append(a.ids, v)
		}
		sort.Sort(&a)
		c.ids = make(map[string]int)
		for i, w := range a.words {
			if i != a.ids[i] {
				return errors.Errorf("Unmarshaling error. Expected %dth ID to be %d. Got %d instead. Perhaps something went wrong during sorting? SLYTHERIN IT IS!", i, i, a.ids[i])
			}
			c.words = append(c.words, w)
			c.frequencies = append(c.frequencies, 1)
			c.ids[w] = i

			c.totalFreq++
			runeCount := utf8.RuneCountInString(w)
			if runeCount > c.maxWordLength {
				log.Printf("FD MaxWordLength %d - %q", runeCount, w)
				c.maxWordLength = runeCount
			}
		}
		c.maxid = int64(len(a.words))
		return nil
	}

}

// FromDictWithFreq is like FromDict, but also has a frequency.
func FromDictWithFreq(d map[string]struct{ ID, Freq int }) ConsOpt {
	return func(c *Corpus) error {
		var a sortutil
		for k, v := range d {
			a.words = append(a.words, k)
			a.ids = append(a.ids, v.ID)
			a.freqs = append(a.freqs, v.Freq)
		}
		sort.Sort(&a)
		c.ids = make(map[string]int)
		for i, w := range a.words {
			if i != a.ids[i] {
				return errors.Errorf("Unmarshaling error. Expected %dth ID to be %d. Got %d instead. Perhaps something went wrong during sorting? SLYTHERIN IT IS!", i, i, a.ids[i])
			}
			c.words = append(c.words, w)
			c.frequencies = append(c.frequencies, a.freqs[i])
			c.ids[w] = i

			c.totalFreq += a.freqs[i]
			runeCount := utf8.RuneCountInString(w)
			if runeCount > c.maxWordLength {
				c.maxWordLength = runeCount
			}
		}
		c.maxid = int64(len(a.words))
		return nil
	}
}


================================================
FILE: corpus/corpus.go
================================================
package corpus

import (
	"sync/atomic"
	"unicode/utf8"

	"github.com/pkg/errors"
)

// Corpus is a data structure holding the relevant metadata and information for a corpus of text.
// It serves as vocabulary with ID for lookup. This is very useful as neural networks rely on the IDs rather than the text themselves
type Corpus struct {
	words       []string
	frequencies []int

	ids map[string]int

	// atomic read and write plz
	maxid         int64
	totalFreq     int
	maxWordLength int
}

// New creates a new *Corpus
func New() *Corpus {
	c := &Corpus{
		words:       make([]string, 0),
		frequencies: make([]int, 0),
		ids:         make(map[string]int),
	}

	// add some default words
	c.Add("") // aka NULL - when there are no words
	c.Add("-UNKNOWN-")
	c.Add("-ROOT-")
	c.maxWordLength = 0 // specials don't have lengths

	return c
}

// Construct creates a Corpus given the construction options. This allows for more flexibility
func Construct(opts ...ConsOpt) (*Corpus, error) {
	c := new(Corpus)

	// checks
	if c.words == nil {
		c.words = make([]string, 0)
	}
	if c.frequencies == nil {
		c.frequencies = make([]int, 0)
	}
	if c.ids == nil {
		c.ids = make(map[string]int)
	}

	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, err
		}
	}

	return c, nil
}

// ID returns the ID of a word and whether or not it was found in the corpus
func (c *Corpus) Id(word string) (int, bool) {
	id, ok := c.ids[word]
	return id, ok
}

// Word returns the word given the ID, and whether or not it was found in the corpus
func (c *Corpus) Word(id int) (string, bool) {
	size := atomic.LoadInt64(&c.maxid)
	maxid := int(size)

	if id >= maxid {
		return "", false
	}
	return c.words[id], true
}

// Add adds a word to the corpus and returns its ID. If a word was previously in the corpus, it merely updates the frequency count and returns the ID
func (c *Corpus) Add(word string) int {
	if id, ok := c.ids[word]; ok {
		c.frequencies[id]++
		c.totalFreq++
		return id
	}

	id := atomic.AddInt64(&c.maxid, 1)
	c.ids[word] = int(id - 1)
	c.words = append(c.words, word)
	c.frequencies = append(c.frequencies, 1)
	c.totalFreq++

	runeCount := utf8.RuneCountInString(word)
	if runeCount > c.maxWordLength {
		c.maxWordLength = runeCount
	}

	return int(id - 1)
}

// Size returns the size of the corpus.
func (c *Corpus) Size() int {
	size := atomic.LoadInt64(&c.maxid)
	return int(size)
}

// WordFreq returns the frequency of the word. If the word wasn't in the corpus, it returns 0.
func (c *Corpus) WordFreq(word string) int {
	id, ok := c.ids[word]
	if !ok {
		return 0
	}

	return c.frequencies[id]
}

// IDFreq returns the frequency of a word given an ID. If the word isn't in the corpus it returns 0.
func (c *Corpus) IDFreq(id int) int {
	size := atomic.LoadInt64(&c.maxid)
	maxid := int(size)

	if id >= maxid {
		return 0
	}
	return c.frequencies[id]
}

// TotalFreq returns the total number of words ever seen by the corpus. This number includes the count of repeat words.
func (c *Corpus) TotalFreq() int {
	return c.totalFreq
}

// MaxWordLength returns the length of the longest known word in the corpus.
func (c *Corpus) MaxWordLength() int {
	return c.maxWordLength
}

// WordProb returns the probability of a word appearing in the corpus.
func (c *Corpus) WordProb(word string) (float64, bool) {
	id, ok := c.Id(word)
	if !ok {
		return 0, false
	}

	count := c.frequencies[id]
	return float64(count) / float64(c.totalFreq), true

}

// Merge combines two corpuses. The receiver is the one that is mutated.
func (c *Corpus) Merge(other *Corpus) {
	for i, word := range other.words {
		freq := other.frequencies[i]
		if id, ok := c.ids[word]; ok {
			c.frequencies[id] += freq
			c.totalFreq += freq
		} else {
			id := c.Add(word)
			c.frequencies[id] += freq - 1
			c.totalFreq += freq - 1
		}
	}
}

// Replace replaces the content of a word. The old reference remains.
//
// e.g: c.Replace("foo", "bar")
// c.Id("foo") will still return a ID. The ID will be the same as c.Id("bar")
func (c *Corpus) Replace(a, with string) error {
	old, ok := c.ids[a]
	if !ok {
		return errors.Errorf("Cannot replace %q with %q. %q is not found", a, with, a)
	}
	if _, ok := c.ids[with]; ok {
		return errors.Errorf("Cannot replace %q with %q. %q exists in the corpus", a, with, with)
	}
	c.words[old] = with
	return nil

}

// ReplaceWord replaces the word associated with the given ID. The old reference remains.
func (c *Corpus) ReplaceWord(id int, with string) error {
	if id >= len(c.words) {
		return errors.Errorf("Cannot replace word with ID %d. Out of bounds.", id)
	}
	if _, ok := c.ids[with]; ok {
		return errors.Errorf("Cannot replace word with ID %d with %q. %q exists in the corpus", id, with, with)
	}
	c.words[id] = with
	return nil
}


================================================
FILE: corpus/corpus_test.go
================================================
package corpus

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestCorpus(t *testing.T) {
	assert := assert.New(t)
	dict := New()
	assert.Equal(0, dict.WordFreq("hello")) // frequency of a word not in dict ould have to be 0
	assert.Equal(0, dict.IDFreq(3))         // ditto

	id := dict.Add("hello")

	assert.Equal(3, id)
	assert.Equal([]string{"", "-UNKNOWN-", "-ROOT-", "hello"}, dict.words)
	assert.Equal(map[string]int{"": 0, "-UNKNOWN-": 1, "-ROOT-": 2, "hello": 3}, dict.ids)
	assert.Equal(4, dict.Size())

	id2, ok := dict.Id("hello")
	if !ok {
		t.Errorf("The ID of null should be  0")
	}
	assert.Equal(id, id2)

	word, ok := dict.Word(3)
	if !ok {
		t.Errorf("Expected word of ID 3 to be found")
	}
	assert.Equal("hello", word)

	dict.Add(word)
	assert.Equal(2, dict.WordFreq(word))
	assert.Equal(2, dict.IDFreq(3))
	assert.Equal(5, dict.TotalFreq())
	assert.Equal(5, dict.MaxWordLength())

	prob, ok := dict.WordProb(word)
	if !ok {
		t.Errorf("Expected a probability")
	}
	assert.Equal(0.4, prob)
	// t.Logf("%q: %v", word, dict.WordProb(word))
}

func TestCorpus_Merge(t *testing.T) {
	assert := assert.New(t)

	dict := New()
	id := dict.Add("hello")
	dict.frequencies[id] += 4 // freq for "hello" is 5
	dict.totalFreq += 4

	other := New()
	id = other.Add("hello")
	other.frequencies[id] += 2 // freq for "hello" is 3
	other.totalFreq += 2
	id = other.Add("world")
	other.frequencies[id] += 1
	other.totalFreq += 1

	dict.Merge(other)

	assert.Equal(8, dict.WordFreq("hello"))
	assert.Equal(2, dict.WordFreq("world"))
}


================================================
FILE: corpus/functions.go
================================================
package corpus

import (
	"math"
	"strings"
	"unicode/utf8"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/treebank"
	"github.com/pkg/errors"
)

// GenerateCorpus creates a Corpus given a set of SentenceTag from a training set.
func GenerateCorpus(sentenceTags []treebank.SentenceTag) *Corpus {
	words := make([]string, 3)
	frequencies := make([]int, 3)

	words[0] = ""      // aka NULL, for when no word can be found
	frequencies[0] = 0 // no word is never found

	words[1] = "-UNKNOWN-"
	frequencies[1] = 0

	words[2] = "-ROOT-"
	frequencies[2] = 1

	knownWords := make(map[string]int)
	knownWords[""] = 0
	knownWords["-UNKNOWN-"] = 1
	knownWords["-ROOT-"] = 2

	maxWordLength := 0

	for _, sentenceTag := range sentenceTags {
		for _, lex := range sentenceTag.Sentence {
			id, ok := knownWords[lex.Value]
			if !ok {
				knownWords[lex.Value] = len(words)
				words = append(words, lex.Value)
				frequencies = append(frequencies, 1)

				runeCount := utf8.RuneCountInString(lex.Value)
				if runeCount > maxWordLength {
					maxWordLength = runeCount
				}
			} else {
				frequencies[id]++
			}
		}
	}

	var totals int
	for _, f := range frequencies {
		totals += f
	}

	return &Corpus{words, frequencies, knownWords, int64(len(words)), totals, maxWordLength}
}

// ViterbiSplit is a Viterbi algorithm for splitting words given a corpus
func ViterbiSplit(input string, c *Corpus) []string {
	s := strings.ToLower(input)
	probabilities := []float64{1.0}
	lasts := []int{0}

	runes := []int{}
	for i := range s {
		runes = append(runes, i)
	}
	runes = append(runes, len(s)+1)

	for i := range s {
		probs := make([]float64, 0)
		ls := make([]int, 0)

		// m := maxInt(0, i-c.maxWordLength)

		for j, r := range runes {
			if r > i {
				break
			}

			p, ok := c.WordProb(s[r : i+1])
			if !ok {
				// http://stackoverflow.com/questions/195010/how-can-i-split-multiple-joined-words#comment48879458_481773
				p = (math.Log(float64(1)/float64(c.totalFreq)) - float64(c.maxWordLength) - float64(1)) * float64(i-r) // note it should be i-r not j-i as per the SO post
			}
			prob := probabilities[j] * p

			probs = append(probs, prob)
			ls = append(ls, r)
		}

		maxProb := -math.SmallestNonzeroFloat64
		maxK := -1 << 63
		for j, p := range probs {
			if p > maxProb {
				maxProb = p
				maxK = ls[j]
			}
		}
		probabilities = append(probabilities, maxProb)
		lasts = append(lasts, maxK)
	}

	words := make([]string, 0)
	i := utf8.RuneCountInString(s)

	for i > 0 {
		start := lasts[i]
		words = append(words, s[start:i])
		i = start
	}

	// reverse it
	for i, j := 0, len(words)-1; i < j; i, j = i+1, j-1 {
		words[i], words[j] = words[j], words[i]
	}

	return words
}

// CosineSimilarity measures the cosine similarity of two strings.
func CosineSimilarity(a, b []string) float64 {
	countsA := make([]float64, 0)
	countsB := make([]float64, 0)
	uniques := make(map[string]int)

	// index the strings first
	for _, st := range a {
		s := strings.ToLower(st)
		id, ok := uniques[s]
		if !ok {
			uniques[s] = len(countsA)
			countsA = append(countsA, 1)
			countsB = append(countsB, 0) // create for countsB, but don't add
		} else {
			countsA[id]++
		}
	}

	for _, st := range b {
		s := strings.ToLower(st)
		id, ok := uniques[s]
		if !ok {
			uniques[s] = len(countsA)
			countsA = append(countsA, 0)
			countsB = append(countsB, 1)
		} else {
			countsB[id]++
		}
	}

	magA, err := mag(countsA)
	if err != nil {
		panic(err)
	}

	magB, err := mag(countsB)
	if err != nil {
		panic(err)
	}

	dotProd, err := dot(countsA, countsB)
	if err != nil {
		panic(err)
	}

	return dotProd / (magA * magB)

}

// DamerauLevenshtein calculates the Damerau-Levensthtein distance between two strings. See more at https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance
func DamerauLevenshtein(s1 string, s2 string) (distance int) {
	// index by code point, not byte
	r1 := []rune(s1)
	r2 := []rune(s2)

	// the maximum possible distance
	inf := len(r1) + len(r2)

	// if one string is blank, we needs insertions
	// for all characters in the other one
	if len(r1) == 0 {
		return len(r2)
	}

	if len(r2) == 0 {
		return len(r1)
	}

	// construct the edit-tracking matrix
	matrix := make([][]int, len(r1))
	for i := range matrix {
		matrix[i] = make([]int, len(r2))
	}

	// seen characters
	seenRunes := make(map[rune]int)

	if r1[0] != r2[0] {
		matrix[0][0] = 1
	}

	seenRunes[r1[0]] = 0
	for i := 1; i < len(r1); i++ {
		deleteDist := matrix[i-1][0] + 1
		insertDist := (i+1)*1 + 1
		var matchDist int
		if r1[i] == r2[0] {
			matchDist = i
		} else {
			matchDist = i + 1
		}
		matrix[i][0] = minInt(minInt(deleteDist, insertDist), matchDist)
	}

	for j := 1; j < len(r2); j++ {
		deleteDist := (j + 1) * 2
		insertDist := matrix[0][j-1] + 1
		var matchDist int
		if r1[0] == r2[j] {
			matchDist = j
		} else {
			matchDist = j + 1
		}

		matrix[0][j] = minInt(minInt(deleteDist, insertDist), matchDist)
	}

	for i := 1; i < len(r1); i++ {
		var maxSrcMatchIndex int
		if r1[i] == r2[0] {
			maxSrcMatchIndex = 0
		} else {
			maxSrcMatchIndex = -1
		}

		for j := 1; j < len(r2); j++ {
			swapIndex, ok := seenRunes[r2[j]]
			jSwap := maxSrcMatchIndex
			deleteDist := matrix[i-1][j] + 1
			insertDist := matrix[i][j-1] + 1
			matchDist := matrix[i-1][j-1]
			if r1[i] != r2[j] {
				matchDist += 1
			} else {
				maxSrcMatchIndex = j
			}

			// for transpositions
			var swapDist int
			if ok && jSwap != -1 {
				iSwap := swapIndex
				var preSwapCost int
				if iSwap == 0 && jSwap == 0 {
					preSwapCost = 0
				} else {
					preSwapCost = matrix[maxInt(0, iSwap-1)][maxInt(0, jSwap-1)]
				}
				swapDist = i + j + preSwapCost - iSwap - jSwap - 1
			} else {
				swapDist = inf
			}
			matrix[i][j] = minInt(minInt(minInt(deleteDist, insertDist), matchDist), swapDist)
		}
		seenRunes[r1[i]] = i
	}

	return matrix[len(r1)-1][len(r2)-1]
}

// LongestCommonPrefix takes a slice of strings, and finds the longest common prefix
func LongestCommonPrefix(strs ...string) string {
	switch len(strs) {
	case 0:
		return "" // idiots
	case 1:
		return strs[0]
	}

	min := strs[0]
	max := strs[0]

	for _, s := range strs[1:] {
		switch {
		case s < min:
			min = s
		case s > max:
			max = s
		}
	}

	for i := 0; i < len(min) && i < len(max); i++ {
		if min[i] != max[i] {
			return min[:i]
		}
	}

	// In the case where lengths are not equal but all bytes
	// are equal, min is the answer ("foo" < "foobar").
	return min
}

/* The following two functions help in parsing a string into numbers. It's recommended you write abstractions over the functions*/

// StrsToInts converts a string slice into an int slice, with the help of NumberWords.
// The function assumes all helper words like "and" have been stripped.
// 		"One hundred and five" -> []string{"one", "hundred", "five"}
// This is a very primitive method, and doesn't take into account other words like "a hundred" or "a couple of hundred"
func StrsToInts(strs []string) (retVal []int, err error) {
	for _, s := range strs {
		intVal, ok := lingo.NumberWords[s]
		if !ok {
			return nil, errors.Errorf("Unable to parse the words %q as numbers", s)
		}

		if len(retVal) > 0 && intVal == 100 && retVal[len(retVal)-1] < 100 {
			retVal[len(retVal)-1] *= 100
		} else if len(retVal) > 0 && retVal[len(retVal)-1] < 1000 && intVal < 1000 {
			retVal[len(retVal)-1] += intVal
		} else {
			retVal = append(retVal, intVal)
		}
	}
	return
}

// CombineInts takes a int slice, and tries to make it one integer.
// It works by taking advantage of english - anything more than 1000 has a repeated pattern
// e.g.
// 		one hundred and fifty thousand two hundred and two
// there are 2 repeated patterns (one hundred and fifty) and  (two hundred and two)
//
// This allows us to repeatedly combine by addition or multiplication until there is one left
func CombineInts(ints []int) int {
	var total int
	for len(ints) > 0 {
		if len(ints) == 1 || ints[0] >= 1000 {
			last := ints[len(ints)-1]
			total += last
			ints = ints[0 : len(ints)-1] //pop it
		} else {
			if ints[1] < 1000 {
				// something went wrong
				panic("HELP!")
			}
			total += ints[0] * ints[1]
			ints = ints[2:]
		}
	}
	return total
}


================================================
FILE: corpus/functions_test.go
================================================
package corpus

import (
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
)

func Test_GenerateCorpus(t *testing.T) {
	sentenceTags := mediumSentence()
	dict := GenerateCorpus(sentenceTags)

	// testing time
	assert := assert.New(t)
	expectedWords := []string{"", "-UNKNOWN-", "-ROOT-", "President", "Bush", "on", "Tuesday", "nominated", "two", "individuals", "to", "replace", "retiring", "jurists", "federal", "courts", "in", "the", "Washington", "area", "."}

	expectedIDs := make(map[string]int)
	for i, w := range expectedWords {
		expectedIDs[w] = i
	}

	assert.Equal(expectedWords, dict.words, "Corpus known words should be the same as the manually annotated expected values")
	assert.Equal(expectedIDs, dict.ids, "IDs should be the same as expected IDs")
	assert.Equal(int64(len(expectedWords)), dict.maxid)
}

func TestViterbiSplit(t *testing.T) {
	assert := assert.New(t)
	dict := GenerateCorpus(mediumSentence())

	s2 := "twoindividuals"
	words := ViterbiSplit(s2, dict)
	assert.Equal([]string{"two", "individuals"}, words)

	s2 = "FederalCourts"
	words = ViterbiSplit(s2, dict)
	assert.Equal([]string{"federal", "courts"}, words)

	s3 := "toreplaceon"
	words = ViterbiSplit(s3, dict)
	assert.Equal([]string{"to", "replace", "on"}, words)
}

func TestCosineSimilarity(t *testing.T) {
	a := strings.Split("This is a test of cosine similarity", " ")
	b := strings.Split("This is not a test of cosine similarity", " ")

	s1 := CosineSimilarity(a, a)
	s2 := CosineSimilarity(a, b)

	if !floatEquals64(s1, 1) {
		t.Error("Expected similarity to be 1 when compared with itself")
	}
	if s2 > s1 {
		t.Error("Something went wrong with the cosine similarity algorithm")
	}

	c := strings.Split("Parramatta Road", " ")
	d := strings.Split("Parramatta Rd", " ")

	s1 = CosineSimilarity(c, c)
	s2 = CosineSimilarity(c, d)

	if !floatEquals64(s1, 1) {
		t.Error("Expected similarity to be 1 when compared with itself")
	}
	if s2 > s1 {
		t.Error("Something went wrong with the cosine similarity algorithm")
	}
}

func TestDL(t *testing.T) {
	a := "This is a test of Damerau Levenshtein"
	b := "This is not a test of Damerau Levenshtein"

	s1 := DamerauLevenshtein(a, a)
	s2 := DamerauLevenshtein(a, b)
	if s1 != 0 {
		t.Errorf("Expected the distance to be 0 when compared against itself. Got %d", s1)
	}

	if s2 < s1 {
		t.Error("Expected DL similarity to be greater when compared against itself")
	}

	c := "Parramatta Road"
	d := "Paramatta Rd"

	s1 = DamerauLevenshtein(c, c)
	s2 = DamerauLevenshtein(c, d)

	if s1 != 0 {
		t.Errorf("Expected the distance to be 0 when compared against itself. Got %d", s1)
	}
	if s2 < s1 {
		t.Error("Expected DL similarity to be greater when compared against itself")
	}
}

func TestLCP(t *testing.T) {
	assert := assert.New(t)
	lcp := LongestCommonPrefix("Hello World", "Hell yeah!")
	assert.Equal("Hell", lcp)

	lcp = LongestCommonPrefix("Hello World", "Hell yeah!", "hey there")
	assert.Equal("", lcp)

	lcp = LongestCommonPrefix()
	assert.Equal("", lcp)

	lcp = LongestCommonPrefix("OneWord")
	assert.Equal("OneWord", lcp)

	lcp = LongestCommonPrefix("foo", "foobar")
	assert.Equal("foo", lcp)
}

var parseNumTests = []struct {
	s string
	v int
}{
	{"twenty nine", 29},
	{"one hundred five", 105},
	{"five hundred twenty thousand twenty one", 520021},
}

func TestParseNumber(t *testing.T) {
	for _, pnts := range parseNumTests {
		s := strings.Split(pnts.s, " ")
		ints, err := StrsToInts(s)
		if err != nil {
			t.Error(err)
			continue
		}

		v := CombineInts(ints)
		if v != pnts.v {
			t.Errorf("Expected %q to be parsed to %d. Got %d instead", pnts.s, pnts.v, v)
		}
	}
}


================================================
FILE: corpus/inflection.go
================================================
package corpus

import (
	"regexp"

	"github.com/chewxy/lingo"
)

type conversionPattern struct {
	pattern     *regexp.Regexp
	replacement string
}

func newConversionPattern(from, to string) conversionPattern {
	rFrom := regexp.MustCompile(from)
	return conversionPattern{rFrom, to}
}

// plural -> singular
var plural = []conversionPattern{
	newConversionPattern("(quiz)$", "${1}zes"),
	newConversionPattern("^(ox)$", "${1}en"),
	newConversionPattern("([m|l])ouse$", "${1}ice"),
	newConversionPattern("(matr|vert|ind)ix|ex$", "${1}ices"),
	newConversionPattern("(x|ch|ss|sh)$", "${1}es"),
	newConversionPattern("([^aeiouy]|qu)ies$", "${1}y"),
	newConversionPattern("([^aeiouy]|qu)y$", "${1}ies"),
	newConversionPattern("(hive)$", "${1}s"),
	newConversionPattern("(?:([^f])fe|([lr])f)$", "${1}${2}ves"),
	newConversionPattern("sis$", "ses"),
	newConversionPattern("([ti])um$", "${1}a"),
	newConversionPattern("(buffal|tomat|potat)o$", "${1}oes"),
	newConversionPattern("(bu)s$", "${1}ses"),
	newConversionPattern("(alias|status|sex)$", "${1}es"),
	newConversionPattern("(octop|vir)us$", "${1}i"),
	newConversionPattern("(ax|test)is$", "${1}es"),
	newConversionPattern("s$", "s"),
	newConversionPattern("$", "s"),
}

// singular -> plural
var singular = []conversionPattern{
	newConversionPattern("(quiz)zes$", "${1}"),
	newConversionPattern("(matr)ices$", "${1}ix"),
	newConversionPattern("(vert|ind)ices$", "${1}ex"),
	newConversionPattern("^(ox)en", "${1}"),
	newConversionPattern("(alias|status)es$", "${1}"),
	newConversionPattern("(octop|vir)i$", "${1}us"),
	newConversionPattern("(cris|ax|test)es$", "${1}is"),
	newConversionPattern("(shoe)s$", "${1}"),
	newConversionPattern("(o)es$", "${1}"),
	newConversionPattern("(bus)es$", "${1}"),
	newConversionPattern("([m|l])ice$", "${1}ouse"),
	newConversionPattern("(x|ch|ss|sh)es$", "${1}"),
	newConversionPattern("(m)ovies$", "${1}ovie"),
	newConversionPattern("(s)eries$", "${1}eries"),
	newConversionPattern("([^aeiouy]|qu)ies$", "${1}y"),
	newConversionPattern("([lr])ves$", "${1}f"),
	newConversionPattern("(tive)s$", "${1}"),
	newConversionPattern("(hive)s$", "${1}"),
	newConversionPattern("([^f])ves$", "${1}fe"),
	newConversionPattern("(^analy)ses$", "${1}sis"),
	newConversionPattern("((a)naly|(b)a|(d)iagno|(p)arenthe|(p)rogno|(s)ynop|(t)he)ses$", "${1}${2}sis"),
	newConversionPattern("([ti])a$", "${1}um"),
	newConversionPattern("(n)ews$", "${1}ews"),
	newConversionPattern("s$", ""),
}

// weird pluralizations that don't match the rules above
var irregular = []conversionPattern{
	newConversionPattern("person", "people"),
	newConversionPattern("man", "men"),
	newConversionPattern("child", "children"),
	newConversionPattern("sex", "sexes"),
	newConversionPattern("move", "moves"),
	newConversionPattern("sleeve", "sleeves"),
	newConversionPattern("datum", "data"),
	newConversionPattern("box", "boxes"),
	newConversionPattern("knife", "knives"),
}

var unconvertable = []string{
	"equipment",
	"information",
	"rice",
	"money",
	"species",
	"series",
	"fish",
	"sheep",
}

// Pluralize pluralizes words based on rules known
func Pluralize(word string) string {
	if lingo.InStringSlice(word, unconvertable) {
		return word
	}

	for _, cp := range irregular {
		if cp.pattern.MatchString(word) {
			return cp.replacement
		}
	}

	for _, cp := range plural {
		if cp.pattern.MatchString(word) {
			// log.Printf("\t%q Matches %q", word, cp.pattern.String())
			return cp.pattern.ReplaceAllString(word, cp.replacement)
		}
	}
	return word
}

// Singularize singularizes words based on rules known
func Singularize(word string) string {
	if lingo.InStringSlice(word, unconvertable) {
		return word
	}

	for _, cp := range singular {
		if cp.pattern.MatchString(word) {
			return cp.pattern.ReplaceAllString(word, cp.replacement)
		}
	}
	return word
}


================================================
FILE: corpus/inflection_test.go
================================================
package corpus

import "testing"

var pluralizeTest = []struct {
	word, correct string
}{
	{"friend", "friends"},
	{"tomato", "tomatoes"},
	{"knife", "knives"},
	{"dwarf", "dwarves"},
	{"box", "boxes"},
	{"ox", "oxen"},
	{"man", "men"},
	{"equipment", "equipment"},
}

var singularizeTest = []struct {
	word, correct string
}{
	{"condolences", "condolence"},
	{"fish", "fish"},
	{"shoes", "shoe"},
	{"viri", "virus"},
	{"elves", "elf"},
}

func TestPluralize(t *testing.T) {
	for _, pts := range pluralizeTest {
		got := Pluralize(pts.word)
		if got != pts.correct {
			t.Errorf("Pluralizing %q failed. Want %q. Got %q instead", pts.word, pts.correct, got)
		}
	}
}

func TestSingularize(t *testing.T) {
	for _, pts := range singularizeTest {
		got := Singularize(pts.word)
		if got != pts.correct {
			t.Errorf("Singularizing %q failed. Want %q. Got %q instead", pts.word, pts.correct, got)
		}
	}
}


================================================
FILE: corpus/io.go
================================================
package corpus

import (
	"bufio"
	"bytes"
	"encoding/gob"
	"io"
	"strconv"
	"strings"
)

// sortutil is a utility struct meant to sort words based on IDs
type sortutil struct {
	words []string
	ids   []int
	freqs []int
}

func (s *sortutil) Len() int           { return len(s.words) }
func (s *sortutil) Less(i, j int) bool { return s.ids[i] < s.ids[j] }
func (s *sortutil) Swap(i, j int) {
	s.words[i], s.words[j] = s.words[j], s.words[i]
	s.ids[i], s.ids[j] = s.ids[j], s.ids[i]
	if len(s.freqs) > 0 {
		s.freqs[i], s.freqs[j] = s.freqs[j], s.freqs[i]
	}
}

// ToDictWithFreq returns a simple marshalable type. Conceptually it's a JSON object with the words as the keys. The values are a pair - ID and Freq.
func ToDictWithFreq(c *Corpus) map[string]struct{ ID, Freq int } {
	retVal := make(map[string]struct{ ID, Freq int })
	for i, w := range c.words {
		retVal[w] = struct{ ID, Freq int }{i, c.frequencies[i]}
	}
	return retVal
}

// ToDict returns a marshalable dict. It returns a copy of the ID mapping.
func ToDict(c *Corpus) map[string]int {
	retVal := make(map[string]int)
	for k, v := range c.ids {
		retVal[k] = v
	}
	return retVal
}

// GobEncode implements GobEncoder for *Corpus
func (c *Corpus) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	encoder := gob.NewEncoder(&buf)

	if err := encoder.Encode(c.words); err != nil {
		return nil, err
	}

	if err := encoder.Encode(c.ids); err != nil {
		return nil, err
	}

	if err := encoder.Encode(c.frequencies); err != nil {
		return nil, err
	}

	if err := encoder.Encode(c.maxid); err != nil {
		return nil, err
	}

	if err := encoder.Encode(c.totalFreq); err != nil {
		return nil, err
	}

	if err := encoder.Encode(c.maxWordLength); err != nil {
		return nil, err
	}

	return buf.Bytes(), nil
}

// GobDecode implements GobDecoder for *Corpus
func (c *Corpus) GobDecode(buf []byte) error {
	b := bytes.NewBuffer(buf)
	decoder := gob.NewDecoder(b)

	if err := decoder.Decode(&c.words); err != nil {
		return err
	}

	if err := decoder.Decode(&c.ids); err != nil {
		return err
	}

	if err := decoder.Decode(&c.frequencies); err != nil {
		return err
	}

	if err := decoder.Decode(&c.maxid); err != nil {
		return err
	}

	if err := decoder.Decode(&c.totalFreq); err != nil {
		return err
	}

	if err := decoder.Decode(&c.maxWordLength); err != nil {
		return err
	}

	return nil
}

// LoadOneGram loads a 1_gram.txt file, which is a tab separated file which lists the frequency counts of words. Example:
// 		the	23135851162
// 		of	13151942776
// 		and	12997637966
// 		to	12136980858
// 		a	9081174698
// 		in	8469404971
// 		for	5933321709
func (c *Corpus) LoadOneGram(r io.Reader) error {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := scanner.Text()
		splits := strings.Split(line, "\t")

		if len(splits) == 0 {
			break
		}

		word := splits[0] // TODO: normalize
		count, err := strconv.Atoi(splits[1])
		if err != nil {
			return err
		}

		id := c.Add(word)
		c.frequencies[id] = count
		c.totalFreq--
		c.totalFreq += count

		wc := len([]rune(word))
		if wc > c.maxWordLength {
			c.maxWordLength = wc
		}
	}
	return nil
}


================================================
FILE: corpus/io_test.go
================================================
package corpus

import (
	"bytes"
	"encoding/gob"
	"strings"
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestCorpusGob(t *testing.T) {
	buf := new(bytes.Buffer)

	c := New()
	c.Add("Hello")
	c.Add("World")

	helloID, _ := c.Id("Hello")
	worldID, _ := c.Id("World")

	encoder := gob.NewEncoder(buf)
	decoder := gob.NewDecoder(buf)

	if err := encoder.Encode(c); err != nil {
		t.Fatal(err)
	}

	c2 := New()
	if err := decoder.Decode(c2); err != nil {
		t.Fatal(err)
	}

	if hid, ok := c2.Id("Hello"); !ok || (ok && hid != helloID) {
		t.Errorf("\"Hello\" not found after decoding.")
	}

	if wid, ok := c2.Id("World"); !ok || (ok && wid != worldID) {
		t.Errorf("\"World\" not found after decoding.")
	}
}

func TestCorpusToDict(t *testing.T) {
	assert := assert.New(t)
	c, _ := Construct(WithWords([]string{"World", "Hello", "World"}))

	d := ToDict(c)
	c2, err := Construct(FromDict(d))
	if err != nil {
		t.Fatal(err)
	}
	assert.Equal(c.words, c2.words, "Expected words to be the same")
	assert.Equal(c.ids, c2.ids, "Expected IDs to be the same")
	assert.NotEqual(c.frequencies, c2.frequencies, "Expected frequencies to not be the same")
	assert.Equal(c.maxid, c2.maxid, "Expected maxID to be the same")
	assert.NotEqual(c.totalFreq, c2.totalFreq, "Expected totalFreq to be different.")
	assert.Equal(c.maxWordLength, c2.maxWordLength, "Expected maxWordLength to be the same")
}

func TestCorpusToDictWithFreq(t *testing.T) {
	assert := assert.New(t)
	c, _ := Construct(WithWords([]string{"World", "Hello", "World"}))

	d := ToDictWithFreq(c)
	c2, err := Construct(FromDictWithFreq(d))
	if err != nil {
		t.Fatal(err)
	}

	assert.Equal(c, c2)
}

func TestLoadOneGram(t *testing.T) {
	assert := assert.New(t)
	r := strings.NewReader(sample1Gram)

	c := New()
	err := c.LoadOneGram(r)
	assert.Nil(err)
	assert.Equal(10, c.Size())

	id, ok := c.Id("for")
	if !ok {
		t.Errorf("Expected \"for\" to be in corpus after loading one gram file")
	}
	assert.Equal(int(c.maxid-1), id)

}


================================================
FILE: corpus/lda.go
================================================
package corpus

import (
	"gorgonia.org/tensor"
)

// LDAModel ... TODO
//https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
type LDAModel struct {
	// params
	Alpha tensor.Tensor // is a Row
	Eta   tensor.Tensor // is a Col
	//Kappa gorgonia.Scalar // Decay
	//Tau0  gorgonia.Scalar // offset

	// parameters needed for working
	Topics      int
	ChunkSize   int
	Terms       int
	UpdateEvery int
	EvalEvery   int

	// consts
	Iterations     int
	GammaThreshold float64

	MinimumProb float64

	// track current progress
	Updates int

	// type
	Dtype tensor.Dtype
}

func (l *LDAModel) init() {
	eta := tensor.New(tensor.Of(l.Dtype), tensor.WithShape(l.Topics))
	alpha := tensor.New(tensor.Of(l.Dtype), tensor.WithShape(l.Topics))

	switch l.Dtype {
	case tensor.Float64:
		v := 1.0 / float64(l.Topics)
		eta.Memset(v)
		alpha.Memset(v)
	case tensor.Float32:
		v := float32(1) / float32(l.Topics)
		eta.Memset(v)
		alpha.Memset(v)
	}

	l.Alpha = alpha
	l.Eta = eta
}


================================================
FILE: corpus/test_test.go
================================================
package corpus

import (
	"strings"

	"github.com/chewxy/lingo/treebank"
)

const sample1Gram = `the	23135851162
of	13151942776
and	12997637966
to	12136980858
a	9081174698
in	8469404971
for	5933321709`

func mediumSentence() []treebank.SentenceTag {
	conllu := `1	President	President	PROPN	NNP	Number=Sing	2	compound	_	_
2	Bush	Bush	PROPN	NNP	Number=Sing	5	nsubj	_	_
3	on	on	ADP	IN	_	4	case	_	_
4	Tuesday	Tuesday	PROPN	NNP	Number=Sing	5	nmod	_	_
5	nominated	nominate	VERB	VBD	Mood=Ind|Tense=Past|VerbForm=Fin	0	root	_	_
6	two	two	NUM	CD	NumType=Card	7	nummod	_	_
7	individuals	individual	NOUN	NNS	Number=Plur	5	dobj	_	_
8	to	to	PART	TO	_	9	mark	_	_
9	replace	replace	VERB	VB	VerbForm=Inf	5	advcl	_	_
10	retiring	retire	VERB	VBG	VerbForm=Ger	11	amod	_	_
11	jurists	jurist	NOUN	NNS	Number=Plur	9	dobj	_	_
12	on	on	ADP	IN	_	14	case	_	_
13	federal	federal	ADJ	JJ	Degree=Pos	14	amod	_	_
14	courts	court	NOUN	NNS	Number=Plur	11	nmod	_	_
15	in	in	ADP	IN	_	18	case	_	_
16	the	the	DET	DT	Definite=Def|PronType=Art	18	det	_	_
17	Washington	Washington	PROPN	NNP	Number=Sing	18	compound	_	_
18	area	area	NOUN	NN	Number=Sing	14	nmod	_	_
19	.	.	PUNCT	.	_	5	punct	_	_

`

	readr := strings.NewReader(conllu)
	return treebank.ReadConllu(readr)
}

const EPSILON64 float64 = 1e-10

func floatEquals64(a, b float64) bool {
	if (a-b) < EPSILON64 && (b-a) < EPSILON64 {
		return true
	}
	return false
}


================================================
FILE: corpus/utils.go
================================================
package corpus

import (
	"errors"
	"math"
)

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

func dot(a, b []float64) (float64, error) {
	if len(a) != len(b) {
		return 0, errors.New("Differing lengths!")
	}

	var retVal float64
	for i, v := range a {
		retVal += v * b[i]
	}
	return retVal, nil
}

func mag(a []float64) (float64, error) {
	dotProd, err := dot(a, a)
	if err != nil {
		return dotProd, err
	}
	return math.Sqrt(dotProd), nil
}


================================================
FILE: dep/README.md
================================================
# Dependency Parser #

Package `dependencyparser` is a package that provides data structures and algorithms for a dependency parser as described by [Chen and Manning 2014](http://cs.stanford.edu/people/danqi/papers/emnlp2014.pdf) [PDF]. It achieves similar accuracy scores as the the cited paper.

# Installing #

`go get -u github.com/chewxy/lingo/dep`



# How It Works #

## Transition Based Parsing ##

The core of the parser is a transition based parser, as popularized by [Nivre 2003](https://stp.lingfil.uu.se/~nivre/docs/iwpt03.pdf) [PDF]. It's essentially a [shift-reduce parser](https://en.wikipedia.org/wiki/Shift-reduce_parser) with more states. Dan Jurafsky has a very [complete overview of transition-based parsing](https://web.stanford.edu/~jurafsky/slp3/14.pdf) [PDF], which should be consulted should more questions arise.

### Transitions ###

At the core of a transition based parser are two data structures: a stack and a queue. The queue, or buffer holds a list of words waiting to be parsed. Parsing is then simply a matter of manipulating the state of the stack and queue. Specifically there are three possible actions in an arc-standard parser:

* `Shift`: Shift simply shifts one word from the buffer on to the top of the stack
* `Left`: Left means the top of the stack is the head of the word underneath it. After the transition is applied (the link between the nodes attached), the word underneath the stack is removed.
* `Right`: Right means that the top of the stack is the child of the word underneath it. After the transition is applied, the top of the stack is popped.

A word on the terms "head", and "child". Consider the sentence "I am human":

!["I am human" example](https://github.com/chewxy/lingo/blob/master/dep/documentation/iamhuman.dot.png?raw=true)

We say "human" is the head of the words "I" and "am". Therefore, "I" and "am" are considered to be children of "human".

### Example ###

Let's look at a simple example to concrefy the ideas: "The cat sat on the mat". Here are the states

| Step | Stack                         | Buffer                                    | Transition |
|------|-------------------------------|-------------------------------------------|------------|
|0 | [ROOT]                            | ["The", "cat", "sat", "on", "the", "mat"] | Shift      |
|1 | [ROOT, "The"]                     | ["cat", "sat", "on", "the", "mat"]        | Shift      |
|2 | [ROOT, "The", "cat"]              | ["sat", "on", "the", "mat"]               | Left       | 
|3 | [ROOT, "cat"]                     | ["sat", "on", "the", "mat"]               | Shift      |
|4 | [ROOT, "cat", "sat"]              | ["on", "the", "mat"]                      | Left       |
|5 | [ROOT, "sat"]                     | ["on", "the", "mat"]                      | Shift      |
|6 | [ROOT, "sat", "on"]               | ["the", "mat"]                            | Shift      |
|7 | [ROOT, "sat", "on", "the"]        | ["mat"]                                   | Shift      |
|8 | [ROOT, "sat", "on", "the", "mat"] | []                                        | Left       |
|9 | [ROOT, "sat", "on", "mat"]        | []                                        | Left       |
|10| [ROOT, "sat", "mat"]              | []                                        | Right      |
|11| [ROOT, "sat"]                     | []                                        | Left       |

The above transitions produces this parse tree:

!["the cat sat on the mat"](https://github.com/chewxy/lingo/blob/master/dep/documentation/thecatsatonthemat.dot.png?raw=true)

The real question then is of course - how does the system know which is the correct transition to emit, given the state?

The answer is machine learning.

## Machine Learning ##

What exactly are we learning? Or more carefully put, what are the inputs and outputs of the machine learning algorithm? The table in the example above provides a template for the inputs and output. The output is easy - the transition is what we want to learn. 

As for the input, it's a little bit more complex. The input consists of the stack and the buffer. It'd be impractical and slow to include everything in the stack and buffer (dynamic neural networks are somewhat slower than static ones). So Chen and Manning came up with an ingenious idea - 

* Use the top 3 words of the stack
* Use the top 3 words of the buffer
* Use the first and second leftmost/rightmost children of the first two words of the stack

Instead of directly using the words, POS Tag and dependency relations as features, the rather ingenious idea was that it would use vectors drawn from an embedding matrix to represent these features instead. So instead of building sparse features, concatenating the vectors form a fixed sized input vector. This makes training the network much more expedient. 
You'll find this in [features.go](https://github.com/chewxy/lingo/blob/master/dependencyParser/features.go)

Given each state above, it'd be fairly trivial to extract an input vector based on the 18 "features" listed and feed forwards to a neural network. The result is a fast parser.

### Neural Network ###

The machine learning algorithm behind this parser is a simple 3-layered network. An input layer is constructed from the embedding matrices, and is forwarded to the first layer, which is activated by a cube activation function. This then passes forwards to a dropout layer before the last layer, which is a softmax layer.

[image of NN] 

## Hairy Bits ##

The hairy bits of this is the oracle. Specifically, the question: given a training sentence, how do we generate correct examples such as the table above? 

TODO: finish writing this section


# How To Use #

This package provides three main data structures for use:

* `Parser`
* `Model`
* `Trainer`

`Trainer` takes a `[]treebank.SentenceTag` and produces a `Model`. `Parser` requires a `Model` to run, and is basically a exported wrapper over `configuration` that handles a pipeline.

## Basic NLP Pipeline ##

```go
func main() {
	inputString: `The cat sat on the mat`
	lx := lexer.New("dummy", strings.NewReader(inputString)) // lexer - required to break a sentence up into words. 
	pt := pos.New(pos.WithModel(posModel))                   // POS Tagger - required to tag the words with a part of speech tag.
	dp := dep.New(depModel)                                  // Creates a new parser

	// set up a pipeline
	pt.Input = lx.Output
	dp.Input = pt.Output

	// run all
	go lx.Run()
	go pt.Run()
	go dp.Run()

	// wait to receive:
	for {
		select {
		case d := <- dp.Output:
			// do something
		case err:= <-dp.Error:
			// handle error
		}
	}

}
```

## Training A Model ##

To train a model you'd use the `Trainer`. The trainer accepts a `[]treebank.SentenceTag`. As long as you can parse your training file into those (package `treebank` accepts CONLLU formatted files as well as the PennTreebank formatted files), you'd be fine.

An example trainer is in the cmd directory of `lingo`

# FAQ #

**Why not an LSTM or RNN to encode the state of the stack and buffer?**

The answer is simplicity and speed. I have attempted variants of the parser with different neural networks - they don't work as fast as this. I am aware of Parsey-McParseface and the slightly improved accuracy compared to this model, but the speed has been not as great as I expect. This package emphasises parsing speed over accuracy - for most well written English sentences, this package performs well.

**Why are there no models?**

I'm afraid you're gonna have to train your own models. Training takes days on the Universal Dependency dataset and I haven't had the time to train on those. All my models are specific to the use of the company, and hence cannot be released.

**What caveats are there?**

Chen and Manning described using pre-computed activations for the top 10000 or so words. I did not implement that, but it would be trivial to revisit and implement it. Feel free to send a pull request.

**How can this be sped up?**

Use multiple, smaller trainers, each training on a separate batch. You can hence train them concurrently (pass the costs in a channel and collect at the end). At the end, sum the gradients before applying adagrad. The trade off is that a LOT more memory will be used. It's also the reason why it wasn't included as the default. It's quite trivial to write though. Send a pull request if you have managed to reduce memory usage.


# Contributing #

see package lingo's CONTRIBUTING.md for more information. There is currently a list of issues in Github issues. Those are good places to start.

# Licence #

This package is MIT licenced.

================================================
FILE: dep/arcStandard.go
================================================
package dep

import "github.com/chewxy/lingo"

// var SingleRoot bool = true // make this part of a build process

// canApply checks if a particular transition can be applied
func (c *configuration) canApply(t transition) bool {

	var h head
	if t.Move == Left || t.Move == Right {
		if t.Move == Left {
			h = c.stackValue(0)
		} else {
			h = c.stackValue(1)
		}

		if h < 0 {
			return false
		}
		if h == 0 && t.DependencyType != lingo.Root {
			return false
		}
	}

	stackSize := c.stackSize()
	bufferSize := c.bufferSize()

	if t.Move == Left {
		return stackSize > 2
	}

	if t.Move == Right {
		return stackSize > 2 || (stackSize == 2 && bufferSize == 0)

		// if not single root build
		// return stackSize >= 2
	}

	return bufferSize > 0 // strange other thing...

}

// apply applies the transition
func (c *configuration) apply(t transition) {
	logf("Applying %v", t)
	w1 := int(c.stackValue(1))
	w2 := int(c.stackValue(0))

	if t.Move == Left {
		c.AddArc(w2, w1, t.DependencyType)
		c.removeSecondTopStack()
	} else if t.Move == Right {
		c.AddArc(w1, w2, t.DependencyType)
		c.removeTopStack()
	} else {
		c.shift()
	}
}

// oracle gets the gold transition given the state
func (c *configuration) oracle(goldParse *lingo.Dependency) (t transition) {
	w1 := int(c.stackValue(1))
	w2 := int(c.stackValue(0))

	if w1 > 0 && goldParse.Head(w1) == w2 {
		t.Move = Left
		t.DependencyType = goldParse.Label(w1)
		return
	} else if w1 >= 0 && goldParse.Head(w2) == w1 && !c.hasOtherChildren(w2, goldParse) {
		t.Move = Right
		t.DependencyType = goldParse.Label(w2)

		return
	}
	return // default transition is Shift
}


================================================
FILE: dep/arcStandard_test.go
================================================
package dep

import (
	"testing"

	"github.com/chewxy/lingo"
	"github.com/stretchr/testify/assert"
)

func TestCanApply(t *testing.T) {
	dep := simpleSentence()[0].Dependency(dummyFix{})

	buffer := make([]head, 0)
	for i := 1; i < dep.WordCount(); i++ {
		buffer = append(buffer, head(i))
	}

	stack := []head{0}

	c := &configuration{
		Dependency: dep,
		stack:      stack,
		buffer:     buffer,
	}

	assert := assert.New(t)

	logf("Start config: \n%v", c)

	rootLeft := c.canApply(transition{Left, lingo.Root})
	rootRight := c.canApply(transition{Right, lingo.Root})
	NSubjLeft := c.canApply(transition{Left, lingo.NSubj})
	NSubjRight := c.canApply(transition{Right, lingo.NSubj})
	ShiftDep := c.canApply(transition{Shift, lingo.NoDepType})

	assert.Equal(false, rootLeft, "rootLeft should be false")
	assert.Equal(false, rootRight, "rootRight should be false")
	assert.Equal(false, NSubjLeft, "NSubjLeft should be false")
	assert.Equal(false, NSubjRight, "NSubjRight should be false")
	assert.Equal(true, ShiftDep, "ShiftDep should be true")

	logf("rootRight: %v, rootLeft: %v", rootLeft, rootRight)
	logf("NSubjRight: %v, NSubjLeft: %v", NSubjRight, NSubjLeft)
	logf("ShiftDep: %v", ShiftDep)

	c.shift()
	c.shift()
	logf("%v", c)

	rootLeft = c.canApply(transition{Left, lingo.Root})
	rootRight = c.canApply(transition{Right, lingo.Root})
	NSubjLeft = c.canApply(transition{Left, lingo.NSubj})
	NSubjRight = c.canApply(transition{Right, lingo.NSubj})
	ShiftDep = c.canApply(transition{Shift, lingo.NoDepType})

	assert.Equal(true, rootLeft, "rootLeft should be true")
	assert.Equal(true, rootRight, "rootRight should be true")
	assert.Equal(true, NSubjLeft, "NSubjLeft should be true")
	assert.Equal(true, NSubjRight, "NSubjRight should be true")
	assert.Equal(true, ShiftDep, "ShiftDep should be true")

	logf("rootRight: %v, rootLeft: %v", rootLeft, rootRight)
	logf("NSubjRight: %v, NSubjLeft: %v", NSubjRight, NSubjLeft)
	logf("ShiftDep: %v", ShiftDep)
}

func TestOracle(t *testing.T) {
	st := simpleSentence()[0]
	s := st.AnnotatedSentence(nil)
	c := newConfiguration(s, true)
	d := s.Dependency()

	for count := 0; !c.isTerminal() && count < 100; count++ {
		oracle := c.oracle(d)

		if !c.canApply(oracle) && (oracle != transition{Right, lingo.Root}) {
			t.Errorf("Cannot apply %v", oracle)
			break
		}

		c.apply(oracle)
	}

	assert.Equal(t, d.Heads(), c.Heads())
}


================================================
FILE: dep/configuration.go
================================================
package dep

import (
	"fmt"

	"github.com/chewxy/lingo"
)

// describes the current state of the parser

type head int

const (
	DOES_NOT_EXIST head = iota - 1
)

// configuration is the meat of the shift-reduce parsing. It holds the state for the shift reduction
type configuration struct {
	*lingo.Dependency
	stack  []head
	buffer []head

	bp int // buffer pointer - starts at 0, increments
}

func newConfiguration(sentence lingo.AnnotatedSentence, fromGold bool) *configuration {
	if fromGold {
		sentence = sentence.Clone()
	}

	dep := lingo.NewDependency(lingo.FromAnnotatedSentence(sentence), lingo.AllocTree())
	dep.SetID()
	sentence = sentence[1:] // because the POSTagger automatically adds a ROOTTAG at the end of it

	var buffer []head
	for i := 1; i <= len(sentence); i++ {
		buffer = append(buffer, head(i))
	}

	var stack []head
	stack = append(stack, head(0)) // add root

	return &configuration{
		Dependency: dep,
		stack:      stack,
		buffer:     buffer,
	}
}

func (c *configuration) String() string {
	return fmt.Sprintf("Stack: %v Buffer(%d): %v", c.stack, c.bp, c.buffer[c.bp:])
}

func (c *configuration) GoString() string {
	return fmt.Sprintf("Stack: %v Buffer(%d): %v\nHeads: %v\nRels: %v\n", c.stack, c.bp, c.buffer[c.bp:], c.Heads(), c.Labels())
}

func (c *configuration) bufferSize() int {
	return len(c.buffer) - c.bp
}

func (c *configuration) stackSize() int {
	return len(c.stack)
}

func (c *configuration) head(i int) head {
	heads := c.Heads() // TODO: maybe some sanity checks?
	return head(heads[i])
}

// gets the sentence index of the ith word on the stack. If there isn't anything on the stack, it returns DOES_NOT_EXIST
func (c *configuration) stackValue(i int) head {
	size := c.stackSize()
	if i >= size || i < 0 {
		return DOES_NOT_EXIST
	}
	return c.stack[size-1-i]
}

func (c *configuration) bufferValue(i int) head {
	size := c.bufferSize()
	if i >= size {
		return DOES_NOT_EXIST
	}
	return c.buffer[i+c.bp]
}

/*  stack machinations */

// pop pops the stack. It isn't really used any more. removeStack(), removeTopStack() and removeSecondTopStack() has superseded its function
func (c *configuration) pop() head {
	retVal := c.stack[len(c.stack)-1]
	c.stack = c.stack[0 : len(c.stack)-1]
	return retVal
}

// removes a value from the stack.
func (c *configuration) removeStack(i int) {
	c.stack = c.stack[:i+copy(c.stack[i:], c.stack[i+1:])]
}

// removeSecondTopStack removes the 2nd-to-last element
func (c *configuration) removeSecondTopStack() bool {
	stackSize := c.stackSize()
	if stackSize < 2 {
		return false
	}
	i := stackSize - 2
	c.removeStack(i)
	return true
}

func (c *configuration) removeTopStack() bool {
	stackSize := c.stackSize()
	if stackSize < 1 {
		return false
	}
	i := stackSize - 1
	c.removeStack(i)
	return true
}

/* Dependency related stuff */

func (c *configuration) label(i head) lingo.DependencyType {
	if i < 0 {
		return lingo.NoDepType
	}

	if i == 0 {
		return lingo.NoDepType
	}

	return c.Label(int(i))
	// i--

	// labels := c.Labels()
	// return labels[i]
}

func (c *configuration) annotation(i head) *lingo.Annotation {
	if i < 0 {
		return lingo.NullAnnotation()
	}

	if i == 0 {
		return lingo.RootAnnotation()
	}
	// i--

	return c.Annotation(int(i))

	// return c.Sentence()[i]
}

// gets the jth left child of the ith word of a sentence
func (c *configuration) lc(k, cnt head) head {
	if k < 0 || int(k) > c.N() {
		return DOES_NOT_EXIST
	}

	cc := 0
	for i := 1; i < int(k); i++ {
		if c.Head(i) == int(k) {
			cc++
			if int(cnt) == cc {
				return head(i)
			}
		}
	}
	return DOES_NOT_EXIST
}

func (c *configuration) rc(k, cnt head) head {
	if k < 0 || int(k) > c.N() {
		return DOES_NOT_EXIST
	}

	cc := 0
	for i := c.N(); i > int(k); i-- {
		if c.Head(i) == int(k) {
			cc++
			if cc == int(cnt) {
				return head(i)
			}
		}
	}
	return DOES_NOT_EXIST
}

func (c *configuration) hasOtherChildren(i int, goldParse *lingo.Dependency) bool {
	for j := 1; j <= goldParse.N(); j++ {
		if goldParse.Head(j) == i && c.Head(j) != i {
			return true
		}
	}
	return false
}

func (c *configuration) isTerminal() bool {
	return c.stackSize() == 1 && c.bufferSize() == 0
}

// Actual Transitioning stuff
func (c *configuration) shift() bool {
	i := c.bufferValue(0)
	if i == DOES_NOT_EXIST {
		return false
	}

	c.bp++ // move the buffer pointer up

	c.stack = append(c.stack, i) // push to it.... gotta work the pop
	return true
}


================================================
FILE: dep/configuration_test.go
================================================
package dep

import (
	"testing"

	"github.com/chewxy/lingo"
	"github.com/stretchr/testify/assert"
)

func TestStackAppendRemove(t *testing.T) {
	sentence := mediumSentence()[0]
	as := sentence.AnnotatedSentence(dummyFix{})

	c := newConfiguration(as, true)
	t.Logf("C: %v", c)
	t.Logf("C: %#v", c)

	assert := assert.New(t)

	c.stack = append(c.stack, 200)
	assert.Equal([]head{0, 200}, c.stack, "stack is not equal after appending")

	c.removeTopStack()
	assert.Equal([]head{0}, c.stack, "stack is not equal after removeTopStack")

	c.stack = append(c.stack, 200)
	c.removeSecondTopStack()
	assert.Equal([]head{200}, c.stack, "stack is not equal after removeSecondTopStack()")

	correctHeads := []int{-1} // the -1 is the root
	correctHeads = append(correctHeads, sentence.Heads...)
	correctLabels := []lingo.DependencyType{lingo.Root}
	correctLabels = append(correctLabels, sentence.Labels...)

	dep := sentence.Dependency(dummyFix{})
	assert.Equal(correctHeads, dep.Heads(), "Heads are not equal")
	assert.Equal(correctLabels, dep.Labels(), "Labels are not equal %v \n %v", correctLabels, dep.Labels())
}

func TestConfiguration_StackValue(t *testing.T) {
	c := new(configuration)
	c.stack = []head{0, 1, 2, 5, 6}

	zero := c.stackValue(0)
	one := c.stackValue(1)
	four := c.stackValue(4)
	five := c.stackValue(5)
	negone := c.stackValue(-1)

	assert := assert.New(t)
	assert.Equal(head(6), zero, "Zeroth value not the same")
	assert.Equal(head(5), one, "First value not the same")
	assert.Equal(head(0), four, "Fourth value not the same")
	assert.Equal(DOES_NOT_EXIST, five, "Fifth value not the same")
	assert.Equal(DOES_NOT_EXIST, negone, "NegOne value not the same")

}


================================================
FILE: dep/debug.go
================================================
// +build debug

package dep

import (
	"bytes"
	"fmt"
	"log"
	"runtime"
	"strings"
	"sync/atomic"

	"github.com/chewxy/lingo"
)

const BUILD_DEBUG = "PARSER: DEBUG BUILD"
const BUILD_DIAG = "Diagnostic Build"

const DEBUG = true

var READMEMSTATS = true

var TABCOUNT uint32 = 0

func tabcount() int {
	return int(atomic.LoadUint32(&TABCOUNT))
}

func enterLoggingContext() {
	atomic.AddUint32(&TABCOUNT, 1)
	tc := tabcount()
	log.SetPrefix(strings.Repeat("\t", tc))
}

func leaveLoggingContext() {
	tc := tabcount()
	tc--

	if tc < 0 {
		atomic.StoreUint32(&TABCOUNT, 0)
		tc = 0
	} else {
		atomic.StoreUint32(&TABCOUNT, uint32(tc))
	}
	log.SetPrefix(strings.Repeat("\t", tc))
}

func logf(format string, others ...interface{}) {
	if !DEBUG {
		return
	}
	log.Printf(format, others...)
}

func logTrainingProgress(iteration, correct, total, length, possibles int) {
	if !DEBUG {
		return
	}

	log.Printf("Iteration %d. Correct/Total: %d/%d = %.2f", iteration, correct, total, float64(correct)/float64(total))
	log.Printf("DictSize: %d/%d, load factor of: %.2f", length, possibles, float64(length)/float64(possibles))
}

func logMemStats() {
	if !DEBUG || !READMEMSTATS {
		return
	}

	var mem runtime.MemStats
	runtime.ReadMemStats(&mem)

	log.Printf("Allocated          : %.2f MB", (float64(mem.Alloc)/1024)/float64(1024))
	log.Printf("Total Allocated    : %.2f MB", (float64(mem.TotalAlloc)/1024)/float64(1024))
	log.Printf("Heap Allocted      : %.2f MB", (float64(mem.HeapAlloc)/1024)/float64(1024))
	log.Printf("Sys Total Allocated: %.2f MB", (float64(mem.HeapSys)/1024)/float64(1024))
	log.Println("----------")
}

func recoverFrom(format string, attrs ...interface{}) {
	if r := recover(); r != nil {
		log.Printf(format, attrs...)
		panic(r)
	}
}

/* Nice output of shit */
func (d *Parser) SprintFeatures(features []int) string {
	// tabcount := int(atomic.LoadUint32(&TABCOUNT))

	var buf bytes.Buffer

	for i := 0; i < 18; i++ {
		number := features[i]
		id := number - wordFeatsStartAt
		word, _ := d.corpus.Word(id)

		if word == "" {
			word = "-NULL-"
		}

		buf.WriteString(fmt.Sprintf("%d, %q, %d \n", feature(i), word, number))
	}

	for i := 0; i < 18; i++ {
		number := features[i+18]

		buf.WriteString(fmt.Sprintf("%d, %v, %d\n", feature(i+18), lingo.POSTag(number), number))
	}

	for i := 0; i < 12; i++ {
		number := features[i+36]
		id := number - labelFeatsStartAt

		buf.WriteString(fmt.Sprintf("%d, %v, %d\n", feature(i+36), lingo.DependencyType(id), number))
	}

	return buf.String()
}

func SprintScores(scores []float64, ts []transition) string {
	var buf bytes.Buffer
	for i, v := range scores {
		if i >= len(ts) {
			buf.WriteString(fmt.Sprintf("UNKNOWN TRANSITION, %v\n", v))
			continue
		}
		buf.WriteString(fmt.Sprintf("%v, %v\n", ts[i], v))
	}
	return buf.String()
}

func SprintFloatSlice(a []float64) string {
	var buf bytes.Buffer
	buf.WriteString("[")
	for i, v := range a {
		if i < len(a)-1 {
			buf.WriteString(fmt.Sprintf("%v, ", v))
		} else {
			buf.WriteString(fmt.Sprintf("%v", v))
		}
	}
	buf.WriteString("]")
	return buf.String()
}


================================================
FILE: dep/dependencyParser.go
================================================
package dep

import (
	"fmt"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/corpus"
	"github.com/pkg/errors"
)

var KnownWords *corpus.Corpus // package provided global

// Parser is the object that performs the dependency parsing
// It contains a neural network, which is the core of it.
//
// The same object can be used to train the NN
type Parser struct {
	Input  chan lingo.AnnotatedSentence
	Output chan *lingo.Dependency
	Error  chan error

	*Model
}

// New creates a new Parser
func New(m *Model) *Parser {
	d := &Parser{
		Output: make(chan *lingo.Dependency),
		Error:  make(chan error),

		Model: m,
	}

	return d
}

// Run is used when using the NN to parse a sentence. For training, see Train()
func (d *Parser) Run() {
	defer close(d.Output)
	for sentence := range d.Input {
		dep, err := d.predict(sentence)

		if err != nil {
			d.Error <- err
			return
		}
		d.Output <- dep
	}
	return
}

func (d *Parser) predict(sentence lingo.AnnotatedSentence) (*lingo.Dependency, error) {
	// defer func() {
	// 	if r := recover(); r != nil {
	// 		log.Printf("Parsing for %q", sentence.ValueString())
	// 		panic(r)
	// 	}
	// }()
	c := newConfiguration(sentence, false)

	var err error
	var argmax int
	var count int
	for !c.isTerminal() && count < 100 {
		logf("%v", c)
		if count == 99 {
			logf("TARPIT")
		}

		features := getFeatures(c, d.corpus)
		// features2 := getFeatureArray(c, d.dict)

		if argmax, err = d.nn.pred(features); err != nil {
			return nil, err
		}
		// log.Printf("Argmax: %v, len(d.ts): %v, len(transitions) %v", argmax, len(d.ts), len(transitions))
		t := transitions[argmax] // no this is NOT a mistake
		if !c.canApply(t) {
			t = transition{Shift, lingo.NoDepType} // reset
			// manual argmaxing
			switch scores := d.nn.scores.Value().Data().(type) {
			case []float32:
				var maxScore float32
				for i, kt := range d.ts {
					if scores[i] > maxScore && c.canApply(kt) {
						maxScore = scores[i]
						t = kt
					}
				}
			case []float64:
				var maxScore float64
				for i, kt := range d.ts {
					if scores[i] > maxScore && c.canApply(kt) {
						maxScore = scores[i]
						t = kt
					}
				}
			default:
				return nil, errors.Errorf("Unhandled score type %T", d.nn.scores.Value())
			}

		}
		c.apply(t)

		count++
	}
	fix(c.Dependency)
	return c.Dependency, err
}

func (d *Parser) String() string {
	var nns, ds string

	if d.corpus != nil {
		ds = fmt.Sprintf("\nDict Size: %d words\nMAXTAG: %d\nMAXDEPTYPE: %d\n", d.corpus.Size(), lingo.MAXTAG, lingo.MAXDEPTYPE)
	} else {
		ds = "\n"
	}

	if d.nn != nil && d.nn.initialized() {
		nns = fmt.Sprintf("\nNeural Network:\n=================\n%v\n", d.nn)
	}

	if !d.nn.initialized() {
		panic(fmt.Sprintf("%v", d.nn))
	}

	base := "\n\nDependency Parser Info:\n=======================\n"
	return base + ds + nns
}


================================================
FILE: dep/documentation/iamhuman.dot
================================================
digraph G {
	Node_0xc425b88740->Node_0xc425b88780[ label=Root ];
	Node_0xc425b88780->Node_0xc425b88800[ label=Cop ];
	Node_0xc425b88780->Node_0xc425b887c0[ label=NSubj ];
	Node_0xc425b88740 [ label="0: &#34;-ROOT-/ROOT_TAG&#34;" ];
	Node_0xc425b88780 [ label="3: &#34;human/JJ&#34;" ];
	Node_0xc425b887c0 [ label="1: &#34;I/PRP&#34;" ];
	Node_0xc425b88800 [ label="2: &#34;am/VBP&#34;" ];

}

================================================
FILE: dep/documentation/thecatsatonthemat.dot
================================================
digraph G {
	Node_0xc4349eeec0->Node_0xc4349eef80[ label=Root ];
	Node_0xc4349eef80->Node_0xc4349eefc0[ label=NMod ];
	Node_0xc4349eefc0->Node_0xc4349ef040[ label=Det ];
	Node_0xc4349eef80->Node_0xc4349eef00[ label=NSubj ];
	Node_0xc4349eef00->Node_0xc4349eef40[ label=Det ];
	Node_0xc4349eefc0->Node_0xc4349ef000[ label=Case ];
	Node_0xc4349eeec0 [ label="0: &#34;-ROOT-/ROOT_TAG&#34;" ];
	Node_0xc4349eef00 [ label="2: &#34;cat/NN&#34;" ];
	Node_0xc4349eef40 [ label="1: &#34;the/DT&#34;" ];
	Node_0xc4349eef80 [ label="3: &#34;sat/VBD&#34;" ];
	Node_0xc4349eefc0 [ label="6: &#34;mat/NN&#34;" ];
	Node_0xc4349ef000 [ label="4: &#34;on/IN&#34;" ];
	Node_0xc4349ef040 [ label="5: &#34;the/DT&#34;" ];

}



================================================
FILE: dep/errors.go
================================================
package dep

import (
	"fmt"

	"github.com/chewxy/lingo"
)

type componentUnavailable string

func (c componentUnavailable) Error() string     { return fmt.Sprintf("%v unavailable", c) }
func (c componentUnavailable) Component() string { return string(c) }

// TarpitError is an error when the arc-standard is stuck.
// It implements GoStringer, which when called will output the state as a string.
// It also implements lingo.Sentencer, so the offending sentence can easily be retrieved
type TarpitError struct{ *configuration }

func (err TarpitError) Error() string { return "Tarpit Error" }

// NonProjective error is the error that is emitted when the dependency tree is not projective (that is to say the children cross lines)
type NonProjectiveError struct{ *lingo.Dependency }

func (err NonProjectiveError) Error() string { return "Non-projective tree" }


================================================
FILE: dep/evaluation.go
================================================
package dep

import (
	"fmt"
	"io/ioutil"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/treebank"
)

// Performance is a tuple that holds performance information from a training session
type Performance struct {
	Iter int     // which training iteration is this?
	UAS  float64 // Unlabelled Attachment Score
	LAS  float64 // Labeled Attachment Score
	UEM  float64 // Unlabelled Exact Match
	Root float64 // Correct Roots Ratio
}

func (p Performance) String() string {
	s := `EPO: %d
UAS: %.5f
LAS: %.5f
UEM: %.5f
ROO: %.5f`

	return fmt.Sprintf(s, p.Iter, p.UAS, p.LAS, p.UEM, p.Root)
}

// performance evaluation related code goes here

// Evaluate compares predicted trees with the gold standard trees and returns a Performance. It panics if the number of predicted trees and the number of gold trees aren't the same
func Evaluate(predictedTrees, goldTrees []*lingo.Dependency) Performance {
	if len(predictedTrees) != len(goldTrees) {
		panic(fmt.Sprintf("%d predicted trees; %d gold trees. Unable to compare", len(predictedTrees), len(goldTrees)))
	}

	var correctLabels, correctHeads, correctTrees, correctRoot, sumArcs float64
	var check int

	for i, tr := range predictedTrees {
		gTr := goldTrees[i]

		if len(tr.AnnotatedSentence) != len(gTr.AnnotatedSentence) {
			sumArcs += float64(gTr.N())

			// log.Printf("WARNING: %q and %q do not have the same length", tr, gTr)
			continue
		}

		var nCorrectHead int
		for j, a := range tr.AnnotatedSentence[1:] {
			b := gTr.AnnotatedSentence[j+1]
			if a.HeadID() == b.HeadID() {
				correctHeads++
				nCorrectHead++
			}

			if a.DependencyType == b.DependencyType {
				correctLabels++
			}
			sumArcs++
		}
		if nCorrectHead == gTr.N() {
			correctTrees++
		}
		if tr.Root() == gTr.Root() {
			correctRoot++
		}

		// check 5 per iteration
		if check < 5 {
			logf("predictedHeads: \n%v\n%v\n", tr.Heads(), gTr.Heads())
			logf("Ns: %v | %v || Correct: %v", tr.N(), gTr.N(), nCorrectHead)
			check++
		}
	}

	uas := correctHeads / sumArcs
	las := correctLabels / sumArcs
	uem := correctTrees / float64(len(predictedTrees))
	roo := correctRoot / float64(len(predictedTrees))

	return Performance{UAS: uas, LAS: las, UEM: uem, Root: roo}
}

func (t *Trainer) crossValidate(st []treebank.SentenceTag) Performance {
	preds := t.predMany(st)
	golds := make([]*lingo.Dependency, len(st))

	for i, s := range st {
		golds[i] = s.Dependency(t)
	}
	return Evaluate(preds, golds)
}

func (t *Trainer) predMany(sentenceTags []treebank.SentenceTag) []*lingo.Dependency {
	retVal := make([]*lingo.Dependency, len(sentenceTags))
	for i, st := range sentenceTags {
		dep, err := t.pred(st.AnnotatedSentence(t))
		if err != nil {
			ioutil.WriteFile("fullGraph.dot", []byte(t.nn.g.ToDot()), 0644)
			panic(fmt.Sprintf("%+v", err))
		}
		retVal[i] = dep
	}
	return retVal
}

func (t *Trainer) pred(as lingo.AnnotatedSentence) (*lingo.Dependency, error) {
	d := new(Parser)
	d.Model = t.Model

	return d.predict(as)
}


================================================
FILE: dep/example.go
================================================
package dep

import (
	"math/rand"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/corpus"
	"github.com/chewxy/lingo/treebank"
)

// example is a training example.
type example struct {
	transition

	features []int // features are used in the embeddings
	labels   []int // labels are used in scoring the transitions
}

func makeExamples(sentenceTags []treebank.SentenceTag, conf NNConfig, dict *corpus.Corpus, ts []transition, f lingo.AnnotationFixer) []example {
	var examples []example

	var tarpit, nonprojective, good int
	for i, sentenceTag := range sentenceTags {
		exs, err := makeOneExample(i, sentenceTag, dict, ts, f)
		if err != nil {
			switch err.(type) {
			case TarpitError:
				tarpit++
			case NonProjectiveError:
				nonprojective++
			}
		} else {
			examples = append(examples, exs...)
			good++
		}
	}

	logf("Number of SentenceTags Generated Into Examples: %d/%d | Number of Examples: %d | Number of nonprojective examples: %d | Number of tarpit examples: %d", good, len(sentenceTags), len(examples), nonprojective, tarpit)
	return examples
}

// makeOneExample is an example of a poorly named function. It makes an example from a SentenceTag
func makeOneExample(i int, sentenceTag treebank.SentenceTag, dict *corpus.Corpus, ts []transition, f lingo.AnnotationFixer) ([]example, error) {
	var examples []example

	s := sentenceTag.AnnotatedSentence(f)
	dep := s.Dependency()
	if dep.IsProjective() {
		c := newConfiguration(s, true)

		count := 0
		for !c.isTerminal() && count < 1000 {
			if count == 999 {
				return examples, TarpitError{c}
			}

			oracle := c.oracle(dep)
			features := getFeatures(c, dict)

			labels := make([]int, MAXTRANSITION)
			for i, t := range ts {
				if t == oracle {
					labels[i] = 1
				} else if c.canApply(t) {
					labels[i] = 0
				} else {
					labels[i] = -1
				}
			}

			ex := example{transition{oracle.Move, oracle.DependencyType}, features, labels}
			examples = append(examples, ex)

			c.apply(oracle)
			count++
		}
	} else {
		return nil, NonProjectiveError{dep}
	}

	return examples, nil
}

func shuffleExamples(a []example) {
	for i := range a {
		j := rand.Intn(i + 1)
		a[i], a[j] = a[j], a[i]
	}
}


================================================
FILE: dep/example_test.go
================================================
package dep

import (
	"testing"

	"github.com/chewxy/lingo/corpus"
)

func TestMakeExamples(t *testing.T) {
	st := simpleSentence()
	dict := corpus.GenerateCorpus(st)

	exs := makeExamples(st, DefaultNNConfig, dict, transitions, dummyFix{})
	if len(exs) != 20 {
		t.Error("Expected 20 examples to be generated from simple sentence")
	}
}


================================================
FILE: dep/featureExtraction.go
================================================
package dep

import (
	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/corpus"
)

// getFeatures extracts the IDs to pass into the neural network. These IDs are used in the network to construct the  input layers
func getFeatures(c *configuration, dict *corpus.Corpus) []int {
	// logf("CONFIG: %v", c)
	wordFeats := make([]int, 0)
	posFeats := make([]lingo.POSTag, 0)
	labelFeats := make([]lingo.DependencyType, 0)
	unknownID, _ := dict.Id("-UNKNOWN-")

	for j := 2; j >= 0; j-- {
		index := c.stackValue(j)
		mor := c.annotation(index)

		if wordID, ok := dict.Id(mor.Value); ok {
			wordFeats = append(wordFeats, wordID)
		} else {
			wordFeats = append(wordFeats, unknownID)
		}
		posFeats = append(posFeats, mor.POSTag)
	}

	// logf("wordFeats: %v", wordFeats)

	for j := 0; j <= 2; j++ {
		index := c.bufferValue(j)
		mor := c.annotation(index)
		// logf("Want: %v Index: %d. Morpheme: %v", j, index, mor)

		if wordID, ok := dict.Id(mor.Value); ok {
			wordFeats = append(wordFeats, wordID)
		} else {
			wordFeats = append(wordFeats, unknownID)
		}
		posFeats = append(posFeats, mor.POSTag)
	}
	// logf("wordFeats: %v", wordFeats)

	for j := 0; j <= 1; j++ {
		k := c.stackValue(j)

		index := c.lc(k, 1)
		mor := c.annotation(index)
		if wordID, ok := dict.Id(mor.Value); ok {
			wordFeats = append(wordFeats, wordID)
		} else {
			wordFeats = append(wordFeats, unknownID)
		}
		posFeats = append(posFeats, mor.POSTag)
		labelFeats = append(labelFeats, c.label(index))

		index = c.rc(k, 1)
		mor = c.annotation(index)
		if wordID, ok := dict.Id(mor.Value); ok {
			wordFeats = append(wordFeats, wordID)
		} else {
			wordFeats = append(wordFeats, unknownID)
		}
		posFeats = append(posFeats, mor.POSTag)
		labelFeats = append(labelFeats, c.label(index))

		index = c.lc(k, 2)
		mor = c.annotation(index)
		if wordID, ok := dict.Id(mor.Value); ok {
			wordFeats = append(wordFeats, wordID)
		} else {
			wordFeats = append(wordFeats, unknownID)
		}
		posFeats = append(posFeats, mor.POSTag)
		labelFeats = append(labelFeats, c.label(index))

		index = c.rc(k, 2)
		mor = c.annotation(index)
		if wordID, ok := dict.Id(mor.Value); ok {
			wordFeats = append(wordFeats, wordID)
		} else {
			wordFeats = append(wordFeats, unknownID)
		}
		posFeats = append(posFeats, mor.POSTag)
		labelFeats = append(labelFeats, c.label(index))

		leftChild := c.lc(k, 1)
		index = c.lc(leftChild, 1)
		mor = c.annotation(index)
		if wordID, ok := dict.Id(mor.Value); ok {
			wordFeats = append(wordFeats, wordID)
		} else {
			wordFeats = append(wordFeats, unknownID)
		}
		posFeats = append(posFeats, mor.POSTag)
		labelFeats = append(labelFeats, c.label(index))

		rightChild := c.rc(k, 1)
		index = c.rc(rightChild, 1)
		mor = c.annotation(index)
		if wordID, ok := dict.Id(mor.Value); ok {
			wordFeats = append(wordFeats, wordID)
		} else {
			wordFeats = append(wordFeats, unknownID)
		}
		posFeats = append(posFeats, mor.POSTag)
		labelFeats = append(labelFeats, c.label(index))
	}

	// the embedding matrix is arranged thus:
	/*
		POSTag0 0, 1, ... 50
		POSTag1
		...
		MAXTAG-1
		DepType0
		DepType1
		...
		MAXDEPTYPE-1
		WordID0
		...
		WordIDN
	*/

	features := make([]int, MAXFEATURE)

	for i, w := range wordFeats {
		features[i] = w + wordFeatsStartAt
	}
	for i, t := range posFeats {
		features[i+POS_OFFSET] = int(t)
	}
	for i, l := range labelFeats {
		features[i+DEP_OFFSET] = int(l) + labelFeatsStartAt
	}

	return features
}

const (
	POS_OFFSET   int = 18
	DEP_OFFSET       = 36
	STACK_OFFSET     = 6
	STACK_NUMBER     = 6
)


================================================
FILE: dep/features.go
================================================
package dep

import "github.com/chewxy/lingo"

// the features are used as columns in the matrix

// go:generate stringer type=feature -output=feature_string.go
type feature int

const (
	// first 18 are word related features
	// second 18 are POS related features
	// last 12 are label related features

	s0w feature = iota
	s1w
	s2w

	b0w
	b1w
	b2w

	s0l1w
	s0r1w
	s0l2w
	s0r2w
	s0llw
	s0rrw

	s1l1w
	s1r1w
	s1l2w
	s1r2w
	s1llw
	s1rrw

	// POS related words
	s0t
	s1t
	s2t

	b0t
	b1t
	b2t

	s0l1t
	s0r1t
	s0l2t
	s0r2t
	s0llt
	s0rrt

	s1l1t
	s1r1t
	s1l2t
	s1r2t
	s1llt
	s1rrt

	// label related
	s0l1d
	s0r1d
	s0l2d
	s0r2d
	s0lld
	s0rrd

	s1l1d
	s1r1d
	s1l2d
	s1r2d
	s1lld
	s1rrd

	MAXFEATURE
)

const (
	wordFeatsStartAt  int = int(lingo.MAXTAG) + int(lingo.MAXDEPTYPE)
	labelFeatsStartAt     = int(lingo.MAXTAG)
	posFeatsStartAt       = 0
)


================================================
FILE: dep/features_string.go
================================================
// generated by stringer -type=feature -output=features_string.go; DO NOT EDIT

package dep

import "fmt"

const _feature_name = "s0ws1ws2wb0wb1wb2ws0l1ws0r1ws0l2ws0r2ws0llws0rrws1l1ws1r1ws1l2ws1r2ws1llws1rrws0ts1ts2tb0tb1tb2ts0l1ts0r1ts0l2ts0r2ts0llts0rrts1l1ts1r1ts1l2ts1r2ts1llts1rrts0l1ds0r1ds0l2ds0r2ds0llds0rrds1l1ds1r1ds1l2ds1r2ds1llds1rrdMAXFEATURE"

var _feature_index = [...]uint8{0, 3, 6, 9, 12, 15, 18, 23, 28, 33, 38, 43, 48, 53, 58, 63, 68, 73, 78, 81, 84, 87, 90, 93, 96, 101, 106, 111, 116, 121, 126, 131, 136, 141, 146, 151, 156, 161, 166, 171, 176, 181, 186, 191, 196, 201, 206, 211, 216, 226}

func (i feature) String() string {
	if i < 0 || i >= feature(len(_feature_index)-1) {
		return fmt.Sprintf("feature(%d)", i)
	}
	return _feature_name[_feature_index[i]:_feature_index[i+1]]
}


================================================
FILE: dep/fix.go
================================================
package dep

import (
	"log"

	"github.com/chewxy/lingo"
)

// applies common fixes
func fix(d *lingo.Dependency) {
	// NNP fix:
	// If a sentence is [a, b, c, D, E, f, g]
	// where D, E are NNPs, they should be compound words
	// The head should be the one with higher headID
	spans := properNounSpans(d)
	for _, s := range spans {
		// we don't care about single word proper nouns
		if s.end-s.start <= 1 {
			continue
		}

		phrase := d.AnnotatedSentence[s.start:s.end]

		// pick up all compound roots
		// find annotations that do not have compound as deptype
		var compoundRoots lingo.AnnotationSet
		var problematic lingo.AnnotationSet
		for _, a := range phrase {
			if lingo.IsCompound(a.DependencyType) {
				compoundRoots = compoundRoots.Add(a.Head)
			}

			if !lingo.IsCompound(a.DependencyType) && a.ID != s.end-1 {
				problematic = problematic.Add(a)
			}
		}

		// if no root
		if len(compoundRoots) == 0 {
			// actual root is the word with the largest ID
			var compoundRoot *lingo.Annotation
			var rootRoot *lingo.Annotation
			for last := -1; s.end+last >= s.start; last-- {
				predictedRoot := s.end + last
				compoundRoot = d.AnnotatedSentence[predictedRoot]

				// incorrects :
				//	dep==Dep
				// 	dep==Root && others has dep != root

				if compoundRoot.DependencyType == lingo.Dep {
					problematic = problematic.Add(compoundRoot)
					continue
				}

				if compoundRoot.DependencyType != lingo.Dep && compoundRoot.DependencyType != lingo.Root {
					break
				}

				if compoundRoot.DependencyType == lingo.Root {
					rootRoot = compoundRoot
					problematic = problematic.Add(compoundRoot)
				}
			}

			if rootRoot != nil && rootRoot != compoundRoot {
				// we have two potential roots. Choose the best
				log.Println("Problem when fixing: more than one possible compound root found")
			}

			for _, a := range problematic {
				if a == compoundRoot {
					continue
				}
				tmpHead := a.Head
				tmpRel := a.DependencyType

				a.SetHead(compoundRoot)
				a.DependencyType = lingo.Compound

				for _, childID := range d.AnnotatedSentence.Children(a.ID) {
					childA := d.AnnotatedSentence[childID]
					childA.SetHead(tmpHead)
					childA.DependencyType = tmpRel
				}
			}

		}

		// if more than one root...
		logf("More than zero compound roots not handled yet")

	}

	// Number fix
}

func properNounSpans(d *lingo.Dependency) (retVal []span) {
	start := -1
	end := -1
	for i, a := range d.AnnotatedSentence {
		if lingo.IsProperNoun(a.POSTag) {
			if start == -1 {
				start = i
				end = i + 1
			} else {
				end = i + 1
			}
		} else {
			if end == -1 {
				end = i
			}

			if start > -1 {
				s := makeSpan(start, end)
				retVal = append(retVal, s)
			}

			start = -1
			end = -1
		}
	}

	if start > -1 {
		s := makeSpan(start, len(d.AnnotatedSentence))
		retVal = append(retVal, s)
	}
	return
}


================================================
FILE: dep/init.go
================================================
package dep

import "github.com/chewxy/lingo/corpus"

func init() {
	c := corpus.New()
	c.Add("") // add null words

	KnownWords = c
}


================================================
FILE: dep/models.go
================================================
package dep

import (
	"bufio"
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
	"os"

	"github.com/chewxy/lingo/corpus"
	"github.com/pkg/errors"
	"gorgonia.org/tensor"
)

// Model holds the neural network that a DependencyParser uses. To train, use a Trainer
type Model struct {
	nn     *neuralnetwork2
	corpus *corpus.Corpus
	ts     []transition
}

func (m *Model) Corpus() *corpus.Corpus { return m.corpus }

func (m *Model) WordEmbeddings() *tensor.Dense {
	val := m.nn.e_w.Value().(*tensor.Dense)
	emb := val.Clone().(*tensor.Dense)
	return emb
}

func (m *Model) POSTagEmbeddings() *tensor.Dense {
	val := m.nn.e_t.Value().(*tensor.Dense)
	emb := val.Clone().(*tensor.Dense)
	return emb
}

func (m *Model) String() string {
	var buf bytes.Buffer
	buf.WriteString(m.nn.String())
	buf.WriteString("Transitions: [")
	for _, t := range m.ts {
		fmt.Fprintf(&buf, "%v, ", t)
	}
	buf.WriteString("]")
	return buf.String()
}

func (m *Model) Save(filename string) error {
	if m.nn == nil {
		return errors.Errorf("Cannot save a model with no nn")
	}

	f, err := os.Create(filename)
	if err != nil {
		return err
	}
	return m.SaveWriter(f)
}

func (m *Model) SaveWriter(f io.WriteCloser) error {
	defer f.Close()
	w := bufio.NewWriter(f)
	defer w.Flush()
	encoder := gob.NewEncoder(w)

	if err := encoder.Encode(m.corpus); err != nil {
		return err
	}

	if err := encoder.Encode(m.nn); err != nil {
		return err
	}

	// if err := encoder.Encode(m.ts); err != nil {
	// 	return err
	// }

	return nil
}

func Load(filename string) (*Model, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	return LoadReader(f)
}

func LoadReader(rd io.ReadCloser) (*Model, error) {
	defer rd.Close()
	r := bufio.NewReader(rd)
	decoder := gob.NewDecoder(r)

	m := new(Model)
	if err := decoder.Decode(&m.corpus); err != nil {
		return nil, err
	}

	m.nn = new(neuralnetwork2)
	m.nn.dict = m.corpus

	if err := decoder.Decode(&m.nn); err != nil {
		return nil, err
	}

	if err := decoder.Decode(&m.ts); err != nil {
		m.ts = transitions
	}
	m.nn.transitions = m.ts

	return m, nil

}


================================================
FILE: dep/models_test.go
================================================
package dep

import (
	"os"
	"testing"

	"github.com/stretchr/testify/assert"
	G "gorgonia.org/gorgonia"
)

func TestModel_SaveLoad(t *testing.T) {
	assert := assert.New(t)

	testFileName := "TestSave.dat"
	m := new(Model)

	// dumb shit
	if err := m.Save(testFileName); err == nil {
		t.Error("Expected an error")
	}

	conf := DefaultNNConfig
	conf.Dtype = G.Float32
	m = new(Model)
	m.ts = transitions
	m.corpus = KnownWords

	m.nn = new(neuralnetwork2)
	m.nn.NNConfig = conf
	m.nn.dict = m.corpus

	if err := m.nn.init(); err != nil {
		t.Error(err)
	}

	if err := m.Save(testFileName); err != nil {
		t.Fatal(err)
	}

	var m2 *Model
	var err error
	if m2, err = Load(testFileName); err != nil {
		t.Error(err)

	}

	assert.Equal(m.corpus, m2.corpus, "Both Dependency Parsers need to have the same dict")

	if !G.ValueEq(m.nn.w2.Value(), m2.nn.w2.Value()) {
		t.Errorf("Expected w2 to be equal")
	}
	if !G.ValueEq(m.nn.e_w.Value(), m2.nn.e_w.Value()) {
		t.Errorf("Expected e_w to be equal")
	}

	// cleanup
	if err := os.Remove(testFileName); err != nil {
		t.Error(err)
	}
}


================================================
FILE: dep/move.go
================================================
package dep

// Move is an action that the dependency parser can take - whether to Shift, Attach-Left, or AttachRight
type Move byte

//go:generate stringer -type=Move

const (
	Shift Move = iota
	Left
	Right

	MAXMOVE
)

// ALLMOVES is the set of all possible moves
var ALLMOVES = [...]Move{Left, Right, Shift}


================================================
FILE: dep/move_string.go
================================================
// generated by stringer -type=Move; DO NOT EDIT

package dep

import "fmt"

const _Move_name = "ShiftLeftRightMAXMOVE"

var _Move_index = [...]uint8{0, 5, 9, 14, 21}

func (i Move) String() string {
	if i >= Move(len(_Move_index)-1) {
		return fmt.Sprintf("Move(%d)", i)
	}
	return _Move_name[_Move_index[i]:_Move_index[i+1]]
}


================================================
FILE: dep/nn2.go
================================================
package dep

import (
	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/corpus"
	"github.com/pkg/errors"
	G "gorgonia.org/gorgonia"
	"gorgonia.org/tensor"
)

// may is a simple monad for handling errors
type may struct {
	error
	n *G.Node
}

func (m *may) doUnary(fn func(*G.Node) (*G.Node, error)) {
	if m.error != nil {
		return
	}
	m.n, m.error = fn(m.n)
}

func (m *may) doBinary(fn func(a, b *G.Node) (*G.Node, error), other *G.Node) {
	if m.error != nil {
		return
	}
	m.n, m.error = fn(m.n, other)
}

func (m *may) doSwapBinary(fn func(a, b *G.Node) (*G.Node, error), other *G.Node) {
	if m.error != nil {
		return
	}
	m.n, m.error = fn(other, m.n)
}

type neuralnetwork2 struct {
	NNConfig

	g   *G.ExprGraph
	sub *G.ExprGraph

	// model

	// embedding matrices for word, POSTags and labels respectively
	e_w *G.Node // Shape: (EmbeddingSize, DictSize)
	e_t *G.Node // Shape: (EmbeddingSize, lingo.MAXTAG)
	e_l *G.Node // Shape: (EmbeddingSize, lingo.MAXDEP)

	// w1
	w1_w *G.Node // Shape: (HiddenSize, DictSize)
	w1_t *G.Node // Shape: (HiddenSize, lingo.MAXTAG)
	w1_l *G.Node // Shape: (HiddenSize, lingo.MAXDEP)
	b    *G.Node // Shape: (HiddenSize)

	// w2
	w2 *G.Node // Shape: (MAXTRANSITION, HiddenSize)

	// selects
	x_wSelW G.Nodes // 18 - word features
	x_tSelT G.Nodes // 18 - POSTag features
	x_lSelL G.Nodes // 12 - Dependency feature

	// inputs (feature vectors built up from the selects)
	x_w *G.Node
	x_t *G.Node
	x_l *G.Node

	// outputs
	scores  *G.Node // argmax this to get the greedy decoded transition
	logProb *G.Node
	cost    *G.Node
	costVal G.Value

	vm     G.VM
	model  G.Nodes
	solver G.Solver

	dict        *corpus.Corpus
	transitions []transition

	costChan chan G.Value

	// wordfeats *G.Node
	// tagfeats  *G.Node
	// depfeats  *G.Node
	// sumfeats  *G.Node
	// act       *G.Node
}

func (nn *neuralnetwork2) initialized() bool {
	return nn.g != nil && nn.sub != nil &&
		nn.e_w != nil && nn.e_t != nil && nn.e_l != nil &&
		nn.w1_w != nil && nn.w1_t != nil && nn.w1_l != nil && nn.b != nil &&
		nn.w2 != nil && len(nn.x_wSelW) > 0 && len(nn.x_tSelT) > 0 && len(nn.x_lSelL) > 0 &&
		nn.x_w != nil && nn.x_t != nil && nn.x_l != nil &&
		nn.scores != nil &&
		nn.dict != nil && nn.vm != nil && nn.solver != nil
}

func (nn *neuralnetwork2) init() error {
	if nn.dict == nil {
		return errors.Errorf("No Corpus Provided to the Neural Network. Will be unable to decode")
	}

	g := G.NewGraph()
	nn.g = g

	word := nn.dict.Size()
	tags := int(lingo.MAXTAG)
	deps := int(lingo.MAXDEPTYPE)
	// trns := len(nn.transitions)

	wordFeats := POS_OFFSET - 0
	tagFeats := DEP_OFFSET - POS_OFFSET
	depFeats := int(MAXFEATURE) - DEP_OFFSET

	// In any case a very very very small dict was passed in
	// we set the minimum to wordFeatss
	if word < wordFeats {
		word = wordFeats
	}

	logf(`Word: %d
tags: %d
deps: %d
wordFeats: %d
tagFeats: %d
depFeats: %d
`, word, tags, deps, wordFeats, tagFeats, depFeats)

	// define inputs
	nn.x_w = G.NewVector(g, nn.Dtype, G.WithShape(wordFeats*nn.EmbeddingSize), G.WithName("word input"), G.WithInit(G.Zeroes()))
	nn.x_t = G.NewVector(g, nn.Dtype, G.WithShape(tagFeats*nn.EmbeddingSize), G.WithName("POSTag input"), G.WithInit(G.Zeroes()))
	nn.x_l = G.NewVector(g, nn.Dtype, G.WithShape(depFeats*nn.EmbeddingSize), G.WithName("word input"), G.WithInit(G.Zeroes()))

	nn.x_wSelW = make(G.Nodes, wordFeats)
	nn.x_tSelT = make(G.Nodes, tagFeats)
	nn.x_lSelL = make(G.Nodes, depFeats)

	// define models
	nn.e_w = G.NewMatrix(g, nn.Dtype, G.WithShape(word, nn.EmbeddingSize), G.WithName("e_w"), G.WithInit(G.GlorotU(1)))
	nn.e_t = G.NewMatrix(g, nn.Dtype, G.WithShape(tags, nn.EmbeddingSize), G.WithName("e_t"), G.WithInit(G.GlorotU(1)))
	nn.e_l = G.NewMatrix(g, nn.Dtype, G.WithShape(deps, nn.EmbeddingSize), G.WithName("e_l"), G.WithInit(G.GlorotU(1)))

	nn.w1_w = G.NewMatrix(g, nn.Dtype, G.WithShape(nn.HiddenSize, nn.EmbeddingSize*wordFeats), G.WithName("w1_w"), G.WithInit(G.GlorotU(1)))
	nn.w1_t = G.NewMatrix(g, nn.Dtype, G.WithShape(nn.HiddenSize, nn.EmbeddingSize*tagFeats), G.WithName("w1_t"), G.WithInit(G.GlorotU(1)))
	nn.w1_l = G.NewMatrix(g, nn.Dtype, G.WithShape(nn.HiddenSize, nn.EmbeddingSize*depFeats), G.WithName("w1_l"), G.WithInit(G.GlorotU(1)))
	nn.b = G.NewVector(g, nn.Dtype, G.WithShape(nn.HiddenSize), G.WithName("b"), G.WithInit(G.Zeroes()))

	nn.w2 = G.NewMatrix(g, nn.Dtype, G.WithShape(MAXTRANSITION, nn.HiddenSize), G.WithName("w2"), G.WithInit(G.GlorotU(1)))

	nn.model = G.Nodes{nn.e_w, nn.e_t, nn.e_l, nn.w1_w, nn.w1_t, nn.w1_l, nn.b, nn.w2}

	// define selects
	// words first
	logf("nn.e_w: %+1.1s", nn.e_w.Value())
	var err error
	for i := 0; i < wordFeats; i++ {
		if nn.x_wSelW[i], err = G.Slice(nn.e_w, G.S(i)); err != nil { // dummy slices... they'll be replaced at runtime
			return err
		}

	}

	// tag features
	for i := 0; i < tagFeats; i++ {
		if nn.x_tSelT[i], err = G.Slice(nn.e_t, G.S(i)); err != nil { // dummy slices... they'll be replaced at runtime
			return err
		}
	}

	// dependency features
	for i := 0; i < depFeats; i++ {
		if nn.x_lSelL[i], err = G.Slice(nn.e_l, G.S(i)); err != nil {
			return err
		}
	}

	// forwards
	if err = nn.fwd(); err != nil {
		return err
	}

	// backprop
	if _, err = G.Grad(nn.cost, nn.model...); err != nil {
		return err
	}

	nn.sub = g.SubgraphRoots(nn.scores)

	// prog, locmap, err := G.Compile(nn.g)
	// if err != nil {
	// 	return err
	// }
	// log.Printf("Prog: %v", prog)

	// ioutil.WriteFile("graph.dot", []byte(g.ToDot()), 0644)

	// logger := log.New(os.Stderr, "", 0)
	// nn.vm = G.NewTapeMachine(prog, locmap, G.BindDualValues(nn.model...), G.UseCudaFor(), G.WithLogger(logger), G.WithWatchlist())
	// nn.vm = G.NewTapeMachine(prog, locmap, G.BindDualValues(nn.model...), G.UseCudaFor())
	nn.vm = G.NewTapeMachine(nn.g, G.BindDualValues(nn.model...), G.UseCudaFor())
	G.BindDualValues(nn.scores)(nn.vm) // makes sure that scores is a *dualValue
	nn.solver = G.NewAdaGradSolver(G.WithLearnRate(nn.AdaAlpha), G.WithEps(nn.AdaEps), G.WithL2Reg(nn.Reg), G.WithBatchSize(float64(nn.BatchSize)))
	// nn.solver = G.NewVanillaSolver(G.WithLearnRate(nn.AdaAlpha), G.WithL2Reg(nn.Reg))
	return nil
}

func (nn *neuralnetwork2) fwd() error {
	var err error

	// build up x vectors
	if nn.x_w, err = G.Concat(0, nn.x_wSelW...); err != nil {
		return err
	}

	if nn.x_t, err = G.Concat(0, nn.x_tSelT...); err != nil {
		return err
	}

	if nn.x_l, err = G.Concat(0, nn.x_lSelL...); err != nil {
		return err
	}

	logf("w1_w %v, x_w %v", nn.w1_w.Shape(), nn.x_w.Shape())
	m_w := &may{nil, nn.w1_w}
	m_w.doBinary(G.Mul, nn.x_w)
	if m_w.error != nil {
		return m_w.error
	}

	logf("w1_t %v, x_t %v", nn.w1_t.Shape(), nn.x_t.Shape())
	m_t := &may{nil, nn.w1_t}
	m_t.doBinary(G.Mul, nn.x_t)
	if m_t.error != nil {
		return m_t.error
	}

	logf("w1_l %v, x_l %v", nn.w1_l.Shape(), nn.x_l.Shape())
	m_l := &may{nil, nn.w1_l}
	m_l.doBinary(G.Mul, nn.x_l)
	if m_l.error != nil {
		return m_l.error
	}

	// add and activate layer 1
	logf("w : %v", m_w.n.Shape())
	m_w1 := &may{nil, m_w.n}
	m_w1.doBinary(G.Add, m_t.n)
	m_w1.doBinary(G.Add, m_l.n)
	m_w1.doBinary(G.Add, nn.b)
	m_w1.doUnary(G.Cube)
	if m_w1.error != nil {
		return m_w1.error
	}

	if nn.Dropout > 0 {
		logf("Doing dropout")
		m_w1.n, m_w1.error = G.Dropout(m_w1.n, nn.Dropout)
		if m_w1.error != nil {
			return m_w1.error
		}
	}

	// go to softmax layer
	logf("w2: %v, w1act: %v", nn.w2.Shape(), m_w1.n.Shape())
	m_sm := &may{nil, nn.w2}
	m_sm.doBinary(G.Mul, m_w1.n)
	nn.scores = m_sm.n
	m_sm.doUnary(G.SoftMax)
	if m_sm.error != nil {
		return m_sm
	}

	nn.logProb = m_sm.n
	// G.WithName("Logprob")(nn.logProb)
	// log.Printf("LOGPROB %v %p %v", nn.logProb, nn.logProb, nn.logProb)
	if nn.cost, err = G.Slice(nn.logProb, G.S(0)); err != nil { // slice is a dummy tensor.Slice. It'll be replaced at runtime
		return err
	}

	G.Read(nn.cost, &nn.costVal)
	return nil
}

func (nn *neuralnetwork2) costProgress() <-chan G.Value {
	if nn.costChan == nil {
		nn.costChan = make(chan G.Value)
	}
	return nn.costChan
}

// train does one epoch of training. The examples are batched.
func (nn *neuralnetwork2) train(examples []example) error {
	size := len(examples)
	batches := size / nn.BatchSize

	var start, end int
	if nn.BatchSize > size {
		batches = 1
		end = size
		G.WithBatchSize(float64(size))(nn.solver) // set it such that the solver doesn't get confused
	} else {
		end = nn.BatchSize
	}

	for batch := 0; batch < batches; batch++ {
		for _, ex := range examples[start:end] {
			nn.feats2vec(ex.features)
			tid := lookupTransition(ex.transition, nn.transitions)

			if err := G.UnsafeLet(nn.cost, G.S(tid)); err != nil {
				return err
			}

			if err := nn.vm.RunAll(); err != nil {
				return err
			}

			nn.vm.Reset()
		}
		if err := nn.solver.Step(G.NodesToValueGrads(nn.model)); err != nil {
			err = errors.Wrapf(err, "Stepping on the model failed %v", batch)
			return err
		}

		if nn.costChan != nil {
			nn.costChan <- nn.costVal
		}

		start = end
		if start >= size {
			break
		}
		end += nn.BatchSize
		if end >= size {
			end = size
		}
	}

	return nil
}

// pred predicts the index of the transitions
func (nn *neuralnetwork2) pred(ind []int) (int, error) {
	nn.feats2vec(ind)

	// f, _ := os.OpenFile("LOOOOOG", os.O_APPEND|os.O_CREATE|os.O_RDWR, 0644)
	// logger := log.New(f, "", 0)
	// logger := log.New(os.Stderr, "", 0)

	// m := G.NewLispMachine(nn.sub, G.ExecuteFwdOnly(), G.WithLogger(logger), G.WithWatchlist(), G.LogBothDir(), G.WithValueFmt("%+3.3v"))
	m := G.NewLispMachine(nn.sub, G.ExecuteFwdOnly())
	if err := m.RunAll(); err != nil {
		return 0, err
	}
	// logger.Println("========================\n")

	val := nn.scores.Value().(tensor.Tensor)
	t, err := tensor.Argmax(val, tensor.AllAxes)
	if err != nil {
		return 0, err
	}

	return t.ScalarValue().(int), nil
}

// utility function

func (nn *neuralnetwork2) feats2vec(indicators []int) error {
	// fix word features
	for i, ind := range indicators[:POS_OFFSET] {
		if err := G.UnsafeLet(nn.x_wSelW[i], G.S(ind-wordFeatsStartAt)); err != nil {
			return err
		}
	}

	// fix tag features
	for i, ind := range indicators[POS_OFFSET:DEP_OFFSET] {
		if err := G.UnsafeLet(nn.x_tSelT[i], G.S(ind)); err != nil {
			return err
		}
	}

	for i, ind := range indicators[DEP_OFFSET:] {
		if err := G.UnsafeLet(nn.x_lSelL[i], G.S(ind-labelFeatsStartAt)); err != nil {
			return err
		}
	}

	return nil
}


================================================
FILE: dep/nn2_io.go
================================================
package dep

import (
	"bytes"
	"encoding/gob"
	"fmt"

	"github.com/pkg/errors"
	G "gorgonia.org/gorgonia"
	T "gorgonia.org/tensor"
)

var empty struct{}

func (nn *neuralnetwork2) String() string {
	s := `Config
------
%v
Info
------
Embeddings_Word       : %v
Embeddings_POStag     : %v
Embeddings_Dependency : %v
Selects_Words         : %d
Selects_POSTag        : %d
Selects_Dependency    : %d
Weights1_Word         : %v
Weights1_POSTag       : %v
Weights1_Dependency   : %v
Biases                : %v
Weights2              : %v
`

	return fmt.Sprintf(s, nn.NNConfig,
		nn.e_w.Shape(), nn.e_t.Shape(), nn.e_l.Shape(),
		len(nn.x_wSelW), len(nn.x_tSelT), len(nn.x_lSelL),
		nn.w1_w.Shape(), nn.w1_t.Shape(), nn.w1_l.Shape(),
		nn.b.Shape(), nn.w2.Shape())
}

func (nn *neuralnetwork2) GobEncode() ([]byte, error) {
	if !nn.initialized() {
		return nil, errors.Errorf("Neural network not initialized. Cannot gob")
	}

	var buf bytes.Buffer
	encoder := gob.NewEncoder(&buf)

	if err := encoder.Encode(nn.NNConfig); err != nil {
		return nil, err
	}

	if err := encoder.Encode(nn.e_w.Value()); err != nil {
		return nil, err
	}

	if err := encoder.Encode(nn.e_t.Value()); err != nil {
		return nil, err
	}

	if err := encoder.Encode(nn.e_l.Value()); err != nil {
		return nil, err
	}

	if err := encoder.Encode(nn.w1_w.Value()); err != nil {
		return nil, err
	}

	if err := encoder.Encode(nn.w1_t.Value()); err != nil {
		return nil, err
	}

	if err := encoder.Encode(nn.w1_l.Value()); err != nil {
		return nil, err
	}

	if err := encoder.Encode(nn.b.Value()); err != nil {
		return nil, err
	}

	if err := encoder.Encode(nn.w2.Value()); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func (nn *neuralnetwork2) GobDecode(buf []byte) error {
	// prechecks
	if nn.dict == nil {
		return errors.Errorf("Neural Network has no corpus attached to it (Corpuses are serialized separately).")
	}

	b := bytes.NewBuffer(buf)
	decoder := gob.NewDecoder(b)

	if err := decoder.Decode(&nn.NNConfig); err != nil {
		return err
	}

	if err := nn.init(); err != nil {
		return err
	}

	e_w := T.New(T.Of(nn.Dtype), T.WithShape(nn.e_w.Shape()...))
	if err := decoder.Decode(e_w); err != nil {
		return err
	}
	G.Let(nn.e_w, e_w)

	e_t := T.New(T.Of(nn.Dtype), T.WithShape(nn.e_t.Shape()...))
	if err := decoder.Decode(e_t); err != nil {
		return err
	}
	G.Let(nn.e_t, e_t)

	e_l := T.New(T.Of(nn.Dtype), T.WithShape(nn.e_l.Shape()...))
	if err := decoder.Decode(e_l); err != nil {
		return err
	}
	G.Let(nn.e_l, e_l)

	w1_w := T.New(T.Of(nn.Dtype), T.WithShape(nn.w1_w.Shape()...))
	if err := decoder.Decode(w1_w); err != nil {
		return err
	}
	G.Let(nn.w1_w, w1_w)

	w1_t := T.New(T.Of(nn.Dtype), T.WithShape(nn.w1_t.Shape()...))
	if err := decoder.Decode(w1_t); err != nil {
		return err
	}
	G.Let(nn.w1_t, w1_t)

	w1_l := T.New(T.Of(nn.Dtype), T.WithShape(nn.w1_l.Shape()...))
	if err := decoder.Decode(w1_l); err != nil {
		return err
	}
	G.Let(nn.w1_l, w1_l)

	bias := T.New(T.Of(nn.Dtype), T.WithShape(nn.b.Shape()...))
	if err := decoder.Decode(bias); err != nil {
		return err
	}
	G.Let(nn.b, bias)

	w2 := T.New(T.Of(nn.Dtype), T.WithShape(nn.w2.Shape()...))
	if err := decoder.Decode(w2); err != nil {
		return err
	}
	G.Let(nn.w2, w2)

	return nil
}


================================================
FILE: dep/nn2_io_test.go
================================================
package dep

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"testing"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/corpus"
	G "gorgonia.org/gorgonia"
)

func TestNNIO(t *testing.T) {
	sts := allSentences()
	nn := new(neuralnetwork2)
	nn.NNConfig = DefaultNNConfig
	nn.dict = corpus.GenerateCorpus(sts)
	nn.transitions = transitions

	if err := nn.init(); err != nil {
		t.Fatalf("%+v", err)
	}

	s := `Config
------
Batch Size               : 10000
Dropout Rate             : 0.500000
AdaGrad Eps (ε)          : 0.000001
AdaGrad Learn Rate (η)   : 0.010000
Regularization Parameter : 0.000002
Hidden Layer Size        : 200
Embedding Size           : 50
Number Precomputed       : 30000

Evaluate Per 100 Iterations
Clear Gradients Per 0 Iterations
Dtype: float64

Info
------
Embeddings_Word       : (74, 50)
Embeddings_POStag     : (%d, 50)
Embeddings_Dependency : (%d, 50)
Selects_Words         : 18
Selects_POSTag        : 18
Selects_Dependency    : 12
Weights1_Word         : (200, 900)
Weights1_POSTag       : (200, 900)
Weights1_Dependency   : (200, 600)
Biases                : (200)
Weights2              : (%d, 200)
`

	correctDesc := fmt.Sprintf(s, lingo.MAXTAG, lingo.MAXDEPTYPE, MAXTRANSITION)
	if nn.String() != correctDesc {
		t.Errorf("Oops. Got %q. Want %q", nn.String(), correctDesc)
	}
	// nn.Dtype = tensor.Float32

	var buf bytes.Buffer
	encoder := gob.NewEncoder(&buf)
	if err := encoder.Encode(nn); err != nil {
		t.Fatalf("%+v", err)
	}

	decoder := gob.NewDecoder(&buf)
	nn2 := new(neuralnetwork2)
	nn2.dict = corpus.GenerateCorpus(sts)
	nn2.transitions = transitions
	if err := decoder.Decode(nn2); err != nil {
		t.Fatal(err)
	}

	if nn.String() != correctDesc {
		t.Fatalf("Oops. Got %q. Want %q", nn.String(), correctDesc)
	}

	if !G.ValueEq(nn.e_w.Value(), nn2.e_w.Value()) {
		t.Errorf("Expected e_w to be the same. Expected %1.1s. Got %1.1s", nn.e_w.Value(), nn2.e_w.Value())
	}

	if !G.ValueEq(nn.e_t.Value(), nn2.e_t.Value()) {
		t.Errorf("Expected e_t to be the same. Expected %1.1s. Got %1.1s", nn.e_t.Value(), nn2.e_t.Value())
	}

	if !G.ValueEq(nn.e_l.Value(), nn2.e_l.Value()) {
		t.Errorf("Expected e_l to be the same. Expected %1.1s. Got %1.1s", nn.e_l.Value(), nn2.e_l.Value())
	}

	if !G.ValueEq(nn.w1_w.Value(), nn2.w1_w.Value()) {
		t.Errorf("Expected w1_w to be the same. Expected %1.1s. Got %1.1s", nn.w1_w.Value(), nn2.w1_w.Value())
	}

	if !G.ValueEq(nn.w1_t.Value(), nn2.w1_t.Value()) {
		t.Errorf("Expected w1_t to be the same. Expected %1.1s. Got %1.1s", nn.w1_t.Value(), nn2.w1_t.Value())
	}

	if !G.ValueEq(nn.w1_l.Value(), nn2.w1_l.Value()) {
		t.Errorf("Expected w1_l to be the same. Expected %1.1s. Got %1.1s", nn.w1_l.Value(), nn2.w1_l.Value())
	}

	if !G.ValueEq(nn.b.Value(), nn2.b.Value()) {
		t.Errorf("Expected b to be the same. Expected %1.1s. Got %1.1s", nn.b.Value(), nn2.b.Value())
	}

	if !G.ValueEq(nn.w2.Value(), nn2.w2.Value()) {
		t.Errorf("Expected w2 to be the same. Expected %1.1s. Got %1.1s", nn.w2.Value(), nn2.w2.Value())
	}

	t.Logf("Visual Inspection: \n%+1.8s\n%+1.8s", nn.e_w.Value(), nn2.e_w.Value())

	// special case
	buf.Reset()
	encoder = gob.NewEncoder(&buf)
	if err := encoder.Encode(nn); err != nil {
		t.Fatalf("%+v", err)
	}
	decoder = gob.NewDecoder(&buf)
	nn3 := new(neuralnetwork2)
	if err := decoder.Decode(nn3); err == nil {
		t.Error("Expected a nocorpus error")
	}
}


================================================
FILE: dep/nn2_test.go
================================================
package dep

import (
	"math/rand"
	"testing"
	"time"

	"github.com/chewxy/lingo/corpus"
	"gorgonia.org/gorgonia"
)

func TestNN2(t *testing.T) {
	rand.Seed(1337)

	// we test 50 iterations unless the short flag is passed in
	epochs := 50
	if testing.Short() {
		epochs = 10
	}

	sts := allSentences()
	nn := new(neuralnetwork2)
	nn.NNConfig = DefaultNNConfig
	nn.Dtype = gorgonia.Float32
	nn.dict = corpus.GenerateCorpus(sts)
	nn.transitions = transitions

	if err := nn.init(); err != nil {
		t.Fatalf("%+v", err)
	}

	var costs []float64
	ch := nn.costProgress()
	sigChan := make(chan struct{})

	go func(ch <-chan gorgonia.Value, sig chan struct{}) {
		for cost := range ch {
			switch c := cost.Data().(type) {
			case float32:
				costs = append(costs, float64(c))
			case float64:
				costs = append(costs, c)
			}

			t.Logf("Cost %v", cost)
		}
		sig <- struct{}{}
	}(ch, sigChan)

	exs := makeExamples(sts, nn.NNConfig, nn.dict, transitions, dummyFix{})

	start := time.Now()
	for i := 0; i < epochs; i++ {
		if err := nn.train(exs); err != nil {
			t.Errorf("%+v", err)
		}
		shuffleExamples(exs)
	}
	// simulate what *DependencyParser would do
	close(nn.costChan)
	nn.costChan = nil

	t.Logf("Training %d iterations took Taken: %v", epochs, time.Since(start))

	<-sigChan
	if len(costs) == 0 {
		t.Error("Expected some costs")
	}
	if costs[0] <= costs[len(costs)-1] {
		t.Error("Expected costs to have reduced during training")
	}

	// PREDICTION TIME!

	ss2 := simpleSentence()
	exs = makeExamples(ss2, nn.NNConfig, nn.dict, transitions, dummyFix{})
	start = time.Now()
	for i, ex := range exs {
		ind, err := nn.pred(ex.features)
		if err != nil {
			t.Errorf("Example %d failed: %v", i, err)
			continue
		}

		t.Logf("Example %d. Want: %v. Got %v. Same: %t", i, ex.transition, transitions[ind], ex.transition == transitions[ind])
	}
	t.Logf("Pred Time Taken: %v", time.Since(start))
}


================================================
FILE: dep/nnconfig.go
================================================
package dep

import (
	"bytes"
	"encoding/gob"
	"fmt"

	"github.com/pkg/errors"
	"gorgonia.org/tensor"
)

// NNConfig configures the neural network
type NNConfig struct {
	BatchSize                  int     // 10000
	Dropout                    float64 // 0.5
	AdaEps                     float64 // 1e-6
	AdaAlpha                   float64 //0.02
	Reg                        float64 // 1e-8
	HiddenSize                 int     // 200
	EmbeddingSize              int     // 50
	NumPrecomputed             int     //100000
	EvalPerIteration           int     // 100
	ClearGradientsPerIteration int     // 0

	Dtype tensor.Dtype
}

func (c NNConfig) String() string {
	s := `Batch Size               : %d
Dropout Rate             : %f
AdaGrad Eps (ε)          : %f
AdaGrad Learn Rate (η)   : %f
Regularization Parameter : %f
Hidden Layer Size        : %d
Embedding Size           : %d
Number Precomputed       : %d

Evaluate Per %d Iterations
Clear Gradients Per %d Iterations
Dtype: %v
`
	return fmt.Sprintf(s, c.BatchSize, c.Dropout, c.AdaEps, c.AdaAlpha, c.Reg, c.HiddenSize, c.EmbeddingSize, c.NumPrecomputed, c.EvalPerIteration, c.ClearGradientsPerIteration, c.Dtype)
}

// DefaultNNConfig is the default config that is passed in, for initialization purposses.
var DefaultNNConfig NNConfig

func (c NNConfig) GobEncode() ([]byte, error) {
	var buf bytes.Buffer
	encoder := gob.NewEncoder(&buf)
	encoder.Encode(c.BatchSize)
	encoder.Encode(c.Dropout)
	encoder.Encode(c.AdaEps)
	encoder.Encode(c.AdaAlpha)
	encoder.Encode(c.Reg)
	encoder.Encode(c.HiddenSize)
	encoder.Encode(c.EmbeddingSize)
	encoder.Encode(c.NumPrecomputed)
	encoder.Encode(c.EvalPerIteration)
	encoder.Encode(c.ClearGradientsPerIteration)

	switch c.Dtype {
	case tensor.Float64:
		encoder.Encode(byte(0))
	case tensor.Float32:
		encoder.Encode(byte(1))
	default:
		return nil, errors.Errorf("Unsupported Dtype to be GobEncoded")
	}
	return buf.Bytes(), nil
}

func (c *NNConfig) GobDecode(p []byte) error {
	b := bytes.NewBuffer(p)
	decoder := gob.NewDecoder(b)

	decoder.Decode(&c.BatchSize)
	decoder.Decode(&c.Dropout)
	decoder.Decode(&c.AdaEps)
	decoder.Decode(&c.AdaAlpha)
	decoder.Decode(&c.Reg)
	decoder.Decode(&c.HiddenSize)
	decoder.Decode(&c.EmbeddingSize)
	decoder.Decode(&c.NumPrecomputed)
	decoder.Decode(&c.EvalPerIteration)
	decoder.Decode(&c.ClearGradientsPerIteration)

	var bite byte
	decoder.Decode(&bite)
	switch bite {
	case 0:
		c.Dtype = tensor.Float64
	case 1:
		c.Dtype = tensor.Float32
	default:
		return errors.Errorf("Unsupported Dtype to be GobDecoded: %v", bite)
	}
	return nil
}

func init() {
	DefaultNNConfig = NNConfig{
		BatchSize: 10000,
		Dropout:   0.5,

		AdaEps:   1e-6,
		AdaAlpha: 0.01,

		Reg: 1.5e-6,

		HiddenSize:     200,
		EmbeddingSize:  50,
		NumPrecomputed: 30000,

		EvalPerIteration:           100,
		ClearGradientsPerIteration: 0,

		Dtype: tensor.Float64,
		// Dtype: gorgonia.Float32,
	}
}


================================================
FILE: dep/release.go
================================================
// +build !debug

package dep

const BUILD_DEBUG = "PARSER: RELEASE BUILD"
const BUILD_DIAG = "Non-Diagnostic Build"

const DEBUG = false

var READMEMSTATS = false

var TABCOUNT uint32 = 0

func enterLoggingContext() {}

func leaveLoggingContext() {}

func logTrainingProgress(iteration, correct, total, length, possibles int) {}

func logMemStats() {}

func logf(format string, others ...interface{}) {}

func recoverFrom(format string, attrs ...interface{}) {}

func (d *Parser) SprintFeatures(feature []int) string { return "" }

func SprintScores(scores []float64, ts []transition) string { return "" }


================================================
FILE: dep/span.go
================================================
package dep

type span struct {
	start, end int
}

func makeSpan(start, end int) span {
	if end <= start {
		panic("Impossible span created")
	}
	return span{start, end}
}

func (s span) combine(other span) span {
	start := minInt(s.start, other.start)
	end := maxInt(s.end, other.end)
	return span{start, end}
}


================================================
FILE: dep/test_test.go
================================================
package dep

import (
	"bufio"
	"crypto/md5"
	"encoding/gob"
	"fmt"
	"io"
	"log"
	"os"
	"strings"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/treebank"
	"github.com/kljensen/snowball"
)

type dummyLem struct{}

func (dummyLem) Lemmatize(s string, pt lingo.POSTag) ([]string, error) {
	return nil, componentUnavailable("lemmatizer")
}

type dummyStemmer struct{}

func (dummyStemmer) Stem(s string) (string, error) {
	return snowball.Stem(s, "english", true)
}

type dummyFix struct {
	dummyStemmer
	dummyLem
}

func (dummyFix) Clusters() (map[string]lingo.Cluster, error) {
	return nil, componentUnavailable("clusters")
}

const nnps = `1	Guerrillas	guerrilla	NOUN	NNS	Number=Plur	2	nsubj	_	_
2	threatened	threaten	VERB	VBD	Mood=Ind|Tense=Past|VerbForm=Fin	0	root	_	_
3	to	to	PART	TO	_	4	mark	_	_
4	assassinate	assassinate	VERB	VB	VerbForm=Inf	2	xcomp	_	_
5	Prime	Prime	PROPN	NNP	Number=Sing	6	compound	_	_
6	Minister	Minister	PROPN	NNP	Number=Sing	8	compound	_	_
7	Iyad	Iyad	PROPN	NNP	Number=Sing	8	compound	_	_
8	Allawi	Allawi	PROPN	NNP	Number=Sing	4	dobj	_	_
9	and	and	CONJ	CC	_	8	cc	_	_
10	Minister	Minister	PROPN	NNP	Number=Sing	14	compound	_	_
11	of	of	ADP	IN	_	12	case	_	_
12	Defense	Defense	PROPN	NNP	Number=Sing	10	nmod	_	_
13	Hazem	Hazem	PROPN	NNP	Number=Sing	14	compound	_	_
14	Shaalan	Shaalan	PROPN	NNP	Number=Sing	8	conj	_	_
15	in	in	ADP	IN	_	16	case	_	_
16	retaliation	retaliation	NOUN	NN	Number=Sing	4	nmod	_	_
17	for	for	ADP	IN	_	19	case	_	_
18	the	the	DET	DT	Definite=Def|PronType=Art	19	det	_	_
19	attack	attack	NOUN	NN	Number=Sing	16	nmod	_	_
20	.	.	PUNCT	.	_	2	punct	_	_

`
const simple = `1	Yet	yet	CONJ	CC	_	5	cc	_	_
2	we	we	PRON	PRP	Case=Nom|Number=Plur|Person=1|PronType=Prs	5	nsubj	_	_
3	did	do	AUX	VBD	Mood=Ind|Tense=Past|VerbForm=Fin	5	aux	_	_
4	n't	not	PART	RB	_	5	neg	_	_
5	charge	charge	VERB	VB	VerbForm=Inf	0	root	_	_
6	them	they	PRON	PRP	Case=Acc|Number=Plur|Person=3|PronType=Prs	5	dobj	_	_
7	for	for	ADP	IN	_	9	case	_	_
8	the	the	DET	DT	Definite=Def|PronType=Art	9	det	_	_
9	evacuation	evacuation	NOUN	NN	Number=Sing	5	nmod	_	_
10	.	.	PUNCT	.	_	5	punct	_	_

`

const med = `1	President	President	PROPN	NNP	Number=Sing	2	compound	_	_
2	Bush	Bush	PROPN	NNP	Number=Sing	5	nsubj	_	_
3	on	on	ADP	IN	_	4	case	_	_
4	Tuesday	Tuesday	PROPN	NNP	Number=Sing	5	nmod	_	_
5	nominated	nominate	VERB	VBD	Mood=Ind|Tense=Past|VerbForm=Fin	0	root	_	_
6	two	two	NUM	CD	NumType=Card	7	nummod	_	_
7	individuals	individual	NOUN	NNS	Number=Plur	5	dobj	_	_
8	to	to	PART	TO	_	9	mark	_	_
9	replace	replace	VERB	VB	VerbForm=Inf	5	advcl	_	_
10	retiring	retire	VERB	VBG	VerbForm=Ger	11	amod	_	_
11	jurists	jurist	NOUN	NNS	Number=Plur	9	dobj	_	_
12	on	on	ADP	IN	_	14	case	_	_
13	federal	federal	ADJ	JJ	Degree=Pos	14	amod	_	_
14	courts	court	NOUN	NNS	Number=Plur	11	nmod	_	_
15	in	in	ADP	IN	_	18	case	_	_
16	the	the	DET	DT	Definite=Def|PronType=Art	18	det	_	_
17	Washington	Washington	PROPN	NNP	Number=Sing	18	compound	_	_
18	area	area	NOUN	NN	Number=Sing	14	nmod	_	_
19	.	.	PUNCT	.	_	5	punct	_	_

`

const long = `1	Now	now	ADV	RB	_	5	advmod	_	_
2	,	,	PUNCT	,	_	5	punct	_	_
3	I	I	PRON	PRP	Case=Nom|Number=Sing|Person=1|PronType=Prs	5	nsubj	_	_
4	would	would	AUX	MD	VerbForm=Fin	5	aux	_	_
5	argue	argue	VERB	VB	VerbForm=Inf	0	root	_	_
6	that	that	SCONJ	IN	_	11	mark	_	_
7	one	one	PRON	PRP	_	11	nsubj	_	_
8	could	could	AUX	MD	VerbForm=Fin	11	aux	_	_
9	have	have	AUX	VB	VerbForm=Inf	11	aux	_	_
10	reasonably	reasonably	ADV	RB	_	11	advmod	_	_
11	predicted	predict	VERB	VBN	Tense=Past|VerbForm=Part	5	ccomp	_	_
12	that	that	SCONJ	IN	_	19	mark	_	_
13	some	some	DET	DT	_	14	det	_	_
14	form	form	NOUN	NN	Number=Sing	19	nsubj	_	_
15	of	of	ADP	IN	_	17	case	_	_
16	military	military	ADJ	JJ	Degree=Pos	17	amod	_	_
17	violence	violence	NOUN	NN	Number=Sing	14	nmod	_	_
18	was	be	VERB	VBD	Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin	19	cop	_	_
19	likely	likely	ADJ	JJ	Degree=Pos	11	ccomp	_	_
20	to	to	PART	TO	_	21	mark	_	_
21	occur	occur	VERB	VB	VerbForm=Inf	19	xcomp	_	_
22	in	in	ADP	IN	_	23	case	_	_
23	Lebanon	Lebanon	PROPN	NNP	Number=Sing	21	nmod	_	_
24	-LRB-	-lrb-	PUNCT	-LRB-	_	25	punct	_	_
25	considering	consider	VERB	VBG	VerbForm=Ger	19	advcl	_	_
26	that	that	SCONJ	IN	_	31	mark	_	_
27	the	the	DET	DT	Definite=Def|PronType=Art	28	det	_	_
28	country	country	NOUN	NN	Number=Sing	31	nsubj	_	_
29	has	have	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	31	aux	_	_
30	been	be	AUX	VBN	Tense=Past|VerbForm=Part	31	aux	_	_
31	experiencing	experience	VERB	VBG	Tense=Pres|VerbForm=Part	25	ccomp	_	_
32	some	some	DET	DT	_	33	det	_	_
33	form	form	NOUN	NN	Number=Sing	31	dobj	_	_
34	of	of	ADP	IN	_	35	case	_	_
35	conflict	conflict	NOUN	NN	Number=Sing	33	nmod	_	_
36	for	for	ADP	IN	_	41	case	_	_
37	approximately	approximately	ADV	RB	_	41	advmod	_	_
38	the	the	DET	DT	Definite=Def|PronType=Art	41	det	_	_
39	last	last	ADJ	JJ	Degree=Pos	41	amod	_	_
40	32	32	NUM	CD	NumType=Card	41	nummod	_	_
41	years	year	NOUN	NNS	Number=Plur	31	nmod	_	_
42	-RRB-	-rrb-	PUNCT	-RRB-	_	25	punct	_	_
43	.	.	PUNCT	.	_	5	punct	_	_

`

const cvconllu = `1	Google	Google	PROPN	NNP	Number=Sing	6	nsubj	_	_
2	is	be	VERB	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	6	cop	_	_
3	a	a	DET	DT	Definite=Ind|PronType=Art	6	det	_	_
4	nice	nice	ADJ	JJ	Degree=Pos	6	amod	_	_
5	search	search	NOUN	NN	Number=Sing	6	compound	_	_
6	engine	engine	NOUN	NN	Number=Sing	0	root	_	_
7	.	.	PUNCT	.	_	6	punct	_	_

1	Does	do	AUX	VBZ	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	3	aux	_	_
2	anybody	anybody	NOUN	NN	Number=Sing	3	nsubj	_	_
3	use	use	VERB	VB	VerbForm=Inf	0	root	_	_
4	it	it	PRON	PRP	Case=Acc|Gender=Neut|Number=Sing|Person=3|PronType=Prs	3	dobj	_	_
5	for	for	ADP	IN	_	6	case	_	_
6	anything	anything	NOUN	NN	Number=Sing	3	nmod	_	_
7	else	else	ADJ	JJ	Degree=Pos	6	amod	_	_
8	?	?	PUNCT	.	_	3	punct	_	_

`

func lotsaNNP() *lingo.Dependency {
	readr := strings.NewReader(nnps)
	sentenceTags := treebank.ReadConllu(readr)

	return sentenceTags[0].Dependency(dummyFix{})
}

// simpleSentence has 10 words
func simpleSentence() []treebank.SentenceTag {
	readr := strings.NewReader(simple)
	return treebank.ReadConllu(readr)
}

func mediumSentence() []treebank.SentenceTag {
	readr := strings.NewReader(med)
	return treebank.ReadConllu(readr)
}

// longSentence has 44 words
func longSentence() []treebank.SentenceTag {
	readr := strings.NewReader(long)
	return treebank.ReadConllu(readr)
}

func allSentences() []treebank.SentenceTag {
	sentenceTags := treebank.ReadConllu(strings.NewReader(nnps))
	sentenceTags = append(sentenceTags, treebank.ReadConllu(strings.NewReader(simple))...)
	sentenceTags = append(sentenceTags, treebank.ReadConllu(strings.NewReader(med))...)
	sentenceTags = append(sentenceTags, treebank.ReadConllu(strings.NewReader(long))...)
	return sentenceTags
}

func cvSentences() []treebank.SentenceTag {
	return treebank.ReadConllu(strings.NewReader(cvconllu))
}

func hash(s string) string {
	h := md5.New()
	io.WriteString(h, s)
	return fmt.Sprintf("%x", h.Sum(nil))
}

func cache(input string, s lingo.AnnotatedSentence) {
	hashfilename := "cached/" + hash(input) + ".cached"
	f, err := os.Create(hashfilename)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	defer w.Flush()

	encoder := gob.NewEncoder(w)

	if err := encoder.Encode(s); err != nil {
		log.Fatal(err)
	}
}

func useCached(filename string) *lingo.Dependency {
	f, err := os.Open(filename)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	r := bufio.NewReader(f)
	decoder := gob.NewDecoder(r)

	var sentence lingo.AnnotatedSentence
	if err := decoder.Decode(&sentence); err != nil {
		log.Fatal(err)
	}
	// fixes ID and what nots
	sentence.Fix()

	dep := sentence.Dependency()
	return dep
}


================================================
FILE: dep/train.go
================================================
package dep

import (
	"fmt"
	"os"
	"sync"

	"github.com/chewxy/lingo"
	"github.com/chewxy/lingo/corpus"
	"github.com/chewxy/lingo/treebank"
	"github.com/pkg/errors"
)

// TrainerConsOpt is a construction option for trainer
type TrainerConsOpt func(t *Trainer)

// WithTrainingModel loads a trainer with a model
func WithTrainingModel(m *Model) TrainerConsOpt {
	f := func(t *Trainer) {
		t.Model = m
	}
	return f
}

// WithTrainingSet creates a trainer with a training set
func WithTrainingSet(st []treebank.SentenceTag) TrainerConsOpt {
	f := func(t *Trainer) {
		t.trainingSet = st
	}
	return f
}

// WithCrossValidationSet creates a trainer with a cross validation set
func WithCrossValidationSet(st []treebank.SentenceTag) TrainerConsOpt {
	f := func(t *Trainer) {
		t.crossValSet = st
	}
	return f
}

// WithConfig sets up a *Trainer with a NNConfig
func WithConfig(conf NNConfig) TrainerConsOpt {
	f := func(t *Trainer) {
		t.nn.NNConfig = conf
		t.nn.dict = t.corpus
		t.nn.transitions = t.ts
		t.EvalPerIter = conf.EvalPerIteration
	}
	return f
}

// WithLemmatizer sets the lemmatizer option on the Trainer
func WithLemmatizer(l lingo.Lemmatizer) TrainerConsOpt {
	f := func(t *Trainer) {
		// cannot pass in itself!
		if T, ok := l.(*Trainer); ok && T == t {
			panic("Recursive definition of lemmatizer (trying to set the t.lemmatizer = T) !")
		}

		t.l = l
	}
	return f
}

// WithStemmer sets up the stemmer option on the DependencyParser
func WithStemmer(s lingo.Stemmer) TrainerConsOpt {
	f := func(t *Trainer) {
		// cannot pass in itself
		if T, ok := s.(*Trainer); ok && T == t {
			panic("Recursive setting of stemmer! (Trying to set t.stemmer = T)")
		}
		t.s = s
	}
	return f
}

// WithCluster sets the brown cluster options for the DependencyParser
func WithCluster(c map[string]lingo.Cluster) TrainerConsOpt {
	f := func(t *Trainer) {
		t.c = c
	}
	return f
}

// WithCorpus creates a Trainer with a corpus
func WithCorpus(c *corpus.Corpus) TrainerConsOpt {
	f := func(t *Trainer) {
		t.corpus = c
		t.nn.dict = c
	}
	return f
}

// WithGeneratedCorpus creates a Trainer's corpus from a list of SentenceTags. The corpus will be generated from the SentenceTags
func WithGeneratedCorpus(sts ...treebank.SentenceTag) TrainerConsOpt {
	f := func(t *Trainer) {
		dict := corpus.GenerateCorpus(sts)
		if t.corpus == nil {
			t.corpus = dict
		} else {
			t.corpus.Merge(dict)
		}

		t.nn.dict = t.corpus
	}
	return f
}

// Trainer trains a model
type Trainer struct {
	trainingSet []treebank.SentenceTag
	crossValSet []treebank.SentenceTag

	once sync.Once
	*Model

	// Training configuration
	EvalPerIter int    // for cross validation - evaluate results every n epochs
	PassDirect  bool   // Pass on the costs directly to the cost channel? If false, an average will be used
	SaveBest    string // SaveBest is the filename that will be saved. If it's empty then the best-while-training will not be saved

	// fixer
	l lingo.Lemmatizer
	s lingo.Stemmer
	c map[string]lingo.Cluster

	err  chan error
	cost chan float64
	perf chan Performance
}

// NewTrainer creates a new Trainer.
func NewTrainer(opts ...TrainerConsOpt) *Trainer {
	t := new(Trainer)
	// set up the default model
	t.Model = new(Model)
	t.corpus = KnownWords
	t.ts = transitions

	// set up the neural network
	t.nn = new(neuralnetwork2)
	t.nn.NNConfig = DefaultNNConfig
	t.nn.transitions = transitions
	t.nn.dict = KnownWords

	for _, opt := range opts {
		opt(t)
	}
	return t
}

// Lemmatize implemnets lingo.Lemmatizer
func (t *Trainer) Lemmatize(a string, pt lingo.POSTag) ([]string, error) {
	if t.l == nil {
		return nil, componentUnavailable("Lemmatizer")
	}
	return t.l.Lemmatize(a, pt)
}

// Stem implements lingo.Stemmer
func (t *Trainer) Stem(a string) (string, error) {
	if t.s == nil {
		return "", componentUnavailable("Stemmer")
	}
	return t.s.Stem(a)
}

// Clusters implements lingo.Fixer
func (t *Trainer) Clusters() (map[string]lingo.Cluster, error) {
	if t.c == nil {
		return nil, componentUnavailable("Clusters")
	}
	return t.c, nil
}

/* Getters */

// Cost returns a channel of costs for monitoring the training. If the PassDirect field in the trainer is set to true
// then the costs are directly returned. Otherwise the costs are averaged over the epoch.
func (t *Trainer) Cost() <-chan float64 {
	if t.cost == nil {
		t.cost = make(chan float64)
	}
	return t.cost
}

// Perf returns a channel of Performance for monitoring the training.
func (t *Trainer) Perf() <-chan Performance {
	if t.perf == nil {
		t.perf = make(chan Performance)
	}
	return t.perf
}

/* Methods */

// Init initializes the DependencyParser with a corpus and a neural network config
func (t *Trainer) Init() (err error) {
	f := func() {
		err = t.nn.init()
	}
	t.once.Do(f)
	return
}

// Train trains a model.
//
// If a cross validation set is provided, it will automatically train with the cross validation set
func (t *Trainer) Train(epochs int) error {
	if err := t.pretrainCheck(); err != nil {
		return err
	}
	if len(t.crossValSet) > 0 {
		return t.crossValidateTrain(epochs)
	}
	return t.train(epochs)
}

// TrainWithoutCrossValidation trains a model without cross validation.
func (t *Trainer) TrainWithoutCrossValidation(epochs int) error {
	return t.train(epochs)
}

// train simply trains the model without having a cross validation.
func (t *Trainer) train(epochs int) error {

	var epochChan chan struct{}
	if t.cost != nil {
		defer func() {
			close(t.cost)
			t.cost = nil
		}()

		epochChan = t.handleCosts()
		if epochChan != nil {
			defer close(epochChan)
		}
	}

	examples := makeExamples(t.trainingSet, t.nn.NNConfig, t.nn.dict, t.ts, t)

	for e := 0; e < epochs; e++ {
		if err := t.nn.train(examples); err != nil {
			return err
		}

		if epochChan != nil {
			epochChan <- struct{}{}
		}

		shuffleExamples(examples)
	}
	return nil
}

// crossValidateTrain trains the model but also does cross validation to ensure overfitting don't happen.
func (t *Trainer) crossValidateTrain(epochs int) error {
	if t.perf != nil {
		defer func() {
			close(t.perf)
			t.perf = nil
		}()
	}

	var epochChan chan struct{}
	if t.cost != nil {
		defer func() {
			close(t.cost)
			t.cost = nil
		}()

		epochChan = t.handleCosts()
		if epochChan != nil {
			defer close(epochChan)
		}
	}
	examples := makeExamples(t.trainingSet, t.nn.NNConfig, t.nn.dict, t.ts, t)

	var best Performance
	for e := 0; e < epochs; e++ {
		if err := t.nn.train(examples); err != nil {
			return err
		}

		if t.EvalPerIter > 0 && e%t.EvalPerIter == 0 || e == epochs-1 {
			perf := t.crossValidate(t.crossValSet)

			// if there is a channel to report back the performance, send it down
			if t.perf != nil {
				perf.Iter = e
				t.perf <- perf
			}

			if perf.UAS > best.UAS {
				best = perf

				if t.SaveBest != "" {
					f, err := os.Create(t.SaveBest)
					if err != nil {
						err = errors.Wrapf(err, "Unable to open SaveBest file %q", t.SaveBest)
						return err
					}

					t.Model.SaveWriter(f)
				}
			}
		}

		if epochChan != nil {
			epochChan <- struct{}{}
		}

		shuffleExamples(examples)
	}
	return nil
}

// pretrainCheck checks if everything is sane
func (t *Trainer) pretrainCheck() error {
	// check
	if t.nn == nil || !t.nn.initialized() {
		return errors.Errorf("DependencyParser not init()'d. Perhaps you forgot to call .Init() somewhere?")
	}

	if len(t.trainingSet) == 0 {
		return errors.Errorf("Cannot train with no training data set")
	}

	return nil
}

// handleCosts handles the costs from the neural network in two ways:
//		1. pass: directly passes on the costs (which may come from multiple batches in an epoch)
//		2. mean: calculates the mean of the costs and passes it on into d.cost
//
// If d.cost is nil, it simply returns. This method should be called after a check that d.cost is not nil
func (t *Trainer) handleCosts() (epochChan chan struct{}) {
	nncost := t.nn.costProgress()

	if t.PassDirect {
		go func() {
			for cost := range nncost {
				switch c := cost.Data().(type) {
				case float32:
					t.cost <- float64(c)
				case float64:
					t.cost <- c
				default:
					// this should NEVER happen
					panic(fmt.Sprintf("Unhandled cost type %T", c))
				}
			}
		}()
	} else {
		epochChan = make(chan struct{})

		// it collects the costs until the epoch chan signals that an epoch is done. Then the cost is averaged and sent down the d.cost channel
		go func(epochChan chan struct{}) {
			var collected []float64
			for {
				select {
				case cost := <-nncost:
					switch c := cost.Data().(type) {
					case float32:
						collected = append(collected, float64(c))
					case float64:
						collected = append(collected, c)
					default:
						// this should NEVER happen
						panic(fmt.Sprintf("Unhandled cost type %T", c))
					}
				case <-epochChan:
					var avg float64
					for _, cost := range collected {
						avg += cost
					}

					if len(collected) > 0 {
						avg /= float64(len(collected))
					}

					t.cost <- avg
					collected = collected[:0]
				}
			}
		}(epochChan)
	}
	return
}


================================================
FILE: dep/train_test.go
================================================
package dep

import (
	"testing"

	"github.com/chewxy/lingo/corpus"

	G "gorgonia.org/gorgonia"
)

func TestTrainerInitializations(t *testing.T) {
	var d *Trainer
	c := corpus.New()

	d = NewTrainer(WithCorpus(c))
	if d.corpus != c {
		t.Errorf("Expected Corpus to be set to %p. Got %p instead", c, d.corpus)
	}

	d = NewTrainer(WithConfig(DefaultNNConfig))
	if d.corpus != KnownWords {
		t.Error("Expected corpus to be set to the default KnownWords corpus")
	}
	if d.nn == nil {
		t.Fatal("Expected a neural network")
	}
	if d.nn.dict != KnownWords {
		t.Error("Expected neuralnetwork's dict to be set")
	}

	// d2 = d.Clone()
	// if d2.nn != d.nn {
	// 	t.Error("Expected a neural network!")
	// }

	// // init empty
	// d = New()
	// if err := d.Init(); err != nil {
	// 	t.Errorf("%+v", err)
	// }

	// // init with a corpus
	// d = New(WithCorpus(c))
	// if err := d.Init(); err != nil {
	// 	t.Errorf("%+v", err)
	// }
}

func TestTrainer_train(t *testing.T) {
	sts := allSentences()
	epochs := 10

	var err error

	trainer := NewTrainer(WithGeneratedCorpus(sts...), WithTrainingSet(sts))
	if err = trainer.Train(epochs); err == nil {
		t.Error("Expected an error when training an uninitialized Trainer")
	}

	// with init
	t.Logf("Pass On Costs Directly")
	conf := DefaultNNConfig
	conf.BatchSize = 90
	trainer = NewTrainer(WithGeneratedCorpus(sts...), WithConfig(conf), WithTrainingSet(sts))
	if err := trainer.Init(); err != nil {
		t.Errorf("%+v", err)
	}
	trainer.PassDirect = true

	var costs []float64
	cost := trainer.Cost()

	go func() {
		for c := range cost {
			costs = append(costs, c)
			t.Logf("Cost %v", c)
		}
	}()

	if err = trainer.Train(epochs); err != nil {
		t.Errorf("Err: %v", err)
	}

	if len(costs) == 0 {
		t.Errorf("Zero costs...")
		goto avgcosts
	}

	t.Logf("Costs %d", len(costs))
	if len(costs) < (epochs*2)-5 { // we'll allow some tolerance
		t.Errorf("Expected some costs")
	}
	if costs[0] < costs[len(costs)-1] {
		t.Errorf("Costs should be reducing")
	}

avgcosts:
	// with init, avg costs
	t.Logf("Average Costs")
	costs = costs[:0] // reset
	conf = DefaultNNConfig
	conf.Dtype = G.Float32

	trainer = NewTrainer(WithGeneratedCorpus(sts...), WithConfig(conf), WithTrainingSet(sts))
	if err := trainer.Init(); err != nil {
		t.Errorf("%+v", err)
	}
	trainer.PassDirect = false

	cost = trainer.Cost()

	go func() {
		for c := range cost {
			costs = append(costs, c)
			t.Logf("Cost %v", c)
		}
	}()
	if err = trainer.Train(epochs); err != nil {
		t.Errorf("%v", err)
	}

	if len(costs) == 0 {
		t.Fatal("Zero costs")
	}

	t.Logf("Costs %d", len(costs))
	if len(costs) == 0 {
		t.Errorf("Expected some costs")
	}

	if costs[0] < costs[len(costs)-1] {
		t.Errorf("Costs should be reducing")
	}
}

func TestTestTrainer_crossValidateTrain(t *testing.T) {
	sts := allSentences()
	cv := cvSentences()
	epochs := 10

	var trainer *Trainer
	var err error

	// uninit
	t.Logf("Uninitiated")
	trainer = NewTrainer(WithGeneratedCorpus(sts...))
	if err = trainer.Train(epochs); err == nil {
		t.Errorf("Expected an error when training with an uninitialized Trainer")
	}

	// with init
	t.Logf("Pass On Costs Directly")
	conf := DefaultNNConfig
	conf.BatchSize = 90
	trainer = NewTrainer(WithGeneratedCorpus(sts...), WithConfig(conf), WithTrainingSet(sts), WithCrossValidationSet(cv))
	trainer.PassDirect = true
	if err := trainer.Init(); err != nil {
		t.Errorf("%+v", err)
	}

	var costs []float64
	cost := trainer.Cost()
	perf := trainer.Perf()

	go func() {
		for p := range perf {
			t.Logf("Perf \n%v", p)
		}
	}()

	go func() {
		for c := range cost {
			costs = append(costs, c)
			t.Logf("Cost %v", c)
		}
	}()
	if err = trainer.Train(epochs); err != nil {
		t.Error(err)
	}

	if len(costs) == 0 {
		t.Errorf("Zero costs")
		goto avgCosts
	}

	t.Logf("Costs %d", len(costs))
	if len(costs) < (epochs*2)-5 { // we'll allow some tolerance
		t.Errorf("Expected some costs")
	}
	if costs[0] < costs[len(costs)-1] {
		t.Errorf("Costs should be reducing")
	}

avgCosts:
	// with init, avg costs, and using float32
	t.Logf("Average Costs")
	costs = costs[:0] // reset
	conf = DefaultNNConfig
	conf.Dtype = G.Float32
	trainer = NewTrainer(WithGeneratedCorpus(sts...), WithConfig(conf), WithTrainingSet(sts), WithCrossValidationSet(cv))
	if err := trainer.Init(); err != nil {
		t.Errorf("%+v", err)
	}
	trainer.PassDirect = false

	cost = trainer.Cost()
	perf = trainer.Perf()

	go func() {
		for p := range perf {
			t.Logf("Perf \n%v", p)
		}
	}()

	go func() {
		for c := range cost {
			costs = append(costs, c)
			t.Logf("Cost %v", c)
		}
	}()
	trainer.Train(epochs)

	if len(costs) == 0 {
		t.Fatal("Zero costs")
	}

	t.Logf("Costs %d", len(costs))
	if len(costs) == 0 {
		t.Errorf("Expected some costs")
	}

	if costs[0] < costs[len(costs)-1] {
		t.Errorf("Costs should be reducing")
	}
}


================================================
FILE: dep/transition.go
================================================
package dep

import (
	"fmt"

	"github.com/chewxy/lingo"
)

// transition is a tuple of Move and label
type transition struct {
	Move
	lingo.DependencyType
}

var transitions []transition
var MAXTRANSITION int

func buildTransitions(labels []lingo.DependencyType) []transition {
	ts := make([]transition, 0)
	// for _, l := range labels {
	// 	if l == lingo.NoDepType {
	// 		continue
	// 	}
	// 	t := transition{Left, l}
	// 	ts = append(ts, t)
	// }

	// for _, l := range labels {
	// 	if l == lingo.NoDepType {
	// 		continue
	// 	}

	// 	t := transition{Right, l}
	// 	ts = append(ts, t)
	// }

	// ts = append(ts, transition{Shift, lingo.NoDepType})

	for _, m := range ALLMOVES {
		for _, l := range labels {
			if (m == Shift && l != lingo.NoDepType) || (m != Shift && l == lingo.NoDepType) {
				continue
			}
			t := transition{m, l}
			ts = append(ts, t)
		}
	}
	return ts
}

func (t transition) String() string {
	return fmt.Sprintf("(%s, %s)", t.Move, t.DependencyType)
}

func lookupTransition(t transition, table []transition) int {
	for i, v := range table {
		if v == t {
			return i
		}
	}
	panic(fmt.Sprintf("Transition %v not found", t))
}

// this builds the default transitions
func init() {
	lbls := make([]lingo.DependencyType, lingo.MAXDEPTYPE)

	for i := 0; i < int(lingo.MAXDEPTYPE); i++ {
		lbls[i] = lingo.DependencyType(i)
	}

	transitions = buildTransitions(lbls)
	MAXTRANSITION = len(transitions)
}


================================================
FILE: dep/util.go
================================================
package dep

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}


================================================
FILE: dependency.go
================================================
package lingo

import (
	"bytes"
	"fmt"
)

// Dependency represents the dependency parse of a sentence. While AnnotatedSentence does
// already do a job of representing the dependency parse of a sentence, *Dependency actually contains
// meta information about the dependency parse (specifically, lefts, rights) that makes parsing a dependency a lot faster
//
// The fields are mostly left unexported for a good reason - a dependency parse SHOULD be static after it's been built
type Dependency struct {
	AnnotatedSentence

	wordCount int

	lefts  [][]int
	rights [][]int

	counter int // for checking if a tree is projective

	n int
}

type depConsOpt func(*Dependency)

// FromAnnotatedSentence creates a dependency from an AnnotatedSentence.
func FromAnnotatedSentence(s AnnotatedSentence) depConsOpt {
	fn := func(d *Dependency) {
		wc := len(s)
		d.AnnotatedSentence = s
		d.wordCount = wc
		d.n = wc - 1
	}
	return fn
}

// AllocTree allocates the lefts and rights. Typical construction of the *Dependency doesn't allocate the trees as they're not necessary for a number of tasks.
func AllocTree() depConsOpt {
	fn := func(d *Dependency) {
		d.lefts = make([][]int, d.wordCount)
		d.rights = make([][]int, d.wordCount)
		for i := 0; i < d.wordCount; i++ {
			d.lefts[i] = make([]int, 0)
			d.rights[i] = make([]int, 0)
		}
	}
	return fn
}

// NewDependency creates a new *Dependency. It takes optional construction options:
//		FromAnnotatedSentence
//		AllocTree
func NewDependency(opts ...depConsOpt) *Dependency {
	d := new(Dependency)

	for _, opt := range opts {
		opt(d)
	}
	return d
}

func (d *Dependency) Sentence() AnnotatedSentence { return d.AnnotatedSentence }
func (d *Dependency) Lefts() [][]int              { return d.lefts }
func (d *Dependency) Rights() [][]int             { return d.rights }
func (d *Dependency) WordCount() int              { return d.wordCount }
func (d *Dependency) N() int                      { return d.n }

// please only use these for testing
func (d *Dependency) SetLefts(l [][]int)  { d.lefts = l }
func (d *Dependency) SetRights(r [][]int) { d.rights = r }

func (d *Dependency) Head(i int) int {
	if i < 0 || i >= d.wordCount || d.AnnotatedSentence[i].Head == nil {
		return -1
	}

	return d.AnnotatedSentence[i].HeadID()
}

func (d *Dependency) Label(i int) DependencyType {
	if i < 0 || i >= d.wordCount {
		return NoDepType
	}

	return d.AnnotatedSentence[i].DependencyType
}

func (d *Dependency) Annotation(i int) *Annotation {
	if i < 0 || i >= d.wordCount {
		return nullAnnotation
	}

	return d.AnnotatedSentence[i]
}

func (d *Dependency) AddArc(head, child int, label DependencyType) {
	d.AddChild(head, child)
	d.AddRel(child, label)
}

func (d *Dependency) AddChild(head, child int) {
	headAnn := d.AnnotatedSentence[head]
	d.AnnotatedSentence[child].SetHead(headAnn)

	if child < head {
		d.lefts[head] = append(d.lefts[head], child)
	} else {
		d.rights[head] = append(d.rights[head], child)
	}

	d.n++
}

func (d *Dependency) AddRel(child int, rel DependencyType) {
	// d.labels[child] = rel
	d.AnnotatedSentence[child].DependencyType = rel
}

func (d *Dependency) HasSingleRoot() bool {
	roots := 0
	for _, a := range d.AnnotatedSentence {
		h := a.HeadID()
		if h == 0 {
			roots++
		}
	}

	return roots == 1
}

func (d *Dependency) IsLegal() bool {
	var heads []int
	for _, a := range d.AnnotatedSentence {
		h := a.HeadID()
		if h < 0 || h > d.wordCount {
			return false
		}
		heads = append(heads, -1)
	}

	for i := 1; i < d.wordCount; i++ {
		for k := i; k > 0; {
			if heads[k] >= 0 && heads[k] < 1 {
				break
			}
			if heads[k] == i {
				return false
			}
			heads[k] = i
			k = d.AnnotatedSentence[k].HeadID()
		}
	}

	return true
}

func (d *Dependency) IsProjective() bool {
	d.counter = -1
	return d.projectiveVisit(0)
}

func (d *Dependency) projectiveVisit(w int) bool {
	for i := 1; i < w; i++ {
		if d.AnnotatedSentence[i].HeadID() == w && d.projectiveVisit(i) == false {
			return false
		}
	}

	d.counter++

	if w != d.counter {
		return false
	}

	for i := w + 1; i < d.wordCount; i++ {
		if d.AnnotatedSentence[i].HeadID() == w && d.projectiveVisit(i) == false {
			return false
		}
	}

	return true
}

func (d *Dependency) Root() int {
	for i := 1; i <= d.n; i++ {
		if d.Head(i) == 0 {
			return i
		}
	}

	return 0
}

func (d *Dependency) SprintRel() string {
	var buf bytes.Buffer

	for _, e := range d.Edges() {
		fmt.Fprintf(&buf, "%v(%q-%d, %q-%d)\n", e.Rel, e.Gov.Value, e.Gov.ID, e.Dep.Value, e.Dep.ID)
	}

	return buf.String()
}

type DependencyEdge struct {
	Gov *Annotation
	Dep *Annotation
	Rel DependencyType
}

// Sort interface

type edgeByID []DependencyEdge

func (b edgeByID) Len() int           { return len(b) }
func (b edgeByID) Swap(i, j int)      { b[i], b[j] = b[j], b[i] }
func (b edgeByID) Less(i, j int) bool { return b[i].Dep.ID < b[j].Dep.ID }


================================================
FILE: dependencyTree.go
================================================
package lingo

import (
	"github.com/awalterschulze/gographviz"

	"fmt"

	"sync"
)

// A DependencyTree is an alternate form of representing a dependency parse.
// This form makes it easier to traverse the tree
type DependencyTree struct {
	Parent *DependencyTree

	ID   int            // the word number in a sentence
	Type DependencyType // refers to the dependency type to the parent
	Word *Annotation

	Children []*DependencyTree
}

func NewDependencyTree(parent *DependencyTree, ID int, ann *Annotation) *DependencyTree {
	return &DependencyTree{
		Parent:   parent,
		ID:       ID,
		Word:     ann,
		Children: make([]*DependencyTree, 0),
	}
}

func (d *DependencyTree) AddChild(child *DependencyTree) {
	d.Children = append(d.Children, child)
}

func (d *DependencyTree) AddRel(rel DependencyType) {
	d.Type = rel
}

func (d *DependencyTree) walk(c chan *DependencyTree, wg *sync.WaitGroup) {
	defer wg.Done()

	for _, child := range d.Children {
		wg.Add(1)
		go child.walk(c, wg)
	}
	c <- d // man someone should do somehting about my bad naming
}

func (d *DependencyTree) Dot() string {
	// walk graph
	c := make(chan *DependencyTree)
	out := make(chan string)

	go dotString(c, out)
	var wg sync.WaitGroup
	wg.Add(1)
	go d.walk(c, &wg)

	wg.Wait()
	close(c)
	return <-out
}

func dotString(c chan *DependencyTree, out chan string) {
	g := gographviz.NewEscape()
	g.SetName("G")
	g.SetDir(true) // it's always going to be a directed graph
	// g.AddNode("G", "Node_0x0", nil) // add the root

	for t := range c {
		id := fmt.Sprintf("Node_%p", t)
		attrs := map[string]string{
			"label": fmt.Sprintf("%d: \"%s/%s\"", t.ID, t.Word.Value, t.Word.POSTag),
		}
		g.AddNode("G", id, attrs)

		if t.Parent == nil {
			continue
		}

		parentID := fmt.Sprintf("Node_%p", t.Parent)
		edgeAttrs := map[string]string{
			"label": fmt.Sprintf("%v", t.Type),
		}
		g.AddEdge(parentID, id, true, edgeAttrs)
	}
	out <- g.String()
}

func (d *DependencyTree) Walk(fn func(interface{})) {
	for _, child := range d.Children {
		child.Walk(fn)
	}

	if fn != nil {
		fn(d)
	}
}


================================================
FILE: dependencyType.go
================================================
package lingo

import (
	"fmt"
	"strings"
)

// DependencyType represents the relation between two words
type DependencyType byte

var dependencyTypeLookup map[string]DependencyType

func init() {
	dependencyTypeLookup = make(map[string]DependencyType)
	for dt := NoDepType; dt < MAXDEPTYPE; dt++ {
		s := dt.String()
		dependencyTypeLookup[s] = DependencyType(dt)
		dependencyTypeLookup[strings.ToLower(s)] = DependencyType(dt)
	}
}

func (dt DependencyType) MarshalText() ([]byte, error) {
	return []byte(fmt.Sprintf("%v", dt)), nil
}

func (dt *DependencyType) UnmarshalText(text []byte) error {
	str := strings.Trim(string(text), `"`) // for JSON use, if any
	deptype, _ := dependencyTypeLookup[str]
	*dt = deptype
	return nil
}

// list of dependency type functions

func InDepTypes(x DependencyType, set []DependencyType) bool {
	for _, v := range set {
		if v == x {
			return true
		}
	}
	return false
}

func IsModifier(x DependencyType) bool      { return InDepTypes(x, Modifiers) }
func IsCompound(x DependencyType) bool      { return InDepTypes(x, Compounds) }
func IsDeterminerRel(x DependencyType) bool { return InDepTypes(x, DeterminerRels) }
func IsMultiword(x DependencyType) bool     { return InDepTypes(x, MultiWord) }
func IsQuantifier(x DependencyType) bool    { return InDepTypes(x, QuantifingMods) }


================================================
FILE: dependencyType_stanford.go
================================================
// +build stanfordrel

package lingo

const BUILD_RELSET = "stanfordrel"

//go:generate stringer -type=DependencyType -output=dependencyType_stanford_string.go

// http://nlp.stanford.edu/software/dependencies_manual.pdf
const (
	NoDepType DependencyType = iota
	Dep
	Root
	Aux           // Auxilliary
	AuxPass       // passive auxiliary
	Cop           // Copula
	Arg           // argument
	Agent         // agent
	Comp          // Complement
	AComp         // adjectival complement
	CComp         // clausal complement with internal subject
	XComp         // clausal complement with external subject
	Obj           // Object
	DObj          // Direct Object
	IObj          // Indirect Object
	PObj          // Object of preposition
	Subj          // subject
	NSubj         // Nominal Subject
	NSubjPass     // passive nominal subject
	CSubj         // clausal subject
	CSubjPass     // passive clausal subject
	Coordination  // coordination (cannot use CC, as it's a POSTag)
	Conj          // conjunction
	Expl          // Expletive
	Mod           // modifier
	AMod          // adjectival modifier
	Appos         // Appositional modifier
	Advcl         // adverbial clause modifier
	Det           // determiner
	Predet        // predeterminer
	Preconj       // Preconjunction
	Vmod          // reduced, nonfinite verbal modifier
	MWE           // multiword expression modifier
	Mark          // marker (word introducing an Advcl or CComp)
	AdvMod        // adverbial modifier
	Neg           // negation modifier
	RCMod         // relative clause modifier
	QuantMod      // quantifier modifier
	NounMod       // Noun Compound Modifier (cannot use NN because NN is defined as a POSTag)
	NPAdvMod      // Noun phrase adverbial modifier
	TMod          // temporal modifier
	Num           // Numeric Modifier
	NumberElement // element of compound number (cannot use Number because Number is defined as a LexemeType)
	Prep          // prepositional modifier
	Poss          // possession modifier
	Possessive    // possessive modifier ('s)
	PRT           // phrasal verb partical
	Parataxis     // Parataxis (words that are next to each other)
	GoesWith      // GoesWith
	Punct         // punctuation
	Ref           // referant
	SDep          // Semantic Dependent
	XSubj         // controlling subject

	// additional stuff not found in the original, but found in EWT
	Case
	Compound
	NMod
	Discourse
	NumMod
	RelCl
	NFinCl
	NMod_Poss
	NMod_NPMod
	Vocative
	List
	MWPrep // multiword prepositional modifier
	Remnant
	Acl
	NPMod
	MDVod
	DetMod

	// found in stanford nnparser SD models
	PComp

	MAXDEPTYPE
)

var Modifiers = []DependencyType{AMod}
var Compounds = []DependencyType{Compound}
var DeterminerRels = []DependencyType{Det, DetMod}
var MultiWord = []DependencyType{MWE, MWPrep, Compound, Parataxis}
var QuantifingMods = []DependencyType{QuantMod, NumMod}


================================================
FILE: dependencyType_stanford_string.go
================================================
// +build stanfordrel

// Code generated by "stringer -type=DependencyType -output=dependencyType_stanford_string.go"; DO NOT EDIT

package lingo

import "fmt"

const _DependencyType_name = "NoDepTypeDepRootNSubjNSubjPassDObjIObjCSubjCSubjPassCCompXCompNumModApposNModAClACl_RelClDetDet_PreDetAModNegCaseNMod_NPModNMod_TModNMod_PossAdvClAdvModCompoundCompound_PartMWEListParataxisDiscourseExplAuxAuxPassCopMarkPunctConjCoordinationCC_PreConjMAXDEPTYPE"

var _DependencyType_index = [...]uint16{0, 9, 12, 16, 21, 30, 34, 38, 43, 52, 57, 62, 68, 73, 77, 80, 89, 92, 102, 106, 109, 113, 123, 132, 141, 146, 152, 160, 173, 176, 180, 189, 198, 202, 205, 212, 215, 219, 224, 228, 240, 250, 260}

func (i DependencyType) String() string {
	if i >= DependencyType(len(_DependencyType_index)-1) {
		return fmt.Sprintf("DependencyType(%d)", i)
	}
	return _DependencyType_name[_DependencyType_index[i]:_DependencyType_index[i+1]]
}


================================================
FILE: dependencyType_universal.go
================================================
// +build !stanfordrel

package lingo

const BUILD_RELSET = "universalrel"

//go:generate stringer -type=DependencyType -output=dependencyType_universal_string.go

// http://universaldependencies.github.io/docs/en/dep/all.html
const (
	NoDepType DependencyType = iota
	Dep
	Root

	// Core dependents of clausal predicates

	// nominal dependencies
	NSubj
	NSubjPass
	DObj
	IObj

	// predicate dependencies
	CSubj
	CSubjPass
	CComp

	XComp

	// Noun dependents

	// nominal dependencies
	NumMod
	Appos
	NMod

	// predicate dependencies
	ACl
	ACl_RelCl // RCMod in stanford deps
	Det
	Det_PreDet

	// modifier word
	AMod
	Neg

	// Case Marking, preposition, possessive
	Case

	//Non-Core Dependents of Clausal Predicates

	// Nominal dependencies
	NMod_NPMod
	NMod_TMod
	NMod_Poss

	// Predicate Dependencies
	AdvCl

	// Modifier Word
	AdvMod

	// Compounding and Unanalyzed
	Compound
	Compound_Part
	Name // Unused in English
	MWE
	Foreign  // Unused in English
	GoesWith // Unused in English

	// Loose Joining Relations
	List
	Dislocated // Unused in English
	Parataxis
	Remnant    // Unused in English
	Reparandum // Unused in English

	// Special Clausal Dependents

	// Nominal Dependent
	Vocative // Unused in English
	Discourse
	Expl

	// Auxilliary
	Aux
	AuxPass
	Cop

	// Other
	Mark
	Punct

	// Coordination

	Conj
	Coordination // CC
	CC_PreConj

	MAXDEPTYPE
)

var Modifiers = []DependencyType{AMod}
var Compounds = []DependencyType{Compound, Compound_Part}
var DeterminerRels = []DependencyType{Det, Det_PreDet}
var MultiWord = []DependencyType{MWE, Compound, Compound_Part, Parataxis}
var QuantifingMods = []DependencyType{NumMod}


================================================
FILE: dependencyType_universal_string.go
================================================
// +build !stanfordrel

// Code generated by "stringer -type=DependencyType -output=dependencyType_universal_string.go"; DO NOT EDIT

package lingo

import "fmt"

const _DependencyType_name = "NoDepTypeDepRootNSubjNSubjPassDObjIObjCSubjCSubjPassCCompXCompNumModApposNModAClACl_RelClDetDet_PreDetAModNegCaseNMod_NPModNMod_TModNMod_PossAdvClAdvModCompoundCompound_PartNameMWEForeignGoesWithListDislocatedParataxisRemnantReparandumVocativeDiscourseExplAuxAuxPassCopMarkPunctConjCoordinationCC_PreConjMAXDEPTYPE"

var _DependencyType_index = [...]uint16{0, 9, 12, 16, 21, 30, 34, 38, 43, 52, 57, 62, 68, 73, 77, 80, 89, 92, 102, 106, 109, 113, 123, 132, 141, 146, 152, 160, 173, 177, 180, 187, 195, 199, 209, 218, 225, 235, 243, 252, 256, 259, 266, 269, 273, 278, 282, 294, 304, 314}

func (i DependencyType) String() string {
	if i >= DependencyType(len(_DependencyType_index)-1) {
		return fmt.Sprintf("DependencyType(%d)", i)
	}
	return _DependencyType_name[_DependencyType_index[i]:_DependencyType_index[i+1]]
}


================================================
FILE: errors.go
================================================
package lingo

type componentUnavailable interface {
	error
	Component() string
}


================================================
FILE: go.mod
================================================
module github.com/chewxy/lingo

require (
	github.com/abiosoft/ishell v2.0.0+incompatible
	github.com/abiosoft/readline v0.0.0-20180607040430-155bce2042db // indirect
	github.com/awalterschulze/gographviz v0.0.0-20190221210632-1e9ccb565bca
	github.com/chewxy/hm v1.0.0 // indirect
	github.com/chewxy/math32 v1.0.0 // indirect
	github.com/chzyer/logex v1.1.10 // indirect
	github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1 // indirect
	github.com/davecgh/go-spew v1.1.1 // indirect
	github.com/fatih/color v1.7.0 // indirect
	github.com/flynn-archive/go-shlex v0.0.0-20150515145356-3f9db97f8568 // indirect
	github.com/gogo/protobuf v1.2.1 // indirect
	github.com/golang/protobuf v1.2.0 // indirect
	github.com/google/flatbuffers v1.10.0 // indirect
	github.com/kljensen/snowball v0.6.0
	github.com/leesper/go_rng v0.0.0-20171009123644-5344a9259b21 // indirect
	github.com/mattn/go-colorable v0.1.1 // indirect
	github.com/mattn/go-isatty v0.0.6 // indirect
	github.com/pkg/browser v0.0.0-20180916011732-0a3d74bf9ce4
	github.com/pkg/errors v0.8.1
	github.com/stretchr/testify v1.3.0
	github.com/xtgo/set v1.0.0
	golang.org/x/exp v0.0.0-20190221220918-438050ddec5e // indirect
	golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4 // indirect
	golang.org/x/sys v0.0.0-20190225065934-cc5685c2db12 // indirect
	golang.org/x/text v0.3.0
	gonum.org/v1/gonum v0.0.0-20190221132855-8ea67971a689 // indirect
	gonum.org/v1/netlib v0.0.0-20190221094214-0632e2ebbd2d // indirect
	gorgonia.org/cu v0.9.0-beta // indirect
	gorgonia.org/dawson v1.1.0 // indirect
	gorgonia.org/gorgonia v0.9.1
	gorgonia.org/tensor v0.9.0-beta
	gorgonia.org/vecf32 v0.7.0 // indirect
	gorgonia.org/vecf64 v0.7.0 // indirect
)

go 1.13


================================================
FILE: go.sum
================================================
github.com/abiosoft/ishell v2.0.0+incompatible/go.mod h1:HQR9AqF2R3P4XXpMpI0NAzgHf/aS6+zVXRj14cVk9qg=
github.com/abiosoft/readline v0.0.0-20180607040430-155bce2042db/go.mod h1:rB3B4rKii8V21ydCbIzH5hZiCQE7f5E9SzUb/ZZx530=
github.com/awalterschulze/gographviz v0.0.0-20190221210632-1e9ccb565bca h1:xwIXr1FpA2XBoohlpvgb11No/zbsh5Clm/98PWPcHVA=
github.com/awalterschulze/gographviz v0.0.0-20190221210632-1e9ccb565bca/go.mod h1:GEV5wmg4YquNw7v1kkyoX9etIk8yVmXj+AkDHuuETHs=
github.com/chewxy/hm v1.0.0 h1:zy/TSv3LV2nD3dwUEQL2VhXeoXbb9QkpmdRAVUFiA6k=
github.com/chewxy/hm v1.0.0/go.mod h1:qg9YI4q6Fkj/whwHR1D+bOGeF7SniIP40VweVepLjg0=
github.com/chewxy/math32 v1.0.0 h1:RTt2SACA7BTzvbsAKVQJLZpV6zY2MZw4bW9L2HEKkHg=
github.com/chewxy/math32 v1.0.0/go.mod h1:Miac6hA1ohdDUTagnvJy/q+aNnEk16qWUdb8ZVhvCN0=
github.com/chzyer/logex v1.1.10 h1:Swpa1K6QvQznwJRcfTfQJmTE72DqScAa40E+fbHEXEE=
github.com/chzyer/logex v1.1.10/go.mod h1:+Ywpsq7O8HXn0nuIou7OrIPyXbp3wmkHB+jjWRnGsAI=
github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1 h1:q763qf9huN11kDQavWsoZXJNW3xEE4JJyHa5Q25/sd8=
github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1/go.mod h1:Q3SI9o4m/ZMnBNeIyt5eFwwo7qiLfzFZmjNmxjkiQlU=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/fatih/color v1.7.0/go.mod h1:Zm6kSWBoL9eyXnKyktHP6abPY2pDugNf5KwzbycvMj4=
github.com/flynn-archive/go-shlex v0.0.0-20150515145356-3f9db97f8568/go.mod h1:rZfgFAXFS/z/lEd6LJmf9HVZ1LkgYiHx5pHhV5DR16M=
github.com/gogo/protobuf v1.2.1 h1:/s5zKNz0uPFCZ5hddgPdo2TK2TVrUNMn0OOX8/aZMTE=
github.com/gogo/protobuf v1.2.1/go.mod h1:hp+jE20tsWTFYpLwKvXlhS1hjn+gTNwPg2I6zVXpSg4=
github.com/golang/protobuf v1.2.0 h1:P3YflyNX/ehuJFLhxviNdFxQPkGK5cDcApsge1SqnvM=
github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/google/flatbuffers v1.10.0 h1:wHCM5N1xsJ3VwePcIpVqnmjAqRXlR44gv4hpGi+/LIw=
github.com/google/flatbuffers v1.10.0/go.mod h1:1AeVuKshWv4vARoZatz6mlQ0JxURH0Kv5+zNeJKJCa8=
github.com/kisielk/errcheck v1.1.0/go.mod h1:EZBBE59ingxPouuu3KfxchcWSUPOHkagtvWXihfKN4Q=
github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
github.com/kljensen/snowball v0.6.0/go.mod h1:27N7E8fVU5H68RlUmnWwZCfxgt4POBJfENGMvNRhldw=
github.com/leesper/go_rng v0.0.0-20171009123644-5344a9259b21 h1:O75p5GUdUfhJqNCMM1ntthjtJCOHVa1lzMSfh5Qsa0Y=
github.com/leesper/go_rng v0.0.0-20171009123644-5344a9259b21/go.mod h1:N0SVk0uhy+E1PZ3C9ctsPRlvOPAFPkCNlcPBDkt0N3U=
github.com/mattn/go-colorable v0.1.1/go.mod h1:FuOcm+DKB9mbwrcAfNl7/TZVBZ6rcnceauSikq3lYCQ=
github.com/mattn/go-isatty v0.0.5/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hdxcsrc5s=
github.com/mattn/go-isatty v0.0.6/go.mod h1:Iq45c/XA43vh69/j3iqttzPXn0bhXyGjM0Hdxcsrc5s=
github.com/pkg/browser v0.0.0-20180916011732-0a3d74bf9ce4/go.mod h1:4OwLy04Bl9Ef3GJJCoec+30X3LQs/0/m4HFRt/2LUSA=
github.com/pkg/errors v0.8.1 h1:iURUrRGxPUNPdy5/HRSm+Yj6okJ6UtLINN0Q9M4+h3I=
github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.3.0 h1:TivCn/peBQ7UY8ooIcPgZFpTNSz0Q2U6UrFlUfqbe0Q=
github.com/stretchr/tes

Download .txt

gitextract_whqjv2y6/

├── .gitignore
├── .travis.yml
├── CONTRIBUTING.md
├── CONTRIBUTORS.md
├── LICENSE
├── POSTag.go
├── POSTag_stanford.go
├── POSTag_stanford_string.go
├── POSTag_universal.go
├── POSTag_universal_string.go
├── README.md
├── annotation.go
├── annotationSet.go
├── annotationSet_bench_test.go
├── browncluster.go
├── cmd/
│   ├── demo/
│   │   ├── io.go
│   │   ├── main.go
│   │   └── nlp.go
│   ├── dep/
│   │   ├── fixer.go
│   │   ├── io.go
│   │   ├── main.go
│   │   ├── pipeline.go
│   │   └── train.go
│   ├── lexer/
│   │   └── main.go
│   └── pos/
│       ├── crossvalidation.go
│       ├── fixer.go
│       └── main.go
├── const.go
├── corpus/
│   ├── consopt.go
│   ├── corpus.go
│   ├── corpus_test.go
│   ├── functions.go
│   ├── functions_test.go
│   ├── inflection.go
│   ├── inflection_test.go
│   ├── io.go
│   ├── io_test.go
│   ├── lda.go
│   ├── test_test.go
│   └── utils.go
├── dep/
│   ├── README.md
│   ├── arcStandard.go
│   ├── arcStandard_test.go
│   ├── configuration.go
│   ├── configuration_test.go
│   ├── debug.go
│   ├── dependencyParser.go
│   ├── documentation/
│   │   ├── iamhuman.dot
│   │   └── thecatsatonthemat.dot
│   ├── errors.go
│   ├── evaluation.go
│   ├── example.go
│   ├── example_test.go
│   ├── featureExtraction.go
│   ├── features.go
│   ├── features_string.go
│   ├── fix.go
│   ├── init.go
│   ├── models.go
│   ├── models_test.go
│   ├── move.go
│   ├── move_string.go
│   ├── nn2.go
│   ├── nn2_io.go
│   ├── nn2_io_test.go
│   ├── nn2_test.go
│   ├── nnconfig.go
│   ├── release.go
│   ├── span.go
│   ├── test_test.go
│   ├── train.go
│   ├── train_test.go
│   ├── transition.go
│   └── util.go
├── dependency.go
├── dependencyTree.go
├── dependencyType.go
├── dependencyType_stanford.go
├── dependencyType_stanford_string.go
├── dependencyType_universal.go
├── dependencyType_universal_string.go
├── errors.go
├── go.mod
├── go.sum
├── interfaces.go
├── io.go
├── io_test.go
├── lexeme.go
├── lexemetype_string.go
├── lexer/
│   ├── lexer.go
│   ├── lexer_test.go
│   └── stateFn.go
├── lingo.go
├── pos/
│   ├── allinone_test.go
│   ├── context.go
│   ├── context_test.go
│   ├── contexttype_string.go
│   ├── debug.go
│   ├── errors.go
│   ├── features.go
│   ├── features_test.go
│   ├── featuretype_string.go
│   ├── models.go
│   ├── models_test.go
│   ├── perceptron.go
│   ├── perceptron_io.go
│   ├── perceptron_io_test.go
│   ├── postagger.go
│   ├── release.go
│   ├── sentence.go
│   ├── test_test.go
│   ├── util.go
│   └── util_test.go
├── sentence.go
├── sets.go
├── shape.go
├── stopwords.go
├── treebank/
│   ├── const_postag_stanford.go
│   ├── const_postag_universal.go
│   ├── const_rel_stanford.go
│   ├── const_rel_universal.go
│   ├── sentenceTag.go
│   ├── sentenceTag_test.go
│   ├── treebank.go
│   ├── treebank_test.go
│   └── util.go
├── utils.go
└── wordFlags.go

Download .txt

SYMBOL INDEX (992 symbols across 111 files)

FILE: POSTag.go
  type POSTag (line 9) | type POSTag
    method MarshalText (line 22) | func (p POSTag) MarshalText() ([]byte, error) {
    method UnmarshalText (line 26) | func (p *POSTag) UnmarshalText(text []byte) error {
  function init (line 13) | func init() {
  function InPOSTags (line 34) | func InPOSTags(x POSTag, set []POSTag) bool {
  function IsAdjective (line 43) | func IsAdjective(x POSTag) bool     { return InPOSTags(x, Adjectives) }
  function IsNoun (line 44) | func IsNoun(x POSTag) bool          { return InPOSTags(x, Nouns) }
  function IsProperNoun (line 45) | func IsProperNoun(x POSTag) bool    { return InPOSTags(x, ProperNouns) }
  function IsVerb (line 46) | func IsVerb(x POSTag) bool          { return InPOSTags(x, Verbs) }
  function IsAdverb (line 47) | func IsAdverb(x POSTag) bool        { return InPOSTags(x, Adverbs) }
  function IsInterrogative (line 48) | func IsInterrogative(x POSTag) bool { return InPOSTags(x, Interrogatives) }
  function IsDeterminer (line 49) | func IsDeterminer(x POSTag) bool    { return InPOSTags(x, Determiners) }
  function IsNumber (line 50) | func IsNumber(x POSTag) bool        { return InPOSTags(x, Numbers) }
  function IsSymbol (line 51) | func IsSymbol(x POSTag) bool        { return InPOSTags(x, Symbols) }

FILE: POSTag_stanford.go
  constant BUILD_TAGSET (line 7) | BUILD_TAGSET = "stanfordtags"
  constant X (line 10) | X           POSTag = iota
  constant UNKNOWN_TAG (line 11) | UNKNOWN_TAG
  constant ROOT_TAG (line 12) | ROOT_TAG
  constant CC (line 13) | CC
  constant CD (line 14) | CD
  constant DT (line 15) | DT
  constant EX (line 16) | EX
  constant FW (line 17) | FW
  constant IN (line 18) | IN
  constant JJ (line 19) | JJ
  constant JJR (line 20) | JJR
  constant JJS (line 21) | JJS
  constant LS (line 22) | LS
  constant MD (line 23) | MD
  constant NN (line 24) | NN
  constant NNS (line 25) | NNS
  constant NNP (line 26) | NNP
  constant NNPS (line 27) | NNPS
  constant PDT (line 28) | PDT
  constant POS (line 29) | POS
  constant PRP (line 30) | PRP
  constant PPRP (line 31) | PPRP
  constant RB (line 32) | RB
  constant RBR (line 33) | RBR
  constant RBS (line 34) | RBS
  constant RP (line 35) | RP
  constant SYM (line 36) | SYM
  constant TO (line 37) | TO
  constant UH (line 38) | UH
  constant VB (line 39) | VB
  constant VBD (line 40) | VBD
  constant VBG (line 41) | VBG
  constant VBN (line 42) | VBN
  constant VBP (line 43) | VBP
  constant VBZ (line 44) | VBZ
  constant WDT (line 45) | WDT
  constant WP (line 46) | WP
  constant PWP (line 47) | PWP
  constant WRB (line 48) | WRB
  constant COMMA (line 51) | COMMA
  constant FULLSTOP (line 52) | FULLSTOP
  constant OPENQUOTE (line 53) | OPENQUOTE
  constant CLOSEQUOTE (line 54) | CLOSEQUOTE
  constant COLON (line 55) | COLON
  constant DOLLAR (line 56) | DOLLAR
  constant HASHSIGN (line 57) | HASHSIGN
  constant LEFTBRACE (line 58) | LEFTBRACE
  constant RIGHTBRACE (line 59) | RIGHTBRACE
  constant HYPH (line 63) | HYPH
  constant AFX (line 64) | AFX
  constant ADD (line 65) | ADD
  constant NFP (line 66) | NFP
  constant GW (line 67) | GW
  constant XX (line 68) | XX
  constant MAXTAG (line 70) | MAXTAG
  function POSTagShortcut (line 74) | func POSTagShortcut(l Lexeme) (POSTag, bool) {
  function IsIN (line 128) | func IsIN(x POSTag) bool { return x == IN }

FILE: POSTag_stanford_string.go
  constant _POSTag_name (line 9) | _POSTag_name = "XUNKNOWN_TAGROOT_TAGCCCDDTEXFWINJJJJRJJSLSMDNNNNSNNPNNPS...
  method String (line 13) | func (i POSTag) String() string {

FILE: POSTag_universal.go
  constant BUILD_TAGSET (line 7) | BUILD_TAGSET = "universaltags"
  constant X (line 10) | X POSTag = iota
  constant UNKNOWN_TAG (line 11) | UNKNOWN_TAG
  constant ROOT_TAG (line 12) | ROOT_TAG
  constant ADJ (line 13) | ADJ
  constant ADP (line 14) | ADP
  constant ADV (line 15) | ADV
  constant AUX (line 16) | AUX
  constant CONJ (line 17) | CONJ
  constant DET (line 18) | DET
  constant INTJ (line 19) | INTJ
  constant NOUN (line 20) | NOUN
  constant NUM (line 21) | NUM
  constant PART (line 22) | PART
  constant PRON (line 23) | PRON
  constant PROPN (line 24) | PROPN
  constant PUNCT (line 25) | PUNCT
  constant SCONJ (line 26) | SCONJ
  constant SYM (line 27) | SYM
  constant VERB (line 28) | VERB
  constant MAXTAG (line 30) | MAXTAG
  function POSTagShortcut (line 34) | func POSTagShortcut(l Lexeme) (POSTag, bool) {
  function IsIN (line 67) | func IsIN(x POSTag) bool { return x == SCONJ }

FILE: POSTag_universal_string.go
  constant _POSTag_name (line 9) | _POSTag_name = "XUNKNOWN_TAGROOT_TAGADJADPADVAUXCONJDETINTJNOUNNUMPARTPR...
  method String (line 13) | func (i POSTag) String() string {

FILE: annotation.go
  type Annotation (line 15) | type Annotation struct
    method Clone (line 63) | func (a *Annotation) Clone() *Annotation {
    method SetHead (line 73) | func (a *Annotation) SetHead(headAnn *Annotation) {
    method HeadID (line 80) | func (a *Annotation) HeadID() int {
    method IsNumber (line 87) | func (a *Annotation) IsNumber() bool {
    method String (line 91) | func (a *Annotation) String() string {
    method GoString (line 95) | func (a *Annotation) GoString() string {
    method Process (line 104) | func (a *Annotation) Process(f AnnotationFixer) error {
  function NewAnnotation (line 37) | func NewAnnotation() *Annotation {
  function AnnotationFromLexTag (line 46) | func AnnotationFromLexTag(l Lexeme, t POSTag, f AnnotationFixer) *Annota...
  function RootAnnotation (line 170) | func RootAnnotation() *Annotation  { return rootAnnotation }
  function StartAnnotation (line 171) | func StartAnnotation() *Annotation { return startAnnotation }
  function NullAnnotation (line 172) | func NullAnnotation() *Annotation  { return nullAnnotation }
  function StringToAnnotation (line 174) | func StringToAnnotation(s string, f AnnotationFixer) *Annotation {
  type AnnotationFixer (line 184) | type AnnotationFixer interface

FILE: annotationSet.go
  type AnnotationSet (line 10) | type AnnotationSet
    method Len (line 12) | func (as AnnotationSet) Len() int      { return len(as) }
    method Swap (line 13) | func (as AnnotationSet) Swap(i, j int) { as[i], as[j] = as[j], as[i] }
    method Less (line 14) | func (as AnnotationSet) Less(i, j int) bool {
    method Set (line 18) | func (as AnnotationSet) Set() AnnotationSet {
    method Contains (line 24) | func (as AnnotationSet) Contains(a *Annotation) bool {
    method Index (line 31) | func (as AnnotationSet) Index(a *Annotation) int {
    method Add (line 40) | func (as AnnotationSet) Add(a *Annotation) AnnotationSet {

FILE: annotationSet_bench_test.go
  method index2 (line 8) | func (as AnnotationSet) index2(a *Annotation) int {
  function benchASIndex (line 16) | func benchASIndex(size int, b *testing.B) {
  function benchASIndex2 (line 31) | func benchASIndex2(size int, b *testing.B) {
  function BenchmarkAnnotationSetIndex_1 (line 46) | func BenchmarkAnnotationSetIndex_1(b *testing.B)    { benchASIndex(1, b) }
  function BenchmarkAnnotationSetIndex_2 (line 47) | func BenchmarkAnnotationSetIndex_2(b *testing.B)    { benchASIndex(2, b) }
  function BenchmarkAnnotationSetIndex_8 (line 48) | func BenchmarkAnnotationSetIndex_8(b *testing.B)    { benchASIndex(8, b) }
  function BenchmarkAnnotationSetIndex_16 (line 49) | func BenchmarkAnnotationSetIndex_16(b *testing.B)   { benchASIndex(16, b) }
  function BenchmarkAnnotationSetIndex_32 (line 50) | func BenchmarkAnnotationSetIndex_32(b *testing.B)   { benchASIndex(32, b) }
  function BenchmarkAnnotationSetIndex_64 (line 51) | func BenchmarkAnnotationSetIndex_64(b *testing.B)   { benchASIndex(64, b) }
  function BenchmarkAnnotationSetIndex_128 (line 52) | func BenchmarkAnnotationSetIndex_128(b *testing.B)  { benchASIndex(128, ...
  function BenchmarkAnnotationSetIndex_256 (line 53) | func BenchmarkAnnotationSetIndex_256(b *testing.B)  { benchASIndex(256, ...
  function BenchmarkAnnotationSetIndex_512 (line 54) | func BenchmarkAnnotationSetIndex_512(b *testing.B)  { benchASIndex(512, ...
  function BenchmarkAnnotationSetIndex_1024 (line 55) | func BenchmarkAnnotationSetIndex_1024(b *testing.B) { benchASIndex(1024,...
  function BenchmarkAnnotationSetIndex2_1 (line 57) | func BenchmarkAnnotationSetIndex2_1(b *testing.B)    { benchASIndex2(1, ...
  function BenchmarkAnnotationSetIndex2_2 (line 58) | func BenchmarkAnnotationSetIndex2_2(b *testing.B)    { benchASIndex2(2, ...
  function BenchmarkAnnotationSetIndex2_8 (line 59) | func BenchmarkAnnotationSetIndex2_8(b *testing.B)    { benchASIndex2(8, ...
  function BenchmarkAnnotationSetIndex2_16 (line 60) | func BenchmarkAnnotationSetIndex2_16(b *testing.B)   { benchASIndex2(16,...
  function BenchmarkAnnotationSetIndex2_32 (line 61) | func BenchmarkAnnotationSetIndex2_32(b *testing.B)   { benchASIndex2(32,...
  function BenchmarkAnnotationSetIndex2_64 (line 62) | func BenchmarkAnnotationSetIndex2_64(b *testing.B)   { benchASIndex2(64,...
  function BenchmarkAnnotationSetIndex2_128 (line 63) | func BenchmarkAnnotationSetIndex2_128(b *testing.B)  { benchASIndex2(128...
  function BenchmarkAnnotationSetIndex2_256 (line 64) | func BenchmarkAnnotationSetIndex2_256(b *testing.B)  { benchASIndex2(256...
  function BenchmarkAnnotationSetIndex2_512 (line 65) | func BenchmarkAnnotationSetIndex2_512(b *testing.B)  { benchASIndex2(512...
  function BenchmarkAnnotationSetIndex2_1024 (line 66) | func BenchmarkAnnotationSetIndex2_1024(b *testing.B) { benchASIndex2(102...

FILE: browncluster.go
  type Cluster (line 15) | type Cluster
  function ReadCluster (line 18) | func ReadCluster(r io.Reader) map[string]Cluster {

FILE: cmd/demo/io.go
  constant posModelFile (line 13) | posModelFile = `model/pos_stanfordtags_universalrel.final.model`
  constant depModelFile (line 14) | depModelFile = `model/dep_stanfordtags_universalrel.final.model`
  constant brownCluster (line 15) | brownCluster = `clusters.txt`
  function io (line 18) | func io() {

FILE: cmd/demo/main.go
  function main (line 13) | func main() {

FILE: cmd/demo/nlp.go
  type stemmer (line 20) | type stemmer struct
    method Stem (line 22) | func (stemmer) Stem(a string) (string, error) {
  type fixer (line 26) | type fixer struct
    method Clusters (line 30) | func (f fixer) Clusters() (map[string]lingo.Cluster, error) { return c...
    method Lemmatize (line 31) | func (f fixer) Lemmatize(a string, pt lingo.POSTag) ([]string, error) {
  type nocomp (line 35) | type nocomp
    method Error (line 37) | func (e nocomp) Error() string     { return fmt.Sprintf("no %v", strin...
    method Component (line 38) | func (e nocomp) Component() string { return string(e) }
  function pipeline (line 40) | func pipeline(s string) (d *lingo.Dependency, err error) {

FILE: cmd/dep/fixer.go
  type stemmer (line 10) | type stemmer struct
    method Stem (line 12) | func (stemmer) Stem(a string) (string, error) {
  type fixer (line 16) | type fixer struct
    method Clusters (line 20) | func (f fixer) Clusters() (map[string]lingo.Cluster, error) { return c...
    method Lemmatize (line 21) | func (f fixer) Lemmatize(a string, pt lingo.POSTag) ([]string, error) {
  type nocomp (line 25) | type nocomp
    method Error (line 27) | func (e nocomp) Error() string     { return fmt.Sprintf("no %v", strin...
    method Component (line 28) | func (e nocomp) Component() string { return string(e) }

FILE: cmd/dep/io.go
  function validateFlags (line 11) | func validateFlags() {
  function loadTreebanks (line 38) | func loadTreebanks() {
  function loadPOSModel (line 48) | func loadPOSModel() {
  function loadDepModel (line 58) | func loadDepModel() {
  function saveModel (line 66) | func saveModel() {

FILE: cmd/dep/main.go
  function init (line 34) | func init() {
  function cleanup (line 44) | func cleanup(sigChan chan os.Signal, cpuprofiling, memprofiling bool) {
  function main (line 65) | func main() {

FILE: cmd/dep/pipeline.go
  function receive (line 14) | func receive(deps chan *lingo.Dependency, errs, errChan chan error) {
  function pipeline (line 36) | func pipeline(s string) error {

FILE: cmd/dep/train.go
  function train (line 14) | func train() {

FILE: cmd/lexer/main.go
  function receieve (line 15) | func receieve() {
  function main (line 21) | func main() {

FILE: cmd/pos/crossvalidation.go
  type testResult (line 17) | type testResult struct
    method compare (line 22) | func (tr testResult) compare() (int, bool) {
  function crossValidate (line 44) | func crossValidate(resultChan chan testResult) {
  function collect (line 86) | func collect(ch chan lingo.AnnotatedSentence, correct lingo.AnnotatedSen...
  function testModel (line 94) | func testModel(sentences []treebank.SentenceTag) {
  function cvpipeline (line 115) | func cvpipeline(s string, output chan lingo.AnnotatedSentence) {

FILE: cmd/pos/fixer.go
  type stemmer (line 12) | type stemmer struct
    method Stem (line 14) | func (stemmer) Stem(a string) (string, error) {
  type fixer (line 18) | type fixer struct
    method Clusters (line 22) | func (f fixer) Clusters() (map[string]lingo.Cluster, error) { return c...
    method Lemmatize (line 23) | func (f fixer) Lemmatize(a string, pt lingo.POSTag) ([]string, error) {
  type nocomp (line 27) | type nocomp
    method Error (line 29) | func (e nocomp) Error() string     { return fmt.Sprintf("no %v", strin...
    method Component (line 30) | func (e nocomp) Component() string { return string(e) }

FILE: cmd/pos/main.go
  function receive (line 37) | func receive(sentences chan lingo.AnnotatedSentence, wg *sync.WaitGroup) {
  function pipeline (line 46) | func pipeline(s string) {
  function validateFlags (line 62) | func validateFlags() {
  function loadOrTrain (line 82) | func loadOrTrain() {
  function cleanup (line 135) | func cleanup(sigChan chan os.Signal, profiling bool) {
  function main (line 146) | func main() {

FILE: corpus/consopt.go
  type ConsOpt (line 14) | type ConsOpt
  function WithWords (line 17) | func WithWords(a []string) ConsOpt {
  function WithOrderedWords (line 53) | func WithOrderedWords(a []string) ConsOpt {
  function WithSize (line 84) | func WithSize(size int) ConsOpt {
  function FromDict (line 94) | func FromDict(d map[string]int) ConsOpt {
  function FromDictWithFreq (line 125) | func FromDictWithFreq(d map[string]struct{ ID, Freq int }) ConsOpt {

FILE: corpus/corpus.go
  type Corpus (line 12) | type Corpus struct
    method Id (line 66) | func (c *Corpus) Id(word string) (int, bool) {
    method Word (line 72) | func (c *Corpus) Word(id int) (string, bool) {
    method Add (line 83) | func (c *Corpus) Add(word string) int {
    method Size (line 105) | func (c *Corpus) Size() int {
    method WordFreq (line 111) | func (c *Corpus) WordFreq(word string) int {
    method IDFreq (line 121) | func (c *Corpus) IDFreq(id int) int {
    method TotalFreq (line 132) | func (c *Corpus) TotalFreq() int {
    method MaxWordLength (line 137) | func (c *Corpus) MaxWordLength() int {
    method WordProb (line 142) | func (c *Corpus) WordProb(word string) (float64, bool) {
    method Merge (line 154) | func (c *Corpus) Merge(other *Corpus) {
    method Replace (line 172) | func (c *Corpus) Replace(a, with string) error {
    method ReplaceWord (line 186) | func (c *Corpus) ReplaceWord(id int, with string) error {
  function New (line 25) | func New() *Corpus {
  function Construct (line 42) | func Construct(opts ...ConsOpt) (*Corpus, error) {

FILE: corpus/corpus_test.go
  function TestCorpus (line 9) | func TestCorpus(t *testing.T) {
  function TestCorpus_Merge (line 48) | func TestCorpus_Merge(t *testing.T) {

FILE: corpus/functions.go
  function GenerateCorpus (line 14) | func GenerateCorpus(sentenceTags []treebank.SentenceTag) *Corpus {
  function ViterbiSplit (line 61) | func ViterbiSplit(input string, c *Corpus) []string {
  function CosineSimilarity (line 124) | func CosineSimilarity(a, b []string) float64 {
  function DamerauLevenshtein (line 174) | func DamerauLevenshtein(s1 string, s2 string) (distance int) {
  function LongestCommonPrefix (line 274) | func LongestCommonPrefix(strs ...string) string {
  function StrsToInts (line 311) | func StrsToInts(strs []string) (retVal []int, err error) {
  function CombineInts (line 336) | func CombineInts(ints []int) int {

FILE: corpus/functions_test.go
  function Test_GenerateCorpus (line 10) | func Test_GenerateCorpus(t *testing.T) {
  function TestViterbiSplit (line 28) | func TestViterbiSplit(t *testing.T) {
  function TestCosineSimilarity (line 45) | func TestCosineSimilarity(t *testing.T) {
  function TestDL (line 73) | func TestDL(t *testing.T) {
  function TestLCP (line 101) | func TestLCP(t *testing.T) {
  function TestParseNumber (line 128) | func TestParseNumber(t *testing.T) {

FILE: corpus/inflection.go
  type conversionPattern (line 9) | type conversionPattern struct
  function newConversionPattern (line 14) | func newConversionPattern(from, to string) conversionPattern {
  function Pluralize (line 94) | func Pluralize(word string) string {
  function Singularize (line 115) | func Singularize(word string) string {

FILE: corpus/inflection_test.go
  function TestPluralize (line 28) | func TestPluralize(t *testing.T) {
  function TestSingularize (line 37) | func TestSingularize(t *testing.T) {

FILE: corpus/io.go
  type sortutil (line 13) | type sortutil struct
    method Len (line 19) | func (s *sortutil) Len() int           { return len(s.words) }
    method Less (line 20) | func (s *sortutil) Less(i, j int) bool { return s.ids[i] < s.ids[j] }
    method Swap (line 21) | func (s *sortutil) Swap(i, j int) {
  function ToDictWithFreq (line 30) | func ToDictWithFreq(c *Corpus) map[string]struct{ ID, Freq int } {
  function ToDict (line 39) | func ToDict(c *Corpus) map[string]int {
  method GobEncode (line 48) | func (c *Corpus) GobEncode() ([]byte, error) {
  method GobDecode (line 80) | func (c *Corpus) GobDecode(buf []byte) error {
  method LoadOneGram (line 119) | func (c *Corpus) LoadOneGram(r io.Reader) error {

FILE: corpus/io_test.go
  function TestCorpusGob (line 12) | func TestCorpusGob(t *testing.T) {
  function TestCorpusToDict (line 43) | func TestCorpusToDict(t *testing.T) {
  function TestCorpusToDictWithFreq (line 60) | func TestCorpusToDictWithFreq(t *testing.T) {
  function TestLoadOneGram (line 73) | func TestLoadOneGram(t *testing.T) {

FILE: corpus/lda.go
  type LDAModel (line 9) | type LDAModel struct
    method init (line 36) | func (l *LDAModel) init() {

FILE: corpus/test_test.go
  constant sample1Gram (line 9) | sample1Gram = `the	23135851162
  function mediumSentence (line 17) | func mediumSentence() []treebank.SentenceTag {
  constant EPSILON64 (line 44) | EPSILON64 float64 = 1e-10
  function floatEquals64 (line 46) | func floatEquals64(a, b float64) bool {

FILE: corpus/utils.go
  function minInt (line 8) | func minInt(a, b int) int {
  function maxInt (line 15) | func maxInt(a, b int) int {
  function dot (line 22) | func dot(a, b []float64) (float64, error) {
  function mag (line 34) | func mag(a []float64) (float64, error) {

FILE: dep/arcStandard.go
  method canApply (line 8) | func (c *configuration) canApply(t transition) bool {
  method apply (line 45) | func (c *configuration) apply(t transition) {
  method oracle (line 62) | func (c *configuration) oracle(goldParse *lingo.Dependency) (t transitio...

FILE: dep/arcStandard_test.go
  function TestCanApply (line 10) | func TestCanApply(t *testing.T) {
  function TestOracle (line 67) | func TestOracle(t *testing.T) {

FILE: dep/configuration.go
  type head (line 11) | type head
  constant DOES_NOT_EXIST (line 14) | DOES_NOT_EXIST head = iota - 1
  type configuration (line 18) | type configuration struct
    method String (line 50) | func (c *configuration) String() string {
    method GoString (line 54) | func (c *configuration) GoString() string {
    method bufferSize (line 58) | func (c *configuration) bufferSize() int {
    method stackSize (line 62) | func (c *configuration) stackSize() int {
    method head (line 66) | func (c *configuration) head(i int) head {
    method stackValue (line 72) | func (c *configuration) stackValue(i int) head {
    method bufferValue (line 80) | func (c *configuration) bufferValue(i int) head {
    method pop (line 91) | func (c *configuration) pop() head {
    method removeStack (line 98) | func (c *configuration) removeStack(i int) {
    method removeSecondTopStack (line 103) | func (c *configuration) removeSecondTopStack() bool {
    method removeTopStack (line 113) | func (c *configuration) removeTopStack() bool {
    method label (line 125) | func (c *configuration) label(i head) lingo.DependencyType {
    method annotation (line 141) | func (c *configuration) annotation(i head) *lingo.Annotation {
    method lc (line 157) | func (c *configuration) lc(k, cnt head) head {
    method rc (line 174) | func (c *configuration) rc(k, cnt head) head {
    method hasOtherChildren (line 191) | func (c *configuration) hasOtherChildren(i int, goldParse *lingo.Depen...
    method isTerminal (line 200) | func (c *configuration) isTerminal() bool {
    method shift (line 205) | func (c *configuration) shift() bool {
  function newConfiguration (line 26) | func newConfiguration(sentence lingo.AnnotatedSentence, fromGold bool) *...

FILE: dep/configuration_test.go
  function TestStackAppendRemove (line 10) | func TestStackAppendRemove(t *testing.T) {
  function TestConfiguration_StackValue (line 40) | func TestConfiguration_StackValue(t *testing.T) {

FILE: dep/debug.go
  constant BUILD_DEBUG (line 16) | BUILD_DEBUG = "PARSER: DEBUG BUILD"
  constant BUILD_DIAG (line 17) | BUILD_DIAG = "Diagnostic Build"
  constant DEBUG (line 19) | DEBUG = true
  function tabcount (line 25) | func tabcount() int {
  function enterLoggingContext (line 29) | func enterLoggingContext() {
  function leaveLoggingContext (line 35) | func leaveLoggingContext() {
  function logf (line 48) | func logf(format string, others ...interface{}) {
  function logTrainingProgress (line 55) | func logTrainingProgress(iteration, correct, total, length, possibles in...
  function logMemStats (line 64) | func logMemStats() {
  function recoverFrom (line 79) | func recoverFrom(format string, attrs ...interface{}) {
  method SprintFeatures (line 87) | func (d *Parser) SprintFeatures(features []int) string {
  function SprintScores (line 120) | func SprintScores(scores []float64, ts []transition) string {
  function SprintFloatSlice (line 132) | func SprintFloatSlice(a []float64) string {

FILE: dep/dependencyParser.go
  type Parser (line 17) | type Parser struct
    method Run (line 38) | func (d *Parser) Run() {
    method predict (line 52) | func (d *Parser) predict(sentence lingo.AnnotatedSentence) (*lingo.Dep...
    method String (line 111) | func (d *Parser) String() string {
  function New (line 26) | func New(m *Model) *Parser {

FILE: dep/errors.go
  type componentUnavailable (line 9) | type componentUnavailable
    method Error (line 11) | func (c componentUnavailable) Error() string     { return fmt.Sprintf(...
    method Component (line 12) | func (c componentUnavailable) Component() string { return string(c) }
  type TarpitError (line 17) | type TarpitError struct
    method Error (line 19) | func (err TarpitError) Error() string { return "Tarpit Error" }
  type NonProjectiveError (line 22) | type NonProjectiveError struct
    method Error (line 24) | func (err NonProjectiveError) Error() string { return "Non-projective ...

FILE: dep/evaluation.go
  type Performance (line 12) | type Performance struct
    method String (line 20) | func (p Performance) String() string {
  function Evaluate (line 33) | func Evaluate(predictedTrees, goldTrees []*lingo.Dependency) Performance {
  method crossValidate (line 87) | func (t *Trainer) crossValidate(st []treebank.SentenceTag) Performance {
  method predMany (line 97) | func (t *Trainer) predMany(sentenceTags []treebank.SentenceTag) []*lingo...
  method pred (line 110) | func (t *Trainer) pred(as lingo.AnnotatedSentence) (*lingo.Dependency, e...

FILE: dep/example.go
  type example (line 12) | type example struct
  function makeExamples (line 19) | func makeExamples(sentenceTags []treebank.SentenceTag, conf NNConfig, di...
  function makeOneExample (line 43) | func makeOneExample(i int, sentenceTag treebank.SentenceTag, dict *corpu...
  function shuffleExamples (line 84) | func shuffleExamples(a []example) {

FILE: dep/example_test.go
  function TestMakeExamples (line 9) | func TestMakeExamples(t *testing.T) {

FILE: dep/featureExtraction.go
  function getFeatures (line 9) | func getFeatures(c *configuration, dict *corpus.Corpus) []int {
  constant POS_OFFSET (line 141) | POS_OFFSET   int = 18
  constant DEP_OFFSET (line 142) | DEP_OFFSET       = 36
  constant STACK_OFFSET (line 143) | STACK_OFFSET     = 6
  constant STACK_NUMBER (line 144) | STACK_NUMBER     = 6

FILE: dep/features.go
  type feature (line 8) | type feature
  constant s0w (line 15) | s0w feature = iota
  constant s1w (line 16) | s1w
  constant s2w (line 17) | s2w
  constant b0w (line 19) | b0w
  constant b1w (line 20) | b1w
  constant b2w (line 21) | b2w
  constant s0l1w (line 23) | s0l1w
  constant s0r1w (line 24) | s0r1w
  constant s0l2w (line 25) | s0l2w
  constant s0r2w (line 26) | s0r2w
  constant s0llw (line 27) | s0llw
  constant s0rrw (line 28) | s0rrw
  constant s1l1w (line 30) | s1l1w
  constant s1r1w (line 31) | s1r1w
  constant s1l2w (line 32) | s1l2w
  constant s1r2w (line 33) | s1r2w
  constant s1llw (line 34) | s1llw
  constant s1rrw (line 35) | s1rrw
  constant s0t (line 38) | s0t
  constant s1t (line 39) | s1t
  constant s2t (line 40) | s2t
  constant b0t (line 42) | b0t
  constant b1t (line 43) | b1t
  constant b2t (line 44) | b2t
  constant s0l1t (line 46) | s0l1t
  constant s0r1t (line 47) | s0r1t
  constant s0l2t (line 48) | s0l2t
  constant s0r2t (line 49) | s0r2t
  constant s0llt (line 50) | s0llt
  constant s0rrt (line 51) | s0rrt
  constant s1l1t (line 53) | s1l1t
  constant s1r1t (line 54) | s1r1t
  constant s1l2t (line 55) | s1l2t
  constant s1r2t (line 56) | s1r2t
  constant s1llt (line 57) | s1llt
  constant s1rrt (line 58) | s1rrt
  constant s0l1d (line 61) | s0l1d
  constant s0r1d (line 62) | s0r1d
  constant s0l2d (line 63) | s0l2d
  constant s0r2d (line 64) | s0r2d
  constant s0lld (line 65) | s0lld
  constant s0rrd (line 66) | s0rrd
  constant s1l1d (line 68) | s1l1d
  constant s1r1d (line 69) | s1r1d
  constant s1l2d (line 70) | s1l2d
  constant s1r2d (line 71) | s1r2d
  constant s1lld (line 72) | s1lld
  constant s1rrd (line 73) | s1rrd
  constant MAXFEATURE (line 75) | MAXFEATURE
  constant wordFeatsStartAt (line 79) | wordFeatsStartAt  int = int(lingo.MAXTAG) + int(lingo.MAXDEPTYPE)
  constant labelFeatsStartAt (line 80) | labelFeatsStartAt     = int(lingo.MAXTAG)
  constant posFeatsStartAt (line 81) | posFeatsStartAt       = 0

FILE: dep/features_string.go
  constant _feature_name (line 7) | _feature_name = "s0ws1ws2wb0wb1wb2ws0l1ws0r1ws0l2ws0r2ws0llws0rrws1l1ws1...
  method String (line 11) | func (i feature) String() string {

FILE: dep/fix.go
  function fix (line 10) | func fix(d *lingo.Dependency) {
  function properNounSpans (line 98) | func properNounSpans(d *lingo.Dependency) (retVal []span) {

FILE: dep/init.go
  function init (line 5) | func init() {

FILE: dep/models.go
  type Model (line 17) | type Model struct
    method Corpus (line 23) | func (m *Model) Corpus() *corpus.Corpus { return m.corpus }
    method WordEmbeddings (line 25) | func (m *Model) WordEmbeddings() *tensor.Dense {
    method POSTagEmbeddings (line 31) | func (m *Model) POSTagEmbeddings() *tensor.Dense {
    method String (line 37) | func (m *Model) String() string {
    method Save (line 48) | func (m *Model) Save(filename string) error {
    method SaveWriter (line 60) | func (m *Model) SaveWriter(f io.WriteCloser) error {
  function Load (line 81) | func Load(filename string) (*Model, error) {
  function LoadReader (line 89) | func LoadReader(rd io.ReadCloser) (*Model, error) {

FILE: dep/models_test.go
  function TestModel_SaveLoad (line 11) | func TestModel_SaveLoad(t *testing.T) {

FILE: dep/move.go
  type Move (line 4) | type Move
  constant Shift (line 9) | Shift Move = iota
  constant Left (line 10) | Left
  constant Right (line 11) | Right
  constant MAXMOVE (line 13) | MAXMOVE

FILE: dep/move_string.go
  constant _Move_name (line 7) | _Move_name = "ShiftLeftRightMAXMOVE"
  method String (line 11) | func (i Move) String() string {

FILE: dep/nn2.go
  type may (line 12) | type may struct
    method doUnary (line 17) | func (m *may) doUnary(fn func(*G.Node) (*G.Node, error)) {
    method doBinary (line 24) | func (m *may) doBinary(fn func(a, b *G.Node) (*G.Node, error), other *...
    method doSwapBinary (line 31) | func (m *may) doSwapBinary(fn func(a, b *G.Node) (*G.Node, error), oth...
  type neuralnetwork2 (line 38) | type neuralnetwork2 struct
    method initialized (line 92) | func (nn *neuralnetwork2) initialized() bool {
    method init (line 102) | func (nn *neuralnetwork2) init() error {
    method fwd (line 211) | func (nn *neuralnetwork2) fwd() error {
    method costProgress (line 288) | func (nn *neuralnetwork2) costProgress() <-chan G.Value {
    method train (line 296) | func (nn *neuralnetwork2) train(examples []example) error {
    method pred (line 347) | func (nn *neuralnetwork2) pred(ind []int) (int, error) {
    method feats2vec (line 372) | func (nn *neuralnetwork2) feats2vec(indicators []int) error {

FILE: dep/nn2_io.go
  method String (line 15) | func (nn *neuralnetwork2) String() string {
  method GobEncode (line 41) | func (nn *neuralnetwork2) GobEncode() ([]byte, error) {
  method GobDecode (line 87) | func (nn *neuralnetwork2) GobDecode(buf []byte) error {

FILE: dep/nn2_io_test.go
  function TestNNIO (line 14) | func TestNNIO(t *testing.T) {

FILE: dep/nn2_test.go
  function TestNN2 (line 12) | func TestNN2(t *testing.T) {

FILE: dep/nnconfig.go
  type NNConfig (line 13) | type NNConfig struct
    method String (line 28) | func (c NNConfig) String() string {
    method GobEncode (line 48) | func (c NNConfig) GobEncode() ([]byte, error) {
    method GobDecode (line 73) | func (c *NNConfig) GobDecode(p []byte) error {
  function init (line 101) | func init() {

FILE: dep/release.go
  constant BUILD_DEBUG (line 5) | BUILD_DEBUG = "PARSER: RELEASE BUILD"
  constant BUILD_DIAG (line 6) | BUILD_DIAG = "Non-Diagnostic Build"
  constant DEBUG (line 8) | DEBUG = false
  function enterLoggingContext (line 14) | func enterLoggingContext() {}
  function leaveLoggingContext (line 16) | func leaveLoggingContext() {}
  function logTrainingProgress (line 18) | func logTrainingProgress(iteration, correct, total, length, possibles in...
  function logMemStats (line 20) | func logMemStats() {}
  function logf (line 22) | func logf(format string, others ...interface{}) {}
  function recoverFrom (line 24) | func recoverFrom(format string, attrs ...interface{}) {}
  method SprintFeatures (line 26) | func (d *Parser) SprintFeatures(feature []int) string { return "" }
  function SprintScores (line 28) | func SprintScores(scores []float64, ts []transition) string { return "" }

FILE: dep/span.go
  type span (line 3) | type span struct
    method combine (line 14) | func (s span) combine(other span) span {
  function makeSpan (line 7) | func makeSpan(start, end int) span {

FILE: dep/test_test.go
  type dummyLem (line 18) | type dummyLem struct
    method Lemmatize (line 20) | func (dummyLem) Lemmatize(s string, pt lingo.POSTag) ([]string, error) {
  type dummyStemmer (line 24) | type dummyStemmer struct
    method Stem (line 26) | func (dummyStemmer) Stem(s string) (string, error) {
  type dummyFix (line 30) | type dummyFix struct
    method Clusters (line 35) | func (dummyFix) Clusters() (map[string]lingo.Cluster, error) {
  constant nnps (line 39) | nnps = `1	Guerrillas	guerrilla	NOUN	NNS	Number=Plur	2	nsubj	_	_
  constant simple (line 61) | simple = `1	Yet	yet	CONJ	CC	_	5	cc	_	_
  constant med (line 74) | med = `1	President	President	PROPN	NNP	Number=Sing	2	compound	_	_
  constant long (line 96) | long = `1	Now	now	ADV	RB	_	5	advmod	_	_
  constant cvconllu (line 142) | cvconllu = `1	Google	Google	PROPN	NNP	Number=Sing	6	nsubj	_	_
  function lotsaNNP (line 161) | func lotsaNNP() *lingo.Dependency {
  function simpleSentence (line 169) | func simpleSentence() []treebank.SentenceTag {
  function mediumSentence (line 174) | func mediumSentence() []treebank.SentenceTag {
  function longSentence (line 180) | func longSentence() []treebank.SentenceTag {
  function allSentences (line 185) | func allSentences() []treebank.SentenceTag {
  function cvSentences (line 193) | func cvSentences() []treebank.SentenceTag {
  function hash (line 197) | func hash(s string) string {
  function cache (line 203) | func cache(input string, s lingo.AnnotatedSentence) {
  function useCached (line 221) | func useCached(filename string) *lingo.Dependency {

FILE: dep/train.go
  type TrainerConsOpt (line 15) | type TrainerConsOpt
  function WithTrainingModel (line 18) | func WithTrainingModel(m *Model) TrainerConsOpt {
  function WithTrainingSet (line 26) | func WithTrainingSet(st []treebank.SentenceTag) TrainerConsOpt {
  function WithCrossValidationSet (line 34) | func WithCrossValidationSet(st []treebank.SentenceTag) TrainerConsOpt {
  function WithConfig (line 42) | func WithConfig(conf NNConfig) TrainerConsOpt {
  function WithLemmatizer (line 53) | func WithLemmatizer(l lingo.Lemmatizer) TrainerConsOpt {
  function WithStemmer (line 66) | func WithStemmer(s lingo.Stemmer) TrainerConsOpt {
  function WithCluster (line 78) | func WithCluster(c map[string]lingo.Cluster) TrainerConsOpt {
  function WithCorpus (line 86) | func WithCorpus(c *corpus.Corpus) TrainerConsOpt {
  function WithGeneratedCorpus (line 95) | func WithGeneratedCorpus(sts ...treebank.SentenceTag) TrainerConsOpt {
  type Trainer (line 110) | type Trainer struct
    method Lemmatize (line 153) | func (t *Trainer) Lemmatize(a string, pt lingo.POSTag) ([]string, erro...
    method Stem (line 161) | func (t *Trainer) Stem(a string) (string, error) {
    method Clusters (line 169) | func (t *Trainer) Clusters() (map[string]lingo.Cluster, error) {
    method Cost (line 180) | func (t *Trainer) Cost() <-chan float64 {
    method Perf (line 188) | func (t *Trainer) Perf() <-chan Performance {
    method Init (line 198) | func (t *Trainer) Init() (err error) {
    method Train (line 209) | func (t *Trainer) Train(epochs int) error {
    method TrainWithoutCrossValidation (line 220) | func (t *Trainer) TrainWithoutCrossValidation(epochs int) error {
    method train (line 225) | func (t *Trainer) train(epochs int) error {
    method crossValidateTrain (line 257) | func (t *Trainer) crossValidateTrain(epochs int) error {
    method pretrainCheck (line 319) | func (t *Trainer) pretrainCheck() error {
    method handleCosts (line 337) | func (t *Trainer) handleCosts() (epochChan chan struct{}) {
  function NewTrainer (line 133) | func NewTrainer(opts ...TrainerConsOpt) *Trainer {

FILE: dep/train_test.go
  function TestTrainerInitializations (line 11) | func TestTrainerInitializations(t *testing.T) {
  function TestTrainer_train (line 49) | func TestTrainer_train(t *testing.T) {
  function TestTestTrainer_crossValidateTrain (line 136) | func TestTestTrainer_crossValidateTrain(t *testing.T) {

FILE: dep/transition.go
  type transition (line 10) | type transition struct
    method String (line 51) | func (t transition) String() string {
  function buildTransitions (line 18) | func buildTransitions(labels []lingo.DependencyType) []transition {
  function lookupTransition (line 55) | func lookupTransition(t transition, table []transition) int {
  function init (line 65) | func init() {

FILE: dep/util.go
  function minInt (line 3) | func minInt(a, b int) int {
  function maxInt (line 10) | func maxInt(a, b int) int {

FILE: dependency.go
  type Dependency (line 13) | type Dependency struct
    method Sentence (line 64) | func (d *Dependency) Sentence() AnnotatedSentence { return d.Annotated...
    method Lefts (line 65) | func (d *Dependency) Lefts() [][]int              { return d.lefts }
    method Rights (line 66) | func (d *Dependency) Rights() [][]int             { return d.rights }
    method WordCount (line 67) | func (d *Dependency) WordCount() int              { return d.wordCount }
    method N (line 68) | func (d *Dependency) N() int                      { return d.n }
    method SetLefts (line 71) | func (d *Dependency) SetLefts(l [][]int)  { d.lefts = l }
    method SetRights (line 72) | func (d *Dependency) SetRights(r [][]int) { d.rights = r }
    method Head (line 74) | func (d *Dependency) Head(i int) int {
    method Label (line 82) | func (d *Dependency) Label(i int) DependencyType {
    method Annotation (line 90) | func (d *Dependency) Annotation(i int) *Annotation {
    method AddArc (line 98) | func (d *Dependency) AddArc(head, child int, label DependencyType) {
    method AddChild (line 103) | func (d *Dependency) AddChild(head, child int) {
    method AddRel (line 116) | func (d *Dependency) AddRel(child int, rel DependencyType) {
    method HasSingleRoot (line 121) | func (d *Dependency) HasSingleRoot() bool {
    method IsLegal (line 133) | func (d *Dependency) IsLegal() bool {
    method IsProjective (line 159) | func (d *Dependency) IsProjective() bool {
    method projectiveVisit (line 164) | func (d *Dependency) projectiveVisit(w int) bool {
    method Root (line 186) | func (d *Dependency) Root() int {
    method SprintRel (line 196) | func (d *Dependency) SprintRel() string {
  type depConsOpt (line 26) | type depConsOpt
  function FromAnnotatedSentence (line 29) | func FromAnnotatedSentence(s AnnotatedSentence) depConsOpt {
  function AllocTree (line 40) | func AllocTree() depConsOpt {
  function NewDependency (line 55) | func NewDependency(opts ...depConsOpt) *Dependency {
  type DependencyEdge (line 206) | type DependencyEdge struct
  type edgeByID (line 214) | type edgeByID
    method Len (line 216) | func (b edgeByID) Len() int           { return len(b) }
    method Swap (line 217) | func (b edgeByID) Swap(i, j int)      { b[i], b[j] = b[j], b[i] }
    method Less (line 218) | func (b edgeByID) Less(i, j int) bool { return b[i].Dep.ID < b[j].Dep....

FILE: dependencyTree.go
  type DependencyTree (line 13) | type DependencyTree struct
    method AddChild (line 32) | func (d *DependencyTree) AddChild(child *DependencyTree) {
    method AddRel (line 36) | func (d *DependencyTree) AddRel(rel DependencyType) {
    method walk (line 40) | func (d *DependencyTree) walk(c chan *DependencyTree, wg *sync.WaitGro...
    method Dot (line 50) | func (d *DependencyTree) Dot() string {
    method Walk (line 91) | func (d *DependencyTree) Walk(fn func(interface{})) {
  function NewDependencyTree (line 23) | func NewDependencyTree(parent *DependencyTree, ID int, ann *Annotation) ...
  function dotString (line 65) | func dotString(c chan *DependencyTree, out chan string) {

FILE: dependencyType.go
  type DependencyType (line 9) | type DependencyType
    method MarshalText (line 22) | func (dt DependencyType) MarshalText() ([]byte, error) {
    method UnmarshalText (line 26) | func (dt *DependencyType) UnmarshalText(text []byte) error {
  function init (line 13) | func init() {
  function InDepTypes (line 35) | func InDepTypes(x DependencyType, set []DependencyType) bool {
  function IsModifier (line 44) | func IsModifier(x DependencyType) bool      { return InDepTypes(x, Modif...
  function IsCompound (line 45) | func IsCompound(x DependencyType) bool      { return InDepTypes(x, Compo...
  function IsDeterminerRel (line 46) | func IsDeterminerRel(x DependencyType) bool { return InDepTypes(x, Deter...
  function IsMultiword (line 47) | func IsMultiword(x DependencyType) bool     { return InDepTypes(x, Multi...
  function IsQuantifier (line 48) | func IsQuantifier(x DependencyType) bool    { return InDepTypes(x, Quant...

FILE: dependencyType_stanford.go
  constant BUILD_RELSET (line 5) | BUILD_RELSET = "stanfordrel"
  constant NoDepType (line 11) | NoDepType DependencyType = iota
  constant Dep (line 12) | Dep
  constant Root (line 13) | Root
  constant Aux (line 14) | Aux
  constant AuxPass (line 15) | AuxPass
  constant Cop (line 16) | Cop
  constant Arg (line 17) | Arg
  constant Agent (line 18) | Agent
  constant Comp (line 19) | Comp
  constant AComp (line 20) | AComp
  constant CComp (line 21) | CComp
  constant XComp (line 22) | XComp
  constant Obj (line 23) | Obj
  constant DObj (line 24) | DObj
  constant IObj (line 25) | IObj
  constant PObj (line 26) | PObj
  constant Subj (line 27) | Subj
  constant NSubj (line 28) | NSubj
  constant NSubjPass (line 29) | NSubjPass
  constant CSubj (line 30) | CSubj
  constant CSubjPass (line 31) | CSubjPass
  constant Coordination (line 32) | Coordination
  constant Conj (line 33) | Conj
  constant Expl (line 34) | Expl
  constant Mod (line 35) | Mod
  constant AMod (line 36) | AMod
  constant Appos (line 37) | Appos
  constant Advcl (line 38) | Advcl
  constant Det (line 39) | Det
  constant Predet (line 40) | Predet
  constant Preconj (line 41) | Preconj
  constant Vmod (line 42) | Vmod
  constant MWE (line 43) | MWE
  constant Mark (line 44) | Mark
  constant AdvMod (line 45) | AdvMod
  constant Neg (line 46) | Neg
  constant RCMod (line 47) | RCMod
  constant QuantMod (line 48) | QuantMod
  constant NounMod (line 49) | NounMod
  constant NPAdvMod (line 50) | NPAdvMod
  constant TMod (line 51) | TMod
  constant Num (line 52) | Num
  constant NumberElement (line 53) | NumberElement
  constant Prep (line 54) | Prep
  constant Poss (line 55) | Poss
  constant Possessive (line 56) | Possessive
  constant PRT (line 57) | PRT
  constant Parataxis (line 58) | Parataxis
  constant GoesWith (line 59) | GoesWith
  constant Punct (line 60) | Punct
  constant Ref (line 61) | Ref
  constant SDep (line 62) | SDep
  constant XSubj (line 63) | XSubj
  constant Case (line 66) | Case
  constant Compound (line 67) | Compound
  constant NMod (line 68) | NMod
  constant Discourse (line 69) | Discourse
  constant NumMod (line 70) | NumMod
  constant RelCl (line 71) | RelCl
  constant NFinCl (line 72) | NFinCl
  constant NMod_Poss (line 73) | NMod_Poss
  constant NMod_NPMod (line 74) | NMod_NPMod
  constant Vocative (line 75) | Vocative
  constant List (line 76) | List
  constant MWPrep (line 77) | MWPrep
  constant Remnant (line 78) | Remnant
  constant Acl (line 79) | Acl
  constant NPMod (line 80) | NPMod
  constant MDVod (line 81) | MDVod
  constant DetMod (line 82) | DetMod
  constant PComp (line 85) | PComp
  constant MAXDEPTYPE (line 87) | MAXDEPTYPE

FILE: dependencyType_stanford_string.go
  constant _DependencyType_name (line 9) | _DependencyType_name = "NoDepTypeDepRootNSubjNSubjPassDObjIObjCSubjCSubj...
  method String (line 13) | func (i DependencyType) String() string {

FILE: dependencyType_universal.go
  constant BUILD_RELSET (line 5) | BUILD_RELSET = "universalrel"
  constant NoDepType (line 11) | NoDepType DependencyType = iota
  constant Dep (line 12) | Dep
  constant Root (line 13) | Root
  constant NSubj (line 18) | NSubj
  constant NSubjPass (line 19) | NSubjPass
  constant DObj (line 20) | DObj
  constant IObj (line 21) | IObj
  constant CSubj (line 24) | CSubj
  constant CSubjPass (line 25) | CSubjPass
  constant CComp (line 26) | CComp
  constant XComp (line 28) | XComp
  constant NumMod (line 33) | NumMod
  constant Appos (line 34) | Appos
  constant NMod (line 35) | NMod
  constant ACl (line 38) | ACl
  constant ACl_RelCl (line 39) | ACl_RelCl
  constant Det (line 40) | Det
  constant Det_PreDet (line 41) | Det_PreDet
  constant AMod (line 44) | AMod
  constant Neg (line 45) | Neg
  constant Case (line 48) | Case
  constant NMod_NPMod (line 53) | NMod_NPMod
  constant NMod_TMod (line 54) | NMod_TMod
  constant NMod_Poss (line 55) | NMod_Poss
  constant AdvCl (line 58) | AdvCl
  constant AdvMod (line 61) | AdvMod
  constant Compound (line 64) | Compound
  constant Compound_Part (line 65) | Compound_Part
  constant Name (line 66) | Name
  constant MWE (line 67) | MWE
  constant Foreign (line 68) | Foreign
  constant GoesWith (line 69) | GoesWith
  constant List (line 72) | List
  constant Dislocated (line 73) | Dislocated
  constant Parataxis (line 74) | Parataxis
  constant Remnant (line 75) | Remnant
  constant Reparandum (line 76) | Reparandum
  constant Vocative (line 81) | Vocative
  constant Discourse (line 82) | Discourse
  constant Expl (line 83) | Expl
  constant Aux (line 86) | Aux
  constant AuxPass (line 87) | AuxPass
  constant Cop (line 88) | Cop
  constant Mark (line 91) | Mark
  constant Punct (line 92) | Punct
  constant Conj (line 96) | Conj
  constant Coordination (line 97) | Coordination
  constant CC_PreConj (line 98) | CC_PreConj
  constant MAXDEPTYPE (line 100) | MAXDEPTYPE

FILE: dependencyType_universal_string.go
  constant _DependencyType_name (line 9) | _DependencyType_name = "NoDepTypeDepRootNSubjNSubjPassDObjIObjCSubjCSubj...
  method String (line 13) | func (i DependencyType) String() string {

FILE: errors.go
  type componentUnavailable (line 3) | type componentUnavailable interface

FILE: interfaces.go
  type Lemmatizer (line 10) | type Lemmatizer interface
  type Stemmer (line 15) | type Stemmer interface
  type Sentencer (line 20) | type Sentencer interface
  type Corpus (line 25) | type Corpus interface
  type WordEmbeddings (line 59) | type WordEmbeddings interface

FILE: io.go
  type dummyAnnotation (line 12) | type dummyAnnotation struct
  method MarshalJSON (line 39) | func (a *Annotation) MarshalJSON() ([]byte, error) {
  method UnmarshalJSON (line 81) | func (a *Annotation) UnmarshalJSON(b []byte) error {
  method MarshalJSON (line 105) | func (as AnnotatedSentence) MarshalJSON() ([]byte, error) {
  method UnmarshalJSON (line 122) | func (as *AnnotatedSentence) UnmarshalJSON(b []byte) error {

FILE: io_test.go
  function TestAnnotationJSON (line 8) | func TestAnnotationJSON(t *testing.T) {
  function TestAnnotatedSentenceJSON (line 40) | func TestAnnotatedSentenceJSON(t *testing.T) {

FILE: lexeme.go
  type LexemeType (line 10) | type LexemeType
  constant EOF (line 13) | EOF LexemeType = iota
  constant Word (line 14) | Word
  constant Disambig (line 15) | Disambig
  constant URI (line 16) | URI
  constant Number (line 17) | Number
  constant Date (line 18) | Date
  constant Time (line 19) | Time
  constant Punctuation (line 20) | Punctuation
  constant Symbol (line 21) | Symbol
  constant Space (line 22) | Space
  constant SystemUse (line 23) | SystemUse
  type Lexeme (line 26) | type Lexeme struct
    method Fix (line 45) | func (l Lexeme) Fix() Lexeme {
    method String (line 53) | func (l Lexeme) String() string {
    method GoString (line 62) | func (l Lexeme) GoString() string {
  function MakeLexeme (line 35) | func MakeLexeme(s string, t LexemeType) Lexeme {
  function StartLexeme (line 75) | func StartLexeme() Lexeme { return startLexeme }
  function RootLexeme (line 76) | func RootLexeme() Lexeme  { return rootLexeme }
  function NullLexeme (line 77) | func NullLexeme() Lexeme  { return nullLexeme }

FILE: lexemetype_string.go
  constant _LexemeType_name (line 7) | _LexemeType_name = "EOFWordDisambigURINumberDateTimePunctuationSymbolSpa...
  method String (line 11) | func (i LexemeType) String() string {

FILE: lexer/lexer.go
  constant eof (line 15) | eof rune = -1
  type Lexer (line 17) | type Lexer struct
    method Run (line 54) | func (l *Lexer) Run() {
    method Reset (line 64) | func (l *Lexer) Reset(r io.Reader) {
    method next (line 73) | func (l *Lexer) next() rune {
    method nextUntilEOF (line 87) | func (l *Lexer) nextUntilEOF(s string) bool {
    method backup (line 98) | func (l *Lexer) backup() {
    method peek (line 104) | func (l *Lexer) peek() rune {
    method lineCount (line 118) | func (l *Lexer) lineCount() {
    method accept (line 127) | func (l *Lexer) accept() {
    method acceptRun (line 131) | func (l *Lexer) acceptRun(valid string) (accepted bool) {
    method acceptRunFn (line 140) | func (l *Lexer) acceptRunFn(fn func(rune) bool) (accepted int) {
    method ignore (line 149) | func (l *Lexer) ignore() {
    method emit (line 154) | func (l *Lexer) emit(t lingo.LexemeType) {
  function New (line 38) | func New(name string, r io.Reader) *Lexer {

FILE: lexer/lexer_test.go
  type lexerTest (line 10) | type lexerTest struct
  function testLexer (line 203) | func testLexer(lts *lexerTest) []lingo.Lexeme {
  function TestLexer (line 214) | func TestLexer(t *testing.T) {

FILE: lexer/stateFn.go
  type stateFn (line 9) | type stateFn
  function lexText (line 11) | func lexText(l *Lexer) (fn stateFn) {
  function lexNumber (line 120) | func lexNumber(l *Lexer) (fn stateFn) {
  function lexWhitespace (line 165) | func lexWhitespace(l *Lexer) (fn stateFn) {
  function lexPunctuation (line 190) | func lexPunctuation(l *Lexer) (fn stateFn) {
  function lexSymbol (line 228) | func lexSymbol(l *Lexer) (fn stateFn) {
  function lexURI (line 235) | func lexURI(l *Lexer) (fn stateFn) {
  function lexDate (line 252) | func lexDate(l *Lexer) (fn stateFn) {
  function lexTime (line 268) | func lexTime(l *Lexer) (fn stateFn) {

FILE: pos/allinone_test.go
  function TestEverything (line 13) | func TestEverything(t *testing.T) {

FILE: pos/context.go
  type contextType (line 30) | type contextType
  constant featuresPerContext (line 32) | featuresPerContext = 8
  constant contexts (line 33) | contexts = 5
  constant prev2Word (line 36) | prev2Word contextType = iota
  constant prev2Lemma (line 37) | prev2Lemma
  constant prev2Cluster (line 38) | prev2Cluster
  constant prev2Shape (line 39) | prev2Shape
  constant prev2Prefix1 (line 40) | prev2Prefix1
  constant prev2Suffix3 (line 41) | prev2Suffix3
  constant prev2POSTag (line 42) | prev2POSTag
  constant prev2Flags (line 43) | prev2Flags
  constant prevWord (line 46) | prevWord
  constant prevLemma (line 47) | prevLemma
  constant prevCluster (line 48) | prevCluster
  constant prevShape (line 49) | prevShape
  constant prevPrefix1 (line 50) | prevPrefix1
  constant prevSuffix3 (line 51) | prevSuffix3
  constant prevPOSTag (line 52) | prevPOSTag
  constant prevFlags (line 53) | prevFlags
  constant ithWord (line 56) | ithWord
  constant ithLemma (line 57) | ithLemma
  constant ithCluster (line 58) | ithCluster
  constant ithShape (line 59) | ithShape
  constant ithPrefix1 (line 60) | ithPrefix1
  constant ithSuffix3 (line 61) | ithSuffix3
  constant ithPOSTag (line 62) | ithPOSTag
  constant ithFlags (line 63) | ithFlags
  constant nextWord (line 66) | nextWord
  constant nextLemma (line 67) | nextLemma
  constant nextCluster (line 68) | nextCluster
  constant nextShape (line 69) | nextShape
  constant nextPrefix1 (line 70) | nextPrefix1
  constant nextSuffix3 (line 71) | nextSuffix3
  constant nextPOSTag (line 72) | nextPOSTag
  constant nextFlags (line 73) | nextFlags
  constant next2Word (line 76) | next2Word
  constant next2Lemma (line 77) | next2Lemma
  constant next2Cluster (line 78) | next2Cluster
  constant next2Shape (line 79) | next2Shape
  constant next2Prefix1 (line 80) | next2Prefix1
  constant next2Suffix3 (line 81) | next2Suffix3
  constant next2POSTag (line 82) | next2POSTag
  constant next2Flags (line 83) | next2Flags
  constant MAXCONTEXTTYPE (line 85) | MAXCONTEXTTYPE
  type contextMap (line 88) | type contextMap
  function getContext (line 90) | func getContext(prev2, prev, ith, next, next2 *lingo.Annotation) (retVal...
  function extractContext (line 120) | func extractContext(a *lingo.Annotation) (retVal [featuresPerContext]str...

FILE: pos/context_test.go
  function TestExtractContext (line 26) | func TestExtractContext(t *testing.T) {

FILE: pos/contexttype_string.go
  constant _contextType_name (line 7) | _contextType_name = "prev2Wordprev2Lemmaprev2Clusterprev2Shapeprev2Prefi...
  method String (line 11) | func (i contextType) String() string {

FILE: pos/debug.go
  constant BUILD_DEBUG (line 11) | BUILD_DEBUG = "POS TAGGER: Debug Build"
  function tabcount (line 17) | func tabcount() int {
  function enterLoggingContext (line 21) | func enterLoggingContext() {
  function leaveLoggingContext (line 27) | func leaveLoggingContext() {
  function logf (line 40) | func logf(format string, others ...interface{}) {
  function recoverFrom (line 44) | func recoverFrom(format string, attrs ...interface{}) {

FILE: pos/errors.go
  type componentUnavailable (line 5) | type componentUnavailable
    method Error (line 7) | func (c componentUnavailable) Error() string     { return fmt.Sprintf(...
    method Component (line 8) | func (c componentUnavailable) Component() string { return string(c) }

FILE: pos/features.go
  type featureType (line 10) | type featureType
  constant bias (line 14) | bias featureType = iota
  constant ithWord_ (line 16) | ithWord_
  constant nextWord_ (line 17) | nextWord_
  constant next2Word_ (line 18) | next2Word_
  constant ithSuffix3_ (line 20) | ithSuffix3_
  constant ithPrefix1_ (line 21) | ithPrefix1_
  constant prevPOSTag_ (line 23) | prevPOSTag_
  constant prev2POSTag_ (line 24) | prev2POSTag_
  constant prevSuffix3_ (line 25) | prevSuffix3_
  constant nextSuffix3_ (line 26) | nextSuffix3_
  constant ithShape_ (line 28) | ithShape_
  constant ithCluster_ (line 29) | ithCluster_
  constant nextCluster_ (line 30) | nextCluster_
  constant next2Cluster_ (line 31) | next2Cluster_
  constant prevCluster_ (line 32) | prevCluster_
  constant prev2Cluster_ (line 33) | prev2Cluster_
  constant ithFlags_ (line 35) | ithFlags_
  constant nextFlags_ (line 36) | nextFlags_
  constant next2Flags_ (line 37) | next2Flags_
  constant prevFlags_ (line 38) | prevFlags_
  constant prev2Flags_ (line 39) | prev2Flags_
  constant prevLemma_prevPOSTag (line 41) | prevLemma_prevPOSTag
  constant prevPOSTag_ithWord (line 42) | prevPOSTag_ithWord
  constant prevPOSTag_prev2POSTag (line 43) | prevPOSTag_prev2POSTag
  constant prev2Lemma_prev2POSTag (line 44) | prev2Lemma_prev2POSTag
  constant MAXFEATURETYPE (line 46) | MAXFEATURETYPE
  type feature (line 76) | type feature interface
  type singleFeature (line 81) | type singleFeature struct
    method FeatType (line 86) | func (sf singleFeature) FeatType() featureType { return sf.featureType }
    method String (line 87) | func (sf singleFeature) String() string {
  type tupleFeature (line 91) | type tupleFeature struct
    method FeatType (line 97) | func (tf tupleFeature) FeatType() featureType { return tf.featureType }
    method String (line 98) | func (tf tupleFeature) String() string {
  type featureMap (line 102) | type featureMap
    method String (line 104) | func (fm featureMap) String() string {
    method add (line 112) | func (fm *featureMap) add(f feature) { (*fm)[f]++ }
  type sfFeatures (line 114) | type sfFeatures
  type tfFeatures (line 115) | type tfFeatures
  function fillFromContext (line 117) | func fillFromContext(c contextMap) (sf sfFeatures, tf tfFeatures) {
  function getFeatures (line 130) | func getFeatures(s lingo.AnnotatedSentence, i int) (sfFeatures, tfFeatur...

FILE: pos/features_test.go
  function TestGetFeatures (line 12) | func TestGetFeatures(t *testing.T) {

FILE: pos/featuretype_string.go
  constant _featureType_name (line 7) | _featureType_name = "biasithWord_prevLemma_prevPOSTagprev2Lemma_prev2POS...
  method String (line 11) | func (i featureType) String() string {

FILE: pos/models.go
  type Model (line 13) | type Model struct
    method Save (line 19) | func (m *Model) Save(filename string) error {
    method SaveWriter (line 27) | func (m *Model) SaveWriter(f io.WriteCloser) error {
  function Load (line 47) | func Load(filename string) (*Model, error) {
  function LoadReader (line 55) | func LoadReader(rd io.ReadCloser) (*Model, error) {
  method Load (line 76) | func (p *Tagger) Load(filename string) error {

FILE: pos/models_test.go
  function TestSaveLoad (line 12) | func TestSaveLoad(t *testing.T) {

FILE: pos/perceptron.go
  type perceptron (line 5) | type perceptron struct
    method updateWeightsSF (line 35) | func (p *perceptron) updateWeightsSF(f singleFeature, tag lingo.POSTag...
    method updateWeightsTF (line 46) | func (p *perceptron) updateWeightsTF(f tupleFeature, tag lingo.POSTag,...
    method update (line 57) | func (p *perceptron) update(guess, truth lingo.POSTag, sf sfFeatures, ...
    method predict (line 90) | func (p *perceptron) predict(sf sfFeatures, tf tfFeatures) lingo.POSTag {
    method average (line 111) | func (p *perceptron) average() {
  type fctuple (line 18) | type fctuple struct
  function newPerceptron (line 23) | func newPerceptron() *perceptron {

FILE: pos/perceptron_io.go
  method GobEncode (line 10) | func (sf singleFeature) GobEncode() ([]byte, error) {
  method GobDecode (line 25) | func (sf *singleFeature) GobDecode(buf []byte) error {
  method GobEncode (line 41) | func (tf tupleFeature) GobEncode() ([]byte, error) {
  method GobDecode (line 60) | func (tf *tupleFeature) GobDecode(buf []byte) error {
  method GobEncode (line 81) | func (fc fctuple) GobEncode() ([]byte, error) {
  method GobDecode (line 96) | func (fc *fctuple) GobDecode(buf []byte) error {
  method GobEncode (line 112) | func (p *perceptron) GobEncode() ([]byte, error) {
  method GobDecode (line 142) | func (p *perceptron) GobDecode(buf []byte) error {
  function init (line 173) | func init() {

FILE: pos/perceptron_io_test.go
  function TestFeatureSerialization (line 14) | func TestFeatureSerialization(t *testing.T) {
  function TestPerceptron_Serialize (line 44) | func TestPerceptron_Serialize(t *testing.T) {

FILE: pos/postagger.go
  type Tagger (line 16) | type Tagger struct
    method Clone (line 98) | func (p *Tagger) Clone() *Tagger {
    method Run (line 114) | func (p *Tagger) Run() {
    method Lemmatize (line 138) | func (p *Tagger) Lemmatize(a string, pt lingo.POSTag) ([]string, error) {
    method Stem (line 146) | func (p *Tagger) Stem(a string) (string, error) {
    method Clusters (line 154) | func (p *Tagger) Clusters() (map[string]lingo.Cluster, error) {
    method Progress (line 162) | func (p *Tagger) Progress() <-chan Progress {
    method Train (line 170) | func (p *Tagger) Train(sentences []treebank.SentenceTag, iterations in...
    method LoadShortcuts (line 248) | func (p *Tagger) LoadShortcuts(shortcuts map[string]lingo.POSTag) {
    method fillCache (line 254) | func (p *Tagger) fillCache(sentences []treebank.SentenceTag) {
    method shortcut (line 296) | func (p *Tagger) shortcut(l lingo.Lexeme) (lingo.POSTag, bool) {
    method setTag (line 304) | func (p *Tagger) setTag(a *lingo.Annotation, tag lingo.POSTag) {
  type ConsOpt (line 32) | type ConsOpt
  function WithCorpus (line 35) | func WithCorpus(c *corpus.Corpus) ConsOpt {
  function WithLemmatizer (line 44) | func WithLemmatizer(l lingo.Lemmatizer) ConsOpt {
  function WithStemmer (line 53) | func WithStemmer(s lingo.Stemmer) ConsOpt {
  function WithCluster (line 62) | func WithCluster(c map[string]lingo.Cluster) ConsOpt {
  function WithModel (line 70) | func WithModel(m *Model) ConsOpt {
  function New (line 78) | func New(opts ...ConsOpt) *Tagger {
  type Progress (line 322) | type Progress struct

FILE: pos/release.go
  constant BUILD_DEBUG (line 5) | BUILD_DEBUG = "POS TAGGER: Release Build"
  function tabcount (line 10) | func tabcount() int                                   { return 0 }
  function enterLoggingContext (line 11) | func enterLoggingContext()                            {}
  function leaveLoggingContext (line 12) | func leaveLoggingContext()                            {}
  function logf (line 13) | func logf(format string, others ...interface{})       {}
  function recoverFrom (line 14) | func recoverFrom(format string, attrs ...interface{}) {}
  method ShowWeights (line 16) | func (p *Tagger) ShowWeights() {}
  function printShortcuts (line 17) | func printShortcuts(p *Tagger) {}

FILE: pos/sentence.go
  method getSentences (line 7) | func (p *Tagger) getSentences() {

FILE: pos/test_test.go
  type dummyLem (line 8) | type dummyLem struct
    method Lemmatize (line 10) | func (dummyLem) Lemmatize(s string, pt lingo.POSTag) ([]string, error) {
  type dummyStemmer (line 19) | type dummyStemmer struct
    method Stem (line 21) | func (dummyStemmer) Stem(s string) (string, error) {
  type dummyFix (line 31) | type dummyFix struct
    method Clusters (line 36) | func (dummyFix) Clusters() (map[string]lingo.Cluster, error) { return ...
  constant conllu (line 38) | conllu = `1	From	from	ADP	IN	_	3	case	_	_

FILE: pos/util.go
  function maxScore (line 9) | func maxScore(scores *[lingo.MAXTAG]float64) lingo.POSTag {

FILE: pos/util_test.go
  function TestMaxScore (line 11) | func TestMaxScore(t *testing.T) {

FILE: sentence.go
  type LexemeSentence (line 13) | type LexemeSentence
    method String (line 17) | func (ls LexemeSentence) String() string {
  function NewLexemeSentence (line 15) | func NewLexemeSentence() LexemeSentence { return LexemeSentence(make([]L...
  type AnnotatedSentence (line 29) | type AnnotatedSentence
    method Clone (line 33) | func (as AnnotatedSentence) Clone() AnnotatedSentence {
    method SetID (line 47) | func (as AnnotatedSentence) SetID() {
    method Fix (line 56) | func (as AnnotatedSentence) Fix() {
    method IsValid (line 74) | func (as AnnotatedSentence) IsValid() bool {
    method Phrase (line 96) | func (as AnnotatedSentence) Phrase(start, end int) (AnnotatedSentence,...
    method IDs (line 107) | func (as AnnotatedSentence) IDs() []int {
    method Tags (line 116) | func (as AnnotatedSentence) Tags() []POSTag {
    method Heads (line 125) | func (as AnnotatedSentence) Heads() []int {
    method Leaves (line 134) | func (as AnnotatedSentence) Leaves() (retVal []int) {
    method Labels (line 144) | func (as AnnotatedSentence) Labels() []DependencyType {
    method StringSlice (line 153) | func (as AnnotatedSentence) StringSlice() []string {
    method LoweredStringSlice (line 162) | func (as AnnotatedSentence) LoweredStringSlice() []string {
    method Lemmas (line 171) | func (as AnnotatedSentence) Lemmas() []string {
    method Stems (line 180) | func (as AnnotatedSentence) Stems() []string {
    method Children (line 188) | func (as AnnotatedSentence) Children(h int) (retVal []int) {
    method Edges (line 197) | func (as AnnotatedSentence) Edges() (retVal []DependencyEdge) {
    method Dependency (line 217) | func (as AnnotatedSentence) Dependency() *Dependency {
    method Tree (line 221) | func (as AnnotatedSentence) Tree() *DependencyTree {
    method String (line 263) | func (as AnnotatedSentence) String() string {
    method ValueString (line 274) | func (as AnnotatedSentence) ValueString() string {
    method LoweredString (line 285) | func (as AnnotatedSentence) LoweredString() string {
    method LemmaString (line 296) | func (as AnnotatedSentence) LemmaString() string {
    method StemString (line 307) | func (as AnnotatedSentence) StemString() string {
    method Len (line 319) | func (as AnnotatedSentence) Len() int           { return len(as) }
    method Swap (line 320) | func (as AnnotatedSentence) Swap(i, j int)      { as[i], as[j] = as[j]...
    method Less (line 321) | func (as AnnotatedSentence) Less(i, j int) bool { return as[i].ID < as...
  function NewAnnotatedSentence (line 31) | func NewAnnotatedSentence() AnnotatedSentence { return make(AnnotatedSen...

FILE: sets.go
  type TagSet (line 11) | type TagSet
    method String (line 13) | func (ts TagSet) String() string {
  type DependencyTypeSet (line 22) | type DependencyTypeSet
    method String (line 24) | func (dts DependencyTypeSet) String() string {

FILE: shape.go
  type Shape (line 9) | type Shape
  method Shape (line 11) | func (l Lexeme) Shape() Shape {

FILE: stopwords.go
  constant sw (line 5) | sw = `a about above across after afterwards again against all almost alo...
  function init (line 9) | func init() {
  function UnescapeSpecials (line 18) | func UnescapeSpecials(word string) string {

FILE: treebank/sentenceTag.go
  type SentenceTag (line 10) | type SentenceTag struct
    method AnnotatedSentence (line 17) | func (s SentenceTag) AnnotatedSentence(f lingo.AnnotationFixer) lingo....
    method Dependency (line 48) | func (s SentenceTag) Dependency(f lingo.AnnotationFixer) *lingo.Depend...
    method String (line 55) | func (s SentenceTag) String() string {
  function ShuffleSentenceTag (line 59) | func ShuffleSentenceTag(s []SentenceTag) []SentenceTag {
  function WrapLexemeSentence (line 71) | func WrapLexemeSentence(sentence lingo.LexemeSentence) lingo.LexemeSente...
  function WrapTags (line 79) | func WrapTags(tagList []lingo.POSTag) []lingo.POSTag {
  function WrapHeads (line 85) | func WrapHeads(heads []int) []int {
  function WrapDeps (line 91) | func WrapDeps(deps []lingo.DependencyType) []lingo.DependencyType {

FILE: treebank/sentenceTag_test.go
  function TestSentenceTag (line 10) | func TestSentenceTag(t *testing.T) {

FILE: treebank/treebank.go
  type Loader (line 19) | type Loader
  function LoadUniversal (line 22) | func LoadUniversal(fileName string) []SentenceTag {
  function ReadConllu (line 34) | func ReadConllu(reader io.Reader) []SentenceTag {
  function LoadEWT (line 111) | func LoadEWT(filename string) []SentenceTag {

FILE: treebank/treebank_test.go
  constant sampleConllu (line 11) | sampleConllu = `1	President	President	PROPN	NNP	Number=Sing	2	compound	_	_
  function Test_ReadConllu (line 33) | func Test_ReadConllu(t *testing.T) {
  function ttos (line 119) | func ttos(ts []lingo.POSTag) []string {
  function ltos (line 127) | func ltos(ls []lingo.DependencyType) []string {

FILE: treebank/util.go
  function StringToLexType (line 8) | func StringToLexType(tag string) lingo.LexemeType {
  function StringToPOSTag (line 23) | func StringToPOSTag(tag string) (lingo.POSTag, bool) {
  function StringToDependencyType (line 29) | func StringToDependencyType(ud string) (lingo.DependencyType, bool) {
  function reset (line 35) | func reset() (lingo.LexemeSentence, []lingo.POSTag, []int, []lingo.Depen...
  function finish (line 44) | func finish(s lingo.LexemeSentence, st []lingo.POSTag, sh []int, sdt []l...

FILE: utils.go
  function InStringSlice (line 3) | func InStringSlice(s string, l []string) bool {
  type is (line 12) | type is
  function StringIs (line 14) | func StringIs(s string, f is) bool {
  function isAscii (line 23) | func isAscii(r rune) bool {
  function EqStringSlice (line 30) | func EqStringSlice(a, b []string) bool {

FILE: wordFlags.go
  type WordFlag (line 10) | type WordFlag
    method String (line 31) | func (f WordFlag) String() string {
  constant NoFlag (line 13) | NoFlag WordFlag = iota
  constant IsLetter (line 14) | IsLetter
  constant IsAscii (line 15) | IsAscii
  constant IsDigit (line 16) | IsDigit
  constant IsLower (line 17) | IsLower
  constant IsPunct (line 18) | IsPunct
  constant IsSpace (line 19) | IsSpace
  constant IsTitle (line 20) | IsTitle
  constant IsUpper (line 21) | IsUpper
  constant LikeURL (line 22) | LikeURL
  constant LikeNum (line 23) | LikeNum
  constant LikeEmail (line 24) | LikeEmail
  constant IsStopWord (line 25) | IsStopWord
  constant IsOOV (line 26) | IsOOV
  constant MAXFLAG (line 28) | MAXFLAG
  method Flags (line 35) | func (l Lexeme) Flags() WordFlag {

Download .json

Condensed preview — 128 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (320K chars).

[
  {
    "path": ".gitignore",
    "chars": 266,
    "preview": "# Compiled Object files, Static and Dynamic libs (Shared Objects)\n*.o\n*.a\n*.so\n\n# Folders\n_obj\n_test\n\n# Architecture spe"
  },
  {
    "path": ".travis.yml",
    "chars": 157,
    "preview": "language: go\n\nbranches:\n  only:\n    - master\n\ngo:\n  - 1.11.x\n  - 1.12.x\n  - 1.13.x\n  - tip\n\nenv:\n  - GO111MODULE=on\n\nmat"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 1832,
    "preview": "# Contributing #\n\nContributors are welcome! We want to make contributing as easy as possible, and the process is very Gi"
  },
  {
    "path": "CONTRIBUTORS.md",
    "chars": 59,
    "preview": "# Contributors #\n\n* Xuanyi Chew (@chewxy) - initial package"
  },
  {
    "path": "LICENSE",
    "chars": 1063,
    "preview": "MIT License\n\nCopyright (c) 2017 Chewxy\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof "
  },
  {
    "path": "POSTag.go",
    "chars": 1397,
    "preview": "package lingo\n\nimport (\n\t\"fmt\"\n\t\"strings\"\n)\n\n// POSTag represents a Part of Speech Tag.\ntype POSTag byte\n\nvar posTagLook"
  },
  {
    "path": "POSTag_stanford.go",
    "chars": 4115,
    "preview": "// +build stanfordtags\n\npackage lingo\n\n//go:generate stringer -type=POSTag -output=POSTag_stanford_string.go\n\nconst BUIL"
  },
  {
    "path": "POSTag_stanford_string.go",
    "chars": 829,
    "preview": "// +build stanfordtags\n\n// Code generated by \"stringer -type=POSTag -output=POSTag_stanford_string.go\"; DO NOT EDIT\n\npac"
  },
  {
    "path": "POSTag_universal.go",
    "chars": 1342,
    "preview": "// +build !stanfordtags\n\npackage lingo\n\n//go:generate stringer -type=POSTag -output=POSTag_universal_string.go\n\nconst BU"
  },
  {
    "path": "POSTag_universal_string.go",
    "chars": 548,
    "preview": "// +build !stanfordtags\n\n// Code generated by \"stringer -type=POSTag -output=POSTag_universal_string.go\"; DO NOT EDIT\n\np"
  },
  {
    "path": "README.md",
    "chars": 5130,
    "preview": "# lingo #\n\n<img src=\"https://raw.githubusercontent.com/chewxy/lingo/master/media/gopher_small.png\" align=\"right\" />\n\n[!["
  },
  {
    "path": "annotation.go",
    "chars": 3999,
    "preview": "package lingo\n\nimport (\n\t\"errors\"\n\t\"fmt\"\n\t\"strings\"\n)\n\n// Annotation is the word and it's metadata.\n// This includes the"
  },
  {
    "path": "annotationSet.go",
    "chars": 826,
    "preview": "package lingo\n\nimport (\n\t\"sort\"\n\t\"unsafe\"\n\n\t\"github.com/xtgo/set\"\n)\n\ntype AnnotationSet []*Annotation\n\nfunc (as Annotati"
  },
  {
    "path": "annotationSet_bench_test.go",
    "chars": 2366,
    "preview": "package lingo\n\nimport (\n\t\"sort\"\n\t\"testing\"\n)\n\nfunc (as AnnotationSet) index2(a *Annotation) int {\n\tsort.Sort(as)\n\tf := f"
  },
  {
    "path": "browncluster.go",
    "chars": 1509,
    "preview": "package lingo\n\nimport (\n\t\"bufio\"\n\t\"io\"\n\t\"strconv\"\n\t\"strings\"\n)\n\n// this file provides IO support and type safety for bro"
  },
  {
    "path": "cmd/demo/io.go",
    "chars": 694,
    "preview": "package main\n\nimport (\n\t\"log\"\n\t\"os\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/dep\"\n\t\"github.com/chewxy/lingo"
  },
  {
    "path": "cmd/demo/main.go",
    "chars": 1264,
    "preview": "package main\n\nimport (\n\t\"io/ioutil\"\n\t\"os\"\n\t\"os/exec\"\n\n\t\"github.com/abiosoft/ishell\"\n\t\"github.com/chewxy/lingo\"\n\t\"github."
  },
  {
    "path": "cmd/demo/nlp.go",
    "chars": 1348,
    "preview": "package main\n\nimport (\n\t\"fmt\"\n\t\"strings\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/dep\"\n\t\"github.com/chewxy/"
  },
  {
    "path": "cmd/dep/fixer.go",
    "chars": 589,
    "preview": "package main\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/kljensen/snowball\"\n)\n\ntype stemmer struct{}\n\nfunc"
  },
  {
    "path": "cmd/dep/io.go",
    "chars": 1143,
    "preview": "package main\n\nimport (\n\t\"log\"\n\n\t\"github.com/chewxy/lingo/dep\"\n\t\"github.com/chewxy/lingo/pos\"\n\t\"github.com/chewxy/lingo/t"
  },
  {
    "path": "cmd/dep/main.go",
    "chars": 2527,
    "preview": "package main\n\nimport (\n\t\"flag\"\n\t\"log\"\n\t\"os\"\n\t\"os/signal\"\n\t\"runtime/pprof\"\n\t\"syscall\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"githu"
  },
  {
    "path": "cmd/dep/pipeline.go",
    "chars": 893,
    "preview": "package main\n\nimport (\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"strings\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/dep\"\n\t\"g"
  },
  {
    "path": "cmd/dep/train.go",
    "chars": 1298,
    "preview": "package main\n\nimport (\n\t\"log\"\n\n\t\"github.com/chewxy/lingo/dep\"\n\t\"github.com/chewxy/lingo/treebank\"\n\t\"gorgonia.org/tensor\""
  },
  {
    "path": "cmd/lexer/main.go",
    "chars": 413,
    "preview": "package main\n\nimport (\n\t\"flag\"\n\t\"fmt\"\n\t\"strings\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/lexer\"\n)\n\nvar inp"
  },
  {
    "path": "cmd/pos/crossvalidation.go",
    "chars": 2608,
    "preview": "package main\n\nimport (\n\t\"bytes\"\n\t\"fmt\"\n\t\"log\"\n\t\"os\"\n\t\"strings\"\n\t\"sync\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/l"
  },
  {
    "path": "cmd/pos/fixer.go",
    "chars": 608,
    "preview": "// +build !chewxy\n\npackage main\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/kljensen/snowball\"\n)\n\ntype ste"
  },
  {
    "path": "cmd/pos/main.go",
    "chars": 4478,
    "preview": "package main\n\nimport (\n\t\"flag\"\n\t\"fmt\"\n\t\"log\"\n\t\"os\"\n\t\"os/signal\"\n\t\"runtime/pprof\"\n\t\"strings\"\n\t\"sync\"\n\t\"syscall\"\n\t\"time\"\n\n"
  },
  {
    "path": "const.go",
    "chars": 1625,
    "preview": "package lingo\n\n// constants that are not pertaining to build tags\n\nvar empty struct{}\n\n// NumberWords was generated with"
  },
  {
    "path": "corpus/consopt.go",
    "chars": 3707,
    "preview": "package corpus\n\nimport (\n\t\"log\"\n\t\"sort\"\n\t\"sync/atomic\"\n\t\"unicode/utf8\"\n\n\t\"github.com/pkg/errors\"\n\t\"github.com/xtgo/set\"\n"
  },
  {
    "path": "corpus/corpus.go",
    "chars": 4775,
    "preview": "package corpus\n\nimport (\n\t\"sync/atomic\"\n\t\"unicode/utf8\"\n\n\t\"github.com/pkg/errors\"\n)\n\n// Corpus is a data structure holdi"
  },
  {
    "path": "corpus/corpus_test.go",
    "chars": 1553,
    "preview": "package corpus\n\nimport (\n\t\"testing\"\n\n\t\"github.com/stretchr/testify/assert\"\n)\n\nfunc TestCorpus(t *testing.T) {\n\tassert :="
  },
  {
    "path": "corpus/functions.go",
    "chars": 8180,
    "preview": "package corpus\n\nimport (\n\t\"math\"\n\t\"strings\"\n\t\"unicode/utf8\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/treeba"
  },
  {
    "path": "corpus/functions_test.go",
    "chars": 3629,
    "preview": "package corpus\n\nimport (\n\t\"strings\"\n\t\"testing\"\n\n\t\"github.com/stretchr/testify/assert\"\n)\n\nfunc Test_GenerateCorpus(t *tes"
  },
  {
    "path": "corpus/inflection.go",
    "chars": 3816,
    "preview": "package corpus\n\nimport (\n\t\"regexp\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\ntype conversionPattern struct {\n\tpattern     *regexp.R"
  },
  {
    "path": "corpus/inflection_test.go",
    "chars": 901,
    "preview": "package corpus\n\nimport \"testing\"\n\nvar pluralizeTest = []struct {\n\tword, correct string\n}{\n\t{\"friend\", \"friends\"},\n\t{\"tom"
  },
  {
    "path": "corpus/io.go",
    "chars": 3123,
    "preview": "package corpus\n\nimport (\n\t\"bufio\"\n\t\"bytes\"\n\t\"encoding/gob\"\n\t\"io\"\n\t\"strconv\"\n\t\"strings\"\n)\n\n// sortutil is a utility struc"
  },
  {
    "path": "corpus/io_test.go",
    "chars": 1992,
    "preview": "package corpus\n\nimport (\n\t\"bytes\"\n\t\"encoding/gob\"\n\t\"strings\"\n\t\"testing\"\n\n\t\"github.com/stretchr/testify/assert\"\n)\n\nfunc T"
  },
  {
    "path": "corpus/lda.go",
    "chars": 973,
    "preview": "package corpus\n\nimport (\n\t\"gorgonia.org/tensor\"\n)\n\n// LDAModel ... TODO\n//https://en.wikipedia.org/wiki/Latent_Dirichlet"
  },
  {
    "path": "corpus/test_test.go",
    "chars": 1382,
    "preview": "package corpus\n\nimport (\n\t\"strings\"\n\n\t\"github.com/chewxy/lingo/treebank\"\n)\n\nconst sample1Gram = `the\t23135851162\nof\t1315"
  },
  {
    "path": "corpus/utils.go",
    "chars": 530,
    "preview": "package corpus\n\nimport (\n\t\"errors\"\n\t\"math\"\n)\n\nfunc minInt(a, b int) int {\n\tif a < b {\n\t\treturn a\n\t}\n\treturn b\n}\n\nfunc ma"
  },
  {
    "path": "dep/README.md",
    "chars": 8658,
    "preview": "# Dependency Parser #\n\nPackage `dependencyparser` is a package that provides data structures and algorithms for a depend"
  },
  {
    "path": "dep/arcStandard.go",
    "chars": 1628,
    "preview": "package dep\n\nimport \"github.com/chewxy/lingo\"\n\n// var SingleRoot bool = true // make this part of a build process\n\n// ca"
  },
  {
    "path": "dep/arcStandard_test.go",
    "chars": 2385,
    "preview": "package dep\n\nimport (\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/stretchr/testify/assert\"\n)\n\nfunc TestCanApply("
  },
  {
    "path": "dep/configuration.go",
    "chars": 4431,
    "preview": "package dep\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\n// describes the current state of the parser\n\ntype head int\n"
  },
  {
    "path": "dep/configuration_test.go",
    "chars": 1678,
    "preview": "package dep\n\nimport (\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/stretchr/testify/assert\"\n)\n\nfunc TestStackAppe"
  },
  {
    "path": "dep/debug.go",
    "chars": 3086,
    "preview": "// +build debug\n\npackage dep\n\nimport (\n\t\"bytes\"\n\t\"fmt\"\n\t\"log\"\n\t\"runtime\"\n\t\"strings\"\n\t\"sync/atomic\"\n\n\t\"github.com/chewxy/"
  },
  {
    "path": "dep/dependencyParser.go",
    "chars": 2824,
    "preview": "package dep\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/corpus\"\n\t\"github.com/pkg/errors\"\n)\n\nv"
  },
  {
    "path": "dep/documentation/iamhuman.dot",
    "chars": 391,
    "preview": "digraph G {\n\tNode_0xc425b88740->Node_0xc425b88780[ label=Root ];\n\tNode_0xc425b88780->Node_0xc425b88800[ label=Cop ];\n\tNo"
  },
  {
    "path": "dep/documentation/thecatsatonthemat.dot",
    "chars": 706,
    "preview": "digraph G {\n\tNode_0xc4349eeec0->Node_0xc4349eef80[ label=Root ];\n\tNode_0xc4349eef80->Node_0xc4349eefc0[ label=NMod ];\n\tN"
  },
  {
    "path": "dep/errors.go",
    "chars": 864,
    "preview": "package dep\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\ntype componentUnavailable string\n\nfunc (c componentUnavailab"
  },
  {
    "path": "dep/evaluation.go",
    "chars": 2968,
    "preview": "package dep\n\nimport (\n\t\"fmt\"\n\t\"io/ioutil\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/treebank\"\n)\n\n// Performa"
  },
  {
    "path": "dep/example.go",
    "chars": 2183,
    "preview": "package dep\n\nimport (\n\t\"math/rand\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/corpus\"\n\t\"github.com/chewxy/lin"
  },
  {
    "path": "dep/example_test.go",
    "chars": 339,
    "preview": "package dep\n\nimport (\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo/corpus\"\n)\n\nfunc TestMakeExamples(t *testing.T) {\n\tst := simp"
  },
  {
    "path": "dep/featureExtraction.go",
    "chars": 3542,
    "preview": "package dep\n\nimport (\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/corpus\"\n)\n\n// getFeatures extracts the IDs to"
  },
  {
    "path": "dep/features.go",
    "chars": 844,
    "preview": "package dep\n\nimport \"github.com/chewxy/lingo\"\n\n// the features are used as columns in the matrix\n\n// go:generate stringe"
  },
  {
    "path": "dep/features_string.go",
    "chars": 804,
    "preview": "// generated by stringer -type=feature -output=features_string.go; DO NOT EDIT\n\npackage dep\n\nimport \"fmt\"\n\nconst _featur"
  },
  {
    "path": "dep/fix.go",
    "chars": 2856,
    "preview": "package dep\n\nimport (\n\t\"log\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\n// applies common fixes\nfunc fix(d *lingo.Dependency) {\n\t// "
  },
  {
    "path": "dep/init.go",
    "chars": 135,
    "preview": "package dep\n\nimport \"github.com/chewxy/lingo/corpus\"\n\nfunc init() {\n\tc := corpus.New()\n\tc.Add(\"\") // add null words\n\n\tKn"
  },
  {
    "path": "dep/models.go",
    "chars": 2079,
    "preview": "package dep\n\nimport (\n\t\"bufio\"\n\t\"bytes\"\n\t\"encoding/gob\"\n\t\"fmt\"\n\t\"io\"\n\t\"os\"\n\n\t\"github.com/chewxy/lingo/corpus\"\n\t\"github.c"
  },
  {
    "path": "dep/models_test.go",
    "chars": 1080,
    "preview": "package dep\n\nimport (\n\t\"os\"\n\t\"testing\"\n\n\t\"github.com/stretchr/testify/assert\"\n\tG \"gorgonia.org/gorgonia\"\n)\n\nfunc TestMod"
  },
  {
    "path": "dep/move.go",
    "chars": 312,
    "preview": "package dep\n\n// Move is an action that the dependency parser can take - whether to Shift, Attach-Left, or AttachRight\nty"
  },
  {
    "path": "dep/move_string.go",
    "chars": 329,
    "preview": "// generated by stringer -type=Move; DO NOT EDIT\n\npackage dep\n\nimport \"fmt\"\n\nconst _Move_name = \"ShiftLeftRightMAXMOVE\"\n"
  },
  {
    "path": "dep/nn2.go",
    "chars": 10402,
    "preview": "package dep\n\nimport (\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/corpus\"\n\t\"github.com/pkg/errors\"\n\tG \"gorgonia"
  },
  {
    "path": "dep/nn2_io.go",
    "chars": 3260,
    "preview": "package dep\n\nimport (\n\t\"bytes\"\n\t\"encoding/gob\"\n\t\"fmt\"\n\n\t\"github.com/pkg/errors\"\n\tG \"gorgonia.org/gorgonia\"\n\tT \"gorgonia."
  },
  {
    "path": "dep/nn2_io_test.go",
    "chars": 3370,
    "preview": "package dep\n\nimport (\n\t\"bytes\"\n\t\"encoding/gob\"\n\t\"fmt\"\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/c"
  },
  {
    "path": "dep/nn2_test.go",
    "chars": 1901,
    "preview": "package dep\n\nimport (\n\t\"math/rand\"\n\t\"testing\"\n\t\"time\"\n\n\t\"github.com/chewxy/lingo/corpus\"\n\t\"gorgonia.org/gorgonia\"\n)\n\nfun"
  },
  {
    "path": "dep/nnconfig.go",
    "chars": 2914,
    "preview": "package dep\n\nimport (\n\t\"bytes\"\n\t\"encoding/gob\"\n\t\"fmt\"\n\n\t\"github.com/pkg/errors\"\n\t\"gorgonia.org/tensor\"\n)\n\n// NNConfig co"
  },
  {
    "path": "dep/release.go",
    "chars": 607,
    "preview": "// +build !debug\n\npackage dep\n\nconst BUILD_DEBUG = \"PARSER: RELEASE BUILD\"\nconst BUILD_DIAG = \"Non-Diagnostic Build\"\n\nco"
  },
  {
    "path": "dep/span.go",
    "chars": 313,
    "preview": "package dep\n\ntype span struct {\n\tstart, end int\n}\n\nfunc makeSpan(start, end int) span {\n\tif end <= start {\n\t\tpanic(\"Impo"
  },
  {
    "path": "dep/test_test.go",
    "chars": 7717,
    "preview": "package dep\n\nimport (\n\t\"bufio\"\n\t\"crypto/md5\"\n\t\"encoding/gob\"\n\t\"fmt\"\n\t\"io\"\n\t\"log\"\n\t\"os\"\n\t\"strings\"\n\n\t\"github.com/chewxy/l"
  },
  {
    "path": "dep/train.go",
    "chars": 9031,
    "preview": "package dep\n\nimport (\n\t\"fmt\"\n\t\"os\"\n\t\"sync\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/corpus\"\n\t\"github.com/ch"
  },
  {
    "path": "dep/train_test.go",
    "chars": 4827,
    "preview": "package dep\n\nimport (\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo/corpus\"\n\n\tG \"gorgonia.org/gorgonia\"\n)\n\nfunc TestTrainerIniti"
  },
  {
    "path": "dep/transition.go",
    "chars": 1431,
    "preview": "package dep\n\nimport (\n\t\"fmt\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\n// transition is a tuple of Move and label\ntype transition s"
  },
  {
    "path": "dep/util.go",
    "chars": 146,
    "preview": "package dep\n\nfunc minInt(a, b int) int {\n\tif a < b {\n\t\treturn a\n\t}\n\treturn b\n}\n\nfunc maxInt(a, b int) int {\n\tif a > b {\n"
  },
  {
    "path": "dependency.go",
    "chars": 4869,
    "preview": "package lingo\n\nimport (\n\t\"bytes\"\n\t\"fmt\"\n)\n\n// Dependency represents the dependency parse of a sentence. While AnnotatedS"
  },
  {
    "path": "dependencyTree.go",
    "chars": 2069,
    "preview": "package lingo\n\nimport (\n\t\"github.com/awalterschulze/gographviz\"\n\n\t\"fmt\"\n\n\t\"sync\"\n)\n\n// A DependencyTree is an alternate "
  },
  {
    "path": "dependencyType.go",
    "chars": 1323,
    "preview": "package lingo\n\nimport (\n\t\"fmt\"\n\t\"strings\"\n)\n\n// DependencyType represents the relation between two words\ntype Dependency"
  },
  {
    "path": "dependencyType_stanford.go",
    "chars": 2858,
    "preview": "// +build stanfordrel\n\npackage lingo\n\nconst BUILD_RELSET = \"stanfordrel\"\n\n//go:generate stringer -type=DependencyType -o"
  },
  {
    "path": "dependencyType_stanford_string.go",
    "chars": 921,
    "preview": "// +build stanfordrel\n\n// Code generated by \"stringer -type=DependencyType -output=dependencyType_stanford_string.go\"; D"
  },
  {
    "path": "dependencyType_universal.go",
    "chars": 1644,
    "preview": "// +build !stanfordrel\n\npackage lingo\n\nconst BUILD_RELSET = \"universalrel\"\n\n//go:generate stringer -type=DependencyType "
  },
  {
    "path": "dependencyType_universal_string.go",
    "chars": 1012,
    "preview": "// +build !stanfordrel\n\n// Code generated by \"stringer -type=DependencyType -output=dependencyType_universal_string.go\";"
  },
  {
    "path": "errors.go",
    "chars": 82,
    "preview": "package lingo\n\ntype componentUnavailable interface {\n\terror\n\tComponent() string\n}\n"
  },
  {
    "path": "go.mod",
    "chars": 1710,
    "preview": "module github.com/chewxy/lingo\n\nrequire (\n\tgithub.com/abiosoft/ishell v2.0.0+incompatible\n\tgithub.com/abiosoft/readline "
  },
  {
    "path": "go.sum",
    "chars": 6315,
    "preview": "github.com/abiosoft/ishell v2.0.0+incompatible/go.mod h1:HQR9AqF2R3P4XXpMpI0NAzgHf/aS6+zVXRj14cVk9qg=\ngithub.com/abiosof"
  },
  {
    "path": "interfaces.go",
    "chars": 1988,
    "preview": "package lingo\n\nimport (\n\t\"encoding/gob\"\n\n\t\"gorgonia.org/tensor\"\n)\n\n// Lemmatizer is anything that can lemmatize\ntype Lem"
  },
  {
    "path": "io.go",
    "chars": 3552,
    "preview": "package lingo\n\nimport (\n\t\"bytes\"\n\t\"encoding/json\"\n\t\"fmt\"\n\t\"strings\"\n\n\t\"github.com/pkg/errors\"\n)\n\ntype dummyAnnotation st"
  },
  {
    "path": "io_test.go",
    "chars": 1849,
    "preview": "package lingo\n\nimport (\n\t\"encoding/json\"\n\t\"testing\"\n)\n\nfunc TestAnnotationJSON(t *testing.T) {\n\ta := NewAnnotation()\n\ta."
  },
  {
    "path": "lexeme.go",
    "chars": 1298,
    "preview": "package lingo\n\nimport (\n\t\"fmt\"\n\t\"unicode\"\n)\n\n//go:generate stringer -type=LexemeType\n\ntype LexemeType byte\n\nconst (\n\tEOF"
  },
  {
    "path": "lexemetype_string.go",
    "chars": 468,
    "preview": "// Code generated by \"stringer -type=LexemeType\"; DO NOT EDIT\n\npackage lingo\n\nimport \"fmt\"\n\nconst _LexemeType_name = \"EO"
  },
  {
    "path": "lexer/lexer.go",
    "chars": 2750,
    "preview": "package lexer\n\nimport (\n\t\"bufio\"\n\t\"bytes\"\n\t\"io\"\n\t\"strings\"\n\t\"sync\"\n\n\t\"golang.org/x/text/unicode/norm\"\n\n\t\"github.com/chew"
  },
  {
    "path": "lexer/lexer_test.go",
    "chars": 6700,
    "preview": "package lexer\n\nimport (\n\t\"strings\"\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\ntype lexerTest struct {\n\tname string\n\ts    "
  },
  {
    "path": "lexer/stateFn.go",
    "chars": 5750,
    "preview": "package lexer\n\nimport (\n\t\"unicode\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\ntype stateFn func(*Lexer) stateFn\n\nfunc lexText(l *Lex"
  },
  {
    "path": "lingo.go",
    "chars": 117,
    "preview": "// package lingo provides the data structures and algorithms required for natural language processing.\npackage lingo\n"
  },
  {
    "path": "pos/allinone_test.go",
    "chars": 1046,
    "preview": "package pos\n\nimport (\n\t\"log\"\n\t\"strings\"\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/lexer\"\n\t\"github"
  },
  {
    "path": "pos/context.go",
    "chars": 2674,
    "preview": "package pos\n\nimport (\n\t\"strconv\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\n/*\nA context is which word in the current state the POST"
  },
  {
    "path": "pos/context_test.go",
    "chars": 1561,
    "preview": "package pos\n\nimport (\n\t\"strings\"\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\nvar extractContextTest = []struct {\n\tval stri"
  },
  {
    "path": "pos/contexttype_string.go",
    "chars": 972,
    "preview": "// generated by stringer -type=contextType; DO NOT EDIT\n\npackage pos\n\nimport \"fmt\"\n\nconst _contextType_name = \"prev2Word"
  },
  {
    "path": "pos/debug.go",
    "chars": 789,
    "preview": "// +build debug\n\npackage pos\n\nimport (\n\t\"log\"\n\t\"strings\"\n\t\"sync/atomic\"\n)\n\nconst BUILD_DEBUG = \"POS TAGGER: Debug Build\""
  },
  {
    "path": "pos/errors.go",
    "chars": 224,
    "preview": "package pos\n\nimport \"fmt\"\n\ntype componentUnavailable string\n\nfunc (c componentUnavailable) Error() string     { return f"
  },
  {
    "path": "pos/features.go",
    "chars": 3168,
    "preview": "package pos\n\nimport (\n\t\"bytes\"\n\t\"fmt\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\ntype featureType byte\n\n//go:generate stringer -type"
  },
  {
    "path": "pos/features_test.go",
    "chars": 6517,
    "preview": "// +build stanfordtags\n\npackage pos\n\nimport (\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/stretchr/testify/asser"
  },
  {
    "path": "pos/featuretype_string.go",
    "chars": 804,
    "preview": "// generated by stringer -type=featureType; DO NOT EDIT\n\npackage pos\n\nimport \"fmt\"\n\nconst _featureType_name = \"biasithWo"
  },
  {
    "path": "pos/models.go",
    "chars": 1295,
    "preview": "package pos\n\nimport (\n\t\"bufio\"\n\t\"encoding/gob\"\n\t\"io\"\n\t\"os\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\n// Model is the model that the"
  },
  {
    "path": "pos/models_test.go",
    "chars": 658,
    "preview": "package pos\n\nimport (\n\t\"os\"\n\t\"strings\"\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo/treebank\"\n\t\"github.com/stretchr/testify/ass"
  },
  {
    "path": "pos/perceptron.go",
    "chars": 3257,
    "preview": "package pos\n\nimport \"github.com/chewxy/lingo\"\n\ntype perceptron struct {\n\t// weights map[feature]*[lingo.MAXTAG]float64 /"
  },
  {
    "path": "pos/perceptron_io.go",
    "chars": 3163,
    "preview": "package pos\n\nimport (\n\t\"bytes\"\n\t\"encoding/gob\"\n)\n\n/* Feature Gob interface */\n\nfunc (sf singleFeature) GobEncode() ([]by"
  },
  {
    "path": "pos/perceptron_io_test.go",
    "chars": 1781,
    "preview": "// +build stanfordtags\n\npackage pos\n\nimport (\n\t\"bytes\"\n\t\"encoding/gob\"\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.c"
  },
  {
    "path": "pos/postagger.go",
    "chars": 7725,
    "preview": "package pos\n\nimport (\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/chewxy/lingo/corpus\"\n\t\"github.com/chewxy/lingo/treebank\"\n)"
  },
  {
    "path": "pos/release.go",
    "chars": 490,
    "preview": "// +build !debug\n\npackage pos\n\nconst BUILD_DEBUG = \"POS TAGGER: Release Build\"\n\nvar TABCOUNT uint32 = 0\nvar tracking = f"
  },
  {
    "path": "pos/sentence.go",
    "chars": 588,
    "preview": "package pos\n\nimport \"github.com/chewxy/lingo\"\n\n// \"log\"\n\nfunc (p *Tagger) getSentences() {\n\tdefer close(p.sentences)\n\n\tv"
  },
  {
    "path": "pos/test_test.go",
    "chars": 3456,
    "preview": "package pos\n\nimport (\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/kljensen/snowball\"\n)\n\ntype dummyLem struct{}\n\nfunc (dummyL"
  },
  {
    "path": "pos/util.go",
    "chars": 293,
    "preview": "package pos\n\nimport (\n\t\"math\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\nfunc maxScore(scores *[lingo.MAXTAG]float64) lingo.POSTag {"
  },
  {
    "path": "pos/util_test.go",
    "chars": 435,
    "preview": "package pos\n\nimport (\n\t\"math\"\n\t\"math/rand\"\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\nfunc TestMaxScore(t *testing.T) {\n\t"
  },
  {
    "path": "sentence.go",
    "chars": 7282,
    "preview": "package lingo\n\nimport (\n\t\"bytes\"\n\t\"fmt\"\n\t\"sort\"\n\t\"strings\"\n\n\t\"github.com/pkg/errors\"\n)\n\n/* Lexeme Sentence */\ntype Lexem"
  },
  {
    "path": "sets.go",
    "chars": 570,
    "preview": "package lingo\n\nimport (\n\t\"bytes\"\n\t\"fmt\"\n)\n\n/* TAG SET */\n\n// TagSet is a set of all the POSTags\ntype TagSet [MAXTAG]bool"
  },
  {
    "path": "shape.go",
    "chars": 899,
    "preview": "package lingo\n\nimport (\n\t\"bytes\"\n\t\"unicode\"\n)\n\n// Shape represents the shape of a word. It's currently implemented as a "
  },
  {
    "path": "stopwords.go",
    "chars": 2417,
    "preview": "package lingo\n\nimport \"strings\"\n\nconst sw = `a about above across after afterwards again against all almost alone along "
  },
  {
    "path": "treebank/const_postag_stanford.go",
    "chars": 1364,
    "preview": "// +build stanfordtags\n\npackage treebank\n\nimport \"github.com/chewxy/lingo\"\n\nvar posTagTable map[string]lingo.POSTag = ma"
  },
  {
    "path": "treebank/const_postag_universal.go",
    "chars": 600,
    "preview": "// +build !stanfordtags\n\npackage treebank\n\nimport \"github.com/chewxy/lingo\"\n\nvar posTagTable map[string]lingo.POSTag = m"
  },
  {
    "path": "treebank/const_rel_stanford.go",
    "chars": 2261,
    "preview": "// +build stanfordrel\n\npackage treebank\n\nimport \"github.com/chewxy/lingo\"\n\nvar dependencyTable map[string]lingo.Dependen"
  },
  {
    "path": "treebank/const_rel_universal.go",
    "chars": 1819,
    "preview": "// +build !stanfordrel\n\npackage treebank\n\nimport \"github.com/chewxy/lingo\"\n\nvar dependencyTable map[string]lingo.Depende"
  },
  {
    "path": "treebank/sentenceTag.go",
    "chars": 2083,
    "preview": "package treebank\n\nimport (\n\t\"math/rand\"\n\n\t\"github.com/chewxy/lingo\"\n)\n\n// SentenceTag is a struc that holds a sentence, "
  },
  {
    "path": "treebank/sentenceTag_test.go",
    "chars": 425,
    "preview": "package treebank\n\nimport (\n\t\"strings\"\n\t\"testing\"\n\n\t\"github.com/stretchr/testify/assert\"\n)\n\nfunc TestSentenceTag(t *testi"
  },
  {
    "path": "treebank/treebank.go",
    "chars": 2654,
    "preview": "package treebank\n\nimport (\n\t\"archive/zip\"\n\t\"io\"\n\t\"log\"\n\n\t\"github.com/chewxy/lingo\"\n\n\t\"bufio\"\n\t\"os\"\n\t\"strconv\"\n\t\"strings\""
  },
  {
    "path": "treebank/treebank_test.go",
    "chars": 2654,
    "preview": "package treebank\n\nimport (\n\t\"strings\"\n\t\"testing\"\n\n\t\"github.com/chewxy/lingo\"\n\t\"github.com/stretchr/testify/assert\"\n)\n\nco"
  },
  {
    "path": "treebank/util.go",
    "chars": 1099,
    "preview": "package treebank\n\nimport \"github.com/chewxy/lingo\"\n\nvar alreadyLogged map[string]bool = make(map[string]bool)\n\n// TODO :"
  },
  {
    "path": "utils.go",
    "chars": 513,
    "preview": "package lingo\n\nfunc InStringSlice(s string, l []string) bool {\n\tfor _, v := range l {\n\t\tif s == v {\n\t\t\treturn true\n\t\t}\n\t"
  },
  {
    "path": "wordFlags.go",
    "chars": 1260,
    "preview": "package lingo\n\nimport (\n\t\"fmt\"\n\t\"strings\"\n\t\"unicode\"\n)\n\n// WordFlags represent the types a word may be. A word may have "
  }
]

About this extraction

This page contains the full source code of the chewxy/lingo GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 128 files (278.9 KB), approximately 93.4k tokens, and a symbol index with 992 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo