Full Code of bluenote-1577/skani for AI

main b0cc88293e75 cached

88 files

52.5 MB

11.0M tokens

280 symbols

1 requests

Copy disabled (too large) Download .txt

Showing preview only (43,976K chars total). Download the full file to get everything.

Repository: bluenote-1577/skani
Branch: main
Commit: b0cc88293e75
Files: 88
Total size: 52.5 MB

Directory structure:
gitextract_6ays1fb4/

├── .github/
│   └── workflows/
│       └── release.yml
├── .gitignore
├── .gitmodules
├── CHANGELOG.md
├── Cargo.toml
├── LICENSE
├── README.md
├── model_to_src.sh
├── scripts/
│   ├── clustermap_triangle.py
│   └── pre_release.sh
├── skani_matrix.af
├── src/
│   ├── avx2_seeding.rs
│   ├── chain.rs
│   ├── cli.rs
│   ├── cmd_line.rs
│   ├── dist.rs
│   ├── file_io.rs
│   ├── lib.rs
│   ├── main.rs
│   ├── model.rs
│   ├── params.rs
│   ├── parse.rs
│   ├── regression.rs
│   ├── screen.rs
│   ├── search.rs
│   ├── seeding.rs
│   ├── sketch.rs
│   ├── sketch_db.rs
│   ├── triangle.rs
│   └── types.rs
├── test_files/
│   ├── GCF_005706655.1_ASM570665v1_genomic.fna
│   ├── GCF_005844845.1_ASM584484v1_genomic.fna
│   ├── MN-03.fa
│   ├── all_ns.fa
│   ├── e.coli-EC590.fasta
│   ├── e.coli-K12.fasta
│   ├── e.coli-W.fasta
│   ├── e.coli-h5.fasta
│   ├── e.coli-o157.fasta
│   ├── e.coli-o157.fasta.sketch
│   ├── empty_fasta.fa
│   ├── list.txt
│   ├── o157_plasmid.fasta
│   ├── o157_reads.fastq
│   ├── query_list.txt
│   ├── skani_matrix.af
│   ├── test.fasta
│   └── viruses.fna
├── test_results_versions/
│   ├── 0.2.1
│   ├── 0.2.2
│   ├── 0.3.0
│   └── v0.2.1
└── tests/
    ├── int_test_new.rs
    ├── integration_test.rs
    ├── results/
    │   ├── output
    │   ├── output.af
    │   ├── output_o_triangle_full
    │   ├── output_o_triangle_full.af
    │   ├── test_dist_file.txt
    │   ├── test_sketch_dir/
    │   │   ├── e.coli-EC590.fasta.sketch
    │   │   ├── e.coli-K12.fasta.sketch
    │   │   ├── e.coli-W.fasta.gz.sketch
    │   │   ├── e.coli-W.fasta.sketch
    │   │   ├── e.coli-h5.fasta.sketch
    │   │   ├── e.coli-o157.fasta.sketch
    │   │   ├── o157_plasmid.fasta.sketch
    │   │   └── o157_reads.fastq.sketch
    │   ├── test_sketch_dir1/
    │   │   ├── e.coli-EC590.fasta.sketch
    │   │   ├── e.coli-K12.fasta.sketch
    │   │   ├── e.coli-W.fasta.gz.sketch
    │   │   └── o157_reads.fastq.sketch
    │   ├── test_sketch_dir3/
    │   │   ├── e.coli-EC590.fasta.sketch
    │   │   ├── e.coli-K12.fasta.sketch
    │   │   ├── e.coli-W.fasta.gz.sketch
    │   │   ├── e.coli-W.fasta.sketch
    │   │   ├── e.coli-h5.fasta.sketch
    │   │   ├── e.coli-o157.fasta.sketch
    │   │   ├── o157_plasmid.fasta.sketch
    │   │   └── o157_reads.fastq.sketch
    │   └── test_sketch_dir_aai/
    │       ├── e.coli-EC590.fasta.sketch
    │       ├── e.coli-K12.fasta.sketch
    │       ├── e.coli-W.fasta.gz.sketch
    │       ├── e.coli-W.fasta.sketch
    │       ├── e.coli-h5.fasta.sketch
    │       ├── e.coli-o157.fasta.sketch
    │       ├── o157_plasmid.fasta.sketch
    │       └── o157_reads.fastq.sketch
    └── tests.rs

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/workflows/release.yml
================================================
name: "tagged-release"

on:
  workflow_dispatch:
  push:
    tags:
      - "v*"

jobs:
  tagged-release:
    name: "Tagged Release"
    runs-on: "ubuntu-latest"

    steps:
      - uses: actions/checkout@v2
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
      - run: sudo apt-get install musl musl-tools; rustup target add x86_64-unknown-linux-musl; cargo build --release --target=x86_64-unknown-linux-musl
      - uses: "marvinpinto/action-automatic-releases@latest"
        with:
          repo_token: "${{ secrets.GITHUB_TOKEN }}"
          prerelease: false
          automatic_release_tag: "latest"
          files: |
            target/x86_64-unknown-linux-musl/release/skani


================================================
FILE: .gitignore
================================================
/target
/refs
callgrind*
CLAUDE.md


================================================
FILE: .gitmodules
================================================
[submodule "skani-mummer-train"]
	path = skani-mummer-train
	url = https://github.com/bluenote-1577/skani-mummer-train


================================================
FILE: CHANGELOG.md
================================================
### v0.3.1 - 2025-10-11

### Minor
* Fixed `--both-min-af` bug in `skani search`

### v0.3.0 released - 2025-08 (Breaking changes)

#### Major

* Engineering changes: skani takes ~30-40% less memory than before but is ~5-10% slower. Results should still be identical to before. 
* BREAKING: Changed sketching output. Now all sketches are concatenated into a searchable database file by default. Original behavior can be resotored via `--separate-files`. 
* New command line options `--both-min-af` and `--short-header` 

#### Minor

* Refactored the commandline backend. Small deviations in commandline behavior may be present, hopefully not big bugs were introduced. 

### v0.2.2 released - 2024-07-04

#### Major

* added the `--small-genomes` preset. This is just an alias for `-c 30 -m 200 --faster-small`. This makes skani much faster when comparing hundreds of thousands of small genomes. 

#### Minor

* fixed a bug where `skani triangle --full-matrix` gave different results between STDOUT and `-o` (thanks to Florian Plaza Onate)
* added a `--diagonal` option (suggested by Antonio Camargo) to print diagonal entries for sparse and lower-triangular distance matrices
* added a warning to use `--faster-small` when comparing too many contigs (e.g. viruses, plasmids). 

### v0.2.1 released - 2023-10-11

More consistent support for small contigs and sequences. 

#### Major

* --faster-small option included in dist and triangle. 

Genomes (and contigs with the --i, --ri, --qi options) with less than 20 marker k-mers are not screened according to the -s option. This makes skani for senative for small sequences, but can hamper performance on very large datasets with lots of small genomes/contigs. 

This heuristic can now be disabled with the `--faster-small` option. 

#### Minor

* skani's version is now displayed properly
* Added some error messages for degenerate cases (and more testing)
* We found that the statically built binary can be a lot slower in certain cases. File i/o may be an issue for the binary version. A note is now added in the README.

### v0.2.0 released - 2023-09-26

#### BREAKING

* --learned-ani feature was buggy before and now removed. 

#### Major

* Major bug found: debiasing for ANI was turned off if there were > 5000 queries present in skani search and skani dist. This bug is fixed now. 

#### Minor

* The rust API is changing in this version. Not published to Cargo yet (waiting on https://github.com/DDOtten/partitions/pull/3 to be published to crates...)

### v0.1.5 released - 2023-09-01

#### Major

Improved "N" character support: 

* changed query-reference selection method slightly via a slight hack, using marker seeds to estimate reference length instead. This makes it so NNN characters are not counted. 
* Now seeds with "N" characters present are no longer indexed. 

#### Minor
* --robust now uses the learned ANI debiasing procedure by default. 

### v0.1.4 released - 2023-06-14

#### Major
* skani triangle had a bug where if more than 5000 queries were present and --sparse or -E was not specified, the intermediate batch of 5000 queries would be written in sparse mode. 
* skani triangle -o was giving different upper triangle matrix instead of lower triangle (skani triangle > res gives lower triangle). Matrices are consistently lower triangle now.
* Changed to lto = true for release mode. I see anywhere from a 5-10% speedup for this.

#### Minor
* Changed some dependencies so no more dependencies on old crates that will deprecate. 

### v0.1.3 released - 2023-05-09 

#### Major
* Fixed a bug where memory was blowing up in `dist` and `triangle` when the marker-index was activated. For big datasets, there could be > 100 GBs of wasted memory. 
* skani now outputs intermediate results after processing each batch of 5000 queries. **This will mean that outputs may no longer be deterministically ordered if there are > 5000 genomes**, but you can sort the output file to get deterministic outputs, i.e ``skani triangle *.fa | sort -k 3 -n > sorted_skani_result.txt`` will guarantee deterministic output order. 

#### Minor 
* Changed the marker index hash table population method. Used to overestimate memory usage slightly.
* New help message for marker parameters. Turns out that for small genomes, having more markers may make filtering significantly better. 
* Added -i option to sketch so you can sketch individual records in multifastas -- does not work for search yet though, only for sketching. 

### v0.1.2 released - 2023-04-28.

Small fixes.

* Added `--medium` pre-set, which is just `-c 70`. Seems to work okay for comparing fragmented genomes. 
* **BREAKING**: Changed `--marker-index` to `--no-marker-index` as a more sane option. 
* Added `--distance` option to `skani triangle` to output distance matrix (i.e. 100 - ANI) instead of similarity matrix. 
* Misc. help message fixes

### v0.1.1 released - 2023-04-09. 

Small fixes.

* Made aligned fraction in `triangle mode` a full matrix by default. This is not a symmetric matrix since AF is not symmetric. 
* Misc. help message fixes 

### v0.1.0 released - 2023-02-07. 

We added new experiments on the revised version of our preprint (Extended Data Figs 11-14). We show skani has quite good AF correlation with MUMmer, and that it works decently on simple eukaryotic MAGs, especially with the `--slow` option (see below). 

#### Major

* **ANI debiasing added** - skani now uses a debiasing step with a regression model trained on MAGs to give more accurate ANIs. Old version gave robust, but slightly overestimated ANIs, especially around 95-97% range. Debiasing is enabled by default, but can be turned off with ``--no-learned-ani``.
* **More accurate aligned fraction** - chaining algorithm changed to give a more accurate aligned fraction (AF) estimate. The previous version had more variance and underestimated AF for certain assemblies.

#### Minor

* **Small contig/genome defaults made better** - should be more sensitive so that they don't get filtered by default.
* **Repetitive k-mer masking made better** - smarter settings and should work better for eukaryotic genomes; shouldn't affect prokaryotic genomes much.
* **`--fast` and `--slow` mode added** - alias for `-c 200` and `-c 30` respectively.
* **More non x86_64 builds should work** - there was a bug before where skani would be dysfunctional on non x86_64 architectures. It seems to at least build on ARM64 architectures successfully now.


================================================
FILE: Cargo.toml
================================================
[package]
name = "skani"
###Make sure to change version in main.rs after changing cargo.toml
version = "0.3.1"
####
edition = "2021"
license = "MIT OR Apache-2.0"
description = "skani is a fast tool for calculating ANI between metagenomic sequences, such as metagenome-assembled genomes (MAGs). It is extremely fast and is robust against incompleteness and fragmentation, giving accurate ANI estimates."
homepage = "https://github.com/bluenote-1577/skani"
documentation = "https://github.com/bluenote-1577/skani"
repository = "https://github.com/bluenote-1577/skani"
readme = "README.md"

exclude = [
    "test_files/*",
    "videos/*",
]

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
thiserror = "1.0"
bio = "1"
rand = "0.8.3"
fxhash = "0"
partitions = { git = "https://github.com/bluenote-1577/partitions.git" }
#partitions = "0.2"
num-traits = "0"
needletail = "0.5"
simple-logging= "2"
log = "0.4"
rayon = "1.5"
smallvec = { version = "1", features = ["union","serde","write"] }
serde = "1"
bincode = "1"
intervallum = "1"
rust-lapper = "1"
gcollections = "1"
fastrand="1"
gbdt = "0"
serde_json = "1"
statrs = "0"
memmap2 = "0.9"

[dependencies.clap]
version = "3"
features = ["derive"]
optional = true

[target.'cfg(target_env = "musl")'.dependencies]
tikv-jemallocator = "0.5.4"

[dev-dependencies]
assert_cmd = "1.0.1"
predicates = "1"
serial_test = "0"
tsv = "0.1.1"
reflection = "0"

[features]
default = ["cli"]
cli = ["clap"]

[[bin]]
name = "skani"
path = "src/main.rs"
required-features = ["cli"]


[profile.release]
panic = "abort"
lto = true

[profile.dev]
#opt-level = 1
opt-level = 3

#[rust]
#debuginfo-level = 1


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2022 Jim Shaw

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# skani - accurate, fast nucleotide identity calculation for MAGs, genomes, and databases

## Introduction

**skani** is a program for calculating **average nucleotide identity** (ANI) and **aligned fraction** (AF) for DNA sequences (contigs/MAGs/genomes) and ANI > ~80%.

skani uses an approximate mapping method without base-level alignment to get ANI. It is magnitudes faster than BLAST-based methods and almost as accurate. skani offers:

1. **Accurate ANI calculations for MAGs**. skani is accurate for incomplete and medium-quality metagenome-assembled genomes (MAGs). Pure sketching methods (e.g. Mash) may underestimate ANI for incomplete MAGs.

2. **Aligned fraction results**. skani outputs the fraction of genome aligned. 

3. **Fast computations**. Indexing/sketching is ~ 3x faster than Mash, and querying is about 25x faster than FastANI (but slower than Mash). 

4. **Efficient database search**. Querying a genome against a preprocessed database of >65000 prokaryotic genomes takes seconds with a single processor and ~6 GB of RAM. Constructing a database from genome sequences takes minutes to an hour.

##  Updates

> [!IMPORTANT]
> 
> Skani v0.3.x is now released. v0.3 has breaking changes compared to versions <= 0.2.x. 

### v0.3.0 - 2025-08-10

* BREAKING: old `.sketch` files no longer work.
* New `skani sketch` functionality. Creates a single database instead of individual `.sketch` files by default. The previous behaviour can be obtained via `--separate-sketches` option.
* Skani should now take 30-40% less memory, but 5-10% longer runtimes.

See the [CHANGELOG](https://github.com/bluenote-1577/skani/blob/main/CHANGELOG.md) for the skani's full versioning history. 

##  Install

#### Option 1: Build from source

Requirements:
1. [rust](https://www.rust-lang.org/tools/install) programming language and associated tools such as cargo are required and assumed to be in PATH.
2. A c compiler (e.g. GCC)
3. make

Building takes a few minutes (depending on # of cores).

```sh
git clone https://github.com/bluenote-1577/skani
cd skani

# If default rust install directory is ~/.cargo
cargo install --path . --root ~/.cargo
skani dist refs/e.coli-EC590.fasta refs/e.coli-K12.fasta

# If ~/.cargo doesn't exist use below commands instead
#cargo build --release
#./target/release/skani dist refs/e.coli-EC590.fasta refs/e.coli-K12.fasta
```

See the [Releases](https://github.com/bluenote-1577/skani/releases) page for obtaining specific versions of skani.

#### Option 2: Conda (source version: 0.3)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/skani/badges/version.svg)](https://anaconda.org/bioconda/skani)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/skani/badges/latest_release_date.svg)](https://anaconda.org/bioconda/skani)
```sh
conda install -c bioconda skani
```

#### Option 3: Pre-built x86-64 linux statically compiled executable

We offer a pre-built statically compiled executable for x86-64 Linux systems. That is, if you're on an x86-64 Linux system, you can just download the binary and run it without installing anything. 

For using the latest version of skani: 

```sh
wget https://github.com/bluenote-1577/skani/releases/download/latest/skani
chmod +x skani
./skani -h
```

**Important**: the binary runs slightly slower (3-10%) most of the time, but it can be drastically slower on some tasks. 

## Quick start

```sh
# compare two genomes for ANI. skani is symmetric, so order does not affect ANI
skani dist genome1.fa genome2.fa 
skani dist genome2.fa genome1.fa 

# compare multiple genomes; all options take -t for multi-threading.
skani dist -t 3 -q query1.fa query2.fa -r reference1.fa reference2.fa -o all-to-all_results.txt

# compare individual fasta records (e.g. contigs)
skani dist --qi -q assembly1.fa --ri -r assembly2.fa  

# construct database and do memory-efficient search
skani sketch genomes_to_search/* -o database
skani search query1.fa query2.fa ... -d database

# construct similarity matrix/edge list for all genomes in folder
skani triangle genome_folder/* > skani_ani_matrix.txt
skani triangle genome_folder/* -E > skani_ani_edge_list.txt

# we provide a script in this repository for clustering/visualizing distance matrices.
# requires python3, seaborn, scipy/numpy, and matplotlib.
python scripts/clustermap_triangle.py skani_ani_matrix.txt 

```

## Tutorials and manuals

### [skani basic usage information](https://github.com/bluenote-1577/skani/wiki/skani-basic-usage-guide)

For more information about using the specific skani subcommands, see the [guide linked above](https://github.com/bluenote-1577/skani/wiki/skani-basic-usage-guide). 

### skani tutorials

1. #### [Tutorial: setting up the GTDB prokaryotic genome database to search against](https://github.com/bluenote-1577/skani/wiki/Tutorial:-setting-up-the-GTDB-genome-database-to-search-against)
2. #### [Tutorial: classifying entire assemblies against > 85,000 genomes in under 2 minutes](https://github.com/bluenote-1577/skani/wiki/Tutorial:-classifying-entire-assemblies-(MAGs-or-contigs)-against-85,000-genomes-in-under-2-minutes)
3. #### [Tutorial: strain-level clustering of MAGs using skani, and why Mash/FastANI have issues](https://github.com/bluenote-1577/skani/wiki/Tutorial:-strain-and-species-level-clustering-of-MAGs-with-skani-triangle)

### [skani cookbook](https://github.com/bluenote-1577/skani/wiki/skani-cookbook)

Some common use cases and parameter settings are outlined in the cookbook. 

### [Pre-sketched databases for searching](https://github.com/bluenote-1577/skani/wiki/Pre%E2%80%90sketched-databases)

Pre-sketched databases can be downloaded and quickly searched against. 

### [skani advanced usage information](https://github.com/bluenote-1577/skani/wiki/skani-advanced-usage-guide)

See the advanced usage guide linked above for more information about topics such as:

* optimizing sensitivity/speed of skani
* optimizing skani for long-reads or contigs
* making skani for memory efficient for huge data sets

## Output

If the resulting aligned fraction for the two genomes is < 15%, no output is given. 

**In practice, this means that only results with > ~82% ANI are reliably output** (with default parameters). See the [skani advanced usage guide](https://github.com/bluenote-1577/skani/wiki/skani-advanced-usage-guide) for information on how to compare lower ANI genomes. 

The default output for `search` and `dist` looks like
```
Ref_file	Query_file	ANI	Align_fraction_ref	Align_fraction_query	Ref_name	Query_name
refs/e.coli-EC590.fasta	refs/e.coli-K12.fasta	99.39	93.95	93.37	NZ_CP016182.2 Escherichia coli strain EC590 chromosome, complete genome	NC_007779.1 Escherichia coli str. K-12 substr. W3110, complete sequence
```
- Ref_file: the filename of the reference.
- Query_file: the filename of the query.
- ANI: the ANI.
- Aligned_fraction_query/reference: fraction of query/reference covered by alignments.
- Ref/Query_name: the id of the first record in the reference/query file.

The order of results is dependent on the command and not guaranteed to be deterministic when > 5000 query genomes are present. `dist` and `search` try to place the highest ANI results first. 

## Citation

Jim Shaw and Yun William Yu. Fast and robust metagenomic sequence comparison through sparse chaining with skani. Nature Methods (2023). https://doi.org/10.1038/s41592-023-02018-3

## Feature requests, issues

skani is actively being developed by me ([Jim Shaw](https://jim-shaw-bluenote.github.io/)). I'm more than happy to accommodate simple feature requests (different types of outputs, etc). Feel free to open an issue with your feature request on the GitHub repository. If you catch any bugs, please open an issue or e-mail me (e-mail on my website). 

## Calling skani from rust or python

### Rust API

If you're interested in using skani as a rust library, check out the minimal example here: https://github.com/bluenote-1577/skani-lib-example. The documentation is currently minimal (https://docs.rs/skani/0.1.0/skani/) and I guarantee no API stability. 

### Python bindings 

If you're interested in calling skani from python, see the [pyskani](https://github.com/althonos/pyskani) python interface and bindings to skani written by [Martin Larralde](https://github.com/althonos). Note: I am not personally involved in the pyskani project and do not offer guarantees on the correctness of the outputs. 


================================================
FILE: model_to_src.sh
================================================
#Models used in the preprint of skani. Trained on only Nayfach et al (2021) GEM data.
MODEL=./skani-mummer-train/v0.1.0-paper-models/0.19525746-3-195-0.06.model
#MODEL=./skani-mummer-train/v0.1.0-paper-models/nayfach_half.model
MODEL_C200=./skani-mummer-train/v0.1.0-paper-models/0.19861849-3-195-0.089999996.model
SRC=src/model.rs

echo $MODEL
echo $MODEL_C200

echo 'pub const MODEL:&str = r#"' > $SRC
cat $MODEL >> $SRC
echo '"#;' >> $SRC
printf '\n' >> $SRC

echo 'pub const MODEL_C200:&str = r#"' >> $SRC
cat $MODEL_C200 >> $SRC
echo '"#;' >> $SRC
printf '\n' >> $SRC


================================================
FILE: scripts/clustermap_triangle.py
================================================
import numpy as np
from collections import defaultdict
import seaborn
import matplotlib.pyplot as plt
import sys
sys.setrecursionlimit(100000)
from scipy.cluster import hierarchy
import scipy
file = sys.argv[1]
if 'mash' in file:
    print("ANI matrix obtained from Mash detected.")
if 'fastani' in file:
    print("ANI matrix obtained from FastANI detected.")


counter = 0
items = 0
labels = []
condensed = []
matrix = []
all_labels = set()
delim = '\t'
#delim = ','
for line in open(file, 'r'):
    if counter == 0:
        #print(line)
        spl = line.split(delim)
        if len(spl) > 2:
            items = len(spl)
        else:
            items = int(line.split(delim)[-1])
        #print(items)
        matrix = [[] for x in range(items)]
        counter += 1
        continue
    if delim in line:
        spl = line.split(delim);
    else:
        spl = line.split();
    #print(spl[0].split('/')[-1])
    labels.append(spl[0].split('/')[-1])
    endpoints = range(1,counter)
    for i in endpoints:
        if 'mash' in file:
            matrix[i-1].append(100 - 100 * float(spl[i]))
        elif 'fastani' in file:
            matrix[i-1].append(float(spl[i]))
        else:
            if float(spl[i]) <= 1:
                matrix[i-1].append(float(spl[i]) * 100)
            else:
                matrix[i-1].append(float(spl[i]))
    counter += 1

for vec in matrix:
    for score in vec:
        condensed.append(100 - score)


cmap = seaborn.cm.rocket_r

#Z = hierarchy.linkage(condensed, 'single')
#Z = hierarchy.linkage(condensed, 'complete')
Z = hierarchy.linkage(condensed, 'average')
square_mat = scipy.spatial.distance.squareform(condensed)
if len(sys.argv) > 2:
    vmax = float(sys.argv[2])
    cg = seaborn.clustermap(square_mat, row_linkage = Z, col_linkage = Z, vmax = vmax, cmap = cmap)
else:
    cg = seaborn.clustermap(square_mat, row_linkage = Z, col_linkage = Z, cmap = cmap)

 
#print(cg.dendrogram_row.reordered_ind)
re = [labels[x] for x in cg.dendrogram_row.reordered_ind]

if len(labels) < 50:
    xticks = [x for x in range(len(labels))]
    cg.ax_heatmap.set_xticks(xticks)
    cg.ax_heatmap.set_xticklabels(re, rotation=90)
#cg.ax_heatmap.set_yticklabels(re, rotation=0)
plt.tight_layout()
plt.show()


================================================
FILE: scripts/pre_release.sh
================================================
#!/bin/bash

# Define the expected version
EXPECTED_VERSION="0.3.0"

# Function to extract the version from Cargo.toml
get_cargo_version() {
      grep -m1 "^version" Cargo.toml | sed -E 's/version\s*=\s*"([^"]+)"/\1/'
}

get_main_version() {
      grep 'pub const VERSION' src/params.rs | sed -E 's/pub const VERSION\s*:\s*&str\s*=\s*"([^"]+)";/\1/'
}

# Check if Cargo.toml has the expected version
CARGO_VERSION=$(get_cargo_version)
if [ "$CARGO_VERSION" == "$EXPECTED_VERSION" ]; then
  echo "Cargo.toml has the correct version: $CARGO_VERSION."
else
  echo "Error: Cargo.toml version ($CARGO_VERSION) does not match expected version ($EXPECTED_VERSION)."
  exit 1
fi

# Check if main.rs has the expected version
MAIN_VERSION=$(get_main_version)
if [ "$MAIN_VERSION" == "$EXPECTED_VERSION" ]; then
  echo "main.rs has the correct version: $MAIN_VERSION."
else
  echo "Error: main.rs version ($MAIN_VERSION) does not match expected version ($EXPECTED_VERSION)."
  exit 1
fi

# Run the cargo test command
echo "Running tests..."
cargo test -j 1 -- --show-output > test_results_versions/$EXPECTED_VERSION

echo "Test results have been saved to test_results_versions/$EXPECTED_VERSION"


================================================
FILE: skani_matrix.af
================================================
4
./test_files/GCF_005706655.1_ASM570665v1_genomic.fna	100.00	0.00	0.00	0.00
./test_files/e.coli-EC590.fasta	0.00	100.00	90.67	48.52
./test_files/e.coli-W.fasta.gz	0.00	85.49	100.00	48.19
./test_files/o157_reads.fastq	0.00	40.64	42.81	100.00


================================================
FILE: src/avx2_seeding.rs
================================================
use std::arch::x86_64::*;
use crate::params::*;
use crate::types::*;

#[inline]
#[target_feature(enable = "avx2")]
pub unsafe fn mm_hash256(kmer: __m256i) -> __m256i {
    let mut key = kmer;
    let s1 = _mm256_slli_epi64(key, 21);
    key = _mm256_add_epi64(key, s1);
    key = _mm256_xor_si256(key, _mm256_cmpeq_epi64(key, key));

    key = _mm256_xor_si256(key, _mm256_srli_epi64(key, 24));
    let s2 = _mm256_slli_epi64(key, 3);
    let s3 = _mm256_slli_epi64(key, 8);

    key = _mm256_add_epi64(key, s2);
    key = _mm256_add_epi64(key, s3);
    key = _mm256_xor_si256(key, _mm256_srli_epi64(key, 14));
    let s4 = _mm256_slli_epi64(key, 2);
    let s5 = _mm256_slli_epi64(key, 4);
    key = _mm256_add_epi64(key, s4);
    key = _mm256_add_epi64(key, s5);
    key = _mm256_xor_si256(key, _mm256_srli_epi64(key, 28));

    let s6 = _mm256_slli_epi64(key, 31);
    key = _mm256_add_epi64(key, s6);

    key
}

#[target_feature(enable = "avx2")]
pub unsafe fn avx2_fmh_seeds(
    string: &[u8],
    sketch_params: &SketchParams,
    contig_index: ContigIndex,
    new_sketch: &mut Sketch,
    seed: bool,
) {
    if seed && new_sketch.kmer_seeds_k.is_none() {
        new_sketch.kmer_seeds_k = Some(KmerSeeds::default());
    }
    let marker_k = K_MARKER_DNA;
    let _kmer_seeds_k = &mut new_sketch.kmer_seeds_k;
    let k = sketch_params.k;
    let c = sketch_params.c;
    let marker_c = sketch_params.marker_c;
    let len = (string.len() - marker_k + 1) / 4;
    let string1 = &string[0..len + marker_k - 1];
    let string2 = &string[len..2 * len + marker_k - 1];
    let string3 = &string[2 * len..3 * len + marker_k - 1];
    let string4 = &string[3 * len..4 * len + marker_k - 1];
    if k > marker_k {
        panic!("Value of k > {} for DNA; not allowed.", marker_k);
    }
    if string.len() < 2 * marker_k {
        return;
    }

    let mut rolling_kmer_f_marker = _mm256_set_epi64x(0, 0, 0, 0);
    let mut rolling_kmer_r_marker = _mm256_set_epi64x(0, 0, 0, 0);
    let rev_sub = _mm256_set_epi64x(3, 3, 3, 3);
    for i in 0..marker_k - 1 {
        let ascii_rep_1 = string1[i] as usize;
        let ascii_rep_2 = string2[i] as usize;
        let ascii_rep_3 = string3[i] as usize;
        let ascii_rep_4 = string4[i] as usize;
        let nuc_f1 = BYTE_TO_SEQ[ascii_rep_1] as i64;
        let nuc_f2 = BYTE_TO_SEQ[ascii_rep_2] as i64;
        let nuc_f3 = BYTE_TO_SEQ[ascii_rep_3] as i64;
        let nuc_f4 = BYTE_TO_SEQ[ascii_rep_4] as i64;
        let f_nucs = _mm256_set_epi64x(nuc_f4, nuc_f3, nuc_f2, nuc_f1);
        let r_nucs = _mm256_sub_epi64(rev_sub, f_nucs);

        rolling_kmer_f_marker = _mm256_slli_epi64(rolling_kmer_f_marker, 2);
        rolling_kmer_f_marker = _mm256_or_si256(rolling_kmer_f_marker, f_nucs);

        rolling_kmer_r_marker = _mm256_srli_epi64(rolling_kmer_r_marker, 2);
        let shift_nuc_r = _mm256_slli_epi64(r_nucs, 40);
        rolling_kmer_r_marker = _mm256_or_si256(rolling_kmer_r_marker, shift_nuc_r);
    }

    let seed_mask = (MarkerBits::MAX >> (std::mem::size_of::<MarkerBits>() * 8 - 2 * k)) as i64;
    let mm256_seed_mask = _mm256_set_epi64x(seed_mask, seed_mask, seed_mask, seed_mask);
    let marker_mask =
        (MarkerBits::MAX >> (std::mem::size_of::<MarkerBits>() * 8 - 2 * marker_k)) as i64;
    let rev_marker_mask: u64 = !(3 << (2 * marker_k - 2));
    let rev_marker_mask = i64::from_le_bytes(rev_marker_mask.to_le_bytes());
    //    dbg!(u64::MAX / (c as u64));
    //    dbg!((u64::MAX / (c as u64)) as i64);
    let threshold = i64::MIN + (u64::MAX / (c as u64)) as i64;
    let _threshold_marker = i64::MIN + (u64::MAX / marker_c as u64) as i64;
    let threshold_unsigned = u64::MAX / c as u64;
    let threshold_marker_unsigned = u64::MAX / marker_c as u64;
    let _cmp_thresh = _mm256_set_epi64x(threshold, threshold, threshold, threshold);

    let mm256_marker_mask = _mm256_set_epi64x(marker_mask, marker_mask, marker_mask, marker_mask);
    let mm256_rev_marker_mask = _mm256_set_epi64x(
        rev_marker_mask,
        rev_marker_mask,
        rev_marker_mask,
        rev_marker_mask,
    );

    //dbg!(KmerEnc::print_string(u64::from_le_bytes(_mm256_extract_epi64(rolling_kmer_f_marker,0).to_le_bytes()), 21));

    let mut resume_inds = [0,0,0,0];
    for i in marker_k-1..(len + marker_k - 1) {

        let ascii_rep_1 = string1[i] as usize;
        let ascii_rep_2 = string2[i] as usize;
        let ascii_rep_3 = string3[i] as usize;
        let ascii_rep_4 = string4[i] as usize;

        if ascii_rep_1 == ASCII_N{
            resume_inds[0] = i + marker_k;
        }
        if ascii_rep_2 == ASCII_N{
            resume_inds[1] = i + marker_k;
        }
        if ascii_rep_3 == ASCII_N{
            resume_inds[2] = i + marker_k;
        }
        if ascii_rep_4 == ASCII_N{
            resume_inds[3] = i + marker_k;
        }


        let nuc_f1 = BYTE_TO_SEQ[ascii_rep_1] as i64;
        let nuc_f2 = BYTE_TO_SEQ[ascii_rep_2] as i64;
        let nuc_f3 = BYTE_TO_SEQ[ascii_rep_3] as i64;
        let nuc_f4 = BYTE_TO_SEQ[ascii_rep_4] as i64;
        
        let f_nucs = _mm256_set_epi64x(nuc_f4, nuc_f3, nuc_f2, nuc_f1);
        let r_nucs = _mm256_sub_epi64(rev_sub, f_nucs);

        rolling_kmer_f_marker = _mm256_slli_epi64(rolling_kmer_f_marker, 2);
        rolling_kmer_f_marker = _mm256_or_si256(rolling_kmer_f_marker, f_nucs);
        rolling_kmer_f_marker = _mm256_and_si256(rolling_kmer_f_marker, mm256_marker_mask);
        rolling_kmer_r_marker = _mm256_srli_epi64(rolling_kmer_r_marker, 2);
        let shift_nuc_r = _mm256_slli_epi64(r_nucs, 40);
        rolling_kmer_r_marker = _mm256_and_si256(rolling_kmer_r_marker, mm256_rev_marker_mask);
        rolling_kmer_r_marker = _mm256_or_si256(rolling_kmer_r_marker, shift_nuc_r);

        let rolling_kmer_f_seed = _mm256_and_si256(rolling_kmer_f_marker, mm256_seed_mask);
        let rolling_kmer_r_seed = _mm256_and_si256(rolling_kmer_r_marker, mm256_seed_mask);
        let compare = _mm256_cmpgt_epi64(rolling_kmer_r_seed, rolling_kmer_f_seed);
        let compare_marker = _mm256_cmpgt_epi64(rolling_kmer_r_marker, rolling_kmer_f_marker);
        let canonical_seeds_256 =
            _mm256_blendv_epi8(rolling_kmer_r_seed, rolling_kmer_f_seed, compare);

        let canonical = [
            _mm256_extract_epi64(compare, 0) != 0,
            _mm256_extract_epi64(compare, 1) != 0,
            _mm256_extract_epi64(compare, 2) != 0,
            _mm256_extract_epi64(compare, 3) != 0,
        ];

        let canonical_seeds = [
            _mm256_extract_epi64(canonical_seeds_256, 0),
            _mm256_extract_epi64(canonical_seeds_256, 1),
            _mm256_extract_epi64(canonical_seeds_256, 2),
            _mm256_extract_epi64(canonical_seeds_256, 3),
        ];

        let hash_256 = mm_hash256(canonical_seeds_256);
        let v1 = _mm256_extract_epi64(hash_256, 0) as u64;
        let v2 = _mm256_extract_epi64(hash_256, 1) as u64;
        let v3 = _mm256_extract_epi64(hash_256, 2) as u64;
        let v4 = _mm256_extract_epi64(hash_256, 3) as u64;
        //        let threshold_256 = _mm256_cmpgt_epi64(cmp_thresh, hash_256);
        //        let m1 = _mm256_extract_epi64(threshold_256, 0);
        //        let m2 = _mm256_extract_epi64(threshold_256, 1);
        //        let m3 = _mm256_extract_epi64(threshold_256, 2);
        //        let m4 = _mm256_extract_epi64(threshold_256, 3);

        if true {
            //            if m1 !={
            if v1 < threshold_unsigned && resume_inds[0] <= i {
                const IND: i32 = 0;
                new_sketch.add_seed_position(
                    canonical_seeds[IND as usize] as SeedBits,
                    SeedPosition::new(
                        i as GnPosition,
                        contig_index,
                        canonical[IND as usize],
                    )
                );
                let canonical_marker = _mm256_extract_epi64(compare_marker, IND) != 0;
                let canonical_kmer_marker;
                if canonical_marker {
                    canonical_kmer_marker = _mm256_extract_epi64(rolling_kmer_f_marker, IND);
                } else {
                    canonical_kmer_marker = _mm256_extract_epi64(rolling_kmer_r_marker, IND);
                };
                //                if _mm256_extract_epi64(hash_256, IND) < threshold_marker {
                if v1 < threshold_marker_unsigned {
                    new_sketch.marker_seeds.insert(canonical_kmer_marker as u64);
                }
            }
            //            if m2 != 0 {
            if v2 < threshold_unsigned && resume_inds[1] <= i {
                const IND: i32 = 1;
                new_sketch.add_seed_position(
                    canonical_seeds[IND as usize] as SeedBits,
                    SeedPosition::new(
                        i as GnPosition + (len as i32 * IND) as GnPosition,
                        contig_index,
                        canonical[IND as usize],
                    )
                );
                let canonical_marker = _mm256_extract_epi64(compare_marker, IND) != 0;
                let canonical_kmer_marker;
                if canonical_marker {
                    canonical_kmer_marker = _mm256_extract_epi64(rolling_kmer_f_marker, IND);
                } else {
                    canonical_kmer_marker = _mm256_extract_epi64(rolling_kmer_r_marker, IND);
                };
                //                if _mm256_extract_epi64(hash_256, IND) < threshold_marker {
                if v2 < threshold_marker_unsigned {
                    new_sketch.marker_seeds.insert(canonical_kmer_marker as u64);
                }
            }
            //            if m3 != 0 {
            if v3 < threshold_unsigned && resume_inds[2] <= i{
                const IND: i32 = 2;
                new_sketch.add_seed_position(
                    canonical_seeds[IND as usize] as SeedBits,
                    SeedPosition::new(
                        i as GnPosition + (len as i32 * IND) as GnPosition,
                        contig_index,
                        canonical[IND as usize],
                    )
                );
                let canonical_marker = _mm256_extract_epi64(compare_marker, IND) != 0;
                let canonical_kmer_marker;
                if canonical_marker {
                    canonical_kmer_marker = _mm256_extract_epi64(rolling_kmer_f_marker, IND);
                } else {
                    canonical_kmer_marker = _mm256_extract_epi64(rolling_kmer_r_marker, IND);
                };
                //                if _mm256_extract_epi64(hash_256, IND) < threshold_marker {
                if v3 < threshold_marker_unsigned {
                    new_sketch.marker_seeds.insert(canonical_kmer_marker as u64);
                }
            }
            //            if m4 != 0 {
            if v4 < threshold_unsigned && resume_inds[3] <= i{
                const IND: i32 = 3;
                new_sketch.add_seed_position(
                    canonical_seeds[IND as usize] as SeedBits,
                    SeedPosition::new(
                        i as GnPosition + (len as i32 * IND) as GnPosition,
                        contig_index,
                        canonical[IND as usize],
                    )
                );
                let canonical_marker = _mm256_extract_epi64(compare_marker, IND) != 0;
                let canonical_kmer_marker;
                if canonical_marker {
                    canonical_kmer_marker = _mm256_extract_epi64(rolling_kmer_f_marker, IND);
                } else {
                    canonical_kmer_marker = _mm256_extract_epi64(rolling_kmer_r_marker, IND);
                };
                //                if _mm256_extract_epi64(hash_256, IND) < threshold_marker {
                if v4 < threshold_marker_unsigned {
                    new_sketch.marker_seeds.insert(canonical_kmer_marker as u64);
                }
            }
        }
    }
}


================================================
FILE: src/chain.rs
================================================
use crate::params::*;
use gbdt::gradient_boost::GBDT;
use crate::types::*;
use bio::data_structures::interval_tree::IntervalTree;
use crate::regression;

use fxhash::FxHashMap;
use log::*;
use partitions::*;
use std::mem;
extern crate interval;
use gcollections::ops::set::*;
use interval::interval_set::*;

fn switch_qr(med_ctg_len_r: f64, med_ctg_len_q: f64, q_sk_len: f64,r_sk_len: f64, query_file_name: &str, ref_file_name: &str)-> bool{
    let score_query = q_sk_len
        * (f64::min(med_ctg_len_q, 300000.));
    let score_ref = r_sk_len
        * (f64::min(med_ctg_len_r, 300000.));
    if score_query == score_ref{
        query_file_name > ref_file_name
    }
    else{
        score_query > score_ref
    }
}

fn mean(data: &[f64]) -> Option<f64> {
    let sum = data.iter().sum::<f64>() as f64;
    let count = data.len() as f64;
    if data.len() > 0{
        return Some(sum/count);
    }
    else{
        return None;
    }
}

fn std_deviation(data: &[f64]) -> f64 {
    let count = data.len();
    let data_mean = mean(&data);
    if data_mean.is_none(){
        return 0.
    }
    else{
        let data_mean = data_mean.unwrap();
        let variance = data.iter().map(|value| {
            let diff = data_mean - (*value);

            diff * diff
        }).sum::<f64>() / count as f64;

        variance.sqrt()
    }
}

fn bootstrap_interval(ani_ests: &Vec<(f64,usize)>) -> (f64,f64,f64){
    let ani_est_no_mult = ani_ests.iter().map(|x| x.0).collect::<Vec<f64>>();
    let std = std_deviation(&ani_est_no_mult);
    let mut res = vec![];
    let mut mult_ani_ests = vec![];
    fastrand::seed(7);
    let num_samp = ani_ests.len();
    //Return no confidence interval if number of samples is too small. 
    if num_samp < 10 {
        return (0.,1., std);
    }
    for (ani,mult) in ani_ests.iter(){
        for _ in 0..*mult{
            mult_ani_ests.push(ani);
        }
    }
    let iters = 100;
    for _ in 0..iters{
        let mut rand_vec = vec![];
        rand_vec.reserve(num_samp);
        for _ in 0..num_samp{
            rand_vec.push(fastrand::usize(..mult_ani_ests.len()));
        }
        let sum = rand_vec.into_iter().map(|x| mult_ani_ests[x]).sum::<f64>();
        res.push(sum/(num_samp as f64));
    }
    res.sort_by(|x,y| x.partial_cmp(y).unwrap());
    (res[iters * 5 / 100 - 1],res[iters * 95 / 100 - 1], std)

}

pub fn map_params_from_sketch <'a>(
    ref_sketch: &Sketch,
    amino_acid: bool,
    command_params: &CommandParams,
    model_opt: &'a Option<GBDT>
) -> MapParams<'a> {
    let max_gap_length = if amino_acid{D_MAX_GAP_LENGTH_AAI} else {D_MAX_GAP_LENGTH};
    let anchor_score = if amino_acid{D_ANCHOR_SCORE_AAI} else {D_ANCHOR_SCORE_ANI};
    let min_anchors = if amino_acid{D_MIN_ANCHORS_AAI} else {D_MIN_ANCHORS_ANI};
    let min_length_cover = if amino_acid{MIN_LENGTH_COVER_AAI} else {MIN_LENGTH_COVER};
    let fragment_length = fragment_length_formula(ref_sketch.total_sequence_length, amino_acid);
    let length_cutoff = fragment_length;
    let mut frac_cover_cutoff = command_params.min_aligned_frac;
    if frac_cover_cutoff < 0.{
        if amino_acid {
            frac_cover_cutoff = D_FRAC_COVER_CUTOFF_AA.parse::<f64>().unwrap()/100.;
        } else {
            frac_cover_cutoff = D_FRAC_COVER_CUTOFF.parse::<f64>().unwrap()/100.;
        }
    }

    let both_frac_cover_cutoff = command_params.both_min_aligned_frac;
    let length_cover_cutoff = 5000000;
    let bp_chain_band = if amino_acid {BP_CHAIN_BAND_AAI} else {BP_CHAIN_BAND};
    let index_chain_band = bp_chain_band/ref_sketch.c;
    let min_score = min_anchors as f64 * anchor_score * 0.75;
//    let min_score = 0.;
    let k = ref_sketch.k;
    let model;
    if let Some(m) = model_opt{
        model = Some(m);
    }
    else{
        model = None
    }
    MapParams {
        fragment_length,
        max_gap_length,
        anchor_score,
        min_anchors,
        length_cutoff,
        frac_cover_cutoff,
        both_frac_cover_cutoff,
        length_cover_cutoff,
        index_chain_band,
        k,
        amino_acid,
        min_score,
        robust: command_params.robust,
        median: command_params.median,
        bp_chain_band,
        min_length_cover,
        model
    }
}

pub fn chain_seeds(
    ref_sketch: &Sketch,
    query_sketch: &Sketch,
    map_params: MapParams,
) -> AniEstResult {
    let (anchor_chunks, switched) = get_anchors(ref_sketch, query_sketch, &map_params);
    let chain_results = chain_anchors_ani(&anchor_chunks, &map_params);
    let mut good_intervals = vec![];
    for i in 0..anchor_chunks.chunks.len() {
        let chain_result = &chain_results[i];
        let anchors = &anchor_chunks.chunks[i];
        get_chain_intervals(&mut good_intervals, chain_result, anchors, &map_params, i);
    }
    let good_interval_chunks =
        get_nonoverlapping_chains(&mut good_intervals, anchor_chunks.chunks.len());
    let mut ani = calculate_ani(
        &good_interval_chunks,
        ref_sketch,
        query_sketch,
        &anchor_chunks,
        &map_params,
        switched,
    );
    if let Some(model) = map_params.model{
        regression::predict_from_ani_res(&mut ani, model);
    }
    ani
}

fn calculate_ani(
    int_chunks: &Vec<Vec<ChainInterval>>,
    ref_sketch: &Sketch,
    query_sketch: &Sketch,
    anchor_chunks: &AnchorChunks,
    map_params: &MapParams,
    switched: bool,
) -> AniEstResult {
    let k = map_params.k;
    let mut ani_ests = vec![];
    let c = ref_sketch.c as GnPosition;
    let sensitive_af;
    if c < 200{
        sensitive_af = true;
    }
    else{
        sensitive_af = false;
    }
    let mut _num_good_chunks = 0;
    let mut _all_anchors_total = 0;
    let mut total_query_bases = 0;
    let mut total_ref_range = 0;
    let mut leftmost_interval = &ChainInterval::default();
    let mut rightmost_interval = &ChainInterval::default();
    let mut avg_chain_int_len = 0;
    let mut num_chains = 0;
    for (i, intervals) in int_chunks.iter().enumerate() {
        let mut all_intervals = vec![].to_interval_set();
        let mut total_anchors = 0;
        let mut total_bases_contained_query = 0;
        let mut _total_bases_contained_ref = 0;
        let mut total_range_query = (GnPosition::MAX, GnPosition::MIN);
        let mut total_range_ref = (GnPosition::MAX, GnPosition::MIN);
        for int in intervals {
            total_anchors += int.num_anchors;

            if int.interval_on_query.0 < total_range_query.0 {
                total_range_query.0 = int.interval_on_query.0;
                leftmost_interval = int;
            }
            if int.interval_on_query.1 > total_range_query.1 {
                total_range_query.1 = int.interval_on_query.1;
                rightmost_interval = int;
            }
            if int.interval_on_ref.0 < total_range_ref.0 {
                total_range_ref.0 = int.interval_on_ref.0;
            }
            if int.interval_on_ref.1 > total_range_ref.1 {
                total_range_ref.1 = int.interval_on_ref.1;
            }
            if !switched {
                total_bases_contained_query += int.interval_on_query.1 - int.interval_on_query.0
                    + map_params.k as GnPosition
                    + 2 * c;
                _total_bases_contained_ref += int.interval_on_ref.1 - int.interval_on_ref.0
                    + map_params.k as GnPosition
                    + 2 * c;
            } else {
                total_bases_contained_query += int.interval_on_ref.1 - int.interval_on_ref.0
                    + map_params.k as GnPosition
                    + 2 * c;
                _total_bases_contained_ref += int.interval_on_query.1 - int.interval_on_query.0
                    + map_params.k as GnPosition
                    + 2 * c;
            }

            let start =
                (i32::max(int.interval_on_query.0 as i32 - c as i32, 0)) as u32;
            let stop = int.interval_on_query.1 + c;
            all_intervals = all_intervals.union(&vec![(start, stop)].to_interval_set());
            //interval_vec.insert(int_insert, i);
            if sensitive_af{
                total_query_bases +=  int.query_range_len() - int.overlap + 2 * c + k as GnPosition;
                total_ref_range +=  int.query_range_len() - int.overlap + 2 * c + k as GnPosition;
            }

            avg_chain_int_len += int.query_range_len() - int.overlap + 2 * c + k as GnPosition;
            num_chains += 1;
        }

        if total_anchors == 0{
            continue;
        }

        if  total_range_query.1 - total_range_query.0 < map_params.min_length_cover as GnPosition{
            continue;
        }

        if !sensitive_af{
            total_query_bases += total_range_query.1 - total_range_query.0 + 2 * c + map_params.k as GnPosition;
            total_ref_range += total_range_query.1 - total_range_query.0 + 2 * c + map_params.k as GnPosition;
        }

        let mut num_seeds_in_intervals = 0;
        let mut upper_lower_seeds = 0;
        for pos in anchor_chunks.seeds_in_chunk[i].iter() {
            if all_intervals.contains(pos) {
                num_seeds_in_intervals += 1;
            }
        }

        let mut left_spacing_est =  0;
        let mut right_spacing_est = 0;
        let switched_ref_sketch;
        let switched_query_sketch;
        if switched{
            switched_query_sketch = &ref_sketch;
            switched_ref_sketch = &query_sketch;
        }
        else{
            switched_ref_sketch = &ref_sketch;
            switched_query_sketch = &query_sketch;
        }

        let ref_ctg_len = switched_ref_sketch.contig_lengths[leftmost_interval.ref_contig];
        trace!("switched ref ctg len {},id {}", ref_ctg_len, leftmost_interval.ref_contig);
        let q_ctg_len = switched_query_sketch.contig_lengths[leftmost_interval.query_contig];
        trace!("switched query ctg len {},id {}", q_ctg_len, leftmost_interval.query_contig);


        //TODO this was for testing... don't use this mechanism
        let extend = 0;
        if leftmost_interval.reverse_chain{
            let ref_ctg_len =switched_ref_sketch.contig_lengths[leftmost_interval.ref_contig];
            if ref_ctg_len - leftmost_interval.interval_on_ref.1 < extend{
                left_spacing_est = ref_ctg_len - leftmost_interval.interval_on_ref.1;
            }
        }
        else{
            //Clipped
            if leftmost_interval.interval_on_ref.0 < extend{
                left_spacing_est = leftmost_interval.interval_on_ref.0;
            }
        }
        if rightmost_interval.reverse_chain{
            if rightmost_interval.interval_on_ref.0 < extend{
                right_spacing_est = rightmost_interval.interval_on_ref.0;
            }
        }
        else{
            let ref_ctg_len = switched_ref_sketch.contig_lengths[rightmost_interval.ref_contig];
            trace!("{},{}",ref_ctg_len, rightmost_interval.ref_contig);
//                dbg!(ref_ctg_len, rightmost_interval);
//                dbg!(query_sketch.contig_lengths[leftmost_interval.ref_contig], leftmost_interval);
            if ref_ctg_len - rightmost_interval.interval_on_ref.1 < extend{
                right_spacing_est = ref_ctg_len - rightmost_interval.interval_on_ref.1;
            }
        }
//        dbg!(right_spacing_est, left_spacing_est);
        
        //        let spacing_est = (total_range_query.1 - total_range_query.0
        //            + 2 * ref_sketch.c as GnPosition)
        //            / num_seeds_in_intervals as GnPosition;
        for pos in anchor_chunks.seeds_in_chunk[i].iter() {
            if *pos + left_spacing_est >= total_range_query.0
                && *pos <= total_range_query.1 + right_spacing_est
            {
                upper_lower_seeds += 1;
            }
        }

        let mut anchors_in_chunk_considered = anchor_chunks.seeds_in_chunk[i].len();
        let putative_ani = f64::powf(
//                        total_anchors as f64 / (upper_lower_seeds) as f64,
            total_anchors as f64 / (num_seeds_in_intervals) as f64,
            1. / k as f64,
        );
        if putative_ani > 0.950
//            && total_bases_contained_query > ref_sketch.c as GnPosition * 20
            //&& total_bases_contained_query > c  * 3 * (upper_lower_seeds / total_anchors) as GnPosition
            && total_bases_contained_query > c * 4
            && !map_params.amino_acid
            && total_range_query.1 - total_range_query.0 < (CHUNK_SIZE_DNA * 9 / 10) as GnPosition 
            && anchors_in_chunk_considered as f64 > 1.05 * upper_lower_seeds as f64 
        {
            //                        anchors_in_chunk_considered = num_seeds_in_intervals;
            trace!("putative ani filter {} -> {}", anchors_in_chunk_considered, upper_lower_seeds);
            anchors_in_chunk_considered = upper_lower_seeds;
        }

        let test = false;
        if test{
            if right_spacing_est != 0 || left_spacing_est != 0{
                anchors_in_chunk_considered = upper_lower_seeds;
            }
            else{
                anchors_in_chunk_considered = anchor_chunks.seeds_in_chunk[i].len();
            }
        }

        let ml_hits = if map_params.amino_acid {
            f64::min(
                1.,
                total_anchors as f64 / anchors_in_chunk_considered as f64 * 6.,
            )
        } else {
            f64::min(
                1.,
                total_anchors as f64 / anchors_in_chunk_considered as f64,
            )
        };
        let ani_est = if map_params.amino_acid {
            f64::powf(ml_hits, 1. / k as f64)
        } else {
            f64::powf(ml_hits, 1. / k as f64)
        };

        //        total_bases_contained_query =
        //            total_range_query.1 - total_range_query.0 + map_params.k as GnPosition;
        //        total_bases_contained_ref =
        //            total_range_query.1 - total_range_query.0 + map_params.k as GnPosition;
        //
        //        total_query_range += total_bases_contained_query;
        //        total_ref_range += total_bases_contained_ref;

        //        ani_ests.push((ani_est, anchor_chunks.seeds_in_chunk[i]));
        if map_params.amino_acid {
//            ani_ests.push((ani_est, anchor_chunks.seeds_in_chunk[i].len() / 6));
            ani_ests.push((ani_est, anchor_chunks.seeds_in_chunk[i].len() / 6));
//            ani_ests.push((ani_est, 1));
        } else {
            //ani_ests.push((ani_est, anchor_chunks.seeds_in_chunk[i].len()));
            ani_ests.push((ani_est, anchors_in_chunk_considered));
        }
        //                        ani_ests.push((ani_est, upper_lower_seeds));
        trace!(
            "Ani est fragment {}, total range {:?}, total anchors {}, seeds in fragment {:?},",
            ani_est,
            total_range_query,
            total_anchors,
            anchor_chunks.seeds_in_chunk[i].len(),
        );
        trace!(
            "Intervals {:?}, Num Anchors in Interval {}, Total Anchors {}",
            &intervals,
            intervals[0].num_anchors,
            total_anchors
        );
        _all_anchors_total += total_anchors;
        _num_good_chunks += 1;
    }
    ani_ests.sort_by(|x, y| x.partial_cmp(y).unwrap());

    if ani_ests.is_empty() || num_chains == 0 {
        let mut ret = AniEstResult::default();
        ret.ani = f32::NAN;
        return ret;
    }
    avg_chain_int_len /= num_chains;
    let mut total_multiplicitiy = 0;
    for ani in ani_ests.iter(){
        total_multiplicitiy += ani.1;
    }
    let lower;
    let upper;
    if map_params.median{
        lower = 0.499;
        upper = 0.501;
    } else if map_params.robust{
        lower = 0.10;
        upper = 0.90;
    } else {
        lower = 0.;
        upper = 1.;
    }

//    for ani_est in ani_ests.iter(){
//        println!("{},{}", ani_est.0, ani_est.1);
//    }

    let mut lower_i = 0;
    let mut upper_i = ani_ests.len()-1;
    let mut changed_l = false;
    let mut _changed_u = false;

    let mut curr_sum = 0;
    for (i,ani) in ani_ests.iter().enumerate(){
        curr_sum += ani.1;
        if curr_sum >= ((total_multiplicitiy as f64) * lower) as usize && !changed_l{
            lower_i = i;
            changed_l = true;
        }
        if curr_sum >= ((total_multiplicitiy as f64) * upper) as usize && !_changed_u{
            upper_i = i+1;
            _changed_u = true;
            break;
        }
    }

    let mut total_multiplicitiy = 0;
    let mut weighted_avg = 0.;
    for i in lower_i..upper_i{
        weighted_avg += ani_ests[i].0 * ani_ests[i].1 as f64;
        total_multiplicitiy += ani_ests[i].1;
        //        weighted_avg += ani_ests[i].0 * (ani_ests[i].1 as f64);
        //        total_weight_interval += ani_ests[i].1;
    }
    //    let mut final_ani = weighted_avg / total_weight_interval as f64;
    let mut final_ani = weighted_avg / total_multiplicitiy as f64;

//    let (upper, lower) = z_interval(&ani_ests);
    let ci_std = bootstrap_interval(&ani_ests);
    let ci = (ci_std.0, ci_std.1);
    let std = ci_std.2;
    let covered_query = f64::min(
        1.,
        total_query_bases as f64 / query_sketch.total_sequence_length as f64,
    );
    let covered_ref = f64::min(
        1.,
        total_ref_range as f64 / ref_sketch.total_sequence_length as f64,
    );
    
    let q_string = &query_sketch.file_name;
    let id_string = if map_params.amino_acid { "AAI" } else { "ANI" };
    trace!("Total 1-to-1 aligned bases: {}", total_query_bases);
    //println!("{}",total_query_bases);
    debug!(
        "Query {} Ref {} - {} {}, +/- = {}/{}. ",
        q_string,
        ref_sketch.file_name,
        id_string,
        final_ani,
        ci.0,
        ci.1,
    );

    if map_params.both_frac_cover_cutoff > 0.0 {
        // When --both-min-af is specified, require BOTH genomes to have aligned fraction above threshold
        if covered_query < map_params.both_frac_cover_cutoff || covered_ref < map_params.both_frac_cover_cutoff {
            final_ani = -1.;
        }
    } else {
        // Original behavior: different logic for amino acid vs nucleotide
        if map_params.amino_acid{
            if covered_query < map_params.frac_cover_cutoff  || covered_ref < map_params.frac_cover_cutoff
            {
                final_ani = -1.;
            }
        }
        else if covered_query < map_params.frac_cover_cutoff  && covered_ref < map_params.frac_cover_cutoff
        {
            final_ani = -1.;
        }
    }

    let mut sorted_contigs_q = query_sketch.contig_lengths.clone();
    let mut sorted_contigs_r = ref_sketch.contig_lengths.clone();
    sorted_contigs_q.sort();
    sorted_contigs_r.sort();
    let qs_lens = sorted_contigs_q.len();
    let rs_lens = sorted_contigs_r.len();
    let contig_quants_q = [sorted_contigs_q[qs_lens * 10 / 100 ], sorted_contigs_q[qs_lens* 50 / 100], sorted_contigs_q[qs_lens * 90 / 100]];
    let contig_quants_r = [sorted_contigs_r[rs_lens * 10 / 100 ], sorted_contigs_r[rs_lens* 50 / 100], sorted_contigs_r[rs_lens * 90 / 100]];
//    let mean_contig_len_q = query_sketch.contig_lengths.iter().map(|x| *x as f64).sum::<f64>()
//        /(query_sketch.contig_lengths.len() as f64);
//    let mean_contig_len_r = ref_sketch.contig_lengths.iter().map(|x| *x as f64).sum::<f64>()
//        /(ref_sketch.contig_lengths.len() as f64);

    AniEstResult {
        ani: final_ani as f32,
        align_fraction_query: covered_query as f32,
        align_fraction_ref: covered_ref as f32,
        ref_file: ref_sketch.file_name.clone(),
        query_file: query_sketch.file_name.clone(),
        query_contig: query_sketch.contigs[0].clone(),
        ref_contig: ref_sketch.contigs[0].clone(),
        num_contigs_r: ref_sketch.contigs.len() as u32,
        num_contigs_q: query_sketch.contigs.len() as u32,
        ci_upper: ci.1 as f32,
        ci_lower: ci.0 as f32,
        aai: map_params.amino_acid,
        quant_90_contig_len_q: contig_quants_q[2] as f32,
        quant_90_contig_len_r: contig_quants_r[2] as f32,
        quant_50_contig_len_q: contig_quants_q[1] as f32,
        quant_50_contig_len_r: contig_quants_r[1] as f32,
        quant_10_contig_len_q: contig_quants_q[0] as f32,
        quant_10_contig_len_r: contig_quants_r[0] as f32,
        std: std as f32,
        avg_chain_int_len,
        total_bases_covered: total_query_bases,
    }
}

#[inline]
pub fn score_anchors(anchor_curr: &Anchor, anchor_past: &Anchor, map_params: &MapParams) -> f64 {
    // if anchor_curr.query_phase != anchor_past.query_phase
    //     || anchor_curr.ref_phase != anchor_past.ref_phase
    // {
    //     return f64::MIN;
    // }
    if anchor_curr.reverse_match != anchor_past.reverse_match {
        return f64::MIN;
    }
    if anchor_curr.ref_pos == anchor_past.ref_pos || anchor_curr.query_pos == anchor_past.query_pos
    {
        return f64::MIN;
    }

    let acqpf64 = anchor_curr.query_pos as f64;
    let apqpf64 = anchor_past.query_pos as f64;
    let acrpf64 = anchor_curr.ref_pos as f64;
    let aprpf64 = anchor_past.ref_pos as f64;

    let d_q = (acqpf64 - apqpf64).abs();

    let d_r;
    if anchor_curr.reverse_match {
        d_r = aprpf64 - acrpf64;
    } else {
        d_r = acrpf64 - aprpf64;
    }

    if d_q > D_MAX_LIN_LENGTH || d_r > D_MAX_LIN_LENGTH {
        return f64::MIN;
    }

    if d_r <= 0. {
        return f64::MIN;
    }

    let gap = (d_r - d_q).abs();
    if gap > map_params.max_gap_length {
        return f64::MIN;
    }
    //    let ol_q = f64::max(0., map_params.k as f64 - d_q);
    //    let ol_r = f64::max(0., map_params.k as f64 - d_r);
    //    let ol = f64::max(ol_q, ol_r);
    //    return map_params.anchor_score * (1. - ol / map_params.k as f64) - gap;
    map_params.anchor_score - gap
}




fn get_anchors(
    ref_sketch: &Sketch,
    query_sketch: &Sketch,
    map_params: &MapParams,
) -> (AnchorChunks, bool) {
    let k = map_params.k;
    let kmer_seeds_ref;
    let kmer_seeds_query;
    let mut query_positions_all;
    let switched;
    if ref_sketch.contig_lengths.is_empty() || query_sketch.contig_lengths.is_empty(){
        return (AnchorChunks::default(), true);
    }
//    let score_query = query_sketch.total_sequence_length as f64
//    let score_ref = ref_sketch.total_sequence_length as f64
//        * (ref_sketch.total_sequence_length as f64 / ref_sketch.contigs.len() as f64).ln();
//
    let mut ctgs_q = query_sketch.contig_lengths.iter().collect::<Vec<&GnPosition>>();
    let mean_ctg_len_q = query_sketch.contig_lengths.iter().map(|x| *x as f64).sum::<f64>()
        /(query_sketch.contig_lengths.len() as f64);
    ctgs_q.sort_unstable();
//    let med_ctg_len_q = *ctgs_q[query_sketch.contig_lengths.len()/2] as f64;
    let mean_ctg_len_r = ref_sketch.contig_lengths.iter().map(|x| *x as f64).sum::<f64>()
        /(ref_sketch.contig_lengths.len() as f64);

//    let score_query = (query_sketch.total_sequence_length as f64)
//        * f64::min(med_ctg_len_q, 40000.);
//    let score_ref = (ref_sketch.total_sequence_length as f64)
//        * f64::min(med_ctg_len_r, 40000.);

    let query_length_markers_proxy;
    let ref_length_markers_proxy;

    if query_sketch.total_sequence_length > 100_000 && ref_sketch.total_sequence_length > 100_000{
        query_length_markers_proxy = query_sketch.marker_seeds.len() as f64 * query_sketch.c as f64;
        ref_length_markers_proxy = ref_sketch.marker_seeds.len() as f64 * ref_sketch.c as f64;
    }
    else{
        query_length_markers_proxy = query_sketch.total_sequence_length as f64;
        ref_length_markers_proxy = ref_sketch.total_sequence_length as f64;
    }
    if switch_qr(mean_ctg_len_r,mean_ctg_len_q, query_length_markers_proxy, ref_length_markers_proxy, &query_sketch.file_name, &ref_sketch.file_name){
        switched = true;

        kmer_seeds_ref = query_sketch.kmer_seeds_k.as_ref().unwrap();
        kmer_seeds_query = ref_sketch.kmer_seeds_k.as_ref().unwrap();
        query_positions_all = vec![vec![]; ref_sketch.contigs.len()];
    } else {
        switched = false;

        kmer_seeds_ref = ref_sketch.kmer_seeds_k.as_ref().unwrap();
        kmer_seeds_query = query_sketch.kmer_seeds_k.as_ref().unwrap();
        query_positions_all = vec![vec![]; query_sketch.contigs.len()];
    }
    //    let kmer_seeds_ref = &ref_sketch.kmer_seeds_k[k];
    //    let kmer_seeds_query = &query_sketch.kmer_seeds_k[k];
    let mut anchors = vec![];
    let mut query_kmers_with_hits = 0;
    for (canon_kmer, _query_tagged) in kmer_seeds_query.iter() {
        // Get query positions using the new API
        let query_positions_iter = if switched {
            ref_sketch.get_seed_positions(*canon_kmer)
        } else {
            query_sketch.get_seed_positions(*canon_kmer)
        };
        
        if query_positions_iter.len() > map_params.index_chain_band{
            continue;
        }
        
        // Get query positions using Cow
        let query_positions = query_positions_iter;
        let contains = kmer_seeds_ref.contains_key(canon_kmer);

        if !contains {
            for qpos in query_positions.iter() {
                query_positions_all[qpos.contig_index() as usize].push(qpos.pos);
            }
        } else {
            // Get reference positions using the new API  
            let ref_positions = if switched {
                query_sketch.get_seed_positions(*canon_kmer)
            } else {
                ref_sketch.get_seed_positions(*canon_kmer)
            };

            if ref_positions.len() > map_params.index_chain_band{
                continue;
            }

            for qpos in query_positions.iter() {
                query_positions_all[qpos.contig_index() as usize].push(qpos.pos);
            }

            query_kmers_with_hits += 1;
            for qpos in query_positions.iter() {
                for rpos in ref_positions.iter() {
                    anchors.push(Anchor::new(
                        &(rpos.pos, rpos.contig_index()),
                        &(qpos.pos, qpos.contig_index()),
                        rpos.canonical() != qpos.canonical(),
                    ));
                }
            }
        }
    }
    if anchors.is_empty() {
        debug!(
            "no anchors found for {}, {}",
            &ref_sketch.file_name, &query_sketch.file_name
        );
        return (AnchorChunks::default(), true);
    }
    anchors.sort_unstable();
    for query_position_vec in query_positions_all.iter_mut() {
        query_position_vec.sort_unstable();
    }
    debug!(
        "Ref seeds len {}, Query seeds len {}, Anchors {}, Seeds hit query {}, Est {}, Ref_file {}, Query_file {}",
        kmer_seeds_ref.len(),
        kmer_seeds_query.len(),
        anchors.len(),
        query_kmers_with_hits,
        f64::powf(
            (query_kmers_with_hits as f64) / (kmer_seeds_query.len() as f64),
            1. / (k as f64)
        ),
        ref_sketch.file_name,
        query_sketch.file_name,
    );
    let mut lengths = vec![];
    let mut chunks = vec![];
    let mut curr_anchor_chunk = vec![];
    let mut block_seeds = vec![];
    let smallest_anchor_query_pos = anchors[0].query_pos;
    let mut last_query_contig = anchors[0].query_contig;
    let mut curr_end_point = smallest_anchor_query_pos + map_params.fragment_length as u32;
    let mut running_counter = 0;
    for anchor in anchors {
        if last_query_contig != anchor.query_contig || anchor.query_pos > curr_end_point {
            if query_positions_all[last_query_contig as usize].is_empty() {
                warn!("{}", &query_sketch.contigs[last_query_contig as usize]);
                continue;
            }
            let mut _num_seeds_in_block = 0;
            let mut seed_pos_in_block = vec![];
            let mut first_iter = true;
            loop {
                if running_counter >= query_positions_all[last_query_contig as usize].len() {
                    break;
                }
                if query_positions_all[last_query_contig as usize][running_counter]
                    <= curr_end_point
                {
                    if first_iter {
                        first_iter = false;
                        trace!(
                            "start {}",
                            query_positions_all[last_query_contig as usize][running_counter]
                        );
                    }
                    seed_pos_in_block
                        .push(query_positions_all[last_query_contig as usize][running_counter]);
                    running_counter += 1;
                    _num_seeds_in_block += 1;
                } else {
                    trace!(
                        "end {}",
                        query_positions_all[last_query_contig as usize][running_counter - 1]
                    );
                    break;
                }
            }
            block_seeds.push(seed_pos_in_block);
            curr_end_point += map_params.fragment_length as u32;
            chunks.push(mem::take(&mut curr_anchor_chunk));
            curr_anchor_chunk = vec![];
            lengths.push(map_params.fragment_length as u32);
            if last_query_contig != anchor.query_contig {
                curr_end_point = anchor.query_pos + map_params.fragment_length as u32;
                running_counter = 0;
            }
        }
        last_query_contig = anchor.query_contig;
        curr_anchor_chunk.push(anchor);
    }
    if !curr_anchor_chunk.is_empty() {
        let mut _num_seeds_in_block = 0;
        let mut seed_pos_in_block = vec![];
        loop {
            if query_positions_all[last_query_contig as usize].is_empty() {
                warn!("{}", &query_sketch.contigs[last_query_contig as usize]);
                continue;
            }
            if running_counter >= query_positions_all[last_query_contig as usize].len() {
                break;
            }
            if (query_positions_all[last_query_contig as usize][running_counter]
                <= curr_anchor_chunk[curr_anchor_chunk.len() - 1].query_pos)
                || (last_query_contig
                    != curr_anchor_chunk[curr_anchor_chunk.len() - 1].query_contig)
            {
                seed_pos_in_block
                    .push(query_positions_all[last_query_contig as usize][running_counter]);
                running_counter += 1;
                _num_seeds_in_block += 1;
            } else {
                break;
            }
        }
        lengths.push(
            curr_anchor_chunk[curr_anchor_chunk.len() - 1].query_pos
                - curr_anchor_chunk[0].query_pos,
        );
        chunks.push(mem::take(&mut curr_anchor_chunk));
        block_seeds.push(seed_pos_in_block);
    }

    assert!(block_seeds.len() == chunks.len());

    (
        AnchorChunks {
            chunks,
            lengths,
            seeds_in_chunk: block_seeds,
        },
        switched,
    )
}

fn chain_anchors_ani(anchor_chunks: &AnchorChunks, map_params: &MapParams) -> Vec<ChainingResult> {
    let mut chaining_results = vec![];

    let num_chunks = anchor_chunks.chunks.len();
    let past_chain_length = usize::min(map_params.fragment_length / 2, map_params.bp_chain_band) as u32;

    for anchor_chunk in anchor_chunks.chunks.iter() {
        let mut pointer_vec = vec![0; anchor_chunk.len()];
        let mut score_vec = vec![0.; anchor_chunk.len()];
        let mut chain_part = partition_vec![];

        for i in 0..anchor_chunk.len() {
            chain_part.push(i);
            let anchor_curr = &anchor_chunk[i];
            let mut best_score = 0.;
            let mut best_prev_index = i;
            for j in (0..i).rev() {
                let anchor_past = &anchor_chunk[j];
                if anchor_curr.ref_contig != anchor_past.ref_contig {
                    continue;
                }
                if anchor_curr.query_pos - anchor_past.query_pos > past_chain_length
                    || i - j > map_params.index_chain_band
                {
                    break;
                }
//                if anchor_curr.query_contig != anchor_past.query_contig {
//                    continue;
//                }
                if anchor_curr.ref_contig != anchor_past.ref_contig {
                    continue;
                }

                let anchor_score = score_anchors(anchor_curr, anchor_past, map_params);
                if anchor_score == f64::MIN {
                    continue;
                }
                let new_score = anchor_score + score_vec[j];
                if new_score > best_score {
                    best_score = new_score;
                    best_prev_index = j;
                }
            }
            score_vec[i] = best_score;
            pointer_vec[i] = best_prev_index;
            if best_prev_index != i {
                chain_part.union(i, best_prev_index);
            }
        }

        chaining_results.push(ChainingResult {
            pointer_vec,
            chain_part,
            score_vec,
            num_chunks,
        });
    }
    chaining_results
}

//fn chain_anchors_global(anchors: &Vec<Anchor>) -> ChainingResult {
//    let mut pointer_vec = vec![0; anchors.len()];
//    let mut score_vec = vec![0.; anchors.len()];
//    let mut chain_part = partition_vec![];
//    for i in 0..anchors.len() {
//        chain_part.push(i);
//        let anchor_curr = &anchors[i];
//        let mut best_score = 0.;
//        let mut best_prev_index = i;
//        for j in (0..i).rev() {
//            let anchor_past = &anchors[j];
//            if anchor_curr.query_contig != anchor_past.query_contig {
//                break;
//            }
//            if anchor_curr.query_pos - anchor_past.query_pos > u32::min(FRAGMENT_LENGTH as u32, 2000) {
//                break;
//            }
//            let new_score = score_anchors(anchor_curr, anchor_past) + score_vec[j];
//            if new_score == f64::MIN {
//                continue;
//            }
//            if new_score > best_score {
//                best_score = new_score;
//                best_prev_index = j;
//            }
//        }
//        score_vec[i] = best_score;
//        pointer_vec[i] = best_prev_index;
//        if best_prev_index != i {
//            chain_part.union(i, best_prev_index);
//        }
//    }
//
//    return ChainingResult {
//        pointer_vec,
//        chain_part,
//        score_vec,
//        num_chunks: usize::MAX,
//    };
//}

fn get_chain_intervals(
    good_intervals: &mut Vec<ChainInterval>,
    cr: &ChainingResult,
    anchors: &Vec<Anchor>,
    map_params: &MapParams,
    chunk_id: usize,
) {
    for set in cr.chain_part.all_sets() {
        let mut small_chain = false;
        let mut first_iter = true;
        let mut max_score = f64::MIN;
        let mut best_index = usize::MAX;
        let mut num_anchors = 1;
        for (index, _value) in set {
            if first_iter {
                if cr.chain_part.len_of_set(index) < map_params.min_anchors {
                    small_chain = true;
                    break;
                }
                first_iter = false;
            }
            if cr.score_vec[index] > max_score {
                max_score = cr.score_vec[index];
                best_index = index;
            }
        }
        if small_chain {
            continue;
        }

        let mut index = best_index;
        while cr.pointer_vec[index] != index {
            index = cr.pointer_vec[index];
            num_anchors += 1;
        }
        small_chain = num_anchors < map_params.min_anchors;
        if small_chain || max_score < map_params.min_score {
            continue;
        }
        let smallest_id = index;
        let largest_id = best_index;
        let interval_on_query = (
            anchors[smallest_id].query_pos,
            anchors[largest_id].query_pos,
        );
        let endpoint1 = anchors[smallest_id].ref_pos;
        let endpoint2 = anchors[largest_id].ref_pos;
        assert!(anchors[smallest_id].reverse_match == anchors[largest_id].reverse_match);
        assert!(anchors[smallest_id].ref_contig == anchors[largest_id].ref_contig);
        let interval_on_ref = (
            GnPosition::min(endpoint1, endpoint2),
            GnPosition::max(endpoint1, endpoint2),
        );
        let ref_contig = anchors[smallest_id].ref_contig as usize;
        let query_contig = anchors[smallest_id].query_contig as usize;
        let chain_interval = ChainInterval {
            interval_on_query,
            interval_on_ref,
            ref_contig,
            query_contig,
            score: max_score,
            num_anchors,
            chunk_id,
            reverse_chain: anchors[smallest_id].reverse_match,
            overlap : 0
        };
        good_intervals.push(chain_interval);
    }
}
fn get_nonoverlapping_chains(
    intervals: &mut Vec<ChainInterval>,
    num_chunks: usize,
) -> Vec<Vec<ChainInterval>> {
    intervals.sort_by(|x, y| y.partial_cmp(x).unwrap());
    let mut interval_trees = FxHashMap::default();
    let mut interval_trees_ref = FxHashMap::default();
    let mut good_non_overlap_intervals = vec![vec![]; num_chunks];
    let mut bases_added = 0;
    for (i, int) in intervals.iter().enumerate() {
        let q_interval = (int.interval_on_query.0)..(int.interval_on_query.1);
        let r_interval = (int.interval_on_ref.0)..(int.interval_on_ref.1);
        let interval_tree_r = interval_trees_ref
            .entry(int.ref_contig)
            .or_insert(IntervalTree::new());
        let interval_tree_q = interval_trees
            .entry(int.query_contig)
            .or_insert(IntervalTree::new());

        let mut sum_overlaps_ref = 0;
        let mut sum_overlaps_query = 0;
        let no_overlap_ref;
        if interval_tree_r.find(&r_interval).count() == 0 {
            no_overlap_ref = true
        } else {
//            let mut TODO_intervals= vec![];
            let overlaps = interval_tree_r.find(&r_interval);
            let mut small_ol = false;
            for ol in overlaps {
                let ol_interval: &ChainInterval = &intervals[*ol.data()];
                let overlap = GnPosition::min(
                    int.interval_on_ref.1 - ol_interval.interval_on_ref.0,
                    ol_interval.interval_on_ref.1 - int.interval_on_ref.0,
                );
                sum_overlaps_ref += overlap;
//                TODO_intervals.push(ol_interval.clone());
                
            }
            if (sum_overlaps_ref as f32) < int.ref_range_len() as f32 * OVERLAP_ORTHOLOGOUS_FRACTION {
                bases_added += int.query_range_len();
                small_ol = true;
//                dbg!("ref", TODO_intervals, int);
            }
            if small_ol {
                no_overlap_ref = true;
            } else {
                no_overlap_ref = false;
            }
        }

        let no_overlap_query;
        if interval_tree_q.find(&q_interval).count() == 0 {
            no_overlap_query = true;
        } else {
            let overlaps = interval_tree_q.find(&q_interval);
            let mut small_ol = false;
//            let mut TODO_intervals= vec![];
            for ol in overlaps {
                let ol_interval: &ChainInterval = &intervals[*ol.data()];
                let overlap = GnPosition::min(
                    int.interval_on_query.1 - ol_interval.interval_on_query.0,
                    ol_interval.interval_on_query.1 - int.interval_on_query.0,
                );
                sum_overlaps_query += overlap;
//                TODO_intervals.push(ol_interval.clone());
            }

            if (sum_overlaps_query as f32) < int.query_range_len() as f32 * OVERLAP_ORTHOLOGOUS_FRACTION {
                bases_added += int.query_range_len();
                small_ol = true;
//                dbg!("query", TODO_intervals, int);
            }
            if small_ol {
                no_overlap_query = true;
            } else {
                no_overlap_query = false;
            }
        }
        if no_overlap_ref && no_overlap_query {
            //

            interval_tree_q.insert(q_interval, i);
            interval_tree_r.insert(r_interval, i);
            let mut cloned_int = int.clone();
            cloned_int.overlap = sum_overlaps_query;
            good_non_overlap_intervals[int.chunk_id].push(int.clone());
        }
    }
    trace!("Bases rescued by small overlapping orthologous threshold: {}", bases_added);
    //good_non_overlap_intervals.sort_by(|x, y| y.partial_cmp(&x).unwrap());
    good_non_overlap_intervals
}









































================================================
FILE: src/cli.rs
================================================
use clap::{Parser, Subcommand, Args};

#[derive(Parser)]
#[clap(
    name = "skani",
    version,
    about = "fast, robust ANI calculation and database searching for metagenomic contigs and assemblies. \n\nQuick ANI calculation:\nskani dist genome1.fa genome2.fa \n\nMemory-efficient database search:\nskani sketch genomes/* -o database; skani search -d database query1.fa query2.fa ...\n\nAll-to-all comparison:\nskani triangle genomes/*",
    arg_required_else_help = true, disable_help_subcommand = true
)]
pub struct Cli {
    #[clap(subcommand)]
    pub command: Commands,
}

#[derive(Subcommand)]
pub enum Commands {
    /// Sketch (index) genomes.
    /// Usage: skani sketch genome1.fa genome2.fa ... -o new_sketch_folder
    Sketch(SketchArgs),
    
    /// Compute ANI for queries against references fasta files or pre-computed sketch files.
    /// Usage: skani dist query.fa ref1.fa ref2.fa ... or use -q/--ql and -r/--rl options.
    Dist(DistArgs),
    
    /// Compute a lower triangular ANI/AF matrix.
    /// Usage: skani triangle genome1.fa genome2.fa genome3.fa ...
    Triangle(TriangleArgs),
    
    /// Search queries against a large pre-sketched database of reference genomes in a memory efficient manner.
    /// Usage: skani search -d sketch_folder query1.fa query2.fa ...
    Search(SearchArgs),
}

#[derive(Args)]
#[clap(group(
    clap::ArgGroup::new("input_group")
        .required(true)
))]
pub struct SketchArgs {
    /// Number of threads
    #[clap(short = 't', default_value = "3")]
    pub threads: String,

    /// fastas to sketch
    #[clap(help_heading = "INPUT/OUTPUT", group = "input_group")]
    pub fasta_files: Vec<String>,
    
    /// File with each line containing one fasta/sketch file
    #[clap(short = 'l', help_heading = "INPUT/OUTPUT", group = "input_group")]
    pub fasta_list: Option<String>,
    
    /// Use individual sequences instead the entire file for multi-fastas. 
    #[clap(short = 'i', help_heading = "INPUT/OUTPUT")]
    pub individual_contig: bool,
    
    /// Output folder where sketch files are placed. 
    #[clap(short = 'o', required = true, display_order = 1, help_heading = "INPUT/OUTPUT")]
    pub output: String,

    /// Slower skani mode; 4x slower and more memory. Gives much more accurate AF for distant genomes. More accurate ANI for VERY fragmented assemblies (< 3kb N50), but less accurate ANI otherwise. Alias for -c 30.
    #[clap(long = "slow", help_heading = "PRESETS")]
    pub slow: bool,
    
    /// Medium skani mode; 2x slower and more memory. More accurate AF and more accurate ANI for moderately fragmented assemblies (< 10kb N50). Alias for -c 70.
    #[clap(long = "medium", help_heading = "PRESETS")]
    pub medium: bool,
    
    /// Faster skani mode; 2x faster and less memory. Less accurate AF and less accurate ANI for distant genomes, but works ok for high N50 and > 95% ANI. Alias for -c 200.
    #[clap(long = "fast", help_heading = "PRESETS")]
    pub fast: bool,

    /// Create separate .sketch files instead of consolidated database format. DOES NOT WORK WITH -i. 
    #[clap(long = "separate-sketches", help_heading = "INPUT/OUTPUT")]
    pub separate_sketches: bool,

    /// Use amino acid to calculate AAI instead. [default: ANI]
    #[clap(short = 'a', long = "aai", hide = true, help_heading = "SKETCH PARAMETERS")]
    pub aai: bool,
    
    /// k-mer size. [default: 15]
    #[clap(short = 'k', hide = true, help_heading = "SKETCH PARAMETERS")]
    pub k: Option<String>,
    
    /// Compression factor (k-mer subsampling rate). [default: 125]
    #[clap(short = 'c', help_heading = "SKETCH PARAMETERS")]
    pub c: Option<String>,
    
    /// Marker k-mer compression factor. Markers are used for filtering. Consider decreasing to ~200-300 if working with small genomes (e.g. plasmids or viruses). [default: 1000]
    #[clap(short = 'm', help_heading = "SKETCH PARAMETERS")]
    pub marker_c: Option<String>,

    /// Debug level verbosity
    #[clap(short = 'v', long = "debug", help_heading = "MISC")]
    pub debug: bool,
    
    /// Trace level verbosity
    #[clap(long = "trace", help_heading = "MISC")]
    pub trace: bool,
}

#[derive(Args)]
#[clap(group(
    clap::ArgGroup::new("query_group")
        .required(true)
))]
pub struct DistArgs {
    /// Number of threads
    #[clap(short = 't', default_value = "3")]
    pub threads: String,

    /// Use amino acid to calculate AAI instead. [default: ANI]
    #[clap(short = 'a', long = "aai", hide = true, help_heading = "INPUTS")]
    pub aai: bool,
    
    /// Query fasta or sketch
    #[clap(help_heading = "INPUTS", group = "query_group")]
    pub query: Option<String>,
    
    /// Reference fasta(s) or sketch(es)
    #[clap(help_heading = "INPUTS")]
    pub reference: Vec<String>,
    
    /// Query fasta(s) or sketch(es)
    #[clap(short = 'q', multiple_values = true, help_heading = "INPUTS", group = "query_group")]
    pub queries: Vec<String>,
    
    /// Reference fasta(s) or sketch(es)
    #[clap(short = 'r', multiple_values = true, help_heading = "INPUTS")]
    pub references: Vec<String>,
    
    /// File with each line containing one fasta/sketch file
    #[clap(long = "rl", help_heading = "INPUTS")]
    pub reference_list: Option<String>,
    
    /// File with each line containing one fasta/sketch file
    #[clap(long = "ql", help_heading = "INPUTS", group = "query_group")]
    pub query_list: Option<String>,
    
    /// Use individual sequences for the QUERY in a multi-line fasta
    #[clap(long = "qi", help_heading = "INPUTS")]
    pub qi: bool,
    
    /// Use individual sequences for the REFERENCE in a multi-line fasta
    #[clap(long = "ri", help_heading = "INPUTS")]
    pub ri: bool,

    /// Output file name; rewrites file by default [default: output to stdout]
    #[clap(short = 'o', display_order = 1, help_heading = "OUTPUT")]
    pub output: Option<String>,
    
    /// Only output ANI values where one genome has aligned fraction > than this value. [default: 15]
    #[clap(long = "min-af", display_order = 100, help_heading = "OUTPUT")]
    pub min_af: Option<String>,
    
    /// Only output ANI values where both genomes have aligned fraction > than this value. [default: disabled]
    #[clap(long = "both-min-af", display_order = 101, help_heading = "OUTPUT")]
    pub both_min_af: Option<String>,
    
    /// Max number of results to show for each query. [default: unlimited]
    #[clap(short = 'n', help_heading = "OUTPUT")]
    pub n: Option<String>,
    
    /// Output [5%,95%] ANI confidence intervals using percentile bootstrap on the putative ANI distribution
    #[clap(long = "ci", help_heading = "OUTPUT")]
    pub ci: bool,
    
    /// Print additional info including contig N50s and more
    #[clap(long = "detailed", help_heading = "OUTPUT")]
    pub detailed: bool,
    
    /// Only display the first part of contig names (before first whitespace)
    #[clap(long = "short-header", help_heading = "OUTPUT")]
    pub short_header: bool,

    /// Slower skani mode; 4x slower and more memory. Gives much more accurate AF for distant genomes. More accurate ANI for VERY fragmented assemblies (< 3kb N50), but less accurate ANI otherwise. Alias for -c 30.
    #[clap(long = "slow", help_heading = "PRESETS")]
    pub slow: bool,
    
    /// Medium skani mode; 2x slower and more memory. More accurate AF and more accurate ANI for moderately fragmented assemblies (< 10kb N50). Alias for -c 70.
    #[clap(long = "medium", help_heading = "PRESETS")]
    pub medium: bool,
    
    /// Faster skani mode; 2x faster and less memory. Less accurate AF and less accurate ANI for distant genomes, but works ok for high N50 and > 95% ANI. Alias for -c 200.
    #[clap(long = "fast", help_heading = "PRESETS")]
    pub fast: bool,
    
    /// Mode for small genomes such as viruses or plasmids (< 20 kb). Can be much faster for large data, but is slower/less accurate on bacterial-sized genomes. Alias for: -c 30 -m 200 --faster-small.
    #[clap(long = "small-genomes", help_heading = "PRESETS")]
    pub small_genomes: bool,

    /// Disable regression model for ANI prediction. [default: learned ANI used for c >= 70 and >= 150,000 bases aligned and not on individual contigs]
    #[clap(long = "no-learned-ani", help_heading = "ALGORITHM PARAMETERS")]
    pub no_learned_ani: bool,
    
    /// Marker k-mer compression factor. Markers are used for filtering. Consider decreasing to ~200-300 if working with small genomes (e.g. plasmids or viruses). [default: 1000]
    #[clap(short = 'm', help_heading = "ALGORITHM PARAMETERS")]
    pub marker_c: Option<String>,
    
    /// k-mer size. [default: 15]
    #[clap(short = 'k', hide = true, help_heading = "ALGORITHM PARAMETERS")]
    pub k: Option<String>,
    
    /// Compression factor (k-mer subsampling rate). [default: 125]
    #[clap(short = 'c', help_heading = "ALGORITHM PARAMETERS")]
    pub c: Option<String>,
    
    /// Screen out pairs with *approximately* < % identity using k-mer sketching. [default: 80]
    #[clap(short = 's', help_heading = "ALGORITHM PARAMETERS")]
    pub s: Option<String>,
    
    /// Estimate mean after trimming off 10%/90% quantiles
    #[clap(long = "robust", help_heading = "ALGORITHM PARAMETERS")]
    pub robust: bool,
    
    /// Estimate median identity instead of average (mean) identity
    #[clap(long = "median", help_heading = "ALGORITHM PARAMETERS")]
    pub median: bool,
    
    /// Do not use hash-table inverted index for faster ANI filtering. [default: load index if > 100 query files or using the --qi option]
    #[clap(long = "no-marker-index", help_heading = "ALGORITHM PARAMETERS")]
    pub no_marker_index: bool,
    
    /// Filter genomes with < 20 marker k-mers more aggressively. Much faster for many small genomes but may miss some comparisons.
    #[clap(long = "faster-small", help_heading = "ALGORITHM PARAMETERS")]
    pub faster_small: bool,

    /// Debug level verbosity
    #[clap(short = 'v', long = "debug", help_heading = "MISC")]
    pub debug: bool,
    
    /// Trace level verbosity
    #[clap(long = "trace", help_heading = "MISC")]
    pub trace: bool,
}

#[derive(Args)]
#[clap(group(
    clap::ArgGroup::new("input_group")
        .required(true)
))]
pub struct TriangleArgs {
    /// Number of threads
    #[clap(short = 't', default_value = "3")]
    pub threads: String,

    /// File with each line containing one fasta/sketch file
    #[clap(short = 'l', help_heading = "INPUTS", group = "input_group")]
    pub fasta_list: Option<String>,
    
    /// Use amino acid to calculate AAI instead. [default: ANI]
    #[clap(short = 'a', long = "aai", hide = true, help_heading = "INPUTS")]
    pub aai: bool,
    
    /// Fasta(s) or sketch(es)
    #[clap(help_heading = "INPUTS", group = "input_group")]
    pub fasta_files: Vec<String>,
    
    /// Use individual sequences instead the entire file for multi-fastas
    #[clap(short = 'i', help_heading = "INPUTS")]
    pub individual_contig: bool,

    /// Output file name; rewrites file by default [default: output to stdout]
    #[clap(short = 'o', display_order = 1, help_heading = "OUTPUT")]
    pub output: Option<String>,
    
    /// Output full matrix instead of lower-triangular matrix
    #[clap(long = "full-matrix", help_heading = "OUTPUT")]
    pub full_matrix: bool,
    
    /// Output the diagonal of the ANI matrix (i.e. self-self comparisons) for both dense and sparse matrices
    #[clap(long = "diagonal", help_heading = "OUTPUT")]
    pub diagonal: bool,
    
    /// Only output ANI values where one genome has aligned fraction > than this value. [default: 15]
    #[clap(long = "min-af", help_heading = "OUTPUT")]
    pub min_af: Option<String>,
    
    /// Only output ANI values where both genomes have aligned fraction > than this value. [default: disabled]
    #[clap(long = "both-min-af", help_heading = "OUTPUT")]
    pub both_min_af: Option<String>,
    
    /// Output [5%,95%] ANI confidence intervals using percentile bootstrap on the putative ANI distribution. Only works with --sparse or -E.
    #[clap(long = "ci", help_heading = "OUTPUT")]
    pub ci: bool,
    
    /// Print additional info including contig N50s and more
    #[clap(long = "detailed", help_heading = "OUTPUT")]
    pub detailed: bool,
    
    /// Only display the first part of contig names (before first whitespace)
    #[clap(long = "short-header", help_heading = "OUTPUT")]
    pub short_header: bool,
    
    /// Output 100 - ANI instead of ANI, creating a distance instead of a similarity matrix. No effect if using --sparse or -E.
    #[clap(long = "distance", help_heading = "OUTPUT")]
    pub distance: bool,
    
    /// Output comparisons in a row-by-row form (i.e. sparse matrix) in the same form as `skani dist`. Only pairs with aligned fraction > --min-af are output.
    #[clap(long = "sparse", short = 'E', help_heading = "OUTPUT")]
    pub sparse: bool,

    /// Slower skani mode; 4x slower and more memory. Gives much more accurate AF for distant genomes. More accurate ANI for VERY fragmented assemblies (< 3kb N50), but less accurate ANI otherwise. Alias for -c 30.
    #[clap(long = "slow", help_heading = "PRESETS")]
    pub slow: bool,
    
    /// Medium skani mode; 2x slower and more memory. More accurate AF and more accurate ANI for moderately fragmented assemblies (< 10kb N50). Alias for -c 70.
    #[clap(long = "medium", help_heading = "PRESETS")]
    pub medium: bool,
    
    /// Faster skani mode; 2x faster and less memory. Less accurate AF and less accurate ANI for distant genomes, but works ok for high N50 and > 95% ANI. Alias for -c 200.
    #[clap(long = "fast", help_heading = "PRESETS")]
    pub fast: bool,
    
    /// Mode for small genomes such as viruses or plasmids (< 20 kb). Can be much faster for large data, but is slower/less accurate on bacterial-sized genomes. Alias for: -c 30 -m 200 --faster-small.
    #[clap(long = "small-genomes", help_heading = "PRESETS")]
    pub small_genomes: bool,

    /// Disable regression model for ANI prediction. [default: learned ANI used for c >= 70 and >= 150,000 bases aligned and not on individual contigs]
    #[clap(long = "no-learned-ani", help_heading = "ALGORITHM PARAMETERS")]
    pub no_learned_ani: bool,
    
    /// Marker k-mer compression factor. Markers are used for filtering. Consider decreasing to ~200-300 if working with small genomes (e.g. plasmids or viruses). [default: 1000]
    #[clap(short = 'm', help_heading = "ALGORITHM PARAMETERS")]
    pub marker_c: Option<String>,
    
    /// Screen out pairs with *approximately* < % identity using k-mer sketching. [default: 80]
    #[clap(short = 's', help_heading = "ALGORITHM PARAMETERS")]
    pub s: Option<String>,
    
    /// k-mer size. [default: 15]
    #[clap(short = 'k', hide = true, help_heading = "ALGORITHM PARAMETERS")]
    pub k: Option<String>,
    
    /// Compression factor (k-mer subsampling rate). [default: 125]
    #[clap(short = 'c', help_heading = "ALGORITHM PARAMETERS")]
    pub c: Option<String>,
    
    /// Estimate mean after trimming off 10%/90% quantiles
    #[clap(long = "robust", help_heading = "ALGORITHM PARAMETERS")]
    pub robust: bool,
    
    /// Estimate median identity instead of average (mean) identity
    #[clap(long = "median", help_heading = "ALGORITHM PARAMETERS")]
    pub median: bool,
    
    /// Filter genomes with < 20 marker k-mers more aggressively. Much faster for many small genomes but may miss some comparisons.
    #[clap(long = "faster-small", help_heading = "ALGORITHM PARAMETERS")]
    pub faster_small: bool,

    /// Debug level verbosity
    #[clap(short = 'v', long = "debug", help_heading = "MISC")]
    pub debug: bool,
    
    /// Trace level verbosity
    #[clap(long = "trace", help_heading = "MISC")]
    pub trace: bool,
}

#[derive(Args)]
#[clap(group(
    clap::ArgGroup::new("query_group")
        .required(true)
))]
pub struct SearchArgs {
    /// Number of threads
    #[clap(short = 't', default_value = "3")]
    pub threads: String,

    /// Output folder from `skani sketch`
    #[clap(short = 'd', required = true, help_heading = "INPUTS")]
    pub database: String,
    
    /// Query fasta(s) or sketch(es)
    #[clap(multiple_values = true, help_heading = "INPUTS", group = "query_group")]
    pub query: Vec<String>,
    
    /// Query fasta(s) or sketch(es)
    #[clap(short = 'q', multiple_values = true, help_heading = "INPUTS", group = "query_group")]
    pub queries: Vec<String>,
    
    /// File with each line containing one fasta/sketch file
    #[clap(long = "ql", help_heading = "INPUTS", group = "query_group")]
    pub query_list: Option<String>,
    
    /// Use individual sequences for the QUERY in a multi-line fasta
    #[clap(long = "qi", help_heading = "INPUTS")]
    pub qi: bool,

    /// Output file name; rewrites file by default [default: output to stdout]
    #[clap(short = 'o', display_order = 1, help_heading = "OUTPUT")]
    pub output: Option<String>,
    
    /// Output [5%,95%] ANI confidence intervals using percentile bootstrap on the putative ANI distribution
    #[clap(long = "ci", help_heading = "OUTPUT")]
    pub ci: bool,
    
    /// Print additional info including contig N50s and more
    #[clap(long = "detailed", help_heading = "OUTPUT")]
    pub detailed: bool,
    
    /// Only display the first part of contig names (before first whitespace)
    #[clap(long = "short-header", help_heading = "OUTPUT")]
    pub short_header: bool,
    
    /// Only output ANI values where one genome has aligned fraction > than this value. [default: 15]
    #[clap(long = "min-af", help_heading = "OUTPUT")]
    pub min_af: Option<String>,
    
    /// Only output ANI values where both genomes have aligned fraction > than this value. [default: disabled]
    #[clap(long = "both-min-af", help_heading = "OUTPUT")]
    pub both_min_af: Option<String>,
    
    /// Max number of results to show for each query. [default: unlimited]
    #[clap(short = 'n', help_heading = "OUTPUT")]
    pub n: Option<String>,

    /// Disable regression model for ANI prediction. [default: learned ANI used for c >= 70 and >= 150,000 bases aligned and not on individual contigs]
    #[clap(long = "no-learned-ani", help_heading = "ALGORITHM PARAMETERS")]
    pub no_learned_ani: bool,
    
    /// Keep reference sketches in memory if the sketch passes the marker filter. Takes more memory but is much faster when querying many similar sequences.
    #[clap(long = "keep-refs", help_heading = "ALGORITHM PARAMETERS")]
    pub keep_refs: bool,
    
    /// Do not use hash-table inverted index for faster ANI filtering. [default: load index if > 100 query files or using the --qi option]
    #[clap(long = "no-marker-index", help_heading = "ALGORITHM PARAMETERS")]
    pub no_marker_index: bool,
    
    /// Screen out pairs with *approximately* < % identity using k-mer sketching. [default: 80]
    #[clap(short = 's', help_heading = "ALGORITHM PARAMETERS")]
    pub s: Option<String>,
    
    /// Estimate mean after trimming off 10%/90% quantiles
    #[clap(long = "robust", help_heading = "ALGORITHM PARAMETERS")]
    pub robust: bool,
    
    /// Estimate median identity instead of average (mean) identity
    #[clap(long = "median", help_heading = "ALGORITHM PARAMETERS")]
    pub median: bool,

    /// Debug level verbosity
    #[clap(short = 'v', long = "debug", help_heading = "MISC")]
    pub debug: bool,
    
    /// Trace level verbosity
    #[clap(long = "trace", help_heading = "MISC")]
    pub trace: bool,
}


================================================
FILE: src/cmd_line.rs
================================================
pub const MIN_ALIGN_FRAC: &str = "min aligned frac";
pub const CMD_MIN_ALIGN_FRAC: &str = "min-af";
pub const H_MIN_ALIGN_FRAC: &str = "Only output ANI values where one genome has aligned fraction > than this value.\t[default: 15]";

pub const IND_CTG_QRY: &str = "individual contig query";
pub const CMD_IND_CTG_QRY: &str = "qi";
pub const H_IND_CTG_QRY: &str = "Use individual sequences for the QUERY in a multi-line fasta.";

pub const IND_CTG_REF: &str = "individual contig ref";
pub const CMD_IND_CTG_REF: &str = "ri";
pub const H_IND_CTG_REF: &str = "Use individual sequences for the REFERENCE in a multi-line fasta.";

pub const NO_FULL_INDEX: &str = "no marker index";
pub const CMD_NO_FULL_INDEX: &str = "no-marker-index";
pub const H_NO_FULL_INDEX: &str = "Do not use hash-table inverted index for faster ANI filtering. \t[default: load index if > 100 query files or using the --qi option]";

pub const ROBUST: &str = "robust";
pub const CMD_ROBUST: &str = "robust";
pub const H_ROBUST: &str = "Estimate mean after trimming off 10%/90% quantiles.";

pub const FULL_MAT: &str = "full-matrix";
pub const CMD_FULL_MAT: &str = "full-matrix";
pub const H_FULL_MAT: &str = "Output full matrix instead of lower-triangular matrix.";

pub const KEEP_REFS: &str = "keep-refs";
pub const CMD_KEEP_REFS: &str = "keep-refs";
pub const H_KEEP_REFS: &str = "Keep reference sketches in memory if the sketch passes the marker filter. Takes more memory but is much faster when querying many similar sequences.";

pub const C_FACTOR: &str = "c";
pub const CMD_C_FACTOR: &str = "c";
pub const H_C_FACTOR: &str = "Compression factor (k-mer subsampling rate).\t[default: 125]";

pub const H_SCREEN: &str = "Screen out pairs with *approximately* < % identity using k-mer sketching.\t[default: 80]";

pub const CONF_INTERVAL: &str = "ci";
pub const CMD_CONF_INTERVAL: &str = "ci";
pub const H_CONF_INTERVAL: &str = "Output [5%,95%] ANI confidence intervals using percentile bootstrap on the putative ANI distribution.";
pub const H_CONF_INTERVAL_TRI: &str = "Output [5%,95%] ANI confidence intervals using percentile bootstrap on the putative ANI distribution. Only works with --sparse or -E.";

pub const NO_LEARNED_ANI: &str = "no-learned-ani";
pub const CMD_NO_LEARNED_ANI : &str = "no-learned-ani";
pub const H_NO_LEARNED_ANI: &str = "Disable regression model for ANI prediction.\t[default: learned ANI used for c >= 70 and >= 150,000 bases aligned and not on individual contigs]";

pub const MODE_SLOW: &str = "slow";
pub const CMD_MODE_SLOW : &str = "slow";
pub const H_MODE_SLOW : &str = "Slower skani mode; 4x slower and more memory. Gives much more accurate AF for distant genomes. More accurate ANI for VERY fragmented assemblies (< 3kb N50), but less accurate ANI otherwise. Alias for -c 30.";

pub const MODE_SMALL_GENOMES: &str = "small-genomes";
pub const CMD_MODE_SMALL_GENOMES: &str = "small-genomes";
pub const H_MODE_SMALL_GENOMES : &str = "Mode for small genomes such as viruses or plasmids (< 20 kb). Can be much faster for large data, but is slower/less accurate on bacterial-sized genomes. Alias for: -c 30 -m 200 --faster-small.";

pub const MODE_FAST: &str = "fast";
pub const CMD_MODE_FAST : &str = "fast";
pub const H_MODE_FAST : &str = "Faster skani mode; 2x faster and less memory. Less accurate AF and less accurate ANI for distant genomes, but works ok for high N50 and > 95% ANI. Alias for -c 200.";

pub const MODE_MEDIUM: &str = "medium";
pub const CMD_MODE_MEDIUM : &str = "medium";
pub const H_MODE_MEDIUM: &str = "Medium skani mode; 2x slower and more memory. More accurate AF and more accurate ANI for moderately fragmented assemblies (< 10kb N50). Alias for -c 70.";

pub const MARKER_C: &str = "marker_c";
pub const CMD_MARKER_C: char = 'm';
pub const H_MARKER_C: &str = "Marker k-mer compression factor. Markers are used for filtering. Consider decreasing to ~200-300 if working with small genomes (e.g. plasmids or viruses). \t[default: 1000]";

pub const DETAIL_OUT: &str = "detailed";
pub const CMD_DETAIL_OUT: &str = "detailed";
pub const H_DETAIL_OUT: &str = "Print additional info including contig N50s and more.";

pub const DISTANCE_OUT: &str = "distance";
pub const CMD_DISTANCE_OUT: &str = "distance";
pub const H_DISTANCE_OUT: &str = "Output 100 - ANI instead of ANI, creating a distance instead of a similarity matrix. No effect if using --sparse or -E.";

pub const INT_WRITE: &str = "intermediate write count";
pub const CMD_INT_WRITE: &str = "inter-write";
pub const H_INT_WRITE: &str = "Write results to output after --inter-write queries are processed (leads to non-deterministic outputs when multi-threading). \t[default: 10000]";

pub const FAST_SMALL: &str = "faster-small";
pub const CMD_FAST_SMALL: &str = "faster-small";
pub const H_FAST_SMALL: &str = "Filter genomes with < 20 marker k-mers more aggressively. Much faster for many small genomes but may miss some comparisons.";

pub const DIAG: &str = "diagonal";
pub const CMD_DIAG: &str = "diagonal";
pub const H_DIAG: &str = "Output the diagonal of the ANI matrix (i.e. self-self comparisons) for both dense and sparse matrices.";



================================================
FILE: src/dist.rs
================================================
use crate::chain;
use crate::regression;
use crate::file_io;
use crate::params::*;
use crate::screen;
use crate::types::*;
use log::*;
use rayon::prelude::*;
use std::sync::Mutex;
use std::time::Instant;

pub fn dist(command_params: CommandParams, mut sketch_params: SketchParams) {
    let ref_sketches;
    let query_params;
    let query_sketches;
    let now = Instant::now();
    if command_params.refs_are_sketch {
        let new_sketch_params;
        info!("Sketches detected.");
        (new_sketch_params, ref_sketches) = file_io::sketches_from_sketch(
            &command_params.ref_files,
        );
        if new_sketch_params != sketch_params {
            warn!("Parameters from .sketch files not equal to the input parameters. Using parameters from .sketch files.")
        }
        sketch_params = new_sketch_params;
    } else if command_params.individual_contig_r {
        ref_sketches = file_io::fastx_to_multiple_sketch_rewrite(
            &command_params.ref_files,
            &sketch_params,
            true,
        );
    } else {
        ref_sketches =
            file_io::fastx_to_sketches(&command_params.ref_files, &sketch_params, true);
    }
    if command_params.queries_are_sketch {
        (query_params, query_sketches) =
            file_io::sketches_from_sketch(&command_params.query_files);
        if sketch_params != query_params && command_params.refs_are_sketch {
            panic!(
                "Query sketch parameters were not equal to reference sketch parameters. Exiting."
            );
        } else if sketch_params != query_params {
            warn!("Parameters from .sketch files not equal to the input parameters. Using parameters from .sketch files.")
        }
    } else if command_params.individual_contig_q {
        query_sketches = file_io::fastx_to_multiple_sketch_rewrite(
            &command_params.query_files,
            &sketch_params,
            true,
        );
    } else {
        query_sketches =
            file_io::fastx_to_sketches(&command_params.query_files, &sketch_params, true);
    }
    if query_sketches.is_empty() || ref_sketches.is_empty() {
        error!("No reference sketches/genomes or query sketches/genomes found.");
        std::process::exit(1)
    }


    let model_opt = regression::get_model(sketch_params.c, command_params.learned_ani);
    if model_opt.is_some() {
        info!("{}", LEARNED_INFO_HELP);
    }

    let screen_val;
    if command_params.screen_val == 0. {
        if sketch_params.use_aa {
            screen_val = SEARCH_AAI_CUTOFF_DEFAULT;
        } else {
            screen_val = SEARCH_ANI_CUTOFF_DEFAULT;
        }
    } else {
        screen_val = command_params.screen_val;
    }

    let kmer_to_sketch;

    if command_params.screen {
        let now = Instant::now();
        info!("Full index option detected; generating marker hash table");
        kmer_to_sketch = screen::kmer_to_sketch_from_refs(&ref_sketches);
        info!("Full indexing time: {}", now.elapsed().as_secs_f32());
    } else {
        kmer_to_sketch = KmerToSketch::default();
    }

    info!("Generating sketch time: {}", now.elapsed().as_secs_f32());
    let now = Instant::now();
    let js = (0..query_sketches.len())
        .into_iter()
        .collect::<Vec<usize>>();
    let anis: Mutex<Vec<AniEstResult>> = Mutex::new(vec![]);
    let counter: Mutex<usize> = Mutex::new(0);
    let first_write: Mutex<bool> = Mutex::new(true);
    js.into_par_iter().for_each(|j| {
        let query_sketch = &query_sketches[j];
        if !command_params.screen {
            let is = (0..ref_sketches.len()).into_iter().collect::<Vec<usize>>();
            is.into_par_iter().for_each(|i| {
                let ref_sketch = &ref_sketches[i];
                let passed_screen =
                    screen::check_markers_quickly(query_sketch, ref_sketch, screen_val, command_params.rescue_small);
                if passed_screen {
                    let map_params = chain::map_params_from_sketch(
                        ref_sketch,
                        sketch_params.use_aa,
                        &command_params,
                        &model_opt,
                    );
                    let ani_res;
                    ani_res = chain::chain_seeds(ref_sketch, query_sketch, map_params);
                    if ani_res.ani > 0.1 {
                        let mut locked = anis.lock().unwrap();
                        locked.push(ani_res);
                    }
                }
            });
        } else {
            let refs_passing_screen_table = screen::screen_refs(
                screen_val,
                &kmer_to_sketch,
                query_sketch,
                &sketch_params,
                &ref_sketches,
                command_params.rescue_small
            );
            refs_passing_screen_table.into_par_iter().for_each(|i| {
                let ref_sketch = &ref_sketches[i];
                let map_params = chain::map_params_from_sketch(
                    ref_sketch,
                    sketch_params.use_aa,
                    &command_params,
                    &model_opt
                );
                let ani_res = chain::chain_seeds(ref_sketch, query_sketch, map_params);
                if ani_res.ani > 0.1{
                    let mut locked = anis.lock().unwrap();
                    locked.push(ani_res);
                }
            });
        }
        let c;
        {
            let mut locked = counter.lock().unwrap();
            *locked += 1;
            c = *locked;
        }
        if c % 100 == 0 && c != 0{
            info!("{} query sequences processed.", c);
            if c % INTERMEDIATE_WRITE_COUNT == 0 && c != 0{
                info!("Writing results for {} query sequences.", INTERMEDIATE_WRITE_COUNT);
                let moved_anis: Vec<AniEstResult>;
                {
                let mut locked = anis.lock().unwrap();
                moved_anis = std::mem::take(&mut locked);
                }
                let mut fw = first_write.lock().unwrap();
                file_io::write_query_ref_list(
                    &moved_anis,
                    &command_params.out_file_name,
                    command_params.max_results,
                    sketch_params.use_aa,
                    command_params.est_ci,
                    command_params.detailed_out,
                    !*fw,
                    command_params.short_header,
                );
                if *fw == true{
                    *fw = false;
                }
            }
        }
    });
    let anis = anis.into_inner().unwrap();
    
    file_io::write_query_ref_list(
        &anis,
        &command_params.out_file_name,
        command_params.max_results,
        sketch_params.use_aa,
        command_params.est_ci,
        command_params.detailed_out,
        !*first_write.lock().unwrap(),
        command_params.short_header,
    );
    info!("ANI calculation time: {}", now.elapsed().as_secs_f32());
}


================================================
FILE: src/file_io.rs
================================================
use crate::params::*;
use std::fs::OpenOptions;
use crate::seeding;
use crate::types::*;
use fxhash::FxHashMap;
use log::*;
use needletail::parse_fastx_file;
use rand::seq::SliceRandom;
use rand::thread_rng;
use rayon::prelude::*;
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Write};
use std::sync::Mutex;

fn write_header(writer: &mut impl Write, id_str: &str, ci: bool, verbose: bool) {
    if !ci && !verbose {
        writeln!(writer,"Ref_file\tQuery_file\t{}\tAlign_fraction_ref\tAlign_fraction_query\tRef_name\tQuery_name", id_str).unwrap();
    } else if !verbose {
        writeln!(writer,"Ref_file\tQuery_file\t{}\tAlign_fraction_ref\tAlign_fraction_query\tRef_name\tQuery_name\t{}_5_percentile\t{}_95_percentile", id_str, id_str, id_str).unwrap();
    } else {
        writeln!(writer,"Ref_file\tQuery_file\t{}\tAlign_fraction_ref\tAlign_fraction_query\tRef_name\tQuery_name\tNum_ref_contigs\tNum_query_contigs\t{}_5_percentile\t{}_95_percentile\tStandard_deviation\tRef_90_ctg_len\tRef_50_ctg_len\tRef_10_ctg_len\tQuery_90_ctg_len\tQuery_50_ctg_len\tQuery_10_ctg_len\tAvg_chain_len\tTotal_bases_covered", id_str, id_str, id_str).unwrap();
    }
}

fn write_ani_res_perfect(writer: &mut impl Write, sketch: &Sketch, ci: bool, verbose: bool, short_header: bool) {
    if !ci && !verbose {
        writeln!(
            writer,
            "{}\t{}\t{:.2}\t{:.2}\t{:.2}\t{}\t{}",
            sketch.file_name,
            sketch.file_name,
            100,
            100,
            100,
            truncate_contig_name(&sketch.contigs[0], short_header),
            truncate_contig_name(&sketch.contigs[0], short_header),
        )
        .unwrap();
    } else if !verbose {
        writeln!(
            writer,
            "{}\t{}\t{:.2}\t{:.2}\t{:.2}\t{}\t{}\t{:.2}\t{:.2}",
            sketch.file_name,
            sketch.file_name,
            100,
            100,
            100,
            truncate_contig_name(&sketch.contigs[0], short_header),
            truncate_contig_name(&sketch.contigs[0], short_header),
            100,
            100,
        )
        .unwrap();
    } else {
        writeln!(
            writer,
            "{}\t{}\t{:.2}\t{:.2}\t{:.2}\t{}\t{}\t{}\t{}\t{:.2}\t{:.2}\t{:.2}\t{:0}\t{:0}\t{:0}\t{:0}\t{:0}\t{:0}\t{:0}\t{:0}",
            sketch.file_name,
            sketch.file_name,
            100,
            100,
            100,
            truncate_contig_name(&sketch.contigs[0], short_header),
            truncate_contig_name(&sketch.contigs[0], short_header),
            sketch.contigs.len(),
            sketch.contigs.len(),
            100,
            100,
            0,
            -1,
            -1,
            -1,
            -1,
            -1,
            -1,
            0,
            sketch.total_sequence_length,
        )
        .unwrap();
    }
}

fn write_ani_res(writer: &mut impl Write, ani_res: &AniEstResult, ci: bool, verbose: bool, short_header: bool) {
    if !ci && !verbose {
        writeln!(
            writer,
            "{}\t{}\t{:.2}\t{:.2}\t{:.2}\t{}\t{}",
            ani_res.ref_file,
            ani_res.query_file,
            ani_res.ani * 100.,
            ani_res.align_fraction_ref * 100.,
            ani_res.align_fraction_query * 100.,
            truncate_contig_name(&ani_res.ref_contig, short_header),
            truncate_contig_name(&ani_res.query_contig, short_header),
        )
        .unwrap();
    } else if !verbose {
        writeln!(
            writer,
            "{}\t{}\t{:.2}\t{:.2}\t{:.2}\t{}\t{}\t{:.2}\t{:.2}",
            ani_res.ref_file,
            ani_res.query_file,
            ani_res.ani * 100.,
            ani_res.align_fraction_ref * 100.,
            ani_res.align_fraction_query * 100.,
            truncate_contig_name(&ani_res.ref_contig, short_header),
            truncate_contig_name(&ani_res.query_contig, short_header),
            ani_res.ci_lower * 100.,
            ani_res.ci_upper * 100.,
        )
        .unwrap();
    } else {
        writeln!(
            writer,
            "{}\t{}\t{:.2}\t{:.2}\t{:.2}\t{}\t{}\t{}\t{}\t{:.2}\t{:.2}\t{:.2}\t{:0}\t{:0}\t{:0}\t{:0}\t{:0}\t{:0}\t{:0}\t{:0}",
            ani_res.ref_file,
            ani_res.query_file,
            ani_res.ani * 100.,
            ani_res.align_fraction_ref * 100.,
            ani_res.align_fraction_query * 100.,
            truncate_contig_name(&ani_res.ref_contig, short_header),
            truncate_contig_name(&ani_res.query_contig, short_header),
            ani_res.num_contigs_r,
            ani_res.num_contigs_q,
            ani_res.ci_lower * 100.,
            ani_res.ci_upper * 100.,
            ani_res.std * 100.,
            ani_res.quant_90_contig_len_r,
            ani_res.quant_50_contig_len_r,
            ani_res.quant_10_contig_len_r,
            ani_res.quant_90_contig_len_q,
            ani_res.quant_50_contig_len_q,
            ani_res.quant_10_contig_len_q,
            ani_res.avg_chain_int_len,
            ani_res.total_bases_covered,
        )
        .unwrap();
    }
}

pub fn fastx_to_sketches(
    ref_files: &Vec<String>,
    sketch_params: &SketchParams,
    seed: bool,
) -> Vec<Sketch> {
    let ref_sketches: Mutex<Vec<_>> = Mutex::new(vec![]);
    let mut index_vec = (0..ref_files.len()).collect::<Vec<usize>>();
    index_vec.shuffle(&mut thread_rng());
    index_vec.into_par_iter().for_each(|i| {
        let ref_file = &ref_files[i];
        let mut new_sketch = Sketch::new(
            sketch_params.marker_c,
            sketch_params.c,
            sketch_params.k,
            ref_file.to_string(),
            sketch_params.use_aa,
        );
        let reader = parse_fastx_file(ref_file);
        if reader.is_err() {
            if ref_file.contains(".sketch"){
                warn!("{} is not a valid fasta/fastq file but has the .sketch extension. Not all inputs have .sketch extension, so fasta/fastq is assumed.", ref_file);
            }
            else{
                warn!("{} is not a valid fasta/fastq file; skipping.", ref_file);
            }
        } else {
            let mut j = 0;
            let mut is_valid = true;
            let mut reader = reader.unwrap();
            trace!("Sketching {} {}", new_sketch.file_name, i);
            while let Some(record) = reader.next() {
                if record.is_ok() {
                    let record = record.unwrap_or_else(|_| panic!("Invalid record for file {}", ref_file));
                    let contig = record.id();
                    let seq = record.seq();
                    if seq.len() >= MIN_LENGTH_CONTIG {
                        new_sketch
                            .contigs
                            .push(String::from_utf8(contig.to_vec()).unwrap());
                        new_sketch.contig_lengths.push(seq.len() as GnPosition);

                        new_sketch.total_sequence_length += seq.len();
                        if sketch_params.use_aa {
                            let orfs = seeding::get_orfs(&seq, sketch_params);
                            seeding::fmh_seeds_aa_with_orf(
                                &seq,
                                sketch_params,
                                j as u32,
                                &mut new_sketch,
                                orfs,
                                seed,
                            )
                        } else {
                            #[cfg(any(target_arch = "x86_64"))]
                            {
                                if is_x86_feature_detected!("avx2"){
                                    use crate::avx2_seeding;
                                    unsafe {
                                        avx2_seeding::avx2_fmh_seeds(
                                            &seq,
                                            sketch_params,
                                            j as u32,
                                            &mut new_sketch,
                                            seed,
                                        );
                                    }
                                } else {
                                    seeding::fmh_seeds(
                                        &seq,
                                        sketch_params,
                                        j as u32,
                                        &mut new_sketch,
                                        seed,
                                    );
                                }
                            }
                            #[cfg(not(target_arch = "x86_64"))]
                            {
                            seeding::fmh_seeds(
                                        &seq,
                                        sketch_params,
                                        j as u32,
                                        &mut new_sketch,
                                        seed,
                                    );
                            }

                        }
                        //new_sketch.contig_order = 0;
                        j += 1;
                    }
                } else {
                    warn!("File {} is not a valid fasta/fastq file", ref_file);
                    is_valid = false;
                    break;
                }
            }
            if is_valid && j > 0{
                {
                    let mut locked = ref_sketches.lock().unwrap();
                    locked.push(new_sketch);
                }
            }
            if j == 0 && is_valid{
                warn!("File {} consists of only contigs < {} bp. Skipping this file.",  ref_file, MIN_LENGTH_CONTIG);
            }
        }
    });
    let mut ref_sketches = ref_sketches.into_inner().unwrap();
    ref_sketches.sort();
    ref_sketches
}
pub fn fastx_to_multiple_sketch_rewrite(
    ref_files: &Vec<String>,
    sketch_params: &SketchParams,
    seed: bool,
) -> Vec<Sketch> {
    let ref_sketches: Mutex<Vec<_>> = Mutex::new(vec![]);
    let mut index_vec = (0..ref_files.len()).collect::<Vec<usize>>();
    index_vec.shuffle(&mut thread_rng());
    index_vec.into_par_iter().for_each(|i| {
        let mut small_contig_warn = false;
        let ref_file = &ref_files[i];
        let reader = parse_fastx_file(ref_file);
        if reader.is_err() {
            warn!("{} is not a valid fasta/fastq file; skipping.", ref_file);
        } else {
            let mut j = 0;
            let mut reader = reader.unwrap();
            trace!("Sketching {} {}", ref_file, i);
            while let Some(record) = reader.next() {
                if record.is_ok() {
                    let record =
                        record.unwrap_or_else(|_| panic!("Invalid record for file {}", ref_file));
                    let contig = record.id();
                    let seq = record.seq();
                    if seq.len() >= MIN_LENGTH_CONTIG {
                        let mut new_sketch = Sketch::new(
                            sketch_params.marker_c,
                            sketch_params.c,
                            sketch_params.k,
                            ref_file.to_string(),
                            sketch_params.use_aa,
                        );
                        new_sketch
                            .contigs
                            .push(String::from_utf8(contig.to_vec()).unwrap());
                        new_sketch.contig_lengths.push(seq.len() as GnPosition);

                        new_sketch.total_sequence_length += seq.len();
                        if sketch_params.use_aa {
                            let orfs = seeding::get_orfs(&seq, sketch_params);
                            seeding::fmh_seeds_aa_with_orf(
                                &seq,
                                sketch_params,
                                0_u32,
                                &mut new_sketch,
                                orfs,
                                seed,
                            )
                        } else {
                            #[cfg(any(target_arch = "x86_64"))]
                            {
                                if is_x86_feature_detected!("avx2") {
                                    use crate::avx2_seeding;
                                    unsafe {
                                        avx2_seeding::avx2_fmh_seeds(
                                            &seq,
                                            sketch_params,
                                            0_u32,
                                            &mut new_sketch,
                                            seed,
                                        );
                                    }
                                } else {
                                    seeding::fmh_seeds(
                                        &seq,
                                        sketch_params,
                                        0_u32,
                                        &mut new_sketch,
                                        seed,
                                    );
                                }
                            }
                            #[cfg(not(target_arch = "x86_64"))]
                            {
                            seeding::fmh_seeds(
                                        &seq,
                                        sketch_params,
                                        j as u32,
                                        &mut new_sketch,
                                        seed,
                                    );
                            }
                        }
                        new_sketch.contig_order = j;

//                        if new_sketch.total_sequence_length > REPET_KMER_THRESHOLD {
//                            new_sketch.repetitive_kmers =
//                                seeding::get_repetitive_kmers(&new_sketch.kmer_seeds_k, new_sketch.c);
//                        }

                        let mut locked = ref_sketches.lock().unwrap();
                        locked.push(new_sketch);
                        j += 1;
                    }
                    else if !small_contig_warn
                    {
                        small_contig_warn = true;
                        warn!("At least one sequence in file {} has < {} bp. These sequences will be skipped.", ref_file, MIN_LENGTH_CONTIG);
                    }
                } else {
                    warn!("File {} is not a valid fasta/fastq file", ref_file);
                    break;
                }
            }
        }
    });
    let mut ref_sketches = ref_sketches.into_inner().unwrap();
    ref_sketches.sort();
    ref_sketches
}

pub fn write_phyllip_matrix(
    anis: &FxHashMap<usize, FxHashMap<usize, AniEstResult>>,
    sketches: &Vec<Sketch>,
    file_name: &str,
    use_contig_names: bool,
    full_matrix: bool,
    diag: bool,
    aai: bool,
    distance: bool,
) {
    let perfect = if distance {0.} else {100.};
    let none = 100. - perfect;

    let _id_str = if aai { "AAI" } else { "ANI" };
    if file_name.is_empty() {
        let stdout = io::stdout();
        let mut handle = stdout.lock();
        writeln!(&mut handle, "{}", sketches.len()).unwrap();
        for i in 0..sketches.len() {
            let name;
            if use_contig_names {
                name = &sketches[i].contigs[0];
            } else {
                name = &sketches[i].file_name;
            }
            write!(&mut handle, "{}", name).unwrap();
            let end;
            if full_matrix {
                end = sketches.len();
            } else {
                if diag {
                    end = i + 1;
                } else {
                    end = i;
                }
            }
            for j in 0..end {
                if j == i {
                    write!(&mut handle, "\t{:.2}", perfect).unwrap();
                    continue;
                }
                let x = usize::min(i, j);
                let y = usize::max(i, j);
                if i == j {
                    write!(&mut handle, "\t{:.2}", perfect).unwrap();
                } else if !anis.contains_key(&x) || !anis[&x].contains_key(&y) {
                    write!(&mut handle, "\t{:.2}", none).unwrap();
                } else if anis[&x][&y].ani == -1. || anis[&x][&y].ani.is_nan() {
                    write!(&mut handle, "\t{:.2}", none).unwrap();
                } else {
                    let val = anis[&x][&y].ani * 100.;
                    let ani_val = if !distance {val} else { 100. - val};
                    write!(&mut handle, "\t{:.2}", ani_val).unwrap();
                }
            }
            writeln!(&mut handle).unwrap();
        }

        let af_mat_file = "skani_matrix.af".to_string();
        let mut af_file = BufWriter::new(File::create(af_mat_file).unwrap());
        writeln!(&mut af_file, "{}", sketches.len()).unwrap();
        for i in 0..sketches.len() {
            let name;
            if use_contig_names {
                name = &sketches[i].contigs[0];
            } else {
                name = &sketches[i].file_name;
            }
            write!(&mut af_file, "{}", name).unwrap();
            //We always output full matrix for AF.
            let end = sketches.len();
            for j in 0..end {
                if i == j {
                    write!(&mut af_file, "\t{:.2}", 100.).unwrap();
                    continue;
                }
                let x = usize::min(i, j);
                let y = usize::max(i, j);
                if !anis.contains_key(&x) || !anis[&x].contains_key(&y) {
                    write!(&mut af_file, "\t{:.2}", 0.).unwrap();
                } else if anis[&x][&y].ani == -1. || anis[&x][&y].ani.is_nan() {
                    write!(&mut af_file, "\t{:.2}", 0.).unwrap();
                } else {
                    if j > i {
                        write!(
                            &mut af_file,
                            "\t{:.2}",
                            anis[&x][&y].align_fraction_ref * 100.
                        )
                        .unwrap();
                    } else {
                        write!(
                            &mut af_file,
                            "\t{:.2}",
                            anis[&x][&y].align_fraction_query * 100.
                        )
                        .unwrap();
                    }
                }
            }
            writeln!(&mut af_file).unwrap();
        }

        info!("Aligned fraction matrix written to skani_matrix.af");
    } else {
        let ani_mat_file = file_name.to_string();
        let af_mat_file = format!("{}.af", file_name);
        let mut ani_file = BufWriter::new(File::create(ani_mat_file).expect(file_name));
        let mut af_file = BufWriter::new(File::create(af_mat_file).unwrap());
        writeln!(&mut ani_file, "{}", sketches.len()).unwrap();
        writeln!(&mut af_file, "{}", sketches.len()).unwrap();
        for i in 0..sketches.len() {
            let name;
            if use_contig_names {
                name = &sketches[i].contigs[0];
            } else {
                name = &sketches[i].file_name;
            }
            write!(&mut ani_file, "{}", name).unwrap();
            write!(&mut af_file, "{}", name).unwrap();
            let end = sketches.len();
            for j in 0..end {
                let full_cond = full_matrix || (i > j);
                if i == j {
                    if full_cond || diag {
                        write!(&mut ani_file, "\t{:.2}", perfect).unwrap();
                    }
                    write!(&mut af_file, "\t{:.2}", 100.).unwrap();
                    continue;
                }
                let x = usize::min(i, j);
                let y = usize::max(i, j);

                if !anis.contains_key(&x) || !anis[&x].contains_key(&y) {
                    if full_cond{
                        write!(&mut ani_file, "\t{:.2}", none).unwrap();
                    }
                    write!(&mut af_file, "\t{:.2}", 0.).unwrap();
                } else if anis[&x][&y].ani == -1. || anis[&x][&y].ani.is_nan() {
                    if full_cond{
                        write!(&mut ani_file, "\t{:.2}", none).unwrap();
                    }
                    write!(&mut af_file, "\t{:.2}", 0.).unwrap();
                } else {
                    if full_cond{
                        let val = anis[&x][&y].ani * 100.;
                        let ani_val = if !distance {val} else { 100. - val};
                        write!(&mut ani_file, "\t{:.2}", ani_val).unwrap();
                    }
                    if j > i {
                        write!(
                            &mut af_file,
                            "\t{:.2}",
                            anis[&x][&y].align_fraction_ref * 100.
                        )
                        .unwrap();
                    } else {
                        write!(
                            &mut af_file,
                            "\t{:.2}",
                            anis[&x][&y].align_fraction_query * 100.
                        )
                        .unwrap();
                    }
                }
            }
            writeln!(&mut ani_file).unwrap();
            writeln!(&mut af_file).unwrap();
        }

        info!(
            "Identity and align fraction matrix written to {} and {}.af",
            file_name, file_name
        );
    }
}

pub fn write_sparse_matrix(
    anis: &FxHashMap<usize, FxHashMap<usize, AniEstResult>>,
    sketches: &Vec<Sketch>,
    file_name: &str,
    aai: bool,
    est_ci: bool,
    detailed_out: bool,
    diag: bool,
    append: bool,
    short_header: bool,
) {
    let id_str = if aai { "AAI" } else { "ANI" };
    if file_name.is_empty() {
        let stdout = io::stdout();
        let mut handle = stdout.lock();
        if !append{
            write_header(&mut handle, id_str, est_ci, detailed_out);
        }
        //        write!(&mut handle,"Ref_file\tQuery_file\t{}\tAlign_fraction_ref\tAlign_fraction_query\t{}_95_percentile\t{}_5_percentile\tRef_name\tQuery_name\n", id_str, id_str, id_str).unwrap();
        if diag{
            for sketch in sketches.iter(){
                write_ani_res_perfect(&mut handle, sketch, est_ci, detailed_out, short_header);
            }
        }
        for i in anis.keys() {
            for (j, ani_res) in anis[i].iter() {
                if !(anis[i][j].ani == -1. || anis[i][j].ani.is_nan()) {
                    write_ani_res(&mut handle, ani_res, est_ci, detailed_out, short_header);
                }
            }
        }
    } else {
        let ani_mat_file = file_name.to_string();
        let mut ani_file;
        if append{
            let file = OpenOptions::new()
                .append(true)
                .create(true)
                .open(ani_mat_file).expect(file_name);
                ani_file = BufWriter::new(file);
        }
        else{
            ani_file = BufWriter::new(File::create(ani_mat_file).expect(file_name));
        }
        if !append{
            write_header(&mut ani_file, id_str, est_ci, detailed_out);
        }

        if diag{
            for sketch in sketches.iter(){
                write_ani_res_perfect(&mut ani_file, sketch, est_ci, detailed_out, short_header);
            }
        }

        for i in anis.keys() {
            if diag{
                write_ani_res_perfect(&mut ani_file, &sketches[*i], est_ci, detailed_out, short_header);
            }
            for (j, ani_res) in anis[i].iter() {
                if !(anis[i][j].ani == -1. || anis[i][j].ani.is_nan()) {
                    write_ani_res(&mut ani_file, ani_res, est_ci, detailed_out, short_header);
                }
            }
        }
    }
}

pub fn write_query_ref_list(
    anis: &Vec<AniEstResult>,
    file_name: &str,
    n: usize,
    aai: bool,
    est_ci: bool,
    detailed_out: bool,
    append: bool,
    short_header: bool,
) {
    let id_str = if aai { "AAI" } else { "ANI" };
    let mut query_file_result_map = FxHashMap::default();
    let out_file = file_name.to_string();

    for i in 0..anis.len() {
        if anis[i].ani < 0. || anis[i].ani.is_nan() {
            continue;
        }
        let _ani = if anis[i].ani < 0. {
            "NA".to_string()
        } else {
            format!("{}", anis[i].ani)
        };
        let results = query_file_result_map
            .entry(&anis[i].query_contig)
            .or_insert(vec![]);
        results.push(&anis[i]);
    }
    let mut sorted_keys = query_file_result_map.keys().collect::<Vec<&&String>>();
    sorted_keys.sort();

    if out_file.is_empty() {
        let stdout = io::stdout();
        let mut handle = stdout.lock();
        if !append{
            write_header(&mut handle, id_str, est_ci, detailed_out);
        }
        for key in sorted_keys {
            let mut anis = query_file_result_map[key].clone();

            anis.sort_by(|y, x| x.ani.partial_cmp(&y.ani).unwrap());
            for i in 0..usize::min(n, anis.len()) {
                write_ani_res(&mut handle, anis[i], est_ci, detailed_out, short_header);
            }
        }
    } else {
        let mut handle;
        if append{
            let file = OpenOptions::new()
                .append(true)
                .create(true)
                .open(out_file).expect(file_name);
            handle = BufWriter::new(file);
        }
        else{
            handle = BufWriter::new(File::create(out_file).expect(file_name));
        }

        if !append{
            write_header(&mut handle, id_str, est_ci, detailed_out);
        }
        for key in sorted_keys {
            let mut anis = query_file_result_map[key].clone();

            anis.sort_by(|y, x| x.ani.partial_cmp(&y.ani).unwrap());
            for i in 0..usize::min(n, anis.len()) {
                write_ani_res(&mut handle, anis[i], est_ci, detailed_out, short_header);
            }
        }
    }
}

pub fn sketches_from_sketch(ref_files: &Vec<String>) -> (SketchParams, Vec<Sketch>) {
    let ret_sketch_params: Mutex<SketchParams> = Mutex::new(SketchParams::default());
    let ret_ref_sketches: Mutex<Vec<Sketch>> = Mutex::new(vec![]);

    (0..ref_files.len())
        .collect::<Vec<usize>>()
        .into_par_iter()
        .for_each(|i| {
            let sketch_file = &ref_files[i];
            if !sketch_file.contains("markers.bin") {
                let f = File::open(sketch_file);
                if f.is_err() {
                    error!("Problem reading sketch file {}. Perhaps your file path is wrong? Exiting.", sketch_file);
                    std::process::exit(1)
                }
                let reader = BufReader::new(f.unwrap());
                let res: Result<(SketchParams, Sketch), _> = bincode::deserialize_from(reader);
                if res.is_ok() {
                    let (temp_sketch_param, temp_ref_sketch) = res.unwrap();
                    let mut locked = ret_sketch_params.lock().unwrap();
                    *locked = temp_sketch_param;
                    let mut locked = ret_ref_sketches.lock().unwrap();
                    locked.push(temp_ref_sketch);
                } else if sketch_file != "markers.bin" {
                    error!(
                        "{} is not a valid .sketch file or is corrupted. Skani v0.3+ is not compatible with older sketch files. 
                        Please re-run `skani sketch --separate-files` to generate new sketch files.",
                        sketch_file
                    );
                }
            }
        });

    let ret_sketch_params = ret_sketch_params.into_inner().unwrap();
    let mut ret_ref_sketches = ret_ref_sketches.into_inner().unwrap();

    ret_ref_sketches.sort_by(|x, y| x.file_name.cmp(&y.file_name));
    (ret_sketch_params, ret_ref_sketches)
}

pub fn marker_sketches_from_marker_file(marker_file: &str) -> (SketchParams, Vec<Sketch>) {
    let reader = BufReader::new(File::open(marker_file).unwrap());
    let res: Result<(SketchParams, Vec<Sketch>), _> = bincode::deserialize_from(reader);
    if res.is_ok() {
        res.unwrap()
    } else {
        error!("Problem reading {}. Exiting. ", marker_file);
        std::process::exit(1)
    }
}


================================================
FILE: src/lib.rs
================================================
pub mod types;
pub mod params;
pub mod chain;
pub mod file_io;
pub mod seeding;
pub mod screen;
pub mod search;
pub mod sketch;
pub mod dist;
pub mod triangle;
pub mod cmd_line;
pub mod model;
pub mod regression;

#[cfg(target_arch = "x86_64")]
pub mod avx2_seeding;
#[cfg(feature = "cli")]
pub mod parse;
#[cfg(feature = "cli")]
pub mod cli;
#[cfg(feature = "cli")]
pub mod sketch_db;


================================================
FILE: src/main.rs
================================================
use clap::Parser;
use std::env;
use skani::cli::{Cli, Commands};
use skani::dist;
use skani::parse;
use skani::search;
use skani::sketch;
use skani::triangle;

//Use this allocator when statically compiling
//instead of the default
//because the musl statically compiled binary
//uses a bad default allocator which makes the
//binary take 60% longer!!! Only affects
//static compilation though. 
#[cfg(target_env = "musl")]
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
    let cli = Cli::parse();

    let (sketch_params, command_params) = parse::parse_params_from_cli(&cli);

    let cmd_txt = env::args().into_iter().collect::<Vec<String>>().join(" ");
    let log_str = &cmd_txt[0..usize::min(cmd_txt.len(), 250)];
    if cmd_txt.len() > 250{
        log::info!("{} ...", log_str);
    }
    else{
        log::info!("{}", log_str);
    }

    match cli.command {
        Commands::Sketch(_) => {
            sketch::sketch(command_params, sketch_params);
        },
        Commands::Search(_) => {
            search::search(command_params);
        },
        Commands::Dist(_) => {
            dist::dist(command_params, sketch_params);
        },
        Commands::Triangle(_) => {
            triangle::triangle(command_params, sketch_params);
        },
    }
}


================================================
FILE: src/model.rs
================================================
pub const MODEL:&str = r#"
{"conf":{"feature_size":5,"max_depth":3,"iterations":195,"shrinkage":0.06,"feature_sample_ratio":1.0,"data_sample_ratio":1.0,"min_leaf_size":1,"loss":"LAD","debug":false,"initial_guess_enabled":false,"training_optimization_level":2},"trees":[{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.285,"pred":0.0,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":98.045,"pred":-0.7799988,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":97.895004,"pred":-1.159996,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-1.4799957,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.3199997,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":62615.0,"pred":-0.11999512,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.25,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.040000916,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.524994,"pred":0.7000046,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":31745.0,"pred":0.1000061,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.049995422,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.1800003,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.675,"pred":0.8800049,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.33000183,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":1.0100021,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.295,"pred":0.007598877,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":98.055,"pred":-0.69120026,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":97.825,"pred":-1.060791,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-1.5811996,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.3007965,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":38091.5,"pred":-0.10240173,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.2649994,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.025001526,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.535,"pred":0.6502075,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":31680.5,"pred":0.09919739,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.036994934,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.17919922,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.755005,"pred":0.82940674,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.34020233,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":1.0394058,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.255005,"pred":0.010902405,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":98.035,"pred":-0.6563263,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":97.665,"pred":-1.0063248,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.1263275,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.302742,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":65455.5,"pred":-0.12650299,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.24274445,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.026100159,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.494995,"pred":0.5770416,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":41107.5,"pred":0.065223694,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.054779053,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.14845276,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.785,"pred":0.73703766,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.3097992,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":1.0170441,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.325,"pred":0.015464783,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":98.085,"pred":-0.52497864,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":97.545,"pred":-0.80874634,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.4487457,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.2881775,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":38055.0,"pred":-0.061935425,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.21453094,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.044532776,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.565,"pred":0.5760193,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":24181.0,"pred":0.109550476,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.04336548,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.16986847,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.865005,"pred":0.73602295,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.34602356,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":1.0860214,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.315,"pred":0.019355774,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":98.035,"pred":-0.47862244,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":97.455,"pred":-0.79182434,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.561821,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.29087067,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":38055.0,"pred":-0.084602356,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.22509003,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.021820068,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.585,"pred":0.5252609,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":29462.0,"pred":0.108314514,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.02168274,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.1793518,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.915,"pred":0.69085693,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.34526062,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":1.1008606,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.335,"pred":0.021842957,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":98.104996,"pred":-0.4237442,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":97.295,"pred":-0.6138725,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.6881104,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.26833344,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":37588.0,"pred":-0.0394516,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.17816162,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.045562744,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.615005,"pred":0.49480438,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":27910.5,"pred":0.11891174,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.015045166,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.18891144,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.955,"pred":0.6548004,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.33973694,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":1.0948029,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.365005,"pred":0.02331543,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":98.115005,"pred":-0.3779068,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":3,"feature_value":74035.0,"pred":-0.54790497,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.8420105,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.23223114,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":31745.0,"pred":-0.02746582,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.16667938,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.04725647,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.645004,"pred":0.4691162,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":24179.5,"pred":0.13839722,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.0073394775,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.19935608,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.035,"pred":0.62911224,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.33935547,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":1.1291122,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.375,"pred":0.026603699,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":97.994995,"pred":-0.34321594,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":97.195,"pred":-0.64149475,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.642891,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.24397278,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":55297.0,"pred":-0.0569458,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.14874268,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.040534973,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.725006,"pred":0.4413681,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":24553.5,"pred":0.16117096,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.013755798,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.22529602,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.095,"pred":0.64136505,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.34623718,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":1.1313629,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.384995,"pred":0.030181885,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":97.634995,"pred":-0.3143158,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":96.975006,"pred":-1.4577255,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.7777252,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.31274414,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":4,"feature_value":7307.5,"pred":-0.10706329,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.22982025,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.014793396,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.805,"pred":0.4174118,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":24553.5,"pred":0.17563629,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.028968811,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.23466492,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":17550.5,"pred":0.6734848,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.6234894,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.6934891,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.244995,"pred":0.028289795,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.915,"pred":-0.36049652,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":96.815,"pred":-2.6610641,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.7310638,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.46765137,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":45758.0,"pred":-0.17895508,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.31808472,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.071006775,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.565,"pred":0.33187866,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":42891.5,"pred":0.053894043,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.026100159,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.11844635,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.165,"pred":0.46187592,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.27059174,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":1.0718765,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.405,"pred":0.02709961,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":97.015,"pred":-0.27383423,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":96.715,"pred":-2.4037857,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.6271973,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.42300415,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":52762.5,"pred":-0.12566376,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.2292862,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.026756287,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.845,"pred":0.37176514,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":17966.0,"pred":0.17192078,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.0027618408,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.21356201,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.285,"pred":0.62755585,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.30077362,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":1.0675583,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.225006,"pred":0.027778625,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.905,"pred":-0.33761597,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":96.455,"pred":-2.3495712,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.5995712,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.48420715,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":45244.5,"pred":-0.16559601,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.29193878,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.06882477,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.604996,"pred":0.2897873,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":42845.0,"pred":0.057777405,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.01890564,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.11808777,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":17749.5,"pred":0.42770386,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.34922028,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.44428253,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.195,"pred":0.024780273,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.895004,"pred":-0.33772278,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":96.375,"pred":-2.20018,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.4835892,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.48856354,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":37948.0,"pred":-0.16442108,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.30442047,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.07788086,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.625,"pred":0.2610016,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":37664.0,"pred":0.053497314,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.026268005,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.1102829,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.225006,"pred":0.41073608,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.2280426,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.946846,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.425,"pred":0.023918152,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":97.725006,"pred":-0.23320389,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":96.615005,"pred":-0.7259064,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.2145767,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.2632141,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":4,"feature_value":7125.5,"pred":-0.056159973,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.1565628,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.0038528442,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.985,"pred":0.31705475,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":24566.0,"pred":0.1617508,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.035308838,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.21040344,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.354996,"pred":0.68994904,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.26436615,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.9499512,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.195,"pred":0.023460388,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.875,"pred":-0.3065796,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":96.185,"pred":-1.9317017,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.3417053,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.41832733,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":37948.0,"pred":-0.14676666,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.27676392,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.06813431,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.715,"pred":0.2344284,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":42845.0,"pred":0.0634613,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.007789612,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.12145233,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.315,"pred":0.40303802,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.22019196,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.87303925,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.435,"pred":0.023712158,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.925,"pred":-0.20819092,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":96.145004,"pred":-1.7511978,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.2411957,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.38684845,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":34414.5,"pred":-0.08946228,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.19586945,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.028053284,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.255005,"pred":0.28818512,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":17966.0,"pred":0.16618347,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.03163147,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.20267487,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.435,"pred":0.80065155,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.29065704,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.86065674,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.175,"pred":0.023200989,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.535,"pred":-0.2893219,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":96.035,"pred":-1.8467331,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.2367325,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.4296875,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":45758.0,"pred":-0.14455414,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.25559998,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.060768127,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.835,"pred":0.20912933,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":21161.5,"pred":0.071754456,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.03816223,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.10769653,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.395004,"pred":0.41893005,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.21312714,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.79901886,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.455,"pred":0.022392273,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.945,"pred":-0.18534088,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":95.945,"pred":-1.4725266,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.20253,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.36095428,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":147196.0,"pred":-0.07585907,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.10622406,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.04598236,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.325,"pred":0.26772308,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":17966.0,"pred":0.15674591,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.03404236,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.18948364,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.494995,"pred":0.73098755,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.27526855,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.780983,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.155,"pred":0.022567749,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.405,"pred":-0.27445984,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":95.805,"pred":-1.6469727,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.2003784,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.42037964,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":31243.5,"pred":-0.13670349,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.2824173,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.067855835,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.985,"pred":0.18666458,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":17967.5,"pred":0.07635498,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.038002014,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.10610199,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.455,"pred":0.4649582,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.20896912,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.7241287,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.475006,"pred":0.021652222,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.875,"pred":-0.16542053,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":95.735,"pred":-1.268364,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.1183624,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.35406494,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.015,"pred":-0.06694794,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.14665222,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.000579834,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.325,"pred":0.24983215,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":33243.0,"pred":0.1437378,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.06717682,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.19382477,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.524994,"pred":0.6406784,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.25077057,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.6907654,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.485,"pred":0.021568298,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.524994,"pred":-0.15479279,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":95.545,"pred":-1.3312607,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.1512604,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.38282776,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":53061.5,"pred":-0.06665802,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.13290405,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.0021362305,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.375,"pred":0.23970795,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":24864.5,"pred":0.13816833,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.05544281,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.17816925,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.575,"pred":0.6093216,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.25923157,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.6692352,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.145004,"pred":0.021331787,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.274994,"pred":-0.24298859,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":95.415,"pred":-1.342186,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.1621933,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.39218903,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":25172.0,"pred":-0.12136841,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.28339386,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.06210327,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.994995,"pred":0.16252136,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":42891.5,"pred":0.06563568,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.007484436,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.10751343,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.485,"pred":0.3891678,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.18147278,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.59916687,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.505005,"pred":0.020706177,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.515,"pred":-0.13822174,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":95.315,"pred":-1.0963287,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.1724625,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.3552475,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":98.005005,"pred":-0.057739258,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.13598633,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.003326416,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.395004,"pred":0.22322083,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":33243.0,"pred":0.12782288,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.06301117,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.17105865,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.585,"pred":0.5432205,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.22473145,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.5932236,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.515,"pred":0.020698547,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.435,"pred":-0.12980652,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":95.205,"pred":-1.0186996,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.1821136,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.34114075,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":130487.5,"pred":-0.053749084,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.08656311,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.039657593,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.415,"pred":0.2148056,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":24602.0,"pred":0.12172699,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.048835754,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.15837097,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.604996,"pred":0.51754,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.22073364,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.55763245,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.185,"pred":0.020469666,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.005005,"pred":-0.20095825,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":95.035,"pred":-1.2411804,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.2911835,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.35118103,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":25172.0,"pred":-0.10155487,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.2474289,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.04636383,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.134995,"pred":0.14749146,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":17967.5,"pred":0.066604614,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.024589539,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.09044647,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.545,"pred":0.39408875,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.17260742,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.5141754,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.475006,"pred":0.019882202,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":96.415,"pred":-0.12538147,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":94.915,"pred":-0.8237076,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.3337097,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.31370544,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":21161.5,"pred":-0.052135468,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.16778564,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.0185318,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.465,"pred":0.18765259,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":42942.5,"pred":0.10682678,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.05983734,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.1499176,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.634995,"pred":0.46331787,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.19554901,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.5033188,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.134995,"pred":0.019851685,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":95.895004,"pred":-0.19548798,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":94.735,"pred":-1.0836868,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.480278,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.3189392,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":45758.0,"pred":-0.10134506,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.19054413,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.030082703,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.195,"pred":0.12967682,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":17967.5,"pred":0.060539246,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.021469116,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.08100128,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.585,"pred":0.36312103,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.16394043,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.46312714,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.555,"pred":0.019355774,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":97.075,"pred":-0.10277176,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":94.634995,"pred":-0.51394653,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.4648666,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.2625351,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":2,"feature_value":168903.5,"pred":-0.031913757,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.0723114,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.017433167,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.505005,"pred":0.18524933,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":33243.0,"pred":0.10633087,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.056526184,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.14215851,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.665,"pred":0.4153366,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.17401123,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.45533752,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.575,"pred":0.018989563,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":97.715,"pred":-0.09477997,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":94.425,"pred":-0.3035965,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.486969,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.192482,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":1,"feature_value":1.395,"pred":-0.0052871704,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.0377388,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.060432434,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.515,"pred":0.17843628,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":24602.0,"pred":0.10296631,"missing":0,"is_leaf":false},"index":9,"left":10,"right":11},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.04561615,"missing":0,"is_leaf":true},"index":10,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.13304901,"missing":0,"is_leaf":true},"index":11,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.675,"pred":0.39792633,"missing":0,"is_leaf":false},"index":12,"left":13,"right":14},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.16490173,"missing":0,"is_leaf":true},"index":13,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":0.42801666,"missing":0,"is_leaf":true},"index":14,"left":0,"right":0}]},"feature_size":5,"max_depth":3,"min_leaf_size":1,"loss":"LAD","feature_sample_ratio":1.0},{"tree":{"tree":[{"value":{"feature_index":0,"feature_value":98.075,"pred":0.019203186,"missing":0,"is_leaf":false},"index":0,"left":1,"right":8},{"value":{"feature_index":0,"feature_value":95.645004,"pred":-0.17971802,"missing":0,"is_leaf":false},"index":1,"left":2,"right":5},{"value":{"feature_index":0,"feature_value":94.115005,"pred":-0.8543472,"missing":0,"is_leaf":false},"index":2,"left":3,"right":4},{"value":{"feature_index":0,"feature_value":0.0,"pred":-2.4843445,"missing":0,"is_leaf":true},"index":3,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.25724792,"missing":0,"is_leaf":true},"index":4,"left":0,"right":0},{"value":{"feature_index":3,"feature_value":34423.0,"pred":-0.09993744,"missing":0,"is_leaf":false},"index":5,"left":6,"right":7},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.21213531,"missing":0,"is_leaf":true},"index":6,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":0.0,"pred":-0.03462982,"missing":0,"is_leaf":true},"index":7,"left":0,"right":0},{"value":{"feature_index":0,"feature_value":99.195,"pred":0.109184265,"missing":0,"is_leaf":false},"index":8,"left":9,"right":12},{"value":{"feature_index":3,"feature_value":15561.5,"pred":0.048980713,"missing":0,"is_leaf":false},"index":9,"lef

Download .txt

gitextract_6ays1fb4/

├── .github/
│   └── workflows/
│       └── release.yml
├── .gitignore
├── .gitmodules
├── CHANGELOG.md
├── Cargo.toml
├── LICENSE
├── README.md
├── model_to_src.sh
├── scripts/
│   ├── clustermap_triangle.py
│   └── pre_release.sh
├── skani_matrix.af
├── src/
│   ├── avx2_seeding.rs
│   ├── chain.rs
│   ├── cli.rs
│   ├── cmd_line.rs
│   ├── dist.rs
│   ├── file_io.rs
│   ├── lib.rs
│   ├── main.rs
│   ├── model.rs
│   ├── params.rs
│   ├── parse.rs
│   ├── regression.rs
│   ├── screen.rs
│   ├── search.rs
│   ├── seeding.rs
│   ├── sketch.rs
│   ├── sketch_db.rs
│   ├── triangle.rs
│   └── types.rs
├── test_files/
│   ├── GCF_005706655.1_ASM570665v1_genomic.fna
│   ├── GCF_005844845.1_ASM584484v1_genomic.fna
│   ├── MN-03.fa
│   ├── all_ns.fa
│   ├── e.coli-EC590.fasta
│   ├── e.coli-K12.fasta
│   ├── e.coli-W.fasta
│   ├── e.coli-h5.fasta
│   ├── e.coli-o157.fasta
│   ├── e.coli-o157.fasta.sketch
│   ├── empty_fasta.fa
│   ├── list.txt
│   ├── o157_plasmid.fasta
│   ├── o157_reads.fastq
│   ├── query_list.txt
│   ├── skani_matrix.af
│   ├── test.fasta
│   └── viruses.fna
├── test_results_versions/
│   ├── 0.2.1
│   ├── 0.2.2
│   ├── 0.3.0
│   └── v0.2.1
└── tests/
    ├── int_test_new.rs
    ├── integration_test.rs
    ├── results/
    │   ├── output
    │   ├── output.af
    │   ├── output_o_triangle_full
    │   ├── output_o_triangle_full.af
    │   ├── test_dist_file.txt
    │   ├── test_sketch_dir/
    │   │   ├── e.coli-EC590.fasta.sketch
    │   │   ├── e.coli-K12.fasta.sketch
    │   │   ├── e.coli-W.fasta.gz.sketch
    │   │   ├── e.coli-W.fasta.sketch
    │   │   ├── e.coli-h5.fasta.sketch
    │   │   ├── e.coli-o157.fasta.sketch
    │   │   ├── o157_plasmid.fasta.sketch
    │   │   └── o157_reads.fastq.sketch
    │   ├── test_sketch_dir1/
    │   │   ├── e.coli-EC590.fasta.sketch
    │   │   ├── e.coli-K12.fasta.sketch
    │   │   ├── e.coli-W.fasta.gz.sketch
    │   │   └── o157_reads.fastq.sketch
    │   ├── test_sketch_dir3/
    │   │   ├── e.coli-EC590.fasta.sketch
    │   │   ├── e.coli-K12.fasta.sketch
    │   │   ├── e.coli-W.fasta.gz.sketch
    │   │   ├── e.coli-W.fasta.sketch
    │   │   ├── e.coli-h5.fasta.sketch
    │   │   ├── e.coli-o157.fasta.sketch
    │   │   ├── o157_plasmid.fasta.sketch
    │   │   └── o157_reads.fastq.sketch
    │   └── test_sketch_dir_aai/
    │       ├── e.coli-EC590.fasta.sketch
    │       ├── e.coli-K12.fasta.sketch
    │       ├── e.coli-W.fasta.gz.sketch
    │       ├── e.coli-W.fasta.sketch
    │       ├── e.coli-h5.fasta.sketch
    │       ├── e.coli-o157.fasta.sketch
    │       ├── o157_plasmid.fasta.sketch
    │       └── o157_reads.fastq.sketch
    └── tests.rs

Download .txt

SYMBOL INDEX (280 symbols across 21 files)

FILE: src/avx2_seeding.rs
  function mm_hash256 (line 7) | pub unsafe fn mm_hash256(kmer: __m256i) -> __m256i {
  function avx2_fmh_seeds (line 33) | pub unsafe fn avx2_fmh_seeds(

FILE: src/chain.rs
  function switch_qr (line 15) | fn switch_qr(med_ctg_len_r: f64, med_ctg_len_q: f64, q_sk_len: f64,r_sk_...
  function mean (line 28) | fn mean(data: &[f64]) -> Option<f64> {
  function std_deviation (line 39) | fn std_deviation(data: &[f64]) -> f64 {
  function bootstrap_interval (line 57) | fn bootstrap_interval(ani_ests: &Vec<(f64,usize)>) -> (f64,f64,f64){
  function map_params_from_sketch (line 88) | pub fn map_params_from_sketch <'a>(
  function chain_seeds (line 144) | pub fn chain_seeds(
  function calculate_ani (line 173) | fn calculate_ani(
  function score_anchors (line 558) | pub fn score_anchors(anchor_curr: &Anchor, anchor_past: &Anchor, map_par...
  function get_anchors (line 608) | fn get_anchors(
  function chain_anchors_ani (line 838) | fn chain_anchors_ani(anchor_chunks: &AnchorChunks, map_params: &MapParam...
  function get_chain_intervals (line 939) | fn get_chain_intervals(
  function get_nonoverlapping_chains (line 1008) | fn get_nonoverlapping_chains(

FILE: src/cli.rs
  type Cli (line 10) | pub struct Cli {
  type Commands (line 16) | pub enum Commands {
  type SketchArgs (line 39) | pub struct SketchArgs {
  type DistArgs (line 106) | pub struct DistArgs {
  type TriangleArgs (line 241) | pub struct TriangleArgs {
  type SearchArgs (line 364) | pub struct SearchArgs {

FILE: src/cmd_line.rs
  constant MIN_ALIGN_FRAC (line 1) | pub const MIN_ALIGN_FRAC: &str = "min aligned frac";
  constant CMD_MIN_ALIGN_FRAC (line 2) | pub const CMD_MIN_ALIGN_FRAC: &str = "min-af";
  constant H_MIN_ALIGN_FRAC (line 3) | pub const H_MIN_ALIGN_FRAC: &str = "Only output ANI values where one gen...
  constant IND_CTG_QRY (line 5) | pub const IND_CTG_QRY: &str = "individual contig query";
  constant CMD_IND_CTG_QRY (line 6) | pub const CMD_IND_CTG_QRY: &str = "qi";
  constant H_IND_CTG_QRY (line 7) | pub const H_IND_CTG_QRY: &str = "Use individual sequences for the QUERY ...
  constant IND_CTG_REF (line 9) | pub const IND_CTG_REF: &str = "individual contig ref";
  constant CMD_IND_CTG_REF (line 10) | pub const CMD_IND_CTG_REF: &str = "ri";
  constant H_IND_CTG_REF (line 11) | pub const H_IND_CTG_REF: &str = "Use individual sequences for the REFERE...
  constant NO_FULL_INDEX (line 13) | pub const NO_FULL_INDEX: &str = "no marker index";
  constant CMD_NO_FULL_INDEX (line 14) | pub const CMD_NO_FULL_INDEX: &str = "no-marker-index";
  constant H_NO_FULL_INDEX (line 15) | pub const H_NO_FULL_INDEX: &str = "Do not use hash-table inverted index ...
  constant ROBUST (line 17) | pub const ROBUST: &str = "robust";
  constant CMD_ROBUST (line 18) | pub const CMD_ROBUST: &str = "robust";
  constant H_ROBUST (line 19) | pub const H_ROBUST: &str = "Estimate mean after trimming off 10%/90% qua...
  constant FULL_MAT (line 21) | pub const FULL_MAT: &str = "full-matrix";
  constant CMD_FULL_MAT (line 22) | pub const CMD_FULL_MAT: &str = "full-matrix";
  constant H_FULL_MAT (line 23) | pub const H_FULL_MAT: &str = "Output full matrix instead of lower-triang...
  constant KEEP_REFS (line 25) | pub const KEEP_REFS: &str = "keep-refs";
  constant CMD_KEEP_REFS (line 26) | pub const CMD_KEEP_REFS: &str = "keep-refs";
  constant H_KEEP_REFS (line 27) | pub const H_KEEP_REFS: &str = "Keep reference sketches in memory if the ...
  constant C_FACTOR (line 29) | pub const C_FACTOR: &str = "c";
  constant CMD_C_FACTOR (line 30) | pub const CMD_C_FACTOR: &str = "c";
  constant H_C_FACTOR (line 31) | pub const H_C_FACTOR: &str = "Compression factor (k-mer subsampling rate...
  constant H_SCREEN (line 33) | pub const H_SCREEN: &str = "Screen out pairs with *approximately* < % id...
  constant CONF_INTERVAL (line 35) | pub const CONF_INTERVAL: &str = "ci";
  constant CMD_CONF_INTERVAL (line 36) | pub const CMD_CONF_INTERVAL: &str = "ci";
  constant H_CONF_INTERVAL (line 37) | pub const H_CONF_INTERVAL: &str = "Output [5%,95%] ANI confidence interv...
  constant H_CONF_INTERVAL_TRI (line 38) | pub const H_CONF_INTERVAL_TRI: &str = "Output [5%,95%] ANI confidence in...
  constant NO_LEARNED_ANI (line 40) | pub const NO_LEARNED_ANI: &str = "no-learned-ani";
  constant CMD_NO_LEARNED_ANI (line 41) | pub const CMD_NO_LEARNED_ANI : &str = "no-learned-ani";
  constant H_NO_LEARNED_ANI (line 42) | pub const H_NO_LEARNED_ANI: &str = "Disable regression model for ANI pre...
  constant MODE_SLOW (line 44) | pub const MODE_SLOW: &str = "slow";
  constant CMD_MODE_SLOW (line 45) | pub const CMD_MODE_SLOW : &str = "slow";
  constant H_MODE_SLOW (line 46) | pub const H_MODE_SLOW : &str = "Slower skani mode; 4x slower and more me...
  constant MODE_SMALL_GENOMES (line 48) | pub const MODE_SMALL_GENOMES: &str = "small-genomes";
  constant CMD_MODE_SMALL_GENOMES (line 49) | pub const CMD_MODE_SMALL_GENOMES: &str = "small-genomes";
  constant H_MODE_SMALL_GENOMES (line 50) | pub const H_MODE_SMALL_GENOMES : &str = "Mode for small genomes such as ...
  constant MODE_FAST (line 52) | pub const MODE_FAST: &str = "fast";
  constant CMD_MODE_FAST (line 53) | pub const CMD_MODE_FAST : &str = "fast";
  constant H_MODE_FAST (line 54) | pub const H_MODE_FAST : &str = "Faster skani mode; 2x faster and less me...
  constant MODE_MEDIUM (line 56) | pub const MODE_MEDIUM: &str = "medium";
  constant CMD_MODE_MEDIUM (line 57) | pub const CMD_MODE_MEDIUM : &str = "medium";
  constant H_MODE_MEDIUM (line 58) | pub const H_MODE_MEDIUM: &str = "Medium skani mode; 2x slower and more m...
  constant MARKER_C (line 60) | pub const MARKER_C: &str = "marker_c";
  constant CMD_MARKER_C (line 61) | pub const CMD_MARKER_C: char = 'm';
  constant H_MARKER_C (line 62) | pub const H_MARKER_C: &str = "Marker k-mer compression factor. Markers a...
  constant DETAIL_OUT (line 64) | pub const DETAIL_OUT: &str = "detailed";
  constant CMD_DETAIL_OUT (line 65) | pub const CMD_DETAIL_OUT: &str = "detailed";
  constant H_DETAIL_OUT (line 66) | pub const H_DETAIL_OUT: &str = "Print additional info including contig N...
  constant DISTANCE_OUT (line 68) | pub const DISTANCE_OUT: &str = "distance";
  constant CMD_DISTANCE_OUT (line 69) | pub const CMD_DISTANCE_OUT: &str = "distance";
  constant H_DISTANCE_OUT (line 70) | pub const H_DISTANCE_OUT: &str = "Output 100 - ANI instead of ANI, creat...
  constant INT_WRITE (line 72) | pub const INT_WRITE: &str = "intermediate write count";
  constant CMD_INT_WRITE (line 73) | pub const CMD_INT_WRITE: &str = "inter-write";
  constant H_INT_WRITE (line 74) | pub const H_INT_WRITE: &str = "Write results to output after --inter-wri...
  constant FAST_SMALL (line 76) | pub const FAST_SMALL: &str = "faster-small";
  constant CMD_FAST_SMALL (line 77) | pub const CMD_FAST_SMALL: &str = "faster-small";
  constant H_FAST_SMALL (line 78) | pub const H_FAST_SMALL: &str = "Filter genomes with < 20 marker k-mers m...
  constant DIAG (line 80) | pub const DIAG: &str = "diagonal";
  constant CMD_DIAG (line 81) | pub const CMD_DIAG: &str = "diagonal";
  constant H_DIAG (line 82) | pub const H_DIAG: &str = "Output the diagonal of the ANI matrix (i.e. se...

FILE: src/dist.rs
  function dist (line 12) | pub fn dist(command_params: CommandParams, mut sketch_params: SketchPara...

FILE: src/file_io.rs
  function write_header (line 15) | fn write_header(writer: &mut impl Write, id_str: &str, ci: bool, verbose...
  function write_ani_res_perfect (line 25) | fn write_ani_res_perfect(writer: &mut impl Write, sketch: &Sketch, ci: b...
  function write_ani_res (line 83) | fn write_ani_res(writer: &mut impl Write, ani_res: &AniEstResult, ci: bo...
  function fastx_to_sketches (line 141) | pub fn fastx_to_sketches(
  function fastx_to_multiple_sketch_rewrite (line 253) | pub fn fastx_to_multiple_sketch_rewrite(
  function write_phyllip_matrix (line 364) | pub fn write_phyllip_matrix(
  function write_sparse_matrix (line 541) | pub fn write_sparse_matrix(
  function write_query_ref_list (line 608) | pub fn write_query_ref_list(
  function sketches_from_sketch (line 680) | pub fn sketches_from_sketch(ref_files: &Vec<String>) -> (SketchParams, V...
  function marker_sketches_from_marker_file (line 720) | pub fn marker_sketches_from_marker_file(marker_file: &str) -> (SketchPar...

FILE: src/main.rs
  function main (line 20) | fn main() {

FILE: src/model.rs
  constant MODEL (line 1) | pub const MODEL:&str = r#"
  constant MODEL_C200 (line 4) | pub const MODEL_C200:&str = r#"

FILE: src/params.rs
  constant GB_IN_BYTES (line 4) | pub const GB_IN_BYTES: usize = 1_073_741_824;
  constant SMALL_VEC_SIZE (line 5) | pub const SMALL_VEC_SIZE: usize = 1;
  constant KMER_SK_SMALL_VEC_SIZE (line 6) | pub const KMER_SK_SMALL_VEC_SIZE: usize = 3;
  constant INTERMEDIATE_WRITE_COUNT (line 7) | pub const INTERMEDIATE_WRITE_COUNT: usize = 5000;
  constant D_FRAGMENT_LENGTH (line 11) | pub const D_FRAGMENT_LENGTH: usize = 200000;
  constant STOP_CODON (line 12) | pub const STOP_CODON: MarkerBits = 21;
  constant DEFAULT_C (line 13) | pub const DEFAULT_C: &str = "125";
  constant DEFAULT_C_AAI (line 14) | pub const DEFAULT_C_AAI: &str = "15";
  constant DEFAULT_K (line 15) | pub const DEFAULT_K: &str = "15";
  constant DEFAULT_K_AAI (line 16) | pub const DEFAULT_K_AAI: &str = "6";
  constant D_MAX_GAP_LENGTH (line 17) | pub const D_MAX_GAP_LENGTH: f64 = 300.;
  constant D_MAX_GAP_LENGTH_AAI (line 18) | pub const D_MAX_GAP_LENGTH_AAI: f64 = 50.;
  constant D_MAX_LIN_LENGTH (line 19) | pub const D_MAX_LIN_LENGTH: f64 = 5000.;
  constant D_ANCHOR_SCORE_ANI (line 20) | pub const D_ANCHOR_SCORE_ANI: f64 = 20.;
  constant D_ANCHOR_SCORE_AAI (line 21) | pub const D_ANCHOR_SCORE_AAI: f64 = 20.;
  constant D_MIN_ANCHORS_ANI (line 22) | pub const D_MIN_ANCHORS_ANI: usize = 3;
  constant D_MIN_ANCHORS_AAI (line 23) | pub const D_MIN_ANCHORS_AAI: usize = 5;
  constant D_LENGTH_CUTOFF (line 24) | pub const D_LENGTH_CUTOFF: usize = D_FRAGMENT_LENGTH;
  constant D_FRAC_COVER_CUTOFF (line 25) | pub const D_FRAC_COVER_CUTOFF: &str = "15";
  constant D_ANI_AND_COVER_CUTOFF (line 26) | pub const D_ANI_AND_COVER_CUTOFF: f64 = 0.95;
  constant D_FRAC_COVER_CUTOFF_AA (line 27) | pub const D_FRAC_COVER_CUTOFF_AA: &str = "5";
  constant ORF_SIZE (line 30) | pub const ORF_SIZE: usize = 30;
  constant MARKER_C_DEFAULT (line 31) | pub const MARKER_C_DEFAULT: &str = "1000";
  constant K_MARKER_AA (line 32) | pub const K_MARKER_AA: usize = 10;
  constant K_MARKER_DNA (line 33) | pub const K_MARKER_DNA: usize = 21;
  constant SEARCH_STRING (line 34) | pub const SEARCH_STRING: &str = "search";
  constant DIST_STRING (line 35) | pub const DIST_STRING: &str = "dist";
  constant SKETCH_STRING (line 36) | pub const SKETCH_STRING: &str = "sketch";
  constant TRIANGLE_STRING (line 37) | pub const TRIANGLE_STRING: &str = "triangle";
  constant CHUNK_SIZE_DNA (line 38) | pub const CHUNK_SIZE_DNA: usize = 20000;
  constant CHUNK_SIZE_AA (line 39) | pub const CHUNK_SIZE_AA: usize = 20000;
  constant MIN_LENGTH_CONTIG (line 40) | pub const MIN_LENGTH_CONTIG: usize = 500;
  constant MIN_LENGTH_COVER_AAI (line 41) | pub const MIN_LENGTH_COVER_AAI: usize = 500;
  constant MIN_LENGTH_COVER (line 42) | pub const MIN_LENGTH_COVER: usize = 500;
  constant BP_CHAIN_BAND (line 43) | pub const BP_CHAIN_BAND: usize = 2500;
  constant BP_CHAIN_BAND_AAI (line 44) | pub const BP_CHAIN_BAND_AAI: usize = 500;
  constant SEARCH_AAI_CUTOFF_DEFAULT (line 45) | pub const SEARCH_AAI_CUTOFF_DEFAULT: f64 = 0.60;
  constant SEARCH_ANI_CUTOFF_DEFAULT (line 46) | pub const SEARCH_ANI_CUTOFF_DEFAULT: f64 = 0.80;
  constant SCREEN_MINIMUM_KMERS (line 47) | pub const SCREEN_MINIMUM_KMERS: usize = 20;
  constant FULL_INDEX_THRESH (line 48) | pub const FULL_INDEX_THRESH: usize = 50;
  constant REPET_KMER_THRESHOLD (line 49) | pub const REPET_KMER_THRESHOLD: usize = 8_000_000;
  constant OVERLAP_ORTHOLOGOUS_FRACTION (line 50) | pub const OVERLAP_ORTHOLOGOUS_FRACTION: f32  = 0.50;
  constant TOTAL_BASES_REGRESS_CUTOFF (line 51) | pub const TOTAL_BASES_REGRESS_CUTOFF: usize = 150000;
  constant LEARNED_INFO_HELP (line 52) | pub const LEARNED_INFO_HELP: &str = "Learned ANI mode detected. ANI may ...
  constant FAST_C (line 54) | pub const FAST_C: usize = 200;
  constant SLOW_C (line 55) | pub const SLOW_C: usize = 30;
  constant MEDIUM_C (line 56) | pub const MEDIUM_C: usize = 70;
  constant SMALL_M (line 57) | pub const SMALL_M: usize = 200;
  constant ASCII_N (line 59) | pub const ASCII_N: usize = 78;
  constant ASCII_N_SMALL (line 60) | pub const ASCII_N_SMALL: usize = 110;
  type Mode (line 65) | pub enum Mode {
  type MapParams (line 73) | pub struct MapParams<'a> {
  type CommandParams (line 94) | pub struct CommandParams{
  function fragment_length_formula (line 123) | pub fn fragment_length_formula(_n: usize, aa: bool) -> usize {
  type SketchParams (line 135) | pub struct SketchParams {
    method new (line 147) | pub fn new(marker_c: usize, c: usize, k: usize, use_syncs: bool, use_a...

FILE: src/parse.rs
  function parse_params (line 12) | pub fn parse_params(matches: &ArgMatches) -> (SketchParams, CommandParam...
  function parse_params_search (line 380) | pub fn parse_params_search(matches_subc: &ArgMatches) -> (SketchParams, ...
  function parse_params_from_cli (line 502) | pub fn parse_params_from_cli(cli: &Cli) -> (SketchParams, CommandParams) {
  function setup_logging_and_threads (line 511) | fn setup_logging_and_threads(threads: &str, debug: bool, trace: bool) {
  function parse_sketch_args (line 527) | fn parse_sketch_args(args: &SketchArgs) -> (SketchParams, CommandParams) {
  function parse_dist_args (line 628) | fn parse_dist_args(args: &DistArgs) -> (SketchParams, CommandParams) {
  function parse_triangle_args (line 790) | fn parse_triangle_args(args: &TriangleArgs) -> (SketchParams, CommandPar...
  function parse_search_args (line 923) | fn parse_search_args(args: &SearchArgs) -> (SketchParams, CommandParams) {
  function read_file_list (line 1009) | fn read_file_list(file_path: &str) -> Vec<String> {

FILE: src/regression.rs
  function use_learned_ani (line 8) | pub fn use_learned_ani(c: usize, individual_contig_q: bool, individual_c...
  function get_model (line 12) | pub fn get_model(c: usize, learned_ani: bool) -> Option<GBDT>{
  function predict_from_ani_res (line 30) | pub fn predict_from_ani_res(ani_res: &mut AniEstResult, model: &GBDT) {

FILE: src/screen.rs
  function check_small_contigs (line 10) | pub fn check_small_contigs(ref_sketches: &Vec<Sketch>, query_sketches: &...
  function screen_refs_indices (line 39) | pub fn screen_refs_indices(
  function check_markers_quickly (line 84) | pub fn check_markers_quickly(
  function screen_refs (line 148) | pub fn screen_refs(
  function kmer_to_sketch_from_refs (line 190) | pub fn kmer_to_sketch_from_refs(ref_sketches: &Vec<Sketch>) -> KmerToSke...

FILE: src/search.rs
  function search (line 16) | pub fn search(command_params: CommandParams) {

FILE: src/seeding.rs
  function _position_min (line 6) | fn _position_min<T: Ord>(slice: &[T]) -> Option<usize> {
  function get_nonoverlap_orf (line 14) | pub fn get_nonoverlap_orf(sorted_orfs: Vec<Orf>) -> Vec<Orf> {
  function get_orfs (line 55) | pub fn get_orfs(string: &[u8], sketch_params: &SketchParams) -> Vec<Orf> {
  function fmh_seeds_aa_with_orf (line 114) | pub fn fmh_seeds_aa_with_orf(
  function fmh_seeds (line 225) | pub fn fmh_seeds(
  function get_repetitive_kmers (line 328) | pub fn get_repetitive_kmers(kmer_seeds: &Option<KmerSeeds>, sketch: &Ske...

FILE: src/sketch.rs
  function sketch (line 15) | pub fn sketch(command_params: CommandParams, sketch_params: SketchParams) {
  function sketch_separate_files (line 38) | fn sketch_separate_files(command_params: CommandParams, sketch_params: S...
  function sketch_consolidated_db (line 105) | fn sketch_consolidated_db(command_params: CommandParams, sketch_params: ...

FILE: src/sketch_db.rs
  type IndexEntry (line 11) | pub struct IndexEntry {
  type SketchDbWriter (line 18) | pub struct SketchDbWriter {
    method new (line 32) | pub fn new(output_dir: &str) -> Result<Self, Box<dyn std::error::Error...
    method add_sketch (line 45) | pub fn add_sketch(&mut self, sketch_params: &SketchParams, sketch: &Sk...
    method finalize (line 67) | pub fn finalize(mut self, output_dir: &str) -> Result<(), Box<dyn std:...
  type SketchDbReader (line 25) | pub struct SketchDbReader {
    method new (line 86) | pub fn new(database_dir: &str) -> Result<Self, Box<dyn std::error::Err...
    method get_sketch (line 108) | pub fn get_sketch(&self, index: usize) -> Result<(SketchParams, Sketch...
    method sketch_count (line 126) | pub fn sketch_count(&self) -> usize {
    method len (line 131) | pub fn len(&self) -> usize {
    method is_empty (line 136) | pub fn is_empty(&self) -> bool {
  function is_consolidated_db (line 142) | pub fn is_consolidated_db(database_dir: &str) -> bool {
  function has_separate_sketches (line 149) | pub fn has_separate_sketches(database_dir: &str) -> bool {

FILE: src/triangle.rs
  function triangle (line 13) | pub fn triangle(command_params: CommandParams, mut sketch_params: Sketch...

FILE: src/types.rs
  constant DNA_TO_AA (line 27) | pub const  DNA_TO_AA: [u8; 64] =
  constant BYTE_TO_SEQ (line 40) | pub const BYTE_TO_SEQ: [MarkerBits; 256] = [
  type GnPosition (line 52) | pub type GnPosition = u32;
  type ContigIndex (line 53) | pub type ContigIndex = u32;
  type MarkerBits (line 55) | pub type MarkerBits = u64;
  type SeedBits (line 56) | pub type SeedBits = u32;
  type KmerToSketch (line 57) | pub type KmerToSketch = MMHashMap<MarkerBits, SmallVec<[u32; KMER_SK_SMA...
  type KmerSeeds (line 59) | pub type KmerSeeds = MMHashMap32<SeedBits, u64>;
  type MultiPositionStorage (line 60) | pub type MultiPositionStorage = Vec<SmallVec<[SeedPosition; 3]>>;
  type MMBuildHasher (line 65) | pub type MMBuildHasher = BuildHasherDefault<MMHasher>;
  type MMBuildHasher32 (line 66) | pub type MMBuildHasher32 = BuildHasherDefault<MMHasher32>;
  type MMHashMap (line 67) | pub type MMHashMap<K, V> = HashMap<K, V, MMBuildHasher>;
  type MMHashMap32 (line 68) | pub type MMHashMap32<K, V> = HashMap<K, V, MMBuildHasher32>;
  type MMHashSet (line 69) | pub type MMHashSet<K> = HashSet<K, MMBuildHasher>;
  function mm_hashi64 (line 73) | pub fn mm_hashi64(kmer: i64) -> i64 {
  function mm_hash64 (line 86) | pub fn mm_hash64(kmer: u64) -> u64 {
  function mm_hash_bytes_32 (line 99) | pub fn mm_hash_bytes_32(bytes: &[u8]) -> usize {
  function mm_hash (line 112) | pub fn mm_hash(bytes: &[u8]) -> usize {
  type SeedPosition (line 125) | pub struct SeedPosition{
    method new (line 135) | pub fn new(pos: GnPosition, contig_index: ContigIndex, canonical: bool...
    method canonical (line 147) | pub fn canonical(&self) -> bool {
    method contig_index (line 153) | pub fn contig_index(&self) -> ContigIndex {
    method set_canonical (line 159) | pub fn set_canonical(&mut self, canonical: bool) {
    method set_contig_index (line 169) | pub fn set_contig_index(&mut self, contig_index: ContigIndex) {
    method pack_to_u64 (line 177) | pub fn pack_to_u64(&self) -> u64 {
    method unpack_from_u64 (line 184) | pub fn unpack_from_u64(packed: u64) -> Self {
  type ContigIndexCanonical (line 131) | pub type ContigIndexCanonical = u32;
  function truncate_contig_name (line 197) | pub fn truncate_contig_name(name: &str, short_header: bool) -> String {
  type TaggedIndex (line 207) | pub struct TaggedIndex;
    constant SINGLE_BIT (line 210) | const SINGLE_BIT: u64 = 1;
    method single (line 214) | pub fn single(seed_position: &SeedPosition) -> u64 {
    method multiple (line 220) | pub fn multiple(storage_index: usize) -> u64 {
    method is_single (line 226) | pub fn is_single(tagged_index: u64) -> bool {
    method get_single (line 232) | pub fn get_single(tagged_index: u64) -> SeedPosition {
    method get_storage_index (line 240) | pub fn get_storage_index(tagged_index: u64) -> usize {
  type Sketch (line 253) | pub struct Sketch {
    method add_seed_position (line 281) | pub fn add_seed_position(&mut self, seed: SeedBits, position: SeedPosi...
    method get_seed_positions (line 307) | pub fn get_seed_positions(&self, seed: SeedBits) -> Cow<[SeedPosition]> {
    method get_markers_only (line 322) | pub fn get_markers_only(sketch: &Sketch) -> Sketch{
    method new (line 342) | pub fn new(marker_c: usize, c: usize, k: usize, file_name: String, ami...
  method partial_cmp (line 355) | fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
  method cmp (line 361) | fn cmp(&self, other: &Self) -> Ordering {
  method hash (line 368) | fn hash<H: Hasher>(&self, state: &mut H) {
  method default (line 373) | fn default() -> Self {
  type MMHasher32 (line 394) | pub struct MMHasher32 {
  method write (line 400) | fn write(&mut self, bytes: &[u8]) {
  method finish (line 404) | fn finish(&self) -> u64 {
  type MMHasher (line 414) | pub struct MMHasher {
  method write (line 420) | fn write(&mut self, bytes: &[u8]) {
  method finish (line 424) | fn finish(&self) -> u64 {
  type KmerEnc (line 432) | pub struct KmerEnc {
    method decode (line 450) | pub fn decode(byte: u64) -> u8 {
    method print_string (line 464) | pub fn print_string(kmer: u64, k: usize) {
    method print_string_aa (line 475) | pub fn print_string_aa(kmer: u64, k: usize, sketch_params: &SketchPara...
  method eq (line 443) | fn eq(&self, other: &Self) -> bool {
  type ChainingResult (line 488) | pub struct ChainingResult {
  type ChainingResultANI (line 495) | pub struct ChainingResultANI {
  type Anchor (line 500) | pub struct Anchor {
    method new (line 530) | pub fn new(
  type ChainInterval (line 509) | pub struct ChainInterval {
    method query_range_len (line 521) | pub fn query_range_len(&self) -> GnPosition {
    method ref_range_len (line 524) | pub fn ref_range_len(&self) -> GnPosition {
  type AnchorChunks (line 546) | pub struct AnchorChunks {
  type Orf (line 553) | pub struct Orf{
  type AniEstResult (line 560) | pub struct AniEstResult{

FILE: tests/int_test_new.rs
  type AniResult (line 10) | pub struct AniResult{
  function run_skani (line 20) | fn run_skani<'a>(args: &'a [&str], stderr: bool) -> String{
  function get_result_from_out (line 35) | fn get_result_from_out(tsv_res : &str) -> Vec<AniResult>{
  function fast_test_small_genomes (line 57) | fn fast_test_small_genomes() {
  function test_diag_triangle (line 90) | fn test_diag_triangle(){
  function fast_test_screen (line 102) | fn fast_test_screen(){
  function fast_show_degenerate_inputs (line 136) | fn fast_show_degenerate_inputs(){

FILE: tests/integration_test.rs
  function full_test_sketch_and_search (line 7) | fn full_test_sketch_and_search() {
  function full_test_dist (line 176) | fn full_test_dist() {
  function full_test_triangle (line 457) | fn full_test_triangle() {
  function test_consolidated_database_functionality (line 599) | fn test_consolidated_database_functionality() {
  function test_consolidated_database_multiple_files (line 704) | fn test_consolidated_database_multiple_files() {
  function test_short_header_functionality (line 765) | fn test_short_header_functionality() {
  function test_individual_contigs_with_search (line 893) | fn test_individual_contigs_with_search() {
  function test_sketch_search_individual_contigs_matches_dist (line 981) | fn test_sketch_search_individual_contigs_matches_dist() {
  function test_both_min_af_functionality (line 1107) | fn test_both_min_af_functionality() {

FILE: tests/tests.rs
  function default_params (line 8) | fn default_params(mode: Mode) -> (CommandParams, SketchParams) {
  function fast_ecoli_test_simple (line 43) | fn fast_ecoli_test_simple() {
  function fast_ecoli_plasmid_test (line 63) | fn fast_ecoli_plasmid_test() {
  function fast_eukaryote_test (line 83) | fn fast_eukaryote_test() {
  function fast_avx2_vs_normal_code (line 131) | fn fast_avx2_vs_normal_code(){
  function fast_NNN_test_code (line 150) | fn fast_NNN_test_code(){

Copy disabled (too large) Download .json

Condensed preview — 88 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (44,754K chars).

[
  {
    "path": ".github/workflows/release.yml",
    "chars": 715,
    "preview": "name: \"tagged-release\"\n\non:\n  workflow_dispatch:\n  push:\n    tags:\n      - \"v*\"\n\njobs:\n  tagged-release:\n    name: \"Tagg"
  },
  {
    "path": ".gitignore",
    "chars": 35,
    "preview": "/target\n/refs\ncallgrind*\nCLAUDE.md\n"
  },
  {
    "path": ".gitmodules",
    "chars": 119,
    "preview": "[submodule \"skani-mummer-train\"]\n\tpath = skani-mummer-train\n\turl = https://github.com/bluenote-1577/skani-mummer-train\n"
  },
  {
    "path": "CHANGELOG.md",
    "chars": 6459,
    "preview": "### v0.3.1 - 2025-10-11\n\n### Minor\n* Fixed `--both-min-af` bug in `skani search`\n\n### v0.3.0 released - 2025-08 (Breakin"
  },
  {
    "path": "Cargo.toml",
    "chars": 1699,
    "preview": "[package]\nname = \"skani\"\n###Make sure to change version in main.rs after changing cargo.toml\nversion = \"0.3.1\"\n####\nedit"
  },
  {
    "path": "LICENSE",
    "chars": 1065,
    "preview": "MIT License\n\nCopyright (c) 2022 Jim Shaw\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\no"
  },
  {
    "path": "README.md",
    "chars": 8415,
    "preview": "# skani - accurate, fast nucleotide identity calculation for MAGs, genomes, and databases\n\n## Introduction\n\n**skani** is"
  },
  {
    "path": "model_to_src.sh",
    "chars": 573,
    "preview": "#Models used in the preprint of skani. Trained on only Nayfach et al (2021) GEM data.\nMODEL=./skani-mummer-train/v0.1.0-"
  },
  {
    "path": "scripts/clustermap_triangle.py",
    "chars": 2249,
    "preview": "import numpy as np\nfrom collections import defaultdict\nimport seaborn\nimport matplotlib.pyplot as plt\nimport sys\nsys.set"
  },
  {
    "path": "scripts/pre_release.sh",
    "chars": 1186,
    "preview": "#!/bin/bash\n\n# Define the expected version\nEXPECTED_VERSION=\"0.3.0\"\n\n# Function to extract the version from Cargo.toml\ng"
  },
  {
    "path": "skani_matrix.af",
    "chars": 242,
    "preview": "4\n./test_files/GCF_005706655.1_ASM570665v1_genomic.fna\t100.00\t0.00\t0.00\t0.00\n./test_files/e.coli-EC590.fasta\t0.00\t100.00"
  },
  {
    "path": "src/avx2_seeding.rs",
    "chars": 12012,
    "preview": "use std::arch::x86_64::*;\nuse crate::params::*;\nuse crate::types::*;\n\n#[inline]\n#[target_feature(enable = \"avx2\")]\npub u"
  },
  {
    "path": "src/chain.rs",
    "chars": 40828,
    "preview": "use crate::params::*;\nuse gbdt::gradient_boost::GBDT;\nuse crate::types::*;\nuse bio::data_structures::interval_tree::Inte"
  },
  {
    "path": "src/cli.rs",
    "chars": 19584,
    "preview": "use clap::{Parser, Subcommand, Args};\n\n#[derive(Parser)]\n#[clap(\n    name = \"skani\",\n    version,\n    about = \"fast, rob"
  },
  {
    "path": "src/cmd_line.rs",
    "chars": 5135,
    "preview": "pub const MIN_ALIGN_FRAC: &str = \"min aligned frac\";\npub const CMD_MIN_ALIGN_FRAC: &str = \"min-af\";\npub const H_MIN_ALIG"
  },
  {
    "path": "src/dist.rs",
    "chars": 7022,
    "preview": "use crate::chain;\nuse crate::regression;\nuse crate::file_io;\nuse crate::params::*;\nuse crate::screen;\nuse crate::types::"
  },
  {
    "path": "src/file_io.rs",
    "chars": 28443,
    "preview": "use crate::params::*;\nuse std::fs::OpenOptions;\nuse crate::seeding;\nuse crate::types::*;\nuse fxhash::FxHashMap;\nuse log:"
  },
  {
    "path": "src/lib.rs",
    "chars": 386,
    "preview": "pub mod types;\npub mod params;\npub mod chain;\npub mod file_io;\npub mod seeding;\npub mod screen;\npub mod search;\npub mod "
  },
  {
    "path": "src/main.rs",
    "chars": 1334,
    "preview": "use clap::Parser;\nuse std::env;\nuse skani::cli::{Cli, Commands};\nuse skani::dist;\nuse skani::parse;\nuse skani::search;\nu"
  },
  {
    "path": "src/model.rs",
    "chars": 792672,
    "preview": "pub const MODEL:&str = r#\"\n{\"conf\":{\"feature_size\":5,\"max_depth\":3,\"iterations\":195,\"shrinkage\":0.06,\"feature_sample_rat"
  },
  {
    "path": "src/params.rs",
    "chars": 5742,
    "preview": "use crate::types::*;\nuse gbdt::gradient_boost::GBDT;\n\npub const GB_IN_BYTES: usize = 1_073_741_824;\npub const SMALL_VEC_"
  },
  {
    "path": "src/parse.rs",
    "chars": 31371,
    "preview": "use crate::cli::{Cli, Commands, DistArgs, SearchArgs, SketchArgs, TriangleArgs};\nuse crate::cmd_line::*;\nuse crate::para"
  },
  {
    "path": "src/regression.rs",
    "chars": 2221,
    "preview": "use crate::types::*;\nuse crate::model;\nuse crate::params::*;\nuse gbdt::decision_tree::Data;\nuse gbdt::gradient_boost::GB"
  },
  {
    "path": "src/screen.rs",
    "chars": 6628,
    "preview": "use crate::params::*;\n//use std::collections::{HashMap, HashSet};\n//use std::hash::{BuildHasherDefault, Hash, Hasher};\nu"
  },
  {
    "path": "src/search.rs",
    "chars": 13158,
    "preview": "use crate::chain;\nuse crate::regression;\nuse crate::file_io;\nuse crate::params::*;\nuse crate::screen;\nuse crate::sketch_"
  },
  {
    "path": "src/seeding.rs",
    "chars": 12320,
    "preview": "use crate::params::*;\nuse crate::types::*;\nuse rust_lapper::{Interval, Lapper};\n\n#[inline]\nfn _position_min<T: Ord>(slic"
  },
  {
    "path": "src/sketch.rs",
    "chars": 7752,
    "preview": "use crate::file_io;\nuse crate::params::*;\nuse crate::sketch_db::{SketchDbWriter};\nuse crate::types::*;\nuse log::*;\nuse r"
  },
  {
    "path": "src/sketch_db.rs",
    "chars": 5528,
    "preview": "use crate::params::*;\nuse crate::types::*;\nuse serde::{Deserialize, Serialize};\nuse std::fs::File;\nuse std::io::{BufRead"
  },
  {
    "path": "src/triangle.rs",
    "chars": 6393,
    "preview": "use crate::chain;\nuse crate::file_io;\nuse crate::params::*;\nuse crate::regression;\nuse crate::screen;\nuse crate::types::"
  },
  {
    "path": "src/types.rs",
    "chars": 18936,
    "preview": "//Various DNA lookup tables and hashing methods are taken from miniprot by Heng Li. Attached below is their license:\n//T"
  },
  {
    "path": "test_files/GCF_005706655.1_ASM570665v1_genomic.fna",
    "chars": 5307384,
    "preview": ">NZ_CP036555.1 Bacteroides fragilis strain CCUG4856T chromosome, complete genome\nATGAGTGAATCGAGTCATGTCGGCCTATGGAACCGCTGT"
  },
  {
    "path": "test_files/GCF_005844845.1_ASM584484v1_genomic.fna",
    "chars": 5762091,
    "preview": ">NZ_SPGY01000010.1 Bacteroides fragilis strain 1001175st1_C3 NODE_10_length_157825_cov_29.4629, whole genome shotgun seq"
  },
  {
    "path": "test_files/MN-03.fa",
    "chars": 5312141,
    "preview": ">NZ_CP081897.1 Klebsiella pneumoniae strain MN-03 chromosome, complete genome\nGTGTCACTTTCGCTTTGGCAGCAGTGTCTTGCCCGATTGCAG"
  },
  {
    "path": "test_files/all_ns.fa",
    "chars": 1328,
    "preview": ">all_n\nNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN"
  },
  {
    "path": "test_files/e.coli-EC590.fasta",
    "chars": 4675498,
    "preview": ">NZ_CP016182.2 Escherichia coli strain EC590 chromosome, complete genome\nATGGAGATTCAGCAAAAAGGTAACTTTTTATAATTTTATCTACATAC"
  },
  {
    "path": "test_files/e.coli-K12.fasta",
    "chars": 4704485,
    "preview": ">NC_007779.1 Escherichia coli str. K-12 substr. W3110, complete sequence\nAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTA"
  },
  {
    "path": "test_files/e.coli-W.fasta",
    "chars": 4958722,
    "preview": ">NC_017664.1 Escherichia coli W, complete sequence\nAGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAG"
  },
  {
    "path": "test_files/e.coli-h5.fasta",
    "chars": 4893714,
    "preview": ">NZ_CP010169.1 Escherichia coli strain H5 chromosome, complete genome\nGCCGCTTCGCTTTTTCTCAGCGGCGCGGGGTGTGCATAATACGCTTTCCC"
  },
  {
    "path": "test_files/e.coli-o157.fasta",
    "chars": 5578260,
    "preview": ">NZ_CP017438.1 Escherichia coli O157:H7 strain 2159 chromosome, complete genome\nGTATACAGATCGTGCGATCTACTGTGGATAACTCTGTCAG"
  },
  {
    "path": "test_files/empty_fasta.fa",
    "chars": 16,
    "preview": ">test\n>test_1\nA\n"
  },
  {
    "path": "test_files/list.txt",
    "chars": 294,
    "preview": "test_files/e.coli-EC590.fasta\ntest_files/e.coli-h5.fasta\ntest_files/e.coli-K12-base_change_temp.fasta\ntest_files/e.coli-"
  },
  {
    "path": "test_files/o157_plasmid.fasta",
    "chars": 93839,
    "preview": ">NZ_CP017439.1 Escherichia coli O157:H7 strain 2159 plasmid pO157, complete sequence\nGTGCAGGATGGTGTGACTGATCTTCAACAAACGTA"
  },
  {
    "path": "test_files/query_list.txt",
    "chars": 146,
    "preview": "./test_files/o157_reads.fastq\n./test_files/e.coli-EC590.fasta\n./test_files/e.coli-W.fasta.gz\n./test_files/GCF_005706655."
  },
  {
    "path": "test_files/skani_matrix.af",
    "chars": 711667,
    "preview": "364\nb4dde086-6653-c273-4ed8-33baba9cf8b9 NZ_CP017438.1,-strand,1215234-1220882 length=5628 error-free_length=5648 read_i"
  },
  {
    "path": "test_files/test.fasta",
    "chars": 43,
    "preview": ">test\nATCGATCGATCGATCGATCGATCGATCGATCGATCG\n"
  },
  {
    "path": "test_files/viruses.fna",
    "chars": 62793,
    "preview": ">NC_045512.2\nATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCT\nGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGC"
  },
  {
    "path": "test_results_versions/0.2.1",
    "chars": 88267,
    "preview": "\nrunning 0 tests\n\nsuccesses:\n\nsuccesses:\n\ntest result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; fi"
  },
  {
    "path": "test_results_versions/0.2.2",
    "chars": 88324,
    "preview": "\nrunning 0 tests\n\nsuccesses:\n\nsuccesses:\n\ntest result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; fi"
  },
  {
    "path": "test_results_versions/0.3.0",
    "chars": 86842,
    "preview": "\nrunning 0 tests\n\nsuccesses:\n\nsuccesses:\n\ntest result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; fi"
  },
  {
    "path": "test_results_versions/v0.2.1",
    "chars": 84641,
    "preview": "\nrunning 0 tests\n\ntest result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s\n\n\nrunnin"
  },
  {
    "path": "tests/int_test_new.rs",
    "chars": 6584,
    "preview": "use assert_cmd::prelude::*; \nuse reflection::Reflection;\nuse std::collections::HashMap;\nuse tsv::*;\n // Used for writing"
  },
  {
    "path": "tests/integration_test.rs",
    "chars": 42934,
    "preview": "use assert_cmd::prelude::*; \nuse tsv::*;\n // Used for writing assertions\nuse std::process::Command; // Run programs\n    "
  },
  {
    "path": "tests/results/output",
    "chars": 181,
    "preview": "4\n./test_files/GCF_005706655.1_ASM570665v1_genomic.fna\n./test_files/e.coli-EC590.fasta\t0.00\n./test_files/e.coli-W.fasta."
  },
  {
    "path": "tests/results/output.af",
    "chars": 242,
    "preview": "4\n./test_files/GCF_005706655.1_ASM570665v1_genomic.fna\t100.00\t0.00\t0.00\t0.00\n./test_files/e.coli-EC590.fasta\t0.00\t100.00"
  },
  {
    "path": "tests/results/output_o_triangle_full",
    "chars": 242,
    "preview": "4\n./test_files/GCF_005706655.1_ASM570665v1_genomic.fna\t100.00\t0.00\t0.00\t0.00\n./test_files/e.coli-EC590.fasta\t0.00\t100.00"
  },
  {
    "path": "tests/results/output_o_triangle_full.af",
    "chars": 242,
    "preview": "4\n./test_files/GCF_005706655.1_ASM570665v1_genomic.fna\t100.00\t0.00\t0.00\t0.00\n./test_files/e.coli-EC590.fasta\t0.00\t100.00"
  },
  {
    "path": "tests/results/test_dist_file.txt",
    "chars": 458150,
    "preview": "Ref_file\tQuery_file\tANI\tAlign_fraction_ref\tAlign_fraction_query\tRef_name\tQuery_name\n./test_files/o157_reads.fastq\t./test"
  },
  {
    "path": "tests/tests.rs",
    "chars": 5850,
    "preview": "use skani::chain::*;\nuse skani::seeding::*;\nuse skani::avx2_seeding::*;\nuse skani::regression::*;\nuse skani::file_io::*;"
  }
]

// ... and 30 more files (download for full content)

About this extraction

This page contains the full source code of the bluenote-1577/skani GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 88 files (52.5 MB), approximately 11.0M tokens, and a symbol index with 280 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo