Full Code of bluenote-1577/sylph for AI

Repository: bluenote-1577/sylph
Branch: main
Commit: cf6ee068c028
Files: 29
Total size: 1.1 MB

Directory structure:
gitextract_a9n8s440/

├── .cargo/
│   └── config.toml
├── .github/
│   └── workflows/
│       └── release.yml
├── .gitignore
├── CHANGELOG.md
├── Cargo.toml
├── LICENSE
├── README.md
├── src/
│   ├── avx2_seeding.rs
│   ├── cmdline.rs
│   ├── constants.rs
│   ├── contain.rs
│   ├── inference.rs
│   ├── inspect.rs
│   ├── lib.rs
│   ├── main.rs
│   ├── seeding.rs
│   ├── sketch.rs
│   └── types.rs
├── test_files/
│   ├── k12_R1.fq
│   ├── k12_R2.fq
│   ├── list.txt
│   ├── pair_list1.txt
│   ├── pair_list2.txt
│   ├── sample_list.txt
│   ├── single_sample.txt
│   ├── t1.fq
│   └── t2.fq
└── tests/
    ├── integration_test.rs
    └── unit_test.rs

================================================
FILE CONTENTS
================================================

================================================
FILE: .cargo/config.toml
================================================
[target.x86_64-unknown-linux-musl]
rustflags = ["-Ctarget-feature=+crt-static"]


================================================
FILE: .github/workflows/release.yml
================================================
name: "tagged-release"

on:
  workflow_dispatch:
  push:
    tags:
      - "v*"

jobs:
  tagged-release:
    name: "Tagged Release"
    runs-on: "ubuntu-latest"

    steps:
      - uses: actions/checkout@v2
      - uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
      - run: sudo apt-get install musl musl-tools; rustup target add x86_64-unknown-linux-musl; cargo build --release --target=x86_64-unknown-linux-musl
      - uses: "marvinpinto/action-automatic-releases@latest"
        with:
          repo_token: "${{ secrets.GITHUB_TOKEN }}"
          prerelease: false
          automatic_release_tag: "latest"
          files: |
            target/x86_64-unknown-linux-musl/release/sylph


================================================
FILE: .gitignore
================================================
# Generated by Cargo
# will have compiled files and executables
debug/
target/

# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
# More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
Cargo.lock

# These are backup files generated by rustfmt
**/*.rs.bk

# MSVC Windows builds of rustc generate these, which store debugging information
*.pdb


================================================
FILE: CHANGELOG.md
================================================
## sylph v0.9.0: 2025-10-13

- Added an option `--estimate-read-count` to VERY ROUGHLY output estimated read counts in the "Sequence_abundance" column instead of an actual sequence abundance. This forces `-u`. Only works for short reads right now. 
- Record ids could previously contain tabs, causing sylph's tsv output to not be a true tsv. This is fixed (thanks Donovan Parks). 

## sylph v0.8.1: release date TODO

### Major

- There was a bug in the definition of the "e" constant; fixed now (thanks to Benjamin Lieser). Results may change by ~2%.

### Minor
- Allowing `-r` option for `profile` for single-ended reads (thanks again to Florian Plaza Onate for this) 

## sylph v0.8.0: default output behaviour change + more efficient `inspect`.

### Major
* `inspect` option takes much less memory (thanks to Martin Larralde/@althonos for this)
* BREAKING: Changed default behaviour of sylph to write a TSV header even when no genomes are detected (thanks to Florian Plaza Onate for this suggestion)


## sylph v0.7.0: `inspect` option - 2024-11-06

### Major
* Added `inspect` option for inspecting `.syldb/sylsp`.
* Removed native compile flag. 

## sylph v0.6.1: improved automatic detection of sequencing error for -u option

### Major

* Improved automatic estimation of sequencing error for estimating unknown abundances/coverages.

Explanation:

The -u option estimates the "true" coverage and the % of sequences that are "unknown", i.e. not captured by the database. This requires knowledge of sequencing error. Previous versions failed when the sample was too diverse compared to sequencing depth (e.g. low-throughput sequencing or complex (ocean/soil) metagenomes).

**New fallback added**: For short reads only, if the diversity is too high relative to sequencing depth (avg k-mer depth < 3), then 99.5% is used as a fallback sequence identity estimate.
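
The fallback rule above can be sketched in a few lines of Rust. This is only an illustration of the rule as stated in this changelog entry, not sylph's actual code: the function name, constant names, and the `estimated` input are hypothetical.

```rust
/// Fallback identity stated in the changelog (99.5%).
const FALLBACK_IDENTITY: f64 = 0.995;
/// Average k-mer depth below which the fallback applies (stated as "< 3").
const MIN_AVG_KMER_DEPTH: f64 = 3.0;

/// Hypothetical helper: pick the sequence identity used for `-u`.
/// `estimated` stands in for whatever the automatic estimator produced.
fn sequence_identity(estimated: f64, avg_kmer_depth: f64, short_reads: bool) -> f64 {
    if short_reads && avg_kmer_depth < MIN_AVG_KMER_DEPTH {
        // Too little depth relative to diversity: use the fixed fallback.
        FALLBACK_IDENTITY
    } else {
        estimated
    }
}

fn main() {
    // Shallow short-read sample: the fallback is used.
    println!("{}", sequence_identity(0.90, 2.0, true));
    // Deep short-read sample: the automatic estimate is kept.
    println!("{}", sequence_identity(0.99, 10.0, true));
}
```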

## sylph v0.6.0 release: New output column, lazy raw paired fastq profiling: 2024-04-06 

### Major

* A new column called `kmers_reassigned` is now in the profile output. This states how many k-mers are lost due to reassignment for that particular genome. 
* `-1, -2` options are now available for `sylph profile`. You can now do `sylph profile database.syldb -1 1.fq -2 2.fq ...`

## sylph v0.5.1 release: **Memory improvement and bug fixes** : Dec 27 2023

### Major

* Scalable cuckoo filters are now used for read deduplication for memory savings. 
* Deduplication algorithm improved. v0.5.0 worked poorly on highly (>15%) duplicated read sets. 
* Shorter reads can be sketched now: down to 32 bp instead of 63 bp previously.

## sylph v0.5.0 release: **Big improvements on real illumina data** : Dec 23 2023

### Major

**In previous versions, sylph was underperforming on real illumina data sets**. See https://github.com/bluenote-1577/sylph/issues/5 

This is because many real illumina datasets have a non-trivial number of duplicate reads. Duplicate reads mess up sylph's statistical model.

For the single and paired sketching options, a new deduplication routine has been added. This will be described in version 2 of our preprint. 

**This increases sketching memory by 3-4x but greatly increases performance on real datasets with > 1-2% of duplication, especially for low-abundance genomes**. 

For paired-end illumina reads with non-trivial (> 1%) duplication, sylph can now

1. detect many more low-abundance species below 0.3x coverage
2. give better coverage/abundance estimates for low-abundance species 

### BREAKING

- sequence sketches (sylsp) have changed formats. Sequences will need to be re-sketched.
- `--read-length` option removed and incorporated into the sketches by default. (suggested by @fplaza)

### Other changes

- New warning when `-o` specified and only reads are sketched (https://github.com/bluenote-1577/sylph/issues/7)
- You can now rename sylph samples by specifying a sample naming file with `--sample-names` or `--lS` (suggested by @jolespin)
- Newline delimited files are available in `profile` and `query` now (suggested by @jolespin)


## sylph v0.4.1 release: getting ready for preprinting

### Minor

- small changes for help text, options, and output texts. 

## sylph v0.4.0 release: major interface changes

### BREAKING

- renamed `sylph contain` to `sylph query`. 
- methods for sketching are drastically different now. E.g. we use `-g genome1.fa genome2.fa` for specifying genomes and `-r read1.fa read2.fq` for specifying reads when sketching.

### Major

- `-u` or `--estimate-unknown` options are now present for estimating unknown organisms in the sample. 
- When using `-u`, associated options `--read-seq-id` and `--read-len` are available for calculating true coverages with sylph, i.e., coverages concordant with read mapping

### Minor

- Coverage calculation is slightly different now.

## sylph v0.3.0 release: first class support for pseudotax, now called "profile" - 2023-10-01

Continuing development of sylph taxonomic profiling. 

### BREAKING

- `--pseudotax` option in previous version is now a new command called `profile`.
- Databases are enabled for profiling by default. 
- Changed file suffixes to `syldb` and `sylsp`.

### Major
- Default parameter changes. --min-spacing is set to 30 now. 
- Made profiling faster with some algorithmic tweaks. 
- Coverage calculated slightly differently
- Many small software changes with respect to threading and outputs

## sylph v0.2.0 release: pseudotax improved - 2023-09-19

### BREAKING
- Sylph's *.sylqueries are no longer compatible with older versions of sylph (< v0.2). Files will need to be resketched. 

### Major
- Fixed a major bug for the `--pseudotax` option that required redesigning file formats. Please use `--enable-pseudotax` when using `contain --pseudotax` from now on.
- `--pseudotax` option gives relative abundances now. We are gaining some confidence that this approach gives a rough, but surprisingly decent taxonomic classification.  
- Changed how `Eff_cov` is calculated. We just use the median coverage now, except when we apply coverage-adjustment 

### Minor
- Fixed command line ambiguity for sketching outputs. `-s` has been replaced with `-d` for `sylph sketch`.
- Sylph outputs the results after processing every sample, instead of batching results, now


## sylph v0.1.0 release - 2023-09-03

### Major

- Added `--pseudotax` option, similar to the `-w` option in mash screen, where k-mers are assigned to the highest ANI genome so redundancy is removed. The output is a very rough taxonomic classification of the sample. 
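
The winner-take-all idea described above can be sketched as follows. This is a simplified illustration rather than sylph's implementation: the `Genome` struct, the `winner_take_all` function, and the example ANI values are hypothetical, and the k-mers are stand-in integers instead of sampled hashes.

```rust
use std::collections::HashMap;

/// Hypothetical genome record: a name, an ANI estimate, and its k-mer hashes.
struct Genome {
    name: &'static str,
    ani: f64,
    kmers: Vec<u64>,
}

/// Assign every k-mer to the highest-ANI genome that contains it and
/// return, per genome, how many distinct k-mers it kept.
fn winner_take_all(genomes: &[Genome]) -> Vec<usize> {
    // k-mer hash -> index of the best genome seen so far
    let mut owner: HashMap<u64, usize> = HashMap::new();
    for (idx, g) in genomes.iter().enumerate() {
        for &kmer in &g.kmers {
            owner
                .entry(kmer)
                .and_modify(|best| {
                    if g.ani > genomes[*best].ani {
                        *best = idx;
                    }
                })
                .or_insert(idx);
        }
    }
    let mut kept = vec![0usize; genomes.len()];
    for &idx in owner.values() {
        kept[idx] += 1;
    }
    kept
}

fn main() {
    let genomes = vec![
        Genome { name: "A", ani: 0.99, kmers: vec![1, 2, 3] },
        Genome { name: "B", ani: 0.95, kmers: vec![2, 3, 4] },
    ];
    let kept = winner_take_all(&genomes);
    for (g, k) in genomes.iter().zip(&kept) {
        println!("{}: kept {} of {} k-mers", g.name, k, g.kmers.len());
    }
}
```

In this toy input, the k-mers shared by both genomes go to the genome with the higher ANI, so the lower-ANI genome only keeps the k-mers unique to it.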

### Minor

- Some fixes and parameter changes from the v0.0.x releases. 


================================================
FILE: Cargo.toml
================================================
[package]
name = "sylph"
version = "0.9.0"
edition = "2021"
license = "MIT OR Apache-2.0"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
needletail = "0.5.0"
simple_logger=  { version = "3", features = ["stderr"] }
log = "0"
rayon = "1"
smallvec = { version = "1", features = ["union","serde","write"] }
serde = { version = "1", features = ["derive"] }
bincode = "1"
fxhash = "0"
clap = { version = "3", features = ["derive"] }
flate2 = { version = "1.0.17", features = ["zlib-ng"], default-features = false }
statrs="0.16"
nalgebra="0"
rand = "0"
regex = "1"
fastrand = "2"
memory-stats = "1"
scalable_cuckoo_filter = "0.2"
serde_yaml = "0.9"

[target.'cfg(target_env = "musl")'.dependencies]
tikv-jemallocator = "0"


[dev-dependencies]
assert_cmd = "1.0.1"
predicates = "1"
serial_test = "*"


[profile.release]
panic = "abort"
lto = true

[profile.dev]
#opt-level = 1
opt-level = 3

#[rust]
#debuginfo-level = 1


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 Jim Shaw

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# sylph - fast and precise species-level metagenomic profiling with ANIs 

> [!IMPORTANT]
> All documentation for sylph has moved to https://sylph-docs.github.io/.
>
> **EVERYTHING BELOW—i.e., the GitHub versions of the README/Wikis/Manuals—is OUTDATED.** 

## Introduction

**sylph** is a program that performs ultrafast (1) **ANI querying** or (2) **metagenomic profiling** for metagenomic shotgun samples. 

**Containment ANI querying**: sylph can search a genome, e.g. E. coli, against your sample. If sylph outputs an estimate of 97% ANI, your sample contains an E. coli with 97% ANI to the queried genome.

**Metagenomic profiling**: sylph can determine the species/taxa in your sample and their abundances, just like [Kraken](https://ccb.jhu.edu/software/kraken/) or [MetaPhlAn](https://github.com/biobakery/MetaPhlAn).

<p align="center"><img src="assets/sylph.gif?raw=true"/></p>
<p align="center">
   <i>
   Profiling 1 Gbp of mouse gut reads against 85,205 genomes in a few seconds 
   </i>
</p>


### Why sylph?

1. **Precise species-level profiling**: sylph has fewer false positives than Kraken and is about as precise and sensitive as marker gene methods (MetaPhlAn, mOTUs). 

2. **Ultrafast, multithreaded, multi-sample**: sylph can be > 50x faster than other methods. Sylph only takes ~15GB of RAM for profiling against the entire GTDB-R220 database (110k genomes).

3. **Accurate (containment) ANI information**: sylph can give accurate **ANI estimates** between reference genomes and your metagenome sample down to 0.1x coverage.

4. **Customizable databases and pre-built databases**: We offer pre-built databases of [prokaryotes, viruses, eukaryotes](https://github.com/bluenote-1577/sylph/wiki/Pre%E2%80%90built-databases). Custom databases (e.g. using your own MAGs) are easy to build.  

5. **Short or long reads**: Sylph was also the most accurate method [on Oxford Nanopore's independent benchmarks](https://nanoporetech.com/resource-centre/genomic-and-epigenomic-insights-into-microbial-biology-with-nanopore-metagenomic-and-isolate-sequencing).

### How does sylph work?

sylph uses a k-mer containment method. sylph's novelty lies in **using a statistical technique to estimate k-mer containment for low coverage genomes** , giving accurate results for low abundance organisms. See [here for more information on what sylph can and can not do](https://github.com/bluenote-1577/sylph/wiki/Introduction:-what-is-sylph-and-how-does-it-work%3F). 
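
As a rough illustration of the k-mer containment idea, the sketch below converts a containment fraction into an ANI-like value using the commonly used relation ANI ≈ containment^(1/k). It is a naive version: it does not include sylph's statistical adjustment for low-coverage genomes, and the function names are hypothetical rather than part of sylph's API.

```rust
use std::collections::HashSet;

/// Fraction of the genome's k-mers that also occur in the sample.
/// (Hypothetical helper, not part of sylph's API.)
fn containment(genome: &HashSet<u64>, sample: &HashSet<u64>) -> f64 {
    if genome.is_empty() {
        return 0.0;
    }
    let shared = genome.iter().filter(|kmer| sample.contains(*kmer)).count();
    shared as f64 / genome.len() as f64
}

/// Naive containment ANI: a k-mer matches only if all k bases match,
/// so containment is roughly ANI^k and ANI is roughly containment^(1/k).
fn naive_containment_ani(containment: f64, k: u32) -> f64 {
    containment.powf(1.0 / k as f64)
}

fn main() {
    // Stand-in integer "k-mers"; real inputs would be hashed k-mers.
    let genome: HashSet<u64> = (0..1000).collect();
    let sample: HashSet<u64> = (0..400).chain(5000..6000).collect();
    let c = containment(&genome, &sample);
    println!("containment = {:.3}", c);
    println!("naive ANI (k = 31) = {:.4}", naive_containment_ani(c, 31));
}
```

At low coverage many genome k-mers are simply never sequenced, so this naive estimate would understate ANI; that gap is what the statistical technique mentioned above is meant to address.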

## Very quick start

#### Profile metagenome sample against [GTDB-R220](https://gtdb.ecogenomic.org/) (113,104 bacterial/archaeal species representative genomes) 

```sh
conda install -c bioconda sylph

# download GTDB-R220 pre-built database (~13 GB)
wget http://faust.compbio.cs.cmu.edu/sylph-stuff/gtdb-r220-c200-dbv1.syldb

# multi-sample paired-end profiling (sylph version >= 0.6)
sylph profile gtdb-r220-c200-dbv1.syldb -1 *_1.fastq.gz -2 *_2.fastq.gz -t (threads) > profiling.tsv

# multi-sample single-end profiling
sylph profile gtdb-r220-c200-dbv1.syldb *.fastq -t (threads) > profiling.tsv
```

##  Install 

#### Option 1: conda install 
[![Anaconda-Server Badge](https://anaconda.org/bioconda/sylph/badges/version.svg)](https://anaconda.org/bioconda/sylph)
[![Anaconda-Server Badge](https://anaconda.org/bioconda/sylph/badges/latest_release_date.svg)](https://anaconda.org/bioconda/sylph)

```sh
conda install -c bioconda sylph
```

#### Option 2: Build from source

Requirements:
1. [rust](https://www.rust-lang.org/tools/install) (version > 1.63) programming language and associated tools such as cargo are required and assumed to be in PATH.
2. A C compiler (e.g. GCC)
3. make
4. cmake

Building takes a few minutes (depending on # of cores).

```sh
git clone https://github.com/bluenote-1577/sylph
cd sylph

# If default rust install directory is ~/.cargo
cargo install --path . --root ~/.cargo
sylph profile test_files/*
```
#### Option 3: Pre-built x86-64 linux statically compiled executable

If you're on an x86-64 system, you can download the binary and use it without any installation. 

```sh
wget https://github.com/bluenote-1577/sylph/releases/download/latest/sylph
chmod +x sylph
./sylph -h
```

Note: the binary is compiled with a different set of libraries (musl instead of glibc), probably impacting performance. 

## Tutorials, manuals, and pre-built databases

### [Pre-built databases](https://github.com/bluenote-1577/sylph/wiki/Pre%E2%80%90built-databases)

The pre-built databases [available here](https://github.com/bluenote-1577/sylph/wiki/Pre%E2%80%90built-databases) can be downloaded and used with sylph for profiling and containment querying. 

### [Cookbook](https://github.com/bluenote-1577/sylph/wiki/sylph-cookbook)

For common use cases and fast explanations, see the above [cookbook](https://github.com/bluenote-1577/sylph/wiki/sylph-cookbook).

### Tutorials
1. #### [Introduction: 5-minute sylph tutorial outlining basic usage](https://github.com/bluenote-1577/sylph/wiki/5%E2%80%90minute-sylph-tutorial)
2. #### [Taxonomic profiling against GTDB database with MetaPhlAn-like output format](https://github.com/bluenote-1577/sylph/wiki/Taxonomic-profiling-with-the-GTDB%E2%80%90R214-database)

### Manuals
1. #### [Output format (TSV) and containment ANI explanation](https://github.com/bluenote-1577/sylph/wiki/Output-format)
2. #### [Taxonomic integration and custom taxonomies](https://github.com/bluenote-1577/sylph/wiki/Incorporating-taxonomic-information-into-sylph-with-sylph%E2%80%90tax)

### [sylph-tax](https://github.com/bluenote-1577/sylph-tax) 

To incorporate *taxonomy* into sylph's outputs, see the [sylph-tax repository](https://github.com/bluenote-1577/sylph-tax). 

> [!TIP] 
> The new [sylph-tax](https://github.com/bluenote-1577/sylph-tax) program replaces the old [sylph-utils](https://github.com/bluenote-1577/sylph-utils) repository. 

## Changelog

#### Version v0.8.0 - 2024-12-12. 

* Made the `inspect` option much less memory intensive. Slightly changed outputs when no genomes are found.

See the [CHANGELOG](https://github.com/bluenote-1577/sylph/blob/main/CHANGELOG.md) for complete details.

## Citing sylph

Jim Shaw and Yun William Yu. Rapid species-level metagenome profiling and containment estimation with sylph (2024). Nature Biotechnology.



================================================
FILE: src/avx2_seeding.rs
================================================
use std::arch::x86_64::*;
use crate::types::*;

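/// Hashes four 64-bit k-mers at once: a shift/add/xor integer mix is applied
/// independently to each 64-bit lane of `kmer`.
///
/// # Safety
/// The caller must ensure the CPU supports AVX2.
#[inline]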
#[target_feature(enable = "avx2")]
pub unsafe fn mm_hash256(kmer: __m256i) -> __m256i {
    let mut key = kmer;
    let s1 = _mm256_slli_epi64(key, 21);
    key = _mm256_add_epi64(key, s1);
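    //TODO this is bugged: the add and xor here compute !(key + (key << 21)),
    // which appears to differ from the usual first step of this integer mix, !key + (key << 21).
    // Fix after release...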
    key = _mm256_xor_si256(key, _mm256_cmpeq_epi64(key, key));

    key = _mm256_xor_si256(key, _mm256_srli_epi64(key, 24));
    let s2 = _mm256_slli_epi64(key, 3);
    let s3 = _mm256_slli_epi64(key, 8);

    key = _mm256_add_epi64(key, s2);
    key = _mm256_add_epi64(key, s3);
    key = _mm256_xor_si256(key, _mm256_srli_epi64(key, 14));
    let s4 = _mm256_slli_epi64(key, 2);
    let s5 = _mm256_slli_epi64(key, 4);
    key = _mm256_add_epi64(key, s4);
    key = _mm256_add_epi64(key, s5);
    key = _mm256_xor_si256(key, _mm256_srli_epi64(key, 28));

    let s6 = _mm256_slli_epi64(key, 31);
    key = _mm256_add_epi64(key, s6);

    return key;
}

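/// Extracts subsampled canonical k-mer hashes from `string` using AVX2.
///
/// The sequence is split into four equal chunks that are rolled in parallel, one per
/// 64-bit lane; trailing k-mers that do not fill all four lanes are skipped. A hash is
/// pushed to `kmer_vec` when it is below `u64::MAX / c`. Panics unless k is 21 or 31.
///
/// # Safety
/// The caller must ensure the CPU supports AVX2.
#[target_feature(enable = "avx2")]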
pub unsafe fn extract_markers_avx2(string: &[u8], kmer_vec: &mut Vec<u64>, c: usize, k: usize) {
    if string.len() < k {
        return;
    }
    let len = (string.len() - k + 1) / 4;
    let string1 = &string[0..len + k - 1];
    let string2 = &string[len..2 * len + k - 1];
    let string3 = &string[2 * len..3 * len + k - 1];
    let string4 = &string[3 * len..4 * len + k - 1];
    if string.len() < k+1{
        return;
    }

    let use_40 = if 2 * (k - 1) == 40 {
        true
    } else if 2 * (k - 1) == 60 {
        false
    } else {
        panic!()
    };
    const TWO_K_MINUS_2_60: i32 = 60;
    const TWO_K_MINUS_2_40: i32 = 40;
    let mut rolling_kmer_f_marker = _mm256_set_epi64x(0, 0, 0, 0);
    let mut rolling_kmer_r_marker = _mm256_set_epi64x(0, 0, 0, 0);
    let rev_sub = _mm256_set_epi64x(3, 3, 3, 3);
    for i in 0..k - 1 {
        let nuc_f1 = BYTE_TO_SEQ[string1[i] as usize] as i64;
        let nuc_f2 = BYTE_TO_SEQ[string2[i] as usize] as i64;
        let nuc_f3 = BYTE_TO_SEQ[string3[i] as usize] as i64;
        let nuc_f4 = BYTE_TO_SEQ[string4[i] as usize] as i64;
        let f_nucs = _mm256_set_epi64x(nuc_f4, nuc_f3, nuc_f2, nuc_f1);
        let r_nucs = _mm256_sub_epi64(rev_sub, f_nucs);

        rolling_kmer_f_marker = _mm256_slli_epi64(rolling_kmer_f_marker, 2);
        rolling_kmer_f_marker = _mm256_or_si256(rolling_kmer_f_marker, f_nucs);

        rolling_kmer_r_marker = _mm256_srli_epi64(rolling_kmer_r_marker, 2);

        let shift_nuc_r;
        if use_40 {
            shift_nuc_r = _mm256_slli_epi64(r_nucs, TWO_K_MINUS_2_40);
        } else {
            shift_nuc_r = _mm256_slli_epi64(r_nucs, TWO_K_MINUS_2_60);
        }
        rolling_kmer_r_marker = _mm256_or_si256(rolling_kmer_r_marker, shift_nuc_r);
    }

    let marker_mask = (Kmer::MAX >> (std::mem::size_of::<Kmer>() * 8 - 2 * k)) as i64;
    let rev_marker_mask: i64 = !(0 | (3 << 2 * k - 2));
    //    let rev_marker_mask = i64::from_le_bytes(rev_marker_mask.to_le_bytes());
    //    dbg!(u64::MAX / (c as u64));
    //    dbg!((u64::MAX / (c as u64)) as i64);
    let threshold_marker = u64::MAX / c as u64;

    let mm256_marker_mask = _mm256_set_epi64x(marker_mask, marker_mask, marker_mask, marker_mask);
    let mm256_rev_marker_mask = _mm256_set_epi64x(
        rev_marker_mask,
        rev_marker_mask,
        rev_marker_mask,
        rev_marker_mask,
    );

    for i in k - 1..(len + k - 1) {
        let nuc_f1 = BYTE_TO_SEQ[string1[i] as usize] as i64;
        let nuc_f2 = BYTE_TO_SEQ[string2[i] as usize] as i64;
        let nuc_f3 = BYTE_TO_SEQ[string3[i] as usize] as i64;
        let nuc_f4 = BYTE_TO_SEQ[string4[i] as usize] as i64;
        let f_nucs = _mm256_set_epi64x(nuc_f4, nuc_f3, nuc_f2, nuc_f1);
        let r_nucs = _mm256_sub_epi64(rev_sub, f_nucs);

        rolling_kmer_f_marker = _mm256_slli_epi64(rolling_kmer_f_marker, 2);
        rolling_kmer_f_marker = _mm256_or_si256(rolling_kmer_f_marker, f_nucs);
        rolling_kmer_f_marker = _mm256_and_si256(rolling_kmer_f_marker, mm256_marker_mask);

        rolling_kmer_r_marker = _mm256_srli_epi64(rolling_kmer_r_marker, 2);
        let shift_nuc_r;
        if use_40 {
            shift_nuc_r = _mm256_slli_epi64(r_nucs, TWO_K_MINUS_2_40);
        } else {
            shift_nuc_r = _mm256_slli_epi64(r_nucs, TWO_K_MINUS_2_60);
        }
        rolling_kmer_r_marker = _mm256_and_si256(rolling_kmer_r_marker, mm256_rev_marker_mask);
        rolling_kmer_r_marker = _mm256_or_si256(rolling_kmer_r_marker, shift_nuc_r);

        let compare_marker = _mm256_cmpgt_epi64(rolling_kmer_r_marker, rolling_kmer_f_marker);

        let canonical_markers_256 =
            _mm256_blendv_epi8(rolling_kmer_r_marker, rolling_kmer_f_marker, compare_marker);

        //        dbg!(rolling_kmer_f_marker,rolling_kmer_r_marker);
        //        dbg!(print_string(u64::from_ne_bytes(_mm256_extract_epi64(rolling_kmer_f_marker,1).to_ne_bytes()), 31));
        let hash_256 = mm_hash256(canonical_markers_256);
        let v1 = _mm256_extract_epi64(hash_256, 0) as u64;
        let v2 = _mm256_extract_epi64(hash_256, 1) as u64;
        let v3 = _mm256_extract_epi64(hash_256, 2) as u64;
        let v4 = _mm256_extract_epi64(hash_256, 3) as u64;
        //        let threshold_256 = _mm256_cmpgt_epi64(cmp_thresh, hash_256);
        //        let m1 = _mm256_extract_epi64(threshold_256, 0);
        //        let m2 = _mm256_extract_epi64(threshold_256, 1);
        //        let m3 = _mm256_extract_epi64(threshold_256, 2);
        //        let m4 = _mm256_extract_epi64(threshold_256, 3);

        if v1 < threshold_marker {
            kmer_vec.push(v1 as u64);
        }
        if v2 < threshold_marker {
            kmer_vec.push(v2 as u64);
        }
        if v3 < threshold_marker {
            kmer_vec.push(v3 as u64);
        }
        if v4 < threshold_marker {
            kmer_vec.push(v4 as u64);
        }
    }
}

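/// Same as `extract_markers_avx2`, but records `(contig_number, position, hash)`,
/// where `position` is the index in `string` of the k-mer's last base.
/// Sequences shorter than `2 * k` are skipped.
///
/// # Safety
/// The caller must ensure the CPU supports AVX2.
#[target_feature(enable = "avx2")]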
pub unsafe fn extract_markers_avx2_positions(string: &[u8], kmer_vec: &mut Vec<(usize, usize,u64)>, c: usize, k: usize, contig_number: usize) {
    if string.len() < k {
        return;
    }
    let len = (string.len() - k + 1) / 4;
    let string1 = &string[0..len + k - 1];
    let string2 = &string[len..2 * len + k - 1];
    let string3 = &string[2 * len..3 * len + k - 1];
    let string4 = &string[3 * len..4 * len + k - 1];
    if string.len() < 2 * k {
        return;
    }

    let use_40 = if 2 * (k - 1) == 40 {
        true
    } else if 2 * (k - 1) == 60 {
        false
    } else {
        panic!()
    };
    const TWO_K_MINUS_2_60: i32 = 60;
    const TWO_K_MINUS_2_40: i32 = 40;
    let mut rolling_kmer_f_marker = _mm256_set_epi64x(0, 0, 0, 0);
    let mut rolling_kmer_r_marker = _mm256_set_epi64x(0, 0, 0, 0);
    let rev_sub = _mm256_set_epi64x(3, 3, 3, 3);
    for i in 0..k - 1 {
        let nuc_f1 = BYTE_TO_SEQ[string1[i] as usize] as i64;
        let nuc_f2 = BYTE_TO_SEQ[string2[i] as usize] as i64;
        let nuc_f3 = BYTE_TO_SEQ[string3[i] as usize] as i64;
        let nuc_f4 = BYTE_TO_SEQ[string4[i] as usize] as i64;
        let f_nucs = _mm256_set_epi64x(nuc_f4, nuc_f3, nuc_f2, nuc_f1);
        let r_nucs = _mm256_sub_epi64(rev_sub, f_nucs);

        rolling_kmer_f_marker = _mm256_slli_epi64(rolling_kmer_f_marker, 2);
        rolling_kmer_f_marker = _mm256_or_si256(rolling_kmer_f_marker, f_nucs);

        rolling_kmer_r_marker = _mm256_srli_epi64(rolling_kmer_r_marker, 2);

        let shift_nuc_r;
        if use_40 {
            shift_nuc_r = _mm256_slli_epi64(r_nucs, TWO_K_MINUS_2_40);
        } else {
            shift_nuc_r = _mm256_slli_epi64(r_nucs, TWO_K_MINUS_2_60);
        }
        rolling_kmer_r_marker = _mm256_or_si256(rolling_kmer_r_marker, shift_nuc_r);
    }

    let marker_mask = (Kmer::MAX >> (std::mem::size_of::<Kmer>() * 8 - 2 * k)) as i64;
    let rev_marker_mask: i64 = !(0 | (3 << 2 * k - 2));
    //    let rev_marker_mask = i64::from_le_bytes(rev_marker_mask.to_le_bytes());
    //    dbg!(u64::MAX / (c as u64));
    //    dbg!((u64::MAX / (c as u64)) as i64);
    let threshold_marker = u64::MAX / c as u64;

    let mm256_marker_mask = _mm256_set_epi64x(marker_mask, marker_mask, marker_mask, marker_mask);
    let mm256_rev_marker_mask = _mm256_set_epi64x(
        rev_marker_mask,
        rev_marker_mask,
        rev_marker_mask,
        rev_marker_mask,
    );

    for i in k - 1..(len + k - 1) {
        let nuc_f1 = BYTE_TO_SEQ[string1[i] as usize] as i64;
        let nuc_f2 = BYTE_TO_SEQ[string2[i] as usize] as i64;
        let nuc_f3 = BYTE_TO_SEQ[string3[i] as usize] as i64;
        let nuc_f4 = BYTE_TO_SEQ[string4[i] as usize] as i64;
        let f_nucs = _mm256_set_epi64x(nuc_f4, nuc_f3, nuc_f2, nuc_f1);
        let r_nucs = _mm256_sub_epi64(rev_sub, f_nucs);

        rolling_kmer_f_marker = _mm256_slli_epi64(rolling_kmer_f_marker, 2);
        rolling_kmer_f_marker = _mm256_or_si256(rolling_kmer_f_marker, f_nucs);
        rolling_kmer_f_marker = _mm256_and_si256(rolling_kmer_f_marker, mm256_marker_mask);

        rolling_kmer_r_marker = _mm256_srli_epi64(rolling_kmer_r_marker, 2);
        let shift_nuc_r;
        if use_40 {
            shift_nuc_r = _mm256_slli_epi64(r_nucs, TWO_K_MINUS_2_40);
        } else {
            shift_nuc_r = _mm256_slli_epi64(r_nucs, TWO_K_MINUS_2_60);
        }
        rolling_kmer_r_marker = _mm256_and_si256(rolling_kmer_r_marker, mm256_rev_marker_mask);
        rolling_kmer_r_marker = _mm256_or_si256(rolling_kmer_r_marker, shift_nuc_r);

        let compare_marker = _mm256_cmpgt_epi64(rolling_kmer_r_marker, rolling_kmer_f_marker);

        let canonical_markers_256 =
            _mm256_blendv_epi8(rolling_kmer_r_marker, rolling_kmer_f_marker, compare_marker);

        //        dbg!(rolling_kmer_f_marker,rolling_kmer_r_marker);
        //        dbg!(print_string(u64::from_ne_bytes(_mm256_extract_epi64(rolling_kmer_f_marker,1).to_ne_bytes()), 31));
        let hash_256 = mm_hash256(canonical_markers_256);
        let v1 = _mm256_extract_epi64(hash_256, 0) as u64;
        let v2 = _mm256_extract_epi64(hash_256, 1) as u64;
        let v3 = _mm256_extract_epi64(hash_256, 2) as u64;
        let v4 = _mm256_extract_epi64(hash_256, 3) as u64;
        //        let threshold_256 = _mm256_cmpgt_epi64(cmp_thresh, hash_256);
        //        let m1 = _mm256_extract_epi64(threshold_256, 0);
        //        let m2 = _mm256_extract_epi64(threshold_256, 1);
        //        let m3 = _mm256_extract_epi64(threshold_256, 2);
        //        let m4 = _mm256_extract_epi64(threshold_256, 3);

        if v1 < threshold_marker {
            kmer_vec.push((contig_number, i, v1 as u64));
        }
        if v2 < threshold_marker {
            kmer_vec.push((contig_number, len + i, v2 as u64));
        }
        if v3 < threshold_marker {
            kmer_vec.push((contig_number, 2*len + i, v3 as u64));
        }
        if v4 < threshold_marker {
            kmer_vec.push((contig_number, 3*len + i, v4 as u64));
        }
    }
}


================================================
FILE: src/cmdline.rs
================================================
use clap::{Args, Parser, Subcommand};
use crate::constants::*;

#[derive(Parser)]
#[clap(author, version, about = "Ultrafast genome ANI queries and taxonomic profiling for metagenomic shotgun samples.\n\n--- Preparing inputs by sketching (indexing)\n## fastq (reads) and fasta (genomes all at once)\n## *.sylsp found in -d; *.syldb given by -o\nsylph sketch -t 5 sample1.fq sample2.fq genome1.fa genome2.fa -o genome1+genome2 -d sample_dir\n\n## paired-end reads\nsylph sketch -1 a_1.fq b_1.fq -2 a_2.fq b_2.fq -d paired_sketches\n\n--- Nearest neighbour containment ANI\nsylph query *.syldb *.sylsp > all-to-all-query.tsv\n\n--- Taxonomic profiling with relative abundances and ANI\nsylph profile *.syldb *.sylsp > all-to-all-profile.tsv", arg_required_else_help = true, disable_help_subcommand = true)]
pub struct Cli {
    #[clap(subcommand,)]
    pub mode: Mode,
}

#[derive(Subcommand)]
pub enum Mode {
    /// Sketch sequences into samples (reads) and databases (genomes). Each sample.fq -> sample.sylsp. All *.fa -> *.syldb. 
    #[clap(display_order = 1)]
    Sketch(SketchArgs),
    /// Coverage-adjusted ANI querying between databases and samples.
    #[clap(display_order = 3)]
    Query(ContainArgs),
    ///Species-level taxonomic profiling with abundances and ANIs. 
    #[clap(display_order = 2)]
    Profile(ContainArgs),
    ///Inspect sketched .syldb and .sylsp files.
    #[clap(arg_required_else_help = true, display_order = 4)]
    Inspect(InspectArgs),
}


#[derive(Args, Default)]
pub struct SketchArgs {
    #[clap(multiple=true, help_heading = "INPUT", help = "fasta/fastq files; gzip optional. Default: fastq file produces a sample sketch (*.sylsp) while fasta files are combined into a database (*.syldb).")]
    pub files: Vec<String>,
    #[clap(short='o',long="out-name-db", default_value = "database", help_heading = "OUTPUT", help = "Output name for database sketch (with .syldb appended)")]
    pub db_out_name: String,
    #[clap(short='d',long="sample-output-directory", default_value = "./", help_heading = "OUTPUT", help = "Output directory for sample sketches")]
    pub sample_output_dir: String,
    #[clap(short,long="individual-records", help_heading = "GENOME INPUT", help = "Use individual records (contigs) for database construction")]
    pub individual: bool,
    #[clap(multiple=true,short,long="reads", help_heading = "SINGLE-END INPUT", help = "Single-end fasta/fastq reads")]
    pub reads: Option<Vec<String>>,
    #[clap(multiple=true,short='g', long="genomes", help_heading = "GENOME INPUT", help = "Genomes in fasta format")]
    pub genomes: Option<Vec<String>>,
    #[clap(short,long="list", help_heading = "INPUT", help = "Newline delimited file with inputs; fastas -> database, fastq -> sample")]
    pub list_sequence: Option<String>,
    #[clap(long="rl", hidden=true, help_heading = "SINGLE-END INPUT", help = "Newline delimited file; inputs assumed reads")]
    pub list_reads: Option<String>,
    #[clap(long="gl", help_heading = "GENOME INPUT", help = "Newline delimited file; inputs assumed genomes")]
    pub list_genomes: Option<String>,
    #[clap(long="l1", help_heading = "PAIRED-END INPUT", help = "Newline delimited file; inputs are first pair of PE reads")]
    pub list_first_pair: Option<String>,
    #[clap(long="l2", help_heading = "PAIRED-END INPUT", help = "Newline delimited file; inputs are second pair of PE reads")]
    pub list_second_pair: Option<String>,
    #[clap(long="lS", help_heading = "INPUT", help = "Newline delimited file; read sketches are renamed to given sample names")]
    pub list_sample_names: Option<String>,
    #[clap(multiple=true, short='S', long="sample-names", help_heading = "INPUT", help = "Read sketches are renamed to given sample names")]
    pub sample_names: Option<Vec<String>>,

    #[clap(short, default_value_t = 31,help_heading = "ALGORITHM", help ="Value of k. Only k = 21, 31 are currently supported")]
    pub k: usize,
    #[clap(short, default_value_t = 200, help_heading = "ALGORITHM", help = "Subsampling rate")]
    pub c: usize,
    #[clap(short, default_value_t = 3, help = "Number of threads")]
    pub threads: usize,
    #[clap(long="ram-barrier", help = "Stop multi-threaded read sketching when (virtual) RAM is past this value (in GB). Does NOT guarantee max RAM limit", hidden=true)]
    pub max_ram: Option<usize>,
    #[clap(long="trace", help = "Trace output (caution: very verbose)")]
    pub trace: bool,
    #[clap(long="debug", help = "Debug output")]
    pub debug: bool,


    #[clap(long="no-dedup", help_heading = "ALGORITHM", help = "Disable read deduplication procedure. Reduces memory; not recommended for Illumina data")]
    pub no_dedup: bool,
    #[clap(long="disable-profiling", help_heading = "ALGORITHM", help = "Disable sylph profile usage for databases; may decrease size and make sylph query slightly faster", hidden=true)]
    pub no_pseudotax: bool,
    #[clap(long="min-spacing", default_value_t = 30, help_heading = "ALGORITHM", help = "Minimum spacing between selected k-mers on the genomes")]
    pub min_spacing_kmer: usize,
    #[clap(long="fpr", default_value_t = DEFAULT_FPR, help_heading = "ALGORITHM", help = "False positive rate for read deduplication hashing; valid values in [0,1).")]
    pub fpr: f64,
    #[clap(short='1',long="first-pairs", multiple=true, help_heading = "PAIRED-END INPUT", help = "First pairs for paired end reads")]
    pub first_pair: Vec<String>,
    #[clap(short='2',long="second-pairs", multiple=true, help_heading = "PAIRED-END INPUT", help = "Second pairs for paired end reads")]
    pub second_pair: Vec<String>,
}

#[derive(Args)]
pub struct ContainArgs {
    #[clap(multiple=true, help = "Pre-sketched *.syldb/*.sylsp files. Raw single-end fastq/fasta are allowed and will be automatically sketched to .sylsp/.syldb")]
    pub files: Vec<String>,

    
    #[clap(short='l',long="list", help = "Newline delimited file of file inputs",help_heading = "INPUT/OUTPUT")]
    pub file_list: Option<String>,

    #[clap(long,default_value_t = 3., help_heading = "ALGORITHM", help = "Minimum k-mer multiplicity needed for coverage correction. Higher values give more precision but lower sensitivity")]
    pub min_count_correct: f64,
    #[clap(short='M',long,default_value_t = 50., help_heading = "ALGORITHM", help = "Exclude genomes with less than this number of sampled k-mers")]
    pub min_number_kmers: f64,
    #[clap(short, long="minimum-ani", help_heading = "ALGORITHM", help = "Minimum adjusted ANI to consider (0-100). Default is 90 for query and 95 for profile. Values below 95 for profile will give inaccurate results." )]
    pub minimum_ani: Option<f64>,
    #[clap(short, default_value_t = 3, help = "Number of threads")]
    pub threads: usize,
    #[clap(short='s', long="sample-threads", help = "Number of samples to be processed concurrently. Default: (# of total threads / 3) + 1 for profile, 1 for query")]
    pub sample_threads: Option<usize>,
    #[clap(long="trace", help = "Trace output (caution: very verbose)")]
    pub trace: bool,
    #[clap(long="debug", help = "Debug output")]
    pub debug: bool,

    #[clap(long="estimate-read-counts", help_heading = "ALGORITHM", help = "Very roughly estimate read counts in the 'Sequence_abundance' column instead of relative abundance. This forces `-u`, which may have caveats for long reads and complex environments.")]
    pub estimate_read_counts: bool,

    #[clap(short='u', long="estimate-unknown", help_heading = "ALGORITHM", help = "Estimate true coverage and scale sequence abundance in `profile` by estimated unknown sequence percentage" )]
    pub estimate_unknown: bool,
    
    #[clap(short='I',long="read-seq-id", help_heading = "ALGORITHM", help = "Sequence identity (%) of reads. Only used with the -u option; overrides automatic detection.")]
    pub seq_id: Option<f64>,

    //#[clap(short='l', long="read-length", help_heading = "ALGORITHM", help = "Read length (single-end length for pairs). Only necessary for short-read coverages when using --estimate-unknown. Not needed for long-reads" )]
    //pub read_length: Option<usize>,

    #[clap(short='R', long="redundancy-threshold", help_heading = "ALGORITHM", help = "Removes redundant genomes up to a rough ANI percentile when profiling", default_value_t = 99.0, hidden=true)]
    pub redundant_ani: f64,

    #[clap(short='r',long="reads", multiple=true, help = "Single-end raw reads (fastx/gzip)", display_order = 1, help_heading = "SKETCHING")]
    pub reads: Vec<String>,

    #[clap(short='1', long="first-pairs", multiple=true, help = "First pairs for raw paired-end reads (fastx/gzip)", help_heading = "SKETCHING")]
    pub first_pair: Vec<String>,

    #[clap(short='2', long="second-pairs", multiple=true, help = "Second pairs for raw paired-end reads (fastx/gzip)", help_heading = "SKETCHING")]
    pub second_pair: Vec<String>,

    #[clap(short, default_value_t = 200, help_heading = "SKETCHING", help = "Subsampling rate. Does nothing for pre-sketched files")]
    pub c: usize,
    #[clap(short, default_value_t = 31, help_heading = "SKETCHING", help = "Value of k. Only k = 21, 31 are currently supported. Does nothing for pre-sketched files")]
    pub k: usize,
    #[clap(short,long="individual-records", help_heading = "SKETCHING", help = "Use individual records (e.g. contigs) for database construction instead. Does nothing for pre-sketched files")]
    pub individual: bool,
    #[clap(long="min-spacing", default_value_t = 30, help_heading = "SKETCHING", help = "Minimum spacing between selected k-mers on the database genomes. Does nothing for pre-sketched files")]
    pub min_spacing_kmer: usize,

    #[clap(short='o',long="output-file", help = "Output to this file (TSV format). [default: stdout]", help_heading="INPUT/OUTPUT")]
    pub out_file_name: Option<String>,
    #[clap(long="log-reassignments", help = "Output information for how k-mers for genomes are reassigned during `profile`. Caution: can be verbose and slows down computation.")]
    pub log_reassignments: bool,


    //Hidden options that are embedded in the args but no longer used... 
    #[clap(short, hidden=true, long="pseudotax", help_heading = "ALGORITHM", help = "Pseudo taxonomic classification mode. This removes shared k-mers between species by assigning k-mers to the highest ANI species. Requires sketches made without the --disable-profiling option" )]
    pub pseudotax: bool,
    #[clap(long="ratio", hidden=true)]
    pub ratio: bool,
    #[clap(long="mme", hidden=true)]
    pub mme: bool,
    #[clap(long="mle", hidden=true)]
    pub mle: bool,
    #[clap(long="nb", hidden=true)]
    pub nb: bool,
    #[clap(long="no-ci", help = "Do not output confidence intervals", hidden=true)]
    pub no_ci: bool,
    #[clap(long="no-adjust", hidden=true)]
    pub no_adj: bool,
    #[clap(long="mean-coverage", help_heading = "ALGORITHM", help = "Use the robust mean coverage estimator instead of median estimator", hidden=true )]
    pub mean_coverage: bool,

}

#[derive(Args)]
pub struct InspectArgs {
    #[clap(multiple=true, help = "Pre-sketched *.syldb/*.sylsp files.")]
    pub files: Vec<String>,
    #[clap(short='o',long="output-file", help = "Output to this file (YAML format). [default: stdout]")]
    pub out_file_name: Option<String>,

}
    


================================================
FILE: src/constants.rs
================================================
pub const EM_ABUND_CUTOFF: f64 = 0.01;
pub const PAIR_REGEX: &str = r"(.+)(_?1|_?2)(\..+)";
pub const CUTOFF_PVALUE:f64 = 0.9999999999;
pub const SAMPLE_SIZE_CUTOFF: usize = 25;
pub const MEDIAN_ANI_THRESHOLD: f64 = 2.;
pub const QUERY_FILE_SUFFIX: &str = ".syldb";
pub const SAMPLE_FILE_SUFFIX: &str = ".sylsp";
pub const QUERY_FILE_SUFFIX_VALID : [&str;2] = [QUERY_FILE_SUFFIX, ".sylqueries"];
pub const SAMPLE_FILE_SUFFIX_VALID : [&str;2] = [SAMPLE_FILE_SUFFIX, ".sylsample"];
pub const MIN_ANI_DEF: f64 = 0.9;
pub const MIN_ANI_P_DEF: f64 = 0.95;
pub const MAX_MEDIAN_FOR_MEAN_FINAL_EST: f64 = 15.;
pub const DEREP_PROFILE_ANI: f64 = 0.975;
pub const MAX_DEDUP_COUNT: u32 = 4;
pub const MAX_DEDUP_LEN: usize = 10000000;
pub const DEFAULT_FPR: f64 = 0.0001;
pub const MED_KMER_FOR_ID_EST: f64 = 3.;


================================================
FILE: src/contain.rs
================================================
use crate::cmdline::*;
use std::path::Path;
use std::io::prelude::*;
use std::io;
use std::io::BufWriter;
use fxhash::FxHashMap;
use crate::constants::*;
use crate::inference::*;
use crate::sketch::*;
use crate::types::*;
use log::*;
use rayon::prelude::*;
use statrs::distribution::{DiscreteCDF, Poisson};
use std::fs::File;
use std::io::BufReader;
use std::sync::Mutex;

fn print_ani_result(ani_result: &AniResult, pseudotax: bool, writer: &mut Box<dyn Write + Send>) {
    let print_final_ani = format!("{:.2}", f64::min(ani_result.final_est_ani * 100., 100.));
    let lambda_print;
    if let AdjustStatus::Lambda(lambda) = ani_result.lambda {
        lambda_print = format!("{:.3}", lambda);
    } else if ani_result.lambda == AdjustStatus::High {
        lambda_print = "HIGH".to_string();
    } else {
        lambda_print = "LOW".to_string();
    }
    let low_ani = ani_result.ani_ci.0;
    let high_ani = ani_result.ani_ci.1;
    let low_lambda = ani_result.lambda_ci.0;
    let high_lambda = ani_result.lambda_ci.1;

    let ci_ani;
    if low_ani.is_none() || high_ani.is_none() {
        ci_ani = "NA-NA".to_string();
    } else {
        ci_ani = format!(
            "{:.2}-{:.2}",
            low_ani.unwrap() * 100.,
            high_ani.unwrap() * 100.
        );
    }

    let ci_lambda;
    if low_lambda.is_none() || high_lambda.is_none() {
        ci_lambda = "NA-NA".to_string();
    } else {
        ci_lambda = format!("{:.2}-{:.2}", low_lambda.unwrap(), high_lambda.unwrap());
    }


    //"Sample_file\tQuery_file\tTaxonomic_abundance\tSequence_abundance\tAdjusted_ANI\tEff_cov\tANI_5-95_percentile\tEff_lambda\tLambda_5-95_percentile\tMedian_cov\tMean_cov_geq1\tContainment_ind\tNaive_ANI\tContig_name",

    if !pseudotax{
        writeln!(writer, 
            "{}\t{}\t{}\t{:.3}\t{}\t{}\t{}\t{:.0}\t{:.3}\t{}/{}\t{:.2}\t{}",
            ani_result.seq_name,
            ani_result.gn_name,
            print_final_ani,
            ani_result.final_est_cov,
            ci_ani,
            lambda_print,
            ci_lambda,
            ani_result.median_cov,
            ani_result.mean_cov,
            ani_result.containment_index.0,
            ani_result.containment_index.1,
            ani_result.naive_ani * 100.,
            ani_result.contig_name,
        ).expect("Error writing to file");
    }
    else{
        writeln!(writer,
            "{}\t{}\t{:.4}\t{:.4}\t{}\t{:.3}\t{}\t{}\t{}\t{:.0}\t{:.3}\t{}/{}\t{:.2}\t{}\t{}",
            ani_result.seq_name,
            ani_result.gn_name,
            ani_result.rel_abund.unwrap(),
            ani_result.seq_abund.unwrap(),
            print_final_ani,
            ani_result.final_est_cov,
            ci_ani,
            lambda_print,
            ci_lambda,
            ani_result.median_cov,
            ani_result.mean_cov,
            ani_result.containment_index.0,
            ani_result.containment_index.1,
            ani_result.naive_ani * 100.,
            ani_result.kmers_lost.unwrap(),
            ani_result.contig_name,
        ).expect("Error writing to file");

    }
}

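/// Split the positions `0..indices.len()` into consecutive chunks of at most `steps` elements.
/// Only the length of `indices` is used; `steps` must be non-zero.
fn get_chunks(indices: &Vec<usize>, steps: usize) -> Vec<Vec<usize>>{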
    let mut start = 0;
    let mut end = steps;
    let len = indices.len();
    let mut return_chunks = vec![];

    while start < len {
        if end > len {
            end = len;
        }

        let chunk: Vec<usize> = (start..end).collect();
        start = end;
        end += steps;
        return_chunks.push(chunk);
    }
    return_chunks
}

pub fn contain(mut args: ContainArgs, pseudotax_in: bool) {

    if pseudotax_in{
        args.pseudotax = true;
    }

    let level;
    if args.trace {
        level = log::LevelFilter::Trace;
    } else if args.debug {
        level = log::LevelFilter::Debug;
    }
    else{
        level = log::LevelFilter::Info;
    }
    
    simple_logger::SimpleLogger::new()
        .with_level(level)
        .init()
        .unwrap();

    rayon::ThreadPoolBuilder::new()
        .num_threads(args.threads)
        .build_global()
        .unwrap();

    let out_writer = match args.out_file_name {
        Some(ref x) => {
            let path = Path::new(&x);
            Box::new(BufWriter::new(File::create(&path).unwrap())) as Box<dyn Write + Send>
        }
        None => Box::new(BufWriter::new(io::stdout())) as Box<dyn Write + Send>,
    };

    if args.estimate_read_counts{
        args.estimate_unknown = true;
        log::info!("--estimate-read-counts detected, also enabling -u. Sequence_abundance column will be set to estimated read counts, not abundance. This is still experimental.");
    }

    log::info!("Obtaining sketches...");
    let mut genome_sketch_files = vec![];
    let mut genome_files = vec![];
    let mut read_sketch_files = vec![];
    let mut read_files = vec![];

    let mut all_files = args.files.clone();

    if let Some(ref newline_file) = args.file_list{
        let file = File::open(newline_file).unwrap();
        let reader = BufReader::new(file);
        for line in reader.lines() {
            all_files.push(line.unwrap());
        }

    }


    for file in all_files.iter(){

        let mut genome_sketch_good_suffix = false;
        for suff in QUERY_FILE_SUFFIX_VALID{
            if file.ends_with(suff){
                genome_sketch_good_suffix = true;
                break
            }
        }

        let mut sample_sketch_good_suffix = false;
        for suff in SAMPLE_FILE_SUFFIX_VALID{
            if file.ends_with(suff){
                sample_sketch_good_suffix = true;
                break
            }
        }

        if genome_sketch_good_suffix{
            genome_sketch_files.push(file);
        } else if sample_sketch_good_suffix{
            read_sketch_files.push(file);
        } else if is_fasta(&file) {
            genome_files.push(file);
        } else if is_fastq(&file) {
            read_files.push(vec![file]);
        } else {
            warn!(
                "{} file extension is not a sketch or a fasta/fastq file.",
                &file
            );
        }
    }

    if args.first_pair.len() != args.second_pair.len() {
        error!("Different number of paired sequences (-1, -2) for sketching. Exiting.");
        std::process::exit(1);
    }

    // zip together the first and second pair files, push them to read_files
    for (first, second) in args.first_pair.iter().zip(args.second_pair.iter()) {
        read_files.push(vec![first,second]);
    }

    for read in args.reads.iter() {
        read_files.push(vec![read]);
    }

    if genome_sketch_files.is_empty() && genome_files.is_empty(){
        log::error!("No genome files found; see sylph query/profile -h for help. Exiting");
        std::process::exit(1);
    }

    if read_sketch_files.is_empty() && read_files.is_empty(){
        log::error!("No read files found; see sylph query/profile -h for help. Exiting");
        std::process::exit(1);
    }

    let genome_sketches = get_genome_sketches(&args, &genome_sketch_files, &genome_files);
    let genome_index_vec = (0..genome_sketches.len()).collect::<Vec<usize>>();
    log::info!("Finished obtaining genome sketches.");

    if genome_sketches.is_empty() {
        log::error!("No genome sketches found; see sylph query/profile -h for help. Exiting");
        std::process::exit(1);
    }

    if genome_sketches.first().unwrap().pseudotax_tracked_nonused_kmers.is_none() && args.pseudotax{
        log::error!("Attempting profiling, but *.syldb was sketched with the --disable-profiling option. Exiting");
        std::process::exit(1);
    }

    let num_raw_read_files = read_files.len();
    let step;
    if let Some(sample_threads) = args.sample_threads{
        if sample_threads > 0{
            step = sample_threads;
        }
        else{
            step = 1;
        }
    }
    else{
        if args.pseudotax{
            step = usize::max(args.threads/3 + 1, usize::min(num_raw_read_files, args.threads))
        }
        else{
            step = usize::max(1, usize::min(num_raw_read_files, args.threads))
        }
    }

    let read_sketch_files_as_vec = read_sketch_files.clone().into_iter().map(|x| vec![x]).collect::<Vec<Vec<&String>>>();
    read_files.extend(read_sketch_files_as_vec);
    let sequence_index_vec = (0..read_files.len()).collect::<Vec<usize>>();
    let out_writer:Mutex<Box<dyn Write + Send>> = Mutex::new(out_writer);

    let chunks = get_chunks(&sequence_index_vec, step);

    print_header(args.pseudotax,&mut *out_writer.lock().unwrap(), args.estimate_unknown);
    chunks.into_iter().for_each(|chunk| {
        chunk.into_par_iter().for_each(|j|{
            let is_sketch = j >= read_files.len() - read_sketch_files.len();
            let sequence_sketch = get_seq_sketch(&args, &read_files[j], is_sketch, genome_sketches[0].c, genome_sketches[0].k);
            if sequence_sketch.is_some(){
                let first_read_file = read_files[j][0];
                let sequence_sketch = sequence_sketch.unwrap();
                
                let kmer_id_opt;
                if args.seq_id.is_some(){
                    kmer_id_opt = Some((args.seq_id.unwrap()/100.).powf(sequence_sketch.k as f64));
                }
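                else{
                    kmer_id_opt = get_kmer_identity(&sequence_sketch, args.estimate_unknown);
                    // Only log when an estimate exists; unwrapping a None here would panic under --debug.
                    if let Some(kmer_id) = kmer_id_opt{
                        log::debug!("{} has estimated identity {:.3}.", &first_read_file, kmer_id.powf(1./sequence_sketch.k as f64) * 100.);
                    }
                }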
                
                let stats_vec_seq: Mutex<Vec<AniResult>> = Mutex::new(vec![]);
                genome_index_vec.par_iter().for_each(|i| {
                    let genome_sketch = &genome_sketches[*i];
                    let res = get_stats(&args, &genome_sketch, &sequence_sketch, None, args.log_reassignments);
                    if res.is_some() {
                        //res.as_mut().unwrap().genome_sketch_index = *i;
                        stats_vec_seq.lock().unwrap().push(res.unwrap());
                        
                    }
                });

                let mut stats_vec_seq = stats_vec_seq.into_inner().unwrap();
                estimate_true_cov(&mut stats_vec_seq, kmer_id_opt, args.estimate_unknown, sequence_sketch.mean_read_length, sequence_sketch.k);

                if args.pseudotax{
                    log::info!("{} taxonomic profiling; reassigning k-mers for {} genomes...", &first_read_file, stats_vec_seq.len());
                    let winner_map = winner_table(&stats_vec_seq, args.log_reassignments);
                    let remaining_genomes = stats_vec_seq.iter().map(|x| x.genome_sketch).collect::<Vec<&GenomeSketch>>();
                    let stats_vec_seq_2 = Mutex::new(vec![]);
                    remaining_genomes.into_par_iter().for_each(|genome_sketch|{
                        let res = get_stats(&args, &genome_sketch, &sequence_sketch, Some(&winner_map), args.log_reassignments);
                        if res.is_some() {
                            stats_vec_seq_2.lock().unwrap().push(res.unwrap());
                        }
                    });
                    stats_vec_seq = derep_if_reassign_threshold(&stats_vec_seq, stats_vec_seq_2.into_inner().unwrap(), args.redundant_ani, sequence_sketch.k);
                    //stats_vec_seq = stats_vec_seq_2.into_inner().unwrap();
                    estimate_true_cov(&mut stats_vec_seq, kmer_id_opt, args.estimate_unknown, sequence_sketch.mean_read_length, sequence_sketch.k);
                    log::info!("{} has {} genomes passing profiling threshold. ", &first_read_file, stats_vec_seq.len());

                    let mut bases_explained = 1.;
                    if args.estimate_unknown{
                        bases_explained = estimate_covered_bases(&stats_vec_seq, &sequence_sketch, sequence_sketch.mean_read_length, sequence_sketch.k);
                        log::info!("{} has {:.2}% of reads detected in database by profile", &first_read_file, bases_explained * 100.);
                        
                    }

                    let total_cov = stats_vec_seq.iter().map(|x| x.final_est_cov).sum::<f64>();
                    let total_seq_cov = stats_vec_seq.iter().map(|x| x.final_est_cov * x.genome_sketch.gn_size as f64).sum::<f64>();
                    for thing in stats_vec_seq.iter_mut(){
                        thing.rel_abund = Some(thing.final_est_cov/total_cov * 100.);
                    }
                    for thing in stats_vec_seq.iter_mut(){
                        if args.estimate_read_counts{
                            thing.seq_abund = Some((thing.final_est_cov * thing.genome_sketch.gn_size as f64 / sequence_sketch.mean_read_length * bases_explained).round());
                        }
                        else{
                            let seq_abund = thing.final_est_cov * thing.genome_sketch.gn_size as f64 / total_seq_cov * 100. * bases_explained;
                            thing.seq_abund = Some(seq_abund);
                        }
                    }
                }

                if args.pseudotax{
                    stats_vec_seq.sort_by(|x,y| y.rel_abund.unwrap().partial_cmp(&x.rel_abund.unwrap()).unwrap());
                }
                else{
                    stats_vec_seq.sort_by(|x,y| y.final_est_ani.partial_cmp(&x.final_est_ani).unwrap());
                }
                
                let mut out_writer = out_writer.lock().unwrap();
                for res in stats_vec_seq{
                    print_ani_result(&res, args.pseudotax, &mut *out_writer);
                }
            }
            if read_files[j].len() > 1{
                log::info!("Finished paired sample {}.", &read_files[j][0]);
            }
            else{
                log::info!("Finished sample {}.", &read_files[j][0]);
            }
        });
    });

    log::info!("sylph finished.");
}

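/// Drop genomes that lost too many k-mers during reassignment. A genome from `results_new` is
/// removed when the drop in its containment numerator relative to `results_old` is at least
/// (ani_thresh / 100)^k times its containment denominator.
fn derep_if_reassign_threshold<'a>(results_old: &Vec<AniResult>, results_new: Vec<AniResult<'a>>, ani_thresh: f64, k: usize) -> Vec<AniResult<'a>>{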
    let ani_thresh = ani_thresh/100.;

    let mut gn_sketch_to_contain = FxHashMap::default();
    for result in results_old.iter(){
        gn_sketch_to_contain.insert(result.genome_sketch, result);
    }

    let threshold = f64::powf(ani_thresh, k as f64);
    let mut return_vec = vec![];
    for result in results_new.into_iter(){
        let old_res = &gn_sketch_to_contain[result.genome_sketch];
        let num_kmer_reassign = (old_res.containment_index.0 - result.containment_index.0) as f64;
        let reass_thresh = threshold * result.containment_index.1 as f64;
        if num_kmer_reassign < reass_thresh{
            return_vec.push(result);
        }
        else{
            log::debug!("genome {} had num k-mers reassigned = {}, threshold was {}, removing.", result.gn_name, num_kmer_reassign, reass_thresh);
        }
    }
    return return_vec;
}

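/// With `estimate_unknown` set and a k-mer identity available, rescale each coverage by
/// `read_length / (read_length - k + 1)` and divide by the k-mer identity. Otherwise a no-op.
fn estimate_true_cov(results: &mut Vec<AniResult>, kmer_id_opt: Option<f64>, 
                     estimate_unknown: bool, read_length: f64, k: usize){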
    let mut multiplier = 1.;
    if estimate_unknown{
        multiplier = read_length / (read_length - k as f64 + 1.);
    }
    if estimate_unknown && kmer_id_opt.is_some(){
        let id = kmer_id_opt.unwrap();
        for res in results.iter_mut(){
            res.final_est_cov = res.final_est_cov / id * multiplier ;
        }
    }
}

fn estimate_covered_bases(results: &Vec<AniResult>, sequence_sketch: &SequencesSketch, read_length: f64, k: usize) -> f64{
    let multiplier = read_length / (read_length - (k as f64) + 1.);

    let mut num_covered_bases = 0.;
    for res in results.iter(){
        num_covered_bases += (res.genome_sketch.gn_size as f64) * res.final_est_cov
    }
    let mut num_total_counts = 0;
    for count in sequence_sketch.kmer_counts.values(){
        num_total_counts += *count as usize;
    }
    let num_tentative_bases = sequence_sketch.c * num_total_counts;
    let num_tentative_bases = num_tentative_bases as f64 * multiplier;
    if num_tentative_bases == 0.{
        return 0.;
    }
    return f64::min(num_covered_bases as f64 / num_tentative_bases, 1.);
}

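/// Map every k-mer of the candidate genomes to the genome with the highest adjusted ANI that
/// contains it. Ties keep the first genome seen; the bool records whether the entry was overwritten.
fn winner_table<'a>(results : &'a Vec<AniResult>, log_reassign: bool) -> FxHashMap<Kmer, (f64,&'a GenomeSketch, bool)> {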
    let mut kmer_to_genome_map : FxHashMap<_,_> = FxHashMap::default();
    for res in results.iter(){
        //let gn_sketch = &genome_sketches[res.genome_sketch_index];
        let gn_sketch = res.genome_sketch;
        for kmer in gn_sketch.genome_kmers.iter(){
            let v = kmer_to_genome_map.entry(*kmer).or_insert((res.final_est_ani, res.genome_sketch, false));
            if res.final_est_ani > v.0{
                *v = (res.final_est_ani, gn_sketch, true);
            }
        }
        
        if gn_sketch.pseudotax_tracked_nonused_kmers.is_some(){
            for kmer in gn_sketch.pseudotax_tracked_nonused_kmers.as_ref().unwrap().iter(){
                let v = kmer_to_genome_map.entry(*kmer).or_insert((res.final_est_ani, res.genome_sketch, false));
                if res.final_est_ani > v.0{
                    *v = (res.final_est_ani, gn_sketch, true);
                }
            }
        }
    }

    //log reassigned kmers
    if log_reassign{
        log::info!("------------- Logging k-mer reassignments -----------------");
        let mut sketch_to_index = FxHashMap::default();
        for (i,res) in results.iter().enumerate(){
            log::info!("Index\t{}\t{}\t{}", i, res.genome_sketch.file_name, res.genome_sketch.first_contig_name);
            sketch_to_index.insert(res.genome_sketch, i);
        }
        (0..results.len()).into_par_iter().for_each(|i|{
            let res = &results[i];
            let mut reassign_edge_map = FxHashMap::default();
            for kmer in res.genome_sketch.genome_kmers.iter(){
                let value = kmer_to_genome_map[kmer].1;
                if value != res.genome_sketch{
                    let edge_count = reassign_edge_map.entry((sketch_to_index[value],i)).or_insert(0);
                    *edge_count += 1;
                }
            }
            for (key,val) in reassign_edge_map{
                if val > 10{
                    log::info!("{}->{}\t{}\tkmers reassigned", key.0, key.1, val);
                }
            }
        });
    }

    return kmer_to_genome_map;
}

fn print_header(pseudotax: bool, writer: &mut Box<dyn Write + Send>, estimate_unknown: bool) {
    if !pseudotax{
        writeln!(writer,
            //"Sample_file\tQuery_file\tAdjusted_ANI\tNaive_ANI\tANI_5-95_percentile\tEff_cov\tEff_lambda\tLambda_5-95_percentile\tMedian_cov\tMean_cov_geq1\tContainment_ind\tContig_name",
            "Sample_file\tGenome_file\tAdjusted_ANI\tEff_cov\tANI_5-95_percentile\tEff_lambda\tLambda_5-95_percentile\tMedian_cov\tMean_cov_geq1\tContainment_ind\tNaive_ANI\tContig_name",
            ).expect("Error writing to file.");
    }
    else{
        let cov_head;
        if estimate_unknown{
            cov_head = "True_cov";
        }
        else{
            cov_head = "Eff_cov";
        }
        writeln!(writer,
            "Sample_file\tGenome_file\tTaxonomic_abundance\tSequence_abundance\tAdjusted_ANI\t{}\tANI_5-95_percentile\tEff_lambda\tLambda_5-95_percentile\tMedian_cov\tMean_cov_geq1\tContainment_ind\tNaive_ANI\tkmers_reassigned\tContig_name", cov_head
            ).expect("Error writing to file.");
    }
}

fn get_genome_sketches(
    args: &ContainArgs,
    genome_sketch_files: &Vec<&String>,
    genome_files: &Vec<&String>,
) -> Vec<GenomeSketch> {
    let mut lowest_genome_c = None;
    let mut current_k = None;

    let genome_sketches = Mutex::new(vec![]);

    for genome_sketch_file in genome_sketch_files {
        let file = File::open(genome_sketch_file).expect(&format!("The sketch `{}` could not be opened. Exiting", genome_sketch_file));
        let genome_reader = BufReader::with_capacity(10_000_000, file);
        let genome_sketches_vec: Vec<GenomeSketch> = bincode::deserialize_from(genome_reader)
            .expect(&format!(
                "The sketch `{}` is not a valid sketch. Perhaps it is an older, incompatible version ",
                &genome_sketch_file
            ));
        if genome_sketches_vec.is_empty() {
            continue;
        }
        let c = genome_sketches_vec.first().unwrap().c;
        let k = genome_sketches_vec.first().unwrap().k;
        if lowest_genome_c.is_none() {
            lowest_genome_c = Some(c);
        } else if lowest_genome_c.unwrap() < c {
            lowest_genome_c = Some(c);
        }
        if current_k.is_none() {
            current_k = Some(genome_sketches_vec.first().unwrap().k);
        } else if current_k.unwrap() != k {
            error!("Query sketches have inconsistent -k. Exiting.");
            std::process::exit(1);
        }
        genome_sketches.lock().unwrap().extend(genome_sketches_vec);
    }

    genome_files.into_par_iter().for_each(|genome_file|{
        if lowest_genome_c.is_some() && lowest_genome_c.unwrap() < args.c{
            error!("Value of -c for contain is {} -- greater than the smallest value of -c for a genome sketch {}. Continuing without sketching.", args.c, lowest_genome_c.unwrap());
        }
        else if current_k.is_some() && current_k.unwrap() != args.k{
            error!("-k {} is not equal to -k {} found in sketches. Continuing without sketching.", args.k, current_k.unwrap());
        }
        else {
            if args.individual{
                let indiv_gn_sketches = sketch_genome_individual(args.c, args.k, genome_file, args.min_spacing_kmer, args.pseudotax);
                genome_sketches.lock().unwrap().extend(indiv_gn_sketches);

            }
            else{
                let genome_sketch_opt = sketch_genome(args.c, args.k, &genome_file, args.min_spacing_kmer, args.pseudotax);
                if genome_sketch_opt.is_some() {
                    genome_sketches.lock().unwrap().push(genome_sketch_opt.unwrap());
                }
            }
        }
    });

    return genome_sketches.into_inner().unwrap();
}

fn get_seq_sketch(
    args: &ContainArgs,
    read_file: &Vec<&String>,
    is_sketch_file: bool,
    genome_c: usize,
    genome_k: usize,
) -> Option<SequencesSketch> {
    if is_sketch_file {
        let read_file = read_file[0];
        let read_sketch_file = read_file;
        let file = File::open(read_sketch_file).expect(&format!(
            "The sketch `{}` could not be opened",
            &read_sketch_file
        ));
        let read_reader = BufReader::with_capacity(10_000_000, file);
        let read_sketch: SequencesSketch = bincode::deserialize_from(read_reader).expect(
            &format!("The sketch `{}` is not a valid sketch. Perhaps it is an older incompatible version ", read_sketch_file),
        );
        if read_sketch.c > genome_c {
            error!("{} value of -c is {}; this is greater than the smallest value of -c = {} for a genome sketch. Exiting.", read_file, read_sketch.c, genome_c);
            return None;
        }
        else if read_sketch.c < genome_c{
            info!("{} value of -c for reads is {}; this is smaller than the -c for a genome sketch. Using the larger -c {} instead.", read_file, read_sketch.c,  genome_c);
        }

        return Some(read_sketch);
    } else {
        if args.c < genome_c{
            info!("{} value of -c for reads is {}; this is smaller than the -c for a genome sketch. Using the larger -c {} instead.", read_file[0], args.c,  genome_c);
        }
        if genome_c < args.c {
            error!("{} error: value of -c for contain = {} -- greater than the smallest value of -c for a genome sketch = {}. Continuing without sketching.", read_file[0], args.c, genome_c);
            return None;
        } 
        else if genome_k != args.k {
            error!(
                "{} -k {} is not equal to -k {} found in sketches. Continuing without sketching.",
                read_file[0], args.k, genome_k
            );
            return None;
        } else {
            if read_file.len() == 1{
                let read_sketch_opt = sketch_sequences_needle(&read_file[0], args.c, args.k, None, false);
                return read_sketch_opt;
            }
            else if read_file.len() == 2{
                let read_sketch_opt = sketch_pair_sequences(&read_file[0], &read_file[1], args.c, args.k, None, false, DEFAULT_FPR);
                return read_sketch_opt;
            }
            else{
                panic!("Internal Error: read_file has length {}. Something went wrong...", read_file.len());
            }
        }
    }
}

fn get_stats<'a>(
    args: &ContainArgs,
    genome_sketch: &'a GenomeSketch,
    sequence_sketch: &SequencesSketch,
    winner_map: Option<&FxHashMap<Kmer, (f64,& GenomeSketch, bool)>>,
    log_reassign: bool
) -> Option<AniResult<'a>> {
    if genome_sketch.k != sequence_sketch.k {
        log::error!(
            "k parameter for reads {} != k parameter for genome {}",
            sequence_sketch.k,
            genome_sketch.k
        );
        std::process::exit(1);
    }
    if genome_sketch.c < sequence_sketch.c {
        log::error!(
            "c parameter for reads {} > c parameter for genome {}",
            sequence_sketch.c,
            genome_sketch.c
        );
        std::process::exit(1);
    }
    let mut contain_count = 0;
    let mut covs = vec![];
    let gn_kmers = &genome_sketch.genome_kmers;
    if (gn_kmers.len() as f64) < args.min_number_kmers{
        return None
    }

    let mut kmers_lost_count = 0;
    for kmer in gn_kmers.iter() {
        if sequence_sketch.kmer_counts.contains_key(kmer) {
            if sequence_sketch.kmer_counts[kmer] == 0{
                continue
            }
            if winner_map.is_some(){
                let map = &winner_map.unwrap();
                if map[kmer].1 != genome_sketch{
                    kmers_lost_count += 1;
                    continue
                }
                contain_count += 1;
                covs.push(sequence_sketch.kmer_counts[kmer]);

            }
            else{
                contain_count += 1;
                covs.push(sequence_sketch.kmer_counts[kmer]);
            }
        }
    }

    if covs.is_empty() {
        return None;
    }
    let naive_ani = f64::powf(
        contain_count as f64 / gn_kmers.len() as f64,
        1. / genome_sketch.k as f64,
    );
    covs.sort();
    //let covs = &covs[0..covs.len() * 99 / 100];
    let median_cov = covs[covs.len() / 2] as f64;
    let pois = Poisson::new(median_cov).unwrap();
    let mut max_cov = f64::MAX;
    if median_cov < 30.{
        for i in covs.len() / 2..covs.len(){
            let cov = covs[i];
            if pois.cdf(cov.into()) < CUTOFF_PVALUE {
                max_cov = cov as f64;
            } else {
                break;
            }
        }
    }
    log::trace!("COV VECTOR for {}/{}: {:?}, MAX_COV_THRESHOLD: {}", sequence_sketch.file_name, genome_sketch.file_name ,covs, max_cov);


    let mut full_covs = vec![0; gn_kmers.len() - contain_count];
    for cov in covs.iter() {
        if (*cov as f64) <= max_cov {
            full_covs.push(*cov);
        }
    }
    let var = var(&full_covs);
    if var.is_some(){
        log::trace!("VAR {} {}", var.unwrap(), genome_sketch.file_name);
    }
    let mean_cov = full_covs.iter().sum::<u32>() as f64 / full_covs.len() as f64;
    let geq1_mean_cov = full_covs.iter().sum::<u32>() as f64 / covs.len() as f64;

    let use_lambda;
    if median_cov > MEDIAN_ANI_THRESHOLD {
        use_lambda = AdjustStatus::High
    } else {
        let test_lambda;
        if args.ratio {
            test_lambda = ratio_lambda(&full_covs, args.min_count_correct)
        } else if args.mme {
            test_lambda = mme_lambda(&full_covs)
        } else if args.nb {
            test_lambda = binary_search_lambda(&full_covs)
        } else if args.mle {
            test_lambda = mle_zip(&full_covs, sequence_sketch.k as f64)
        } else {
            test_lambda = ratio_lambda(&full_covs, args.min_count_correct)
        };
        if test_lambda.is_none() {
            use_lambda = AdjustStatus::Low
        } else {
            use_lambda = AdjustStatus::Lambda(test_lambda.unwrap());
        }
    }

    let final_est_cov;

    if let AdjustStatus::Lambda(lam) = use_lambda {
        final_est_cov = lam
    } else if median_cov < MAX_MEDIAN_FOR_MEAN_FINAL_EST{
        final_est_cov = geq1_mean_cov;
    } else{
        if args.mean_coverage{
            final_est_cov = geq1_mean_cov;
        }
        else{
            final_est_cov = median_cov;
        }
    }

    let opt_lambda;
    if use_lambda == AdjustStatus::Low || use_lambda == AdjustStatus::High {
        opt_lambda = None
    } else {
        opt_lambda = Some(final_est_cov)
    };

    let opt_est_ani = ani_from_lambda(opt_lambda, mean_cov, sequence_sketch.k as f64, &full_covs);
    
    let final_est_ani;
    if opt_lambda.is_none() || opt_est_ani.is_none() || args.no_adj {
        final_est_ani = naive_ani;
    } else {
        final_est_ani = opt_est_ani.unwrap();
    }

    let min_ani = if args.minimum_ani.is_some() {args.minimum_ani.unwrap()/100. }
        else if args.pseudotax { MIN_ANI_P_DEF } 
        else { MIN_ANI_DEF };
    if final_est_ani < min_ani {
        if winner_map.is_some(){
            //Used to be > min ani, now it is not after reassignment
            if log_reassign{
                log::info!("Genome/contig {}/{} has ANI = {} < {} after reassigning {} k-mers ({} contained k-mers after reassign)", 
                    genome_sketch.file_name,
                    genome_sketch.first_contig_name,
                    final_est_ani * 100.,
                    min_ani * 100.,
                    kmers_lost_count,
                    contain_count)
            }

        }
        return None;
    }

    let (mut low_ani, mut high_ani, mut low_lambda, mut high_lambda) = (None, None, None, None);
    if !args.no_ci && opt_lambda.is_some() {
        let bootstrap = bootstrap_interval(&full_covs, sequence_sketch.k as f64, &args);
        low_ani = bootstrap.0;
        high_ani = bootstrap.1;
        low_lambda = bootstrap.2;
        high_lambda = bootstrap.3;
    }

    
    let seq_name;
    if let Some(sample) = &sequence_sketch.sample_name{
        seq_name = sample.clone();
    }
    else{
        seq_name = sequence_sketch.file_name.clone();
    }

    let kmers_lost;
    if winner_map.is_some(){
        kmers_lost = Some(kmers_lost_count)
    }
    else{
        kmers_lost = None;
    }

    let ani_result = AniResult {
        naive_ani,
        final_est_ani,
        final_est_cov,
        seq_name: seq_name,
        gn_name: genome_sketch.file_name.as_str(),
        contig_name: genome_sketch.first_contig_name.as_str(),
        mean_cov: geq1_mean_cov,
        median_cov,
        containment_index: (contain_count, gn_kmers.len()),
        lambda: use_lambda,
        ani_ci: (low_ani, high_ani),
        lambda_ci: (low_lambda, high_lambda),
        genome_sketch: genome_sketch,
        rel_abund: None,
        seq_abund: None,
        kmers_lost: kmers_lost,

    };
    //log::trace!("Other time {:?}", Instant::now() - start_t_initial);

    return Some(ani_result);
}


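/// Converts a coverage estimate `lambda` into a coverage-adjusted ANI.
/// The observed containment (fraction of nonzero entries in `full_cov`) is divided by
/// 1 - exp(-lambda), the probability that a Poisson(lambda) count is nonzero, and the
/// k-th root of the result is returned. Returns `None` if `lambda` is `None` or the
/// estimate is negative or NaN.
fn ani_from_lambda(lambda: Option<f64>, _mean: f64, k: f64, full_cov: &[u32]) -> Option<f64> {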
    if lambda.is_none() {
        return None;
    }
    let mut contain_count = 0;
    let mut _zero_count = 0;
    for x in full_cov {
        if *x != 0 {
            contain_count += 1;
        } else {
            _zero_count += 1;
        }
    }

    let lambda = lambda.unwrap();
    let adj_index =
        contain_count as f64 / (1. - f64::exp(-lambda)) / full_cov.len() as f64;
    //let ani = f64::powf(1. - pi, 1./k);
    let ani = f64::powf(adj_index, 1. / k);
    //Values above 1 are returned as-is; only negative or NaN estimates are rejected.
    if ani < 0. || ani.is_nan() {
        return None;
    }
    return Some(ani);
}

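/// Bootstrap intervals for ANI and lambda. Resamples `covs_full` with replacement 100 times
/// (fixed seed, so results are reproducible) and returns
/// (low_ani, high_ani, low_lambda, high_lambda), taken at roughly the 5th and 95th
/// percentiles of the successful resamples. Returns all `None` if fewer than 50 succeed.
fn bootstrap_interval(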
    covs_full: &Vec<u32>,
    k: f64,
    args: &ContainArgs,
) -> (Option<f64>, Option<f64>, Option<f64>, Option<f64>) {
    fastrand::seed(7);
    let num_samp = covs_full.len();
    let iters = 100;
    let mut res_ani = vec![];
    let mut res_lambda = vec![];

    for _ in 0..iters {
        let mut rand_vec = vec![];
        rand_vec.reserve(num_samp);
        for _ in 0..num_samp {
            rand_vec.push(covs_full[fastrand::usize(..covs_full.len())]);
        }
        let lambda;
        if args.ratio {
            lambda = ratio_lambda(&rand_vec, args.min_count_correct);
        } else if args.mme {
            lambda = mme_lambda(&rand_vec);
        } else if args.nb {
            lambda = binary_search_lambda(&rand_vec);
        } else if args.mle {
            lambda = mle_zip(&rand_vec, k);
        } else {
            lambda = ratio_lambda(&rand_vec,args.min_count_correct);
        }
        let ani = ani_from_lambda(lambda, mean(&rand_vec).unwrap().into(), k, &rand_vec);
        if ani.is_some() && lambda.is_some() {
            if !ani.unwrap().is_nan() && !lambda.unwrap().is_nan() {
                res_ani.push(ani);
                res_lambda.push(lambda);
            }
        }
    }
    res_ani.sort_by(|x, y| x.partial_cmp(y).unwrap());
    res_lambda.sort_by(|x, y| x.partial_cmp(y).unwrap());
    if res_ani.len() < 50 {
        return (None, None, None, None);
    }
    let suc = res_ani.len();
    let low_ani = res_ani[suc * 5 / 100 - 1];
    let high_ani = res_ani[suc * 95 / 100 - 1];
    let low_lambda = res_lambda[suc * 5 / 100 - 1];
    let high_lambda = res_lambda[suc * 95 / 100 - 1];

    return (low_ani, high_ani, low_lambda, high_lambda);
}


fn get_kmer_identity(seq_sketch: &SequencesSketch, estimate_unknown: bool) -> Option<f64>{

    if !estimate_unknown{
        return None
    }

    let mut median = 0;
    let mut mov_avg_median = 0.;
    let mut n = 1.;
    for count in seq_sketch.kmer_counts.values(){
        if *count > 1{
            if *count > median{
                median += 1;
            }
            else{
                median -= 1;
            }
            mov_avg_median += median as f64;
            n += 1.;
        }
    }

    mov_avg_median /= n;
    log::debug!("Estimated continuous median k-mer count for {} is {:.3}", &seq_sketch.file_name, mov_avg_median);
    
    let mut num_1s = 0;
    let mut num_not1s = 0;
    for count in seq_sketch.kmer_counts.values(){
        if *count == 1{
            num_1s += 1;
        }
        else{
            num_not1s += *count;
        }
    }
    //0.1 so no div by 0 error
    let eps = num_not1s as f64 / (num_not1s as f64 + num_1s as f64 + 0.1);
    //dbg!("Automatic id est, 1-to-2 ratio, 2-to-3", eps.powf(1./31.), num_1s as f64 / num_2s as f64, two_to_three);

    if mov_avg_median < MED_KMER_FOR_ID_EST && seq_sketch.mean_read_length < 400.{
        log::info!("{} short-read sample has high diversity compared to sequencing depth (approx. avg depth < 3). Using 99.5% as read accuracy estimate instead of automatic detection for --estimate-unknown.", &seq_sketch.file_name);
        return Some(0.995f64.powf(seq_sketch.k as f64));
    }

    if eps < 1.{
        return Some(eps)
    }
    else{
        return Some(1.)
    }
}


================================================
FILE: src/inference.rs
================================================
use statrs::function::gamma::*;
use fxhash::FxHashMap;
use crate::constants::*;
use std::collections::HashSet;

pub fn r_from_moments_lambda(m: f64, v: f64, lambda: f64) -> f64{
    //return (v / m - 1. - lambda + m) / lambda;
    //return 1000.;
    return lambda / (v - 1. + lambda + m)
}

pub fn ratio_formula(val: f64, r: f64, lambda: f64) -> f64{
    if r < 100.{
       return gamma(r + val + 1.) / (val + 1.) / gamma(r + val) * lambda / (r + lambda)
    }
    else{
       return (r + val + 1.) / (val + 1.) * lambda / (r + lambda)
    }
}

fn ratio_from_moments_lambda(val: f64, lambda: f64, m: f64, v: f64) -> Option<f64>{
    let r = r_from_moments_lambda(m, v, lambda);
    if r < 0.{
        return None;
    }
    return Some(ratio_formula(val, r, lambda));
}

pub fn binary_search_lambda(full_covs: &[u32]) -> Option<f64>{
    if full_covs.len() == 0{
        return None
    }
    let m = mean(full_covs).unwrap();
    let v = var(full_covs).unwrap();
    let mut _nonzero = 0;
    let mut ones = 0;
    let mut twos = 0;

    for x in full_covs{
        if *x != 0{
            _nonzero += 1;
        }
        if *x == 1{
            ones += 1;
        }
        else if *x == 2{
            twos += 1;
        }
    }



    let ratio_est = twos as f64 / ones as f64;

    let left = f64::max(0.003, m - 2.);
    let right = m + 5.;
    let endpoints = (left,right);
    let mut best = None;
    let mut best_val = 10000.;
    for i in 0..10000{
        let test = (endpoints.1 - endpoints.0)/10000. * i as f64 + endpoints.0;
        let proposed = ratio_from_moments_lambda(1.,test , m, v);
        if proposed.is_some(){
            let p = proposed.unwrap() - ratio_est;
            if p.abs() < best_val{
                best_val = p.abs();
                best = Some(test);
            }
        }
    }
    if best.is_none(){
        return None
    }
    let best = best.unwrap();
    let r = r_from_moments_lambda(m,v,best);
    dbg!(m,v, ratio_est, r, best);
//    let ratio_adj = 1. - pi;
//    dbg!(best,best_val, r, nonzero, full_covs.len(), f64::powf(nonzero as f64 / full_covs.len() as f64, 1./k));
//    dbg!(1. - m / best, f64::powf(1. - m/best, 1. / k as f64));
//    dbg!(zeros_from_nb, ratio_adj, f64::powf(ratio_adj, 1. / k));
//    let val = 1.;
//    let mut endpoints_output = (ratio_from_moments_lambda(1., endpoints.0, m, v) - ratio_est, ratio_from_moments_lambda(1., endpoints.1, m, v) - ratio_est);
//    if endpoints_output.0 < 0. || endpoints_output.1 > 0.{
//        return None;
//    }
//    dbg!(endpoints_output);
//    for _ in 0..100{
//        let proposed = ratio_from_moments_lambda(1., (endpoints.1 + endpoints.0)/2., m, v) - ratio_est;
//        if proposed > 0.{
//            endpoints.0 = (endpoints.1 + endpoints.0)/2.;
//            endpoints_output.0 = proposed;
//        }
//        else{
//            endpoints.1 = (endpoints.1 + endpoints.0)/2.;
//            endpoints_output.1 = proposed;
//        }
//        curr = endpoints_output.0;
//        dbg!(endpoints, endpoints_output, proposed);
//    }
//
    return Some(best);
}

/// Population variance (divides by n, not n - 1). Returns `None` for empty input.
pub fn var(data: &[u32]) -> Option<f64> {
    if data.is_empty(){
        return None
    }
    let mean = mean(data).unwrap();
    let mut var = 0.;
    for x in data{
        var += (*x as f64 - mean) * (*x as f64 - mean)
    }
    return Some(var / data.len() as f64);
}

pub fn mean(data: &[u32]) -> Option<f64> {
    let sum = data.iter().sum::<u32>() as f64;
    let count = data.len();

    match count {
        positive if positive > 0 => Some(sum / count as f64),
        _ => None,
    }
}

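/// Method-of-moments estimate of lambda: var / mean + mean - 1, which recovers the
/// Poisson rate under a zero-inflated Poisson model. Returns `None` when there are too
/// few distinct or nonzero counts, or when the estimate is negative.
pub fn mme_lambda(full_covs: &[u32]) -> Option<f64> {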
    let mut num_zero = 0;
    let mut count_set: HashSet<_> = HashSet::default();

    for x in full_covs {
        if *x == 0 {
            num_zero += 1;
        } else {
            count_set.insert(x);
        }
    }

    //Lack of information for inference, return None.
    if count_set.len() == 1 {
        return None;
    }

    if full_covs.len() - num_zero < SAMPLE_SIZE_CUTOFF {
        return None;
    }

    let mean = mean(&full_covs).unwrap();
    let var = var(&full_covs).unwrap();
    let lambda = var / mean + mean - 1.;
    if lambda < 0. {
        return None;
    } else {
        return Some(lambda as f64);
    }
}

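/// Maximum-likelihood estimate of lambda for a zero-inflated Poisson, solved with
/// `newton_raphson` from the fraction of zeros and the overall mean of `full_covs`.
pub fn mle_zip(full_covs: &[u32], _k: f64) -> Option<f64> {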
    let mut num_zero = 0;
    let mut count_set: HashSet<_> = HashSet::default();

    for x in full_covs {
        if *x == 0 {
            num_zero += 1;
        } else {
            count_set.insert(x);
        }
    }

    //Lack of information for inference, return None.
    if count_set.len() == 1 {
        return None;
    }

    if full_covs.len() - num_zero < SAMPLE_SIZE_CUTOFF {
        return None;
    }

    let mean = mean(&full_covs).unwrap();
    let lambda = newton_raphson(
        (num_zero as f32 / full_covs.len() as f32).into(),
        mean.into(),
    );
    //    log::trace!("lambda,pi {} {} {}", lambda,pi, num_zero as f64 / full_covs.len() as f64);
    let ret_lambda;
    if lambda < 0. || lambda.is_nan() {
        ret_lambda = None
    } else {
        ret_lambda = Some(lambda);
    }

    return ret_lambda;
}

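/// Newton-Raphson iteration for the root of
/// f(x) = (1 - rat) * x - mean * (1 - exp(-x)),
/// started at mean / (1 - rat) and run for a fixed 1000 steps (no convergence check).
fn newton_raphson(rat: f64, mean: f64) -> f64 {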
    let mut curr = mean / (1. - rat);
    //    dbg!(1. - mean,rat);
    for _ in 0..1000 {
        let t1 = (1. - rat) * curr;
        let t2 = mean * (1. - f64::exp(-curr));
        let t3 = 1. - rat;
        let t4 = mean * (f64::exp(-curr));
        curr = curr - (t1 - t2) / (t3 - t4);
    }
    return curr;
}

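/// Ratio estimate of lambda. With m the most frequent nonzero count in `full_covs`,
/// returns count(m + 1) / count(m) * (m + 1), using the Poisson identity
/// P(m + 1) / P(m) = lambda / (m + 1). Returns `None` if there is too little data or
/// either count is below `min_count_correct`.
pub fn ratio_lambda(full_covs: &Vec<u32>, min_count_correct: f64) -> Option<f64> {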
    let mut num_zero = 0;
    let mut count_map: FxHashMap<_, _> = FxHashMap::default();

    for x in full_covs {
        if *x == 0 {
            num_zero += 1;
        } else {
            let c = count_map.entry(*x as usize).or_insert(0);
            *c += 1;
        }
    }

    //Lack of information for inference, return None.
    if count_map.len() == 1 {
        return None;
    }

    if full_covs.len() - num_zero < SAMPLE_SIZE_CUTOFF {
        return None;
    } else {
        let mut sort_vec: Vec<(_, _)> = count_map.iter().map(|x| (x.1, x.0)).collect();
        sort_vec.sort_by(|x, y| y.cmp(&x));
        let most_ind = sort_vec[0].1;
        if !count_map.contains_key(&(most_ind + 1)) {
            return None;
        }
        let count_p1 = count_map[&(most_ind + 1)] as f64;
        let count = count_map[&most_ind] as f64;
        if count_p1 < min_count_correct || count < min_count_correct{
            return None;
        }
        let lambda = Some(count_p1 / count * ((most_ind + 1) as f64));
        return lambda;
    }
}


================================================
FILE: src/inspect.rs
================================================
use crate::types::*;
use std::fs::File;
use std::io::BufReader;
use std::io::BufWriter;
use std::io::Write;
use log::*;
use crate::constants::*;
use crate::cmdline::*;
use serde::{Deserialize, Serialize};

//Writes `text` to `writer`. Write errors, including a broken pipe, are ignored.
fn pipe_write(text: &str, writer: &mut Box<dyn Write + Send>){
    let result = write!(writer, "{}", text);
    match result {
        Err(e) if e.kind() == std::io::ErrorKind::BrokenPipe => {},
        _other => {},
    }
}

#[derive(Default, Deserialize, Serialize, Debug, PartialEq)]
struct SequencesSketchInspect{
    pub file_name: String,
    pub c: usize,
    pub k: usize,
    pub num_sketched_kmers: usize,
    pub approximate_number_bases: f32,
    pub mean_read_length: f64,
    pub sample_name: Option<String>,
    pub paired: bool,
}

impl From<SequencesSketch> for SequencesSketchInspect{
    fn from(
        sk: SequencesSketch
    ) -> Self {
        SequencesSketchInspect{
            file_name: sk.file_name,
            num_sketched_kmers: sk.kmer_counts.len(),
            c: sk.c,
            k: sk.k,
            approximate_number_bases: (sk.mean_read_length + sk.k as f64 - 1.) as f32 / (sk.mean_read_length) as f32 * sk.c as f32 * sk.kmer_counts.len() as f32,
            sample_name: sk.sample_name,
            paired: sk.paired,
            mean_read_length: sk.mean_read_length,
        }
    }
}

#[derive(Deserialize, Serialize, Debug, PartialEq, Hash, PartialOrd, Eq, Ord, Default, Clone)]
pub struct GenomeSketchInspect{
    pub file_name: String,
    pub genome_kmers_num: usize,
    pub first_contig_name: String,
    pub genome_size: usize,
}

impl From<GenomeSketch> for GenomeSketchInspect {
    fn from(
        sk: GenomeSketch
    ) -> Self {
        GenomeSketchInspect{
            genome_kmers_num: sk.genome_kmers.len(),
            file_name: sk.file_name,
            first_contig_name: sk.first_contig_name,
            genome_size: sk.gn_size,
        }
    }
}

#[derive(Deserialize, Serialize, Debug, PartialEq, Hash, PartialOrd, Eq, Ord, Default, Clone)]
pub struct DatabaseSketch{
    pub database_file: String,
    pub c: usize,
    pub k: usize,
    pub min_spacing_parameter: usize,
    pub genome_files: Vec<GenomeSketchInspect>,
}

#[derive(Debug, Default)]
struct DatabaseVisitor {
    c: Option<usize>,
    k: Option<usize>,
    min_spacing: Option<usize>,
    sketches: Vec<GenomeSketchInspect>,
}

impl<'de> serde::de::Visitor<'de> for DatabaseVisitor {
    type Value = Self;
    fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
        formatter.write_str("sequence of struct GenomeSketch")
    }

    fn visit_seq<S>(mut self, mut seq: S) -> Result<Self, S::Error>
        where
            S: serde::de::SeqAccess<'de>,
        {
            while let Some(value) = seq.next_element::<GenomeSketch>()? {
                self.c.get_or_insert(value.c);
                self.k.get_or_insert(value.k);
                self.min_spacing.get_or_insert(value.min_spacing);
                self.sketches.push(value.into());
            }

            Ok(self)
        }
}

impl<'de> Deserialize<'de> for DatabaseVisitor {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where D: serde::de::Deserializer<'de>
    {
        let inspector = DatabaseVisitor::default();
        deserializer.deserialize_seq(inspector)
    }
}


pub fn inspect(args: InspectArgs){
    simple_logger::SimpleLogger::new()
        .with_level(log::LevelFilter::Info)
        .init()
        .unwrap();

    let mut read_sketch_files = Vec::new();
    let mut genome_sketch_files = Vec::new();

    for file in args.files.iter(){
        let mut genome_sketch_good_suffix = false;
        for suff in QUERY_FILE_SUFFIX_VALID{
            if file.ends_with(suff){
                genome_sketch_good_suffix = true;
                break
            }
        }

        let mut sample_sketch_good_suffix = false;
        for suff in SAMPLE_FILE_SUFFIX_VALID{
            if file.ends_with(suff){
                sample_sketch_good_suffix = true;
                break
            }
        }

        if genome_sketch_good_suffix{
            genome_sketch_files.push(file);
        } else if sample_sketch_good_suffix{
            read_sketch_files.push(file);
        } else {
            warn!(
                "{} file is not a .sylsp or .syldb file. Skipping...",
                &file
            );
        }
    }

    let mut out_writer = match args.out_file_name {
        Some(ref x) => {
            Box::new(BufWriter::new(File::create(&x).unwrap())) as Box<dyn Write + Send>
        }
        None => Box::new(BufWriter::new(std::io::stdout())) as Box<dyn Write + Send>,
    };

    let mut db_sketches_inspect = Vec::new();
    for file in genome_sketch_files.iter(){
        let db_sketch = get_db_sketch_inspect(file);
        db_sketches_inspect.push(db_sketch);
    }
    let yaml = serde_yaml::to_string(&db_sketches_inspect).unwrap();
    if !db_sketches_inspect.is_empty(){
        pipe_write(&yaml, &mut out_writer);
    }

    let mut seq_sketches_inspect = Vec::new();

    for file in read_sketch_files.iter(){
        let seq_sketch = get_seq_sketch_inspect(file);
        seq_sketches_inspect.push(seq_sketch);
    }
    let yaml = serde_yaml::to_string(&seq_sketches_inspect).unwrap();
    if !seq_sketches_inspect.is_empty(){
        pipe_write(&yaml, &mut out_writer);
    }
}

fn get_db_sketch_inspect(
    genome_sketch_file: &String,
)  -> DatabaseSketch{

    let file = File::open(genome_sketch_file).expect(&format!("The sketch `{}` could not be opened. Exiting", genome_sketch_file));
    let genome_reader = BufReader::with_capacity(10_000_000, file);

    let visitor: DatabaseVisitor = bincode::deserialize_from(genome_reader)
        .expect(&format!(
            "The database sketch `{}` is not a valid sketch. Perhaps it is an older, incompatible version ",
            &genome_sketch_file
        ));
    if visitor.sketches.is_empty() {
        warn!(
            "The database sketch `{}` is empty. Skipping...",
            &genome_sketch_file
        );
        return DatabaseSketch::default();
    }

    info!(
        "Database file {} processed with {} genomes",
        genome_sketch_file, visitor.sketches.len()
    );

    DatabaseSketch{
        database_file: genome_sketch_file.clone(),
        c: visitor.c.unwrap(),
        k: visitor.k.unwrap(),
        min_spacing_parameter: visitor.min_spacing.unwrap(),
        genome_files: visitor.sketches,
    }
}

fn get_seq_sketch_inspect(
    read_file: &String,
) -> SequencesSketchInspect{
    let file = File::open(read_file).expect(&format!("The sketch `{}` could not be opened. Exiting", read_file));
    let seq_reader = BufReader::with_capacity(10_000_000, file);
    let seq_sketch: SequencesSketch = bincode::deserialize_from(seq_reader)
        .expect(&format!(
            "The sequence sketch `{}` is not a valid sketch. Perhaps it is an older, incompatible version ",
            &read_file
        ));
    info!(
        "Sequence file {} processed",
        read_file,
    );
    seq_sketch.into()
}


================================================
FILE: src/lib.rs
================================================
pub mod sketch;
pub mod constants;
pub mod types;
pub mod seeding;
pub mod cmdline;
pub mod contain;
pub mod inference;
pub mod inspect;

#[cfg(target_arch = "x86_64")]
pub mod avx2_seeding;




================================================
FILE: src/main.rs
================================================
use clap::Parser;
use sylph::cmdline::*;
use sylph::sketch;
use sylph::contain;
use sylph::inspect;
//use std::panic::set_hook;

//Use this allocator when statically compiling
//instead of the default
//because the musl statically compiled binary
//uses a bad default allocator which makes the
//binary take 60% longer!!! Only affects
//static compilation though. 
#[cfg(target_env = "musl")]
#[global_allocator]
static GLOBAL: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

fn main() {
//    set_hook(Box::new(|info| {
//        if let Some(s) = info.payload().downcast_ref::<String>() {
//            log::error!("{}", s);
//        }
//    }));
    let cli = Cli::parse();
    match cli.mode {
        Mode::Sketch(sketch_args) => sketch::sketch(sketch_args),
        Mode::Query(contain_args) => contain::contain(contain_args, false),
        Mode::Profile(contain_args) => contain::contain(contain_args, true),
        Mode::Inspect(inspect_args) => inspect::inspect(inspect_args),
    }
}


================================================
FILE: src/seeding.rs
================================================
use crate::types::*;

#[inline]
pub fn mm_hash64(kmer: u64) -> u64 {
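    //TODO this is bugged. Fix after release
    //Note: method calls bind tighter than unary `!`, so the first step computes
    //!(key + (key << 21)) rather than (!key) + (key << 21). Changing it would change every hash.
    let mut key = kmer;
    key = !key.wrapping_add(key << 21); // intended: key = (key << 21) - key - 1;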
    key = key ^ key >> 24;
    key = (key.wrapping_add(key << 3)).wrapping_add(key << 8); // key * 265
    key = key ^ key >> 14;
    key = (key.wrapping_add(key << 2)).wrapping_add(key << 4); // key * 21
    key = key ^ key >> 28;
    key = key.wrapping_add(key << 31);
    key
}

#[inline]
pub fn rev_hash_64(hashed_key: u64) -> u64 {
    let mut key = hashed_key;

    // Invert h_key = h_key.wrapping_add(h_key << 31)
    let mut tmp: u64 = key.wrapping_sub(key << 31);
    key = key.wrapping_sub(tmp << 31);

    // Invert h_key = h_key ^ h_key >> 28;
    tmp = key ^ key >> 28;
    key = key ^ tmp >> 28;

    // Invert h_key = h_key.wrapping_add(h_key << 2).wrapping_add(h_key << 4)
    key = key.wrapping_mul(14933078535860113213u64);

    // Invert h_key = h_key ^ h_key >> 14;
    tmp = key ^ key >> 14;
    tmp = key ^ tmp >> 14;
    tmp = key ^ tmp >> 14;
    key = key ^ tmp >> 14;

    // Invert h_key = h_key.wrapping_add(h_key << 3).wrapping_add(h_key << 8)
    key = key.wrapping_mul(15244667743933553977u64);

    // Invert h_key = h_key ^ h_key >> 24
    tmp = key ^ key >> 24;
    key = key ^ tmp >> 24;

    // Invert h_key = (!h_key).wrapping_add(h_key << 21)
    tmp = !key;
    tmp = !(key.wrapping_sub(tmp << 21));
    tmp = !(key.wrapping_sub(tmp << 21));
    key = !(key.wrapping_sub(tmp << 21));

    key
}

pub fn decode(byte: u64) -> u8 {
    if byte == 0 {
        return b'A';
    } else if byte == 1 {
        return b'C';
    } else if byte == 2 {
        return b'G';
    } else if byte == 3 {
        return b'T';
    } else {
        panic!("decoding failed")
    }
}
pub fn print_string(kmer: u64, k: usize) {
    let mut bytes = vec![];
    let mask = 3;
    for i in 0..k {
        let val = kmer >> 2 * i;
        let val = val & mask;
        bytes.push(decode(val));
    }
    dbg!(std::str::from_utf8(&bytes.into_iter().rev().collect::<Vec<u8>>()).unwrap());
}
#[inline]
fn _position_min<T: Ord>(slice: &[T]) -> Option<usize> {
    slice
        .iter()
        .enumerate()
        .max_by(|(_, value0), (_, value1)| value1.cmp(value0))
        .map(|(idx, _)| idx)
}

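/// FracMinHash seeding: hashes the canonical form (the smaller of the forward and
/// reverse-complement encodings) of every k-mer in `string` and pushes the hash onto
/// `kmer_vec` when it is below u64::MAX / c, keeping roughly a 1/c fraction of k-mers.
pub fn fmh_seeds(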
    string: &[u8],
    kmer_vec: &mut Vec<u64>,
    c: usize,
    k: usize
) {
    type MarkerBits = u64;
    if string.len() < k {
        return;
    }

    let marker_k = k;
    let mut rolling_kmer_f_marker: MarkerBits = 0;
    let mut rolling_kmer_r_marker: MarkerBits = 0;

    let marker_reverse_shift_dist = 2 * (marker_k - 1);
    let marker_mask = MarkerBits::MAX >> (std::mem::size_of::<MarkerBits>() * 8 - 2 * marker_k);
    let marker_rev_mask = !(3 << (2 * marker_k - 2));
    let len = string.len();
    //    let threshold = i64::MIN + (u64::MAX / (c as u64)) as i64;
    //    let threshold_marker = i64::MIN + (u64::MAX / sketch_params.marker_c as u64) as i64;

    let threshold_marker = u64::MAX / (c as u64);
    for i in 0..marker_k - 1 {
        let nuc_f = BYTE_TO_SEQ[string[i] as usize] as u64;
        //        let nuc_f = KmerEnc::encode(string[i]
        let nuc_r = 3 - nuc_f;
        rolling_kmer_f_marker <<= 2;
        rolling_kmer_f_marker |= nuc_f;
        //        rolling_kmer_r = KmerEnc::rc(rolling_kmer_f, k);
        rolling_kmer_r_marker >>= 2;
        rolling_kmer_r_marker |= nuc_r << marker_reverse_shift_dist;
    }
    for i in marker_k-1..len {
        let nuc_byte = string[i] as usize;
        let nuc_f = BYTE_TO_SEQ[nuc_byte] as u64;
        let nuc_r = 3 - nuc_f;
        rolling_kmer_f_marker <<= 2;
        rolling_kmer_f_marker |= nuc_f;
        rolling_kmer_f_marker &= marker_mask;
        rolling_kmer_r_marker >>= 2;
        rolling_kmer_r_marker &= marker_rev_mask;
        rolling_kmer_r_marker |= nuc_r << marker_reverse_shift_dist;
        //        rolling_kmer_r &= max_mask;
        //        KmerEnc::print_string(rolling_kmer_f, k);
        //        KmerEnc::print_string(rolling_kmer_r, k);
        //

        let canonical_marker = rolling_kmer_f_marker < rolling_kmer_r_marker;
        let canonical_kmer_marker = if canonical_marker {
            rolling_kmer_f_marker
        } else {
            rolling_kmer_r_marker
        };
        let hash_marker = mm_hash64(canonical_kmer_marker);

        if hash_marker < threshold_marker {
            kmer_vec.push(hash_marker as u64);
        }
    }
}
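
// Illustrative sketch, not part of the original module: a hypothetical,
// self-contained version of the rolling forward / reverse-complement update used
// in `fmh_seeds`, without hashing or subsampling. It assumes A=0, C=1, G=2, T=3
// (so the complement is 3 - code), 1 <= k <= 32, and input containing only
// A/C/G/T; any other byte is treated as T here purely to keep the example short.
pub fn example_canonical_kmers(seq: &[u8], k: usize) -> Vec<u64> {
    let mut out = Vec::new();
    if k == 0 || k > 32 || seq.len() < k {
        return out;
    }
    let mask = u64::MAX >> (64 - 2 * k);
    let rev_shift = 2 * (k - 1);
    let (mut fwd, mut rev) = (0u64, 0u64);
    for (i, &b) in seq.iter().enumerate() {
        let nuc = match b {
            b'A' | b'a' => 0u64,
            b'C' | b'c' => 1,
            b'G' | b'g' => 2,
            _ => 3,
        };
        // Forward strand: append the new base at the low end, drop the oldest base.
        fwd = ((fwd << 2) | nuc) & mask;
        // Reverse strand: drop the low end, place the complement at the high end.
        rev = (rev >> 2) | ((3 - nuc) << rev_shift);
        // Once a full window is available, keep the smaller of the two encodings.
        if i + 1 >= k {
            out.push(fwd.min(rev));
        }
    }
    out
}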

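/// Same selection rule as `fmh_seeds`, but also records where each seed was found:
/// pushes `(contig_number, i, hash)`, where `i` is the index in `string` of the
/// last base of the selected k-mer.
pub fn fmh_seeds_positions(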
    string: &[u8],
    kmer_vec: &mut Vec<(usize,usize,u64)>,
    c: usize,
    k: usize,
    contig_number: usize
) {
    type MarkerBits = u64;
    if string.len() < k {
        return;
    }

    let marker_k = k;
    let mut rolling_kmer_f_marker: MarkerBits = 0;
    let mut rolling_kmer_r_marker: MarkerBits = 0;

    let marker_reverse_shift_dist = 2 * (marker_k - 1);
    let marker_mask = MarkerBits::MAX >> (std::mem::size_of::<MarkerBits>() * 8 - 2 * marker_k);
    let marker_rev_mask = !(3 << (2 * marker_k - 2));
    let len = string.len();
    //    let threshold = i64::MIN + (u64::MAX / (c as u64)) as i64;
    //    let threshold_marker = i64::MIN + (u64::MAX / sketch_params.marker_c as u64) as i64;

    let threshold_marker = u64::MAX / (c as u64);
    for i in 0..marker_k - 1 {
        let nuc_f = BYTE_TO_SEQ[string[i] as usize] as u64;
        //        let nuc_f = KmerEnc::encode(string[i]
        let nuc_r = 3 - nuc_f;
        rolling_kmer_f_marker <<= 2;
        rolling_kmer_f_marker |= nuc_f;
        //        rolling_kmer_r = KmerEnc::rc(rolling_kmer_f, k);
        rolling_kmer_r_marker >>= 2;
        rolling_kmer_r_marker |= nuc_r << marker_reverse_shift_dist;
    }
    for i in marker_k-1..len {
        let nuc_byte = string[i] as usize;
        let nuc_f = BYTE_TO_SEQ[nuc_byte] as u64;
        let nuc_r = 3 - nuc_f;
        rolling_kmer_f_marker <<= 2;
        rolling_kmer_f_marker |= nuc_f;
        rolling_kmer_f_marker &= marker_mask;
        rolling_kmer_r_marker >>= 2;
        rolling_kmer_r_marker &= marker_rev_mask;
        rolling_kmer_r_marker |= nuc_r << marker_reverse_shift_dist;
        //        rolling_kmer_r &= max_mask;
        //        KmerEnc::print_string(rolling_kmer_f, k);
        //        KmerEnc::print_string(rolling_kmer_r, k);
        //

        let canonical_marker = rolling_kmer_f_marker < rolling_kmer_r_marker;
        let canonical_kmer_marker = if canonical_marker {
            rolling_kmer_f_marker
        } else {
            rolling_kmer_r_marker
        };
        let hash_marker = mm_hash64(canonical_kmer_marker);

        if hash_marker < threshold_marker {
            kmer_vec.push((contig_number, i, hash_marker as u64));
        }
    }
}
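
// Illustrative sketch, not part of the original module: `example_fmh_keep` is a
// hypothetical helper that isolates the selection rule used by `fmh_seeds` and
// `fmh_seeds_positions`. A hash is retained when it falls below u64::MAX / c, so
// about one in `c` uniformly distributed hashes passes. Assumes c >= 1, since
// c == 0 would divide by zero.
pub fn example_fmh_keep(hash: u64, c: u64) -> bool {
    let threshold = u64::MAX / c;
    hash < threshold
}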


================================================
FILE: src/sketch.rs
================================================
use crate::cmdline::*;
use scalable_cuckoo_filter::ScalableCuckooFilter;
use scalable_cuckoo_filter::ScalableCuckooFilterBuilder;

use fxhash::FxHashMap;
use fxhash::FxHashSet;
use fxhash::FxHasher;
use memory_stats::memory_stats;
use std::fs;
use std::thread;
use std::time::Duration;

use crate::constants::*;
use crate::seeding::*;
use crate::types::*;
use log::*;
use needletail::parse_fastx_file;
use rayon::prelude::*;
use std::collections::HashMap;
use std::fs::File;
use std::io::BufWriter;
use std::io::{prelude::*, BufReader};
use std::path::Path;
use std::sync::Mutex;
type Marker = u32;

/// Blocks the calling thread while the process's virtual memory usage (in GB)
/// exceeds `max_ram`, polling once per second.
pub fn check_vram_and_block(max_ram: usize, file: &str) {
    if let Some(usage) = memory_stats() {
        let mut gb_usage_curr = usage.virtual_mem as f64 / 1_000_000_000_f64;
        if (max_ram as f64) < gb_usage_curr {
            log::debug!(
                "Max memory reached. Blocking sketch for {}. Curr memory {}, max mem {}",
                file,
                gb_usage_curr,
                max_ram
            );
        }
        while (max_ram as f64) < gb_usage_curr {
            let one_second = Duration::from_secs(1);
            thread::sleep(one_second);
            if let Some(usage) = memory_stats() {
                gb_usage_curr = usage.virtual_mem as f64 / 1_000_000_000_f64;
                if (max_ram as f64) >= gb_usage_curr {
                    log::debug!("Sketching for {} freed", file);
                }
            } else {
                // Memory statistics are unavailable; stop blocking.
                break;
            }
        }
    }
}

pub fn extract_markers(string: &[u8], kmer_vec: &mut Vec<u64>, c: usize, k: usize) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            use crate::avx2_seeding::*;
            unsafe {
                extract_markers_avx2(string, kmer_vec, c, k);
            }
        } else {
            fmh_seeds(string, kmer_vec, c, k);
        }
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        fmh_seeds(string, kmer_vec, c, k);
    }
}

pub fn extract_markers_positions(
    string: &[u8],
    kmer_vec: &mut Vec<(usize, usize, u64)>,
    c: usize,
    k: usize,
    contig_number: usize,
) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            use crate::avx2_seeding::*;
            unsafe {
                extract_markers_avx2_positions(string, kmer_vec, c, k, contig_number);
            }
        } else {
            fmh_seeds_positions(string, kmer_vec, c, k, contig_number);
        }
    }
    #[cfg(not(target_arch = "x86_64"))]
    {
        fmh_seeds_positions(string, kmer_vec, c, k, contig_number);
    }
}

pub fn is_fastq(file: &str) -> bool {
    file.ends_with(".fq")
        || file.ends_with(".fnq")
        || file.ends_with(".fastq")
        || file.ends_with(".fq.gz")
        || file.ends_with(".fnq.gz")
        || file.ends_with(".fastq.gz")
}

pub fn is_fasta(file: &str) -> bool {
    file.ends_with(".fa")
        || file.ends_with(".fna")
        || file.ends_with(".fasta")
        || file.ends_with(".fa.gz")
        || file.ends_with(".fna.gz")
        || file.ends_with(".fasta.gz")
}

fn check_args_valid(args: &SketchArgs) {
    let level;
    if args.trace {
        level = log::LevelFilter::Trace;
    } else if args.debug {
        level = log::LevelFilter::Debug;
    } else {
        level = log::LevelFilter::Info;
    }

    rayon::ThreadPoolBuilder::new()
        .num_threads(args.threads)
        .build_global()
        .unwrap();

    simple_logger::SimpleLogger::new()
        .with_level(level)
        .init()
        .unwrap();

    if args.files.is_empty()
        && args.list_sequence.is_none()
        && args.first_pair.is_empty()
        && args.second_pair.is_empty()
        && args.genomes.is_none()
        && args.reads.is_none()
        && args.list_genomes.is_none()
        && args.list_reads.is_none()
        && args.list_first_pair.is_none()
        && args.list_second_pair.is_none()
    {
        error!("No input sequences found; see sylph sketch -h for help. Exiting.");
        std::process::exit(1);
    }

    if args.fpr < 0. || args.fpr >= 1. {
        error!("Invalid FPR for sketching. Must be in [0,1).");
        std::process::exit(1);
    }
}

fn parse_ambiguous_files(
    args: &SketchArgs,
    read_inputs: &mut Vec<String>,
    genome_inputs: &mut Vec<String>,
) {
    let mut all_files = vec![];
    if args.list_sequence.is_some() {
        let file_list = args.list_sequence.as_ref().unwrap();
        parse_line_file(file_list, &mut all_files);
    }

    all_files.extend(args.files.clone());

    for file in all_files {
        if is_fastq(&file) {
            read_inputs.push(file);
        } else if is_fasta(&file) {
            genome_inputs.push(file);
        } else {
            warn!(
                "{} does not have a fasta/fastq/gzip type extension; skipping",
                file
            );
        }
    }
}

fn parse_reads_and_genomes(
    args: &SketchArgs,
    read_inputs: &mut Vec<String>,
    genome_inputs: &mut Vec<String>,
) {
    if let Some(genomes_syl_in) = args.genomes.clone() {
        for gn_file in genomes_syl_in {
            genome_inputs.push(gn_file);
        }
    }
    if let Some(reads_syl_in) = args.reads.clone() {
        for rd_file in reads_syl_in {
            read_inputs.push(rd_file);
        }
    }

    if args.list_reads.is_some() {
        let file_reads = args.list_reads.as_ref().unwrap();
        parse_line_file(file_reads, read_inputs);
    }

    if args.list_genomes.is_some() {
        let file_genomes = args.list_genomes.as_ref().unwrap();
        parse_line_file(file_genomes, genome_inputs);
    }
}

fn parse_paired_end_reads(
    args: &SketchArgs,
    first_pairs: &mut Vec<String>,
    second_pairs: &mut Vec<String>,
) {
    if args.first_pair.len() != args.second_pair.len() {
        error!("Different number of paired sequences. Exiting.");
        std::process::exit(1);
    }

    for f in args.first_pair.iter() {
        first_pairs.push(f.clone());
    }

    for f in args.second_pair.iter() {
        second_pairs.push(f.clone());
    }

    if args.list_first_pair.is_some() {
        let file_first_pair = args.list_first_pair.as_ref().unwrap();
        parse_line_file(file_first_pair, first_pairs);
    }

    if args.list_second_pair.is_some() {
        let file_second_pair = args.list_second_pair.as_ref().unwrap();
        parse_line_file(file_second_pair, second_pairs)
    }

    if first_pairs.len() != second_pairs.len() {
        error!("Different number of paired sequences. Exiting.");
        std::process::exit(1);
    }
}

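/// Reads `file_name` line by line and appends each line to `vec`.
/// Panics with the file name if the file cannot be opened or read.
fn parse_line_file(file_name: &str, vec: &mut Vec<String>) {
    let file = File::open(file_name)
        .expect(&format!("Could not open list file {}", file_name));
    let reader = BufReader::new(file);
    for line in reader.lines() {
        vec.push(line.expect(&format!("Could not read a line from {}", file_name)));
    }
}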

fn parse_sample_names(args: &SketchArgs) -> Option<Vec<String>> {
    if args.list_sample_names.is_none() && args.sample_names.is_none() {
        return None;
    } else {
        let mut sample_names = vec![];
        if let Some(file) = &args.list_sample_names {
            parse_line_file(file, &mut sample_names);
            return Some(sample_names);
        }
        if let Some(vec) = &args.sample_names {
            sample_names.extend(vec.clone());
        }
        return Some(sample_names);
    }
}

pub fn sketch(args: SketchArgs) {
    let mut read_inputs = vec![];
    let mut genome_inputs = vec![];
    let mut first_pairs = vec![];
    let mut second_pairs = vec![];

    check_args_valid(&args);
    parse_ambiguous_files(&args, &mut read_inputs, &mut genome_inputs);
    parse_reads_and_genomes(&args, &mut read_inputs, &mut genome_inputs);
    parse_paired_end_reads(&args, &mut first_pairs, &mut second_pairs);

    let sample_names = parse_sample_names(&args);
    if let Some(names) = &sample_names {
        if names.len() != first_pairs.len() + read_inputs.len() {
            log::error!("Number of sample names is not equal to the number of read inputs. Exiting.");
            std::process::exit(1);
        }
    }

    let mut max_ram = usize::MAX;
    if args.max_ram.is_some() {
        max_ram = args.max_ram.unwrap();
        if max_ram < 7 {
            log::error!("Max ram must be >= 7. Exiting.");
            std::process::exit(1);
        }
    }

    if genome_inputs.is_empty() && args.db_out_name != "database" {
        log::warn!(
            "-o is set but no genomes are present. -o only applies to genomes; see -d for reads"
        );
    }

    if !first_pairs.is_empty() && !second_pairs.is_empty() {
        info!("Sketching paired sequences...");
        let iter_vec: Vec<usize> = (0..first_pairs.len()).into_iter().collect();
        iter_vec.into_par_iter().for_each(|i| {
            let read_file1 = &first_pairs[i];
            let read_file2 = &second_pairs[i];

            let mut sample_name = None;
            if let Some(name) = &sample_names {
                sample_name = Some(name[i].clone());
            }
            let read_sketch_opt = sketch_pair_sequences(
                read_file1,
                read_file2,
                args.c,
                args.k,
                sample_name.clone(),
                args.no_dedup,
                args.fpr,
            );
            if read_sketch_opt.is_some() {
                let res = fs::create_dir_all(&args.sample_output_dir);
                if res.is_err() {
                    error!("Could not create directory at {}", args.sample_output_dir);
                    std::process::exit(1);
                }
                let pref = Path::new(&args.sample_output_dir);
                let read_sketch = read_sketch_opt.unwrap();

                let sketch_name;
                if sample_name.is_some() {
                    sketch_name = read_sketch.sample_name.as_ref().unwrap();
                } else {
                    sketch_name = &read_sketch.file_name;
                }

                let read_file_path = Path::new(&sketch_name).file_name().unwrap();
                let file_path = pref.join(&read_file_path);

                let file_path_str = format!(
                    "{}.paired{}",
                    file_path.to_str().unwrap(),
                    SAMPLE_FILE_SUFFIX
                );

                let mut read_sk_file = BufWriter::new(
                    File::create(&file_path_str)
                        .expect(&format!("{} path not valid; exiting ", file_path_str)),
                );

                bincode::serialize_into(&mut read_sk_file, &read_sketch).unwrap();
                info!("Sketching {} complete.", file_path_str);
            }
        });
    }

    if !read_inputs.is_empty() {
        info!("Sketching non-paired sequences...");
    }

    let iter_vec: Vec<usize> = (0..read_inputs.len()).into_iter().collect();
    iter_vec.into_par_iter().for_each(|i| {
        let pref = Path::new(&args.sample_output_dir);
        std::fs::create_dir_all(pref)
            .expect("Could not create directory for output sample files (-d). Exiting...");

        let read_file = &read_inputs[i];

        check_vram_and_block(max_ram, read_file);
        let mut sample_name = None;
        if let Some(name) = &sample_names {
            sample_name = Some(name[i + first_pairs.len()].clone());
        }

        let read_sketch_opt;
        read_sketch_opt = sketch_sequences_needle(
            read_file,
            args.c,
            args.k,
            sample_name.clone(),
            args.no_dedup,
        );

        if read_sketch_opt.is_some() {
            let read_sketch = read_sketch_opt.unwrap();
            let sketch_name;
            if sample_name.is_some() {
                sketch_name = read_sketch.sample_name.as_ref().unwrap();
            } else {
                sketch_name = &read_sketch.file_name;
            }
            let read_file_path = Path::new(&sketch_name).file_name().unwrap();
            let file_path = pref.join(&read_file_path);

            let file_path_str = format!("{}{}", file_path.to_str().unwrap(), SAMPLE_FILE_SUFFIX);

            let mut read_sk_file = BufWriter::new(
                File::create(&file_path_str)
                    .expect(&format!("{} path not valid; exiting.", file_path_str)),
            );

            bincode::serialize_into(&mut read_sk_file, &read_sketch).unwrap();
            info!("Sketching {} complete.", file_path_str);
        }
    });

    if !genome_inputs.is_empty() {
        info!("Sketching genomes...");
        let iter_vec: Vec<usize> = (0..genome_inputs.len()).into_iter().collect();
        let counter: Mutex<usize> = Mutex::new(0);
        let pref = Path::new(&args.db_out_name);
        let file_path_str = format!("{}{}", pref.to_str().unwrap(), QUERY_FILE_SUFFIX);
        let path = std::path::Path::new(&file_path_str);
        let prefix = path.parent().unwrap();
        std::fs::create_dir_all(prefix)
            .expect("Could not create directory for output database file (-o). Exiting...");
        let all_genome_sketches = Mutex::new(vec![]);

        iter_vec.into_par_iter().for_each(|i| {
            let genome_file = &genome_inputs[i];
            if args.individual {
                let indiv_gn_sketches = sketch_genome_individual(
                    args.c,
                    args.k,
                    genome_file,
                    args.min_spacing_kmer,
                    !args.no_pseudotax,
                );
                all_genome_sketches
                    .lock()
                    .unwrap()
                    .extend(indiv_gn_sketches);
            } else {
                let genome_sketch = sketch_genome(
                    args.c,
                    args.k,
                    genome_file,
                    args.min_spacing_kmer,
                    !args.no_pseudotax,
                );
                if genome_sketch.is_some() {
                    all_genome_sketches
                        .lock()
                        .unwrap()
                        .push(genome_sketch.unwrap());
                }
            }
            let mut c = counter.lock().unwrap();
            *c += 1;
            if *c % 100 == 0 && *c != 0 {
                info!("{} genomes processed.", *c);
            }
        });

        if all_genome_sketches.lock().unwrap().is_empty() {
            warn!(
                "No valid genomes to sketch; {} is not output",
                file_path_str
            );
        } else {
            let mut genome_sk_file = BufWriter::new(
                File::create(&file_path_str).expect(&format!("{} not valid ", file_path_str)),
            );
            // Serialize first so that the success message is only logged after the write.
            bincode::serialize_into(&mut genome_sk_file, &all_genome_sketches).unwrap();
            info!("Wrote all genome sketches to {}", file_path_str);
        }
    }

    info!("Finished.");
}

pub fn sketch_genome_individual(
    c: usize,
    k: usize,
    ref_file: &str,
    min_spacing: usize,
    pseudotax: bool,
) -> Vec<GenomeSketch> {
    let reader = parse_fastx_file(&ref_file);
    if reader.is_err() {
        warn!("{} is not a valid fasta/fastq file; skipping.", ref_file);
        return vec![];
    } else {
        let mut reader = reader.unwrap();
        let mut return_vec = vec![];
        while let Some(record) = reader.next() {
            let mut return_genome_sketch = GenomeSketch::default();
            return_genome_sketch.c = c;
            return_genome_sketch.k = k;
            return_genome_sketch.file_name = ref_file.to_string();
            if record.is_ok() {
                let mut pseudotax_track_kmers = vec![];
                let mut kmer_vec = vec![];
                let record = record.expect(&format!("Invalid record for file {} ", ref_file));
                let contig_name = String::from_utf8_lossy(record.id()).to_string();
                let contig_name_notab = contig_name.replace('\t', " ");
                return_genome_sketch.first_contig_name = contig_name_notab.to_owned();
                let seq = record.seq();

                extract_markers_positions(&seq, &mut kmer_vec, c, k, 0);

                let mut kmer_set = MMHashSet::default();
                let mut duplicate_set = MMHashSet::default();
                let mut new_vec = Vec::with_capacity(kmer_vec.len());
                kmer_vec.sort();
                for (_, _pos, km) in kmer_vec.iter() {
                    if !kmer_set.contains(&km) {
                        kmer_set.insert(km);
                    } else {
                        duplicate_set.insert(km);
                    }
                }
                let mut last_pos = 0;
                for (_, pos, km) in kmer_vec.iter() {
                    if !duplicate_set.contains(&km) {
                        if last_pos == 0 || pos - last_pos > min_spacing {
                            new_vec.push(*km);
                            last_pos = *pos;
                        } else if pseudotax {
                            pseudotax_track_kmers.push(*km);
                        }
                    }
                }

                return_genome_sketch.gn_size = record.seq().len();
                return_genome_sketch.genome_kmers = new_vec;
                return_genome_sketch.min_spacing = min_spacing;
                if pseudotax {
                    return_genome_sketch.pseudotax_tracked_nonused_kmers =
                        Some(pseudotax_track_kmers);
                }
                return_vec.push(return_genome_sketch);
            } else {
                warn!("File {} is not a valid fasta/fastq file", ref_file);
                return vec![];
            }
        }
        return return_vec;
    }
}

pub fn sketch_genome(
    c: usize,
    k: usize,
    ref_file: &str,
    min_spacing: usize,
    pseudotax: bool,
) -> Option<GenomeSketch> {
    let reader = parse_fastx_file(&ref_file);
    let mut vec = vec![];
    let mut pseudotax_track_kmers = vec![];
    if reader.is_err() {
        warn!("{} is not a valid fasta/fastq file; skipping.", ref_file);
        return None;
    } else {
        let mut reader = reader.unwrap();
        let mut first = true;
        let mut return_genome_sketch = GenomeSketch::default();
        return_genome_sketch.c = c;
        return_genome_sketch.k = k;
        return_genome_sketch.file_name = ref_file.to_string();
        let mut contig_number = 0;
        while let Some(record) = reader.next() {
            if record.is_ok() {
                let record = record.expect(&format!("Invalid record for file {} ", ref_file));
                if first {
                    let contig_name = String::from_utf8_lossy(record.id()).to_string();
                    let contig_name_notab = contig_name.replace('\t', " ");
                    return_genome_sketch.first_contig_name = contig_name_notab.to_owned();
                    first = false;
                }
                let seq = record.seq();

                return_genome_sketch.gn_size += seq.len();
                extract_markers_positions(&seq, &mut vec, c, k, contig_number);

                contig_number += 1
            } else {
                warn!("File {} is not a valid fasta/fastq file", ref_file);
                return None;
            }
        }
        let mut kmer_set = MMHashSet::default();
        let mut duplicate_set = MMHashSet::default();
        let mut new_vec = Vec::with_capacity(vec.len());
        vec.sort();
        for (_, _, km) in vec.iter() {
            if !kmer_set.contains(&km) {
                kmer_set.insert(km);
            } else {
                duplicate_set.insert(km);
            }
        }

        let mut last_pos = 0;
        let mut last_contig = 0;
        for (contig, pos, km) in vec.iter() {
            if !duplicate_set.contains(&km) {
                if last_pos == 0 || last_contig != *contig || pos - last_pos > min_spacing {
                    new_vec.push(*km);
                    last_contig = *contig;
                    last_pos = *pos;
                } else if pseudotax {
                    pseudotax_track_kmers.push(*km);
                }
            }
        }
        return_genome_sketch.genome_kmers = new_vec;
        return_genome_sketch.min_spacing = min_spacing;
        if pseudotax {
            return_genome_sketch.pseudotax_tracked_nonused_kmers = Some(pseudotax_track_kmers);
        }
        return Some(return_genome_sketch);
    }
}

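/// Builds two pairs of markers from a single read. Each marker packs `k` bases
/// (`k` = 16 for a 32-bit `Marker`) sampled at every other position, starting at
/// the beginning of the read and at its midpoint; even offsets form the first
/// pair and odd offsets the second. Returns `None` if the read is shorter than
/// `4 * k + 2` bases.
#[inline]
fn pair_kmer_single(s1: &[u8]) -> Option<([Marker; 2], [Marker; 2])> {
    let k = std::mem::size_of::<Marker>() * 4;
    if s1.len() < 4 * k + 2 {
        return None;
    } else {
        let mut kmer_f = 0;
        let mut kmer_g = 0;
        let mut kmer_r = 0;
        let mut kmer_t = 0;
        let halfway = s1.len() / 2;
        // The largest index read below is halfway + 2 * (k - 1) + 1, which is
        // always < s1.len() because s1.len() >= 4 * k + 2.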
        for i in 0..k {
            let nuc_1 = BYTE_TO_SEQ[s1[2 * i] as usize] as Marker;
            let nuc_2 = BYTE_TO_SEQ[s1[2 * i + halfway] as usize] as Marker;
            let nuc_3 = BYTE_TO_SEQ[s1[1 + 2 * i] as usize] as Marker;
            let nuc_4 = BYTE_TO_SEQ[s1[1 + 2 * i + halfway] as usize] as Marker;

            kmer_f <<= 2;
            kmer_f |= nuc_1;

            kmer_r <<= 2;
            kmer_r |= nuc_2;

            kmer_g <<= 2;
            kmer_g |= nuc_3;

            kmer_t <<= 2;
            kmer_t |= nuc_4;
        }
        return Some(([kmer_f, kmer_r], [kmer_g, kmer_t]));
    }
}

#[inline]
fn pair_kmer(s1: &[u8], s2: &[u8]) -> Option<([Marker; 2], [Marker; 2])> {
    let k = std::mem::size_of::<Marker>() * 4;
    if s1.len() < 2 * k + 1 || s2.len() < 2 * k + 1 {
        return None;
    } else {
        let mut kmer_f = 0;
        let mut kmer_g = 0;
        let mut kmer_r = 0;
        let mut kmer_t = 0;
        for i in 0..k {
            let nuc_1 = BYTE_TO_SEQ[s1[2 * i] as usize] as Marker;
            let nuc_2 = BYTE_TO_SEQ[s2[2 * i] as usize] as Marker;
            let nuc_3 = BYTE_TO_SEQ[s1[1 + 2 * i] as usize] as Marker;
            let nuc_4 = BYTE_TO_SEQ[s2[1 + 2 * i] as usize] as Marker;

            kmer_f <<= 2;
            kmer_f |= nuc_1;

            kmer_r <<= 2;
            kmer_r |= nuc_2;

            kmer_g <<= 2;
            kmer_g |= nuc_3;

            kmer_t <<= 2;
            kmer_t |= nuc_4;
        }
        return Some(([kmer_f, kmer_r], [kmer_g, kmer_t]));
    }
}

fn dup_removal_lsh_full_exact(
    kmer_counts: &mut FxHashMap<Kmer, u32>,
    kmer_to_pair_set: &mut FxHashSet<(u64, [Marker; 2])>,
    //kmer_to_pair_set: &mut ScalableCuckooFilter<(u64,[Marker;2]), FxHasher>,
    //kmer_to_pair_set: &mut GrowableBloom,
    km: &u64,
    kmer_pair: Option<([Marker; 2], [Marker; 2])>,
    num_dup_removed: &mut usize,
    no_dedup: bool,
    threshold: Option<u32>,
) {
    let c = kmer_counts.entry(*km).or_insert(0);
    let mut c_threshold = u32::MAX;
    if let Some(t) = threshold {
        c_threshold = t;
    }
    if !no_dedup && *c < c_threshold {
        if let Some(doublepairs) = kmer_pair {
            let mut ret = false;
            if kmer_to_pair_set.contains(&(*km, doublepairs.0)) {
                //Need this when using approximate data structures
                if *c > 0 {
                    ret = true;
                }
            } else {
                kmer_to_pair_set.insert((*km, doublepairs.0));
            }
            if kmer_to_pair_set.contains(&(*km, doublepairs.1)) {
                if *c > 0 {
                    ret = true;
                }
            } else {
                kmer_to_pair_set.insert((*km, doublepairs.1));
            }
            if ret {
                *num_dup_removed += 1;
                return;
            }
        }
    }
    *c += 1;
}

fn dup_removal_lsh_full(
    kmer_counts: &mut FxHashMap<Kmer, u32>,
    //kmer_to_pair_set: &mut FxHashSet<(u64,[Marker;2])>,
    kmer_to_pair_set: &mut ScalableCuckooFilter<(u64, [Marker; 2]), FxHasher>,
    //kmer_to_pair_set: &mut GrowableBloom,
    km: &u64,
    kmer_pair: Option<([Marker; 2], [Marker; 2])>,
    num_dup_removed: &mut usize,
    no_dedup: bool,
) {
    let c = kmer_counts.entry(*km).or_insert(0);
    if !no_dedup {
        if let Some(doublepairs) = kmer_pair {
            let mut ret = false;
            if kmer_to_pair_set.contains(&(*km, doublepairs.0)) {
                //Need this when using approximate data structures
                if *c > 0 {
                    ret = true;
                }
            } else {
                kmer_to_pair_set.insert(&(*km, doublepairs.0));
            }
            if kmer_to_pair_set.contains(&(*km, doublepairs.1)) {
                if *c > 0 {
                    ret = true;
                }
            } else {
                kmer_to_pair_set.insert(&(*km, doublepairs.1));
            }
            if ret {
                *num_dup_removed += 1;
                return;
            }
        }
    }
    *c += 1;
}

pub fn sketch_pair_sequences(
    read_file1: &str,
    read_file2: &str,
    c: usize,
    k: usize,
    sample_name: Option<String>,
    no_dedup: bool,
    dedup_fpr: f64,
) -> Option<SequencesSketch> {
    let r1o = parse_fastx_file(&read_file1);
    let r2o = parse_fastx_file(&read_file2);
    let mut read_sketch = SequencesSketch::new(read_file1.to_string(), c, k, true, sample_name, 0.);
    if r1o.is_err() || r2o.is_err() {
        log::error!("Paired end reading failed for '{}' and '{}'. Make sure the files are present or the sequences are valid.", read_file1, read_file2);
        std::process::exit(1);
    }

    let mut num_dup_removed = 0;

    let mut reader1 = r1o.unwrap();
    let mut reader2 = r2o.unwrap();

    //let mut kmer_pair_set = FxHashMap::default();
    let mut kmer_pair_set = FxHashSet::default();
    //let mut kmer_pair_set = GrowableBloom::new(0.001, 1_000_000_0);
    let mut fpr = 0.001;
    if dedup_fpr != 0. {
        fpr = dedup_fpr;
    }
    let mut kmer_pair_set_approx = ScalableCuckooFilterBuilder::new()
        .initial_capacity(10_000_000)
        .false_positive_probability(fpr)
        .hasher(FxHasher::default())
        .finish();

    let mut mean_read_length: f64 = 0.;
    let mut counter: f64 = 0.;

    loop {
        let n1 = reader1.next();
        let n2 = reader2.next();
        if let Some(rec1_o) = n1 {
            if let Some(rec2_o) = n2 {
                if let Ok(rec1) = rec1_o {
                    if let Ok(rec2) = rec2_o {
                        let mut temp_vec1 = vec![];
                        let mut temp_vec2 = vec![];

                        extract_markers(&rec1.seq(), &mut temp_vec1, c, k);
                        extract_markers(&rec2.seq(), &mut temp_vec2, c, k);
                        let kmer_pair = pair_kmer(&rec1.seq(), &rec2.seq());

                        // Running mean of the read lengths from the first file of the pair
                        counter += 1.;
                        mean_read_length = mean_read_length
                            + ((rec1.seq().len() as f64) - mean_read_length) / counter;

                        for km in temp_vec1.iter() {
                            if dedup_fpr == 0. {
                                dup_removal_lsh_full_exact(
                                    &mut read_sketch.kmer_counts,
                                    &mut kmer_pair_set,
                                    km,
                                    kmer_pair,
                                    &mut num_dup_removed,
                                    no_dedup,
                                    None,
                                );
                            } else {
                                dup_removal_lsh_full(
                                    &mut read_sketch.kmer_counts,
                                    &mut kmer_pair_set_approx,
                                    km,
                                    kmer_pair,
                                    &mut num_dup_removed,
                                    no_dedup,
                                );
                            }
                            //dup_removal_lsh(&mut read_sketch.kmer_counts, &mut kmer_pair_set, km, kmer_pair, &mut num_dup_removed, no_dedup);
                        }
                        for km in temp_vec2.iter() {
                            if temp_vec1.contains(km) {
                                continue;
                            }
                            if dedup_fpr == 0. {
                                dup_removal_lsh_full_exact(
                                    &mut read_sketch.kmer_counts,
                                    &mut kmer_pair_set,
                                    km,
                                    kmer_pair,
                                    &mut num_dup_removed,
                                    no_dedup,
                                    None,
                                );
                            } else {
                                dup_removal_lsh_full(
                                    &mut read_sketch.kmer_counts,
                                    &mut kmer_pair_set_approx,
                                    km,
                                    kmer_pair,
                                    &mut num_dup_removed,
                                    no_dedup,
                                );
                            }
                            //dup_removal_lsh(&mut read_sketch.kmer_counts, &mut kmer_pair_set, km, kmer_pair, &mut num_dup_removed, no_dedup);
                        }
                    }
                } else {
                    return None;
                }
            }
        } else {
            break;
        }
    }
    let percent = (num_dup_removed as f64)/((read_sketch.kmer_counts.values().sum::<u32>() as f64) + num_dup_removed as f64) * 100.;
    log::debug!(
        "Number of sketched k-mers removed due to read duplication for {}: {}. Percentage: {:.2}%",
        read_sketch.file_name,
        num_dup_removed,
        percent,
    );
    read_sketch.mean_read_length = mean_read_length;
    return Some(read_sketch);
}

pub fn sketch_sequences_needle(
    read_file: &str,
    c: usize,
    k: usize,
    sample_name: Option<String>,
    no_dedup: bool,
) -> Option<SequencesSketch> {
    let mut kmer_map = HashMap::default();
    let ref_file = &read_file;
    let reader = parse_fastx_file(&ref_file);
    let mut mean_read_length = 0.;
    let mut counter = 0.;
    let mut kmer_to_pair_table = FxHashSet::default();
    let mut num_dup_removed = 0;

    if reader.is_err() {
        warn!("{} is not a valid fasta/fastq file; skipping.", ref_file);
        return None;
    } else {
        let mut reader = reader.unwrap();
        while let Some(record) = reader.next() {
            if let Ok(record) = record {
                let mut vec = vec![];
                let seq = record.seq();
                // Only reads of at most 400 bp get a k-mer pair key.
                let kmer_pair = if seq.len() > 400 {
                    None
                } else {
                    pair_kmer_single(&seq)
                };
                extract_markers(&seq, &mut vec, c, k);
                for km in vec {
                    dup_removal_lsh_full_exact(
                        &mut kmer_map,
                        &mut kmer_to_pair_table,
                        &km,
                        kmer_pair,
                        &mut num_dup_removed,
                        no_dedup,
                        Some(MAX_DEDUP_COUNT),
                    );
                }
                // Running (cumulative) mean of read length.
                counter += 1.;
                mean_read_length += ((seq.len() as f64) - mean_read_length) / counter;
            } else {
                warn!("Skipping an invalid record in file {}", ref_file);
            }
        }
    }

    return Some(SequencesSketch {
        kmer_counts: kmer_map,
        file_name: read_file.to_string(),
        c,
        k,
        paired: false,
        sample_name,
        mean_read_length,
    });
}
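
// Illustrative sketch (not part of sylph): the read-length update used in the
// sketching functions above is an incremental mean, so no per-read lengths need
// to be stored. `RunningMean` and `mean_length` are hypothetical names used
// only for this example.

```rust
/// Streaming mean: one pass over the data, O(1) memory.
#[derive(Default)]
struct RunningMean {
    mean: f64,
    count: f64,
}

impl RunningMean {
    /// Fold one observation into the mean: m_n = m_{n-1} + (x - m_{n-1}) / n.
    fn push(&mut self, x: f64) {
        self.count += 1.0;
        self.mean += (x - self.mean) / self.count;
    }
}

/// Mean of a slice of read lengths; returns 0.0 for an empty slice.
fn mean_length(lengths: &[usize]) -> f64 {
    let mut rm = RunningMean::default();
    for &len in lengths {
        rm.push(len as f64);
    }
    rm.mean
}
```

// The update is algebraically the same as summing and dividing at the end, but
// it keeps the accumulator close to the magnitude of a single read length.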


================================================
FILE: src/types.rs
================================================
//Various byte-tables and hashing methods are taken from miniprot by Heng Li. Attached below is their license:
//The MIT License

// **** miniprot LICENSE ***
//Copyright (c) 2022-     Dana-Farber Cancer Institute
//
//Permission is hereby granted, free of charge, to any person obtaining
//a copy of this software and associated documentation files (the
//"Software"), to deal in the Software without restriction, including
//without limitation the rights to use, copy, modify, merge, publish,
//distribute, sublicense, and/or sell copies of the Software, and to
//permit persons to whom the Software is furnished to do so, subject to
//the following conditions:
//
//The above copyright notice and this permission notice shall be
//included in all copies or substantial portions of the Software.
//
//THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
//EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
//MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
//NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
//BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
//ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
//CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
//SOFTWARE.
//******************************


use std::collections::HashMap;

// bytecheck can be used to validate your data if you want
use std::hash::{BuildHasherDefault, Hasher};
use std::collections::HashSet;
use smallvec::SmallVec;
use serde::{Deserialize, Serialize, Serializer, Deserializer, de::Visitor};
use fxhash::FxHashMap;

#[derive(Deserialize, Serialize, Debug, PartialEq)]
pub enum AdjustStatus {
    Lambda(f64),
    Low,
    High,
}

impl Default for AdjustStatus {
    fn default() -> Self { AdjustStatus::Low }
}

pub type Kmer = u64;
pub const BYTE_TO_SEQ: [u8; 256] = [
    0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
];
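
// Illustrative sketch (not part of sylph): how a 2-bit lookup such as
// `BYTE_TO_SEQ` can pack a nucleotide string into a `u64` k-mer. The helper
// names `base_to_bits` and `encode_kmer` are hypothetical. The match arms
// mirror the non-zero entries of the table above (C/c -> 1, G/g -> 2,
// T/t/U/u -> 3, raw bytes 1..=3 unchanged, everything else -> 0). How sylph
// itself consumes the table is defined in its seeding code, not here.

```rust
/// Map one byte to its 2-bit code, falling back to 0 for any other byte.
fn base_to_bits(b: u8) -> u64 {
    match b {
        0..=3 => b as u64,
        b'C' | b'c' => 1,
        b'G' | b'g' => 2,
        b'T' | b't' | b'U' | b'u' => 3,
        _ => 0,
    }
}

/// Pack up to 32 bases into a `u64`, two bits per base, first base in the
/// most significant occupied position. Returns `None` if the input is longer
/// than 32 bases and would not fit.
fn encode_kmer(seq: &[u8]) -> Option<u64> {
    if seq.len() > 32 {
        return None;
    }
    Some(seq.iter().fold(0u64, |acc, &b| (acc << 2) | base_to_bits(b)))
}
```

// Because unknown bytes map to 0, an `N` is indistinguishable from an `A` in
// this encoding; callers that care need to filter such bases beforehand.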

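/// Integer mixing hash over one machine word.
/// Panics unless `bytes` is exactly `size_of::<usize>()` bytes long.
#[inline]
pub fn mm_hash(bytes: &[u8]) -> usize {
    let mut key = usize::from_ne_bytes(bytes.try_into().unwrap());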
    key = (!key).wrapping_add(key << 21); // key = (key << 21) - key - 1;
    key = key ^ key >> 24;
    key = (key.wrapping_add(key << 3)).wrapping_add(key << 8); // key * 265
    key = key ^ key >> 14;
    key = (key.wrapping_add(key << 2)).wrapping_add(key << 4); // key * 21
    key = key ^ key >> 28;
    key = key.wrapping_add(key << 31);
    return key;
}

pub struct MMHasher {
    hash: usize,
}

impl Hasher for MMHasher {
    #[inline]
    fn write(&mut self, bytes: &[u8]) {
        self.hash = mm_hash(bytes);
    }
    #[inline]
    fn finish(&self) -> u64 {
        self.hash as u64
    }
}

impl Default for MMHasher {
    #[inline]
    fn default() -> MMHasher {
        MMHasher { hash: 0 }
    }
}

//Implement minimap2 hashing, will test later.
pub type MMBuildHasher = BuildHasherDefault<MMHasher>;
pub type MMHashMap<K, V> = HashMap<K, V, MMBuildHasher>;
pub type MMHashSet<K> = HashSet<K, MMBuildHasher>;
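
// Illustrative sketch (not part of sylph): plugging a custom integer hash into
// `HashMap` through `BuildHasherDefault`, in the same way as the aliases above.
// `mix64`, `MixHasher`, `MixHashMap`, and `count_kmers` are hypothetical names.
// Unlike `mm_hash`, this version works on `u64` explicitly and zero-pads short
// inputs instead of panicking; both choices are assumptions made for the example.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Same mixing steps as `mm_hash`, written for `u64`.
fn mix64(mut key: u64) -> u64 {
    key = (!key).wrapping_add(key << 21);
    key ^= key >> 24;
    key = key.wrapping_add(key << 3).wrapping_add(key << 8); // key * 265
    key ^= key >> 14;
    key = key.wrapping_add(key << 2).wrapping_add(key << 4); // key * 21
    key ^= key >> 28;
    key.wrapping_add(key << 31)
}

#[derive(Default)]
struct MixHasher {
    hash: u64,
}

impl Hasher for MixHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Use at most the first 8 bytes, zero-padded (assumption for this sketch).
        let mut buf = [0u8; 8];
        let n = bytes.len().min(8);
        buf[..n].copy_from_slice(&bytes[..n]);
        self.hash = mix64(u64::from_ne_bytes(buf));
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

type MixHashMap<K, V> = HashMap<K, V, BuildHasherDefault<MixHasher>>;

/// Count occurrences of each k-mer using the custom hasher.
fn count_kmers(kmers: &[u64]) -> MixHashMap<u64, u32> {
    let mut counts = MixHashMap::default();
    for &km in kmers {
        *counts.entry(km).or_insert(0) += 1;
    }
    counts
}
```

// Each mixing step is invertible on 64-bit words (multiplication by an odd
// constant or an xor-shift), so distinct `u64` keys never collide before the
// map reduces the hash to a bucket index.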

/// `serde` helpers to improve serialization of the `FxHashMap` storing k-mer counts.
///
/// Encoding the `FxHashMap` as a sequence instead of a map speeds up
/// serialization and deserialization by an order of magnitude.
mod kmer_counts {
    use super::*;

    struct FxHashMapVisitor;
    
    impl<'a> Visitor<'a> for FxHashMapVisitor {
        type Value = FxHashMap<Kmer, u32>;

        fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
            formatter.write_str("a sequence of kmer counts")
        }

        fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
            where
                A: serde::de::SeqAccess<'a>
        {
            let mut counts = match seq.size_hint() {
                Some(size) => FxHashMap::with_capacity_and_hasher(size, Default::default()),
                None => FxHashMap::default(),
            };
            while let Some(item) = seq.next_element::<(Kmer, u32)>()? {
                counts.insert(item.0, item.1);
            }
            Ok(counts)
        }
    }

    pub fn serialize<S>(
        kmer_counts: &FxHashMap<Kmer, u32>, 
        serializer: S
    ) -> Result<S::Ok, S::Error> 
    where S: Serializer {
        serializer.collect_seq(kmer_counts.into_iter())
    }

    pub fn deserialize<'de, D>(deserializer: D) -> Result<FxHashMap<Kmer, u32>, D::Error> where D: Deserializer<'de> {
        deserializer.deserialize_seq(FxHashMapVisitor)
    }
}

#[derive(Default, Deserialize, Serialize, Debug, PartialEq)]
pub struct SequencesSketch{
    #[serde(with = "kmer_counts")]
    pub kmer_counts: FxHashMap<Kmer, u32>,
    pub c: usize,
    pub k: usize,
    pub file_name: String,
    pub sample_name: Option<String>,
    pub paired: bool,
    pub mean_read_length: f64,
}

impl SequencesSketch{
    pub fn new(file_name: String, c: usize, k: usize, paired: bool, sample_name: Option<String>, mean_read_length: f64) -> SequencesSketch{
        SequencesSketch { kmer_counts: FxHashMap::default(), file_name, c, k, paired, sample_name, mean_read_length }
    }
}

#[derive(Deserialize, Serialize, Debug, PartialEq, Hash, PartialOrd, Eq, Ord, Default, Clone)]
pub struct GenomeSketch{
    pub genome_kmers: Vec<Kmer>,
    pub pseudotax_tracked_nonused_kmers: Option<Vec<Kmer>>,
    pub file_name: String,
    pub first_contig_name: String,
    pub c: usize,
    pub k: usize,
    pub gn_size: usize,
    pub min_spacing: usize,
}

#[derive(Deserialize, Serialize, Debug, PartialEq, Default, Clone)]
pub struct MultGenomeSketch{
    pub genome_kmer_index: Vec<(Kmer,SmallVec<[u32;1]>)>,
    pub file_names: Vec<String>,
    pub contig_names: Vec<String>,
    pub c: usize,
    pub k: usize,
}

#[derive(Debug, PartialEq)]
pub struct AniResult<'a>{
    pub naive_ani: f64,
    pub final_est_ani: f64,
    pub final_est_cov: f64,
    pub seq_name: String,
    pub gn_name: &'a str,
    pub contig_name: &'a str,
    pub mean_cov: f64,
    pub median_cov: f64,
    pub containment_index: (usize,usize),
    pub lambda: AdjustStatus,
    pub ani_ci: (Option<f64>,Option<f64>),
    pub lambda_ci: (Option<f64>,Option<f64>),
    pub genome_sketch: &'a GenomeSketch,
    pub rel_abund: Option<f64>,
    pub seq_abund: Option<f64>,
    pub kmers_lost: Option<usize>,

}


================================================
FILE: test_files/k12_R1.fq
================================================
@NC_007779.1_2104702_2105250_1:0:0_2:0:0_0/1
GTGGTGCGGTGCGGCAAGGCGCTATCCAGGGATAACCGGGCAAACAGACGCATGGAGGCGATTTCGTACA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3843327_3843872_2:0:0_0:0:0_1/1
CGTGATGATAAACACTGGCCGGAAGAACACTGGCGAGAATTGATTGGTTTACTGGCTGATTCAGGAATAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3551005_3551530_1:0:0_3:0:0_2/1
GTTGGCGTTCTTGATGTCTACCCAGAGGATGGTCATCGCACCGCGACCAAATGTTGGAGCAAGACTTGCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4203852_4204347_1:0:0_1:0:0_3/1
GCCAGTGTGCCGCCAACCTGGCCGGCACGCAGAATGATAATTTTCATCAGTCGCGACCCGTTATCTCATC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_250281_250700_1:0:0_2:0:0_4/1
ATCTCAATGCGCCGGTTGCCCGCGCTTTGCGGATTTTTTGAATCCAGCATCATCTGGTCCGCCATTGCGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2054094_2054595_3:0:0_2:0:0_5/1
ACCTCAATAATACATATGGCAAGTTCCTATAATACCCCTGTTGTTGCAATTTATGCTGATTACAAAACGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3454737_3455300_2:0:0_2:0:0_6/1
CACGGATTTGAAAGATAACTTTTTCAAAGAACAGGTCGAGGATCTGCTCTGTGGTGTAGTTCAGGGCGCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3791315_3791763_2:0:0_2:1:0_7/1
TCATCGAACAACTATCGCTGGGCGTTTAGGACTCGGTGCGCCGTTTGTTGGGTCGGTTACGTCCGCGCCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2666104_2666602_0:1:0_3:0:0_8/1
GCGATACAGACATCGGGAAGCAGGCTCGAATATTCTTAATTTTCAACTTGCACCAACATTCTATTAACCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4520973_4521453_3:0:0_0:0:0_9/1
ACCACTGTAAGGAAAATAATTCTTATTTCGATTGTCCTTTTTACCCTTCTCGTTCGACTCATAGCTGAAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2572462_2572936_3:1:0_0:0:0_a/1
GGCTAAGAGCGAACCATACGCTCAGCGAAGGCGCAGAGATCCATCTCCCTGCGGACAGTCGCCTGACGCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2172028_2172547_1:0:0_3:0:0_b/1
TTCATTAAAGGTCGTAATACCATTAACCGCATCAATCCCATCAGCACCGACAAACATTAAATCGGCATTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_37426_37881_1:0:0_0:0:0_c/1
GGCTTGCCTGACTAGCGCCCGCCCAATAACAAACGAACAAAAATGGATAGAGGTGCAATGGATATCATTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1186376_1186713_0:0:0_0:0:0_d/1
GCAACGCGTCGCCATTGCTCGCGCGGTGGTTAACAAGCCTCGTCTGTTGTTGCTGGATGAGTCGCTCTCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2356797_2357365_1:0:0_0:0:0_e/1
ATGATGAATTTCGAAAATCCATACGCAATCGAGATCCCCGAAAGGGCAAAACCTAAATCACCGCGTGAGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2412337_2412876_2:0:0_1:0:0_f/1
CGTGTTTGTATAAATGAAAATGTGAGTCCTTGTTCCACTCTGGTGCAGCATCGCTGGTCATACGCGAACA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_481264_481652_0:0:0_0:0:0_10/1
AAGATGCGATGGTTTTCGCCTTTAACCTGCCCGCAATCGTGGAACTGGGTACTGCAACCGGCTTTGACTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_696045_696562_3:0:0_3:0:0_11/1
CTTATCATAATGAAGTTATCCGCCATGCGCCGCATCAGGTAACGCTTGAGGACAGGATAACTGGCCCACG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1949829_1950461_1:0:0_2:0:0_12/1
GATGTCTGGTGTTGGCCCATTTACTATGACCTGCCATAAAAATATCTCCAGATAGGCCTGCCTGTTCAGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2503007_2503466_1:0:0_3:0:0_13/1
GCTTTTCGACAGCGTAAAGAACTCGACTGCCACATCGCGTGCACCCGGTACCTGCCTAATTGACGGCGCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2575624_2576098_0:0:0_0:0:0_14/1
TGACTATCCGAACCAGGTGAACAACGTCCTGTGCTTCCCGTTCATCTTCCGTGGCGCGCTGGACGTTGGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_367019_367512_1:0:0_0:0:0_15/1
CGCACCACTGTGCGGCGACTGCTGGAGACGCTGCAGGAAGAGGGATATGTCCGCCGTAGCCCCTCCGATG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1412508_1413022_2:0:0_2:1:0_16/1
GGTCCGTCAGTCTGTTGCTCATAAAGCATGGAAACATTTACAGGGCGGGAAGATTAAAGGAAAAACGTGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4050697_4051218_2:0:0_2:0:0_17/1
GACCAGGCCGCTCACGAAGGCACGCTGCATCACTAAAACAATCACCACCGGAGGGATAAGCTTTAACAAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_932985_933399_1:0:0_2:0:0_18/1
GAGTAGGGAAGGAATACAGAGAGACAATAATAATGGAAGATAGCAAGAAGCGCCCTGGCAAAGATCTCGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_657245_657764_0:1:0_2:0:0_19/1
TCATCCTTTGGACTCATTAAACCACTTAACGTTACCTTTAATCTTAGACATCAAAATTACCTTTACATGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1894636_1895162_4:0:0_1:0:0_1a/1
GTACACCAGTGTAGTAGAGCGTTTTGTTGTGGATTACTACATCAGACCAGCGGGCTTCAGCATCGATACG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2565121_2565660_0:0:0_1:0:0_1b/1
CATGATCTTCGCGATGATTGTCGGCAAGTTGATCGGCGGCGTAACGGCGATTGGCGTGGCGATGATGCTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3018455_3018957_0:0:0_1:0:0_1c/1
GCGTGAAGCCGTGAAAGCCACAGGCCGTGGTTTGCATATTCACGCTGCGGAAGACCTTTACGACGTTTCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4379202_4379787_0:0:0_1:0:0_1d/1
CCATCCCGCATAAAAAACGTCTGCGGATTCACAGCCGTCTGCCGATTGTGATCCCGGCACGTATCACCGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3154895_3155328_3:0:0_0:0:0_1e/1
TGGGGCATGACATTGCCATCCTGTAGCTTAATAACGGTTGGATTAGCCATACGTTCCTCCTTTATATGAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_938319_938789_0:0:0_1:0:0_1f/1
GCATCCTGCTGGCTTTTGTTGAAACGGTGAACTTCGTCAACAAAACGAATAGTGCGGCGACCTGCATTGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1608706_1609224_3:0:0_0:0:0_20/1
CGCAGGCCAACGTTGTTGTAGAAGGGCGGTGCTGTTACCTGGGCCACAGCCAAGATCGGCAACATATTCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3546059_3546594_0:0:0_0:0:0_21/1
GGTAAATCGTTATGTGCGGCACAGCTTTTTAACGCTGCTACCAGTTTCTGGTTGGGGAAGTAGCGCTTGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3708368_3708890_2:0:0_2:0:0_22/1
GTGACGAGTAAAGTTCAAGAGCTGCAGCGGGTACATCAGCATCGCTTTCATGCCGTGGCGATGTCGGCAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_894530_895089_0:0:0_2:0:0_23/1
TGGAAGAATCTCGACCCGGAACTGCTGAAGCTGGTCGCCAAACACGATCCCGACAATAAATTTGCTATGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1324264_1324739_2:0:0_1:0:0_24/1
GATTCTGCAATAAACGGAAAGCGTCAAAAACCGAAAGGGAGCCTAAGCGGGCGTCTTGATCCAGCAGTGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2931428_2931966_1:0:0_1:0:0_25/1
CCTTATGCGACCTTTGGAACACGCGAACTTTCTGACCATGTTGCGCTGGCTCTCAAAAATCGTAAGGCAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_718911_719501_0:0:0_2:0:1_26/1
CGCAAATTGCGGTTGATCTGCGCTTCTTGCAGTTTGTACCAGGGGAACACTGGCAATCTTTGAAGGCTAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4135590_4136077_1:0:0_1:0:0_27/1
ACAAGGCAAGGCGACGAAAGAGAAAGTGATCGAACAGGACCGGGTCAGCGCCGCGCCGTTTTGCGAAAAG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3282853_3283333_1:0:0_1:0:0_28/1
CTTCCTGATGTACGCCCAGACGTTTGCACTGATGCAGTCGCTGCACATGGGCAATACGCCGGAAACCCCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3140798_3141385_1:0:0_2:0:0_29/1
CGCGCCATACATCCGTAACATCATTCTGGCTGCGCACACGACGCATGACCATATTGTTCATTTCTATCAG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_901908_902495_2:0:0_0:0:0_2a/1
ATTCTGGTGGTGCTGTTTATCTATTTTGGCTCCTCGCAGCTGCTGCTGACGCTTTCGGATGGCTTCACTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2474901_2475465_2:0:0_2:0:0_2b/1
AAAACGTTTCATCTTTATGAGATTCGAAAGATTCCGTTCCAACTTCTGTTGGGGTTGCTCCAAATGATAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2436239_2436825_0:0:0_1:0:0_2c/1
CCTGTTCGGGGCGCATTCTAACAGAAAAAGAAAACGTTTGCGTAGGGATTTCCTTCCCGCGCATCAATAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1633146_1633666_2:0:0_1:0:0_2d/1
GGTACGATGCTGGCGGAATATGCGCCAAAAGGTAAGCGCGGAATTATCTCCTCCTTTGTGGCTATGGGAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_673646_674097_1:0:0_2:0:0_2e/1
GTAGTTAGCCGCTTCGGACTCCAGCATACCTTCTTTGTACTGCGGGCAAGTGTAGCGCGCATAGTACCAG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_886380_886942_1:0:0_1:0:0_2f/1
TTCATCGACCGTTACAGTCGCGTTGCCGTGGTTCGGGCCAGTTCACTAATGGGGGCGTTGGGTATTGGGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3041880_3042282_0:0:0_2:0:0_30/1
TTAGCTAAAATAAATTCTGATTTGAGACTAATCTCCTAAAAATCATGAAATTAAATGCGAAATTTCAACT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2922943_2923494_0:0:0_6:0:0_31/1
AACTCTGGGGTGAAGTGAGCAAAGAGACGCTGGGAACACGAAAACGGACGCATATAGCCTCAAATCTTGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_868833_869340_2:0:0_1:0:0_32/1
CCGGCCCCCCCCATCGAGCAGAAAACGGTGGTTGATGGCGAACCTGTTTTACGAGTGCGTAATCTTGTCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_601686_602183_2:0:0_3:0:0_33/1
GCGAGTCAAAAACTTCGGCGCATTACCCTGCAGAGAAAGTCCAAACAGGATGCGGCTGTTAGAGTAAACC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3799606_3800050_0:0:0_1:0:0_34/1
GTGGCGATTATCTGGGGGCTAACTTTTGCATTGGTTCCCGTCGGCTGGTCAACGTGGATCACCCGCTCGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_249101_249550_0:0:0_1:0:0_35/1
CGGGAATGCCGGACCTGCCGTTTTTGCTGTTCAGCGCCCTGCTTGGTTTTACCGGCTGGCGGATGAGCAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3588879_3589349_0:0:0_2:0:0_36/1
CGCGAAGCCGAAAAAGCCGGGCTTCCGGTGCTGATCAGCACTGGCGATAACGATATGGCGCAGCTGGTGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1518636_1519094_2:0:0_1:0:0_37/1
GTCCAGCCAGATGAATAAAACGCGGGCAAATATGCGTCATTTTTTGACGTGCATCAAGTTTTTTCACCAG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4378479_4378859_1:0:0_0:0:0_38/1
TAATTGCATGAATGATTGCTCGCGCCAGCATTTCAATCTGCAAGTCATTGACGACTAACTCACATTTGCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4529028_4529548_3:0:0_3:0:0_39/1
GTGAATTTAGCCAAATGAATACAGCCCAGCGCATGGCACTCGGCGACGAAGCTGTAACCATTGTCGACTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2404046_2404532_2:0:0_1:0:0_3a/1
CGCAGATGAAGTAGTGGTTTACCGTACCGTTGTAACTGTTTTCGATACGACGCAGTTCGCCGTAACGTTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4272202_4272751_3:0:0_0:0:0_3b/1
GATATTCAATTATAGTATCGTTAAATTCTAAAGTTAAAGAGAACTCTTTTTTCCGGTTTGTGTAAGTTTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_847950_848490_2:0:0_0:0:0_3c/1
GATTTTTTCATATATGTGAATGTCACGCAGGGGATCGTCCCGTGGATAGAAAAAAGGAAATGCTATGAAG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2684384_2684954_0:0:0_3:0:0_3d/1
AAATCTCGAGTACCTTCCCAACCACCGGCTTTGCTGGCGTTTTCGTTATAGCTTTCCGCCTCTCGATTGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4382800_4383303_2:0:0_0:0:0_3e/1
GTAGCGAGATTGTGCCAGTTGTATCCCTTGTTGAAGCGTTTTCTCATTGATAACAAGGGGTTTAAAATTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1063307_1063868_1:0:0_1:0:0_3f/1
ACAACGCCCCGCCACACGCGGCCACGATATTGAAATCGAAGTGGCGGTATTCCTCGAAGAAACGCTTACT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_377715_378131_0:0:0_2:0:0_40/1
TGCCTTCATGCATCAGGTCGAAGGCGTCATTAATTTCATCCAGGCTCATGGTATGCGTGACAAACGGTTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3029222_3029809_0:0:0_0:0:0_41/1
TAATCATTCTGGCATGTCGCACTCTCGCATTTAATCGTTTTTATCTGGATAGCGCTCTTTTGATCGGCAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3421662_3422214_0:0:0_2:0:0_42/1
AACTTTAATTATTGCATCAGTTGGGATTGCGGGGGCGCTACCGTGGGGGATCTTACTGGCGTTAGGTCGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2478072_2478616_1:0:0_0:0:0_43/1
TTCGTCCCAAAGAACCGAACTGGACAGCCTGGGCAAACGAAATTCGCCTGATGTGTGTGCAGGATGGTCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1485709_1486188_1:0:0_3:0:0_44/1
AGCTGAACTTACGCCATACCGAAATCATGCCGCTTTATGCGCGGCTTTCGAACAGCGAACAAAATAGGGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1399595_1400211_0:0:0_2:0:0_45/1
TGACTTCTCATACGAAATTAGCACCCTGCTCTCCCCGGACGAACGTACCGCTATGCGTCAGGGCGTCATC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3465166_3465722_1:0:0_1:0:0_46/1
TTTAGTGCAATGGCATAAGCCAGCTTGACTGCGAGCGTGACGGCGCGAGCAGGTGCGAAAGCAGGTCATA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3875172_3875644_0:1:0_0:0:0_47/1
TCCGTGACAGCGACATCCAGCCCGTCAGGTCCCGTTCACGCCGCCACACCCGTTATGCCACCCTGCGGCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_552925_553366_2:0:0_3:0:0_48/1
TTCATCAACGTGGTTGATAACGACTTCCTGAACATCTCTGGCGAACGCCTGCCAGGTTGGGGCTACTGCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1345161_1345622_3:0:0_0:0:0_49/1
GTTGGCCCTTTGACACGCCAGTGCATCCAACAGGTGCATTTCAGCAAAGCATTTATTGGTATTGATGGCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1948272_1948627_0:1:0_1:0:0_4a/1
AACTTTACCACACCGGTTTTAAATACGAATCTTCGCTCCAGTTCTGTTCAAAATTTGTTTTTGATATTTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1064871_1065406_1:0:0_2:0:0_4b/1
TAACTTAATGGAGTATATTGAAAACATTAATGCGTGTGATGATGTTTTTTCTGAGTATTGTTTTGATGAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_752698_753184_1:0:0_1:1:0_4c/1
GCCGTATCTCTGCCGTTCACTGCACCTGTTTATGCTGCTGATGAAGGTTCTGGCGAAATTCCCTTTAAGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4135012_4135523_1:0:0_1:0:0_4d/1
TGGTCGGCGATGCTGATGTTATCAACCGTATTCGCGCCACGCTTAACTCCGGCGGTAGCCAGATCCAGGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2729782_2730273_1:0:0_2:1:0_4e/1
ATTGTCGCCGTCCAGTAAATGATAGAACGAGCCCTTCGGGGCTCGTTTATGTCTATAAGTTAGACGGAAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1542686_1543226_1:0:0_2:0:0_4f/1
AGGATAATCATCGAGCGCCCATGCGTTTTATGGGCGGAATCGGCAAATTCACGGGCGATGGTTTCAATAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4633357_4633899_0:0:0_3:0:0_50/1
GGGTTGATCAGGGTACACGCTTTGGAGAACTCTCCGGTGGTGAGCGCGGTCGTCTGCATCTGGCGAAGCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1407584_1408133_1:0:2_1:0:0_51/1
TAAAATATCGGCAATATTTGGAACTTATTACTGGATTTGGGTAATACGTTGTTGGACCGAGCCGGTCTGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3541780_3542335_1:0:0_2:0:0_52/1
CGGCAATCAATGCCTGATGCGACGCTGTCGCGTCTTATCAGGCCTACAACTATTGCCTACCTGTAGTCCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3989779_3990284_0:1:0_0:0:0_53/1
GGGATATTACTTCCGTACAGGGATTATCATGACCCTGCCTGTGCTGTTTGTGACGCTGGCTGCGCTGGCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_651408_651810_4:0:0_2:0:0_54/1
TAACCAATTTTCTGCGGATTAGGATGGTAAAGGCGGAACGAGTGCCGGTCCCCAATCACCACATAATCAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4248678_4249125_1:0:0_2:0:0_55/1
CAATCCAGGACGGATAAGGCTTTCACGCCTTATCCGACAACAACTGCCTGATGCGACGCTGACGCGTCTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3879736_3880138_1:0:0_2:0:0_56/1
CGTCAGGCAGGCTGCACCCTACACGAATTAGGGACCACCAACCGCACGCACGCGAATGATTATCGTCAGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3388509_3388942_2:0:0_2:1:0_57/1
CGGATCACCGTGCGTTCAAGCTCCAGTTTTGCCAGATCGCGGGTCGCCTTCGCTTTCGCTAACTGATGCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1148415_1148862_2:0:0_1:0:0_58/1
GGCTCAAAAACGCCTAGATTACCAGGGTATCTATACCCCTGATCAGGTTGAGCGCTTCGCCGAATCCGTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_301454_301904_3:0:0_0:0:0_59/1
TTAAGAATAATCCTCCTGCTGTCGCCGACTATGCTTAACGTTTAAAAAAGCATCAGCACTCTCGCAACGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4563246_4563723_5:0:0_3:0:0_5a/1
AAATGCGCCATCTCTGCTGACCGTTCTGCCGCACCGGACGTTTATCACCTGGCCAATATGGGGGCAGAAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1233332_1233750_2:0:0_1:0:0_5b/1
GATCAGCAAAAAACTGGACGCGATGGGGATCAAAACCGTTCTCGATTTGGCGGATACAGATATCGGGATT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3012492_3012987_1:0:0_0:0:0_5c/1
TAAATATCAGCAACGGCAATATCAAATCCTAATAGCGCCGCACTCTGGGCTATCGCCCGTTTGACATGCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_315629_316251_0:0:0_1:0:0_5d/1
ACATCCAGCTTCTGTTCCGAAATTTCATGCATGTGCCTTTGTGAGAATACATTCGCAAACGAAGGCTTTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_295753_296075_1:0:0_1:0:0_5e/1
CCGAAGTCGTGCCTTGGCCTCTTTTTTTACTTTCGCCTCCACGACTGGACAAGTTAATGTGATCGGCAAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4429822_4430301_1:0:0_0:0:0_5f/1
ATGGTCCATCCTGACCAGAGCGAACAGGTTCCGGGCATGATCGAGCGCTACACTGCTGCCATCCCTGGTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3270942_3271433_0:0:0_1:0:0_60/1
GTTACAGGGCCGCTGGGCGAGAAAGTGAAAGCCAGTTGGGGGATCTCCGGCGATGGCAAAACCGCGTTTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_373969_374534_0:1:0_2:0:0_61/1
GACGCTAACGCGTCTTATCAGGCCTACAAATTCCCGCGCCATCCGTAGGCCGGATCAGGCGTTAACGCCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2644046_2644484_1:0:0_0:0:0_62/1
AGTGCCTTGCCGCGCGTCGCCCCACTGGCATGGAAAACGGGCACCAGCTATGGCTATCGTGACGCCTGGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_640491_640897_4:0:0_1:0:0_63/1
AAGCACAAGAACGTCTGCAAACGATGGTCAGCCACTTCACCATCGATCCTTCCCGCATTAAAGAACATGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3283730_3284293_3:0:0_2:0:0_64/1
TGTGGCGTACGGCAAATCAGGAAGATCTTCTGATGTGGTGCAGCTTTGCGGATGACGTAAATGGTTTTTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_577533_577989_1:0:0_2:0:0_65/1
TTTCTTGAATTGGGCAGAAGAAACCTGTCGATGCAGCTAAAATTTGTGGCGGCGCAGAAAATGTTGTTAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4176623_4177184_0:0:0_0:0:0_66/1
TTTTAATCTCAATGCGTTAGCAAAAATCTCTGACGACCCGCTGGCTTCCCCTGATTTCCCCGCCCAGGTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2570726_2571213_0:0:0_3:0:0_67/1
TTCAGGTAAAAGCTCCCCCTACCCTCCGCAGAAGGTAAAATGAAAAAGGAGAGAGCGTGACGCCCGAATC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_31909_32468_4:0:0_1:0:0_68/1
GAAAAATTCGCCGGTGCTAACGACCGTCTGACCACTCAGATGAAATCGGATGGCGAAGTGATTGCGCTTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2713112_2713532_2:0:0_4:0:0_69/1
GGGAATGATGATGCTCTGAATCGACGAGATTTGGGAGAACAATCACAGGAGCTGGTGGCTTCACCGCAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2818557_2819043_2:0:0_0:0:0_6a/1
ATAGCACCTTCTCCGGTTCCCGCTTCGATACGACGAAGGCCTGCAGCAGTACCCGATTCAGAGATGATGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4389648_4390146_4:0:0_0:0:0_6b/1
ACATCCCCGGCAGAAAATTGCAGCGCCCTGATTTCATCTTCGGTTATTGCCGGGTCCCGGCTGGTGTCAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_863272_863658_1:0:0_0:0:0_6c/1
AGCGGCACGCGGATAATGATTTTTTTGCCCGCTGCGGCGAGTTTTTTCAGGTTCTCCAGCACTCTGGCGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1130874_1131377_3:0:0_2:0:0_6d/1
ATGTCCGCTGTGCTTAACGATCTCAAAACGGTCATGGATCAAGAGCAGCAACATCTCTCTATGGGGGAGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1440867_1441423_3:0:0_1:0:0_6e/1
GTTACTCAGATAGAAGCGCGCTTTTTTCTGGTTTAACACGGCCTGAAGTTCTTGCGGCAAGCGCGACCAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2321875_2322349_2:0:0_0:0:0_6f/1
GTATTGCTTACGCCAGAGAAATAACTGGCTGGCTTCTACACCATTTTGCCGGGCAACGAGGGAGACCGTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2204014_2204592_2:0:0_4:0:0_70/1
TTATGAGAGTGATCGTCTGGTAGTGACATTGCAGTATCAGGACGAGGATTATGGCTATTGTTATCGCTCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3845410_3845833_5:0:0_4:1:0_71/1
GCCTGGAGCGAGCGATAGAGACTTTTCGACATCATCATGTCGCCAACCCAAGCCGTTCCGATCACCAGTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4592003_4592485_1:0:0_1:0:0_72/1
ACTTCGTTTTCGCTCATTAAGTGCACCAGGCGTTCCCCATCAACCAACACCATACCCTCGACGGATTGGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1373197_1373676_1:0:0_4:0:0_73/1
TTCAAAGTATTCTTGGCTCGCGTAATGATTACGCAGGTGTCGAAAAACTCGGATATAACCGTGCGCTTAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_487534_488042_0:0:0_1:0:0_74/1
ACGCGCCAGCGCACGACGCCAGGCGAAACGCCGCGCCGCTACGCTTAAGCCACGCAGTACCGTCTGGTAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_765785_766224_3:0:0_2:0:0_75/1
CATTTTCTTCACTTCTTCGATAAAGTGATATTTCGACTTTTCCAAGACCTGCCAGGAGAGATCCGGGAAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4310731_4311259_1:0:0_0:0:0_76/1
GCTCCACCCTGGCCCGGATGCTGGCTTTCATGTATTCGATGTTGATGGCCGTTTTGTTCTTGCGTGGATG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1205855_1206316_1:0:0_1:0:0_77/1
ACGCTGCTGCTGACTTCACAAAAATTCGCGATAAACCGACACGCCAATATTTGGATTGGCTTTTTCCAGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3426285_3426770_1:0:0_2:0:0_78/1
TGTGCATTTTTGTGTACGGGGCTGTCACCCTGTCTCGCGCGCCTTTCCAGACGCTTCCACTAACACACAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2302267_2302743_1:0:0_0:1:0_79/1
GCCTGTTATTCGACACCGTTCCGCTCACCGATCCGCTGATGACGCTGCAAAGTCTTGCCAGTGGTCATCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4595948_4596410_0:0:0_1:1:0_7a/1
GATGTCAGTTGTAAAAGGCCAAAAATGGCCAAAACCAGCAGTAATCTGGTCACAATTTTGATACGTTTTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2274923_2275441_2:0:0_2:0:0_7b/1
AGCTTTGCGAGTAACGGGCCTGGAAGTACTGCAACGCTGGCGGCATCCTGTCGCGGTAGAAATTCCCCCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1101425_1101988_2:0:0_0:0:0_7c/1
GGTTTCGGGTTTACAAAATAACTCAACCGATTTTTAAGCCCCAGCTTCATAAGGAAAATAATCATGCAGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2829702_2830203_0:0:0_2:0:0_7d/1
ACCGCCCGATGAATAGCATGTTCCAGTTCGCGCACGTTTCCCGGAAAACTGTAGTGTTGCAGTAAATTTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_110809_111260_2:0:0_3:0:0_7e/1
ATTCCAGTTTTTCAAATCGCTAAAAATGTTGGGGGGTAATCCCGAGTTCTTCCTGAAGTTCACGCACCAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4104760_4105284_0:0:0_2:0:0_7f/1
TGTGCCTCCCGCTTTCCGTCGGGAGATCTACCGTGAGCTGGGGATCTCTCTCTACTCCAACGAGGCTGCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3820480_3820957_0:0:0_3:0:0_80/1
GTTCATTTCACGCTTAGTCTGGCTGGGGGCAAAACAGGTTCTTGGGCTGGATGGCATTGGTGAGGCCGGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1797314_1797824_0:0:0_2:0:0_81/1
TTGATTTCTATGCTTTGAAAGGCGATCTTGAATCCGTTCTCGACGTGACCGGTAAACTGAATGAGGTTGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3972613_3973091_1:0:0_1:0:0_82/1
GATAAACAATAATTAATTTGATCGCCCGAACAGCAAAGTTTGGGCGATTTTTATTACGATAATAAAGTCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2964042_2964515_1:0:0_1:0:0_83/1
TACTCTATGCGGGCTTCCTCGGCGTCTTCCTCGGGGGACGTATTGGTTCTGTTCTGTTCTACAATTTCCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1361870_1362411_0:0:0_3:0:0_84/1
TAGTGCCAGCGCATCATCGCAGGCTTCCAGCACGTTATCAGTATGGTAAAGGTTGATTTCAAACTGACCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2422979_2423442_0:0:0_3:0:0_85/1
CGATCACGCCCCAAATCACCCAGACCATTACGGCGGTCCGGACAATCAATCCCAGCCAGTGACCAAAGGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1550498_1550944_2:0:0_0:0:0_86/1
TTCAGTCTGAATAGACGCCGGATCGACATCGTTCGACTCACCGTGGTTCTGCCAGAAGGTAGAGGTTTCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1058539_1059057_2:0:0_2:0:0_87/1
AGTGCTCGTTGGTCGGTACTGGCGCTGGTCGCAATTGGGATTGTGATTGGGATTGCGGTGATTGTATTGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3051678_3052197_2:0:0_2:0:0_88/1
GAATGGGTGCCATCAGCTGCTACCATCACGCGGCCCGTCAGCGTCTCGCCACTCTCCAGCGTGACTTCAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3311052_3311575_1:0:0_1:0:0_89/1
GGCGTCCTTTCATTCTATATACTTTGGAGTTTTAAAATGTCTCTAAGTACTGAAGCAACATCTAAAATCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1406274_1406811_1:0:0_0:0:0_8a/1
CTGATGAAAACGAGATATGACAGCTGGCATCAGACTGCGGGAAATACTGGCCCCCATGCCGATATTAATC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_840394_840928_3:0:0_2:0:0_8b/1
CAGCGCCCCGTCTTTCCAGTTCATCAGATTGCCCGACTTGTCGGTGTCGACGGTGGTGCCCGGAGAACCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3320724_3321180_2:0:0_4:0:0_8c/1
GTGTCGTAGAGCGTGTATAACCCTACAAAGACATATTGTCGACACGTACAGACTCCCCCACAATCAACAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3830360_3830880_3:0:0_3:0:0_8d/1
GAAGTTCGCCCTTAAATCACTAAACCCGACCACTTCCACGTTCCCCAGATGCGCGGTTGCCTGGTGTTCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_722906_723508_0:0:0_4:0:0_8e/1
TGGACGCTGGCCTTCAGTGGTTGGCACCGTCCTTAAAGTAGTGAGTTTCGATCTCTATTTTATCGCCCCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_578180_578745_3:0:0_1:0:0_8f/1
TACACAATGAGTAGAAATTTCATATTGTTAATATTTATTAATGAATGCCAGGTGCGATGAATCGTCATTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1006552_1007110_3:0:0_3:0:0_90/1
CAAGAGGTGTTGCTGGAGCACTTAGAAAATCAGGGAATTCGTATCCCTTATTCTTGCCGCGCGTGCATTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1185612_1186150_2:0:0_0:0:0_91/1
TCAGCGAATTCTTCAATGAAGACGATCCTGACTTTGACCACTCTCTCGACCAAAAAATGGCCATTAATTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3056869_3057348_0:0:0_0:0:0_92/1
AGTCAGATGGGTACGGGATCGCAGGCCGATGAAGTGGGCATCGCGGATGGATTCTTTTAATTGTTCATCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4286739_4287172_1:0:0_3:0:0_93/1
CGCTCATTTTTTAATGAGTTTATTTGTTTAATATTATGGGAAAAGGAGATGCATTTGGGAGAGGAAGAGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1800192_1800796_2:0:0_0:0:0_94/1
AGCGCGGAGAAACCGGAGCGTATTTAAGTACGTGAGAATTTCGAGCACAGCCCGGGACCAAAATGGCAAG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_327351_327887_1:0:0_1:0:0_95/1
TAAATACGCCGTCCGGCAGGCCCGCTTCGCTTTAAATTTCAGCCAGCTTTAACGCGGTAAGCGGGGTAAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3370309_3370880_1:0:0_1:0:0_96/1
CCGCCGCCTGAGCAGGCGTGTTGTAACGCCCTTCGGGAATCACCCGACATCCGGCGTCGCTCAACGTTTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2956370_2956826_3:0:0_1:0:0_97/1
AAACCGCTGCCGGAGTTGGCAAAAATGGCGGCGGACACCTTTGGTCGCGTGCCGCACAAAGAGAGCAAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2109362_2109842_1:0:0_1:0:0_98/1
TTAGGCATTGATTTTTTGAAAGACAAAGATTCTCTAGCGAGTAAAGCCCATAGAATCATCTAGACTGGAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3995466_3995923_0:0:0_4:0:0_99/1
GAGGCTGGTTGTCTCAGTAGGTCCTTACCGGCAAGTAGGTTGGGATATCCAGCGTCAGCAAATAACCTTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_706275_706861_6:0:0_1:0:0_9a/1
TGGCGGGTCACACACCGCTGAATGAAATCAAAAAGTAATCTGCTTTATGGCTTAAGCGACGCTTGAGGGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2979149_2979594_3:0:0_0:0:0_9b/1
AATTCATGAAAATCGACGCTATGTAGCAGCACTTGCGCAGAGTGAAGCCCACCAAGGGGCTGTTTAATGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1462582_1463111_1:0:0_1:0:0_9c/1
TCTCTGCCGTTTCCGGCATGCTGACAGTTCCAAATTGCTGAGCCATGAGCGGGTTCACAAATCGCCAGCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2086373_2086965_0:0:0_1:0:0_9d/1
TGAACGGTGATGCTCCACGCTGCATCGCCAATCTGCTGATAATCGGTGATGGCATGTCCTTCTTCTGCCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4556582_4557112_2:0:0_0:0:0_9e/1
GCGCAACCTGGCGCAGTGTGGCATTCGCACCTTGTGCTACAACTTCATGCGGGTGCTCGACTGGACCCGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3345240_3345727_0:0:0_0:0:0_9f/1
TCTCGATCTGTTAGCCAATCACGACTTCCGCACTTTAATGCGCGTCACGCGTCTGAAAGAAGATGTGCTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4388388_4388993_2:0:0_1:0:0_a0/1
CAATTAAAGACTGGATATTTGATATCATCCAGGTATCAAATCTGTATTGGTTTTTACGCTGCCTGCTCAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2708833_2709253_1:0:0_1:0:0_a1/1
TAAGTCTACAGAATCTGAACATGGCATTATCTGTGTAGAAATGCCCATTTAACTGCCTGAAGAGTAACCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3650719_3651255_0:0:0_2:0:0_a2/1
GCCGCTCTTCCTCTCATGGCCAGCGCAGCAGATACCCCGTCAACTGCCACCGCACGCAAAGGCTTTGCCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1304715_1305156_4:0:0_2:0:0_a3/1
ACAACGCCACCATCGGCAATATCATGAATTAAAGCGCCCCACCATTCCAGCCACCGCCGGGCAGCCAATG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3397336_3397827_0:0:0_1:0:0_a4/1
GCTATGTCGCCAGCGATGAACCGTTAGATAAAGCAGGTGCATACGTTATTCAGGGGCTGGGTGGCTGTTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_271809_272297_1:0:0_0:0:0_a5/1
AATTCATCCCTGAGGATTTGCAGCTCAACAGCCAGTCTGGCCTGTAGCATTGTTTTGTTCACGCCATCAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2156923_2157477_0:0:0_2:0:0_a6/1
CAGCAGCGTGGTGGCCACAGGACGGATAATAAACAGGCGCGACGGGCCGCCTGTGGTGCTCGGGGGTAAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_266504_267005_0:0:0_1:0:0_a7/1
GGGTCGTCGTGAACATACAAAGCATATGCTGCGTCTGCGGCGGGAAGTGCAGATCACCGGTAAACAGGTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1040370_1040862_0:0:0_1:0:0_a8/1
TCTGCTCCTTAGTACAACTCGTTTTCGTTACGGCGGAGAGTTTGTGTTGTCATGCGCCCCCACATTTTGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4372140_4372626_0:0:0_2:0:0_a9/1
GAATATCAGTAGCTGAACCCGAACGACCATGTTAACAAATGTCAGTCCACTAACGACGCCTACCCGAGCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3391700_3392193_1:0:0_0:0:0_aa/1
GATGCGACGCTGTGCGTCTTATCAGGCCTACAAACGGAACATAACCGTAGGTCGGATAAGGCGTTTACGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_351919_352425_1:0:0_4:0:0_ab/1
AGGAGAGCATTATTTCTTTTAGCGAATTTTATCAGCGTTCGATTAACGAACCGGAGCAGTTCTGGGCCGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4041478_4041859_0:0:0_1:0:0_ac/1
TACTTCGCATCGGTCGCCGTGCCGTTGGCGTGGCAGTCAAATACGCCGAACTCAAAGCCTTTCAGATCGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3469727_3470183_3:0:0_1:2:0_ad/1
AAAAAGCGAAGCGGCACTGCTCTTTAACAATTTATAAGACAATCTGTTTGCGCACTCGAAGATACGGATT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3302325_3302817_2:0:0_2:0:0_ae/1
AATGGCCGTGAGAAAAGCAATGAAATATTCCTTAGGTCCAGTGCTGTGGTACTGGCCAAAAGAGACGCTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1876719_1877255_1:0:0_1:0:0_af/1
CGACCGACACCATGAAGCCGCGGTGGCATTTTGTATTCACGCCGGAACGGATGAGTTAACCAGTCCTGTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2095223_2095778_2:0:0_3:0:0_b0/1
ACCAACCTGATTAGTGATCATCACCAGCTTGTATCCCGCTTTTTGCAGCTTCAGCAGTTCCGGGATCCCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1497039_1497512_0:0:0_0:0:0_b1/1
AGTTTTCGCAGGCAATGTCAGCCATGTACCAGCAAGAAGTTCCGCAATATGGCACGCTGCTGGAACTGGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4027678_4028216_3:0:0_0:0:0_b2/1
TGCCTTTTCTTAATTTGCATATGTCGCCGGAGCGAATGCTGCAGCGGATGGACTCGGAAAAAGTGGTTAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1532976_1533336_1:1:0_2:0:0_b3/1
AGGAGATAAGTCATGGAAGAAAAGAAACACGTTTGCATATTGTCAGTAACGTAACGACTGAACTATTGTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4441807_4442222_2:0:0_1:0:0_b4/1
AGCGCGGCAGAAGGCAAAGCGTGGAAAGAAGAGTGGGGTGTGCGCAAGCAGATTCAGGTCCGCGATGCGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_567028_567519_1:0:0_1:0:0_b5/1
GCTGCTACGATACTGCCTGCGTGGAAAGCTTCTTTCATTCGCTGAAAGTGGAATGTATCCATGGAGAACA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3922625_3923179_0:0:0_0:0:0_b6/1
CCAGCGCCATGCCGATCCGTCCCATCCCGACAATGCCCAGTGTTTTATGGTGAACGTCAGTGCCGTACCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_757259_757921_1:0:0_1:0:0_b7/1
GCGAAACTGAAACTCGATCACCTGGGTAAAGAAGTTCTCGAATCCCGTCTGCCGGGAATCCTGGAGCTTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1134932_1135452_0:0:0_0:0:0_b8/1
GCTTCCTCAACTCCATGCAGCAAAATACCGGCGCTAACAATATTGTGGCAACCACCCAGAACGGCTACAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1713567_1714078_4:0:0_2:0:0_b9/1
CTTGAAGCCCTGCCCGGCGTAGGTCGTAAAACAGCCAACGTCGTATTAAACACTTCCATCGGCTTGCCGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1865774_1866286_3:0:0_2:0:0_ba/1
GCCGAGAACGTTCAGATGATGCTGGTGGCCTACGGCGATAATACGATTCAGCGCTTCATCATTAATGACG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_372438_372977_0:0:0_2:0:0_bb/1
CCTTCGACTTCCAGCCAGACCGCTGTTTTCAGTCCTGAGAATTGCCCCACGCCCGGTAAATTGACCTGTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1813908_1814399_0:0:0_0:0:0_bc/1
CCTGTCGGGAGCATTTTTGCTGCGATTCAAAAAAATGAGAGTCTCCTAAAGTTTCCTTCGATAATATCAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2349633_2350050_2:0:0_3:0:0_bd/1
AAGGAACTGGGCGCTCTCATAGATTTCGCCGGTCACTCGGTTCTGTACCAGATATTTGCCATGCAGCTGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1234374_1234884_1:0:0_1:0:0_be/1
TATGGCCGCATGGTGTGGATGGCCCTGCCTTACACCCTCGTCCTGACACTCGACGGCTTGCTCTGCGTCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2843199_2843682_1:0:0_0:0:0_bf/1
AAACTCCCAGGCAAGCTCACCTGTGGCGAGGTCAGTTTCAACCGTTAAGGCGTTTGACGGGCAGGCATTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2573296_2573819_1:0:0_3:0:0_c0/1
TGCCGCCTTTGCCGGTCACCGATTTGAAGCACGGCTGCATTGCGCCCTGCTCCAGCGACTGTTTTTCCTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1106699_1107206_3:0:0_2:0:0_c1/1
CCTGGTATTGCTTACGCCAGAGAAATCACTTGGTGGCTGCTACACCATGTTGCCGGGCAACGAGGGAGAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_107954_108416_0:0:0_1:0:0_c2/1
TTTCCCTTTCAGTACTTCGTCGGAGAGTTTTTCCATCTCCGGTTCCATGGCATTGATGATGTTGACCACT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2855819_2856349_2:1:0_1:0:0_c3/1
ATCCCGAGATCCTGCTGTTTTACCGGATGGGTGATTTTTATGAACTGTTATAGGACGACGCAAAACGGGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3714894_3715323_2:0:0_1:0:0_c4/1
AGCGTGAAAGACATGTACCATGTCAAAAGCAAGCTGATTGCTCCGCAGGCCCTGACGATCTTCGTCTGGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4472486_4473052_1:0:0_0:0:0_c5/1
CGGAAGGGTTAAATCAGGCGGAAGTGGAATCTGCCGGCGAACAACATGGTGAAAATAAATTACCCGCACA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2424071_2424550_0:0:0_2:0:0_c6/1
GCCAGCCGGGAACAAATGCGGGCACAGTGTCTGCGTCATCGTTCAACTAACATCGTCGTGCATGATGGCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_284857_285348_0:0:0_3:0:1_c7/1
TCCTGCTGTGGGGGGCCATCCCGTTCGGCATCGTCTGCGTGCTGACCTTCTACACGCCGGACTTCTCCGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2009155_2009690_0:0:0_3:0:0_c8/1
TTGCCGGTTCCCCCGAAACTCACGCCAGGTTTTATTGCCGTAATTCTCTCCCAGATGAATGGTTTTTACG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1254866_1255346_1:0:0_2:0:0_c9/1
TCTCTCTGATGTAATGCCGTTGTATGTCGGTCTGTATCTTGGCTCAACACATGCATCGCCGGACTATATC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_769127_769674_1:0:0_1:0:0_ca/1
AACGTAAGTTTGTGATGCGCCGCTTTGAAGAGGTATTTGAGAAGATCGAAGCGCAGCGAGATAATCTGGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3623590_3624093_2:0:0_0:0:0_cb/1
CACGCGAGGGAAACCGAGGGTGTGATTCAATATTGTCATTTTTTCTACCTCTAATTATATGTAAATCCTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_904654_905190_1:0:0_1:0:0_cc/1
GAGGCATCGCTAAGCAGTGTCGCCAGTTTGTCGCTCAGATAAGGGCGCAAGGCGGTGATGTCGTTTCTGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2596498_2597053_3:0:0_2:0:0_cd/1
TATCGTGGTCGTTATCAAATCTCTGTTAAGCCGCAGGGTTATCAGCAGGCGGATACGGTTACACTGCTGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3266850_3267309_2:0:0_0:0:0_ce/1
ATAGAACCACTTCTAAAGACTTCCTGACAGACTACCAGGTGCTGCGTTTTCGGAAGAAGAATAGTGCTCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4510112_4510622_2:0:0_1:0:0_cf/1
AATTCCAAGGTCATAAATAACATATCATCTTCATTGCCAAATCCTTGACCAGAATGGGCCAAATTTCCAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_962385_962822_2:0:0_1:0:0_d0/1
CCAGAGAACCTGGCAGGAACGCACGAAAACCGTTCAGCTCAACAGTGAAGCCGCCCTTAACTTTGCCGTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3664650_3665086_0:0:0_2:0:0_d1/1
GCTGAAGAATATTAACGCGAGGGTGACGATTCGCCAGGCGAATCAGTTCCCAGACGTTAATCCCCTGATC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2013645_2014195_2:0:0_1:0:0_d2/1
CCACAATTAAACAGCTTAGTAGTCTCAGGCATTGCTGTTATTTACCATATTATATTGCGATAAATAGAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_479246_479837_1:0:0_0:0:0_d3/1
CAATCATGGCTGGGTAAACGACCCAACCTCGGCGATCAACCTCCAGTTGAATGAACTGATTGAGCATATT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4416930_4417445_2:0:0_1:0:0_d4/1
CTGGTGTCAGACTTTCTATGCCCGCATTAAGCGAAAAAATTATTAATCACAATCCCGCAGCAGGAATGTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2897364_2897837_1:0:0_0:2:0_d5/1
CGTACCGGTGCTTCCGCCACCGAAGGTGGGCTGGAAACTGTTGTAGACAACTCGGTGGTGCTCGACGTCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2075245_2075722_0:0:0_2:0:0_d6/1
CCTGCAGCCTGTTCCCCTGGACAAAGGCCCCTTCGTCCGTGGTGGCACCGTTAATGGCTTCCAGCACCTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_218712_219212_2:0:0_0:0:0_d7/1
ATGGTACGCCGGTAGTGGATATCAAACCGTATCTCCCCTTTGCCGAATCGCTTCCCGATGCCAGTGCCAG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3153680_3154183_2:0:0_1:0:0_d8/1
TCCATGCCTTTCAGGGGATCCAGAACTTGATCGAGAACGCCGGTTTTTTTCACGCTGCCGCCGCCGTAGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4025152_4025634_0:0:0_3:0:0_d9/1
ATGCGTTAAGCAAATCGATGGCGGAAATGATTCAGGCTGATATGCGCCAGCATGGCGCAGATGTCTCGCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2227118_2227579_1:0:0_0:0:0_da/1
ATACGCTGCCGTTCCGCAATACTAATCATCTGGTGTATCGCGATAACTGGAATATTCAGTTAACCAAAAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2035128_2035575_2:0:0_0:0:0_db/1
TTCCGGGTTATCTCAAAATGGAATACGGTTCGACAAAGATGGAAGAGAGACTCTCTCGCAGCCCTGGCGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_906088_906639_2:0:0_2:0:0_dc/1
TATTCCCCGTCTGGCGCATATGAAGCACCACTACGGCAGTATTCTGTTACCGCATGGCGGCCGTGCGCTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3270624_3271143_2:0:0_1:0:0_dd/1
TGGTTTTGGCGGCGGTAGTCTTAATACTCTGAATGATATTGATATTTCCGGCCTCGATCCGCGCTTAACA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_212928_213383_2:0:0_2:0:0_de/1
CAGCGTTGGCCGCATTTTGCCGAAGCAACGGCCCGCAGCGCCGCACTTTGTGCTGAACAAGATAGGCTGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_64436_64954_2:0:0_0:0:0_df/1
ACGGTATCGAGGCGCTGAAATCCGCGTTCTGGAATTTCTCTTCATTCTCGCTGGAAACTGTCGCTCAGGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_387107_387592_1:0:0_3:0:0_e0/1
GCGTACGCCCCACCGGATGTAGCAACGGCGGGTTTTTCGCGACCGCCTCGCGCCAGCGTTGATGTACCTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3776608_3777140_2:0:0_1:0:0_e1/1
TTCGATGCAGTGTACCTGTGCTCACAGATGTCTACTTTTTCGCGAAAACGTAGATCTCTACCGCCCAACG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3294353_3294843_2:0:0_2:0:0_e2/1
CCACTTAGCCAGATCTAAAGCCAGGTTCAGCAGGATGGCGCGAGTATTGTGGTCGGTGCGTTGCTGAAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_226061_226576_2:0:0_1:1:0_e3/1
GTCTGGAAAGGCGCGCGATACAGGGTGACAGCCCCGTACACAAAAATGCACAAGCTGTGAGCTGGATGAG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2410268_2410816_1:0:0_2:0:0_e4/1
TATGCGATGAAGGATTTTTACTAAAAAAAAGCCGCAGGGGTTTAAAACACCCCCAGCGGCTCGTTTTTTA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2283285_2283894_2:0:0_1:0:0_e5/1
CCGCTATACTGGGCAGACAACCGTTCAGATTTTCGGGTCAATGCCCGACCTCCGTTTTTACAGGCAAAAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3810209_3810655_1:0:0_1:0:0_e6/1
GCCGCTGATGCTGGTCTTTATGATCACTTCGCTGGAAACCATTGGGGATATCACGGCGACCTCTGACGTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1896111_1896570_1:0:0_1:0:0_e7/1
ACATATATGGAGGAAGAACTAAAAGCACTGGTCTGAGTTAAATTTATATCAGCATAAATGGGTCAGGACG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1202032_1202517_2:0:0_0:0:0_e8/1
GAATTTATTTTGTCCACCGGCCATGAAACAAGGCCGGTTATGGCGAGTGCGTGAGGACGCCGAGTTAGTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2591638_2592230_2:0:0_1:0:0_e9/1
GCCGTATAAGGCGTTAACGCCGCATCCGGCAATGGTGAACGCTGCCTGATGCGACGCTCACGCGTCTTAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1216616_1217133_0:0:0_2:0:0_ea/1
GTTTTGGTATTAAAAGAAAGACACAAGAATATATGTCTGTTCCCGTCTTACTCTCGCCTCACCCATTACC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4234384_4234884_0:0:0_1:0:0_eb/1
GGAGTTTATTCGCGGCATGAGTGCGGGGGTGCCAATCCTCGGGACAGTGACCAAAAAGTGCAAAGTTAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_989626_990145_2:0:0_1:0:0_ec/1
GAACAAACGCTTTATCATGGCAGATCGTTTTGCCAGGGCAGTCAGAAAGTTTCGCCACCGGTTTACCGTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1330235_1330757_0:0:0_1:0:0_ed/1
GCGTACCGCCTGGGTTAATGCAGTTGACACGCAGGCGCTGCTGATATTCATCGGCCAGTACCTGCATCAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2632925_2633456_2:0:0_0:0:0_ee/1
GTCGCCTGAATGAAACGGTTCGTCTGCTGCTTGAGCATTAGATGGGACAGGTTTGGATCAGCGGCGAAAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4141835_4142369_1:0:0_2:0:0_ef/1
AAACCAGCCGGTCGTAGACCACCACATCTGCCTGCTGAATTTGTTGCAGTCCTTTCAGTGTCAGCAGCCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1750087_1750609_1:0:0_0:0:0_f0/1
GACTACAAAATGGAAATTAATGAGTCGCGCACCGTCCTGAAAACGGAAGACAACAGCTACTCGATTGACG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2999717_3000137_1:0:0_0:0:0_f1/1
AATTTCAACAGGATCAATACCTAACGCTGTCGCGGCGTCATCAAGCATAGACTCAACGGCAAATACGACT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1308580_1309120_1:0:0_2:0:0_f2/1
GCATAACAGAATGCGGGCAAAAACATCGTTGATTTCACCATCATCACTGTCCGCCAGACCAATCACCACT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3155804_3156287_0:0:0_2:0:0_f3/1
TTGCGTTTATCCCGGATCTAGATGAGTTGATCTTTTAAGAGCTTCCGGCTCTGCATGATGATGTCCTTAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4322454_4322974_1:0:0_0:0:0_f4/1
GTGGAGCACCACGCATTCGCCCGCGTTGACGGTGCGCGAGGCGCGATTGAGGACGGGCAGGCGCACGCCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_196794_197315_0:0:0_4:0:0_f5/1
ACTCCGCCACCATGCCTTCAATAATAAATCTGTCGGCCAACGAGCGGCGATTATTGCCGCAGGTCCGGTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3964205_3964691_0:0:0_0:0:0_f6/1
GCGGCGTGAACGCCTTATCCGGCCTACAAATCCTTGCGAATTCAATATATTGTATGGAAATTCAGGCCTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2999042_2999480_1:0:0_3:0:0_f7/1
TTGGACATTACCCGTCGACCTCGTGCTTTGTTTCAGTAAATAGCCACCGTTATGGATAGGTGCAGCGTCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1569594_1570181_1:0:0_3:0:0_f8/1
GCCGCGCCTCGGGTATCGGAACCGAGCGGATCGTGTGAACGGTTACGCCACACGCCTGCCGGGCTAACAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3928328_3928806_4:0:0_3:0:0_f9/1
TTTACTCTCCGGGAACCACTTCACCCAGTTAGAGAGCGTCAGCAGATAACCCGCGCCATCTGCCAGTCCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3755026_3755454_0:0:0_2:0:0_fa/1
TGCGGTGAAGTTTCCAACAGCTGGAACGGCTGGGTAGAGTTCAGCTCTTTCGGGTAAGCAGGCAGCAGAG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_815566_816153_2:0:0_1:0:0_fb/1
GTCGTGCGGCACGTAACGTTAACGGTAAAGCGATTCTCTACGGCGCTACGATCACCCCATCAATGGCGAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_968513_968924_2:0:0_2:0:0_fc/1
ATCGCTCGAGCCTTGTTGCGTGATAGGCCGATTCTGATTCTGGACGAAGCTACCTCGGCTCTGGATACCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3343097_3343614_1:0:0_0:1:0_fd/1
GCATTACCCGTCAGAACGACAAAATCTTTTGCCAGTTCGTAGTGCATCTGGGAAGCGTGACCTTCTACGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4133380_4133810_1:0:0_2:1:0_fe/1
GGCGCTATCTGGGGCGTGTTGATCCTTACTTGCCTGTTGCCAGTAAACCAGGTGCTGACCGCGCTGCCGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2935878_2936390_0:0:0_2:0:0_ff/1
CTGCGTATCCCGGTATGTATGCACAACGTTGAAGAGACCAAAGTGTATCGTCCTTCTGCCTGGGCTGCGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2329145_2329714_1:0:0_1:0:0_100/1
GGTGGCGCGATGGATCTGGTGACCGGGTCGCGCCAAGTGATCATCGCCATGGAACATTGCGCCAAAGATG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1397228_1397769_0:0:0_1:0:0_101/1
TTCGCACCTTCCTTAGCTCATGTTGCTACTTCCTTCCCTGACTATTGATGATCTGAATGAATTGTCGCGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1794449_1794945_2:0:0_2:0:0_102/1
GTACGGTTCCTCTGTTTTTTATTCATGGATTAATTTAGCGTCGTAATTACCCGATTTTCAAGATACTAAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3049326_3049779_0:0:0_4:0:0_103/1
CTGCGTAAAAAATTTCTCCTCAGTTGTTTATATGATACCCATCACACTTTCCTCTCCCGGTTTTTTCGCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3066380_3066846_2:0:0_0:1:0_104/1
CAAGCGTTGTTCAGTTAGGCGCTATCTGATGGAAAAATAAAACAGAGGCGCTAAGCTTGCCTCCAGAGGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2702289_2702896_1:0:0_1:0:0_105/1
GTATAGAATATTCCCCCGAAGTTTAAGGTTGGCACCTCCAGGTAGCCACGGCACACGAAACAGCGTTGGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3585966_3586474_1:0:0_2:0:0_106/1
CCACGCCTTGTTTCTACAACGAAGAAAACGTTTCAACCTGCACATCACCTTTAAACGCCAGTACAGCTTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4502923_4503369_0:0:0_1:0:0_107/1
AGAATGACGTGCAAGTGCGCACGCGACACCCGGAGACAACGGCTGACTAAGCTTACTCCCCATCCCCGGG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1795321_1795834_2:0:0_2:0:0_108/1
GGCTTCATATCGGGGGAAAAACGCTGGATGACTTTTCCGTCCCTGCCAACCAGGAATTTTTCAAAATTCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_956692_957193_3:0:0_4:0:0_109/1
CAGCCCTTCTGGCACATCGTCATATTCGGTCATTGTGAACCATTTTTCGTTGGGATAATGCACGAACGGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1721166_1721637_2:0:0_1:0:0_10a/1
GACAAGAGACAGACCTACCATTGAAAGAACCAATACGCGTTTAATCATTGAAAAATCTCCTGTTCACCAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2122666_2123204_3:0:0_4:0:0_10b/1
TCCTTAACCAGCGGTGTTTAGGGTTTGAAGTGGTGGGCGTTTAGCACGACCCGAAACCGGGCGGCGTTTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1360797_1361165_1:0:0_2:0:0_10c/1
GAGATAGGCCAGACCCATCATCACCACCTGCCACAATTTCAGTGATTTTCGCAGACTGGTTTTGCCGGGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4139301_4139770_1:0:0_1:0:0_10d/1
TGGCGCTGGTGATTTCCGGCCTGATGCCCTTCGACAAACTGGCCAATTCTGAAACGCCGATTTCCGACGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3805761_3806248_2:0:0_2:0:0_10e/1
CGCTACTACTCACTCACGACGCACAATCTGAAAACCGTTATGGAACAGCTGGCTCATGGTAAAGGCCGTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4565252_4565721_2:0:0_1:0:0_10f/1
AAAAAACACCGGAAAAACAGATTATGGTGAGAAAAAAGGCCAGATACCCTTTAATGCCCACTTTTTCTGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1098525_1099049_2:0:0_1:0:0_110/1
CGGTCGACTTGCCCGCAGACCCTCTCCCCTTTTTCGAGCTGGGCAATGGTGCGAGAAATGTACTCCAGAG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2623641_2624190_2:0:0_0:0:0_111/1
CAGTTTGCGACCTTTTTCCGGTTGGGTATGTTCCACGCCCATAAAAATCAGACGGGCTTCTTCATTACCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_442351_442898_2:0:0_2:0:0_112/1
AACAGGTCGATAATTTTCAGACCACAGTCGATAGCTGTACCCGGCCCGTGGCTGGTCAGCAATTTTACCC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1571498_1571984_2:0:0_0:0:0_113/1
GGTGTCTTCGCCTGGGTATCAAATACTCTGGGGCCGAGATGGGGATTTGCAGCGATCTCATTTGGCTATC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3268621_3269192_1:0:1_1:0:0_114/1
ATATCCACAGGTAATATCAATATTAATTGCCCAATAAAAGATATTTACAATGAAATCAGGAGGGTTAAAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2811669_2812160_2:0:0_3:0:0_115/1
CAGCCAAACGCACGATCGCGCTGGCGTTGTGGTCGATGACTGTGATTGTCGCGCCAATTTGCGGCCCGAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1767740_1768187_0:0:0_2:0:0_116/1
AACTCGCCACGCTCCTCGCCCAGCGCCAGTTTATATTCATCGCGATAACAAAGTACCAGCGCCGGATCGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1111197_1111726_1:1:0_0:0:0_117/1
TGCAACTCCGGACGATAGTTATTTGCAGGCGACGGTTGGTTCGGCCCAAACAGGAACATACTGGTTAACG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1157902_1158353_1:0:0_0:0:0_118/1
GATGTCTCTCTTAAAGATGAGGAACCGGTAGCACAACGCCCGGTTGCAGGTAATGCTCAATACGCAGTAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3041646_3042129_2:0:0_2:0:0_119/1
GCTGGTTATCGCTGTGGCGCTGGCATTAAGATCCACCGCCTGAACGAGTAGCAACACCAGCCCACCGATA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3264416_3264943_1:0:0_2:0:0_11a/1
TGACGAAAGCGATCAGCAGCAACAGGAACAGCGCCACAAAGCCGCGATTCAGCGGTGCAAAGCCGAGCTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3068465_3069078_1:0:0_3:0:0_11b/1
CAATATTCCTTTTCAACTGACTCCAAATGGAGAAATACACTCCGCCGCCTTATGACGGGCAGTCTGACAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1607508_1607969_0:0:0_3:0:0_11c/1
CACGGTTTAACCCGTTTTCGGATTAAATATCCGCGATAAGCGTGACAGCATTCCCCAATCCAGCGCAGCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1157714_1158207_0:0:0_0:1:0_11d/1
AGTATAGCCTGCAGGCGCGAGGGAGAAAGATGGTTTGCCAGTTCGGCGACCAGGCCCGGCACATCAACTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4304452_4304981_0:0:0_1:0:0_11e/1
GCGTCGTGCTGCCGAGAATGCTGCACAACAGCAGACGTGAAAGTTGACGATTGATCATTGCGTGGCTCCT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3033296_3033820_0:0:0_1:0:0_11f/1
AACACGCACAGGGGGCTGACGCGGTAGTCGATCTTAACAATGAACTGAAAACGCGTCGTGAGAAGCTGGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4218699_4219127_2:0:0_0:0:0_120/1
CTGCTCGCCATACGGCCTTGTGAAAGCCAGTTCATCGGTTGTTGTTGCCTGTTCAGTCATCGTGCAGCTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_297190_297706_0:0:0_1:0:0_121/1
TAGCTGCCCAATGCGCCGAGATAAAAGGGTTTTGCTTCTCGCGCGGCCTGCAACACTGGCAGCTCCCGGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2427733_2428261_1:1:0_1:0:0_122/1
CATTATCTGGCCTGGATTCATATTGATGATATGGTCAACGGTATTGTCTGGCTGCTGGATAACGAGCTGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2979911_2980410_0:0:0_1:0:0_123/1
CGAACATAAAGGTCAGCCCTACGACCAGAGTCGCAATCATCTGTTGTTCTGTGGTCGTAAAGCCCGCCAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3134732_3135162_1:0:0_1:0:0_124/1
AATGAATGCTTATTGTCTGATACACCAGAAATAACACTCTTTTTATCGTTAAAAAATGATATTTCACTTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_373305_373754_2:0:0_1:1:0_125/1
GGGGCGGCGGATGTGGTGAAGCACGCCAAAATCGCGACGTTGTAGCTGCCAGGAATCGGCACTATTCACG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4567939_4568553_0:0:0_0:0:0_126/1
GCAAAGCGTTGCAGGCAGCGGGCATGACTTTTCGCGTCAGCGATATTCCCCGTGATTTACGCGGCGGCTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3685647_3686168_0:0:0_5:0:0_127/1
TTGGGATATCGACCAGAACCGGACCAGGACGACCTGAGCAGGCAACGTCGAATGCTTCAGCCATGATGCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4318014_4318611_1:0:0_0:0:0_128/1
GACGGGGAAGATGAATGAAAAAGATTGCATTTGGCTGTGATCATGTCGGTTTCATTTTAAAACATGACAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1519290_1519750_0:0:0_0:0:0_129/1
GCTTGACGGCACCACGCAGACTTATATCATTTGGATGAATCGATAAATTTCACAAGTGGCTAAGGAGAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2302955_2303380_0:0:0_2:0:0_12a/1
CTTACTACACCTTAAAACTGGCGACGCTGGCCTCTGTTCTGTCGGCGGGCACGCCATATTTTGTCGCACG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3869197_3869775_1:0:0_2:0:0_12b/1
TTTTACCGCGCAAAAGGCAGAGATAGCGATCCCTATCACGCCACAGGTGACGGGAATTGTTACTGAAGTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2155688_2156200_3:0:0_2:0:0_12c/1
TGCGTAACGACTCGCAAAGACTATTTGCTGGCAGTATCGCAGACTACAAAGCCTGCGTATTGACAATCTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2152666_2153185_2:0:0_2:0:0_12d/1
GACCTACCATAAACCTGTTCTGGTTTACGTCACCCACGACGTGCCCGTGCTCGAGAACCGTAGCAAAAGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4064581_4064959_0:1:0_2:0:0_12e/1
AAATTTACCGGCGCGACCGTAGCCGAAACGCTGAAAACCTGGACCATGATGGAAACCATCCTCGGCACTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_903179_903670_1:0:0_1:0:0_12f/1
ATTTCAGGCTTTGGATCAGGCTGTCAAACGCCTGGTTAGAGAAAGTGCAGGTTGCATCAATCTCTTTACA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3182216_3182709_2:0:0_1:0:0_130/1
CCCACTTCGCGCTCAGGCAGGTGGTGTACTGACGCGTGGCGGTCATACAGAAGCAACTATTGATCTGATG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_520640_521081_3:0:0_0:0:0_131/1
CAGCGGTCGAGCAGATCGCCACGCAACACCAGCAGCAGTGCCAGCAGCATAAAGGAGAGCGAAAATGCCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_348837_349377_0:0:0_1:0:0_132/1
CGTCGCTGACGCGACTTATCAGGCCTACGAGGTGCACTGAACTGTAGGTCGGATAAGACGGATGGCGTCG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_2178919_2179402_2:0:0_0:1:0_133/1
GCGGCGATCATCTGGGGCCAAACAGCTGGCAGCAAGAAAATGCGGATGCGGCGATGGAAAAATCCGTCGA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4089718_4090145_2:0:0_1:0:0_134/1
CCTGTTTGCCGATACTGTGCATCATCTGGTCGAAGGCATGCTTATCGCCGTCGCTGTATTTACCGCTTTC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4338555_4339011_1:0:0_2:0:0_135/1
TAGCCTTCGGTTTGCGCCGCCAGAATCAGTCCCTGCAATAACAGCGTATCGTCTTCAACAATCCGAATTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4257823_4258262_1:0:3_2:0:0_136/1
CCGCGCGAGCAAGCGGTGCTGATGACCTAGTATCGCAACAACATTGCGCATATGTTGGTGGTGCCTTCGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4387447_4388020_0:0:0_2:0:0_137/1
GTATGAGCCAGGCGACGGTAACCGATATTCATTTGGTCCCGTTTGAGACACGTTTCGTTGGCCCCGGGCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3069706_3070198_1:0:0_1:0:0_138/1
ACTGGCGGCGGCGCATTCCTCGAATTCGTGGAAGGTAAAGTACTGCCTGCAGTAGCGATGGTCGAAGAGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4473596_4474086_1:1:0_1:0:0_139/1
ACTACCAATGACCACCTCACCCGCGTCCAGCCCCACTTCATGGCACACTTTCGCTGCAACTAACTCACTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3628762_3629269_3:0:0_3:0:0_13a/1
TCCCCGCCCAAACCATTACGCCGGAGGTAATGCCCAGCACGCCCATCATGGCTTCTTTACGGGAACGACT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1671216_1671689_3:0:0_4:0:0_13b/1
GGCCATTCAACAGGAAACCAATAGCCAACATCGCCGTGGAAATGGACAGTGAAATACTACTGTTCGCGGT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3504753_3505209_2:0:0_1:0:0_13c/1
CAGCGATTCCACACTTACCTGATCCGGTTCTATGTTGTAACCTGCTTCACGCGCCAGAATCACCAGCTTG
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3127565_3128033_1:0:0_0:0:0_13d/1
TCGGCTTGCCGAACATCAACGAGGCGATTGCCGATCTGAACGGATCGCTCTTTCCCTTCGGCTGGGCAAA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3491353_3491833_2:0:0_1:0:0_13e/1
ATTTCCATCAGTTTCAGCAGATTCAACATCGCGATGTCATGCAAGCCGTAGGTTCTGCCGGGAATAGATA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3561063_3561546_0:0:0_1:0:0_13f/1
GGCTCCCAGCACACGGTTAAAGTCGATATCGACAATCACATTACCTAAATCAAAGATATAGAGCATTTTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3460974_3461414_1:0:0_3:0:0_140/1
AATCTGCCCTCCGTTCGGCTGTTTCTTCATCGTGTCGCATCAAATGTGAGCAATAAAACAAATTATGCCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3356679_3357129_1:0:0_1:0:0_141/1
CGGTTCAACATCAGCAATGTTGACCGCGTTTTCACCCGGCGTAATTGCCAGCCGATCGCGCAGCGTGGTT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_1576152_1576746_0:1:0_4:0:0_142/1
GGTAAATTTATGGCTGCAAATAGATACCGGTTCATTGCAGGAAGAAGACAAAGAGCTCGGCGTGTCTCAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_4538709_4539239_2:0:0_0:0:0_143/1
CGCCACGATTGACTGAATAAAAGAGAACGGTTTTTCACTCATCAAAGCCGGTGTGAATCCGGAAAATACC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@NC_007779.1_3828077_3828610_2:0:0_3:0:0_144/1
TCTCAAAGGAGGCGTAGTATACGCTGACTCAGCGAAGTGCTCAAGTCCCGACCAGACAAAGATCCCGAAG
+
2222222222222222222222222222222222
SYMBOL INDEX (128 symbols across 12 files)

FILE: src/avx2_seeding.rs
  function mm_hash256 (line 6) | pub unsafe fn mm_hash256(kmer: __m256i) -> __m256i {
  function extract_markers_avx2 (line 33) | pub unsafe fn extract_markers_avx2(string: &[u8], kmer_vec: &mut Vec<u64...
  function extract_markers_avx2_positions (line 151) | pub unsafe fn extract_markers_avx2_positions(string: &[u8], kmer_vec: &m...

FILE: src/cmdline.rs
  type Cli (line 6) | pub struct Cli {
  type Mode (line 12) | pub enum Mode {
  type SketchArgs (line 29) | pub struct SketchArgs {
  type ContainArgs (line 86) | pub struct ContainArgs {
  type InspectArgs (line 169) | pub struct InspectArgs {

FILE: src/constants.rs
  constant EM_ABUND_CUTOFF (line 1) | pub const EM_ABUND_CUTOFF: f64 = 0.01;
  constant PAIR_REGEX (line 2) | pub const PAIR_REGEX: &str = r"(.+)(_?1|_?2)(\..+)";
  constant CUTOFF_PVALUE (line 3) | pub const CUTOFF_PVALUE:f64 = 0.9999999999;
  constant SAMPLE_SIZE_CUTOFF (line 4) | pub const SAMPLE_SIZE_CUTOFF: usize = 25;
  constant MEDIAN_ANI_THRESHOLD (line 5) | pub const MEDIAN_ANI_THRESHOLD: f64 = 2.;
  constant QUERY_FILE_SUFFIX (line 6) | pub const QUERY_FILE_SUFFIX: &str = ".syldb";
  constant SAMPLE_FILE_SUFFIX (line 7) | pub const SAMPLE_FILE_SUFFIX: &str = ".sylsp";
  constant QUERY_FILE_SUFFIX_VALID (line 8) | pub const QUERY_FILE_SUFFIX_VALID : [&str;2] = [QUERY_FILE_SUFFIX, ".syl...
  constant SAMPLE_FILE_SUFFIX_VALID (line 9) | pub const SAMPLE_FILE_SUFFIX_VALID : [&str;2] = [SAMPLE_FILE_SUFFIX, ".s...
  constant MIN_ANI_DEF (line 10) | pub const MIN_ANI_DEF: f64 = 0.9;
  constant MIN_ANI_P_DEF (line 11) | pub const MIN_ANI_P_DEF: f64 = 0.95;
  constant MAX_MEDIAN_FOR_MEAN_FINAL_EST (line 12) | pub const MAX_MEDIAN_FOR_MEAN_FINAL_EST: f64 = 15.;
  constant DEREP_PROFILE_ANI (line 13) | pub const DEREP_PROFILE_ANI: f64 = 0.975;
  constant MAX_DEDUP_COUNT (line 14) | pub const MAX_DEDUP_COUNT: u32 = 4;
  constant MAX_DEDUP_LEN (line 15) | pub const MAX_DEDUP_LEN: usize = 10000000;
  constant DEFAULT_FPR (line 16) | pub const DEFAULT_FPR: f64 = 0.0001;
  constant MED_KMER_FOR_ID_EST (line 17) | pub const MED_KMER_FOR_ID_EST: f64 = 3.;

FILE: src/contain.rs
  function print_ani_result (line 18) | fn print_ani_result(ani_result: &AniResult, pseudotax: bool, writer: &mu...
  function get_chunks (line 96) | fn get_chunks(indices: &Vec<usize>, steps: usize) -> Vec<Vec<usize>>{
  function contain (line 115) | pub fn contain(mut args: ContainArgs, pseudotax_in: bool) {
  function derep_if_reassign_threshold (line 365) | fn derep_if_reassign_threshold<'a>(results_old: &Vec<AniResult>, results...
  function estimate_true_cov (line 389) | fn estimate_true_cov(results: &mut Vec<AniResult>, kmer_id_opt: Option<f...
  function estimate_covered_bases (line 403) | fn estimate_covered_bases(results: &Vec<AniResult>, sequence_sketch: &Se...
  function winner_table (line 422) | fn winner_table<'a>(results : &'a Vec<AniResult>, log_reassign: bool) ->...
  function print_header (line 473) | fn print_header(pseudotax: bool, writer: &mut Box<dyn Write + Send>, est...
  function get_genome_sketches (line 494) | fn get_genome_sketches(
  function get_seq_sketch (line 556) | fn get_seq_sketch(
  function get_stats (line 613) | fn get_stats<'a>(
  function ani_from_lambda (line 829) | fn ani_from_lambda(lambda: Option<f64>, _mean: f64, k: f64, full_cov: &[...
  function bootstrap_interval (line 861) | fn bootstrap_interval(
  function get_kmer_identity (line 913) | fn get_kmer_identity(seq_sketch: &SequencesSketch, estimate_unknown: boo...

FILE: src/inference.rs
  function r_from_moments_lambda (line 6) | pub fn r_from_moments_lambda(m: f64, v: f64, lambda: f64) -> f64{
  function ratio_formula (line 12) | pub fn ratio_formula(val: f64, r: f64, lambda: f64) -> f64{
  function ratio_from_moments_lambda (line 21) | fn ratio_from_moments_lambda(val: f64, lambda: f64, m: f64, v: f64) -> O...
  function binary_search_lambda (line 29) | pub fn binary_search_lambda(full_covs: &[u32]) -> Option<f64>{
  function var (line 104) | pub fn var(data: &[u32]) -> Option<f64> {
  function mean (line 116) | pub fn mean(data: &[u32]) -> Option<f64> {
  function mme_lambda (line 126) | pub fn mme_lambda(full_covs: &[u32]) -> Option<f64> {
  function mle_zip (line 157) | pub fn mle_zip(full_covs: &[u32], _k: f64) -> Option<f64> {
  function newton_raphson (line 194) | fn newton_raphson(rat: f64, mean: f64) -> f64 {
  function ratio_lambda (line 207) | pub fn ratio_lambda(full_covs: &Vec<u32>, min_count_correct: f64) -> Opt...

FILE: src/inspect.rs
  function pipe_write (line 11) | fn pipe_write(text: &str, writer: &mut Box<dyn Write + Send>){
  type SequencesSketchInspect (line 20) | struct SequencesSketchInspect{
    method from (line 32) | fn from(
  type GenomeSketchInspect (line 49) | pub struct GenomeSketchInspect{
    method from (line 57) | fn from(
  type DatabaseSketch (line 70) | pub struct DatabaseSketch{
  type DatabaseVisitor (line 79) | struct DatabaseVisitor {
    type Value (line 87) | type Value = Self;
    method expecting (line 88) | fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::...
    method visit_seq (line 92) | fn visit_seq<S>(mut self, mut seq: S) -> Result<Self, S::Error>
    method deserialize (line 108) | fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
  function inspect (line 117) | pub fn inspect(args: InspectArgs){
  function get_db_sketch_inspect (line 184) | fn get_db_sketch_inspect(
  function get_seq_sketch_inspect (line 218) | fn get_seq_sketch_inspect(

FILE: src/main.rs
  function main (line 18) | fn main() {

FILE: src/seeding.rs
  function mm_hash64 (line 4) | pub fn mm_hash64(kmer: u64) -> u64 {
  function rev_hash_64 (line 18) | pub fn rev_hash_64(hashed_key: u64) -> u64 {
  function decode (line 54) | pub fn decode(byte: u64) -> u8 {
  function print_string (line 67) | pub fn print_string(kmer: u64, k: usize) {
  function _position_min (line 78) | fn _position_min<T: Ord>(slice: &[T]) -> Option<usize> {
  function fmh_seeds (line 86) | pub fn fmh_seeds(
  function fmh_seeds_positions (line 148) | pub fn fmh_seeds_positions(

FILE: src/sketch.rs
  type Marker (line 25) | type Marker = u32;
  function check_vram_and_block (line 27) | pub fn check_vram_and_block(max_ram: usize, file: &str) {
  function extract_markers (line 53) | pub fn extract_markers(string: &[u8], kmer_vec: &mut Vec<u64>, c: usize,...
  function extract_markers_positions (line 71) | pub fn extract_markers_positions(
  function is_fastq (line 95) | pub fn is_fastq(file: &str) -> bool {
  function is_fasta (line 109) | pub fn is_fasta(file: &str) -> bool {
  function check_args_valid (line 123) | fn check_args_valid(args: &SketchArgs) {
  function parse_ambiguous_files (line 164) | fn parse_ambiguous_files(
  function parse_reads_and_genomes (line 191) | fn parse_reads_and_genomes(
  function parse_paired_end_reads (line 218) | fn parse_paired_end_reads(
  function parse_line_file (line 252) | fn parse_line_file(file_name: &str, vec: &mut Vec<String>) {
  function parse_sample_names (line 260) | fn parse_sample_names(args: &SketchArgs) -> Option<Vec<String>> {
  function sketch (line 276) | pub fn sketch(args: SketchArgs) {
  function sketch_genome_individual (line 481) | pub fn sketch_genome_individual(
  function sketch_genome (line 551) | pub fn sketch_genome(
  function pair_kmer_single (line 627) | fn pair_kmer_single(s1: &[u8]) -> Option<([Marker; 2], [Marker; 2])> {
  function pair_kmer (line 661) | fn pair_kmer(s1: &[u8], s2: &[u8]) -> Option<([Marker; 2], [Marker; 2])> {
  function dup_removal_lsh_full_exact (line 692) | fn dup_removal_lsh_full_exact(
  function dup_removal_lsh_full (line 735) | fn dup_removal_lsh_full(
  function sketch_pair_sequences (line 773) | pub fn sketch_pair_sequences(
  function sketch_sequences_needle (line 899) | pub fn sketch_sequences_needle(

FILE: src/types.rs
  type AdjustStatus (line 39) | pub enum AdjustStatus {
  method default (line 46) | fn default() -> Self {AdjustStatus::Low }
  type Kmer (line 49) | pub type Kmer = u64;
  constant BYTE_TO_SEQ (line 50) | pub const BYTE_TO_SEQ: [u8; 256] = [
  function mm_hash (line 62) | pub fn mm_hash(bytes: &[u8]) -> usize {
  type MMHasher (line 74) | pub struct MMHasher {
  method write (line 80) | fn write(&mut self, bytes: &[u8]) {
  method finish (line 84) | fn finish(&self) -> u64 {
  method default (line 91) | fn default() -> MMHasher {
  type MMBuildHasher (line 97) | pub type MMBuildHasher = BuildHasherDefault<MMHasher>;
  type MMHashMap (line 98) | pub type MMHashMap<K, V> = HashMap<K, V, MMBuildHasher>;
  type MMHashSet (line 99) | pub type MMHashSet<K> = HashSet<K, MMBuildHasher>;
  type FxHashMapVisitor (line 108) | struct FxHashMapVisitor;
    type Value (line 111) | type Value = FxHashMap<Kmer, u32>;
    method expecting (line 113) | fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::...
    method visit_seq (line 117) | fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
  function serialize (line 132) | pub fn serialize<S>(
  function deserialize (line 140) | pub fn deserialize<'de, D>(deserializer: D) -> Result<FxHashMap<Kmer, u3...
  type SequencesSketch (line 146) | pub struct SequencesSketch{
    method new (line 158) | pub fn new(file_name: String, c: usize, k: usize, paired: bool, sample...
  type GenomeSketch (line 164) | pub struct GenomeSketch{
  type MultGenomeSketch (line 177) | pub struct MultGenomeSketch{
  type AniResult (line 186) | pub struct AniResult<'a>{

FILE: tests/integration_test.rs
  function fresh (line 8) | fn fresh(){
  function test_sketch_commands (line 17) | fn test_sketch_commands() {
  function test_profile_vs_query (line 114) | fn test_profile_vs_query(){
  function test_sketch_list (line 145) | fn test_sketch_list(){
  function test_profile_disabling (line 212) | fn test_profile_disabling(){
  function test_sketch_fasta_fastq_concord (line 248) | fn test_sketch_fasta_fastq_concord(){
  function test_sample_names (line 298) | fn test_sample_names(){
  function test_fpr (line 378) | fn test_fpr(){
  function test_raw_inputs_profile_simple (line 425) | fn test_raw_inputs_profile_simple(){
  function test_estimate_read_counts (line 466) | fn test_estimate_read_counts(){
  function test_raw_inputs_profile_with_sketch (line 512) | fn test_raw_inputs_profile_with_sketch(){
  function test_inspect (line 552) | fn test_inspect(){

FILE: tests/unit_test.rs
  function test_hash (line 4) | fn test_hash(){
Condensed preview: 29 files, each showing path, character count, and a content snippet.
[
  {
    "path": ".cargo/config.toml",
    "chars": 80,
    "preview": "[target.x86_64-unknown-linux-musl]\nrustflags = [\"-Ctarget-feature=+crt-static\"]\n"
  },
  {
    "path": ".github/workflows/release.yml",
    "chars": 715,
    "preview": "name: \"tagged-release\"\n\non:\n  workflow_dispatch:\n  push:\n    tags:\n      - \"v*\"\n\njobs:\n  tagged-release:\n    name: \"Tagg"
  },
  {
    "path": ".gitignore",
    "chars": 414,
    "preview": "# Generated by Cargo\n# will have compiled files and executables\ndebug/\ntarget/\n\n# Remove Cargo.lock from gitignore if cr"
  },
  {
    "path": "CHANGELOG.md",
    "chars": 6640,
    "preview": "# sylph v0.9.0: 10-13-2025\n\n- Added an option `--estimate-read-count` to VERY ROUGHLY output estimated read counts in th"
  },
  {
    "path": "Cargo.toml",
    "chars": 983,
    "preview": "[package]\nname = \"sylph\"\nversion = \"0.9.0\"\nedition = \"2021\"\nlicense = \"MIT OR Apache-2.0\"\n\n# See more keys and their def"
  },
  {
    "path": "LICENSE",
    "chars": 1065,
    "preview": "MIT License\n\nCopyright (c) 2023 Jim Shaw\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\no"
  },
  {
    "path": "README.md",
    "chars": 6302,
    "preview": "# sylph - fast and precise species-level metagenomic profiling with ANIs \n\n> [!IMPORTANT]\n> All documentation for sylph "
  },
  {
    "path": "src/avx2_seeding.rs",
    "chars": 11040,
    "preview": "use std::arch::x86_64::*;\nuse crate::types::*;\n\n#[inline]\n#[target_feature(enable = \"avx2\")]\npub unsafe fn mm_hash256(km"
  },
  {
    "path": "src/cmdline.rs",
    "chars": 11296,
    "preview": "use clap::{Args, Parser, Subcommand};\nuse crate::constants::*;\n\n#[derive(Parser)]\n#[clap(author, version, about = \"Ultra"
  },
  {
    "path": "src/constants.rs",
    "chars": 802,
    "preview": "pub const EM_ABUND_CUTOFF: f64 = 0.01;\npub const PAIR_REGEX: &str = r\"(.+)(_?1|_?2)(\\..+)\";\npub const CUTOFF_PVALUE:f64 "
  },
  {
    "path": "src/contain.rs",
    "chars": 35515,
    "preview": "use crate::cmdline::*;\nuse std::path::Path;\nuse std::io::prelude::*;\nuse std::io;\nuse std::io::BufWriter;\nuse fxhash::Fx"
  },
  {
    "path": "src/inference.rs",
    "chars": 6711,
    "preview": "use statrs::function::gamma::*;\nuse fxhash::FxHashMap;\nuse crate::constants::*;\nuse std::collections::HashSet;\n\npub fn r"
  },
  {
    "path": "src/inspect.rs",
    "chars": 7116,
    "preview": "use crate::types::*;\nuse std::fs::File;\nuse std::io::BufReader;\nuse std::io::BufWriter;\nuse std::io::Write;\nuse log::*;\n"
  },
  {
    "path": "src/lib.rs",
    "chars": 193,
    "preview": "pub mod sketch;\npub mod constants;\npub mod types;\npub mod seeding;\npub mod cmdline;\npub mod contain;\npub mod inference;\n"
  },
  {
    "path": "src/main.rs",
    "chars": 1008,
    "preview": "use clap::Parser;\nuse sylph::cmdline::*;\nuse sylph::sketch;\nuse sylph::contain;\nuse sylph::inspect;\n//use std::panic::se"
  },
  {
    "path": "src/seeding.rs",
    "chars": 6787,
    "preview": "use crate::types::*;\n\n#[inline]\npub fn mm_hash64(kmer: u64) -> u64 {\n    //TODO this is bugged. Fix after release\n    le"
  },
  {
    "path": "src/sketch.rs",
    "chars": 32414,
    "preview": "use crate::cmdline::*;\nuse scalable_cuckoo_filter::ScalableCuckooFilter;\nuse scalable_cuckoo_filter::ScalableCuckooFilte"
  },
  {
    "path": "src/types.rs",
    "chars": 6992,
    "preview": "//Various byte-tables and hashing methods are taken from miniprot by Heng Li. Attached below is their license:\n//The MIT"
  },
  {
    "path": "test_files/k12_R1.fq",
    "chars": 476066,
    "preview": "@NC_007779.1_2104702_2105250_1:0:0_2:0:0_0/1\nGTGGTGCGGTGCGGCAAGGCGCTATCCAGGGATAACCGGGCAAACAGACGCATGGAGGCGATTTCGTACA\n+\n22"
  },
  {
    "path": "test_files/k12_R2.fq",
    "chars": 476066,
    "preview": "@NC_007779.1_2104702_2105250_1:0:0_2:0:0_0/2\nAACATGAAGCATGATGATTTGCTGACATATATTAAATATGTCGAAAGTAAGGGTTATGCTTTTAGTACAT\n+\n22"
  },
  {
    "path": "test_files/list.txt",
    "chars": 127,
    "preview": "test_files/e.coli-EC590.fasta.gz\ntest_files/e.coli-K12.fasta.gz\ntest_files/e.coli-o157.fasta.gz\ntest_files/o157_reads.fa"
  },
  {
    "path": "test_files/pair_list1.txt",
    "chars": 17,
    "preview": "test_files/t1.fq\n"
  },
  {
    "path": "test_files/pair_list2.txt",
    "chars": 17,
    "preview": "test_files/t2.fq\n"
  },
  {
    "path": "test_files/sample_list.txt",
    "chars": 6,
    "preview": "S1\nS2\n"
  },
  {
    "path": "test_files/single_sample.txt",
    "chars": 12,
    "preview": "SAMPLE_TEST\n"
  },
  {
    "path": "test_files/t1.fq",
    "chars": 1272,
    "preview": "@G002901995_1\nTAAAATATAACCTTCTATTTTAGCAACGTAAGTTTTTGGTATTTTAAAACGAGGATGTGTCATTAGATGCGTAAAAGCACCATCATTCGTTAACAATAACAGACCA"
  },
  {
    "path": "test_files/t2.fq",
    "chars": 1272,
    "preview": "@G002901995_1\nGCCTACAACTGATGAGGAAGAAGAGGAAATCGAATCCTTTTTTAGTCAGTTAGTGAATCAAAAAGGAGAATCATAATGAGCAAAGAATTAGAAAGACTACAAAAGC"
  },
  {
    "path": "tests/integration_test.rs",
    "chars": 18809,
    "preview": "use assert_cmd::prelude::*; // Add methods on commands\nuse std::str;\nuse std::fs;\nuse std::path::Path;\nuse serial_test::"
  },
  {
    "path": "tests/unit_test.rs",
    "chars": 877,
    "preview": "use assert_cmd::prelude::*; // Add methods on commands\nuse sylph::seeding;\n\nfn test_hash(){\n\n    let key = 1923823981293"
  }
]

About this extraction

This page contains the full source code of the bluenote-1577/sylph GitHub repository, extracted and formatted as plain text. The extraction includes 29 files (1.1 MB), approximately 523.7k tokens, and a symbol index with 128 extracted functions, classes, methods, constants, and types.
