[
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "content": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: \"[BUG] XXX\"\nlabels: bug\nassignees: ''\n\n---\n\n**Describe the bug**\nA clear and concise description of what the bug is.\n\n**To Reproduce**\nSteps to reproduce the behavior:\n1. Describe the data you are using and provide a sample of your data if possible. For example, the paired-end reads are generated by 10x scATAC-seq. The read length is 50bp and the barcode length is 16bp.\n2. Get the Chromap version by running ```chromap -v``` and post it here.\n3. Provide the full command line you used to run Chromap.\n4. Provide the log output by Chromap and highlight the error message.\n\n**Expected behavior**\nA clear and concise description of what you expected to happen.\n\n**Screenshots**\nIf applicable, add screenshots to help explain your problem.\n\n**Environment (please complete the following information):**\n - OS: [e.g. Ubuntu 22.10]\n - Way you install Chromap [e.g. use Bioconda, download binary, build from source]\n - If you compiled Chromap from source yourself, please provide the compiler version [e.g. GCC 7.4.0]\n\n**Additional context**\nAdd any other context about the problem here.\n"
  },
  {
    "path": ".github/workflows/ci.yml",
    "content": "name: CI\n\non:\n  push:\n    branches: [ master ]\n  pull_request:\n    branches: [ master ]\n\nenv:\n  DEVELOPER_DIR: /Applications/Xcode.app/Contents/Developer\n\njobs:\n  ubuntu:\n    runs-on: ubuntu-latest\n    strategy:\n      matrix:\n        compiler: [g++, clang++]\n    steps:\n    - uses: actions/checkout@v2\n    - name: install-deps\n      run:\n        sudo apt-get update; sudo apt-get install -y clang libomp5 libomp-dev\n    - name: build-chromap\n      run:\n        make CXX=${{ matrix.compiler }}\n    - name: test-chromap\n      run:\n        ./chromap -h\n\n  macos:\n    runs-on: macos-latest\n    strategy:\n      matrix:\n        compiler: [clang++]\n    steps:\n    - uses: actions/checkout@v2\n    - name: cache-openmp\n      id: cache-openmp\n      uses: actions/cache@v3\n      with:\n        path: openmp-install\n        key: openmp-macos-install\n    - name: build-openmp\n      if: steps.cache-openmp.outputs.cache-hit != 'true'\n      run: |\n        wget https://github.com/llvm/llvm-project/releases/download/llvmorg-14.0.0/openmp-14.0.0.src.tar.xz\n        tar -xf openmp-14.0.0.src.tar.xz\n        cd openmp-14.0.0.src\n        sed -i'' -e '/.size __kmp_unnamed_critical_addr/d' runtime/src/z_Linux_asm.S\n        sed -i'' -e 's/__kmp_unnamed_critical_addr/___kmp_unnamed_critical_addr/g' runtime/src/z_Linux_asm.S\n        mkdir -p build && cd build\n        cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=install -DCMAKE_OSX_ARCHITECTURES=\"x86_64;arm64\" \\\n            -DLIBOMP_ENABLE_SHARED=OFF -DLIBOMP_OMPT_SUPPORT=OFF -DLIBOMP_USE_HWLOC=OFF ..\n        cmake --build . -j 3\n        cmake --build . 
--target install\n        mkdir $GITHUB_WORKSPACE/openmp-install\n        cp -r install/* $GITHUB_WORKSPACE/openmp-install\n    - name: install-openmp\n      run: |\n        sudo cp $GITHUB_WORKSPACE/openmp-install/include/* $DEVELOPER_DIR/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include\n        sudo cp $GITHUB_WORKSPACE/openmp-install/lib/libomp.a $DEVELOPER_DIR/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib\n    - name: build-chromap\n      run:\n        make CXX=${{ matrix.compiler }} CXXFLAGS=\"-arch x86_64 -isysroot $DEVELOPER_DIR/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -std=c++11 -Wall -O3 -Xclang -fopenmp -msse4.1\" LDFLAGS=\"-L$DEVELOPER_DIR/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib -rpath $DEVELOPER_DIR/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/lib -lm -lz -lomp\"\n    - name: test-chromap\n      run:\n        ./chromap -h\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2019 Haowen Zhang, Li Song, X. Shirley Liu, Heng Li\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "Makefile",
    "content": "CXX=g++\nCXXFLAGS=-std=c++11 -Wall -O3 -fopenmp -msse4.1\nLDFLAGS=-lm -lz\n\ncpp_source=sequence_batch.cc index.cc minimizer_generator.cc candidate_processor.cc alignment.cc feature_barcode_matrix.cc ksw.cc draft_mapping_generator.cc mapping_generator.cc mapping_writer.cc chromap.cc chromap_driver.cc\nsrc_dir=src\nobjs_dir=objs\nobjs+=$(patsubst %.cc,$(objs_dir)/%.o,$(cpp_source))\n\nexec=chromap\n\nifneq ($(asan),)\n\tCXXFLAGS+=-fsanitize=address -g\n\tLDFLAGS+=-fsanitize=address -ldl -g\nendif\n\nall: dir $(exec) \n\t\ndir:\n\tmkdir -p $(objs_dir)\n\n$(exec): $(objs)\n\t$(CXX) $(CXXFLAGS) $(objs) -o $(exec) $(LDFLAGS)\n\t\n$(objs_dir)/%.o: $(src_dir)/%.cc\n\t$(CXX) $(CXXFLAGS) -c $< -o $@\n\n.PHONY: clean\nclean:\n\t-rm -rf $(exec) $(objs_dir)\n"
  },
  {
    "path": "README.md",
    "content": "[![GitHub build](https://github.com/haowenz/chromap/actions/workflows/ci.yml/badge.svg)](https://github.com/haowenz/chromap/actions/workflows/ci.yml) [![GitHub license](https://img.shields.io/github/license/haowenz/chromap)](https://github.com/haowenz/chromap/blob/master/LICENSE) [![Conda version](https://img.shields.io/conda/v/bioconda/chromap)](https://anaconda.org/bioconda/chromap) [![Conda platform](https://img.shields.io/conda/pn/bioconda/chromap)](https://anaconda.org/bioconda/chromap) [![Conda download](https://img.shields.io/conda/dn/bioconda/chromap)](https://anaconda.org/bioconda/chromap)\n\n## <a name=\"started\"></a>Getting Started\n```sh\ngit clone https://github.com/haowenz/chromap.git\ncd chromap && make\n# create an index first and then map\n./chromap -i -r test/ref.fa -o ref.index\n./chromap -x ref.index -r test/ref.fa -1 test/read1.fq -2 test/read2.fq -o test.bed\n# use presets (no test data)\n./chromap --preset atac -x index -r ref.fa -1 read1.fq -2 read2.fq -o aln.bed       # ATAC-seq reads\n./chromap --preset atac -x index -r ref.fa -1 read1.fq -2 read2.fq -o aln.bed \\\n -b barcode.fq.gz --barcode-whitelist whitelist.txt                                       # scATAC-seq reads\n./chromap --preset chip -x index -r ref.fa -1 read1.fq -2 read2.fq -o aln.bed       # ChIP-seq reads\n./chromap --preset hic -x index -r ref.fa -1 read1.fq -2 read2.fq -o aln.pairs      # Hi-C reads and pairs output\n./chromap --preset hic -x index -r ref.fa -1 read1.fq -2 read2.fq --SAM -o aln.sam  # Hi-C reads and SAM output\n```\n## Table of Contents\n\n- [Getting Started](#started)\n- [User Guide](#uguide)\n  - [Installation](#install)\n  - [General usage](#general)\n  - [Use cases](#cases)\n    - [Map ChIP-seq short reads](#map-chip)\n    - [Map ATAC-seq/scATAC-seq short reads](#map-atac)\n    - [Map Hi-C short reads](#map-hic)\n  - [Summarizing mapping statistics/quality control](#atacseq-qc)\n    - [Summary File](#summaryfile)\n    - [Estimating 
FRiP](#estfrip)\n    - [Features to assist in doublet detection](#doublet)\n  - [Getting help](#help)\n  - [Citing Chromap](#cite)\n\n## <a name=\"uguide\"></a>User Guide\n\nChromap is an ultrafast method for aligning and preprocessing high throughput chromatin profiles. Typical use cases include: (1) trimming sequencing adapters, mapping bulk ATAC-seq or ChIP-seq genomic reads to the human genome and removing duplicates; (2) trimming sequencing adapters, mapping single cell ATAC-seq genomic reads to the human genome, correcting barcodes, removing duplicates and performing Tn5 shift; (3) split alignment of Hi-C reads against a reference genome. In all these three cases, Chromap is 10-20 times faster while being accurate.\n\n### <a name=\"install\"></a>Installation\n\nTo compile from the source, you need to have the GCC compiler with version>=7.3.0, GNU make and zlib development files installed. Then type `make` in the source code directory to compile. \n\nChromap is also available on [bioconda][bioconda]. Thus you can easily install Chromap with Conda\n```sh\nconda install -c conda-forge -c bioconda chromap\n```\n\n### <a name=\"general\"></a>General usage\nBefore mapping, an index of the reference needs to be created and saved on the disk:\n```sh\nchromap -i -r ref.fa -o index\n```\nThe users can input the min fragment length expected in their sequencing experiments, e.g. read length, by **--min-frag-length**. Then Chromap will choose proper k-mer length and window size to build the index. For human genome, it only takes a few minutes to build the index. 
Without any preset parameters, Chromap takes a reference database and a query sequence file as input and produces approximate mappings, without base-level alignment, in the [BED format][bed]:\n\n```sh\nchromap -x index -r ref.fa -1 query.fq -o approx-mapping.bed\n```\nYou may ask Chromap to output alignments in the [SAM format][sam]:\n\n```sh\nchromap -x index -r ref.fa -1 query.fq --SAM -o alignment.sam\n```\nBut note that the processing of SAM files is not fully optimized and can be slow. Thus generating the output in SAM format is not preferred and should be avoided when possible. Chromap can take multiple input read files:\n\n```sh\nchromap -x index -r ref.fa -1 query1.fq,query2.fq,query3.fq --SAM -o alignment.sam\n```\nChromap also supports wildcards in the read file names and will find all matched read files. To use this function, the read file names ***must*** be put in quotation marks:\n\n```sh\nchromap -x index -r ref.fa -1 \"query*.fq\" --SAM -o alignment.sam\n```\nChromap works with gzip'd FASTA and FASTQ formats as input. You don't need to convert between FASTA and FASTQ or decompress gzip'd files first. \n\n***Importantly***, once you build the index, indexing parameters such as **-k**, **-w** and **--min-frag-length** can't be changed during mapping. If you are running Chromap for different data types, you will probably need to keep multiple indexes generated with different parameters.\nThis makes Chromap different from BWA, which always uses the same index regardless of query data types. Chromap can build the human genome index file in a few minutes.\n\nDetailed explanations for the options can be found at the [manpage][manpage].\n\n### <a name=\"cases\"></a>Use cases\n\nTo support different data types (e.g. ChIP-seq, Hi-C, ATAC-seq), Chromap needs to be tuned for optimal performance and accuracy. 
It is usually recommended to choose a preset with option **--preset**, which sets multiple parameters at the same time.\n\n#### <a name=\"map-chip\"></a>Map ChIP-seq short reads\n\n```sh\nchromap --preset chip -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed      # ChIP-seq reads\n```\nThis set of parameters is tuned for mapping ChIP-seq reads. Chromap will map the paired-end reads with max insert size up to 2000 (**-l 2000**) and then remove duplicates (**--remove-pcr-duplicates**) using the low memory mode (**--low-mem**). The output is in BED format (**--BED**). In the output BED file, each row is a mapping of a fragment (i.e., a read pair) and the columns are\n\n    chrom chrom_start chrom_end N mapq strand\nThe strand here is the strand of the first read in a read pair (specified by **-1**). If the mapping start and end locations of each read in a read pair are desired, **--TagAlign** should be used to override **--BED** in the preset parameters as follows:\n```sh\nchromap --preset chip -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz --TagAlign -o aln.tagAlign      # ChIP-seq reads\n```\nFor each read pair, there will be two rows in the output file, one for each read in the pair. The meaning of the columns remains the same.\n\n#### <a name=\"map-atac\"></a>Map ATAC-seq/scATAC-seq short reads\n\n```sh\nchromap --preset atac -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed      # ATAC-seq reads\nchromap --preset atac -x index -r ref.fa -1 read1.fq.gz -2 read2.fq.gz -o aln.bed\\\n -b barcode.fq.gz --barcode-whitelist whitelist.txt                                    # scATAC-seq reads\n```\nThis set of parameters is tuned for mapping ATAC-seq/scATAC-seq reads. Chromap will trim the adapters on the 3' end (**--trim-adapters**), map the paired-end reads with max insert size up to 2000 (**-l 2000**) and then remove duplicates at cell level (**--remove-pcr-duplicates-at-cell-level**). 
Tn5 shift will also be applied to the fragments (**--Tn5-shift**). The forward mapping start positions are increased by 4bp and the reverse mapping end positions are decreased by 5bp. The processing is run in the low memory mode (**--low-mem**).\n\nIf no barcode whitelist file is given, Chromap will skip barcode correction. When barcodes and a whitelist are given as input, by default Chromap will estimate barcode abundance and use this information to correct barcodes that are within Hamming distance 1 of a whitelist barcode. By setting **--bc-error-threshold** to 2, Chromap is able to correct barcodes that are within Hamming distance 2 of a whitelist barcode. Users can also raise the probability threshold for making a correction by setting **--bc-probability-threshold** (0.9 by default) to a larger value (e.g., 0.975) so that only reliable corrections are made. For scATAC-seq data with multiple read and barcode files, you can use \",\" to concatenate multiple input files as in the example [above](#general). \n\nChromap also supports user-defined barcode formats, including the case where barcode and genomic sequence are mixed in the same read. Users can specify the sequence structure through option **--read-format**. The value is a comma-separated string, and each field in the string is itself a colon-separated string\n\n    [r1|r2|bc]:start:end:strand\nThe start and end are inclusive and -1 means the end of the read. Users may use multiple fields to specify non-consecutive segments, e.g. bc:0:15,bc:32:-1. The strand is represented by the '+' or '-' symbol; if it is '-', the barcode will be reverse-complemented after extraction. The strand symbol can be omitted if it is '+' and is ignored on r1 and r2. For example, when the barcode is in the first 16bp of read1, one can use the option `-1 read1.fq.gz -2 read2.fq.gz --barcode read1.fq.gz --read-format bc:0:15,r1:16:-1`.\n\nThe output file formats for bulk and single-cell data are different except for the first three columns. 
For bulk data, the columns are\n\n    chrom chrom_start chrom_end N mapq strand duplicate_count\nFor single-cell data, the columns are \n    \n    chrom chrom_start chrom_end barcode duplicate_count\nThis is the same as the definition of the fragment file in [CellRanger][cellranger]. Note that chrom_end is open-ended. This output fragment file can be used as input to downstream analysis tools such as [MAESTRO][MAESTRO], [ArchR][ArchR] and [signac][signac].\n\nIn addition, Chromap can translate input cell barcodes to another set of barcodes. Users can specify the translation file through the option **--barcode-translate**. The translation file is a two-column tsv/csv file with the translated barcode in the first column and the original barcode in the second column. This is useful for 10x Multiome data, where scATAC-seq and scRNA-seq data use different sets of barcodes. This option also supports combinatorial barcoding, such as SHARE-seq. Chromap can translate each barcode segment provided in the second column to the ID in the first column and add \"-\" to concatenate the IDs in the output.\n\n#### <a name=\"map-hic\"></a>Map Hi-C short reads\n\n```sh\nchromap --preset hic -x index -r ref.fa -1 read1.fa -2 read2.fa -o aln.pairs           # Hi-C reads and pairs output\n```\nChromap will perform split alignment (**--split-alignment**) on Hi-C reads and output mappings in [pairs][pairs] format (**--pairs**), which is used in the [4DN Hi-C data processing pipeline][4DN]. Some Hi-C data analysis pipelines may require that the reads be sorted in a specific chromosome order other than the one in the index. Therefore, Chromap provides the option **--chr-order** to specify the alignment order, and **--pairs-natural-chr-order** for flipping the pair in the pairs format. \n\n### <a name=\"atacseq-qc\"></a>Summarizing mapping statistics/quality control\n\nChromap allows you to summarize the dataset's mapping statistics as well as quality metrics at either a *bulk* or *single cell* level. 
To enable this feature, users can specify a file path using the option **--summary [FILE]**, where a csv file will be saved.\n\nThis summary file will output a series of metrics for each barcode (or the overall dataset if it is bulk). Here are the different columns contained within the summary file:\n\n```sh\nbarcode,total,duplicate,unmapped,lowmapq,cachehit,fric,estfrip,numcacheslots\n```\n\n- `barcode` - Barcode label for cell\n- `total` - Total number of fragments\n- `duplicate` - Number of duplicate fragments\n- `unmapped` - Number of unmapped fragments \n- `lowmapq` - Number of fragments with a low MAPQ\n- `cachehit` - Number of fragments that were found in the chromap cache during alignment\n- `fric` - Fraction of fragments in the chromap cache\n- `estfrip` - Estimated FRiP value based on a linear model ([See below for more details](#estfrip))\n- `numcacheslots` - Number of unique associated cache slots for this barcode (Relevant feature for doublet detection, [see below for more](#doublet))\n\nThe summary contains metrics relevant to the mappability of fragments from each barcode. \nHowever, it also contains metrics (`estfrip` and `numcacheslots`) relevant to quality control for chromatin profiling assays like scATAC-seq. These cache-related metrics require deep overall sequencing depth, so they are more useful for single-cell data. \nThe next two sections briefly describe these two metrics and how they can be useful for users.\n\n#### <a name=\"estfrip\"></a>Estimating FRiP\n\nThe `estfrip` column in Chromap's summary file represents an estimate of the FRiP score (Fraction of Reads in Peak Regions) computed by Chromap.\nChromap uses a simple multivariate linear model to estimate the FRiP for each barcode and the features used in this model are `fric`, `duplicate`, `unmapped` and `lowmapq`.\n\nThe FRiP score is typically used to assess the quality of chromatin profiles, where a higher FRiP score is generally better. 
\n\nFor users, this `estfrip` can be used to quickly gauge the quality of the data by plotting all the values in a histogram and checking whether the distribution is multi-modal.\nIn addition, when combining Chromap with downstream analysis tools such as [SnapATAC2](https://github.com/kaizhang/SnapATAC2) that perform clustering, the `estfrip` can be used to quickly identify any specific clusters that are lower quality than the rest.\n\n**An important note to users**: the `estfrip` values for individual barcodes should not be taken by themselves and used as the true FRiP score.\nThese estimates are mainly intended to be used for quality control at a dataset level where we compare different `estfrip` values to each other.\n\n#### <a name=\"doublet\"></a>Features to assist in doublet detection\n\nThe `numcacheslots` column in Chromap's summary file estimates the number of unique cache slots queried for each barcode during the alignment. This feature can be useful in assisting users with doublet detection/filtering.\n\nTypically for doublet detection in single-cell datasets, a simple and naive metric used to identify potential doublets is the number of fragments in cells (i.e. more reads, more likely a doublet). \n\nChromap uses the simple intuition that barcodes with a higher number of peaks than usual could be doublets. The number of unique cache slots that are queried can be seen as a proxy for the number of peaks. In our experiments, using `numcacheslots` yields a larger AUC compared with using `total` for binary classification of doublets. Therefore, users can potentially use this metric as an additional check/feature along with other doublet-detection specific methods.\n\n\n### <a name=\"help\"></a>Getting help\n\nDetailed description of Chromap command line options and optional tags can be displayed by running Chromap with **-h** or be found at the [manpage][manpage]. 
If you encounter bugs or have further questions or requests, you can raise an issue at the [issue page][issue].\n\n### <a name=\"cite\"></a>Citing Chromap\n\nIf you use Chromap, please cite:\n\n> Zhang, H., Song, L., Wang, X., Cheng, H., Wang, C., Meyer, C. A., ..., Liu, X. S., Li, H. (2021). Fast alignment and preprocessing of chromatin profiles with Chromap. Nature Communications, 12(1), 1-6.\n> https://doi.org/10.1038/s41467-021-26865-w\n\nThe summary file for QC is described in the manuscript:\n> Ahmed, O., Zhang, H., Langmead, B., Song, L. (2025). Quality control of single-cell ATAC-seq data without peak calling using Chromap. bioRxiv.\n> https://doi.org/10.1101/2025.07.15.664951\n\n[bed]: https://genome.ucsc.edu/FAQ/FAQformat.html#format1\n[paf]: https://github.com/lh3/miniasm/blob/master/PAF.md\n[sam]: https://samtools.github.io/hts-specs/SAMv1.pdf\n[pairs]: https://github.com/4dn-dcic/pairix/blob/master/pairs_format_specification.md\n[4DN]: https://data.4dnucleome.org/resources/data-analysis/hi_c-processing-pipeline\n[minimap]: https://github.com/lh3/minimap\n[release]: https://github.com/haowenz/chromap/releases\n[issue]: https://github.com/haowenz/chromap/issues\n[cellranger]: https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/output/fragments\n[manpage]: https://haowenz.github.io/chromap/chromap.html\n[bioconda]: https://anaconda.org/bioconda/chromap\n[ArchR]: https://www.archrproject.com/index.html\n[MAESTRO]: https://github.com/liulab-dfci/MAESTRO\n[signac]: https://satijalab.org/signac/articles/pbmc_vignette.html\n"
  },
  {
    "path": "chromap.1",
    "content": ".TH chromap 1 \"25 Jan 2024\" \"chromap-0.2.6 (r490)\" \"Bioinformatics tools\"\n.SH NAME\n.PP\nchromap - fast alignment and preprocessing of chromatin profiles\n.SH SYNOPSIS\n* Indexing the reference genome:\n.RS 4\nchromap\n.B -i\n.RB [ -k\n.IR kmer ]\n.RB [ -w\n.IR miniWinSize ]\n.B -r\n.I ref.fa\n.B -o\n.I ref.index\n.RE\n\n* Mapping (sc)ATAC-seq reads:\n.RS 4\nchromap\n.B --preset\n.I atac\n.B -r\n.I ref.fa\n.B -x\n.I ref.index\n.B -1 \n.I read1.fq\n.B -2\n.I read2.fq\n.B -o \n.I aln.bed\n.RB [ -b \n.IR barcode.fq.gz ] \n.RB [ --barcode-whitelist \n.IR whitelist.txt ]\n.RE\n\n* Mapping ChIP-seq reads:\n.RS 4\nchromap\n.B --preset\n.I chip\n.B -r\n.I ref.fa\n.B -x\n.I ref.index\n.B -1\n.I read1.fq\n.B -2\n.I read2.fq\n.B -o\n.I aln.bed\n.RE\n\n* Mapping Hi-C reads:\n.RS 4\nchromap \n.B --preset\n.I hic\n.B -r\n.I ref.fa\n.B -x\n.I ref.index\n.B -1\n.I read1.fq\n.B -2\n.I read2.fq\n.B -o\n.I aln.pairs\n.br\nchromap \n.B --preset\n.I hic\n.B -r\n.I ref.fa\n.B -x\n.I ref.index\n.B -1\n.I read1.fq\n.B -2\n.I read2.fq\n.B --SAM\n.B -o\n.I aln.sam\n.RE\n\n.SH DESCRIPTION\n.PP\nChromap is an ultrafast method for aligning and preprocessing high throughput\nchromatin profiles. Typical use cases include: (1) trimming sequencing adapters,\nmapping bulk ATAC-seq or ChIP-seq genomic reads to the human genome and removing\nduplicates; (2) trimming sequencing adapters, mapping single cell ATAC-seq\ngenomic reads to the human genome, correcting barcodes, removing duplicates and\nperforming Tn5 shift; (3) split alignment of Hi-C reads against a reference\ngenome. In all these three cases, Chromap is 10-20 times faster while being\naccurate.\n.SH OPTIONS\n.SS Indexing options\n.TP 10\n.BI -k \\ INT\nMinimizer k-mer length [17].\n.TP\n.BI -w \\ INT\nMinimizer window size [7]. A minimizer is the smallest k-mer\nin a window of w consecutive k-mers.\n.TP\n.B --min-frag-length\nMin fragment length for choosing k and w automatically [30]. 
Users can increase\nthis value when the min length of the fragments of interest is long, which can\nincrease the mapping speed. Note that the default value 30 is the min fragment\nlength that chromap can map. \n\n.SS Mapping options\n.TP 10\n.BI --split-alignment\nAllow split alignments. This option should be set only when mapping Hi-C reads.\n.TP\n.BI -e \\ INT\nMax edit distance allowed to map a read [8].\n.TP\n.BI -s \\ INT\nMin number of minimizers required to map a read [2].\n.TP\n.BI -f \\ INT1 [, INT2 ]\nIgnore minimizers occurring more than\n.I INT1\n[500] times.\n.I INT2\n[1000] is the threshold for a second round of seeding.\n.TP\n.BI -l \\ INT\nMax insert size, only for paired-end read mapping [1000].\n.TP\n.BI -q \\ INT\nMin MAPQ in range [0, 60] for mappings to be output [30].\n.TP\n.BI --min-read-length \\ INT\nSkip mapping the reads of length less than  \n.I INT \n[30]. Note that this is different from the index option\n.BR --min-frag-length\n, which sets\n.BR -k\nand\n.BR -w\nfor indexing the genome.\n.TP\n.BI --trim-adapters\nTry to trim adapters on the 3' end. This only works for paired-end reads. When the\nfragment length indicated by the read pair is less than the length of the reads,\nthe two mates overlap each other. Then the regions outside the\noverlap are regarded as adapters and trimmed.\n.TP\n.BI --remove-pcr-duplicates\nRemove PCR duplicates.\n.TP\n.BI --remove-pcr-duplicates-at-bulk-level\nRemove PCR duplicates at bulk level for single cell data.\n.TP\n.BI --remove-pcr-duplicates-at-cell-level\nRemove PCR duplicates at cell level for single cell data.\n.TP\n.BI --Tn5-shift\nPerform Tn5 shift. When this option is turned on, the forward mapping start\npositions are increased by 4bp and the reverse mapping end positions are\ndecreased by 5bp. Note that this works only when\n.BR --SAM\nis NOT set.\n.TP\n.BI --low-mem\nUse low memory mode. 
When this option is set, multiple temporary intermediate\nmapping files might be generated on disk and they are merged at the end of\nprocessing to reduce memory usage. When this is NOT set, all the mapping results\nare kept in the memory before they are saved on disk, which works more\nefficiently for datasets that are not too large.\n.TP\n.BI --bc-error-threshold \\ INT\nMax Hamming distance allowed to correct a barcode [1]. Note that the max \nsupported threshold is 2.\n.TP\n.BI --bc-probability-threshold \\ FLT\nMin probability to correct a barcode [0.9]. When there are multiple whitelisted\nbarcodes with the same Hamming distance to the barcode to correct, chromap will\nprocess the base quality of the mismatched bases, and compute a probability that\nthe correction is right.\n.TP\n.BI -t \\ INT\nThe number of threads for mapping [1].\n\n.SS Input options\n.TP 10\n.BI -r \\ FILE\nReference file.\n.TP\n.BI -x \\ FILE\nIndex file.\n.TP\n.BI -1 \\ FILE\nSingle-end read files or paired-end read files 1. Chromap supports multiple\ninput files concatenated by \",\". For example, setting this option to \n\"Library1_R1.fastq.gz,Library2_R1.fastq.gz,Library3_R1.fastq.gz\" will take \nall three files as input and map them in this order. Similarly,\n.BR -2\nand\n.BR -b\nalso support multiple input files. The ordering of the input files for all\nthe three options should match.\n.TP\n.BI -2 \\ FILE\nPaired-end read files 2.\n.TP\n.BI -b \\ FILE\nCell barcode files.\n.TP\n.BI --barcode-whitelist \\ FILE\nCell barcode whitelist file. This is supposed to be a txt file where each line\nis a whitelisted barcode.\n.TP\n.BI --read-format \\ STR\nFormat for read files and barcode files [\"r1:0:-1,bc:0:-1\"] as 10x Genomics \nsingle-end format.\n\n.SS Output options\n.TP 10\n.BR -o \\ FILE\nOutput file.\n.TP\n.BR --output-mappings-not-in-whitelist\nOutput mappings with barcode not in the whitelist.\n.TP\n.BR --chr-order \\ FILE          \nCustom chromosome order file. 
If not specified, the order of reference sequences will be used.\n.TP\n.BR --BED\nOutput mappings in BED/BEDPE format. Note that only one of the formats should be\nset.\n.TP\n.BR --TagAlign\nOutput mappings in TagAlign/PairedTagAlign format.\n.TP\n.BR --SAM\nOutput mappings in SAM format.\n.TP\n.BR --pairs\nOutput mappings in pairs format (defined by 4DN for HiC data).\n.TP\n.BR --pairs-natural-chr-order \\ FILE\nCustom chromosome order file for pairs flipping. If not specified, the custom chromosome order will be used.\n.TP\n.BR --barcode-translate \\ FILE\nConvert input barcodes to another set of barcodes in the output.\n.TP\n.BR --summary \\ FILE\nSummarize the mapping statistics at bulk or barcode level.\n.TP\n.B -v\nPrint version number to stdout.\n\n.SS Preset options\n.TP 10\n.BI --preset \\ STR\nPreset []. This option applies multiple options at the same time. It should be\napplied before other options because options applied later will overwrite the\nvalues set by\n.BR --preset .\nAvailable\n.I STR\nare:\n.RS\n.TP 10\n.B chip \nMapping ChIP-seq reads\n.RB ( -l\n.I 2000\n.B --remove-pcr-duplicates --low-mem\n.BR --BED ).\n.TP\n.B atac\nMapping ATAC-seq/scATAC-seq reads\n.RB ( -l \n.I 2000\n.B --remove-pcr-duplicates --low-mem --trim-adapters --Tn5-shift\n.B --remove-pcr-duplicates-at-cell-level\n.BR --BED ).\n.TP\n.B hic\nMapping Hi-C reads\n.RB ( -e \n.I 4\n.B -q\n.I 1 \n.B --low-mem --split-alignment\n.BR --pairs ).\n"
  },
  {
    "path": "docs/_config.yml",
    "content": "theme: jekyll-theme-modernist"
  },
  {
    "path": "docs/chromap.html",
    "content": "<!-- Creator     : groff version 1.22.2 -->\n<!-- CreationDate: Mon Sep 20 10:43:13 2021 -->\n<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"\n\"http://www.w3.org/TR/html4/loose.dtd\">\n<html>\n<head>\n<meta name=\"generator\" content=\"groff -Thtml, see www.gnu.org\">\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=US-ASCII\">\n<meta name=\"Content-Style\" content=\"text/css\">\n<style type=\"text/css\">\n       p       { margin-top: 0; margin-bottom: 0; vertical-align: top }\n       pre     { margin-top: 0; margin-bottom: 0; vertical-align: top }\n       table   { margin-top: 0; margin-bottom: 0; vertical-align: top }\n       h1      { text-align: center }\n</style>\n<title>chromap</title>\n\n</head>\n<body>\n\n<h1 align=\"center\">chromap</h1>\n\n<a href=\"#NAME\">NAME</a><br>\n<a href=\"#SYNOPSIS\">SYNOPSIS</a><br>\n<a href=\"#DESCRIPTION\">DESCRIPTION</a><br>\n<a href=\"#OPTIONS\">OPTIONS</a><br>\n\n<hr>\n\n\n<h2>NAME\n<a name=\"NAME\"></a>\n</h2>\n\n\n<p style=\"margin-left:11%; margin-top: 1em\">chromap - fast\nalignment and preprocessing of chromatin profiles</p>\n\n<h2>SYNOPSIS\n<a name=\"SYNOPSIS\"></a>\n</h2>\n\n\n<p style=\"margin-left:11%; margin-top: 1em\">* Indexing the\nreference genome:</p>\n\n<p style=\"margin-left:17%;\">chromap <b>-i</b> [<b>-k</b>\n<i>kmer</i>] [<b>-w</b> <i>miniWinSize</i>] <b>-r</b>\n<i>ref.fa</i> <b>-o</b> <i>ref.index</i></p>\n\n<p style=\"margin-left:11%; margin-top: 1em\">* Mapping\n(sc)ATAC-seq reads:</p>\n\n<p style=\"margin-left:17%;\">chromap <b>--preset</b>\n<i>atac</i> <b>-r</b> <i>ref.fa</i> <b>-x</b>\n<i>ref.index</i> <b>-1</b> <i>read1.fq</i> <b>-2</b>\n<i>read2.fq</i> <b>-o</b> <i>aln.bed</i> [<b>-b</b>\n<i>barcode.fq.gz</i>] [<b>--barcode-whitelist</b>\n<i>whitelist.txt</i>]</p>\n\n<p style=\"margin-left:11%; margin-top: 1em\">* Mapping\nChIP-seq reads:</p>\n\n<p style=\"margin-left:17%;\">chromap <b>--preset</b>\n<i>chip</i> <b>-r</b> <i>ref.fa</i> 
<b>-x</b>\n<i>ref.index</i> <b>-1</b> <i>read1.fq</i> <b>-2</b>\n<i>read2.fq</i> <b>-o</b> <i>aln.bed</i></p>\n\n<p style=\"margin-left:11%; margin-top: 1em\">* Mapping Hi-C\nreads:</p>\n\n<p style=\"margin-left:17%;\">chromap <b>--preset</b>\n<i>hic</i> <b>-r</b> <i>ref.fa</i> <b>-x</b>\n<i>ref.index</i> <b>-1</b> <i>read1.fq</i> <b>-2</b>\n<i>read2.fq</i> <b>-o</b> <i>aln.pairs</i> <br>\nchromap <b>--preset</b> <i>hic</i> <b>-r</b> <i>ref.fa</i>\n<b>-x</b> <i>ref.index</i> <b>-1</b> <i>read1.fq</i>\n<b>-2</b> <i>read2.fq</i> <b>--SAM -o</b> <i>aln.sam</i></p>\n\n<h2>DESCRIPTION\n<a name=\"DESCRIPTION\"></a>\n</h2>\n\n\n<p style=\"margin-left:11%; margin-top: 1em\">Chromap is an\nultrafast method for aligning and preprocessing high\nthroughput chromatin profiles. Typical use cases include:\n(1) trimming sequencing adapters, mapping bulk ATAC-seq or\nChIP-seq genomic reads to the human genome and removing\nduplicates; (2) trimming sequencing adapters, mapping single\ncell ATAC-seq genomic reads to the human genome, correcting\nbarcodes, removing duplicates and performing Tn5 shift; (3)\nsplit alignment of Hi-C reads against a reference genome. In\nall these three cases, Chromap is 10-20 times faster while\nbeing accurate.</p>\n\n<h2>OPTIONS\n<a name=\"OPTIONS\"></a>\n</h2>\n\n\n<p style=\"margin-left:11%; margin-top: 1em\"><b>Indexing\noptions</b></p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"9%\">\n\n\n<p><b>-k&nbsp;</b><i>INT</i></p></td>\n<td width=\"6%\"></td>\n<td width=\"74%\">\n\n\n<p>Minimizer k-mer length [17].</p></td></tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"9%\">\n\n\n<p><b>-w&nbsp;</b><i>INT</i></p></td>\n<td width=\"6%\"></td>\n<td width=\"74%\">\n\n\n<p>Minimizer window size [7]. 
A minimizer is the smallest\nk-mer in a window of w consecutive k-mers.</p></td></tr>\n</table>\n\n<p style=\"margin-left:11%;\"><b>--min-frag-length</b></p>\n\n<p style=\"margin-left:26%;\">Min fragment length for\nchoosing k and w automatically [30]. Users can increase this\nvalue when the min length of the fragments of interest is\nlong, which can increase the mapping speed. Note that the\ndefault value 30 is the min fragment length that chromap can\nmap.</p>\n\n<p style=\"margin-left:11%; margin-top: 1em\"><b>Mapping\noptions <br>\n--split-alignment</b></p>\n\n<p style=\"margin-left:26%;\">Allow split alignments. This\noption should be set only when mapping Hi-C reads.</p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"9%\">\n\n\n<p><b>-e&nbsp;</b><i>INT</i></p></td>\n<td width=\"6%\"></td>\n<td width=\"74%\">\n\n\n<p>Max edit distance allowed to map a read [8].</p></td></tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"9%\">\n\n\n<p><b>-s&nbsp;</b><i>INT</i></p></td>\n<td width=\"6%\"></td>\n<td width=\"74%\">\n\n\n<p>Min number of minimizers required to map a read [2].</p></td></tr>\n</table>\n\n\n<p style=\"margin-left:11%;\"><b>-f&nbsp;</b><i>INT1</i><b>[,</b><i>INT2</i><b>]</b></p>\n\n<p style=\"margin-left:26%;\">Ignore minimizers occurring more\nthan <i>INT1</i> [500] times. 
<i>INT2</i> [1000] is the\nthreshold for a second round of seeding.</p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"9%\">\n\n\n<p><b>-l&nbsp;</b><i>INT</i></p></td>\n<td width=\"6%\"></td>\n<td width=\"74%\">\n\n\n<p>Max insert size, only for paired-end read mapping\n[1000].</p> </td></tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"9%\">\n\n\n<p><b>-q&nbsp;</b><i>INT</i></p></td>\n<td width=\"6%\"></td>\n<td width=\"74%\">\n\n\n<p>Min MAPQ in range [0, 60] for mappings to be output\n[30].</p> </td></tr>\n</table>\n\n\n<p style=\"margin-left:11%;\"><b>--min-read-length&nbsp;</b><i>INT</i></p>\n\n<p style=\"margin-left:26%;\">Skip mapping the reads of\nlength less than <i>INT</i> [30]. Note that this is\ndifferent from the index option <b>--min-frag-length</b>,\nwhich sets <b>-k</b> and <b>-w</b> for indexing the\ngenome.</p>\n\n<p style=\"margin-left:11%;\"><b>--trim-adapters</b></p>\n\n<p style=\"margin-left:26%;\">Try to trim adapters on the\n3&rsquo; end. This only works for paired-end reads. When the\nfragment length indicated by the read pair is less than the\nlength of the reads, the two mates overlap each\nother. The regions outside the overlap are then regarded as\nadapters and trimmed.</p>\n\n\n<p style=\"margin-left:11%;\"><b>--remove-pcr-duplicates</b></p>\n\n<p style=\"margin-left:26%;\">Remove PCR duplicates.</p>\n\n\n<p style=\"margin-left:11%;\"><b>--remove-pcr-duplicates-at-bulk-level</b></p>\n\n<p style=\"margin-left:26%;\">Remove PCR duplicates at bulk\nlevel for single cell data.</p>\n\n\n<p style=\"margin-left:11%;\"><b>--remove-pcr-duplicates-at-cell-level</b></p>\n\n<p style=\"margin-left:26%;\">Remove PCR duplicates at cell\nlevel for single cell data.</p>\n\n<p style=\"margin-left:11%;\"><b>--Tn5-shift</b></p>\n\n<p style=\"margin-left:26%;\">Perform Tn5 shift. 
When this\noption is turned on, the forward mapping start positions are\nincreased by 4bp and the reverse mapping end positions are\ndecreased by 5bp. Note that this works only when\n<b>--SAM</b> is NOT set.</p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"14%\">\n\n\n<p><b>--low-mem</b></p></td>\n<td width=\"1%\"></td>\n<td width=\"74%\">\n\n\n<p>Use low memory mode. When this option is set, multiple\ntemporary intermediate mapping files might be generated on\ndisk and they are merged at the end of processing to reduce\nmemory usage. When this is NOT set, all the mapping results\nare kept in the memory before they are saved on disk, which\nworks more efficiently for datasets that are not too\nlarge.</p> </td></tr>\n</table>\n\n\n<p style=\"margin-left:11%;\"><b>--bc-error-threshold&nbsp;</b><i>INT</i></p>\n\n<p style=\"margin-left:26%;\">Max Hamming distance allowed to\ncorrect a barcode [1]. Note that the max supported threshold\nis 2.</p>\n\n\n<p style=\"margin-left:11%;\"><b>--bc-probability-threshold&nbsp;</b><i>FLT</i></p>\n\n<p style=\"margin-left:26%;\">Min probability to correct a\nbarcode [0.9]. 
When there are multiple whitelisted barcodes\nwith the same Hamming distance to the barcode to correct,\nchromap will process the base quality of the mismatched\nbases, and compute a probability that the correction is\nright.</p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"9%\">\n\n\n<p><b>-t&nbsp;</b><i>INT</i></p></td>\n<td width=\"6%\"></td>\n<td width=\"59%\">\n\n\n<p>The number of threads for mapping [1].</p></td>\n<td width=\"15%\">\n</td></tr>\n</table>\n\n<p style=\"margin-left:11%; margin-top: 1em\"><b>Input\noptions</b></p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"11%\">\n\n\n<p style=\"margin-top: 1em\"><b>-r&nbsp;</b><i>FILE</i></p></td>\n<td width=\"4%\"></td>\n<td width=\"74%\">\n\n\n<p style=\"margin-top: 1em\">Reference file.</p></td></tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"11%\">\n\n\n<p><b>-x&nbsp;</b><i>FILE</i></p></td>\n<td width=\"4%\"></td>\n<td width=\"74%\">\n\n\n<p>Index file.</p></td></tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"11%\">\n\n\n<p><b>-1&nbsp;</b><i>FILE</i></p></td>\n<td width=\"4%\"></td>\n<td width=\"74%\">\n\n\n<p>Single-end read files or paired-end read files 1.\nChromap supports multiple input files concatenated by\n&quot;,&quot;. For example, setting this option to\n&quot;read11.fq,read12.fq,read13.fq&quot; will take all\nthree files as input and map them in this order. 
Similarly,\n<b>-2</b> and <b>-b</b> also support multiple input files.\nThe ordering of the input files for all three\noptions should match.</p></td></tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"11%\">\n\n\n<p><b>-2&nbsp;</b><i>FILE</i></p></td>\n<td width=\"4%\"></td>\n<td width=\"74%\">\n\n\n<p>Paired-end read files 2.</p></td></tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"11%\">\n\n\n<p><b>-b&nbsp;</b><i>FILE</i></p></td>\n<td width=\"4%\"></td>\n<td width=\"74%\">\n\n\n<p>Cell barcode files.</p></td></tr>\n</table>\n\n\n<p style=\"margin-left:11%;\"><b>--barcode-whitelist&nbsp;</b><i>FILE</i></p>\n\n<p style=\"margin-left:26%;\">Cell barcode whitelist file.\nThis is supposed to be a txt file where each line is a\nwhitelisted barcode.</p>\n\n\n<p style=\"margin-left:11%;\"><b>--read-format&nbsp;</b><i>STR</i></p>\n\n<p style=\"margin-left:26%;\">Format for read files and\nbarcode files [&quot;r1:0:-1,bc:0:-1&quot;] as 10x Genomics\nsingle-end format.</p>\n\n<p style=\"margin-left:11%; margin-top: 1em\"><b>Output\noptions</b></p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"11%\">\n\n\n<p><b>-o&nbsp;</b>FILE</p></td>\n<td width=\"4%\"></td>\n<td width=\"19%\">\n\n\n<p>Output file.</p></td>\n<td width=\"55%\">\n</td></tr>\n</table>\n\n\n<p style=\"margin-left:11%;\"><b>--output-mappings-not-in-whitelist</b></p>\n\n<p style=\"margin-left:26%;\">Output mappings with barcode\nnot in the whitelist.</p>\n\n\n<p style=\"margin-left:11%;\"><b>--chr-order&nbsp;</b>FILE</p>\n\n<p style=\"margin-left:26%;\">Customized chromosome order.</p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td 
width=\"7%\">\n\n\n<p><b>--BED</b></p></td>\n<td width=\"8%\"></td>\n<td width=\"74%\">\n\n\n<p>Output mappings in BED/BEDPE format. Note that only one\nof the formats should be set.</p></td></tr>\n</table>\n\n<p style=\"margin-left:11%;\"><b>--TagAlign</b></p>\n\n<p style=\"margin-left:26%;\">Output mappings in\nTagAlign/PairedTagAlign format.</p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"11%\">\n\n\n<p><b>--SAM</b></p></td>\n<td width=\"4%\"></td>\n<td width=\"74%\">\n\n\n<p>Output mappings in SAM format.</p></td></tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"11%\">\n\n\n<p><b>--pairs</b></p></td>\n<td width=\"4%\"></td>\n<td width=\"74%\">\n\n\n<p>Output mappings in pairs format (defined by 4DN for HiC\ndata).</p> </td></tr>\n</table>\n\n\n<p style=\"margin-left:11%;\"><b>--pairs-natural-chr-order&nbsp;</b>FILE</p>\n\n<p style=\"margin-left:26%;\">Natural chromosome order for\npairs flipping.</p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"></td>\n<td width=\"3%\">\n\n\n<p><b>-v</b></p></td>\n<td width=\"12%\"></td>\n<td width=\"48%\">\n\n\n<p>Print version number to stdout.</p></td>\n<td width=\"26%\">\n</td></tr>\n</table>\n\n<p style=\"margin-left:11%; margin-top: 1em\"><b>Preset\noptions <br>\n--preset&nbsp;</b><i>STR</i></p>\n\n<p style=\"margin-left:26%;\">Preset []. This option applies\nmultiple options at the same time. It should be applied\nbefore other options because options applied later will\noverwrite the values set by <b>--preset</b>. 
Available\n<i>STR</i> are:</p>\n\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\"\n       cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"26%\"></td>\n<td width=\"6%\">\n\n\n<p><b>chip</b></p></td>\n<td width=\"10%\"></td>\n<td width=\"58%\">\n\n\n<p>Mapping ChIP-seq reads (<b>-l</b> <i>2000</i>\n<b>--remove-pcr-duplicates --low-mem --BED</b>).</p></td></tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"26%\"></td>\n<td width=\"6%\">\n\n\n<p><b>atac</b></p></td>\n<td width=\"10%\"></td>\n<td width=\"58%\">\n\n\n<p>Mapping ATAC-seq/scATAC-seq reads (<b>-l</b> <i>2000</i>\n<b>--remove-pcr-duplicates --low-mem --trim-adapters\n--Tn5-shift --remove-pcr-duplicates-at-cell-level\n--BED</b>).</p> </td></tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"26%\"></td>\n<td width=\"6%\">\n\n\n<p><b>hic</b></p></td>\n<td width=\"10%\"></td>\n<td width=\"58%\">\n\n\n<p>Mapping Hi-C reads (<b>-e</b> <i>4</i> <b>-q</b>\n<i>1</i> <b>--low-mem --split-alignment --pairs</b>).</p></td></tr>\n </table>\n<hr>\n</body>\n</html>\n"
  },
  {
    "path": "docs/index.md",
    "content": "## Getting help\n\n* [README][doc]: general documentation\n* [Manpage](chromap.html): explanation of command-line options\n* [Preprint][biorxiv]: free of charge preprint that describes the method\n* [GitHub Issues page][issue]: report bugs, request features and ask questions\n\n## Acquiring Chromap\n\n* `git clone https://github.com/haowenz/chromap.git`\n* [GitHub Release page][release]: versioned packages\n* Also [available from BioConda][bioconda]\n\n[doc]: https://github.com/haowenz/chromap/blob/master/README.md\n[biorxiv]: https://www.biorxiv.org/content/10.1101/2021.06.18.448995v1\n[bioconda]: https://anaconda.org/bioconda/chromap\n[release]: https://github.com/haowenz/chromap/releases\n[issue]: https://github.com/haowenz/chromap/issues\n"
  },
  {
    "path": "src/alignment.cc",
    "content": "#include \"alignment.h\"\n\n#include <smmintrin.h>\n\nnamespace chromap {\n\nint GetLongestMatchLength(const char *pattern, const char *text,\n                          const int read_length) {\n  int max_match = 0;\n  int tmp = 0;\n  for (int i = 0; i < read_length; ++i) {\n    if (CharToUint8(pattern[i]) == CharToUint8(text[i])) {\n      ++tmp;\n    } else if (tmp > max_match) {\n      max_match = tmp;\n    }\n  }\n  if (tmp > max_match) {\n    max_match = tmp;\n  }\n  return max_match;\n}\n\nint AdjustGapBeginning(const Strand mapping_strand, const char *ref,\n                       const char *read, int *gap_beginning, int read_end,\n                       int ref_start_position, int ref_end_position,\n                       int *n_cigar, uint32_t **cigar) {\n  int i, j;\n  if (mapping_strand == kPositive) {\n    if (*gap_beginning <= 0) {\n      return ref_start_position;\n    }\n\n    // printf(\"%d\\n\", *gap_beginning);\n\n    for (i = *gap_beginning - 1, j = ref_start_position - 1; i >= 0 && j >= 0;\n         --i, --j) {\n      // printf(\"%c %c\\n\", read[i], ref[j]);\n      if (read[i] != ref[j] && read[i] != ref[j] - 'a' + 'A') {\n        break;\n      }\n    }\n\n    *gap_beginning = i + 1;\n    // TODO: add soft clip in cigar\n    if (n_cigar && *n_cigar > 0) {\n      if (((*cigar)[0] & 0xf) == BAM_CMATCH) {\n        (*cigar)[0] += (ref_start_position - 1 - j) << 4;\n      }\n    }\n\n    return j + 1;\n  }\n\n  if (*gap_beginning <= 0) {\n    return ref_end_position;\n  }\n\n  // printf(\"%d\\n\", *gap_beginning);\n  /*char *tmp = new char[255] ;\n  strncpy(tmp, ref + ref_start_position, ref_end_position - ref_start_position\n  + 1 + 10) ; printf(\"%s %d. 
%d %d\\n\", tmp, strlen(tmp), ref_end_position -\n  ref_start_position + 1 + 10, strlen(ref)) ; delete[] tmp;*/\n\n  for (i = read_end + 1, j = ref_end_position + 1; read[i] && ref[j];\n       ++i, ++j) {\n    // printf(\"%c %c %c %c %c %c\\n\", read[i], ref[j - 1], ref[j], ref[j + 1],\n    // ref[j + 2], ref[j + 3]);\n    if (read[i] != ref[j] && read[i] != ref[j] - 'a' + 'A') {\n      break;\n    }\n  }\n\n  *gap_beginning = *gap_beginning + i - (read_end + 1);\n\n  if (n_cigar && *n_cigar > 0) {\n    if (((*cigar)[*n_cigar - 1] & 0xf) == BAM_CMATCH) {\n      (*cigar)[*n_cigar - 1] += (j - (ref_end_position + 1)) << 4;\n    }\n  }\n\n  return j - 1;\n}\n\nvoid GenerateNMAndMDTag(const char *pattern, const char *text,\n                        int mapping_start_position,\n                        MappingInMemory &mapping_in_memory) {\n  const char *read = text;\n  const char *reference = pattern + mapping_start_position;\n\n  const uint32_t *cigar = mapping_in_memory.cigar;\n  const int n_cigar = mapping_in_memory.n_cigar;\n  mapping_in_memory.NM = 0;\n  mapping_in_memory.MD_tag.clear();\n\n  int num_matches = 0;\n  int read_position = 0;\n  int reference_position = 0;\n\n  for (int ci = 0; ci < n_cigar; ++ci) {\n    uint32_t current_cigar_uint = cigar[ci];\n    uint8_t cigar_operation = bam_cigar_op(current_cigar_uint);\n    int num_cigar_operations = bam_cigar_oplen(current_cigar_uint);\n    if (cigar_operation == BAM_CMATCH) {\n      for (int opi = 0; opi < num_cigar_operations; ++opi) {\n        if (reference[reference_position] == read[read_position] ||\n            reference[reference_position] - 'a' + 'A' == read[read_position]) {\n          // a match\n          ++num_matches;\n        } else {\n          // a mismatch\n          ++mapping_in_memory.NM;\n          \n          mapping_in_memory.MD_tag.append(std::to_string(num_matches));\n          num_matches = 0;\n          mapping_in_memory.MD_tag.push_back(reference[reference_position]);\n        }\n      
  ++reference_position;\n        ++read_position;\n      }\n    } else if (cigar_operation == BAM_CINS) {\n      mapping_in_memory.NM += num_cigar_operations;\n      read_position += num_cigar_operations;\n    } else if (cigar_operation == BAM_CDEL) {\n      mapping_in_memory.NM += num_cigar_operations;\n      \n      mapping_in_memory.MD_tag.append(std::to_string(num_matches));\n      num_matches = 0;\n      mapping_in_memory.MD_tag.push_back('^');\n      for (int opi = 0; opi < num_cigar_operations; ++opi) {\n        mapping_in_memory.MD_tag.push_back(reference[reference_position]);\n        ++reference_position;\n      }\n    } else {\n      std::cerr << \"Unexpected cigar op: \" << (int)cigar_operation << \"\\n\";\n    }\n  }\n  mapping_in_memory.MD_tag.append(std::to_string(num_matches));\n}\n\nint BandedAlignPatternToText(int error_threshold, const char *pattern,\n                             const char *text, const int read_length,\n                             int *mapping_end_position) {\n  uint32_t Peq[5] = {0, 0, 0, 0, 0};\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    uint8_t base = CharToUint8(pattern[i]);\n    Peq[base] = Peq[base] | (1 << i);\n  }\n  uint32_t highest_bit_in_band_mask = 1 << (2 * error_threshold);\n  uint32_t lowest_bit_in_band_mask = 1;\n  uint32_t VP = 0;\n  uint32_t VN = 0;\n  uint32_t X = 0;\n  uint32_t D0 = 0;\n  uint32_t HN = 0;\n  uint32_t HP = 0;\n  int num_errors_at_band_start_position = 0;\n  for (int i = 0; i < read_length; i++) {\n    uint8_t pattern_base = CharToUint8(pattern[i + 2 * error_threshold]);\n    Peq[pattern_base] = Peq[pattern_base] | highest_bit_in_band_mask;\n    X = Peq[CharToUint8(text[i])] | VN;\n    D0 = ((VP + (X & VP)) ^ VP) | X;\n    HN = VP & D0;\n    HP = VN | ~(VP | D0);\n    X = D0 >> 1;\n    VN = X & HP;\n    VP = HN | ~(X | HP);\n    num_errors_at_band_start_position += 1 - (D0 & lowest_bit_in_band_mask);\n    if (num_errors_at_band_start_position > 3 * error_threshold) {\n      return 
error_threshold + 1;\n    }\n    for (int ai = 0; ai < 5; ai++) {\n      Peq[ai] >>= 1;\n    }\n  }\n  int band_start_position = read_length - 1;\n  int min_num_errors = num_errors_at_band_start_position;\n  *mapping_end_position = band_start_position;\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    num_errors_at_band_start_position =\n        num_errors_at_band_start_position + ((VP >> i) & (uint32_t)1);\n    num_errors_at_band_start_position =\n        num_errors_at_band_start_position - ((VN >> i) & (uint32_t)1);\n    if (num_errors_at_band_start_position < min_num_errors ||\n        (num_errors_at_band_start_position == min_num_errors &&\n         i + 1 == error_threshold)) {\n      min_num_errors = num_errors_at_band_start_position;\n      *mapping_end_position = band_start_position + 1 + i;\n    }\n  }\n  return min_num_errors;\n}\n\n// Return negative number if the termination are deemed at the beginning of the\n// read mappping_end_position is relative to pattern (reference)\n// read_mapping_length is for text (read)\nint BandedAlignPatternToTextWithDropOff(int error_threshold,\n                                        const char *pattern, const char *text,\n                                        const int read_length,\n                                        int *mapping_end_position,\n                                        int *read_mapping_length) {\n  uint32_t Peq[5] = {0, 0, 0, 0, 0};\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    uint8_t base = CharToUint8(pattern[i]);\n    Peq[base] = Peq[base] | (1 << i);\n  }\n  uint32_t highest_bit_in_band_mask = 1 << (2 * error_threshold);\n  uint32_t lowest_bit_in_band_mask = 1;\n  uint32_t VP = 0;\n  uint32_t VN = 0;\n  uint32_t X = 0;\n  uint32_t D0 = 0;\n  uint32_t HN = 0;\n  uint32_t HP = 0;\n  uint32_t prev_VP = 0;\n  uint32_t prev_VN = 0;\n  int num_errors_at_band_start_position = 0;\n  int i = 0;\n  int fail_beginning = 0;  // the alignment failed at the beginning part\n  int 
prev_num_errors_at_band_start_position = 0;\n  for (; i < read_length; i++) {\n    uint8_t pattern_base = CharToUint8(pattern[i + 2 * error_threshold]);\n    Peq[pattern_base] = Peq[pattern_base] | highest_bit_in_band_mask;\n    X = Peq[CharToUint8(text[i])] | VN;\n    D0 = ((VP + (X & VP)) ^ VP) | X;\n    HN = VP & D0;\n    HP = VN | ~(VP | D0);\n    X = D0 >> 1;\n    prev_VN = VN;\n    prev_VP = VP;\n    VN = X & HP;\n    VP = HN | ~(X | HP);\n    prev_num_errors_at_band_start_position = num_errors_at_band_start_position;\n    num_errors_at_band_start_position += 1 - (D0 & lowest_bit_in_band_mask);\n    if (num_errors_at_band_start_position > 2 * error_threshold) {\n      // return error_threshold + 1;\n      // the min error in this band could be still less than the\n      // error_threshold, and could but this should be fine since it does not\n      // affect the 5' end of the read.\n      if (i < 4 * error_threshold && i < read_length / 2) {\n        fail_beginning = 1;\n      }\n      break;\n    }\n    for (int ai = 0; ai < 5; ai++) {\n      Peq[ai] >>= 1;\n    }\n  }\n\n  /*char tmp[255] ;\n  strncpy(tmp, pattern, read_length + 2 * error_threshold);\n  printf(\"%s\\n%s\\n\", tmp, text);\n  printf(\"%d\\n\", i) ;\n  fflush(stdout);*/\n  if (i < read_length) {\n    num_errors_at_band_start_position = prev_num_errors_at_band_start_position;\n    VN = prev_VN;\n    VP = prev_VP;\n  }\n  int band_start_position = i - 1;\n  int min_num_errors = num_errors_at_band_start_position;\n  *read_mapping_length = i;\n  *mapping_end_position = band_start_position;\n\n  for (i = 0; i < 2 * error_threshold; i++) {\n    num_errors_at_band_start_position =\n        num_errors_at_band_start_position + ((VP >> i) & (uint32_t)1);\n    num_errors_at_band_start_position =\n        num_errors_at_band_start_position - ((VN >> i) & (uint32_t)1);\n    if (num_errors_at_band_start_position < min_num_errors ||\n        (num_errors_at_band_start_position == min_num_errors &&\n         i + 
1 == error_threshold)) {\n      min_num_errors = num_errors_at_band_start_position;\n      *mapping_end_position = band_start_position + 1 + i;\n    }\n  }\n  if (fail_beginning ||\n      (read_length > 60 &&\n       *mapping_end_position + 1 - error_threshold - min_num_errors < 30)) {\n    *mapping_end_position = -*mapping_end_position;\n  }\n  return min_num_errors;\n}\n\nint BandedAlignPatternToTextWithDropOffFrom3End(int error_threshold,\n                                                const char *pattern,\n                                                const char *text,\n                                                const int read_length,\n                                                int *mapping_end_position,\n                                                int *read_mapping_length) {\n  uint32_t Peq[5] = {0, 0, 0, 0, 0};\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    uint8_t base =\n        CharToUint8(pattern[read_length + 2 * error_threshold - 1 - i]);\n    Peq[base] = Peq[base] | (1 << i);\n  }\n  uint32_t highest_bit_in_band_mask = 1 << (2 * error_threshold);\n  uint32_t lowest_bit_in_band_mask = 1;\n  uint32_t VP = 0;\n  uint32_t VN = 0;\n  uint32_t X = 0;\n  uint32_t D0 = 0;\n  uint32_t HN = 0;\n  uint32_t HP = 0;\n  uint32_t prev_VP = 0;\n  uint32_t prev_VN = 0;\n  int num_errors_at_band_start_position = 0;\n  int i = 0;\n  int fail_beginning = 0;  // the alignment failed at the beginning part\n  int prev_num_errors_at_band_start_position = 0;\n  for (; i < read_length; i++) {\n    // printf(\"%c %c %d\\n\", pattern[read_length - 1 - i], pattern[read_length -\n    // 1 - i + error_threshold], text[read_length - 1 - i]);\n    uint8_t pattern_base = CharToUint8(pattern[read_length - 1 - i]);\n    Peq[pattern_base] = Peq[pattern_base] | highest_bit_in_band_mask;\n    X = Peq[CharToUint8(text[read_length - 1 - i])] | VN;\n    D0 = ((VP + (X & VP)) ^ VP) | X;\n    HN = VP & D0;\n    HP = VN | ~(VP | D0);\n    X = D0 >> 1;\n    prev_VN = 
VN;\n    prev_VP = VP;\n    VN = X & HP;\n    VP = HN | ~(X | HP);\n    prev_num_errors_at_band_start_position = num_errors_at_band_start_position;\n    num_errors_at_band_start_position += 1 - (D0 & lowest_bit_in_band_mask);\n    /*printf(\"->%d %d %c %c\", i, num_errors_at_band_start_position,\n    pattern[read_length - 1 - i], text[read_length - 1 - i]) ; int tmp =\n    num_errors_at_band_start_position; for (int j = 0; j < 2 * error_threshold;\n    j++) { tmp = tmp + ((VP >> j) & (uint32_t) 1); tmp = tmp - ((VN >> j) &\n    (uint32_t) 1); printf(\" %d\", tmp);\n    }\n    printf(\"\\n\");*/\n    if (num_errors_at_band_start_position > 2 * error_threshold) {\n      // return error_threshold + 1;\n      if (i < 4 * error_threshold && i < read_length / 2) {\n        fail_beginning = 1;\n      }\n      break;\n    }\n    for (int ai = 0; ai < 5; ai++) {\n      Peq[ai] >>= 1;\n    }\n  }\n  // printf(\"li %d: %d %d %d\\n\", fail_beginning, i, error_threshold,\n  // read_length);\n  if (i < read_length) {\n    num_errors_at_band_start_position = prev_num_errors_at_band_start_position;\n    VN = prev_VN;\n    VP = prev_VP;\n  }\n  int band_start_position = i - 1;\n  int min_num_errors = num_errors_at_band_start_position;\n  *read_mapping_length = i;\n  *mapping_end_position = band_start_position;\n  // printf(\"-1: %d\\n\", num_errors_at_band_start_position);\n  for (i = 0; i < 2 * error_threshold; i++) {\n    num_errors_at_band_start_position =\n        num_errors_at_band_start_position + ((VP >> i) & (uint32_t)1);\n    num_errors_at_band_start_position =\n        num_errors_at_band_start_position - ((VN >> i) & (uint32_t)1);\n    // printf(\"%d: %d\\n\", i, num_errors_at_band_start_position);\n    if (num_errors_at_band_start_position < min_num_errors ||\n        (num_errors_at_band_start_position == min_num_errors &&\n         i + 1 == error_threshold)) {\n      min_num_errors = num_errors_at_band_start_position;\n      *mapping_end_position = band_start_position + 
(1 + i);\n    }\n  }\n  if (fail_beginning ||\n      (read_length > 60 &&\n       *mapping_end_position + 1 - error_threshold - min_num_errors < 30)) {\n    *mapping_end_position = -*mapping_end_position;\n  }\n  return min_num_errors;\n}\n\nvoid BandedAlign4PatternsToText(int error_threshold, const char **patterns,\n                                const char *text, int read_length,\n                                int32_t *mapping_edit_distances,\n                                int32_t *mapping_end_positions) {\n  int ALPHABET_SIZE = 5;\n  const char *reference_sequence0 = patterns[0];\n  const char *reference_sequence1 = patterns[1];\n  const char *reference_sequence2 = patterns[2];\n  const char *reference_sequence3 = patterns[3];\n  uint32_t highest_bit_in_band_mask = 1 << (2 * error_threshold);\n  __m128i highest_bit_in_band_mask_vpu0 =\n      _mm_set_epi32(0, 0, 0, highest_bit_in_band_mask);\n  __m128i highest_bit_in_band_mask_vpu1 =\n      _mm_set_epi32(0, 0, highest_bit_in_band_mask, 0);\n  __m128i highest_bit_in_band_mask_vpu2 =\n      _mm_set_epi32(0, highest_bit_in_band_mask, 0, 0);\n  __m128i highest_bit_in_band_mask_vpu3 =\n      _mm_set_epi32(highest_bit_in_band_mask, 0, 0, 0);\n  // Init Peq\n  __m128i Peq[ALPHABET_SIZE];\n  for (int ai = 0; ai < ALPHABET_SIZE; ai++) {\n    Peq[ai] = _mm_setzero_si128();\n  }\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    uint8_t base0 = CharToUint8(reference_sequence0[i]);\n    uint8_t base1 = CharToUint8(reference_sequence1[i]);\n    uint8_t base2 = CharToUint8(reference_sequence2[i]);\n    uint8_t base3 = CharToUint8(reference_sequence3[i]);\n    Peq[base0] = _mm_or_si128(highest_bit_in_band_mask_vpu0, Peq[base0]);\n    Peq[base1] = _mm_or_si128(highest_bit_in_band_mask_vpu1, Peq[base1]);\n    Peq[base2] = _mm_or_si128(highest_bit_in_band_mask_vpu2, Peq[base2]);\n    Peq[base3] = _mm_or_si128(highest_bit_in_band_mask_vpu3, Peq[base3]);\n    for (int ai = 0; ai < ALPHABET_SIZE; ai++) {\n      Peq[ai] = 
_mm_srli_epi32(Peq[ai], 1);\n    }\n  }\n\n  uint32_t lowest_bit_in_band_mask = 1;\n  __m128i lowest_bit_in_band_mask_vpu = _mm_set1_epi32(lowest_bit_in_band_mask);\n  __m128i VP = _mm_setzero_si128();\n  __m128i VN = _mm_setzero_si128();\n  __m128i X = _mm_setzero_si128();\n  __m128i D0 = _mm_setzero_si128();\n  __m128i HN = _mm_setzero_si128();\n  __m128i HP = _mm_setzero_si128();\n  __m128i max_mask_vpu = _mm_set1_epi32(0xffffffff);\n  __m128i num_errors_at_band_start_position_vpu = _mm_setzero_si128();\n  __m128i early_stop_threshold_vpu = _mm_set1_epi32(error_threshold * 3);\n  for (int i = 0; i < read_length; i++) {\n    uint8_t base0 = CharToUint8(reference_sequence0[i + 2 * error_threshold]);\n    uint8_t base1 = CharToUint8(reference_sequence1[i + 2 * error_threshold]);\n    uint8_t base2 = CharToUint8(reference_sequence2[i + 2 * error_threshold]);\n    uint8_t base3 = CharToUint8(reference_sequence3[i + 2 * error_threshold]);\n    Peq[base0] = _mm_or_si128(highest_bit_in_band_mask_vpu0, Peq[base0]);\n    Peq[base1] = _mm_or_si128(highest_bit_in_band_mask_vpu1, Peq[base1]);\n    Peq[base2] = _mm_or_si128(highest_bit_in_band_mask_vpu2, Peq[base2]);\n    Peq[base3] = _mm_or_si128(highest_bit_in_band_mask_vpu3, Peq[base3]);\n    X = _mm_or_si128(Peq[CharToUint8(text[i])], VN);\n    D0 = _mm_and_si128(X, VP);\n    D0 = _mm_add_epi32(D0, VP);\n    D0 = _mm_xor_si128(D0, VP);\n    D0 = _mm_or_si128(D0, X);\n    HN = _mm_and_si128(VP, D0);\n    HP = _mm_or_si128(VP, D0);\n    HP = _mm_xor_si128(HP, max_mask_vpu);\n    HP = _mm_or_si128(HP, VN);\n    X = _mm_srli_epi32(D0, 1);\n    VN = _mm_and_si128(X, HP);\n    VP = _mm_or_si128(X, HP);\n    VP = _mm_xor_si128(VP, max_mask_vpu);\n    VP = _mm_or_si128(VP, HN);\n    __m128i E = _mm_and_si128(D0, lowest_bit_in_band_mask_vpu);\n    E = _mm_xor_si128(E, lowest_bit_in_band_mask_vpu);\n    num_errors_at_band_start_position_vpu =\n        _mm_add_epi32(num_errors_at_band_start_position_vpu, E);\n    __m128i early_stop 
= _mm_cmpgt_epi32(num_errors_at_band_start_position_vpu,\n                                         early_stop_threshold_vpu);\n    int tmp = _mm_movemask_epi8(early_stop);\n    if (tmp == 0xffff) {\n      _mm_store_si128((__m128i *)mapping_edit_distances,\n                      num_errors_at_band_start_position_vpu);\n      return;\n    }\n    for (int ai = 0; ai < ALPHABET_SIZE; ai++) {\n      Peq[ai] = _mm_srli_epi32(Peq[ai], 1);\n    }\n  }\n  int band_start_position = read_length - 1;\n  __m128i min_num_errors_vpu = num_errors_at_band_start_position_vpu;\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    __m128i lowest_bit_in_VP_vpu =\n        _mm_and_si128(VP, lowest_bit_in_band_mask_vpu);\n    __m128i lowest_bit_in_VN_vpu =\n        _mm_and_si128(VN, lowest_bit_in_band_mask_vpu);\n    num_errors_at_band_start_position_vpu = _mm_add_epi32(\n        num_errors_at_band_start_position_vpu, lowest_bit_in_VP_vpu);\n    num_errors_at_band_start_position_vpu = _mm_sub_epi32(\n        num_errors_at_band_start_position_vpu, lowest_bit_in_VN_vpu);\n    __m128i mapping_end_positions_update_mask_vpu = _mm_cmplt_epi32(\n        num_errors_at_band_start_position_vpu, min_num_errors_vpu);\n    __m128i mapping_end_positions_update_mask_vpu1 = _mm_cmpeq_epi32(\n        num_errors_at_band_start_position_vpu, min_num_errors_vpu);\n    int mapping_end_positions_update_mask =\n        _mm_movemask_epi8(mapping_end_positions_update_mask_vpu);\n    int mapping_end_positions_update_mask1 =\n        _mm_movemask_epi8(mapping_end_positions_update_mask_vpu1);\n    for (int li = 0; li < 4; ++li) {\n      if ((mapping_end_positions_update_mask & 1) == 1 ||\n          ((mapping_end_positions_update_mask1 & 1) == 1 &&\n           i + 1 == error_threshold)) {\n        mapping_end_positions[li] = band_start_position + 1 + i;\n      }\n      mapping_end_positions_update_mask =\n          mapping_end_positions_update_mask >> 4;\n      mapping_end_positions_update_mask1 =\n          
mapping_end_positions_update_mask1 >> 4;\n    }\n    min_num_errors_vpu = _mm_min_epi32(min_num_errors_vpu,\n                                       num_errors_at_band_start_position_vpu);\n    VP = _mm_srli_epi32(VP, 1);\n    VN = _mm_srli_epi32(VN, 1);\n  }\n  _mm_store_si128((__m128i *)mapping_edit_distances, min_num_errors_vpu);\n}\n\nvoid BandedAlign8PatternsToText(int error_threshold, const char **patterns,\n                                const char *text, int read_length,\n                                int16_t *mapping_edit_distances,\n                                int16_t *mapping_end_positions) {\n  int ALPHABET_SIZE = 5;\n  const char *reference_sequence0 = patterns[0];\n  const char *reference_sequence1 = patterns[1];\n  const char *reference_sequence2 = patterns[2];\n  const char *reference_sequence3 = patterns[3];\n  const char *reference_sequence4 = patterns[4];\n  const char *reference_sequence5 = patterns[5];\n  const char *reference_sequence6 = patterns[6];\n  const char *reference_sequence7 = patterns[7];\n  uint16_t highest_bit_in_band_mask = 1 << (2 * error_threshold);\n  __m128i highest_bit_in_band_mask_vpu0 =\n      _mm_set_epi16(0, 0, 0, 0, 0, 0, 0, highest_bit_in_band_mask);\n  __m128i highest_bit_in_band_mask_vpu1 =\n      _mm_set_epi16(0, 0, 0, 0, 0, 0, highest_bit_in_band_mask, 0);\n  __m128i highest_bit_in_band_mask_vpu2 =\n      _mm_set_epi16(0, 0, 0, 0, 0, highest_bit_in_band_mask, 0, 0);\n  __m128i highest_bit_in_band_mask_vpu3 =\n      _mm_set_epi16(0, 0, 0, 0, highest_bit_in_band_mask, 0, 0, 0);\n  __m128i highest_bit_in_band_mask_vpu4 =\n      _mm_set_epi16(0, 0, 0, highest_bit_in_band_mask, 0, 0, 0, 0);\n  __m128i highest_bit_in_band_mask_vpu5 =\n      _mm_set_epi16(0, 0, highest_bit_in_band_mask, 0, 0, 0, 0, 0);\n  __m128i highest_bit_in_band_mask_vpu6 =\n      _mm_set_epi16(0, highest_bit_in_band_mask, 0, 0, 0, 0, 0, 0);\n  __m128i highest_bit_in_band_mask_vpu7 =\n      _mm_set_epi16(highest_bit_in_band_mask, 0, 0, 0, 0, 0, 
0, 0);\n  // Init Peq\n  __m128i Peq[ALPHABET_SIZE];\n  for (int ai = 0; ai < ALPHABET_SIZE; ai++) {\n    Peq[ai] = _mm_setzero_si128();\n  }\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    uint8_t base0 = CharToUint8(reference_sequence0[i]);\n    uint8_t base1 = CharToUint8(reference_sequence1[i]);\n    uint8_t base2 = CharToUint8(reference_sequence2[i]);\n    uint8_t base3 = CharToUint8(reference_sequence3[i]);\n    uint8_t base4 = CharToUint8(reference_sequence4[i]);\n    uint8_t base5 = CharToUint8(reference_sequence5[i]);\n    uint8_t base6 = CharToUint8(reference_sequence6[i]);\n    uint8_t base7 = CharToUint8(reference_sequence7[i]);\n    Peq[base0] = _mm_or_si128(highest_bit_in_band_mask_vpu0, Peq[base0]);\n    Peq[base1] = _mm_or_si128(highest_bit_in_band_mask_vpu1, Peq[base1]);\n    Peq[base2] = _mm_or_si128(highest_bit_in_band_mask_vpu2, Peq[base2]);\n    Peq[base3] = _mm_or_si128(highest_bit_in_band_mask_vpu3, Peq[base3]);\n    Peq[base4] = _mm_or_si128(highest_bit_in_band_mask_vpu4, Peq[base4]);\n    Peq[base5] = _mm_or_si128(highest_bit_in_band_mask_vpu5, Peq[base5]);\n    Peq[base6] = _mm_or_si128(highest_bit_in_band_mask_vpu6, Peq[base6]);\n    Peq[base7] = _mm_or_si128(highest_bit_in_band_mask_vpu7, Peq[base7]);\n    for (int ai = 0; ai < ALPHABET_SIZE; ai++) {\n      Peq[ai] = _mm_srli_epi16(Peq[ai], 1);\n    }\n  }\n\n  uint16_t lowest_bit_in_band_mask = 1;\n  __m128i lowest_bit_in_band_mask_vpu = _mm_set1_epi16(lowest_bit_in_band_mask);\n  __m128i VP = _mm_setzero_si128();\n  __m128i VN = _mm_setzero_si128();\n  __m128i X = _mm_setzero_si128();\n  __m128i D0 = _mm_setzero_si128();\n  __m128i HN = _mm_setzero_si128();\n  __m128i HP = _mm_setzero_si128();\n  __m128i max_mask_vpu = _mm_set1_epi16(0xffff);\n  __m128i num_errors_at_band_start_position_vpu = _mm_setzero_si128();\n  __m128i early_stop_threshold_vpu = _mm_set1_epi16(error_threshold * 3);\n  for (int i = 0; i < read_length; i++) {\n    uint8_t base0 = 
CharToUint8(reference_sequence0[i + 2 * error_threshold]);\n    uint8_t base1 = CharToUint8(reference_sequence1[i + 2 * error_threshold]);\n    uint8_t base2 = CharToUint8(reference_sequence2[i + 2 * error_threshold]);\n    uint8_t base3 = CharToUint8(reference_sequence3[i + 2 * error_threshold]);\n    uint8_t base4 = CharToUint8(reference_sequence4[i + 2 * error_threshold]);\n    uint8_t base5 = CharToUint8(reference_sequence5[i + 2 * error_threshold]);\n    uint8_t base6 = CharToUint8(reference_sequence6[i + 2 * error_threshold]);\n    uint8_t base7 = CharToUint8(reference_sequence7[i + 2 * error_threshold]);\n    Peq[base0] = _mm_or_si128(highest_bit_in_band_mask_vpu0, Peq[base0]);\n    Peq[base1] = _mm_or_si128(highest_bit_in_band_mask_vpu1, Peq[base1]);\n    Peq[base2] = _mm_or_si128(highest_bit_in_band_mask_vpu2, Peq[base2]);\n    Peq[base3] = _mm_or_si128(highest_bit_in_band_mask_vpu3, Peq[base3]);\n    Peq[base4] = _mm_or_si128(highest_bit_in_band_mask_vpu4, Peq[base4]);\n    Peq[base5] = _mm_or_si128(highest_bit_in_band_mask_vpu5, Peq[base5]);\n    Peq[base6] = _mm_or_si128(highest_bit_in_band_mask_vpu6, Peq[base6]);\n    Peq[base7] = _mm_or_si128(highest_bit_in_band_mask_vpu7, Peq[base7]);\n    X = _mm_or_si128(Peq[CharToUint8(text[i])], VN);\n    D0 = _mm_and_si128(X, VP);\n    D0 = _mm_add_epi16(D0, VP);\n    D0 = _mm_xor_si128(D0, VP);\n    D0 = _mm_or_si128(D0, X);\n    HN = _mm_and_si128(VP, D0);\n    HP = _mm_or_si128(VP, D0);\n    HP = _mm_xor_si128(HP, max_mask_vpu);\n    HP = _mm_or_si128(HP, VN);\n    X = _mm_srli_epi16(D0, 1);\n    VN = _mm_and_si128(X, HP);\n    VP = _mm_or_si128(X, HP);\n    VP = _mm_xor_si128(VP, max_mask_vpu);\n    VP = _mm_or_si128(VP, HN);\n    __m128i E = _mm_and_si128(D0, lowest_bit_in_band_mask_vpu);\n    E = _mm_xor_si128(E, lowest_bit_in_band_mask_vpu);\n    num_errors_at_band_start_position_vpu =\n        _mm_add_epi16(num_errors_at_band_start_position_vpu, E);\n    __m128i early_stop = 
_mm_cmpgt_epi16(num_errors_at_band_start_position_vpu,\n                                         early_stop_threshold_vpu);\n    int tmp = _mm_movemask_epi8(early_stop);\n    if (tmp == 0xffff) {\n      _mm_store_si128((__m128i *)mapping_edit_distances,\n                      num_errors_at_band_start_position_vpu);\n      return;\n    }\n    for (int ai = 0; ai < ALPHABET_SIZE; ai++) {\n      Peq[ai] = _mm_srli_epi16(Peq[ai], 1);\n    }\n  }\n  int band_start_position = read_length - 1;\n  __m128i min_num_errors_vpu = num_errors_at_band_start_position_vpu;\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    __m128i lowest_bit_in_VP_vpu =\n        _mm_and_si128(VP, lowest_bit_in_band_mask_vpu);\n    __m128i lowest_bit_in_VN_vpu =\n        _mm_and_si128(VN, lowest_bit_in_band_mask_vpu);\n    num_errors_at_band_start_position_vpu = _mm_add_epi16(\n        num_errors_at_band_start_position_vpu, lowest_bit_in_VP_vpu);\n    num_errors_at_band_start_position_vpu = _mm_sub_epi16(\n        num_errors_at_band_start_position_vpu, lowest_bit_in_VN_vpu);\n    __m128i mapping_end_positions_update_mask_vpu = _mm_cmplt_epi16(\n        num_errors_at_band_start_position_vpu, min_num_errors_vpu);\n    __m128i mapping_end_positions_update_mask_vpu1 = _mm_cmpeq_epi16(\n        num_errors_at_band_start_position_vpu, min_num_errors_vpu);\n    int mapping_end_positions_update_mask =\n        _mm_movemask_epi8(mapping_end_positions_update_mask_vpu);\n    int mapping_end_positions_update_mask1 =\n        _mm_movemask_epi8(mapping_end_positions_update_mask_vpu1);\n    for (int li = 0; li < 8; ++li) {\n      if ((mapping_end_positions_update_mask & 1) == 1 ||\n          ((mapping_end_positions_update_mask1 & 1) == 1 &&\n           i + 1 == error_threshold)) {\n        mapping_end_positions[li] = band_start_position + 1 + i;\n      }\n      mapping_end_positions_update_mask =\n          mapping_end_positions_update_mask >> 2;\n      mapping_end_positions_update_mask1 =\n          
mapping_end_positions_update_mask1 >> 2;\n    }\n    min_num_errors_vpu = _mm_min_epi16(min_num_errors_vpu,\n                                       num_errors_at_band_start_position_vpu);\n    VP = _mm_srli_epi16(VP, 1);\n    VN = _mm_srli_epi16(VN, 1);\n  }\n  _mm_store_si128((__m128i *)mapping_edit_distances, min_num_errors_vpu);\n}\n\nvoid BandedTraceback(int error_threshold, int min_num_errors,\n                     const char *pattern, const char *text,\n                     const int read_length, int *mapping_start_position) {\n  // First compute the Hamming distance and check whether it equals the number\n  // of errors.\n  if (min_num_errors == 0) {\n    *mapping_start_position = error_threshold;\n    return;\n  }\n  int error_count = 0;\n  for (int i = 0; i < read_length; ++i) {\n    if (pattern[i + error_threshold] != text[i]) {\n      ++error_count;\n    }\n  }\n  if (error_count == min_num_errors) {\n    *mapping_start_position = error_threshold;\n    return;\n  }\n  // Otherwise there are gaps, so we have to trace back using edit distance.\n  uint32_t Peq[5] = {0, 0, 0, 0, 0};\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    uint8_t base =\n        CharToUint8(pattern[read_length - 1 + 2 * error_threshold - i]);\n    Peq[base] = Peq[base] | (1 << i);\n  }\n  uint32_t highest_bit_in_band_mask = 1 << (2 * error_threshold);\n  uint32_t lowest_bit_in_band_mask = 1;\n  uint32_t VP = 0;\n  uint32_t VN = 0;\n  uint32_t X = 0;\n  uint32_t D0 = 0;\n  uint32_t HN = 0;\n  uint32_t HP = 0;\n  int num_errors_at_band_start_position = 0;\n  for (int i = 0; i < read_length; i++) {\n    uint8_t pattern_base = CharToUint8(pattern[read_length - 1 - i]);\n    Peq[pattern_base] = Peq[pattern_base] | highest_bit_in_band_mask;\n    X = Peq[CharToUint8(text[read_length - 1 - i])] | VN;\n    D0 = ((VP + (X & VP)) ^ VP) | X;\n    HN = VP & D0;\n    HP = VN | ~(VP | D0);\n    X = D0 >> 1;\n    VN = X & HP;\n    VP = HN | ~(X | HP);\n    num_errors_at_band_start_position += 1 - (D0 &
lowest_bit_in_band_mask);\n    for (int ai = 0; ai < 5; ai++) {\n      Peq[ai] >>= 1;\n    }\n  }\n  *mapping_start_position = 2 * error_threshold;\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    num_errors_at_band_start_position =\n        num_errors_at_band_start_position + ((VP >> i) & (uint32_t)1);\n    num_errors_at_band_start_position =\n        num_errors_at_band_start_position - ((VN >> i) & (uint32_t)1);\n    if (num_errors_at_band_start_position == min_num_errors) {\n      *mapping_start_position = 2 * error_threshold - (1 + i);\n      if (i + 1 == error_threshold) {\n        return;\n      }\n    }\n  }\n}\n\nvoid BandedTracebackToEnd(int error_threshold, int min_num_errors,\n                          const char *pattern, const char *text,\n                          const int read_length, int *mapping_end_position) {\n  // First compute the Hamming distance and check whether it equals the number\n  // of errors.\n  if (min_num_errors == 0) {\n    *mapping_end_position = read_length + error_threshold;\n    return;\n  }\n  int error_count = 0;\n  for (int i = 0; i < read_length; ++i) {\n    if (pattern[i + error_threshold] != text[i]) {\n      ++error_count;\n    }\n  }\n  if (error_count == min_num_errors) {\n    *mapping_end_position = read_length + error_threshold;\n    return;\n  }\n  // Otherwise there are gaps, so we have to trace back using edit distance.\n  uint32_t Peq[5] = {0, 0, 0, 0, 0};\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    uint8_t base = CharToUint8(pattern[i]);\n    Peq[base] = Peq[base] | (1 << i);\n  }\n  uint32_t highest_bit_in_band_mask = 1 << (2 * error_threshold);\n  uint32_t lowest_bit_in_band_mask = 1;\n  uint32_t VP = 0;\n  uint32_t VN = 0;\n  uint32_t X = 0;\n  uint32_t D0 = 0;\n  uint32_t HN = 0;\n  uint32_t HP = 0;\n  int num_errors_at_band_start_position = 0;\n  for (int i = 0; i < read_length; i++) {\n    // printf(\"=>%d %d %c %c\\n\", i, num_errors_at_band_start_position, pattern[i\n    // + 2 *
error_threshold], text[i]) ;\n    uint8_t pattern_base = CharToUint8(pattern[i + 2 * error_threshold]);\n    Peq[pattern_base] = Peq[pattern_base] | highest_bit_in_band_mask;\n    X = Peq[CharToUint8(text[i])] | VN;\n    D0 = ((VP + (X & VP)) ^ VP) | X;\n    HN = VP & D0;\n    HP = VN | ~(VP | D0);\n    X = D0 >> 1;\n    VN = X & HP;\n    VP = HN | ~(X | HP);\n    num_errors_at_band_start_position += 1 - (D0 & lowest_bit_in_band_mask);\n    for (int ai = 0; ai < 5; ai++) {\n      Peq[ai] >>= 1;\n    }\n  }\n  int band_start_position = read_length;\n  *mapping_end_position = band_start_position + 1;\n  for (int i = 0; i < 2 * error_threshold; i++) {\n    num_errors_at_band_start_position =\n        num_errors_at_band_start_position + ((VP >> i) & (uint32_t)1);\n    num_errors_at_band_start_position =\n        num_errors_at_band_start_position - ((VN >> i) & (uint32_t)1);\n    if (num_errors_at_band_start_position == min_num_errors) {\n      *mapping_end_position = band_start_position + (i + 1);\n      if (i + 1 == error_threshold) {\n        return;\n      }\n    }\n  }\n}\n\n}  // namespace chromap\n"
  },
  {
    "path": "src/alignment.h",
    "content": "#ifndef ALIGNMENT_H_\n#define ALIGNMENT_H_\n\n#include \"mapping_in_memory.h\"\n#include \"sam_mapping.h\"\n#include \"sequence_batch.h\"\n#include \"utils.h\"\n\nnamespace chromap {\n\nint GetLongestMatchLength(const char *pattern, const char *text,\n                          const int read_length);\n\n// Return newly adjusted reference start/end position for kPositive/kNegative\n// mappings.\nint AdjustGapBeginning(const Strand mapping_strand, const char *ref,\n                       const char *read, int *gap_beginning, int read_end,\n                       int ref_start_position, int ref_end_position,\n                       int *n_cigar, uint32_t **cigar);\n\n// The reference (pattern) mapping start position and CIGAR must be computed\n// before calling this function. The read (text) must already be at the start\n// position.\nvoid GenerateNMAndMDTag(const char *pattern, const char *text,\n                        int mapping_start_position,\n                        MappingInMemory &mapping_in_memory);\n\nint BandedAlignPatternToText(int error_threshold, const char *pattern,\n                             const char *text, const int read_length,\n                             int *mapping_end_position);\n\n// Returns a negative number if the termination is deemed to be at the\n// beginning of the read.\n// mapping_end_position is relative to the pattern (reference).\n// read_mapping_length is for the text (read).\nint BandedAlignPatternToTextWithDropOff(int error_threshold,\n                                        const char *pattern, const char *text,\n                                        const int read_length,\n                                        int *mapping_end_position,\n                                        int *read_mapping_length);\n\nint BandedAlignPatternToTextWithDropOffFrom3End(\n    int error_threshold, const char *pattern, const char *text,\n    const int read_length, int *mapping_end_position, int *read_mapping_length);\n\nvoid BandedAlign4PatternsToText(int
error_threshold, const char **patterns,\n                                const char *text, int read_length,\n                                int32_t *mapping_edit_distances,\n                                int32_t *mapping_end_positions);\n\nvoid BandedAlign8PatternsToText(int error_threshold, const char **patterns,\n                                const char *text, int read_length,\n                                int16_t *mapping_edit_distances,\n                                int16_t *mapping_end_positions);\n\nvoid BandedTraceback(int error_threshold, int min_num_errors,\n                     const char *pattern, const char *text,\n                     const int read_length, int *mapping_start_position);\n\nvoid BandedTracebackToEnd(int error_threshold, int min_num_errors,\n                          const char *pattern, const char *text,\n                          const int read_length, int *mapping_end_position);\n\n}  // namespace chromap\n\n#endif  // ALIGNMENT_H_\n"
  },
  {
    "path": "src/barcode_translator.h",
    "content": "#ifndef BARCODETRANSLATOR_H_\n#define BARCODETRANSLATOR_H_\n\n#include <cinttypes>\n#include <cstring>\n#include <fstream>\n#include <functional>\n#include <iostream>\n#include <string>\n#include <vector>\n\n#include <zlib.h>\n\n#include \"khash.h\"\n#include \"utils.h\"\n\nnamespace chromap {\n\nKHASH_INIT(k64_str, uint64_t, char *, 1, kh_int64_hash_func,\n           kh_int64_hash_equal);\n\n// The class for handling barcode conversion.\nclass BarcodeTranslator {\n public:\n  BarcodeTranslator() {\n    barcode_translate_table_ = NULL;\n    from_bc_length_ = -1;\n  }\n\n  ~BarcodeTranslator() {\n    if (barcode_translate_table_ != NULL) {\n      khiter_t k;\n      for (k = kh_begin(barcode_translate_table_);\n           k != kh_end(barcode_translate_table_); ++k) {\n        if (kh_exist(barcode_translate_table_, k))\n          free(kh_value(barcode_translate_table_, k));\n      }\n      kh_destroy(k64_str, barcode_translate_table_);\n    }\n  }\n\n  void SetTranslateTable(const std::string &file) {\n    barcode_translate_table_ = kh_init(k64_str);\n    \n    if (1) {\n      gzFile barcode_translate_file = gzopen(file.c_str(), \"r\");\n      const uint32_t line_buffer_size = 512;\n      char file_line[line_buffer_size];\n      while (gzgets(barcode_translate_file, file_line, line_buffer_size) != NULL) {\n        int line_len = strlen(file_line);\n        if (file_line[line_len - 1] == '\\n') {\n          file_line[line_len - 1] = '\\0';\n        }\n        std::string tmp_string(file_line);\n        ProcessTranslateFileLine(tmp_string);\n      }\n      // Release the file handle once every line has been read.\n      gzclose(barcode_translate_file);\n    } else {\n      // Old implementation, which does not support gzipped input.\n      std::ifstream file_stream(file);\n      std::string file_line;\n      while (getline(file_stream, file_line)) {\n        ProcessTranslateFileLine(file_line);\n      }\n    }\n\n    mask_ = (1ull << (2 * from_bc_length_)) - 1;\n    /*for (int i = 0; i < from_bc_length_; ++i)\n    {\n      mask_ |= (3ull << (2*i));\n
}*/\n  }\n\n  std::string Translate(uint64_t bc, uint32_t bc_length) {\n    if (barcode_translate_table_ == NULL) {\n      return Seed2Sequence(bc, bc_length);\n    }\n\n    std::string ret;\n    uint64_t i;\n    for (i = 0; i < bc_length / from_bc_length_; ++i) {\n      uint64_t seed = (bc << (2 * i * from_bc_length_)) >>\n                      (2 * (bc_length / from_bc_length_ - 1) * from_bc_length_);\n      seed &= mask_;\n      khiter_t barcode_translate_table_iter =\n          kh_get(k64_str, barcode_translate_table_, seed);\n      if (barcode_translate_table_iter == kh_end(barcode_translate_table_)) {\n        std::cerr << \"Barcode does not exist in the translation table.\"\n                  << std::endl;\n        exit(-1);\n      }\n      std::string bc_to(\n          kh_value(barcode_translate_table_, barcode_translate_table_iter));\n      if (i == 0) {\n        ret = bc_to;\n      } else {\n        ret += \"-\" + bc_to;\n      }\n    }\n    return ret;\n  }\n\n private:\n  khash_t(k64_str) * barcode_translate_table_;\n  int from_bc_length_;\n  uint64_t mask_;\n\n  std::string Seed2Sequence(uint64_t seed, uint32_t seed_length) const {\n    std::string sequence;\n    sequence.reserve(seed_length);\n    uint64_t mask_ = 3;\n    for (uint32_t i = 0; i < seed_length; ++i) {\n      sequence.push_back(\n          Uint8ToChar((seed >> ((seed_length - 1 - i) * 2)) & mask_));\n    }\n    return sequence;\n  }\n\n  void ProcessTranslateFileLine(std::string &line) {\n    int i;\n    int len = line.length();\n    std::string to;\n    for (i = 0; i < len; ++i) {\n      if (line[i] == ',' || line[i] == '\\t') break;\n    }\n\n    to = line.substr(0, i);\n    // from = line.substr(i + 1, len - i - 1);\n    from_bc_length_ = len - i - 1;\n    uint64_t from_seed =\n        GenerateSeedFromSequence(line.c_str(), len, i + 1, from_bc_length_);\n\n    int khash_return_code;\n    khiter_t barcode_translate_table_iter = kh_put(\n        k64_str, barcode_translate_table_, 
from_seed, &khash_return_code);\n    kh_value(barcode_translate_table_, barcode_translate_table_iter) =\n        strdup(to.c_str());\n  }\n};\n\n}  // namespace chromap\n#endif\n"
  },
  {
    "path": "src/bed_mapping.h",
    "content": "#ifndef BEDMAPPING_H_\n#define BEDMAPPING_H_\n\n#include <string>\n\n#include \"mapping.h\"\n\nnamespace chromap {\n\nclass MappingWithBarcode : public Mapping {\n public:\n  uint32_t read_id_;\n  uint64_t cell_barcode_;\n  uint32_t fragment_start_position_;\n  uint16_t fragment_length_;\n  uint8_t mapq_ : 6, direction_ : 1, is_unique_ : 1;\n  uint8_t num_dups_;\n  // uint8_t mapq;\n  MappingWithBarcode() : num_dups_(0) {}\n  MappingWithBarcode(uint32_t read_id, uint64_t cell_barcode,\n                     uint32_t fragment_start_position, uint16_t fragment_length,\n                     uint8_t mapq, uint8_t direction, uint8_t is_unique,\n                     uint8_t num_dups)\n      : read_id_(read_id),\n        cell_barcode_(cell_barcode),\n        fragment_start_position_(fragment_start_position),\n        fragment_length_(fragment_length),\n        mapq_(mapq),\n        direction_(direction),\n        is_unique_(is_unique),\n        num_dups_(num_dups) {}\n  bool operator<(const MappingWithBarcode &m) const {\n    return std::tie(fragment_start_position_, fragment_length_, cell_barcode_,\n                    mapq_, direction_, is_unique_, read_id_) <\n           std::tie(m.fragment_start_position_, m.fragment_length_,\n                    m.cell_barcode_, m.mapq_, m.direction_, m.is_unique_,\n                    m.read_id_);\n  }\n  bool operator==(const MappingWithBarcode &m) const {\n    return std::tie(cell_barcode_, fragment_start_position_) ==\n           std::tie(m.cell_barcode_, m.fragment_start_position_);\n  }\n  bool IsSamePosition(const MappingWithBarcode &m) const {\n    return std::tie(fragment_start_position_) ==\n           std::tie(m.fragment_start_position_);\n  }\n  uint64_t GetBarcode() const { return cell_barcode_; }\n  void Tn5Shift() {\n    if (direction_ == 1) {\n      fragment_start_position_ += 4;\n    } else {\n      fragment_length_ -= 5;\n    }\n  }\n  bool IsPositiveStrand() const { return direction_ > 0 ? 
true : false; }\n  uint32_t GetStartPosition() const {  // inclusive\n    return fragment_start_position_;\n  }\n  uint32_t GetEndPosition() const {  // exclusive\n    return fragment_start_position_ + fragment_length_;\n  }\n};\n\nclass MappingWithoutBarcode : public Mapping {\n public:\n  uint32_t read_id_;\n  uint32_t fragment_start_position_;\n  uint16_t fragment_length_;\n  // uint8_t mapq;\n  uint8_t mapq_ : 6, direction_ : 1, is_unique_ : 1;\n  uint16_t num_dups_; // Need higher limit in bulk setting\n\n  MappingWithoutBarcode() : num_dups_(0) {}\n  MappingWithoutBarcode(uint32_t read_id, uint32_t fragment_start_position,\n                        uint16_t fragment_length, uint16_t mapq,\n                        uint8_t direction, uint8_t is_unique, uint8_t num_dups)\n      : read_id_(read_id),\n        fragment_start_position_(fragment_start_position),\n        fragment_length_(fragment_length),\n        mapq_(mapq),\n        direction_(direction),\n        is_unique_(is_unique),\n        num_dups_(num_dups) {}\n\n  bool operator<(const MappingWithoutBarcode &m) const {\n    return std::tie(fragment_start_position_, fragment_length_, mapq_,\n                    direction_, is_unique_, read_id_) <\n           std::tie(m.fragment_start_position_, m.fragment_length_, m.mapq_,\n                    m.direction_, m.is_unique_, m.read_id_);\n  }\n  bool operator==(const MappingWithoutBarcode &m) const {\n    return std::tie(fragment_start_position_) ==\n           std::tie(m.fragment_start_position_);\n  }\n  bool IsSamePosition(const MappingWithoutBarcode &m) const {\n    return std::tie(fragment_start_position_) ==\n           std::tie(m.fragment_start_position_);\n  }\n  uint64_t GetBarcode() const { return 0; }\n  void Tn5Shift() {\n    if (direction_ == 1) {\n      fragment_start_position_ += 4;\n    } else {\n      fragment_length_ -= 5;\n    }\n  }\n  bool IsPositiveStrand() const { return direction_ > 0 ? 
true : false; }\n  uint32_t GetStartPosition() const {  // inclusive\n    return fragment_start_position_;\n  }\n  uint32_t GetEndPosition() const {  // exclusive\n    return fragment_start_position_ + fragment_length_;\n  }\n};\n\nclass PairedEndMappingWithBarcode : public Mapping {\n public:\n  uint32_t read_id_;\n  uint64_t cell_barcode_;\n  uint32_t fragment_start_position_;\n  uint16_t fragment_length_;\n  uint8_t mapq_ : 6, direction_ : 1, is_unique_ : 1;\n  uint8_t num_dups_;\n  // uint8_t mapq;\n  uint16_t positive_alignment_length_;\n  uint16_t negative_alignment_length_;\n  PairedEndMappingWithBarcode() : num_dups_(0) {}\n  PairedEndMappingWithBarcode(uint32_t read_id, uint64_t cell_barcode,\n                              uint32_t fragment_start_position,\n                              uint16_t fragment_length, uint8_t mapq,\n                              uint8_t direction, uint8_t is_unique,\n                              uint8_t num_dups,\n                              uint16_t positive_alignment_length,\n                              uint16_t negative_alignment_length)\n      : read_id_(read_id),\n        cell_barcode_(cell_barcode),\n        fragment_start_position_(fragment_start_position),\n        fragment_length_(fragment_length),\n        mapq_(mapq),\n        direction_(direction),\n        is_unique_(is_unique),\n        num_dups_(num_dups),\n        positive_alignment_length_(positive_alignment_length),\n        negative_alignment_length_(negative_alignment_length) {}\n  bool operator<(const PairedEndMappingWithBarcode &m) const {\n    return std::tie(fragment_start_position_, fragment_length_, cell_barcode_,\n                    mapq_, direction_, is_unique_, read_id_,\n                    positive_alignment_length_, negative_alignment_length_) <\n           std::tie(m.fragment_start_position_, m.fragment_length_,\n                    m.cell_barcode_, m.mapq_, m.direction_, m.is_unique_,\n                    m.read_id_, 
m.positive_alignment_length_,\n                    m.negative_alignment_length_);\n  }\n  bool operator==(const PairedEndMappingWithBarcode &m) const {\n    return std::tie(cell_barcode_, fragment_start_position_,\n                    fragment_length_) == std::tie(m.cell_barcode_,\n                                                  m.fragment_start_position_,\n                                                  m.fragment_length_);\n  }\n  bool IsSamePosition(const PairedEndMappingWithBarcode &m) const {\n    return std::tie(fragment_start_position_, fragment_length_) ==\n           std::tie(m.fragment_start_position_, m.fragment_length_);\n  }\n  uint64_t GetBarcode() const { return cell_barcode_; }\n  void Tn5Shift() {\n    fragment_start_position_ += 4;\n    positive_alignment_length_ -= 4;\n    fragment_length_ -= 9;\n    negative_alignment_length_ -= 5;\n  }\n  bool IsPositiveStrand() const { return direction_ > 0 ? true : false; }\n  uint32_t GetStartPosition() const {  // inclusive\n    return fragment_start_position_;\n  }\n  uint32_t GetEndPosition() const {  // exclusive\n    return fragment_start_position_ + fragment_length_;\n  }\n};\n\nclass PairedEndMappingWithoutBarcode : public Mapping {\n public:\n  uint32_t read_id_;\n  uint32_t fragment_start_position_;\n  uint16_t fragment_length_;\n  uint8_t mapq_ : 6, direction_ : 1, is_unique_ : 1;\n  uint8_t num_dups_;\n  // uint8_t mapq;\n  uint16_t positive_alignment_length_;\n  uint16_t negative_alignment_length_;\n  PairedEndMappingWithoutBarcode() : num_dups_(0) {}\n  PairedEndMappingWithoutBarcode(uint32_t read_id,\n                                 uint32_t fragment_start_position,\n                                 uint16_t fragment_length, uint8_t mapq,\n                                 uint8_t direction, uint8_t is_unique,\n                                 uint16_t num_dups,\n                                 uint16_t positive_alignment_length,\n                                 uint16_t 
negative_alignment_length)\n      : read_id_(read_id),\n        fragment_start_position_(fragment_start_position),\n        fragment_length_(fragment_length),\n        mapq_(mapq),\n        direction_(direction),\n        is_unique_(is_unique),\n        num_dups_(num_dups),\n        positive_alignment_length_(positive_alignment_length),\n        negative_alignment_length_(negative_alignment_length) {}\n\n  bool operator<(const PairedEndMappingWithoutBarcode &m) const {\n    return std::tie(fragment_start_position_, fragment_length_, mapq_,\n                    direction_, is_unique_, read_id_,\n                    positive_alignment_length_, negative_alignment_length_) <\n           std::tie(m.fragment_start_position_, m.fragment_length_, m.mapq_,\n                    m.direction_, m.is_unique_, m.read_id_,\n                    m.positive_alignment_length_, m.negative_alignment_length_);\n  }\n  bool operator==(const PairedEndMappingWithoutBarcode &m) const {\n    return std::tie(fragment_start_position_, fragment_length_) ==\n           std::tie(m.fragment_start_position_, m.fragment_length_);\n  }\n  bool IsSamePosition(const PairedEndMappingWithoutBarcode &m) const {\n    return std::tie(fragment_start_position_, fragment_length_) ==\n           std::tie(m.fragment_start_position_, m.fragment_length_);\n  }\n  uint64_t GetBarcode() const { return 0; }\n  void Tn5Shift() {\n    fragment_start_position_ += 4;\n    positive_alignment_length_ -= 4;\n    fragment_length_ -= 9;\n    negative_alignment_length_ -= 5;\n  }\n  bool IsPositiveStrand() const { return direction_ > 0 ? true : false; }\n  uint32_t GetStartPosition() const {  // inclusive\n    return fragment_start_position_;\n  }\n  uint32_t GetEndPosition() const {  // exclusive\n    return fragment_start_position_ + fragment_length_;\n  }\n};\n\n}  // namespace chromap\n\n#endif  // BEDMAPPING_H_\n"
  },
  {
    "path": "src/candidate.h",
    "content": "#ifndef CANDIDATE_H_\n#define CANDIDATE_H_\n\n#include <stdint.h>\n\nnamespace chromap {\n\nstruct Candidate {\n  // The high 32 bits store the reference sequence index in the reference\n  // sequence batch. The low 32 bits store the reference position on that\n  // sequence.\n  uint64_t position = 0;\n\n  // The number of minimizers that support this position.\n  uint8_t count = 0;\n\n  inline uint32_t GetReferenceSequenceIndex() const { return (position >> 32); }\n\n  inline uint32_t GetReferenceSequencePosition() const { return position; }\n\n  inline uint8_t GetCount() const { return count; }\n\n  // Orders candidates by descending count, then by ascending position.\n  inline bool operator<(const Candidate &c) const {\n    if (count > c.count) {\n      return true;\n    }\n\n    if (count < c.count) {\n      return false;\n    }\n\n    return position < c.position;\n  }\n};\n\n}  // namespace chromap\n\n#endif  // CANDIDATE_H_\n"
  },
  {
    "path": "src/candidate_position_generating_config.h",
    "content": "#ifndef CANDIDATE_POSITION_GENERATING_CONFIG_H_\n#define CANDIDATE_POSITION_GENERATING_CONFIG_H_\n\n#include <stdint.h>\n\nnamespace chromap {\n\n// This class holds the parameters used to generate candidate positions. Using\n// these parameters, it can check whether a seed is frequent or repetitive.\nclass CandidatePositionGeneratingConfig {\n public:\n  CandidatePositionGeneratingConfig() = delete;\n\n  CandidatePositionGeneratingConfig(uint32_t max_seed_frequency,\n                                    uint32_t repetitive_seed_frequency,\n                                    bool use_heap_merge)\n      : max_seed_frequency_(max_seed_frequency),\n        repetitive_seed_frequency_(repetitive_seed_frequency),\n        use_heap_merge_(use_heap_merge) {}\n\n  ~CandidatePositionGeneratingConfig() = default;\n\n  inline bool IsFrequentSeed(uint32_t seed_frequency) const {\n    return seed_frequency >= max_seed_frequency_;\n  }\n\n  inline bool IsRepetitiveSeed(uint32_t seed_frequency) const {\n    return seed_frequency >= repetitive_seed_frequency_;\n  }\n\n  inline bool UseHeapMerge() const { return use_heap_merge_; }\n\n  inline uint32_t GetMaxSeedFrequency() const { return max_seed_frequency_; }\n\n private:\n  // Only seeds with frequency less than this threshold will be used.\n  const uint32_t max_seed_frequency_;\n\n  // Seeds with frequency greater than or equal to this threshold will be\n  // considered repetitive seeds.\n  const uint32_t repetitive_seed_frequency_;\n\n  // When the number of candidate positions is very large, use heap merge to\n  // merge the sorted candidate lists.\n  const bool use_heap_merge_;\n};\n\n}  // namespace chromap\n\n#endif  // CANDIDATE_POSITION_GENERATING_CONFIG_H_\n"
  },
  {
    "path": "src/candidate_processor.cc",
    "content": "#include \"candidate_processor.h\"\n\n#include <cinttypes>\n#include <cstring>\n#include <functional>\n#include <iostream>\n#include <string>\n#include <vector>\n\nnamespace chromap {\n\nvoid CandidateProcessor::GenerateCandidates(\n    int error_threshold, const Index &index,\n    MappingMetadata &mapping_metadata) const {\n  const std::vector<Minimizer> &minimizers = mapping_metadata.minimizers_;\n  std::vector<uint64_t> &positive_hits = mapping_metadata.positive_hits_;\n  std::vector<uint64_t> &negative_hits = mapping_metadata.negative_hits_;\n  std::vector<Candidate> &positive_candidates =\n      mapping_metadata.positive_candidates_;\n  std::vector<Candidate> &negative_candidates =\n      mapping_metadata.negative_candidates_;\n  uint32_t &repetitive_seed_length = mapping_metadata.repetitive_seed_length_;\n\n  const CandidatePositionGeneratingConfig first_round_generating_config(\n      /*max_seed_frequency=*/max_seed_frequencies_[0],\n      /*repetitive_seed_frequency=*/max_seed_frequencies_[0],\n      /*use_heap_merge=*/false);\n\n  repetitive_seed_length = 0;\n  int repetitive_seed_count = index.GenerateCandidatePositions(\n      first_round_generating_config, mapping_metadata);\n\n  bool use_high_frequency_minimizers = false;\n  if (positive_hits.size() + negative_hits.size() == 0) {\n    positive_hits.clear();\n    negative_hits.clear();\n    repetitive_seed_length = 0;\n\n    const CandidatePositionGeneratingConfig second_round_generating_config(\n        /*max_seed_frequency=*/max_seed_frequencies_[1],\n        /*repetitive_seed_frequency=*/max_seed_frequencies_[0],\n        /*use_heap_merge=*/true);\n\n    repetitive_seed_count = index.GenerateCandidatePositions(\n        second_round_generating_config, mapping_metadata);\n    use_high_frequency_minimizers = true;\n    if (positive_hits.size() == 0 || negative_hits.size() == 0) {\n      use_high_frequency_minimizers = false;\n    }\n  }\n\n  int num_required_seeds = minimizers.size() - 
repetitive_seed_count;\n  num_required_seeds = num_required_seeds > 1 ? num_required_seeds : 1;\n  num_required_seeds = num_required_seeds > min_num_seeds_required_for_mapping_\n                           ? min_num_seeds_required_for_mapping_\n                           : num_required_seeds;\n  if (use_high_frequency_minimizers) {\n    num_required_seeds = min_num_seeds_required_for_mapping_;\n  }\n\n  // std::cerr << \"Normal positive gen on one dir\\n\";\n  GenerateCandidatesOnOneStrand(error_threshold, num_required_seeds,\n                                minimizers.size(), positive_hits,\n                                positive_candidates);\n  // std::cerr << \"Normal negative gen on one dir\\n\";\n  GenerateCandidatesOnOneStrand(error_threshold, num_required_seeds,\n                                minimizers.size(), negative_hits,\n                                negative_candidates);\n  // fprintf(stderr, \"p+n: %d\\n\", positive_candidates->size() +\n  // negative_candidates->size()) ;\n}\n\n// Return 0 if it supplements normally. 
Return 1 if the supplement could be too\n// aggressive, and MAPQ needs setting to 0.\nint CandidateProcessor::SupplementCandidates(\n    int error_threshold, uint32_t search_range, const Index &index,\n    PairedEndMappingMetadata &paired_end_mapping_metadata) const {\n  std::vector<Candidate> augment_positive_candidates1;\n  std::vector<Candidate> augment_positive_candidates2;\n  std::vector<Candidate> augment_negative_candidates1;\n  std::vector<Candidate> augment_negative_candidates2;\n\n  int ret = 0;\n\n  for (int mate = 0; mate <= 1; ++mate) {\n    std::vector<Minimizer> *minimizers;\n    std::vector<uint64_t> *positive_hits;\n    std::vector<uint64_t> *negative_hits;\n    std::vector<Candidate> *positive_candidates;\n    std::vector<Candidate> *negative_candidates;\n    std::vector<Candidate> *mate_positive_candidates;\n    std::vector<Candidate> *mate_negative_candidates;\n    std::vector<Candidate> *augment_positive_candidates;\n    std::vector<Candidate> *augment_negative_candidates;\n    uint32_t *repetitive_seed_length;\n\n    if (mate == 0) {\n      minimizers = &paired_end_mapping_metadata.mapping_metadata1_.minimizers_;\n      positive_hits =\n          &paired_end_mapping_metadata.mapping_metadata1_.positive_hits_;\n      negative_hits =\n          &paired_end_mapping_metadata.mapping_metadata1_.negative_hits_;\n      positive_candidates =\n          &paired_end_mapping_metadata.mapping_metadata1_.positive_candidates_;\n      negative_candidates =\n          &paired_end_mapping_metadata.mapping_metadata1_.negative_candidates_;\n      mate_positive_candidates =\n          &paired_end_mapping_metadata.mapping_metadata2_.positive_candidates_;\n      mate_negative_candidates =\n          &paired_end_mapping_metadata.mapping_metadata2_.negative_candidates_;\n      augment_positive_candidates = &augment_positive_candidates1;\n      augment_negative_candidates = &augment_negative_candidates1;\n      repetitive_seed_length = 
&paired_end_mapping_metadata.mapping_metadata1_\n                                    .repetitive_seed_length_;\n    } else {\n      minimizers = &paired_end_mapping_metadata.mapping_metadata2_.minimizers_;\n      positive_hits =\n          &paired_end_mapping_metadata.mapping_metadata2_.positive_hits_;\n      negative_hits =\n          &paired_end_mapping_metadata.mapping_metadata2_.negative_hits_;\n      positive_candidates =\n          &paired_end_mapping_metadata.mapping_metadata2_.positive_candidates_;\n      negative_candidates =\n          &paired_end_mapping_metadata.mapping_metadata2_.negative_candidates_;\n      mate_positive_candidates =\n          &paired_end_mapping_metadata.mapping_metadata1_.positive_candidates_;\n      mate_negative_candidates =\n          &paired_end_mapping_metadata.mapping_metadata1_.negative_candidates_;\n      augment_positive_candidates = &augment_positive_candidates2;\n      augment_negative_candidates = &augment_negative_candidates2;\n      repetitive_seed_length = &paired_end_mapping_metadata.mapping_metadata2_\n                                    .repetitive_seed_length_;\n    }\n\n    uint32_t mm_count = minimizers->size();\n    bool augment_flag = true;\n    uint32_t candidate_num = positive_candidates->size();\n\n    for (uint32_t i = 0; i < candidate_num; ++i) {\n      if ((*positive_candidates)[i].count >= mm_count / 2) {\n        augment_flag = false;\n        break;\n      }\n    }\n\n    candidate_num = negative_candidates->size();\n    if (augment_flag) {\n      for (uint32_t i = 0; i < candidate_num; ++i) {\n        if ((*negative_candidates)[i].count >= mm_count / 2) {\n          augment_flag = false;\n          break;\n        }\n      }\n    }\n\n    if (augment_flag) {\n      positive_hits->clear();\n      negative_hits->clear();\n      positive_hits->reserve(max_seed_frequencies_[0]);\n      negative_hits->reserve(max_seed_frequencies_[0]);\n      int positive_rescue_result = 0;\n      int 
negative_rescue_result = 0;\n      if (mate_positive_candidates->size() > 0) {\n        positive_rescue_result =\n            GenerateCandidatesFromRepetitiveReadWithMateInfoOnOneStrand(\n                kNegative, search_range, error_threshold, index, *minimizers,\n                *mate_positive_candidates, *repetitive_seed_length,\n                *negative_hits, *augment_negative_candidates);\n      }\n\n      if (mate_negative_candidates->size() > 0) {\n        negative_rescue_result =\n            GenerateCandidatesFromRepetitiveReadWithMateInfoOnOneStrand(\n                kPositive, search_range, error_threshold, index, *minimizers,\n                *mate_negative_candidates, *repetitive_seed_length,\n                *positive_hits, *augment_positive_candidates);\n      }\n\n      // If one of the strand did not supplement due to too many best candidate,\n      // and the filtered strand have better best candidates,\n      // and there is no candidate directly from minimizers,\n      // then we remove the supplement\n      if (((positive_rescue_result < 0 && negative_rescue_result > 0 &&\n            -positive_rescue_result >= negative_rescue_result) ||\n           (positive_rescue_result > 0 && negative_rescue_result < 0 &&\n            positive_rescue_result <= -negative_rescue_result)) &&\n          positive_candidates->size() + negative_candidates->size() == 0) {\n        // augment_positive_candidates->clear();\n        // augment_negative_candidates->clear();\n        ret = 1;\n      }\n    }\n  }\n\n  if (augment_positive_candidates1.size() > 0) {\n    MergeCandidates(\n        error_threshold,\n        paired_end_mapping_metadata.mapping_metadata1_.positive_candidates_,\n        augment_positive_candidates1,\n        paired_end_mapping_metadata.mapping_metadata1_\n            .positive_candidates_buffer_);\n  }\n\n  if (augment_negative_candidates1.size() > 0) {\n    MergeCandidates(\n        error_threshold,\n        
paired_end_mapping_metadata.mapping_metadata1_.negative_candidates_,\n        augment_negative_candidates1,\n        paired_end_mapping_metadata.mapping_metadata1_\n            .negative_candidates_buffer_);\n  }\n\n  if (augment_positive_candidates2.size() > 0) {\n    MergeCandidates(\n        error_threshold,\n        paired_end_mapping_metadata.mapping_metadata2_.positive_candidates_,\n        augment_positive_candidates2,\n        paired_end_mapping_metadata.mapping_metadata2_\n            .positive_candidates_buffer_);\n  }\n\n  if (augment_negative_candidates2.size() > 0) {\n    MergeCandidates(\n        error_threshold,\n        paired_end_mapping_metadata.mapping_metadata2_.negative_candidates_,\n        augment_negative_candidates2,\n        paired_end_mapping_metadata.mapping_metadata2_\n            .negative_candidates_buffer_);\n  }\n  return ret;\n}\n\nvoid CandidateProcessor::ReduceCandidatesForPairedEndRead(\n    uint32_t mapping_positions_distance,\n    PairedEndMappingMetadata &paired_end_mapping_metadata) const {\n  const std::vector<Candidate> &positive_candidates1 =\n      paired_end_mapping_metadata.mapping_metadata1_\n          .positive_candidates_buffer_;\n  const std::vector<Candidate> &negative_candidates1 =\n      paired_end_mapping_metadata.mapping_metadata1_\n          .negative_candidates_buffer_;\n  const std::vector<Candidate> &positive_candidates2 =\n      paired_end_mapping_metadata.mapping_metadata2_\n          .positive_candidates_buffer_;\n  const std::vector<Candidate> &negative_candidates2 =\n      paired_end_mapping_metadata.mapping_metadata2_\n          .negative_candidates_buffer_;\n  std::vector<Candidate> &filtered_positive_candidates1 =\n      paired_end_mapping_metadata.mapping_metadata1_.positive_candidates_;\n  std::vector<Candidate> &filtered_negative_candidates1 =\n      paired_end_mapping_metadata.mapping_metadata1_.negative_candidates_;\n  std::vector<Candidate> &filtered_positive_candidates2 =\n      
paired_end_mapping_metadata.mapping_metadata2_.positive_candidates_;\n  std::vector<Candidate> &filtered_negative_candidates2 =\n      paired_end_mapping_metadata.mapping_metadata2_.negative_candidates_;\n\n  ReduceCandidatesForPairedEndReadOnOneDirection(\n      mapping_positions_distance, positive_candidates1, negative_candidates2,\n      filtered_positive_candidates1, filtered_negative_candidates2);\n  ReduceCandidatesForPairedEndReadOnOneDirection(\n      mapping_positions_distance, negative_candidates1, positive_candidates2,\n      filtered_negative_candidates1, filtered_positive_candidates2);\n}\n\nint CandidateProcessor::\n    GenerateCandidatesFromRepetitiveReadWithMateInfoOnOneStrand(\n        const Strand strand, uint32_t search_range, int error_threshold,\n        const Index &index, const std::vector<Minimizer> &minimizers,\n        const std::vector<Candidate> &mate_candidates,\n        uint32_t &repetitive_seed_length, std::vector<uint64_t> &hits,\n        std::vector<Candidate> &candidates) const {\n  int max_seed_count =\n      index.GenerateCandidatePositionsFromRepetitiveReadWithMateInfoOnOneStrand(\n          strand, search_range, min_num_seeds_required_for_mapping_,\n          max_seed_frequencies_[0], error_threshold, minimizers,\n          mate_candidates, repetitive_seed_length, hits);\n\n  GenerateCandidatesOnOneStrand(error_threshold, /*num_seeds_required=*/1,\n                                minimizers.size(), hits, candidates);\n  return max_seed_count;\n}\n\nvoid CandidateProcessor::GenerateCandidatesOnOneStrand(\n    int error_threshold, int num_seeds_required, uint32_t num_minimizers,\n    std::vector<uint64_t> &hits, std::vector<Candidate> &candidates) const {\n  hits.emplace_back(UINT64_MAX);\n  if (hits.size() > 0) {\n    int minimizer_count = 1;\n    // The number of seeds with the exact same reference position.\n    int equal_count = 1;\n    int best_equal_count = 1;\n    uint64_t previous_hit = hits[0];\n    uint32_t 
previous_reference_id = previous_hit >> 32;\n    uint32_t previous_reference_position = previous_hit;\n    uint64_t best_local_hit = hits[0];\n    for (uint32_t pi = 1; pi < hits.size(); ++pi) {\n      uint32_t current_reference_id = hits[pi] >> 32;\n      uint32_t current_reference_position = hits[pi];\n#ifdef LI_DEBUG\n      printf(\"%s: %d %d\\n\", __func__, current_reference_id,\n             current_reference_position);\n#endif\n      if (current_reference_id != previous_reference_id ||\n          current_reference_position >\n              previous_reference_position + error_threshold ||\n          ((uint32_t)minimizer_count >= num_minimizers &&\n           current_reference_position >\n               (uint32_t)best_local_hit + error_threshold)) {\n        if (minimizer_count >= num_seeds_required) {\n          Candidate candidate;\n          candidate.position = best_local_hit;\n          candidate.count = best_equal_count;\n          candidates.push_back(candidate);\n        }\n\n        minimizer_count = 1;\n        equal_count = 1;\n        best_equal_count = 1;\n        best_local_hit = hits[pi];\n      } else {\n        if (hits[pi] == best_local_hit) {\n          ++equal_count;\n          ++best_equal_count;\n        } else if (hits[pi] == previous_hit) {\n          ++equal_count;\n          if (equal_count > best_equal_count) {\n            best_local_hit = previous_hit;\n            best_equal_count = equal_count;\n          }\n        } else {\n          equal_count = 1;\n        }\n\n        ++minimizer_count;\n      }\n\n      previous_hit = hits[pi];\n      previous_reference_id = current_reference_id;\n      previous_reference_position = current_reference_position;\n    }\n  }\n}\n\n// Merge c1 and c2 into buffer and then swap the results into c1.\nvoid CandidateProcessor::MergeCandidates(int error_threshold,\n                                         std::vector<Candidate> &c1,\n                                         std::vector<Candidate> 
&c2,\n                                         std::vector<Candidate> &buffer) const {\n  if (c1.size() == 0) {\n    c1.swap(c2);\n    return;\n  }\n\n  uint32_t i, j;\n  uint32_t size1, size2;\n  size1 = c1.size();\n  size2 = c2.size();\n  buffer.clear();\n\n#ifdef LI_DEBUG\n  for (i = 0; i < size1; ++i)\n    printf(\"c1: %d %d %d\\n\", (int)(c1[i].position >> 32), (int)c1[i].position,\n           c1[i].count);\n  for (i = 0; i < size2; ++i)\n    printf(\"c2: %d %d %d\\n\", (int)(c2[i].position >> 32), (int)c2[i].position,\n           c2[i].count);\n#endif\n\n  i = 0;\n  j = 0;\n  while (i < size1 && j < size2) {\n    if (c1[i].position == c2[j].position) {\n      if (buffer.empty() ||\n          c1[i].position > buffer.back().position + error_threshold) {\n        if (c1[i].count > c2[j].count) {\n          buffer.push_back(c1[i]);\n        } else {\n          buffer.push_back(c2[j]);\n        }\n      }\n      ++i, ++j;\n    } else if (c1[i].position < c2[j].position) {\n      if (buffer.empty() ||\n          c1[i].position > buffer.back().position + error_threshold) {\n        buffer.push_back(c1[i]);\n      }\n      ++i;\n    } else {\n      if (buffer.empty() ||\n          c2[j].position > buffer.back().position + error_threshold) {\n        buffer.push_back(c2[j]);\n      }\n      ++j;\n    }\n  }\n\n  while (i < size1) {\n    if (buffer.empty() ||\n        c1[i].position > buffer.back().position + error_threshold) {\n      buffer.push_back(c1[i]);\n    }\n    ++i;\n  }\n\n  while (j < size2) {\n    if (buffer.empty() ||\n        c2[j].position > buffer.back().position + error_threshold) {\n      buffer.push_back(c2[j]);\n    }\n    ++j;\n  }\n\n  c1.swap(buffer);\n}\n\nvoid CandidateProcessor::ReduceCandidatesForPairedEndReadOnOneDirection(\n    uint32_t mapping_positions_distance,\n    const std::vector<Candidate> &candidates1,\n    const std::vector<Candidate> &candidates2,\n    std::vector<Candidate> &filtered_candidates1,\n    std::vector<Candidate> 
&filtered_candidates2) const {\n  uint32_t i1 = 0;\n  uint32_t i2 = 0;\n  int num_unpaired_candidate1 = 0;\n  int num_unpaired_candidate2 = 0;\n  int num_unpaired_candidate_threshold = 5;\n  int max_candidate_count1 = 6;\n  int max_candidate_count2 = 6;\n  uint32_t previous_end_i2 = i2;\n#ifdef LI_DEBUG\n  for (uint32_t i = 0; i < candidates1.size(); ++i)\n    printf(\"%s 0: %d %d:%d\\n\", __func__, i,\n           (int)(candidates1[i].position >> 32), (int)candidates1[i].position);\n  for (uint32_t i = 0; i < candidates2.size(); ++i)\n    printf(\"%s 1: %d %d:%d\\n\", __func__, i,\n           (int)(candidates2[i].position >> 32), (int)candidates2[i].position);\n#endif\n  while (i1 < candidates1.size() && i2 < candidates2.size()) {\n    if (candidates1[i1].position >\n        candidates2[i2].position + mapping_positions_distance) {\n      if (i2 >= previous_end_i2 &&\n          num_unpaired_candidate2 < num_unpaired_candidate_threshold &&\n          (candidates1[i1].position >> 32) ==\n              (candidates2[i2].position >> 32) &&\n          candidates2[i2].count >= max_candidate_count2) {\n        filtered_candidates2.emplace_back(candidates2[i2]);\n        ++num_unpaired_candidate2;\n      }\n      ++i2;\n    } else if (candidates2[i2].position >\n               candidates1[i1].position + mapping_positions_distance) {\n      if (num_unpaired_candidate1 < num_unpaired_candidate_threshold &&\n          (candidates1[i1].position >> 32) ==\n              (candidates2[i2].position >> 32) &&\n          candidates1[i1].count >= max_candidate_count1) {\n        filtered_candidates1.emplace_back(candidates1[i1]);\n        ++num_unpaired_candidate1;\n      }\n      ++i1;\n    } else {\n      // ok, find a pair, we store current ni2 somewhere and keep looking until\n      // we go out of the range, then we go back and then move to next pi1 and\n      // keep doing the similar thing.\n      filtered_candidates1.emplace_back(candidates1[i1]);\n      if 
(candidates1[i1].count > max_candidate_count1) {\n        max_candidate_count1 = candidates1[i1].count;\n      }\n      uint32_t current_i2 = i2;\n      while (current_i2 < candidates2.size() &&\n             candidates2[current_i2].position <=\n                 candidates1[i1].position + mapping_positions_distance) {\n        if (current_i2 >= previous_end_i2) {\n          filtered_candidates2.emplace_back(candidates2[current_i2]);\n          if (candidates2[current_i2].count > max_candidate_count2) {\n            max_candidate_count2 = candidates2[current_i2].count;\n          }\n        }\n        ++current_i2;\n      }\n      previous_end_i2 = current_i2;\n      ++i1;\n    }\n  }\n}\n\n}  // namespace chromap\n"
  },
  {
    "path": "src/candidate_processor.h",
    "content": "#ifndef CANDIDATE_PROCESSOR_H_\n#define CANDIDATE_PROCESSOR_H_\n\n#include <cinttypes>\n#include <cstring>\n#include <functional>\n#include <iostream>\n#include <string>\n#include <vector>\n\n#include \"candidate.h\"\n#include \"index.h\"\n#include \"mapping_metadata.h\"\n#include \"paired_end_mapping_metadata.h\"\n#include \"sequence_batch.h\"\n#include \"utils.h\"\n\nnamespace chromap {\n\nclass CandidateProcessor {\n public:\n  CandidateProcessor() = delete;\n\n  CandidateProcessor(int min_num_seeds_required_for_mapping,\n                     const std::vector<int> max_seed_frequencies)\n      : min_num_seeds_required_for_mapping_(min_num_seeds_required_for_mapping),\n        max_seed_frequencies_(max_seed_frequencies) {}\n\n  ~CandidateProcessor() = default;\n\n  void GenerateCandidates(int error_threshold, const Index &index,\n                          MappingMetadata &mapping_metadata) const;\n\n  int SupplementCandidates(\n      int error_threshold, uint32_t search_range, const Index &index,\n      PairedEndMappingMetadata &paired_end_mapping_metadata) const;\n\n  void ReduceCandidatesForPairedEndRead(\n      uint32_t mapping_positions_distance,\n      PairedEndMappingMetadata &paired_end_mapping_metadata) const;\n\n private:\n  void GenerateCandidatesOnOneStrand(int error_threshold,\n                                     int num_seeds_required,\n                                     uint32_t num_minimizers,\n                                     std::vector<uint64_t> &hits,\n                                     std::vector<Candidate> &candidates) const;\n\n  int GenerateCandidatesFromRepetitiveReadWithMateInfoOnOneStrand(\n      const Strand strand, uint32_t search_range, int error_threshold,\n      const Index &index, const std::vector<Minimizer> &minimizers,\n      const std::vector<Candidate> &mate_candidates,\n      uint32_t &repetitive_seed_length, std::vector<uint64_t> &hits,\n      std::vector<Candidate> &candidates) const;\n\n  void 
MergeCandidates(int error_threshold, std::vector<Candidate> &c1,\n                       std::vector<Candidate> &c2,\n                       std::vector<Candidate> &buffer) const;\n\n  void ReduceCandidatesForPairedEndReadOnOneDirection(\n      uint32_t mapping_positions_distance,\n      const std::vector<Candidate> &candidates1,\n      const std::vector<Candidate> &candidates2,\n      std::vector<Candidate> &filtered_candidates1,\n      std::vector<Candidate> &filtered_candidates2) const;\n\n  const int min_num_seeds_required_for_mapping_;\n  // Vector of size 2. The first element is the frequency threshold, and the\n  // second element is the frequency threshold to run rescue. The second element\n  // should always be larger than the first one.\n  // TODO(Haowen): add an error check.\n  const std::vector<int> max_seed_frequencies_;\n};\n\n}  // namespace chromap\n\n#endif  // CANDIDATE_PROCESSOR_H_\n"
  },
  {
    "path": "src/chromap.cc",
    "content": "#include \"chromap.h\"\n\n#include <assert.h>\n#include <math.h>\n\n#include <fstream>\n#include <iomanip>\n#include <iostream>\n#include <limits>\n#include <random>\n#include <sstream>\n#include <type_traits>\n#include <unordered_map>\n\nnamespace chromap {\n\nvoid Chromap::ConstructIndex() {\n  // TODO(Haowen): Need a faster algorithm\n  // Load all sequences in the reference into one batch\n  SequenceBatch reference;\n  reference.InitializeLoading(index_parameters_.reference_file_path);\n  reference.LoadAllSequences();\n  const uint32_t num_sequences = reference.GetNumSequences();\n  Index index(index_parameters_);\n  index.Construct(num_sequences, reference);\n  index.Statistics(num_sequences, reference);\n  index.Save();\n  reference.FinalizeLoading();\n}\n\nuint32_t Chromap::LoadSingleEndReadsWithBarcodes(SequenceBatch &read_batch,\n                                                 SequenceBatch &barcode_batch,\n                                                 bool parallel_parsing) {\n  //double real_start_time = GetRealTime();\n  uint32_t num_loaded_reads = 0;\n\n  if (!parallel_parsing || mapping_parameters_.is_bulk_data) {\n    while (num_loaded_reads < read_batch_size_) {\n      bool no_more_read = read_batch.LoadOneSequenceAndSaveAt(num_loaded_reads);\n      bool no_more_barcode = no_more_read;\n      if (!mapping_parameters_.is_bulk_data) {\n        no_more_barcode =\n          barcode_batch.LoadOneSequenceAndSaveAt(num_loaded_reads);\n      }\n\n      if (no_more_read && no_more_barcode) {\n        break;\n      } else if (no_more_read || no_more_barcode){\n        ExitWithMessage(\"Numbers of reads and barcodes don't match!\");\n      }\n      ++num_loaded_reads;\n    }\n  } else {\n    uint32_t num_loaded_barcode = 0 ;\n    \n#pragma omp task shared(num_loaded_reads, read_batch)\n    {\n      uint32_t i = 0 ;\n      for (i = 0 ; i < read_batch_size_; ++i) {\n        if (read_batch.LoadOneSequenceAndSaveAt(i) == true) { // true: no 
more read\n          break ;\n        }\n      }\n      num_loaded_reads = i ;\n    }\n\n#pragma omp task shared(num_loaded_barcode, barcode_batch)\n    { // bulk data will go to the other big branch\n      uint32_t i = 0 ;\n      for (i = 0 ; i < read_batch_size_; ++i) {\n        if (barcode_batch.LoadOneSequenceAndSaveAt(i) == true) { // true: no more read\n          break ;\n        }\n      }\n      num_loaded_barcode = i ;\n    }\n\n#pragma omp taskwait\n\n    if (num_loaded_reads != num_loaded_barcode) {\n        ExitWithMessage(\"Numbers of reads and barcodes don't match!\");\n    }\n  }\n  /*if (num_loaded_reads > 0) {\n    std::cerr << \"Loaded \" << num_loaded_reads << \" reads in \"\n              << GetRealTime() - real_start_time << \"s.\\n\";\n  } else {\n    std::cerr << \"No more reads.\\n\";\n  }*/\n  return num_loaded_reads;\n}\n\nuint32_t Chromap::LoadPairedEndReadsWithBarcodes(SequenceBatch &read_batch1,\n                                                 SequenceBatch &read_batch2,\n                                                 SequenceBatch &barcode_batch,\n                                                 bool parallel_parsing) {\n  // double real_start_time = Chromap<>::GetRealTime();\n  uint32_t num_loaded_pairs = 0;\n  \n  if (!parallel_parsing) {\n    while (num_loaded_pairs < read_batch_size_) {\n      bool no_more_read1 = read_batch1.LoadOneSequenceAndSaveAt(num_loaded_pairs);\n      bool no_more_read2 = read_batch2.LoadOneSequenceAndSaveAt(num_loaded_pairs);\n      bool no_more_barcode = no_more_read2;\n      if (!mapping_parameters_.is_bulk_data) {\n        no_more_barcode =\n          barcode_batch.LoadOneSequenceAndSaveAt(num_loaded_pairs);\n      }\n      \n      if (no_more_read1 && no_more_read2 && no_more_barcode) {\n        break;\n      } else if (no_more_read1 || no_more_read2 || no_more_barcode){\n        ExitWithMessage(\"Numbers of reads and barcodes don't match!\");\n      }\n      ++num_loaded_pairs;\n    }\n  } else {\n 
   uint32_t num_loaded_read1 = 0;\n    uint32_t num_loaded_read2 = 0;\n    uint32_t num_loaded_barcode = 0;\n    \n#pragma omp task shared(num_loaded_read1, read_batch1)\n    {\n      uint32_t i = 0 ;\n      for (i = 0 ; i < read_batch_size_; ++i) {\n        if (read_batch1.LoadOneSequenceAndSaveAt(i) == true) { // true: no more read\n          break ;\n        }\n      }\n      num_loaded_read1 = i ;\n    }\n\n#pragma omp task shared(num_loaded_read2, read_batch2)\n    {\n      uint32_t i = 0 ;\n      for (i = 0 ; i < read_batch_size_; ++i) {\n        if (read_batch2.LoadOneSequenceAndSaveAt(i) == true) { // true: no more read\n          break ;\n        }\n      }\n      num_loaded_read2 = i ;\n    }\n\n#pragma omp task shared(num_loaded_barcode, barcode_batch)\n    {\n      if (!mapping_parameters_.is_bulk_data) {\n        uint32_t i = 0 ;\n        for (i = 0 ; i < read_batch_size_; ++i) {\n          if (barcode_batch.LoadOneSequenceAndSaveAt(i) == true) { // true: no more read\n            break ;\n          }\n        }\n        num_loaded_barcode = i ;\n      }\n    }\n\n#pragma omp taskwait\n    if (mapping_parameters_.is_bulk_data) {\n      num_loaded_barcode = num_loaded_read2;\n    }\n    if (num_loaded_read1 != num_loaded_read2 || num_loaded_read2 != num_loaded_barcode) {\n        ExitWithMessage(\"Numbers of reads and barcodes don't match!\");\n    }\n\n    num_loaded_pairs = num_loaded_read1 ;\n  }\n  // if (num_loaded_pairs > 0) {\n  //  std::cerr << \"Loaded \" << num_loaded_pairs << \" pairs in \"<<\n  //  Chromap<>::GetRealTime() - real_start_time << \"s. 
\";\n  //} else {\n  //  std::cerr << \"No more reads.\\n\";\n  //}\n  return num_loaded_pairs;\n}\n\nvoid Chromap::TrimAdapterForPairedEndRead(uint32_t pair_index,\n                                          SequenceBatch &read_batch1,\n                                          SequenceBatch &read_batch2) {\n  const uint32_t raw_read1_length = read_batch1.GetSequenceLengthAt(pair_index);\n  const uint32_t raw_read2_length = read_batch2.GetSequenceLengthAt(pair_index);\n  const char *raw_read1 = read_batch1.GetSequenceAt(pair_index);\n  const char *raw_read2 = read_batch2.GetSequenceAt(pair_index);\n  const std::string &raw_negative_read1 =\n      read_batch1.GetNegativeSequenceAt(pair_index);\n  const std::string &raw_negative_read2 =\n      read_batch2.GetNegativeSequenceAt(pair_index);\n\n  // In the actual adapter trimming, we assume length(read1)<=length(read2). So\n  // we can have the case that read1 is a subset of read2.\n  const char *read1 =\n      raw_read1_length <= raw_read2_length ? raw_read1 : raw_read2;\n  const std::string &negative_read2 = raw_read1_length <= raw_read2_length\n                                          ? raw_negative_read2\n                                          : raw_negative_read1;\n  const uint32_t read1_length = raw_read1_length <= raw_read2_length\n                                    ? raw_read1_length\n                                    : raw_read2_length;\n  const uint32_t read2_length = raw_read1_length <= raw_read2_length\n                                    ? 
raw_read2_length\n                                    : raw_read1_length;\n\n  const int min_overlap_length = mapping_parameters_.min_read_length;\n  const int seed_length = min_overlap_length / 2;\n  const int error_threshold_for_merging = 1;\n  bool is_merged = false;\n\n  for (int si = 0; si < error_threshold_for_merging + 1; ++si) {\n    size_t seed_start_position =\n        negative_read2.find(read1 + si * seed_length, 0, seed_length);\n\n    while (seed_start_position != std::string::npos) {\n      const bool before_seed_is_enough_long =\n          seed_start_position >= (size_t)(si * seed_length);\n      const bool overlap_is_enough_long =\n          (int)(read2_length - seed_start_position + seed_length * si) >=\n          min_overlap_length;\n\n      if (!before_seed_is_enough_long || !overlap_is_enough_long) {\n        seed_start_position = negative_read2.find(\n            read1 + si * seed_length, seed_start_position + 1, seed_length);\n        continue;\n      }\n\n      bool can_merge = true;\n      int num_errors = 0;\n\n      // The bases before the seed.\n      for (int i = 0; i < seed_length * si; ++i) {\n        if (negative_read2[seed_start_position - si * seed_length + i] !=\n            read1[i]) {\n          ++num_errors;\n        }\n        if (num_errors > error_threshold_for_merging) {\n          can_merge = false;\n          break;\n        }\n      }\n\n      // The bases after the seed.\n      for (uint32_t i = seed_length; i + seed_start_position < read2_length &&\n                                     si * seed_length + i < read1_length;\n           ++i) {\n        if (negative_read2[seed_start_position + i] !=\n            read1[si * seed_length + i]) {\n          ++num_errors;\n        }\n        if (num_errors > error_threshold_for_merging) {\n          can_merge = false;\n          break;\n        }\n      }\n\n      if (can_merge) {\n        // Trim adapters and TODO: fix sequencing errors\n        int overlap_length =\n           
 read2_length - seed_start_position + si * seed_length;\n        int read2_offset = 0;\n        // The case that read1 is strictly contained in read2. overlap_length is\n        // inferred from the longer read2, which could be longer than read1. In\n        // that case, we don't trim read1 (make overlap length equal to read1\n        // length) and trim read2 as the original plan.\n        if (overlap_length > (int)read1_length) {\n          read2_offset = overlap_length - read1_length;\n          overlap_length = read1_length;\n        }\n\n        if (raw_read1_length <= raw_read2_length) {\n          read_batch1.TrimSequenceAt(pair_index, overlap_length);\n          read_batch2.TrimSequenceAt(pair_index, overlap_length + read2_offset);\n        } else {\n          read_batch1.TrimSequenceAt(pair_index, overlap_length + read2_offset);\n          read_batch2.TrimSequenceAt(pair_index, overlap_length);\n        }\n\n        is_merged = true;\n        // std::cerr << \"Trimed! overlap length: \" << overlap_length << \", \" <<\n        // read1.GetLength() << \" \" << read2.GetLength() << \"\\n\";\n        break;\n      }\n\n      seed_start_position = negative_read2.find(\n          read1 + si * seed_length, seed_start_position + 1, seed_length);\n    }\n\n    if (is_merged) {\n      break;\n    }\n  }\n}\n\nbool Chromap::PairedEndReadWithBarcodeIsDuplicate(\n    uint32_t pair_index, const SequenceBatch &barcode_batch,\n    const SequenceBatch &read_batch1, const SequenceBatch &read_batch2) {\n  int dedupe_seed_length = 16;\n  uint32_t barcode_length = barcode_batch.GetSequenceLengthAt(pair_index);\n  uint64_t barcode_key =\n      barcode_batch.GenerateSeedFromSequenceAt(pair_index, 0, barcode_length);\n  uint64_t read1_seed1 =\n      read_batch1.GenerateSeedFromSequenceAt(pair_index, 0, dedupe_seed_length);\n  uint64_t read2_seed1 =\n      read_batch2.GenerateSeedFromSequenceAt(pair_index, 0, dedupe_seed_length);\n  uint64_t read_seed_key =\n      (read1_seed1 << 
(dedupe_seed_length * 2)) | read2_seed1;\n  uint64_t read1_seed2 = read_batch1.GenerateSeedFromSequenceAt(\n      pair_index, dedupe_seed_length, dedupe_seed_length * 2);\n  uint64_t read2_seed2 = read_batch2.GenerateSeedFromSequenceAt(\n      pair_index, dedupe_seed_length, dedupe_seed_length * 2);\n  khiter_t barcode_table_iterator =\n      kh_get(k64_seq, barcode_lookup_table_, barcode_key);\n  if (barcode_table_iterator != kh_end(barcode_lookup_table_)) {\n    uint32_t read_lookup_table_index =\n        kh_value(barcode_lookup_table_, barcode_table_iterator);\n    // std::cerr << \"Have barcode, try to check read. \" <<\n    // read_lookup_table_index << \"\\n\";\n    khash_t(k128) *read_lookup_table =\n        read_lookup_tables_[read_lookup_table_index];\n    khiter_t read_lookup_table_iterator =\n        kh_get(k128, read_lookup_table, read_seed_key);\n    if (read_lookup_table_iterator != kh_end(read_lookup_table)) {\n      // std::cerr << \"Have barcode, have read, try whether match.\\n\";\n      uint128_t read_seeds =\n          kh_value(read_lookup_table, read_lookup_table_iterator);\n      if (read_seeds.first == read1_seed2 && read_seeds.second == read2_seed2) {\n        // std::cerr << \"Have barcode, have read, and match.\\n\";\n        return true;\n      } else {\n        // std::cerr << \"Have barcode, have read, but don't match.\\n\";\n        return false;\n      }\n    } else {\n      // std::cerr << \"Have barcode, no read.\\n\";\n      uint128_t read_seeds = {.first = read1_seed2, .second = read2_seed2};\n      int khash_return_code;\n      khiter_t read_lookup_table_insert_iterator =\n          kh_put(k128, read_lookup_table, read_seed_key, &khash_return_code);\n      assert(khash_return_code != -1 && khash_return_code != 0);\n      kh_value(read_lookup_table, read_lookup_table_insert_iterator) =\n          read_seeds;\n      // std::cerr << \"Have barcode, no read.\\n\";\n      return false;\n    }\n  } else {\n    // insert the barcode and 
append a new read hash table to tables and then\n    // insert the reads\n    // std::cerr << \"No barcode, no read.\\n\";\n    int khash_return_code;\n    khiter_t barcode_table_insert_iterator =\n        kh_put(k64_seq, barcode_lookup_table_, barcode_key, &khash_return_code);\n    assert(khash_return_code != -1 && khash_return_code != 0);\n    kh_value(barcode_lookup_table_, barcode_table_insert_iterator) =\n        read_lookup_tables_.size();\n    khash_t(k128) *read_lookup_table = kh_init(k128);\n    khiter_t read_lookup_table_iterator =\n        kh_put(k128, read_lookup_table, read_seed_key, &khash_return_code);\n    assert(khash_return_code != -1 && khash_return_code != 0);\n    uint128_t read_seeds = {.first = read1_seed2, .second = read2_seed2};\n    kh_value(read_lookup_table, read_lookup_table_iterator) = read_seeds;\n    read_lookup_tables_.push_back(read_lookup_table);\n    // std::cerr << \"No barcode, no read.\\n\";\n    return false;\n  }\n}\n\nuint32_t Chromap::SampleInputBarcodesAndExamineLength() {\n  if (mapping_parameters_.is_bulk_data) {\n    return 0;\n  }\n\n  uint32_t sample_batch_size = 1000;\n  SequenceBatch barcode_batch(sample_batch_size, barcode_effective_range_);\n\n  barcode_batch.InitializeLoading(mapping_parameters_.barcode_file_paths[0]);\n\n  uint32_t num_loaded_barcodes = barcode_batch.LoadBatch();\n\n  uint32_t cell_barcode_length = barcode_batch.GetSequenceLengthAt(0);\n  for (uint32_t i = 1; i < num_loaded_barcodes; ++i) {\n    if (barcode_batch.GetSequenceLengthAt(i) != cell_barcode_length) {\n      ExitWithMessage(\"ERROR: barcode lengths are not equal in the sample!\");\n    }\n  }\n\n  barcode_batch.FinalizeLoading();\n\n  return cell_barcode_length;\n}\n\nvoid Chromap::LoadBarcodeWhitelist() {\n  double real_start_time = GetRealTime();\n  int num_barcodes = 0;\n\n  if (1) {\n    gzFile barcode_whitelist_file = \n      gzopen(mapping_parameters_.barcode_whitelist_file_path.c_str(), \"r\"); \n    const uint32_t 
barcode_buffer_size = 256;\n    char barcode[barcode_buffer_size];\n    while (gzgets(barcode_whitelist_file, barcode, barcode_buffer_size) != NULL) {\n      size_t barcode_length = strlen(barcode);\n      if (barcode[barcode_length - 1] == '\\n') {\n        barcode[barcode_length - 1] = '\\0';\n        --barcode_length;\n      }\n      if (barcode_length > 32) {\n        ExitWithMessage(\"ERROR: barcode length is greater than 32!\");\n      }\n\n      if (barcode_length != barcode_length_) {\n        if (num_barcodes == 0) {\n          ExitWithMessage(\n              \"ERROR: whitelist and input barcode lengths are not equal!\");\n        } else {\n          ExitWithMessage(\n              \"ERROR: barcode lengths are not equal in the whitelist!\");\n        }\n      }\n      \n      uint64_t barcode_key = GenerateSeedFromSequence(\n          barcode, barcode_length, 0, barcode_length);\n      \n      int khash_return_code;\n      khiter_t barcode_whitelist_lookup_table_iterator =\n        kh_put(k64_seq, barcode_whitelist_lookup_table_, barcode_key,\n            &khash_return_code);\n      kh_value(barcode_whitelist_lookup_table_,\n          barcode_whitelist_lookup_table_iterator) = 0;\n      assert(khash_return_code != -1 && khash_return_code != 0);\n      ++num_barcodes;\n    }\n    if (!gzeof(barcode_whitelist_file)) {\n      ExitWithMessage(\"ERROR: barcode whitelist file does not exist or is truncated!\");\n    }\n    gzclose(barcode_whitelist_file);\n  } else { \n    std::ifstream barcode_whitelist_file_stream(\n        mapping_parameters_.barcode_whitelist_file_path);\n    std::string barcode_whitelist_file_line;\n    // bool first_line = true;\n    while (getline(barcode_whitelist_file_stream, barcode_whitelist_file_line)) {\n      std::stringstream barcode_whitelist_file_line_string_stream(\n          barcode_whitelist_file_line);\n      //// skip the header\n      // if (barcode_whitelist_file_line[0] == '#' ||\n      // 
barcode_whitelist_file_line.find(\"kmer\") == 0) {\n      //  continue;\n      //}\n      std::string barcode;\n      barcode_whitelist_file_line_string_stream >> barcode;\n      size_t barcode_length = barcode.length();\n      if (barcode_length > 32) {\n        ExitWithMessage(\"ERROR: barcode length is greater than 32!\");\n      }\n\n      if (barcode_length != barcode_length_) {\n        if (num_barcodes == 0) {\n          ExitWithMessage(\n              \"ERROR: whitelist and input barcode lengths are not equal!\");\n        } else {\n          ExitWithMessage(\n              \"ERROR: barcode lengths are not equal in the whitelist!\");\n        }\n      }\n\n      // if (first_line) {\n      //  //size_t barcode_length = kmer.length();\n      //  // Allocate memory to save pore model parameters\n      //  //size_t num_pore_models = 1 << (kmer_size_ * 2);\n      //  //pore_models_.assign(num_pore_models, PoreModelParameters());\n      //  //first_line = false;\n      //}\n      // assert(kmer.length() == (size_t)kmer_size_);\n      uint64_t barcode_key = GenerateSeedFromSequence(\n          barcode.data(), barcode_length, 0, barcode_length);\n      // PoreModelParameters &pore_model_parameters =\n      // pore_models_[kmer_hash_value]; barcode_whitelist_file_line_string_stream\n      // >> pore_model_parameters.level_mean >> pore_model_parameters.level_stdv\n      // >> pore_model_parameters.sd_mean >> pore_model_parameters.sd_stdv;\n      int khash_return_code;\n      khiter_t barcode_whitelist_lookup_table_iterator =\n        kh_put(k64_seq, barcode_whitelist_lookup_table_, barcode_key,\n            &khash_return_code);\n      kh_value(barcode_whitelist_lookup_table_,\n          barcode_whitelist_lookup_table_iterator) = 0;\n      assert(khash_return_code != -1 && khash_return_code != 0);\n      ++num_barcodes;\n    }\n    barcode_whitelist_file_stream.close();\n  }\n  std::cerr << \"Loaded \" << num_barcodes << \" barcodes in \"\n            << 
GetRealTime() - real_start_time << \"s.\\n\";\n}\n\nvoid Chromap::ComputeBarcodeAbundance(uint64_t max_num_sample_barcodes) {\n  double real_start_time = GetRealTime();\n  SequenceBatch barcode_batch(read_batch_size_, barcode_effective_range_);\n  for (size_t read_file_index = 0;\n       read_file_index < mapping_parameters_.read_file1_paths.size();\n       ++read_file_index) {\n    barcode_batch.InitializeLoading(\n        mapping_parameters_.barcode_file_paths[read_file_index]);\n    uint32_t num_loaded_barcodes = barcode_batch.LoadBatch();\n    while (num_loaded_barcodes > 0) {\n      for (uint32_t barcode_index = 0; barcode_index < num_loaded_barcodes;\n           ++barcode_index) {\n        std::vector<int> N_pos;  // position of Ns\n        barcode_batch.GetSequenceNsAt(barcode_index, /*little_endian=*/true,\n                                      N_pos);\n        if (N_pos.size() > 0) continue;\n\n        uint32_t barcode_length =\n            barcode_batch.GetSequenceLengthAt(barcode_index);\n        uint64_t barcode_key = barcode_batch.GenerateSeedFromSequenceAt(\n            barcode_index, 0, barcode_length);\n        khiter_t barcode_whitelist_lookup_table_iterator =\n            kh_get(k64_seq, barcode_whitelist_lookup_table_, barcode_key);\n        if (barcode_whitelist_lookup_table_iterator !=\n            kh_end(barcode_whitelist_lookup_table_)) {\n          // Correct barcode\n          kh_value(barcode_whitelist_lookup_table_,\n                   barcode_whitelist_lookup_table_iterator) += 1;\n          ++num_sample_barcodes_;\n        }\n      }\n      if (!mapping_parameters_.skip_barcode_check &&\n          num_sample_barcodes_ * 20 < num_loaded_barcodes) {\n        // Since num_loaded_barcodes is a constant, this if is actually only\n        // effective in the first iteration\n        ExitWithMessage(\n            \"Less than 5\\% barcodes can be found or corrected based on the \"\n            \"barcode whitelist.\\nPlease check whether the 
barcode whitelist \"\n            \"matches the data, e.g. length, reverse-complement. If this is a \"\n            \"false warning, please run Chromap with the option \"\n            \"--skip-barcode-check.\");\n      }\n\n      if (num_sample_barcodes_ >= max_num_sample_barcodes) {\n        break;\n      }\n      num_loaded_barcodes = barcode_batch.LoadBatch();\n    }\n    barcode_batch.FinalizeLoading();\n    if (num_sample_barcodes_ >= max_num_sample_barcodes) {\n      break;\n    }\n  }\n\n  std::cerr << \"Compute barcode abundance using \" << num_sample_barcodes_\n            << \" in \" << GetRealTime() - real_start_time << \"s.\\n\";\n}\n\nvoid Chromap::UpdateBarcodeAbundance(uint32_t num_loaded_barcodes,\n                                     const SequenceBatch &barcode_batch) {\n  double real_start_time = GetRealTime();\n  for (uint32_t barcode_index = 0; barcode_index < num_loaded_barcodes;\n       ++barcode_index) {\n    uint32_t barcode_length = barcode_batch.GetSequenceLengthAt(barcode_index);\n    uint64_t barcode_key = barcode_batch.GenerateSeedFromSequenceAt(\n        barcode_index, 0, barcode_length);\n    khiter_t barcode_whitelist_lookup_table_iterator =\n        kh_get(k64_seq, barcode_whitelist_lookup_table_, barcode_key);\n    if (barcode_whitelist_lookup_table_iterator !=\n        kh_end(barcode_whitelist_lookup_table_)) {\n      // Correct barcode\n      kh_value(barcode_whitelist_lookup_table_,\n               barcode_whitelist_lookup_table_iterator) += 1;\n      ++num_sample_barcodes_;\n    }\n  }\n  std::cerr << \"Update barcode abundance using \" << num_sample_barcodes_\n            << \" in \" << GetRealTime() - real_start_time << \"s.\\n\";\n}\n\nbool Chromap::CorrectBarcodeAt(uint32_t barcode_index,\n                               SequenceBatch &barcode_batch,\n                               uint64_t &num_barcode_in_whitelist,\n                               uint64_t &num_corrected_barcode) {\n  const uint32_t barcode_length =\n      
barcode_batch.GetSequenceLengthAt(barcode_index);\n  const uint64_t barcode_key = barcode_batch.GenerateSeedFromSequenceAt(\n      barcode_index, 0, barcode_length);\n  khiter_t barcode_whitelist_lookup_table_iterator =\n      kh_get(k64_seq, barcode_whitelist_lookup_table_, barcode_key);\n  std::vector<int> N_pos;  // position of Ns\n\n  barcode_batch.GetSequenceNsAt(barcode_index, /*little_endian=*/true, N_pos);\n  if (N_pos.size() >\n      (uint32_t)mapping_parameters_.barcode_correction_error_threshold)\n    return false;\n\n  if (N_pos.size() == 0 && barcode_whitelist_lookup_table_iterator !=\n                               kh_end(barcode_whitelist_lookup_table_)) {\n    // Correct barcode\n    ++num_barcode_in_whitelist;\n    return true;\n  } else if (mapping_parameters_.barcode_correction_error_threshold > 0) {\n    // Need to correct this barcode\n    // const char *barcode = barcode_batch->GetSequenceAt(barcode_index);\n    // std::cerr << barcode_index << \" barcode \" << barcode << \" needs\n    // correction\\n\";\n    const char *barcode_qual = barcode_batch.GetSequenceQualAt(barcode_index);\n    std::vector<BarcodeWithQual> corrected_barcodes_with_quals;\n    uint64_t mask = (uint64_t)3;\n    uint32_t i_start = 0;\n    uint32_t i_end = barcode_length;\n    uint32_t ti_limit = 3;\n    if (N_pos.size() > 0) {\n      i_start = N_pos[0];\n      i_end = N_pos[0] + 1;\n      ti_limit = 4;\n    }\n    for (uint32_t i = i_start; i < i_end; ++i) {\n      uint64_t barcode_key_to_change = mask << (2 * i);\n      barcode_key_to_change = ~barcode_key_to_change;\n      barcode_key_to_change &= barcode_key;\n      uint64_t base_to_change1 = (barcode_key >> (2 * i)) & mask;\n      for (uint32_t ti = 0; ti < ti_limit; ++ti) {\n        // change the base\n        base_to_change1 += 1;\n        base_to_change1 &= mask;\n        // generate the corrected key\n        uint64_t corrected_barcode_key =\n            barcode_key_to_change | (base_to_change1 << (2 * i));\n    
    barcode_whitelist_lookup_table_iterator = kh_get(\n            k64_seq, barcode_whitelist_lookup_table_, corrected_barcode_key);\n        if (barcode_whitelist_lookup_table_iterator !=\n            kh_end(barcode_whitelist_lookup_table_)) {\n          // find one possible corrected barcode\n          double barcode_abundance =\n              kh_value(barcode_whitelist_lookup_table_,\n                       barcode_whitelist_lookup_table_iterator) /\n              (double)num_sample_barcodes_;\n          int qual_offset = 33;\n          int adjusted_qual =\n              barcode_qual[barcode_length - 1 - i] - qual_offset;\n          adjusted_qual = adjusted_qual > 40 ? 40 : adjusted_qual;\n          adjusted_qual = adjusted_qual < 3 ? 3 : adjusted_qual;\n          double score =\n              pow(10.0, ((-adjusted_qual) / 10.0)) * barcode_abundance;\n          corrected_barcodes_with_quals.emplace_back(\n              BarcodeWithQual{barcode_length - 1 - i,\n                              Uint8ToChar(base_to_change1), 0, 0, score});\n          // std::cerr << \"1score: \" << score << \" pos1: \" << barcode_length - 1\n          // - i << \" b1: \" << base_to_change1 << \" pos2: \" << 0 << \" b2: \" <<\n          // (char)0 << \"\\n\";\n        }\n        if (mapping_parameters_.barcode_correction_error_threshold == 2) {\n          uint32_t j_start = i + 1;\n          uint32_t j_end = barcode_length;\n          uint32_t ti2_limit = 3;\n          if (N_pos.size() == 2) {\n            j_start = N_pos[1];\n            j_end = N_pos[1] + 1;\n            ti2_limit = 4;\n          }\n          for (uint32_t j = j_start; j < j_end; ++j) {\n            uint64_t barcode_key_to_change2 = mask << (2 * i);\n            barcode_key_to_change2 = mask << (2 * j);\n            barcode_key_to_change2 = ~barcode_key_to_change2;\n            barcode_key_to_change2 &= corrected_barcode_key;\n            uint64_t base_to_change2 =\n                (corrected_barcode_key >> (2 * j)) & 
mask;\n            for (uint32_t ti2 = 0; ti2 < ti2_limit; ++ti2) {\n              // change the base\n              base_to_change2 += 1;\n              base_to_change2 &= mask;\n              // generate the corrected key\n              uint64_t corrected_barcode_key2 =\n                  barcode_key_to_change2 | (base_to_change2 << (2 * j));\n              barcode_whitelist_lookup_table_iterator =\n                  kh_get(k64_seq, barcode_whitelist_lookup_table_,\n                         corrected_barcode_key2);\n              if (barcode_whitelist_lookup_table_iterator !=\n                  kh_end(barcode_whitelist_lookup_table_)) {\n                // find one possible corrected barcode\n                double barcode_abundance =\n                    kh_value(barcode_whitelist_lookup_table_,\n                             barcode_whitelist_lookup_table_iterator) /\n                    (double)num_sample_barcodes_;\n                int qual_offset = 33;\n                int adjusted_qual =\n                    barcode_qual[barcode_length - 1 - j] - qual_offset;\n                adjusted_qual = adjusted_qual > 40 ? 40 : adjusted_qual;\n                adjusted_qual = adjusted_qual < 3 ? 3 : adjusted_qual;\n                int adjusted_qual1 =\n                    barcode_qual[barcode_length - 1 - i] - qual_offset;\n                adjusted_qual1 = adjusted_qual1 > 40 ? 40 : adjusted_qual1;\n                adjusted_qual1 = adjusted_qual1 < 3 ? 
3 : adjusted_qual1;\n                adjusted_qual += adjusted_qual1;\n                double score =\n                    pow(10.0, ((-adjusted_qual) / 10.0)) * barcode_abundance;\n                corrected_barcodes_with_quals.emplace_back(BarcodeWithQual{\n                    barcode_length - 1 - i, Uint8ToChar(base_to_change1),\n                    barcode_length - 1 - j, Uint8ToChar(base_to_change2),\n                    score});\n                // std::cerr << \"2score: \" << score << \" pos1: \" <<\n                // barcode_length - 1 - i << \" b1: \" << base_to_change1 << \"\n                // pos2: \" << barcode_length - 1 -j << \" b2: \" <<\n                // base_to_change2\n                // << \"\\n\";\n              }\n            }\n          }\n        }\n      }\n    }\n    size_t num_possible_corrected_barcodes =\n        corrected_barcodes_with_quals.size();\n    if (num_possible_corrected_barcodes == 0) {\n      // Barcode cannot be corrected, leave it for downstream\n      return false;\n    } else if (num_possible_corrected_barcodes == 1) {\n      // Just correct it\n      // std::cerr << \"Corrected the barcode from \" << barcode << \" to \";\n      barcode_batch.CorrectBaseAt(\n          barcode_index, corrected_barcodes_with_quals[0].corrected_base_index1,\n          corrected_barcodes_with_quals[0].correct_base1);\n      if (corrected_barcodes_with_quals[0].correct_base2 != 0) {\n        barcode_batch.CorrectBaseAt(\n            barcode_index,\n            corrected_barcodes_with_quals[0].corrected_base_index2,\n            corrected_barcodes_with_quals[0].correct_base2);\n      }\n      // std::cerr << barcode << \"\\n\";\n      // std::cerr << \"score: \" << corrected_barcodes_with_quals[0].score <<\n      // \"\\n\"; std::cerr << \"score: \" << corrected_barcodes_with_quals[0].score\n      // << \" pos1: \" << corrected_barcodes_with_quals[0].corrected_base_index1\n      // << \" b1: \" << 
corrected_barcodes_with_quals[0].correct_base1 << \" pos2:\n      // \" << corrected_barcodes_with_quals[0].corrected_base_index2 << \" b2: \"\n      // << corrected_barcodes_with_quals[0].correct_base2 << \"\\n\";\n      ++num_corrected_barcode;\n      return true;\n    } else {\n      // Select the best correction\n      std::sort(corrected_barcodes_with_quals.begin(),\n                corrected_barcodes_with_quals.end(),\n                std::greater<BarcodeWithQual>());\n      // int num_ties = 0;\n      double sum_score = 0;\n      for (size_t ci = 0; ci < num_possible_corrected_barcodes; ++ci) {\n        sum_score += corrected_barcodes_with_quals[ci].score;\n        // std::cerr << ci << \" score: \" <<\n        // corrected_barcodes_with_quals[ci].score << \" pos1: \" <<\n        // corrected_barcodes_with_quals[ci].corrected_base_index1 << \" b1: \" <<\n        // corrected_barcodes_with_quals[ci].correct_base1 << \" pos2: \" <<\n        // corrected_barcodes_with_quals[ci].corrected_base_index2 << \" b2: \" <<\n        // corrected_barcodes_with_quals[ci].correct_base2 << \"\\n\"; if\n        // (corrected_barcodes_with_quals[ci].qual ==\n        // corrected_barcodes_with_quals[0].qual) {\n        //  ++num_ties;\n        //}\n      }\n      int best_corrected_barcode_index = 0;\n      // if (num_ties > 0) {\n      //  std::mt19937 tmp_generator(11);\n      //  std::uniform_int_distribution<int> distribution(0, num_ties); //\n      //  important: inclusive range best_corrected_barcode_index =\n      //  distribution(tmp_generator);\n      //}\n      // std::cerr << \"Corrected the barcode from \" << barcode << \" to \";\n      double confidence_threshold =\n          mapping_parameters_.barcode_correction_probability_threshold;\n      if (corrected_barcodes_with_quals[best_corrected_barcode_index].score /\n              sum_score >\n          confidence_threshold) {\n        barcode_batch.CorrectBaseAt(\n            barcode_index,\n            
corrected_barcodes_with_quals[best_corrected_barcode_index]\n                .corrected_base_index1,\n            corrected_barcodes_with_quals[best_corrected_barcode_index]\n                .correct_base1);\n        if (corrected_barcodes_with_quals[best_corrected_barcode_index]\n                .correct_base2 != 0) {\n          barcode_batch.CorrectBaseAt(\n              barcode_index,\n              corrected_barcodes_with_quals[best_corrected_barcode_index]\n                  .corrected_base_index2,\n              corrected_barcodes_with_quals[best_corrected_barcode_index]\n                  .correct_base2);\n        }\n        // std::cerr << barcode << \"\\n\";\n        // std::cerr << \"score: \" <<\n        // corrected_barcodes_with_quals[best_corrected_barcode_index].score <<\n        // \"\\n\"; std::cerr << \"best score: \" <<\n        // corrected_barcodes_with_quals[best_corrected_barcode_index].score <<\n        // \" sum score: \" << sum_score << \"\\n\";\n        ++num_corrected_barcode;\n        return true;\n      } else {\n        // std::cerr << \"Didnt pass filter: \" <<\n        // corrected_barcodes_with_quals[best_corrected_barcode_index].score /\n        // sum_score << \"\\n\"; std::cerr << \"best score: \" <<\n        // corrected_barcodes_with_quals[best_corrected_barcode_index].score <<\n        // \" sum score: \" << sum_score << \"\\n\";\n        return false;\n      }\n    }\n  } else {\n    return false;\n  }\n}\n\nvoid Chromap::OutputBarcodeStatistics() {\n  std::cerr << \"Number of barcodes in whitelist: \" << num_barcode_in_whitelist_\n            << \".\\n\";\n  std::cerr << \"Number of corrected barcodes: \" << num_corrected_barcode_\n            << \".\\n\";\n}\n\nvoid Chromap::OutputMappingStatistics() {\n  std::cerr << \"Number of reads: \" << num_reads_ << \".\\n\";\n  // std::cerr << \"Number of duplicated reads: \" << num_duplicated_reads_ <<\n  // \".\\n\";\n  std::cerr << \"Number of mapped reads: \" << 
num_mapped_reads_ << \".\\n\";\n  std::cerr << \"Number of uniquely mapped reads: \" << num_uniquely_mapped_reads_\n            << \".\\n\";\n  std::cerr << \"Number of reads have multi-mappings: \"\n            << num_mapped_reads_ - num_uniquely_mapped_reads_ << \".\\n\";\n  std::cerr << \"Number of candidates: \" << num_candidates_ << \".\\n\";\n  std::cerr << \"Number of mappings: \" << num_mappings_ << \".\\n\";\n  std::cerr << \"Number of uni-mappings: \" << num_uniquely_mapped_reads_\n            << \".\\n\";\n  std::cerr << \"Number of multi-mappings: \"\n            << num_mappings_ - num_uniquely_mapped_reads_ << \".\\n\";\n}\n\nvoid Chromap::ParseReadFormat(const std::string &read_format) {\n  if (read_format.empty()) {\n    return;\n  }\n\n  read1_effective_range_.InitializeParsing();\n  read2_effective_range_.InitializeParsing();\n  barcode_effective_range_.InitializeParsing();\n\n  uint32_t i, j;\n  for (i = 0; i < read_format.size();) {\n    for (j = i + 1; j < read_format.size() && read_format[j] != ','; ++j)\n      ;\n    bool parse_success = true;\n    if (read_format[i] == 'r' && read_format[i + 1] == '1') {\n      parse_success =\n          read1_effective_range_.ParseFormatStringAndAppendEffectiveRange(\n              read_format.c_str() + i, j - i);\n    } else if (read_format[i] == 'r' && read_format[i + 1] == '2') {\n      parse_success =\n          read2_effective_range_.ParseFormatStringAndAppendEffectiveRange(\n              read_format.c_str() + i, j - i);\n    } else if (read_format[i] == 'b' && read_format[i + 1] == 'c') {\n      parse_success =\n          barcode_effective_range_.ParseFormatStringAndAppendEffectiveRange(\n              read_format.c_str() + i, j - i);\n    } else {\n      parse_success = false;\n    }\n\n    if (!parse_success) {\n      ExitWithMessage(\"Unknown read format: \" + read_format + \"\\n\");\n    }\n\n    i = j + 1;\n  }\n\n  read1_effective_range_.FinalizeParsing();\n  
read2_effective_range_.FinalizeParsing();\n  barcode_effective_range_.FinalizeParsing();\n}\n\nvoid Chromap::GenerateCustomRidRanks(\n    const std::string &custom_rid_order_file_path,\n    uint32_t num_reference_sequences, const SequenceBatch &reference,\n    std::vector<int> &rid_ranks) {\n  for (uint32_t i = 0; i < num_reference_sequences; ++i) {\n    rid_ranks.emplace_back(i);\n  }\n\n  if (custom_rid_order_file_path.empty()) {\n    return;\n  }\n\n  std::unordered_map<std::string, int> ref_name_to_rank;\n  std::ifstream custom_rid_order_file_stream(custom_rid_order_file_path);\n  std::string ref_name;\n  uint32_t ref_rank = 0;\n  while (getline(custom_rid_order_file_stream, ref_name)) {\n    ref_name_to_rank[ref_name] = ref_rank;\n    ref_rank += 1;\n  }\n  custom_rid_order_file_stream.close();\n\n  // First, rank the chromosomes in the custom order provided by users.\n  for (uint32_t i = 0; i < num_reference_sequences; ++i) {\n    std::string ref_name(reference.GetSequenceNameAt(i));\n    if (ref_name_to_rank.find(ref_name) != ref_name_to_rank.end()) {\n      rid_ranks[i] = ref_name_to_rank[ref_name];\n    } else {\n      rid_ranks[i] = -1;\n    }\n  }\n\n  // There might be some rids without any custom order. 
We just order them based\n  // on their original order in the reference file.\n  uint32_t k = ref_name_to_rank.size();\n  // Rank the remaining chromosomes.\n  for (uint32_t i = 0; i < num_reference_sequences; ++i) {\n    if (rid_ranks[i] == -1) {\n      rid_ranks[i] = k;\n      ++k;\n    }\n  }\n\n  if (k > num_reference_sequences) {\n    ExitWithMessage(\n        \"ERROR: unknown chromosome names found in chromosome order file.\");\n  }\n}\n\nvoid Chromap::RerankCandidatesRid(std::vector<Candidate> &candidates) {\n  for (size_t i = 0; i < candidates.size(); ++i) {\n    uint64_t rid = (uint32_t)(candidates[i].position >> 32);\n    rid = custom_rid_rank_[rid];\n    candidates[i].position =\n        (candidates[i].position & (uint64_t)0xffffffff) | (rid << 32);\n  }\n}\n\n}  // namespace chromap\n"
  },
  {
    "path": "src/chromap.h",
    "content": "#ifndef CHROMAP_H_\n#define CHROMAP_H_\n\n#include <omp.h>\n\n#include <memory>\n#include <random>\n#include <string>\n#include <tuple>\n#include <vector>\n\n#include <queue> // Used these two for k-minhash\n#include <unordered_set>\n\n#include <sstream> // Used for frip est params splitting\n\n#include \"candidate_processor.h\"\n#include \"cxxopts.hpp\"\n#include \"draft_mapping_generator.h\"\n#include \"feature_barcode_matrix.h\"\n#include \"index.h\"\n#include \"index_parameters.h\"\n#include \"khash.h\"\n#include \"mapping_generator.h\"\n#include \"mapping_metadata.h\"\n#include \"mapping_parameters.h\"\n#include \"mapping_processor.h\"\n#include \"mapping_writer.h\"\n#include \"minimizer_generator.h\"\n#include \"mmcache.hpp\"\n#include \"paired_end_mapping_metadata.h\"\n#include \"sequence_batch.h\"\n#include \"sequence_effective_range.h\"\n#include \"temp_mapping.h\"\n#include \"utils.h\"\n\n#define CHROMAP_VERSION \"0.3.3-r521\"\n\nnamespace chromap {\n\nclass K_MinHash {\npublic:\n    /*\n     * MinHash Class - used to estimate the number of unique cache slots \n     *                 hit by each barcode\n     *\n     * @param k - size of MinHash sketch\n     * @param range - range of possible cache ids\n     */\n    K_MinHash(size_t k, size_t range) : k_(k), range_(range) {}\n\n    inline void add(size_t num) {\n      /* If num is not present in queue, we will add it */\n        if (unique_slots_.find(num) == unique_slots_.end()) {\n            unique_slots_.insert(num);\n            pq_.push(num);\n            // only keep smallest k numbers\n            if (pq_.size() > k_) {\n                unique_slots_.erase(pq_.top());\n                pq_.pop();\n            }\n        }\n    }\n\n    inline size_t compute_cardinality() {\n      /* Use k-MinHash estimator to return estimated cardinality */\n      if (pq_.size() < k_) {return 0;}\n      size_t cardinality = (k_ * range_)/pq_.top() - 1;\n      return cardinality;\n    }\n\nprivate:\n 
   size_t k_;\n    size_t range_;\n\n    /* Uses an unordered set to have O(1) find queries*/\n    std::priority_queue<uint32_t> pq_; // max-heap\n    std::unordered_set<uint32_t> unique_slots_; // keep track of unique values\n};\n\nclass Chromap {\n public:\n  Chromap() = delete;\n\n  // For index construction\n  Chromap(const IndexParameters &index_parameters)\n      : index_parameters_(index_parameters) {\n    barcode_lookup_table_ = NULL;\n    barcode_whitelist_lookup_table_ = NULL;\n  }\n\n  // For mapping\n  Chromap(const MappingParameters &mapping_parameters)\n      : mapping_parameters_(mapping_parameters) {\n    barcode_lookup_table_ = kh_init(k64_seq);\n    barcode_whitelist_lookup_table_ = kh_init(k64_seq);\n\n    ParseReadFormat(mapping_parameters.read_format);\n  }\n\n  ~Chromap() {\n    if (barcode_whitelist_lookup_table_ != NULL) {\n      kh_destroy(k64_seq, barcode_whitelist_lookup_table_);\n    }\n\n    if (barcode_lookup_table_ != NULL) {\n      kh_destroy(k64_seq, barcode_lookup_table_);\n    }\n    if (read_lookup_tables_.size() > 0) {\n      for (uint32_t i = 0; i < read_lookup_tables_.size(); ++i) {\n        kh_destroy(k128, read_lookup_tables_[i]);\n      }\n    }\n  }\n\n  void ConstructIndex();\n\n  template <typename MappingRecord>\n  void MapSingleEndReads();\n\n  template <typename MappingRecord>\n  void MapPairedEndReads();\n\n private:\n  uint32_t LoadSingleEndReadsWithBarcodes(SequenceBatch &read_batch,\n                                          SequenceBatch &barcode_batch,\n                                          bool parallel_parsing);\n\n  uint32_t LoadPairedEndReadsWithBarcodes(SequenceBatch &read_batch1,\n                                          SequenceBatch &read_batch2,\n                                          SequenceBatch &barcode_batch,\n                                          bool parallel_parsing);\n\n  void TrimAdapterForPairedEndRead(uint32_t pair_index,\n                                   SequenceBatch 
&read_batch1,\n                                   SequenceBatch &read_batch2);\n\n  bool PairedEndReadWithBarcodeIsDuplicate(uint32_t pair_index,\n                                           const SequenceBatch &barcode_batch,\n                                           const SequenceBatch &read_batch1,\n                                           const SequenceBatch &read_batch2);\n\n  uint32_t SampleInputBarcodesAndExamineLength();\n\n  void LoadBarcodeWhitelist();\n\n  void ComputeBarcodeAbundance(uint64_t max_num_sample_barcodes);\n\n  void UpdateBarcodeAbundance(uint32_t num_loaded_barcodes,\n                              const SequenceBatch &barcode_batch);\n\n  bool CorrectBarcodeAt(uint32_t barcode_index, SequenceBatch &barcode_batch,\n                        uint64_t &num_barcode_in_whitelist,\n                        uint64_t &num_corrected_barcode);\n\n  void OutputBarcodeStatistics();\n\n  void OutputMappingStatistics();\n\n  void ParseReadFormat(const std::string &read_format);\n\n  // User custom rid order file contains a column of reference sequence names\n  // and there is one name on each row. The reference sequence name on the ith\n  // row means the rank of this sequence is i. 
This function loads the custom\n  // rid order file and generates a mapping from the original rids to their\n  // custom ranks, e.g., rid_ranks[i] is the custom rank of the ith rid in the\n  // reference.\n  void GenerateCustomRidRanks(const std::string &custom_rid_order_file_path,\n                              uint32_t num_reference_sequences,\n                              const SequenceBatch &reference,\n                              std::vector<int> &rid_ranks);\n\n  // TODO: generate reranked candidates directly.\n  void RerankCandidatesRid(std::vector<Candidate> &candidates);\n\n  // Parameters\n  const IndexParameters index_parameters_;\n  const MappingParameters mapping_parameters_;\n\n  // Default batch size, # reads for single-end reads, # read pairs for\n  // paired-end reads.\n  const uint32_t read_batch_size_ = 500000;\n\n  // 0-start, 1-end (inclusive), 2-strand(-1:minus, 1:plus)\n  SequenceEffectiveRange barcode_effective_range_;\n  SequenceEffectiveRange read1_effective_range_;\n  SequenceEffectiveRange read2_effective_range_;\n\n  std::vector<int> custom_rid_rank_;\n  std::vector<int> pairs_custom_rid_rank_;\n\n  khash_t(k64_seq) * barcode_whitelist_lookup_table_;\n\n  // For identical read dedupe\n  khash_t(k64_seq) * barcode_lookup_table_;\n  std::vector<khash_t(k128) *> read_lookup_tables_;\n\n  // For mapping.\n  const int min_unique_mapping_mapq_ = 4;\n\n  // For mapping stats.\n  uint64_t num_candidates_ = 0;\n  uint64_t num_mappings_ = 0;\n  uint64_t num_mapped_reads_ = 0;\n  uint64_t num_uniquely_mapped_reads_ = 0;\n  uint64_t num_reads_ = 0;\n  // # identical reads.\n  // uint64_t num_duplicated_reads_ = 0;\n\n  // For barcode stats.\n  const uint64_t initial_num_sample_barcodes_ = 20000000;\n  uint64_t num_sample_barcodes_ = 0;\n  uint64_t num_barcode_in_whitelist_ = 0;\n  uint64_t num_corrected_barcode_ = 0;\n  uint32_t barcode_length_ = 0;\n};\n\ntemplate <typename MappingRecord>\nvoid Chromap::MapSingleEndReads() {\n  double 
real_start_time = GetRealTime();\n\n  SequenceBatch reference;\n  reference.InitializeLoading(mapping_parameters_.reference_file_path);\n  reference.LoadAllSequences();\n  uint32_t num_reference_sequences = reference.GetNumSequences();\n  if (mapping_parameters_.custom_rid_order_file_path.length() > 0) {\n    GenerateCustomRidRanks(mapping_parameters_.custom_rid_order_file_path,\n                           num_reference_sequences, reference,\n                           custom_rid_rank_);\n    reference.ReorderSequences(custom_rid_rank_);\n  }\n\n  Index index(mapping_parameters_.index_file_path);\n  index.Load();\n  const int kmer_size = index.GetKmerSize();\n  const int window_size = index.GetWindowSize();\n  // index.Statistics(num_sequences, reference);\n\n  SequenceBatch read_batch(read_batch_size_, read1_effective_range_);\n  SequenceBatch read_batch_for_loading(read_batch_size_,\n                                       read1_effective_range_);\n  SequenceBatch barcode_batch(read_batch_size_, barcode_effective_range_);\n  SequenceBatch barcode_batch_for_loading(read_batch_size_,\n                                          barcode_effective_range_);\n\n  std::vector<std::vector<MappingRecord>> mappings_on_diff_ref_seqs;\n  mappings_on_diff_ref_seqs.reserve(num_reference_sequences);\n  for (uint32_t i = 0; i < num_reference_sequences; ++i) {\n    mappings_on_diff_ref_seqs.emplace_back(std::vector<MappingRecord>());\n  }\n\n  std::vector<TempMappingFileHandle<MappingRecord>> temp_mapping_file_handles;\n\n  // Preprocess barcodes for single cell data\n  if (!mapping_parameters_.is_bulk_data) {\n    barcode_length_ = SampleInputBarcodesAndExamineLength();\n    if (!mapping_parameters_.barcode_whitelist_file_path.empty()) {\n      LoadBarcodeWhitelist();\n      ComputeBarcodeAbundance(initial_num_sample_barcodes_);\n    }\n  }\n\n  MinimizerGenerator minimizer_generator(kmer_size, window_size);\n\n  CandidateProcessor candidate_processor(\n      
mapping_parameters_.min_num_seeds_required_for_mapping,\n      mapping_parameters_.max_seed_frequencies);\n\n  MappingProcessor<MappingRecord> mapping_processor(mapping_parameters_,\n                                                    min_unique_mapping_mapq_);\n\n  DraftMappingGenerator draft_mapping_generator(mapping_parameters_);\n\n  MappingGenerator<MappingRecord> mapping_generator(mapping_parameters_,\n                                                    pairs_custom_rid_rank_);\n\n  MappingWriter<MappingRecord> mapping_writer(\n      mapping_parameters_, barcode_length_, pairs_custom_rid_rank_);\n\n  mapping_writer.OutputHeader(num_reference_sequences, reference);\n\n  uint32_t num_mappings_in_mem = 0;\n  uint64_t max_num_mappings_in_mem =\n      1 * ((uint64_t)1 << 30) / sizeof(MappingRecord);\n  if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_SAM ||\n      mapping_parameters_.mapping_output_format == MAPPINGFORMAT_PAF ||\n      mapping_parameters_.mapping_output_format == MAPPINGFORMAT_PAIRS) {\n    max_num_mappings_in_mem = 1 * ((uint64_t)1 << 29) / sizeof(MappingRecord);\n  }\n\n  mm_cache mm_to_candidates_cache(2000003);\n  mm_to_candidates_cache.SetKmerLength(kmer_size);\n  struct _mm_history *mm_history = new struct _mm_history[read_batch_size_];\n  // Use bit encoding to represent mapping results\n  // bit 0: is barcode in whitelist\n  uint8_t *read_map_summary = NULL ; \n  if (!mapping_parameters_.summary_metadata_file_path.empty()) {\n    read_map_summary = new uint8_t[read_batch_size_];\n    memset(read_map_summary, 1, sizeof(*read_map_summary)*read_batch_size_);\n  }\n\n  static uint64_t thread_num_candidates = 0;\n  static uint64_t thread_num_mappings = 0;\n  static uint64_t thread_num_mapped_reads = 0;\n  static uint64_t thread_num_uniquely_mapped_reads = 0;\n  static uint64_t thread_num_barcode_in_whitelist = 0;\n  static uint64_t thread_num_corrected_barcode = 0;\n#pragma omp threadprivate(                                        
       \\\n    thread_num_candidates, thread_num_mappings, thread_num_mapped_reads, \\\n    thread_num_uniquely_mapped_reads, thread_num_barcode_in_whitelist,   \\\n    thread_num_corrected_barcode)\n  double real_start_mapping_time = GetRealTime();\n  for (size_t read_file_index = 0;\n       read_file_index < mapping_parameters_.read_file1_paths.size();\n       ++read_file_index) {\n    read_batch_for_loading.InitializeLoading(\n        mapping_parameters_.read_file1_paths[read_file_index]);\n\n    if (!mapping_parameters_.is_bulk_data) {\n      barcode_batch_for_loading.InitializeLoading(\n          mapping_parameters_.barcode_file_paths[read_file_index]);\n    }\n\n    uint32_t num_loaded_reads_for_loading = 0;\n    uint32_t num_loaded_reads = LoadSingleEndReadsWithBarcodes(\n        read_batch_for_loading, barcode_batch_for_loading,\n        mapping_parameters_.num_threads >= 3 ? true : false);\n    read_batch_for_loading.SwapSequenceBatch(read_batch);\n\n    if (!mapping_parameters_.is_bulk_data) {\n      barcode_batch_for_loading.SwapSequenceBatch(barcode_batch);\n    }\n\n    std::vector<std::vector<std::vector<MappingRecord>>>\n        mappings_on_diff_ref_seqs_for_diff_threads;\n    std::vector<std::vector<std::vector<MappingRecord>>>\n        mappings_on_diff_ref_seqs_for_diff_threads_for_saving;\n    mappings_on_diff_ref_seqs_for_diff_threads.reserve(\n        mapping_parameters_.num_threads);\n    mappings_on_diff_ref_seqs_for_diff_threads_for_saving.reserve(\n        mapping_parameters_.num_threads);\n    for (int ti = 0; ti < mapping_parameters_.num_threads; ++ti) {\n      mappings_on_diff_ref_seqs_for_diff_threads.emplace_back(\n          std::vector<std::vector<MappingRecord>>(num_reference_sequences));\n      mappings_on_diff_ref_seqs_for_diff_threads_for_saving.emplace_back(\n          std::vector<std::vector<MappingRecord>>(num_reference_sequences));\n      for (uint32_t i = 0; i < num_reference_sequences; ++i) {\n        
mappings_on_diff_ref_seqs_for_diff_threads[ti][i].reserve(\n            (num_loaded_reads + num_loaded_reads / 1000 *\n                                    mapping_parameters_.max_num_best_mappings) /\n            mapping_parameters_.num_threads / num_reference_sequences);\n        mappings_on_diff_ref_seqs_for_diff_threads_for_saving[ti][i].reserve(\n            (num_loaded_reads + num_loaded_reads / 1000 *\n                                    mapping_parameters_.max_num_best_mappings) /\n            mapping_parameters_.num_threads / num_reference_sequences);\n      }\n    }\n#pragma omp parallel shared(num_reads_, mm_history, read_map_summary, reference, index, read_batch, barcode_batch, read_batch_for_loading, barcode_batch_for_loading, std::cerr, num_loaded_reads_for_loading, num_loaded_reads, num_reference_sequences, mappings_on_diff_ref_seqs_for_diff_threads, mappings_on_diff_ref_seqs_for_diff_threads_for_saving, mappings_on_diff_ref_seqs, temp_mapping_file_handles, mm_to_candidates_cache, mapping_writer, minimizer_generator, candidate_processor, mapping_processor, draft_mapping_generator, mapping_generator, num_mappings_in_mem, max_num_mappings_in_mem) num_threads(mapping_parameters_.num_threads) reduction(+:num_candidates_, num_mappings_, num_mapped_reads_, num_uniquely_mapped_reads_, num_barcode_in_whitelist_, num_corrected_barcode_)\n    {\n      thread_num_candidates = 0;\n      thread_num_mappings = 0;\n      thread_num_mapped_reads = 0;\n      thread_num_uniquely_mapped_reads = 0;\n      thread_num_barcode_in_whitelist = 0;\n      thread_num_corrected_barcode = 0;\n      MappingMetadata mapping_metadata;\n#pragma omp single\n      {\n        while (num_loaded_reads > 0) {\n          double real_batch_start_time = GetRealTime();\n          num_reads_ += num_loaded_reads;\n#pragma omp task\n          {\n            num_loaded_reads_for_loading = LoadSingleEndReadsWithBarcodes(\n                read_batch_for_loading, barcode_batch_for_loading,\n           
     mapping_parameters_.num_threads >= 12 ? true : false);\n          }  // end of openmp loading task\n          uint32_t history_update_threshold =\n          mm_to_candidates_cache.GetUpdateThreshold(num_loaded_reads,\n                                                    num_reads_, \n                                                    false,\n                                                    0.01);\n          // int grain_size = 10000;\n//#pragma omp taskloop grainsize(grain_size) //num_tasks(num_threads_* 50)\n#pragma omp taskloop num_tasks( \\\n    mapping_parameters_.num_threads *mapping_parameters_.num_threads)\n          for (uint32_t read_index = 0; read_index < num_loaded_reads;\n               ++read_index) {\n            bool current_barcode_is_whitelisted = true;\n            if (!mapping_parameters_.barcode_whitelist_file_path.empty()) {\n              current_barcode_is_whitelisted = CorrectBarcodeAt(\n                  read_index, barcode_batch, thread_num_barcode_in_whitelist,\n                  thread_num_corrected_barcode);\n            }\n\n            if (!(current_barcode_is_whitelisted ||\n                  mapping_parameters_.output_mappings_not_in_whitelist)) {\n              if (read_map_summary != NULL)\n                read_map_summary[read_index] = 0;\n              continue;\n            }\n            \n            if (read_batch.GetSequenceLengthAt(read_index) <\n                (uint32_t)mapping_parameters_.min_read_length) {\n              continue;  // reads are too short, just drop.\n            }\n\n            read_batch.PrepareNegativeSequenceAt(read_index);\n\n            mapping_metadata.PrepareForMappingNextRead(\n                mapping_parameters_.max_seed_frequencies[0]);\n\n            minimizer_generator.GenerateMinimizers(\n                read_batch, read_index, mapping_metadata.minimizers_);\n\n            if (mapping_metadata.minimizers_.size() > 0) {\n              if 
(mapping_parameters_.custom_rid_order_file_path.length() > 0) {\n                RerankCandidatesRid(mapping_metadata.positive_candidates_);\n                RerankCandidatesRid(mapping_metadata.negative_candidates_);\n              }\n\n              if (mm_to_candidates_cache.Query(\n                      mapping_metadata,\n                      read_batch.GetSequenceLengthAt(read_index)) == -1) {\n                candidate_processor.GenerateCandidates(\n                    mapping_parameters_.error_threshold, index,\n                    mapping_metadata);\n              }\n\n              if (read_index < history_update_threshold) {\n                mm_history[read_index].timestamp = num_reads_;\n                mm_history[read_index].minimizers =\n                    mapping_metadata.minimizers_;\n                mm_history[read_index].positive_candidates =\n                    mapping_metadata.positive_candidates_;\n                mm_history[read_index].negative_candidates =\n                    mapping_metadata.negative_candidates_;\n                mm_history[read_index].repetitive_seed_length =\n                    mapping_metadata.repetitive_seed_length_;\n              }\n\n              size_t current_num_candidates =\n                  mapping_metadata.GetNumCandidates();\n              if (current_num_candidates > 0) {\n                thread_num_candidates += current_num_candidates;\n                draft_mapping_generator.GenerateDraftMappings(\n                    read_batch, read_index, reference, mapping_metadata);\n\n                const size_t current_num_draft_mappings =\n                    mapping_metadata.GetNumDraftMappings();\n                if (current_num_draft_mappings > 0) {\n                  std::vector<std::vector<MappingRecord>>\n                      &mappings_on_diff_ref_seqs =\n                          mappings_on_diff_ref_seqs_for_diff_threads\n                              [omp_get_thread_num()];\n\n                  
mapping_generator.GenerateBestMappingsForSingleEndRead(\n                      read_batch, read_index, reference, barcode_batch,\n                      mapping_metadata, mappings_on_diff_ref_seqs);\n\n                  thread_num_mappings +=\n                      std::min(mapping_metadata.GetNumBestMappings(),\n                               mapping_parameters_.max_num_best_mappings);\n                  ++thread_num_mapped_reads;\n\n                  if (mapping_metadata.GetNumBestMappings() == 1) {\n                    ++thread_num_uniquely_mapped_reads;\n                  }\n                }\n              }\n            }\n          }\n#pragma omp taskwait\n          for (uint32_t read_index = 0; read_index < history_update_threshold;\n               ++read_index) {\n            if (mm_history[read_index].timestamp != num_reads_) continue;\n            mm_to_candidates_cache.Update(\n                mm_history[read_index].minimizers,\n                mm_history[read_index].positive_candidates,\n                mm_history[read_index].negative_candidates,\n                mm_history[read_index].repetitive_seed_length);\n            if (mm_history[read_index].positive_candidates.size() <\n                mm_history[read_index].positive_candidates.capacity() / 2) {\n              std::vector<Candidate>().swap(\n                  mm_history[read_index].positive_candidates);\n            }\n            if (mm_history[read_index].negative_candidates.size() <\n                mm_history[read_index].negative_candidates.capacity() / 2) {\n              std::vector<Candidate>().swap(\n                  mm_history[read_index].negative_candidates);\n            }\n          }\n          // std::cerr<<\"cache memusage: \" <<\n          // mm_to_candidates_cache.GetMemoryBytes() <<\"\\n\" ;\n          if (!mapping_parameters_.summary_metadata_file_path.empty()) {\n            if (mapping_parameters_.is_bulk_data) \n              mapping_writer.UpdateSummaryMetadata(0, 
SUMMARY_METADATA_TOTAL, \n                  num_loaded_reads) ;\n            else {\n              uint32_t nonwhitelist_count = 0;\n              for (uint32_t read_index = 0; read_index < num_loaded_reads; ++read_index)\n                if (read_map_summary[read_index] & 1) {\n                  mapping_writer.UpdateSummaryMetadata(\n                      barcode_batch.GenerateSeedFromSequenceAt(read_index, 0, barcode_length_), \n                      SUMMARY_METADATA_TOTAL, 1);\n                } else {\n                  ++nonwhitelist_count;\n                }\n              \n              mapping_writer.UpdateSpeicalCategorySummaryMetadata(/*nonwhitelist*/0, \n                  SUMMARY_METADATA_TOTAL, nonwhitelist_count);\n            }\n\n            // By default, set the lowest bit to 1 (whether the barcode is in the whitelist)\n            memset(read_map_summary, 1, sizeof(*read_map_summary)*read_batch_size_);\n          }\n          num_loaded_reads = num_loaded_reads_for_loading;\n          read_batch_for_loading.SwapSequenceBatch(read_batch);\n          barcode_batch_for_loading.SwapSequenceBatch(barcode_batch);\n          mappings_on_diff_ref_seqs_for_diff_threads.swap(\n              mappings_on_diff_ref_seqs_for_diff_threads_for_saving);\n#pragma omp task\n          {\n            num_mappings_in_mem +=\n                mapping_processor.MoveMappingsInBuffersToMappingContainer(\n                    num_reference_sequences,\n                    mappings_on_diff_ref_seqs_for_diff_threads_for_saving,\n                    mappings_on_diff_ref_seqs);\n            if (mapping_parameters_.low_memory_mode &&\n                num_mappings_in_mem > max_num_mappings_in_mem) {\n              mapping_processor.ParallelSortOutputMappings(num_reference_sequences,\n                                                   mappings_on_diff_ref_seqs, 0);\n\n              mapping_writer.OutputTempMappings(num_reference_sequences,\n                                            
    mappings_on_diff_ref_seqs,\n                                                temp_mapping_file_handles);\n\n              if (temp_mapping_file_handles.size() > 850\n                  && temp_mapping_file_handles.size() % 10 == 1) { // every 10 temp files, double the temp file size\n                max_num_mappings_in_mem <<= 1;\n                std::cerr << \"Used \" << temp_mapping_file_handles.size() << \" temp files. Double the temp file volume to \" << max_num_mappings_in_mem << \"\\n\" ;\n              }\n              num_mappings_in_mem = 0;\n            }\n          }\n          std::cerr << \"Mapped \" << num_loaded_reads << \" reads in \"\n                    << GetRealTime() - real_batch_start_time << \"s.\\n\";\n        }\n      }  // end of openmp single\n      {\n        num_barcode_in_whitelist_ += thread_num_barcode_in_whitelist;\n        num_corrected_barcode_ += thread_num_corrected_barcode;\n        num_candidates_ += thread_num_candidates;\n        num_mappings_ += thread_num_mappings;\n        num_mapped_reads_ += thread_num_mapped_reads;\n        num_uniquely_mapped_reads_ += thread_num_uniquely_mapped_reads;\n      }  // end of updating shared mapping stats\n    }    // end of openmp parallel region\n    read_batch_for_loading.FinalizeLoading();\n    if (!mapping_parameters_.is_bulk_data) {\n      barcode_batch_for_loading.FinalizeLoading();\n    }\n  }\n\n  std::cerr << \"Mapped all reads in \" << GetRealTime() - real_start_mapping_time\n            << \"s.\\n\";\n\n  delete[] mm_history;\n  if (read_map_summary != NULL)\n    delete[] read_map_summary;\n\n  OutputMappingStatistics();\n  if (!mapping_parameters_.is_bulk_data) {\n    OutputBarcodeStatistics();\n  }\n\n  index.Destroy();\n\n  if (mapping_parameters_.low_memory_mode) {\n    // First, process the remaining mappings in the memory and save them on\n    // disk.\n    if (num_mappings_in_mem > 0) {\n      mapping_processor.SortOutputMappings(num_reference_sequences,\n              
                             mappings_on_diff_ref_seqs);\n\n      mapping_writer.OutputTempMappings(num_reference_sequences,\n                                        mappings_on_diff_ref_seqs,\n                                        temp_mapping_file_handles);\n      num_mappings_in_mem = 0;\n    }\n\n    mapping_writer.ProcessAndOutputMappingsInLowMemory(\n        num_mappings_in_mem, num_reference_sequences, reference,\n        barcode_whitelist_lookup_table_, temp_mapping_file_handles);\n  } else {\n    if (mapping_parameters_.Tn5_shift) {\n      mapping_processor.ApplyTn5ShiftOnMappings(num_reference_sequences,\n                                                mappings_on_diff_ref_seqs);\n    }\n\n    if (mapping_parameters_.remove_pcr_duplicates) {\n      mapping_processor.RemovePCRDuplicate(num_reference_sequences,\n                                           mappings_on_diff_ref_seqs,\n                                           mapping_parameters_.num_threads);\n      std::cerr << \"After removing PCR duplications, \";\n      mapping_processor.OutputMappingStatistics(num_reference_sequences,\n                                                mappings_on_diff_ref_seqs);\n    } else {\n      mapping_processor.ParallelSortOutputMappings(num_reference_sequences,\n                                           mappings_on_diff_ref_seqs,\n                                           mapping_parameters_.num_threads);\n    }\n\n    if (mapping_parameters_.allocate_multi_mappings) {\n      const uint64_t num_multi_mappings =\n          num_mapped_reads_ - num_uniquely_mapped_reads_;\n      mapping_processor.AllocateMultiMappings(\n          num_reference_sequences, num_multi_mappings,\n          mapping_parameters_.multi_mapping_allocation_distance,\n          mappings_on_diff_ref_seqs);\n      std::cerr << \"After allocating multi-mappings, \";\n      mapping_processor.OutputMappingStatistics(num_reference_sequences,\n                                                
mappings_on_diff_ref_seqs);\n      mapping_processor.SortOutputMappings(num_reference_sequences,\n                                           mappings_on_diff_ref_seqs);\n    }\n    mapping_writer.OutputMappings(num_reference_sequences, reference,\n                                  mappings_on_diff_ref_seqs);\n  }\n  mapping_writer.OutputSummaryMetadata();\n\n  reference.FinalizeLoading();\n  std::cerr << \"Total time: \" << GetRealTime() - real_start_time << \"s.\\n\";\n}\n\ntemplate <typename MappingRecord>\nvoid Chromap::MapPairedEndReads() {\n  double real_start_time = GetRealTime();\n\n  // Load reference\n  SequenceBatch reference;\n  reference.InitializeLoading(mapping_parameters_.reference_file_path);\n  reference.LoadAllSequences();\n  uint32_t num_reference_sequences = reference.GetNumSequences();\n  \n  // Debugging Info (printing out reference information)\n  if (mapping_parameters_.debug_cache) {\n    for (size_t i = 0; i < num_reference_sequences; i++){\n      std::cout << \"[DEBUG][INDEX] seq_i = \" << i \n                << \" , seq_i_name = \" << reference.GetSequenceNameAt(i) << std::endl;\n    }\n  }\n  \n  if (mapping_parameters_.custom_rid_order_file_path.length() > 0) {\n    GenerateCustomRidRanks(mapping_parameters_.custom_rid_order_file_path,\n                           num_reference_sequences, reference,\n                           custom_rid_rank_);\n    reference.ReorderSequences(custom_rid_rank_);\n  }\n  if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_PAIRS) {\n    GenerateCustomRidRanks(\n        mapping_parameters_.pairs_flipping_custom_rid_order_file_path,\n        num_reference_sequences, reference, pairs_custom_rid_rank_);\n  }\n\n  // Load index\n  Index index(mapping_parameters_.index_file_path);\n  index.Load();\n  const int kmer_size = index.GetKmerSize();\n  const int window_size = index.GetWindowSize();\n  // index.Statistics(num_sequences, reference);\n\n  // Initialize read batches\n  SequenceBatch 
read_batch1(read_batch_size_, read1_effective_range_);\n  SequenceBatch read_batch2(read_batch_size_, read2_effective_range_);\n  SequenceBatch barcode_batch(read_batch_size_, barcode_effective_range_);\n  SequenceBatch read_batch1_for_loading(read_batch_size_,\n                                        read1_effective_range_);\n  SequenceBatch read_batch2_for_loading(read_batch_size_,\n                                        read2_effective_range_);\n  SequenceBatch barcode_batch_for_loading(read_batch_size_,\n                                          barcode_effective_range_);\n\n  // Check cache-related parameters\n  std::cerr << \"Cache Size: \" << mapping_parameters_.cache_size << std::endl;\n  std::cerr << \"Cache Update Param: \" << mapping_parameters_.cache_update_param << std::endl;\n  \n  std::vector<uint64_t> seeds_for_batch(500000, 0);\n\n  // Variables used for counting number of associated cache slots\n  bool output_num_cache_slots_info = mapping_parameters_.output_num_uniq_cache_slots;\n  if (mapping_parameters_.summary_metadata_file_path.empty()) {\n    output_num_cache_slots_info = false;\n  }\n  const size_t k_for_minhash = mapping_parameters_.k_for_minhash;\n\n  std::cerr << \"Output number of associated cache slots: \" << output_num_cache_slots_info << std::endl;\n  std::cerr << \"K for MinHash: \" << k_for_minhash << std::endl;\n\n  int num_locks_for_map = 1000;\n  omp_lock_t map_locks[num_locks_for_map];\n  for (int i = 0; i < num_locks_for_map; ++i) {omp_init_lock(&map_locks[i]);}\n  \n  std::vector<std::unordered_map<size_t, K_MinHash>> barcode_peak_map(num_locks_for_map);\n\n  // Parse out the parameters for chromap score (const, fric, dup, unmapped, lowmapq)\n  std::vector<double> frip_est_params; \n  std::stringstream ss(mapping_parameters_.frip_est_params);\n  std::string token;\n\n  while(std::getline(ss, token, ';')) {\n    try {\n      auto curr_param = std::stod(token);\n      frip_est_params.push_back(curr_param);\n    } catch(...) 
{\n      chromap::ExitWithMessage(\n        \"\\nException occurred while processing chromap score parameters\\n\"\n        );\n    }\n  }\n  if (frip_est_params.size() != 5) {\n    chromap::ExitWithMessage(\n      \"\\nInvalid number of parameters, expecting 5 parameters but found \" \n      + std::to_string(frip_est_params.size()) \n      + \" parameters\\n\"\n      );\n  }\n\n  // Initialize vector to keep track of cache hits for each thread\n  std::vector<int> cache_hits_per_thread(mapping_parameters_.num_threads, 0);\n\n  // Initialize cache\n  mm_cache mm_to_candidates_cache(mapping_parameters_.cache_size);\n  mm_to_candidates_cache.SetKmerLength(kmer_size);\n\n  struct _mm_history *mm_history1 = new struct _mm_history[read_batch_size_];\n  struct _mm_history *mm_history2 = new struct _mm_history[read_batch_size_];\n  \n  // The explanation for read_map_summary is in the single-end mapping function\n  uint8_t *read_map_summary = NULL ;\n  if (!mapping_parameters_.summary_metadata_file_path.empty()) {\n    read_map_summary = new uint8_t[read_batch_size_];\n    memset(read_map_summary, 1, sizeof(*read_map_summary)*read_batch_size_);\n  }\n  std::vector<std::vector<MappingRecord>> mappings_on_diff_ref_seqs;\n  \n  // Initialize mapping container\n  mappings_on_diff_ref_seqs.reserve(num_reference_sequences);\n  for (uint32_t i = 0; i < num_reference_sequences; ++i) {\n    mappings_on_diff_ref_seqs.emplace_back(std::vector<MappingRecord>());\n  }\n  std::vector<TempMappingFileHandle<MappingRecord>> temp_mapping_file_handles;\n\n  // Preprocess barcodes for single cell data\n  if (!mapping_parameters_.is_bulk_data) {\n    barcode_length_ = SampleInputBarcodesAndExamineLength();\n    if (!mapping_parameters_.barcode_whitelist_file_path.empty()) {\n      LoadBarcodeWhitelist();\n      ComputeBarcodeAbundance(initial_num_sample_barcodes_);\n    }\n  }\n\n  MinimizerGenerator minimizer_generator(kmer_size, window_size);\n\n  CandidateProcessor candidate_processor(\n    
  mapping_parameters_.min_num_seeds_required_for_mapping,\n      mapping_parameters_.max_seed_frequencies);\n\n  MappingProcessor<MappingRecord> mapping_processor(mapping_parameters_,\n                                                    min_unique_mapping_mapq_);\n\n  DraftMappingGenerator draft_mapping_generator(mapping_parameters_);\n\n  MappingGenerator<MappingRecord> mapping_generator(mapping_parameters_,\n                                                    pairs_custom_rid_rank_);\n\n  MappingWriter<MappingRecord> mapping_writer(\n      mapping_parameters_, barcode_length_, pairs_custom_rid_rank_);\n  mapping_writer.OutputHeader(num_reference_sequences, reference);\n\n  uint32_t num_mappings_in_mem = 0;\n  uint64_t max_num_mappings_in_mem =\n      1 * ((uint64_t)1 << 30) / sizeof(MappingRecord);\n  if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_SAM ||\n      mapping_parameters_.mapping_output_format == MAPPINGFORMAT_PAF ||\n      mapping_parameters_.mapping_output_format == MAPPINGFORMAT_PAIRS) {\n    max_num_mappings_in_mem = 1 * ((uint64_t)1 << 29) / sizeof(MappingRecord);\n  }\n\n  static uint64_t thread_num_candidates = 0;\n  static uint64_t thread_num_mappings = 0;\n  static uint64_t thread_num_mapped_reads = 0;\n  static uint64_t thread_num_uniquely_mapped_reads = 0;\n  static uint64_t thread_num_barcode_in_whitelist = 0;\n  static uint64_t thread_num_corrected_barcode = 0;\n#pragma omp threadprivate(                                               \\\n    thread_num_candidates, thread_num_mappings, thread_num_mapped_reads, \\\n    thread_num_uniquely_mapped_reads, thread_num_barcode_in_whitelist,   \\\n    thread_num_corrected_barcode)\n  double real_start_mapping_time = GetRealTime();\n  for (size_t read_file_index = 0;\n       read_file_index < mapping_parameters_.read_file1_paths.size();\n       ++read_file_index) {\n    // Set read batches to the current read files.\n    read_batch1_for_loading.InitializeLoading(\n        
mapping_parameters_.read_file1_paths[read_file_index]);\n    read_batch2_for_loading.InitializeLoading(\n        mapping_parameters_.read_file2_paths[read_file_index]);\n    if (!mapping_parameters_.is_bulk_data) {\n      barcode_batch_for_loading.InitializeLoading(\n          mapping_parameters_.barcode_file_paths[read_file_index]);\n    }\n\n    // Load the first batches.\n    uint32_t num_loaded_pairs_for_loading = 0;\n    uint32_t num_loaded_pairs = LoadPairedEndReadsWithBarcodes(\n        read_batch1_for_loading, read_batch2_for_loading,\n        barcode_batch_for_loading, mapping_parameters_.num_threads >= 3 ? true : false);\n    read_batch1_for_loading.SwapSequenceBatch(read_batch1);\n    read_batch2_for_loading.SwapSequenceBatch(read_batch2);\n    if (!mapping_parameters_.is_bulk_data) {\n      barcode_batch_for_loading.SwapSequenceBatch(barcode_batch);\n    }\n\n    // Setup thread private vectors to save mapping results.\n    std::vector<std::vector<std::vector<MappingRecord>>>\n        mappings_on_diff_ref_seqs_for_diff_threads;\n    std::vector<std::vector<std::vector<MappingRecord>>>\n        mappings_on_diff_ref_seqs_for_diff_threads_for_saving;\n    mappings_on_diff_ref_seqs_for_diff_threads.reserve(\n        mapping_parameters_.num_threads);\n    mappings_on_diff_ref_seqs_for_diff_threads_for_saving.reserve(\n        mapping_parameters_.num_threads);\n    for (int ti = 0; ti < mapping_parameters_.num_threads; ++ti) {\n      mappings_on_diff_ref_seqs_for_diff_threads.emplace_back(\n          std::vector<std::vector<MappingRecord>>(num_reference_sequences));\n      mappings_on_diff_ref_seqs_for_diff_threads_for_saving.emplace_back(\n          std::vector<std::vector<MappingRecord>>(num_reference_sequences));\n      for (uint32_t i = 0; i < num_reference_sequences; ++i) {\n        mappings_on_diff_ref_seqs_for_diff_threads[ti][i].reserve(\n            (num_loaded_pairs + num_loaded_pairs / 1000 *\n                                    
mapping_parameters_.max_num_best_mappings) /\n            mapping_parameters_.num_threads / num_reference_sequences);\n        mappings_on_diff_ref_seqs_for_diff_threads_for_saving[ti][i].reserve(\n            (num_loaded_pairs + num_loaded_pairs / 1000 *\n                                    mapping_parameters_.max_num_best_mappings) /\n            mapping_parameters_.num_threads / num_reference_sequences);\n      }\n    }\n\n#pragma omp parallel shared(num_reads_, num_reference_sequences, reference, index, read_batch1, read_batch2, barcode_batch, read_batch1_for_loading, read_batch2_for_loading, barcode_batch_for_loading, minimizer_generator, candidate_processor, mapping_processor, draft_mapping_generator, mapping_generator, mapping_writer, std::cerr, num_loaded_pairs_for_loading, num_loaded_pairs, mappings_on_diff_ref_seqs_for_diff_threads, mappings_on_diff_ref_seqs_for_diff_threads_for_saving, mappings_on_diff_ref_seqs, num_mappings_in_mem, max_num_mappings_in_mem, temp_mapping_file_handles, mm_to_candidates_cache, mm_history1, mm_history2, read_map_summary) num_threads(mapping_parameters_.num_threads) reduction(+:num_candidates_, num_mappings_, num_mapped_reads_, num_uniquely_mapped_reads_, num_barcode_in_whitelist_, num_corrected_barcode_)\n    {\n      thread_num_candidates = 0;\n      thread_num_mappings = 0;\n      thread_num_mapped_reads = 0;\n      thread_num_uniquely_mapped_reads = 0;\n      thread_num_barcode_in_whitelist = 0;\n      thread_num_corrected_barcode = 0;\n      PairedEndMappingMetadata paired_end_mapping_metadata;\n\n      std::vector<int> best_mapping_indices(\n          mapping_parameters_.max_num_best_mappings);\n      std::mt19937 generator(11);\n#pragma omp single\n      {\n        double real_batch_start_time = GetRealTime();\n        while (num_loaded_pairs > 0) {\n          num_reads_ += num_loaded_pairs;\n          num_reads_ += num_loaded_pairs;\n\n#pragma omp task\n          {\n            num_loaded_pairs_for_loading = 
LoadPairedEndReadsWithBarcodes(\n                read_batch1_for_loading, read_batch2_for_loading,\n                barcode_batch_for_loading, \n                mapping_parameters_.num_threads >= 12 ? true : false);\n          }  // end of openmp loading task\n\n          int grain_size = 5000;\n          uint32_t history_update_threshold =\n          mm_to_candidates_cache.GetUpdateThreshold(num_loaded_pairs,\n                                                    num_reads_, \n                                                    true,\n                                                    mapping_parameters_.cache_update_param\n                                                    );\n          std::fill(cache_hits_per_thread.begin(), cache_hits_per_thread.end(), 0);\n\n          if (mapping_parameters_.debug_cache) {\n            std::cout << \"[DEBUG][UPDATE] update_threshold = \" << history_update_threshold << std::endl;\n          }\n\n#pragma omp taskloop grainsize(grain_size)\n          for (uint32_t pair_index = 0; pair_index < num_loaded_pairs;\n               ++pair_index) {\n            int thread_id = omp_get_thread_num();\n            \n            bool current_barcode_is_whitelisted = true;\n            if (!mapping_parameters_.barcode_whitelist_file_path.empty()) {\n              current_barcode_is_whitelisted = CorrectBarcodeAt(\n                  pair_index, barcode_batch, thread_num_barcode_in_whitelist,\n                  thread_num_corrected_barcode);\n            }\n\n            // calculate seed value for each barcode to use later (below and summary update)\n            size_t curr_seed_val = barcode_batch.GenerateSeedFromSequenceAt(pair_index, 0, barcode_length_);\n            seeds_for_batch[pair_index] = curr_seed_val;\n\n            if (current_barcode_is_whitelisted ||\n                mapping_parameters_.output_mappings_not_in_whitelist) {\n              \n              if (read_batch1.GetSequenceLengthAt(pair_index) <\n                  
(uint32_t)mapping_parameters_.min_read_length ||\n                  read_batch2.GetSequenceLengthAt(pair_index) <\n                  (uint32_t)mapping_parameters_.min_read_length) {\n                continue;  // reads are too short, just drop.\n              }\n\n              read_batch1.PrepareNegativeSequenceAt(pair_index);\n              read_batch2.PrepareNegativeSequenceAt(pair_index);\n\n              if (mapping_parameters_.trim_adapters) {\n                TrimAdapterForPairedEndRead(pair_index, read_batch1,\n                                            read_batch2);\n              }\n\n              paired_end_mapping_metadata.PreparedForMappingNextReadPair(\n                  mapping_parameters_.max_seed_frequencies[0]);\n\n              minimizer_generator.GenerateMinimizers(\n                  read_batch1, pair_index,\n                  paired_end_mapping_metadata.mapping_metadata1_.minimizers_);\n              minimizer_generator.GenerateMinimizers(\n                  read_batch2, pair_index,\n                  paired_end_mapping_metadata.mapping_metadata2_.minimizers_);\n\n              if (paired_end_mapping_metadata.BothEndsHaveMinimizers()) {\n                // declare temp local variable for cache result\n                int cache_query_result1 = 0;\n                int cache_query_result2 = 0;\n                int cache_miss = 0;\n\n                cache_query_result1 = mm_to_candidates_cache.Query(paired_end_mapping_metadata.mapping_metadata1_,\n                                                                  read_batch1.GetSequenceLengthAt(pair_index));\n                if (cache_query_result1 == -1) \n                {\n                  candidate_processor.GenerateCandidates(\n                      mapping_parameters_.error_threshold, \n                      index,\n                      paired_end_mapping_metadata.mapping_metadata1_\n                      );\n                  ++cache_miss;\n                }\n                size_t 
current_num_candidates1 = paired_end_mapping_metadata.mapping_metadata1_.GetNumCandidates();\n\n\n                cache_query_result2 = mm_to_candidates_cache.Query(paired_end_mapping_metadata.mapping_metadata2_,\n                                                                  read_batch2.GetSequenceLengthAt(pair_index));\n                if (cache_query_result2 == -1) \n                {\n                  candidate_processor.GenerateCandidates(\n                      mapping_parameters_.error_threshold, \n                      index,\n                      paired_end_mapping_metadata.mapping_metadata2_\n                      );\n                  ++cache_miss;\n                }\n                size_t current_num_candidates2 = paired_end_mapping_metadata.mapping_metadata2_.GetNumCandidates();\n\n                // increment variable for cache_hits\n                bool curr_read_hit_cache = false;\n                if (cache_query_result1 >= 0 || cache_query_result2 >= 0) {\n                  cache_hits_per_thread[thread_id]++;\n                  curr_read_hit_cache = true;\n                }\n\n                // update the peak counting data-structure\n                if (output_num_cache_slots_info && curr_read_hit_cache) {\n                  // calculate which map this barcode is in\n                  size_t map_id = curr_seed_val % num_locks_for_map;\n                \n                  // grab lock for this map, and add to the K-MinHash for this particular barcode\n                  omp_set_lock(&map_locks[map_id]);\n                  auto it = barcode_peak_map[map_id].emplace(curr_seed_val, K_MinHash(k_for_minhash, mapping_parameters_.cache_size)).first;\n                  if (cache_query_result1 >= 0) {it->second.add(cache_query_result1);}\n                  if (cache_query_result2 >= 0) {it->second.add(cache_query_result2);}\n                  omp_unset_lock(&map_locks[map_id]);\n                }\n\n                if (pair_index < 
history_update_threshold) {\n                  mm_history1[pair_index].timestamp =\n                      mm_history2[pair_index].timestamp = num_reads_;\n                  mm_history1[pair_index].minimizers =\n                      paired_end_mapping_metadata.mapping_metadata1_\n                          .minimizers_;\n                  mm_history1[pair_index].positive_candidates =\n                      paired_end_mapping_metadata.mapping_metadata1_\n                          .positive_candidates_;\n                  mm_history1[pair_index].negative_candidates =\n                      paired_end_mapping_metadata.mapping_metadata1_\n                          .negative_candidates_;\n                  mm_history1[pair_index].repetitive_seed_length =\n                      paired_end_mapping_metadata.mapping_metadata1_\n                          .repetitive_seed_length_;\n                  mm_history2[pair_index].minimizers =\n                      paired_end_mapping_metadata.mapping_metadata2_\n                          .minimizers_;\n                  mm_history2[pair_index].positive_candidates =\n                      paired_end_mapping_metadata.mapping_metadata2_\n                          .positive_candidates_;\n                  mm_history2[pair_index].negative_candidates =\n                      paired_end_mapping_metadata.mapping_metadata2_\n                          .negative_candidates_;\n                  mm_history2[pair_index].repetitive_seed_length =\n                      paired_end_mapping_metadata.mapping_metadata2_\n                          .repetitive_seed_length_;\n                }\n\n                // Test whether we need to augment the candidate list with mate\n                // information.\n                int supplementCandidateResult = 0;\n                if (!mapping_parameters_.split_alignment) {\n                  supplementCandidateResult =\n                      candidate_processor.SupplementCandidates(\n                          
mapping_parameters_.error_threshold,\n                          /*search_range=*/2 *\n                              mapping_parameters_.max_insert_size,\n                          index, paired_end_mapping_metadata);\n                  current_num_candidates1 =\n                      paired_end_mapping_metadata.mapping_metadata1_\n                          .GetNumCandidates();\n                  current_num_candidates2 =\n                      paired_end_mapping_metadata.mapping_metadata2_\n                          .GetNumCandidates();\n                }\n\n                if (current_num_candidates1 > 0 &&\n                    current_num_candidates2 > 0 &&\n                    !mapping_parameters_.split_alignment) {\n                  paired_end_mapping_metadata.MoveCandidiatesToBuffer();\n\n                  // Paired-end filter\n                  candidate_processor.ReduceCandidatesForPairedEndRead(\n                      mapping_parameters_.max_insert_size,\n                      paired_end_mapping_metadata);\n\n                  current_num_candidates1 =\n                      paired_end_mapping_metadata.mapping_metadata1_\n                          .GetNumCandidates();\n                  current_num_candidates2 =\n                      paired_end_mapping_metadata.mapping_metadata2_\n                          .GetNumCandidates();\n                }\n\n                // Verify candidates\n                if (current_num_candidates1 > 0 &&\n                    current_num_candidates2 > 0) {\n                  thread_num_candidates +=\n                      current_num_candidates1 + current_num_candidates2;\n\n                  if (mapping_parameters_.custom_rid_order_file_path.length() >\n                      0) {\n                    RerankCandidatesRid(\n                        paired_end_mapping_metadata.mapping_metadata1_\n                            .positive_candidates_);\n                    RerankCandidatesRid(\n                        
paired_end_mapping_metadata.mapping_metadata1_\n                            .negative_candidates_);\n                    RerankCandidatesRid(\n                        paired_end_mapping_metadata.mapping_metadata2_\n                            .positive_candidates_);\n                    RerankCandidatesRid(\n                        paired_end_mapping_metadata.mapping_metadata2_\n                            .negative_candidates_);\n                  }\n\n                  draft_mapping_generator.GenerateDraftMappings(\n                      read_batch1, pair_index, reference,\n                      paired_end_mapping_metadata.mapping_metadata1_);\n\n                  const size_t current_num_draft_mappings1 =\n                      paired_end_mapping_metadata.mapping_metadata1_\n                          .GetNumDraftMappings();\n\n                  draft_mapping_generator.GenerateDraftMappings(\n                      read_batch2, pair_index, reference,\n                      paired_end_mapping_metadata.mapping_metadata2_);\n\n                  const size_t current_num_draft_mappings2 =\n                      paired_end_mapping_metadata.mapping_metadata2_\n                          .GetNumDraftMappings();\n\n                  if (current_num_draft_mappings1 > 0 &&\n                      current_num_draft_mappings2 > 0) {\n                    std::vector<std::vector<MappingRecord>>\n                        &mappings_on_diff_ref_seqs =\n                            mappings_on_diff_ref_seqs_for_diff_threads\n                                [omp_get_thread_num()];\n\n                    if (!mapping_parameters_.split_alignment) {\n                      // GenerateBestMappingsForPairedEndRead assumes the\n                      // mappings are sorted by coordinate for non split\n                      // alignments. 
In split alignment, we don't want to sort\n                      // and this keeps mapping and split_sites vectors\n                      // consistent.\n                      paired_end_mapping_metadata.SortMappingsByPositions();\n                    }\n\n                    int force_mapq = -1;\n                    if (supplementCandidateResult != 0) {\n                      force_mapq = 0;\n                    }\n\n                    mapping_generator.GenerateBestMappingsForPairedEndRead(\n                        pair_index, read_batch1, read_batch2, barcode_batch,\n                        reference, best_mapping_indices, generator, force_mapq,\n                        paired_end_mapping_metadata, mappings_on_diff_ref_seqs);\n\n                    if (paired_end_mapping_metadata.GetNumBestMappings() == 1) {\n                      ++thread_num_uniquely_mapped_reads;\n                      ++thread_num_uniquely_mapped_reads;\n                    }\n\n                    thread_num_mappings += std::min(\n                        paired_end_mapping_metadata.GetNumBestMappings(),\n                        mapping_parameters_.max_num_best_mappings);\n                    thread_num_mappings += std::min(\n                        paired_end_mapping_metadata.GetNumBestMappings(),\n                        mapping_parameters_.max_num_best_mappings);\n                    if (paired_end_mapping_metadata.GetNumBestMappings() > 0) {\n                      ++thread_num_mapped_reads;\n                      ++thread_num_mapped_reads;\n\n                      if (read_map_summary != NULL)\n                        read_map_summary[pair_index] |= (cache_miss < 2 ? 
2 : 0) ;\n                    }\n                  }\n                }  // verify candidate\n              }\n            } else {\n              if (read_map_summary != NULL)\n                read_map_summary[pair_index] = 0 ;\n            }\n          }  // end of for pair_index\n\n          // if (num_reads_ / 2 > initial_num_sample_barcodes_) {\n          //  if (!is_bulk_data_) {\n          //    if (!barcode_whitelist_file_path_.empty()) {\n          //      UpdateBarcodeAbundance(num_loaded_pairs, barcode_batch);\n          //    }\n          //  }\n          //}\n#pragma omp taskloop grainsize( std::max(history_update_threshold / mapping_parameters_.num_threads, (unsigned int)grain_size) )\n          // Update cache\n          for (uint32_t pair_index = 0; pair_index < history_update_threshold;\n               ++pair_index) {\n            if (mm_history1[pair_index].timestamp != num_reads_) continue;\n\n            mm_to_candidates_cache.Update(\n                mm_history1[pair_index].minimizers,\n                mm_history1[pair_index].positive_candidates,\n                mm_history1[pair_index].negative_candidates,\n                mm_history1[pair_index].repetitive_seed_length,\n                mapping_parameters_.debug_cache);\n            mm_to_candidates_cache.Update(\n                mm_history2[pair_index].minimizers,\n                mm_history2[pair_index].positive_candidates,\n                mm_history2[pair_index].negative_candidates,\n                mm_history2[pair_index].repetitive_seed_length,\n                mapping_parameters_.debug_cache);\n\n            if (mm_history1[pair_index].positive_candidates.size() > 50) {\n              std::vector<Candidate>().swap(\n                  mm_history1[pair_index].positive_candidates);\n            }\n            if (mm_history1[pair_index].negative_candidates.size() > 50) {\n              std::vector<Candidate>().swap(\n                  mm_history1[pair_index].negative_candidates);\n         
   }\n            if (mm_history2[pair_index].positive_candidates.size() > 50) {\n              std::vector<Candidate>().swap(\n                  mm_history2[pair_index].positive_candidates);\n            }\n            if (mm_history2[pair_index].negative_candidates.size() > 50) {\n              std::vector<Candidate>().swap(\n                  mm_history2[pair_index].negative_candidates);\n            }\n          }\n\n#pragma omp taskwait\n          if (!mapping_parameters_.summary_metadata_file_path.empty()) {\n            // Update total read count and number of cache hits\n            if (mapping_parameters_.is_bulk_data) {\n              // Sum up cache hits for each thread\n              int cache_hits_for_batch = 0;\n              for (int hits: cache_hits_per_thread) {\n                cache_hits_for_batch += hits;\n              }\n              mapping_writer.UpdateSummaryMetadata(0, \n                                                   SUMMARY_METADATA_TOTAL, \n                                                   num_loaded_pairs);\n              mapping_writer.UpdateSummaryMetadata(0,\n                                                   SUMMARY_METADATA_CACHEHIT, \n                                                   cache_hits_for_batch);\n            }\n            else {\n              uint32_t nonwhitelist_count = 0;\n              for (uint32_t pair_index = 0; pair_index < num_loaded_pairs; ++pair_index) {\n                uint64_t pair_seed = seeds_for_batch[pair_index];\n                if (read_map_summary[pair_index] & 1) {\n                  mapping_writer.UpdateSummaryMetadata(\n                                            pair_seed, \n                                            SUMMARY_METADATA_TOTAL, \n                                            1);\n                } else {\n                  ++nonwhitelist_count ;\n                }\n\n                if (read_map_summary[pair_index] & 2) {\n                  
mapping_writer.UpdateSummaryMetadata( \n                                            pair_seed,\n                                            SUMMARY_METADATA_CACHEHIT, \n                                            1);\n                }\n              }\n              mapping_writer.UpdateSpeicalCategorySummaryMetadata(/*nonwhitelist*/0, \n                  SUMMARY_METADATA_TOTAL, nonwhitelist_count);\n            }  \n\n            memset(read_map_summary, 1, sizeof(*read_map_summary)*read_batch_size_);\n          }\n\n          std::cerr << \"Mapped \" << num_loaded_pairs << \" read pairs in \"\n                    << GetRealTime() - real_batch_start_time << \"s.\\n\";\n          real_batch_start_time = GetRealTime();\n\n          // Swap to next batch\n          num_loaded_pairs = num_loaded_pairs_for_loading;\n          read_batch1_for_loading.SwapSequenceBatch(read_batch1);\n          read_batch2_for_loading.SwapSequenceBatch(read_batch2);\n          barcode_batch_for_loading.SwapSequenceBatch(barcode_batch);\n          mappings_on_diff_ref_seqs_for_diff_threads.swap(\n              mappings_on_diff_ref_seqs_for_diff_threads_for_saving);\n\n          // Reset for next batch\n          std::fill(seeds_for_batch.begin(), seeds_for_batch.end(), 0);\n\n#pragma omp task\n          {\n            // Handle output\n            num_mappings_in_mem +=\n                mapping_processor.MoveMappingsInBuffersToMappingContainer(\n                    num_reference_sequences,\n                    mappings_on_diff_ref_seqs_for_diff_threads_for_saving,\n                    mappings_on_diff_ref_seqs);\n            if (mapping_parameters_.low_memory_mode &&\n                num_mappings_in_mem > max_num_mappings_in_mem) {\n              mapping_processor.ParallelSortOutputMappings(num_reference_sequences,\n                                                   mappings_on_diff_ref_seqs, 0);\n\n              mapping_writer.OutputTempMappings(num_reference_sequences,\n                 
                               mappings_on_diff_ref_seqs,\n                                                temp_mapping_file_handles);\n              if (temp_mapping_file_handles.size() > 850\n                  && temp_mapping_file_handles.size() % 10 == 1) { // every 10 temp files, double the temp file size\n                max_num_mappings_in_mem <<= 1;\n                std::cerr << \"Used \" << temp_mapping_file_handles.size() << \" temp files. Double the temp file volume to \" << max_num_mappings_in_mem << \"\\n\" ;\n              }\n              num_mappings_in_mem = 0;\n            }\n          }  // end of omp task to handle output\n        }    // end of while num_loaded_pairs\n      }      // end of openmp single\n\n      num_barcode_in_whitelist_ += thread_num_barcode_in_whitelist;\n      num_corrected_barcode_ += thread_num_corrected_barcode;\n      num_candidates_ += thread_num_candidates;\n      num_mappings_ += thread_num_mappings;\n      num_mapped_reads_ += thread_num_mapped_reads;\n      num_uniquely_mapped_reads_ += thread_num_uniquely_mapped_reads;\n    }  // end of openmp parallel region\n\n    read_batch1_for_loading.FinalizeLoading();\n    read_batch2_for_loading.FinalizeLoading();\n\n    if (!mapping_parameters_.is_bulk_data) {\n      barcode_batch_for_loading.FinalizeLoading();\n    }\n  }  // end of for read_file_index\n\n  std::cerr << \"Mapped all reads in \" << GetRealTime() - real_start_mapping_time\n            << \"s.\\n\";\n\n  delete[] mm_history1;\n  delete[] mm_history2;\n  if (read_map_summary != NULL)\n    delete[] read_map_summary;\n\n  OutputMappingStatistics();\n  if (!mapping_parameters_.is_bulk_data) {\n    OutputBarcodeStatistics();\n  }\n\n  index.Destroy();\n\n  if (mapping_parameters_.low_memory_mode) {\n    // First, process the remaining mappings in the memory and save them on\n    // disk.\n    if (num_mappings_in_mem > 0) {\n      mapping_processor.SortOutputMappings(num_reference_sequences,\n                       
                    mappings_on_diff_ref_seqs);\n\n      mapping_writer.OutputTempMappings(num_reference_sequences,\n                                        mappings_on_diff_ref_seqs,\n                                        temp_mapping_file_handles);\n      num_mappings_in_mem = 0;\n    }\n\n    mapping_writer.ProcessAndOutputMappingsInLowMemory(\n        num_mappings_in_mem, num_reference_sequences, reference,\n        barcode_whitelist_lookup_table_, temp_mapping_file_handles);\n  } \n  else {\n    if (mapping_parameters_.Tn5_shift) {\n      mapping_processor.ApplyTn5ShiftOnMappings(num_reference_sequences,\n                                                mappings_on_diff_ref_seqs);\n    }\n\n    if (mapping_parameters_.remove_pcr_duplicates) {\n      mapping_processor.RemovePCRDuplicate(num_reference_sequences,\n                                           mappings_on_diff_ref_seqs,\n                                           mapping_parameters_.num_threads);\n      std::cerr << \"After removing PCR duplications, \";\n      mapping_processor.OutputMappingStatistics(num_reference_sequences,\n                                                mappings_on_diff_ref_seqs);\n    } else {\n      mapping_processor.ParallelSortOutputMappings(num_reference_sequences,\n                                           mappings_on_diff_ref_seqs,\n                                           mapping_parameters_.num_threads);\n    }\n\n    if (mapping_parameters_.allocate_multi_mappings) {\n      const uint64_t num_multi_mappings =\n          num_mapped_reads_ - num_uniquely_mapped_reads_;\n      mapping_processor.AllocateMultiMappings(\n          num_reference_sequences, num_multi_mappings,\n          mapping_parameters_.multi_mapping_allocation_distance,\n          mappings_on_diff_ref_seqs);\n      std::cerr << \"After allocating multi-mappings, \";\n      mapping_processor.OutputMappingStatistics(num_reference_sequences,\n                                                
mappings_on_diff_ref_seqs);\n      mapping_processor.SortOutputMappings(num_reference_sequences,\n                                           mappings_on_diff_ref_seqs);\n    }\n    mapping_writer.OutputMappings(num_reference_sequences, reference,\n                                  mappings_on_diff_ref_seqs);\n    // Temporarily disable feature matrix output. Do not delete the following\n    // commented code.\n    // if (!is_bulk_data_ && !matrix_output_prefix_.empty()) {\n    //   if constexpr (std::is_same<MappingRecord,\n    //                             PairedEndMappingWithBarcode>::value) {\n    //    FeatureBarcodeMatrix feature_barcode_matrix(\n    //        cell_by_bin_, bin_size_, multi_mapping_allocation_distance_,\n    //        depth_cutoff_to_call_peak_);\n    //    std::vector<std::vector<PairedEndMappingWithBarcode>> &mappings =\n    //        allocate_multi_mappings_\n    //            ? allocated_mappings_on_diff_ref_seqs\n    //            : (remove_pcr_duplicates_ ? deduped_mappings_on_diff_ref_seqs\n    //                                      : mappings_on_diff_ref_seqs);\n\n    //    feature_barcode_matrix.OutputFeatureMatrix(num_reference_sequences,\n    //                                               reference, mappings,\n    //                                               matrix_output_prefix_);\n    //  }\n    //}\n  }\n  \n  if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_SAM)\n    mapping_writer.AdjustSummaryPairedEndOverCount() ;\n\n  // Destroy the locks used for map\n  for (int i = 0; i < num_locks_for_map; ++i) {\n    omp_destroy_lock(&map_locks[i]);\n  }\n\n  // Add cardinality information to summary metadata\n  if (output_num_cache_slots_info) {\n    for (auto &curr_map: barcode_peak_map) {\n      for (auto &pair: curr_map) {\n        size_t curr_seed = pair.first;\n        size_t est_num_slots = pair.second.compute_cardinality();\n\n        mapping_writer.UpdateSummaryMetadata( \n                          
curr_seed,\n                          SUMMARY_METADATA_CARDINALITY, \n                          est_num_slots);\n      }\n    }\n  }\n\n  mapping_writer.OutputSummaryMetadata(frip_est_params, output_num_cache_slots_info);\n  reference.FinalizeLoading();\n  if (mapping_parameters_.debug_cache) {mm_to_candidates_cache.PrintStats();}\n  \n  std::cerr << \"Total time: \" << GetRealTime() - real_start_time << \"s.\\n\";\n}\n\n}  // namespace chromap\n\n#endif  // CHROMAP_H_\n"
  },
  {
    "path": "src/chromap_driver.cc",
    "content": "#include \"chromap_driver.h\"\n\n#include <glob.h>\n\n#include <cassert>\n#include <iomanip>\n#include <string>\n#include <vector>\n\n#include \"chromap.h\"\n#include \"cxxopts.hpp\"\n\nnamespace chromap {\nnamespace {\n\nvoid AddIndexingOptions(cxxopts::Options &options) {\n  options.add_options(\"Indexing\")(\"i,build-index\", \"Build index\")(\n      \"min-frag-length\",\n      \"Min fragment length for choosing k and w automatically [30]\",\n      cxxopts::value<int>(),\n      \"INT\")(\"k,kmer\", \"Kmer length [17]\", cxxopts::value<int>(), \"INT\")(\n      \"w,window\", \"Window size [7]\", cxxopts::value<int>(), \"INT\");\n}\n\nvoid AddMappingOptions(cxxopts::Options &options) {\n  options.set_width(120).add_options(\"Mapping\")(\n      \"preset\",\n      \"Preset parameters for mapping reads (always applied before other \"\n      \"options) []\\natac: mapping ATAC-seq/scATAC-seq reads\\nchip: mapping \"\n      \"ChIP-seq reads\\nhic: mapping Hi-C reads\",\n      cxxopts::value<std::string>(),\n      \"STR\")(\"split-alignment\", \"Allow split alignments\")(\n      \"e,error-threshold\", \"Max # errors allowed to map a read [8]\",\n      cxxopts::value<int>(), \"INT\")\n      //(\"A,match-score\", \"Match score [1]\", cxxopts::value<int>(), \"INT\")\n      //(\"B,mismatch-penalty\", \"Mismatch penalty [4]\", cxxopts::value<int>(),\n      //\"INT\")\n      //(\"O,gap-open-penalties\", \"Gap open penalty [6,6]\",\n      // cxxopts::value<std::vector<int>>(), \"INT[,INT]\")\n      //(\"E,gap-extension-penalties\", \"Gap extension penalty [1,1]\",\n      // cxxopts::value<std::vector<int>>(), \"INT[,INT]\")\n      (\"s,min-num-seeds\", \"Min # seeds to try to map a read [2]\",\n       cxxopts::value<int>(),\n       \"INT\")(\"f,max-seed-frequencies\",\n              \"Max seed frequencies for a seed to be selected [500,1000]\",\n              cxxopts::value<std::vector<int>>(), \"INT[,INT]\")\n      //(\"n,max-num-best-mappings\", \"Only report n 
best mappings [1]\",\n      // cxxopts::value<int>(), \"INT\")\n      (\"l,max-insert-size\",\n       \"Max insert size, only for paired-end read mapping [1000]\",\n       cxxopts::value<int>(),\n       \"INT\")(\"q,MAPQ-threshold\",\n              \"Min MAPQ in range [0, 60] for mappings to be output [30]\",\n              cxxopts::value<uint8_t>(),\n              \"INT\")(\"min-read-length\", \"Min read length [30]\",\n                     cxxopts::value<int>(), \"INT\")\n      //(\"multi-mapping-allocation-distance\", \"Uni-mappings within this distance\n      // from any end of multi-mappings are used for allocation [0]\",\n      // cxxopts::value<int>(), \"INT\")\n      //(\"multi-mapping-allocation-seed\", \"Seed for random number generator in\n      // multi-mapping allocation [11]\", cxxopts::value<int>(), \"INT\")\n      //(\"drop-repetitive-reads\", \"Drop reads with too many best mappings\n      //[500000]\", cxxopts::value<int>(), \"INT\")\n      (\"trim-adapters\", \"Try to trim adapters on 3'\")(\"remove-pcr-duplicates\",\n                                                      \"Remove PCR duplicates\")(\n          \"remove-pcr-duplicates-at-bulk-level\",\n          \"Remove PCR duplicates at bulk level for single cell data\")(\n          \"remove-pcr-duplicates-at-cell-level\",\n          \"Remove PCR duplicates at cell level for single cell data\")\n      //(\"allocate-multi-mappings\", \"Allocate multi-mappings\")\n      (\"Tn5-shift\", \"Perform Tn5 shift\")(\"low-mem\", \"Use low memory mode\")(\n          \"bc-error-threshold\",\n          \"Max Hamming distance allowed to correct a barcode [1]\",\n          cxxopts::value<int>(),\n          \"INT\")(\"bc-probability-threshold\",\n                 \"Min probability to correct a barcode [0.9]\",\n                 cxxopts::value<double>(),\n                 \"FLT\")(\"t,num-threads\", \"# threads for mapping [1]\",\n                        cxxopts::value<int>(), \"INT\")\n      
(\"frip-est-params\", \"coefficients used for frip est calculation, separated by semi-colons\",\n      cxxopts::value<std::string>(), \"STR\")\n      (\"turn-off-num-uniq-cache-slots\", \"turn off the output of number of cache slots in summary file\");\n}\n\nvoid AddInputOptions(cxxopts::Options &options) {\n  options.add_options(\"Input\")(\"r,ref\", \"Reference file\",\n                               cxxopts::value<std::string>(), \"FILE\")(\n      \"x,index\", \"Index file\", cxxopts::value<std::string>(), \"FILE\")(\n      \"1,read1\", \"Single-end read files or paired-end read files 1\",\n      cxxopts::value<std::vector<std::string>>(),\n      \"FILE\")(\"2,read2\", \"Paired-end read files 2\",\n              cxxopts::value<std::vector<std::string>>(),\n              \"FILE\")(\"b,barcode\", \"Cell barcode files\",\n                      cxxopts::value<std::vector<std::string>>(), \"FILE\")(\n      \"barcode-whitelist\", \"Cell barcode whitelist file\",\n      cxxopts::value<std::string>(),\n      \"FILE\")(\"read-format\",\n              \"Format for read files and barcode files  [\\\"r1:0:-1,bc:0:-1\\\" \"\n              \"as 10x Genomics single-end format]\",\n              cxxopts::value<std::string>(), \"STR\");\n}\n\nvoid AddOutputOptions(cxxopts::Options &options) {\n  options.add_options(\"Output\")(\"o,output\", \"Output file\",\n                                cxxopts::value<std::string>(), \"FILE\")\n      //(\"p,matrix-output-prefix\", \"Prefix of matrix output files\",\n      // cxxopts::value<std::string>(), \"FILE\")\n      (\"output-mappings-not-in-whitelist\",\n       \"Output mappings with barcode not in the whitelist\")(\n          \"chr-order\",\n          \"Custom chromosome order file. 
If not specified, the order of \"\n          \"reference sequences will be used\",\n          cxxopts::value<std::string>(),\n          \"FILE\")(\"BED\", \"Output mappings in BED/BEDPE format\")(\n          \"TagAlign\", \"Output mappings in TagAlign/PairedTagAlign format\")(\n          \"SAM\", \"Output mappings in SAM format\")(\n          \"pairs\",\n          \"Output mappings in pairs format (defined by 4DN for HiC data)\")(\n          \"pairs-natural-chr-order\",\n          \"Custom chromosome order file for pairs flipping. If not specified, \"\n          \"the custom chromosome order will be used\",\n          cxxopts::value<std::string>(),\n          \"FILE\")(\"barcode-translate\",\n                  \"Convert barcode to the specified sequences during output\",\n                  cxxopts::value<std::string>(), \"FILE\")(\n          \"summary\",\n          \"Summarize the mapping statistics at bulk or barcode level\",\n          cxxopts::value<std::string>(), \"FILE\");\n  //(\"PAF\", \"Output mappings in PAF format (only for test)\");\n}\n\nvoid AddDevelopmentOptions(cxxopts::Options &options) {\n  options.add_options(\"Development options\")(\"A,match-score\", \"Match score [1]\",\n                                             cxxopts::value<int>(), \"INT\")(\n      \"B,mismatch-penalty\", \"Mismatch penalty [4]\", cxxopts::value<int>(),\n      \"INT\")(\"O,gap-open-penalties\", \"Gap open penalty [6,6]\",\n             cxxopts::value<std::vector<int>>(), \"INT[,INT]\")(\n      \"E,gap-extension-penalties\", \"Gap extension penalty [1,1]\",\n      cxxopts::value<std::vector<int>>(),\n      \"INT[,INT]\")(\"n,max-num-best-mappings\", \"Only report n best mappings [1]\",\n                   cxxopts::value<int>(),\n                   \"INT\")(\"multi-mapping-allocation-distance\",\n                          \"Uni-mappings within this distance from any end of \"\n                          \"multi-mappings are used for allocation [0]\",\n                       
   cxxopts::value<int>(), \"INT\")(\n      \"multi-mapping-allocation-seed\",\n      \"Seed for random number generator in multi-mapping allocation [11]\",\n      cxxopts::value<int>(), \"INT\")(\n      \"drop-repetitive-reads\",\n      \"Drop reads with too many best mappings [500000]\", cxxopts::value<int>(),\n      \"INT\")(\"allocate-multi-mappings\", \"Allocate multi-mappings\")(\n      \"PAF\", \"Output mappings in PAF format (only for test)\")(\n      \"skip-barcode-check\",\n      \"Do not check whether too few barcodes are in the whitelist\")\n      (\"cache-size\", \"number of cache entries [4000003]\", cxxopts::value<int>(), \"INT\")\n      (\"cache-update-param\", \"value used to control number of reads sampled [0.01]\", cxxopts::value<double>(), \"FLT\")\n      (\"debug-cache\", \"verbose output for debugging cache used in chromap\")\n      (\"k-for-minhash\", \"number of values stored in each MinHash sketch [250]\", cxxopts::value<int>(), \"INT\");\n}\n\nvoid AddPeakOptions(cxxopts::Options &options) {\n  options.add_options(\"Peak\")(\"cell-by-bin\", \"Generate cell-by-bin matrix\")(\n      \"bin-size\", \"Bin size to generate cell-by-bin matrix [5000]\",\n      cxxopts::value<int>(),\n      \"INT\")(\"depth-cutoff\", \"Depth cutoff for peak calling [3]\",\n             cxxopts::value<int>(),\n             \"INT\")(\"peak-min-length\", \"Min length of peaks to report [30]\",\n                    cxxopts::value<int>(), \"INT\")(\n      \"peak-merge-max-length\", \"Peaks within this length will be merged [30]\",\n      cxxopts::value<int>(), \"INT\");\n}\n\n// Return all file paths that match the input pattern.\nstd::vector<std::string> GetMatchedFilePaths(const std::string &pattern) {\n  glob_t glob_result;\n  memset(&glob_result, 0, sizeof(glob_result));\n\n  const int return_value =\n      glob(pattern.c_str(), GLOB_TILDE, NULL, &glob_result);\n\n  if (return_value != 0) {\n    globfree(&glob_result);\n    chromap::ExitWithMessage(\"glob() failed 
with return value \" +\n                             std::to_string(return_value) + \"\\n\");\n  }\n\n  std::vector<std::string> matched_file_paths;\n  matched_file_paths.reserve(glob_result.gl_pathc);\n  for (size_t i = 0; i < glob_result.gl_pathc; ++i) {\n    matched_file_paths.push_back(std::string(glob_result.gl_pathv[i]));\n    std::cerr << matched_file_paths.back() << \"\\n\";\n  }\n  globfree(&glob_result);\n\n  return matched_file_paths;\n}\n\n// Return all file paths that match the input patterns.\nstd::vector<std::string> GetMatchedFilePaths(\n    const std::vector<std::string> &patterns) {\n  std::vector<std::string> all_matched_file_paths;\n  for (const auto &pattern : patterns) {\n    std::vector<std::string> matched_file_paths = GetMatchedFilePaths(pattern);\n    all_matched_file_paths.reserve(all_matched_file_paths.size() +\n                                   matched_file_paths.size());\n    all_matched_file_paths.insert(\n        std::end(all_matched_file_paths),\n        std::make_move_iterator(std::begin(matched_file_paths)),\n        std::make_move_iterator(std::end(matched_file_paths)));\n  }\n  return all_matched_file_paths;\n}\n\n}  // namespace\n\nvoid ChromapDriver::ParseArgsAndRun(int argc, char *argv[]) {\n  cxxopts::Options options(\n      \"chromap\", \"Fast alignment and preprocessing of chromatin profiles\");\n\n  options.add_options()(\"v,version\", \"Print version\")(\"h,help\", \"Print help\");\n\n  AddIndexingOptions(options);\n  AddMappingOptions(options);\n\n  // We don't support peak options for now.\n  // AddPeakOptions(options);\n\n  AddInputOptions(options);\n  AddOutputOptions(options);\n\n  AddDevelopmentOptions(options);\n\n  auto result = options.parse(argc, argv);\n  if (result.count(\"h\")) {\n    std::cerr << options.help(\n        {\"\", \"Indexing\", \"Mapping\", \"Peak\", \"Input\", \"Output\"});\n    return;\n  }\n  if (result.count(\"v\")) {\n    std::cerr << CHROMAP_VERSION << \"\\n\";\n    return;\n  }\n  // 
Parameters and their default\n  IndexParameters index_parameters;\n  MappingParameters mapping_parameters;\n\n  if (result.count(\"preset\")) {\n    std::string read_type = result[\"preset\"].as<std::string>();\n    if (read_type == \"atac\") {\n      std::cerr << \"Preset parameters for ATAC-seq/scATAC-seq are used.\\n\";\n      mapping_parameters.max_insert_size = 2000;\n      mapping_parameters.trim_adapters = true;\n      mapping_parameters.remove_pcr_duplicates = true;\n      mapping_parameters.remove_pcr_duplicates_at_bulk_level = false;\n      mapping_parameters.Tn5_shift = true;\n      mapping_parameters.mapping_output_format = MAPPINGFORMAT_BED;\n      mapping_parameters.low_memory_mode = true;\n    } else if (read_type == \"chip\") {\n      std::cerr << \"Preset parameters for ChIP-seq are used.\\n\";\n      mapping_parameters.max_insert_size = 2000;\n      mapping_parameters.remove_pcr_duplicates = true;\n      mapping_parameters.low_memory_mode = true;\n      mapping_parameters.mapping_output_format = MAPPINGFORMAT_BED;\n    } else if (read_type == \"hic\") {\n      std::cerr << \"Preset parameters for Hi-C are used.\\n\";\n      mapping_parameters.error_threshold = 4;\n      mapping_parameters.mapq_threshold = 1;\n      mapping_parameters.split_alignment = true;\n      mapping_parameters.low_memory_mode = true;\n      mapping_parameters.mapping_output_format = MAPPINGFORMAT_PAIRS;\n    } else {\n      chromap::ExitWithMessage(\"Unrecognized preset parameters \" + read_type +\n                               \"\\n\");\n    }\n  }\n  // Optional parameters\n  if (result.count(\"min-frag-length\")) {\n    int min_fragment_length = result[\"min-frag-length\"].as<int>();\n    if (min_fragment_length <= 60) {\n      index_parameters.kmer_size = 17;\n      index_parameters.window_size = 7;\n    } else if (min_fragment_length <= 80) {\n      index_parameters.kmer_size = 19;\n      index_parameters.window_size = 10;\n    } else {\n      
index_parameters.kmer_size = 23;\n      index_parameters.window_size = 11;\n    }\n  }\n  if (result.count(\"k\")) {\n    index_parameters.kmer_size = result[\"kmer\"].as<int>();\n  }\n  if (result.count(\"w\")) {\n    index_parameters.window_size = result[\"window\"].as<int>();\n  }\n  if (result.count(\"e\")) {\n    mapping_parameters.error_threshold = result[\"error-threshold\"].as<int>();\n  }\n  if (result.count(\"A\")) {\n    mapping_parameters.match_score = result[\"match-score\"].as<int>();\n  }\n  if (result.count(\"B\")) {\n    mapping_parameters.mismatch_penalty = result[\"mismatch-penalty\"].as<int>();\n  }\n  if (result.count(\"O\")) {\n    mapping_parameters.gap_open_penalties =\n        result[\"gap-open-penalties\"].as<std::vector<int>>();\n  }\n  if (result.count(\"E\")) {\n    mapping_parameters.gap_extension_penalties =\n        result[\"gap-extension-penalties\"].as<std::vector<int>>();\n  }\n  if (result.count(\"s\")) {\n    mapping_parameters.min_num_seeds_required_for_mapping =\n        result[\"min-num-seeds\"].as<int>();\n  }\n  if (result.count(\"f\")) {\n    mapping_parameters.max_seed_frequencies =\n        result[\"max-seed-frequencies\"].as<std::vector<int>>();\n  }\n  if (result.count(\"n\")) {\n    mapping_parameters.max_num_best_mappings =\n        result[\"max-num-best-mappings\"].as<int>();\n  }\n  if (result.count(\"l\")) {\n    mapping_parameters.max_insert_size = result[\"max-insert-size\"].as<int>();\n  }\n  if (result.count(\"q\")) {\n    mapping_parameters.mapq_threshold = result[\"MAPQ-threshold\"].as<uint8_t>();\n  }\n  if (result.count(\"t\")) {\n    mapping_parameters.num_threads = result[\"num-threads\"].as<int>();\n  }\n\n\n  // check cache-related parameters\n  if (result.count(\"cache-update-param\")) {\n    mapping_parameters.cache_update_param = result[\"cache-update-param\"].as<double>();\n    if (mapping_parameters.cache_update_param < 0.0 || mapping_parameters.cache_update_param > 1.0){\n      
chromap::ExitWithMessage(\"cache update param is not appropriate, must be in this range (0, 1]\");\n    }\n  } \n  if (result.count(\"cache-size\")) {\n    mapping_parameters.cache_size = result[\"cache-size\"].as<int>();\n    if (mapping_parameters.cache_size < 2000000 || mapping_parameters.cache_size > 15000000) {\n        chromap::ExitWithMessage(\"cache size is not in appropriate range\\n\");\n    }\n  }\n  if (result.count(\"debug-cache\")) {\n    mapping_parameters.debug_cache = true;\n  }\n  if (result.count(\"frip-est-params\")) {\n    mapping_parameters.frip_est_params = result[\"frip-est-params\"].as<std::string>();\n  }\n  if (result.count(\"turn-off-num-uniq-cache-slots\")) {\n    mapping_parameters.output_num_uniq_cache_slots = false;\n  } \n  if (result.count(\"k-for-minhash\")) {\n    mapping_parameters.k_for_minhash = result[\"k-for-minhash\"].as<int>();\n    if (mapping_parameters.k_for_minhash < 1 || mapping_parameters.k_for_minhash >= 2000) {\n      chromap::ExitWithMessage(\"Invalid parameter for size of MinHash sketch (--k-for-minhash)\");\n    }\n  }\n\n\n  if (result.count(\"min-read-length\")) {\n    mapping_parameters.min_read_length = result[\"min-read-length\"].as<int>();\n  }\n  if (result.count(\"bc-error-threshold\")) {\n    mapping_parameters.barcode_correction_error_threshold =\n        result[\"bc-error-threshold\"].as<int>();\n  }\n  if (result.count(\"bc-probability-threshold\")) {\n    mapping_parameters.barcode_correction_probability_threshold =\n        result[\"bc-probability-threshold\"].as<double>();\n  }\n  if (result.count(\"multi-mapping-allocation-distance\")) {\n    mapping_parameters.multi_mapping_allocation_distance =\n        result[\"multi-mapping-allocation-distance\"].as<int>();\n  }\n  if (result.count(\"multi-mapping-allocation-seed\")) {\n    mapping_parameters.multi_mapping_allocation_seed =\n        result[\"multi-mapping-allocation-seed\"].as<int>();\n  }\n  if (result.count(\"drop-repetitive-reads\")) {\n    
mapping_parameters.drop_repetitive_reads =\n        result[\"drop-repetitive-reads\"].as<int>();\n  }\n  if (result.count(\"trim-adapters\")) {\n    mapping_parameters.trim_adapters = true;\n  }\n  if (result.count(\"remove-pcr-duplicates\")) {\n    mapping_parameters.remove_pcr_duplicates = true;\n  }\n  if (result.count(\"remove-pcr-duplicates-at-bulk-level\")) {\n    mapping_parameters.remove_pcr_duplicates_at_bulk_level = true;\n  }\n  if (result.count(\"remove-pcr-duplicates-at-cell-level\")) {\n    mapping_parameters.remove_pcr_duplicates_at_bulk_level = false;\n  }\n  if (result.count(\"allocate-multi-mappings\")) {\n    mapping_parameters.allocate_multi_mappings = true;\n    mapping_parameters.only_output_unique_mappings = false;\n  }\n  if (result.count(\"Tn5-shift\")) {\n    mapping_parameters.Tn5_shift = true;\n  }\n  if (result.count(\"split-alignment\")) {\n    mapping_parameters.split_alignment = true;\n  }\n  if (result.count(\"output-mappings-not-in-whitelist\")) {\n    mapping_parameters.output_mappings_not_in_whitelist = true;\n  }\n  if (result.count(\"BED\")) {\n    mapping_parameters.mapping_output_format = MAPPINGFORMAT_BED;\n  }\n  if (result.count(\"TagAlign\")) {\n    mapping_parameters.mapping_output_format = MAPPINGFORMAT_TAGALIGN;\n  }\n  if (result.count(\"PAF\")) {\n    mapping_parameters.mapping_output_format = MAPPINGFORMAT_PAF;\n  }\n  if (result.count(\"pairs\")) {\n    mapping_parameters.mapping_output_format = MAPPINGFORMAT_PAIRS;\n  }\n  if (result.count(\"SAM\")) {\n    mapping_parameters.mapping_output_format = MAPPINGFORMAT_SAM;\n  }\n  if (result.count(\"low-mem\")) {\n    mapping_parameters.low_memory_mode = true;\n  }\n  if (result.count(\"cell-by-bin\")) {\n    mapping_parameters.cell_by_bin = true;\n  }\n  if (result.count(\"bin-size\")) {\n    mapping_parameters.bin_size = result[\"bin-size\"].as<int>();\n  }\n  if (result.count(\"depth-cutoff\")) {\n    mapping_parameters.depth_cutoff_to_call_peak =\n        
result[\"depth-cutoff\"].as<uint16_t>();\n  }\n  if (result.count(\"peak-min-length\")) {\n    mapping_parameters.peak_min_length = result[\"peak-min-length\"].as<int>();\n  }\n  if (result.count(\"peak-merge-max-length\")) {\n    mapping_parameters.peak_merge_max_length =\n        result[\"peak-merge-max-length\"].as<int>();\n  }\n\n  std::cerr << std::setprecision(2) << std::fixed;\n  if (result.count(\"i\")) {\n    if (result.count(\"r\")) {\n      index_parameters.reference_file_path = result[\"ref\"].as<std::string>();\n    } else {\n      chromap::ExitWithMessage(\"No reference specified!\");\n    }\n    if (result.count(\"o\")) {\n      index_parameters.index_output_file_path =\n          result[\"output\"].as<std::string>();\n    } else {\n      chromap::ExitWithMessage(\"No output file specified!\");\n    }\n    std::cerr << \"Build index for the reference.\\n\";\n    std::cerr << \"Kmer length: \" << index_parameters.kmer_size\n              << \", window size: \" << index_parameters.window_size << \"\\n\";\n    std::cerr << \"Reference file: \" << index_parameters.reference_file_path\n              << \"\\n\";\n    std::cerr << \"Output file: \" << index_parameters.index_output_file_path\n              << \"\\n\";\n    chromap::Chromap chromap_for_indexing(index_parameters);\n    chromap_for_indexing.ConstructIndex();\n  } else if (result.count(\"1\")) {\n    std::cerr << \"Start to map reads.\\n\";\n    if (result.count(\"r\")) {\n      mapping_parameters.reference_file_path = result[\"ref\"].as<std::string>();\n    } else {\n      chromap::ExitWithMessage(\"No reference specified!\");\n    }\n    if (result.count(\"o\")) {\n      mapping_parameters.mapping_output_file_path =\n          result[\"output\"].as<std::string>();\n    } else {\n      chromap::ExitWithMessage(\"No output file specified!\");\n    }\n    if (result.count(\"x\")) {\n      mapping_parameters.index_file_path = result[\"index\"].as<std::string>();\n    } else {\n      
chromap::ExitWithMessage(\"No index file specified!\");\n    }\n    if (result.count(\"1\")) {\n      mapping_parameters.read_file1_paths =\n          GetMatchedFilePaths(result[\"read1\"].as<std::vector<std::string>>());\n    } else {\n      chromap::ExitWithMessage(\"No read file specified!\");\n    }\n    if (result.count(\"2\")) {\n      mapping_parameters.read_file2_paths =\n          GetMatchedFilePaths(result[\"read2\"].as<std::vector<std::string>>());\n    }\n\n    if (result.count(\"b\")) {\n      mapping_parameters.is_bulk_data = false;\n      mapping_parameters.barcode_file_paths =\n          GetMatchedFilePaths(result[\"barcode\"].as<std::vector<std::string>>());\n      if (result.count(\"barcode-whitelist\") == 0) {\n        std::cerr << \"WARNING: there are input barcode files but a barcode \"\n                     \"whitelist file is missing!\\n\";\n      }\n    }\n\n    if (result.count(\"barcode-whitelist\")) {\n      if (mapping_parameters.is_bulk_data) {\n        chromap::ExitWithMessage(\n            \"No barcode file specified but the barcode whitelist file is \"\n            \"given!\");\n      }\n      mapping_parameters.barcode_whitelist_file_path =\n          result[\"barcode-whitelist\"].as<std::string>();\n    }\n\n    if (result.count(\"p\")) {\n      mapping_parameters.matrix_output_prefix =\n          result[\"matrix-output-prefix\"].as<std::string>();\n      if (mapping_parameters.is_bulk_data) {\n        chromap::ExitWithMessage(\n            \"No barcode file specified but asked to output matrix files!\");\n      }\n    }\n    if (result.count(\"read-format\")) {\n      mapping_parameters.read_format = result[\"read-format\"].as<std::string>();\n    }\n\n    if (result.count(\"chr-order\")) {\n      mapping_parameters.custom_rid_order_file_path =\n          result[\"chr-order\"].as<std::string>();\n    }\n\n    if (result.count(\"pairs-natural-chr-order\")) {\n      mapping_parameters.pairs_flipping_custom_rid_order_file_path =\n    
      result[\"pairs-natural-chr-order\"].as<std::string>();\n    }\n\n    if (result.count(\"barcode-translate\")) {\n      mapping_parameters.barcode_translate_table_file_path =\n          result[\"barcode-translate\"].as<std::string>();\n    }\n\n    if (result.count(\"summary\")) {\n      mapping_parameters.summary_metadata_file_path =\n          result[\"summary\"].as<std::string>();\n    }\n\n    if (result.count(\"skip-barcode-check\")) {\n      mapping_parameters.skip_barcode_check = true;\n    }\n\n    // std::cerr << \"Parameters: error threshold: \" << error_threshold << \",\n    // match score: \" << match_score << \", mismatch_penalty: \" <<\n    // mismatch_penalty << \", gap open penalties for deletions and insertions: \"\n    // << gap_open_penalties[0] << \",\" << gap_open_penalties[1] << \", gap\n    // extension penalties for deletions and insertions: \" <<\n    // gap_extension_penalties[0] << \",\" << gap_extension_penalties[1] << \",\n    // min-num-seeds: \" << min_num_seeds_required_for_mapping << \",\n    // max-seed-frequency: \" << max_seed_frequencies[0] << \",\" <<\n    // max_seed_frequencies[1] << \", max-num-best-mappings: \" <<\n    // max_num_best_mappings << \", max-insert-size: \" << max_insert_size << \",\n    // MAPQ-threshold: \" << (int)mapq_threshold << \", min-read-length: \" <<\n    // min_read_length << \", multi-mapping-allocation-distance: \" <<\n    // multi_mapping_allocation_distance << \", multi-mapping-allocation-seed: \"\n    // << multi_mapping_allocation_seed << \", drop-repetitive-reads: \" <<\n    // drop_repetitive_reads << \"\\n\";\n    std::cerr << \"Parameters: error threshold: \"\n              << mapping_parameters.error_threshold << \", min-num-seeds: \"\n              << mapping_parameters.min_num_seeds_required_for_mapping\n              << \", max-seed-frequency: \"\n              << mapping_parameters.max_seed_frequencies[0] << \",\"\n              << mapping_parameters.max_seed_frequencies[1]\n     
         << \", max-num-best-mappings: \"\n              << mapping_parameters.max_num_best_mappings\n              << \", max-insert-size: \" << mapping_parameters.max_insert_size\n              << \", MAPQ-threshold: \" << (int)mapping_parameters.mapq_threshold\n              << \", min-read-length: \" << mapping_parameters.min_read_length\n              << \", bc-error-threshold: \"\n              << mapping_parameters.barcode_correction_error_threshold\n              << \", bc-probability-threshold: \"\n              << mapping_parameters.barcode_correction_probability_threshold\n              << \"\\n\";\n    std::cerr << \"Number of threads: \" << mapping_parameters.num_threads\n              << \"\\n\";\n    if (mapping_parameters.is_bulk_data) {\n      std::cerr << \"Analyze bulk data.\\n\";\n    } else {\n      std::cerr << \"Analyze single-cell data.\\n\";\n    }\n    if (mapping_parameters.trim_adapters) {\n      std::cerr << \"Will try to remove adapters on 3'.\\n\";\n    } else {\n      std::cerr << \"Won't try to remove adapters on 3'.\\n\";\n    }\n    if (mapping_parameters.remove_pcr_duplicates) {\n      std::cerr << \"Will remove PCR duplicates after mapping.\\n\";\n    } else {\n      std::cerr << \"Won't remove PCR duplicates after mapping.\\n\";\n    }\n    if (mapping_parameters.remove_pcr_duplicates_at_bulk_level) {\n      std::cerr << \"Will remove PCR duplicates at bulk level.\\n\";\n    } else {\n      std::cerr << \"Will remove PCR duplicates at cell level.\\n\";\n    }\n    if (mapping_parameters.allocate_multi_mappings) {\n      std::cerr << \"Will allocate multi-mappings after mapping.\\n\";\n    } else {\n      std::cerr << \"Won't allocate multi-mappings after mapping.\\n\";\n    }\n    if (mapping_parameters.only_output_unique_mappings) {\n      std::cerr << \"Only output unique mappings after mapping.\\n\";\n    }\n    if (!mapping_parameters.output_mappings_not_in_whitelist) {\n      std::cerr << \"Only output mappings of which 
barcodes are in whitelist.\\n\";\n    } else {\n      std::cerr << \"No filtering of mappings based on whether their barcodes \"\n                   \"are in whitelist.\\n\";\n    }\n    // if (allocate_multi_mappings && only_output_unique_mappings) {\n    //  std::cerr << \"WARNING: you want to output unique mappings only but you\n    //  ask to allocate multi-mappings! In this case, it won't allocate\n    //  multi-mappings and will only output unique mappings.\\n\";\n    //  allocate_multi_mappings = false;\n    //}\n    if (mapping_parameters.max_num_best_mappings >\n        mapping_parameters.drop_repetitive_reads) {\n      std::cerr << \"WARNING: you want to drop mapped reads with more than \"\n                << mapping_parameters.drop_repetitive_reads\n                << \" mappings. But you want to output top \"\n                << mapping_parameters.max_num_best_mappings\n                << \" best mappings. In this case, only reads with <=\"\n                << mapping_parameters.drop_repetitive_reads\n                << \" best mappings will be output.\\n\";\n      mapping_parameters.max_num_best_mappings =\n          mapping_parameters.drop_repetitive_reads;\n    }\n    if (mapping_parameters.Tn5_shift) {\n      std::cerr << \"Perform Tn5 shift.\\n\";\n    }\n    if (mapping_parameters.split_alignment) {\n      std::cerr << \"Allow split alignment.\\n\";\n    }\n\n    switch (mapping_parameters.mapping_output_format) {\n      case MAPPINGFORMAT_BED:\n        std::cerr << \"Output mappings in BED/BEDPE format.\\n\";\n        break;\n      case MAPPINGFORMAT_TAGALIGN:\n        std::cerr << \"Output mappings in TagAlign/PairedTagAlign format.\\n\";\n        break;\n      case MAPPINGFORMAT_PAF:\n        std::cerr << \"Output mappings in PAF format.\\n\";\n        break;\n      case MAPPINGFORMAT_SAM:\n        std::cerr << \"Output mappings in SAM format.\\n\";\n        break;\n      case MAPPINGFORMAT_PAIRS:\n        std::cerr << \"Output mappings in 
pairs format.\\n\";\n        break;\n      default:\n        chromap::ExitWithMessage(\"Unknown mapping output format!\");\n        break;\n    }\n\n    std::cerr << \"Reference file: \" << mapping_parameters.reference_file_path\n              << \"\\n\";\n    std::cerr << \"Index file: \" << mapping_parameters.index_file_path << \"\\n\";\n    for (size_t i = 0; i < mapping_parameters.read_file1_paths.size(); ++i) {\n      std::cerr << i + 1\n                << \"th read 1 file: \" << mapping_parameters.read_file1_paths[i]\n                << \"\\n\";\n    }\n    if (result.count(\"2\") != 0) {\n      for (size_t i = 0; i < mapping_parameters.read_file2_paths.size(); ++i) {\n        std::cerr << i + 1 << \"th read 2 file: \"\n                  << mapping_parameters.read_file2_paths[i] << \"\\n\";\n      }\n    }\n    if (result.count(\"b\") != 0) {\n      for (size_t i = 0; i < mapping_parameters.barcode_file_paths.size();\n           ++i) {\n        std::cerr << i + 1 << \"th cell barcode file: \"\n                  << mapping_parameters.barcode_file_paths[i] << \"\\n\";\n      }\n    }\n    if (result.count(\"barcode-whitelist\") != 0) {\n      std::cerr << \"Cell barcode whitelist file: \"\n                << mapping_parameters.barcode_whitelist_file_path << \"\\n\";\n    }\n    std::cerr << \"Output file: \" << mapping_parameters.mapping_output_file_path\n              << \"\\n\";\n    if (result.count(\"matrix-output-prefix\") != 0) {\n      std::cerr << \"Matrix output prefix: \"\n                << mapping_parameters.matrix_output_prefix << \"\\n\";\n    }\n\n    chromap::Chromap chromap_for_mapping(mapping_parameters);\n\n    if (result.count(\"2\") == 0) {\n      // Single-end reads.\n      switch (mapping_parameters.mapping_output_format) {\n        case MAPPINGFORMAT_PAF: {\n          chromap_for_mapping.MapSingleEndReads<chromap::PAFMapping>();\n          break;\n        }\n        case MAPPINGFORMAT_SAM: {\n          
chromap_for_mapping.MapSingleEndReads<chromap::SAMMapping>();\n          break;\n        }\n        case MAPPINGFORMAT_PAIRS:\n          chromap::ExitWithMessage(\"No support for single-end HiC yet!\");\n          break;\n        case MAPPINGFORMAT_BED:\n        case MAPPINGFORMAT_TAGALIGN:\n          if (result.count(\"b\") != 0) {\n            chromap_for_mapping\n                .MapSingleEndReads<chromap::MappingWithBarcode>();\n          } else {\n            chromap_for_mapping\n                .MapSingleEndReads<chromap::MappingWithoutBarcode>();\n          }\n          break;\n        default:\n          chromap::ExitWithMessage(\"Unknown mapping output format!\");\n          break;\n      }\n    } else {\n      // Paired-end reads.\n      switch (mapping_parameters.mapping_output_format) {\n        case MAPPINGFORMAT_PAF: {\n          chromap_for_mapping.MapPairedEndReads<chromap::PairedPAFMapping>();\n          break;\n        }\n        case MAPPINGFORMAT_SAM: {\n          chromap_for_mapping.MapPairedEndReads<chromap::SAMMapping>();\n          break;\n        }\n        case MAPPINGFORMAT_PAIRS: {\n          chromap_for_mapping.MapPairedEndReads<chromap::PairsMapping>();\n          break;\n        }\n        case MAPPINGFORMAT_BED:\n        case MAPPINGFORMAT_TAGALIGN:\n          if (result.count(\"b\") != 0) {\n            chromap_for_mapping\n                .MapPairedEndReads<chromap::PairedEndMappingWithBarcode>();\n          } else {\n            chromap_for_mapping\n                .MapPairedEndReads<chromap::PairedEndMappingWithoutBarcode>();\n          }\n          break;\n        default:\n          chromap::ExitWithMessage(\"Unknown mapping output format!\");\n          break;\n      }\n    }\n  } else {\n    std::cerr << options.help(\n        {\"\", \"Indexing\", \"Mapping\", \"Peak\", \"Input\", \"Output\"});\n  }\n}\n\n}  // namespace chromap\n\nint main(int argc, char *argv[]) {\n  chromap::ChromapDriver chromap_driver;\n  
chromap_driver.ParseArgsAndRun(argc, argv);\n  return 0;\n}\n"
  },
  {
    "path": "src/chromap_driver.h",
    "content": "#ifndef CHROMAP_DRIVER_H_\n#define CHROMAP_DRIVER_H_\n\nnamespace chromap {\n\nclass ChromapDriver {\n public:\n  ChromapDriver() = default;\n  ~ChromapDriver() = default;\n  void ParseArgsAndRun(int argc, char *argv[]);\n};\n\n}  // namespace chromap\n\n#endif  // CHROMAP_DRIVER_H_\n"
  },
  {
    "path": "src/cxxopts.hpp",
    "content": "/*\n\nCopyright (c) 2014, 2015, 2016, 2017 Jarryd Beck\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN\nTHE SOFTWARE.\n\n*/\n\n#ifndef CXXOPTS_HPP_INCLUDED\n#define CXXOPTS_HPP_INCLUDED\n\n#include <cctype>\n#include <cstring>\n#include <exception>\n#include <iostream>\n#include <limits>\n#include <list>\n#include <map>\n#include <memory>\n#include <regex>\n#include <sstream>\n#include <string>\n#include <unordered_map>\n#include <unordered_set>\n#include <utility>\n#include <vector>\n\n#ifdef __cpp_lib_optional\n#include <optional>\n#define CXXOPTS_HAS_OPTIONAL\n#endif\n\n#if __cplusplus >= 201603L\n#define CXXOPTS_NODISCARD [[nodiscard]]\n#else\n#define CXXOPTS_NODISCARD\n#endif\n\n#ifndef CXXOPTS_VECTOR_DELIMITER\n#define CXXOPTS_VECTOR_DELIMITER ','\n#endif\n\n#define CXXOPTS__VERSION_MAJOR 3\n#define CXXOPTS__VERSION_MINOR 0\n#define CXXOPTS__VERSION_PATCH 0\n\nnamespace cxxopts\n{\n  static constexpr struct {\n    uint8_t major, minor, patch;\n  } version = {\n    
CXXOPTS__VERSION_MAJOR,\n    CXXOPTS__VERSION_MINOR,\n    CXXOPTS__VERSION_PATCH\n  };\n} // namespace cxxopts\n\n//when we ask cxxopts to use Unicode, help strings are processed using ICU,\n//which results in the correct lengths being computed for strings when they\n//are formatted for the help output\n//it is necessary to make sure that <unicode/unistr.h> can be found by the\n//compiler, and that icu-uc is linked in to the binary.\n\n#ifdef CXXOPTS_USE_UNICODE\n#include <unicode/unistr.h>\n\nnamespace cxxopts\n{\n  using String = icu::UnicodeString;\n\n  inline\n  String\n  toLocalString(std::string s)\n  {\n    return icu::UnicodeString::fromUTF8(std::move(s));\n  }\n\n  class UnicodeStringIterator : public\n    std::iterator<std::forward_iterator_tag, int32_t>\n  {\n    public:\n\n    UnicodeStringIterator(const icu::UnicodeString* string, int32_t pos)\n    : s(string)\n    , i(pos)\n    {\n    }\n\n    value_type\n    operator*() const\n    {\n      return s->char32At(i);\n    }\n\n    bool\n    operator==(const UnicodeStringIterator& rhs) const\n    {\n      return s == rhs.s && i == rhs.i;\n    }\n\n    bool\n    operator!=(const UnicodeStringIterator& rhs) const\n    {\n      return !(*this == rhs);\n    }\n\n    UnicodeStringIterator&\n    operator++()\n    {\n      ++i;\n      return *this;\n    }\n\n    UnicodeStringIterator\n    operator+(int32_t v)\n    {\n      return UnicodeStringIterator(s, i + v);\n    }\n\n    private:\n    const icu::UnicodeString* s;\n    int32_t i;\n  };\n\n  inline\n  String&\n  stringAppend(String&s, String a)\n  {\n    return s.append(std::move(a));\n  }\n\n  inline\n  String&\n  stringAppend(String& s, size_t n, UChar32 c)\n  {\n    for (size_t i = 0; i != n; ++i)\n    {\n      s.append(c);\n    }\n\n    return s;\n  }\n\n  template <typename Iterator>\n  String&\n  stringAppend(String& s, Iterator begin, Iterator end)\n  {\n    while (begin != end)\n    {\n      s.append(*begin);\n      ++begin;\n    }\n\n    return s;\n  
}\n\n  inline\n  size_t\n  stringLength(const String& s)\n  {\n    return s.length();\n  }\n\n  inline\n  std::string\n  toUTF8String(const String& s)\n  {\n    std::string result;\n    s.toUTF8String(result);\n\n    return result;\n  }\n\n  inline\n  bool\n  empty(const String& s)\n  {\n    return s.isEmpty();\n  }\n}\n\nnamespace std\n{\n  inline\n  cxxopts::UnicodeStringIterator\n  begin(const icu::UnicodeString& s)\n  {\n    return cxxopts::UnicodeStringIterator(&s, 0);\n  }\n\n  inline\n  cxxopts::UnicodeStringIterator\n  end(const icu::UnicodeString& s)\n  {\n    return cxxopts::UnicodeStringIterator(&s, s.length());\n  }\n}\n\n//ifdef CXXOPTS_USE_UNICODE\n#else\n\nnamespace cxxopts\n{\n  using String = std::string;\n\n  template <typename T>\n  T\n  toLocalString(T&& t)\n  {\n    return std::forward<T>(t);\n  }\n\n  inline\n  size_t\n  stringLength(const String& s)\n  {\n    return s.length();\n  }\n\n  inline\n  String&\n  stringAppend(String&s, const String& a)\n  {\n    return s.append(a);\n  }\n\n  inline\n  String&\n  stringAppend(String& s, size_t n, char c)\n  {\n    return s.append(n, c);\n  }\n\n  template <typename Iterator>\n  String&\n  stringAppend(String& s, Iterator begin, Iterator end)\n  {\n    return s.append(begin, end);\n  }\n\n  template <typename T>\n  std::string\n  toUTF8String(T&& t)\n  {\n    return std::forward<T>(t);\n  }\n\n  inline\n  bool\n  empty(const std::string& s)\n  {\n    return s.empty();\n  }\n} // namespace cxxopts\n\n//ifdef CXXOPTS_USE_UNICODE\n#endif\n\nnamespace cxxopts\n{\n  namespace\n  {\n#ifdef _WIN32\n    const std::string LQUOTE(\"\\'\");\n    const std::string RQUOTE(\"\\'\");\n#else\n    const std::string LQUOTE(\"‘\");\n    const std::string RQUOTE(\"’\");\n#endif\n  } // namespace\n\n#if defined(__GNUC__)\n// GNU GCC with -Weffc++ will issue a warning regarding the upcoming class, we want to silence it:\n// warning: base class 'class std::enable_shared_from_this<cxxopts::Value>' has accessible 
non-virtual destructor\n#pragma GCC diagnostic push\n#pragma GCC diagnostic ignored \"-Wnon-virtual-dtor\"\n// This will be ignored under other compilers like LLVM clang.\n#endif\n  class Value : public std::enable_shared_from_this<Value>\n  {\n    public:\n\n    virtual ~Value() = default;\n\n    virtual\n    std::shared_ptr<Value>\n    clone() const = 0;\n\n    virtual void\n    parse(const std::string& text) const = 0;\n\n    virtual void\n    parse() const = 0;\n\n    virtual bool\n    has_default() const = 0;\n\n    virtual bool\n    is_container() const = 0;\n\n    virtual bool\n    has_implicit() const = 0;\n\n    virtual std::string\n    get_default_value() const = 0;\n\n    virtual std::string\n    get_implicit_value() const = 0;\n\n    virtual std::shared_ptr<Value>\n    default_value(const std::string& value) = 0;\n\n    virtual std::shared_ptr<Value>\n    implicit_value(const std::string& value) = 0;\n\n    virtual std::shared_ptr<Value>\n    no_implicit_value() = 0;\n\n    virtual bool\n    is_boolean() const = 0;\n  };\n#if defined(__GNUC__)\n#pragma GCC diagnostic pop\n#endif\n  class OptionException : public std::exception\n  {\n    public:\n    explicit OptionException(std::string message)\n    : m_message(std::move(message))\n    {\n    }\n\n    CXXOPTS_NODISCARD\n    const char*\n    what() const noexcept override\n    {\n      return m_message.c_str();\n    }\n\n    private:\n    std::string m_message;\n  };\n\n  class OptionSpecException : public OptionException\n  {\n    public:\n\n    explicit OptionSpecException(const std::string& message)\n    : OptionException(message)\n    {\n    }\n  };\n\n  class OptionParseException : public OptionException\n  {\n    public:\n    explicit OptionParseException(const std::string& message)\n    : OptionException(message)\n    {\n    }\n  };\n\n  class option_exists_error : public OptionSpecException\n  {\n    public:\n    explicit option_exists_error(const std::string& option)\n    : 
OptionSpecException(\"Option \" + LQUOTE + option + RQUOTE + \" already exists\")\n    {\n    }\n  };\n\n  class invalid_option_format_error : public OptionSpecException\n  {\n    public:\n    explicit invalid_option_format_error(const std::string& format)\n    : OptionSpecException(\"Invalid option format \" + LQUOTE + format + RQUOTE)\n    {\n    }\n  };\n\n  class option_syntax_exception : public OptionParseException {\n    public:\n    explicit option_syntax_exception(const std::string& text)\n    : OptionParseException(\"Argument \" + LQUOTE + text + RQUOTE +\n        \" starts with a - but has incorrect syntax\")\n    {\n    }\n  };\n\n  class option_not_exists_exception : public OptionParseException\n  {\n    public:\n    explicit option_not_exists_exception(const std::string& option)\n    : OptionParseException(\"Option \" + LQUOTE + option + RQUOTE + \" does not exist\")\n    {\n    }\n  };\n\n  class missing_argument_exception : public OptionParseException\n  {\n    public:\n    explicit missing_argument_exception(const std::string& option)\n    : OptionParseException(\n        \"Option \" + LQUOTE + option + RQUOTE + \" is missing an argument\"\n      )\n    {\n    }\n  };\n\n  class option_requires_argument_exception : public OptionParseException\n  {\n    public:\n    explicit option_requires_argument_exception(const std::string& option)\n    : OptionParseException(\n        \"Option \" + LQUOTE + option + RQUOTE + \" requires an argument\"\n      )\n    {\n    }\n  };\n\n  class option_not_has_argument_exception : public OptionParseException\n  {\n    public:\n    option_not_has_argument_exception\n    (\n      const std::string& option,\n      const std::string& arg\n    )\n    : OptionParseException(\n        \"Option \" + LQUOTE + option + RQUOTE +\n        \" does not take an argument, but argument \" +\n        LQUOTE + arg + RQUOTE + \" given\"\n      )\n    {\n    }\n  };\n\n  class option_not_present_exception : public OptionParseException\n  
{\n    public:\n    explicit option_not_present_exception(const std::string& option)\n    : OptionParseException(\"Option \" + LQUOTE + option + RQUOTE + \" not present\")\n    {\n    }\n  };\n\n  class option_has_no_value_exception : public OptionException\n  {\n    public:\n    explicit option_has_no_value_exception(const std::string& option)\n    : OptionException(\n        !option.empty() ?\n        (\"Option \" + LQUOTE + option + RQUOTE + \" has no value\") :\n        \"Option has no value\")\n    {\n    }\n  };\n\n  class argument_incorrect_type : public OptionParseException\n  {\n    public:\n    explicit argument_incorrect_type\n    (\n      const std::string& arg\n    )\n    : OptionParseException(\n        \"Argument \" + LQUOTE + arg + RQUOTE + \" failed to parse\"\n      )\n    {\n    }\n  };\n\n  class option_required_exception : public OptionParseException\n  {\n    public:\n    explicit option_required_exception(const std::string& option)\n    : OptionParseException(\n        \"Option \" + LQUOTE + option + RQUOTE + \" is required but not present\"\n      )\n    {\n    }\n  };\n\n  template <typename T>\n  void throw_or_mimic(const std::string& text)\n  {\n    static_assert(std::is_base_of<std::exception, T>::value,\n                  \"throw_or_mimic only works on std::exception and \"\n                  \"deriving classes\");\n\n#ifndef CXXOPTS_NO_EXCEPTIONS\n    // If CXXOPTS_NO_EXCEPTIONS is not defined, just throw\n    throw T{text};\n#else\n    // Otherwise manually instantiate the exception, print what() to stderr,\n    // and exit\n    T exception{text};\n    std::cerr << exception.what() << std::endl;\n    std::exit(EXIT_FAILURE);\n#endif\n  }\n\n  namespace values\n  {\n    namespace\n    {\n      std::basic_regex<char> integer_pattern\n        (\"(-)?(0x)?([0-9a-zA-Z]+)|((0x)?0)\");\n      std::basic_regex<char> truthy_pattern\n        (\"(t|T)(rue)?|1\");\n      std::basic_regex<char> falsy_pattern\n        (\"(f|F)(alse)?|0\");\n    } // 
namespace\n\n    namespace detail\n    {\n      template <typename T, bool B>\n      struct SignedCheck;\n\n      template <typename T>\n      struct SignedCheck<T, true>\n      {\n        template <typename U>\n        void\n        operator()(bool negative, U u, const std::string& text)\n        {\n          if (negative)\n          {\n            if (u > static_cast<U>((std::numeric_limits<T>::min)()))\n            {\n              throw_or_mimic<argument_incorrect_type>(text);\n            }\n          }\n          else\n          {\n            if (u > static_cast<U>((std::numeric_limits<T>::max)()))\n            {\n              throw_or_mimic<argument_incorrect_type>(text);\n            }\n          }\n        }\n      };\n\n      template <typename T>\n      struct SignedCheck<T, false>\n      {\n        template <typename U>\n        void\n        operator()(bool, U, const std::string&) const {}\n      };\n\n      template <typename T, typename U>\n      void\n      check_signed_range(bool negative, U value, const std::string& text)\n      {\n        SignedCheck<T, std::numeric_limits<T>::is_signed>()(negative, value, text);\n      }\n    } // namespace detail\n\n    template <typename R, typename T>\n    void\n    checked_negate(R& r, T&& t, const std::string&, std::true_type)\n    {\n      // if we got to here, then `t` is a positive number that fits into\n      // `R`. 
So to avoid MSVC C4146, we first cast it to `R`.\n      // See https://github.com/jarro2783/cxxopts/issues/62 for more details.\n      r = static_cast<R>(-static_cast<R>(t-1)-1);\n    }\n\n    template <typename R, typename T>\n    void\n    checked_negate(R&, T&&, const std::string& text, std::false_type)\n    {\n      throw_or_mimic<argument_incorrect_type>(text);\n    }\n\n    template <typename T>\n    void\n    integer_parser(const std::string& text, T& value)\n    {\n      std::smatch match;\n      std::regex_match(text, match, integer_pattern);\n\n      if (match.length() == 0)\n      {\n        throw_or_mimic<argument_incorrect_type>(text);\n      }\n\n      if (match.length(4) > 0)\n      {\n        value = 0;\n        return;\n      }\n\n      using US = typename std::make_unsigned<T>::type;\n\n      constexpr bool is_signed = std::numeric_limits<T>::is_signed;\n      const bool negative = match.length(1) > 0;\n      const uint8_t base = match.length(2) > 0 ? 16 : 10;\n\n      auto value_match = match[3];\n\n      US result = 0;\n\n      for (auto iter = value_match.first; iter != value_match.second; ++iter)\n      {\n        US digit = 0;\n\n        if (*iter >= '0' && *iter <= '9')\n        {\n          digit = static_cast<US>(*iter - '0');\n        }\n        else if (base == 16 && *iter >= 'a' && *iter <= 'f')\n        {\n          digit = static_cast<US>(*iter - 'a' + 10);\n        }\n        else if (base == 16 && *iter >= 'A' && *iter <= 'F')\n        {\n          digit = static_cast<US>(*iter - 'A' + 10);\n        }\n        else\n        {\n          throw_or_mimic<argument_incorrect_type>(text);\n        }\n\n        const US next = static_cast<US>(result * base + digit);\n        if (result > next)\n        {\n          throw_or_mimic<argument_incorrect_type>(text);\n        }\n\n        result = next;\n      }\n\n      detail::check_signed_range<T>(negative, result, text);\n\n      if (negative)\n      {\n        checked_negate<T>(value, 
result, text, std::integral_constant<bool, is_signed>());\n      }\n      else\n      {\n        value = static_cast<T>(result);\n      }\n    }\n\n    template <typename T>\n    void stringstream_parser(const std::string& text, T& value)\n    {\n      std::stringstream in(text);\n      in >> value;\n      if (!in) {\n        throw_or_mimic<argument_incorrect_type>(text);\n      }\n    }\n\n    inline\n    void\n    parse_value(const std::string& text, uint8_t& value)\n    {\n      integer_parser(text, value);\n    }\n\n    inline\n    void\n    parse_value(const std::string& text, int8_t& value)\n    {\n      integer_parser(text, value);\n    }\n\n    inline\n    void\n    parse_value(const std::string& text, uint16_t& value)\n    {\n      integer_parser(text, value);\n    }\n\n    inline\n    void\n    parse_value(const std::string& text, int16_t& value)\n    {\n      integer_parser(text, value);\n    }\n\n    inline\n    void\n    parse_value(const std::string& text, uint32_t& value)\n    {\n      integer_parser(text, value);\n    }\n\n    inline\n    void\n    parse_value(const std::string& text, int32_t& value)\n    {\n      integer_parser(text, value);\n    }\n\n    inline\n    void\n    parse_value(const std::string& text, uint64_t& value)\n    {\n      integer_parser(text, value);\n    }\n\n    inline\n    void\n    parse_value(const std::string& text, int64_t& value)\n    {\n      integer_parser(text, value);\n    }\n\n    inline\n    void\n    parse_value(const std::string& text, bool& value)\n    {\n      std::smatch result;\n      std::regex_match(text, result, truthy_pattern);\n\n      if (!result.empty())\n      {\n        value = true;\n        return;\n      }\n\n      std::regex_match(text, result, falsy_pattern);\n      if (!result.empty())\n      {\n        value = false;\n        return;\n      }\n\n      throw_or_mimic<argument_incorrect_type>(text);\n    }\n\n    inline\n    void\n    parse_value(const std::string& text, std::string& value)\n  
  {\n      value = text;\n    }\n\n    // The fallback parser. It uses the stringstream parser to parse all types\n    // that have not been overloaded explicitly.  It has to be placed in the\n    // source code before all other more specialized templates.\n    template <typename T>\n    void\n    parse_value(const std::string& text, T& value) {\n      stringstream_parser(text, value);\n    }\n\n    template <typename T>\n    void\n    parse_value(const std::string& text, std::vector<T>& value)\n    {\n      std::stringstream in(text);\n      std::string token;\n      while(!in.eof() && std::getline(in, token, CXXOPTS_VECTOR_DELIMITER)) {\n        T v;\n        parse_value(token, v);\n        value.emplace_back(std::move(v));\n      }\n    }\n\n#ifdef CXXOPTS_HAS_OPTIONAL\n    template <typename T>\n    void\n    parse_value(const std::string& text, std::optional<T>& value)\n    {\n      T result;\n      parse_value(text, result);\n      value = std::move(result);\n    }\n#endif\n\n    inline\n    void parse_value(const std::string& text, char& c)\n    {\n      if (text.length() != 1)\n      {\n        throw_or_mimic<argument_incorrect_type>(text);\n      }\n\n      c = text[0];\n    }\n\n    template <typename T>\n    struct type_is_container\n    {\n      static constexpr bool value = false;\n    };\n\n    template <typename T>\n    struct type_is_container<std::vector<T>>\n    {\n      static constexpr bool value = true;\n    };\n\n    template <typename T>\n    class abstract_value : public Value\n    {\n      using Self = abstract_value<T>;\n\n      public:\n      abstract_value()\n      : m_result(std::make_shared<T>())\n      , m_store(m_result.get())\n      {\n      }\n\n      explicit abstract_value(T* t)\n      : m_store(t)\n      {\n      }\n\n      ~abstract_value() override = default;\n\n      abstract_value& operator=(const abstract_value&) = default;\n\n      abstract_value(const abstract_value& rhs)\n      {\n        if (rhs.m_result)\n        {\n   
       m_result = std::make_shared<T>();\n          m_store = m_result.get();\n        }\n        else\n        {\n          m_store = rhs.m_store;\n        }\n\n        m_default = rhs.m_default;\n        m_implicit = rhs.m_implicit;\n        m_default_value = rhs.m_default_value;\n        m_implicit_value = rhs.m_implicit_value;\n      }\n\n      void\n      parse(const std::string& text) const override\n      {\n        parse_value(text, *m_store);\n      }\n\n      bool\n      is_container() const override\n      {\n        return type_is_container<T>::value;\n      }\n\n      void\n      parse() const override\n      {\n        parse_value(m_default_value, *m_store);\n      }\n\n      bool\n      has_default() const override\n      {\n        return m_default;\n      }\n\n      bool\n      has_implicit() const override\n      {\n        return m_implicit;\n      }\n\n      std::shared_ptr<Value>\n      default_value(const std::string& value) override\n      {\n        m_default = true;\n        m_default_value = value;\n        return shared_from_this();\n      }\n\n      std::shared_ptr<Value>\n      implicit_value(const std::string& value) override\n      {\n        m_implicit = true;\n        m_implicit_value = value;\n        return shared_from_this();\n      }\n\n      std::shared_ptr<Value>\n      no_implicit_value() override\n      {\n        m_implicit = false;\n        return shared_from_this();\n      }\n\n      std::string\n      get_default_value() const override\n      {\n        return m_default_value;\n      }\n\n      std::string\n      get_implicit_value() const override\n      {\n        return m_implicit_value;\n      }\n\n      bool\n      is_boolean() const override\n      {\n        return std::is_same<T, bool>::value;\n      }\n\n      const T&\n      get() const\n      {\n        if (m_store == nullptr)\n        {\n          return *m_result;\n        }\n        return *m_store;\n      }\n\n      protected:\n      std::shared_ptr<T> 
m_result{};\n      T* m_store{};\n\n      bool m_default = false;\n      bool m_implicit = false;\n\n      std::string m_default_value{};\n      std::string m_implicit_value{};\n    };\n\n    template <typename T>\n    class standard_value : public abstract_value<T>\n    {\n      public:\n      using abstract_value<T>::abstract_value;\n\n      CXXOPTS_NODISCARD\n      std::shared_ptr<Value>\n      clone() const override\n      {\n        return std::make_shared<standard_value<T>>(*this);\n      }\n    };\n\n    template <>\n    class standard_value<bool> : public abstract_value<bool>\n    {\n      public:\n      ~standard_value() override = default;\n\n      standard_value()\n      {\n        set_default_and_implicit();\n      }\n\n      explicit standard_value(bool* b)\n      : abstract_value(b)\n      {\n        set_default_and_implicit();\n      }\n\n      std::shared_ptr<Value>\n      clone() const override\n      {\n        return std::make_shared<standard_value<bool>>(*this);\n      }\n\n      private:\n\n      void\n      set_default_and_implicit()\n      {\n        m_default = true;\n        m_default_value = \"false\";\n        m_implicit = true;\n        m_implicit_value = \"true\";\n      }\n    };\n  } // namespace values\n\n  template <typename T>\n  std::shared_ptr<Value>\n  value()\n  {\n    return std::make_shared<values::standard_value<T>>();\n  }\n\n  template <typename T>\n  std::shared_ptr<Value>\n  value(T& t)\n  {\n    return std::make_shared<values::standard_value<T>>(&t);\n  }\n\n  class OptionAdder;\n\n  class OptionDetails\n  {\n    public:\n    OptionDetails\n    (\n      std::string short_,\n      std::string long_,\n      String desc,\n      std::shared_ptr<const Value> val\n    )\n    : m_short(std::move(short_))\n    , m_long(std::move(long_))\n    , m_desc(std::move(desc))\n    , m_value(std::move(val))\n    , m_count(0)\n    {\n      m_hash = std::hash<std::string>{}(m_long + m_short);\n    }\n\n    OptionDetails(const 
OptionDetails& rhs)\n    : m_desc(rhs.m_desc)\n    , m_value(rhs.m_value->clone())\n    , m_count(rhs.m_count)\n    {\n    }\n\n    OptionDetails(OptionDetails&& rhs) = default;\n\n    CXXOPTS_NODISCARD\n    const String&\n    description() const\n    {\n      return m_desc;\n    }\n\n    CXXOPTS_NODISCARD\n    const Value&\n    value() const {\n        return *m_value;\n    }\n\n    CXXOPTS_NODISCARD\n    std::shared_ptr<Value>\n    make_storage() const\n    {\n      return m_value->clone();\n    }\n\n    CXXOPTS_NODISCARD\n    const std::string&\n    short_name() const\n    {\n      return m_short;\n    }\n\n    CXXOPTS_NODISCARD\n    const std::string&\n    long_name() const\n    {\n      return m_long;\n    }\n\n    size_t\n    hash() const\n    {\n      return m_hash;\n    }\n\n    private:\n    std::string m_short{};\n    std::string m_long{};\n    String m_desc{};\n    std::shared_ptr<const Value> m_value{};\n    int m_count;\n\n    size_t m_hash{};\n  };\n\n  struct HelpOptionDetails\n  {\n    std::string s;\n    std::string l;\n    String desc;\n    bool has_default;\n    std::string default_value;\n    bool has_implicit;\n    std::string implicit_value;\n    std::string arg_help;\n    bool is_container;\n    bool is_boolean;\n  };\n\n  struct HelpGroupDetails\n  {\n    std::string name{};\n    std::string description{};\n    std::vector<HelpOptionDetails> options{};\n  };\n\n  class OptionValue\n  {\n    public:\n    void\n    parse\n    (\n      const std::shared_ptr<const OptionDetails>& details,\n      const std::string& text\n    )\n    {\n      ensure_value(details);\n      ++m_count;\n      m_value->parse(text);\n      m_long_name = &details->long_name();\n    }\n\n    void\n    parse_default(const std::shared_ptr<const OptionDetails>& details)\n    {\n      ensure_value(details);\n      m_default = true;\n      m_long_name = &details->long_name();\n      m_value->parse();\n    }\n\n#if defined(__GNUC__)\n#if __GNUC__ <= 10 && __GNUC_MINOR__ <= 
1\n#pragma GCC diagnostic push\n#pragma GCC diagnostic ignored \"-Werror=null-dereference\"\n#endif\n#endif\n    \n    CXXOPTS_NODISCARD\n    size_t\n    count() const noexcept\n    {\n      return m_count;\n    }\n    \n#if defined(__GNUC__)\n#if __GNUC__ <= 10 && __GNUC_MINOR__ <= 1\n#pragma GCC diagnostic pop\n#endif\n#endif\n\n    // TODO: maybe default options should count towards the number of arguments\n    CXXOPTS_NODISCARD\n    bool\n    has_default() const noexcept\n    {\n      return m_default;\n    }\n\n    template <typename T>\n    const T&\n    as() const\n    {\n      if (m_value == nullptr) {\n          throw_or_mimic<option_has_no_value_exception>(\n              m_long_name == nullptr ? \"\" : *m_long_name);\n      }\n\n#ifdef CXXOPTS_NO_RTTI\n      return static_cast<const values::standard_value<T>&>(*m_value).get();\n#else\n      return dynamic_cast<const values::standard_value<T>&>(*m_value).get();\n#endif\n    }\n\n    private:\n    void\n    ensure_value(const std::shared_ptr<const OptionDetails>& details)\n    {\n      if (m_value == nullptr)\n      {\n        m_value = details->make_storage();\n      }\n    }\n\n\n    const std::string* m_long_name = nullptr;\n    // Holding this pointer is safe, since OptionValue's only exist in key-value pairs,\n    // where the key has the string we point to.\n    std::shared_ptr<Value> m_value{};\n    size_t m_count = 0;\n    bool m_default = false;\n  };\n\n  class KeyValue\n  {\n    public:\n    KeyValue(std::string key_, std::string value_)\n    : m_key(std::move(key_))\n    , m_value(std::move(value_))\n    {\n    }\n\n    CXXOPTS_NODISCARD\n    const std::string&\n    key() const\n    {\n      return m_key;\n    }\n\n    CXXOPTS_NODISCARD\n    const std::string&\n    value() const\n    {\n      return m_value;\n    }\n\n    template <typename T>\n    T\n    as() const\n    {\n      T result;\n      values::parse_value(m_value, result);\n      return result;\n    }\n\n    private:\n    std::string 
m_key;\n    std::string m_value;\n  };\n\n  using ParsedHashMap = std::unordered_map<size_t, OptionValue>;\n  using NameHashMap = std::unordered_map<std::string, size_t>;\n\n  class ParseResult\n  {\n    public:\n\n    ParseResult() = default;\n    ParseResult(const ParseResult&) = default;\n\n    ParseResult(NameHashMap&& keys, ParsedHashMap&& values, std::vector<KeyValue> sequential, std::vector<std::string>&& unmatched_args)\n    : m_keys(std::move(keys))\n    , m_values(std::move(values))\n    , m_sequential(std::move(sequential))\n    , m_unmatched(std::move(unmatched_args))\n    {\n    }\n\n    ParseResult& operator=(ParseResult&&) = default;\n    ParseResult& operator=(const ParseResult&) = default;\n\n    size_t\n    count(const std::string& o) const\n    {\n      auto iter = m_keys.find(o);\n      if (iter == m_keys.end())\n      {\n        return 0;\n      }\n\n      auto viter = m_values.find(iter->second);\n\n      if (viter == m_values.end())\n      {\n        return 0;\n      }\n\n      return viter->second.count();\n    }\n\n    const OptionValue&\n    operator[](const std::string& option) const\n    {\n      auto iter = m_keys.find(option);\n\n      if (iter == m_keys.end())\n      {\n        throw_or_mimic<option_not_present_exception>(option);\n      }\n\n      auto viter = m_values.find(iter->second);\n\n      if (viter == m_values.end())\n      {\n        throw_or_mimic<option_not_present_exception>(option);\n      }\n\n      return viter->second;\n    }\n\n    const std::vector<KeyValue>&\n    arguments() const\n    {\n      return m_sequential;\n    }\n\n    const std::vector<std::string>&\n    unmatched() const\n    {\n      return m_unmatched;\n    }\n\n    private:\n    NameHashMap m_keys{};\n    ParsedHashMap m_values{};\n    std::vector<KeyValue> m_sequential{};\n    std::vector<std::string> m_unmatched{};\n  };\n\n  struct Option\n  {\n    Option\n    (\n      std::string opts,\n      std::string desc,\n      std::shared_ptr<const Value> 
 value = ::cxxopts::value<bool>(),\n      std::string arg_help = \"\"\n    )\n    : opts_(std::move(opts))\n    , desc_(std::move(desc))\n    , value_(std::move(value))\n    , arg_help_(std::move(arg_help))\n    {\n    }\n\n    std::string opts_;\n    std::string desc_;\n    std::shared_ptr<const Value> value_;\n    std::string arg_help_;\n  };\n\n  using OptionMap = std::unordered_map<std::string, std::shared_ptr<OptionDetails>>;\n  using PositionalList = std::vector<std::string>;\n  using PositionalListIterator = PositionalList::const_iterator;\n\n  class OptionParser\n  {\n    public:\n    OptionParser(const OptionMap& options, const PositionalList& positional, bool allow_unrecognised)\n    : m_options(options)\n    , m_positional(positional)\n    , m_allow_unrecognised(allow_unrecognised)\n    {\n    }\n\n    ParseResult\n    parse(int argc, const char* const* argv);\n\n    bool\n    consume_positional(const std::string& a, PositionalListIterator& next);\n\n    void\n    checked_parse_arg\n    (\n      int argc,\n      const char* const* argv,\n      int& current,\n      const std::shared_ptr<OptionDetails>& value,\n      const std::string& name\n    );\n\n    void\n    add_to_option(OptionMap::const_iterator iter, const std::string& option, const std::string& arg);\n\n    void\n    parse_option\n    (\n      const std::shared_ptr<OptionDetails>& value,\n      const std::string& name,\n      const std::string& arg = \"\"\n    );\n\n    void\n    parse_default(const std::shared_ptr<OptionDetails>& details);\n\n    private:\n\n    void finalise_aliases();\n\n    const OptionMap& m_options;\n    const PositionalList& m_positional;\n\n    std::vector<KeyValue> m_sequential{};\n    bool m_allow_unrecognised;\n\n    ParsedHashMap m_parsed{};\n    NameHashMap m_keys{};\n  };\n\n  class Options\n  {\n    public:\n\n    explicit Options(std::string program, std::string help_string = \"\")\n    : m_program(std::move(program))\n    , 
m_help_string(toLocalString(std::move(help_string)))\n    , m_custom_help(\"[OPTION...]\")\n    , m_positional_help(\"positional parameters\")\n    , m_show_positional(false)\n    , m_allow_unrecognised(false)\n    , m_width(76)\n    , m_tab_expansion(false)\n    , m_options(std::make_shared<OptionMap>())\n    {\n    }\n\n    Options&\n    positional_help(std::string help_text)\n    {\n      m_positional_help = std::move(help_text);\n      return *this;\n    }\n\n    Options&\n    custom_help(std::string help_text)\n    {\n      m_custom_help = std::move(help_text);\n      return *this;\n    }\n\n    Options&\n    show_positional_help()\n    {\n      m_show_positional = true;\n      return *this;\n    }\n\n    Options&\n    allow_unrecognised_options()\n    {\n      m_allow_unrecognised = true;\n      return *this;\n    }\n\n    Options&\n    set_width(size_t width)\n    {\n      m_width = width;\n      return *this;\n    }\n\n    Options&\n    set_tab_expansion(bool expansion=true)\n    {\n      m_tab_expansion = expansion;\n      return *this;\n    }\n\n    ParseResult\n    parse(int argc, const char* const* argv);\n\n    OptionAdder\n    add_options(std::string group = \"\");\n\n    void\n    add_options\n    (\n      const std::string& group,\n      std::initializer_list<Option> options\n    );\n\n    void\n    add_option\n    (\n      const std::string& group,\n      const Option& option\n    );\n\n    void\n    add_option\n    (\n      const std::string& group,\n      const std::string& s,\n      const std::string& l,\n      std::string desc,\n      const std::shared_ptr<const Value>& value,\n      std::string arg_help\n    );\n\n    //parse positional arguments into the given option\n    void\n    parse_positional(std::string option);\n\n    void\n    parse_positional(std::vector<std::string> options);\n\n    void\n    parse_positional(std::initializer_list<std::string> options);\n\n    template <typename Iterator>\n    void\n    parse_positional(Iterator 
begin, Iterator end) {\n      parse_positional(std::vector<std::string>{begin, end});\n    }\n\n    std::string\n    help(const std::vector<std::string>& groups = {}) const;\n\n    std::vector<std::string>\n    groups() const;\n\n    const HelpGroupDetails&\n    group_help(const std::string& group) const;\n\n    private:\n\n    void\n    add_one_option\n    (\n      const std::string& option,\n      const std::shared_ptr<OptionDetails>& details\n    );\n\n    String\n    help_one_group(const std::string& group) const;\n\n    void\n    generate_group_help\n    (\n      String& result,\n      const std::vector<std::string>& groups\n    ) const;\n\n    void\n    generate_all_groups_help(String& result) const;\n\n    std::string m_program{};\n    String m_help_string{};\n    std::string m_custom_help{};\n    std::string m_positional_help{};\n    bool m_show_positional;\n    bool m_allow_unrecognised;\n    size_t m_width;\n    bool m_tab_expansion;\n\n    std::shared_ptr<OptionMap> m_options;\n    std::vector<std::string> m_positional{};\n    std::unordered_set<std::string> m_positional_set{};\n\n    //mapping from groups to help options\n    std::map<std::string, HelpGroupDetails> m_help{};\n\n    std::list<OptionDetails> m_option_list{};\n    std::unordered_map<std::string, decltype(m_option_list)::iterator> m_option_map{};\n  };\n\n  class OptionAdder\n  {\n    public:\n\n    OptionAdder(Options& options, std::string group)\n    : m_options(options), m_group(std::move(group))\n    {\n    }\n\n    OptionAdder&\n    operator()\n    (\n      const std::string& opts,\n      const std::string& desc,\n      const std::shared_ptr<const Value>& value\n        = ::cxxopts::value<bool>(),\n      std::string arg_help = \"\"\n    );\n\n    private:\n    Options& m_options;\n    std::string m_group;\n  };\n\n  namespace\n  {\n    constexpr size_t OPTION_LONGEST = 30;\n    constexpr size_t OPTION_DESC_GAP = 2;\n\n    std::basic_regex<char> option_matcher\n      
(\"--([[:alnum:]][-_[:alnum:]]+)(=(.*))?|-([[:alnum:]]+)\");\n\n    std::basic_regex<char> option_specifier\n      (\"(([[:alnum:]]),)?[ ]*([[:alnum:]][-_[:alnum:]]*)?\");\n\n    String\n    format_option\n    (\n      const HelpOptionDetails& o\n    )\n    {\n      const auto& s = o.s;\n      const auto& l = o.l;\n\n      String result = \"  \";\n\n      if (!s.empty())\n      {\n        result += \"-\" + toLocalString(s);\n        if (!l.empty())\n        {\n          result += \",\";\n        }\n      }\n      else\n      {\n        result += \"   \";\n      }\n\n      if (!l.empty())\n      {\n        result += \" --\" + toLocalString(l);\n      }\n\n      auto arg = !o.arg_help.empty() ? toLocalString(o.arg_help) : \"arg\";\n\n      if (!o.is_boolean)\n      {\n        if (o.has_implicit)\n        {\n          result += \" [=\" + arg + \"(=\" + toLocalString(o.implicit_value) + \")]\";\n        }\n        else\n        {\n          result += \" \" + arg;\n        }\n      }\n\n      return result;\n    }\n\n    String\n    format_description\n    (\n      const HelpOptionDetails& o,\n      size_t start,\n      size_t allowed,\n      bool tab_expansion\n    )\n    {\n      auto desc = o.desc;\n\n      if (o.has_default && (!o.is_boolean || o.default_value != \"false\"))\n      {\n        if(!o.default_value.empty())\n        {\n          desc += toLocalString(\" (default: \" + o.default_value + \")\");\n        }\n        else\n        {\n          desc += toLocalString(\" (default: \\\"\\\")\");\n        }\n      }\n\n      String result;\n\n      if (tab_expansion)\n      {\n        String desc2;\n        auto size = size_t{ 0 };\n        for (auto c = std::begin(desc); c != std::end(desc); ++c)\n        {\n          if (*c == '\\n')\n          {\n            desc2 += *c;\n            size = 0;\n          }\n          else if (*c == '\\t')\n          {\n            auto skip = 8 - size % 8;\n            stringAppend(desc2, skip, ' ');\n            size += 
skip;\n          }\n          else\n          {\n            desc2 += *c;\n            ++size;\n          }\n        }\n        desc = desc2;\n      }\n\n      desc += \" \";\n\n      auto current = std::begin(desc);\n      auto previous = current;\n      auto startLine = current;\n      auto lastSpace = current;\n\n      auto size = size_t{};\n\n      bool appendNewLine;\n      bool onlyWhiteSpace = true;\n\n      while (current != std::end(desc))\n      {\n        appendNewLine = false;\n\n        if (std::isblank(*previous))\n        {\n          lastSpace = current;\n        }\n\n        if (!std::isblank(*current))\n        {\n          onlyWhiteSpace = false;\n        }\n\n        while (*current == '\\n')\n        {\n          previous = current;\n          ++current;\n          appendNewLine = true;\n        }\n\n        if (!appendNewLine && size >= allowed)\n        {\n          if (lastSpace != startLine)\n          {\n            current = lastSpace;\n            previous = current;\n          }\n          appendNewLine = true;\n        }\n\n        if (appendNewLine)\n        {\n          stringAppend(result, startLine, current);\n          startLine = current;\n          lastSpace = current;\n\n          if (*previous != '\\n')\n          {\n            stringAppend(result, \"\\n\");\n          }\n\n          stringAppend(result, start, ' ');\n\n          if (*previous != '\\n')\n          {\n            stringAppend(result, lastSpace, current);\n          }\n\n          onlyWhiteSpace = true;\n          size = 0;\n        }\n\n        previous = current;\n        ++current;\n        ++size;\n      }\n\n      //append whatever is left but ignore whitespace\n      if (!onlyWhiteSpace)\n      {\n        stringAppend(result, startLine, previous);\n      }\n\n      return result;\n    }\n  } // namespace\n\ninline\nvoid\nOptions::add_options\n(\n  const std::string &group,\n  std::initializer_list<Option> options\n)\n{\n OptionAdder option_adder(*this, 
group);\n for (const auto &option: options)\n {\n   option_adder(option.opts_, option.desc_, option.value_, option.arg_help_);\n }\n}\n\ninline\nOptionAdder\nOptions::add_options(std::string group)\n{\n  return OptionAdder(*this, std::move(group));\n}\n\ninline\nOptionAdder&\nOptionAdder::operator()\n(\n  const std::string& opts,\n  const std::string& desc,\n  const std::shared_ptr<const Value>& value,\n  std::string arg_help\n)\n{\n  std::match_results<const char*> result;\n  std::regex_match(opts.c_str(), result, option_specifier);\n\n  if (result.empty())\n  {\n    throw_or_mimic<invalid_option_format_error>(opts);\n  }\n\n  const auto& short_match = result[2];\n  const auto& long_match = result[3];\n\n  if (!short_match.length() && !long_match.length())\n  {\n    throw_or_mimic<invalid_option_format_error>(opts);\n  } else if (long_match.length() == 1 && short_match.length())\n  {\n    throw_or_mimic<invalid_option_format_error>(opts);\n  }\n\n  auto option_names = []\n  (\n    const std::sub_match<const char*>& short_,\n    const std::sub_match<const char*>& long_\n  )\n  {\n    if (long_.length() == 1)\n    {\n      return std::make_tuple(long_.str(), short_.str());\n    }\n    return std::make_tuple(short_.str(), long_.str());\n  }(short_match, long_match);\n\n  m_options.add_option\n  (\n    m_group,\n    std::get<0>(option_names),\n    std::get<1>(option_names),\n    desc,\n    value,\n    std::move(arg_help)\n  );\n\n  return *this;\n}\n\ninline\nvoid\nOptionParser::parse_default(const std::shared_ptr<OptionDetails>& details)\n{\n  // TODO: remove the duplicate code here\n  auto& store = m_parsed[details->hash()];\n  store.parse_default(details);\n}\n\ninline\nvoid\nOptionParser::parse_option\n(\n  const std::shared_ptr<OptionDetails>& value,\n  const std::string& /*name*/,\n  const std::string& arg\n)\n{\n  auto hash = value->hash();\n  auto& result = m_parsed[hash];\n  result.parse(value, arg);\n\n  m_sequential.emplace_back(value->long_name(), 
arg);\n}\n\ninline\nvoid\nOptionParser::checked_parse_arg\n(\n  int argc,\n  const char* const* argv,\n  int& current,\n  const std::shared_ptr<OptionDetails>& value,\n  const std::string& name\n)\n{\n  if (current + 1 >= argc)\n  {\n    if (value->value().has_implicit())\n    {\n      parse_option(value, name, value->value().get_implicit_value());\n    }\n    else\n    {\n      throw_or_mimic<missing_argument_exception>(name);\n    }\n  }\n  else\n  {\n    if (value->value().has_implicit())\n    {\n      parse_option(value, name, value->value().get_implicit_value());\n    }\n    else\n    {\n      parse_option(value, name, argv[current + 1]);\n      ++current;\n    }\n  }\n}\n\ninline\nvoid\nOptionParser::add_to_option(OptionMap::const_iterator iter, const std::string& option, const std::string& arg)\n{\n  parse_option(iter->second, option, arg);\n}\n\ninline\nbool\nOptionParser::consume_positional(const std::string& a, PositionalListIterator& next)\n{\n  while (next != m_positional.end())\n  {\n    auto iter = m_options.find(*next);\n    if (iter != m_options.end())\n    {\n      if (!iter->second->value().is_container())\n      {\n        auto& result = m_parsed[iter->second->hash()];\n        if (result.count() == 0)\n        {\n          add_to_option(iter, *next, a);\n          ++next;\n          return true;\n        }\n        ++next;\n        continue;\n      }\n      add_to_option(iter, *next, a);\n      return true;\n    }\n    throw_or_mimic<option_not_exists_exception>(*next);\n  }\n\n  return false;\n}\n\ninline\nvoid\nOptions::parse_positional(std::string option)\n{\n  parse_positional(std::vector<std::string>{std::move(option)});\n}\n\ninline\nvoid\nOptions::parse_positional(std::vector<std::string> options)\n{\n  m_positional = std::move(options);\n\n  m_positional_set.insert(m_positional.begin(), m_positional.end());\n}\n\ninline\nvoid\nOptions::parse_positional(std::initializer_list<std::string> options)\n{\n  
parse_positional(std::vector<std::string>(options));\n}\n\ninline\nParseResult\nOptions::parse(int argc, const char* const* argv)\n{\n  OptionParser parser(*m_options, m_positional, m_allow_unrecognised);\n\n  return parser.parse(argc, argv);\n}\n\ninline ParseResult\nOptionParser::parse(int argc, const char* const* argv)\n{\n  int current = 1;\n  bool consume_remaining = false;\n  auto next_positional = m_positional.begin();\n\n  std::vector<std::string> unmatched;\n\n  while (current != argc)\n  {\n    if (strcmp(argv[current], \"--\") == 0)\n    {\n      consume_remaining = true;\n      ++current;\n      break;\n    }\n\n    std::match_results<const char*> result;\n    std::regex_match(argv[current], result, option_matcher);\n\n    if (result.empty())\n    {\n      //not a flag\n\n      // but if it starts with a `-`, then it's an error\n      if (argv[current][0] == '-' && argv[current][1] != '\\0') {\n        if (!m_allow_unrecognised) {\n          throw_or_mimic<option_syntax_exception>(argv[current]);\n        }\n      }\n\n      //if true is returned here then it was consumed, otherwise it is\n      //ignored\n      if (consume_positional(argv[current], next_positional))\n      {\n      }\n      else\n      {\n        unmatched.emplace_back(argv[current]);\n      }\n      //if we return from here then it was parsed successfully, so continue\n    }\n    else\n    {\n      //short or long option?\n      if (result[4].length() != 0)\n      {\n        const std::string& s = result[4];\n\n        for (std::size_t i = 0; i != s.size(); ++i)\n        {\n          std::string name(1, s[i]);\n          auto iter = m_options.find(name);\n\n          if (iter == m_options.end())\n          {\n            if (m_allow_unrecognised)\n            {\n              continue;\n            }\n            //error\n            throw_or_mimic<option_not_exists_exception>(name);\n          }\n\n          auto value = iter->second;\n\n          if (i + 1 == s.size())\n          
{\n            //it must be the last argument\n            checked_parse_arg(argc, argv, current, value, name);\n          }\n          else if (value->value().has_implicit())\n          {\n            parse_option(value, name, value->value().get_implicit_value());\n          }\n          else\n          {\n            //error\n            throw_or_mimic<option_requires_argument_exception>(name);\n          }\n        }\n      }\n      else if (result[1].length() != 0)\n      {\n        const std::string& name = result[1];\n\n        auto iter = m_options.find(name);\n\n        if (iter == m_options.end())\n        {\n          if (m_allow_unrecognised)\n          {\n            // keep unrecognised options in argument list, skip to next argument\n            unmatched.emplace_back(argv[current]);\n            ++current;\n            continue;\n          }\n          //error\n          throw_or_mimic<option_not_exists_exception>(name);\n        }\n\n        auto opt = iter->second;\n\n        //equals provided for long option?\n        if (result[2].length() != 0)\n        {\n          //parse the option given\n\n          parse_option(opt, name, result[3]);\n        }\n        else\n        {\n          //parse the next argument\n          checked_parse_arg(argc, argv, current, opt, name);\n        }\n      }\n\n    }\n\n    ++current;\n  }\n\n  for (auto& opt : m_options)\n  {\n    auto& detail = opt.second;\n    const auto& value = detail->value();\n\n    auto& store = m_parsed[detail->hash()];\n\n    if(value.has_default() && !store.count() && !store.has_default()){\n      parse_default(detail);\n    }\n  }\n\n  if (consume_remaining)\n  {\n    while (current < argc)\n    {\n      if (!consume_positional(argv[current], next_positional)) {\n        break;\n      }\n      ++current;\n    }\n\n    //adjust argv for any that couldn't be swallowed\n    while (current != argc) {\n      unmatched.emplace_back(argv[current]);\n      ++current;\n    }\n  }\n\n  
finalise_aliases();\n\n  ParseResult parsed(std::move(m_keys), std::move(m_parsed), std::move(m_sequential), std::move(unmatched));\n  return parsed;\n}\n\ninline\nvoid\nOptionParser::finalise_aliases()\n{\n  for (auto& option: m_options)\n  {\n    auto& detail = *option.second;\n    auto hash = detail.hash();\n    m_keys[detail.short_name()] = hash;\n    m_keys[detail.long_name()] = hash;\n\n    m_parsed.emplace(hash, OptionValue());\n  }\n}\n\ninline\nvoid\nOptions::add_option\n(\n  const std::string& group,\n  const Option& option\n)\n{\n    add_options(group, {option});\n}\n\ninline\nvoid\nOptions::add_option\n(\n  const std::string& group,\n  const std::string& s,\n  const std::string& l,\n  std::string desc,\n  const std::shared_ptr<const Value>& value,\n  std::string arg_help\n)\n{\n  auto stringDesc = toLocalString(std::move(desc));\n  auto option = std::make_shared<OptionDetails>(s, l, stringDesc, value);\n\n  if (!s.empty())\n  {\n    add_one_option(s, option);\n  }\n\n  if (!l.empty())\n  {\n    add_one_option(l, option);\n  }\n\n  m_option_list.push_front(*option.get());\n  auto iter = m_option_list.begin();\n  m_option_map[s] = iter;\n  m_option_map[l] = iter;\n\n  //add the help details\n  auto& options = m_help[group];\n\n  options.options.emplace_back(HelpOptionDetails{s, l, stringDesc,\n      value->has_default(), value->get_default_value(),\n      value->has_implicit(), value->get_implicit_value(),\n      std::move(arg_help),\n      value->is_container(),\n      value->is_boolean()});\n}\n\ninline\nvoid\nOptions::add_one_option\n(\n  const std::string& option,\n  const std::shared_ptr<OptionDetails>& details\n)\n{\n  auto in = m_options->emplace(option, details);\n\n  if (!in.second)\n  {\n    throw_or_mimic<option_exists_error>(option);\n  }\n}\n\ninline\nString\nOptions::help_one_group(const std::string& g) const\n{\n  using OptionHelp = std::vector<std::pair<String, String>>;\n\n  auto group = m_help.find(g);\n  if (group == m_help.end())\n  
{\n    return \"\";\n  }\n\n  OptionHelp format;\n\n  size_t longest = 0;\n\n  String result;\n\n  if (!g.empty())\n  {\n    result += toLocalString(\" \" + g + \" options:\\n\");\n  }\n\n  for (const auto& o : group->second.options)\n  {\n    if (m_positional_set.find(o.l) != m_positional_set.end() &&\n        !m_show_positional)\n    {\n      continue;\n    }\n\n    auto s = format_option(o);\n    longest = (std::max)(longest, stringLength(s));\n    format.push_back(std::make_pair(s, String()));\n  }\n  longest = (std::min)(longest, OPTION_LONGEST);\n\n  //widest allowed description -- min 10 chars for helptext/line\n  size_t allowed = 10;\n  if (m_width > allowed + longest + OPTION_DESC_GAP)\n  {\n    allowed = m_width - longest - OPTION_DESC_GAP;\n  }\n\n  auto fiter = format.begin();\n  for (const auto& o : group->second.options)\n  {\n    if (m_positional_set.find(o.l) != m_positional_set.end() &&\n        !m_show_positional)\n    {\n      continue;\n    }\n\n    auto d = format_description(o, longest + OPTION_DESC_GAP, allowed, m_tab_expansion);\n\n    result += fiter->first;\n    if (stringLength(fiter->first) > longest)\n    {\n      result += '\\n';\n      result += toLocalString(std::string(longest + OPTION_DESC_GAP, ' '));\n    }\n    else\n    {\n      result += toLocalString(std::string(longest + OPTION_DESC_GAP -\n        stringLength(fiter->first),\n        ' '));\n    }\n    result += d;\n    result += '\\n';\n\n    ++fiter;\n  }\n\n  return result;\n}\n\ninline\nvoid\nOptions::generate_group_help\n(\n  String& result,\n  const std::vector<std::string>& print_groups\n) const\n{\n  for (size_t i = 0; i != print_groups.size(); ++i)\n  {\n    const String& group_help_text = help_one_group(print_groups[i]);\n    if (empty(group_help_text))\n    {\n      continue;\n    }\n    result += group_help_text;\n    if (i < print_groups.size() - 1)\n    {\n      result += '\\n';\n    }\n  }\n}\n\ninline\nvoid\nOptions::generate_all_groups_help(String& result) 
const\n{\n  std::vector<std::string> all_groups;\n\n  std::transform(\n    m_help.begin(),\n    m_help.end(),\n    std::back_inserter(all_groups),\n    [] (const std::map<std::string, HelpGroupDetails>::value_type& group)\n    {\n      return group.first;\n    }\n  );\n\n  generate_group_help(result, all_groups);\n}\n\ninline\nstd::string\nOptions::help(const std::vector<std::string>& help_groups) const\n{\n  String result = m_help_string + \"\\nUsage:\\n  \" +\n    toLocalString(m_program) + \" \" + toLocalString(m_custom_help);\n\n  if (!m_positional.empty() && !m_positional_help.empty()) {\n    result += \" \" + toLocalString(m_positional_help);\n  }\n\n  result += \"\\n\\n\";\n\n  if (help_groups.empty())\n  {\n    generate_all_groups_help(result);\n  }\n  else\n  {\n    generate_group_help(result, help_groups);\n  }\n\n  return toUTF8String(result);\n}\n\ninline\nstd::vector<std::string>\nOptions::groups() const\n{\n  std::vector<std::string> g;\n\n  std::transform(\n    m_help.begin(),\n    m_help.end(),\n    std::back_inserter(g),\n    [] (const std::map<std::string, HelpGroupDetails>::value_type& pair)\n    {\n      return pair.first;\n    }\n  );\n\n  return g;\n}\n\ninline\nconst HelpGroupDetails&\nOptions::group_help(const std::string& group) const\n{\n  return m_help.at(group);\n}\n\n} // namespace cxxopts\n\n#endif //CXXOPTS_HPP_INCLUDED\n"
  },
  {
    "path": "src/draft_mapping.h",
    "content": "#ifndef DRAFT_MAPPING_H_\n#define DRAFT_MAPPING_H_\n\n#include <stdint.h>\n\nnamespace chromap {\n\nstruct DraftMapping {\n  int num_errors = 0;\n\n  // The high 32 bits save the reference sequence index in the reference\n  // sequence batch. The low 32 bits save the mapping end position on the\n  // reference sequence.\n  uint64_t position = 0;\n\n  DraftMapping(int num_errors, uint64_t position)\n      : num_errors(num_errors), position(position) {}\n\n  inline int GetNumErrors() const { return num_errors; }\n\n  inline uint32_t GetReferenceSequenceIndex() const { return (position >> 32); }\n\n  inline uint32_t GetReferenceSequencePosition() const { return position; }\n};\n\n}  // namespace chromap\n\n#endif  // DRAFT_MAPPING_H_\n"
  },
  {
    "path": "src/draft_mapping_generator.cc",
    "content": "#include \"draft_mapping_generator.h\"\n\n#include <vector>\n\n#include \"alignment.h\"\n\nnamespace chromap {\n\nvoid DraftMappingGenerator::GenerateDraftMappings(\n    const SequenceBatch &read_batch, uint32_t read_index,\n    const SequenceBatch &reference, MappingMetadata &mapping_metadata) {\n  mapping_metadata.SetMinNumErrors(error_threshold_ + 1);\n  mapping_metadata.SetNumBestMappings(0);\n  mapping_metadata.SetSecondMinNumErrors(error_threshold_ + 1);\n  mapping_metadata.SetNumSecondBestMappings(0);\n\n  // Directly obtain the non-split mapping in the ideal case and return without\n  // running actual verification.\n  const bool is_mapping_generated =\n      GenerateNonSplitDraftMappingSupportedByAllMinimizers(\n          read_batch, read_index, reference, mapping_metadata);\n  if (is_mapping_generated) {\n    return;\n  }\n\n  // Use a more sophisticated approach to obtain the mapping.\n  // Sort the candidates by their count in descending order.\n  // TODO: check if this sorting is necessary.\n  mapping_metadata.SortCandidates();\n\n  // For split alignments, SIMD cannot be used.\n  if (split_alignment_) {\n    GenerateDraftMappingsOnOneStrand(kPositive, read_index, read_batch,\n                                     reference, mapping_metadata);\n\n    GenerateDraftMappingsOnOneStrand(kNegative, read_index, read_batch,\n                                     reference, mapping_metadata);\n    return;\n  }\n\n  // For non-split alignments, use SIMD when possible.\n  if (mapping_metadata.GetNumPositiveCandidates() < (size_t)num_vpu_lanes_) {\n    GenerateDraftMappingsOnOneStrand(kPositive, read_index, read_batch,\n                                     reference, mapping_metadata);\n  } else {\n    GenerateDraftMappingsOnOneStrandUsingSIMD(kPositive, read_index, read_batch,\n                                              reference, mapping_metadata);\n  }\n\n  if (mapping_metadata.GetNumNegativeCandidates() < (size_t)num_vpu_lanes_) {\n    
GenerateDraftMappingsOnOneStrand(kNegative, read_index, read_batch,\n                                     reference, mapping_metadata);\n  } else {\n    GenerateDraftMappingsOnOneStrandUsingSIMD(kNegative, read_index, read_batch,\n                                              reference, mapping_metadata);\n  }\n}\n\nbool DraftMappingGenerator::IsValidCandidate(uint32_t rid, uint32_t position,\n                                             uint32_t read_length,\n                                             const SequenceBatch &reference) {\n  const uint32_t reference_length = reference.GetSequenceLengthAt(rid);\n\n  if (position < (uint32_t)error_threshold_ || position >= reference_length ||\n      position + read_length + (uint32_t)error_threshold_ >= reference_length) {\n    return false;\n  }\n\n  return true;\n}\n\nbool DraftMappingGenerator::\n    GenerateNonSplitDraftMappingSupportedByAllMinimizers(\n        const SequenceBatch &read_batch, uint32_t read_index,\n        const SequenceBatch &reference, MappingMetadata &mapping_metadata) {\n  if (split_alignment_) {\n    return false;\n  }\n\n  const bool has_one_candidate = (mapping_metadata.GetNumCandidates() == 1);\n\n  if (!has_one_candidate) {\n    return false;\n  }\n\n  const std::vector<Candidate> &positive_candidates =\n      mapping_metadata.positive_candidates_;\n  const std::vector<Candidate> &negative_candidates =\n      mapping_metadata.negative_candidates_;\n\n  std::vector<DraftMapping> &positive_mappings =\n      mapping_metadata.positive_mappings_;\n  std::vector<DraftMapping> &negative_mappings =\n      mapping_metadata.negative_mappings_;\n\n  uint32_t num_all_minimizer_candidates = 0;\n  uint32_t all_minimizer_candidate_index = 0;\n  Strand all_minimizer_candidate_strand = kPositive;\n\n  for (uint32_t i = 0; i < positive_candidates.size(); ++i) {\n    if (positive_candidates[i].count == mapping_metadata.GetNumMinimizers()) {\n      all_minimizer_candidate_index = i;\n      
++num_all_minimizer_candidates;\n    }\n  }\n\n  for (uint32_t i = 0; i < negative_candidates.size(); ++i) {\n    if (negative_candidates[i].count == mapping_metadata.GetNumMinimizers()) {\n      all_minimizer_candidate_index = i;\n      all_minimizer_candidate_strand = kNegative;\n      ++num_all_minimizer_candidates;\n    }\n  }\n\n  if (num_all_minimizer_candidates != 1) {\n    return false;\n  }\n\n  mapping_metadata.SetMinNumErrors(0);\n  mapping_metadata.SetNumBestMappings(1);\n  mapping_metadata.SetNumSecondBestMappings(0);\n\n  const uint32_t read_length = read_batch.GetSequenceLengthAt(read_index);\n  const std::vector<Candidate> &candidates =\n      all_minimizer_candidate_strand == kPositive ? positive_candidates\n                                                  : negative_candidates;\n\n  const uint32_t rid =\n      candidates[all_minimizer_candidate_index].GetReferenceSequenceIndex();\n\n  uint32_t position = 0;\n\n  if (all_minimizer_candidate_strand == kPositive) {\n    position = positive_candidates[all_minimizer_candidate_index]\n                   .GetReferenceSequencePosition();\n  } else {\n    position = negative_candidates[all_minimizer_candidate_index]\n                   .GetReferenceSequencePosition() -\n               read_length + 1;\n  }\n\n  const bool is_valid_candidate =\n      IsValidCandidate(rid, position, read_length, reference);\n  if (is_valid_candidate) {\n    if (all_minimizer_candidate_strand == kPositive) {\n      positive_mappings.emplace_back(\n          0, positive_candidates[all_minimizer_candidate_index].position +\n                 read_length - 1);\n    } else {\n      negative_mappings.emplace_back(\n          0, negative_candidates[all_minimizer_candidate_index].position);\n    }\n    return true;\n  }\n\n  return false;\n}\n\nvoid DraftMappingGenerator::GenerateDraftMappingsOnOneStrandUsingSIMD(\n    const Strand candidate_strand, uint32_t read_index,\n    const SequenceBatch &read_batch, const SequenceBatch 
&reference,\n    MappingMetadata &mapping_metadata) {\n  const char *read = read_batch.GetSequenceAt(read_index);\n  const uint32_t read_length = read_batch.GetSequenceLengthAt(read_index);\n  const std::string &negative_read =\n      read_batch.GetNegativeSequenceAt(read_index);\n\n  const std::vector<Candidate> &candidates =\n      candidate_strand == kPositive ? mapping_metadata.positive_candidates_\n                                    : mapping_metadata.negative_candidates_;\n  std::vector<DraftMapping> &mappings =\n      candidate_strand == kPositive ? mapping_metadata.positive_mappings_\n                                    : mapping_metadata.negative_mappings_;\n  int &min_num_errors = mapping_metadata.min_num_errors_;\n  int &num_best_mappings = mapping_metadata.num_best_mappings_;\n  int &second_min_num_errors = mapping_metadata.second_min_num_errors_;\n  int &num_second_best_mappings = mapping_metadata.num_second_best_mappings_;\n\n  Candidate valid_candidates[num_vpu_lanes_];\n  const char *valid_candidate_starts[num_vpu_lanes_];\n  uint32_t valid_candidate_index = 0;\n  size_t candidate_index = 0;\n  uint32_t candidate_count_threshold = 0;\n\n  while (candidate_index < candidates.size()) {\n    if (candidates[candidate_index].count < candidate_count_threshold) {\n      break;\n    }\n\n    uint32_t rid = candidates[candidate_index].GetReferenceSequenceIndex();\n    uint32_t position =\n        candidates[candidate_index].GetReferenceSequencePosition();\n\n    if (candidate_strand == kNegative) {\n      position = position - read_length + 1;\n    }\n\n    if (!IsValidCandidate(rid, position, read_length, reference)) {\n      ++candidate_index;\n      continue;\n    }\n\n    valid_candidates[valid_candidate_index] = candidates[candidate_index];\n    valid_candidate_starts[valid_candidate_index] =\n        reference.GetSequenceAt(rid) + position - error_threshold_;\n    ++valid_candidate_index;\n    ++candidate_index;\n\n    if (valid_candidate_index < 
(uint32_t)num_vpu_lanes_) {\n      continue;\n    }\n\n    if (num_vpu_lanes_ == 8) {\n      int16_t mapping_edit_distances[num_vpu_lanes_];\n      int16_t mapping_end_positions[num_vpu_lanes_];\n      for (int li = 0; li < num_vpu_lanes_; ++li) {\n        mapping_end_positions[li] = read_length - 1;\n      }\n      if (candidate_strand == kPositive) {\n        BandedAlign8PatternsToText(error_threshold_, valid_candidate_starts,\n                                   read, read_length, mapping_edit_distances,\n                                   mapping_end_positions);\n      } else {\n        BandedAlign8PatternsToText(\n            error_threshold_, valid_candidate_starts, negative_read.data(),\n            read_length, mapping_edit_distances, mapping_end_positions);\n      }\n      for (int mi = 0; mi < num_vpu_lanes_; ++mi) {\n        if (mapping_edit_distances[mi] <= error_threshold_) {\n          if (mapping_edit_distances[mi] < min_num_errors) {\n            second_min_num_errors = min_num_errors;\n            num_second_best_mappings = num_best_mappings;\n            min_num_errors = mapping_edit_distances[mi];\n            num_best_mappings = 1;\n          } else if (mapping_edit_distances[mi] == min_num_errors) {\n            num_best_mappings++;\n          } else if (mapping_edit_distances[mi] == second_min_num_errors) {\n            num_second_best_mappings++;\n          } else if (mapping_edit_distances[mi] < second_min_num_errors) {\n            num_second_best_mappings = 1;\n            second_min_num_errors = mapping_edit_distances[mi];\n          }\n          if (candidate_strand == kPositive) {\n            mappings.emplace_back(mapping_edit_distances[mi],\n                                  valid_candidates[mi].position -\n                                      error_threshold_ +\n                                      mapping_end_positions[mi]);\n          } else {\n            mappings.emplace_back(mapping_edit_distances[mi],\n                         
         valid_candidates[mi].position - read_length +\n                                      1 - error_threshold_ +\n                                      mapping_end_positions[mi]);\n          }\n        } else {\n          candidate_count_threshold = valid_candidates[mi].count;\n        }\n      }\n    } else if (num_vpu_lanes_ == 4) {\n      int32_t mapping_edit_distances[num_vpu_lanes_];\n      int32_t mapping_end_positions[num_vpu_lanes_];\n      for (int li = 0; li < num_vpu_lanes_; ++li) {\n        mapping_end_positions[li] = read_length - 1;\n      }\n      if (candidate_strand == kPositive) {\n        BandedAlign4PatternsToText(error_threshold_, valid_candidate_starts,\n                                   read, read_length, mapping_edit_distances,\n                                   mapping_end_positions);\n      } else {\n        BandedAlign4PatternsToText(\n            error_threshold_, valid_candidate_starts, negative_read.data(),\n            read_length, mapping_edit_distances, mapping_end_positions);\n      }\n      for (int mi = 0; mi < num_vpu_lanes_; ++mi) {\n        if (mapping_edit_distances[mi] <= error_threshold_) {\n          if (mapping_edit_distances[mi] < min_num_errors) {\n            second_min_num_errors = min_num_errors;\n            num_second_best_mappings = num_best_mappings;\n            min_num_errors = mapping_edit_distances[mi];\n            num_best_mappings = 1;\n          } else if (mapping_edit_distances[mi] == min_num_errors) {\n            num_best_mappings++;\n          } else if (mapping_edit_distances[mi] == second_min_num_errors) {\n            num_second_best_mappings++;\n          } else if (mapping_edit_distances[mi] < second_min_num_errors) {\n            num_second_best_mappings = 1;\n            second_min_num_errors = mapping_edit_distances[mi];\n          }\n          if (candidate_strand == kPositive) {\n            mappings.emplace_back(mapping_edit_distances[mi],\n                                  
valid_candidates[mi].position -\n                                      error_threshold_ +\n                                      mapping_end_positions[mi]);\n          } else {\n            mappings.emplace_back(mapping_edit_distances[mi],\n                                  valid_candidates[mi].position - read_length +\n                                      1 - error_threshold_ +\n                                      mapping_end_positions[mi]);\n          }\n        } else {\n          candidate_count_threshold = valid_candidates[mi].count;\n        }\n      }\n    }\n\n    valid_candidate_index = 0;\n  }\n\n  for (uint32_t ci = 0; ci < valid_candidate_index; ++ci) {\n    uint32_t rid = valid_candidates[ci].GetReferenceSequenceIndex();\n    uint32_t position = valid_candidates[ci].GetReferenceSequencePosition();\n    if (candidate_strand == kNegative) {\n      position = position - read_length + 1;\n    }\n\n    if (!IsValidCandidate(rid, position, read_length, reference)) {\n      continue;\n    }\n\n    int mapping_end_position;\n    int num_errors;\n    if (candidate_strand == kPositive) {\n      num_errors = BandedAlignPatternToText(\n          error_threshold_,\n          reference.GetSequenceAt(rid) + position - error_threshold_, read,\n          read_length, &mapping_end_position);\n    } else {\n      num_errors = BandedAlignPatternToText(\n          error_threshold_,\n          reference.GetSequenceAt(rid) + position - error_threshold_,\n          negative_read.data(), read_length, &mapping_end_position);\n    }\n    if (num_errors <= error_threshold_) {\n      if (num_errors < min_num_errors) {\n        second_min_num_errors = min_num_errors;\n        num_second_best_mappings = num_best_mappings;\n        min_num_errors = num_errors;\n        num_best_mappings = 1;\n      } else if (num_errors == min_num_errors) {\n        num_best_mappings++;\n      } else if (num_errors == second_min_num_errors) {\n        num_second_best_mappings++;\n      } else if 
(num_errors < second_min_num_errors) {\n        num_second_best_mappings = 1;\n        second_min_num_errors = num_errors;\n      }\n      if (candidate_strand == kPositive) {\n        mappings.emplace_back(num_errors, valid_candidates[ci].position -\n                                              error_threshold_ +\n                                              mapping_end_position);\n      } else {\n        mappings.emplace_back(num_errors,\n                              valid_candidates[ci].position - read_length + 1 -\n                                  error_threshold_ + mapping_end_position);\n      }\n    }\n  }\n}\n\nvoid DraftMappingGenerator::GenerateDraftMappingsOnOneStrand(\n    const Strand candidate_strand, uint32_t read_index,\n    const SequenceBatch &read_batch, const SequenceBatch &reference,\n    MappingMetadata &mapping_metadata) {\n  const char *read = read_batch.GetSequenceAt(read_index);\n  const uint32_t read_length = read_batch.GetSequenceLengthAt(read_index);\n  const std::string &negative_read =\n      read_batch.GetNegativeSequenceAt(read_index);\n\n  const std::vector<Candidate> &candidates =\n      candidate_strand == kPositive ? mapping_metadata.positive_candidates_\n                                    : mapping_metadata.negative_candidates_;\n  std::vector<DraftMapping> &mappings =\n      candidate_strand == kPositive ? mapping_metadata.positive_mappings_\n                                    : mapping_metadata.negative_mappings_;\n  std::vector<int> &split_sites = candidate_strand == kPositive\n                                      ? 
mapping_metadata.positive_split_sites_\n                                      : mapping_metadata.negative_split_sites_;\n  int &min_num_errors = mapping_metadata.min_num_errors_;\n  int &num_best_mappings = mapping_metadata.num_best_mappings_;\n  int &second_min_num_errors = mapping_metadata.second_min_num_errors_;\n  int &num_second_best_mappings = mapping_metadata.num_second_best_mappings_;\n\n  uint32_t candidate_count_threshold = 0;\n\n  for (uint32_t ci = 0; ci < candidates.size(); ++ci) {\n    if (candidates[ci].count < candidate_count_threshold) {\n      break;\n    }\n\n    uint32_t rid = candidates[ci].GetReferenceSequenceIndex();\n    uint32_t position = candidates[ci].GetReferenceSequencePosition();\n    if (candidate_strand == kNegative) {\n      position = position - read_length + 1;\n    }\n\n    if (!IsValidCandidate(rid, position, read_length, reference)) {\n      continue;\n    }\n\n    int mapping_end_position = read_length;\n    int gap_beginning = 0;\n    int num_errors = 0;\n    const int allow_gap_beginning_ = 20;\n    const int mapping_length_threshold = 30;\n    int allow_gap_beginning = allow_gap_beginning_ - error_threshold_;\n    int actual_num_errors = 0;\n    int read_mapping_length = 0;\n    int best_mapping_longest_match = 0;\n    int longest_match = 0;\n\n    if (split_alignment_) {\n      if (candidate_strand == kPositive) {\n        num_errors = BandedAlignPatternToTextWithDropOff(\n            error_threshold_,\n            reference.GetSequenceAt(rid) + position - error_threshold_, read,\n            read_length, &mapping_end_position, &read_mapping_length);\n        if (mapping_end_position < 0 && allow_gap_beginning > 0) {\n          int backup_num_errors = num_errors;\n          int backup_mapping_end_position = -mapping_end_position;\n          int backup_read_mapping_length = read_mapping_length;\n          num_errors = BandedAlignPatternToTextWithDropOff(\n              error_threshold_,\n              
reference.GetSequenceAt(rid) + position - error_threshold_ +\n                  allow_gap_beginning,\n              read + allow_gap_beginning, read_length - allow_gap_beginning,\n              &mapping_end_position, &read_mapping_length);\n          if (num_errors > error_threshold_ || mapping_end_position < 0) {\n            num_errors = backup_num_errors;\n            mapping_end_position = backup_mapping_end_position;\n            read_mapping_length = backup_read_mapping_length;\n          } else {\n            gap_beginning = allow_gap_beginning;\n            // Realign the mapping end position as it is the alignment from the\n            // whole read.\n            mapping_end_position += gap_beginning;\n            // I use this adjustment since \"position\" is based on the whole\n            // read, and it will be more consistent with no gap beginning case.\n            read_mapping_length += gap_beginning;\n          }\n        }\n      } else {\n        num_errors = BandedAlignPatternToTextWithDropOffFrom3End(\n            error_threshold_,\n            reference.GetSequenceAt(rid) + position - error_threshold_,\n            negative_read.data(), read_length, &mapping_end_position,\n            &read_mapping_length);\n        if (mapping_end_position < 0 && allow_gap_beginning > 0) {\n          int backup_num_errors = num_errors;\n          int backup_mapping_end_position = -mapping_end_position;\n          int backup_read_mapping_length = read_mapping_length;\n          num_errors = BandedAlignPatternToTextWithDropOffFrom3End(\n              error_threshold_,\n              reference.GetSequenceAt(rid) + position - error_threshold_,\n              negative_read.data(), read_length - allow_gap_beginning,\n              &mapping_end_position, &read_mapping_length);\n          if (num_errors > error_threshold_ || mapping_end_position < 0) {\n            num_errors = backup_num_errors;\n            mapping_end_position = backup_mapping_end_position;\n      
      read_mapping_length = backup_read_mapping_length;\n          } else {\n            gap_beginning = allow_gap_beginning;\n            mapping_end_position += gap_beginning;\n            read_mapping_length += gap_beginning;\n          }\n        }\n      }\n\n      if (mapping_end_position + 1 - error_threshold_ - num_errors -\n              gap_beginning >=\n          mapping_length_threshold) {\n        actual_num_errors = num_errors;\n        num_errors = -(mapping_end_position - error_threshold_ - num_errors -\n                       gap_beginning);\n\n        if (candidates.size() > 200) {\n          if (candidate_strand == kPositive) {\n            longest_match = GetLongestMatchLength(\n                reference.GetSequenceAt(rid) + position, read, read_length);\n          } else {\n            longest_match =\n                GetLongestMatchLength(reference.GetSequenceAt(rid) + position,\n                                      negative_read.data(), read_length);\n          }\n        }\n      } else {\n        num_errors = error_threshold_ + 1;\n        actual_num_errors = error_threshold_ + 1;\n      }\n    } else {\n      if (candidate_strand == kPositive) {\n        num_errors = BandedAlignPatternToText(\n            error_threshold_,\n            reference.GetSequenceAt(rid) + position - error_threshold_, read,\n            read_length, &mapping_end_position);\n      } else {\n        num_errors = BandedAlignPatternToText(\n            error_threshold_,\n            reference.GetSequenceAt(rid) + position - error_threshold_,\n            negative_read.data(), read_length, &mapping_end_position);\n      }\n    }\n\n    if (num_errors <= error_threshold_) {\n      if (num_errors < min_num_errors) {\n        second_min_num_errors = min_num_errors;\n        num_second_best_mappings = num_best_mappings;\n        min_num_errors = num_errors;\n        num_best_mappings = 1;\n        if (split_alignment_) {\n          if (candidates.size() > 50) {\n         
   candidate_count_threshold = candidates[ci].count;\n          } else {\n            candidate_count_threshold = candidates[ci].count / 2;\n          }\n          if (second_min_num_errors < min_num_errors + error_threshold_ / 2 &&\n              best_mapping_longest_match > longest_match &&\n              candidates.size() > 200) {\n            second_min_num_errors = min_num_errors;\n          }\n        }\n        best_mapping_longest_match = longest_match;\n      } else if (num_errors == min_num_errors) {\n        num_best_mappings++;\n      } else if (num_errors == second_min_num_errors) {\n        num_second_best_mappings++;\n      } else if (num_errors < second_min_num_errors) {\n        num_second_best_mappings = 1;\n        second_min_num_errors = num_errors;\n      }\n\n      if (candidate_strand == kPositive) {\n        mappings.emplace_back(\n            num_errors,\n            candidates[ci].position - error_threshold_ + mapping_end_position);\n      } else {\n        if (split_alignment_ && mapping_output_format_ != MAPPINGFORMAT_SAM) {\n          // TODO: this if condition is suspicious. 
Check this later.\n          mappings.emplace_back(num_errors,\n                                candidates[ci].position - gap_beginning);\n        } else {\n          // Need to subtract gap_beginning because mapping_end_position is\n          // adjusted by it, but read_length is not.\n          // printf(\"%d %d %d\\n\", candidates[ci].position, mapping_end_position,\n          // gap_beginning);\n          mappings.emplace_back(num_errors,\n                                candidates[ci].position - read_length + 1 -\n                                    error_threshold_ + mapping_end_position);\n        }\n      }\n\n      if (split_alignment_) {\n        split_sites.emplace_back(((actual_num_errors & 0xff) << 24) |\n                                 ((gap_beginning & 0xff) << 16) |\n                                 (read_mapping_length & 0xffff));\n      }\n    }\n  }\n}\n\n}  // namespace chromap\n"
  },
  {
    "path": "src/draft_mapping_generator.h",
    "content": "#ifndef DRAFT_MAPPING_GENERATOR_H_\n#define DRAFT_MAPPING_GENERATOR_H_\n\n#include <cstdint>\n\n#include \"draft_mapping.h\"\n#include \"mapping_metadata.h\"\n#include \"mapping_parameters.h\"\n#include \"sequence_batch.h\"\n#include \"utils.h\"\n\nnamespace chromap {\n\nclass DraftMappingGenerator {\n public:\n  DraftMappingGenerator() = delete;\n\n  DraftMappingGenerator(const MappingParameters &mapping_parameters)\n      : error_threshold_(mapping_parameters.error_threshold),\n        split_alignment_(mapping_parameters.split_alignment),\n        num_vpu_lanes_(mapping_parameters.GetNumVPULanes()),\n        mapping_output_format_(mapping_parameters.mapping_output_format) {}\n\n  ~DraftMappingGenerator() = default;\n\n  void GenerateDraftMappings(const SequenceBatch &read_batch,\n                             uint32_t read_index,\n                             const SequenceBatch &reference,\n                             MappingMetadata &mapping_metadata);\n\n private:\n  // Return true if the candidate position is valid on the reference with rid.\n  // Note only the position is checked and the input rid is not checked in this\n  // function. 
So the input rid must be valid.\n  bool IsValidCandidate(uint32_t rid, uint32_t position, uint32_t read_length,\n                        const SequenceBatch &reference);\n\n  // Return true when there is one non-split mapping generated and the mapping\n  // is supported by all the minimizers.\n  bool GenerateNonSplitDraftMappingSupportedByAllMinimizers(\n      const SequenceBatch &read_batch, uint32_t read_index,\n      const SequenceBatch &reference, MappingMetadata &mapping_metadata);\n\n  void GenerateDraftMappingsOnOneStrandUsingSIMD(\n      const Strand candidate_strand, uint32_t read_index,\n      const SequenceBatch &read_batch, const SequenceBatch &reference,\n      MappingMetadata &mapping_metadata);\n\n  void GenerateDraftMappingsOnOneStrand(const Strand candidate_strand,\n                                        uint32_t read_index,\n                                        const SequenceBatch &read_batch,\n                                        const SequenceBatch &reference,\n                                        MappingMetadata &mapping_metadata);\n\n  const int error_threshold_;\n  const bool split_alignment_;\n  const int num_vpu_lanes_;\n  const MappingOutputFormat mapping_output_format_;\n};\n\n}  // namespace chromap\n\n#endif  // DRAFT_MAPPING_GENERATOR_H_\n"
  },
  {
    "path": "src/feature_barcode_matrix.cc",
    "content": "#include \"feature_barcode_matrix.h\"\n\n#include <cinttypes>\n#include <cstring>\n#include <functional>\n#include <iostream>\n#include <string>\n#include <vector>\n\nnamespace chromap {\n\nvoid FeatureBarcodeMatrix::BuildAugmentedTreeForPeaks(uint32_t ref_id) {\n  // std::sort(mappings.begin(), mappings.end(), IntervalLess());\n  int max_level = 0;\n  size_t i, last_i = 0;  // last_i points to the rightmost node in the tree\n  uint32_t last = 0;     // last is the max value at node last_i\n  int k;\n  std::vector<Peak> &peaks = peaks_on_diff_ref_seqs_[ref_id];\n  std::vector<uint32_t> &extras = tree_extras_on_diff_ref_seqs_[ref_id];\n  if (peaks.size() == 0) {\n    max_level = -1;\n  }\n\n  for (i = 0; i < peaks.size(); i += 2) {\n    last_i = i;\n    // last = mappings[i].max = mappings[i].en; // leaves (i.e. at level 0)\n    last = extras[i] =\n        peaks[i].start_position + peaks[i].length;  // leaves (i.e. at level 0)\n  }\n\n  for (k = 1; 1LL << k <= (int64_t)peaks.size();\n       ++k) {  // process internal nodes in the bottom-up order\n    size_t x = 1LL << (k - 1);\n    size_t i0 = (x << 1) - 1;\n    size_t step = x << 2;  // i0 is the first node\n    for (i = i0; i < peaks.size();\n         i += step) {               // traverse all nodes at level k\n      uint32_t el = extras[i - x];  // max value of the left child\n      uint32_t er =\n          i + x < peaks.size() ? extras[i + x] : last;  // of the right child\n      uint32_t e = peaks[i].start_position + peaks[i].length;\n      e = e > el ? e : el;\n      e = e > er ? e : er;\n      extras[i] = e;  // set the max value for node i\n    }\n    last_i =\n        last_i >> k & 1\n            ? 
last_i - x\n            : last_i +\n                  x;  // last_i now points to the parent of the original last_i\n    if (last_i < peaks.size() &&\n        extras[last_i] > last)  // update last accordingly\n      last = extras[last_i];\n  }\n\n  max_level = k - 1;\n  tree_info_on_diff_ref_seqs_.emplace_back(max_level, peaks.size());\n}\n\nuint32_t FeatureBarcodeMatrix::CallPeaks(\n    uint16_t coverage_threshold, uint32_t num_reference_sequences,\n    const SequenceBatch &reference,\n    const std::vector<std::vector<PairedEndMappingWithBarcode>> &mappings) {\n  double real_start_time = GetRealTime();\n  // std::vector<std::vector<PairedEndMappingWithBarcode>> &mappings =\n  //    allocate_multi_mappings_\n  //        ? allocated_mappings_on_diff_ref_seqs_\n  //        : (remove_pcr_duplicates_ ? deduped_mappings_on_diff_ref_seqs_\n  //                                  : mappings_on_diff_ref_seqs_);\n  // Build pileup.\n  for (uint32_t ri = 0; ri < num_reference_sequences; ++ri) {\n    pileup_on_diff_ref_seqs_.emplace_back(std::vector<uint16_t>());\n    pileup_on_diff_ref_seqs_[ri].assign(reference.GetSequenceLengthAt(ri), 0);\n    for (size_t mi = 0; mi < mappings[ri].size(); ++mi) {\n      for (uint16_t pi = 0; pi < mappings[ri][mi].fragment_length_; ++pi) {\n        ++pileup_on_diff_ref_seqs_[ri]\n                                  [mappings[ri][mi].GetStartPosition() + pi];\n      }\n    }\n  }\n  std::cerr << \"Built pileup in \" << GetRealTime() - real_start_time << \"s.\\n\";\n\n  real_start_time = GetRealTime();\n  // Call and save peaks.\n  tree_extras_on_diff_ref_seqs_.clear();\n  tree_info_on_diff_ref_seqs_.clear();\n  tree_extras_on_diff_ref_seqs_.reserve(num_reference_sequences);\n  tree_info_on_diff_ref_seqs_.reserve(num_reference_sequences);\n  uint32_t peak_count = 0;\n  for (uint32_t ri = 0; ri < num_reference_sequences; ++ri) {\n    tree_extras_on_diff_ref_seqs_.emplace_back(std::vector<uint32_t>());\n    
tree_extras_on_diff_ref_seqs_[ri].reserve(\n        reference.GetSequenceLengthAt(ri) / 100);\n    peaks_on_diff_ref_seqs_.emplace_back(std::vector<Peak>());\n    uint32_t peak_start_position = 0;\n    uint16_t peak_length = 0;\n    for (size_t pi = 0; pi < reference.GetSequenceLengthAt(ri); ++pi) {\n      if (pileup_on_diff_ref_seqs_[ri][pi] >= coverage_threshold) {\n        if (peak_length == 0) {  // start a new peak\n          peak_start_position = pi;\n        }\n        ++peak_length;               // extend the peak\n      } else if (peak_length > 0) {  // save the previous peak\n        // TODO(Haowen): improve peak calling\n        peaks_on_diff_ref_seqs_[ri].emplace_back(\n            Peak{peak_start_position, peak_length, peak_count});\n        tree_extras_on_diff_ref_seqs_[ri].emplace_back(0);\n        feature_barcode_matrix_writer_.OutputPeaks(peak_start_position,\n                                                   peak_length, ri, reference);\n        ++peak_count;\n        peak_length = 0;\n      }\n    }\n    BuildAugmentedTreeForPeaks(ri);\n  }\n  std::cerr << \"Call peaks and built peak augmented tree in \"\n            << GetRealTime() - real_start_time << \"s.\\n\";\n  // Output feature matrix\n  return peak_count;\n}\n\nvoid FeatureBarcodeMatrix::OutputFeatureMatrix(\n    uint32_t num_sequences, const SequenceBatch &reference,\n    const std::vector<std::vector<PairedEndMappingWithBarcode>> &mappings,\n    const std::string &matrix_output_prefix) {\n  feature_barcode_matrix_writer_.InitializeMatrixOutput(matrix_output_prefix);\n\n  uint32_t num_peaks = 0;\n  if (cell_by_bin_) {\n    feature_barcode_matrix_writer_.OutputPeaks(bin_size_, num_sequences,\n                                               reference);\n    for (uint32_t i = 0; i < num_sequences; ++i) {\n      uint32_t ref_seq_length = reference.GetSequenceLengthAt(i);\n      num_peaks += ref_seq_length / bin_size_;\n      if (ref_seq_length % bin_size_ != 0) {\n        ++num_peaks;\n    
  }\n    }\n  } else {\n    num_peaks = CallPeaks(depth_cutoff_to_call_peak_, num_sequences, reference,\n                          mappings);\n  }\n\n  // std::vector<std::vector<PairedEndMappingWithBarcode>> &mappings =\n  //    allocate_multi_mappings_\n  //        ? allocated_mappings_on_diff_ref_seqs_\n  //        : (remove_pcr_duplicates_ ? deduped_mappings_on_diff_ref_seqs_\n  //                                  : mappings_on_diff_ref_seqs_);\n  double real_start_time = GetRealTime();\n  // First pass to index barcodes\n  uint32_t barcode_index = 0;\n  for (uint32_t rid = 0; rid < num_sequences; ++rid) {\n    for (uint32_t mi = 0; mi < mappings[rid].size(); ++mi) {\n      uint64_t barcode_key = mappings[rid][mi].cell_barcode_;\n      khiter_t barcode_index_table_iterator =\n          kh_get(k64_seq, barcode_index_table_, barcode_key);\n      if (barcode_index_table_iterator == kh_end(barcode_index_table_)) {\n        int khash_return_code;\n        barcode_index_table_iterator = kh_put(k64_seq, barcode_index_table_,\n                                              barcode_key, &khash_return_code);\n        assert(khash_return_code != -1 && khash_return_code != 0);\n        kh_value(barcode_index_table_, barcode_index_table_iterator) =\n            barcode_index;\n        ++barcode_index;\n        feature_barcode_matrix_writer_.AppendBarcodeOutput(barcode_key);\n      }\n    }\n  }\n  std::cerr << \"Index and output barcodes in \"\n            << GetRealTime() - real_start_time << \"s.\\n\";\n\n  real_start_time = GetRealTime();\n  // Second pass to generate matrix\n  khash_t(kmatrix) *matrix = kh_init(kmatrix);\n  std::vector<uint32_t> overlapped_peak_indices;\n  for (uint32_t rid = 0; rid < num_sequences; ++rid) {\n    for (uint32_t mi = 0; mi < mappings[rid].size(); ++mi) {\n      uint64_t barcode_key = mappings[rid][mi].cell_barcode_;\n      khiter_t barcode_index_table_iterator =\n          kh_get(k64_seq, barcode_index_table_, barcode_key);\n      uint64_t 
barcode_index =\n          kh_value(barcode_index_table_, barcode_index_table_iterator);\n      overlapped_peak_indices.clear();\n      if (cell_by_bin_) {\n        GetNumOverlappedBins(rid, mappings[rid][mi].GetStartPosition(),\n                             mappings[rid][mi].GetEndPosition() -\n                                 mappings[rid][mi].GetStartPosition(),\n                             num_sequences, reference, overlapped_peak_indices);\n      } else {\n        GetNumOverlappedPeaks(rid, mappings[rid][mi], overlapped_peak_indices);\n      }\n      size_t num_overlapped_peaks = overlapped_peak_indices.size();\n      for (size_t pi = 0; pi < num_overlapped_peaks; ++pi) {\n        uint32_t peak_index = overlapped_peak_indices[pi];\n        uint64_t entry_index = (barcode_index << 32) | peak_index;\n        khiter_t matrix_iterator = kh_get(kmatrix, matrix, entry_index);\n        if (matrix_iterator == kh_end(matrix)) {\n          int khash_return_code;\n          matrix_iterator =\n              kh_put(kmatrix, matrix, entry_index, &khash_return_code);\n          assert(khash_return_code != -1 && khash_return_code != 0);\n          kh_value(matrix, matrix_iterator) = 1;\n        } else {\n          kh_value(matrix, matrix_iterator) += 1;\n        }\n      }\n    }\n  }\n  std::cerr << \"Generate feature matrix in \" << GetRealTime() - real_start_time\n            << \"s.\\n\";\n  // Output matrix\n  real_start_time = GetRealTime();\n  feature_barcode_matrix_writer_.WriteMatrixOutputHead(\n      num_peaks, kh_size(barcode_index_table_), kh_size(matrix));\n  uint64_t key;\n  uint32_t value;\n  std::vector<std::pair<uint64_t, uint32_t>> feature_matrix;\n  feature_matrix.reserve(kh_size(matrix));\n  // kh_foreach(matrix, key, value,\n  // output_tools_->AppendMatrixOutput((uint32_t)key, (uint32_t)(key >> 32),\n  // value));\n  kh_foreach(matrix, key, value, feature_matrix.emplace_back(key, value));\n  kh_destroy(kmatrix, matrix);\n  
std::sort(feature_matrix.begin(), feature_matrix.end());\n  for (size_t i = 0; i < feature_matrix.size(); ++i) {\n    feature_barcode_matrix_writer_.AppendMatrixOutput(\n        (uint32_t)feature_matrix[i].first,\n        (uint32_t)(feature_matrix[i].first >> 32), feature_matrix[i].second);\n  }\n\n  feature_barcode_matrix_writer_.FinalizeMatrixOutput();\n  std::cerr << \"Output feature matrix in \" << GetRealTime() - real_start_time\n            << \"s.\\n\";\n}\n\nvoid FeatureBarcodeMatrix::GetNumOverlappedBins(\n    uint32_t rid, uint32_t start_position, uint16_t mapping_length,\n    uint32_t num_sequences, const SequenceBatch &reference,\n    std::vector<uint32_t> &overlapped_peak_indices) {\n  uint32_t bin_index = 0;\n  for (uint32_t i = 0; i < rid; ++i) {\n    uint32_t ref_seq_length = reference.GetSequenceLengthAt(i);\n    bin_index += ref_seq_length / bin_size_;\n    if (ref_seq_length % bin_size_ != 0) {\n      ++bin_index;\n    }\n  }\n  bin_index += start_position / bin_size_;\n  overlapped_peak_indices.emplace_back(bin_index);\n  uint32_t max_num_overlapped_bins = mapping_length / bin_size_ + 2;\n  for (uint32_t i = 0; i < max_num_overlapped_bins; ++i) {\n    if (start_position + mapping_length - 1 >=\n        (bin_index + 1 + i) * bin_size_) {\n      overlapped_peak_indices.emplace_back(bin_index + 1 + i);\n    }\n  }\n}\n\nuint32_t FeatureBarcodeMatrix::GetNumOverlappedPeaks(\n    uint32_t ref_id, const PairedEndMappingWithBarcode &mapping,\n    std::vector<uint32_t> &overlapped_peak_indices) {\n  int t = 0;\n  StackCell stack[64];\n  // out.clear();\n  overlapped_peak_indices.clear();\n  int num_overlapped_peaks = 0;\n  int max_level = tree_info_on_diff_ref_seqs_[ref_id].first;\n  uint32_t num_tree_nodes = tree_info_on_diff_ref_seqs_[ref_id].second;\n  std::vector<Peak> &peaks = peaks_on_diff_ref_seqs_[ref_id];\n  std::vector<uint32_t> &extras = tree_extras_on_diff_ref_seqs_[ref_id];\n  // uint32_t interval_start = mapping.fragment_start_position;\n  
uint32_t interval_start =\n      mapping.GetStartPosition() > (uint32_t)overlap_distance_\n          ? mapping.GetStartPosition() - overlap_distance_\n          : 0;\n  uint32_t interval_end =\n      mapping.GetEndPosition() + (uint32_t)overlap_distance_;\n  stack[t++] = StackCell(max_level, (1LL << max_level) - 1,\n                         0);  // push the root; this is a top down traversal\n  while (\n      t) {  // the following guarantees that numbers in out[] are always sorted\n    StackCell z = stack[--t];\n    if (z.k <=\n        3) {  // we are in a small subtree; traverse every node in this subtree\n      size_t i, i0 = z.x >> z.k << z.k, i1 = i0 + (1LL << (z.k + 1)) - 1;\n      if (i1 >= num_tree_nodes) {\n        i1 = num_tree_nodes;\n      }\n      for (i = i0; i < i1 && peaks[i].start_position < interval_end; ++i) {\n        if (interval_start <\n            peaks[i].start_position +\n                peaks[i].length) {  // if overlap, append to out[]\n          // out.push_back(i);\n          overlapped_peak_indices.emplace_back(peaks[i].index);\n          ++num_overlapped_peaks;\n        }\n      }\n    } else if (z.w == 0) {  // if left child not processed\n      size_t y =\n          z.x - (1LL << (z.k - 1));  // the left child of z.x; NB: y may be out\n                                     // of range (i.e. 
y>=a.size())\n      stack[t++] = StackCell(\n          z.k, z.x,\n          1);  // re-add node z.x, but mark the left child having been processed\n      if (y >= num_tree_nodes ||\n          extras[y] > interval_start)  // push the left child if y is out of\n                                       // range or may overlap with the query\n        stack[t++] = StackCell(z.k - 1, y, 0);\n    } else if (z.x < num_tree_nodes &&\n               peaks[z.x].start_position <\n                   interval_end) {  // need to push the right child\n      if (interval_start < peaks[z.x].start_position + peaks[z.x].length) {\n        // out.push_back(z.x); // test if z.x overlaps the query; if yes, append\n        // to out[]\n        overlapped_peak_indices.emplace_back(peaks[z.x].index);\n        ++num_overlapped_peaks;\n      }\n      stack[t++] = StackCell(z.k - 1, z.x + (1LL << (z.k - 1)),\n                             0);  // push the right child\n    }\n  }\n  return num_overlapped_peaks;\n}\n\n}  // namespace chromap\n"
  },
  {
    "path": "src/feature_barcode_matrix.h",
    "content": "#ifndef FEATURE_BARCODE_MATRIX_H_\n#define FEATURE_BARCODE_MATRIX_H_\n\n#include <assert.h>\n\n#include <algorithm>\n#include <cinttypes>\n#include <cstring>\n#include <functional>\n#include <iostream>\n#include <string>\n#include <vector>\n\n#include \"bed_mapping.h\"\n#include \"feature_barcode_matrix_writer.h\"\n#include \"khash.h\"\n#include \"utils.h\"\n\nnamespace chromap {\n\nstruct Peak {\n  uint32_t start_position;\n  uint16_t length;\n  uint32_t index;\n};\n\nclass FeatureBarcodeMatrix {\n public:\n  FeatureBarcodeMatrix(bool cell_by_bin, int bin_size, int overlap_distance,\n                       uint16_t depth_cutoff_to_call_peak)\n      : cell_by_bin_(cell_by_bin),\n        bin_size_(bin_size),\n        overlap_distance_(overlap_distance),\n        depth_cutoff_to_call_peak_(depth_cutoff_to_call_peak) {\n    barcode_index_table_ = kh_init(k64_seq);\n  }\n\n  ~FeatureBarcodeMatrix() {\n    if (barcode_index_table_ != NULL) {\n      kh_destroy(k64_seq, barcode_index_table_);\n    }\n  }\n\n  void OutputFeatureMatrix(\n      uint32_t num_sequences, const SequenceBatch &reference,\n      const std::vector<std::vector<PairedEndMappingWithBarcode>> &mappings,\n      const std::string &matrix_output_prefix);\n\n private:\n  void BuildAugmentedTreeForPeaks(uint32_t ref_id);\n\n  uint32_t GetNumOverlappedPeaks(\n      uint32_t ref_id, const PairedEndMappingWithBarcode &mapping,\n      std::vector<uint32_t> &overlapped_peak_indices);\n\n  void GetNumOverlappedBins(uint32_t rid, uint32_t start_position,\n                            uint16_t mapping_length, uint32_t num_sequences,\n                            const SequenceBatch &reference,\n                            std::vector<uint32_t> &overlapped_peak_indices);\n\n  uint32_t CallPeaks(\n      uint16_t coverage_threshold, uint32_t num_reference_sequences,\n      const SequenceBatch &reference,\n      const std::vector<std::vector<PairedEndMappingWithBarcode>> &mappings);\n\n  const bool 
cell_by_bin_;\n  const int bin_size_;\n  const int overlap_distance_;\n  const uint16_t depth_cutoff_to_call_peak_;\n\n  khash_t(k64_seq) * barcode_index_table_;\n  // (max_level, # nodes)\n  std::vector<std::pair<int, uint32_t>> tree_info_on_diff_ref_seqs_;\n\n  // Max peak end position within the subtree rooted at each node.\n  std::vector<std::vector<uint32_t>> tree_extras_on_diff_ref_seqs_;\n\n  // For peak calling.\n  std::vector<std::vector<uint16_t>> pileup_on_diff_ref_seqs_;\n  std::vector<std::vector<Peak>> peaks_on_diff_ref_seqs_;\n\n  FeatureBarcodeMatrixWriter feature_barcode_matrix_writer_;\n};\n\n}  // namespace chromap\n\n#endif  // FEATURE_BARCODE_MATRIX_H_\n"
  },
  {
    "path": "src/feature_barcode_matrix_writer.h",
    "content": "#ifndef FEATUREBARCODEMATRIXWRITER_H_\n#define FEATUREBARCODEMATRIXWRITER_H_\n\n#include <assert.h>\n\n#include <cinttypes>\n#include <cstring>\n#include <functional>\n#include <iostream>\n#include <string>\n#include <vector>\n\n#include \"barcode_translator.h\"\n#include \"sequence_batch.h\"\n\nnamespace chromap {\n\n// The code here is not working properly since the barcode length is not set.\n// But this feature is not used in the realse for now so this is fine.\nclass FeatureBarcodeMatrixWriter {\n public:\n  FeatureBarcodeMatrixWriter() {}\n  ~FeatureBarcodeMatrixWriter() {}\n\n  inline void InitializeMatrixOutput(const std::string &matrix_output_prefix) {\n    matrix_output_prefix_ = matrix_output_prefix;\n    matrix_output_file_ =\n        fopen((matrix_output_prefix_ + \"_matrix.mtx\").c_str(), \"w\");\n    assert(matrix_output_file_ != nullptr);\n    peak_output_file_ =\n        fopen((matrix_output_prefix_ + \"_peaks.bed\").c_str(), \"w\");\n    assert(peak_output_file_ != nullptr);\n    barcode_output_file_ =\n        fopen((matrix_output_prefix_ + \"_barcode.tsv\").c_str(), \"w\");\n    assert(barcode_output_file_ != nullptr);\n  }\n\n  void OutputPeaks(uint32_t bin_size, uint32_t num_sequences,\n                   const SequenceBatch &reference) {\n    for (uint32_t rid = 0; rid < num_sequences; ++rid) {\n      uint32_t sequence_length = reference.GetSequenceLengthAt(rid);\n      const char *sequence_name = reference.GetSequenceNameAt(rid);\n      for (uint32_t position = 0; position < sequence_length;\n           position += bin_size) {\n        fprintf(peak_output_file_, \"%s\\t%u\\t%u\\n\", sequence_name, position + 1,\n                position + bin_size);\n      }\n    }\n  }\n\n  void OutputPeaks(uint32_t peak_start_position, uint16_t peak_length,\n                   uint32_t rid, const SequenceBatch &reference) {\n    const char *sequence_name = reference.GetSequenceNameAt(rid);\n    fprintf(peak_output_file_, 
\"%s\\t%u\\t%u\\n\", sequence_name,\n            peak_start_position + 1, peak_start_position + peak_length);\n  }\n\n  void AppendBarcodeOutput(uint64_t barcode_key) {\n    fprintf(barcode_output_file_, \"%s-1\\n\",\n            barcode_translator_.Translate(barcode_key, cell_barcode_length_)\n                .data());\n  }\n\n  void WriteMatrixOutputHead(uint64_t num_peaks, uint64_t num_barcodes,\n                             uint64_t num_lines) {\n    fprintf(matrix_output_file_, \"%\" PRIu64 \"\\t%\" PRIu64 \"\\t%\" PRIu64 \"\\n\",\n            num_peaks, num_barcodes, num_lines);\n  }\n\n  void AppendMatrixOutput(uint32_t peak_index, uint32_t barcode_index,\n                          uint32_t num_mappings) {\n    fprintf(matrix_output_file_, \"%u\\t%u\\t%u\\n\", peak_index, barcode_index,\n            num_mappings);\n  }\n\n  inline void FinalizeMatrixOutput() {\n    fclose(matrix_output_file_);\n    fclose(peak_output_file_);\n    fclose(barcode_output_file_);\n  }\n\n  inline void SetBarcodeTranslateTable(const std::string &file) {\n    barcode_translator_.SetTranslateTable(file);\n  }\n\n  inline void SetBarcodeLength(uint32_t cell_barcode_length) {\n    cell_barcode_length_ = cell_barcode_length;\n  }\n\n protected:\n  uint32_t cell_barcode_length_ = 16;\n  std::string matrix_output_prefix_;\n  FILE *peak_output_file_ = nullptr;\n  FILE *barcode_output_file_ = nullptr;\n  FILE *matrix_output_file_ = nullptr;\n  BarcodeTranslator barcode_translator_;\n};\n\n}  // namespace chromap\n\n#endif  // FEATUREBARCODEMATRIXWRITER_H_\n"
  },
  {
    "path": "src/hit_utils.h",
    "content": "#ifndef HIT_UTILS_H_\n#define HIT_UTILS_H_\n\n#include \"strand.h\"\n\nnamespace chromap {\n\ninline static uint32_t HitToSequenceIndex(uint64_t hit) { return (hit >> 33); }\n\ninline static uint32_t HitToSequencePosition(uint64_t hit) {\n  return (hit >> 1);\n}\n\ninline static Strand HitToStrand(uint64_t hit) {\n  if ((hit & 1) == 0) {\n    return kPositive;\n  }\n  return kNegative;\n}\n\ninline static bool AreTwoHitsOnTheSameStrand(uint64_t hit1, uint64_t hit2) {\n  return ((hit1 & 1) == (hit2 & 1));\n}\n\n}  // namespace chromap\n\n#endif  // HIT_UTILS_H_\n"
  },
  {
    "path": "src/index.cc",
    "content": "#include \"index.h\"\n\n#include <assert.h>\n\n#include <algorithm>\n#include <iostream>\n\n#include \"minimizer_generator.h\"\n\nnamespace chromap {\n\nvoid Index::Construct(uint32_t num_sequences, const SequenceBatch &reference) {\n  const double real_start_time = GetRealTime();\n\n  std::vector<Minimizer> minimizers;\n  minimizers.reserve(reference.GetNumBases() / window_size_ * 2);\n  std::cerr << \"Collecting minimizers.\\n\";\n  MinimizerGenerator minimizer_generator(kmer_size_, window_size_);\n  for (uint32_t sequence_index = 0; sequence_index < num_sequences;\n       ++sequence_index) {\n    minimizer_generator.GenerateMinimizers(reference, sequence_index,\n                                           minimizers);\n  }\n  std::cerr << \"Collected \" << minimizers.size() << \" minimizers.\\n\";\n  std::cerr << \"Sorting minimizers.\\n\";\n  std::stable_sort(minimizers.begin(), minimizers.end());\n  std::cerr << \"Sorted all minimizers.\\n\";\n  const size_t num_minimizers = minimizers.size();\n  assert(num_minimizers > 0);\n  // TODO: check this assert!\n  // Here I make sure the # minimizers is less than the limit of signed int32,\n  // so that I can use int to store position later.\n  assert(num_minimizers <= static_cast<size_t>(INT_MAX));\n\n  occurrence_table_.reserve(num_minimizers);\n  uint64_t previous_lookup_hash =\n      GenerateHashInLookupTable(minimizers[0].GetHash());\n  uint32_t num_previous_minimizer_occurrences = 0;\n  uint64_t num_nonsingletons = 0;\n  uint32_t num_singletons = 0;\n  for (size_t mi = 0; mi <= num_minimizers; ++mi) {\n    const bool is_last_iteration = mi == num_minimizers;\n    const uint64_t current_lookup_hash =\n        is_last_iteration ? 
previous_lookup_hash + 1\n                          : GenerateHashInLookupTable(minimizers[mi].GetHash());\n\n    if (current_lookup_hash != previous_lookup_hash) {\n      int khash_return_code = 0;\n      khiter_t khash_iterator =\n          kh_put(k64, lookup_table_, previous_lookup_hash, &khash_return_code);\n      assert(khash_return_code != -1 && khash_return_code != 0);\n\n      if (num_previous_minimizer_occurrences == 1) {\n        // We set the lowest bit of the key value to 1 if the minimizer only\n        // occurs once. And the occurrence is directly saved in the lookup\n        // table.\n        kh_key(lookup_table_, khash_iterator) |= 1;\n        kh_value(lookup_table_, khash_iterator) = occurrence_table_.back();\n        occurrence_table_.pop_back();\n        ++num_singletons;\n      } else {\n        kh_value(lookup_table_, khash_iterator) =\n            GenerateEntryValueInLookupTable(num_nonsingletons,\n                                            num_previous_minimizer_occurrences);\n        num_nonsingletons += num_previous_minimizer_occurrences;\n      }\n      num_previous_minimizer_occurrences = 1;\n    } else {\n      num_previous_minimizer_occurrences++;\n    }\n\n    if (is_last_iteration) {\n      break;\n    }\n\n    occurrence_table_.push_back(minimizers[mi].GetHit());\n    previous_lookup_hash = current_lookup_hash;\n  }\n  assert(num_nonsingletons + num_singletons == num_minimizers);\n\n  std::cerr << \"Kmer size: \" << kmer_size_ << \", window size: \" << window_size_\n            << \".\\n\";\n  std::cerr << \"Lookup table size: \" << kh_size(lookup_table_)\n            << \", # buckets: \" << kh_n_buckets(lookup_table_)\n            << \", occurrence table size: \" << occurrence_table_.size()\n            << \", # singletons: \" << num_singletons << \".\\n\";\n  std::cerr << \"Built index successfully in \" << GetRealTime() - real_start_time\n            << \"s.\\n\";\n}\n\nvoid Index::Save() const {\n  const double real_start_time 
= GetRealTime();\n  FILE *index_file = fopen(index_file_path_.c_str(), \"wb\");\n  assert(index_file != nullptr);\n\n  uint64_t num_bytes = 0;\n  int err = 0;\n\n  err = fwrite(&kmer_size_, sizeof(int), 1, index_file);\n  num_bytes += sizeof(int);\n  assert(err != 0);\n\n  err = fwrite(&window_size_, sizeof(int), 1, index_file);\n  num_bytes += sizeof(int);\n  assert(err != 0);\n\n  const uint32_t lookup_table_size = kh_size(lookup_table_);\n  err = fwrite(&lookup_table_size, sizeof(uint32_t), 1, index_file);\n  num_bytes += sizeof(uint32_t);\n  assert(err != 0);\n\n  kh_save(k64, lookup_table_, index_file);\n  num_bytes += sizeof(uint64_t) * 2 * lookup_table_size;\n\n  const uint32_t occurrence_table_size = occurrence_table_.size();\n  err = fwrite(&occurrence_table_size, sizeof(uint32_t), 1, index_file);\n  num_bytes += sizeof(uint32_t);\n  assert(err != 0);\n\n  if (occurrence_table_size > 0) {\n    err = fwrite(occurrence_table_.data(), sizeof(uint64_t),\n                 occurrence_table_size, index_file);\n    num_bytes += sizeof(uint64_t) * occurrence_table_size;\n    assert(err != 0);\n  }\n\n  fclose(index_file);\n  // std::cerr << \"Index size: \" << num_bytes / (1024.0 * 1024 * 1024) << \"GB,\n  std::cerr << \"Saved in \" << GetRealTime() - real_start_time << \"s.\\n\";\n}\n\nvoid Index::Load() {\n  const double real_start_time = GetRealTime();\n  FILE *index_file = fopen(index_file_path_.c_str(), \"rb\");\n  assert(index_file != nullptr);\n\n  int err = 0;\n  err = fread(&kmer_size_, sizeof(int), 1, index_file);\n  assert(err != 0);\n\n  err = fread(&window_size_, sizeof(int), 1, index_file);\n  assert(err != 0);\n\n  uint32_t lookup_table_size = 0;\n  err = fread(&lookup_table_size, sizeof(uint32_t), 1, index_file);\n  assert(err != 0);\n\n  kh_load(k64, lookup_table_, index_file);\n\n  uint32_t occurrence_table_size = 0;\n  err = fread(&occurrence_table_size, sizeof(uint32_t), 1, index_file);\n  assert(err != 0);\n\n  if (occurrence_table_size > 0) 
{\n    occurrence_table_.resize(occurrence_table_size);\n    err = fread(occurrence_table_.data(), sizeof(uint64_t),\n                occurrence_table_size, index_file);\n    assert(err != 0);\n  }\n\n  fclose(index_file);\n\n  std::cerr << \"Kmer size: \" << kmer_size_ << \", window size: \" << window_size_\n            << \".\\n\";\n  std::cerr << \"Lookup table size: \" << kh_size(lookup_table_)\n            << \", occurrence table size: \" << occurrence_table_.size() << \".\\n\";\n  std::cerr << \"Loaded index successfully in \"\n            << GetRealTime() - real_start_time << \"s.\\n\";\n}\n\nvoid Index::Statistics(uint32_t num_sequences,\n                       const SequenceBatch &reference) const {\n  double real_start_time = GetRealTime();\n  int n = 0, n1 = 0;\n  uint32_t i;\n  uint64_t sum = 0, len = 0;\n  fprintf(stderr, \"[M::%s] kmer size: %d; skip: %d; #seq: %d\\n\", __func__,\n          kmer_size_, window_size_, num_sequences);\n  for (i = 0; i < num_sequences; ++i) {\n    len += reference.GetSequenceLengthAt(i);\n  }\n  assert(len == reference.GetNumBases());\n  if (lookup_table_) {\n    n += kh_size(lookup_table_);\n  }\n  for (khint_t k = 0; k < kh_end(lookup_table_); ++k) {\n    if (kh_exist(lookup_table_, k)) {\n      sum +=\n          kh_key(lookup_table_, k) & 1 ? 
1 : (uint32_t)kh_val(lookup_table_, k);\n      if (kh_key(lookup_table_, k) & 1) ++n1;\n    }\n  }\n  fprintf(stderr,\n          \"[M::%s::%.3f] distinct minimizers: %d (%.2f%% are singletons); \"\n          \"average occurrences: %.3lf; average spacing: %.3lf\\n\",\n          __func__, GetRealTime() - real_start_time, n, 100.0 * n1 / n,\n          (double)sum / n, (double)len / sum);\n}\n\nvoid Index::CheckIndex(uint32_t num_sequences,\n                       const SequenceBatch &reference) const {\n  std::vector<Minimizer> minimizers;\n  minimizers.reserve(reference.GetNumBases() / window_size_ * 2);\n  MinimizerGenerator minimizer_generator(kmer_size_, window_size_);\n  for (uint32_t sequence_index = 0; sequence_index < num_sequences;\n       ++sequence_index) {\n    minimizer_generator.GenerateMinimizers(reference, sequence_index,\n                                           minimizers);\n  }\n  std::cerr << \"Collected \" << minimizers.size() << \" minimizers.\\n\";\n  std::stable_sort(minimizers.begin(), minimizers.end());\n  std::cerr << \"Sorted minimizers.\\n\";\n\n  uint32_t count = 0;\n  for (uint32_t i = 0; i < minimizers.size(); ++i) {\n    khiter_t khash_iterator = kh_get(\n        k64, lookup_table_, GenerateHashInLookupTable(minimizers[i].GetHash()));\n    assert(khash_iterator != kh_end(lookup_table_));\n    uint64_t key = kh_key(lookup_table_, khash_iterator);\n    uint64_t value = kh_value(lookup_table_, khash_iterator);\n    if (key & 1) {  // singleton\n      assert(minimizers[i].GetHit() == value);\n      count = 0;\n    } else {\n      uint32_t offset = GenerateOffsetInOccurrenceTable(value);\n      uint32_t num_occ = GenerateNumOccurrenceInOccurrenceTable(value);\n      uint64_t value_in_index = occurrence_table_[offset + count];\n      assert(value_in_index == minimizers[i].GetHit());\n      ++count;\n      if (count == num_occ) {\n        count = 0;\n      }\n    }\n  }\n}\n\nint Index::GenerateCandidatePositions(\n    const 
CandidatePositionGeneratingConfig &generating_config,\n    MappingMetadata &mapping_metadata) const {\n  const uint32_t num_minimizers = mapping_metadata.GetNumMinimizers();\n  const std::vector<Minimizer> &minimizers = mapping_metadata.minimizers_;\n\n  std::vector<std::vector<uint64_t>> positive_candidate_position_lists;\n  std::vector<std::vector<uint64_t>> negative_candidate_position_lists;\n  if (generating_config.UseHeapMerge()) {\n    for (uint32_t i = 0; i < num_minimizers; ++i) {\n      positive_candidate_position_lists.emplace_back(std::vector<uint64_t>());\n      negative_candidate_position_lists.emplace_back(std::vector<uint64_t>());\n    }\n  }\n  bool is_candidate_position_list_sorted = true;\n\n  mapping_metadata.positive_hits_.reserve(\n      generating_config.GetMaxSeedFrequency() * 2);\n  mapping_metadata.negative_hits_.reserve(\n      generating_config.GetMaxSeedFrequency() * 2);\n\n  RepetitiveSeedStats repetitive_seed_stats;\n  for (uint32_t mi = 0; mi < num_minimizers; ++mi) {\n    khiter_t khash_iterator =\n        kh_get(k64, lookup_table_,\n               GenerateHashInLookupTable(minimizers[mi].GetHash()));\n    if (khash_iterator == kh_end(lookup_table_)) {\n      // std::cerr << \"The minimizer is not in reference!\\n\";\n      continue;\n    }\n\n    std::vector<uint64_t> &positive_candidate_positions =\n        generating_config.UseHeapMerge() ? positive_candidate_position_lists[mi]\n                                         : mapping_metadata.positive_hits_;\n    std::vector<uint64_t> &negative_candidate_positions =\n        generating_config.UseHeapMerge() ? 
negative_candidate_position_lists[mi]\n                                         : mapping_metadata.negative_hits_;\n\n    const uint64_t lookup_key = kh_key(lookup_table_, khash_iterator);\n    const uint64_t lookup_value = kh_value(lookup_table_, khash_iterator);\n    const uint64_t read_hit = minimizers[mi].GetHit();\n    if (IsSingletonLookupKey(lookup_key)) {\n      const uint64_t candidate_position = GenerateCandidatePositionFromHits(\n          /*reference_hit=*/lookup_value, read_hit);\n      if (AreTwoHitsOnTheSameStrand(/*reference_hit=*/lookup_value, read_hit)) {\n        positive_candidate_positions.push_back(candidate_position);\n      } else {\n        negative_candidate_positions.push_back(candidate_position);\n      }\n      continue;\n    }\n\n    const uint32_t num_occurrences =\n        GenerateNumOccurrenceInOccurrenceTable(lookup_value);\n    if (!generating_config.IsFrequentSeed(num_occurrences)) {\n      const uint32_t read_position = HitToSequencePosition(read_hit);\n      const uint32_t occ_offset = GenerateOffsetInOccurrenceTable(lookup_value);\n      for (uint32_t oi = 0; oi < num_occurrences; ++oi) {\n        const uint64_t reference_hit = occurrence_table_[occ_offset + oi];\n        const uint64_t candidate_position =\n            GenerateCandidatePositionFromHits(reference_hit, read_hit);\n        if (AreTwoHitsOnTheSameStrand(reference_hit, read_hit)) {\n          const uint32_t reference_position =\n              HitToSequencePosition(reference_hit);\n          if (reference_position < read_position) {\n            is_candidate_position_list_sorted = false;\n          }\n          positive_candidate_positions.push_back(candidate_position);\n        } else {\n          negative_candidate_positions.push_back(candidate_position);\n        }\n      }\n    }\n\n    if (generating_config.IsRepetitiveSeed(num_occurrences)) {\n      const uint32_t read_position = HitToSequencePosition(read_hit);\n      UpdateRepetitiveSeedStats(read_position, 
repetitive_seed_stats);\n    }\n  }\n\n  if (generating_config.UseHeapMerge()) {\n    // TODO: try to remove this sorting.\n    if (!is_candidate_position_list_sorted) {\n      for (uint32_t mi = 0; mi < num_minimizers; ++mi) {\n        std::sort(positive_candidate_position_lists[mi].begin(),\n                  positive_candidate_position_lists[mi].end());\n      }\n    }\n    HeapMergeCandidatePositionLists(positive_candidate_position_lists,\n                                    mapping_metadata.positive_hits_);\n    HeapMergeCandidatePositionLists(negative_candidate_position_lists,\n                                    mapping_metadata.negative_hits_);\n  } else {\n    std::sort(mapping_metadata.positive_hits_.begin(),\n              mapping_metadata.positive_hits_.end());\n    std::sort(mapping_metadata.negative_hits_.begin(),\n              mapping_metadata.negative_hits_.end());\n  }\n\n#ifdef LI_DEBUG\n  for (uint32_t mi = 0; mi < mapping_metadata.positive_hits_.size(); ++mi)\n    printf(\"+ %llu %d %d\\n\", mapping_metadata.positive_hits_[mi],\n           (int)(mapping_metadata.positive_hits_[mi] >> 32), (int)(mapping_metadata.positive_hits_[mi]));\n\n  for (uint32_t mi = 0; mi < mapping_metadata.negative_hits_.size(); ++mi)\n    printf(\"- %llu %d %d\\n\", mapping_metadata.negative_hits_[mi],\n           (int)(mapping_metadata.negative_hits_[mi] >> 32), (int)(mapping_metadata.negative_hits_[mi]));\n#endif\n\n  mapping_metadata.repetitive_seed_length_ =\n      repetitive_seed_stats.repetitive_seed_length;\n  return repetitive_seed_stats.repetitive_seed_count;\n}\n\nint Index::GenerateCandidatePositionsFromRepetitiveReadWithMateInfoOnOneStrand(\n    const Strand strand, uint32_t search_range,\n    int min_num_seeds_required_for_mapping, int max_seed_frequency0,\n    int error_threshold, const std::vector<Minimizer> &minimizers,\n    const std::vector<Candidate> &mate_candidates,\n    uint32_t &repetitive_seed_length,\n    std::vector<uint64_t> 
&candidate_positions) const {\n  const uint32_t mate_candidates_size = mate_candidates.size();\n  int max_minimizer_count = 0;\n  int best_candidate_num = 0;\n  for (uint32_t i = 0; i < mate_candidates_size; ++i) {\n    int count = mate_candidates[i].count;\n    if (count > max_minimizer_count) {\n      max_minimizer_count = count;\n      best_candidate_num = 1;\n    } else if (count == max_minimizer_count) {\n      ++best_candidate_num;\n    }\n  }\n\n  const bool mate_has_too_many_candidates =\n      best_candidate_num >= 300 ||\n      mate_candidates_size > static_cast<uint32_t>(max_seed_frequency0);\n  const bool mate_has_too_many_low_support_candidates =\n      max_minimizer_count <= min_num_seeds_required_for_mapping &&\n      best_candidate_num >= 200;\n  if (mate_has_too_many_candidates ||\n      mate_has_too_many_low_support_candidates) {\n    return -max_minimizer_count;\n  }\n\n  // TODO: reduce the search range based on the strand.\n  std::vector<std::pair<uint64_t, uint64_t>> boundaries;\n  boundaries.reserve(best_candidate_num);\n  for (uint32_t ci = 0; ci < mate_candidates_size; ++ci) {\n    if (mate_candidates[ci].count == max_minimizer_count) {\n      const uint64_t boundary_start =\n          (mate_candidates[ci].position < search_range)\n              ? 0\n              : (mate_candidates[ci].position - search_range);\n      const uint64_t boundary_end = mate_candidates[ci].position + search_range;\n      boundaries.emplace_back(boundary_start, boundary_end);\n    }\n  }\n\n  const uint32_t raw_boundary_size = boundaries.size();\n  if (raw_boundary_size == 0) {\n    return max_minimizer_count;\n  }\n\n  // Merge adjacent boundary point. 
Assume the candidates are sorted by\n  // coordinate, and thus boundaries are also sorted.\n  uint32_t boundary_size = 1;\n  for (uint32_t bi = 1; bi < raw_boundary_size; ++bi) {\n    if (boundaries[boundary_size - 1].second < boundaries[bi].first) {\n      boundaries[boundary_size] = boundaries[bi];\n      ++boundary_size;\n    } else {\n      boundaries[boundary_size - 1].second = boundaries[bi].second;\n    }\n  }\n  boundaries.resize(boundary_size);\n\n  RepetitiveSeedStats repetitive_seed_stats;\n  for (uint32_t mi = 0; mi < minimizers.size(); ++mi) {\n    khiter_t khash_iterator =\n        kh_get(k64, lookup_table_,\n               GenerateHashInLookupTable(minimizers[mi].GetHash()));\n    if (khash_iterator == kh_end(lookup_table_)) {\n      // std::cerr << \"The minimizer is not in reference!\\n\";\n      continue;\n    }\n\n    const uint64_t lookup_key = kh_key(lookup_table_, khash_iterator);\n    const uint64_t lookup_value = kh_value(lookup_table_, khash_iterator);\n    const uint64_t read_hit = minimizers[mi].GetHit();\n    const uint32_t read_position = HitToSequencePosition(read_hit);\n    if (IsSingletonLookupKey(lookup_key)) {\n      const uint64_t candidate_position =\n          GenerateCandidatePositionFromHits(lookup_value, read_hit);\n      const bool on_same_strand =\n          AreTwoHitsOnTheSameStrand(lookup_value, read_hit);\n      if ((on_same_strand && strand == kPositive) ||\n          (!on_same_strand && strand == kNegative)) {\n        candidate_positions.push_back(candidate_position);\n      }\n      continue;\n    }\n\n    const uint32_t offset = GenerateOffsetInOccurrenceTable(lookup_value);\n    const uint32_t num_occurrences =\n        GenerateNumOccurrenceInOccurrenceTable(lookup_value);\n    int32_t prev_l = 0;\n    for (uint32_t bi = 0; bi < boundary_size; ++bi) {\n      // Use binary search to locate the coordinate near mate position.\n      int32_t l = prev_l, m = 0, r = num_occurrences - 1;\n      uint64_t boundary = 
boundaries[bi].first;\n      while (l <= r) {\n        m = (l + r) / 2;\n        uint64_t candidate_position =\n            GenerateCandidatePositionFromOccurrenceTableEntry(\n                occurrence_table_[offset + m]);\n        if (candidate_position < boundary) {\n          l = m + 1;\n        } else if (candidate_position > boundary) {\n          r = m - 1;\n        } else {\n          break;\n        }\n      }\n      // For next boundary, we don't have to start from l=0.\n      prev_l = m;\n\n      for (uint32_t oi = m; oi < num_occurrences; ++oi) {\n        const uint64_t reference_hit = occurrence_table_[offset + oi];\n        if ((GenerateCandidatePositionFromOccurrenceTableEntry(reference_hit)) >\n            boundaries[bi].second) {\n          break;\n        }\n        const uint64_t candidate_position =\n            GenerateCandidatePositionFromHits(reference_hit, read_hit);\n        const bool on_same_strand =\n            AreTwoHitsOnTheSameStrand(reference_hit, read_hit);\n        if ((on_same_strand && strand == kPositive) ||\n            (!on_same_strand && strand == kNegative)) {\n          candidate_positions.push_back(candidate_position);\n        }\n      }\n    }\n\n    if (num_occurrences >= (uint32_t)max_seed_frequency0) {\n      UpdateRepetitiveSeedStats(read_position, repetitive_seed_stats);\n    }\n  }\n\n  std::sort(candidate_positions.begin(), candidate_positions.end());\n  repetitive_seed_length = repetitive_seed_stats.repetitive_seed_length;\n  return max_minimizer_count;\n}\n\nuint64_t Index::GenerateCandidatePositionFromHits(uint64_t reference_hit,\n                                                  uint64_t read_hit) const {\n  const uint32_t reference_position = HitToSequencePosition(reference_hit);\n  const uint32_t read_position = HitToSequencePosition(read_hit);\n  // For now we can't see the reference here. So we don't validate this\n  // candidate position. 
Instead, we do it later some time when we check the\n  // candidates.\n  const uint32_t reference_start_position =\n      AreTwoHitsOnTheSameStrand(reference_hit, read_hit)\n          ? reference_position - read_position\n          : reference_position + read_position - kmer_size_ + 1;\n  const uint64_t reference_id = HitToSequenceIndex(reference_hit);\n  return SequenceIndexAndPositionToCandidatePosition(reference_id,\n                                                     reference_start_position);\n}\n\nvoid Index::UpdateRepetitiveSeedStats(uint32_t read_position,\n                                      RepetitiveSeedStats &stats) const {\n  if (stats.previous_repetitive_seed_position > read_position) {\n    // First minimizer.\n    stats.repetitive_seed_length += kmer_size_;\n  } else {\n    if (read_position < stats.previous_repetitive_seed_position + kmer_size_ +\n                            window_size_ - 1) {\n      stats.repetitive_seed_length +=\n          read_position - stats.previous_repetitive_seed_position;\n    } else {\n      stats.repetitive_seed_length += kmer_size_;\n    }\n  }\n  stats.previous_repetitive_seed_position = read_position;\n  ++stats.repetitive_seed_count;\n}\n\n}  // namespace chromap\n"
  },
  {
    "path": "src/index.h",
    "content": "#ifndef INDEX_H_\n#define INDEX_H_\n\n#include <limits>\n#include <queue>\n#include <string>\n#include <vector>\n\n#include \"candidate_position_generating_config.h\"\n#include \"index_parameters.h\"\n#include \"index_utils.h\"\n#include \"mapping_metadata.h\"\n#include \"minimizer.h\"\n#include \"sequence_batch.h\"\n#include \"utils.h\"\n\nnamespace chromap {\n\nclass Index {\n public:\n  Index() = delete;\n\n  // For read mapping.\n  Index(const std::string &index_file_path)\n      : index_file_path_(index_file_path) {\n    lookup_table_ = kh_init(k64);\n  }\n\n  // For index construction.\n  Index(const IndexParameters &index_parameters)\n      : kmer_size_(index_parameters.kmer_size),\n        window_size_(index_parameters.window_size),\n        num_threads_(index_parameters.num_threads),\n        index_file_path_(index_parameters.index_output_file_path) {\n    lookup_table_ = kh_init(k64);\n  }\n\n  ~Index() { Destroy(); }\n\n  void Destroy() {\n    if (lookup_table_ != nullptr) {\n      kh_destroy(k64, lookup_table_);\n      lookup_table_ = nullptr;\n    }\n\n    std::vector<uint64_t>().swap(occurrence_table_);\n  }\n\n  void Construct(uint32_t num_sequences, const SequenceBatch &reference);\n\n  void Save() const;\n\n  void Load();\n\n  // Output index stats.\n  void Statistics(uint32_t num_sequences, const SequenceBatch &reference) const;\n\n  // Check the index for some reference genome. Only for debug.\n  void CheckIndex(uint32_t num_sequences, const SequenceBatch &reference) const;\n\n  // Return the number of repetitive seeds.\n  int GenerateCandidatePositions(\n      const CandidatePositionGeneratingConfig &generating_config,\n      MappingMetadata &mapping_metadata) const;\n\n  // Input a search range, for each best mate candidate, serach for candidate\n  // positions for the read. Return the minimizer count of the best candidate if\n  // it finishes normally. 
Or return a negative value if it stops early due to\n  // too many candidates with low minimizer count.\n  // 'strand' is the strand to generate (augment) candidates.\n  int GenerateCandidatePositionsFromRepetitiveReadWithMateInfoOnOneStrand(\n      const Strand strand, uint32_t search_range,\n      int min_num_seeds_required_for_mapping, int max_seed_frequency0,\n      int error_threshold, const std::vector<Minimizer> &minimizers,\n      const std::vector<Candidate> &mate_candidates,\n      uint32_t &repetitive_seed_length,\n      std::vector<uint64_t> &candidate_positions) const;\n\n  int GetKmerSize() const { return kmer_size_; }\n\n  int GetWindowSize() const { return window_size_; }\n\n  uint32_t GetLookupTableSize() const { return kh_size(lookup_table_); }\n\n private:\n  uint64_t GenerateCandidatePositionFromHits(uint64_t reference_hit,\n                                             uint64_t read_hit) const;\n\n  void UpdateRepetitiveSeedStats(uint32_t read_position,\n                                 RepetitiveSeedStats &stats) const;\n\n  int kmer_size_ = 0;\n  int window_size_ = 0;\n  // Number of threads to build the index, which is not used right now.\n  int num_threads_ = 1;\n  const std::string index_file_path_;\n  khash_t(k64) *lookup_table_ = nullptr;\n  std::vector<uint64_t> occurrence_table_;\n};\n\n}  // namespace chromap\n\n#endif  // INDEX_H_\n"
  },
  {
    "path": "src/index_parameters.h",
    "content": "#ifndef INDEX_PARAMETERS_H_\n#define INDEX_PARAMETERS_H_\n\nnamespace chromap {\n\nstruct IndexParameters {\n  int kmer_size = 17;\n  int window_size = 7;\n  int num_threads = 1;\n  std::string reference_file_path;\n  std::string index_output_file_path;\n};\n\n}  // namespace chromap\n\n#endif  // INDEX_PARAMETERS_H_\n"
  },
  {
    "path": "src/index_utils.h",
    "content": "#ifndef INDEX_UTILS_H_\n#define INDEX_UTILS_H_\n\n#include <stdint.h>\n\n#include \"khash.h\"\n\n// Note that the max kmer size is 28 and its hash value is always saved in the\n// lowest 56 bits of an unsigned 64-bit integer. When an element is inserted\n// into the hash table, its hash value is left shifted by 1 bit and the lowest\n// bit of the key value is set to 1 when the minimizer only occurs once. So\n// right shift by one bit is lossless and safe.\n#define KHashFunctionForIndex(a) ((a) >> 1)\n#define KHashEqForIndex(a, b) ((a) >> 1 == (b) >> 1)\nKHASH_INIT(/*name=*/k64, /*khkey_t=*/uint64_t, /*khval_t=*/uint64_t,\n           /*kh_is_map=*/1, /*__hash_func=*/KHashFunctionForIndex,\n           /*__hash_equal=*/KHashEqForIndex);\n\nnamespace chromap {\n\nstruct RepetitiveSeedStats {\n  uint32_t repetitive_seed_length = 0;\n  uint32_t previous_repetitive_seed_position =\n      std::numeric_limits<uint32_t>::max();\n  int repetitive_seed_count = 0;\n};\n\ninline static uint64_t GenerateHashInLookupTable(uint64_t minimizer_hash) {\n  return minimizer_hash << 1;\n}\n\ninline static uint64_t GenerateEntryValueInLookupTable(\n    uint64_t occurrence_table_offset, uint32_t num_occurrences) {\n  return (occurrence_table_offset << 32) | num_occurrences;\n}\n\ninline static uint32_t GenerateOffsetInOccurrenceTable(uint64_t lookup_value) {\n  return lookup_value >> 32;\n}\n\ninline static uint32_t GenerateNumOccurrenceInOccurrenceTable(\n    uint64_t lookup_table_entry_value) {\n  return static_cast<uint32_t>(lookup_table_entry_value);\n}\n\ninline static uint64_t SequenceIndexAndPositionToCandidatePosition(\n    uint64_t sequence_id, uint32_t sequence_position) {\n  return (sequence_id << 32) | sequence_position;\n}\n\ninline static uint64_t GenerateCandidatePositionFromOccurrenceTableEntry(\n    uint64_t entry) {\n  return entry >> 1;\n}\n\ninline static bool IsSingletonLookupKey(uint64_t lookup_key) {\n  return (lookup_key & 1) > 0;\n}\n\n// Only used 
in Index to merge sorted candidate position lists using heap.\nstruct CandidatePositionWithListIndex {\n  uint32_t list_index;\n  uint64_t position;\n\n  CandidatePositionWithListIndex(uint32_t list_index, uint64_t position)\n      : list_index(list_index), position(position) {}\n\n  bool operator<(const CandidatePositionWithListIndex &h) const {\n    // The comparison is reversed so std::priority_queue acts as a min-heap.\n    return position > h.position;\n  }\n};\n\n// The lists are taken by const reference to avoid copying every list on each\n// call.\ninline static void HeapMergeCandidatePositionLists(\n    const std::vector<std::vector<uint64_t>> &sorted_candidate_position_lists,\n    std::vector<uint64_t> &candidate_positions) {\n  std::priority_queue<CandidatePositionWithListIndex> heap;\n  std::vector<uint32_t> candidate_position_list_indices(\n      sorted_candidate_position_lists.size(), 0);\n\n  for (uint32_t li = 0; li < sorted_candidate_position_lists.size(); ++li) {\n    if (sorted_candidate_position_lists[li].size() == 0) {\n      continue;\n    }\n    heap.emplace(li, sorted_candidate_position_lists[li][0]);\n  }\n\n  while (!heap.empty()) {\n    const CandidatePositionWithListIndex min_candidate_position = heap.top();\n    heap.pop();\n    candidate_positions.push_back(min_candidate_position.position);\n    ++candidate_position_list_indices[min_candidate_position.list_index];\n\n    const uint32_t min_candidate_position_list_index =\n        candidate_position_list_indices[min_candidate_position.list_index];\n    const std::vector<uint64_t> &min_sorted_candidate_position_list =\n        sorted_candidate_position_lists[min_candidate_position.list_index];\n    if (min_candidate_position_list_index <\n        min_sorted_candidate_position_list.size()) {\n      heap.emplace(min_candidate_position.list_index,\n                   min_sorted_candidate_position_list\n                       [min_candidate_position_list_index]);\n    }\n  }\n}\n\n}  // namespace chromap\n\n#endif  // INDEX_UTILS_H_\n"
  },
  {
    "path": "src/khash.h",
    "content": "/* The MIT License\n\n   Copyright (c) 2008, 2009, 2011 by Attractive Chaos <attractor@live.co.uk>\n\n   Permission is hereby granted, free of charge, to any person obtaining\n   a copy of this software and associated documentation files (the\n   \"Software\"), to deal in the Software without restriction, including\n   without limitation the rights to use, copy, modify, merge, publish,\n   distribute, sublicense, and/or sell copies of the Software, and to\n   permit persons to whom the Software is furnished to do so, subject to\n   the following conditions:\n\n   The above copyright notice and this permission notice shall be\n   included in all copies or substantial portions of the Software.\n\n   THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\n   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n   NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS\n   BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN\n   ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN\n   CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n   SOFTWARE.\n*/\n\n/*\n  An example:\n\n#include \"khash.h\"\nKHASH_MAP_INIT_INT(32, char)\nint main() {\n\tint ret, is_missing;\n\tkhiter_t k;\n\tkhash_t(32) *h = kh_init(32);\n\tk = kh_put(32, h, 5, &ret);\n\tkh_value(h, k) = 10;\n\tk = kh_get(32, h, 10);\n\tis_missing = (k == kh_end(h));\n\tk = kh_get(32, h, 5);\n\tkh_del(32, h, k);\n\tfor (k = kh_begin(h); k != kh_end(h); ++k)\n\t\tif (kh_exist(h, k)) kh_value(h, k) = 1;\n\tkh_destroy(32, h);\n\treturn 0;\n}\n*/\n\n/*\n  2013-05-02 (0.2.8):\n\n\t* Use quadratic probing. When the capacity is power of 2, stepping function\n\t  i*(i+1)/2 guarantees to traverse each bucket. 
It is better than double\n\t  hashing on cache performance and is more robust than linear probing.\n\n\t  In theory, double hashing should be more robust than quadratic probing.\n\t  However, my implementation is probably not for large hash tables, because\n\t  the second hash function is closely tied to the first hash function,\n\t  which reduce the effectiveness of double hashing.\n\n\tReference: http://research.cs.vt.edu/AVresearch/hashing/quadratic.php\n\n  2011-12-29 (0.2.7):\n\n    * Minor code clean up; no actual effect.\n\n  2011-09-16 (0.2.6):\n\n\t* The capacity is a power of 2. This seems to dramatically improve the\n\t  speed for simple keys. Thank Zilong Tan for the suggestion. Reference:\n\n\t   - http://code.google.com/p/ulib/\n\t   - http://nothings.org/computer/judy/\n\n\t* Allow to optionally use linear probing which usually has better\n\t  performance for random input. Double hashing is still the default as it\n\t  is more robust to certain non-random input.\n\n\t* Added Wang's integer hash function (not used by default). 
This hash\n\t  function is more robust to certain non-random input.\n\n  2011-02-14 (0.2.5):\n\n    * Allow to declare global functions.\n\n  2009-09-26 (0.2.4):\n\n    * Improve portability\n\n  2008-09-19 (0.2.3):\n\n\t* Corrected the example\n\t* Improved interfaces\n\n  2008-09-11 (0.2.2):\n\n\t* Improved speed a little in kh_put()\n\n  2008-09-10 (0.2.1):\n\n\t* Added kh_clear()\n\t* Fixed a compiling error\n\n  2008-09-02 (0.2.0):\n\n\t* Changed to token concatenation which increases flexibility.\n\n  2008-08-31 (0.1.2):\n\n\t* Fixed a bug in kh_get(), which has not been tested previously.\n\n  2008-08-31 (0.1.1):\n\n\t* Added destructor\n*/\n\n\n#ifndef __AC_KHASH_H\n#define __AC_KHASH_H\n\n/*!\n  @header\n\n  Generic hash table library.\n */\n\n#define AC_VERSION_KHASH_H \"0.2.8\"\n\n#include <stdlib.h>\n#include <string.h>\n#include <limits.h>\n\n/* compiler specific configuration */\n\n#if UINT_MAX == 0xffffffffu\ntypedef unsigned int khint32_t;\n#elif ULONG_MAX == 0xffffffffu\ntypedef unsigned long khint32_t;\n#endif\n\n#if ULONG_MAX == ULLONG_MAX\ntypedef unsigned long khint64_t;\n#else\ntypedef unsigned long long khint64_t;\n#endif\n\n#ifndef kh_inline\n#ifdef _MSC_VER\n#define kh_inline __inline\n#else\n#define kh_inline inline\n#endif\n#endif /* kh_inline */\n\n#ifndef klib_unused\n#if (defined __clang__ && __clang_major__ >= 3) || (defined __GNUC__ && __GNUC__ >= 3)\n#define klib_unused __attribute__ ((__unused__))\n#else\n#define klib_unused\n#endif\n#endif /* klib_unused */\n\ntypedef khint32_t khint_t;\ntypedef khint_t khiter_t;\n\n#define __ac_isempty(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&2)\n#define __ac_isdel(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&1)\n#define __ac_iseither(flag, i) ((flag[i>>4]>>((i&0xfU)<<1))&3)\n#define __ac_set_isdel_false(flag, i) (flag[i>>4]&=~(1ul<<((i&0xfU)<<1)))\n#define __ac_set_isempty_false(flag, i) (flag[i>>4]&=~(2ul<<((i&0xfU)<<1)))\n#define __ac_set_isboth_false(flag, i) 
(flag[i>>4]&=~(3ul<<((i&0xfU)<<1)))\n#define __ac_set_isdel_true(flag, i) (flag[i>>4]|=1ul<<((i&0xfU)<<1))\n\n#define __ac_fsize(m) ((m) < 16? 1 : (m)>>4)\n\n#ifndef kroundup32\n#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, (x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))\n#endif\n\n#ifndef kcalloc\n#define kcalloc(N,Z) calloc(N,Z)\n#endif\n#ifndef kmalloc\n#define kmalloc(Z) malloc(Z)\n#endif\n#ifndef krealloc\n#define krealloc(P,Z) realloc(P,Z)\n#endif\n#ifndef kfree\n#define kfree(P) free(P)\n#endif\n\nstatic const double __ac_HASH_UPPER = 0.77;\n\n#define __KHASH_TYPE(name, khkey_t, khval_t) \\\n\ttypedef struct kh_##name##_s { \\\n\t\tkhint_t n_buckets, size, n_occupied, upper_bound; \\\n\t\tkhint32_t *flags; \\\n\t\tkhkey_t *keys; \\\n\t\tkhval_t *vals; \\\n\t} kh_##name##_t;\n\n#define __KHASH_PROTOTYPES(name, khkey_t, khval_t)\t \t\t\t\t\t\\\n\textern kh_##name##_t *kh_init_##name(void);\t\t\t\t\t\t\t\\\n\textern void kh_destroy_##name(kh_##name##_t *h);\t\t\t\t\t\\\n\textern void kh_clear_##name(kh_##name##_t *h);\t\t\t\t\t\t\\\n\textern khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key); \t\\\n\textern int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets); \\\n\textern khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret); \\\n\textern void kh_del_##name(kh_##name##_t *h, khint_t x); \\\n  extern void kh_load_##name(kh_##name##_t *h, FILE* fp); \\\n  extern void kh_save_##name(kh_##name##_t *h, FILE* fp);\n\n#define __KHASH_IMPL(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \\\n\tSCOPE kh_##name##_t *kh_init_##name(void) {\t\t\t\t\t\t\t\\\n\t\treturn (kh_##name##_t*)kcalloc(1, sizeof(kh_##name##_t));\t\t\\\n\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\tSCOPE void kh_destroy_##name(kh_##name##_t *h)\t\t\t\t\t\t\\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (h) {\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tkfree((void *)h->keys); kfree(h->flags);\t\t\t\t\t\\\n\t\t\tkfree((void 
*)h->vals);\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tkfree(h);\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\tSCOPE void kh_clear_##name(kh_##name##_t *h)\t\t\t\t\t\t\\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (h && h->flags) {\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tmemset(h->flags, 0xaa, __ac_fsize(h->n_buckets) * sizeof(khint32_t)); \\\n\t\t\th->size = h->n_occupied = 0;\t\t\t\t\t\t\t\t\\\n\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\tSCOPE khint_t kh_get_##name(const kh_##name##_t *h, khkey_t key) \t\\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (h->n_buckets) {\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tkhint_t k, i, last, mask, step = 0; \\\n\t\t\tmask = h->n_buckets - 1;\t\t\t\t\t\t\t\t\t\\\n\t\t\tk = __hash_func(key); i = k & mask;\t\t\t\t\t\t\t\\\n\t\t\tlast = i; \\\n\t\t\twhile (!__ac_isempty(h->flags, i) && (__ac_isdel(h->flags, i) || !__hash_equal(h->keys[i], key))) { \\\n\t\t\t\ti = (i + (++step)) & mask; \\\n\t\t\t\tif (i == last) return h->n_buckets;\t\t\t\t\t\t\\\n\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\treturn __ac_iseither(h->flags, i)? h->n_buckets : i;\t\t\\\n\t\t} else return 0;\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\tSCOPE int kh_resize_##name(kh_##name##_t *h, khint_t new_n_buckets) \\\n\t{ /* This function uses 0.25*n_buckets bytes of working space instead of [sizeof(key_t+val_t)+.25]*n_buckets. 
*/ \\\n\t\tkhint32_t *new_flags = 0;\t\t\t\t\t\t\t\t\t\t\\\n\t\tkhint_t j = 1;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tkroundup32(new_n_buckets); \t\t\t\t\t\t\t\t\t\\\n\t\t\tif (new_n_buckets < 4) new_n_buckets = 4;\t\t\t\t\t\\\n\t\t\tif (h->size >= (khint_t)(new_n_buckets * __ac_HASH_UPPER + 0.5)) j = 0;\t/* requested size is too small */ \\\n\t\t\telse { /* hash table size to be changed (shrink or expand); rehash */ \\\n\t\t\t\tnew_flags = (khint32_t*)kmalloc(__ac_fsize(new_n_buckets) * sizeof(khint32_t));\t\\\n\t\t\t\tif (!new_flags) return -1;\t\t\t\t\t\t\t\t\\\n\t\t\t\tmemset(new_flags, 0xaa, __ac_fsize(new_n_buckets) * sizeof(khint32_t)); \\\n\t\t\t\tif (h->n_buckets < new_n_buckets) {\t/* expand */\t\t\\\n\t\t\t\t\tkhkey_t *new_keys = (khkey_t*)krealloc((void *)h->keys, new_n_buckets * sizeof(khkey_t)); \\\n\t\t\t\t\tif (!new_keys) { kfree(new_flags); return -1; }\t\t\\\n\t\t\t\t\th->keys = new_keys;\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t\tif (kh_is_map) {\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t\t\tkhval_t *new_vals = (khval_t*)krealloc((void *)h->vals, new_n_buckets * sizeof(khval_t)); \\\n\t\t\t\t\t\tif (!new_vals) { kfree(new_flags); return -1; }\t\\\n\t\t\t\t\t\th->vals = new_vals;\t\t\t\t\t\t\t\t\\\n\t\t\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t} /* otherwise shrink */\t\t\t\t\t\t\t\t\\\n\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (j) { /* rehashing is needed */\t\t\t\t\t\t\t\t\\\n\t\t\tfor (j = 0; j != h->n_buckets; ++j) {\t\t\t\t\t\t\\\n\t\t\t\tif (__ac_iseither(h->flags, j) == 0) {\t\t\t\t\t\\\n\t\t\t\t\tkhkey_t key = h->keys[j];\t\t\t\t\t\t\t\\\n\t\t\t\t\tkhval_t val;\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t\tkhint_t new_mask;\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t\tnew_mask = new_n_buckets - 1; \t\t\t\t\t\t\\\n\t\t\t\t\tif (kh_is_map) val = h->vals[j];\t\t\t\t\t\\\n\t\t\t\t\t__ac_set_isdel_true(h->flags, j);\t\t\t\t\t\\\n\t\t\t\t\twhile (1) { /* kick-out process; sort of like in Cuckoo hashing */ 
\\\n\t\t\t\t\t\tkhint_t k, i, step = 0; \\\n\t\t\t\t\t\tk = __hash_func(key);\t\t\t\t\t\t\t\\\n\t\t\t\t\t\ti = k & new_mask;\t\t\t\t\t\t\t\t\\\n\t\t\t\t\t\twhile (!__ac_isempty(new_flags, i)) i = (i + (++step)) & new_mask; \\\n\t\t\t\t\t\t__ac_set_isempty_false(new_flags, i);\t\t\t\\\n\t\t\t\t\t\tif (i < h->n_buckets && __ac_iseither(h->flags, i) == 0) { /* kick out the existing element */ \\\n\t\t\t\t\t\t\t{ khkey_t tmp = h->keys[i]; h->keys[i] = key; key = tmp; } \\\n\t\t\t\t\t\t\tif (kh_is_map) { khval_t tmp = h->vals[i]; h->vals[i] = val; val = tmp; } \\\n\t\t\t\t\t\t\t__ac_set_isdel_true(h->flags, i); /* mark it as deleted in the old hash table */ \\\n\t\t\t\t\t\t} else { /* write the element and jump out of the loop */ \\\n\t\t\t\t\t\t\th->keys[i] = key;\t\t\t\t\t\t\t\\\n\t\t\t\t\t\t\tif (kh_is_map) h->vals[i] = val;\t\t\t\\\n\t\t\t\t\t\t\tbreak;\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tif (h->n_buckets > new_n_buckets) { /* shrink the hash table */ \\\n\t\t\t\th->keys = (khkey_t*)krealloc((void *)h->keys, new_n_buckets * sizeof(khkey_t)); \\\n\t\t\t\tif (kh_is_map) h->vals = (khval_t*)krealloc((void *)h->vals, new_n_buckets * sizeof(khval_t)); \\\n\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tkfree(h->flags); /* free the working space */\t\t\t\t\\\n\t\t\th->flags = new_flags;\t\t\t\t\t\t\t\t\t\t\\\n\t\t\th->n_buckets = new_n_buckets;\t\t\t\t\t\t\t\t\\\n\t\t\th->n_occupied = h->size;\t\t\t\t\t\t\t\t\t\\\n\t\t\th->upper_bound = (khint_t)(h->n_buckets * __ac_HASH_UPPER + 0.5); \\\n\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\treturn 0;\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\tSCOPE khint_t kh_put_##name(kh_##name##_t *h, khkey_t key, int *ret) \\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tkhint_t x;\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (h->n_occupied >= h->upper_bound) { /* 
update the hash table */ \\\n\t\t\tif (h->n_buckets > (h->size<<1)) {\t\t\t\t\t\t\t\\\n\t\t\t\tif (kh_resize_##name(h, h->n_buckets - 1) < 0) { /* clear \"deleted\" elements */ \\\n\t\t\t\t\t*ret = -1; return h->n_buckets;\t\t\t\t\t\t\\\n\t\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t} else if (kh_resize_##name(h, h->n_buckets + 1) < 0) { /* expand the hash table */ \\\n\t\t\t\t*ret = -1; return h->n_buckets;\t\t\t\t\t\t\t\\\n\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t} /* TODO: implement automatic shrinking; resize() already supports shrinking */ \\\n\t\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tkhint_t k, i, site, last, mask = h->n_buckets - 1, step = 0; \\\n\t\t\tx = site = h->n_buckets; k = __hash_func(key); i = k & mask; \\\n\t\t\tif (__ac_isempty(h->flags, i)) x = i; /* for speed up */\t\\\n\t\t\telse {\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\tlast = i; \\\n\t\t\t\twhile (!__ac_isempty(h->flags, i) && (__ac_isdel(h->flags, i) || !__hash_equal(h->keys[i], key))) { \\\n\t\t\t\t\tif (__ac_isdel(h->flags, i)) site = i;\t\t\t\t\\\n\t\t\t\t\ti = (i + (++step)) & mask; \\\n\t\t\t\t\tif (i == last) { x = site; break; }\t\t\t\t\t\\\n\t\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\tif (x == h->n_buckets) {\t\t\t\t\t\t\t\t\\\n\t\t\t\t\tif (__ac_isempty(h->flags, i) && site != h->n_buckets) x = site; \\\n\t\t\t\t\telse x = i;\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (__ac_isempty(h->flags, x)) { /* not present at all */\t\t\\\n\t\t\th->keys[x] = key;\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t__ac_set_isboth_false(h->flags, x);\t\t\t\t\t\t\t\\\n\t\t\t++h->size; ++h->n_occupied;\t\t\t\t\t\t\t\t\t\\\n\t\t\t*ret = 1;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t} else if (__ac_isdel(h->flags, x)) { /* deleted */\t\t\t\t\\\n\t\t\th->keys[x] = key;\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t__ac_set_isboth_false(h->flags, 
x);\t\t\t\t\t\t\t\\\n\t\t\t++h->size;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t*ret = 2;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t} else *ret = 0; /* Don't touch h->keys[x] if present and not deleted */ \\\n\t\treturn x;\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\tSCOPE void kh_del_##name(kh_##name##_t *h, khint_t x)\t\t\t\t\\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (x != h->n_buckets && !__ac_iseither(h->flags, x)) {\t\t\t\\\n\t\t\t__ac_set_isdel_true(h->flags, x);\t\t\t\t\t\t\t\\\n\t\t\t--h->size;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t} \\\n  SCOPE void kh_load_##name(kh_##name##_t *h, FILE* fp)\\\n  {\\\n    fread(&(h->n_buckets), sizeof(khint_t), 1, fp);\\\n    fread(&(h->size), sizeof(khint_t), 1, fp);\\\n    fread(&(h->n_occupied), sizeof(khint_t), 1, fp);\\\n    fread(&(h->upper_bound), sizeof(khint_t), 1, fp);\\\n    if (h->n_buckets)\\\n    {\\\n      h->flags = (khint32_t*)kmalloc(__ac_fsize(h->n_buckets) * sizeof(khint32_t));\\\n      fread(h->flags, sizeof(khint32_t), __ac_fsize(h->n_buckets), fp);\\\n      h->keys = (khkey_t*)kmalloc(sizeof(khkey_t)*h->n_buckets);\\\n      fread(h->keys, sizeof(khkey_t), h->n_buckets, fp);\\\n      h->vals = (khval_t*)kmalloc(sizeof(khval_t)*h->n_buckets);\\\n      fread(h->vals, sizeof(khval_t), h->n_buckets, fp);\\\n    }\\\n  }\\\n  SCOPE void kh_write_##name(kh_##name##_t *h, FILE* fp)\\\n  {\\\n    fwrite(&(h->n_buckets), sizeof(khint_t), 1, fp);\\\n    fwrite(&(h->size), sizeof(khint_t), 1, fp);\\\n    fwrite(&(h->n_occupied), sizeof(khint_t), 1, fp);\\\n    fwrite(&(h->upper_bound), sizeof(khint_t), 1, fp);\\\n    if (h->n_buckets)\\\n    {\\\n      fwrite(h->flags, sizeof(khint32_t), __ac_fsize(h->n_buckets), fp);\\\n      fwrite(h->keys, sizeof(khkey_t), h->n_buckets, fp);\\\n      fwrite(h->vals, sizeof(khval_t), h->n_buckets, fp);\\\n    }\\\n  }\n\n#define KHASH_DECLARE(name, khkey_t, khval_t)\t\t \t\t\t\t\t\\\n\t__KHASH_TYPE(name, khkey_t, khval_t) 
\t\t\t\t\t\t\t\t\\\n\t__KHASH_PROTOTYPES(name, khkey_t, khval_t)\n\n#define KHASH_INIT2(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \\\n\t__KHASH_TYPE(name, khkey_t, khval_t) \t\t\t\t\t\t\t\t\\\n\t__KHASH_IMPL(name, SCOPE, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal)\n\n#define KHASH_INIT(name, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal) \\\n\tKHASH_INIT2(name, static kh_inline klib_unused, khkey_t, khval_t, kh_is_map, __hash_func, __hash_equal)\n\n/* --- BEGIN OF HASH FUNCTIONS --- */\n\n/*! @function\n  @abstract     Integer hash function\n  @param  key   The integer [khint32_t]\n  @return       The hash value [khint_t]\n */\n#define kh_int_hash_func(key) (khint32_t)(key)\n/*! @function\n  @abstract     Integer comparison function\n */\n#define kh_int_hash_equal(a, b) ((a) == (b))\n/*! @function\n  @abstract     64-bit integer hash function\n  @param  key   The integer [khint64_t]\n  @return       The hash value [khint_t]\n */\n#define kh_int64_hash_func(key) (khint32_t)((key)>>33^(key)^(key)<<11)\n/*! @function\n  @abstract     64-bit integer comparison function\n */\n#define kh_int64_hash_equal(a, b) ((a) == (b))\n/*! @function\n  @abstract     const char* hash function\n  @param  s     Pointer to a null terminated string\n  @return       The hash value\n */\nstatic kh_inline khint_t __ac_X31_hash_string(const char *s)\n{\n\tkhint_t h = (khint_t)*s;\n\tif (h) for (++s ; *s; ++s) h = (h << 5) - h + (khint_t)*s;\n\treturn h;\n}\n/*! @function\n  @abstract     Another interface to const char* hash function\n  @param  key   Pointer to a null terminated string [const char*]\n  @return       The hash value [khint_t]\n */\n#define kh_str_hash_func(key) __ac_X31_hash_string(key)\n/*! 
@function\n  @abstract     Const char* comparison function\n */\n#define kh_str_hash_equal(a, b) (strcmp(a, b) == 0)\n\nstatic kh_inline khint_t __ac_Wang_hash(khint_t key)\n{\n    key += ~(key << 15);\n    key ^=  (key >> 10);\n    key +=  (key << 3);\n    key ^=  (key >> 6);\n    key += ~(key << 11);\n    key ^=  (key >> 16);\n    return key;\n}\n#define kh_int_hash_func2(key) __ac_Wang_hash((khint_t)key)\n\n/* --- END OF HASH FUNCTIONS --- */\n\n/* Other convenient macros... */\n\n/*!\n  @abstract Type of the hash table.\n  @param  name  Name of the hash table [symbol]\n */\n#define khash_t(name) kh_##name##_t\n\n/*! @function\n  @abstract     Initiate a hash table.\n  @param  name  Name of the hash table [symbol]\n  @return       Pointer to the hash table [khash_t(name)*]\n */\n#define kh_init(name) kh_init_##name()\n\n/*! @function\n  @abstract     Destroy a hash table.\n  @param  name  Name of the hash table [symbol]\n  @param  h     Pointer to the hash table [khash_t(name)*]\n */\n#define kh_destroy(name, h) kh_destroy_##name(h)\n\n/*! @function\n  @abstract     Reset a hash table without deallocating memory.\n  @param  name  Name of the hash table [symbol]\n  @param  h     Pointer to the hash table [khash_t(name)*]\n */\n#define kh_clear(name, h) kh_clear_##name(h)\n\n/*! @function\n  @abstract     Resize a hash table.\n  @param  name  Name of the hash table [symbol]\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @param  s     New size [khint_t]\n */\n#define kh_resize(name, h, s) kh_resize_##name(h, s)\n\n/*! 
@function\n  @abstract     Insert a key to the hash table.\n  @param  name  Name of the hash table [symbol]\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @param  k     Key [type of keys]\n  @param  r     Extra return code: -1 if the operation failed;\n                0 if the key is present in the hash table;\n                1 if the bucket is empty (never used); 2 if the element in\n\t\t\t\tthe bucket has been deleted [int*]\n  @return       Iterator to the inserted element [khint_t]\n */\n#define kh_put(name, h, k, r) kh_put_##name(h, k, r)\n\n/*! @function\n  @abstract     Retrieve a key from the hash table.\n  @param  name  Name of the hash table [symbol]\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @param  k     Key [type of keys]\n  @return       Iterator to the found element, or kh_end(h) if the element is absent [khint_t]\n */\n#define kh_get(name, h, k) kh_get_##name(h, k)\n\n/*! @function\n  @abstract     Remove a key from the hash table.\n  @param  name  Name of the hash table [symbol]\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @param  k     Iterator to the element to be deleted [khint_t]\n */\n#define kh_del(name, h, k) kh_del_##name(h, k)\n\n/*! @function\n  @abstract     Test whether a bucket contains data.\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @param  x     Iterator to the bucket [khint_t]\n  @return       1 if containing data; 0 otherwise [int]\n */\n#define kh_exist(h, x) (!__ac_iseither((h)->flags, (x)))\n\n/*! @function\n  @abstract     Get key given an iterator\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @param  x     Iterator to the bucket [khint_t]\n  @return       Key [type of keys]\n */\n#define kh_key(h, x) ((h)->keys[x])\n\n/*! 
@function\n  @abstract     Get value given an iterator\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @param  x     Iterator to the bucket [khint_t]\n  @return       Value [type of values]\n  @discussion   For hash sets, calling this results in segfault.\n */\n#define kh_val(h, x) ((h)->vals[x])\n\n/*! @function\n  @abstract     Alias of kh_val()\n */\n#define kh_value(h, x) ((h)->vals[x])\n\n/*! @function\n  @abstract     Get the start iterator\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @return       The start iterator [khint_t]\n */\n#define kh_begin(h) (khint_t)(0)\n\n/*! @function\n  @abstract     Get the end iterator\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @return       The end iterator [khint_t]\n */\n#define kh_end(h) ((h)->n_buckets)\n\n/*! @function\n  @abstract     Get the number of elements in the hash table\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @return       Number of elements in the hash table [khint_t]\n */\n#define kh_size(h) ((h)->size)\n\n/*! @function\n  @abstract     Get the number of buckets in the hash table\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @return       Number of buckets in the hash table [khint_t]\n */\n#define kh_n_buckets(h) ((h)->n_buckets)\n\n/*! @function\n  @abstract     Iterate over the entries in the hash table\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @param  kvar  Variable to which key will be assigned\n  @param  vvar  Variable to which value will be assigned\n  @param  code  Block of code to execute\n */\n#define kh_foreach(h, kvar, vvar, code) { khint_t __i;\t\t\\\n\tfor (__i = kh_begin(h); __i != kh_end(h); ++__i) {\t\t\\\n\t\tif (!kh_exist(h,__i)) continue;\t\t\t\t\t\t\\\n\t\t(kvar) = kh_key(h,__i);\t\t\t\t\t\t\t\t\\\n\t\t(vvar) = kh_val(h,__i);\t\t\t\t\t\t\t\t\\\n\t\tcode;\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t} }\n\n/*! 
@function\n  @abstract     Iterate over the values in the hash table\n  @param  h     Pointer to the hash table [khash_t(name)*]\n  @param  vvar  Variable to which value will be assigned\n  @param  code  Block of code to execute\n */\n#define kh_foreach_value(h, vvar, code) { khint_t __i;\t\t\\\n\tfor (__i = kh_begin(h); __i != kh_end(h); ++__i) {\t\t\\\n\t\tif (!kh_exist(h,__i)) continue;\t\t\t\t\t\t\\\n\t\t(vvar) = kh_val(h,__i);\t\t\t\t\t\t\t\t\\\n\t\tcode;\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t} }\n\n#define kh_load(name, h, fp)  kh_load_##name(h, fp)\n#define kh_save(name, h, fp)  kh_write_##name(h, fp)\n\n/* More convenient interfaces */\n\n/*! @function\n  @abstract     Instantiate a hash set containing integer keys\n  @param  name  Name of the hash table [symbol]\n */\n#define KHASH_SET_INIT_INT(name)\t\t\t\t\t\t\t\t\t\t\\\n\tKHASH_INIT(name, khint32_t, char, 0, kh_int_hash_func, kh_int_hash_equal)\n\n/*! @function\n  @abstract     Instantiate a hash map containing integer keys\n  @param  name  Name of the hash table [symbol]\n  @param  khval_t  Type of values [type]\n */\n#define KHASH_MAP_INIT_INT(name, khval_t)\t\t\t\t\t\t\t\t\\\n\tKHASH_INIT(name, khint32_t, khval_t, 1, kh_int_hash_func, kh_int_hash_equal)\n\n/*! @function\n  @abstract     Instantiate a hash set containing 64-bit integer keys\n  @param  name  Name of the hash table [symbol]\n */\n#define KHASH_SET_INIT_INT64(name)\t\t\t\t\t\t\t\t\t\t\\\n\tKHASH_INIT(name, khint64_t, char, 0, kh_int64_hash_func, kh_int64_hash_equal)\n\n/*! @function\n  @abstract     Instantiate a hash map containing 64-bit integer keys\n  @param  name  Name of the hash table [symbol]\n  @param  khval_t  Type of values [type]\n */\n#define KHASH_MAP_INIT_INT64(name, khval_t)\t\t\t\t\t\t\t\t\\\n\tKHASH_INIT(name, khint64_t, khval_t, 1, kh_int64_hash_func, kh_int64_hash_equal)\n\ntypedef const char *kh_cstr_t;\n/*! 
@function\n  @abstract     Instantiate a hash map containing const char* keys\n  @param  name  Name of the hash table [symbol]\n */\n#define KHASH_SET_INIT_STR(name)\t\t\t\t\t\t\t\t\t\t\\\n\tKHASH_INIT(name, kh_cstr_t, char, 0, kh_str_hash_func, kh_str_hash_equal)\n\n/*! @function\n  @abstract     Instantiate a hash map containing const char* keys\n  @param  name  Name of the hash table [symbol]\n  @param  khval_t  Type of values [type]\n */\n#define KHASH_MAP_INIT_STR(name, khval_t)\t\t\t\t\t\t\t\t\\\n\tKHASH_INIT(name, kh_cstr_t, khval_t, 1, kh_str_hash_func, kh_str_hash_equal)\n\n#endif /* __AC_KHASH_H */\n"
  },
  {
    "path": "src/kseq.h",
    "content": "/* The MIT License\n\n   Copyright (c) 2008, 2009, 2011 Attractive Chaos <attractor@live.co.uk>\n\n   Permission is hereby granted, free of charge, to any person obtaining\n   a copy of this software and associated documentation files (the\n   \"Software\"), to deal in the Software without restriction, including\n   without limitation the rights to use, copy, modify, merge, publish,\n   distribute, sublicense, and/or sell copies of the Software, and to\n   permit persons to whom the Software is furnished to do so, subject to\n   the following conditions:\n\n   The above copyright notice and this permission notice shall be\n   included in all copies or substantial portions of the Software.\n\n   THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\n   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n   NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS\n   BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN\n   ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN\n   CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n   SOFTWARE.\n*/\n\n/* Last Modified: 05MAR2012 */\n\n#ifndef AC_KSEQ_H\n#define AC_KSEQ_H\n\n#include <ctype.h>\n#include <string.h>\n#include <stdlib.h>\n\n#define KS_SEP_SPACE 0 // isspace(): \\t, \\n, \\v, \\f, \\r\n#define KS_SEP_TAB   1 // isspace() && !' 
'\n#define KS_SEP_LINE  2 // line separator: \"\\n\" (Unix) or \"\\r\\n\" (Windows)\n#define KS_SEP_MAX   2\n\n#define __KS_TYPE(type_t)\t\t\t\t\t\t\\\n\ttypedef struct __kstream_t {\t\t\t\t\\\n\t\tunsigned char *buf;\t\t\t\t\t\t\\\n\t\tint begin, end, is_eof;\t\t\t\t\t\\\n\t\ttype_t f;\t\t\t\t\t\t\t\t\\\n\t} kstream_t;\n\n#define ks_err(ks) ((ks)->end == -1)\n#define ks_eof(ks) ((ks)->is_eof && (ks)->begin >= (ks)->end)\n#define ks_rewind(ks) ((ks)->is_eof = (ks)->begin = (ks)->end = 0)\n\n#define __KS_BASIC(type_t, __bufsize)\t\t\t\t\t\t\t\t\\\n\tstatic inline kstream_t *ks_init(type_t f)\t\t\t\t\t\t\\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tkstream_t *ks = (kstream_t*)calloc(1, sizeof(kstream_t));\t\\\n\t\tks->f = f;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tks->buf = (unsigned char*)malloc(__bufsize);\t\t\t\t\\\n\t\treturn ks;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\tstatic inline void ks_destroy(kstream_t *ks)\t\t\t\t\t\\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (ks) {\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tfree(ks->buf);\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tfree(ks);\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t}\n\n#define __KS_GETC(__read, __bufsize)\t\t\t\t\t\t\\\n\tstatic inline int ks_getc(kstream_t *ks)\t\t\t\t\\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (ks_err(ks)) return -3;\t\t\t\t\t\t\t\\\n\t\tif (ks->is_eof && ks->begin >= ks->end) return -1;\t\\\n\t\tif (ks->begin >= ks->end) {\t\t\t\t\t\t\t\\\n\t\t\tks->begin = 0;\t\t\t\t\t\t\t\t\t\\\n\t\t\tks->end = __read(ks->f, ks->buf, __bufsize);\t\\\n\t\t\tif (ks->end == 0) { ks->is_eof = 1; return -1;}\t\\\n\t\t\tif (ks->end == -1) { ks->is_eof = 1; return -3;}\\\n\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\treturn (int)ks->buf[ks->begin++];\t\t\t\t\t\\\n\t}\n\n#ifndef KSTRING_T\n#define KSTRING_T kstring_t\ntypedef struct __kstring_t {\n\tsize_t l, m;\n\tchar *s;\n} kstring_t;\n#endif\n\n#ifndef kroundup32\n#define kroundup32(x) (--(x), (x)|=(x)>>1, (x)|=(x)>>2, 
(x)|=(x)>>4, (x)|=(x)>>8, (x)|=(x)>>16, ++(x))\n#endif\n\n#define __KS_GETUNTIL(__read, __bufsize)\t\t\t\t\t\t\t\t\\\n\tstatic int ks_getuntil2(kstream_t *ks, int delimiter, kstring_t *str, int *dret, int append) \\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tint gotany = 0;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (dret) *dret = 0;\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tstr->l = append? str->l : 0;\t\t\t\t\t\t\t\t\t\\\n\t\tfor (;;) {\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tint i;\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tif (ks_err(ks)) return -3;\t\t\t\t\t\t\t\t\t\\\n\t\t\tif (ks->begin >= ks->end) {\t\t\t\t\t\t\t\t\t\\\n\t\t\t\tif (!ks->is_eof) {\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t\tks->begin = 0;\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\t\tks->end = __read(ks->f, ks->buf, __bufsize);\t\t\\\n\t\t\t\t\tif (ks->end == 0) { ks->is_eof = 1; break; }\t\t\\\n\t\t\t\t\tif (ks->end == -1) { ks->is_eof = 1; return -3; }\t\\\n\t\t\t\t} else break;\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tif (delimiter == KS_SEP_LINE) { \\\n\t\t\t\tfor (i = ks->begin; i < ks->end; ++i) \\\n\t\t\t\t\tif (ks->buf[i] == '\\n') break; \\\n\t\t\t} else if (delimiter > KS_SEP_MAX) {\t\t\t\t\t\t\\\n\t\t\t\tfor (i = ks->begin; i < ks->end; ++i)\t\t\t\t\t\\\n\t\t\t\t\tif (ks->buf[i] == delimiter) break;\t\t\t\t\t\\\n\t\t\t} else if (delimiter == KS_SEP_SPACE) {\t\t\t\t\t\t\\\n\t\t\t\tfor (i = ks->begin; i < ks->end; ++i)\t\t\t\t\t\\\n\t\t\t\t\tif (isspace(ks->buf[i])) break;\t\t\t\t\t\t\\\n\t\t\t} else if (delimiter == KS_SEP_TAB) {\t\t\t\t\t\t\\\n\t\t\t\tfor (i = ks->begin; i < ks->end; ++i)\t\t\t\t\t\\\n\t\t\t\t\tif (isspace(ks->buf[i]) && ks->buf[i] != ' ') break; \\\n\t\t\t} else i = 0; /* never come to here! 
*/\t\t\t\t\t\t\\\n\t\t\tif (str->m - str->l < (size_t)(i - ks->begin + 1)) {\t\t\\\n\t\t\t\tstr->m = str->l + (i - ks->begin) + 1;\t\t\t\t\t\\\n\t\t\t\tkroundup32(str->m);\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\tstr->s = (char*)realloc(str->s, str->m);\t\t\t\t\\\n\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tgotany = 1;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tmemcpy(str->s + str->l, ks->buf + ks->begin, i - ks->begin); \\\n\t\t\tstr->l = str->l + (i - ks->begin);\t\t\t\t\t\t\t\\\n\t\t\tks->begin = i + 1;\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tif (i < ks->end) {\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t\tif (dret) *dret = ks->buf[i];\t\t\t\t\t\t\t\\\n\t\t\t\tbreak;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (!gotany && ks_eof(ks)) return -1;\t\t\t\t\t\t\t\\\n\t\tif (str->s == 0) {\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tstr->m = 1;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\t\tstr->s = (char*)calloc(1, 1);\t\t\t\t\t\t\t\t\\\n\t\t} else if (delimiter == KS_SEP_LINE && str->l > 1 && str->s[str->l-1] == '\\r') --str->l; \\\n\t\tstr->s[str->l] = '\\0';\t\t\t\t\t\t\t\t\t\t\t\\\n\t\treturn str->l;\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t} \\\n\tstatic inline int ks_getuntil(kstream_t *ks, int delimiter, kstring_t *str, int *dret) \\\n\t{ return ks_getuntil2(ks, delimiter, str, dret, 0); }\n\n#define KSTREAM_INIT(type_t, __read, __bufsize) \\\n\t__KS_TYPE(type_t)\t\t\t\t\t\t\t\\\n\t__KS_BASIC(type_t, __bufsize)\t\t\t\t\\\n\t__KS_GETC(__read, __bufsize)\t\t\t\t\\\n\t__KS_GETUNTIL(__read, __bufsize)\n\n#define kseq_rewind(ks) ((ks)->last_char = (ks)->f->is_eof = (ks)->f->begin = (ks)->f->end = 0)\n\n#define __KSEQ_BASIC(SCOPE, type_t)\t\t\t\t\t\t\t\t\t\t\\\n\tSCOPE kseq_t *kseq_init(type_t fd)\t\t\t\t\t\t\t\t\t\\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tkseq_t *s = (kseq_t*)calloc(1, sizeof(kseq_t));\t\t\t\t\t\\\n\t\ts->f = ks_init(fd);\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\treturn 
s;\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t}\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\tSCOPE void kseq_destroy(kseq_t *ks)\t\t\t\t\t\t\t\t\t\\\n\t{\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tif (!ks) return;\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tfree(ks->name.s); free(ks->comment.s); free(ks->seq.s);\tfree(ks->qual.s); \\\n\t\tks_destroy(ks->f);\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t\tfree(ks);\t\t\t\t\t\t\t\t\t\t\t\t\t\t\\\n\t}\n\n/* Return value:\n   >=0  length of the sequence (normal)\n   -1   end-of-file\n   -2   truncated quality string\n   -3   error reading stream\n */\n#define __KSEQ_READ(SCOPE) \\\n\tSCOPE int kseq_read(kseq_t *seq) \\\n\t{ \\\n\t\tint c,r; \\\n\t\tkstream_t *ks = seq->f; \\\n\t\tif (seq->last_char == 0) { /* then jump to the next header line */ \\\n\t\t\twhile ((c = ks_getc(ks)) >= 0 && c != '>' && c != '@'); \\\n\t\t\tif (c < 0) return c; /* end of file or error*/ \\\n\t\t\tseq->last_char = c; \\\n\t\t} /* else: the first header char has been read in the previous call */ \\\n\t\tseq->comment.l = seq->seq.l = seq->qual.l = 0; /* reset all members */ \\\n\t\tif ((r=ks_getuntil(ks, 0, &seq->name, &c)) < 0) return r;  /* normal exit: EOF or error */ \\\n\t\tif (c != '\\n') ks_getuntil(ks, KS_SEP_LINE, &seq->comment, 0); /* read FASTA/Q comment */ \\\n\t\tif (seq->seq.s == 0) { /* we can do this in the loop below, but that is slower */ \\\n\t\t\tseq->seq.m = 256; \\\n\t\t\tseq->seq.s = (char*)malloc(seq->seq.m); \\\n\t\t} \\\n\t\twhile ((c = ks_getc(ks)) >= 0 && c != '>' && c != '+' && c != '@') { \\\n\t\t\tif (c == '\\n') continue; /* skip empty lines */ \\\n\t\t\tseq->seq.s[seq->seq.l++] = c; /* this is safe: we always have enough space for 1 char */ \\\n\t\t\tks_getuntil2(ks, KS_SEP_LINE, &seq->seq, 0, 1); /* read the rest of the line */ \\\n\t\t} \\\n\t\tif (c == '>' || c == '@') seq->last_char = c; /* the first header char has been read */\t\\\n\t\tif (seq->seq.l + 1 >= seq->seq.m) { /* seq->seq.s[seq->seq.l] below may be out of boundary */ \\\n\t\t\tseq->seq.m = 
seq->seq.l + 2; \\\n\t\t\tkroundup32(seq->seq.m); /* rounded to the next closest 2^k */ \\\n\t\t\tseq->seq.s = (char*)realloc(seq->seq.s, seq->seq.m); \\\n\t\t} \\\n\t\tseq->seq.s[seq->seq.l] = 0;\t/* null terminated string */ \\\n\t\tif (c != '+') return seq->seq.l; /* FASTA */ \\\n\t\tif (seq->qual.m < seq->seq.m) {\t/* allocate memory for qual in case insufficient */ \\\n\t\t\tseq->qual.m = seq->seq.m; \\\n\t\t\tseq->qual.s = (char*)realloc(seq->qual.s, seq->qual.m); \\\n\t\t} \\\n\t\twhile ((c = ks_getc(ks)) >= 0 && c != '\\n'); /* skip the rest of '+' line */ \\\n\t\tif (c == -1) return -2; /* error: no quality string */ \\\n\t\twhile ((c = ks_getuntil2(ks, KS_SEP_LINE, &seq->qual, 0, 1)) >= 0 && seq->qual.l < seq->seq.l); /* c holds the return code, not the comparison result */ \\\n\t\tif (c == -3) return -3; /* stream error */ \\\n\t\tseq->last_char = 0;\t/* we have not come to the next header line */ \\\n\t\tif (seq->seq.l != seq->qual.l) return -2; /* error: qual string is of a different length */ \\\n\t\treturn seq->seq.l; \\\n\t}\n\n#define __KSEQ_TYPE(type_t)\t\t\t\t\t\t\\\n\ttypedef struct {\t\t\t\t\t\t\t\\\n\t\tkstring_t name, comment, seq, qual;\t\t\\\n\t\tint last_char;\t\t\t\t\t\t\t\\\n\t\tkstream_t *f;\t\t\t\t\t\t\t\\\n    uint32_t id; \\\n\t} kseq_t;\n\n#define KSEQ_INIT2(SCOPE, type_t, __read)\t\t\\\n\tKSTREAM_INIT(type_t, __read, 16384)\t\t\t\\\n\t__KSEQ_TYPE(type_t)\t\t\t\t\t\t\t\\\n\t__KSEQ_BASIC(SCOPE, type_t)\t\t\t\t\t\\\n\t__KSEQ_READ(SCOPE)\n\n#define KSEQ_INIT(type_t, __read) KSEQ_INIT2(static, type_t, __read)\n\n#define KSEQ_DECLARE(type_t) \\\n\t__KS_TYPE(type_t) \\\n\t__KSEQ_TYPE(type_t) \\\n\textern kseq_t *kseq_init(type_t fd); \\\n\tvoid kseq_destroy(kseq_t *ks); \\\n\tint kseq_read(kseq_t *seq);\n\n#endif\n"
  },
  {
    "path": "src/ksw.cc",
    "content": "/* The MIT License\n\n   Copyright (c) 2011 by Attractive Chaos <attractor@live.co.uk>\n\n   Permission is hereby granted, free of charge, to any person obtaining\n   a copy of this software and associated documentation files (the\n   \"Software\"), to deal in the Software without restriction, including\n   without limitation the rights to use, copy, modify, merge, publish,\n   distribute, sublicense, and/or sell copies of the Software, and to\n   permit persons to whom the Software is furnished to do so, subject to\n   the following conditions:\n\n   The above copyright notice and this permission notice shall be\n   included in all copies or substantial portions of the Software.\n\n   THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\n   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\n   MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND\n   NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS\n   BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN\n   ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN\n   CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n   SOFTWARE.\n*/\n\n#include <stdlib.h>\n#include <stdint.h>\n#include <assert.h>\n#include <emmintrin.h>\n#include \"ksw.h\"\n#include \"sequence_batch.h\"\n\n#ifdef USE_MALLOC_WRAPPERS\n#  include \"malloc_wrap.h\"\n#endif\n\n#ifdef __GNUC__\n#define LIKELY(x) __builtin_expect((x),1)\n#define UNLIKELY(x) __builtin_expect((x),0)\n#else\n#define LIKELY(x) (x)\n#define UNLIKELY(x) (x)\n#endif\n\nconst kswr_t g_defr = { 0, -1, -1, -1, -1, -1, -1 };\n\nstruct _kswq_t {\n\tint qlen, slen;\n\tuint8_t shift, mdiff, max, size;\n\t__m128i *qp, *H0, *H1, *E, *Hmax;\n};\n\n/**\n * Initialize the query data structure\n *\n * @param size   Number of bytes used to store a score; valid valures are 1 or 2\n * @param qlen   Length of the query sequence\n * @param query  Query sequence\n * @param m      Size of the 
alphabet\n * @param mat    Scoring matrix in a one-dimension array\n *\n * @return       Query data structure\n */\nkswq_t *ksw_qinit(int size, int qlen, const uint8_t *query, int m, const int8_t *mat)\n{\n\tkswq_t *q;\n\tint slen, a, tmp, p;\n\n\tsize = size > 1? 2 : 1;\n\tp = 8 * (3 - size); // # values per __m128i\n\tslen = (qlen + p - 1) / p; // segmented length\n\tq = (kswq_t*)malloc(sizeof(kswq_t) + 256 + 16 * slen * (m + 4)); // a single block of memory\n\tq->qp = (__m128i*)(((size_t)q + sizeof(kswq_t) + 15) >> 4 << 4); // align memory\n\tq->H0 = q->qp + slen * m;\n\tq->H1 = q->H0 + slen;\n\tq->E  = q->H1 + slen;\n\tq->Hmax = q->E + slen;\n\tq->slen = slen; q->qlen = qlen; q->size = size;\n\t// compute shift\n\ttmp = m * m;\n\tfor (a = 0, q->shift = 127, q->mdiff = 0; a < tmp; ++a) { // find the minimum and maximum score\n\t\tif (mat[a] < (int8_t)q->shift) q->shift = mat[a];\n\t\tif (mat[a] > (int8_t)q->mdiff) q->mdiff = mat[a];\n\t}\n\tq->max = q->mdiff;\n\tq->shift = 256 - q->shift; // NB: q->shift is uint8_t\n\tq->mdiff += q->shift; // this is the difference between the min and max scores\n\t// An example: p=8, qlen=19, slen=3 and segmentation:\n\t//  {{0,3,6,9,12,15,18,-1},{1,4,7,10,13,16,-1,-1},{2,5,8,11,14,17,-1,-1}}\n\tif (size == 1) {\n\t\tint8_t *t = (int8_t*)q->qp;\n\t\tfor (a = 0; a < m; ++a) {\n\t\t\tint i, k, nlen = slen * p;\n\t\t\tconst int8_t *ma = mat + a * m;\n\t\t\tfor (i = 0; i < slen; ++i)\n\t\t\t\tfor (k = i; k < nlen; k += slen) // p iterations\n\t\t\t\t\t*t++ = (k >= qlen? 0 : ma[query[k]]) + q->shift;\n\t\t}\n\t} else {\n\t\tint16_t *t = (int16_t*)q->qp;\n\t\tfor (a = 0; a < m; ++a) {\n\t\t\tint i, k, nlen = slen * p;\n\t\t\tconst int8_t *ma = mat + a * m;\n\t\t\tfor (i = 0; i < slen; ++i)\n\t\t\t\tfor (k = i; k < nlen; k += slen) // p iterations\n\t\t\t\t\t*t++ = (k >= qlen? 
0 : ma[query[k]]);\n\t\t}\n\t}\n\treturn q;\n}\n\nkswr_t ksw_u8(kswq_t *q, int tlen, const uint8_t *target, int _o_del, int _e_del, int _o_ins, int _e_ins, int xtra) // the first gap costs -(_o+_e)\n{\n\tint slen, i, m_b, n_b, te = -1, gmax = 0, minsc, endsc;\n\tuint64_t *b;\n\t__m128i zero, oe_del, e_del, oe_ins, e_ins, shift, *H0, *H1, *E, *Hmax;\n\tkswr_t r;\n\n#define __max_16(ret, xx) do { \\\n\t\t(xx) = _mm_max_epu8((xx), _mm_srli_si128((xx), 8)); \\\n\t\t(xx) = _mm_max_epu8((xx), _mm_srli_si128((xx), 4)); \\\n\t\t(xx) = _mm_max_epu8((xx), _mm_srli_si128((xx), 2)); \\\n\t\t(xx) = _mm_max_epu8((xx), _mm_srli_si128((xx), 1)); \\\n    \t(ret) = _mm_extract_epi16((xx), 0) & 0x00ff; \\\n\t} while (0)\n\n\t// initialization\n\tr = g_defr;\n\tminsc = (xtra&KSW_XSUBO)? xtra&0xffff : 0x10000;\n\tendsc = (xtra&KSW_XSTOP)? xtra&0xffff : 0x10000;\n\tm_b = n_b = 0; b = 0;\n\tzero = _mm_set1_epi32(0);\n\toe_del = _mm_set1_epi8(_o_del + _e_del);\n\te_del = _mm_set1_epi8(_e_del);\n\toe_ins = _mm_set1_epi8(_o_ins + _e_ins);\n\te_ins = _mm_set1_epi8(_e_ins);\n\tshift = _mm_set1_epi8(q->shift);\n\tH0 = q->H0; H1 = q->H1; E = q->E; Hmax = q->Hmax;\n\tslen = q->slen;\n\tfor (i = 0; i < slen; ++i) {\n\t\t_mm_store_si128(E + i, zero);\n\t\t_mm_store_si128(H0 + i, zero);\n\t\t_mm_store_si128(Hmax + i, zero);\n\t}\n\t// the core loop\n\tfor (i = 0; i < tlen; ++i) {\n\t\tint j, k, cmp, imax;\n\t\t__m128i e, h, t, f = zero, max = zero, *S = q->qp + target[i] * slen; // s is the 1st score vector\n\t\th = _mm_load_si128(H0 + slen - 1); // h={2,5,8,11,14,17,-1,-1} in the above example\n\t\th = _mm_slli_si128(h, 1); // h=H(i-1,-1); << instead of >> because x64 is little-endian\n\t\tfor (j = 0; LIKELY(j < slen); ++j) {\n\t\t\t/* SW cells are computed in the following order:\n\t\t\t *   H(i,j)   = max{H(i-1,j-1)+S(i,j), E(i,j), F(i,j)}\n\t\t\t *   E(i+1,j) = max{H(i,j)-q, E(i,j)-r}\n\t\t\t *   F(i,j+1) = max{H(i,j)-q, F(i,j)-r}\n\t\t\t */\n\t\t\t// compute H'(i,j); note that at the 
beginning, h=H'(i-1,j-1)\n\t\t\th = _mm_adds_epu8(h, _mm_load_si128(S + j));\n\t\t\th = _mm_subs_epu8(h, shift); // h=H'(i-1,j-1)+S(i,j)\n\t\t\te = _mm_load_si128(E + j); // e=E'(i,j)\n\t\t\th = _mm_max_epu8(h, e);\n\t\t\th = _mm_max_epu8(h, f); // h=H'(i,j)\n\t\t\tmax = _mm_max_epu8(max, h); // set max\n\t\t\t_mm_store_si128(H1 + j, h); // save to H'(i,j)\n\t\t\t// now compute E'(i+1,j)\n\t\t\te = _mm_subs_epu8(e, e_del); // e=E'(i,j) - e_del\n\t\t\tt = _mm_subs_epu8(h, oe_del); // h=H'(i,j) - o_del - e_del\n\t\t\te = _mm_max_epu8(e, t); // e=E'(i+1,j)\n\t\t\t_mm_store_si128(E + j, e); // save to E'(i+1,j)\n\t\t\t// now compute F'(i,j+1)\n\t\t\tf = _mm_subs_epu8(f, e_ins);\n\t\t\tt = _mm_subs_epu8(h, oe_ins); // h=H'(i,j) - o_ins - e_ins\n\t\t\tf = _mm_max_epu8(f, t);\n\t\t\t// get H'(i-1,j) and prepare for the next j\n\t\t\th = _mm_load_si128(H0 + j); // h=H'(i-1,j)\n\t\t}\n\t\t// NB: we do not need to set E(i,j) as we disallow adjacent insertion and then deletion\n\t\tfor (k = 0; LIKELY(k < 16); ++k) { // this block mimics SWPS3; NB: H(i,j) updated in the lazy-F loop cannot exceed max\n\t\t\tf = _mm_slli_si128(f, 1);\n\t\t\tfor (j = 0; LIKELY(j < slen); ++j) {\n\t\t\t\th = _mm_load_si128(H1 + j);\n\t\t\t\th = _mm_max_epu8(h, f); // h=H'(i,j)\n\t\t\t\t_mm_store_si128(H1 + j, h);\n\t\t\t\th = _mm_subs_epu8(h, oe_ins);\n\t\t\t\tf = _mm_subs_epu8(f, e_ins);\n\t\t\t\tcmp = _mm_movemask_epi8(_mm_cmpeq_epi8(_mm_subs_epu8(f, h), zero));\n\t\t\t\tif (UNLIKELY(cmp == 0xffff)) goto end_loop16;\n\t\t\t}\n\t\t}\nend_loop16:\n\t\t//int k;for (k=0;k<16;++k)printf(\"%d \", ((uint8_t*)&max)[k]);printf(\"\\n\");\n\t\t__max_16(imax, max); // imax is the maximum number in max\n\t\tif (imax >= minsc) { // write the b array; this condition adds branching unfortunately\n\t\t\tif (n_b == 0 || (int32_t)b[n_b-1] + 1 != i) { // then append\n\t\t\t\tif (n_b == m_b) {\n\t\t\t\t\tm_b = m_b? 
m_b<<1 : 8;\n\t\t\t\t\tb = (uint64_t*)realloc(b, 8 * m_b);\n\t\t\t\t}\n\t\t\t\tb[n_b++] = (uint64_t)imax<<32 | i;\n\t\t\t} else if ((int)(b[n_b-1]>>32) < imax) b[n_b-1] = (uint64_t)imax<<32 | i; // modify the last\n\t\t}\n\t\tif (imax > gmax) {\n\t\t\tgmax = imax; te = i; // te is the end position on the target\n\t\t\tfor (j = 0; LIKELY(j < slen); ++j) // keep the H1 vector\n\t\t\t\t_mm_store_si128(Hmax + j, _mm_load_si128(H1 + j));\n\t\t\tif (gmax + q->shift >= 255 || gmax >= endsc) break;\n\t\t}\n\t\tS = H1; H1 = H0; H0 = S; // swap H0 and H1\n\t}\n\tr.score = gmax + q->shift < 255? gmax : 255;\n\tr.te = te;\n\tif (r.score != 255) { // get a->qe, the end of query match; find the 2nd best score\n\t\tint max = -1, tmp, low, high, qlen = slen * 16;\n\t\tuint8_t *t = (uint8_t*)Hmax;\n\t\tfor (i = 0; i < qlen; ++i, ++t)\n\t\t\tif ((int)*t > max) max = *t, r.qe = i / 16 + i % 16 * slen;\n\t\t\telse if ((int)*t == max && (tmp = i / 16 + i % 16 * slen) < r.qe) r.qe = tmp; \n\t\t//printf(\"%d,%d\\n\", max, gmax);\n\t\tif (b) {\n\t\t\ti = (r.score + q->max - 1) / q->max;\n\t\t\tlow = te - i; high = te + i;\n\t\t\tfor (i = 0; i < n_b; ++i) {\n\t\t\t\tint e = (int32_t)b[i];\n\t\t\t\tif ((e < low || e > high) && (int)(b[i]>>32) > r.score2)\n\t\t\t\t\tr.score2 = b[i]>>32, r.te2 = e;\n\t\t\t}\n\t\t}\n\t}\n\tfree(b);\n\treturn r;\n}\n\nkswr_t ksw_i16(kswq_t *q, int tlen, const uint8_t *target, int _o_del, int _e_del, int _o_ins, int _e_ins, int xtra) // the first gap costs -(_o+_e)\n{\n\tint slen, i, m_b, n_b, te = -1, gmax = 0, minsc, endsc;\n\tuint64_t *b;\n\t__m128i zero, oe_del, e_del, oe_ins, e_ins, *H0, *H1, *E, *Hmax;\n\tkswr_t r;\n\n#define __max_8(ret, xx) do { \\\n\t\t(xx) = _mm_max_epi16((xx), _mm_srli_si128((xx), 8)); \\\n\t\t(xx) = _mm_max_epi16((xx), _mm_srli_si128((xx), 4)); \\\n\t\t(xx) = _mm_max_epi16((xx), _mm_srli_si128((xx), 2)); \\\n    \t(ret) = _mm_extract_epi16((xx), 0); \\\n\t} while (0)\n\n\t// initialization\n\tr = g_defr;\n\tminsc = (xtra&KSW_XSUBO)? 
xtra&0xffff : 0x10000;\n\tendsc = (xtra&KSW_XSTOP)? xtra&0xffff : 0x10000;\n\tm_b = n_b = 0; b = 0;\n\tzero = _mm_set1_epi32(0);\n\toe_del = _mm_set1_epi16(_o_del + _e_del);\n\te_del = _mm_set1_epi16(_e_del);\n\toe_ins = _mm_set1_epi16(_o_ins + _e_ins);\n\te_ins = _mm_set1_epi16(_e_ins);\n\tH0 = q->H0; H1 = q->H1; E = q->E; Hmax = q->Hmax;\n\tslen = q->slen;\n\tfor (i = 0; i < slen; ++i) {\n\t\t_mm_store_si128(E + i, zero);\n\t\t_mm_store_si128(H0 + i, zero);\n\t\t_mm_store_si128(Hmax + i, zero);\n\t}\n\t// the core loop\n\tfor (i = 0; i < tlen; ++i) {\n\t\tint j, k, imax;\n\t\t__m128i e, t, h, f = zero, max = zero, *S = q->qp + target[i] * slen; // s is the 1st score vector\n\t\th = _mm_load_si128(H0 + slen - 1); // h={2,5,8,11,14,17,-1,-1} in the above example\n\t\th = _mm_slli_si128(h, 2);\n\t\tfor (j = 0; LIKELY(j < slen); ++j) {\n\t\t\th = _mm_adds_epi16(h, *S++);\n\t\t\te = _mm_load_si128(E + j);\n\t\t\th = _mm_max_epi16(h, e);\n\t\t\th = _mm_max_epi16(h, f);\n\t\t\tmax = _mm_max_epi16(max, h);\n\t\t\t_mm_store_si128(H1 + j, h);\n\t\t\te = _mm_subs_epu16(e, e_del);\n\t\t\tt = _mm_subs_epu16(h, oe_del);\n\t\t\te = _mm_max_epi16(e, t);\n\t\t\t_mm_store_si128(E + j, e);\n\t\t\tf = _mm_subs_epu16(f, e_ins);\n\t\t\tt = _mm_subs_epu16(h, oe_ins);\n\t\t\tf = _mm_max_epi16(f, t);\n\t\t\th = _mm_load_si128(H0 + j);\n\t\t}\n\t\tfor (k = 0; LIKELY(k < 16); ++k) {\n\t\t\tf = _mm_slli_si128(f, 2);\n\t\t\tfor (j = 0; LIKELY(j < slen); ++j) {\n\t\t\t\th = _mm_load_si128(H1 + j);\n\t\t\t\th = _mm_max_epi16(h, f);\n\t\t\t\t_mm_store_si128(H1 + j, h);\n\t\t\t\th = _mm_subs_epu16(h, oe_ins);\n\t\t\t\tf = _mm_subs_epu16(f, e_ins);\n\t\t\t\tif(UNLIKELY(!_mm_movemask_epi8(_mm_cmpgt_epi16(f, h)))) goto end_loop8;\n\t\t\t}\n\t\t}\nend_loop8:\n\t\t__max_8(imax, max);\n\t\tif (imax >= minsc) {\n\t\t\tif (n_b == 0 || (int32_t)b[n_b-1] + 1 != i) {\n\t\t\t\tif (n_b == m_b) {\n\t\t\t\t\tm_b = m_b? 
m_b<<1 : 8;\n\t\t\t\t\tb = (uint64_t*)realloc(b, 8 * m_b);\n\t\t\t\t}\n\t\t\t\tb[n_b++] = (uint64_t)imax<<32 | i;\n\t\t\t} else if ((int)(b[n_b-1]>>32) < imax) b[n_b-1] = (uint64_t)imax<<32 | i; // modify the last\n\t\t}\n\t\tif (imax > gmax) {\n\t\t\tgmax = imax; te = i;\n\t\t\tfor (j = 0; LIKELY(j < slen); ++j)\n\t\t\t\t_mm_store_si128(Hmax + j, _mm_load_si128(H1 + j));\n\t\t\tif (gmax >= endsc) break;\n\t\t}\n\t\tS = H1; H1 = H0; H0 = S;\n\t}\n\tr.score = gmax; r.te = te;\n\t{\n\t\tint max = -1, tmp, low, high, qlen = slen * 8;\n\t\tuint16_t *t = (uint16_t*)Hmax;\n\t\tfor (i = 0, r.qe = -1; i < qlen; ++i, ++t)\n\t\t\tif ((int)*t > max) max = *t, r.qe = i / 8 + i % 8 * slen;\n\t\t\telse if ((int)*t == max && (tmp = i / 8 + i % 8 * slen) < r.qe) r.qe = tmp; \n\t\tif (b) {\n\t\t\ti = (r.score + q->max - 1) / q->max;\n\t\t\tlow = te - i; high = te + i;\n\t\t\tfor (i = 0; i < n_b; ++i) {\n\t\t\t\tint e = (int32_t)b[i];\n\t\t\t\tif ((e < low || e > high) && (int)(b[i]>>32) > r.score2)\n\t\t\t\t\tr.score2 = b[i]>>32, r.te2 = e;\n\t\t\t}\n\t\t}\n\t}\n\tfree(b);\n\treturn r;\n}\n\nstatic inline void revseq(int l, uint8_t *s)\n{\n\tint i, t;\n\tfor (i = 0; i < l>>1; ++i)\n\t\tt = s[i], s[i] = s[l - 1 - i], s[l - 1 - i] = t;\n}\n\nkswr_t ksw_align2(int qlen, uint8_t *query, int tlen, uint8_t *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int xtra, kswq_t **qry)\n{\n\tint size;\n\tkswq_t *q;\n\tkswr_t r, rr;\n\tkswr_t (*func)(kswq_t*, int, const uint8_t*, int, int, int, int, int);\n\n\tq = (qry && *qry)? *qry : ksw_qinit((xtra&KSW_XBYTE)? 1 : 2, qlen, query, m, mat);\n\tif (qry && *qry == 0) *qry = q;\n\tfunc = q->size == 2? 
ksw_i16 : ksw_u8;\n\tsize = q->size;\n\tr = func(q, tlen, target, o_del, e_del, o_ins, e_ins, xtra);\n\tif (qry == 0) free(q);\n\tif ((xtra&KSW_XSTART) == 0 || ((xtra&KSW_XSUBO) && r.score < (xtra&0xffff))) return r;\n\trevseq(r.qe + 1, query); revseq(r.te + 1, target); // +1 because qe/te points to the exact end, not the position after the end\n\tq = ksw_qinit(size, r.qe + 1, query, m, mat);\n\trr = func(q, tlen, target, o_del, e_del, o_ins, e_ins, KSW_XSTOP | r.score);\n\trevseq(r.qe + 1, query); revseq(r.te + 1, target);\n\tfree(q);\n\tif (r.score == rr.score)\n\t\tr.tb = r.te - rr.te, r.qb = r.qe - rr.qe;\n\treturn r;\n}\n\nkswr_t ksw_align(int qlen, uint8_t *query, int tlen, uint8_t *target, int m, const int8_t *mat, int gapo, int gape, int xtra, kswq_t **qry)\n{\n\treturn ksw_align2(qlen, query, tlen, target, m, mat, gapo, gape, gapo, gape, xtra, qry);\n}\n\n/********************\n *** SW extension ***\n ********************/\n\ntypedef struct {\n\tint32_t h, e;\n} eh_t;\n\nint ksw_extend2(int qlen, const uint8_t *query, int tlen, const uint8_t *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int w, int end_bonus, int zdrop, int h0, int *_qle, int *_tle, int *_gtle, int *_gscore, int *_max_off)\n{\n\teh_t *eh; // score array\n\tint8_t *qp; // query profile\n\tint i, j, k, oe_del = o_del + e_del, oe_ins = o_ins + e_ins, beg, end, max, max_i, max_j, max_ins, max_del, max_ie, gscore, max_off;\n\tassert(h0 > 0);\n\t// allocate memory\n\tqp = (int8_t*)malloc(qlen * m);\n\teh = (eh_t*)calloc(qlen + 1, 8);\n\t// generate the query profile\n\tfor (k = i = 0; k < m; ++k) {\n\t\tconst int8_t *p = &mat[k * m];\n\t\tfor (j = 0; j < qlen; ++j) qp[i++] = p[query[j]];\n\t}\n\t// fill the first row\n\teh[0].h = h0; eh[1].h = h0 > oe_ins? 
h0 - oe_ins : 0;\n\tfor (j = 2; j <= qlen && eh[j-1].h > e_ins; ++j)\n\t\teh[j].h = eh[j-1].h - e_ins;\n\t// adjust $w if it is too large\n\tk = m * m;\n\tfor (i = 0, max = 0; i < k; ++i) // get the max score\n\t\tmax = max > mat[i]? max : mat[i];\n\tmax_ins = (int)((double)(qlen * max + end_bonus - o_ins) / e_ins + 1.);\n\tmax_ins = max_ins > 1? max_ins : 1;\n\tw = w < max_ins? w : max_ins;\n\tmax_del = (int)((double)(qlen * max + end_bonus - o_del) / e_del + 1.);\n\tmax_del = max_del > 1? max_del : 1;\n\tw = w < max_del? w : max_del; // TODO: is this necessary?\n\t// DP loop\n\tmax = h0, max_i = max_j = -1; max_ie = -1, gscore = -1;\n\tmax_off = 0;\n\tbeg = 0, end = qlen;\n\tfor (i = 0; LIKELY(i < tlen); ++i) {\n\t\tint t, f = 0, h1, m = 0, mj = -1;\n\t\tint8_t *q = &qp[target[i] * qlen];\n\t\t// apply the band and the constraint (if provided)\n\t\tif (beg < i - w) beg = i - w;\n\t\tif (end > i + w + 1) end = i + w + 1;\n\t\tif (end > qlen) end = qlen;\n\t\t// compute the first column\n\t\tif (beg == 0) {\n\t\t\th1 = h0 - (o_del + e_del * (i + 1));\n\t\t\tif (h1 < 0) h1 = 0;\n\t\t} else h1 = 0;\n\t\tfor (j = beg; LIKELY(j < end); ++j) {\n\t\t\t// At the beginning of the loop: eh[j] = { H(i-1,j-1), E(i,j) }, f = F(i,j) and h1 = H(i,j-1)\n\t\t\t// Similar to SSE2-SW, cells are computed in the following order:\n\t\t\t//   H(i,j)   = max{H(i-1,j-1)+S(i,j), E(i,j), F(i,j)}\n\t\t\t//   E(i+1,j) = max{H(i,j)-gapo, E(i,j)} - gape\n\t\t\t//   F(i,j+1) = max{H(i,j)-gapo, F(i,j)} - gape\n\t\t\teh_t *p = &eh[j];\n\t\t\tint h, M = p->h, e = p->e; // get H(i-1,j-1) and E(i-1,j)\n\t\t\tp->h = h1;          // set H(i,j-1) for the next row\n\t\t\tM = M? M + q[j] : 0;// separating H and M to disallow a cigar like \"100M3I3D20M\"\n\t\t\th = M > e? M : e;   // e and f are guaranteed to be non-negative, so h>=0 even if M<0\n\t\t\th = h > f? h : f;\n\t\t\th1 = h;             // save H(i,j) to h1 for the next column\n\t\t\tmj = m > h? 
mj : j; // record the position where max score is achieved\n\t\t\tm = m > h? m : h;   // m is stored at eh[mj+1]\n\t\t\tt = M - oe_del;\n\t\t\tt = t > 0? t : 0;\n\t\t\te -= e_del;\n\t\t\te = e > t? e : t;   // computed E(i+1,j)\n\t\t\tp->e = e;           // save E(i+1,j) for the next row\n\t\t\tt = M - oe_ins;\n\t\t\tt = t > 0? t : 0;\n\t\t\tf -= e_ins;\n\t\t\tf = f > t? f : t;   // computed F(i,j+1)\n\t\t}\n\t\teh[end].h = h1; eh[end].e = 0;\n\t\tif (j == qlen) {\n\t\t\tmax_ie = gscore > h1? max_ie : i;\n\t\t\tgscore = gscore > h1? gscore : h1;\n\t\t}\n\t\tif (m == 0) break;\n\t\tif (m > max) {\n\t\t\tmax = m, max_i = i, max_j = mj;\n\t\t\tmax_off = max_off > abs(mj - i)? max_off : abs(mj - i);\n\t\t} else if (zdrop > 0) {\n\t\t\tif (i - max_i > mj - max_j) {\n\t\t\t\tif (max - m - ((i - max_i) - (mj - max_j)) * e_del > zdrop) break;\n\t\t\t} else {\n\t\t\t\tif (max - m - ((mj - max_j) - (i - max_i)) * e_ins > zdrop) break;\n\t\t\t}\n\t\t}\n\t\t// update beg and end for the next round\n\t\tfor (j = beg; LIKELY(j < end) && eh[j].h == 0 && eh[j].e == 0; ++j);\n\t\tbeg = j;\n\t\tfor (j = end; LIKELY(j >= beg) && eh[j].h == 0 && eh[j].e == 0; --j);\n\t\tend = j + 2 < qlen? 
j + 2 : qlen;\n\t\t//beg = 0; end = qlen; // uncomment this line for debugging\n\t}\n\tfree(eh); free(qp);\n\tif (_qle) *_qle = max_j + 1;\n\tif (_tle) *_tle = max_i + 1;\n\tif (_gtle) *_gtle = max_ie + 1;\n\tif (_gscore) *_gscore = gscore;\n\tif (_max_off) *_max_off = max_off;\n\treturn max;\n}\n\nint ksw_extend(int qlen, const uint8_t *query, int tlen, const uint8_t *target, int m, const int8_t *mat, int gapo, int gape, int w, int end_bonus, int zdrop, int h0, int *qle, int *tle, int *gtle, int *gscore, int *max_off)\n{\n\treturn ksw_extend2(qlen, query, tlen, target, m, mat, gapo, gape, gapo, gape, w, end_bonus, zdrop, h0, qle, tle, gtle, gscore, max_off);\n}\n\n/********************\n * Global alignment *\n ********************/\n\n#define MINUS_INF -0x40000000\n\nstatic inline uint32_t *push_cigar(int *n_cigar, int *m_cigar, uint32_t *cigar, int op, int len)\n{\n\tif (*n_cigar == 0 || op != (int)(cigar[(*n_cigar) - 1]&0xf)) {\n\t\tif (*n_cigar == *m_cigar) {\n\t\t\t*m_cigar = *m_cigar? (*m_cigar)<<1 : 4;\n\t\t\tcigar = (uint32_t*)realloc(cigar, (*m_cigar) << 2);\n\t\t}\n\t\tcigar[(*n_cigar)++] = len<<4 | op;\n\t} else cigar[(*n_cigar)-1] += len<<4;\n\treturn cigar;\n}\n\nint ksw_semi_global3(int qlen, const char *query, int tlen, const char *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int w, int *n_cigar_, uint32_t **cigar_, int *mapping_start_position, int *mapping_end_position)\n{\n\teh_t *eh;\n\tint8_t *qp; // query profile\n\tint i, j, k, oe_del = o_del + e_del, oe_ins = o_ins + e_ins, score, n_col;\n\tuint8_t *z; // backtrack matrix; in each cell: f<<4|e<<2|h; in principle, we can halve the memory, but backtrack will be a little more complex\n\tif (n_cigar_) *n_cigar_ = 0;\n\t// allocate memory\n\tn_col = qlen < 2*w+1? qlen : 2*w+1; // maximum #columns of the backtrack matrix\n\tz = n_cigar_ && cigar_? 
(uint8_t*)malloc((long)n_col * tlen) : 0;\n\tqp = (int8_t*)malloc(qlen * m);\n\teh = (eh_t*)calloc(qlen + 1, 8);\n\t// generate the query profile\n\tfor (k = i = 0; k < m; ++k) {\n\t\tconst int8_t *p = &mat[k * m];\n\t\tfor (j = 0; j < qlen; ++j) qp[i++] = p[chromap::CharToUint8(query[j])];\n\t}\n\t// fill the first row\n\teh[0].h = 0; eh[0].e = MINUS_INF;\n\tfor (j = 1; j <= qlen && j <= w; ++j)\n\t\t//eh[j].h = -(o_ins + e_ins * j), eh[j].e = MINUS_INF;\n\t\teh[j].h = 0, eh[j].e = MINUS_INF;\n\tfor (; j <= qlen; ++j) eh[j].h = eh[j].e = MINUS_INF; // everything is -inf outside the band\n\t// DP loop\n\tfor (i = 0; LIKELY(i < tlen); ++i) { // target sequence is in the outer loop\n\t\tint32_t f = MINUS_INF, h1, beg, end, t;\n\t\tint8_t *q = &qp[chromap::CharToUint8(target[i]) * qlen];\n\t\t//beg = i > w? i - w : 0;\n\t\tbeg = i; \n\t\t//beg = i > 1? i - 1 : 0;\n\t\tend = i + w + 1 < qlen? i + w + 1 : qlen; // only loop through [beg,end) of the query sequence\n\t\th1 = beg == 0? -(o_del + e_del * (i + 1)) : MINUS_INF;\n\t\tif (n_cigar_ && cigar_) {\n\t\t\tuint8_t *zi = &z[(long)i * n_col];\n\t\t\tfor (j = beg; LIKELY(j < end); ++j) {\n\t\t\t\t// At the beginning of the loop: eh[j] = { H(i-1,j-1), E(i,j) }, f = F(i,j) and h1 = H(i,j-1)\n\t\t\t\t// Cells are computed in the following order:\n\t\t\t\t//   M(i,j)   = H(i-1,j-1) + S(i,j)\n\t\t\t\t//   H(i,j)   = max{M(i,j), E(i,j), F(i,j)}\n\t\t\t\t//   E(i+1,j) = max{M(i,j)-gapo, E(i,j)} - gape\n\t\t\t\t//   F(i,j+1) = max{M(i,j)-gapo, F(i,j)} - gape\n\t\t\t\t// We have to separate M(i,j); otherwise the direction may not be recorded correctly.\n\t\t\t\t// However, a CIGAR like \"10M3I3D10M\" allowed by local() is disallowed by global().\n\t\t\t\t// Such a CIGAR may occur, in theory, if mismatch_penalty > 2*gap_ext_penalty + 2*gap_open_penalty/k.\n\t\t\t\t// In practice, this should happen very rarely given a reasonable scoring system.\n\t\t\t\teh_t *p = &eh[j];\n\t\t\t\tint32_t h, m = p->h, e = p->e;\n\t\t\t\tuint8_t d; 
// direction\n\t\t\t\tp->h = h1;\n\t\t\t\tm += q[j];\n\t\t\t\td = m >= e? 0 : 1;\n\t\t\t\th = m >= e? m : e;\n\t\t\t\td = h >= f? d : 2;\n\t\t\t\th = h >= f? h : f;\n\t\t\t\th1 = h;\n\t\t\t\tt = m - oe_del;\n\t\t\t\te -= e_del;\n\t\t\t\td |= e > t? 1<<2 : 0;\n\t\t\t\te  = e > t? e    : t;\n\t\t\t\tp->e = e;\n\t\t\t\tt = m - oe_ins;\n\t\t\t\tf -= e_ins;\n\t\t\t\td |= f > t? 2<<4 : 0; // if we want to halve the memory, use one bit only, instead of two\n\t\t\t\tf  = f > t? f    : t;\n\t\t\t\tzi[j - beg] = d; // z[i,j] keeps h for the current cell and e/f for the next cell\n\t\t\t}\n\t\t} else {\n\t\t\tfor (j = beg; LIKELY(j < end); ++j) {\n\t\t\t\teh_t *p = &eh[j];\n\t\t\t\tint32_t h, m = p->h, e = p->e;\n\t\t\t\tp->h = h1;\n\t\t\t\tm += q[j];\n\t\t\t\th = m >= e? m : e;\n\t\t\t\th = h >= f? h : f;\n\t\t\t\th1 = h;\n\t\t\t\tt = m - oe_del;\n\t\t\t\te -= e_del;\n\t\t\t\te  = e > t? e : t;\n\t\t\t\tp->e = e;\n\t\t\t\tt = m - oe_ins;\n\t\t\t\tf -= e_ins;\n\t\t\t\tf  = f > t? f : t;\n\t\t\t}\n\t\t}\n\t\teh[end].h = h1; eh[end].e = MINUS_INF;\n\t}\n\tscore = eh[qlen].h;\n\n  int max_score_position = qlen;\n  for (j = 1; j < w; ++j) {\n    if (eh[qlen - j].h > score) {\n      score = eh[qlen - j].h;\n      max_score_position = qlen - j;\n    }\n  }\n  if (mapping_end_position) {\n    *mapping_end_position = max_score_position;\n  }\n\tif (n_cigar_ && cigar_) { // backtrack\n\t\tint n_cigar = 0, m_cigar = 0, which = 0;\n\t\tuint32_t *cigar = 0, tmp;\n\t\t//i = tlen - 1; k = (i + w + 1 < qlen? i + w + 1 : qlen) - 1; // (i,k) points to the last cell\n\t\ti = tlen - 1; k = max_score_position - 1; // (i,k) points to the last cell\n\t\twhile (i >= 0 && k >= 0) {\n\t\t\t//which = z[(long)i * n_col + (k - (i > w? 
i - w : 0))] >> (which<<1) & 3;\n\t\t\twhich = z[(long)i * n_col + (k - i)] >> (which<<1) & 3;\n\t\t\tif (which == 0)      cigar = push_cigar(&n_cigar, &m_cigar, cigar, 0, 1), --i, --k;\n\t\t\telse if (which == 1) cigar = push_cigar(&n_cigar, &m_cigar, cigar, 1, 1), --i;\n\t\t\telse                 cigar = push_cigar(&n_cigar, &m_cigar, cigar, 2, 1), --k;\n\t\t}\n\t\tif (i >= 0) cigar = push_cigar(&n_cigar, &m_cigar, cigar, 1, i + 1);\n    if (mapping_start_position) {\n      *mapping_start_position = k + 1;\n    }\n\t\t//if (k >= 0) cigar = push_cigar(&n_cigar, &m_cigar, cigar, 1, k + 1);\n\t\tfor (i = 0; i < n_cigar>>1; ++i) // reverse CIGAR\n\t\t\ttmp = cigar[i], cigar[i] = cigar[n_cigar-1-i], cigar[n_cigar-1-i] = tmp;\n\t\t*n_cigar_ = n_cigar, *cigar_ = cigar;\n\t}\n\tfree(eh); free(qp); free(z);\n\treturn score;\n}\n\nint ksw_semi_global(int qlen, const char *query, int tlen, const char *target, int m, const int8_t *mat, int gapo, int gape, int w, int *n_cigar_, uint32_t **cigar_)\n{\n\treturn ksw_semi_global2(qlen, query, tlen, target, m, mat, gapo, gape, gapo, gape, w, n_cigar_, cigar_);\n}\n\nint ksw_semi_global2(int qlen, const char *query, int tlen, const char *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int w, int *n_cigar_, uint32_t **cigar_)\n{\n\treturn ksw_semi_global3(qlen, query, tlen, target, m, mat, o_del, e_del, o_ins, e_ins, w, n_cigar_, cigar_, NULL, NULL);\n}\n\nint ksw_global2(int qlen, const uint8_t *query, int tlen, const uint8_t *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int w, int *n_cigar_, uint32_t **cigar_)\n{\n\teh_t *eh;\n\tint8_t *qp; // query profile\n\tint i, j, k, oe_del = o_del + e_del, oe_ins = o_ins + e_ins, score, n_col;\n\tuint8_t *z; // backtrack matrix; in each cell: f<<4|e<<2|h; in principle, we can halve the memory, but backtrack will be a little more complex\n\tif (n_cigar_) *n_cigar_ = 0;\n\t// allocate memory\n\tn_col = qlen < 2*w+1? 
qlen : 2*w+1; // maximum #columns of the backtrack matrix\n\tz = n_cigar_ && cigar_? (uint8_t*)malloc((long)n_col * tlen) : 0;\n\tqp = (int8_t*)malloc(qlen * m);\n\teh = (eh_t*)calloc(qlen + 1, 8);\n\t// generate the query profile\n\tfor (k = i = 0; k < m; ++k) {\n\t\tconst int8_t *p = &mat[k * m];\n\t\tfor (j = 0; j < qlen; ++j) qp[i++] = p[query[j]];\n\t}\n\t// fill the first row\n\teh[0].h = 0; eh[0].e = MINUS_INF;\n\tfor (j = 1; j <= qlen && j <= w; ++j)\n\t\teh[j].h = -(o_ins + e_ins * j), eh[j].e = MINUS_INF;\n\tfor (; j <= qlen; ++j) eh[j].h = eh[j].e = MINUS_INF; // everything is -inf outside the band\n\t// DP loop\n\tfor (i = 0; LIKELY(i < tlen); ++i) { // target sequence is in the outer loop\n\t\tint32_t f = MINUS_INF, h1, beg, end, t;\n\t\tint8_t *q = &qp[target[i] * qlen];\n\t\tbeg = i > w? i - w : 0;\n\t\tend = i + w + 1 < qlen? i + w + 1 : qlen; // only loop through [beg,end) of the query sequence\n\t\th1 = beg == 0? -(o_del + e_del * (i + 1)) : MINUS_INF;\n\t\tif (n_cigar_ && cigar_) {\n\t\t\tuint8_t *zi = &z[(long)i * n_col];\n\t\t\tfor (j = beg; LIKELY(j < end); ++j) {\n\t\t\t\t// At the beginning of the loop: eh[j] = { H(i-1,j-1), E(i,j) }, f = F(i,j) and h1 = H(i,j-1)\n\t\t\t\t// Cells are computed in the following order:\n\t\t\t\t//   M(i,j)   = H(i-1,j-1) + S(i,j)\n\t\t\t\t//   H(i,j)   = max{M(i,j), E(i,j), F(i,j)}\n\t\t\t\t//   E(i+1,j) = max{M(i,j)-gapo, E(i,j)} - gape\n\t\t\t\t//   F(i,j+1) = max{M(i,j)-gapo, F(i,j)} - gape\n\t\t\t\t// We have to separate M(i,j); otherwise the direction may not be recorded correctly.\n\t\t\t\t// However, a CIGAR like \"10M3I3D10M\" allowed by local() is disallowed by global().\n\t\t\t\t// Such a CIGAR may occur, in theory, if mismatch_penalty > 2*gap_ext_penalty + 2*gap_open_penalty/k.\n\t\t\t\t// In practice, this should happen very rarely given a reasonable scoring system.\n\t\t\t\teh_t *p = &eh[j];\n\t\t\t\tint32_t h, m = p->h, e = p->e;\n\t\t\t\tuint8_t d; // direction\n\t\t\t\tp->h = h1;\n\t\t\t\tm += 
q[j];\n\t\t\t\td = m >= e? 0 : 1;\n\t\t\t\th = m >= e? m : e;\n\t\t\t\td = h >= f? d : 2;\n\t\t\t\th = h >= f? h : f;\n\t\t\t\th1 = h;\n\t\t\t\tt = m - oe_del;\n\t\t\t\te -= e_del;\n\t\t\t\td |= e > t? 1<<2 : 0;\n\t\t\t\te  = e > t? e    : t;\n\t\t\t\tp->e = e;\n\t\t\t\tt = m - oe_ins;\n\t\t\t\tf -= e_ins;\n\t\t\t\td |= f > t? 2<<4 : 0; // if we want to halve the memory, use one bit only, instead of two\n\t\t\t\tf  = f > t? f    : t;\n\t\t\t\tzi[j - beg] = d; // z[i,j] keeps h for the current cell and e/f for the next cell\n\t\t\t}\n\t\t} else {\n\t\t\tfor (j = beg; LIKELY(j < end); ++j) {\n\t\t\t\teh_t *p = &eh[j];\n\t\t\t\tint32_t h, m = p->h, e = p->e;\n\t\t\t\tp->h = h1;\n\t\t\t\tm += q[j];\n\t\t\t\th = m >= e? m : e;\n\t\t\t\th = h >= f? h : f;\n\t\t\t\th1 = h;\n\t\t\t\tt = m - oe_del;\n\t\t\t\te -= e_del;\n\t\t\t\te  = e > t? e : t;\n\t\t\t\tp->e = e;\n\t\t\t\tt = m - oe_ins;\n\t\t\t\tf -= e_ins;\n\t\t\t\tf  = f > t? f : t;\n\t\t\t}\n\t\t}\n\t\teh[end].h = h1; eh[end].e = MINUS_INF;\n\t}\n\tscore = eh[qlen].h;\n\tif (n_cigar_ && cigar_) { // backtrack\n\t\tint n_cigar = 0, m_cigar = 0, which = 0;\n\t\tuint32_t *cigar = 0, tmp;\n\t\ti = tlen - 1; k = (i + w + 1 < qlen? i + w + 1 : qlen) - 1; // (i,k) points to the last cell\n\t\twhile (i >= 0 && k >= 0) {\n\t\t\twhich = z[(long)i * n_col + (k - (i > w? 
i - w : 0))] >> (which<<1) & 3;\n\t\t\tif (which == 0)      cigar = push_cigar(&n_cigar, &m_cigar, cigar, 0, 1), --i, --k;\n\t\t\telse if (which == 1) cigar = push_cigar(&n_cigar, &m_cigar, cigar, 2, 1), --i;\n\t\t\telse                 cigar = push_cigar(&n_cigar, &m_cigar, cigar, 1, 1), --k;\n\t\t}\n\t\tif (i >= 0) cigar = push_cigar(&n_cigar, &m_cigar, cigar, 2, i + 1);\n\t\tif (k >= 0) cigar = push_cigar(&n_cigar, &m_cigar, cigar, 1, k + 1);\n\t\tfor (i = 0; i < n_cigar>>1; ++i) // reverse CIGAR\n\t\t\ttmp = cigar[i], cigar[i] = cigar[n_cigar-1-i], cigar[n_cigar-1-i] = tmp;\n\t\t*n_cigar_ = n_cigar, *cigar_ = cigar;\n\t}\n\tfree(eh); free(qp); free(z);\n\treturn score;\n}\n\nint ksw_global(int qlen, const uint8_t *query, int tlen, const uint8_t *target, int m, const int8_t *mat, int gapo, int gape, int w, int *n_cigar_, uint32_t **cigar_)\n{\n\treturn ksw_global2(qlen, query, tlen, target, m, mat, gapo, gape, gapo, gape, w, n_cigar_, cigar_);\n}\n\n/*******************************************\n * Main function (not compiled by default) *\n *******************************************/\n\n#ifdef _KSW_MAIN\n\n#include <unistd.h>\n#include <stdio.h>\n#include <zlib.h>\n#include \"kseq.h\"\nKSEQ_INIT(gzFile, err_gzread)\n\nunsigned char seq_nt4_table[256] = {\n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,\n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 0, 4, 1,  4, 4, 4, 2,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  3, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 0, 4, 1,  4, 4, 4, 2,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  3, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 
4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4, \n\t4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4,  4, 4, 4, 4\n};\n\nint main(int argc, char *argv[])\n{\n\tint c, sa = 1, sb = 3, i, j, k, forward_only = 0, max_rseq = 0;\n\tint8_t mat[25];\n\tint gapo = 5, gape = 2, minsc = 0, xtra = KSW_XSTART;\n\tuint8_t *rseq = 0;\n\tgzFile fpt, fpq;\n\tkseq_t *kst, *ksq;\n\n\t// parse command line\n\twhile ((c = getopt(argc, argv, \"a:b:q:r:ft:1\")) >= 0) {\n\t\tswitch (c) {\n\t\t\tcase 'a': sa = atoi(optarg); break;\n\t\t\tcase 'b': sb = atoi(optarg); break;\n\t\t\tcase 'q': gapo = atoi(optarg); break;\n\t\t\tcase 'r': gape = atoi(optarg); break;\n\t\t\tcase 't': minsc = atoi(optarg); break;\n\t\t\tcase 'f': forward_only = 1; break;\n\t\t\tcase '1': xtra |= KSW_XBYTE; break;\n\t\t}\n\t}\n\tif (optind + 2 > argc) {\n\t\tfprintf(stderr, \"Usage: ksw [-1] [-f] [-a%d] [-b%d] [-q%d] [-r%d] [-t%d] <target.fa> <query.fa>\\n\", sa, sb, gapo, gape, minsc);\n\t\treturn 1;\n\t}\n\tif (minsc > 0xffff) minsc = 0xffff;\n\txtra |= KSW_XSUBO | minsc;\n\t// initialize scoring matrix\n\tfor (i = k = 0; i < 4; ++i) {\n\t\tfor (j = 0; j < 4; ++j)\n\t\t\tmat[k++] = i == j? sa : -sb;\n\t\tmat[k++] = 0; // ambiguous base\n\t}\n\tfor (j = 0; j < 5; ++j) mat[k++] = 0;\n\t// open file\n\tfpt = xzopen(argv[optind],   \"r\"); kst = kseq_init(fpt);\n\tfpq = xzopen(argv[optind+1], \"r\"); ksq = kseq_init(fpq);\n\t// all-pair alignment\n\twhile (kseq_read(ksq) > 0) {\n\t\tkswq_t *q[2] = {0, 0};\n\t\tkswr_t r;\n\t\tfor (i = 0; i < (int)ksq->seq.l; ++i) ksq->seq.s[i] = seq_nt4_table[(int)ksq->seq.s[i]];\n\t\tif (!forward_only) { // reverse\n\t\t\tif ((int)ksq->seq.m > max_rseq) {\n\t\t\t\tmax_rseq = ksq->seq.m;\n\t\t\t\trseq = (uint8_t*)realloc(rseq, max_rseq);\n\t\t\t}\n\t\t\tfor (i = 0, j = ksq->seq.l - 1; i < (int)ksq->seq.l; ++i, --j)\n\t\t\t\trseq[j] = ksq->seq.s[i] == 4? 
4 : 3 - ksq->seq.s[i];\n\t\t}\n\t\tgzrewind(fpt); kseq_rewind(kst);\n\t\twhile (kseq_read(kst) > 0) {\n\t\t\tfor (i = 0; i < (int)kst->seq.l; ++i) kst->seq.s[i] = seq_nt4_table[(int)kst->seq.s[i]];\n\t\t\tr = ksw_align(ksq->seq.l, (uint8_t*)ksq->seq.s, kst->seq.l, (uint8_t*)kst->seq.s, 5, mat, gapo, gape, xtra, &q[0]);\n\t\t\tif (r.score >= minsc)\n\t\t\t\terr_printf(\"%s\\t%d\\t%d\\t%s\\t%d\\t%d\\t%d\\t%d\\t%d\\n\", kst->name.s, r.tb, r.te+1, ksq->name.s, r.qb, r.qe+1, r.score, r.score2, r.te2);\n\t\t\tif (rseq) {\n\t\t\t\tr = ksw_align(ksq->seq.l, rseq, kst->seq.l, (uint8_t*)kst->seq.s, 5, mat, gapo, gape, xtra, &q[1]);\n\t\t\t\tif (r.score >= minsc)\n\t\t\t\t\terr_printf(\"%s\\t%d\\t%d\\t%s\\t%d\\t%d\\t%d\\t%d\\t%d\\n\", kst->name.s, r.tb, r.te+1, ksq->name.s, (int)ksq->seq.l - r.qb, (int)ksq->seq.l - 1 - r.qe, r.score, r.score2, r.te2);\n\t\t\t}\n\t\t}\n\t\tfree(q[0]); free(q[1]);\n\t}\n\tfree(rseq);\n\tkseq_destroy(kst); err_gzclose(fpt);\n\tkseq_destroy(ksq); err_gzclose(fpq);\n\treturn 0;\n}\n#endif\n"
  },
  {
    "path": "src/ksw.h",
    "content": "#ifndef __AC_KSW_H\n#define __AC_KSW_H\n\n#include <stdint.h>\n\n#define KSW_XBYTE  0x10000\n#define KSW_XSTOP  0x20000\n#define KSW_XSUBO  0x40000\n#define KSW_XSTART 0x80000\n\nstruct _kswq_t;\ntypedef struct _kswq_t kswq_t;\n\ntypedef struct {\n\tint score; // best score\n\tint te, qe; // target end and query end\n\tint score2, te2; // second best score and ending position on the target\n\tint tb, qb; // target start and query start\n} kswr_t;\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\n\t/**\n\t * Aligning two sequences\n\t *\n\t * @param qlen    length of the query sequence (typically <tlen)\n\t * @param query   query sequence with 0 <= query[i] < m\n\t * @param tlen    length of the target sequence\n\t * @param target  target sequence\n\t * @param m       number of residue types\n\t * @param mat     m*m scoring matrix in one-dimension array\n\t * @param gapo    gap open penalty; a gap of length l cost \"-(gapo+l*gape)\"\n\t * @param gape    gap extension penalty\n\t * @param xtra    extra information (see below)\n\t * @param qry     query profile (see below)\n\t *\n\t * @return        alignment information in a struct; unset values to -1\n\t *\n\t * When xtra==0, ksw_align() uses a signed two-byte integer to store a\n\t * score and only finds the best score and the end positions. The 2nd best\n\t * score or the start positions are not attempted. The default behavior can\n\t * be tuned by setting KSW_X* flags:\n\t *\n\t *   KSW_XBYTE:  use an unsigned byte to store a score. 
If overflow occurs,\n\t *               kswr_t::score will be set to 255\n\t *\n\t *   KSW_XSUBO:  track the 2nd best score and the ending position on the\n\t *               target if the 2nd best is higher than (xtra&0xffff)\n\t *\n\t *   KSW_XSTOP:  stop if the maximum score is above (xtra&0xffff)\n\t *\n\t *   KSW_XSTART: find the start positions\n\t *\n\t * When *qry==NULL, ksw_align() will compute and allocate the query profile\n\t * and when the function returns, *qry will point to the profile, which can\n\t * be deallocated simply by free(). If one query is aligned against multiple\n\t * target sequences, *qry should be set to NULL during the first call and\n\t * freed after the last call. Note that qry can equal 0. In this case, the\n\t * query profile will be deallocated in ksw_align().\n\t */\n\tkswr_t ksw_align(int qlen, uint8_t *query, int tlen, uint8_t *target, int m, const int8_t *mat, int gapo, int gape, int xtra, kswq_t **qry);\n\tkswr_t ksw_align2(int qlen, uint8_t *query, int tlen, uint8_t *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int xtra, kswq_t **qry);\n\n\t/**\n\t * Banded global alignment\n\t *\n\t * @param qlen    query length\n\t * @param query   query sequence with 0 <= query[i] < m\n\t * @param tlen    target length\n\t * @param target  target sequence with 0 <= target[i] < m\n\t * @param m       number of residue types\n\t * @param mat     m*m scoring matrix in a one-dimensional array\n\t * @param gapo    gap open penalty; a gap of length l costs \"-(gapo+l*gape)\"\n\t * @param gape    gap extension penalty\n\t * @param w       band width\n\t * @param n_cigar (out) number of CIGAR elements\n\t * @param cigar   (out) BAM-encoded CIGAR; caller needs to deallocate with free()\n\t *\n\t * @return        score of the alignment\n\t */\n\tint ksw_global(int qlen, const uint8_t *query, int tlen, const uint8_t *target, int m, const int8_t *mat, int gapo, int gape, int w, int *n_cigar, uint32_t **cigar);\n\tint 
ksw_global2(int qlen, const uint8_t *query, int tlen, const uint8_t *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int w, int *n_cigar, uint32_t **cigar);\n\tint ksw_semi_global(int qlen, const char *query, int tlen, const char *target, int m, const int8_t *mat, int gapo, int gape, int w, int *n_cigar, uint32_t **cigar);\n\tint ksw_semi_global2(int qlen, const char *query, int tlen, const char *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int w, int *n_cigar, uint32_t **cigar);\n\tint ksw_semi_global3(int qlen, const char *query, int tlen, const char *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int w, int *n_cigar, uint32_t **cigar, int *mapping_start_position, int *mapping_end_position);\n\n\t/**\n\t * Extend alignment\n\t *\n\t * The routine aligns $query and $target, assuming their upstream sequences,\n\t * which are not provided, have been aligned with score $h0. In return,\n\t * region [0,*qle) on the query and [0,*tle) on the target sequences are\n\t * aligned together. If *gscore>=0, *gscore keeps the best score such that\n\t * the entire query sequence is aligned; *gtle keeps the position on the\n\t * target where *gscore is achieved. 
Returning *gscore and *gtle helps the\n\t * caller to decide whether an end-to-end hit or a partial hit is preferred.\n\t *\n\t * The first 9 parameters are identical to those in ksw_global()\n\t *\n\t * @param h0      alignment score of upstream sequences\n\t * @param _qle    (out) length of the query in the alignment\n\t * @param _tle    (out) length of the target in the alignment\n\t * @param _gtle   (out) length of the target if query is fully aligned\n\t * @param _gscore (out) score of the best end-to-end alignment; negative if not found\n\t *\n\t * @return        best semi-local alignment score\n\t */\n\tint ksw_extend(int qlen, const uint8_t *query, int tlen, const uint8_t *target, int m, const int8_t *mat, int gapo, int gape, int w, int end_bonus, int zdrop, int h0, int *qle, int *tle, int *gtle, int *gscore, int *max_off);\n\tint ksw_extend2(int qlen, const uint8_t *query, int tlen, const uint8_t *target, int m, const int8_t *mat, int o_del, int e_del, int o_ins, int e_ins, int w, int end_bonus, int zdrop, int h0, int *qle, int *tle, int *gtle, int *gscore, int *max_off);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "src/mapping.h",
    "content": "#ifndef MAPPING_H_\n#define MAPPING_H_\n\nnamespace chromap {\n\n// An interface for various mapping formats.\nclass Mapping {\n public:\n  virtual ~Mapping() = default;\n  //// Defines the orders of mappings. Sort by mapping positions first, then\n  //// sorted by barcode and other fields if available. Make sure to consider\n  //// enough field so that the order is always deterministic.\n  // virtual bool operator<(const Mapping &m) const = 0;\n  //// Return true if two mappings are the same.\n  // virtual bool operator==(const Mapping &m) const = 0;\n  // Return true if two mappings are the same. For paired-end mappings, return\n  // true if the mapping intervals are the same. For single-end mappings, return\n  // true if the 5' mapping positions and strands are the same. This is\n  // different from the previous function as this function does not require\n  // barcodes to be the same.\n  // virtual bool IsSamePosition(const Mapping &m) const = 0;\n  // Return true if the mapping strand is positive. For paired-end reads, return\n  // true if the mapping strand of the first read is positive.\n  virtual bool IsPositiveStrand() const = 0;\n  // Barcodes are encoded by 64-bit integers. This function will return the\n  // encoded barcode. 
For a mapping without a barcode, this function returns 0.\n  virtual uint64_t GetBarcode() const = 0;\n  // Return inclusive mapping start position.\n  virtual uint32_t GetStartPosition() const = 0;\n  // Return exclusive mapping end position.\n  virtual uint32_t GetEndPosition() const = 0;\n  //// Return the total byte size of the mapping data field.\n  // virtual uint16_t GetByteSize() const = 0;\n  //// Write this mapping to a temp mapping output file in binary.\n  // virtual size_t WriteToFile(FILE *temp_mapping_output_file) const = 0;\n  //// Load this mapping from a temp mapping output file.\n  // virtual size_t LoadFromFile(FILE *temp_mapping_output_file) = 0;\n  // Perform Tn5 shift, which will change the start and end positions. Note that\n  // currently this can only be done on the mappings that are represented by\n  // intervals.\n  virtual void Tn5Shift() = 0;\n};\n\n}  // namespace chromap\n\n#endif  // MAPPING_H_\n"
  },
  {
    "path": "src/mapping_generator.cc",
    "content": "#include \"mapping_generator.h\"\n\nnamespace chromap {\n\n// For strand, kPositive is 1, kNegative is 0;\ntemplate <>\nvoid MappingGenerator<MappingWithoutBarcode>::EmplaceBackSingleEndMappingRecord(\n    MappingInMemory &mapping_in_memory,\n    std::vector<std::vector<MappingWithoutBarcode>>\n        &mappings_on_diff_ref_seqs) {\n  mappings_on_diff_ref_seqs[mapping_in_memory.rid].emplace_back(\n      mapping_in_memory.read_id, mapping_in_memory.GetFragmentStartPosition(),\n      mapping_in_memory.GetFragmentLength(), mapping_in_memory.mapq,\n      mapping_in_memory.GetStrand(), mapping_in_memory.is_unique,\n      /*num_dups=*/1);\n}\n\ntemplate <>\nvoid MappingGenerator<MappingWithBarcode>::EmplaceBackSingleEndMappingRecord(\n    MappingInMemory &mapping_in_memory,\n    std::vector<std::vector<MappingWithBarcode>> &mappings_on_diff_ref_seqs) {\n  mappings_on_diff_ref_seqs[mapping_in_memory.rid].emplace_back(\n      mapping_in_memory.read_id, mapping_in_memory.barcode_key,\n      mapping_in_memory.GetFragmentStartPosition(),\n      mapping_in_memory.GetFragmentLength(), mapping_in_memory.mapq,\n      mapping_in_memory.GetStrand(), mapping_in_memory.is_unique,\n      /*num_dups=*/1);\n}\n\ntemplate <>\nvoid MappingGenerator<PAFMapping>::EmplaceBackSingleEndMappingRecord(\n    MappingInMemory &mapping_in_memory,\n    std::vector<std::vector<PAFMapping>> &mappings_on_diff_ref_seqs) {\n  mappings_on_diff_ref_seqs[mapping_in_memory.rid].emplace_back(\n      mapping_in_memory.read_id, std::string(mapping_in_memory.read_name),\n      mapping_in_memory.read_length,\n      mapping_in_memory.GetFragmentStartPosition(),\n      mapping_in_memory.GetFragmentLength(), mapping_in_memory.mapq,\n      mapping_in_memory.GetStrand(), mapping_in_memory.is_unique,\n      /*num_dups=*/1);\n}\n\ntemplate <>\nvoid MappingGenerator<SAMMapping>::EmplaceBackSingleEndMappingRecord(\n    MappingInMemory &mapping_in_memory,\n    std::vector<std::vector<SAMMapping>> 
&mappings_on_diff_ref_seqs) {\n  mappings_on_diff_ref_seqs[mapping_in_memory.rid].emplace_back(\n      mapping_in_memory.read_id, std::string(mapping_in_memory.read_name),\n      mapping_in_memory.barcode_key, /*num_dups=*/1,\n      mapping_in_memory.GetFragmentStartPosition(), mapping_in_memory.rid,\n      /*mpos=*/0, /*mrid=*/-1, /*tlen=*/0, \n      mapping_in_memory.SAM_flag, mapping_in_memory.GetStrand(),\n      /*is_alt=*/0, mapping_in_memory.is_unique, mapping_in_memory.mapq,\n      mapping_in_memory.NM, mapping_in_memory.n_cigar, mapping_in_memory.cigar,\n      mapping_in_memory.MD_tag, std::string(mapping_in_memory.read_sequence),\n      std::string(mapping_in_memory.qual_sequence));\n}\n\ntemplate <>\nvoid MappingGenerator<PairedEndMappingWithBarcode>::\n    EmplaceBackSingleEndMappingRecord(\n        MappingInMemory &mapping_in_memory,\n        std::vector<std::vector<PairedEndMappingWithBarcode>>\n            &mappings_on_diff_ref_seqs) = delete;\n\ntemplate <>\nvoid MappingGenerator<PairedEndMappingWithoutBarcode>::\n    EmplaceBackSingleEndMappingRecord(\n        MappingInMemory &mapping_in_memory,\n        std::vector<std::vector<PairedEndMappingWithoutBarcode>>\n            &mappings_on_diff_ref_seqs) = delete;\n\ntemplate <>\nvoid MappingGenerator<PairedPAFMapping>::EmplaceBackSingleEndMappingRecord(\n    MappingInMemory &mapping_in_memory,\n    std::vector<std::vector<PairedPAFMapping>> &mappings_on_diff_ref_seqs) =\n    delete;\n\ntemplate <>\nvoid MappingGenerator<PairsMapping>::EmplaceBackSingleEndMappingRecord(\n    MappingInMemory &mapping_in_memory,\n    std::vector<std::vector<PairsMapping>> &mappings_on_diff_ref_seqs) = delete;\n\ntemplate <>\nvoid MappingGenerator<SAMMapping>::EmplaceBackPairedEndMappingRecord(\n    PairedEndMappingInMemory &paired_end_mapping_in_memory,\n    std::vector<std::vector<SAMMapping>> &mappings_on_diff_ref_seqs) {\n  int tlen = (int)paired_end_mapping_in_memory.GetFragmentLength();\n  for (int i = 0; i < 2; ++i) 
{\n    MappingInMemory &mapping_in_memory = (i == 0 ? paired_end_mapping_in_memory.mapping_in_memory1 :\n        paired_end_mapping_in_memory.mapping_in_memory2);\n    MappingInMemory &mate_mapping_in_memory = (i == 0 ? paired_end_mapping_in_memory.mapping_in_memory2 :\n        paired_end_mapping_in_memory.mapping_in_memory1);\n  \n    mappings_on_diff_ref_seqs[mapping_in_memory.rid].emplace_back(\n      mapping_in_memory.read_id, std::string(mapping_in_memory.read_name),\n      mapping_in_memory.barcode_key, /*num_dups=*/1,\n      mapping_in_memory.GetFragmentStartPosition(), mapping_in_memory.rid,\n      /*mpos=*/mate_mapping_in_memory.GetFragmentStartPosition(), \n      /*mrid=*/mate_mapping_in_memory.rid, \n      /*tlen=*/mapping_in_memory.GetStrand() ? tlen : -tlen, \n      mapping_in_memory.SAM_flag, mapping_in_memory.GetStrand(),\n      /*is_alt=*/0, mapping_in_memory.is_unique, mapping_in_memory.mapq,\n      mapping_in_memory.NM, mapping_in_memory.n_cigar, mapping_in_memory.cigar,\n      mapping_in_memory.MD_tag, std::string(mapping_in_memory.read_sequence),\n      std::string(mapping_in_memory.qual_sequence));\n  }\n}\n\ntemplate <>\nvoid MappingGenerator<PairedEndMappingWithoutBarcode>::\n    EmplaceBackPairedEndMappingRecord(\n        PairedEndMappingInMemory &paired_end_mapping_in_memory,\n        std::vector<std::vector<PairedEndMappingWithoutBarcode>>\n            &mappings_on_diff_ref_seqs) {\n  mappings_on_diff_ref_seqs[paired_end_mapping_in_memory.mapping_in_memory1.rid]\n      .emplace_back(paired_end_mapping_in_memory.GetReadId(),\n                    paired_end_mapping_in_memory.GetFragmentStartPosition(),\n                    paired_end_mapping_in_memory.GetFragmentLength(),\n                    paired_end_mapping_in_memory.mapq,\n                    paired_end_mapping_in_memory.GetStrand(),\n                    paired_end_mapping_in_memory.is_unique, /*num_dups=*/1,\n                    
paired_end_mapping_in_memory.GetPositiveAlignmentLength(),\n                    paired_end_mapping_in_memory.GetNegativeAlignmentLength());\n}\n\ntemplate <>\nvoid MappingGenerator<PairedEndMappingWithBarcode>::\n    EmplaceBackPairedEndMappingRecord(\n        PairedEndMappingInMemory &paired_end_mapping_in_memory,\n        std::vector<std::vector<PairedEndMappingWithBarcode>>\n            &mappings_on_diff_ref_seqs) {\n  mappings_on_diff_ref_seqs[paired_end_mapping_in_memory.mapping_in_memory1.rid]\n      .emplace_back(paired_end_mapping_in_memory.GetReadId(),\n                    paired_end_mapping_in_memory.GetBarcode(),\n                    paired_end_mapping_in_memory.GetFragmentStartPosition(),\n                    paired_end_mapping_in_memory.GetFragmentLength(),\n                    paired_end_mapping_in_memory.mapq,\n                    paired_end_mapping_in_memory.GetStrand(),\n                    paired_end_mapping_in_memory.is_unique, /*num_dups=*/1,\n                    paired_end_mapping_in_memory.GetPositiveAlignmentLength(),\n                    paired_end_mapping_in_memory.GetNegativeAlignmentLength());\n}\n\ntemplate <>\nvoid MappingGenerator<PairedPAFMapping>::EmplaceBackPairedEndMappingRecord(\n    PairedEndMappingInMemory &paired_end_mapping_in_memory,\n    std::vector<std::vector<PairedPAFMapping>> &mappings_on_diff_ref_seqs) {\n  mappings_on_diff_ref_seqs[paired_end_mapping_in_memory.mapping_in_memory1.rid]\n      .emplace_back(\n          paired_end_mapping_in_memory.GetReadId(),\n          std::string(\n              paired_end_mapping_in_memory.mapping_in_memory1.read_name),\n          std::string(\n              paired_end_mapping_in_memory.mapping_in_memory2.read_name),\n          paired_end_mapping_in_memory.mapping_in_memory1.read_length,\n          paired_end_mapping_in_memory.mapping_in_memory2.read_length,\n          paired_end_mapping_in_memory.GetFragmentStartPosition(),\n          
paired_end_mapping_in_memory.GetNegativeAlignmentLength(),\n          paired_end_mapping_in_memory.GetFragmentLength(),\n          paired_end_mapping_in_memory.GetPositiveAlignmentLength(),\n          paired_end_mapping_in_memory.mapq,\n          paired_end_mapping_in_memory.mapping_in_memory1.mapq,\n          paired_end_mapping_in_memory.mapping_in_memory2.mapq,\n          paired_end_mapping_in_memory.GetStrand(),\n          paired_end_mapping_in_memory.is_unique, /*num_dups=*/1);\n}\n\ntemplate <>\nvoid MappingGenerator<PairsMapping>::EmplaceBackPairedEndMappingRecord(\n    PairedEndMappingInMemory &paired_end_mapping_in_memory,\n    std::vector<std::vector<PairsMapping>> &mappings_on_diff_ref_seqs) {\n  uint8_t strand1 = paired_end_mapping_in_memory.mapping_in_memory1.GetStrand();\n  uint8_t strand2 = paired_end_mapping_in_memory.mapping_in_memory2.GetStrand();\n\n  int position1 =\n      paired_end_mapping_in_memory.mapping_in_memory1.ref_start_position;\n  int position2 =\n      paired_end_mapping_in_memory.mapping_in_memory2.ref_start_position;\n\n  if (paired_end_mapping_in_memory.mapping_in_memory1.strand == kNegative) {\n    position1 =\n        paired_end_mapping_in_memory.mapping_in_memory1.ref_end_position;\n  }\n\n  if (paired_end_mapping_in_memory.mapping_in_memory2.strand == kNegative) {\n    position2 =\n        paired_end_mapping_in_memory.mapping_in_memory2.ref_end_position;\n  }\n\n  int rid1 = paired_end_mapping_in_memory.mapping_in_memory1.rid;\n  int rid2 = paired_end_mapping_in_memory.mapping_in_memory2.rid;\n  const int rid1_rank = pairs_custom_rid_rank_[rid1];\n  const int rid2_rank = pairs_custom_rid_rank_[rid2];\n\n  const bool is_rid1_rank_smaller =\n      rid1_rank < rid2_rank || (rid1 == rid2 && position1 < position2);\n  if (!is_rid1_rank_smaller) {\n    std::swap(rid1, rid2);\n    std::swap(position1, position2);\n    std::swap(strand1, strand2);\n  }\n\n  mappings_on_diff_ref_seqs[rid1].emplace_back(\n      
paired_end_mapping_in_memory.GetReadId(),\n      std::string(paired_end_mapping_in_memory.mapping_in_memory1.read_name),\n      paired_end_mapping_in_memory.GetBarcode(), rid1, rid2, position1,\n      position2, strand1, strand2, paired_end_mapping_in_memory.mapq,\n      paired_end_mapping_in_memory.is_unique, /*num_dups=*/1);\n}\n\ntemplate <>\nvoid MappingGenerator<MappingWithBarcode>::EmplaceBackPairedEndMappingRecord(\n    PairedEndMappingInMemory &paired_end_mapping_in_memory,\n    std::vector<std::vector<MappingWithBarcode>> &mappings_on_diff_ref_seqs) =\n    delete;\n\ntemplate <>\nvoid MappingGenerator<MappingWithoutBarcode>::EmplaceBackPairedEndMappingRecord(\n    PairedEndMappingInMemory &paired_end_mapping_in_memory,\n    std::vector<std::vector<MappingWithoutBarcode>>\n        &mappings_on_diff_ref_seqs) = delete;\n\ntemplate <>\nvoid MappingGenerator<PAFMapping>::EmplaceBackPairedEndMappingRecord(\n    PairedEndMappingInMemory &paired_end_mapping_in_memory,\n    std::vector<std::vector<PAFMapping>> &mappings_on_diff_ref_seqs) = delete;\n\n}  // namespace chromap\n"
  },
  {
    "path": "src/mapping_generator.h",
    "content": "#ifndef MAPPING_GENERATOR_H_\n#define MAPPING_GENERATOR_H_\n\n#include <cmath>\n#include <cstdint>\n#include <numeric>\n#include <random>\n#include <tuple>\n#include <vector>\n\n#include \"alignment.h\"\n#include \"bed_mapping.h\"\n#include \"ksw.h\"\n#include \"mapping.h\"\n#include \"mapping_in_memory.h\"\n#include \"mapping_metadata.h\"\n#include \"mapping_parameters.h\"\n#include \"paf_mapping.h\"\n#include \"paired_end_mapping_metadata.h\"\n#include \"pairs_mapping.h\"\n#include \"sam_mapping.h\"\n#include \"sequence_batch.h\"\n#include \"utils.h\"\n\nnamespace chromap {\n\n// Class to process draft mappings and generate best mappings and alignments. It\n// supports multi-threadidng as only the parameters are owned by the class.\ntemplate <typename MappingRecord>\nclass MappingGenerator {\n public:\n  MappingGenerator() = delete;\n\n  MappingGenerator(const MappingParameters &mapping_parameters,\n                   const std::vector<int> &pairs_custom_rid_rank)\n      : mapping_parameters_(mapping_parameters),\n        pairs_custom_rid_rank_(pairs_custom_rid_rank) {}\n\n  ~MappingGenerator() = default;\n\n  void GenerateBestMappingsForSingleEndRead(\n      const SequenceBatch &read_batch, uint32_t read_index,\n      const SequenceBatch &reference, const SequenceBatch &barcode_batch,\n      MappingMetadata &mapping_metadata,\n      std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs);\n\n  // When the number of supplemented candidates is greater than 0, the\n  // force_mapq will be 0, and thereby setting the mapq to 0 (not mapq1 or\n  // mapq2). 
Split alignment won't run candidate supplement.\n  void GenerateBestMappingsForPairedEndRead(\n      uint32_t pair_index, const SequenceBatch &read_batch1,\n      const SequenceBatch &read_batch2, const SequenceBatch &barcode_batch,\n      const SequenceBatch &reference, std::vector<int> &best_mapping_indices,\n      std::mt19937 &generator, int force_mapq,\n      PairedEndMappingMetadata &paired_end_mapping_metadata,\n      std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs);\n\n private:\n  void ProcessBestMappingsForSingleEndRead(\n      const Strand mapping_strand, uint32_t read_index,\n      const SequenceBatch &read_batch, const SequenceBatch &barcode_batch,\n      const SequenceBatch &reference, const MappingMetadata &mapping_metadata,\n      const std::vector<int> &best_mapping_indices, int &best_mapping_index,\n      int &num_best_mappings_reported,\n      std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs);\n\n  void GenerateBestMappingsForPairedEndReadOnOneDirection(\n      const Strand first_read_strand, const Strand second_read_strand,\n      uint32_t pair_index, const SequenceBatch &read_batch1,\n      const SequenceBatch &read_batch2, const SequenceBatch &reference,\n      PairedEndMappingMetadata &paired_end_mapping_metadata);\n\n  void ProcessBestMappingsForPairedEndReadOnOneDirection(\n      const Strand first_read_strand, const Strand second_read_strand,\n      uint32_t pair_index, const SequenceBatch &read_batch1,\n      const SequenceBatch &read_batch2, const SequenceBatch &barcode_batch,\n      const SequenceBatch &reference,\n      const std::vector<int> &best_mapping_indices, int &best_mapping_index,\n      int &num_best_mappings_reported, int force_mapq,\n      const PairedEndMappingMetadata &paired_end_mapping_metadata,\n      std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs);\n\n  void GetRefStartEndPositionForReadFromMapping(\n      const DraftMapping &mapping, const SequenceBatch 
&reference,\n      MappingInMemory &mapping_in_memory);\n\n  // For single-end. It should be fully specialized.\n  void EmplaceBackSingleEndMappingRecord(\n      MappingInMemory &mapping_in_memory,\n      std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs);\n\n  // For paired-end. It should be fully specialized.\n  void EmplaceBackPairedEndMappingRecord(\n      PairedEndMappingInMemory &paired_mapping_in_memory,\n      std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs);\n\n  uint8_t GetMAPQForSingleEndRead(const Strand strand, int num_errors,\n                                  uint16_t alignment_length, int read_length,\n                                  int max_num_error_difference,\n                                  const MappingMetadata &mapping_metadata);\n\n  uint8_t GetMAPQForPairedEndRead(\n      const Strand first_read_strand, const Strand second_read_strand,\n      int read1_num_errors, int read2_num_errors,\n      uint16_t read1_alignment_length, uint16_t read2_alignment_length,\n      int read1_length, int read2_length, int force_mapq,\n      const PairedEndMappingMetadata &paired_end_mapping_metadata,\n      uint8_t &mapq1, uint8_t &mapq2);\n\n  const MappingParameters mapping_parameters_;\n  const std::vector<int> pairs_custom_rid_rank_;\n};\n\ntemplate <typename MappingRecord>\nvoid MappingGenerator<MappingRecord>::GenerateBestMappingsForSingleEndRead(\n    const SequenceBatch &read_batch, uint32_t read_index,\n    const SequenceBatch &reference, const SequenceBatch &barcode_batch,\n    MappingMetadata &mapping_metadata,\n    std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs) {\n  const int num_best_mappings = mapping_metadata.num_best_mappings_;\n\n  // We use reservoir sampling when the number of best mappings exceeds the\n  // threshold.\n  std::vector<int> best_mapping_indices(\n      mapping_parameters_.max_num_best_mappings);\n  std::iota(best_mapping_indices.begin(), best_mapping_indices.end(), 
0);\n  if (num_best_mappings > mapping_parameters_.max_num_best_mappings) {\n    std::mt19937 generator(11);\n    for (int i = mapping_parameters_.max_num_best_mappings;\n         i < num_best_mappings; ++i) {\n      // Important: inclusive range.\n      std::uniform_int_distribution<int> distribution(0, i);\n      int j = distribution(generator);\n      if (j < mapping_parameters_.max_num_best_mappings) {\n        best_mapping_indices[j] = i;\n      }\n    }\n    std::sort(best_mapping_indices.begin(), best_mapping_indices.end());\n  }\n\n  int best_mapping_index = 0;\n  int num_best_mappings_reported = 0;\n  const int num_best_mappings_to_report =\n      std::min(num_best_mappings, mapping_parameters_.max_num_best_mappings);\n\n  ProcessBestMappingsForSingleEndRead(\n      kPositive, read_index, read_batch, barcode_batch, reference,\n      mapping_metadata, best_mapping_indices, best_mapping_index,\n      num_best_mappings_reported, mappings_on_diff_ref_seqs);\n\n  if (num_best_mappings_reported != num_best_mappings_to_report) {\n    ProcessBestMappingsForSingleEndRead(\n        kNegative, read_index, read_batch, barcode_batch, reference,\n        mapping_metadata, best_mapping_indices, best_mapping_index,\n        num_best_mappings_reported, mappings_on_diff_ref_seqs);\n  }\n}\n\ntemplate <typename MappingRecord>\nvoid MappingGenerator<MappingRecord>::GenerateBestMappingsForPairedEndRead(\n    uint32_t pair_index, const SequenceBatch &read_batch1,\n    const SequenceBatch &read_batch2, const SequenceBatch &barcode_batch,\n    const SequenceBatch &reference, std::vector<int> &best_mapping_indices,\n    std::mt19937 &generator, int force_mapq,\n    PairedEndMappingMetadata &paired_end_mapping_metadata,\n    std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs) {\n  paired_end_mapping_metadata.SetMinSumErrors(\n      2 * mapping_parameters_.error_threshold + 1);\n  paired_end_mapping_metadata.SetNumBestMappings(0);\n  
paired_end_mapping_metadata.SetSecondMinSumErrors(\n      2 * mapping_parameters_.error_threshold + 1);\n  paired_end_mapping_metadata.SetNumSecondBestMappings(0);\n\n  GenerateBestMappingsForPairedEndReadOnOneDirection(\n      kPositive, kNegative, pair_index, read_batch1, read_batch2, reference,\n      paired_end_mapping_metadata);\n  GenerateBestMappingsForPairedEndReadOnOneDirection(\n      kNegative, kPositive, pair_index, read_batch1, read_batch2, reference,\n      paired_end_mapping_metadata);\n\n  if (mapping_parameters_.split_alignment) {\n    GenerateBestMappingsForPairedEndReadOnOneDirection(\n        kPositive, kPositive, pair_index, read_batch1, read_batch2, reference,\n        paired_end_mapping_metadata);\n    GenerateBestMappingsForPairedEndReadOnOneDirection(\n        kNegative, kNegative, pair_index, read_batch1, read_batch2, reference,\n        paired_end_mapping_metadata);\n  }\n\n  if (paired_end_mapping_metadata.GetNumBestMappings() >\n      mapping_parameters_.drop_repetitive_reads) {\n    return;\n  }\n\n  // We use reservoir sampling when the number of best mappings exceeds the\n  // threshold.\n  // std::vector<int>\n  // best_mapping_indices(mapping_parameters_.max_num_best_mappings);\n  std::iota(best_mapping_indices.begin(), best_mapping_indices.end(), 0);\n  if (paired_end_mapping_metadata.GetNumBestMappings() >\n      mapping_parameters_.max_num_best_mappings) {\n    // std::mt19937 generator(11);\n    for (int i = mapping_parameters_.max_num_best_mappings;\n         i < paired_end_mapping_metadata.GetNumBestMappings(); ++i) {\n      // Important: inclusive range.\n      std::uniform_int_distribution<int> distribution(0, i);\n      int j = distribution(generator);\n      // int j = distribution(tmp_generator);\n      if (j < mapping_parameters_.max_num_best_mappings) {\n        best_mapping_indices[j] = i;\n      }\n    }\n    std::sort(best_mapping_indices.begin(), best_mapping_indices.end());\n  }\n\n  int best_mapping_index = 0;\n  
int num_best_mappings_reported = 0;\n  const int num_best_mappings_to_report =\n      std::min(mapping_parameters_.max_num_best_mappings,\n               paired_end_mapping_metadata.GetNumBestMappings());\n\n  ProcessBestMappingsForPairedEndReadOnOneDirection(\n      kPositive, kNegative, pair_index, read_batch1, read_batch2, barcode_batch,\n      reference, best_mapping_indices, best_mapping_index,\n      num_best_mappings_reported, force_mapq, paired_end_mapping_metadata,\n      mappings_on_diff_ref_seqs);\n\n  if (num_best_mappings_reported != num_best_mappings_to_report) {\n    ProcessBestMappingsForPairedEndReadOnOneDirection(\n        kNegative, kPositive, pair_index, read_batch1, read_batch2,\n        barcode_batch, reference, best_mapping_indices, best_mapping_index,\n        num_best_mappings_reported, force_mapq, paired_end_mapping_metadata,\n        mappings_on_diff_ref_seqs);\n  }\n\n  if (mapping_parameters_.split_alignment &&\n      num_best_mappings_reported != num_best_mappings_to_report) {\n    ProcessBestMappingsForPairedEndReadOnOneDirection(\n        kPositive, kPositive, pair_index, read_batch1, read_batch2,\n        barcode_batch, reference, best_mapping_indices, best_mapping_index,\n        num_best_mappings_reported, force_mapq, paired_end_mapping_metadata,\n        mappings_on_diff_ref_seqs);\n  }\n\n  if (mapping_parameters_.split_alignment &&\n      num_best_mappings_reported != num_best_mappings_to_report) {\n    ProcessBestMappingsForPairedEndReadOnOneDirection(\n        kNegative, kNegative, pair_index, read_batch1, read_batch2,\n        barcode_batch, reference, best_mapping_indices, best_mapping_index,\n        num_best_mappings_reported, force_mapq, paired_end_mapping_metadata,\n        mappings_on_diff_ref_seqs);\n  }\n}\n\ntemplate <typename MappingRecord>\nvoid MappingGenerator<MappingRecord>::ProcessBestMappingsForSingleEndRead(\n    const Strand mapping_strand, uint32_t read_index,\n    const SequenceBatch &read_batch, const 
SequenceBatch &barcode_batch,\n    const SequenceBatch &reference, const MappingMetadata &mapping_metadata,\n    const std::vector<int> &best_mapping_indices, int &best_mapping_index,\n    int &num_best_mappings_reported,\n    std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs) {\n  const std::vector<DraftMapping> &mappings =\n      mapping_strand == kPositive ? mapping_metadata.positive_mappings_\n                                  : mapping_metadata.negative_mappings_;\n  const std::vector<int> &split_sites =\n      mapping_strand == kPositive ? mapping_metadata.positive_split_sites_\n                                  : mapping_metadata.negative_split_sites_;\n\n  const char *read = read_batch.GetSequenceAt(read_index);\n  const uint32_t read_id = read_batch.GetSequenceIdAt(read_index);\n  const char *read_name = read_batch.GetSequenceNameAt(read_index);\n  const uint32_t read_length = read_batch.GetSequenceLengthAt(read_index);\n  const std::string &negative_read =\n      read_batch.GetNegativeSequenceAt(read_index);\n\n  MappingInMemory mapping_in_memory;\n  mapping_in_memory.read_id = read_id;\n  mapping_in_memory.read_name = read_name;\n  mapping_in_memory.is_unique = (mapping_metadata.num_best_mappings_ == 1);\n\n  uint64_t barcode_key = 0;\n  if (!mapping_parameters_.is_bulk_data) {\n    barcode_key = barcode_batch.GenerateSeedFromSequenceAt(\n        read_index, /*start_position=*/0,\n        barcode_batch.GetSequenceLengthAt(read_index));\n  }\n  mapping_in_memory.barcode_key = barcode_key;\n\n  mapping_in_memory.strand = mapping_strand;\n  mapping_in_memory.read_sequence =\n      mapping_strand == kPositive ? 
read : negative_read.data();\n  mapping_in_memory.read_length = read_length;\n\n  for (uint32_t mi = 0; mi < mappings.size(); ++mi) {\n    if (mappings[mi].GetNumErrors() > mapping_metadata.min_num_errors_) {\n      continue;\n    }\n\n    if (best_mapping_index ==\n        best_mapping_indices[num_best_mappings_reported]) {\n      mapping_in_memory.rid = mappings[mi].GetReferenceSequenceIndex();\n\n      if (mapping_parameters_.split_alignment) {\n        mapping_in_memory.read_split_site = split_sites[mi];\n      }\n\n      GetRefStartEndPositionForReadFromMapping(mappings[mi], reference,\n                                               mapping_in_memory);\n\n      const uint16_t alignment_length = mapping_in_memory.ref_end_position -\n                                        mapping_in_memory.ref_start_position +\n                                        1;\n      const uint8_t mapq = GetMAPQForSingleEndRead(\n          mapping_strand, /*num_errors=*/mappings[mi].GetNumErrors(),\n          alignment_length, read_length,\n          /*max_num_error_difference=*/mapping_parameters_.error_threshold,\n          mapping_metadata);\n      mapping_in_memory.mapq = mapq;\n\n      if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_SAM) {\n        uint16_t flag = mapping_strand == kPositive ? 
0 : BAM_FREVERSE;\n        if (num_best_mappings_reported >= 1) {\n          flag |= BAM_FSECONDARY;\n        }\n        mapping_in_memory.SAM_flag = flag;\n        mapping_in_memory.qual_sequence =\n            read_batch.GetSequenceQualAt(read_index);\n      }\n\n      EmplaceBackSingleEndMappingRecord(mapping_in_memory,\n                                        mappings_on_diff_ref_seqs);\n\n      num_best_mappings_reported++;\n      if (num_best_mappings_reported ==\n          std::min(mapping_parameters_.max_num_best_mappings,\n                   mapping_metadata.num_best_mappings_)) {\n        break;\n      }\n    }\n\n    best_mapping_index++;\n  }\n}\n\ntemplate <typename MappingRecord>\nvoid MappingGenerator<MappingRecord>::\n    GenerateBestMappingsForPairedEndReadOnOneDirection(\n        const Strand first_read_strand, const Strand second_read_strand,\n        uint32_t pair_index, const SequenceBatch &read_batch1,\n        const SequenceBatch &read_batch2, const SequenceBatch &reference,\n        PairedEndMappingMetadata &paired_end_mapping_metadata) {\n  uint32_t i1 = 0;\n  uint32_t i2 = 0;\n  uint32_t min_overlap_length = mapping_parameters_.min_read_length;\n  uint32_t read1_length = read_batch1.GetSequenceLengthAt(pair_index);\n  uint32_t read2_length = read_batch2.GetSequenceLengthAt(pair_index);\n\n  const std::vector<DraftMapping> &mappings1 =\n      first_read_strand == kPositive\n          ? paired_end_mapping_metadata.mapping_metadata1_.positive_mappings_\n          : paired_end_mapping_metadata.mapping_metadata1_.negative_mappings_;\n  const std::vector<DraftMapping> &mappings2 =\n      second_read_strand == kPositive\n          ? 
paired_end_mapping_metadata.mapping_metadata2_.positive_mappings_\n          : paired_end_mapping_metadata.mapping_metadata2_.negative_mappings_;\n\n  std::vector<std::pair<uint32_t, uint32_t>> &best_mappings =\n      paired_end_mapping_metadata.GetBestMappings(first_read_strand,\n                                                  second_read_strand);\n  int &min_sum_errors = paired_end_mapping_metadata.min_sum_errors_;\n  int &num_best_mappings = paired_end_mapping_metadata.num_best_mappings_;\n  int &second_min_sum_errors =\n      paired_end_mapping_metadata.second_min_sum_errors_;\n  int &num_second_best_mappings =\n      paired_end_mapping_metadata.num_second_best_mappings_;\n\n#ifdef LI_DEBUG\n  for (int i = 0; i < mappings1.size(); ++i)\n    printf(\"mappings1 %d %d:%d\\n\", i,\n           (int)(mappings1[i].GetReferenceSequenceIndex()),\n           (int)mappings1[i].GetReferenceSequencePosition());\n  for (int i = 0; i < mappings2.size(); ++i)\n    printf(\"mappings2 %d %d:%d\\n\", i,\n           (int)(mappings2[i].GetReferenceSequenceIndex()),\n           (int)mappings2[i].GetReferenceSequencePosition());\n#endif\n\n  if (mapping_parameters_.split_alignment) {\n    if (mappings1.size() == 0 || mappings2.size() == 0) {\n      return;\n    }\n    // For split alignment, selecting the pairs whose both single-end are the\n    // best.\n    for (i1 = 0; i1 < mappings1.size(); ++i1) {\n      if (mappings1[i1].GetNumErrors() !=\n          paired_end_mapping_metadata.mapping_metadata1_.min_num_errors_) {\n        continue;\n      }\n      for (i2 = 0; i2 < mappings2.size(); ++i2) {\n        if (mappings2[i2].GetNumErrors() !=\n            paired_end_mapping_metadata.mapping_metadata2_.min_num_errors_) {\n          continue;\n        }\n        best_mappings.emplace_back(i1, i2);\n        min_sum_errors =\n            paired_end_mapping_metadata.mapping_metadata1_.min_num_errors_ +\n            paired_end_mapping_metadata.mapping_metadata2_.min_num_errors_;\n        
//*second_min_sum_errors = min_num_errors1 + min_num_errors2 + 1;\n        num_best_mappings++;\n      }\n    }\n\n    return;\n  }\n\n  while (i1 < mappings1.size() && i2 < mappings2.size()) {\n    if ((first_read_strand == kNegative &&\n         mappings1[i1].position > mappings2[i2].position +\n                                      mapping_parameters_.max_insert_size -\n                                      read2_length) ||\n        (first_read_strand == kPositive &&\n         mappings1[i1].position >\n             mappings2[i2].position + read1_length - min_overlap_length)) {\n      ++i2;\n    } else if ((first_read_strand == kPositive &&\n                mappings2[i2].position >\n                    mappings1[i1].position +\n                        mapping_parameters_.max_insert_size - read1_length) ||\n               (first_read_strand == kNegative &&\n                mappings2[i2].position > mappings1[i1].position + read2_length -\n                                             min_overlap_length)) {\n      ++i1;\n    } else {\n      // Found a pair. Save the current i2 in current_i2 and keep scanning\n      // mappings2 until it goes out of range; then return to the saved i2,\n      // advance i1, and repeat.\n      uint32_t current_i2 = i2;\n      while (\n          current_i2 < mappings2.size() &&\n          ((first_read_strand == kPositive &&\n            mappings2[current_i2].position <=\n                mappings1[i1].position + mapping_parameters_.max_insert_size -\n                    read1_length) ||\n           (first_read_strand == kNegative &&\n            mappings2[current_i2].position <=\n                mappings1[i1].position + read2_length - min_overlap_length))) {\n#ifdef LI_DEBUG\n        printf(\n            \"%s passed: %llu %d %d %llu %d %d: %d %d %d\\n\", __func__,\n            mappings1[i1].GetReferenceSequenceIndex(),\n            int(mappings1[i1].GetReferenceSequencePosition()),\n\t\t\t\t\t\tfirst_read_strand,\n           
 mappings2[current_i2].GetReferenceSequenceIndex(),\n            int(mappings2[current_i2].GetReferenceSequencePosition()),\n\t\t\t\t\t\tsecond_read_strand,\n            mappings1[i1].GetNumErrors() + mappings2[current_i2].GetNumErrors(),\n            mappings1[i1].GetNumErrors(), mappings2[current_i2].GetNumErrors());\n#endif\n\n        int current_sum_errors =\n            mappings1[i1].GetNumErrors() + mappings2[current_i2].GetNumErrors();\n        if (current_sum_errors < min_sum_errors) {\n          second_min_sum_errors = min_sum_errors;\n          num_second_best_mappings = num_best_mappings;\n          min_sum_errors = current_sum_errors;\n          num_best_mappings = 1;\n          best_mappings.clear();\n          best_mappings.emplace_back(i1, current_i2);\n        } else if (current_sum_errors == min_sum_errors) {\n          num_best_mappings++;\n          best_mappings.emplace_back(i1, current_i2);\n        } else if (current_sum_errors == second_min_sum_errors) {\n          num_second_best_mappings++;\n        } else if (current_sum_errors < second_min_sum_errors) {\n          second_min_sum_errors = current_sum_errors;\n          num_second_best_mappings = 1;\n        }\n        ++current_i2;\n      }\n      ++i1;\n    }\n  }\n}\n\ntemplate <typename MappingRecord>\nvoid MappingGenerator<MappingRecord>::\n    ProcessBestMappingsForPairedEndReadOnOneDirection(\n        const Strand first_read_strand, const Strand second_read_strand,\n        uint32_t pair_index, const SequenceBatch &read_batch1,\n        const SequenceBatch &read_batch2, const SequenceBatch &barcode_batch,\n        const SequenceBatch &reference,\n        const std::vector<int> &best_mapping_indices, int &best_mapping_index,\n        int &num_best_mappings_reported, int force_mapq,\n        const PairedEndMappingMetadata &paired_end_mapping_metadata,\n        std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs) {\n  PairedEndMappingInMemory 
paired_end_mapping_in_memory;\n\n  paired_end_mapping_in_memory.mapping_in_memory1.strand = first_read_strand;\n  paired_end_mapping_in_memory.mapping_in_memory2.strand = second_read_strand;\n\n  const char *read1 = read_batch1.GetSequenceAt(pair_index);\n  const char *read2 = read_batch2.GetSequenceAt(pair_index);\n  const uint32_t read1_length = read_batch1.GetSequenceLengthAt(pair_index);\n  const uint32_t read2_length = read_batch2.GetSequenceLengthAt(pair_index);\n  const char *read1_name = read_batch1.GetSequenceNameAt(pair_index);\n  const char *read2_name = read_batch2.GetSequenceNameAt(pair_index);\n  const std::string &negative_read1 =\n      read_batch1.GetNegativeSequenceAt(pair_index);\n  const std::string &negative_read2 =\n      read_batch2.GetNegativeSequenceAt(pair_index);\n  const uint32_t read_id = read_batch1.GetSequenceIdAt(pair_index);\n\n  paired_end_mapping_in_memory.mapping_in_memory1.read_id = read_id;\n  paired_end_mapping_in_memory.mapping_in_memory2.read_id = read_id;\n\n  paired_end_mapping_in_memory.mapping_in_memory1.read_name = read1_name;\n  paired_end_mapping_in_memory.mapping_in_memory2.read_name = read2_name;\n\n  paired_end_mapping_in_memory.mapping_in_memory1.read_length = read1_length;\n  paired_end_mapping_in_memory.mapping_in_memory2.read_length = read2_length;\n\n  const MappingMetadata &mapping_metadata1 =\n      paired_end_mapping_metadata.mapping_metadata1_;\n  const MappingMetadata &mapping_metadata2 =\n      paired_end_mapping_metadata.mapping_metadata2_;\n\n  const std::vector<DraftMapping> &mappings1 =\n      first_read_strand == kPositive ? mapping_metadata1.positive_mappings_\n                                     : mapping_metadata1.negative_mappings_;\n  const std::vector<DraftMapping> &mappings2 =\n      second_read_strand == kPositive ? 
mapping_metadata2.positive_mappings_\n                                      : mapping_metadata2.negative_mappings_;\n\n  const std::vector<int> &split_sites1 =\n      first_read_strand == kPositive ? mapping_metadata1.positive_split_sites_\n                                     : mapping_metadata1.negative_split_sites_;\n  const std::vector<int> &split_sites2 =\n      second_read_strand == kPositive ? mapping_metadata2.positive_split_sites_\n                                      : mapping_metadata2.negative_split_sites_;\n\n  const std::vector<std::pair<uint32_t, uint32_t>> &best_mappings =\n      paired_end_mapping_metadata.GetBestMappings(first_read_strand,\n                                                  second_read_strand);\n\n  const uint8_t is_unique =\n      (paired_end_mapping_metadata.num_best_mappings_ == 1 ||\n       mapping_metadata1.num_best_mappings_ == 1 ||\n       mapping_metadata2.num_best_mappings_ == 1)\n          ? 1\n          : 0;\n  paired_end_mapping_in_memory.is_unique = is_unique;\n\n  uint64_t barcode_key = 0;\n  if (!mapping_parameters_.is_bulk_data) {\n    barcode_key = barcode_batch.GenerateSeedFromSequenceAt(\n        pair_index, /*start_position=*/0,\n        barcode_batch.GetSequenceLengthAt(pair_index));\n  }\n  paired_end_mapping_in_memory.mapping_in_memory1.barcode_key = barcode_key;\n  paired_end_mapping_in_memory.mapping_in_memory2.barcode_key = barcode_key;\n\n  for (uint32_t mi = 0; mi < best_mappings.size(); ++mi) {\n    const uint32_t i1 = best_mappings[mi].first;\n    const uint32_t i2 = best_mappings[mi].second;\n    const int current_sum_errors =\n        mappings1[i1].GetNumErrors() + mappings2[i2].GetNumErrors();\n\n    if (current_sum_errors > paired_end_mapping_metadata.min_sum_errors_) {\n      continue;\n    }\n\n    if (best_mapping_index ==\n        best_mapping_indices[num_best_mappings_reported]) {\n      const uint32_t rid1 = mappings1[i1].GetReferenceSequenceIndex();\n      const uint32_t rid2 = 
mappings2[i2].GetReferenceSequenceIndex();\n\n      paired_end_mapping_in_memory.mapping_in_memory1.rid = rid1;\n      paired_end_mapping_in_memory.mapping_in_memory2.rid = rid2;\n\n      paired_end_mapping_in_memory.mapping_in_memory1.read_sequence =\n          first_read_strand == kPositive ? read1 : negative_read1.data();\n      paired_end_mapping_in_memory.mapping_in_memory2.read_sequence =\n          second_read_strand == kPositive ? read2 : negative_read2.data();\n\n      if (mapping_parameters_.split_alignment) {\n        paired_end_mapping_in_memory.mapping_in_memory1.read_split_site =\n            split_sites1[i1];\n        paired_end_mapping_in_memory.mapping_in_memory2.read_split_site =\n            split_sites2[i2];\n      }\n\n      GetRefStartEndPositionForReadFromMapping(\n          mappings1[i1], reference,\n          paired_end_mapping_in_memory.mapping_in_memory1);\n      GetRefStartEndPositionForReadFromMapping(\n          mappings2[i2], reference,\n          paired_end_mapping_in_memory.mapping_in_memory2);\n      uint8_t mapq1 = 0;\n      uint8_t mapq2 = 0;\n      const uint8_t mapq = GetMAPQForPairedEndRead(\n          first_read_strand, second_read_strand,\n          /*read1_num_errors=*/mappings1[i1].GetNumErrors(),\n          /*read2_num_errors=*/mappings2[i2].GetNumErrors(),\n          paired_end_mapping_in_memory.mapping_in_memory1.GetFragmentLength(),\n          paired_end_mapping_in_memory.mapping_in_memory2.GetFragmentLength(),\n          read1_length, read2_length, force_mapq, paired_end_mapping_metadata,\n          mapq1, mapq2);\n      paired_end_mapping_in_memory.mapq = mapq;\n      paired_end_mapping_in_memory.mapping_in_memory1.mapq = mapq;\n      paired_end_mapping_in_memory.mapping_in_memory2.mapq = mapq;\n\n      if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_SAM) {\n        uint16_t flag1 = 3;\n        uint16_t flag2 = 3;\n        if (first_read_strand == kNegative) {\n          flag1 |= BAM_FREVERSE;\n        
  flag2 |= BAM_FMREVERSE;\n        }\n        if (second_read_strand == kNegative) {\n          flag1 |= BAM_FMREVERSE;\n          flag2 |= BAM_FREVERSE;\n        }\n        flag1 |= BAM_FREAD1;\n        flag2 |= BAM_FREAD2;\n        if (num_best_mappings_reported >= 1) {\n          flag1 |= BAM_FSECONDARY;\n          flag2 |= BAM_FSECONDARY;\n        }\n        paired_end_mapping_in_memory.mapping_in_memory1.SAM_flag = flag1;\n        paired_end_mapping_in_memory.mapping_in_memory2.SAM_flag = flag2;\n        paired_end_mapping_in_memory.mapping_in_memory1.qual_sequence =\n            read_batch1.GetSequenceQualAt(pair_index);\n        paired_end_mapping_in_memory.mapping_in_memory2.qual_sequence =\n            read_batch2.GetSequenceQualAt(pair_index);\n        paired_end_mapping_in_memory.mapping_in_memory1.is_unique = is_unique;\n        paired_end_mapping_in_memory.mapping_in_memory2.is_unique = is_unique;\n      }\n\n      EmplaceBackPairedEndMappingRecord(paired_end_mapping_in_memory,\n                                        mappings_on_diff_ref_seqs);\n\n      num_best_mappings_reported++;\n      if (num_best_mappings_reported ==\n          std::min(mapping_parameters_.max_num_best_mappings,\n                   paired_end_mapping_metadata.num_best_mappings_)) {\n        break;\n      }\n    }\n\n    best_mapping_index++;\n  }\n}\n\n// The computed ref start and end coordinates are left closed and right closed.\ntemplate <typename MappingRecord>\nvoid MappingGenerator<MappingRecord>::GetRefStartEndPositionForReadFromMapping(\n    const DraftMapping &mapping, const SequenceBatch &reference,\n    MappingInMemory &mapping_in_memory) {\n  // For now this mat is only used by ksw to generate mappings in SAM format.\n  int8_t mat[25];\n  // if (output_mapping_in_SAM_) {\n  int i, j, k;\n  for (i = k = 0; i < 4; ++i) {\n    for (j = 0; j < 4; ++j)\n      mat[k++] = i == j ? 
mapping_parameters_.match_score\n                        : -mapping_parameters_.mismatch_penalty;\n    mat[k++] = 0;  // ambiguous base\n  }\n  for (j = 0; j < 5; ++j) mat[k++] = 0;\n  //}\n\n  const uint32_t rid = mapping.GetReferenceSequenceIndex();\n  const uint32_t ref_position = mapping.GetReferenceSequencePosition();\n\n  const int full_read_length = mapping_in_memory.read_length;\n  int read_length = mapping_in_memory.read_length;\n\n  const int min_num_errors = mapping.GetNumErrors();\n\n  int split_site =\n      mapping_in_memory.strand == kPositive ? 0 : mapping_in_memory.read_length;\n\n  int gap_beginning = 0;\n  int actual_num_errors = 0;\n\n  if (mapping_parameters_.split_alignment) {\n    split_site = mapping_in_memory.read_split_site & 0xffff;\n    // Beginning means the 5' end of the read.\n    gap_beginning = (mapping_in_memory.read_split_site >> 16) & 0xff;\n    // In split alignment, -num_errors is the number of matches.\n    actual_num_errors = (mapping_in_memory.read_split_site >> 24) & 0xff;\n    read_length = split_site - gap_beginning;\n  }\n\n  uint32_t verification_window_start_position =\n      ref_position + 1 >\n              (uint32_t)(read_length + mapping_parameters_.error_threshold)\n          ? 
ref_position + 1 - read_length - mapping_parameters_.error_threshold\n          : 0;\n\n  if (ref_position + mapping_parameters_.error_threshold >=\n      reference.GetSequenceLengthAt(rid)) {\n    verification_window_start_position = reference.GetSequenceLengthAt(rid) -\n                                         mapping_parameters_.error_threshold -\n                                         read_length;\n  }\n\n  if (verification_window_start_position < 0) {\n    verification_window_start_position = 0;\n  }\n\n  if (mapping_parameters_.split_alignment) {\n    if (split_site < full_read_length &&\n        mapping_parameters_.mapping_output_format == MAPPINGFORMAT_SAM &&\n        split_site > 3 * mapping_parameters_.error_threshold) {\n      split_site -= 3 * mapping_parameters_.error_threshold;\n    }\n    read_length = split_site - gap_beginning;\n  }\n\n  if (mapping_in_memory.strand == kPositive) {\n    if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_SAM) {\n      mapping_in_memory.n_cigar = 0;\n\n      int mapping_start_position = 0;\n      int mapping_end_position = 0;\n\n      ksw_semi_global3(\n          read_length + 2 * mapping_parameters_.error_threshold,\n          reference.GetSequenceAt(rid) + verification_window_start_position,\n          read_length, mapping_in_memory.read_sequence + gap_beginning, 5, mat,\n          mapping_parameters_.gap_open_penalties[0],\n          mapping_parameters_.gap_extension_penalties[0],\n          mapping_parameters_.gap_open_penalties[1],\n          mapping_parameters_.gap_extension_penalties[1],\n          mapping_parameters_.error_threshold * 2 + 1,\n          &(mapping_in_memory.n_cigar), &(mapping_in_memory.cigar),\n          &mapping_start_position, &mapping_end_position);\n\n      if (gap_beginning > 0) {\n        int new_ref_start_position = AdjustGapBeginning(\n            mapping_in_memory.strand, reference.GetSequenceAt(rid),\n            mapping_in_memory.read_sequence, &gap_beginning, 
read_length - 1,\n            verification_window_start_position + mapping_start_position,\n            verification_window_start_position + mapping_end_position - 1,\n            &(mapping_in_memory.n_cigar), &(mapping_in_memory.cigar));\n        mapping_start_position =\n            new_ref_start_position - verification_window_start_position;\n      }\n\n      GenerateNMAndMDTag(\n          reference.GetSequenceAt(rid),\n          mapping_in_memory.read_sequence + gap_beginning,\n          verification_window_start_position + mapping_start_position,\n          mapping_in_memory);\n\n      mapping_in_memory.ref_start_position =\n          verification_window_start_position + mapping_start_position;\n      mapping_in_memory.ref_end_position =\n          verification_window_start_position + mapping_end_position - 1;\n    } else {\n      int mapping_start_position = 0;\n      if (!mapping_parameters_.split_alignment) {\n        BandedTraceback(\n            mapping_parameters_.error_threshold, min_num_errors,\n            reference.GetSequenceAt(rid) + verification_window_start_position,\n            mapping_in_memory.read_sequence, read_length,\n            &mapping_start_position);\n      } else {\n        BandedTraceback(\n            mapping_parameters_.error_threshold, actual_num_errors,\n            reference.GetSequenceAt(rid) + verification_window_start_position,\n            mapping_in_memory.read_sequence + gap_beginning, read_length,\n            &mapping_start_position);\n      }\n\n      if (gap_beginning > 0) {\n        int new_ref_start_position = AdjustGapBeginning(\n            mapping_in_memory.strand, reference.GetSequenceAt(rid),\n            mapping_in_memory.read_sequence, &gap_beginning, read_length - 1,\n            verification_window_start_position + mapping_start_position,\n            ref_position, &(mapping_in_memory.n_cigar),\n            &(mapping_in_memory.cigar));\n\n        mapping_start_position =\n            new_ref_start_position 
- verification_window_start_position;\n      }\n\n      mapping_in_memory.ref_start_position =\n          verification_window_start_position + mapping_start_position;\n      mapping_in_memory.ref_end_position = ref_position;\n    }\n\n    return;\n  }\n\n  //  reversed read looks like:\n  //\n  //      veri_start_pos       ref_position\n  //  ref   --|-------------------|------------------->\n  //  read     <-|--read_length---|--gap_beginning--\n  //          split_site\n  //\n\n  const int read_start_site = full_read_length - split_site;\n  if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_SAM) {\n    mapping_in_memory.n_cigar = 0;\n\n    int mapping_start_position = 0;\n    int mapping_end_position = 0;\n\n    ksw_semi_global3(read_length + 2 * mapping_parameters_.error_threshold,\n                     reference.GetSequenceAt(rid) +\n                         verification_window_start_position + read_start_site,\n                     read_length,\n                     mapping_in_memory.read_sequence + read_start_site, 5, mat,\n                     mapping_parameters_.gap_open_penalties[0],\n                     mapping_parameters_.gap_extension_penalties[0],\n                     mapping_parameters_.gap_open_penalties[1],\n                     mapping_parameters_.gap_extension_penalties[1],\n                     mapping_parameters_.error_threshold * 2 + 1,\n                     &(mapping_in_memory.n_cigar), &(mapping_in_memory.cigar),\n                     &mapping_start_position, &mapping_end_position);\n\n    if (gap_beginning > 0) {\n      int new_ref_end_position = AdjustGapBeginning(\n          mapping_in_memory.strand, reference.GetSequenceAt(rid),\n          mapping_in_memory.read_sequence + read_start_site, &gap_beginning,\n          read_length - 1,\n          verification_window_start_position + mapping_start_position,\n          verification_window_start_position + mapping_end_position - 1,\n          &(mapping_in_memory.n_cigar), 
&(mapping_in_memory.cigar));\n\n      // The returned position is right-closed, so need to plus one to match\n      // bed convention\n      mapping_end_position = new_ref_end_position + 1 -\n                             verification_window_start_position -\n                             read_start_site;\n      read_length = split_site - gap_beginning;\n    }\n\n    GenerateNMAndMDTag(reference.GetSequenceAt(rid),\n                       mapping_in_memory.read_sequence + read_start_site,\n                       verification_window_start_position + read_start_site +\n                           mapping_start_position,\n                       mapping_in_memory);\n\n    mapping_in_memory.ref_start_position = verification_window_start_position +\n                                           read_start_site +\n                                           mapping_start_position;\n    mapping_in_memory.ref_end_position = verification_window_start_position +\n                                         read_start_site +\n                                         mapping_end_position - 1;\n  } else {\n    int mapping_start_position = mapping_parameters_.error_threshold;\n    int mapping_end_position =\n        ref_position - verification_window_start_position + 1;\n    // int n_cigar = 0;\n    // uint32_t *cigar;\n    // ksw_semi_global3(read_length + 2 * mapping_parameters_.error_threshold,\n    // reference.GetSequenceAt(rid) + verification_window_start_position,\n    // read_length, negative_read.data() + split_sites[mi], 5, mat,\n    // mapping_parameters_.gap_open_penalties[0],\n    // mapping_parameters_.gap_extension_penalties[0],\n    // mapping_parameters_.gap_open_penalties[1],\n    // mapping_parameters_.gap_extension_penalties[1],\n    // mapping_parameters_.error_threshold * 2 + 1, &n_cigar, &cigar,\n    // &mapping_start_position, &mapping_end_position); mapq =\n    // GetMAPQForSingleEndRead(mapping_parameters_.error_threshold, 0, 0,\n    // mapping_end_position - 
mapping_start_position + 1, min_num_errors,\n    // num_best_mappings, second_min_num_errors, num_second_best_mappings);\n    // uint32_t fragment_start_position = verification_window_start_position +\n    // mapping_start_position; uint16_t fragment_length = mapping_end_position\n    // - mapping_start_position + 1;\n    if (!mapping_parameters_.split_alignment) {\n      BandedTraceback(\n          mapping_parameters_.error_threshold, min_num_errors,\n          reference.GetSequenceAt(rid) + verification_window_start_position,\n          mapping_in_memory.read_sequence + read_start_site, read_length,\n          &mapping_start_position);\n    } else {\n      // BandedTracebackToEnd(mapping_parameters_.error_threshold,actual_num_errors,\n      // reference.GetSequenceAt(rid)\n      // + verification_window_start_position, read + read_start_site,\n      // read_length, &mapping_end_position);\n      BandedAlignPatternToText(\n          mapping_parameters_.error_threshold,\n          reference.GetSequenceAt(rid) + verification_window_start_position,\n          mapping_in_memory.read_sequence + read_start_site, read_length,\n          &mapping_end_position);\n      // seems banded align's mapping end position is included?\n      mapping_end_position += 1;\n    }\n\n    if (gap_beginning > 0) {\n      int new_ref_end_position = AdjustGapBeginning(\n          mapping_in_memory.strand, reference.GetSequenceAt(rid),\n          mapping_in_memory.read_sequence + read_start_site, &gap_beginning,\n          read_length - 1,\n          verification_window_start_position + mapping_start_position,\n          verification_window_start_position + mapping_end_position,\n          &(mapping_in_memory.n_cigar), &(mapping_in_memory.cigar));\n\n      // The returned position is right-closed, so need to plus one to match\n      // bed convention.\n      mapping_end_position =\n          new_ref_end_position - verification_window_start_position + 1;\n      read_length = split_site - 
gap_beginning;\n    }\n\n    mapping_in_memory.ref_start_position =\n        verification_window_start_position + mapping_start_position;\n    mapping_in_memory.ref_end_position =\n        verification_window_start_position + mapping_end_position - 1;\n  }\n}\n\ntemplate <typename MappingRecord>\nuint8_t MappingGenerator<MappingRecord>::GetMAPQForSingleEndRead(\n    const Strand strand, int num_errors, uint16_t alignment_length,\n    int read_length, int max_num_error_difference,\n    const MappingMetadata &mapping_metadata) {\n  int mapq_coef_length = 50;\n  int mapq_coef_fraction = log(mapq_coef_length);\n\n  if (!mapping_parameters_.split_alignment) {\n    alignment_length =\n        alignment_length > read_length ? alignment_length : read_length;\n  }\n\n  double alignment_identity = 1 - (double)num_errors / alignment_length;\n\n  if (mapping_parameters_.split_alignment) {\n    alignment_identity = (double)(-num_errors) / alignment_length;\n    if (alignment_identity > 1) alignment_identity = 1;\n  }\n\n  int mapq = 0;\n  int second_min_num_errors = mapping_metadata.second_min_num_errors_;\n\n  if (mapping_metadata.num_best_mappings_ > 1) {\n    // mapq = -4.343 * log(1 - 1.0 / num_best_mappings);\n    // if (num_best_mappings == 2) {\n    //  mapq = 3;\n    //} else if (num_best_mappings == 3) {\n    //  mapq = 2;\n    //} else if (num_best_mappings < 10) {\n    //  mapq = 1;\n    //} else {\n    //  mapq = 0;\n    //}\n  } else {\n    if (second_min_num_errors > num_errors + max_num_error_difference) {\n      second_min_num_errors = num_errors + max_num_error_difference;\n    }\n\n    double tmp = alignment_length < mapq_coef_length\n                     ? 
1.0\n                     : mapq_coef_fraction / log(alignment_length);\n    tmp *= alignment_identity * alignment_identity;\n    mapq = 5 * 6.02 * (second_min_num_errors - num_errors) * tmp * tmp + 0.499;\n  }\n\n  if (mapping_metadata.num_second_best_mappings_ > 0) {\n    mapq -= (int)(4.343 * log(mapping_metadata.num_second_best_mappings_ + 1) +\n                  0.499);\n  }\n\n  if (mapq > 60) {\n    mapq = 60;\n  }\n  if (mapq < 0) {\n    mapq = 0;\n  }\n\n  if (mapping_metadata.repetitive_seed_length_ > 0) {\n    double frac_rep =\n        (mapping_metadata.repetitive_seed_length_) / (double)read_length;\n    if (mapping_metadata.repetitive_seed_length_ >= (uint32_t)read_length) {\n      frac_rep = 0.999;\n    }\n    if (alignment_identity <= 0.95) {\n      mapq = mapq * (1 - sqrt(frac_rep)) + 0.499;\n    } else if (alignment_identity <= 0.97) {\n      mapq = mapq * (1 - frac_rep) + 0.499;\n    } else if (alignment_identity >= 0.999) {\n      mapq = mapq * (1 - frac_rep * frac_rep * frac_rep * frac_rep) + 0.499;\n    } else {\n      mapq = mapq * (1 - frac_rep * frac_rep) + 0.499;\n    }\n  }\n\n  if (mapping_parameters_.split_alignment &&\n      alignment_length < read_length - mapping_parameters_.error_threshold &&\n      second_min_num_errors != num_errors) {\n    if (mapping_metadata.repetitive_seed_length_ >= alignment_length &&\n        mapping_metadata.repetitive_seed_length_ < (uint32_t)read_length &&\n        alignment_length < read_length / 3) {\n      mapq = 0;\n    }\n    const int diff = second_min_num_errors - num_errors;\n    const uint32_t num_candidates =\n        strand == kPositive ? 
mapping_metadata.positive_candidates_.size()\n                            : mapping_metadata.negative_candidates_.size();\n    if (second_min_num_errors - num_errors <=\n            mapping_parameters_.error_threshold * 3 / 4 &&\n        num_candidates >= 5) {\n      mapq -= (num_candidates / 5 / diff);\n    }\n    if (mapq < 0) {\n      mapq = 0;\n    }\n    if (mapping_metadata.num_second_best_mappings_ > 0 &&\n        second_min_num_errors - num_errors <=\n            mapping_parameters_.error_threshold * 3 / 4) {\n      mapq /= (mapping_metadata.num_second_best_mappings_ / diff + 1);\n    }\n  }\n\n  return (uint8_t)mapq;\n}\n\n#define raw_mapq(diff, a) ((int)(5 * 6.02 * (diff) / (a) + .499))\n\ntemplate <typename MappingRecord>\nuint8_t MappingGenerator<MappingRecord>::GetMAPQForPairedEndRead(\n    const Strand first_read_strand, const Strand second_read_strand,\n    int read1_num_errors, int read2_num_errors, uint16_t read1_alignment_length,\n    uint16_t read2_alignment_length, int read1_length, int read2_length,\n    int force_mapq, const PairedEndMappingMetadata &paired_end_mapping_metadata,\n    uint8_t &mapq1, uint8_t &mapq2) {\n  const MappingMetadata &mapping_metadata1 =\n      paired_end_mapping_metadata.mapping_metadata1_;\n  const MappingMetadata &mapping_metadata2 =\n      paired_end_mapping_metadata.mapping_metadata2_;\n\n#ifdef CHROMAP_DEBUG\n  std::cerr\n      << \" rl1:\"\n      << (int)paired_end_mapping_metadata.mapping_metadata1_\n             .repetitive_seed_length_\n      << \" rl2:\"\n      << (int)paired_end_mapping_metadata.mapping_metadata2_\n             .repetitive_seed_length_\n      << \" pal:\" << (int)read1_alignment_length\n      << \" nal:\" << (int)read2_alignment_length\n      << \" me:\" << paired_end_mapping_metadata.min_sum_errors_\n      << \" #bm:\" << paired_end_mapping_metadata.num_best_mappings_\n      << \" sme:\" << paired_end_mapping_metadata.second_min_sum_errors_\n      << \" #sbm:\" << 
paired_end_mapping_metadata.num_second_best_mappings_\n      << \" ne1:\" << read1_num_errors << \" ne2:\" << read2_num_errors << \" me1:\"\n      << paired_end_mapping_metadata.mapping_metadata1_.min_num_errors_\n      << \" me2:\"\n      << paired_end_mapping_metadata.mapping_metadata2_.min_num_errors_\n      << \" #bm1:\"\n      << paired_end_mapping_metadata.mapping_metadata1_.num_best_mappings_\n      << \" #bm2:\"\n      << paired_end_mapping_metadata.mapping_metadata2_.num_best_mappings_\n      << \" sme1:\"\n      << paired_end_mapping_metadata.mapping_metadata1_.second_min_num_errors_\n      << \" sme2:\"\n      << paired_end_mapping_metadata.mapping_metadata2_.second_min_num_errors_\n      << \" #sbm1:\"\n      << paired_end_mapping_metadata.mapping_metadata1_\n             .num_second_best_mappings_\n      << \" #sbm2:\"\n      << paired_end_mapping_metadata.mapping_metadata2_\n             .num_second_best_mappings_\n      << \"\\n\";\n#endif\n\n  uint8_t mapq_pe = 0;\n  int min_num_unpaired_sum_errors =\n      mapping_metadata1.min_num_errors_ + mapping_metadata2.min_num_errors_ + 3;\n\n  if (paired_end_mapping_metadata.num_best_mappings_ <= 1) {\n    int adjusted_second_min_sum_errors =\n        paired_end_mapping_metadata.second_min_sum_errors_ <\n                min_num_unpaired_sum_errors\n            ? 
paired_end_mapping_metadata.second_min_sum_errors_\n            : min_num_unpaired_sum_errors;\n\n    mapq_pe = raw_mapq(adjusted_second_min_sum_errors -\n                           paired_end_mapping_metadata.min_sum_errors_,\n                       1);\n\n#ifdef CHROMAP_DEBUG\n    std::cerr << \"mapqpe: \" << (int)mapq_pe << \"\\n\";\n#endif\n\n    if (paired_end_mapping_metadata.num_second_best_mappings_ > 0) {\n      mapq_pe -=\n          (int)(4.343 *\n                    log(paired_end_mapping_metadata.num_second_best_mappings_ +\n                        1) +\n                0.499);\n    }\n\n    if (mapq_pe > 60) {\n      mapq_pe = 60;\n    }\n    if (mapq_pe < 0) {\n      mapq_pe = 0;\n    }\n\n#ifdef CHROMAP_DEBUG\n    std::cerr << \"mapqpe: \" << (int)mapq_pe << \"\\n\";\n#endif\n\n    int repetitive_seed_length = mapping_metadata1.repetitive_seed_length_ +\n                                 mapping_metadata2.repetitive_seed_length_;\n\n    if (repetitive_seed_length > 0) {\n      double total_read_length = read1_length + read2_length;\n      double frac_rep = (double)repetitive_seed_length / total_read_length;\n      if (repetitive_seed_length >= total_read_length) {\n        frac_rep = 0.999;\n      }\n\n      double alignment_identity1 =\n          1 - (double)read1_num_errors / (read1_length > read1_alignment_length\n                                              ? read1_length\n                                              : read1_alignment_length);\n\n      double alignment_identity2 =\n          1 - (double)read2_num_errors / (read2_length > read2_alignment_length\n                                              ? read2_length\n                                              : read2_alignment_length);\n\n      double alignment_identity = alignment_identity1 < alignment_identity2\n                                      ? 
alignment_identity1\n                                      : alignment_identity2;\n\n      if (alignment_identity <= 0.95) {\n        mapq_pe = mapq_pe * (1 - sqrt(frac_rep)) + 0.499;\n      } else if (alignment_identity <= 0.97) {\n        mapq_pe = mapq_pe * (1 - frac_rep) + 0.499;\n      } else if (alignment_identity >= 0.999) {\n        mapq_pe =\n            mapq_pe * (1 - frac_rep * frac_rep * frac_rep * frac_rep) + 0.499;\n      } else {\n        mapq_pe = mapq_pe * (1 - frac_rep * frac_rep) + 0.499;\n      }\n    }\n  }\n\n  mapq1 = GetMAPQForSingleEndRead(\n      first_read_strand, read1_num_errors, read1_alignment_length, read1_length,\n      /*max_num_error_difference=*/2, mapping_metadata1);\n\n  mapq2 = GetMAPQForSingleEndRead(second_read_strand, read2_num_errors,\n                                  read2_alignment_length, read2_length,\n                                  /*max_num_error_difference=*/2,\n                                  mapping_metadata2);\n\n#ifdef CHROMAP_DEBUG\n  std::cerr << \" 1:\" << (int)mapq1 << \" 2:\" << (int)mapq2\n            << \" mapq_pe:\" << (int)mapq_pe << \"\\n\";\n#endif\n\n  if (!mapping_parameters_.split_alignment) {\n    mapq1 = mapq1 > mapq_pe                    ? mapq1\n            : mapq_pe < mapq1 + mapq_pe * 0.65 ? mapq_pe\n                                               : mapq1 + mapq_pe * 0.65;\n    mapq2 = mapq2 > mapq_pe                    ? mapq2\n            : mapq_pe < mapq2 + mapq_pe * 0.65 ? mapq_pe\n                                               : mapq2 + mapq_pe * 0.65;\n  }\n\n  mapq1 *= 1.2;\n  if (mapq1 > 60) {\n    mapq1 = 60;\n  }\n\n  mapq2 *= 1.2;\n  if (mapq2 > 60) {\n    mapq2 = 60;\n  }\n\n#ifdef CHROMAP_DEBUG\n  std::cerr << \" 1:\" << (int)mapq1 << \" 2:\" << (int)mapq2 << \"\\n\\n\";\n#endif\n\n  uint8_t mapq = mapq1 < mapq2 ? 
mapq1 : mapq2;\n\n  if (mapq < 60 && force_mapq >= 0 && force_mapq < mapq) {\n    mapq = force_mapq;\n  }\n\n  return (uint8_t)mapq;\n}\n\n}  // namespace chromap\n\n#endif  // MAPPING_GENERATOR_H_\n"
  },
  {
    "path": "src/mapping_in_memory.h",
    "content": "#ifndef MAPPING_IN_MEMORY_H_\n#define MAPPING_IN_MEMORY_H_\n\n#include <stdint.h>\n\n#include <string>\n\n#include \"utils.h\"\n\nnamespace chromap {\n\n// Regardless of mapping format, this struct can temporarily hold a mapping in\n// memory for easily passing it into the several functions before pushing it\n// to the result mapping vector. It never owns the read or the read qual. It\n// owns the cigar before push the mapping to the vector. (For now, the cigar\n// memory is released once a SAMMapping is created.) Since this struct is large,\n// we should never create a huge vector of this struct.\nstruct MappingInMemory {\n  uint32_t read_id = 0;\n  int read_split_site = 0;\n  int read_length = 0;\n\n  uint32_t rid = 0;\n  uint32_t ref_start_position = 0;\n  uint32_t ref_end_position = 0;\n\n  uint64_t barcode_key = 0;\n\n  Strand strand = kPositive;\n  bool is_unique = true;\n  uint8_t mapq = 0;\n\n  // It does NOT own read or read qual.\n  const char *read_name = nullptr;\n  const char *read_sequence = nullptr;\n  const char *qual_sequence = nullptr;\n\n  // SAM fields or tags.\n  uint16_t SAM_flag = 0;\n\n  uint32_t *cigar = nullptr;\n  int n_cigar = 0;\n\n  int NM = 0;\n\n  std::string MD_tag;\n\n  inline uint8_t GetStrand() const { return (strand == kPositive ? 1 : 0); }\n\n  inline uint32_t GetFragmentStartPosition() const {\n    return ref_start_position;\n  }\n\n  // TODO(Haowen): change this to alignment length.\n  inline uint16_t GetFragmentLength() const {\n    return ref_end_position - ref_start_position + 1;\n  }\n\n  inline uint16_t GetAlignmentLength() const {\n    return ref_end_position - ref_start_position + 1;\n  }\n};\n\nstruct PairedEndMappingInMemory {\n  MappingInMemory mapping_in_memory1;\n  MappingInMemory mapping_in_memory2;\n  uint8_t is_unique;\n  uint8_t mapq;\n\n  inline uint8_t GetStrand() const {\n    return (mapping_in_memory1.strand == kPositive ? 
1 : 0);\n  }\n\n  inline uint32_t GetReadId() const { return mapping_in_memory1.read_id; }\n\n  inline uint64_t GetBarcode() const { return mapping_in_memory1.barcode_key; }\n\n  inline uint32_t GetFragmentStartPosition() const {\n    if (mapping_in_memory1.strand == kPositive) {\n      return mapping_in_memory1.GetFragmentStartPosition();\n    }\n\n    return mapping_in_memory2.GetFragmentStartPosition();\n  }\n\n  inline int GetFragmentLength() const {\n    if (mapping_in_memory1.strand == kPositive) {\n      return mapping_in_memory2.ref_end_position -\n             mapping_in_memory1.ref_start_position + 1;\n    }\n    return mapping_in_memory1.ref_end_position -\n           mapping_in_memory2.ref_start_position + 1;\n  }\n\n  inline uint32_t GetPositiveAlignmentLength() const {\n    if (mapping_in_memory1.strand == kPositive) {\n      return mapping_in_memory1.GetAlignmentLength();\n    }\n    return mapping_in_memory2.GetAlignmentLength();\n  }\n\n  inline uint32_t GetNegativeAlignmentLength() const {\n    if (mapping_in_memory1.strand == kNegative) {\n      return mapping_in_memory1.GetAlignmentLength();\n    }\n    return mapping_in_memory2.GetAlignmentLength();\n  }\n};\n\n}  // namespace chromap\n\n#endif  // MAPPING_IN_MEMORY_H_\n"
  },
  {
    "path": "src/mapping_metadata.h",
    "content": "#ifndef MAPPING_METADATA_H_\n#define MAPPING_METADATA_H_\n\n#include <algorithm>\n#include <cstdio>\n#include <utility>\n#include <vector>\n\n#include \"minimizer.h\"\n#include \"candidate.h\"\n#include \"draft_mapping.h\"\n\nnamespace chromap {\n\nclass mm_cache;\nclass Index;\nclass CandidateProcessor;\nclass PairedEndMappingMetadata;\nclass DraftMappingGenerator;\ntemplate <typename MappingRecord>\nclass MappingGenerator;\nclass Chromap;\n\nclass MappingMetadata {\n public:\n  inline void PrepareForMappingNextRead(int reserve_size) {\n    Clear();\n    ReserveSpace(reserve_size);\n    repetitive_seed_length_ = 0;\n  }\n\n  inline size_t GetNumMinimizers() const {\n    return minimizers_.size();\n  }\n\n  inline size_t GetNumPositiveCandidates() const {\n    return positive_candidates_.size();\n  }\n\n  inline size_t GetNumNegativeCandidates() const {\n    return negative_candidates_.size();\n  }\n\n  inline size_t GetNumCandidates() const {\n    return positive_candidates_.size() + negative_candidates_.size();\n  }\n\n  inline size_t GetNumDraftMappings() const {\n    return positive_mappings_.size() + negative_mappings_.size();\n  }\n\n  inline void MoveCandidiatesToBuffer() {\n    positive_candidates_.swap(positive_candidates_buffer_);\n    positive_candidates_.clear();\n    negative_candidates_.swap(negative_candidates_buffer_);\n    negative_candidates_.clear();\n  }\n\n  // Callback function to update all candidates.\n  inline void UpdateCandidates(void (*Update)(std::vector<Candidate> &)) {\n    Update(positive_candidates_);\n    Update(negative_candidates_);\n  }\n\n  inline void SortCandidates() {\n    std::sort(positive_candidates_.begin(), positive_candidates_.end());\n    std::sort(negative_candidates_.begin(), negative_candidates_.end());\n  }\n\n  inline void SortMappingsByPositions() {\n    auto compare_function = [](const DraftMapping &a, const DraftMapping &b) {\n      return a.position < b.position;\n    };\n    
std::sort(positive_mappings_.begin(), positive_mappings_.end(),\n              compare_function);\n    std::sort(negative_mappings_.begin(), negative_mappings_.end(),\n              compare_function);\n  }\n\n  inline int GetMinNumErrors() const { return min_num_errors_; }\n  inline int GetSecondMinNumErrors() const { return second_min_num_errors_; }\n  inline int GetNumBestMappings() const { return num_best_mappings_; }\n  inline int GetNumSecondBestMappings() const {\n    return num_second_best_mappings_;\n  }\n\n  inline void SetMinNumErrors(int min_num_errors) {\n    min_num_errors_ = min_num_errors;\n  }\n  inline void SetSecondMinNumErrors(int second_min_num_errors) {\n    second_min_num_errors_ = second_min_num_errors;\n  }\n  inline void SetNumBestMappings(int num_best_mappings) {\n    num_best_mappings_ = num_best_mappings;\n  }\n  inline void SetNumSecondBestMappings(int num_second_best_mappings) {\n    num_second_best_mappings_ = num_second_best_mappings;\n  }\n\n  // For debug only.\n  inline void PrintCandidates(FILE *fp) {\n    uint32_t i;\n    for (i = 0; i < positive_candidates_.size(); ++i)\n      fprintf(fp, \"+ %d %d %d %d\\n\", i,\n              (int)(positive_candidates_[i].position >> 32),\n              (int)(positive_candidates_[i].position),\n              positive_candidates_[i].count);\n    for (i = 0; i < negative_candidates_.size(); ++i)\n      fprintf(fp, \"- %d %d %d %d\\n\", i,\n              (int)(negative_candidates_[i].position >> 32),\n              (int)(negative_candidates_[i].position),\n              negative_candidates_[i].count);\n  }\n\n protected:\n  inline void ReserveSpace(int reserve_size) {\n    minimizers_.reserve(reserve_size);\n    positive_hits_.reserve(reserve_size);\n    negative_hits_.reserve(reserve_size);\n    positive_candidates_.reserve(reserve_size);\n    negative_candidates_.reserve(reserve_size);\n    positive_candidates_buffer_.reserve(reserve_size);\n    
negative_candidates_buffer_.reserve(reserve_size);\n    positive_mappings_.reserve(reserve_size);\n    negative_mappings_.reserve(reserve_size);\n    positive_split_sites_.reserve(reserve_size);\n    negative_split_sites_.reserve(reserve_size);\n  }\n\n  inline void Clear() {\n    minimizers_.clear();\n    positive_hits_.clear();\n    negative_hits_.clear();\n    positive_candidates_.clear();\n    negative_candidates_.clear();\n    positive_candidates_buffer_.clear();\n    negative_candidates_buffer_.clear();\n    positive_mappings_.clear();\n    negative_mappings_.clear();\n    positive_split_sites_.clear();\n    negative_split_sites_.clear();\n  }\n\n  int min_num_errors_, second_min_num_errors_;\n  int num_best_mappings_, num_second_best_mappings_;\n\n  uint32_t repetitive_seed_length_;\n\n  std::vector<Minimizer> minimizers_;\n\n  std::vector<uint64_t> positive_hits_;\n  std::vector<uint64_t> negative_hits_;\n\n  std::vector<Candidate> positive_candidates_;\n  std::vector<Candidate> negative_candidates_;\n\n  std::vector<Candidate> positive_candidates_buffer_;\n  std::vector<Candidate> negative_candidates_buffer_;\n\n  // The first element is ed, and the second element is position.\n  std::vector<DraftMapping> positive_mappings_;\n  std::vector<DraftMapping> negative_mappings_;\n\n  std::vector<int> positive_split_sites_;\n  std::vector<int> negative_split_sites_;\n\n  friend class mm_cache;\n  friend class Index;\n  friend class CandidateProcessor;\n  friend class PairedEndMappingMetadata;\n  friend class DraftMappingGenerator;\n  template <typename MappingRecord>\n  friend class MappingGenerator;\n  friend class Chromap;\n};\n\n}  // namespace chromap\n\n#endif  // MAPPING_METADATA_H_\n"
  },
  {
    "path": "src/mapping_parameters.h",
    "content": "#ifndef MAPPING_PARAMETERS_H_\n#define MAPPING_PARAMETERS_H_\n\n#include <cstdint>\n#include <string>\n\nnamespace chromap {\n\nenum MappingOutputFormat {\n  MAPPINGFORMAT_UNKNOWN,\n  MAPPINGFORMAT_BED,\n  MAPPINGFORMAT_TAGALIGN,\n  MAPPINGFORMAT_PAF,\n  MAPPINGFORMAT_SAM,\n  MAPPINGFORMAT_PAIRS\n};\n\nstruct MappingParameters {\n  int error_threshold = 8;\n  int match_score = 1;\n  int mismatch_penalty = 4;\n  std::vector<int> gap_open_penalties = {6, 6};\n  std::vector<int> gap_extension_penalties = {1, 1};\n  int min_num_seeds_required_for_mapping = 2;\n  std::vector<int> max_seed_frequencies = {500, 1000};\n\n  double cache_update_param = 0.01;\n  int cache_size = 4000003;\n  bool debug_cache = false;\n  std::string frip_est_params = \"-1.0996;4.2391;3.0164e-05;-2.1087e-04;-5.5825e-05\";\n  bool output_num_uniq_cache_slots = true;\n  int k_for_minhash = 250;\n\n  // Read with # best mappings greater than it will have this number of best\n  // mappings reported.\n  int max_num_best_mappings = 1;\n  int max_insert_size = 1000;\n  uint8_t mapq_threshold = 30;\n  int num_threads = 1;\n  int min_read_length = 30;\n  int barcode_correction_error_threshold = 1;\n  double barcode_correction_probability_threshold = 0.9;\n  int multi_mapping_allocation_distance = 0;\n  int multi_mapping_allocation_seed = 11;\n  // Read with more than this number of mappings will be dropped.\n  int drop_repetitive_reads = 500000;\n  bool trim_adapters = false;\n  bool remove_pcr_duplicates = false;\n  bool remove_pcr_duplicates_at_bulk_level = true;\n  bool is_bulk_data = true;\n  bool allocate_multi_mappings = false;\n  bool only_output_unique_mappings = true;\n  bool output_mappings_not_in_whitelist = false;\n  bool Tn5_shift = false;\n  bool split_alignment = false;\n  MappingOutputFormat mapping_output_format = MAPPINGFORMAT_BED;\n  bool low_memory_mode = false;\n  bool cell_by_bin = false;\n  int bin_size = 5000;\n  uint16_t depth_cutoff_to_call_peak = 3;\n  int 
peak_min_length = 30;\n  int peak_merge_max_length = 30;\n  std::string reference_file_path;\n  std::string index_file_path;\n  std::vector<std::string> read_file1_paths;\n  std::vector<std::string> read_file2_paths;\n  std::vector<std::string> barcode_file_paths;\n  std::string barcode_whitelist_file_path;\n  std::string read_format;\n  std::string mapping_output_file_path;\n  std::string matrix_output_prefix;\n  // The order for general sorting.\n  std::string custom_rid_order_file_path;\n  // The order for pairs format flipping.\n  std::string pairs_flipping_custom_rid_order_file_path;\n  std::string barcode_translate_table_file_path;\n  std::string summary_metadata_file_path;\n  bool skip_barcode_check = false;\n\n  int GetNumVPULanes() const {\n    int NUM_VPU_LANES = 0;\n    if (error_threshold < 8) {\n      NUM_VPU_LANES = 8;\n    } else if (error_threshold < 16) {\n      NUM_VPU_LANES = 4;\n    }\n    return NUM_VPU_LANES;\n  }\n};\n\n}  // namespace chromap\n\n#endif  // MAPPING_PARAMETERS_H_\n"
  },
  {
    "path": "src/mapping_processor.h",
    "content": "#ifndef MAPPING_PROCESSOR_H_\n#define MAPPING_PROCESSOR_H_\n\n#include <assert.h>\n\n#include <cinttypes>\n#include <cstring>\n#include <functional>\n#include <iostream>\n#include <string>\n#include <vector>\n\n#include \"bed_mapping.h\"\n#include \"mapping.h\"\n#include \"mapping_parameters.h\"\n#include \"paf_mapping.h\"\n#include \"pairs_mapping.h\"\n#include \"sam_mapping.h\"\n#include \"temp_mapping.h\"\n#include \"utils.h\"\n\nnamespace chromap {\n\ntemplate <typename MappingRecord>\nbool ReadIdLess(const std::pair<uint32_t, MappingRecord> &a,\n                const std::pair<uint32_t, MappingRecord> &b) {\n  return a.second.read_id_ < b.second.read_id_;\n}\n\n// Class to process mappings. It supports multi-threadidng as only the\n// parameters are owned by the class.\ntemplate <typename MappingRecord>\nclass MappingProcessor {\n public:\n  MappingProcessor() = delete;\n  MappingProcessor(const MappingParameters &mapping_parameters,\n                   int min_unique_mapping_mapq)\n      : min_unique_mapping_mapq_(min_unique_mapping_mapq),\n        multi_mapping_allocation_seed_(\n            mapping_parameters.multi_mapping_allocation_seed),\n        multi_mapping_allocation_distance_(\n            mapping_parameters.multi_mapping_allocation_distance),\n        max_num_best_mappings_(mapping_parameters.max_num_best_mappings) {}\n\n  ~MappingProcessor() = default;\n\n  void SortOutputMappings(\n      uint32_t num_reference_sequences,\n      std::vector<std::vector<MappingRecord>> &mappings) const;\n  \n  void ParallelSortOutputMappings(\n      uint32_t num_reference_sequences,\n      std::vector<std::vector<MappingRecord>> &mappings,\n      int num_threads) const;\n\n  void RemovePCRDuplicate(\n      uint32_t num_reference_sequences,\n      std::vector<std::vector<MappingRecord>> &mappings,\n      int num_threads) const;\n\n  void AllocateMultiMappings(\n      uint32_t num_reference_sequences, uint64_t num_multi_mappings,\n      int 
multi_mapping_allocation_distance,\n      std::vector<std::vector<MappingRecord>> &mappings) const;\n\n  void ApplyTn5ShiftOnMappings(\n      uint32_t num_reference_sequences,\n      std::vector<std::vector<MappingRecord>> &mappings);\n\n  uint32_t MoveMappingsInBuffersToMappingContainer(\n      uint32_t num_reference_sequences,\n      std::vector<std::vector<std::vector<MappingRecord>>>\n          &mappings_on_diff_ref_seqs_for_diff_threads_for_saving,\n      std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs);\n\n  void OutputMappingStatistics(\n      uint32_t num_reference_sequences,\n      const std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs);\n\n private:\n  void BuildAugmentedTree(\n      uint32_t ref_id,\n      std::vector<std::vector<MappingRecord>> &allocated_mappings,\n      std::vector<std::pair<int, uint32_t>> &tree_info,\n      std::vector<std::vector<uint32_t>> &tree_extras) const;\n\n  uint32_t GetNumOverlappedMappings(\n      uint32_t ref_id, int multi_mapping_allocation_distance,\n      const MappingRecord &mapping,\n      const std::vector<std::vector<MappingRecord>> &allocated_mappings,\n      const std::vector<std::pair<int, uint32_t>> &tree_info,\n      const std::vector<std::vector<uint32_t>> &tree_extras) const;\n\n  const int min_unique_mapping_mapq_;\n  const int multi_mapping_allocation_seed_;\n  const int multi_mapping_allocation_distance_;\n  const int max_num_best_mappings_;\n};\n\ntemplate <typename MappingRecord>\nvoid MappingProcessor<MappingRecord>::SortOutputMappings(\n    uint32_t num_reference_sequences,\n    std::vector<std::vector<MappingRecord>> &mappings) const {\n  // double real_dedupe_start_time = Chromap<>::GetRealTime();\n  uint32_t num_mappings = 0;\n  for (uint32_t ri = 0; ri < num_reference_sequences; ++ri) {\n    std::sort(mappings[ri].begin(), mappings[ri].end());\n    num_mappings += mappings[ri].size();\n  }\n  // std::cerr << \"Sorted \" << num_mappings << \" elements in \" 
<<\n  // Chromap<>::GetRealTime() - real_dedupe_start_time << \"s.\\n\";\n}\n\n// If num_threads <= 0, the number of threads is set externally. In this case\n// omp task seems to be much more efficient than omp parallel.\ntemplate <typename MappingRecord>\nvoid MappingProcessor<MappingRecord>::ParallelSortOutputMappings(\n    uint32_t num_reference_sequences,\n    std::vector<std::vector<MappingRecord>> &mappings,\n    int num_threads) const {\n  // double real_dedupe_start_time = Chromap<>::GetRealTime();\n  if (num_threads <= 0)\n  {\n#pragma omp task shared(mappings)\n    {\n      for (uint32_t ri = 0; ri < num_reference_sequences; ri += 2) {\n        std::sort(mappings[ri].begin(), mappings[ri].end());\n      }\n    }\n    \n#pragma omp task shared(mappings)\n    {\n      for (uint32_t ri = 1; ri < num_reference_sequences; ri += 2) {\n        std::sort(mappings[ri].begin(), mappings[ri].end());\n      }\n    }\n#pragma omp taskwait\n  }\n  else\n  {\n#pragma omp parallel shared(mappings) num_threads(num_threads)\n    {\n#pragma omp single\n      {\n#pragma omp taskloop\n        for (uint32_t ri = 0; ri < num_reference_sequences; ++ri) {\n          std::sort(mappings[ri].begin(), mappings[ri].end());\n        }\n      }\n    }\n  }\n  \n  //uint32_t num_mappings = 0;\n  //for (uint32_t ri = 0; ri < num_reference_sequences; ++ri) {\n  //  num_mappings += mappings[ri].size();\n  //}\n  // std::cerr << \"Sorted \" << num_mappings << \" elements in \" <<\n  // Chromap<>::GetRealTime() - real_dedupe_start_time << \"s.\\n\";\n}\n\ntemplate <typename MappingRecord>\nvoid MappingProcessor<MappingRecord>::RemovePCRDuplicate(\n    uint32_t num_reference_sequences,\n    std::vector<std::vector<MappingRecord>> &mappings,\n    int num_threads) const {\n  double real_dedupe_start_time = GetRealTime();\n  ParallelSortOutputMappings(num_reference_sequences, mappings, num_threads);\n  std::cerr << \"Sorted in \" << GetRealTime() - real_dedupe_start_time << \"s.\\n\";\n\n  
std::vector<std::vector<MappingRecord>> deduped_mappings;\n  uint32_t num_mappings = 0;\n  for (uint32_t ri = 0; ri < num_reference_sequences; ++ri) {\n    deduped_mappings.push_back(std::vector<MappingRecord>());\n    if (mappings[ri].size() != 0) {\n      // Haowen: Ideally I should output the last of the dups of first mappings.\n      // Li: The mappings' mapq are sorted in increasing order, so we should put the last\n      // map\n      auto last_it = mappings[ri].begin();\n      uint32_t last_dup_count = 1;\n\n      for (auto it = ++(mappings[ri].begin()); it != mappings[ri].end(); ++it) {\n        if (!((*it) == (*last_it))) {\n          deduped_mappings[ri].emplace_back((*last_it));\n          deduped_mappings[ri].back().num_dups_ = std::min(\n              (uint32_t)std::numeric_limits<uint8_t>::max(), last_dup_count);\n          last_dup_count = 1;\n        } else {\n          ++last_dup_count;\n        }\n        last_it = it;\n      }\n\n      deduped_mappings[ri].emplace_back((*last_it));\n      deduped_mappings[ri].back().num_dups_ = std::min(\n          (uint32_t)std::numeric_limits<uint8_t>::max(), last_dup_count);\n      num_mappings += deduped_mappings[ri].size();\n      deduped_mappings[ri].swap(mappings[ri]);\n    }\n  }\n  std::cerr << num_mappings << \" mappings left after deduplication in \"\n            << GetRealTime() - real_dedupe_start_time << \"s.\\n\";\n}\n\ntemplate <typename MappingRecord>\nvoid MappingProcessor<MappingRecord>::BuildAugmentedTree(\n    uint32_t ref_id,\n    std::vector<std::vector<MappingRecord>> &allocated_mappings,\n    std::vector<std::pair<int, uint32_t>> &tree_info,\n    std::vector<std::vector<uint32_t>> &tree_extras) const {\n  // std::sort(mappings.begin(), mappings.end(), IntervalLess());\n  int max_level = 0;\n  size_t i, last_i = 0;  // last_i points to the rightmost node in the tree\n  uint32_t last = 0;     // last is the max value at node last_i\n  int k;\n  std::vector<MappingRecord> &mappings = 
allocated_mappings[ref_id];\n  std::vector<uint32_t> &extras = tree_extras[ref_id];\n  if (mappings.size() == 0) {\n    max_level = -1;\n  }\n  for (i = 0; i < mappings.size(); i += 2) {\n    last_i = i;\n    // last = mappings[i].max = mappings[i].en; // leaves (i.e. at level 0)\n    last = extras[i] =\n        mappings[i].GetEndPosition();  // leaves (i.e. at level 0)\n  }\n  for (k = 1; 1LL << k <= (int64_t)mappings.size();\n       ++k) {  // process internal nodes in the bottom-up order\n    size_t x = 1LL << (k - 1);\n    size_t i0 = (x << 1) - 1;\n    size_t step = x << 2;  // i0 is the first node\n    for (i = i0; i < mappings.size();\n         i += step) {               // traverse all nodes at level k\n      uint32_t el = extras[i - x];  // max value of the left child\n      uint32_t er =\n          i + x < mappings.size() ? extras[i + x] : last;  // of the right child\n      uint32_t e = mappings[i].GetEndPosition();\n      e = e > el ? e : el;\n      e = e > er ? e : er;\n      extras[i] = e;  // set the max value for node i\n    }\n    last_i =\n        last_i >> k & 1\n            ? 
last_i - x\n            : last_i +\n                  x;  // last_i now points to the parent of the original last_i\n    if (last_i < mappings.size() &&\n        extras[last_i] > last)  // update last accordingly\n      last = extras[last_i];\n  }\n  max_level = k - 1;\n  tree_info.emplace_back(max_level, mappings.size());\n}\n\ntemplate <typename MappingRecord>\nuint32_t MappingProcessor<MappingRecord>::GetNumOverlappedMappings(\n    uint32_t ref_id, int multi_mapping_allocation_distance,\n    const MappingRecord &mapping,\n    const std::vector<std::vector<MappingRecord>> &allocated_mappings,\n    const std::vector<std::pair<int, uint32_t>> &tree_info,\n    const std::vector<std::vector<uint32_t>> &tree_extras) const {\n  int t = 0;\n  StackCell stack[64];\n  // out.clear();\n  int num_overlapped_mappings = 0;\n  int max_level = tree_info[ref_id].first;\n  uint32_t num_tree_nodes = tree_info[ref_id].second;\n  const std::vector<MappingRecord> &mappings = allocated_mappings[ref_id];\n  const std::vector<uint32_t> &extras = tree_extras[ref_id];\n  // uint32_t interval_start = mapping.fragment_start_position;\n  uint32_t interval_start =\n      mapping.GetStartPosition() > (uint32_t)multi_mapping_allocation_distance\n          ? 
mapping.GetStartPosition() - multi_mapping_allocation_distance\n          : 0;\n  uint32_t interval_end =\n      mapping.GetEndPosition() + (uint32_t)multi_mapping_allocation_distance;\n  // push the root; this is a top down traversal\n  stack[t++] = StackCell(max_level, (1LL << max_level) - 1, 0);\n  // the following guarantees that numbers in out[] are always sorted\n  while (t) {\n    StackCell z = stack[--t];\n    // we are in a small subtree; traverse every node in this subtree\n    if (z.k <= 3) {\n      size_t i, i0 = z.x >> z.k << z.k, i1 = i0 + (1LL << (z.k + 1)) - 1;\n      if (i1 >= num_tree_nodes) {\n        i1 = num_tree_nodes;\n      }\n      for (i = i0; i < i1 && mappings[i].GetStartPosition() < interval_end;\n           ++i) {\n        if (interval_start <\n            mappings[i].GetEndPosition()) {  // if overlap, append to out[]\n          // out.push_back(i);\n          ++num_overlapped_mappings;\n        }\n      }\n    } else if (z.w == 0) {  // if left child not processed\n      // the left child of z.x; NB: y may be out of range (i.e. 
y>=a.size())\n      size_t y = z.x - (1LL << (z.k - 1));\n      // re-add node z.x, but mark the left child having been processed\n      stack[t++] = StackCell(z.k, z.x, 1);\n      // push the left child if y is out of range or may overlap with the query\n      if (y >= num_tree_nodes || extras[y] > interval_start)\n        stack[t++] = StackCell(z.k - 1, y, 0);\n    } else if (z.x < num_tree_nodes &&\n               mappings[z.x].GetStartPosition() < interval_end) {\n      // need to push the right child\n      if (interval_start < mappings[z.x].GetEndPosition()) {\n        // out.push_back(z.x);\n        // test if z.x overlaps the query; if yes, append to out[]\n        ++num_overlapped_mappings;\n      }\n      // push the right child\n      stack[t++] = StackCell(z.k - 1, z.x + (1LL << (z.k - 1)), 0);\n    }\n  }\n  return num_overlapped_mappings;\n}\n\ntemplate <typename MappingRecord>\nvoid MappingProcessor<MappingRecord>::AllocateMultiMappings(\n    uint32_t num_reference_sequences, uint64_t num_multi_mappings,\n    int multi_mapping_allocation_distance,\n    std::vector<std::vector<MappingRecord>> &mappings) const {\n  double real_start_time = GetRealTime();\n\n  std::vector<std::pair<uint32_t, MappingRecord>> multi_mappings;\n  multi_mappings.reserve(num_multi_mappings);\n\n  std::vector<std::vector<MappingRecord>> allocated_mappings;\n  allocated_mappings.reserve(num_reference_sequences);\n\n  std::vector<std::pair<int, uint32_t>> tree_info;\n  // max (max_level, # nodes)\n  std::vector<std::vector<uint32_t>> tree_extras;\n  tree_extras.reserve(num_reference_sequences);\n  tree_info.reserve(num_reference_sequences);\n\n  // two passes, one for memory pre-allocation, another to move the mappings.\n  for (uint32_t ri = 0; ri < num_reference_sequences; ++ri) {\n    allocated_mappings.emplace_back(std::vector<MappingRecord>());\n    tree_extras.emplace_back(std::vector<uint32_t>());\n    uint32_t num_uni_mappings = 0;\n    uint32_t num_multi_mappings = 0;\n  
  for (uint32_t mi = 0; mi < mappings[ri].size(); ++mi) {\n      MappingRecord &mapping = mappings[ri][mi];\n      if ((mapping.mapq_) <\n          min_unique_mapping_mapq_) {  // we have to ensure that the mapq is\n                                       // lower than this if and only if it is a\n                                       // multi-read.\n        ++num_multi_mappings;\n      } else {\n        ++num_uni_mappings;\n      }\n    }\n    allocated_mappings[ri].reserve(num_uni_mappings);\n    tree_extras[ri].reserve(num_uni_mappings);\n    for (uint32_t mi = 0; mi < mappings[ri].size(); ++mi) {\n      MappingRecord &mapping = mappings[ri][mi];\n      if ((mapping.mapq_) < min_unique_mapping_mapq_) {\n        multi_mappings.emplace_back(ri, mapping);\n      } else {\n        allocated_mappings[ri].emplace_back(mapping);\n        tree_extras[ri].emplace_back(0);\n      }\n    }\n    std::vector<MappingRecord>().swap(mappings[ri]);\n    BuildAugmentedTree(ri, allocated_mappings, tree_info, tree_extras);\n  }\n  std::cerr << \"Got all \" << multi_mappings.size() << \" multi-mappings!\\n\";\n\n  std::stable_sort(multi_mappings.begin(), multi_mappings.end(),\n                   ReadIdLess<MappingRecord>);\n  std::vector<uint32_t> weights;\n  weights.reserve(max_num_best_mappings_);\n  uint32_t sum_weight = 0;\n  assert(multi_mappings.size() > 0);\n  uint32_t previous_read_id = multi_mappings[0].second.read_id_;\n  uint32_t start_mapping_index = 0;\n  // add a fake mapping at the end and make sure its id is different from the\n  // last one\n  assert(multi_mappings.size() != UINT32_MAX);\n  std::pair<uint32_t, MappingRecord> foo_mapping = multi_mappings.back();\n  foo_mapping.second.read_id_ = UINT32_MAX;\n  multi_mappings.emplace_back(foo_mapping);\n  std::mt19937 generator(multi_mapping_allocation_seed_);\n  uint32_t current_read_id;  //, reference_id, mapping_index;\n  // uint32_t allocated_read_id, allocated_reference_id,\n  // allocated_mapping_index;\n  
uint32_t num_allocated_multi_mappings = 0;\n  uint32_t num_multi_mappings_without_overlapping_unique_mappings = 0;\n  for (uint32_t mi = 0; mi < multi_mappings.size(); ++mi) {\n    std::pair<uint32_t, MappingRecord> &current_multi_mapping =\n        multi_mappings[mi];  // mappings[reference_id][mapping_index];\n    current_read_id = current_multi_mapping.second.read_id_;\n    uint32_t num_overlaps = GetNumOverlappedMappings(\n        current_multi_mapping.first, multi_mapping_allocation_distance,\n        current_multi_mapping.second, allocated_mappings, tree_info,\n        tree_extras);\n    // std::cerr << mi << \" \" << current_read_id << \" \" << previous_read_id << \"\n    // \" << reference_id << \" \" << mapping_index << \" \" << interval_start << \" \"\n    // << num_overlaps << \" \" << sum_weight << \"\\n\";\n    if (current_read_id == previous_read_id) {\n      weights.emplace_back(num_overlaps);\n      sum_weight += num_overlaps;\n    } else {\n      // deal with the previous one.\n      if (sum_weight == 0) {\n        ++num_multi_mappings_without_overlapping_unique_mappings;\n        // assert(weights.size() > 1); // After PCR dedupe, some multi-reads may\n        // become uni-reads. For now, we just assign it to that unique mapping\n        // positions. 
std::fill(weights.begin(), weights.end(), 1); // We drop\n        // the multi-mappings that have no overlap with uni-mappings.\n      } else {\n        std::discrete_distribution<uint32_t> distribution(weights.begin(),\n                                                          weights.end());\n        uint32_t randomly_assigned_mapping_index = distribution(generator);\n        allocated_mappings[multi_mappings[start_mapping_index +\n                                          randomly_assigned_mapping_index]\n                               .first]\n            .emplace_back(multi_mappings[start_mapping_index +\n                                         randomly_assigned_mapping_index]\n                              .second);\n        ++num_allocated_multi_mappings;\n      }\n      // update current\n      weights.clear();\n      weights.emplace_back(num_overlaps);\n      sum_weight = num_overlaps;\n      start_mapping_index = mi;\n      previous_read_id = current_read_id;\n    }\n  }\n\n  mappings.swap(allocated_mappings);\n\n  std::cerr << \"Allocated \" << num_allocated_multi_mappings\n            << \" multi-mappings in \" << GetRealTime() - real_start_time\n            << \"s.\\n\";\n  std::cerr << \"# multi-mappings that have no uni-mapping overlaps: \"\n            << num_multi_mappings_without_overlapping_unique_mappings << \".\\n\";\n}\n\ntemplate <typename MappingRecord>\nvoid MappingProcessor<MappingRecord>::ApplyTn5ShiftOnMappings(\n    uint32_t num_reference_sequences,\n    std::vector<std::vector<MappingRecord>> &mappings) {\n  uint64_t num_shifted_mappings = 0;\n  for (auto &mappings_on_one_ref_seq : mappings) {\n    for (auto &mapping : mappings_on_one_ref_seq) {\n      mapping.Tn5Shift();\n      ++num_shifted_mappings;\n    }\n  }\n  std::cerr << \"# shifted mappings: \" << num_shifted_mappings << \".\\n\";\n}\n\ntemplate <typename MappingRecord>\nuint32_t\nMappingProcessor<MappingRecord>::MoveMappingsInBuffersToMappingContainer(\n    uint32_t 
num_reference_sequences,\n    std::vector<std::vector<std::vector<MappingRecord>>>\n        &mappings_on_diff_ref_seqs_for_diff_threads_for_saving,\n    std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs) {\n  // double real_start_time = Chromap<>::GetRealTime();\n  uint32_t num_moved_mappings = 0;\n  for (size_t ti = 0;\n       ti < mappings_on_diff_ref_seqs_for_diff_threads_for_saving.size();\n       ++ti) {\n    for (uint32_t i = 0; i < num_reference_sequences; ++i) {\n      num_moved_mappings +=\n          mappings_on_diff_ref_seqs_for_diff_threads_for_saving[ti][i].size();\n      mappings_on_diff_ref_seqs[i].insert(\n          mappings_on_diff_ref_seqs[i].end(),\n          std::make_move_iterator(\n              mappings_on_diff_ref_seqs_for_diff_threads_for_saving[ti][i]\n                  .begin()),\n          std::make_move_iterator(\n              mappings_on_diff_ref_seqs_for_diff_threads_for_saving[ti][i]\n                  .end()));\n      mappings_on_diff_ref_seqs_for_diff_threads_for_saving[ti][i].clear();\n    }\n  }\n  // std::cerr << \"Moved mappings in \" << Chromap<>::GetRealTime() -\n  // real_start_time << \"s.\\n\";\n  return num_moved_mappings;\n}\n\ntemplate <typename MappingRecord>\nvoid MappingProcessor<MappingRecord>::OutputMappingStatistics(\n    uint32_t num_reference_sequences,\n    const std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs) {\n  uint64_t num_uni_mappings = 0;\n  uint64_t num_multi_mappings = 0;\n  for (auto &mappings_on_one_ref_seq : mappings_on_diff_ref_seqs) {\n    for (auto &mapping : mappings_on_one_ref_seq) {\n      if ((mapping.is_unique_) == 1) {\n        ++num_uni_mappings;\n      } else {\n        ++num_multi_mappings;\n      }\n    }\n  }\n  std::cerr << \"# uni-mappings: \" << num_uni_mappings\n            << \", # multi-mappings: \" << num_multi_mappings\n            << \", total: \" << num_uni_mappings + num_multi_mappings << \".\\n\";\n}\n\n}  // namespace chromap\n\n#endif  
// MAPPING_PROCESSOR_H_\n"
  },
  {
    "path": "src/mapping_writer.cc",
    "content": "#include \"mapping_writer.h\"\n\nnamespace chromap {\n\n// Specialization for BED format.\ntemplate <>\nvoid MappingWriter<MappingWithBarcode>::OutputHeader(\n    uint32_t num_reference_sequences, const SequenceBatch &reference) {}\n\ntemplate <>\nvoid MappingWriter<MappingWithBarcode>::AppendMapping(\n    uint32_t rid, const SequenceBatch &reference,\n    const MappingWithBarcode &mapping) {\n  if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_BED) {\n    std::string strand = mapping.IsPositiveStrand() ? \"+\" : \"-\";\n    const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n    uint32_t mapping_end_position = mapping.GetEndPosition();\n    this->AppendMappingOutput(std::string(reference_sequence_name) + \"\\t\" +\n                              std::to_string(mapping.GetStartPosition()) +\n                              \"\\t\" + std::to_string(mapping_end_position) +\n                              \"\\t\" +\n                              barcode_translator_.Translate(\n                                  mapping.cell_barcode_, cell_barcode_length_) +\n                              \"\\t\" + std::to_string(mapping.num_dups_) + \"\\n\");\n  } else {\n    std::string strand = mapping.IsPositiveStrand() ? 
\"+\" : \"-\";\n    const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n    uint32_t mapping_end_position = mapping.GetEndPosition();\n    this->AppendMappingOutput(std::string(reference_sequence_name) + \"\\t\" +\n                              std::to_string(mapping.GetStartPosition()) +\n                              \"\\t\" + std::to_string(mapping_end_position) +\n                              \"\\tN\\t\" + std::to_string(mapping.mapq_) + \"\\t\" +\n                              strand + \"\\n\");\n  }\n}\n\ntemplate <>\nvoid MappingWriter<MappingWithoutBarcode>::OutputHeader(\n    uint32_t num_reference_sequences, const SequenceBatch &reference) {}\n\ntemplate <>\nvoid MappingWriter<MappingWithoutBarcode>::AppendMapping(\n    uint32_t rid, const SequenceBatch &reference,\n    const MappingWithoutBarcode &mapping) {\n  if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_BED) {\n    std::string strand = mapping.IsPositiveStrand() ? \"+\" : \"-\";\n    const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n    uint32_t mapping_end_position = mapping.GetEndPosition();\n    this->AppendMappingOutput(std::string(reference_sequence_name) + \"\\t\" +\n                              std::to_string(mapping.GetStartPosition()) +\n                              \"\\t\" + std::to_string(mapping_end_position) +\n                              \"\\tN\\t\" + std::to_string(mapping.mapq_) + \"\\t\" +\n                              strand + \"\\t\" + std::to_string(mapping.num_dups_) + \"\\n\");\n  } else {\n    std::string strand = mapping.IsPositiveStrand() ? 
\"+\" : \"-\";\n    const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n    uint32_t mapping_end_position = mapping.GetEndPosition();\n    this->AppendMappingOutput(std::string(reference_sequence_name) + \"\\t\" +\n                              std::to_string(mapping.GetStartPosition()) +\n                              \"\\t\" + std::to_string(mapping_end_position) +\n                              \"\\tN\\t\" + std::to_string(mapping.mapq_) + \"\\t\" +\n                              strand + \"\\t\" + std::to_string(mapping.num_dups_) + \"\\n\");\n  }\n}\n\n// Specialization for BEDPE format.\ntemplate <>\nvoid MappingWriter<PairedEndMappingWithoutBarcode>::OutputHeader(\n    uint32_t num_reference_sequences, const SequenceBatch &reference) {}\n\ntemplate <>\nvoid MappingWriter<PairedEndMappingWithoutBarcode>::AppendMapping(\n    uint32_t rid, const SequenceBatch &reference,\n    const PairedEndMappingWithoutBarcode &mapping) {\n  if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_BED) {\n    std::string strand = mapping.IsPositiveStrand() ? 
\"+\" : \"-\";\n    const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n    uint32_t mapping_end_position = mapping.GetEndPosition();\n    this->AppendMappingOutput(std::string(reference_sequence_name) + \"\\t\" +\n                              std::to_string(mapping.GetStartPosition()) +\n                              \"\\t\" + std::to_string(mapping_end_position) +\n                              \"\\tN\\t\" + std::to_string(mapping.mapq_) + \"\\t\" +\n                              strand + \"\\t\" + std::to_string(mapping.num_dups_) + \"\\n\");\n  } else {\n    bool positive_strand = mapping.IsPositiveStrand();\n    uint32_t positive_read_end =\n        mapping.fragment_start_position_ + mapping.positive_alignment_length_;\n    uint32_t negative_read_end =\n        mapping.fragment_start_position_ + mapping.fragment_length_;\n    uint32_t negative_read_start =\n        negative_read_end - mapping.negative_alignment_length_;\n    const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n    if (positive_strand) {\n      this->AppendMappingOutput(\n          std::string(reference_sequence_name) + \"\\t\" +\n          std::to_string(mapping.fragment_start_position_) + \"\\t\" +\n          std::to_string(positive_read_end) + \"\\tN\\t\" +\n          std::to_string(mapping.mapq_) + \"\\t+\\n\" +\n          std::string(reference_sequence_name) + \"\\t\" +\n          std::to_string(negative_read_start) + \"\\t\" +\n          std::to_string(negative_read_end) + \"\\tN\\t\" +\n          std::to_string(mapping.mapq_) + \"\\t-\\t\" + \n          std::to_string(mapping.num_dups_) + \"\\n\");\n    } else {\n      this->AppendMappingOutput(\n          std::string(reference_sequence_name) + \"\\t\" +\n          std::to_string(negative_read_start) + \"\\t\" +\n          std::to_string(negative_read_end) + \"\\tN\\t\" +\n          std::to_string(mapping.mapq_) + \"\\t-\\n\" +\n          std::string(reference_sequence_name) + \"\\t\" +\n         
 std::to_string(mapping.fragment_start_position_) + \"\\t\" +\n          std::to_string(positive_read_end) + \"\\tN\\t\" +\n          std::to_string(mapping.mapq_) + \"\\t+\\t\" +\n          std::to_string(mapping.num_dups_) + \"\\n\");\n    }\n  }\n}\n\ntemplate <>\nvoid MappingWriter<PairedEndMappingWithBarcode>::OutputHeader(\n    uint32_t num_reference_sequences, const SequenceBatch &reference) {}\n\ntemplate <>\nvoid MappingWriter<PairedEndMappingWithBarcode>::AppendMapping(\n    uint32_t rid, const SequenceBatch &reference,\n    const PairedEndMappingWithBarcode &mapping) {\n  if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_BED) {\n    std::string strand = mapping.IsPositiveStrand() ? \"+\" : \"-\";\n    const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n    uint32_t mapping_end_position = mapping.GetEndPosition();\n    this->AppendMappingOutput(std::string(reference_sequence_name) + \"\\t\" +\n                              std::to_string(mapping.GetStartPosition()) +\n                              \"\\t\" + std::to_string(mapping_end_position) +\n                              \"\\t\" +\n                              barcode_translator_.Translate(\n                                  mapping.cell_barcode_, cell_barcode_length_) +\n                              \"\\t\" + std::to_string(mapping.num_dups_) + \"\\n\");\n  } else {\n    bool positive_strand = mapping.IsPositiveStrand();\n    uint32_t positive_read_end =\n        mapping.fragment_start_position_ + mapping.positive_alignment_length_;\n    uint32_t negative_read_end =\n        mapping.fragment_start_position_ + mapping.fragment_length_;\n    uint32_t negative_read_start =\n        negative_read_end - mapping.negative_alignment_length_;\n    const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n    if (positive_strand) {\n      this->AppendMappingOutput(\n          std::string(reference_sequence_name) + \"\\t\" +\n          
std::to_string(mapping.fragment_start_position_) + \"\\t\" +\n          std::to_string(positive_read_end) + \"\\tN\\t\" +\n          std::to_string(mapping.mapq_) + \"\\t+\\n\" +\n          std::string(reference_sequence_name) + \"\\t\" +\n          std::to_string(negative_read_start) + \"\\t\" +\n          std::to_string(negative_read_end) + \"\\tN\\t\" +\n          std::to_string(mapping.mapq_) + \"\\t-\\n\");\n    } else {\n      this->AppendMappingOutput(\n          std::string(reference_sequence_name) + \"\\t\" +\n          std::to_string(negative_read_start) + \"\\t\" +\n          std::to_string(negative_read_end) + \"\\tN\\t\" +\n          std::to_string(mapping.mapq_) + \"\\t-\\n\" +\n          std::string(reference_sequence_name) + \"\\t\" +\n          std::to_string(mapping.fragment_start_position_) + \"\\t\" +\n          std::to_string(positive_read_end) + \"\\tN\\t\" +\n          std::to_string(mapping.mapq_) + \"\\t+\\n\");\n    }\n  }\n}\n\n// Specialization for PAF format.\ntemplate <>\nvoid MappingWriter<PAFMapping>::OutputHeader(uint32_t num_reference_sequences,\n                                             const SequenceBatch &reference) {}\n\ntemplate <>\nvoid MappingWriter<PAFMapping>::AppendMapping(uint32_t rid,\n                                              const SequenceBatch &reference,\n                                              const PAFMapping &mapping) {\n  const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n  uint32_t reference_sequence_length = reference.GetSequenceLengthAt(rid);\n  std::string strand = mapping.IsPositiveStrand() ? 
\"+\" : \"-\";\n  uint32_t mapping_end_position =\n      mapping.fragment_start_position_ + mapping.fragment_length_;\n  this->AppendMappingOutput(\n      mapping.read_name_ + \"\\t\" + std::to_string(mapping.read_length_) + \"\\t\" +\n      std::to_string(0) + \"\\t\" + std::to_string(mapping.read_length_) + \"\\t\" +\n      strand + \"\\t\" + std::string(reference_sequence_name) + \"\\t\" +\n      std::to_string(reference_sequence_length) + \"\\t\" +\n      std::to_string(mapping.fragment_start_position_) + \"\\t\" +\n      std::to_string(mapping_end_position) + \"\\t\" +\n      std::to_string(mapping.read_length_) + \"\\t\" +\n      std::to_string(mapping.fragment_length_) + \"\\t\" +\n      std::to_string(mapping.mapq_) + \"\\n\");\n}\n\ntemplate <>\nvoid MappingWriter<PAFMapping>::OutputTempMapping(\n    const std::string &temp_mapping_output_file_path,\n    uint32_t num_reference_sequences,\n    const std::vector<std::vector<PAFMapping> > &mappings) {\n  FILE *temp_mapping_output_file =\n      fopen(temp_mapping_output_file_path.c_str(), \"wb\");\n  assert(temp_mapping_output_file != NULL);\n  for (size_t ri = 0; ri < num_reference_sequences; ++ri) {\n    // make sure mappings[ri] exists even if its size is 0\n    size_t num_mappings = mappings[ri].size();\n    fwrite(&num_mappings, sizeof(size_t), 1, temp_mapping_output_file);\n    if (mappings[ri].size() > 0) {\n      for (size_t mi = 0; mi < num_mappings; ++mi) {\n        mappings[ri][mi].WriteToFile(temp_mapping_output_file);\n      }\n      // fwrite(mappings[ri].data(), sizeof(MappingRecord), mappings[ri].size(),\n      // temp_mapping_output_file);\n    }\n  }\n  fclose(temp_mapping_output_file);\n}\n\n// Specialization for PairedPAF format.\ntemplate <>\nvoid MappingWriter<PairedPAFMapping>::OutputHeader(\n    uint32_t num_reference_sequences, const SequenceBatch &reference) {}\n\ntemplate <>\nvoid MappingWriter<PairedPAFMapping>::OutputTempMapping(\n    const std::string 
&temp_mapping_output_file_path,\n    uint32_t num_reference_sequences,\n    const std::vector<std::vector<PairedPAFMapping> > &mappings) {\n  FILE *temp_mapping_output_file =\n      fopen(temp_mapping_output_file_path.c_str(), \"wb\");\n  assert(temp_mapping_output_file != NULL);\n  for (size_t ri = 0; ri < num_reference_sequences; ++ri) {\n    // make sure mappings[ri] exists even if its size is 0\n    size_t num_mappings = mappings[ri].size();\n    fwrite(&num_mappings, sizeof(size_t), 1, temp_mapping_output_file);\n    if (mappings[ri].size() > 0) {\n      for (size_t mi = 0; mi < num_mappings; ++mi) {\n        mappings[ri][mi].WriteToFile(temp_mapping_output_file);\n      }\n      // fwrite(mappings[ri].data(), sizeof(MappingRecord), mappings[ri].size(),\n      // temp_mapping_output_file);\n    }\n  }\n  fclose(temp_mapping_output_file);\n}\n\ntemplate <>\nvoid MappingWriter<PairedPAFMapping>::AppendMapping(\n    uint32_t rid, const SequenceBatch &reference,\n    const PairedPAFMapping &mapping) {\n  bool positive_strand = mapping.IsPositiveStrand();\n  uint32_t positive_read_end =\n      mapping.fragment_start_position_ + mapping.positive_alignment_length_;\n  uint32_t negative_read_end =\n      mapping.fragment_start_position_ + mapping.fragment_length_;\n  uint32_t negative_read_start =\n      negative_read_end - mapping.negative_alignment_length_;\n  const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n  uint32_t reference_sequence_length = reference.GetSequenceLengthAt(rid);\n  if (positive_strand) {\n    this->AppendMappingOutput(\n        mapping.read1_name_ + \"\\t\" + std::to_string(mapping.read1_length_) +\n        \"\\t\" + std::to_string(0) + \"\\t\" +\n        std::to_string(mapping.read1_length_) + \"\\t\" + \"+\" + \"\\t\" +\n        std::string(reference_sequence_name) + \"\\t\" +\n        std::to_string(reference_sequence_length) + \"\\t\" +\n        std::to_string(mapping.fragment_start_position_) + \"\\t\" +\n        
std::to_string(positive_read_end) + \"\\t\" +\n        std::to_string(mapping.read1_length_) + \"\\t\" +\n        std::to_string(mapping.positive_alignment_length_) + \"\\t\" +\n        std::to_string(mapping.mapq1_) + \"\\n\");\n    this->AppendMappingOutput(\n        mapping.read2_name_ + \"\\t\" + std::to_string(mapping.read2_length_) +\n        \"\\t\" + std::to_string(0) + \"\\t\" +\n        std::to_string(mapping.read2_length_) + \"\\t\" + \"-\" + \"\\t\" +\n        std::string(reference_sequence_name) + \"\\t\" +\n        std::to_string(reference_sequence_length) + \"\\t\" +\n        std::to_string(negative_read_start) + \"\\t\" +\n        std::to_string(negative_read_end) + \"\\t\" +\n        std::to_string(mapping.read2_length_) + \"\\t\" +\n        std::to_string(mapping.negative_alignment_length_) + \"\\t\" +\n        std::to_string(mapping.mapq2_) + \"\\n\");\n  } else {\n    this->AppendMappingOutput(\n        mapping.read1_name_ + \"\\t\" + std::to_string(mapping.read1_length_) +\n        \"\\t\" + std::to_string(0) + \"\\t\" +\n        std::to_string(mapping.read1_length_) + \"\\t\" + \"-\" + \"\\t\" +\n        std::string(reference_sequence_name) + \"\\t\" +\n        std::to_string(reference_sequence_length) + \"\\t\" +\n        std::to_string(negative_read_start) + \"\\t\" +\n        std::to_string(negative_read_end) + \"\\t\" +\n        std::to_string(mapping.read1_length_) + \"\\t\" +\n        std::to_string(mapping.negative_alignment_length_) + \"\\t\" +\n        std::to_string(mapping.mapq1_) + \"\\n\");\n    this->AppendMappingOutput(\n        mapping.read2_name_ + \"\\t\" + std::to_string(mapping.read2_length_) +\n        \"\\t\" + std::to_string(0) + \"\\t\" +\n        std::to_string(mapping.read2_length_) + \"\\t\" + \"+\" + \"\\t\" +\n        std::string(reference_sequence_name) + \"\\t\" +\n        std::to_string(reference_sequence_length) + \"\\t\" +\n        std::to_string(mapping.fragment_start_position_) + \"\\t\" +\n        
std::to_string(positive_read_end) + \"\\t\" +\n        std::to_string(mapping.read2_length_) + \"\\t\" +\n        std::to_string(mapping.positive_alignment_length_) + \"\\t\" +\n        std::to_string(mapping.mapq2_) + \"\\n\");\n  }\n}\n\n// Specialization for SAM format.\ntemplate <>\nvoid MappingWriter<SAMMapping>::OutputHeader(uint32_t num_reference_sequences,\n                                             const SequenceBatch &reference) {\n  for (uint32_t rid = 0; rid < num_reference_sequences; ++rid) {\n    const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n    uint32_t reference_sequence_length = reference.GetSequenceLengthAt(rid);\n    this->AppendMappingOutput(\n        \"@SQ\\tSN:\" + std::string(reference_sequence_name) +\n        \"\\tLN:\" + std::to_string(reference_sequence_length) + \"\\n\");\n  }\n}\n\ntemplate <>\nvoid MappingWriter<SAMMapping>::AppendMapping(uint32_t rid,\n                                              const SequenceBatch &reference,\n                                              const SAMMapping &mapping) {\n  // const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n  // uint32_t reference_sequence_length = reference.GetSequenceLengthAt(rid);\n  // std::string strand = (mapping.direction & 1) == 1 ? \"+\" : \"-\";\n  // uint32_t mapping_end_position = mapping.fragment_start_position +\n  // mapping.fragment_length;\n  const char *reference_sequence_name =\n      (mapping.flag_ & BAM_FUNMAP) > 0 ? \"*\" : reference.GetSequenceNameAt(rid);\n  const char *mate_ref_sequence_name =\n      mapping.mrid_ < 0 ? \"*\" : \n      ((uint32_t)mapping.mrid_ == rid ? \"=\" : reference.GetSequenceNameAt(mapping.mrid_));\n  const uint32_t mapping_start_position = mapping.GetStartPosition();\n  const uint32_t mate_mapping_start_position = mapping.mrid_ < 0 ? 
0 : (mapping.mpos_ + 1);\n  this->AppendMappingOutput(\n      mapping.read_name_ + \"\\t\" + std::to_string(mapping.flag_) + \"\\t\" +\n      std::string(reference_sequence_name) + \"\\t\" +\n      std::to_string(mapping_start_position) + \"\\t\" +\n      std::to_string(mapping.mapq_) + \"\\t\" + mapping.GenerateCigarString() +\n      \"\\t\" + std::string(mate_ref_sequence_name) + \"\\t\" + \n      std::to_string(mate_mapping_start_position) + \"\\t\" + \n      std::to_string(mapping.tlen_) + \"\\t\" +\n      mapping.sequence_ + \"\\t\" + mapping.sequence_qual_ + \"\\t\" +\n      mapping.GenerateIntTagString(\"NM\", mapping.NM_) +\n      \"\\tMD:Z:\" + mapping.MD_);\n  if (cell_barcode_length_ > 0) {\n    this->AppendMappingOutput(\"\\tCB:Z:\" +\n                              barcode_translator_.Translate(\n                                  mapping.cell_barcode_, cell_barcode_length_));\n  }\n  this->AppendMappingOutput(\"\\n\");\n}\n\ntemplate <>\nvoid MappingWriter<SAMMapping>::OutputTempMapping(\n    const std::string &temp_mapping_output_file_path,\n    uint32_t num_reference_sequences,\n    const std::vector<std::vector<SAMMapping> > &mappings) {\n  FILE *temp_mapping_output_file =\n      fopen(temp_mapping_output_file_path.c_str(), \"wb\");\n  assert(temp_mapping_output_file != NULL);\n  for (size_t ri = 0; ri < num_reference_sequences; ++ri) {\n    // make sure mappings[ri] exists even if its size is 0\n    size_t num_mappings = mappings[ri].size();\n    fwrite(&num_mappings, sizeof(size_t), 1, temp_mapping_output_file);\n    if (mappings[ri].size() > 0) {\n      for (size_t mi = 0; mi < num_mappings; ++mi) {\n        mappings[ri][mi].WriteToFile(temp_mapping_output_file);\n      }\n      // fwrite(mappings[ri].data(), sizeof(MappingRecord), mappings[ri].size(),\n      // temp_mapping_output_file);\n    }\n  }\n  fclose(temp_mapping_output_file);\n}\n\n// Specialization for pairs format.\ntemplate <>\nvoid MappingWriter<PairsMapping>::OutputHeader(uint32_t 
num_reference_sequences,\n                                               const SequenceBatch &reference) {\n  std::vector<uint32_t> rid_order;\n  rid_order.resize(num_reference_sequences);\n  uint32_t i;\n  for (i = 0; i < num_reference_sequences; ++i) {\n    rid_order[pairs_custom_rid_rank_[i]] = i;\n  }\n  this->AppendMappingOutput(\"## pairs format v1.0.0\\n#shape: upper triangle\\n\");\n  for (i = 0; i < num_reference_sequences; ++i) {\n    uint32_t rid = rid_order[i];\n    const char *reference_sequence_name = reference.GetSequenceNameAt(rid);\n    uint32_t reference_sequence_length = reference.GetSequenceLengthAt(rid);\n    this->AppendMappingOutput(\n        \"#chromsize: \" + std::string(reference_sequence_name) + \" \" +\n        std::to_string(reference_sequence_length) + \"\\n\");\n  }\n  this->AppendMappingOutput(\n      \"#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type mapq1 mapq2\\n\");\n}\n\ntemplate <>\nvoid MappingWriter<PairsMapping>::AppendMapping(uint32_t rid,\n                                                const SequenceBatch &reference,\n                                                const PairsMapping &mapping) {\n  const char *reference_sequence_name1 =\n      reference.GetSequenceNameAt(mapping.rid1_);\n  const char *reference_sequence_name2 =\n      reference.GetSequenceNameAt(mapping.rid2_);\n  this->AppendMappingOutput(mapping.read_name_ + \"\\t\" +\n                            std::string(reference_sequence_name1) + \"\\t\" +\n                            std::to_string(mapping.GetPosition(1)) + \"\\t\" +\n                            std::string(reference_sequence_name2) + \"\\t\" +\n                            std::to_string(mapping.GetPosition(2)) + \"\\t\" +\n                            std::string(1, mapping.GetStrand(1)) + \"\\t\" +\n                            std::string(1, mapping.GetStrand(2)) + \"\\tUU\\t\" +\n                            std::to_string(mapping.mapq_) + \"\\t\" + // mapq1\n                   
         std::to_string(mapping.mapq_) + \"\\n\"); // mapq2\n}\n\ntemplate <>\nvoid MappingWriter<PairsMapping>::OutputTempMapping(\n    const std::string &temp_mapping_output_file_path,\n    uint32_t num_reference_sequences,\n    const std::vector<std::vector<PairsMapping> > &mappings) {\n  FILE *temp_mapping_output_file =\n      fopen(temp_mapping_output_file_path.c_str(), \"wb\");\n  assert(temp_mapping_output_file != NULL);\n  for (size_t ri = 0; ri < num_reference_sequences; ++ri) {\n    // make sure mappings[ri] exists even if its size is 0\n    size_t num_mappings = mappings[ri].size();\n    fwrite(&num_mappings, sizeof(size_t), 1, temp_mapping_output_file);\n    if (mappings[ri].size() > 0) {\n      for (size_t mi = 0; mi < num_mappings; ++mi) {\n        mappings[ri][mi].WriteToFile(temp_mapping_output_file);\n      }\n      // fwrite(mappings[ri].data(), sizeof(MappingRecord), mappings[ri].size(),\n      // temp_mapping_output_file);\n    }\n  }\n  fclose(temp_mapping_output_file);\n}\n\n}  // namespace chromap\n"
  },
  {
    "path": "src/mapping_writer.h",
    "content": "#ifndef MAPPING_WRITER_H_\n#define MAPPING_WRITER_H_\n\n#include <assert.h>\n\n#include <cinttypes>\n#include <cstring>\n#include <functional>\n#include <iostream>\n#include <limits>\n#include <string>\n#include <vector>\n\n#include \"barcode_translator.h\"\n#include \"bed_mapping.h\"\n#include \"mapping.h\"\n#include \"mapping_parameters.h\"\n#include \"paf_mapping.h\"\n#include \"pairs_mapping.h\"\n#include \"sam_mapping.h\"\n#include \"sequence_batch.h\"\n#include \"temp_mapping.h\"\n#include \"utils.h\"\n#include \"summary_metadata.h\"\n\nnamespace chromap {\n\ntemplate <typename MappingRecord>\nclass MappingWriter {\n public:\n  MappingWriter() = delete;\n\n  MappingWriter(const MappingParameters mapping_parameters,\n                const uint32_t cell_barcode_length,\n                const std::vector<int> &pairs_custom_rid_rank)\n      : mapping_parameters_(mapping_parameters),\n        cell_barcode_length_(cell_barcode_length),\n        pairs_custom_rid_rank_(pairs_custom_rid_rank) {\n    if (!mapping_parameters_.barcode_translate_table_file_path.empty()) {\n      barcode_translator_.SetTranslateTable(\n          mapping_parameters_.barcode_translate_table_file_path);\n    }\n    summary_metadata_.SetBarcodeLength(cell_barcode_length);\n    mapping_output_file_ =\n        fopen(mapping_parameters_.mapping_output_file_path.c_str(), \"w\");\n    assert(mapping_output_file_ != nullptr);\n  }\n\n  ~MappingWriter() { fclose(mapping_output_file_); }\n\n  void OutputTempMappings(\n      uint32_t num_reference_sequences,\n      std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs,\n      std::vector<TempMappingFileHandle<MappingRecord>>\n          &temp_mapping_file_handles);\n\n  void OutputMappings(uint32_t num_reference_sequences,\n                      const SequenceBatch &reference,\n                      const std::vector<std::vector<MappingRecord>> &mappings);\n\n  void OutputHeader(uint32_t num_reference_sequences,\n            
        const SequenceBatch &reference);\n\n  void ProcessAndOutputMappingsInLowMemory(\n      uint32_t num_mappings_in_mem, uint32_t num_reference_sequences,\n      const SequenceBatch &reference,\n      const khash_t(k64_seq) * barcode_whitelist_lookup_table,\n      std::vector<TempMappingFileHandle<MappingRecord>>\n          &temp_mapping_file_handles);\n\n  void OutputSummaryMetadata(std::vector<double> frip_est_coeffs = {0.0, 0.0, 0.0, 0.0, 0.0}, bool output_num_cache_slots_info = true);\n  void UpdateSummaryMetadata(uint64_t barcode, int type, int change);\n  void UpdateSpeicalCategorySummaryMetadata(int category, int type, int change);\n  void AdjustSummaryPairedEndOverCount();\n\n protected:\n  void AppendMapping(uint32_t rid, const SequenceBatch &reference,\n                     const MappingRecord &mapping);\n\n  inline void AppendMappingOutput(const std::string &line) {\n    fprintf(mapping_output_file_, \"%s\", line.data());\n  }\n\n  size_t FindBestMappingIndexFromDuplicates(\n      const khash_t(k64_seq) * barcode_whitelist_lookup_table,\n      const std::vector<MappingRecord> &duplicates);\n\n  void OutputMappingsInVector(\n      uint8_t mapq_threshold, uint32_t num_reference_sequences,\n      const SequenceBatch &reference,\n      const std::vector<std::vector<MappingRecord>> &mappings);\n\n  // Output the mappings in a temp file.\n  inline void OutputTempMapping(\n      const std::string &temp_mapping_output_file_path,\n      uint32_t num_reference_sequences,\n      const std::vector<std::vector<MappingRecord>> &mappings) {\n    FILE *temp_mapping_output_file =\n        fopen(temp_mapping_output_file_path.c_str(), \"wb\");\n    assert(temp_mapping_output_file != NULL);\n    for (size_t ri = 0; ri < num_reference_sequences; ++ri) {\n      // make sure mappings[ri] exists even if its size is 0\n      size_t num_mappings = mappings[ri].size();\n      fwrite(&num_mappings, sizeof(size_t), 1, temp_mapping_output_file);\n      if (mappings[ri].size() > 
0) {\n        fwrite(mappings[ri].data(), sizeof(MappingRecord), mappings[ri].size(),\n               temp_mapping_output_file);\n      }\n    }\n    fclose(temp_mapping_output_file);\n  }\n\n  // TODO(Haowen): use mapping_output_format_ variable to decide output in BED\n  // or TagAlign. It should be removed later.\n  const MappingParameters mapping_parameters_;\n  const uint32_t cell_barcode_length_;\n  FILE *mapping_output_file_ = nullptr;\n  BarcodeTranslator barcode_translator_;\n  SummaryMetadata summary_metadata_;\n\n  // for pairs\n  const std::vector<int> pairs_custom_rid_rank_;\n};\n\ntemplate <typename MappingRecord>\nsize_t MappingWriter<MappingRecord>::FindBestMappingIndexFromDuplicates(\n    const khash_t(k64_seq) * barcode_whitelist_lookup_table,\n    const std::vector<MappingRecord> &duplicates) {\n  // Find the best barcode, break ties first by the number of the\n  // barcodes in the dups, then by the barcode abundance.\n  size_t best_mapping_index = 0;\n\n  khiter_t barcode_whitelist_lookup_table_iterator =\n      kh_get(k64_seq, barcode_whitelist_lookup_table,\n             duplicates[best_mapping_index].GetBarcode());\n\n  double best_mapping_barcode_abundance = kh_value(\n      barcode_whitelist_lookup_table,\n      barcode_whitelist_lookup_table_iterator);  /// (double)num_sample_barcodes_;\n\n  for (size_t bulk_dup_i = 1; bulk_dup_i < duplicates.size(); ++bulk_dup_i) {\n    barcode_whitelist_lookup_table_iterator =\n        kh_get(k64_seq, barcode_whitelist_lookup_table,\n               duplicates[bulk_dup_i].GetBarcode());\n\n    const double current_mapping_barcode_abundance = kh_value(\n        barcode_whitelist_lookup_table,\n        barcode_whitelist_lookup_table_iterator);  /// (double)num_sample_barcodes_;\n\n    const bool same_num_dups_with_higer_barcode_abundance =\n        duplicates[bulk_dup_i].num_dups_ ==\n            duplicates[best_mapping_index].num_dups_ &&\n        current_mapping_barcode_abundance > 
best_mapping_barcode_abundance;\n\n    if (duplicates[bulk_dup_i].num_dups_ >\n            duplicates[best_mapping_index].num_dups_ ||\n        same_num_dups_with_higer_barcode_abundance) {\n      best_mapping_index = bulk_dup_i;\n      best_mapping_barcode_abundance = current_mapping_barcode_abundance;\n    }\n  }\n  return best_mapping_index;\n}\n\ntemplate <typename MappingRecord>\nvoid MappingWriter<MappingRecord>::ProcessAndOutputMappingsInLowMemory(\n    uint32_t num_mappings_in_mem, uint32_t num_reference_sequences,\n    const SequenceBatch &reference,\n    const khash_t(k64_seq) * barcode_whitelist_lookup_table,\n    std::vector<TempMappingFileHandle<MappingRecord>>\n        &temp_mapping_file_handles) {\n  if (temp_mapping_file_handles.size() == 0) {\n    return;\n  }\n\n  double sort_and_dedupe_start_time = GetRealTime();\n\n  // Calculate block size and initialize\n  uint64_t max_mem_size = 10 * ((uint64_t)1 << 30);\n  if (mapping_parameters_.mapping_output_format == MAPPINGFORMAT_SAM ||\n      mapping_parameters_.mapping_output_format == MAPPINGFORMAT_PAIRS ||\n      mapping_parameters_.mapping_output_format == MAPPINGFORMAT_PAF) {\n    max_mem_size = (uint64_t)1 << 30;\n  }\n  for (size_t hi = 0; hi < temp_mapping_file_handles.size(); ++hi) {\n    const uint32_t temp_mapping_block_size =\n        max_mem_size / temp_mapping_file_handles.size() / sizeof(MappingRecord);\n\n    temp_mapping_file_handles[hi].InitializeTempMappingLoading(\n        temp_mapping_block_size);\n    temp_mapping_file_handles[hi].LoadTempMappingBlock(num_reference_sequences);\n  }\n\n  // Merge and dedupe.\n  uint32_t last_rid = std::numeric_limits<uint32_t>::max();\n  MappingRecord last_mapping = MappingRecord();\n  uint32_t num_last_mapping_dups = 0;\n  uint64_t num_uni_mappings = 0;\n  uint64_t num_multi_mappings = 0;\n  uint64_t num_mappings_passing_filters = 0;\n  uint64_t num_total_mappings = 0;\n  std::vector<MappingRecord> temp_dups_for_bulk_level_dedup;\n  
temp_dups_for_bulk_level_dedup.reserve(255);\n\n  const bool deduplicate_at_bulk_level_for_single_cell_data =\n      mapping_parameters_.remove_pcr_duplicates &&\n      !mapping_parameters_.is_bulk_data &&\n      mapping_parameters_.remove_pcr_duplicates_at_bulk_level;\n\n  while (true) {\n    // Merge, dedupe and output.\n    // Find min first (sorted by rid and then barcode and then positions).\n    size_t min_handle_index = temp_mapping_file_handles.size();\n    uint32_t min_rid = std::numeric_limits<uint32_t>::max();\n\n    for (size_t hi = 0; hi < temp_mapping_file_handles.size(); ++hi) {\n      const TempMappingFileHandle<MappingRecord> &current_handle =\n          temp_mapping_file_handles[hi];\n      if (current_handle.HasMappings()) {\n        const bool rid_is_smaller = current_handle.current_rid < min_rid;\n        const bool same_rid_smaller_mapping =\n            current_handle.current_rid == min_rid &&\n            current_handle.GetCurrentMapping() <\n                temp_mapping_file_handles[min_handle_index].GetCurrentMapping();\n\n        if (rid_is_smaller || same_rid_smaller_mapping) {\n          min_handle_index = hi;\n          min_rid = current_handle.current_rid;\n        }\n      }\n    }\n\n    // All mappings are merged. 
We only have to handle the case when the last\n    // mapping is a duplicate.\n    if (min_handle_index == temp_mapping_file_handles.size()) {\n      break;\n    }\n\n    ++num_total_mappings;\n\n    // Output the current min mapping if it is not a duplicate.\n    const MappingRecord &current_min_mapping =\n        temp_mapping_file_handles[min_handle_index].GetCurrentMapping();\n\n    const bool is_first_iteration = num_total_mappings == 1;\n    const bool current_mapping_is_duplicated_at_cell_level =\n        !is_first_iteration && current_min_mapping == last_mapping;\n    const bool current_mapping_is_duplicated_at_bulk_level =\n        !is_first_iteration && deduplicate_at_bulk_level_for_single_cell_data &&\n        current_min_mapping.IsSamePosition(last_mapping);\n    const bool current_mapping_is_duplicated =\n        last_rid == min_rid && (current_mapping_is_duplicated_at_cell_level ||\n                                current_mapping_is_duplicated_at_bulk_level);\n    if (mapping_parameters_.remove_pcr_duplicates &&\n        current_mapping_is_duplicated) {\n      ++num_last_mapping_dups;\n      if (deduplicate_at_bulk_level_for_single_cell_data) {\n        if (!temp_dups_for_bulk_level_dedup.empty() &&\n            current_min_mapping == temp_dups_for_bulk_level_dedup.back()) {\n          // Merge if their barcodes are the same. 
Be careful of \"==\" here!\n          temp_dups_for_bulk_level_dedup.back() = current_min_mapping;\n          temp_dups_for_bulk_level_dedup.back().num_dups_ += 1;\n        } else {\n          temp_dups_for_bulk_level_dedup.push_back(current_min_mapping);\n          temp_dups_for_bulk_level_dedup.back().num_dups_ = 1;\n        }\n      }\n      if (current_min_mapping.mapq_ > last_mapping.mapq_) {\n        last_mapping = current_min_mapping ;\n      }\n    } else {\n      if (!is_first_iteration) {\n        if (deduplicate_at_bulk_level_for_single_cell_data) {\n          size_t best_mapping_index = FindBestMappingIndexFromDuplicates(\n              barcode_whitelist_lookup_table, temp_dups_for_bulk_level_dedup);\n          last_mapping = temp_dups_for_bulk_level_dedup[best_mapping_index];\n\n          temp_dups_for_bulk_level_dedup.clear();\n        }\n\n        if (last_mapping.mapq_ >= mapping_parameters_.mapq_threshold) {\n          last_mapping.num_dups_ =\n              std::min((uint32_t)std::numeric_limits<uint8_t>::max(),\n                       num_last_mapping_dups);\n          if (mapping_parameters_.Tn5_shift) {\n            last_mapping.Tn5Shift();\n          }\n\n          AppendMapping(last_rid, reference, last_mapping);\n          ++num_mappings_passing_filters;\n          if (!mapping_parameters_.summary_metadata_file_path.empty())\n            summary_metadata_.UpdateCount(last_mapping.GetBarcode(), SUMMARY_METADATA_DUP,\n              num_last_mapping_dups - 1);\n        } else {\n          if (!mapping_parameters_.summary_metadata_file_path.empty())\n            summary_metadata_.UpdateCount(last_mapping.GetBarcode(), SUMMARY_METADATA_LOWMAPQ, \n                      num_last_mapping_dups);\n        }\n        if (!mapping_parameters_.summary_metadata_file_path.empty())\n          summary_metadata_.UpdateCount(last_mapping.GetBarcode(), SUMMARY_METADATA_MAPPED, \n                num_last_mapping_dups);\n\n        if (last_mapping.is_unique_ == 
1) {\n          ++num_uni_mappings;\n        } else {\n          ++num_multi_mappings;\n        }\n      }\n\n      last_mapping = current_min_mapping;\n      last_rid = min_rid;\n      num_last_mapping_dups = 1;\n\n      if (deduplicate_at_bulk_level_for_single_cell_data) {\n        temp_dups_for_bulk_level_dedup.push_back(current_min_mapping);\n        temp_dups_for_bulk_level_dedup.back().num_dups_ = 1;\n      }\n    }\n\n    temp_mapping_file_handles[min_handle_index].Next(num_reference_sequences);\n  }\n\n  if (last_mapping.mapq_ >= mapping_parameters_.mapq_threshold) {\n    if (deduplicate_at_bulk_level_for_single_cell_data) {\n      size_t best_mapping_index = FindBestMappingIndexFromDuplicates(\n          barcode_whitelist_lookup_table, temp_dups_for_bulk_level_dedup);\n      last_mapping = temp_dups_for_bulk_level_dedup[best_mapping_index];\n\n      temp_dups_for_bulk_level_dedup.clear();\n    }\n\n    last_mapping.num_dups_ = std::min(\n        (uint32_t)std::numeric_limits<uint8_t>::max(), num_last_mapping_dups);\n    if (mapping_parameters_.Tn5_shift) {\n      last_mapping.Tn5Shift();\n    }\n    AppendMapping(last_rid, reference, last_mapping);\n    ++num_mappings_passing_filters;\n    \n    if (!mapping_parameters_.summary_metadata_file_path.empty())\n      summary_metadata_.UpdateCount(last_mapping.GetBarcode(), SUMMARY_METADATA_DUP,\n          num_last_mapping_dups - 1);\n  } else {\n    if (!mapping_parameters_.summary_metadata_file_path.empty())\n      summary_metadata_.UpdateCount(last_mapping.GetBarcode(), SUMMARY_METADATA_LOWMAPQ, \n          num_last_mapping_dups);\n  }\n  if (!mapping_parameters_.summary_metadata_file_path.empty())\n    summary_metadata_.UpdateCount(last_mapping.GetBarcode(), SUMMARY_METADATA_MAPPED, \n          num_last_mapping_dups);\n\n  if (last_mapping.is_unique_ == 1) {\n    ++num_uni_mappings;\n  } else {\n    ++num_multi_mappings;\n  }\n\n  // Delete temp files.\n  for (size_t hi = 0; hi < 
temp_mapping_file_handles.size(); ++hi) {\n    temp_mapping_file_handles[hi].FinalizeTempMappingLoading();\n    remove(temp_mapping_file_handles[hi].file_path.c_str());\n  }\n\n  if (mapping_parameters_.remove_pcr_duplicates) {\n    std::cerr << \"Sorted, deduped and outputed mappings in \"\n              << GetRealTime() - sort_and_dedupe_start_time << \"s.\\n\";\n  } else {\n    std::cerr << \"Sorted and outputed mappings in \"\n              << GetRealTime() - sort_and_dedupe_start_time << \"s.\\n\";\n  }\n  std::cerr << \"# uni-mappings: \" << num_uni_mappings\n            << \", # multi-mappings: \" << num_multi_mappings\n            << \", total: \" << num_uni_mappings + num_multi_mappings << \".\\n\";\n  std::cerr << \"Number of output mappings (passed filters): \"\n            << num_mappings_passing_filters << \"\\n\";\n}\n\ntemplate <typename MappingRecord>\nvoid MappingWriter<MappingRecord>::OutputTempMappings(\n    uint32_t num_reference_sequences,\n    std::vector<std::vector<MappingRecord>> &mappings_on_diff_ref_seqs,\n    std::vector<TempMappingFileHandle<MappingRecord>>\n        &temp_mapping_file_handles) {\n  TempMappingFileHandle<MappingRecord> temp_mapping_file_handle;\n  temp_mapping_file_handle.file_path =\n      mapping_parameters_.mapping_output_file_path + \".temp\" +\n      std::to_string(temp_mapping_file_handles.size());\n  if (mapping_parameters_.mapping_output_file_path == \"/dev/stdout\"\n      || mapping_parameters_.mapping_output_file_path == \"/dev/stderr\")\n  {\n    temp_mapping_file_handle.file_path = \"chromap_output.temp\" +\n      std::to_string(temp_mapping_file_handles.size());\n  }\n  temp_mapping_file_handles.emplace_back(temp_mapping_file_handle);\n\n  OutputTempMapping(temp_mapping_file_handle.file_path, num_reference_sequences,\n                    mappings_on_diff_ref_seqs);\n\n  for (uint32_t i = 0; i < num_reference_sequences; ++i) {\n    mappings_on_diff_ref_seqs[i].clear();\n  }\n}\n\ntemplate <typename 
MappingRecord>\nvoid MappingWriter<MappingRecord>::OutputMappingsInVector(\n    uint8_t mapq_threshold, uint32_t num_reference_sequences,\n    const SequenceBatch &reference,\n    const std::vector<std::vector<MappingRecord>> &mappings) {\n  uint64_t num_mappings_passing_filters = 0;\n  for (uint32_t ri = 0; ri < num_reference_sequences; ++ri) {\n    for (auto it = mappings[ri].begin(); it != mappings[ri].end(); ++it) {\n      uint8_t mapq = (it->mapq_);\n      // uint8_t is_unique = (it->is_unique);\n      if (mapq >= mapq_threshold) {\n        // if (allocate_multi_mappings_ || (only_output_unique_mappings_ &&\n        // is_unique == 1)) {\n        AppendMapping(ri, reference, *it);\n        ++num_mappings_passing_filters;\n        //}\n        //it->num_dups_ is capped by 255 here, so the count might be different in the\n        //  low-mem mode.\n        if (!mapping_parameters_.summary_metadata_file_path.empty())\n          summary_metadata_.UpdateCount(it->GetBarcode(), SUMMARY_METADATA_DUP,\n              it->num_dups_ - 1);\n      } else {\n        if (!mapping_parameters_.summary_metadata_file_path.empty())\n          summary_metadata_.UpdateCount(it->GetBarcode(), SUMMARY_METADATA_LOWMAPQ,\n              it->num_dups_);\n      }\n      if (!mapping_parameters_.summary_metadata_file_path.empty())\n        summary_metadata_.UpdateCount(it->GetBarcode(), SUMMARY_METADATA_MAPPED,\n            it->num_dups_);\n    }\n  }\n  std::cerr << \"Number of output mappings (passed filters): \"\n            << num_mappings_passing_filters << \"\\n\";\n}\n\ntemplate <typename MappingRecord>\nvoid MappingWriter<MappingRecord>::OutputMappings(\n    uint32_t num_reference_sequences, const SequenceBatch &reference,\n    const std::vector<std::vector<MappingRecord>> &mappings) {\n  // if (only_output_unique_mappings_ && mapq_threshold_ < 4)\n  //  mapq_threshold_ = 4;\n  OutputMappingsInVector(mapping_parameters_.mapq_threshold,\n                         
num_reference_sequences, reference, mappings);\n}\n\ntemplate <typename MappingRecord>\nvoid MappingWriter<MappingRecord>::OutputSummaryMetadata(std::vector<double> frip_est_coeffs, bool output_num_cache_slots_info) {\n  if (!mapping_parameters_.summary_metadata_file_path.empty())\n  {\n    summary_metadata_.Output(\n        mapping_parameters_.summary_metadata_file_path.c_str(),\n        !mapping_parameters_.barcode_whitelist_file_path.empty() && !mapping_parameters_.output_mappings_not_in_whitelist,\n        frip_est_coeffs,\n        output_num_cache_slots_info\n        );\n  }\n}\n\ntemplate <typename MappingRecord>\n  void MappingWriter<MappingRecord>::UpdateSummaryMetadata(uint64_t barcode, int type, int change) {\n  if (!mapping_parameters_.summary_metadata_file_path.empty())\n    summary_metadata_.UpdateCount(barcode, type, change);\n}\n\n// category: 0: non-whitelist barcode\ntemplate <typename MappingRecord>\n  void MappingWriter<MappingRecord>::UpdateSpeicalCategorySummaryMetadata(int category, int type, int change) {\n  if (!mapping_parameters_.summary_metadata_file_path.empty()) {\n    if (category == 0)\n      summary_metadata_.UpdateNonWhitelistCount(type, change);\n  }\n}\n\ntemplate <typename MappingRecord>\n  void MappingWriter<MappingRecord>::AdjustSummaryPairedEndOverCount() {\n  if (!mapping_parameters_.summary_metadata_file_path.empty()\n      && mapping_parameters_.mapping_output_format == MAPPINGFORMAT_SAM)\n      summary_metadata_.AdjustPairedEndOverCount() ; \n}\n\n\n// Specialization for BED format.\ntemplate <>\nvoid MappingWriter<MappingWithBarcode>::OutputHeader(\n    uint32_t num_reference_sequences, const SequenceBatch &reference);\n\ntemplate <>\nvoid MappingWriter<MappingWithBarcode>::AppendMapping(\n    uint32_t rid, const SequenceBatch &reference,\n    const MappingWithBarcode &mapping);\n\ntemplate <>\nvoid MappingWriter<MappingWithoutBarcode>::OutputHeader(\n    uint32_t num_reference_sequences, const SequenceBatch 
&reference);\n\ntemplate <>\nvoid MappingWriter<MappingWithoutBarcode>::AppendMapping(\n    uint32_t rid, const SequenceBatch &reference,\n    const MappingWithoutBarcode &mapping);\n\n// Specialization for BEDPE format.\ntemplate <>\nvoid MappingWriter<PairedEndMappingWithoutBarcode>::OutputHeader(\n    uint32_t num_reference_sequences, const SequenceBatch &reference);\n\ntemplate <>\nvoid MappingWriter<PairedEndMappingWithoutBarcode>::AppendMapping(\n    uint32_t rid, const SequenceBatch &reference,\n    const PairedEndMappingWithoutBarcode &mapping);\n\ntemplate <>\nvoid MappingWriter<PairedEndMappingWithBarcode>::OutputHeader(\n    uint32_t num_reference_sequences, const SequenceBatch &reference);\n\ntemplate <>\nvoid MappingWriter<PairedEndMappingWithBarcode>::AppendMapping(\n    uint32_t rid, const SequenceBatch &reference,\n    const PairedEndMappingWithBarcode &mapping);\n\n// Specialization for PAF format.\ntemplate <>\nvoid MappingWriter<PAFMapping>::OutputHeader(uint32_t num_reference_sequences,\n                                             const SequenceBatch &reference);\n\ntemplate <>\nvoid MappingWriter<PAFMapping>::AppendMapping(uint32_t rid,\n                                              const SequenceBatch &reference,\n                                              const PAFMapping &mapping);\n\ntemplate <>\nvoid MappingWriter<PAFMapping>::OutputTempMapping(\n    const std::string &temp_mapping_output_file_path,\n    uint32_t num_reference_sequences,\n    const std::vector<std::vector<PAFMapping>> &mappings);\n\n// Specialization for PairedPAF format.\ntemplate <>\nvoid MappingWriter<PairedPAFMapping>::OutputHeader(\n    uint32_t num_reference_sequences, const SequenceBatch &reference);\n\ntemplate <>\nvoid MappingWriter<PairedPAFMapping>::OutputTempMapping(\n    const std::string &temp_mapping_output_file_path,\n    uint32_t num_reference_sequences,\n    const std::vector<std::vector<PairedPAFMapping>> &mappings);\n\ntemplate <>\nvoid 
MappingWriter<PairedPAFMapping>::AppendMapping(\n    uint32_t rid, const SequenceBatch &reference,\n    const PairedPAFMapping &mapping);\n\n// Specialization for SAM format.\ntemplate <>\nvoid MappingWriter<SAMMapping>::OutputHeader(uint32_t num_reference_sequences,\n                                             const SequenceBatch &reference);\n\ntemplate <>\nvoid MappingWriter<SAMMapping>::AppendMapping(uint32_t rid,\n                                              const SequenceBatch &reference,\n                                              const SAMMapping &mapping);\n\ntemplate <>\nvoid MappingWriter<SAMMapping>::OutputTempMapping(\n    const std::string &temp_mapping_output_file_path,\n    uint32_t num_reference_sequences,\n    const std::vector<std::vector<SAMMapping>> &mappings);\n\n// Specialization for pairs format.\ntemplate <>\nvoid MappingWriter<PairsMapping>::OutputHeader(uint32_t num_reference_sequences,\n                                               const SequenceBatch &reference);\n\ntemplate <>\nvoid MappingWriter<PairsMapping>::AppendMapping(uint32_t rid,\n                                                const SequenceBatch &reference,\n                                                const PairsMapping &mapping);\n\ntemplate <>\nvoid MappingWriter<PairsMapping>::OutputTempMapping(\n    const std::string &temp_mapping_output_file_path,\n    uint32_t num_reference_sequences,\n    const std::vector<std::vector<PairsMapping>> &mappings);\n\n}  // namespace chromap\n\n#endif  // MAPPING_WRITER_H_\n"
  },
  {
    "path": "src/minimizer.h",
    "content": "#ifndef MINIMIZER_H_\n#define MINIMIZER_H_\n\n#include <utility>\n\n#include \"hit_utils.h\"\n#include \"strand.h\"\n\nnamespace chromap {\n\nclass Minimizer {\n public:\n  Minimizer() = delete;\n\n  Minimizer(std::pair<uint64_t, uint64_t> minimizer)\n      : hash_(minimizer.first), hit_(minimizer.second) {}\n\n  Minimizer(uint64_t hash, uint64_t hit) : hash_(hash), hit_(hit) {}\n\n  ~Minimizer() = default;\n\n  inline uint64_t GetHash() const { return hash_; }\n\n  inline uint64_t GetHit() const { return hit_; }\n\n  inline uint32_t GetSequenceIndex() const { return HitToSequenceIndex(hit_); }\n\n  inline uint32_t GetSequencePosition() const {\n    return HitToSequencePosition(hit_);\n  }\n\n  inline Strand GetSequenceStrand() const { return HitToStrand(hit_); }\n\n  inline bool operator<(const Minimizer &m) const {\n    if (hash_ < m.hash_) {\n      return true;\n    }\n\n    if (hash_ == m.hash_ && hit_ < m.hit_) {\n      return true;\n    }\n\n    return false;\n  }\n\n private:\n  // The hash of the kmer.\n  uint64_t hash_ = 0;\n\n  // The high 31 bits save the sequence index in the sequence batch. The\n  // following 32 bits save the end position on that sequence. And the lowest\n  // bit encodes the strand (0 for positive).\n  uint64_t hit_ = 0;\n};\n\n}  // namespace chromap\n\n#endif  // MINIMIZER_H_\n"
  },
  {
    "path": "src/minimizer_generator.cc",
    "content": "#include \"minimizer_generator.h\"\n\n#include \"utils.h\"\n\nnamespace chromap {\n\nvoid MinimizerGenerator::GenerateMinimizers(\n    const SequenceBatch &sequence_batch, uint32_t sequence_index,\n    std::vector<Minimizer> &minimizers) const {\n  const uint32_t sequence_length =\n      sequence_batch.GetSequenceLengthAt(sequence_index);\n  const char *sequence = sequence_batch.GetSequenceAt(sequence_index);\n\n  const uint64_t num_shifted_bits = 2 * (kmer_size_ - 1);\n  const uint64_t mask = (((uint64_t)1) << (2 * kmer_size_)) - 1;\n\n  uint64_t seeds_in_two_strands[2] = {0, 0};\n  std::pair<uint64_t, uint64_t> buffer[256];\n  std::pair<uint64_t, uint64_t> min_seed = {UINT64_MAX, UINT64_MAX};\n\n  // 2 uint64_t cost 16 bytes.\n  memset(buffer, 0xff, window_size_ * 16);\n\n  int unambiguous_length = 0;\n  int position_in_buffer = 0;\n  int min_position = 0;\n\n  for (uint32_t position = 0; position < sequence_length; ++position) {\n    const uint8_t current_base = CharToUint8(sequence[position]);\n    std::pair<uint64_t, uint64_t> current_seed = {UINT64_MAX, UINT64_MAX};\n\n    if (current_base < 4) {\n      // Not an ambiguous base.\n      // Forward k-mer.\n      seeds_in_two_strands[0] =\n          ((seeds_in_two_strands[0] << 2) | current_base) & mask;\n      // Reverse k-mer.\n      seeds_in_two_strands[1] =\n          (seeds_in_two_strands[1] >> 2) |\n          (((uint64_t)(3 ^ current_base)) << num_shifted_bits);\n\n      if (seeds_in_two_strands[0] == seeds_in_two_strands[1]) {\n        // Skip \"symmetric k-mers\" as we don't know it strand.\n        continue;\n      }\n\n      uint64_t hash_keys_for_two_seeds[2] = {\n          Hash64(seeds_in_two_strands[0], mask),\n          Hash64(seeds_in_two_strands[1], mask)};\n\n      uint64_t strand =\n          hash_keys_for_two_seeds[0] < hash_keys_for_two_seeds[1] ? 
0 : 1;\n\n      ++unambiguous_length;\n\n      if (unambiguous_length >= kmer_size_) {\n        current_seed.first = Hash64(hash_keys_for_two_seeds[strand], mask);\n        current_seed.second =\n            ((((uint64_t)sequence_index) << 32 | (uint32_t)position) << 1) |\n            strand;\n      }\n    } else {\n      unambiguous_length = 0;\n    }\n\n    // Need to do this here as appropriate position_in_buffer and\n    // buf[position_in_buffer] are needed below.\n    buffer[position_in_buffer] = current_seed;\n    if (unambiguous_length == window_size_ + kmer_size_ - 1 &&\n        min_seed.first != UINT64_MAX && min_seed.first < current_seed.first) {\n      // Special case for the first window - because identical k-mers are not\n      // stored yet.\n      for (int j = position_in_buffer + 1; j < window_size_; ++j)\n        if (min_seed.first == buffer[j].first &&\n            buffer[j].second != min_seed.second)\n          minimizers.emplace_back(buffer[j]);\n      for (int j = 0; j < position_in_buffer; ++j)\n        if (min_seed.first == buffer[j].first &&\n            buffer[j].second != min_seed.second)\n          minimizers.emplace_back(buffer[j]);\n    }\n\n    if (current_seed.first <= min_seed.first) {\n      // A new minimum; then write the old min.\n      if (unambiguous_length >= window_size_ + kmer_size_ &&\n          min_seed.first != UINT64_MAX) {\n        minimizers.emplace_back(min_seed);\n      }\n      min_seed = current_seed;\n      min_position = position_in_buffer;\n    } else if (position_in_buffer == min_position) {\n      // Old min has moved outside the window.\n      if (unambiguous_length >= window_size_ + kmer_size_ - 1 &&\n          min_seed.first != UINT64_MAX) {\n        minimizers.emplace_back(min_seed);\n      }\n\n      min_seed.first = UINT64_MAX;\n      for (int j = position_in_buffer + 1; j < window_size_; ++j) {\n        // The two loops are necessary when there are identical k-mers.\n        if (min_seed.first >= 
buffer[j].first) {\n          // >= is important s.t. min is always the closest k-mer.\n          min_seed = buffer[j];\n          min_position = j;\n        }\n      }\n\n      for (int j = 0; j <= position_in_buffer; ++j) {\n        if (min_seed.first >= buffer[j].first) {\n          min_seed = buffer[j];\n          min_position = j;\n        }\n      }\n\n      if (unambiguous_length >= window_size_ + kmer_size_ - 1 &&\n          min_seed.first != UINT64_MAX) {\n        // Write identical k-mers.\n        // These two loops make sure the output is sorted.\n        for (int j = position_in_buffer + 1; j < window_size_; ++j)\n          if (min_seed.first == buffer[j].first &&\n              min_seed.second != buffer[j].second)\n            minimizers.emplace_back(buffer[j]);\n        for (int j = 0; j <= position_in_buffer; ++j)\n          if (min_seed.first == buffer[j].first &&\n              min_seed.second != buffer[j].second)\n            minimizers.emplace_back(buffer[j]);\n      }\n    }\n\n    ++position_in_buffer;\n    if (position_in_buffer == window_size_) {\n      position_in_buffer = 0;\n    }\n  }\n\n  if (min_seed.first != UINT64_MAX) {\n    minimizers.emplace_back(min_seed);\n  }\n}\n\n}  // namespace chromap\n"
  },
  {
    "path": "src/minimizer_generator.h",
    "content": "#ifndef MINIMIZER_GENERATOR_H_\n#define MINIMIZER_GENERATOR_H_\n\n#include <cassert>\n#include <cstdint>\n#include <vector>\n\n#include \"minimizer.h\"\n#include \"sequence_batch.h\"\n\nnamespace chromap {\n\nclass MinimizerGenerator {\n public:\n  MinimizerGenerator() = delete;\n\n  MinimizerGenerator(int kmer_size, int window_size)\n      : kmer_size_(kmer_size), window_size_(window_size) {\n    // 56 bits for a k-mer. So the max kmer size is 28.\n    assert(kmer_size_ > 0 && kmer_size_ <= 28);\n    assert(window_size_ > 0 && window_size_ < 256);\n  }\n\n  ~MinimizerGenerator() = default;\n\n  void GenerateMinimizers(const SequenceBatch &sequence_batch,\n                          uint32_t sequence_index,\n                          std::vector<Minimizer> &minimizers) const;\n\n private:\n  const int kmer_size_;\n  const int window_size_;\n};\n\n}  // namespace chromap\n\n#endif  // MINIMIZER_GENERATOR_H_\n"
  },
  {
    "path": "src/mmcache.hpp",
    "content": "#ifndef CHROMAP_CACHE_H_\n#define CHROMAP_CACHE_H_\n\n#include \"index.h\"\n#include \"minimizer.h\"\n#include <mutex>\n\n#define FINGER_PRINT_SIZE 103\n\n#define HEAD_MM_ARRAY_SIZE 4194304   // 2^22\n#define HEAD_MM_ARRAY_MASK 0x3fffff  // 22 positions\n\nnamespace chromap {\nstruct _mm_cache_entry {\n  std::vector<uint64_t> minimizers;\n  std::vector<int> offsets;  // the distance to the next minimizer\n  std::vector<uint8_t> strands;\n  std::vector<Candidate> positive_candidates;\n  std::vector<Candidate> negative_candidates;\n  int weight;\n  unsigned short finger_print_cnt[FINGER_PRINT_SIZE];\n  int finger_print_cnt_sum;\n  uint32_t repetitive_seed_length;\n  int activated;\n};\n\nclass mm_cache {\n private:\n  int cache_size;\n  struct _mm_cache_entry *cache;\n  int num_locks_for_cache = 1000;\n  omp_lock_t entry_locks_omp[1000];\n  std::mutex print_lock;\n  int kmer_length;\n  int update_limit;\n  int saturate_count;\n  uint64_t *\n      head_mm;  // the first and last minimizer for each cached minimizer vector\n\n  // 0: not match. -1: opposite order. 
1: same order\n  int IsMinimizersMatchCache(const std::vector<Minimizer> &minimizers,\n                             const struct _mm_cache_entry &cache) {\n    if (cache.minimizers.size() != minimizers.size()) return 0;\n    int size = minimizers.size();\n    int i, j;\n    int direction = 0;\n    for (i = 0; i < size; ++i) {\n      if (cache.minimizers[i] != minimizers[i].GetHash() ||\n          (minimizers[i].GetHit() & 1) != cache.strands[i])\n        break;\n    }\n    if (i >= size) {\n      for (i = 0; i < size - 1; ++i) {\n        if (cache.offsets[i] != ((int)minimizers[i + 1].GetSequencePosition() -\n                                 (int)minimizers[i].GetSequencePosition()))\n          break;\n      }\n      if (i >= size - 1) direction = 1;\n    }\n\n    if (direction == 1) return 1;\n\n    for (i = 0, j = size - 1; i < size; ++i, --j) {\n      if (cache.minimizers[i] != minimizers[j].GetHash() ||\n          (minimizers[j].GetHit() & 1) == cache.strands[i])\n        break;\n    }\n    if (i >= size) {\n      for (i = 0, j = size - 1; i < size - 1; ++i, --j) {\n        if (cache.offsets[i] !=\n            ((int)minimizers[j].GetSequencePosition()) -\n                ((int)minimizers[j - 1].GetSequencePosition()))\n          break;\n      }\n\n      if (i >= size - 1) {\n        direction = -1;\n      }\n    }\n    return direction;\n  }\n\n public:\n  mm_cache(int size) {\n    cache = new struct _mm_cache_entry[size];\n    head_mm = new uint64_t[HEAD_MM_ARRAY_SIZE];\n    cache_size = size;\n    // memset(cache, 0, sizeof(cache[0]) * size) ;\n    for (int i = 0; i < size; ++i) {\n      cache[i].weight = 0;\n      memset(cache[i].finger_print_cnt, 0,\n             sizeof(unsigned short) * FINGER_PRINT_SIZE);\n      cache[i].finger_print_cnt_sum = 0;\n      cache[i].activated = 0;\n    }\n    memset(head_mm, 0, sizeof(uint64_t) * HEAD_MM_ARRAY_SIZE);\n    update_limit = 10;\n    saturate_count = 100;\n\n    // initialize the array of OpenMP locks\n    for 
(int i = 0; i < num_locks_for_cache; ++i) {\n        omp_init_lock(&entry_locks_omp[i]);\n    }\n  }\n\n  ~mm_cache() {\n    delete[] cache;\n    delete[] head_mm;\n  \n    // destroy OpenMP locks for parallelizing cache update\n    for (int i = 0; i < num_locks_for_cache; ++i) {\n      omp_destroy_lock(&entry_locks_omp[i]);\n    }\n  }\n\n  void SetKmerLength(int kl) { kmer_length = kl; }\n\n  // Return the hash entry index. -1 if failed.\n  int Query(MappingMetadata &mapping_metadata, uint32_t read_len) {\n    const std::vector<Minimizer> &minimizers = mapping_metadata.minimizers_;\n    std::vector<Candidate> &pos_candidates =\n        mapping_metadata.positive_candidates_;\n    std::vector<Candidate> &neg_candidates =\n        mapping_metadata.negative_candidates_;\n    uint32_t &repetitive_seed_length = mapping_metadata.repetitive_seed_length_;\n\n    int i;\n    int msize = minimizers.size();\n    if (msize == 0) return -1;\n    if ((head_mm[(minimizers[0].GetHash() >> 6) & HEAD_MM_ARRAY_MASK] &\n         (1ull << (minimizers[0].GetHash() & 0x3f))) == 0)\n      return -1;\n    uint64_t h = 0;\n    // for (i = 0 ; i < msize; ++i)\n    //  h += (minimizers[i].first);\n    if (msize == 1) {\n      h = (minimizers[0].GetHash());\n    } else {\n      h = minimizers[0].GetHash() + minimizers[msize - 1].GetHash();\n    }\n\n    int hidx = h % cache_size;\n    int direction = IsMinimizersMatchCache(minimizers, cache[hidx]);\n    if (direction == 1) {\n      pos_candidates = cache[hidx].positive_candidates;\n      neg_candidates = cache[hidx].negative_candidates;\n      repetitive_seed_length = cache[hidx].repetitive_seed_length;\n      int size = pos_candidates.size();\n      int shift = (int)minimizers[0].GetSequencePosition();\n      for (i = 0; i < size; ++i) {\n        uint64_t rid = pos_candidates[i].position >> 32;\n        int rpos = (int)pos_candidates[i].position;\n        pos_candidates[i].position = (rid << 32) + (uint32_t)(rpos - shift);\n      }\n      
size = neg_candidates.size();\n      for (i = 0; i < size; ++i) neg_candidates[i].position += shift;\n      return hidx;\n    } else if (direction == -1) {  // The \"read\" is on the other direction of\n                                   // the cached \"read\"\n      int size = cache[hidx].negative_candidates.size();\n      // Start position of the last minimizer should equal the first minimizer's\n      // end position in rc \"read\".\n      int shift = read_len -\n                  ((int)minimizers[msize - 1].GetSequencePosition()) - 1 +\n                  kmer_length - 1;\n\n      pos_candidates = cache[hidx].negative_candidates;\n      for (i = 0; i < size; ++i) {\n        uint64_t rid = cache[hidx].negative_candidates[i].position >> 32;\n        int rpos = (int)cache[hidx].negative_candidates[i].position;\n        pos_candidates[i].position =\n            (rid << 32) + (uint32_t)(rpos + shift - read_len + 1);\n      }\n      size = cache[hidx].positive_candidates.size();\n      neg_candidates = cache[hidx].positive_candidates;\n      for (i = 0; i < size; ++i)\n        neg_candidates[i].position =\n            cache[hidx].positive_candidates[i].position - shift + read_len - 1;\n      repetitive_seed_length = cache[hidx].repetitive_seed_length;\n\n      return hidx;\n    } else {\n      return -1;\n    }\n  }\n\n  void Update(const std::vector<Minimizer> &minimizers,\n              std::vector<Candidate> &pos_candidates,\n              std::vector<Candidate> &neg_candidates,\n              uint32_t repetitive_seed_length,\n              bool debug=false) {\n    int i;\n    int msize = minimizers.size();\n\n    uint64_t h = 0;  // for hash\n    uint64_t f = 0;  // for finger printing\n\n    if (msize == 0)\n      return;\n    else if (msize == 1) {\n      h = f = (minimizers[0].GetHash());\n    } else {\n      h = minimizers[0].GetHash() + minimizers[msize - 1].GetHash();\n      f = minimizers[0].GetHash() ^ minimizers[msize - 1].GetHash();\n    }\n    int hidx = 
h % cache_size;\n    int finger_print = f % FINGER_PRINT_SIZE;\n\n    // beginning of locking phase - make sure to release it wherever we exit\n    int lock_index = hidx % num_locks_for_cache;\n    omp_set_lock(&entry_locks_omp[lock_index]);  \n\n    ++cache[hidx].finger_print_cnt[finger_print];\n    ++cache[hidx].finger_print_cnt_sum;\n\n    // case 1: already saturated\n    if (cache[hidx].finger_print_cnt_sum > saturate_count){ \n      omp_unset_lock(&entry_locks_omp[lock_index]);\n      return;\n    }\n\n    // case 2: no heavy hitter or not enough yet\n    if (cache[hidx].finger_print_cnt_sum < 10 ||\n        (int)cache[hidx].finger_print_cnt[finger_print] * 5 <\n            cache[hidx].finger_print_cnt_sum) {\n      omp_unset_lock(&entry_locks_omp[lock_index]);\n      return;\n    }\n\n    int direction = IsMinimizersMatchCache(minimizers, cache[hidx]);\n    if (direction != 0)\n      ++cache[hidx].weight;\n    else\n      --cache[hidx].weight;\n    cache[hidx].activated = 1;\n\n    // Renew the cache\n    if (cache[hidx].weight < 0) {\n      cache[hidx].weight = 1;\n      cache[hidx].minimizers.resize(msize);\n\n      if (msize == 0) {\n        cache[hidx].offsets.clear();\n        cache[hidx].strands.clear();\n        omp_unset_lock(&entry_locks_omp[lock_index]);\n        return;\n      }\n\n      int size = pos_candidates.size();\n      int shift = (int)minimizers[0].GetSequencePosition();\n\n      // Do not cache if it is too near the start.\n      for (i = 0; i < size; ++i)\n        if ((int)pos_candidates[i].position < kmer_length + shift) {\n          cache[hidx].offsets.clear();\n          cache[hidx].strands.clear();\n          cache[hidx].minimizers.clear();\n\n          omp_unset_lock(&entry_locks_omp[lock_index]);\n          return;\n        }\n\n      size = neg_candidates.size();\n      for (i = 0; i < size; ++i)\n        if ((int)neg_candidates[i].position -\n                ((int)minimizers[msize - 1].GetSequencePosition()) <\n            
kmer_length + shift) {\n          cache[hidx].offsets.clear();\n          cache[hidx].strands.clear();\n          cache[hidx].minimizers.clear();\n\n          omp_unset_lock(&entry_locks_omp[lock_index]);\n          return;\n        }\n      cache[hidx].offsets.resize(msize - 1);\n      cache[hidx].strands.resize(msize);\n      for (i = 0; i < msize; ++i) {\n        cache[hidx].minimizers[i] = minimizers[i].GetHash();\n        cache[hidx].strands[i] = (minimizers[i].GetHit() & 1);\n      }\n      for (i = 0; i < msize - 1; ++i) {\n        cache[hidx].offsets[i] =\n            ((int)minimizers[i + 1].GetSequencePosition()) -\n            ((int)minimizers[i].GetSequencePosition());\n      }\n      std::vector<Candidate>().swap(cache[hidx].positive_candidates);\n      std::vector<Candidate>().swap(cache[hidx].negative_candidates);\n      cache[hidx].positive_candidates = pos_candidates;\n      cache[hidx].negative_candidates = neg_candidates;\n      cache[hidx].repetitive_seed_length = repetitive_seed_length;\n\n      // adjust the candidate position.\n      size = cache[hidx].positive_candidates.size();\n      for (i = 0; i < size; ++i)\n        cache[hidx].positive_candidates[i].position += shift;\n      size = cache[hidx].negative_candidates.size();\n      for (i = 0; i < size; ++i)\n        cache[hidx].negative_candidates[i].position -= shift;\n\n      // Debugging output (candidate stored in cache)\n      if (debug) {\n        print_lock.lock();\n        std::cout << \"[DEBUG][CACHE][1] hidx = \" << hidx << std::endl;\n        std::cout << \"[DEBUG][CACHE][2]\" << \" pos.size() = \" \n                                << cache[hidx].positive_candidates.size() \n                                << \" , \" << \"neg.size() = \" \n                                << cache[hidx].negative_candidates.size()  \n                                << \" , msize = \" << msize << std::endl;\n        std::cout << \"[DEBUG][CACHE][3] \";\n        for (const auto &minimizer : 
minimizers) {\n          std::cout << minimizer.GetHash() << \" \";\n        } std::cout << std::endl;\n\n        for (size_t j = 0; j < cache[hidx].positive_candidates.size(); ++j) {\n          std::cout << \"[DEBUG][CACHE][+] \" \n                    << \"hidx = \" << hidx\n                    << \" , cand_ref_seq = \" << cache[hidx].positive_candidates[j].GetReferenceSequenceIndex() \n                    << \" , cand_ref_pos = \" << cache[hidx].positive_candidates[j].GetReferenceSequencePosition()\n                    << \" , support = \" << unsigned(cache[hidx].positive_candidates[j].GetCount()) << std::endl;\n        }\n\n        for (size_t j = 0; j < cache[hidx].negative_candidates.size(); ++j) {\n          std::cout << \"[DEBUG][CACHE][-] \" \n                    << \"hidx = \" << hidx\n                    << \" , cand_ref_seq = \" << cache[hidx].negative_candidates[j].GetReferenceSequenceIndex() \n                    << \" , cand_ref_pos = \" << cache[hidx].negative_candidates[j].GetReferenceSequencePosition() \n                    << \" , support = \" << unsigned(cache[hidx].negative_candidates[j].GetCount()) << std::endl;\n        }\n        print_lock.unlock();\n      }\n\n      // Update head mm array\n      head_mm[(minimizers[0].GetHash() >> 6) & HEAD_MM_ARRAY_MASK] |=\n          (1ull << (minimizers[0].GetHash() & 0x3f));\n      head_mm[(minimizers[msize - 1].GetHash() >> 6) & HEAD_MM_ARRAY_MASK] |=\n          (1ull << (minimizers[msize - 1].GetHash() & 0x3f));\n    }\n    omp_unset_lock(&entry_locks_omp[lock_index]);\n  }\n\n  void DirectUpdateWeight(int idx, int weight) { cache[idx].weight += weight; }\n\n  uint64_t GetMemoryBytes() {\n    int i;\n    uint64_t ret = 0;\n    for (i = 0; i < cache_size; ++i) {\n      ret += sizeof(cache[i]) +\n             cache[i].minimizers.capacity() * sizeof(uint64_t) +\n             cache[i].offsets.capacity() * sizeof(int) +\n             cache[i].positive_candidates.capacity() * sizeof(Candidate) +\n          
   cache[i].negative_candidates.capacity() * sizeof(Candidate);\n    }\n    return ret;\n  }\n\n  // How many reads from a batch we want to use to update the cache.\n  // Paired-end data has twice as many reads, so the threshold is lower.\n  uint32_t GetUpdateThreshold(uint32_t num_loaded_reads, \n                              uint64_t num_reads,\n                              bool paired,\n                              double cache_update_param\n                              ) {\n    const uint32_t block = paired ? 2500000 : 5000000;    \n\n    if (num_reads <= block)\n      return num_loaded_reads;\n    else\n      return num_loaded_reads / (1 + (cache_update_param * (num_reads / block)));\n  }\n\n  void PrintStats() {\n    for (int i = 0; i < cache_size; ++i) {\n      printf(\"%d %d %d %d \", cache[i].weight, cache[i].finger_print_cnt_sum,\n             int(cache[i].positive_candidates.size() +\n                 cache[i].negative_candidates.size()),\n             cache[i].activated);\n      int tmp = 0;\n      for (int j = 0; j < FINGER_PRINT_SIZE; ++j)\n        if (cache[i].finger_print_cnt[j] > tmp)\n          tmp = cache[i].finger_print_cnt[j];\n      printf(\"%d\", tmp);\n      for (int j = 0; j < FINGER_PRINT_SIZE; ++j)\n        printf(\" %u\", cache[i].finger_print_cnt[j]);\n      printf(\"\\n\");\n    }\n  }\n};\n}  // namespace chromap\n\n#endif\n"
  },
  {
    "path": "src/paf_mapping.h",
    "content": "#ifndef PAFMAPPING_H_\n#define PAFMAPPING_H_\n\n#include <string>\n\n#include \"mapping.h\"\n\nnamespace chromap {\n\n// When direction = 1, strand is positive\nclass PAFMapping : public Mapping {\n public:\n  uint32_t read_id_;\n  std::string read_name_;\n  uint16_t read_length_;\n  uint32_t fragment_start_position_;\n  uint16_t fragment_length_;\n  uint8_t mapq_ : 6, direction_ : 1, is_unique_ : 1;\n  uint8_t num_dups_;\n  PAFMapping() : num_dups_(0) {}\n  PAFMapping(uint32_t read_id, const std::string &read_name,\n             uint16_t read_length, uint32_t fragment_start_position,\n             uint16_t fragment_length, uint8_t mapq, uint8_t direction,\n             uint8_t is_unique, uint8_t num_dups)\n      : read_id_(read_id),\n        read_name_(read_name),\n        read_length_(read_length),\n        fragment_start_position_(fragment_start_position),\n        fragment_length_(fragment_length),\n        mapq_(mapq),\n        direction_(direction),\n        is_unique_(is_unique),\n        num_dups_(num_dups){};\n  bool operator<(const PAFMapping &m) const {\n    return std::tie(fragment_start_position_, fragment_length_, mapq_,\n                    direction_, is_unique_, read_id_, read_length_) <\n           std::tie(m.fragment_start_position_, m.fragment_length_, m.mapq_,\n                    m.direction_, m.is_unique_, m.read_id_, m.read_length_);\n  }\n  bool operator==(const PAFMapping &m) const {\n    return std::tie(fragment_start_position_) ==\n           std::tie(m.fragment_start_position_);\n  }\n  bool IsSamePosition(const PAFMapping &m) const {\n    return std::tie(fragment_start_position_) ==\n           std::tie(m.fragment_start_position_);\n  }\n  uint64_t GetBarcode() const { return 0; }\n  void Tn5Shift() {\n    if (direction_ == 1) {\n      fragment_start_position_ += 4;\n    } else {\n      fragment_length_ -= 5;\n    }\n  }\n  bool IsPositiveStrand() const { return direction_ > 0 ? 
true : false; }\n  uint32_t GetStartPosition() const {  // inclusive\n    return fragment_start_position_;\n  }\n  uint32_t GetEndPosition() const {  // exclusive\n    return fragment_start_position_ + fragment_length_;\n  }\n  uint16_t GetByteSize() const {\n    return 2 * sizeof(uint32_t) + 2 * sizeof(uint16_t) + 2 * sizeof(uint8_t) +\n           read_name_.length() * sizeof(char);\n  }\n  size_t WriteToFile(FILE *temp_mapping_output_file) const {\n    size_t num_written_bytes = 0;\n    num_written_bytes +=\n        fwrite(&read_id_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    uint16_t read_name_length = read_name_.length();\n    num_written_bytes += fwrite(&read_name_length, sizeof(uint16_t), 1,\n                                temp_mapping_output_file);\n    num_written_bytes += fwrite(read_name_.data(), sizeof(char),\n                                read_name_length, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&read_length_, sizeof(uint16_t), 1, temp_mapping_output_file);\n    num_written_bytes += fwrite(&fragment_start_position_, sizeof(uint32_t), 1,\n                                temp_mapping_output_file);\n    num_written_bytes += fwrite(&fragment_length_, sizeof(uint16_t), 1,\n                                temp_mapping_output_file);\n    uint8_t mapq_direction_is_unique =\n        (mapq_ << 2) | (direction_ << 1) | is_unique_;\n    num_written_bytes += fwrite(&mapq_direction_is_unique, sizeof(uint8_t), 1,\n                                temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&num_dups_, sizeof(uint8_t), 1, temp_mapping_output_file);\n    return num_written_bytes;\n  }\n  size_t LoadFromFile(FILE *temp_mapping_output_file) {\n    size_t num_read_bytes = 0;\n    num_read_bytes +=\n        fread(&read_id_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    uint16_t read_name_length = 0;\n    num_read_bytes +=\n        fread(&read_name_length, sizeof(uint16_t), 1, 
temp_mapping_output_file);\n    read_name_ = std::string(read_name_length, '\\0');\n    num_read_bytes += fread(&(read_name_[0]), sizeof(char), read_name_length,\n                            temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&read_length_, sizeof(uint16_t), 1, temp_mapping_output_file);\n    num_read_bytes += fread(&fragment_start_position_, sizeof(uint32_t), 1,\n                            temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&fragment_length_, sizeof(uint16_t), 1, temp_mapping_output_file);\n    uint8_t mapq_direction_is_unique = 0;\n    num_read_bytes += fread(&mapq_direction_is_unique, sizeof(uint8_t), 1,\n                            temp_mapping_output_file);\n    mapq_ = (mapq_direction_is_unique >> 2);\n    direction_ = (mapq_direction_is_unique >> 1) & 1;\n    is_unique_ = mapq_direction_is_unique & 1;\n    num_read_bytes +=\n        fread(&num_dups_, sizeof(uint8_t), 1, temp_mapping_output_file);\n    return num_read_bytes;\n  }\n};\n\nclass PairedPAFMapping : public Mapping {\n public:\n  uint32_t read_id_;\n  std::string read1_name_;\n  std::string read2_name_;\n  uint16_t read1_length_;\n  uint16_t read2_length_;\n  uint32_t fragment_start_position_;\n  uint16_t fragment_length_;\n  uint16_t positive_alignment_length_;\n  uint16_t negative_alignment_length_;\n  uint8_t mapq_;\n  uint16_t mapq1_ : 6, mapq2_ : 6, direction_ : 1, is_unique_ : 1,\n      reserved_ : 2;\n  uint8_t num_dups_;\n  // uint8_t mapq; // least significant bit saves the direction of mapping\n  PairedPAFMapping() : num_dups_(0) {}\n  PairedPAFMapping(uint32_t read_id, std::string read1_name,\n                   std::string read2_name, uint16_t read1_length,\n                   uint16_t read2_length, uint32_t fragment_start_position,\n                   uint16_t fragment_length, uint16_t positive_alignment_length,\n                   uint16_t negative_alignment_length, uint8_t mapq,\n                   uint16_t mapq1, uint16_t 
mapq2, uint16_t direction,\n                   uint16_t is_unique, uint8_t num_dups)\n      : read_id_(read_id),\n        read1_name_(read1_name),\n        read2_name_(read2_name),\n        read1_length_(read1_length),\n        read2_length_(read2_length),\n        fragment_start_position_(fragment_start_position),\n        fragment_length_(fragment_length),\n        positive_alignment_length_(positive_alignment_length),\n        negative_alignment_length_(negative_alignment_length),\n        mapq_(mapq),\n        mapq1_(mapq1),\n        mapq2_(mapq2),\n        direction_(direction),\n        is_unique_(is_unique),\n        num_dups_(num_dups) {}\n  bool operator<(const PairedPAFMapping &m) const {\n    return std::tie(fragment_start_position_, fragment_length_, mapq1_, mapq2_,\n                    direction_, is_unique_, read_id_,\n                    positive_alignment_length_, negative_alignment_length_) <\n           std::tie(m.fragment_start_position_, m.fragment_length_, m.mapq1_,\n                    m.mapq2_, m.direction_, m.is_unique_, m.read_id_,\n                    m.positive_alignment_length_, m.negative_alignment_length_);\n  }\n  bool operator==(const PairedPAFMapping &m) const {\n    return std::tie(fragment_start_position_, fragment_length_) ==\n           std::tie(m.fragment_start_position_, m.fragment_length_);\n  }\n  bool IsSamePosition(const PairedPAFMapping &m) const {\n    return std::tie(fragment_start_position_, fragment_length_) ==\n           std::tie(m.fragment_start_position_, m.fragment_length_);\n  }\n  uint64_t GetBarcode() const { return 0; }\n  void Tn5Shift() {\n    fragment_start_position_ += 4;\n    positive_alignment_length_ -= 4;\n    fragment_length_ -= 9;\n    negative_alignment_length_ -= 5;\n  }\n  bool IsPositiveStrand() const { return direction_ > 0 ? 
true : false; }\n  uint32_t GetStartPosition() const {  // inclusive\n    return fragment_start_position_;\n  }\n  uint32_t GetEndPosition() const {  // exclusive\n    return fragment_start_position_ + fragment_length_;\n  }\n  uint16_t GetByteSize() const {\n    return 2 * sizeof(uint32_t) + 6 * sizeof(uint16_t) + 2 * sizeof(uint8_t) +\n           (read1_name_.length() + read2_name_.length()) * sizeof(char);\n  }\n  size_t WriteToFile(FILE *temp_mapping_output_file) const {\n    size_t num_written_bytes = 0;\n    num_written_bytes +=\n        fwrite(&read_id_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    uint16_t read1_name_length = read1_name_.length();\n    num_written_bytes += fwrite(&read1_name_length, sizeof(uint16_t), 1,\n                                temp_mapping_output_file);\n    num_written_bytes += fwrite(read1_name_.data(), sizeof(char),\n                                read1_name_length, temp_mapping_output_file);\n    uint16_t read2_name_length = read2_name_.length();\n    num_written_bytes += fwrite(&read2_name_length, sizeof(uint16_t), 1,\n                                temp_mapping_output_file);\n    num_written_bytes += fwrite(read2_name_.data(), sizeof(char),\n                                read2_name_length, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&read1_length_, sizeof(uint16_t), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&read2_length_, sizeof(uint16_t), 1, temp_mapping_output_file);\n    num_written_bytes += fwrite(&fragment_start_position_, sizeof(uint32_t), 1,\n                                temp_mapping_output_file);\n    num_written_bytes += fwrite(&fragment_length_, sizeof(uint16_t), 1,\n                                temp_mapping_output_file);\n    num_written_bytes += fwrite(&positive_alignment_length_, sizeof(uint16_t),\n                                1, temp_mapping_output_file);\n    num_written_bytes += fwrite(&negative_alignment_length_, sizeof(uint16_t),\n  
                              1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&mapq_, sizeof(uint8_t), 1, temp_mapping_output_file);\n    uint16_t mapq1_mapq2_direction_is_unique =\n        (mapq1_ << 10) | (mapq2_ << 4) | (direction_ << 3) | (is_unique_ << 2);\n    num_written_bytes += fwrite(&mapq1_mapq2_direction_is_unique,\n                                sizeof(uint16_t), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&num_dups_, sizeof(uint8_t), 1, temp_mapping_output_file);\n    return num_written_bytes;\n  }\n  size_t LoadFromFile(FILE *temp_mapping_output_file) {\n    size_t num_read_bytes = 0;\n    num_read_bytes +=\n        fread(&read_id_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    uint16_t read1_name_length = 0;\n    num_read_bytes += fread(&read1_name_length, sizeof(uint16_t), 1,\n                            temp_mapping_output_file);\n    read1_name_ = std::string(read1_name_length, '\\0');\n    num_read_bytes += fread(&(read1_name_[0]), sizeof(char), read1_name_length,\n                            temp_mapping_output_file);\n    uint16_t read2_name_length = 0;\n    num_read_bytes += fread(&read2_name_length, sizeof(uint16_t), 1,\n                            temp_mapping_output_file);\n    read2_name_ = std::string(read2_name_length, '\\0');\n    num_read_bytes += fread(&(read2_name_[0]), sizeof(char), read2_name_length,\n                            temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&read1_length_, sizeof(uint16_t), 1, temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&read2_length_, sizeof(uint16_t), 1, temp_mapping_output_file);\n    num_read_bytes += fread(&fragment_start_position_, sizeof(uint32_t), 1,\n                            temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&fragment_length_, sizeof(uint16_t), 1, temp_mapping_output_file);\n    num_read_bytes += fread(&positive_alignment_length_, sizeof(uint16_t), 1,\n     
                       temp_mapping_output_file);\n    num_read_bytes += fread(&negative_alignment_length_, sizeof(uint16_t), 1,\n                            temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&mapq_, sizeof(uint8_t), 1, temp_mapping_output_file);\n    uint16_t mapq1_mapq2_direction_is_unique = 0;\n    num_read_bytes += fread(&mapq1_mapq2_direction_is_unique, sizeof(uint16_t),\n                            1, temp_mapping_output_file);\n    mapq1_ = (mapq1_mapq2_direction_is_unique >> 10);\n    mapq2_ = ((mapq1_mapq2_direction_is_unique << 6) >> 10);\n    direction_ = (mapq1_mapq2_direction_is_unique >> 3) & 1;\n    is_unique_ = (mapq1_mapq2_direction_is_unique >> 2) & 1;\n    num_read_bytes +=\n        fread(&num_dups_, sizeof(uint8_t), 1, temp_mapping_output_file);\n    return num_read_bytes;\n  }\n};\n\n}  // namespace chromap\n\n#endif  // PAFMAPPING_H_\n"
  },
  {
    "path": "src/paired_end_mapping_metadata.h",
    "content": "#ifndef PAIRED_END_MAPPING_METADATA_H_\n#define PAIRED_END_MAPPING_METADATA_H_\n\n#include <algorithm>\n#include <utility>\n#include <vector>\n\n#include \"mapping_metadata.h\"\n\nnamespace chromap {\n\nclass PairedEndMappingMetadata {\n public:\n  inline void PreparedForMappingNextReadPair(int reserve_size) {\n    mapping_metadata1_.PrepareForMappingNextRead(reserve_size);\n    mapping_metadata2_.PrepareForMappingNextRead(reserve_size);\n\n    F1R2_best_mappings_.clear();\n    F2R1_best_mappings_.clear();\n    F1F2_best_mappings_.clear();\n    R1R2_best_mappings_.clear();\n\n    F1R2_best_mappings_.reserve(reserve_size);\n    F2R1_best_mappings_.reserve(reserve_size);\n    F1F2_best_mappings_.reserve(reserve_size);\n    R1R2_best_mappings_.reserve(reserve_size);\n  }\n\n  inline void MoveCandidiatesToBuffer() {\n    mapping_metadata1_.MoveCandidiatesToBuffer();\n    mapping_metadata2_.MoveCandidiatesToBuffer();\n  }\n\n  // Callback function to update all candidates.\n  inline void UpdateCandidates(void (*Update)(std::vector<Candidate> &)) {\n    mapping_metadata1_.UpdateCandidates(Update);\n    mapping_metadata2_.UpdateCandidates(Update);\n  }\n\n  inline void SortMappingsByPositions() {\n    mapping_metadata1_.SortMappingsByPositions();\n    mapping_metadata2_.SortMappingsByPositions();\n  }\n  // inline void ClearAndReserveMinimizers(int reserve_size) {\n  //  mapping_metadata1_.minimizers_.clear();\n  //  mapping_metadata2_.minimizers_.clear();\n  //  mapping_metadata1_.minimizers_.reserve(reserve_size);\n  //  mapping_metadata2_.minimizers_.reserve(reserve_size);\n  //}\n\n  inline bool BothEndsHaveMinimizers() const {\n    return !mapping_metadata1_.minimizers_.empty() &&\n           !mapping_metadata2_.minimizers_.empty();\n  }\n\n  inline int GetMinSumErrors() const { return min_sum_errors_; }\n  inline int GetSecondMinSumErrors() const { return second_min_sum_errors_; }\n  inline int GetNumBestMappings() const { return num_best_mappings_; 
}\n  inline int GetNumSecondBestMappings() const {\n    return num_second_best_mappings_;\n  }\n\n  // TODO: think how to deal with the code copy.\n  inline const std::vector<std::pair<uint32_t, uint32_t>> &GetBestMappings(\n      const Strand first_mapping_strand,\n      const Strand second_mapping_strand) const {\n    if (first_mapping_strand == kPositive) {\n      if (second_mapping_strand == kPositive) {\n        return F1F2_best_mappings_;\n      }\n      return F1R2_best_mappings_;\n    } else {\n      if (second_mapping_strand == kPositive) {\n        return F2R1_best_mappings_;\n      }\n      return R1R2_best_mappings_;\n    }\n  }\n\n  inline std::vector<std::pair<uint32_t, uint32_t>> &GetBestMappings(\n      const Strand first_mapping_strand, const Strand second_mapping_strand) {\n    if (first_mapping_strand == kPositive) {\n      if (second_mapping_strand == kPositive) {\n        return F1F2_best_mappings_;\n      }\n      return F1R2_best_mappings_;\n    } else {\n      if (second_mapping_strand == kPositive) {\n        return F2R1_best_mappings_;\n      }\n      return R1R2_best_mappings_;\n    }\n  }\n\n  inline void SetMinSumErrors(int min_sum_errors) {\n    min_sum_errors_ = min_sum_errors;\n  }\n  inline void SetSecondMinSumErrors(int second_min_sum_errors) {\n    second_min_sum_errors_ = second_min_sum_errors;\n  }\n  inline void SetNumBestMappings(int num_best_mappings) {\n    num_best_mappings_ = num_best_mappings;\n  }\n  inline void SetNumSecondBestMappings(int num_second_best_mappings) {\n    num_second_best_mappings_ = num_second_best_mappings;\n  }\n\n protected:\n  MappingMetadata mapping_metadata1_;\n  MappingMetadata mapping_metadata2_;\n\n  int min_sum_errors_, second_min_sum_errors_;\n  int num_best_mappings_, num_second_best_mappings_;\n\n  std::vector<std::pair<uint32_t, uint32_t>> F1R2_best_mappings_;\n  std::vector<std::pair<uint32_t, uint32_t>> F2R1_best_mappings_;\n  std::vector<std::pair<uint32_t, uint32_t>> 
F1F2_best_mappings_;\n  std::vector<std::pair<uint32_t, uint32_t>> R1R2_best_mappings_;\n\n  friend class CandidateProcessor;\n  template <typename MappingRecord>\n  friend class MappingGenerator;\n  friend class Chromap;\n};\n\n}  // namespace chromap\n\n#endif  // PAIRED_END_MAPPING_METADATA_H_\n"
  },
  {
    "path": "src/pairs_mapping.h",
    "content": "#ifndef PAIRSMAPPING_H_\n#define PAIRSMAPPING_H_\n\n#include <string>\n\n#include \"mapping.h\"\n\nnamespace chromap {\n\n// Format for pairtools for HiC data.\nclass PairsMapping : public Mapping {\n public:\n  uint32_t read_id_;\n  std::string read_name_;\n  uint64_t cell_barcode_;\n  int rid1_;\n  int rid2_;\n  uint32_t pos1_;\n  uint32_t pos2_;\n  int strand1_;  // 1-positive. 0-negative\n  int strand2_;\n  uint16_t mapq_ : 8, is_unique_ : 1, num_dups_ : 7;\n\n  PairsMapping() : num_dups_(0) {}\n  PairsMapping(uint32_t read_id, std::string read_name, uint64_t cell_barcode,\n               int rid1, int rid2, uint32_t pos1, uint32_t pos2, int strand1,\n               int strand2, uint8_t mapq, uint8_t is_unique, uint8_t num_dups)\n      : read_id_(read_id),\n        read_name_(read_name),\n        cell_barcode_(cell_barcode),\n        rid1_(rid1),\n        rid2_(rid2),\n        pos1_(pos1),\n        pos2_(pos2),\n        strand1_(strand1),\n        strand2_(strand2),\n        mapq_(mapq),\n        is_unique_(is_unique),\n        num_dups_(num_dups) {}\n  bool operator<(const PairsMapping &m) const {\n    return std::tie(rid1_, rid2_, pos1_, pos2_, mapq_, read_id_) <\n           std::tie(m.rid1_, m.rid2_, m.pos1_, m.pos2_, m.mapq_, m.read_id_);\n  }\n  bool operator==(const PairsMapping &m) const {\n    return std::tie(rid1_, pos1_, rid2_, pos2_) ==\n           std::tie(m.rid1_, m.pos1_, m.rid2_, m.pos2_);\n    // return std::tie(pos1, pos2, rid1, rid2, is_rev1, is_rev2) ==\n    // std::tie(m.pos1, m.pos2, m.rid1, m.rid2, m.is_rev1, m.is_rev2);\n  }\n  bool IsSamePosition(const PairsMapping &m) const {\n    return std::tie(rid1_, pos1_, rid2_, pos2_) ==\n           std::tie(m.rid1_, m.pos1_, m.rid2_, m.pos2_);\n  }\n  uint64_t GetBarcode() const { return 0; }\n  void Tn5Shift() {\n    // We don't support Tn5 shift in SAM format because it has other fields that\n    // depend mapping position.\n  }\n\n  int GetPosition(int idx) const {\n    if (idx 
== 2) {\n      return pos2_ + 1;\n    }\n    return pos1_ + 1;\n  }\n\n  char GetStrand(int idx) const {\n    int d = strand1_;\n    if (idx == 2) {\n      d = strand2_;\n    }\n    return d > 0 ? '+' : '-';\n  }\n\n  bool IsPositiveStrand() const { return strand1_ > 0 ? true : false; }\n  uint32_t GetStartPosition() const {  // inclusive\n    return pos1_;\n  }\n  uint32_t GetEndPosition() const {  // exclusive\n    return pos2_;\n  }\n  uint16_t GetByteSize() const {\n    return 5 * sizeof(uint32_t) + 1 * sizeof(uint16_t) + 4 * sizeof(int) +\n           read_name_.length() * sizeof(char);\n  }\n  size_t WriteToFile(FILE *temp_mapping_output_file) const {\n    size_t num_written_bytes = 0;\n    num_written_bytes +=\n        fwrite(&read_id_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    uint16_t read_name_length = read_name_.length();\n    num_written_bytes += fwrite(&read_name_length, sizeof(uint16_t), 1,\n                                temp_mapping_output_file);\n    num_written_bytes += fwrite(read_name_.data(), sizeof(char),\n                                read_name_length, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&cell_barcode_, sizeof(uint64_t), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&rid1_, sizeof(int), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&rid2_, sizeof(int), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&pos1_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&pos2_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&strand1_, sizeof(int), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&strand2_, sizeof(int), 1, temp_mapping_output_file);\n    uint16_t mapq_unique_dups = (mapq_ << 8) | (is_unique_ << 7) | num_dups_;\n    num_written_bytes += fwrite(&mapq_unique_dups, sizeof(uint16_t), 1,\n                                
temp_mapping_output_file);\n    return num_written_bytes;\n  }\n  size_t LoadFromFile(FILE *temp_mapping_output_file) {\n    size_t num_read_bytes = 0;\n    num_read_bytes +=\n        fread(&read_id_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    uint16_t read_name_length = 0;\n    num_read_bytes +=\n        fread(&read_name_length, sizeof(uint16_t), 1, temp_mapping_output_file);\n    read_name_ = std::string(read_name_length, '\\0');\n    num_read_bytes += fread(&(read_name_[0]), sizeof(char), read_name_length,\n                            temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&cell_barcode_, sizeof(uint64_t), 1, temp_mapping_output_file);\n    num_read_bytes += fread(&rid1_, sizeof(int), 1, temp_mapping_output_file);\n    num_read_bytes += fread(&rid2_, sizeof(int), 1, temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&pos1_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&pos2_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&strand1_, sizeof(int), 1, temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&strand2_, sizeof(int), 1, temp_mapping_output_file);\n    uint16_t mapq_unique_dups = 0;\n    num_read_bytes +=\n        fread(&mapq_unique_dups, sizeof(uint16_t), 1, temp_mapping_output_file);\n    mapq_ = (mapq_unique_dups >> 8);\n    is_unique_ = (mapq_unique_dups >> 7) & 1;\n    num_dups_ = ((mapq_unique_dups << 9) >> 9);\n    return num_read_bytes;\n  }\n};\n\n}  // namespace chromap\n\n#endif  // PAIRSMAPPING_H_\n"
  },
  {
    "path": "src/sam_mapping.h",
    "content": "#ifndef SAMMAPPING_H_\n#define SAMMAPPING_H_\n\n#include <string>\n#include <tuple>\n#include <vector>\n\n#include \"mapping.h\"\n\nnamespace chromap {\n\n/****************************\n **** CIGAR related macros ***\n *****************************/\n\n#define BAM_CMATCH 0\n#define BAM_CINS 1\n#define BAM_CDEL 2\n#define BAM_CREF_SKIP 3\n#define BAM_CSOFT_CLIP 4\n#define BAM_CHARD_CLIP 5\n#define BAM_CPAD 6\n#define BAM_CEQUAL 7\n#define BAM_CDIFF 8\n#define BAM_CBACK 9\n\n#define BAM_CIGAR_STR \"MIDNSHP=XB\"\n#define BAM_CIGAR_SHIFT 4\n#define BAM_CIGAR_MASK 0xf\n#define BAM_CIGAR_TYPE 0x3C1A7\n\n/*! @abstract Table for converting a CIGAR operator character to BAM_CMATCH etc.\n * Result is operator code or -1. Be sure to cast the index if it is a plain\n *char: int op = bam_cigar_table[(unsigned char) ch];\n **/\n// extern const int8_t bam_cigar_table[256];\nconst int8_t bam_cigar_table[256] = {\n    // 0 .. 47\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n\n    // 48 .. 63  (including =)\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, BAM_CEQUAL, -1, -1,\n\n    // 64 .. 79  (including MIDNHB)\n    -1, -1, BAM_CBACK, -1, BAM_CDEL, -1, -1, -1, BAM_CHARD_CLIP, BAM_CINS, -1,\n    -1, -1, BAM_CMATCH, BAM_CREF_SKIP, -1,\n\n    // 80 .. 95  (including SPX)\n    BAM_CPAD, -1, -1, BAM_CSOFT_CLIP, -1, -1, -1, -1, BAM_CDIFF, -1, -1, -1, -1,\n    -1, -1, -1,\n\n    // 96 .. 127\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n\n    // 128 .. 
255\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,\n    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1};\n#define bam_cigar_op(c) ((c)&BAM_CIGAR_MASK)\n#define bam_cigar_oplen(c) ((c) >> BAM_CIGAR_SHIFT)\n// Note that BAM_CIGAR_STR is padded to length 16 bytes below so that\n// the array look-up will not fall off the end.  '?' is chosen as the\n// padding character so it's easy to spot if one is emitted, and will\n// result in a parsing failure (in sam_parse1(), at least) if read.\n#define bam_cigar_opchr(c) (BAM_CIGAR_STR \"??????\"[bam_cigar_op(c)])\n#define bam_cigar_gen(l, o) ((l) << BAM_CIGAR_SHIFT | (o))\n\n/* bam_cigar_type returns a bit flag with:\n *   bit 1 set if the cigar operation consumes the query\n *   bit 2 set if the cigar operation consumes the reference\n *\n * For reference, the unobfuscated truth table for this function is:\n * BAM_CIGAR_TYPE  QUERY  REFERENCE\n * --------------------------------\n * BAM_CMATCH      1      1\n * BAM_CINS        1      0\n * BAM_CDEL        0      1\n * BAM_CREF_SKIP   0      1\n * BAM_CSOFT_CLIP  1      0\n * BAM_CHARD_CLIP  0      0\n * BAM_CPAD        0      0\n * BAM_CEQUAL      1      1\n * BAM_CDIFF       1      1\n * BAM_CBACK       0      0\n * --------------------------------\n */\n#define bam_cigar_type(o) (BAM_CIGAR_TYPE >> ((o) << 1) & 3)\n// bit 1: consume query; bit 2: consume reference\n\n/*! @abstract the read is paired in sequencing, no matter whether it is mapped\n * in a pair */\n#define BAM_FPAIRED 1\n/*! 
@abstract the read is mapped in a proper pair */\n#define BAM_FPROPER_PAIR 2\n/*! @abstract the read itself is unmapped; conflictive with BAM_FPROPER_PAIR */\n#define BAM_FUNMAP 4\n/*! @abstract the mate is unmapped */\n#define BAM_FMUNMAP 8\n/*! @abstract the read is mapped to the reverse strand */\n#define BAM_FREVERSE 16\n/*! @abstract the mate is mapped to the reverse strand */\n#define BAM_FMREVERSE 32\n/*! @abstract this is read1 */\n#define BAM_FREAD1 64\n/*! @abstract this is read2 */\n#define BAM_FREAD2 128\n/*! @abstract not primary alignment */\n#define BAM_FSECONDARY 256\n/*! @abstract QC failure */\n#define BAM_FQCFAIL 512\n/*! @abstract optical or PCR duplicate */\n#define BAM_FDUP 1024\n/*! @abstract supplementary alignment */\n#define BAM_FSUPPLEMENTARY 2048\n\nclass SAMMapping : public Mapping {\n public:\n  uint32_t read_id_;\n  std::string read_name_;\n  uint64_t cell_barcode_;\n  // uint16_t read_length;\n  // uint32_t fragment_start_position;\n  // uint16_t fragment_length;\n  // uint8_t direction : 1, is_unique : 1;\n  uint8_t num_dups_;\n  // uint16_t positive_alignment_length;\n  // uint16_t negative_alignment_length;\n\n  int64_t pos_;  // forward strand 5'-end mapping position (inclusive)\n  int rid_;      // reference sequence index in bntseq_t; <0 for unmapped\n  int64_t mpos_;  // forward strand 5'-end mapping position for mate (inclusive)\n  int mrid_;      // reference sequence index in bntseq_t; <0 for unmapped\n  int tlen_;      // template length\n  int flag_;     // extra flag\n  uint32_t is_rev_ : 1, is_alt_ : 1, is_unique_ : 1, mapq_ : 7,\n      NM_ : 22;      // is_rev: whether on the reverse strand; mapq: mapping\n                     // quality; NM: edit distance\n  int n_cigar_ = 0;  // number of CIGAR operations\n  std::vector<uint32_t> cigar_;  // CIGAR in the BAM encoding: opLen<<4|op; op\n                                 // to integer mapping: MIDSH=>01234\n  std::string MD_;\n  std::string sequence_;\n  std::string 
sequence_qual_;\n  // char *XA;        // alternative mappings\n  // int score, sub, alt_sc;\n\n  SAMMapping() {}\n\n  SAMMapping(uint32_t read_id, const std::string &read_name,\n             uint64_t cell_barcode, uint8_t num_dups, int64_t pos, int rid,\n             int64_t mpos, int mrid, int tlen, \n             int flag, uint8_t is_rev, uint8_t is_alt, uint8_t is_unique,\n             uint8_t mapq, uint32_t NM, int n_cigar, uint32_t *cigar,\n             const std::string &MD_tag, const std::string &sequence,\n             const std::string &sequence_qual)\n      : read_id_(read_id),\n        read_name_(read_name),\n        cell_barcode_(cell_barcode),\n        num_dups_(num_dups),\n        pos_(pos),\n        rid_(rid),\n        mpos_(mpos),\n        mrid_(mrid),\n        tlen_(tlen),\n        flag_(flag),\n        is_rev_(is_rev),\n        is_alt_(is_alt),\n        is_unique_(is_unique),\n        mapq_(mapq),\n        NM_(NM),\n        n_cigar_(n_cigar),\n        MD_(MD_tag) {\n    cigar_ = std::vector<uint32_t>(cigar, cigar + n_cigar);\n    free(cigar);\n\n    if (!IsPositiveStrand()) {\n      for (uint32_t i = 0; i < sequence_qual.length(); ++i) {\n        sequence_qual_.push_back(sequence_qual[sequence_qual.length() - 1 - i]);\n      }\n    } else {\n      sequence_qual_ = sequence_qual;\n    }\n\n    uint32_t sequence_length_deduced_from_cigar = GetSequenceLength();\n    if (sequence_length_deduced_from_cigar != sequence.length()) {\n      sequence_ = sequence.substr(0, sequence_length_deduced_from_cigar);\n      sequence_qual_ =\n          sequence_qual_.substr(0, sequence_length_deduced_from_cigar);\n    } else {\n      sequence_ = sequence;\n    }\n  }\n\n  bool operator<(const SAMMapping &m) const {\n    int read1_flag = flag_ & BAM_FREAD1;\n    int m_read1_flag = m.flag_ & BAM_FREAD1;\n    return std::tie(rid_, pos_, cell_barcode_, mrid_, mpos_, read1_flag, mapq_, read_id_) <\n           std::tie(m.rid_, m.pos_, m.cell_barcode_, m.mrid_, m.mpos_, 
m_read1_flag, m.mapq_, m.read_id_);\n  }\n  bool operator==(const SAMMapping &m) const {\n    int read1_flag = flag_ & BAM_FREAD1;\n    int m_read1_flag = m.flag_ & BAM_FREAD1;\n    return std::tie(pos_, rid_, cell_barcode_, read1_flag, mrid_, mpos_) ==\n           std::tie(m.pos_, m.rid_, m.cell_barcode_, m_read1_flag, m.mrid_, m.mpos_);\n  }\n  bool IsSamePosition(const SAMMapping &m) const {\n    // Compare the mate reference id against the other mapping's mate\n    // reference id (m.mrid_), not its own reference id.\n    return std::tie(pos_, rid_, is_rev_, mrid_, mpos_) ==\n           std::tie(m.pos_, m.rid_, m.is_rev_, m.mrid_, m.mpos_);\n  }\n  uint64_t GetBarcode() const { return cell_barcode_; }\n  void Tn5Shift() {\n    // We don't support Tn5 shift in SAM format because it has other fields that\n    // depend on the mapping position.\n  }\n  // TODO(Haowen): I have to change the variable names or this function to make\n  // the meaning consistent.\n  bool IsPositiveStrand() const { return is_rev_ > 0; }\n  // For now for convenience, we assume cigar should not be accessed after\n  // generating the cigar string for output\n  std::string GenerateCigarString() const {\n    if (n_cigar_ == 0) {\n      return \"*\";\n    }\n    std::string cigar_string = \"\";\n    for (int ci = 0; ci < n_cigar_; ++ci) {\n      uint32_t op = bam_cigar_op(cigar_[ci]);\n      uint32_t op_length = bam_cigar_oplen(cigar_[ci]);\n      // std::cerr << op << \" \" << op_length << \"\\n\";\n      cigar_string.append(std::to_string(op_length));\n      // cigar_string.append(std::to_string((BAM_CIGAR_STR[op])));\n      cigar_string.push_back((BAM_CIGAR_STR[op]));\n    }\n    return cigar_string;\n  }\n  std::string GenerateIntTagString(const std::string &tag, int value) const {\n    std::string tag_string = tag;\n    tag_string.append(\":i:\" + std::to_string(value));\n    return tag_string;\n  }\n  uint32_t GetAlignmentLength() const {\n    uint32_t alignment_length = 0;\n    for (int ci = 0; ci < n_cigar_; ++ci) {\n      uint32_t op = bam_cigar_op(cigar_[ci]);\n      uint32_t op_length = 
bam_cigar_oplen(cigar_[ci]);\n      if ((bam_cigar_type(op) & 0x2) > 0) {\n        alignment_length += op_length;\n      }\n    }\n    return alignment_length;\n  }\n\n  uint32_t GetSequenceLength() const {\n    uint32_t sequence_length = 0;\n    for (int ci = 0; ci < n_cigar_; ++ci) {\n      uint32_t op = bam_cigar_op(cigar_[ci]);\n      uint32_t op_length = bam_cigar_oplen(cigar_[ci]);\n      if ((bam_cigar_type(op) & 0x1) > 0) {\n        sequence_length += op_length;\n      }\n    }\n    return sequence_length;\n  }\n\n  uint32_t GetStartPosition() const {  // inclusive\n    return pos_ + 1;\n    /*if (IsPositiveStrand()) {\n      return pos + 1;\n    } else {\n      return pos + 1 - GetAlignmentLength() + 1;\n    }*/\n  }\n  uint32_t GetEndPosition() const {  // exclusive\n    return pos_ + GetAlignmentLength();\n    /*if (IsPositiveStrand()) {\n      return pos + GetAlignmentLength();\n    } else {\n      return pos + 1;\n    }*/\n  }\n  uint16_t GetByteSize() const {\n    return 2 * sizeof(uint32_t) + 2 * sizeof(uint16_t) + 2 * sizeof(uint8_t) +\n           (read_name_.length() + MD_.length()) * sizeof(char) +\n           n_cigar_ * sizeof(uint32_t);\n  }\n\n  size_t WriteToFile(FILE *temp_mapping_output_file) const {\n    size_t num_written_bytes = 0;\n    num_written_bytes +=\n        fwrite(&read_id_, sizeof(uint32_t), 1, temp_mapping_output_file);\n    uint16_t read_name_length = read_name_.length();\n    num_written_bytes += fwrite(&read_name_length, sizeof(uint16_t), 1,\n                                temp_mapping_output_file);\n    num_written_bytes += fwrite(read_name_.data(), sizeof(char),\n                                read_name_length, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&cell_barcode_, sizeof(uint64_t), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&num_dups_, sizeof(uint8_t), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&pos_, sizeof(int64_t), 1, 
temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&rid_, sizeof(int), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&mpos_, sizeof(int64_t), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&mrid_, sizeof(int), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&tlen_, sizeof(int), 1, temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&flag_, sizeof(int), 1, temp_mapping_output_file);\n    uint32_t rev_alt_unique_mapq_NM = (is_rev_ << 31) | (is_alt_ << 30) |\n                                      (is_unique_ << 29) | (mapq_ << 22) | NM_;\n    num_written_bytes += fwrite(&rev_alt_unique_mapq_NM, sizeof(uint32_t), 1,\n                                temp_mapping_output_file);\n    num_written_bytes +=\n        fwrite(&n_cigar_, sizeof(int), 1, temp_mapping_output_file);\n    if (n_cigar_ > 0) {\n      num_written_bytes += fwrite(cigar_.data(), sizeof(uint32_t), n_cigar_,\n                                  temp_mapping_output_file);\n    }\n    uint16_t MD_length = MD_.length();\n    num_written_bytes +=\n        fwrite(&MD_length, sizeof(uint16_t), 1, temp_mapping_output_file);\n    if (MD_length > 0) {\n      num_written_bytes +=\n          fwrite(MD_.data(), sizeof(char), MD_length, temp_mapping_output_file);\n    }\n    uint16_t sequence_length = sequence_.length();\n    num_written_bytes +=\n        fwrite(&sequence_length, sizeof(uint16_t), 1, temp_mapping_output_file);\n    num_written_bytes += fwrite(sequence_.data(), sizeof(char), sequence_length,\n                                temp_mapping_output_file);\n    num_written_bytes += fwrite(sequence_qual_.data(), sizeof(char),\n                                sequence_length, temp_mapping_output_file);\n    return num_written_bytes;\n  }\n\n  size_t LoadFromFile(FILE *temp_mapping_output_file) {\n    int num_read_bytes = 0;\n    num_read_bytes +=\n        fread(&read_id_, sizeof(uint32_t), 1, 
temp_mapping_output_file);\n    uint16_t read_name_length = 0;\n    num_read_bytes +=\n        fread(&read_name_length, sizeof(uint16_t), 1, temp_mapping_output_file);\n    read_name_ = std::string(read_name_length, '\\0');\n    num_read_bytes += fread(&(read_name_[0]), sizeof(char), read_name_length,\n                            temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&cell_barcode_, sizeof(uint64_t), 1, temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&num_dups_, sizeof(uint8_t), 1, temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&pos_, sizeof(int64_t), 1, temp_mapping_output_file);\n    num_read_bytes += fread(&rid_, sizeof(int), 1, temp_mapping_output_file);\n    num_read_bytes +=\n        fread(&mpos_, sizeof(int64_t), 1, temp_mapping_output_file);\n    num_read_bytes += fread(&mrid_, sizeof(int), 1, temp_mapping_output_file);\n    num_read_bytes += fread(&tlen_, sizeof(int), 1, temp_mapping_output_file);\n    num_read_bytes += fread(&flag_, sizeof(int), 1, temp_mapping_output_file);\n    uint32_t rev_alt_unique_mapq_NM = 0;\n    num_read_bytes += fread(&rev_alt_unique_mapq_NM, sizeof(uint32_t), 1,\n                            temp_mapping_output_file);\n    is_rev_ = (rev_alt_unique_mapq_NM >> 31);\n    is_alt_ = (rev_alt_unique_mapq_NM >> 30) & 1;\n    is_unique_ = (rev_alt_unique_mapq_NM >> 29) & 1;\n    mapq_ = ((rev_alt_unique_mapq_NM << 3) >> 25);\n    NM_ = ((rev_alt_unique_mapq_NM << 10) >> 10);\n    int previous_n_cigar_ = n_cigar_;\n    num_read_bytes +=\n        fread(&n_cigar_, sizeof(int), 1, temp_mapping_output_file);\n    if (n_cigar_ > 0) {\n      if (previous_n_cigar_ < n_cigar_) {\n        cigar_.resize(n_cigar_);\n      }\n      num_read_bytes += fread(cigar_.data(), sizeof(uint32_t), n_cigar_,\n                              temp_mapping_output_file);\n    }\n    uint16_t MD_length = 0;\n    num_read_bytes +=\n        fread(&MD_length, sizeof(uint16_t), 1, temp_mapping_output_file);\n 
   if (MD_length > 0) {\n      MD_ = std::string(MD_length, '\\0');\n      num_read_bytes +=\n          fread(&(MD_[0]), sizeof(char), MD_length, temp_mapping_output_file);\n    }\n    uint16_t sequence_length = 0;\n    num_read_bytes +=\n        fread(&sequence_length, sizeof(uint16_t), 1, temp_mapping_output_file);\n    if (sequence_length > 0) {\n      sequence_ = std::string(sequence_length, '\\0');\n      sequence_qual_ = std::string(sequence_length, '\\0');\n      num_read_bytes += fread(&(sequence_[0]), sizeof(char), sequence_length,\n                              temp_mapping_output_file);\n      num_read_bytes += fread(&(sequence_qual_[0]), sizeof(char),\n                              sequence_length, temp_mapping_output_file);\n    }\n    return num_read_bytes;\n  }\n};\n\n// TODO(Haowen) : Add PairedSAMMapping.\n}  // namespace chromap\n\n#endif  // SAMMAPPING_H_\n"
  },
  {
    "path": "src/sequence_batch.cc",
    "content": "#include \"sequence_batch.h\"\n\n#include <tuple>\n\n#include \"utils.h\"\n\nnamespace chromap {\n\nvoid SequenceBatch::InitializeLoading(const std::string &sequence_file_path) {\n  sequence_file_ = gzopen(sequence_file_path.c_str(), \"r\");\n  if (sequence_file_ == NULL) {\n    ExitWithMessage(\"Cannot find sequence file \" + sequence_file_path);\n  }\n  sequence_kseq_ = kseq_init(sequence_file_);\n}\n\nvoid SequenceBatch::FinalizeLoading() {\n  kseq_destroy(sequence_kseq_);\n  gzclose(sequence_file_);\n}\n\nbool SequenceBatch::LoadOneSequenceAndSaveAt(uint32_t sequence_index) {\n  if (sequence_index == 0) {\n    num_loaded_sequences_ = 0;\n  }\n\n  int length = kseq_read(sequence_kseq_);\n  while (length == 0) {\n    length = kseq_read(sequence_kseq_);\n  }\n\n  if (length > 0) {\n    kseq_t *sequence = sequence_batch_[sequence_index];\n    std::swap(sequence_kseq_->seq, sequence->seq);\n    ReplaceByEffectiveRange(sequence->seq, /*is_seq=*/true);\n    std::swap(sequence_kseq_->name, sequence->name);\n    std::swap(sequence_kseq_->comment, sequence->comment);\n    sequence->id = total_num_loaded_sequences_;\n    ++total_num_loaded_sequences_;\n\n    if (sequence_index >= num_loaded_sequences_) {\n      ++num_loaded_sequences_;\n    } else if (sequence_index + 1 != num_loaded_sequences_) {\n      std::cerr << sequence_index << \" \" << num_loaded_sequences_ << \"\\n\";\n      ExitWithMessage(\n          \"Shouldn't override other sequences rather than the last!\");\n    }\n\n    if (sequence_kseq_->qual.l != 0) {  // fastq file\n      std::swap(sequence_kseq_->qual, sequence->qual);\n      ReplaceByEffectiveRange(sequence->qual, /*is_seq=*/false);\n    }\n    return false;\n  }\n\n  // Make sure to reach the end of the file rather than meet an error.\n  if (length != -1) {\n    ExitWithMessage(\n        \"Didn't reach the end of sequence file, which might be corrupted!\");\n  }\n  return true;\n}\n\nuint32_t SequenceBatch::LoadBatch() {\n  double 
real_start_time = GetRealTime();\n  num_loaded_sequences_ = 0;\n  for (uint32_t sequence_index = 0; sequence_index < max_num_sequences_;\n       ++sequence_index) {\n    if (LoadOneSequenceAndSaveAt(sequence_index)) {\n      break;\n    }\n  }\n\n  if (num_loaded_sequences_ != 0) {\n    std::cerr << \"Loaded sequence batch successfully in \"\n              << GetRealTime() - real_start_time << \"s, \";\n    std::cerr << \"number of sequences: \" << num_loaded_sequences_ << \".\\n\";\n  } else {\n    std::cerr << \"No more sequences.\\n\";\n  }\n  return num_loaded_sequences_;\n}\n\nvoid SequenceBatch::LoadAllSequences() {\n  double real_start_time = GetRealTime();\n  sequence_batch_.reserve(200);\n  num_loaded_sequences_ = 0;\n  num_bases_ = 0;\n  int length = kseq_read(sequence_kseq_);\n  while (length >= 0) {\n    if (length > 0) {\n      sequence_batch_.emplace_back((kseq_t *)calloc(1, sizeof(kseq_t)));\n      kseq_t *sequence = sequence_batch_.back();\n      std::swap(sequence_kseq_->seq, sequence->seq);\n      ReplaceByEffectiveRange(sequence->seq, /*is_seq=*/true);\n      std::swap(sequence_kseq_->name, sequence->name);\n      std::swap(sequence_kseq_->comment, sequence->comment);\n      if (sequence_kseq_->qual.l != 0) {  // fastq file\n        std::swap(sequence_kseq_->qual, sequence->qual);\n        ReplaceByEffectiveRange(sequence->qual, /*is_seq=*/false);\n      }\n      sequence->id = total_num_loaded_sequences_;\n      ++total_num_loaded_sequences_;\n      ++num_loaded_sequences_;\n      num_bases_ += length;\n    }\n    length = kseq_read(sequence_kseq_);\n  }\n\n  // Make sure to reach the end of the file rather than meet an error.\n  if (length != -1) {\n    ExitWithMessage(\n        \"Didn't reach the end of sequence file, which might be corrupted!\");\n  }\n\n  std::cerr << \"Loaded all sequences successfully in \"\n            << GetRealTime() - real_start_time << \"s, \";\n  std::cerr << \"number of sequences: \" << num_loaded_sequences_ << \", 
\";\n  std::cerr << \"number of bases: \" << num_bases_ << \".\\n\";\n}\n\nvoid SequenceBatch::ReplaceByEffectiveRange(kstring_t &seq, bool is_seq) {\n  seq.l = effective_range_.Replace(seq.s, seq.l, is_seq);\n}\n\n}  // namespace chromap\n"
  },
  {
    "path": "src/sequence_batch.h",
    "content": "#ifndef SEQUENCEBATCH_H_\n#define SEQUENCEBATCH_H_\n\n#include <unistd.h>\n#include <zlib.h>\n\n#include <iostream>\n#include <string>\n#include <vector>\n\n#include \"kseq.h\"\n#include \"sequence_effective_range.h\"\n#include \"utils.h\"\n\nnamespace chromap {\n\nclass SequenceBatch {\n public:\n  KSEQ_INIT(gzFile, gzread);\n\n  // When 'max_num_sequences' is not specified. This batch can be used to load\n  // any number of sequences with a positive full effective range.\n  SequenceBatch() : effective_range_(SequenceEffectiveRange()) {}\n\n  // Construct once and use update sequences when loading each batch.\n  SequenceBatch(uint32_t max_num_sequences,\n                const SequenceEffectiveRange &effective_range)\n      : max_num_sequences_(max_num_sequences),\n        effective_range_(effective_range) {\n    sequence_batch_.reserve(max_num_sequences_);\n    for (uint32_t i = 0; i < max_num_sequences_; ++i) {\n      sequence_batch_.emplace_back((kseq_t *)calloc(1, sizeof(kseq_t)));\n      sequence_batch_.back()->f = NULL;\n    }\n    negative_sequence_batch_.assign(max_num_sequences_, \"\");\n  }\n\n  ~SequenceBatch() {\n    if (sequence_batch_.size() > 0) {\n      for (uint32_t i = 0; i < sequence_batch_.size(); ++i) {\n        kseq_destroy(sequence_batch_[i]);\n      }\n    }\n  }\n\n  inline uint64_t GetNumSequences() const { return num_loaded_sequences_; }\n\n  inline uint32_t GetMaxBatchSize() const { return max_num_sequences_; }\n\n  inline uint64_t GetNumBases() const { return num_bases_; }\n\n  inline std::vector<kseq_t *> &GetSequenceBatch() { return sequence_batch_; }\n\n  inline std::vector<std::string> &GetNegativeSequenceBatch() {\n    return negative_sequence_batch_;\n  }\n\n  inline const char *GetSequenceAt(uint32_t sequence_index) const {\n    return sequence_batch_[sequence_index]->seq.s;\n  }\n\n  inline uint32_t GetSequenceLengthAt(uint32_t sequence_index) const {\n    return sequence_batch_[sequence_index]->seq.l;\n  }\n\n  
inline const char *GetSequenceNameAt(uint32_t sequence_index) const {\n    return sequence_batch_[sequence_index]->name.s;\n  }\n\n  inline uint32_t GetSequenceNameLengthAt(uint32_t sequence_index) const {\n    return sequence_batch_[sequence_index]->name.l;\n  }\n\n  inline const char *GetSequenceQualAt(uint32_t sequence_index) const {\n    return sequence_batch_[sequence_index]->qual.s;\n  }\n  inline uint32_t GetSequenceIdAt(uint32_t sequence_index) const {\n    return sequence_batch_[sequence_index]->id;\n  }\n\n  inline const std::string &GetNegativeSequenceAt(\n      uint32_t sequence_index) const {\n    return negative_sequence_batch_[sequence_index];\n  }\n\n  // big_endian: N_pos is in the order of sequence\n  // little_endian: N_pos is in the order from the sequence right side to left,\n  //                this is the order of the GenerateSeed\n  // e.g: If the sequence is \"ACN\", big endian returns N at 2,\n  //      little endian returns N at 0.\n  inline void GetSequenceNsAt(uint32_t sequence_index, bool little_endian,\n                              std::vector<int> &N_pos) {\n    const int l = sequence_batch_[sequence_index]->seq.l;\n    const char *s = sequence_batch_[sequence_index]->seq.s;\n    N_pos.clear();\n    if (little_endian) {\n      for (int i = l - 1; i >= 0; --i) {\n        if (s[i] == 'N') N_pos.push_back(l - 1 - i);\n      }\n    } else {\n      for (int i = 0; i < l; ++i) {\n        if (s[i] == 'N') N_pos.push_back(i);\n      }\n    }\n  }\n\n  inline bool IsNInSequenceAt(uint32_t sequence_index) {\n    const int l = sequence_batch_[sequence_index]->seq.l;\n    const char *s = sequence_batch_[sequence_index]->seq.s;\n    for (int i = 0 ; i < l ; ++i)\n      if (s[i] == 'N')\n        return true;\n    return false;\n  }\n\n  //  inline char GetReverseComplementBaseOfSequenceAt(uint32_t sequence_index,\n  //  uint32_t position) {\n  //    kseq_t *sequence = sequence_batch_[sequence_index];\n  //    return Uint8ToChar(((uint8_t)3) ^\n  
//    (CharToUint8((sequence->seq.s)[sequence->seq.l - position - 1])));\n  //  }\n\n  inline void PrepareNegativeSequenceAt(uint32_t sequence_index) {\n    kseq_t *sequence = sequence_batch_[sequence_index];\n    uint32_t sequence_length = sequence->seq.l;\n    std::string &negative_sequence = negative_sequence_batch_[sequence_index];\n    negative_sequence.clear();\n    negative_sequence.reserve(sequence_length);\n    for (uint32_t i = 0; i < sequence_length; ++i) {\n      negative_sequence.push_back(Uint8ToChar(\n          ((uint8_t)3) ^\n          (CharToUint8((sequence->seq.s)[sequence_length - i - 1]))));\n    }\n  }\n\n  inline void TrimSequenceAt(uint32_t sequence_index, int length_after_trim) {\n    kseq_t *sequence = sequence_batch_[sequence_index];\n    if (length_after_trim >= (int)sequence->seq.l) {\n      return;\n    }\n\n    negative_sequence_batch_[sequence_index].erase(\n        negative_sequence_batch_[sequence_index].begin(),\n        negative_sequence_batch_[sequence_index].begin() + sequence->seq.l -\n            length_after_trim);\n\n    sequence->seq.l = length_after_trim;\n    sequence->seq.s[sequence->seq.l] = '\\0';\n    sequence->qual.l = length_after_trim;\n    sequence->qual.s[sequence->qual.l] = '\\0';\n  }\n\n  inline void SwapSequenceBatch(SequenceBatch &batch) {\n    sequence_batch_.swap(batch.GetSequenceBatch());\n    negative_sequence_batch_.swap(batch.GetNegativeSequenceBatch());\n  }\n\n  void InitializeLoading(const std::string &sequence_file_path);\n\n  void FinalizeLoading();\n\n  // The func should never overwrite any sequence other than the last, which\n  // means 'sequence_index' cannot be smaller than 'num_loaded_sequences_' - 1.\n  // Return true when reaching the end of the file.\n  bool LoadOneSequenceAndSaveAt(uint32_t sequence_index);\n\n  // Return the number of sequences loaded into the batch and return 0 if there\n  // are no more sequences. 
This func is currently only used to load barcodes.\n  uint32_t LoadBatch();\n\n  // Load all sequences in a file. This function should only be used to load\n  // the reference. Once the reference is loaded, the batch should never be\n  // updated. This func is slow when there is a large number of sequences.\n  void LoadAllSequences();\n\n  inline void CorrectBaseAt(uint32_t sequence_index, uint32_t base_position,\n                            char correct_base) {\n    kseq_t *sequence = sequence_batch_[sequence_index];\n    sequence->seq.s[base_position] = correct_base;\n  }\n\n  inline uint64_t GenerateSeedFromSequenceAt(uint32_t sequence_index,\n                                             uint32_t start_position,\n                                             uint32_t seed_length) const {\n    const char *sequence = GetSequenceAt(sequence_index);\n    const uint32_t sequence_length = GetSequenceLengthAt(sequence_index);\n    return GenerateSeedFromSequence(sequence, sequence_length, start_position,\n                                    seed_length);\n  }\n\n  inline void ReorderSequences(const std::vector<int> &rid_rank) {\n    std::vector<kseq_t *> tmp_sequence_batch_ = sequence_batch_;\n    std::vector<std::string> tmp_negative_sequence_batch_ =\n        negative_sequence_batch_;\n    for (size_t i = 0; i < sequence_batch_.size(); ++i) {\n      sequence_batch_[rid_rank[i]] = tmp_sequence_batch_[i];\n    }\n\n    if (negative_sequence_batch_.size() > 0) {\n      for (size_t i = 0; i < sequence_batch_.size(); ++i) {\n        negative_sequence_batch_[rid_rank[i]] = tmp_negative_sequence_batch_[i];\n      }\n    }\n  }\n\n protected:\n  // When 'is_seq' is set to true, this func will complement the base when\n  // necessary. Otherwise, it will just reverse the sequence.\n  void ReplaceByEffectiveRange(kstring_t &seq, bool is_seq);\n\n  // This is the accumulated number of sequences that have ever been loaded into\n  // the batch. 
It is useful for tracking read ids.\n  uint32_t total_num_loaded_sequences_ = 0;\n\n  // This is the number of sequences loaded into the current batch.\n  uint32_t num_loaded_sequences_ = 0;\n\n  // This is the number of bases loaded into the current batch. It is only\n  // populated for the reference.\n  uint64_t num_bases_ = 0;\n\n  // This is the max number of sequences that can be loaded into the batch. It\n  // is set to 0 when there is no such restriction.\n  uint32_t max_num_sequences_ = 0;\n\n  gzFile sequence_file_;\n  kseq_t *sequence_kseq_ = nullptr;\n  std::vector<kseq_t *> sequence_batch_;\n\n  // TODO: avoid constructing the negative sequence batch.\n  std::vector<std::string> negative_sequence_batch_;\n\n  // Actual range within each sequence.\n  const SequenceEffectiveRange effective_range_;\n};\n\n}  // namespace chromap\n\n#endif  // SEQUENCEBATCH_H_\n"
  },
  {
    "path": "src/sequence_effective_range.h",
    "content": "#ifndef SEQUENCE_EFFECTIVE_RANGE_H_\n#define SEQUENCE_EFFECTIVE_RANGE_H_\n\n#include <stdlib.h>\n\n#include <algorithm>\n#include <vector>\n\n#include \"utils.h\"\n\nnamespace chromap {\n\n// The class handles the custom read format indicating the effective range on a\n// sequence. Default is the full range.\nclass SequenceEffectiveRange {\n public:\n  SequenceEffectiveRange() = default;\n  ~SequenceEffectiveRange() = default;\n\n  void InitializeParsing() {\n    starts.clear();\n    ends.clear();\n    strand = '+';\n  }\n\n  void FinalizeParsing() {\n    if (starts.empty() && ends.empty()) {\n      starts.push_back(0);\n      ends.push_back(-1);\n      strand = '+';\n      return;\n    }\n\n    /*std::sort(starts.begin(), starts.end());\n    std::sort(ends.begin(), ends.end());\n\n    if (ends[0] == -1) {\n      ends.erase(ends.begin());\n      ends.push_back(-1);\n    }*/\n  }\n\n  // Return false if it fails to parse the format string.\n  bool ParseFormatStringAndAppendEffectiveRange(const char *s, int len) {\n    int i;\n    int j = 0;  // start, end, strand section\n    char buffer[20];\n    int blen = 0;\n\n    for (i = 3; i <= len; ++i) {\n      if (i == len || s[i] == ':') {\n        buffer[blen] = '\\0';\n        if (j == 0) {\n          starts.push_back(atoi(buffer));\n        } else if (j == 1) {\n          ends.push_back(atoi(buffer));\n        } else {\n          strand = buffer[0];\n        }\n\n        blen = 0;\n        if (i < len && s[i] == ':') {\n          ++j;\n        }\n      } else {\n        buffer[blen] = s[i];\n        ++blen;\n      }\n    }\n\n    if (j >= 3 || starts.size() != ends.size()) {\n      return false;\n    }\n\n    return true;\n  }\n\n  // Replace by the range specified in the starts, ends section, but does not\n  // apply the strand operation. 
Return new length.\n  int Replace(char *s, int len, bool need_complement) const {\n    if (IsFullRangeAndPositiveStrand()) {\n      return len;\n    }\n\n    int i, j, k;\n    i = 0;\n    const int num_ranges = starts.size();\n    for (k = 0; k < num_ranges; ++k) {\n      int start = starts[k];\n      int end = ends[k];\n\n      if (end == -1) {\n        end = len - 1;\n      }\n\n      for (j = start; j <= end; ++i, ++j) {\n        s[i] = s[j];\n      }\n    }\n\n    s[i] = '\\0';\n    len = i;\n\n    if (strand == '-') {\n      if (need_complement) {\n        for (i = 0; i < len; ++i) {\n          s[i] = Uint8ToChar(((uint8_t)3) ^ (CharToUint8(s[i])));\n        }\n      }\n\n      for (i = 0, j = len - 1; i < j; ++i, --j) {\n        char tmp = s[i];\n        s[i] = s[j];\n        s[j] = tmp;\n      }\n    }\n    return len;\n  }\n\n private:\n  bool IsFullRangeAndPositiveStrand() const {\n    if (strand == '+' && starts[0] == 0 && ends[0] == -1) {\n      return true;\n    }\n\n    return false;\n  }\n\n  std::vector<int> starts = {0};\n  std::vector<int> ends = {-1};\n  // Strand is either '+' or '-'. The barcode will be reverse-complemented after\n  // extraction if strand is '-'.\n  char strand = '+';\n};\n\n}  // namespace chromap\n\n#endif\n"
  },
  {
    "path": "src/strand.h",
    "content": "#ifndef STRAND_H_\n#define STRAND_H_\n\nnamespace chromap {\n\nenum Strand {\n  kPositive,\n  kNegative,\n};\n\n}  // namespace chromap\n\n#endif  // STRAND_H_\n"
  },
  {
    "path": "src/summary_metadata.h",
    "content": "#ifndef SUMMARY_METADATA_H_\n#define SUMMARY_METADATA_H_\n\n#include <string>\n\n#include <stdio.h>\n#include <stdint.h>\n#include <cmath>\n\n#include \"khash.h\"\n#include \"utils.h\"\n\n// The class summarizes the overall mapping metadata \n\nnamespace chromap {\n\nenum SummaryMetadataField {\n  SUMMARY_METADATA_TOTAL = 0,\n  SUMMARY_METADATA_DUP,\n  SUMMARY_METADATA_MAPPED,\n  SUMMARY_METADATA_LOWMAPQ,\n\tSUMMARY_METADATA_CACHEHIT,\n  SUMMARY_METADATA_CARDINALITY,\n  SUMMARY_METADATA_FIELDS\n};\n\nstruct _barcodeSummaryMetadata {\n  int counts[SUMMARY_METADATA_FIELDS];\n  _barcodeSummaryMetadata() {\n    memset(counts, 0, sizeof(int) * SUMMARY_METADATA_FIELDS);\n  }\n};\n\n\nKHASH_MAP_INIT_INT64(k64_barcode_metadata, struct _barcodeSummaryMetadata)\n\nclass SummaryMetadata {\n public:\n  SummaryMetadata() {\n    barcode_metadata_ = kh_init(k64_barcode_metadata);\n    barcode_length_ = 16;\n  }\n  ~SummaryMetadata() {\n    kh_destroy(k64_barcode_metadata, barcode_metadata_);\n  }\n\n  inline double inverse_logit(double frip) {\n    return (1.0/(1.0 + std::exp(-frip)));\n  }\n\n  inline void OutputCounts(const char *barcode, const int *counts, FILE *fp, std::vector<double> frip_est_coeffs, bool output_num_cache_slots_info)\n  {\n    // define variables to store values\n    size_t num_total = counts[SUMMARY_METADATA_TOTAL];\n    size_t num_dup = counts[SUMMARY_METADATA_DUP]; \n    \n    size_t num_mapped = counts[SUMMARY_METADATA_MAPPED];\n    size_t num_unmapped = num_total - num_mapped;\n\n    size_t num_lowmapq = counts[SUMMARY_METADATA_LOWMAPQ];\n    size_t num_cachehit = counts[SUMMARY_METADATA_CACHEHIT];\n    double fric = (num_mapped != 0) ? (double) num_cachehit / (double) num_mapped : 0.0;\n\n    size_t num_cache_slots = counts[SUMMARY_METADATA_CARDINALITY];\n\n    // compute the estimated frip\n    double est_frip = (fric != 0.0) ? 
inverse_logit(frip_est_coeffs[0] + /* constant */\n                                           (frip_est_coeffs[1] * fric) +\n                                           (frip_est_coeffs[2] * num_dup) +\n                                           (frip_est_coeffs[3] * num_unmapped)  +\n                                           (frip_est_coeffs[4] * num_lowmapq)) : 0.0;\n\n    // print out data for current barcode (%zu matches the size_t counters)\n    if (!output_num_cache_slots_info) {\n      fprintf(fp, \"%s,%zu,%zu,%zu,%zu,%zu,%.5lf,%.5lf\\n\", \n              barcode,\n              num_total,\n              num_dup,\n              num_unmapped,\n              num_lowmapq,\n              num_cachehit,\n              fric,\n              est_frip);\n    } else {\n      fprintf(fp, \"%s,%zu,%zu,%zu,%zu,%zu,%.5lf,%.5lf,%zu\\n\", \n              barcode,\n              num_total,\n              num_dup,\n              num_unmapped,\n              num_lowmapq,\n              num_cachehit,\n              fric,\n              est_frip,\n              num_cache_slots);\n    }\n  }\n\n  void Output(const char *filename, bool has_white_list, std::vector<double> frip_est_coeffs, bool output_num_cache_slots_info) {\n    FILE *fp = fopen(filename, \"w\");\n\n    // Change summary file header depending on options\n    if (!output_num_cache_slots_info)\n      fprintf(fp, \"barcode,total,duplicate,unmapped,lowmapq,cachehit,fric,estfrip\\n\");   \n    else\n      fprintf(fp, \"barcode,total,duplicate,unmapped,lowmapq,cachehit,fric,estfrip,numcacheslots\\n\"); \n\n    khiter_t k;\n    for (k = kh_begin(barcode_metadata_); k != kh_end(barcode_metadata_); ++k)\n      if (kh_exist(barcode_metadata_, k)) {\n        OutputCounts(\n                    Seed2Sequence(kh_key(barcode_metadata_, k), barcode_length_).c_str(),\n                    kh_value(barcode_metadata_, k).counts, \n                    fp,\n                    frip_est_coeffs,\n                    output_num_cache_slots_info\n                    );\n     
 }\n    if (has_white_list) {\n      OutputCounts(\n                   \"non-whitelist\", \n                   nonwhitelist_summary_.counts, \n                   fp,\n                   frip_est_coeffs,\n                   output_num_cache_slots_info\n                   ) ;\n    }\n    fclose(fp);\n  }\n\n  void UpdateCount(uint64_t barcode, int type, int change) {\n    int khash_return_code;\n    khiter_t barcode_metadata_iter = kh_put(k64_barcode_metadata, barcode_metadata_, barcode, &khash_return_code);\n    if (khash_return_code) {\n      struct _barcodeSummaryMetadata nb;\n      kh_value(barcode_metadata_, barcode_metadata_iter) = nb;\n    }\n    kh_value(barcode_metadata_, barcode_metadata_iter).counts[type] += change;\n  }\n\n  void UpdateNonWhitelistCount(int type, int change) {\n    nonwhitelist_summary_.counts[type] += change;\n  }\n\n  void SetBarcodeLength(int l) {\n    barcode_length_ = l;\n  }\n\n  // In SAM format for paired-end data, some count will be counted twice\n  void AdjustPairedEndOverCount() {\n    khiter_t k;\n    for (k = kh_begin(barcode_metadata_); k != kh_end(barcode_metadata_); ++k)\n      if (kh_exist(barcode_metadata_, k)) {\n        kh_value(barcode_metadata_, k).counts[SUMMARY_METADATA_DUP] /= 2 ;\n        kh_value(barcode_metadata_, k).counts[SUMMARY_METADATA_LOWMAPQ] /= 2 ;\n        kh_value(barcode_metadata_, k).counts[SUMMARY_METADATA_MAPPED] /= 2 ;\n      } \n  }\n\n private:\n  khash_t(k64_barcode_metadata) *barcode_metadata_;    \n  struct _barcodeSummaryMetadata nonwhitelist_summary_;  // summarize the fragments with no barcode information \n  int barcode_length_;\n\n  std::string Seed2Sequence(uint64_t seed, uint32_t seed_length) const {\n    std::string sequence;\n    sequence.reserve(seed_length);\n    uint64_t mask_ = 3;\n    for (uint32_t i = 0; i < seed_length; ++i) {\n      sequence.push_back(\n          Uint8ToChar((seed >> ((seed_length - 1 - i) * 2)) & mask_));\n    }\n    return sequence;\n  }\n};\n\n} // 
namespace chromap\n\n#endif\n"
  },
  {
    "path": "src/temp_mapping.h",
    "content": "#ifndef TEMPMAPPING_H_\n#define TEMPMAPPING_H_\n\n#include <assert.h>\n\n#include <cinttypes>\n#include <cstring>\n#include <functional>\n#include <iostream>\n#include <string>\n#include <vector>\n\n#include \"bed_mapping.h\"\n#include \"mapping.h\"\n#include \"paf_mapping.h\"\n#include \"pairs_mapping.h\"\n#include \"sam_mapping.h\"\n\nnamespace chromap {\n\ntemplate <typename MappingRecord>\nstruct TempMappingFileHandle {\n  std::string file_path;\n  FILE* file;\n  uint32_t num_mappings;\n  uint32_t block_size;\n  uint32_t current_rid;\n  uint32_t current_mapping_index;\n  uint32_t num_mappings_on_current_rid;\n  uint32_t num_loaded_mappings_on_current_rid;\n  // This vector only keep mappings on the same ref seq.\n  std::vector<MappingRecord> mappings;\n\n  inline const MappingRecord& GetCurrentMapping() const {\n    return mappings[current_mapping_index];\n  }\n\n  inline bool HasMappings() const { return num_mappings != 0; }\n\n  inline void InitializeTempMappingLoading(uint32_t temp_mapping_block_size) {\n    file = fopen(file_path.c_str(), \"rb\");\n    if (file == NULL) {\n      std::cerr << \"Cannot open temporary file \" << file_path << \". 
This may be caused by creating too many temporary files; please consider using a command like \\\"ulimit -n 32768 -u 32768\\\" to increase the limit.\\n\" ;\n    }\n    assert(file != NULL);\n    num_mappings = 0;\n    block_size = temp_mapping_block_size;\n    current_rid = 0;\n    current_mapping_index = 0;\n    fread(&num_mappings_on_current_rid, sizeof(size_t), 1, file);\n    num_loaded_mappings_on_current_rid = 0;\n    mappings.resize(block_size);\n    // std::cerr << \"Block size: \" << block_size << \", initialize temp file \" <<\n    // file_path << \"\\n\";\n  }\n\n  inline void FinalizeTempMappingLoading() { fclose(file); }\n\n  inline void LoadTempMappingBlock(uint32_t num_reference_sequences) {\n    num_mappings = 0;\n    while (num_mappings == 0) {\n      // Only keep mappings on one ref seq, which means # mappings in buffer can\n      // be less than block size. Two cases: current ref seq has remaining\n      // mappings or not.\n      if (num_loaded_mappings_on_current_rid < num_mappings_on_current_rid) {\n        // Check if # remaining is larger than block size\n        uint32_t num_mappings_to_load_on_current_rid =\n            num_mappings_on_current_rid - num_loaded_mappings_on_current_rid;\n        if (num_mappings_to_load_on_current_rid > block_size) {\n          num_mappings_to_load_on_current_rid = block_size;\n        }\n        // std::cerr << num_mappings_to_load_on_current_rid << \" \" <<\n        // num_loaded_mappings_on_current_rid << \" \" <<\n        // num_mappings_on_current_rid << \"\\n\"; std::cerr << mappings.size() <<\n        // \"\\n\";\n        fread(mappings.data(), sizeof(MappingRecord),\n              num_mappings_to_load_on_current_rid, file);\n        // std::cerr << \"Load mappings\\n\";\n        num_loaded_mappings_on_current_rid +=\n            num_mappings_to_load_on_current_rid;\n        num_mappings = num_mappings_to_load_on_current_rid;\n      } else {\n        // Move to next rid\n        ++current_rid;\n        if (current_rid 
< num_reference_sequences) {\n          // std::cerr << \"Load size\\n\";\n          fread(&num_mappings_on_current_rid, sizeof(size_t), 1, file);\n          // std::cerr << \"Load size \" << num_mappings_on_current_rid << \"\\n\";\n          num_loaded_mappings_on_current_rid = 0;\n        } else {\n          break;\n        }\n      }\n    }\n\n    current_mapping_index = 0;\n  }\n\n  inline void Next(uint32_t num_reference_sequences) {\n    ++current_mapping_index;\n    if (current_mapping_index >= num_mappings) {\n      LoadTempMappingBlock(num_reference_sequences);\n    }\n  }\n};\n\ntemplate <>\ninline void TempMappingFileHandle<PAFMapping>::LoadTempMappingBlock(\n    uint32_t num_reference_sequences) {\n  num_mappings = 0;\n  while (num_mappings == 0) {\n    // Only keep mappings on one ref seq, which means # mappings in buffer can\n    // be less than block size. Two cases: current ref seq has remaining\n    // mappings or not.\n    if (num_loaded_mappings_on_current_rid < num_mappings_on_current_rid) {\n      // Check if # remaining is larger than block size\n      uint32_t num_mappings_to_load_on_current_rid =\n          num_mappings_on_current_rid - num_loaded_mappings_on_current_rid;\n      if (num_mappings_to_load_on_current_rid > block_size) {\n        num_mappings_to_load_on_current_rid = block_size;\n      }\n      // std::cerr << num_mappings_to_load_on_current_rid << \" \" <<\n      // num_loaded_mappings_on_current_rid << \" \" <<\n      // num_mappings_on_current_rid\n      // << \"\\n\"; std::cerr << mappings.size() << \"\\n\";\n      for (size_t mi = 0; mi < num_mappings_to_load_on_current_rid; ++mi) {\n        mappings[mi].LoadFromFile(file);\n      }\n      // fread(mappings.data(), sizeof(MappingRecord),\n      // num_mappings_to_load_on_current_rid, file); std::cerr << \"Load\n      // mappings\\n\";\n      num_loaded_mappings_on_current_rid += num_mappings_to_load_on_current_rid;\n      num_mappings = num_mappings_to_load_on_current_rid;\n    } else {\n      // 
Move to next rid\n      ++current_rid;\n      if (current_rid < num_reference_sequences) {\n        // std::cerr << \"Load size\\n\";\n        fread(&num_mappings_on_current_rid, sizeof(size_t), 1, file);\n        // std::cerr << \"Load size \" << num_mappings_on_current_rid << \"\\n\";\n        num_loaded_mappings_on_current_rid = 0;\n      } else {\n        break;\n      }\n    }\n  }\n  current_mapping_index = 0;\n}\n\ntemplate <>\ninline void TempMappingFileHandle<PairedPAFMapping>::LoadTempMappingBlock(\n    uint32_t num_reference_sequences) {\n  num_mappings = 0;\n  while (num_mappings == 0) {\n    // Only keep mappings on one ref seq, which means # mappings in buffer can\n    // be less than block size. Two cases: current ref seq has remaining\n    // mappings or not.\n    if (num_loaded_mappings_on_current_rid < num_mappings_on_current_rid) {\n      // Check if # remaining is larger than block size\n      uint32_t num_mappings_to_load_on_current_rid =\n          num_mappings_on_current_rid - num_loaded_mappings_on_current_rid;\n      if (num_mappings_to_load_on_current_rid > block_size) {\n        num_mappings_to_load_on_current_rid = block_size;\n      }\n      // std::cerr << num_mappings_to_load_on_current_rid << \" \" <<\n      // num_loaded_mappings_on_current_rid << \" \" <<\n      // num_mappings_on_current_rid\n      // << \"\\n\"; std::cerr << mappings.size() << \"\\n\";\n      for (size_t mi = 0; mi < num_mappings_to_load_on_current_rid; ++mi) {\n        mappings[mi].LoadFromFile(file);\n      }\n      // fread(mappings.data(), sizeof(MappingRecord),\n      // num_mappings_to_load_on_current_rid, file); std::cerr << \"Load\n      // mappings\\n\";\n      num_loaded_mappings_on_current_rid += num_mappings_to_load_on_current_rid;\n      num_mappings = num_mappings_to_load_on_current_rid;\n    } else {\n      // Move to next rid\n      ++current_rid;\n      if (current_rid < num_reference_sequences) {\n        // std::cerr << \"Load size\\n\";\n        
fread(&num_mappings_on_current_rid, sizeof(size_t), 1, file);\n        // std::cerr << \"Load size \" << num_mappings_on_current_rid << \"\\n\";\n        num_loaded_mappings_on_current_rid = 0;\n      } else {\n        break;\n      }\n    }\n  }\n  current_mapping_index = 0;\n}\n\ntemplate <>\ninline void TempMappingFileHandle<SAMMapping>::LoadTempMappingBlock(\n    uint32_t num_reference_sequences) {\n  num_mappings = 0;\n  while (num_mappings == 0) {\n    // Only keep mappings on one ref seq, which means # mappings in buffer can\n    // be less than block size. Two cases: current ref seq has remaining\n    // mappings or not.\n    if (num_loaded_mappings_on_current_rid < num_mappings_on_current_rid) {\n      // Check if # remaining is larger than block size\n      uint32_t num_mappings_to_load_on_current_rid =\n          num_mappings_on_current_rid - num_loaded_mappings_on_current_rid;\n      if (num_mappings_to_load_on_current_rid > block_size) {\n        num_mappings_to_load_on_current_rid = block_size;\n      }\n      // std::cerr << num_mappings_to_load_on_current_rid << \" \" <<\n      // num_loaded_mappings_on_current_rid << \" \" <<\n      // num_mappings_on_current_rid\n      // << \"\\n\"; std::cerr << mappings.size() << \"\\n\";\n      for (size_t mi = 0; mi < num_mappings_to_load_on_current_rid; ++mi) {\n        mappings[mi].LoadFromFile(file);\n      }\n      // fread(mappings.data(), sizeof(MappingRecord),\n      // num_mappings_to_load_on_current_rid, file); std::cerr << \"Load\n      // mappings\\n\";\n      num_loaded_mappings_on_current_rid += num_mappings_to_load_on_current_rid;\n      num_mappings = num_mappings_to_load_on_current_rid;\n    } else {\n      // Move to next rid\n      ++current_rid;\n      if (current_rid < num_reference_sequences) {\n        // std::cerr << \"Load size\\n\";\n        fread(&num_mappings_on_current_rid, sizeof(size_t), 1, file);\n        // std::cerr << \"Load size \" << num_mappings_on_current_rid << \"\\n\";\n        
num_loaded_mappings_on_current_rid = 0;\n      } else {\n        break;\n      }\n    }\n  }\n  current_mapping_index = 0;\n}\n\ntemplate <>\ninline void TempMappingFileHandle<PairsMapping>::LoadTempMappingBlock(\n    uint32_t num_reference_sequences) {\n  num_mappings = 0;\n  while (num_mappings == 0) {\n    // Only keep mappings on one ref seq, which means # mappings in buffer can\n    // be less than block size. Two cases: current ref seq has remaining\n    // mappings or not.\n    if (num_loaded_mappings_on_current_rid < num_mappings_on_current_rid) {\n      // Check if # remaining is larger than block size\n      uint32_t num_mappings_to_load_on_current_rid =\n          num_mappings_on_current_rid - num_loaded_mappings_on_current_rid;\n      if (num_mappings_to_load_on_current_rid > block_size) {\n        num_mappings_to_load_on_current_rid = block_size;\n      }\n      // std::cerr << num_mappings_to_load_on_current_rid << \" \" <<\n      // num_loaded_mappings_on_current_rid << \" \" <<\n      // num_mappings_on_current_rid\n      // << \"\\n\"; std::cerr << mappings.size() << \"\\n\";\n      for (size_t mi = 0; mi < num_mappings_to_load_on_current_rid; ++mi) {\n        mappings[mi].LoadFromFile(file);\n      }\n      // fread(mappings.data(), sizeof(MappingRecord),\n      // num_mappings_to_load_on_current_rid, file); std::cerr << \"Load\n      // mappings\\n\";\n      num_loaded_mappings_on_current_rid += num_mappings_to_load_on_current_rid;\n      num_mappings = num_mappings_to_load_on_current_rid;\n    } else {\n      // Move to next rid\n      ++current_rid;\n      if (current_rid < num_reference_sequences) {\n        // std::cerr << \"Load size\\n\";\n        fread(&num_mappings_on_current_rid, sizeof(size_t), 1, file);\n        // std::cerr << \"Load size \" << num_mappings_on_current_rid << \"\\n\";\n        num_loaded_mappings_on_current_rid = 0;\n      } else {\n        break;\n      }\n    }\n  }\n  current_mapping_index = 0;\n}\n\n}  // namespace chromap\n\n#endif  // 
TEMPMAPPING_H_\n"
  },
  {
    "path": "src/utils.h",
    "content": "#ifndef UTILS_H_\n#define UTILS_H_\n\n#include <sys/resource.h>\n#include <sys/time.h>\n\n#include <iostream>\n#include <tuple>\n#include <vector>\n\n#include \"candidate.h\"\n#include \"khash.h\"\n#include \"minimizer.h\"\n#include \"strand.h\"\n\nnamespace chromap {\n\nstruct uint128_t {\n  uint64_t first;\n  uint64_t second;\n};\n\nstruct BarcodeWithQual {\n  uint32_t corrected_base_index1;\n  char correct_base1;\n  uint32_t corrected_base_index2;\n  char correct_base2;\n  double score;\n  bool operator>(const BarcodeWithQual &b) const {\n    return std::tie(score, corrected_base_index1, correct_base1,\n                    corrected_base_index2, correct_base2) >\n           std::tie(b.score, b.corrected_base_index1, b.correct_base1,\n                    b.corrected_base_index2, b.correct_base2);\n  }\n};\n\nstruct _mm_history {\n  unsigned int timestamp = 0;\n  std::vector<Minimizer> minimizers;\n  std::vector<Candidate> positive_candidates;\n  std::vector<Candidate> negative_candidates;\n  uint32_t repetitive_seed_length;\n};\n\nKHASH_MAP_INIT_INT64(k128, uint128_t);\nKHASH_MAP_INIT_INT64(k64_seq, uint64_t);\nKHASH_SET_INIT_INT(k32_set);\nKHASH_MAP_INIT_INT64(kmatrix, uint32_t);\n\nstruct StackCell {\n  size_t x;  // node\n  int k, w;  // k: level; w: 0 if left child hasn't been processed\n  StackCell(){};\n  StackCell(int k_, size_t x_, int w_) : x(x_), k(k_), w(w_){};\n};\n\ninline static double GetRealTime() {\n  struct timeval tp;\n  struct timezone tzp;\n  gettimeofday(&tp, &tzp);\n  return tp.tv_sec + tp.tv_usec * 1e-6;\n}\n\ninline static double GetCPUTime() {\n  struct rusage r;\n  getrusage(RUSAGE_SELF, &r);\n  return r.ru_utime.tv_sec + r.ru_stime.tv_sec +\n         1e-6 * (r.ru_utime.tv_usec + r.ru_stime.tv_usec);\n}\n\ninline static void ExitWithMessage(const std::string &message) {\n  std::cerr << message << std::endl;\n  exit(-1);\n}\n\ninline static uint64_t Hash64(uint64_t key, const uint64_t mask) {\n  key = (~key + (key << 21)) 
& mask;  // key = (key << 21) - key - 1;\n  key = key ^ key >> 24;\n  key = ((key + (key << 3)) + (key << 8)) & mask;  // key * 265\n  key = key ^ key >> 14;\n  key = ((key + (key << 2)) + (key << 4)) & mask;  // key * 21\n  key = key ^ key >> 28;\n  key = (key + (key << 31)) & mask;\n  return key;\n}\n\nstatic constexpr uint8_t char_to_uint8_table_[256] = {\n    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 0, 4, 1, 4, 4, 4, 2,\n    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n    4, 0, 4, 1, 4, 4, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4, 4, 4,\n    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,\n    4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4};\nstatic constexpr char uint8_to_char_table_[8] = {'A', 'C', 'G', 'T',\n                                                 'N', 'N', 'N', 'N'};\n\ninline static uint8_t CharToUint8(const char c) {\n  return char_to_uint8_table_[(uint8_t)c];\n}\n\ninline static char Uint8ToChar(const uint8_t i) {\n  return uint8_to_char_table_[i];\n}\n\n// Make sure the length is not greater than 32 before calling this function.\ninline static uint64_t GenerateSeedFromSequence(const char *sequence,\n                                                uint32_t sequence_length,\n                                                uint32_t start_position,\n                                                uint32_t seed_length) {\n  uint64_t seed = 0;\n  for (uint32_t i = 0; i < seed_length; ++i) {\n    if (start_position + i < 
sequence_length) {\n      uint8_t current_base = CharToUint8(sequence[i + start_position]);\n      if (current_base < 4) {               // not an ambiguous base\n        seed = (seed << 2) | current_base;  // forward k-mer\n      } else {\n        seed = seed << 2;  // N->A\n      }\n    } else {\n      seed = seed << 2;  // Pad A\n    }\n  }\n  return seed;\n}\n\ninline static uint64_t GenerateMinimizer(uint32_t sequence_index,\n                                         uint32_t sequence_position,\n                                         const Strand strand) {\n  const uint64_t minimizer =\n      (((uint64_t)sequence_index) << 32 | sequence_position) << 1;\n  return minimizer | (strand == kPositive ? 0 : 1);\n}\n\n}  // namespace chromap\n\n#endif  // UTILS_H_\n"
  },
  {
    "path": "test/read1.fq",
    "content": "@simulated.1/1\nACTCGCACGATAATCGTCATACATGGACGCGTCATTTGTTGAATACATGGTTTAATTATGAATTGCTATGGAAATCGAGAGGCCCGCTCGGTCCCCTTTA\n+\nIIHIIHHHIIHHIIIIGHIGGGIIEIIFEHFHIDGIDEIIIDEFGEICEBHIIIFIIHIIIBHIBE>ID>D@IICGFIBIIIB;>C7IEIIBI5GAI>IB\n@simulated.2/1\nCGAAACTCAAAGGATCTAGTATTGTCCATACGGCGCACACTCCGTCGGGACAACGGCATCGATGCTTACGTTAGCACCAGTTGAAGCGTGATATGTATAT\n+\nIIIHHIHIIHIHIIIIGIFIHFHIHIIIFHHIIIIECIEHGIHBIIBIHIIDII=GIICB>IFHIGI@FFIGHIDIFEG>:G5>A5DI<@IIIFEI@HII\n@simulated.3/1\nTTGCTCTCGGCAGTCTGTTTGTGGTACTATGTGCCTAGCTAATGACCTGAGAGGGTTAAGCCTTTGGATCAAGTAACGGACATACGGGGAAGATGTGACA\n+\nHIHHHIHIIIIHIGIIHIGIGEIGIIIIHFIG?FIGIBC@IIECHIEIIIIIII@ICFIIGCCHFFCIIIID=AEIIIIAIIE9IA=DIIIC7CIIIE7I\n@simulated.4/1\nACCCACATACCGAGCTCAGGTGTGTGTGTGGAGCTACACGTAGCCAACCGTTGAGGTATACCAGGATTTCATTAATCCGAGTAATCTTGCAATAGCGCGT\n+\nIHIIHIHIIHHHHHGIIIFIHFIFIFIICIIIBIHIIBDAGEIIDIAGIEGHGFIIDHEIDIIIIIIIE@IDAII:BIGFIIIGC@I?IIII?IBAIIII\n@simulated.5/1\nCCCATGTATGTAGACGTACAAGAGAAGAGCCTGTCATTGATACCCTTGTAACTTATGCTTAAGGCCACAGATAGTGTCACGTCGACACTTCTCCTATTAC\n+\nHHIIHIIIHGGHIIIIHHGFHIIFFEDIIGFIIIIEAIGCIIEIIIIDEIIFCFIIIEEGIFIEIGII<?FIIGIFIIGG>D@IAAIIIII<IC@6III@\n@simulated.6/1\nCAGACTTGAAGCATACAGCACTGATACGGATCTGTCAGTTCGTTAGGATCGTATGTCAGCCCCGGTGTATCGCCCAACAAAATGTGTGCGCGTTCTTCTA\n+\nHHHIIIGIHIGHIIFGIHHEHGGFEFFICIIIIIIEFIIFCIIFIIBIIIIDIIHICIIFIIBCIDIIDIIICI@3IG9BD><IHDIII@IDE@IIII=E\n@simulated.7/1\nAGCGGGAGTTATAGCCGTTACTGATCGGGCATCGACACGAAGAGTCTGCGTACTGTCTTAGACGATGCTCGGTTGATTCTGACCCGAACAAGCCCGTCCA\n+\nHHHIIHGHIHIHIEIHHCIIHEIHHHIIIEFGIGDIIHEIIHDIIIIE?IIIBIFIIEIIIHIDIEIIAGCE>IIGI;IIAIBIIIIEFIGGI?IG<HIF\n@simulated.8/1\nATATCCAGACCACGAATTTGTAACGTCCAAGTAAACCCAAGTCTGTCATAACTAAACTGTCGCAATTCTGCTAGGGTTGGACTTATGTGGTATCATTTTA\n+\nHIIHHIIHIIIGHIIIIHIIGGIHHIEGGIIFHHIEFIIIIIIFFEIIIIEGI?III;IIIIGDF<IIIIIIIEICICIBI@?IFIIIII?ID>CIII?H\n@simulated.9/1\nCGGTTGGCCCGTCGCTTACCCTTCGTAGCCAACCCGGTGGGGCCTCATCTGCGTCCTCTATTCCACCTTGGACTATAAATTTCCTTCCATGCTAGTCACT\n+\nIHHIIIIIHHHIIFGFIIIIHIIFGEBIIIFGIIBCIIIFAIIIIAGCAIHEIIIIFDIBI@BI@II?IF=DID@IIII
FICIIIIG@I<G=IBFEI@DC\n@simulated.10/1\nGGCGTGTGCAGTAAAGGGCCTAAGTCTAACGCGACAGGTGGGTTTGGCTAAGATTGCGCCTATTTGGTTTTCAAGCGGCAGTTAGTGCCCAAGGGCCGTC\n+\nHHIHHHIHIIHGHIDHGHHIIGGIIHEEIHEHGHEGIIIIIFDI=CHEIDFEIDIGI>IFEGIB>IIIIIFCIEIII>FAIBIA:EID<IIIIII6FGEG\n"
  },
  {
    "path": "test/read2.fq",
    "content": "@simulated.1/2\nTTCCTCCTATCCTAGATGCCACTTGTGAGGCAGCGCACTACCTAACAGCTGCCACATGCCGTTGGAGTCCATAGAGGGGCGACAGTCATGTACTGCCTTA\n+\nHHHIIIIHGFHGIIHHIIGIEIHEHIIIIBHIFGIEIIGHIIEDIIFIFFFC<=GIIIICIIHBIIDAIIF<@I>CIIF;II45I=FFIFEIIIIB1;II\n@simulated.2/2\nCTTGGTAATGTCTGTCTGTCCTGTTCGCTCCTTTTGTCCTCTGGAAGTGGTTACTATAGAAGAGGCACATATGTAGAAGTTCAGGTTGGCCGGCATGTCC\n+\nHIIIHHIHIGHFGGIHIIIFFHIGIIEIIHIHCFIGHGBIHIIIGIAFIIEFIDEHDGHGIIIDCEIIIIFIHHIIIIIIA=IHEIIBIIIIFII=III6\n@simulated.3/2\nCTGTTCAAGCAGAGTCCTACCTAAGCGAAAGGTTCAGAAATGGGTGCTTAGAGTGACATAACGACATAATGAGCGCCGCTGGTCGCCCAAGGTGACGACG\n+\nIHIIHHIIGIIIGHEHGHIGIIFEIIHIHHHCCDGCIIIGHFIIIHEICHHIC@IHBIIBFIAIIB?IA=IEI;CIIBBHG=IIEACIHI;II8@III8I\n@simulated.4/2\nTCTTATATGGAATCTGGGACGTGCTGGTGGGAGCGAATCGCCCTCCACTTCACTAGGAGCACATGTGACTCCCAAACAGGGAGCTGAAGCCCACCTATGA\n+\nHHIIHIIIIHIGGGIHIHHIIHCHIGIICIIIEHIIGGIHIIDICEIICIIIFIIGIHI?IGIIHFII?II=IIIDIIIGFIBIGIHIB?D;IE;B:EII\n@simulated.5/2\nCGAGGTGCTCGGTTTCCCCACTGTCGAAGGTCCATTTGTGAATTTTCCATGATGGTACGTGGTAAATTCACTTCTTACGATAATCACCGTATTCCGTAAG\n+\nIIHHHHHHHIGIIIHHIIIIIIGIIGFFIHHEGGIIGGIIGIIBIIIFEBEHDIIH?IGEICIFDDIIAEIIIGIIIBFIIC85IFI=DI;?IAI@IIII\n@simulated.6/2\nTAGCTATACAAGTCCTACAAGTTAATATCCAAAAGCTGACAAGATTTGACATGAGCGGTCACTCCACCGCCACAGTTTCAGACACCACTGTCTATGCACA\n+\nHIHHHHHIHHFHIIIIEIIIHGIIGIIIGIEIHIFHGIICIHIEIFIEIIFAIICI>@IFB>IHIIHIIIIGC;E>HI:III@EICDGIICIHEI8IIDI\n@simulated.7/2\nGTCTATACGGTCACCGTGGACCTGCGTGTGAACACACCCGCTACTCTACAAGCACTCAACGTTTGAACTAGGCCCCCAAACGGGCGAATCTTGCGCTTTG\n+\nHHHHIIIHIHHIHIGHIGIFIIG@IIHEGEIIHIIIDICII@EIIHBH?DGIIEFB?IIIIIEIGIII@DHI@IIIAIIAIII@6;III;III?H=IAHH\n@simulated.8/2\nCCCTACACACGGTTGCAAATTCCGGCCATCTGTCTTAATAGGAGCCGTGTAGACGCATTAGATCCGAACTTTATGCTGGGTTATTCACTGCCTTAGTAAG\n+\nIHIIIIHIIFHIIIHIIIIIGDIIFHIHFHIGIGIIHIEDIHCII>IIGIBIFII?IICADIIIHEGI5=CIIF=IIII@;IIIAH3I88IIIIAII:II\n@simulated.9/2\nCCCCAGGCATCTCACCTGCTCTTACCGGTGAAGATCCAGTTCTGATACATTGGTATATGTGCGAGGTGACGACTGGCGCGATCCGGATCTTCTCATGCCC\n+\nIHIIIHHHIIIHGHIHIIIIGIIHIGIIIIHEICFIHCG:ICIIIIHIIGIH<EI@HII@HIIIAFBEEIIIIII>III
?IF=IF8A<IIIICE9III98\n@simulated.10/2\nCCCAGACGCCCCGAAATTTACCCGGTCGCCGAGCAGGCTCCTGTATGGAAACAACGATGATTGCAGCGATTGTGGTGCTTCGCCCTGGTCACACCCCGAG\n+\nHIHHIIIGIIGHIFIIIIIIHEIIHIFHIIGGGIIIBHEFGAICIFIIHIIDEIIBCIIIABIFGIF@=IHIIIIIII<IIIAIIBCIFCIBIII;II5C\n"
  },
  {
    "path": "test/ref.fa",
    "content": ">1\nTATGCACCAGAGTATGGAAGCATAAGCTCTGCATGCAAAGGTACATCAGATCCTGCGGTTGGGTGCCAAC\nCCAAGTGTGTTCACGGGCGCTTGACAGACATCGGAGGATGGTGCACACTCACTCGACCAGCGCAAAGCAC\nAGGATCTCACGGGCGGACATCTCTTAGGTCAGTCATCGTGGAGGAATGCTTGTACGTTCTTTTGGCTTCC\nCCTAACACGGCGGGCGTCTCCGGTACGTATCCTGTCGGTACACCCCTTAAGCCCCTAGGCCCGAAGAACA\nTAGCGCATTTCACGCTCTCTACGAATGACCGCAACGATCAAATGGGCGAGAACAACTAATTCCGATTCAT\nGGGGTTTGTGGATTGTGACACAGCGCGCCCGCTACTGCGGGACGTGAGGACGCCCAATTCTGCCAAGGAT\nTATTTAGGGTGTTTCACTAGAGTTATGCGCCGACCCCGGTTGGACCAGCTTGCATTCGAAACTGCGTTAC\nACAGCACCCCACCGCAATCGTATGACTCTCGCTGAAAAAGGGTGGTCAACCATTACACCTCTTATGCCTG\nTTGTGGGAGGCTCGGTCTTAAGCAGCGCGCGAGCTGTGATCCAGGCTACCACGGACATAGTGTATGGAAA\nGTGATCCAGAGTAGACCCGCGGGGGCCTGACCTAACCTATATAAGTTGTATCGTGGCTATGAGGGTAGTC\nGCCGGAGAAAACGTATGCTTACTGATTTTTAAGTCGGCGTGGCGCCGAAGCCGGATCGGTTGTAAGCTAG\nCCGGGCCTAGGGGTTCACCGTAACGGATTAGTCAAATTAAAATCCAGCGATGACTTCCTGATAGAACTCA\nAGTCGTGACCCCTCCGCTGCGGGCCTACATCTGTTTTCGCAGGCGTGGTTGTTTACCAGGTATGGTGCTC\nATCTCTATTAGTCACGGGCAGCATGGTGTCACCGAACCGCGCGTCTCCTAATATCTGGTCTACCGATTTA\nGCCCCGGCAAATAACTTTGGATTGTGGTTGGAGAGTGCCAGAACTGACGGGCGCTGCCGTGGGGCTCCTA\nACTAAAAACGCCACGGACCTGGCTAACATTCGTTGTTGACTATAACATTTGAGGGCGCTTCGGATTCCCC\nATACTGCCAGAGTATTATGTGGGTGGTAAACATAGATTCTATATAGTCAACGACATACACTCATTATTAT\nGCAATTGCGGCATCTCAACTATGTCTTAATTAGTTTTCCCGGATGGCGAAAACGATCTTACAGGAGAAGC\nGCTACGCTGGTTTGGAAGACACTTAGTATCCTAGTAGTATGGGCTTGTGCGGGTCAACGGGCGCCGTCAA\nAGCGCACACATATCTGGTGGGGACGGTGTCCCCTATCGGCGCACACGGGAGCCTAGGCAATCCCGACGTC\nCCGCGTGCTGGATAAAGAAAAGGCCGACTGCGCGAAATGAAGAATCGTCAATTTATTGTTGGCAGCTTTA\nCAGTTCTTCTCCGCGGGCGGGCAGAGTGGTTTTAAGACCGGGGTCTATGCACAAGGGTGGAGCTTGATTA\nCTATCATCGAAGGGTGACTTGCCGTGTTACAATCGACAAGCGAACGGCCGACTGCTTCGGCCCGCTGAGC\nGGACAACCTCCGATGTACCTACTCCATAGTAGACTTGGAAAACCCAGTCTTATGGCGCGGGGGAATCAAA\nTGTGCCGATTCTTGACGAATAGTTCTAAGCCGAACTGCCAATCAGGGAAAAGTGTCGCGCACTAACTGGA\nGCTGAAACCGCCAATAGTGTCTAAGTTACTCTTCCCATCTATCAAGGTAAGCCTTTTCCGTACAGTTATG\nACCATCTCACACTGGAAAGACGCACTGCATGCTCGGATGGAACTCGGAGATCACCTGGAAAGTCAGTGTC\nATGCGTGGCGGTTTAGTGTTCGACGTAAGAAAAACC
TGGAAGACGGACGAGGTATGCAGACATTGCAGCA\nGTGGTAGTAGGGGTTATACCCCTGGATTTAAAACTCACAACCTTCCTCATAGGCCTAAGGATCTTGCTAG\nACCTTAAGCAGGTACGTGGATCAGAATCGCTACTTCCTGTTAAGACTGTGGACCCTCCCACAAACTCCGA\nTGCGAGCTAGGACGTCTTTAGCTCAGCTTGAGAATACTCCTATTTGCCTTGAAAGCTGAGCGGTTCAGAC\nAGAGTAACTACATCTTATATGTAACCACACTCACATAGTTGTTGGGGGCAAACAGCTAAGGATTCCTGGT\nCCCTGGCACGGATATAGATCACAATCTGGAATTCCCTCCTAAGTACCCGCCCGGTATTCCCACACTCTGT\nGAGACTACGTGCGCGTGTAGTATCGTGAGGTCCGCGGTGGAAAAGGGTTTGGCACTTACTACTCAGTGAC\nCGTATACACGGAGATTCGCACTGATGTGGAATATGAAATCCCACATCCCCTGAGAATTTCGAATCTGAGG\nATGAGTATATGCCTCGATGTAGGCCAGGAGCATTGATCCCGGCGTCGTCCGGTCTAAGCACCATAGTTAT\nGGGTCGGTTATAAACGAATTTTGACGCGACGGGGAATGATACAGGATCCACAGTGAAACTAGTCTGGGCA\nCCGATGCATTGCCAATGGTGCCTATTAGTGTTCCTGAAGTTGACTACAGTCCGTACCTCAGTATAGCGCT\nGGTTACTAGTAGCGAAGTTGAGATTGTAGCTCGTACTCCAATGACCACCCGAGGGGGTGGTGCAATGTGC\nAGGTAGGGGTAGGTTCCTGTAGTTCGGAGGTCAACCTCTTGTTGACGTCTGATGCGAGCCTGACTAAAAT\nGCGCTTCTTCACTTTTGTTCGTATAGTCACTATATTCGCGAAACCGTCGCTTTTATTATAGACGGCCTAC\nTTCTTTGACCGAGCCTCATAGTCTGCACTCGGGACGAAACTAACGGCTGTTCCACTCATGACCTACGCGC\nCTGAGTGATCAAATAATCAAAAGAATGCGCCGCTATATGTAGGGGGCCCATGTATTGGCTGACTTTGAAA\nACACTCTGACACGAACTTGATTGTCTTACTAGTGAGAGAGGGGGGAGATACGCAGGGCGCGGGAAAACGA\nTATGCAGCTGTAGGGAGCGCTATTTGCTAGGGAGTATACAATCCAGGCCTCCCTCGCTCTCATCTTTATC\nTGTCGATCGTGAGCCAACGACTAGTTGGTCTATCGCGATTATTGCCACTATCGTTCCCACCCGCCCGTGT\nCGCCGTAGGAACAACCGGTCTTGCTACGGATGCTCTGATGTTTGTGCTAGTGGTCTAACGGTCCACGACA\nACAGTGTCGTATACAGGGCCACTCTTCGAAAGCGGCGTAAGCGCCACGCTTGTCCGCGAGTTGGGGACCT\nCGGGGAGTCCGTATTCTTTGAACCCTTGTTCAGGCGAGTGTTTTTGACGCCTTTCAACAATGCAGATCCC\nAGGGAGGTCCGGCCCTCTGTTCTATGAGTAGGTCAAGCTAGAACCGTGTTGTGAGAGGAATGAGGGTGCT\nTAAAGGGTTACCCGTCGTCACATACGTAAAAAGAATCCCTATTATCATGGACTCCAATCTCGCCACGTGA\nAAGTTAGCGGGGATAGCGTTTGGGTCTTGCATATCCTGGGCTCTTCGGCTACGGGTGGACACGACATTGT\nTGATGGTGGTCGCGTAGAGCTAGCTTTTTACCTTATGGAAGGAGTTTCCGCACCCCCAGGGACGGGCTCC\nGGCCTCACTATGTTCAAAGCCCTAAATGCACATTCTAGTCTCCACGTGTTACAGTAGGGCCAAAGATGGC\nGTACTAACAGATTGGTATTGGACTATAAAAGATGAGTGATTAAGATCTGAAGTTCAAGACGTCAGGCTTA\nAGTAGGAGTACCTTCTGGCA
CGATAAGGACTCCTACCCCACATCCTGATAATGCAGCTATGTGGCAATTC\nACGCTTACCCAATCCTTAGCTGGCTAACAATTCCCCATACATGTTCAGCGAAGGTAGAACAACCAAAAGC\nCCGTTGCGGAACTGGCCAATGCTATCCCAAGTTAGAAGGAATACGAACTGGTTCCGGGGACCAAGGCCCA\nGTTGGACATAATTTAACAACTCGGCGAAGATAGCAAGTTCTGGCGTCTGAACGTATTATTGTTGCTGGCT\nATCACAGTTAATTCCCTGCCTACGAATTTTGTCGTACCAATTCAATCCCCCTGGCGGAGCTCTATGTCCT\nAAGGCCCTGTTCAACTGCCGGATGATATTGTCTGGTTTCTGTGCCGTCGCTCAAACAATTCTAGGCCCCG\nAAACGATCAGAGGAACAACCGGGAATGTCGGCCGACTTCCAGCTCCGTCCTATATGCTAAGCGAAAACTT\nTACACGAATCCAATACACATTTCTGCGAAGAGCTTTCACAAGGTACATTTTGCCGCCTTTTGAAGGATAC\nGAACTCGGATATGTACAAGAGACTCGGTCGTGCGGGAATCGCGAATTGCGGGAGTTAGGCACCCGGGTCA\nGACAATCCAGCGCAGACCTAGACCTAACAAGAAGCACGCCAGCGTGCCCAATCCGGATAGTGGACGCTCC\nCCGGGTTTAGAAGCAAATAACCTAGGTGTACCTAACAATCGGCAACGCAGTTTTCTAGGATTCTAGCGTT\nTCTATCAGAACAAACAGATTCTTTGAGGGCTTAAATGGCGCCTTTTCCGTACAGCCACTAGGTATGTTAA\nCGAACAACACCCTGCCTTTGACATGGCTCACGCTAGGCTTGTGAGCAAGGAGTAATCGACGGTTTGAGAT\nCGCTTCTCGTCTAATATTAGACAACCTTGGTCGTCGTTGCCATATAACGACCAATGATTGGTTTCGGACT\nGCGACTAATACTGGTAGGTGGCCTGGCGGTGTTGGTGTGGACTGGTGGTGACCCGCACAGTGCAAGAGAT\nCATCAGGGACCATGCGATAAGGGACCATACGCGTATGCCCGTTTTGCGAGCAACGCGCGAAGCACAGCGT\nCTGGGGGCAACCCAGTTCGTCCTGCAATGCAGGATACGCTGTAACGGTGTGCCGAAGTATTCAAGGTGAA\nGACCAACCAGTCTCGTGTAATCCGTTCTAAGACGCACCCTTCTACTCTCTCCCGGTCACGTCGCGTACTT\nGGCGTGCACAACAAGAACAGATTGGTGACTCAGTAGAGCCTCGCCGACAGTTCCCTATCTTTCACCAGCG\nGATACCTACGAGTTAACCAATTACCAGGTGGGCGCAACTACCGATTGGACTACGCTATGTCGTCCGGCTC\nCTGTTCAAGTAAATCCGTGGAGGTAAACTACCACCCGGATGTAAAACTCCGAGGTTCGTTCTATAGTCCC\nTTGGTCCGCAAATCGCCCTGTGGGTTAGGGATTATAAAATACACAGATTATAAACGTCGATAAAAGTAGG\nTACTGCACGGAGGGGTCTAACAAATAGATGATGTGGTCATCCACTCCGTCCTGAACACTAGCTGAGATTC\nCCGCCCAGAGAAGTTAATCATTTTTAAGGCCCGACGGAGGCACCTAGACGCCACGCTGCACTCTCTATCC\nCCTAAGCGGCTCAGGCGACTACTTCGGTTACCTGGTAACCAAGTGGCTTGAAAGTTCTTCCTCTTGTGGC\nACGAAAGTTCTGGGATCCGCCGAGGGGGGGTGTACTCAGAATGCTTCCTGGTGCGCTGAGCCGCTAGTGC\nTCACATATTAGACATTAATCTGCTCTTAATTATCCTATAGAGGAGCATCGGGACCGTGAAAACAAAAAAC\nGACCAACGAATCATTCCAAGACTTAGCTCAACCTGCCATTTTAAGTAATACGAATGTCAGGATACACCGG\nTAAA
AGGTACCAGTCTGAGCAGTCCTTGATAGAGCCCAGTTAGGCAGTGCGATGATCTCACCAGGGAGAG\nATGGCACTTAGGGAACTCCTGTCATCACACTTTACATCGGCCTTCCTACAGACAGTTTGAATCAGCCTTC\nTGGAAGTGAATAGAAGGCATAGTATAGCTCCTGGCGAAGCTTGTGATACTGTAGAGTGCAAGTAGTGGTC\nCATTTACAAGTTGACCTTCCACCACTGAACCCGGTCTCAGGTGTTGGGTTATACAGTCACGCCACCGAGA\nCCGTTGCTGATCACCGAAAGGCAATCCAGGAACTTGGAGGATACGGGATGCTGATAACTAGAGGGAGTGC\nTAGTACAGGCCCCATATGCGGCACAACACAAATAGATATGATGCTTCCTGTGATACGGATCCTCAATACG\nCAGCTACCCGAAGGCAAAAGTCCACATGTTCGACTATCTTGGCTTTGGTGTGACCAAGACGTGTGCGGCA\nGAGAGGACGTTCGCCAGCAGGCACCGCGTATCCCAACTTGAGGTTTCGCTAAATCCTTCATTGGCTGGAT\nGAGGGGGTAAAGGTGGTCTCGGGACGTCTCGTGTGGACCGAAAACTTGTCTACGAGCTCAGCCAGGCTTG\nGCGAATGCCGAAACGAGAGTGTTCGTCCTGACTGGTGGGGGCCTACCTACAGACCTCAGTGTTAGCGCCT\nATAATGGAGGGCTGTGGACTAACATTCAGGTGCGTGGAGTCCCCACGACTGGCGATAGCGGCAGTTACTG\nGTTGCCGCACAGGCCTATTCGTAGCCTTTAAGAGGGGCACGGTTTCACGCTGTTACGCCGAGGACCGATC\nACCCGGAAGGCTCTCATCTCGTATGGGTACTTAGGCCGATTCCCCGCGCCAAACTTCAACCCAAAAGACG\nTTGAGTAGTCAACAGGATCATTGTGTCGTGAGGATAATCAGGATGGGGCCGTAGTAACGACGGCCTTTGC\nCAAGCCTAAAGGCGTTTTGAAAGTGAGTGTTTTGAAGATCTCCTCCAGCGGGGGCTCGCTCCGGAGGTGC\nATTAAGCCACCGCCCCACGAAAGTATATCGGCTTCAATGCGTGTTCCTCTAATGCTGGATTATTGGCCGT\nGGAACAGCTCGGAACTAGTTATACTGCCCGCTTTCGTTCAAGGTCACCAGGGCCCTATCTTCGATCCGTA\nGCCGCCTAGAAAGGAATATAGCTGCAGGAATCGCTTTATCGGATAGTTACTCCCCTCGCCACCGCCCATG\nAATATATTGACCCCAGGCATCTCACCTGCTCTTACCGGTGAAGATCCAGATCTGATACATTGGTATATGT\nGCGAGGTGACGACTGGCGCGATCCGGATCTTCTCATGCCCTACCCCAACTATTTTCACCGGTATCCCTCT\nTTATTAAAAGTCGTCTGGCGATCTCAGATTTAGGTACGACCTTGCCTTTTCTCCGAACCTAAAAAGCTTG\nAATGTTTTGCTGAGGTGCTTGCCGACGCCACTCAGTGACTAGCATGGAAGGAAATTTATAGTCCAAGGTG\nGAATAGAGGACGCAGATGAGGCCCCACCGGGTTGGCTACGAAGGGTAAGCGACGGGCCAACCGCGTCAAG\nCAGAGGGATGAACTTGTAATTCTTTTTATATCGCACTTCTTGATGTTTGCCCAGGTAGAAAGAGATTTTT\nTGACAGGTTCGTCTAGCTGAACTTCCGGTCGGTTTGCTCCCTACCAATTCTTCGTCATGAGGTGCGGAGT\nAGCCCCTCGATGGCCATCGGCGGGCCATGCTATTTTCAGCATCTTTTTTGCTTCAGATTAGAGCGCACAG\nAGTTTTAAGTTACATGACGTATGTTAAGATGTCCGAGTCTACTTGTGCCTAGGGCCAGGGGATGACTTCG\nAAGGTCACAACCCGAAGAGACAAGTACGCTATCCTGGGGAGATTCCAATACTTGGGTGGA
AGTTTGTTGC\nCTCTAGGGCTGCGCCCTGGCCTGTAATGACTCGCCTAGTACGCCCGGGAGTGGGTTACCCATAGGATTCC\nAGAGTATATAGCTGAGCGTCCAGCCACTCGTGCGTACACCGAGGCGCTCGTCAGCTTCCCCTTCACCTAT\nGGCGGTCCGACTACCGAGAAGCTACACTTAGGAATTTTGCAAGTTTTTATGCTTAAGGTAACAATCCACT\nGTCCCCTCGAGTGGTGCCGGCCTACATAGAGTCGCCTCGAACGCGAATTAGCTTTTCTGACTTAGTCCGC\nAGTGTCAGCTACCTAAATAAAGACCGGGTTCTGTCGCTAACAGACTCGAATTCTTTCAGGATACCGCTGT\nTGTACACTAGGGCTACGGCTCCAGGACTCAGATAGCTAGGCCGTAATATGGAAGTTAAATCAACTGACCA\nAGAACTGGGGATACTCTCTTTTTGCAGTGTTAACAGTTCACGCACCTAAACGGCGCGACTTGAATAAACA\nCCTACGGCTATCATGAAAGAAAGTTATGTCCTAGTTTGAACGAAGTTTTGTGTTTCAAAAACGGAGGGTA\nACAGCCACGGAAAATAGTACTTCTTCTCGTTCAGGGTTCACGGGCCCGACCGATGGGCCTTTGAAGCCTC\nCTTCTTCAAGCTTTTTCAGCCGTCTCGATAGACCCACGGAGAGCGCTCACTGGTTGATAGGTGGTACGTA\nAAATGGGTACCACCAATATATTTTCCCAATGGTGGTACTTTCTAACCCCACTGACGGCGTACCGAACTGG\nTATGCTTTATTCAATGCCCGCTGTCACCATCAACTCCGCAACAATATTGCAGGGATACCACTTTCGACAC\nCCGTCTTGGATGTGTTCCCCATGTTGTGTCCCGGCCTGCGTTCACCTCTCTATTTAAGAAAGGAACGACT\nTTTCGCGCGGATACACATCCCGTGGTTGAGGGTGTAACTTGGGTTGGAGCATGTCCCACCACAGAGTCTG\nGATTCACGCGATCAGAACGTTCAGGCACTCTCACCAAAACTGGCCAGCATCCAGTCAGAGGAAGCATCCA\nGCTGACCGTGAGATGCCTTCCGAAAAGTGAGCATCTCGACATTTGTACGGGAATGATTATTTTGCAACCA\nCATCGGATGCACACGTGAACAATATATGGATTGGGAGTCACGAAATAGCTTACTCATTGCCATTTAGACT\nAGGTTTGACGGGCCAGGGCCTTCGCGTGGCGGGGATTCGAACGTGGCAATTCGTATGGCGTAGATGTCAA\nCAGAGCGGGTACCCGCGCTTCATTCCTGCGTTGGATCTCTCGAGTGGATGGTGCGGTTACAAGTACTTGT\nGAAACTTCCCGTAAGGCGGCCCTTTGGGTAGGGCGTCTCTGGAAGCGATTAGTGCGACAGATCACCGCAC\nCATGGGGTAAAAAGCTACTTCGTCATGAGCTATGCAGTCAGTAGGCCATGGCCTCGGCCAATAATAGCAC\nATGAACACGTCGTATTCTCCTCCATTGTCAGTAATTGCTTGTTTTGGCGTTCGCGGAGAATCCAATCTCC\nCGGTGCTTCAGCTCATTATCCTGACGAGGCGTCCCGTCCCGATCGGGAGAAGGAATTAGCGTATCGGTAA\nATGGATTGCAGGAGCTGAGCATACAACAGAGAACTGTATAAGCTATGGACTCGTCGGAACCGATAGGAAT\nTCCCAGTCTAGCCCCGGCTCGAACGGCACGGAATAAACGGCGATGGAACGATTGTAAAGATACTAAGCAC\nCTTCCCTACGTAAAAGGGAAACTAACCCTAGCCTAGCGTTTGAGGACTCCTAAGCGACATCCAACAGCTG\nCAGGATAAATGTACCTAAATCCCTTGCATCCCTCGTCATTTGCTCTTGTTAGAGACTTCGGTCGTCTTTC\nCTCGAACGCGGGGCCAGGCCGATATCCCACAGCTATGACAAGGT
GTTTCAAGCACAGAGTAGTCACGGAC\nCTCTCGGGCTCCGTGCGGTTGCTGTAAAGTAGGGCTATCCTCAGGGCACATACAAGGAAATGACGTTCAA\nTTCCCCCGACGAACTTAGCGGTCTTTGCCAAGGTAATCATTCTGTTAGAATGGTTTCCACCGGTGGCACT\nAGGACCCTGGTGTAGCCAGAATTGTTGATGCCCTCTTGGCAATCACTCTCGAGGAGGGTCGTTTCTGATG\nAATAACATCGTGCTCCGTTTTTTACGGACCGTTTTCCCGAGGCTGACGCAGTCTTAGCCGGATTTCTCAT\nGGCAGCGCTGGTGAAGCGACGTGTTACTTGTACGCACAGTTGCGACATTGGAGGCATGTTTAACAGGGCT\nACGTCAGAAGCCCTTCCCCGGCCACCGGGCGCAGCGGTTATGGATATAGCAACACGTGACCTTCATTTTC\nACCCGCCCACTTCGAGCCTCAACCTGGCCCCGCTCAAACAAGGACGTGAGGATCTGGGCGTGTTTGGAGC\nCATTGCGGCCCGGCTACCACCGCGGATATTCTTTGATAACCGGTATTACTGTGCAGAGTGTCGGTGATGA\nGTGCTTGACGTCCAGAAAGCCTACTCGATCTCCTTATCGCGCCAATTTCCGTCTTTAACGACTCCGGTAG\nGATCCCCCTATGAAGGAATATGCCATACGACTCTTCCCGCTGCGCTGAATGTACAGCAGGCCGACCGTCC\nTTGTGGCCATTGCACCGTAAGCTCCCAGACCAAGGTAAACGTGACTCCTGGCCAGATTGAATGGCGGCGT\nAAGATGTGACTAGTCCGTATGTAATTCGTAGTCCTCATGTTAACCGGCACGCAGACGAGCAATGTGGCCC\nCCAACCGTCACTAGCACATCGGGCACAGAGATCCTCTACGTCTGTCATTCAAGACCGACGGCCGTGTGTA\nCCCGCGCCTCCCTCGAGCGTGGGGCGCACCCGAAGCGGTGACATACATCTAAATTAAGAAAACACTCGAG\nCGATACTGCTCATCTAGGTGAGATTCTACCGATAGATAGCGCGCTACCAATTGCCGATATCTATCGTACC\nCCCCGGATTGCCGGAGCAGGCGTGGAACGTCCCCTGAGGGACATGGCAATAATGTAGCTACTGGAAGGTG\nCTACCATGGCCAGCTAAGTCGTTGGATCTACGCAGTGAGCAGTCGCCATTGGTCCAGTAACTCCGGAAGC\nTGACCATTACACGCTAGTCAACTTTTCTTCCGAATTACTGCGTTCATAGACCATATTCGCTGCCGTAGCT\nAGAGTTTGTGTGGAGATGTAATCCTGATATAGAGATCAGCAACAACGGATTCATCCCTATGGTAGCCCCA\nCCGAGAACAGCTTCTAGGAAACGCCCTATGGGCAGTGAATAACTCCGATGCTTCATCTTATCGATGGTGC\nTGTTCCCGCTAAAAGAAATTAGCAGATATAGTACCGAAGCGACGGCCCACAGGCCAGTGGTCAGAGGTCG\nGTTTACCATTAACTCAACTCCACGCCTAGCCGTGACCGGAGGCGACTCCGTTGATGCTCATGCCGACGTG\nTTGATCAATAAGTGAAGAGATCATCCTCCCAAGGCCTTCTGCAATGGAGTTAGAGGATATCGTCGCAGGT\nTGGTCACTGGATCGAATTCACAATTAGCAAGACACATGTCCCAGCCGCTCGCTCGAGCCTTCTATGCTTC\nTATGAGGGCCTCCGAGGGCCCTATGTCCTAACTGGGCTCTCCGCGCACGAGCCCTGCGTCTATGACTTGA\nCCAGTGAGTGGTTACTCACGTCGTCATCGGATCCACTTTCGGTGCCCGTAAATCACGGGCCAACCGAATG\nGTCTGACCCCCAAAACGATTGGCATATGCCGACGGACTGGTCCTTCAGCTACAGTGTAATTTTCATCCGC\nTATGGCCCCGCAAGATGTTATGTTACGG
CCCGTTCAACTCGTTAACGTAATGTCCAAGCATAGCTATACG\nACTTAAAATGCTGGCCGACAAGGCCGAGTATTAATATGCACCGTTCTTTCTGCTAATATGACTGTTCATG\nATTGCGGTTGTTAATAAACACTGGAGGGACTGGGCTAGTTCCGTGGAGGTCTCGTACTGACGTATAAATA\nTCTTGTGTCAGCCGCGATTTCGTTTCGTCGTGCGTGTGGGCCCAAATCAAGACATTTCGGCCATACCGTG\nGGACTTGAACTGTGAAAGAGGCAAGCCTGGGTGCCTGGTTCACTCAACAAGAATTCAGCAAAGTTGCGCA\nGTCTCCAGTGAAGCCTGCAAGTGCATTGAGTGGAAATGTATTATCACTTCGGTCGGACGCTATAACTTAA\nGCTAGGTGAAGGTGGCCAAGAAAAAGCGGTACACTGGCGAGGCATAACCGGTCGTCCTACAAACGCTCTT\nTTGTACTCTTCTACGAGGATGCATTAATGAGCTACATTGGCTATACGTTGCTATGCACCAACGTACCGTG\nTGGCTTGTATACATTTTCTTGAGTTTTTTGCGGTCAAGGCACACCCCCTCAACAAGGGTATAGATGAGAT\nGACAGCGTCTGACGCAATGGATTAATTGTGAGGTAGTCCTCGTATGCCTACCCCTTCCGATTTCCACCTT\nTTAGACGGGAAAAAGTCGATCTTTAGAAATTAGTCCCTCTCCCACATCTTTTGATTTGGAGACGCATTGG\nTAGACGGACGGGCGAAGGAAGCCCAATGGGGTGCGCGGTGAACACCCTCCAGAGAGTACCTCGTGCAGTT\nTTGTCGCCTCAATGGCCACTCCACAGGTGCGGAGTCGTGGCCGGTACCACCCCCTTATGTCCTCTTTCAC\nTCGCCAGTATGGGCCTCCAGCGATCGGTCTCCGTGTCTAGGCCGACAGAAACTTAGGGGACCAGCCCTTG\nCGCACGGTTAAAGGTGTGTGGGGCAACGACCGCAAATCGGTCAAAGAATGAAATTTTGACGTCGTATACA\nGGCTGGATAAACTCCTGTAGATGGACACCGATAATGCAGCCGCATTAGATGAAAGGAACACGGTTGGTAA\nAGTCGGACAATTTGTGGGGCGCGCCTCCATGGTCAAGGAAAGAACGGCGGTAGCAGCCCACCGTAGTACT\nGACGTATCACGACTTCATGATAGCGAGTCGATTTGTACGCTGACCGCAGTTTTATCGGTCAGATGGTGCC\nGTTTACATAAGCGCCCTCCCATATCTTGGCGCCTATCGTGGGTAGCAACGTTGACCGAGTGTGAATGGAG\nAACCAAGGGCTAAGAGGTTTATAGGAGACTAACTACTACTGTATTAATCGGTTAATTGTGATCACTTCAG\nTGGGACCTAGCCCCCTCGCGGTCATATAGATGCACGCAGCAAGTATACGCGTTACTCGCCATTCTCATGC\nCTTTATATACCTTCGTATTAACACGGAGGTAGTGCATCTTCGACGGCTACCCTGGGACGTTCCAACGCGG\nCTAGTGTAGGGTGTCCGGTACGTTCACCTCCCCTACTAGCGACGAGAGTAATGACAATTGAATACCGCCT\nGAGACGCCGTGACCCCGACGCTCTTTATAACAACAGAGTCGTACTTGATACCCTTGAGGCTTGAGTCTCG\nGGTACGACAAGGAAACACGGCGTCCGAGTCGGAATTTTCACATTTGCACGCTTAAGATGACCTATGGAGG\nGTGTATATTCGGACCTTGGGCCGTGGTCGCCGGCGGTTCTGTGAAGGGCCCGCGCTAAGGATCCACCCTC\nGACATGTCTGACCGAGGGCGAAGCCACACGCTCCGATGATATTGAACACTCGGTACTTTGAGAGGGTGCA\nTAAAGATTTATTTGGATACTCATTATCCCACATGTGATTCAGGTAAGGACGGCCTTACTCGGCGGGATCG\nTCTTCCTAATTT
AAAGGCAGGATAGTGACTCGTTGTTCCAATGTTGGATCGATTGGGCAGACCCACCCGT\nATCGGGGCCTAAAAGGCAGTTTTTGAGATCGAGCCCACCTCCTGCCTCCTATCGTCTTCTATCCACATTA\nATCAGCAGGATGCAACACAGTGGTGCCTACGGTCATAACTTTACCTCAGCGTCACCAGATACCTCTTTTA\nTTTAGACGTGTAAGCTCCGCTCCTCTAATTCGTAAGAATATGGGCAATAATTCAGTAATTTACGAACGGT\nTTTCCGATACGGGGAATCCCAGGTGACAACCGGTTACCCAAGGGCGGGAGGAAAGTCCTAACGGCTGAGA\nGAGCTCGCTTATCGGATCATGAAGATGCCGCTTGCAGGGATCGTGCGATACATGGGTAGGAACACAGACT\nGCGAAATGTTTGGCAGGTCGGAGGCGACGCCAACCGTAACCGTTATAGGGCCGAAGCGATATCGTCACAG\nCTGATCCCGCGATGCGGAAATGCACGGTCCCTACGACGACCCATAGACAAACGATCCACTGCATACACAT\nGGAGGGATCTAAATCTTGGGGCGTCTACAATTTTAAGTATGTCCTTATTAACCGAGGAGAAGTACCCCGA\nTATTGAGGTGCTTGTGGGGCGGCCTGATTCATTAACGAGCTCGTCCTCTAATGGAATCCACTTGGGGGAA\nTAAGGGCTGTAACAAAGGTACACCCTTGTGTGGGAGTGACTTCTTATCGGGGCTTTCACTATTGGGGCTG\nTTAATCGCAATGGTACTTCTTAATCATCCAGGACGTTCGTTGATAAGTAGCTGACCAAGCGAGCAGATGG\nACCGAACCACCCCGAACCCTTATCCTTTAAGGATGCGGATCATTTTGTATGAGGGATACTGCAGTTGAAA\nCAAGCGTGTTCAACGTTATCTAACTAATTAGCCCCGGCGGCAGTGGAGCTGCCTCGGTAGTTGTGCGACC\nTTGCACGCCCTTGCCAAAAAAAGTTACTATGTTGGAACAACTACCGATGGACTACATATCGGACCATAGC\nAGGCGATCATACCTTGCGCCTACTGAAGCCCTCCCAATTATAGATAGTACAGCGTCAGTTACTTGCCATA\nGGCCGCCCTTGTGGTCATTCTCCGCGAAGGATGACTGAACCGTATTTTTGCCTCAATTCCGAGCTGGGCA\nGTCGTTGGTTTGCACTTGGCCGCAGATCTAAGTGAAGACGAAATCTCTACTTTTACCTCTCGCCGGTGGT\nGGACGTCAGAGGCTGGACTACCACATGCTAGCATATCAGGCCATCATCACGACTACCGTCACAGACGCGA\nTCTCTAGCGGGGAAGACAAGGTCTGTTGCAATTGTTGGCCACGCTCATGAGCTGAGTGTCTTGCTGTCAC\nCGGGTACTGGCGAGTGATCATATCGGCGTTGTCAAGTGTGCATAGCCGATCGGGTAAATTCGAAGACGTC\nGGAGAACTAAACGATTGCCCGAGCACATATGTCTAGACGGCAGCATTTTATCATAGACGCACCCTCTACA\nCACCAGTAGGCTTCGGCCTTAATAGAAACAGGCGCCCCTGCGCACTCTCCTCTTTACTGGACGCAAATGA\nAGCTTTGCGAAACAAGTAGATCTCTACACTGCTCTATAGCGGCTTTTTTTCTTGCATTCACGCGTGCTGA\nCGTTCAAGTCCTTATGACACGCCGAGCCTCGCTTAGATCTCGTTAGCCCCGAGCCGACCCGCATTAGCCC\nGATCCTACCCCAAATAATGCAGAGTGAGGGCGGTTCGATACTCTGCCTAGGTGGTTGACGGCAGCACTCA\nACTGACTAGCAAGAACACACTGCAGAAGTCATATCCGTTCGGGTACTTAATCGATAAATGTCTTTTCGTT\nGAGGATCCGCACAAAACGTGGCTGCACGCTAAGTCGAGTTACGAGGCTCTTGATTACCTTTTCAAAGT
TC\nGGCGCTGTAACTTCTCCGTACCATGAATGGATCGTACACCTCGTTGCAAGACAGCACGGCTAAAGGAGCT\nGGGTTGCCCGCTCTCGCGCCTACATCGACTAGACGGGGCCATTAAATTCCCGGCTGAGCTCCGGCTCCTA\nAGATCGACGGTCGCGGAGAGGATCTATACTAATGGAGATGGATGTCTGACACCCGGATTCAGGGGATCAA\nCGGTATGCTGCTGGTCGGCATAGTCAAGTGAGGCGCTCCTGCTGAGTGAGGTAGAGTACGGCGTGGCTCA\nCGTATTGTAATCAGTGCCTAACGCCAACGGCCCCAACTCTCCTAATGAATTCATCCGCGTTCCCCCCAAC\nGGCGGCAGCATCGCAAGAATCACGCACAGGACCGCCTATACCAACTCGGGGCACAAATTAAACATAAATT\nGCAACTCCGAGCACTCCAGTCATGGCAGGGAGTTAGGTTTTAGGAAATCTGGCCCATGATACGAGGGACA\nTACTTTTCCAAGAGACGCGTTCTAGAATCTCTTCGGTCGCAAGTCGGCTCAATTCACGGTTTGCGCCAGG\nGAGAGCTTGGGAGCCCTACTTATAGGGGAAGCTCAGCACTCCCAATTAGTAATGACACTGGCAATGACTG\nCCATGGTTAACATTGCACTCACTGTTGACGTCAGAGCACGGATAAGAGTGGGACCTACTTTGCTCAGGTT\nTAGTTCATCGCCGCGACCGCGGGGCCGATTCGTCGAGTGGTGGGGTTTGATGTATACGGCGCAATAAGCT\nTTCGGCACATTACTCTCTGACGTAGCTTCAAGCCTCGAACGGTCCTCCATTACGTCTCAACCTACATTAC\nACAGGGACGGATCAATCTCCGACCTGCCGAACGCGGTTCGTCCATAAGTCTTCGGTGTACACTCAACACT\nCGGGATGCAATGCCTTGGTGCATAATGTCTCGCACGAGAAGTCGTGGACCCTCATGTGAGGACCGCGACA\nACCCTGGTTTAGTTGATCCCCACTCGTGCGAAGTGACAGATAACAGTACTGTAAACGAGCTAAGACCTGG\nTTGGGCTGCAAAACAGAACTCCGCCAGGGCTCACGAGTCCACATCTATTGTACGGTAACTTGGGGAGCGA\nTTGGGGCTCGGGTGTCTTATGTGACGTGCGGAACAAATCCACGAGGCGGAGTGCTCCCGTCAGTATCTGC\nGTCCAATATCTAAATCATTTACGCAATAGCGGCCTCTGGCAGATTAACTCCGACCGAGTGGACAGCCACG\nGAGCGTCAGTGTTGGCGGGCATGTCAATATCTTTAGAACGATACATATGTCCGTCTAGCACTTCTGGAGA\nCGTGTTACGTTTGTCCGTTCTTGCGGGGTACGCTCCTATGATAGAATTAGTTCTGGTAACACGCCCACGA\nTCTATGTGTGTTGGCTCGACGGCACGAGGTAAGGATCCCTAAAGTCAAGATAAATGTCAGGGCAAGGTCT\nCATGGGCGTCGTACAGTGAGTCTTAAGGTGCATAGAACCGATGCCGATGAGCATAGTACAAATTGCTGCC\nATGGGGAGTATCCGTCGCGAAACCGCAAACACCGGCACAATATATGCCGTCACATGACATTACTACCAAT\nGCCGCAAGAACATGAAGCTGCTCCGTGGCCAGAATCACCCAACCACCATTAAGGTCGTCACAATAATCAC\nAATAGATGTCTTGAAAAGGGCAACGCGCTACACGGAAGCCTCGGTGACTAGTGCACATGGCACCCCCCTC\nGACAGCATACCCGGCATCTCGTGCGGGAGAAGATTTGCCCACGACGAAAGGGAGGCCGCGAGCCCTTGGA\nGCCAAAAAGACTATGCAACACAGATATAGCGACACACAAGTCGGCAATCGGCCGACCTCGAATGCGAGAT\nGCTTCTTGTCGTCCCTGCCAAGCTGTATCGGCAAGCTCTAACTGAACGCGGT
GCTCCTACAAAACTGGGT\nTTTAGGGCAATGCCGCCAACGACGTAGTTTGAGGCCGTCAGAGACGAACCGGCCATATGGGTTGATATGT\nTAGAACAGTGATTGGTATTTCTCGTACTTTAGATAGCTGGCGCAGTAAGTATTACGTGCACGCGACAATA\nGACCAATAATCGAAGTACAATATATGTATGGTGTCGTACTCCCTTTGGAAAATTGTTGCAGACTCAGGAA\nTATCCCTTCAATCACGTGAACATCCGGGTGTGAATTGGCGTTCCCCCGATGTTGTCTATCACGAGTCGCG\nGCTGTGTAACTATAAAGTCGAGACGGAAGTTTTCCACGTCGGGCTATTCCGCATCAGAAGCGTTAAGGTG\nACTTAGACATGTTAGTGGCGCCAAAACAATTTTAACTTCAGCGTGTCGTGGGTACCTTAATCAACGCGTA\nATTCCCGACATCGAAGTGGAGAATTGTTATGTGGGCTACACAACCACTACAAGCCAGTAAAGACAAACAC\nGCGGGTAAGGGAGTCTCACCGGGACCTCTCAACTGTTGGGGCCGACTCGATCTTTACTTTGCATGAAGGA\nAAATGGTCTTGCGTACTTCCCCTACTCTCTTCTTTCCCCCAGGTTCGTAATATTCTCTTAGGATCGAGTG\nTGACAAATTACATCGGGCAACTCATAATGCTGAGGTTCGAACGGCTTGATTACTGGAGCACAGTCGCATA\nACGCATCACAGGTAAAAACAGCACAGCACATTAGCAATCTCATTTGCAACTGTCGTCCGGTCACAGATTA\nAGTAAGGAGTAGAAACTGAAGACGTAGAGGTGCAAGTGGCCCACTCGTCTCCAAGTAACGCGGCTCTAAT\nGTCTAACTGCCACCGGTACCGCTGAATGAAGGACCTAGGAGAGGAATCCCTGGAACCGGCGAATGGCCAT\nGATCTAAAATGGTATTCAGCTAGGACACTCTAGTCTGCTAAGTTATGTCGTCTCCTTCATTGGTATCGCC\nGTGTATCAAGTGCAGACGTCATGTACTAGAGTCTCTCACGTCGTTGTGAATACTATTTCCTAGGCTGAAA\nAGACTGCTATGGTGTTCCTGTGGACGGGGGTCGACCCCACTCGGCAATCACGGCCCACGGGGCCCGATGC\nGACCATTACATCCCCGGGGGAACGTACATTAGGACGACAGTGAAGCGATTTCGCTATCTGCTCTTTACGT\nAACTTCAGAGCGTGCTCTATAACTTAACCTCTGGTAATAATGCGCTAGTCTTGCGTCGCAATGATGTATG\nGACCTAGAACTTTTATCTATACAACTTAACTAGACAAACCGAAATGGCAACGTTCCACTATGCAGCCCAG\nCGGCGCGAAATAACATGTACGGAGTAGCTGACTGCGGAAGCATCCAGGGAGTCATACACCATGGCGCTTG\nCTCAATATACGAGGGGACACTGGGAGTATGAGCCTTTTTTACAGTAGGAACAGCGAGATCTGTCACATCT\nTAGCACTCGCGCCACGATACAATTGTTGTCGTATATCCATAGGCTAGCAAGTAATAGGCGTCGGCAGCAT\nTTCTGTTTTAGCGTAGGCAAAGGATGGCGCGGATCCTCGGATCCTGGTAAATACCCGTCCTCTGCTCAAT\nTCTGCGGGAATTGTGTACCCGTCTGGGCCGTGCATCTGGGCGGGTAGAAGAGACAATGTATGCGGCGCCC\nTCCGTGAGCGGGTGGCATCGTATTTAGAGGAGCTGCCTCCACTTACATGTGTTATTCTGTAGATTGGGCG\nGAAGCTCAGGGGGGACAAAATCATCCCCCCTAAGCCTTTATCACGAGTCGCCCGCTTTAACGTTGATAAT\nGACAGCTCCACGGAGTCACGACGTCAGCAGTTATACTTAACGTACCATTAAACAGCATATGTAGATGCCT\nTTTTGAAGCGAGACCATGTCAAAAATTTTCGAATTT
TCCCTTGACCTAACTGGCAATTCCTGACGGTTAG\nCGGGTGTCCGTCACGTGTTCGGTAAAGAGTCGTGACGAGTGCCTTCCGCACCCAAGGATCAAGAAGGTTT\nAATGGAAACACGAACAAACGAGTGACGTTCGTTCCACGTTAGCGGCAAATAAATCAGCCGCTGGCTGAAC\nTTGGCGTCAACGTCCCTCAACTCGGCATGGGTCCCCCTTCCATCGTCTCGCCTGCCAGCTCCTCTGAGAC\nAGACCAACAAATGCTACTGTTATTCTAGCAGCGGCAGTGGGCCTGCCCGCGCCTCTAACTGAAGCCCGCA\nCAGTAAATATCCAGGCCCCCCGCCCACCATCCTATAGGTGCGAGAGGTCTAAGGGGCGAAAGGAGCTGAT\nAGCAGATCCGACTGCGTAGATGTTAACCGTCCTAAGCTCTGAAAACGGGAGCTTTGAATCCTGAAAGCTG\nAAGTATCCTAGATTTAAGCGGAGTTCCAACTACACAGCAAGCATCGTCTTATACTTGAGGCTCAACGATA\nCGAGTGGTTAGAACTGCATCCTGACTTACAGGCTGGTCCGTATAAATGATGTAAGTGGTAGCAATTCAAT\nGTTCACAAAGGCGATGAGGACTAGTCCTCGGTTTGTTTTAGACATCGGAGGTACGAAAGCGTTCCAACCG\nTTAGAGATTATGCGGTTGTTCAGTACATTCCTATCGGGTTTACCTGTTGTTTGTAGAATGACTTATCGGT\nTCGCGGCCAGTCTAGGAATGCCTCTCGATGGACACTTATTGGTGGGATGGACCCTCAGGAATATCGATAC\nATTCCTTGTCAAGCACGAATCAGTGCTGCGGGTGTGCGAATGACATAGGCCCTTGGAATTGGCAGTCCCA\nGTTTGCTCTTCGCACTTCCGCGGCAGCCCCTGGCACTCGATTAGAGAAATTAGACTAGCACTTATTCCAC\nGCACCGGACCTCAACAGTCGGCAATTAATCTATTAGACGCGAGCTACCCCCTATGCGTCTAGCCGCCCGG\nCATTACATTATACTTTCCTGCCTTGGAGGATTATGGTCTCTCACAAATTTTTATTAGCTGTAATACGTCA\nGCCTGTAGTGATGTAGTGCACCGAACCGGTCGCAGTAAGGCGTAAAGCTGCAGCTCCGAGAGGGGACCCT\nAGCGTATACGATATCGTCATAGTGAACCATCTTTCGCCGTACATTACTCGACGTCCCGTTATACTAACGG\nCTTTCAAGTTAAAAGTGGGATGAGGACCAAGCGTGGTGACGGTGTAGTCACTAGACTAACGCGTCCCGAG\nTCGAGCGGATGCTAGCGCCAGGGTTGGATTAGTACCCACCTCCCATTGCGCACTAGAAGCTCAGTAGTAT\nCAGAGCATGGCAGTACCCTAAACCTGAGGCAAATCTTCTAAGCCGTCCGGATGGTTAATTGCGATCTCTC\nTTGGACTATGACCATGCCGCACTACTATCAAAAACAGTTGTAACTTGCGACGAATGCCTCACGACCCCGT\nTGGTTTAATTGTTGCGGTCTTAAGCGACACATCTTTCTAATATGCAGGGCACCAACGGAGGGGCACCGTC\nGTATATACGTCTGACCCTTGTAATTACTACGTTTCGATCGTATCCATTAATAGGTGCTACATGAGCTCGG\nATAGTGATAGTTGCTAATGAGAAATTGAGCGTCGTCGGAGGAATGACGAATGCACCCCACTCCCCTCTAT\nAAAAGCGACCGAACTGAAACGACGCACCTTAACCTCTGTGGGTCTGCTTTCTGGTAAGAGAACCAACTTA\nGCCTCCATCGCCTAGGAAGCCATTTCGAATTTCTTCACATCGGGATATTATCGGGATGAGGGACACTACT\nAAGTTTCTACCACGGCGGCCCATAAATAGATTCTGTTACTCTGCGTCAGTGAATAGACCATCCCTGAAGT\nCAGGGAACATAGGCTGTAAA
CCTCTCGCATATGTGCGACACATGTAGCCTTATGTGTTACAGTTGCAAGA\nACAAATATATGTTGTGGCTTAGTTTACGACGACAGCAGATCTCTAATAGCGTTCCTTAGATAGGTTAGTG\nGTCGACTACCATGACCATCTACAATAATGATATAGGAGGTTGCGGCGGTGAGTTTTCGTCCTTTTATAGA\nCTCGAACGACGACTCCGTTTATTTCTATATGATGGCTGCGGGCCGTGCCGTAGCAATTAGATGCATGAGT\nGCGGGTAGAAGTATCCAGGCCGGCTGACTATCGCTTTATGGAGTCAACGTCGCCGAGAGATTCCCTCTAA\nCAGATTACCGCACCTTAGTAGCGGTCTACTTCGACGGCGCACTTCAGGCCTGCGAATACGTTAGGCGATG\nCGAGACGTCGGCAGCTGTCTATCCAAGGGCATTCGTTATCGCGGCGGCAAGTTCCATCCGTACGGAGGTA\nATGTTACCTCACTTTCGCAAAGTATCCCCCGTTCTGTCTGTACGACTCAGAGGCTATTATGCACTAAAGT\nGACCAGATGTTTATTTAATAACTCTCACGCCACGCAACTCACGGCGGAAGGTCTATGGAAGAAAAGCCCT\nTCGCGGCCGTTTCATAGCCCTATTTCGTCGCACGGATCGTCTGGCCGCATCTCGTTCTCAGTTATTCGCA\nTCCAGCCGATCCAATCGAGTCGTGACATGTACACACTACTTCTCCTTTTAGCCACTTGCGGTACTTCATG\nTCAAGAACCCATACCTCAAGGCACGGAGCGGTCGCCACATGAACCCCTGGTAACGCATCTTCTGTCTATC\nAACCAACCCTGAGTACCTCGCTCCCTAAGGCCCCTATCAGGAGTAAATCTGTGCTGCGGCCCTGAGACCG\nTTAGTATTGCTTGCTCCGTTGCCTGTCTTCACTGGCGGATCCCAGATCCGTATGACCTCCACGGAACGTA\nCCTGCCCGACGCGCTATAATTATCGACCGGCTAAGACCCTCCAACGGTAATATTTCAGGATACTAAAAAA\nCTTCAAGGTATGTGCGTCACACTGGGTTGCCCCCGGATCGGGCTAGCGATATTCCGAAAGACGAATGCAG\nATGTTCGTGACAGGTGTCCTTGCGTAGCATACTGCTATGCACAGCCTAGAGGTAAAAAGGTCGTGGTTCG\nCGTCTGCTCATCGTTTGGCCTGAATTCGCTGGCCATTTTGATCCCCGAAGTGACTGATAACCTACCCAAC\nGTGGCTCCGGCAATAGAACAAGCTATTATAGCGGCGGGTTTTCCATACAAAGAGTTATTAGGCAGTCGCG\nCCTTCCGTGGTTAGCTACCACGCGGGGAATGGGATATAGATTACATCGAGGGTAGGGATTTTCCACACTT\nTATAGGCTCGATCGGGGCGTTAACTCGTCATCCCCGCCGAAGTAAGGCAATCTGGTAGACAATGTCGACG\nATCCGTAGGAGCAATACAAACAATGACAGCAACTAAGGACCACAGTGCGGTGGACCACCGTAATTCACAT\nGATGTTAAGTTGAATGGCGTCAGTCCACAGCGAAGGCTACGTCCGCGATCTCTCAATCCATAATTATAAT\nTCAATTCTAGAGCTCCTCAGTCACTACGACTATTCATTAAGCGTCTGAGGACGGTTATAAAGACACTTGG\nGTGACCCCTGACGAATCTTTAACACTTTCGACTAGTATATCAAGTGTTAGCCACGAGTATGATACAGCCA\nTTGAATAACGCAGGGATCGGCCACCGATCCAGGTCCGGTAGACAGGACCAACTCCTATCTCGATCGATGA\nACACTGTCGCCGCTCTGCCCATTATAGCAACAAAAAGTAAGAGAAAATATCTAAGATCGAGGTCTTTGCA\nTTACTAATTTAAGGCTTCAACACCTCGGGTGTAGGTAAGAGTTAAGGGGGATAATGGCACGAGTCTTTGC\nCGAG
GAGAATCGCGAGTTAGTAATTCAGCGCGGTGACACTGACAACCAAGGTATGCCCCATACCGTTACT\nCGTGATTGCTTTCTGAGGTTACCTAGGTCGTGAATCGGTCGACATTACGGGTTCCGTCGATTCCATTGAC\nCCCACGTCGAGTACAGGAATCCTCTTAGGTCTTTTGTAGGCAGACCTCTAGGGTTTCGCATTCTTCCTGC\nGGGGGGTTGTTCTACGGGTATCTCGCGACAGTAGTACACACCTTTGCCCGTCCGTCCTAGTTACCGGTGG\nCACATGCCAATTATCCGCAAAACCGCAATATCTCAGGGCAGTTGAATTTGTTGTAGTAGCTGTGCATCGA\nTCTGAGTTCTCCGAGGGAGGGTATTGGCCGCCGTATAGGATCAGGTCCAGGTGTTCAACGACCACACATA\nTGCGCATTTGCCGTCTCGTTGCAAGGGCCGTATCGTCAAAGAAGGCGGAGTCTTTGGTGCACACCTTTTT\nTACGATACAGGGCGGTCAATCAATGGGACCCTGGACAGCCGTAAAGAGGCACAAGGAGAGTTCGTAGTCC\nGACCAACAGCACACCGCCGCGTATTCTTTGGACACTAGAAGATAAACACGAATCCGGTTAACTAGGCCCA\nCAGTGGCAACTTCCCCGGTGCTGCCTCGCTGGTTTACCATTCAACAGGTACTCAATGTATGATGTTCGCG\nGTAGCATAAACCGGCCAGCAACGACATGAATATCAGGTTCGTGCACTCCTAGTCCCGGAAGCTGGTAAGA\nGTTGTACCCATTGACTGGTGGCCGCGCACATCGCCCTCTAATGGCACTATCGACAATGGATCTACTCTCG\nTGGAAACTCGTATTAAATAGCTTTCAACCTTATAATTATTACGTCAGGGATGTTTTTGGGGACACATCAG\nATCCTGTCCGAAATGACTAGGGCTCCCCATATCCAGGTGCATTCAGTGCTACAATGTCTGATACCAATCC\nGCTTAGCGCCAGAATCGCTCCGTTCTGTTGGGGAAATAACGTGCTATATATAATATGGCTTCACAGTTTC\nCTTGAGCAGCTCGTCTGTCACTCCACGACAGTGAACGCTTAACCGTATGTGGATAGAAAACTCTACAAAC\nATCTTGTATCCGTTACTGACTGCGGTGCTATGTCGTGACGCGCTCAAGGGTCTTATGGGTTGAGAGTCAG\nCAGTGCGATGTTCTTGCTCGAGTATCGGAGTCGGCTGCACGGGTCGTTCGCATCATTTGCATTTTACCTA\nGAGTGTGACTCACCCGTTAGTTCCTTGAGAACCCCAAACACCAATGTGTCGGTCTGGTTGCCCAAGTAGC\nTAGTCGTATTACGTAGCTGGGTCTAGTCTGTAATTCTGTGACTCTTCGGGATGTGCGTCTCTTTCGTGGT\nGGAATTATACATTGCGTACGCAGGTAGCAGACTCTCCCATGCGTACATACGGTCGTCAGTCATGTTATAT\nTGGAACTCATATCCCTAAGTAGGGCCCGAGCCGTGAAGGGCTTAGACCGCTCCAGACCAAACGCGCCTGT\nCTGTTAAGCTATACTACATGTGCACGGAAACAAATATCGAGCATGACCGGAACCGGGCTAGTGCATCGAA\nCCTTGAAGAGAGTCTTGCCTGTGATATACCTTTCAGCTTCGTTGTCAAAAATGCCCACTGGCCCGGATCG\nGCGACCAAGAAGTTTGCGCACCGGATGCTATCGTGAACTCCTATCAATCCGGCGCAGATATCTATATGCC\nTAGCTGACACCTTAGACAGTACATAGTAGATCTTGCTACAGTTGGTTCACGGTTCAAACTCGACTTATCG\nAGAGGATATCGGACGTTACGGTACGAGACGCATTTTAAATTGTGAAACCTAGCTTTCAGACCCTCTAAGT\nGGCAACTTAGGACCGCTCACTGGCCCAGCCCCTTTGCGATTGATTGACACGTCGACTGGC
CACATAAGTT\nTACTTGCATAGTGCCAGGAAATTCTGTGTAGTAAACAGCATGACGTATTTAACAAAGAATAGGTTCTCGC\nGGGGACGACGTAGACAGTAGCCGCATTTCATCGTTCTGGGACTACCCGTCGTGAGTCGTTGTATCCGACA\nACTAGGATTCTGCGGGGCTTGAGGTTTCTACATTCCTGCGGTCAAAGTGTTCCGAGCGGCGCAGCTAGGA\nCGCAGAGTACCGCCAAGAACAACATAGCGATTAAGTCCACCCAAGGCTCCGGTAAGCAGCCCAACCTATC\nCCCGCGCGGATTTATTGGTTGCTACTTTGCTCTGAACGACGGAGAAAGAAATTACTAAATGAATCGCTAT\nAACTCAGGGCAGCTGCCCGCTCCAGACGCGGTGGACAGCGCGCCTGCGCTTCTATACTCTACGAAATTCA\nAGTGCGTGTAGGGGAATATGCTTCATTTGCTAAAAGAAAGCCTCCTACGTTAGGAAGAAAAACCTACTTC\nCGTACTCGGACGGTTAGCGCGGGTACCTACTTTCTGGAGCCGAACTATGGCGACGTAGTTAGATCTTTTA\nAGGACTTACGTGTTTGTTCATACTTACCCTGTACTCCTTGGATTAATATTTGCAGCGCACTCCCATGACT\nGCCCATCTCATCGTTAAGCCAGATTATCCATCATCTCTGTTTCTGATCACGGGCGTCGAAGAGGCCTAGT\nTGCAAGATGAAGATTTGTGCGAATTGCGCCTCATCGGGAACGACTATTTTGGGCCCATTTAATCTTGCAC\nGTAAGCTATTTACTTGCCTAAATAGCAGACGCCTTCCCTGATGCCATGAGGACGCCACGAGCGCGGGTAT\nTGTCAGTAGTTCAAGTGTTGACACAACGAGGGGGTACAGCCAGGCCTTATACATAATGTACTGTTGCCAC\nTCTTTTGTGGGGCCCACTGATTTGACGAATGCCCAGATTGTAGGGACCGTGTATGAGGATTCCATCACAC\nAAATTCGGCCGCGGGTGTTAGGGCATTCATCCGTATGGCAAAACACCCGCAAGCTGCGGAAGTCCAGGCT\nCGCTTTAGGTCTCCAAAAATCTGTGAGGGTGCTGAGGGTAGTGCCGCTAGAGGAGAGGCAGCATAGGAAG\nAAGGCGGGGTCCAAACGAGCTGTTTGCGCACACACACTCTATATAGAGAGGTGCAGATAGGCGCATCGGG\nAGCCTATCCCCCCGACCTATACACTCAAATTCCTGTCAGGGAGCCGGGTGGGCTCGCAAGCCGAACGGTC\nCATGGTCCGTGTGTGAGGGAACTACGTTTGTCCTCTCCCGAATTAGGCGCCTGCGGGGGCCGTCCGAACA\nCCTCGGCAGAAACTGCTAGTTAGCCCTAAAAATCAGTTGATGGTTTACTGAGAGTCCCTGGGCCGTAACG\nCTAAATACCGAGCTTCTATCCAGAGAGGGGAACTGGTTGGTTGGGCTGAATGCAGACGGTACAAAGCTAG\nTACGGGCGGATGGTGAGTGTACGGCTAATCTCATTCAGATCTCGTATACCTAGGCCCACTTCGAGTCGTG\nCACCGCTACGAGCAACTATCGGTTTTCTTATGTTTTAAAACGCGTGCCACGGGCATAGTAGGGATAGCAC\nACATATTCCCTGTCTCTAGGGCGAGGCTAACTGTGTCTCGATGTAACGTCCCGCTATGATCGGATGTCAC\nTTCCGTTAACATGTAGGCGGTATTGGAGGGCTGTGTGGAAGTGACGGTGCGTGCCTACATATGGTCCCGT\nTGTTTCCTGTAACATAGAAATTGTTCCCACCATTCTTCGCAGCACTGGTGATTCGGACATACATTGAGTG\nGCAATGGTTAATTGTAACTGGACTAGTGTCGACCATGCACGTGAACGAACGCTGGAAGACAAATCACGGA\nTCGGTGAGATCCAAGCCAGACGCATCAGACTTGAAGCATACAGC
ACTGATACGGATCTGTCAGTTCGTTA\nGGATCGTATGTCAGCCCCGGTGTATCGCCCAACAAAATGTGTGCGCGTTCTTCTAAATAAACCACACTTC\nCTGGCAGGCCTCGCCAGACCTTTTATACATGGTGGATTTAACCCGTTACCCCCTCGGGGTGTCATCGGCG\nTTGTTGCACAATGTGCATAGACAGTGGTGTCTGAAACTGTGGCGGTGGAGTGACCGCTCATGTCAAATCT\nTGTCAGCTTTTGGATATTAACTTGTAGGACTTGTATAGCTAACACGCAAAGGCAGGCCAACACGTGGCAC\nGGTCAGGAGCGCGTTTGCCAAAGCAGGGGAATAGCTGGCTGAGCACTGGGGCCAATTCCTCGAGTTGGGA\nATCCGTGGGGGTGCTTAATCGCCGCCATTTGGTTTCCGCCGCTAGATCTGAGTACCCTCGAAACGATGAA\nACCTTTCCTGCCCGTAACAGGCGCGGCGCCCCTAGTCAAAAGCGACACAGGAGCCTTTATGGTTGTCGAT\nGTCTCGTACGCCCATTGATGTCAGCTGCGCACAAAAATACCATGTCGTCGGTAGATGTTGGGCCACTCTA\nCACAGGTGTTACACGTGTCATATATTGTCAACCTATTGTTGGGAAGATGCACCATCGATGGTGACCCCTA\nATTTGATAAAGATAACCTTATGCGTAACACGTCCTCTTTGCAGTAATAATAAGTCCGGAAGTTGTATTGA\nGGAGTTAAGTGGAGTGTGTAAACCGTGATGGTTGTGCGCCGAGGTCTTGTCTGCCGATAGGTGTTGTATA\nTCAAACGTCAGATCGAATCTGTTCTAGGCAGTTGCTTCATTATGTCGCTCTATCAAATCACGACGGGCAT\nTGCGTGTCGGGGGTTAGTACAATCGAATGATCCGAGGCGCGATTATGGTAACCCGAGACAGTATTTGCCC\nGCGAATTGCTAGAGTTGTGGCATTAAATCAACGTCATTCATGTCGGTTAGCTTTCAACCTCGTTGACCTG\nGTTTTAACGGCTCTCAAAAGATCATCGCACGAGTATATTCTCAGTTCGACGCCTGTCTAGTTACGCGCCA\nATCGCACATTTGGTGGGGGGGTCGATCCTGCATAAGGGGTTTACGGAGTCGGCGAGTTCATCACGTTACG\nTACCTATGCCAGTAGTCCCCCACTAATACCTATGGGAACATGCTTTTCAAAATTGGGATCGATGGACGCT\nTCCTAAAACCGGGCTGCGACTAGATCCGGGGCCGGCGTTACCCCCAAACAAATTCTAGATAACTACCTCG\nCGACTGTTCGTCACACTTGCCGGTTCACCATTTTTTGAAACTCCAACAACGAGCAGCATGCTTATTAGAG\nGCTTTCATACTACCCCAATCGAGTGGACTTGTAGAGATTAGTAACAGCCGTTGCAAGCCTCTTATATCTG\nGGTTGGATGTGCAAGCGACCCAGAATATTGGCCCACTAGGAACGCGGTAGAACGTCGCGCCGGAGCAATT\nAGGTTGACGTCTGAGAGGTTAATGTCTCCATGTGAGAGCGGCACTCGACGTGGTCACATGCTTCTATCGC\nTACGGGGTACCTCAGCCGAGCCTCCTAACGCGACGACGTACCTTTCCCCGAGTCGAAAAAGTATTGCAGT\nCGAGAGGCATGGGTATAAGTACGCTGTCCGCGCCTTATGGCCTCACAGACTGCGAAGGTTTTGTGAAATG\nTCCATTTAACCTAGGCATCGATAGGAATAGGCAAGGTGTAGACGTCTCTAATGGGTGGTTCGGAGAATAG\nCGACTCCCACAGTGGATCTATGATAGTACGGTTCTTAACTATTACCCCGGTCGTCCGTCGTACGAGATTA\nATCGCACACGGGCCACGGCGATTCGGCTGGGAACATCTTTACCTTACGCTTGGGCTTACATAGCTGCCCT\nGCCATCTGGAAGCATCCTTCTGTATCCC
CGGCACCCCACTAACCGTTGCCGTTCAAATCCTACACAGTGG\nTATCCAATACGCTTCAACCCTCACGCACCCTTGTGTCGACCGCGGACAAGGCTTCGAGAGCTCACTGTTG\nAATTCGGGTATCCCGGGGGCTAGACCTACTATTGACAAATGCCGCTATGTAAGTGGACCGCCGTGGTAAG\nCCAGCCCCTTTTCTCGATACTGGGAGATGTCCCCATCGAAGTCAATGTATTTAGGGCGTGGGGCCTATTG\nTCACTTCACGGGTGCCTGACTAGGAGACCGTATAGGGAACATACCGTCATAGTCCATGACGCAGGCATCG\nGCGCGGTGCAATTACATAGACCCCCACTGGCGCACAAACCGGTACTCTCCATACGGTAGCAGCGTCTTGT\nCTCTCTTGGTCACGCCACACCGCTGACGATCGTGTTTTCCCTCATCGTGGGGACAGGCGCCCTTCGCTCT\nGGGCGCCGCATTGTTGTCGATTCGTTATCGCAACGATTTCGAATATAAATCAGTGAATGTGACGCTCGAA\nGGACGTCGCGGAGCAGCTATAAGCTGCTCCCTGGGGTAACTGGAATAGAGGTCGCTGAGGTACTTCACAC\nCTAGGGGGTTAATGTCACCGTGAGGGGGCGGACCATCACAGGCTCGTGTTTCGTTTTGACAACAATCAGT\nCCACAGTGCCCCGTATTAACGAATACTTCAGTTAGCTCCACCACCTATGAATAACCCTTTCTGTGATCTC\nTACGGGCACCGACTAGAAGCGTTTGCCGGAACAAGAGTAAAGGCCCTATGAGACAACGTGCTCACGCGAG\nGTTGCGCTTCTCAGCTCTGTCCTGCGTGGTCTGCACGGTTAGGAGCAATGGGAACCTGCAGGGCGCTACG\nAGAATCAATGTTTCTCTTGCGGCTGAGGTGAAAATCCTCAGAGTTACGCCAGTTTAGCTTTTGGTGTTGA\nAACCGCTCTAGGTGACCCGGTTCATCGGGTTTTTATAGACACCATTTTTTGGTCAGCCATTAAAGCCACC\nGAGCCCGGTAAACAACGTCCTCTTTCGGCTCTCAGCACAGCACAGCTACGAACAGGAGTTAACAACAGCA\nCGGTTGTAGGGAACGTATATTACATCGATCCCCAGAGCACGACGAGAATCCGAGTGAGACTAACTCAGGC\nTCCAGCTTATCCTGTCTTCTTCGTTAGTATGAATCCCGAGGAGGATCACCAAAGCTGTGGGGCCTGCACC\nTCAACATTTGGTTTTGACGGTTGACACGAACTATCAGTGGTTAGTGTCCGCTTTTCTCAGCTCTATTGCT\nCGGAACTGTGTCATGGCACGGCGAGGACATGCCTGACTCCTTTAACTGTTTCCCCATAGTAACAGGGCCG\nAATCTCAGTTCTCAACGATGTGGCGTGCCTATCCGACAGAATCAGGCTGAGAATCTAACCGGGGGTCTCG\nTAGTCTAGTCCAACTTGTGGTGCATACGTTCGTGCGCTTGGTCCAGTGGCACGTTAATACAGCAGCAGTC\nAAGTCCACATGCTAATCTGTTATAGGGTTTTCTGTTACTAGTATTCTATCTAGTATTGTGGAGGGAATAC\nCGTTCCTTATCGATTAGCACGAGGGAACCGTAGAAGGTGAGTGTATCGGCTAATACGTTTTTGCTAATTC\nACCCGACTGAGATATTGCTAAAGCTATACATAGATCAGCCAGCAGAGGTAGCCGACGCACCCACGATTAA\nGATCCACACAGATCAAGGAATTCGCATGCTGACCAGAATTGGTGACGCAGACCATGTCGGCCGTTAAGAA\nACCGTCAGATCTCTGTGGGTTCAACTAGGTGATATAATTCGCAGTTGGCCCTAAGATATCATGACCCCGC\nCGAGGCAAGGGTGGACCGCGGAGACGGCCCATCTTCCTAGACACGTCCCAGGATGAGCGCATGGAGCTGC\nATGTCGGGGAGG
CCGCCGACGAGCTCAGATGTCTCAAGTATTCGATGCTATATGGCTCCCTCGTTAGATT\nTGCCCATCTTTGACGAAGTTATGGCGGGCTTTACCGTATGACAGAGGCAGCTGAGTACGAAGGTCCGTGC\nCTGAATTCCAACCGGTGAATTCATAATTGAGGTCTAACGGGATCAGAGCACCCCGAGGTTAAGTCGTTGA\nACATAAAGGGTCATTAATCCTAGAAGAGAAATCCCGATTCCTCTACTCCGTATGTGGTATAAGATTAAGT\nTATCCACGGTGCGAGCACTGTCTCTCATTTCGAATGTCTGGCTTCGCGTCGCGTCATGCAACGCGAAAGC\nTCCCCTTAGGTCAGCTTATTTTGATCACGCTGCACACATGGCAGGAATCAATTCACCCCATGTGTTCCTT\nTGCCCTTAGGCTGCCCCAAGGGTTTCACGTGGGGTTTGCAAATAACCTATATGGCGGAATCTGCATGAGA\nTAAAAGGTGCGTCCGCCTTGTAGGCAGTGAACATGGTAGAAATCAGAATATGAAGCGTTGTAATCGGCAG\nTAGCCAAGGTATGAAGTGGAAAATTATAAAAACCCGCTTTGGATAAACCTATCGGGCGCGGCCATTACCC\nTGTTGGGGTTATGATCTTCTTGCACAGTTAATCTTGAACAGCGTCGTCAGGGCACAGGGAATGTCGGTTT\nTGAGAAGTGCGTGGGTCACCCGCAATGCCCGGTGTATGACGATTCATCACTTCCCAGAGGATCATAACAT\nAATATCAACCAGCCCGAATCAACTACCTAGAGGACCCCAAAGCAGGTTGAGGTCGGCGAGGATTAAGCCT\nTCAGGTAGCGATGTCGTTTACACAGTGTATACCTTCAGGGATACATAGACTATTGATTTTCGCCGTGTAC\nCACTGCCGCGCCGGATAAGGAGATGAAGGTTCTGCGACTGTACCCTCGGTGGTATTCGTTAGAATTGGGT\nCATCAGGCTCACAGCTTCACGATCTGTGGGTGTACATATCACCGCCTCCTACCCAACACACAGTGTGGAC\nCATTAGCGGAGGGCTTGACGACACAACTCCTCAATGAGTAGCGGTGGCAGACACCGCGCTCTAGAGACCT\nAGTATATTTCTACCCGATGTTTAGCACGACCAGGTCCCTCCCAAGGCCCGTTAGTGCGGCAGTTAGATAC\nAGCGCCCAAGCAGAGTCGGCGCCTGTGCGGGTTACGGGCCAGAGTCAACACGCGTATAAGCACTATGACA\nAAAAAGCTAGTAGCTGCGGCTGATCGCTGGTCAGCCGGCTGTTAGCTGATGTATGGTGGCTGTCCATCCG\nCTGTACCGACTAGGAGGCCTAGGATCCGAGTACATACGCATCCATCACTAAGCTGAGACGCAATTCATGC\nGAGTAGGTCTCCATATCTCAGGCGAGAATCTAACCAAACTCGATACACTCTTACTATTTGTGCATTGGCT\nAGTATTTCCGCATGAAGTCAGCAGCGTCTTGATTCGCACAACTTTCGTGACTAACTAGAGGATTCACTAC\nCACGCTTCGGGAATAGGATTCTATGGACTCGTCCTTCGTAAGCGGATCCCTAATTCAGGTCAGGGTGAGC\nCTGACAAGGCCCACTTTGGCAGCTCTCCTCTAGAAGCATTACGTTCGACAAGATGATTGCCACAGCCCCC\nCTGCTTAAACCGTACCTAAAGCTCACTCTCCTAAAACCCCGAGCGGGGCCGCCCCGAATACTATAAAGTT\nGATTCTTGATAGATCGAGTTACCCTGCCGAAAACTATGGAAAGGTCCCCCTCTCCGATTGTACTCAACGA\nCACTGCAAAGACGGGGAAGCATGCAGCCGCACTTTCTTCCGTTCTCCTTGTGCTTCCCTTTCGTAGGGGT\nCATCCCATGGGCCTCAGCAATCATTGACAGGCGGTGGTAATCCTGTTAGTCGGAACCGTACGTGTGGA
AA\nACGAAGAATACATGAGACGACCTTGTGTCGTTAATGTTTCCGTGCTGTCACCTATTATCTATGTGGAACT\nATTCCTACATGCCCATGGAGATTAATAGGAGGGCTACCTATGTGGTGTACTTTGTAAATCTACGGACAGA\nGAGACCTGTTGTGGGCCGGCACTAGTCTTCAACTCTATCGACAAAGCGGTTTCCGGCGAAGAACCCCAAA\nCCTGGGCTAACTTTCTAGTTCGCAGAAGGAATAGTGACGTGAACGGATGTCCCGTGCGCTAAAAAACATA\nTTAAAGGCCGCTCAAGCACCGAGCCGGGCTCCCAGCTAGGTATTTTTGTGGGTATCCACTCGATTCCCCC\nTTCTACGAACGTGGGCTGAATATAGCTGACCTTTAGTTAGGGAATCCTGTAGAAAGGCAGAAGCCGATGA\nGCGTTTGGATAGACAAACCCCGTGAACTAGACTCCTAGCTTATAAGGGGCTTGATGTAGTCGTGGAGCAG\nATGGCAAATTTGGGCGAGAGCAGAACCTGTACGAAGGAAGATTTAATCCCAGTTGCGAAAGATTACTTTC\nACCCTCTACCCTCTTTGGAGTATTTCCATACCCCGTATGATAGGGCGTGTGCAGTAAAGGGCCTAAGTCT\nAACGCGACAGGTGGGTTTGGCTAAGATTGCGCCTATTTGGTTTTCAAGCGGCAGTTAGTGCCCAAGGGCC\nGTCCCGTGTCATCATCACGACGTCGGACGGCAAGGGGTTGGACGTATTTTCAATACGGGTACGCGGGATT\nACGAGATCTAACTGGCAGGAACCTCGGGGTGTGACCAGGGCGAAGCACCACAATCGCTGCAATCATCGTT\nGTTTCCATACAGGAGCCTGCTCGGCGACCGGGTAAATTTCGGGGCGTCTGGGGCGATACTAATCAATCGC\nCATGAACGACTCAACAAGCCGTAAATGAAGAGTAAAAGACGTCACGAATAATAAAGTGAAGGGGTAATCC\nGCCTACATATACGCCGATAGTCAATGCCAGTATATCGTGTATTACTACGATCAGGTGCGTTGGCTTAGGA\nGAGCAACATATGTAAGGCACACGGTAGAAGTTTAATGCCTTCTTAAGCTTATGCTGGAGCCAGAAATAGA\nGTAACGCGACGATTATCGCTCTTAATCTTGGATACCTATCAGAATACCGTAAACGATTTAGTCTTACACT\nTTATTTTCAAAGCCATACCAGGCTATTTCACCACCATGATTCACTATGGCCACGTTAATTAGCCGTCGGG\nGAAGGACTGGGACAGGAAAGACGACTTCGGTGGTAGCTTACCACGGGAGTCAACTCTTCCGTATGCGCCG\nAAACCGCAAAATGTATGGCCTGGAATCTCTAACGCGCTAATGGATTTTGCCTTGATGACTCAAAATATAT\nCGACAGCTACAGAAGAGAGTACTGCGCTGGTGGTAAATGCAAATGTCGCTTTGAGAAACGCCAGGATGCG\nGAGGAGGATCAGGATTTCTTTTCACCTGTCAAGTTCTAGAATTACAAGGCACCTTACCAATTACCAGGCT\nGCCAGGTTTGGTACAGTAGCTGTATCGATTTGCCCCTTTCGGGATCCTTTACATCTTTAATTTAGGCGGC\nCCCCCTTTAATAGACTTTTGCAGATGGGCTTGTCTCTTATTGTTATGTGGTGCGGCACCCGGAAGGCACT\nGACTGCTCGAAGAATAGCTTCGTAGTATAGCCCCGACCAGATGTAGCATTCTTCACCACACACAGCTTAC\nGAAACCTCGCCATAAAAAGACTTCGATGGGGGCCTCAGTCTCGTTTAACTTAATGGTAATAACCAGTATA\nAGTTAATAGAGCCTACCAAAATCGGGGCCATTGCGCCCTGACCGGCTTTCCCCTATGAGAATGATACAAA\nTTTGGGCGTACCCGGACGGTCACACCGCTCAATCCAGGCGCAGTGCCTCTGG
AACTAGGCGGTCTCGACA\nCATGTCTTCCGATCCTCAAGAGCGCAGATAGGGTCTACAGGCGTTACCACATTTTGATAAAATTCACGCT\nATACCGATAGACATACCGATCCACCAAGAAGACGCCACTCGGTGACTGACCTGGCAACAACGACTCCTGT\nCTATTGGTCGTATGTAATAGTAGACCGTACCCCTTAGGTATTTGCGCCCCCACCATGTATTCCGGATCTC\nCACTGTTAATGGCTGTTGGCAGATGTCTGTAGGCCTAACTGCCTAAACGGGACAAAGCTGACGTTCGCGT\nGCGTACCACCACGATCGCCGGTTGAGCAACCCTCTAAGTGGTGTTATATCCTAGCATATCTTGTATGCGG\nGAAACAGCTGAAGTGGACGACAGGGCGTGTAGAACCGTTCGGTCAGGTAGCCTTGACACGGGGGGCCATT\nTGGCACTAGTCGTGGCGAAGCTCTATCATTTAGGGCCTTTGGAGGCTTTCGTCTTGTGAAGAGATAGGTC\nTCCGGGGCACTTAACTAGCGACGGTAGAGAACCCGCACAAAATGGGTCCTTTGGGGCTTGCGATGAGTCA\nCTCGCACCTTACTCCATCGGAATTGACGCTAGCGGCAGTGATACCCCTACAGACCACCCCCGGGTTACTA\nGTGAGCTCCGCAACTCAGAGCCAAGATTATATCATTGCTGCCTTTCGAGGAGACCTCAAGATTCGCCAGT\nTTTAACGCTGGAACTCGTCAGGTACCGAGCGCCTAAGGCGAAATTAAACTCACACGCACTAACCAAACAG\nGCGGTTGTCCCCAAATAGTGCGCATATAGCTCTCGGAGACTTGGGTTGCAATTGTCGGCGCGGTGCCACG\nAGCACCACCCCCAAGCGCTCGAGGAATTGTAGCGTAGCCTTAGCATAAACAGCCACCCAACGACGTTTTG\nTGCCGTAATGTTCACTAAACCTGTCGGCCGGAAAACCAGTACCGGGGTTCTGGGTTACTCAAGGAGTAAT\nTAGGATCGCTCCCATAGTGCTCACGGGCGCATCCTAGTCTTCATGGTCGTCTCGGAACGGTCTACATCCC\nGTGCTTAGCGGCTCGTACATTCACGGCCCAAAGACGTGATACTGGCACCCACGTGTCACCGTTCTCACGT\nCAAATTTCATTATCTATAGCCCGAGAAGATTGCATATACGTACTAAAGTTCACACGAGTTCCCTTTGATT\nTCCATTCGACCCCCTCTGTTACACAGTCCAAGATGCACTTATCCGTAAGTGCTTTTTCAGTTAATGCGAT\nTTATGCCTGCTCTTGGATAGAAGTCAGTCGACCTCCGGGCGTACGTTGCGAATTGCGGGGACTTTGACAG\nCGTCCAAGGGTCGGCCGATAAAGCCAAAAGGCCTTTAAGACAAAGGACGCGTACGCACTTGCCAACGATA\nGTACATCATGACCACCGAATCATTAGGGTGTTATCGTCATACGATATTGGGCCTGTGCACTGTTAACCAA\nTGCGCGATATGGTAACTAACGCCCAGCTGAGTCCGAGGCCTACTATGGCTTCGCCATTTGTTGTAAAACT\nGGCGCGGGCTCCTTAGCCGCGATGGGTTGGGCACGAGCGTATGAGCCCTTTTACTTCTAGGGGTCTGGCT\nCAGACAGCATCCTGGCGGCGGCGCGGAGTCTCTAAATGGTACGCAGCAAAGACTTTGGGTTGCGATGGCA\nACGATCCGATCAACGCGGGCGAGAAGGAATCCAGGAGGGTATTCTGTGAGAAGTGTAGGAATCCTTTTAA\nTGGCTACAGGGATTATTCATACGGGGGCCCTTCATAATGCTTCTGCTACCGATGCGGCAGATACCGAAAG\nCAATTGCACCCCCAAAACACAACCGTTCCATTTACATCGATATCGCTTCGAAGGGGGTCGTCCTTGACAC\nCAGGTACCACTACCAATTGGTATCTATGAGCACCAC
ACAATTTAGGATTGAGACGCAAGTCACGTCAATA\nGCGAACGCGCGGGCTCCGGGTGTGATTAAGACTAAGATAGAGGTCTGTCGCAGTGCCATCCCAAGTTTTA\nAAGGGCCGATTCGGAAAGATTAAGTCGTCATAACATCTAAGTAGATTGGAAACTCATTCGACCTGTACAT\nACAGCCAGGTATCAGGAGCGGCCACAACAGGTCCGTTGGGCGCCGTCGCCGTCTCTTTTACGGTAGGAGA\nTAAAAGTGACCCCGCGGATACTAACAGTGGGTGTCACTAGGTGGTTCTATCCTGCATCAGCGACTTTTAC\nGCACCACTCGCTGAAGACCACCATTCATGTCCTCGGTTCAATCATGTGCACAGCAGTTGATGTGCTGACA\nGGATGAATCTAATGCTATGAGTGGCTTTGATATACGTTCATAGTTCGGCATAGGCATCGAAAAAGAAAGG\nCGGATCCTAATGCTAAAGCTAGCATCGATCGTACGATCCAACAACCAACGGTAACATCCATGCTTAAAGA\nTCTGTCTGGCTCTCTTCGCAGACCAGATACCACACCCGTCCCTGACTCCCCAAGATCGCCTATGGATATT\nCTATGAGGGCAATATACATACTTTTTTGGAGATGTAGGTCAGGCTAGCATCTTTCTCCCCTGTCTATTAG\nTCATCCAGATTCCGTGTCGAGGTTTACCAAGTGTTCTACCACCTTGGAGCCAGGACTTCAACGGGCTCAA\nTGCTAGTAACGACAAGGGATCCCTGGTTACACTTAACCGCCATAAGCGATACCTGACGGGCTCTTGTCGC\nTCGTTCAACCACAATCCATTGTCAAAGCGGCAATTGACTAAGCTGTTGTGACACCAAATTTCAACCTGAT\nTGATCCGATTCACACCATATAGTAGGGCTGGTAACACCCCTCTCGAGTGTCTCCACGGTCTTCAAATCAA\nCAATCATGAAAGAGGACTTGGCCTGTTGGCACGACTTTAACAGACCGTTTCTGGTATGCCTGGCAGCAAA\nCCTGGCGGAAGAACGTTATTATCAACGTACAGTGGTATGGTGGGACTCGACGGCTTCCTAGAATTATACG\nGTGCTACGTGTCCTGCTGGGGCCTCAGAGAATTCATTGGAGCTCGAAACATTGTCTTCTCACTGGAATCA\nCAGTCAACTCCACGAGCGTCTGATCTCGTCCTGTGCGCAGAAAGCCCCCAGAGCCAGGTATCAAAGAGCC\nTGGAACTAGTTTTATTGACACGGTCGTGGGGTCCCTTCACTACCTGGAGCGGGTCCTGAAGATGTTGCCT\nGAGCACATACGTCGCACGGCAATGGCCAAATTGACCCCTGACACTGACCGCAGAGAACACTCCGACTATT\nTACGAGTTCCCACTGTCGTTTCTCTTATATGGAATCTGGGACGTGCTGGTGGGAGCGAATCGCCCTCCAC\nTTCACTAGGAGCACATGTGACTCCCAAACAGGGAGCTGAAGCCCACCTATGATGCGTTGGCTAATAGCTC\nGAATCACCGAACTGGAGTGATCCGCATCTTGGTGCTAACTTCCCGAGGGGAGACCAAACAACGGCGGTAC\nGGCTTGCTGTGACGTTTCAGATTTCAGAGCTCTGTAATCCTCCACGCGCTATTGCAAGATTACTCGGATT\nAATGAAATCCTGGTATACCTCAACGGTTGGCTACGTGTAGCTCCACACACACACCTGAGCTCGGTATGTG\nGGTCGGCTAAGGCGTAGTCGATTCAGGTCACCATAGCGTGCATTCGGGTCGCGCGCCGACGAGTAGGCAT\nCAGGTTAGTGGCCGTACTAGCCAAGCCACTCCACTGGGACGGGCATCGCTTGGTGTCATGGTTAGGTCAA\nACCGACCGCCCGACTCTTCGCATCATACAGCGTGCTGGAACGAAAAACTGGCACCTATCTCTCCTCTAAC\nGGAGATCCCTCCGTCAGCAG
GGTCGATATCCCTTTGACCGCGATTGTCATAAAGGATTCGTCCCTCGGCA\nAAACGAGAGTCCGGATCCAGGTTCTGCTGCAGACTACTCCGATCTCCAAGGTACACCCCTGCAGTATGGT\nTATGCGTGTGCGGCGGTCAGCAGTCTTATGTAAAACCCCAGAAGACATTAGCACTAGTCTTGCGTGATTC\nGGACAGGAACGAATGGCCAGCCGCATCTTATTTGTGTGTGGTCAATTCTAAGCCGATCTTGATAAATTTG\nATCCCCTCAGCATCTCAGACAAGAATACGTAACCCACAAGCCCATTCCGATAGCCAGTTCGCGAGTTCAT\nTCTTCGGGACAGTGATATGGGAGAGCGTCGTATAGTTCATTGATATCGCCTCTGAGGCCACCTGCTTCAT\nATCGGACGTGTAAGCCGTAATACTAACAGAAGTTTGCGCTAGCAGGATCTGGTTAACCGGCGGGGCGTCT\nCCTGTTATAATCTGCACCAGTCCAACCGAATTCGGGTCGTGTCTTGATTCATGGTCTGTCTGCCGGTGTT\nCACACGGGCAAGACAGGCAACTCAAGTAGCTCAGTTGACTCGAAGCCGATAGCCTGAATCCATACCTCGC\nCTTAGTGCAAAGTTAGTCCGGCCGGGTACGGTGGGATCCATTAGTGTAACTTTCGGGTAACAACATGCCG\nCAGCCCCTCTCTTAAAACTTCGGCGGTACGATACAACTAAGCAGCCGATATGCATGCATCATCCGTATTC\nCTCCGTGGCGACAGCGTCAATTATAGGACTCCGATACTTTATCCTAGTAGATGACGGCAGATCGCGTGGT\nGATGCCCATTCAACATCCATTCATCTCTCTTCCATAACGAGTGCAACCGGCGATACAGCCCCTATGCTAC\nCGTGTTCCAGAGTTGTGGGACGCTATGTCATGGTACATAAATGCACGCCGTCCCCGCTGCTCGTTTGCTA\nTTTATGGTCCGAGCGAAGGCTGTCACATTTCCTGTCCTTTCTCCGTAGGGAATGTCACAAAGTTTGACAT\nAAATTCCAACCGCTCTGTGGGCCACACTTGCACGTAGGATCTGACACTTGGCAAACGATAAGTTGATTTT\nATTGTGCGTCGGCGGATTAGCGCCCACGACACACTGGGTGATATCATAACGAGTAGTTATTTCCACATAT\nACCGGGAAGAAATGAACCGTCGGCACCGAGTCAGTGACTGAGCCCGACGGACAGCACTTCCAAGTTTTAT\nCTCCAAGTAAAACAATTCTCGGACTAATACTTTCATGGTTGAACCAGGGACAGGGCATAGAACTATTGCT\nGCGACTTGCTGCGAATTCGTGACCTGATTCGGGGGTGAAACCTTTTCCCATGAACACGTTGCTTGTCATG\nGCACAAAGTTATGGACTCACAGTGTTAACAGACCGGTCGAAACGACAGGTGGCAGGTTTTGAATAACCGG\nGGCATCGGCCCAATGAGACTTGTTGTGATACCTACCGCGTATCCGTAGAGGATCGGGAATTGGTCCTATC\nGGCGGAGCGAGAGTTTCTACCGTTCCGTCTGGCTTACTACAGACCCGATCGGATTGCTGGTGAACGAATC\nCACCTTAGGCCTTCGGCCCATAGTCCTTGGCGGAACCTGCTCTTTCACCTAGGCATACTGGTTCCGTGGG\nCTATGGCATACAACAGGTCCATCGCAATCTTCTAAAACCAGCATGTTGTACGGGTGACGGATTTACCGAC\nTTGTTCGACAATTCCCGATCAGTTGAGTCTAGATACTCTCTAGTGTGCGTTGCACCCTCGATCCGACTCT\nGGCGGTCGTCTCACACAAAACTTTAAACTTCCTTAAGATCAAAGAATAAAAGAGGGTAGTTGAACTGGGC\nATTGCCAACCAATTAATTGGATTGTAACATTTCTAAGATCCTCCCCCTTGCTGTGTTTCCGACTCCCCTT\nTACA
AGCCCATACACCGGAATTCACGTGAGTCTGAAAGGCCTACACTAAGGAAATGTACGAGCCGCTTGC\nTAAGGTTTTCCCGGGTAACTATTCTACCTCCGCGTACATAATCGTCCAGCGCCACACTATCATCGACGAG\nGATAATATGGGAGTACCAGGAGCATACCGATTGCTTGCAAAATGACAGCAAGCGTGCTACTTATATCACT\nTTGGAACTTCGTCGCTACCTCGGCCAATGTCCCATGCTCTATTCGTCATGGAATCTGTCACCGGCAAAGA\nTGTCTGCATCACACGGGCAATATAGGAGCGTAGACACAGGTGGAATCCAATGCTGGCCAGATAATCATCA\nCGCTCAAACTCCTCCACAACCAGTGCCGGGAATTCTCTCGGAACAGCAGAAGGCAGATTATCCAATGCAT\nCCCTCAGTCGCGGGGATGCATGATAGGTAGCCTTTTGGCGGGGGTTACGATAGTGGTTTGGTATTGAAAC\nCTTCTGATCTGGTCTGTGCTGTTGTCAGTATGAGACTACGTGATCCATGTAGAGTACGACCTATGCATGC\nAAGGTTCATGTTCCAGGGCTAACGGCGACGAAACTTTATGTTGAGCAAGATAGATGAGATCAGAGGTTGC\nTTTTTCACTAGTTGCATTTTAGCTACTGTGCGTGTACTAGCGAGTCCACCGCCGTAGGAGGGATAAGAAG\nGCGTTAGCTTCGTGGTTGTGAGCCTTTCCTTTCTTACTCCCCAAATACCTCCATGCTACGGGCTCAGTCG\nGTAGGACCTTAAGGACTACGTGCTACGAGACCTTATTATGTTTCGCCCCTTTCGGACCTTTCCTCGATAA\nTATCTGGTGCAATACCGCTATTATGTGATTATTCAGTAGGAGAGCCTCGTAAACTTTTACAACCGAGACC\nACACATATCGTGAATAGTGCAAGGCAATACAATAGCGTCAACGTATAAGTCCGTTAGCGAATGATGCCCC\nTGTGGAGGAGATTGAGGTATATTTGGTGATCGAAGGCCCTCTATGCATTTCACCAGGGACCTTCCCGCCA\nGCGGTACTCCATAGCCGGCCGCATGAAATTCTTTTTAATCCAGCAGTCGAACTTGCTCATTTGTACTCAT\nTCCTTTGAGCAAAACGCCGAGGTCGGGTGCGTTTACCCATGGCCTGGACAAGCTGGAAGTAGATGAGTGC\nATGCATGAATGAGCTCCACAAATTAGGAAGCGGAGACGTCATTTCATCACACCTGCGGTAGCGGAGGAAA\nGCTCCGCGCATACAGCAAATTTAACAGTCCCTGTGGATGGCAAATTTGCAATAGGGTATGTTACTACGGG\nGGGTCACTAGCTTTAAACACATTCTAAATCTATCGTTTAGTCGAAGCACGGATAGCCGGTTAGCTGAGTA\nCGAGATTTCACGTTGGTTGGCTTACTGGCGGCTGTATGCACCAGCTGCCTTCGTTAGGATCCCCATCAAC\nGTTGCCGGGGAGAACGACCATTTGTGCGGGTGGCTATGCCCTACACCGTTTGCCCACTGTAGTCCTCTCT\nCCCGCTTCGCAGATCGTAGAGGAGCGTCCCGCTCCGCAATAATTGTTCCGGATGTGCTTCTCGATGAACG\nCTTGCTTACTCTTCTAACCACACTGTACACGGAACTAGATCAGCTGGTCTATGGCAGCAAGATAACATCA\nTAACCTTTGTTACGCAGCGCGCCGTGGTAACCTGGGTAGCTTTCATTACCCTAGCACGTGGAAAAGCACA\nTCCGCGTCCGGTCAGTTTAGACCTATAGCTGCAGCAGGAGGTTCTCACTTTGCAAGGCGGTAGCCCCGGT\nTCTTGAATCACATTCACACTCGACAGCTCGTGGATGAAGAGACACAGTTACCTTAAGATTTACCTCCACT\nTTTAGCCCTCGGAATCTCTTCCCAGCTTGAATTCCCTGAGAGTTGATTGCAAAACTGCAT
CTGATGCTTT\nTGGTGAGAAAAAAGACTTCTGATTGAAACCCGCCACTGCAAGCAACATATGCGCGAAGACTTTGAATTCG\nTGTACGGTTAAGAAGGTCCAGGACAGAAATAAACGCCTTCAGGTTGAATGGGGGACAGCGTGCACTGCAT\nCCAACGAACGACATTTGGGTTCGAGGTTCTTTGGCCCATGTCCACTGCGTATGTGAGATGCCTTTCTAAC\nTGAGGGTGATATGGTCCTAAGACTGTTATTATCTTCGCCATGTGAGCAGTGATCATAACAAGTGATTCGC\nGTTGACCGTCGCTTTCCCGATTACGAAGTTATGAGTCCGATGTATCATGAACTTCCCGACGTGTTGCGAC\nTTCGCCGGTTCAGCTAGCCGGTACGCGACCTTTGCAATGTGCCTCACGAACACCCCACCTGTGAGGTAAT\nCCCAGTTAGATCTGTTCCGATCAAGTGACTATTGTATAGCGCTGTGGTTGTGCTTAAGTATCCATACCCA\nCCGATCTCTCGCCGGTGATGCTTAAAGGGGCATTAACAACAGTTGGCAGCAGATTAAATACACTCTGGGT\nTGGATACGAGATTGCATCATGATTCACTAAGGTTTCTACGTGCAATACTAAGCCGCTTTGCCCCAATCTC\nTGACGCAGCGCTCTCGATATCACTAGGCTACGAACGAAACAGATACCCATCAATACAGGGTGTGCAGCGT\nATCAGGTGTGCCAGGTCCAGTAAGACGCCTCACAGGAGCGGCATGTCCCCGTACCCGTTAGCCAATGCTA\nGGCCTCAGGTTGATAACCACGCAATTAGTCTTCATCACATACTTTCTGTCTGAATTCGTAAGACCTAAAC\nAACAAGTGTTACGCGGTTAAACGCTAATGGGCTCAATCTGCATGCGGAGGATTCAGATCTGCCGTTGCGA\nACATAGGTGTCTACTTAGCGGATATTAGCATACTAGGTTCGTTGGGTGACGGTCAACATAGACCGCACGG\nATATAGCGCACAGTCGACAAATTTTGAGGGGGCCGAATCTGCTATGTTCTGATTATGAGTGGTTACAGTA\nCTTACTCGGGGAGTTGCCATACGGGGGATTCGCGGACGTCGCATGGGTTTCTCAAGCCGGGGTTATCGTG\nTTTCTCCTTATTTATGATACACGTTCCACTACTGCTATGTGAGAAGCGGCCTAGCCGTGGCCCAGCGGTC\nGGGCCGTGCCTTATATGAAGGGCGTTCCCCTCCTATACCTTGCGGTTTGCAGCACGGACGTAGAAACTTC\nCACCAATAGATCACTTGAATTCCGGCCAGTGGTCAATGTGCATATAAATACCAGTTATCCGATGAGGCCC\nGGTAGTATATGGTTTGCTGCTGCCTCAGTGTAGAGCTAAAACTCTCTATGCGTGAGTTGATTCCGATTAT\nTCGAAGATTAAATTGTACCGGACGCAAAGATAGTACCCGCATGACAAATCTCTAGCCTATTCTACTTAGC\nTACACGCGGCCACCAGGCGGTGCTACGTAACCTTAATACCCCCGGGTTAGTTCCGGCCGCATAATCCGAA\nACTTCAACTTAAATCCTTCAAAGGGATGAGTCAAATGGTTCGCCTGACTTGCCCGACGGGTGCTCAGCCG\nTGCACACGGTCCCTTTTTGGATGTCTGAATCACATACGTTATGAGTGTGGATGGGGTGGACGGCAAGGCC\nCACCAGCGGCCGTTTGTAGCAACACCGAGTGAACTAGCTCGGAAAGGACAACACGGAATAAAGATAAGAC\nGATGCCAGGTCTAATCTCAGTTTCCTGTCAGCCGTCCTTAGCTGCTTCCGGGCCTTTGCGAGTCGATCGC\nGGCGTGAGCGACATTTGTGCTTGTTCCGATTGGCTTAACGAACGTTGAGACCCAGAGGCTTCGCCAATAG\nGTGAGGTGGGCGGAGATTTGTAACATAGCAACATAATCTTTTAG
GGCAGAACGTGTACAGTAGTGAAAAG\nCGCAGGATCGACTTATCTGCTCCCTTCCAAACAAGATAGGATCCTTGCCGACGGGTTCATAAGGGCTTGC\nCATTTGTTTTTCTCGTCAGTTTTTACCATGTGGCTGCTCCGATGTCTGAAATAGGTTGAGCTGAGATTCT\nTACCCCGAAGCACCGATCGTAGGAGAGCTTTCATGCCAGACCTGCAGCAGCCTACCATTGTCAAGCTCAT\nCTGATTCAACGCATTCACTGCCACCGCTCCGAACACGTGGTTCGCGCCAGGGATATTAACTTACACTTCA\nGCTTCATGCCAAAGGTAAGCTGCCTATGTTAGCGATTGCTGGAAGAGATTCAGCAGTACGTAGATGCTGT\nCGATGTAAGAATACTACTTGTAAGTGGTGTGCCTGTTGTGAGAGTTGTCGTCCTATCGGATGCTGTTAGT\nTCTGAACTCTCATCGACCTATGCTATTCCAGGTCGGGAAAACGATCGGACCGTTATGCCAGTACAATCAT\nCAGGTGCATATATTCCGGGCGAACCTCCGATCTCCTGGCCGAATTTGCATCGCGTGTCGACTCTTACCAA\nCGGTCAACTTAACCTTTTCAAGGAAACTGGATGGATTTGCAAAAATGTCACTCAAGCCGGAGCTGGAGTG\nTCCGCACATGGGGGATGGCATTTTAATAATGCGTTGTGACAACCCACCCACGAGTAACTTAAACGTTGCC\nGGCCGGCGCCTGCGCACTCGACTCCAAGGGGCATGAAGATCTGTCAGCTAGTCTAGTAAACAACATGAGT\nATGTGGACGACCAAAGCCATGCAGTAGCAGAGCGACACAAATGACGTACGAAGCAGCGTGGGTACTGCTA\nGTGTGTTAAGTCTTCGCCTATCAATAGACACAGCCAAGCCTCAGTGCAATTAAATGTCTACCTCCTTTCT\nGTTACGTCAGGTGATGGTGCATAATCACTGCTCCAGGAGATCGGGCATATACGCTTTGAGTATGGGTACG\nGTATGGCGGAGCGGGAATCTATACTACGTCTGCCAGGAGCATTATTAAGTTCACAAGCCTACGCGTGTAA\nCTGCGAGAGGCTAAGTCTGTTGGGTCTGGATATAGCGGGATATGCGGGATGCGAAATCGCAATAAACTTA\nATCTACATACCCAATCCTTGGTAGAAAGTGCGGGGAGAGTAATTCACCTAGTCATGAGTAAATACATAGA\nTTAACCTAGACACAGATTTTTCCGCTCGGTGCATTTGCGTACCGGGCATTAAGCAGGATCCCGTCACAGG\nCTGCACGGGCATTTTACTTACATGTCCCCTACTCTCTTACCCCTAAAGTCAAAGCTATCTTTTAAGTATG\nAAAAGAGGTGGTCATAACCGACTGGACAGGGTACGGAATGCTATAGGGAGAAAGCACAGGCTGTGCGAGT\nACCCGACGGGCGACCGAATCTTGGGAGTTTTCGGTAGCCATGGCGTATCTGCCCCGATTTGCAGGTCGGG\nACTTGTTGGGAGGATAACGAACCACAAGCCACTCTCCTTCAGGTCAGACCCTCTGTCTGAATGAGATGAA\nCTGTCCTCACGCCGGCCTTCCCACGGGAGCATACGGAAGTTTCAACCGCAGCCGATTCGCTATGTTCCTT\nGTGTGCGACTACCCGCTACAATAACTGTAGCTGCGCCGACCTTCAACCATGAACAGCTCGTCGACCCACT\nTACAACGCAGCGCGTGGCACTGCGGTATGACCAGCCCGTTTGGCGACGTAACTATTTTTCATAAACCACG\nTAACTCACGCTGGCGGAGCGAGAATCGTCGAACTCAGCTGTAACTGTAGGGCGGCACGGCCCCGTTTAAG\nGAGCTTAACAATTGTCGGTCTTGCTAAAGTGCTCTTCAAGTACGCGGCTGGACGATAACAACGCAAGGTG\nGTAACGACAGCTATCTTAAGATACACGG
GTATCACTGGGCCACGGACCGGGGCTTAGTCTAGCCTCAGTA\nCCTCCCCATTTCAACTTTAGGTGCATGGTCCCAAAAACATTGTGTGCTCCGTCGCTGCGACGGCCTGACC\nTTATGGTCATATGAAAGAGATTGTCAACATTGGAAGCGAGTGCACGCTAGCATTCGGGCTTGAAGTGCAG\nCGTTTGTGTGTATAACCGGAGTTAGACGTGGGGTACACCCTGGGTTACTGACTTCTGGCTGCCCCCGATC\nTGCAGTACTTGGCCCCGCAGAGCACACACAAGGACAGGATTCAATCATGGTTAGGAACTTTTACCGTCCG\nGATAACGTTGCGTCCGACGTTGTAGCTTATATATACTTTTGTTCGTTAGTCTCCAAGGCGCTTGTTGAAT\nCAAGGGCACGGCATTCGCATAACTTGGCTAATTCGTTTCTACAAGAAATACGTGTCCTGACTTAATACAA\nTTCATCGCCGTTCAACTCAGATGCCGTTAAGCCTGGACGTTGCGAGTGAGGGTACTGTAAATTACAGCCG\nTCTCACGCCATCGCTTTGGCAACTACGCACTTCTTCATCTGAGTCGCTATGGCCGGTCAGCGGTGCAAGT\nCGGCAGAGTACGGAGTCCGCTTCCTATCAACGTCACACTGTAGGTATATAGACCCGCATATTGAAGAGTT\nGTATCCCACTCCACATGGGTGGGGAATAGGTAGGGGGTCATAGGAACCCGGACGGGGCCTACTCCCTGAG\nACTGTTTAACAGGGGCTTGAGGGCTCTACACCTAGCGGCCAGCTCTACCATGAGTAGCGCGGGTTACCAG\nAGAATCACATCGTTACAAGAAGGTAGCTACTATAAATACTGCCATTACTAGCTATAAGCGCGGTTCCGGA\nACACGAAGCCCGACTGCCACTCGAATTTCGCTATGCCTTCACCGTGGCGATCATTACTTTGAAGTGAGGG\nGCATAACTAACCGGTAGTAAGGCACGGCGGCCATCGCTCTAACTTAGGCACTACCTCAGTGGCAGTAACT\nAGAATTGACAGGCCGTCCGCTTGGTGGTGAACGTAGCTCGGATACGAGCAGGCAGCTGCGGGTTGGACCA\nCTTCAGAAACGAACCTGGGCCTTATTTCTGATCCTTTTGGTATGTGATGAAGATCGCCAGGGTGGCCAAC\nATGTTTCCCTCACCGGTGCAGACTGATCGGGATAACCAACTAATCCCCTACACACGGTTGCAAATTCCGG\nCCATCTGTCTTAATAGGAGCCGTGTAGACGCATTAGATCCGAACTTTATGCTGGGTTATTCACTGCCTTA\nGTAAGGATGCCGCGCGTTTAATAAGTTGTGGACTCGTAATAGCGACAACAGAGTCGCTTGAGGAAGTTGT\nGGTCCTTTAGGACCATCGGTGGAGCGCGGCAGTAAAATGATACCACATAAGTCCAACCCTAGCAGAATTG\nCGACAGTTTAGTTATGACAGACTTGGGTTTACTTGGACGTTACAAATTCGTGGTCTGGATATTTGGTTTA\nCTGACATGCTTCCCGTACCGTAGTAGACAGCGGGTCGGCCACCTCAGTCTCGAAGCCGTACGGGGGATAC\nAACGCTGCGAATCGTCTTTATCTATCGTCCCCGGAGCTTGTTGTTAGAGTTGCAACTTTTAACAACAGCA\nACGTCAGACTAAAAACGAAGGCGGGCATATCGCCGGACAACTGGGTGGCGTACCACGGCCGCTACAGAGT\nATGAGAGTTAATTTGGGTAGGTTATTCTGCTTCTACACTGTAGCGACTAGCAATTCGGCCGGCGTAATAT\nGAAGATCAATGGAGTTATAGTTGTGACCGATAGTGATAAAGTATACCTCACCAACGCAACCAAAGCCAGA\nATCGTTGAGGACCAATTAATACCAGTGTATCCATGCCTTGCAAACCGTTGAATCCAAACGCCAACACTCA\nTGTCACTAAGGT
TGCCCCCACTTGATCGTCTATCCGCCGCTCCTCACGGATGGTAGTATTGCCGGTCCGG\nTTCCGTTCGGGACAGCGCACAACTTTATAATGCGACGAGCATCGCCACCCTGCATAACTCCTTGCTATCT\nACAAGTTAATTGCACCGGATTTAAGCTAGCAGGGATCTTACAGTGCAAGTGGCGCTAGGGCCGTCCCGTG\nATAAATTCCGCAACTAACGATCCCACTAAATACCGACAAGGTTCGGCACTGCGGGGGTGGCGTCCCCTAC\nTAACTAACTTATACCGGTGTATACCCTCCCTGCTTTGAGTTGGGTAGCTATAACTGAAAAGCGATCCGAC\nATACGGGATGGTGGGTGTCGACACAGCTCTCTCCGCGCCAAATGTTCCTCTGCCAGCAACGTTTGGTCAA\nTCCTCTCGGAAAGAGTCGATACTTGCGCATTCGAAGCTTGTATCGCCTCGATGCCCGGTGCTTCTGCCCT\nAACCGTGTGATATGTCTGATTGCCGTGACGGGACCCCTGGCGATAGGATACCTAGAGGAAATTCTCCGTA\nCTTAGAAAGCAACCATCTCACTATGTTTACCGGAGACGTGTTGAATCCCCACGCTGTACATATTCCGGGA\nTACCTGGATACTAGCCCTACTTCACGATGAGGGTAGTGCTACTCTCCTATCTTCGGTGTAGAACCACTCC\nTGCTAACCGGGGAAAGTCTATTCACAGGTGCTATCAAAACCCAGGTATAAGCAAGCATCCGCCTCGAAAG\nCCGGGCCTGCGCACGGATCTCTTCTGCTCCTGCAACAATACCTCAAGATGCGACATCTTGCGCGCATCAT\nTGTTTACCTTTCAAGTCGTATGGTAGAAGCGGGTAACGTATAGCAACTACTCCGCTATAGGGCTCCTAAC\nCATGTTAAATGATGGCCGGGTAAGGCTGGCCTGTCACCGACACTCAGGCCTTGGACCATTTGGCTGAGCG\nCCAACCCTTAATTCCCGCACCTTGCATCGGTATCATTAGAGGGAGAGATTCTCTGCATTGCGGATAGCTC\nCGATTCCGGTCTAGCGCCCTGGGGCAGACGAATTCGACTTCCGGTGAACAACCTCTCTCACACGGTTGTA\nACCGTCCACACCTACTGTTCACGCCTACCCCCTCCCACATATTGTTTGTTACGGCGAGCTGACGCGTAAC\nGCACGTTTCCTGGCAGTCGATTCCAAGGCCGTAGACCAACACACGGTCTAATGCTTTCTGGCACGGCGGG\nCCTTGTAACGCCCCATTTCCTTTGAGGTCACAGTCACGGGCTCCTTTATCATATACATAGGTGACCTTGC\nTAAGGGCAACCTCTACAGATACCAGCATGAGCTGGATACGCTTAGAGGCTGGCAGAGGCACGTCCCCGGC\nGGGCAGGATGAAATCGGGTTGAATATGATTGCGGGGAACCGAGACTTTGGTATCGGGTCGCGCTCGCGGT\nAAGTGTAGTTGCTTCGCGGATCCCACGACTCTATCATACGACATCAACAGTACGAAATGCTTCGCATGCA\nTCAAGGAATAAGCGCAAAGTGGATGGACTCGACTTGGTTTTCGCGGAAAGAGTTCGTTCCCCTCCCAGGC\nATCGACCGTCTCCCTCTATGTGAAACGACGCCCTATCATCTGCTATGTGGAACCGGGCGGTTACCTGCGA\nGGTGTATTCGGCGTCTAAAACGAGGAAAAGACAGCCGGCGCAGGTGATATCAACACAACACTGTCATCGC\nGTTTGTCAACGACTTGAAACGAAACTTCGCTCCGACTGTCATAAGTGTGGTGGCGTTCCTCTTAGTACTA\nCGAACCCATTCAAAGGAAGGCGCCACATCAAATAGGCTAAACCGGAGATTAGTAACATTTAACTGTTCAA\nGTCTGATCATTCACCAAACTTAAACTACAAAGACGGACGCCAACGTGCGGTATATCGCCTCGGATTAT
AT\nCTATCCGGTATGCATTTGTTTGTCGAAGTAAAGAACCGAGCAGCTTTGCGAGGCGGCGTACCATGCACTG\nATATTTTGAGGCCTCCACGAGGTTCATATTTTAAATAGAGCCTGATCGGTCTAGACCTCTCGCGGTGCCC\nATAAGATTCTCCTCCGCCGGTGACTGATCACTCTACAACGAACAGTTATTAGTACCACGCCCTTGTCTCT\nGGCTAGGTGCTTATTGATAGCTTTAGGATACCTCGCGAAGAATCCGCGAGAGGTACTTTAGATATGTACA\nCTGCCGGCAGCTCCTCATGCCCGCCTAGCCTTAGTTGGGAGTCGTACAGTCACTGTTCGGGCCGTCCTGC\nGTGACCTGTAAGCGGCTTCCGCCGATCCAGATCTCGGGGTCCGTGCTTCCAATCCCATCATGGCCTGCAT\nTGGGTGAATTGAGTATCCCGAGATTCAGCTCTATGGTTTGGCAGAACAGGGGGCAGGCGTGACACAGCGC\nAAGCAACGGGTTACACGGATTAGTAATCTGCCAGGCTTCTTTAACCTAAGAAAACTTGCCTGATTAGGAC\nGTTAGTGGATGATTGTGCCTCCACCTCGGGTTATGTTCGTCACCTCTGGTAAAAGATTCACTGAAACGTA\nATTGTCTTTAGAGAGATGATAAGGACCTCCGGAACGACAGCTGGGGATTGAGCATGTTCCATGGCGGTAG\nTATATTTAACTAGTCTGGGTACTAAGTAAGGCTGGCTCGAGGACATGCGCCGAATACTGGACGACGTATT\nACACGGAGTGCAGGGAGCTGTTGTAAACTCTGTAGGCTTAAATAGAGAGCGCTTTAAGTAAGCGGCGTTA\nGATCTAATTTAAGGTCCGCCGGCTCAATGCTGAATGCTCGGGGATTCGCTGGGAACGCAAAATTACGTCC\nGTCACCCTTCTAGTCACCTGAGTGATTCCGTTTATGTGTAGCAGGCAAACGCTAAGTATGTTAGTTGTGG\nACCCTTCCTTACTGGGGTTCACAATGATTTTAGAGATTGTTCGATTCGCAAACATGTGGTTAGCGTAAAT\nCCCAATATTGCTATCCATTGTTCTTGATCGACGCCACGTCGTTATTCATGAGGCGGGGACCAGTTCGAAA\nTGAGCCTGGAAGACGCAGTATTGGGGTCGAATATGTTTGTCAGGATACCAACCTCAAGTTAAAGCAATTA\nATGTGGATTCGCCTCTTCTCAAGATTGGGGCCTACGTAGCGCGATGTAGCCCCATGAGCTCACCACTTGC\nACAAGTACACTAGACCCGTCGCTTATGTTATGATGTGCTGCATATTAAGAGCTTCGCTACAAGCCCCCTG\nAAACCCGTAACCCGACTTGAGTAGTGCCAGTTCCATGCGGTCGGACTACAAACTTGGTGTTGGCCCCTTC\nAATTCGGCAAAGCTACCTTTATAAACACCTGCTACGAATATATCGCCCCACCTTAGCTCATTTCAATCTT\nCTACGACGGCCATGGACCCCAGTAGGTCTTGCGCCCCAGTAGCCCGCGCGCGAGATTATGATGCTAGAGA\nGCAAGCGCCTTTTAAATGAGGACATCGTTCCTAGTTACCTGCATCTTAGCCCGATGCAGTCTGTTGGATT\nGCTCGATTCAAGACTAGCAGGCAACCACCGGAGCAAGGTTTCCACATGTGGAGACATTTTGTCCCATATT\nGACAATCAATGTCCGGATTAATTCCGGGTAGGTTTTGGAGTTGCGCAACTTCTCGTGATTACTTCATACA\nGAACACCCTGTGTTTGGATCTAGACACGTCGATCCCCGTAAGTGACGCTACTGAATATTATGCAAGCGCA\nTTGAAGGGTGTAGTCTGTGCAGCCAGCTGCCGCCACACTACCCTCGTGCCGGCGGTTTACGGTTTACCTT\nCCGATTGTTCCTTATCGGCCGATACCATCGGATAGGGGTTGCCTTCATGCGA
AGGATCGCCTTACATTGG\nTCAAAAGTCGTTCGCGCCGCCGTCGGACTTGCGAAAAACAGTCTCTAGTTTCAATCAGGACGAATGAGGG\nAAAAATAGTTCGTGTGAATTCAATACTAGCAGCACGTGATAGACCAACTTTGCGGGCCCCATCGCGAACC\nCTACACGTGTAATGTACATTTCCCGCCCGTTAACCGTTCCGTGGGCTAGTCTTGCTAGTCATTGTGGACG\nCGTCTATACGGTCACCGTGGACCTGCGTGTGAACACACCCGCTACTCTACAAGCACTCAACGTTTGAACT\nAGGCCCCCAAACGGGCGAATCTTGCGCTTTGATAATATATGGTTAGTCCGCAAAGTTCGGGTTATAGCTA\nCATAAGGGACAATTGAAAAGGACGATGCACCTCTGATACTTCAGGTTTTGGCAGGATAGGATCGAGGAGG\nCGTGTGAAACCCCCGCCCTGGCCTGGCCCGTTGGACGGGCTTGTTCGGGTCAGAATCAACCGAGCATCGT\nCTAAGACAGTACGCAGACTCTTCGTGTCGATGCCCGATCAGTAACGGCTATAACTCCCGCTACCGTCTTT\nATTGACTCCCAGTATGTGATGAGCTAGCAACCCGCACATAACTACTGGTGAAGACGGATTCCGAGGTCCA\nCAACCGCATGTGGCAACATTGAAGGACAAGAAGGTACTTGTCTGGGGGAGTATTAGACGCCGGGCGCTGG\nTCCAGCCCGGGCGCTGCAGCATCGCTACGCGGAGTCAGTGTTCTCGTGGTCCTCGAGGAACTACAGCTAT\nTCCACGAGCGCTGGCTGAGGAGAGGCGCCATAACCGTAGATTCATTAGTGTAACCTCATTGAACGCACCA\nGAAAGCTCATCCCCGAACGAAGTATCGGCGGTGCAAGCTAAACACCTTCTAAAAGCTTCCCCAAGTAGAG\nCTTGTACGCCGGATTCAGACGCTGTTAGGACGGTGCTGTTGGGGGGTATCTAAAAAGCGATGACGGGTAC\nGTGCTGTTAACAGTCGTACAATGTCGGCCGTTTAGTGCGCGTTGATACCGGTACAGATTCGGCAGACTGT\nTCAATCGGAGACCTTCGGAGCATGCCCTTCCGTATGTGTGAAGGATTTTGGTCGGTCCGCTTCCACATAA\nACCTCGTTACACCGCGGCACCACTTTTTCCCTTATCGGAGGGCTGATGACCTACATCTGCGCCGTGCCTC\nGAGTCGTTTCCCCTATGGCGTCTCACGTTCGCGCTACTCCTACTTTATCTACTTCATCACCATCCCCCTA\nTGGACAGCGTGAATCCAGCCGGTGCAAGCCAAAACGGAGCAACTGTCTATCGACGCCTACTTATATCGAG\nCTGTGGGAGTTCCGTAATCGTTTGACTCCTATCCGCGTCTTACTACCTAAAACGGCCTGGGTATTCCTCG\nGCCAACACCAAACAGCAATATACAGCGTGTACTGGCAAAAAAGACACTATTTTTATCCAAGGCAAAACGC\nATCGAGAGGGAGTAGACGGGTTTCCCGCTGCGAAGATAGGTTCGATTGCACCGCATTTACAGTGGGAACG\nTTAACTATATTTTATGTCCGCTCAGTGTACCAAACTGCCAACCCTCCCCCAACCATATACCATCCGCATC\nTCGTTTGATATTTCAGTTTGATACTGACTTCTGCTGGGGGGTCGCATCGTTAGTAACCTGCAAGTTTGAA\nTCTGTAGGGTTACTGCTGGCCTTGTACTCGTCCATTCTGAAAAGCTCGCATCTTTATATGCGGTGCCCCA\nGCGGTTTAACTGCTGGGGGGAACTCTTTAAAGCAGGGTTCACAATTCTACGGTTATCCATTTAACGCCAA\nCCTTGTACCCGTAAGATTGTAATTTAGAGTTGTTGGGGTACGAGCGCTCGAGGCCCCACTCGCTGCGTAG\nTAGTTGCGACGAGGAGCCCTCTACCCCGGGTATTAG
ATCAGGGAGTTGTTACCCTTTATTGGTTACTCAC\nGCTGCTTTGTCCTGCTCCAGGCAAATCGCCTTAGGCACCGTCGGGGGTTAAGCCCTGCCTAACTTCAGTG\nGGTGGCGGAGATATTGCAAACGCTGAAGAGCGCTCTACACTAAAAAGAAGCATGGTTAATGACGTGTACA\nTTGAGTTTCTCCTATCGCGTATCTGATTATCTCATTAGTCTGGTCTGCCTACTCTCGCAGTTTGAGGATG\nCCATTGTATGTGGACCGCTATTGACCCGCATCCTTGCAGACTGGGACTATAATTAAAGGGAAAGGGGGTG\nGGTTGATGTCTGTTTCAGCTCACGAAAGCATAGGTATCAGTCGCATTCAGTGAGCATTTGCCACACGCTG\nAAATAAACGAATGAGATCAGAGTCCAGACTATGAAGGGGCAGGGAATTTCACCAGCAGCGACTGTGGTAC\nTAGTTTGGCATTCCCCTCTGTGCTATAGAACATTTTGCTGCCGCACATGCTCGGACAATCAGTGTTGGCG\nTTCTTAGTCCGAGACTTATCTCCCGCTGTGAACTTTGAATAAGAGACATCAATGGGCTTATTCCTTAGTG\nTTGGGGGTCTGTGTCAGTGGATCCCTGGACCAGATTGCAGTTCAAAAAAGAGCTAAATGGTTACTGATGT\nGCGTCAGACGCCACCCGGTAGCCCCGGGTTCGAGGAAACTCGGAATCCTATGAAACCACGAAGGTGAGTT\nTGTAGGATAAACGATAGCTTAGTCAGGTGGTCGTGCGCTATAGCTGTGCCAAAAGTGAGCGGATGGCAGT\nGAGCCTGGCACGACCGGACCGCATGTAGGATTAGATATTCGACAGCCCTGTTATGGAGATTCCCAATCGG\nTATCTCAATAGGCCCATCACGTAGCTTGGTCAGGAAAGGATGGTCGTGCTAGTACTAGCTCCGCCCTCTA\nGGAGATGAGTGAGGTACTCCGTCGGACCAAAGTTGTTTGGAGCGGATAACATCACCAGAGTCTTATTGGG\nTCACCCAAAGGACCCCTTAATGCAGAACGAGAAAGGTATGACAATAAGCCAATCAAACTCGGGGGCCCGC\nCCCTATGCAATTCATCTGGCAATCAGTCGATACCGACGTGAATTATCTACGTTATGGTCCTATTCTCGCA\nTGCTCGTCTGAAGCTAGAGTTGATCTTAATTGTTGGGACGGATTACTCGACTGTAGCTTGACCTCCGGTT\nTGGTCTGATGACCGGAATTTGGTGCGCTTTAGTCGTCCATTGCGACCAATTCATTGCATTGTAAAATTAC\nCATGGCGTCAGGAAATAATTCTCACCAATCCTAATCGGACGTGACCTCTCAGCGTTAATTCTGCTCCGAG\nGACGTTAGACGATTCCTAACCTACTTGCATCTTCTCTGATTGTTTCGGCATCAAAACTATAGTGCGCAAA\nCGTACATTATCAAAACGAAATAAAAGAATTTTGTCTTGTCCCCTTGAAGGCAGTCCGCTATCGCTAGTCT\nTACGTAGATCCTTCGGGTCGATGCCACGTGGTGATCTCGCATGCATTCAAAGTCCCAAGCGATTGGCGTT\nCACACCTGAAGGTTGGCGTCGATGCTTCATGTCGGATGGCGTTTAAGTGAGGAACAAAGAGCCTTCCATA\nAGAGAATGAAATGGAGAGGTTATACAGGAAATGATAGCCCGTGGTGATGCTCACCCGAGTTGAGTAAGGT\nGTCAGCAATTCTCTTTCCGCTTTCCAAACCATGGGCTAGCTTCTGGCATCTCAGCTCCGGTACGACTACC\nGCAATCAAAAGGTACCCAGCAGATTCTCTTTAGGATTTGCGGCCACCGCACTTACTACGCCTAAACCATT\nCCACCGGTTAGACGCGTTAGTTGTAATGACAGAACACCGGACTATAAACCCTCTATTCGAATGTCACACC\nGAACGAACCGACTGCCCCCT
CGTCTTGCTGGTGGAGTGTCACAAAGTAATTTAGAGCGCTAATGAACCAT\nGTCATACCCAGACTCTGAGGGACGTTAATAATCAAAACAATTTGGCCATGTCGTAGACGCTAACTTCTAA\nTTACTATGGGGAAGCGCCAATGCGGTTCAAACACATACCATCGGGCGTTGCCGGCCTTACTGCCCAGTGG\nGAACTCCATTTCACTATGGCTAGACTAGGTCGTTTTGCCCAACCGATTGCCCAGGAACTTTGCGAGACTT\nGAGGTTAATTCGCATGAATGTCGGTCGTTCCACGTATAAATGCCTTCATTTTGTTTGGGTAAGTAGCTAT\nGTTTGAACGGCCGACGGGAGGTTAGGGGAAGTACTTCACATGTTGAGTCTTGGGGACAAATGAGCATAAA\nTTTTAGGTGTGGGTTCCGGCGCCTATACATATGGTAAACACCCTGGGGGTACAAAAGAACCTGCCGACCC\nAATTGTGAGAGAATTTTTAGGCTCGTCGGGCTGATGACAAACCTTTGTACGTGTAAGCCTAACTCCGATG\nACCAAAGAATAGTATGTGGTACTACGAGGTGGGAGTGAGGATCGTTGCAACGTAGGTTTGACGGTCTAAA\nCTAGTAACACATGGGTTATGTTCGTTGGTTTATAGGTCAGGTTTTGCACGATGTGTAAACTTCTTGCGGT\nATGTCCTGCACTCTAAGTTGGCGATGGACTAACTTCTCGAGGACCTTGACAGGAGATTTCCCGATGATCT\nCAGCTCCTTACAATTGTACATTCTTACAGCGTCTCAGCAAGTCAGTCACTCGACAAACGCCTAACTAAAC\nGTTCGGACCAAATCCCACATTCTCCCTTGTGTGTGGAAGCTATGAGCCTTAAGTGGTTCGATCGGACGTT\nGATCAATTAATACGACTGCTTAGCGCGGAAGTGACCTTCATTTAGAAATAGTATGCCACCGCGGGGGGCC\nTGGGAGTGACCCACTTCTTCTATTACGAGGCCGCATGCAGGCCTGCGAATTGTTTTTTGGCTTACAACGC\nCACCCGCGTGCATTAAAAGGTCTAGCTGATTCTTTGGAATTAGCCGCCTACGCGATGGCCAGCCAATCGA\nGCGACAGCAGATTTTTTAATCTTTCAGCTTTGGTGTACTACACCGGCGCCGGTTGGCCTGGCTATACCGG\nAAGGCCTTTATTGAGGGAGGGGGCCGGCCGCTGGAGCCGCTAAATATTTTAGCGCCTCGGGCCTAAATAA\nACGTCCTCATAAGACCCTGAAGGCTTGGTTACCTATGCTACTTATACAATTGTCTACGTCTGGCGGGGCT\nCGCTGTTAGTTACATAACATACATCCAGTTTCTCGGAATTACTATTTTTTTGAGTTGGTCACTAAACTCA\nGGCTCGACACCATCTGAGATTGTCCCATCCGATCAACAATGGAACTGATACTATTCTCCACCGACAATGG\nCGCCGGCGTCCACGTCATGGTACAGTACTAAATTTTTTCGGCTTAACCATACAGCTGTGTGTGGCACAGC\nTCATCCTGGCCGGCATGTGTCGAATTGCTTACAGGACTGTTTACCGGAACCTTAGAGTAGTAAGGTCCGA\nAGATGTACAAGCTTATAGATTGATGGGCACCCTCCAGAAGTCCATCTCACTAAAGCCCCGAGCTCATTAT\nTCATAGCGTTTGACCTCGACATTACAGATAGGCACCTCAAACCAACGGAATAACACGACGAACTCCAGAG\nCACGTGGATAAATATATCGACCAAACATTAATCCGCTGGGATGATGCCGATTCGTCCAGTATCTTTTCCA\nCGTGGATGTTTGTGGGCCGTCGCAGGTGGCAGTTCGCCTTCCGATATTGTACCTGGATTTTAATTCCGCC\nAAGATTCAGGAGCTGCTCTCCACTTTGCAAGCCTGAATTATCTTTGAAGAAACGTTTTTTTATAGGGGGG\nGAGG
GACTCCACCTATGGTAATTTAGCTATCCAACAGTTACAGTCTTAAGGCCATAATCTTCACAATTGG\nGAAGATGAGGCTCCGTCAACTTGGGTCGGCTTGAGGCCGGTTACATCAAAATACTCGAAGCACAGAACGT\nACCGCGCGCAAAATTCACTTGGCAAGTGTTCACCGATTACTAGGAAGGGAGGCTCCGTCCCGCGACATAC\nGAAGGAAATGTTTTAACCGATTGCCGGCGTGCCACTAATGGCCTTTTACGCCTTTCACGCATATTTTACT\nACCCTCACCGTGCGTGATGAAATATCACATGCTGGTGGGCAGAACGGTGCCTCTTCCTCGCGATTTCGGA\nCTTCATCTCTGCACTAAAAGTATATGTATAAGATGCCAGTTGATTCCATACGAATCGGCAACAGCGCGTT\nAGGGAACATTACGTTACTTGAGCCCAGGGCTACGTGAAAGAAAGTTGCCGGGTTTTCCTCGCCGGCAAAT\nCGCGTAGCAACGGGTATGTCCCAAGGCAGAGCTCTAAACTTCAAGGGGTATCCTAGCTATTACAGTACTC\nCGCCTCGACGCCTAGCCCGTCATAGGCTGCCATAAAATTGTGGTAAACTTTCTAGGATGGAGCGAGCCGA\nTGAAAACGATTGCTCCCAACTTTCCAGCCCGGTTCCAAAACTGGCCACTCAGCTAGTTCGTGTGTTGAAT\nCTATTTATGATAAGGTCATCATGATTTAACCCTAGTCCGCGAATAGTAGGGCAACCTGATGCCTGGTGAT\nCGCATCGGCCATGGCATTTTGCCTTCGTGTGGCAAAAGCAAGGAGACAAACGCGTCGCACCGTGCCTGCG\nGCTTGTTTAAGGCGGCGATCACCCGCGCATCTCGTCGAAGGGTTAGCACTAGCGTGCACCGCCCCCGTGA\nGGTTGCAAATGGCTAGCGGCTGCGTTAGCCTACTCATGGGGCGGAAAAAAGGTGACGCCTAGTTATATAA\nACCGGGTAAGCATAGATCGTAAGGGCTGATCGATGGCTTGGACAGCCATGCCCTATCTTGTGATACCCTC\nGCTAAATAGTATCTTCGGCCTATAGCTGGCGAACTCCTGGGACTTCACAACAACCCACGGTTCTAAGACC\nAAGAGTCCGGGGTTCGCATAAGGGGTGGCGGCACGGCTCAAACCCTTCCCATCTGACCCCCGCCCTGGCT\nCAGCAGTCTGCCTAAATGATGTGGAAGCGTTCAACTGGAGGAGAAATCGACATGTCCCATCCGAACACGC\nGCTTATTGGTCCGAGAATCGCACTTTTTGATTCAAGGATGGGCTAGCACATTTTTGTACAAACGGGTTGA\nGGACATCGTTGGACGAATATCCGAGGTTTAACTGACTGTTGTGGGACTTTTGAGAACAGTTATTACGCTG\nAGGTCTCCGTTAAAGCTTTATACCAACAAGGACGATTAGGCTGTAGTTCACCGACTGCCCGCATCAGCAG\nTGACAGCACTCTCTTTGTTCGGACCCTTTTGTGCACACCTGTTGCTGTCGAGGGGCATCTCCGGACAAAG\nCTCTATTGGTCAGAGGGAATGCTTCAGTTTCCAGACGCACGGGTGACTCGTTCCGTTCACATAAAAAGGA\nACTAAGTAACGGCACTAGTCCATACACTGATTTTTGCCGAGTGTATTTGTTTGTCCCAGGTGTGGCTGGT\nCAATGAAGTAGTGCTTTTACTGGTGGAGGTCACCTACCGTGAGCTAAAGAACTACCTCGCGATCAGGAAC\nAGTTTGGCTCCTGCAAATTCCTCCTATCCTAGATGCCACTTGTGAGGCAGCGCACTACCTAACAGCTGCC\nACATGCCGTTGGAGTCCATAGAGGGGCGACAGTCATGTACTGCCTTAAGACTCCAGAGGTGCTGCCCAAA\nCCATATATCTGACTCCACCGGATGCATGTAGAGAGTCCAGGGTTGATCAATACCAGGTCA
GTAACGGGAT\nTACGCGTGAGCCAATGTTAACCGGTTATTGCAACGGCCAATAAAGGGGACCGAGCGGGCCTCTCGATTTC\nCATAGCAATTCATAATTAAACCATGTATTCAACAAATGACGCGTCCATGTATGACGATTATCGTGCGAGT\nCTGATGGCCTCCCAAGAGATGGAGGCACGCGGAAGTTTAGAGCCGTCCCACAGGCGTTGCGGTGCATAGA\nGTTTGCAGTAGTGGCTCCGGGTTAGTGGTTAATGGTTTATAACTTTAGATGTATACCGCCGAGGTAAACA\nATCCGCGGCGTTCTTCATGCCCTAGTCATATCTGCTTATCATGGGATAGATACACACGAGCCGTTTACTA\nCGTCTGGGCCGCGTGCTGGAGAGCGTCTGATTCGACTATGTACCTCGTAACGGCACAACTTGTGAGTATG\nGGCTTTAGGTCTCGGGAGGGCTTTCTCTCACAGGCGTCCCACACAGCCAAGGACATCCGCCTTGGACCGG\nCGCATTTTTATCAGGCGGAGACACAGTAGACGGACCCGTGGACGTCCGTGAATGAGGTTATTAACAGGTT\nTCCTCGTCGTCCCACGACACAGTACAACCCGGTCTGGGGGCCTCAATTAATCAAGACCGATAGCAGAGAG\nATGTGGATGTGCCCAGGAAGCTATTTACCTTAACACTCGCCGTGAAGAAACCAACCCCTCGTTTCGCTAT\nCCCGTATTGATGCTGCATCCCGTATGTTCGTGCCATGCATTCTCTCCGAAACTTACATATACCCGCCACG\nCTATGCAACTTTCCGTACTCTATCGCGGCTAGTTGTAGTTGAATTCTACGATAACTAGAAAGCCGGCATG\nACTTGTGCCGATGTCTTTGCACGTCGGTGGTTGTTGGTCTCAGAATCACTCCACTTGTGAAGGTCTTTCT\nGAAACGTCGCATGTTCCGACTAGCATTCTACACGAACGCGCCTCTCGAGTTTCGGAACGCTGCGCTATGG\nTAGTTGTTTTGGACTTAGGGAATACGCCATCAGTACATAATGCTCACTACATTGTCATGCAGGCTGAGTA\nGCCAGTTTTAATTAACGTATGATTCATATACCACGATTTGCTTCCACATCGGGACCTAACACGATGCCAA\nGTCGAATTATATTAATCCCGACTTCATATGGTCATCCAGGAAATATGCAAAACAGCGAATGATTGCACCA\nTCTACCTGACTGAAAGATGCCCCGACACAAGAAAAGAGGGTTCAGACCAGCCGCAAGTACTTGTGCAGAC\nCCCGCAGTCTCGAATAAGAGGCGTAGGAGGACTAAGTCTCCTTAAACCCCTACAGATTTATCGACCGAGG\nATACCGTGATCACTAGCGCTAAATACACTTTTCTTACCTATACCCTGCGCTCGTTTTCTAGACGAAAGCT\nCCCTCAGTGCGTGCATTCCTTCAATGGAAATCTTGCACCGTGTAGATCGCAGTATCAGAGCAAGTGGTCA\nCTCTGGACGGTAGACGTTCTGGGGTGATCTCTGCTGTTACCTAAGGATAAGCATAATCTAAAGGCGGGTT\nTCGACGTCCTCGACCAGCCTACTATCTCCCTGAGTACGTCGATACGGCTTGATGCTTGTACGTGGCCCCG\nAACCTAAATACGATGCCGTGCCCGGCTCGCCGTTGCAGATCGATCCAGCCAAGCTTGGGGCATATGCTGG\nCTTATAATTGGACCAGCGAAAGGAGGTTACCAGTCTACCCCAGGTCGTTCAGTTTATATAGCAACTCAAA\nACCTCGTGGTCACGGGTGACAAATGATTTAAATGATCTGGCCAATTGCACGCCCCGCTCAAGCCTGACGA\nCTTCCCTCTTCTCCAATCAACGTGACGATGGGAGCGCCTCACCGCTTCACACTGTTGTGACGGTGAACCA\nGCGCCGATGAGAGCGCTAGGCTATAAAGGCCGAGAAGTAGATCG
ATCAGCCAATGCTTCAAGGTATACTG\nGCTACCACATCGTTTGACGCTTTCTGCCTGAATAGGTGACAATTACCGTGGCACGCACGCGACTGACTCG\nGCACTAGTTCGTTAATTATAAGGATCCGAGCGAAATTTTGACGAAAAAACCTTTCTGAGTCGTGTATGTG\nCTTCGAAGACTTCAGCAATTTCGTGCTCGGAATCCTGTTCGGTACTCCTCGGCCGCTCAGACTAACCCGC\nCACAGATTTTTCTTCTTATCTAATCTTTTTGCTCGTTTGGCCAATTCCAATTCCTGAAGCCTCCACTAGA\nAAGTGTGGTAGGAAGCCGCTTGCTCCGGCGATTGCCAGGTCTTCAGCTAGTTGCACGTTTCGTAGCACAT\nCTTTACCTAGCTTCCCTAAGAGATGATTTGTGGGCACGGCCAACGTCGCTTTGCGATCTCGTCCTATTGG\nTCTCCTACCCCCCATAAGGTAAAGTGCTGGTTCGACCCGAACATTTGGTACAATTAAATTTTCCCTGCAA\nAGTTAGACGTATGGACCGGAGCTTCTGTGCACTACCTTGCGTATCAGTCTTGAGTGCGTCTAACACGCCT\nATGAAATAGCAGCCACATGACGTACGAACTCCGCTTGGGAGCTTGTTTTAGCTGCCGCCATCAAAGAAGA\nTGAAAGAAGCTTCAGAGATTTCGTCCACACCGCCCGCTTACGAACCTAAGCAGAGCCGGGAGGTCCGCCA\nCGCTAAGGCTCCTCGGATATCAAGGTACTCCCTCGAGTTAGGCTTGGATGCAGACGGAAATGGGAAGGCA\nCAGTGCCACGGCTTTCGTGCATTGCTTGCCAGCTCAGAGAGAGCAGATTTAACGAGGATTGTAAAGGCCC\nTCAGAGGTTTCTGCGCACTCGGTGCGTACACATTGGTCGTGATCTTGATTAGCCGGAATCCTAGGGGGTG\nCACTCTGTCGTGCGTATGTACTCCCCAGCTCTGTTTGTAAGTATAGTTGGAATCAGCCCATCGATAGTCT\nAACCGAAATCGGGTACCAATAACCAATTCTTGTTGGAGTTGATCGCCGCCTAAAGCCGCACTTTAAATTG\nGCTGTGACCGACATCTAATGATTGGATATGAACATACGGGACCACGAAAATTAGTCGTCTATTTAGCGAA\nCGGACTAACCACGGGAGCTCTGCGAGTGTAGGCCTGTGGAGCCTTGACAGTTGTTCGCACTGTGACATTA\nTCCTCGGTTGGGAATAGACGGGCCGAAACAACCGCTTTCATCGTGGAAGGTTTTGGTCCTTTAACCAAGA\nTCATTTCTGAGCCCACGCAAACAGCTACGGCTAGACGAACCGATGGCGCTGGTCACTATTCACCTCTATA\nAACGGCATCCCGACTTCATTGCTCCTGTGAACTTTGAAATATTATTGGGCGTAAGCTGCTACTGTTCGTC\nCAGACTTTAGGAAACTGTAATCCCCTGTTGGTTACTAGTACTCGCTCCACGAGACCAGAACGTTCCCCCA\nAAACCTTGTGCAAAAAAGGCGCAGTTCCAGAGTTGTGCAATGGTGTATTTGCAACCGGAAAAGCATTATT\nGCATGGTCCTCCCTTCACGGAATCCGCACACCGGCGTTAAGGGACTGACCGCGTCGGTGAGGAAAAATAA\nTTGCATTCGCCGGAGTCATGGGACCTGGGCTGACTTGCGCTTCTACATTGTCGCGAATTTTGTATTGGAT\nGGGTAAACCGATCTTAAGAGACCGCATCAGTCCCATTAAATCTCACCAACGCTCACGCCACACAGTAACG\nTCCTCCCCGCCCGTCTGTCAGTGGCTGGCTCGCAACGTTAAGATTTGTCTTATCTACTTACGTCTTTGAG\nTTTGACGCCATACGTTGGCACTGTTACGTTCTCCAAACCACCTAGACGGGATTATCGGTACATACGCGAA\nCCCCAAGAATAAGCTTGAGATGGCGCAG
TATTTTTACCATGCTCCAAAGCTTATCATTTCGGGCAATCAT\nTTCCTGCAGGGTAGTCCCCGCCTTTCATGGACCGTGGTCCTTCAATATGGCAAGGTCGTCTGGTTAAGCT\nCCATATGAACCTGAGTCGTATCATGACGCACAATAGCACCAATACCTTGCGCGGGAAAGCTTTCCAATGA\nACCCTGACGCAGGTGTTCAAGTGAGTAAATTGTGGGATCTTTCTCCGACGTGTAGTGGATTGCTCTCGGC\nAGTCTGTTTGTGGTACTATGTGCCTAGCTAATGACCTGAGAGGGTTAAGCCTTTGGATCAAGTAACGGAC\nATACGGGGAAGATGTGAAAGAGGGGACTTATGTCACGTACATCTACGCTTCCTCGTCCCTCCAGTAACCG\nATGGCCATGTCGTGAGGCGCGAACCATTTTGCCAAAGATATTAGTTTTCCCCAAGATCAACGTCGTCACC\nTTGGGCGACCAGCGGCGCTCATTATGTCGTTATGTCACTCTAAGCACCCATTTCTGAACCTTTCGCTTAG\nGTAGGACTCTGCTTGAACAGATTTTTCTCTGCGATGCGGTTAGAAACGTCGATTCAAGCTAAAGCAGATT\nCCATGGGATTTCCGCATGCCGATAAGCGACGCGAGGTGGTAGTATTAAAGTGGAGTCGCCAACGTCACTC\nTATCACTATTATTAATGTTCAGTATAACTCCATCGACTTAATGCTCATCCCGTACCAGGAATGCCCGATG\nAAGCGGGCGCTGGATGCGTGTCGCGAGTTCTTAAGCGCCCCAGCCGCCAAAACTCGATACTCCGTAGGGT\nAGACCTAGTATATTGGTAATCTCCCTCTGATGTTAGTTGTTTTCATGATGAATTCCCTCCGCGCCATATC\nAGGGGGGTAATCTCTCCAGCTACACCCCCCCTATCAATTAATTGTTTCAGCACCGATATGATCAAGTTTG\nGCATTTTCATTCTAGTTTCACGCCCCTCTATGGCTCACAGGCAGATTTCGAACGTCGACGTTTCCTCTTG\nAACTTGACTCATGCTTGTCGTGGGCCTATTAGAATCAGACAAGCAGATATGTCAATGACCCTACCGAACT\nATCCCGGCTTATCTTCGACATACTGCGCTGGTTCTTCTACGAACTGGGGCCCATGTTAATCTGTGCTCGG\nTCACTCCGGCCCCTCCACCGGAGCCCTTACGAGCCGCGGATAGTGCTCGTAGTAACTACTATGTAACAAT\nAGGAACGTGTAGTATCTAGTGGGCCGGCTCCCAATCAGTTGAATTGTTAAACCCAGACCACTTAAGATCC\nTCCAATTCCCCGTATATATTCAGTGCACCGATGCCGGATTCCCTCTATTGATAGTTTTACTTTGGCCAGT\nACGGCATAATGTTTAAAGCCTACGATGAAAGCCACTATTAGAACCTTTAGGCTCGGAATTCATCAAAGGG\nGCTTTCTAACAGCATGAACTGAAATTAAATTCTCGTTCATGCCCGCCCGAGGCATCCCCTGTGTTTCCAT\nGGCCGTGTAAAAAACACTGTGTGTCCTCCGCATCAGATCTGGTTCCCGGGTCATGAACATCGGGGGGCTC\nGCTGCCTTCGGACTTAATTCCCAGCTTTTTCAAGCTGACTTAACAGCAATAAAAGACTTAAGATCATAAT\nCTAAAAGGCCGATTCAGTGCTGAGCCCTATTCACCTCAACAAAAATCGATCTTTGGTGGCATCACGAGAA\nGCCGTACTAAAACCGACGTACCGACGTGTGGCAGTTCGCGTATGGAATTTCCAACTGTATTACCGTCAAG\nCTTTACAATTCTGCGCCACGTCTTGACTCTTACTACAACGTCTGTTCATGCGTAAAAACCACGCCCGCAC\nGTCTTACCGAATGTAAACTGACGCCGTCGGGTGGTCCGTATCATAAGGTTCCAGCACATTCGACGACTAG\nTTTATGTTCGGT
TTCGATATTAGCCCACAGGCAAAGCTGCAGTGATTATGCGAGGATACAGGTCCCATAC\nAGTGTAGACGCGTTACGTCTGCATAATATGGTTATGCTATGCAGCTTGAGGATTTAGGTTATTCGCTCTA\nTGTCCGCGCATATTCCAAGACTTATCGCGGAGCGGTAATCTTGCTTAGTAGGGAACCAGACCTTCCTTGG\nCTTAGAGCAGTATAGTAACGTGTCCGGGTACAGTGTTCAAGTAGGCGTAGGTATAGCACTGTAGACCCGA\nCTCTCTCTCTCGAGCTTAAGCACAGCTACGAAAACGCAACCAACTAACTGAATTTGCCGGGATCTCAGTC\nGTCCCGGCTAAAGACTAGGAATTTCGTGTCGGTTAACGTATGTTTGTTCCTATGCGGAGTTGCGATTAGA\nAAAGTATAAGGCGTTAAAATAGTTATTGGGTGCTAGGGGAAATGACCATTCGCGGTGATCGAAACGCATT\nGCTGGTCCTTTTCTTGGGAGACACCTCGTCAACCTTAACAATGTCCTGGGGTTGGACCATACTCGCAGAA\nCGGATAGGTGTAGGCAGCCATCAGTGTTTGTCGACCCAAGCGATCCGGCATCCGTAGTCAGATGAGATAG\nACTACTCGTTCCGATAATCACGCTAAACACTTTTCAGGATCTAACATTCTAGACACCCCGAAGCATTCGT\nTTTTCGTAAGGTATAACTAATACCTTCCTACCTTCGTTGTCGAGAGACTAGAGGAAAGCCGCGTGGGGGT\nTATTTAGCCCTGCCCCACATTTCGTCGCCTGGACTATTCAAAAAGTCATAAAAGACCTGACTACTTCGAG\nTTAGCGATGAGAAACCCGCCGAGTCGGGAGGTAGGGACCGGATCCCCGGTCCACCCCGAGGACAATACAA\nGGCGTGTATGTGATCCATCTCCCTGCTCCGAGCGGGCACGAGCCCCTTGGTCGTTCCGAGCGTCCGCCGC\nTTAGAATAGCCCGCAATCTCGTGAATCGACCGCGGGCCGGCCTCGTGCCTCGGGTTAGGTATGGCGCGTT\nGCAAACAAACGGCGTCGGGGGTCGATTTCATTGTGTTACAAGCTAATAACAAGATCAGTAACGACTTCAG\nAAAAAGCTGCCTAGAGTAACCGAACATGCTGGACCTTATCAACTGCAGGATAGGTCGGTTCTTAATGCAA\nCTCGAGCATTCGCACGCTGCGCAACACTCGTGGTCAGACCTCGCTATCCGAGAATTTTTTAAAATTCACG\nTCCGATCTTGGACTCCTCTAGATGGCTGCTTAGGGGACTAGTATAAGGAAGAAGGTCAGTGCTGATCTGG\nCAACCAGCACATCGCCTTTTTGTCGGAGAGAGCACGGCCATCCAGTCCATCTTGCAAGGTAATGGACAGC\nCTGTAGGATAGAGGCCCCCTTCTTTACACGTTCACAGGGCTGGTAAGGCTCACTTCATAGGCCGACTTAC\nACACGCGTAGCTCGGTATGCCGATATCATTATCACACGCAAGTAAGTGAACAACCCCTTACTATTATGAC\nCTTAGATTACCTCCGAGCGGATCGGTCGAATTCGTATTGGTGGTAATAATCTCCTGGTCAGCAAGAGCAT\nGTCGAACCAGATATGGGCAAGCATGTGTAGAATCCACAGGGCGCTATCCGTCACAAACATTCTCGTGAAG\nATAGAGCGTAGTAATGCTGCTAACATCTCTCTTTGAGACTAGGTTCTGGAGTCGTGAATATATTCGTTTC\nAAAGTATGGTCACGCTTTTGGACTCCCGCGTCCCCCGGGTAAGCAGCACGCATAAGCTGAGTGCATGATA\nGGTAGGAGCCAAAAGATAGATAGCCCCAATAGGGGGGTGCCATTATAAGGGTCCATCTTATCTCGACTTT\nGTCTTTGTTCCAGATACCAACTCACGAACGGGAGCAAGTACACAGAGGCAGTGCGGAGACCTGCGATA
GA\nCCCTAAGTGTAACGGAGTGGCGACTTTTTAATCCGCATTGAACGGATGCATCAGCTGATCCTATGAAGCA\nGCCAAATGCGCTATAATAAAGCGACCGCTCGCGAATGCGATAGAGCCCAAGGGATGAGATCCAGTCATGT\nGCAACTATTGTGCGTTTAGCCCATTAGGTGGTGCATGGGCGATACATACCCTTGAAACCCCACCGCGGAT\nCGCTATCGGGGCAGAAGCGCTAGTTACGGCACTATGCCCCATTGTAAGTTGGCTGACAGAAACTTTAGCT\nTTAAGATGATCGAGTCAGGAAACATAGCGGTTACGCACAACCAGCATACCGCGTAAGGTTCCTTCAGCTT\nTATGACCGAATTTTTTTCCGCCTTCAAAAGAGCTCGCTGGGTGTATGCGGCCGATAGCATGGCTAATTTA\nCTTCCAGAATCTGCGTCGACGGGTTTGAGCAATGTTGACACTTAGGCTCCGCTTTTGATGAACTGTTGCT\nGGGAAGTATAGATACTGCTGCGCTTCATCCAGCCTAAGGGCCACTCTACGCCATATCTAGGGGGAGGCGC\nTAAGTCACCTATTGAGTCTACTGGATCTTAGCAAAGTCGCGGTGATATTCAAAGTTGATGCTGGTTCAAC\nCCAATTCCTACTGAAAGGAAGCTCCCTGTCGAGAATGTACTCCCCTCTTCTGTTATCGCGACGAGTAGGT\nGCCTGCTAATGGACCGCCTTAATCGTCATTATTCACTCCACTTGTCGTCTGGATTAGGTAGACGACTTCC\nTAGCTGAAAGCAACCTGTCCTCGGGGGAGACTGTCTGCTTTAAATTGGGCGTGTACGCCGTTAACGTCCA\nAACGACGCCCTTACGCACACCGCAGGACAATATCGCGGCCTTAAGTGTCAACTAGTGCCGTTTCAAACCT\nAAGTAAGCTCTTCCCCATATGCGTCCCCAGGGATCGATCTCAGGCTCAATCGACGCTGTATGCGGGAGCC\nTATGATGCCCATCTGTACGTATACAATAGATTGTGCCGCAAGCTCACGGGATTAGAAAGTTAGACGTTAG\nCCAATATACGCTGATTATTAAGTGGATAACGAAAATATTGGCGTTAACATCAGCAAAGTTGGACCTCATC\nAGGTTAGCCACCAGAATGGCAAAGCCAACGTCGAGGACCCGATGATCTAAGTAGATGTCTCCTAGGACCT\nCCGCCGCAGATGAACTTAGTGGATTCCAGATTCGCTAGCCAGAGCATCATCTGTCAATTATTCACTGTAT\nGAGTAGGCGTTGGCCTAACAGTACTCTTTCCGGGGGCGCCGGGGCGACGCAGCATTAGGCGCTGGCAAAA\nAATCGATACCAAGATATTCATGGCGCCTTTTTCAGCGATATTTCCTGGTGAGGAACTACCTCACACGGAC\nTGCCGATTGTGACCAGCAGTGAGTCATGTGACACTGCTAGGTAGAGGTCGAATTAAAGATCATGTTTGAC\nCTCCGCATTCCGTAGTATTTAAGTTGCACATCGGGTTACAATATTACGTATTGACTACATGTTCTAAGGT\nTGTCAGAACGATGCATCCCCGGCTCATCGTACGCACAAGAGTACTGTTGTCGGGATTCGCCGGGGGAACG\nGGTGTACCGTAAGCATACCTGCAAATATATGCCCATAACTCCTGTATTGCCGACGGGGAAGTAGCAAGAC\nCGCCTATGGCAACGTTTAATTAGCATTTCTCTTTCTAAGTATTGAGACGCCGCAGCCAATGTTGCACTTG\nGTAGGATAACCCACCTGTTTTCCTAAGTTTCGTTGTATTACTGATCGGTTGTATATAGTCACAGGCGAAC\nCAGTCGAGTGTGAATCGGACGTTCACACCAACCTATGCTGAACTCCGCGGACTGCGGTAGGGTCTTCCTC\nCGCGCGGTTGCCGCTATCTGGGGAGTTTCCTCTGCGCGGGGTTCGATGACAC
CCATTATAAAACTTCTTA\nAGCTATTGCATCCAACGAACCCCAGATTACCTCGCTAGAAACTGAGATCGCCTCGTAGGCGGTATTGTGC\nCTTTATTCTAACAGTCGCCTTATACCAGTTGAACGGCAATATCGACTTTTTAGGCAAGGAGCATCGGCAC\nCCAAAGGTCCGTCCCCACGATGGGCATTGCCCATAAAGTCTTTACGCGCCGATGCCTGGTCCAGTTCATT\nATCATAATAGTTAGAATCAGATGCCTTTCTCGGGGAATGTGAATAAACAAGTTTAAAGTGCTGAGAAGGA\nGTTCAATATCCGAGCGACACAATCTTGGGGGGCTGCCCATCTAGCTTGACAGGCCGAATTGTGAACTAAC\nAGAATGAGCTTGCCAGGAGCCGACGTCAACTTGTGCTAATGGGGGGACCGCTATGAATACTCGAGGGTTC\nGCCGGATGCAGGTAGTTCGTATTTTCACCCGTACCTCACCTGGGGGTTCTTTAGCTCTGGTTCAGAGCCA\nGGTCCAGATTGCCAACCCTCTAACTACATCCTATCCACTCCACAACAGCATTAACGGGGCTTGAGTTGTT\nCGCGGAGTAAGATGATGCCCGATTGGCGACATTCCTGGCCAATTAACTCTTTCTGTGCTTGCATCCAGTC\nTCTGCTGTCATCTTAAGACGATCCATAATAATGGCACGCGAGGTTGGATGGTGGGTTCGATGTCCTCCTG\nCCGATCTGGCAGTGTGAATGCTGTTGTCTTATCCTAACGTCTATTCTATATAACAGCATCCGCAGAGTCT\nAAGAACACGATAATGGCTGCTTTTACAATTTGGGTACGTAGTTAGAGGTAAGCTTCAGCTGTTTCTTGTA\nTACGAAACATGGAAAAGGCCCCAGATGACTAACCTGTCCCGGGACCTTAAAATTAAGCACACACCCAGGG\nATATACCCCAGCTATCAATTAAAGGCCACAATGTATTGAAAATTCTTCAACAACAACGCGTATCCGCAGA\nTTATTTGCATAGAAAACTGGGGCTCTTACGGTTACCCCCGCCAGAGGCATTTCCCACATCGACATATCAA\nCACAACCCAATGTAATCCAATACCCTTAACCACTCTTCGAAAAGTTCAATTGAGTGTATGGCATCCGAAA\nGAGACACAGTTTAGACAATTGCCATCAGGTGACCACTCTGGCATGAAGCAACTAGATTCGCTCCCAACAC\nCCCCCTAATCCCATAGAGTGCCCATCGCCTCCCTCTCTGACGTAGGCCCCGACCCTGTCGCGCATGGGGG\nAACGCGGTCGTAGTGACATTCTAAGTGTCCTACCAGCCGATGTCCTAAGCGCAGACCAACTGGAAGTATT\nTACTTCTGTCGCGCCTCGTCGGGCTGATCCACAGGTGCCGTATACCGCAATTTCTAAAGACATTTACGAG\nTACCTCGATAGGCATCCTGCGACCTTTTTGGACCGCTACTCGCGTTTATACTACCTCTAGTATAGAGGTC\nTCACCCACTGAAACCTGTTCCGGCTTCCGTGACGGTTAGCCTCAGACCACGGTATAGATAGAGAGTTCGG\nTTTTGATCAGGCGGCCTAGAACATTATAAGGCATTTAGGTTGATAATAAAAGCACTGCAGCCCATCGCCG\nCGCCACAATCGCGGCAACGGTCAGAAGTACCTGCCGCGGAAAGTCAGGTGGGTTAGTTAGCAATGTTTGC\nGAGTTAATATCTTGAAGATAGCTTGTGGTGTCTGTAGTATATCTACGATGTGTAAGTCACGTTATATCGC\nAGAGAGCTTGAGATATCCAGCACTAGGGCCTCCCCGCTTAGAGCACAACTGGCCATCTGGCCGTTCCCCT\nGGCAGAAAGCAACTCGTCATTAAAACAACGTTATCCATGGAAGCGGATACGGATTCCAACAGACCTCGAT\nGTGTAGAGTACACCAACCCGCATGATTACACTTGCG
ATGTCGCATGTTTTTCCAGGGGCCAACGCAGGCA\nGAAACGAGCCGTGAGCTGAAGTAGCCATTTAGAATCAGGGATATCAACTTTCGGAACCGGCCAAGCCTGT\nAAATCATCCTAGGTTCCAAGCCGGTAGTCAATATCAAAGTCGATTATGCTTCGGAAGCAAACCTCTGATA\nGCCTAATGCCCTTCCACGGTCGATCGTCGAACACTACACGTAGGATACAAGGCACTCGTGACTTTTGTGG\nGAAACGTACGGTTTATTTCTATGATTGGACAGCACCGAGGCGGGTTACGGCTACCACGCTGGTCGCCCAT\nGTTTAGTTCAGGCTGCCAGGTGTGAAGTTGCCTACAAAGTCGGCGAAACCCCTACTATAGACTGATGTAA\nGTTCCCTTCCGCGATGTGCGACCCACTCTGGTCTATTTAAAGAGTCATGAGGCATGAAATCACTTATACA\nAGCCATAGCTTTGCATGAAGTACTTGCCCGGGATCGTCTGGTCCTCACTCCCAATTATGGCCCGTATATG\nGTAGTCGACGACCTGGCCCTCCGTGATCTATATTCATATAGCTCCAGCAGAGGCACCTGTAAATTCCCTC\nGACGGGATGGAATCCTCTCGCTAAACCAAGGCAATCTCGGACCGCACTGCATGTACCCAGGATAAACATG\nCGCGTGGGTGTGGGCACTGTCCGAATGCAGGCAGCCGGTTTACAAGCAGATTGAGGGCAAGCAGTGTCGG\nAGTGCGCAAATGCGAAATTACATAGGCAGTAGTATCCGTCGGTGAGACTACGTTCCTTGACTGTAGGCGT\nTCGCCCTTGTGTAAATAGGTCCCAAAATGGGCGCTGTATAGAACCAATAACCATTGGTTCTCAAGCATGC\nGATCACCCTGTTTATCGTAAGCGATGTGTGCGTTCTGTCAATGGGCATATAACCTCATTTCACAATTTCA\nACGGGTTTCTTCCATACGTGGGATCACGTGCGCTTACGGCGGCCGCTCACGCCCCTACCAGGAGCTGGTG\nTTTAAGACCCTCTTGGATTCATTGGTCCGGGTTCCTACCCCGCGCTATACTCAGAACGGGATATATCGCG\nCGGACTTTAACGCCATGAGACGTCAAGGTCGACCATACGAATCAAATATTCTACCTATCAAGTTTCTGAA\nCCCGAGCTATACTCTTACATATGGCGACTCCATTAAAGATGTGACCTCGGGATTGATGGTGTTCACTAGA\nTGGGATCAAAGGAGCTGGCCAGGCATAGTGGGCCAACTGACATCCTTCTCTTGAGTCGCCATAAGGTATG\nGACCGGTACGCTTTGCCATCTACGCAATTCAGCATCTGTGGCAGCAAACTAGCGAGGTATTTTTTCACTC\nTCTTGTGATGCCTTTATCCACGCACGTCGTTACTGTACGCGTGGCCATGAAAGCGGGGATAGATTAGGAA\nCTACTCAGTGTAGAGGAGGTTTTGTGGAAAACCAGCATAGTCGAAGTATCTGGATATCAGTCATGATTGC\nGCGAACTTCGGCGGCCTCATTATTCCAGGAACCTATTATGGGTGGCGACGAGGATAATTCTTACACAATT\nTAGTGGATTCGCGTCCAAATCAGCTCAGTGTCGTACGGTATGAATCCAACTACTGACAATGGTGGAGGTC\nAGTAAATAATGAACAGCCCCAAGCGTTAATCTACATAAGTTCTACCGATTATTGGTGGACCAATAACAAT\nCCGATTCAGGCCGTAGCCTGGTCCAGCATGTTTCGTATTCGAGACTGTTACGTTGGCCTTTGGCCAACAC\nACCTGTCTTCGAGATCGGAAGCACAACGAGAAAACCCCTTGCAGTTCATCGATATGCGGAATGGCCATAA\nGCAGTCGGGCGGTTCGTGAGGAAGTAGCCTTAGTTATATTTCTCCTGCGTGAGGGTTAATGGCTCTCCAT\nTTTCCCTCTTAAAAGGGGAC
CCTTGACCATAGTTAACCAACCACATGCATATTCGACTAATCGTACGGTC\nCTTCGAACCAAGGAGGCTCGCCTTCTCGAGTACGCCCTCAACCCGTTGGGAGTTCCCGCCAAATCAGGGT\nATTCTACTTAGTAAGGCTCCTGCGGACTGACCTAAGTCTCGCGAGTTACGAATGGACGGTCTGCGGAAAT\nTAGTTGGCAGCCAGTGCCGGATATGCAGATCACGGCTCCAAAATGGAGCCCTCGTGGTGAACCCCTTGTC\nATTAGTTCCCTGGCCGGACGTATACATCGCCATGGACCTAGTTCACCGAGGTGAACTGTACAATGCTGGA\nTGTTATAGGGCCTAAAACAAATTGGTCGCCTTCTATTGCTCAGACGACCAGGGTTGGTATAACTTCGCCT\nGGGATAGAGATCTTAGTCATGGGCCGATCGTCCACGAACGTAGCCCTGGAAGGAGTGCGGCCATACGGGC\nATATAAACACCTGGGAGCTAGGAATCCTGGGCTACTCAATTATATTTACGGCGTCAGCTAGAACAATACT\nTGATGATAGCTTGGTTGCCGGCTCGACGAACTATGGGTGGCTACCCCTTGAGACCGATTAATGTCACTAT\nGACAGATAGTGGCAAGTTCTGAAGGAAGTGAATTCATCTACGCTTTACATCAAGCACGGATATACCCCTC\nCTCACGCGTCAGCAGGGGGCTTCCGTACACCGATGTCTGACACTATCGGGGCTGGCAATTTTACATAGGG\nTGCCATACGCGGAATCAGAACCTCTGAAACTACTTGCGCTCTGGGCCTCGGTCTTAGGCCAGCGAATAAG\nGACGGTATTTTATTAGGGTGACCCGACACGATGCCGGTCTTTAAGTAGATTCACGGTTCACGGAAGGGCA\nATCCACCTTTCGTAGCCCGCCGCACAATGCTCACTGGAAGCATAACCCCTCGCCTGGAAAGACCTCTGCG\nATGCTCCCTTTCCAAATCACTGTTCAGCTCCGAATTATACGGCTCCAGAACCATTTTACGTACTCCGCGA\nACTCTTTCCGACTATGGGGGGCTAGCGACTGCACTCGGCCAAAATATGCAATGAGAGTTTGATCTGAAGG\nTATCGGCAAAGGGTCTGTGAGCCCTCCACTCCACAACATGTCAGAAGTATCCTGTTGGGCACCGCAAAGA\nAATACGATATCCCTGTATACATGACCACCTTATGCCTCCGTCTCCAGTGCTACTGACAGGATTCCAGCCC\nTTTTCTTCTTATATTTCAACATAGGAATTGAAGTTAGCACCCCGGCCGATTAGCCCACCCTTAGGCGTGG\nAGTTCTGATGAGGAGTACCCTTAACCAGGACCAAGCGTTTGAAGCCGCTCATCCTTATCGAGAAACTACG\nATCGCACCTATCGTCAGATGGATAATAGAGTCCTACGGCTTTAGTAATCGCCTACGCTGGGAGCGTCAAT\nGGGAAATCTCGAACGTATCTGACGATTTTGGGAAAATTTTAAGATCTAGGCCGCCCGATACATACCTTAC\nCTTACCTTTCGCTTGGTTAGGAACAATTCACGATAAGAATCTATTGTCTATTGTCACACGTTGTCGTGCC\nCAGGTCACGTAATCAATTAAACGCACAATGCCAGTCCCCCCAATCTGGCGGGTCCGTATTCCCAATGCAG\nCGATTGGAACCTATCTCAGCCTAACACAATACATCTAACTCATCCGCATAAAAGTGGCTCCTGTGATGAG\nCTCTCTGTGTAAACCAAGATACCAGAAGTGACGAATGGAACATGATGATAGCGCCTTTCCACTGGGAAAG\nCTCATCTACGGGTTCTTGGGGCTGAGCAGTATGGTTCTAAGTAGGGGGCGCCCGTCAGATCTGTACCGAG\nTACATTAAAGGGCAAGAACAGATCCGTTCCAAGCTGCGCTCCCTTTTCGTTCGCGGGGTTAGAACCAGAG\nTACA
GTAGAGGGGCTCCCAGAGGCCGACCTCGGCCTTGCCGCCATTCCCATCGAGAATCAGAGTAGACTG\nCACCCCAGCCTTCATCCAGGCTACAGAGTTATGTGGGCCGGGACGCCGCTGCCAAGACTCTATTTCATAG\nCTCCGGCGATCGACGCGATCGCCGCGCTGAATAGCGGAATTCCTGCAGTGTTCCTCGGCGGTTCCTCCGA\nTTGAAACTCAGTAGCCATTGATCCTGAACATATATAGCAGTCTCGGTCATTAATGCTACTGATTACGACA\nGCTCGTTTAGCTATAGGAAGAGCAAAGATAGATAACATAAATGTGTTCCCCACACATCTGGAACACGGGG\nAAATCCGTGATATGCGAGTCTCATCTAGAGGGGCGTCGTAGCCCGTGCATCCATTGCCCCAACACTCAGA\nAGGCCACTCCGTCAGTGCACCTACCTACTATGTTGGAAGGTATATACTCGTCAGCTAGGTGTAGTTATTT\nGGTTCGGTGCCTCCTCACAAAGCAGCATACGTATTTACCGATGTAGTAGGCCACGGACCAACGCACAAAT\nTGCATAGCCGAAGTCATTTATGGCCAAGCAGGAGTTTCAGTCCGGTGCCTCAAGGACGTAGAGAACATTG\nGACAACGGCCTCGCAAGCAGCATCGTGCCAACAATACACGTGCGAGCTGCAAAGGAGAAGGAGACTCTAT\nCAAAGAAGCCACACCTTTCTCAGCGCCTTAGCGAGCGAACTTGCAGACGATTATCAGTTGCAGGACCTGA\nCAAGATTTTGACCCCCGACTCGGCATGGTACGTAGTGCTTGGCAGATTAGACGCTTGGACTTTACATGTG\nAGTATGAGAAAGCCGAGGTGGGACGGCGCGACTTTATGGAATGACGAGCAACGCATGCGCCCCCTCGAGG\nGCTCTCGACGCCCCGATTAATAATGGACAACGCCCTGGGCTAATCTCCTTTAGAGGTTACCACATCACCC\nGTTTCAGCCCAATAGGCGGAAGCAGTCCACGATGGAGGCGAGGGCTTATTGAGCATCGCTGCGTTGATCA\nTGTATGCACGGCCATTTTTAGATTATTCCAGTATAGTTATCCGAGTCATATTTAGCATCAAGCAATTCAA\nCGGCCTGAGGAGAGATGATTTGAGTCATGTAAAAACGTAACCTAATTTCTCTCCTACGTACGTGAGAGCG\nTTTCGAAGCTCACATAGCGGTAATTCTCGGTACGAGTTCAGGCGTGGGGGAGTACCCGATATACATTAAA\nAAATTGTCACATTAGCAAGGATGCGCCTTCGGGGCTGTTGGTGCCTTGAGACGATCGAGCGCGTGAAGCA\nAAGACGCTCACCTACTTAGATTTATCGCCCCCTCTCCTGATTGTGGTACCGCCTTCGGTCTGCTCAGCGA\nGAAATCGCCGGTCCGCGACATTCCCATTGGGAGCAGTTGGGGAACCTGGCACAAAAGTATGTTTTACGAC\nAGCTCATGGAATCATCGCTCCAGGTACCGATGCCCGGCAATGGCGTCACTGTCCAACAGTACCAACTGAG\nTTTTAGACCCCTGGTACTGTCTCTCCGGAAATTGTGGTCAGACTTCACCTTGAACGAGTTTGGCTGATTC\nTGGACGTGTTGTCCCATCCGGCTTACACACGATCTATTTACTGTTTGGCTATTGGATCCTATCAGGGTCC\nAGTCACTTATATCTAATTATTGACGATTAACTTTCTGGACTCATAAGGGATGGAAGGAGCGTAGCACAGT\nGTAAAGTACGTTAATAGAGAACGGTAAGTAACGTTTAACTGTCTGTTCTTGCCTCCGTGCACCGCTACAC\nCTCATATATATCCGCCACGAGCCGGATTCAACACATCCAGTCGGGGTTGTTTGGTGTGTTCGCGCCAGTC\nACTAGTTTTACACGCAGATGTTGGTCATAATGCTTCTATCCCCGTAGCATCGTCCAGTAC
GAAAACCCCA\nGGAACCCAACTACAGCCTGTGTGAGTTGTGTGCAGTTAAATGGAATTCTCTACTCAATGAGGCGACATTA\nGTCTCTTGGCCCAGGTCCGTCAGAGTAAGTCTCCAAGACGCTCCAACCGTGGATTGCATGAGGCCGTTTA\nAGGCTTCGCAACTCTTGTAATTCATCGAGGAACATAGCTATTTCCACGAGAGTCCGTTGTAATCACGCAA\nTTACTAGATGTCACTAATGCCCTCCCAAGTTGGGTCACTTGAGGCAGCACTCCGCAGCGCACAATTCTGC\nGTTTTGCAATTTAGGCGCCCTTTTTAGCACACCAGAGAGCTTTATCATATGAGAGATAGGGAGGGGCACC\nGCTTGGATCGCATACAGTTGACGCACACAATATTCCCAGTGAGGGAGGTAATACTCCGGGATTAGGAGCA\nATTACCGAGTGAGCACATAGGAATCGTGGCCATGATTTAAGATGCCGCATGAACGTACAAGAGGATGGAG\nGGAAAACCGCAGACGGCCAAGAAGATCGTCGTATTACATACTGTCTTACTTATATATAAGACAGGACGAC\nCCTTAAATTCCGCTTCTGTTCATTTCAACGCCTGCGTCCGGACCTCTACACTTTACGAATCCCCTACGAG\nGTCCCCAAATACAGCATGAGATCAAAAAGTCCAATCGTACGCATAACTACATGGGTATCGTCTGCTAGAA\nATGCCTGATGAGGAGGTCGCCGGGATGGAACTAGGAGCGATGGCGTCGACCGTAGGTTGATGTTGTATTT\nTATCCACCCCCATGTATATTATCCTTTGCCAACTTGAGCGAAGGAGTGGGCAGTTTCCTGGTTTTCGCAT\nCTGTCATTTAGCGGAGAAAGTTCCAACGTGTCGGCACTATTGTAGCAGGGCTCGTTGGTAAAGTAATCGG\nTAGAGCCCGTTGTGTCGATGTCACGAAAAGTCTCTAGGAACTTGCAGCGCCTGGACTGCTAAGCCGACGA\nTTGCTTAAATCACTACCTATTCTCCCTGAAGTTTGCTTTCTCGCTTGGTACGTTGACAAACCGATTTCCG\nGCGCGGATATGCAGCTTACCGCTTGTCCGCCCTGTCTTACAATTAAGAATCCACTCTGCGATTCGTTGGG\nTCATGCGACCCACTGATCTATGTCTGAACAACACCAGCTTCGCTGACTGAATCGTAAAGAGTTATTTTAG\nCCCTTAATGGCACCTATTGCGCCCACAAAACTTAACCCTGCCCAATATCGGCTGTGAACTCCATGTCTTC\nTCTCCGTGTTCGACTGTAGGATTTTGTACCTCGCTGCAAGACGCGTTAGGCCTATAGACCGACGTGTGTG\nCAGTCACCACTCTCATAACTGTTAACTGTCCATATCTGGAACCGAGATTTCAGTCGATATGCGCGCTTGG\nCTCTATCCAATCTCTGACTCACTTACATCTGTTAACGATAAACAGATAACATGCTGAGGAGTCATATCTT\nCTAGGCGCATCAATACAGTGGGGAGTTACCTGCTACGGGTGGGTAGGGACTCTAAACTCCCTCTCACCAA\nTGACGGGAAAACCTAGGATTACACACGCCAGTTACTAGTAAACACATCACCCCTAGGCTTACACATATAC\nCCAAGTGGTTTAAGCCGGATGGTACAAGTATCGGGGAAATAATATTGTCAAACTGCTGTTTGGCTGTTCT\nGGCTCATTAGGCTTTGCGGATTCACATAAGAGTCTGAGACACTGAAGGAGTTCTCGTTCGGAAAACCTAA\nGCTTCGAGAAACGCTCGCTCTTATAAAAGCCGCTGGCCTAACTTAATCCACTCTCGCGTTATGATGGTAG\nTGGAAGGTTCGCCTGTTGCAGCGCACGTTTTCAAGTTTTCAGCACGTTCTTCCCCCCGATACACCGGCGC\nTCGCGTAGGGAAACTTTGCTTGGGTTAAAGCGACAAATTAGATT
TCCAAATTACTAGACAGGAGCGGACC\nATTATGTGTCTTGCGGCGTTGATAACCGAGCCCAAACCTGGACTGCCAAACCGATGAGCAGTTGGCCACC\nGAATATAGCCGGTGGACCGCATAACTAGCATTATGAGTCTGAGGGTGGTACACGTCGTTCAGTACCTCGA\nCCAGTAAATTTTCCTAACACCGGGGCCGAGTACGGCGTCGGTCTGGACAGTCACGTGTTAGCCCTCTCAG\nTCAGCTTCTCCGGCCTACAGAATGACTTGCCGGGCAATTTCTTCAGGTACAACATACTAGTCATACTAAC\nCTCGGGGAAGGGGGAAGAAGTAGAACGGACTGGATACGCAGAAGAACGTCCTCGCGTTTCATGCACACTC\nCTCGACGAACATCTTCGCGCCGCGCCCTGAGCTTATACTATGTGTGCTAGGCACTGTGCCTAGTGTCACG\nGGCTGTATGTGACAATTTACTACAGGCCCTGAATTGCAGGTTCTCTACACCGGGTCGCGAGACGCAGTGG\nCGCGACCACTCGGTCGTCTCATGTAGGCACGGACCTATTGTCACGCGAAATCTTAACGGCGGGAGGAGAG\nGATTGCATACCAGTCCCACTTTAGGTTCCAGCTATCGGTATTTCGCACCAACCCTTAGGAACGTCTTGCG\nTTTCCAACTCACCTTGTCGTCTAGTGCTACCCCAGAGTCTCTATGGCGGATGACAATGCGTCTTTGAAAT\nAGTTCGTGGAGCCCATTGTGGTACGCATTCACCTTCGACAGATGTTAGACCGTGTCAGTACATCCTTTTT\nTGAAGCATCCGCTTCACGAGGACAATCCGGCCGTTGGTAACAAAGGCCCCATCCGTACCGTGAGAACCTT\nAGGTGCTCTGCTTTTATGCAGTGCAGCCGTGGCCGAGCCGAGAGGGTGTACTTGGATCAACGCTAACATA\nCCATGACAGCTGTTAGAAACTCGAGCGTGGTCCGCTCGTGAGTAGTGAAGCCCTTATGCAGCGATCGCTA\nTTGTGAGCGCACAAGAGATACCGCGTAAGTCGCTTGCCCCAAATCCGGACAAATAAAGGCCCCTTGCCTA\nCGCAGGGTCTAATCCAAGGGATCCCTTCCTATCCTTGAAGTATGTTACCCGGTAGTACATGCTACGGTGG\nCCACTCTTTGAGCTGTGTACGGTGAGCCCGGCTAACATTACTAATCGGGAGGGTAAATAGAAACCAGGCC\nAAGGTTGACCCACAGTATCGGACGTACTTCAGGCGGCCCTCCTATTGCTTTAGTGGCGTGTCAAGTTTGA\nGTACTATTGTCTCGTGGGATGCGGGATTAGGCGGAATGTTATGGCTTAGCAAATGTTAGACATCTAAGTG\nACTACATTCCTATACGAACGTGGGGGACGGGGCAAGCAAGAAATGGCCATATATCTTTAATCTCGTGTAT\nGCCCAGAGATACCGTACGCATACATTCTGCACGACGGCTCTCGTTCATGTGCAAGAGTCGTTAGCTAACC\nCCAAAGCGGGAGGGATTTATATACGCATTCCGTTGATTCGCAACAAGAGCCCGTGAGTCGATCTATGACT\nTTTACTAGGTCTCTGATGCCCGCGTACAGCCCCCGGAGACAAGTCGCATCACATCGTAGGCCCCGCAAAA\nCAGCCAAGACAGTCATGACAACGGTACCCTTGTAGATGAGGACCCGAGCGTGGGATTAAACTAGCTGTTG\nCCGGACGCGAACGAGCTTCTCAATCTGACGGGCCACCTTGAGCAGCAGTCTTATCACAGCGGAATAGGGA\nACATGTATTAATCTGGCCTCCCATACGCCATCAGGCGGACGGCAATCATTGACTAGCAGCGCATCTTCGC\nGTGCCTGTCCGTGCACTCGAGCATGACGCACTCTGTTTACTCATCACTTTCCACAGTTTCCAGCATGGAG\nAGAGGTCATGGACAAGCGTCGGTCTGAA
TCAGACGTAAGGCCTCGCGTCTGCAGGTGTATCTCCCCCCGA\nGGTTCGGACAACTGTCCATATCGCACCGGAATCAAAAAATGCTTACTGGGATAACGTGCGTTAACCTGAG\nCCAATGGGGCCTCAACAGACACTCTGCTGTAACTTTTCGACAAGTTGGCGGAAATGGCTTACCTGGCTAC\nAGGATAACTAAGTGGGGCGGGGATTGACGCGATTTTAAGTTGGATCAGATCCAGCCACTTGTGATGGTTT\nGGACGCCATGGCGCGAAGTCTCGGAAGAGGCGGTGCAGAAAGTGCTCGTTTAGTCAATCCGGTCTCGTCC\nTTCGATACGTTGTATCGCCTCTGCCACTAACTTTAAGACGCGTACGCGTCTTTCCTTGACAACCTCTGCG\nCGGTACACATCTGAACGCTGTGGGGGCACTAGTCAATAACCTCTTACTGCGGGTTGGATCTTCTGCAGTG\nCCGAACAGCTGGGCTTCTTCGGGTAGACGTTAACCCTGGGGTTCGCAAAAGCGAACCGGTTGTTATTAAA\nCTGCACAATTTTTAGGTTTCACAGCAGATGCAGCCTACGGATAATTGGCTGTCCTTTCTACTCCTCTCGC\nGTTGAAAGGGTGTCGTGGAGCGCATGTTGGTGACCACATTACGAATAGAAGCAGGCACATAAAGTGCTAA\nCGCCTTTACGGGTAACGTGGGTTTCCCAAGGTACGACAATGTAAACCGGGCGGGATCCGAATTAGAATCT\nCTGCGTAGAGGTTAAGAGATGCAGCGGAACAAAGTTTGTAGTTCGCGGTTCCTTAGTTGGTCTCGTGCAA\nGTTTAGTCTACATGCTAACATGAACGCCGAGGCCAACAGGCATTTCACCGGCTATTGTTAGCCACGGCGG\nTCTATTCTTAGTATAGCGCCGGTTATGGCCATCACCGATTCATAAGAGCAGGGCTGGCTTGACGTCGACC\nAGCAATATGATGCCCAACGGCAACTCATGTGGTGGACGGCGATGTATGACCCACTTCCCGAGTCCAGACC\nCTTACGTGTCATCAGCATAGGTCGCCATAAGGTGTTTCAGATGGAGACCACTAGCGGGTGAAACGTGCTA\nCATATGCCGGCTAATGCTTGGACACTTGTTATTGCCACTAGAAAGTAAGCGACCCCAGGCGTGCCTCCGT\nGACTCTCTCTCCGAATCGCCGCGGTAGCCAATTCTTATTTTCTTAAACTTAGCTTTGTGGTCTGTTCCCC\nATATCTTTCCTCCCGGGACTACACCTTAAGTAAGCACACTGCCAGATCGCTCAAACGGATAGTTCGCAGG\nCACGGGCTCGACAGGGCTAAACCGAAGGCCCCCCGGAAACTCTTCTGTAGTGTTTCTGACTGCGTGAGCG\nGCGCACTGTAAATGCTCGTCTGTCCCACCAGTCTCATTGTCCTCTAGTTTCGTAGACACGTGCCTAGATC\nCATTCATTAGTGGCGTGGATCTCAGGATCCCAGAACCACGCAGTAAAACCGCCGGCCCCAAGTAAACTTT\nTCGGAGGTATCCCAATCACAGAGGAATCATGACGACCTACAACTCAAGCACCGCCTTGCCTTACCCGAAC\nTGGAAGAGCACTATACGTCAGCGACGACCCTTCACTCCTGCGGACGTCATCATATTGTAACTAAGCGGTG\nAGGGAATGAGGTTTTCGAATAGTACGGCTCCATTGTCGTGAGACTCTTTACTGTTCTTTAGTACTAGTAT\nCACCATCCCTGATGAACAGAGGGTGACCGGCCAGTTATTGCTCGCTCCCTCCTCGTACCGTTTAAAGAGT\nCCAAGAGATTAAATACTATACCAAGATCACCAGTAGGTGCGCAGCTACCGAAGCAGCTTGCCTGACCATC\nCTGTAGGTTAGACCCTGACACCCTTGTGGAAATTAATCTTCGGGTCGGGATCAAAGGAATCTCGTTAGCG\nATCAGTACACTT
AGTAAAAAACCCCGTTTCTTTACACCTTGCAGCAAGAGTTTATCGCTCAGCCACACAT\nGATGCCATTAGCTGCCGGGTTATATTTGGAGCACGTCCACCCAGAGCCATACACGAGGTTGGGATCGCTC\nGAGATCCGACTTAGGCCTATCACCCCTCTTCGCTCTCGGAGTCCAACTCGTGCGCAGGTATTATATCTGA\nATAAAGACAGCGACTCACCCGACCCAAGCAGTTTAGTGCGCCGAAGTACGACAGCGGCACCATCCACGTT\nGTAAAGTCTGGGGTCAGATTGCAAACGCCGCTGGGAGGGGTCAGATATACACCGAACGCCCTTTTTTTAT\nGGAAATCCTGTTTCATGGGAAGAGAATTCTACCCCGGTATTCAACGGCGGGGAGAGGTCCCCAGACATGA\nCCGGAGATCTGCTAGCTTGCCGCTCGAGTCTAACCTTCGGACACATTGTGATACGAGCGTCCCCTCGGGC\nCGGCGGGATTATTGGACGATACTAGCGGGTAATGTCGGAACCCCTACTTCGCCAACCCGGAGCGTACCGG\nAGATATTTAAGGGAGCCGCGGGGCACTTCGCAGCGATCATGTACCACCAACTCTCCTTGCTACTAGCCGA\nCCTCTTTTGGTGTTGCAAATTCGAGCTAATCCACTCCAAGCGGCGACCAAGCTGAGTTCCGTGCAAGACC\nGCCGGCGCTTTTTAGGTTCCCCCGAGAGGGCATAAAAGGCTTCTGTCAACCCTCCCTCTACGGGGGAGCC\nATAAGGCGCGGGTCCGATTTACTGCAAGCCTAACAGGATACAGTCCGTTGGGGGATACTGAGTGTCCATG\nAGGGTAGGTTGCATTCATCGGGAGGTTTCGAAAAGAACTCGTTTCCAATTCTTATCGTGCTAGAGAACGC\nTTCGTTAAGTTGTACTACCAAGTAAACCTTCCCGTAATCGTCAGGTTCAGTAGATTGGGTTGCGGTTGGT\nCCAACAAGGGTCGGGGTGTCTAAACGCAATATGTAGGGTGCTACGTCTCTAAAATCACTCGTCAAGGTCT\nGGGCTTTAGCAAGCCAGAAACGGCACTCATCTCTACGCGCCGTTCGCGGGCCCTCTAGACTGATGGGACA\nAGAAGCACGGTTACCTAAAGACCTGCACAACTCACCACAATCAGATTTTCCTGGACCAGCGATTGAGTCG\nAGACCATGGCGCCACACTTCCCCACCTACAGGGGGAAGAGATATACTCCAAAGAACAGTTTGAGACTTCA\nGATCGGCCTAAAGAGCTCATATAATGATGACTATAACAACCCGTAGCCGTTAGCCTAGCTACACGAGGAT\nTAACATAACTGGCGCTTAAGTTACTTACCATATATGCGCACTACGTATGCTAGGTTGAGACATCAACACA\nTCTCTCGGGACTTGTCACGCGGGTAAATGACAGTGGTGGTGGGTACATTCCAAGCTTCACAGTCCTCAAT\nATTTGGTCCCCTGTATCGGAGCCCTCGACTTTCATCTGTATTCGACCACGGCAATGGACTGTCGGAAAAG\nTTGTTCATCTTTTGTCACGTAATAGATGACTTACCCCTCACGGACCAATCGATTTCCTTTCGGCAATCTC\nACGGATTCGATCTCGTGATTACTGATCTTGCGTTACCCCATTCTGACATACGAACTGGCTGCGCCTAGGG\nCCGTCAAGCCGGCGTCGCAAGATGCGAAGAAGCCTGTAAGTGGCGGGAGGAGCACTACTTAGCATCAGAG\nCAAATCTTGGAATGTGTGTCAAACTCGTTCGCCCAAGAACAGGTCATACGCGAGCGTGGGGCAAGAAAAT\nACGGTGCATACCGTCTTAGTGTTATTGCCGCGGGAATGCAGAATTGACACGGCGGTCCTACGAACCGGAT\nTACGATCGGCGCGCTAAATAAATTGCATAACCTATGCTCGATTTGACAGACTTCTAGGTAACGAAACT
CA\nAAGGATCTAGTATTGTCCATACGGCGCACACTCCGTCGGGACAACGGCATCGATGCTTACGTTAGCACCA\nGTTGAAGCGTGATATGTATATCTCGTCTGCTCGGACGAAGATTAGGAGGGGACTGCGGCGAGGGTAATGG\nCCCGAGAGTACGCTAGTGCACCACGCATAGGCCCGGATCCTGATTGGGCGACGGAGAGTTGGACATGCCG\nGCCAACCTGAACTTCTACATATGTGCCTCTTCTATAGTAACCACTTCCAGAGGACAAAAGGAGCGAACAG\nGACAGACAGACATTACCAAGTGGGCAACCGCGTACATGTGATCGGGGTGACTTGAAGGCTGTAGGTGGGA\nTCTTCGCGAAGATAGGATGTACGATCTTCACCCAACCTCTTAAACCCGGCGCCCCGATTCGGCATGGTCA\nCATGGTGAAGTAGACAGATTAGGAGTGGCGTCTGCCCCAACGGTGGCGGGCTGCTATTATCGAGTTCGAG\nGTGGAGTTTCTTGGATACGCGACGCGCTTATTAATTCTTACACCACATTTTAGGTGGCGCTAATTGTATC\nTGGCGCTGCGGTGGGTCACCATTCCCGCTCAGTGTATATCAATCTCGGGTGGCACGCCTCGTGCGTATAT\nACTAGGAGTTATGGGGTACTCCGCGTCAAGAATGAGGTCACGTTAAGAGCGAATCTATAGCCGAATCTAG\nATATGTGTTCGACAATTATTTCTTGACTTACAAATCCATCGACGGACAACTTGGCGAATCGAGTTGTCAC\nTGCCTCCCTTACTCCGGCGGATTACCAGAAATAGCATGAGCATTCTAAGCAGTGACCACAATCGCCCCGA\nGGTGGGTTGCCTTTCGGCACGACCCGCCCGGCCTTTCCTTAGCCGGTTAGGCGTGTCCAGGGTTCGGTAT\nAGCAGTTAGGACACGTTCATGTCGTTCTCGGCCTAATTTCCTTACCGCAACTTTTCGAACGATGGAGTAA\nTTGGATGATGGGGAATCACCTCTCACTAGGCTCCAGGGCAACGTCTGAATACACTATAGCGTGAGGTAGA\nGGAAGTCGTGGGATCATTTGCTTATATTGCAACAGAAGCCGCTCTTTAAGCTTAGGTCGTTGATCTTTAC\nATGACGGACTCAGGATCTCCATACAGGCGACTGTAACAGGCCAGCTTGCATTTTTGAACGCCATATAGGT\nGCATCTCCCCACCTGAGGGCGCCAATCTTGTAATTGGAGGATATTTCCCAGTATCATTCTGACTAAGCGC\nTCTCAGGCGCAACTGGTTAAGGCTGACTATTCCAAGATCTTTCCTGCTCTAGGGCGGGCCGATTCGCAGT\nAGTTCACGAGCAAGAGCGCTTTCTGTTTGATCTTTAGATGGTCACACGCTCATCAGCAGAAGCTAGACCG\nGCAATGTGTTAGGAACGGCCAACGGGTCACTATTCGTGCAATACATGATTATTTCCGCGCAGGTGTCTCT\nTATGGTGATATCCTTAATTGAACACTACCAGCTCTGGATCGTTCTGACCGGGCTAGCAACCAACTGATGG\nTTGACTCAGTTATCGAGGCCCGTGGCGTAAGACCTGCAAGTCACAGATTGTTATGCCATCCGTGCAGTAC\nAGATTATTTCTAGAGAGCAGAACTTCCCTGGCCCTCGCCTTCATTGGCGTCAACGTCGATATGGAGACAA\nGCTCCGATAATTGTTACCACCATGTAGTGTAGCTTACATTTCCGTAACCCCCTTTCATTGAATGAACCTT\nCAGGATTATGACGGAGGGTGTTGGCTCAATATTGTATGTCCAGAAAGGGCTTCGGCTTGCGATTGACCTA\nGCAAGTTGGGGCGCTTGAGTTTGTCACAAGGACTATGGAACAATACTCCCGGCGAGTAGGCATCCTCTGA\nCACTGGTATGGCAGGCCCCGGATCTGGCTTCTCGGTCGAACTTAACCTATGA
AAATTTAGTGAACCTTAA\nTCATGAGCCGCGCGCTGACCCACACCTTTAGTGTTTTTGTCAATCCGAGTCGAGGGATCTCTGATCTGAA\nAGGCACAGCCCAGTGTAATCTCACGGATGGCGCCCATTGCATTTGCAAGACTCCGGGACCTATATACTCG\nGGTCTTGGTACAATCACCAGAAGCGAGAAGTGTCGATCTGACTAAAGCAGAGGATGAGCTCAGCTTCAGG\nGCAGCTGACGTCTACGATTCTGTGTTTCGGGTCCAAAAAGTCTGACGTGAGGTATTTCGTTCATAGTACT\nGAAGACGATTCACGATTCTATCTGGTTGACGGCGCGAGTAGCCGTCAGCACGTATTAATAACCTTGAAAC\nTCTCGTGCCCTCAGTGAATAGCCACGGATCATGGCGCGAGCAGAGTTTCTAAACACAATGGCAGCTACAG\nCCCTGGGGTGGTAAAAAGATAAGAGTCCGTCTTCCAAGGCAATGCGCCCATCTGTATCAGTGGTGGCTGT\nGTTGGGGGTTAAACCGCGTCGTGAAACAGACCGATGGCCGTGGCATACATAGATATCGGTCGCGCGTACA\nAAGTGCTGATATCGTGTAAGAGTTGCAGGCGCCGTACATGCAGGCCTATAGCCCGCAGTTTAAATATCAG\nAGGATCCGTTTTGGGTACAGGTCTGCGCTAGCAAGGGGGAATTTTACCAAATTATCTCATCATGCTGCAG\nTGGTTGAATTGCCATCGGTACTGCGGGGGTCTCGCGCCAACGAAAAAATTTTCCTTTGGCCAAACCCTTA\nTCTTGGTGAGACAGCGTCTATCTCGGAAGAGCCTAGGCGCCAGTTAAGAGTCTAACGTACCCACCGAGCA\nCCTGAGACCGCGGTTGGCTAACCCGTATCGGGACAAACGCCGACCCCCGCGTACCAGATAGGCGAATAGT\nACCAACAAGAATCTAGTGTAAAGTCTTTCCACTCATAAAAGTACGCCAGGTCTTCCTGACACGCAATGTA\nTGACAACGCCGTTAGAAGTGTGGACTGAGATTTAATCTTTAGTCGCCTAACCTATGGCTAATAGGGAGGC\nCAAATGAAAAATCCCCGCGCCTGCCGGAGCTCTTTACGAGCAACACCTATGGTACGGCAGTGAGGGGGGC\nGAGGCGTAGCTATTCTATATGTGGCACTCATGCCCATCTTCACACTGCGATGTCAATCCTGTTAACTGAA\nCGAAAACTAGACCTGCGTTGGAGTGAGCGTATACGGCATCTTCGCACACCAAGCCTCGCGTGCATAGTAC\nCATGTTTTGGACGAAGACAAGGAACGACGAGCCTCCACGAAAGTTGTATTGGATCTCAGTACTACGTGAT\nGCTGCTATAATATATTGAATAGCTGATTGGTATAGCCCGGCTTAGGCTAGGCACTGCGATTAGCCAGAGA\nCACCGCGTTACCGGAAGCGACTAATTCGCACCGCCGCTCTGATATGGCCGCCGCTCATCTGTCGACTCGC\nTGTAGAGCAGAACCTCATTAGATGCTACGGAATTATGCCCCAAGCGCCCGCTTGCGGGCCTAATGCAGTG\nAGACCAAGGATTCGGACGGCCGTTTCTACGTCGAGGAGGCCTGAAGCATAGATATGAGAGGCAAACACAA\nTAGCCTATGGTTCGTAGCATGGCATGAAGAAATGCAGGACACACTCCCATGGATTGCCACGCGTCGAGAA\nAGGATTAAGTCTTAGCAGGAAAGCGGCCGCCGACATCCACGACGTTGATAACGAAGAGACCGCAACGTGG\nATAGTCTGTGGACAGCCCTCCCCTCATTGTTGCTCTAGTGGCTCTAGGCTACTCTAAAATAAAGCGCACT\nTGCGACAATCTATTAATGCATATGGCAGGAATGTACGAAGTCTAACGGCGTCGCGTGCTTAGACGTTCAG\nGCCGCACAATAATTATAGGCTACGTAGCCCTATCGT
CTAGCGACCTCCCCCATTTGTCCTAAGCCTCGGA\nAGTGACGGAATATAAGCTAGCAGTTCTTCACTTGCCCCGAAAGTCCAATATTTGACACGGACATTGTGTT\nTGTCAACCACGTGTGCACTAATGGTCTGAGCATTATTCCCCCGTATTTCAGTCAAAGTGATGCGCTTAAA\nCTTGCCACAATAGTAAAATTCTGGGTTTGGACACGTACTTAAGTTCGCCGGCATTGCGCGCGATGTGGGC\nTTATGCTATACTGTATAGACCTGCCCTATTACGGAGGACTTGGTCATCGCACCAGGCCCGTCTTGGAATG\nGGGGATCGCGGGGAGCCTACTTAATGGCTCGTAAACGCGCGTAGCGGTGGGATGATAGCGTTTTGTCATC\nGGACCGGTGACATTTGCTCCAGCGTAAGTCGAGAACTGCGAGTTGTCTCGTGCGACTGAGAACTCCTGAA\nAGTTTAGTTCGACTGTATCAACCATCCGCAAGCAGGTGCACACCTGCCGATATACCTTTCAAGCCCAGGG\nAAAACACCAATGCACCGAGGGGTGCTCAGCTCATTTAATGTAGGCCAATTATCCTGGTTTGACAATAGTC\nAGTTGGATGAACTTACTGATATGCATGACACACTAGAGGATTCTACATTGAACACTTAAAGGATAGTTGA\nTGTTACATGCCCCATTTGGTGGGCTCTGCATCCGGGAACTCACCTCGGAAAAGGCAACCTACTTTACCAA\nCTCTGTCAAATGGAAATGGAACTGTATAGGCGCGGTGGGACCCAGTGGATACTTAACGAAGGACGCAACT\nGACGTCAGGCTCATAATGAGGGCCTTGTACTTTCTACTAAGGGTAGCCCTTCGACGGTTTTCTAAGCGAC\nTGACGGCCGATTCAACACACTGAAGCCTCCTGGAGTTATTAGTTCGAGTACGAATTTTATTCGCATTAGT\nCTCAGAGAAAAACACAACCCAGCGCCCTCGCGAAGTCATATGAAGTTAGCGACAGAACGATCGAGGTGTT\nGACCTCGGACCATCGGTTATTATGATCAAGAGGTCTTGCTACGTGATGTGCAGAACCTGTTCCAATCCGA\nCAGAGATCTGATAGTAATCGTACGAGTTATGGGCTATATAACGGGAAGTTTATCGTCCTGACTGGCGCAC\nATAATATGCTCCGGATCAAGCCCTAATTAGGTCAAACGACTAGTTTCGCCATGGTTGTGTGTCGAGTTTA\nTAGTTTAACGTCTGTATAATTACAGTGGATCTAAGAAGGGTCACCTAGTGATAAGCTATCGACCTTACAC\nATAAGGTAATGAAGTGGGTTGGCCGTCGGTTTGGAGGTACTCCGGCTCCGCGCTCAACCATGGGAGACGA\nTCCTCACCCGCGCTTAAGGTGTGGTGCGCATCAGTACCATGGCATGACGGGAGCGGAAGGTAATACCGTG\nATGTCAATATCCAAAAAAGGATCGTCCTAGTGCTAGGGCTCCGTACGTCGAAAGATTCACTAAAGCGCAT\nGGCATCACAGTTTCCGGGTAAATACAAGCGCGTGCAGAACGGCGTTCCGCTCGAGCTATCCTCTCGAGCG\nTAAGAAAATTGGACTGAATAAGTCTTCTTATGCTCCATAAAGATGGTAGTGCAATTCTCAATTCATCCTG\nGCCCGTGCGACTAGAAAACTCGGACACAACGAGAATATGCGCTAAAGTGTACGCGACGGACCCTCTCCAG\nCGCAATAACAGGGTACCTGCATGGCCGCCACTAAGATTCGTCCAAGCGAGAGACAATACCACGCGGGTAT\nACAGCCACGTCCTGGTGTGATACCAACGATAGGCGGGTATAATGGGTGCCCCGTTCGGACTTTATGGGAA\nTACACTTCAATCTAACAGGTTATAAGAACATAGCCAACGGTCGTTCGTATGCGGTCCTTGGCTTTATCGT\nACCTCCCTAATGACTTCCCT
TCGTATGCTCCGCAAGCGAGCGGACGGCTGTGCTCGTCCCTTCCACTTAC\nTTTACTCCAGGTCCTGAGCTGTCTTGTCTCCTGTTGTGGTCCCTCCAGATGTAAAGACTCATGACCTATT\nCAAATCTTCGTATACGGTACCGGTATCGACAGAGTTTTGATTGCAACTGCCCTAGCAGGACTATAGGAGT\nAGGTTAGGCTGTGACCCGGATTTACGCCGAAAATATAGGGATCCGAAGTTCTTCCCGCAGGTAGATCGAG\nATGACAGCTTTCCCTCAAAATGAGTAGACAAAGTAAATACATTTGAAGCGCAGTAGTAACAGCGCAGGAC\nGGTACAAATGAGACGAACGTCTGGATTATGAGGGTTTCGTGCCGCATCGCTACCGTAGGCCCACAATTCA\nTTCGTTCCTGTGCATACCTGAATATAACGTGTTAGCGAAGTCAGAGTACATTTAGTCCGCCGCTGTGGAT\nGCTCACTATGAACGCATGCCGGTGGAAAAGCGGCGGAGTACCGTTCGGCATGGGACTCACATGTAGACAC\nTCATGCGTAAGGCATGTTGCAATTTTATACTGGAGGCAAAGTCGACCCATCTCGTACTTGAGTGGAGCAC\nATACAACTCCGGCATGCTGACGTGAACCGAAAGAATAGAGTGCAATGCGGCTTGCTGGACGTGTAGATGC\nCTTGTATTTGGGTGGATCTGTTCTAAGAAACTAAAAAACTAATCGTCACTAGCGTTTACGTTCACCGAGT\nGTGGGTCTATCAGGCTCGTATGCTTAGGGTTACGGATTCGGTTGGTGAAAGGTCCTATCCAATTCTGGTT\nGGGGGGCTCGGGAGTAGAGCTCCTTACGTGGGTCATATCTGTTGTACCAGGCGTATAATTAATTGTCATG\nCGGCTAACGTGCCCCGGCACCAACGGAAGCACCACGACAACTGTATGCTGAGGAGCCAGTGTCCGGCCAG\nCAGATGGGCAGTCGTGCGAACTAGAGAGCCATGGGGCCCTACGGGTCCATAGTAAAATGCCAAGGGAGAA\nTTGTTCTCTGACGTGGCCGTCCCATGACGCAATCGGACGCGCCACCCAGACCAACAGACTCGTTTCACAC\nCTGACGACGTTATCCCATAAACGTTGTAAGCGATGAATGGGAGTTTGCCTGTACTACTGCTCAGGAGCAG\nCAACGTCGATGGGGCAAAAACGGTTTTGGAGAGAGACCCCGTTAACGCCTGGGAACTTTATCTGTACGCT\nGCCCGCAAACGGCGTTTTACTATCTGCAAGCTGGTTGTGTCTATCGCTAGCGAGAAGCCTAGGCTGTTAT\nTCTATGCGGCATTGGACCCGCTTGCTGATAGGAAATGAGACTAGCATGTTAATACTAGTCCAGACGCCTG\nTGTAAGTGAGGGGGACGAGGACAACTCGAGTGCCTCGGGAGGGGTTATCCAGGTCTTAACTTCACTAGGC\nGGATCAGATTGCTCCTGCCGGCACGCTCACACACGCTACCTCGAAAAAGTGAGGCAACTAACCTGTATGC\nTTCTCGCGTCTGTAAGTAGATTACTGTCTAGTGAGCGCCAACTTGATTACGTGCCCATAGACAGTGCTCC\nGTTCCCGCTGCACAGGCTTAAACGGGAAACCGGCGGTACCCCTTGCCCGCACCGCAGTAGGCTGGTCGAC\nGACGCCTTATCAAGGATACGTCCTCAAGAGCGTGCTAACACCATAAACAACTTAGTTAATTCATAGTAAG\nGACTAGCTGCTGCTTCAAAGCCGACCCGCCAGCAGCAATTTATTAGAGTGAAATTCCCCCTGAAACCCAT\nTGGGAGAGCACGACAAAAGTGTTGATGACCCACGTATGTTCTTCCCCGCGACGAAGGCTCACTACGTCCC\nATAGTTAATACCAGAGTAGGCATAACTGGACTCGGCCGGTGCTCGTTTGAGCTAAGCCCATAACCACTTA\nGTCT
ATCTAAGCTTGCTGAGGGAGCGTCGTACATTCACACAAGGGTTGACCGGGAAAGTGAGGGCTCTAA\nACATGACGCCAGATTGAGATGTGGGGTAAATAGCGAGTCCAAAGCCCGGGTGACCCAACTACTCGCGAGC\nTTATCAACTACGTATCTCCTTCCCGCGCCCTTGGATAAACCTGTAGCCTCAATATATTATCAATAGTCGT\nGGGTCGCCTCTTTAAGGGACCGTTTCGCTGTCCACGAATGTCAAGGTCCTCGCTATCTCCCTGGGTCGGT\nCTCAATCCGGCGGACAGAGGCGGGGGCTTGGGCTAGATAGAAGTTAATAGTAAATTAACAACTCCAAAGT\nGTCAGGAGCTAGCCTTACGCAAACCTTGCGCCGTGACAACATATTCAGACGCGCCTCAGTAACGCAAGTG\nTTCCTGGTATCGTCCCGATTACGATGCAAATTACAGATCTTAACGAGGAGTCTCTCTTACTAACTTGCGC\nGAGGGTAAGCGGACCGCACATTAGATTGCTGGTTTCCCTACCGAGCAATGCCGCAATCCCGTGCAGGCTT\nTGGGCATTGGCTCCCGCGGTCCAACTGTCCCGAATTAGCCGTTACCAATGCCGACTCATTTTCCGAGGTG\nTTCATTGGCGGCTACGTGGGAGTGCGGGTGACCACCTCCATTTTTAGAACTCTAAGTGGCGTCCAAGCTG\nGCAGCGATCGCCCCGAGCGCACCCCGTGTGGGTTAGAGAGATGTCATCCGATACTTGTGCCACCTTCTAG\nGTACACTCTCGCAAAGAGCGAATTTAAATTCATGTCATCCAGCGTCCCTTCGTCTTCAAAACTCCTGCGG\nCCTAATATCTGTCATTACCAGGAGAGGGGGTTTACATGAGGTCGCCCACCTATCATATCATGGCGTCGAG\nAGCCCATGCTTTCCACATTCAGGGGAGTGCGGAGGCCCAATAACGGCTCGACCACCTGTCGAATTTAAAC\nTAGAGACACGGATGCCTCCATCCATTGCACATGGGTACGAGTCTCGACGTCGTCAATAATTGCCGTTCCC\nAAGCTAGGGACAGGCGTCGAACGAGAGTTTGTGCTATTTTCACCCCTGTCCCTCCTTTTCCGAATGAGCC\nCGTTGTGCAGGCATTGATGATGGACGCCAAGTACTGCCTTTCAGTTGTACCCTAAAATCCATATATCCGG\nAACGTACCCGTACTGTTACTAAGATTTCGCATTCCACCTTTTGGCACCCGGACAGTATGCGGTGGCAACG\nAGCCAGTCAATGTCGCCGAGGCCCCCACGCATAGCGCAGTTCCGAGTATGAACCCACGTAGTAGGATATG\nCGTTGACTTCCGCAGGATTAAACCGGGAGTGGTTATTTAGGAGTCTGGCTGGCCATAATAAATAAGGACC\nGCGTCGGGATAGAGAACAAGTTCTTCTCCCGCAGGATCTGGTTGCGACGTGAGAGGGGGCCCACTCCGTA\nGAGATTAAATCGATAGGATGATCGATAATGCAACTATTAGCCCTGCGCCGGAAAGTTCCGCAGCATGCCA\nACAACGGCAAATAGTAGTTAGATCCAATAACCCATTCTATCATCGTGCCTTGAAGCGGGTTAGTTGTTGG\nGAGTGGGGGCTCACATATTTCCCTCGTGTCTTGTATAGTGCCGTGCTCGCTAGGTCCGACATCCCCTTCT\nGCATAGTGGAGCGGACTAGGGATTGTCAAAAGTAAAGTGTGTGGTGATCCCGCGGGGCTGACGGATGCCT\nCACGAACTTTAAGGGCCTCGGTTGTAGAATGATAGGCTCTGCCCAGCCAAGTCGGACGTTTTCGGGCTTT\nTCTAAAACCTCGTCTTTGTGCTTCATTTATCTGCAGGTTCACCCGGGTATCCGAGCGTTCTCAGCAACGT\nTCAAAGTAAATTTGGAGTTTCAAGATTAGGTGAGCGATGGAGCAGTGAGTATGTGGTTTT
CCCTTCCCGC\nTCACGCGTGTATGATTGAAAATGATACGTTTCCGTGCTTAATGACGGATGGGATACGACCGGAGAAAATG\nGTAAGGTCGGCCCTCGCATAACCCTCGATCGGAGTCTAACTACGTTTTACTTACGCATCCGTCACAACAG\nGCAGTGTGGATGCCCATAACTACCGTCCCTTGTTGCCGCGATATTTTTCAGATTTCAGGTGATGTAGGTT\nCCCTGTCAGTGAACAATTATATGTTAAATGTCACCTCACTCGAAATGAAACGAAGCAACACCAGTGAATC\nTGGATCCGCCACGCTTAGCCCCCCACTCCTTGCCCAGAGTGCTAAATGAGTCTTCGTGCGGCAAGACTTT\nGAGAATAGAACAACGTCCAGTAATCGGGGATGCGACACGCGCAACCGCGGCTCGTAACGAAAGTTTCGTT\nAACATGGTTCAATCAGCTAGTTATTCCTGATTCTGCCGAGCAAGTTTATCTCCATGCAGAAGAAGCAATA\nCCGTGTCGAATGAATACTTTTACGTCGTTGGGTGAAAGGAGGCTCCCGAACTGTGACGAATGGGTCTATT\nAAAGGACACACGAACTGTTGCCGGGCGCCGTGTGGTATGCCACCGCGGCAGAAAACTTCGTGGACATCCG\nCTGAGTTCCGACTTCTCAATTGGTAGTCTGGAAATTGCACACACGGCGCTCGAGCCCCCTCAACATTTCA\nGGATTGGGCGGGGCGATTCCCAAGACCAGCTGCGCTTTTTCTATCCGGGGAGGAATGGCCAGGTGCTGGG\nAAGTTTTCGGGGAATTTCGGCCAGAATGGAAACTTGTAATTGAAACAACCCATGTATGTAGACGTACAAG\nAGAAGAGCCTGTCATTGATACCCTTGTAACTTATGCTTAAGGCCACAGATAGTGTCACGTCGACACTTCT\nCCTATTACTGGACGACGAGAAGTCGGATGCAGTGACTCTGGATATGGCAGTCTAAACGCGGACCGTCTAA\nCCGAAGTTTATCTATGATTGCGAGTACTACTTACGGAATACGGTGATTATCGTAAGAAGTGAATTTACCA\nCGTACCATCATGGAAAATTCACAAATGGACCTTCGACAGTGGGGAAACCGAGCACCTCGGTATATAGGGC\nAATAGACAGACTGGGTTGAAGCGGCTCCGGTAGTGGTTACGGCCGCGCTCATGACACCAAAGCGATGTTA\nGCGGTGCTCCCTGAAACCCCGGCCTGTGCTATAGTATCCCCCACGAGGATCCCTAGGCTGGTGTATAGTG\nCGTCTATATACGGCGAAACCACTGCAGATCATTTTCTTTCGTGCAACGAAACGGTGAGTATTCGCCGTAT\nACAGATTCTCCCGTACCATAATGGGAGTGGAGATTAAGATTATGCCCGGACATTTAGTCTAGAAGCGTCT\nCTGGCTGCGGTACGAAATAAATGCTGTATTTAATCCCCGCGAATACATGGTTCCGGCTATGGGATTGGGG\nTCGCAAACCACAACGAGATAAAGTGGTGCAGTATAAGCGAAGTCAGTCCTGGAGTGAGTCTGCGGCGCAG\nAATACTTGTATCATCATAATCTTGGAATCATGTTGGAACATGTGTCCCCTGCCTACTCTCCGTGTTTCGT\nACGCATGGGAATTATTAATGTGGAGACCGAAACTCTGGTGCACCGCACTGCAGTGTGAGGCTCAACTATC\nGCCGCCAACGATTATCACATACATCGGACCGTACATTCCGTTCTCGATGTGACCGACAATCCCGATGCTC\nATCGTCATGGAGGCTCCACCCGTGCGTTTTAACTTGATTTGCGGTGTCACATTTCTCGAAGGTTTGCCTA\nGGACTAGCCTCTGCCGCGGAGGTTCTCCTGATCGTGGATGCACTCAGGCATGGACGTGTGAAGTCTAGCA\nCCCAGTGTCTATAGACCTCGGCGAGGTTCCTTCTTACTAAGACG
TAGTAATACCGGGTCATTACTACACC\nTACAGTAGAAGTCCAGCCGCTTCTTATCTCTATGCGCTAGGCCACTCTCGAACCGGAGATGCATACCGAT\nAAATGGCGATGGTGAAATAACAGATAAGAAAGGCCCCCACCTGTCGGACCGCCTTCACGTCTGTGAAAGT\nGTCACATATATATGGGCCTAGCTCCGAGTTTGGCCGAGAAACCGCGAAAAAGTGGGTTCCTATTCGCAAC\nCTCGACCGCGTGCTGGTACATGTGACTTACCACCGGTTCGGTAGTGGTGCTCTGGGGGTGGGCTCCCCTT\nTCGAGCGGTTGCATAACACTTCTTCGAAAGCAACTGGCCTTATCATACTCGACCACCTTTACACAAGTCC\nATGACTTGAGGGTAAAATACTTCGAATATCAGTGTCCTAAGCTTGCAGATAAGAGTAAAGTTAGTAAATC\nATTCAGAGTATTCTATGACCATATCACGCCTTAGCGGAGGTTTCGGTACCGCCCTCCTACGAGCGGAACT\nACTAACAACACAGCGTCCGAAAAAGACTCAACATCCTGACGGTTACTTAGCGAGCGATCCCGACTCATGC\nTGGCGCCACTTGCAGCGTTTTAAGGCAGAGAAGGGACATTGCCTGAGTGAGGGGTACACGGCGCTACTTT\nATAAAGGCGCTCGAAGCTATAGTCGCCAATGTCATTTGGTCGCATGCAAGTCCGTTGCAATCAAACTTGC\nACGTCATTTTTACTTAGATCATACACTGTCTTATTGGCGGGCGAAAGTATCGACCCTGCTGAGTGGGTGG\nTGCCTTAAAAGGAAGAACAGGCACCTGATAGCGAGGGGAGCACCCTTAGAAATTGCAAACGGGGGTGTAT\nGAAACCCCTATTACGTTGTTTAGCTATCTAAGCTTCGCCTACCGGCCGGTGAAATCCGAGGCGAACACGG\nATATACACTATGATCCGTCTAAAAGCTAGCCATTACCTTGAGCTCTGACACTGCGGAAACGTGGAAAGGT\nAGCGGATGGCTACAACGAGTTATTCGGTAAGCATGGTAGCCGTGAGAAGACGGCACAGTTTCCCGCCAAC\nATTGTACTGTACCTGGGTGTATACCGTTTTCGACCCTTCCCCCCTACGTGGCGGTCACAATACCAGTTCA\nGATCTATCTATGACCAGGAGCGGCGGACTAATTATCCAAAGACCCCGTGGGTGCATTGTGGTTCCGAGGC\nCTCAATGTGGGTGATTACATTGGATTGATGTGTATTGGTTGGCCAGTGCTACTCGAGGGCTGAGCCAGTT\nTTAGTGGCTAACAAAATTAACTACTTACGGGATAAACCGTACTATCTGTGTTCTTGAAGGTCGTTCAATC\nGTACTCTTGGACCGCCTTTGGGTAAATGCTATCGTATTCAAGGGCATAGCGTAACTACCCGGTCCGATAA\nGTGGCTTGAACCGTGTTACGGCTTCCCGGACATGGTTATGACTGTGAGTTACACCGTGCCAGTAGGTGTG\nCCTATGCGAAGATGATCCGTTCGCATCCGGTCGGTGTCCGTCGAACAAGAAAGGTAGTGCGGTTCAGCCG\nAGGGTAGTATAGAGCCCATACTGGGTTCTACGACGATGTATATCGGCAAGGCAACTACGTGGGTTTGTTT\nCTACTCTGCAGAGCGGGGGGAGAGTACAGTCCAGGCATGTGTGCAAATGAGTATTGTTTAACCAAACTTT\nTTTCGGGCTCAGTTTCTTCATCATTTAGGACTACGTGACATGCAAGCTCTTTCATACATTAGAGTAACTA\nAATCCGAGCTGTAAGTTTAAGCTAGTAGCCGCTAGATCACTAGCGAGTATCAGCTCGGCGCCCGGTTCGG\nGCAGCGGTCGGTCGTTCCGCACGCGGGAGAGCGACTGTTCTGGCCCGCCGAGCCTTCACACTCTTGTTAG\nTCTGGCTCACCTAGGCCGGAAGTAGATG
TCCAGTCAAGATGATATCAGGGCCTCACAACCAAGATCGAAT\nGTAGAGTACGTAGACGTTGCGGAACACCAACCTAAACGGGGCTAAGTAGAACAGTTGGAGTCACCCACAT\nTTTCTCCCGTACACCCGGCCCCCGAAATAGGTCGCATTCGTCTCTCGTTCTTAACTGTCGGTAGCCGCAA\nCATGGGGGTTCAGTGGCAGATAATTAAAGTGCGGGTTTTCACTACCAAGTGACGTGAAGGAGTAGCGGTA\nAAGACTCAGGCGACGTCGACAATAAGGTCGTTGCAGCTCGTACCCTGTTTAAGGAACTGGCTCGTACCCA\nGGTAGAGTCAGACGTGGGAGAGTGAGCGCGCCAACAAGCACGATCTGAAAGATCAACATTTGTACACATT\nCGCCCCAGCAGTAACGGTCCTTTCGAGCAACGTCATTAAACAGGGCCGGACTGCTGCGCTGACATAGTTC\nATAATTCGCTTGCGGACCGACGTCCTAGCTGTGGCGTCGCAGAGTAGACAATCAGTGTTTCCATGATAGC\nATTGTTCGGCATAGAGAGGGGGCCCGCCAGTGCTGCGGTATTATGGTCAACGACCTGACACTGATTCAAT\nGTATAACGCCGATCTTCGTGTCCCCATTTGAAATTGTTACACCGGTAAGGATCCGGGACTTCTTTTGACA\nTACGTCAACTAAAACTCGGCCGCATGTCGTGCCCCTATCTCGTCGACCCGCTCACAGGGTACAGTTTGGA\nGCACCATTGTCTTCCGCCTAGCCTACGAAGGGGTCATGTACAACGTACGCTTGCGCATGTTCACCGTGAG\nACAGTAATCGTCGTGAGTTTGTCCTTATACACTACCATCA\n"
  }
]