Full Code of infphilo/centrifuge for AI

Repository: infphilo/centrifuge
Branch: master
Commit: 88ecab483e32
Files: 167
Total size: 2.8 MB

Directory structure:
gitextract_sa6zefk6/

├── .gitignore
├── AUTHORS
├── LICENSE
├── MANUAL
├── MANUAL.markdown
├── Makefile
├── NEWS
├── README.md
├── TUTORIAL
├── VERSION
├── aligner_bt.cpp
├── aligner_bt.h
├── aligner_cache.cpp
├── aligner_cache.h
├── aligner_metrics.h
├── aligner_result.h
├── aligner_seed.cpp
├── aligner_seed.h
├── aligner_seed_policy.cpp
├── aligner_seed_policy.h
├── aligner_sw.cpp
├── aligner_sw.h
├── aligner_sw_common.h
├── aligner_sw_nuc.h
├── aligner_swsse.cpp
├── aligner_swsse.h
├── aligner_swsse_ee_i16.cpp
├── aligner_swsse_ee_u8.cpp
├── aligner_swsse_loc_i16.cpp
├── aligner_swsse_loc_u8.cpp
├── aln_sink.h
├── alphabet.cpp
├── alphabet.h
├── assert_helpers.h
├── binary_sa_search.h
├── bitpack.h
├── blockwise_sa.h
├── bt2_idx.cpp
├── bt2_idx.h
├── bt2_io.h
├── bt2_util.h
├── btypes.h
├── ccnt_lut.cpp
├── centrifuge
├── centrifuge-BuildSharedSequence.pl
├── centrifuge-RemoveEmptySequence.pl
├── centrifuge-RemoveN.pl
├── centrifuge-build
├── centrifuge-compress.pl
├── centrifuge-download
├── centrifuge-inspect
├── centrifuge-kreport
├── centrifuge-promote
├── centrifuge-sort-nt.pl
├── centrifuge.cpp
├── centrifuge.xcodeproj/
│   └── project.pbxproj
├── centrifuge_build.cpp
├── centrifuge_build_main.cpp
├── centrifuge_compress.cpp
├── centrifuge_inspect.cpp
├── centrifuge_main.cpp
├── centrifuge_report.cpp
├── classifier.h
├── diff_sample.cpp
├── diff_sample.h
├── doc/
│   ├── README
│   ├── add.css
│   ├── faq.shtml
│   ├── footer.inc.html
│   ├── index.shtml
│   ├── manual.html
│   ├── manual.inc.html
│   ├── manual.inc.html.old
│   ├── manual.shtml
│   ├── sidebar.inc.shtml
│   ├── strip_markdown.pl
│   └── style.css
├── dp_framer.cpp
├── dp_framer.h
├── ds.cpp
├── ds.h
├── edit.cpp
├── edit.h
├── endian_swap.h
├── evaluation/
│   ├── centrifuge_evaluate.py
│   ├── centrifuge_simulate_reads.py
│   └── test/
│       ├── abundance.Rmd
│       └── centrifuge_evaluate_mason.py
├── example/
│   ├── index/
│   │   ├── test.1.cf
│   │   ├── test.2.cf
│   │   ├── test.3.cf
│   │   └── test.4.cf
│   ├── reads/
│   │   └── input.fa
│   └── reference/
│       ├── gi_to_tid.dmp
│       ├── names.dmp
│       ├── nodes.dmp
│       └── test.fa
├── fast_mutex.h
├── filebuf.h
├── formats.h
├── functions.sh
├── group_walk.cpp
├── group_walk.h
├── hi_aligner.h
├── hier_idx.h
├── hier_idx_common.h
├── hyperloglogbias.h
├── hyperloglogplus.h
├── indices/
│   └── Makefile
├── limit.cpp
├── limit.h
├── ls.cpp
├── ls.h
├── mask.cpp
├── mask.h
├── mem_ids.h
├── mm.h
├── multikey_qsort.h
├── opts.h
├── outq.cpp
├── outq.h
├── pat.cpp
├── pat.h
├── pe.cpp
├── pe.h
├── presets.cpp
├── presets.h
├── processor_support.h
├── qual.cpp
├── qual.h
├── random_source.cpp
├── random_source.h
├── random_util.cpp
├── random_util.h
├── read.h
├── read_qseq.cpp
├── ref_coord.cpp
├── ref_coord.h
├── ref_read.cpp
├── ref_read.h
├── reference.cpp
├── reference.h
├── scoring.cpp
├── scoring.h
├── search_globals.h
├── sequence_io.h
├── shmem.cpp
├── shmem.h
├── simple_func.cpp
├── simple_func.h
├── sse_util.cpp
├── sse_util.h
├── sstring.cpp
├── sstring.h
├── str_util.h
├── taxonomy.h
├── third_party/
│   ├── MurmurHash3.cpp
│   ├── MurmurHash3.h
│   └── cpuid.h
├── threading.h
├── timer.h
├── tinythread.cpp
├── tinythread.h
├── tokenize.h
├── util.h
├── word_io.h
└── zbox.h

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
*~
*.dSYM
.DS_Store
*-debug
*-s
*-l
centrifuge.xcodeproj/project.xcworkspace
centrifuge.xcodeproj/xcuserdata
centrifuge.xcodeproj/xcshareddata

centrifuge-build-bin
centrifuge-buildc
centrifuge-class
centrifuge-inspect-bin


================================================
FILE: AUTHORS
================================================
Ben Langmead <langmea@cs.jhu.edu> wrote Bowtie 2, which is based partially on
Bowtie.  Bowtie was written by Ben Langmead and Cole Trapnell.

  Bowtie & Bowtie 2:  http://bowtie-bio.sf.net

A DLL from the pthreads for Win32 library is distributed with the Win32 version
of Bowtie 2.  The pthreads for Win32 library and the GnuWin32 package have many
contributors (see their respective web sites).

  pthreads for Win32: http://sourceware.org/pthreads-win32
  GnuWin32:           http://gnuwin32.sf.net

The ForkManager.pm perl module is used in Bowtie 2's random testing framework,
and is included as scripts/sim/contrib/ForkManager.pm.  ForkManager.pm is
written by dLux (Szabo, Balazs), with contributions by others.  See the perldoc
in ForkManager.pm for the complete list.

The file ls.h includes an implementation of the Larsson-Sadakane suffix sorting
algorithm.  The implementation is by N. Jesper Larsson and was adapted somewhat
for use in Bowtie 2.

TinyThreads is a portable thread implementation with a fairly compatible subset 
of C++11 thread management classes written by Marcus Geelnard. For more info
check http://tinythreadpp.bitsnbites.eu/ 

Various users have kindly supplied patches, bug reports and feature requests
over the years.  Many, many thanks go to them.

September 2011


================================================
FILE: LICENSE
================================================
                    GNU GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The GNU General Public License is a free, copyleft license for
software and other kinds of works.

  The licenses for most software and other practical works are designed
to take away your freedom to share and change the works.  By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users.  We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors.  You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.

  To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights.  Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received.  You must make sure that they, too, receive
or can get the source code.  And you must show them these terms so they
know their rights.

  Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.

  For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software.  For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.

  Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so.  This is fundamentally incompatible with the aim of
protecting users' freedom to change the software.  The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable.  Therefore, we
have designed this version of the GPL to prohibit the practice for those
products.  If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.

  Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary.  To prevent this, the GPL assures that
patents cannot be used to render the program non-free.

  The precise terms and conditions for copying, distribution and
modification follow.

                       TERMS AND CONDITIONS

  0. Definitions.

  "This License" refers to version 3 of the GNU General Public License.

  "Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

  "The Program" refers to any copyrightable work licensed under this
License.  Each licensee is addressed as "you".  "Licensees" and
"recipients" may be individuals or organizations.

  To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy.  The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.

  A "covered work" means either the unmodified Program or a work based
on the Program.

  To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy.  Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.

  To "convey" a work means any kind of propagation that enables other
parties to make or receive copies.  Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.

  An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License.  If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.

  1. Source Code.

  The "source code" for a work means the preferred form of the work
for making modifications to it.  "Object code" means any non-source
form of a work.

  A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.

  The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form.  A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.

  The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities.  However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work.  For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.

  The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.

  The Corresponding Source for a work in source code form is that
same work.

  2. Basic Permissions.

  All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met.  This License explicitly affirms your unlimited
permission to run the unmodified Program.  The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work.  This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.

  You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force.  You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright.  Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.

  Conveying under any other circumstances is permitted solely under
the conditions stated below.  Sublicensing is not allowed; section 10
makes it unnecessary.

  3. Protecting Users' Legal Rights From Anti-Circumvention Law.

  No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.

  When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.

  4. Conveying Verbatim Copies.

  You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.

  You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.

  5. Conveying Modified Source Versions.

  You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:

    a) The work must carry prominent notices stating that you modified
    it, and giving a relevant date.

    b) The work must carry prominent notices stating that it is
    released under this License and any conditions added under section
    7.  This requirement modifies the requirement in section 4 to
    "keep intact all notices".

    c) You must license the entire work, as a whole, under this
    License to anyone who comes into possession of a copy.  This
    License will therefore apply, along with any applicable section 7
    additional terms, to the whole of the work, and all its parts,
    regardless of how they are packaged.  This License gives no
    permission to license the work in any other way, but it does not
    invalidate such permission if you have separately received it.

    d) If the work has interactive user interfaces, each must display
    Appropriate Legal Notices; however, if the Program has interactive
    interfaces that do not display Appropriate Legal Notices, your
    work need not make them do so.

  A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit.  Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.

  6. Conveying Non-Source Forms.

  You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:

    a) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by the
    Corresponding Source fixed on a durable physical medium
    customarily used for software interchange.

    b) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by a
    written offer, valid for at least three years and valid for as
    long as you offer spare parts or customer support for that product
    model, to give anyone who possesses the object code either (1) a
    copy of the Corresponding Source for all the software in the
    product that is covered by this License, on a durable physical
    medium customarily used for software interchange, for a price no
    more than your reasonable cost of physically performing this
    conveying of source, or (2) access to copy the
    Corresponding Source from a network server at no charge.

    c) Convey individual copies of the object code with a copy of the
    written offer to provide the Corresponding Source.  This
    alternative is allowed only occasionally and noncommercially, and
    only if you received the object code with such an offer, in accord
    with subsection 6b.

    d) Convey the object code by offering access from a designated
    place (gratis or for a charge), and offer equivalent access to the
    Corresponding Source in the same way through the same place at no
    further charge.  You need not require recipients to copy the
    Corresponding Source along with the object code.  If the place to
    copy the object code is a network server, the Corresponding Source
    may be on a different server (operated by you or a third party)
    that supports equivalent copying facilities, provided you maintain
    clear directions next to the object code saying where to find the
    Corresponding Source.  Regardless of what server hosts the
    Corresponding Source, you remain obligated to ensure that it is
    available for as long as needed to satisfy these requirements.

    e) Convey the object code using peer-to-peer transmission, provided
    you inform other peers where the object code and Corresponding
    Source of the work are being offered to the general public at no
    charge under subsection 6d.

  A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.

  A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling.  In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage.  For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product.  A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.

  "Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source.  The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.

  If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information.  But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).

  The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed.  Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.

  Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.

  7. Additional Terms.

  "Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law.  If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.

  When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it.  (Additional permissions may be written to require their own
removal in certain cases when you modify the work.)  You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.

  Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:

    a) Disclaiming warranty or limiting liability differently from the
    terms of sections 15 and 16 of this License; or

    b) Requiring preservation of specified reasonable legal notices or
    author attributions in that material or in the Appropriate Legal
    Notices displayed by works containing it; or

    c) Prohibiting misrepresentation of the origin of that material, or
    requiring that modified versions of such material be marked in
    reasonable ways as different from the original version; or

    d) Limiting the use for publicity purposes of names of licensors or
    authors of the material; or

    e) Declining to grant rights under trademark law for use of some
    trade names, trademarks, or service marks; or

    f) Requiring indemnification of licensors and authors of that
    material by anyone who conveys the material (or modified versions of
    it) with contractual assumptions of liability to the recipient, for
    any liability that these contractual assumptions directly impose on
    those licensors and authors.

  All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10.  If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term.  If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.

  If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.

  Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.

  8. Termination.

  You may not propagate or modify a covered work except as expressly
provided under this License.  Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).

  However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.

  Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

  Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License.  If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.

  9. Acceptance Not Required for Having Copies.

  You are not required to accept this License in order to receive or
run a copy of the Program.  Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance.  However,
nothing other than this License grants you permission to propagate or
modify any covered work.  These actions infringe copyright if you do
not accept this License.  Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.

  10. Automatic Licensing of Downstream Recipients.

  Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License.  You are not responsible
for enforcing compliance by third parties with this License.

  An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations.  If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.

  You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License.  For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.

  11. Patents.

  A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based.  The
work thus licensed is called the contributor's "contributor version".

  A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version.  For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.

  Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.

  In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement).  To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.

  If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients.  "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.

  If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.

  A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License.  You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.

  Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.

  12. No Surrender of Others' Freedom.

  If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all.  For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.

  13. Use with the GNU Affero General Public License.

  Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work.  The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.

  14. Revised Versions of this License.

  The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time.  Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

  Each version is given a distinguishing version number.  If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation.  If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.

  If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.

  Later license versions may give you additional or different
permissions.  However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.

  15. Disclaimer of Warranty.

  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

  16. Limitation of Liability.

  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.

  17. Interpretation of Sections 15 and 16.

  If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.

                     END OF TERMS AND CONDITIONS

            How to Apply These Terms to Your New Programs

  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) <year>  <name of author>

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

  If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:

    <program>  Copyright (C) <year>  <name of author>
    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License.  Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".

  You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<http://www.gnu.org/licenses/>.

  The GNU General Public License does not permit incorporating your program
into proprietary programs.  If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library.  If this is what you want to do, use the GNU Lesser General
Public License instead of this License.  But first, please read
<http://www.gnu.org/philosophy/why-not-lgpl.html>.


================================================
FILE: MANUAL
================================================

Introduction
============

What is Centrifuge?
-----------------

[Centrifuge] is a novel microbial classification engine that enables
rapid, accurate, and sensitive labeling of reads and quantification of
species on desktop computers.  The system uses a novel indexing scheme
based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini
(FM) index, optimized specifically for the metagenomic classification
problem. Centrifuge requires a relatively small index (5.8 GB for all
complete bacterial and viral genomes plus the human genome) and
classifies sequences at a very high speed, allowing it to process the
millions of reads from a typical high-throughput DNA sequencing run
within a few minutes.  Together these advances enable timely and
accurate analysis of large metagenomics data sets on conventional
desktop computers.

[Centrifuge]:     http://www.ccb.jhu.edu/software/centrifuge

[Burrows-Wheeler Transform]: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
[FM Index]:        http://en.wikipedia.org/wiki/FM-index

[GPLv3 license]:   http://www.gnu.org/licenses/gpl-3.0.html

Obtaining Centrifuge
==================

Download the Centrifuge source and binaries from the Releases section on the right side.
Binaries are available for Intel architectures (`x86_64`) running Linux and Mac OS X.

Building from source
--------------------

Building Centrifuge from source requires a GNU-like environment with GCC, GNU Make
and other basics.  It should be possible to build Centrifuge on most vanilla Linux
installations or on a Mac installation with [Xcode] installed.  Centrifuge can
also be built on Windows using [Cygwin] or [MinGW] (MinGW recommended). For a
MinGW build, the choice of compiler matters because it determines whether
32-bit or 64-bit code can be compiled. If both 32-bit and 64-bit builds are
needed on the same machine, a multilib MinGW must be installed. [MSYS], the
[zlib] library and, depending on the architecture, the [pthreads] library are
also required. We recommend a 64-bit build since it has clear advantages for
real-life research problems. To simplify the MinGW setup, it may be worth
investigating popular MinGW personal builds, since these come already prepared
with most of the toolchains needed.

First, download the [source package] from the Releases section on the right side.
Unzip the file, change to the unzipped directory, and build the
Centrifuge tools by running GNU `make` (usually with the command `make`, but
sometimes with `gmake`) with no arguments.  If building with MinGW, run `make`
from the MSYS environment.

Centrifuge uses multithreading to speed up execution on SMP architectures
where this is possible. On POSIX platforms (Linux, Mac OS, etc.) it needs the
pthread library. Although the pthread library can also be used on non-POSIX
platforms such as Windows, for performance reasons Centrifuge will try to use
Windows native multithreading if possible.

For SRA data access in Centrifuge, please download and install the [NCBI-NGS] toolkit.
When running `make`, specify additional variables as follows:
`make USE_SRA=1 NCBI_NGS_DIR=/path/to/NCBI-NGS-directory NCBI_VDB_DIR=/path/to/NCBI-NGS-directory`,
where `NCBI_NGS_DIR` and `NCBI_VDB_DIR` will be used in the Makefile for the `-I` and `-L` compilation options.
For example, `$(NCBI_NGS_DIR)/include` and `$(NCBI_NGS_DIR)/lib64` will be used.

[Cygwin]:   http://www.cygwin.com/
[MinGW]:    http://www.mingw.org/
[MSYS]:     http://www.mingw.org/wiki/msys
[zlib]:     http://cygwin.com/packages/mingw-zlib/
[pthreads]: http://sourceware.org/pthreads-win32/
[GnuWin32]: http://gnuwin32.sf.net/packages/coreutils.htm
[Xcode]:    http://developer.apple.com/xcode/
[Github site]: https://github.com/infphilo/centrifuge
[NCBI-NGS]: https://github.com/ncbi/ngs/wiki/Downloads

Running Centrifuge
=============

Adding to PATH
--------------

By adding your new Centrifuge directory to your [PATH environment variable], you
ensure that whenever you run `centrifuge`, `centrifuge-build`, `centrifuge-download` or `centrifuge-inspect`
from the command line, you will get the version you just installed without
having to specify the entire path.  This is recommended for most users.  To do
this, follow your operating system's instructions for adding the directory to
your [PATH].

If you would like to install Centrifuge by copying the Centrifuge executable files
to an existing directory in your [PATH], make sure that you copy all the
executables, including `centrifuge`, `centrifuge-class`, `centrifuge-build`, `centrifuge-build-bin`, `centrifuge-download`, `centrifuge-inspect`
and `centrifuge-inspect-bin`. Furthermore, you need the programs
in the scripts/ folder if you opt for genome compression in the database construction.

[PATH environment variable]: http://en.wikipedia.org/wiki/PATH_(variable)
[PATH]: http://en.wikipedia.org/wiki/PATH_(variable)

Before running Centrifuge
-----------------

Classification is considerably different from alignment in that classification is performed on a large set of genomes as opposed to on just one reference genome as in alignment.  Currently, an enormous number of complete genomes are available in GenBank (e.g. >4,000 bacterial genomes, >10,000 viral genomes, …).  These genomes are organized in a taxonomic tree where each genome is located at the bottom of the tree, at the strain or subspecies level.  On the taxonomic tree, genomes have ancestors usually situated at the species level, and those ancestors in turn have ancestors at the genus level, and so on up through the family, order, class, phylum and kingdom levels to the root.

Given the gigantic number of genomes available, which continues to expand at a rapid rate, and the development of the taxonomic tree, which continues to evolve with new advancements in research, we have designed Centrifuge to be flexible and general enough to reflect this huge database.  We provide several standard indexes that will meet most users’ needs (see the side panel - Indexes).  In our approach, the indexes include not only raw genome sequences, but also genome names/sizes and taxonomic trees.  This enables users to perform additional analyses on Centrifuge’s classification output without the need to download extra database sources.  This also eliminates the potential issue of discrepancy between the indexes we provide and the databases users may otherwise download.  We plan to provide a couple of additional standard indexes in the near future, and update the indexes on a regular basis.

We encourage first time users to take a look at and follow a `small example` that illustrates how to build an index, how to run Centrifuge using the index, how to interpret the classification results, and how to extract additional genomic information from the index.  For those who choose to build customized indexes, please take a close look at the following description.

Database download and index building
-----------------

Centrifuge indexes can be built from arbitrary sequences. Usually an ensemble of
genomes is used, such as all complete microbial genomes in the RefSeq database
or all sequences in the BLAST nt database.

To map sequence identifiers to taxonomy IDs, and taxonomy IDs to names and
their parents, three files are necessary in addition to the sequence files:

 - taxonomy tree: typically nodes.dmp from the NCBI taxonomy dump. Links taxonomy IDs to their parents
 - names file: typically names.dmp from the NCBI taxonomy dump. Links taxonomy IDs to their scientific name
 - a tab-separated sequence ID to taxonomy ID mapping

When using the provided scripts to download the genomes, these files are automatically downloaded or generated. 
When using a custom taxonomy or sequence files, please refer to the section `Custom database` to learn more about their format.

### Building index on all complete bacterial and viral genomes

Use `centrifuge-download` to download genomes from NCBI. The following two commands download
the NCBI taxonomy to `taxonomy/` in the current directory, and all complete archaeal,
bacterial and viral genomes to `library/`. Low-complexity regions in the genomes are masked after
download (parameter `-m`) using blast+'s `dustmasker`. `centrifuge-download` outputs tab-separated 
sequence ID to taxonomy ID mappings to standard out, which are required by `centrifuge-build`.

    centrifuge-download -o taxonomy taxonomy
    centrifuge-download -o library -m -d "archaea,bacteria,viral" refseq > seqid2taxid.map

To build the index, first concatenate all downloaded sequences into a single file, and then
run `centrifuge-build`:
    
    cat library/*/*.fna > input-sequences.fna

    ## build centrifuge index with 4 threads
    centrifuge-build -p 4 --conversion-table seqid2taxid.map \
                     --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
                     input-sequences.fna abv

After building the index, all files except the `*.[123].cf` index files may be removed;
that is, the files in the `library/` and `taxonomy/` directories are no longer needed.
If you also want to include the human and/or mouse genome, add their sequences to
the `library/` folder before building the index, as described in the next section.

### Adding human or mouse genome to the index
The human and mouse genomes can also be downloaded using `centrifuge-download`. They are in the
domain "vertebrate_mammalian" (argument `-d`), are assembled at the chromosome level (argument `-a`)
and categorized as reference genomes by RefSeq (`-c`). The argument `-t` takes a comma-separated
list of taxonomy IDs - e.g. `9606` for human and `10090` for mouse:

    # download mouse and human reference genomes
    centrifuge-download -o library -d "vertebrate_mammalian" -a "Chromosome" -t 9606,10090 -c 'reference genome' refseq >> seqid2taxid.map
    # only human
    centrifuge-download -o library -d "vertebrate_mammalian" -a "Chromosome" -t 9606 -c 'reference genome' refseq >> seqid2taxid.map
    # only mouse
    centrifuge-download -o library -d "vertebrate_mammalian" -a "Chromosome" -t 10090 -c 'reference genome' refseq >> seqid2taxid.map

### nt database

NCBI BLAST's nt database contains all spliced non-redundant coding
sequences from multiple databases, inferred from genomic
sequences. Traditionally used with BLAST, a download of the FASTA is
provided on the NCBI homepage. Building an index with any database
requires the user to create a sequence ID to taxonomy ID map, which
can be generated from a GI taxid dump:

    wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz
    gunzip nt.gz && mv -v nt nt.fa

    # Get mapping file
    wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz
    gunzip -c gi_taxid_nucl.dmp.gz | sed 's/^/gi|/' > gi_taxid_nucl.map

    # build index using 16 cores and a small bucket size, which will require less memory
    centrifuge-build -p 16 --bmax 1342177280 --conversion-table gi_taxid_nucl.map \
                     --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
                     nt.fa nt

### Custom database

To build a custom database, you need to provide the following four files to `centrifuge-build`:

  - `--conversion-table`: tab-separated file mapping sequence IDs to taxonomy IDs. Sequence IDs are the header up to the first space or second pipe (`|`).  
  - `--taxonomy-tree`: `\t|\t`-separated file mapping taxonomy IDs to their parents and rank, up to the root of the tree. When using NCBI taxonomy IDs, this will be the `nodes.dmp` from `ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz`.
  - `--name-table`: `\t|\t`-separated file mapping taxonomy IDs to a name. A further column (typically column 4) must specify `scientific name`. When using NCBI taxonomy IDs, `names.dmp` is the appropriate file.
  - reference sequences: The ID of the sequences are the header up to the first space or second pipe (`|`)

When using custom taxonomy IDs, use only positive integers greater than or equal to `1`, and use `1` for the root of the tree.
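
The sequence-ID rule in the list above ("the header up to the first space or second pipe") can be sketched in Python as follows. This is one reading of that rule, not code taken from Centrifuge; the function name `sequence_id` and the sample headers are made up for illustration.

```python
def sequence_id(header):
    """Return the sequence ID for a FASTA header line.

    Assumed rule (from the text above): drop the leading '>', then cut the
    header at the first space or at the second pipe, whichever comes first.
    """
    text = header.strip()
    if text.startswith(">"):
        text = text[1:]
    text = text.split(" ", 1)[0]      # cut at the first space
    parts = text.split("|")
    return "|".join(parts[:2])        # cut at the second pipe, if any


# Hypothetical headers, for illustration only
print(sequence_id(">gi|4|ref|NC_000001.1| Example sequence"))
print(sequence_id(">Seq1 some description"))
```

Under this reading, the IDs in the conversion table would have to match the strings this function returns for the corresponding FASTA headers.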

#### More info on `--taxonomy-tree` and `--name-table`

The format of these files is based on `nodes.dmp` and `names.dmp` from the NCBI taxonomy database dump.

- Field terminator is `\t|\t`
- Row terminator is `\t|\n`

The `taxonomy-tree` / nodes.dmp file consists of taxonomy nodes. The description for each node includes the following
fields:

    tax_id                  -- node id in GenBank taxonomy database
    parent tax_id           -- parent node id in GenBank taxonomy database
    rank                    -- rank of this node (superkingdom, kingdom, ..., no rank)

Further fields are ignored.

The `name-table` / names.dmp is the taxonomy names file:

    tax_id                  -- the id of node associated with this name
    name_txt                -- name itself
    unique name             -- the unique variant of this name if name not unique
    name class              -- (scientific name, synonym, common name, ...)

`name class` **has** to be `scientific name` to be included in the build. All other lines are ignored.
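
As a rough illustration of the format described above, the following Python sketch splits rows on the `\t|\t` field terminator, drops the trailing `\t|` row terminator, and keeps only `scientific name` rows from the name table. The helper names (`parse_dmp_line`, `load_tree`, `load_names`) and the sample rows are assumptions for this example; this is not how `centrifuge-build` itself reads the files.

```python
def parse_dmp_line(line):
    """Split one nodes.dmp / names.dmp style row into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]                      # drop the row terminator
    return [field.strip() for field in line.split("\t|\t")]


def load_tree(lines):
    """Map tax_id -> (parent tax_id, rank); further fields are ignored."""
    tree = {}
    for line in lines:
        if not line.strip():
            continue
        tax_id, parent, rank = parse_dmp_line(line)[:3]
        tree[int(tax_id)] = (int(parent), rank)
    return tree


def load_names(lines):
    """Map tax_id -> name, keeping only rows whose class is 'scientific name'."""
    names = {}
    for line in lines:
        if not line.strip():
            continue
        fields = parse_dmp_line(line)
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


# Made-up rows in the documented layout
node_rows = ["1\t|\t1\t|\tno rank\t|\n", "10\t|\t1\t|\tkingdom\t|\n"]
name_rows = [
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "10\t|\tBacteria\t|\t\t|\tscientific name\t|\n",
    "10\t|\teubacteria\t|\t\t|\tsynonym\t|\n",   # ignored: not a scientific name
]
print(load_tree(node_rows))
print(load_names(name_rows))
```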

#### Example

*Conversion table `ex.conv`*: 
    
    Seq1	11
    Seq2	12
    Seq3	13
    Seq4	11

*Taxonomy tree `ex.tree`*: 

    1	|	1	|	root
    10	|	1	|	kingdom
    11	|	10	|	species
    12	|	10	|	species
    13	|	1	|	species

*Name table `ex.name`*:

    1	|	root	|		|	scientific name	|
    10	|	Bacteria	|		|	scientific name	|
    11	|	Bacterium A	|		|	scientific name	|
    12	|	Bacterium B	|		|	scientific name	|
    13	|	Some other species	|		|	scientific name	|

*Reference sequences `ex.fa`*:

    >Seq1
    AAAACGTACGA.....
    >Seq2
    AAAACGTACGA.....
    >Seq3
    AAAACGTACGA.....
    >Seq4
    AAAACGTACGA.....

To build the database, call

    centrifuge-build --conversion-table ex.conv \
                     --taxonomy-tree ex.tree --name-table ex.name \
                     ex.fa ex

which results in three index files named `ex.1.cf`, `ex.2.cf` and `ex.3.cf`.
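
Before building, it can be useful to check that the three tables agree with each other. The sketch below is a hypothetical pre-flight check, not something `centrifuge-build` is documented to perform: it verifies that every taxonomy ID in the conversion table can be followed up to the root (`1`) and has a scientific name. The dictionaries mirror the small example, except that the name table deliberately omits taxonomy ID `13` so that one problem is reported.

```python
def check_inputs(conversion, tree, names, root=1):
    """Return a list of human-readable problems; an empty list means OK.

    conversion: {sequence ID: tax ID}
    tree:       {tax ID: parent tax ID}
    names:      {tax ID: scientific name}
    """
    problems = []
    for seq_id, tax_id in conversion.items():
        seen = set()
        node = tax_id
        # Walk up the tree until the root, guarding against gaps and cycles.
        while node != root:
            if node not in tree:
                problems.append(f"{seq_id}: tax ID {node} missing from tree")
                break
            if node in seen:
                problems.append(f"{seq_id}: cycle at tax ID {node}")
                break
            seen.add(node)
            node = tree[node]
        if tax_id not in names:
            problems.append(f"{seq_id}: tax ID {tax_id} has no scientific name")
    return problems


conversion = {"Seq1": 11, "Seq2": 12, "Seq3": 13, "Seq4": 11}
tree = {1: 1, 10: 1, 11: 10, 12: 10, 13: 1}
# Tax ID 13 is left out on purpose to show a reported problem.
names = {1: "root", 10: "Bacteria", 11: "Bacterium A", 12: "Bacterium B"}

for problem in check_inputs(conversion, tree, names):
    print(problem)
```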

### Centrifuge classification output

The following example shows classification assignments for a read.  The assignment output has 8 columns.

    readID    seqID   taxID score	   2ndBestScore	   hitLength	queryLength	numMatches
    1_1	      gi|4    9646  4225	   0		       80	80		1

    The first column is the read ID from a raw sequencing read (e.g., 1_1 in the example).
    The second column is the sequence ID of the genomic sequence to which the read is classified (e.g., gi|4).
    The third column is the taxonomic ID of the genomic sequence in the second column (e.g., 9646).
    The fourth column is the score for the classification, which is the weighted sum of hits (e.g., 4225).
    The fifth column is the score for the next best classification (e.g., 0).
    The sixth column is the approximate number of base pairs of the read that match the genomic sequence (e.g., 80).
    The seventh column is the length of the read or the combined length of mate pairs (e.g., 80).
    The eighth column is the number of classifications for this read, indicating how many assignments were made (e.g., 1).
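
For downstream scripting, the eight columns can be read into typed records. The Python sketch below assumes the file is tab-separated and starts with the header row shown in the example; the `Assignment` type and `read_assignments` function are names invented here.

```python
import io
from typing import NamedTuple


class Assignment(NamedTuple):
    read_id: str
    seq_id: str
    tax_id: int
    score: int
    second_best_score: int
    hit_length: int
    query_length: int
    num_matches: int


def read_assignments(handle):
    """Yield one Assignment per data row of a tab-separated output file."""
    next(handle, None)                        # skip the header row
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue                          # skip blank or malformed rows
        read_id, seq_id, *numbers = fields[:8]
        yield Assignment(read_id, seq_id, *map(int, numbers))


# The example row from above, written with explicit tabs
sample = (
    "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n"
    "1_1\tgi|4\t9646\t4225\t0\t80\t80\t1\n"
)
for assignment in read_assignments(io.StringIO(sample)):
    print(assignment.read_id, assignment.tax_id, assignment.hit_length)
```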

### Centrifuge summary output (the default filename is centrifuge_report.tsv)

The following example shows a classification summary for each genome or taxonomic unit.  The summary output has 7 columns.

    name      	      	      		     	     	      	     	taxID	taxRank	   genomeSize 	numReads   numUniqueReads   abundance
    Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis	36870	leaf	   703004		5981	   5964	            0.0152317

    The first column is the name of a genome, or the name corresponding to a taxonomic ID (the second column) at a rank higher than the strain (e.g., Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis).
    The second column is the taxonomic ID (e.g., 36870).
    The third column is the taxonomic rank (e.g., leaf).
    The fourth column is the length of the genome sequence (e.g., 703004).
    The fifth column is the number of reads classified to this genomic sequence including multi-classified reads (e.g., 5981).
    The sixth column is the number of reads uniquely classified to this genomic sequence (e.g., 5964).
    The seventh column is the proportion of this genome normalized by its genomic length (e.g., 0.0152317).

As the GenBank database is incomplete (i.e., many more genomes remain to be identified and added), and reads have sequencing errors, classification programs including Centrifuge often report many false assignments.  To perform more conservative analyses, users may want to discard assignments for reads whose matching length (hitLength, 6th column in the output of Centrifuge) is 40% or less of the read length (queryLength, 7th column).  It may also be helpful to filter assignments by score (4th column).  Our future research plans include developing methods that estimate confidence scores for assignments.
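
The conservative filtering suggested above could look roughly like this in Python. The sketch assumes a tab-separated file with a header row and the eight-column layout from the classification output section, with the matching length (hitLength) in the sixth column and the read length (queryLength) in the seventh. The 40% cut-off comes from the text; the function names and the `min_score` parameter are illustrative.

```python
def keep_assignment(hit_length, query_length, score,
                    min_fraction=0.4, min_score=0):
    """True if the matched fraction exceeds min_fraction and score >= min_score."""
    if query_length <= 0:
        return False
    return hit_length / query_length > min_fraction and score >= min_score


def filter_lines(lines, min_fraction=0.4, min_score=0):
    """Yield the header plus every row that passes keep_assignment."""
    for i, line in enumerate(lines):
        if i == 0:
            yield line                        # keep the header row
            continue
        fields = line.rstrip("\n").split("\t")
        score = int(fields[3])                # 4th column: score
        hit_length = int(fields[5])           # 6th column: hitLength
        query_length = int(fields[6])         # 7th column: queryLength
        if keep_assignment(hit_length, query_length, score,
                           min_fraction, min_score):
            yield line


# Made-up rows: the second read matches only 30 of 80 bases and is dropped
rows = [
    "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n",
    "1_1\tgi|4\t9646\t4225\t0\t80\t80\t1\n",
    "1_2\tgi|7\t9913\t225\t0\t30\t80\t1\n",
]
print("".join(filter_lines(rows)), end="")
```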

### Kraken-style report

`centrifuge-kreport` can be used to make a Kraken-style report from the Centrifuge output including taxonomy information:

`centrifuge-kreport -x <centrifuge index> <centrifuge out file>`

Inspecting the Centrifuge index
-----------------------

The index can be inspected with `centrifuge-inspect`.  To extract raw sequences:

    centrifuge-inspect <centrifuge index>

Extract the sequence ID to taxonomy ID conversion table from the index:

    centrifuge-inspect --conversion-table <centrifuge index>

Extract the taxonomy tree from the index:

    centrifuge-inspect --taxonomy-tree <centrifuge index>

Extract the lengths of the sequences from the index (each row has two columns: taxonomic ID and length):

    centrifuge-inspect --size-table <centrifuge index>

Extract the names from the index (each row has two columns: taxonomic ID and name):

    centrifuge-inspect --name-table <centrifuge index>
    
Wrapper
-------

The `centrifuge`, `centrifuge-build` and `centrifuge-inspect` executables are actually 
wrapper scripts that call binary programs as appropriate. Also, the `centrifuge` wrapper
provides some key functionality, like the ability to handle compressed inputs,
and the functionality for `--un`, `--al` and related options.

It is recommended that you always run the centrifuge wrappers and not run the
binaries directly.

Performance tuning
------------------

1.  If your computer has multiple processors/cores, use `-p NTHREADS`

    The `-p` option causes Centrifuge to launch a specified number of parallel
    search threads.  Each thread runs on a different processor/core and all
    threads find alignments in parallel, increasing alignment throughput by
    approximately a multiple of the number of threads (though in practice,
    speedup is somewhat worse than linear).

Command Line
------------

### Usage

    centrifuge [options]* -x <centrifuge-idx> {-1 <m1> -2 <m2> | -U <r> | --sra-acc <SRA accession number>} [--report-file <report file name> -S <classification output file name>]

### Main arguments

    -x <centrifuge-idx>

The basename of the index for the reference genomes.  The basename is the name of
any of the index files up to but not including the final `.1.cf` / etc.  
`centrifuge` looks for the specified index first in the current directory,
then in the directory specified in the `CENTRIFUGE_INDEXES` environment variable.
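
The basename convention and the lookup order described above can be sketched as follows. This only mirrors the prose (current directory first, then `CENTRIFUGE_INDEXES`) and checks for the `.1.cf` file; the function names are hypothetical, and the real wrapper may behave differently in details.

```python
import os
import re


def index_basename(index_file):
    """Strip a trailing '.1.cf', '.2.cf' or '.3.cf' from an index file name."""
    return re.sub(r"\.[123]\.cf$", "", index_file)


def find_index_dir(basename, cwd=os.curdir, env=None):
    """Return the first directory that holds '<basename>.1.cf', else None.

    Search order follows the text: the current directory first, then the
    directory named by the CENTRIFUGE_INDEXES environment variable (if set).
    """
    env = os.environ if env is None else env
    candidates = [cwd]
    if env.get("CENTRIFUGE_INDEXES"):
        candidates.append(env["CENTRIFUGE_INDEXES"])
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, basename + ".1.cf")):
            return directory
    return None


print(index_basename("abv.1.cf"))
print(find_index_dir("abv"))
```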

    -1 <m1>

Comma-separated list of files containing mate 1s (filename usually includes
`_1`), e.g. `-1 flyA_1.fq,flyB_1.fq`.  Sequences specified with this option must
correspond file-for-file and read-for-read with those specified in `<m2>`. Reads
may be a mix of different lengths. If `-` is specified, `centrifuge` will read the
mate 1s from the "standard in" or "stdin" filehandle.

    -2 <m2>

Comma-separated list of files containing mate 2s (filename usually includes
`_2`), e.g. `-2 flyA_2.fq,flyB_2.fq`.  Sequences specified with this option must
correspond file-for-file and read-for-read with those specified in `<m1>`. Reads
may be a mix of different lengths. If `-` is specified, `centrifuge` will read the
mate 2s from the "standard in" or "stdin" filehandle.

    -U <r>

Comma-separated list of files containing unpaired reads to be aligned, e.g.
`lane1.fq,lane2.fq,lane3.fq,lane4.fq`.  Reads may be a mix of different lengths.
If `-` is specified, `centrifuge` gets the reads from the "standard in" or "stdin"
filehandle.

    --sra-acc <SRA accession number>

Comma-separated list of SRA accession numbers, e.g. `--sra-acc SRR353653,SRR353654`.
Information about read types is available at `http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?sp=runinfo&acc=<sra-acc>&retmode=xml`,
where `<sra-acc>` is the SRA accession number.  If you run Centrifuge on a computer cluster, it is recommended to disable SRA-related caching (see the instructions at [SRA-MANUAL]).

[SRA-MANUAL]:	     https://github.com/ncbi/sra-tools/wiki/Toolkit-Configuration

    -S <filename>

File to write classification results to.  By default, assignments are written to the
"standard out" or "stdout" filehandle (i.e. the console).

    --report-file <filename>

File to write a classification summary to (default: centrifuge_report.tsv).

### Options

#### Input options

    -q

Reads (specified with `<m1>`, `<m2>`, `<r>`) are FASTQ files.  FASTQ files
usually have extension `.fq` or `.fastq`.  FASTQ is the default format.  See
also: `--solexa-quals` and `--int-quals`.

    --qseq

Reads (specified with `<m1>`, `<m2>`, `<r>`) are QSEQ files.  QSEQ files usually
end in `_qseq.txt`.  See also: `--solexa-quals` and `--int-quals`.

    -f

Reads (specified with `<m1>`, `<m2>`, `<r>`) are FASTA files.  FASTA files
usually have extension `.fa`, `.fasta`, `.mfa`, `.fna` or similar.  FASTA files
do not have a way of specifying quality values, so when `-f` is set, the result
is as if `--ignore-quals` is also set.

    -r

Reads (specified with `<m1>`, `<m2>`, `<r>`) are files with one input sequence
per line, without any other information (no read names, no qualities).  When
`-r` is set, the result is as if `--ignore-quals` is also set.

    -c

The read sequences are given on the command line.  I.e. `<m1>`, `<m2>` and
`<r>` are comma-separated lists of reads rather than lists of read files.
There is no way to specify read names or qualities, so `-c` also implies
`--ignore-quals`.

    -s/--skip <int>

Skip (i.e. do not align) the first `<int>` reads or pairs in the input.

    -u/--qupto <int>

Align the first `<int>` reads or read pairs from the input (after the
`-s`/`--skip` reads or pairs have been skipped), then stop.  Default: no limit.

    -5/--trim5 <int>

Trim `<int>` bases from 5' (left) end of each read before alignment (default: 0).

    -3/--trim3 <int>

Trim `<int>` bases from 3' (right) end of each read before alignment (default:
0).
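The selection and trimming options above can be pictured as a small preprocessing step. The following is a minimal sketch, not Centrifuge's own code: the function name `select_and_trim` and its parameters are hypothetical, and the handling of reads shorter than the total trim length is an assumption.

```python
from itertools import islice

def select_and_trim(reads, skip=0, upto=None, trim5=0, trim3=0):
    """Yield reads after skip/upto selection and 5'/3' trimming (illustrative only)."""
    # -s/--skip drops the first `skip` reads; -u/--qupto keeps the next `upto`.
    stop = None if upto is None else skip + upto
    for seq in islice(reads, skip, stop):
        end = len(seq) - trim3
        # Assumption: a read shorter than trim5 + trim3 becomes an empty string.
        yield seq[trim5:max(end, trim5)]

reads = ["AAAACCCCGGGG", "TTTTGGGGCCCC", "ACGTACGTACGT", "GGGGTTTTAAAA"]
print(list(select_and_trim(reads, skip=1, upto=2, trim5=2, trim3=3)))
```

Here the first read is skipped, the next two are kept, and each kept read loses 2 bases on the left and 3 on the right.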

    --phred33

Input qualities are ASCII chars equal to the [Phred quality] plus 33.  This is
also called the "Phred+33" encoding, which is used by the very latest Illumina
pipelines.

[Phred quality]: http://en.wikipedia.org/wiki/Phred_quality_score

    --phred64

Input qualities are ASCII chars equal to the [Phred quality] plus 64.  This is
also called the "Phred+64" encoding.

    --solexa-quals

Convert input qualities from [Solexa][Phred quality] (which can be negative) to
[Phred][Phred quality] (which can't).  This scheme was used in older Illumina GA
Pipeline versions (prior to 1.3).  Default: off.

    --int-quals

Quality values are represented in the read input file as space-separated ASCII
integers, e.g., `40 40 30 40`..., rather than ASCII characters, e.g., `II?I`....
 Integers are treated as being on the [Phred quality] scale unless
`--solexa-quals` is also specified. Default: off.
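To make the quality encodings above concrete, here is a minimal sketch of how the different representations relate to each other. The function names are hypothetical, and the rounding applied in the Solexa conversion is an assumption; Centrifuge's internal conversion may differ in detail.

```python
import math

def decode_ascii_quals(qual_string, offset=33):
    """Decode an ASCII quality string; offset is 33 (Phred+33) or 64 (Phred+64)."""
    return [ord(ch) - offset for ch in qual_string]

def parse_int_quals(field):
    """Parse space-separated integer qualities, as expected with --int-quals."""
    return [int(tok) for tok in field.split()]

def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled quality (may be negative) to the Phred scale.

    Uses Q_phred = 10 * log10(10^(Q_solexa / 10) + 1); rounding to the nearest
    integer is an assumption made for this sketch.
    """
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))

print(decode_ascii_quals("II?I"))        # Phred+33 string from the example above
print(parse_int_quals("40 40 30 40"))    # the same qualities as integers
print([solexa_to_phred(q) for q in (-5, 0, 40)])
```

The first two calls produce the same list, which is the correspondence between `II?I` and `40 40 30 40` mentioned under `--int-quals`.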

#### Classification

    --min-hitlen <int>

Minimum length of partial hits, which must be greater than 15 (default: 22).

    -k <int>

Search for at most `<int>` distinct, primary assignments for each read or pair.
Primary assignments are assignments whose score is equal to or higher than that of any other assignment.
If there are more primary assignments than this value,
the search will merge some of the assignments into a higher taxonomic rank.
The assignment score for a paired-end assignment equals the sum of the assignment scores of the individual mates. 
Default: 5

    --host-taxids

A comma-separated list of taxonomic IDs that will be preferred in classification procedure.
The descendants from these IDs will also be preferred.  In case some of a read's assignments correspond to
these taxonomic IDs, only those corresponding assignments will be reported.

    --exclude-taxids

A comma-separated list of taxonomic IDs that will be excluded in classification procedure.
The descendants from these IDs will also be excluded.
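The effect of `--host-taxids` and `--exclude-taxids` on a read's candidate assignments can be sketched as a filter over a taxonomy tree. This is only an illustration under stated assumptions: the tree is a hypothetical `parent` dictionary with made-up IDs, the function names are invented for this example, and applying exclusion before preference is an assumption about the order of operations.

```python
def is_descendant_or_self(taxid, ancestors, parent):
    """Walk from taxid towards the root; True if any node on the path is in ancestors."""
    seen = set()
    while taxid not in seen:
        if taxid in ancestors:
            return True
        seen.add(taxid)
        taxid = parent.get(taxid, taxid)  # the root (or an unknown ID) maps to itself
    return False

def filter_assignments(taxids, parent, host=(), exclude=()):
    """Drop excluded taxa, then keep only host-related assignments if any exist."""
    host, exclude = set(host), set(exclude)
    kept = [t for t in taxids if not is_descendant_or_self(t, exclude, parent)]
    preferred = [t for t in kept if is_descendant_or_self(t, host, parent)]
    return preferred if preferred else kept

# Toy tree with made-up IDs: 1 is the root, 11 descends from 10, 21 from 20.
parent = {1: 1, 10: 1, 11: 10, 20: 1, 21: 20}
print(filter_assignments([11, 21], parent, host=[10]))     # only the host-related hit
print(filter_assignments([11, 21], parent, exclude=[20]))  # descendant of 20 removed
```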

#### Alignment options

    --n-ceil <func>

Sets a function governing the maximum number of ambiguous characters (usually
`N`s and/or `.`s) allowed in a read as a function of read length.  For instance,
specifying `L,0,0.15` sets the N-ceiling function `f` to `f(x) = 0 + 0.15 * x`,
where x is the read length.  See also: [setting function options].  Reads
exceeding this ceiling are [filtered out].  Default: `L,0,0.15`.
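As a worked example of the N-ceiling, the sketch below evaluates a linear function specification such as `L,0,0.15` and checks a read against it. The helper names are hypothetical, only the linear (`L`) form shown above is handled, and whether the ceiling is rounded before comparison is an assumption left open here.

```python
def n_ceiling(spec, read_length):
    """Evaluate a linear function spec 'L,<constant>,<coefficient>' at read_length."""
    kind, constant, coefficient = spec.split(",")
    if kind != "L":
        # Other function types are described under "setting function options".
        raise ValueError("this sketch only handles linear (L) functions")
    return float(constant) + float(coefficient) * read_length

def passes_n_filter(read, spec="L,0,0.15"):
    """True if the number of ambiguous characters does not exceed the ceiling."""
    ambiguous = sum(1 for base in read.upper() if base in "N.")
    return ambiguous <= n_ceiling(spec, len(read))

read_ok = "ACGT" * 20 + "N" * 10 + "ACGT" * 2 + "AC"   # 100 bases, 10 Ns
read_bad = "ACGT" * 20 + "N" * 20                        # 100 bases, 20 Ns
print(n_ceiling("L,0,0.15", 100), passes_n_filter(read_ok), passes_n_filter(read_bad))
```

With the default function, a 100-base read may contain roughly 15 ambiguous characters before it is filtered out.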

    --ignore-quals

When calculating a mismatch penalty, always consider the quality value at the
mismatched position to be the highest possible, regardless of the actual value. 
I.e. input is treated as though all quality values are high.  This is also the
default behavior when the input doesn't specify quality values (e.g. in `-f`,
`-r`, or `-c` modes).

    --nofw/--norc

If `--nofw` is specified, `centrifuge` will not attempt to align unpaired reads to
the forward (Watson) reference strand.  If `--norc` is specified, `centrifuge` will
not attempt to align unpaired reads against the reverse-complement (Crick)
reference strand. In paired-end mode, `--nofw` and `--norc` pertain to the
fragments; i.e. specifying `--nofw` causes `centrifuge` to explore only those
paired-end configurations corresponding to fragments from the reverse-complement
(Crick) strand.  Default: both strands enabled. 

#### Paired-end options

    --fr/--rf/--ff

The upstream/downstream mate orientations for a valid paired-end alignment
against the forward reference strand.  E.g., if `--fr` is specified and there is
a candidate paired-end alignment where mate 1 appears upstream of the reverse
complement of mate 2 and the fragment length constraints (`-I` and `-X`) are
met, that alignment is valid.  Also, if mate 2 appears upstream of the reverse
complement of mate 1 and all other constraints are met, that too is valid.
`--rf` likewise requires that an upstream mate1 be reverse-complemented and a
downstream mate2 be forward-oriented. ` --ff` requires both an upstream mate 1
and a downstream mate 2 to be forward-oriented.  Default: `--fr` (appropriate
for Illumina's Paired-end Sequencing Assay).

#### Output options

    -t/--time

Print the wall-clock time required to load the index files and align the reads. 
This is printed to the "standard error" ("stderr") filehandle.  Default: off.

    --un <path>
    --un-gz <path>
    --un-bz2 <path>

Write unpaired reads that fail to align to file at `<path>`.  These reads
correspond to the SAM records with the FLAGS `0x4` bit set and neither the
`0x40` nor `0x80` bits set.  If `--un-gz` is specified, output will be gzip
compressed. If `--un-bz2` is specified, output will be bzip2 compressed.  Reads
written in this way will appear exactly as they did in the input file, without
any modification (same sequence, same name, same quality string, same quality
encoding).  Reads will not necessarily appear in the same order as they did in
the input.

    --al <path>
    --al-gz <path>
    --al-bz2 <path>

Write unpaired reads that align at least once to file at `<path>`.  These reads
correspond to the SAM records with the FLAGS `0x4`, `0x40`, and `0x80` bits
unset.  If `--al-gz` is specified, output will be gzip compressed. If `--al-bz2`
is specified, output will be bzip2 compressed.  Reads written in this way will
appear exactly as they did in the input file, without any modification (same
sequence, same name, same quality string, same quality encoding).  Reads will
not necessarily appear in the same order as they did in the input.

    --un-conc <path>
    --un-conc-gz <path>
    --un-conc-bz2 <path>

Write paired-end reads that fail to align concordantly to file(s) at `<path>`.
These reads correspond to the SAM records with the FLAGS `0x4` bit set and
either the `0x40` or `0x80` bit set (depending on whether it's mate #1 or #2).
`.1` and `.2` strings are added to the filename to distinguish which file
contains mate #1 and mate #2.  If a percent symbol, `%`, is used in `<path>`,
the percent symbol is replaced with `1` or `2` to make the per-mate filenames.
Otherwise, `.1` or `.2` are added before the final dot in `<path>` to make the
per-mate filenames.  Reads written in this way will appear exactly as they did
in the input files, without any modification (same sequence, same name, same
quality string, same quality encoding).  Reads will not necessarily appear in
the same order as they did in the inputs.

    --al-conc <path>
    --al-conc-gz <path>
    --al-conc-bz2 <path>

Write paired-end reads that align concordantly at least once to file(s) at
`<path>`. These reads correspond to the SAM records with the FLAGS `0x4` bit
unset and either the `0x40` or `0x80` bit set (depending on whether it's mate #1
or #2). `.1` and `.2` strings are added to the filename to distinguish which
file contains mate #1 and mate #2.  If a percent symbol, `%`, is used in
`<path>`, the percent symbol is replaced with `1` or `2` to make the per-mate
filenames. Otherwise, `.1` or `.2` are added before the final dot in `<path>` to
make the per-mate filenames.  Reads written in this way will appear exactly as
they did in the input files, without any modification (same sequence, same name,
same quality string, same quality encoding).  Reads will not necessarily appear
in the same order as they did in the inputs.
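The per-mate file naming rule used by `--un-conc` and `--al-conc` can be sketched as follows. The function name `per_mate_paths` is hypothetical, and the behavior for a path without any dot (simply appending `.1` and `.2`) is an assumption based on the description above.

```python
import os

def per_mate_paths(path):
    """Return the (mate 1, mate 2) file names derived from <path>."""
    if "%" in path:
        # A percent symbol is replaced by the mate number.
        return path.replace("%", "1"), path.replace("%", "2")
    directory, name = os.path.split(path)
    stem, ext = os.path.splitext(name)
    # Otherwise .1 / .2 go before the final dot of the file name
    # (or at the end if the name has no extension).
    return (os.path.join(directory, f"{stem}.1{ext}"),
            os.path.join(directory, f"{stem}.2{ext}"))

print(per_mate_paths("unaligned_%.fq"))
print(per_mate_paths("unaligned.fq"))
```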

    --quiet

Print nothing besides alignments and serious errors.

    --met-file <path>

Write `centrifuge` metrics to file `<path>`.  Having alignment metrics can be useful
for debugging certain problems, especially performance issues.  See also:
`--met`.  Default: metrics disabled.

    --met-stderr

Write `centrifuge` metrics to the "standard error" ("stderr") filehandle.  This is
not mutually exclusive with `--met-file`.  Having alignment metrics can be
useful for debugging certain problems, especially performance issues.  See also:
`--met`.  Default: metrics disabled.

    --met <int>

Write a new `centrifuge` metrics record every `<int>` seconds.  Only matters if
either `--met-stderr` or `--met-file` is specified.  Default: 1.

#### Performance options

    -o/--offrate <int>

Override the offrate of the index with `<int>`.  If `<int>` is greater
than the offrate used to build the index, then some row markings are
discarded when the index is read into memory.  This reduces the memory
footprint of the aligner but requires more time to calculate text
offsets.  `<int>` must be greater than the value used to build the
index.

    -p/--threads NTHREADS

Launch `NTHREADS` parallel search threads (default: 1).  Threads will run on
separate processors/cores and synchronize when parsing reads and outputting
alignments.  Searching for alignments is highly parallel, and speedup is close
to linear.  Increasing `-p` increases Centrifuge's memory footprint. E.g. when
aligning to a human genome index, increasing `-p` from 1 to 8 increases the
memory footprint by a few hundred megabytes.  This option is only available if
`centrifuge` is linked with the `pthreads` library (i.e. if `BOWTIE_PTHREADS=0` is
not specified at build time).

    --reorder

Guarantees that output records are printed in an order corresponding to the
order of the reads in the original input file, even when `-p` is set greater
than 1.  Specifying `--reorder` and setting `-p` greater than 1 causes Centrifuge
to run somewhat slower and use somewhat more memory than if `--reorder` were
not specified.  Has no effect if `-p` is set to 1, since output order will
naturally correspond to input order in that case.

    --mm

Use memory-mapped I/O to load the index, rather than typical file I/O.
Memory-mapping allows many concurrent `centrifuge` processes on the same computer to
share the same memory image of the index (i.e. you pay the memory overhead just
once).  This facilitates memory-efficient parallelization of `centrifuge` in
situations where using `-p` is not possible or not preferable.

#### Other options

    --qc-filter

Filter out reads for which the QSEQ filter field is non-zero.  Only has an
effect when read format is `--qseq`.  Default: off.

    --seed <int>

Use `<int>` as the seed for pseudo-random number generator.  Default: 0.

    --non-deterministic

Normally, Centrifuge re-initializes its pseudo-random generator for each read.  It
seeds the generator with a number derived from (a) the read name, (b) the
nucleotide sequence, (c) the quality sequence, (d) the value of the `--seed`
option.  This means that if two reads are identical (same name, same
nucleotides, same qualities) Centrifuge will find and report the same classification(s)
for both, even if there was ambiguity.  When `--non-deterministic` is specified,
Centrifuge re-initializes its pseudo-random generator for each read using the
current time.  This means that Centrifuge will not necessarily report the same
classification for two identical reads.  This is counter-intuitive for some users,
but might be more appropriate in situations where the input consists of many
identical reads.
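The deterministic behavior described above can be illustrated with a small sketch: a per-read seed is derived from the read name, sequence, qualities and the `--seed` value, so identical reads always lead to the same random choice. This is not Centrifuge's actual hashing scheme; the use of CRC32 and the function names are assumptions made purely for illustration.

```python
import random
import zlib

def read_seed(name, seq, quals, seed=0):
    """Derive a per-read seed from the read's content and the global seed."""
    key = "\t".join((name, seq, quals, str(seed))).encode("utf-8")
    return zlib.crc32(key)  # illustrative hash, not the one used by Centrifuge

def pick_assignment(candidates, name, seq, quals, seed=0):
    """Break ties among equally good candidates reproducibly."""
    rng = random.Random(read_seed(name, seq, quals, seed))
    return rng.choice(candidates)

candidates = [9913, 9646]
first = pick_assignment(candidates, "C_1", "ACGTACGT", "IIIIIIII")
second = pick_assignment(candidates, "C_1", "ACGTACGT", "IIIIIIII")
print(first == second)  # identical reads give identical choices
```

Seeding from the current time instead, as `--non-deterministic` does, would remove this guarantee.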

    --version

Print version information and quit.

    -h/--help

Print usage information and quit.

The `centrifuge-build` indexer
===========================

`centrifuge-build` builds a Centrifuge index from a set of DNA sequences.
`centrifuge-build` outputs a set of 3 files with suffixes `.1.cf`, `.2.cf`, and
`.3.cf`.  These files together
constitute the index: they are all that is needed to align reads to that
reference.  The original sequence FASTA files are no longer used by Centrifuge
once the index is built.

Use of Karkkainen's [blockwise algorithm] allows `centrifuge-build` to trade off
between running time and memory usage. `centrifuge-build` has two options
governing how it makes this trade: `--bmax`/`--bmaxdivn`,
and `--dcv`.  By default, `centrifuge-build` will automatically search for the
settings that yield the best running time without exhausting memory.  This
behavior can be disabled using the `-a`/`--noauto` option.

The indexer provides options pertaining to the "shape" of the index, e.g.
`--offrate` governs the fraction of [Burrows-Wheeler]
rows that are "marked" (i.e., the density of the suffix-array sample; see the
original [FM Index] paper for details).  All of these options are potentially
profitable trade-offs depending on the application.  They have been set to
defaults that are reasonable for most cases according to our experiments.  See
[Performance tuning] for details.

The Centrifuge index is based on the [FM Index] of Ferragina and Manzini, which in
turn is based on the [Burrows-Wheeler] transform.  The algorithm used to build
the index is based on the [blockwise algorithm] of Karkkainen.

[Blockwise algorithm]: http://portal.acm.org/citation.cfm?id=1314852
[Burrows-Wheeler]: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform

Command Line
------------

Usage:

    centrifuge-build [options]* --conversion-table <table_in> --taxonomy-tree <taxonomy_in> --name-table <table_in2> <reference_in> <cf_base>

### Main arguments

    <reference_in>

A comma-separated list of FASTA files containing the reference sequences to be
aligned to, or, if `-c` is specified, the sequences
themselves. E.g., `<reference_in>` might be `chr1.fa,chr2.fa,chrX.fa,chrY.fa`,
or, if `-c` is specified, this might be
`GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA`.

    <cf_base>

The basename of the index files to write.  By default, `centrifuge-build` writes
files named `NAME.1.cf`, `NAME.2.cf`, and `NAME.3.cf`, where `NAME` is `<cf_base>`.

### Options

    -f

The reference input files (specified as `<reference_in>`) are FASTA files
(usually having extension `.fa`, `.mfa`, `.fna` or similar).

    -c

The reference sequences are given on the command line.  I.e. `<reference_in>` is
a comma-separated list of sequences rather than a list of FASTA files.

    -a/--noauto

Disable the default behavior whereby `centrifuge-build` automatically selects
values for the `--bmax`, `--dcv` and `--packed` parameters according to
available memory.  Instead, the user may specify values for those parameters.  If
memory is exhausted during indexing, an error message will be printed; it is up
to the user to try new parameters.

    -p/--threads <int>

Launch `<int>` parallel threads (default: 1).

    --conversion-table <file>

List of UIDs (unique ID) and corresponding taxonomic IDs.

    --taxonomy-tree <file>

Taxonomic tree (e.g. nodes.dmp).

    --name-table <file>

Name table (e.g. names.dmp).

    --size-table <file>

List of taxonomic IDs and lengths of the sequences belonging to the same taxonomic IDs.

    --bmax <int>

The maximum number of suffixes allowed in a block.  Allowing more suffixes per
block makes indexing faster, but increases peak memory usage.  Setting this
option overrides any previous setting for `--bmax`, or `--bmaxdivn`. 
Default (in terms of the `--bmaxdivn` parameter) is `--bmaxdivn` 4.  This is
configured automatically by default; use `-a`/`--noauto` to configure manually.

    --bmaxdivn <int>

The maximum number of suffixes allowed in a block, expressed as a fraction of
the length of the reference.  Setting this option overrides any previous setting
for `--bmax`, or `--bmaxdivn`.  Default: `--bmaxdivn` 4.  This is
configured automatically by default; use `-a`/`--noauto` to configure manually.

    --dcv <int>

Use `<int>` as the period for the difference-cover sample.  A larger period
yields less memory overhead, but may make suffix sorting slower, especially if
repeats are present.  Must be a power of 2 no greater than 4096.  Default: 1024.
 This is configured automatically by default; use `-a`/`--noauto` to configure
manually.

    --nodc

Disable use of the difference-cover sample.  Suffix sorting becomes
quadratic-time in the worst case (where the worst case is an extremely
repetitive reference).  Default: off.

    -o/--offrate <int>

To map alignments back to positions on the reference sequences, it's necessary
to annotate ("mark") some or all of the [Burrows-Wheeler] rows with their
corresponding location on the genome. 
`-o`/`--offrate` governs how many rows get marked:
the indexer will mark every 2^`<int>` rows.  Marking more rows makes
reference-position lookups faster, but requires more memory to hold the
annotations at runtime.  The default is 4 (every 16th row is marked; for human
genome, annotations occupy about 680 megabytes).  
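The memory cost of a given offrate can be estimated with simple arithmetic. The sketch below assumes 4 bytes per marked row, which is an assumption for illustration rather than a documented property of the index; the function names are hypothetical.

```python
def marked_rows(index_rows, offrate):
    """Number of rows marked when every 2**offrate-th row is annotated."""
    step = 1 << offrate
    return (index_rows + step - 1) // step  # round up

def sample_bytes(index_rows, offrate, bytes_per_entry=4):
    """Approximate memory needed for the annotations (assumed 4 bytes each)."""
    return marked_rows(index_rows, offrate) * bytes_per_entry

rows = 3_000_000_000  # roughly the size of a human genome
for offrate in (3, 4, 5):
    print(offrate, sample_bytes(rows, offrate) / 1e6, "MB")
```

For a reference of roughly 3 billion bases this gives about 750 million bytes at offrate 4, the same order of magnitude as the figure quoted above. Each increment of the offrate halves the estimate.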

    -t/--ftabchars <int>

The ftab is the lookup table used to calculate an initial [Burrows-Wheeler]
range with respect to the first `<int>` characters of the query.  A larger
`<int>` yields a larger lookup table but faster query times.  The ftab has size
4^(`<int>`+1) bytes.  The default setting is 10 (ftab is 4MB).
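The ftab size formula above is easy to check numerically. The helper name `ftab_bytes` is hypothetical; the formula itself is the one stated in the text.

```python
def ftab_bytes(ftabchars):
    """Size of the ftab lookup table in bytes: 4^(ftabchars + 1)."""
    return 4 ** (ftabchars + 1)

for chars in (8, 10, 12):
    print(chars, ftab_bytes(chars) / 2**20, "MiB")
```

Each additional character multiplies the table size by 4, so the default of 10 yields 4 MiB while 12 would already need 64 MiB.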

    --seed <int>

Use `<int>` as the seed for pseudo-random number generator.

    --kmer-count <int>

Use `<int>` as kmer-size for counting the distinct number of k-mers in the input sequences.

    -q/--quiet

`centrifuge-build` is verbose by default.  With this option `centrifuge-build` will
print only error messages.

    -h/--help

Print usage information and quit.

    --version

Print version information and quit.

The `centrifuge-inspect` index inspector
=====================================

`centrifuge-inspect` extracts information from a Centrifuge index about what kind of
index it is and what reference sequences were used to build it. When run without
any options, the tool will output a FASTA file containing the sequences of the
original references (with all non-`A`/`C`/`G`/`T` characters converted to `N`s).
 It can also be used to extract just the reference sequence names using the
`-n`/`--names` option or a more verbose summary using the `-s`/`--summary`
option.

Command Line
------------

Usage:

    centrifuge-inspect [options]* <cf_base>

### Main arguments

    <cf_base>

The basename of the index to be inspected.  The basename is the name of any of the
index files but with the `.X.cf` suffix omitted.
`centrifuge-inspect` first looks in the current directory for the index files, then
in the directory specified in the `Centrifuge_INDEXES` environment variable.

### Options

    -a/--across <int>

When printing FASTA output, output a newline character every `<int>` bases
(default: 60).

    -n/--names

Print reference sequence names, one per line, and quit.

    -s/--summary

Print a summary that includes information about index settings, as well as the
names and lengths of the input sequences.  The summary has this format: 

    Colorspace	<0 or 1>
    SA-Sample	1 in <sample>
    FTab-Chars	<chars>
    Sequence-1	<name>	<len>
    Sequence-2	<name>	<len>
    ...
    Sequence-N	<name>	<len>

Fields are separated by tabs.  Colorspace is always set to 0 for Centrifuge.

    --conversion-table

Print a list of UIDs (unique ID) and corresponding taxonomic IDs.

    --taxonomy-tree

Print taxonomic tree.

    --name-table

Print name table.

    --size-table

Print a list of taxonomic IDs and lengths of the sequences belonging to the same taxonomic IDs.

    -v/--verbose

Print verbose output (for debugging).

    --version

Print version information and quit.

    -h/--help

Print usage information and quit.

Getting started with Centrifuge
===================================================

Centrifuge comes with some example files to get you started.  The example files
are not scientifically significant; these files will simply let you start running Centrifuge and
downstream tools right away.

First follow the manual instructions to [obtain Centrifuge].  Set the `CENTRIFUGE_HOME`
environment variable to point to the new Centrifuge directory containing the
`centrifuge`, `centrifuge-build` and `centrifuge-inspect` binaries.  This is important,
as the `CENTRIFUGE_HOME` variable is used in the commands below to refer to that
directory.

Indexing a reference genome
---------------------------

To create an index for two small sequences included with Centrifuge, create a new temporary directory (it doesn't matter where), change into that directory, and run:

    $CENTRIFUGE_HOME/centrifuge-build --conversion-table $CENTRIFUGE_HOME/example/reference/gi_to_tid.dmp --taxonomy-tree $CENTRIFUGE_HOME/example/reference/nodes.dmp --name-table $CENTRIFUGE_HOME/example/reference/names.dmp $CENTRIFUGE_HOME/example/reference/test.fa test

The command should print many lines of output, then quit. When the command
completes, the current directory will contain three new files that all start with
`test` and end with `.1.cf`, `.2.cf`, `.3.cf`.  These files constitute the index - you're done!

You can use `centrifuge-build` to create an index for a set of FASTA files obtained
from any source, including sites such as [UCSC], [NCBI], and [Ensembl]. When
indexing multiple FASTA files, specify all the files using commas to separate
file names.  For more details on how to create an index with `centrifuge-build`,
see the [manual section on index building].  You may also want to bypass this
process by obtaining a pre-built index.

[UCSC]: http://genome.ucsc.edu/cgi-bin/hgGateway
[NCBI]: http://www.ncbi.nlm.nih.gov/sites/genome
[Ensembl]: http://www.ensembl.org/

Classifying example reads
----------------------

Stay in the directory created in the previous step, which now contains the
`test` index files.  Next, run:

    $CENTRIFUGE_HOME/centrifuge -f -x test $CENTRIFUGE_HOME/example/reads/input.fa

This runs the Centrifuge classifier, which classifies a set of unpaired reads to
the genomes using the index generated in the previous step.
The classification results are reported to stdout, and a
short classification summary is written to centrifuge-species_report.tsv.

You will see something like this:

    readID  seqID taxID     score	2ndBestScore	hitLength	numMatches
    C_1 gi|7     9913      4225	4225		80		2
    C_1 gi|4     9646      4225	4225		80		2
    C_2 gi|4     9646      4225	4225		80		2
    C_2 gi|7     9913      4225	4225		80		2
    C_3 gi|7     9913      4225	4225		80		2
    C_3 gi|4     9646      4225	4225		80		2
    C_4 gi|4     9646      4225	4225		80		2
    C_4 gi|7     9913      4225	4225		80		2
    1_1 gi|4     9646      4225	0		80		1
    1_2 gi|4     9646      4225	0		80		1
    2_1 gi|7     9913      4225	0		80		1
    2_2 gi|7     9913      4225	0		80		1
    2_3 gi|7     9913      4225	0		80		1
    2_4 gi|7     9913      4225	0		80		1
    2_5 gi|7     9913      4225	0		80		1
    2_6 gi|7     9913      4225	0		80		1
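A result table like the one above is whitespace-separated and easy to post-process. The following is a minimal sketch, assuming the seven columns shown in the header line and integer values in all columns from `taxID` onward; the function names are hypothetical.

```python
from collections import Counter

COLUMNS = ["readID", "seqID", "taxID", "score",
           "2ndBestScore", "hitLength", "numMatches"]

def parse_classification(lines):
    """Parse classification output lines into dictionaries, skipping the header."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "readID":
            continue
        record = dict(zip(COLUMNS, fields))
        for key in COLUMNS[2:]:
            record[key] = int(record[key])
        records.append(record)
    return records

def reads_per_taxid(records):
    """Count distinct reads that have at least one assignment to each taxID."""
    pairs = {(r["readID"], r["taxID"]) for r in records}
    return Counter(taxid for _, taxid in pairs)

sample = [
    "readID  seqID taxID     score\t2ndBestScore\thitLength\tnumMatches",
    "C_1 gi|7     9913      4225\t4225\t\t80\t\t2",
    "C_1 gi|4     9646      4225\t4225\t\t80\t\t2",
    "1_1 gi|4     9646      4225\t0\t\t80\t\t1",
]
print(reads_per_taxid(parse_classification(sample)))
```

Reads such as `C_1` that have several equally good assignments contribute to more than one taxID in this tally.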


================================================
FILE: MANUAL.markdown
================================================


<!--
 ! This manual is written in "markdown" format and thus contains some
 ! distracting formatting clutter.  See 'MANUAL' for an easier-to-read version
 ! of this text document, or see the HTML manual online.
 ! -->

Introduction
============

What is Centrifuge?
-----------------

[Centrifuge] is a novel microbial classification engine that enables
rapid, accurate, and sensitive labeling of reads and quantification of
species on desktop computers.  The system uses a novel indexing scheme
based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini
(FM) index, optimized specifically for the metagenomic classification
problem. Centrifuge requires a relatively small index (5.8 GB for all
complete bacterial and viral genomes plus the human genome) and
classifies sequences at a very high speed, allowing it to process the
millions of reads from a typical high-throughput DNA sequencing run
within a few minutes.  Together these advances enable timely and
accurate analysis of large metagenomics data sets on conventional
desktop computers.

[Centrifuge]:     http://www.ccb.jhu.edu/software/centrifuge

[Burrows-Wheeler Transform]: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
[FM Index]:        http://en.wikipedia.org/wiki/FM-index

[GPLv3 license]:   http://www.gnu.org/licenses/gpl-3.0.html


Obtaining Centrifuge
==================

Download the Centrifuge source and binaries from the Releases section on the right side.
Binaries are available for Intel architectures (`x86_64`) running Linux, and Mac OS X.

Building from source
--------------------

Building Centrifuge from source requires a GNU-like environment with GCC, GNU Make
and other basics.  It should be possible to build Centrifuge on most vanilla Linux
installations or on a Mac installation with [Xcode] installed.  Centrifuge can
also be built on Windows using [Cygwin] or [MinGW] (MinGW recommended). For a 
MinGW build the choice of what compiler is to be used is important since this
will determine if a 32 or 64 bit code can be successfully compiled using it. If 
there is a need to generate both 32 and 64 bit on the same machine then a multilib 
MinGW has to be properly installed. [MSYS], the [zlib] library, and depending on 
architecture [pthreads] library are also required. We are recommending a 64 bit
build since it has some clear advantages in real life research problems. In order 
to simplify the MinGW setup it might be worth investigating popular MinGW personal 
builds since these are coming already prepared with most of the toolchains needed.

First, download the [source package] from the Releases section on the right side.
Unzip the file, change to the unzipped directory, and build the
Centrifuge tools by running GNU `make` (usually with the command `make`, but
sometimes with `gmake`) with no arguments.  If building with MinGW, run `make`
from the MSYS environment.

Centrifuge uses multithreading to speed up execution on SMP architectures
where this is possible. On POSIX platforms (like Linux, Mac OS, etc.) it
needs the pthread library. Although it is possible to use the pthread library
on a non-POSIX platform like Windows, for performance reasons Centrifuge will
try to use Windows native multithreading if possible.

To support SRA data access in Centrifuge, please download and install the [NCBI-NGS] toolkit.
When running `make`, specify additional variables as follows:
`make USE_SRA=1 NCBI_NGS_DIR=/path/to/NCBI-NGS-directory NCBI_VDB_DIR=/path/to/NCBI-VDB-directory`,
where `NCBI_NGS_DIR` and `NCBI_VDB_DIR` will be used in the Makefile for the -I and -L compilation options.
For example, $(NCBI_NGS_DIR)/include and $(NCBI_NGS_DIR)/lib64 will be used.

[Cygwin]:   http://www.cygwin.com/
[MinGW]:    http://www.mingw.org/
[MSYS]:     http://www.mingw.org/wiki/msys
[zlib]:     http://cygwin.com/packages/mingw-zlib/
[pthreads]: http://sourceware.org/pthreads-win32/
[GnuWin32]: http://gnuwin32.sf.net/packages/coreutils.htm
[Xcode]:    http://developer.apple.com/xcode/
[Github site]: https://github.com/infphilo/centrifuge
[NCBI-NGS]: https://github.com/ncbi/ngs/wiki/Downloads

Running Centrifuge
=============

Adding to PATH
--------------

By adding your new Centrifuge directory to your [PATH environment variable], you
ensure that whenever you run `centrifuge`, `centrifuge-build`, `centrifuge-download` or `centrifuge-inspect`
from the command line, you will get the version you just installed without
having to specify the entire path.  This is recommended for most users.  To do
this, follow your operating system's instructions for adding the directory to
your [PATH].

If you would like to install Centrifuge by copying the Centrifuge executable files
to an existing directory in your [PATH], make sure that you copy all the
executables, including `centrifuge`, `centrifuge-class`, `centrifuge-build`, `centrifuge-build-bin`, `centrifuge-download`, `centrifuge-inspect`
and `centrifuge-inspect-bin`. Furthermore you need the programs
in the scripts/ folder if you opt for genome compression in the database construction.

[PATH environment variable]: http://en.wikipedia.org/wiki/PATH_(variable)
[PATH]: http://en.wikipedia.org/wiki/PATH_(variable)


Before running Centrifuge
-----------------

Classification is considerably different from alignment in that classification is performed on a large set of genomes as opposed to on just one reference genome as in alignment.  Currently, an enormous number of complete genomes are available in GenBank (e.g. >4,000 bacterial genomes, >10,000 viral genomes, …).  These genomes are organized in a taxonomic tree where each genome is located at the bottom of the tree, at the strain or subspecies level.  On the taxonomic tree, genomes have ancestors usually situated at the species level, and those ancestors also have ancestors at the genus level and so on up the family level, the order level, class level, phylum, kingdom, and finally at the root level.
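The tree structure described above can be represented as a mapping from each taxonomy ID to its parent. The sketch below walks such a mapping to obtain a lineage and the lowest common ancestor of two nodes. The IDs are made up for illustration, the function names are hypothetical, and the convention that the root is its own parent is an assumption of this example.

```python
def lineage(taxid, parent):
    """Return the path from taxid up to the root (the root is its own parent)."""
    path = [taxid]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path

def lowest_common_ancestor(a, b, parent):
    """Return the deepest node shared by the lineages of a and b."""
    ancestors_of_a = set(lineage(a, parent))
    for node in lineage(b, parent):
        if node in ancestors_of_a:
            return node
    return None  # only reached if the two nodes are in different trees

# Toy tree with made-up IDs: root 1 -> genus 100 -> species 110 -> strains 111, 112
parent = {1: 1, 100: 1, 110: 100, 111: 110, 112: 110, 200: 1}
print(lineage(111, parent))
print(lowest_common_ancestor(111, 112, parent))
```

Two strains of the same species meet at the species node, whereas unrelated genomes only meet at the root.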

Given the gigantic number of genomes available, which continues to expand at a rapid rate, and the development of the taxonomic tree, which continues to evolve with new advancements in research, we have designed Centrifuge to be flexible and general enough to reflect this huge database.  We provide several standard indexes that will meet most users’ needs (see the side panel - Indexes).  In our approach, the indexes include not only raw genome sequences but also genome names/sizes and taxonomic trees.  This enables users to perform additional analyses on Centrifuge’s classification output without the need to download extra database sources.  This also eliminates the potential issue of discrepancy between the indexes we provide and the databases users may otherwise download.  We plan to provide a couple of additional standard indexes in the near future, and update the indexes on a regular basis.

We encourage first time users to take a look at and follow a [`small example`] that illustrates how to build an index, how to run Centrifuge using the index, how to interpret the classification results, and how to extract additional genomic information from the index.  For those who choose to build customized indexes, please take a close look at the following description.

Database download and index building
-----------------

Centrifuge indexes can be built from arbitrary sequences. Usually an ensemble of
genomes is used, such as all complete bacterial and viral genomes in the RefSeq
database, or all sequences in the BLAST nt database. Centrifuge always needs the
nodes.dmp file from the NCBI taxonomy dump to build the taxonomy tree,
as well as a tab-separated file that maps each sequence ID to a taxonomy ID.


To map sequence identifiers to taxonomy IDs, and taxonomy IDs to names and
their parents, three files are necessary in addition to the sequence files:

 - taxonomy tree: typically nodes.dmp from the NCBI taxonomy dump. Links taxonomy IDs to their parents
 - names file: typically names.dmp from the NCBI taxonomy dump. Links taxonomy IDs to their scientific name
 - a tab-separated sequence ID to taxonomy ID mapping

When using the provided scripts to download the genomes, these files are automatically downloaded or generated. 
When using a custom taxonomy or sequence files, please refer to the section `TODO` to learn more about their format.
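Before building a custom index it can help to check that every sequence has an entry in the mapping file. The following is a minimal sketch, assuming a two-column tab-separated map (sequence ID, taxonomy ID) and assuming that the sequence ID is the first whitespace-delimited token of each FASTA header; the function names and the sample IDs are hypothetical.

```python
import csv
import io

def load_seqid2taxid(handle):
    """Read a tab-separated sequence ID to taxonomy ID map into a dictionary."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        mapping[row[0]] = int(row[1])
    return mapping

def fasta_ids(handle):
    """Yield the first token of every FASTA header line."""
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]

def unmapped_sequences(fasta_handle, mapping):
    """List sequence IDs that have no taxonomy ID in the map."""
    return [sid for sid in fasta_ids(fasta_handle) if sid not in mapping]

# Made-up sample data standing in for seqid2taxid.map and a FASTA file.
map_text = "seqA\t9606\nseqB\t10090\n"
fasta_text = ">seqA first sequence\nACGT\n>seqC missing from map\nGGCC\n"
mapping = load_seqid2taxid(io.StringIO(map_text))
print(unmapped_sequences(io.StringIO(fasta_text), mapping))
```

With real data, the `io.StringIO` objects would be replaced by opened files.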

### Building index on all complete bacterial and viral genomes

Use `centrifuge-download` to download genomes from NCBI. The following two commands download
the NCBI taxonomy to `taxonomy/` in the current directory, and all complete archaeal,
bacterial and viral genomes to `library/`. Low-complexity regions in the genomes are masked after
download (parameter `-m`) using blast+'s `dustmasker`. `centrifuge-download` outputs tab-separated 
sequence ID to taxonomy ID mappings to standard out, which are required by `centrifuge-build`.

    centrifuge-download -o taxonomy taxonomy
    centrifuge-download -o library -m -d "archaea,bacteria,viral" refseq > seqid2taxid.map

To build the index, first concatenate all downloaded sequences into a single file, and then
run `centrifuge-build`:
    
    cat library/*/*.fna > input-sequences.fna

    ## build centrifuge index with 4 threads
    centrifuge-build -p 4 --conversion-table seqid2taxid.map \
                     --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
                     input-sequences.fna abv

After the index is built, all files except the `*.[123].cf` index files may be removed; i.e., the files in
the `library/` and `taxonomy/` directories are no longer needed.
If you also want to include the human and/or the mouse genome, add their sequences to 
the library folder before building the index, as described in the next section.

### Adding human or mouse genome to the index
The human and mouse genomes can also be downloaded using `centrifuge-download`. They are in the
domain "vertebrate_mammalian" (argument `-d`), are assembled at the chromosome level (argument `-a`)
and categorized as reference genomes by RefSeq (`-c`). The argument `-t` takes a comma-separated
list of taxonomy IDs - e.g. `9606` for human and `10090` for mouse:

    # download mouse and human reference genomes
    centrifuge-download -o library -d "vertebrate_mammalian" -a "Chromosome" -t 9606,10090 -c 'reference genome' refseq >> seqid2taxid.map
    # only human
    centrifuge-download -o library -d "vertebrate_mammalian" -a "Chromosome" -t 9606 -c 'reference genome' refseq >> seqid2taxid.map
    # only mouse
    centrifuge-download -o library -d "vertebrate_mammalian" -a "Chromosome" -t 10090 -c 'reference genome' refseq >> seqid2taxid.map


### nt database

NCBI BLAST's nt database contains all spliced non-redundant coding
sequences from multiple databases, inferred from genomic
sequences. Traditionally used with BLAST, a download of the FASTA is
provided on the NCBI homepage. Building an index with any database 
requires the user to create a sequence ID to taxonomy ID map, which 
can be generated from a GI taxid dump:

    wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nt.gz
    gunzip nt.gz && mv -v nt nt.fa

    # Get mapping file
    wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.dmp.gz
    gunzip -c gi_taxid_nucl.dmp.gz | sed 's/^/gi|/' > gi_taxid_nucl.map

    # build index using 16 cores and a small bucket size, which will require less memory
    centrifuge-build -p 16 --bmax 1342177280 --conversion-table gi_taxid_nucl.map \
                     --taxonomy-tree taxonomy/nodes.dmp --name-table taxonomy/names.dmp \
                     nt.fa nt
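
The `sed` step above prefixes every line of the GI-to-taxid dump with `gi|`, so that the first column has the same form as the `gi|...` sequence IDs. As a minimal sketch, the same transformation in Python, assuming the dump has two tab-separated columns (GI number and taxonomy ID); the function name `gi_dump_to_map` and the sample values are made up for illustration:

```python
def gi_dump_to_map(lines):
    """Prefix each 'GI<TAB>taxid' line with 'gi|' (same effect as sed 's/^/gi|/')."""
    for line in lines:
        yield "gi|" + line


# Two made-up dump lines, for illustration only
sample = ["4\t9646\n", "7\t9913\n"]
converted = list(gi_dump_to_map(sample))
print("".join(converted), end="")
```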



### Custom database

To build a custom database, you need to provide the following four files to `centrifuge-build`:

  - `--conversion-table`: tab-separated file mapping sequence IDs to taxonomy IDs. Sequence IDs are the header up to the first space or second pipe (`|`).  
  - `--taxonomy-tree`: `\t|\t`-separated file mapping taxonomy IDs to their parents and rank, up to the root of the tree. When using NCBI taxonomy IDs, this will be the `nodes.dmp` from `ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz`.
  - `--name-table`: `\t|\t`-separated file mapping taxonomy IDs to a name. A further column (typically column 4) must specify `scientific name`. When using NCBI taxonomy IDs, `names.dmp` is the appropriate file.
  - reference sequences: The ID of each sequence is the header up to the first space or second pipe (`|`).

When using custom taxonomy IDs, use only positive integers greater than or equal to `1`, and use `1` for the root of the tree.
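
The rule for sequence IDs ("the header up to the first space or second pipe") can be sketched as follows. This is an illustration of the rule as stated here, assuming that whichever delimiter comes first ends the ID; it is not taken from the Centrifuge source, and the function name `sequence_id` is hypothetical:

```python
def sequence_id(header):
    """Return the sequence ID: the header up to the first space or second pipe."""
    header = header.lstrip(">").rstrip("\n")
    # Cut at the first space, if any
    end = header.find(" ")
    if end == -1:
        end = len(header)
    # Cut at the second pipe, if it comes earlier (assumed interpretation)
    first_pipe = header.find("|")
    if first_pipe != -1:
        second_pipe = header.find("|", first_pipe + 1)
        if second_pipe != -1:
            end = min(end, second_pipe)
    return header[:end]


print(sequence_id(">Seq1 some description"))
print(sequence_id(">gi|4|ref|EXAMPLE| made-up header"))
```

The IDs produced this way are the ones that should appear in the first column of the `--conversion-table` file.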

#### More info on `--taxonomy-tree` and `--name-table`

The format of these files is based on `nodes.dmp` and `names.dmp` from the NCBI taxonomy database dump. 

- Field terminator is `\t|\t`
- Row terminator is `\t|\n`

The `taxonomy-tree` / nodes.dmp file consists of taxonomy nodes. The description for each node includes the following
fields:

    tax_id                  -- node id in GenBank taxonomy database
    parent tax_id           -- parent node id in GenBank taxonomy database
    rank                    -- rank of this node (superkingdom, kingdom, ..., no rank)

Further fields are ignored.

The `name-table` / names.dmp is the taxonomy names file:

    tax_id                  -- the id of node associated with this name
    name_txt                -- name itself
    unique name             -- the unique variant of this name if name not unique
    name class              -- (scientific name, synonym, common name, ...)

`name class` **has** to be `scientific name` for a line to be included in the build. All other lines are ignored.
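
As a minimal sketch of the format described above, the following Python splits rows on the `\t|\t` field terminator, strips the `\t|` row terminator, and keeps only `scientific name` rows. The helper names `parse_dmp_line` and `scientific_names` are hypothetical, and the sample rows are made up; this is not the parser used by `centrifuge-build`:

```python
def parse_dmp_line(line):
    """Split one nodes.dmp/names.dmp row into its fields."""
    line = line.rstrip("\n")
    # Drop the row terminator "\t|" if present
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")


def scientific_names(lines):
    """Map tax_id -> name_txt, keeping only rows whose name class is 'scientific name'."""
    names = {}
    for line in lines:
        fields = parse_dmp_line(line)
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


rows = [
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "10\t|\tBacteria\t|\t\t|\tscientific name\t|\n",
    "10\t|\teubacteria\t|\t\t|\tsynonym\t|\n",
]
print(scientific_names(rows))
```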

#### Example

*Conversion table `ex.conv`*: 
    
    Seq1	11
    Seq2	12
    Seq3	13
    Seq4	11


*Taxonomy tree `ex.tree`*: 

    1	|	1	|	root
    10	|	1	|	kingdom
    11	|	10	|	species
    12	|	10	|	species
    13	|	1	|	species

*Name table `ex.name`*:

    1	|	root	|		|	scientific name	|
    10	|	Bacteria	|		|	scientific name	|
    11	|	Bacterium A	|		|	scientific name	|
    12	|	Bacterium B	|		|	scientific name	|
    13	|	Some other species	|		|	scientific name	|

*Reference sequences `ex.fa`*:

    >Seq1
    AAAACGTACGA.....
    >Seq2
    AAAACGTACGA.....
    >Seq3
    AAAACGTACGA.....
    >Seq4
    AAAACGTACGA.....

To build the database, call

    centrifuge-build --conversion-table ex.conv \
                     --taxonomy-tree ex.tree --name-table ex.name \
                     ex.fa ex

which results in three index files named `ex.1.cf`, `ex.2.cf` and `ex.3.cf`.
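
Before building, it can help to check that the three custom tables agree with each other. The sketch below is one possible check, assuming the tables have already been loaded into dictionaries; the function name `check_custom_taxonomy` and the toy data are hypothetical, and the specific conditions (every taxonomy ID is in the tree, reaches the root `1`, and has a scientific name) are what the description above implies rather than documented `centrifuge-build` behavior:

```python
def check_custom_taxonomy(conversion, parents, names, root=1):
    """Return a list of human-readable problems; an empty list means the inputs agree.

    conversion: dict seq_id -> tax_id   (from --conversion-table)
    parents:    dict tax_id -> parent   (from --taxonomy-tree)
    names:      dict tax_id -> name     (from --name-table, scientific names only)
    """
    problems = []
    for seq_id, tax_id in conversion.items():
        if tax_id not in parents:
            problems.append(f"{seq_id}: taxonomy ID {tax_id} is not in the tree")
    for tax_id in parents:
        # Walk up the tree; stop at the root, a missing parent, or a cycle
        seen = set()
        node = tax_id
        while node != root:
            if node in seen or node not in parents:
                problems.append(f"taxonomy ID {tax_id} does not reach the root")
                break
            seen.add(node)
            node = parents[node]
        if tax_id not in names:
            problems.append(f"taxonomy ID {tax_id} has no scientific name")
    return problems


# Toy data: SeqB points at a taxonomy ID that is missing from the tree
conversion = {"SeqA": 11, "SeqB": 12}
parents = {1: 1, 10: 1, 11: 10}
names = {1: "root", 10: "Bacteria", 11: "Bacterium A"}
for problem in check_custom_taxonomy(conversion, parents, names):
    print(problem)
```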


### Centrifuge classification output

The following example shows classification assignments for a read.  The assignment output has 8 columns.

    readID    seqID    taxID    score    2ndBestScore    hitLength    queryLength    numMatches
    1_1       gi|4     9646     4225     0               80           80             1

    The first column is the read ID from a raw sequencing read (e.g., 1_1 in the example).
    The second column is the sequence ID of the genomic sequence to which the read is classified (e.g., gi|4).
    The third column is the taxonomic ID of the genomic sequence in the second column (e.g., 9646).
    The fourth column is the score for the classification, which is the weighted sum of hits (e.g., 4225).
    The fifth column is the score for the next best classification (e.g., 0).
    The sixth column is an approximate number of base pairs of the read that match the genomic sequence (e.g., 80).
    The seventh column is the length of the read or the combined length of the mate pairs (e.g., 80).
    The eighth column is the number of classifications for this read, indicating how many assignments were made (e.g., 1).

### Centrifuge summary output (the default filename is centrifuge_report.tsv)

The following example shows a classification summary for each genome or taxonomic unit.  The summary output has 7 columns.

    name                                                               taxID    taxRank    genomeSize    numReads    numUniqueReads    abundance
    Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis    36870    leaf       703004        5981        5964              0.0152317

    The first column is the name of a genome, or the name corresponding to a taxonomic ID (the second column) at a rank higher than the strain (e.g., Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis).
    The second column is the taxonomic ID (e.g., 36870).
    The third column is the taxonomic rank (e.g., leaf).
    The fourth column is the length of the genome sequence (e.g., 703004).
    The fifth column is the number of reads classified to this genomic sequence including multi-classified reads (e.g., 5981).
    The sixth column is the number of reads uniquely classified to this genomic sequence (e.g., 5964).
    The seventh column is the proportion of this genome normalized by its genomic length (e.g., 0.0152317).
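
The summary file can be loaded with a few lines of Python. This is a sketch that assumes the file is tab-separated and starts with the header line shown in the example; the function name `read_report` is hypothetical, and the sample row is the one from the example:

```python
import csv
import io


def read_report(handle):
    """Read a tab-separated summary report into a list of dicts with typed fields."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "name": row["name"],
            "taxID": int(row["taxID"]),
            "taxRank": row["taxRank"],
            "genomeSize": int(row["genomeSize"]),
            "numReads": int(row["numReads"]),
            "numUniqueReads": int(row["numUniqueReads"]),
            "abundance": float(row["abundance"]),
        })
    return rows


sample = io.StringIO(
    "name\ttaxID\ttaxRank\tgenomeSize\tnumReads\tnumUniqueReads\tabundance\n"
    "Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis"
    "\t36870\tleaf\t703004\t5981\t5964\t0.0152317\n"
)
report = read_report(sample)
# Fraction of the reads assigned to each entry that were assigned uniquely
for entry in report:
    print(entry["name"], entry["numUniqueReads"] / entry["numReads"])
```

For a real run, `open("centrifuge_report.tsv")` would take the place of the in-memory sample.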

As the GenBank database is incomplete (i.e., many more genomes remain to be identified and added), and reads have sequencing errors, classification programs including Centrifuge often report many false assignments.  To perform more conservative analyses, users may want to discard assignments for reads whose matching length (`hitLength`, 6th column in the output of Centrifuge) is 40% or less of the read length (`queryLength`, 7th column).  It may also be helpful to use the score (4th column) to filter out some assignments.   Our future research plans include developing methods that estimate confidence scores for assignments.
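
A conservative filter of this kind might look like the sketch below. It assumes tab-separated classification output with the eight columns listed earlier, takes the matching length to be `hitLength` relative to `queryLength`, and treats the score as an integer; the function name `filter_assignments`, the default thresholds, and the second sample row are illustrative only:

```python
def filter_assignments(lines, min_fraction=0.4, min_score=0):
    """Yield assignment rows whose hitLength/queryLength exceeds min_fraction."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "readID":  # header line
            continue
        score = int(fields[3])
        hit_length = int(fields[5])
        query_length = int(fields[6])
        if query_length > 0 and hit_length / query_length > min_fraction and score >= min_score:
            yield fields


sample = [
    "readID\tseqID\ttaxID\tscore\t2ndBestScore\thitLength\tqueryLength\tnumMatches\n",
    "1_1\tgi|4\t9646\t4225\t0\t80\t80\t1\n",
    "1_2\tgi|7\t9913\t100\t0\t25\t80\t1\n",  # made-up row: 25/80 is below the threshold
]
kept = list(filter_assignments(sample))
for fields in kept:
    print("\t".join(fields))
```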

### Kraken-style report

`centrifuge-kreport` can be used to make a Kraken-style report from the Centrifuge output including taxonomy information:

`centrifuge-kreport -x <centrifuge index> <centrifuge out file>`


Inspecting the Centrifuge index
-----------------------

The index can be inspected with `centrifuge-inspect`.  To extract raw sequences:

    centrifuge-inspect <centrifuge index>

Extract the sequence ID to taxonomy ID conversion table from the index:

    centrifuge-inspect --conversion-table <centrifuge index>

Extract the taxonomy tree from the index:

    centrifuge-inspect --taxonomy-tree <centrifuge index>

Extract the lengths of the sequences from the index (each row has two columns: taxonomic ID and length):

    centrifuge-inspect --size-table <centrifuge index>

Extract the names from the index (each row has two columns: taxonomic ID and name):

    centrifuge-inspect --name-table <centrifuge index>
    


Wrapper
-------

The `centrifuge`, `centrifuge-build` and `centrifuge-inspect` executables are actually 
wrapper scripts that call binary programs as appropriate. Also, the `centrifuge` wrapper
provides some key functionality, like the ability to handle compressed inputs,
and the functionality for [`--un`], [`--al`] and related options.

It is recommended that you always run the centrifuge wrappers and not run the
binaries directly.

Performance tuning
------------------

1.  If your computer has multiple processors/cores, use `-p NTHREADS`

    The [`-p`] option causes Centrifuge to launch a specified number of parallel
    search threads.  Each thread runs on a different processor/core and all
    threads find alignments in parallel, increasing alignment throughput by
    approximately a multiple of the number of threads (though in practice,
    speedup is somewhat worse than linear).

Command Line
------------


### Usage

    centrifuge [options]* -x <centrifuge-idx> {-1 <m1> -2 <m2> | -U <r> | --sra-acc <SRA accession number>} [--report-file <report file name> -S <classification output file name>]

### Main arguments

<table><tr><td>

[`-x`]: #centrifuge-options-x

    -x <centrifuge-idx>

</td><td>

The basename of the index for the reference genomes.  The basename is the name of
any of the index files up to but not including the final `.1.cf` / etc.  
`centrifuge` looks for the specified index first in the current directory,
then in the directory specified in the `CENTRIFUGE_INDEXES` environment variable.

</td></tr><tr><td>

[`-1`]: #centrifuge-options-1

    -1 <m1>

</td><td>

Comma-separated list of files containing mate 1s (filename usually includes
`_1`), e.g. `-1 flyA_1.fq,flyB_1.fq`.  Sequences specified with this option must
correspond file-for-file and read-for-read with those specified in `<m2>`. Reads
may be a mix of different lengths. If `-` is specified, `centrifuge` will read the
mate 1s from the "standard in" or "stdin" filehandle.

</td></tr><tr><td>

[`-2`]: #centrifuge-options-2

    -2 <m2>

</td><td>

Comma-separated list of files containing mate 2s (filename usually includes
`_2`), e.g. `-2 flyA_2.fq,flyB_2.fq`.  Sequences specified with this option must
correspond file-for-file and read-for-read with those specified in `<m1>`. Reads
may be a mix of different lengths. If `-` is specified, `centrifuge` will read the
mate 2s from the "standard in" or "stdin" filehandle.

</td></tr><tr><td>

[`-U`]: #centrifuge-options-U

    -U <r>

</td><td>

Comma-separated list of files containing unpaired reads to be aligned, e.g.
`lane1.fq,lane2.fq,lane3.fq,lane4.fq`.  Reads may be a mix of different lengths.
If `-` is specified, `centrifuge` gets the reads from the "standard in" or "stdin"
filehandle.

</td></tr><tr><td>

[`--sra-acc`]: #centrifuge-options-sra-acc

    --sra-acc <SRA accession number>

</td><td>

Comma-separated list of SRA accession numbers, e.g. `--sra-acc SRR353653,SRR353654`.
Information about read types is available at http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?sp=runinfo&acc=<b>sra-acc</b>&retmode=xml,
where <b>sra-acc</b> is the SRA accession number.  If users run Centrifuge on a computer cluster, it is recommended to disable SRA-related caching (see the instruction at [SRA-MANUAL]).

[SRA-MANUAL]:	     https://github.com/ncbi/sra-tools/wiki/Toolkit-Configuration

</td></tr><tr><td>

[`-S`]: #centrifuge-options-S

    -S <filename>

</td><td>

File to write classification results to.  By default, assignments are written to the
"standard out" or "stdout" filehandle (i.e. the console).

</td></tr><tr><td>

[`--report-file`]: #centrifuge-options-report-file

    --report-file <filename>

</td><td>

File to write a classification summary to (default: centrifuge_report.tsv).

</td></tr></table>
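
The lookup order described for [`-x`] (current directory first, then the directory named by `CENTRIFUGE_INDEXES`) can be sketched as follows. This is an illustration only: the function name `find_index` is hypothetical, and checking for all three `.cf` files is an assumption, not necessarily what the `centrifuge` wrapper does:

```python
import os


def find_index(basename, suffixes=(".1.cf", ".2.cf", ".3.cf")):
    """Return the path prefix under which all index files exist, or None."""
    search_dirs = ["."]
    env_dir = os.environ.get("CENTRIFUGE_INDEXES")
    if env_dir:
        search_dirs.append(env_dir)
    for directory in search_dirs:
        prefix = os.path.join(directory, basename)
        # Assumed check: every expected index file must be present
        if all(os.path.exists(prefix + suffix) for suffix in suffixes):
            return prefix
    return None
```

Calling `find_index("abv")` would return `./abv` if `abv.1.cf`, `abv.2.cf` and `abv.3.cf` are in the current directory.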

### Options

#### Input options

<table>
<tr><td id="centrifuge-options-q">

[`-q`]: #centrifuge-options-q

    -q

</td><td>

Reads (specified with `<m1>`, `<m2>`, `<r>`) are FASTQ files.  FASTQ files
usually have extension `.fq` or `.fastq`.  FASTQ is the default format.  See
also: [`--solexa-quals`] and [`--int-quals`].

</td></tr>
<tr><td id="centrifuge-options-qseq">

[`--qseq`]: #centrifuge-options-qseq

    --qseq

</td><td>

Reads (specified with `<m1>`, `<m2>`, `<r>`) are QSEQ files.  QSEQ files usually
end in `_qseq.txt`.  See also: [`--solexa-quals`] and [`--int-quals`].

</td></tr>
<tr><td id="centrifuge-options-f">

[`-f`]: #centrifuge-options-f

    -f

</td><td>

Reads (specified with `<m1>`, `<m2>`, `<r>`) are FASTA files.  FASTA files
usually have extension `.fa`, `.fasta`, `.mfa`, `.fna` or similar.  FASTA files
do not have a way of specifying quality values, so when `-f` is set, the result
is as if `--ignore-quals` is also set.

</td></tr>
<tr><td id="centrifuge-options-r">

[`-r`]: #centrifuge-options-r

    -r

</td><td>

Reads (specified with `<m1>`, `<m2>`, `<r>`) are files with one input sequence
per line, without any other information (no read names, no qualities).  When
`-r` is set, the result is as if `--ignore-quals` is also set.

</td></tr>
<tr><td id="centrifuge-options-c">

[`-c`]: #centrifuge-options-c

    -c

</td><td>

The read sequences are given on the command line.  I.e. `<m1>`, `<m2>` and
`<r>` are comma-separated lists of reads rather than lists of read files.
There is no way to specify read names or qualities, so `-c` also implies
`--ignore-quals`.

</td></tr>
<tr><td id="centrifuge-options-s">

[`-s`/`--skip`]: #centrifuge-options-s
[`-s`]: #centrifuge-options-s

    -s/--skip <int>

</td><td>

Skip (i.e. do not align) the first `<int>` reads or pairs in the input.

</td></tr>
<tr><td id="centrifuge-options-u">

[`-u`/`--qupto`]: #centrifuge-options-u
[`-u`]: #centrifuge-options-u

    -u/--qupto <int>

</td><td>

Align the first `<int>` reads or read pairs from the input (after the
[`-s`/`--skip`] reads or pairs have been skipped), then stop.  Default: no limit.

</td></tr>
<tr><td id="centrifuge-options-5">

[`-5`/`--trim5`]: #centrifuge-options-5
[`-5`]: #centrifuge-options-5

    -5/--trim5 <int>

</td><td>

Trim `<int>` bases from 5' (left) end of each read before alignment (default: 0).

</td></tr>
<tr><td id="centrifuge-options-3">

[`-3`/`--trim3`]: #centrifuge-options-3
[`-3`]: #centrifuge-options-3

    -3/--trim3 <int>

</td><td>

Trim `<int>` bases from 3' (right) end of each read before alignment (default:
0).

</td></tr><tr><td id="centrifuge-options-phred33-quals">

[`--phred33`]: #centrifuge-options-phred33-quals

    --phred33

</td><td>

Input qualities are ASCII chars equal to the [Phred quality] plus 33.  This is
also called the "Phred+33" encoding, which is used by the very latest Illumina
pipelines.

[Phred quality]: http://en.wikipedia.org/wiki/Phred_quality_score

</td></tr>
<tr><td id="centrifuge-options-phred64-quals">

[`--phred64`]: #centrifuge-options-phred64-quals

    --phred64

</td><td>

Input qualities are ASCII chars equal to the [Phred quality] plus 64.  This is
also called the "Phred+64" encoding.

</td></tr>
<tr><td id="centrifuge-options-solexa-quals">

[`--solexa-quals`]: #centrifuge-options-solexa-quals

    --solexa-quals

</td><td>

Convert input qualities from [Solexa][Phred quality] (which can be negative) to
[Phred][Phred quality] (which can't).  This scheme was used in older Illumina GA
Pipeline versions (prior to 1.3).  Default: off.

</td></tr>
<tr><td id="centrifuge-options-int-quals">

[`--int-quals`]: #centrifuge-options-int-quals

    --int-quals

</td><td>

Quality values are represented in the read input file as space-separated ASCII
integers, e.g., `40 40 30 40`..., rather than ASCII characters, e.g., `II?I`....
 Integers are treated as being on the [Phred quality] scale unless
[`--solexa-quals`] is also specified. Default: off.

</td></tr></table>
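
The quality encodings above can be illustrated with a short Python sketch. The Phred+33/Phred+64 decoding follows directly from the option descriptions; the Solexa-to-Phred formula is the standard conversion between the two scales and is shown without any rounding that Centrifuge may apply. The function names are hypothetical:

```python
import math


def decode_qualities(quality_string, offset=33):
    """Convert an ASCII quality string to integer quality values (offset 33 or 64)."""
    return [ord(ch) - offset for ch in quality_string]


def parse_int_quals(text):
    """Parse space-separated integer qualities, as used with --int-quals."""
    return [int(token) for token in text.split()]


def solexa_to_phred(q_solexa):
    """Convert a Solexa-scaled quality (may be negative) to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


print(decode_qualities("II?I"))        # [40, 40, 30, 40]
print(parse_int_quals("40 40 30 40"))  # [40, 40, 30, 40]
print(solexa_to_phred(-5))
```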

#### Classification

<table>

<tr><td id="centrifuge-options-min-hitlen">

[`--min-hitlen`]: #centrifuge-options-min-hitlen

    --min-hitlen <int>

</td><td>

Minimum length of partial hits, which must be greater than 15 (default: 22).

</td></tr>

<tr><td id="centrifuge-options-k">

[`-k`]: #centrifuge-options-k

    -k <int>

</td><td>

Report at most `<int>` distinct, primary assignments for each read or pair.  
A primary assignment is one whose assignment score is equal to or higher than that of every other assignment.
If there are more primary assignments than this value, 
the search will merge some of the assignments into a higher taxonomic rank.
The assignment score for a paired-end assignment equals the sum of the assignment scores of the individual mates. 
Default: 5

</td></tr>

<tr><td id="centrifuge-options-host-taxids">

[`--host-taxids`]: #centrifuge-options-host-taxids

    --host-taxids <taxids>

</td><td>

A comma-separated list of taxonomic IDs that will be preferred in the classification procedure.
The descendants of these IDs will also be preferred.  If some of a read's assignments correspond to
these taxonomic IDs, only those assignments will be reported.

</td></tr>

<tr><td id="centrifuge-options-exclude-taxids">

[`--exclude-taxids`]: #centrifuge-options-exclude-taxids

    --exclude-taxids <taxids>

</td><td>

A comma-separated list of taxonomic IDs that will be excluded in the classification procedure.
The descendants of these IDs will also be excluded. 

</td></tr>

</table>
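
The effect of [`--host-taxids`] and [`--exclude-taxids`] on a read's candidate assignments can be sketched as below. This is a simplified illustration of the behavior described above, not Centrifuge's implementation: assignments are reduced to plain taxonomy IDs, `parents` is a child-to-parent map such as one loaded from `nodes.dmp`, and all function names and the toy tree are hypothetical:

```python
def is_descendant_or_self(tax_id, ancestors, parents, root=1):
    """True if tax_id equals, or descends from, any ID in `ancestors`."""
    seen = set()
    node = tax_id
    while node not in seen:
        if node in ancestors:
            return True
        if node == root or node not in parents:
            return False
        seen.add(node)
        node = parents[node]
    return False


def apply_host_taxids(assignments, host_taxids, parents):
    """If any assignment falls under a host ID, report only those; otherwise keep all."""
    preferred = [t for t in assignments if is_descendant_or_self(t, host_taxids, parents)]
    return preferred if preferred else list(assignments)


def apply_exclude_taxids(assignments, exclude_taxids, parents):
    """Drop assignments that fall under any excluded ID."""
    return [t for t in assignments if not is_descendant_or_self(t, exclude_taxids, parents)]


# Toy tree: 9606 is a child of 9605; 562 hangs directly off the root
parents = {1: 1, 9605: 1, 9606: 9605, 562: 1}
print(apply_host_taxids([9606, 562], {9605}, parents))
print(apply_exclude_taxids([9606, 562], {9605}, parents))
```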


<!--
#### Alignment options

<table>

<tr><td id="centrifuge-options-n-ceil">

[`--n-ceil`]: #centrifuge-options-n-ceil

    --n-ceil <func>

</td><td>

Sets a function governing the maximum number of ambiguous characters (usually
`N`s and/or `.`s) allowed in a read as a function of read length.  For instance,
specifying `-L,0,0.15` sets the N-ceiling function `f` to `f(x) = 0 + 0.15 * x`,
where x is the read length.  See also: [setting function options].  Reads
exceeding this ceiling are [filtered out].  Default: `L,0,0.15`.

[filtered out]: #filtering

</td></tr>

<tr><td id="centrifuge-options-ignore-quals">

[`--ignore-quals`]: #centrifuge-options-ignore-quals

    --ignore-quals

</td><td>

When calculating a mismatch penalty, always consider the quality value at the
mismatched position to be the highest possible, regardless of the actual value. 
I.e. input is treated as though all quality values are high.  This is also the
default behavior when the input doesn't specify quality values (e.g. in [`-f`],
[`-r`], or [`-c`] modes).

</td></tr>
<tr><td id="centrifuge-options-nofw">

[`--nofw`]: #centrifuge-options-nofw

    --nofw/--norc

</td><td>

If `--nofw` is specified, `centrifuge` will not attempt to align unpaired reads to
the forward (Watson) reference strand.  If `--norc` is specified, `centrifuge` will
not attempt to align unpaired reads against the reverse-complement (Crick)
reference strand. In paired-end mode, `--nofw` and `--norc` pertain to the
fragments; i.e. specifying `--nofw` causes `centrifuge` to explore only those
paired-end configurations corresponding to fragments from the reverse-complement
(Crick) strand.  Default: both strands enabled. 

</td></tr>

</table>

#### Paired-end options

<table>

<tr><td id="centrifuge-options-fr">

[`--fr`/`--rf`/`--ff`]: #centrifuge-options-fr
[`--fr`]: #centrifuge-options-fr
[`--rf`]: #centrifuge-options-fr
[`--ff`]: #centrifuge-options-fr

    --fr/--rf/--ff

</td><td>

The upstream/downstream mate orientations for a valid paired-end alignment
against the forward reference strand.  E.g., if `--fr` is specified and there is
a candidate paired-end alignment where mate 1 appears upstream of the reverse
complement of mate 2 and the fragment length constraints ([`-I`] and [`-X`]) are
met, that alignment is valid.  Also, if mate 2 appears upstream of the reverse
complement of mate 1 and all other constraints are met, that too is valid.
`--rf` likewise requires that an upstream mate1 be reverse-complemented and a
downstream mate2 be forward-oriented. ` --ff` requires both an upstream mate 1
and a downstream mate 2 to be forward-oriented.  Default: `--fr` (appropriate
for Illumina's Paired-end Sequencing Assay).

</td></tr></table>
-->

#### Output options

<table>

<tr><td id="centrifuge-options-t">

[`-t`/`--time`]: #centrifuge-options-t
[`-t`]: #centrifuge-options-t

    -t/--time

</td><td>

Print the wall-clock time required to load the index files and align the reads. 
This is printed to the "standard error" ("stderr") filehandle.  Default: off.

</td></tr>

<!--
<tr><td id="centrifuge-options-un">

[`--un`]: #centrifuge-options-un
[`--un-gz`]: #centrifuge-options-un
[`--un-bz2`]: #centrifuge-options-un

    --un <path>
    --un-gz <path>
    --un-bz2 <path>

</td><td>

Write unpaired reads that fail to align to file at `<path>`.  These reads
correspond to the SAM records with the FLAGS `0x4` bit set and neither the
`0x40` nor `0x80` bits set.  If `--un-gz` is specified, output will be gzip
compressed. If `--un-bz2` is specified, output will be bzip2 compressed.  Reads
written in this way will appear exactly as they did in the input file, without
any modification (same sequence, same name, same quality string, same quality
encoding).  Reads will not necessarily appear in the same order as they did in
the input.

</td></tr>
<tr><td id="centrifuge-options-al">

[`--al`]: #centrifuge-options-al
[`--al-gz`]: #centrifuge-options-al
[`--al-bz2`]: #centrifuge-options-al

    --al <path>
    --al-gz <path>
    --al-bz2 <path>

</td><td>

Write unpaired reads that align at least once to file at `<path>`.  These reads
correspond to the SAM records with the FLAGS `0x4`, `0x40`, and `0x80` bits
unset.  If `--al-gz` is specified, output will be gzip compressed. If `--al-bz2`
is specified, output will be bzip2 compressed.  Reads written in this way will
appear exactly as they did in the input file, without any modification (same
sequence, same name, same quality string, same quality encoding).  Reads will
not necessarily appear in the same order as they did in the input.

</td></tr>
<tr><td id="centrifuge-options-un-conc">

[`--un-conc`]: #centrifuge-options-un-conc
[`--un-conc-gz`]: #centrifuge-options-un-conc
[`--un-conc-bz2`]: #centrifuge-options-un-conc

    --un-conc <path>
    --un-conc-gz <path>
    --un-conc-bz2 <path>

</td><td>

Write paired-end reads that fail to align concordantly to file(s) at `<path>`.
These reads correspond to the SAM records with the FLAGS `0x4` bit set and
either the `0x40` or `0x80` bit set (depending on whether it's mate #1 or #2).
`.1` and `.2` strings are added to the filename to distinguish which file
contains mate #1 and mate #2.  If a percent symbol, `%`, is used in `<path>`,
the percent symbol is replaced with `1` or `2` to make the per-mate filenames.
Otherwise, `.1` or `.2` are added before the final dot in `<path>` to make the
per-mate filenames.  Reads written in this way will appear exactly as they did
in the input files, without any modification (same sequence, same name, same
quality string, same quality encoding).  Reads will not necessarily appear in
the same order as they did in the inputs.

</td></tr>
<tr><td id="centrifuge-options-al-conc">

[`--al-conc`]: #centrifuge-options-al-conc
[`--al-conc-gz`]: #centrifuge-options-al-conc
[`--al-conc-bz2`]: #centrifuge-options-al-conc

    --al-conc <path>
    --al-conc-gz <path>
    --al-conc-bz2 <path>

</td><td>

Write paired-end reads that align concordantly at least once to file(s) at
`<path>`. These reads correspond to the SAM records with the FLAGS `0x4` bit
unset and either the `0x40` or `0x80` bit set (depending on whether it's mate #1
or #2). `.1` and `.2` strings are added to the filename to distinguish which
file contains mate #1 and mate #2.  If a percent symbol, `%`, is used in
`<path>`, the percent symbol is replaced with `1` or `2` to make the per-mate
filenames. Otherwise, `.1` or `.2` are added before the final dot in `<path>` to
make the per-mate filenames.  Reads written in this way will appear exactly as
they did in the input files, without any modification (same sequence, same name,
same quality string, same quality encoding).  Reads will not necessarily appear
in the same order as they did in the inputs.

</td></tr>
-->

<tr><td id="centrifuge-options-quiet">

[`--quiet`]: #centrifuge-options-quiet

    --quiet

</td><td>

Print nothing besides alignments and serious errors.

</td></tr>
<tr><td id="centrifuge-options-met-file">

[`--met-file`]: #centrifuge-options-met-file

    --met-file <path>

</td><td>

Write `centrifuge` metrics to file `<path>`.  Having alignment metrics can be useful
for debugging certain problems, especially performance issues.  See also:
[`--met`].  Default: metrics disabled.

</td></tr>
<tr><td id="centrifuge-options-met-stderr">

[`--met-stderr`]: #centrifuge-options-met-stderr

    --met-stderr

</td><td>

Write `centrifuge` metrics to the "standard error" ("stderr") filehandle.  This is
not mutually exclusive with [`--met-file`].  Having alignment metrics can be
useful for debugging certain problems, especially performance issues.  See also:
[`--met`].  Default: metrics disabled.

</td></tr>
<tr><td id="centrifuge-options-met">

[`--met`]: #centrifuge-options-met

    --met <int>

</td><td>

Write a new `centrifuge` metrics record every `<int>` seconds.  Only matters if
either [`--met-stderr`] or [`--met-file`] are specified.  Default: 1.

</td></tr>
</table>

#### Performance options

<table><tr>

<td id="centrifuge-options-o">

[`-o`/`--offrate`]: #centrifuge-options-o
[`-o`]: #centrifuge-options-o
[`--offrate`]: #centrifuge-options-o

    -o/--offrate <int>

</td><td>

Override the offrate of the index with `<int>`.  If `<int>` is greater
than the offrate used to build the index, then some row markings are
discarded when the index is read into memory.  This reduces the memory
footprint of the aligner but requires more time to calculate text
offsets.  `<int>` must be greater than the value used to build the
index.

</td></tr>
<tr><td id="centrifuge-options-p">

[`-p`/`--threads`]: #centrifuge-options-p
[`-p`]: #centrifuge-options-p

    -p/--threads NTHREADS

</td><td>

Launch `NTHREADS` parallel search threads (default: 1).  Threads will run on
separate processors/cores and synchronize when parsing reads and outputting
alignments.  Searching for alignments is highly parallel, and speedup is close
to linear.  Increasing `-p` increases Centrifuge's memory footprint. E.g. when
aligning to a human genome index, increasing `-p` from 1 to 8 increases the
memory footprint by a few hundred megabytes.  This option is only available if
`centrifuge` is linked with the `pthreads` library (i.e. if `BOWTIE_PTHREADS=0` is
not specified at build time).

</td></tr>
<tr><td id="centrifuge-options-reorder">

[`--reorder`]: #centrifuge-options-reorder

    --reorder

</td><td>

Guarantees that output records are printed in an order corresponding to the
order of the reads in the original input file, even when [`-p`] is set greater
than 1.  Specifying `--reorder` and setting [`-p`] greater than 1 causes Centrifuge
to run somewhat slower and use somewhat more memory than if `--reorder` were
not specified.  Has no effect if [`-p`] is set to 1, since output order will
naturally correspond to input order in that case.

</td></tr>
<tr><td id="centrifuge-options-mm">

[`--mm`]: #centrifuge-options-mm

    --mm

</td><td>

Use memory-mapped I/O to load the index, rather than typical file I/O.
Memory-mapping allows many concurrent `centrifuge` processes on the same computer to
share the same memory image of the index (i.e. you pay the memory overhead just
once).  This facilitates memory-efficient parallelization of `centrifuge` in
situations where using [`-p`] is not possible or not preferable.

</td></tr></table>

#### Other options

<table>
<tr><td id="centrifuge-options-qc-filter">

[`--qc-filter`]: #centrifuge-options-qc-filter

    --qc-filter

</td><td>

Filter out reads for which the QSEQ filter field is non-zero.  Only has an
effect when read format is [`--qseq`].  Default: off.

</td></tr>
<tr><td id="centrifuge-options-seed">

[`--seed`]: #centrifuge-options-seed

    --seed <int>

</td><td>

Use `<int>` as the seed for the pseudo-random number generator.  Default: 0.

</td></tr>
<tr><td id="centrifuge-options-non-deterministic">

[`--non-deterministic`]: #centrifuge-options-non-deterministic

    --non-deterministic

</td><td>

Normally, Centrifuge re-initializes its pseudo-random generator for each read.  It
seeds the generator with a number derived from (a) the read name, (b) the
nucleotide sequence, (c) the quality sequence, (d) the value of the [`--seed`]
option.  This means that if two reads are identical (same name, same
nucleotides, same qualities) Centrifuge will find and report the same classification(s)
for both, even if there was ambiguity.  When `--non-deterministic` is specified,
Centrifuge re-initializes its pseudo-random generator for each read using the
current time.  This means that Centrifuge will not necessarily report the same
classification for two identical reads.  This is counter-intuitive for some users,
but might be more appropriate in situations where the input consists of many
identical reads.

</td></tr>
<tr><td id="centrifuge-options-version">

[`--version`]: #centrifuge-options-version

    --version

</td><td>

Print version information and quit.

</td></tr>
<tr><td id="centrifuge-options-h">

    -h/--help

</td><td>

Print usage information and quit.

</td></tr></table>


The `centrifuge-build` indexer
===========================

`centrifuge-build` builds a Centrifuge index from a set of DNA sequences.
`centrifuge-build` outputs a set of 3 files with suffixes `.1.cf`, `.2.cf`, and
`.3.cf`.  These files together
constitute the index: they are all that is needed to align reads to that
reference.  The original sequence FASTA files are no longer used by Centrifuge
once the index is built.

Use of Karkkainen's [blockwise algorithm] allows `centrifuge-build` to trade off
between running time and memory usage. `centrifuge-build` has two options
governing how it makes this trade: [`--bmax`]/[`--bmaxdivn`],
and [`--dcv`].  By default, `centrifuge-build` will automatically search for the
settings that yield the best running time without exhausting memory.  This
behavior can be disabled using the [`-a`/`--noauto`] option.

The indexer provides options pertaining to the "shape" of the index, e.g.
[`--offrate`](#centrifuge-build-options-o) governs the fraction of [Burrows-Wheeler]
rows that are "marked" (i.e., the density of the suffix-array sample; see the
original [FM Index] paper for details).  All of these options are potentially
profitable trade-offs depending on the application.  They have been set to
defaults that are reasonable for most cases according to our experiments.  See
[Performance tuning] for details.

The Centrifuge index is based on the [FM Index] of Ferragina and Manzini, which in
turn is based on the [Burrows-Wheeler] transform.  The algorithm used to build
the index is based on the [blockwise algorithm] of Karkkainen.

[Blockwise algorithm]: http://portal.acm.org/citation.cfm?id=1314852
[Burrows-Wheeler]: http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
[Performance tuning]: #performance-tuning

Command Line
------------

Usage:

    centrifuge-build [options]* --conversion-table <table_in> --taxonomy-tree <taxonomy_in> --name-table <table_in2> <reference_in> <cf_base>

### Main arguments

<table><tr><td>

    <reference_in>

</td><td>

A comma-separated list of FASTA files containing the reference sequences to be
aligned to, or, if [`-c`](#centrifuge-build-options-c) is specified, the sequences
themselves. E.g., `<reference_in>` might be `chr1.fa,chr2.fa,chrX.fa,chrY.fa`,
or, if [`-c`](#centrifuge-build-options-c) is specified, this might be
`GGTCATCCT,ACGGGTCGT,CCGTTCTATGCGGCTTA`.

</td></tr><tr><td>

    <cf_base>

</td><td>

The basename of the index files to write.  By default, `centrifuge-build` writes
files named `NAME.1.cf`, `NAME.2.cf`, and `NAME.3.cf`, where `NAME` is `<cf_base>`.
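
As a small illustration of the naming rule above, the following sketch lists
the files expected for a given basename and reports which are absent.  The
helper names `index_files` and `missing_index_files` are hypothetical and not
part of Centrifuge.

```python
import os

# Suffixes listed in this manual for the files written by centrifuge-build.
INDEX_SUFFIXES = (".1.cf", ".2.cf", ".3.cf")


def index_files(cf_base):
    """Return the index file names expected for the basename <cf_base>."""
    return [cf_base + suffix for suffix in INDEX_SUFFIXES]


def missing_index_files(cf_base):
    """Return the expected index files that do not exist on disk."""
    return [path for path in index_files(cf_base) if not os.path.isfile(path)]


expected = index_files("test")
```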

</td></tr></table>

### Options

<table><tr><td>

    -f

</td><td>

The reference input files (specified as `<reference_in>`) are FASTA files
(usually having extension `.fa`, `.mfa`, `.fna` or similar).

</td></tr><tr><td id="centrifuge-build-options-c">

    -c

</td><td>

The reference sequences are given on the command line.  I.e. `<reference_in>` is
a comma-separated list of sequences rather than a list of FASTA files.

</td></tr>
<tr><td id="centrifuge-build-options-a">

[`-a`/`--noauto`]: #centrifuge-build-options-a

    -a/--noauto

</td><td>

Disable the default behavior whereby `centrifuge-build` automatically selects
values for the [`--bmax`], [`--dcv`] and [`--packed`] parameters according to
available memory.  Instead, the user may specify values for those parameters.  If
memory is exhausted during indexing, an error message will be printed; it is up
to the user to try new parameters.

</td></tr><tr><td id="centrifuge-build-options-p">

[`-p`]: #centrifuge-build-options-p

    -p/--threads <int>

</td><td>

Launch `<int>` parallel build threads (default: 1).

</td></tr><tr><td id="centrifuge-build-options-conversion-table">

[`--conversion-table`]: #centrifuge-build-options-conversion-table

    --conversion-table <file>

</td><td>

List of UIDs (unique ID) and corresponding taxonomic IDs.

</td></tr><tr><td id="centrifuge-build-options-taxonomy-tree">

[`--taxonomy-tree`]: #centrifuge-build-options-taxonomy-tree

    --taxonomy-tree <file>

</td><td>

Taxonomic tree (e.g. nodes.dmp).

</td></tr><tr><td id="centrifuge-build-options-name-table">

[`--name-table`]: #centrifuge-build-options-name-table

    --name-table <file>

</td><td>

Name table (e.g. names.dmp).

</td></tr><tr><td id="centrifuge-build-options-size-table">

[`--size-table`]: #centrifuge-build-options-size-table

    --size-table <file>

</td><td>

List of taxonomic IDs and lengths of the sequences belonging to the same taxonomic IDs.

</td></tr><tr><td id="centrifuge-build-options-bmax">

[`--bmax`]: #centrifuge-build-options-bmax

    --bmax <int>

</td><td>

The maximum number of suffixes allowed in a block.  Allowing more suffixes per
block makes indexing faster, but increases peak memory usage.  Setting this
option overrides any previous setting for [`--bmax`], or [`--bmaxdivn`]. 
Default (in terms of the [`--bmaxdivn`] parameter) is [`--bmaxdivn`] 4.  This is
configured automatically by default; use [`-a`/`--noauto`] to configure manually.

</td></tr><tr><td id="centrifuge-build-options-bmaxdivn">

[`--bmaxdivn`]: #centrifuge-build-options-bmaxdivn

    --bmaxdivn <int>

</td><td>

The maximum number of suffixes allowed in a block, expressed as a fraction of
the length of the reference.  Setting this option overrides any previous setting
for [`--bmax`], or [`--bmaxdivn`].  Default: [`--bmaxdivn`] 4.  This is
configured automatically by default; use [`-a`/`--noauto`] to configure manually.
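
To make the relationship between the two options concrete, the sketch below
converts a `--bmaxdivn` value into the equivalent block size.  The function
name is hypothetical, and the use of integer division with a floor of 1 is an
assumption of this example rather than documented behavior.

```python
def bmax_from_divn(reference_length, bmaxdivn=4):
    """Block size implied by --bmaxdivn for a reference of the given length."""
    if bmaxdivn <= 0:
        raise ValueError("bmaxdivn must be a positive integer")
    # Assumption: integer division, never smaller than one suffix per block.
    return max(1, reference_length // bmaxdivn)


default_block = bmax_from_divn(1000000)      # default --bmaxdivn 4
smaller_block = bmax_from_divn(1000000, 8)   # smaller blocks, lower peak memory
```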

</td></tr><tr><td id="centrifuge-build-options-dcv">

[`--dcv`]: #centrifuge-build-options-dcv

    --dcv <int>

</td><td>

Use `<int>` as the period for the difference-cover sample.  A larger period
yields less memory overhead, but may make suffix sorting slower, especially if
repeats are present.  Must be a power of 2 no greater than 4096.  Default: 1024.
This is configured automatically by default; use [`-a`/`--noauto`] to configure
manually.
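
The constraint on `<int>` can be checked before launching a long build.  This
is a sketch only: `is_valid_dcv` is a hypothetical helper, and it tests just
the rule stated above (a power of 2 no greater than 4096); the tool itself may
enforce further limits.

```python
def is_valid_dcv(period):
    """True if period is a power of 2 and no greater than 4096."""
    # A positive integer is a power of 2 exactly when it has a single set bit.
    return 0 < period <= 4096 and (period & (period - 1)) == 0


default_ok = is_valid_dcv(1024)
```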

</td></tr><tr><td id="centrifuge-build-options-nodc">

[`--nodc`]: #centrifuge-build-options-nodc

    --nodc

</td><td>

Disable use of the difference-cover sample.  Suffix sorting becomes
quadratic-time in the worst case (where the worst case is an extremely
repetitive reference).  Default: off.

</td></tr><tr><td id="centrifuge-build-options-o">

    -o/--offrate <int>

</td><td>

To map alignments back to positions on the reference sequences, it's necessary
to annotate ("mark") some or all of the [Burrows-Wheeler] rows with their
corresponding location on the genome. 
[`-o`/`--offrate`](#centrifuge-build-options-o) governs how many rows get marked:
the indexer will mark every 2^`<int>` rows.  Marking more rows makes
reference-position lookups faster, but requires more memory to hold the
annotations at runtime.  The default is 4 (every 16th row is marked; for human
genome, annotations occupy about 680 megabytes).  
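
The trade-off can be estimated with a little arithmetic.  In the sketch below
the function names are hypothetical and the 4 bytes per annotation is an
assumed entry size; it is not intended to reproduce the human-genome figure
quoted above.

```python
def marked_rows(num_rows, offrate=4):
    """Rows annotated when one row in every 2**offrate is marked (rounded up)."""
    step = 1 << offrate
    return (num_rows + step - 1) // step


def annotation_bytes(num_rows, offrate=4, bytes_per_entry=4):
    """Memory for the annotations; bytes_per_entry is an assumption."""
    return marked_rows(num_rows, offrate) * bytes_per_entry


# Each increment of the offrate halves the number of marked rows.
at_default = marked_rows(1600, 4)
at_next = marked_rows(1600, 5)
```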

</td></tr><tr><td>

    -t/--ftabchars <int>

</td><td>

The ftab is the lookup table used to calculate an initial [Burrows-Wheeler]
range with respect to the first `<int>` characters of the query.  A larger
`<int>` yields a larger lookup table but faster query times.  The ftab has size
4^(`<int>`+1) bytes.  The default setting is 10 (ftab is 4MB).
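
The size formula above can be evaluated directly; the function name `ftab_bytes`
is introduced here only for illustration.

```python
def ftab_bytes(ftabchars=10):
    """Size of the ftab in bytes, using the 4^(<int>+1) formula above."""
    return 4 ** (ftabchars + 1)


default_size = ftab_bytes(10)
default_size_mib = default_size // (1024 * 1024)
```

Each additional character multiplies the table size by 4.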

</td></tr><tr><td>

    --seed <int>

</td><td>

Use `<int>` as the seed for the pseudo-random number generator.

</td></tr><tr><td>

    --kmer-count <int>

</td><td>

Use `<int>` as the k-mer size for counting the number of distinct k-mers in the input sequences.

</td></tr><tr><td>

    -q/--quiet

</td><td>

`centrifuge-build` is verbose by default.  With this option `centrifuge-build` will
print only error messages.

</td></tr><tr><td>

    -h/--help

</td><td>

Print usage information and quit.

</td></tr><tr><td>

    --version

</td><td>

Print version information and quit.

</td></tr></table>

The `centrifuge-inspect` index inspector
=====================================

`centrifuge-inspect` extracts information from a Centrifuge index about what kind of
index it is and what reference sequences were used to build it. When run without
any options, the tool will output a FASTA file containing the sequences of the
original references (with all non-`A`/`C`/`G`/`T` characters converted to `N`s).
It can also be used to extract just the reference sequence names using the
[`-n`/`--names`] option or a more verbose summary using the [`-s`/`--summary`]
option.

Command Line
------------

Usage:

    centrifuge-inspect [options]* <cf_base>

### Main arguments

<table><tr><td>

    <cf_base>

</td><td>

The basename of the index to be inspected.  The basename is the name of any of the
index files but with the `.X.cf` suffix omitted.
`centrifuge-inspect` first looks in the current directory for the index files, then
in the directory specified in the `CENTRIFUGE_INDEXES` environment variable.

</td></tr></table>

### Options

<table><tr><td>

    -a/--across <int>

</td><td>

When printing FASTA output, output a newline character every `<int>` bases
(default: 60).

</td></tr><tr><td id="centrifuge-inspect-options-n">

[`-n`/`--names`]: #centrifuge-inspect-options-n

    -n/--names

</td><td>

Print reference sequence names, one per line, and quit.

</td></tr><tr><td id="centrifuge-inspect-options-s">

[`-s`/`--summary`]: #centrifuge-inspect-options-s

    -s/--summary

</td><td>

Print a summary that includes information about index settings, as well as the
names and lengths of the input sequences.  The summary has this format: 

    Colorspace	<0 or 1>
    SA-Sample	1 in <sample>
    FTab-Chars	<chars>
    Sequence-1	<name>	<len>
    Sequence-2	<name>	<len>
    ...
    Sequence-N	<name>	<len>

Fields are separated by tabs.  Colorspace is always set to 0 for Centrifuge.
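
Because the summary is tab-separated, it is easy to parse.  The following is a
minimal sketch: `parse_summary` is a hypothetical helper, and the sequence
names and lengths in `example` are invented for illustration.

```python
def parse_summary(text):
    """Split a summary into index settings and (name, length) sequence pairs."""
    settings = {}
    sequences = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if fields[0].startswith("Sequence-"):
            # Sequence-N <tab> name <tab> length
            sequences.append((fields[1], int(fields[2])))
        else:
            settings[fields[0]] = fields[1]
    return settings, sequences


# Invented sample values in the format shown above.
example = (
    "Colorspace\t0\n"
    "SA-Sample\t1 in 16\n"
    "FTab-Chars\t10\n"
    "Sequence-1\tseqA\t1000\n"
    "Sequence-2\tseqB\t2500\n"
)
settings, sequences = parse_summary(example)
```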

</td></tr><tr><td id="centrifuge-inspect-options-conversion-table">

[`--conversion-table`]: #centrifuge-inspect-options-conversion-table

    --conversion-table

</td><td>

Print a list of UIDs (unique ID) and corresponding taxonomic IDs.

</td></tr><tr><td id="centrifuge-inspect-options-taxonomy-tree">

[`--taxonomy-tree`]: #centrifuge-inspect-options-taxonomy-tree

    --taxonomy-tree

</td><td>

Print taxonomic tree.

</td></tr><tr><td id="centrifuge-inspect-options-name-table">

[`--name-table`]: #centrifuge-inspect-options-name-table

    --name-table

</td><td>

Print name table.

</td></tr><tr><td id="centrifuge-inspect-options-size-table">

[`--size-table`]: #centrifuge-inspect-options-size-table

    --size-table

</td><td>

Print a list of taxonomic IDs and lengths of the sequences belonging to the same taxonomic IDs.

</td></tr><tr><td>

    -v/--verbose

</td><td>

Print verbose output (for debugging).

</td></tr><tr><td>

    --version

</td><td>

Print version information and quit.

</td></tr><tr><td>

    -h/--help

</td><td>

Print usage information and quit.

</td></tr></table>

[`small example`]: #centrifuge-example

Getting started with Centrifuge
===================================================

Centrifuge comes with some example files to get you started.  The example files
are not scientifically significant; these files will simply let you start running Centrifuge and
downstream tools right away.

First follow the manual instructions to [obtain Centrifuge].  Set the `CENTRIFUGE_HOME`
environment variable to point to the new Centrifuge directory containing the
`centrifuge`, `centrifuge-build` and `centrifuge-inspect` binaries.  This is important,
as the `CENTRIFUGE_HOME` variable is used in the commands below to refer to that
directory.

[obtain Centrifuge]: #obtaining-centrifuge

Indexing a reference genome
---------------------------

To create an index for two small sequences included with Centrifuge, create a new temporary directory (it doesn't matter where), change into that directory, and run:

    $CENTRIFUGE_HOME/centrifuge-build --conversion-table $CENTRIFUGE_HOME/example/reference/gi_to_tid.dmp --taxonomy-tree $CENTRIFUGE_HOME/example/reference/nodes.dmp --name-table $CENTRIFUGE_HOME/example/reference/names.dmp $CENTRIFUGE_HOME/example/reference/test.fa test

The command should print many lines of output then quit. When the command
completes, the current directory will contain three new files that all start with
`test` and end with `.1.cf`, `.2.cf`, `.3.cf`.  These files constitute the index - you're done!

You can use `centrifuge-build` to create an index for a set of FASTA files obtained
from any source, including sites such as [UCSC], [NCBI], and [Ensembl]. When
indexing multiple FASTA files, specify all the files using commas to separate
file names.  For more details on how to create an index with `centrifuge-build`,
see the [manual section on index building].  You may also want to bypass this
process by obtaining a pre-built index.
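
When building from several FASTA files in a script, the command line can be
assembled programmatically.  The sketch below only constructs the argument
list; it uses the options documented in the `centrifuge-build` section, while
the function name `build_index_command` and the file names passed to it are
placeholders chosen for this example.

```python
import os


def build_index_command(centrifuge_home, references, cf_base,
                        conversion_table, taxonomy_tree, name_table,
                        threads=1):
    """Assemble a centrifuge-build argument list without running it."""
    return [
        os.path.join(centrifuge_home, "centrifuge-build"),
        "-p", str(threads),
        "--conversion-table", conversion_table,
        "--taxonomy-tree", taxonomy_tree,
        "--name-table", name_table,
        ",".join(references),  # multiple FASTA files are comma-separated
        cf_base,
    ]


# Placeholder paths; substitute your own files.
cmd = build_index_command(
    "/opt/centrifuge", ["a.fa", "b.fa"], "test",
    "gi_to_tid.dmp", "nodes.dmp", "names.dmp",
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to start
the build.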

[UCSC]: http://genome.ucsc.edu/cgi-bin/hgGateway
[NCBI]: http://www.ncbi.nlm.nih.gov/sites/genome
[Ensembl]: http://www.ensembl.org/
[manual section on index building]: #the-centrifuge-build-indexer
[using a pre-built index]: #using-a-pre-built-index

Classifying example reads
----------------------

Stay in the directory created in the previous step, which now contains the
`test` index files.  Next, run:

    $CENTRIFUGE_HOME/centrifuge -f -x test $CENTRIFUGE_HOME/example/reads/input.fa

This runs the Centrifuge classifier, which classifies a set of unpaired reads against
the genomes using the index generated in the previous step.
The classification results are reported to stdout, and a
short classification summary is written to `centrifuge-species_report.tsv`.

You will see something like this:

    readID  seqID taxID     score	2ndBestScore	hitLength	numMatches
    C_1 gi|7     9913      4225	4225		80		2
    C_1 gi|4     9646      4225	4225		80		2
    C_2 gi|4     9646      4225	4225		80		2
    C_2 gi|7     9913      4225	4225		80		2
    C_3 gi|7     9913      4225	4225		80		2
    C_3 gi|4     9646      4225	4225		80		2
    C_4 gi|4     9646      4225	4225		80		2
    C_4 gi|7     9913      4225	4225		80		2
    1_1 gi|4     9646      4225	0		80		1
    1_2 gi|4     9646      4225	0		80		1
    2_1 gi|7     9913      4225	0		80		1
    2_2 gi|7     9913      4225	0		80		1
    2_3 gi|7     9913      4225	0		80		1
    2_4 gi|7     9913      4225	0		80		1
    2_5 gi|7     9913      4225	0		80		1
    2_6 gi|7     9913      4225	0		80		1
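
Output in this format can be tallied with a short script.  The sketch below
counts, per taxonomic ID, the reads that were assigned to exactly one place
(`numMatches` of 1).  The function name is hypothetical, the rows are copied
from the listing above, and the parser assumes that no field contains
whitespace.

```python
from collections import Counter


def unique_hits_per_taxid(lines):
    """Count uniquely classified reads (numMatches == 1) for each taxID."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "readID":
            continue  # skip blank lines and the header
        tax_id, num_matches = fields[2], int(fields[6])
        if num_matches == 1:
            counts[tax_id] += 1
    return counts


sample = [
    "readID seqID taxID score 2ndBestScore hitLength numMatches",
    "C_1 gi|7 9913 4225 4225 80 2",
    "C_1 gi|4 9646 4225 4225 80 2",
    "1_1 gi|4 9646 4225 0 80 1",
    "1_2 gi|4 9646 4225 0 80 1",
    "2_1 gi|7 9913 4225 0 80 1",
]
counts = unique_hits_per_taxid(sample)
```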


================================================
FILE: Makefile
================================================
#
# Copyright 2014, Daehwan Kim <infphilo@gmail.com>
#
# This file is part of Centrifuge, which is copied and modified from Makefile in the Bowtie2 package.
#
# Centrifuge is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# Centrifuge is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Centrifuge.  If not, see <http://www.gnu.org/licenses/>.
#
#
# Makefile for centrifuge-bin, centrifuge-build, centrifuge-inspect
#

INC =
GCC_PREFIX = $(shell dirname `which gcc`)
GCC_SUFFIX =
CC = $(GCC_PREFIX)/gcc$(GCC_SUFFIX)
CPP = $(GCC_PREFIX)/g++$(GCC_SUFFIX)
CXX = $(CPP) #-fdiagnostics-color=always
HEADERS = $(wildcard *.h)
BOWTIE_MM = 1
BOWTIE_SHARED_MEM = 0

# Detect Cygwin or MinGW
WINDOWS = 0
CYGWIN = 0
MINGW = 0
ifneq (,$(findstring CYGWIN,$(shell uname)))
	WINDOWS = 1
	CYGWIN = 1
	# POSIX memory-mapped files not currently supported on Windows
	BOWTIE_MM = 0
	BOWTIE_SHARED_MEM = 0
else
	ifneq (,$(findstring MINGW,$(shell uname)))
		WINDOWS = 1
		MINGW = 1
		# POSIX memory-mapped files not currently supported on Windows
		BOWTIE_MM = 0
		BOWTIE_SHARED_MEM = 0
	endif
endif

MACOS = 0
ifneq (,$(findstring Darwin,$(shell uname)))
	MACOS = 1
endif

POPCNT_CAPABILITY ?= 1
ifeq (1, $(POPCNT_CAPABILITY))
    EXTRA_FLAGS += -DPOPCNT_CAPABILITY
    INC += -I third_party
endif

MM_DEF = 

ifeq (1,$(BOWTIE_MM))
	MM_DEF = -DBOWTIE_MM
endif

SHMEM_DEF = 

ifeq (1,$(BOWTIE_SHARED_MEM))
	SHMEM_DEF = -DBOWTIE_SHARED_MEM
endif

PTHREAD_PKG =
PTHREAD_LIB = 

ifeq (1,$(MINGW))
	PTHREAD_LIB = 
else
	PTHREAD_LIB = -lpthread
endif

SEARCH_LIBS = 
BUILD_LIBS = 
INSPECT_LIBS =

ifeq (1,$(MINGW))
	BUILD_LIBS = 
	INSPECT_LIBS = 
endif

USE_SRA = 0
SRA_DEF =
SRA_LIB =
SEARCH_INC =
ifeq (1,$(USE_SRA))
	SRA_DEF = -DUSE_SRA
	SRA_LIB = -lncbi-ngs-c++-static -lngs-c++-static -lncbi-vdb-static -ldl
	SEARCH_INC += -I$(NCBI_NGS_DIR)/include -I$(NCBI_VDB_DIR)/include
	SEARCH_LIBS += -L$(NCBI_NGS_DIR)/lib64 -L$(NCBI_VDB_DIR)/lib64
endif

LIBS = $(PTHREAD_LIB)

SHARED_CPPS = ccnt_lut.cpp ref_read.cpp alphabet.cpp shmem.cpp \
	edit.cpp bt2_idx.cpp \
	reference.cpp ds.cpp limit.cpp \
	random_source.cpp tinythread.cpp
SEARCH_CPPS = qual.cpp pat.cpp \
	read_qseq.cpp ref_coord.cpp mask.cpp \
	pe.cpp aligner_seed_policy.cpp \
	scoring.cpp presets.cpp \
	simple_func.cpp random_util.cpp outq.cpp

BUILD_CPPS = diff_sample.cpp

CENTRIFUGE_CPPS_MAIN = $(SEARCH_CPPS) centrifuge_main.cpp
CENTRIFUGE_BUILD_CPPS_MAIN = $(BUILD_CPPS) centrifuge_build_main.cpp
CENTRIFUGE_COMPRESS_CPPS_MAIN = $(BUILD_CPPS) \
	aligner_seed.cpp \
	aligner_sw.cpp \
	aligner_cache.cpp \
	dp_framer.cpp \
	aligner_bt.cpp sse_util.cpp \
	aligner_swsse.cpp \
	aligner_swsse_loc_i16.cpp \
	aligner_swsse_ee_i16.cpp \
	aligner_swsse_loc_u8.cpp \
	aligner_swsse_ee_u8.cpp \
	scoring.cpp \
	mask.cpp \
	qual.cpp

CENTRIFUGE_REPORT_CPPS_MAIN=$(BUILD_CPPS)

SEARCH_FRAGMENTS = $(wildcard search_*_phase*.c)
VERSION = $(shell cat VERSION)
GIT_VERSION = $(VERSION)
#GIT_VERSION = $(shell command -v git 2>&1 > /dev/null && git describe --long --tags --dirty --always --abbrev=10 || cat VERSION)

# Convert BITS=?? to a -m flag
BITS=32
ifeq (x86_64,$(shell uname -m))
BITS=64
endif
# msys will always be 32 bit so look at the cpu arch instead.
ifneq (,$(findstring AMD64,$(PROCESSOR_ARCHITEW6432)))
	ifeq (1,$(MINGW))
		BITS=64
	endif
endif
BITS_FLAG =

ifeq (32,$(BITS))
	BITS_FLAG = -m32
endif

ifeq (64,$(BITS))
	BITS_FLAG = -m64
endif
SSE_FLAG=-msse2

DEBUG_FLAGS    = -O0 -g3 $(BITS_FLAG) $(SSE_FLAG) -std=c++11
DEBUG_DEFS     = -DCOMPILER_OPTIONS="\"$(DEBUG_FLAGS) $(EXTRA_FLAGS)\""
RELEASE_FLAGS  = -O3 $(BITS_FLAG) $(SSE_FLAG) -funroll-loops -g3 -std=c++11
RELEASE_DEFS   = -DCOMPILER_OPTIONS="\"$(RELEASE_FLAGS) $(EXTRA_FLAGS)\""
NOASSERT_FLAGS = -DNDEBUG
FILE_FLAGS     = -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
CFLAGS         = 
#CFLAGS         = -fdiagnostics-color=always

ifeq (1,$(USE_SRA))
	ifeq (1, $(MACOS))
		DEBUG_FLAGS += -mmacosx-version-min=10.6
		RELEASE_FLAGS += -mmacosx-version-min=10.6
	endif
endif


CENTRIFUGE_BIN_LIST = centrifuge-build-bin \
	centrifuge-class \
	centrifuge-inspect-bin

CENTRIFUGE_BIN_LIST_AUX = centrifuge-build-bin-debug \
	centrifuge-class-debug \
	centrifuge-inspect-bin-debug

CENTRIFUGE_SCRIPT_LIST = 	centrifuge \
	centrifuge-build \
	centrifuge-inspect \
	centrifuge-download \
	centrifuge-kreport \
	$(wildcard centrifuge-*.pl)


GENERAL_LIST = $(wildcard scripts/*.sh) \
	$(wildcard scripts/*.pl) \
	$(wildcard *.py) \
	$(wildcard *.pl) \
	doc/manual.inc.html \
	doc/README \
	doc/style.css \
	$(wildcard example/index/*.cf) \
	$(wildcard example/reads/*.fa) \
	$(wildcard example/reference/*) \
	indices/Makefile \
	$(PTHREAD_PKG) \
	$(CENTRIFUGE_SCRIPT_LIST) \
	AUTHORS \
	LICENSE \
	NEWS \
	MANUAL \
	MANUAL.markdown \
	TUTORIAL \
	VERSION

ifeq (1,$(WINDOWS))
	CENTRIFUGE_BIN_LIST := $(CENTRIFUGE_BIN_LIST) centrifuge.bat centrifuge-build.bat centrifuge-inspect.bat 
endif

# This is helpful on Windows under MinGW/MSYS, where Make might go for
# the Windows FIND tool instead.
FIND=$(shell which find)

SRC_PKG_LIST = $(wildcard *.h) \
	$(wildcard *.hh) \
	$(wildcard *.c) \
	$(wildcard *.cpp) \
	$(wildcard third_party/*.h) \
	$(wildcard third_party/*.cpp) \
	doc/strip_markdown.pl \
	Makefile \
	$(GENERAL_LIST)

BIN_PKG_LIST = $(GENERAL_LIST)

.PHONY: all allall both both-debug

all: $(CENTRIFUGE_BIN_LIST)

allall: $(CENTRIFUGE_BIN_LIST) $(CENTRIFUGE_BIN_LIST_AUX)

both: centrifuge-class centrifuge-build-bin

both-debug: centrifuge-class-debug centrifuge-build-bin-debug

DEFS=-fno-strict-aliasing \
     -DCENTRIFUGE_VERSION="\"$(GIT_VERSION)\"" \
     -DBUILD_HOST="\"`hostname`\"" \
     -DBUILD_TIME="\"`date`\"" \
     -DCOMPILER_VERSION="\"`$(CXX) -v 2>&1 | tail -1`\"" \
     $(FILE_FLAGS) \
	 $(CFLAGS) \
     $(PREF_DEF) \
     $(MM_DEF) \
     $(SHMEM_DEF)

#
# centrifuge targets
#

centrifuge-class: centrifuge.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) $(SRA_DEF) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
	$(INC) $(SEARCH_INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(CENTRIFUGE_CPPS_MAIN) \
	$(LIBS) $(SRA_LIB) $(SEARCH_LIBS)

centrifuge-class-debug: centrifuge.cpp $(SEARCH_CPPS) $(SHARED_CPPS) $(HEADERS) $(SEARCH_FRAGMENTS)
	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) $(SRA_DEF) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
	$(INC) $(SEARCH_INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(CENTRIFUGE_CPPS_MAIN) \
	$(LIBS) $(SRA_LIB) $(SEARCH_LIBS)

centrifuge-build-bin: centrifuge_build.cpp $(SHARED_CPPS) $(HEADERS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(CENTRIFUGE_BUILD_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

centrifuge-build-bin-debug: centrifuge_build.cpp $(SHARED_CPPS) $(HEADERS)
	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(CENTRIFUGE_BUILD_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

centrifuge-compress-bin: centrifuge_compress.cpp $(SHARED_CPPS) $(CENTRIFUGE_COMPRESS_CPPS_MAIN) $(HEADERS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(CENTRIFUGE_COMPRESS_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

centrifuge-compress-bin-debug: centrifuge_compress.cpp $(SHARED_CPPS) $(CENTRIFUGE_COMPRESS_CPPS_MAIN) $(HEADERS)
	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(CENTRIFUGE_COMPRESS_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

centrifuge-report-bin: centrifuge_report.cpp $(SHARED_CPPS) $(CENTRIFUGE_REPORT_CPPS_MAIN) $(HEADERS)
	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(CENTRIFUGE_REPORT_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

centrifuge-report-bin-debug: centrifuge_report.cpp $(SHARED_CPPS) $(CENTRIFUGE_REPORT_CPPS_MAIN) $(HEADERS)
	$(CXX) $(DEBUG_FLAGS) $(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
	$(INC) \
	-o $@ $< \
	$(SHARED_CPPS) $(CENTRIFUGE_REPORT_CPPS_MAIN) \
	$(LIBS) $(BUILD_LIBS)

#centrifuge-RemoveN: centrifuge-RemoveN.cpp 
#	$(CXX) $(RELEASE_FLAGS) $(RELEASE_DEFS) $(EXTRA_FLAGS) \
#	$(DEFS) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX $(NOASSERT_FLAGS) -Wall \
#	$(INC) \
#	-o $@ $< 


#
# centrifuge-inspect targets
#

centrifuge-inspect-bin: centrifuge_inspect.cpp $(HEADERS) $(SHARED_CPPS)
	$(CXX) $(RELEASE_FLAGS) \
	$(RELEASE_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
	$(INC) -I . \
	-o $@ $< \
	$(SHARED_CPPS) \
	$(LIBS) $(INSPECT_LIBS)

centrifuge-inspect-bin-debug: centrifuge_inspect.cpp $(HEADERS) $(SHARED_CPPS) 
	$(CXX) $(DEBUG_FLAGS) \
	$(DEBUG_DEFS) $(EXTRA_FLAGS) \
	$(DEFS) -DCENTRIFUGE -DBOWTIE2 -DBOWTIE_64BIT_INDEX -Wall \
	$(INC) -I . \
	-o $@ $< \
	$(SHARED_CPPS) \
	$(LIBS) $(INSPECT_LIBS)


centrifuge: ;

centrifuge.bat:
	echo "@echo off" > centrifuge.bat
	echo "perl %~dp0/centrifuge %*" >> centrifuge.bat

centrifuge-build.bat:
	echo "@echo off" > centrifuge-build.bat
	echo "python %~dp0/centrifuge-build %*" >> centrifuge-build.bat

centrifuge-inspect.bat:
	echo "@echo off" > centrifuge-inspect.bat
	echo "python %~dp0/centrifuge-inspect %*" >> centrifuge-inspect.bat


.PHONY: centrifuge-src
centrifuge-src: $(SRC_PKG_LIST)
	mkdir .src.tmp
	mkdir .src.tmp/centrifuge-$(VERSION)
	zip tmp.zip $(SRC_PKG_LIST)
	mv tmp.zip .src.tmp/centrifuge-$(VERSION)
	cd .src.tmp/centrifuge-$(VERSION) ; unzip tmp.zip ; rm -f tmp.zip
	cd .src.tmp ; zip -r centrifuge-$(VERSION)-source.zip centrifuge-$(VERSION)
	cp .src.tmp/centrifuge-$(VERSION)-source.zip .
	rm -rf .src.tmp

.PHONY: centrifuge-bin
centrifuge-bin: $(BIN_PKG_LIST) $(CENTRIFUGE_BIN_LIST) $(CENTRIFUGE_BIN_LIST_AUX) 
	rm -rf .bin.tmp
	mkdir .bin.tmp
	mkdir .bin.tmp/centrifuge-$(VERSION)
	if [ -f centrifuge.exe ] ; then \
		zip tmp.zip $(BIN_PKG_LIST) $(addsuffix .exe,$(CENTRIFUGE_BIN_LIST) $(CENTRIFUGE_BIN_LIST_AUX)) ; \
	else \
		zip tmp.zip $(BIN_PKG_LIST) $(CENTRIFUGE_BIN_LIST) $(CENTRIFUGE_BIN_LIST_AUX) ; \
	fi
	mv tmp.zip .bin.tmp/centrifuge-$(VERSION)
	cd .bin.tmp/centrifuge-$(VERSION) ; unzip tmp.zip ; rm -f tmp.zip
	cd .bin.tmp ; zip -r centrifuge-$(VERSION)-$(BITS).zip centrifuge-$(VERSION)
	cp .bin.tmp/centrifuge-$(VERSION)-$(BITS).zip .
	rm -rf .bin.tmp

.PHONY: doc
doc: doc/manual.inc.html MANUAL

doc/manual.inc.html: MANUAL.markdown
	pandoc -T "Centrifuge Manual" -o $@ \
	 --from markdown --to HTML --toc $^
	perl -i -ne \
	 '$$w=0 if m|^</body>|;print if $$w;$$w=1 if m|^<body>|;' $@

MANUAL: MANUAL.markdown
	perl doc/strip_markdown.pl < $^ > $@

prefix=/usr/local

.PHONY: install
install: all
	mkdir -p $(prefix)/bin
	mkdir -p $(prefix)/share/centrifuge/indices
	install -m 0644 indices/Makefile $(prefix)/share/centrifuge/indices
	install -d -m 0755 $(prefix)/share/centrifuge/doc
	install -m 0644 doc/* $(prefix)/share/centrifuge/doc
	for file in $(CENTRIFUGE_BIN_LIST) $(CENTRIFUGE_SCRIPT_LIST); do \
		install -m 0755 $$file $(prefix)/bin ; \
	done

.PHONY: uninstall
# Remove the installed binaries and scripts, then the shared data directory.
uninstall:
	for file in $(CENTRIFUGE_BIN_LIST) $(CENTRIFUGE_SCRIPT_LIST); do \
		rm -v $(prefix)/bin/$$file ; \
	done
	rm -rv $(prefix)/share/centrifuge


.PHONY: clean
clean:
	rm -f $(CENTRIFUGE_BIN_LIST) $(CENTRIFUGE_BIN_LIST_AUX) \
	$(addsuffix .exe,$(CENTRIFUGE_BIN_LIST) $(CENTRIFUGE_BIN_LIST_AUX)) \
	centrifuge-src.zip centrifuge-bin.zip
	rm -f core.* .tmp.head
	rm -rf *.dSYM
push-doc: doc/manual.inc.html
	scp doc/*.*html igm1:/data1/igm3/www/ccb.jhu.edu/html/software/centrifuge/


================================================
FILE: NEWS
================================================
Centrifuge NEWS
=============



================================================
FILE: README.md
================================================
# Centrifuge
Classifier for metagenomic sequences

[Centrifuge] is a novel microbial classification engine that enables
rapid, accurate and sensitive labeling of reads and quantification of
species on desktop computers.  The system uses a novel indexing scheme
based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini
(FM) index, optimized specifically for the metagenomic classification
problem. Centrifuge requires a relatively small index (4.7 GB for all
complete bacterial and viral genomes plus the human genome) and
classifies sequences at very high speed, allowing it to process the
millions of reads from a typical high-throughput DNA sequencing run
within a few minutes.  Together these advances enable timely and
accurate analysis of large metagenomics data sets on conventional
desktop computers.

The Centrifuge homepage is http://www.ccb.jhu.edu/software/centrifuge

The Centrifuge paper is available at https://genome.cshlp.org/content/26/12/1721

The Centrifuge poster is available at http://www.ccb.jhu.edu/people/infphilo/data/Centrifuge-poster.pdf

For more details on installing and running Centrifuge, see MANUAL.

## Quick guide
### Installation from source

    git clone https://github.com/infphilo/centrifuge
    cd centrifuge
    make
    sudo make install prefix=/usr/local

### Building indexes

We provide several indexes on the Centrifuge homepage at http://www.ccb.jhu.edu/software/centrifuge.
Centrifuge needs sequence and taxonomy files, as well as a sequence ID to taxonomy ID mapping.
See the MANUAL files for details. We provide a Makefile that simplifies the building of several
standard and custom indices:

    cd indices
    make p+h+v                   # bacterial, human, and viral genomes [~12G]
    make p_compressed            # bacterial genomes compressed at the species level [~4.2G]
    make p_compressed+h+v        # combination of the two above [~8G]


================================================
FILE: TUTORIAL
================================================
See the section toward the end of MANUAL entitled "Getting started with
Centrifuge".  Or, for the tutorial for the latest Bowtie 2 version, visit:

http://bowtie-bio.sf.net/bowtie2/manual.shtml#getting-started-with-bowtie-2-lambda-phage-example


================================================
FILE: VERSION
================================================
1.0.4


================================================
FILE: aligner_bt.cpp
================================================
/*
 * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>
 *
 * This file is part of Bowtie 2.
 *
 * Bowtie 2 is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Bowtie 2 is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Bowtie 2.  If not, see <http://www.gnu.org/licenses/>.
 */

#include "aligner_bt.h"
#include "mask.h"

using namespace std;

#define CHECK_ROW_COL(rowc, colc) \
	if(rowc >= 0 && colc >= 0) { \
		if(!sawcell_[colc].insert(rowc)) { \
			/* was already in there */ \
			abort = true; \
			return; \
		} \
		assert(local || prob_.cper_->debugCell(rowc, colc, hefc)); \
	}

/**
 * Fill in a triangle of the DP table and backtrace from the given cell to
 * a cell in the previous checkpoint, or to the terminal cell.
 */
void BtBranchTracer::triangleFill(
	int64_t rw,          // row of cell to backtrace from
	int64_t cl,          // column of cell to backtrace from
	int hef,             // cell to backtrace from is H (0), E (1), or F (2)
	TAlScore targ,       // score of cell to backtrace from
	TAlScore targ_final, // score of alignment we're looking for
	RandomSource& rnd,   // pseudo-random generator
	int64_t& row_new,    // out: row we ended up in after backtrace
	int64_t& col_new,    // out: column we ended up in after backtrace
	int& hef_new,        // out: H/E/F after backtrace
	TAlScore& targ_new,  // out: score up to cell we ended up in
	bool& done,          // out: finished tracing out an alignment?
	bool& abort)         // out: aborted b/c cell was seen before?
{
	assert_geq(rw, 0);
	assert_geq(cl, 0);
	assert_range(0, 2, hef);
	assert_lt(rw, (int64_t)prob_.qrylen_);
	assert_lt(cl, (int64_t)prob_.reflen_);
	assert(prob_.usecp_ && prob_.fill_);
	int64_t row = rw, col = cl;
	const int64_t colmin = 0;
	const int64_t rowmin = 0;
	const int64_t colmax = prob_.reflen_ - 1;
	const int64_t rowmax = prob_.qrylen_ - 1;
	assert_leq(prob_.reflen_, (TRefOff)sawcell_.size());
	assert_leq(col, (int64_t)prob_.cper_->hicol());
	assert_geq(col, (int64_t)prob_.cper_->locol());
	assert_geq(prob_.cper_->per(), 2);
	size_t mod = (row + col) & prob_.cper_->lomask();
	assert_lt(mod, prob_.cper_->per());
	// Allocate room for diags
	size_t depth = mod+1;
	assert_leq(depth, prob_.cper_->per());
	size_t breadth = depth;
	tri_.resize(depth);
	// Allocate room for each diag
	for(size_t i = 0; i < depth; i++) {
		tri_[i].resize(breadth - i);
	}
	bool upperleft = false;
	size_t off = (row + col) >> prob_.cper_->perpow2();
	if(off == 0) {
		upperleft = true;
	} else {
		off--;
	}
	const TAlScore sc_rdo = prob_.sc_->readGapOpen();
	const TAlScore sc_rde = prob_.sc_->readGapExtend();
	const TAlScore sc_rfo = prob_.sc_->refGapOpen();
	const TAlScore sc_rfe = prob_.sc_->refGapExtend();
	const bool local = !prob_.sc_->monotone;
	int64_t row_lo = row - (int64_t)mod;
	const CpQuad *prev2 = NULL, *prev1 = NULL;
	if(!upperleft) {
		// Read-only pointer to cells in diagonal -2.  Start one row above the
		// target row.
		prev2 = prob_.cper_->qdiag1sPtr() + (off * prob_.cper_->nrow() + row_lo - 1);
		// Read-only pointer to cells in diagonal -1.  Start one row above the
		// target row
		prev1 = prob_.cper_->qdiag2sPtr() + (off * prob_.cper_->nrow() + row_lo - 1);
#ifndef NDEBUG
		if(row >= (int64_t)mod) {
			size_t rowc = row - mod, colc = col;
			if(rowc > 0 && prob_.cper_->isCheckpointed(rowc-1, colc)) {
				TAlScore al = prev1[0].sc[0];
				if(al == MIN_I16) al = MIN_I64;
				assert_eq(prob_.cper_->scoreTriangle(rowc-1, colc, 0), al);
			}
			if(rowc > 0 && colc > 0 && prob_.cper_->isCheckpointed(rowc-1, colc-1)) {
				TAlScore al = prev2[0].sc[0];
				if(al == MIN_I16) al = MIN_I64;
				assert_eq(prob_.cper_->scoreTriangle(rowc-1, colc-1, 0), al);
			}
		}
#endif
	}
	// For each diagonal we need to fill in ('cur' below points to the cells
	// in the current diagonal)
	for(size_t i = 0; i < depth; i++) {
		CpQuad * cur = tri_[i].ptr();
		CpQuad * curc = cur;
		size_t doff = mod - i; // # diagonals we are away from target diag
		//assert_geq(row, (int64_t)doff);
		int64_t rowc = row - doff;
		int64_t colc = col;
		size_t neval = 0; // # cells evaluated in this diag
		ASSERT_ONLY(const CpQuad *last = NULL);
		// Fill this diagonal from upper right to lower left
		for(size_t j = 0; j < breadth; j++) {
			if(rowc >= rowmin && rowc <= rowmax &&
			   colc >= colmin && colc <= colmax)
			{
				neval++;
				int64_t fromend = prob_.qrylen_ - rowc - 1;
				bool allowGaps = fromend >= prob_.sc_->gapbar && rowc >= prob_.sc_->gapbar;
				// Fill this cell
				// Some things we might want to calculate about this cell up front:
				// 1. How many matches are possible from this cell to the cell in
				//    row, col, in case this allows us to prune
				// Get character from read
				int qc = prob_.qry_[rowc];
				// Get quality value from read
				int qq = prob_.qual_[rowc];
				assert_geq(qq, 33);
				// Get character from reference
				int rc = prob_.ref_[colc];
				assert_range(0, 16, rc);
				int16_t sc_diag = prob_.sc_->score(qc, rc, qq - 33);
				int16_t sc_h_up = MIN_I16;
				int16_t sc_f_up = MIN_I16;
				int16_t sc_h_lf = MIN_I16;
				int16_t sc_e_lf = MIN_I16;
				if(allowGaps) {
					if(rowc > 0) {
						assert(local || prev1[j+0].sc[2] < 0);
						if(prev1[j+0].sc[0] > MIN_I16) {
							sc_h_up = prev1[j+0].sc[0] - sc_rfo;
							if(local) sc_h_up = max<int16_t>(sc_h_up, 0);
						}
						if(prev1[j+0].sc[2] > MIN_I16) {
							sc_f_up = prev1[j+0].sc[2] - sc_rfe;
							if(local) sc_f_up = max<int16_t>(sc_f_up, 0);
						}
#ifndef NDEBUG
						TAlScore hup = prev1[j+0].sc[0];
						TAlScore fup = prev1[j+0].sc[2];
						if(hup == MIN_I16) hup = MIN_I64;
						if(fup == MIN_I16) fup = MIN_I64;
						if(local) {
							hup = max<int16_t>(hup, 0);
							fup = max<int16_t>(fup, 0);
						}
						if(prob_.cper_->isCheckpointed(rowc-1, colc)) {
							assert_eq(hup, prob_.cper_->scoreTriangle(rowc-1, colc, 0));
							assert_eq(fup, prob_.cper_->scoreTriangle(rowc-1, colc, 2));
						}
#endif
					}
					if(colc > 0) {
						assert(local || prev1[j+1].sc[1] < 0);
						if(prev1[j+1].sc[0] > MIN_I16) {
							sc_h_lf = prev1[j+1].sc[0] - sc_rdo;
							if(local) sc_h_lf = max<int16_t>(sc_h_lf, 0);
						}
						if(prev1[j+1].sc[1] > MIN_I16) {
							sc_e_lf = prev1[j+1].sc[1] - sc_rde;
							if(local) sc_e_lf = max<int16_t>(sc_e_lf, 0);
						}
#ifndef NDEBUG
						TAlScore hlf = prev1[j+1].sc[0];
						TAlScore elf = prev1[j+1].sc[1];
						if(hlf == MIN_I16) hlf = MIN_I64;
						if(elf == MIN_I16) elf = MIN_I64;
						if(local) {
							hlf = max<int16_t>(hlf, 0);
							elf = max<int16_t>(elf, 0);
						}
						if(prob_.cper_->isCheckpointed(rowc, colc-1)) {
							assert_eq(hlf, prob_.cper_->scoreTriangle(rowc, colc-1, 0));
							assert_eq(elf, prob_.cper_->scoreTriangle(rowc, colc-1, 1));
						}
#endif
					}
				}
				assert(rowc <= 1 || colc <= 0 || prev2 != NULL);
				int16_t sc_h_dg = ((rowc > 0 && colc > 0) ? prev2[j+0].sc[0] : 0);
				if(colc == 0 && rowc > 0 && !local) {
					sc_h_dg = MIN_I16;
				}
				if(sc_h_dg > MIN_I16) {
					sc_h_dg += sc_diag;
				}
				if(local) sc_h_dg = max<int16_t>(sc_h_dg, 0);
				// cerr << sc_diag << " " << sc_h_dg << " " << sc_h_up << " " << sc_f_up << " " << sc_h_lf << " " << sc_e_lf << endl;
				int mask = 0;
				// Calculate best ways into H, E, F cells starting with H.
				// Mask bits:
				// H: 1=diag, 2=hhoriz, 4=ehoriz, 8=hvert, 16=fvert
				// E: 32=hhoriz, 64=ehoriz
				// F: 128=hvert, 256=fvert
				int16_t sc_best = sc_h_dg;
				if(sc_h_dg > MIN_I64) {
					mask = 1;
				}
				if(colc > 0 && sc_h_lf >= sc_best && sc_h_lf > MIN_I64) {
					if(sc_h_lf > sc_best) mask = 0;
					mask |= 2;
					sc_best = sc_h_lf;
				}
				if(colc > 0 && sc_e_lf >= sc_best && sc_e_lf > MIN_I64) {
					if(sc_e_lf > sc_best) mask = 0;
					mask |= 4;
					sc_best = sc_e_lf;
				}
				if(rowc > 0 && sc_h_up >= sc_best && sc_h_up > MIN_I64) {
					if(sc_h_up > sc_best) mask = 0;
					mask |= 8;
					sc_best = sc_h_up;
				}
				if(rowc > 0 && sc_f_up >= sc_best && sc_f_up > MIN_I64) {
					if(sc_f_up > sc_best) mask = 0;
					mask |= 16;
					sc_best = sc_f_up;
				}
				// Calculate best way into E cell
				int16_t sc_e_best = sc_h_lf;
				if(colc > 0) {
					if(sc_h_lf >= sc_e_lf && sc_h_lf > MIN_I64) {
						if(sc_h_lf == sc_e_lf) {
							mask |= 64;
						}
						mask |= 32;
					} else if(sc_e_lf > MIN_I64) {
						sc_e_best = sc_e_lf;
						mask |= 64;
					}
				}
				if(sc_e_best > sc_best) {
					sc_best = sc_e_best;
					mask &= ~31; // clear all five ways into H (bits 1, 2, 4, 8, 16)
				}
				// Calculate best way into F cell
				int16_t sc_f_best = sc_h_up;
				if(rowc > 0) {
					if(sc_h_up >= sc_f_up && sc_h_up > MIN_I64) {
						if(sc_h_up == sc_f_up) {
							mask |= 256;
						}
						mask |= 128;
					} else if(sc_f_up > MIN_I64) {
						sc_f_best = sc_f_up;
						mask |= 256;
					}
				}
				if(sc_f_best > sc_best) {
					sc_best = sc_f_best;
					mask &= ~127; // clear all ways into H and E (bits 1 through 64)
				}
				// Install results in cur
				assert(!prob_.sc_->monotone || sc_best <= 0);
				assert(!prob_.sc_->monotone || sc_e_best <= 0);
				assert(!prob_.sc_->monotone || sc_f_best <= 0);
				curc->sc[0] = sc_best;
				assert( local || sc_e_best < 0);
				assert( local || sc_f_best < 0);
				assert(!local || sc_e_best >= 0 || sc_e_best == MIN_I16);
				assert(!local || sc_f_best >= 0 || sc_f_best == MIN_I16);
				curc->sc[1] = sc_e_best;
				curc->sc[2] = sc_f_best;
				curc->sc[3] = mask;
				// cerr << curc->sc[0] << " " << curc->sc[1] << " " << curc->sc[2] << " " << curc->sc[3] << endl;
				ASSERT_ONLY(last = curc);
#ifndef NDEBUG
				if(prob_.cper_->isCheckpointed(rowc, colc)) {
					if(local) {
						sc_e_best = max<int16_t>(sc_e_best, 0);
						sc_f_best = max<int16_t>(sc_f_best, 0);
					}
					TAlScore sc_best64   = sc_best;   if(sc_best   == MIN_I16) sc_best64   = MIN_I64;
					TAlScore sc_e_best64 = sc_e_best; if(sc_e_best == MIN_I16) sc_e_best64 = MIN_I64;
					TAlScore sc_f_best64 = sc_f_best; if(sc_f_best == MIN_I16) sc_f_best64 = MIN_I64;
					assert_eq(prob_.cper_->scoreTriangle(rowc, colc, 0), sc_best64);
					assert_eq(prob_.cper_->scoreTriangle(rowc, colc, 1), sc_e_best64);
					assert_eq(prob_.cper_->scoreTriangle(rowc, colc, 2), sc_f_best64);
				}
#endif
			}
			// Update row, col
			assert_lt(rowc, (int64_t)prob_.qrylen_);
			rowc++;
			colc--;
			curc++;
		} // for(size_t j = 0; j < breadth; j++)
		if(i == depth-1) {
			// Final iteration
			assert(last != NULL);
			assert_eq(1, neval);
			assert_neq(0, last->sc[3]);
			assert_eq(targ, last->sc[hef]);
		} else {
			breadth--;
			prev2 = prev1 + 1;
			prev1 = cur;
		}
	} // for(size_t i = 0; i < depth; i++)
	//
	// Now backtrack through the triangle.  Abort as soon as we enter a cell
	// that was visited by a previous backtrace.
	//
	int64_t rowc = row, colc = col;
	size_t curid;
	int hefc = hef;
	if(bs_.empty()) {
		// Start an initial branch
		CHECK_ROW_COL(rowc, colc);
		curid = bs_.alloc();
		assert_eq(0, curid);
		Edit e;
		bs_[curid].init(
			prob_,
			0,      // parent ID
			0,      // penalty
			0,      // score_en
			rowc,   // row
			colc,   // col
			e,      // edit
			0,      // hef
			true,   // I am the root
			false); // don't try to extend with exact matches
		bs_[curid].len_ = 0;
	} else {
		curid = bs_.size()-1;
	}
	size_t idx_orig = (row + col) >> prob_.cper_->perpow2();
	while(true) {
		// What depth are we?
		size_t mod = (rowc + colc) & prob_.cper_->lomask();
		assert_lt(mod, prob_.cper_->per());
		CpQuad * cur = tri_[mod].ptr();
		int64_t row_off = rowc - row_lo - mod;
		assert(!local || cur[row_off].sc[0] > 0);
		assert_geq(row_off, 0);
		int mask = cur[row_off].sc[3];
		assert_gt(mask, 0);
		int sel = -1;
		// Select what type of move to make, which depends on whether we're
		// currently in H, E, F:
		if(hefc == 0) {
			if(       (mask & 1) != 0) {
				// diagonal
				sel = 0;
			} else if((mask & 8) != 0) {
				// up to H
				sel = 3;
			} else if((mask & 16) != 0) {
				// up to F
				sel = 4;
			} else if((mask & 2) != 0) {
				// left to H
				sel = 1;
			} else if((mask & 4) != 0) {
				// left to E
				sel = 2;
			}
		} else if(hefc == 1) {
			if(       (mask & 32) != 0) {
				// left to H
				sel = 5;
			} else if((mask & 64) != 0) {
				// left to E
				sel = 6;
			}
		} else {
			assert_eq(2, hefc);
			if(       (mask & 128) != 0) {
				// up to H
				sel = 7;
			} else if((mask & 256) != 0) {
				// up to F
				sel = 8;
			}
		}
		assert_geq(sel, 0);
		// Get character from read
		int qc = prob_.qry_[rowc], qq = prob_.qual_[rowc];
		// Get character from reference
		int rc = prob_.ref_[colc];
		assert_range(0, 16, rc);
		// Now that we know what type of move to make, make it, updating our
		// row and column and updating the branch.
		if(sel == 0) {
			assert_geq(rowc, 0);
			assert_geq(colc, 0);
			TAlScore scd = prob_.sc_->score(qc, rc, qq - 33);
			if((rc & (1 << qc)) == 0) {
				// Mismatch
				size_t id = curid;
				// Check if the previous branch was the initial (bottommost)
				// branch with no matches.  If so, the mismatch should be added
				// to the initial branch, instead of starting a new branch.
				bool empty = (bs_[curid].len_ == 0 && curid == 0);
				if(!empty) {
					id = bs_.alloc();
				}
				Edit e((int)rowc, mask2dna[rc], "ACGTN"[qc], EDIT_TYPE_MM);
				assert_lt(scd, 0);
				TAlScore score_en = bs_[curid].score_st_ + scd;
				bs_[id].init(
					prob_,
					curid,    // parent ID
					-scd,     // penalty
					score_en, // score_en
					rowc,     // row
					colc,     // col
					e,        // edit
					hefc,     // hef
					empty,    // root?
					false);   // don't try to extend with exact matches
				//assert(!local || bs_[id].score_st_ >= 0);
				curid = id;
			} else {
				// Match
				bs_[curid].score_st_ += prob_.sc_->match();
				bs_[curid].len_++;
				assert_leq((int64_t)bs_[curid].len_, bs_[curid].row_ + 1);
			}
			rowc--;
			colc--;
			assert(local || bs_[curid].score_st_ >= targ_final);
			hefc = 0;
		} else if((sel >= 1 && sel <= 2) || (sel >= 5 && sel <= 6)) {
			assert_gt(colc, 0);
			// Read gap
			size_t id = bs_.alloc();
			Edit e((int)rowc+1, mask2dna[rc], '-', EDIT_TYPE_READ_GAP);
			TAlScore gapp = prob_.sc_->readGapOpen();
			if(bs_[curid].len_ == 0 && bs_[curid].e_.inited() && bs_[curid].e_.isReadGap()) {
				gapp = prob_.sc_->readGapExtend();
			}
			TAlScore score_en = bs_[curid].score_st_ - gapp;
			bs_[id].init(
				prob_,
				curid,    // parent ID
				gapp,     // penalty
				score_en, // score_en
				rowc,     // row
				colc-1,   // col
				e,        // edit
				hefc,     // hef
				false,    // root?
				false);   // don't try to extend with exact matches
			colc--;
			curid = id;
			assert( local || bs_[curid].score_st_ >= targ_final);
			//assert(!local || bs_[curid].score_st_ >= 0);
			if(sel == 1 || sel == 5) {
				hefc = 0;
			} else {
				hefc = 1;
			}
		} else {
			assert_gt(rowc, 0);
			// Reference gap
			size_t id = bs_.alloc();
			Edit e((int)rowc, '-', "ACGTN"[qc], EDIT_TYPE_REF_GAP);
			TAlScore gapp = prob_.sc_->refGapOpen();
			if(bs_[curid].len_ == 0 && bs_[curid].e_.inited() && bs_[curid].e_.isRefGap()) {
				gapp = prob_.sc_->refGapExtend();
			}
			TAlScore score_en = bs_[curid].score_st_ - gapp;
			bs_[id].init(
				prob_,
				curid,    // parent ID
				gapp,     // penalty
				score_en, // score_en
				rowc-1,   // row
				colc,     // col
				e,        // edit
				hefc,     // hef
				false,    // root?
				false);   // don't try to extend with exact matches
			rowc--;
			curid = id;
			//assert(!local || bs_[curid].score_st_ >= 0);
			if(sel == 3 || sel == 7) {
				hefc = 0;
			} else {
				hefc = 2;
			}
		}
		CHECK_ROW_COL(rowc, colc);
		size_t mod_new = (rowc + colc) & prob_.cper_->lomask();
		size_t idx = (rowc + colc) >> prob_.cper_->perpow2();
		assert_lt(mod_new, prob_.cper_->per());
		int64_t row_off_new = rowc - row_lo - mod_new;
		CpQuad * cur_new = NULL;
		if(colc >= 0 && rowc >= 0 && idx == idx_orig) {
			cur_new = tri_[mod_new].ptr();
		}
		bool hit_new_tri = (idx < idx_orig && colc >= 0 && rowc >= 0);
		// Check whether we moved past the top row or leftmost column, or (in
		// local mode) reached a cell with score 0
		if(colc < 0 || rowc < 0 ||
		   (cur_new != NULL && (local && cur_new[row_off_new].sc[0] == 0)))
		{
			done = true;
			assert(bs_[curid].isSolution(prob_));
			addSolution(curid);
#ifndef NDEBUG
			// A check to see if any two adjacent branches in the backtrace
			// overlap.  If they do, the whole alignment will be filtered out
			// in trySolution(...)
			size_t cur = curid;
			if(!bs_[cur].root_) {
				size_t next = bs_[cur].parentId_;
				while(!bs_[next].root_) {
					assert_neq(cur, next);
					if(bs_[next].len_ != 0 || bs_[cur].len_ == 0) {
						assert(!bs_[cur].overlap(prob_, bs_[next]));
					}
					cur = next;
					next = bs_[cur].parentId_;
				}
			}
#endif
			return;
		}
		if(hit_new_tri) {
			assert(rowc < 0 || colc < 0 || prob_.cper_->isCheckpointed(rowc, colc));
			row_new = rowc; col_new = colc;
			hef_new = hefc;
			done = false;
			if(rowc < 0 || colc < 0) {
				assert(local);
				targ_new = 0;
			} else {
				targ_new = prob_.cper_->scoreTriangle(rowc, colc, hefc);
			}
			if(local && targ_new == 0) {
				done = true;
				assert(bs_[curid].isSolution(prob_));
				addSolution(curid);
			}
			assert((row_new >= 0 && col_new >= 0) || done);
			return;
		}
	}
	assert(false);
}
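
/*
 * Illustrative sketch, not part of the original source.  The backtrace loop
 * above picks one move per cell from the 9-bit mask stored in sc[3], trying
 * the candidates in a fixed order.  The standalone helper below restates that
 * order so it can be read (and tested) in isolation.  The names MoveBit and
 * selectMove are hypothetical; the bit values and the priority order are
 * copied from the "Mask bits" comment and the if/else chain above.
 *
```cpp
#include <cassert>

// Hypothetical names for the mask bits documented in the fill loop.
enum MoveBit {
    H_DIAG  = 1,   // into H via the diagonal
    H_HHORZ = 2,   // into H from H on the left
    H_EHORZ = 4,   // into H from E on the left
    H_HVERT = 8,   // into H from H above
    H_FVERT = 16,  // into H from F above
    E_HHORZ = 32,  // into E from H on the left
    E_EHORZ = 64,  // into E from E on the left
    F_HVERT = 128, // into F from H above
    F_FVERT = 256  // into F from F above
};

// Return the move code (0..8) for a cell, or -1 if the mask offers no move
// for the current matrix.  hef: 0 = H, 1 = E, 2 = F.
inline int selectMove(int mask, int hef) {
    if (hef == 0) {
        if (mask & H_DIAG)  return 0; // diagonal
        if (mask & H_HVERT) return 3; // up to H
        if (mask & H_FVERT) return 4; // up to F
        if (mask & H_HHORZ) return 1; // left to H
        if (mask & H_EHORZ) return 2; // left to E
    } else if (hef == 1) {
        if (mask & E_HHORZ) return 5; // left to H
        if (mask & E_EHORZ) return 6; // left to E
    } else {
        if (mask & F_HVERT) return 7; // up to H
        if (mask & F_FVERT) return 8; // up to F
    }
    return -1;
}
```
 *
 * Note that the order is deterministic: a diagonal move always wins when it
 * is available, and vertical moves are preferred over horizontal ones.
 */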

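/**
 * Debug-only check that a score computed during the fill agrees with the
 * value returned by prob_.cper_->debugCell(row, col, hef).  The MIN_I16
 * sentinel is widened to MIN_I64 and, in local mode, negative scores on both
 * sides are clamped to 0 before comparing.  Expects a variable named 'local'
 * to be in scope where the macro is used.
 */
#ifndef NDEBUG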
#define DEBUG_CHECK(ss, row, col, hef) { \
	if(prob_.cper_->debug() && row >= 0 && col >= 0) { \
		TAlScore s = ss; \
		if(s == MIN_I16) s = MIN_I64; \
		if(local && s < 0) s = 0; \
		TAlScore deb = prob_.cper_->debugCell(row, col, hef); \
		if(local && deb < 0) deb = 0; \
		assert_eq(s, deb); \
	} \
}
#else
#define DEBUG_CHECK(ss, row, col, hef)
#endif
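
/*
 * Illustrative sketch, not part of the original source.  DEBUG_CHECK and the
 * NDEBUG blocks in triangleFill normalize a 16-bit score before comparing it
 * with a 64-bit one.  The helper below shows that normalization on its own.
 * The name normalizeScore and the constants kMinI16 / kMinI64 are
 * hypothetical, and the sketch assumes MIN_I16 and MIN_I64 are the minimum
 * values of their types, which this file does not show.
 *
```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed sentinel values; the real MIN_I16 / MIN_I64 are defined elsewhere.
const int16_t kMinI16 = std::numeric_limits<int16_t>::min();
const int64_t kMinI64 = std::numeric_limits<int64_t>::min();

// Widen a 16-bit cell score to 64 bits, mapping the "invalid" sentinel to its
// 64-bit counterpart, then clamp negatives to 0 when in local mode.
inline int64_t normalizeScore(int16_t s16, bool local) {
    int64_t s = s16;
    if (s == kMinI16) s = kMinI64; // sentinel keeps meaning "no valid score"
    if (local && s < 0) s = 0;     // local alignment scores never go below 0
    return s;
}
```
 *
 * Because the clamp runs after the widening, an invalid cell compares equal
 * to a zero-score cell in local mode.
 */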


/**
 * Fill in a square of the DP table and backtrace from the given cell to
 * a cell in the previous checkpoint, or to the terminal cell.
 */
void BtBranchTracer::squareFill(
	int64_t rw,          // row of cell to backtrace from
	int64_t cl,          // column of cell to backtrace from
	int hef,             // cell to backtrace from is H (0), E (1), or F (2)
	TAlScore targ,       // score of cell to backtrace from
	TAlScore targ_final, // score of alignment we're looking for
	RandomSource& rnd,   // pseudo-random generator
	int64_t& row_new,    // out: row we ended up in after backtrace
	int64_t& col_new,    // out: column we ended up in after backtrace
	int& hef_new,        // out: H/E/F after backtrace
	TAlScore& targ_new,  // out: score up to cell we ended up in
	bool& done,          // out: finished tracing out an alignment?
	bool& abort)         // out: aborted b/c cell was seen before?
{
	assert_geq(rw, 0);
	assert_geq(cl, 0);
	assert_range(0, 2, hef);
	assert_lt(rw, (int64_t)prob_.qrylen_);
	assert_lt(cl, (int64_t)prob_.reflen_);
	assert(prob_.usecp_ && prob_.fill_);
	const bool is8_ = prob_.cper_->is8_;
	int64_t row = rw, col = cl;
	assert_leq(prob_.reflen_, (TRefOff)sawcell_.size());
	assert_leq(col, (int64_t)prob_.cper_->hicol());
	assert_geq(col, (int64_t)prob_.cper_->locol());
	assert_geq(prob_.cper_->per(), 2);
	size_t xmod = col & prob_.cper_->lomask();
	size_t ymod = row & prob_.cper_->lomask();
	size_t xdiv = col >> prob_.cper_->perpow2();
	size_t ydiv = row >> prob_.cper_->perpow2();
	size_t sq_ncol = xmod+1, sq_nrow = ymod+1;
	sq_.resize(sq_ncol * sq_nrow);
	bool upper = ydiv == 0;
	bool left  = xdiv == 0;
	const TAlScore sc_rdo = prob_.sc_->readGapOpen();
	const TAlScore sc_rde = prob_.sc_->readGapExtend();
	const TAlScore sc_rfo = prob_.sc_->refGapOpen();
	const TAlScore sc_rfe = prob_.sc_->refGapExtend();
	const bool local = !prob_.sc_->monotone;
	const CpQuad *qup = NULL;
	const __m128i *qlf = NULL;
	size_t per = prob_.cper_->per_;
	ASSERT_ONLY(size_t nrow = prob_.cper_->nrow());
	size_t ncol = prob_.cper_->ncol();
	assert_eq(prob_.qrylen_, nrow);
	assert_eq(prob_.reflen_, (TRefOff)ncol);
	size_t niter = prob_.cper_->niter_;
	if(!upper) {
		qup = prob_.cper_->qrows_.ptr() + (ncol * (ydiv-1)) + xdiv * per;
	}
	if(!left) {
		// Set up the column pointers to point to the first __m128i word in the
		// relevant column
		size_t off = (niter << 2) * (xdiv-1);
		qlf = prob_.cper_->qcols_.ptr() + off;
	}
	size_t xedge = xdiv * per; // absolute offset of leftmost cell in square
	size_t yedge = ydiv * per; // absolute offset of topmost cell in square
	size_t xi = xedge, yi = yedge; // iterators for columns, rows
	size_t ii = 0; // iterator into packed square
	// Iterate over rows, then over columns
	size_t m128mod = yi % prob_.cper_->niter_;
	size_t m128div = yi / prob_.cper_->niter_;
	int16_t sc_h_dg_lastrow = MIN_I16;
	for(size_t i = 0; i <= ymod; i++, yi++) {
		assert_lt(yi, nrow);
		xi = xedge;
		// Handling for first column is done outside the loop
		size_t fromend = prob_.qrylen_ - yi - 1;
		bool allowGaps = fromend >= (size_t)prob_.sc_->gapbar && yi >= (size_t)prob_.sc_->gapbar;
		// Get character, quality from read
		int qc = prob_.qry_[yi], qq = prob_.qual_[yi];
		assert_geq(qq, 33);
		int16_t sc_h_lf_last = MIN_I16;
		int16_t sc_e_lf_last = MIN_I16;
		for(size_t j = 0; j <= xmod; j++, xi++) {
			assert_lt(xi, ncol);
			// Get character from reference
			int rc = prob_.ref_[xi];
			assert_range(0, 16, rc);
			int16_t sc_diag = prob_.sc_->score(qc, rc, qq - 33);
			int16_t sc_h_up = MIN_I16, sc_f_up = MIN_I16,
			        sc_h_lf = MIN_I16, sc_e_lf = MIN_I16,
					sc_h_dg = MIN_I16;
			int16_t sc_h_up_c = MIN_I16, sc_f_up_c = MIN_I16,
			        sc_h_lf_c = MIN_I16, sc_e_lf_c = MIN_I16,
					sc_h_dg_c = MIN_I16;
			if(yi == 0) {
				// If I'm in the first row, set it to 0
				sc_h_dg = 0;
			} else if(xi == 0) {
				// First column but not first row: leave it at min, unless
				// we're in local mode
				if(local) {
					sc_h_dg = 0;
				}
			} else if(i == 0 && j == 0) {
				// Otherwise, if I'm in the upper-left square corner, I can get
				// it from the checkpoint 
				sc_h_dg = qup[-1].sc[0];
			} else if(j == 0) {
				// Otherwise, if I'm in the leftmost cell of this row, I can
				// get it from sc_h_lf in first column of previous row
				sc_h_dg = sc_h_dg_lastrow;
			} else {
				// Otherwise, I can get it from qup
				sc_h_dg = qup[j-1].sc[0];
			}
			if(yi > 0 && xi > 0) DEBUG_CHECK(sc_h_dg, yi-1, xi-1, 2);
			
			// If we're in the leftmost column, calculate sc_h_lf regardless of
			// allowGaps.
			if(j == 0 && xi > 0) {
				// Get values for left neighbors from the checkpoint
				if(is8_) {
					size_t vecoff = (m128mod << 6) + m128div;
					sc_e_lf = ((uint8_t*)(qlf + 0))[vecoff];
					sc_h_lf = ((uint8_t*)(qlf + 2))[vecoff];
					if(local) {
						// No adjustment
					} else {
						if(sc_h_lf == 0) sc_h_lf = MIN_I16;
						else sc_h_lf -= 0xff;
						if(sc_e_lf == 0) sc_e_lf = MIN_I16;
						else sc_e_lf -= 0xff;
					}
				} else {
					size_t vecoff = (m128mod << 5) + m128div;
					sc_e_lf = ((int16_t*)(qlf + 0))[vecoff];
					sc_h_lf = ((int16_t*)(qlf + 2))[vecoff];
					if(local) {
						sc_h_lf += 0x8000; assert_geq(sc_h_lf, 0);
						sc_e_lf += 0x8000; assert_geq(sc_e_lf, 0);
					} else {
						if(sc_h_lf != MIN_I16) sc_h_lf -= 0x7fff;
						if(sc_e_lf != MIN_I16) sc_e_lf -= 0x7fff;
					}
				}
				DEBUG_CHECK(sc_e_lf, yi, xi-1, 0);
				DEBUG_CHECK(sc_h_lf, yi, xi-1, 2);
				sc_h_dg_lastrow = sc_h_lf;
			}
			
			if(allowGaps) {
				if(j == 0 /* at left edge */ && xi > 0 /* not extreme */) {
					sc_h_lf_c = sc_h_lf;
					sc_e_lf_c = sc_e_lf;
					if(sc_h_lf_c != MIN_I16) sc_h_lf_c -= sc_rdo;
					if(sc_e_lf_c != MIN_I16) sc_e_lf_c -= sc_rde;
					assert_leq(sc_h_lf_c, prob_.cper_->perf_);
					assert_leq(sc_e_lf_c, prob_.cper_->perf_);
				} else if(xi > 0) {
					// Get values for left neighbors from the previous iteration
					if(sc_h_lf_last != MIN_I16) {
						sc_h_lf = sc_h_lf_last;
						sc_h_lf_c = sc_h_lf - sc_rdo;
					}
					if(sc_e_lf_last != MIN_I16) {
						sc_e_lf = sc_e_lf_last;
						sc_e_lf_c = sc_e_lf - sc_rde;
					}
				}
				if(yi > 0 /* not extreme */) {
					// Get column values
					assert(qup != NULL);
					assert(local || qup[j].sc[2] < 0);
					if(qup[j].sc[0] > MIN_I16) {
						DEBUG_CHECK(qup[j].sc[0], yi-1, xi, 2);
						sc_h_up = qup[j].sc[0];
						sc_h_up_c = sc_h_up - sc_rfo;
					}
					if(qup[j].sc[2] > MIN_I16) {
						DEBUG_CHECK(qup[j].sc[2], yi-1, xi, 1);
						sc_f_up = qup[j].sc[2];
						sc_f_up_c = sc_f_up - sc_rfe;
					}
				}
				if(local) {
					sc_h_up_c = max<int16_t>(sc_h_up_c, 0);
					sc_f_up_c = max<int16_t>(sc_f_up_c, 0);
					sc_h_lf_c = max<int16_t>(sc_h_lf_c, 0);
					sc_e_lf_c = max<int16_t>(sc_e_lf_c, 0);
				}
			}
			
			if(sc_h_dg > MIN_I16) {
				sc_h_dg_c = sc_h_dg + sc_diag;
			}
			if(local) sc_h_dg_c = max<int16_t>(sc_h_dg_c, 0);
			
			int mask = 0;
			// Calculate best ways into H, E, F cells starting with H.
			// Mask bits:
			// H: 1=diag, 2=hhoriz, 4=ehoriz, 8=hvert, 16=fvert
			// E: 32=hhoriz, 64=ehoriz
			// F: 128=hvert, 256=fvert
			int16_t sc_best = sc_h_dg_c;
			if(sc_h_dg_c > MIN_I64) {
				mask = 1;
			}
			if(xi > 0 && sc_h_lf_c >= sc_best && sc_h_lf_c > MIN_I64) {
				if(sc_h_lf_c > sc_best) mask = 0;
				mask |= 2;
				sc_best = sc_h_lf_c;
			}
			if(xi > 0 && sc_e_lf_c >= sc_best && sc_e_lf_c > MIN_I64) {
				if(sc_e_lf_c > sc_best) mask = 0;
				mask |= 4;
				sc_best = sc_e_lf_c;
			}
			if(yi > 0 && sc_h_up_c >= sc_best && sc_h_up_c > MIN_I64) {
				if(sc_h_up_c > sc_best) mask = 0;
				mask |= 8;
				sc_best = sc_h_up_c;
			}
			if(yi > 0 && sc_f_up_c >= sc_best && sc_f_up_c > MIN_I64) {
				if(sc_f_up_c > sc_best) mask = 0;
				mask |= 16;
				sc_best = sc_f_up_c;
			}
			// Calculate best way into E cell
			int16_t sc_e_best = sc_h_lf_c;
			if(xi > 0) {
				if(sc_h_lf_c >= sc_e_lf_c && sc_h_lf_c > MIN_I64) {
					if(sc_h_lf_c == sc_e_lf_c) {
						mask |= 64;
					}
					mask |= 32;
				} else if(sc_e_lf_c > MIN_I64) {
					sc_e_best = sc_e_lf_c;
					mask |= 64;
				}
			}
			if(sc_e_best > sc_best) {
				sc_best = sc_e_best;
				mask &= ~31; // clear all five ways into H (bits 1, 2, 4, 8, 16)
			}
			// Calculate best way into F cell
			int16_t sc_f_best = sc_h_up_c;
			if(yi > 0) {
				if(sc_h_up_c >= sc_f_up_c && sc_h_up_c > MIN_I64) {
					if(sc_h_up_c == sc_f_up_c) {
						mask |= 256;
					}
					mask |= 128;
				} else if(sc_f_up_c > MIN_I64) {
					sc_f_best = sc_f_up_c;
					mask |= 256;
				}
			}
			if(sc_f_best > sc_best) {
				sc_best = sc_f_best;
				mask &= ~127; // clear all ways into H and E (bits 1 through 64)
			}
			// Install results in cur
			assert( local || sc_best <= 0);
			sq_[ii+j].sc[0] = sc_best;
			assert( local || sc_e_best < 0);
			assert( local || sc_f_best < 0);
			assert(!local || sc_e_best >= 0 || sc_e_best == MIN_I16);
			assert(!local || sc_f_best >= 0 || sc_f_best == MIN_I16);
			sq_[ii+j].sc[1] = sc_e_best;
			sq_[ii+j].sc[2] = sc_f_best;
			sq_[ii+j].sc[3] = mask;
			DEBUG_CHECK(sq_[ii+j].sc[0], yi, xi, 2); // H
			DEBUG_CHECK(sq_[ii+j].sc[1], yi, xi, 0); // E
			DEBUG_CHECK(sq_[ii+j].sc[2], yi, xi, 1); // F
			// Update sc_h_lf_last, sc_e_lf_last
			sc_h_lf_last = sc_best;
			sc_e_lf_last = sc_e_best;
		}
		// Update m128mod, m128div
		m128mod++;
		if(m128mod == prob_.cper_->niter_) {
			m128mod = 0;
			m128div++;
		}
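		// Advance ii to the start of the next row of the packed square
		ii += sq_ncol;
		// Point qup at the row of sq_ we just filled; it is the "up" row for
		// the next iteration
		qup = sq_.ptr() + sq_ncol * i;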
	}
	assert_eq(targ, sq_[ymod * sq_ncol + xmod].sc[hef]);
	//
	// Now backtrack through the square.  Abort as soon as we enter a cell
	// that was visited by a previous backtrace.
	//
	int64_t rowc = row, colc = col;
	size_t curid;
	int hefc = hef;
	if(bs_.empty()) {
		// Start an initial branch
		CHECK_ROW_COL(rowc, colc);
		curid = bs_.alloc();
		assert_eq(0, curid);
		Edit e;
		bs_[curid].init(
			prob_,
			0,      // parent ID
			0,      // penalty
			0,      // score_en
			rowc,   // row
			colc,   // col
			e,      // edit
			0,      // hef
			true,   // root?
			false); // don't try to extend with exact matches
		bs_[curid].len_ = 0;
	} else {
		curid = bs_.size()-1;
	}
	size_t ymodTimesNcol = ymod * sq_ncol;
	while(true) {
		// Locate the current cell within the packed square
		assert_eq(ymodTimesNcol, ymod * sq_ncol);
		CpQuad * cur = sq_.ptr() + ymodTimesNcol + xmod;
		int mask = cur->sc[3];
		assert_gt(mask, 0);
		int sel = -1;
		// Select what type of move to make, which depends on whether we're
		// currently in H, E, F:
		if(hefc == 0) {
			if(       (mask & 1) != 0) {
				// diagonal
				sel = 0;
			} else if((mask & 8) != 0) {
				// up to H
				sel = 3;
			} else if((mask & 16) != 0) {
				// up to F
				sel = 4;
			} else if((mask & 2) != 0) {
				// left to H
				sel = 1;
			} else if((mask & 4) != 0) {
				// left to E
				sel = 2;
			}
		} else if(hefc == 1) {
			if(       (mask & 32) != 0) {
				// left to H
				sel = 5;
			} else if((mask & 64) != 0) {
				// left to E
				sel = 6;
			}
		} else {
			assert_eq(2, hefc);
			if(       (mask & 128) != 0) {
				// up to H
				sel = 7;
			} else if((mask & 256) != 0) {
				// up to F
				sel = 8;
			}
		}
		assert_geq(sel, 0);
		// Get character from read
		int qc = prob_.qry_[rowc], qq = prob_.qual_[rowc];
		// Get character from reference
		int rc = prob_.ref_[colc];
		assert_range(0, 16, rc);
		bool xexit = false, yexit = false;
		// Now that we know what type of move to make, make it, updating our
		// row and column and updating the branch.
		if(sel == 0) {
			assert_geq(rowc, 0);
			assert_geq(colc, 0);
			TAlScore scd = prob_.sc_->score(qc, rc, qq - 33);
			if((rc & (1 << qc)) == 0) {
				// Mismatch
				size_t id = curid;
				// Check if the previous branch was the initial (bottommost)
				// branch with no matches.  If so, the mismatch should be added
				// to the initial branch, instead of starting a new branch.
				bool empty = (bs_[curid].len_ == 0 && curid == 0);
				if(!empty) {
					id = bs_.alloc();
				}
				Edit e((int)rowc, mask2dna[rc], "ACGTN"[qc], EDIT_TYPE_MM);
				assert_lt(scd, 0);
				TAlScore score_en = bs_[curid].score_st_ + scd;
				bs_[id].init(
					prob_,
					curid,    // parent ID
					-scd,     // penalty
					score_en, // score_en
					rowc,     // row
					colc,     // col
					e,        // edit
					hefc,     // hef
					empty,    // root?
					false);   // don't try to extend with exact matches
				curid = id;
				//assert(!local || bs_[curid].score_st_ >= 0);
			} else {
				// Match
				bs_[curid].score_st_ += prob_.sc_->match();
				bs_[curid].len_++;
				assert_leq((int64_t)bs_[curid].len_, bs_[curid].row_ + 1);
			}
			if(xmod == 0) xexit = true;
			if(ymod == 0) yexit = true;
			rowc--; ymod--; ymodTimesNcol -= sq_ncol;
			colc--; xmod--;
			assert(local || bs_[curid].score_st_ >= targ_final);
			hefc = 0;
		} else if((sel >= 1 && sel <= 2) || (sel >= 5 && sel <= 6)) {
			assert_gt(colc, 0);
			// Read gap
			size_t id = bs_.alloc();
			Edit e((int)rowc+1, mask2dna[rc], '-', EDIT_TYPE_READ_GAP);
			TAlScore gapp = prob_.sc_->readGapOpen();
			if(bs_[curid].len_ == 0 && bs_[curid].e_.inited() && bs_[curid].e_.isReadGap()) {
				gapp = prob_.sc_->readGapExtend();
			}
			//assert(!local || bs_[curid].score_st_ >= gapp);
			TAlScore score_en = bs_[curid].score_st_ - gapp;
			bs_[id].init(
				prob_,
				curid,    // parent ID
				gapp,     // penalty
				score_en, // score_en
				rowc,     // row
				colc-1,   // col
				e,        // edit
				hefc,     // hef
				false,    // root?
				false);   // don't try to extend with exact matches
			if(xmod == 0) xexit = true;
			colc--; xmod--;
			curid = id;
			assert( local || bs_[curid].score_st_ >= targ_final);
			//assert(!local || bs_[curid].score_st_ >= 0);
			if(sel == 1 || sel == 5) {
				hefc = 0;
			} else {
				hefc = 1;
			}
		} else {
			assert_gt(rowc, 0);
			// Reference gap
			size_t id = bs_.alloc();
			Edit e((int)rowc, '-', "ACGTN"[qc], EDIT_TYPE_REF_GAP);
			TAlScore gapp = prob_.sc_->refGapOpen();
			if(bs_[curid].len_ == 0 && bs_[curid].e_.inited() && bs_[curid].e_.isRefGap()) {
				gapp = prob_.sc_->refGapExtend();
			}
			//assert(!local || bs_[curid].score_st_ >= gapp);
			TAlScore score_en = bs_[curid].score_st_ - gapp;
			bs_[id].init(
				prob_,
				curid,    // parent ID
				gapp,     // penalty
				score_en, // score_en
				rowc-1,   // row
				colc,     // col
				e,        // edit
				hefc,     // hef
				false,    // root?
				false);   // don't try to extend with exact matches
			if(ymod == 0) yexit = true;
			rowc--; ymod--; ymodTimesNcol -= sq_ncol;
			curid = id;
			assert( local || bs_[curid].score_st_ >= targ_final);
			//assert(!local || bs_[curid].score_st_ >= 0);
			if(sel == 3 || sel == 7) {
				hefc = 0;
			} else {
				hefc = 2;
			}
		}
		CHECK_ROW_COL(rowc, colc);
		CpQuad * cur_new = NULL;
		if(!xexit && !yexit) {
			cur_new = sq_.ptr() + ymodTimesNcol + xmod;
		}
		// Check whether we moved past the top row or leftmost column, or (in
		// local mode) reached a cell with score 0
		if(colc < 0 || rowc < 0 ||
		   (cur_new != NULL && local && cur_new->sc[0] == 0))
		{
			done = true;
			assert(bs_[curid].isSolution(prob_));
			addSolution(curid);
#ifndef NDEBUG
			// A check to see if any two adjacent branches in the backtrace
			// overlap.  If they do, the whole alignment will be filtered out
			// in trySolution(...)
			size_t cur = curid;
			if(!bs_[cur].root_) {
				size_t next = bs_[cur].parentId_;
				while(!bs_[next].root_) {
					assert_neq(cur, next);
					if(bs_[next].len_ != 0 || bs_[cur].len_ == 0) {
						assert(!bs_[cur].overlap(prob_, bs_[next]));
					}
					cur = next;
					next = bs_[cur].parentId_;
				}
			}
#endif
			return;
		}
		assert(!xexit || hefc == 0 || hefc == 1);
		assert(!yexit || hefc == 0 || hefc == 2);
		if(xexit || yexit) {
			//assert(rowc < 0 || colc < 0 || prob_.cper_->isCheckpointed(rowc, colc));
			row_new = rowc; col_new = colc;
			hef_new = hefc;
			done = false;
			if(rowc < 0 || colc < 0) {
				assert(local);
				targ_new = 0;
			} else {
				// TODO: Don't use scoreSquare
				targ_new = prob_.cper_->scoreSquare(rowc, colc, hefc);
				assert(local || targ_new >= targ);
				assert(local || targ_new >= targ_final);
			}
			if(local && targ_new == 0) {
				assert_eq(0, hefc);
				done = true;
				assert(bs_[curid].isSolution(prob_));
				addSolution(curid);
			}
			assert((row_new >= 0 && col_new >= 0) || done);
			return;
		}
	}
	assert(false);
}
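
/*
 * Illustrative sketch, not part of the original source.  squareFill starts
 * by splitting the target cell's coordinates into "which checkpoint square"
 * and "where inside it", then fills only the sub-square that ends at the
 * target.  The helper below restates that arithmetic.  SquareLoc and
 * locateInSquare are hypothetical names, and the sketch assumes that
 * per == 1 << perpow2 and lomask == per - 1, which the accessor names suggest
 * but this file does not show.
 *
```cpp
#include <cassert>
#include <cstddef>

struct SquareLoc {
    std::size_t xmod, ymod;   // offset of the target inside its square
    std::size_t xdiv, ydiv;   // index of the square along each axis
    std::size_t ncol, nrow;   // dimensions of the sub-square to fill
    std::size_t xedge, yedge; // absolute coords of the square's upper-left cell
    std::size_t targetIdx;    // row-major index of the target in the packed square
};

inline SquareLoc locateInSquare(std::size_t row, std::size_t col,
                                std::size_t perpow2)
{
    const std::size_t per = static_cast<std::size_t>(1) << perpow2;
    const std::size_t lomask = per - 1; // assumption, see lead-in
    SquareLoc s;
    s.xmod = col & lomask;
    s.ymod = row & lomask;
    s.xdiv = col >> perpow2;
    s.ydiv = row >> perpow2;
    s.ncol = s.xmod + 1; // columns xedge..col inclusive
    s.nrow = s.ymod + 1; // rows yedge..row inclusive
    s.xedge = s.xdiv * per;
    s.yedge = s.ydiv * per;
    s.targetIdx = s.ymod * s.ncol + s.xmod; // last cell of the packed square
    return s;
}
```
 *
 * For example, with a period of 8 (perpow2 == 3), the cell at row 13,
 * column 21 lies in square (ydiv 1, xdiv 2) and needs a 6 x 6 sub-square.
 */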

/**
 * Caller gives us score_en, row and col.  We figure out score_st and len_
 * by comparing characters from the strings.
 *
 * If this branch comes after a mismatch, (row, col) describe the cell that the
 * mismatch occurs in.  len_ is initially set to 1, and the next cell we test
 * is the next cell up and to the left (row-1, col-1).
 *
 * If this branch comes after a read gap, (row, col) describe the leftmost cell
 * involved in the gap.  len_ is initially set to 0, and the next cell we test
 * is the current cell (row, col).
 *
 * If this branch comes after a reference gap, (row, col) describe the upper
 * cell involved in the gap.  len_ is initially set to 0, and the next cell we
 * test is the current cell (row, col).
 */
void BtBranch::init(
	const BtBranchProblem& prob,
	size_t parentId,
	TAlScore penalty,
	TAlScore score_en,
	int64_t row,
	int64_t col,
	Edit e,
	int hef,
	bool root,
	bool extend)
{
	score_en_ = score_en;
	penalty_ = penalty;
	score_st_ = score_en_;
	row_ = row;
	col_ = col;
	parentId_ = parentId;
	e_ = e;
	root_ = root;
	assert(!root_ || parentId == 0);
	assert_lt(row, (int64_t)prob.qrylen_);
	assert_lt(col, (int64_t)prob.reflen_);
	// For a mismatch, the first match to check is diagonally above and to
	// the left of the cell where the edit occurs; for a gap it is the cell
	// (row, col) itself
	int64_t rowc = row;
	int64_t colc = col;
	len_ = 0;
	if(e.inited() && e.isMismatch()) {
		rowc--; colc--;
		len_ = 1;
	}
	int64_t match = prob.sc_->match();
	bool cp = prob.usecp_;
	size_t iters = 0;
	curtailed_ = false;
	if(extend) {
		while(rowc >= 0 && colc >= 0) {
			int rfm = prob.ref_[colc];
			assert_range(0, 16, rfm);
			int rdc = prob.qry_[rowc];
			bool matches = (rfm & (1 << rdc)) != 0;
			if(!matches) {
				// What's the mismatch penalty?
				break;
			}
			// Credit this match; the checkpointer is consulted below
			score_st_ += match;
			if(cp && rowc - 1 >= 0 && colc - 1 >= 0 &&
			   prob.cper_->isCheckpointed(rowc - 1, colc - 1))
			{
				// Possibly prune
				int16_t cpsc;
				cpsc = prob.cper_->scoreTriangle(rowc - 1, colc - 1, hef);
				if(cpsc + score_st_ < prob.targ_) {
					curtailed_ = true;
					break;
				}
			}
			iters++;
			rowc--; colc--;
		}
	}
	assert_geq(rowc, -1);
	assert_geq(colc, -1);
	len_ = (int64_t)row - rowc;
	assert_leq((int64_t)len_, row_+1);
	assert_leq((int64_t)len_, col_+1);
	assert_leq((int64_t)score_st_, (int64_t)prob.qrylen_ * match);
}
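
// ---------------------------------------------------------------------------
// Illustrative sketch, not part of the centrifuge sources.  It isolates the
// diagonal walk that BtBranch::init performs when 'extend' is true and leaves
// out the checkpoint-based pruning.  Every name in namespace bt_sketch
// (DiagExtension, extendDiagonal) is hypothetical and exists only for this
// example.  Assumed encoding, mirroring the test in the loop above: each
// reference character is a bit mask with bit i set if nucleotide i is
// allowed, and each read character is a code in 0..3.
// ---------------------------------------------------------------------------
#include <cassert>
#include <cstdint>
#include <vector>

namespace bt_sketch {

struct DiagExtension {
	int64_t len;      // row - rowc, the value init would store in len_
	int64_t score_st; // score_en plus one match bonus per matching cell
};

// Walk up and to the left from (row, col) while read and reference agree.
// 'afterMismatch' plays the role of e.isMismatch(): the starting cell holds
// the mismatch itself, so the first comparison is at (row-1, col-1).
inline DiagExtension extendDiagonal(
	const std::vector<int>& refMask,
	const std::vector<int>& qry,
	int64_t row,
	int64_t col,
	int64_t score_en,
	int64_t matchBonus,
	bool afterMismatch)
{
	assert(row < (int64_t)qry.size());
	assert(col < (int64_t)refMask.size());
	int64_t rowc = row, colc = col;
	if(afterMismatch) { rowc--; colc--; }
	int64_t score = score_en;
	while(rowc >= 0 && colc >= 0) {
		bool matches = (refMask[colc] & (1 << qry[rowc])) != 0;
		if(!matches) break; // the branch ends at the first disagreement
		score += matchBonus;
		rowc--; colc--;
	}
	DiagExtension out = { row - rowc, score };
	return out;
}

} // namespace bt_sketch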

/**
 * Given a potential branch to add to the queue, see if we can follow the
 * branch a little further first.  If it's still valid, or if we reach a
 * choice between valid outgoing paths, go ahead and add it to the queue.
 */
void BtBranchTracer::examineBranch(
	int64_t row,
	int64_t col,
	const Edit& e,
	TAlScore pen,  // penalty associated with edit
	TAlScore sc,
	size_t parentId)
{
	size_t id = bs_.alloc();
	bs_[id].init(prob_, parentId, pen, sc, row, col, e, 0, false, true);
	if(bs_[id].isSolution(prob_)) {
		assert(bs_[id].isValid(prob_));
		addSolution(id);
	} else {
		// Check if this branch is legit
		if(bs_[id].isValid(prob_)) {
			add(id);
		} else {
			bs_.pop();
		}
	}
}

/**
 * Take all possible ways of leaving the given branch and add them to the
 * branch queue.
 */
void BtBranchTracer::addOffshoots(size_t bid) {
	BtBranch& b = bs_[bid];
	TAlScore sc = b.score_en_;
	int64_t match = prob_.sc_->match();
	int64_t scoreFloor = prob_.sc_->monotone ? MIN_I64 : 0;
	bool cp = prob_.usecp_; // Are there any checkpoints?
	ASSERT_ONLY(TAlScore perfectScore = prob_.sc_->perfectScore(prob_.qrylen_));
	assert_leq(prob_.targ_, perfectScore);
	// For each cell in the branch
	for(size_t i = 0 ; i <= b.len_; i++) {
		assert_leq((int64_t)i, b.row_+1);
		assert_leq((int64_t)i, b.col_+1);
		int64_t row = b.row_ - i, col = b.col_ - i;
		int64_t bonusLeft = (row + 1) * match;
		int64_t fromend = prob_.qrylen_ - row - 1;
		bool allowGaps = fromend >= prob_.sc_->gapbar && row >= prob_.sc_->gapbar;
		if(allowGaps && row >= 0 && col >= 0) {
			if(col > 0) {
				// Try a read gap - it's either an extension or an open
				bool extend = b.e_.inited() && b.e_.isReadGap() && i == 0;
				TAlScore rdgapPen = extend ?
					prob_.sc_->readGapExtend() : prob_.sc_->readGapOpen();
				bool prune = false;
				assert_gt(rdgapPen, 0);
				if(cp && prob_.cper_->isCheckpointed(row, col - 1)) {
					// Possibly prune
					int16_t cpsc = (int16_t)prob_.cper_->scoreTriangle(row, col - 1, 0);
					assert_leq(cpsc, perfectScore);
					assert_geq(prob_.sc_->readGapOpen(), prob_.sc_->readGapExtend());
					TAlScore bonus = prob_.sc_->readGapOpen() - prob_.sc_->readGapExtend();
					assert_geq(bonus, 0);
					if(cpsc + bonus + sc - rdgapPen < prob_.targ_) {
						prune = true;
					}
				}
				if(prune) {
					if(extend) { nrdexPrune_++; } else { nrdopPrune_++; }
				} else if(sc - rdgapPen >= scoreFloor && sc - rdgapPen + bonusLeft >= prob_.targ_) {
					// Yes, we can introduce a read gap here
					Edit e((int)row + 1, mask2dna[(int)prob_.ref_[col]], '-', EDIT_TYPE_READ_GAP);
					assert(e.isReadGap());
					examineBranch(row, col - 1, e, rdgapPen, sc - rdgapPen, bid);
					if(extend) { nrdex_++; } else { nrdop_++; }
				}
			}
			if(row > 0) {
				// Try a reference gap - it's either an extension or an open
				bool extend = b.e_.inited() && b.e_.isRefGap() && i == 0;
				TAlScore rfgapPen = (b.e_.inited() && b.e_.isRefGap()) ?
					prob_.sc_->refGapExtend() : prob_.sc_->refGapOpen();
				bool prune = false;
				assert_gt(rfgapPen, 0);
				if(cp && prob_.cper_->isCheckpointed(row - 1, col)) {
					// Possibly prune
					int16_t cpsc = (int16_t)prob_.cper_->scoreTriangle(row - 1, col, 0);
					assert_leq(cpsc, perfectScore);
					assert_geq(prob_.sc_->refGapOpen(), prob_.sc_->refGapExtend());
					TAlScore bonus = prob_.sc_->refGapOpen() - prob_.sc_->refGapExtend();
					assert_geq(bonus, 0);
					if(cpsc + bonus + sc - rfgapPen < prob_.targ_) {
						prune = true;
					}
				}
				if(prune) {
					if(extend) { nrfexPrune_++; } else { nrfopPrune_++; }
				} else if(sc - rfgapPen >= scoreFloor && sc - rfgapPen + bonusLeft >= prob_.targ_) {
					// Yes, we can introduce a ref gap here
					Edit e((int)row, '-', "ACGTN"[(int)prob_.qry_[row]], EDIT_TYPE_REF_GAP);
					assert(e.isRefGap());
					examineBranch(row - 1, col, e, rfgapPen, sc - rfgapPen, bid);
					if(extend) { nrfex_++; } else { nrfop_++; }
				}
			}
		}
		// If we're at the top of the branch but not yet at the top of
		// the DP table, a mismatch branch is also possible.
		if(i == b.len_ && !b.curtailed_ && row >= 0 && col >= 0) {
			int rfm = prob_.ref_[col];
			assert_lt(row, (int64_t)prob_.qrylen_);
			int rdc = prob_.qry_[row];
			int rdq = prob_.qual_[row];
			int scdiff = prob_.sc_->score(rdc, rfm, rdq - 33);
			assert_lt(scdiff, 0); // at end of branch, so can't match
			bool prune = false;
			if(cp && row > 0 && col > 0 && prob_.cper_->isCheckpointed(row - 1, col - 1)) {
				// Possibly prune
				int16_t cpsc = prob_.cper_->scoreTriangle(row - 1, col - 1, 0);
				assert_leq(cpsc, perfectScore);
				assert_leq(cpsc + scdiff + sc, perfectScore);
				if(cpsc + scdiff + sc < prob_.targ_) {
					prune = true;
				}
			}
			if(prune) {
				nmm_++;
			} else  {
				// Yes, we can introduce a mismatch here
				if(sc + scdiff >= scoreFloor && sc + scdiff + bonusLeft >= prob_.targ_) {
					Edit e((int)row, mask2dna[rfm], "ACGTN"[rdc], EDIT_TYPE_MM);
					bool nmm = (mask2dna[rfm] == 'N' || rdc > 4);
					assert_neq(e.chr, e.qchr);
					assert_lt(scdiff, 0);
					examineBranch(row - 1, col - 1, e, -scdiff, sc + scdiff, bid);
					if(nmm) { nnmm_++; } else { nmm_++; }
				}
			}
		}
		sc += match;
	}
}
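
// ---------------------------------------------------------------------------
// Illustrative sketch, not part of the centrifuge sources.  addOffshoots
// applies the same two-part test before it opens a read gap, a reference gap
// or a mismatch branch: the score after paying the penalty must stay at or
// above the score floor, and even if every remaining row matched, the total
// must still be able to reach the target.  The helper name canAffordEdit and
// its parameter names are hypothetical; 'row' is the read row of the cell
// being considered, as in the loop above.
// ---------------------------------------------------------------------------
#include <cassert>
#include <cstdint>

namespace bt_sketch {

inline bool canAffordEdit(
	int64_t sc,         // score accumulated so far along the branch
	int64_t pen,        // positive penalty of the candidate edit
	int64_t row,        // read row of the current cell
	int64_t match,      // bonus awarded per matching cell
	int64_t scoreFloor, // lowest score a partial alignment may have
	int64_t targ)       // score the finished alignment must reach
{
	assert(pen > 0);
	// Best case: every one of the row+1 rows still above us is a match
	int64_t bonusLeft = (row + 1) * match;
	return sc - pen >= scoreFloor && sc - pen + bonusLeft >= targ;
}

} // namespace bt_sketch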

/**
 * Sort unsorted branches, merge them with master sorted list.
 */
void BtBranchTracer::flushUnsorted() {
	if(unsorted_.empty()) {
		return;
	}
	unsorted_.sort();
	unsorted_.reverse();
#ifndef NDEBUG
	for(size_t i = 1; i < unsorted_.size(); i++) {
		assert_leq(bs_[unsorted_[i].second].score_st_, bs_[unsorted_[i-1].second].score_st_);
	}
#endif
	EList<size_t> *src2 = sortedSel_ ? &sorted1_ : &sorted2_;
	EList<size_t> *dest = sortedSel_ ? &sorted2_ : &sorted1_;
	// Merge unsorted_ (now sorted) and the unconsumed tail of src2 into dest
	dest->clear();
	size_t cur1 = 0, cur2 = cur_;
	while(cur1 < unsorted_.size() || cur2 < src2->size()) {
		// Take from 1 or 2 next?
		bool take1 = true;
		if(cur1 == unsorted_.size()) {
			take1 = false;
		} else if(cur2 == src2->size()) {
			take1 = true;
		} else {
			assert_neq(unsorted_[cur1].second, (*src2)[cur2]);
			take1 = bs_[unsorted_[cur1].second] < bs_[(*src2)[cur2]];
		}
		if(take1) {
			dest->push_back(unsorted_[cur1++].second); // Take from list 1
		} else {
			dest->push_back((*src2)[cur2++]); // Take from list 2
		}
	}
	assert_eq(cur1, unsorted_.size());
	assert_eq(cur2, src2->size());
	sortedSel_ = !sortedSel_;
	cur_ = 0;
	unsorted_.clear();
}
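
// ---------------------------------------------------------------------------
// Illustrative sketch, not part of the centrifuge sources.  It reproduces the
// two-way merge in flushUnsorted with standard containers: freshly sorted
// branch ids are merged with the not-yet-consumed tail of the previously
// sorted list.  The name mergeBestFirst is hypothetical.  Assumption: "better"
// means "higher score"; the real ordering is whatever BtBranch::operator<
// defines, which is not shown in this file.
// ---------------------------------------------------------------------------
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace bt_sketch {

inline std::vector<std::size_t> mergeBestFirst(
	const std::vector<std::size_t>& fresh, // ids sorted best-first
	const std::vector<std::size_t>& old,   // ids sorted best-first
	std::size_t oldStart,                  // first unconsumed index in 'old'
	const std::vector<int64_t>& score)     // score[id] for every id
{
	assert(oldStart <= old.size());
	std::vector<std::size_t> dest;
	std::size_t cur1 = 0, cur2 = oldStart;
	while(cur1 < fresh.size() || cur2 < old.size()) {
		bool take1;
		if(cur1 == fresh.size()) {
			take1 = false; // only the old list has elements left
		} else if(cur2 == old.size()) {
			take1 = true;  // only the fresh list has elements left
		} else {
			take1 = score[fresh[cur1]] > score[old[cur2]];
		}
		dest.push_back(take1 ? fresh[cur1++] : old[cur2++]);
	}
	return dest;
}

} // namespace bt_sketch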

/**
 * Try all the solutions accumulated so far.  Solutions might be rejected
 * if they, for instance, overlap a previous solution, have too many Ns,
 * fail to overlap a core diagonal, etc.
 */
bool BtBranchTracer::trySolutions(
	bool lookForOlap,
	SwResult& res,
	size_t& off,
	size_t& nrej,
	RandomSource& rnd,
	bool& success)
{
	if(solutions_.size() > 0) {
		for(size_t i = 0; i < solutions_.size(); i++) {
			int ret = trySolution(solutions_[i], lookForOlap, res, off, nrej, rnd);
			if(ret == BT_FOUND) {
				success = true;
				return true; // there were solutions and one was good
			}
		}
		solutions_.clear();
		success = false;
		return true; // there were solutions but none were good
	}
	return false; // there were no solutions to check
}

/**
 * Given the id of a branch that completes a successful backtrace, turn the
 * chain of branches into an alignment result, collecting its edits in 'res'.
 */
int BtBranchTracer::trySolution(
	size_t id,
	bool lookForOlap,
	SwResult& res,
	size_t& off,
	size_t& nrej,
	RandomSource& rnd)
{
#if 0
	AlnScore score;
	BtBranch *br = &bs_[id];
	// 'br' corresponds to the leftmost edit in a right-to-left
	// chain of edits.  
	EList<Edit>& ned = res.alres.ned();
	const BtBranch *cur = br, *prev = NULL;
	size_t ns = 0, nrefns = 0;
	size_t ngap = 0;
	while(true) {
		if(cur->e_.inited()) {
			if(cur->e_.isMismatch()) {
				if(cur->e_.qchr == 'N' || cur->e_.chr == 'N') {
					ns++;
				}
			} else if(cur->e_.isGap()) {
				ngap++;
			}
			if(cur->e_.chr == 'N') {
				nrefns++;
			}
			ned.push_back(cur->e_);
		}
		if(cur->root_) {
			break;
		}
		cur = &bs_[cur->parentId_];
	}
	if(ns > prob_.nceil_) {
		// Alignment has too many Ns in it!
		res.reset();
		assert(res.alres.ned().empty());
		nrej++;
		return BT_REJECTED_N;
	}
	// Update 'seenPaths_'
	cur = br;
	bool rejSeen = false; // set =true if we overlap prev path
	bool rejCore = true; // set =true if we don't touch core diag
	while(true) {
		// Consider row, col, len, then do something
		int64_t row = cur->row_, col = cur->col_;
		assert_lt(row, (int64_t)prob_.qrylen_);
		size_t fromend = prob_.qrylen_ - row - 1;
		size_t diag = fromend + col;
		// Calculate the diagonal within the *trimmed* rectangle,
		// i.e. the rectangle we dealt with in align, gather and
		// backtrack.
		int64_t diagi = col - row;
		// Now adjust to the diagonal within the *untrimmed*
		// rectangle by adding on the amount trimmed from the left.
		diagi += prob_.rect_->triml;
		assert_lt(diag, seenPaths_.size());
		// Does it overlap a core diagonal?
		if(diagi >= 0) {
			size_t diag = (size_t)diagi;
			if(diag >= prob_.rect_->corel &&
			   diag <= prob_.rect_->corer)
			{
				// Yes it does - it's OK
				rejCore = false;
			}
		}
		if(lookForOlap) {
			int64_t newlo, newhi;
			if(cur->len_ == 0) {
				if(prev != NULL && prev->len_ > 0) {
					// If there's a gap at the base of a non-0 length branch, the
					// gap will appear to overlap the branch if we give it length 1.
					newhi = newlo = 0;
				} else {
					// Read or ref gap with no matches coming off of it
					newlo = row;
					newhi = row + 1;
				}
			} else {
				// Diagonal with matches
				newlo = row - (cur->len_ - 1);
				newhi = row + 1;
			}
			assert_geq(newlo, 0);
			assert_geq(newhi, 0);
			// Does the diagonal cover cells?
			if(newhi > newlo) {
				// Check whether there is any overlap with previously traver

SYMBOL INDEX (1217 symbols across 102 files)

FILE: aligner_bt.cpp
  function main (line 1744) | int main(int argc, char **argv) {

FILE: aligner_bt.h
  function class (line 170) | class BtBranchProblem {
  function class (line 324) | class BtBranch {
  function reset (line 348) | void reset() {
  function isSolution (line 374) | bool isSolution(const BtBranchProblem& prob) const {
  function isValid (line 382) | bool isValid(const BtBranchProblem& prob) const {
  function overlap (line 406) | bool overlap(const BtBranchProblem& prob, const BtBranch& bt) const {
  function endsInFirstRow (line 461) | bool endsInFirstRow() const {
  function class (line 544) | class BtBranchTracer {

FILE: aligner_cache.cpp
  type option (line 37) | struct option
  function printUsage (line 42) | static void printUsage(ostream& os) {
  function add (line 51) | static void add(
  function aligner_cache_tests (line 64) | static void aligner_cache_tests() {
  function main (line 164) | int main(int argc, char **argv) {

FILE: aligner_cache.h
  type PListSlice (line 68) | typedef PListSlice<TIndexOffU, CACHE_PAGE_SZ> TSlice;
  function QKey (line 73) | struct QKey {
  function init (line 91) | bool init(
  function toString (line 122) | void toString(BTDnaString& s) {
  function reset (line 139) | void reset() { seq = 0; len = 0xffffffff; }
  function operator (line 144) | bool operator<(const QKey& o) const {
  function operator (line 151) | bool operator>(const QKey& o) const {
  function operator (line 158) | bool operator==(const QKey& o) const {
  function operator (line 166) | bool operator!=(const QKey& o) const {
  function index_t (line 206) | index_t numRanges() const {
  function empty (line 224) | bool empty() const {
  function reset (line 237) | void reset() {
  function init (line 244) | void init(index_t i, index_t ranges, index_t elts) {
  function addRange (line 251) | void addRange(index_t numElts) {
  type QKey (line 275) | typedef QKey SAKey;
  function valid (line 290) | bool valid() { return len != (index_t)OFF_MASK; }
  function init (line 303) | void init(
  function init (line 338) | void init(SAKey k, index_t tf, index_t tb, TSlice o) {
  function init (line 345) | void init(const SATuple& src, index_t first, index_t last) {
  function operator (line 370) | bool operator<(const SATuple& o) const {
  function operator (line 379) | bool operator>(const SATuple& o) const {
  function operator (line 389) | bool operator==(const SATuple& o) const {
  function reset (line 393) | void reset() { topf = topb = (index_t)OFF_MASK; offs.reset(); }
  function setLength (line 398) | void setLength(index_t nlen) {
  type RedBlackNode (line 435) | typedef RedBlackNode<QKey,  QVal<index_t> >  QNode;
  type RedBlackNode (line 436) | typedef RedBlackNode<SAKey, SAVal<index_t> > SANode;
  type PList (line 438) | typedef PList<SAKey, CACHE_PAGE_SZ> TQList;
  type PList (line 439) | typedef PList<index_t, CACHE_PAGE_SZ> TSAList;
  function empty (line 507) | bool empty() const {
  function MUTEX_T (line 600) | MUTEX_T* lockPtr() const {
  function nextRead (line 808) | void nextRead() {
  function clear (line 824) | void clear() {
  function AlignmentCache (line 875) | const AlignmentCache<index_t>* currentCache() const { return current_; }
  function AlignmentCache (line 898) | const AlignmentCache<index_t>& current() {

FILE: aligner_metrics.h
  function class (line 35) | class RunningStat {
  function class (line 91) | class AlignerMetrics {
  function nextRead (line 194) | void nextRead(const BTDnaString& read) {
  function setReadHasRange (line 217) | void setReadHasRange() {
  function finishRead (line 224) | void finishRead() {

FILE: aligner_result.h
  type TAlScore (line 34) | typedef int64_t TAlScore;
  function class (line 51) | class AlnScore {
  function reset (line 74) | void reset() {
  function AlnScore (line 81) | inline static AlnScore INVALID() {
  function invalidate (line 98) | inline void invalidate() {
  function o (line 107) | inline bool operator==(const AlnScore& o) const {
  function o (line 115) | inline bool operator!=(const AlnScore& o) const {
  function o (line 122) | inline bool operator>=(const AlnScore& o) const {
  function o (line 141) | inline bool operator<(const AlnScore& o) const {
  function o (line 148) | inline bool operator<=(const AlnScore& o) const {
  function o (line 155) | inline bool operator>(const AlnScore& o) const {
  function class (line 205) | class AlnRes {
  function reset (line 244) | void reset() {
  function setScore (line 257) | void setScore(TAlScore score) {
  function readPositions (line 270) | uint32_t> readPositions(size_t i) const { return readPositions_[i]; }
  function printSeq (line 280) | void printSeq(
  function printQuals (line 304) | void printQuals(
  function init (line 321) | void init(
  type TNumAlns (line 353) | typedef uint64_t TNumAlns;
  function class (line 360) | class AlnSetSumm {

FILE: aligner_seed.cpp
  function Constraint (line 30) | Constraint Constraint::exact() {
  function Constraint (line 40) | Constraint Constraint::penaltyBased(int pen) {
  function Constraint (line 50) | Constraint Constraint::penaltyFuncBased(const SimpleFunc& f) {
  function Constraint (line 60) | Constraint Constraint::mmBased(int mms) {
  function Constraint (line 71) | Constraint Constraint::editBased(int edits) {
  function parseInt (line 348) | static int parseInt(const char *errmsg, const char *arg) {
  type option (line 371) | struct option
  function printUsage (line 384) | static void printUsage(ostream& os) {
  function main (line 416) | int main(int argc, char **argv) {

FILE: aligner_seed.h
  function init (line 50) | struct Constraint {
  function mustMatch (line 68) | bool mustMatch() {
  function canMismatch (line 78) | bool canMismatch(int q, const Scoring& cm) {
  function canN (line 87) | bool canN(int q, const Scoring& cm) {
  function canMismatch (line 97) | bool canMismatch() {
  function canN (line 106) | bool canN() {
  function canDelete (line 115) | bool canDelete(int ex, const Scoring& cm) {
  function canDelete (line 124) | bool canDelete() {
  function canInsert (line 134) | bool canInsert(int ex, const Scoring& cm) {
  function canInsert (line 143) | bool canInsert() {
  function canGap (line 152) | bool canGap() {
  function chargeMismatch (line 160) | void chargeMismatch(int q, const Scoring& cm) {
  function chargeN (line 173) | void chargeN(int q, const Scoring& cm) {
  function chargeDelete (line 186) | void chargeDelete(int ex, const Scoring& cm) {
  function chargeInsert (line 199) | void chargeInsert(int ex, const Scoring& cm) {
  function acceptable (line 216) | bool acceptable() {
  function instantiate (line 229) | static int instantiate(size_t rdlen, const SimpleFunc& func) {
  function instantiate (line 236) | void instantiate(size_t rdlen) {
  type InstantiatedSeed (line 310) | struct InstantiatedSeed
  function Seed (line 321) | struct Seed {
  function init (line 339) | void init(int ln, int ty, Constraint* oc) {
  function acceptable (line 359) | bool acceptable() {
  function mmSeeds (line 386) | static void mmSeeds(
  type InstantiatedSeed (line 410) | struct InstantiatedSeed {
  function reset (line 486) | void reset() {
  function init (line 494) | void init(
  function ns (line 529) | int ns() const {
  function operator (line 564) | bool operator<(const EEHit& o) const {
  function repOk (line 577) | bool repOk(const Read& rd) const {
  function nextRead (line 638) | void nextRead(const Read& read) {
  function add (line 647) | void add(
  function reset (line 683) | void reset(
  function clearSeeds (line 716) | void clearSeeds() {
  function clear (line 743) | void clear() {
  function uniquenessFactor (line 791) | double uniquenessFactor() const {
  function index_t (line 843) | index_t idx2off(size_t off) const {
  function QVal (line 856) | const QVal<index_t>& hitsAtOffIdx(bool fw, size_t seedoffidx) const {
  function rankSeedHits (line 924) | void rankSeedHits(RandomSource& rnd) {
  function index_t (line 995) | index_t fewestEditsEE(bool fw, int seedlen, int per) const {
  function QVal (line 1032) | const QVal<index_t>& hitsByRank(
  function BTDnaString (line 1061) | const BTDnaString& seqByRank(index_t r) {
  function BTString (line 1070) | const BTString& qualByRank(index_t r) {
  function sort1mmEe (line 1107) | void sort1mmEe(RandomSource& rnd) {
  function add1mmEe (line 1132) | void add1mmEe(
  function addExactEeFw (line 1148) | void addExactEeFw(
  function addExactEeRc (line 1162) | void addExactEeRc(
  function clearExactE2eHits (line 1176) | void clearExactE2eHits() {
  function clear1mmE2eHits (line 1184) | void clear1mmE2eHits() {
  type SeedSearchMetrics (line 1276) | struct SeedSearchMetrics {
  function reset (line 1305) | void reset() {

FILE: aligner_seed_policy.cpp
  function parseFuncType (line 30) | static int parseFuncType(const std::string& otype) {
  function main (line 648) | int main() {

FILE: aligner_seed_policy.h
  function class (line 39) | class SeedAlignmentPolicy {

FILE: aligner_sw.cpp
  type option (line 937) | struct option
  function printUsage (line 945) | static void printUsage(ostream& os) {
  function T (line 958) | T parse(const char *s) {
  function doTestCase (line 979) | static void doTestCase(
  function doTestCase2 (line 1097) | static void doTestCase2(
  function doTestCase3 (line 1142) | static void doTestCase3(
  function doTestCase4 (line 1194) | static void doTestCase4(
  function doTests (line 1246) | static void doTests() {
  function doLocalTests (line 2505) | static void doLocalTests() {
  function main (line 2840) | int main(int argc, char **argv) {

FILE: aligner_sw.h
  function class (line 192) | class SwAligner {
  function backtrace (line 544) | bool backtrace(

FILE: aligner_sw_common.h
  type SwResult (line 40) | struct SwResult {

FILE: aligner_sw_nuc.h
  type DpNucFrame (line 40) | struct DpNucFrame {
  function DpBtCandidate (line 93) | struct DpBtCandidate {
  function reset (line 101) | void reset() { init(0, 0, 0); }
  function init (line 103) | void init(size_t row_, size_t col_, TAlScore score_) {
  function dominatedBy (line 119) | inline bool dominatedBy(const DpBtCandidate& o) {
  function repOk (line 174) | bool repOk() const {

FILE: aligner_swsse.h
  function clear (line 32) | struct SSEMetrics {
  function reset (line 37) | void reset() {
  type SSEMatrix (line 99) | struct SSEMatrix {
  function __m128i (line 134) | inline __m128i* evecUnsafe(size_t row, size_t col) {
  function __m128i (line 146) | inline __m128i* fvec(size_t row, size_t col) {
  function __m128i (line 158) | inline __m128i* hvec(size_t row, size_t col) {
  function __m128i (line 170) | inline __m128i* tmpvec(size_t row, size_t col) {
  function __m128i (line 182) | inline __m128i* tmpvecUnsafe(size_t row, size_t col) {
  function elt (line 222) | inline int elt(size_t row, size_t col, size_t mat) const {
  function eelt (line 243) | inline int eelt(size_t row, size_t col) const {
  function felt (line 250) | inline int felt(size_t row, size_t col) const {
  function helt (line 257) | inline int helt(size_t row, size_t col) const {
  function reportedThrough (line 264) | inline bool reportedThrough(
  function setReportedThrough (line 274) | inline void setReportedThrough(
  function resetRow (line 381) | void resetRow(size_t i) {
  type SSEData (line 408) | struct SSEData {
  function isHMaskSet (line 425) | inline bool SSEMatrix::isHMaskSet(
  function hMaskSet (line 437) | inline void SSEMatrix::hMaskSet(
  function isEMaskSet (line 450) | inline bool SSEMatrix::isEMaskSet(
  function eMaskSet (line 462) | inline void SSEMatrix::eMaskSet(
  function isFMaskSet (line 475) | inline bool SSEMatrix::isFMaskSet(
  function fMaskSet (line 487) | inline void SSEMatrix::fMaskSet(

FILE: aligner_swsse_ee_i16.cpp
  function cellOkEnd2EndI16 (line 154) | static bool cellOkEnd2EndI16(
  function TAlScore (line 297) | TAlScore SwAligner::alignGatherEE16(int& flag, bool debug) {
  function TAlScore (line 793) | TAlScore SwAligner::alignNucleotidesEnd2EndSseI16(int& flag, bool debug) {

FILE: aligner_swsse_ee_u8.cpp
  function cellOkEnd2EndU8 (line 151) | static bool cellOkEnd2EndU8(
  function TAlScore (line 298) | TAlScore SwAligner::alignGatherEE8(int& flag, bool debug) {
  function TAlScore (line 791) | TAlScore SwAligner::alignNucleotidesEnd2EndSseU8(int& flag, bool debug) {

FILE: aligner_swsse_loc_i16.cpp
  function cellOkLocalI16 (line 152) | static bool cellOkLocalI16(
  function TAlScore (line 295) | TAlScore SwAligner::alignGatherLoc16(int& flag, bool debug) {
  function TAlScore (line 967) | TAlScore SwAligner::alignNucleotidesLocalSseI16(int& flag, bool debug) {

FILE: aligner_swsse_loc_u8.cpp
  function cellOkLocalU8 (line 166) | static bool cellOkLocalU8(
  function TAlScore (line 307) | TAlScore SwAligner::alignGatherLoc8(int& flag, bool debug) {
  function TAlScore (line 973) | TAlScore SwAligner::alignNucleotidesLocalSseU8(int& flag, bool debug) {

FILE: aln_sink.h
  type ReadCounts (line 45) | struct ReadCounts {
  type SpeciesMetrics (line 56) | struct SpeciesMetrics {
  function reset (line 84) | void reset() {
  function init (line 93) | void init(
  function addSpeciesCounts (line 142) | void addSpeciesCounts(
  function addAllKmers (line 174) | void addAllKmers(
  function nDistinctKmers (line 192) | size_t nDistinctKmers(uint64_t taxID) {
  function EM (line 196) | static void EM(
  function calculateAbundance (line 274) | void calculateAbundance(const Ebwt<uint64_t>& ebwt, uint8_t rank)
  function reset (line 515) | struct ReportingMetrics {
  function init (line 525) | void init(
  type THitInt (line 567) | typedef int64_t THitInt;
  function init (line 573) | struct ReportingParams {
  function class (line 624) | class ReportingState {
  type EList (line 762) | typedef EList<std::string> StrList;
  function finish (line 880) | void finish(
  function ReportingParams (line 1213) | const ReportingParams& reportingParams() { return rp_;}
  function getPair (line 1287) | void getPair(const EList<AlnRes>*& rs) const { rs = &rs_; }
  type EList (line 1367) | typedef EList<std::string> StrList;
  function virtual (line 1384) | virtual ~AlnSinkSam() { }
  function std (line 1436) | static inline std::ostream& printPct(
  function AlnSetSumm (line 1696) | AlnSetSumm concordSumm(rd1_, rd2_, &rs_);
  function printUptoWs (line 2070) | inline void printUptoWs(
  function appendReadID (line 2202) | inline
  function appendSeqID (line 2219) | inline
  function appendTaxID (line 2236) | inline
  type FIELD_DEF (line 2253) | enum FIELD_DEF {
  function nextRead (line 2369) | void ReportingState::nextRead(bool paired) {
  function foundConcordant (line 2382) | bool ReportingState::foundConcordant() {
  function foundUnpaired (line 2400) | bool ReportingState::foundUnpaired(bool mate1) {
  function finish (line 2410) | void ReportingState::finish() {
  function getReport (line 2442) | void ReportingState::getReport(uint64_t& nconcordAln) const // # concord...
  function areDone (line 2466) | inline void ReportingState::areDone(

FILE: alphabet.cpp
  function setIupacsCat (line 280) | void setIupacsCat(uint8_t cat) {

FILE: alphabet.h
  function isDna (line 89) | static inline bool isDna(char c) {
  function isColor (line 96) | static inline bool isColor(char c) {
  function isAmbigNuc (line 103) | static inline bool isAmbigNuc(char c) {
  function isAmbigColor (line 110) | static inline bool isAmbigColor(char c) {
  function isAmbig (line 117) | static inline bool isAmbig(char c, bool color) {
  function isUnambigNuc (line 124) | static inline bool isUnambigNuc(char c) {
  function comp (line 131) | static inline char comp(char c) {
  function compDna (line 148) | static inline int compDna(int c) {
  function isUnambigDna (line 156) | static inline bool isUnambigDna(char c) {
  function isUnambigColor (line 163) | static inline bool isUnambigColor(char c) {
  function decodeNuc (line 173) | static inline void decodeNuc(char c , int& num, int *alts) {

FILE: assert_helpers.h
  function class (line 31) | class ReleaseAssertException : public std::runtime_error {
  function assert_in2 (line 245) | static inline void assert_in2(char c, const char *str, const char *file,...
  function assert_range_helper (line 262) | static void assert_range_helper(const T& begin,

FILE: binary_sa_search.h
  function TIndexOffU (line 47) | TIndexOffU binarySASearch(

FILE: bitpack.h
  function pack_2b_in_8b (line 31) | static inline void pack_2b_in_8b(const int two, uint8_t& eight, const in...
  function unpack_2b_from_8b (line 37) | static inline int unpack_2b_from_8b(const uint8_t eight, const int off) {
  function pack_2b_in_32b (line 42) | static inline void pack_2b_in_32b(const int two, uint32_t& thirty2, cons...
  function unpack_2b_from_32b (line 48) | static inline int unpack_2b_from_32b(const uint32_t thirty2, const int o...

FILE: blockwise_sa.h
  function hasMoreSuffixes (line 99) | bool hasMoreSuffixes() {
  function resetSuffixItr (line 114) | void resetSuffixItr() {
  function suffixItrIsReset (line 127) | bool suffixItrIsReset() {
  function hasMoreBlocks (line 154) | virtual bool hasMoreBlocks() const = 0;
  function simulateAllocs (line 233) | static size_t simulateAllocs(const TStr& text, TIndexOffU bucketSz) {
  function nextBlock_Worker (line 242) | static void nextBlock_Worker(void *vp) {
  function nextSuffix (line 277) | virtual TIndexOffU nextSuffix() {
  function isReset (line 369) | virtual bool isReset() {
  function qsort (line 472) | inline void KarkkainenBlockwiseSA<S2bDnaString>::qsort(
  function BinarySorting_worker (line 506) | void BinarySorting_worker(void *vp)
  function suffixLcp (line 730) | static TIndexOffU suffixLcp(const T& t, TIndexOffU aOff, TIndexOffU bOff) {
  function lookupSuffixZ (line 787) | TIndexOffU lookupSuffixZ(

FILE: bt2_idx.cpp
  function adjustEbwtBase (line 38) | string adjustEbwtBase(const string& cmdline,

FILE: bt2_idx.h
  type EBWT_FLAGS (line 99) | enum EBWT_FLAGS {
  function init (line 133) | void init(
  function setOffRate (line 200) | void setOffRate(int __offRate) {
  function print (line 225) | void print(ostream& out) const {
  class EbwtFileOpenException (line 283) | class EbwtFileOpenException : public std::runtime_error {
  function fileSize (line 292) | static inline int64_t fileSize(const char* name) {
  function initFromTopBot (line 326) | static void initFromTopBot(
  function initFromRow (line 355) | void initFromRow(index_t row, const EbwtParams<index_t>& ep, const uint8...
  function nextSide (line 376) | void nextSide(const EbwtParams<index_t>& ep) {
  function repOk (line 404) | bool repOk(const EbwtParams<index_t>& ep) const {
  function invalidate (line 414) | void invalidate() {
  type USE_POPCNT_GENERIC (line 471) | struct USE_POPCNT_GENERIC {
  function countInU64 (line 505) | inline static int countInU64(int c, uint64_t dw) {
  function isPacked (line 1193) | bool isPacked() { return packed_; }
  function joinedLen (line 1614) | index_t joinedLen(EList<RefRecord>& szs) {
  function fchr (line 1652) | inline index_t*   fchr()              { return _fchr.get(); }
  function ftab (line 1653) | inline index_t*   ftab()              { return _ftab.get(); }
  function eftab (line 1654) | inline index_t*   eftab()             { return _eftab.get(); }
  function plen (line 1657) | inline index_t*   plen()              { return _plen.get(); }
  function rstarts (line 1658) | inline index_t*   rstarts()           { return _rstarts.get(); }
  function fchr (line 1660) | inline const index_t* fchr() const    { return _fchr.get(); }
  function ftab (line 1661) | inline const index_t* ftab() const    { return _ftab.get(); }
  function eftab (line 1662) | inline const index_t* eftab() const   { return _eftab.get(); }
  function plen (line 1665) | inline const index_t* plen() const    { return _plen.get(); }
  function rstarts (line 1666) | inline const index_t* rstarts() const { return _rstarts.get(); }
  function saGenomeBoundaryHas (line 1679) | inline const bool 	    saGenomeBoundaryHas( uint64_t key ) const { retur...
  function saGenomeBoundaryVal (line 1680) | inline const uint32_t saGenomeBoundaryVal( uint64_t key ) const { return...
  function loadIntoMemory (line 1748) | void loadIntoMemory(
  function evictFromMemory (line 1773) | void evictFromMemory() {
  function ftabSeqToInt (line 1794) | index_t ftabSeqToInt(
  function ftabHi (line 1823) | index_t ftabHi(index_t i) const {
  function ftabHi (line 1842) | static index_t ftabHi(
  function ftabLo (line 1863) | index_t ftabLo(index_t i) const {
  function ftabLo (line 1876) | index_t ftabLo(const BTDnaString& seq, index_t off) const {
  function ftabHi (line 1883) | index_t ftabHi(const BTDnaString& seq, index_t off) const {
  function ftabLoHi (line 1894) | bool
  function ftabLo (line 1921) | static index_t ftabLo(
  function tryOffset (line 1944) | index_t tryOffset(index_t elt) const {
  function tryOffset (line 1985) | index_t tryOffset(
  function postReadInit (line 2028) | void postReadInit(EbwtParams<index_t>& eh) {
  function print (line 2048) | void print(ostream& out) const {
  function print (line 2056) | void print(ostream& out, const EbwtParams<index_t>& eh) const {
  function countBt2Side (line 2156) | inline index_t countBt2Side(const SideLocus<index_t>& l, int c) const {
  function countBt2SideRange (line 2202) | inline void countBt2SideRange(
  function countBt2SideEx (line 2285) | inline void countBt2SideEx(const SideLocus<index_t>& l, index_t* arrs) c...
  function countUpTo (line 2328) | inline index_t countUpTo(const SideLocus<index_t>& l, int c) const {
  function countDownTo (line 2396) | inline index_t countDownTo(const SideLocus<index_t>& l, int c) const {
  function countInU64Ex (line 2441) | inline static void countInU64Ex(uint64_t dw, index_t* arrs) {
  function countUpToEx (line 2496) | inline void countUpToEx(const SideLocus<index_t>& l, index_t* arrs) const {
  function countBt2SideRange2 (line 2631) | inline index_t countBt2SideRange2(
  function rowL (line 2701) | inline int rowL(const SideLocus<index_t>& l) const {
  function rowL (line 2711) | inline int rowL(index_t i) const {
  function inMemoryRepOk (line 2932) | bool inMemoryRepOk(const EbwtParams<index_t>& eh) const {
  function repOk (line 2949) | bool repOk(const EbwtParams<index_t>& eh) const {
  function get_uid (line 2963) | string get_uid(const string& header) {
  function get_tid (line 2975) | uint64_t get_tid(const string& stid) {
  function verbose (line 3074) | void verbose(const string& s) const {
  function is_read_err (line 3821) | inline bool is_read_err(int fdesc, ssize_t ret, size_t count) {
  function is_fread_err (line 3832) | inline bool is_fread_err(FILE* file_hd, size_t ret, size_t count) {

FILE: bt2_io.h
  type stat (line 90) | struct stat
  function else (line 120) | else if(_useMm && !justHeader) {
  function readEbwtColor (line 821) | bool
  function readEntireReverse (line 835) | bool

FILE: btypes.h
  type TIndexOffU (line 30) | typedef uint64_t TIndexOffU;
  type TIndexOff (line 31) | typedef int64_t TIndexOff;
  type TIndexOffU (line 39) | typedef uint32_t TIndexOffU;
  type TIndexOff (line 40) | typedef int TIndexOff;

FILE: ccnt_lut.cpp
  function countCnt (line 27) | int countCnt(int by, int c, uint8_t str) {
  function countCnt_rev (line 39) | int countCnt_rev(int by, int c, uint8_t str) {
  function initializeCntLut (line 51) | void initializeCntLut() {

FILE: centrifuge.cpp
  function parse_col_fmt (line 268) | static void parse_col_fmt(const string arg, EList<string>& tab_fmt_cols_...
  function resetOptions (line 285) | static void resetOptions() {
  type option (line 532) | struct option
  function printArgDesc (line 701) | static void printArgDesc(ostream& out) {
  function printUsage (line 737) | static void printUsage(ostream& out) {
  function parseInt (line 869) | static int parseInt(int lower, int upper, const char *errmsg, const char...
  function parseInt (line 890) | static int parseInt(int lower, const char *errmsg, const char *arg) {
  function parse (line 898) | T parse(const char *s) {
  function parsePair (line 909) | pair<T, T> parsePair(const char *str, char delim) {
  function parseTuple (line 923) | void parseTuple(const char *str, char delim, EList<T>& ret) {
  function applyPreset (line 932) | static string applyPreset(const string& sorig, Presets& presets) {
  function parseOption (line 959) | static void parseOption(int next_option, const char *arg) {
  function parseOptions (line 1495) | static void parseOptions(int argc, const char **argv) {
  function PatternSourcePerThreadFactory (line 1665) | static PatternSourcePerThreadFactory*
  type OuterLoopMetrics (line 1690) | struct OuterLoopMetrics {
    method OuterLoopMetrics (line 1692) | OuterLoopMetrics() {
    method reset (line 1699) | void reset() {
    method merge (line 1709) | void merge(
  type PerfMetrics (line 1739) | struct PerfMetrics {
    method PerfMetrics (line 1741) | PerfMetrics() : first(true) { reset(); }
    method reset (line 1746) | void reset() {
    method merge (line 1769) | void merge(
    method reportInterval (line 1807) | void reportInterval(
    method mergeIncrementals (line 2184) | void mergeIncrementals() {
  function printMmsSkipMsg (line 2232) | static inline void printMmsSkipMsg(
  function printLenSkipMsg (line 2252) | static inline void printLenSkipMsg(
  function printLocalScoreMsg (line 2269) | static inline void printLocalScoreMsg(
  function printEEScoreMsg (line 2288) | static inline void printEEScoreMsg(
  function multiseedSearchWorker (line 2342) | static void multiseedSearchWorker(void *vp) {
  function multiseedSearch (line 2762) | static void multiseedSearch(
  function driver (line 2826) | static void driver(
  function centrifuge (line 3345) | int centrifuge(int argc, const char **argv) {

FILE: centrifuge_build.cpp
  function resetOptions (line 79) | static void resetOptions() {
  function printUsage (line 143) | static void printUsage(ostream& out) {
  type option (line 193) | struct option
  function parseNumber (line 238) | static T parseNumber(T lower, const char *errmsg) {
  function parseOptions (line 258) | static void parseOptions(int argc, const char **argv) {
  function deleteIdxFiles (line 379) | static void deleteIdxFiles(
  function driver (line 399) | static void driver(
  function centrifuge_build (line 554) | int centrifuge_build(int argc, const char **argv) {

FILE: centrifuge_build_main.cpp
  function main (line 43) | int main(int argc, const char **argv) {

FILE: centrifuge_compress.cpp
  function resetOptions (line 77) | static void resetOptions() {
  function printUsage (line 134) | static void printUsage(ostream& out) {
  type option (line 187) | struct option
  function parseNumber (line 229) | static int parseNumber(T lower, const char *errmsg) {
  function parseOptions (line 249) | static void parseOptions(int argc, const char **argv) {
  function print_fasta_record (line 353) | static void print_fasta_record(
  type RegionSimilar (line 392) | struct RegionSimilar {
    method reset (line 400) | void reset() {
  type Region (line 412) | struct Region {
    method match_size (line 421) | uint32_t match_size() {
    method reset (line 426) | void reset() {
  type RegionToMerge (line 436) | struct RegionToMerge {
    method reset (line 440) | void reset() {
  function driver (line 450) | static void driver(
  function centrifuge_compress (line 1271) | int centrifuge_compress(int argc, const char **argv) {
  function main (line 1406) | int main(int argc, const char **argv) {

FILE: centrifuge_inspect.cpp
  type option (line 62) | struct option
  function printUsage (line 83) | static void printUsage(ostream& out) {
  function parseInt (line 124) | static int parseInt(int lower, const char *errmsg) {
  function parseOptions (line 145) | static void parseOptions(int argc, char **argv) {
  function print_fasta_record (line 191) | static void print_fasta_record(
  function count_idx_kmers (line 218) | static uint64_t count_idx_kmers ( Ebwt<index_t>& ebwt)
  function print_ref_sequence (line 301) | static void print_ref_sequence(
  function print_ref_sequences (line 334) | static void print_ref_sequences(
  function print_index_sequences (line 369) | static void print_index_sequences(ostream& fout, Ebwt<index_t>& ebwt)
  function print_index_sequence_names (line 434) | static void print_index_sequence_names(const string& fname, ostream& fout)
  function print_index_summary (line 447) | static void print_index_summary(
  function driver (line 487) | static void driver(
  function main (line 609) | int main(int argc, char **argv) {

FILE: centrifuge_main.cpp
  function main (line 42) | int main(int argc, const char **argv) {

FILE: centrifuge_report.cpp
  function printUsage (line 36) | static void printUsage(ostream& out) {
  class Pair2ndComparator (line 48) | class Pair2ndComparator{
  function driver (line 57) | static void driver(
  function main (line 167) | int main(int argc, const char **argv) {

FILE: classifier.h
  function reset (line 47) | void reset() {
  function finalize (line 86) | void finalize(
  function virtual (line 211) | virtual
  function getGenomeIdx (line 573) | bool getGenomeIdx(
  function reportUnclassified (line 619) | void reportUnclassified( AlnSinkWrap<index_t>& sink )
  function searchForwardAndReverse (line 646) | void searchForwardAndReverse(
  function addHitToHitMap (line 982) | size_t addHitToHitMap(
  type compareBWTHits (line 1058) | struct compareBWTHits {

FILE: diff_sample.cpp
  type sampleEntry (line 22) | struct sampleEntry

FILE: diff_sample.h
  type sampleEntry (line 61) | struct sampleEntry {
  type sampleEntry (line 69) | struct sampleEntry
  function dcRepOk (line 77) | bool dcRepOk(T v, EList<T>& ds) {
  function increasing (line 109) | bool increasing(T* ts, size_t limit) {
  function hasDifference (line 121) | inline bool hasDifference(T *ds, T d, T v, T diff) {
  function popCount (line 414) | unsigned int popCount(T i) {
  function myLog2 (line 427) | unsigned int myLog2(T i) {
  function simulateAllocs (line 480) | static size_t simulateAllocs(const TStr& text, uint32_t v) {
  function modv (line 493) | uint32_t modv(TIndexOffU i) const    { return (uint32_t)(i & ~_vmask); }
  function divv (line 494) | TIndexOffU divv(TIndexOffU i) const  { return i >> _log2v; }
  function print (line 513) | void print(ostream& out) {
  function verbose (line 536) | void verbose(const string& s) const {
  function suffixSameUpTo (line 661) | inline bool suffixSameUpTo(
  function VSorting_worker (line 690) | void VSorting_worker(void *vp)

FILE: dp_framer.cpp
  function testCaseFindMateAnchorLeft (line 364) | static void testCaseFindMateAnchorLeft(
  function testCaseFindMateAnchorRight (line 424) | static void testCaseFindMateAnchorRight(
  function main (line 483) | int main(void) {

FILE: dp_framer.h
  type DPRect (line 59) | struct DPRect {
  function entirelyTrimmed (line 94) | bool entirelyTrimmed() const {
  function initIval (line 112) | void initIval(Interval& iv) {
  class DynProgFramer (line 122) | class DynProgFramer {

FILE: ds.cpp
  function main (line 57) | int main(void) {

FILE: ds.h
  class MemoryTally (line 37) | class MemoryTally {
  function xfer (line 394) | void xfer(EList<T, S>& o) {
  function ensure (line 440) | inline void ensure(size_t thresh) {
  function reserveExact (line 450) | inline void reserveExact(size_t newsz) {
  function push_back (line 469) | void push_back(const T& el) {
  function expand (line 478) | void expand() {
  function fill (line 487) | void fill(size_t begin, size_t end, const T& v) {
  function fill (line 498) | void fill(const T& v) {
  function fillZero (line 507) | void fillZero(size_t begin, size_t end) {
  function fillZero (line 515) | void fillZero() {
  function resizeNoCopy (line 523) | void resizeNoCopy(size_t sz) {
  function resize (line 537) | void resize(size_t sz) {
  function resizeExact (line 553) | void resizeExact(size_t sz) {
  function erase (line 566) | void erase(size_t idx) {
  function erase (line 577) | void erase(size_t idx, size_t len) {
  function insert (line 592) | void insert(const T& el, size_t idx) {
  function insert (line 606) | void insert(const EList<T>& l, size_t idx) {
  function pop_back (line 623) | void pop_back() {
  function clear (line 631) | void clear() {
  function back (line 639) | inline T& back() {
  function reverse (line 647) | void reverse() {
  function back (line 661) | inline const T& back() const {
  function isSuperset (line 699) | bool isSuperset(const EList<T, S>& o) const {
  function operator[] (line 724) | inline T& operator[](size_t i) {
  function operator[] (line 732) | inline const T& operator[](size_t i) const {
  function get (line 740) | inline T& get(size_t i) {
  function get (line 747) | inline const T& get(size_t i) const {
  function getSlow (line 763) | const T& getSlow(size_t i) const {
  function sortPortion (line 770) | void sortPortion(size_t begin, size_t num) {
  function shufflePortion (line 784) | void shufflePortion(size_t begin, size_t num, RandomSource& rnd) {
  function sort (line 800) | void sort() {
  function remove (line 825) | void remove(size_t idx) {
  function ptr (line 837) | T *ptr() { return list_; }
  function ptr (line 842) | const T *ptr() const { return list_; }
  function setCat (line 847) | void setCat(int cat) {
  function bsearchLoBound (line 863) | size_t bsearchLoBound(const T& el) const {
  function lazyInitExact (line 896) | void lazyInitExact(size_t sz) {
  function alloc (line 907) | T *alloc(size_t sz) {
  function free (line 919) | void free() {
  function expandCopy (line 935) | void expandCopy(size_t thresh) {
  function expandCopyExact (line 946) | void expandCopyExact(size_t newsz) {
  function expandNoCopy (line 968) | void expandNoCopy(size_t thresh) {
  function expandNoCopyExact (line 980) | void expandNoCopyExact(size_t newsz) {
  function xfer (line 1096) | void xfer(ELList<T, S1, S2>& o) {
  function expand (line 1123) | void expand() {
  function resize (line 1133) | void resize(size_t sz) {
  function clear (line 1148) | void clear() {
  function setCat (line 1236) | void setCat(int cat) {
  function free (line 1283) | void free() {
  function expandCopy (line 1296) | void expandCopy(size_t thresh) {
  function expandNoCopy (line 1318) | void expandNoCopy(size_t thresh) {
  function xfer (line 1434) | void xfer(ELLList<T, S1, S2, S3>& o) {
  function expand (line 1461) | void expand() {
  function resize (line 1471) | void resize(size_t sz) {
  function clear (line 1484) | void clear() {
  function setCat (line 1572) | void setCat(int cat) {
  function free (line 1619) | void free() {
  function expandCopy (line 1632) | void expandCopy(size_t thresh) {
  function expandNoCopy (line 1654) | void expandNoCopy(size_t thresh) {
  function insert (line 1776) | bool insert(const T& el) {
  function contains (line 1797) | bool contains(const T& el) const {
  function remove (line 1818) | void remove(const T& el) {
  function resize (line 1835) | void resize(size_t sz) {
  function clear (line 1843) | void clear() { cur_ = 0; }
  function setCat (line 1853) | void setCat(int cat) {
  function xfer (line 1862) | void xfer(ESet<T>& o) {
  function ptr (line 1877) | T *ptr() { return list_; }
  function ptr (line 1882) | const T *ptr() const { return list_; }
  function free (line 1901) | void free() {
  function scanLoBound (line 1914) | size_t scanLoBound(const T& el) const {
  function bsearchLoBound (line 1928) | size_t bsearchLoBound(const T& el) const {
  function insert (line 1977) | void insert(const T& el, size_t idx) {
  function erase (line 1994) | void erase(size_t idx) {
  function expandCopy (line 2007) | void expandCopy(size_t thresh) {
  function xfer (line 2107) | void xfer(ELSet<T, S>& o) {
  function expand (line 2134) | void expand() {
  function resize (line 2144) | void resize(size_t sz) {
  function clear (line 2159) | void clear() {
  function back (line 2167) | inline ESet<T>& back() {
  function back (line 2175) | inline const ESet<T>& back() const {
  function getSlow (line 2235) | const ESet<T>& getSlow(size_t i) const {
  function ptr (line 2247) | const ESet<T> *ptr() const { return list_; }
  function setCat (line 2252) | void setCat(int cat) {
  function free (line 2299) | void free() {
  function expandCopy (line 2312) | void expandCopy(size_t thresh) {
  function expandNoCopy (line 2334) | void expandNoCopy(size_t thresh) {
  function list_ (line 2392) | list_(NULL) {
  function insert (line 2444) | bool insert(const std::pair<K, V>& el) {
  function contains (line 2465) | bool contains(const K& el) const {
  function containsEx (line 2482) | bool containsEx(const K& el, size_t& i) const {
  function remove (line 2501) | void remove(const K& el) {
  function resize (line 2518) | void resize(size_t sz) {
  function clear (line 2541) | void clear() { cur_ = 0; }
  function free (line 2560) | void free() {
  function scanLoBound (line 2573) | size_t scanLoBound(const K& el) const {
  function bsearchLoBound (line 2587) | size_t bsearchLoBound(const K& el) const {
  function insert (line 2635) | void insert(const std::pair<K, V>& el, size_t idx) {
  function erase (line 2652) | void erase(size_t idx) {
  function expandCopy (line 2665) | void expandCopy(size_t thresh) {
  function clear (line 2702) | void clear() {
  function alloc (line 2709) | size_t alloc() {
  function resize (line 2738) | void resize(size_t sz) {
  function pop (line 2753) | void pop() {
  function operator[] (line 2767) | const T& operator[](size_t off) const {
  function clear (line 2791) | void clear() {
  function reset (line 2798) | void reset() {
  function set (line 2806) | void set(size_t off) {
  function test (line 2817) | bool test(size_t off) const {
  function resize (line 2834) | void resize(size_t off) {
  function top (line 2885) | T top() {
  function pop (line 2893) | T pop() {
  function operator[] (line 2964) | const T& operator[](size_t i) const {
  function repOkNode (line 2980) | bool repOkNode(size_t cur) const {
  function clear (line 3001) | void clear() {
  class Pool (line 3015) | class Pool {
  function full (line 3055) | bool full() { return cur_ == pages_.size(); }
  function clear (line 3060) | void clear() {
  function free (line 3068) | void free() {
  function add (line 3132) | bool add(Pool& p, const EList<T>& os) {
  function copy (line 3150) | bool copy(
  function addFill (line 3173) | bool addFill(Pool& p, size_t num, const T& o) {
  function clear (line 3191) | void clear() {
  function getConst (line 3224) | inline const T& getConst(size_t i) const {
  function get (line 3234) | inline T& get(size_t i) {
  function back (line 3246) | inline T& back() {
  function back (line 3257) | inline const T& back() const {
  function ensure (line 3284) | bool ensure(Pool& p, size_t num) {
  function init (line 3352) | void init(const EListSlice<T, S>& sl, size_t first, size_t last) {
  function reset (line 3363) | void reset() {
  function get (line 3371) | inline const T& get(size_t i) const {
  function get (line 3380) | inline T& get(size_t i) {
  function operator[] (line 3389) | inline T& operator[](size_t i) {
  function operator[] (line 3398) | inline const T& operator[](size_t i) const {
  function setLength (line 3449) | void setLength(size_t nlen) {
  function init (line 3484) | void init(const PListSlice<T, S>& sl, size_t first, size_t last) {
  function reset (line 3495) | void reset() {
  function get (line 3503) | inline const T& get(size_t i) const {
  function get (line 3512) | inline T& get(size_t i) {
  function operator[] (line 3521) | inline T& operator[](size_t i) {
  function operator[] (line 3530) | inline const T& operator[](size_t i) const {
  function setLength (line 3581) | void setLength(size_t nlen) {
  type TNode (line 3599) | typedef RedBlackNode<K,P> TNode;
  function grandparent (line 3612) | RedBlackNode *grandparent() {
  function uncle (line 3619) | RedBlackNode *uncle() {
  function replaceChild (line 3638) | void replaceChild(RedBlackNode* ol, RedBlackNode* nw) {
  function operator< (line 3669) | bool operator<(const TNode& o) const { return key < o.key; }
  function operator> (line 3674) | bool operator>(const TNode& o) const { return key > o.key; }
  function operator== (line 3679) | bool operator==(const TNode& o) const { return key == o.key; }
  function operator< (line 3684) | bool operator<(const K& okey) const { return key < okey; }
  function operator> (line 3689) | bool operator>(const K& okey) const { return key > okey; }
  function operator== (line 3694) | bool operator==(const K& okey) const { return key == okey; }
  type TNode (line 3707) | typedef RedBlackNode<K,P> TNode;
  function lookup (line 3721) | inline TNode* lookup(const K& key) const {
  function clear (line 3806) | void clear() {
  function addNode (line 3833) | bool addNode(Pool& p, TNode*& node) {
  function root (line 3861) | const TNode* root() const { return root_; }
  function redBlackRepOk (line 3869) | bool redBlackRepOk(TNode* n) {
  function redBlackRepOk (line 3899) | bool redBlackRepOk(
  function leftRotate (line 3950) | void leftRotate(TNode* n) {
  function rightRotate (line 3977) | void rightRotate(TNode* n) {
  function addNode (line 4002) | void addNode(TNode* n, TNode* parent, bool leftChild) {
  function toList (line 4111) | void toList(EList<T>& l) {
  function operator== (line 4142) | bool operator==(const Pair& o) const {
  function operator< (line 4146) | bool operator<(const Pair& o) const {
  function operator== (line 4167) | bool operator==(const Triple& o) const {
  function operator< (line 4171) | bool operator<(const Triple& o) const {
  function init (line 4201) | void init(
  function operator== (line 4210) | bool operator==(const Quad& o) const {
  function operator< (line 4214) | bool operator<(const Quad& o) const {
  function head (line 4249) | head(NULL) {
  function delete_node (line 4282) | void delete_node(LinkedEListNode<T> *node) {

FILE: edit.cpp
  function operator<< (line 31) | ostream& operator<< (ostream& os, const Edit& e) {

FILE: edit.h
  type Edit (line 65) | struct Edit {
  function reset (line 95) | void reset() {
  function hasN (line 171) | bool hasN() const {
  function operator== (line 195) | int operator== (const Edit &rhs) const {
  function isReadGap (line 210) | bool isReadGap() const {
  function isGap (line 227) | bool isGap() const {
  function numGaps (line 240) | static size_t numGaps(const EList<Edit>& es) {

FILE: endian_swap.h
  function currentlyBigEndian (line 29) | static inline bool currentlyBigEndian() {
  function endianSwapU16 (line 37) | static inline uint16_t endianSwapU16(uint16_t u) {
  function endianSwapU32 (line 47) | static inline uint32_t endianSwapU32(uint32_t u) {
  function endianSwapU64 (line 59) | static inline uint64_t endianSwapU64(uint64_t u) {
  function endianSwapIndex (line 76) | inline index_t endianSwapIndex(index_t u) {
  function endianSwapI16 (line 89) | static inline int16_t endianSwapI16(int16_t i) {
  function endianizeU16 (line 100) | static inline uint16_t endianizeU16(uint16_t u, bool toBig) {
  function endianizeI16 (line 111) | static inline int16_t endianizeI16(int16_t i, bool toBig) {
  function endianSwapI32 (line 121) | static inline int32_t endianSwapI32(int32_t i) {
  function endianizeU32 (line 134) | static inline uint32_t endianizeU32(uint32_t u, bool toBig) {
  function endianizeI32 (line 145) | static inline int32_t endianizeI32(int32_t i, bool toBig) {

FILE: evaluation/centrifuge_evaluate.py
  function read_taxonomy_tree (line 13) | def read_taxonomy_tree(tax_file):
  function compare_scm (line 26) | def compare_scm(centrifuge_out, true_out, taxonomy_tree, rank):
  function compare_abundance (line 117) | def compare_abundance(centrifuge_out, true_out, taxonomy_tree, debug):
  function sql_execute (line 166) | def sql_execute(sql_db, sql_query):
  function create_sql_db (line 180) | def create_sql_db(sql_db):
  function write_analysis_data (line 232) | def write_analysis_data(sql_db, genome_name, database_name):
  function evaluate (line 278) | def evaluate(index_base,

FILE: evaluation/centrifuge_simulate_reads.py
  function reverse_complement (line 30) | def reverse_complement(seq):
  function get_genome_seq_id (line 58) | def get_genome_seq_id(genome_name):
  class ErrRandomSource (line 68) | class ErrRandomSource:
    method __init__ (line 69) | def __init__(self, prob = 0.0, size = 1 << 20):
    method getRand (line 79) | def getRand(self):
  function read_genomes (line 88) | def read_genomes(genomes_file, seq2taxID):
  function read_transcript (line 118) | def read_transcript(genomes_seq, gtf_file, frag_len):
  function generate_rna_expr_profile (line 183) | def generate_rna_expr_profile(expr_profile_type, num_transcripts = 10000):
  function generate_dna_expr_profile (line 208) | def generate_dna_expr_profile(expr_profile_type, num_genomes):
  function getSamAlignment (line 233) | def getSamAlignment(dna, exons, genome_seq, trans_seq, frag_pos, read_le...
  function samRepOk (line 457) | def samRepOk(genome_seq, read_seq, chr, pos, cigar, XM, NM, MD, Zs, max_...
  function simulate_reads (line 585) | def simulate_reads(index_fname, base_fname, \

FILE: evaluation/test/centrifuge_evaluate_mason.py
  function read_taxonomy_tree (line 13) | def read_taxonomy_tree(tax_file):
  function compare_scm (line 26) | def compare_scm(centrifuge_out, true_out, taxonomy_tree, rank):
  function evaluate (line 128) | def evaluate(index_base,

FILE: fast_mutex.h
  function lock (line 120) | inline void lock()

FILE: filebuf.h
  function isnewline (line 35) | static inline bool isnewline(int c) {
  function isspace_notnl (line 43) | static inline bool isspace_notnl(int c) {
  class FileBuf (line 57) | class FileBuf {
  function isOpen (line 84) | bool isOpen() {
  function close (line 91) | void close() {
  function get (line 104) | int get() {
  function eof (line 117) | bool eof() {
  function newFile (line 124) | void newFile(FILE *in) {
  function newFile (line 136) | void newFile(std::ifstream *__inf) {
  function newFile (line 148) | void newFile(std::istream *__ins) {
  function reset (line 161) | void reset() {
  function peek (line 181) | int peek() {
  function gets (line 222) | size_t gets(char *buf, size_t len) {
  function get (line 252) | size_t get(char *buf, size_t len) {
  function getPastWhitespace (line 268) | int getPastWhitespace() {
  function getPastNewline (line 279) | int getPastNewline() {
  function peekPastNewline (line 293) | int peekPastNewline() {
  function peekUptoNewline (line 306) | int peekUptoNewline() {
  function resetLastN (line 402) | void resetLastN() {
  function copyLastN (line 410) | size_t copyLastN(char *buf) {
  class BitpairOutFileBuf (line 456) | class BitpairOutFileBuf {
  class OutFileBuf (line 524) | class OutFileBuf {
  function write (line 586) | void write(char c) {
  function writeString (line 595) | void writeString(const std::string& s) {
  function writeChars (line 640) | void writeChars(const char * s, size_t len) {
  function writeChars (line 661) | void writeChars(const char * s) {
  function close (line 668) | void close() {
  function reset (line 680) | void reset() {
  function flush (line 685) | void flush() {

FILE: formats.h
  type file_format (line 29) | enum file_format {

FILE: group_walk.h
  function init (line 106) | void init(TIndexOffU tf, size_t len_, const T& o) {
  function reset (line 113) | void reset() { topf = std::numeric_limits<TIndexOffU>::max(); }
  function GroupWalkState (line 140) | GroupWalkState(int cat) : map(cat) {
  type WalkMetrics (line 155) | struct WalkMetrics {
  function reset (line 177) | void reset() {
  function reset (line 200) | void reset() {
  function init (line 208) | void init(
  function operator== (line 226) | bool operator==(const GWElt& o) const {
  function operator!= (line 238) | bool operator!=(const GWElt& o) const {
  function reset (line 261) | void reset() {
  function init (line 269) | void init(
  function init (line 313) | void init(
  function reset (line 333) | void reset() {
  function repOk (line 350) | bool repOk(const SARangeWithOffs<T>& sa) const {
  function repOkBasic (line 374) | bool repOkBasic() {
  function setReported (line 382) | void setReported(index_t i) {
  function reported (line 392) | bool reported(index_t i) const {
  function repOk (line 711) | bool repOk(
  function repOkBasic (line 739) | bool repOkBasic() {
  function repOkMapInclusive (line 748) | bool repOkMapInclusive(GWHit<index_t, T>& hit, index_t range) const {
  function off (line 783) | index_t off(
  function map (line 797) | index_t map(index_t i) const {
  function setOff (line 830) | void setOff(
  function reset (line 1021) | void reset() {
  function initMap (line 1033) | void initMap(size_t newsz) {
  function doneReporting (line 1045) | bool doneReporting(const GWHit<index_t, T>& hit) const {
  function doneResolving (line 1056) | bool doneResolving(const SARangeWithOffs<T>& sa) const {
  function reset (line 1089) | void reset() {
  function init (line 1097) | void init(
  function advanceElement (line 1154) | bool advanceElement(
  function repOk (line 1220) | bool repOk(const SARangeWithOffs<T>& sa) const {

FILE: hi_aligner.h
  function reset (line 63) | void reset() {
  function init (line 73) | void init(
  function operator (line 103) | bool operator<(const BWTHit& o) const {
  function repOk (line 124) | bool repOk(const Read& rd) const {
  function reset (line 154) | void reset() {
  function init (line 164) | void init(
  function done (line 178) | bool done() {
  function done (line 188) | void done(bool done) {
  function offsetSize (line 197) | size_t  offsetSize()             { return _partialHits.size(); }
  function numPartialSearch (line 198) | size_t  numPartialSearch()       { return _numPartialSearch; }
  function numActualPartialSearch (line 199) | size_t  numActualPartialSearch()
  function width (line 205) | bool width(index_t offset_) {
  function hasGenomeCoords (line 210) | bool hasGenomeCoords(index_t offset_) {
  function hasAllGenomeCoords (line 220) | bool hasAllGenomeCoords() {
  function index_t (line 233) | index_t minWidth(index_t& offset) const {
  function searchScore (line 252) | int64_t searchScore(index_t minK) {
  function adjustOffset (line 272) | bool adjustOffset(index_t minK) {
  function setOffset (line 285) | void setOffset(index_t offset) {
  function trim5 (line 470) | void trim5(index_t trim5) { _trim5 = trim5; }
  function trim3 (line 471) | void trim3(index_t trim3) { _trim3 = trim3; }
  function contains (line 508) | bool contains(const GenomeHit<index_t>& other) const {
  function reset (line 639) | struct HIMetrics {
  function init (line 655) | void init(
  function initReads (line 765) | void initReads(Read *rds[2], bool nofw[2], bool norc[2], TAlScore minsc[...

FILE: hier_idx.h
  type Ebwt (line 38) | typedef Ebwt<index_t> PARENT_CLASS;
  function sanityCheckAll (line 275) | void sanityCheckAll(int reverse) const {
  type Ebwt (line 1074) | typedef Ebwt<index_t> PARENT_CLASS;
  function loadIntoMemory (line 1184) | void loadIntoMemory(
  function evictFromMemory (line 1222) | void evictFromMemory() {
  function sanityCheckAll (line 1231) | void sanityCheckAll(int reverse) const {
  function clearLocalEbwts (line 1270) | void clearLocalEbwts() {
  type EList (line 1382) | typedef EList<RefRecord, 1> EList_RefRecord;
  type stat (line 1737) | struct stat
  function else (line 1767) | else if(this->_useMm && !justHeader) {

FILE: hyperloglogplus.h
  function linearCounting (line 47) | double linearCounting(uint32_t m, uint32_t v) {
  function ranhash (line 60) | inline uint64_t ranhash (uint64_t u) {
  function murmurhash3_finalizer (line 72) | inline uint64_t murmurhash3_finalizer (uint64_t key)  {
  function alpha (line 87) | double alpha(uint32_t m)  {
  function calculateEstimate (line 103) | double calculateEstimate(vector<uint8_t> array) {
  function countZeros (line 112) | uint32_t countZeros(vector<uint8_t> s) {
  function clz_manual (line 161) | static int clz_manual(uint64_t x)
  function clz (line 176) | inline uint32_t clz(const uint32_t x) {
  function clz (line 180) | inline uint32_t clz(const uint64_t x) {
  function clz_log2 (line 191) | uint32_t clz_log2(const uint64_t w) {
  type set (line 201) | typedef set<uint32_t> SparseListType;
  type HashSize (line 202) | typedef uint64_t HashSize;
  type T_KEY (line 209) | typedef uint64_t T_KEY;
  function add (line 258) | void add(T_KEY item) {
  function add (line 267) | void add(T_KEY item, size_t size) {
  function add (line 306) | void add(vector<T_KEY> words) {
  function reset (line 315) | void reset() {
  function switchToNormalRepresentation (line 324) | void switchToNormalRepresentation() {
  function addToRegisters (line 343) | void addToRegisters(const SparseListType &sparseList) {
  function merge (line 362) | void merge(const HyperLogLogPlusMinus* other) {
  function get_index (line 444) | uint32_t get_index(const T hash_value, const uint8_t p, const uint8_t si...
  function get_index (line 451) | inline uint32_t get_index(const uint64_t hash_value, const uint8_t p) co...
  function get_index (line 455) | inline uint32_t get_index(const uint32_t hash_value, const uint8_t p) co...
  function T (line 460) | T get_trailing_ones(const uint8_t p) const {
  function get_rank (line 465) | uint8_t get_rank(const T hash_value, const uint8_t p) const {
  function initRawEstimateData (line 477) | void initRawEstimateData() {
  function initBiasData (line 498) | void initBiasData() {
  function getEstimateBias (line 525) | double getEstimateBias(double estimate) {
  function encodeHashIn32Bit (line 556) | uint32_t encodeHashIn32Bit(uint64_t hash_value) {
  function else (line 582) | struct idx_n_rank {

FILE: ls.cpp
  function main (line 30) | int main(void) {

FILE: ls.h
  function update_group (line 58) | inline void update_group(T *pl, T *pm) {
  function select_sort_split (line 73) | inline void select_sort_split(T *p, T n) {
  function T (line 100) | inline T choose_pivot(T *p, T n) {
  function sort_split (line 125) | inline void sort_split(T *p, T n)
  function bucketsort (line 187) | inline void bucketsort(T *x, T *p, T n, T k)
  function T (line 228) | inline T transform(T *x, T *p, T n, T k, T l, T q)

FILE: mask.h
  function matchesEx (line 37) | static inline int matchesEx(int i, int j) {
  function matches (line 48) | static inline bool matches(int i, int j) {
  function randFromMask (line 56) | static inline int randFromMask(RandomSource& rnd, int mask) {

FILE: multikey_qsort.h
  function swap (line 37) | inline void swap(TStr& s, size_t slen, TPos a, TPos b) {
  function swap (line 47) | inline void swap(TVal* s, size_t slen, TPos a, TPos b) {
  function vecswap (line 105) | inline void vecswap(TStr& s, size_t slen, TPos i, TPos j, TPos n, TPos b...
  function vecswap (line 123) | inline void vecswap(TVal *s, size_t slen, TPos i, TPos j, TPos n, TPos b...
  function vecswap2 (line 147) | inline void vecswap2(
  function vecswap2 (line 175) | inline void vecswap2(TVal* s, size_t slen, TVal* s2, TPos i, TPos j, TPo...
  function sanityCheckInputSufs (line 323) | static inline void sanityCheckInputSufs(TIndexOffU *s, size_t slen) {
  type QSortRange (line 506) | struct QSortRange {
  function get_uint8 (line 855) | uint8_t get_uint8(const TStr& t, size_t off) {
  function char_at_suf_u8 (line 874) | inline int char_at_suf_u8(

FILE: outq.cpp
  function main (line 107) | int main(void) {

FILE: outq.h
  function class (line 37) | class OutputQueue {
  function class (line 123) | class OutputQueueMark {

FILE: pat.cpp
  function PatternSource (line 48) | PatternSource* PatternSource::patsrcFromStrings(
  function parseQuals (line 650) | int parseQuals(
  function ASSERT_ONLY (line 986) | ASSERT_ONLY(int pk =) peekToEndOfLine(fb_);
  function wrongQualityFormat (line 1505) | void wrongQualityFormat(const BTString& read_name) {
  function tooFewQualities (line 1513) | void tooFewQualities(const BTString& read_name) {
  function tooManyQualities (line 1519) | void tooManyQualities(const BTString& read_name) {
  type SRA_Read (line 1527) | struct SRA_Read {
    method reset (line 1532) | void reset() {
  type SRA_Data (line 1541) | struct SRA_Data {
    method SRA_Data (line 1550) | SRA_Data() {
    method isFull (line 1558) | bool isFull() {
    method isEmpty (line 1564) | bool isEmpty() {
    method advanceReadPos (line 1580) | void advanceReadPos() {
    method advanceWritePos (line 1585) | void advanceWritePos() {
  function SRA_IO_Worker (line 1591) | static void SRA_IO_Worker(void *vp)

FILE: pat.h
  function genRandSeed (line 55) | static inline uint32_t genRandSeed(const BTDnaString& qry,
  type PatternParams (line 96) | struct PatternParams {
  function class (line 141) | class PatternSource {
  function class (line 279) | class PairedPatternSource {
  function class (line 344) | class PairedSoloPatternSource : public PairedPatternSource {
  function class (line 419) | class PairedDualPatternSource : public PairedPatternSource {
  function class (line 511) | class PatternSourcePerThread {
  function class (line 559) | class PatternSourcePerThreadFactory {
  function class (line 590) | class WrappedPatternSourcePerThread : public PatternSourcePerThread {
  function class (line 617) | class WrappedPatternSourcePerThreadFactory : public PatternSourcePerThre...
  function getOverNewline (line 649) | static inline int getOverNewline(FileBuf& in) {
  function peekOverNewline (line 658) | static inline int peekOverNewline(FileBuf& in) {
  function getToEndOfLine (line 670) | static inline int getToEndOfLine(FileBuf& in) {
  function peekToEndOfLine (line 685) | static inline int peekToEndOfLine(FileBuf& in) {
  function class (line 707) | class VectorPatternSource : public PatternSource {
  function class (line 757) | class BufferedFilePatternSource : public PatternSource {
  function class (line 936) | class FastaPatternSource : public BufferedFilePatternSource {
  function tokenizeQualLine (line 1007) | static inline bool tokenizeQualLine(
  function class (line 1025) | class TabbedPatternSource : public BufferedFilePatternSource {
  function class (line 1112) | class QseqPatternSource : public BufferedFilePatternSource {
  function class (line 1214) | class FastaContinuousPatternSource : public BufferedFilePatternSource {
  function virtual (line 1225) | virtual void reset() {
  function virtual (line 1330) | virtual void resetForNextFile() {
  function class (line 1360) | class FastqPatternSource : public BufferedFilePatternSource {
  function virtual (line 1449) | virtual void resetForNextFile() {
  function class (line 1478) | class RawPatternSource : public BufferedFilePatternSource {
  function namespace (line 1624) | namespace ngs {
  function namespace (line 1629) | namespace tthread {
  type SRA_Data (line 1633) | struct SRA_Data
  function class (line 1638) | class SRAPatternSource : public PatternSource {
  function virtual (line 1734) | virtual void reset() {

FILE: pe.cpp
  function testCaseClassify (line 361) | void testCaseClassify(
  function testCaseOtherMate (line 401) | void testCaseOtherMate(
  function main (line 466) | int main(int argc, char **argv) {

FILE: pe.h
  function pePolicyCompat (line 102) | static inline bool pePolicyCompat(
  function pePolicyMateDir (line 130) | static inline void pePolicyMateDir(
  function class (line 169) | class PairedEndPolicy {
  function reset (line 201) | void reset() {
  function init (line 208) | void init(
  function peClassifyPair (line 268) | int peClassifyPair(

FILE: presets.h
  function class (line 34) | class Presets {
  function class (line 52) | class PresetsV0 : public Presets {

FILE: processor_support.h
  type regs_t (line 23) | struct regs_t {unsigned int EAX, EBX, ECX, EDX;}
  function class (line 26) | class ProcessorSupport {

FILE: qual.h
  function phredcToPhredq (line 31) | static inline uint8_t phredcToPhredq(char c) {
  function solexaToPhred (line 45) | static inline uint8_t solexaToPhred(int sol) {
  function class (line 51) | class SimplePhredPenalty {
  function class (line 64) | class MaqPhredPenalty {
  function mmPenalty (line 77) | static inline uint8_t mmPenalty(bool maq, uint8_t qual) {
  function delPenalty (line 85) | static inline uint8_t delPenalty(bool maq, uint8_t qual) {
  function insPenalty (line 93) | static inline uint8_t insPenalty(bool maq, uint8_t qual_left, uint8_t qu...
  function charToPhred33 (line 105) | inline static char charToPhred33(char c, bool solQuals, bool phred64Qual...
  function intToPhred33 (line 152) | inline static char intToPhred33(int iQ, bool solQuals) {
  function roundPenalty (line 173) | inline static uint8_t roundPenalty(uint8_t p) {
  function penaltiesAt (line 183) | inline static uint8_t penaltiesAt(size_t off, uint8_t *q,
  function loPenaltyAt (line 217) | inline static uint8_t loPenaltyAt(size_t off, int alts,

FILE: random_source.cpp
  function main (line 69) | int main(void) {

FILE: random_source.h
  function class (line 34) | class RandomSource {
  function nextU32 (line 52) | uint32_t nextU32() {
  function nextU64 (line 63) | uint64_t nextU64() {
  function nextU32Range (line 75) | uint32_t nextU32Range(uint32_t lo, uint32_t hi) {
  function nextU2 (line 86) | uint32_t nextU2() {
  function nextBool (line 99) | bool nextBool() {
  function nextFromProbs (line 115) | uint32_t nextFromProbs(
  function nextFloat (line 128) | float nextFloat() {
  function nextU32 (line 133) | static uint32_t nextU32(uint32_t last,
  function class (line 154) | class RandomSource { // Mersenne Twister random number generator
  function reset (line 173) | void reset() {
  function virtual (line 179) | virtual ~RandomSource() { }
  function nextBool (line 188) | bool nextBool() {
  function nextU32 (line 195) | inline uint32_t nextU32() {
  function nextFloat (line 213) | float nextFloat() {
  function twiddle (line 229) | uint32_t twiddle(uint32_t u, uint32_t v) {

FILE: random_util.h
  function class (line 32) | class Random1toN {
  function reset (line 77) | void reset() {
  function T (line 86) | T next(RandomSource& rnd) {
  function setDone (line 171) | void setDone() { assert(inited()); cur_ = n_; }

FILE: read.h
  type rna_strandness_format (line 30) | enum rna_strandness_format {
  type TReadId (line 38) | typedef uint64_t TReadId;
  type TReadOff (line 39) | typedef size_t TReadOff;
  type TAlScore (line 40) | typedef int64_t TAlScore;
  function Read (line 47) | struct Read {
  function reset (line 53) | void reset() {
  function finalize (line 85) | void finalize() {
  function init (line 98) | void init(
  function constructRevComps (line 138) | void constructRevComps() {
  function constructReverses (line 156) | void constructReverses() {
  function fixMateName (line 171) | void fixMateName(int i) {
  function dump (line 200) | void dump(std::ostream& os) const {
  function same (line 258) | static bool same(
  function getc (line 298) | int getc(TReadOff off5p, bool fw) const {
  function getq (line 307) | int getq(TReadOff off5p) const {
  type FmStringOp (line 367) | struct FmStringOp {
  function back (line 378) | struct FmString {
  function reset (line 432) | struct PerReadMetrics {
  type timeval (line 463) | struct timeval
  type timezone (line 464) | struct timezone

FILE: read_qseq.cpp
  function ASSERT_ONLY (line 250) | ASSERT_ONLY(int c =) fb_.get();

FILE: ref_coord.cpp
  function ostream (line 25) | ostream& operator<<(ostream& out, const Interval& c) {
  function ostream (line 30) | ostream& operator<<(ostream& out, const Coord& c) {

FILE: ref_coord.h
  type TRefId (line 28) | typedef int64_t TRefId;
  type TRefOff (line 29) | typedef int64_t TRefOff;
  function class (line 36) | class Coord {
  function init (line 49) | void init(TRefId rf, TRefOff of, bool fw) {
  function init (line 58) | void init(const Coord& c) {
  function operator (line 67) | bool operator==(const Coord& o) const {
  function operator (line 78) | bool operator<(const Coord& o) const {
  function operator (line 91) | bool operator>=(const Coord& o) const {
  function operator (line 100) | bool operator>(const Coord& o) const {
  function operator (line 113) | bool operator<=(const Coord& o) const {
  function reset (line 120) | void reset() {
  function fw (line 143) | bool fw() const {
  function within (line 168) | bool within(int64_t len, int64_t inbegin, int64_t inend) const {
  function setRef (line 176) | inline void setRef(TRefId  id)  { ref_ = id;  }
  function setOff (line 177) | inline void setOff(TRefOff off) { off_ = off; }
  function adjustOff (line 179) | inline void adjustOff(TRefOff off) { off_ += off; }
  function class (line 193) | class Interval {

FILE: ref_read.cpp
  function RefRecord (line 28) | RefRecord fastaRefReadSize(
  function printRecords (line 198) | static void
  function reverseRefRecords (line 210) | void reverseRefRecords(
  function fastaRefReadSizes (line 277) | std::pair<size_t, size_t>

FILE: ref_read.h
  function class (line 38) | class RefTooLongException : public exception {
  function write (line 94) | void write(std::ostream& out, bool be) {
  type RefReadInParams (line 114) | struct RefReadInParams {

FILE: reference.cpp
  type stat (line 74) | struct stat

FILE: reference.h
  function class (line 59) | class BitPairReference {

FILE: scoring.cpp
  function main (line 172) | int main() {

FILE: scoring.h
  function class (line 96) | class Scoring {
  function setMatchBonus (line 186) | void setMatchBonus(int bonus) {
  function setMmPen (line 196) | void setMmPen(int mmType_, int mmpMax_, int mmpMin_) {
  function setNPen (line 206) | void setNPen(int nType, int n) {
  function linearFunc (line 230) | static float linearFunc(int64_t x, float cnst, float lin) {
  function mm (line 240) | inline int mm(int rdc, int refm, int q) const {
  function score (line 249) | inline int score(int rdc, int refm, int q) const {
  function score (line 266) | inline int score(int rdc, int refm, int q, int& ns) const {
  function mm (line 284) | inline int mm(int rdc, int q) const {
  function mm (line 293) | inline int mm(int q) const {
  function match (line 310) | inline int64_t match(int q) const {
  function perfectScore (line 318) | inline int64_t perfectScore(size_t rdlen) const {
  function n (line 337) | inline int n(int q) const {
  function ins (line 348) | inline int ins(int ext) const {
  function del (line 359) | inline int del(int ext) const {
  function Scoring (line 493) | static Scoring base1() {

FILE: sequence_io.h
  function parseFastaLens (line 38) | void parseFastaLens(
  function parseFasta (line 68) | void parseFasta(
  function parseFastas (line 108) | void parseFastas(

FILE: shmem.cpp
  function notifySharedMem (line 35) | void notifySharedMem(void *mem, size_t len) {
  function waitSharedMem (line 43) | void waitSharedMem(void *mem, size_t len) {

FILE: simple_func.cpp
  function SimpleFunc (line 42) | SimpleFunc SimpleFunc::parse(

FILE: simple_func.h
  function class (line 44) | class SimpleFunc {
  function init (line 54) | void init(int type, double I, double X, double C, double L) {
  function init (line 58) | void init(int type, double C, double L) {
  function setType (line 64) | void setType (int type ) { type_ = type; }
  function setMin (line 65) | void setMin  (double mn) { I_ = mn; }
  function setMax (line 66) | void setMax  (double mx) { X_ = mx; }
  function setConst (line 67) | void setConst(double co) { C_ = co; }
  function setCoeff (line 68) | void setCoeff(double ce) { L_ = ce; }
  function mult (line 76) | void mult(double x) {
  function reset (line 83) | void reset() { type_ = 0; }

FILE: sse_util.h
  function class (line 29) | class EList_m128i {
  function ensure (line 60) | inline void ensure(size_t thresh) {
  function reserveExact (line 70) | inline void reserveExact(size_t newsz) {
  function resize (line 89) | void resize(size_t sz) {
  function zero (line 104) | void zero() {
  function resizeNoCopy (line 114) | void resizeNoCopy(size_t sz) {
  function resizeExact (line 130) | void resizeExact(size_t sz) {
  function clear (line 143) | void clear() {
  function __m128i (line 151) | inline __m128i& operator[](size_t i) {
  function __m128i (line 159) | inline __m128i operator[](size_t i) const {
  function __m128i (line 167) | inline __m128i& get(size_t i) {
  function __m128i (line 174) | inline __m128i get(size_t i) const {
  function __m128i (line 181) | __m128i *ptr() { return list_; }
  function __m128i (line 186) | const __m128i *ptr() const { return list_; }
  function lazyInitExact (line 206) | void lazyInitExact(size_t sz) {
  function __m128i (line 217) | __m128i *alloc(size_t sz) {
  function free (line 243) | void free() {
  function expandCopy (line 257) | void expandCopy(size_t thresh) {
  function expandCopyExact (line 268) | void expandCopyExact(size_t newsz) {
  function expandNoCopy (line 290) | void expandNoCopy(size_t thresh) {
  function expandNoCopyExact (line 302) | void expandNoCopyExact(size_t newsz) {
  function reset (line 320) | struct  CpQuad {
  function operator (line 325) | bool operator==(const CpQuad& o) const {
  function class (line 339) | class Checkpointer {

FILE: sstring.cpp
  function main (line 29) | int main(void) {

FILE: sstring.h
  function sstr_len (line 82) | static inline size_t sstr_len(const char *s) {
  function sstr_eq (line 96) | inline bool sstr_eq(const T1& s1, const T2& s2) {
  function sstr_neq (line 107) | inline bool sstr_neq(const T1& s1, const T2& s2) {
  function virtual (line 572) | virtual ~SString() {
  function resize (line 603) | void resize(size_t sz) {
  function set (line 655) | inline void set(int c, size_t idx) {
  function T (line 663) | inline const T& operator[](size_t i) const {
  function T (line 671) | inline T& operator[](size_t i) {
  function T (line 679) | inline const T& get(size_t i) const {
  function virtual (line 688) | virtual void install(const T* b, size_t sz) {
  function virtual (line 698) | virtual void install(const std::basic_string<T>& b) {
  function install (line 708) | void install(const T* b) {
  function installReverse (line 716) | void installReverse(const char* b, size_t sz) {
  function installReverse (line 729) | void installReverse(const SString<T>& b) {
  function operator (line 736) | bool operator==(const SString<T>& o) {
  function operator (line 743) | bool operator!=(const SString<T>& o) {
  function operator (line 750) | bool operator<(const SString<T>& o) {
  function operator (line 757) | bool operator>(const SString<T>& o) {
  function operator (line 764) | bool operator<=(const SString<T>& o) {
  function operator (line 771) | bool operator>=(const SString<T>& o) {
  function reverse (line 778) | void reverse() {
  function reverseWindow (line 789) | void reverseWindow(size_t off, size_t len) {
  function fill (line 803) | void fill(size_t len, const T& el) {
  function fill (line 813) | void fill(const T& el) {
  function clear (line 825) | void clear() { len_ = 0; }
  function virtual (line 856) | virtual const T* toZBuf() const {
  function T (line 864) | const T* buf() const { return cs_; }
  function T (line 869) | T* wbuf() { return cs_; }
  function class (line 883) | class S2bDnaString {
  function virtual (line 983) | virtual ~S2bDnaString() {
  function resize (line 1016) | void resize(size_t sz) {
  function toChar (line 1034) | char toChar(size_t idx) const {
  function toColor (line 1043) | char toColor(size_t idx) const {
  function set (line 1094) | void set(int c, size_t idx) {
  function setChar (line 1106) | void setChar(int c, size_t idx) {
  function setColor (line 1115) | void setColor(int c, size_t idx) {
  function setWord (line 1124) | void setWord(uint32_t w, size_t i) {
  function const (line 1132) | char operator[](size_t i) const {
  function get (line 1140) | char get(size_t i) const {
  function install (line 1150) | void install(const uint32_t* b, size_t sz) {
  function install (line 1160) | void install(const char* b, size_t sz) {
  function installChars (line 1179) | void installChars(const char* b, size_t sz) {
  function installColors (line 1200) | void installColors(const char* b, size_t sz) {
  function install (line 1221) | void install(const char* b) {
  function installChars (line 1228) | void installChars(const char* b) {
  function installColors (line 1235) | void installColors(const char* b) {
  function install (line 1242) | void install(const std::basic_string<char>& b) {
  function installChars (line 1249) | void installChars(const std::basic_string<char>& b) {
  function installColors (line 1256) | void installColors(const std::basic_string<char>& b) {
  function installReverse (line 1264) | void installReverse(const char* b, size_t sz) {
  function installReverse (line 1285) | void installReverse(const char* b) {
  function installReverseChars (line 1293) | void installReverseChars(const char* b, size_t sz) {
  function installReverseChars (line 1317) | void installReverseChars(const char* b) {
  function installReverseColors (line 1325) | void installReverseColors(const char* b, size_t sz) {
  function installReverseColors (line 1349) | void installReverseColors(const char* b) {
  function installReverse (line 1357) | void installReverse(const S2bDnaString& b) {
  function reverse (line 1426) | void reverse() {
  function reverseWindow (line 1453) | void reverseWindow(size_t off, size_t len) {
  function fill (line 1483) | void fill(size_t len, char el) {
  function fill (line 1515) | void fill(char el) {
  function clear (line 1576) | void clear() { len_ = 0; }
  function virtual (line 1698) | virtual ~SStringExpandable() {
  function insert (line 1762) | void insert(const T& c, size_t idx) {
  function set (line 1777) | void set(int c, size_t idx) {
  function append (line 1785) | void append(const T& c) {
  function remove (line 1793) | void remove(size_t idx) {
  function T (line 1805) | const T& operator[](size_t i) const {
  function T (line 1821) | const T& get(size_t i) const {
  function T (line 1829) | const T* get_ptr(size_t i) const {
  function virtual (line 1837) | virtual void install(const T* b, size_t sz) {
  function install (line 1847) | void install(const T* b) { install(b, strlen(b)); }
  function installReverse (line 1853) | void installReverse(const char* b, size_t sz) {
  function installReverse (line 1865) | void installReverse(const SStringExpandable<T, S>& b) {
  function reverse (line 1918) | void reverse() {
  function reverseWindow (line 1929) | void reverseWindow(size_t off, size_t len) {
  function resize (line 1945) | void resize(size_t len) {
  function resize (line 1954) | void resize(size_t len, const T& el) {
  function fill (line 1967) | void fill(size_t len, const T& el) {
  function fill (line 1977) | void fill(const T& el) {
  function trimBegin (line 1984) | void trimBegin(size_t len) {
  function trimEnd (line 1998) | void trimEnd(size_t len) {
  function append (line 2006) | void append(const T* b, size_t sz) {
  function append (line 2015) | void append(const T* b) {
  function clear (line 2027) | void clear() { len_ = 0; }
  function virtual (line 2058) | virtual const T* toZBuf() const {
  function eq (line 2072) | bool eq(const char *str) const {
  function T (line 2080) | const T* buf() const { return cs_; }
  function T (line 2085) | T* wbuf() { return cs_; }
  function expandNoCopy (line 2113) | void expandNoCopy(size_t sz) {
  function explicit (line 2156) | explicit SStringFixed(const std::basic_string<T>& str) {
  function explicit (line 2163) | explicit SStringFixed(const T* b, size_t sz) {
  function explicit (line 2170) | explicit SStringFixed(const T* b) {
  function virtual (line 2174) | virtual ~SStringFixed() { }
  function T (line 2179) | inline const T& operator[](size_t i) const {
  function T (line 2186) | inline T& operator[](size_t i) {
  function T (line 2193) | inline const T& get(size_t i) const {
  function T (line 2201) | inline T& get(size_t i) {
  function insert (line 2258) | void insert(const T& c, size_t idx) {
  function set (line 2272) | void set(int c, size_t idx) {
  function append (line 2280) | void append(const T& c) {
  function remove (line 2288) | void remove(size_t idx) {
  function virtual (line 2300) | virtual void install(const T* b, size_t sz) {
  function install (line 2309) | void install(const T* b) { install(b, strlen(b)); }
  function installReverse (line 2315) | void installReverse(const char* b, size_t sz) {
  function installReverse (line 2327) | void installReverse(const SStringFixed<T, S>& b) {
  function reverse (line 2380) | void reverse() {
  function reverseWindow (line 2391) | void reverseWindow(size_t off, size_t len) {
  function resize (line 2407) | void resize(size_t len) {
  function resize (line 2416) | void resize(size_t len, const T& el) {
  function fill (line 2429) | void fill(size_t len, const T& el) {
  function fill (line 2439) | void fill(const T& el) {
  function trimBegin (line 2446) | void trimBegin(size_t len) {
  function trimEnd (line 2460) | void trimEnd(size_t len) {
  function append (line 2468) | void append(const T* b, size_t sz) {
  function append (line 2477) | void append(const T* b) {
  function clear (line 2489) | void clear() { len_ = 0; }
  function virtual (line 2500) | virtual const T* toZBuf() const {
  function eq (line 2509) | bool eq(const char *str) const {
  function T (line 2532) | const T* buf() const { return cs_; }
  function T (line 2537) | T* wbuf() { return cs_; }
  function explicit (line 2584) | explicit SDnaStringFixed(const std::basic_string<char>& str) :
  function explicit (line 2590) | explicit SDnaStringFixed(const char* b, size_t sz) :
  function virtual (line 2613) | virtual ~SDnaStringFixed() { }
  function installReverseComp (line 2620) | void installReverseComp(const char* b, size_t sz) {
  function installReverseComp (line 2633) | void installReverseComp(const SDnaStringFixed<S>& b) {
  function virtual (line 2667) | virtual void install(const char* b, size_t sz) {
  function virtual (line 2683) | virtual void installChars(const char* b, size_t sz) {
  function virtual (line 2698) | virtual void installColors(const char* b, size_t sz) {
  function virtual (line 2713) | virtual void installChars(const std::basic_string<char>& str) {
  function virtual (line 2721) | virtual void installColors(const std::basic_string<char>& str) {
  function set (line 2728) | void set(int c, size_t idx) {
  function append (line 2738) | void append(const char& c) {
  function setChar (line 2748) | void setChar(char c, size_t idx) {
  function appendChar (line 2757) | void appendChar(char c) {
  function toChar (line 2766) | char toChar(size_t idx) const {
  function const (line 2775) | const char& operator[](size_t i) const {
  function get (line 2782) | const char& get(size_t i) const {
  function virtual (line 2832) | virtual const char* toZBuf() const { return this->toZBufXForm("ACGTN"); }
  function virtual (line 2908) | virtual ~SDnaStringExpandable() { }
  function installReverseComp (line 2915) | void installReverseComp(const char* b, size_t sz) {
  function installReverseComp (line 2928) | void installReverseComp(const SDnaStringExpandable<S, M>& b) {
  function virtual (line 2981) | virtual void install(const char* b, size_t sz) {
  function virtual (line 2996) | virtual void installChars(const char* b, size_t sz) {
  function virtual (line 3010) | virtual void installColors(const char* b, size_t sz) {
  function virtual (line 3024) | virtual void installChars(const std::basic_string<char>& str) {
  function virtual (line 3032) | virtual void installColors(const std::basic_string<char>& str) {
  function set (line 3039) | void set(int c, size_t idx) {
  function append (line 3048) | void append(const char& c) {
  function setChar (line 3059) | void setChar(char c, size_t idx) {
  function appendChar (line 3068) | void appendChar(char c) {
  function toChar (line 3079) | char toChar(size_t idx) const {
  function const (line 3176) | inline const char& operator[](size_t i) const {
  function get (line 3183) | inline const char& get(size_t i) const {
  function virtual (line 3232) | virtual const char* toZBuf() const { return this->toZBufXForm("ACGTN"); }
  function explicit (line 3255) | explicit SDnaMaskString(const std::basic_string<char>& str) :
  function virtual (line 3277) | virtual ~SDnaMaskString() { }
  function installReverseComp (line 3284) | void installReverseComp(const char* b, size_t sz) {
  function installReverseComp (line 3299) | void installReverseComp(const SDnaMaskString<S, M>& b) {
  function virtual (line 3335) | virtual void install(const char* b, size_t sz) {
  function virtual (line 3351) | virtual void installChars(const char* b, size_t sz) {
  function virtual (line 3367) | virtual void installChars(const std::basic_string<char>& str) {
  function set (line 3374) | void set(int c, size_t idx) {
  function append (line 3383) | void append(const char& c) {
  function setChar (line 3394) | void setChar(char c, size_t idx) {
  function appendChar (line 3403) | void appendChar(char c) {
  function toChar (line 3414) | char toChar(size_t idx) const {
  function operator[] (line 3422) | const char& operator[](size_t i) const {
  function get (line 3436) | const char& get(size_t i) const {
  function get (line 3445) | char& get(size_t i) {
  function toZBuf (line 3530) | virtual const char* toZBuf() const { return this->toZBufXForm(iupacs); }
  type BTString (line 3533) | typedef SStringExpandable<char, 1024, 2> BTString;

FILE: str_util.h
  function hash_string (line 28) | static inline int

FILE: taxonomy.h
  type TaxonomyNode (line 51) | struct TaxonomyNode {
  function getPath (line 151) | void getPath(uint64_t tid, EList<uint64_t>& path) const {
  type TaxonomyTree (line 163) | typedef std::map<uint64_t, TaxonomyNode> TaxonomyTree;
  function initial_tax_rank_num (line 165) | inline static void initial_tax_rank_num() {
  function get_tax_rank_id (line 241) | inline static uint8_t get_tax_rank_id(const char* rank) {
  function get_taxid_at_parent_rank (line 303) | inline static uint64_t get_taxid_at_parent_rank(const TaxonomyTree& tree...
  function read_taxonomy_tree (line 322) | inline static TaxonomyTree read_taxonomy_tree(string taxonomy_fname) {

FILE: third_party/MurmurHash3.cpp
  function rotl32 (line 34) | inline uint32_t rotl32 ( uint32_t x, int8_t r )
  function rotl64 (line 39) | inline uint64_t rotl64 ( uint64_t x, int8_t r )
  function getblock32 (line 55) | FORCE_INLINE uint32_t getblock32 ( const uint32_t * p, int i )
  function getblock64 (line 60) | FORCE_INLINE uint64_t getblock64 ( const uint64_t * p, int i )
  function fmix32 (line 68) | FORCE_INLINE uint32_t fmix32 ( uint32_t h )
  function fmix64 (line 81) | FORCE_INLINE uint64_t fmix64 ( uint64_t k )
  function MurmurHash3_x86_32 (line 94) | void MurmurHash3_x86_32 ( const void * key, int len,
  function MurmurHash3_x86_128 (line 150) | void MurmurHash3_x86_128 ( const void * key, const int len,
  function MurmurHash3_x64_128 (line 255) | void MurmurHash3_x64_128 ( const void * key, const int len,

FILE: third_party/cpuid.h
  function __get_cpuid_max (line 120) | static __inline unsigned int
  function __get_cpuid (line 175) | static __inline int

FILE: threading.h
  type ThreadSafe (line 37) | class ThreadSafe {

FILE: timer.h
  type Timer (line 35) | class Timer {

FILE: tinythread.cpp
  type tthread (line 35) | namespace tthread {
    function _pthread_t_to_ID (line 131) | static thread::id _pthread_t_to_ID(const pthread_t &aHandle)
    type _thread_start_info (line 150) | struct _thread_start_info {

FILE: tinythread.h
  type mutex (line 159) | class mutex {
  type recursive_mutex (line 251) | class recursive_mutex {
  function lock_guard (line 350) | explicit lock_guard(mutex_type &aMutex)
  type condition_variable (line 392) | class condition_variable {
  type thread (line 481) | class thread {

FILE: tokenize.h
  function tokenize (line 54) | void tokenize(const std::string& s, char delim, T& ss) {

FILE: util.h
  function extractIDFromRefName (line 57) | inline
  function to_string (line 72) | string to_string(T value) {
  function find_or_use_default (line 83) | V find_or_use_default(const std::map<K, V>& my_map, const K& query, cons...

FILE: word_io.h
  function writeU32 (line 35) | static inline void writeU32(std::ostream& out, uint32_t x, bool toBigEnd...
  function writeU32 (line 44) | static inline void writeU32(std::ostream& out, uint32_t x) {
  function writeI32 (line 53) | static inline void writeI32(std::ostream& out, int32_t x, bool toBigEndi...
  function writeI32 (line 62) | static inline void writeI32(std::ostream& out, int32_t x) {
  function writeU16 (line 71) | static inline void writeU16(std::ostream& out, uint16_t x, bool toBigEnd...
  function writeU16 (line 80) | static inline void writeU16(std::ostream& out, uint16_t x) {
  function writeI16 (line 89) | static inline void writeI16(std::ostream& out, int16_t x, bool toBigEndi...
  function writeI16 (line 98) | static inline void writeI16(std::ostream& out, int16_t x) {
  function readU32 (line 106) | static inline uint32_t readU32(std::istream& in, bool swap) {
  function readU32 (line 122) | static inline uint32_t readU32(int in, bool swap) {
  function readU32 (line 139) | static inline uint32_t readU32(FILE* in, bool swap) {
  function readI32 (line 156) | static inline int32_t readI32(std::istream& in, bool swap) {
  function readI32 (line 172) | static inline uint32_t readI32(int in, bool swap) {
  function readI32 (line 189) | static inline uint32_t readI32(FILE* in, bool swap) {
  function readU16 (line 206) | static inline uint16_t readU16(std::istream& in, bool swap) {
  function readU16 (line 222) | static inline uint16_t readU16(int in, bool swap) {
  function readU16 (line 239) | static inline uint16_t readU16(FILE* in, bool swap) {
  function readI16 (line 256) | static inline int32_t readI16(std::istream& in, bool swap) {
  function readI16 (line 272) | static inline uint16_t readI16(int in, bool swap) {
  function readI16 (line 289) | static inline uint16_t readI16(FILE* in, bool swap) {
  function readIndex (line 312) | inline index_t readIndex(std::istream& in, bool swap) {
  function readIndex (line 329) | inline index_t readIndex(int in, bool swap) {
  function readIndex (line 355) | inline index_t readIndex(FILE* in, bool swap) {

Condensed preview of 167 files, each showing path, character count, and a content snippet (the full structured content is 3,212K chars).

[
  {
    "path": ".gitignore",
    "chars": 223,
    "preview": "*~\n*.dSYM\n.DS_Store\n*-debug\n*-s\n*-l\ncentrifuge.xcodeproj/project.xcworkspace\ncentrifuge.xcodeproj/xcuserdata\ncentrifuge."
  },
  {
    "path": "AUTHORS",
    "chars": 1301,
    "preview": "Ben Langmead <langmea@cs.jhu.edu> wrote Bowtie 2, which is based partially on\nBowtie.  Bowtie was written by Ben Langmea"
  },
  {
    "path": "LICENSE",
    "chars": 35147,
    "preview": "                    GNU GENERAL PUBLIC LICENSE\n                       Version 3, 29 June 2007\n\n Copyright (C) 2007 Free "
  },
  {
    "path": "MANUAL",
    "chars": 46007,
    "preview": "\nIntroduction\n============\n\nWhat is Centrifuge?\n-----------------\n\n[Centrifuge] is a novel microbial classification engi"
  },
  {
    "path": "MANUAL.markdown",
    "chars": 54775,
    "preview": "\n\n<!--\n ! This manual is written in \"markdown\" format and thus contains some\n ! distracting formatting clutter.  See 'MA"
  },
  {
    "path": "Makefile",
    "chars": 12331,
    "preview": "#\n# Copyright 2014, Daehwan Kim <infphilo@gmail.com>\n#\n# This file is part of Centrifuge, which is copied and modified f"
  },
  {
    "path": "NEWS",
    "chars": 31,
    "preview": "Centrifuge NEWS\n=============\n\n"
  },
  {
    "path": "README.md",
    "chars": 1916,
    "preview": "# Centrifuge\nClassifier for metagenomic sequences\n\n[Centrifuge] is a novel microbial classification engine that enables\n"
  },
  {
    "path": "TUTORIAL",
    "chars": 248,
    "preview": "See section toward end of MANUAL entited \"Getting started with Bowtie 2: Lambda\nphage example\".  Or, for tutorial for la"
  },
  {
    "path": "VERSION",
    "chars": 6,
    "preview": "1.0.4\n"
  },
  {
    "path": "aligner_bt.cpp",
    "chars": 53941,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_bt.h",
    "chars": 30930,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_cache.cpp",
    "chars": 4235,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_cache.h",
    "chars": 26936,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_metrics.h",
    "chars": 11250,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_result.h",
    "chars": 11642,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_seed.cpp",
    "chars": 16565,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_seed.h",
    "chars": 84110,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_seed_policy.cpp",
    "chars": 28296,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_seed_policy.h",
    "chars": 7680,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_sw.cpp",
    "chars": 98940,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_sw.h",
    "chars": 25471,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_sw_common.h",
    "chars": 8712,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_sw_nuc.h",
    "chars": 7213,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_swsse.cpp",
    "chars": 2668,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_swsse.h",
    "chars": 15152,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_swsse_ee_i16.cpp",
    "chars": 63619,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_swsse_ee_u8.cpp",
    "chars": 62716,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_swsse_loc_i16.cpp",
    "chars": 76326,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aligner_swsse_loc_u8.cpp",
    "chars": 75114,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "aln_sink.h",
    "chars": 81401,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "alphabet.cpp",
    "chars": 19320,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "alphabet.h",
    "chars": 5619,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "assert_helpers.h",
    "chars": 9780,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "binary_sa_search.h",
    "chars": 3535,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "bitpack.h",
    "chars": 1520,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "blockwise_sa.h",
    "chars": 38857,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "bt2_idx.cpp",
    "chars": 2196,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "bt2_idx.h",
    "chars": 131622,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "bt2_io.h",
    "chars": 35124,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "bt2_util.h",
    "chars": 6581,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "btypes.h",
    "chars": 1283,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "ccnt_lut.cpp",
    "chars": 1613,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "centrifuge",
    "chars": 25440,
    "preview": "#!/usr/bin/env perl\n\n#\n# Copyright 2014, Daehwan Kim <infphilo@gmail.com>\n#\n# This file is part of Centrifuge, which is "
  },
  {
    "path": "centrifuge-BuildSharedSequence.pl",
    "chars": 12330,
    "preview": "#!/usr/bin/env perl\n\nuse strict ;\nuse Getopt::Long;\nuse File::Basename;\n\nmy $usage = \"perl \".basename($0).\" file_list [-"
  },
  {
    "path": "centrifuge-RemoveEmptySequence.pl",
    "chars": 411,
    "preview": "#!/usr/bin/env perl\n\n# remove the headers with empty sequences. possible introduced by dustmask\n\nuse strict ;\n\n# die \"us"
  },
  {
    "path": "centrifuge-RemoveN.pl",
    "chars": 1011,
    "preview": "#!/usr/bin/env perl\n\nuse strict ;\nuse warnings ;\n\ndie \"usage: a.pl xxx.fa > output.fa\" if ( @ARGV == 0 ) ;\n\nmy $LINE_WID"
  },
  {
    "path": "centrifuge-build",
    "chars": 2315,
    "preview": "#!/usr/bin/env python\n\n\"\"\"\n Copyright 2014, Daehwan Kim <infphilo@gmail.com>\n\n This file is part of Centrifuge, which is"
  },
  {
    "path": "centrifuge-compress.pl",
    "chars": 13309,
    "preview": "#!/usr/bin/env perl\n\n# Read and merge the sequence for the chosen level\n\nuse strict ;\nuse warnings ;\n\nuse threads ;\nuse "
  },
  {
    "path": "centrifuge-download",
    "chars": 12954,
    "preview": "#!/bin/bash\n\nset -eu -o pipefail\n\nexists() {\n  command -v \"$1\" >/dev/null 2>&1\n}\n\ncut_after_first_space_or_second_pipe()"
  },
  {
    "path": "centrifuge-inspect",
    "chars": 1871,
    "preview": "#!/usr/bin/env python\n\n\"\"\"\n Copyright 2014, Daehwan Kim <infphilo@gmail.com>\n\n This file is part of Centrifuge, which is"
  },
  {
    "path": "centrifuge-kreport",
    "chars": 6706,
    "preview": "#!/usr/bin/env perl\n\n# Give a Kraken-style report from a Centrifuge output\n#\n# Based on kraken-report by Derrick Wood\n# "
  },
  {
    "path": "centrifuge-promote",
    "chars": 3588,
    "preview": "#!/usr/bin/env perl\n\nuse strict ;\nuse warnings ;\n\nuse File::Basename;\nuse Cwd;\nuse Cwd 'cwd' ;\nuse Cwd 'abs_path' ;\n\n\ndi"
  },
  {
    "path": "centrifuge-sort-nt.pl",
    "chars": 3219,
    "preview": "#! /usr/bin/env perl\n#\n# Sort nt file sequences according to their taxonomy ID\n# Uses the new mappinf file format availa"
  },
  {
    "path": "centrifuge.cpp",
    "chars": 130381,
    "preview": "/*\n * Copyright 2014, Daehwan Kim <infphilo@gmail.com>\n *\n * This file is part of Centrifuge.\n *\n * Centrifuge is free s"
  },
  {
    "path": "centrifuge.xcodeproj/project.pbxproj",
    "chars": 51405,
    "preview": "// !$*UTF8*$!\n{\n\tarchiveVersion = 1;\n\tclasses = {\n\t};\n\tobjectVersion = 46;\n\tobjects = {\n\n/* Begin PBXBuildFile section *"
  },
  {
    "path": "centrifuge_build.cpp",
    "chars": 28259,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "centrifuge_build_main.cpp",
    "chars": 2096,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "centrifuge_compress.cpp",
    "chars": 59529,
    "preview": "/*\n * Copyright 2015, Daehwan Kim <infphilo@gmail.com>\n *\n * This file is part of Centrifuge.\n *\n * Centrifuge is free s"
  },
  {
    "path": "centrifuge_inspect.cpp",
    "chars": 22986,
    "preview": "/*\n * Copyright 2016\n *\n * This file is part of Centrifuge and based on code from Bowtie 2.\n *\n * Bowtie 2 is free softw"
  },
  {
    "path": "centrifuge_main.cpp",
    "chars": 2042,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "centrifuge_report.cpp",
    "chars": 5792,
    "preview": "/*\n * centrifuge-build.cpp\n *\n *  Created on: Apr 8, 2015\n *      Author: fbreitwieser\n */\n\n#include<iostream>\n#include<"
  },
  {
    "path": "classifier.h",
    "chars": 45033,
    "preview": "/*\n * Copyright 2014, Daehwan Kim <infphilo@gmail.com>\n *\n * This file is part of HISAT.\n *\n * HISAT is free software: y"
  },
  {
    "path": "diff_sample.cpp",
    "chars": 4592,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "diff_sample.h",
    "chars": 30460,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "doc/README",
    "chars": 151,
    "preview": "To populate this directory, change to the bowtie2 directory and type\n'make doc'.  You must have pandoc installed:\n\n  htt"
  },
  {
    "path": "doc/add.css",
    "chars": 1234,
    "preview": ".pageStyle #leftside { \n  color: #666;\n}\n\n.pageStyle #leftside a { \n  color: #0066B3;\n  text-decoration: none;\n}\n\n.pageS"
  },
  {
    "path": "doc/faq.shtml",
    "chars": 1158,
    "preview": "<!--#set var=\"Title\" value=\"Centrifuge\" -->\n<!--#set var=\"NoCrumbs\" value=\"1\" -->\n<!--#set var=\"SubTitle\" value=\"Classif"
  },
  {
    "path": "doc/footer.inc.html",
    "chars": 423,
    "preview": "<div id=\"footer\">\n  <table cellspacing=\"15\" width=\"100%\"><tbody><tr><td>\n   This research was supported in part by NIH g"
  },
  {
    "path": "doc/index.shtml",
    "chars": 5448,
    "preview": "<!--#set var=\"Title\" value=\"Centrifuge\" -->\n<!--#set var=\"NoCrumbs\" value=\"1\" -->\n<!--#set var=\"SubTitle\" value=\"Classif"
  },
  {
    "path": "doc/manual.html",
    "chars": 84288,
    "preview": "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\""
  },
  {
    "path": "doc/manual.inc.html",
    "chars": 63516,
    "preview": "<nav id=\"TOC\">\n<ul>\n<li><a href=\"#introduction\">Introduction</a><ul>\n<li><a href=\"#what-is-centrifuge\">What is Centrifug"
  },
  {
    "path": "doc/manual.inc.html.old",
    "chars": 82391,
    "preview": "<div id=\"TOC\">\n<ul>\n<li><a href=\"#introduction\">Introduction</a><ul>\n<li><a href=\"#what-is-hisat\">What is HISAT?</a></li"
  },
  {
    "path": "doc/manual.shtml",
    "chars": 1068,
    "preview": "<!--#set var=\"Title\" value=\"Centrifuge\" -->\n<!--#set var=\"NoCrumbs\" value=\"1\" -->\n<!--#set var=\"SubTitle\" value=\"Classif"
  },
  {
    "path": "doc/sidebar.inc.shtml",
    "chars": 4629,
    "preview": "<h2>Site Map</h2>\n<div class=\"box\">\n <ul>\n   <li><a href=\"index.shtml\">Home</a></li>\n   <li><a href=\"manual.shtml\">Manua"
  },
  {
    "path": "doc/strip_markdown.pl",
    "chars": 778,
    "preview": "#!/usr/bin/env perl -w\n\n##\n# strip_markdown.pl\n#\n# Used to convert MANUAL.markdown to MANUAL.  Leaves all manual content"
  },
  {
    "path": "doc/style.css",
    "chars": 7942,
    "preview": "/* \nStylesheet for the free sNews15_1 template\nfrom http://www.free-css-templates.com\n*/\n\n/* Reset all margins and paddi"
  },
  {
    "path": "dp_framer.cpp",
    "chars": 37227,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "dp_framer.h",
    "chars": 9315,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "ds.cpp",
    "chars": 3211,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "ds.h",
    "chars": 89674,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "edit.cpp",
    "chars": 12331,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "edit.h",
    "chars": 9664,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "endian_swap.h",
    "chars": 4167,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "evaluation/centrifuge_evaluate.py",
    "chars": 23049,
    "preview": "#!/usr/bin/env python\n\nimport sys, os, subprocess, inspect\nimport platform, multiprocessing\nimport string, re\nfrom datet"
  },
  {
    "path": "evaluation/centrifuge_simulate_reads.py",
    "chars": 32261,
    "preview": "#!/usr/bin/env python\n\n#\n# Copyright 2015, Daehwan Kim <infphilo@gmail.com>\n#\n# This file is part of HISAT 2.\n#\n# HISAT "
  },
  {
    "path": "evaluation/test/abundance.Rmd",
    "chars": 1614,
    "preview": "---\ntitle: \"Centrifuge abundance\"\n# author: \"Daehwan Kim\"\ndate: \"August 15, 2016\"\noutput: html_document\n---\n\n```{r setup"
  },
  {
    "path": "evaluation/test/centrifuge_evaluate_mason.py",
    "chars": 13637,
    "preview": "#!/usr/bin/env python\n\nimport sys, os, subprocess, inspect\nimport platform, multiprocessing\nimport string, re\nfrom datet"
  },
  {
    "path": "example/reads/input.fa",
    "chars": 1032,
    "preview": ">C_1\nGATCCTCCCCAGGCCCCTACACCCAATGTGGAACCGGGGTCCCGAATGAAAATGCTGCTGTTCCCTGGAGGTGTTTTCCT\n>C_2\nGATCCTCCCCAGGCCCCTACACCCAATGT"
  },
  {
    "path": "example/reference/gi_to_tid.dmp",
    "chars": 20,
    "preview": "gi|4\t9646\ngi|7\t9913\n"
  },
  {
    "path": "example/reference/names.dmp",
    "chars": 3994,
    "preview": "1\t|\tall\t|\t\t|\tsynonym\t|\n1\t|\troot\t|\t\t|\tscientific name\t|\n2759\t|\tEucarya\t|\t\t|\tsynonym\t|\n2759\t|\tEucaryotae\t|\t\t|\tsynonym\t|\n27"
  },
  {
    "path": "example/reference/nodes.dmp",
    "chars": 827,
    "preview": "1\t|\t1\t|\tno rank\n2759\t|\t131567\t|\tsuperkingdom\n6072\t|\t33208\t|\tno rank\n7711\t|\t33511\t|\tphylum\n7742\t|\t89593\t|\tno rank\n7776\t|\t"
  },
  {
    "path": "example/reference/test.fa",
    "chars": 1192,
    "preview": ">gi|4|emb|X17276.1| Giant Panda satellite 1 DNA\nGGACGCTCTGCTTTGTTACCAATGAGAAGGGCGCTGAATCCTCGAAAATCCTGACCCTTTTAATTCATGCTC"
  },
  {
    "path": "fast_mutex.h",
    "chars": 6940,
    "preview": "/* -*- mode: c++; tab-width: 2; indent-tabs-mode: nil; -*-\nCopyright (c) 2010-2012 Marcus Geelnard\n\nThis software is pro"
  },
  {
    "path": "filebuf.h",
    "chars": 15921,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "formats.h",
    "chars": 1208,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "functions.sh",
    "chars": 1352,
    "preview": "#!/bin/bash\n\nfunction check_or_mkdir {\n    echo -n \"Creating $1 ... \" >&2\n    if [[ -d $1 && ! -n `find $1 -prune -empty"
  },
  {
    "path": "group_walk.cpp",
    "chars": 759,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "group_walk.h",
    "chars": 37406,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "hi_aligner.h",
    "chars": 31197,
    "preview": "/*\n * Copyright 2014, Daehwan Kim <infphilo@gmail.com>\n *\n * This file is part of HISAT.\n *\n * HISAT is free software: y"
  },
  {
    "path": "hier_idx.h",
    "chars": 69102,
    "preview": "/*\n * Copyright 2013, Daehwan Kim <infphilo@gmail.com>\n *\n * This file is part of Beast.  Beast is based on Bowtie 2.\n *"
  },
  {
    "path": "hier_idx_common.h",
    "chars": 1591,
    "preview": "/*\n * Copyright 2013, Daehwan Kim <infphilo@gmail.com>\n *\n * This file is part of Beast.  Beast is based on Bowtie 2.\n *"
  },
  {
    "path": "hyperloglogbias.h",
    "chars": 72841,
    "preview": "/*\n * hyperloglogbias.h\n *\n *  Created on: Apr 25, 2015\n *      Author: fbreitwieser\n */\n\n#ifndef HYPERLOGLOGBIAS_H_\n#de"
  },
  {
    "path": "hyperloglogplus.h",
    "chars": 19688,
    "preview": "/*\n * hyperloglogplus.h\n *\n * Implementation of HyperLogLog++ algorithm described by Stefan Heule et al.\n *\n *  Created "
  },
  {
    "path": "indices/Makefile",
    "chars": 12887,
    "preview": "#\n# Makefile\n# fbreitwieser, 2016-01-29 13:00\n#\n\nSHELL := /bin/bash\n\nTHREADS?=1\nKEEP_FILES?=0\n\nget_ref_file_names = $(ad"
  },
  {
    "path": "limit.cpp",
    "chars": 1894,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "limit.h",
    "chars": 1307,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "ls.cpp",
    "chars": 3641,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "ls.h",
    "chars": 11954,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "mask.cpp",
    "chars": 1051,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "mask.h",
    "chars": 2170,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "mem_ids.h",
    "chars": 1271,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "mm.h",
    "chars": 1549,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "multikey_qsort.h",
    "chars": 38494,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "opts.h",
    "chars": 7186,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "outq.cpp",
    "chars": 5496,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "outq.h",
    "chars": 3319,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "pat.cpp",
    "chars": 48012,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "pat.h",
    "chars": 44747,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "pe.cpp",
    "chars": 31570,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "pe.h",
    "chars": 9778,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "presets.cpp",
    "chars": 2712,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "presets.h",
    "chars": 1571,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "processor_support.h",
    "chars": 2241,
    "preview": "#ifndef PROCESSOR_SUPPORT_H_\n#define PROCESSOR_SUPPORT_H_\n\n// Utility class ProcessorSupport provides POPCNTenabled() to"
  },
  {
    "path": "qual.cpp",
    "chars": 4018,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "qual.h",
    "chars": 6589,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "random_source.cpp",
    "chars": 3280,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "random_source.h",
    "chars": 5350,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "random_util.cpp",
    "chars": 907,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "random_util.h",
    "chars": 5952,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "read.h",
    "chars": 14297,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "read_qseq.cpp",
    "chars": 8426,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "ref_coord.cpp",
    "chars": 1014,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "ref_coord.h",
    "chars": 10053,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "ref_read.cpp",
    "chars": 9450,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "ref_read.h",
    "chars": 8409,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "reference.cpp",
    "chars": 20853,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "reference.h",
    "chars": 5881,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "scoring.cpp",
    "chars": 9495,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "scoring.h",
    "chars": 16843,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "search_globals.h",
    "chars": 1442,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "sequence_io.h",
    "chars": 3727,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "shmem.cpp",
    "chars": 1341,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "shmem.h",
    "chars": 4974,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "simple_func.cpp",
    "chars": 2353,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "simple_func.h",
    "chars": 3475,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "sse_util.cpp",
    "chars": 979,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "sse_util.h",
    "chars": 14275,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "sstring.cpp",
    "chars": 5475,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "sstring.h",
    "chars": 79601,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "str_util.h",
    "chars": 1149,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "taxonomy.h",
    "chars": 11008,
    "preview": "/*\n * taxonomy.h\n *\n *  Created on: Feb 10, 2016\n *      Author: fbreitwieser\n */\n\n#ifndef TAXONOMY_H_\n#define TAXONOMY_"
  },
  {
    "path": "third_party/MurmurHash3.cpp",
    "chars": 7901,
    "preview": "//-----------------------------------------------------------------------------\n// MurmurHash3 was written by Austin App"
  },
  {
    "path": "third_party/MurmurHash3.h",
    "chars": 1106,
    "preview": "//-----------------------------------------------------------------------------\n// MurmurHash3 was written by Austin App"
  },
  {
    "path": "third_party/cpuid.h",
    "chars": 5589,
    "preview": "/*\n * Copyright (C) 2007, 2008, 2009 Free Software Foundation, Inc.\n *\n * This file is free software; you can redistribu"
  },
  {
    "path": "threading.h",
    "chars": 1381,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "timer.h",
    "chars": 2444,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "tinythread.cpp",
    "chars": 8506,
    "preview": "/* -*- mode: c++; tab-width: 2; indent-tabs-mode: nil; -*-\nCopyright (c) 2010-2012 Marcus Geelnard\n\nThis software is pro"
  },
  {
    "path": "tinythread.h",
    "chars": 21220,
    "preview": "/* -*- mode: c++; tab-width: 2; indent-tabs-mode: nil; -*-\nCopyright (c) 2010-2012 Marcus Geelnard\n\nThis software is pro"
  },
  {
    "path": "tokenize.h",
    "chars": 1757,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "util.h",
    "chars": 2415,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "word_io.h",
    "chars": 7939,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  },
  {
    "path": "zbox.h",
    "chars": 2749,
    "preview": "/*\n * Copyright 2011, Ben Langmead <langmea@cs.jhu.edu>\n *\n * This file is part of Bowtie 2.\n *\n * Bowtie 2 is free soft"
  }
]

// ... and 4 more files (download for full content)

About this extraction

This page contains the full source code of the infphilo/centrifuge GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 167 files (2.8 MB), approximately 750.8k tokens, and a symbol index with 1217 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
