Repository: mhahsler/dbscan
Branch: master
Commit: 111f9bc6a376
Files: 154
Total size: 962.1 KB

Directory structure:
gitextract_jkl9o70t/
├── .Rbuildignore
├── .github/
│   └── .gitignore
├── .gitignore
├── DESCRIPTION
├── LICENSE
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── AAA_dbscan-package.R
│   ├── AAA_definitions.R
│   ├── DBCV_datasets.R
│   ├── DS3.R
│   ├── GLOSH.R
│   ├── LOF.R
│   ├── NN.R
│   ├── RcppExports.R
│   ├── broom-dbscan-tidiers.R
│   ├── comps.R
│   ├── dbcv.R
│   ├── dbscan.R
│   ├── dendrogram.R
│   ├── extractFOSC.R
│   ├── frNN.R
│   ├── hdbscan.R
│   ├── hullplot.R
│   ├── jpclust.R
│   ├── kNN.R
│   ├── kNNdist.R
│   ├── moons.R
│   ├── ncluster.R
│   ├── nobs.R
│   ├── optics.R
│   ├── pointdensity.R
│   ├── predict.R
│   ├── reachability.R
│   ├── sNN.R
│   ├── sNNclust.R
│   ├── utils.R
│   └── zzz.R
├── README.Rmd
├── README.md
├── data/
│   ├── DS3.rdata
│   ├── Dataset_1.rda
│   ├── Dataset_2.rda
│   ├── Dataset_3.rda
│   ├── Dataset_4.rda
│   └── moons.rdata
├── data_src/
│   ├── data_DBCV/
│   │   ├── dataset_1.txt
│   │   ├── dataset_2.txt
│   │   ├── dataset_3.txt
│   │   ├── dataset_4.txt
│   │   ├── read_data.R
│   │   └── test_DBCV.R
│   └── data_chameleon/
│       └── read.R
├── dbscan.Rproj
├── inst/
│   └── CITATION
├── man/
│   ├── DBCV_datasets.Rd
│   ├── DS3.Rd
│   ├── NN.Rd
│   ├── comps.Rd
│   ├── dbcv.Rd
│   ├── dbscan-package.Rd
│   ├── dbscan.Rd
│   ├── dbscan_tidiers.Rd
│   ├── dendrogram.Rd
│   ├── extractFOSC.Rd
│   ├── frNN.Rd
│   ├── glosh.Rd
│   ├── hdbscan.Rd
│   ├── hullplot.Rd
│   ├── jpclust.Rd
│   ├── kNN.Rd
│   ├── kNNdist.Rd
│   ├── lof.Rd
│   ├── moons.Rd
│   ├── ncluster.Rd
│   ├── optics.Rd
│   ├── pointdensity.Rd
│   ├── reachability.Rd
│   ├── sNN.Rd
│   └── sNNclust.Rd
├── src/
│   ├── ANN/
│   │   ├── ANN.cpp
│   │   ├── ANN.h
│   │   ├── ANNperf.h
│   │   ├── ANNx.h
│   │   ├── Copyright.txt
│   │   ├── License.txt
│   │   ├── ReadMe.txt
│   │   ├── bd_fix_rad_search.cpp
│   │   ├── bd_pr_search.cpp
│   │   ├── bd_search.cpp
│   │   ├── bd_tree.cpp
│   │   ├── bd_tree.h
│   │   ├── brute.cpp
│   │   ├── kd_dump.cpp
│   │   ├── kd_fix_rad_search.cpp
│   │   ├── kd_fix_rad_search.h
│   │   ├── kd_pr_search.cpp
│   │   ├── kd_pr_search.h
│   │   ├── kd_search.cpp
│   │   ├── kd_search.h
│   │   ├── kd_split.cpp
│   │   ├── kd_split.h
│   │   ├── kd_tree.cpp
│   │   ├── kd_tree.h
│   │   ├── kd_util.cpp
│   │   ├── kd_util.h
│   │   ├── perf.cpp
│   │   ├── pr_queue.h
│   │   └── pr_queue_k.h
│   ├── JP.cpp
│   ├── Makevars
│   ├── RcppExports.cpp
│   ├── UnionFind.cpp
│   ├── UnionFind.h
│   ├── cleanup.cpp
│   ├── connectedComps.cpp
│   ├── dbcv.cpp
│   ├── dbscan.cpp
│   ├── dendrogram.cpp
│   ├── density.cpp
│   ├── frNN.cpp
│   ├── hdbscan.cpp
│   ├── kNN.cpp
│   ├── kNN.h
│   ├── lof.cpp
│   ├── lt.h
│   ├── mrd.cpp
│   ├── mst.cpp
│   ├── mst.h
│   ├── optics.cpp
│   ├── regionQuery.cpp
│   ├── regionQuery.h
│   ├── utilities.cpp
│   └── utilities.h
├── tests/
│   ├── testthat/
│   │   ├── fixtures/
│   │   │   ├── elki_optics.rda
│   │   │   ├── elki_optics_xi.rda
│   │   │   └── test_data.rda
│   │   ├── test-dbcv.R
│   │   ├── test-dbscan.R
│   │   ├── test-fosc.R
│   │   ├── test-frNN.R
│   │   ├── test-hdbscan.R
│   │   ├── test-kNN.R
│   │   ├── test-kNNdist.R
│   │   ├── test-lof.R
│   │   ├── test-mst.R
│   │   ├── test-optics.R
│   │   ├── test-opticsXi.R
│   │   ├── test-predict.R
│   │   └── test-sNN.R
│   └── testthat.R
└── vignettes/
    ├── dbscan.Rnw
    ├── dbscan.bib
    └── hdbscan.Rmd

================================================
FILE CONTENTS
================================================

================================================
FILE: .Rbuildignore
================================================
proj$
^\.Rproj\.user$
^cran-comments\.md$
^appveyor\.yml$
^revdep$
^.*\.o$
^.*\.Rproj$
^LICENSE
README.Rmd
data_src
ignore
^\.github$

================================================
FILE: .github/.gitignore
================================================
*.html

================================================
FILE: .gitignore
================================================
# Generated files
*.o
*.so

# History files
.Rhistory
.Rapp.history
.RData
*.Rcheck

# Example code in package build process
*-Ex.R

# RStudio files
.Rproj.user/

# produced vignettes
vignettes/*.html
vignettes/*.pdf
.Rproj.user

# OS stuff
.DS*

# Personal work directories
Work
ignore
jss

================================================
FILE: DESCRIPTION
================================================
Package: dbscan
Title: Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
    and Related Algorithms
Version: 1.2.4
Date: 2025-12-18
Authors@R: c(
    person("Michael", "Hahsler", email = "mhahsler@lyle.smu.edu",
      role = c("aut", "cre", "cph"),
      comment = c(ORCID = "0000-0003-2716-1405")),
    person("Matthew", "Piekenbrock", role = c("aut", "cph")),
    person("Sunil", "Arya", role = c("ctb", "cph")),
    person("David", "Mount", role = c("ctb", "cph")),
    person("Claudia", "Malzer", role = "ctb")
  )
Description: A fast reimplementation of several density-based algorithms of
    the DBSCAN family. Includes the clustering algorithms DBSCAN
    (density-based spatial clustering of applications with noise) and
    HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering
    points to identify the clustering structure), shared nearest neighbor
    clustering, and the outlier detection algorithms LOF (local outlier
    factor) and GLOSH (global-local outlier score from hierarchies). The
    implementations use the kd-tree data structure (from library ANN) for
    faster k-nearest neighbor search. An R interface to fast kNN and
    fixed-radius NN search is also provided.
    Hahsler, Piekenbrock and Doran (2019).
License: GPL (>= 2)
URL: https://github.com/mhahsler/dbscan
BugReports: https://github.com/mhahsler/dbscan/issues
Depends: R (>= 3.2.0)
Imports: generics, graphics, Rcpp (>= 1.0.0), stats
Suggests: dendextend, fpc, igraph, knitr, microbenchmark, rmarkdown,
    testthat (>= 3.0.0), tibble
LinkingTo: Rcpp
VignetteBuilder: knitr
Config/testthat/edition: 3
Copyright: ANN library is copyright by University of Maryland, Sunil Arya
    and David Mount. All other code is copyright by Michael Hahsler and
    Matthew Piekenbrock.
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.3

================================================
FILE: LICENSE
================================================
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc.
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The GNU General Public License is a free, copyleft license for software and other kinds of works.

The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. "This License" refers to version 3 of the GNU General Public License. "Copyright" also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. "The Program" refers to any copyrightable work licensed under this License. 
Each licensee is addressed as "you". "Licensees" and "recipients" may be individuals or organizations. To "modify" a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a "modified version" of the earlier work or a work "based on" the earlier work. A "covered work" means either the unmodified Program or a work based on the Program. To "propagate" a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To "convey" a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays "Appropriate Legal Notices" to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The "source code" for a work means the preferred form of the work for making modifications to it. "Object code" means any non-source form of a work. 
A "Standard Interface" means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The "System Libraries" of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A "Major Component", in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The "Corresponding Source" for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. 
All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. 
When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to "keep intact all notices". c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. 
This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an "aggregate" if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. 
b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. 
A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A "User Product" is either (1) a "consumer product", which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, "normally used" refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. "Installation Information" for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. 
But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. "Additional permissions" are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. 
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered "further restrictions" within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. 
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. 
Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An "entity transaction" is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A "contributor" is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. 
The work thus licensed is called the contributor's "contributor version". A contributor's "essential patent claims" are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, "control" includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a "patent license" is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To "grant" such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. 
"Knowingly relying" means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is "discriminatory" if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. 
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License "or any later version" applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. 
If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. 
If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively state the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. {one line to give the program's name and a brief idea of what it does.} Copyright (C) {year} {name of author} This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see . Also add information on how to contact you by electronic and paper mail. If the program does terminal interaction, make it output a short notice like this when it starts in an interactive mode: {project} Copyright (C) {year} {fullname} This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 
This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, your program's commands might be different; for a GUI interface, you would use an "about box". You should also get your employer (if you work as a programmer) or school, if any, to sign a "copyright disclaimer" for the program, if necessary. For more information on this, and how to apply and follow the GNU GPL, see . The GNU General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. But first, please read . ================================================ FILE: NAMESPACE ================================================ # Generated by roxygen2: do not edit by hand S3method(adjacencylist,NN) S3method(adjacencylist,frNN) S3method(adjacencylist,kNN) S3method(as.dendrogram,default) S3method(as.dendrogram,hclust) S3method(as.dendrogram,hdbscan) S3method(as.dendrogram,optics) S3method(as.dendrogram,reachability) S3method(as.reachability,dendrogram) S3method(as.reachability,optics) S3method(augment,dbscan) S3method(augment,general_clustering) S3method(augment,hdbscan) S3method(comps,dist) S3method(comps,frNN) S3method(comps,kNN) S3method(comps,sNN) S3method(glance,dbscan) S3method(glance,general_clustering) S3method(glance,hdbscan) S3method(ncluster,default) S3method(nnoise,default) S3method(nobs,dbscan) S3method(nobs,general_clustering) S3method(nobs,hdbscan) S3method(plot,NN) S3method(plot,hdbscan) S3method(plot,optics) S3method(plot,reachability) S3method(predict,dbscan_fast) S3method(predict,hdbscan) S3method(predict,optics) S3method(print,dbscan_fast) S3method(print,frNN) 
S3method(print,general_clustering) S3method(print,hdbscan) S3method(print,kNN) S3method(print,optics) S3method(print,reachability) S3method(print,sNN) S3method(sort,NN) S3method(sort,frNN) S3method(sort,kNN) S3method(sort,sNN) S3method(tidy,dbscan) S3method(tidy,general_clustering) S3method(tidy,hdbscan) export(adjacencylist) export(as.dendrogram) export(as.reachability) export(augment) export(clplot) export(comps) export(coredist) export(dbcv) export(dbscan) export(extractDBSCAN) export(extractFOSC) export(extractXi) export(frNN) export(glance) export(glosh) export(hdbscan) export(hullplot) export(is.corepoint) export(jpclust) export(kNN) export(kNNdist) export(kNNdistplot) export(lof) export(mrdist) export(ncluster) export(nnoise) export(optics) export(pointdensity) export(sNN) export(sNNclust) export(tidy) import(Rcpp) importFrom(generics,augment) importFrom(generics,glance) importFrom(generics,tidy) importFrom(grDevices,adjustcolor) importFrom(grDevices,chull) importFrom(grDevices,palette) importFrom(graphics,abline) importFrom(graphics,lines) importFrom(graphics,matplot) importFrom(graphics,par) importFrom(graphics,plot) importFrom(graphics,points) importFrom(graphics,polygon) importFrom(graphics,segments) importFrom(graphics,text) importFrom(stats,as.dendrogram) importFrom(stats,dendrapply) importFrom(stats,dist) importFrom(stats,hclust) importFrom(stats,is.leaf) importFrom(stats,nobs) importFrom(stats,prcomp) importFrom(stats,predict) importFrom(utils,tail) useDynLib(dbscan, .registration=TRUE) ================================================ FILE: NEWS.md ================================================ # dbscan 1.2.4 (2025-12-18) ## Bugfixes * dbscan now checks for matrices with 0 rows or 0 columns (reported by maldridgeepa). * Fixed license information for the ANN library header files (reported by Charles Plessy). # dbscan 1.2.3 (2025-08-20) ## Bugfixes * plot.hdbscan gained parameters main, ylab, and leaflab (reported by nhward). 
## Changes * Fixed partial argument matches. # dbscan 1.2.2 (2025-01-24) ## Changes * Removed dependence on the /bits/stdc++.h header. # dbscan 1.2.1 (2025-01-23) ## Changes * Various refactoring by m-muecke. ## New Features * HDBSCAN gained parameter cluster_selection_epsilon to implement clusters selected from Malzer and Baum (2020). * Functions ncluster() and nnoise() were added. * hullplot() now marks noise as x. * Added clplot(). * pointdensity now also accepts a dist object as input and has the new type "gaussian" to calculate a Gaussian kernel estimate. * Added the DBCV index. ## Bugfixes * extractFOSC: Fixed total_score. * Rewrote minimal spanning tree code. # dbscan 1.2-0 (2024-06-28) ## New Features * dbscan now has tidymodels tidiers (glance, tidy, augment). * kNNdistplot can now plot a range of k/minPts values. * added stats::nobs methods for the clusterings. * kNN and frNN now contain the used distance metric. ## Changes * dbscan component dist was renamed to metric. * Removed redundant sort in kNNdistplot (reported by Natasza Szczypien). * Refactoring: use anyNA(x) instead of any(is.na(x)) and many more (by m-muecke). * Reorganized the C++ source code. * README now uses bibtex. * Tests now use testthat edition 3 (m-muecke). # dbscan 1.1-12 (2023-11-28) ## Bugfixes * pointdensity now checks for missing values (reported by soelderer). * Removed C++11 specification. * ANN.cpp: fixed Rprintf warning. # dbscan 1.1-11 (2022-10-26) ## New Features * kNNdistplot gained parameter minPts. * dbscan now retains information on distance method and border points. * HDBSCAN now supports long vectors to work with larger distance matrices. * conversion from dist to kNN and frNN is now more memory efficient. It no longer coerces the dist object into a matrix of double the size, but extracts the distances directly from the dist object. * Better description of how predict uses only Euclidean distances and more error checking. 
* The package now exports a new generic for as.dendrogram(). ## Bugfixes * is.corepoint() now uses the correct epsilon value (reported by Eng Aun). * functions now check for cluster::dissimilarity objects which have class dist but missing attributes. # dbscan 1.1-10 (2022-01-14) ## New Features * is.corepoint() for DBSCAN. * coredist() and mrdist() for HDBSCAN. * find connected components with comps(). ## Changes * reachability plot now shows all undefined distances as a dashed line. ## Bugfixes * memory leak in mrd calculation fixed. # dbscan 1.1-9 (2022-01-10) ## Changes * We now use roxygen2. ## New Features * Added predict for hdbscan (as suggested by moredatapls). # dbscan 1.1-8 (2021-04-26) ## Bugfixes * LOF: fixed numerical issues with k-nearest neighbor distance on Solaris. # dbscan 1.1-7 (2021-04-21) ## Bugfixes * Fixed description of k in kNNdistplot and added minPts argument. * Fixed bug for tied distances in lof (reported by sverchkov). ## Changes * lof: the density parameter was changed to minPts to be consistent with the original paper and dbscan. Note that minPts = k + 1. # dbscan 1.1-6 (2021-02-24) ## Improvements * Improved speed of LOF for large ks (following suggestions by eduardokapp). * kNN: results are now not sorted again for kd-tree queries which is much faster (by a factor of 10). * ANN library: annclose() is now only called once when the package is unloaded. This is in preparation to support persistent kd-trees using external pointers. * hdbscan lost parameter xdist. ## Bugfixes * removed dependence on methods. * fixed problem in hullplot for singleton clusters (reported by Fernando Archuby). * GLOSH now also accepts data.frames. * GLOSH now returns 0 instead of NaN if we have k duplicate points in the data. # dbscan 1.1-5 (2019-10-22) ## New Features * kNN and frNN gained parameter query to query neighbors for points not in the data. 
* sNN gained parameter jp to decide if the shared NN should be counted using the definition by Jarvis and Patrick. # dbscan 1.1-4 (2019-08-05) ## New Features * kNNdist gained parameter all to indicate if a matrix with the distance to all nearest neighbors up to k should be returned. ## Bugfixes * kNNdist now correctly returns the distances to the kth neighbor (reported by zschuster). * dbscan: check eps and minPts parameters to avoid undefined results (reported by ArthurPERE). # dbscan 1.1-3 (2018-11-12) ## Bugfixes * pointdensity was double counting the query point (reported by Marius Hofert). # dbscan 1.1-2 (2018-05-18) ## New Features * OPTICS now calculates eps if it is omitted. ## Bugfixes * Example now only uses igraph conditionally since it is unavailable on Solaris (reported by B. Ripley). # dbscan 1.1-1 (2017-03-19) ## Bugfixes * Fixed problem with constant name on Solaris in ANN code (reported by B. Ripley). # dbscan 1.1-0 (2017-03-18) ## New Features * HDBSCAN was added. * extractFOSC (optimal selection of clusters for HDBSCAN) was added. * GLOSH outlier score was added. * hullplot now uses filled polygons as the default. * hullplot now uses PCA if the data has more than 2 dimensions. * Added NN superclass for kNN and frNN with plot() and adjacencylist(). * Added shared nearest neighbor clustering as sNNclust() and sNN to calculate the number of shared nearest neighbors. * Added pointdensity function. * Unsorted kNN and frNN can now be sorted using sort(). * kNN and frNN now also accept kNN and frNN objects, respectively. This can be used to create a new kNN (frNN) with a reduced k or eps. * Datasets added: DS3 and moons. ## Interface Changes * Improved interface for dbscan() and optics(): ... is now passed on to frNN. * OPTICS clustering extraction methods are now called extractDBSCAN and extractXi. * kNN and frNN are now objects with a print function. * dbscan now also accepts a frNN object as input. 
* jpclust and sNNclust now return a list instead of just the cluster assignments. # dbscan 1.0-0 (2017-02-02) ## New Features * The package now has a vignette. * Jarvis-Patrick clustering is now available as jpclust(). * Improved interface for dbscan() and optics(): ... is now passed on to frNN. * OPTICS clustering extraction methods are now called extractDBSCAN and extractXi. * hullplot now uses filled polygons as the default. * hullplot now uses PCA if the data has more than 2 dimensions. * kNN and frNN are now objects with a print function. * dbscan now also accepts a frNN object as input. # dbscan 0.9-8 (2016-08-05) ## New Features * Added hullplot to plot a scatter plot with added convex cluster hulls. * OPTICS: added a predecessor correction step that is used by the ELKI implementation (Matt Piekenbrock). ## Bugfixes * Fixed a memory problem in frNN (reported by Yilei He). # dbscan 0.9-7 (2016-04-14) * OPTICSXi is now implemented (thanks to Matt Piekenbrock). * DBSCAN now also accepts MinPts (with a capital M) to be compatible with the fpc version. * DBSCAN objects are now also of class dbscan_fast to avoid clashes with fpc. * DBSCAN and OPTICS now have predict functions. * Added test for unhandled NAs. * Fixed LOF for more than k duplicate points (reported by Samneet Singh). # dbscan 0.9-6 (2015-12-14) * OPTICS: fixed second bug reported by Di Pang. * all methods now also accept dist objects and have a search method "dist" which precomputes distances. # dbscan 0.9-5 (2015-10-04) * OPTICS: fixed bug with first observation reported by Di Pang. * OPTICS: clusterings can now be extracted using optics_cut. # dbscan 0.9-4 (2015-09-17) * added tests (testthat). * input data is now checked if it can safely be coerced into a numeric matrix (storage.mode double). * fixed self matches in kNN and frNN (now returns the first NN correctly). # dbscan 0.9-3 (2015-09-02) * Added weights to DBSCAN. # dbscan 0.9-2 (2015-08-11) * Added kNN interface. 
* Added frNN (fixed radius NN) interface. * Added LOF. * Added OPTICS. * All algorithms check now for interrupt (CTRL-C/Esc). * DBSCAN now returns a list instead of a numeric vector. # dbscan 0.9-1 (2015-07-21) * DBSCAN: Improved speed by avoiding repeated sorting of point ids. * Added linear NN search option. * Added fast calculation for kNN distance. * fpc and microbenchmark are now used conditionally in the examples. # dbscan 0.9-0 (2015-07-15) * initial release ================================================ FILE: R/AAA_dbscan-package.R ================================================ #' @keywords internal #' #' @section Key functions: #' - Clustering: [dbscan()], [hdbscan()], [optics()], [jpclust()], [sNNclust()] #' - Outliers: [lof()], [glosh()], [pointdensity()] #' - Nearest Neighbors: [kNN()], [frNN()], [sNN()] #' #' @references #' Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. Journal of Statistical Software, 91(1), 1-30. \doi{10.18637/jss.v091.i01} #' #' @import Rcpp #' @importFrom graphics plot points lines text abline polygon par segments matplot #' @importFrom grDevices palette chull adjustcolor #' @importFrom stats dist hclust dendrapply as.dendrogram is.leaf prcomp #' @importFrom utils tail #' #' @useDynLib dbscan, .registration=TRUE "_PACKAGE" ================================================ FILE: R/AAA_definitions.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. 
# # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. .ANNsplitRule <- c("STD", "MIDPT", "FAIR", "SL_MIDPT", "SL_FAIR", "SUGGEST") .matrixlike <- function(x) { if (is.null(dim(x))) return(FALSE) # check that there is at least one row and one column! if (nrow(x) < 1L) stop("the provided data has 0 rows!") if (ncol(x) < 1L) stop("the provided data has 0 columns!") TRUE } ================================================ FILE: R/DBCV_datasets.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' DBCV Paper Datasets #' #' The four synthetic 2D datasets used in Moulavi et al (2014). 
#' #' @name DBCV_datasets #' @aliases Dataset_1 Dataset_2 Dataset_3 Dataset_4 #' @docType data #' @format Four data frames with the following 3 variables. #' \describe{ #' \item{x}{a numeric vector} #' \item{y}{a numeric vector} #' \item{class}{an integer vector indicating the class label. 0 means noise.} } #' @references Davoud Moulavi and Pablo A. Jaskowiak and #' Ricardo J. G. B. Campello and Arthur Zimek and Jörg Sander (2014). #' Density-Based Clustering Validation. In #' _Proceedings of the 2014 SIAM International Conference on Data Mining,_ #' pages 839-847 #' \doi{10.1137/1.9781611973440.96} #' @source https://github.com/pajaskowiak/dbcv #' @keywords datasets #' @examples #' data("Dataset_1") #' clplot(Dataset_1[, c("x", "y")], cl = Dataset_1$class) #' #' data("Dataset_2") #' clplot(Dataset_2[, c("x", "y")], cl = Dataset_2$class) #' #' data("Dataset_3") #' clplot(Dataset_3[, c("x", "y")], cl = Dataset_3$class) #' #' data("Dataset_4") #' clplot(Dataset_4[, c("x", "y")], cl = Dataset_4$class) NULL ================================================ FILE: R/DS3.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' DS3: Spatial data with arbitrary shapes #' #' Contains 8000 2-d points, with 6 "natural" looking shapes, all of which have #' a sinusoid-like shape that intersects with each cluster. #' The data set was originally used as a benchmark data set for the Chameleon clustering #' algorithm (Karypis, Han and Kumar, 1999) to #' illustrate a data set containing arbitrarily shaped #' spatial data surrounded by both noise and artifacts. #' #' @name DS3 #' @docType data #' @format A data.frame with 8000 observations on the following 2 columns: #' \describe{ #' \item{X}{a numeric vector} #' \item{Y}{a numeric vector} #' } #' #' @references Karypis, George, Eui-Hong Han, and Vipin Kumar (1999). #' Chameleon: Hierarchical clustering using dynamic modeling. _Computer_ #' 32(8): 68-75. #' @source Obtained from \url{http://cs.joensuu.fi/sipu/datasets/} #' @keywords datasets #' @examples #' data(DS3) #' plot(DS3, pch = 20, cex = 0.25) NULL ================================================ FILE: R/GLOSH.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler, Matthew Piekenbrock # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
# # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Global-Local Outlier Score from Hierarchies #' #' Calculate the Global-Local Outlier Score from Hierarchies (GLOSH) score for #' each data point using a kd-tree to speed up kNN search. #' #' GLOSH compares the density of a point to the densities of the points associated #' with its current and child clusters (if any). Points that have a substantially #' lower density than the density mode (cluster) they most associate with are #' considered outliers. GLOSH is computed from a hierarchy of clusters. #' #' Specifically, consider a point \emph{x} and a density or distance threshold #' \emph{lambda}. GLOSH is calculated by taking 1 minus the ratio of how long #' any of the child clusters of the cluster \emph{x} belongs to "survives" #' changes in \emph{lambda} to the highest \emph{lambda} threshold of x, above #' which x becomes a noise point. #' #' Scores close to 1 indicate outliers. For more details on the motivation for #' this calculation, see Campello et al (2015). #' #' @aliases glosh GLOSH #' @family Outlier Detection Functions #' #' @param x an [hclust] object, data matrix, or [dist] object. #' @param k size of the neighborhood. #' @param ... further arguments are passed on to [kNN()]. #' @return A numeric vector of length equal to the size of the original data #' set containing GLOSH values for all data points. #' @author Matt Piekenbrock #' #' @references Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg #' Sander. Hierarchical density estimates for data clustering, visualization, #' and outlier detection. _ACM Transactions on Knowledge Discovery from Data #' (TKDD)_ 10, no. 1 (2015). 
#' \doi{10.1145/2733381} #' @keywords model #' @examples #' set.seed(665544) #' n <- 100 #' x <- cbind( #'   x=runif(10, 0, 5) + rnorm(n, sd = 0.4), #'   y=runif(10, 0, 5) + rnorm(n, sd = 0.4) #' ) #' #' ### calculate GLOSH score #' glosh <- glosh(x, k = 3) #' #' ### distribution of outlier scores #' summary(glosh) #' hist(glosh, breaks = 10) #' #' ### simple plot function; point size is proportional to the GLOSH score #' plot_glosh <- function(x, glosh){ #'   plot(x, pch = ".", main = "GLOSH (k = 3)") #'   points(x, cex = glosh*3, pch = 1, col = "red") #'   text(x[glosh > 0.80, ], labels = round(glosh, 3)[glosh > 0.80], pos = 3) #' } #' plot_glosh(x, glosh) #' #' ### GLOSH with any hierarchy #' x_dist <- dist(x) #' x_sl <- hclust(x_dist, method = "single") #' x_upgma <- hclust(x_dist, method = "average") #' x_ward <- hclust(x_dist, method = "ward.D2") #' #' ## Compare what different linkage criteria consider as outliers #' glosh_sl <- glosh(x_sl, k = 3) #' plot_glosh(x, glosh_sl) #' #' glosh_upgma <- glosh(x_upgma, k = 3) #' plot_glosh(x, glosh_upgma) #' #' glosh_ward <- glosh(x_ward, k = 3) #' plot_glosh(x, glosh_ward) #' #' ## GLOSH is automatically computed with HDBSCAN #' all(hdbscan(x, minPts = 3)$outlier_scores == glosh(x, k = 3)) #' @export glosh <- function(x, k = 4, ...) { if (inherits(x, "data.frame")) x <- as.matrix(x) # get n if (inherits(x, "dist") || inherits(x, "matrix")) { if (inherits(x, "dist")) n <- attr(x, "Size") else n <- nrow(x) # get k nearest neighbors + distances d <- kNN(x, k - 1, ...) x_dist <- if (inherits(x, "dist")) x else dist(x, method = "euclidean") # copy since mrd changes by reference! 
.check_dist(x_dist) mrd <- mrd(x_dist, d$dist[, k - 1]) # need to assemble hclust object manually mst <- mst(mrd, n) hc <- hclustMergeOrder(mst, order(mst[, 3])) } else if (inherits(x, "hclust")) { hc <- x n <- nrow(hc$merge) + 1 } else stop("x needs to be a matrix, dist, or hclust object!") if (k < 2 || k >= n) stop("k has to be larger than 1 and smaller than the number of points") res <- computeStability(hc, k, compute_glosh = TRUE) # return attr(res, "glosh") } ================================================ FILE: R/LOF.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Local Outlier Factor Score #' #' Calculate the Local Outlier Factor (LOF) score for each data point using a #' kd-tree to speed up kNN search. #' #' LOF compares the local reachability density (lrd) of a point to the lrd of #' its neighbors. A LOF score of approximately 1 indicates that the lrd around #' the point is comparable to the lrd of its neighbors and that the point is #' not an outlier. 
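The glosh() code above builds its single-linkage hierarchy on the mutual reachability distance (computed in C++ by mrd()). A plain-R sketch of that distance can make the idea concrete; mrd_matrix() and its exact core-distance convention (distance to the k-th nearest neighbor) are illustrative assumptions, not the package's API:

```r
# Mutual reachability distance, as used by glosh()/hdbscan():
#   mrd(a, b) = max(core_k(a), core_k(b), d(a, b))
# where core_k(a) is the distance from a to its k-th nearest neighbor.
# Plain-R illustration only; the package computes this in C++.
mrd_matrix <- function(x, k = 4) {
  d <- as.matrix(dist(x))
  # core distance: k-th nearest neighbor is at position k + 1 in the
  # sorted row because the row also contains the 0 self-distance
  core <- apply(d, 1, function(row) sort(row)[k + 1])
  # lift each pairwise distance to at least both core distances
  m <- pmax(d, outer(core, core, pmax))
  diag(m) <- 0
  m
}
```

Because every entry is at least both points' core distances, a single-linkage tree over this matrix separates density levels, which is what the stability computation in glosh() relies on.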
Points that have a substantially lower lrd than their #' neighbors are considered outliers and produce scores significantly larger #' than 1. #' #' If a data matrix is specified, then Euclidean distances and fast nearest #' neighbor search using a kd-tree are used. #' #' **Note on duplicate points:** If there are more than `minPts` #' duplicates of a point in the data, then the local reachability distance #' will be 0, resulting in an undefined LOF score of 0/0. We set LOF in this #' case to 1 since there is already enough density from the points in the same #' location to make them not outliers. The original paper by Breunig et al #' (2000) assumes that the points are real duplicates and suggests removing #' the duplicates before computing LOF. If duplicate points are removed first, #' then this LOF implementation in \pkg{dbscan} behaves like the one described #' by Breunig et al. #' #' @aliases lof LOF #' @family Outlier Detection Functions #' #' @param x a data matrix or a [dist] object. #' @param minPts number of nearest neighbors used in defining the local #' neighborhood of a point (includes the point itself). #' @param ... further arguments are passed on to [kNN()]. #' Note: `sort` cannot be specified here since `lof()` #' always uses `sort = TRUE`. #' #' @return A numeric vector of length `nrow(x)` containing LOF values for #' all data points. #' #' @author Michael Hahsler #' @references Breunig, M., Kriegel, H., Ng, R., and Sander, J. (2000). LOF: #' identifying density-based local outliers. In _ACM Int. Conf. on #' Management of Data,_ pages 93-104. 
#' \doi{10.1145/335191.335388} #' @keywords model #' @examples #' set.seed(665544) #' n <- 100 #' x <- cbind( #'   x=runif(10, 0, 5) + rnorm(n, sd = 0.4), #'   y=runif(10, 0, 5) + rnorm(n, sd = 0.4) #' ) #' #' ### calculate LOF score with a neighborhood of 3 points #' lof <- lof(x, minPts = 3) #' #' ### distribution of outlier factors #' summary(lof) #' hist(lof, breaks = 10, main = "LOF (minPts = 3)") #' #' ### plot sorted lof. Looks like outliers start around a LOF of 2. #' plot(sort(lof), type = "l", main = "LOF (minPts = 3)", #'   xlab = "Points sorted by LOF", ylab = "LOF") #' #' ### point size is proportional to LOF and mark points with a LOF > 2 #' plot(x, pch = ".", main = "LOF (minPts = 3)", asp = 1) #' points(x, cex = (lof - 1) * 2, pch = 1, col = "red") #' text(x[lof > 2,], labels = round(lof, 1)[lof > 2], pos = 3) #' @export lof <- function(x, minPts = 5, ...) { ### parse extra parameters extra <- list(...) # check for deprecated k if (!is.null(extra[["k"]])) { minPts <- extra[["k"]] + 1 extra[["k"]] <- NULL warning("lof: k is now deprecated. 
use minPts = ", minPts, " instead.") } args <- c("search", "bucketSize", "splitRule", "approx") m <- pmatch(names(extra), args) if (anyNA(m)) stop("Unknown parameter: ", toString(names(extra)[is.na(m)])) names(extra) <- args[m] search <- extra$search %||% "kdtree" search <- .parse_search(search) splitRule <- extra$splitRule %||% "suggest" splitRule <- .parse_splitRule(splitRule) bucketSize <- if (is.null(extra$bucketSize)) 10L else as.integer(extra$bucketSize) approx <- if (is.null(extra$approx)) 0 else as.double(extra$approx) ### precompute distance matrix for dist search if (search == 3 && !inherits(x, "dist")) { if (.matrixlike(x)) x <- dist(x) else stop("x needs to be a matrix to calculate distances") } # get and check n if (inherits(x, "dist")) n <- attr(x, "Size") else n <- nrow(x) if (is.null(n)) stop("x needs to be a matrix or a dist object!") if (minPts < 2 || minPts > n) stop("minPts has to be at least 2 and not larger than the number of points") ### get LOF from a dist object if (inherits(x, "dist")) { if (anyNA(x)) stop("NAs not allowed in dist for LOF!") # find k-NN distance, ids and distances x <- as.matrix(x) diag(x) <- Inf ### no self-matches o <- t(apply(x, 1, order, decreasing = FALSE)) k_dist <- x[cbind(o[, minPts - 1], seq_len(n))] ids <- lapply( seq_len(n), FUN = function(i) which(x[i,] <= k_dist[i]) ) dist <- lapply( seq_len(n), FUN = function(i) x[i, x[i,] <= k_dist[i]] ) ret <- list(k_dist = k_dist, ids = ids, dist = dist) } else { ### Use kd-tree if (anyNA(x)) stop("NAs not allowed for LOF using kdtree!") ret <- lof_kNN( as.matrix(x), as.integer(minPts), as.integer(search), as.integer(bucketSize), as.integer(splitRule), as.double(approx) ) } # calculate local reachability density (LRD) # reachability-distance_k(A,B) = max{k-distance(B), d(A,B)} # lrd_k(A) = 1/(sum_B \in N_k(A) reachability-distance_k(A, B) / |N_k(A)|) lrd <- numeric(n) for (A in seq_len(n)) { Bs <- ret$ids[[A]] lrd[A] <- 1 / (sum(pmax.int(ret$k_dist[Bs], ret$dist[[A]])) / 
length(Bs)) } # calculate local outlier factor (LOF) # LOF_k(A) = sum_B \in N_k(A) lrd_k(B)/(|N_k(A)| lrd_k(A)) lof <- numeric(n) for (A in seq_len(n)) { Bs <- ret$ids[[A]] lof[A] <- sum(lrd[Bs]) / length(Bs) / lrd[A] } # with more than k duplicates lrd can become infinity # we define them not to be outliers lof[is.nan(lof)] <- 1 lof } ================================================ FILE: R/NN.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' NN --- Nearest Neighbors Superclass #' #' NN is an abstract S3 superclass for the classes of the objects returned #' by [kNN()], [frNN()] and [sNN()]. Methods for sorting, plotting and getting an #' adjacency list are defined. #' #' @name NN #' @aliases NN #' @family NN functions #' #' @param x an `NN` object #' @param pch plotting character. #' @param col color used for the data points (nodes). #' @param linecol color used for edges. #' @param ... further parameters passed on to [plot()]. #' @param decreasing sort in decreasing order? 
#' @param data the data that was used to create `x` #' @param main title #' #' @section Subclasses: #' [kNN], [frNN] and [sNN] #' #' @author Michael Hahsler #' @keywords model #' @examples #' data(iris) #' x <- iris[, -5] #' #' # finding kNN directly in data (using a kd-tree) #' nn <- kNN(x, k=5) #' nn #' #' # plot the kNN where NN are shown as lines connecting points. #' plot(nn, x) #' #' # show the first few elements of the adjacency list #' head(adjacencylist(nn)) #' #' \dontrun{ #' # create a graph and find connected components (if igraph is installed) #' library("igraph") #' g <- graph_from_adj_list(adjacencylist(nn)) #' comp <- components(g) #' plot(x, col = comp$membership) #' #' # detect clusters (communities) with the label propagation algorithm #' cl <- membership(cluster_label_prop(g)) #' plot(x, col = cl) #' } NULL #' @rdname NN #' @export adjacencylist <- function (x, ...) UseMethod("adjacencylist", x) #' @rdname NN #' @export adjacencylist.NN <- function (x, ...) { stop("needs to be implemented by a subclass") } #' @rdname NN #' @export sort.NN <- function(x, decreasing = FALSE, ...) { stop("needs to be implemented by a subclass") } #' @rdname NN #' @export plot.NN <- function(x, data, main = NULL, pch = 16, col = NULL, linecol = "gray", ...) { if (is.null(main)) { if (inherits(x, "frNN")) main <- paste0("frNN graph (eps = ", x$eps, ")") if (inherits(x, "kNN")) main <- paste0(x$k, "-NN graph") if (inherits(x, "sNN")) main <- paste0("Shared NN graph (k=", x$k, ifelse(is.null(x$kt), "", paste0(", kt=", x$kt)), ")") } ## create an empty plot plot(data[, 1:2], main = main, type = "n", pch = pch, col = col, ...) id <- adjacencylist(x) ## use lines if it is from the same data ## FIXME: this test is not perfect, maybe we should have a parameter here or add the query points... if (length(id) == nrow(data)) { for (i in seq_along(id)) { for (j in seq_along(id[[i]])) lines(x = c(data[i, 1], data[id[[i]][j], 1]), y = c(data[i, 2], data[id[[i]][j], 2]), col = linecol, ...) 
} ## add vertices points(data[, 1:2], main = main, pch = pch, col = col, ...) } else { ## add vertices points(data[, 1:2], main = main, pch = pch, ...) ## use colors if it was from a query for (i in seq_along(id)) { points(data[id[[i]], ], pch = pch, col = i + 1L) } } } ================================================ FILE: R/RcppExports.R ================================================ # Generated by using Rcpp::compileAttributes() -> do not edit by hand # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393 JP_int <- function(nn, kt) { .Call(`_dbscan_JP_int`, nn, kt) } SNN_sim_int <- function(nn, jp) { .Call(`_dbscan_SNN_sim_int`, nn, jp) } ANN_cleanup <- function() { invisible(.Call(`_dbscan_ANN_cleanup`)) } comps_kNN <- function(nn, mutual) { .Call(`_dbscan_comps_kNN`, nn, mutual) } comps_frNN <- function(nn, mutual) { .Call(`_dbscan_comps_frNN`, nn, mutual) } intToStr <- function(iv) { .Call(`_dbscan_intToStr`, iv) } dist_subset <- function(dist, idx) { .Call(`_dbscan_dist_subset`, dist, idx) } XOR <- function(lhs, rhs) { .Call(`_dbscan_XOR`, lhs, rhs) } dspc <- function(cl_idx, internal_nodes, all_cl_ids, mrd_dist) { .Call(`_dbscan_dspc`, cl_idx, internal_nodes, all_cl_ids, mrd_dist) } dbscan_int <- function(data, eps, minPts, weights, borderPoints, type, bucketSize, splitRule, approx, frNN) { .Call(`_dbscan_dbscan_int`, data, eps, minPts, weights, borderPoints, type, bucketSize, splitRule, approx, frNN) } reach_to_dendrogram <- function(reachability, pl_order) { .Call(`_dbscan_reach_to_dendrogram`, reachability, pl_order) } dendrogram_to_reach <- function(x) { .Call(`_dbscan_dendrogram_to_reach`, x) } mst_to_dendrogram <- function(mst) { .Call(`_dbscan_mst_to_dendrogram`, mst) } dbscan_density_int <- function(data, eps, type, bucketSize, splitRule, approx) { .Call(`_dbscan_dbscan_density_int`, data, eps, type, bucketSize, splitRule, approx) } frNN_int <- function(data, eps, type, bucketSize, splitRule, approx) { .Call(`_dbscan_frNN_int`, data, eps, type, 
bucketSize, splitRule, approx) } frNN_query_int <- function(data, query, eps, type, bucketSize, splitRule, approx) { .Call(`_dbscan_frNN_query_int`, data, query, eps, type, bucketSize, splitRule, approx) } distToAdjacency <- function(constraints, N) { .Call(`_dbscan_distToAdjacency`, constraints, N) } buildDendrogram <- function(hcl) { .Call(`_dbscan_buildDendrogram`, hcl) } all_children <- function(hier, key, leaves_only = FALSE) { .Call(`_dbscan_all_children`, hier, key, leaves_only) } node_xy <- function(cl_tree, cl_hierarchy, cid = 0L) { .Call(`_dbscan_node_xy`, cl_tree, cl_hierarchy, cid) } simplifiedTree <- function(cl_tree) { .Call(`_dbscan_simplifiedTree`, cl_tree) } computeStability <- function(hcl, minPts, compute_glosh = FALSE) { .Call(`_dbscan_computeStability`, hcl, minPts, compute_glosh) } validateConstraintList <- function(constraints, n) { .Call(`_dbscan_validateConstraintList`, constraints, n) } computeVirtualNode <- function(noise, constraints) { .Call(`_dbscan_computeVirtualNode`, noise, constraints) } fosc <- function(cl_tree, cid, sc, cl_hierarchy, prune_unstable_leaves = FALSE, cluster_selection_epsilon = 0.0, alpha = 0, useVirtual = FALSE, n_constraints = 0L, constraints = NULL) { .Call(`_dbscan_fosc`, cl_tree, cid, sc, cl_hierarchy, prune_unstable_leaves, cluster_selection_epsilon, alpha, useVirtual, n_constraints, constraints) } extractUnsupervised <- function(cl_tree, prune_unstable = FALSE, cluster_selection_epsilon = 0.0) { .Call(`_dbscan_extractUnsupervised`, cl_tree, prune_unstable, cluster_selection_epsilon) } extractSemiSupervised <- function(cl_tree, constraints, alpha = 0, prune_unstable_leaves = FALSE, cluster_selection_epsilon = 0.0) { .Call(`_dbscan_extractSemiSupervised`, cl_tree, constraints, alpha, prune_unstable_leaves, cluster_selection_epsilon) } kNN_query_int <- function(data, query, k, type, bucketSize, splitRule, approx) { .Call(`_dbscan_kNN_query_int`, data, query, k, type, bucketSize, splitRule, approx) } kNN_int <- 
function(data, k, type, bucketSize, splitRule, approx) { .Call(`_dbscan_kNN_int`, data, k, type, bucketSize, splitRule, approx) } lof_kNN <- function(data, minPts, type, bucketSize, splitRule, approx) { .Call(`_dbscan_lof_kNN`, data, minPts, type, bucketSize, splitRule, approx) } mrd <- function(dm, cd) { .Call(`_dbscan_mrd`, dm, cd) } mst <- function(x_dist, n) { .Call(`_dbscan_mst`, x_dist, n) } hclustMergeOrder <- function(mst, o) { .Call(`_dbscan_hclustMergeOrder`, mst, o) } optics_int <- function(data, eps, minPts, type, bucketSize, splitRule, approx, frNN) { .Call(`_dbscan_optics_int`, data, eps, minPts, type, bucketSize, splitRule, approx, frNN) } lowerTri <- function(m) { .Call(`_dbscan_lowerTri`, m) } ================================================ FILE: R/broom-dbscan-tidiers.R ================================================ #' Turn a dbscan clustering object into a tidy tibble #' #' Provides [tidy()][generics::tidy()], [augment()][generics::augment()], and #' [glance()][generics::glance()] verbs for clusterings created with algorithms #' in package `dbscan` to work with [tidymodels](https://www.tidymodels.org/). #' #' @param x A `dbscan` object returned from [dbscan::dbscan()]. #' @param data The data used to create the clustering. #' @param newdata New data to predict cluster labels for. #' @param ... further arguments are ignored without a warning. 
#' #' @name dbscan_tidiers #' @aliases dbscan_tidiers glance tidy augment #' @family tidiers #' #' @seealso [generics::tidy()], [generics::augment()], #' [generics::glance()], [dbscan()] #' #' @examplesIf requireNamespace("tibble", quietly = TRUE) && identical(Sys.getenv("NOT_CRAN"), "true") #' #' data(iris) #' x <- scale(iris[, 1:4]) #' #' ## dbscan #' db <- dbscan(x, eps = .9, minPts = 5) #' db #' #' # summarize model fit with tidiers #' tidy(db) #' glance(db) #' #' # augment for this model needs the original data #' augment(db, x) #' #' # to augment new data, the original data is also needed #' augment(db, x, newdata = x[1:5, ]) #' #' ## hdbscan #' hdb <- hdbscan(x, minPts = 5) #' #' # summarize model fit with tidiers #' tidy(hdb) #' glance(hdb) #' #' # augment for this model needs the original data #' augment(hdb, x) #' #' # to augment new data, the original data is also needed #' augment(hdb, x, newdata = x[1:5, ]) #' #' ## Jarvis-Patrick clustering #' cl <- jpclust(x, k = 20, kt = 15) #' #' # summarize model fit with tidiers #' tidy(cl) #' glance(cl) #' #' # augment for this model needs the original data #' augment(cl, x) #' #' ## Shared Nearest Neighbor clustering #' cl <- sNNclust(x, k = 20, eps = 0.8, minPts = 15) #' #' # summarize model fit with tidiers #' tidy(cl) #' glance(cl) #' #' # augment for this model needs the original data #' augment(cl, x) #' NULL #' @rdname dbscan_tidiers #' @importFrom generics tidy #' @export generics::tidy #' @rdname dbscan_tidiers #' @export tidy.dbscan <- function(x, ...) { n_cl <- max(x$cluster) size <- table(factor(x$cluster, levels = 0:n_cl)) tb <- tibble::tibble(cluster = as.factor(0:n_cl), size = as.integer(size)) tb$noise <- tb$cluster == 0L tb } #' @rdname dbscan_tidiers #' @export tidy.hdbscan <- function(x, ...) 
{ n_cl <- max(x$cluster) size <- table(factor(x$cluster, levels = 0:n_cl)) tb <- tibble::tibble(cluster = as.factor(0:n_cl), size = as.integer(size)) tb$cluster_score <- as.numeric(x$cluster_scores[as.character(tb$cluster)]) tb$noise <- tb$cluster == 0L tb } #' @rdname dbscan_tidiers #' @export tidy.general_clustering <- function(x, ...) { n_cl <- max(x$cluster) size <- table(factor(x$cluster, levels = 0:n_cl)) tb <- tibble::tibble(cluster = as.factor(0:n_cl), size = as.integer(size)) tb$noise <- tb$cluster == 0L tb } ## augment #' @importFrom generics augment #' @rdname dbscan_tidiers #' @export generics::augment #' @rdname dbscan_tidiers #' @export augment.dbscan <- function(x, data = NULL, newdata = NULL, ...) { n_cl <- max(x$cluster) if (is.null(data) && is.null(newdata)) stop("Must specify either `data` or `newdata` argument.") if (is.null(data) || nrow(data) != length(x$cluster)) { stop("The original data needs to be passed as data.") } if (is.null(newdata)) { tb <- tibble::as_tibble(data) tb$.cluster <- factor(x$cluster, levels = 0:n_cl) } else { tb <- tibble::as_tibble(newdata) tb$.cluster <- factor(predict(x, newdata = newdata, data = data), levels = 0:n_cl) } tb$noise <- tb$.cluster == 0L tb } #' @rdname dbscan_tidiers #' @export augment.hdbscan <- function(x, data = NULL, newdata = NULL, ...) 
{ n_cl <- max(x$cluster) if (is.null(data) || nrow(data) != length(x$cluster)) { stop("The original data needs to be passed as data.") } if (is.null(newdata)) { tb <- tibble::as_tibble(data) tb$.cluster <- factor(x$cluster, levels = 0:n_cl) tb$.coredist <- x$coredist tb$.membership_prob <- x$membership_prob tb$.outlier_scores <- x$outlier_scores } else { tb <- tibble::as_tibble(newdata) tb$.cluster <- factor( predict(x, newdata = newdata, data = data), levels = 0:n_cl) tb$.coredist <- NA_real_ tb$.membership_prob <- NA_real_ tb$.outlier_scores <- NA_real_ } tb } #' @rdname dbscan_tidiers #' @export augment.general_clustering <- function(x, data = NULL, newdata = NULL, ...) { n_cl <- max(x$cluster) if (is.null(data) || nrow(data) != length(x$cluster)) { stop("The original data needs to be passed as data.") } if (is.null(newdata)) { tb <- tibble::as_tibble(data) tb$.cluster <- factor(x$cluster, levels = 0:n_cl) } else { stop("augmenting new data is not supported.") } tb } ## glance #' @importFrom generics glance #' @rdname dbscan_tidiers #' @export generics::glance #' @rdname dbscan_tidiers #' @export glance.dbscan <- function(x, ...) { tibble::tibble( nobs = length(x$cluster), n.clusters = length(table(x$cluster[x$cluster != 0L])), nexcluded = sum(x$cluster == 0L) ) } #' @rdname dbscan_tidiers #' @export glance.hdbscan <- function(x, ...) { tibble::tibble( nobs = length(x$cluster), n.clusters = length(table(x$cluster[x$cluster != 0L])), nexcluded = sum(x$cluster == 0L) ) } #' @rdname dbscan_tidiers #' @export glance.general_clustering <- function(x, ...) 
{ tibble::tibble( nobs = length(x$cluster), n.clusters = length(table(x$cluster[x$cluster != 0L])), nexcluded = sum(x$cluster == 0L) ) } ================================================ FILE: R/comps.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2017 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Find Connected Components in a Nearest-neighbor Graph #' #' Generic function and methods to find connected components in nearest neighbor graphs. #' #' Note that for kNN graphs, one point may be in the kNN of the other but not vice versa. #' `mutual = TRUE` requires that both points are in each other's kNN. #' #' @family NN functions #' @aliases components #' #' @param x the [NN] object representing the graph or a [dist] object #' @param eps threshold on the distance #' @param mutual for a pair of points, do both have to be in each other's neighborhood? #' @param ... further arguments are currently unused. #' #' @return an integer vector with component assignments. 
#' #' @author Michael Hahsler #' @keywords model #' @examples #' set.seed(665544) #' n <- 100 #' x <- cbind( #' x=runif(10, 0, 5) + rnorm(n, sd = 0.4), #' y=runif(10, 0, 5) + rnorm(n, sd = 0.4) #' ) #' plot(x, pch = 16) #' #' # Connected components on a graph where each pair of points #' # with a distance less or equal to eps are connected #' d <- dist(x) #' components <- comps(d, eps = .8) #' plot(x, col = components, pch = 16) #' #' # Connected components in a fixed radius nearest neighbor graph #' # Gives the same result as the threshold on the distances above #' frnn <- frNN(x, eps = .8) #' components <- comps(frnn) #' plot(frnn, data = x, col = components) #' #' # Connected components on a k nearest neighbors graph #' knn <- kNN(x, 3) #' components <- comps(knn, mutual = FALSE) #' plot(knn, data = x, col = components) #' #' components <- comps(knn, mutual = TRUE) #' plot(knn, data = x, col = components) #' #' # Connected components in a shared nearest neighbor graph #' snn <- sNN(x, k = 10, kt = 5) #' components <- comps(snn) #' plot(snn, data = x, col = components) #' @export comps <- function(x, ...) UseMethod("comps", x) #' @rdname comps #' @export comps.dist <- function(x, eps, ...) stats::cutree(stats::hclust(x, method = "single"), h = eps) #' @rdname comps #' @export comps.kNN <- function(x, mutual = FALSE, ...) as.integer(factor(comps_kNN(x$id, as.logical(mutual)))) # sNN and frNN are symmetric so no need for mutual #' @rdname comps #' @export comps.sNN <- function(x, ...) comps.kNN(x, mutual = FALSE) #' @rdname comps #' @export comps.frNN <- function(x, ...) 
comps_frNN(x$id, mutual = FALSE) ================================================ FILE: R/dbcv.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2024 Michael Hahsler, Matt Piekenbrock # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Density-Based Clustering Validation Index (DBCV) #' #' Calculate the Density-Based Clustering Validation Index (DBCV) for a #' clustering. #' #' DBCV (Moulavi et al, 2014) computes a score based on the density sparseness of each cluster #' and the density separation of each pair of clusters. #' #' The density sparseness of a cluster (DSC) is defined as the maximum edge weight of #' a minimal spanning tree for the internal points of the cluster using the mutual #' reachability distance based on the all-points-core-distance. Internal points #' are connected to more than one other point in the cluster. Since clusters of #' a size less than 3 cannot have internal points, they are ignored (considered #' noise) in this implementation. #' #' The density separation of a pair of clusters (DSPC) #' is defined as the minimum reachability distance between the internal nodes of #' the spanning trees of the two clusters. 
#' #' The validity index for a cluster is calculated using these measures and aggregated #' to a validity index for the whole clustering using a weighted average. #' #' The index is in the range \eqn{[-1,1]}. If the cluster density compactness is better #' than the density separation, a positive value is returned. The actual value depends #' on the separability of the data. In general, greater values #' of the measure indicate a better density-based clustering solution. #' #' Noise points are included in the calculation only in the weighted average, #' therefore a clustering with more noise points will get a lower index. #' #' **Performance note:** This implementation calculates a distance matrix and thus #' can only be used for small or sampled datasets. #' #' @aliases dbcv DBCV #' @family Evaluation Functions #' #' @param x a data matrix or a dist object. #' @param cl a clustering (e.g., an integer vector) #' @param d dimensionality of the original data if a dist object is provided. #' @param metric distance metric used. The available metrics are the methods #' implemented by `dist()` plus `"sqeuclidean"` for the squared #' Euclidean distance used in the original DBCV implementation. #' @param sample sample size used for large datasets. #' #' @return A list with the DBCV `score` for the clustering, #' the density sparseness of cluster (`dsc`) values, #' the density separation of pairs of clusters (`dspc`) distances, #' and the validity indices of clusters (`v_c`). #' #' @author Matt Piekenbrock and Michael Hahsler #' @references Davoud Moulavi and Pablo A. Jaskowiak and #' Ricardo J. G. B. Campello and Arthur Zimek and Jörg Sander (2014). #' Density-Based Clustering Validation. In #' _Proceedings of the 2014 SIAM International Conference on Data Mining,_ #' pages 839-847 #' \doi{10.1137/1.9781611973440.96} #' #' Pablo A. Jaskowiak (2022). MATLAB implementation of DBCV. 
#' \url{https://github.com/pajaskowiak/dbcv} #' @examples #' # Load a test dataset #' data(Dataset_1) #' x <- Dataset_1[, c("x", "y")] #' class <- Dataset_1$class #' #' clplot(x, class) #' #' # We use MinPts 3 and use the knee at eps = .1 for dbscan #' kNNdistplot(x, minPts = 3) #' #' cl <- dbscan(x, eps = .1, minPts = 3) #' clplot(x, cl) #' #' dbcv(x, cl) #' #' # compare to the DBCV index on the original class labels and #' # with a random partitioning #' dbcv(x, class) #' dbcv(x, sample(1:4, replace = TRUE, size = nrow(x))) #' #' # find the best eps using dbcv #' eps_grid <- seq(.05,.2, by = .01) #' cls <- lapply(eps_grid, FUN = function(e) dbscan(x, eps = e, minPts = 3)) #' dbcvs <- sapply(cls, FUN = function(cl) dbcv(x, cl)$score) #' #' plot(eps_grid, dbcvs, type = "l") #' #' eps_opt <- eps_grid[which.max(dbcvs)] #' eps_opt #' #' cl <- dbscan(x, eps = eps_opt, minPts = 3) #' clplot(x, cl) #' @export dbcv <- function(x, cl, d, metric = "euclidean", sample = NULL) { # a clustering with a cluster element if (is.list(cl)) { cl <- cl$cluster } if (inherits(x, "dist")) { xdist <- x if (missing(d)) stop("d needs to be specified if a distance matrix is supplied!") } else if (.matrixlike(x)) { if (!is.null(sample)) { take <- sample(nrow(x), size = sample) x <- x[take, ] cl <- cl[take] } x <- as.matrix(x) if (!missing(d) && d != ncol(x)) stop("d does not match the number of columns in x!") d <- ncol(x) if (pmatch(metric, "sqeuclidean", nomatch = 0)) xdist <- dist(x, method = "euclidean")^2 else xdist <- dist(x, method = metric) } else stop("'dbcv' expects x needs to be a matrix to calculate distances.") .check_dist(xdist) n <- attr(xdist, "Size") # in case we get a factor cl <- as.integer(cl) if (length(cl) != n) stop("cl does not match the number of rows in x!") ## calculate everything for all non-noise points ordered by cluster ## getClusterIdList removes noise points and singleton clusters ## and returns indices reorder by cluster cl_idx_list <- getClusterIdList(cl) 
n_cl <- length(cl_idx_list) ## reordered distances w/o noise all_dist <- dist_subset(xdist, unlist(cl_idx_list)) new_cl_idx_list <- list() i <- 1L start <- 1 for(l in lengths(cl_idx_list)) { end <- start + l - 1 new_cl_idx_list[[i]] <- seq(start, end) start <- end + 1 i <- i + 1L } cl_idx_list <- new_cl_idx_list all_idx <- unlist(cl_idx_list) ## 1. Calculate all-points-core-distance ## Calculate the all-points-core-distance for each point, within each cluster ## Note: this needs the dimensionality of the data d all_pts_core_dist <- unlist(lapply( cl_idx_list, FUN = function(ids) { dists <- (rowSums(as.matrix(( 1 / dist_subset(all_dist, ids) )^d)) / (length(ids) - 1))^(-1 / d) } )) ## 2. Create for each cluster a mutual reachability MSTs all_mrd <- structure(mrd(all_dist, all_pts_core_dist), class = "dist", Size = length(all_idx)) ## Noise points are removed, but the index is affected by dividing by the ## total number of objects including the noise points (n)! ## mst is a matrix with columns: from to and weight mrd_graphs <- lapply(cl_idx_list, function(idx) { mst(x_dist = dist_subset(all_mrd, idx), n = length(idx)) }) ## 3. Density Sparseness of a Cluster (DSC): ## The maximum edge weight of the internal edges in the cluster's ## mutual reachability MST. ## find internal nodes for DSC and DSPC. Internal nodes have a degree > 1 internal_nodes <- lapply(mrd_graphs, function(mst) { node_deg <- table(c(mst[, 1], mst[, 2])) idx <- as.integer(names(node_deg)[node_deg > 1]) idx }) dsc <- mapply(function(mst, int_idx) { # find internal edges int_edge_idx <- which((mst[, 1L] %in% int_idx) & (mst[, 2L] %in% int_idx)) if (length(int_edge_idx) == 0L) { return(max(mst[, 3L])) } max(mst[int_edge_idx, 3L]) }, mrd_graphs, internal_nodes) ## 4. 
Density Separation of a Pair of Clusters (DSPC): ## The minimum reachability distance between the internal nodes of the ## MST_MRDs of a pair of clusters Ci and Cj dspc_dist <- dspc(cl_idx_list, internal_nodes, all_idx, all_mrd) # returns a matrix with Ci, Cj, dist # make it into a full distance matrix dspc_dist <- dspc_dist[, 3L] class(dspc_dist) <- "dist" attr(dspc_dist, "Size") <- n_cl attr(dspc_dist, "Diag") <- FALSE attr(dspc_dist, "Upper") <- FALSE dspc_mm <- as.matrix(dspc_dist) diag(dspc_mm) <- NA ## 5. Validity index of a cluster: min_separation <- apply(dspc_mm, MARGIN = 1, min, na.rm = TRUE) v_c <- (min_separation - dsc) / pmax(min_separation, dsc) ## 6. Validity index for the whole clustering res <- sum(lengths(cl_idx_list) / n * v_c) return(list( score = res, n = n, n_c = lengths(cl_idx_list), d = d, dsc = dsc, dspc = dspc_dist, v_c = v_c )) } getClusterIdList <- function(cl) { ## In DBCV, singletons are ambiguously defined. However, they cannot be ## considered valid clusters, for reasons listed in section 4 of the ## original paper. ## Clusters with less than 3 points cannot have internal nodes, so we need to ## ignore them as well. ## To ensure coverage, they are assigned to the noise category. 
cl_freq <- table(cl) cl[cl %in% as.integer(names(which(cl_freq < 3)))] <- 0L if (all(cl == 0)) { return(0) } cl_ids <- unique(cl) # all cluster ids cl_valid <- cl_ids[cl_ids != 0] # valid cluster indices (non-noise) n_cl <- length(cl_valid) # number of clusters ## 1 or 0 clusters results in worst score + a warning if (n_cl <= 1) { warning("DBCV is undefined for less than 2 non-noise clusters with more than 2 member points.") return(-1L) } ## Indexes cl_ids_idx <- lapply(cl_valid, function(id) sort(which(cl == id))) ## the sort is important for indexing purposes return(cl_ids_idx) } ================================================ FILE: R/dbscan.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Density-based Spatial Clustering of Applications with Noise (DBSCAN) #' #' Fast reimplementation of the DBSCAN (Density-based spatial clustering of #' applications with noise) clustering algorithm using a kd-tree. #' #' The #' implementation is significantly faster and can work with larger data sets #' than [fpc::dbscan()] in \pkg{fpc}. 
Use `dbscan::dbscan()` (specifying the package) to #' call this implementation when you also load package \pkg{fpc}. #' #' **The algorithm** #' #' This implementation of DBSCAN follows the original #' algorithm as described by Ester et al (1996). DBSCAN performs the following steps: #' #' 1. Estimate the density #' around each data point by counting the number of points in a user-specified #' eps-neighborhood and applying a user-specified minPts threshold to identify #' - core points (points with more than minPts points in their neighborhood), #' - border points (non-core points with a core point in their neighborhood) and #' - noise points (all other points). #' 2. Core points form the backbone of clusters by joining them into #' a cluster if they are density-reachable from each other (i.e., there is a chain of core #' points where one falls inside the eps-neighborhood of the next). #' 3. Border points are assigned to clusters. The algorithm needs parameters #' `eps` (the radius of the epsilon neighborhood) and `minPts` (the #' density threshold). #' #' Border points are arbitrarily assigned to clusters in the original #' algorithm. DBSCAN* (see Campello et al 2013) treats all border points as #' noise points. This is implemented with `borderPoints = FALSE`. #' #' **Specifying the data** #' #' If `x` is a matrix or a data.frame, then fast fixed-radius nearest #' neighbor computation using a kd-tree is performed using Euclidean distance. #' See [frNN()] for more information on the parameters related to #' nearest neighbor search. **Note** that only numerical values are allowed in `x`. #' #' Any precomputed distance matrix (dist object) can be specified as `x`. #' You may run into memory issues since distance matrices are large. #' #' A precomputed frNN object can be supplied as `x`. In this case #' `eps` does not need to be specified. This option is useful for large #' data sets, where a sparse distance matrix is available. 
See #' [frNN()] for how to create frNN objects. #' #' **Setting parameters for DBSCAN** #' #' The parameters `minPts` and `eps` define the minimum density required #' in the area around core points which form the backbone of clusters. #' `minPts` is the number of points #' required in the neighborhood around the point defined by the parameter `eps` #' (i.e., the radius around the point). Both parameters #' depend on each other and changing one typically requires changing #' the other one as well. The parameters also depend on the size of the data set, with #' larger datasets requiring a larger `minPts` or a smaller `eps`. #' #' * `minPts:` The original #' DBSCAN paper (Ester et al, 1996) suggests starting by setting \eqn{\text{minPts} \ge d + 1}, #' the data dimensionality plus one or higher with a minimum of 3. Larger values #' are preferable since increasing the parameter suppresses more noise in the data #' by requiring more points to form clusters. #' Sander et al (1998) use two times the data dimensionality in their examples. #' Note that setting \eqn{\text{minPts} \le 2} is equivalent to hierarchical clustering #' with the single link metric and the dendrogram cut at height `eps`. #' #' * `eps:` A suitable neighborhood size #' parameter `eps` given a fixed value for `minPts` can be found #' visually by inspecting the [kNNdistplot()] of the data using #' \eqn{k = \text{minPts} - 1} (`minPts` includes the point itself, while the #' k-nearest neighbors distance does not). The k-nearest neighbor distance plot #' sorts all data points by their k-nearest neighbor distance. A sudden #' increase of the kNN distance (a knee) indicates that the points to the right #' are most likely outliers. Choose `eps` for DBSCAN where the knee is. #' #' **Predict cluster memberships** #' #' [predict()] can be used to predict cluster memberships for new data #' points. A point is considered a member of a cluster if it is within the eps #' neighborhood of a core point of the cluster.
Points #' which cannot be assigned to a cluster will be reported as #' noise points (i.e., cluster ID 0). #' **Important note:** `predict()` currently can only use Euclidean distance to determine #' the neighborhood of core points. If `dbscan()` was called using distances other than Euclidean, #' then the neighborhood calculation will not be correct and will only be approximated by Euclidean #' distances. If the data contain factor columns (e.g., using Gower's distance), then #' the factors in `data` and `query` first need to be converted to numeric to use the #' Euclidean approximation. #' #' #' @aliases dbscan DBSCAN print.dbscan_fast #' @family clustering functions #' #' @param x a data matrix, a data.frame, a [dist] object or a [frNN] object with #' fixed-radius nearest neighbors. #' @param eps size (radius) of the epsilon neighborhood. Can be omitted if #' `x` is a frNN object. #' @param minPts minimum number of points required in the eps neighborhood for #' core points (including the point itself). #' @param weights numeric; weights for the data points. Only needed to perform #' weighted clustering. #' @param borderPoints logical; should border points be assigned to clusters? #' The default is `TRUE` for regular DBSCAN. If `FALSE` then border #' points are considered noise (see DBSCAN* in Campello et al, 2013). #' @param ... additional arguments are passed on to the fixed-radius nearest #' neighbor search algorithm. See [frNN()] for details on how to #' control the search strategy. #' #' @return `dbscan()` returns an object of class `dbscan_fast` with the following components: #' #' \item{eps }{ value of the `eps` parameter.} #' \item{minPts }{ value of the `minPts` parameter.} #' \item{metric }{ used distance metric.} #' \item{cluster }{An integer vector with cluster assignments. Zero indicates noise points.} #' #' `is.corepoint()` returns a logical vector indicating for each data point if it is a #' core point.
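As a small sketch of the return values described above (assuming the dbscan package is attached; the slight perturbation of the query points is only for illustration):

```r
library(dbscan)

x <- as.matrix(iris[, 1:4])
res <- dbscan(x, eps = 0.7, minPts = 5)

## core points have at least minPts points (incl. themselves) within eps
core <- is.corepoint(x, eps = 0.7, minPts = 5)
table(core)

## new points get the cluster of a core point within eps, or 0 for noise
pred <- predict(res, x[1:3, ] + 0.01, data = x)
pred
```

Predicted labels are always either 0 (noise) or one of the cluster IDs found during clustering.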
#' #' @author Michael Hahsler #' @references Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast #' Density-Based Clustering with R. _Journal of Statistical Software,_ #' 91(1), 1-30. #' \doi{10.18637/jss.v091.i01} #' #' Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A #' Density-Based Algorithm for Discovering Clusters in Large Spatial Databases #' with Noise. Institute for Computer Science, University of Munich. #' _Proceedings of 2nd International Conference on Knowledge Discovery and #' Data Mining (KDD-96),_ 226-231. #' \url{https://dl.acm.org/doi/10.5555/3001460.3001507} #' #' Campello, R. J. G. B.; Moulavi, D.; Sander, J. (2013). Density-Based #' Clustering Based on Hierarchical Density Estimates. Proceedings of the #' 17th Pacific-Asia Conference on Knowledge Discovery in Databases, PAKDD #' 2013, _Lecture Notes in Computer Science_ 7819, p. 160. #' \doi{10.1007/978-3-642-37456-2_14} #' #' Sander, J., Ester, M., Kriegel, HP. et al. (1998). Density-Based #' Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications. #' _Data Mining and Knowledge Discovery_ 2, 169-194. #' \doi{10.1023/A:1009745219419} #' #' @keywords model clustering #' @examples #' ## Example 1: use dbscan on the iris data set #' data(iris) #' iris <- as.matrix(iris[, 1:4]) #' #' ## Find suitable DBSCAN parameters: #' ## 1. We use minPts = dim + 1 = 5 for iris. A larger value can also be used. #' ## 2. 
We inspect the k-NN distance plot for k = minPts - 1 = 4 #' kNNdistplot(iris, minPts = 5) #' #' ## Noise seems to start around a 4-NN distance of .7 #' abline(h=.7, col = "red", lty = 2) #' #' ## Cluster with the chosen parameters #' res <- dbscan(iris, eps = .7, minPts = 5) #' res #' #' pairs(iris, col = res$cluster + 1L) #' clplot(iris, res) #' #' ## Use a precomputed frNN object #' fr <- frNN(iris, eps = .7) #' dbscan(fr, minPts = 5) #' #' ## Example 2: use data from fpc #' set.seed(665544) #' n <- 100 #' x <- cbind( #' x = runif(10, 0, 10) + rnorm(n, sd = 0.2), #' y = runif(10, 0, 10) + rnorm(n, sd = 0.2) #' ) #' #' res <- dbscan(x, eps = .3, minPts = 3) #' res #' #' ## plot clusters and add noise (cluster 0) as crosses. #' plot(x, col = res$cluster) #' points(x[res$cluster == 0, ], pch = 3, col = "grey") #' #' clplot(x, res) #' hullplot(x, res) #' #' ## Predict cluster membership for new data points #' ## (Note: 0 means it is predicted as noise) #' newdata <- x[1:5,] + rnorm(10, 0, .3) #' hullplot(x, res) #' points(newdata, pch = 3 , col = "red", lwd = 3) #' text(newdata, pos = 1) #' #' pred_label <- predict(res, newdata, data = x) #' pred_label #' points(newdata, col = pred_label + 1L, cex = 2, lwd = 2) #' #' ## Compare speed against fpc version (if microbenchmark is installed) #' ## Note: we use dbscan::dbscan to make sure that we do not run the #' ## implementation in fpc.
#' \dontrun{ #' if (requireNamespace("fpc", quietly = TRUE) && #' requireNamespace("microbenchmark", quietly = TRUE)) { #' t_dbscan <- microbenchmark::microbenchmark( #' dbscan::dbscan(x, .3, 3), times = 10, unit = "ms") #' t_dbscan_linear <- microbenchmark::microbenchmark( #' dbscan::dbscan(x, .3, 3, search = "linear"), times = 10, unit = "ms") #' t_dbscan_dist <- microbenchmark::microbenchmark( #' dbscan::dbscan(x, .3, 3, search = "dist"), times = 10, unit = "ms") #' t_fpc <- microbenchmark::microbenchmark( #' fpc::dbscan(x, .3, 3), times = 10, unit = "ms") #' #' r <- rbind(t_fpc, t_dbscan_dist, t_dbscan_linear, t_dbscan) #' r #' #' boxplot(r, #' names = c('fpc', 'dbscan (dist)', 'dbscan (linear)', 'dbscan (kdtree)'), #' main = "Runtime comparison in ms") #' #' ## speedup of the kd-tree-based version compared to the fpc implementation #' median(t_fpc$time) / median(t_dbscan$time) #' }} #' #' ## Example 3: manually create a frNN object for dbscan (dbscan only needs ids and eps) #' nn <- structure(list(id = list(c(2,3), c(1,3), c(1,2,3), c(3,5), c(4,5)), eps = 1), #' class = c("NN", "frNN")) #' nn #' dbscan(nn, minPts = 2) #' #' @export dbscan <- function(x, eps, minPts = 5, weights = NULL, borderPoints = TRUE, ...) { if (inherits(x, "frNN") && missing(eps)) { eps <- x$eps dist_method <- x$metric } if (inherits(x, "dist")) { .check_dist(x) dist_method <- attr(x, "method") } else dist_method <- "euclidean" dist_method <- dist_method %||% "unknown" ### extra contains settings for frNN ### search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0 ### also check for MinPts for fpc compatibility (does not work for ### search method dist) extra <- list(...) 
args <- c("MinPts", "search", "bucketSize", "splitRule", "approx") m <- pmatch(names(extra), args) if (anyNA(m)) stop("Unknown parameter: ", toString(names(extra)[is.na(m)])) names(extra) <- args[m] # fpc compatibility if (!is.null(extra$MinPts)) { warning("converting argument MinPts (fpc) to minPts (dbscan)!") minPts <- extra$MinPts extra$MinPts <- NULL } search <- .parse_search(extra$search %||% "kdtree") splitRule <- .parse_splitRule(extra$splitRule %||% "suggest") bucketSize <- as.integer(extra$bucketSize %||% 10L) approx <- as.integer(extra$approx %||% 0L) ### do dist search if (search == 3L && !inherits(x, "dist")) { if (.matrixlike(x)) x <- dist(x) else stop("x needs to be a matrix to calculate distances") } ## for dist we provide the R code with a frNN list and no x frNN <- list() if (inherits(x, "dist")) { frNN <- frNN(x, eps, ...)$id x <- matrix(0.0, nrow = 0, ncol = 0) } else if (inherits(x, "frNN")) { if (x$eps != eps) { eps <- x$eps warning("Using the eps of ", eps, " provided in the fixed-radius NN object.") } frNN <- x$id x <- matrix(0.0, nrow = 0, ncol = 0) } else { if (!.matrixlike(x)) stop("x needs to be a matrix or data.frame.") ## make sure x is numeric x <- as.matrix(x) if (storage.mode(x) == "integer") storage.mode(x) <- "double" if (storage.mode(x) != "double") stop("all data in x has to be numeric.") } if (length(frNN) == 0 && anyNA(x)) stop("data/distances cannot contain NAs for dbscan (with kd-tree)!") ## add self match and use C numbering if frNN is used if (length(frNN) > 0L) frNN <- lapply( seq_along(frNN), FUN = function(i) c(i - 1L, frNN[[i]] - 1L) ) if (length(minPts) != 1L || !is.finite(minPts) || minPts < 0) stop("minPts needs to be a single integer >= 0.") if (is.null(eps) || is.na(eps) || eps < 0) stop("eps needs to be >= 0.") ret <- dbscan_int( x, as.double(eps), as.integer(minPts), as.double(weights), as.integer(borderPoints), as.integer(search), as.integer(bucketSize), as.integer(splitRule), as.double(approx), frNN ) structure(
list( cluster = ret, eps = eps, minPts = minPts, metric = dist_method, borderPoints = borderPoints ), class = c("dbscan_fast", "dbscan") ) } #' @export print.dbscan_fast <- function(x, ...) { writeLines(c( paste0("DBSCAN clustering for ", nobs(x), " objects."), paste0("Parameters: eps = ", x$eps, ", minPts = ", x$minPts), paste0( "Using ", x$metric, " distances and borderpoints = ", x$borderPoints ), paste0( "The clustering contains ", ncluster(x), " cluster(s) and ", nnoise(x), " noise points." ) )) print(table(x$cluster)) cat("\n") writeLines(strwrap(paste0( "Available fields: ", toString(names(x)) ), exdent = 18)) } #' @rdname dbscan #' @export is.corepoint <- function(x, eps, minPts = 5, ...) lengths(frNN(x, eps = eps, ...)$id) >= (minPts - 1) ================================================ FILE: R/dendrogram.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler, Matt Piekenbrock # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Coercions to Dendrogram #' #' Provides a new generic function to coerce objects to dendrograms with #' [stats::as.dendrogram()] as the default.
Additional methods for #' [hclust], [hdbscan] and [reachability] objects are provided. #' #' Coercion methods for #' [hclust], [hdbscan] and [reachability] objects to [dendrogram] are provided. #' #' The coercion from `hclust` is a faster C++ reimplementation of the coercion in #' package `stats`. The original implementation can be called #' using [stats::as.dendrogram()]. #' #' The coercion from [hdbscan] builds the non-simplified HDBSCAN hierarchy as a #' dendrogram object. #' #' @name dendrogram #' @aliases dendrogram #' #' @param object the object #' @param ... further arguments NULL #' @rdname dendrogram #' @export as.dendrogram <- function (object, ...) { UseMethod("as.dendrogram", object) } #' @rdname dendrogram #' @export as.dendrogram.default <- function (object, ...) stats::as.dendrogram(object, ...) ## this is a replacement for stats::as.dendrogram for hclust #' @rdname dendrogram #' @export as.dendrogram.hclust <- function(object, ...) { return(buildDendrogram(object)) } #' @rdname dendrogram #' @export as.dendrogram.hdbscan <- function(object, ...) { return(buildDendrogram(object$hc)) } #' @rdname dendrogram #' @export as.dendrogram.reachability <- function(object, ...) { if (sum(is.infinite(object$reachdist)) > 1) stop( "Multiple Infinite reachability distances found. Reachability plots can only be converted if they contain enough information to fully represent the dendrogram structure. If using OPTICS, a larger eps value (such as Inf) may be needed in the parameterization."
) #dup_x <- object c_order <- order(object$reachdist) - 1 # dup_x$order <- dup_x$order - 1 #q_order <- sapply(c_order, function(i) which(dup_x$order == i)) res <- reach_to_dendrogram(object, c_order) # res <- dendrapply(res, function(leaf) { new_leaf <- leaf[[1]]; attributes(new_leaf) <- attributes(leaf); new_leaf }) # add mid points for plotting res <- .midcache.dendrogram(res) res } # calculate midpoints for dendrogram # from stats, but not exported # see stats:::midcache.dendrogram .midcache.dendrogram <- function(x, type = "hclust", quiet = FALSE) { type <- match.arg(type) stopifnot(inherits(x, "dendrogram")) verbose <- getOption("verbose", 0) >= 2 setmid <- function(d, type) { depth <- 0L kk <- integer() jj <- integer() dd <- list() repeat { if (!is.leaf(d)) { k <- length(d) if (k < 1) stop("dendrogram node with non-positive #{branches}") depth <- depth + 1L if (verbose) cat(sprintf(" depth(+)=%4d, k=%d\n", depth, k)) kk[depth] <- k if (storage.mode(jj) != storage.mode(kk)) storage.mode(jj) <- storage.mode(kk) dd[[depth]] <- d d <- d[[jj[depth] <- 1L]] next } while (depth) { k <- kk[depth] j <- jj[depth] r <- dd[[depth]] r[[j]] <- unclass(d) if (j < k) break depth <- depth - 1L if (verbose) cat(sprintf(" depth(-)=%4d, k=%d\n", depth, k)) midS <- sum(vapply(r, .midDend, 0)) if (!quiet && type == "hclust" && k != 2) warning("midcache() of non-binary dendrograms only partly implemented") attr(r, "midpoint") <- (.memberDend(r[[1L]]) + midS) / 2 d <- r } if (!depth) break dd[[depth]] <- r d <- r[[jj[depth] <- j + 1L]] } d } setmid(x, type = type) } .midDend <- function(x) { attr(x, "midpoint") %||% 0 } .memberDend <- function(x) { attr(x, "x.member") %||% attr(x, "members") %||% 1 } ================================================ FILE: R/extractFOSC.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # 
Copyright (C) 2015 Michael Hahsler, Matt Piekenbrock # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Framework for the Optimal Extraction of Clusters from Hierarchies #' #' Generic reimplementation of the _Framework for Optimal Selection of Clusters_ #' (FOSC; Campello et al, 2013) to extract clusterings from hierarchical clustering (i.e., #' [hclust] objects). #' Can be parameterized to perform unsupervised #' cluster extraction through a stability-based measure, or semisupervised #' cluster extraction through either a constraint-based extraction (with a #' stability-based tiebreaker) or a mixed (weighted) constraint and #' stability-based objective extraction. #' #' Campello et al (2013) suggested a _Framework for Optimal Selection of #' Clusters_ (FOSC) as a framework to make local (non-horizontal) cuts to any #' cluster tree hierarchy. This function implements the original extraction #' algorithms as described by the framework for hclust objects. Traditional #' cluster extraction methods from hierarchical representations (such as #' [hclust] objects) generally rely on global parameters or cutting values #' which are used to partition a cluster hierarchy into a set of disjoint, flat #' clusters. This is implemented in R in function [stats::cutree()]. 
#' Although such methods are widespread, using global parameter #' settings is inherently limited in that they cannot capture patterns within #' the cluster hierarchy at varying _local_ levels of granularity. #' #' Rather than partitioning a hierarchy based on the number of clusters one #' expects to find (\eqn{k}) or based on some linkage distance threshold #' (\eqn{H}), the FOSC proposes that the optimal clusters may exist at varying #' distance thresholds in the hierarchy. To enable this idea, FOSC requires one #' parameter (minPts) that represents _the minimum number of points that #' constitute a valid cluster._ The first step of the FOSC algorithm is to #' traverse the given cluster hierarchy divisively, recording new clusters at #' each split if both branches contain at least minPts points. Branches #' where one or both sides contain fewer than minPts points inherit the #' parent cluster's identity. Note that using FOSC, due to the constraint that #' minPts must be greater than or equal to 2, it is possible that the optimal #' cluster solution chosen makes local cuts that render parent branches of #' sizes less than minPts as noise, which are denoted as 0 in the final #' solution. #' #' Traversing the original cluster tree using minPts creates a new, simplified #' cluster tree that is then post-processed recursively to extract clusters #' that maximize for each cluster \eqn{C_i}{Ci} the cost function #' #' \deqn{\max_{\delta_2, \dots, \delta_k} J = \sum\limits_{i=2}^{k} \delta_i #' S(C_i)}{ J = \sum \delta S(Ci) for all i clusters, } where #' \eqn{S(C_i)}{S(Ci)} is the stability-based measure as \deqn{ S(C_i) = #' \sum_{x_j \in C_i}(\frac{1}{h_{min} (x_j, C_i)} - \frac{1}{h_{max} (C_i)}) #' }{ S(Ci) = \sum (1/Hmin(Xj, Ci) - 1/Hmax(Ci)) for all Xj in Ci.} #' #' \eqn{\delta_i}{\delta} represents an indicator function, which constrains #' the solution space such that clusters must be disjoint (cannot assign more #' than 1 label to each cluster).
The measure \eqn{S(C_i)}{S(Ci)} used by FOSC #' is an unsupervised validation measure based on the assumption that, if you #' vary the linkage/distance threshold across all possible values, more #' prominent clusters that survive over many threshold variations should be #' considered as stronger candidates of the optimal solution. For this reason, #' using this measure to detect clusters is referred to as an unsupervised, #' _stability-based_ extraction approach. In some cases it may be useful #' to enact _instance-level_ constraints that ensure the solution space #' conforms to linkage expectations known _a priori_. This general idea of #' using preliminary expectations to augment the clustering solution will be #' referred to as _semisupervised clustering_. If constraints are given in #' the call to `extractFOSC()`, the following alternative objective function #' is maximized: #' #' \deqn{J = \frac{1}{2n_c}\sum\limits_{j=1}^n \gamma (x_j)}{J = 1/(2 * nc) #' \sum \gamma(Xj)} #' #' \eqn{n_c}{nc} is the total number of constraints given and #' \eqn{\gamma(x_j)}{\gamma(Xj)} represents the number of constraints involving #' object \eqn{x_j}{Xj} that are satisfied. In the case of ties (such as #' solutions where no constraints were given), the unsupervised solution is #' used as a tiebreaker. See Campello et al (2013) for more details. 
#' #' As a third option, if one wishes to prioritize the degree at which the #' unsupervised and semisupervised solutions contribute to the overall optimal #' solution, the parameter \eqn{\alpha} can be set to enable the extraction of #' clusters that maximize the `mixed` objective function #' #' \deqn{J = \alpha S(C_i) + (1 - \alpha) \gamma(C_i)}{J = \alpha S(Ci) + (1 - #' \alpha) \gamma(Ci).} #' #' FOSC expects the pairwise constraints to be passed as either 1) an #' \eqn{n(n-1)/2} vector of integers representing the constraints, where 1 #' represents should-link, -1 represents should-not-link, and 0 represents no #' preference using the unsupervised solution (see below for examples). #' Alternatively, if only a few constraints are needed, a named list #' representing the (symmetric) adjacency list can be used, where the names #' correspond to indices of the points in the original data, and the values #' correspond to integer vectors of constraints (positive indices for #' should-link, negative indices for should-not-link). Again, see the examples #' section for a demonstration of this. #' #' The parameters to the input function correspond to the concepts discussed #' above. The `minPts` parameter represents the minimum cluster size to #' extract. The optional `constraints` parameter contains the pairwise, #' instance-level constraints of the data. The optional `alpha` parameter #' controls whether the mixed objective function is used (if `alpha` is #' greater than 0). If the `validate_constraints` parameter is set to #' `TRUE`, the constraints are checked (and fixed) for symmetry (if point A has a #' should-link constraint with point B, point B should also have the same #' constraint). Asymmetric constraints are not supported.
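The two constraint encodings described above can be sketched as follows (a minimal sketch drawn from the package's documented examples, assuming the dbscan package and its bundled moons data are available):

```r
library(dbscan)
data("moons")

cl <- hdbscan(moons, minPts = 5)

## adjacency-list form: point 12 should link with point 49 (positive index)
## and should not link with point 47 (negative index)
res_list <- extractFOSC(cl$hc, minPts = 5,
  constraints = list("12" = c(49, -47)))

## equivalent n(n-1)/2 vector form derived from distance thresholds:
## 1 = should-link, -1 = should-not-link, 0 = no preference
d <- dist(moons)
res_vec <- extractFOSC(cl$hc, minPts = 5,
  constraints = ifelse(d < 0.1, 1L, ifelse(d > 1, -1L, 0L)))
```

Both calls return a list with the flat `cluster` assignment (0 for noise) and the augmented `hc` object.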
#' #' Unstable branch pruning was not discussed by Campello et al (2013), however, #' in some data sets the scores of specific subbranches may be #' significantly greater than those of sibling and parent branches, and thus sibling #' branches should be considered as noise if their scores are cumulatively #' lower than the parent's. This can happen in extremely nonhomogeneous data #' sets, where there exist locally very stable branches surrounded by unstable #' branches that contain more than `minPts` points. #' `prune_unstable = TRUE` will remove the unstable branches. #' #' @family clustering functions #' #' @param x a valid [hclust] object created via [hclust()] or [hdbscan()]. #' @param constraints Either a list or matrix of pairwise constraints. If #' missing, an unsupervised measure of stability is used to make local cuts and #' extract the optimal clusters. See details. #' @param alpha numeric; weight between \eqn{[0, 1]} for mixed-objective #' semi-supervised extraction. Defaults to 0. #' @param minPts numeric; Defaults to 2. Only needed if class-less noise is a #' valid label in the model. #' @param prune_unstable logical; should significantly unstable subtrees be #' pruned? The default is `FALSE` for the original optimal extraction #' framework (see Campello et al, 2013). See details for what `TRUE` #' implies. #' @param validate_constraints logical; should constraints be checked for #' validity? See details for what are considered valid constraints. #' #' @returns A list with the elements: #' #' \item{cluster }{An integer vector with cluster assignments.
Zero #' indicates noise points (if any).} #' \item{hc }{The original [hclust] object with additional list elements #' `"stability"`, `"constraint"`, and `"total"` #' for the \eqn{n - 1} cluster-wide objective scores from the extraction.} #' #' @author Matt Piekenbrock #' @seealso [hclust()], [hdbscan()], [stats::cutree()] #' @references Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg #' Sander (2013). A framework for semi-supervised and unsupervised optimal #' extraction of clusters from hierarchies. _Data Mining and Knowledge #' Discovery_ 27(3): 344-371. #' \doi{10.1007/s10618-013-0311-4} #' @keywords model clustering #' @examples #' data("moons") #' #' ## Regular HDBSCAN using stability-based extraction (unsupervised) #' cl <- hdbscan(moons, minPts = 5) #' cl$cluster #' #' ## Constraint-based extraction from the HDBSCAN hierarchy #' ## (w/ stability-based tiebreaker (semisupervised)) #' cl_con <- extractFOSC(cl$hc, minPts = 5, #' constraints = list("12" = c(49, -47))) #' cl_con$cluster #' #' ## Alternative formulation: Constraint-based extraction from the HDBSCAN hierarchy #' ## (w/ stability-based tiebreaker (semisupervised)) using distance thresholds #' dist_moons <- dist(moons) #' cl_con2 <- extractFOSC(cl$hc, minPts = 5, #' constraints = ifelse(dist_moons < 0.1, 1L, #' ifelse(dist_moons > 1, -1L, 0L))) #' #' cl_con2$cluster # same as the second example #' @export extractFOSC <- function(x, constraints, alpha = 0, minPts = 2L, prune_unstable = FALSE, validate_constraints = FALSE) { if (!inherits(x, "hclust")) stop("extractFOSC expects 'x' to be a valid hclust object.") # if constraints are given then they need to be a list, a matrix or a vector if (!( missing(constraints) || is.list(constraints) || is.matrix(constraints) || is.numeric(constraints) )) stop("extractFOSC expects constraints to be either an adjacency list or adjacency matrix.") if (!minPts >= 2) stop("minPts must be at least 2.") if (alpha < 0 || alpha > 1) stop("alpha can only 
take values in [0, 1].") n <- nrow(x$merge) + 1L ## First step for both unsupervised and semisupervised - compute stability scores cl_tree <- computeStability(x, minPts) ## Unsupervised Extraction if (missing(constraints)) { cl_tree <- extractUnsupervised(cl_tree, prune_unstable) } ## Semi-supervised Extraction else { ## If given as adjacency-list form if (is.list(constraints)) { ## Checks for proper indexing, symmetry of constraints, etc. if (validate_constraints) { is_valid <- max(as.integer(names(constraints))) < n is_valid <- is_valid && all(vapply(constraints, function(ilc) all(ilc <= n), logical(1L))) if (!is_valid) { stop("Detected constraint indices not in the interval [1, n]") } constraints <- validateConstraintList(constraints, n) } cl_tree <- extractSemiSupervised(cl_tree, constraints, alpha, prune_unstable) } ## Adjacency matrix given (probably from dist object), retrieve adjacency list form else if (is.vector(constraints)) { if (!all(constraints %in% c(-1, 0, 1))) { stop( "'extractFOSC' only accepts instance-level constraints. See ?extractFOSC for more details." ) } ## Checks for proper integer labels, symmetry of constraints, length of vector, etc. if (validate_constraints) { is_valid <- length(constraints) == choose(n, 2) constraints_list <- validateConstraintList(distToAdjacency(constraints, n), n) } else { constraints_list <- distToAdjacency(constraints, n) } cl_tree <- extractSemiSupervised(cl_tree, constraints_list, alpha, prune_unstable) } ## Full nxn adjacency-matrix given, give warning and retrieve adjacency list form else if (is.matrix(constraints)) { if (!all(constraints %in% c(-1, 0, 1))) { stop( "'extractFOSC' only accepts instance-level constraints. See ?extractFOSC for more details." ) } if (!all(dim(constraints) == c(n, n))) { stop("Given matrix is not square.") } warning( "Full nxn matrix given; extractFOSC does not support asymmetric relational constraints. Using lower triangular."
) constraints <- constraints[lower.tri(constraints)] ## Checks for proper integer labels, symmetry of constraints, length of vector, etc. if (validate_constraints) { is_valid <- length(constraints) == choose(n, 2) constraints_list <- validateConstraintList(distToAdjacency(constraints, n), n) } else { constraints_list <- distToAdjacency(constraints, n) } cl_tree <- extractSemiSupervised(cl_tree, constraints_list, alpha, prune_unstable) } else { stop( "'extractFOSC' doesn't know how to handle constraints of type ", class(constraints) ) } } cl_track <- attr(cl_tree, "cl_tracker") stability_score <- vapply(cl_track, function(cid) cl_tree[[as.character(cid)]]$stability, numeric(1L)) constraint_score <- vapply(cl_track, function(cid) cl_tree[[as.character(cid)]]$vscore %||% 0, numeric(1L)) total_score <- vapply(cl_track, function(cid) cl_tree[[as.character(cid)]]$score %||% 0, numeric(1L)) out <- append( x, list( cluster = cl_track, stability = stability_score, constraint = constraint_score, total = total_score ) ) extraction_type <- if (missing(constraints)) { "(w/ stability-based extraction)" } else if (alpha == 0) { "(w/ constraint-based extraction)" } else { "(w/ mixed-objective extraction)" } substrs <- strsplit(x$method, split = " \\(w\\/")[[1L]] out[["method"]] <- if (length(substrs) > 1) paste(substrs[[1]], extraction_type) else paste(out[["method"]], extraction_type) class(out) <- "hclust" return(list(cluster = attr(cl_tree, "cluster"), hc = out)) } ================================================ FILE: R/frNN.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # 
any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Find the Fixed Radius Nearest Neighbors #' #' This function uses a kd-tree to find the fixed radius nearest neighbors #' (including distances) fast. #' #' If `x` is specified as a data matrix, then Euclidean distances and fast #' nearest neighbor lookup using a kd-tree are used. #' #' To create a frNN object from scratch, you need to supply at least the #' elements `id` with a list of integer vectors with the nearest neighbor #' ids for each point and `eps` (see below). #' #' **Self-matches:** Self-matches are not returned! #' #' @aliases frNN frnn print.frnn #' @family NN functions #' #' @param x a data matrix, a dist object or a frNN object. #' @param eps neighborhood radius. #' @param query a data matrix with the points to query. If query is not #' specified, the NN for all the points in `x` is returned. If query is #' specified then `x` needs to be a data matrix. #' @param sort sort the neighbors by distance? This is expensive and can be #' done later using `sort()`. #' @param search nearest neighbor search strategy (one of `"kdtree"`, `"linear"` or #' `"dist"`). #' @param bucketSize max size of the kd-tree leaves. #' @param splitRule rule to split the kd-tree. One of `"STD"`, `"MIDPT"`, `"FAIR"`, #' `"SL_MIDPT"`, `"SL_FAIR"` or `"SUGGEST"` (SL stands for sliding). `"SUGGEST"` uses #' ANN's best guess. #' @param approx use approximate nearest neighbors. All NN up to a distance of #' a factor of `1 + approx` eps may be used.
Some actual NN may be omitted #' leading to spurious clusters and noise points. However, the algorithm will #' enjoy a significant speedup. #' @param decreasing sort in decreasing order? #' @param ... further arguments #' #' @returns #' #' `frNN()` returns an object of class [frNN] (subclass of #' [NN]) containing a list with the following components: #' \item{id }{a list of #' integer vectors. Each vector contains the ids (row numbers) of the fixed radius nearest #' neighbors. } #' \item{dist }{a list with distances (same structure as #' `id`). } #' \item{eps }{ neighborhood radius `eps` that was used. } #' \item{metric }{ used distance metric. } #' #' `adjacencylist()` returns a list with one entry per data point in `x`. Each entry #' contains the id of the nearest neighbors. #' #' @author Michael Hahsler #' #' @references David M. Mount and Sunil Arya (2010). ANN: A Library for #' Approximate Nearest Neighbor Searching, #' \url{http://www.cs.umd.edu/~mount/ANN/}. #' @keywords model #' @examples #' data(iris) #' x <- iris[, -5] #' #' # Example 1: Find fixed radius nearest neighbors for each point #' nn <- frNN(x, eps = .5) #' nn #' #' # Number of neighbors #' hist(lengths(adjacencylist(nn)), #' xlab = "k", main="Number of Neighbors", #' sub = paste("Neighborhood size eps =", nn$eps)) #' #' # Explore neighbors of point i = 10 #' i <- 10 #' nn$id[[i]] #' nn$dist[[i]] #' plot(x, col = ifelse(seq_len(nrow(iris)) %in% nn$id[[i]], "red", "black")) #' #' # get an adjacency list #' head(adjacencylist(nn)) #' #' # plot the fixed radius neighbors (and then reduced to a radius of .3) #' plot(nn, x) #' plot(frNN(nn, eps = .3), x) #' #' ## Example 2: find fixed-radius NN for query points #' q <- x[c(1,100),] #' nn <- frNN(x, eps = .5, query = q) #' #' plot(nn, x, col = "grey") #' points(q, pch = 3, lwd = 2) #' @export frNN frNN <- function(x, eps, query = NULL, sort = TRUE, search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0) { if (is.null(eps) || is.na(eps) 
|| eps < 0) stop("eps needs to be >=0.") if (inherits(x, "frNN")) { if (x$eps < eps) stop("frNN in x does not have a sufficient eps radius.") for (i in seq_along(x$dist)) { take <- x$dist[[i]] <= eps x$dist[[i]] <- x$dist[[i]][take] x$id[[i]] <- x$id[[i]][take] } x$eps <- eps return(x) } search <- .parse_search(search) splitRule <- .parse_splitRule(splitRule) ### dist search if (search == 3 && !inherits(x, "dist")) { if (.matrixlike(x)) x <- dist(x) else stop("x needs to be a matrix to calculate distances") } ### get frNN from a dist object in R if (inherits(x, "dist")) { if (!is.null(query)) stop("query can only be used if x contains the data.") if (anyNA(x)) stop("data/distances cannot contain NAs for frNN (with kd-tree)!") return(dist_to_frNN(x, eps = eps, sort = sort)) } ## make sure x is numeric if (!.matrixlike(x)) stop("x needs to be a matrix or a data.frame.") x <- as.matrix(x) if (storage.mode(x) == "integer") storage.mode(x) <- "double" if (storage.mode(x) != "double") stop("all data in x has to be numeric.") if (!is.null(query)) { if (!.matrixlike(query)) stop("query needs to be a matrix or a data.frame.") query <- as.matrix(query) if (storage.mode(query) == "integer") storage.mode(query) <- "double" if (storage.mode(query) != "double") stop("query has to be NULL or a numeric matrix or data.frame.") if (ncol(x) != ncol(query)) stop("x and query need to have the same number of columns!") } if (anyNA(x)) stop("data/distances cannot contain NAs for frNN (with kd-tree)!") ## returns NO self matches if (!is.null(query)) { ret <- frNN_query_int( as.matrix(x), as.matrix(query), as.double(eps), as.integer(search), as.integer(bucketSize), as.integer(splitRule), as.double(approx) ) names(ret$dist) <- rownames(query) names(ret$id) <- rownames(query) ret$metric <- "euclidean" } else { ret <- frNN_int( as.matrix(x), as.double(eps), as.integer(search), as.integer(bucketSize), as.integer(splitRule), as.double(approx) ) names(ret$dist) <- rownames(x) names(ret$id) <-
rownames(x) ret$metric <- "euclidean" } ret$eps <- eps ret$sort <- FALSE class(ret) <- c("frNN", "NN") if (sort) ret <- sort.frNN(ret) ret } # extract a row from a distance matrix without doubling space requirements dist_row <- function(x, i, self_val = 0) { n <- attr(x, "Size") i <- rep(i, times = n) j <- seq_len(n) swap_idx <- i > j tmp <- i[swap_idx] i[swap_idx] <- j[swap_idx] j[swap_idx] <- tmp diag_idx <- i == j idx <- n * (i - 1) - i * (i - 1) / 2 + j - i idx[diag_idx] <- NA val <- x[idx] val[diag_idx] <- self_val val } dist_to_frNN <- function(x, eps, sort = FALSE) { .check_dist(x) n <- attr(x, "Size") id <- list() d <- list() for (i in seq_len(n)) { ### Inf -> no self-matches y <- dist_row(x, i, self_val = Inf) o <- which(y <= eps) id[[i]] <- o d[[i]] <- y[o] } names(id) <- labels(x) names(d) <- labels(x) ret <- structure(list( dist = d, id = id, eps = eps, metric = attr(x, "method"), sort = FALSE ), class = c("frNN", "NN")) if (sort) ret <- sort.frNN(ret) return(ret) } #' @rdname frNN #' @export sort.frNN <- function(x, decreasing = FALSE, ...) { if (isTRUE(x$sort)) return(x) if (is.null(x$dist)) stop("Unable to sort. Distances are missing.") ## FIXME: This is slow do this in C++ n <- names(x$id) o <- lapply( seq_along(x$dist), FUN = function(i) order(x$dist[[i]], x$id[[i]], decreasing = decreasing) ) x$dist <- lapply( seq_along(o), FUN = function(p) x$dist[[p]][o[[p]]] ) x$id <- lapply( seq_along(o), FUN = function(p) x$id[[p]][o[[p]]] ) names(x$dist) <- n names(x$id) <- n x$sort <- TRUE x } #' @rdname frNN #' @export adjacencylist.frNN <- function(x, ...) x$id #' @rdname frNN #' @export print.frNN <- function(x, ...) 
{ cat( "fixed radius nearest neighbors for ", length(x$id), " objects (eps=", x$eps, ").", "\n", sep = "" ) cat("Distance metric:", x$metric, "\n") cat("\nAvailable fields: ", toString(names(x)), "\n", sep = "") } ================================================ FILE: R/hdbscan.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler, Matt Piekenbrock # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Hierarchical DBSCAN (HDBSCAN) #' #' Fast C++ implementation of HDBSCAN (Hierarchical DBSCAN) and related #' algorithms. #' #' This fast implementation of HDBSCAN (Campello et al., 2013) computes the #' hierarchical cluster tree representing density estimates along with the #' stability-based flat cluster extraction. HDBSCAN essentially computes the #' hierarchy of all DBSCAN* clusterings, and #' then uses a stability-based extraction method to find optimal cuts in the #' hierarchy, thus producing a flat solution. #' #' HDBSCAN performs the following steps: #' #' 1. Compute mutual reachability distance mrd between points #' (based on distances and core distances). #' 2. Use mrd as a distance measure to construct a minimum spanning tree. #' 3.
Prune the tree using stability. #' 4. Extract the clusters. #' #' Additionally, related algorithms are available: the "Global-Local Outlier Score #' from Hierarchies" (GLOSH; see section 6 of Campello et al., 2015) #' is implemented in function [glosh()], #' and clustering based on instance-level constraints (see #' section 5.3 of Campello et al. 2015) is supported. These algorithms only need #' the parameter `minPts`. #' #' Note that `minPts` not only acts as a minimum cluster size to detect, #' but also as a "smoothing" factor of the density estimates implicitly #' computed from HDBSCAN. #' #' When using the optional parameter `cluster_selection_epsilon`, #' a combination between DBSCAN* and HDBSCAN* can be achieved #' (see Malzer & Baum 2020). This means that part of the #' tree is affected by `cluster_selection_epsilon` as if #' running DBSCAN* with `eps` = `cluster_selection_epsilon`. #' The remaining part (on levels above the threshold) is still #' processed by HDBSCAN*'s stability-based selection algorithm #' and can therefore return clusters of variable densities. #' Note that there is not always a remaining part, especially if #' the parameter value is chosen too large, or if there aren't #' enough clusters of variable densities. In this case, the result #' will be equal to DBSCAN*. #' `cluster_selection_epsilon` is especially useful for cases #' where HDBSCAN* produces too many small clusters that #' need to be merged, while still being able to extract clusters #' of variable densities at higher levels. #' #' `coredist()`: The core distance is defined for each point as #' the distance to its `minPts - 1`-th nearest neighbor. #' It is a density estimate equivalent to `kNNdist()` with `k = minPts - 1`. #' #' `mrdist()`: The mutual reachability distance is defined between two points as #' `mrd(a, b) = max(coredist(a), coredist(b), dist(a, b))`. This distance metric is used by #' HDBSCAN. It has the effect of increasing distances in low density areas.
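The `coredist()`/`mrdist()` definitions above can be checked with a few lines of base R. This is only an illustrative sketch (the toy matrix and `minPts = 3` are made up here); `hdbscan()` itself uses the package's C++ implementation.

```r
# Toy data: three close points on a line plus one far-away point.
x <- matrix(c(0, 0,
              0, 1,
              0, 2,
              10, 0), ncol = 2, byrow = TRUE)
d <- as.matrix(dist(x))
minPts <- 3

# Core distance: distance to the (minPts - 1)-th nearest neighbor.
# Each sorted row starts with the self-distance 0, so index minPts skips it.
core <- apply(d, 1, function(row) sort(row)[minPts])

# Mutual reachability: mrd(a, b) = max(core(a), core(b), d(a, b)).
mrd <- pmax(outer(core, core, pmax), d)
```

Note how the mutual reachability distance between the two closest points is lifted from their Euclidean distance 1 to the larger core distance 2, which is exactly the "increasing distances in low density areas" effect described above.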
#' #' `predict()` assigns each new data point to the same cluster as the nearest point #' if it is not more than that point's core distance away. Otherwise the new point #' is classified as a noise point (i.e., cluster ID 0). #' @aliases hdbscan HDBSCAN print.hdbscan #' #' @family HDBSCAN functions #' @family clustering functions #' #' @param x a data matrix (Euclidean distances are used) or a [dist] object #' calculated with an arbitrary distance metric. #' @param minPts integer; Minimum size of clusters. See details. #' @param cluster_selection_epsilon double; a distance threshold below which #' no clusters should be selected (see Malzer & Baum 2020). #' @param gen_hdbscan_tree logical; should the robust single linkage tree be #' explicitly computed (see cluster tree in Chaudhuri et al, 2010). #' @param gen_simplified_tree logical; should the simplified hierarchy be #' explicitly computed (see Campello et al, 2013). #' @param verbose report progress. #' @param ... additional arguments are passed on. #' @param scale integer; used to scale condensed tree based on the graphics #' device. Lower scale results in wider colored tree lines. #' The default `'suggest'` sets scale to the number of clusters. #' @param gradient character vector; the colors to build the condensed tree #' coloring with. #' @param show_flat logical; whether to draw boxes indicating the most stable #' clusters. #' @param coredist numeric vector with precomputed core distances (optional). #' #' @return `hdbscan()` returns an object of class `hdbscan` with the following components: #' \item{cluster }{An integer vector with cluster assignments. Zero indicates #' noise points.} #' \item{minPts }{ value of the `minPts` parameter.} #' \item{cluster_scores }{The sum of the stability scores for each salient #' (flat) cluster. Corresponds to cluster IDs given in the `"cluster"` element. #' } #' \item{membership_prob }{The probability or individual stability of a #' point within its clusters.
Between 0 and 1.} #' \item{outlier_scores }{The GLOSH outlier score of each point. } #' \item{hc }{An [hclust] object of the HDBSCAN hierarchy. } #' #' `coredist()` returns a vector with the core distance for each data point. #' #' `mrdist()` returns a [dist] object containing pairwise mutual reachability distances. #' #' @author Matt Piekenbrock #' @author Claudia Malzer (added cluster_selection_epsilon) #' #' @references #' Campello RJGB, Moulavi D, Sander J (2013). Density-Based Clustering Based on #' Hierarchical Density Estimates. Proceedings of the 17th Pacific-Asia #' Conference on Knowledge Discovery in Databases, PAKDD 2013, _Lecture Notes #' in Computer Science_ 7819, p. 160. #' \doi{10.1007/978-3-642-37456-2_14} #' #' Campello RJGB, Moulavi D, Zimek A, Sander J (2015). Hierarchical density #' estimates for data clustering, visualization, and outlier detection. #' _ACM Transactions on Knowledge Discovery from Data (TKDD),_ 10(5):1-51. #' \doi{10.1145/2733381} #' #' Malzer, C., & Baum, M. (2020). A Hybrid Approach To Hierarchical #' Density-based Cluster Selection. #' In 2020 IEEE International Conference on Multisensor Fusion #' and Integration for Intelligent Systems (MFI), pp. 223-228. #' \doi{10.1109/MFI49285.2020.9235263} #' @keywords model clustering hierarchical #' @examples #' ## cluster the moons data set with HDBSCAN #' data(moons) #' #' res <- hdbscan(moons, minPts = 5) #' res #' #' plot(res) #' clplot(moons, res) #' #' ## cluster the moons data set with HDBSCAN using Manhattan distances #' res <- hdbscan(dist(moons, method = "manhattan"), minPts = 5) #' plot(res) #' clplot(moons, res) #' #' ## Example for HDBSCAN(e) using cluster_selection_epsilon #' # data with clusters of various densities. 
#' X <- data.frame( #' x = c( #' 0.08, 0.46, 0.46, 2.95, 3.50, 1.49, 6.89, 6.87, 0.21, 0.15, #' 0.15, 0.39, 0.80, 0.80, 0.37, 3.63, 0.35, 0.30, 0.64, 0.59, 1.20, 1.22, #' 1.42, 0.95, 2.70, 6.36, 6.36, 6.36, 6.60, 0.04, 0.71, 0.57, 0.24, 0.24, #' 0.04, 0.04, 1.35, 0.82, 1.04, 0.62, 0.26, 5.98, 1.67, 1.67, 0.48, 0.15, #' 6.67, 6.67, 1.20, 0.21, 3.99, 0.12, 0.19, 0.15, 6.96, 0.26, 0.08, 0.30, #' 1.04, 1.04, 1.04, 0.62, 0.04, 0.04, 0.04, 0.82, 0.82, 1.29, 1.35, 0.46, #' 0.46, 0.04, 0.04, 5.98, 5.98, 6.87, 0.37, 6.47, 6.47, 6.47, 6.67, 0.30, #' 1.49, 3.21, 3.21, 0.75, 0.75, 0.46, 0.46, 0.46, 0.46, 3.63, 0.39, 3.65, #' 4.09, 4.01, 3.36, 1.43, 3.28, 5.94, 6.35, 6.87, 5.60, 5.99, 0.12, 0.00, #' 0.32, 0.39, 0.00, 1.63, 1.36, 5.67, 5.60, 5.79, 1.10, 2.99, 0.39, 0.18 #' ), #' y = c( #' 7.41, 8.01, 8.01, 5.44, 7.11, 7.13, 1.83, 1.83, 8.22, 8.08, #' 8.08, 7.20, 7.83, 7.83, 8.29, 5.99, 8.32, 8.22, 7.38, 7.69, 8.22, 7.31, #' 8.25, 8.39, 6.34, 0.16, 0.16, 0.16, 1.66, 7.55, 7.90, 8.18, 8.32, 8.32, #' 7.97, 7.97, 8.15, 8.43, 7.83, 8.32, 8.29, 1.03, 7.27, 7.27, 8.08, 7.27, #' 0.79, 0.79, 8.22, 7.73, 6.62, 7.62, 8.39, 8.36, 1.73, 8.29, 8.04, 8.22, #' 7.83, 7.83, 7.83, 8.32, 8.11, 7.69, 7.55, 7.20, 7.20, 8.01, 8.15, 7.55, #' 7.55, 7.97, 7.97, 1.03, 1.03, 1.24, 7.20, 0.47, 0.47, 0.47, 0.79, 8.22, #' 7.13, 6.48, 6.48, 7.10, 7.10, 8.01, 8.01, 8.01, 8.01, 5.99, 8.04, 5.22, #' 5.82, 5.14, 4.81, 7.62, 5.73, 0.55, 1.31, 0.05, 0.95, 1.59, 7.99, 7.48, #' 8.38, 7.12, 2.01, 1.40, 0.00, 9.69, 9.47, 9.25, 2.63, 6.89, 0.56, 3.11 #' ) #' ) #' #' ## HDBSCAN splits one cluster #' hdb <- hdbscan(X, minPts = 3) #' plot(hdb, show_flat = TRUE) #' hullplot(X, hdb, main = "HDBSCAN") #' #' ## DBSCAN* marks the least dense cluster as outliers #' db <- dbscan(X, eps = 1, minPts = 3, borderPoints = FALSE) #' hullplot(X, db, main = "DBSCAN*") #' #' ## HDBSCAN(e) mixes HDBSCAN AND DBSCAN* to find all clusters #' hdbe <- hdbscan(X, minPts = 3, cluster_selection_epsilon = 1) #' plot(hdbe, show_flat = TRUE) #' 
hullplot(X, hdbe, main = "HDBSCAN(e)") #' @export hdbscan <- function(x, minPts, cluster_selection_epsilon = 0.0, gen_hdbscan_tree = FALSE, gen_simplified_tree = FALSE, verbose = FALSE) { if (!inherits(x, "dist") && !.matrixlike(x)) { stop("hdbscan expects a numeric matrix or a dist object.") } ## 1. Calculate the mutual reachability between points if (verbose) { cat("Calculating core distances...\n") } coredist <- coredist(x, minPts) if (verbose) { cat("Calculating the mutual reachability matrix distances...\n") } mrd <- mrdist(x, minPts, coredist = coredist) n <- attr(mrd, "Size") ## 2. Construct a minimum spanning tree and convert to RSL representation if (verbose) { cat("Constructing the minimum spanning tree...\n") } mst <- mst(mrd, n) hc <- hclustMergeOrder(mst, order(mst[, 3])) hc$call <- match.call() ## 3. Prune the tree ## Process the hierarchy to retrieve all the necessary info needed by HDBSCAN if (verbose) { cat("Tree pruning...\n") } res <- computeStability(hc, minPts, compute_glosh = TRUE) res <- extractUnsupervised(res, cluster_selection_epsilon = cluster_selection_epsilon) cl <- attr(res, "cluster") ## 4. 
Extract the clusters if (verbose) { cat("Extract clusters...\n") } sl <- attr(res, "salient_clusters") ## Generate membership 'probabilities' using core distance as the measure of density prob <- rep(0, length(cl)) for (cid in sl) { max_f <- max(coredist[which(cl == cid)]) pr <- (max_f - coredist[which(cl == cid)]) / max_f prob[cl == cid] <- pr } ## Match cluster assignments to be incremental, with 0 representing noise if (any(cl == 0)) { cluster <- match(cl, c(0, sl)) - 1 } else { cluster <- match(cl, sl) } cl_map <- structure(sl, names = unique(cluster[hc$order][cluster[hc$order] != 0])) ## Stability scores ## NOTE: These scores represent the stability scores -before- the hierarchy traversal cluster_scores <- vapply(sl, function(sl_cid) { res[[as.character(sl_cid)]]$stability }, numeric(1L)) names(cluster_scores) <- names(cl_map) ## Return everything HDBSCAN does attr(res, "cl_map") <- cl_map # Mapping of hierarchical IDS to 'normalized' incremental ids out <- structure( list( cluster = cluster, minPts = minPts, coredist = coredist, cluster_scores = cluster_scores, # (Cluster-wide cumulative) Stability Scores membership_prob = prob, # Individual point membership probabilities outlier_scores = attr(res, "glosh"), # Outlier Scores hc = hc # Hclust object of MST (can be cut for quick assignments) ), class = "hdbscan", hdbscan = res ) # hdbscan attributes contains actual HDBSCAN hierarchy ## The trees don't need to be explicitly computed, but they may be useful if the user wants them if (gen_hdbscan_tree) { out$hdbscan_tree <- buildDendrogram(hc) } if (gen_simplified_tree) { out$simplified_tree <- simplifiedTree(res) } return(out) } #' @rdname hdbscan #' @export print.hdbscan <- function(x, ...) { writeLines(c( paste0("HDBSCAN clustering for ", nobs(x), " objects."), paste0("Parameters: minPts = ", x$minPts), paste0( "The clustering contains ", ncluster(x), " cluster(s) and ", nnoise(x), " noise points." 
) )) print(table(x$cluster)) cat("\n") writeLines(strwrap(paste0("Available fields: ", toString(names( x ))), exdent = 18)) } #' @rdname hdbscan #' @param leaflab a string specifying how leaves are labeled (see [stats::plot.dendrogram()]). #' @param ylab the label for the y axis. #' @param main Title of the plot. #' @export plot.hdbscan <- function(x, scale = "suggest", gradient = c("yellow", "red"), show_flat = FALSE, main = "HDBSCAN*", ylab = "eps value", leaflab = "none", ...) { ## Logic checks if (!(scale == "suggest" || scale > 0)) { stop("scale parameter must be greater than 0.") } ## Main information needed hd_info <- attr(x, "hdbscan") dend <- x$simplified_tree %||% simplifiedTree(hd_info) coords <- node_xy(hd_info, cl_hierarchy = attr(hd_info, "cl_hierarchy")) ## Variables to help setup the scaling of the plotting nclusters <- length(hd_info) npoints <- length(x$cluster) nleaves <- length(all_children( attr(hd_info, "cl_hierarchy"), key = 0, leaves_only = TRUE )) scale <- ifelse(scale == "suggest", nclusters, nclusters / scale) ## Color variables col_breaks <- seq(0, length(x$cluster) + nclusters, by = nclusters) gcolors <- grDevices::colorRampPalette(gradient)(length(col_breaks)) ## Depth-first search to recursively plot rectangles eps_dfs <- function(dend, index, parent_height, scale) { coord <- coords[index, ] cl_key <- as.character(attr(dend, "label")) ## widths == number of points in the cluster at each eps it was alive widths <- vapply(sort(hd_info[[cl_key]]$eps, decreasing = TRUE), function(eps) { sum(hd_info[[cl_key]]$eps <= eps) }, numeric(1L)) if (length(widths) > 0) { widths <- c(widths + hd_info[[cl_key]]$n_children, rep(hd_info[[cl_key]]$n_children, hd_info[[cl_key]]$n_children)) } else { widths <- rep(hd_info[[cl_key]]$n_children, hd_info[[cl_key]]$n_children) } ## Normalize and scale widths to length of x-axis normalize <- function(x) { (nleaves) * (x - 1) / (npoints - 1) } xleft <- coord[[1]] - normalize(widths) / scale xright <- coord[[1]] 
+ normalize(widths) / scale ## Top is always parent height, bottom is when the points died ## Minor adjustment made if at the root equivalent to plot.dendrogram(edge.root=T) if (cl_key == "0") { ytop <- rep(hd_info[[cl_key]]$eps_birth + 0.0625 * hd_info[[cl_key]]$eps_birth, length(widths)) ybottom <- rep(hd_info[[cl_key]]$eps_death, length(widths)) } else { ytop <- rep(parent_height, length(widths)) ybottom <- c( sort(hd_info[[cl_key]]$eps, decreasing = TRUE), rep(hd_info[[cl_key]]$eps_death, hd_info[[cl_key]]$n_children) ) } ## Draw the rectangles rect_color <- gcolors[.bincode(length(widths), breaks = col_breaks)] graphics::rect( xleft = xleft, xright = xright, ybottom = ybottom, ytop = ytop, col = rect_color, border = NA, lwd = 0 ) ## Highlight the most 'stable' clusters returned by the default flat cluster extraction if (show_flat) { salient_cl <- attr(hd_info, "salient_clusters") if (as.integer(attr(dend, "label")) %in% salient_cl) { x_adjust <- (max(xright) - min(xleft)) * 0.10 # 10% left/right border y_adjust <- (max(ytop) - min(ybottom)) * 0.025 # 2.5% above/below border graphics::rect( xleft = min(xleft) - x_adjust, xright = max(xright) + x_adjust, ybottom = min(ybottom) - y_adjust, ytop = max(ytop) + y_adjust, border = "red", lwd = 1 ) n_label <- names(which(attr(hd_info, "cl_map") == attr(dend, "label"))) text( x = coord[[1]], y = min(ybottom), pos = 1, labels = n_label ) } } ## Recurse in depth-first-manner if (is.leaf(dend)) { return(index) } else { left <- eps_dfs( dend[[1]], index = index + 1, parent_height = attr(dend, "height"), scale = scale ) right <- eps_dfs( dend[[2]], index = left + 1, parent_height = attr(dend, "height"), scale = scale ) return(right) } } ## Run the recursive plotting plot( dend, edge.root = TRUE, main = main, ylab = ylab, leaflab = leaflab, ... 
) eps_dfs(dend, index = 1, parent_height = 0, scale = scale) return(invisible(x)) } #' @rdname hdbscan #' @export coredist <- function(x, minPts) kNNdist(x, k = minPts - 1) #' @rdname hdbscan #' @export mrdist <- function(x, minPts, coredist = NULL) { if (inherits(x, "dist")) { .check_dist(x) x_dist <- x } else { x_dist <- dist(x, method = "euclidean", diag = FALSE, upper = FALSE) } if (is.null(coredist)) { coredist <- coredist(x, minPts) } # mr_dist <- as.vector(pmax(as.dist(outer(coredist, coredist, pmax)), x_dist)) # much faster in C++ mr_dist <- mrd(x_dist, coredist) class(mr_dist) <- "dist" attr(mr_dist, "Size") <- attr(x_dist, "Size") attr(mr_dist, "Diag") <- FALSE attr(mr_dist, "Upper") <- FALSE attr(mr_dist, "method") <- paste0("mutual reachability (", attr(x_dist, "method"), ")") mr_dist } ================================================ FILE: R/hullplot.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Plot Clusters #' #' This function produces a two-dimensional scatter plot of data points #' and colors the data points according to a supplied clustering. 
Noise points #' are marked as `x`. `hullplot()` also adds convex hulls to clusters. #' #' @name hullplot #' @aliases hullplot clplot #' #' @param x a data matrix. If more than 2 columns are provided, then the data #' is plotted using the first two principal components. #' @param cl a clustering. Either a numeric cluster assignment vector or a #' clustering object (a list with an element named `cluster`). #' @param col colors used for clusters. Defaults to the standard palette. The #' first color (default is black) is used for noise/unassigned points (cluster #' id 0). #' @param pch a vector of plotting characters. By default `o` is used for #' points and `x` for noise points. #' @param cex expansion factor for symbols. #' @param hull_lwd,hull_lty line width and line type used for the convex hull. #' @param main main title. #' @param solid,alpha draw filled polygons instead of just lines for the convex #' hulls? alpha controls the level of alpha shading. #' @param ... additional arguments passed on to plot. 
#' @author Michael Hahsler #' @keywords plot clustering #' @examples #' set.seed(2) #' n <- 400 #' #' x <- cbind( #' x = runif(4, 0, 1) + rnorm(n, sd = 0.1), #' y = runif(4, 0, 1) + rnorm(n, sd = 0.1) #' ) #' cl <- rep(1:4, times = 100) #' #' #' ### original data with true clustering #' clplot(x, cl, main = "True clusters") #' hullplot(x, cl, main = "True clusters") #' ### use different symbols #' hullplot(x, cl, main = "True clusters", pch = cl) #' ### just the hulls #' hullplot(x, cl, main = "True clusters", pch = NA) #' ### a version suitable for b/w printing #' hullplot(x, cl, main = "True clusters", solid = FALSE, #' col = c("grey", "black"), pch = cl) #' #' #' ### run some clustering algorithms and plot the results #' db <- dbscan(x, eps = .07, minPts = 10) #' clplot(x, db, main = "DBSCAN") #' hullplot(x, db, main = "DBSCAN") #' #' op <- optics(x, eps = 10, minPts = 10) #' opDBSCAN <- extractDBSCAN(op, eps_cl = .07) #' hullplot(x, opDBSCAN, main = "OPTICS") #' #' opXi <- extractXi(op, xi = 0.05) #' hullplot(x, opXi, main = "OPTICSXi") #' #' # Extract minimal 'flat' clusters only #' opXi <- extractXi(op, xi = 0.05, minimum = TRUE) #' hullplot(x, opXi, main = "OPTICSXi") #' #' km <- kmeans(x, centers = 4) #' hullplot(x, km, main = "k-means") #' #' hc <- cutree(hclust(dist(x)), k = 4) #' hullplot(x, hc, main = "Hierarchical Clustering") #' @export hullplot <- function(x, cl, col = NULL, pch = NULL, cex = 0.5, hull_lwd = 1, hull_lty = 1, solid = TRUE, alpha = .2, main = "Convex Cluster Hulls", ...)
{ ### handle d>2 by using PCA if (ncol(x) > 2) x <- prcomp(x)$x ### extract clustering (keep hierarchical OPTICSXi structure) if (inherits(cl, "optics") || "clusters_xi" %in% names(cl)) { clusters_xi <- cl$clusters_xi cl_order <- cl$order } else clusters_xi <- NULL if (is.list(cl)) cl <- cl$cluster if (!is.numeric(cl)) stop("Could not get cluster assignment vector from cl.") #if(is.null(col)) col <- c("#000000FF", rainbow(n=max(cl))) if (is.null(col)) col <- palette() # Note: We use the first color for noise points if (length(col) == 1L) col <- c(col, col) col_noise <- col[1] col <- col[-1] if (max(cl) > length(col)) { warning("Not enough colors. Some colors will be reused.") col <- rep(col, length.out = max(cl)) } # mark noise points pch <- pch %||% ifelse(cl == 0L, 4L, 1L) plot(x[, 1:2], col = c(col_noise, col)[cl + 1L], pch = pch, cex = cex, main = main, ...) col_poly <- adjustcolor(col, alpha.f = alpha) border <- col ## no border? if (is.null(hull_lwd) || is.na(hull_lwd) || hull_lwd == 0) { hull_lwd <- 1 border <- NA } if (!is.null(clusters_xi)) { ## This is necessary for larger datasets: Ensure largest is plotted first clusters_xi <- clusters_xi[order(-(clusters_xi$end - clusters_xi$start)), ] # Order by size (descending) ci_order <- clusters_xi$cluster_id } else { ci_order <- 1:max(cl) } for (i in seq_along(ci_order)) { ### use all the points for OPTICSXi's hierarchical structure if (is.null(clusters_xi)) { d <- x[cl == i, , drop = FALSE] } else { d <- x[cl_order[clusters_xi$start[i]:clusters_xi$end[i]], , drop = FALSE] } ch <- chull(d) ch <- c(ch, ch[1]) if (!solid) { lines(d[ch, ], col = border[ci_order[i]], lwd = hull_lwd, lty = hull_lty) } else { polygon( d[ch, ], col = col_poly[ci_order[i]], lwd = hull_lwd, lty = hull_lty, border = border[ci_order[i]] ) } } } #' @rdname hullplot #' @export clplot <- function(x, cl, col = NULL, pch = NULL, cex = 0.5, main = "Cluster Plot", ...)
hullplot(x, cl = cl, col = col, pch = pch, cex = cex, main = main, solid = FALSE, hull_lwd = NA) ================================================ FILE: R/jpclust.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2017 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Jarvis-Patrick Clustering #' #' Fast C++ implementation of Jarvis-Patrick clustering, which first builds #' a shared nearest neighbor graph (k nearest neighbor sparsification) and then #' places two points in the same cluster if they are in each other's nearest #' neighbor list and they share at least kt nearest neighbors. #' #' Following the original paper, the shared nearest neighbor list is #' constructed as the k neighbors plus the point itself (as neighbor zero). #' Therefore, the threshold `kt` needs to be in the range \eqn{[1, k]}. #' #' Fast nearest neighbor search with [kNN()] is only used if `x` is #' a matrix. In this case Euclidean distance is used. #' #' @aliases jpclust print.general_clustering #' @family clustering functions #' #' @param x a data matrix/data.frame (Euclidean distance is used), a #' precomputed [dist] object or a kNN object created with [kNN()].
#' @param k Neighborhood size for nearest neighbor sparsification. If `x` #' is a kNN object then `k` may be missing. #' @param kt threshold on the number of shared nearest neighbors (including the #' points themselves) to form clusters. Range: \eqn{[1, k]} #' @param ... additional arguments are passed on to the k nearest neighbor #' search algorithm. See [kNN()] for details on how to control the #' search strategy. #' #' @return An object of class `general_clustering` with the following #' components: #' \item{cluster }{An integer vector with cluster assignments. Zero #' indicates noise points.} #' \item{type }{ name of the clustering algorithm used.} #' \item{metric }{ the distance metric used for clustering.} #' \item{param }{ list of the clustering parameters used. } #' #' @author Michael Hahsler #' @references R. A. Jarvis and E. A. Patrick. 1973. Clustering Using a #' Similarity Measure Based on Shared Near Neighbors. _IEEE Trans. Comput. #' 22,_ 11 (November 1973), 1025-1034. #' \doi{10.1109/T-C.1973.223640} #' @keywords model clustering #' @examples #' data("DS3") #' #' # use a shared neighborhood of 20 points and require 12 shared neighbors #' cl <- jpclust(DS3, k = 20, kt = 12) #' cl #' #' clplot(DS3, cl) #' # Note: JP clustering does not consider noise and thus, #' # the sine wave points chain clusters together. #' #' # use a precomputed kNN object instead of the original data. #' nn <- kNN(DS3, k = 30) #' nn #' #' cl <- jpclust(nn, k = 20, kt = 12) #' cl #' #' # cluster with noise removed (use low pointdensity to identify noise) #' d <- pointdensity(DS3, eps = 25) #' hist(d, breaks = 20) #' DS3_noiseless <- DS3[d > 110,] #' #' cl <- jpclust(DS3_noiseless, k = 20, kt = 10) #' cl #' #' clplot(DS3_noiseless, cl) #' @export jpclust <- function(x, k, kt, ...) { # Create NN graph if (missing(k) && inherits(x, "kNN")) k <- x$k if (length(kt) != 1 || kt < 1 || kt > k) stop("kt needs to be a threshold in range [1, k].") nn <- kNN(x, k, sort = FALSE, ...)
# Perform clustering cl <- JP_int(nn$id, kt = as.integer(kt)) structure( list( cluster = as.integer(factor(cl)), type = "Jarvis-Patrick clustering", metric = nn$metric, param = list(k = k, kt = kt) ), class = c("general_clustering") ) } #' @export print.general_clustering <- function(x, ...) { cl <- unique(x$cluster) cl <- length(cl[cl != 0L]) writeLines(c( paste0(x$type, " for ", length(x$cluster), " objects."), paste0("Parameters: ", paste( names(x$param), unlist(x$param, use.names = FALSE), sep = " = ", collapse = ", " )), paste0( "The clustering contains ", cl, " cluster(s) and ", sum(x$cluster == 0L), " noise points." ) )) print(table(x$cluster)) cat("\n") writeLines(strwrap(paste0( "Available fields: ", toString(names(x)) ), exdent = 18)) } ================================================ FILE: R/kNN.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' Find the k Nearest Neighbors #' #' This function uses a kd-tree to find all k nearest neighbors in a data #' matrix (including distances) fast. 
#'
#' **Ties:** If the kth and the (k+1)th nearest neighbor are tied, then the
#' neighbor found first is returned and the other one is ignored.
#'
#' **Self-matches:** If no query is specified, then self-matches are
#' removed.
#'
#' Details on the search parameters:
#'
#' * `search` controls whether a kd-tree or a linear search (both implemented
#' in the ANN library; see Mount and Arya, 2010) is used. Note that these
#' implementations cannot handle NAs. `search = "dist"` precomputes Euclidean
#' distances first using R. NAs in the data are handled, but the resulting
#' distance matrix cannot contain NAs. To use other distance measures, a
#' precomputed distance matrix can be provided as `x` (`search` is ignored).
#'
#' * `bucketSize` and `splitRule` influence how the kd-tree is
#' built. `approx` uses the approximate nearest neighbor search
#' implemented in ANN. All nearest neighbors up to a distance of
#' `eps / (1 + approx)` will be considered and all with a distance
#' greater than `eps` will not be considered. The other points might be
#' considered. Note that this results in some actual nearest neighbors being
#' omitted, leading to spurious clusters and noise points. However, the
#' algorithm will enjoy a significant speedup. For more details see Mount and
#' Arya (2010).
#'
#' @aliases kNN knn
#' @family NN functions
#'
#' @param x a data matrix, a [dist] object or a [kNN] object.
#' @param k number of neighbors to find.
#' @param query a data matrix with the points to query. If query is not
#' specified, the NN for all the points in `x` is returned. If query is
#' specified then `x` needs to be a data matrix.
#' @param search nearest neighbor search strategy (one of `"kdtree"`, `"linear"` or
#' `"dist"`).
#' @param sort sort the neighbors by distance? Note that some search methods
#' already sort the results. Sorting is expensive and `sort = FALSE` may
#' be much faster for some search methods. kNN objects can be sorted using
#' `sort()`.
#' @param bucketSize max size of the kd-tree leaves.
#' @param splitRule rule to split the kd-tree. One of `"STD"`, `"MIDPT"`, `"FAIR"`,
#' `"SL_MIDPT"`, `"SL_FAIR"` or `"SUGGEST"` (SL stands for sliding). `"SUGGEST"` uses
#' ANN's best guess.
#' @param approx use approximate nearest neighbors. All NN up to a distance of
#' a factor of `(1 + approx)` times `eps` may be used. Some actual NN may be
#' omitted, leading to spurious clusters and noise points. However, the
#' algorithm will enjoy a significant speedup.
#' @param decreasing sort in decreasing order?
#' @param ... further arguments
#'
#' @return An object of class `kNN` (subclass of [NN]) containing a
#' list with the following components:
#' \item{dist }{a matrix with distances. }
#' \item{id }{a matrix with `ids`. }
#' \item{k }{number `k` used. }
#' \item{metric }{ used distance metric. }
#'
#' @author Michael Hahsler
#' @references David M. Mount and Sunil Arya (2010). ANN: A Library for
#' Approximate Nearest Neighbor Searching,
#' \url{http://www.cs.umd.edu/~mount/ANN/}.
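The accuracy/speed trade-off of `approx` can be checked empirically. A small sketch (random data; the parameter values are chosen arbitrarily for illustration):

```r
library(dbscan)

set.seed(1)
x <- matrix(runif(2000), ncol = 2)      # 1000 random 2-d points

nn_exact  <- kNN(x, k = 5)              # exact kd-tree search
nn_approx <- kNN(x, k = 5, approx = 2)  # approximate search, faster on large data

# fraction of neighbor ids agreeing with the exact search;
# with approx > 0 some true neighbors may be missed
mean(nn_approx$id == nn_exact$id)
```

The agreement fraction depends on the data and on `approx`; larger `approx` values trade more accuracy for speed.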
#' @keywords model
#' @examples
#' data(iris)
#' x <- iris[, -5]
#'
#' # Example 1: finding kNN for all points in a data matrix (using a kd-tree)
#' nn <- kNN(x, k = 5)
#' nn
#'
#' # explore neighborhood of point 10
#' i <- 10
#' nn$id[i,]
#' plot(x, col = ifelse(seq_len(nrow(iris)) %in% nn$id[i,], "red", "black"))
#'
#' # visualize the 5 nearest neighbors
#' plot(nn, x)
#'
#' # visualize a reduced 2-NN graph
#' plot(kNN(nn, k = 2), x)
#'
#' # Example 2: find kNN for query points
#' q <- x[c(1,100),]
#' nn <- kNN(x, k = 10, query = q)
#'
#' plot(nn, x, col = "grey")
#' points(q, pch = 3, lwd = 2)
#'
#' # Example 3: find kNN using distances
#' d <- dist(x, method = "manhattan")
#' nn <- kNN(d, k = 1)
#' plot(nn, x)
#' @export
kNN <- function(x,
  k,
  query = NULL,
  sort = TRUE,
  search = "kdtree",
  bucketSize = 10,
  splitRule = "suggest",
  approx = 0) {
  if (inherits(x, "kNN")) {
    if (x$k < k)
      stop("kNN in x does not have enough nearest neighbors.")
    if (!x$sort)
      x <- sort(x)

    x$id <- x$id[, 1:k]
    if (!is.null(x$dist))
      x$dist <- x$dist[, 1:k]
    if (!is.null(x$shared))
      x$shared <- x$shared[, 1:k]
    x$k <- k
    return(x)
  }

  search <- .parse_search(search)
  splitRule <- .parse_splitRule(splitRule)

  k <- as.integer(k)
  if (k < 1)
    stop("Illegal k: needs to be k>=1!")

  ### dist search
  if (search == 3 && !inherits(x, "dist")) {
    if (.matrixlike(x))
      x <- dist(x)
    else
      stop("x needs to be a matrix to calculate distances")
  }

  ### get kNN from a dist object
  if (inherits(x, "dist")) {
    if (!is.null(query))
      stop("query can only be used if x contains a data matrix.")
    if (anyNA(x))
      stop("distances cannot be NAs for kNN!")
    return(dist_to_kNN(x, k = k))
  }

  ## make sure x is numeric
  if (!.matrixlike(x))
    stop("x needs to be a matrix to calculate distances")
  x <- as.matrix(x)
  if (storage.mode(x) == "integer")
    storage.mode(x) <- "double"
  if (storage.mode(x) != "double")
    stop("x has to be a numeric matrix.")

  if (!is.null(query)) {
    query <- as.matrix(query)
    if (storage.mode(query) == "integer")
      storage.mode(query) <- "double"
if (storage.mode(query) != "double") stop("query has to be NULL or a numeric matrix.") if (ncol(x) != ncol(query)) stop("x and query need to have the same number of columns!") } if (k >= nrow(x)) stop("Not enough neighbors in data set!") if (anyNA(x)) stop("data/distances cannot contain NAs for kNN (with kd-tree)!") ## returns NO self matches if (!is.null(query)) { ret <- kNN_query_int( as.matrix(x), as.matrix(query), as.integer(k), as.integer(search), as.integer(bucketSize), as.integer(splitRule), as.double(approx) ) dimnames(ret$dist) <- list(rownames(query), 1:k) dimnames(ret$id) <- list(rownames(query), 1:k) } else { ret <- kNN_int( as.matrix(x), as.integer(k), as.integer(search), as.integer(bucketSize), as.integer(splitRule), as.double(approx) ) dimnames(ret$dist) <- list(rownames(x), 1:k) dimnames(ret$id) <- list(rownames(x), 1:k) } class(ret) <- c("kNN", "NN") ### ANN already returns them sorted (by dist but not by ID) if (sort) ret <- sort(ret) ret$metric <- "euclidean" ret } # make sure we have a lower-triangle representation w/o diagonal .check_dist <- function(x) { if (!inherits(x, "dist")) stop("x needs to be a dist object") # cluster::dissimilarity does not have Diag or Upper attributes, but is a lower triangle # representation if (inherits(x, "dissimilarity")) return(TRUE) # check that dist objects have diag = FALSE, upper = FALSE if (attr(x, "Diag") || attr(x, "Upper")) stop("x needs to be a dist object with attributes Diag and Upper set to FALSE. 
Use as.dist(x, diag = FALSE, upper = FALSE) first.")
}

dist_to_kNN <- function(x, k) {
  .check_dist(x)

  n <- attr(x, "Size")
  id <- structure(integer(n * k), dim = c(n, k))
  d <- matrix(NA_real_, nrow = n, ncol = k)

  for (i in seq_len(n)) {
    ### Inf -> no self-matches
    y <- dist_row(x, i, self_val = Inf)
    o <- order(y, decreasing = FALSE)
    o <- o[seq_len(k)]
    id[i, ] <- o
    d[i, ] <- y[o]
  }

  dimnames(id) <- list(labels(x), seq_len(k))
  dimnames(d) <- list(labels(x), seq_len(k))

  ret <- structure(list(
    dist = d,
    id = id,
    k = k,
    sort = TRUE,
    metric = attr(x, "method")
  ),
  class = c("kNN", "NN"))
  return(ret)
}

#' @rdname kNN
#' @export
sort.kNN <- function(x, decreasing = FALSE, ...) {
  if (isTRUE(x$sort))
    return(x)
  if (is.null(x$dist))
    stop("Unable to sort. Distances are missing.")
  if (ncol(x$id) < 2) {
    x$sort <- TRUE
    return(x)
  }

  ## sort first by dist and break ties using id
  o <- vapply(
    seq_len(nrow(x$dist)),
    function(i)
      order(x$dist[i, ], x$id[i, ], decreasing = decreasing),
    integer(ncol(x$id))
  )
  for (i in seq_len(ncol(o))) {
    x$dist[i, ] <- x$dist[i, ][o[, i]]
    x$id[i, ] <- x$id[i, ][o[, i]]
  }

  x$sort <- TRUE
  x
}

#' @rdname kNN
#' @export
adjacencylist.kNN <- function(x, ...)
  lapply(
    seq_len(nrow(x$id)),
    FUN = function(i) {
      ## filter NAs
      tmp <- x$id[i, ]
      tmp[!is.na(tmp)]
    }
  )

#' @rdname kNN
#' @export
print.kNN <- function(x, ...)
{
  cat("k-nearest neighbors for ", nrow(x$id), " objects (k=", x$k, ").", "\n", sep = "")
  cat("Distance metric:", x$metric, "\n")
  cat("\nAvailable fields: ", toString(names(x)), "\n", sep = "")
}

# Convert names to integers for C++
.parse_search <- function(search) {
  search <- pmatch(toupper(search), c("KDTREE", "LINEAR", "DIST"))
  if (is.na(search))
    stop("Unknown NN search type!")
  search
}

.parse_splitRule <- function(splitRule) {
  splitRule <- pmatch(toupper(splitRule), .ANNsplitRule) - 1L
  if (is.na(splitRule))
    stop("Unknown splitRule!")
  splitRule
}
================================================ FILE: R/kNNdist.R ================================================
#######################################################################
# dbscan - Density Based Clustering of Applications with Noise
# and Related Algorithms
# Copyright (C) 2015 Michael Hahsler
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

#' Calculate and Plot k-Nearest Neighbor Distances
#'
#' Fast calculation of the k-nearest neighbor distances for a dataset
#' represented as a matrix of points. The kNN distance is defined as the
#' distance from a point to its kth nearest neighbor. The kNN distance plot
#' displays the kNN distance of all points sorted from smallest to largest. The
#' plot can be used to help find suitable parameter values for [dbscan()].
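For small data sets, this definition can be verified against a full distance matrix. A brute-force cross-check sketch (`kNNdist()` itself scales much better than this):

```r
library(dbscan)

x <- as.matrix(iris[, 1:4])
d <- as.matrix(dist(x))
diag(d) <- Inf  # exclude self-matches

# 4th smallest remaining distance in each row = 4-NN distance
manual <- apply(d, 1, function(row) sort(row)[4])

all.equal(unname(kNNdist(x, k = 4)), unname(manual))
```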
#'
#' @family Outlier Detection Functions
#' @family NN functions
#'
#' @param x the data set as a matrix of points (Euclidean distance is used) or
#' a precalculated [dist] object.
#' @param k number of nearest neighbors used for the distance calculation. For
#' `kNNdistplot()` also a range of values for `k` or `minPts` can be specified.
#' @param minPts to use a k-NN plot to determine a suitable `eps` value for [dbscan()],
#' `minPts` used in dbscan can be specified and will set `k = minPts - 1`.
#' @param all should a matrix with the distances to all k nearest neighbors be
#' returned?
#' @param ... further arguments (e.g., kd-tree related parameters) are passed
#' on to [kNN()].
#'
#' @return `kNNdist()` returns a numeric vector with the distance to its kth
#' nearest neighbor. If `all = TRUE` then a matrix with k columns
#' containing the distances to all 1st, 2nd, ..., kth nearest neighbors is
#' returned instead.
#'
#' @author Michael Hahsler
#' @keywords model plot
#' @examples
#' data(iris)
#' iris <- as.matrix(iris[, 1:4])
#'
#' ## Find the 4-NN distance for each observation (see ?kNN
#' ## for different search strategies)
#' kNNdist(iris, k = 4)
#'
#' ## Get a matrix with distances to the 1st, 2nd, ..., 4th NN.
#' kNNdist(iris, k = 4, all = TRUE)
#'
#' ## Produce a k-NN distance plot to determine a suitable eps for
#' ## DBSCAN with MinPts = 5. Use k = 4 (= MinPts - 1).
#' ## The knee is visible around a distance of .7
#' kNNdistplot(iris, k = 4)
#'
#' ## Look at all k-NN distance plots for a k of 1 to 20
#' ## Note that k-NN distances are increasing in k
#' kNNdistplot(iris, k = 1:20)
#'
#' cl <- dbscan(iris, eps = .7, minPts = 5)
#' pairs(iris, col = cl$cluster + 1L)
#' ## Note: black points are noise points
#' @export
kNNdist <- function(x, k, all = FALSE, ...) {
  kNNd <- kNN(x, k, sort = TRUE, ...)$dist
  if (!all)
    kNNd <- kNNd[, k]
  kNNd
}

#' @rdname kNNdist
#' @export
kNNdistplot <- function(x, k, minPts, ...)
{
  if (missing(k) && missing(minPts))
    stop("k or minPts need to be specified.")
  if (missing(k))
    k <- minPts - 1

  if (length(k) == 1) {
    kNNdist <- sort(kNNdist(x, k, ...))
    plot(
      kNNdist,
      type = "l",
      ylab = paste0(k, "-NN distance"),
      xlab = "Points sorted by distance"
    )
  } else {
    knnds <- vapply(k, function(i) sort(kNNdist(x, i, ...)), numeric(nrow(x)))
    matplot(knnds,
      type = "l",
      lty = 1,
      ylab = paste0("k-NN distance"),
      xlab = "Points sorted by distance")
  }
}
================================================ FILE: R/moons.R ================================================
#######################################################################
# dbscan - Density Based Clustering of Applications with Noise
# and Related Algorithms
# Copyright (C) 2015 Michael Hahsler
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

#' Moons Data
#'
#' Contains 100 2-d points, half of which are contained in two moons or
#' "blobs" (25 points each blob), and the other half in asymmetric facing
#' crescent shapes. The three shapes are all linearly separable.
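The shipped data set was generated in Python with scikit-learn (see the commands below); a rough base-R approximation of the two-crescents part, for illustration only (the helper and its offsets are hypothetical, not how the data was actually made):

```r
# Rough R analogue of scikit-learn's make_moons (hypothetical helper).
make_moons_r <- function(n = 50, noise = 0.05) {
  t <- seq(0, pi, length.out = n %/% 2)
  upper <- cbind(cos(t), sin(t))            # upper crescent
  lower <- cbind(1 - cos(t), 0.5 - sin(t))  # lower, shifted crescent
  pts <- rbind(upper, lower)
  pts + matrix(rnorm(length(pts), sd = noise), ncol = 2)
}

plot(make_moons_r(100), pch = 20)
```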
#' #' This data was generated with the following Python commands using the #' SciKit-Learn library: #' #' `> import sklearn.datasets as data` #' #' `> moons = data.make_moons(n_samples=50, noise=0.05)` #' #' `> blobs = data.make_blobs(n_samples=50, centers=[(-0.75,2.25), (1.0, 2.0)], cluster_std=0.25)` #' #' `> test_data = np.vstack([moons, blobs])` #' #' @name moons #' @docType data #' @format A data frame with 100 observations on the following 2 variables. #' \describe{ #' \item{X}{a numeric vector} #' \item{Y}{a numeric vector} } #' @references Pedregosa, Fabian, Gael Varoquaux, Alexandre Gramfort, #' Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel et al. #' Scikit-learn: Machine learning in Python. _Journal of Machine Learning #' Research_ 12, no. Oct (2011): 2825-2830. #' @source See the HDBSCAN notebook from github documentation: #' \url{http://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html} #' @keywords datasets #' @examples #' data(moons) #' plot(moons, pch=20) NULL ================================================ FILE: R/ncluster.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2015 Michael Hahsler, Matt Piekenbrock # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. 
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

#' Number of Clusters, Noise Points, and Observations
#'
#' Extract the number of clusters or the number of noise points for
#' a clustering. This function works with any clustering result that
#' contains a list element named `cluster` with a clustering vector. In
#' addition, `nobs` (see [stats::nobs()]) is also available to retrieve
#' the number of clustered points.
#'
#' @name ncluster
#' @aliases ncluster nnoise nobs
#' @family clustering functions
#'
#' @param object a clustering result object containing a `cluster` element.
#' @param ... additional arguments are unused.
#'
#' @return returns the number of clusters or noise points.
#' @examples
#' data(iris)
#' iris <- as.matrix(iris[, 1:4])
#'
#' res <- dbscan(iris, eps = .7, minPts = 5)
#' res
#'
#' ncluster(res)
#' nnoise(res)
#' nobs(res)
#'
#' # the functions also work with kmeans and other clustering algorithms.
#' cl <- kmeans(iris, centers = 3)
#' ncluster(cl)
#' nnoise(cl)
#' nobs(cl)
#' @export
ncluster <- function(object, ...) {
  UseMethod("ncluster")
}

#' @export
ncluster.default <- function(object, ...) {
  if (!is.list(object) || !is.numeric(object$cluster))
    stop("ncluster() requires a clustering object with a cluster component containing the cluster labels.")
  length(setdiff(unique(object$cluster), 0L))
}

#' @rdname ncluster
#' @export
nnoise <- function(object, ...) {
  UseMethod("nnoise")
}

#' @export
nnoise.default <- function(object, ...)
{
  if (!is.list(object) || !is.numeric(object$cluster))
    stop("nnoise() requires a clustering object with a cluster component containing the cluster labels.")
  sum(object$cluster == 0L)
}
================================================ FILE: R/nobs.R ================================================
#' @importFrom stats nobs
#' @export
nobs.dbscan <- function(object, ...) length(object$cluster)

#' @export
nobs.hdbscan <- function(object, ...) length(object$cluster)

#' @export
nobs.general_clustering <- function(object, ...) length(object$cluster)
================================================ FILE: R/optics.R ================================================
#######################################################################
# dbscan - Density Based Clustering of Applications with Noise
# and Related Algorithms
# Copyright (C) 2015 Michael Hahsler
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

#' Ordering Points to Identify the Clustering Structure (OPTICS)
#'
#' Implementation of the OPTICS (Ordering points to identify the clustering
#' structure) point ordering algorithm using a kd-tree.
#'
#' **The algorithm**
#'
#' This implementation of OPTICS implements the original
#' algorithm as described by Ankerst et al (1999). OPTICS is an ordering
#' algorithm with methods to extract a clustering from the ordering.
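The two distances OPTICS is built on can be written down directly. A definitional sketch (these helpers are illustrative, not the package's C++ implementation; that the point itself counts towards `minPts`, hence `k = minPts - 1`, is an assumption made here):

```r
library(dbscan)

# Core distance: distance to the minPts-nearest neighbor
# (assuming the point itself counts as one of the minPts points).
core_dist <- function(x, minPts) kNNdist(x, k = minPts - 1)

# Reachability distance of point p from point o:
# at least the core distance of o, otherwise the actual distance.
reach_dist <- function(x, o, p, minPts) {
  d_op <- sqrt(sum((x[o, ] - x[p, ])^2))  # Euclidean distance o -> p
  max(core_dist(x, minPts)[o], d_op)
}
```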
#' While using similar concepts as DBSCAN, for OPTICS `eps`
#' is only an upper limit for the neighborhood size used to reduce
#' computational complexity. Note that `minPts` in OPTICS has a different
#' effect than in DBSCAN. It is used to define dense neighborhoods, but since
#' `eps` is typically set rather high, this does not affect the ordering
#' much. However, it is also used to calculate the reachability distance, and
#' larger values will make the reachability distance plot smoother.
#'
#' OPTICS linearly orders the data points such that points which are spatially
#' closest become neighbors in the ordering. The closest analog to this
#' ordering is the dendrogram in single-link hierarchical clustering. The algorithm
#' also calculates the reachability distance for each point.
#' `plot()` (see [reachability_plot])
#' produces a reachability plot which shows each point's reachability distance,
#' with the points sorted in OPTICS order. Valleys represent clusters (the
#' deeper the valley, the more dense the cluster) and high points indicate
#' points between clusters.
#'
#' **Specifying the data**
#'
#' If `x` is specified as a data matrix, then Euclidean distances and fast
#' nearest neighbor lookup using a kd-tree are used. See [kNN()] for
#' details on the parameters for the kd-tree.
#'
#' **Extracting a clustering**
#'
#' Several methods to extract a clustering from the order returned by OPTICS are
#' implemented:
#'
#' * `extractDBSCAN()` extracts a clustering from an OPTICS ordering that is
#' similar to what DBSCAN would produce with an eps set to `eps_cl` (see
#' Ankerst et al, 1999). The only difference to a DBSCAN clustering is that
#' OPTICS is not able to assign some border points and reports them instead as
#' noise.
#'
#' * `extractXi()` extracts clusters hierarchically, as specified in Ankerst et al
#' (1999), based on the steepness of the reachability plot.
One interpretation #' of the `xi` parameter is that it classifies clusters by change in #' relative cluster density. The used algorithm was originally contributed by #' the ELKI framework and is explained in Schubert et al (2018), but contains a #' set of fixes. #' #' **Predict cluster memberships** #' #' `predict()` requires an extracted DBSCAN clustering with `extractDBSCAN()` and then #' uses predict for `dbscan()`. #' #' @aliases optics OPTICS #' @family clustering functions #' #' @param x a data matrix or a [dist] object. #' @param eps upper limit of the size of the epsilon neighborhood. Limiting the #' neighborhood size improves performance and has no or very little impact on #' the ordering as long as it is not set too low. If not specified, the largest #' minPts-distance in the data set is used which gives the same result as #' infinity. #' @param minPts the parameter is used to identify dense neighborhoods and the #' reachability distance is calculated as the distance to the minPts nearest #' neighbor. Controls the smoothness of the reachability distribution. Default #' is 5 points. #' @param eps_cl Threshold to identify clusters (`eps_cl <= eps`). #' @param xi Steepness threshold to identify clusters hierarchically using the #' Xi method. #' @param object an object of class `optics`. #' @param minimum logical, representing whether or not to extract the minimal #' (non-overlapping) clusters in the Xi clustering algorithm. #' @param correctPredecessors logical, correct a common artifact by pruning #' the steep up area for points that have predecessors not in the #' cluster--found by the ELKI framework, see details below. #' @param ... additional arguments are passed on to fixed-radius nearest #' neighbor search algorithm. See [frNN()] for details on how to #' control the search strategy. #' @param cluster,predecessor plot clusters and predecessors. #' #' @return An object of class `optics` with components: #' \item{eps }{ value of `eps` parameter. 
}
#' \item{minPts }{ value of `minPts` parameter. }
#' \item{order }{ optics order for the data points in `x`. }
#' \item{reachdist }{ [reachability] distance for each data point in `x`. }
#' \item{coredist }{ core distance for each data point in `x`. }
#'
#' For `extractDBSCAN()`, in addition the following
#' components are available:
#' \item{eps_cl }{ the value of the `eps_cl` parameter. }
#' \item{cluster }{ assigned cluster labels in the order of the data points in `x`. }
#'
#' For `extractXi()`, in addition the following components
#' are available:
#' \item{xi}{ Steepness threshold `xi`. }
#' \item{cluster }{ assigned cluster labels in the order of the data points in `x`.}
#' \item{clusters_xi }{ data.frame containing the start and end of each cluster
#' found in the OPTICS ordering. }
#'
#' @author Michael Hahsler and Matthew Piekenbrock
#' @seealso Density [reachability].
#'
#' @references Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg
#' Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure.
#' _ACM SIGMOD international conference on Management of data._ ACM Press.
#' \doi{10.1145/304181.304187}
#'
#' Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based
#' Clustering with R. _Journal of Statistical Software_, 91(1), 1-30.
#' \doi{10.18637/jss.v091.i01}
#'
#' Erich Schubert, Michael Gertz (2018). Improving the Cluster Structure
#' Extracted from OPTICS Plots. In _Lernen, Wissen, Daten, Analysen (LWDA 2018),_
#' pp. 318-329.
#' @keywords model clustering #' @examples #' set.seed(2) #' n <- 400 #' #' x <- cbind( #' x = runif(4, 0, 1) + rnorm(n, sd = 0.1), #' y = runif(4, 0, 1) + rnorm(n, sd = 0.1) #' ) #' #' plot(x, col=rep(1:4, times = 100)) #' #' ### run OPTICS (Note: we use the default eps calculation) #' res <- optics(x, minPts = 10) #' res #' #' ### get order #' res$order #' #' ### plot produces a reachability plot #' plot(res) #' #' ### plot the order of points in the reachability plot #' plot(x, col = "grey") #' polygon(x[res$order, ]) #' #' ### extract a DBSCAN clustering by cutting the reachability plot at eps_cl #' res <- extractDBSCAN(res, eps_cl = .065) #' res #' #' plot(res) ## black is noise #' hullplot(x, res) #' #' ### re-cut at a higher eps threshold #' res <- extractDBSCAN(res, eps_cl = .07) #' res #' plot(res) #' hullplot(x, res) #' #' ### extract hierarchical clustering of varying density using the Xi method #' res <- extractXi(res, xi = 0.01) #' res #' #' plot(res) #' hullplot(x, res) #' #' # Xi cluster structure #' res$clusters_xi #' #' ### use OPTICS on a precomputed distance matrix #' d <- dist(x) #' res <- optics(d, minPts = 10) #' plot(res) #' @export optics <- function(x, eps = NULL, minPts = 5, ...) { ### find eps from minPts eps <- eps %||% max(kNNdist(x, k = minPts)) ### extra contains settings for frNN ### search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0 extra <- list(...) 
args <- c("search", "bucketSize", "splitRule", "approx") m <- pmatch(names(extra), args) if (anyNA(m)) stop("Unknown parameter: ", toString(names(extra)[is.na(m)])) names(extra) <- args[m] search <- .parse_search(extra$search %||% "kdtree") splitRule <- .parse_splitRule(extra$splitRule %||% "suggest") bucketSize <- as.integer(extra$bucketSize %||% 10L) approx <- as.integer(extra$approx %||% 0L) ### dist search if (search == 3L && !inherits(x, "dist")) { if (.matrixlike(x)) x <- dist(x) else stop("x needs to be a matrix to calculate distances") } ## for dist we provide the R code with a frNN list and no x frNN <- list() if (inherits(x, "dist")) { frNN <- frNN(x, eps, ...) ## add self match and use C numbering frNN$id <- lapply( seq_along(frNN$id), FUN = function(i) c(i - 1L, frNN$id[[i]] - 1L) ) frNN$dist <- lapply( seq_along(frNN$dist), FUN = function(i) c(0, frNN$dist[[i]]) ^ 2 ) x <- matrix() storage.mode(x) <- "double" } else{ if (!.matrixlike(x)) stop("x needs to be a matrix") ## make sure x is numeric x <- as.matrix(x) if (storage.mode(x) == "integer") storage.mode(x) <- "double" if (storage.mode(x) != "double") stop("x has to be a numeric matrix.") } if (length(frNN) == 0 && anyNA(x)) stop("data/distances cannot contain NAs for optics (with kd-tree)!") ret <- optics_int( as.matrix(x), as.double(eps), as.integer(minPts), as.integer(search), as.integer(bucketSize), as.integer(splitRule), as.double(approx), frNN ) ret$minPts <- minPts ret$eps <- eps ret$eps_cl <- NA_real_ ret$xi <- NA_real_ class(ret) <- "optics" ret } #' @rdname optics #' @export print.optics <- function(x, ...) { writeLines(c( paste0( "OPTICS ordering/clustering for ", length(x$order), " objects." ), paste0( "Parameters: ", "minPts = ", x$minPts, ", eps = ", x$eps, ", eps_cl = ", x$eps_cl, ", xi = ", x$xi ) )) if (!is.null(x$cluster)) { if (is.na(x$xi)) { writeLines(paste0( "The clustering contains ", ncluster(x), " cluster(s) and ", nnoise(x), " noise points." 
)) print(table(x$cluster)) } else { writeLines( paste0( "The clustering contains ", nrow(x$clusters_xi), " cluster(s) and ", nnoise(x), " noise points." ) ) } cat("\n") } writeLines(strwrap(paste0( "Available fields: ", toString(names(x)) ), exdent = 18)) } #' @rdname optics #' @export plot.optics <- function(x, cluster = TRUE, predecessor = FALSE, ...) { # OPTICS cluster extraction methods if (inherits(x$cluster, "xics") || all(c("start", "end", "cluster_id") %in% names(x$clusters_xi))) { # Sort clusters by size hclusters <- x$clusters_xi[order(x$clusters_xi$end - x$clusters_xi$start), ] # .1 means to leave 15% for the cluster lines def.par <- par(no.readonly = TRUE) par(mar = c(2, 4, 4, 2) + 0.1, omd = c(0, 1, .15, 1)) # Need to know how to spread out lines y_max <- max(x$reachdist[!is.infinite(x$reachdist)]) y_increments <- (y_max / 0.85 * .15) / (nrow(hclusters) + 1L) # Get top level cluster labels # top_level <- extractClusterLabels(x$clusters_xi, x$order) plot( as.reachability(x), col = x$cluster[x$order] + 1L, xlab = NA, xaxt = 'n', yaxs = "i", ylim = c(0, y_max), ... ) # Lines beneath plotting region indicating Xi clusters i <- seq_len(nrow(hclusters)) segments( x0 = hclusters$start[i], y0 = -(y_increments * i), x1 = hclusters$end[i], col = hclusters$cluster_id[i] + 1L, lwd = 2, xpd = NA ) ## Restore previous settings par(def.par) } else if (is.numeric(x$cluster) && !is.null(x$eps_cl)) { # Works for integers too ## extractDBSCAN clustering plot(as.reachability(x), col = x$cluster[x$order] + 1L, ...) lines( x = c(0, length(x$cluster)), y = c(x$eps_cl, x$eps_cl), col = "black", lty = 2 ) } else { # Regular reachability plot plot(as.reachability(x), ...) } } # Simple conversion between OPTICS objects and reachability objects #' @rdname optics #' @export as.reachability.optics <- function(object, ...) 
{
  structure(list(reachdist = object$reachdist, order = object$order),
    class = "reachability")
}

# Conversion between OPTICS objects and dendrograms
#' @rdname optics
#' @export
as.dendrogram.optics <- function(object, ...) {
  if (object$minPts > length(object$order)) {
    stop("'minPts' should be less than or equal to the number of points in the dataset.")
  }
  if (sum(is.infinite(object$reachdist)) > 1)
    stop(
      "Eps value is not large enough to capture the complete hierarchical structure of the dataset. Please use a larger eps value (such as Inf)."
    )
  as.dendrogram(as.reachability(object))
}

#' @rdname optics
#' @export
extractDBSCAN <- function(object, eps_cl) {
  if (!inherits(object, "optics"))
    stop("extractDBSCAN only accepts objects resulting from dbscan::optics!")

  reachdist <- object$reachdist[object$order]
  coredist <- object$coredist[object$order]
  n <- length(object$order)

  cluster <- integer(n)
  clusterid <- 0L ### 0 is noise

  for (i in 1:n) {
    if (reachdist[i] > eps_cl) {
      if (coredist[i] <= eps_cl) {
        clusterid <- clusterid + 1L
        cluster[i] <- clusterid
      } else {
        cluster[i] <- 0L ### noise
      }
    } else {
      cluster[i] <- clusterid
    }
  }

  object$eps_cl <- eps_cl
  object$xi <- NA_real_

  ### fix the order so cluster is in the same order as the rows in x
  cluster[object$order] <- cluster
  object$cluster <- cluster

  object
}

#' @rdname optics
#' @export
extractXi <- function(object, xi, minimum = FALSE, correctPredecessors = TRUE) {
  if (!inherits(object, "optics"))
    stop("extractXi only accepts objects resulting from dbscan::optics!")
  if (xi >= 1.0 || xi <= 0.0)
    stop("The Xi parameter must be in (0, 1)")

  # Initial variables
  object$ord_rd <- object$reachdist[object$order]
  object$ixi <- (1 - xi)
  SetOfSteepDownAreas <- list()
  SetOfClusters <- list()
  index <- 1
  mib <- 0
  sdaset <- list()

  while (index <= length(object$order)) {
    mib <- max(mib, object$ord_rd[index])
    if (!valid(index + 1, object))
      break

    # Test if this is a steep down area
    if (steepDown(index, object)) {
      # Update mib values with current mib and filter
      sdaset <-
updateFilterSDASet(mib, sdaset, object$ixi) startval <- object$ord_rd[index] mib <- 0 startsteep <- index endsteep <- index + 1 while (!is.na(object$order[index + 1])) { index <- index + 1 if (steepDown(index, object)) { endsteep <- index + 1 next } if (!steepDown(index, object, ixi = 1.0) || index - endsteep > object$minPts) break } sda <- list( s = startsteep, e = endsteep, maximum = startval, mib = 0 ) # print(paste("New steep down area:", toString(sda))) sdaset <- append(sdaset, list(sda)) next } if (steepUp(index, object)) { sdaset <- updateFilterSDASet(mib, sdaset, object$ixi) { startsteep <- index endsteep <- index + 1 mib <- object$ord_rd[index] esuccr <- if (!valid(index + 1, object)) Inf else object$ord_rd[index + 1] if (!is.infinite(esuccr)) { while (!is.na(object$order[index + 1])) { index <- index + 1 if (steepUp(index, object)) { endsteep <- index + 1 mib <- object$ord_rd[index] esuccr <- if (!valid(index + 1, object)) Inf else object$ord_rd[index + 1] if (is.infinite(esuccr)) { endsteep <- endsteep - 1 break } next } if (!steepUp(index, object, ixi = 1.0) || index - endsteep > object$minPts) break } } else { endsteep <- endsteep - 1 index <- index + 1 } sua <- list(s = startsteep, e = endsteep, maximum = esuccr) # print(paste("New steep up area:", toString(sua))) } for (sda in rev(sdaset)) { # Condition 3B if (mib * object$ixi < sda$mib) next # Default values cstart <- sda$s cend <- sua$e # Credit to ELKI if (correctPredecessors) { while (cend > cstart && is.infinite(object$ord_rd[cend])) { cend <- cend - 1 } } # Condition 4 { # Case b if (sda$maximum * object$ixi >= sua$maximum) { while (cstart < cend && object$ord_rd[cstart + 1] > sua$maximum) cstart <- cstart + 1 } # Case c else if (sua$maximum * object$ixi >= sda$maximum) { while (cend > cstart && object$ord_rd[cend - 1] > sda$maximum) cend <- cend - 1 } } # This NOT in the original article - credit to ELKI for finding this. # Ensure that the predecessor is in the current cluster. 
This filter
# removes common artifacts from the Xi method
if (correctPredecessors) {
while (cend > cstart) {
tmp2 <- object$predecessor[object$order[cend]]
if (!is.na(tmp2) &&
any(object$order[cstart:(cend - 1)] == tmp2, na.rm = TRUE))
break
# Not found.
cend <- cend - 1
}
}
# Ensure the last steep up point is not included if it's xi significant
if (steepUp(index - 1, object)) {
cend <- cend - 1
}
# obey minPts
if (cend - cstart + 1 < object$minPts)
next
SetOfClusters <- append(SetOfClusters, list(list(
start = cstart, end = cend
)))
next
}
} else {
index <- index + 1
}
}
# Remove aliases
object$ord_rd <- NULL
object$ixi <- NULL
# Keep xi parameter, disable any previous flat clustering parameter
object$xi <- xi
object$eps_cl <- NA_real_
# Zero-out clusters (only noise) if none found
if (length(SetOfClusters) == 0) {
warning(paste("No clusters were found with threshold:", xi))
object$clusters_xi <- NULL
object$cluster <- rep(0, length(object$cluster))
return(invisible(object))
}
# Cluster data exists; organize it by starting and ending index, give arbitrary id
object$clusters_xi <- do.call(rbind, SetOfClusters)
object$clusters_xi <- data.frame(
start = unlist(object$clusters_xi[, 1], use.names = FALSE),
end = unlist(object$clusters_xi[, 2], use.names = FALSE),
check.names = FALSE
)
object$clusters_xi <- object$clusters_xi[order(object$clusters_xi$start, object$clusters_xi$end), ]
object$clusters_xi <- cbind(object$clusters_xi,
list(cluster_id = seq_len(nrow(object$clusters_xi))))
row.names(object$clusters_xi) <- NULL
## Populate cluster vector with either:
## 1. 'top-level' cluster labels to aid in plotting
## 2.
'local' or non-overlapping cluster labels if minimum == TRUE object$cluster <- extractClusterLabels(object$clusters_xi, object$order, minimum = minimum) # Remove non-local clusters if minimum was specified if (minimum) { object$clusters_xi <- object$clusters_xi[sort(unique(object$cluster))[-1], ] } class(object$cluster) <- unique(append(class(object$cluster), "xics")) class(object$clusters_xi) <- unique(append(class(object$clusters_xi), "xics")) object } # Removes obsolete steep areas updateFilterSDASet <- function(mib, sdaset, ixi) { sdaset <- Filter(function(sda) sda$maximum * ixi > mib, sdaset) lapply(sdaset, function(sda) { if (mib > sda$mib) sda$mib <- mib sda }) } # Determines if the reachability distance at the current index 'i' is # (xi) significantly lower than the next index steepUp <- function(i, object, ixi = object$ixi) { if (is.infinite(object$ord_rd[i])) return(FALSE) if (!valid(i + 1, object)) return(TRUE) return(object$ord_rd[i] <= object$ord_rd[i + 1] * ixi) } # Determines if the reachability distance at the current index 'i' is # (xi) significantly higher than the next index steepDown <- function(i, object, ixi = object$ixi) { if (!valid(i + 1, object)) return(FALSE) if (is.infinite(object$ord_rd[i + 1])) return(FALSE) return(object$ord_rd[i] * ixi >= object$ord_rd[i + 1]) } # Determines if the reachability distance at the current index 'i' is a valid distance valid <- function(index, object) { return(!is.na(object$ord_rd[index])) } ### Extract clusters (minimum == T extracts clusters that do not contain other clusters) from a given ordering of points extractClusterLabels <- function(cl, order, minimum = FALSE) { ## Add cluster_id to clusters if (!all(c("start", "end") %in% names(cl))) stop("extractClusterLabels expects start and end references") if (!"cluster_id" %in% names(cl)) cl <- cbind(cl, cluster_id = seq_len(nrow(cl))) ## Sort cl based on minimum parameter / cluster size if (!"cluster_size" %in% names(cl)) cl <- cbind(cl, 
list(cluster_size = (cl$end - cl$start)))
cl <- if (minimum) {
cl[order(cl$cluster_size), ]
} else {
cl[order(-cl$cluster_size), ]
}
## Fill in the [cluster] vector with cluster IDs
clusters <- rep(0, length(order))
for (cid in cl$cluster_id) {
cluster <- cl[cl$cluster_id == cid, ]
if (minimum) {
if (all(clusters[cluster$start:cluster$end] == 0)) {
clusters[cluster$start:cluster$end] <- cid
}
} else
clusters[cluster$start:cluster$end] <- cid
}
# Fix the ordering
clusters[order] <- clusters
return(clusters)
}

================================================
FILE: R/pointdensity.R
================================================
#######################################################################
# dbscan - Density Based Clustering of Applications with Noise
# and Related Algorithms
# Copyright (C) 2017 Michael Hahsler
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

#' Calculate Local Density at Each Data Point
#'
#' Calculate the local density at each data point either as the number of
#' points in the eps-neighborhood (as used in `dbscan()`) or via kernel density
#' estimation (KDE) with a uniform kernel. The function uses a kd-tree for fast
#' fixed-radius nearest neighbor search.
#'
#' `dbscan()` estimates the density around a point as the number of points in the
#' eps-neighborhood of the point (including the query point itself).
#' Kernel density estimation (KDE) with a uniform kernel is just this point
#' count in the eps-neighborhood divided by \eqn{(2\,eps\,n)}{(2 eps n)}, where
#' \eqn{n} is the number of points in `x`.
#'
#' Alternatively, `type = "gaussian"` calculates a Gaussian kernel estimate where
#' `eps` is used as the standard deviation. To speed up computation, a
#' kd-tree is used to find all points within 3 times the standard deviation and
#' these points are used for the estimate.
#'
#' Points with low local density often indicate noise (see e.g., Wishart (1969)
#' and Hartigan (1975)).
#'
#' @aliases pointdensity density
#' @family Outlier Detection Functions
#'
#' @param x a data matrix or a dist object.
#' @param eps radius of the eps-neighborhood, i.e., bandwidth of the uniform
#' kernel. For the Gaussian KDE, this parameter specifies the standard deviation of
#' the kernel.
#' @param type `"frequency"`, `"density"`, or `"gaussian"`; should the raw count of
#' points inside the eps-neighborhood, the eps-neighborhood density estimate,
#' or a Gaussian density estimate be returned?
#' @param search,bucketSize,splitRule,approx algorithmic parameters for
#' [frNN()].
#'
#' @return A vector of the same length as the number of data points (rows) in `x` with
#' the count or density values for each data point.
#'
#' @author Michael Hahsler
#' @seealso [frNN()], [stats::density()].
#' @references Wishart, D. (1969), Mode Analysis: A Generalization of Nearest
#' Neighbor which Reduces Chaining Effects, in _Numerical Taxonomy,_ Ed., A.J.
#' Cole, Academic Press, 282-311.
#'
#' John A. Hartigan (1975), _Clustering Algorithms,_ John Wiley & Sons, Inc.,
#' New York, NY, USA.
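The uniform-kernel estimate described in the details above (eps-neighborhood count, including the query point, divided by 2 * eps * n) can also be sketched brute force. The following standalone Python snippet is illustrative only: `pointdensity_uniform` is a hypothetical helper, not part of this package, and it skips the kd-tree search entirely.

```python
import numpy as np

def pointdensity_uniform(x, eps, type="frequency"):
    """Brute-force sketch of the uniform-kernel estimate: count the points
    within distance eps (the query point counts itself, since its
    self-distance is 0) and, for type = "density", divide by (2 * eps * n)."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    n = len(x)
    # pairwise Euclidean distances (O(n^2); the package uses a kd-tree instead)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    counts = (d <= eps).sum(axis=1)
    return counts if type == "frequency" else counts / (2 * eps * n)
```

For example, with `eps = 0.5` and three points at (0, 0), (0, 0.1) and (10, 10), the frequency counts are 2, 2 and 1, and the density values are those counts divided by 2 * 0.5 * 3.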
#' @keywords model
#' @examples
#' set.seed(665544)
#' n <- 100
#' x <- cbind(
#'   x=runif(10, 0, 5) + rnorm(n, sd = 0.4),
#'   y=runif(10, 0, 5) + rnorm(n, sd = 0.4)
#' )
#' plot(x)
#'
#' ### calculate density around points
#' d <- pointdensity(x, eps = .5, type = "density")
#'
#' ### density distribution
#' summary(d)
#' hist(d, breaks = 10)
#'
#' ### plot with point size proportional to density
#' plot(x, pch = 19, main = "Density (eps = .5)", cex = d*5)
#'
#' ### Wishart (1969) single link clustering after removing low-density noise
#' # 1. remove noise with low density
#' f <- pointdensity(x, eps = .5, type = "frequency")
#' x_nonoise <- x[f >= 5,]
#'
#' # 2. use single-linkage on the non-noise points
#' hc <- hclust(dist(x_nonoise), method = "single")
#' plot(x, pch = 19, cex = .5)
#' points(x_nonoise, pch = 19, col = cutree(hc, k = 4) + 1L)
#' @export
pointdensity <- function(x, eps, type = "frequency", search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0) {
type <- match.arg(type, choices = c("frequency", "density", "gaussian"))
if (anyNA(x))
stop("missing values are not allowed in x.")
if (type == "gaussian")
return (.pointdensity_gaussian(x, sd = eps, search = search, bucketSize = bucketSize, splitRule = splitRule, approx = approx))
# regular dbscan density estimation
if (inherits(x, "dist")) {
nn <- frNN(
x,
eps,
sort = FALSE,
search = search,
bucketSize = bucketSize,
splitRule = splitRule,
approx = approx
)
d <- lengths(nn$id) + 1L
} else {
# faster implementation for a data matrix
search <- .parse_search(search)
splitRule <- .parse_splitRule(splitRule)
d <- dbscan_density_int(
as.matrix(x),
as.double(eps),
as.integer(search),
as.integer(bucketSize),
as.integer(splitRule),
as.double(approx)
)
}
if (type == "density")
d <- d / (2 * eps * nrow(x))
d
}
.pointdensity_gaussian <- function(x, sd, ...) {
### consider all points within 3 standard deviations
nn <- frNN(
x,
3 * sd,
sort = FALSE,
...
) sigma <- sd^2 d <- sapply(nn$dist, FUN = function(ds) sum(exp(-1 * ds^2 / (2 * sigma)))) d <- d / (length(d) * sd * 2 * pi) d } #gof <- function(x, eps, ...) { # d <- pointdensity(x, eps, ...) # 1/(d/mean(d)) #} ================================================ FILE: R/predict.R ================================================ ####################################################################### # dbscan - Density Based Clustering of Applications with Noise # and Related Algorithms # Copyright (C) 2017 Michael Hahsler # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License along # with this program; if not, write to the Free Software Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. #' @rdname dbscan #' @param object clustering object. #' @param data the data set used to create the clustering object. #' @param newdata new data points for which the cluster membership should be #' predicted. #' @importFrom stats predict #' @export predict.dbscan_fast <- function (object, newdata, data, ...) { if (object$metric != "euclidean") warning("dbscan used non-Euclidean distances, predict assigns new points using Euclidean distances!") .predict_frNN(newdata, data, object$cluster, object$eps, ...) } #' @rdname optics #' @param object clustering object. #' @param data the data set used to create the clustering object. #' @param newdata new data points for which the cluster membership should be #' predicted. 
#' @export
predict.optics <- function (object, newdata, data, ...) {
if (is.null(object$cluster) ||
is.null(object$eps_cl) || is.na(object$eps_cl))
stop("no extracted clustering available in object! Run extractDBSCAN() first.")
.predict_frNN(newdata, data, object$cluster, object$eps_cl, ...)
}
#' @rdname hdbscan
#' @param object clustering object.
#' @param data the data set used to create the clustering object.
#' @param newdata new data points for which the cluster membership should be
#' predicted.
#' @export
predict.hdbscan <- function(object, newdata, data, ...) {
clusters <- object$cluster
if (is.null(newdata))
return(clusters)
# don't use noise
coredist <- object$coredist[clusters != 0]
data <- data[clusters != 0,]
clusters <- clusters[clusters != 0]
# find the nearest non-noise neighbor for each new point
nns <- kNN(data, query = newdata, k = 1)
# assign that neighbor's cluster if dist <= coredist of the neighbor
drop(ifelse(nns$dist > coredist[nns$id], 0L, clusters[nns$id]))
}
## find the cluster id of the closest NN in the eps neighborhood or return 0 otherwise.
.predict_frNN <- function(newdata, data, clusters, eps, ...) {
if (is.null(newdata))
return(clusters)
if (ncol(data) != ncol(newdata))
stop("Number of columns in data and newdata do not agree!")
if (nrow(data) != length(clusters))
stop("clustering does not agree with the number of data points in data.")
if (is.data.frame(data)) {
indx <- vapply(data, is.factor, logical(1L))
if (any(indx)) {
warning(
"data contains factors! The factors are converted to numbers and Euclidean distances are used."
)
}
data[indx] <- lapply(data[indx], as.numeric)
newdata[indx] <- lapply(newdata[indx], as.numeric)
}
# don't use noise
data <- data[clusters != 0,]
clusters <- clusters[clusters != 0]
# calculate the frNN between newdata and data (only keep entries for newdata)
nn <- frNN(data, query = newdata, eps = eps, sort = TRUE, ...)
vapply(
nn$id,
function(nns)
if (length(nns) == 0L) 0L else clusters[nns[1L]],
integer(1L)
)
}

================================================
FILE: R/reachability.R
================================================
#######################################################################
# dbscan - Density Based Clustering of Applications with Noise
# and Related Algorithms
# Copyright (C) 2015 Michael Hahsler, Matt Piekenbrock
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

#' Reachability Distances
#'
#' Reachability distances can be plotted to show the hierarchical relationships between data points.
#' The idea was originally introduced by Ankerst et al (1999) for [OPTICS]. Later,
#' Sander et al (2003) showed that the visualization is useful for other hierarchical
#' structures and introduced an algorithm to convert a [dendrogram] representation into a
#' reachability plot.
#'
#' A reachability plot displays the points as vertical bars, where the height is the
#' reachability distance between two consecutive points.
#' The central idea behind reachability plots is that the ordering in which
#' points are plotted identifies the underlying hierarchical density
#' representation as mountains and valleys of high and low reachability distance.
#' The original ordering algorithm OPTICS as described by Ankerst et al (1999)
#' introduced the notion of reachability plots.
#'
#' OPTICS linearly orders the data points such that points
#' which are spatially closest become neighbors in the ordering. Valleys
#' represent clusters, which can be represented hierarchically. Although the
#' ordering is crucial to the structure of the reachability plot, it is important
#' to note that OPTICS, like DBSCAN, is not entirely deterministic and, just
#' like the dendrogram, isomorphisms may exist.
#'
#' Reachability plots were shown to essentially convey the same information as
#' the more traditional dendrogram structure by Sander et al (2003), and dendrograms
#' can be converted into reachability plots.
#'
#' Different hierarchical representations, such as dendrograms or reachability
#' plots, may be preferable depending on the context. In smaller datasets,
#' cluster memberships may be more easily identifiable through a dendrogram
#' representation, particularly if the user is already familiar with tree-like
#' representations. For larger datasets, however, a reachability plot may be
#' preferred for visualizing macro-level density relationships.
#'
#' A variety of cluster extraction methods have been proposed using
#' reachability plots. Because these cluster extraction methods depend directly on the
#' ordering OPTICS produces, they are part of the [optics()] interface.
#' Nonetheless, reachability plots can be created directly from other types of
#' linkage trees, and vice versa.
#'
#' _Note:_ The reachability distance for the first point is by definition not defined
#' (it has no preceding point).
#' Also, the reachability distances can be undefined when a point does not have enough
#' neighbors in the epsilon neighborhood. We represent these undefined cases as `Inf`
#' and represent them in the plot as a dashed line.
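As a toy illustration of reading clusters off a reachability plot, cutting the ordered reachability values at a fixed threshold assigns one cluster per valley. The standalone Python sketch below uses a hypothetical `cut_reachability` and is a simplification of what `extractDBSCAN()` does; the package version additionally uses core distances to label noise points.

```python
def cut_reachability(reachdist, eps_cl):
    """Assign one cluster id per valley of a reachability plot (values must
    already be in OPTICS order): every jump above the cut eps_cl starts a
    new cluster.  Simplified sketch; noise handling is omitted."""
    labels, cid = [], 0
    for r in reachdist:
        if r > eps_cl:  # reachability jumps above the cut: a new valley begins
            cid += 1
        labels.append(cid)
    return labels
```

For example, cutting the reachability values `[inf, 0.2, 0.3, 5.0, 0.1, 0.2]` at 1.0 yields the labels `[1, 1, 1, 2, 2, 2]`, i.e., two valleys.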
#'
#' @name reachability
#' @aliases reachability reachability_plot print.reachability
#'
#' @param object any object that can be coerced to class
#' `reachability`, such as an object of class [optics] or [stats::dendrogram].
#' @param x object of class `reachability`.
#' @param order_labels whether to plot text labels for each point's reachability
#' distance.
#' @param xlab x-axis label.
#' @param ylab y-axis label.
#' @param main Title of the plot.
#' @param ... graphical parameters are passed on to `plot()`,
#' or arguments for other methods.
#'
#' @return An object of class `reachability` with components:
#' \item{order }{order to use for the data points in `x`. }
#' \item{reachdist }{reachability distance for each data point in `x`. }
#'
#' @author Matthew Piekenbrock
#' @seealso [optics()], [as.dendrogram()], and [stats::hclust()].
#' @references Ankerst, M., M. M. Breunig, H.-P. Kriegel, J. Sander (1999).
#' OPTICS: Ordering Points To Identify the Clustering Structure. _ACM
#' SIGMOD international conference on Management of data._ ACM Press. pp.
#' 49--60.
#'
#' Sander, J., X. Qin, Z. Lu, N. Niu, and A. Kovarsky (2003). Automatic
#' extraction of clusters from hierarchical clustering representations.
#' _Pacific-Asia Conference on Knowledge Discovery and Data Mining._
#' Springer Berlin Heidelberg.
#' @keywords model clustering hierarchical clustering
#' @examples
#' set.seed(2)
#' n <- 20
#'
#' x <- cbind(
#'   x = runif(4, 0, 1) + rnorm(n, sd = 0.1),
#'   y = runif(4, 0, 1) + rnorm(n, sd = 0.1)
#' )
#'
#' plot(x, xlim = range(x), ylim = c(min(x) - sd(x), max(x) + sd(x)), pch = 20)
#' text(x = x, labels = seq_len(nrow(x)), pos = 3)
#'
#' ### run OPTICS
#' res <- optics(x, eps = 10, minPts = 2)
#' res
#'
#' ### plot produces a reachability plot.
#' plot(res)
#'
#' ### Manually extract reachability components from OPTICS
#' reach <- as.reachability(res)
#' reach
#'
#' ### plot still produces a reachability plot; point ids
#' ### (rows in the original data) can be displayed with order_labels = TRUE
#' plot(reach, order_labels = TRUE)
#'
#' ### Reachability objects can be directly converted to dendrograms
#' dend <- as.dendrogram(reach)
#' dend
#' plot(dend)
#'
#' ### A dendrogram can be converted back into a reachability object
#' plot(as.reachability(dend))
NULL
#' @rdname reachability
#' @export
print.reachability <- function(x, ...) {
avg_reach <- mean(x$reachdist[!is.infinite(x$reachdist)], na.rm = TRUE)
cat(
"Reachability plot collection for ",
length(x$order),
" objects.\n",
"Avg minimum reachability distance: ",
avg_reach,
"\n",
"Available Fields: order, reachdist",
sep = ""
)
}
#' @rdname reachability
#' @export
plot.reachability <- function(x, order_labels = FALSE, xlab = "Order", ylab = "Reachability dist.", main = "Reachability Plot", ...) {
if (is.null(x$order) || is.null(x$reachdist))
stop("reachability objects need 'reachdist' and 'order' fields")
reachdist <- x$reachdist[x$order]
plot(
reachdist,
xlab = xlab,
ylab = ylab,
main = main,
type = "h",
...
)
abline(v = which(is.infinite(reachdist)), lty = 3)
if (order_labels) {
text(
x = seq_along(x$order),
y = reachdist,
labels = x$order,
pos = 3
)
}
}
#' @rdname reachability
#' @export
as.reachability <- function(object, ...) UseMethod("as.reachability")
#' @rdname reachability
#' @export
as.reachability.dendrogram <- function(object, ...)
{
if (!inherits(object, "dendrogram"))
stop("The as.reachability method requires a dendrogram object.")
# Rcpp doesn't seem to import attributes well for vectors
fix_x <- dendrapply(object, function(leaf) {
new_leaf <- as.list(leaf)
attributes(new_leaf) <- attributes(leaf)
new_leaf
})
res <- dendrogram_to_reach(fix_x)
# Refix the ordering
res$reachdist <- res$reachdist[order(res$order)]
return(res)
}

================================================
FILE: R/sNN.R
================================================
#######################################################################
# dbscan - Density Based Clustering of Applications with Noise
# and Related Algorithms
# Copyright (C) 2017 Michael Hahsler
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

# number of shared nearest neighbors including the point itself.
#' Find Shared Nearest Neighbors
#'
#' Calculates the number of shared nearest neighbors
#' and creates a shared nearest neighbors graph.
#'
#' The number of shared nearest neighbors of two points p and q is the size of the
#' intersection of the kNN neighborhoods of the two points.
#' Note that each point is considered to be part
#' of its own kNN neighborhood.
#' The range for the shared nearest neighbors is
#' \eqn{[0, k]}. The result is an n-by-k matrix called `shared`.
#' Each row is a point and the columns are the point's k nearest neighbors.
#' The value is the count of the shared neighbors.
#'
#' The shared nearest neighbor graph connects a point with all its nearest neighbors
#' if they have at least one shared neighbor. The number of shared neighbors can be used
#' as an edge weight.
#' Jarvis and Patrick (1973) use a slightly
#' modified (see parameter `jp`) shared nearest neighbor graph for
#' clustering.
#'
#' @aliases sNN snn
#' @family NN functions
#'
#' @param x a data matrix, a [dist] object or a [kNN] object.
#' @param k number of neighbors to consider to calculate the shared nearest
#' neighbors.
#' @param kt minimum threshold on the number of shared nearest neighbors to
#' build the shared nearest neighbor graph. Edges are only preserved if
#' `kt` or more neighbors are shared.
#' @param jp In regular sNN graphs, two points that are not neighbors
#' can have shared neighbors.
#' Jarvis and Patrick (1973) require the two points to be neighbors; otherwise
#' the count is zeroed out. `TRUE` uses this behavior.
#' @param search nearest neighbor search strategy (one of `"kdtree"`, `"linear"` or
#' `"dist"`).
#' @param sort sort by the number of shared nearest neighbors? Note that this
#' is expensive and `sort = FALSE` is much faster. sNN objects can be
#' sorted using `sort()`.
#' @param bucketSize max size of the kd-tree leaves.
#' @param splitRule rule to split the kd-tree. One of `"STD"`, `"MIDPT"`, `"FAIR"`,
#' `"SL_MIDPT"`, `"SL_FAIR"` or `"SUGGEST"` (SL stands for sliding). `"SUGGEST"` uses
#' ANN's best guess.
#' @param approx use approximate nearest neighbors. All NN up to a distance of
#' a factor of `(1 + approx) eps` may be used. Some actual NN may be omitted
#' leading to spurious clusters and noise points. However, the algorithm will
#' enjoy a significant speedup.
#' @param decreasing logical; sort in decreasing order?
#' @param ... additional parameters are passed on.
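The shared-neighbor count described above can be sketched brute force as a plain set intersection. The standalone Python snippet below is illustrative only: `snn_shared` is a hypothetical helper, not the package's C++ `SNN_sim_int`, and tie breaking and self-inclusion details may differ slightly from the `shared` matrix it produces.

```python
import numpy as np

def snn_shared(x, k):
    """Brute-force shared nearest neighbors: N(i) is the set of i's k nearest
    neighbors plus i itself; for every neighbor j of i the shared count is
    |N(i) & N(j)|.  Sketch only; edge cases may differ from dbscan's C++ code."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not returned as its own kNN
    knn = np.argsort(d, axis=1, kind="stable")[:, :k]
    nbhd = [set(row) | {i} for i, row in enumerate(knn)]
    return {(i, int(j)): len(nbhd[i] & nbhd[int(j)])
            for i in range(len(x)) for j in knn[i]}
```

With the 1-D points `0, 1, 2, 3, 4` and `k = 2`, for instance, points 0 and 2 share the neighbors {1, 2} of point 0's augmented neighborhood, giving a count of 2.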
#' @return An object of class `sNN` (subclass of [kNN] and [NN]) containing a list
#' with the following components:
#' \item{id }{a matrix with ids. }
#' \item{dist}{a matrix with the distances. }
#' \item{shared }{a matrix with the number of shared nearest neighbors. }
#' \item{k }{the number of neighbors `k` used. }
#' \item{metric }{the distance metric used. }
#'
#' @author Michael Hahsler
#' @references R. A. Jarvis and E. A. Patrick. 1973. Clustering Using a
#' Similarity Measure Based on Shared Near Neighbors. _IEEE Trans. Comput._
#' 22, 11 (November 1973), 1025-1034.
#' \doi{10.1109/T-C.1973.223640}
#' @keywords model
#' @examples
#' data(iris)
#' x <- iris[, -5]
#'
#' # find kNN and add the number of shared nearest neighbors.
#' k <- 5
#' nn <- sNN(x, k = k)
#' nn
#'
#' # shared nearest neighbor distribution
#' table(as.vector(nn$shared))
#'
#' # explore number of shared points for the k-neighborhood of point 10
#' i <- 10
#' nn$shared[i,]
#'
#' plot(nn, x)
#'
#' # apply a threshold to create a sNN graph with edges
#' # if more than 3 neighbors are shared.
#' nn_3 <- sNN(nn, kt = 3) #' plot(nn_3, x) #' #' # get an adjacency list for the shared nearest neighbor graph #' adjacencylist(nn_3) #' @export sNN <- function(x, k, kt = NULL, jp = FALSE, sort = TRUE, search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0) { if (missing(k)) k <- x$k if (inherits(x, "kNN")) { if (k != x$k) { if (ncol(x$id) < k) stop("kNN object does not contain enough neighbors!") if (!x$sort) x <- sort.kNN(x) x$id <- x$id[, 1:k] x$dist <- x$dist[, 1:k] x$k <- k } } else x <- kNN( x, k, sort = FALSE, search = search, bucketSize = bucketSize, splitRule = splitRule, approx = approx ) x$shared <- SNN_sim_int(x$id, as.logical(jp[1])) x$sort_shared <- FALSE class(x) <- c("sNN", "kNN", "NN") if (sort) x <- sort.sNN(x) x$kt <- kt if (!is.null(kt)) { if (kt > k) stop("kt needs to be less than k.") rem <- x$shared < kt x$id[rem] <- NA x$dist[rem] <- NA x$shared[rem] <- NA } x } #' @rdname sNN #' @export sort.sNN <- function(x, decreasing = TRUE, ...) { if (isTRUE(x$sort_shared)) return(x) if (is.null(x$shared)) stop("Unable to sort. Number of shared neighbors is missing.") if (ncol(x$id) < 2) { x$sort <- TRUE x$sort_shared <- TRUE return(x) } ## sort first by number of shared points (decreasing) and break ties by id (increasing) k <- ncol(x$shared) o <- vapply( seq_len(nrow(x$shared)), function(i) order(k - x$shared[i, ], x$id[i, ], decreasing = !decreasing), integer(k) ) for (i in seq_len(ncol(o))) { x$shared[i, ] <- x$shared[i, ][o[, i]] x$dist[i, ] <- x$dist[i, ][o[, i]] x$id[i, ] <- x$id[i, ][o[, i]] } x$sort <- FALSE x$sort_shared <- TRUE x } #' @rdname sNN #' @export print.sNN <- function(x, ...) 
{
cat(
"shared-nearest neighbors for ",
nrow(x$id),
" objects (k=",
x$k,
", kt=",
x$kt %||% "NULL",
").",
"\n",
sep = ""
)
cat("Available fields: ", toString(names(x)), "\n", sep = "")
}

================================================
FILE: R/sNNclust.R
================================================
#######################################################################
# dbscan - Density Based Clustering of Applications with Noise
# and Related Algorithms
# Copyright (C) 2017 Michael Hahsler
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

#' Shared Nearest Neighbor Clustering
#'
#' Implements the shared nearest neighbor clustering algorithm by Ertoz,
#' Steinbach and Kumar (2003).
#'
#' **Algorithm:**
#'
#' 1. Construct a shared nearest neighbor graph for a given k. The edge
#' weights are the number of shared k nearest neighbors (in the range of
#' \eqn{[0, k]}).
#'
#' 2. Find each point's SNN density, i.e., the number of points which have a
#' similarity of `eps` or greater.
#'
#' 3. Find the core points, i.e., all points that have an SNN density greater
#' than `minPts`.
#'
#' 4. Form clusters from the core points and assign border points (i.e.,
#' non-core points which share at least `eps` neighbors with a core point).
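The four steps above can be sketched as a short standalone Python program. This is illustrative only: `snn_clust` is a hypothetical stand-in, it restricts the graph to mutual kNN pairs (Jarvis-Patrick style) and treats an SNN density of at least `min_pts` as core, so details may differ from the package's C++-backed implementation.

```python
import numpy as np
from collections import deque

def snn_clust(x, k, eps, min_pts):
    """Sketch of SNN clustering (Ertoz, Steinbach & Kumar 2003):
    1. sNN graph over mutual kNN pairs with >= eps shared neighbors,
    2./3. a point is core if it has at least min_pts such links,
    4. grow clusters from core points; linked non-core points become
    border points; label 0 is noise.  Illustrative sketch only."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    n = len(x)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn = [set(row) for row in np.argsort(d, axis=1, kind="stable")[:, :k]]
    # 1. edges between mutual kNN pairs that share at least eps neighbors
    adj = [[j for j in knn[i] if i in knn[j] and len(knn[i] & knn[j]) >= eps]
           for i in range(n)]
    # 2./3. SNN density and core points
    core = [len(adj[i]) >= min_pts for i in range(n)]
    # 4. breadth-first cluster growth from unlabeled core points
    labels, cid = [0] * n, 0
    for i in range(n):
        if not core[i] or labels[i] != 0:
            continue
        cid += 1
        labels[i] = cid
        queue = deque([i])
        while queue:
            p = queue.popleft()
            for q in adj[p]:
                if labels[q] == 0:
                    labels[q] = cid
                    if core[q]:  # border points are labeled but not expanded
                        queue.append(q)
    return labels
```

On two well-separated 2x2 grids of points, this sketch recovers the two blobs as clusters 1 and 2.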
#'
#' Note that steps 2-4 are equivalent to the DBSCAN algorithm (see [dbscan()])
#' and that `eps` has a different meaning than for DBSCAN. Here it is
#' a threshold on the number of shared neighbors (see [sNN()])
#' which defines a similarity.
#'
#' @aliases sNNclust snnclust
#' @family clustering functions
#'
#' @param x a data matrix/data.frame (Euclidean distance is used), a
#' precomputed [dist] object or a kNN object created with [kNN()].
#' @param k Neighborhood size for nearest neighbor sparsification to create the
#' shared NN graph.
#' @param eps Two objects are only reachable from each other if they share at
#' least `eps` nearest neighbors. Note: this is different from the `eps` in DBSCAN!
#' @param minPts minimum number of points that share at least `eps`
#' nearest neighbors for a point to be considered a core point.
#' @param borderPoints should border points be assigned to clusters like in
#' [DBSCAN]?
#' @param ... additional arguments are passed on to the k nearest neighbor
#' search algorithm. See [kNN()] for details on how to control the
#' search strategy.
#'
#' @return An object of class `general_clustering` with the following
#' components:
#' \item{cluster }{An integer vector with cluster assignments. Zero
#' indicates noise points.}
#' \item{type }{ name of the clustering algorithm used.}
#' \item{param }{ list of the clustering parameters used. }
#'
#' @author Michael Hahsler
#'
#' @references Levent Ertoz, Michael Steinbach, Vipin Kumar, Finding Clusters
#' of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data,
#' _SIAM International Conference on Data Mining,_ 2003, 47-59.
#' \doi{10.1137/1.9781611972733.5}
#' @keywords model clustering
#' @examples
#' data("DS3")
#'
#' # Out of the k = 20 NN, 7 (eps) have to be shared to create a link in the sNN graph.
#' # A point needs at least 16 (minPts) links in the sNN graph to be a core point.
#' # Noise points have cluster id 0 and are shown in black.
#' cl <- sNNclust(DS3, k = 20, eps = 7, minPts = 16) #' cl #' #' clplot(DS3, cl) #' #' @export sNNclust <- function(x, k, eps, minPts, borderPoints = TRUE, ...) { nn <- sNN(x, k = k, jp = TRUE, ...) # convert into a frNN object which already enforces eps nn_list <- lapply(seq_len(nrow(nn$id)), FUN = function(i) unname(nn$id[i, nn$shared[i, ] >= eps])) snn <- structure(list(id = nn_list, eps = eps, metric = nn$metric), class = c("NN", "frNN")) # run dbscan cl <- dbscan(snn, minPts = minPts, borderPoints = borderPoints) structure(list(cluster = cl$cluster, type = "SharedNN clustering", param = list(k = k, eps = eps, minPts = minPts, borderPoints = borderPoints), metric = cl$metric), class = "general_clustering") } ================================================ FILE: R/utils.R ================================================ `%||%` <- function(x, y) { if (is.null(x)) y else x } ================================================ FILE: R/zzz.R ================================================ # ANN uses a global KD_TRIVIAL structure which needs to be removed. .onUnload <- function(libpath) { ANN_cleanup() #cat("Cleaning up after ANN.\n") } ================================================ FILE: README.Rmd ================================================ --- output: github_document bibliography: vignettes/dbscan.bib link-citations: yes --- ```{r echo=FALSE, results = 'asis'} pkg <- 'dbscan' source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R") pkg_title(pkg, anaconda = "r-dbscan", stackoverflow = "dbscan%2br") ``` ## Introduction This R package [@hahsler2019dbscan] provides a fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data. The package includes: __Clustering__ - __DBSCAN:__ Density-based spatial clustering of applications with noise [@ester1996density]. 
- __Jarvis-Patrick Clustering__: Clustering using a similarity measure based on shared near neighbors [@jarvis1973]. - __SNN Clustering__: Shared nearest neighbor clustering [@erdoz2003]. - __HDBSCAN:__ Hierarchical DBSCAN with simplified hierarchy extraction [@campello2015hierarchical]. - __FOSC:__ Framework for optimal selection of clusters for unsupervised and semisupervised clustering of a hierarchical cluster tree [@campello2013density]. - __OPTICS/OPTICSXi:__ Ordering points to identify the clustering structure and cluster extraction methods [@ankerst1999optics]. __Outlier Detection__ - __LOF:__ Local outlier factor algorithm [@breunig2000lof]. - __GLOSH:__ Global-Local Outlier Score from Hierarchies algorithm [@campello2015hierarchical]. __Cluster Evaluation__ - __DBCV:__ Density-based clustering validation [@moulavi2014]. __Fast Nearest-Neighbor Search (using kd-trees)__ - __kNN search__ - __Fixed-radius NN search__ The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search and, for Euclidean distance, are typically faster than the native R implementations (e.g., dbscan in package `fpc`) or the implementations in [WEKA](https://ml.cms.waikato.ac.nz/weka/), [ELKI](https://elki-project.github.io/) and [Python's scikit-learn](https://scikit-learn.org/). ```{r echo=FALSE, results = 'asis'} pkg_usage(pkg) pkg_citation(pkg, 2) pkg_install(pkg) ``` ## Usage Load the package and use the numeric variables in the iris dataset. ```{r} library("dbscan") data("iris") x <- as.matrix(iris[, 1:4]) ``` DBSCAN ```{r} db <- dbscan(x, eps = .42, minPts = 5) db ``` Visualize the resulting clustering (noise points are shown in black).
```{r dbscan} pairs(x, col = db$cluster + 1L) ``` OPTICS ```{r} opt <- optics(x, eps = 1, minPts = 4) opt ``` Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored). ```{r OPTICS_extractDBSCAN, fig.height=3} opt <- extractDBSCAN(opt, eps_cl = .4) plot(opt) ``` HDBSCAN ```{r} hdb <- hdbscan(x, minPts = 4) hdb ``` Visualize the hierarchical clustering as a simplified tree. HDBSCAN finds 2 stable clusters. ```{r hdbscan, fig.height=4} plot(hdb, show_flat = TRUE) ``` ## Using dbscan with tidyverse `dbscan` provides `tidy()`, `augment()`, and `glance()` for all clustering algorithms, so they can be easily used with the tidyverse, ggplot2 and [tidymodels](https://www.tidymodels.org/learn/statistics/k-means/). ```{r tidyverse, message=FALSE, warning=FALSE} library(tidyverse) db <- x %>% dbscan(eps = .42, minPts = 5) ``` Get cluster statistics as a tibble. ```{r tidyverse2} tidy(db) ``` Visualize the clustering with ggplot2 (use an x for noise points). ```{r tidyverse3} augment(db, x) %>% ggplot(aes(x = Petal.Length, y = Petal.Width)) + geom_point(aes(color = .cluster, shape = noise)) + scale_shape_manual(values=c(19, 4)) ``` ## Using dbscan from Python R, the R package `dbscan`, and the Python package `rpy2` need to be installed.
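If `rpy2` is missing, it can typically be installed from PyPI (a setup sketch assuming `pip` is on the PATH; a conda-forge package for `rpy2` exists as well):

```shell
# install the rpy2 Python-R bridge used in the example below (assumes pip is available)
pip install rpy2
```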
```{python, eval = FALSE, python.reticulate = FALSE} import pandas as pd import numpy as np ### prepare data iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header = None, names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']) iris_numeric = iris[['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']] # get R dbscan package from rpy2.robjects import packages dbscan = packages.importr('dbscan') # enable automatic conversion of pandas dataframes to R dataframes from rpy2.robjects import pandas2ri pandas2ri.activate() db = dbscan.dbscan(iris_numeric, eps = 0.5, MinPts = 5) print(db) ``` ``` ## DBSCAN clustering for 150 objects. ## Parameters: eps = 0.5, minPts = 5 ## Using euclidean distances and borderpoints = TRUE ## The clustering contains 2 cluster(s) and 17 noise points. ## ## 0 1 2 ## 17 49 84 ## ## Available fields: cluster, eps, minPts, dist, borderPoints ``` ```{python, eval = FALSE, python.reticulate = FALSE} # get the cluster assignment vector labels = np.array(db.rx('cluster')) labels ``` ``` ## array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ## 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, ## 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 2, 2, 2, ## 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, ## 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 0, 0, ## 2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, ## 2, 2, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], ## dtype=int32) ``` ## License The dbscan package is licensed under the [GNU General Public License (GPL) Version 3](https://www.gnu.org/licenses/gpl-3.0.en.html). The __OPTICSXi__ R implementation was directly ported from the ELKI framework's Java implementation (GNU AGPLv3), with permission by the original author, Erich Schubert. 
## Changes * List of changes from [NEWS.md](https://github.com/mhahsler/dbscan/blob/master/NEWS.md) ## References ================================================ FILE: README.md ================================================ # R package dbscan - Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms [![Package on CRAN](https://www.r-pkg.org/badges/version/dbscan)](https://CRAN.R-project.org/package=dbscan) [![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/dbscan)](https://CRAN.R-project.org/package=dbscan) ![License](https://img.shields.io/cran/l/dbscan) [![Anaconda.org](https://anaconda.org/conda-forge/r-dbscan/badges/version.svg)](https://anaconda.org/conda-forge/r-dbscan) [![r-universe status](https://mhahsler.r-universe.dev/badges/dbscan)](https://mhahsler.r-universe.dev/dbscan) [![StackOverflow](https://img.shields.io/badge/stackoverflow-dbscan%2br-orange.svg)](https://stackoverflow.com/questions/tagged/dbscan%2br) ## Introduction This R package ([Hahsler, Piekenbrock, and Doran 2019](#ref-hahsler2019dbscan)) provides a fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data. The package includes: **Clustering** - **DBSCAN:** Density-based spatial clustering of applications with noise ([Ester et al. 1996](#ref-ester1996density)). - **Jarvis-Patrick Clustering**: Clustering using a similarity measure based on shared near neighbors ([Jarvis and Patrick 1973](#ref-jarvis1973)). - **SNN Clustering**: Shared nearest neighbor clustering ([Ertöz, Steinbach, and Kumar 2003](#ref-erdoz2003)). - **HDBSCAN:** Hierarchical DBSCAN with simplified hierarchy extraction ([Campello et al. 2015](#ref-campello2015hierarchical)). - **FOSC:** Framework for optimal selection of clusters for unsupervised and semisupervised clustering of a hierarchical cluster tree ([Campello, Moulavi, and Sander 2013](#ref-campello2013density)).
- **OPTICS/OPTICSXi:** Ordering points to identify the clustering structure and cluster extraction methods ([Ankerst et al. 1999](#ref-ankerst1999optics)). **Outlier Detection** - **LOF:** Local outlier factor algorithm ([Breunig et al. 2000](#ref-breunig2000lof)). - **GLOSH:** Global-Local Outlier Score from Hierarchies algorithm ([Campello et al. 2015](#ref-campello2015hierarchical)). **Cluster Evaluation** - **DBCV:** Density-based clustering validation ([Moulavi et al. 2014](#ref-moulavi2014)). **Fast Nearest-Neighbor Search (using kd-trees)** - **kNN search** - **Fixed-radius NN search** The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search and, for Euclidean distance, are typically faster than the native R implementations (e.g., dbscan in package `fpc`) or the implementations in [WEKA](https://ml.cms.waikato.ac.nz/weka/), [ELKI](https://elki-project.github.io/) and [Python’s scikit-learn](https://scikit-learn.org/). The following R packages use `dbscan`: [AnimalSequences](https://CRAN.R-project.org/package=AnimalSequences), [bioregion](https://CRAN.R-project.org/package=bioregion), [clayringsmiletus](https://CRAN.R-project.org/package=clayringsmiletus), [CLONETv2](https://CRAN.R-project.org/package=CLONETv2), [clusterWebApp](https://CRAN.R-project.org/package=clusterWebApp), [cordillera](https://CRAN.R-project.org/package=cordillera), [CPC](https://CRAN.R-project.org/package=CPC), [crosshap](https://CRAN.R-project.org/package=crosshap), [crownsegmentr](https://CRAN.R-project.org/package=crownsegmentr), [CspStandSegmentation](https://CRAN.R-project.org/package=CspStandSegmentation), [daltoolbox](https://CRAN.R-project.org/package=daltoolbox), [DataSimilarity](https://CRAN.R-project.org/package=DataSimilarity), [diceR](https://CRAN.R-project.org/package=diceR), [dobin](https://CRAN.R-project.org/package=dobin), [doc2vec](https://CRAN.R-project.org/package=doc2vec),
[dPCP](https://CRAN.R-project.org/package=dPCP), [emcAdr](https://CRAN.R-project.org/package=emcAdr), [eventstream](https://CRAN.R-project.org/package=eventstream), [evprof](https://CRAN.R-project.org/package=evprof), [fastml](https://CRAN.R-project.org/package=fastml), [FCPS](https://CRAN.R-project.org/package=FCPS), [flowcluster](https://CRAN.R-project.org/package=flowcluster), [funtimes](https://CRAN.R-project.org/package=funtimes), [FuzzyDBScan](https://CRAN.R-project.org/package=FuzzyDBScan), [HaploVar](https://CRAN.R-project.org/package=HaploVar), [immunaut](https://CRAN.R-project.org/package=immunaut), [karyotapR](https://CRAN.R-project.org/package=karyotapR), [ksharp](https://CRAN.R-project.org/package=ksharp), [LLMing](https://CRAN.R-project.org/package=LLMing), [LOMAR](https://CRAN.R-project.org/package=LOMAR), [maotai](https://CRAN.R-project.org/package=maotai), [MapperAlgo](https://CRAN.R-project.org/package=MapperAlgo), [metaCluster](https://CRAN.R-project.org/package=metaCluster), [metasnf](https://CRAN.R-project.org/package=metasnf), [mlr3cluster](https://CRAN.R-project.org/package=mlr3cluster), [neuroim2](https://CRAN.R-project.org/package=neuroim2), [oclust](https://CRAN.R-project.org/package=oclust), [omicsTools](https://CRAN.R-project.org/package=omicsTools), [openSkies](https://CRAN.R-project.org/package=openSkies), [opticskxi](https://CRAN.R-project.org/package=opticskxi), [OTclust](https://CRAN.R-project.org/package=OTclust), [outlierensembles](https://CRAN.R-project.org/package=outlierensembles), [outlierMBC](https://CRAN.R-project.org/package=outlierMBC), [pagoda2](https://CRAN.R-project.org/package=pagoda2), [parameters](https://CRAN.R-project.org/package=parameters), [ParBayesianOptimization](https://CRAN.R-project.org/package=ParBayesianOptimization), [performance](https://CRAN.R-project.org/package=performance), [PiC](https://CRAN.R-project.org/package=PiC), [rcrisp](https://CRAN.R-project.org/package=rcrisp), 
[rMultiNet](https://CRAN.R-project.org/package=rMultiNet), [seriation](https://CRAN.R-project.org/package=seriation), [sfdep](https://CRAN.R-project.org/package=sfdep), [sfnetworks](https://CRAN.R-project.org/package=sfnetworks), [sharp](https://CRAN.R-project.org/package=sharp), [smotefamily](https://CRAN.R-project.org/package=smotefamily), [snap](https://CRAN.R-project.org/package=snap), [spdep](https://CRAN.R-project.org/package=spdep), [spNetwork](https://CRAN.R-project.org/package=spNetwork), [ssMRCD](https://CRAN.R-project.org/package=ssMRCD), [stream](https://CRAN.R-project.org/package=stream), [SuperCell](https://CRAN.R-project.org/package=SuperCell), [synr](https://CRAN.R-project.org/package=synr), [tidySEM](https://CRAN.R-project.org/package=tidySEM), [VBphenoR](https://CRAN.R-project.org/package=VBphenoR), [VIProDesign](https://CRAN.R-project.org/package=VIProDesign), [weird](https://CRAN.R-project.org/package=weird) To cite package ‘dbscan’ in publications use: > Hahsler M, Piekenbrock M, Doran D (2019). “dbscan: Fast Density-Based > Clustering with R.” *Journal of Statistical Software*, *91*(1), 1-30. > <doi:10.18637/jss.v091.i01>. @Article{, title = {{dbscan}: Fast Density-Based Clustering with {R}}, author = {Michael Hahsler and Matthew Piekenbrock and Derek Doran}, journal = {Journal of Statistical Software}, year = {2019}, volume = {91}, number = {1}, pages = {1--30}, doi = {10.18637/jss.v091.i01}, } ## Installation **Stable CRAN version:** Install from within R with ``` r install.packages("dbscan") ``` **Current development version:** Install from [r-universe](https://mhahsler.r-universe.dev/dbscan). ``` r install.packages("dbscan", repos = c("https://mhahsler.r-universe.dev", "https://cloud.r-project.org/")) ``` ## Usage Load the package and use the numeric variables in the iris dataset. ``` r library("dbscan") data("iris") x <- as.matrix(iris[, 1:4]) ``` DBSCAN ``` r db <- dbscan(x, eps = 0.42, minPts = 5) db ``` ## DBSCAN clustering for 150 objects.
## Parameters: eps = 0.42, minPts = 5 ## Using euclidean distances and borderpoints = TRUE ## The clustering contains 3 cluster(s) and 29 noise points. ## ## 0 1 2 3 ## 29 48 37 36 ## ## Available fields: cluster, eps, minPts, metric, borderPoints Visualize the resulting clustering (noise points are shown in black). ``` r pairs(x, col = db$cluster + 1L) ``` ![](inst/README_files/dbscan-1.png) OPTICS ``` r opt <- optics(x, eps = 1, minPts = 4) opt ``` ## OPTICS ordering/clustering for 150 objects. ## Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA ## Available fields: order, reachdist, coredist, predecessor, minPts, eps, ## eps_cl, xi Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored). ``` r opt <- extractDBSCAN(opt, eps_cl = 0.4) plot(opt) ``` ![](inst/README_files/OPTICS_extractDBSCAN-1.png) HDBSCAN ``` r hdb <- hdbscan(x, minPts = 4) hdb ``` ## HDBSCAN clustering for 150 objects. ## Parameters: minPts = 4 ## The clustering contains 2 cluster(s) and 0 noise points. ## ## 1 2 ## 100 50 ## ## Available fields: cluster, minPts, coredist, cluster_scores, ## membership_prob, outlier_scores, hc Visualize the hierarchical clustering as a simplified tree. HDBSCAN finds 2 stable clusters. ``` r plot(hdb, show_flat = TRUE) ``` ![](inst/README_files/hdbscan-1.png) ## Using dbscan with tidyverse `dbscan` provides `tidy()`, `augment()`, and `glance()` for all clustering algorithms, so they can be easily used with the tidyverse, ggplot2 and [tidymodels](https://www.tidymodels.org/learn/statistics/k-means/).
``` r library(tidyverse) db <- x %>% dbscan(eps = 0.42, minPts = 5) ``` Get cluster statistics as a tibble. ``` r tidy(db) ``` ## # A tibble: 4 × 3 ## cluster size noise ## ## 1 0 29 TRUE ## 2 1 48 FALSE ## 3 2 37 FALSE ## 4 3 36 FALSE Visualize the clustering with ggplot2 (use an x for noise points). ``` r augment(db, x) %>% ggplot(aes(x = Petal.Length, y = Petal.Width)) + geom_point(aes(color = .cluster, shape = noise)) + scale_shape_manual(values = c(19, 4)) ``` ![](inst/README_files/tidyverse3-1.png) ## Using dbscan from Python R, the R package `dbscan`, and the Python package `rpy2` need to be installed. ``` python import pandas as pd import numpy as np ### prepare data iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header = None, names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']) iris_numeric = iris[['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']] # get R dbscan package from rpy2.robjects import packages dbscan = packages.importr('dbscan') # enable automatic conversion of pandas dataframes to R dataframes from rpy2.robjects import pandas2ri pandas2ri.activate() db = dbscan.dbscan(iris_numeric, eps = 0.5, MinPts = 5) print(db) ``` ## DBSCAN clustering for 150 objects. ## Parameters: eps = 0.5, minPts = 5 ## Using euclidean distances and borderpoints = TRUE ## The clustering contains 2 cluster(s) and 17 noise points.
## ## 0 1 2 ## 17 49 84 ## ## Available fields: cluster, eps, minPts, dist, borderPoints ``` python # get the cluster assignment vector labels = np.array(db.rx('cluster')) labels ``` ## array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ## 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, ## 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 2, 2, 2, ## 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, ## 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 0, 0, ## 2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, ## 2, 2, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], ## dtype=int32) ## License The dbscan package is licensed under the [GNU General Public License (GPL) Version 3](https://www.gnu.org/licenses/gpl-3.0.en.html). The **OPTICSXi** R implementation was directly ported from the ELKI framework’s Java implementation (GNU AGPLv3), with permission by the original author, Erich Schubert. ## Changes - List of changes from [NEWS.md](https://github.com/mhahsler/dbscan/blob/master/NEWS.md) ## References
Ankerst, Mihael, Markus M Breunig, Hans-Peter Kriegel, and Jörg Sander. 1999. “OPTICS: Ordering Points to Identify the Clustering Structure.” In *ACM SIGMOD Record*, 28(2):49–60. ACM.
Breunig, Markus M, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. 2000. “LOF: Identifying Density-Based Local Outliers.” In *ACM Int. Conf. on Management of Data*, 29(2):93–104. ACM.
Campello, Ricardo JGB, Davoud Moulavi, and Jörg Sander. 2013. “Density-Based Clustering Based on Hierarchical Density Estimates.” In *Pacific-Asia Conference on Knowledge Discovery and Data Mining*, 160–72. Springer.
Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg Sander. 2015. “Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection.” *ACM Transactions on Knowledge Discovery from Data (TKDD)* 10 (1): 5.
Ertöz, Levent, Michael Steinbach, and Vipin Kumar. 2003. “Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data.” In *Proceedings of the 2003 SIAM International Conference on Data Mining (SDM)*, 47–58. <https://doi.org/10.1137/1.9781611972733.5>.
Ester, Martin, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. 1996. “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.” In *Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96)*, 226–31.
Hahsler, Michael, Matthew Piekenbrock, and Derek Doran. 2019. “dbscan: Fast Density-Based Clustering with R.” *Journal of Statistical Software* 91 (1): 1–30. <https://doi.org/10.18637/jss.v091.i01>.
Jarvis, R. A., and E. A. Patrick. 1973. “Clustering Using a Similarity Measure Based on Shared Near Neighbors.” *IEEE Transactions on Computers* C-22 (11): 1025–34.
Moulavi, Davoud, Pablo A. Jaskowiak, Ricardo J. G. B. Campello, Arthur Zimek, and Jörg Sander. 2014. “Density-Based Clustering Validation.” In *Proceedings of the 2014 SIAM International Conference on Data Mining (SDM)*, 839–47.
================================================ FILE: data_src/data_DBCV/dataset_1.txt ================================================ -0.0014755 0.99852 1 -0.005943 0.98904 1 0.028184 1.0181 1 0.019204 1.0041 1 0.033017 1.0128 1 0.011014 0.9857 1 0.033779 1.0033 1 0.045243 1.0096 1 0.02493 0.98413 1 0.064521 1.0185 1 0.032742 0.98149 1 0.042959 0.98645 1 0.049146 0.98734 1 0.05769 0.99058 1 0.070368 0.99792 1 0.070434 0.99262 1 0.09811 1.0149 1 0.078285 0.98967 1 0.096586 1.0025 1 0.10724 1.0077 1 0.083108 0.9781 1 0.088157 0.97763 1 0.092311 0.97624 1 0.10984 0.98821 1 0.12512 0.99789 1 0.13833 1.0055 1 0.12534 0.98686 1 0.13543 0.99127 1 0.13098 0.98113 1 0.14075 0.98519 1 0.16177 1.0005 1 0.13901 0.97193 1 0.14619 0.97331 1 0.14712 0.96842 1 0.16767 0.98311 1 0.19442 1.004 1 0.16394 0.96761 1 0.1977 0.99543 1 0.19514 0.98692 1 0.1946 0.9804 1 0.19852 0.97831 1 0.20655 0.98031 1 0.20457 0.97227 1 0.22232 0.98393 1 0.23737 0.99287 1 0.22462 0.97398 1 0.23313 0.97632 1 0.22676 0.96375 1 0.246 0.97677 1 0.26077 0.98529 1 0.26161 0.97986 1 0.23546 0.9474 1 0.2654 0.97101 1 0.24746 0.9467 1 0.26646 0.95933 1 0.29237 0.97882 1 0.26142 0.94142 1 0.29617 0.9697 1 0.29783 0.96485 1 0.27501 0.93551 1 0.2995 0.95344 1 0.29481 0.94216 1 0.31401 0.95475 1 0.32047 0.95457 1 0.32755 0.95496 1 0.31955 0.94027 1 0.33585 0.94983 1 0.33838 0.9456 1 0.32029 0.92072 1 0.32917 0.92278 1 0.36377 0.95052 1 0.34103 0.9209 1 0.34455 0.9175 1 0.36578 0.93179 1 0.36666 0.92569 1 0.38252 0.93455 1 0.38847 0.93345 1 0.40353 0.94145 1 0.38628 0.91709 1 0.39619 0.91987 1 0.40831 0.92482 1 0.42051 0.92983 1 0.42992 0.932 1 0.41207 0.90689 1 0.41348 0.901 1 0.41216 0.89236 1 0.42511 0.89794 1 0.44358 0.90901 1 0.44485 0.90285 1 0.43699 0.88752 1 0.45736 0.90039 1 0.44539 0.88088 1 0.44175 0.86967 1 0.45383 0.87414 1 0.47455 0.88721 1 0.46535 0.87033 1 0.47352 0.87079 1 0.48349 0.873 1 0.48279 0.86452 1 0.4897 0.86359 1 0.4966 0.86263 1 0.5235 0.88162 1 0.51375 0.86392 1 0.51293 0.85512 1 
0.51094 0.8451 1 0.53526 0.86136 1 0.52601 0.84401 1 0.52951 0.83937 1 0.53659 0.83826 1 0.54668 0.84011 1 0.55938 0.84454 1 0.57416 0.85101 1 0.56963 0.83812 1 0.58407 0.84416 1 0.55567 0.80732 1 0.56363 0.80678 1 0.59075 0.82537 1 0.60254 0.82858 1 0.60324 0.82064 1 0.58442 0.79315 1 0.60202 0.80202 1 0.60983 0.80106 1 0.62846 0.81087 1 0.63324 0.80676 1 0.6081 0.77271 1 0.6167 0.77233 1 0.63294 0.77954 1 0.63518 0.7727 1 0.62163 0.75 1 0.63385 0.75303 1 0.64162 0.75155 1 0.63656 0.73719 1 0.66559 0.75686 1 0.65921 0.74105 1 0.67238 0.74474 1 0.66346 0.72628 1 0.69846 0.75167 1 0.6876 0.73114 1 0.69558 0.72939 1 0.67529 0.6993 1 0.69987 0.71402 1 0.69774 0.70195 1 0.71685 0.71106 1 0.69996 0.68408 1 0.70054 0.67451 1 0.72628 0.69003 1 0.72721 0.68066 1 0.74228 0.68534 1 0.75923 0.69184 1 0.73849 0.66055 1 0.74532 0.65676 1 0.76487 0.66559 1 0.76875 0.65868 1 0.78436 0.66339 1 0.76745 0.6355 1 0.77718 0.63414 1 0.79609 0.64187 1 0.76966 0.60415 1 0.77308 0.59619 1 0.80871 0.62031 1 0.79292 0.59292 1 0.80364 0.59192 1 0.81851 0.59494 1 0.81311 0.57757 1 0.81251 0.56488 1 0.80587 0.546 1 0.81022 0.53799 1 0.81768 0.53293 1 0.82183 0.52442 1 0.84505 0.53482 1 0.83976 0.51654 1 0.8495 0.51313 1 0.87442 0.52471 1 0.88198 0.51875 1 0.88773 0.51078 1 0.85546 0.46458 1 0.89719 0.49216 1 0.87605 0.45664 1 0.87376 0.43972 1 0.90767 0.45874 1 0.90549 0.44138 1 0.90785 0.42826 1 0.89003 0.39464 1 0.90694 0.3954 1 0.93518 0.4071 1 0.92258 0.37754 1 0.91917 0.35673 1 0.94426 0.36391 1 0.92657 0.32774 1 0.94763 0.3297 1 0.95621 0.31846 1 0.93664 0.27824 1 0.94663 0.26663 1 0.9509 0.24815 1 0.97853 0.25164 1 0.98948 0.23668 1 0.97915 0.19814 1 0.98452 0.17207 1 0.99067 0.14174 1 0.9892 0.094075 1 0.98787 -0.012127 1 0.0014755 -0.99852 2 0.005943 -0.98904 2 -0.028184 -1.0181 2 -0.019204 -1.0041 2 -0.033017 -1.0128 2 -0.011014 -0.9857 2 -0.033779 -1.0033 2 -0.045243 -1.0096 2 -0.02493 -0.98413 2 -0.064521 -1.0185 2 -0.032742 -0.98149 2 -0.042959 -0.98645 2 -0.049146 -0.98734 2 
-0.05769 -0.99058 2 -0.070368 -0.99792 2 -0.070434 -0.99262 2 -0.09811 -1.0149 2 -0.078285 -0.98967 2 -0.096586 -1.0025 2 -0.10724 -1.0077 2 -0.083108 -0.9781 2 -0.088157 -0.97763 2 -0.092311 -0.97624 2 -0.10984 -0.98821 2 -0.12512 -0.99789 2 -0.13833 -1.0055 2 -0.12534 -0.98686 2 -0.13543 -0.99127 2 -0.13098 -0.98113 2 -0.14075 -0.98519 2 -0.16177 -1.0005 2 -0.13901 -0.97193 2 -0.14619 -0.97331 2 -0.14712 -0.96842 2 -0.16767 -0.98311 2 -0.19442 -1.004 2 -0.16394 -0.96761 2 -0.1977 -0.99543 2 -0.19514 -0.98692 2 -0.1946 -0.9804 2 -0.19852 -0.97831 2 -0.20655 -0.98031 2 -0.20457 -0.97227 2 -0.22232 -0.98393 2 -0.23737 -0.99287 2 -0.22462 -0.97398 2 -0.23313 -0.97632 2 -0.22676 -0.96375 2 -0.246 -0.97677 2 -0.26077 -0.98529 2 -0.26161 -0.97986 2 -0.23546 -0.9474 2 -0.2654 -0.97101 2 -0.24746 -0.9467 2 -0.26646 -0.95933 2 -0.29237 -0.97882 2 -0.26142 -0.94142 2 -0.29617 -0.9697 2 -0.29783 -0.96485 2 -0.27501 -0.93551 2 -0.2995 -0.95344 2 -0.29481 -0.94216 2 -0.31401 -0.95475 2 -0.32047 -0.95457 2 -0.32755 -0.95496 2 -0.31955 -0.94027 2 -0.33585 -0.94983 2 -0.33838 -0.9456 2 -0.32029 -0.92072 2 -0.32917 -0.92278 2 -0.36377 -0.95052 2 -0.34103 -0.9209 2 -0.34455 -0.9175 2 -0.36578 -0.93179 2 -0.36666 -0.92569 2 -0.38252 -0.93455 2 -0.38847 -0.93345 2 -0.40353 -0.94145 2 -0.38628 -0.91709 2 -0.39619 -0.91987 2 -0.40831 -0.92482 2 -0.42051 -0.92983 2 -0.42992 -0.932 2 -0.41207 -0.90689 2 -0.41348 -0.901 2 -0.41216 -0.89236 2 -0.42511 -0.89794 2 -0.44358 -0.90901 2 -0.44485 -0.90285 2 -0.43699 -0.88752 2 -0.45736 -0.90039 2 -0.44539 -0.88088 2 -0.44175 -0.86967 2 -0.45383 -0.87414 2 -0.47455 -0.88721 2 -0.46535 -0.87033 2 -0.47352 -0.87079 2 -0.48349 -0.873 2 -0.48279 -0.86452 2 -0.4897 -0.86359 2 -0.4966 -0.86263 2 -0.5235 -0.88162 2 -0.51375 -0.86392 2 -0.51293 -0.85512 2 -0.51094 -0.8451 2 -0.53526 -0.86136 2 -0.52601 -0.84401 2 -0.52951 -0.83937 2 -0.53659 -0.83826 2 -0.54668 -0.84011 2 -0.55938 -0.84454 2 -0.57416 -0.85101 2 -0.56963 -0.83812 2 -0.58407 -0.84416 2 
-0.55567 -0.80732 2 -0.56363 -0.80678 2 -0.59075 -0.82537 2 -0.60254 -0.82858 2 -0.60324 -0.82064 2 -0.58442 -0.79315 2 -0.60202 -0.80202 2 -0.60983 -0.80106 2 -0.62846 -0.81087 2 -0.63324 -0.80676 2 -0.6081 -0.77271 2 -0.6167 -0.77233 2 -0.63294 -0.77954 2 -0.63518 -0.7727 2 -0.62163 -0.75 2 -0.63385 -0.75303 2 -0.64162 -0.75155 2 -0.63656 -0.73719 2 -0.66559 -0.75686 2 -0.65921 -0.74105 2 -0.67238 -0.74474 2 -0.66346 -0.72628 2 -0.69846 -0.75167 2 -0.6876 -0.73114 2 -0.69558 -0.72939 2 -0.67529 -0.6993 2 -0.69987 -0.71402 2 -0.69774 -0.70195 2 -0.71685 -0.71106 2 -0.69996 -0.68408 2 -0.70054 -0.67451 2 -0.72628 -0.69003 2 -0.72721 -0.68066 2 -0.74228 -0.68534 2 -0.75923 -0.69184 2 -0.73849 -0.66055 2 -0.74532 -0.65676 2 -0.76487 -0.66559 2 -0.76875 -0.65868 2 -0.78436 -0.66339 2 -0.76745 -0.6355 2 -0.77718 -0.63414 2 -0.79609 -0.64187 2 -0.76966 -0.60415 2 -0.77308 -0.59619 2 -0.80871 -0.62031 2 -0.79292 -0.59292 2 -0.80364 -0.59192 2 -0.81851 -0.59494 2 -0.81311 -0.57757 2 -0.81251 -0.56488 2 -0.80587 -0.546 2 -0.81022 -0.53799 2 -0.81768 -0.53293 2 -0.82183 -0.52442 2 -0.84505 -0.53482 2 -0.83976 -0.51654 2 -0.8495 -0.51313 2 -0.87442 -0.52471 2 -0.88198 -0.51875 2 -0.88773 -0.51078 2 -0.85546 -0.46458 2 -0.89719 -0.49216 2 -0.87605 -0.45664 2 -0.87376 -0.43972 2 -0.90767 -0.45874 2 -0.90549 -0.44138 2 -0.90785 -0.42826 2 -0.89003 -0.39464 2 -0.90694 -0.3954 2 -0.93518 -0.4071 2 -0.92258 -0.37754 2 -0.91917 -0.35673 2 -0.94426 -0.36391 2 -0.92657 -0.32774 2 -0.94763 -0.3297 2 -0.95621 -0.31846 2 -0.93664 -0.27824 2 -0.94663 -0.26663 2 -0.9509 -0.24815 2 -0.97853 -0.25164 2 -0.98948 -0.23668 2 -0.97915 -0.19814 2 -0.98452 -0.17207 2 -0.99067 -0.14174 2 -0.9892 -0.094075 2 -0.98787 0.012127 2 -0.0029509 1.997 3 -0.011886 1.9781 3 0.056369 2.0363 3 0.038408 2.0082 3 0.066034 2.0256 3 0.022028 1.9714 3 0.067558 2.0067 3 0.090485 2.0193 3 0.04986 1.9683 3 0.12904 2.037 3 0.065484 1.963 3 0.085919 1.9729 3 0.098292 1.9747 3 0.11538 1.9812 3 0.14074 1.9958 3 0.14087 
1.9852 3 0.19622 2.0298 3 0.15657 1.9793 3 0.19317 2.0051 3 0.21449 2.0154 3 0.16622 1.9562 3 0.17631 1.9553 3 0.18462 1.9525 3 0.21968 1.9764 3 0.25024 1.9958 3 0.27666 2.011 3 0.25069 1.9737 3 0.27086 1.9825 3 0.26197 1.9623 3 0.28151 1.9704 3 0.32354 2.0009 3 0.27802 1.9439 3 0.29239 1.9466 3 0.29424 1.9368 3 0.33533 1.9662 3 0.38883 2.008 3 0.32789 1.9352 3 0.39539 1.9909 3 0.39027 1.9738 3 0.38919 1.9608 3 0.39703 1.9566 3 0.41309 1.9606 3 0.40914 1.9445 3 0.44464 1.9679 3 0.47475 1.9857 3 0.44924 1.948 3 0.46626 1.9526 3 0.45351 1.9275 3 0.492 1.9535 3 0.52153 1.9706 3 0.52323 1.9597 3 0.47091 1.8948 3 0.5308 1.942 3 0.49491 1.8934 3 0.53293 1.9187 3 0.58475 1.9576 3 0.52284 1.8828 3 0.59234 1.9394 3 0.59565 1.9297 3 0.55002 1.871 3 0.599 1.9069 3 0.58962 1.8843 3 0.62803 1.9095 3 0.64095 1.9091 3 0.65509 1.9099 3 0.63911 1.8805 3 0.6717 1.8997 3 0.67676 1.8912 3 0.64059 1.8414 3 0.65835 1.8456 3 0.72754 1.901 3 0.68206 1.8418 3 0.68909 1.835 3 0.73157 1.8636 3 0.73332 1.8514 3 0.76505 1.8691 3 0.77693 1.8669 3 0.80707 1.8829 3 0.77256 1.8342 3 0.79239 1.8397 3 0.81662 1.8496 3 0.84103 1.8597 3 0.85983 1.864 3 0.82415 1.8138 3 0.82696 1.802 3 0.82433 1.7847 3 0.85022 1.7959 3 0.88716 1.818 3 0.88971 1.8057 3 0.87398 1.775 3 0.91472 1.8008 3 0.89078 1.7618 3 0.8835 1.7393 3 0.90766 1.7483 3 0.94909 1.7744 3 0.9307 1.7407 3 0.94704 1.7416 3 0.96697 1.746 3 0.96559 1.729 3 0.9794 1.7272 3 0.99321 1.7253 3 1.047 1.7632 3 1.0275 1.7278 3 1.0259 1.7102 3 1.0219 1.6902 3 1.0705 1.7227 3 1.052 1.688 3 1.059 1.6787 3 1.0732 1.6765 3 1.0934 1.6802 3 1.1188 1.6891 3 1.1483 1.702 3 1.1393 1.6762 3 1.1681 1.6883 3 1.1113 1.6146 3 1.1273 1.6136 3 1.1815 1.6507 3 1.2051 1.6572 3 1.2065 1.6413 3 1.1688 1.5863 3 1.204 1.604 3 1.2197 1.6021 3 1.2569 1.6217 3 1.2665 1.6135 3 1.2162 1.5454 3 1.2334 1.5447 3 1.2659 1.5591 3 1.2704 1.5454 3 1.2433 1.5 3 1.2677 1.5061 3 1.2832 1.5031 3 1.2731 1.4744 3 1.3312 1.5137 3 1.3184 1.4821 3 1.3448 1.4895 3 1.3269 1.4526 3 1.3969 1.5033 3 
1.3752 1.4623 3 1.3912 1.4588 3 1.3506 1.3986 3 1.3997 1.428 3 1.3955 1.4039 3 1.4337 1.4221 3 1.3999 1.3682 3 1.4011 1.349 3 1.4526 1.3801 3 1.4544 1.3613 3 1.4846 1.3707 3 1.5185 1.3837 3 1.477 1.3211 3 1.4906 1.3135 3 1.5297 1.3312 3 1.5375 1.3174 3 1.5687 1.3268 3 1.5349 1.271 3 1.5544 1.2683 3 1.5922 1.2837 3 1.5393 1.2083 3 1.5462 1.1924 3 1.6174 1.2406 3 1.5858 1.1858 3 1.6073 1.1838 3 1.637 1.1899 3 1.6262 1.1551 3 1.625 1.1298 3 1.6117 1.092 3 1.6204 1.076 3 1.6354 1.0659 3 1.6437 1.0488 3 1.6901 1.0696 3 1.6795 1.0331 3 1.699 1.0263 3 1.7488 1.0494 3 1.764 1.0375 3 1.7755 1.0216 3 1.7109 0.92917 3 1.7944 0.98432 3 1.7521 0.91328 3 1.7475 0.87945 3 1.8153 0.91747 3 1.811 0.88277 3 1.8157 0.85652 3 1.7801 0.78928 3 1.8139 0.79079 3 1.8704 0.8142 3 1.8452 0.75509 3 1.8383 0.71346 3 1.8885 0.72782 3 1.8531 0.65549 3 1.8953 0.65939 3 1.9124 0.63693 3 1.8733 0.55649 3 1.8933 0.53327 3 1.9018 0.49629 3 1.9571 0.50328 3 1.979 0.47337 3 1.9583 0.39629 3 1.969 0.34415 3 1.9813 0.28348 3 1.9784 0.18815 3 1.9757 -0.024254 3 0.0029509 -1.997 4 0.011886 -1.9781 4 -0.056369 -2.0363 4 -0.038408 -2.0082 4 -0.066034 -2.0256 4 -0.022028 -1.9714 4 -0.067558 -2.0067 4 -0.090485 -2.0193 4 -0.04986 -1.9683 4 -0.12904 -2.037 4 -0.065484 -1.963 4 -0.085919 -1.9729 4 -0.098292 -1.9747 4 -0.11538 -1.9812 4 -0.14074 -1.9958 4 -0.14087 -1.9852 4 -0.19622 -2.0298 4 -0.15657 -1.9793 4 -0.19317 -2.0051 4 -0.21449 -2.0154 4 -0.16622 -1.9562 4 -0.17631 -1.9553 4 -0.18462 -1.9525 4 -0.21968 -1.9764 4 -0.25024 -1.9958 4 -0.27666 -2.011 4 -0.25069 -1.9737 4 -0.27086 -1.9825 4 -0.26197 -1.9623 4 -0.28151 -1.9704 4 -0.32354 -2.0009 4 -0.27802 -1.9439 4 -0.29239 -1.9466 4 -0.29424 -1.9368 4 -0.33533 -1.9662 4 -0.38883 -2.008 4 -0.32789 -1.9352 4 -0.39539 -1.9909 4 -0.39027 -1.9738 4 -0.38919 -1.9608 4 -0.39703 -1.9566 4 -0.41309 -1.9606 4 -0.40914 -1.9445 4 -0.44464 -1.9679 4 -0.47475 -1.9857 4 -0.44924 -1.948 4 -0.46626 -1.9526 4 -0.45351 -1.9275 4 -0.492 -1.9535 4 -0.52153 -1.9706 4 -0.52323 
-1.9597 4 -0.47091 -1.8948 4 -0.5308 -1.942 4 -0.49491 -1.8934 4 -0.53293 -1.9187 4 -0.58475 -1.9576 4 -0.52284 -1.8828 4 -0.59234 -1.9394 4 -0.59565 -1.9297 4 -0.55002 -1.871 4 -0.599 -1.9069 4 -0.58962 -1.8843 4 -0.62803 -1.9095 4 -0.64095 -1.9091 4 -0.65509 -1.9099 4 -0.63911 -1.8805 4 -0.6717 -1.8997 4 -0.67676 -1.8912 4 -0.64059 -1.8414 4 -0.65835 -1.8456 4 -0.72754 -1.901 4 -0.68206 -1.8418 4 -0.68909 -1.835 4 -0.73157 -1.8636 4 -0.73332 -1.8514 4 -0.76505 -1.8691 4 -0.77693 -1.8669 4 -0.80707 -1.8829 4 -0.77256 -1.8342 4 -0.79239 -1.8397 4 -0.81662 -1.8496 4 -0.84103 -1.8597 4 -0.85983 -1.864 4 -0.82415 -1.8138 4 -0.82696 -1.802 4 -0.82433 -1.7847 4 -0.85022 -1.7959 4 -0.88716 -1.818 4 -0.88971 -1.8057 4 -0.87398 -1.775 4 -0.91472 -1.8008 4 -0.89078 -1.7618 4 -0.8835 -1.7393 4 -0.90766 -1.7483 4 -0.94909 -1.7744 4 -0.9307 -1.7407 4 -0.94704 -1.7416 4 -0.96697 -1.746 4 -0.96559 -1.729 4 -0.9794 -1.7272 4 -0.99321 -1.7253 4 -1.047 -1.7632 4 -1.0275 -1.7278 4 -1.0259 -1.7102 4 -1.0219 -1.6902 4 -1.0705 -1.7227 4 -1.052 -1.688 4 -1.059 -1.6787 4 -1.0732 -1.6765 4 -1.0934 -1.6802 4 -1.1188 -1.6891 4 -1.1483 -1.702 4 -1.1393 -1.6762 4 -1.1681 -1.6883 4 -1.1113 -1.6146 4 -1.1273 -1.6136 4 -1.1815 -1.6507 4 -1.2051 -1.6572 4 -1.2065 -1.6413 4 -1.1688 -1.5863 4 -1.204 -1.604 4 -1.2197 -1.6021 4 -1.2569 -1.6217 4 -1.2665 -1.6135 4 -1.2162 -1.5454 4 -1.2334 -1.5447 4 -1.2659 -1.5591 4 -1.2704 -1.5454 4 -1.2433 -1.5 4 -1.2677 -1.5061 4 -1.2832 -1.5031 4 -1.2731 -1.4744 4 -1.3312 -1.5137 4 -1.3184 -1.4821 4 -1.3448 -1.4895 4 -1.3269 -1.4526 4 -1.3969 -1.5033 4 -1.3752 -1.4623 4 -1.3912 -1.4588 4 -1.3506 -1.3986 4 -1.3997 -1.428 4 -1.3955 -1.4039 4 -1.4337 -1.4221 4 -1.3999 -1.3682 4 -1.4011 -1.349 4 -1.4526 -1.3801 4 -1.4544 -1.3613 4 -1.4846 -1.3707 4 -1.5185 -1.3837 4 -1.477 -1.3211 4 -1.4906 -1.3135 4 -1.5297 -1.3312 4 -1.5375 -1.3174 4 -1.5687 -1.3268 4 -1.5349 -1.271 4 -1.5544 -1.2683 4 -1.5922 -1.2837 4 -1.5393 -1.2083 4 -1.5462 -1.1924 4 -1.6174 -1.2406 4 -1.5858 
-1.1858 4 -1.6073 -1.1838 4 -1.637 -1.1899 4 -1.6262 -1.1551 4 -1.625 -1.1298 4 -1.6117 -1.092 4 -1.6204 -1.076 4 -1.6354 -1.0659 4 -1.6437 -1.0488 4 -1.6901 -1.0696 4 -1.6795 -1.0331 4 -1.699 -1.0263 4 -1.7488 -1.0494 4 -1.764 -1.0375 4 -1.7755 -1.0216 4 -1.7109 -0.92917 4 -1.7944 -0.98432 4 -1.7521 -0.91328 4 -1.7475 -0.87945 4 -1.8153 -0.91747 4 -1.811 -0.88277 4 -1.8157 -0.85652 4 -1.7801 -0.78928 4 -1.8139 -0.79079 4 -1.8704 -0.8142 4 -1.8452 -0.75509 4 -1.8383 -0.71346 4 -1.8885 -0.72782 4 -1.8531 -0.65549 4 -1.8953 -0.65939 4 -1.9124 -0.63693 4 -1.8733 -0.55649 4 -1.8933 -0.53327 4 -1.9018 -0.49629 4 -1.9571 -0.50328 4 -1.979 -0.47337 4 -1.9583 -0.39629 4 -1.969 -0.34415 4 -1.9813 -0.28348 4 -1.9784 -0.18815 4 -1.9757 0.024254 4 1.4303 -1.0155 -1 -0.47685 -0.96563 -1 0.84056 1.4012 -1 0.093202 -0.41791 -1 -0.54094 -1.6109 -1 -0.25885 -1.2472 -1 0.74337 -0.55785 -1 -1.0824 1.5259 -1 1.8981 0.40646 -1 1.8849 -0.98545 -1 -0.83407 -0.57677 -1 -0.64022 1.5788 -1 1.9672 1.6318 -1 1.1451 -0.21204 -1 1.1687 -0.9417 -1 0.52452 0.21924 -1 1.2342 -1.3084 -1 -0.20569 1.4654 -1 1.3101 -1.0919 -1 -1.4794 -1.3521 -1 0.052576 -1.9281 -1 0.85565 -0.72342 -1 -0.998 0.22474 -1 0.12641 1.3221 -1 -0.46676 1.2395 -1 1.1958 -1.9376 -1 0.67705 -0.52349 -1 1.9134 -0.033122 -1 1.7309 -0.1383 -1 0.30224 -1.8671 -1 -1.6636 0.47667 -1 -0.34148 0.31791 -1 -1.2647 -0.81965 -1 1.964 -0.2621 -1 0.080782 -1.4804 -1 1.5267 -0.81594 -1 0.58746 1.0648 -1 -0.13372 -1.8932 -1 -1.6037 -0.93906 -1 1.8538 2.0218 -1 0.47595 -0.21614 -1 -1.3631 -1.4146 -1 -0.40273 1.5735 -1 1.5157 -1.9092 -1 0.1546 -1.5643 -1 0.17307 -1.015 -1 -0.22804 1.0579 -1 -1.2532 1.6227 -1 -0.9937 -1.1268 -1 -0.85152 0.70602 -1 0.11693 1.2987 -1 0.23711 1.8289 -1 -0.33624 1.525 -1 1.6075 -0.43292 -1 -0.77214 1.7802 -1 0.59348 -0.25709 -1 -0.83697 -1.3749 -1 -0.96984 -0.77479 -1 -0.56196 0.73784 -1 1.2122 1.7683 -1 0.15425 1.8227 -1 0.35689 0.40366 -1 -1.0654 1.8287 -1 -1.5773 -0.39103 -1 0.57317 -1.8698 -1 1.9026 -0.83995 -1 
-1.5782 -1.9069 -1 -1.2369 1.485 -1 -1.9441 -0.27481 -1 1.3406 -1.6589 -1 -0.073933 -1.4756 -1 -0.1247 -1.0512 -1 1.6189 -1.1285 -1 -0.32831 1.4982 -1 0.1749 1.0763 -1 0.78859 -0.63263 -1 -1.6681 -0.46941 -1 0.037311 0.38648 -1 -0.051917 0.14308 -1 1.4102 -0.67809 -1 0.45334 1.445 -1 -1.516 -0.95477 -1 0.42349 1.7679 -1 -1.3307 -0.44882 -1 -0.40012 0.74581 -1 0.12822 -0.91661 -1 1.4868 -1.9231 -1 0.63021 1.7951 -1 1.1397 0.1384 -1 -1.4819 0.69736 -1 0.098963 0.4381 -1 1.583 1.0221 -1 -1.549 1.9609 -1 0.53325 0.92753 -1 -1.6609 1.4557 -1 -0.35175 2.0038 -1 0.84258 1.057 -1 -1.5834 -1.442 -1 1.2282 -0.70763 -1 0.54608 -1.9197 -1 1.5774 0.7926 -1 0.48273 1.869 -1 -0.33838 0.93314 -1 0.58471 0.96454 -1 -0.042523 -1.3256 -1 -1.6098 -0.58906 -1 0.54416 0.30412 -1 1.7842 -0.16318 -1 -0.093611 1.3596 -1 0.40738 1.2851 -1 0.36251 -0.71722 -1 -1.0887 -0.1561 -1 0.66743 0.70871 -1 -1.3609 0.38795 -1 1.0867 -1.4895 -1 -1.1371 -1.9576 -1 -1.3111 -1.5273 -1 0.89457 -1.1274 -1 -0.96612 -0.20721 -1 -1.3363 0.14068 -1 0.4984 1.9978 -1 ================================================ FILE: data_src/data_DBCV/dataset_2.txt ================================================ 191.67 388.02 1 186.28 383.39 1 182.22 397.99 1 194.54 394.76 1 183.43 393.87 1 184.23 388.09 1 192.33 389.85 1 190.66 379.92 1 195.57 391.06 1 191.96 385.75 1 199.7 389.03 1 198.24 396.81 1 193.82 392.53 1 199.6 389 1 183.64 380.64 1 197.05 391.36 1 184.78 385 1 191.82 380.51 1 195.54 391.6 1 201.36 396.71 1 191.57 382.8 1 188.86 394.4 1 193.18 394.9 1 193.52 383.23 1 190.54 390.52 1 193.54 380.93 1 190.13 385.28 1 189.19 389.93 1 196.69 396.46 1 184.35 384.27 1 187.22 388 1 200.74 378.8 1 186.19 394.45 1 183.39 391.57 1 191.16 391.98 1 192.17 384.25 1 191.2 381.73 1 197.37 390.6 1 187.18 396.36 1 185.36 395.56 1 185 378.42 1 200.7 378.48 1 189.43 380.73 1 201.74 385.78 1 191.41 393.4 1 190.49 396.99 1 183 384.57 1 192.84 394.45 1 188.91 393.37 1 195.13 381.42 1 192.55 389.33 1 188.18 379.04 1 194.63 390.13 1 195.78 
384.71 1 194 384.32 1 201.64 392.2 1 189.08 384.43 1 193.18 385.76 1 186.7 380.63 1 193.05 389.65 1 192.87 391.51 1 200.06 383.44 1 187.65 389.69 1 185.65 393.05 1 194.62 385.77 1 200.27 395.52 1 190.14 380.87 1 201.36 386.43 1 197.19 380.52 1 194.06 380.94 1 190.65 381.92 1 185.66 393.78 1 192.36 383.31 1 195.77 390.8 1 186.62 389.32 1 188.21 390.11 1 192.65 396.48 1 195.48 390.87 1 200.97 385.22 1 184.95 393.58 1 197.78 386.29 1 186.87 380.27 1 189.26 386.77 1 190.07 379.13 1 200.59 382.33 1 188.67 396.58 1 200.17 395.49 1 201.76 385.96 1 192.04 397.27 1 192.75 383.17 1 187.46 382.41 1 340.41 481.39 2 340.1 495.16 2 344.78 481.21 2 331.12 496.46 2 340.69 487.92 2 335.3 482.55 2 337.23 499.25 2 342.68 494.51 2 328.22 495.53 2 339.51 486.51 2 341.08 493.48 2 328.75 495.01 2 326.3 488.38 2 327.81 498.99 2 334.28 487.46 2 326.35 492.18 2 341.09 498.63 2 338.57 499.69 2 335.41 492.93 2 332.2 493.39 2 337.98 485.14 2 336.41 483.31 2 339.96 493.2 2 343.33 486.77 2 341.46 485.77 2 330.26 493.31 2 332.52 484.28 2 326.43 499.75 2 328.52 499.14 2 338.5 484.65 2 344.84 482.29 2 334.78 494.12 2 326.04 485.07 2 329.06 481.9 2 331.54 494.11 2 328.48 485.08 2 337.45 499.97 2 339.55 499.48 2 337.43 495.51 2 327.76 494.22 2 330.44 492.63 2 339.49 487.74 2 336.16 482.82 2 341.06 485.45 2 339.48 488.7 2 330.98 480.68 2 331.57 484.42 2 343.33 484.18 2 328.31 488.15 2 334.89 498.58 2 342.32 497.03 2 332.51 487.29 2 326.03 491.16 2 341.69 486.9 2 338.1 492.02 2 332.05 497.13 2 339.75 485.09 2 333.82 484.08 2 329.32 499.44 2 332.68 488.99 2 327.92 487.37 2 337.22 480.92 2 336.15 488.91 2 333.69 490.45 2 326.04 486.61 2 334.56 492.73 2 333.88 489.53 2 337.59 488.77 2 340.21 492.08 2 339.8 493.36 2 329.76 497.27 2 340.95 482.98 2 338.42 484.3 2 344.52 499.25 2 327.57 499.96 2 329.93 499.83 2 335.57 480.24 2 333.34 488.34 2 337.02 493.53 2 340.54 482.09 2 325.4 482.11 2 334.95 494.09 2 336.14 495 2 326.32 495.45 2 332.61 484.52 2 338.13 484.78 2 336.18 494.65 2 331.86 493.36 2 332.1 496.01 
2 344.72 488.13 2 294.98 518.49 3 291.23 516.72 3 288.05 515.34 3 278.89 524.17 3 275.93 521.61 3 282.55 514.63 3 281.36 518.21 3 279.33 518.85 3 283.84 525.99 3 288.43 529.03 3 291.04 528.4 3 278.14 522.74 3 281.71 515.52 3 278.62 526.96 3 291.96 525.89 3 287.65 513.1 3 286.88 512.23 3 285.28 527.11 3 280.91 526.84 3 283.04 531.88 3 285.71 523.52 3 281.09 523.43 3 294.47 517.23 3 285.36 515.44 3 280.82 507.61 3 292.46 516.93 3 288.01 519.53 3 287.09 524.37 3 289.28 514.53 3 278.14 515.27 3 280.88 506.35 3 290.47 508.46 3 286.48 501.99 3 289.54 509.65 3 284.03 505.59 3 290.56 510.6 3 283.11 500.77 3 292.86 512.83 3 280.09 510.88 3 288.59 515.62 3 293.92 504.93 3 283.62 505.76 3 289 500.88 3 284.38 493.54 3 281.44 496.12 3 290.95 496.9 3 293.05 488.36 3 276.16 489.8 3 278.67 505.62 3 283.59 501.08 3 286.26 492.08 3 291.35 490.49 3 288.35 487.82 3 282.77 477.32 3 283.58 480.83 3 292.1 477.41 3 294.59 481.41 3 285.3 479.59 3 279.84 489.81 3 293.43 491.57 3 280.47 469.46 3 279.58 471 3 291.63 475.93 3 291.74 468.47 3 288.93 466.17 3 276.82 482.18 3 282.36 481.33 3 288.45 471.95 3 288.05 469.6 3 276.47 479.88 3 300.56 466.5 3 298.47 471.81 3 284.07 481.36 3 287.38 464.73 3 284.4 473.29 3 287.97 480.09 3 298.61 474.17 3 289.85 469.67 3 283.63 464.57 3 298.97 464.74 3 291.27 458.95 3 294.18 463.21 3 294.48 465.85 3 289.87 468.79 3 290.94 459.28 3 296.98 458.08 3 280.27 454.63 3 285.93 466.91 3 278.23 463.41 3 282.53 454.21 3 288.59 454.81 3 276.72 457.2 3 285.36 461.91 3 277.37 457.95 3 278.49 447.63 3 293.35 448.16 3 291.39 449.97 3 294.06 448.98 3 293.1 455.09 3 287.31 453.59 3 282.92 456.89 3 285.34 460.96 3 277.37 443.03 3 285.76 448.16 3 290.56 452.58 3 292.64 460.18 3 280.59 460.58 3 277.92 446.65 3 287.88 447.08 3 286.07 446.54 3 283.37 453.01 3 285.11 438.68 3 278.74 440.95 3 283.6 443.65 3 284.92 444.38 3 287.73 453.63 3 277.89 445.57 3 289.52 436.22 3 295.98 436.38 3 287.3 436.81 3 296.1 441.71 3 292.47 447.83 3 289.95 450.92 3 297.99 443.04 3 297 434.36 3 
296.57 445.3 3 299.79 440.01 3 299.96 442.62 3 299.68 439.13 3 296.39 436.12 3 320.92 454.47 3 310.97 444.94 3 323.03 452.19 3 309.97 447.02 3 311.33 456.66 3 320.74 452.42 3 323.85 458.09 3 305.53 448.24 3 307.96 450.16 3 318.1 460.06 3 307.99 450.18 3 306.09 447.03 3 314.32 446.33 3 310.71 454.38 3 318.18 440.03 3 317.17 448.15 3 314.51 454.5 3 314.28 444.67 3 315.05 449.01 3 310.99 457.54 3 313.12 441.72 3 309.27 445.33 3 309.35 444.71 3 311.14 443.3 3 305.75 437.12 3 309.01 455.41 3 312.25 437.3 3 305.43 442.71 3 309.84 453.82 3 305.52 444.09 3 321.73 441.87 3 314.2 439.02 3 329.11 440.43 3 316.15 455.43 3 316.5 454.81 3 314.86 452.27 3 323.51 448.13 3 324 439.58 3 322.74 448.47 3 322.93 447.08 3 335.27 437.48 3 338.28 451.23 3 328.09 447.29 3 322.51 449.93 3 323.06 450.62 3 331.6 452.04 3 334.17 449.54 3 330.58 439.86 3 327.51 450.65 3 335.91 449.43 3 343.39 443.55 3 331.72 435.02 3 336.7 447.63 3 330.01 450.15 3 328.63 448.64 3 329.16 436.68 3 327.53 440.84 3 332.23 452.85 3 330.02 447.25 3 328.79 452.26 3 340.74 442.92 3 353.55 435.51 3 345.18 445.63 3 337.29 440.98 3 343.28 435.96 3 341.23 445.12 3 355.15 435.15 3 345.11 444.24 3 339.09 437.76 3 340.62 442.32 3 351.08 437.81 3 355.65 442.49 3 357.24 446.59 3 361.65 444.11 3 345.92 445.21 3 349.02 451.95 3 348.38 438.95 3 358.49 451.57 3 345.9 441.06 3 360.79 449.67 3 358.19 447.11 3 351.18 457.04 3 355.48 453.5 3 354.18 451.7 3 351.94 452.31 3 365.01 439.53 3 369.88 441.77 3 359.26 454.45 3 355.58 455.95 3 353.81 440.37 3 361.36 454.94 3 368.56 447.07 3 375.49 443.25 3 366.36 450.91 3 363.16 455.29 3 371.11 458.19 3 372.98 448 3 373.03 447.92 3 374.39 457.3 3 372.26 445.73 3 377.06 450.13 3 379.63 439.73 3 372.1 441.27 3 383.96 458.63 3 371.14 456.3 3 367.27 441.51 3 381.39 453.48 3 381.73 455 3 366.89 451.22 3 377.3 444.36 3 380.11 448.47 3 379.44 449.11 3 376.58 457.72 3 372.81 456.82 3 382.33 462.54 3 386.07 457.32 3 378.29 448.74 3 373.19 450.57 3 370.29 444.36 3 383.03 452.18 3 368.76 464.2 3 384.78 
457.26 3 383.27 468.2 3 369.62 471.17 3 372.26 465.96 3 371.47 468.03 3 380.38 467.98 3 383.79 455.89 3 385.78 457.04 3 380.57 470.4 3 382.35 480.17 3 386.18 481.18 3 378.57 471.25 3 381.73 468.53 3 376.54 466.42 3 368.33 466.71 3 372.74 474.28 3 382.27 482.73 3 384.97 466.96 3 372.55 468.1 3 378.96 486.39 3 380.22 496.84 3 375.67 493.78 3 366.74 495.22 3 367.36 484.87 3 366.04 493.07 3 366.34 488.55 3 376.13 492.72 3 374.27 494.8 3 371.04 489.64 3 373.75 473.85 3 378.51 489.22 3 385.28 490.17 3 372.82 490.59 3 372.1 476.8 3 370.43 475.76 3 384.99 474.93 3 385.69 476.61 3 381.26 479.32 3 372.69 488.47 3 381.36 492.11 3 384.11 474.87 3 368.05 475.11 3 374.83 473.57 3 369.97 484.63 3 371.07 475.87 3 366.76 489.93 3 384.18 482.18 3 385.75 492.76 3 368.73 488.29 3 385.87 493.03 3 377.38 499.81 3 384.49 495.86 3 372.36 495.33 3 375.01 501.77 3 375.62 488.1 3 379.96 501.58 3 370.54 498.8 3 383.35 503.41 3 371.08 490.36 3 370.21 495.93 3 373.48 514.8 3 370.55 504.81 3 370.71 506.46 3 371.66 499.01 3 377.41 502.92 3 367.2 513.39 3 377.92 514.13 3 384.16 514.71 3 385.25 505.71 3 373.21 523.99 3 377.33 518.61 3 385.47 514.76 3 375.76 521.69 3 371.97 523.57 3 372.11 523.87 3 370.64 515.02 3 366.48 514.77 3 378.54 523.61 3 383.2 513.48 3 367.42 522.37 3 385.56 520.12 3 368.75 523.87 3 384.73 533.84 3 377.31 529.91 3 376.84 529.64 3 378.41 535.94 3 369.63 528.13 3 366.75 524.39 3 366.24 522.79 3 377.66 536.96 3 378.88 541.38 3 372.15 528.05 3 373.61 537.04 3 366.17 522.84 3 383.47 527.33 3 383.24 522.12 3 367.52 527.08 3 373.86 535.5 3 382.26 535.66 3 357.93 535.36 3 365.68 532.88 3 363.11 538.57 3 350.31 534.83 3 365.66 535.6 3 359.09 526.82 3 358.05 533.23 3 367.55 529.39 3 361.73 527.94 3 349.54 536.64 3 347.31 543.75 3 346.07 536.35 3 351.77 531.47 3 353.46 529.62 3 354.41 533.82 3 361.57 541.99 3 346.78 545.99 3 344.67 536.04 3 361.73 540.08 3 355.75 544.66 3 346.84 539.93 3 343.98 538.04 3 342.01 536.7 3 335.88 525.71 3 338 533.94 3 338.97 526.9 3 353.23 530.62 3 338.4 
540.83 3 341.43 533.65 3 336.62 535.57 3 338.84 535.99 3 336.84 529.85 3 325.93 534.64 3 329.93 528.94 3 327.31 526.5 3 342.67 535.84 3 325.67 540.26 3 335.96 529.47 3 324.81 530.54 3 323.57 531.35 3 330.93 539.85 3 325.2 527.89 3 314.42 533.91 3 317.52 532.12 3 329.36 531.92 3 318.32 542.56 3 321.96 540.29 3 322.88 530.85 3 328.42 530.82 3 323.48 524.62 3 313.88 542.08 3 319.01 525.49 3 323.61 529.62 3 320.88 535.79 3 306.95 532.76 3 315.62 541.98 3 316.32 525.54 3 307.04 539.87 3 313.11 543.76 3 317.78 533.33 3 304.88 538.15 3 310.86 537.3 3 306.53 527.56 3 293.92 539.02 3 295.26 525.31 3 298.32 530.93 3 307.76 535.17 3 303.6 528.51 3 295.49 540.66 3 303.73 529.11 3 302.05 532.02 3 302.8 531.32 3 295.21 533.72 3 286.11 528.52 3 296.5 531.68 3 290.35 537.34 3 302.04 536.13 3 285.01 531.55 3 292.5 541.61 3 302.73 526.73 3 286.09 543.6 3 286.95 541.62 3 288.73 525.62 3 291.94 525.4 3 284.91 535.21 3 281.74 536.65 3 282.97 536.73 3 279.31 541.51 3 282.08 529.34 3 288.73 525.19 3 306.37 533.73 3 290.12 539.37 3 294.4 534.15 3 296.61 526.55 3 306.91 536.5 3 306 526.44 3 291.47 542.59 3 305.58 525.35 3 297.59 544.89 3 288.33 527.02 3 305.29 540.79 3 311.39 536.2 3 310.13 526.2 3 318.07 544.46 3 309.02 529.11 3 305.32 536.69 3 317.93 532.3 3 320.65 533.34 3 319.57 542.11 3 308.2 539.44 3 333.5 529.38 3 336.7 535.47 3 318.03 528.88 3 327.08 530.98 3 329.83 531.53 3 323.47 532.39 3 328.83 531.74 3 337.92 543.6 3 329.95 528.19 3 326.37 543.61 3 338.88 534.78 3 333.09 535.94 3 342.69 538.79 3 346.23 542.77 3 334.57 526.75 3 341.95 524.57 3 333.56 541.51 3 331.04 527.26 3 345.95 537.33 3 348.59 528.64 3 352.97 539.86 3 346.53 542.73 3 343.8 542.63 3 341.86 531.31 3 340.08 539.25 3 359.03 532.22 3 340.99 531.07 3 340.08 536.54 3 359.73 533.9 3 351.63 527.78 3 372.06 524.72 3 361.89 541.77 3 359.29 542.9 3 356.45 535.49 3 369.13 525.67 3 361.68 539.93 3 364.28 532.1 3 374.01 538.93 3 370.34 535.18 3 358.54 542.05 3 372.08 528.04 3 372.39 542.33 3 370.01 537.32 3 368.39 525.05 3 
369.99 533.21 3 371.23 533.56 3 382.96 526.72 3 371.94 527.73 3 368.82 527 3 377.01 538.67 3 380.64 509.76 3 379.15 510.5 3 381.52 516.9 3 385.61 509.08 3 385.72 511.69 3 385.25 520.56 3 372.08 521.03 3 373.05 527.4 3 375.92 515 3 380.46 508.53 3 381.17 509.89 3 377.5 498.2 3 378.08 515.71 3 383.04 504.89 3 374.27 498.43 3 371.71 507.72 3 382.35 500.94 3 374.14 512.62 3 372.42 498.97 3 375.79 497.16 3 383.88 449.14 3 375.92 450.35 3 378.77 441.93 3 369.78 441.17 3 377.04 443.58 3 382.24 446.89 3 370.4 456.44 3 371.94 447.36 3 369.27 445.18 3 386.51 454.03 3 307.62 450.76 3 293.17 456.52 3 291.63 444.09 3 303.18 444.98 3 308.05 451.8 3 298.22 448.06 3 308.96 446.43 3 306.11 459.63 3 295.57 453.74 3 293.5 453.79 3 305.05 445.18 3 294.13 455.98 3 289.97 444.39 3 296.05 451.41 3 292.94 442.44 3 293.6 442.95 3 306.73 455.38 3 302.1 441.24 3 297.24 443.52 3 305.96 459.49 3 282.21 479.76 3 295.87 491.26 3 285.18 491.68 3 292.34 478.29 3 294.6 484.44 3 295.86 490.94 3 285.59 490.64 3 277.5 488.55 3 282.95 483.96 3 294.89 478 3 290.23 498.08 3 294.37 506.82 3 283.9 501.09 3 292.28 502.85 3 283.46 506.98 3 293.4 499.57 3 292.27 500.25 3 283.43 492.35 3 289.43 490.77 3 281.36 509.62 3 283.61 494.48 3 278.76 498.29 3 276.56 482 3 279.43 485.63 3 276.21 493.2 3 279.88 482.74 3 285 481.02 3 284.56 487 3 293.44 490.91 3 291.58 485.65 3 198.3 458.14 4 203.89 469.93 4 199.19 463.46 4 198.64 454.27 4 196.65 451.89 4 199.64 464.95 4 207.5 464.11 4 188.17 464.08 4 193.09 458.38 4 203.07 451.7 4 197.56 457.59 4 202.17 466.1 4 200.91 465.19 4 196.98 459.22 4 205.31 463.32 4 195.67 469.62 4 201.69 461.65 4 191.84 464.67 4 191 451.79 4 200.77 466.85 4 201.71 451.69 4 192.59 451.93 4 200.48 453.18 4 194.89 447.26 4 197.26 465.19 4 200.44 466.05 4 196.99 465.68 4 203.72 452.95 4 206.59 454.63 4 207.58 464.62 4 202.88 451.85 4 204.78 446.27 4 200.15 456.7 4 207.99 446.47 4 200.37 458.39 4 201.7 446.48 4 201.3 449.72 4 200.37 440.41 4 215.49 457.73 4 214.85 448.79 4 201.53 448.19 4 211.18 
454.93 4 207.18 441.92 4 212.44 448.85 4 210.42 452.5 4 210.99 452.69 4 218.63 442.57 4 204.34 451.16 4 221.99 437.54 4 218.96 448.85 4 208.27 450.06 4 212.25 447.75 4 217.98 434.97 4 221.75 431.21 4 223.15 449.72 4 219.95 445.2 4 224.86 440.54 4 220.52 430.74 4 225.66 441.67 4 212.64 445.5 4 214.78 443.36 4 218.02 436 4 218.18 444.59 4 218.23 429.54 4 216.33 431.04 4 228.17 447.54 4 214.33 428.31 4 229.09 428.86 4 227.48 436.96 4 227.73 436.69 4 230.46 435.62 4 229.82 423.5 4 234.03 434.96 4 239.51 436.33 4 225.41 430.7 4 225.95 436.85 4 227.65 432.75 4 221.91 423.96 4 227.68 424.42 4 235.92 434.8 4 235.65 420.72 4 228.66 418.43 4 235.06 419.85 4 236.03 428.26 4 229.29 427.44 4 225.75 414.38 4 239.79 416.05 4 243.86 424.58 4 232.34 418.07 4 236.91 428.55 4 241.79 427.96 4 235.44 409.6 4 235.16 416.5 4 244.16 408.77 4 231.48 423.2 4 242.9 424.95 4 246.4 423.32 4 239.09 408.26 4 247.89 409.23 4 244.61 415.54 4 245.38 410.68 4 244.9 411.19 4 236.65 408.73 4 243.95 404.73 4 254.25 407.38 4 251.26 415.82 4 247.76 399.39 4 252.76 404.75 4 240.98 403.86 4 236.98 413.05 4 240.26 404.66 4 255.2 395.54 4 258.91 404.23 4 243.74 406.34 4 252.82 397.57 4 250.77 401.71 4 247.93 399.11 4 252.33 393.87 4 255.98 391.3 4 245.39 396.91 4 246.85 408.06 4 251.04 401.55 4 258.05 396.09 4 246.61 390.08 4 245.69 393.88 4 259.65 385 4 260.94 396.9 4 245.64 402.94 4 244.92 403.08 4 251.78 388.86 4 243.08 398.84 4 258.91 382.06 4 248.48 385.45 4 259.08 386.77 4 250.36 382.85 4 247.62 398.27 4 261.63 384.05 4 247.43 381.97 4 250.61 395.19 4 264.37 390.65 4 258.36 391.3 4 190.06 455.47 4 195.48 451.46 4 201.24 458.19 4 198.89 458.24 4 203.55 468.81 4 199.34 466.97 4 191.85 452.98 4 203.38 455.55 4 198.53 464.05 4 203 455.49 4 194.69 464.91 4 186.61 453.88 4 190.47 462.79 4 195.8 462.98 4 189.89 460.86 4 190.97 452.25 4 194.53 459.51 4 183.97 458.15 4 198.73 449.22 4 186.47 447.6 4 184.29 449.35 4 176.73 455.3 4 191.18 446.89 4 179.84 452.04 4 175.18 443.39 4 181.11 456.55 4 190.27 440.32 4 
190.94 453.05 4 189.19 450.35 4 179.1 452.01 4 174.44 449.97 4 172.03 434.57 4 174.07 450.58 4 175.13 449.2 4 165.88 448.76 4 168.29 432.29 4 179.29 450.29 4 172.98 449.92 4 172.1 450.39 4 167.58 442.8 4 174.78 439.86 4 164.13 436.56 4 176.13 446.54 4 169.48 447.33 4 178.52 433.01 4 166.08 434.11 4 162.53 429.1 4 176.36 432.25 4 168.91 448.69 4 177.75 445.15 4 164.68 432.66 4 166.76 437.64 4 164.07 435.11 4 152.04 431.92 4 168.08 423.64 4 158.56 432.68 4 161.75 426.83 4 170.06 433.08 4 154.79 423.99 4 165.09 429.73 4 154.66 435.48 4 147.97 434.71 4 147.19 430.58 4 154.19 422.82 4 151.84 435.21 4 162.41 430.96 4 153.94 425.48 4 153.24 435.55 4 149.07 419.97 4 146.52 418.66 4 148.73 424.08 4 148.69 416 4 138.89 425.37 4 153.89 411.67 4 152.34 417.64 4 142.22 424.99 4 153.99 412.53 4 149.8 408.78 4 147.58 417.33 4 144.91 423.09 4 148.67 411.39 4 140.61 414.57 4 134.42 406.89 4 146.25 408.91 4 139 418.24 4 140.38 412.91 4 139.69 400.23 4 138.24 409.93 4 137.67 402.47 4 149.3 409.35 4 126.59 412.45 4 140.89 405.4 4 129.15 412.94 4 126.7 400.82 4 140.11 404.86 4 138.7 397.16 4 125.15 403.94 4 132.31 411.92 4 138.65 413.69 4 128.17 395.76 4 132.45 392.75 4 140.23 391 4 135.5 390.57 4 125.47 394.75 4 138.57 407.52 4 131.49 408.38 4 123.88 393.71 4 137.26 394.43 4 126.34 401.15 4 124.47 400.66 4 122.16 404.27 4 136.05 397.37 4 134.26 394.87 4 138.96 386.37 4 136.19 387.27 4 123.36 388.86 4 138.88 390.61 4 139.61 397.8 4 128.32 386.19 4 120.12 401.68 4 123.52 388.45 4 119.1 392.93 4 133.36 385.84 4 120.26 400.77 4 134.65 392.68 4 119.76 393.56 4 124.64 386.31 4 128.29 396.37 4 120.41 393.71 4 123.22 385.37 4 145.77 393.51 4 137.14 393.7 4 138.18 393.07 4 137.07 397.67 4 140.52 389.69 4 135.67 398.87 4 128.85 408.99 4 130.66 405.39 4 134.79 398.03 4 135.89 406.96 4 150.18 418.17 4 142.18 414.92 4 140.63 411.45 4 145.11 407.78 4 147.82 411.9 4 151 407.72 4 150.83 415.74 4 135.16 401.2 4 136.89 414.08 4 140.62 404.82 4 150.2 408.49 4 152.68 422.47 4 151.35 412.86 4 157.05 
424.34 4 148.68 426.44 4 160.54 408.35 4 149.52 417.35 4 153.08 417 4 155.37 412.9 4 159.1 408.44 4 150.24 427.06 4 152.83 419.78 4 160.87 431.75 4 158.89 428.23 4 153.08 416.97 4 167.93 434.65 4 166.45 424.95 4 163.38 433.96 4 160.96 427.63 4 161.1 433.9 4 181.14 437.87 4 176.46 438.86 4 169.81 438.33 4 182.45 430.82 4 163.65 445.6 4 181.41 431.22 4 166.67 440.68 4 178.08 432.8 4 167.84 440.94 4 169.44 436.48 4 171.23 449.72 4 182.82 444.9 4 176.3 445.6 4 188.05 441.99 4 183.7 439.02 4 175.06 445.93 4 180.96 448.71 4 183.01 442.18 4 169.45 449.21 4 187.35 437.74 4 191.27 444.3 4 182.97 438.2 4 185.82 440.49 4 189.89 441.88 4 188.1 445.55 4 182.45 448.23 4 177.89 452.31 4 193.3 455.14 4 195.03 439.95 4 189.35 439.2 4 117.72 385.48 4 122.63 377.76 4 121.74 387.04 4 124.2 375.18 4 127.16 382.54 4 127.69 382.12 4 123.44 381.06 4 121.13 376.28 4 127.91 371.9 4 133.42 381.22 4 130.44 374.66 4 136.7 380.41 4 128.86 374.55 4 136.66 367.58 4 138.2 382.07 4 127.34 375.42 4 140.79 381.27 4 125.45 380.05 4 132.13 378.84 4 131.31 376.54 4 137.27 366.15 4 133.56 370.78 4 138.64 360.85 4 138.61 361.91 4 137.03 359.91 4 142.13 359.84 4 140.54 361.11 4 139.26 374.2 4 130.06 359.98 4 147.35 368.34 4 140.84 353.83 4 134.76 366.96 4 150.47 356.02 4 144.82 367.66 4 151.73 367 4 139.22 371.1 4 147.48 372.7 4 141.74 356.31 4 147.96 360.38 4 139.26 357.93 4 141.8 358.32 4 156.21 348.31 4 156.45 363.2 4 156.08 352.66 4 150.86 357.83 4 152.37 350.2 4 158.09 357.93 4 156.27 360.5 4 157.75 363.24 4 155.08 362.48 4 162.43 346.75 4 163.35 349.73 4 161.67 346.8 4 148.99 356.4 4 153.2 348.88 4 159.99 354.49 4 156.76 343.47 4 152.85 362.09 4 153.94 347.41 4 154.56 353.59 4 163.39 342.31 4 165.92 338.05 4 157 345.79 4 172.24 340.97 4 164.41 345.46 4 157 348.08 4 161.71 337.14 4 154.67 344.41 4 172.13 348.46 4 163.47 341.69 4 164.7 343.18 4 172.45 337.05 4 171.88 346.85 4 163.73 335.13 4 175.97 338.81 4 157.38 343.17 4 156.84 337.49 4 166.79 351.48 4 171 345.14 4 172.25 346.07 4 176.58 341.91 4 
169.96 332.16 4 178.97 325.11 4 178.44 326.47 4 169.4 336.15 4 181.5 328.26 4 171.77 343.04 4 176.6 328.48 4 175.76 340.22 4 172.65 341.4 4 174.81 327.5 4 191.28 324.43 4 190.37 340.29 4 175.31 340.67 4 186.26 340.09 4 176.14 336.95 4 184.61 340.22 4 182.48 338.43 4 190.87 322.33 4 176.67 325.27 4 178.32 337.64 4 186.63 326.13 4 176.43 333.61 4 177.88 335.8 4 191.76 332.31 4 179.12 338.13 4 185.95 329.39 4 187.96 330.96 4 175.63 337.27 4 179.15 321.21 4 193.52 328.24 4 178.94 327.4 4 191.32 338.3 4 193.89 324.32 4 179.57 336.73 4 184.27 335.84 4 183.85 330.84 4 192.23 323.22 4 193.01 325.36 4 184.32 327.39 4 200.13 343.46 4 202.98 332.06 4 198.42 329.82 4 188.9 340.01 4 200.39 344.17 4 191.98 331.19 4 187.45 334.11 4 196.08 342.22 4 192.31 342.2 4 192.81 337.94 4 190.76 349.94 4 200.67 349.47 4 208.32 341.83 4 197.8 346.03 4 208.77 350.61 4 201.54 335.23 4 193.3 346.37 4 196.31 345.38 4 202.7 337.96 4 208.83 333.32 4 195.91 344.7 4 212.93 334.53 4 207.48 342.27 4 202.13 353 4 203.85 341.3 4 199.29 341.41 4 212.12 341.43 4 206.18 336.02 4 200.38 340.09 4 200.35 350.43 4 206.29 341.36 4 217.96 351.75 4 222.43 337.16 4 218.31 344.44 4 211.09 350.99 4 214.13 346.28 4 208.15 339.44 4 218.07 341.14 4 213.75 349.12 4 215.24 337.6 4 224.5 350.74 4 210.19 344.25 4 209.7 357.27 4 211.29 347.59 4 220.44 348.32 4 222.85 343.04 4 219.92 351.36 4 225.22 345.5 4 225.96 340.91 4 222.28 357.91 4 220.36 363.04 4 219.53 361.85 4 226.05 346.9 4 220.05 353.52 4 228.9 362.98 4 225.64 346.26 4 228.64 353.13 4 220.18 360 4 223.64 360.37 4 228.58 354.81 4 228.59 367.66 4 231.4 371.1 4 242.12 371.48 4 232.93 371 4 231.8 363.3 4 242.2 355.7 4 228.65 358.03 4 229.7 372.52 4 232.95 355.08 4 229.14 363.95 4 234.46 370.7 4 247.11 373.95 4 243.17 358.13 4 239.66 359.27 4 232.77 365.49 4 243.63 368.23 4 241.06 373.55 4 240.9 367.56 4 248.27 376.65 4 237.77 360.71 4 253.71 364.84 4 243.26 379.69 4 254.33 375.17 4 245.46 373.74 4 247.71 366.26 4 240.4 366.04 4 256.63 382.68 4 247.55 372.71 4 248.04 
377.17 4 240.53 363.84 4 242.42 377.33 4 257.53 369.08 4 257.42 370.07 4 251.96 382.08 4 248.29 369.64 4 259.34 385.78 4 253.46 371.86 4 255.27 373.52 4 244.83 369.61 4 248.63 379.58 4 235.05 369.8 4 237.22 372.49 4 249.32 368.07 4 242.86 374.1 4 238.66 362.07 4 250.89 375.81 4 241.31 370.12 4 237.49 362.68 4 237.23 371.37 4 246.65 360.96 4 219.41 365.99 4 223.52 360.2 4 228.85 369.91 4 217.11 361.84 4 234.9 357.68 4 222.46 363.06 4 223.96 361.62 4 230.89 367.73 4 229.33 357.19 4 230.89 369.42 4 226.28 352.54 4 213.51 353.54 4 214.99 363.01 4 226.78 361.09 4 217.91 354.33 4 214.09 357.95 4 221.93 355.37 4 229.22 349.4 4 225.11 358.57 4 211.71 354.16 4 210.42 344.9 4 213.16 343.07 4 213.08 349.23 4 206.17 350.93 4 219.06 343.72 4 217.43 348.01 4 206.71 339.37 4 212.88 345.11 4 214.92 342.69 4 210.66 343.11 4 202.17 343.93 4 190.27 339.93 4 191.43 339.48 4 201.45 328.21 4 205.77 346.44 4 206.99 330.85 4 200.21 339.23 4 201.98 341.81 4 193.11 330.56 4 195.36 338.81 4 192.18 333.78 4 178.05 337.99 4 182.74 327.85 4 187.46 341.07 4 191.62 326.92 4 189.26 333.86 4 181.52 334.34 4 177.42 337.54 4 186.32 326.16 4 179.18 323.71 4 130.22 368.24 4 134.92 370.21 4 130.08 356.89 4 137.76 375.17 4 141.57 356.92 4 134.31 367.94 4 145.15 357 4 148.45 362.05 4 143.25 359.34 4 132.38 365.07 4 130.39 374.35 4 138.76 368.03 4 134.69 370.98 4 132.68 357.75 4 142.41 365.15 4 145.76 374.53 4 138.66 368.6 4 137.79 363.31 4 133.72 359.79 4 142.8 367.33 4 140.04 358.54 4 149.51 355.02 4 150.25 357.86 4 155.36 345.6 4 150.69 346.15 4 155.44 347.88 4 155.29 357.69 4 153.12 342.46 4 148.18 340.92 4 155.79 349.6 4 157.19 355.96 4 153.69 348.89 4 149.19 351.88 4 151.05 351.77 4 149.87 355.65 4 163.68 354.24 4 154.32 344.1 4 154.39 343.41 4 156.59 343.95 4 159.8 350.55 4 163.18 353.59 4 157.1 362.28 4 149.59 356.35 4 147.21 365.17 4 144.89 346.11 4 152.48 361.98 4 157.29 360.26 4 149.05 361.91 4 146.33 358.3 4 148.53 346.72 4 139.25 349.31 4 218.04 437.44 4 220.29 450.98 4 225.04 433.06 4 214.25 
434.21 4 215.99 441.05 4 208.72 449.19 4 227.07 442.72 4 221.21 433.19 4 222.83 445.46 4 218.46 438.48 4 211.28 436.06 4 217.16 434.23 4 206.12 451.65 4 212.12 445.7 4 203.89 435.37 4 210.29 452.63 4 212.07 441.61 4 215.74 439.71 4 217.13 434.17 4 204.02 441.62 4 223.78 434.98 4 218.42 437.18 4 210.56 430.4 4 217.65 425.46 4 216.93 430.51 4 222.84 442.86 4 213.28 431.26 4 210.09 434.33 4 222.73 442.82 4 214.36 441.69 4 237.39 418.2 4 229.47 429.84 4 244.66 429.93 4 239.66 429.15 4 244.76 425.95 4 237.55 426.22 4 243.88 422.54 4 240.95 421.33 4 229.03 427.66 4 229.86 420.24 4 249.26 404.24 4 251.17 415.68 4 251.71 419.58 4 252.39 420.42 4 251.73 411.15 4 244.97 401.75 4 242.29 401.06 4 238.15 419.37 4 250.35 412.23 4 244.49 419.99 4 259.66 395.81 4 261.1 407.93 4 250.81 404.14 4 254.44 408.97 4 252.95 405.59 4 262.43 394.1 4 255.92 397.37 4 261.36 395.05 4 250.06 408.09 4 262.14 392.31 4 257.57 396.5 4 270.06 403.05 4 262.73 409.5 4 267.01 408.7 4 262.4 392.08 4 267.11 395.43 4 271.62 396.93 4 262.14 393.18 4 271.35 390.59 4 257.85 406.07 4 228.08 364.61 4 231.64 362.28 4 243.51 364.21 4 226.72 357.82 4 230.6 367.68 4 240.6 362.96 4 238.25 372.67 4 229.44 360.23 4 232.08 364.63 4 232.92 361.7 4 252.77 381.88 4 247.62 392.28 4 263.44 387.81 4 253.68 382.19 4 259.08 392.99 4 264.67 393.6 4 255.51 377.17 4 262.49 379.26 4 252.66 388 4 247.5 388.68 4 256.48 390.36 4 256.89 377.67 4 261.53 378.71 4 255.17 386.52 4 254.66 392.37 4 258.6 389.34 4 247.35 382.35 4 265.2 376.06 4 264.45 376.43 4 250.61 380.42 4 256.62 386.65 4 251.43 395.22 4 266.27 388.11 4 258.57 391.79 4 259.73 391 4 254.08 399.91 4 253.78 392.9 4 257.59 388.64 4 261.87 397.35 4 252.47 400.82 4 264.4 398.63 4 250 403.1 4 259.46 396.45 4 261.72 408.46 4 252.46 393.86 4 254.19 409.98 4 258.04 397.77 4 247.66 397.47 4 266.59 403.65 4 262.47 405.94 4 247.05 409.1 4 245.69 420.5 4 234.86 407.95 4 240.49 409.16 4 238.89 408.82 4 242.22 414 4 247.82 414.76 4 238.5 407.32 4 243.72 422.43 4 246.33 408.02 4 231.04 
421.65 4 237.27 421.82 4 226.69 419.44 4 224.6 421.41 4 239.16 426.92 4 228.72 420.58 4 227.71 426.17 4 241.93 429.33 4 239.63 419.52 4 234.09 415.81 4 227.93 434.94 4 233.62 436.41 4 217.54 425.56 4 232.23 439.02 4 234.67 427.1 4 218.98 420.79 4 229.37 429.67 4 216.5 427.53 4 233.05 439.01 4 217.24 436.4 4 213.08 430.56 4 203.91 430.68 4 215.42 444.26 4 202.52 431.18 4 220.37 447.48 4 201.17 431.52 4 213.4 434.65 4 213.54 439.4 4 210.96 434.04 4 220.72 442.19 4 210.34 444.44 4 194.43 450.68 4 199.3 447.32 4 194.28 456.99 4 195.82 444.78 4 204.58 447.64 4 200.65 446.53 4 204.44 454.08 4 196.37 445.67 4 213.8 454.18 4 352.62 446.2 3 347.57 441.9 3 351.7 443.21 3 349.3 435.54 3 350.36 444.3 3 348.3 435.53 3 349.05 449.92 3 364.43 437.18 3 353.32 446.58 3 348.12 437.77 3 354.46 440.77 3 347.62 438.56 3 347.37 433.29 3 341.16 442.67 3 354.43 446.37 3 341.44 435.51 3 346.13 433.37 3 354.07 437.71 3 350.66 441.36 3 339.25 435.67 3 354.16 434.2 3 330.65 452.58 3 336.68 441.11 3 334.92 443.7 3 331.34 453.99 3 331.77 453.2 3 335.38 434.56 3 332.28 437.36 3 344.19 447.13 3 344.17 435.99 3 341.41 444.4 3 353.29 445.79 3 342.73 439.77 3 351.52 448.73 3 340.21 438.36 3 334.65 443.17 3 345.58 443.9 3 343.81 439.37 3 342.66 446.82 3 334.2 449.86 3 341.76 435.64 3 358.82 442.96 3 366.64 442.46 3 367.76 434.95 3 368.64 435.45 3 364.24 444.29 3 368.97 449.81 3 356.26 445 3 362.38 452.6 3 359.9 444.43 3 368.63 447.55 3 365.35 482.83 3 367.35 484.42 3 365.03 479.79 3 380.03 482.61 3 381.3 487.57 3 370.08 475.27 3 366.23 487.66 3 375.72 487.36 3 370.03 479.5 3 382.27 485.04 3 365.76 471.48 3 368.39 469.64 3 383.48 482.17 3 384.07 476.72 3 384.59 473.77 3 381.4 468.12 3 384.2 478.41 3 376.22 467.17 3 368.14 479.56 3 383.01 466.91 3 379.56 515.72 3 382.6 511.26 3 381.47 511.99 3 363.12 504.73 3 366.87 516.62 3 365.36 507.87 3 372.71 513.27 3 364.66 501.16 3 365.96 505.78 3 382.53 497.86 3 282.64 365.27 -1 180.87 329.72 -1 348.24 507.23 -1 148.71 327.57 -1 354.93 403.76 -1 222.26 360.51 
-1 198.31 434.52 -1 320.1 432.7 -1 189 515.62 -1 258.98 449.59 -1 179.34 359.65 -1 132.88 525.39 -1 340.22 470.39 -1 148.85 434.21 -1 263.38 506 -1 285.63 385.88 -1 117.82 519.24 -1 161.7 440.67 -1 120.28 342.03 -1 320.12 429.9 -1 306.87 393.55 -1 213.5 443.29 -1 297.92 536.26 -1 142.18 403.83 -1 338.84 327.16 -1 231.48 487.44 -1 145.03 394.24 -1 143.95 470.76 -1 146.31 433.12 -1 284.02 514 -1 171.74 489.49 -1 231.47 410.66 -1 335.18 436.78 -1 206.13 452.43 -1 191.19 435.46 -1 337.81 486.17 -1 173.22 484.99 -1 293.66 364.85 -1 272.55 423.46 -1 269.24 437.79 -1 202.3 354.73 -1 156.87 496.98 -1 152.72 358.4 -1 344.22 432.53 -1 245.29 379.47 -1 180 460.97 -1 327.66 423.96 -1 301.15 378.63 -1 161.06 441.34 -1 155.67 341.81 -1 297.47 447.43 -1 294.78 512.77 -1 290.77 484.74 -1 167.93 442.3 -1 338.69 472.18 -1 282.69 514.68 -1 368.29 445.28 -1 295.5 525.92 -1 184.7 446.36 -1 130.27 514.15 -1 127.7 422.99 -1 310.78 357.32 -1 183.15 374.53 -1 156.88 458.72 -1 129.54 418.71 -1 335.13 457.69 -1 343.7 374.41 -1 179.41 531.05 -1 318.73 479.32 -1 193.51 473.08 -1 208.64 409.69 -1 206.21 466.76 -1 135.63 422.2 -1 138.46 543.74 -1 250.73 342.26 -1 218.81 426.49 -1 229.69 477.14 -1 284.29 532.91 -1 296.07 463.16 -1 288.54 459.93 -1 209.16 465.87 -1 317.9 355.28 -1 304.66 376.21 -1 275.39 444.87 -1 354.99 375.45 -1 375.01 434.52 -1 348.9 501.32 -1 203.83 470.36 -1 296.09 329.48 -1 361.14 367.97 -1 301.95 411.83 -1 386.27 527.61 -1 251.95 457.88 -1 191.07 370.28 -1 304.1 399.64 -1 385.17 381.44 -1 174.27 407.84 -1 229.46 518.63 -1 161.27 445.97 -1 184.84 394.25 -1 223.55 466.99 -1 209.15 458.42 -1 367.3 341.21 -1 239.41 403.66 -1 259.26 532.49 -1 252.43 528.64 -1 335.53 379.85 -1 334.05 380.68 -1 149.39 496.62 -1 283.93 411.55 -1 325.42 520.47 -1 194.28 422.86 -1 190.06 432.93 -1 243.01 405.05 -1 217.51 407.01 -1 246.54 424.18 -1 235.51 482.56 -1 186.32 485.29 -1 155.66 396.77 -1 189.27 425.58 -1 211.39 406.11 -1 185.6 335.59 -1 122.48 542.1 -1 238.88 455.25 -1 360.86 422.39 -1 
294.33 539.7 -1 156.27 476.01 -1 187.2 430.9 -1 330.4 401.54 -1 313.23 418.2 -1 338.97 367.56 -1 369.39 443.68 -1 169.11 536.44 -1 259.36 363.25 -1 193.15 494.34 -1 227.49 509.44 -1 182.19 525.38 -1 213.83 462.97 -1 161.89 454.64 -1 118.4 494.88 -1 227.03 382.59 -1 192.88 323.07 -1 165.37 381.55 -1 275.14 441.21 -1 311.66 371.82 -1 311.59 526.32 -1 254.42 346.56 -1 205.33 381.08 -1 239.98 429.06 -1 197.52 469.78 -1 301.31 509.12 -1 366.28 463.88 -1 196.14 502.57 -1 245.66 449.88 -1 222.96 431.74 -1 203.44 501.38 -1 225.29 420.35 -1 353.61 332.2 -1 373.92 543.17 -1 118.9 489.63 -1 279.38 451.28 -1 328.88 396.33 -1 289.34 454.59 -1 149.17 523.88 -1 308.7 510.69 -1 144 516.41 -1 274.9 344.51 -1 345.54 498.18 -1 179.74 412.47 -1 ================================================ FILE: data_src/data_DBCV/dataset_3.txt ================================================ -6.1698 2.2449 1 -2.6453 6.9494 1 -4.9691 4.9966 1 -3.0064 6.868 1 -4.3216 -3.2774 1 -0.45173 -4.763 1 2.2952 1.3859 1 -2.5657 7.1216 1 -2.3846 7.1456 1 -3.6913 6.332 1 -5.153 -2.4078 1 -1.3853 7.5652 1 1.8265 1.6745 1 3.1154 0.26274 1 -5.778 -1.4073 1 -2.4125 7.1112 1 2.5649 -3.0943 1 0.67043 -4.389 1 -0.79545 7.7774 1 -0.44995 -4.6329 1 -6.1132 1.9711 1 -6.1505 2.1439 1 -5.5912 -1.9206 1 -6.2993 1.4634 1 -1.3839 7.7193 1 -5.9906 2.6321 1 3.4322 -0.6101 1 -5.3034 -2.3233 1 -0.65154 -4.7226 1 -6.1521 0.39326 1 -3.8776 -3.7479 1 -6.3132 1.3069 1 0.93691 -4.4268 1 -0.35854 7.9465 1 -0.78787 -4.6477 1 -2.1887 7.3713 1 -0.74915 7.7361 1 -6.2274 0.0082439 1 -1.7182 7.5466 1 2.8764 -2.5084 1 2.0636 -3.6648 1 -4.6502 5.4532 1 2.7572 1.1565 1 2.5495 1.2955 1 3.2146 -0.17538 1 -6.2863 0.82606 1 -5.3056 4.401 1 -6.061 -0.26088 1 -5.2331 -2.2809 1 -2.6057 -4.3723 1 -5.6997 -1.5265 1 -0.87514 -4.7666 1 -6.1727 0.38501 1 2.5422 -3.008 1 -4.1156 5.9038 1 -2.0388 7.4532 1 3.4408 -0.76035 1 -5.2475 -2.2421 1 -5.5362 4.12 1 -1.4176 7.7016 1 2.4145 -3.3185 1 2.9195 -2.5891 1 -4.3937 -3.3767 1 1.7697 1.7792 1 -2.1413 -4.4583 1 
-2.3857 -4.3793 1 -2.3096 7.314 1 -0.63063 -4.7641 1 -1.9559 -4.6505 1 -6.2077 0.76146 1 -5.1073 -2.3705 1 -5.8922 -1.2146 1 -3.1161 -4.1828 1 2.18 1.5453 1 -2.1073 7.4124 1 3.2978 -1.6224 1 1.2467 -4.2636 1 -3.2396 -4.0549 1 -5.7833 3.4376 1 -5.1881 -2.2561 1 0.60866 -4.403 1 2.178 -3.671 1 -1.998 -4.5224 1 -0.27819 -4.676 1 -6.1715 1.2513 1 -2.7158 7.0806 1 2.8426 -2.5933 1 -5.9742 -0.66186 1 -0.7476 7.7348 1 2.1055 1.7288 1 -5.8099 3.4073 1 -6.2355 0.50701 1 3.0143 -2.3951 1 -1.5247 7.5515 1 1.8699 -3.7282 1 -4.9572 4.9802 1 -4.3967 5.6445 1 3.3057 -1.5753 1 -4.8397 5.0022 1 -5.0865 -2.5794 1 3.3492 -1.7359 1 2.8991 -2.6453 1 -3.9605 -3.5645 1 1.5611 1.8471 1 3.3819 0.011511 1 0.16344 7.931 1 -1.5348 -4.6219 1 -3.9732 -3.56 1 -1.3544 -4.715 1 3.0456 -2.5702 1 -4.9667 -2.6651 1 -0.10006 7.8403 1 -6.1917 0.36259 1 -3.9011 6.0356 1 -0.90232 -4.6295 1 3.021 -2.4461 1 -6.076 -0.34853 1 3.214 0.30889 1 2.6117 1.3262 1 -0.83203 -4.6298 1 -6.2157 1.6625 1 -5.8382 3.1314 1 -6.1839 1.805 1 -2.8905 7.0309 1 0.14707 -4.5635 1 1.4356 -4.1191 1 -5.19 -2.3845 1 -5.1262 4.6174 1 2.3806 -3.4144 1 -5.8711 3.4416 1 -6.1608 -0.093702 1 0.21929 -4.5793 1 -1.1652 -4.7321 1 -3.9254 6.228 1 -3.7933 -3.8204 1 -1.6515 -4.5556 1 -5.9038 -0.9972 1 -6.2008 0.83282 1 1.6254 -3.9999 1 -5.6922 3.7521 1 -1.4298 7.5338 1 -2.7265 6.8836 1 -3.3159 6.6052 1 1.7764 -3.8709 1 1.0788 -4.3696 1 3.365 -0.99269 1 -3.9803 -3.6076 1 -4.9048 5.0547 1 -3.8898 6.2905 1 -1.4511 7.5179 1 -5.4893 -1.7314 1 3.3652 -1.1164 1 -0.98328 -4.6799 1 2.2731 1.5816 1 3.0047 0.36568 1 -3.2663 -4.1672 1 1.4962 1.8261 1 3.2998 -1.8415 1 -6.094 -0.34383 1 -5.5513 4.1196 1 -6.1505 1.898 1 1.2574 -4.2461 1 -5.8598 -1.3176 1 -1.2144 7.7601 1 -3.7141 6.3007 1 -5.3108 -2.3238 1 -5.8959 3.1082 1 -4.6705 -3.115 1 -5.3752 -2.1539 1 -2.2387 7.2425 1 -5.9466 3.3294 1 -5.851 -1.2134 1 2.6023 1.2032 1 -2.72 -4.3814 1 -3.4958 -3.9466 1 -0.93825 -4.7496 1 -4.1957 -3.5278 1 1.2663 -4.2248 1 3.2018 -2.1513 1 -3.4006 -4.1237 1 -2.171 7.313 1 
-4.5383 5.4181 1 -4.0748 -3.5052 1 0.085267 -4.7325 1 2.4304 1.2277 1 -2.9598 6.8878 1 -6.1139 2.1374 1 -1.4481 7.5303 1 -4.8102 5.1402 1 -5.8818 3.5197 1 3.3688 -1.2538 1 -5.1811 4.6182 1 3.2674 -1.4453 1 -1.4325 -4.6638 1 3.3017 -0.12231 1 -2.1011 -4.5904 1 -4.0724 6.0534 1 -4.7317 5.2959 1 -5.4326 4.3945 1 -6.1893 2.2533 1 -2.2982 7.2706 1 3.0265 -2.3502 1 -3.7599 -3.904 1 -5.6157 3.99 1 -3.724 -3.84 1 -5.6048 3.8756 1 2.8111 -2.852 1 -0.80854 7.7524 1 -0.89907 7.8053 1 -6.0955 2.1419 1 -2.3342 -4.4729 1 -1.0081 -4.7688 1 -6.0881 2.6516 1 -1.6014 7.5111 1 -2.5625 -4.4832 1 -4.4079 -3.2629 1 -6.151 -0.31588 1 -5.9188 3.2525 1 -1.1186 -4.7033 1 -2.5774 -4.4689 1 -0.45431 -4.7551 1 -4.6819 5.3644 1 -0.77646 -4.7392 1 -4.583 5.6029 1 -4.9591 5.0709 1 3.3449 -1.0402 1 2.8335 -2.7907 1 -0.5272 -4.6913 1 -3.6065 6.3909 1 0.72579 -4.3608 1 2.8243 1.0215 1 -5.1695 4.6522 1 0.98626 -4.3009 1 -2.9066 6.8253 1 0.039289 7.8544 1 -4.4984 -3.1992 1 0.20231 -4.548 1 3.3438 -0.065579 1 -4.5904 -3.0627 1 -2.0359 -4.4939 1 -1.9076 -4.5294 1 -6.2583 0.89337 1 -6.1533 2.2783 1 2.5585 1.1847 1 -5.8076 3.3952 1 -4.1019 -3.4957 1 -6.303 1.606 1 -1.022 -4.8092 1 -3.0028 -4.3178 1 -6.1773 2.0103 1 -5.697 -1.4977 1 -6.1036 0.13014 1 -5.3868 4.384 1 -2.8206 -4.2493 1 -5.8641 3.2251 1 -0.57206 7.8735 1 -1.687 -4.6287 1 2.7958 0.98318 1 -4.4925 -3.149 1 -2.4689 -4.4222 1 -5.4706 -2.0264 1 -3.4792 6.6521 1 2.0998 -3.7863 1 -0.08993 -4.575 1 1.5086 1.9064 1 -6.0624 2.3474 1 -5.9216 3.1475 1 -2.0402 -4.5406 1 -1.5541 7.4392 1 -4.3069 -3.5199 1 -0.19346 -4.6784 1 -6.2463 0.82607 1 1.9587 -3.6271 1 -3.8419 6.3257 1 -5.8543 3.1637 1 -3.6623 -3.8544 1 -4.9279 -2.7225 1 2.452 1.2546 1 -4.961 -2.7089 1 -3.3635 6.6231 1 -1.7314 -4.6645 1 -2.1333 7.4257 1 -3.6074 -3.8437 1 -4.2607 -3.33 1 -5.9917 2.8099 1 -0.95921 -4.7305 1 -1.4213 7.625 1 -2.5912 -4.4147 1 1.2211 1.9544 1 3.1921 0.26394 1 2.9753 -2.3685 1 -6.181 2.5176 1 1.9384 -3.7714 1 -0.74784 7.7964 1 3.3864 -0.72418 1 0.34031 -4.4723 1 3.3078 
-1.8496 1 -0.49715 7.8825 1 -0.83413 -4.7821 1 -2.8652 6.9821 1 -4.6518 5.3171 1 3.3604 -0.73416 1 -5.9145 -1.1117 1 -6.2117 1.0877 1 -5.8441 -1.089 1 -1.7505 7.5096 1 2.3767 1.3365 1 3.2739 -1.9942 1 2.8171 -2.8902 1 -6.1911 2.5085 1 -6.2916 1.5628 1 3.2681 -1.2102 1 -6.0338 -0.08985 1 -1.6683 -4.601 1 0.096065 -4.5557 1 -2.7946 6.8945 1 1.1918 -4.2068 1 3.4074 -0.2537 1 -5.9936 2.7372 1 -4.9291 5.1042 1 -2.4852 -4.4388 1 -1.7567 -4.6021 1 -4.6514 -3.0039 1 -1.0551 7.6402 1 -1.6229 7.5916 1 3.1767 -2.1681 1 -2.4498 -4.5642 1 -4.7123 -2.9139 1 -5.69 3.6948 1 3.2183 -0.028644 1 2.8023 1.0581 1 3.4585 -0.50888 1 2.0868 -3.7497 1 -4.4788 5.4788 1 3.3178 -0.50115 1 -3.317 -4.1684 1 -1.0013 -4.7458 1 -0.95184 -4.6885 1 -0.25688 -4.6315 1 -1.5891 7.565 1 -5.7677 3.7102 1 -1.0834 -4.6134 1 3.1957 -2.1511 1 -0.018241 -4.6831 1 -5.9338 2.837 1 -5.9907 -0.5755 1 -4.7271 5.3952 1 -2.897 -4.2621 1 0.47157 -4.5755 1 -6.2341 2.1326 1 -4.4388 -3.1694 1 -2.6483 7.1807 1 -5.726 -1.4455 1 -6.1568 -0.40491 1 2.2571 -3.3413 1 -4.0679 6.0436 1 -2.1194 7.3201 1 -3.3799 6.5573 1 -4.197 5.8339 1 3.0276 -2.2194 1 -3.9462 -3.6179 1 -5.2039 4.7859 1 -4.6198 5.4341 1 2.8861 0.76578 1 -3.0231 6.9319 1 2.8218 -2.8335 1 -2.8724 -4.2436 1 -3.9295 6.1823 1 2.3698 -3.4424 1 -4.1927 6.0031 1 -6.2545 1.9851 1 2.8982 -2.606 1 -6.2413 0.23513 1 -6.27 0.9425 1 -0.53161 7.9212 1 -4.7209 -3.0555 1 -6.0107 2.463 1 0.93173 -4.4139 1 -6.1823 1.3601 1 2.2714 -3.5054 1 -1.0324 7.741 1 -0.58375 7.8081 1 1.6579 -3.8829 1 2.9001 -2.4986 1 -5.674 -1.5775 1 -4.9496 -2.659 1 2.9873 0.69848 1 -4.3279 -3.3906 1 -1.3433 7.5808 1 -2.1915 -4.5722 1 -6.0963 0.13299 1 -2.353 7.2748 1 -5.7501 -1.5378 1 -3.885 -3.7594 1 -6.2901 0.44681 1 3.214 -2.0341 1 2.9397 0.65879 1 -5.5431 -1.8092 1 -6.1994 1.6203 1 -3.1293 6.7009 1 -6.2386 1.1145 1 -5.9896 -0.75084 1 -5.4608 4.4609 1 3.3977 -0.21849 1 2.1834 1.5013 1 2.152 -3.6281 1 -5.5111 -1.9358 1 3.2567 -1.7672 1 -6.1302 2.5927 1 -2.5059 7.1524 1 1.8789 -3.8147 1 3.3049 -1.0396 1 
-4.602 -2.994 1 0.95192 -4.3575 1 -5.6499 3.798 1 -1.282 -4.756 1 -5.5084 4.3503 1 -0.066337 -4.5797 1 -2.1589 7.3053 1 2.8958 -2.7035 1 -5.8562 3.418 1 3.4431 -0.58257 1 -0.10927 8.0172 1 -5.9605 -0.4692 1 3.163 0.33222 1 -5.0862 4.8213 1 2.5286 1.2403 1 -1.4342 7.6504 1 -2.154 7.3785 1 -2.8986 -4.3834 1 -1.1953 -4.7598 1 -1.8133 7.4006 1 3.36 -0.72114 1 -4.0791 6.0075 1 -4.2142 -3.3836 1 0.15949 -4.6017 1 3.3754 -0.57769 1 3.4287 -1.356 1 3.3247 -1.1705 1 -3.6661 -3.7816 1 2.6744 1.1078 1 0.84009 -4.392 1 -3.3009 6.7279 1 -0.31977 7.9051 1 -4.5216 -3.1992 1 -0.35988 7.8863 1 -6.2713 1.3644 1 -6.0192 3.1203 1 -2.8101 6.9533 1 3.2133 -2.1276 1 0.21598 -4.559 1 -3.3028 6.6126 1 -2.2697 -4.4433 1 -5.4596 4.3848 1 -3.4816 6.5494 1 -1.0561 -4.7111 1 -0.13969 -4.7505 1 -3.3508 -3.9635 1 -3.6234 6.4307 1 1.4755 1.8387 1 -5.8838 -1.1636 1 -5.0427 4.8941 1 -3.5563 -3.9513 1 -0.48457 -4.7643 1 -2.276 7.3625 1 -0.96562 7.6426 1 -4.0564 6.0772 1 -4.5298 -3.2606 1 -5.9273 -1.0439 1 -2.6546 7.0087 1 2.1218 -3.5138 1 -5.5363 4.0746 1 -3.1884 -4.1543 1 -4.8921 5.1039 1 -6.196 2.0557 1 -3.8481 6.3307 1 -4.3378 5.7294 1 -5.0208 -2.7846 1 -5.8965 -1.0851 1 3.3137 -0.024915 1 3.3837 -0.22876 1 -4.6846 -3.0497 1 3.0696 -2.3043 1 -5.9582 3.1293 1 3.381 -0.70216 1 -5.7995 3.5909 1 0.42013 -4.4989 1 -6.2479 1.7108 1 -6.2452 1.1284 1 -3.4988 -4.0339 1 -1.2256 7.593 1 0.80094 1.9646 1 -5.5097 4.2706 1 -6.011 -0.44955 1 1.0144 -4.2568 1 1.7542 -3.8924 1 -5.514 -1.7172 1 -5.7481 3.6795 1 3.3633 -0.26338 1 -0.41615 7.7809 1 -2.7292 2.678 2 6.427 -1.5571 2 5.6794 1.9406 2 0.92956 -7.5603 2 5.5607 -4.2163 2 -0.381 4.6635 2 6.3861 -2.1147 2 -2.2577 3.5306 2 -2.1152 -1.2694 2 3.1563 -6.6707 2 -2.2423 3.3811 2 4.6982 -5.4378 2 -3.1782 0.9924 2 -0.1891 4.7885 2 -1.9537 3.854 2 3.3027 -6.5269 2 3.8727 4.129 2 6.5243 -0.31321 2 6.1538 -2.9673 2 2.4911 4.6972 2 2.0123 -7.1532 2 5.5601 -4.3073 2 3.3382 4.4703 2 -2.189 3.5106 2 6.1891 1.02 2 -1.1432 4.3208 2 5.0395 2.8845 2 6.3222 -2.5084 2 0.80556 
-7.6111 2 -1.3839 -1.5559 2 3.9052 -6.2384 2 6.0627 -2.9543 2 4.5834 -5.5003 2 5.5657 2.2871 2 4.7444 -5.3809 2 6.3834 -2.2209 2 6.3097 0.46719 2 5.9742 1.3019 2 -3.0716 2.026 2 5.8631 1.5587 2 1.6789 4.9181 2 5.1373 2.9149 2 2.0008 -7.2334 2 1.3575 -7.4904 2 6.4563 -0.7725 2 2.707 4.5995 2 4.5101 -5.5125 2 5.963 1.4898 2 -0.33966 4.7476 2 5.2664 -4.7287 2 1.3812 -7.437 2 4.5617 -5.4559 2 5.587 -4.3219 2 3.0474 -6.7959 2 6.5192 -0.63595 2 6.1111 0.81528 2 -1.8965 3.899 2 5.7883 2.0388 2 2.9268 4.5453 2 6.2632 -2.7394 2 6.4489 -0.2652 2 6.4614 -1.1897 2 0.096154 4.8726 2 6.3421 -1.5528 2 5.8443 1.6615 2 6.4979 -0.40846 2 5.0686 -5.0257 2 6.3139 -1.9026 2 6.0893 -2.8633 2 -1.6164 -1.4629 2 2.5612 -7.0908 2 3.1338 -6.8139 2 1.8996 4.8209 2 4.9691 3.2237 2 2.2112 -7.2106 2 6.4475 -0.66118 2 1.7005 4.9799 2 1.5638 -7.4116 2 2.3391 -7.1162 2 6.209 0.51713 2 5.1163 2.784 2 1.4049 -7.4508 2 -2.8691 2.5268 2 6.4108 -0.012729 2 -1.7573 3.7635 2 -3.2408 1.3344 2 -3.1559 1.8975 2 1.9652 -7.2147 2 -3.159 0.98852 2 6.3528 -1.7104 2 4.8506 3.0983 2 -2.7115 -0.68904 2 0.68375 -7.616 2 6.4043 -2.163 2 6.4885 -0.88946 2 5.7026 -3.8504 2 1.0582 -7.5153 2 3.2273 -6.6599 2 3.1174 4.3946 2 3.0125 4.4683 2 6.3448 -0.13963 2 3.6491 -6.4201 2 -2.1277 3.6382 2 6.4675 -1.8384 2 4.1719 3.9544 2 4.6025 3.3786 2 4.0099 -5.9592 2 1.8536 -7.2324 2 5.7609 -3.705 2 6.3335 -1.442 2 6.4775 -0.7453 2 6.3857 -1.8588 2 -1.6809 3.8924 2 0.70171 -7.7085 2 5.2946 2.7231 2 4.9089 3.2015 2 5.9017 -3.3883 2 -2.2728 -1.1649 2 2.9752 4.6285 2 1.6232 4.9563 2 -1.0096 4.5489 2 2.0346 -7.241 2 6.3326 0.76215 2 2.7749 -6.8243 2 -1.8511 3.8155 2 4.4025 3.7079 2 4.8452 -5.2712 2 -2.8393 -0.62294 2 0.78812 -7.6084 2 2.3085 -7.0831 2 6.3191 -1.9666 2 6.3994 -0.45463 2 4.981 -4.9138 2 -2.2605 3.4586 2 0.72022 4.9324 2 -3.1875 0.31121 2 -3.2375 0.98241 2 5.4447 -4.5013 2 -2.9895 1.8525 2 5.9411 1.4263 2 -0.9885 4.304 2 2.952 4.6063 2 2.4579 4.8084 2 -2.1874 3.4713 2 -1.3382 4.2388 2 3.6935 4.3011 2 -3.251 1.4542 2 6.4796 
-0.50248 2 -1.229 4.3682 2 1.6418 -7.4958 2 3.6038 -6.3414 2 -1.9447 3.6858 2 6.2058 -2.6473 2 1.3316 -7.5185 2 -3.126 0.82747 2 6.3842 -0.47861 2 -2.4151 3.2334 2 0.20408 -7.6577 2 0.62293 -7.716 2 3.2961 -6.4948 2 5.1814 2.8042 2 5.5743 2.2513 2 4.884 3.1278 2 2.0676 -7.1324 2 -2.6794 -0.68037 2 -2.9712 1.946 2 -1.2844 4.2258 2 2.5864 -6.9903 2 -3.19 1.3984 2 -2.0651 3.7256 2 2.7585 4.6294 2 5.3795 2.6451 2 6.0866 -3.357 2 4.0195 3.9389 2 3.1471 -6.5994 2 2.9439 4.4323 2 5.6417 -4.1828 2 -1.1651 -1.687 2 5.4889 2.552 2 1.8462 -7.352 2 5.2399 -4.6065 2 4.7348 -5.3232 2 6.4429 -0.42937 2 0.33706 -7.6951 2 3.3616 -6.5868 2 -2.4246 -0.99087 2 1.3711 4.8442 2 -2.7516 -0.67637 2 6.4497 -1.6127 2 6.3287 -2.4924 2 6.2181 0.61233 2 -1.9148 -1.4401 2 0.32903 4.8322 2 -1.0486 4.3948 2 1.8637 4.9261 2 5.946 1.6416 2 5.9782 -3.4719 2 4.3689 -5.8339 2 4.8701 -5.1862 2 -1.9648 3.7916 2 6.1463 -2.6648 2 2.493 -7.0228 2 5.8529 2.0451 2 0.14263 -7.7574 2 6.4427 -1.9957 2 4.5157 -5.6445 2 -2.8233 2.3889 2 1.4919 -7.3597 2 3.8798 4.0237 2 6.3055 -2.2512 2 -0.96533 4.3773 2 -2.9724 2.2722 2 0.0019711 4.8861 2 -2.8551 2.768 2 -2.533 3.1856 2 5.7456 2.1583 2 5.3981 2.519 2 5.5995 -4.2579 2 6.4312 -0.52667 2 2.4774 4.6862 2 4.7547 3.3431 2 -2.5939 -0.73218 2 1.2141 4.8501 2 1.6696 4.8471 2 -3.1081 0.27703 2 5.6124 2.4343 2 1.4382 4.9434 2 5.6154 2.4123 2 -2.1319 3.6001 2 4.9249 -5.107 2 0.050994 -7.7024 2 6.3555 -1.6063 2 -3.0791 2.1865 2 2.3547 4.7075 2 3.9816 -6.1968 2 1.0873 -7.5274 2 -2.6644 2.9582 2 2.0233 4.8139 2 -3.0489 2.1468 2 6.1675 1.2331 2 4.0886 3.9927 2 6.3816 -2.3946 2 6.3489 -2.3115 2 6.5316 -1.2236 2 1.9039 4.756 2 4.0596 3.8607 2 0.87403 4.8702 2 -3.1811 1.6199 2 -3.0779 0.47123 2 6.2672 -2.6707 2 6.0261 -3.0643 2 2.4272 -7.1491 2 -0.72962 4.6564 2 3.0473 4.4605 2 -1.992 3.6569 2 2.8542 -6.7082 2 1.9688 4.7551 2 4.8398 3.2717 2 6.5377 -0.83165 2 -2.7013 2.9882 2 3.4785 -6.5684 2 6.0918 -3.3086 2 4.1054 3.9046 2 4.2454 -5.8355 2 4.2297 -5.8555 2 6.1844 0.78824 2 
-3.1072 0.79734 2 1.9617 -7.3447 2 5.6462 -3.983 2 -2.0419 3.8145 2 -1.4388 -1.5715 2 -3.0998 1.9846 2 5.8739 -3.4762 2 2.162 -7.1465 2 -2.0515 3.7541 2 4.9595 3.1266 2 1.3112 4.9388 2 -3.0044 2.2341 2 -2.6397 2.9147 2 5.8411 1.9259 2 -0.95939 -1.7587 2 5.5488 2.1736 2 5.0181 3.107 2 3.9234 -6.0841 2 6.2705 0.39543 2 6.419 -1.8796 2 3.7692 -6.1608 2 -0.061244 4.8233 2 0.85094 4.8976 2 4.1503 -5.9354 2 -2.0968 3.7061 2 0.83894 -7.5353 2 0.43546 4.8267 2 -3.141 1.7122 2 5.2341 2.7274 2 1.1118 4.8246 2 4.5534 3.5619 2 -1.5183 -1.5607 2 5.792 1.7899 2 2.1424 -7.1191 2 -0.54672 -1.6555 2 3.5336 -6.2971 2 5.1978 -4.7685 2 6.4257 -1.3189 2 -0.9037 -1.7984 2 -2.3201 -1.0552 2 0.10409 4.8321 2 3.3775 -6.5115 2 4.9017 3.3316 2 5.0012 -4.9709 2 3.4367 4.3207 2 5.8751 -3.5501 2 -0.34362 4.7043 2 1.0941 4.9251 2 5.1286 -4.8403 2 -3.0528 1.647 2 4.0627 3.8055 2 0.78173 4.8959 2 6.2907 -2.5884 2 5.7415 1.8189 2 -2.5618 3.1089 2 -0.6622 4.6721 2 4.671 3.4053 2 4.7148 -5.3409 2 5.6775 1.9527 2 3.0686 -6.8194 2 0.01098 4.9114 2 -0.75893 4.4866 2 6.112 1.0048 2 -3.1864 0.45116 2 4.5362 3.5128 2 4.3289 -5.8774 2 1.7771 -7.4087 2 1.2887 -7.4498 2 6.2827 0.83476 2 -0.65794 -1.674 2 3.0035 4.4978 2 -2.0238 3.7158 2 5.9609 1.7045 2 6.3526 -2.3443 2 -0.9987 -1.6175 2 6.41 -1.6784 2 1.0806 4.8737 2 -2.6533 -0.69065 2 0.1997 4.8505 2 5.9783 1.3177 2 5.5211 2.5726 2 4.3105 -5.7338 2 4.1865 3.9145 2 1.5045 4.9257 2 6.2142 -2.7649 2 -3.1703 0.47809 2 3.5251 -6.3682 2 2.3893 4.6788 2 -1.5404 -1.5897 2 3.7329 4.0781 2 5.6696 -3.8517 2 6.3815 -0.799 2 -2.3442 3.2488 2 3.5217 -6.3678 2 5.0262 -5.1201 2 6.4588 0.10579 2 3.522 -6.4586 2 4.1355 -5.8115 2 4.9176 -5.1472 2 -0.57683 4.5974 2 6.2585 -2.0077 2 -3.077 -0.045501 2 2.4779 -7.094 2 1.2709 4.9418 2 6.0997 1.2546 2 6.5594 -0.69323 2 -2.0176 3.6406 2 4.1245 -5.9357 2 3.6482 4.162 2 3.3778 4.3628 2 5.5678 2.1632 2 -2.6549 -0.58156 2 5.2243 2.8431 2 0.11568 4.8433 2 0.67305 4.9357 2 4.8975 2.9977 2 5.5937 -4.0394 2 4.4049 3.7283 2 -0.80157 4.4785 2 
3.3968 4.3121 2 0.60669 -7.7112 2 3.5716 -6.4109 2 2.0704 4.8326 2 2.3665 -7.0215 2 0.64364 -7.7117 2 6.3317 0.14363 2 -1.8085 -1.5006 2 6.3736 -1.8954 2 4.0163 -5.9447 2 1.2159 -7.4892 2 -1.924 -1.4097 2 6.2921 -2.1773 2 5.3112 2.595 2 6.3487 -1.3399 2 -3.2277 0.95387 2 -0.53785 4.5734 2 2.1719 4.7195 2 5.0426 -4.8478 2 2.5985 4.6486 2 5.5778 2.289 2 2.9465 -6.7473 2 6.3913 -1.6915 2 -3.1084 0.22066 2 6.5278 -0.39557 2 5.2629 2.8711 2 0.28642 -7.7814 2 -2.8192 2.7832 2 5.9679 -3.221 2 4.0559 -6.0749 2 3.6671 4.1409 2 6.407 -1.1673 2 -1.2133 -1.6807 2 1.5211 4.9359 2 5.3294 -4.7386 2 6.168 -2.8356 2 0.23884 -7.6468 2 5.8236 1.9017 2 0.22286 -7.6584 2 -1.7286 4.0501 2 4.326 -5.7087 2 -0.67922 4.4918 2 5.4864 -4.3409 2 -0.92909 -1.6938 2 0.036188 4.8116 2 4.749 3.4099 2 4.3275 -5.7296 2 3.7241 -6.1536 2 -2.8217 2.7008 2 1.7299 4.8835 2 0.79821 -7.7292 2 5.9641 -3.2839 2 4.6373 -5.4592 2 6.5217 -0.28962 2 4.6218 -5.4654 2 6.4736 -1.6102 2 2.0935 -7.3214 2 6.2219 -2.6794 2 3.1983 -6.6586 2 1.9863 -7.3345 2 -1.6869 4.071 2 2.5985 -7.0221 2 6.1868 -2.8979 2 5.9955 -3.1368 2 6.2734 -2.2232 2 6.4181 -0.037133 2 2.9992 -6.7782 2 6.233 -2.8156 2 6.3801 -1.203 2 3.7793 -6.1845 2 1.4727 -7.484 2 -2.3906 3.3158 2 -2.5766 -0.76457 2 5.9565 1.6696 2 6.4291 -1.5225 2 0.27945 -7.8163 2 2.166 4.7129 2 5.5025 2.3945 2 3.852 -6.1221 2 4.3784 3.6777 2 5.6634 2.0806 2 4.3991 3.72 2 2.7868 -6.89 2 2.6342 -6.9544 2 -0.8654 4.368 2 6.0907 1.249 2 5.8933 -3.5383 2 4.1241 -5.9865 2 -2.5399 3.1338 2 6.0161 1.4765 2 6.2036 1.1623 2 4.9095 3.1261 2 6.4614 -1.1868 2 6.1981 0.7751 2 -2.7975 -0.46842 2 6.4708 -0.26954 2 -3.2166 0.34857 2 6.4273 0.32967 2 6.4009 0.43211 2 2.8121 -6.8849 2 -2.7374 -0.64882 2 6.1106 1.0906 2 -3.068 1.8386 2 1.1573 -7.6612 2 3.5393 -6.3337 2 6.2485 -2.7311 2 2.5691 4.5682 2 -3.0965 0.3 2 5.7621 1.9902 2 -0.56815 4.5058 2 1.5434 -7.4152 2 6.2456 -2.5883 2 3.8803 4.0058 2 -0.55827 4.5039 2 4.4997 3.666 2 0.99704 1.1724 -1 4.1822 -2.3831 -1 -2.1785 -1.2334 -1 0.69431 
7.4779 -1 5.9458 4.7189 -1 5.9262 7.9366 -1 -5.7178 -7.4405 -1 3.0562 -7.0185 -1 -2.9601 -0.10538 -1 5.8036 4.6519 -1 1.7255 -4.3812 -1 -1.029 -3.1007 -1 -2.2978 7.6748 -1 -0.39582 -3.1281 -1 3.111 -7.2715 -1 4.5375 -2.682 -1 -4.3156 -2.0144 -1 1.7443 1.4558 -1 1.4728 -5.119 -1 5.2588 -6.8491 -1 3.7378 -5.8209 -1 -6.2272 3.1231 -1 4.094 -1.1167 -1 -5.1932 3.6454 -1 6.0181 2.3473 -1 -6.1718 2.0509 -1 -3.3875 -5.644 -1 0.6686 -2.2634 -1 2.0191 6.5422 -1 -3.8391 -6.6635 -1 1.7566 -3.3627 -1 0.60887 7.4868 -1 5.9015 0.56615 -1 -3.4838 1.9722 -1 -1.5884 -1.6991 -1 -1.2795 -5.9099 -1 -4.1946 1.8443 -1 -0.29807 -1.0464 -1 -0.81015 4.1765 -1 0.50489 -7.3674 -1 0.38651 7.9087 -1 -1.8725 -3.107 -1 -1.6886 -5.5213 -1 2.2159 7.4489 -1 6.1353 0.12555 -1 -0.99317 7.6643 -1 4.9855 5.5923 -1 -3.0679 7.9675 -1 6.3441 2.5904 -1 2.0811 -0.69084 -1 -3.535 -4.1746 -1 -3.5312 -4.2606 -1 -4.79 7.0115 -1 -4.1976 1.4007 -1 -1.5816 2.8529 -1 -6.2717 3.3302 -1 4.213 2.6279 -1 -0.41696 7.7572 -1 -3.8167 -3.4963 -1 3.9224 -6.3023 -1 -1.9521 1.791 -1 -5.0652 -4.0883 -1 5.2947 6.2156 -1 3.7967 -5.6004 -1 6.4926 6.4649 -1 5.7136 -7.4243 -1 0.81967 0.3283 -1 -2.9895 -0.00085951 -1 0.15954 5.9809 -1 -0.33288 -2.5584 -1 -4.2483 6.1688 -1 5.2115 -5.1639 -1 -3.2276 -4.5736 -1 2.5012 -5.6455 -1 0.89988 -0.24351 -1 4.9861 6.2819 -1 -1.9625 -5.6476 -1 -5.2149 7.8121 -1 3.1152 4.2004 -1 -6.2495 7.3756 -1 -1.9432 -7.2925 -1 -0.65084 -0.018711 -1 -4.0715 -4.323 -1 -0.53212 -3.5106 -1 3.933 5.2171 -1 -5.1819 -6.2939 -1 -0.98123 -7.1683 -1 1.1532 -4.7665 -1 5.3666 2.8373 -1 -6.181 1.9162 -1 3.5842 0.20032 -1 1.8329 3.8181 -1 3.4293 2.4644 -1 2.9413 4.0821 -1 2.4294 3.5826 -1 0.060931 -6.7637 -1 3.3046 7.0353 -1 3.6781 -4.2435 -1 -3.5133 3.5795 -1 1.5065 5.9587 -1 3.6982 -4.6055 -1 -0.052565 -2.7876 -1 5.8367 -3.1035 -1 1.0768 -7.1842 -1 -5.8994 5.6035 -1 2.8926 -6.6268 -1 2.584 -3.2516 -1 5.0489 -7.1636 -1 -3.8496 6.3882 -1 5.3396 -4.6697 -1 -5.442 6.8484 -1 -5.3506 6.4377 -1 -0.94144 7.9699 -1 2.672 5.7087 
-1 -5.2121 3.4423 -1 -4.8627 -1.2793 -1 4.579 -7.5973 -1 -1.371 1.0155 -1 -2.8194 5.4605 -1 -0.071015 -6.3005 -1 3.4621 -6.4566 -1 -1.5589 4.268 -1 0.83669 1.8137 -1 -4.9696 -7.4708 -1 6.3262 -1.9484 -1 3.3632 -7.1643 -1 -5.4842 -4.819 -1 6.3909 -7.7097 -1 -0.85769 5.5245 -1 -5.9393 7.5945 -1 -5.0688 -1.5275 -1 -2.0687 2.8432 -1 2.3228 3.5423 -1 5.2122 6.4419 -1 5.0437 -2.8339 -1 4.9325 3.8192 -1 1.5656 -4.3287 -1 -0.18476 -1.6329 -1 -5.8413 -5.0362 -1 -4.5102 5.595 -1 4.5549 -0.8138 -1 -6.0414 5.0219 -1 -3.627 -2.0888 -1 2.115 7.7527 -1 -4.2277 -2.239 -1 4.1579 -6.7695 -1 3.7898 0.30789 -1 -1.4327 5.3565 -1 -5.321 3.4747 -1 2.9305 -4.3483 -1 -0.65996 7.4931 -1 5.7946 5.2348 -1 0.71088 -3.4582 -1 2.514 0.80654 -1 -3.8192 1.7883 -1 2.4929 -4.2774 -1 6.4036 1.7112 -1 -5.3291 5.499 -1 0.30721 -0.5677 -1 -5.4762 0.93261 -1 2.4093 2.977 -1 -1.8227 -3.0121 -1 1.8712 6.9915 -1 -1.6002 5.9287 -1 -5.0192 1.8056 -1 -3.5259 -6.7022 -1 6.2082 2.0681 -1 1.8873 -5.8954 -1 -2.76 -0.7612 -1 -2.9446 3.4073 -1 0.20439 -5.1879 -1 1.0054 5.3538 -1 3.0021 4.8663 -1 -3.1108 2.0954 -1 6.0641 6.7557 -1 -4.0038 6.3951 -1 2.4967 5.9365 -1 3.3355 -6.9662 -1 -4.1238 7.3905 -1 5.4149 -0.027801 -1 -3.1404 -3.3035 -1 0.87371 -0.91466 -1 -5.0495 4.1747 -1 -4.5955 -5.998 -1 2.2079 0.90491 -1 6.3577 7.0719 -1 -5.0772 1.2271 -1 3.1964 -0.33412 -1 -0.67638 -2.1462 -1 5.0756 -0.61201 -1 -1.7577 4.2782 -1 0.083412 6.5329 -1 -2.5483 0.5376 -1 6.1629 -2.9882 -1 3.771 0.29035 -1 -0.030158 6.6956 -1 -3.2344 4.4634 -1 1.2983 4.2072 -1 6.3666 4.9412 -1 1.3649 4.8555 -1 -1.2182 0.7369 -1 2.026 -2.5166 -1 5.9935 -7.1565 -1 -4.7681 -6.3109 -1 5.3053 -5.3231 -1 0.97327 6.2888 -1 -3.1591 0.051957 -1 -1.0545 -2.9558 -1 -5.7917 2.8668 -1 -5.666 3.1302 -1 3.1989 -7.4469 -1 2.3884 7.9023 -1 2.6722 -4.6645 -1 1.5957 -4.4956 -1 -0.76716 -5.4389 -1 -2.0792 6.6362 -1 4.0531 -4.8692 -1 4.9185 1.1542 -1 3.8435 0.35499 -1 5.601 6.4115 -1 4.4642 6.4834 -1 -0.59422 6.7979 -1 -1.3794 -6.5383 -1 -3.7656 4.9034 -1 0.24239 
-7.5053 -1 -5.6226 6.4793 -1 2.9967 5.7301 -1 -2.8845 -4.9609 -1 1.6857 -1.3776 -1 -0.061039 -6.8821 -1 -1.7615 -6.323 -1 0.73737 0.66436 -1 4.0347 3.5063 -1 2.9222 -4.4931 -1 -0.92616 0.67313 -1 2.5323 3.0703 -1 -0.41022 6.8608 -1 1.3276 6.1547 -1 -2.7322 2.3895 -1 4.9463 5.5359 -1 4.7218 -0.60724 -1 1.4137 -3.0039 -1 -2.1774 -4.2412 -1 -4.1458 3.2823 -1 3.609 3.0428 -1 -5.9599 -7.427 -1 -3.6884 2.5647 -1 -3.2222 6.0661 -1 -0.67926 6.8599 -1 -4.5158 4.3019 -1 -0.92901 -3.9958 -1 -3.7309 7.9855 -1 3.8052 -2.7865 -1 -0.58681 0.71993 -1 3.6634 7.7778 -1 2.8205 -4.1545 -1 5.1298 -2.5519 -1 5.6018 -6.8103 -1 -1.2913 4.8921 -1 1.3781 0.71746 -1 -2.4163 -4.3378 -1 5.5181 7.4863 -1 -3.2052 -4.9288 -1 4.8516 4.1756 -1 -4.5146 6.2547 -1 3.3334 -1.2291 -1 4.6495 5.4796 -1 2.8658 0.35381 -1 -4.8023 -3.2821 -1 -1.9408 5.8408 -1 2.1486 -4.3912 -1 -4.661 6.4978 -1 -2.1517 -6.3523 -1 2.1335 6.5482 -1 -0.41094 1.4984 -1 0.8161 -2.9263 -1 -3.3417 0.18804 -1 -2.0546 7.0922 -1 5.0153 6.8427 -1 3.7475 2.4096 -1 -0.95336 -6.4451 -1 -1.7757 7.1045 -1 5.3302 -2.9651 -1 3.2545 -2.3742 -1 4.8409 -1.274 -1 3.7683 -1.4661 -1 1.6694 6.1417 -1 -5.7319 -2.5869 -1 6.5222 0.5423 -1 -5.0951 -5.5717 -1 3.2518 6.8628 -1 -1.9046 3.7882 -1 -5.5292 5.8553 -1 -3.2266 7.7296 -1 1.3355 -5.6115 -1 -3.8097 6.1073 -1 3.7864 6.5807 -1 4.575 -3.234 -1 -2.7285 1.0933 -1 0.17763 -0.47547 -1 -5.3179 7.0361 -1 0.78796 -7.4162 -1 4.7834 5.5481 -1 -5.335 -7.4817 -1 3.2472 0.16747 -1 5.1979 -5.2918 -1 4.9531 -2.7009 -1 0.76115 -1.2127 -1 3.3223 5.3327 -1 -5.2735 5.2985 -1 -2.7887 -4.7701 -1 3.1233 -7.3721 -1 1.4966 -7.1718 -1 -3.4529 -2.7676 -1 6.5528 -1.859 -1 6.2053 4.9028 -1 3.4933 7.3344 -1 -3.0983 4.5351 -1 1.0965 1.7329 -1 1.4756 4.7751 -1 -1.9237 -0.018649 -1 -3.8355 0.32676 -1 5.5789 -3.3192 -1 -2.4511 -0.67166 -1 5.8085 6.0645 -1 -3.2578 -4.0852 -1 -2.9998 0.13506 -1 -3.0915 1.3737 -1 -4.1625 -3.009 -1 -5.5275 -5.7139 -1 4.6112 -1.8252 -1 -2.1982 -4.5277 -1 -4.5409 1.4609 -1 -5.2343 -3.6335 -1 6.3133 -5.0417 
-1 -5.3859 -0.23118 -1 6.3354 -4.479 -1 -4.7967 0.72734 -1 -2.966 -6.2462 -1 1.5387 -0.65715 -1 -2.6625 7.4906 -1 -2.4973 3.6248 -1 4.3438 0.17437 -1 1.9394 -5.4485 -1 -3.8528 -1.8972 -1 1.1483 -6.8406 -1 4.4759 2.4832 -1 -4.3747 0.12029 -1 3.7233 -5.2136 -1 2.3754 -1.4229 -1 -1.0381 3.9534 -1 -0.18974 -1.3719 -1 1.3579 5.2827 -1 5.1344 6.9134 -1 0.55913 0.64429 -1 -4.818 -1.0339 -1 -6.1389 -4.66 -1 -0.54392 1.5351 -1 4.9994 -0.706 -1 4.9898 -6.5334 -1 -5.6496 -2.104 -1 3.6242 -3.6928 -1 4.464 -1.6731 -1 3.3974 0.16034 -1 -0.94691 4.7007 -1 -3.7249 -1.5431 -1 -1.6193 7.6096 -1 -1.9751 -2.2865 -1 5.9782 -2.2816 -1 -5.499 4.7471 -1 4.4027 -6.5977 -1 -0.48006 -4.0227 -1 -4.3616 -3.1293 -1 2.8308 7.8459 -1 3.5673 -0.60188 -1 6.167 2.3977 -1 6.2279 7.326 -1 -6.2254 2.6421 -1 3.3556 3.0495 -1 -3.8513 -0.50673 -1 -2.1658 2.2508 -1 1.2456 -6.8903 -1 -3.3019 6.1641 -1 0.070979 -1.5406 -1 -3.2609 7.5648 -1 2.9468 2.4309 -1 -0.56474 -2.3226 -1 -0.27054 -3.1012 -1 -2.485 6.2854 -1 -0.98761 1.3893 -1 -6.1605 -5.3819 -1 4.3137 -7.62 -1 -2.3717 7.4464 -1 -2.677 0.32133 -1 0.8667 6.1086 -1 3.2881 7.6549 -1 5.949 -5.6174 -1 3.9371 -5.0215 -1 -4.8079 -5.5373 -1 0.46141 -3.8188 -1 5.8141 -5.019 -1 1.8518 -4.5696 -1 4.4702 -3.7805 -1 -1.1937 -0.39054 -1 -1.7887 3.3432 -1 -3.6752 -3.8158 -1 5.8888 -2.869 -1 4.7502 -2.2679 -1 -0.61324 2.1134 -1 -3.9465 -6.8551 -1 1.1924 4.1245 -1 -0.15488 0.57528 -1 -1.2941 3.8053 -1 0.28874 -0.43841 -1 1.5294 7.9929 -1 2.1688 6.9842 -1 -2.1996 0.24757 -1 0.11939 0.22495 -1 5.6391 4.5312 -1 -4.4355 -3.8229 -1 3.6499 6.6755 -1 1.3787 -6.4042 -1 -1.6911 1.0304 -1 -0.39597 -3.4163 -1 1.5339 2.5518 -1 -0.66757 -1.787 -1 -6.0168 -0.85973 -1 0.72782 -7.0758 -1 4.0202 0.61993 -1 0.84488 3.7026 -1 -1.183 0.8396 -1 2.7949 -5.8993 -1 2.3311 7.1993 -1 2.0495 4.09 -1 -5.3304 -3.0831 -1 -4.2442 -3.5328 -1 0.79558 -5.2189 -1 4.0256 5.9052 -1 -5.5057 -6.5451 -1 -2.0786 6.6513 -1 3.1442 1.138 -1 2.975 5.1619 -1 -3.3218 4.8784 -1 0.92004 4.6088 -1 4.145 7.6729 -1 
-4.8376 4.783 -1 5.1885 0.32069 -1 4.8452 -4.8085 -1 1.4223 -4.4354 -1 3.7774 -4.2538 -1 2.5299 7.9499 -1 3.7285 6.4302 -1 1.5697 5.617 -1 -4.161 0.40129 -1 3.0207 -7.5248 -1 -3.7894 -6.901 -1 1.4762 -7.7831 -1 4.7934 -0.33329 -1 -3.6731 -5.158 -1 -0.20239 4.394 -1 4.8309 -7.8099 -1 -3.2922 -2.5616 -1 5.7604 5.5706 -1 -1.9064 -4.3283 -1 -5.2855 -4.5455 -1 4.6664 7.3931 -1 3.8793 4.1104 -1 1.1225 -0.44183 -1 -4.2652 -2.9422 -1 3.1047 5.2933 -1 1.8533 -0.23408 -1 3.8429 6.6068 -1 1.9521 6.6885 -1 4.3278 7.3079 -1 -5.8087 6.2181 -1 3.6731 -5.3896 -1 4.3802 -1.8335 -1 -1.7062 7.662 -1 -4.3473 7.8695 -1 -5.9378 -4.1413 -1 -5.0565 -4.3832 -1 -2.3798 2.1034 -1 6.1812 -5.3513 -1 4.7768 -2.0083 -1 -5.654 1.1851 -1 -1.4824 -4.7542 -1 -3.9498 4.9372 -1 3.5556 -3.7813 -1 -6.1515 6.6724 -1 4.017 6.2756 -1 5.6379 5.6121 -1 4.4176 -2.9743 -1 -5.1328 -3.6111 -1 3.3823 -7.4463 -1 -0.21121 0.65504 -1 -2.6382 -5.5732 -1 -0.79968 6.8647 -1 -0.86721 -2.6187 -1 -0.23394 -6.8208 -1 1.2919 7.064 -1 ================================================ FILE: data_src/data_DBCV/dataset_4.txt ================================================ 340.080593000166 401.306241000071 1 333.985499000177 395.070042999927 1 335.612031000201 392.773647000082 1 345.092862000223 391.974363999907 1 330.569323000032 392.169848000165 1 339.312822999898 389.298434999771 1 339.686031000223 398.61877400009 1 343.316994000226 400.977901000064 1 333.065419999883 396.446630000137 1 342.511708000209 398.544327999931 1 340.766671999823 400.656415000092 1 337.277087999973 398.673247999977 1 340.923070999794 396.419065000024 1 341.683007000014 390.835886999965 1 342.708281000145 391.703137999866 1 343.201282000169 393.605033 1 341.791383000091 397.986231000163 1 331.578933999874 391.320538000204 1 342.348366000224 393.762180000078 1 338.727878999896 396.267293999903 1 334.738222999964 395.718766000122 1 330.975653999951 394.57460599998 1 344.716322999913 392.09760100022 1 341.841244999785 395.915595000144 1 337.327810999937 
394.15234699985 1 337.534911999945 389.968692000024 1 342.663162999786 398.431352999993 1 331.561238999944 391.729294999968 1 334.40383999981 401.985080000013 1 332.629104000051 391.597103000153 1 335.299560000189 396.55072600022 1 336.396044999827 390.704500999767 1 342.243536999915 401.898397999816 1 339.051363999955 401.802294999827 1 343.168777000159 393.770492999814 1 339.352742000017 396.219544999767 1 340.294815000147 397.91381700011 1 335.593735000119 396.100544000044 1 331.089887000155 399.788712000009 1 337.715063000098 399.279529999942 1 335.798527000006 390.623579999898 1 342.880836999975 390.597740999889 1 343.378837999888 392.276885000058 1 336.571469000075 388.70158400014 1 338.360745999962 393.431067000143 1 332.635257999878 388.442526000086 1 343.189617000055 397.453784999903 1 332.511510000098 399.986099000089 1 336.477483999915 397.836385000031 1 332.235915000085 389.441686999984 1 275.564732000232 392.559510999825 2 271.378120000008 404.168049000204 2 275.536663000006 391.976090000011 2 277.958006000146 397.734982999973 2 281.468803000171 397.988183000125 2 276.999392999802 389.689679999836 2 277.532668000087 400.254569999874 2 283.876598999836 398.569484000094 2 275.246861999854 393.905619999859 2 280.570826000068 400.053100000136 2 284.972879000008 389.935608999804 2 272.625878999941 403.725992999971 2 284.073303999845 403.610801999923 2 275.946336000226 400.726569000166 2 275.631306000054 400.140395000111 2 271.080223999918 394.248447000049 2 271.56240000017 400.297164000105 2 273.661956999917 402.378101999871 2 281.391483999789 400.420233999845 2 273.427796999924 398.024796000216 2 282.834764000028 403.955949999858 2 275.17954300018 398.659824999981 2 283.498360000085 390.48953700019 2 271.657103999984 403.527952000033 2 282.961077999789 398.744140000083 2 284.618311999831 392.174819000065 2 279.342093000188 404.362902000081 2 270.902385999914 403.879670999944 2 280.118470000103 404.34404100012 2 279.850511000026 399.025212000124 2 
282.69835200021 416.991049999837 2 284.884364999831 416.506583000068 2 278.547925000079 417.136450000107 2 272.391222000122 406.802037999965 2 277.926847000141 405.936453999951 2 275.040202999953 414.701766999904 2 280.756792999804 414.884978000075 2 276.10025600018 408.918399999849 2 279.437752000056 405.916401000228 2 272.477855999954 415.77772800019 2 284.624309999868 417.78525599977 2 280.52478600014 426.589997000061 2 286.989403000101 414.205672999844 2 274.815353000071 425.063535000198 2 285.541612000205 424.086339999922 2 275.997407999821 417.256587000098 2 278.908342999872 421.349551000167 2 275.412847999949 425.442823000252 2 287.055484999903 418.244719999842 2 277.670307000168 416.050514000002 2 280.128748999909 438.741917999927 2 280.342310000211 429.522609000094 2 284.103120999876 430.363543000072 2 294.341947000008 441.109651000239 2 287.621638000011 428.60290400032 2 288.48249799991 431.23475799989 2 285.283555000089 436.722258999944 2 286.409955000039 430.631586000323 2 289.852479999885 432.494152000174 2 292.558714999817 429.39255800005 2 300.289115000051 448.096348000225 2 301.998831000179 447.046858000103 2 296.696620999835 444.077189000323 2 292.593574999832 438.624294999987 2 295.824153999798 449.513397000264 2 294.305875000078 444.282610999886 2 303.000893000048 438.451207000297 2 294.69386699982 446.034585000016 2 295.395599999931 447.633481000084 2 298.359889999963 443.035277000163 2 298.105407999828 452.848544000182 2 298.494349000044 456.162745000329 2 307.769902999979 462.346611000132 2 297.570236000232 454.607466999907 2 308.14916000003 448.817819999997 2 306.467536999844 448.400144000072 2 300.060355999973 448.536395000294 2 298.264582999982 448.651820000261 2 301.684843999799 455.66270800028 2 308.0738309999 448.978669000324 2 310.796959000174 454.082526999991 2 312.118670000229 452.805322000291 2 311.059359000064 466.771413000301 2 318.166869999841 457.615817000158 2 314.176175000146 458.207098999992 2 316.558571000118 464.461933000013 
2 307.646118999925 453.736591999885 2 317.605855000205 460.297666999977 2 310.377187999897 459.180017000064 2 311.657558000181 459.881522000302 2 328.125591999851 465.369022000115 2 331.674118999857 469.407306999899 2 318.059061000124 459.695964999963 2 327.572842000052 459.722072999924 2 325.065386999864 468.653207000345 2 323.394032999873 459.820330000017 2 327.346989999991 469.411017999984 2 323.62447200017 459.327498000115 2 327.730142999906 471.821191000286 2 320.339424000122 459.946117000189 2 336.860669000074 471.016857000068 2 329.326169000007 473.654308999889 2 336.970894000027 465.487050999887 2 328.943793999963 460.347212000284 2 341.481104999781 468.662958000321 2 331.426523000002 461.136917999946 2 331.442586999852 459.671700000297 2 340.471007000189 469.628787999973 2 334.635424000211 474.328426000196 2 334.673630000092 468.230307999998 2 345.601218000054 472.548501000274 2 340.310949999839 464.94068400003 2 341.861144000199 466.057520000264 2 342.321235999931 474.119177999906 2 339.784878999926 472.528696999885 2 348.661588000134 468.208697000053 2 344.018592999782 464.765122000128 2 346.396873999853 465.233286000323 2 343.950104000047 462.03741999995 2 344.375392000191 468.694385000039 2 281.591250000056 390.116572000086 2 274.038128000218 390.173677999992 2 274.319044000003 388.317232000176 2 281.458730000071 392.840789000038 2 277.684638000093 395.224723999854 2 282.037909999955 390.391964999959 2 277.54200100014 388.628529000096 2 277.781628999859 381.780561999884 2 275.458581999876 382.77025000006 2 278.33758000005 383.239541999996 2 270.407542999834 375.868768000044 2 276.533646999858 380.289565000217 2 276.392326000147 372.586655000225 2 277.435391000006 374.823960999958 2 281.0096450001 375.529147000052 2 270.199572999962 375.50949899992 2 280.601464000065 379.713717000093 2 272.352343999781 380.397797000129 2 272.74782600021 371.799476999789 2 279.051520000212 385.182934000157 2 276.032953999937 373.474593000021 2 276.04079400003 
371.569041000213 2 283.93596199993 372.577940999996 2 282.358872999903 376.170775000006 2 273.135604999959 374.693434000015 2 275.436521000229 366.388801000081 2 284.604973000009 361.595015000086 2 279.457003000192 364.791898999829 2 279.324618000071 373.007619999815 2 283.907215999905 372.13773099985 2 286.724506000057 362.70857599983 2 287.665115000214 356.211618000176 2 285.225418999791 365.002741999924 2 285.955506999977 367.179124999791 2 292.162618999835 357.082713000011 2 283.527553999797 355.450588999782 2 290.19073599996 354.795212000143 2 281.79827499995 364.437746000011 2 286.016962000169 364.524929000065 2 285.490943999961 356.040585000068 2 289.584323999938 353.874623999931 2 292.188362999819 354.931929999962 2 290.921076999977 351.638449999969 2 295.706486000214 347.77908599982 2 293.876329000108 344.007606999949 2 287.630673000123 356.822922000196 2 292.76003200002 346.940886000171 2 291.874253999908 347.710423999932 2 284.013354999945 351.794995999895 2 292.967625999823 361.836296000052 2 280.614341000095 362.254129999783 2 283.180738000199 359.081335000228 2 288.563686999958 352.996989000123 2 290.206956000067 359.305296000093 2 285.380559999961 360.669691000134 2 280.855977000203 352.008268999867 2 287.071576999966 356.453741999809 2 289.397274999879 358.707884000149 2 292.32484500017 350.750994999893 2 297.766363000032 343.718977000099 2 295.469016999938 352.890833000187 2 286.694364999887 346.289671999868 2 294.82860499993 353.694565999787 2 286.33236300014 341.491194999777 2 293.489422000013 343.918444999959 2 293.222575999796 343.745008000173 2 289.931313999929 341.977434999775 2 292.469142999966 345.385646999814 2 303.328251000028 344.780511000194 2 304.068678000011 340.544828999788 2 293.718036000151 338.427188999951 2 303.741065999959 343.235077999998 2 307.100600000005 336.871704000048 2 307.09996900009 341.692346999887 2 302.759434999898 337.83112199977 2 301.137457000092 346.819325999822 2 298.115906999912 348.274538999889 2 
301.878403000068 342.889849000145 2 313.32788700005 344.940977000166 2 301.189420000184 346.244235000107 2 313.613148000091 341.952339000069 2 310.804626999889 345.506359000225 2 307.120536999777 342.781421999913 2 307.54880299978 333.738884000108 2 309.184045999777 341.367395999841 2 309.930445999838 341.127594000194 2 301.483397000004 340.018436999992 2 314.124737999868 336.966264000162 2 316.484484000131 330.42635500012 2 312.664367000107 342.642039000057 2 316.852740999777 328.654779999983 2 313.073115000036 333.346878999844 2 317.070995000191 341.424494999927 2 322.053408000153 338.632120999973 2 308.055980000179 337.719942000229 2 318.092466000002 332.764543000143 2 316.209414000157 340.604559999891 2 321.762986999936 332.181195999961 2 324.187342999969 336.267490999773 2 330.833409999963 335.663953999989 2 337.203918999992 347.113196999766 2 325.619167000055 343.377601000015 2 336.854183999822 335.616034000181 2 325.100755999796 342.157753999811 2 331.17795300018 337.049343999941 2 326.799414000008 333.708540000021 2 331.353066999931 335.78148799995 2 323.835491000209 346.367407999933 2 328.04509499995 335.579245999921 2 333.326241999865 339.170746000018 2 330.170179999899 334.505876999814 2 330.410356000066 332.77614099998 2 328.951834000181 326.862077000085 2 330.768906999845 332.976048000157 2 339.536853000056 333.274954999797 2 330.71094099991 329.752326999791 2 340.630439999979 340.895986999851 2 342.341614999808 332.016452000011 2 333.59476199979 332.964519000147 2 336.322652000003 330.620709000155 2 335.530784999952 334.798208000138 2 332.734618999995 334.91174999997 2 341.056561000179 333.587681999896 2 345.079256999772 329.809594000224 2 333.05690599978 328.164487000089 2 344.170750999823 340.133692999836 2 339.232667999808 339.78389800014 2 336.185130999889 331.491934999824 2 288.439141000155 429.814830000047 2 277.240346000064 421.534643999767 2 288.705926000141 432.914245000109 2 285.61660300009 420.086494999938 2 284.781975000165 
423.352613999974 2 289.356124000158 431.610005999915 2 276.669199999887 428.340828000102 2 282.552151000127 423.650437999982 2 282.833926999941 433.334122000262 2 275.244181999937 423.181933000218 2 295.111847000197 440.70355800027 2 288.956797000021 441.54499100009 2 294.225846999791 446.283021000214 2 285.297542000189 443.724222999997 2 288.548696999904 435.404792000074 2 286.853298999835 445.89872100018 2 284.027945999987 445.659014000092 2 290.283964000177 445.466033000033 2 296.440758000128 448.085398999974 2 292.013282000087 440.945854000282 2 293.315739000216 454.471840000246 2 291.641048000194 451.673665999901 2 304.441722999793 446.362951000221 2 291.323487999849 453.85565600032 2 304.139917000197 446.935756999999 2 302.618131999858 444.201373000164 2 295.255117000081 447.44844800001 2 293.073137000203 444.524476000108 2 297.196814000141 447.21465400001 2 290.564933999907 451.969909999985 2 316.160480999853 462.161685000174 2 313.438000000082 447.888195000123 2 315.721752999816 451.953838000074 2 315.141911000013 447.759120000061 2 310.845393000171 456.790971000213 2 309.446165000089 449.9552480001 2 310.442799999844 455.088514000177 2 315.661547999829 459.732952999882 2 315.430583000183 460.020981000271 2 310.377650000155 460.583492999896 2 337.770816999953 464.291306000203 2 326.218851999845 469.661887000315 2 339.849760000128 473.900303000119 2 330.132836999837 465.409437000286 2 333.963119000196 466.885577999987 2 328.408026000019 463.216091000009 2 338.064695999958 470.464205000084 2 331.278638999909 464.062855000142 2 337.738963999785 473.016298000235 2 331.581642999779 473.827144999988 2 278.063279999886 382.482284000143 2 283.170090000145 379.762041000184 2 284.882455999963 377.243555999827 2 286.762548999861 376.382792999968 2 285.977713000029 375.200914000161 2 282.663991999812 379.152185999788 2 282.158846000209 377.099086000118 2 284.432273000013 372.623476999812 2 280.196934000123 375.145907999948 2 286.962050000206 381.40418299986 2 
310.786869000178 347.103199000005 2 311.337789000012 346.479408999905 2 310.680124000181 341.763600999955 2 306.215865000151 342.121199000161 2 302.647224000189 334.363262999803 2 304.464583999943 337.528198999818 2 302.930536999833 338.498699000105 2 315.309626000002 339.280749999918 2 311.334650999866 345.365472000092 2 307.930645999964 339.443221000023 2 316.644803000148 331.834722000174 2 329.77025000006 340.136498000007 2 330.69882800011 335.393705999944 2 321.975273999851 338.384099000134 2 316.734904999845 330.654447999783 2 324.408317999914 332.823224999942 2 330.077132000122 333.456172999926 2 321.82465799991 338.670402999967 2 325.928280000109 340.108022000175 2 327.313602000009 336.05405799998 2 331.96966700023 340.252166000195 2 339.144123999868 341.612036999781 2 328.344622999895 337.307053999975 2 335.398107000161 336.816229999997 2 329.37205900019 339.833136000205 2 339.925007999875 341.027007000055 2 330.932599999942 334.75419700006 2 327.721384000033 334.934183999896 2 337.464132000227 329.19598999992 2 337.832541999873 338.690866000019 2 428.810889000073 499.729580000043 3 425.393627000041 493.410670000128 3 424.475703999866 500.755513000302 3 415.510985999834 491.603397000115 3 419.541222000029 490.648371000309 3 428.113410000224 492.412991000339 3 425.608328999951 488.619296000339 3 427.317301000003 498.1468410003 3 427.943053999916 498.07325399993 3 428.672300999984 496.934424000327 3 433.372045000084 502.141772999894 3 435.874388999771 501.390268000308 3 430.950474999845 491.318884999957 3 437.752884999849 492.625738000032 3 439.392155999783 502.221375999972 3 436.655656999908 496.187112000305 3 440.452285999898 491.369898000266 3 434.746292000171 501.172223000322 3 440.312845000066 493.24883500021 3 438.881428000052 495.655058000237 3 438.524873000104 493.75810000021 3 441.167762000114 491.661625999957 3 443.836159999948 496.010120999999 3 448.038234999869 500.792133000214 3 440.090555999894 493.597029000055 3 448.841490000021 
492.925735000055 3 439.478817000054 497.346042000223 3 446.786419000011 499.043122000061 3 438.221760000102 494.281751000322 3 437.356751000043 502.442203999963 3 449.492473000195 492.53163299989 3 459.745124999899 489.286241000053 3 447.223060999997 501.440026999917 3 447.245114000048 487.333358000033 3 450.467995000072 496.733677000273 3 458.580765999854 489.658048000187 3 453.71749900002 491.682899000123 3 450.043326999992 500.339963000268 3 455.143877000082 496.32339300029 3 453.2182169999 496.382433000021 3 462.335330999922 490.786952999886 3 467.451291000005 492.437547000125 3 459.826137000229 496.025311000179 3 464.071967999917 493.899759000167 3 456.547443000134 497.955181000289 3 468.249412999954 486.334616000298 3 465.482481000014 484.134764000308 3 471.070791999809 483.665415999945 3 461.469093999825 496.853117000312 3 464.949471000116 498.282124000136 3 477.476850999985 487.796128000133 3 473.19994200021 481.32956700027 3 475.650100999977 481.682930999901 3 472.407899999991 490.043781999964 3 478.489666999783 484.528574000113 3 477.478930999991 490.609395000152 3 468.97280200012 490.981703000143 3 468.745244000107 487.238164000213 3 472.92670099996 486.226412999909 3 469.936017000116 481.785910000093 3 483.847126000095 475.061950000003 3 480.692332999781 478.47275400022 3 487.595226000063 475.968164999969 3 488.596621000208 468.429765000008 3 487.141553999856 472.961444000248 3 488.096462999936 473.145284000318 3 484.760805000085 472.528414000291 3 488.708178000059 479.04041399993 3 479.584040000103 466.580910000019 3 484.286896999925 468.517949000001 3 495.708829000127 468.662524000276 3 498.306367999874 463.256289000157 3 490.967838000041 466.609476000071 3 489.33916899981 474.366789000109 3 488.481294000056 461.520047999918 3 488.358967999928 461.594925000332 3 489.752803000156 472.862289000303 3 484.34510200005 472.196117000189 3 488.994771999773 460.876666999888 3 490.832779000048 468.55657200003 3 490.678313000128 455.870581999887 3 
489.974518000148 462.944128000177 3 497.031692999881 462.044984000269 3 484.53761800006 457.245074999984 3 488.48476300016 452.290405000094 3 484.257629999891 461.404545000289 3 496.414187000133 461.387347000185 3 493.16295599984 454.375983000267 3 491.825439999811 454.194912000094 3 488.775708000176 463.2106949999 3 481.173328999896 443.654618000146 3 494.338107000105 439.560624999925 3 485.760642999783 452.174839999992 3 484.106364999898 445.605880000163 3 487.434642000124 442.638418999966 3 480.945791000035 443.996179000009 3 485.675890999846 444.819050000049 3 481.355332999956 449.149621000048 3 495.079679000191 451.061327999923 3 493.256802000105 441.983272999991 3 488.721293999813 434.025133000221 3 479.183360000141 430.097661999986 3 484.496737999842 434.698993999977 3 485.407277999911 439.268786000088 3 485.903371999972 432.775454000104 3 477.372338000219 438.229150000028 3 474.560936000198 440.252245000098 3 475.857702000067 432.993828000035 3 482.089610999916 444.30068500014 3 474.89568899991 432.706654000096 3 478.456371000037 430.672968000174 3 472.835409999825 425.344663999975 3 469.408571000211 431.490079999901 3 467.778229999822 430.893924999982 3 468.319089999888 423.78520500008 3 466.404827999882 430.545737999957 3 474.656531999819 436.512926999945 3 473.704237000085 434.295117999893 3 469.99489099998 422.807921000291 3 470.22989499988 431.990844999906 3 468.125308000017 427.692645000294 3 459.370287000202 434.228566000238 3 465.574583999813 427.847649999894 3 471.47603000002 424.857528999913 3 458.833753000014 423.569963000249 3 457.583232000005 431.679839000106 3 463.169875000138 428.454710999969 3 466.314158000052 427.3446630002 3 462.784620000049 433.005989999976 3 458.836068000179 426.326979000121 3 452.97005000012 423.382490000222 3 458.167239000089 418.44309400022 3 453.854613000061 421.438537000213 3 457.294453000184 421.76099300012 3 457.539230000228 421.425176999997 3 456.41126499977 425.433713000268 3 455.357557999901 424.094777000137 3 
450.902567999903 415.553141000215 3 458.018389000092 418.812316999771 3 455.648783999961 422.940563000273 3 440.518015000038 417.753004999831 3 438.012492000125 412.980132000055 3 447.59070800012 422.962691000197 3 438.595366000198 419.423899000045 3 451.007958999835 415.56221299991 3 447.837811000179 423.666684000287 3 439.242604000028 423.527970999945 3 440.404937000014 418.397346999962 3 440.028320999816 416.051675000228 3 447.287671000231 417.663943999913 3 438.35898800008 414.573799000122 3 426.627871000208 416.068214000203 3 426.516830000095 414.542373000178 3 428.430649999995 416.653237999883 3 436.461378000211 416.150005999953 3 440.34707200015 419.265943999868 3 427.584884000011 408.791635999922 3 439.871456000023 407.839774999768 3 429.444986999966 412.304796000011 3 430.937460999936 405.849741999991 3 428.969322999939 415.703420999926 3 423.408722999971 406.841905999929 3 419.644838000182 405.488836999983 3 427.421575000044 403.303555000108 3 428.579102999996 410.244613000192 3 428.30893300008 401.526786999777 3 424.206778999884 408.167326000053 3 424.488547000103 411.169675999787 3 420.753368000034 409.300112999976 3 421.141454000026 403.093392999843 3 422.575889000203 395.963992000092 3 415.008001000155 394.48915300006 3 423.418938000221 402.424097000156 3 420.429361999966 398.279748000205 3 427.444771000184 400.071800000034 3 414.952022999991 393.633016000036 3 421.983272000216 390.16661400022 3 413.9837460001 393.733527000062 3 426.244646999985 394.742223000154 3 418.182521000039 399.909256999847 3 420.358260999899 375.894729999825 3 427.869353999849 388.467025999911 3 428.307055000216 382.592672999948 3 419.557771999855 382.19579999987 3 428.50125099998 388.969442999922 3 418.244741000235 379.359017999843 3 417.434104999993 376.864184000064 3 419.457435000222 387.456309999805 3 428.179967000149 386.792590999976 3 417.430131000001 385.002572000027 3 422.263489999808 372.909155999776 3 423.511117999908 364.854987000115 3 426.30204299977 
373.600267999806 3 425.074638000224 369.74961300008 3 419.22969500022 367.148517999798 3 425.448702000082 367.063457999844 3 427.935936000198 376.258545000106 3 421.599621000234 378.26745699998 3 429.803634000011 379.078275000211 3 428.726489999797 366.422532000113 3 429.293459999841 370.441796000116 3 422.024081000127 365.716256999876 3 422.080856999848 360.452484999783 3 434.971530999988 367.278249000199 3 436.256130999885 362.242738000117 3 424.723292000126 369.823642999865 3 432.911468999926 362.537899000105 3 427.971251000185 357.049992000218 3 434.024201000109 357.875876000151 3 436.15306300018 369.883030999918 3 440.152916000225 351.768310000189 3 436.668070000131 360.135976000223 3 430.671628000215 355.613342000172 3 434.786071000155 355.149490999989 3 435.98977799993 360.07734999992 3 442.275245999917 353.242513000034 3 439.905375999864 348.219847000204 3 444.840218999889 354.630938000046 3 433.893869999796 358.638927000109 3 430.86381699983 354.284175999928 3 442.700879999902 346.138824999798 3 446.513323000167 354.673012999818 3 439.407722999807 357.137496000156 3 446.086984000169 352.812396999914 3 444.458507999778 354.214385000058 3 448.701793000102 351.475484000053 3 438.790421000216 347.653041000012 3 449.342588 357.475145000033 3 445.903801999986 355.878285000101 3 437.50270299986 349.007557000034 3 450.747810999863 344.582192999776 3 452.843367999885 344.103289999999 3 454.279118999839 344.040136000142 3 444.908001999836 347.374592000153 3 453.119206999894 343.011994999833 3 453.410854999907 336.931435000151 3 445.570268999785 341.626203999855 3 455.76273200009 344.204905999824 3 453.301330999937 346.165316000115 3 443.662442000117 343.893095999956 3 450.592554000206 341.27203800017 3 451.640319999773 344.317819000222 3 457.559148000088 338.631022999994 3 460.726425000001 345.278055999894 3 451.824684000108 341.867209000047 3 452.238888000138 331.685310000088 3 452.757952000014 338.266270000022 3 455.351141999941 333.413383999839 3 448.535099999979 
332.97164299991 3 455.475157999899 341.506680000108 3 460.855047000106 338.930375999771 3 457.078596000094 330.556702999864 3 453.813031000085 332.122059999965 3 454.459245000035 335.722775999922 3 463.812841000035 332.088324999902 3 454.220263000112 336.861219000071 3 453.876569999848 335.621797999833 3 464.133146000095 340.191471000202 3 463.772503000218 333.303650999907 3 465.428991000168 337.623219000176 3 468.293345999904 323.258880999871 3 461.778946999926 325.126213999931 3 459.180929000024 325.846413000021 3 462.312686999794 332.107357000001 3 470.991812000051 323.379896999802 3 465.851267999969 322.656903000083 3 460.669528000057 324.675607999787 3 466.893639000133 330.128490000032 3 468.453945999965 327.243917000014 3 470.58936699992 324.017206999939 3 472.794015999883 326.346493000165 3 462.414896000177 327.932380000129 3 472.940739999991 320.993077000137 3 466.85614300007 329.479348999914 3 470.819778999779 326.629040999804 3 468.017585999798 319.480458999984 3 474.199337000027 325.671914999839 3 473.76885400014 317.551202999894 3 461.43383899983 317.940142000094 3 468.314850000199 324.931464999914 3 423.049947000109 461.602417000104 4 415.882664999925 462.230109999888 4 422.901542000007 463.280180000234 4 424.773157000076 461.591086000204 4 425.97969100019 463.723321999889 4 417.142891000025 458.040789000224 4 426.111376000103 451.80352600012 4 414.850494000129 454.950883999933 4 424.364312000107 456.30208500009 4 414.640132000204 459.097361000255 4 421.098943999968 465.900079000276 4 415.243125999812 460.873907000292 4 426.502336000092 463.425848000217 4 419.553964000195 461.354702000041 4 420.256450999994 462.199783000164 4 426.286921999883 453.050778999925 4 418.101619999856 455.696752000134 4 418.435087999795 457.909536000341 4 426.986647000071 459.493770000059 4 422.758165999781 452.658585000318 4 429.253837999888 454.370699000079 4 426.566916000098 466.358785000164 4 426.117418999784 457.360111000016 4 423.799103000201 465.722519000061 4 
419.570911000017 454.25428700028 4 425.907724000048 465.343441000208 4 415.338547000196 462.592960000038 4 425.639667000156 463.009326000232 4 418.12036800012 464.431340000127 4 421.191784000024 464.121416999958 4 466.573479000013 388.56095399987 5 473.242182999849 388.505501999985 5 477.989756000228 380.362852999941 5 468.493040999863 382.565651000012 5 478.035753999837 383.804427000228 5 465.900979999918 382.578755999915 5 467.047054000199 376.6596789998 5 465.586457999889 376.740127999801 5 466.30907699978 384.941691999789 5 466.533137999941 388.636098000221 5 475.669772000052 382.141518999822 5 479.811947000213 376.880171000026 5 474.566833000164 384.886010999791 5 468.031305999961 380.62018099986 5 478.953275999986 386.168779000174 5 477.305238999892 376.998159999959 5 472.392535000108 376.989792999811 5 479.798206999898 378.12027399987 5 470.961451000068 377.676510000136 5 466.5645750002 389.006446000189 5 468.762294999789 378.853031999897 5 477.20013799984 386.128392999992 5 477.790004000068 389.397962999996 5 472.586430999916 381.000587000046 5 467.618739999831 378.265711000189 5 468.390726000071 390.262949999887 5 474.892611999996 383.943504999857 5 479.888063999824 382.734370000195 5 470.796721999999 382.858122000005 5 469.433087000158 382.949008000083 5 473.588192000054 379.197623000015 5 470.089219999965 388.71069200011 5 479.300497000106 386.932823999785 5 478.489159000106 378.533429000061 5 475.063924999908 388.783824999817 5 465.624900999945 377.243751999922 5 477.853277999908 386.701561999973 5 478.867014999967 376.112596000079 5 466.944978000131 389.898415999953 5 475.431175000034 381.42365900008 5 397.510631000157 318.779397999868 6 393.182192000095 311.378210000228 6 405.59302699985 313.85961999977 6 400.207744999789 314.678629000206 6 399.142376000062 319.893732000142 6 395.942474999931 312.430680999998 6 403.492874000221 311.391166999936 6 403.187113999855 312.641197999939 6 394.807862999849 315.297671999782 6 399.650669999886 318.326419000048 
6 398.619932999834 309.981873000041 6 406.447836000007 313.045142000075 6 394.85633599991 306.184570999816 6 406.718776999973 314.228234999813 6 396.452070999891 309.648914000019 6 402.186852000188 312.914503999986 6 398.009571999777 306.306824999861 6 394.72197900014 300.856654000003 6 401.503841000143 306.34066199977 6 394.375514999963 301.732228000183 6 408.25665100012 307.194850999862 6 404.499704999849 310.703956000041 6 398.461693999823 300.209472000133 6 397.59167600004 302.597422000021 6 411.33420599997 303.615460999776 6 409.380115000065 301.257648999803 6 411.549291000236 307.317329999991 6 400.192313999869 299.548905999865 6 408.531942000147 309.208424000069 6 406.719988000114 301.970383999869 6 410.583399999887 302.046819999814 6 404.96436699992 312.242717999965 6 401.784442000091 305.12842199998 6 410.006078999955 310.366615999956 6 401.597389000002 313.42785899993 6 414.056191000156 310.538738999981 6 407.827461000066 315.803555000108 6 414.649985000025 304.695712000132 6 408.976344999857 302.572995000053 6 402.320390000008 303.321086000185 6 388.082185000181 312.699907000177 6 389.509529999923 308.985661000013 6 384.867300000042 306.26948300004 6 390.55796500016 307.401546000037 6 398.799327000044 310.043293000199 6 387.063124000095 315.284988000058 6 385.934568000026 303.611659999937 6 387.964254000224 307.235400999896 6 388.047687000129 312.992616999894 6 389.964410000015 314.521449999884 6 386.692520000041 295.829528000206 6 394.290382000152 296.759353000205 6 385.995306999888 309.215948000085 6 397.548890999984 306.051862000022 6 398.91705300007 301.137899999972 6 398.024985000025 294.545830999967 6 390.556464000139 299.987350000069 6 395.581344999839 307.565301999915 6 387.356068000197 306.987933999859 6 388.168403000105 308.765101999976 6 397.505044000223 306.712673999835 6 395.604325999971 299.74879400013 6 402.714682000224 300.474165999796 6 406.542411999777 297.360909999814 6 397.990000999998 293.174902999774 6 403.867875999771 
305.508750000037 6 398.770523999818 304.696136999875 6 399.030168999918 301.416486999951 6 402.828209000174 300.463953999802 6 399.719417999964 296.280708000064 6 406.255197000224 297.83671199996 6 410.982671999838 293.501579999924 6 406.042142999824 304.310072000138 6 398.845921 301.682546999771 6 399.413953000214 302.254333000164 6 399.175451999996 304.443797000218 6 410.2477330002 296.46242700005 6 408.364271000028 296.781977000181 6 410.001900999807 296.184016000014 6 398.705327000003 298.27604399994 6 364.387544000056 443.955205000006 -1 379.862741999794 467.835111000109 -1 370.757472000085 513.521511000581 -1 416.86848400021 524.035082000308 -1 311.539015999995 523.082759000361 -1 284.837520000059 487.712935999967 -1 265.537488000002 517.137047000229 -1 265.406068000011 467.304047000129 -1 331.243737000041 503.141515000258 -1 393.481540999841 431.555767999962 -1 455.248877999838 465.751176000107 -1 433.485898999963 440.123194000218 -1 341.20297600003 430.338018999901 -1 361.197627000045 414.171761000063 -1 392.739544999786 411.681011000182 -1 362.739223000128 364.252307000104 -1 380.037742000073 395.467621999793 -1 400.507069000043 365.464221000206 -1 377.944128999952 336.997504000086 -1 416.090559999924 338.672298000194 -1 444.825889000203 307.228933999781 -1 493.477363000158 300.292936999816 -1 450.57735500019 285.058660999872 -1 487.091688999906 274.915289999917 -1 428.451733000111 271.475777999964 -1 353.462441999931 268.77300899988 -1 332.63990199985 300.980235999916 -1 294.568130999804 269.832191000227 -1 286.768153000157 309.40664099995 -1 254.155410999898 273.011171999853 -1 255.736754999962 383.043938999996 -1 255.578135000076 321.797772999853 -1 269.571206999943 291.691631999798 -1 318.953784000129 371.279631999787 -1 322.208068999927 419.043138000183 -1 302.684162000194 410.35325299995 -1 339.387234999798 367.536927999929 -1 297.637213999871 384.627384999767 -1 311.072525999974 393.566060000099 -1 377.536865999922 377.286367000081 -1 
417.072476999834 433.68673299998 -1 396.357729999814 452.929053000174 -1 399.783197000157 485.216498000082 -1 362.771583000198 492.148952000309 -1 308.225172999781 492.362728999928 -1 286.942470000125 516.371205000207 -1 264.795897000004 498.772009999957 -1 454.217315000016 522.212020000443 -1 485.410180000123 522.141029000282 -1 515.508918000385 520.376193000004 -1 515.950048999861 493.279626999982 -1 517.4182099998 456.858341000043 -1 517.196076000109 419.766627999954 -1 519.392636000179 382.963473000098 -1 517.66203899961 356.089052000083 -1 520.467551999725 314.412516000215 -1 518.004772000015 280.905362999998 -1 499.387498000171 337.548436000012 -1 478.931082000025 356.823363000061 -1 497.154240000062 415.408559999894 -1 497.773620999884 377.833358000033 -1 445.298427000176 391.178739000112 -1 470.157571999822 409.287219999824 -1 357.421777000185 314.294974999968 -1 384.430027999915 277.063980999868 -1 268.806625999976 337.208126999903 -1 258.773949999828 431.312564000022 -1
================================================
FILE: data_src/data_DBCV/read_data.R
================================================
library(dbscan)

x <- read.table("Work/data_DBCV/dataset_1.txt")
colnames(x) <- c("x", "y", "class")
cl <- x[, 3]
cl[cl < 0] <- 0
x[, 3] <- cl
plot(x[, 1:2], col = x[, 3] + 1L, asp = 1)
Dataset_1 <- x
save(Dataset_1, file="data/Dataset_1.rda", version = 2)

x <- read.table("Work/data_DBCV/dataset_2.txt")
colnames(x) <- c("x", "y", "class")
cl <- x[, 3]
cl[cl < 0] <- 0
x[, 3] <- cl
clplot(x[, 1:2], x[, 3])
Dataset_2 <- x
save(Dataset_2, file="data/Dataset_2.rda", version = 2)

x <- read.table("Work/data_DBCV/dataset_3.txt")
colnames(x) <- c("x", "y", "class")
cl <- x[, 3]
cl[cl < 0] <- 0
x[, 3] <- cl
clplot(x[, 1:2], x[, 3])
Dataset_3 <- x
save(Dataset_3, file="data/Dataset_3.rda", version = 2)

x <- read.table("Work/data_DBCV/dataset_4.txt")
colnames(x) <- c("x", "y", "class")
cl <- x[, 3]
cl[cl < 0] <- 0
x[, 3] <- cl
clplot(x[, 1:2], x[, 3])
Dataset_4 <- x
save(Dataset_4, file="data/Dataset_4.rda", version = 2)
================================================
FILE: data_src/data_DBCV/test_DBCV.R
================================================
# From: https://github.com/FelSiq/DBCV
#
# Dataset        Python (Scipy's Kruskal's)  Python (Translated MST algorithm)  MATLAB
# dataset_1.txt  0.8566                      0.8576                             0.8576
# dataset_2.txt  0.5405                      0.8103                             0.8103
# dataset_3.txt  0.6308                      0.6319                             0.6319
# dataset_4.txt  0.8456                      0.8688                             0.8688
#
# Original MATLAB implementation is at:
# https://github.com/pajaskowiak/dbcv/tree/main/data

res <- c()

data(Dataset_1)
x <- Dataset_1[, c("x", "y")]
class <- Dataset_1$class
#clplot(x, class)
(db <- dbcv(x, class, metric = "sqeuclidean"))
res["ds1"] <- db$score
#dsc  [0.00457826 0.00457826 0.0183068  0.0183068 ]
#dspc [0.85627898 0.85627898 0.85627898 0.85627898]
#vcs  [0.99465331 0.99465331 0.97862052 0.97862052]
#0.8575741400490697

data(Dataset_2)
x <- Dataset_2[, c("x", "y")]
class <- Dataset_2$class
#clplot(x, class)
(db <- dbcv(x, class, metric = "sqeuclidean"))
res["ds2"] <- db$score
#dsc  [19.06151967 15.6082     83.71522964 68.969     ]
#dspc [860.2538    501.4376    501.4376    860.2538   ]
#vcs  [0.97784198  0.9688731   0.83304956  0.91982715 ]
#0.8103343589093096

data(Dataset_3)
x <- Dataset_3[, c("x", "y")]
class <- Dataset_3$class
#clplot(x, class)
(db <- dbcv(x, class, metric = "sqeuclidean"))
res["ds3"] <- db$score

data(Dataset_4)
x <- Dataset_4[, c("x", "y")]
class <- Dataset_4$class
#clplot(x, class)
(db <- dbcv(x, class, metric = "sqeuclidean"))
res["ds4"] <- db$score

cbind(dbscan = round(res, 2), MATLAB = c(0.85, 0.81, 0.63, 0.87))
================================================
FILE: data_src/data_chameleon/read.R
================================================
# Source: http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download

chameleon_ds4 <- read.table("t4.8k.dat")
chameleon_ds5 <- read.table("t5.8k.dat")
chameleon_ds7 <- read.table("t7.10k.dat")
chameleon_ds8 <- read.table("t8.8k.dat")

colnames(chameleon_ds4) <-
colnames(chameleon_ds5) <- colnames(chameleon_ds7) <- colnames(chameleon_ds8) <- c("x", "y")

plot(chameleon_ds4)
plot(chameleon_ds5)
plot(chameleon_ds7)
plot(chameleon_ds8)

save(chameleon_ds4, chameleon_ds5, chameleon_ds7, chameleon_ds8, file="Chameleon.rda")
================================================
FILE: dbscan.Rproj
================================================
Version: 1.0
ProjectId: 6c2ba941-cfaa-4faa-ba72-88eeef0391b8

RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default

EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 2
Encoding: UTF-8

RnwWeave: Sweave
LaTeX: pdfLaTeX

AutoAppendNewline: Yes
StripTrailingWhitespace: Yes

BuildType: Package
PackageUseDevtools: Yes
PackageCleanBeforeInstall: No
PackageInstallArgs: --no-multiarch --with-keep.source
PackageBuildArgs: --compact-vignettes=both
PackageCheckArgs: --as-cran
PackageRoxygenize: rd,collate,namespace
================================================
FILE: inst/CITATION
================================================
citation(auto = meta)

bibentry(bibtype = "Article",
  title   = "{dbscan}: Fast Density-Based Clustering with {R}",
  author  = c(person(given = "Michael", family = "Hahsler",
                     email = "mhahsler@lyle.smu.edu",
                     comment = c(ORCID = "0000-0003-2716-1405")),
              person(given = "Matthew", family = "Piekenbrock"),
              person(given = "Derek", family = "Doran",
                     email = "derek.doran@wright.edu")),
  journal = "Journal of Statistical Software",
  year    = "2019",
  volume  = "91",
  number  = "1",
  pages   = "1--30",
  doi     = "10.18637/jss.v091.i01",
  header  = "To cite dbscan in publications use:"
)
================================================
FILE: man/DBCV_datasets.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DBCV_datasets.R
\docType{data}
\name{DBCV_datasets}
\alias{DBCV_datasets}
\alias{Dataset_1}
\alias{Dataset_2}
\alias{Dataset_3}
\alias{Dataset_4}
\title{DBCV Paper Datasets}
\format{
Four data frames
with the following 3 variables.
\describe{
\item{x}{a numeric vector}
\item{y}{a numeric vector}
\item{class}{an integer vector indicating the class label. 0 means noise.}
}
}
\source{
https://github.com/pajaskowiak/dbcv
}
\description{
The four synthetic 2D datasets used in Moulavi et al (2014).
}
\examples{
data("Dataset_1")
clplot(Dataset_1[, c("x", "y")], cl = Dataset_1$class)

data("Dataset_2")
clplot(Dataset_2[, c("x", "y")], cl = Dataset_2$class)

data("Dataset_3")
clplot(Dataset_3[, c("x", "y")], cl = Dataset_3$class)

data("Dataset_4")
clplot(Dataset_4[, c("x", "y")], cl = Dataset_4$class)
}
\references{
Davoud Moulavi, Pablo A. Jaskowiak, Ricardo J. G. B. Campello, Arthur Zimek and Jörg Sander (2014).
Density-Based Clustering Validation. In \emph{Proceedings of the 2014 SIAM
International Conference on Data Mining,} pages 839-847.
\doi{10.1137/1.9781611973440.96}
}
\keyword{datasets}
================================================
FILE: man/DS3.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/DS3.R
\docType{data}
\name{DS3}
\alias{DS3}
\title{DS3: Spatial data with arbitrary shapes}
\format{
A data.frame with 8000 observations on the following 2 columns:
\describe{
\item{X}{a numeric vector}
\item{Y}{a numeric vector}
}
}
\source{
Obtained from \url{http://cs.joensuu.fi/sipu/datasets/}
}
\description{
Contains 8000 2-d points, with 6 "natural" looking shapes, all of which have
a sinusoid-like shape that intersects with each cluster. The data set was
originally used as a benchmark data set for the Chameleon clustering
algorithm (Karypis, Han and Kumar, 1999) to illustrate a data set containing
arbitrarily shaped spatial data surrounded by both noise and artifacts.
}
\examples{
data(DS3)
plot(DS3, pch = 20, cex = 0.25)
}
\references{
Karypis, George, Eui-Hong Han, and Vipin Kumar (1999). Chameleon:
Hierarchical clustering using dynamic modeling.
\emph{Computer} 32(8): 68-75.
}
\keyword{datasets}
================================================
FILE: man/NN.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/NN.R
\name{NN}
\alias{NN}
\alias{adjacencylist}
\alias{adjacencylist.NN}
\alias{sort.NN}
\alias{plot.NN}
\title{NN --- Nearest Neighbors Superclass}
\usage{
adjacencylist(x, ...)

\method{adjacencylist}{NN}(x, ...)

\method{sort}{NN}(x, decreasing = FALSE, ...)

\method{plot}{NN}(x, data, main = NULL, pch = 16, col = NULL, linecol = "gray", ...)
}
\arguments{
\item{x}{a \code{NN} object}

\item{...}{further parameters passed on to \code{\link[=plot]{plot()}}.}

\item{decreasing}{sort in decreasing order?}

\item{data}{the data that was used to create \code{x}}

\item{main}{title}

\item{pch}{plotting character.}

\item{col}{color used for the data points (nodes).}

\item{linecol}{color used for edges.}
}
\description{
NN is an abstract S3 superclass for the classes of the objects returned by
\code{\link[=kNN]{kNN()}}, \code{\link[=frNN]{frNN()}} and \code{\link[=sNN]{sNN()}}.
Methods for sorting, plotting and getting an adjacency list are defined.
}
\section{Subclasses}{
\link{kNN}, \link{frNN} and \link{sNN}
}
\examples{
data(iris)
x <- iris[, -5]

# finding kNN directly in data (using a kd-tree)
nn <- kNN(x, k = 5)
nn

# plot the kNN where NN are shown as lines connecting points.
plot(nn, x) # show the first few elements of the adjacency list head(adjacencylist(nn)) \dontrun{ # create a graph and find connected components (if igraph is installed) library("igraph") g <- graph_from_adj_list(adjacencylist(nn)) comp <- components(g) plot(x, col = comp$membership) # detect clusters (communities) with the label propagation algorithm cl <- membership(cluster_label_prop(g)) plot(x, col = cl) } } \seealso{ Other NN functions: \code{\link{comps}()}, \code{\link{frNN}()}, \code{\link{kNN}()}, \code{\link{kNNdist}()}, \code{\link{sNN}()} } \author{ Michael Hahsler } \concept{NN functions} \keyword{model} ================================================ FILE: man/comps.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/comps.R \name{comps} \alias{comps} \alias{components} \alias{comps.dist} \alias{comps.kNN} \alias{comps.sNN} \alias{comps.frNN} \title{Find Connected Components in a Nearest-neighbor Graph} \usage{ comps(x, ...) \method{comps}{dist}(x, eps, ...) \method{comps}{kNN}(x, mutual = FALSE, ...) \method{comps}{sNN}(x, ...) \method{comps}{frNN}(x, ...) } \arguments{ \item{x}{the \link{NN} object representing the graph or a \link{dist} object} \item{...}{further arguments are currently unused.} \item{eps}{threshold on the distance} \item{mutual}{for a pair of points, do both have to be in each other's neighborhood?} } \value{ an integer vector with component assignments. } \description{ Generic function and methods to find connected components in nearest neighbor graphs. } \details{ Note that for kNN graphs, one point may be in the kNN of the other but not vice versa. \code{mutual = TRUE} requires that both points are in each other's kNN.
} \examples{ set.seed(665544) n <- 100 x <- cbind( x=runif(10, 0, 5) + rnorm(n, sd = 0.4), y=runif(10, 0, 5) + rnorm(n, sd = 0.4) ) plot(x, pch = 16) # Connected components on a graph where each pair of points # with a distance less than or equal to eps is connected d <- dist(x) components <- comps(d, eps = .8) plot(x, col = components, pch = 16) # Connected components in a fixed radius nearest neighbor graph # Gives the same result as the threshold on the distances above frnn <- frNN(x, eps = .8) components <- comps(frnn) plot(frnn, data = x, col = components) # Connected components on a k nearest neighbors graph knn <- kNN(x, 3) components <- comps(knn, mutual = FALSE) plot(knn, data = x, col = components) components <- comps(knn, mutual = TRUE) plot(knn, data = x, col = components) # Connected components in a shared nearest neighbor graph snn <- sNN(x, k = 10, kt = 5) components <- comps(snn) plot(snn, data = x, col = components) } \seealso{ Other NN functions: \code{\link{NN}}, \code{\link{frNN}()}, \code{\link{kNN}()}, \code{\link{kNNdist}()}, \code{\link{sNN}()} } \author{ Michael Hahsler } \concept{NN functions} \keyword{model} ================================================ FILE: man/dbcv.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/dbcv.R \name{dbcv} \alias{dbcv} \alias{DBCV} \title{Density-Based Clustering Validation Index (DBCV)} \usage{ dbcv(x, cl, d, metric = "euclidean", sample = NULL) } \arguments{ \item{x}{a data matrix or a dist object.} \item{cl}{a clustering (e.g., an integer vector)} \item{d}{dimensionality of the original data if a dist object is provided.} \item{metric}{distance metric used.
The available metrics are the methods implemented by \code{dist()} plus \code{"sqeuclidean"} for the squared Euclidean distance used in the original DBCV implementation.} \item{sample}{sample size used for large datasets.} } \value{ A list with the DBCV \code{score} for the clustering, the density sparseness of cluster (\code{dsc}) values, the density separation of pairs of clusters (\code{dspc}) distances, and the validity indices of clusters (\code{c_c}). } \description{ Calculate the Density-Based Clustering Validation Index (DBCV) for a clustering. } \details{ DBCV (Moulavi et al, 2014) computes a score based on the density sparseness of each cluster and the density separation of each pair of clusters. The density sparseness of a cluster (DSC) is defined as the maximum edge weight of a minimal spanning tree for the internal points of the cluster using the mutual reachability distance based on the all-points-core-distance. Internal points are connected to more than one other point in the cluster. Since clusters of a size less than 3 cannot have internal points, they are ignored (considered noise) in this implementation. The density separation of a pair of clusters (DSPC) is defined as the minimum reachability distance between the internal nodes of the spanning trees of the two clusters. The validity index for a cluster is calculated using these measures and aggregated to a validity index for the whole clustering using a weighted average. The index is in the range \eqn{[-1,1]}. If the cluster density compactness is better than the density separation, a positive value is returned. The actual value depends on the separability of the data. In general, greater values of the measure indicate a better density-based clustering solution. Noise points are included in the calculation only in the weighted average, therefore clusterings with more noise points receive a lower index.
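The size-weighted aggregation described above can be sketched in plain R. This is only an illustration of the final averaging step, not the package's implementation; the cluster sizes and per-cluster validity values below are made up:

```r
# Final DBCV score: size-weighted average of per-cluster validity indices.
# Hypothetical values for illustration only.
n     <- 100                 # total number of points, noise included
sizes <- c(40, 35, 15)       # cluster sizes; the remaining 10 points are noise
v_c   <- c(0.8, 0.5, -0.2)   # validity index of each cluster, in [-1, 1]

# Noise contributes no validity term but inflates n, lowering the score.
score <- sum(sizes / n * v_c)
score   # 0.465
```

This makes explicit why a clustering that labels more points as noise gets a lower index: noise enlarges the denominator without adding any positive term.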
\strong{Performance note:} This implementation calculates a distance matrix and thus can only be used for small or sampled datasets. } \examples{ # Load a test dataset data(Dataset_1) x <- Dataset_1[, c("x", "y")] class <- Dataset_1$class clplot(x, class) # We use minPts = 3 and the knee at eps = .1 for dbscan kNNdistplot(x, minPts = 3) cl <- dbscan(x, eps = .1, minPts = 3) clplot(x, cl) dbcv(x, cl) # compare to the DBCV index on the original class labels and # with a random partitioning dbcv(x, class) dbcv(x, sample(1:4, replace = TRUE, size = nrow(x))) # find the best eps using dbcv eps_grid <- seq(.05, .2, by = .01) cls <- lapply(eps_grid, FUN = function(e) dbscan(x, eps = e, minPts = 3)) dbcvs <- sapply(cls, FUN = function(cl) dbcv(x, cl)$score) plot(eps_grid, dbcvs, type = "l") eps_opt <- eps_grid[which.max(dbcvs)] eps_opt cl <- dbscan(x, eps = eps_opt, minPts = 3) clplot(x, cl) } \references{ Davoud Moulavi and Pablo A. Jaskowiak and Ricardo J. G. B. Campello and Arthur Zimek and Jörg Sander (2014). Density-Based Clustering Validation. In \emph{Proceedings of the 2014 SIAM International Conference on Data Mining,} pages 839-847 \doi{10.1137/1.9781611973440.96} Pablo A. Jaskowiak (2022). MATLAB implementation of DBCV. \url{https://github.com/pajaskowiak/dbcv} } \author{ Matt Piekenbrock and Michael Hahsler } \concept{Evaluation Functions} ================================================ FILE: man/dbscan-package.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/AAA_dbscan-package.R \docType{package} \name{dbscan-package} \alias{dbscan-package} \title{dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms} \description{ A fast reimplementation of several density-based algorithms of the DBSCAN family.
Includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering points to identify the clustering structure), shared nearest neighbor clustering, and the outlier detection algorithms LOF (local outlier factor) and GLOSH (global-local outlier score from hierarchies). The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided. Hahsler, Piekenbrock and Doran (2019) \doi{10.18637/jss.v091.i01}. } \section{Key functions}{ \itemize{ \item Clustering: \code{\link[=dbscan]{dbscan()}}, \code{\link[=hdbscan]{hdbscan()}}, \code{\link[=optics]{optics()}}, \code{\link[=jpclust]{jpclust()}}, \code{\link[=sNNclust]{sNNclust()}} \item Outliers: \code{\link[=lof]{lof()}}, \code{\link[=glosh]{glosh()}}, \code{\link[=pointdensity]{pointdensity()}} \item Nearest Neighbors: \code{\link[=kNN]{kNN()}}, \code{\link[=frNN]{frNN()}}, \code{\link[=sNN]{sNN()}} } } \references{ Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. Journal of Statistical Software, 91(1), 1-30. 
\doi{10.18637/jss.v091.i01} } \seealso{ Useful links: \itemize{ \item \url{https://github.com/mhahsler/dbscan} \item Report bugs at \url{https://github.com/mhahsler/dbscan/issues} } } \author{ \strong{Maintainer}: Michael Hahsler \email{mhahsler@lyle.smu.edu} (\href{https://orcid.org/0000-0003-2716-1405}{ORCID}) [copyright holder] Authors: \itemize{ \item Matthew Piekenbrock [copyright holder] } Other contributors: \itemize{ \item Sunil Arya [contributor, copyright holder] \item David Mount [contributor, copyright holder] \item Claudia Malzer [contributor] } } \keyword{internal} ================================================ FILE: man/dbscan.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/dbscan.R, R/predict.R \name{dbscan} \alias{dbscan} \alias{DBSCAN} \alias{print.dbscan_fast} \alias{is.corepoint} \alias{predict.dbscan_fast} \title{Density-based Spatial Clustering of Applications with Noise (DBSCAN)} \usage{ dbscan(x, eps, minPts = 5, weights = NULL, borderPoints = TRUE, ...) is.corepoint(x, eps, minPts = 5, ...) \method{predict}{dbscan_fast}(object, newdata, data, ...) } \arguments{ \item{x}{a data matrix, a data.frame, a \link{dist} object or a \link{frNN} object with fixed-radius nearest neighbors.} \item{eps}{size (radius) of the epsilon neighborhood. Can be omitted if \code{x} is a frNN object.} \item{minPts}{number of minimum points required in the eps neighborhood for core points (including the point itself).} \item{weights}{numeric; weights for the data points. Only needed to perform weighted clustering.} \item{borderPoints}{logical; should border points be assigned to clusters. The default is \code{TRUE} for regular DBSCAN. If \code{FALSE} then border points are considered noise (see DBSCAN* in Campello et al, 2013).} \item{...}{additional arguments are passed on to the fixed-radius nearest neighbor search algorithm. 
See \code{\link[=frNN]{frNN()}} for details on how to control the search strategy.} \item{object}{clustering object.} \item{newdata}{new data points for which the cluster membership should be predicted.} \item{data}{the data set used to create the clustering object.} } \value{ \code{dbscan()} returns an object of class \code{dbscan_fast} with the following components: \item{eps }{ value of the \code{eps} parameter.} \item{minPts }{ value of the \code{minPts} parameter.} \item{metric }{ used distance metric.} \item{cluster }{An integer vector with cluster assignments. Zero indicates noise points.} \code{is.corepoint()} returns a logical vector indicating for each data point if it is a core point. } \description{ Fast reimplementation of the DBSCAN (Density-based spatial clustering of applications with noise) clustering algorithm using a kd-tree. } \details{ The implementation is significantly faster and can work with larger data sets than \code{\link[fpc:dbscan]{fpc::dbscan()}} in \pkg{fpc}. Use \code{dbscan::dbscan()} (specifying the package) to call this implementation when you also load package \pkg{fpc}. \strong{The algorithm} This implementation of DBSCAN follows the original algorithm as described by Ester et al (1996). DBSCAN performs the following steps: \enumerate{ \item Estimate the density around each data point by counting the number of points in a user-specified eps-neighborhood and apply a user-specified minPts threshold to identify \itemize{ \item core points (points with at least minPts points in their neighborhood), \item border points (non-core points with a core point in their neighborhood) and \item noise points (all other points). } \item Core points form the backbone of clusters by joining them into a cluster if they are density-reachable from each other (i.e., there is a chain of core points where one falls inside the eps-neighborhood of the next). \item Border points are assigned to clusters.
The algorithm needs parameters \code{eps} (the radius of the epsilon neighborhood) and \code{minPts} (the density threshold). } Border points are arbitrarily assigned to clusters in the original algorithm. DBSCAN* (see Campello et al 2013) treats all border points as noise points. This is implemented with \code{borderPoints = FALSE}. \strong{Specifying the data} If \code{x} is a matrix or a data.frame, then fast fixed-radius nearest neighbor computation using a kd-tree is performed using Euclidean distance. See \code{\link[=frNN]{frNN()}} for more information on the parameters related to nearest neighbor search. \strong{Note} that only numerical values are allowed in \code{x}. Any precomputed distance matrix (dist object) can be specified as \code{x}. You may run into memory issues since distance matrices are large. A precomputed frNN object can be supplied as \code{x}. In this case \code{eps} does not need to be specified. This option is useful for large data sets, where a sparse distance matrix is available. See \code{\link[=frNN]{frNN()}} for how to create frNN objects. \strong{Setting parameters for DBSCAN} The parameters \code{minPts} and \code{eps} define the minimum density required in the area around core points which form the backbone of clusters. \code{minPts} is the number of points required in the neighborhood around the point defined by the parameter \code{eps} (i.e., the radius around the point). Both parameters depend on each other and changing one typically requires changing the other one as well. The parameters also depend on the size of the data set with larger datasets requiring a larger \code{minPts} or a smaller \code{eps}. \itemize{ \item \verb{minPts:} The original DBSCAN paper (Ester et al, 1996) suggests starting by setting \eqn{\text{minPts} \ge d + 1}, the data dimensionality plus one or higher with a minimum of 3.
Larger values are preferable since increasing the parameter suppresses more noise in the data by requiring more points to form clusters. Sander et al (1998) use two times the data dimensionality in their examples. Note that setting \eqn{\text{minPts} \le 2} is equivalent to hierarchical clustering with the single link metric and the dendrogram cut at height \code{eps}. \item \verb{eps:} A suitable neighborhood size parameter \code{eps} given a fixed value for \code{minPts} can be found visually by inspecting the \code{\link[=kNNdistplot]{kNNdistplot()}} of the data using \eqn{k = \text{minPts} - 1} (\code{minPts} includes the point itself, while the k-nearest neighbors distance does not). The k-nearest neighbor distance plot sorts all data points by their k-nearest neighbor distance. A sudden increase of the kNN distance (a knee) indicates that the points to the right are most likely outliers. Choose \code{eps} for DBSCAN where the knee is. } \strong{Predict cluster memberships} \code{\link[=predict]{predict()}} can be used to predict cluster memberships for new data points. A point is considered a member of a cluster if it is within the eps neighborhood of a core point of the cluster. Points which cannot be assigned to a cluster will be reported as noise points (i.e., cluster ID 0). \strong{Important note:} \code{predict()} currently can only use Euclidean distance to determine the neighborhood of core points. If \code{dbscan()} was called using distances other than Euclidean, then the neighborhood calculation will not be correct and only approximated by Euclidean distances. If the data contain factor columns (e.g., using Gower's distance), then the factors in \code{data} and \code{newdata} first need to be converted to numeric to use the Euclidean approximation. } \examples{ ## Example 1: use dbscan on the iris data set data(iris) iris <- as.matrix(iris[, 1:4]) ## Find suitable DBSCAN parameters: ## 1. We use minPts = dim + 1 = 5 for iris.
A larger value can also be used. ## 2. We inspect the k-NN distance plot for k = minPts - 1 = 4 kNNdistplot(iris, minPts = 5) ## Noise seems to start around a 4-NN distance of .7 abline(h=.7, col = "red", lty = 2) ## Cluster with the chosen parameters res <- dbscan(iris, eps = .7, minPts = 5) res pairs(iris, col = res$cluster + 1L) clplot(iris, res) ## Use a precomputed frNN object fr <- frNN(iris, eps = .7) dbscan(fr, minPts = 5) ## Example 2: use data from fpc set.seed(665544) n <- 100 x <- cbind( x = runif(10, 0, 10) + rnorm(n, sd = 0.2), y = runif(10, 0, 10) + rnorm(n, sd = 0.2) ) res <- dbscan(x, eps = .3, minPts = 3) res ## plot clusters and add noise (cluster 0) as crosses. plot(x, col = res$cluster) points(x[res$cluster == 0, ], pch = 3, col = "grey") clplot(x, res) hullplot(x, res) ## Predict cluster membership for new data points ## (Note: 0 means it is predicted as noise) newdata <- x[1:5,] + rnorm(10, 0, .3) hullplot(x, res) points(newdata, pch = 3, col = "red", lwd = 3) text(newdata, pos = 1) pred_label <- predict(res, newdata, data = x) pred_label points(newdata, col = pred_label + 1L, cex = 2, lwd = 2) ## Compare speed against fpc version (if microbenchmark is installed) ## Note: we use dbscan::dbscan to make sure that we do not run the ## implementation in fpc.
\dontrun{ if (requireNamespace("fpc", quietly = TRUE) && requireNamespace("microbenchmark", quietly = TRUE)) { t_dbscan <- microbenchmark::microbenchmark( dbscan::dbscan(x, .3, 3), times = 10, unit = "ms") t_dbscan_linear <- microbenchmark::microbenchmark( dbscan::dbscan(x, .3, 3, search = "linear"), times = 10, unit = "ms") t_dbscan_dist <- microbenchmark::microbenchmark( dbscan::dbscan(x, .3, 3, search = "dist"), times = 10, unit = "ms") t_fpc <- microbenchmark::microbenchmark( fpc::dbscan(x, .3, 3), times = 10, unit = "ms") r <- rbind(t_fpc, t_dbscan_dist, t_dbscan_linear, t_dbscan) r boxplot(r, names = c('fpc', 'dbscan (dist)', 'dbscan (linear)', 'dbscan (kdtree)'), main = "Runtime comparison in ms") ## speedup of the kd-tree-based version compared to the fpc implementation median(t_fpc$time) / median(t_dbscan$time) }} ## Example 3: manually create a frNN object for dbscan (dbscan only needs ids and eps) nn <- structure(list(id = list(c(2,3), c(1,3), c(1,2,3), c(3,5), c(4,5)), eps = 1), class = c("NN", "frNN")) nn dbscan(nn, minPts = 2) } \references{ Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. \emph{Journal of Statistical Software,} 91(1), 1-30. \doi{10.18637/jss.v091.i01} Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. \emph{Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96),} 226-231. \url{https://dl.acm.org/doi/10.5555/3001460.3001507} Campello, R. J. G. B.; Moulavi, D.; Sander, J. (2013). Density-Based Clustering Based on Hierarchical Density Estimates. Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery in Databases, PAKDD 2013, \emph{Lecture Notes in Computer Science} 7819, p. 160. \doi{10.1007/978-3-642-37456-2_14} Sander, J., Ester, M., Kriegel, HP. et al. (1998). 
Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications. \emph{Data Mining and Knowledge Discovery} 2, 169-194. \doi{10.1023/A:1009745219419} } \seealso{ Other clustering functions: \code{\link{extractFOSC}()}, \code{\link{hdbscan}()}, \code{\link{jpclust}()}, \code{\link{ncluster}()}, \code{\link{optics}()}, \code{\link{sNNclust}()} } \author{ Michael Hahsler } \concept{clustering functions} \keyword{clustering} \keyword{model} ================================================ FILE: man/dbscan_tidiers.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/broom-dbscan-tidiers.R \name{dbscan_tidiers} \alias{dbscan_tidiers} \alias{glance} \alias{tidy} \alias{augment} \alias{tidy.dbscan} \alias{tidy.hdbscan} \alias{tidy.general_clustering} \alias{augment.dbscan} \alias{augment.hdbscan} \alias{augment.general_clustering} \alias{glance.dbscan} \alias{glance.hdbscan} \alias{glance.general_clustering} \title{Turn a dbscan clustering object into a tidy tibble} \usage{ tidy(x, ...) \method{tidy}{dbscan}(x, ...) \method{tidy}{hdbscan}(x, ...) \method{tidy}{general_clustering}(x, ...) augment(x, ...) \method{augment}{dbscan}(x, data = NULL, newdata = NULL, ...) \method{augment}{hdbscan}(x, data = NULL, newdata = NULL, ...) \method{augment}{general_clustering}(x, data = NULL, newdata = NULL, ...) glance(x, ...) \method{glance}{dbscan}(x, ...) \method{glance}{hdbscan}(x, ...) \method{glance}{general_clustering}(x, ...)
} \arguments{ \item{x}{A \code{dbscan} object returned from \code{\link[=dbscan]{dbscan()}}.} \item{...}{further arguments are ignored without a warning.} \item{data}{The data used to create the clustering.} \item{newdata}{New data to predict cluster labels for.} } \description{ Provides \link[generics:tidy]{tidy()}, \link[generics:augment]{augment()}, and \link[generics:glance]{glance()} verbs for clusterings created with algorithms in package \code{dbscan} to work with \href{https://www.tidymodels.org/}{tidymodels}. } \examples{ \dontshow{if (requireNamespace("tibble", quietly = TRUE) && identical(Sys.getenv("NOT_CRAN"), "true")) withAutoprint(\{ # examplesIf} data(iris) x <- scale(iris[, 1:4]) ## dbscan db <- dbscan(x, eps = .9, minPts = 5) db # summarize model fit with tidiers tidy(db) glance(db) # augment for this model needs the original data augment(db, x) # to augment new data, the original data is also needed augment(db, x, newdata = x[1:5, ]) ## hdbscan hdb <- hdbscan(x, minPts = 5) # summarize model fit with tidiers tidy(hdb) glance(hdb) # augment for this model needs the original data augment(hdb, x) # to augment new data, the original data is also needed augment(hdb, x, newdata = x[1:5, ]) ## Jarvis-Patrick clustering cl <- jpclust(x, k = 20, kt = 15) # summarize model fit with tidiers tidy(cl) glance(cl) # augment for this model needs the original data augment(cl, x) ## Shared Nearest Neighbor clustering cl <- sNNclust(x, k = 20, eps = 0.8, minPts = 15) # summarize model fit with tidiers tidy(cl) glance(cl) # augment for this model needs the original data augment(cl, x) \dontshow{\}) # examplesIf} } \seealso{ \code{\link[generics:tidy]{generics::tidy()}}, \code{\link[generics:augment]{generics::augment()}}, \code{\link[generics:glance]{generics::glance()}}, \code{\link[=dbscan]{dbscan()}} } \concept{tidiers} ================================================ FILE: man/dendrogram.Rd ================================================ % Generated by
roxygen2: do not edit by hand % Please edit documentation in R/dendrogram.R \name{dendrogram} \alias{dendrogram} \alias{as.dendrogram} \alias{as.dendrogram.default} \alias{as.dendrogram.hclust} \alias{as.dendrogram.hdbscan} \alias{as.dendrogram.reachability} \title{Coercions to Dendrogram} \usage{ as.dendrogram(object, ...) \method{as.dendrogram}{default}(object, ...) \method{as.dendrogram}{hclust}(object, ...) \method{as.dendrogram}{hdbscan}(object, ...) \method{as.dendrogram}{reachability}(object, ...) } \arguments{ \item{object}{the object} \item{...}{further arguments} } \description{ Provides a new generic function to coerce objects to dendrograms with \code{\link[stats:dendrogram]{stats::as.dendrogram()}} as the default. Additional methods for \link{hclust}, \link{hdbscan} and \link{reachability} objects are provided. } \details{ Coercion methods for \link{hclust}, \link{hdbscan} and \link{reachability} objects to \link{dendrogram} are provided. The coercion from \code{hclust} is a faster C++ reimplementation of the coercion in package \code{stats}. The original implementation can be called using \code{\link[stats:dendrogram]{stats::as.dendrogram()}}. The coercion from \link{hdbscan} builds the non-simplified HDBSCAN hierarchy as a dendrogram object. } ================================================ FILE: man/extractFOSC.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/extractFOSC.R \name{extractFOSC} \alias{extractFOSC} \title{Framework for the Optimal Extraction of Clusters from Hierarchies} \usage{ extractFOSC( x, constraints, alpha = 0, minPts = 2L, prune_unstable = FALSE, validate_constraints = FALSE ) } \arguments{ \item{x}{a valid \link{hclust} object created via \code{\link[=hclust]{hclust()}} or \code{\link[=hdbscan]{hdbscan()}}.} \item{constraints}{Either a list or matrix of pairwise constraints.
If missing, an unsupervised measure of stability is used to make local cuts and extract the optimal clusters. See details.} \item{alpha}{numeric; weight between \eqn{[0, 1]} for mixed-objective semi-supervised extraction. Defaults to 0.} \item{minPts}{numeric; Defaults to 2. Only needed if class-less noise is a valid label in the model.} \item{prune_unstable}{logical; should significantly unstable subtrees be pruned? The default is \code{FALSE} for the original optimal extraction framework (see Campello et al, 2013). See details for what \code{TRUE} implies.} \item{validate_constraints}{logical; should constraints be checked for validity? See details for what is considered a valid constraint.} } \value{ A list with the elements: \item{cluster }{An integer vector with cluster assignments. Zero indicates noise points (if any).} \item{hc }{The original \link{hclust} object with additional list elements \code{"stability"}, \code{"constraint"}, and \code{"total"} for the \eqn{n - 1} cluster-wide objective scores from the extraction.} } \description{ Generic reimplementation of the \emph{Framework for Optimal Selection of Clusters} (FOSC; Campello et al, 2013) to extract clusterings from hierarchical clustering (i.e., \link{hclust} objects). Can be parameterized to perform unsupervised cluster extraction through a stability-based measure, or semisupervised cluster extraction through either a constraint-based extraction (with a stability-based tiebreaker) or a mixed (weighted) constraint and stability-based objective extraction. } \details{ Campello et al (2013) suggested a \emph{Framework for Optimal Selection of Clusters} (FOSC) as a framework to make local (non-horizontal) cuts to any cluster tree hierarchy. This function implements the original extraction algorithms as described by the framework for hclust objects.
Traditional cluster extraction methods from hierarchical representations (such as \link{hclust} objects) generally rely on global parameters or cutting values which are used to partition a cluster hierarchy into a set of disjoint, flat clusters. This is implemented in R in function \code{\link[stats:cutree]{stats::cutree()}}. Although such methods are widespread, they are inherently limited by their global parameter settings in that they cannot capture patterns within the cluster hierarchy at varying \emph{local} levels of granularity. Rather than partitioning a hierarchy based on the number of clusters one expects to find (\eqn{k}) or based on some linkage distance threshold (\eqn{H}), the FOSC proposes that the optimal clusters may exist at varying distance thresholds in the hierarchy. To enable this idea, FOSC requires one parameter (minPts) that represents \emph{the minimum number of points that constitute a valid cluster.} The first step of the FOSC algorithm is to traverse the given cluster hierarchy divisively, recording new clusters at each split if both branches contain at least minPts points. Branches where one or both sides contain fewer than minPts points inherit the parent cluster's identity. Note that using FOSC, due to the constraint that minPts must be greater than or equal to 2, it is possible that the optimal cluster solution chosen makes local cuts that render parent branches of sizes less than minPts as noise, which are denoted as 0 in the final solution.
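For contrast, the global horizontal cuts that FOSC generalizes can be made with base R's \code{stats::cutree()} alone. This sketch uses synthetic data and is not part of the package; it only shows the two global parameterizations (\eqn{k} and \eqn{H}) mentioned above:

```r
# Global (horizontal) cuts of a dendrogram with stats::cutree() -- the
# approach FOSC generalizes with local, non-horizontal cuts.
set.seed(42)
x  <- matrix(rnorm(40), ncol = 2)           # 20 random 2-d points
hc <- hclust(dist(x), method = "average")   # a plain hclust hierarchy

cl_k <- cutree(hc, k = 3)    # cut by the expected number of clusters (k)
cl_h <- cutree(hc, h = 1.0)  # cut at a fixed linkage height (H) instead
table(cl_k)
```

Every point receives a label from one single horizontal cut; FOSC instead selects clusters from different heights of the same tree.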
Traversing the original cluster tree using minPts creates a new, simplified cluster tree that is then post-processed recursively to extract clusters that maximize for each cluster \eqn{C_i}{Ci} the cost function \deqn{\max_{\delta_2, \dots, \delta_k} J = \sum\limits_{i=2}^{k} \delta_i S(C_i)}{ J = \sum \delta S(Ci) for all i clusters, } where \eqn{S(C_i)}{S(Ci)} is the stability-based measure as \deqn{ S(C_i) = \sum_{x_j \in C_i}(\frac{1}{h_{min} (x_j, C_i)} - \frac{1}{h_{max} (C_i)}) }{ S(Ci) = \sum (1/Hmin(Xj, Ci) - 1/Hmax(Ci)) for all Xj in Ci.} \eqn{\delta_i}{\delta} represents an indicator function, which constrains the solution space such that clusters must be disjoint (cannot assign more than 1 label to each cluster). The measure \eqn{S(C_i)}{S(Ci)} used by FOSC is an unsupervised validation measure based on the assumption that, if you vary the linkage/distance threshold across all possible values, more prominent clusters that survive over many threshold variations should be considered as stronger candidates of the optimal solution. For this reason, using this measure to detect clusters is referred to as an unsupervised, \emph{stability-based} extraction approach. In some cases it may be useful to enact \emph{instance-level} constraints that ensure the solution space conforms to linkage expectations known \emph{a priori}. This general idea of using preliminary expectations to augment the clustering solution will be referred to as \emph{semisupervised clustering}. If constraints are given in the call to \code{extractFOSC()}, the following alternative objective function is maximized: \deqn{J = \frac{1}{2n_c}\sum\limits_{j=1}^n \gamma (x_j)}{J = 1/(2 * nc) \sum \gamma(Xj)} \eqn{n_c}{nc} is the total number of constraints given and \eqn{\gamma(x_j)}{\gamma(Xj)} represents the number of constraints involving object \eqn{x_j}{Xj} that are satisfied. 
In the case of ties (such as solutions where no constraints were given), the unsupervised solution is used as a tiebreaker. See Campello et al (2013) for more details. As a third option, if one wishes to prioritize the degree at which the unsupervised and semisupervised solutions contribute to the overall optimal solution, the parameter \eqn{\alpha} can be set to enable the extraction of clusters that maximize the \code{mixed} objective function \deqn{J = \alpha S(C_i) + (1 - \alpha) \gamma(C_i)}{J = \alpha S(Ci) + (1 - \alpha) \gamma(Ci).} FOSC expects the pairwise constraints to be passed as either 1) an \eqn{n(n-1)/2} vector of integers representing the constraints, where 1 represents should-link, -1 represents should-not-link, and 0 represents no preference using the unsupervised solution (see below for examples), or 2) if only a few constraints are needed, a named list representing the (symmetric) adjacency list, where the names correspond to indices of the points in the original data, and the values correspond to integer vectors of constraints (positive indices for should-link, negative indices for should-not-link). Again, see the examples section for a demonstration of this. The parameters to the input function correspond to the concepts discussed above. The \code{minPts} parameter represents the minimum cluster size to extract. The optional \code{constraints} parameter contains the pairwise, instance-level constraints of the data. The optional \code{alpha} parameter controls whether the mixed objective function is used (if \code{alpha} is greater than 0). If the \code{validate_constraints} parameter is set to \code{TRUE}, the constraints are checked (and fixed) for symmetry (if point A has a should-link constraint with point B, point B should also have the same constraint). Asymmetric constraints are not supported.
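To make the two constraint encodings concrete, here is a small hand-built sketch. The point indices are hypothetical and no package functions are needed to build the encodings themselves:

```r
# Encode "points 1 and 2 should link, points 1 and 3 should not link"
# for a toy data set of n = 4 points.

# 1) Dense encoding: an integer vector of length n(n-1)/2 in the same
#    order as dist(): pairs (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
n <- 4
con_vec <- integer(n * (n - 1) / 2)  # 0 = no preference everywhere
con_vec[1] <-  1L                    # pair (1,2): should-link
con_vec[2] <- -1L                    # pair (1,3): should-not-link

# 2) Sparse encoding: a named, symmetric adjacency list; positive
#    indices mean should-link, negative indices mean should-not-link.
con_list <- list("1" = c(2L, -3L),
                 "2" = 1L,
                 "3" = -1L)
```

Either object could then be supplied as the \code{constraints} argument; the sparse form is more convenient when only a handful of constraints are known.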
Unstable branch pruning was not discussed by Campello et al (2013); however, in some data sets the scores of specific subbranches may be significantly greater than those of their sibling and parent branches, and the sibling branches should then be considered noise if their scores are cumulatively lower than the parent's. This can happen in extremely nonhomogeneous data sets, where there exist locally very stable branches surrounded by unstable branches that contain more than \code{minPts} points. \code{prune_unstable = TRUE} will remove the unstable branches. } \examples{ data("moons") ## Regular HDBSCAN using stability-based extraction (unsupervised) cl <- hdbscan(moons, minPts = 5) cl$cluster ## Constraint-based extraction from the HDBSCAN hierarchy ## (w/ stability-based tiebreaker (semisupervised)) cl_con <- extractFOSC(cl$hc, minPts = 5, constraints = list("12" = c(49, -47))) cl_con$cluster ## Alternative formulation: Constraint-based extraction from the HDBSCAN hierarchy ## (w/ stability-based tiebreaker (semisupervised)) using distance thresholds dist_moons <- dist(moons) cl_con2 <- extractFOSC(cl$hc, minPts = 5, constraints = ifelse(dist_moons < 0.1, 1L, ifelse(dist_moons > 1, -1L, 0L))) cl_con2$cluster # same as the second example } \references{ Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg Sander (2013). A framework for semi-supervised and unsupervised optimal extraction of clusters from hierarchies. \emph{Data Mining and Knowledge Discovery} 27(3): 344-371.
\doi{10.1007/s10618-013-0311-4} } \seealso{ \code{\link[=hclust]{hclust()}}, \code{\link[=hdbscan]{hdbscan()}}, \code{\link[stats:cutree]{stats::cutree()}} Other clustering functions: \code{\link{dbscan}()}, \code{\link{hdbscan}()}, \code{\link{jpclust}()}, \code{\link{ncluster}()}, \code{\link{optics}()}, \code{\link{sNNclust}()} } \author{ Matt Piekenbrock } \concept{clustering functions} \keyword{clustering} \keyword{model} ================================================ FILE: man/frNN.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/frNN.R \name{frNN} \alias{frNN} \alias{frnn} \alias{print.frnn} \alias{sort.frNN} \alias{adjacencylist.frNN} \alias{print.frNN} \title{Find the Fixed Radius Nearest Neighbors} \usage{ frNN( x, eps, query = NULL, sort = TRUE, search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0 ) \method{sort}{frNN}(x, decreasing = FALSE, ...) \method{adjacencylist}{frNN}(x, ...) \method{print}{frNN}(x, ...) } \arguments{ \item{x}{a data matrix, a dist object or a frNN object.} \item{eps}{neighbors radius.} \item{query}{a data matrix with the points to query. If query is not specified, the NN for all the points in \code{x} is returned. If query is specified then \code{x} needs to be a data matrix.} \item{sort}{sort the neighbors by distance? This is expensive and can be done later using \code{sort()}.} \item{search}{nearest neighbor search strategy (one of \code{"kdtree"}, \code{"linear"} or \code{"dist"}).} \item{bucketSize}{max size of the kd-tree leafs.} \item{splitRule}{rule to split the kd-tree. One of \code{"STD"}, \code{"MIDPT"}, \code{"FAIR"}, \code{"SL_MIDPT"}, \code{"SL_FAIR"} or \code{"SUGGEST"} (SL stands for sliding). \code{"SUGGEST"} uses ANNs best guess.} \item{approx}{use approximate nearest neighbors. All NN up to a distance of a factor of \code{1 + approx} eps may be used. 
Some actual NN may be omitted, leading to spurious clusters and noise points. However, the algorithm will enjoy a significant speedup.} \item{decreasing}{sort in decreasing order?} \item{...}{further arguments} } \value{ \code{frNN()} returns an object of class \link{frNN} (subclass of \link{NN}) containing a list with the following components: \item{id }{a list of integer vectors. Each vector contains the ids (row numbers) of the fixed radius nearest neighbors. } \item{dist }{a list with distances (same structure as \code{id}). } \item{eps }{ neighborhood radius \code{eps} that was used. } \item{metric }{ used distance metric. } \code{adjacencylist()} returns a list with one entry per data point in \code{x}. Each entry contains the ids of the nearest neighbors. } \description{ This function uses a kd-tree to find the fixed radius nearest neighbors (including distances) fast. } \details{ If \code{x} is specified as a data matrix, then Euclidean distances and fast nearest neighbor lookup using a kd-tree are used. To create an frNN object from scratch, you need to supply at least the element \code{id} (a list of integer vectors with the nearest neighbor ids for each point) and \code{eps} (see below). \strong{Self-matches:} Self-matches are not returned!
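Following the from-scratch description above, a minimal hand-built object might look as follows. This is only a sketch: the class attributes assumed here mirror the \link{NN} subclass structure, and the exact set of elements other functions expect may vary.

```r
# Illustrative sketch: a hand-built frNN object for 3 points, supplying the
# minimal elements described above (id and eps; dist is optional but useful).
nn <- structure(
  list(
    id   = list(c(2L, 3L), c(1L), c(1L)),   # neighbors within eps per point
    dist = list(c(0.2, 0.4), c(0.2), c(0.4)),
    eps  = 0.5
  ),
  class = c("frNN", "NN")
)
nn$id[[1]]  # neighbor ids of point 1
```

Such an object can then be supplied wherever a precomputed frNN object is accepted, e.g., as input \code{x} to \code{dbscan()}.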
} \examples{ data(iris) x <- iris[, -5] # Example 1: Find fixed radius nearest neighbors for each point nn <- frNN(x, eps = .5) nn # Number of neighbors hist(lengths(adjacencylist(nn)), xlab = "k", main="Number of Neighbors", sub = paste("Neighborhood size eps =", nn$eps)) # Explore neighbors of point i = 10 i <- 10 nn$id[[i]] nn$dist[[i]] plot(x, col = ifelse(seq_len(nrow(iris)) \%in\% nn$id[[i]], "red", "black")) # get an adjacency list head(adjacencylist(nn)) # plot the fixed radius neighbors (and then reduced to a radius of .3) plot(nn, x) plot(frNN(nn, eps = .3), x) ## Example 2: find fixed-radius NN for query points q <- x[c(1,100),] nn <- frNN(x, eps = .5, query = q) plot(nn, x, col = "grey") points(q, pch = 3, lwd = 2) } \references{ David M. Mount and Sunil Arya (2010). ANN: A Library for Approximate Nearest Neighbor Searching, \url{http://www.cs.umd.edu/~mount/ANN/}. } \seealso{ Other NN functions: \code{\link{NN}}, \code{\link{comps}()}, \code{\link{kNN}()}, \code{\link{kNNdist}()}, \code{\link{sNN}()} } \author{ Michael Hahsler } \concept{NN functions} \keyword{model} ================================================ FILE: man/glosh.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/GLOSH.R \name{glosh} \alias{glosh} \alias{GLOSH} \title{Global-Local Outlier Score from Hierarchies} \usage{ glosh(x, k = 4, ...) } \arguments{ \item{x}{an \link{hclust} object, data matrix, or \link{dist} object.} \item{k}{size of the neighborhood.} \item{...}{further arguments are passed on to \code{\link[=kNN]{kNN()}}.} } \value{ A numeric vector of length equal to the size of the original data set containing GLOSH values for all data points. } \description{ Calculate the Global-Local Outlier Score from Hierarchies (GLOSH) score for each data point using a kd-tree to speed up kNN search. 
} \details{ GLOSH compares the density of a point to the densities of any points associated with current and child clusters (if any). Points that have a substantially lower density than the density mode (cluster) they most associate with are considered outliers. GLOSH is computed from a hierarchy of clusters. Specifically, consider a point \emph{x} and a density or distance threshold \emph{lambda}. GLOSH is calculated by taking 1 minus the ratio of how long any of the child clusters of the cluster that \emph{x} belongs to "survives" changes in \emph{lambda} to the highest \emph{lambda} threshold of \emph{x}, above which \emph{x} becomes a noise point. Scores close to 1 indicate outliers. For more details on the motivation for this calculation, see Campello et al (2015). } \examples{ set.seed(665544) n <- 100 x <- cbind( x=runif(10, 0, 5) + rnorm(n, sd = 0.4), y=runif(10, 0, 5) + rnorm(n, sd = 0.4) ) ### calculate GLOSH score glosh <- glosh(x, k = 3) ### distribution of outlier scores summary(glosh) hist(glosh, breaks = 10) ### simple plot function; point size is proportional to GLOSH score plot_glosh <- function(x, glosh){ plot(x, pch = ".", main = "GLOSH (k = 3)") points(x, cex = glosh*3, pch = 1, col = "red") text(x[glosh > 0.80, ], labels = round(glosh, 3)[glosh > 0.80], pos = 3) } plot_glosh(x, glosh) ### GLOSH with any hierarchy x_dist <- dist(x) x_sl <- hclust(x_dist, method = "single") x_upgma <- hclust(x_dist, method = "average") x_ward <- hclust(x_dist, method = "ward.D2") ## Compare what different linkage criteria consider as outliers glosh_sl <- glosh(x_sl, k = 3) plot_glosh(x, glosh_sl) glosh_upgma <- glosh(x_upgma, k = 3) plot_glosh(x, glosh_upgma) glosh_ward <- glosh(x_ward, k = 3) plot_glosh(x, glosh_ward) ## GLOSH is automatically computed with HDBSCAN all(hdbscan(x, minPts = 3)$outlier_scores == glosh(x, k = 3)) } \references{ Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg Sander.
Hierarchical density estimates for data clustering, visualization, and outlier detection. \emph{ACM Transactions on Knowledge Discovery from Data (TKDD)} 10, no. 1 (2015). \doi{10.1145/2733381} } \seealso{ Other Outlier Detection Functions: \code{\link{kNNdist}()}, \code{\link{lof}()}, \code{\link{pointdensity}()} } \author{ Matt Piekenbrock } \concept{Outlier Detection Functions} \keyword{model} ================================================ FILE: man/hdbscan.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/hdbscan.R, R/predict.R \name{hdbscan} \alias{hdbscan} \alias{HDBSCAN} \alias{print.hdbscan} \alias{plot.hdbscan} \alias{coredist} \alias{mrdist} \alias{predict.hdbscan} \title{Hierarchical DBSCAN (HDBSCAN)} \usage{ hdbscan( x, minPts, cluster_selection_epsilon = 0, gen_hdbscan_tree = FALSE, gen_simplified_tree = FALSE, verbose = FALSE ) \method{print}{hdbscan}(x, ...) \method{plot}{hdbscan}( x, scale = "suggest", gradient = c("yellow", "red"), show_flat = FALSE, main = "HDBSCAN*", ylab = "eps value", leaflab = "none", ... ) coredist(x, minPts) mrdist(x, minPts, coredist = NULL) \method{predict}{hdbscan}(object, newdata, data, ...) } \arguments{ \item{x}{a data matrix (Euclidean distances are used) or a \link{dist} object calculated with an arbitrary distance metric.} \item{minPts}{integer; minimum size of clusters. See Details.} \item{cluster_selection_epsilon}{double; a distance threshold below which the hierarchy is processed as if running DBSCAN* with \code{eps} set to this value (see Malzer & Baum, 2020, and Details).} \item{gen_hdbscan_tree}{logical; should the robust single linkage tree be explicitly computed (see cluster tree in Chaudhuri et al, 2010).} \item{gen_simplified_tree}{logical; should the simplified hierarchy be explicitly computed (see Campello et al, 2013).} \item{verbose}{report progress.} \item{...}{additional arguments are passed on.} \item{scale}{integer; used to scale the condensed tree based on the graphics device. A lower scale results in wider colored tree lines.
The default \code{'suggest'} sets scale to the number of clusters.} \item{gradient}{character vector; the colors to build the condensed tree coloring with.} \item{show_flat}{logical; whether to draw boxes indicating the most stable clusters.} \item{main}{Title of the plot.} \item{ylab}{the label for the y axis.} \item{leaflab}{a string specifying how leaves are labeled (see \code{\link[stats:dendrogram]{stats::plot.dendrogram()}}).} \item{coredist}{numeric vector with precomputed core distances (optional).} \item{object}{clustering object.} \item{newdata}{new data points for which the cluster membership should be predicted.} \item{data}{the data set used to create the clustering object.} } \value{ \code{hdbscan()} returns an object of class \code{hdbscan} with the following components: \item{cluster }{An integer vector with cluster assignments. Zero indicates noise points.} \item{minPts }{ value of the \code{minPts} parameter.} \item{cluster_scores }{The sum of the stability scores for each salient (flat) cluster. Corresponds to the cluster IDs given in the \code{"cluster"} element. } \item{membership_prob }{The probability or individual stability of a point within its cluster. Between 0 and 1.} \item{outlier_scores }{The GLOSH outlier score of each point. } \item{hc }{An \link{hclust} object of the HDBSCAN hierarchy. } \code{coredist()} returns a vector with the core distance for each data point. \code{mrdist()} returns a \link{dist} object containing pairwise mutual reachability distances. } \description{ Fast C++ implementation of HDBSCAN (Hierarchical DBSCAN) and related algorithms. } \details{ This fast implementation of HDBSCAN (Campello et al., 2013) computes the hierarchical cluster tree representing density estimates along with the stability-based flat cluster extraction.
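The quantities underlying the hierarchy can be sketched in base R from their definitions (core distance and mutual reachability distance, described later in this section). This is only an illustration; the package computes the hierarchy in C++ via a minimum spanning tree, to which single linkage over the mutual reachability distances corresponds:

```r
# Base-R sketch of the core distance and mutual reachability distance (mrd)
# used to build the HDBSCAN hierarchy (illustrative, not the C++ code path).
x <- as.matrix(iris[1:20, 1:4])
minPts <- 5

d <- as.matrix(dist(x))
# core distance: distance to the (minPts - 1)-th nearest neighbor
# (each row of d contains the point itself at distance 0)
core <- apply(d, 1, function(row) sort(row)[minPts])
# mutual reachability distance: mrd(a, b) = max(core(a), core(b), d(a, b))
mrd <- pmax(outer(core, core, pmax), d)
diag(mrd) <- 0
# single-linkage hierarchy over mrd corresponds to the MST-based hierarchy
hc <- hclust(as.dist(mrd), method = "single")
```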
HDBSCAN essentially computes the hierarchy of all DBSCAN* clusterings, and then uses a stability-based extraction method to find optimal cuts in the hierarchy, thus producing a flat solution. HDBSCAN performs the following steps: \enumerate{ \item Compute the mutual reachability distance mrd between points (based on distances and core distances). \item Use mrd as a distance measure to construct a minimum spanning tree. \item Prune the tree using stability. \item Extract the clusters. } Additional related algorithms are supported: the "Global-Local Outlier Score from Hierarchies" (GLOSH; see section 6 of Campello et al., 2015) is available in function \code{\link[=glosh]{glosh()}}, and clustering based on instance-level constraints (see section 5.3 of Campello et al., 2015) is also available. The algorithms only need the parameter \code{minPts}. Note that \code{minPts} not only acts as a minimum cluster size to detect, but also as a "smoothing" factor of the density estimates implicitly computed by HDBSCAN. When using the optional parameter \code{cluster_selection_epsilon}, a combination between DBSCAN* and HDBSCAN* can be achieved (see Malzer & Baum 2020). This means that part of the tree is affected by \code{cluster_selection_epsilon} as if running DBSCAN* with \code{eps} = \code{cluster_selection_epsilon}. The remaining part (on levels above the threshold) is still processed by HDBSCAN*'s stability-based selection algorithm and can therefore return clusters of variable densities. Note that there is not always a remaining part, especially if the parameter value is chosen too large, or if there aren't enough clusters of variable densities. In this case, the result will be equal to DBSCAN*. \code{cluster_selection_epsilon} is most useful in cases where HDBSCAN* produces too many small clusters that need to be merged, while still being able to extract clusters of variable densities at higher levels. \code{coredist()}: The core distance is defined for each point as the distance to its (\code{minPts - 1})-th nearest neighbor.
It is a density estimate equivalent to \code{kNNdist()} with \code{k = minPts - 1}. \code{mrdist()}: The mutual reachability distance is defined between two points as \code{mrd(a, b) = max(coredist(a), coredist(b), dist(a, b))}. This distance metric is used by HDBSCAN. It has the effect of increasing distances in low density areas. \code{predict()} assigns each new data point to the same cluster as the nearest point if it is not more than that point's core distance away. Otherwise the new point is classified as a noise point (i.e., cluster ID 0). } \examples{ ## cluster the moons data set with HDBSCAN data(moons) res <- hdbscan(moons, minPts = 5) res plot(res) clplot(moons, res) ## cluster the moons data set with HDBSCAN using Manhattan distances res <- hdbscan(dist(moons, method = "manhattan"), minPts = 5) plot(res) clplot(moons, res) ## Example for HDBSCAN(e) using cluster_selection_epsilon # data with clusters of various densities. X <- data.frame( x = c( 0.08, 0.46, 0.46, 2.95, 3.50, 1.49, 6.89, 6.87, 0.21, 0.15, 0.15, 0.39, 0.80, 0.80, 0.37, 3.63, 0.35, 0.30, 0.64, 0.59, 1.20, 1.22, 1.42, 0.95, 2.70, 6.36, 6.36, 6.36, 6.60, 0.04, 0.71, 0.57, 0.24, 0.24, 0.04, 0.04, 1.35, 0.82, 1.04, 0.62, 0.26, 5.98, 1.67, 1.67, 0.48, 0.15, 6.67, 6.67, 1.20, 0.21, 3.99, 0.12, 0.19, 0.15, 6.96, 0.26, 0.08, 0.30, 1.04, 1.04, 1.04, 0.62, 0.04, 0.04, 0.04, 0.82, 0.82, 1.29, 1.35, 0.46, 0.46, 0.04, 0.04, 5.98, 5.98, 6.87, 0.37, 6.47, 6.47, 6.47, 6.67, 0.30, 1.49, 3.21, 3.21, 0.75, 0.75, 0.46, 0.46, 0.46, 0.46, 3.63, 0.39, 3.65, 4.09, 4.01, 3.36, 1.43, 3.28, 5.94, 6.35, 6.87, 5.60, 5.99, 0.12, 0.00, 0.32, 0.39, 0.00, 1.63, 1.36, 5.67, 5.60, 5.79, 1.10, 2.99, 0.39, 0.18 ), y = c( 7.41, 8.01, 8.01, 5.44, 7.11, 7.13, 1.83, 1.83, 8.22, 8.08, 8.08, 7.20, 7.83, 7.83, 8.29, 5.99, 8.32, 8.22, 7.38, 7.69, 8.22, 7.31, 8.25, 8.39, 6.34, 0.16, 0.16, 0.16, 1.66, 7.55, 7.90, 8.18, 8.32, 8.32, 7.97, 7.97, 8.15, 8.43, 7.83, 8.32, 8.29, 1.03, 7.27, 7.27, 8.08, 7.27, 0.79, 0.79, 8.22, 7.73, 6.62, 7.62,
8.39, 8.36, 1.73, 8.29, 8.04, 8.22, 7.83, 7.83, 7.83, 8.32, 8.11, 7.69, 7.55, 7.20, 7.20, 8.01, 8.15, 7.55, 7.55, 7.97, 7.97, 1.03, 1.03, 1.24, 7.20, 0.47, 0.47, 0.47, 0.79, 8.22, 7.13, 6.48, 6.48, 7.10, 7.10, 8.01, 8.01, 8.01, 8.01, 5.99, 8.04, 5.22, 5.82, 5.14, 4.81, 7.62, 5.73, 0.55, 1.31, 0.05, 0.95, 1.59, 7.99, 7.48, 8.38, 7.12, 2.01, 1.40, 0.00, 9.69, 9.47, 9.25, 2.63, 6.89, 0.56, 3.11 ) ) ## HDBSCAN splits one cluster hdb <- hdbscan(X, minPts = 3) plot(hdb, show_flat = TRUE) hullplot(X, hdb, main = "HDBSCAN") ## DBSCAN* marks the least dense cluster as outliers db <- dbscan(X, eps = 1, minPts = 3, borderPoints = FALSE) hullplot(X, db, main = "DBSCAN*") ## HDBSCAN(e) mixes HDBSCAN AND DBSCAN* to find all clusters hdbe <- hdbscan(X, minPts = 3, cluster_selection_epsilon = 1) plot(hdbe, show_flat = TRUE) hullplot(X, hdbe, main = "HDBSCAN(e)") } \references{ Campello RJGB, Moulavi D, Sander J (2013). Density-Based Clustering Based on Hierarchical Density Estimates. Proceedings of the 17th Pacific-Asia Conference on Knowledge Discovery in Databases, PAKDD 2013, \emph{Lecture Notes in Computer Science} 7819, p. 160. \doi{10.1007/978-3-642-37456-2_14} Campello RJGB, Moulavi D, Zimek A, Sander J (2015). Hierarchical density estimates for data clustering, visualization, and outlier detection. \emph{ACM Transactions on Knowledge Discovery from Data (TKDD),} 10(5):1-51. \doi{10.1145/2733381} Malzer, C., & Baum, M. (2020). A Hybrid Approach To Hierarchical Density-based Cluster Selection. In 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), pp. 223-228. 
\doi{10.1109/MFI49285.2020.9235263} } \seealso{ Other clustering functions: \code{\link{dbscan}()}, \code{\link{extractFOSC}()}, \code{\link{jpclust}()}, \code{\link{ncluster}()}, \code{\link{optics}()}, \code{\link{sNNclust}()} } \author{ Matt Piekenbrock Claudia Malzer (added cluster_selection_epsilon) } \concept{HDBSCAN functions} \concept{clustering functions} \keyword{clustering} \keyword{hierarchical} \keyword{model} ================================================ FILE: man/hullplot.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/hullplot.R \name{hullplot} \alias{hullplot} \alias{clplot} \title{Plot Clusters} \usage{ hullplot( x, cl, col = NULL, pch = NULL, cex = 0.5, hull_lwd = 1, hull_lty = 1, solid = TRUE, alpha = 0.2, main = "Convex Cluster Hulls", ... ) clplot(x, cl, col = NULL, pch = NULL, cex = 0.5, main = "Cluster Plot", ...) } \arguments{ \item{x}{a data matrix. If more than 2 columns are provided, then the data is plotted using the first two principal components.} \item{cl}{a clustering. Either a numeric cluster assignment vector or a clustering object (a list with an element named \code{cluster}).} \item{col}{colors used for clusters. Defaults to the standard palette. The first color (default is black) is used for noise/unassigned points (cluster id 0).} \item{pch}{a vector of plotting characters. By default \code{o} is used for points and \code{x} for noise points.} \item{cex}{expansion factor for symbols.} \item{hull_lwd, hull_lty}{line width and line type used for the convex hull.} \item{solid, alpha}{draw filled polygons instead of just lines for the convex hulls? alpha controls the level of alpha shading.} \item{main}{main title.} \item{...}{additional arguments passed on to plot.} } \description{ This function produces a two-dimensional scatter plot of data points and colors the data points according to a supplied clustering. 
Noise points are marked as \code{x}. \code{hullplot()} also adds convex hulls to clusters. } \examples{ set.seed(2) n <- 400 x <- cbind( x = runif(4, 0, 1) + rnorm(n, sd = 0.1), y = runif(4, 0, 1) + rnorm(n, sd = 0.1) ) cl <- rep(1:4, times = 100) ### original data with true clustering clplot(x, cl, main = "True clusters") hullplot(x, cl, main = "True clusters") ### use different symbols hullplot(x, cl, main = "True clusters", pch = cl) ### just the hulls hullplot(x, cl, main = "True clusters", pch = NA) ### a version suitable for b/w printing hullplot(x, cl, main = "True clusters", solid = FALSE, col = c("grey", "black"), pch = cl) ### run some clustering algorithms and plot the results db <- dbscan(x, eps = .07, minPts = 10) clplot(x, db, main = "DBSCAN") hullplot(x, db, main = "DBSCAN") op <- optics(x, eps = 10, minPts = 10) opDBSCAN <- extractDBSCAN(op, eps_cl = .07) hullplot(x, opDBSCAN, main = "OPTICS") opXi <- extractXi(op, xi = 0.05) hullplot(x, opXi, main = "OPTICSXi") # Extract minimal 'flat' clusters only opXi <- extractXi(op, xi = 0.05, minimum = TRUE) hullplot(x, opXi, main = "OPTICSXi") km <- kmeans(x, centers = 4) hullplot(x, km, main = "k-means") hc <- cutree(hclust(dist(x)), k = 4) hullplot(x, hc, main = "Hierarchical Clustering") } \author{ Michael Hahsler } \keyword{clustering} \keyword{plot} ================================================ FILE: man/jpclust.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/jpclust.R \name{jpclust} \alias{jpclust} \alias{print.general_clustering} \title{Jarvis-Patrick Clustering} \usage{ jpclust(x, k, kt, ...) } \arguments{ \item{x}{a data matrix/data.frame (Euclidean distance is used), a precomputed \link{dist} object or a kNN object created with \code{\link[=kNN]{kNN()}}.} \item{k}{Neighborhood size for nearest neighbor sparsification.
If \code{x} is a kNN object then \code{k} may be missing.} \item{kt}{threshold on the number of shared nearest neighbors (including the points themselves) to form clusters. Range: \eqn{[1, k]}} \item{...}{additional arguments are passed on to the k nearest neighbor search algorithm. See \code{\link[=kNN]{kNN()}} for details on how to control the search strategy.} } \value{ An object of class \code{general_clustering} with the following components: \item{cluster }{An integer vector with cluster assignments. Zero indicates noise points.} \item{type }{ name of used clustering algorithm.} \item{metric }{ the distance metric used for clustering.} \item{param }{ list of used clustering parameters. } } \description{ Fast C++ implementation of Jarvis-Patrick clustering, which first builds a shared nearest neighbor graph (k nearest neighbor sparsification) and then places two points in the same cluster if they are in each other's nearest neighbor list and share at least \code{kt} nearest neighbors. } \details{ Following the original paper, the shared nearest neighbor list is constructed as the k neighbors plus the point itself (as neighbor zero). Therefore, the threshold \code{kt} needs to be in the range \eqn{[1, k]}. Fast nearest neighbor search with \code{\link[=kNN]{kNN()}} is only used if \code{x} is a matrix. In this case Euclidean distance is used. } \examples{ data("DS3") # use a shared neighborhood of 20 points and require 12 shared neighbors cl <- jpclust(DS3, k = 20, kt = 12) cl clplot(DS3, cl) # Note: JP clustering does not consider noise and thus, # the sine wave points chain clusters together. # use a precomputed kNN object instead of the original data. nn <- kNN(DS3, k = 30) nn cl <- jpclust(nn, k = 20, kt = 12) cl # cluster with noise removed (use low pointdensity to identify noise) d <- pointdensity(DS3, eps = 25) hist(d, breaks = 20) DS3_noiseless <- DS3[d > 110,] cl <- jpclust(DS3_noiseless, k = 20, kt = 10) cl clplot(DS3_noiseless, cl) } \references{ R.
A. Jarvis and E. A. Patrick. 1973. Clustering Using a Similarity Measure Based on Shared Near Neighbors. \emph{IEEE Trans. Comput. 22,} 11 (November 1973), 1025-1034. \doi{10.1109/T-C.1973.223640} } \seealso{ Other clustering functions: \code{\link{dbscan}()}, \code{\link{extractFOSC}()}, \code{\link{hdbscan}()}, \code{\link{ncluster}()}, \code{\link{optics}()}, \code{\link{sNNclust}()} } \author{ Michael Hahsler } \concept{clustering functions} \keyword{clustering} \keyword{model} ================================================ FILE: man/kNN.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/kNN.R \name{kNN} \alias{kNN} \alias{knn} \alias{sort.kNN} \alias{adjacencylist.kNN} \alias{print.kNN} \title{Find the k Nearest Neighbors} \usage{ kNN( x, k, query = NULL, sort = TRUE, search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0 ) \method{sort}{kNN}(x, decreasing = FALSE, ...) \method{adjacencylist}{kNN}(x, ...) \method{print}{kNN}(x, ...) } \arguments{ \item{x}{a data matrix, a \link{dist} object or a \link{kNN} object.} \item{k}{number of neighbors to find.} \item{query}{a data matrix with the points to query. If query is not specified, the NN for all the points in \code{x} is returned. If query is specified then \code{x} needs to be a data matrix.} \item{sort}{sort the neighbors by distance? Note that some search methods already sort the results. Sorting is expensive and \code{sort = FALSE} may be much faster for some search methods. kNN objects can be sorted using \code{sort()}.} \item{search}{nearest neighbor search strategy (one of \code{"kdtree"}, \code{"linear"} or \code{"dist"}).} \item{bucketSize}{max size of the kd-tree leafs.} \item{splitRule}{rule to split the kd-tree. One of \code{"STD"}, \code{"MIDPT"}, \code{"FAIR"}, \code{"SL_MIDPT"}, \code{"SL_FAIR"} or \code{"SUGGEST"} (SL stands for sliding). 
\code{"SUGGEST"} uses ANN's best guess.} \item{approx}{use approximate nearest neighbors. All NN up to a distance of a factor of \code{1 + approx} eps may be used. Some actual NN may be omitted, leading to spurious clusters and noise points. However, the algorithm will enjoy a significant speedup.} \item{decreasing}{sort in decreasing order?} \item{...}{further arguments} } \value{ An object of class \code{kNN} (subclass of \link{NN}) containing a list with the following components: \item{dist }{a matrix with distances. } \item{id }{a matrix with \code{ids}. } \item{k }{number \code{k} used. } \item{metric }{ used distance metric. } } \description{ This function uses a kd-tree to find all k nearest neighbors in a data matrix (including distances) fast. } \details{ \strong{Ties:} If the kth and the (k+1)th nearest neighbor are tied, then the neighbor found first is returned and the other one is ignored. \strong{Self-matches:} If no query is specified, then self-matches are removed. Details on the search parameters: \itemize{ \item \code{search} controls whether a kd-tree or a linear search is used (both are implemented in the ANN library; see Mount and Arya, 2010). Note that these implementations cannot handle NAs. \code{search = "dist"} precomputes Euclidean distances first using R. NAs are handled, but the resulting distance matrix cannot contain NAs. To use other distance measures, a precomputed distance matrix can be provided as \code{x} (\code{search} is ignored). \item \code{bucketSize} and \code{splitRule} influence how the kd-tree is built. \code{approx} uses the approximate nearest neighbor search implemented in ANN. All nearest neighbors up to a distance of \code{eps / (1 + approx)} will be considered and all with a distance greater than \code{eps} will not be considered. The other points might be considered. Note that this results in some actual nearest neighbors being omitted, leading to spurious clusters and noise points.
However, the algorithm will enjoy a significant speedup. For more details see Mount and Arya (2010). } } \examples{ data(iris) x <- iris[, -5] # Example 1: finding kNN for all points in a data matrix (using a kd-tree) nn <- kNN(x, k = 5) nn # explore neighborhood of point 10 i <- 10 nn$id[i,] plot(x, col = ifelse(seq_len(nrow(iris)) \%in\% nn$id[i,], "red", "black")) # visualize the 5 nearest neighbors plot(nn, x) # visualize a reduced 2-NN graph plot(kNN(nn, k = 2), x) # Example 2: find kNN for query points q <- x[c(1,100),] nn <- kNN(x, k = 10, query = q) plot(nn, x, col = "grey") points(q, pch = 3, lwd = 2) # Example 3: find kNN using distances d <- dist(x, method = "manhattan") nn <- kNN(d, k = 1) plot(nn, x) } \references{ David M. Mount and Sunil Arya (2010). ANN: A Library for Approximate Nearest Neighbor Searching, \url{http://www.cs.umd.edu/~mount/ANN/}. } \seealso{ Other NN functions: \code{\link{NN}}, \code{\link{comps}()}, \code{\link{frNN}()}, \code{\link{kNNdist}()}, \code{\link{sNN}()} } \author{ Michael Hahsler } \concept{NN functions} \keyword{model} ================================================ FILE: man/kNNdist.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/kNNdist.R \name{kNNdist} \alias{kNNdist} \alias{kNNdistplot} \title{Calculate and Plot k-Nearest Neighbor Distances} \usage{ kNNdist(x, k, all = FALSE, ...) kNNdistplot(x, k, minPts, ...) } \arguments{ \item{x}{the data set as a matrix of points (Euclidean distance is used) or a precalculated \link{dist} object.} \item{k}{number of nearest neighbors used for the distance calculation. 
For \code{kNNdistplot()} also a range of values for \code{k} or \code{minPts} can be specified.} \item{all}{should a matrix with the distances to all k nearest neighbors be returned?} \item{...}{further arguments (e.g., kd-tree related parameters) are passed on to \code{\link[=kNN]{kNN()}}.} \item{minPts}{to use a k-NN plot to determine a suitable \code{eps} value for \code{\link[=dbscan]{dbscan()}}, the \code{minPts} used in dbscan can be specified; this sets \code{k = minPts - 1}.} } \value{ \code{kNNdist()} returns a numeric vector with, for each point, the distance to its kth nearest neighbor. If \code{all = TRUE} then a matrix with k columns containing the distances to all 1st, 2nd, ..., kth nearest neighbors is returned instead. } \description{ Fast calculation of the k-nearest neighbor distances for a dataset represented as a matrix of points. The kNN distance is defined as the distance from a point to its kth nearest neighbor. The kNN distance plot displays the kNN distance of all points sorted from smallest to largest. The plot can be used to help find suitable parameter values for \code{\link[=dbscan]{dbscan()}}. } \examples{ data(iris) iris <- as.matrix(iris[, 1:4]) ## Find the 4-NN distance for each observation (see ?kNN ## for different search strategies) kNNdist(iris, k = 4) ## Get a matrix with distances to the 1st, 2nd, ..., 4th NN. kNNdist(iris, k = 4, all = TRUE) ## Produce a k-NN distance plot to determine a suitable eps for ## DBSCAN with MinPts = 5. Use k = 4 (= MinPts - 1).
## The knee is visible around a distance of .7 kNNdistplot(iris, k = 4) ## Look at all k-NN distance plots for k from 1 to 20 ## Note that k-NN distances are increasing in k kNNdistplot(iris, k = 1:20) cl <- dbscan(iris, eps = .7, minPts = 5) pairs(iris, col = cl$cluster + 1L) ## Note: black points are noise points } \seealso{ Other Outlier Detection Functions: \code{\link{glosh}()}, \code{\link{lof}()}, \code{\link{pointdensity}()} Other NN functions: \code{\link{NN}}, \code{\link{comps}()}, \code{\link{frNN}()}, \code{\link{kNN}()}, \code{\link{sNN}()} } \author{ Michael Hahsler } \concept{NN functions} \concept{Outlier Detection Functions} \keyword{model} \keyword{plot} ================================================ FILE: man/lof.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/LOF.R \name{lof} \alias{lof} \alias{LOF} \title{Local Outlier Factor Score} \usage{ lof(x, minPts = 5, ...) } \arguments{ \item{x}{a data matrix or a \link{dist} object.} \item{minPts}{number of nearest neighbors used in defining the local neighborhood of a point (includes the point itself).} \item{...}{further arguments are passed on to \code{\link[=kNN]{kNN()}}. Note: \code{sort} cannot be specified here since \code{lof()} always uses \code{sort = TRUE}.} } \value{ A numeric vector of length \code{nrow(x)} containing LOF values for all data points. } \description{ Calculate the Local Outlier Factor (LOF) score for each data point using a kd-tree to speed up kNN search. } \details{ LOF compares the local reachability density (lrd) of a point to the lrd of its neighbors. A LOF score of approximately 1 indicates that the lrd around the point is comparable to the lrd of its neighbors and that the point is not an outlier. Points that have a substantially lower lrd than their neighbors are considered outliers and produce scores significantly larger than 1.
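The computation described above can be sketched in base R. This is an illustrative implementation, not the package's C++ code; note that \code{k} here counts neighbors excluding the point itself, whereas \code{lof()}'s \code{minPts} includes the point:

```r
# Illustrative base-R LOF sketch (not the dbscan implementation):
# k is the neighborhood size excluding the point itself.
lof_sketch <- function(x, k = 3) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  # ids of the k nearest neighbors of each point (position 1 is the point itself)
  nn <- t(apply(d, 1, function(row) order(row)[2:(k + 1)]))
  # k-distance: distance to the k-th nearest neighbor
  kdist <- vapply(seq_len(n), function(i) d[i, nn[i, k]], numeric(1))
  # local reachability density: inverse of the mean reachability distance,
  # where reach(p, o) = max(kdist(o), d(p, o))
  lrd <- vapply(seq_len(n), function(p)
    1 / mean(pmax(kdist[nn[p, ]], d[p, nn[p, ]])), numeric(1))
  # LOF: mean ratio of the neighbors' lrd to the point's own lrd
  vapply(seq_len(n), function(p) mean(lrd[nn[p, ]]) / lrd[p], numeric(1))
}

# a tight cluster of 5 points plus one far-away point
pts <- rbind(cbind(c(0, 0.1, 0, 0.1, 0.05), c(0, 0, 0.1, 0.1, 0.05)), c(5, 5))
lof_sketch(pts, k = 3)  # the 6th point gets a LOF far above 1
```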
If a data matrix is specified, then Euclidean distances and fast nearest neighbor search using a kd-tree are used. \strong{Note on duplicate points:} If there are more than \code{minPts} duplicates of a point in the data, then the local reachability distance will be 0, resulting in an undefined LOF score of 0/0. We set LOF in this case to 1 since there is already enough density from the points in the same location to make them not outliers. The original paper by Breunig et al (2000) assumes that the points are real duplicates and suggests removing the duplicates before computing LOF. If duplicate points are removed first, then this LOF implementation in \pkg{dbscan} behaves like the one described by Breunig et al. } \examples{ set.seed(665544) n <- 100 x <- cbind( x=runif(10, 0, 5) + rnorm(n, sd = 0.4), y=runif(10, 0, 5) + rnorm(n, sd = 0.4) ) ### calculate LOF score with a neighborhood of 3 points lof <- lof(x, minPts = 3) ### distribution of outlier factors summary(lof) hist(lof, breaks = 10, main = "LOF (minPts = 3)") ### plot sorted lof. Looks like outliers start around a LOF of 2. plot(sort(lof), type = "l", main = "LOF (minPts = 3)", xlab = "Points sorted by LOF", ylab = "LOF") ### point size is proportional to LOF and mark points with a LOF > 2 plot(x, pch = ".", main = "LOF (minPts = 3)", asp = 1) points(x, cex = (lof - 1) * 2, pch = 1, col = "red") text(x[lof > 2,], labels = round(lof, 1)[lof > 2], pos = 3) } \references{ Breunig, M., Kriegel, H., Ng, R., and Sander, J. (2000). LOF: identifying density-based local outliers. In \emph{ACM Int. Conf. on Management of Data,} pages 93-104.
\doi{10.1145/335191.335388} } \seealso{ Other Outlier Detection Functions: \code{\link{glosh}()}, \code{\link{kNNdist}()}, \code{\link{pointdensity}()} } \author{ Michael Hahsler } \concept{Outlier Detection Functions} \keyword{model} ================================================ FILE: man/moons.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/moons.R \docType{data} \name{moons} \alias{moons} \title{Moons Data} \format{ A data frame with 100 observations on the following 2 variables. \describe{ \item{X}{a numeric vector} \item{Y}{a numeric vector} } } \source{ See the HDBSCAN notebook in the online documentation: \url{http://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html} } \description{ Contains 100 2-d points, half of which are contained in two moons or "blobs" (25 points each blob), and the other half in asymmetric facing crescent shapes. The three shapes are all linearly separable. } \details{ This data was generated with the following Python commands using the SciKit-Learn library: \verb{> import sklearn.datasets as data} \verb{> moons = data.make_moons(n_samples=50, noise=0.05)} \verb{> blobs = data.make_blobs(n_samples=50, centers=[(-0.75,2.25), (1.0, 2.0)], cluster_std=0.25)} \verb{> test_data = np.vstack([moons, blobs])} } \examples{ data(moons) plot(moons, pch=20) } \references{ Pedregosa, Fabian, Gael Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel et al. Scikit-learn: Machine learning in Python. \emph{Journal of Machine Learning Research} 12, no. Oct (2011): 2825-2830.
} \keyword{datasets} ================================================ FILE: man/ncluster.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ncluster.R \name{ncluster} \alias{ncluster} \alias{nnoise} \alias{nobs} \title{Number of Clusters, Noise Points, and Observations} \usage{ ncluster(object, ...) nnoise(object, ...) } \arguments{ \item{object}{a clustering result object containing a \code{cluster} element.} \item{...}{additional arguments are unused.} } \value{ returns the number of clusters or the number of noise points. } \description{ Extract the number of clusters or the number of noise points for a clustering. This function works with any clustering result that contains a list element named \code{cluster} with a clustering vector. In addition, \code{nobs} (see \code{\link[stats:nobs]{stats::nobs()}}) is also available to retrieve the number of clustered points. } \examples{ data(iris) iris <- as.matrix(iris[, 1:4]) res <- dbscan(iris, eps = .7, minPts = 5) res ncluster(res) nnoise(res) nobs(res) # the functions also work with kmeans and other clustering algorithms. cl <- kmeans(iris, centers = 3) ncluster(cl) nnoise(cl) nobs(cl) } \seealso{ Other clustering functions: \code{\link{dbscan}()}, \code{\link{extractFOSC}()}, \code{\link{hdbscan}()}, \code{\link{jpclust}()}, \code{\link{optics}()}, \code{\link{sNNclust}()} } \concept{clustering functions} ================================================ FILE: man/optics.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/optics.R, R/predict.R \name{optics} \alias{optics} \alias{OPTICS} \alias{print.optics} \alias{plot.optics} \alias{as.reachability.optics} \alias{as.dendrogram.optics} \alias{extractDBSCAN} \alias{extractXi} \alias{predict.optics} \title{Ordering Points to Identify the Clustering Structure (OPTICS)} \usage{ optics(x, eps = NULL, minPts = 5, ...)
\method{print}{optics}(x, ...) \method{plot}{optics}(x, cluster = TRUE, predecessor = FALSE, ...) \method{as.reachability}{optics}(object, ...) \method{as.dendrogram}{optics}(object, ...) extractDBSCAN(object, eps_cl) extractXi(object, xi, minimum = FALSE, correctPredecessors = TRUE) \method{predict}{optics}(object, newdata, data, ...) } \arguments{ \item{x}{a data matrix or a \link{dist} object.} \item{eps}{upper limit of the size of the epsilon neighborhood. Limiting the neighborhood size improves performance and has no or very little impact on the ordering as long as it is not set too low. If not specified, the largest minPts-distance in the data set is used which gives the same result as infinity.} \item{minPts}{the parameter is used to identify dense neighborhoods and the reachability distance is calculated as the distance to the minPts-th nearest neighbor. Controls the smoothness of the reachability distribution. Default is 5 points.} \item{...}{additional arguments are passed on to the fixed-radius nearest neighbor search algorithm. See \code{\link[=frNN]{frNN()}} for details on how to control the search strategy.} \item{cluster, predecessor}{plot clusters and predecessors.} \item{object}{clustering object.} \item{eps_cl}{Threshold to identify clusters (\code{eps_cl <= eps}).} \item{xi}{Steepness threshold to identify clusters hierarchically using the Xi method.} \item{minimum}{logical, representing whether or not to extract the minimal (non-overlapping) clusters in the Xi clustering algorithm.} \item{correctPredecessors}{logical, correct a common artifact by pruning the steep up area for points that have predecessors not in the cluster--found by the ELKI framework, see details below.} \item{newdata}{new data points for which the cluster membership should be predicted.} \item{data}{the data set used to create the clustering object.} } \value{ An object of class \code{optics} with components: \item{eps }{ value of \code{eps} parameter.
} \item{minPts }{ value of \code{minPts} parameter. } \item{order }{ optics order for the data points in \code{x}. } \item{reachdist }{ \link{reachability} distance for each data point in \code{x}. } \item{coredist }{ core distance for each data point in \code{x}. } For \code{extractDBSCAN()}, in addition the following components are available: \item{eps_cl }{ the value of the \code{eps_cl} parameter. } \item{cluster }{ assigned cluster labels in the order of the data points in \code{x}. } For \code{extractXi()}, in addition the following components are available: \item{xi}{ the value of the \code{xi} steepness threshold. } \item{cluster }{ assigned cluster labels in the order of the data points in \code{x}.} \item{clusters_xi }{ data.frame containing the start and end of each cluster found in the OPTICS ordering. } } \description{ Implementation of the OPTICS (Ordering points to identify the clustering structure) point ordering algorithm using a kd-tree. } \details{ \strong{The algorithm} This implementation of OPTICS follows the original algorithm as described by Ankerst et al (1999). OPTICS is an ordering algorithm with methods to extract a clustering from the ordering. While using concepts similar to DBSCAN, for OPTICS \code{eps} is only an upper limit for the neighborhood size used to reduce computational complexity. Note that \code{minPts} in OPTICS has a different effect than in DBSCAN. It is used to define dense neighborhoods, but since \code{eps} is typically set rather high, this does not affect the ordering much. However, it is also used to calculate the reachability distance and larger values will make the reachability distance plot smoother. OPTICS linearly orders the data points such that points which are spatially closest become neighbors in the ordering. The closest analog to this ordering is the dendrogram produced by single-link hierarchical clustering. The algorithm also calculates the reachability distance for each point.
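The ordering and the role of the reachability distance can be sketched compactly. The following is a Python toy version under simplifying assumptions (brute-force neighbor search, `eps` treated as infinite by default); `optics_order` is an illustrative name, not the package's kd-tree implementation.

```python
import math

def optics_order(points, min_pts, eps=float("inf")):
    """Minimal OPTICS sketch: returns the point ordering and the
    reachability distance of each point (inf = undefined)."""
    n = len(points)
    def d(i, j):
        return math.dist(points[i], points[j])
    def core_dist(i):
        # distance to the min_pts-th nearest point, counting the point itself
        ds = sorted(d(i, j) for j in range(n))
        return ds[min_pts - 1] if ds[min_pts - 1] <= eps else float("inf")
    reach = [float("inf")] * n
    seen = [False] * n
    order = []
    for start in range(n):
        if seen[start]:
            continue
        seeds = {start}
        while seeds:
            i = min(seeds, key=lambda s: reach[s])   # smallest reachability next
            seeds.remove(i)
            seen[i] = True
            order.append(i)
            cd = core_dist(i)
            if cd == float("inf"):                   # not a core point
                continue
            for j in range(n):                       # update unseen eps-neighbors
                if not seen[j] and d(i, j) <= eps:
                    reach[j] = min(reach[j], max(cd, d(i, j)))
                    seeds.add(j)
    return order, reach
```

On two well-separated 1-d clusters the ordering keeps each cluster contiguous, and the large reachability value at the jump between clusters is what appears as a peak in the reachability plot.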
\code{plot()} (see \link{reachability_plot}) produces a reachability plot, which shows the reachability distance of each point in the order computed by OPTICS. Valleys represent clusters (the deeper the valley, the more dense the cluster) and high points indicate points between clusters. \strong{Specifying the data} If \code{x} is specified as a data matrix, then Euclidean distances and fast nearest neighbor lookup using a kd-tree are used. See \code{\link[=kNN]{kNN()}} for details on the parameters for the kd-tree. \strong{Extracting a clustering} Several methods to extract a clustering from the order returned by OPTICS are implemented: \itemize{ \item \code{extractDBSCAN()} extracts a clustering from an OPTICS ordering that is similar to what DBSCAN would produce with an eps set to \code{eps_cl} (see Ankerst et al, 1999). The only difference to a DBSCAN clustering is that OPTICS is not able to assign some border points and reports them instead as noise. \item \code{extractXi()} extracts clusters hierarchically as specified in Ankerst et al (1999) based on the steepness of the reachability plot. One interpretation of the \code{xi} parameter is that it classifies clusters by change in relative cluster density. The algorithm used here was originally contributed by the ELKI framework and is explained in Schubert et al (2018), but contains a set of fixes. } \strong{Predict cluster memberships} \code{predict()} requires a clustering extracted with \code{extractDBSCAN()} and then uses the predict method for \code{dbscan()}.
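Conceptually, the DBSCAN-style extraction amounts to a single pass over the OPTICS order that cuts the reachability plot at eps_cl. A simplified Python rendering of that idea (function name and plain-list inputs are illustrative assumptions; the package's extractDBSCAN() works on its optics objects instead):

```python
def extract_dbscan_cut(order, reach, core, eps_cl):
    """Walk the OPTICS order and cut the reachability plot at eps_cl.
    order: point ids in OPTICS order; reach/core: reachability and core
    distance per point id. Returns cluster labels (0 = noise)."""
    labels = [0] * len(order)
    cid = 0
    for p in order:
        if reach[p] > eps_cl:          # not density-reachable at eps_cl
            if core[p] <= eps_cl:      # ...but dense itself: starts a new cluster
                cid += 1
                labels[p] = cid
            # otherwise noise: label stays 0
        else:                          # continues the current cluster
            labels[p] = cid
    return labels
```

Each peak in the reachability plot that rises above eps_cl separates one valley (cluster) from the next.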
} \examples{ set.seed(2) n <- 400 x <- cbind( x = runif(4, 0, 1) + rnorm(n, sd = 0.1), y = runif(4, 0, 1) + rnorm(n, sd = 0.1) ) plot(x, col=rep(1:4, times = 100)) ### run OPTICS (Note: we use the default eps calculation) res <- optics(x, minPts = 10) res ### get order res$order ### plot produces a reachability plot plot(res) ### plot the order of points in the reachability plot plot(x, col = "grey") polygon(x[res$order, ]) ### extract a DBSCAN clustering by cutting the reachability plot at eps_cl res <- extractDBSCAN(res, eps_cl = .065) res plot(res) ## black is noise hullplot(x, res) ### re-cut at a higher eps threshold res <- extractDBSCAN(res, eps_cl = .07) res plot(res) hullplot(x, res) ### extract hierarchical clustering of varying density using the Xi method res <- extractXi(res, xi = 0.01) res plot(res) hullplot(x, res) # Xi cluster structure res$clusters_xi ### use OPTICS on a precomputed distance matrix d <- dist(x) res <- optics(d, minPts = 10) plot(res) } \references{ Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure. \emph{ACM SIGMOD international conference on Management of data.} ACM Press. pp. 49--60. \doi{10.1145/304181.304187} Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. \emph{Journal of Statistical Software}, 91(1), 1-30. \doi{10.18637/jss.v091.i01} Erich Schubert, Michael Gertz (2018). Improving the Cluster Structure Extracted from OPTICS Plots. In \emph{Lernen, Wissen, Daten, Analysen (LWDA 2018),} pp. 318-329. } \seealso{ Density \link{reachability}.
Other clustering functions: \code{\link{dbscan}()}, \code{\link{extractFOSC}()}, \code{\link{hdbscan}()}, \code{\link{jpclust}()}, \code{\link{ncluster}()}, \code{\link{sNNclust}()} } \author{ Michael Hahsler and Matthew Piekenbrock } \concept{clustering functions} \keyword{clustering} \keyword{model} ================================================ FILE: man/pointdensity.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/pointdensity.R \name{pointdensity} \alias{pointdensity} \alias{density} \title{Calculate Local Density at Each Data Point} \usage{ pointdensity( x, eps, type = "frequency", search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0 ) } \arguments{ \item{x}{a data matrix or a dist object.} \item{eps}{radius of the eps-neighborhood, i.e., bandwidth of the uniform kernel. For the Gaussian kde, this parameter specifies the standard deviation of the kernel.} \item{type}{\code{"frequency"}, \code{"density"}, or \code{"gaussian"}. Should the raw count of points inside the eps-neighborhood, the eps-neighborhood density estimate, or a Gaussian density estimate be returned?} \item{search, bucketSize, splitRule, approx}{algorithmic parameters for \code{\link[=frNN]{frNN()}}.} } \value{ A vector of the same length as the number of data points (rows) in \code{x} with the count or density values for each data point. } \description{ Calculate the local density at each data point, either as the number of points in the eps-neighborhood (as used in \code{dbscan()}) or with kernel density estimation (KDE) using a uniform kernel. The function uses a kd-tree for fast fixed-radius nearest neighbor search. } \details{ \code{dbscan()} estimates the density around a point as the number of points in the eps-neighborhood of the point (including the query point itself).
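The frequency-type count just described can be written down directly. A brute-force O(n^2) Python sketch (the function name is made up; pointdensity() computes the same counts with a kd-tree):

```python
import math

def frequency_density(points, eps):
    """For each point, count the points (including the point itself)
    inside its closed eps-neighborhood."""
    return [sum(1 for q in points if math.dist(p, q) <= eps) for p in points]
```

Points with a low count sit in sparse regions and are candidates for noise.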
Kernel density estimation (KDE) using a uniform kernel is just this point count in the eps-neighborhood divided by \eqn{(2\,eps\,n)}{(2 eps n)}, where \eqn{n} is the number of points in \code{x}. Alternatively, \code{type = "gaussian"} calculates a Gaussian kernel estimate where \code{eps} is used as the standard deviation. To speed up computation, a kd-tree is used to find all points within 3 times the standard deviation and these points are used for the estimate. Points with low local density often indicate noise (see e.g., Wishart (1969) and Hartigan (1975)). } \examples{ set.seed(665544) n <- 100 x <- cbind( x=runif(10, 0, 5) + rnorm(n, sd = 0.4), y=runif(10, 0, 5) + rnorm(n, sd = 0.4) ) plot(x) ### calculate density around points d <- pointdensity(x, eps = .5, type = "density") ### density distribution summary(d) hist(d, breaks = 10) ### plot with point size proportional to density plot(x, pch = 19, main = "Density (eps = .5)", cex = d*5) ### Wishart (1969) single link clustering after removing low-density noise # 1. remove noise with low density f <- pointdensity(x, eps = .5, type = "frequency") x_nonoise <- x[f >= 5,] # 2. use single-linkage on the non-noise points hc <- hclust(dist(x_nonoise), method = "single") plot(x, pch = 19, cex = .5) points(x_nonoise, pch = 19, col= cutree(hc, k = 4) + 1L) } \references{ Wishart, D. (1969), Mode Analysis: A Generalization of Nearest Neighbor which Reduces Chaining Effects, in \emph{Numerical Taxonomy,} Ed., A.J. Cole, Academic Press, 282-311. John A. Hartigan (1975), \emph{Clustering Algorithms,} John Wiley & Sons, Inc., New York, NY, USA. } \seealso{ \code{\link[=frNN]{frNN()}}, \code{\link[stats:density]{stats::density()}}.
Other Outlier Detection Functions: \code{\link{glosh}()}, \code{\link{kNNdist}()}, \code{\link{lof}()} } \author{ Michael Hahsler } \concept{Outlier Detection Functions} \keyword{model} ================================================ FILE: man/reachability.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/reachability.R \name{reachability} \alias{reachability} \alias{reachability_plot} \alias{print.reachability} \alias{plot.reachability} \alias{as.reachability} \alias{as.reachability.dendrogram} \title{Reachability Distances} \usage{ \method{print}{reachability}(x, ...) \method{plot}{reachability}( x, order_labels = FALSE, xlab = "Order", ylab = "Reachability dist.", main = "Reachability Plot", ... ) as.reachability(object, ...) \method{as.reachability}{dendrogram}(object, ...) } \arguments{ \item{x}{object of class \code{reachability}.} \item{...}{graphical parameters are passed on to \code{plot()}, or arguments for other methods.} \item{order_labels}{whether to plot text labels for each point's reachability distance.} \item{xlab}{x-axis label.} \item{ylab}{y-axis label.} \item{main}{Title of the plot.} \item{object}{any object that can be coerced to class \code{reachability}, such as an object of class \link{optics} or \link[stats:dendrogram]{stats::dendrogram}.} } \value{ An object of class \code{reachability} with components: \item{order }{order to use for the data points in \code{x}. } \item{reachdist }{reachability distance for each data point in \code{x}. } } \description{ Reachability distances can be plotted to show the hierarchical relationships between data points. The idea was originally introduced by Ankerst et al (1999) for \link{OPTICS}. Later, Sander et al (2003) showed that the visualization is useful for other hierarchical structures and introduced an algorithm to convert a \link{dendrogram} representation into a reachability plot.
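The dendrogram-to-reachability direction of this conversion can be sketched with a small recursion: plot the leaves left to right, and give each leaf the height of the merge that first joins it to the leaves already plotted. A Python sketch over a toy nested-tuple dendrogram representation (leaf, or `(left, right, height)`; this representation and the function name are assumptions of the sketch, not the package's as.reachability() machinery):

```python
def dendrogram_to_reachability(tree):
    """Convert a nested-tuple dendrogram into (order, reachdist) in the
    spirit of Sander et al. (2003). The first leaf gets inf (undefined);
    every later leaf gets the height of the merge separating it from the
    leaves to its left."""
    order, reach = [], []
    def walk(node, sep):               # sep: height joining node to plotted part
        if not isinstance(node, tuple):        # leaf
            order.append(node)
            reach.append(sep)
            return
        left, right, height = node
        walk(left, sep)                # left subtree keeps the outer separation
        walk(right, height)            # right subtree joins at this merge height
    walk(tree, float("inf"))
    return order, reach
```

Low merge heights become valleys (clusters) and high merge heights become the peaks separating them, mirroring an OPTICS reachability plot.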
} \details{ A reachability plot displays the points as vertical bars, where the height is the reachability distance between two consecutive points. The central idea behind reachability plots is that the ordering in which points are plotted identifies the underlying hierarchical density representation as mountains and valleys of high and low reachability distance. The original ordering algorithm OPTICS as described by Ankerst et al (1999) introduced the notion of reachability plots. OPTICS linearly orders the data points such that points which are spatially closest become neighbors in the ordering. Valleys represent clusters, which can be represented hierarchically. Although the ordering is crucial to the structure of the reachability plot, it is important to note that OPTICS, like DBSCAN, is not entirely deterministic and, just like for dendrograms, isomorphisms may exist. Sander et al (2003) showed that reachability plots essentially convey the same information as the more traditional dendrogram structure, and dendrograms can be converted into reachability plots. Different hierarchical representations, such as dendrograms or reachability plots, may be preferable depending on the context. In smaller datasets, cluster memberships may be more easily identifiable through a dendrogram representation, particularly if the user is already familiar with tree-like representations. For larger datasets, however, a reachability plot may be preferred for visualizing macro-level density relationships. A variety of cluster extraction methods have been proposed using reachability plots. Because these cluster extraction methods depend directly on the ordering OPTICS produces, they are part of the \code{\link[=optics]{optics()}} interface. Nonetheless, reachability plots can be created directly from other types of linkage trees, and vice versa. \emph{Note:} The reachability distance for the first point is by definition not defined (it has no preceding point).
Also, the reachability distances can be undefined when a point does not have enough neighbors in the epsilon neighborhood. We represent these undefined cases as \code{Inf} and show them in the plot as a dashed line. } \examples{ set.seed(2) n <- 20 x <- cbind( x = runif(4, 0, 1) + rnorm(n, sd = 0.1), y = runif(4, 0, 1) + rnorm(n, sd = 0.1) ) plot(x, xlim = range(x), ylim = c(min(x) - sd(x), max(x) + sd(x)), pch = 20) text(x = x, labels = seq_len(nrow(x)), pos = 3) ### run OPTICS res <- optics(x, eps = 10, minPts = 2) res ### plot produces a reachability plot. plot(res) ### Manually extract reachability components from OPTICS reach <- as.reachability(res) reach ### plot still produces a reachability plot; point ids ### (rows in the original data) can be displayed with order_labels = TRUE plot(reach, order_labels = TRUE) ### Reachability objects can be directly converted to dendrograms dend <- as.dendrogram(reach) dend plot(dend) ### A dendrogram can be converted back into a reachability object plot(as.reachability(dend)) } \references{ Ankerst, M., M. M. Breunig, H.-P. Kriegel, J. Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure. \emph{ACM SIGMOD international conference on Management of data.} ACM Press. pp. 49--60. Sander, J., X. Qin, Z. Lu, N. Niu, and A. Kovarsky (2003). Automatic extraction of clusters from hierarchical clustering representations. \emph{Pacific-Asia Conference on Knowledge Discovery and Data Mining.} Springer Berlin Heidelberg. } \seealso{ \code{\link[=optics]{optics()}}, \code{\link[=as.dendrogram]{as.dendrogram()}}, and \code{\link[stats:hclust]{stats::hclust()}}.
} \author{ Matthew Piekenbrock } \keyword{clustering} \keyword{hierarchical} \keyword{model} ================================================ FILE: man/sNN.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/sNN.R \name{sNN} \alias{sNN} \alias{snn} \alias{sort.sNN} \alias{print.sNN} \title{Find Shared Nearest Neighbors} \usage{ sNN( x, k, kt = NULL, jp = FALSE, sort = TRUE, search = "kdtree", bucketSize = 10, splitRule = "suggest", approx = 0 ) \method{sort}{sNN}(x, decreasing = TRUE, ...) \method{print}{sNN}(x, ...) } \arguments{ \item{x}{a data matrix, a \link{dist} object or a \link{kNN} object.} \item{k}{number of neighbors to consider to calculate the shared nearest neighbors.} \item{kt}{minimum threshold on the number of shared nearest neighbors to build the shared nearest neighbor graph. Edges are only preserved if \code{kt} or more neighbors are shared.} \item{jp}{In regular sNN graphs, two points that are not neighbors can have shared neighbors. Jarvis and Patrick (1973) require the two points to be neighbors, otherwise the count is zeroed out. \code{TRUE} uses this behavior.} \item{sort}{sort by the number of shared nearest neighbors? Note that this is expensive and \code{sort = FALSE} is much faster. sNN objects can be sorted using \code{sort()}.} \item{search}{nearest neighbor search strategy (one of \code{"kdtree"}, \code{"linear"} or \code{"dist"}).} \item{bucketSize}{max size of the kd-tree leaves.} \item{splitRule}{rule to split the kd-tree. One of \code{"STD"}, \code{"MIDPT"}, \code{"FAIR"}, \code{"SL_MIDPT"}, \code{"SL_FAIR"} or \code{"SUGGEST"} (SL stands for sliding). \code{"SUGGEST"} uses ANN's best guess.} \item{approx}{use approximate nearest neighbors. All NN up to a distance of a factor of \verb{(1 + approx) eps} may be used. Some actual NN may be omitted leading to spurious clusters and noise points.
However, the algorithm will enjoy a significant speedup.} \item{decreasing}{logical; sort in decreasing order?} \item{...}{additional parameters are passed on.} } \value{ An object of class \code{sNN} (subclass of \link{kNN} and \link{NN}) containing a list with the following components: \item{id }{a matrix with ids. } \item{dist}{a matrix with the distances. } \item{shared }{a matrix with the number of shared nearest neighbors. } \item{k }{the value of \code{k} used. } \item{metric }{the used distance metric. } } \description{ Calculates the number of shared nearest neighbors and creates a shared nearest neighbors graph. } \details{ The number of shared nearest neighbors of two points p and q is the size of the intersection of the two points' kNN neighborhoods. Note that each point is considered to be part of its own kNN neighborhood. The range for the shared nearest neighbors is \eqn{[0, k]}. The result is an n-by-k matrix called \code{shared}. Each row is a point and the columns are the point's k nearest neighbors. The value is the count of the shared neighbors. The shared nearest neighbor graph connects a point with all its nearest neighbors if they have at least one shared neighbor. The number of shared neighbors can be used as an edge weight. Jarvis and Patrick (1973) use a slightly modified (see parameter \code{jp}) shared nearest neighbor graph for clustering. } \examples{ data(iris) x <- iris[, -5] # find kNN and add the number of shared nearest neighbors. k <- 5 nn <- sNN(x, k = k) nn # shared nearest neighbor distribution table(as.vector(nn$shared)) # explore number of shared points for the k-neighborhood of point 10 i <- 10 nn$shared[i,] plot(nn, x) # apply a threshold to create a sNN graph with edges # if more than 3 neighbors are shared. nn_3 <- sNN(nn, kt = 3) plot(nn_3, x) # get an adjacency list for the shared nearest neighbor graph adjacencylist(nn_3) } \references{ R. A. Jarvis and E. A. Patrick. 1973.
Clustering Using a Similarity Measure Based on Shared Near Neighbors. \emph{IEEE Trans. Comput.} 22, 11 (November 1973), 1025-1034. \doi{10.1109/T-C.1973.223640} } \seealso{ Other NN functions: \code{\link{NN}}, \code{\link{comps}()}, \code{\link{frNN}()}, \code{\link{kNN}()}, \code{\link{kNNdist}()} } \author{ Michael Hahsler } \concept{NN functions} \keyword{model} ================================================ FILE: man/sNNclust.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/sNNclust.R \name{sNNclust} \alias{sNNclust} \alias{snnclust} \title{Shared Nearest Neighbor Clustering} \usage{ sNNclust(x, k, eps, minPts, borderPoints = TRUE, ...) } \arguments{ \item{x}{a data matrix/data.frame (Euclidean distance is used), a precomputed \link{dist} object or a kNN object created with \code{\link[=kNN]{kNN()}}.} \item{k}{Neighborhood size for nearest neighbor sparsification to create the shared NN graph.} \item{eps}{Two objects are only reachable from each other if they share at least \code{eps} nearest neighbors. Note: this is different from the \code{eps} in DBSCAN!} \item{minPts}{minimum number of points that share at least \code{eps} nearest neighbors for a point to be considered a core point.} \item{borderPoints}{should border points be assigned to clusters like in \link{DBSCAN}?} \item{...}{additional arguments are passed on to the k nearest neighbor search algorithm. See \code{\link[=kNN]{kNN()}} for details on how to control the search strategy.} } \value{ An object of class \code{general_clustering} with the following components: \item{cluster }{An integer vector with cluster assignments. Zero indicates noise points.} \item{type }{ name of the used clustering algorithm.} \item{param }{ list of the used clustering parameters. } } \description{ Implements the shared nearest neighbor clustering algorithm by Ertoz, Steinbach and Kumar (2003).
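The shared nearest neighbor count that this clustering builds on can be sketched by brute force. This Python sketch follows one reading of the definition in sNN()'s details, where each point is part of its own size-k neighborhood so counts fall in [0, k]; the tie breaking and the function name are assumptions of the sketch, not the package's code:

```python
import math

def shared_nn_counts(points, k):
    """Shared-nearest-neighbor counts |N(p) & N(q)| for every pair, where
    N(p) is p itself plus its k-1 nearest other points (so |N(p)| = k).
    Brute force, O(n^2 log n); ties broken by index."""
    n = len(points)
    nbhd = []
    for i in range(n):
        by_dist = sorted(range(n),
                         key=lambda j: (math.dist(points[i], points[j]), j))
        nbhd.append(set(by_dist[:k]))   # by_dist[0] is i itself (distance 0)
    return [[len(nbhd[i] & nbhd[j]) for j in range(n)] for i in range(n)]
```

Points deep inside the same cluster share many neighbors, while points in different clusters share few or none, which is what the eps threshold of sNNclust() exploits.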
} \details{ \strong{Algorithm:} \enumerate{ \item Construct a shared nearest neighbor graph for a given k. The edge weights are the number of shared k nearest neighbors (in the range of \eqn{[0, k]}). \item Find each point's SNN density, i.e., the number of points which have a similarity of \code{eps} or greater. \item Find the core points, i.e., all points that have an SNN density greater than \code{MinPts}. \item Form clusters from the core points and assign border points (i.e., non-core points which share at least \code{eps} neighbors with a core point). } Note that steps 2-4 are equivalent to the DBSCAN algorithm (see \code{\link[=dbscan]{dbscan()}}) and that \code{eps} has a different meaning than for DBSCAN. Here it is a threshold on the number of shared neighbors (see \code{\link[=sNN]{sNN()}}) which defines a similarity. } \examples{ data("DS3") # Out of the k = 20 NN, 7 (eps) have to be shared to create a link in the sNN graph. # A point needs at least 16 (minPts) links in the sNN graph to be a core point. # Noise points have cluster id 0 and are shown in black. cl <- sNNclust(DS3, k = 20, eps = 7, minPts = 16) cl clplot(DS3, cl) } \references{ Levent Ertoz, Michael Steinbach, Vipin Kumar, Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data, \emph{SIAM International Conference on Data Mining,} 2003, 47-59.
\doi{10.1137/1.9781611972733.5} } \seealso{ Other clustering functions: \code{\link{dbscan}()}, \code{\link{extractFOSC}()}, \code{\link{hdbscan}()}, \code{\link{jpclust}()}, \code{\link{ncluster}()}, \code{\link{optics}()} } \author{ Michael Hahsler } \concept{clustering functions} \keyword{clustering} \keyword{model} ================================================ FILE: src/ANN/ANN.cpp ================================================ //---------------------------------------------------------------------- // File: ANN.cpp // Programmer: Sunil Arya and David Mount // Description: Methods for ANN.h and ANNx.h // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. 
//---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.0 04/01/05 // Added performance counting to annDist() // Modified 2/28/08 // Added cstdlib and std:: //---------------------------------------------------------------------- #include <cstdlib> #include "ANNx.h" // all ANN includes #include "ANNperf.h" // ANN performance //using namespace std; // make std:: accessible #include //---------------------------------------------------------------------- // Point methods //---------------------------------------------------------------------- //---------------------------------------------------------------------- // Distance utility. // (Note: In the nearest neighbor search, most distances are // computed using partial distance calculations, not this // procedure.) //---------------------------------------------------------------------- ANNdist annDist( // interpoint squared distance int dim, ANNpoint p, ANNpoint q) { int d; ANNcoord diff; ANNcoord dist; dist = 0; for (d = 0; d < dim; d++) { diff = p[d] - q[d]; dist = ANN_SUM(dist, ANN_POW(diff)); } ANN_FLOP(3*dim) // performance counts ANN_PTS(1) ANN_COORD(dim) return dist; } //---------------------------------------------------------------------- // annPrintPt() prints a point to a given output stream. //---------------------------------------------------------------------- void annPrintPt( // print a point ANNpoint pt, // the point int dim, // the dimension std::ostream &out) // output stream { for (int j = 0; j < dim; j++) { out << pt[j]; if (j < dim-1) out << " "; } } //---------------------------------------------------------------------- // Point allocation/deallocation: // // Because points (somewhat like strings in C) are stored // as pointers, creating and destroying // copies of points may require storage allocation. These // procedures do this.
// // annAllocPt() and annDeallocPt() allocate and deallocate // storage for a single point, and return a pointer to it. // // annAllocPts() allocates an array of points as well as a place // to store their coordinates, and initializes the points to // point to their respective coordinates. It allocates point // storage in a contiguous block large enough to store all the // points. It performs no initialization. // // annDeallocPts() should only be used on point arrays allocated // by annAllocPts since it assumes that points are allocated in // a block. // // annCopyPt() copies a point taking care to allocate storage // for the new point. // // annAssignRect() assigns the coordinates of one rectangle to // another. The two rectangles must have the same dimension // (and it is not possible to test this here). //---------------------------------------------------------------------- ANNpoint annAllocPt(int dim, ANNcoord c) // allocate 1 point { ANNpoint p = new ANNcoord[dim]; for (int i = 0; i < dim; i++) p[i] = c; return p; } ANNpointArray annAllocPts(int n, int dim) // allocate n pts in dim { ANNpointArray pa = new ANNpoint[n]; // allocate points ANNpoint p = new ANNcoord[n*dim]; // allocate space for coords for (int i = 0; i < n; i++) { pa[i] = &(p[i*dim]); } return pa; } void annDeallocPt(ANNpoint &p) // deallocate 1 point { delete [] p; p = NULL; } void annDeallocPts(ANNpointArray &pa) // deallocate points { delete [] pa[0]; // dealloc coordinate storage delete [] pa; // dealloc points pa = NULL; } ANNpoint annCopyPt(int dim, ANNpoint source) // copy point { ANNpoint p = new ANNcoord[dim]; for (int i = 0; i < dim; i++) p[i] = source[i]; return p; } // assign one rect to another void annAssignRect(int dim, ANNorthRect &dest, const ANNorthRect &source) { for (int i = 0; i < dim; i++) { dest.lo[i] = source.lo[i]; dest.hi[i] = source.hi[i]; } } // is point inside rectangle?
ANNbool ANNorthRect::inside(int dim, ANNpoint p) { for (int i = 0; i < dim; i++) { if (p[i] < lo[i] || p[i] > hi[i]) return ANNfalse; } return ANNtrue; } //---------------------------------------------------------------------- // Error handler //---------------------------------------------------------------------- void annError(const char *msg, ANNerr level) { if (level == ANNabort) { //cerr << "ANN: ERROR------->" << msg << "<-------------ERROR\n"; Rprintf("ANN Fatal ERROR: %s", msg); // std::exit(1); } else { //cerr << "ANN: WARNING----->" << msg << "<-------------WARNING\n"; Rprintf("ANN WARNING: %s", msg); } } //---------------------------------------------------------------------- // Limit on number of points visited // We have an option for terminating the search early if the // number of points visited exceeds some threshold. If the // threshold is 0 (its default) this means there is no limit // and the algorithm applies its normal termination condition. // This is for applications where there are real time constraints // on the running time of the algorithm. //---------------------------------------------------------------------- int ANNmaxPtsVisited = 0; // maximum number of pts visited int ANNptsVisited; // number of pts visited in search //---------------------------------------------------------------------- // Global function declarations //---------------------------------------------------------------------- void annMaxPtsVisit( // set limit on max. pts to visit in search int maxPts) // the limit { ANNmaxPtsVisited = maxPts; } ================================================ FILE: src/ANN/ANN.h ================================================ //---------------------------------------------------------------------- // File: ANN.h // Programmer: Sunil Arya and David Mount // Last modified: 05/03/05 (Release 1.1) // Description: Basic include file for approximate nearest // neighbor searching. 
//---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.0 04/01/05 // Added copyright and revision information // Added ANNcoordPrec for coordinate precision. // Added methods theDim, nPoints, maxPoints, thePoints to ANNpointSet. // Cleaned up C++ structure for modern compilers // Revision 1.1 05/03/05 // Added fixed-radius k-NN searching //---------------------------------------------------------------------- //---------------------------------------------------------------------- // ANN - approximate nearest neighbor searching // ANN is a library for approximate nearest neighbor searching, // based on the use of standard and priority search in kd-trees // and balanced box-decomposition (bbd) trees. Here are some // references to the main algorithmic techniques used here: // // kd-trees: // Friedman, Bentley, and Finkel, ``An algorithm for finding // best matches in logarithmic expected time,'' ACM // Transactions on Mathematical Software, 3(3):209-226, 1977. // // Priority search in kd-trees: // Arya and Mount, ``Algorithms for fast vector quantization,'' // Proc. of DCC '93: Data Compression Conference, eds. J. A. // Storer and M. Cohn, IEEE Press, 1993, 381-390. 
//
// Approximate nearest neighbor search and bbd-trees:
//		Arya, Mount, Netanyahu, Silverman, and Wu, ``An optimal
//		algorithm for approximate nearest neighbor searching,''
//		5th Ann. ACM-SIAM Symposium on Discrete Algorithms,
//		1994, 573-582.
//----------------------------------------------------------------------

#ifndef ANN_H
#define ANN_H

#ifdef Win32
//----------------------------------------------------------------------
// For Microsoft Visual C++, externally accessible symbols must be
// explicitly indicated with DLL_API, which is somewhat like "extern."
//
// The following ifdef block is the standard way of creating macros
// which make exporting from a DLL simpler.  All files within this DLL
// are compiled with the DLL_EXPORTS preprocessor symbol defined on the
// command line.  In contrast, projects that use (or import) the DLL
// objects do not define the DLL_EXPORTS symbol.  This way any other
// project whose source files include this file sees DLL_API functions
// as being imported from a DLL, whereas this DLL sees symbols defined
// with this macro as being exported.
//----------------------------------------------------------------------
#ifdef DLL_EXPORTS
	#define DLL_API __declspec(dllexport)
#else
	#define DLL_API __declspec(dllimport)
#endif
//----------------------------------------------------------------------
// DLL_API is ignored for all other systems
//----------------------------------------------------------------------
#else
	#define DLL_API
#endif

//----------------------------------------------------------------------
// basic includes
//----------------------------------------------------------------------

#include <cmath>			// math includes
#include <iostream>			// I/O streams
#include <vector>			// STL vectors (used by annkFRSearch2 below)

//----------------------------------------------------------------------
//	Limits
//		There are a number of places where we use the maximum double
//		value as default initializers (and others may be used,
//		depending on the data/distance representation).
//		These can usually be found in <limits.h> (as LONG_MAX, INT_MAX)
//		or in <float.h> (as DBL_MAX, FLT_MAX).
//
//		Not all systems have these files.  If you are using such a
//		system, you should set the preprocessor symbol ANN_NO_LIMITS_H
//		when compiling, and modify the statements below to generate the
//		appropriate value.  For practical purposes, this does not need
//		to be the maximum double value.  It is sufficient that it be at
//		least as large as the maximum squared distance between any two
//		points.
//----------------------------------------------------------------------

#ifdef ANN_NO_LIMITS_H					// limits.h unavailable
	#include <values.h>					// replacement for limits.h
	const double ANN_DBL_MAX = MAXDOUBLE;	// insert maximum double
#else
	#include <limits.h>
	#include <float.h>
	const double ANN_DBL_MAX = DBL_MAX;
#endif

#define ANNversion		"1.0"			// ANN version and information
#define ANNversionCmt	""
#define ANNcopyright	"David M. Mount and Sunil Arya"
#define ANNlatestRev	"Mar 1, 2005"

//----------------------------------------------------------------------
//	ANNbool
//	This is a simple boolean type.  Although ANSI C++ is supposed to
//	support the type bool, some compilers do not have it.
//----------------------------------------------------------------------

enum ANNbool {ANNfalse = 0, ANNtrue = 1};	// ANN boolean type (non ANSI C++)

//----------------------------------------------------------------------
//	ANNcoord, ANNdist
//		ANNcoord and ANNdist are the types used for representing
//		point coordinates and distances.  They can be modified by the
//		user, with some care.  It is assumed that they are both numeric
//		types, and that ANNdist is generally of an equal or higher type
//		from ANNcoord.  A variable of type ANNdist should be large
//		enough to store the sum of squared components of a variable
//		of type ANNcoord for the number of dimensions needed in the
//		application.
//		For example, the following combinations are legal:
//
//		ANNcoord		ANNdist
//		---------		-------------------------------
//		short			short, int, long, float, double
//		int				int, long, float, double
//		long			long, float, double
//		float			float, double
//		double			double
//
//		It is the user's responsibility to make sure that overflow does
//		not occur in distance calculation.
//----------------------------------------------------------------------

typedef double	ANNcoord;				// coordinate data type
typedef double	ANNdist;				// distance data type

//----------------------------------------------------------------------
//	ANNidx
//		ANNidx is a point index.  When the data structure is built, the
//		points are given as an array.  Nearest neighbor results are
//		returned as an integer index into this array.  To make it
//		clearer when this is happening, we define the integer type
//		ANNidx.  Indexing starts from 0.
//
//		For fixed-radius near neighbor searching, it is possible that
//		there are not k nearest neighbors within the search radius.  To
//		indicate this, the algorithm returns ANN_NULL_IDX as its result.
//		It should be distinguishable from any valid array index.
//----------------------------------------------------------------------

typedef int		ANNidx;					// point index
const ANNidx	ANN_NULL_IDX = -1;		// a NULL point index

//----------------------------------------------------------------------
//	Infinite distance:
//		The code assumes that there is an "infinite distance" which it
//		uses to initialize distances before performing nearest neighbor
//		searches.  It should be as large or larger than any legitimate
//		nearest neighbor distance.
//
//		On most systems, these should be found in the standard include
//		file <limits.h> or possibly <float.h>.  If you do not have
//		these files, some suggested values are listed below, assuming
//		64-bit long, 32-bit int and 16-bit short.
//
//		ANNdist ANN_DIST_INF	Values (see <limits.h> or <float.h>)
//		------- ------------	------------------------------------
//		double	DBL_MAX			1.79769313486231570e+308
//		float	FLT_MAX			3.40282346638528860e+38
//		long	LONG_MAX		0x7fffffffffffffff
//		int		INT_MAX			0x7fffffff
//		short	SHRT_MAX		0x7fff
//----------------------------------------------------------------------

const ANNdist	ANN_DIST_INF = ANN_DBL_MAX;

//----------------------------------------------------------------------
//	Significant digits for tree dumps:
//		When floating point coordinates are used, the routine that
//		dumps a tree needs to know roughly how many significant digits
//		there are in an ANNcoord, so it can output points to full
//		precision.  This is defined to be ANNcoordPrec.  On most
//		systems these values can be found in the standard include files
//		<limits.h> or <float.h>.  For integer types, the value is
//		essentially ignored.
//
//		ANNcoord ANNcoordPrec	Values (see <limits.h> or <float.h>)
//		-------- ------------	------------------------------------
//		double	 DBL_DIG		15
//		float	 FLT_DIG		6
//		long	 doesn't matter	19
//		int		 doesn't matter	10
//		short	 doesn't matter	5
//----------------------------------------------------------------------

#ifdef DBL_DIG							// number of sig. digits in ANNcoord
	const int	 ANNcoordPrec = DBL_DIG;
#else
	const int	 ANNcoordPrec = 15;		// default precision
#endif

//----------------------------------------------------------------------
// Self match?
//		In some applications, the nearest neighbor of a point is not
//		allowed to be the point itself.  This occurs, for example, when
//		computing all nearest neighbors in a set.  By setting the
//		parameter ANN_ALLOW_SELF_MATCH to ANNfalse, the nearest
//		neighbor is the closest point whose distance from the query
//		point is strictly positive.
//----------------------------------------------------------------------

const ANNbool ANN_ALLOW_SELF_MATCH = ANNtrue;
//const ANNbool ANN_ALLOW_SELF_MATCH = ANNfalse;

//----------------------------------------------------------------------
//	Norms and metrics:
//		ANN supports any Minkowski norm for defining distance.  In
//		particular, for any p >= 1, the L_p Minkowski norm defines the
//		length of a d-vector (v0, v1, ..., v(d-1)) to be
//
//				(|v0|^p + |v1|^p + ... + |v(d-1)|^p)^(1/p),
//
//		(where ^ denotes exponentiation, and |.| denotes absolute
//		value).  The distance between two points is defined to be the
//		norm of the vector joining them.  Some common distance metrics
//		include
//
//				Euclidean metric		p = 2
//				Manhattan metric		p = 1
//				Max metric				p = infinity
//
//		In the case of the max metric, the norm is computed by taking
//		the maxima of the absolute values of the components.  ANN is
//		highly "coordinate-based" and does not support general distance
//		functions (e.g. those obeying just the triangle inequality).
//		It also does not support distance functions based on
//		inner-products.
//
//		For the purpose of computing nearest neighbors, it is not
//		necessary to compute the final power (1/p).  Thus the only
//		component that is used by the program is |v(i)|^p.
//
//		ANN parameterizes the distance computation through the
//		following macros.  (Macros are used rather than procedures for
//		efficiency.)  Recall that the distance between two points is
//		given by the length of the vector joining them, and the length
//		or norm of a vector v is given by the formula:
//
//				|v| = ROOT(POW(v0) # POW(v1) # ...
//								 # POW(v(d-1)))
//
//		where ROOT and POW are unary functions and # is an associative
//		and commutative binary operator mapping the following types:
//
//			**	POW:	ANNcoord			--> ANNdist
//			**	#:		ANNdist x ANNdist	--> ANNdist
//			**	ROOT:	ANNdist (>0)		--> double
//
//		For early termination in distance calculation (partial distance
//		calculation) we assume that POW and # together are monotonically
//		increasing on sequences of arguments, meaning that for all
//		v0..vk and y:
//
//		POW(v0) #...# POW(vk) <= (POW(v0) #...# POW(vk)) # POW(y).
//
//	Incremental Distance Calculation:
//		The program uses an optimized method of computing distances for
//		kd-trees and bd-trees, called incremental distance calculation.
//		It is used when distances are to be updated when only a single
//		coordinate of a point has been changed.  In order to use this,
//		we assume that there is an incremental update function DIFF(x,y)
//		for #, such that if:
//
//					s = x0 # ... # xi # ... # xk
//
//		then if s' is equal to s but with xi replaced by y, that is,
//
//					s' = x0 # ... # y # ... # xk
//
//		then the length of s' can be computed by:
//
//					|s'| = |s| # DIFF(xi,y).
//
//		Thus, if # is + then DIFF(x,y) is (y-x).  For the L_infinity
//		norm we make use of the fact that in the program this function
//		is only invoked when y > x, and hence DIFF(x,y) = y.
//
//		Finally, for approximate nearest neighbor queries we assume
//		that POW and ROOT are related such that
//
//					v*ROOT(x) = ROOT(POW(v)*x)
//
//		Here are the values for the various Minkowski norms:
//
//		L_p:	p even:						p odd:
//				-------------------------	------------------------
//				POW(v)		= v^p			POW(v)		= |v|^p
//				ROOT(x)		= x^(1/p)		ROOT(x)		= x^(1/p)
//				#			= +				#			= +
//				DIFF(x,y)	= y - x			DIFF(x,y)	= y - x
//
//		L_inf:
//				POW(v)		= |v|
//				ROOT(x)		= x
//				#			= max
//				DIFF(x,y)	= y
//
//		By default the Euclidean norm is assumed.  To change the norm,
//		uncomment the appropriate set of macros below.
//---------------------------------------------------------------------- //---------------------------------------------------------------------- // Use the following for the Euclidean norm //---------------------------------------------------------------------- #define ANN_POW(v) ((v)*(v)) #define ANN_ROOT(x) sqrt(x) #define ANN_SUM(x,y) ((x) + (y)) #define ANN_DIFF(x,y) ((y) - (x)) //---------------------------------------------------------------------- // Use the following for the L_1 (Manhattan) norm //---------------------------------------------------------------------- // #define ANN_POW(v) fabs(v) // #define ANN_ROOT(x) (x) // #define ANN_SUM(x,y) ((x) + (y)) // #define ANN_DIFF(x,y) ((y) - (x)) //---------------------------------------------------------------------- // Use the following for a general L_p norm //---------------------------------------------------------------------- // #define ANN_POW(v) pow(fabs(v),p) // #define ANN_ROOT(x) pow(fabs(x),1/p) // #define ANN_SUM(x,y) ((x) + (y)) // #define ANN_DIFF(x,y) ((y) - (x)) //---------------------------------------------------------------------- // Use the following for the L_infinity (Max) norm //---------------------------------------------------------------------- // #define ANN_POW(v) fabs(v) // #define ANN_ROOT(x) (x) // #define ANN_SUM(x,y) ((x) > (y) ? (x) : (y)) // #define ANN_DIFF(x,y) (y) //---------------------------------------------------------------------- // Array types // The following array types are of basic interest. A point is // just a dimensionless array of coordinates, a point array is a // dimensionless array of points. A distance array is a // dimensionless array of distances and an index array is a // dimensionless array of point indices. The latter two are used // when returning the results of k-nearest neighbor queries. 
//----------------------------------------------------------------------

typedef ANNcoord* ANNpoint;			// a point
typedef ANNpoint* ANNpointArray;	// an array of points
typedef ANNdist*  ANNdistArray;		// an array of distances
typedef ANNidx*   ANNidxArray;		// an array of point indices

//----------------------------------------------------------------------
//	Basic point and array utilities:
//		The following procedures are useful supplements to ANN's
//		nearest neighbor capabilities.
//
//		annDist():
//			Computes the (squared) distance between a pair of points.
//			Note that this routine is not used internally by ANN for
//			computing distance calculations.  For reasons of efficiency
//			this is done using incremental distance calculation.  Thus,
//			this routine cannot be modified as a method of changing the
//			metric.
//
//		Because points (somewhat like strings in C) are stored as
//		pointers, creating and destroying copies of points may require
//		storage allocation.  These procedures do this.
//
//		annAllocPt() and annDeallocPt():
//			Allocate and deallocate storage for a single point, and
//			return a pointer to it.  The argument to AllocPt() is
//			used to initialize all components.
//
//		annAllocPts() and annDeallocPts():
//			Allocate and deallocate an array of points as well as a
//			place to store their coordinates, and initialize the
//			points to point to their respective coordinates.  Point
//			storage is allocated in a contiguous block large enough
//			to store all the points.  No initialization is performed.
//
//		annCopyPt():
//			Creates a copy of a given point, allocating space for
//			the new point.  It returns a pointer to the newly
//			allocated copy.
//----------------------------------------------------------------------

DLL_API ANNdist annDist(
	int				dim,			// dimension of space
	ANNpoint		p,				// points
	ANNpoint		q);

DLL_API ANNpoint annAllocPt(
	int				dim,			// dimension
	ANNcoord		c = 0);			// coordinate value (all equal)

DLL_API ANNpointArray annAllocPts(
	int				n,				// number of points
	int				dim);			// dimension

DLL_API void annDeallocPt(
	ANNpoint		&p);			// deallocate 1 point

DLL_API void annDeallocPts(
	ANNpointArray	&pa);			// point array

DLL_API ANNpoint annCopyPt(
	int				dim,			// dimension
	ANNpoint		source);		// point to copy

//----------------------------------------------------------------------
// Overall structure: ANN supports a number of different data structures
// for approximate and exact nearest neighbor searching.  These are:
//
//		ANNbruteForce	A simple brute-force search structure.
//		ANNkd_tree		A kd-tree search structure.
//		ANNbd_tree		A bd-tree search structure (a kd-tree with
//						shrink capabilities).
//
//		At a minimum, each of these data structures supports k-nearest
//		neighbor queries.  The nearest neighbor query, annkSearch,
//		returns an integer identifier and the distance to the nearest
//		neighbor(s), and annRangeSearch returns the nearest points that
//		lie within a given query ball.
//
//		Each structure is built by invoking the appropriate constructor
//		and passing it (at a minimum) the array of points, the total
//		number of points and the dimension of the space.  Each structure
//		is also assumed to support a destructor and member functions
//		that return basic information about the point set.
//
//		Note that the array of points is not copied by the data
//		structure (for reasons of space efficiency), and it is assumed
//		to be constant throughout the lifetime of the search structure.
//
//		The search algorithm, annkSearch, is given the query point (q),
//		the desired number of nearest neighbors to report (k), and the
//		error bound (eps) (whose default value is 0, implying exact
//		nearest neighbors).
//		It returns two arrays which are assumed to contain at least
//		k elements: one (nn_idx) contains the indices (within the
//		point array) of the nearest neighbors and the other (dd)
//		contains the squared distances to these nearest neighbors.
//
//		The search algorithm, annkFRSearch, is a fixed-radius kNN
//		search.  In addition to a query point, it is given a (squared)
//		radius bound.  (This is done for consistency, because the
//		search returns distances as squared quantities.)  It does two
//		things.  First, it computes the k nearest neighbors within the
//		radius bound, and second, it returns the total number of points
//		lying within the radius bound.  It is permitted to set k = 0,
//		in which case it effectively answers a range counting query.
//		If the error bound epsilon is positive, then the search is
//		approximate in the sense that it is free to ignore any point
//		that lies outside a ball of radius r/(1+epsilon), where r is
//		the given (unsquared) radius bound.
//
//		The generic object from which all the search structures are
//		derived is given below.  It is a virtual object, and is useless
//		by itself.
//----------------------------------------------------------------------

class DLL_API ANNpointSet {
public:
	virtual ~ANNpointSet() {}			// virtual destructor

	virtual void annkSearch(			// approx k near neighbor search
		ANNpoint		q,				// query point
		int				k,				// number of near neighbors to return
		ANNidxArray		nn_idx,			// nearest neighbor array (modified)
		ANNdistArray	dd,				// dist to near neighbors (modified)
		double			eps=0.0			// error bound
		) = 0;							// pure virtual (defined elsewhere)

	virtual int annkFRSearch(			// approx fixed-radius kNN search
		ANNpoint		q,				// query point
		ANNdist			sqRad,			// squared radius
		int				k = 0,			// number of near neighbors to return
		ANNidxArray		nn_idx = NULL,	// nearest neighbor array (modified)
		ANNdistArray	dd = NULL,		// dist to near neighbors (modified)
		double			eps=0.0			// error bound
		) = 0;							// pure virtual (defined elsewhere)

	virtual std::pair< std::vector<int>, std::vector<double> >
	annkFRSearch2(						// approx fixed-radius kNN search
		ANNpoint		q,				// query point
		ANNdist			sqRad,			// squared radius
		double			eps=0.0			// error bound
		) = 0;							// pure virtual (defined elsewhere)

	virtual int theDim() = 0;			// return dimension of space
	virtual int nPoints() = 0;			// return number of points
										// return pointer to points
	virtual ANNpointArray thePoints() = 0;
};

//----------------------------------------------------------------------
//	Brute-force nearest neighbor search:
//		The brute-force search structure is very simple but inefficient.
//		It has been provided primarily for the sake of comparison with
//		and validation of the more complex search structures.
//
//		Query processing is the same as described above, but the value
//		of epsilon is ignored, since all distance calculations are
//		performed exactly.
//
//		WARNING: This data structure is very slow, and should not be
//		used unless the number of points is very small.
//
//		Internal information:
//		---------------------
//		This data structure basically consists of the array of points
//		(each a pointer to an array of coordinates).
//		The search is performed by a simple linear scan of all the
//		points.
//----------------------------------------------------------------------

class DLL_API ANNbruteForce: public ANNpointSet {
	int				dim;				// dimension
	int				n_pts;				// number of points
	ANNpointArray	pts;				// point array
public:
	ANNbruteForce(						// constructor from point array
		ANNpointArray	pa,				// point array
		int				n,				// number of points
		int				dd);			// dimension

	~ANNbruteForce();					// destructor

	void annkSearch(					// approx k near neighbor search
		ANNpoint		q,				// query point
		int				k,				// number of near neighbors to return
		ANNidxArray		nn_idx,			// nearest neighbor array (modified)
		ANNdistArray	dd,				// dist to near neighbors (modified)
		double			eps=0.0);		// error bound

	int annkFRSearch(					// approx fixed-radius kNN search
		ANNpoint		q,				// query point
		ANNdist			sqRad,			// squared radius
		int				k = 0,			// number of near neighbors to return
		ANNidxArray		nn_idx = NULL,	// nearest neighbor array (modified)
		ANNdistArray	dd = NULL,		// dist to near neighbors (modified)
		double			eps=0.0);		// error bound

	std::pair< std::vector<int>, std::vector<double> >
	annkFRSearch2(						// approx fixed-radius kNN search
		ANNpoint		q,				// query point
		ANNdist			sqRad,			// squared radius
		double			eps=0.0);		// error bound

	int theDim()						// return dimension of space
		{ return dim; }
	int nPoints()						// return number of points
		{ return n_pts; }
	ANNpointArray thePoints()			// return pointer to points
		{ return pts; }
};

//----------------------------------------------------------------------
//	kd- and bd-tree splitting and shrinking rules
//		kd-trees support a collection of different splitting rules.
//		In addition to the standard kd-tree splitting rule proposed
//		by Friedman, Bentley, and Finkel, we have introduced a
//		number of other splitting rules, which seem to perform
//		as well or better (for the distributions we have tested).
//
//		The splitting methods given below allow the user to tailor
//		the data structure to the particular data set.
//		They are described in greater detail in the kd_split.cc source
//		file.  The method ANN_KD_SUGGEST is the method chosen (rather
//		subjectively) by the implementors as the one giving the
//		fastest performance, and is the default splitting method.
//
//		As with splitting rules, there are a number of different
//		shrinking rules.  The shrinking rule ANN_BD_NONE does no
//		shrinking (and hence produces a kd-tree).  The rule
//		ANN_BD_SUGGEST uses the implementors' favorite rule.
//----------------------------------------------------------------------

enum ANNsplitRule {
		ANN_KD_STD		= 0,	// the optimized kd-splitting rule
		ANN_KD_MIDPT	= 1,	// midpoint split
		ANN_KD_FAIR		= 2,	// fair split
		ANN_KD_SL_MIDPT	= 3,	// sliding midpoint splitting method
		ANN_KD_SL_FAIR	= 4,	// sliding fair split method
		ANN_KD_SUGGEST	= 5};	// the authors' suggestion for best
const int ANN_N_SPLIT_RULES = 6;	// number of split rules

enum ANNshrinkRule {
		ANN_BD_NONE		= 0,	// no shrinking at all (just kd-tree)
		ANN_BD_SIMPLE	= 1,	// simple splitting
		ANN_BD_CENTROID	= 2,	// centroid splitting
		ANN_BD_SUGGEST	= 3};	// the authors' suggested choice
const int ANN_N_SHRINK_RULES = 4;	// number of shrink rules

//----------------------------------------------------------------------
//	kd-tree:
//		The main search data structure supported by ANN is a kd-tree.
//		The main constructor is given a set of points and a choice of
//		splitting method to use in building the tree.
//
//		Construction:
//		-------------
//		The constructor is given the point array, number of points,
//		dimension, bucket size (default = 1), and the splitting rule
//		(default = ANN_KD_SUGGEST).  The point array is not copied, and
//		is assumed to be kept constant throughout the lifetime of the
//		search structure.  There is also a "load" constructor that
//		builds a tree from a file description that was created by the
//		Dump operation.
//
//		Search:
//		-------
//		There are two search methods:
//
//			Standard search (annkSearch()):
//				Searches nodes in tree-traversal order, always visiting
//				the closer child first.
//			Priority search (annkPriSearch()):
//				Searches nodes in order of increasing distance of the
//				associated cell from the query point.  For many
//				distributions the standard search seems to work just
//				fine, but priority search is safer for worst-case
//				performance.
//
//		Printing:
//		---------
//		There are two methods provided for printing the tree.  Print()
//		is used to produce a "human-readable" display of the tree, with
//		indentation, which is handy for debugging.  Dump() produces a
//		format that is suitable for reading by another program.  There
//		is a "load" constructor, which constructs a tree which is
//		assumed to have been saved by the Dump() procedure.
//
//		Performance and Structure Statistics:
//		-------------------------------------
//		The procedure getStats() collects statistics information on the
//		tree (its size, height, etc.)  See ANNperf.h for information on
//		the stats structure it returns.
//
//		Internal information:
//		---------------------
//		The data structure consists of three major chunks of storage.
//		The first (implicit) storage is the points themselves (pts),
//		which have been provided by the user as an argument to the
//		constructor, or are allocated dynamically if the tree is built
//		using the load constructor.  These should not be changed during
//		the lifetime of the search structure.  It is the user's
//		responsibility to delete these after the tree is destroyed.
//
//		The second is the tree itself (which is dynamically allocated in
//		the constructor) and is given as a pointer to its root node
//		(root).  These nodes are automatically deallocated when the tree
//		is deleted.  See the file src/kd_tree.h for further information
//		on the structure of the tree nodes.
//
//		Each leaf of the tree does not contain a pointer directly to a
//		point, but rather contains a pointer to a "bucket", which is an
//		array consisting of point indices.  The third major chunk of
//		storage is an array (pidx), which is a large array in which all
//		these bucket subarrays reside.  (The reason for storing them
//		separately is that buckets are typically small, but of varying
//		sizes.  This was done to avoid fragmentation.)  This array is
//		also deallocated when the tree is deleted.
//
//		In addition to this, the tree consists of a number of other
//		pieces of information which are used in searching and for
//		subsequent tree operations.  These consist of the following:
//
//		dim			Dimension of space
//		n_pts		Number of points currently in the tree
//		n_max		Maximum number of points that are allowed
//					in the tree
//		bkt_size	Maximum bucket size (no. of points per leaf)
//		bnd_box_lo	Bounding box low point
//		bnd_box_hi	Bounding box high point
//		splitRule	Splitting method used
//
//----------------------------------------------------------------------

//----------------------------------------------------------------------
// Some types and objects used by kd-tree functions
// See src/kd_tree.h and src/kd_tree.cpp for definitions
//----------------------------------------------------------------------
class ANNkdStats;				// stats on kd-tree
class ANNkd_node;				// generic node in a kd-tree
typedef ANNkd_node* ANNkd_ptr;	// pointer to a kd-tree node

class DLL_API ANNkd_tree: public ANNpointSet {
protected:
	int				dim;				// dimension of space
	int				n_pts;				// number of points in tree
	int				bkt_size;			// bucket size
	ANNpointArray	pts;				// the points
	ANNidxArray		pidx;				// point indices (to pts array)
	ANNkd_ptr		root;				// root of kd-tree
	ANNpoint		bnd_box_lo;			// bounding box low point
	ANNpoint		bnd_box_hi;			// bounding box high point

	void SkeletonTree(					// construct skeleton tree
		int				n,				// number of points
		int				dd,				// dimension
		int				bs,				// bucket size
		ANNpointArray	pa = NULL,		// point array (optional)
		ANNidxArray		pi = NULL);		// point indices (optional)

public:
	ANNkd_tree(							// build skeleton tree
		int				n = 0,			// number of points
		int				dd = 0,			// dimension
		int				bs = 1);		// bucket size

	ANNkd_tree(							// build from point array
		ANNpointArray	pa,				// point array
		int				n,				// number of points
		int				dd,				// dimension
		int				bs = 1,			// bucket size
		ANNsplitRule	split = ANN_KD_SUGGEST);	// splitting method

	ANNkd_tree(							// build from dump file
		std::istream&	in);			// input stream for dump file

	~ANNkd_tree();						// tree destructor

	void annkSearch(					// approx k near neighbor search
		ANNpoint		q,				// query point
		int				k,				// number of near neighbors to return
		ANNidxArray		nn_idx,			// nearest neighbor array (modified)
		ANNdistArray	dd,				// dist to near neighbors (modified)
		double			eps=0.0);		// error bound

	void annkPriSearch(					// priority k near neighbor search
		ANNpoint		q,				// query point
		int				k,				// number of near neighbors to return
		ANNidxArray		nn_idx,			// nearest neighbor array (modified)
		ANNdistArray	dd,				// dist to near neighbors (modified)
		double			eps=0.0);		// error bound

	int annkFRSearch(					// approx fixed-radius kNN search
		ANNpoint		q,				// the query point
		ANNdist			sqRad,			// squared radius of query ball
		int				k,				// number of neighbors to return
		ANNidxArray		nn_idx = NULL,	// nearest neighbor array (modified)
		ANNdistArray	dd = NULL,		// dist to near neighbors (modified)
		double			eps=0.0);		// error bound

	//MFH 7/15/2015
	std::pair< std::vector<int>, std::vector<double> >
	annkFRSearch2(						// approx fixed-radius kNN search
		ANNpoint		q,				// the query point
		ANNdist			sqRad,			// squared radius of query ball
		double			eps=0.0);		// error bound

	int theDim()						// return dimension of space
		{ return dim; }

	int nPoints()						// return number of points
		{ return n_pts; }

	ANNpointArray thePoints()			// return pointer to points
		{ return pts; }

	virtual void Print(					// print the tree (for debugging)
		ANNbool			with_pts,		// print points as well?
		std::ostream&	out);			// output stream

	virtual void Dump(					// dump entire tree
		ANNbool			with_pts,		// print points as well?
std::ostream& out); // output stream virtual void getStats( // compute tree statistics ANNkdStats& st); // the statistics (modified) }; //---------------------------------------------------------------------- // Box decomposition tree (bd-tree) // The bd-tree is inherited from a kd-tree. The main difference // in the bd-tree and the kd-tree is a new type of internal node // called a shrinking node (in the kd-tree there is only one type // of internal node, a splitting node). The shrinking node // makes it possible to generate balanced trees in which the // cells have bounded aspect ratio, by allowing the decomposition // to zoom in on regions of dense point concentration. Although // this is a nice idea in theory, few point distributions are so // densely clustered that this is really needed. //---------------------------------------------------------------------- class DLL_API ANNbd_tree: public ANNkd_tree { public: ANNbd_tree( // build skeleton tree int n, // number of points int dd, // dimension int bs = 1) // bucket size : ANNkd_tree(n, dd, bs) {} // build base kd-tree ANNbd_tree( // build from point array ANNpointArray pa, // point array int n, // number of points int dd, // dimension int bs = 1, // bucket size ANNsplitRule split = ANN_KD_SUGGEST, // splitting rule ANNshrinkRule shrink = ANN_BD_SUGGEST); // shrinking rule ANNbd_tree( // build from dump file std::istream& in); // input stream for dump file }; //---------------------------------------------------------------------- // Other functions // annMaxPtsVisit Sets a limit on the maximum number of points // to visit in the search. // annClose Can be called when all use of ANN is finished. // It clears up a minor memory leak. //---------------------------------------------------------------------- DLL_API void annMaxPtsVisit( // max. 
pts to visit in search int maxPts); // the limit DLL_API void annClose(); // called to end use of ANN #endif ================================================ FILE: src/ANN/ANNperf.h ================================================ //---------------------------------------------------------------------- // File: ANNperf.h // Programmer: Sunil Arya and David Mount // Last modified: 03/04/98 (Release 0.1) // Description: Include file for ANN performance stats // // Some of the code for statistics gathering has been adapted // from the SmplStat.h package in the g++ library. //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.0 04/01/05 // Added ANN_ prefix to avoid name conflicts. //---------------------------------------------------------------------- #ifndef ANNperf_H #define ANNperf_H //---------------------------------------------------------------------- // basic includes //---------------------------------------------------------------------- #include "ANN.h" // basic ANN includes //---------------------------------------------------------------------- // kd-tree stats object // This object is used for collecting information about a kd-tree // or bd-tree. 
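As an aside on the search interface declared in ANN.h above: `annkFRSearch` takes a *squared* radius and allows the output buffers `nn_idx`/`dd` to be NULL, so a first call with `k = 0` merely counts the points inside the ball, and a second call with `k` set to that count retrieves them. The following is a self-contained brute-force sketch of that two-pass contract; the `fixedRadiusSearch` helper and its plain-`std::vector` types are illustrative stand-ins, not ANN code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in mimicking annkFRSearch's contract: return the
// TOTAL number of points whose squared distance to q is <= sqRad, and
// fill at most k entries of nn_idx/dists when buffers are supplied.
static int fixedRadiusSearch(const std::vector<std::vector<double>>& pts,
                             const std::vector<double>& q, double sqRad,
                             int k, std::vector<int>* nn_idx = nullptr,
                             std::vector<double>* dists = nullptr) {
    int found = 0;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double d2 = 0.0;
        for (std::size_t j = 0; j < q.size(); ++j) {
            double diff = pts[i][j] - q[j];
            d2 += diff * diff;              // squared Euclidean distance
        }
        if (d2 <= sqRad) {                  // inside the query ball
            if (found < k && nn_idx && dists) {
                nn_idx->push_back(static_cast<int>(i));
                dists->push_back(d2);       // record only the first k ...
            }
            ++found;                        // ... but always count all
        }
    }
    return found;
}
```

With ANN itself, the analogous pattern is to pass the squared radius (e.g. `eps * eps` for a radius-`eps` query) with `k = 0` to size the buffers, then call again with `k` set to the returned count.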
//----------------------------------------------------------------------

class ANNkdStats {			// stats on kd-tree
public:
	int		dim;			// dimension of space
	int		n_pts;			// no. of points
	int		bkt_size;		// bucket size
	int		n_lf;			// no. of leaves (including trivial)
	int		n_tl;			// no. of trivial leaves (no points)
	int		n_spl;			// no. of splitting nodes
	int		n_shr;			// no. of shrinking nodes (for bd-trees)
	int		depth;			// depth of tree
	float	sum_ar;			// sum of leaf aspect ratios
	float	avg_ar;			// average leaf aspect ratio

	void reset(int d=0, int n=0, int bs=0)	// reset stats
	{
		dim = d; n_pts = n; bkt_size = bs;
		n_lf = n_tl = n_spl = n_shr = depth = 0;
		sum_ar = avg_ar = 0.0;
	}

	ANNkdStats()			// basic constructor
	{ reset(); }

	void merge(const ANNkdStats &st);	// merge stats from child
};

//----------------------------------------------------------------------
//	ANNsampStat
//		A sample stat collects numeric (double) samples and returns some
//		simple statistics.  Its main functions are:
//
//		reset()		Reset to no samples.
//		+= x		Include sample x.
//		samples()	Return number of samples.
//		mean()		Return mean of samples.
//		stdDev()	Return standard deviation
//		min()		Return minimum of samples.
//		max()		Return maximum of samples.
//----------------------------------------------------------------------

class DLL_API ANNsampStat {
	int			n;				// number of samples
	double		sum;			// sum
	double		sum2;			// sum of squares
	double		minVal, maxVal;	// min and max
public:
	void reset()				// reset everything
	{
		n = 0;
		sum = sum2 = 0;
		minVal = ANN_DBL_MAX; maxVal = -ANN_DBL_MAX;
	}

	ANNsampStat() { reset(); }	// constructor

	void operator+=(double x)	// add sample
	{
		n++; sum += x; sum2 += x*x;
		if (x < minVal) minVal = x;
		if (x > maxVal) maxVal = x;
	}

	int samples() { return n; }			// number of samples

	double mean() { return sum/n; }		// mean

										// standard deviation
	double stdDev() { return std::sqrt((sum2 - (sum*sum)/n)/(n-1)); }

	double min() { return minVal; }		// minimum
	double max() { return maxVal; }		// maximum
};

//----------------------------------------------------------------------
//		Operation count updates
//----------------------------------------------------------------------
#ifdef ANN_PERF
#define ANN_FLOP(n)		{ann_Nfloat_ops += (n);}
#define ANN_LEAF(n)		{ann_Nvisit_lfs += (n);}
#define ANN_SPL(n)		{ann_Nvisit_spl += (n);}
#define ANN_SHR(n)		{ann_Nvisit_shr += (n);}
#define ANN_PTS(n)		{ann_Nvisit_pts += (n);}
#define ANN_COORD(n)	{ann_Ncoord_hts += (n);}
#else
#define ANN_FLOP(n)
#define ANN_LEAF(n)
#define ANN_SPL(n)
#define ANN_SHR(n)
#define ANN_PTS(n)
#define ANN_COORD(n)
#endif

//----------------------------------------------------------------------
//	Performance statistics
//	The following data and routines are used for computing performance
//	statistics for nearest neighbor searching.  Because these routines
//	can slow the code down, they can be activated and deactivated by
//	defining the ANN_PERF variable, by compiling with the option:
//	-DANN_PERF
//----------------------------------------------------------------------

//----------------------------------------------------------------------
//	Global counters for performance measurement
//
//	visit_lfs	The number of leaf nodes visited in the
//				tree.
//
//	visit_spl	The number of splitting nodes visited in the
//				tree.
//
//	visit_shr	The number of shrinking nodes visited in the
//				tree.
//
//	visit_pts	The number of points visited in all the
//				leaf nodes visited.  Equivalently, this
//				is the number of points for which distance
//				calculations are performed.
//
//	coord_hts	The number of times a coordinate of a
//				data point is accessed.  This is generally
//				less than visit_pts*d if partial distance
//				calculation is used.  This count is low
//				in the sense that if a coordinate is hit
//				many times in the same routine we may
//				count it only once.
//
//	float_ops	The number of floating point operations.
//				This includes all operations in the heap
//				as well as distance calculations to boxes.
//
//	average_err	The average error of each query (the
//				error of the reported point to the true
//				nearest neighbor).  For k nearest neighbors
//				the error is computed k times.
//
//	rank_err	The rank error of each query (the difference
//				in the rank of the reported point and its
//				true rank).
//
//	data_pts	The number of data points.  This is not
//				a counter, but used in stats computation.
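`ANNsampStat` above recovers the mean and the sample standard deviation purely from the running count `n`, sum, and sum of squares: mean = sum/n and stdDev = sqrt((sum2 - sum²/n)/(n-1)). A minimal self-contained sketch of that bookkeeping (the `SampStat` name and layout are made up for illustration; this is not the ANN class):

```cpp
#include <cassert>
#include <cmath>

// Illustrative re-implementation of the running-sum bookkeeping used by
// ANNsampStat: mean and sample standard deviation are recovered from
// n, sum, and sum of squares, without storing the individual samples.
struct SampStat {
    int n = 0;
    double sum = 0.0;
    double sum2 = 0.0;

    void add(double x) { ++n; sum += x; sum2 += x * x; }

    double mean() const { return sum / n; }

    double stdDev() const {                 // sample std dev (n-1 denominator)
        return std::sqrt((sum2 - (sum * sum) / n) / (n - 1));
    }
};
```

One caveat worth knowing as a general numerical fact: the sum-of-squares formula can suffer cancellation when the mean is large relative to the spread, which is acceptable here since these are coarse performance counters.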
//----------------------------------------------------------------------

extern int			ann_Ndata_pts;	// number of data points
extern int			ann_Nvisit_lfs;	// number of leaf nodes visited
extern int			ann_Nvisit_spl;	// number of splitting nodes visited
extern int			ann_Nvisit_shr;	// number of shrinking nodes visited
extern int			ann_Nvisit_pts;	// visited points for one query
extern int			ann_Ncoord_hts;	// coordinate hits for one query
extern int			ann_Nfloat_ops;	// floating ops for one query

extern ANNsampStat	ann_visit_lfs;	// stats on leaf nodes visits
extern ANNsampStat	ann_visit_spl;	// stats on splitting nodes visits
extern ANNsampStat	ann_visit_shr;	// stats on shrinking nodes visits
extern ANNsampStat	ann_visit_nds;	// stats on total nodes visits
extern ANNsampStat	ann_visit_pts;	// stats on points visited
extern ANNsampStat	ann_coord_hts;	// stats on coordinate hits
extern ANNsampStat	ann_float_ops;	// stats on floating ops

//----------------------------------------------------------------------
//	The following need to be part of the public interface, because
//	they are accessed outside the DLL in ann_test.cpp.
//----------------------------------------------------------------------

DLL_API extern ANNsampStat ann_average_err;	// average error
DLL_API extern ANNsampStat ann_rank_err;	// rank error

//----------------------------------------------------------------------
//	Declaration of externally accessible routines for statistics
//----------------------------------------------------------------------

DLL_API void annResetStats(int data_size);	// reset stats for a set of queries
DLL_API void annResetCounts();				// reset counts for one query
DLL_API void annUpdateStats();				// update stats with current counts
DLL_API void annPrintStats(ANNbool validate);	// print statistics for a run

#endif

================================================
FILE: src/ANN/ANNx.h
================================================
//----------------------------------------------------------------------
// File:			ANNx.h
// Programmer:		Sunil Arya and David Mount
// Last modified:	03/04/98 (Release 0.1)
// Description:		Internal include file for ANN
//
//	These declarations are of use in manipulating some of
//	the internal data objects appearing in ANN, but are not
//	needed for applications just using the nearest neighbor
//	search.
//
//	Typical users of ANN should not need to access this file.
//----------------------------------------------------------------------
// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and
// David Mount.  All Rights Reserved.
//
// This software and related documentation is part of the Approximate
// Nearest Neighbor Library (ANN).  This software is provided under
// the provisions of the Lesser GNU Public License (LGPL).  See the
// file ../ReadMe.txt for further information.
//
// The University of Maryland (U.M.) and the authors make no
// representations about the suitability or fitness of this software for
// any purpose.  It is provided "as is" without express or implied
// warranty.
//----------------------------------------------------------------------
// History:
//	Revision 0.1  03/04/98
//		Initial release
//	Revision 1.0  04/01/05
//		Changed LO, HI, IN, OUT to ANN_LO, ANN_HI, etc.
//----------------------------------------------------------------------

#ifndef ANNx_H
#define ANNx_H

#include <iomanip>			// I/O manipulators
#include "ANN.h"			// ANN includes

//----------------------------------------------------------------------
//	Global constants and types
//----------------------------------------------------------------------

enum	{ANN_LO=0, ANN_HI=1};	// splitting indices
enum	{ANN_IN=0, ANN_OUT=1};	// shrinking indices

enum ANNerr {ANNwarn = 0, ANNabort = 1};	// what to do in case of error

//----------------------------------------------------------------------
//	Maximum number of points to visit
//	We have an option for terminating the search early if the
//	number of points visited exceeds some threshold.  If the
//	threshold is 0 (its default) this means there is no limit
//	and the algorithm applies its normal termination condition.
//----------------------------------------------------------------------

extern int	ANNmaxPtsVisited;	// maximum number of pts visited
extern int	ANNptsVisited;		// number of pts visited in search

//----------------------------------------------------------------------
//	Global function declarations
//----------------------------------------------------------------------

void annError(				// ANN error routine
	const char	*msg,		// error message
	ANNerr		level);		// level of error

void annPrintPt(			// print a point
	ANNpoint	pt,			// the point
	int			dim,		// the dimension
	std::ostream &out);		// output stream

//----------------------------------------------------------------------
//	Orthogonal (axis aligned) rectangle
//		Orthogonal rectangles are represented by two points, one
//		for the lower left corner (min coordinates) and the other
//		for the upper right corner (max coordinates).
//
//		The constructor initializes from either a pair of coordinates,
//		pair of points, or another rectangle.  Note that all constructors
//		allocate new point storage.  The destructor deallocates this
//		storage.
//
//		BEWARE: Orthogonal rectangles should be passed ONLY BY REFERENCE.
//		(C++'s default copy constructor will not allocate new point
//		storage, then on return the destructor free's storage, and then
//		you get into big trouble in the calling procedure.)
//----------------------------------------------------------------------

class ANNorthRect {
public:
	ANNpoint	lo;			// rectangle lower bounds
	ANNpoint	hi;			// rectangle upper bounds

	ANNorthRect(			// basic constructor
		int			dd,		// dimension of space
		ANNcoord	l=0,	// default is empty
		ANNcoord	h=0)
		{ lo = annAllocPt(dd, l); hi = annAllocPt(dd, h); }

	ANNorthRect(			// (almost a) copy constructor
		int			dd,		// dimension
		const ANNorthRect &r)	// rectangle to copy
		{ lo = annCopyPt(dd, r.lo); hi = annCopyPt(dd, r.hi); }

	ANNorthRect(			// construct from points
		int			dd,		// dimension
		ANNpoint	l,		// low point
		ANNpoint	h)		// high point
		{ lo = annCopyPt(dd, l); hi = annCopyPt(dd, h); }

	~ANNorthRect()			// destructor
		{ annDeallocPt(lo); annDeallocPt(hi); }

	ANNbool inside(int dim, ANNpoint p);	// is point p inside rectangle?
};

void annAssignRect(			// assign one rect to another
	int				dim,	// dimension (both must be same)
	ANNorthRect		&dest,	// destination (modified)
	const ANNorthRect &source);	// source

//----------------------------------------------------------------------
//	Orthogonal (axis aligned) halfspace
//	An orthogonal halfspace is represented by an integer cutting
//	dimension cd, coordinate cutting value, cv, and side, sd, which is
//	either +1 or -1.  Our convention is that point q lies in the (closed)
//	halfspace if (q[cd] - cv)*sd >= 0.
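The membership convention stated above, that point q lies in the closed halfspace iff (q[cd] - cv)*sd >= 0, can be exercised in a few lines of standalone code. Plain `std::vector<double>` stands in for `ANNpoint`, and the `HalfSpace` struct is illustrative only, not the ANN class:

```cpp
#include <cassert>
#include <vector>

// Sketch of the orthogonal-halfspace membership test described above.
struct HalfSpace {
    int cd;     // cutting dimension
    double cv;  // cutting value
    int sd;     // side: +1 keeps coordinates >= cv, -1 keeps <= cv

    // q lies in the CLOSED halfspace iff (q[cd] - cv) * sd >= 0,
    // so boundary points (q[cd] == cv) are always inside.
    bool in(const std::vector<double>& q) const {
        return (q[cd] - cv) * sd >= 0;
    }
};
```

The sign trick lets one struct represent both the upper and lower bound along a splitting dimension, which is how the class below uses it in `setLowerBound` (sd = +1) and `setUpperBound` (sd = -1).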
//----------------------------------------------------------------------

class ANNorthHalfSpace {
public:
	int			cd;			// cutting dimension
	ANNcoord	cv;			// cutting value
	int			sd;			// which side

	ANNorthHalfSpace()		// default constructor
		{ cd = 0; cv = 0; sd = 0; }

	ANNorthHalfSpace(		// basic constructor
		int			cdd,	// dimension of space
		ANNcoord	cvv,	// cutting value
		int			sdd)	// side
		{ cd = cdd; cv = cvv; sd = sdd; }

	ANNbool in(ANNpoint q) const	// is q inside halfspace?
		{ return (ANNbool) ((q[cd] - cv)*sd >= 0); }

	ANNbool out(ANNpoint q) const	// is q outside halfspace?
		{ return (ANNbool) ((q[cd] - cv)*sd < 0); }

	ANNdist dist(ANNpoint q) const	// (squared) distance from q
		{ return (ANNdist) ANN_POW(q[cd] - cv); }

	void setLowerBound(int d, ANNpoint p)	// set to lower bound at p[i]
		{ cd = d; cv = p[d]; sd = +1; }

	void setUpperBound(int d, ANNpoint p)	// set to upper bound at p[i]
		{ cd = d; cv = p[d]; sd = -1; }

	void project(ANNpoint &q)	// project q (modified) onto halfspace
		{ if (out(q)) q[cd] = cv; }
};

typedef ANNorthHalfSpace *ANNorthHSArray;	// array of halfspaces

#endif

================================================
FILE: src/ANN/Copyright.txt
================================================
ANN: Approximate Nearest Neighbors
Version: 1.1
Release Date: May 3, 2005
----------------------------------------------------------------------------
Copyright (c) 1997-2005 University of Maryland and Sunil Arya and David Mount
All Rights Reserved.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser Public License as published by
the Free Software Foundation; either version 2.1 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser Public License for more details.
A copy of the terms and conditions of the license can be found in
License.txt or online at

    http://www.gnu.org/copyleft/lesser.html

To obtain a copy, write to the Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Disclaimer
----------
The University of Maryland and the authors make no representations
about the suitability or fitness of this software for any purpose.
It is provided "as is" without express or implied warranty.
---------------------------------------------------------------------

Authors
-------
David Mount
Dept of Computer Science
University of Maryland, College Park, MD 20742 USA
mount@cs.umd.edu
http://www.cs.umd.edu/~mount/

Sunil Arya
Dept of Computer Science
Hong Kong University of Science and Technology
Clearwater Bay, HONG KONG
arya@cs.ust.hk
http://www.cs.ust.hk/faculty/arya/

================================================
FILE: src/ANN/License.txt
================================================
----------------------------------------------------------------------
The ANN Library (all versions) is provided under the terms and
conditions of the GNU Lesser General Public License, which is stated
below.  It can also be found at:

    http://www.gnu.org/copyleft/lesser.html
----------------------------------------------------------------------

                  GNU LESSER GENERAL PUBLIC LICENSE
                       Version 2.1, February 1999

 Copyright (C) 1991, 1999 Free Software Foundation, Inc.
 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

[This is the first released version of the Lesser GPL.  It also counts
 as the successor of the GNU Library Public License, version 2, hence
 the version number 2.1.]

                            Preamble

  The licenses for most software are designed to take away your
freedom to share and change it.
By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below. When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things. To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it. For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights. We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library. 
To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others. Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license. Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs. When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library. We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances. 
  For example, on rare occasions, there may be a special need to
encourage the widest possible use of a certain library, so that it
becomes a de-facto standard.  To achieve this, non-free programs must
be allowed to use the library.  A more frequent case is that a free
library does the same job as widely used non-free libraries.  In this
case, there is little to gain by limiting the free library to free
software only, so we use the Lesser General Public License.

  In other cases, permission to use a particular library in non-free
programs enables a greater number of people to use a large body of
free software.  For example, permission to use the GNU C Library in
non-free programs enables many more people to use the whole GNU
operating system, as well as its variant, the GNU/Linux operating
system.

  Although the Lesser General Public License is Less protective of the
users' freedom, it does ensure that the user of a program that is
linked with the Library has the freedom and the wherewithal to run
that program using a modified version of the Library.

  The precise terms and conditions for copying, distribution and
modification follow.  Pay close attention to the difference between a
"work based on the library" and a "work that uses the library".  The
former contains code derived from the library, whereas the latter must
be combined with the library in order to run.

  TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

  0. This License Agreement applies to any software library or other
program which contains a notice placed by the copyright holder or
other authorized party saying it may be distributed under the terms of
this Lesser General Public License (also called "this License").  Each
licensee is addressed as "you".

  A "library" means a collection of software functions and/or data
prepared so as to be conveniently linked with application programs
(which use some of those functions and data) to form executables.
The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".) "Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library. Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does. 1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. 
You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) The modified work must itself be a software library. b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change. c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License. d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful. (For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. 
Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library. In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices. Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy. This option is useful when you wish to copy part of the code of the Library into a program that is not a library. 4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange. If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code. 5. 
A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License. However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables. When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law. If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.) Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself. 6. 
As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications. You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things: a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.) b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with. 
c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution. d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place. e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy. For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute. 7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things: a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above. 
b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work. 8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it. 10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License. 11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. 
If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 13. 
The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation. 14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ================================================ FILE: src/ANN/ReadMe.txt ================================================ ANN: Approximate Nearest Neighbors Version: 1.1 Release date: May 3, 2005 ---------------------------------------------------------------------------- Copyright (c) 1997-2005 University of Maryland and Sunil Arya and David Mount. All Rights Reserved. See Copyright.txt and License.txt for complete information on terms and conditions of use and distribution of this software. ---------------------------------------------------------------------------- Authors ------- David Mount Dept of Computer Science University of Maryland, College Park, MD 20742 USA mount@cs.umd.edu http://www.cs.umd.edu/~mount/ Sunil Arya Dept of Computer Science Hong Kong University of Science and Technology Clearwater Bay, HONG KONG arya@cs.ust.hk http://www.cs.ust.hk/faculty/arya/ Introduction ------------ ANN is a library written in the C++ programming language to support both exact and approximate nearest neighbor searching in spaces of various dimensions. It was implemented by David M. Mount of the University of Maryland, and Sunil Arya of the Hong Kong University of Science and Technology. ANN (pronounced like the name ``Ann'') stands for Approximate Nearest Neighbors.
ANN is also a testbed containing programs and procedures for generating data sets, collecting and analyzing statistics on the performance of nearest neighbor algorithms and data structures, and visualizing the geometric structure of these data structures. The ANN source code and documentation is available from the following web page: http://www.cs.umd.edu/~mount/ANN For more information on ANN and its use, see the ``ANN Programming Manual,'' which is provided with the software distribution. ---------------------------------------------------------------------------- History Version 0.1 03/04/98 Preliminary release Version 0.2 06/24/98 Changes for SGI compiler. Version 1.0 04/01/05 Fixed a number of small bugs Added dump/load operations Added annClose to eliminate minor memory leak Improved compatibility with current C++ compilers Added compilation for Microsoft Visual Studio.NET Added compilation for Linux 2.x Version 1.1 05/03/05 Added make target for Mac OS X Added fixed-radius range searching and counting Added instructions on compiling/using ANN on Windows platforms Fixed minor output bug in ann2fig ================================================ FILE: src/ANN/bd_fix_rad_search.cpp ================================================ //---------------------------------------------------------------------- // File: bd_fix_rad_search.cpp // Programmer: David Mount // Description: Standard bd-tree search // Last modified: 05/03/05 (Version 1.1) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) 
and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 1.1 05/03/05 // Initial release //---------------------------------------------------------------------- #include "bd_tree.h" // bd-tree declarations #include "kd_fix_rad_search.h" // kd-tree FR search declarations //---------------------------------------------------------------------- // Approximate searching for bd-trees. // See the file kd_FR_search.cpp for general information on the // approximate nearest neighbor search algorithm. Here we // include the extensions for shrinking nodes. //---------------------------------------------------------------------- //---------------------------------------------------------------------- // bd_shrink::ann_FR_search - search a shrinking node //---------------------------------------------------------------------- void ANNbd_shrink::ann_FR_search(ANNdist box_dist) { // check dist calc term cond. if (ANNmaxPtsVisited != 0 && ANNptsVisited > ANNmaxPtsVisited) return; ANNdist inner_dist = 0; // distance to inner box for (int i = 0; i < n_bnds; i++) { // is query point in the box? if (bnds[i].out(ANNkdFRQ)) { // outside this bounding side? 
// add to inner distance inner_dist = (ANNdist) ANN_SUM(inner_dist, bnds[i].dist(ANNkdFRQ)); } } if (inner_dist <= box_dist) { // if inner box is closer child[ANN_IN]->ann_FR_search(inner_dist);// search inner child first child[ANN_OUT]->ann_FR_search(box_dist);// ...then outer child } else { // if outer box is closer child[ANN_OUT]->ann_FR_search(box_dist);// search outer child first child[ANN_IN]->ann_FR_search(inner_dist);// ...then inner child } ANN_FLOP(3*n_bnds) // increment floating ops ANN_SHR(1) // one more shrinking node } ================================================ FILE: src/ANN/bd_pr_search.cpp ================================================ //---------------------------------------------------------------------- // File: bd_pr_search.cpp // Programmer: David Mount // Description: Priority search for bd-trees // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- //History: // Revision 0.1 03/04/98 // Initial release //---------------------------------------------------------------------- #include "bd_tree.h" // bd-tree declarations #include "kd_pr_search.h" // kd priority search declarations //---------------------------------------------------------------------- // Approximate priority searching for bd-trees.
// See the file kd_pr_search.cc for general information on the // approximate nearest neighbor priority search algorithm. Here // we include the extensions for shrinking nodes. //---------------------------------------------------------------------- //---------------------------------------------------------------------- // bd_shrink::ann_pri_search - search a shrinking node //---------------------------------------------------------------------- void ANNbd_shrink::ann_pri_search(ANNdist box_dist) { ANNdist inner_dist = 0; // distance to inner box for (int i = 0; i < n_bnds; i++) { // is query point in the box? if (bnds[i].out(ANNprQ)) { // outside this bounding side? // add to inner distance inner_dist = (ANNdist) ANN_SUM(inner_dist, bnds[i].dist(ANNprQ)); } } if (inner_dist <= box_dist) { // if inner box is closer if (child[ANN_OUT] != KD_TRIVIAL) // enqueue outer if not trivial ANNprBoxPQ->insert(box_dist,child[ANN_OUT]); // continue with inner child child[ANN_IN]->ann_pri_search(inner_dist); } else { // if outer box is closer if (child[ANN_IN] != KD_TRIVIAL) // enqueue inner if not trivial ANNprBoxPQ->insert(inner_dist,child[ANN_IN]); // continue with outer child child[ANN_OUT]->ann_pri_search(box_dist); } ANN_FLOP(3*n_bnds) // increment floating ops ANN_SHR(1) // one more shrinking node } ================================================ FILE: src/ANN/bd_search.cpp ================================================ //---------------------------------------------------------------------- // File: bd_search.cpp // Programmer: David Mount // Description: Standard bd-tree search // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN).
This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release //---------------------------------------------------------------------- #include "bd_tree.h" // bd-tree declarations #include "kd_search.h" // kd-tree search declarations //---------------------------------------------------------------------- // Approximate searching for bd-trees. // See the file kd_search.cpp for general information on the // approximate nearest neighbor search algorithm. Here we // include the extensions for shrinking nodes. //---------------------------------------------------------------------- //---------------------------------------------------------------------- // bd_shrink::ann_search - search a shrinking node //---------------------------------------------------------------------- void ANNbd_shrink::ann_search(ANNdist box_dist) { // check dist calc term cond. if (ANNmaxPtsVisited != 0 && ANNptsVisited > ANNmaxPtsVisited) return; ANNdist inner_dist = 0; // distance to inner box for (int i = 0; i < n_bnds; i++) { // is query point in the box? if (bnds[i].out(ANNkdQ)) { // outside this bounding side? 
// add to inner distance inner_dist = (ANNdist) ANN_SUM(inner_dist, bnds[i].dist(ANNkdQ)); } } if (inner_dist <= box_dist) { // if inner box is closer child[ANN_IN]->ann_search(inner_dist); // search inner child first child[ANN_OUT]->ann_search(box_dist); // ...then outer child } else { // if outer box is closer child[ANN_OUT]->ann_search(box_dist); // search outer child first child[ANN_IN]->ann_search(inner_dist); // ...then inner child } ANN_FLOP(3*n_bnds) // increment floating ops ANN_SHR(1) // one more shrinking node } ================================================ FILE: src/ANN/bd_tree.cpp ================================================ //---------------------------------------------------------------------- // File: bd_tree.cpp // Programmer: David Mount // Description: Basic methods for bd-trees. // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.0 04/01/05 // Fixed centroid shrink threshold condition to depend on the // dimension. // Moved dump routine to kd_dump.cpp.
//---------------------------------------------------------------------- #include "bd_tree.h" // bd-tree declarations #include "kd_util.h" // kd-tree utilities #include "kd_split.h" // kd-tree splitting rules #include "ANNperf.h" // performance evaluation //---------------------------------------------------------------------- // Printing a bd-tree // These routines print a bd-tree. See the analogous procedure // in kd_tree.cpp for more information. //---------------------------------------------------------------------- void ANNbd_shrink::print( // print shrinking node int level, // depth of node in tree ostream &out) // output stream { child[ANN_OUT]->print(level+1, out); // print out-child out << " "; for (int i = 0; i < level; i++) // print indentation out << ".."; out << "Shrink"; for (int j = 0; j < n_bnds; j++) { // print sides, 2 per line if (j % 2 == 0) { out << "\n"; // newline and indentation for (int i = 0; i < level+2; i++) out << " "; } out << " ([" << bnds[j].cd << "]" << (bnds[j].sd > 0 ? ">=" : "< ") << bnds[j].cv << ")"; } out << "\n"; child[ANN_IN]->print(level+1, out); // print in-child } //---------------------------------------------------------------------- // kd_tree statistics utility (for performance evaluation) // This routine computes various statistics information for // shrinking nodes. See file kd_tree.cpp for more information. 
//---------------------------------------------------------------------- void ANNbd_shrink::getStats( // get subtree statistics int dim, // dimension of space ANNkdStats &st, // stats (modified) ANNorthRect &bnd_box) // bounding box { ANNkdStats ch_stats; // stats for children ANNorthRect inner_box(dim); // inner box of shrink annBnds2Box(bnd_box, // enclosing box dim, // dimension n_bnds, // number of bounds bnds, // bounds array inner_box); // inner box (modified) // get stats for inner child ch_stats.reset(); // reset child[ANN_IN]->getStats(dim, ch_stats, inner_box); st.merge(ch_stats); // merge them // get stats for outer child ch_stats.reset(); // reset child[ANN_OUT]->getStats(dim, ch_stats, bnd_box); st.merge(ch_stats); // merge them st.depth++; // increment depth st.n_shr++; // increment number of shrinks } //---------------------------------------------------------------------- // bd-tree constructor // This is the main constructor for bd-trees given a set of points. // It first builds a skeleton kd-tree as a basis, then computes the // bounding box of the data points, and then invokes rbd_tree() to // actually build the tree, passing it the appropriate splitting // and shrinking information. 
//---------------------------------------------------------------------- ANNkd_ptr rbd_tree( // recursive construction of bd-tree ANNpointArray pa, // point array ANNidxArray pidx, // point indices to store in subtree int n, // number of points int dim, // dimension of space int bsp, // bucket space ANNorthRect &bnd_box, // bounding box for current node ANNkd_splitter splitter, // splitting routine ANNshrinkRule shrink); // shrinking rule ANNbd_tree::ANNbd_tree( // construct from point array ANNpointArray pa, // point array (with at least n pts) int n, // number of points int dd, // dimension int bs, // bucket size ANNsplitRule split, // splitting rule ANNshrinkRule shrink) // shrinking rule : ANNkd_tree(n, dd, bs) // build skeleton base tree { pts = pa; // where the points are if (n == 0) return; // no points--no sweat ANNorthRect bnd_box(dd); // bounding box for points // construct bounding rectangle annEnclRect(pa, pidx, n, dd, bnd_box); // copy to tree structure bnd_box_lo = annCopyPt(dd, bnd_box.lo); bnd_box_hi = annCopyPt(dd, bnd_box.hi); switch (split) { // build by rule case ANN_KD_STD: // standard kd-splitting rule root = rbd_tree(pa, pidx, n, dd, bs, bnd_box, kd_split, shrink); break; case ANN_KD_MIDPT: // midpoint split root = rbd_tree(pa, pidx, n, dd, bs, bnd_box, midpt_split, shrink); break; case ANN_KD_SUGGEST: // best (in our opinion) case ANN_KD_SL_MIDPT: // sliding midpoint split root = rbd_tree(pa, pidx, n, dd, bs, bnd_box, sl_midpt_split, shrink); break; case ANN_KD_FAIR: // fair split root = rbd_tree(pa, pidx, n, dd, bs, bnd_box, fair_split, shrink); break; case ANN_KD_SL_FAIR: // sliding fair split root = rbd_tree(pa, pidx, n, dd, bs, bnd_box, sl_fair_split, shrink); break; default: annError("Illegal splitting method", ANNabort); } } //---------------------------------------------------------------------- // Shrinking rules //---------------------------------------------------------------------- enum ANNdecomp {SPLIT, SHRINK}; // decomposition 
methods //---------------------------------------------------------------------- // trySimpleShrink - Attempt a simple shrink // // We compute the tight bounding box of the points, and compute // the 2*dim ``gaps'' between the sides of the tight box and the // bounding box. If any of the gaps is large enough relative to // the longest side of the tight bounding box, then we shrink // all sides whose gaps are large enough. (The reason for // comparing against the tight bounding box, is that after // shrinking the longest box size will decrease, and if we use // the standard bounding box, we may decide to shrink twice in // a row. Since the tight box is fixed, we cannot shrink twice // consecutively.) //---------------------------------------------------------------------- const float BD_GAP_THRESH = 0.5; // gap threshold (must be < 1) const int BD_CT_THRESH = 2; // min number of shrink sides ANNdecomp trySimpleShrink( // try a simple shrink ANNpointArray pa, // point array ANNidxArray pidx, // point indices to store in subtree int n, // number of points int dim, // dimension of space const ANNorthRect &bnd_box, // current bounding box ANNorthRect &inner_box) // inner box if shrinking (returned) { int i; // compute tight bounding box annEnclRect(pa, pidx, n, dim, inner_box); ANNcoord max_length = 0; // find longest box side for (i = 0; i < dim; i++) { ANNcoord length = inner_box.hi[i] - inner_box.lo[i]; if (length > max_length) { max_length = length; } } int shrink_ct = 0; // number of sides we shrunk for (i = 0; i < dim; i++) { // select which sides to shrink // gap between boxes ANNcoord gap_hi = bnd_box.hi[i] - inner_box.hi[i]; // big enough gap to shrink? 
if (gap_hi < max_length*BD_GAP_THRESH) inner_box.hi[i] = bnd_box.hi[i]; // no - expand else shrink_ct++; // yes - shrink this side // repeat for low side ANNcoord gap_lo = inner_box.lo[i] - bnd_box.lo[i]; if (gap_lo < max_length*BD_GAP_THRESH) inner_box.lo[i] = bnd_box.lo[i]; // no - expand else shrink_ct++; // yes - shrink this side } if (shrink_ct >= BD_CT_THRESH) // did we shrink enough sides? return SHRINK; else return SPLIT; } //---------------------------------------------------------------------- // tryCentroidShrink - Attempt a centroid shrink // // We repeatedly apply the splitting rule, always to the larger subset // of points, until the number of points decreases by the constant // fraction BD_FRACTION. If this takes more than dim*BD_MAX_SPLIT_FAC // splits for this to happen, then we shrink to the final inner box. // Otherwise we split. //---------------------------------------------------------------------- const float BD_MAX_SPLIT_FAC = 0.5; // maximum number of splits allowed const float BD_FRACTION = 0.5; // ...to reduce points by this fraction // ...This must be < 1.
ANNdecomp tryCentroidShrink( // try a centroid shrink ANNpointArray pa, // point array ANNidxArray pidx, // point indices to store in subtree int n, // number of points int dim, // dimension of space const ANNorthRect &bnd_box, // current bounding box ANNkd_splitter splitter, // splitting procedure ANNorthRect &inner_box) // inner box if shrinking (returned) { int n_sub = n; // number of points in subset int n_goal = (int) (n*BD_FRACTION); // number of points in goal int n_splits = 0; // number of splits needed // initialize inner box to bounding box annAssignRect(dim, inner_box, bnd_box); while (n_sub > n_goal) { // keep splitting until goal reached int cd; // cut dim from splitter (ignored) ANNcoord cv; // cut value from splitter (ignored) int n_lo; // number of points on low side // invoke splitting procedure (*splitter)(pa, pidx, inner_box, n_sub, dim, cd, cv, n_lo); n_splits++; // increment split count if (n_lo >= n_sub/2) { // most points on low side inner_box.hi[cd] = cv; // collapse high side n_sub = n_lo; // recurse on lower points } else { // most points on high side inner_box.lo[cd] = cv; // collapse low side pidx += n_lo; // recurse on higher points n_sub -= n_lo; } } if (n_splits > dim*BD_MAX_SPLIT_FAC)// took too many splits return SHRINK; // shrink to final subset else return SPLIT; } //---------------------------------------------------------------------- // selectDecomp - select which decomposition to use //---------------------------------------------------------------------- ANNdecomp selectDecomp( // select decomposition method ANNpointArray pa, // point array ANNidxArray pidx, // point indices to store in subtree int n, // number of points int dim, // dimension of space const ANNorthRect &bnd_box, // current bounding box ANNkd_splitter splitter, // splitting procedure ANNshrinkRule shrink, // shrinking rule ANNorthRect &inner_box) // inner box if shrinking (returned) { ANNdecomp decomp = SPLIT; // decomposition switch (shrink) { // check
shrinking rule case ANN_BD_NONE: // no shrinking allowed decomp = SPLIT; break; case ANN_BD_SUGGEST: // author's suggestion case ANN_BD_SIMPLE: // simple shrink decomp = trySimpleShrink( pa, pidx, // points and indices n, dim, // number of points and dimension bnd_box, // current bounding box inner_box); // inner box if shrinking (returned) break; case ANN_BD_CENTROID: // centroid shrink decomp = tryCentroidShrink( pa, pidx, // points and indices n, dim, // number of points and dimension bnd_box, // current bounding box splitter, // splitting procedure inner_box); // inner box if shrinking (returned) break; default: annError("Illegal shrinking rule", ANNabort); } return decomp; } //---------------------------------------------------------------------- // rbd_tree - recursive procedure to build a bd-tree // // This is analogous to rkd_tree, but for bd-trees. See the // procedure rkd_tree() in kd_split.cpp for more information. // // If the number of points falls below the bucket size, then a // leaf node is created for the points. Otherwise we invoke the // procedure selectDecomp() which determines whether we are to // split or shrink. If splitting is chosen, then we essentially // do exactly as rkd_tree() would, and invoke the specified // splitting procedure to the points. Otherwise, the selection // procedure returns a bounding box, from which we extract the // appropriate shrinking bounds, and create a shrinking node. // Finally the points are subdivided, and the procedure is // invoked recursively on the two subsets to form the children. 
//---------------------------------------------------------------------- ANNkd_ptr rbd_tree( // recursive construction of bd-tree ANNpointArray pa, // point array ANNidxArray pidx, // point indices to store in subtree int n, // number of points int dim, // dimension of space int bsp, // bucket space ANNorthRect &bnd_box, // bounding box for current node ANNkd_splitter splitter, // splitting routine ANNshrinkRule shrink) // shrinking rule { ANNdecomp decomp; // decomposition method ANNorthRect inner_box(dim); // inner box (if shrinking) if (n <= bsp) { // n small, make a leaf node if (n == 0) // empty leaf node return KD_TRIVIAL; // return (canonical) empty leaf else // construct the node and return return new ANNkd_leaf(n, pidx); } decomp = selectDecomp( // select decomposition method pa, pidx, // points and indices n, dim, // number of points and dimension bnd_box, // current bounding box splitter, shrink, // splitting/shrinking methods inner_box); // inner box if shrinking (returned) if (decomp == SPLIT) { // split selected int cd; // cutting dimension ANNcoord cv; // cutting value int n_lo; // number on low side of cut // invoke splitting procedure (*splitter)(pa, pidx, bnd_box, n, dim, cd, cv, n_lo); ANNcoord lv = bnd_box.lo[cd]; // save bounds for cutting dimension ANNcoord hv = bnd_box.hi[cd]; bnd_box.hi[cd] = cv; // modify bounds for left subtree ANNkd_ptr lo = rbd_tree( // build left subtree pa, pidx, n_lo, // ...from pidx[0..n_lo-1] dim, bsp, bnd_box, splitter, shrink); bnd_box.hi[cd] = hv; // restore bounds bnd_box.lo[cd] = cv; // modify bounds for right subtree ANNkd_ptr hi = rbd_tree( // build right subtree pa, pidx + n_lo, n-n_lo,// ...from pidx[n_lo..n-1] dim, bsp, bnd_box, splitter, shrink); bnd_box.lo[cd] = lv; // restore bounds // create the splitting node return new ANNkd_split(cd, cv, lv, hv, lo, hi); } else { // shrink selected int n_in; // number of points in box int n_bnds; // number of bounding sides annBoxSplit( // split points around inner 
box pa, // points to split pidx, // point indices n, // number of points dim, // dimension inner_box, // inner box n_in); // number of points inside (returned) ANNkd_ptr in = rbd_tree( // build inner subtree pidx[0..n_in-1] pa, pidx, n_in, dim, bsp, inner_box, splitter, shrink); ANNkd_ptr out = rbd_tree( // build outer subtree pidx[n_in..n] pa, pidx+n_in, n - n_in, dim, bsp, bnd_box, splitter, shrink); ANNorthHSArray bnds = NULL; // bounds (alloc in Box2Bnds and // ...freed in bd_shrink destroyer) annBox2Bnds( // convert inner box to bounds inner_box, // inner box bnd_box, // enclosing box dim, // dimension n_bnds, // number of bounds (returned) bnds); // bounds array (modified) // return shrinking node return new ANNbd_shrink(n_bnds, bnds, in, out); } } ================================================ FILE: src/ANN/bd_tree.h ================================================ //---------------------------------------------------------------------- // File: bd_tree.h // Programmer: David Mount // Description: Declarations for standard bd-tree routines // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. 
//---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.0 04/01/05 // Changed IN, OUT to ANN_IN, ANN_OUT //---------------------------------------------------------------------- #ifndef ANN_bd_tree_H #define ANN_bd_tree_H #include "ANNx.h" // all ANN includes #include "kd_tree.h" // kd-tree includes //---------------------------------------------------------------------- // bd-tree shrinking node. // The main addition in the bd-tree is the shrinking node, which // is declared here. // // Shrinking nodes are defined by list of orthogonal halfspaces. // These halfspaces define a (possibly unbounded) orthogonal // rectangle. There are two children, in and out. Points that // lie within this rectangle are stored in the in-child, and the // other points are stored in the out-child. // // We use a list of orthogonal halfspaces rather than an // orthogonal rectangle object because typically the number of // sides of the shrinking box will be much smaller than the // worst case bound of 2*dim. // // BEWARE: Note that constructor just copies the pointer to the // bounding array, but the destructor deallocates it. This is // rather poor practice, but happens to be convenient. The list // is allocated in the bd-tree building procedure rbd_tree() just // prior to construction, and is used for no other purposes. // // WARNING: In the near neighbor searching code it is assumed that // the list of bounding halfspaces is irredundant, meaning that there // are no two distinct halfspaces in the list with the same outward // pointing normals. 
//---------------------------------------------------------------------- class ANNbd_shrink : public ANNkd_node // splitting node of a kd-tree { int n_bnds; // number of bounding halfspaces ANNorthHSArray bnds; // list of bounding halfspaces ANNkd_ptr child[2]; // in and out children public: ANNbd_shrink( // constructor int nb, // number of bounding halfspaces ANNorthHSArray bds, // list of bounding halfspaces ANNkd_ptr ic=NULL, ANNkd_ptr oc=NULL) // children { n_bnds = nb; // cutting dimension bnds = bds; // assign bounds child[ANN_IN] = ic; // set children child[ANN_OUT] = oc; } ~ANNbd_shrink() // destructor { if (child[ANN_IN]!= NULL && child[ANN_IN]!= KD_TRIVIAL) delete child[ANN_IN]; if (child[ANN_OUT]!= NULL&& child[ANN_OUT]!= KD_TRIVIAL) delete child[ANN_OUT]; if (bnds != NULL) delete [] bnds; // delete bounds } virtual void getStats( // get tree statistics int dim, // dimension of space ANNkdStats &st, // statistics ANNorthRect &bnd_box); // bounding box virtual void print(int level, ostream &out);// print node virtual void dump(ostream &out); // dump node virtual void ann_search(ANNdist); // standard search virtual void ann_pri_search(ANNdist); // priority search virtual void ann_FR_search(ANNdist); // fixed-radius search }; #endif ================================================ FILE: src/ANN/brute.cpp ================================================ //---------------------------------------------------------------------- // File: brute.cpp // Programmer: Sunil Arya and David Mount // Description: Brute-force nearest neighbors // Last modified: 05/03/05 (Version 1.1) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). 
See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.1 05/03/05 // Added fixed-radius kNN search //---------------------------------------------------------------------- #include "ANNx.h" // all ANN includes #include "pr_queue_k.h" // k element priority queue //---------------------------------------------------------------------- // Brute-force search simply stores a pointer to the list of // data points and searches linearly for the nearest neighbor. // The k nearest neighbors are stored in a k-element priority // queue (which is implemented in a pretty dumb way as well). // // If ANN_ALLOW_SELF_MATCH is ANNfalse then data points at distance // zero are not considered. // // Note that the error bound eps is passed in, but it is ignored. // These routines compute exact nearest neighbors (which is needed // for validation purposes in ann_test.cpp). //---------------------------------------------------------------------- ANNbruteForce::ANNbruteForce( // constructor from point array ANNpointArray pa, // point array int n, // number of points int dd) // dimension { dim = dd; n_pts = n; pts = pa; } ANNbruteForce::~ANNbruteForce() { } // destructor (empty) void ANNbruteForce::annkSearch( // approx k near neighbor search ANNpoint q, // query point int k, // number of near neighbors to return ANNidxArray nn_idx, // nearest neighbor indices (returned) ANNdistArray dd, // dist to near neighbors (returned) double eps) // error bound (ignored) { ANNmin_k mk(k); // construct a k-limited priority queue int i; if (k > n_pts) { // too many near neighbors? 
annError("Requesting more near neighbors than data points", ANNabort); } // run every point through queue for (i = 0; i < n_pts; i++) { // compute distance to point ANNdist sqDist = annDist(dim, pts[i], q); if (ANN_ALLOW_SELF_MATCH || sqDist != 0) mk.insert(sqDist, i); } for (i = 0; i < k; i++) { // extract the k closest points dd[i] = mk.ith_smallest_key(i); nn_idx[i] = mk.ith_smallest_info(i); } } int ANNbruteForce::annkFRSearch( // approx fixed-radius kNN search ANNpoint q, // query point ANNdist sqRad, // squared radius int k, // number of near neighbors to return ANNidxArray nn_idx, // nearest neighbor array (returned) ANNdistArray dd, // dist to near neighbors (returned) double eps) // error bound { ANNmin_k mk(k); // construct a k-limited priority queue int i; int pts_in_range = 0; // number of points in query range // run every point through queue for (i = 0; i < n_pts; i++) { // compute distance to point ANNdist sqDist = annDist(dim, pts[i], q); if (sqDist <= sqRad && // within radius bound (ANN_ALLOW_SELF_MATCH || sqDist != 0)) { // ...and no self match mk.insert(sqDist, i); pts_in_range++; } } for (i = 0; i < k; i++) { // extract the k closest points if (dd != NULL) dd[i] = mk.ith_smallest_key(i); if (nn_idx != NULL) nn_idx[i] = mk.ith_smallest_info(i); } return pts_in_range; } // MFH: version that returns all points std::pair< std::vector<int>, std::vector<ANNdist> > ANNbruteForce::annkFRSearch2( // approx fixed-radius kNN search ANNpoint q, // query point ANNdist sqRad, // squared radius double eps) // error bound { std::vector<int> points; std::vector<ANNdist> dists; int i; int pts_in_range = 0; // number of points in query range // run every point through queue for (i = 0; i < n_pts; i++) { // compute distance to point ANNdist sqDist = annDist(dim, pts[i], q); if (sqDist <= sqRad && // within radius bound (ANN_ALLOW_SELF_MATCH || sqDist != 0)) { // ...and no self match points.push_back(i); dists.push_back(sqDist); pts_in_range++; } } return std::make_pair(points, dists); }
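Stripped of ANN's typedefs and error handling, the fixed-radius logic shared by both `annkFRSearch` variants reduces to a linear scan that keeps every point whose squared distance to the query is at most the squared radius. A minimal standalone sketch (the name `frSearchBrute` is hypothetical; plain `double`/`int` stand in for `ANNdist`/`ANNidx`, and the `ANN_ALLOW_SELF_MATCH` check is omitted):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Brute-force fixed-radius search: return indices and squared distances
// of all points within squared radius sqRad of the query point q.
static std::pair<std::vector<int>, std::vector<double>>
frSearchBrute(const std::vector<std::vector<double>>& pts,
              const std::vector<double>& q, double sqRad) {
    std::vector<int> idx;
    std::vector<double> dists;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double d = 0.0;
        for (std::size_t j = 0; j < q.size(); ++j) {
            double t = pts[i][j] - q[j];
            d += t * t;                      // squared Euclidean distance
        }
        if (d <= sqRad) {                    // keep every in-range point
            idx.push_back(static_cast<int>(i));
            dists.push_back(d);
        }
    }
    return std::make_pair(idx, dists);
}
```

As in `annkFRSearch2`, squared distances are compared against a squared radius, so no square roots are ever taken; unlike the kd-tree search, this visits all n points.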
================================================ FILE: src/ANN/kd_dump.cpp ================================================ //---------------------------------------------------------------------- // File: kd_dump.cc // Programmer: David Mount // Description: Dump and Load for kd- and bd-trees // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.0 04/01/05 // Moved dump out of kd_tree.cc into this file. // Added kd-tree load constructor. // Revision 2/29/08 // added cstdlib and std:: along with cstdlib. and sting.h //---------------------------------------------------------------------- // This file contains routines for dumping kd-trees and bd-trees and // reloading them. (It is an abuse of policy to include both kd- and // bd-tree routines in the same file, sorry. There should be no problem // in deleting the bd- versions of the routines if they are not // desired.) 
//---------------------------------------------------------------------- #include <cstdio> #include <cstring> #include <cstdlib> //using namespace std; // make std:: available #include "kd_tree.h" // kd-tree declarations #include "bd_tree.h" // bd-tree declarations //---------------------------------------------------------------------- // Constants //---------------------------------------------------------------------- const int STRING_LEN = 500; // maximum string length // const double EPSILON = 1E-5; // small number for float comparison enum ANNtreeType {KD_TREE, BD_TREE}; // tree types (used in loading) //---------------------------------------------------------------------- // Procedure declarations //---------------------------------------------------------------------- static ANNkd_ptr annReadDump( // read dump file istream &in, // input stream ANNtreeType tree_type, // type of tree expected ANNpointArray &the_pts, // new points (if applic) ANNidxArray &the_pidx, // point indices (returned) int &the_dim, // dimension (returned) int &the_n_pts, // number of points (returned) int &the_bkt_size, // bucket size (returned) ANNpoint &the_bnd_box_lo, // low bounding point ANNpoint &the_bnd_box_hi); // high bounding point static ANNkd_ptr annReadTree( // read tree-part of dump file istream &in, // input stream ANNtreeType tree_type, // type of tree expected ANNidxArray the_pidx, // point indices (modified) int &next_idx); // next index (modified) //---------------------------------------------------------------------- // ANN kd- and bd-tree Dump Format // The dump file begins with a header containing the version of // ANN, an optional section containing the points, followed by // a description of the tree. The tree is printed in preorder. // // Format: // #ANN <version> [END_OF_LINE] // points <dim> <n_pts> (point coordinates: this is optional) // 0 <xxx> <xxx> ... <xxx> (point indices and coordinates) // 1 <xxx> <xxx> ... <xxx> // ... // tree <dim> <n_pts> <bkt_size> // <xxx> <xxx> ... <xxx> (lower end of bounding box) // <xxx> <xxx> ... <xxx> 
(upper end of bounding box) // If the tree is null, then a single line "null" is // output. Otherwise the nodes of the tree are printed // one per line in preorder. Leaves and splitting nodes // have the following formats: // Leaf node: // leaf <n_pts> <bkt[0]> <bkt[1]> ... <bkt[n-1]> // Splitting nodes: // split <cut_dim> <cut_val> <lo_bound> <hi_bound> // // For bd-trees: // // Shrinking nodes: // shrink <n_bnds> // <cut_dim> <cut_val> <side> // <cut_dim> <cut_val> <side> // ... (repeated n_bnds times) //---------------------------------------------------------------------- void ANNkd_tree::Dump( // dump entire tree ANNbool with_pts, // print points as well? ostream &out) // output stream { out << "#ANN " << ANNversion << "\n"; out.precision(ANNcoordPrec); // use full precision in dumping if (with_pts) { // print point coordinates out << "points " << dim << " " << n_pts << "\n"; for (int i = 0; i < n_pts; i++) { out << i << " "; annPrintPt(pts[i], dim, out); out << "\n"; } } out << "tree " // print tree elements << dim << " " << n_pts << " " << bkt_size << "\n"; annPrintPt(bnd_box_lo, dim, out); // print lower bound out << "\n"; annPrintPt(bnd_box_hi, dim, out); // print upper bound out << "\n"; if (root == NULL) // empty tree? 
out << "null\n"; else { root->dump(out); // invoke printing at root } out.precision(0); // restore default precision } void ANNkd_split::dump( // dump a splitting node ostream &out) // output stream { out << "split " << cut_dim << " " << cut_val << " "; out << cd_bnds[ANN_LO] << " " << cd_bnds[ANN_HI] << "\n"; child[ANN_LO]->dump(out); // print low child child[ANN_HI]->dump(out); // print high child } void ANNkd_leaf::dump( // dump a leaf node ostream &out) // output stream { if (this == KD_TRIVIAL) { // canonical trivial leaf node out << "leaf 0\n"; // leaf no points } else{ out << "leaf " << n_pts; for (int j = 0; j < n_pts; j++) { out << " " << bkt[j]; } out << "\n"; } } void ANNbd_shrink::dump( // dump a shrinking node ostream &out) // output stream { out << "shrink " << n_bnds << "\n"; for (int j = 0; j < n_bnds; j++) { out << bnds[j].cd << " " << bnds[j].cv << " " << bnds[j].sd << "\n"; } child[ANN_IN]->dump(out); // print in-child child[ANN_OUT]->dump(out); // print out-child } //---------------------------------------------------------------------- // Load kd-tree from dump file // This rebuilds a kd-tree which was dumped to a file. The dump // file contains all the basic tree information according to a // preorder traversal. We assume that the dump file also contains // point data. (This is to guarantee the consistency of the tree.) // If not, then an error is generated. // // Indirectly, this procedure allocates space for points, point // indices, all nodes in the tree, and the bounding box for the // tree. When the tree is destroyed, all but the points are // deallocated. // // This routine calls annReadDump to do all the work. 
//---------------------------------------------------------------------- ANNkd_tree::ANNkd_tree( // build from dump file istream &in) // input stream for dump file { int the_dim; // local dimension int the_n_pts; // local number of points int the_bkt_size; // local number of points ANNpoint the_bnd_box_lo; // low bounding point ANNpoint the_bnd_box_hi; // high bounding point ANNpointArray the_pts; // point storage ANNidxArray the_pidx; // point index storage ANNkd_ptr the_root; // root of the tree the_root = annReadDump( // read the dump file in, // input stream KD_TREE, // expecting a kd-tree the_pts, // point array (returned) the_pidx, // point indices (returned) the_dim, the_n_pts, the_bkt_size, // basic tree info (returned) the_bnd_box_lo, the_bnd_box_hi); // bounding box info (returned) // create a skeletal tree SkeletonTree(the_n_pts, the_dim, the_bkt_size, the_pts, the_pidx); bnd_box_lo = the_bnd_box_lo; bnd_box_hi = the_bnd_box_hi; root = the_root; // set the root } ANNbd_tree::ANNbd_tree( // build bd-tree from dump file istream &in) : ANNkd_tree() // input stream for dump file { int the_dim; // local dimension int the_n_pts; // local number of points int the_bkt_size; // local number of points ANNpoint the_bnd_box_lo; // low bounding point ANNpoint the_bnd_box_hi; // high bounding point ANNpointArray the_pts; // point storage ANNidxArray the_pidx; // point index storage ANNkd_ptr the_root; // root of the tree the_root = annReadDump( // read the dump file in, // input stream BD_TREE, // expecting a bd-tree the_pts, // point array (returned) the_pidx, // point indices (returned) the_dim, the_n_pts, the_bkt_size, // basic tree info (returned) the_bnd_box_lo, the_bnd_box_hi); // bounding box info (returned) // create a skeletal tree SkeletonTree(the_n_pts, the_dim, the_bkt_size, the_pts, the_pidx); bnd_box_lo = the_bnd_box_lo; bnd_box_hi = the_bnd_box_hi; root = the_root; // set the root } 
//---------------------------------------------------------------------- // annReadDump - read a dump file // // This procedure reads a dump file, constructs a kd-tree // and returns all the essential information needed to actually // construct the tree. Because this procedure is used for // constructing both kd-trees and bd-trees, the second argument // is used to indicate which type of tree we are expecting. //---------------------------------------------------------------------- static ANNkd_ptr annReadDump( istream &in, // input stream ANNtreeType tree_type, // type of tree expected ANNpointArray &the_pts, // new points (returned) ANNidxArray &the_pidx, // point indices (returned) int &the_dim, // dimension (returned) int &the_n_pts, // number of points (returned) int &the_bkt_size, // bucket size (returned) ANNpoint &the_bnd_box_lo, // low bounding point (ret'd) ANNpoint &the_bnd_box_hi) // high bounding point (ret'd) { int j; char str[STRING_LEN]; // storage for string char version[STRING_LEN]; // ANN version number ANNkd_ptr the_root = NULL; //------------------------------------------------------------------ // Input file header //------------------------------------------------------------------ in >> str; // input header if (strcmp(str, "#ANN") != 0) { // incorrect header annError("Incorrect header for dump file", ANNabort); } in.getline(version, STRING_LEN); // get version (ignore) //------------------------------------------------------------------ // Input the points // An array the_pts is allocated and points are read from // the dump file. 
//------------------------------------------------------------------ in >> str; // get major heading if (strcmp(str, "points") == 0) { // points section in >> the_dim; // input dimension in >> the_n_pts; // number of points // allocate point storage the_pts = annAllocPts(the_n_pts, the_dim); for (int i = 0; i < the_n_pts; i++) { // input point coordinates ANNidx idx; // point index in >> idx; // input point index if (idx < 0 || idx >= the_n_pts) { annError("Point index is out of range", ANNabort); } for (j = 0; j < the_dim; j++) { in >> the_pts[idx][j]; // read point coordinates } } in >> str; // get next major heading } else { // no points were input annError("Points must be supplied in the dump file", ANNabort); } //------------------------------------------------------------------ // Input the tree // After the basic header information, we invoke annReadTree // to do all the heavy work. We create our own array of // point indices (so we can pass them to annReadTree()) // but we do not deallocate them. They will be deallocated // when the tree is destroyed. //------------------------------------------------------------------ if (strcmp(str, "tree") == 0) { // tree section in >> the_dim; // read dimension in >> the_n_pts; // number of points in >> the_bkt_size; // bucket size the_bnd_box_lo = annAllocPt(the_dim); // allocate bounding box pts the_bnd_box_hi = annAllocPt(the_dim); for (j = 0; j < the_dim; j++) { // read bounding box low in >> the_bnd_box_lo[j]; } for (j = 0; j < the_dim; j++) { // read bounding box low in >> the_bnd_box_hi[j]; } the_pidx = new ANNidx[the_n_pts]; // allocate point index array int next_idx = 0; // number of indices filled // read the tree and indices the_root = annReadTree(in, tree_type, the_pidx, next_idx); if (next_idx != the_n_pts) { // didn't see all the points? annError("Didn't see as many points as expected", ANNwarn); } } else { annError("Illegal dump format. 
Expecting section heading", ANNabort); } return the_root; } //---------------------------------------------------------------------- // annReadTree - input tree and return pointer // // annReadTree reads in a node of the tree, makes any recursive // calls as needed to input the children of this node (if internal). // It returns a pointer to the node that was created. An array // of point indices is given along with a pointer to the next // available location in the array. As leaves are read, their // point indices are stored here, and the point buckets point // to the first entry in the array. // // Recall that these are the formats. The tree is given in // preorder. // // Leaf node: // leaf <n_pts> <bkt[0]> <bkt[1]> ... <bkt[n-1]> // Splitting nodes: // split <cut_dim> <cut_val> <lo_bound> <hi_bound> // // For bd-trees: // // Shrinking nodes: // shrink <n_bnds> // <cut_dim> <cut_val> <side> // <cut_dim> <cut_val> <side> // ... (repeated n_bnds times) //---------------------------------------------------------------------- static ANNkd_ptr annReadTree( istream &in, // input stream ANNtreeType tree_type, // type of tree expected ANNidxArray the_pidx, // point indices (modified) int &next_idx) // next index (modified) { char tag[STRING_LEN]; // tag (leaf, split, shrink) int n_pts; // number of points in leaf int cd; // cut dimension ANNcoord cv; // cut value ANNcoord lb; // low bound ANNcoord hb; // high bound int n_bnds; // number of bounding sides int sd; // which side in >> tag; // input node tag if (strcmp(tag, "null") == 0) { // null tree return NULL; } //------------------------------------------------------------------ // Read a leaf //------------------------------------------------------------------ if (strcmp(tag, "leaf") == 0) { // leaf node in >> n_pts; // input number of points int old_idx = next_idx; // save next_idx if (n_pts == 0) { // trivial leaf return KD_TRIVIAL; } else { for (int i = 0; i < n_pts; i++) { // input point indices in >> the_pidx[next_idx++]; // store in array of indices } } return new ANNkd_leaf(n_pts, &the_pidx[old_idx]); } 
//------------------------------------------------------------------ // Read a splitting node //------------------------------------------------------------------ else if (strcmp(tag, "split") == 0) { // splitting node in >> cd >> cv >> lb >> hb; // read low and high subtrees ANNkd_ptr lc = annReadTree(in, tree_type, the_pidx, next_idx); ANNkd_ptr hc = annReadTree(in, tree_type, the_pidx, next_idx); // create new node and return return new ANNkd_split(cd, cv, lb, hb, lc, hc); } //------------------------------------------------------------------ // Read a shrinking node (bd-tree only) //------------------------------------------------------------------ else if (strcmp(tag, "shrink") == 0) { // shrinking node if (tree_type != BD_TREE) { annError("Shrinking node not allowed in kd-tree", ANNabort); } in >> n_bnds; // number of bounding sides // allocate bounds array ANNorthHSArray bds = new ANNorthHalfSpace[n_bnds]; for (int i = 0; i < n_bnds; i++) { in >> cd >> cv >> sd; // input bounding halfspace // copy to array bds[i] = ANNorthHalfSpace(cd, cv, sd); } // read inner and outer subtrees ANNkd_ptr ic = annReadTree(in, tree_type, the_pidx, next_idx); ANNkd_ptr oc = annReadTree(in, tree_type, the_pidx, next_idx); // create new node and return return new ANNbd_shrink(n_bnds, bds, ic, oc); } else { annError("Illegal node type in dump file", ANNabort); //std::exit(0); // R objects... 
this approach to keep the compiler happy return NULL; // to keep the compiler happy } } ================================================ FILE: src/ANN/kd_fix_rad_search.cpp ================================================ //---------------------------------------------------------------------- // File: kd_fix_rad_search.cpp // Programmer: Sunil Arya and David Mount // Description: Standard kd-tree fixed-radius kNN search // Last modified: 05/03/05 (Version 1.1) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 1.1 05/03/05 // Initial release //---------------------------------------------------------------------- // MFH: the code was changed to return all fixed radius neighbors using // a std::vector called closest. #include "kd_fix_rad_search.h" // kd fixed-radius search decls #include <vector> //---------------------------------------------------------------------- // Approximate fixed-radius k nearest neighbor search // The squared radius is provided, and this procedure finds the // k nearest neighbors within the radius, and returns the total // number of points lying within the radius. // // The method used for searching the kd-tree is a variation of the // nearest neighbor search used in kd_search.cpp, except that the // radius of the search ball is known. 
We refer the reader to that // file for the explanation of the recursive search procedure. //---------------------------------------------------------------------- //---------------------------------------------------------------------- // To keep argument lists short, a number of global variables // are maintained which are common to all the recursive calls. // These are given below. //---------------------------------------------------------------------- int ANNkdFRDim; // dimension of space ANNpoint ANNkdFRQ; // query point ANNdist ANNkdFRSqRad; // squared radius search bound double ANNkdFRMaxErr; // max tolerable squared error ANNpointArray ANNkdFRPts; // the points ANNmin_k* ANNkdFRPointMK; // set of k closest points std::vector<int> closest; // MFH: set of all closest points std::vector<ANNdist> dists; // MFH: distances to all closest points int ANNkdFRPtsVisited; // total points visited int ANNkdFRPtsInRange; // number of points in the range //---------------------------------------------------------------------- // annkFRSearch - fixed radius search for k nearest neighbors //---------------------------------------------------------------------- // defunct: we use ANNkd_tree::annkFRSearch2 which stores all neighbors in the new structures // closest and dists. 
int ANNkd_tree::annkFRSearch( ANNpoint q, // the query point ANNdist sqRad, // squared radius search bound int k, // number of near neighbors to return ANNidxArray nn_idx, // nearest neighbor indices (returned) ANNdistArray dd, // dist to near neighbors (returned) double eps) // the error bound { ANNkdFRDim = dim; // copy arguments to static equivs ANNkdFRQ = q; ANNkdFRSqRad = sqRad; ANNkdFRPts = pts; ANNkdFRPtsVisited = 0; // initialize count of points visited ANNkdFRPtsInRange = 0; // ...and points in the range ANNkdFRMaxErr = ANN_POW(1.0 + eps); ANN_FLOP(2) // increment floating op count ANNkdFRPointMK = new ANNmin_k(k); // create set for closest k points // search starting at the root root->ann_FR_search(annBoxDistance(q, bnd_box_lo, bnd_box_hi, dim)); for (int i = 0; i < k; i++) { // extract the k-th closest points if (dd != NULL) dd[i] = ANNkdFRPointMK->ith_smallest_key(i); if (nn_idx != NULL) nn_idx[i] = ANNkdFRPointMK->ith_smallest_info(i); } delete ANNkdFRPointMK; // deallocate closest point set return ANNkdFRPtsInRange; // return final point count } // MFH this function returns all closest points std::pair< std::vector<int>, std::vector<ANNdist> > ANNkd_tree::annkFRSearch2( ANNpoint q, // the query point ANNdist sqRad, // squared radius search bound double eps) // the error bound { ANNkdFRDim = dim; // copy arguments to static equivs ANNkdFRQ = q; ANNkdFRSqRad = sqRad; ANNkdFRPts = pts; ANNkdFRPtsVisited = 0; // initialize count of points visited ANNkdFRPtsInRange = 0; // ...and points in the range ANNkdFRMaxErr = ANN_POW(1.0 + eps); ANN_FLOP(2) // increment floating op count //ANNkdFRPointMK = new ANNmin_k(k); // create set for closest k points closest.clear(); dists.clear(); // search starting at the root root->ann_FR_search(annBoxDistance(q, bnd_box_lo, bnd_box_hi, dim)); return std::make_pair(closest, dists); // return final point count } //---------------------------------------------------------------------- // kd_split::ann_FR_search - search a splitting node // Note: 
This routine is similar in structure to the standard kNN // search. It visits the subtree that is closer to the query point // first. For fixed-radius search, there is no benefit in visiting // one subtree before the other, but we maintain the same basic // code structure for the sake of uniformity. //---------------------------------------------------------------------- void ANNkd_split::ann_FR_search(ANNdist box_dist) { // check dist calc term condition if (ANNmaxPtsVisited != 0 && ANNkdFRPtsVisited > ANNmaxPtsVisited) return; // distance to cutting plane ANNcoord cut_diff = ANNkdFRQ[cut_dim] - cut_val; if (cut_diff < 0) { // left of cutting plane child[ANN_LO]->ann_FR_search(box_dist);// visit closer child first ANNcoord box_diff = cd_bnds[ANN_LO] - ANNkdFRQ[cut_dim]; if (box_diff < 0) // within bounds - ignore box_diff = 0; // distance to further box box_dist = (ANNdist) ANN_SUM(box_dist, ANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff))); // visit further child if in range if (box_dist * ANNkdFRMaxErr <= ANNkdFRSqRad) child[ANN_HI]->ann_FR_search(box_dist); } else { // right of cutting plane child[ANN_HI]->ann_FR_search(box_dist);// visit closer child first ANNcoord box_diff = ANNkdFRQ[cut_dim] - cd_bnds[ANN_HI]; if (box_diff < 0) // within bounds - ignore box_diff = 0; // distance to further box box_dist = (ANNdist) ANN_SUM(box_dist, ANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff))); // visit further child if close enough if (box_dist * ANNkdFRMaxErr <= ANNkdFRSqRad) child[ANN_LO]->ann_FR_search(box_dist); } ANN_FLOP(13) // increment floating ops ANN_SPL(1) // one more splitting node visited } //---------------------------------------------------------------------- // kd_leaf::ann_FR_search - search points in a leaf node // Note: The unreadability of this code is the result of // some fine tuning to replace indexing by pointer operations. 
//---------------------------------------------------------------------- void ANNkd_leaf::ann_FR_search(ANNdist box_dist) { ANNdist dist; // distance to data point ANNcoord* pp; // data coordinate pointer ANNcoord* qq; // query coordinate pointer ANNcoord t; int d; for (int i = 0; i < n_pts; i++) { // check points in bucket pp = ANNkdFRPts[bkt[i]]; // first coord of next data point qq = ANNkdFRQ; // first coord of query point dist = 0; for(d = 0; d < ANNkdFRDim; d++) { ANN_COORD(1) // one more coordinate hit ANN_FLOP(5) // increment floating ops t = *(qq++) - *(pp++); // compute length and adv coordinate // exceeds dist to k-th smallest? if( (dist = ANN_SUM(dist, ANN_POW(t))) > ANNkdFRSqRad) { break; } } if (d >= ANNkdFRDim && // among the k best? (ANN_ALLOW_SELF_MATCH || dist!=0.0)) { // and no self-match problem // add it to the list //ANNkdFRPointMK->insert(dist, bkt[i]); // MFH closest.push_back(bkt[i]); dists.push_back(dist); ANNkdFRPtsInRange++; // increment point count } } ANN_LEAF(1) // one more leaf node visited ANN_PTS(n_pts) // increment points visited ANNkdFRPtsVisited += n_pts; // increment number of points visited } ================================================ FILE: src/ANN/kd_fix_rad_search.h ================================================ //---------------------------------------------------------------------- // File: kd_fix_rad_search.h // Programmer: Sunil Arya and David Mount // Description: Standard kd-tree fixed-radius kNN search // Last modified: ??/??/?? (Version 1.1) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) 
and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 1.1 ??/??/?? // Initial release //---------------------------------------------------------------------- #ifndef ANN_kd_fix_rad_search_H #define ANN_kd_fix_rad_search_H #include "kd_tree.h" // kd-tree declarations #include "kd_util.h" // kd-tree utilities #include "pr_queue_k.h" // k-element priority queue #include "ANNperf.h" // performance evaluation //---------------------------------------------------------------------- // Global variables // These are active for the life of each call to // annRangeSearch(). They are set to save the number of // variables that need to be passed among the various search // procedures. //---------------------------------------------------------------------- extern ANNpoint ANNkdFRQ; // query point (static copy) #endif ================================================ FILE: src/ANN/kd_pr_search.cpp ================================================ //---------------------------------------------------------------------- // File: kd_pr_search.cpp // Programmer: Sunil Arya and David Mount // Description: Priority search for kd-trees // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. 
It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release //---------------------------------------------------------------------- #include "kd_pr_search.h" // kd priority search declarations //---------------------------------------------------------------------- // Approximate nearest neighbor searching by priority search. // The kd-tree is searched for an approximate nearest neighbor. // The point is returned through one of the arguments, and the // distance returned is the SQUARED distance to this point. // // The method used for searching the kd-tree is called priority // search. (It is described in Arya and Mount, ``Algorithms for // fast vector quantization,'' Proc. of DCC '93: Data Compression // Conference, eds. J. A. Storer and M. Cohn, IEEE Press, 1993, // 381--390.) // // The cell of the kd-tree containing the query point is located, // and cells are visited in increasing order of distance from the // query point. This is done by placing each subtree which has // NOT been visited in a priority queue, according to the closest // distance of the corresponding enclosing rectangle from the // query point. The search stops when the distance to the nearest // remaining rectangle exceeds the distance to the nearest point // seen by a factor of more than 1/(1+eps). (Implying that any // point found subsequently in the search cannot be closer by more // than this factor.) // // The main entry point is annkPriSearch() which sets things up and // then calls the recursive routine ann_pri_search(). This is a // recursive routine which performs the processing for one node in // the kd-tree. There are two versions of this virtual procedure, // one for splitting nodes and one for leaves.
When a splitting node // is visited, we determine which child to continue the search on // (the closer one), and insert the other child into the priority // queue. When a leaf is visited, we compute the distances to the // points in the buckets, and update information on the closest // points. // // Some trickery is used to incrementally update the distance from // a kd-tree rectangle to the query point. This comes about from // the fact that with each successive split, only one component // (along the dimension that is split) of the squared distance to // the child rectangle is different from the squared distance to // the parent rectangle. //---------------------------------------------------------------------- //---------------------------------------------------------------------- // To keep argument lists short, a number of global variables // are maintained which are common to all the recursive calls. // These are given below. //---------------------------------------------------------------------- double ANNprEps; // the error bound int ANNprDim; // dimension of space ANNpoint ANNprQ; // query point double ANNprMaxErr; // max tolerable squared error ANNpointArray ANNprPts; // the points ANNpr_queue *ANNprBoxPQ; // priority queue for boxes ANNmin_k *ANNprPointMK; // set of k closest points //---------------------------------------------------------------------- // annkPriSearch - priority search for k nearest neighbors //---------------------------------------------------------------------- void ANNkd_tree::annkPriSearch( ANNpoint q, // query point int k, // number of near neighbors to return ANNidxArray nn_idx, // nearest neighbor indices (returned) ANNdistArray dd, // dist to near neighbors (returned) double eps) // error bound (ignored) { // max tolerable squared error ANNprMaxErr = ANN_POW(1.0 + eps); ANN_FLOP(2) // increment floating ops ANNprDim = dim; // copy arguments to static equivs ANNprQ = q; ANNprPts = pts; ANNptsVisited = 0; // initialize count
of points visited ANNprPointMK = new ANNmin_k(k); // create set for closest k points // distance to root box ANNdist box_dist = annBoxDistance(q, bnd_box_lo, bnd_box_hi, dim); ANNprBoxPQ = new ANNpr_queue(n_pts);// create priority queue for boxes ANNprBoxPQ->insert(box_dist, root); // insert root in priority queue while (ANNprBoxPQ->non_empty() && (!(ANNmaxPtsVisited != 0 && ANNptsVisited > ANNmaxPtsVisited))) { ANNkd_ptr np; // next box from prior queue // extract closest box from queue ANNprBoxPQ->extr_min(box_dist, (void *&) np); ANN_FLOP(2) // increment floating ops if (box_dist*ANNprMaxErr >= ANNprPointMK->max_key()) break; np->ann_pri_search(box_dist); // search this subtree. } for (int i = 0; i < k; i++) { // extract the k-th closest points dd[i] = ANNprPointMK->ith_smallest_key(i); nn_idx[i] = ANNprPointMK->ith_smallest_info(i); } delete ANNprPointMK; // deallocate closest point set delete ANNprBoxPQ; // deallocate priority queue } //---------------------------------------------------------------------- // kd_split::ann_pri_search - search a splitting node //---------------------------------------------------------------------- void ANNkd_split::ann_pri_search(ANNdist box_dist) { ANNdist new_dist; // distance to child visited later // distance to cutting plane ANNcoord cut_diff = ANNprQ[cut_dim] - cut_val; if (cut_diff < 0) { // left of cutting plane ANNcoord box_diff = cd_bnds[ANN_LO] - ANNprQ[cut_dim]; if (box_diff < 0) // within bounds - ignore box_diff = 0; // distance to further box new_dist = (ANNdist) ANN_SUM(box_dist, ANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff))); if (child[ANN_HI] != KD_TRIVIAL)// enqueue if not trivial ANNprBoxPQ->insert(new_dist, child[ANN_HI]); // continue with closer child child[ANN_LO]->ann_pri_search(box_dist); } else { // right of cutting plane ANNcoord box_diff = ANNprQ[cut_dim] - cd_bnds[ANN_HI]; if (box_diff < 0) // within bounds - ignore box_diff = 0; // distance to further box new_dist = (ANNdist) ANN_SUM(box_dist, 
ANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff))); if (child[ANN_LO] != KD_TRIVIAL)// enqueue if not trivial ANNprBoxPQ->insert(new_dist, child[ANN_LO]); // continue with closer child child[ANN_HI]->ann_pri_search(box_dist); } ANN_SPL(1) // one more splitting node visited ANN_FLOP(8) // increment floating ops } //---------------------------------------------------------------------- // kd_leaf::ann_pri_search - search points in a leaf node // // This is virtually identical to the ann_search for standard search. //---------------------------------------------------------------------- void ANNkd_leaf::ann_pri_search(ANNdist box_dist) { ANNdist dist; // distance to data point ANNcoord* pp; // data coordinate pointer ANNcoord* qq; // query coordinate pointer ANNdist min_dist; // distance to k-th closest point ANNcoord t; int d; min_dist = ANNprPointMK->max_key(); // k-th smallest distance so far for (int i = 0; i < n_pts; i++) { // check points in bucket pp = ANNprPts[bkt[i]]; // first coord of next data point qq = ANNprQ; // first coord of query point dist = 0; for(d = 0; d < ANNprDim; d++) { ANN_COORD(1) // one more coordinate hit ANN_FLOP(4) // increment floating ops t = *(qq++) - *(pp++); // compute length and adv coordinate // exceeds dist to k-th smallest? if( (dist = ANN_SUM(dist, ANN_POW(t))) > min_dist) { break; } } if (d >= ANNprDim && // among the k best? 
(ANN_ALLOW_SELF_MATCH || dist!=0)) { // and no self-match problem // add it to the list ANNprPointMK->insert(dist, bkt[i]); min_dist = ANNprPointMK->max_key(); } } ANN_LEAF(1) // one more leaf node visited ANN_PTS(n_pts) // increment points visited ANNptsVisited += n_pts; // increment number of points visited } ================================================ FILE: src/ANN/kd_pr_search.h ================================================ //---------------------------------------------------------------------- // File: kd_pr_search.h // Programmer: Sunil Arya and David Mount // Description: Priority kd-tree search // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release //---------------------------------------------------------------------- #ifndef ANN_kd_pr_search_H #define ANN_kd_pr_search_H #include "kd_tree.h" // kd-tree declarations #include "kd_util.h" // kd-tree utilities #include "pr_queue.h" // priority queue declarations #include "pr_queue_k.h" // k-element priority queue #include "ANNperf.h" // performance evaluation //---------------------------------------------------------------------- // Global variables // Active for the life of each call to Appx_Near_Neigh() or // Appx_k_Near_Neigh(). 
//---------------------------------------------------------------------- extern double ANNprEps; // the error bound extern int ANNprDim; // dimension of space extern ANNpoint ANNprQ; // query point extern double ANNprMaxErr; // max tolerable squared error extern ANNpointArray ANNprPts; // the points extern ANNpr_queue *ANNprBoxPQ; // priority queue for boxes extern ANNmin_k *ANNprPointMK; // set of k closest points #endif ================================================ FILE: src/ANN/kd_search.cpp ================================================ //---------------------------------------------------------------------- // File: kd_search.cpp // Programmer: Sunil Arya and David Mount // Description: Standard kd-tree search // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.0 04/01/05 // Changed names LO, HI to ANN_LO, ANN_HI //---------------------------------------------------------------------- #include "kd_search.h" // kd-search declarations //---------------------------------------------------------------------- // Approximate nearest neighbor searching by kd-tree search // The kd-tree is searched for an approximate nearest neighbor. 
// The point is returned through one of the arguments, and the // distance returned is the squared distance to this point. // // The method used for searching the kd-tree is an approximate // adaptation of the search algorithm described by Friedman, // Bentley, and Finkel, ``An algorithm for finding best matches // in logarithmic expected time,'' ACM Transactions on Mathematical // Software, 3(3):209-226, 1977. // // The algorithm operates recursively. When first encountering a // node of the kd-tree we first visit the child which is closest to // the query point. On return, we decide whether we want to visit // the other child. If the box containing the other child exceeds // 1/(1+eps) times the current best distance, then we skip it (since // any point found in this child cannot be closer to the query point // by more than this factor.) Otherwise, we visit it recursively. // The distance between a box and the query point is computed exactly // (not approximated as is often done in kd-trees), using incremental // distance updates, as described by Arya and Mount in ``Algorithms // for fast vector quantization,'' Proc. of DCC '93: Data Compression // Conference, eds. J. A. Storer and M. Cohn, IEEE Press, 1993, // 381-390. // // The main entry point is annkSearch() which sets things up and // then calls the recursive routine ann_search(). This is a recursive // routine which performs the processing for one node in the kd-tree. // There are two versions of this virtual procedure, one for splitting // nodes and one for leaves. When a splitting node is visited, we // determine which child to visit first (the closer one), and visit // the other child on return. When a leaf is visited, we compute // the distances to the points in the buckets, and update information // on the closest points. // // Some trickery is used to incrementally update the distance from // a kd-tree rectangle to the query point.
This comes about from // the fact that with each successive split, only one component // (along the dimension that is split) of the squared distance to // the child rectangle is different from the squared distance to // the parent rectangle. //---------------------------------------------------------------------- //---------------------------------------------------------------------- // To keep argument lists short, a number of global variables // are maintained which are common to all the recursive calls. // These are given below. //---------------------------------------------------------------------- int ANNkdDim; // dimension of space ANNpoint ANNkdQ; // query point double ANNkdMaxErr; // max tolerable squared error ANNpointArray ANNkdPts; // the points ANNmin_k *ANNkdPointMK; // set of k closest points //---------------------------------------------------------------------- // annkSearch - search for the k nearest neighbors //---------------------------------------------------------------------- void ANNkd_tree::annkSearch( ANNpoint q, // the query point int k, // number of near neighbors to return ANNidxArray nn_idx, // nearest neighbor indices (returned) ANNdistArray dd, // dist to near neighbors (returned) double eps) // the error bound { ANNkdDim = dim; // copy arguments to static equivs ANNkdQ = q; ANNkdPts = pts; ANNptsVisited = 0; // initialize count of points visited if (k > n_pts) { // too many near neighbors?
annError("Requesting more near neighbors than data points", ANNabort); } ANNkdMaxErr = ANN_POW(1.0 + eps); ANN_FLOP(2) // increment floating op count ANNkdPointMK = new ANNmin_k(k); // create set for closest k points // search starting at the root root->ann_search(annBoxDistance(q, bnd_box_lo, bnd_box_hi, dim)); for (int i = 0; i < k; i++) { // extract the k-th closest points dd[i] = ANNkdPointMK->ith_smallest_key(i); nn_idx[i] = ANNkdPointMK->ith_smallest_info(i); } delete ANNkdPointMK; // deallocate closest point set } //---------------------------------------------------------------------- // kd_split::ann_search - search a splitting node //---------------------------------------------------------------------- void ANNkd_split::ann_search(ANNdist box_dist) { // check dist calc term condition if (ANNmaxPtsVisited != 0 && ANNptsVisited > ANNmaxPtsVisited) return; // distance to cutting plane ANNcoord cut_diff = ANNkdQ[cut_dim] - cut_val; if (cut_diff < 0) { // left of cutting plane child[ANN_LO]->ann_search(box_dist);// visit closer child first ANNcoord box_diff = cd_bnds[ANN_LO] - ANNkdQ[cut_dim]; if (box_diff < 0) // within bounds - ignore box_diff = 0; // distance to further box box_dist = (ANNdist) ANN_SUM(box_dist, ANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff))); // visit further child if close enough if (box_dist * ANNkdMaxErr < ANNkdPointMK->max_key()) child[ANN_HI]->ann_search(box_dist); } else { // right of cutting plane child[ANN_HI]->ann_search(box_dist);// visit closer child first ANNcoord box_diff = ANNkdQ[cut_dim] - cd_bnds[ANN_HI]; if (box_diff < 0) // within bounds - ignore box_diff = 0; // distance to further box box_dist = (ANNdist) ANN_SUM(box_dist, ANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff))); // visit further child if close enough if (box_dist * ANNkdMaxErr < ANNkdPointMK->max_key()) child[ANN_LO]->ann_search(box_dist); } ANN_FLOP(10) // increment floating ops ANN_SPL(1) // one more splitting node visited } 
//---------------------------------------------------------------------- // kd_leaf::ann_search - search points in a leaf node // Note: The unreadability of this code is the result of // some fine tuning to replace indexing by pointer operations. //---------------------------------------------------------------------- void ANNkd_leaf::ann_search(ANNdist box_dist) { ANNdist dist; // distance to data point ANNcoord* pp; // data coordinate pointer ANNcoord* qq; // query coordinate pointer ANNdist min_dist; // distance to k-th closest point ANNcoord t; int d; min_dist = ANNkdPointMK->max_key(); // k-th smallest distance so far for (int i = 0; i < n_pts; i++) { // check points in bucket pp = ANNkdPts[bkt[i]]; // first coord of next data point qq = ANNkdQ; // first coord of query point dist = 0; for(d = 0; d < ANNkdDim; d++) { ANN_COORD(1) // one more coordinate hit ANN_FLOP(4) // increment floating ops t = *(qq++) - *(pp++); // compute length and adv coordinate // exceeds dist to k-th smallest? if( (dist = ANN_SUM(dist, ANN_POW(t))) > min_dist) { break; } } if (d >= ANNkdDim && // among the k best? (ANN_ALLOW_SELF_MATCH || dist!=0)) { // and no self-match problem // add it to the list ANNkdPointMK->insert(dist, bkt[i]); min_dist = ANNkdPointMK->max_key(); } } ANN_LEAF(1) // one more leaf node visited ANN_PTS(n_pts) // increment points visited ANNptsVisited += n_pts; // increment number of points visited } ================================================ FILE: src/ANN/kd_search.h ================================================ //---------------------------------------------------------------------- // File: kd_search.h // Programmer: Sunil Arya and David Mount // Description: Standard kd-tree search // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. 
// // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release //---------------------------------------------------------------------- #ifndef ANN_kd_search_H #define ANN_kd_search_H #include "kd_tree.h" // kd-tree declarations #include "kd_util.h" // kd-tree utilities #include "pr_queue_k.h" // k-element priority queue #include "ANNperf.h" // performance evaluation //---------------------------------------------------------------------- // More global variables // These are active for the life of each call to annkSearch(). They // are set to save the number of variables that need to be passed // among the various search procedures. 
//---------------------------------------------------------------------- extern int ANNkdDim; // dimension of space (static copy) extern ANNpoint ANNkdQ; // query point (static copy) extern double ANNkdMaxErr; // max tolerable squared error extern ANNpointArray ANNkdPts; // the points (static copy) extern ANNmin_k *ANNkdPointMK; // set of k closest points extern int ANNptsVisited; // number of points visited #endif ================================================ FILE: src/ANN/kd_split.cpp ================================================ //---------------------------------------------------------------------- // File: kd_split.cpp // Programmer: Sunil Arya and David Mount // Description: Methods for splitting kd-trees // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. 
//---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.0 04/01/05 //---------------------------------------------------------------------- #include "kd_tree.h" // kd-tree definitions #include "kd_util.h" // kd-tree utilities #include "kd_split.h" // splitting functions //---------------------------------------------------------------------- // Constants //---------------------------------------------------------------------- const double EPS = 0.001; // a small value const double FS_ASPECT_RATIO = 3.0; // maximum allowed aspect ratio // in fair split. Must be >= 2. //---------------------------------------------------------------------- // kd_split - Bentley's standard splitting routine for kd-trees // Find the dimension of the greatest spread, and split // just before the median point along this dimension. //---------------------------------------------------------------------- void kd_split( ANNpointArray pa, // point array (permuted on return) ANNidxArray pidx, // point indices const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo) // num of points on low side (returned) { // find dimension of maximum spread cut_dim = annMaxSpread(pa, pidx, n, dim); n_lo = n/2; // median rank // split about median annMedianSplit(pa, pidx, n, cut_dim, cut_val, n_lo); } //---------------------------------------------------------------------- // midpt_split - midpoint splitting rule for box-decomposition trees // // This is the simplest splitting rule that guarantees boxes // of bounded aspect ratio. It simply cuts the box with the // longest side through its midpoint. If there are ties, it // selects the dimension with the maximum point spread. 
// // WARNING: This routine (while simple) doesn't seem to work // well in practice in high dimensions, because it tends to // generate a large number of trivial and/or unbalanced splits. // Either kd_split(), sl_midpt_split(), or fair_split() are // recommended, instead. //---------------------------------------------------------------------- void midpt_split( ANNpointArray pa, // point array ANNidxArray pidx, // point indices (permuted on return) const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo) // num of points on low side (returned) { int d; ANNcoord max_length = bnds.hi[0] - bnds.lo[0]; for (d = 1; d < dim; d++) { // find length of longest box side ANNcoord length = bnds.hi[d] - bnds.lo[d]; if (length > max_length) { max_length = length; } } ANNcoord max_spread = -1; // find long side with most spread for (d = 0; d < dim; d++) { // is it among longest? if (double(bnds.hi[d] - bnds.lo[d]) >= (1-EPS)*max_length) { // compute its spread ANNcoord spr = annSpread(pa, pidx, n, d); if (spr > max_spread) { // is it max so far? max_spread = spr; cut_dim = d; } } } // split along cut_dim at midpoint cut_val = (bnds.lo[cut_dim] + bnds.hi[cut_dim]) / 2; // permute points accordingly int br1, br2; annPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2); //------------------------------------------------------------------ // On return: pa[0..br1-1] < cut_val // pa[br1..br2-1] == cut_val // pa[br2..n-1] > cut_val // // We can set n_lo to any value in the range [br1..br2]. // We choose split so that points are most evenly divided. 
//------------------------------------------------------------------ if (br1 > n/2) n_lo = br1; else if (br2 < n/2) n_lo = br2; else n_lo = n/2; } //---------------------------------------------------------------------- // sl_midpt_split - sliding midpoint splitting rule // // This is a modification of midpt_split, which has the nonsensical // name "sliding midpoint". The idea is that we try to use the // midpoint rule, by bisecting the longest side. If there are // ties, the dimension with the maximum spread is selected. If, // however, the midpoint split produces a trivial split (no points // on one side of the splitting plane) then we slide the splitting // (maintaining its orientation) until it produces a nontrivial // split. For example, if the splitting plane is along the x-axis, // and all the data points have x-coordinate less than the x-bisector, // then the split is taken along the maximum x-coordinate of the // data points. // // Intuitively, this rule cannot generate trivial splits, and // hence avoids midpt_split's tendency to produce trees with // a very large number of nodes. // //---------------------------------------------------------------------- void sl_midpt_split( ANNpointArray pa, // point array ANNidxArray pidx, // point indices (permuted on return) const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo) // num of points on low side (returned) { int d; ANNcoord max_length = bnds.hi[0] - bnds.lo[0]; for (d = 1; d < dim; d++) { // find length of longest box side ANNcoord length = bnds.hi[d] - bnds.lo[d]; if (length > max_length) { max_length = length; } } ANNcoord max_spread = -1; // find long side with most spread for (d = 0; d < dim; d++) { // is it among longest? 
if ((bnds.hi[d] - bnds.lo[d]) >= (1-EPS)*max_length) { // compute its spread ANNcoord spr = annSpread(pa, pidx, n, d); if (spr > max_spread) { // is it max so far? max_spread = spr; cut_dim = d; } } } // ideal split at midpoint ANNcoord ideal_cut_val = (bnds.lo[cut_dim] + bnds.hi[cut_dim])/2; ANNcoord min, max; annMinMax(pa, pidx, n, cut_dim, min, max); // find min/max coordinates if (ideal_cut_val < min) // slide to min or max as needed cut_val = min; else if (ideal_cut_val > max) cut_val = max; else cut_val = ideal_cut_val; // permute points accordingly int br1, br2; annPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2); //------------------------------------------------------------------ // On return: pa[0..br1-1] < cut_val // pa[br1..br2-1] == cut_val // pa[br2..n-1] > cut_val // // We can set n_lo to any value in the range [br1..br2] to satisfy // the exit conditions of the procedure. // // if ideal_cut_val < min (implying br2 >= 1), // then we select n_lo = 1 (so there is one point on left) and // if ideal_cut_val > max (implying br1 <= n-1), // then we select n_lo = n-1 (so there is one point on right). // Otherwise, we select n_lo as close to n/2 as possible within // [br1..br2]. //------------------------------------------------------------------ if (ideal_cut_val < min) n_lo = 1; else if (ideal_cut_val > max) n_lo = n-1; else if (br1 > n/2) n_lo = br1; else if (br2 < n/2) n_lo = br2; else n_lo = n/2; } //---------------------------------------------------------------------- // fair_split - fair-split splitting rule // // This is a compromise between the kd-tree splitting rule (which // always splits data points at their median) and the midpoint // splitting rule (which always splits a box through its center. // The goal of this procedure is to achieve both nicely balanced // splits, and boxes of bounded aspect ratio. // // A constant FS_ASPECT_RATIO is defined. 
Given a box, those sides // which can be split so that the ratio of the longest to shortest // side does not exceed FS_ASPECT_RATIO are identified. Among these // sides, we select the one in which the points have the largest // spread. We then split the points in a manner which most evenly // distributes the points on either side of the splitting plane, // subject to maintaining the bound on the ratio of long to short // sides. To determine that the aspect ratio will be preserved, // we determine the longest side (other than this side), and // determine how narrowly we can cut this side, without causing the // aspect ratio bound to be exceeded (small_piece). // // This procedure is more robust than either kd_split or midpt_split, // but is more complicated as well. When point distribution is // extremely skewed, this degenerates to midpt_split (actually // 1/3 point split), and when the points are most evenly distributed, // this degenerates to kd-split. //---------------------------------------------------------------------- void fair_split( ANNpointArray pa, // point array ANNidxArray pidx, // point indices (permuted on return) const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo) // num of points on low side (returned) { int d; ANNcoord max_length = bnds.hi[0] - bnds.lo[0]; cut_dim = 0; for (d = 1; d < dim; d++) { // find length of longest box side ANNcoord length = bnds.hi[d] - bnds.lo[d]; if (length > max_length) { max_length = length; cut_dim = d; } } ANNcoord max_spread = 0; // find legal cut with max spread cut_dim = 0; for (d = 0; d < dim; d++) { ANNcoord length = bnds.hi[d] - bnds.lo[d]; // is this side midpoint splitable // without violating aspect ratio?
if (((double) max_length)*2.0/((double) length) <= FS_ASPECT_RATIO) { // compute spread along this dim ANNcoord spr = annSpread(pa, pidx, n, d); if (spr > max_spread) { // best spread so far max_spread = spr; cut_dim = d; // this is dimension to cut } } } max_length = 0; // find longest side other than cut_dim for (d = 0; d < dim; d++) { ANNcoord length = bnds.hi[d] - bnds.lo[d]; if (d != cut_dim && length > max_length) max_length = length; } // consider most extreme splits ANNcoord small_piece = max_length / FS_ASPECT_RATIO; ANNcoord lo_cut = bnds.lo[cut_dim] + small_piece;// lowest legal cut ANNcoord hi_cut = bnds.hi[cut_dim] - small_piece;// highest legal cut int br1, br2; // is median below lo_cut ? if (annSplitBalance(pa, pidx, n, cut_dim, lo_cut) >= 0) { cut_val = lo_cut; // cut at lo_cut annPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2); n_lo = br1; } // is median above hi_cut? else if (annSplitBalance(pa, pidx, n, cut_dim, hi_cut) <= 0) { cut_val = hi_cut; // cut at hi_cut annPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2); n_lo = br2; } else { // median cut preserves asp ratio n_lo = n/2; // split about median annMedianSplit(pa, pidx, n, cut_dim, cut_val, n_lo); } } //---------------------------------------------------------------------- // sl_fair_split - sliding fair split splitting rule // // Sliding fair split is a splitting rule that combines the // strengths of both fair split and sliding midpoint split. // Fair split tends to produce balanced splits when the points // are roughly uniformly distributed, but it can produce many // trivial splits when points are highly clustered. Sliding // midpoint never produces trivial splits, and shrinks boxes // nicely if points are highly clustered, but it may produce // rather unbalanced splits when points are unclustered but not // quite uniform.
// // Sliding fair split is based on the theory that there are two // types of splits that are "good": balanced splits that produce // fat boxes, and unbalanced splits provided the cell with fewer // points is fat. // // This splitting rule operates by first computing the longest // side of the current bounding box. Then it asks which sides // could be split (at the midpoint) and still satisfy the aspect // ratio bound with respect to this side. Among these, it selects // the side with the largest spread (as fair split would). It // then considers the most extreme cuts that would be allowed by // the aspect ratio bound. This is done by dividing the longest // side of the box by the aspect ratio bound. If the median cut // lies between these extreme cuts, then we use the median cut. // If not, then consider the extreme cut that is closer to the // median. If all the points lie to one side of this cut, then // we slide the cut until it hits the first point. This may // violate the aspect ratio bound, but will never generate empty // cells. However the sibling of every such skinny cell is fat, // and hence packing arguments still apply. 
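The "slide" step described above can be sketched in isolation. The following is a hypothetical 1-D illustration (the name `slide_cut` is ours, not ANN's): an ideal cut is clamped into the data's [min, max] range, so neither side of the resulting split can be empty.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical 1-D sketch of the "slide" step described above (not ANN
// code): if every point lies on one side of the ideal cut, slide the cut
// to the nearest data point so that neither side of the split is empty.
double slide_cut(const std::vector<double>& xs, double ideal_cut) {
    auto [mn, mx] = std::minmax_element(xs.begin(), xs.end());
    if (ideal_cut < *mn) return *mn;   // all points above: slide up to min
    if (ideal_cut > *mx) return *mx;   // all points below: slide down to max
    return ideal_cut;                  // cut is already nontrivial
}
```

In ANN itself the analogous clamping happens on one coordinate of the bounding box, followed by `annPlaneSplit` and the `n_lo = 1` / `n_lo = n-1` bookkeeping shown in the functions above.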
// //---------------------------------------------------------------------- void sl_fair_split( ANNpointArray pa, // point array ANNidxArray pidx, // point indices (permuted on return) const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo) // num of points on low side (returned) { int d; ANNcoord min, max; // min/max coordinates int br1, br2; // split break points ANNcoord max_length = bnds.hi[0] - bnds.lo[0]; cut_dim = 0; for (d = 1; d < dim; d++) { // find length of longest box side ANNcoord length = bnds.hi[d] - bnds.lo[d]; if (length > max_length) { max_length = length; cut_dim = d; } } ANNcoord max_spread = 0; // find legal cut with max spread cut_dim = 0; for (d = 0; d < dim; d++) { ANNcoord length = bnds.hi[d] - bnds.lo[d]; // is this side midpoint splitable // without violating aspect ratio? if (((double) max_length)*2.0/((double) length) <= FS_ASPECT_RATIO) { // compute spread along this dim ANNcoord spr = annSpread(pa, pidx, n, d); if (spr > max_spread) { // best spread so far max_spread = spr; cut_dim = d; // this is dimension to cut } } } max_length = 0; // find longest side other than cut_dim for (d = 0; d < dim; d++) { ANNcoord length = bnds.hi[d] - bnds.lo[d]; if (d != cut_dim && length > max_length) max_length = length; } // consider most extreme splits ANNcoord small_piece = max_length / FS_ASPECT_RATIO; ANNcoord lo_cut = bnds.lo[cut_dim] + small_piece;// lowest legal cut ANNcoord hi_cut = bnds.hi[cut_dim] - small_piece;// highest legal cut // find min and max along cut_dim annMinMax(pa, pidx, n, cut_dim, min, max); // is median below lo_cut? if (annSplitBalance(pa, pidx, n, cut_dim, lo_cut) >= 0) { if (max > lo_cut) { // are any points above lo_cut? 
cut_val = lo_cut; // cut at lo_cut annPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2); n_lo = br1; // balance if there are ties } else { // all points below lo_cut cut_val = max; // cut at max value annPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2); n_lo = n-1; } } // is median above hi_cut? else if (annSplitBalance(pa, pidx, n, cut_dim, hi_cut) <= 0) { if (min < hi_cut) { // are any points below hi_cut? cut_val = hi_cut; // cut at hi_cut annPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2); n_lo = br2; // balance if there are ties } else { // all points above hi_cut cut_val = min; // cut at min value annPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2); n_lo = 1; } } else { // median cut is good enough n_lo = n/2; // split about median annMedianSplit(pa, pidx, n, cut_dim, cut_val, n_lo); } } ================================================ FILE: src/ANN/kd_split.h ================================================ //---------------------------------------------------------------------- // File: kd_split.h // Programmer: Sunil Arya and David Mount // Description: Methods for splitting kd-trees // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. 
//---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release //---------------------------------------------------------------------- #ifndef ANN_KD_SPLIT_H #define ANN_KD_SPLIT_H #include "kd_tree.h" // kd-tree definitions //---------------------------------------------------------------------- // External entry points // These are all splitting procedures for kd-trees. //---------------------------------------------------------------------- void kd_split( // standard (optimized) kd-splitter ANNpointArray pa, // point array (unaltered) ANNidxArray pidx, // point indices (permuted on return) const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo); // num of points on low side (returned) void midpt_split( // midpoint kd-splitter ANNpointArray pa, // point array (unaltered) ANNidxArray pidx, // point indices (permuted on return) const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo); // num of points on low side (returned) void sl_midpt_split( // sliding midpoint kd-splitter ANNpointArray pa, // point array (unaltered) ANNidxArray pidx, // point indices (permuted on return) const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo); // num of points on low side (returned) void fair_split( // fair-split kd-splitter ANNpointArray pa, // point array (unaltered) ANNidxArray pidx, // point indices (permuted on return) const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space 
int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo); // num of points on low side (returned) void sl_fair_split( // sliding fair-split kd-splitter ANNpointArray pa, // point array (unaltered) ANNidxArray pidx, // point indices (permuted on return) const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo); // num of points on low side (returned) #endif ================================================ FILE: src/ANN/kd_tree.cpp ================================================ //---------------------------------------------------------------------- // File: kd_tree.cpp // Programmer: Sunil Arya and David Mount // Description: Basic methods for kd-trees. // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.0 04/01/05 // Increased aspect ratio bound (ANN_AR_TOOBIG) from 100 to 1000. // Fixed leaf counts to count trivial leaves. // Added optional pa, pi arguments to Skeleton kd_tree constructor // for use in load constructor. // Added annClose() to eliminate KD_TRIVIAL memory leak. 
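The annClose() fix mentioned in the history above can be illustrated with a stripped-down version of the shared-sentinel pattern (a minimal sketch with our own names, not the ANN classes): one canonical empty leaf is shared by every parent, destruction must skip it, and a single close call frees it exactly once.

```cpp
#include <cstddef>

// Minimal sketch (ours, not ANN's classes) of the KD_TRIVIAL pattern:
// a single shared empty leaf, skipped during destruction and freed once.
struct Leaf { int n_pts; };
static Leaf* TRIVIAL = nullptr;        // shared empty-leaf sentinel

Leaf* make_leaf(int n) {
    if (n == 0) {
        if (TRIVIAL == nullptr) TRIVIAL = new Leaf{0};  // first use: allocate
        return TRIVIAL;                // every empty leaf is the same node
    }
    return new Leaf{n};
}

void destroy_leaf(Leaf* l) {
    if (l != nullptr && l != TRIVIAL) delete l;  // never free the sentinel here
}

void close_leaves() {                  // analogous to annClose()
    delete TRIVIAL;
    TRIVIAL = nullptr;
}
```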
//---------------------------------------------------------------------- #include "kd_tree.h" // kd-tree declarations #include "kd_split.h" // kd-tree splitting rules #include "kd_util.h" // kd-tree utilities #include "ANNperf.h" // performance evaluation //---------------------------------------------------------------------- // Global data // // For some splitting rules, especially with small bucket sizes, // it is possible to generate a large number of empty leaf nodes. // To save storage we allocate a single trivial leaf node which // contains no points. For messy coding reasons it is convenient // to have it reference a trivial point index. // // KD_TRIVIAL is allocated when the first kd-tree is created. It // must *never* be deallocated (since it may be shared by more than // one tree). //---------------------------------------------------------------------- static int IDX_TRIVIAL[] = {0}; // trivial point index ANNkd_leaf *KD_TRIVIAL = NULL; // trivial leaf node //---------------------------------------------------------------------- // Printing the kd-tree // These routines print a kd-tree in reverse inorder (high then // root then low). (This is so that if you look at the output // from the right side it appears from left to right in standard // inorder.) When outputting leaves we output only the point // indices rather than the point coordinates. There is an option // to print the point coordinates separately. // // The tree printing routine calls the printing routines on the // individual nodes of the tree, passing in the level or depth // in the tree. The level in the tree is used to print indentation // for readability.
//---------------------------------------------------------------------- void ANNkd_split::print( // print splitting node int level, // depth of node in tree ostream &out) // output stream { child[ANN_HI]->print(level+1, out); // print high child out << " "; for (int i = 0; i < level; i++) // print indentation out << ".."; out << "Split cd=" << cut_dim << " cv=" << cut_val; out << " lbnd=" << cd_bnds[ANN_LO]; out << " hbnd=" << cd_bnds[ANN_HI]; out << "\n"; child[ANN_LO]->print(level+1, out); // print low child } void ANNkd_leaf::print( // print leaf node int level, // depth of node in tree ostream &out) // output stream { out << " "; for (int i = 0; i < level; i++) // print indentation out << ".."; if (this == KD_TRIVIAL) { // canonical trivial leaf node out << "Leaf (trivial)\n"; } else{ out << "Leaf n=" << n_pts << " <"; for (int j = 0; j < n_pts; j++) { out << bkt[j]; if (j < n_pts-1) out << ","; } out << ">\n"; } } void ANNkd_tree::Print( // print entire tree ANNbool with_pts, // print points as well? ostream &out) // output stream { out << "ANN Version " << ANNversion << "\n"; if (with_pts) { // print point coordinates out << " Points:\n"; for (int i = 0; i < n_pts; i++) { out << "\t" << i << ": "; annPrintPt(pts[i], dim, out); out << "\n"; } } if (root == NULL) // empty tree? out << " Null tree.\n"; else { root->print(0, out); // invoke printing at root } } //---------------------------------------------------------------------- // kd_tree statistics (for performance evaluation) // This routine computes various statistics for // a kd-tree. It is used by the implementors for performance // evaluation of the data structure. //---------------------------------------------------------------------- #define MAX(a,b) ((a) > (b) ?
(a) : (b)) void ANNkdStats::merge(const ANNkdStats &st) // merge stats from child { n_lf += st.n_lf; n_tl += st.n_tl; n_spl += st.n_spl; n_shr += st.n_shr; depth = MAX(depth, st.depth); sum_ar += st.sum_ar; } //---------------------------------------------------------------------- // Update statistics for nodes //---------------------------------------------------------------------- const double ANN_AR_TOOBIG = 1000; // too big an aspect ratio void ANNkd_leaf::getStats( // get subtree statistics int dim, // dimension of space ANNkdStats &st, // stats (modified) ANNorthRect &bnd_box) // bounding box { st.reset(); st.n_lf = 1; // count this leaf if (this == KD_TRIVIAL) st.n_tl = 1; // count trivial leaf double ar = annAspectRatio(dim, bnd_box); // aspect ratio of leaf // incr sum (ignore outliers) st.sum_ar += float(ar < ANN_AR_TOOBIG ? ar : ANN_AR_TOOBIG); } void ANNkd_split::getStats( // get subtree statistics int dim, // dimension of space ANNkdStats &st, // stats (modified) ANNorthRect &bnd_box) // bounding box { ANNkdStats ch_stats; // stats for children // get stats for low child ANNcoord hv = bnd_box.hi[cut_dim]; // save box bounds bnd_box.hi[cut_dim] = cut_val; // upper bound for low child ch_stats.reset(); // reset child[ANN_LO]->getStats(dim, ch_stats, bnd_box); st.merge(ch_stats); // merge them bnd_box.hi[cut_dim] = hv; // restore bound // get stats for high child ANNcoord lv = bnd_box.lo[cut_dim]; // save box bounds bnd_box.lo[cut_dim] = cut_val; // lower bound for high child ch_stats.reset(); // reset child[ANN_HI]->getStats(dim, ch_stats, bnd_box); st.merge(ch_stats); // merge them bnd_box.lo[cut_dim] = lv; // restore bound st.depth++; // increment depth st.n_spl++; // increment number of splits } //---------------------------------------------------------------------- // getStats // Collects a number of statistics related to kd_tree or // bd_tree. 
//---------------------------------------------------------------------- void ANNkd_tree::getStats( // get tree statistics ANNkdStats &st) // stats (modified) { st.reset(dim, n_pts, bkt_size); // reset stats // create bounding box ANNorthRect bnd_box(dim, bnd_box_lo, bnd_box_hi); if (root != NULL) { // if nonempty tree root->getStats(dim, st, bnd_box); // get statistics st.avg_ar = st.sum_ar / st.n_lf; // average leaf asp ratio } } //---------------------------------------------------------------------- // kd_tree destructor // The destructor just frees the various elements that were // allocated in the construction process. //---------------------------------------------------------------------- ANNkd_tree::~ANNkd_tree() // tree destructor { if (root != NULL) delete root; if (pidx != NULL) delete [] pidx; if (bnd_box_lo != NULL) annDeallocPt(bnd_box_lo); if (bnd_box_hi != NULL) annDeallocPt(bnd_box_hi); } //---------------------------------------------------------------------- // This is called when all use of ANN is finished. It eliminates the // minor memory leak caused by the allocation of KD_TRIVIAL. //---------------------------------------------------------------------- void annClose() // close use of ANN { if (KD_TRIVIAL != NULL) { delete KD_TRIVIAL; KD_TRIVIAL = NULL; } } //---------------------------------------------------------------------- // kd_tree constructors // There is a skeleton kd-tree constructor which sets up a // trivial empty tree. The last optional argument allows // the routine to be passed a point index array which is // assumed to be of the proper size (n). Otherwise, one is // allocated and initialized to the identity. Warning: In // either case the destructor will deallocate this array. // // As a kludge, we need to allocate KD_TRIVIAL if one has not // already been allocated. (This is because I'm too dumb to // figure out how to cause a pointer to be allocated at load // time.)
//---------------------------------------------------------------------- void ANNkd_tree::SkeletonTree( // construct skeleton tree int n, // number of points int dd, // dimension int bs, // bucket size ANNpointArray pa, // point array ANNidxArray pi) // point indices { dim = dd; // initialize basic elements n_pts = n; bkt_size = bs; pts = pa; // initialize points array root = NULL; // no associated tree yet if (pi == NULL) { // point indices provided? pidx = new ANNidx[n]; // no, allocate space for point indices for (int i = 0; i < n; i++) { pidx[i] = i; // initially identity } } else { pidx = pi; // yes, use them } bnd_box_lo = bnd_box_hi = NULL; // bounding box is nonexistent if (KD_TRIVIAL == NULL) // no trivial leaf node yet? KD_TRIVIAL = new ANNkd_leaf(0, IDX_TRIVIAL); // allocate it } ANNkd_tree::ANNkd_tree( // basic constructor int n, // number of points int dd, // dimension int bs) // bucket size { SkeletonTree(n, dd, bs); } // construct skeleton tree //---------------------------------------------------------------------- // rkd_tree - recursive procedure to build a kd-tree // // Builds a kd-tree for points in pa as indexed through the // array pidx[0..n-1] (typically a subarray of the array used in // the top-level call). This routine permutes the array pidx, // but does not alter pa[]. // // The construction is based on a standard algorithm for constructing // the kd-tree (see Friedman, Bentley, and Finkel, ``An algorithm for // finding best matches in logarithmic expected time,'' ACM Transactions // on Mathematical Software, 3(3):209-226, 1977). The procedure // operates by a simple divide-and-conquer strategy, which determines // an appropriate orthogonal cutting plane (see below), and splits // the points. When the number of points falls below the bucket size, // we simply store the points in a leaf node's bucket. 
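The divide-and-conquer construction just described can be sketched with a toy 1-D median-split build (our own simplified code, not rkd_tree itself): recurse until a node holds at most bucket_size points, otherwise split at the median and build the two children.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy sketch (not rkd_tree itself) of the recursion described above:
// make a leaf when n <= bucket_size, otherwise median-split and recurse.
struct ToyNode {
    std::vector<double> bucket;              // leaf: points live here
    double cut = 0;                          // split node: cutting value
    ToyNode *lo = nullptr, *hi = nullptr;    // split node: children
};

ToyNode* toy_build(std::vector<double> pts, std::size_t bucket_size) {
    ToyNode* node = new ToyNode;
    if (pts.size() <= bucket_size) {         // small: store in a bucket
        node->bucket = pts;
        return node;
    }
    std::size_t mid = pts.size() / 2;        // split at the median element
    std::nth_element(pts.begin(), pts.begin() + mid, pts.end());
    node->cut = pts[mid];
    node->lo = toy_build({pts.begin(), pts.begin() + mid}, bucket_size);
    node->hi = toy_build({pts.begin() + mid, pts.end()}, bucket_size);
    return node;
}
```

The real rkd_tree additionally threads a bounding box and a pluggable splitter through the recursion, and permutes a shared index array instead of copying points.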
// // One of the arguments is a pointer to a splitting routine, // whose prototype is: // // void split( // ANNpointArray pa, // complete point array // ANNidxArray pidx, // point array (permuted on return) // ANNorthRect &bnds, // bounds of current cell // int n, // number of points // int dim, // dimension of space // int &cut_dim, // cutting dimension // ANNcoord &cut_val, // cutting value // int &n_lo) // no. of points on low side of cut // // This procedure selects a cutting dimension and cutting value, // partitions pa about these values, and returns the number of // points on the low side of the cut. //---------------------------------------------------------------------- ANNkd_ptr rkd_tree( // recursive construction of kd-tree ANNpointArray pa, // point array ANNidxArray pidx, // point indices to store in subtree int n, // number of points int dim, // dimension of space int bsp, // bucket space ANNorthRect &bnd_box, // bounding box for current node ANNkd_splitter splitter) // splitting routine { if (n <= bsp) { // n small, make a leaf node if (n == 0) // empty leaf node return KD_TRIVIAL; // return (canonical) empty leaf else // construct the node and return return new ANNkd_leaf(n, pidx); } else { // n large, make a splitting node int cd; // cutting dimension ANNcoord cv; // cutting value int n_lo; // number on low side of cut ANNkd_node *lo, *hi; // low and high children // invoke splitting procedure (*splitter)(pa, pidx, bnd_box, n, dim, cd, cv, n_lo); ANNcoord lv = bnd_box.lo[cd]; // save bounds for cutting dimension ANNcoord hv = bnd_box.hi[cd]; bnd_box.hi[cd] = cv; // modify bounds for left subtree lo = rkd_tree( // build left subtree pa, pidx, n_lo, // ...from pidx[0..n_lo-1] dim, bsp, bnd_box, splitter); bnd_box.hi[cd] = hv; // restore bounds bnd_box.lo[cd] = cv; // modify bounds for right subtree hi = rkd_tree( // build right subtree pa, pidx + n_lo, n-n_lo,// ...from pidx[n_lo..n-1] dim, bsp, bnd_box, splitter); bnd_box.lo[cd] = lv; // restore 
bounds // create the splitting node ANNkd_split *ptr = new ANNkd_split(cd, cv, lv, hv, lo, hi); return ptr; // return pointer to this node } } //---------------------------------------------------------------------- // kd-tree constructor // This is the main constructor for kd-trees given a set of points. // It first builds a skeleton tree, then computes the bounding box // of the data points, and then invokes rkd_tree() to actually // build the tree, passing it the appropriate splitting routine. //---------------------------------------------------------------------- ANNkd_tree::ANNkd_tree( // construct from point array ANNpointArray pa, // point array (with at least n pts) int n, // number of points int dd, // dimension int bs, // bucket size ANNsplitRule split) // splitting method { SkeletonTree(n, dd, bs); // set up the basic stuff pts = pa; // where the points are if (n == 0) return; // no points--no sweat ANNorthRect bnd_box(dd); // bounding box for points annEnclRect(pa, pidx, n, dd, bnd_box);// construct bounding rectangle // copy to tree structure bnd_box_lo = annCopyPt(dd, bnd_box.lo); bnd_box_hi = annCopyPt(dd, bnd_box.hi); switch (split) { // build by rule case ANN_KD_STD: // standard kd-splitting rule root = rkd_tree(pa, pidx, n, dd, bs, bnd_box, kd_split); break; case ANN_KD_MIDPT: // midpoint split root = rkd_tree(pa, pidx, n, dd, bs, bnd_box, midpt_split); break; case ANN_KD_FAIR: // fair split root = rkd_tree(pa, pidx, n, dd, bs, bnd_box, fair_split); break; case ANN_KD_SUGGEST: // best (in our opinion) case ANN_KD_SL_MIDPT: // sliding midpoint split root = rkd_tree(pa, pidx, n, dd, bs, bnd_box, sl_midpt_split); break; case ANN_KD_SL_FAIR: // sliding fair split root = rkd_tree(pa, pidx, n, dd, bs, bnd_box, sl_fair_split); break; default: annError("Illegal splitting method", ANNabort); } } ================================================ FILE: src/ANN/kd_tree.h ================================================ 
//---------------------------------------------------------------------- // File: kd_tree.h // Programmer: Sunil Arya and David Mount // Description: Declarations for standard kd-tree routines // Last modified: 05/03/05 (Version 1.1) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.1 05/03/05 // Added fixed radius kNN search //---------------------------------------------------------------------- #ifndef ANN_kd_tree_H #define ANN_kd_tree_H #include "ANNx.h" // all ANN includes using namespace std; // make std:: available //---------------------------------------------------------------------- // Generic kd-tree node // // Nodes in kd-trees are of two types, splitting nodes which contain // splitting information (a splitting hyperplane orthogonal to one // of the coordinate axes) and leaf nodes which contain point // information (an array of points stored in a bucket). This is // handled by making a generic class kd_node, which is essentially an // empty shell, and then deriving the leaf and splitting nodes from // this. 
//---------------------------------------------------------------------- class ANNkd_node{ // generic kd-tree node (empty shell) public: virtual ~ANNkd_node() {} // virtual destructor virtual void ann_search(ANNdist) = 0; // tree search virtual void ann_pri_search(ANNdist) = 0; // priority search virtual void ann_FR_search(ANNdist) = 0; // fixed-radius search virtual void getStats( // get tree statistics int dim, // dimension of space ANNkdStats &st, // statistics ANNorthRect &bnd_box) = 0; // bounding box // print node virtual void print(int level, ostream &out) = 0; virtual void dump(ostream &out) = 0; // dump node friend class ANNkd_tree; // allow kd-tree to access us }; //---------------------------------------------------------------------- // kd-splitting function: // kd_splitter is a pointer to a splitting routine for preprocessing. // Different splitting procedures result in different strategies // for building the tree. //---------------------------------------------------------------------- typedef void (*ANNkd_splitter)( // splitting routine for kd-trees ANNpointArray pa, // point array (unaltered) ANNidxArray pidx, // point indices (permuted on return) const ANNorthRect &bnds, // bounding rectangle for cell int n, // number of points int dim, // dimension of space int &cut_dim, // cutting dimension (returned) ANNcoord &cut_val, // cutting value (returned) int &n_lo); // num of points on low side (returned) //---------------------------------------------------------------------- // Leaf kd-tree node // Leaf nodes of the kd-tree store the set of points associated // with this bucket, stored as an array of point indices. These // are indices in the array points, which resides with the // root of the kd-tree. We also store the number of points // that reside in this bucket. //---------------------------------------------------------------------- class ANNkd_leaf: public ANNkd_node // leaf node for kd-tree { int n_pts; // no.
points in bucket ANNidxArray bkt; // bucket of points public: ANNkd_leaf( // constructor int n, // number of points ANNidxArray b) // bucket { n_pts = n; // number of points in bucket bkt = b; // the bucket } ~ANNkd_leaf() { } // destructor (none) virtual void getStats( // get tree statistics int dim, // dimension of space ANNkdStats &st, // statistics ANNorthRect &bnd_box); // bounding box virtual void print(int level, ostream &out);// print node virtual void dump(ostream &out); // dump node virtual void ann_search(ANNdist); // standard search virtual void ann_pri_search(ANNdist); // priority search virtual void ann_FR_search(ANNdist); // fixed-radius search }; //---------------------------------------------------------------------- // KD_TRIVIAL is a special pointer to an empty leaf node. Since // some splitting rules generate many (more than 50%) trivial // leaves, we use this one shared node to save space. // // The pointer is initialized to NULL, but whenever a kd-tree is // created, we allocate this node, if it has not already been // allocated. This node is *never* deallocated, so it produces // a small memory leak. //---------------------------------------------------------------------- extern ANNkd_leaf *KD_TRIVIAL; // trivial (empty) leaf node //---------------------------------------------------------------------- // kd-tree splitting node. // Splitting nodes contain a cutting dimension and a cutting value. // These indicate the axis-parallel plane which subdivides the // box for this node. The extent of the bounding box along the // cutting dimension is maintained (this is used to speed up point // to box distance calculations) [we do not store the entire bounding // box since this may be wasteful of space in high dimensions]. // We also store pointers to the 2 children.
//---------------------------------------------------------------------- class ANNkd_split : public ANNkd_node // splitting node of a kd-tree { int cut_dim; // dim orthogonal to cutting plane ANNcoord cut_val; // location of cutting plane ANNcoord cd_bnds[2]; // lower and upper bounds of // rectangle along cut_dim ANNkd_ptr child[2]; // left and right children public: ANNkd_split( // constructor int cd, // cutting dimension ANNcoord cv, // cutting value ANNcoord lv, ANNcoord hv, // low and high values ANNkd_ptr lc=NULL, ANNkd_ptr hc=NULL) // children { cut_dim = cd; // cutting dimension cut_val = cv; // cutting value cd_bnds[ANN_LO] = lv; // lower bound for rectangle cd_bnds[ANN_HI] = hv; // upper bound for rectangle child[ANN_LO] = lc; // left child child[ANN_HI] = hc; // right child } ~ANNkd_split() // destructor { if (child[ANN_LO]!= NULL && child[ANN_LO]!= KD_TRIVIAL) delete child[ANN_LO]; if (child[ANN_HI]!= NULL && child[ANN_HI]!= KD_TRIVIAL) delete child[ANN_HI]; } virtual void getStats( // get tree statistics int dim, // dimension of space ANNkdStats &st, // statistics ANNorthRect &bnd_box); // bounding box virtual void print(int level, ostream &out);// print node virtual void dump(ostream &out); // dump node virtual void ann_search(ANNdist); // standard search virtual void ann_pri_search(ANNdist); // priority search virtual void ann_FR_search(ANNdist); // fixed-radius search }; //---------------------------------------------------------------------- // External entry points //---------------------------------------------------------------------- ANNkd_ptr rkd_tree( // recursive construction of kd-tree ANNpointArray pa, // point array (unaltered) ANNidxArray pidx, // point indices to store in subtree int n, // number of points int dim, // dimension of space int bsp, // bucket space ANNorthRect &bnd_box, // bounding box for current node ANNkd_splitter splitter); // splitting routine #endif ================================================ FILE: 
src/ANN/kd_util.cpp ================================================ //---------------------------------------------------------------------- // File: kd_util.cpp // Programmer: Sunil Arya and David Mount // Description: Common utilities for kd-trees // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release //---------------------------------------------------------------------- #include "kd_util.h" // kd-utility declarations #include "ANNperf.h" // performance evaluation //---------------------------------------------------------------------- // The following routines are utility functions for manipulating // points sets, used in determining splitting planes for kd-tree // construction. //---------------------------------------------------------------------- //---------------------------------------------------------------------- // NOTE: Virtually all point indexing is done through an index (i.e. // permutation) array pidx. Consequently, a reference to the d-th // coordinate of the i-th point is pa[pidx[i]][d]. The macro PA(i,d) // is a shorthand for this. 
//---------------------------------------------------------------------- // standard 2-d indirect indexing #define PA(i,d) (pa[pidx[(i)]][(d)]) // accessing a single point #define PP(i) (pa[pidx[(i)]]) //---------------------------------------------------------------------- // annAspectRatio // Compute the aspect ratio (ratio of longest to shortest side) // of a rectangle. //---------------------------------------------------------------------- double annAspectRatio( int dim, // dimension const ANNorthRect &bnd_box) // bounding cube { ANNcoord length = bnd_box.hi[0] - bnd_box.lo[0]; ANNcoord min_length = length; // min side length ANNcoord max_length = length; // max side length for (int d = 0; d < dim; d++) { length = bnd_box.hi[d] - bnd_box.lo[d]; if (length < min_length) min_length = length; if (length > max_length) max_length = length; } return max_length/min_length; } //---------------------------------------------------------------------- // annEnclRect, annEnclCube // These utilities compute the smallest rectangle and cube enclosing // a set of points, respectively. 
//---------------------------------------------------------------------- void annEnclRect( ANNpointArray pa, // point array ANNidxArray pidx, // point indices int n, // number of points int dim, // dimension ANNorthRect &bnds) // bounding cube (returned) { for (int d = 0; d < dim; d++) { // find smallest enclosing rectangle ANNcoord lo_bnd = PA(0,d); // lower bound on dimension d ANNcoord hi_bnd = PA(0,d); // upper bound on dimension d for (int i = 0; i < n; i++) { if (PA(i,d) < lo_bnd) lo_bnd = PA(i,d); else if (PA(i,d) > hi_bnd) hi_bnd = PA(i,d); } bnds.lo[d] = lo_bnd; bnds.hi[d] = hi_bnd; } } void annEnclCube( // compute smallest enclosing cube ANNpointArray pa, // point array ANNidxArray pidx, // point indices int n, // number of points int dim, // dimension ANNorthRect &bnds) // bounding cube (returned) { int d; // compute smallest enclosing rect annEnclRect(pa, pidx, n, dim, bnds); ANNcoord max_len = 0; // max length of any side for (d = 0; d < dim; d++) { // determine max side length ANNcoord len = bnds.hi[d] - bnds.lo[d]; if (len > max_len) { // update max_len if longest max_len = len; } } for (d = 0; d < dim; d++) { // grow sides to match max ANNcoord len = bnds.hi[d] - bnds.lo[d]; ANNcoord half_diff = (max_len - len) / 2; bnds.lo[d] -= half_diff; bnds.hi[d] += half_diff; } } //---------------------------------------------------------------------- // annBoxDistance - utility routine which computes distance from point to // box (Note: most distances to boxes are computed using incremental // distance updates, not this function.) 
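The `annEnclCube` logic above (compute the enclosing rectangle, then grow every side symmetrically until all sides match the longest one) can be sketched with plain `std::vector` stand-ins for the ANN types; names here are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Grow per-dimension [lo, hi] bounds symmetrically so every side has the
// length of the longest one, turning a rectangle into a cube.
// Assumes lo.size() == hi.size().
void grow_to_cube(std::vector<double>& lo, std::vector<double>& hi) {
  double max_len = 0;
  for (std::size_t d = 0; d < lo.size(); ++d)
    max_len = std::max(max_len, hi[d] - lo[d]);
  for (std::size_t d = 0; d < lo.size(); ++d) {
    double half = (max_len - (hi[d] - lo[d])) / 2;  // slack split evenly
    lo[d] -= half;
    hi[d] += half;
  }
}
```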
//---------------------------------------------------------------------- ANNdist annBoxDistance( // compute distance from point to box const ANNpoint q, // the point const ANNpoint lo, // low point of box const ANNpoint hi, // high point of box int dim) // dimension of space { ANNdist dist = 0.0; // sum of squared distances ANNdist t; for (int d = 0; d < dim; d++) { if (q[d] < lo[d]) { // q is left of box t = ANNdist(lo[d]) - ANNdist(q[d]); dist = ANN_SUM(dist, ANN_POW(t)); } else if (q[d] > hi[d]) { // q is right of box t = ANNdist(q[d]) - ANNdist(hi[d]); dist = ANN_SUM(dist, ANN_POW(t)); } } ANN_FLOP(4*dim) // increment floating op count return dist; } //---------------------------------------------------------------------- // annSpread - find spread along given dimension // annMinMax - find min and max coordinates along given dimension // annMaxSpread - find dimension of max spread //---------------------------------------------------------------------- ANNcoord annSpread( // compute point spread along dimension ANNpointArray pa, // point array ANNidxArray pidx, // point indices int n, // number of points int d) // dimension to check { ANNcoord min = PA(0,d); // compute max and min coords ANNcoord max = PA(0,d); for (int i = 1; i < n; i++) { ANNcoord c = PA(i,d); if (c < min) min = c; else if (c > max) max = c; } return (max - min); // total spread is difference } void annMinMax( // compute min and max coordinates along dim ANNpointArray pa, // point array ANNidxArray pidx, // point indices int n, // number of points int d, // dimension to check ANNcoord &min, // minimum value (returned) ANNcoord &max) // maximum value (returned) { min = PA(0,d); // compute max and min coords max = PA(0,d); for (int i = 1; i < n; i++) { ANNcoord c = PA(i,d); if (c < min) min = c; else if (c > max) max = c; } } int annMaxSpread( // compute dimension of max spread ANNpointArray pa, // point array ANNidxArray pidx, // point indices int n, // number of points int dim) // dimension 
of space { int max_dim = 0; // dimension of max spread ANNcoord max_spr = 0; // amount of max spread if (n == 0) return max_dim; // no points, who cares? for (int d = 0; d < dim; d++) { // compute spread along each dim ANNcoord spr = annSpread(pa, pidx, n, d); if (spr > max_spr) { // bigger than current max max_spr = spr; max_dim = d; } } return max_dim; } //---------------------------------------------------------------------- // annMedianSplit - split point array about its median // Splits a subarray of points pa[0..n] about an element of given // rank (median: n_lo = n/2) with respect to dimension d. It places // the element of rank n_lo-1 correctly (because our splitting rule // takes the mean of these two). On exit, the array is permuted so // that: // // pa[0..n_lo-2][d] <= pa[n_lo-1][d] <= pa[n_lo][d] <= pa[n_lo+1..n-1][d]. // // The mean of pa[n_lo-1][d] and pa[n_lo][d] is returned as the // splitting value. // // All indexing is done indirectly through the index array pidx. // // This function uses the well known selection algorithm due to // C.A.R. Hoare. 
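The median split described above can be sketched with `std::nth_element` over an index array; ANN uses its own Hoare-style selection, but the effect (elements of rank `n_lo` and `n_lo-1` placed, their mean returned as the cutting value) is the same. This is an illustrative sketch, not the library's code, and it assumes `0 < n_lo < n`:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Indirect median split: points are addressed through the index array pidx,
// as in the PA(i,d) macro. Returns the cutting value.
double median_split(const std::vector<std::vector<double>>& pa,
                    std::vector<int>& pidx, int d, int n_lo) {
  auto less = [&](int a, int b) { return pa[a][d] < pa[b][d]; };
  // place the element of rank n_lo at position n_lo
  std::nth_element(pidx.begin(), pidx.begin() + n_lo, pidx.end(), less);
  // place the element of rank n_lo-1 correctly as well
  std::nth_element(pidx.begin(), pidx.begin() + (n_lo - 1),
                   pidx.begin() + n_lo, less);
  // cutting value is the mean of the two middle coordinates
  return (pa[pidx[n_lo - 1]][d] + pa[pidx[n_lo]][d]) / 2.0;
}
```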
//---------------------------------------------------------------------- // swap two points in pa array #define PASWAP(a,b) { int tmp = pidx[a]; pidx[a] = pidx[b]; pidx[b] = tmp; } void annMedianSplit( ANNpointArray pa, // points to split ANNidxArray pidx, // point indices int n, // number of points int d, // dimension along which to split ANNcoord &cv, // cutting value int n_lo) // split into n_lo and n-n_lo { int l = 0; // left end of current subarray int r = n-1; // right end of current subarray while (l < r) { int i = (r+l)/2; // select middle as pivot int k; if (PA(i,d) > PA(r,d)) // make sure last > pivot PASWAP(i,r) PASWAP(l,i); // move pivot to first position ANNcoord c = PA(l,d); // pivot value i = l; k = r; for(;;) { // pivot about c while (PA(++i,d) < c) ; while (PA(--k,d) > c) ; if (i < k) PASWAP(i,k) else break; } PASWAP(l,k); // pivot winds up in location k if (k > n_lo) r = k-1; // recurse on proper subarray else if (k < n_lo) l = k+1; else break; // got the median exactly } if (n_lo > 0) { // search for next smaller item ANNcoord c = PA(0,d); // candidate for max int k = 0; // candidate's index for (int i = 1; i < n_lo; i++) { if (PA(i,d) > c) { c = PA(i,d); k = i; } } PASWAP(n_lo-1, k); // max among pa[0..n_lo-1] to pa[n_lo-1] } // cut value is midpoint value cv = (PA(n_lo-1,d) + PA(n_lo,d))/2.0; } //---------------------------------------------------------------------- // annPlaneSplit - split point array about a cutting plane // Split the points in an array about a given plane along a // given cutting dimension. On exit, br1 and br2 are set so // that: // // pa[ 0 ..br1-1] < cv // pa[br1..br2-1] == cv // pa[br2.. n -1] > cv // // All indexing is done indirectly through the index array pidx. 
// //---------------------------------------------------------------------- void annPlaneSplit( // split points by a plane ANNpointArray pa, // points to split ANNidxArray pidx, // point indices int n, // number of points int d, // dimension along which to split ANNcoord cv, // cutting value int &br1, // first break (values < cv) int &br2) // second break (values == cv) { int l = 0; int r = n-1; for(;;) { // partition pa[0..n-1] about cv while (l < n && PA(l,d) < cv) l++; while (r >= 0 && PA(r,d) >= cv) r--; if (l > r) break; PASWAP(l,r); l++; r--; } br1 = l; // now: pa[0..br1-1] < cv <= pa[br1..n-1] r = n-1; for(;;) { // partition pa[br1..n-1] about cv while (l < n && PA(l,d) <= cv) l++; while (r >= br1 && PA(r,d) > cv) r--; if (l > r) break; PASWAP(l,r); l++; r--; } br2 = l; // now: pa[br1..br2-1] == cv < pa[br2..n-1] } //---------------------------------------------------------------------- // annBoxSplit - split point array about a orthogonal rectangle // Split the points in an array about a given orthogonal // rectangle. On exit, n_in is set to the number of points // that are inside (or on the boundary of) the rectangle. // // All indexing is done indirectly through the index array pidx. 
// //---------------------------------------------------------------------- void annBoxSplit( // split points by a box ANNpointArray pa, // points to split ANNidxArray pidx, // point indices int n, // number of points int dim, // dimension of space ANNorthRect &box, // the box int &n_in) // number of points inside (returned) { int l = 0; int r = n-1; for(;;) { // partition pa[0..n-1] about box while (l < n && box.inside(dim, PP(l))) l++; while (r >= 0 && !box.inside(dim, PP(r))) r--; if (l > r) break; PASWAP(l,r); l++; r--; } n_in = l; // now: pa[0..n_in-1] inside and rest outside } //---------------------------------------------------------------------- // annSplitBalance - compute balance factor for a given plane split // Balance factor is defined as the number of points lying // below the splitting value minus n/2 (median). Thus, a // median split has balance 0, left of this is negative and // right of this is positive. (The points are unchanged.) //---------------------------------------------------------------------- int annSplitBalance( // determine balance factor of a split ANNpointArray pa, // points to split ANNidxArray pidx, // point indices int n, // number of points int d, // dimension along which to split ANNcoord cv) // cutting value { int n_lo = 0; for(int i = 0; i < n; i++) { // count number less than cv if (PA(i,d) < cv) n_lo++; } return n_lo - n/2; } //---------------------------------------------------------------------- // annBox2Bnds - convert bounding box to list of bounds // Given two boxes, an inner box enclosed within a bounding // box, this routine determines all the sides for which the // inner box is strictly contained with the bounding box, // and adds an appropriate entry to a list of bounds. Then // we allocate storage for the final list of bounds, and return // the resulting list and its size. 
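The two-pass `annPlaneSplit` partition above has the same effect as two `std::partition` calls on a plain coordinate vector: first separate values below the cutting value, then, within the remainder, separate values equal to it. A sketch (illustrative names, not the library's code):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Three-way split about cutting value cv. On exit:
//   v[0..br1-1] < cv, v[br1..br2-1] == cv, v[br2..n-1] > cv
void plane_split(std::vector<double>& v, double cv, int& br1, int& br2) {
  auto mid = std::partition(v.begin(), v.end(),
                            [cv](double x) { return x < cv; });
  auto hi  = std::partition(mid, v.end(),
                            [cv](double x) { return x == cv; });
  br1 = static_cast<int>(mid - v.begin());
  br2 = static_cast<int>(hi - v.begin());
}
```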
//---------------------------------------------------------------------- void annBox2Bnds( // convert inner box to bounds const ANNorthRect &inner_box, // inner box const ANNorthRect &bnd_box, // enclosing box int dim, // dimension of space int &n_bnds, // number of bounds (returned) ANNorthHSArray &bnds) // bounds array (returned) { int i; n_bnds = 0; // count number of bounds for (i = 0; i < dim; i++) { if (inner_box.lo[i] > bnd_box.lo[i]) // low bound is inside n_bnds++; if (inner_box.hi[i] < bnd_box.hi[i]) // high bound is inside n_bnds++; } bnds = new ANNorthHalfSpace[n_bnds]; // allocate appropriate size int j = 0; for (i = 0; i < dim; i++) { // fill the array if (inner_box.lo[i] > bnd_box.lo[i]) { bnds[j].cd = i; bnds[j].cv = inner_box.lo[i]; bnds[j].sd = +1; j++; } if (inner_box.hi[i] < bnd_box.hi[i]) { bnds[j].cd = i; bnds[j].cv = inner_box.hi[i]; bnds[j].sd = -1; j++; } } } //---------------------------------------------------------------------- // annBnds2Box - convert list of bounds to bounding box // Given an enclosing box and a list of bounds, this routine // computes the corresponding inner box. It is assumed that // the box points have been allocated already. 
//---------------------------------------------------------------------- void annBnds2Box( const ANNorthRect &bnd_box, // enclosing box int dim, // dimension of space int n_bnds, // number of bounds ANNorthHSArray bnds, // bounds array ANNorthRect &inner_box) // inner box (returned) { annAssignRect(dim, inner_box, bnd_box); // copy bounding box to inner for (int i = 0; i < n_bnds; i++) { bnds[i].project(inner_box.lo); // project each endpoint bnds[i].project(inner_box.hi); } } ================================================ FILE: src/ANN/kd_util.h ================================================ //---------------------------------------------------------------------- // File: kd_util.h // Programmer: Sunil Arya and David Mount // Description: Common utilities for kd- trees // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. 
//---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release //---------------------------------------------------------------------- #ifndef ANN_kd_util_H #define ANN_kd_util_H #include "kd_tree.h" // kd-tree declarations //---------------------------------------------------------------------- // externally accessible functions //---------------------------------------------------------------------- double annAspectRatio( // compute aspect ratio of box int dim, // dimension const ANNorthRect &bnd_box); // bounding cube void annEnclRect( // compute smallest enclosing rectangle ANNpointArray pa, // point array ANNidxArray pidx, // point indices int n, // number of points int dim, // dimension ANNorthRect &bnds); // bounding cube (returned) void annEnclCube( // compute smallest enclosing cube ANNpointArray pa, // point array ANNidxArray pidx, // point indices int n, // number of points int dim, // dimension ANNorthRect &bnds); // bounding cube (returned) ANNdist annBoxDistance( // compute distance from point to box const ANNpoint q, // the point const ANNpoint lo, // low point of box const ANNpoint hi, // high point of box int dim); // dimension of space ANNcoord annSpread( // compute point spread along dimension ANNpointArray pa, // point array ANNidxArray pidx, // point indices int n, // number of points int d); // dimension to check void annMinMax( // compute min and max coordinates along dim ANNpointArray pa, // point array ANNidxArray pidx, // point indices int n, // number of points int d, // dimension to check ANNcoord& min, // minimum value (returned) ANNcoord& max); // maximum value (returned) int annMaxSpread( // compute dimension of max spread ANNpointArray pa, // point array ANNidxArray pidx, // point indices int n, // number of points int dim); // dimension of space void annMedianSplit( // split points along median value ANNpointArray pa, // points to split ANNidxArray pidx, // point 
indices int n, // number of points int d, // dimension along which to split ANNcoord &cv, // cutting value int n_lo); // split into n_lo and n-n_lo void annPlaneSplit( // split points by a plane ANNpointArray pa, // points to split ANNidxArray pidx, // point indices int n, // number of points int d, // dimension along which to split ANNcoord cv, // cutting value int &br1, // first break (values < cv) int &br2); // second break (values == cv) void annBoxSplit( // split points by a box ANNpointArray pa, // points to split ANNidxArray pidx, // point indices int n, // number of points int dim, // dimension of space ANNorthRect &box, // the box int &n_in); // number of points inside (returned) int annSplitBalance( // determine balance factor of a split ANNpointArray pa, // points to split ANNidxArray pidx, // point indices int n, // number of points int d, // dimension along which to split ANNcoord cv); // cutting value void annBox2Bnds( // convert inner box to bounds const ANNorthRect &inner_box, // inner box const ANNorthRect &bnd_box, // enclosing box int dim, // dimension of space int &n_bnds, // number of bounds (returned) ANNorthHSArray &bnds); // bounds array (returned) void annBnds2Box( // convert bounds to inner box const ANNorthRect &bnd_box, // enclosing box int dim, // dimension of space int n_bnds, // number of bounds ANNorthHSArray bnds, // bounds array ANNorthRect &inner_box); // inner box (returned) #endif ================================================ FILE: src/ANN/perf.cpp ================================================ //---------------------------------------------------------------------- // File: perf.cpp // Programmer: Sunil Arya and David Mount // Description: Methods for performance stats // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. 
// // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release // Revision 1.0 04/01/05 // Changed names to avoid namespace conflicts. // Added flush after printing performance stats to fix bug // in Microsoft Windows version. //---------------------------------------------------------------------- #include "ANN.h" // basic ANN includes #include "ANNperf.h" // performance includes using namespace std; // make std:: available //---------------------------------------------------------------------- // Performance statistics // The following data and routines are used for computing // performance statistics for nearest neighbor searching. 
// Because these routines can slow the code down, they can be
// activated and deactivated by defining the PERF variable,
// by compiling with the option: -DPERF
//----------------------------------------------------------------------

//----------------------------------------------------------------------
//	Global counters for performance measurement
//----------------------------------------------------------------------

int ann_Ndata_pts  = 0;		// number of data points
int ann_Nvisit_lfs = 0;		// number of leaf nodes visited
int ann_Nvisit_spl = 0;		// number of splitting nodes visited
int ann_Nvisit_shr = 0;		// number of shrinking nodes visited
int ann_Nvisit_pts = 0;		// visited points for one query
int ann_Ncoord_hts = 0;		// coordinate hits for one query
int ann_Nfloat_ops = 0;		// floating ops for one query

ANNsampStat ann_visit_lfs;	// stats on leaf nodes visits
ANNsampStat ann_visit_spl;	// stats on splitting nodes visits
ANNsampStat ann_visit_shr;	// stats on shrinking nodes visits
ANNsampStat ann_visit_nds;	// stats on total nodes visits
ANNsampStat ann_visit_pts;	// stats on points visited
ANNsampStat ann_coord_hts;	// stats on coordinate hits
ANNsampStat ann_float_ops;	// stats on floating ops
//
ANNsampStat ann_average_err;	// average error
ANNsampStat ann_rank_err;	// rank error

//----------------------------------------------------------------------
//	Routines for statistics.
//---------------------------------------------------------------------- DLL_API void annResetStats(int data_size) // reset stats for a set of queries { ann_Ndata_pts = data_size; ann_visit_lfs.reset(); ann_visit_spl.reset(); ann_visit_shr.reset(); ann_visit_nds.reset(); ann_visit_pts.reset(); ann_coord_hts.reset(); ann_float_ops.reset(); ann_average_err.reset(); ann_rank_err.reset(); } DLL_API void annResetCounts() // reset counts for one query { ann_Nvisit_lfs = 0; ann_Nvisit_spl = 0; ann_Nvisit_shr = 0; ann_Nvisit_pts = 0; ann_Ncoord_hts = 0; ann_Nfloat_ops = 0; } DLL_API void annUpdateStats() // update stats with current counts { ann_visit_lfs += ann_Nvisit_lfs; ann_visit_nds += ann_Nvisit_spl + ann_Nvisit_lfs; ann_visit_spl += ann_Nvisit_spl; ann_visit_shr += ann_Nvisit_shr; ann_visit_pts += ann_Nvisit_pts; ann_coord_hts += ann_Ncoord_hts; ann_float_ops += ann_Nfloat_ops; } // print a single statistic void print_one_stat(const char *title, ANNsampStat s, double div) { //R does not allow: cout << title << "= [ "; //R does not allow: cout.width(9); cout << s.mean()/div << " : "; //R does not allow: cout.width(9); cout << s.stdDev()/div << " ]<"; //R does not allow: cout.width(9); cout << s.min()/div << " , "; //R does not allow: cout.width(9); cout << s.max()/div << " >\n"; } DLL_API void annPrintStats( // print statistics for a run ANNbool validate) // true if average errors desired { //R does not allow: cout.precision(4); // set floating precision //R does not allow: cout << " (Performance stats: " //R does not allow: << " [ mean : stddev ]< min , max >\n"; print_one_stat(" leaf_nodes ", ann_visit_lfs, 1); print_one_stat(" splitting_nodes ", ann_visit_spl, 1); print_one_stat(" shrinking_nodes ", ann_visit_shr, 1); print_one_stat(" total_nodes ", ann_visit_nds, 1); print_one_stat(" points_visited ", ann_visit_pts, 1); print_one_stat(" coord_hits/pt ", ann_coord_hts, ann_Ndata_pts); print_one_stat(" floating_ops_(K) ", ann_float_ops, 1000); if (validate) { 
print_one_stat(" average_error ", ann_average_err, 1); print_one_stat(" rank_error ", ann_rank_err, 1); } //R does not allow: cout.precision(0); // restore the default //R does not allow: cout << " )\n"; //R does not allow: cout.flush(); } ================================================ FILE: src/ANN/pr_queue.h ================================================ //---------------------------------------------------------------------- // File: pr_queue.h // Programmer: Sunil Arya and David Mount // Description: Include file for priority queue and related // structures. // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). See the // file ../ReadMe.txt for further information. // // The University of Maryland (U.M.) and the authors make no // representations about the suitability or fitness of this software for // any purpose. It is provided "as is" without express or implied // warranty. //---------------------------------------------------------------------- // History: // Revision 0.1 03/04/98 // Initial release //---------------------------------------------------------------------- #ifndef PR_QUEUE_H #define PR_QUEUE_H #include "ANNx.h" // all ANN includes #include "ANNperf.h" // performance evaluation //---------------------------------------------------------------------- // Basic types. 
//---------------------------------------------------------------------- typedef void *PQinfo; // info field is generic pointer typedef ANNdist PQkey; // key field is distance //---------------------------------------------------------------------- // Priority queue // A priority queue is a list of items, along with associated // priorities. The basic operations are insert and extract_minimum. // // The priority queue is maintained using a standard binary heap. // (Implementation note: Indexing is performed from [1..max] rather // than the C standard of [0..max-1]. This simplifies parent/child // computations.) User information consists of a void pointer, // and the user is responsible for casting this quantity into whatever // useful form is desired. // // Because the priority queue is so central to the efficiency of // query processing, all the code is inline. //---------------------------------------------------------------------- class ANNpr_queue { struct pq_node { // node in priority queue PQkey key; // key value PQinfo info; // info field }; int n; // number of items in queue int max_size; // maximum queue size pq_node *pq; // the priority queue (array of nodes) public: ANNpr_queue(int max) // constructor (given max size) { n = 0; // initially empty max_size = max; // maximum number of items pq = new pq_node[max+1]; // queue is array [1..max] of nodes } ~ANNpr_queue() // destructor { delete [] pq; } ANNbool empty() // is queue empty? { if (n==0) return ANNtrue; else return ANNfalse; } ANNbool non_empty() // is queue nonempty? 
{ if (n==0) return ANNfalse; else return ANNtrue; } void reset() // make existing queue empty { n = 0; } inline void insert( // insert item (inlined for speed) PQkey kv, // key value PQinfo inf) // item info { if (++n > max_size) annError("Priority queue overflow.", ANNabort); int r = n; while (r > 1) { // sift up new item int p = r/2; ANN_FLOP(1) // increment floating ops if (pq[p].key <= kv) // in proper order break; pq[r] = pq[p]; // else swap with parent r = p; } pq[r].key = kv; // insert new item at final location pq[r].info = inf; } inline void extr_min( // extract minimum (inlined for speed) PQkey &kv, // key (returned) PQinfo &inf) // item info (returned) { kv = pq[1].key; // key of min item inf = pq[1].info; // information of min item PQkey kn = pq[n--].key;// last item in queue int p = 1; // p points to item out of position int r = p<<1; // left child of p while (r <= n) { // while r is still within the heap ANN_FLOP(2) // increment floating ops // set r to smaller child of p if (r < n && pq[r].key > pq[r+1].key) r++; if (kn <= pq[r].key) // in proper order break; pq[p] = pq[r]; // else swap with child p = r; // advance pointers r = p<<1; } pq[p] = pq[n+1]; // insert last item in proper place } }; #endif ================================================ FILE: src/ANN/pr_queue_k.h ================================================ //---------------------------------------------------------------------- // File: pr_queue_k.h // Programmer: Sunil Arya and David Mount // Description: Include file for priority queue with k items. // Last modified: 01/04/05 (Version 1.0) //---------------------------------------------------------------------- // Copyright (c) 1997-2005 University of Maryland and Sunil Arya and // David Mount. All Rights Reserved. // // This software and related documentation is part of the Approximate // Nearest Neighbor Library (ANN). This software is provided under // the provisions of the Lesser GNU Public License (LGPL). 
See the
// file ../ReadMe.txt for further information.
//
// The University of Maryland (U.M.) and the authors make no
// representations about the suitability or fitness of this software for
// any purpose. It is provided "as is" without express or implied
// warranty.
//----------------------------------------------------------------------
// History:
//	Revision 0.1  03/04/98
//		Initial release
//----------------------------------------------------------------------

#ifndef PR_QUEUE_K_H
#define PR_QUEUE_K_H

#include "ANNx.h"			// all ANN includes
#include "ANNperf.h"			// performance evaluation

//----------------------------------------------------------------------
//	Basic types
//----------------------------------------------------------------------
typedef ANNdist	PQKkey;			// key field is distance
typedef int	PQKinfo;		// info field is int

//----------------------------------------------------------------------
//	Constants
//		The NULL key value is used to initialize the priority queue, and
//		so it should be larger than any valid distance, so that it will
//		be replaced as legal distance values are inserted. The NULL
//		info value must be a nonvalid array index, we use ANN_NULL_IDX,
//		which is guaranteed to be negative.
//----------------------------------------------------------------------

const PQKkey  PQ_NULL_KEY  = ANN_DIST_INF;	// nonexistent key value
const PQKinfo PQ_NULL_INFO = ANN_NULL_IDX;	// nonexistent info value

//----------------------------------------------------------------------
//	ANNmin_k
//		An ANNmin_k structure is one which maintains the smallest
//		k values (of type PQKkey) and associated information (of type
//		PQKinfo). The special info and key values PQ_NULL_INFO and
//		PQ_NULL_KEY mean that this entry is empty.
//
//		It is currently implemented using an array with k items.
//		Items are stored in increasing sorted order, and insertions
//		are made through standard insertion sort.
(This is quite // inefficient, but current applications call for small values // of k and relatively few insertions.) // // Note that the list contains k+1 entries, but the last entry // is used as a simple placeholder and is otherwise ignored. //---------------------------------------------------------------------- class ANNmin_k { struct mk_node { // node in min_k structure PQKkey key; // key value PQKinfo info; // info field (user defined) }; int k; // max number of keys to store int n; // number of keys currently active mk_node *mk; // the list itself public: ANNmin_k(int max) // constructor (given max size) { n = 0; // initially no items k = max; // maximum number of items mk = new mk_node[max+1]; // sorted array of keys } ~ANNmin_k() // destructor { delete [] mk; } PQKkey ANNmin_key() // return minimum key { return (n > 0 ? mk[0].key : PQ_NULL_KEY); } PQKkey max_key() // return maximum key { return (n == k ? mk[k-1].key : PQ_NULL_KEY); } PQKkey ith_smallest_key(int i) // ith smallest key (i in [0..n-1]) { return (i < n ? mk[i].key : PQ_NULL_KEY); } PQKinfo ith_smallest_info(int i) // info for ith smallest (i in [0..n-1]) { return (i < n ? mk[i].info : PQ_NULL_INFO); } inline void insert( // insert item (inlined for speed) PQKkey kv, // key value PQKinfo inf) // item info { int i; // slide larger values up for (i = n; i > 0; i--) { if (mk[i-1].key > kv) mk[i] = mk[i-1]; else break; } mk[i].key = kv; // store element here mk[i].info = inf; if (n < k) n++; // increment number of items ANN_FLOP(k-i+1) // increment floating ops } }; #endif ================================================ FILE: src/JP.cpp ================================================ //---------------------------------------------------------------------- // Jarvis-Patrick Clustering //---------------------------------------------------------------------- // Copyright (c) 2017 Michael Hahsler. All Rights Reserved. 
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
IntegerVector JP_int(IntegerMatrix nn, unsigned int kt) {
  R_xlen_t n = nn.nrow();

  // create label vector
  std::vector<int> label(n);
  //iota is C++11 only
  //std::iota(std::begin(label), std::end(label), 1); // Fill with 1, 2, ..., n.
  int value = 1;
  std::vector<int>::iterator first = label.begin(), last = label.end();
  while (first != last) *first++ = value++;

  // create sorted sets so we can use set operations
  std::vector< std::set<int> > nn_set(nn.nrow());
  IntegerVector r;
  std::vector<int> s;
  for (R_xlen_t i = 0; i < n; ++i) {
    r = nn(i, _);
    s = as< std::vector<int> >(r);
    nn_set[i].insert(s.begin(), s.end());
  }

  std::vector<int> z;
  std::set<int>::iterator it;
  R_xlen_t i, j;
  int newlabel, oldlabel;
  for (i = 0; i < n; ++i) {
    // check all neighbors of i
    for (it = nn_set[i].begin(); it != nn_set[i].end(); ++it) {
      j = *it - 1; // index in nn starts with 1

      // edge was already checked
      if (j < i) continue;

      // JP links require that i and j are in each other's kNN lists
      if (nn_set[j].find(i+1) == nn_set[j].end()) continue;

      // count the neighbors that i and j share
      z.clear();
      std::set_intersection(nn_set[i].begin(), nn_set[i].end(),
        nn_set[j].begin(), nn_set[j].end(),
        std::back_inserter(z));

      if (z.size() >= kt) {
        // update labels
        if (label[i] > label[j]) {
          newlabel = label[j]; oldlabel = label[i];
        } else {
          newlabel = label[i]; oldlabel = label[j];
        }

        for (int k = 0; k < n; ++k) {
          if (label[k] == oldlabel) label[k] = newlabel;
        }
      }
    }
  }

  return wrap(label);
}

// jp == true: use the definition by Jarvis-Patrick: A link is created between a pair of
// points, p and q, if and only if p and q have each other in their k-nearest neighbor lists.
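The link test used by `JP_int` can be reproduced outside of Rcpp. The sketch below (plain C++; the helper names `shared_neighbors` and `jp_linked` are illustrative, not part of the package) counts the overlap of two sorted neighbor sets with `std::set_intersection`, which is the same operation the function uses to decide whether two points are linked and their clusters merged:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Count how many neighbors two points share. Keeping each neighbor list
// as a sorted std::set<int> lets std::set_intersection run in a single
// linear pass over both sets.
static std::size_t shared_neighbors(const std::set<int>& a,
                                    const std::set<int>& b) {
  std::vector<int> z;
  std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                        std::back_inserter(z));
  return z.size();
}

// Jarvis-Patrick link test: two points are linked when they share at
// least kt nearest neighbors.
static bool jp_linked(const std::set<int>& a, const std::set<int>& b,
                      std::size_t kt) {
  return shared_neighbors(a, b) >= kt;
}
```

Sorting once and intersecting is what makes the pairwise test cheap; the expensive part of `JP_int` is the linear relabeling scan that merges clusters.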
// jp == false: just count the shared NNs = regular sNN

// [[Rcpp::export]]
IntegerMatrix SNN_sim_int(IntegerMatrix nn, LogicalVector jp) {
  R_xlen_t n = nn.nrow();
  R_xlen_t k = nn.ncol();

  IntegerMatrix snn(n, k);

  // create sorted sets so we can use set operations
  std::vector< std::set<int> > nn_set(n);
  IntegerVector r;
  std::vector<int> s;
  for (R_xlen_t i = 0; i < n; ++i) {
    r = nn(i, _);
    s = as< std::vector<int> >(r);
    nn_set[i].insert(s.begin(), s.end());
  }

  std::vector<int> z;
  int j;
  for (R_xlen_t i = 0; i < n; ++i) {
    // check all neighbors of i
    for (R_xlen_t j_ind = 0; j_ind < k; ++j_ind) {
      j = nn(i, j_ind) - 1;

      bool i_in_j = (nn_set[j].find(i+1) != nn_set[j].end());
      if (is_false(all(jp)) || i_in_j) {
        // calculate link strength as the number of shared points
        z.clear();
        std::set_intersection(nn_set[i].begin(), nn_set[i].end(),
          nn_set[j].begin(), nn_set[j].end(),
          std::back_inserter(z));

        snn(i, j_ind) = z.size();
        // +1 if i is in j
        if (i_in_j) snn(i, j_ind)++;
      } else snn(i, j_ind) = 0;
    }
  }

  return snn;
}

================================================
FILE: src/Makevars
================================================

# CXX_STD = CXX11

SOURCES = \
  ANN/perf.cpp ANN/bd_fix_rad_search.cpp ANN/bd_search.cpp \
  ANN/kd_split.cpp ANN/kd_pr_search.cpp ANN/kd_search.cpp \
  ANN/ANN.cpp ANN/brute.cpp ANN/bd_tree.cpp ANN/kd_fix_rad_search.cpp \
  ANN/bd_pr_search.cpp ANN/kd_util.cpp ANN/kd_tree.cpp ANN/kd_dump.cpp \
  utilities.cpp cleanup.cpp \
  kNN.cpp connectedComps.cpp \
  frNN.cpp regionQuery.cpp density.cpp \
  dbscan.cpp \
  optics.cpp \
  JP.cpp \
  hdbscan.cpp \
  dendrogram.cpp UnionFind.cpp \
  mrd.cpp \
  mst.cpp \
  lof.cpp \
  dbcv.cpp \
  RcppExports.cpp

OBJECTS = $(SOURCES:.cpp=.o)

================================================
FILE: src/RcppExports.cpp
================================================

// Generated by using Rcpp::compileAttributes() -> do not edit by hand
// Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#include <Rcpp.h>

using namespace Rcpp;

#ifdef RCPP_USE_GLOBAL_ROSTREAM
Rcpp::Rostream<true>& Rcpp::Rcout = Rcpp::Rcpp_cout_get();
Rcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();
#endif

// JP_int IntegerVector JP_int(IntegerMatrix nn, unsigned int kt); RcppExport SEXP _dbscan_JP_int(SEXP nnSEXP, SEXP ktSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< IntegerMatrix >::type nn(nnSEXP); Rcpp::traits::input_parameter< unsigned int >::type kt(ktSEXP); rcpp_result_gen = Rcpp::wrap(JP_int(nn, kt)); return rcpp_result_gen; END_RCPP } // SNN_sim_int IntegerMatrix SNN_sim_int(IntegerMatrix nn, LogicalVector jp); RcppExport SEXP _dbscan_SNN_sim_int(SEXP nnSEXP, SEXP jpSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< IntegerMatrix >::type nn(nnSEXP); Rcpp::traits::input_parameter< LogicalVector >::type jp(jpSEXP); rcpp_result_gen = Rcpp::wrap(SNN_sim_int(nn, jp)); return rcpp_result_gen; END_RCPP } // ANN_cleanup void ANN_cleanup(); RcppExport SEXP _dbscan_ANN_cleanup() { BEGIN_RCPP Rcpp::RNGScope rcpp_rngScope_gen; ANN_cleanup(); return R_NilValue; END_RCPP } // comps_kNN IntegerVector comps_kNN(IntegerMatrix nn, bool mutual); RcppExport SEXP _dbscan_comps_kNN(SEXP nnSEXP, SEXP mutualSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< IntegerMatrix >::type nn(nnSEXP); Rcpp::traits::input_parameter< bool >::type mutual(mutualSEXP); rcpp_result_gen = Rcpp::wrap(comps_kNN(nn, mutual)); return rcpp_result_gen; END_RCPP } // comps_frNN IntegerVector comps_frNN(List nn, bool mutual); RcppExport SEXP _dbscan_comps_frNN(SEXP nnSEXP, SEXP mutualSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type nn(nnSEXP); Rcpp::traits::input_parameter< bool >::type mutual(mutualSEXP); rcpp_result_gen = Rcpp::wrap(comps_frNN(nn, mutual)); return rcpp_result_gen; END_RCPP } // intToStr StringVector intToStr(IntegerVector iv); RcppExport
SEXP _dbscan_intToStr(SEXP ivSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< IntegerVector >::type iv(ivSEXP); rcpp_result_gen = Rcpp::wrap(intToStr(iv)); return rcpp_result_gen; END_RCPP } // dist_subset NumericVector dist_subset(const NumericVector& dist, IntegerVector idx); RcppExport SEXP _dbscan_dist_subset(SEXP distSEXP, SEXP idxSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const NumericVector& >::type dist(distSEXP); Rcpp::traits::input_parameter< IntegerVector >::type idx(idxSEXP); rcpp_result_gen = Rcpp::wrap(dist_subset(dist, idx)); return rcpp_result_gen; END_RCPP } // XOR Rcpp::LogicalVector XOR(Rcpp::LogicalVector lhs, Rcpp::LogicalVector rhs); RcppExport SEXP _dbscan_XOR(SEXP lhsSEXP, SEXP rhsSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< Rcpp::LogicalVector >::type lhs(lhsSEXP); Rcpp::traits::input_parameter< Rcpp::LogicalVector >::type rhs(rhsSEXP); rcpp_result_gen = Rcpp::wrap(XOR(lhs, rhs)); return rcpp_result_gen; END_RCPP } // dspc NumericMatrix dspc(const List& cl_idx, const List& internal_nodes, const IntegerVector& all_cl_ids, const NumericVector& mrd_dist); RcppExport SEXP _dbscan_dspc(SEXP cl_idxSEXP, SEXP internal_nodesSEXP, SEXP all_cl_idsSEXP, SEXP mrd_distSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const List& >::type cl_idx(cl_idxSEXP); Rcpp::traits::input_parameter< const List& >::type internal_nodes(internal_nodesSEXP); Rcpp::traits::input_parameter< const IntegerVector& >::type all_cl_ids(all_cl_idsSEXP); Rcpp::traits::input_parameter< const NumericVector& >::type mrd_dist(mrd_distSEXP); rcpp_result_gen = Rcpp::wrap(dspc(cl_idx, internal_nodes, all_cl_ids, mrd_dist)); return rcpp_result_gen; END_RCPP } // dbscan_int IntegerVector dbscan_int(NumericMatrix data, 
double eps, int minPts, NumericVector weights, int borderPoints, int type, int bucketSize, int splitRule, double approx, List frNN); RcppExport SEXP _dbscan_dbscan_int(SEXP dataSEXP, SEXP epsSEXP, SEXP minPtsSEXP, SEXP weightsSEXP, SEXP borderPointsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP, SEXP frNNSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP); Rcpp::traits::input_parameter< double >::type eps(epsSEXP); Rcpp::traits::input_parameter< int >::type minPts(minPtsSEXP); Rcpp::traits::input_parameter< NumericVector >::type weights(weightsSEXP); Rcpp::traits::input_parameter< int >::type borderPoints(borderPointsSEXP); Rcpp::traits::input_parameter< int >::type type(typeSEXP); Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP); Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP); Rcpp::traits::input_parameter< double >::type approx(approxSEXP); Rcpp::traits::input_parameter< List >::type frNN(frNNSEXP); rcpp_result_gen = Rcpp::wrap(dbscan_int(data, eps, minPts, weights, borderPoints, type, bucketSize, splitRule, approx, frNN)); return rcpp_result_gen; END_RCPP } // reach_to_dendrogram List reach_to_dendrogram(const Rcpp::List reachability, const NumericVector pl_order); RcppExport SEXP _dbscan_reach_to_dendrogram(SEXP reachabilitySEXP, SEXP pl_orderSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const Rcpp::List >::type reachability(reachabilitySEXP); Rcpp::traits::input_parameter< const NumericVector >::type pl_order(pl_orderSEXP); rcpp_result_gen = Rcpp::wrap(reach_to_dendrogram(reachability, pl_order)); return rcpp_result_gen; END_RCPP } // dendrogram_to_reach List dendrogram_to_reach(const Rcpp::List x); RcppExport SEXP _dbscan_dendrogram_to_reach(SEXP xSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope 
rcpp_rngScope_gen; Rcpp::traits::input_parameter< const Rcpp::List >::type x(xSEXP); rcpp_result_gen = Rcpp::wrap(dendrogram_to_reach(x)); return rcpp_result_gen; END_RCPP } // mst_to_dendrogram List mst_to_dendrogram(const NumericMatrix mst); RcppExport SEXP _dbscan_mst_to_dendrogram(SEXP mstSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const NumericMatrix >::type mst(mstSEXP); rcpp_result_gen = Rcpp::wrap(mst_to_dendrogram(mst)); return rcpp_result_gen; END_RCPP } // dbscan_density_int IntegerVector dbscan_density_int(NumericMatrix data, double eps, int type, int bucketSize, int splitRule, double approx); RcppExport SEXP _dbscan_dbscan_density_int(SEXP dataSEXP, SEXP epsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP); Rcpp::traits::input_parameter< double >::type eps(epsSEXP); Rcpp::traits::input_parameter< int >::type type(typeSEXP); Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP); Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP); Rcpp::traits::input_parameter< double >::type approx(approxSEXP); rcpp_result_gen = Rcpp::wrap(dbscan_density_int(data, eps, type, bucketSize, splitRule, approx)); return rcpp_result_gen; END_RCPP } // frNN_int List frNN_int(NumericMatrix data, double eps, int type, int bucketSize, int splitRule, double approx); RcppExport SEXP _dbscan_frNN_int(SEXP dataSEXP, SEXP epsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP); Rcpp::traits::input_parameter< double >::type eps(epsSEXP); Rcpp::traits::input_parameter< int >::type type(typeSEXP); Rcpp::traits::input_parameter< int >::type 
bucketSize(bucketSizeSEXP); Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP); Rcpp::traits::input_parameter< double >::type approx(approxSEXP); rcpp_result_gen = Rcpp::wrap(frNN_int(data, eps, type, bucketSize, splitRule, approx)); return rcpp_result_gen; END_RCPP } // frNN_query_int List frNN_query_int(NumericMatrix data, NumericMatrix query, double eps, int type, int bucketSize, int splitRule, double approx); RcppExport SEXP _dbscan_frNN_query_int(SEXP dataSEXP, SEXP querySEXP, SEXP epsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP); Rcpp::traits::input_parameter< NumericMatrix >::type query(querySEXP); Rcpp::traits::input_parameter< double >::type eps(epsSEXP); Rcpp::traits::input_parameter< int >::type type(typeSEXP); Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP); Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP); Rcpp::traits::input_parameter< double >::type approx(approxSEXP); rcpp_result_gen = Rcpp::wrap(frNN_query_int(data, query, eps, type, bucketSize, splitRule, approx)); return rcpp_result_gen; END_RCPP } // distToAdjacency List distToAdjacency(IntegerVector constraints, const int N); RcppExport SEXP _dbscan_distToAdjacency(SEXP constraintsSEXP, SEXP NSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< IntegerVector >::type constraints(constraintsSEXP); Rcpp::traits::input_parameter< const int >::type N(NSEXP); rcpp_result_gen = Rcpp::wrap(distToAdjacency(constraints, N)); return rcpp_result_gen; END_RCPP } // buildDendrogram List buildDendrogram(List hcl); RcppExport SEXP _dbscan_buildDendrogram(SEXP hclSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type hcl(hclSEXP); rcpp_result_gen 
= Rcpp::wrap(buildDendrogram(hcl)); return rcpp_result_gen; END_RCPP } // all_children IntegerVector all_children(List hier, int key, bool leaves_only); RcppExport SEXP _dbscan_all_children(SEXP hierSEXP, SEXP keySEXP, SEXP leaves_onlySEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type hier(hierSEXP); Rcpp::traits::input_parameter< int >::type key(keySEXP); Rcpp::traits::input_parameter< bool >::type leaves_only(leaves_onlySEXP); rcpp_result_gen = Rcpp::wrap(all_children(hier, key, leaves_only)); return rcpp_result_gen; END_RCPP } // node_xy NumericMatrix node_xy(List cl_tree, List cl_hierarchy, int cid); RcppExport SEXP _dbscan_node_xy(SEXP cl_treeSEXP, SEXP cl_hierarchySEXP, SEXP cidSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type cl_tree(cl_treeSEXP); Rcpp::traits::input_parameter< List >::type cl_hierarchy(cl_hierarchySEXP); Rcpp::traits::input_parameter< int >::type cid(cidSEXP); rcpp_result_gen = Rcpp::wrap(node_xy(cl_tree, cl_hierarchy, cid)); return rcpp_result_gen; END_RCPP } // simplifiedTree List simplifiedTree(List cl_tree); RcppExport SEXP _dbscan_simplifiedTree(SEXP cl_treeSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type cl_tree(cl_treeSEXP); rcpp_result_gen = Rcpp::wrap(simplifiedTree(cl_tree)); return rcpp_result_gen; END_RCPP } // computeStability List computeStability(const List hcl, const int minPts, bool compute_glosh); RcppExport SEXP _dbscan_computeStability(SEXP hclSEXP, SEXP minPtsSEXP, SEXP compute_gloshSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const List >::type hcl(hclSEXP); Rcpp::traits::input_parameter< const int >::type minPts(minPtsSEXP); Rcpp::traits::input_parameter< bool >::type compute_glosh(compute_gloshSEXP); rcpp_result_gen = 
Rcpp::wrap(computeStability(hcl, minPts, compute_glosh)); return rcpp_result_gen; END_RCPP } // validateConstraintList List validateConstraintList(List& constraints, int n); RcppExport SEXP _dbscan_validateConstraintList(SEXP constraintsSEXP, SEXP nSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List& >::type constraints(constraintsSEXP); Rcpp::traits::input_parameter< int >::type n(nSEXP); rcpp_result_gen = Rcpp::wrap(validateConstraintList(constraints, n)); return rcpp_result_gen; END_RCPP } // computeVirtualNode double computeVirtualNode(IntegerVector noise, List constraints); RcppExport SEXP _dbscan_computeVirtualNode(SEXP noiseSEXP, SEXP constraintsSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< IntegerVector >::type noise(noiseSEXP); Rcpp::traits::input_parameter< List >::type constraints(constraintsSEXP); rcpp_result_gen = Rcpp::wrap(computeVirtualNode(noise, constraints)); return rcpp_result_gen; END_RCPP } // fosc NumericVector fosc(List cl_tree, std::string cid, std::list<int>& sc, List cl_hierarchy, bool prune_unstable_leaves, double cluster_selection_epsilon, const double alpha, bool useVirtual, const int n_constraints, List constraints); RcppExport SEXP _dbscan_fosc(SEXP cl_treeSEXP, SEXP cidSEXP, SEXP scSEXP, SEXP cl_hierarchySEXP, SEXP prune_unstable_leavesSEXP, SEXP cluster_selection_epsilonSEXP, SEXP alphaSEXP, SEXP useVirtualSEXP, SEXP n_constraintsSEXP, SEXP constraintsSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type cl_tree(cl_treeSEXP); Rcpp::traits::input_parameter< std::string >::type cid(cidSEXP); Rcpp::traits::input_parameter< std::list<int>& >::type sc(scSEXP); Rcpp::traits::input_parameter< List >::type cl_hierarchy(cl_hierarchySEXP); Rcpp::traits::input_parameter< bool >::type prune_unstable_leaves(prune_unstable_leavesSEXP);
Rcpp::traits::input_parameter< double >::type cluster_selection_epsilon(cluster_selection_epsilonSEXP); Rcpp::traits::input_parameter< const double >::type alpha(alphaSEXP); Rcpp::traits::input_parameter< bool >::type useVirtual(useVirtualSEXP); Rcpp::traits::input_parameter< const int >::type n_constraints(n_constraintsSEXP); Rcpp::traits::input_parameter< List >::type constraints(constraintsSEXP); rcpp_result_gen = Rcpp::wrap(fosc(cl_tree, cid, sc, cl_hierarchy, prune_unstable_leaves, cluster_selection_epsilon, alpha, useVirtual, n_constraints, constraints)); return rcpp_result_gen; END_RCPP } // extractUnsupervised List extractUnsupervised(List cl_tree, bool prune_unstable, double cluster_selection_epsilon); RcppExport SEXP _dbscan_extractUnsupervised(SEXP cl_treeSEXP, SEXP prune_unstableSEXP, SEXP cluster_selection_epsilonSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type cl_tree(cl_treeSEXP); Rcpp::traits::input_parameter< bool >::type prune_unstable(prune_unstableSEXP); Rcpp::traits::input_parameter< double >::type cluster_selection_epsilon(cluster_selection_epsilonSEXP); rcpp_result_gen = Rcpp::wrap(extractUnsupervised(cl_tree, prune_unstable, cluster_selection_epsilon)); return rcpp_result_gen; END_RCPP } // extractSemiSupervised List extractSemiSupervised(List cl_tree, List constraints, float alpha, bool prune_unstable_leaves, double cluster_selection_epsilon); RcppExport SEXP _dbscan_extractSemiSupervised(SEXP cl_treeSEXP, SEXP constraintsSEXP, SEXP alphaSEXP, SEXP prune_unstable_leavesSEXP, SEXP cluster_selection_epsilonSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< List >::type cl_tree(cl_treeSEXP); Rcpp::traits::input_parameter< List >::type constraints(constraintsSEXP); Rcpp::traits::input_parameter< float >::type alpha(alphaSEXP); Rcpp::traits::input_parameter< bool >::type 
prune_unstable_leaves(prune_unstable_leavesSEXP); Rcpp::traits::input_parameter< double >::type cluster_selection_epsilon(cluster_selection_epsilonSEXP); rcpp_result_gen = Rcpp::wrap(extractSemiSupervised(cl_tree, constraints, alpha, prune_unstable_leaves, cluster_selection_epsilon)); return rcpp_result_gen; END_RCPP } // kNN_query_int List kNN_query_int(NumericMatrix data, NumericMatrix query, int k, int type, int bucketSize, int splitRule, double approx); RcppExport SEXP _dbscan_kNN_query_int(SEXP dataSEXP, SEXP querySEXP, SEXP kSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP); Rcpp::traits::input_parameter< NumericMatrix >::type query(querySEXP); Rcpp::traits::input_parameter< int >::type k(kSEXP); Rcpp::traits::input_parameter< int >::type type(typeSEXP); Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP); Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP); Rcpp::traits::input_parameter< double >::type approx(approxSEXP); rcpp_result_gen = Rcpp::wrap(kNN_query_int(data, query, k, type, bucketSize, splitRule, approx)); return rcpp_result_gen; END_RCPP } // kNN_int List kNN_int(NumericMatrix data, int k, int type, int bucketSize, int splitRule, double approx); RcppExport SEXP _dbscan_kNN_int(SEXP dataSEXP, SEXP kSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP); Rcpp::traits::input_parameter< int >::type k(kSEXP); Rcpp::traits::input_parameter< int >::type type(typeSEXP); Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP); Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP); Rcpp::traits::input_parameter< double >::type approx(approxSEXP); 
rcpp_result_gen = Rcpp::wrap(kNN_int(data, k, type, bucketSize, splitRule, approx)); return rcpp_result_gen; END_RCPP } // lof_kNN List lof_kNN(NumericMatrix data, int minPts, int type, int bucketSize, int splitRule, double approx); RcppExport SEXP _dbscan_lof_kNN(SEXP dataSEXP, SEXP minPtsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP); Rcpp::traits::input_parameter< int >::type minPts(minPtsSEXP); Rcpp::traits::input_parameter< int >::type type(typeSEXP); Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP); Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP); Rcpp::traits::input_parameter< double >::type approx(approxSEXP); rcpp_result_gen = Rcpp::wrap(lof_kNN(data, minPts, type, bucketSize, splitRule, approx)); return rcpp_result_gen; END_RCPP } // mrd NumericVector mrd(NumericVector dm, NumericVector cd); RcppExport SEXP _dbscan_mrd(SEXP dmSEXP, SEXP cdSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericVector >::type dm(dmSEXP); Rcpp::traits::input_parameter< NumericVector >::type cd(cdSEXP); rcpp_result_gen = Rcpp::wrap(mrd(dm, cd)); return rcpp_result_gen; END_RCPP } // mst Rcpp::NumericMatrix mst(const NumericVector x_dist, const R_xlen_t n); RcppExport SEXP _dbscan_mst(SEXP x_distSEXP, SEXP nSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< const NumericVector >::type x_dist(x_distSEXP); Rcpp::traits::input_parameter< const R_xlen_t >::type n(nSEXP); rcpp_result_gen = Rcpp::wrap(mst(x_dist, n)); return rcpp_result_gen; END_RCPP } // hclustMergeOrder List hclustMergeOrder(NumericMatrix mst, IntegerVector o); RcppExport SEXP _dbscan_hclustMergeOrder(SEXP mstSEXP, SEXP oSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; 
Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericMatrix >::type mst(mstSEXP); Rcpp::traits::input_parameter< IntegerVector >::type o(oSEXP); rcpp_result_gen = Rcpp::wrap(hclustMergeOrder(mst, o)); return rcpp_result_gen; END_RCPP } // optics_int List optics_int(NumericMatrix data, double eps, int minPts, int type, int bucketSize, int splitRule, double approx, List frNN); RcppExport SEXP _dbscan_optics_int(SEXP dataSEXP, SEXP epsSEXP, SEXP minPtsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP, SEXP frNNSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP); Rcpp::traits::input_parameter< double >::type eps(epsSEXP); Rcpp::traits::input_parameter< int >::type minPts(minPtsSEXP); Rcpp::traits::input_parameter< int >::type type(typeSEXP); Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP); Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP); Rcpp::traits::input_parameter< double >::type approx(approxSEXP); Rcpp::traits::input_parameter< List >::type frNN(frNNSEXP); rcpp_result_gen = Rcpp::wrap(optics_int(data, eps, minPts, type, bucketSize, splitRule, approx, frNN)); return rcpp_result_gen; END_RCPP } // lowerTri IntegerVector lowerTri(IntegerMatrix m); RcppExport SEXP _dbscan_lowerTri(SEXP mSEXP) { BEGIN_RCPP Rcpp::RObject rcpp_result_gen; Rcpp::RNGScope rcpp_rngScope_gen; Rcpp::traits::input_parameter< IntegerMatrix >::type m(mSEXP); rcpp_result_gen = Rcpp::wrap(lowerTri(m)); return rcpp_result_gen; END_RCPP } static const R_CallMethodDef CallEntries[] = { {"_dbscan_JP_int", (DL_FUNC) &_dbscan_JP_int, 2}, {"_dbscan_SNN_sim_int", (DL_FUNC) &_dbscan_SNN_sim_int, 2}, {"_dbscan_ANN_cleanup", (DL_FUNC) &_dbscan_ANN_cleanup, 0}, {"_dbscan_comps_kNN", (DL_FUNC) &_dbscan_comps_kNN, 2}, {"_dbscan_comps_frNN", (DL_FUNC) &_dbscan_comps_frNN, 2}, {"_dbscan_intToStr", (DL_FUNC) &_dbscan_intToStr, 
1}, {"_dbscan_dist_subset", (DL_FUNC) &_dbscan_dist_subset, 2}, {"_dbscan_XOR", (DL_FUNC) &_dbscan_XOR, 2}, {"_dbscan_dspc", (DL_FUNC) &_dbscan_dspc, 4}, {"_dbscan_dbscan_int", (DL_FUNC) &_dbscan_dbscan_int, 10}, {"_dbscan_reach_to_dendrogram", (DL_FUNC) &_dbscan_reach_to_dendrogram, 2}, {"_dbscan_dendrogram_to_reach", (DL_FUNC) &_dbscan_dendrogram_to_reach, 1}, {"_dbscan_mst_to_dendrogram", (DL_FUNC) &_dbscan_mst_to_dendrogram, 1}, {"_dbscan_dbscan_density_int", (DL_FUNC) &_dbscan_dbscan_density_int, 6}, {"_dbscan_frNN_int", (DL_FUNC) &_dbscan_frNN_int, 6}, {"_dbscan_frNN_query_int", (DL_FUNC) &_dbscan_frNN_query_int, 7}, {"_dbscan_distToAdjacency", (DL_FUNC) &_dbscan_distToAdjacency, 2}, {"_dbscan_buildDendrogram", (DL_FUNC) &_dbscan_buildDendrogram, 1}, {"_dbscan_all_children", (DL_FUNC) &_dbscan_all_children, 3}, {"_dbscan_node_xy", (DL_FUNC) &_dbscan_node_xy, 3}, {"_dbscan_simplifiedTree", (DL_FUNC) &_dbscan_simplifiedTree, 1}, {"_dbscan_computeStability", (DL_FUNC) &_dbscan_computeStability, 3}, {"_dbscan_validateConstraintList", (DL_FUNC) &_dbscan_validateConstraintList, 2}, {"_dbscan_computeVirtualNode", (DL_FUNC) &_dbscan_computeVirtualNode, 2}, {"_dbscan_fosc", (DL_FUNC) &_dbscan_fosc, 10}, {"_dbscan_extractUnsupervised", (DL_FUNC) &_dbscan_extractUnsupervised, 3}, {"_dbscan_extractSemiSupervised", (DL_FUNC) &_dbscan_extractSemiSupervised, 5}, {"_dbscan_kNN_query_int", (DL_FUNC) &_dbscan_kNN_query_int, 7}, {"_dbscan_kNN_int", (DL_FUNC) &_dbscan_kNN_int, 6}, {"_dbscan_lof_kNN", (DL_FUNC) &_dbscan_lof_kNN, 6}, {"_dbscan_mrd", (DL_FUNC) &_dbscan_mrd, 2}, {"_dbscan_mst", (DL_FUNC) &_dbscan_mst, 2}, {"_dbscan_hclustMergeOrder", (DL_FUNC) &_dbscan_hclustMergeOrder, 2}, {"_dbscan_optics_int", (DL_FUNC) &_dbscan_optics_int, 8}, {"_dbscan_lowerTri", (DL_FUNC) &_dbscan_lowerTri, 1}, {NULL, NULL, 0} }; RcppExport void R_init_dbscan(DllInfo *dll) { R_registerRoutines(dll, NULL, CallEntries, NULL, NULL); R_useDynamicSymbols(dll, FALSE); } 
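`RcppExports.cpp` ends by handing R a `R_CallMethodDef` table that maps each exported routine name to a function pointer and its argument count, then disables dynamic symbol lookup. The sketch below (plain C++, no R headers; `CallEntry`, `find_entry`, and the sample routines are illustrative names, not package code) shows the same NULL-terminated lookup-table pattern:

```cpp
#include <cstddef>
#include <cstring>

// Mirrors the shape of R's R_CallMethodDef: a routine name, a function
// pointer, and the number of arguments the routine expects.
typedef double (*fn_ptr)(double);

struct CallEntry {
  const char* name;
  fn_ptr fun;
  int numArgs;
};

static double twice(double x) { return 2.0 * x; }
static double square(double x) { return x * x; }

// A NULL-terminated registration table, like CallEntries[] above.
static const CallEntry entries[] = {
  {"twice", &twice, 1},
  {"square", &square, 1},
  {NULL, NULL, 0}
};

// Linear lookup by routine name; conceptually what resolving a
// registered native routine does.
static fn_ptr find_entry(const char* name) {
  for (const CallEntry* e = entries; e->name != NULL; ++e) {
    if (std::strcmp(e->name, name) == 0) return e->fun;
  }
  return NULL;
}
```

Registering routines this way (and calling `R_useDynamicSymbols(dll, FALSE)`) means R resolves `.Call` targets through the table instead of searching the shared library's symbol table.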
================================================
FILE: src/UnionFind.cpp
================================================

//----------------------------------------------------------------------
// Disjoint-set data structure
// File: union_find.cpp
//----------------------------------------------------------------------
// Copyright (c) 2016 Michael Hahsler, Matt Piekenbrock. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

// Class definition based on the data structure described here:
// https://en.wikipedia.org/wiki/Disjoint-set_data_structure

#include "UnionFind.h"

UnionFind::UnionFind(const int size) : parent(size), rank(size) {
  for (int i = 0; i < size; ++i) {
    parent[i] = i, rank[i] = 0;
  }
}

// Destructor not needed w/o dynamic allocation
UnionFind::~UnionFind() { }

void UnionFind::Union(const int x, const int y) {
  const int xRoot = Find(x);
  const int yRoot = Find(y);
  if (xRoot == yRoot) return;
  else if (rank[xRoot] > rank[yRoot]) parent[yRoot] = xRoot;
  else if (rank[xRoot] < rank[yRoot]) parent[xRoot] = yRoot;
  else if (rank[xRoot] == rank[yRoot]) {
    parent[yRoot] = parent[xRoot];
    rank[xRoot] = rank[xRoot] + 1;
  }
}

const int UnionFind::Find(const int x) {
  if (parent[x] == x) return x;
  else {
    parent[x] = Find(parent[x]); // path compression
    return parent[x];
  }
}

================================================
FILE: src/UnionFind.h
================================================

//----------------------------------------------------------------------
// Disjoint-set data structure
// File: union_find.h
//----------------------------------------------------------------------
// Copyright (c) 2016 Michael Hahsler, Matt Piekenbrock. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

// Class definition based on the data structure described here:
// https://en.wikipedia.org/wiki/Disjoint-set_data_structure

#ifndef UNIONFIND
#define UNIONFIND

#include <Rcpp.h>

using namespace Rcpp;

class UnionFind {
  Rcpp::IntegerVector parent;
  Rcpp::IntegerVector rank;

public:
  UnionFind(const int size);
  ~UnionFind();
  void Union(const int x, const int y);
  const int Find(const int x);
}; // class UnionFind

#endif

================================================
FILE: src/cleanup.cpp
================================================

//----------------------------------------------------------------------
// R interface to dbscan using the ANN library
//----------------------------------------------------------------------
// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#include <Rcpp.h>
#include "ANN/ANN.h"

using namespace Rcpp;

// [[Rcpp::export]]
void ANN_cleanup() {
  annClose();
}

================================================
FILE: src/connectedComps.cpp
================================================

//----------------------------------------------------------------------
// R interface to dbscan using the ANN library
//----------------------------------------------------------------------
// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#include <Rcpp.h>

using namespace Rcpp;

// Find connected components in kNN and frNN objects.
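The connected-components functions below merge components by rewriting every matching label in a linear scan; the `UnionFind` class above does the same bookkeeping with near-constant amortized cost per merge. A standalone sketch of that structure (plain `std::vector` instead of `Rcpp::IntegerVector`; the `DisjointSet` name is illustrative):

```cpp
#include <vector>

// Disjoint-set forest with union by rank and path compression,
// mirroring the package's UnionFind class but without Rcpp types.
class DisjointSet {
  std::vector<int> parent, rank_;
public:
  explicit DisjointSet(int size) : parent(size), rank_(size, 0) {
    for (int i = 0; i < size; ++i) parent[i] = i;
  }
  int find(int x) {
    if (parent[x] != x) parent[x] = find(parent[x]); // path compression
    return parent[x];
  }
  void unite(int x, int y) {
    int xr = find(x), yr = find(y);
    if (xr == yr) return;                       // already connected
    if (rank_[xr] > rank_[yr]) parent[yr] = xr; // attach shorter tree
    else if (rank_[xr] < rank_[yr]) parent[xr] = yr;
    else { parent[yr] = xr; rank_[xr]++; }      // equal ranks: tree grows
  }
};
```

Two points end up in the same component exactly when their roots agree, so a pass over all neighbor pairs followed by one `find` per point yields the component labels.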
// [[Rcpp::export]] IntegerVector comps_kNN(IntegerMatrix nn, bool mutual) { R_xlen_t n = nn.nrow(); // create label vector std::vector label(n); std::iota(std::begin(label), std::end(label), 1); // Fill with 1, 2, ..., n. //iota is C++11 only //int value = 1; //std::vector::iterator first = label.begin(), last = label.end(); //while(first != last) *first++ = value++; // create sorted sets so we can use set operations std::vector< std::set > nn_set(n); IntegerVector r; std::vector s; for(int i = 0; i < n; ++i) { r = na_omit(nn(i,_)); s = as >(r); nn_set[i].insert(s.begin(), s.end()); } std::set::iterator it; R_xlen_t i, j; int newlabel, oldlabel; for(i = 0; i < n; ++i) { // check all neighbors of i for (it = nn_set[i].begin(); it != nn_set[i].end(); ++it) { j = *it-1; // index in nn starts with 1 // edge was already checked //if(j label[j]) { newlabel = label[j]; oldlabel = label[i]; }else{ newlabel = label[i]; oldlabel = label[j]; } // relabel for(int k = 0; k < n; ++k) { if(label[k] == oldlabel) label[k] = newlabel; } } } } return wrap(label); } // [[Rcpp::export]] IntegerVector comps_frNN(List nn, bool mutual) { R_xlen_t n = nn.length(); // create label vector std::vector label(n); std::iota(std::begin(label), std::end(label), 1); // Fill with 1, 2, ..., n. 
  //iota is C++11 only
  //int value = 1;
  //std::vector<int>::iterator first = label.begin(), last = label.end();
  //while(first != last) *first++ = value++;

  // create sorted sets so we can use set operations
  std::vector< std::set<int> > nn_set(n);
  IntegerVector r;
  std::vector<int> s;
  for(R_xlen_t i = 0; i < n; ++i) {
    r = nn[i];
    s = as< std::vector<int> >(r);
    nn_set[i].insert(s.begin(), s.end());
  }

  std::set<int>::iterator it;
  R_xlen_t i, j;
  int newlabel, oldlabel;
  for(i = 0; i < n; ++i) {
    // check all neighbors of i
    for (it = nn_set[i].begin(); it != nn_set[i].end(); ++it) {
      j = *it-1; // index in nn starts with 1

      // edge was already checked
      //if(j < i) continue;

      // merge the components of i and j if they are connected
      // (for mutual, i also has to be a neighbor of j)
      if(label[i] != label[j] &&
          (!mutual || nn_set[j].find(i+1) != nn_set[j].end())) {
        // keep the smaller label for the merged component
        if(label[i] > label[j]) { newlabel = label[j]; oldlabel = label[i]; }
        else                    { newlabel = label[i]; oldlabel = label[j]; }

        // relabel
        for(int k = 0; k < n; ++k) {
          if(label[k] == oldlabel) label[k] = newlabel;
        }
      }
    }
  }

  return wrap(label);
}

================================================
FILE: src/dbcv.cpp
================================================
//----------------------------------------------------------------------
// DBSCAN
// File: dbcv.cpp
//----------------------------------------------------------------------
// Copyright (c) 2025 Michael Hahsler. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#include <Rcpp.h>

// Includes
#include "utilities.h"
#include "mst.h"
#include "ANN/ANN.h"
#include "kNN.h"
#include <unordered_map>
#include <string>

using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
StringVector intToStr(IntegerVector iv){
  StringVector res = StringVector(iv.length());
  int ci = 0;
  for (IntegerVector::iterator i = iv.begin(); i != iv.end(); ++i){
    res[ci++] = std::to_string(*i);
  }
  return(res);
}

std::unordered_map<std::string, double> toMap(List map){
  std::vector<std::string> keys = map.names();
  std::unordered_map<std::string, double> hash_map = std::unordered_map<std::string, double>();
  const int n = map.size();
  for (int i = 0; i < n; ++i){
    hash_map.emplace((std::string) keys.at(i), (double) map.at(i));
  }
  return(hash_map);
}

NumericVector retrieve(StringVector keys, std::unordered_map<std::string, double> map){
  int n = keys.size(), i = 0;
  NumericVector res = NumericVector(n);
  for (StringVector::iterator it = keys.begin(); it != keys.end(); ++it){
    res[i++] = map[as< std::string >(*it)];
  }
  return(res);
}

NumericVector dist_subset_arma(const NumericVector& dist, IntegerVector idx){
  // vec v1 = as<vec>(v1in);
  // uvec idx = as<uvec>(idxin) - 1;
  // vec subset = v1.elem(idx);
  // return(wrap(subset));
  return(NumericVector::create());
}

// Provides a fast way of extracting subsets of a dist object. Expects as input the full dist
// object to subset 'dist', and a (1-based!)
// integer vector 'idx' of the points to keep in the subset
// [[Rcpp::export]]
NumericVector dist_subset(const NumericVector& dist, IntegerVector idx){
  const int n = dist.attr("Size");
  const int cl_n = idx.length();
  NumericVector new_dist = Rcpp::no_init((cl_n * (cl_n - 1))/2);
  int ii = 0;
  for (IntegerVector::iterator i = idx.begin(); i != idx.end(); ++i){
    for (IntegerVector::iterator j = i; j != idx.end(); ++j){
      if (*i == *j) { continue; }
      const int ij_idx = LT_POS1(n, *i, *j);
      new_dist[ii++] = dist[ij_idx];
    }
  }
  new_dist.attr("Size") = cl_n;
  new_dist.attr("class") = "dist";
  return(new_dist);
}

// Returns true if a given distance is less than floating point precision
bool remove_zero(ANNdist cdist){
  return(cdist <= std::numeric_limits<ANNdist>::epsilon());
}

ANNdist inv_density(ANNdist cdist){
  return(1.0/cdist);
}

// // [[Rcpp::export]]
// List all_pts_core_sorted_dist(const NumericMatrix& sorted_dist, const List& cl, const int d, const bool squared){
//   // The all core dists to return
//   List all_core_res = List(cl.size());
//
//   // Do the kNN searches per cluster; note that k varies with the cluster
//   int i = 0;
//   for (List::const_iterator it = cl.begin(); it < cl.end(); ++it, ++i){
//     const IntegerVector& cl_pts = (*it);
//     const int k = cl_pts.length();
//
//     // Initial vector to record the per-point all core dists
//     NumericVector all_core_cl = Rcpp::no_init_vector(k);
//
//     // For each point in the cluster, get the all core points dist
//     int j = 0;
//     for (IntegerVector::const_iterator pt_id = cl_pts.begin(); pt_id != cl_pts.end(); ++pt_id, ++j){
//       const NumericMatrix::ConstColumn& knn_dist = sorted_dist.column((*pt_id) - 1);
//
//       // Calculate the all core points distance for this point
//       std::vector<ANNdist> ndists = std::vector<ANNdist>(knn_dist.begin(), knn_dist.begin()+k);
//       std::remove_if(ndists.begin(), ndists.end(), remove_zero);
//       std::transform(ndists.begin(), ndists.end(), ndists.begin(), [=](ANNdist cdist){ return std::pow(1.0/cdist, d); });
//       ANNdist
sum_inv_density = std::accumulate(ndists.begin(), ndists.end(), (ANNdist) 0.0); // double acdist = std::pow(sum_inv_density/(k - 1.0), -(1.0 / double(d))); // Apply all core points equation // all_core_cl[j] = acdist; // // return(List::create(_["ndists"] = acdist, _["denom"] = sum_inv_density/(k - 1.0), _["k"] = k)); // } // all_core_res[i] = all_core_cl; // } // return(all_core_res); // } // // [[Rcpp::export]] // List all_pts_core(const NumericMatrix& data, const List& cl, const bool squared){ // // copy data // int nrow = data.nrow(); // int ncol = data.ncol(); // ANNpointArray dataPts = annAllocPts(nrow, ncol); // for(int i = 0; i < nrow; i++){ // for(int j = 0; j < ncol; j++){ // (dataPts[i])[j] = data(i, j); // } // } // // // create kd-tree (1) or linear search structure (2) // ANNpointSet* kdTree = new ANNkd_tree(dataPts, nrow, ncol, 30, (ANNsplitRule) 5); // // // The all core dists to // List all_core_res = List(cl.size()); // // // Do the kNN searches per cluster; note that k varies with the cluster // int i = 0; // for (List::const_iterator it = cl.begin(); it < cl.end(); ++it, ++i){ // const IntegerVector& cl_pts = (*it); // const int k = cl_pts.length(); // // // Initial vector to record the per-point all core dists // NumericVector all_core_cl = Rcpp::no_init_vector(k); // // // For each point in the cluster, get the all core points dist // int j = 0; // ANNdistArray dists = new ANNdist[k]; // ANNidxArray nnIdx = new ANNidx[k]; // for (IntegerVector::const_iterator pt_id = cl_pts.begin(); pt_id != cl_pts.end(); ++pt_id, ++j){ // // Do the search // ANNpoint queryPt = dataPts[(*pt_id) - 1]; // use original data points // kdTree->annkSearch(queryPt, k, nnIdx, dists); // // // V2. 
// std::vector ndists = std::vector(dists, dists+k); // std::remove_if(ndists.begin(), ndists.end(), remove_zero); // std::transform(ndists.begin(), ndists.end(), ndists.begin(), [=](ANNdist cdist){ return std::pow(1.0/cdist, ncol); }); // ANNdist sum_inv_density = std::accumulate(ndists.begin(), ndists.end(), (ANNdist) 0.0); // double acdist = std::pow(sum_inv_density/(k - 1.0), -(1.0 / double(ncol))); // Apply all core points equation // all_core_cl[j] = acdist; // // return(List::create(_["ndists"] = acdist, _["denom"] = sum_inv_density/(k - 1.0), _["k"] = k)); // } // delete [] dists; // delete [] nnIdx; // all_core_res[i] = all_core_cl; // } // // // cleanup // delete kdTree; // annDeallocPts(dataPts); // annClose(); // // // Return the all point core distance // if(!squared){ for (int i = 0; i < cl.size(); ++i){ all_core_res[i] = Rcpp::sqrt(all_core_res[i]); } } // return(all_core_res); // } // NumericVector all_pts_core(const NumericVector& dist, IntegerVector cl, const int d){ // const int n = dist.attr("Size"); // const int cl_n = cl.length(); // NumericVector all_pts_cd = NumericVector(cl_n); // NumericVector tmp = NumericVector(cl_n); // int knn_i = 0, ii = 0; // for (IntegerVector::iterator i = cl.begin(); i != cl.end(); ++i){ // for (IntegerVector::iterator j = cl.begin(); j != cl.end(); ++j){ // if (*i == *j) { continue; } // const int idx = INDEX_TF(n, (*i < *j ? *i : *j) - 1, (*i < *j ? *j : *i) - 1); // double dist_ij = dist[idx]; // tmp[knn_i++] = 1.0 / (dist_ij == 0.0 ? std::numeric_limits::epsilon() : dist_ij); // } // all_pts_cd[ii++] = pow(sum(pow(tmp, d))/(cl_n - 1.0), -(1.0 / d)); // knn_i = 0; // } // return(all_pts_cd); // } // RCPP does not provide xor! 
// [[Rcpp::export]]
Rcpp::LogicalVector XOR(Rcpp::LogicalVector lhs, Rcpp::LogicalVector rhs) {
  R_xlen_t i = 0, n = lhs.size();
  Rcpp::LogicalVector result(n);
  for ( ; i < n; i++) {
    result[i] = (lhs[i] ^ rhs[i]);
  }
  return result;
}

// [[Rcpp::export]]
NumericMatrix dspc(const List& cl_idx, const List& internal_nodes,
                   const IntegerVector& all_cl_ids, const NumericVector& mrd_dist) {
  // Setup variables
  const int ncl = cl_idx.length(); // number of clusters
  NumericMatrix res = Rcpp::no_init_matrix((ncl * (ncl - 1))/2, 3); // resulting separation measures

  // Loop through cluster combinations, and for each combination
  int c = 0;
  double min_edge = std::numeric_limits<double>::infinity();
  for (int ci = 0; ci < ncl; ++ci) {
    for (int cj = (ci+1); cj < ncl; ++cj){
      Rcpp::checkUserInterrupt();

      // Do lots of indexing to get the relative indexes corresponding to internal nodes
      const IntegerVector i_idx = internal_nodes[ci], j_idx = internal_nodes[cj]; // i and j cluster point indices

      // ignore clusters with no internal nodes!
      // -> get infinity for minimum edge
      // this leads to a NaN and should not happen in this implementation since
      // we have already filtered out clusters of size < 3
      if(i_idx.length() > 1 || j_idx.length() > 1) {
        const IntegerVector rel_i_idx = match(as<IntegerVector>(cl_idx[ci]), all_cl_ids)[i_idx - 1];
        const IntegerVector rel_j_idx = match(as<IntegerVector>(cl_idx[cj]), all_cl_ids)[j_idx - 1];
        IntegerVector int_idx = combine(rel_i_idx, rel_j_idx);

        // Get the pairwise MST
        NumericMatrix pairwise_mst = mst(dist_subset(mrd_dist, int_idx), int_idx.length());

        // Do lots of indexing / casting
        const IntegerVector from_int = seq_len(rel_i_idx.length());
        const NumericVector from_idx = as<NumericVector>(from_int);
        const NumericVector from = pairwise_mst.column(0), to = pairwise_mst.column(1),
                            height = pairwise_mst.column(2);

        // Find which distances in the MST cross to both clusters
        LogicalVector cross_edges = XOR(Rcpp::in(from, from_idx), Rcpp::in(to, from_idx));

        // The minimum weighted edge of these cross edges is the density separation
        // between the two clusters
        min_edge = min(as<NumericVector>(height[cross_edges]));
      }

      // Save the minimum edge
      res(c++, _) = NumericVector::create(ci+1, cj+1, min_edge);
      min_edge = std::numeric_limits<double>::infinity();
    }
  }
  return(res);
}

// Density Separation code
// NumericMatrix dspc(List config, const NumericVector& xdist) {
//
//   // Load configuration from list
//   const int n = config["n"];
//   const int ncl = config["ncl"];
//   const int n_pairs = config["n_pairs"];
//   List node_ids = config["node_ids"];
//   List acp = config["acp"];
//
//   // Conversions and basic setup
//   std::unordered_map<std::string, double> acp_map = toMap(acp);
//   double min_mrd = std::numeric_limits<double>::infinity();
//   NumericMatrix min_mrd_dist = NumericMatrix(n_pairs, 3);
//
//   // Loop through cluster combinations, and for each combination
//   int c = 0;
//   for (int ci = 0; ci < ncl; ++ci) {
//     for (int cj = (ci+1); cj < ncl; ++cj){
//       Rcpp::checkUserInterrupt();
//       IntegerVector i_idx = node_ids[ci], j_idx = node_ids[cj]; // i and j cluster point indices
//
for (IntegerVector::iterator i = i_idx.begin(); i != i_idx.end(); ++i){ // for (IntegerVector::iterator j = j_idx.begin(); j != j_idx.end(); ++j){ // const int lhs = *i < *j ? *i : *j, rhs = *i < *j ? *j : *i; // double dist_ij = xdist[INDEX_TF(n, lhs - 1, rhs - 1)]; // dist(p_i, p_j) // double acd_i = acp_map[std::to_string(*i)]; // all core distance for p_i // double acd_j = acp_map[std::to_string(*j)]; // all core distance for p_i // double mrd_ij = std::max(std::max(acd_i, acd_j), dist_ij); // mutual reachability distance of the pair // if (mrd_ij < min_mrd){ // min_mrd = mrd_ij; // } // } // } // min_mrd_dist(c++, _) = NumericVector::create(ci+1, cj+1, min_mrd); // min_mrd = std::numeric_limits::infinity(); // } // } // return(min_mrd_dist); // } ================================================ FILE: src/dbscan.cpp ================================================ //---------------------------------------------------------------------- // DBSCAN // File: R_dbscan.cpp //---------------------------------------------------------------------- // Copyright (c) 2015 Michael Hahsler. All Rights Reserved. 
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#include <Rcpp.h>
#include "ANN/ANN.h"
#include "regionQuery.h"

using namespace Rcpp;

// call this with either
// * data and epsilon and an empty frNN list
// or
// * empty data and a frNN id list (including selfmatches and using C numbering)
// [[Rcpp::export]]
IntegerVector dbscan_int(
    NumericMatrix data, double eps, int minPts, NumericVector weights,
    int borderPoints, int type, int bucketSize, int splitRule, double approx,
    List frNN) {

  // kd-tree uses squared distances
  double eps2 = eps*eps;

  bool weighted = FALSE;
  double Nweight = 0.0;
  ANNpointSet* kdTree = NULL;
  ANNpointArray dataPts = NULL;
  int nrow = NA_INTEGER;
  int ncol = NA_INTEGER;

  if(frNN.size()) {
    // no kd-tree but use frNN list from distances
    nrow = frNN.size();
  }else{
    // copy data for kd-tree
    nrow = data.nrow();
    ncol = data.ncol();
    dataPts = annAllocPts(nrow, ncol);
    for (int i = 0; i < nrow; i++){
      for (int j = 0; j < ncol; j++){
        (dataPts[i])[j] = data(i, j);
      }
    }
    //Rprintf("Points copied.\n");

    // create kd-tree (1) or linear search structure (2)
    if (type==1) kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize,
      (ANNsplitRule) splitRule);
    else kdTree = new ANNbruteForce(dataPts, nrow, ncol);
    //Rprintf("kd-tree ready. starting DBSCAN.\n");
  }

  if (weights.size() != 0) {
    if (weights.size() != nrow)
      stop("length of weights vector is incompatible with data.");
    weighted = TRUE;
  }

  // DBSCAN
  std::vector<bool> visited(nrow, false);
  std::vector< std::vector<int> > clusters; // vector of vectors == list
  std::vector<int> N, N2;

  for (int i = 0; i < nrow; i++) {
    if (visited[i]) continue;

    if (frNN.size()) N = Rcpp::as< std::vector<int> >(frNN[i]);
    else N = regionQuery(i, dataPts, kdTree, eps2, approx);

    // noise points stay unassigned for now
    //if (weighted) Nweight = sum(weights[IntegerVector(N.begin(), N.end())]) +
    if (weighted) {
      // This should work, but Rcpp has a problem with the sugar expression!
      // Assigning the subselection forces it to be materialized.
      // Nweight = sum(weights[IntegerVector(N.begin(), N.end())]) +
      //   weights[i];
      NumericVector w = weights[IntegerVector(N.begin(), N.end())];
      Nweight = sum(w);
    } else Nweight = N.size();

    if (Nweight < minPts) continue;

    // start new cluster and expand
    std::vector<int> cluster;
    cluster.push_back(i);
    visited[i] = true;

    while (!N.empty()) {
      int j = N.back();
      N.pop_back();
      if (visited[j]) continue; // point already processed
      visited[j] = true;

      //N2 = regionQuery(j, dataPts, kdTree, eps2, approx);
      if(frNN.size()) N2 = Rcpp::as< std::vector<int> >(frNN[j]);
      else N2 = regionQuery(j, dataPts, kdTree, eps2, approx);

      if (weighted) {
        // Nweight = sum(weights(NumericVector(N2.begin(), N2.end())) +
        //   weights[j]
        NumericVector w = weights[IntegerVector(N2.begin(), N2.end())];
        Nweight = sum(w);
      } else Nweight = N2.size();

      if (Nweight >= minPts) {
        // expand neighborhood
        // this is faster than set_union and does not need sort! visited takes
        // care of duplicates.
        std::copy(N2.begin(), N2.end(), std::back_inserter(N));
      }

      // for DBSCAN* (borderPoints==FALSE) border points are considered noise
      if(Nweight >= minPts || borderPoints) cluster.push_back(j);
    }

    // add cluster to list
    clusters.push_back(cluster);
  }

  // prepare cluster vector
  // unassigned points are noise (cluster 0)
  IntegerVector id(nrow, 0);
  for (std::size_t i = 0; i < clusters.size(); i++) {
    for (std::size_t j = 0; j < clusters[i].size(); j++) {
      id[clusters[i][j]] = i+1;
    }
  }

  // cleanup
  if (kdTree != NULL) delete kdTree;
  if (dataPts != NULL) annDeallocPts(dataPts);

  return wrap(id);
}

================================================
FILE: src/dendrogram.cpp
================================================

#include <Rcpp.h>
#include <string>
#include <algorithm>
#include "UnionFind.h"

using namespace Rcpp;

// Ditto with atoi!
int fast_atoi( const char * str ) { int val = 0; while( *str ) { val = val*10 + (*str++ - '0'); } return val; } int which_int(IntegerVector x, int target) { int size = (int) x.size(); for (int i = 0; i < size; ++i) { if (x(i) == target) return(i); } return(-1); } // [[Rcpp::export]] List reach_to_dendrogram(const Rcpp::List reachability, const NumericVector pl_order) { // Set up sorted reachability distance NumericVector pl = Rcpp::clone(as(reachability["reachdist"])).sort(); // Get 0-based order IntegerVector order = Rcpp::clone(as(reachability["order"])) - 1; /// Initialize disjoint-set structure int n_nodes = order.size(); UnionFind uf((size_t) n_nodes); // Create leaves List dendrogram(n_nodes); for (int i = 0; i < n_nodes; ++i) { IntegerVector leaf = IntegerVector(); leaf.push_back(i+1); leaf.attr("label") = std::to_string(i + 1); leaf.attr("members") = 1; leaf.attr("height") = 0; leaf.attr("leaf") = true; dendrogram.at(i) = leaf; } // Precompute the q order IntegerVector q_order(n_nodes); for (int i = 0; i < n_nodes - 1; ++i) { q_order.at(i) = order(which_int(order, pl_order(i)) - 1); } // Get the index of the point with next smallest reach dist and its neighbor IntegerVector members(n_nodes, 1); int insert = 0, p = 0, q = 0, p_i = 0, q_i = 0; for (int i = 0; i < (n_nodes-1); ++i) { p = pl_order(i); q = q_order(i); // left neighbor in ordering if (q == -1) { stop("Left neighbor not found"); } // Get the actual index of the branch(es) containing the p and q p_i = uf.Find(p), q_i = uf.Find(q); List branch = List::create(dendrogram.at(q_i), dendrogram.at(p_i)); // generic proxy blocks attr access for mixed types, so keep track of members manually! 
branch.attr("members") = members.at(p_i) + members.at(q_i); branch.attr("height") = pl(i); branch.attr("class") = "dendrogram"; // Merge the two, retrieving the new index uf.Union(p_i, q_i); insert = uf.Find(q_i); // q because q_branch is first in the new branch // Update members reference and insert the branch members.at(insert) = branch.attr("members"); dendrogram.at(insert) = branch; } return(dendrogram.at(insert)); } int DFS(List d, List& rp, int pnode, NumericVector stack) { if (d.hasAttribute("leaf")) { // If at a leaf node, compare to previous node std::string leaf_label = as( d.attr("label") ); rp[leaf_label] = stack; // Record the ancestors reachability values std::string pnode_label = std::to_string(pnode); double new_reach = 0.0f; if(!rp.containsElementNamed(pnode_label.c_str())) { // 1st time seeing this point new_reach = INFINITY; } else { // Smallest Common Ancestor NumericVector reachdist_p = rp[pnode_label]; new_reach = min(intersect(stack, reachdist_p)); } NumericVector reachdist = rp["reachdist"]; IntegerVector order = rp["order"]; reachdist.push_back(new_reach); int res = fast_atoi(leaf_label.c_str()); order.push_back(res); rp["order"] = order; rp["reachdist"] = reachdist; return(res); } else { double cheight = d.attr("height"); stack.push_back(cheight); List left = d[0]; // Recursively go left, recording the reachability distances on the stack pnode = DFS(left, rp, pnode, stack); if (d.length() > 1) { for (int sub_branch = 1; sub_branch < d.length(); ++sub_branch) { pnode = DFS(d[sub_branch], rp, pnode, stack); // pnode; } } return(pnode); } } // [[Rcpp::export]] List dendrogram_to_reach(const Rcpp::List x) { Rcpp::List rp = List::create(_["order"] = IntegerVector::create(), _["reachdist"] = NumericVector::create()); NumericVector stack = NumericVector::create(); DFS(x, rp, 0, stack); List res = List::create(_["reachdist"] = rp["reachdist"], _["order"] = rp["order"]); res.attr("class") = "reachability"; return(res); } // [[Rcpp::export]] List 
mst_to_dendrogram(const NumericMatrix mst) { // Set up sorted vector values NumericVector p_order = mst(_, 0); NumericVector q_order = mst(_, 1); NumericVector dist = mst(_, 2); int n_nodes = p_order.length() + 1; // Make sure to clone so as to not make changes by reference p_order = Rcpp::clone(p_order); q_order = Rcpp::clone(q_order); // UnionFind data structure for fast agglomerative building UnionFind uf((size_t) n_nodes); // Create leaves List dendrogram(n_nodes); for (int i = 0; i < n_nodes; ++i) { IntegerVector leaf = IntegerVector(); leaf.push_back(i+1); leaf.attr("label") = std::to_string(i + 1); leaf.attr("members") = 1; leaf.attr("height") = 0; leaf.attr("leaf") = true; dendrogram.at(i) = leaf; } // Get the index of the point with next smallest reach dist and its neighbor IntegerVector members(n_nodes, 1); int insert = 0, p = 0, q = 0, p_i = 0, q_i = 0; for (int i = 0; i < (n_nodes-1); ++i) { p = p_order(i), q = q_order(i); // Get the actual index of the branch(es) containing the p and q p_i = uf.Find(p), q_i = uf.Find(q); // Merge the two, retrieving the new index uf.Union(p_i, q_i); List branch = List::create(dendrogram.at(q_i), dendrogram.at(p_i)); insert = uf.Find(q_i); // q because q_branch is first in the new branch // Update members in the branch int tmp_members = members.at(p_i) + members.at(q_i); // Branches with equivalent distances are merged simultaneously while((i + 1) < (n_nodes-1) && dist(i + 1) == dist(i)){ i += 1; p = p_order(i), q = q_order(i); p_i = uf.Find(p), q_i = uf.Find(q); // Merge the branches, update current insert index int insert2 = uf.Find(q_i); branch.push_back(insert == insert2 ? dendrogram.at(p_i) : dendrogram.at(q_i)); tmp_members += insert == insert2 ? members.at(p_i) : members.at(q_i); uf.Union(p_i, q_i); insert = uf.Find(q_i); } // Generic proxy blocks attr access for mixed types, so need to keep track of members manually! 
branch.attr("height") = dist(i); branch.attr("class") = "dendrogram"; branch.attr("members") = tmp_members; // Update members reference and insert the branch members.at(insert) = branch.attr("members"); dendrogram.at(insert) = branch; } return(dendrogram.at(insert)); } ================================================ FILE: src/density.cpp ================================================ //---------------------------------------------------------------------- // DBSCAN density //---------------------------------------------------------------------- // Copyright (c) 2015 Michael Hahsler. All Rights Reserved. // // This software is provided under the provisions of the // GNU General Public License (GPL) Version 3 // (see: http://www.gnu.org/licenses/gpl-3.0.en.html) #include #include "ANN/ANN.h" #include "regionQuery.h" using namespace Rcpp; // faster implementation of counting point densities from a matrix // using a kd-tree // [[Rcpp::export]] IntegerVector dbscan_density_int( NumericMatrix data, double eps, int type, int bucketSize, int splitRule, double approx) { // kd-tree uses squared distances double eps2 = eps*eps; ANNpointSet* kdTree = NULL; ANNpointArray dataPts = NULL; int nrow = NA_INTEGER; int ncol= NA_INTEGER; // copy data for kd-tree nrow = data.nrow(); ncol = data.ncol(); dataPts = annAllocPts(nrow, ncol); for (int i = 0; i < nrow; i++){ for (int j = 0; j < ncol; j++){ (dataPts[i])[j] = data(i, j); } } //Rprintf("Points copied.\n"); // create kd-tree (1) or linear search structure (2) if (type==1) kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize, (ANNsplitRule) splitRule); else kdTree = new ANNbruteForce(dataPts, nrow, ncol); //Rprintf("kd-tree ready. 
starting DBSCAN.\n"); std::vector N; IntegerVector count(nrow); for (int i=0; i #include "ANN/ANN.h" #include "regionQuery.h" using namespace Rcpp; // [[Rcpp::export]] List frNN_int(NumericMatrix data, double eps, int type, int bucketSize, int splitRule, double approx) { // kd-tree uses squared distances double eps2 = eps*eps; // copy data int nrow = data.nrow(); int ncol = data.ncol(); ANNpointArray dataPts = annAllocPts(nrow, ncol); for(int i = 0; i < nrow; i++){ for(int j = 0; j < ncol; j++){ (dataPts[i])[j] = data(i, j); } } //Rprintf("Points copied.\n"); // create kd-tree (1) or linear search structure (2) ANNpointSet* kdTree = NULL; if (type==1){ kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize, (ANNsplitRule) splitRule); } else{ kdTree = new ANNbruteForce(dataPts, nrow, ncol); } //Rprintf("kd-tree ready. starting DBSCAN.\n"); // frNN //std::vector< IntegerVector > id; id.resize(nrow); //std::vector< NumericVector > dist; dist.resize(nrow); List id(nrow); List dist(nrow); for (int p=0; p(), 1 ) ); // take sqrt of distance since the tree stores d^2 //std::transform(N.second.begin(), N.second.end(), // N.second.begin(), static_cast(std::sqrt)); IntegerVector ids = IntegerVector(N.first.begin(), N.first.end()); NumericVector dists = NumericVector(N.second.begin(), N.second.end()); // remove self matches LogicalVector take = ids != p; ids = ids[take]; dists = dists[take]; //Rprintf("Found neighborhood size %d\n", ids.size()); id[p] = ids+1; dist[p] = sqrt(dists); } // cleanup delete kdTree; annDeallocPts(dataPts); // annClose(); is now done globally in the package // prepare results List ret; ret["dist"] = dist; ret["id"] = id; ret["eps"] = eps; return ret; } // [[Rcpp::export]] List frNN_query_int(NumericMatrix data, NumericMatrix query, double eps, int type, int bucketSize, int splitRule, double approx) { // kd-tree uses squared distances double eps2 = eps*eps; // copy data int nrow = data.nrow(); int ncol = data.ncol(); ANNpointArray dataPts = 
annAllocPts(nrow, ncol); for(int i = 0; i < nrow; i++){ for(int j = 0; j < ncol; j++){ (dataPts[i])[j] = data(i, j); } } int nrow_q = query.nrow(); int ncol_q = query.ncol(); ANNpointArray queryPts = annAllocPts(nrow_q, ncol_q); for(int i = 0; i < nrow_q; i++){ for(int j = 0; j < ncol_q; j++){ (queryPts[i])[j] = query(i, j); } } //Rprintf("Points copied.\n"); // create kd-tree (1) or linear search structure (2) ANNpointSet* kdTree = NULL; if (type==1){ kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize, (ANNsplitRule) splitRule); } else{ kdTree = new ANNbruteForce(dataPts, nrow, ncol); } //Rprintf("kd-tree ready. starting DBSCAN.\n"); // frNN //std::vector< IntegerVector > id; id.resize(nrow); //std::vector< NumericVector > dist; dist.resize(nrow); List id(nrow_q); List dist(nrow_q); for (int p=0; p(), 1 ) ); // take sqrt of distance since the tree stores d^2 //std::transform(N.second.begin(), N.second.end(), // N.second.begin(), static_cast(std::sqrt)); IntegerVector ids = IntegerVector(N.first.begin(), N.first.end()); NumericVector dists = NumericVector(N.second.begin(), N.second.end()); // remove self matches -- not an issue with query points //LogicalVector take = ids != p; //ids = ids[take]; //dists = dists[take]; //Rprintf("Found neighborhood size %d\n", ids.size()); id[p] = ids+1; dist[p] = sqrt(dists); } // cleanup delete kdTree; annDeallocPts(dataPts); annDeallocPts(queryPts); // annClose(); is now done globally in the package // prepare results List ret; ret["dist"] = dist; ret["id"] = id; ret["eps"] = eps; ret["sort"] = false; return ret; } ================================================ FILE: src/hdbscan.cpp ================================================ //---------------------------------------------------------------------- // R interface to dbscan using the ANN library //---------------------------------------------------------------------- // Copyright (c) 2015 Michael Hahsler, Matt Piekenbrock. All Rights Reserved. 
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#include <Rcpp.h>

// C++ includes
#include <unordered_map>
#include <queue>
#include <stack>
#include <cstdlib> // std::atoi

// Helper functions
#include "utilities.h"

using namespace Rcpp;
// [[Rcpp::plugins(cpp11)]]

// Macros
#define INDEX_TF(N,to,from) (N)*(to) - (to)*(to+1)/2 + (from) - (to) - (1)

// Given a dist vector of "should-link" (1), "should-not-link" (-1), and "don't care" (0)
// constraints in the form of integers, convert constraints to a more compact adjacency list
// representation.
// [[Rcpp::export]]
List distToAdjacency(IntegerVector constraints, const int N){
  std::unordered_map< int, std::vector<int> > key_map = std::unordered_map< int, std::vector<int> >();
  for (int i = 0; i < N; ++i){
    for (int j = 0; j < N; ++j){
      if (i == j) continue;
      int index = i > j ? INDEX_TF(N, j, i) : INDEX_TF(N, i, j);
      int crule = constraints.at(index);
      if (crule != 0){
        if (key_map.count(i+1) != 1){ key_map[i+1] = std::vector<int>(); } // add 1 for base 1
        key_map[i+1].push_back(crule < 0 ? - (j + 1) : j + 1); // add 1 for base 1
      }
    }
  }
  return(wrap(key_map));
}

// Given an hclust object, convert to a dendrogram object (but much faster).
// [[Rcpp::export]] List buildDendrogram(List hcl) { // Extract hclust info IntegerMatrix merge = hcl["merge"]; NumericVector height = hcl["height"]; IntegerVector order = hcl["order"]; List labels = List(); // allows to avoid type inference if (!hcl.containsElementNamed("labels") || hcl["labels"] == R_NilValue){ labels = seq_along(order); } else { labels = as(hcl["labels"]); } int n = merge.nrow() + 1, k; List new_br, z = List(n); for (k = 0; k < n-1; k++){ int lm = merge(k, 0), rm = merge(k, 1); IntegerVector m = IntegerVector::create(lm, rm); // First Case: Both are singletons, so need to create leaves if (all(m < 0).is_true()){ // Left IntegerVector left = IntegerVector::create(-lm); left.attr("members") = (int) 1; left.attr("height") = (double) 0.f; left.attr("label") = labels.at(-(lm + 1)); left.attr("leaf") = true; // Right IntegerVector right = IntegerVector::create(-rm); right.attr("members") = (int) 1; right.attr("height") = (double) 0.f; right.attr("label") = labels.at(-(rm + 1)); right.attr("leaf") = true; // Merge new_br = List::create(left, right); new_br.attr("members") = 2; new_br.attr("midpoint") = 0.5; } // Second case: 1 is a singleton, the other is a branch else if (any(m < 0).is_true()){ bool isL = lm < 0; // Create the leaf from the negative entry IntegerVector leaf = IntegerVector::create(isL ? -lm : -rm); leaf.attr("members") = 1; leaf.attr("height") = 0; leaf.attr("label") = labels.at(isL ? -(lm + 1) : -(rm + 1)); leaf.attr("leaf") = true; // Merge the leaf with the other existing branch int branch_key = isL ? rm - 1 : lm - 1; List sub_branch = z[branch_key]; new_br = isL ? List::create(leaf, sub_branch) : List::create(sub_branch, leaf); z.at(branch_key) = R_NilValue; // Set attributes of new branch int sub_members = sub_branch.attr("members"); double mid_pt = sub_branch.attr("midpoint"); new_br.attr("members") = int(sub_members) + 1; new_br.attr("midpoint") = (int(isL ? 
1 : sub_members) + mid_pt) / 2; } else { // Create the new branch List l_branch = z.at(lm - 1), r_branch = z.at(rm - 1); new_br = List::create(l_branch, r_branch); // Store attribute valeus in locals to get around proxy int left_members = l_branch.attr("members"), right_members = r_branch.attr("members"); double l_mid = l_branch.attr("midpoint"), r_mid = r_branch.attr("midpoint"); // Set up new branch attributes new_br.attr("members") = left_members + right_members; new_br.attr("midpoint") = (left_members + l_mid + r_mid) / 2; // Deallocate unneeded memory along the way z.at(lm - 1) = R_NilValue; z.at(rm - 1) = R_NilValue; } new_br.attr("height") = height.at(k); z.at(k) = new_br; } List res = z.at(k - 1); res.attr("class") = "dendrogram"; return(res); } // Simple function to iteratively get the sub-children of a nested integer-hierarchy // [[Rcpp::export]] IntegerVector all_children(List hier, int key, bool leaves_only = false){ IntegerVector res = IntegerVector(); // If the key doesn't exist return an empty vector if (!hier.containsElementNamed(std::to_string(key).c_str())){ return(res); } // Else, do iterative 'recursive' type function to extract all the IDs of // all sub trees IntegerVector children = hier[std::to_string(key).c_str()]; std::queue to_do = std::queue(); to_do.push(key); while (to_do.size() != 0){ int parent = to_do.front(); if (!hier.containsElementNamed(std::to_string(parent).c_str())){ to_do.pop(); } else { children = hier[std::to_string(parent).c_str()]; to_do.pop(); for (int n_children = 0; n_children < children.length(); ++n_children){ int child_id = children.at(n_children); if (leaves_only){ if (!hier.containsElementNamed(std::to_string(child_id).c_str())) { res.push_back(child_id); } } else { res.push_back(child_id); } to_do.push(child_id); } } } return(res); } // Extract 'flat' assignments IntegerVector getSalientAssignments(List cl_tree, List cl_hierarchy, std::list sc, const int n){ IntegerVector cluster = IntegerVector(n, 0); for 
(std::list::iterator it = sc.begin(); it != sc.end(); it++) { IntegerVector child_cl = all_children(cl_hierarchy, *it); // If at a leaf, its not necessary to recursively get point indices, else need to traverse hierarchy if (child_cl.length() == 0){ List cl = cl_tree[std::to_string(*it)]; cluster[as(cl["contains"]) - 1] = *it; } else { List cl = cl_tree[std::to_string(*it)]; cluster[as(cl["contains"]) - 1] = *it; for (IntegerVector::iterator child_cid = child_cl.begin(); child_cid != child_cl.end(); ++child_cid){ cl = cl_tree[std::to_string(*child_cid)]; IntegerVector child_contains = as(cl["contains"]); if (child_contains.length() > 0){ cluster[child_contains - 1] = *it; } } } } return(cluster); } // Retrieve node (x, y) positions in a cluster tree // [[Rcpp::export]] NumericMatrix node_xy(List cl_tree, List cl_hierarchy, int cid = 0){ // Initialize if (cid == 0){ cl_tree["node_xy"] = NumericMatrix(all_children(cl_hierarchy, 0).size()+1, 2); cl_tree["leaf_counter"] = 0; cl_tree["row_counter"] = 0; } // Retrieve/set variables std::string cid_str = std::to_string(cid); NumericMatrix node_xy_ = cl_tree["node_xy"]; List cl = cl_tree[cid_str]; // Increment row index every time int row_index = (int) cl_tree["row_counter"]; cl_tree["row_counter"] = row_index+1; // base case if (!cl_hierarchy.containsElementNamed(cid_str.c_str())){ int leaf_index = (int) cl_tree["leaf_counter"]; node_xy_(row_index, _) = NumericVector::create((double) ++leaf_index, (double) cl["eps_death"]); cl_tree["leaf_counter"] = leaf_index; NumericMatrix res = NumericMatrix(1, 1); res[0] = row_index; return(res); } else { IntegerVector children = cl_hierarchy[cid_str]; int l_row = (int) node_xy(cl_tree, cl_hierarchy, children.at(0))[0]; // left int r_row = (int) node_xy(cl_tree, cl_hierarchy, children.at(1))[0]; // right double lvalue = (double) (node_xy_(l_row, 0) + node_xy_(r_row, 0)) / 2; node_xy_(row_index, _) = NumericVector::create(lvalue, (double) cl["eps_death"]); if (cid != 0){ NumericMatrix 
res = NumericMatrix(1, 1);
      res[0] = row_index;
      return(res);
    }
  }
  // Cleanup
  if (cid == 0){
    cl_tree["leaf_counter"] = R_NilValue;
    cl_tree["row_counter"] = R_NilValue;
  }
  return (node_xy_);
}

// Given a cluster tree, convert to a simplified dendrogram
// [[Rcpp::export]]
List simplifiedTree(List cl_tree) {
  // Hierarchical information
  List cl_hierarchy = cl_tree.attr("cl_hierarchy");
  IntegerVector all_childs = all_children(cl_hierarchy, 0);

  // To keep track of members and midpoints
  std::unordered_map<std::string, int> members = std::unordered_map<std::string, int>();
  std::unordered_map<std::string, float> mids = std::unordered_map<std::string, float>();

  // To keep track of where we are
  std::stack<int> cid_stack = std::stack<int>();
  cid_stack.push(0);

  // Iteratively build the hierarchy
  List dendrogram = List();

  // Premake children
  for (IntegerVector::iterator it = all_childs.begin(); it != all_childs.end(); ++it){
    std::string cid_label = std::to_string(*it);
    List cl = cl_tree[cid_label];
    if (!cl_hierarchy.containsElementNamed(cid_label.c_str())){
      // Create leaf
      IntegerVector leaf = IntegerVector::create(*it);
      leaf.attr("label") = cid_label;
      leaf.attr("members") = 1;
      leaf.attr("height") = cl["eps_death"];
      leaf.attr("midpoint") = 0;
      leaf.attr("leaf") = true;
      dendrogram[cid_label] = leaf;
      members[cid_label] = 1;
      mids[cid_label] = 0;
    }
  }

  // Building the dendrogram bottom-up
  while(!cid_stack.empty()) {
    int cid = cid_stack.top();
    std::string cid_label = std::to_string(cid);
    List cl = cl_tree[cid_label];

    // Recursive calls
    IntegerVector local_children = cl_hierarchy[cid_label];

    // Members and midpoint extraction
    std::string l_str = std::to_string(local_children.at(0)), r_str = std::to_string(local_children.at(1));
    // Rcout << "Comparing: " << l_str << ", " << r_str << std::endl;
    if (!dendrogram.containsElementNamed(l_str.c_str())){ cid_stack.push(local_children.at(0)); continue; }
    if (!dendrogram.containsElementNamed(r_str.c_str())){ cid_stack.push(local_children.at(1)); continue; }

    // Continue building up the hierarchy
    List left = dendrogram[l_str], right =
dendrogram[r_str];
    int l_members = members[l_str], r_members = members[r_str];
    float l_mid = mids[l_str], r_mid = mids[r_str];

    // Make the new branch
    List new_branch = List::create(dendrogram[l_str], dendrogram[r_str]);
    new_branch.attr("label") = cid_label;
    new_branch.attr("members") = l_members + r_members;
    new_branch.attr("height") = (float) cl["eps_death"];
    new_branch.attr("class") = "dendrogram";

    // Midpoint calculation
    bool isL = (bool) !cl_hierarchy.containsElementNamed(l_str.c_str()); // is left a leaf
    if (!isL && cl_hierarchy.containsElementNamed(r_str.c_str())){ // is non-singleton merge
      new_branch.attr("midpoint") = (l_members + l_mid + r_mid) / 2;
    } else { // contains a leaf
      int sub_members = isL ? r_members : l_members;
      float mid_pt = isL ? r_mid : l_mid;
      new_branch.attr("midpoint") = ((isL ? 1 : sub_members) + mid_pt) / 2;
    }

    // Save info for later
    members[cid_label] = l_members + r_members;
    mids[cid_label] = (float) new_branch.attr("midpoint");
    dendrogram[cid_label] = new_branch;

    // Done with this node
    cid_stack.pop();
  }
  return(dendrogram["0"]);
}

/* Main processing step to compute all the relevant information in the form of the
 * 'cluster tree' for FOSC. The cluster stability scores are computed via the tree
 * traversal; extracting the flat clustering relies on a separate function.
 * Requires information associated with hclust elements. See ?hclust in R for more info.
 * 1. merge := an (n-1) x 2 matrix representing the MST computed from any arbitrary similarity matrix
 * 2. height := the (linkage) distance each new set of clusters forms from the MST
 * 3.
order := the point indices of the original data that the negative entries in merge refer to
 * Notation: eps is used to arbitrarily refer to the dissimilarity distance used
*/
// [[Rcpp::export]]
List computeStability(const List hcl, const int minPts, bool compute_glosh = false){
  // Extract hclust info
  NumericMatrix merge = hcl["merge"];
  NumericVector eps_dist = hcl["height"];
  IntegerVector pt_order = hcl["order"];
  int n = merge.nrow() + 1, k;

  // Which cluster does each merge step represent (after the merge, or before the split)
  IntegerVector cl_tracker = IntegerVector(n-1, 0),
                member_sizes = IntegerVector(n-1, 0); // Size of each step
  List clusters = List(),      // Final cluster information
       cl_hierarchy = List();  // Keeps track of the hierarchy, i.e. which cluster contains which

  // The primary information needed
  std::unordered_map<std::string, IntegerVector> contains = std::unordered_map<std::string, IntegerVector>();
  std::unordered_map<std::string, NumericVector> eps = std::unordered_map<std::string, NumericVector>();

  // Supplemental information, kept for either convenience or to reduce memory
  std::unordered_map<std::string, int> n_children = std::unordered_map<std::string, int>();
  std::unordered_map<std::string, double> eps_death = std::unordered_map<std::string, double>();
  std::unordered_map<std::string, double> eps_birth = std::unordered_map<std::string, double>();
  std::unordered_map<std::string, bool> processed = std::unordered_map<std::string, bool>();

  // First pass: Agglomerate up the hierarchy, recording member sizes.
  // This enables a dynamic programming strategy to improve performance below.
  for (k = 0; k < n-1; ++k){
    int lm = merge(k, 0), rm = merge(k, 1);
    IntegerVector m = IntegerVector::create(lm, rm);
    if (all(m < 0).is_true()){
      member_sizes[k] = 2;
    } else if (any(m < 0).is_true()) {
      int pos_merge = (lm < 0 ?
rm : lm), merge_size = member_sizes[pos_merge - 1]; member_sizes[k] = merge_size + 1; } else { // Record Member Sizes int merge_size1 = member_sizes[lm-1], merge_size2 = member_sizes[rm-1]; member_sizes[k] = merge_size1 + merge_size2; } } // Initialize root (unknown size, might be 0, so don't initialize length) std::string root_str = "0"; contains[root_str] = NumericVector(); eps[root_str] = NumericVector(); eps_birth[root_str] = eps_dist.at(eps_dist.length()-1); int global_cid = 0; // Second pass: Divisively split the hierarchy, recording the epsilon and point index values as needed for (k = n-2; k >= 0; --k){ // Current Merge int lm = merge(k, 0), rm = merge(k, 1), cid = cl_tracker.at(k); IntegerVector m = IntegerVector::create(lm, rm); std::string cl_cid = std::to_string(cid); // Trivial case: split into singletons, record eps, contains, and ensure eps_death is minimal if (all(m < 0).is_true()){ contains[cl_cid].push_back(-lm), contains[cl_cid].push_back(-rm); double noise_eps = processed[cl_cid] ? eps_death[cl_cid] : eps_dist.at(k); eps[cl_cid].push_back(noise_eps), eps[cl_cid].push_back(noise_eps); eps_death[cl_cid] = processed[cl_cid] ? eps_death[cl_cid] : std::min((double) eps_dist.at(k), (double) eps_death[cl_cid]); } else if (any(m < 0).is_true()) { // Record new point info and mark the non-singleton with the cluster id contains[cl_cid].push_back(-(lm < 0 ? lm : rm)); eps[cl_cid].push_back(processed[cl_cid] ? eps_death[cl_cid] : eps_dist.at(k)); cl_tracker.at((lm < 0 ? 
rm : lm) - 1) = cid;
    } else {
      int merge_size1 = member_sizes[lm-1], merge_size2 = member_sizes[rm-1];

      // The minPts step
      if (merge_size1 >= minPts && merge_size2 >= minPts){
        // Record death of current cluster
        eps_death[cl_cid] = eps_dist.at(k);
        processed[cl_cid] = true;

        // Mark the lower merge steps as new clusters
        cl_hierarchy[cl_cid] = IntegerVector::create(global_cid+1, global_cid+2);
        std::string l_index = std::to_string(global_cid+1), r_index = std::to_string(global_cid+2);
        cl_tracker.at(lm - 1) = ++global_cid, cl_tracker.at(rm - 1) = ++global_cid;

        // Record the distance the new clusters appeared and initialize containers
        contains[l_index] = IntegerVector(), contains[r_index] = IntegerVector();
        eps[l_index] = NumericVector(), eps[r_index] = NumericVector();
        eps_birth[l_index] = eps_dist.at(k), eps_birth[r_index] = eps_dist.at(k);
        eps_death[l_index] = eps_dist.at(lm - 1), eps_death[r_index] = eps_dist.at(rm - 1);
        processed[l_index] = false, processed[r_index] = false;
        n_children[cl_cid] = merge_size1 + merge_size2;
      } else {
        // Inherit cluster identity
        cl_tracker.at(lm - 1) = cid, cl_tracker.at(rm - 1) = cid;
      }
    }
  }

  // Aggregate data into a returnable list
  // NOTE: the 'contains' element will be empty for all inner nodes w/ minPts == 1, else
  // it will contain only the objects that were considered 'noise' at that hierarchical level
  List res = List();
  NumericVector outlier_scores;
  if (compute_glosh) { outlier_scores = NumericVector(n, -1.0); }
  for (std::unordered_map<std::string, IntegerVector>::iterator key = contains.begin(); key != contains.end(); ++key){
    int nc = n_children[key->first];
    res[key->first] = List::create(
      _["contains"] = key->second,
      _["eps"] = eps[key->first],
      _["eps_birth"] = eps_birth[key->first],
      _["eps_death"] = eps_death[key->first],
      _["stability"] = sum(1/eps[key->first] - 1/eps_birth[key->first]) + (nc * 1/eps_death[key->first] - nc * 1/eps_birth[key->first]),
      //_["_stability"] = 1/eps[key->first] - 1/eps_birth[key->first],
      _["n_children"] = n_children[key->first]
);

    // Compute GLOSH outlier scores (HDBSCAN only)
    if (compute_glosh){
      if (eps[key->first].size() > 0){ // contains noise points
        double eps_max = std::numeric_limits<double>::infinity();
        IntegerVector leaf_membership = all_children(cl_hierarchy, atoi(key->first.c_str()), true);
        if (leaf_membership.length() == 0){ // is itself a leaf
          eps_max = eps_death[key->first];
        } else {
          for (IntegerVector::iterator it = leaf_membership.begin(); it != leaf_membership.end(); ++it){
            eps_max = std::min(eps_max, eps_death[std::to_string(*it)]);
          }
        }
        NumericVector eps_max_vec = NumericVector(eps[key->first].size(), eps_max) / as<NumericVector>(eps[key->first]);
        NumericVector glosh = Rcpp::rep(1.0, key->second.length()) - eps_max_vec;
        outlier_scores[key->second - 1] = glosh;
      }
      // MFH: If the point is never an outlier (0/0) then set GLOSH to 0
      outlier_scores[is_nan(outlier_scores)] = 0.0;
    }
  }

  // Store meta-data as attributes
  res.attr("n") = n;                       // number of points in the original data
  res.attr("cl_hierarchy") = cl_hierarchy; // Stores parent/child structure
  res.attr("cl_tracker") = cl_tracker;     // stores cluster id formation for each merge step, used for cluster extraction
  res.attr("minPts") = minPts;             // needed later
  // res.attr("root") = minPts == 1;       // needed later to ensure root is not captured as a cluster
  if (compute_glosh){ res.attr("glosh") = outlier_scores; } // glosh outlier scores (hdbscan only)
  return(res);
}

// Validates a given list of instance-level constraints for symmetry.
Since the number of
// constraints might change dramatically based on the problem, an initial loop is performed
// to figure out whether it would be faster to check via an adjacency list or matrix
// [[Rcpp::export]]
List validateConstraintList(List& constraints, int n){
  std::vector< std::string > keys = as< std::vector< std::string > >(constraints.names());
  bool is_valid = true, tmp_valid, use_matrix = false;
  int n_constraints = 0;
  for (List::iterator it = constraints.begin(); it != constraints.end(); ++it){
    n_constraints += as<IntegerVector>(*it).size();
  }

  // Sparsity check: if the constraints make up a sufficiently large amount of
  // the solution space, use a matrix to check validity
  // (note: floating-point division; integer division would truncate to 0 here)
  if (n_constraints / ((double) n * n) > 0.20){ use_matrix = true; }

  // Check using adjacency matrix
  if (use_matrix){
    IntegerMatrix adj_matrix = IntegerMatrix(Dimension(n, n));
    int from, to;
    for (std::vector< std::string >::iterator it = keys.begin(); it != keys.end(); ++it){
      // Get constraints
      int cid = atoi(it->c_str()); // to base-0
      IntegerVector cs_ = constraints[*it];

      // Positive "should-link" constraints
      IntegerVector pcons = as<IntegerVector>(cs_[cs_ > 0]);
      for (IntegerVector::iterator pc = pcons.begin(); pc != pcons.end(); ++pc){
        from = (*pc < cid ? *pc : cid) - 1;
        to = (*pc > cid ? *pc : cid) - 1;
        adj_matrix(from, to) = 1;
      }

      // Negative "should-not-link" constraints
      IntegerVector ncons = -(as<IntegerVector>(cs_[cs_ < 0]));
      for (IntegerVector::iterator nc = ncons.begin(); nc != ncons.end(); ++nc){
        from = (*nc < cid ? *nc : cid) - 1;
        to = (*nc > cid ? *nc : cid) - 1;
        adj_matrix(from, to) = -1;
      }
    }

    // Check symmetry
    IntegerVector lower = lowerTri(adj_matrix);
    IntegerMatrix adj_t = Rcpp::transpose(adj_matrix);
    IntegerVector lower_t = lowerTri(adj_t);
    LogicalVector valid_check = lower == lower_t;
    is_valid = all(valid_check == TRUE).is_true();

    // Try to merge the two
    if (!is_valid){
      int sum = 0;
      for (int i = 0; i < lower.size(); ++i){
        sum = lower.at(i) + lower_t.at(i);
        lower[i] = sum > 0 ? 1 : sum < 0 ?
-1 : 0; } } constraints = distToAdjacency(lower, n); } // Else check using given adjacency list else { for (std::vector< std::string >::iterator it = keys.begin(); it != keys.end(); ++it){ // Get constraints int cid = atoi(it->c_str()); IntegerVector cs_ = constraints[*it]; // Positive "should-link" constraints IntegerVector pcons = as(cs_[cs_ > 0]); for (IntegerVector::iterator pc = pcons.begin(); pc != pcons.end(); ++pc){ int ic = *pc < 0 ? -(*pc) : *pc; std::string ic_str = std::to_string(ic); bool exists = constraints.containsElementNamed(ic_str.c_str()); tmp_valid = exists ? contains(as(constraints[ic_str]), cid) : false; if (!tmp_valid){ if (!exists){ constraints[ic_str] = IntegerVector::create(cid); } else { IntegerVector con_vec = constraints[ic_str]; con_vec.push_back(cid); constraints[ic_str] = con_vec; } is_valid = false; } } // Negative "should-not-link" constraints IntegerVector ncons = -(as(cs_[cs_ < 0])); for (IntegerVector::iterator nc = ncons.begin(); nc != ncons.end(); ++nc){ int ic = *nc < 0 ? -(*nc) : *nc; std::string ic_str = std::to_string(ic); bool exists = constraints.containsElementNamed(ic_str.c_str()); tmp_valid = exists ? contains(as(constraints[ic_str]), cid) : false; if (!tmp_valid){ if (!exists){ constraints[ic_str] = IntegerVector::create(-cid); } else { IntegerVector con_vec = constraints[ic_str]; con_vec.push_back(-cid); constraints[ic_str] = con_vec; } is_valid = false; } } } } // Produce warning if asymmetric constraints detected; return attempt at fixing constraints. if (!is_valid){ warning("Incomplete (asymmetric) constraints detected. 
Populating constraint list.");
  }
  return(constraints);
}

// [[Rcpp::export]]
double computeVirtualNode(IntegerVector noise, List constraints){
  if (noise.length() == 0) return(0);
  if (Rf_isNull(constraints)) return(0);

  // Semi-supervised extraction
  int satisfied_constraints = 0;
  // Rcout << "Starting constraint based optimization" << std::endl;
  for (IntegerVector::iterator it = noise.begin(); it != noise.end(); ++it){
    std::string cs_str = std::to_string(*it);
    if (constraints.containsElementNamed(cs_str.c_str())){
      // Get constraints
      IntegerVector cs_ = constraints[cs_str];

      // Positive "should-link" constraints
      IntegerVector pcons = as<IntegerVector>(cs_[cs_ > 0]);
      for (IntegerVector::iterator pc = pcons.begin(); pc != pcons.end(); ++pc){
        satisfied_constraints += contains(noise, *pc);
      }

      // Negative "should-not-link" constraints
      IntegerVector ncons = -(as<IntegerVector>(cs_[cs_ < 0]));
      for (IntegerVector::iterator nc = ncons.begin(); nc != ncons.end(); ++nc){
        satisfied_constraints += (1 - contains(noise, *nc));
      }
    }
  }
  return(satisfied_constraints);
}

// Framework for Optimal Selection of Clusters (FOSC)
// Traverses a cluster tree hierarchy to compute a flat solution, maximizing the:
// - Unsupervised soln: the 'most stable' clusters following the given linkage criterion
// - SS soln w/ instance level Constraints: constraint-based w/ unsupervised tiebreaker
// - SS soln w/ mixed objective function: maximizes J = α JU + (1 − α) JSS
// [[Rcpp::export]]
NumericVector fosc(List cl_tree, std::string cid, std::list<int>& sc, List cl_hierarchy,
                   bool prune_unstable_leaves=false,        // whether to prune -very- unstable subbranches
                   double cluster_selection_epsilon = 0.0,  // whether to prune subbranches below a given epsilon
                   const double alpha = 0,                  // mixed objective case
                   bool useVirtual = false,                 // return virtual node as well
                   const int n_constraints = 0,             // number of constraints
                   List constraints = R_NilValue)           // instance-level constraints
{
  // Base case: at a leaf
  if (!cl_hierarchy.containsElementNamed(cid.c_str())){
List cl = cl_tree[cid]; sc.push_back(std::atoi(cid.c_str())); // assume the leaf will be a salient cluster until proven otherwise return(NumericVector::create((double) cl["stability"], (double) useVirtual ? cl["vscore"] : 0)); } else { // Non-base case: at a merge of clusters, determine which to keep List cl = cl_tree[cid]; // Get child stability/constraint scores NumericVector scores, stability_scores = NumericVector(), constraint_scores = NumericVector(); IntegerVector child_ids = cl_hierarchy[cid]; for (int i = 0, clen = child_ids.length(); i < clen; ++i){ int child_id = child_ids.at(i); scores = fosc(cl_tree, std::to_string(child_id), sc, cl_hierarchy, prune_unstable_leaves, cluster_selection_epsilon, alpha, useVirtual, n_constraints, constraints); stability_scores.push_back(scores.at(0)); constraint_scores.push_back(scores.at(1)); } // If semisupervised scenario, normalizing should be stored in 'total_stability' double total_stability = (contains(cl_tree.attributeNames(),"total_stability") ? 
(double) cl_tree.attr("total_stability") : 1.0); // Compare and update stability scores double old_stability_score = (double) cl["stability"] / total_stability; double new_stability_score = (double) sum(stability_scores) / total_stability; // Compute instance-level constraints if necessary double old_constraint_score = 0, new_constraint_score = 0; if (useVirtual){ // Rcout << "old constraint score for " << cid << ": " << (double) cl["vscore"] << std::endl; old_constraint_score = (double) cl["vscore"]; new_constraint_score = (double) sum(constraint_scores) + (double) computeVirtualNode(cl["contains"], constraints)/n_constraints; } bool keep_children = true; // If the score is unchanged, remove the children and add parent if (useVirtual){ if (old_constraint_score < new_constraint_score && cid != "0"){ // Children satisfies more constraints cl["vscore"] = new_constraint_score; cl["score"] = alpha * new_stability_score + (1 - alpha) * new_constraint_score; // Rcout << "1: score for " << cid << ":" << (double) cl["score"] << std::endl; // Rcout << "(old constraint): " << old_constraint_score << ", (new constraint): " << new_constraint_score << std::endl; } else if (old_constraint_score > new_constraint_score && cid != "0"){ // Parent satisfies more constraints cl["vscore"] = old_constraint_score; cl["score"] = alpha * old_stability_score + (1 - alpha) * old_constraint_score; // Rcout << "2: score for " << cid << ":" << (double) cl["score"] << std::endl; keep_children = false; } else { // Resolve tie using unsupervised, stability-based approach if (old_stability_score < new_stability_score){ // Children are more stable cl["score"] = new_stability_score / total_stability; // Rcout << "3: score for " << cid << ":" << (double) cl["score"] << std::endl; } else { // Parent is more stable cl["score"] = old_stability_score / total_stability; // Rcout << "4: score for " << cid << ":" << (double) cl["score"] << std::endl; // Rcout << "(old stability): " << old_stability_score << 
", (total stability): " << total_stability << std::endl; keep_children = false; } cl["vscore"] = old_constraint_score; } } else { // Use unsupervised, stability-based approach only if (old_stability_score < new_stability_score){ cl["score"] = new_stability_score; // keep children } else { cl["score"] = old_stability_score; keep_children = false; } } double epsdeath = (double) cl["eps_death"]; if (epsdeath < cluster_selection_epsilon){ keep_children = false; // prune children that emerge at distance below epsilon } // Prune children and add parent (cid) if need be if (!keep_children && cid != "0") { IntegerVector children = all_children(cl_hierarchy, std::atoi(cid.c_str())); // use all_children to prune subtrees for (int i = 0, clen = children.length(); i < clen; ++i){ sc.remove(children.at(i)); // use list for slightly better random deletion performance } sc.push_back(std::atoi(cid.c_str())); } else if (keep_children && prune_unstable_leaves){ // If flag passed, prunes leaves with insignificant stability scores // this can happen in cases where one leaf has a stability score significantly greater // than both its siblings and its parent (or other ancestors), causing sibling branches // to be considered as clusters even though they may nto be significantly more stable than their parent if (all(stability_scores < old_stability_score).is_false()){ for (int i = 0, clen = child_ids.length(); i < clen; ++i){ if (stability_scores.at(i) < old_stability_score){ IntegerVector to_prune = all_children(cl_hierarchy, child_ids.at(i)); // all sub members for (IntegerVector::iterator it = to_prune.begin(); it != to_prune.end(); ++it){ //Rcout << "Pruning: " << *it << std::endl; sc.remove(*it); } } } } } // Save scores for traversal up and for later cl_tree[cid] = cl; // Return this sub trees score return(NumericVector::create((double) cl["score"], useVirtual ? 
(double) cl["vscore"] : 0)); } } // Given a cluster tree object with computed stability precomputed scores from computeStability, // extract the 'most stable' or salient flat cluster assignments. The large number of derivable // arguments due to fosc being a recursive function // [[Rcpp::export]] List extractUnsupervised(List cl_tree, bool prune_unstable = false, double cluster_selection_epsilon = 0.0){ // Compute Salient Clusters std::list sc = std::list(); List cl_hierarchy = cl_tree.attr("cl_hierarchy"); int n = as(cl_tree.attr("n")); fosc(cl_tree, "0", sc, cl_hierarchy, prune_unstable, cluster_selection_epsilon); // Assume root node is always id == 0 // Store results as attributes cl_tree.attr("cluster") = getSalientAssignments(cl_tree, cl_hierarchy, sc, n); // Flat assignments cl_tree.attr("salient_clusters") = wrap(sc); // salient clusters return(cl_tree); } // [[Rcpp::export]] List extractSemiSupervised(List cl_tree, List constraints, float alpha = 0, bool prune_unstable_leaves = false, double cluster_selection_epsilon = 0.0){ // Rcout << "Starting semisupervised extraction..." 
<< std::endl; List root = cl_tree["0"]; List cl_hierarchy = cl_tree.attr("cl_hierarchy"); int n = as(cl_tree.attr("n")); // Compute total number of constraints int n_constraints = 0; for (int i = 0, n = constraints.length(); i < n; ++i){ IntegerVector cl_constraints = constraints.at(i); n_constraints += cl_constraints.length(); } // Initialize root List cl = cl_tree["0"]; cl["vscore"] = 0; cl_tree["0"] = cl; // replace to keep changes // Compute initial gamma values or "virtual nodes" for both leaf and internal nodes IntegerVector cl_ids = all_children(cl_hierarchy, 0); for (IntegerVector::iterator it = cl_ids.begin(); it != cl_ids.end(); ++it){ if (*it != 0){ std::string cid_str = std::to_string(*it); List cl = cl_tree[cid_str]; // Store the initial fraction of constraints satisfied for each node as 'vscore' // NOTE: leaf scores represent \hat{gamma}, internal represent virtual node scores if (cl_hierarchy.containsElementNamed(cid_str.c_str())){ // Extract the point indices the cluster contains IntegerVector child_cl = all_children(cl_hierarchy, *it), child_ids; List cl_container = List(); for (IntegerVector::iterator ch_id = child_cl.begin(); ch_id != child_cl.end(); ++ch_id){ List ch_cl = cl_tree[std::to_string(*ch_id)]; //child_ids = combine(child_ids, ch_cl["contains"]); cl_container.push_back(as(ch_cl["contains"])); } cl_container.push_back(as(cl["contains"])); child_ids = concat_int(cl_container); cl["vscore"] = computeVirtualNode(child_ids, constraints)/n_constraints; } else { // is leaf node cl["vscore"] = computeVirtualNode(cl["contains"], constraints)/n_constraints; } cl_tree[cid_str] = cl; // replace to keep changes } } // First pass: compute unsupervised soln as a means of extracting normalizing constant J_U^* cl_tree = extractUnsupervised(cl_tree, false, cluster_selection_epsilon); IntegerVector stable_sc = cl_tree.attr("salient_clusters"); double total_stability = 0.0f; for (IntegerVector::iterator it = stable_sc.begin(); it != stable_sc.end(); 
++it){ List cl = cl_tree[std::to_string(*it)]; total_stability += (double) cl["stability"]; } cl_tree.attr("total_stability") = total_stability; // Rcout << "Total stability: " << total_stability << std::endl; // Compute stable clusters w/ instance-level constraints std::list sc = std::list(); fosc(cl_tree, "0", sc, cl_hierarchy, prune_unstable_leaves, cluster_selection_epsilon, alpha, true, n_constraints, constraints); // semi-supervised parameters // Store results as attributes and return cl_tree.attr("salient_clusters") = wrap(sc); cl_tree.attr("cluster") = getSalientAssignments(cl_tree, cl_hierarchy, sc, n); return(cl_tree); } ================================================ FILE: src/kNN.cpp ================================================ //---------------------------------------------------------------------- // Find the k Nearest Neighbors // File: R_kNNdist.cpp //---------------------------------------------------------------------- // Copyright (c) 2015 Michael Hahsler. All Rights Reserved. // // This software is provided under the provisions of the // GNU General Public License (GPL) Version 3 // (see: http://www.gnu.org/licenses/gpl-3.0.en.html) // Note: does not return self-matches! #include "kNN.h" // returns knn + dist List kNN_int(NumericMatrix data, int k, int type, int bucketSize, int splitRule, double approx) { // copy data int nrow = data.nrow(); int ncol = data.ncol(); ANNpointArray dataPts = annAllocPts(nrow, ncol); for(int i = 0; i < nrow; i++){ for(int j = 0; j < ncol; j++){ (dataPts[i])[j] = data(i, j); } } //Rprintf("Points copied.\n"); // create kd-tree (1) or linear search structure (2) ANNpointSet* kdTree = NULL; if (type==1){ kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize, (ANNsplitRule) splitRule); } else{ kdTree = new ANNbruteForce(dataPts, nrow, ncol); } //Rprintf("kd-tree ready. starting DBSCAN.\n"); NumericMatrix d(nrow, k); IntegerMatrix id(nrow, k); // Note: the search also returns the point itself (as the first hit)! 
// So we have to look for k+1 points.
  ANNdistArray dists = new ANNdist[k+1];
  ANNidxArray nnIdx = new ANNidx[k+1];

  for (int i = 0; i < nrow; i++){
    ANNpoint queryPt = dataPts[i];
    kdTree->annkSearch(queryPt, k+1, nnIdx, dists, approx);

    // remove self match
    IntegerVector ids = IntegerVector(nnIdx, nnIdx+k+1);
    LogicalVector take = ids != i;
    ids = ids[take];
    id(i, _) = ids + 1;

    NumericVector ndists = NumericVector(dists, dists+k+1)[take];
    d(i, _) = sqrt(ndists);
  }

  // cleanup
  delete kdTree;
  delete [] dists;
  delete [] nnIdx;
  annDeallocPts(dataPts);
  // annClose(); is now done globally in the package

  // prepare results
  List ret;
  ret["dist"] = d;
  ret["id"] = id;
  ret["k"] = k;
  ret["sort"] = true;
  return ret;
}

// returns knn + dist using data and query
// [[Rcpp::export]]
List kNN_query_int(NumericMatrix data, NumericMatrix query, int k, int type, int bucketSize, int splitRule, double approx) {
  // FIXME: check ncol for data and query

  // copy data
  int nrow = data.nrow();
  int ncol = data.ncol();
  ANNpointArray dataPts = annAllocPts(nrow, ncol);
  for(int i = 0; i < nrow; i++){
    for(int j = 0; j < ncol; j++){
      (dataPts[i])[j] = data(i, j);
    }
  }

  // copy query
  int nrow_q = query.nrow();
  int ncol_q = query.ncol();
  ANNpointArray queryPts = annAllocPts(nrow_q, ncol_q);
  for(int i = 0; i < nrow_q; i++){
    for(int j = 0; j < ncol_q; j++){
      (queryPts[i])[j] = query(i, j);
    }
  }
  //Rprintf("Points copied.\n");

  // create kd-tree (1) or linear search structure (2)
  ANNpointSet* kdTree = NULL;
  if (type==1){
    kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize, (ANNsplitRule) splitRule);
  } else{
    kdTree = new ANNbruteForce(dataPts, nrow, ncol);
  }
  //Rprintf("kd-tree ready.
starting DBSCAN.\n");

  NumericMatrix d(nrow_q, k);
  IntegerMatrix id(nrow_q, k);

  // Note: does not return itself with query
  ANNdistArray dists = new ANNdist[k];
  ANNidxArray nnIdx = new ANNidx[k];

  for (int i = 0; i < nrow_q; i++){
    ANNpoint queryPt = queryPts[i];
    kdTree->annkSearch(queryPt, k, nnIdx, dists, approx);

    IntegerVector ids = IntegerVector(nnIdx, nnIdx+k);
    id(i, _) = ids + 1;

    NumericVector ndists = NumericVector(dists, dists+k);
    d(i, _) = sqrt(ndists);
  }

  // cleanup
  delete kdTree;
  delete [] dists;
  delete [] nnIdx;
  annDeallocPts(dataPts);
  annDeallocPts(queryPts);
  // annClose(); is now done globally in the package

  // prepare results (ANN returns points sorted by distance)
  List ret;
  ret["dist"] = d;
  ret["id"] = id;
  ret["k"] = k;
  ret["sort"] = true;
  return ret;
}

================================================
FILE: src/kNN.h
================================================
#ifndef KNN_H
#define KNN_H

#include <Rcpp.h>
#include "ANN/ANN.h"

using namespace Rcpp;

// returns knn + dist
// [[Rcpp::export]]
List kNN_int(NumericMatrix data, int k, int type, int bucketSize, int splitRule, double approx);
#endif

================================================
FILE: src/lof.cpp
================================================
//----------------------------------------------------------------------
// Find the Neighbourhood for LOF
// File: R_lof.cpp
//----------------------------------------------------------------------
// Copyright (c) 2021 Michael Hahsler. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

// LOF needs to find the k-NN distance and then how many points are within this
// neighborhood.

#include <Rcpp.h>
#include "regionQuery.h"

using namespace Rcpp;

// returns knn-dist and the neighborhood size as a matrix
// [[Rcpp::export]]
List lof_kNN(NumericMatrix data, int minPts, int type, int bucketSize, int splitRule, double approx) {
  // minPts includes the point itself; k does not!
int k = minPts - 1;

  // copy data
  int nrow = data.nrow();
  int ncol = data.ncol();
  ANNpointArray dataPts = annAllocPts(nrow, ncol);
  for(int i = 0; i < nrow; i++){
    for(int j = 0; j < ncol; j++){
      (dataPts[i])[j] = data(i, j);
    }
  }
  //Rprintf("Points copied.\n");

  // create kd-tree (1) or linear search structure (2)
  ANNpointSet* kdTree = NULL;
  if (type==1){
    kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize, (ANNsplitRule) splitRule);
  } else{
    kdTree = new ANNbruteForce(dataPts, nrow, ncol);
  }
  //Rprintf("kd-tree ready. starting DBSCAN.\n");

  // Note: the search also returns the point itself (as the first hit)!
  // So we have to look for k+1 points.
  ANNdistArray dists = new ANNdist[k+1];
  ANNidxArray nnIdx = new ANNidx[k+1];
  nn N;

  // results
  List id(nrow);
  List dist(nrow);
  NumericVector k_dist(nrow);

  for (int i = 0; i < nrow; i++){
    ANNpoint queryPt = dataPts[i];
    kdTree->annkSearch(queryPt, k+1, nnIdx, dists, approx);
    k_dist[i] = ANN_ROOT(dists[k]); // this is a squared distance!

    // find k-NN neighborhood which can be larger than k with tied distances
    // This works under Linux and Windows, but not under Solaris: The points at the
    // k_distance may not be included.
    //nn N = regionQueryDist_point(queryPt, dataPts, kdTree, dists[k], approx);

    // Make the comparison robust.
// Compare doubles: http://c-faq.com/fp/fpequal.html double minPts_dist = dists[k] + DBL_EPSILON * dists[k]; nn N = regionQueryDist_point(queryPt, dataPts, kdTree, minPts_dist, approx); IntegerVector ids = IntegerVector(N.first.begin(), N.first.end()); NumericVector dists = NumericVector(N.second.begin(), N.second.end()); // remove self matches -- not an issue with query points LogicalVector take = ids != i; ids = ids[take]; dists = dists[take]; id[i] = ids+1; dist[i] = sqrt(dists); } // cleanup delete kdTree; delete [] dists; delete [] nnIdx; annDeallocPts(dataPts); // annClose(); is now done globally in the package // all k_dists are squared //k_dist = sqrt(k_dist); // prepare results List ret; ret["k_dist"] = k_dist; ret["ids"] = id; ret["dist"] = dist; return ret; } ================================================ FILE: src/lt.h ================================================ #ifndef LT #define LT /* LT_POS to access a lower triangle matrix by C. Buchta * modified by M. Hahsler * n ... number of rows/columns * i,j ... column and row index (starts with 1) * * LT_POS1 ... 1-based indexing * LT_POS0 ... 0-based indexing */ /* for long vectors, n, i, j need to be R_xlen_t */ #define LT_POS1(n, i, j) \ (i)==(j) ? 0 : (i)<(j) ? (n) * ((i) - 1) - (i)*((i)-1)/2 + (j)-(i) -1 \ : (n)*((j)-1) - (j)*((j)-1)/2 + (i)-(j) -1 #define LT_POS0(n, i, j) \ (i)==(j) ? 0 : (i)<(j) ? (n) * (i) - ((i) + 1)*(i)/2 + (j)-(i) -1 \ : (n)*(j) - ((j) + 1)*(j)/2 + (i)-(j) -1 /* M_POS to access matrix column-major order by i and j index (starts with 1) * n is the number of rows */ #define M_POS(n, i, j) ((i)+(n)*(j)) /* * MIN/MAX */ #define MIN(X,Y) ((X) < (Y) ? (X) : (Y)) #define MAX(X,Y) ((X) > (Y) ? 
(X) : (Y)) #endif ================================================ FILE: src/mrd.cpp ================================================ //---------------------------------------------------------------------- // R interface to dbscan using the ANN library //---------------------------------------------------------------------- // Copyright (c) 2015 Michael Hahsler, Matt Piekenbrock. All Rights Reserved. // // This software is provided under the provisions of the // GNU General Public License (GPL) Version 3 // (see: http://www.gnu.org/licenses/gpl-3.0.en.html) #include using namespace Rcpp; // Computes the mutual reachability distance defined for HDBSCAN // // The mutual reachability distance is a summary at what level two points together // will connect. The mutual reachability distance is defined as: // mrd(a, b) = max[core_distance(a), core_distance(b), distance(a, b)] // // Input: // * dm: distances as a dist object (vector) of size (n*(n-1))/2 where n // is the number of points. // Note: we divide by 2 early to stay within the number range of int. 
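The mrd() formula and the condensed lower-triangle layout addressed by LT_POS0 above can be sketched in plain C++ (a minimal standalone version without Rcpp; `lt_pos0` and `mutual_reachability` are hypothetical names used here for illustration only):

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// 0-based index into a condensed lower-triangle distance vector
// (same layout as R's dist objects and the LT_POS0 macro in lt.h).
inline std::size_t lt_pos0(std::size_t n, std::size_t i, std::size_t j) {
    if (i > j) std::swap(i, j);           // the layout is symmetric
    return n * i - (i + 1) * i / 2 + j - i - 1;
}

// mrd(a, b) = max(core_distance(a), core_distance(b), distance(a, b)),
// computed over the whole condensed distance vector dm, with cd holding
// the core distance of each of the n points.
std::vector<double> mutual_reachability(const std::vector<double>& dm,
                                        const std::vector<double>& cd) {
    std::size_t n = cd.size();
    std::vector<double> res(dm.size());
    std::size_t idx = 0;                  // walks dm in pair order (i, j), i < j
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j, ++idx)
            res[idx] = std::max(dm[idx], std::max(cd[i], cd[j]));
    return res;
}
```

The double loop visits pairs in the same order the dist vector stores them, so no explicit index macro is needed inside mutual_reachability; lt_pos0 is shown for random access, as used by the MST code below.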
//  * cd: the core distances as a vector of length n
//
// Returns:
//  a vector (dist object) in the same order as dm
// [[Rcpp::export]]
NumericVector mrd(NumericVector dm, NumericVector cd) {
  R_xlen_t n = cd.length();
  if (dm.length() != (n * (n-1) / 2))
    stop("number of mutual reachability distance values and size of the distance matrix do not agree.");

  NumericVector res = NumericVector(dm.length());
  for (R_xlen_t i = 0, idx = 0; i < n; ++i) {
    // Rprintf("i = %ill of %ill, idx = %ill\n", i, n, idx);
    for (R_xlen_t j = i+1; j < n; ++j, ++idx) {
      res[idx] = std::max(dm[idx], std::max(cd[i], cd[j]));
    }
  }

  return res;
}

================================================ FILE: src/mst.cpp ================================================

//----------------------------------------------------------------------
// R interface to dbscan using the ANN library
//----------------------------------------------------------------------
// Copyright (c) 2015 Michael Hahsler, Matt Piekenbrock. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#include "mst.h"

// coreFromDist indexes through a dist vector to retrieve the core distance;
// this might be useful in some situations. For example, you can get the core distance
// from only a dist object, without needing the original data. In experimentation, the
// kNNdist approach ended up being faster than this.
//
// // [[Rcpp::export]]
// NumericVector coreFromDist(const NumericVector dist, const int n, const int minPts){
//   NumericVector core_dist = NumericVector(n);
//   NumericVector row_dist = NumericVector(n - 1);
//   for (R_xlen_t i = 0; i < n; ++i){
//     for (R_xlen_t j = 0; j < n; ++j){
//       if (i == j) continue;
//       R_xlen_t index = LT_POS0(n, j, i);
//       row_dist.at(j > i ? j - 1 : j) = dist.at(index);
//     }
//     std::sort(row_dist.begin(), row_dist.end());
//     core_dist[i] = row_dist.at(minPts-2); // one for 0-based indexes, one for inclusive minPts condition
//   }
//   return(core_dist);
// }

// Prim's Algorithm
// this implementation for dense dist objects avoids the use of a min-heap.
// [[Rcpp::export]]
Rcpp::NumericMatrix mst(const NumericVector x_dist, const R_xlen_t n) {
  Rcpp::NumericMatrix mst = NumericMatrix(n - 1, 3);
  colnames(mst) = CharacterVector::create("from", "to", "weight");

  // vectors to store the parent, the best edge weight, and the visited flag of each vertex
  std::vector<int> parent(n);
  std::vector<double> weight(n, INFINITY);
  std::vector<bool> visited(n, false);

  // first node is always the root of the MST.
  parent[0] = -1;
  weight[0] = 0;

  int next_node = 0;
  double next_weight;
  int node;
  while (next_node >= 0) {
    node = next_node;
    next_node = -1;
    next_weight = INFINITY;

    visited[node] = true;
    if (node > 0) { // the root has no incoming edge to record
      mst(node-1, 1) = parent[node] + 1;
      mst(node-1, 0) = node + 1;
      mst(node-1, 2) = weight[node];
    }

    for (int i = 1; i < n; i++) { // 0 is always the first node
      if (visited[i] || node == i) continue;

      double the_weight = x_dist[LT_POS0(n, node, i)];
      if (the_weight < weight[i]) {
        weight[i] = the_weight;
        parent[i] = node;
      }

      // find minimum weight node
      if (weight[i] < next_weight) {
        next_weight = weight[i];
        next_node = i;
      }
    }
  }
  return(mst);
}

// // [[Rcpp::export]]
// IntegerVector order_(NumericVector x) {
//   if (is_true(any(duplicated(x)))) {
//     Rf_warning("There are duplicates in 'x'; order not guaranteed to match that of R's base::order");
//   }
//   NumericVector sorted = clone(x).sort();
//   return match(sorted, x);
// }

// Single link hierarchical clustering
// used by GLOSH.R and hdbscan.R
void visit(const IntegerMatrix& merge, IntegerVector& order, int i, int j, int& ind) {
  // base case
  if (merge(i, j) < 0) {
    order.at(ind++) = -merge(i, j);
  } else {
    visit(merge, order, merge(i, j) - 1, 0, ind);
    visit(merge, order, merge(i, j) - 1, 1, ind);
  }
}

IntegerVector extractOrder(IntegerMatrix merge){
  IntegerVector order = IntegerVector(merge.nrow()+1);
  int ind = 0;
  visit(merge, order, merge.nrow() - 1, 0, ind);
  visit(merge, order, merge.nrow() - 1, 1, ind);
  return(order);
}

// [[Rcpp::export]]
List hclustMergeOrder(NumericMatrix mst, IntegerVector o){
  int npoints = mst.nrow() + 1;
  NumericVector dist = mst(_, 2);

  // Extract order, reorder indices
  NumericVector left = mst(_, 0), right = mst(_, 1);
  IntegerVector left_int = as<IntegerVector>(left[o-1]),
    right_int = as<IntegerVector>(right[o-1]);

  // Labels and resulting merge matrix
  IntegerVector labs = -seq_len(npoints);
  IntegerMatrix merge = IntegerMatrix(npoints - 1, 2);

  // Replace singletons as negative and record merge of non-singletons as positive
  for (int i = 0; i < npoints - 1; ++i) {
    int lab_left = labs.at(left_int.at(i)-1),
      lab_right = labs.at(right_int.at(i)-1);
    merge(i, _) = IntegerVector::create(lab_left, lab_right);
    for (int c = 0; c < npoints; ++c){
      if (labs.at(c) == lab_left || labs.at(c) == lab_right){
        labs.at(c) = i+1;
      }
    }
  }

  //IntegerVector int_labels = seq_len(npoints);
  List res = List::create(
    _["merge"] = merge,
    _["height"] = dist[o-1],
    _["order"] = extractOrder(merge),
    _["labels"] = R_NilValue, //as<CharacterVector>(int_labels)
    _["method"] = "robust single",
    _["dist.method"] = "mutual reachability"
  );
  res.attr("class") = "hclust";
  return res;
}

================================================ FILE: src/mst.h ================================================

#ifndef MST_H
#define MST_H

#include <Rcpp.h>
#include "lt.h"
using namespace Rcpp;

// Functions to compute the MST and build an hclust object out of the resulting tree
NumericMatrix mst(const NumericVector x_dist, const R_xlen_t n);
List hclustMergeOrder(NumericMatrix mst, IntegerVector o);

#endif

================================================ FILE: src/optics.cpp ================================================

//----------------------------------------------------------------------
// OPTICS
// File: R_optics.cpp
//----------------------------------------------------------------------
// Copyright (c) 2015 Michael
Hahsler, Matt Piekenbrock. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#include <Rcpp.h>
#include "ANN/ANN.h"
#include "regionQuery.h"

using namespace Rcpp;

void update(
    std::pair< std::vector<int>, std::vector<double> > &N,
    int p,
    std::vector<int> &seeds,
    int minPts,
    std::vector<bool> &visited,
    std::vector<int> &orderedPoints,
    std::vector<double> &reachdist,
    std::vector<double> &coredist,
    std::vector<int> &pre) {

  std::vector<int>::iterator pos_seeds;
  double newreachdist;
  int o;
  double o_d;

  while(!N.first.empty()) {
    o = N.first.back();
    o_d = N.second.back();
    N.first.pop_back();
    N.second.pop_back();

    if(visited[o]) continue;

    newreachdist = std::max(coredist[p], o_d);

    if(reachdist[o] == INFINITY) {
      reachdist[o] = newreachdist;
      seeds.push_back(o);
    } else {
      // o was not visited and has a reachability distance, so it must
      // already be in seeds!
      if(newreachdist < reachdist[o]) {
        reachdist[o] = newreachdist;
        pre[o] = p;
      }
    }
  }
}

// [[Rcpp::export]]
List optics_int(NumericMatrix data, double eps, int minPts,
    int type, int bucketSize, int splitRule, double approx, List frNN) {

  // kd-tree uses squared distances
  double eps2 = eps*eps;

  ANNpointSet* kdTree = NULL;
  ANNpointArray dataPts = NULL;
  int nrow = NA_INTEGER;
  int ncol = NA_INTEGER;

  if(frNN.size()) {
    // no kd-tree
    nrow = (as<List>(frNN["id"])).size();
  }else{
    // copy data for kd-tree
    nrow = data.nrow();
    ncol = data.ncol();
    dataPts = annAllocPts(nrow, ncol);
    for (int i = 0; i < nrow; i++){
      for (int j = 0; j < ncol; j++){
        (dataPts[i])[j] = data(i, j);
      }
    }
    //Rprintf("Points copied.\n");

    // create kd-tree (1) or linear search structure (2)
    if (type==1) kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize,
      (ANNsplitRule) splitRule);
    else kdTree = new ANNbruteForce(dataPts, nrow, ncol);
    //Rprintf("kd-tree ready. starting OPTICS.\n");
  }

  // OPTICS
  std::vector<bool> visited(nrow, false);
  std::vector<int> orderedPoints; orderedPoints.reserve(nrow);
  std::vector<int> pre(nrow, NA_INTEGER);
  std::vector<double> reachdist(nrow, INFINITY); // we use Inf as undefined
  std::vector<double> coredist(nrow, INFINITY);
  nn N;
  std::vector<int> seeds;
  std::vector<double> ds;

  for (int p=0; p<nrow; p++) {
    if (visited[p]) continue;

    // retrieve the neighborhood of p
    if(frNN.size())
      N = std::make_pair(
        as<std::vector<int> >(as<List>(frNN["id"])[p]),
        as<std::vector<double> >(as<List>(frNN["dist"])[p]));
    else
      N = regionQueryDist(p, dataPts, kdTree, eps2, approx);

    visited[p] = true;

    // find core distance
    if(N.second.size() >= (size_t) minPts) {
      ds = N.second;
      std::sort(ds.begin(), ds.end()); // sort increasing
      coredist[p] = ds[minPts-1];
    }

    int tmp_p = NA_INTEGER;
    if (pre[p] == NA_INTEGER) { tmp_p = p; }

    orderedPoints.push_back(p);

    if (coredist[p] == INFINITY) continue; // core-dist is undefined

    // an updateable priority queue does not exist in the C++ STL so we use a vector!
    //seeds.clear();

    // update
    update(N, p, seeds, minPts, visited, orderedPoints, reachdist, coredist, pre);

    int q;
    while (!seeds.empty()) {
      // get the smallest dist (to emulate a priority queue). All seeds should
      // already have a reachability distance.
      std::vector<int>::iterator q_it = seeds.begin();
      for (std::vector<int>::iterator it = seeds.begin(); it!=seeds.end(); ++it) {
        // Note: The second part of the if statement ensures that ties are
        // always broken consistently (higher ID wins to produce the same
        // results as the ELKI implementation)!
        if (reachdist[*it] < reachdist[*q_it] ||
            (reachdist[*it] == reachdist[*q_it] && *q_it < *it)) q_it = it;
      }
      q = *q_it;
      seeds.erase(q_it);

      //N2 = regionQueryDist(q, dataPts, kdTree, eps2, approx);
      if(frNN.size())
        N = std::make_pair(
          as<std::vector<int> >(as<List>(frNN["id"])[q]),
          as<std::vector<double> >(as<List>(frNN["dist"])[q]));
      else
        N = regionQueryDist(q, dataPts, kdTree, eps2, approx);

      visited[q] = true;

      // update core distance
      if(N.second.size() >= (size_t) minPts) {
        ds = N.second;
        std::sort(ds.begin(), ds.end());
        coredist[q] = ds[minPts - 1];
      }

      if (pre[q] == NA_INTEGER) { pre[q] = tmp_p; }

      orderedPoints.push_back(q);

      if(N.first.size() < (size_t) minPts) continue; // == q has no core dist.

      // update seeds
      update(N, q, seeds, minPts, visited, orderedPoints, reachdist, coredist, pre);
    }
  }

  // cleanup
  if (kdTree != NULL) delete kdTree;
  if (dataPts != NULL) annDeallocPts(dataPts);
  // annClose(); is now done globally in the package

  // prepare results (R index starts with 1)
  List ret;
  ret["order"] = IntegerVector(orderedPoints.begin(), orderedPoints.end()) + 1;
  ret["reachdist"] = sqrt(NumericVector(reachdist.begin(), reachdist.end()));
  ret["coredist"] = sqrt(NumericVector(coredist.begin(), coredist.end()));
  ret["predecessor"] = IntegerVector(pre.begin(), pre.end()) + 1;
  return ret;
}

================================================ FILE: src/regionQuery.cpp ================================================

//----------------------------------------------------------------------
// Region Query
// File: R_regionQuery.cpp
//----------------------------------------------------------------------
// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#include "regionQuery.h"

using namespace Rcpp;

// Note: Region query returns self-matches!
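The seed-selection scan in optics.cpp above emulates an updateable priority queue with a plain vector: it searches for the minimum reachability distance and breaks ties toward the higher point ID (to match ELKI). A standalone sketch of just that step (`next_seed` is a hypothetical helper for illustration, not part of the package):

```cpp
#include <vector>

// Pick the next seed: smallest reachability distance; on ties the higher
// point ID wins (mirrors the tie-breaking comment in optics.cpp).
// Returns the position within `seeds`, or -1 if seeds is empty.
int next_seed(const std::vector<int>& seeds,
              const std::vector<double>& reachdist) {
    int best = -1;
    for (int i = 0; i < (int) seeds.size(); ++i) {
        if (best < 0 ||
            reachdist[seeds[i]] < reachdist[seeds[best]] ||
            (reachdist[seeds[i]] == reachdist[seeds[best]] &&
             seeds[i] > seeds[best]))
            best = i;
    }
    return best;
}
```

The linear scan makes each extraction O(|seeds|); a binary heap would be faster asymptotically but cannot cheaply decrease a key, which is why the original code sticks with a vector.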
// these functions take an id for a point in the k-d tree
nn regionQueryDist(int id, ANNpointArray dataPts, ANNpointSet* kdTree,
    double eps2, double approx) {

  // find fixed radius nearest neighbors
  ANNpoint queryPt = dataPts[id];
  std::pair< std::vector<int>, std::vector<double> > ret =
    kdTree->annkFRSearch2(queryPt, eps2, approx);

  // Note: the points are not sorted by distance!
  return(ret);
}

std::vector<int> regionQuery(int id, ANNpointArray dataPts, ANNpointSet* kdTree,
    double eps2, double approx) {

  // find fixed radius nearest neighbors
  ANNpoint queryPt = dataPts[id];
  std::pair< std::vector<int>, std::vector<double> > ret =
    kdTree->annkFRSearch2(queryPt, eps2, approx);

  // Note: the points are not sorted by distance!
  return(ret.first);
}

// these functions take a query point that is not in the tree
nn regionQueryDist_point(ANNpoint queryPt, ANNpointArray dataPts,
    ANNpointSet* kdTree, double eps2, double approx) {

  // find fixed radius nearest neighbors
  std::pair< std::vector<int>, std::vector<double> > ret =
    kdTree->annkFRSearch2(queryPt, eps2, approx);

  // Note: the points are not sorted by distance!
  return(ret);
}

std::vector<int> regionQuery_point(ANNpoint queryPt, ANNpointArray dataPts,
    ANNpointSet* kdTree, double eps2, double approx) {

  // find fixed radius nearest neighbors
  std::pair< std::vector<int>, std::vector<double> > ret =
    kdTree->annkFRSearch2(queryPt, eps2, approx);

  // Note: the points are not sorted by distance!
  return(ret.first);
}

================================================ FILE: src/regionQuery.h ================================================

//----------------------------------------------------------------------
// Region Query
// File: R_regionQuery.h
//----------------------------------------------------------------------
// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#ifndef REGIONQUERY_H
#define REGIONQUERY_H

#include <Rcpp.h>
#include "ANN/ANN.h"
using namespace Rcpp;

// pair of ids and dists
typedef std::pair< std::vector<int>, std::vector<double> > nn;

// Note: Region query returns self-matches!

// these functions take an id for a point in the k-d tree
nn regionQueryDist(int id, ANNpointArray dataPts, ANNpointSet* kdTree,
    double eps2, double approx = 0.0);
std::vector<int> regionQuery(int id, ANNpointArray dataPts, ANNpointSet* kdTree,
    double eps2, double approx = 0.0);

// these functions take a query point that is not in the tree
nn regionQueryDist_point(ANNpoint queryPt, ANNpointArray dataPts,
    ANNpointSet* kdTree, double eps2, double approx = 0.0);
std::vector<int> regionQuery_point(ANNpoint queryPt, ANNpointArray dataPts,
    ANNpointSet* kdTree, double eps2, double approx = 0.0);

#endif

================================================ FILE: src/utilities.cpp ================================================

//----------------------------------------------------------------------
// R interface to dbscan using the ANN library
//----------------------------------------------------------------------
// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#include "utilities.h"

// extract the lower triangle from a matrix
IntegerVector lowerTri(IntegerMatrix m) {
  int n = m.nrow();
  IntegerVector lower_tri = IntegerVector(n * (n - 1) / 2);
  for (int i = 0, c = 0; i < n; ++i) {
    for (int j = i + 1; j < n; ++j) {
      lower_tri[c++] = m(i, j);
    }
  }
  return lower_tri;
}

NumericVector combine(const NumericVector& t1, const NumericVector& t2) {
  std::size_t n = t1.size() + t2.size();
  NumericVector output = Rcpp::no_init(n);
  std::copy(t1.begin(), t1.end(), output.begin());
  std::copy(t2.begin(), t2.end(), output.begin() + t1.size());
  return output;
}

IntegerVector combine(const IntegerVector& t1, const IntegerVector& t2) {
  std::size_t n = t1.size() + t2.size();
  IntegerVector output = Rcpp::no_init(n);
  std::copy(t1.begin(), t1.end(), output.begin());
  std::copy(t2.begin(), t2.end(), output.begin() + t1.size());
  return output;
}

// Faster version of the above combine function, assuming you can precompute and store
// the containers that need to be concatenated
IntegerVector concat_int(List const& container) {
  int total_length = 0;
  for (List::const_iterator it = container.begin(); it != container.end(); ++it) {
    total_length += as<IntegerVector>(*it).size();
  }
  int pos = 0;
  IntegerVector output = Rcpp::no_init(total_length);
  for (List::const_iterator it = container.begin(); it != container.end(); ++it) {
    IntegerVector vec = as<IntegerVector>(*it);
    std::copy(vec.begin(), vec.end(), output.begin() + pos);
    pos += vec.size();
  }
  return output;
}

================================================ FILE: src/utilities.h ================================================

//----------------------------------------------------------------------
// R interface to dbscan using the ANN library
//----------------------------------------------------------------------
// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.
//
// This software is provided under the provisions of the
// GNU General Public License (GPL) Version 3
// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)

#ifndef UTILITIES_H
#define UTILITIES_H

#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;

// contains; used in hdbscan.cpp
template <class T, class C>
bool contains(const T& container, const C& key) {
  if (std::find(container.begin(), container.end(), key) != container.end()) {
    return true;
  }
  return false;
}

// extract the lower triangle from a matrix
// [[Rcpp::export]]
IntegerVector lowerTri(IntegerMatrix m);

// internal c (combine) for Rcpp vectors
NumericVector combine(const NumericVector& t1, const NumericVector& t2);
IntegerVector combine(const IntegerVector& t1, const IntegerVector& t2);

// Faster version of the above combine function, assuming you can precompute and store
// the containers that need to be concatenated
IntegerVector concat_int(List const& container);

#endif

================================================ FILE: tests/testthat/test-dbcv.R ================================================

test_that("dbcv", {
  # From: https://github.com/FelSiq/DBCV
  #
  # Dataset        MATLAB
  # dataset_1.txt  0.8576
  # dataset_2.txt  0.8103
  # dataset_3.txt  0.6319
  # dataset_4.txt  0.8688
  #
  # Original MATLAB implementation is at:
  # https://github.com/pajaskowiak/dbcv/tree/main/data

  data(Dataset_1)
  x <- Dataset_1[, c("x", "y")]
  class <- Dataset_1$class
  #clplot(x, class)

  (db <- dbcv(x, class, metric = "sqeuclidean"))
  expect_equal(round(db$score, 2), 0.86)

  # detailed results from the Python implementation
  #dsc [0.00457826 0.00457826 0.0183068  0.0183068 ]
  #dspc [0.85627898 0.85627898 0.85627898 0.85627898]
  #vcs [0.99465331 0.99465331 0.97862052 0.97862052]
  #0.8575741400490697

  data(Dataset_2)
  x <- Dataset_2[, c("x", "y")]
  class <- Dataset_2$class
  #clplot(x, class)

  (db <- dbcv(x, class, metric = "sqeuclidean"))
  expect_equal(round(db$score, 2), 0.81)

  #dsc [19.06151967 15.6082 83.71522964 68.969]
  #dspc [860.2538 501.4376 501.4376 860.2538]
  #vcs [0.97784198 0.9688731
0.83304956 0.91982715] #0.8103343589093096 # more data sets # data(Dataset_3) # x <- Dataset_3[, c("x", "y")] # class <- Dataset_3$class # #clplot(x, class) # (db <- dbcv(x, class, metric = "sqeuclidean")) # # data(Dataset_4) # x <- Dataset_4[, c("x", "y")] # class <- Dataset_4$class # #clplot(x, class) # (db <- dbcv(x, class, metric = "sqeuclidean")) }) ================================================ FILE: tests/testthat/test-dbscan.R ================================================ test_that("dbscan works", { data("iris") ## Species is a factor expect_error(dbscan(iris)) iris <- as.matrix(iris[, 1:4]) res <- dbscan(iris, eps = .4, minPts = 4) expect_length(res$cluster, nrow(iris)) ## expected result of table(res$cluster) is: expect_identical(table(res$cluster, dnn = NULL), as.table(c("0" = 25L, "1" = 47L, "2" = 38L, "3" = 36L, "4" = 4L))) ## compare with dbscan from package fpc (only if installed) if (requireNamespace("fpc", quietly = TRUE)) { res2 <- fpc::dbscan(iris, eps = .4, MinPts = 4) expect_equal(res$cluster, res2$cluster) ## test is.corepoint all(res2$isseed == is.corepoint(iris, eps = .4, minPts = 4)) } ## compare with precomputed frNN fr <- frNN(iris, eps = .4) res9 <- dbscan(fr, minPts = 4) expect_equal(res, res9) ## compare on example data from fpc set.seed(665544) n <- 600 x <- cbind( x = runif(10, 0, 10) + rnorm(n, sd = 0.2), y = runif(10, 0, 10) + rnorm(n, sd = 0.2) ) res <- dbscan(x, eps = .2, minPts = 4) expect_length(res$cluster, nrow(x)) ## compare with dist-based versions res_d <- dbscan(dist(x), eps = .2, minPts = 4) expect_identical(res, res_d) res_d2 <- dbscan(x, eps = .2, minPts = 4, search = "dist") expect_identical(res, res_d2) ## compare with dbscan from package fpc (only if installed) if (requireNamespace("fpc", quietly = TRUE)) { res2 <- fpc::dbscan(x, eps = .2, MinPts = 4) expect_equal(res$cluster, res2$cluster) } ## missing values, but distances are fine x_na <- x x_na[c(1, 3, 5), 1] <- NA expect_error(dbscan(x_na, eps = .2, minPts 
= 4), regexp = "NA") res_d1 <- dbscan(x_na, eps = .2, minPts = 4, search = "dist") res_d2 <- dbscan(dist(x_na), eps = .2, minPts = 4) expect_identical(res_d1, res_d2) ## introduce NAs into dist x_na[c(1,3,5), 2] <- NA expect_error(dbscan(x_na, eps = .2, minPts = 4), regexp = "NA") expect_error(dbscan(x_na, eps = .2, minPts = 4, search = "dist"), regexp = "NA") expect_error(dbscan(dist(x_na), eps = .2, minPts = 4), regexp = "NA") ## call with no rows or no columns expect_error(dbscan(matrix(0, nrow = 0, ncol = 2), eps = .2, minPts = 4)) expect_error(dbscan(matrix(0, nrow = 2, ncol = 0), eps = .2, minPts = 4)) dbscan(matrix(0, nrow = 1, ncol = 1), eps = .2, minPts = 4) }) ================================================ FILE: tests/testthat/test-fosc.R ================================================ test_that("FOSC", { data("iris") ## FOSC expects an hclust object expect_error(extractFOSC(iris)) x <- iris[, 1:4] x_sl <- hclust(dist(x), "single") ## Should return augmented hclust object and cluster assignments expect_length(extractFOSC(x_sl), 2) res <- extractFOSC(x_sl) expect_identical(res$hc$method, "single (w/ stability-based extraction)") ## Constraint-checking expect_error(extractFOSC(x_sl, constraints = c("1" = 2))) ## Matrix inputs must be nxn expect_error(extractFOSC(x_sl, constraints = matrix(c(1, 2), nrow=1))) ## Matrix or vector constraints must be in c(-1, 0, 1) expect_error(extractFOSC(x_sl, constraints = matrix(-2, nrow=nrow(x), ncol=nrow(x)))) ## Valid constraints expect_warning(extractFOSC(x_sl, constraints = matrix(1, nrow=nrow(x), ncol=nrow(x)))) expect_silent(extractFOSC(x_sl, constraints = list("1" = 2, "2" = 1))) expect_silent(extractFOSC(x_sl, constraints = ifelse(dist(x) > 2, -1, 1))) ## Constraints should be symmetric, but symmetry test is only done if specified. 
Asymmetric ## constraints throw a warning, but the function proceeds with a manual warning expect_warning(extractFOSC(x_sl, constraints = list("1" = 2), validate_constraints = TRUE)) ## Make sure that's what's returned res <- extractFOSC(x_sl) expect_type(res$cluster, "integer") expect_s3_class(res$hc, "hclust") ## Test 'Optimal' Clustering using only positive constraints set <- which(iris$Species == "setosa") ver <- which(iris$Species == "versicolor") vir <- which(iris$Species == "virginica") il_constraints <- structure(list(set[-1], ver[-1], vir[-1]), names = as.character(c(set[1], ver[1], vir[1]))) res <- extractFOSC(x_sl, il_constraints) ## Positive-only constraints should link to best unsupervised solution expect_identical(table(res$cluster, dnn = NULL), as.table(c(`1` = 50L, `2` = 100L))) expect_identical(res$hc$method, "single (w/ constraint-based extraction)") ## Test negative constraints set2 <- c(il_constraints[[as.character(set[1])]], -unlist(il_constraints[as.character(c(ver[1], vir[1]))], use.names = FALSE)) ver2 <- c(il_constraints[[as.character(ver[1])]], -unlist(il_constraints[as.character(c(set[1], vir[1]))], use.names = FALSE)) vir2 <- c(il_constraints[[as.character(vir[1])]], -unlist(il_constraints[as.character(c(set[1], ver[1]))], use.names = FALSE)) il_constraints2 <- structure(list(set2, ver2, vir2), names = as.character(c(set[1], ver[1], vir[1]))) res2 <- extractFOSC(x_sl, constraints = il_constraints2) ## Positive and Negative should produce a different solution expect_false(all(res$cluster == res2$cluster)) expect_identical(res2$hc$method, "single (w/ constraint-based extraction)") ## Test minPts parameters expect_error(extractFOSC(x_sl, constraints = il_constraints2, minPts = 1)) expect_silent(extractFOSC(x_sl, constraints = il_constraints2, minPts = 5)) ## Test alpha parameter expect_silent(extractFOSC(x_sl, constraints = il_constraints2, alpha = 0.5)) expect_error(extractFOSC(x_sl, constraints = il_constraints2, alpha = 1.5)) res3 <- extractFOSC(x_sl,
constraints = il_constraints2, alpha = 0.5) expect_identical(res3$hc$method, "single (w/ mixed-objective extraction)") ## Test unstable pruning expect_silent(extractFOSC(x_sl, constraints = il_constraints2, prune_unstable = TRUE)) }) ================================================ FILE: tests/testthat/test-frNN.R ================================================ test_that("frNN", { set.seed(665544) n <- 1000 x <- cbind( x = runif(10, 0, 10) + rnorm(n, sd = 0.2), y = runif(10, 0, 10) + rnorm(n, sd = 0.2), z = runif(10, 0, 10) + rnorm(n, sd = 0.2) ) ## no duplicates first! #x <- x[!duplicated(x),] rownames(x) <- paste0("Object_", seq_len(nrow(x))) eps <- .5 nn <- frNN(x, eps = eps, sort = TRUE) ## check dimensions expect_identical(nn$eps, eps) expect_length(nn$dist, nrow(x)) expect_length(nn$id, nrow(x)) expect_identical(lengths(nn$dist), lengths(nn$id)) ## check visually #plot(x) #points(x[nn$id[[1]],], col="red", lwd=5) #points(x[nn$id[[2]],], col="green", lwd=5) #points(x[1:2,, drop = FALSE], col="blue", pch="+", cex=2) ## compare with manually found NNs nn_d <- frNN(dist(x), eps = eps, sort = TRUE) expect_equal(nn, nn_d) nn_d2 <- frNN(x, eps = eps, sort = TRUE, search = "dist") expect_equal(nn, nn_d2) ## without sorting nn2 <- frNN(x, eps = eps, sort = FALSE) expect_identical(lapply(nn$id, sort), lapply(nn2$id, sort)) ## search options nn_linear <- frNN(x, eps=eps, search = "linear") expect_equal(nn, nn_linear) ## split options for (so in c("STD", "MIDPT", "FAIR", "SL_FAIR")) { nn3 <- frNN(x, eps=eps, splitRule = so) expect_equal(nn, nn3) } ## bucket size for (bs in c(5, 10, 15, 100)) { nn3 <- frNN(x, eps=eps, bucketSize = bs) expect_equal(nn, nn3) } ## add 100 copied points to check if self match filtering works x <- rbind(x, x[sample(seq_len(nrow(x)), 100),]) rownames(x) <- paste0("Object_", seq_len(nrow(x))) eps <- .5 nn <- frNN(x, eps = eps, sort = TRUE) ## compare with manually found NNs nn_d <- frNN(x, eps = eps, sort = TRUE, search = "dist") 
expect_equal(nn, nn_d) ## sort and frNN to reduce eps nn5 <- frNN(x, eps = .5, sort = FALSE) expect_false(nn5$sort) nn5s <- sort(nn5) expect_true(nn5s$sort) expect_true(all(vapply(nn5s$dist, function(x) !is.unsorted(x), logical(1L)))) expect_error(frNN(nn5, eps = 1)) nn2 <- frNN(nn5, eps = .2) expect_true(all(vapply(nn2$dist, function(x) all(x <= 0.2), logical(1L)))) ## test with simple data x <- data.frame(x = 1:10, row.names = LETTERS[1:10], check.names = FALSE) nn <- frNN(x, eps = 2) expect_identical(nn$id[[1]], 2:3) expect_identical(nn$id[[5]], c(4L, 6L, 3L, 7L)) expect_identical(nn$id[[10]], 9:8) ## test kNN with query x <- data.frame(x = 1:10, row.names = LETTERS[1:10], check.names = FALSE) nn <- frNN(x[1:8, , drop=FALSE], x[9:10, , drop = FALSE], eps = 2) expect_length(nn$id, 2L) expect_identical(nn$id[[1]], 8:7) expect_identical(nn$id[[2]], 8L) expect_error(frNN(dist(x[1:8, , drop=FALSE]), x[9:10, , drop = FALSE], eps = 2)) }) ================================================ FILE: tests/testthat/test-hdbscan.R ================================================ test_that("HDBSCAN", { data("iris") ## minPts not given expect_error(hdbscan(iris)) ## Expects numerical data; species is factor expect_error(dbscan(iris, minPts = 4)) iris <- as.matrix(iris[,1:4]) res <- hdbscan(iris, minPts = 4) expect_length(res$cluster, nrow(iris)) ## expected result of table(res$cluster) is: expect_identical(table(res$cluster, dnn = NULL), as.table(c("1" = 100L, "2" = 50L))) ## compare on moons data data("moons") res <- hdbscan(moons, minPts = 5) expect_length(res$cluster, nrow(moons)) ## Check hierarchy matches dbscan* at every value check <- rep(FALSE, nrow(moons)-1) core_dist <- kNNdist(moons, k=5-1) ## cutree doesn't distinguish noise as 0, so we make a new method to do it manually cut_tree <- function(hcl, eps, core_dist){ cuts <- unname(cutree(hcl, h=eps)) cuts[which(core_dist > eps)] <- 0 # Use core distance to distinguish noise cuts } eps_values <- sort(res$hc$height, 
decreasing = TRUE)+.Machine$double.eps ## Machine eps for consistency between cuts for (i in seq_along(eps_values)) { cut_cl <- cut_tree(res$hc, eps_values[i], core_dist) dbscan_cl <- dbscan(moons, eps = eps_values[i], minPts = 5, borderPoints = FALSE) # DBSCAN* doesn't include border points ## Use run length encoding as an ID-independent way to check ordering check[i] <- (all.equal(rle(cut_cl)$lengths, rle(dbscan_cl$cluster)$lengths) == "TRUE") } expect_true(all(check)) ## Expect generating extra trees doesn't fail res <- hdbscan(moons, minPts = 5, gen_hdbscan_tree = TRUE, gen_simplified_tree = TRUE) expect_s3_class(res, "hdbscan") ## Expect hdbscan tree matches stats:::as.dendrogram version of hclust object hc_dend <- as.dendrogram(res$hc) expect_s3_class(hc_dend, "dendrogram") expect_identical(hc_dend, res$hdbscan_tree) ## Expect hdbscan works with non-euclidean distances dist_moons <- dist(moons, method = "canberra") res <- hdbscan(dist_moons, minPts = 5) expect_s3_class(res, "hdbscan") }) test_that("mrdist", { expect_identical(mrdist(cbind(1:10), 2), mrdist(dist(cbind(1:10)), 2)) expect_identical(mrdist(cbind(1:11), 3), mrdist(dist(cbind(1:11)), 3)) }) test_that("HDBSCAN(e)", { X <- data.frame( x = c( 0.08, 0.46, 0.46, 2.95, 3.50, 1.49, 6.89, 6.87, 0.21, 0.15, 0.15, 0.39, 0.80, 0.80, 0.37, 3.63, 0.35, 0.30, 0.64, 0.59, 1.20, 1.22, 1.42, 0.95, 2.70, 6.36, 6.36, 6.36, 6.60, 0.04, 0.71, 0.57, 0.24, 0.24, 0.04, 0.04, 1.35, 0.82, 1.04, 0.62, 0.26, 5.98, 1.67, 1.67, 0.48, 0.15, 6.67, 6.67, 1.20, 0.21, 3.99, 0.12, 0.19, 0.15, 6.96, 0.26, 0.08, 0.30, 1.04, 1.04, 1.04, 0.62, 0.04, 0.04, 0.04, 0.82, 0.82, 1.29, 1.35, 0.46, 0.46, 0.04, 0.04, 5.98, 5.98, 6.87, 0.37, 6.47, 6.47, 6.47, 6.67, 0.30, 1.49, 3.21, 3.21, 0.75, 0.75, 0.46, 0.46, 0.46, 0.46, 3.63, 0.39, 3.65, 4.09, 4.01, 3.36, 1.43, 3.28, 5.94, 6.35, 6.87, 5.60, 5.99, 0.12, 0.00, 0.32, 0.39, 0.00, 1.63, 1.36, 5.67, 5.60, 5.79, 1.10, 2.99, 0.39, 0.18 ), y = c( 7.41, 8.01, 8.01, 5.44, 7.11, 7.13, 1.83, 1.83, 8.22, 
8.08, 8.08, 7.20, 7.83, 7.83, 8.29, 5.99, 8.32, 8.22, 7.38, 7.69, 8.22, 7.31, 8.25, 8.39, 6.34, 0.16, 0.16, 0.16, 1.66, 7.55, 7.90, 8.18, 8.32, 8.32, 7.97, 7.97, 8.15, 8.43, 7.83, 8.32, 8.29, 1.03, 7.27, 7.27, 8.08, 7.27, 0.79, 0.79, 8.22, 7.73, 6.62, 7.62, 8.39, 8.36, 1.73, 8.29, 8.04, 8.22, 7.83, 7.83, 7.83, 8.32, 8.11, 7.69, 7.55, 7.20, 7.20, 8.01, 8.15, 7.55, 7.55, 7.97, 7.97, 1.03, 1.03, 1.24, 7.20, 0.47, 0.47, 0.47, 0.79, 8.22, 7.13, 6.48, 6.48, 7.10, 7.10, 8.01, 8.01, 8.01, 8.01, 5.99, 8.04, 5.22, 5.82, 5.14, 4.81, 7.62, 5.73, 0.55, 1.31, 0.05, 0.95, 1.59, 7.99, 7.48, 8.38, 7.12, 2.01, 1.40, 0.00, 9.69, 9.47, 9.25, 2.63, 6.89, 0.56, 3.11 ) ) hdbe <- hdbscan(X, minPts = 3, cluster_selection_epsilon = 1) #plot(X, col = hdbe$cluster + 1L, main = "HDBSCAN(e)") expect_equal(ncluster(hdbe), 5L) expect_equal(nnoise(hdbe), 0L) }) ================================================ FILE: tests/testthat/test-kNN.R ================================================ test_that("kNN", { set.seed(665544) n <- 1000 x <- cbind( x = runif(10, 0, 10) + rnorm(n, sd = 0.2), y = runif(10, 0, 10) + rnorm(n, sd = 0.2), z = runif(10, 0, 10) + rnorm(n, sd = 0.2) ) ## no duplicates first! 
All distances should be unique x <- x[!duplicated(x),] rownames(x) <- paste0("Object_", seq_len(nrow(x))) k <- 5L nn <- kNN(x, k=k, sort = TRUE) ## check dimensions expect_identical(nn$k, k) expect_identical(dim(nn$dist), c(nrow(x), k)) expect_identical(dim(nn$id), c(nrow(x), k)) ## check visually #plot(x) #points(x[nn$id[1,],], col="red", lwd=5) #points(x[nn$id[2,],], col="green", lwd=5) ## compare with kNN found using distances nn_d <- kNN(dist(x), k, sort = TRUE) ## check visually #plot(x) #points(x[nn_d$id[1,],], col="red", lwd=5) #points(x[nn_d$id[2,],], col="green", lwd=5) ### will agree since we use sorting expect_equal(nn, nn_d) ## calculate dist internally nn_d2 <- kNN(x, k, search = "dist", sort = TRUE) expect_equal(nn, nn_d2) ## without sorting nn2 <- kNN(x, k=k, sort = FALSE) expect_equal(t(apply(nn$id, MARGIN = 1, sort)), t(apply(nn2$id, MARGIN = 1, sort))) ## search options nn_linear <- kNN(x, k=k, search = "linear", sort = TRUE) expect_equal(nn, nn_linear) ## split options for(so in c("STD", "MIDPT", "FAIR", "SL_FAIR")) { nn3 <- kNN(x, k=k, splitRule = so, sort = TRUE) expect_equal(nn, nn3) } ## bucket size for (bs in c(5, 10, 15, 100)) { nn3 <- kNN(x, k=k, bucketSize = bs, sort = TRUE) expect_equal(nn, nn3) } ## the order is not stable with matching distances which means that the ## k-NN are not stable. We add 100 copied points to check if self match ## filtering and sort works x <- rbind(x, x[sample(seq_len(nrow(x)), 100),]) rownames(x) <- paste0("Object_", seq_len(nrow(x))) k <- 5L nn <- kNN(x, k=k, sort = TRUE) ## compare with manually found NNs nn_d <- kNN(x, k=k, search = "dist", sort = TRUE) expect_equal(nn$dist, nn_d$dist) ## This is expected to fail: because the ids are not stable for matching distances ## expect_equal(nn$id, nn_d$id) ## FIXME: write some code to check this! 
## missing values, but distances are fine x_na <- x x_na[c(1, 3, 5), 1] <- NA expect_error(kNN(x_na, k = 3), regexp = "NA") res_d1 <- kNN(x_na, k = 3, search = "dist") res_d2 <- kNN(dist(x_na), k = 3) expect_equal(res_d1, res_d2) ## introduce NAs into dist x_na[c(1, 3, 5),] <- NA expect_error(kNN(x_na, k = 3), regexp = "NA") expect_error(kNN(x_na, k = 3, search = "dist"), regexp = "NA") expect_error(kNN(dist(x_na), k = 3), regexp = "NA") ## inf x_inf <- x x_inf[c(1, 3, 5), 2] <- Inf kNN(x_inf, k = 3) kNN(x_inf, k = 3, search = "dist") kNN(dist(x_inf), k = 3) ## sort and kNN to reduce k nn10 <- kNN(x, k = 10) #nn10 <- kNN(x, k = 10, sort = FALSE) ## knn now returns sorted lists #expect_equal(nn10$sort, FALSE) expect_error(kNN(nn10, k = 11)) nn5 <- kNN(nn10, k = 5) expect_true(nn5$sort) expect_identical(ncol(nn5$id), 5L) expect_identical(ncol(nn5$dist), 5L) ## test with simple data x <- data.frame(x = 1:10, row.names = LETTERS[1:10], check.names = FALSE) nn <- kNN(x, k = 5) expect_identical(unname(nn$id[1, ]), 2:6) expect_identical(unname(nn$id[5, ]), c(4L, 6L, 3L, 7L, 2L)) expect_identical(unname(nn$id[10, ]), 9:5) ## test kNN with query x <- data.frame(x = 1:10, row.names = LETTERS[1:10], check.names = FALSE) nn <- kNN(x[1:8, , drop=FALSE], x[9:10, , drop = FALSE], k = 5) expect_identical(nrow(nn$id), 2L) expect_identical(unname(nn$id[1, ]), 8:4) expect_identical(unname(nn$id[2, ]), 8:4) expect_error(kNN(dist(x[1:8, , drop=FALSE]), x[9:10, , drop = FALSE], k = 5)) }) ================================================ FILE: tests/testthat/test-kNNdist.R ================================================ test_that("kNNdist", { set.seed(665544) n <- 1000 x <- cbind( x = runif(10, 0, 10) + rnorm(n, sd = 0.2), y = runif(10, 0, 10) + rnorm(n, sd = 0.2), z = runif(10, 0, 10) + rnorm(n, sd = 0.2) ) d <- kNNdist(x, k = 5) expect_length(d, n) d <- kNNdist(x, k = 5, all = TRUE) expect_equal(dim(d), c(n, 5)) # does the plot work? 
#kNNdistplot(x, 5) }) ================================================ FILE: tests/testthat/test-lof.R ================================================ test_that("LOF", { set.seed(665544) n <- 600 x <- cbind( x=runif(10, 0, 5) + rnorm(n, sd=0.4), y=runif(10, 0, 5) + rnorm(n, sd=0.4) ) ### calculate LOF score system.time(lof_kd <- lof(x, minPts = 5)) expect_length(lof_kd, nrow(x)) system.time(lof_d <- lof(dist(x), minPts = 5)) #expect_equal(lof_kd, lof_d) ## compare with lofactor from DMwR (k = minPts - 1) #if(requireNamespace("DMwR", quietly = TRUE)) { # system.time(lof_DMwr <- DMwR::lofactor(x, k = 4)) # DMwR is now retired so we have the correct values here # dput(round(lof_DMwr, 7)) lof_DMwr <- c(1.0386817, 1.0725475, 1.1440822, 0.9448794, 1.1387918, 2.285202, 1.0976862, 1.071325, 0.975922, 0.9549399, 1.0918247, 0.9868736, 1.123618, 2.2802129, 0.992019, 1.046492, 1.0729966, 1.6925297, 1.0032157, 0.9691323, 1.0561082, 0.9493052, 1.0209116, 0.8897277, 1.008681, 1.0711202, 1.053845, 0.9734241, 1.1147289, 0.9351913, 1.8674401, 1.097982, 0.9782695, 1.0613472, 0.9988367, 1.4571062, 0.9927837, 0.9443716, 1.0014804, 1.0322888, 0.9264795, 0.9509729, 0.9757305, 1.0647956, 1.0184634, 1.428911, 1.0166712, 0.9692196, 1.0821285, 1.1282936, 0.9874694, 1.1079347, 0.9906487, 0.9972962, 1.0594364, 0.9160978, 1.2393862, 1.3578505, 0.930095, 1.0489962, 1.1401282, 1.1808566, 1.0380796, 2.0657157, 0.9837392, 0.9712287, 1.4754447, 1.3154291, 1.0589814, 1.0486608, 1.0986178, 1.1375705, 1.0147473, 1.7615974, 0.9724805, 0.9719851, 0.982247, 1.0591561, 1.0862436, 1.0710844, 1.11301, 0.9719126, 1.0455651, 0.9426225, 1.0934785, 1.1223749, 1.1734774, 1.0037237, 0.8844162, 0.9131705, 1.0728687, 1.0446755, 1.108353, 0.9492501, 1.1704727, 1.1914106, 0.9453222, 1.1724001, 1.1827576, 0.9617445, 1.1519398, 1.1480532, 1.0268692, 1.0580088, 1.392551, 1.2571354, 0.9703385, 1.5030845, 1.0201881, 1.0061842, 0.9919245, 1.2771078, 1.0473407, 1.263149, 0.9587146, 1.0235194, 0.988292, 0.9302287, 1.0593181, 
0.978052, 1.1026427, 1.0615622, 1.0299466, 1.2200394, 1.0720229, 1.1343499, 1.0180289, 1.4500258, 0.9886391, 0.969401, 1.4881191, 1.0775279, 1.0380796, 1.2315327, 1.0307432, 0.9615078, 1.2379828, 1.1181202, 1.1049541, 1.0786524, 0.9197587, 1.0642223, 0.8073981, 0.9251505, 0.9971381, 1.5188771, 1.0679818, 0.9943418, 3.5343815, 0.9559526, 1.2129819, 1.0067672, 1.0175442, 1.0875222, 1.0403766, 2.0998678, 0.9870077, 1.327542, 1.0081014, 0.9608997, 0.9144311, 1.0016777, 1.0465469, 1.5140562, 1.5560253, 1.1125134, 1.0310594, 1.0245521, 1.7247798, 1.0586581, 1.0720232, 1.0594747, 0.956174, 1.0540952, 1.0889792, 1.050014, 1.0216425, 0.9509729, 0.9740812, 1.3065791, 1.0004211, 1.0127932, 0.9796374, 1.0552426, 1.0302613, 0.9524017, 0.9554341, 0.9870971, 0.9857225, 0.9699046, 1.1122461, 1.031985, 1.0852427, 1.0585017, 0.9733342, 0.9610561, 0.9086219, 1.1570747, 1.069232, 0.9747538, 1.0084392, 1.1063077, 0.9573789, 1.3672764, 1.3631144, 0.966934, 1.0992401, 0.9943351, 0.9850424, 1.0019623, 1.5344698, 0.9592966, 0.9645661, 1.0076189, 1.0056102, 1.0066028, 1.0148453, 1.0096178, 1.0963682, 1.0345623, 1.0121158, 1.0816582, 1.0068326, 0.9697611, 0.9322887, 1.1414811, 1.0266256, 0.9143263, 0.9602328, 1.1100272, 1.0885216, 1.0795966, 1.1165265, 1.1712866, 1.1478981, 0.9653769, 1.0419996, 1.0245088, 1.0619264, 1.1729143, 0.9756447, 0.9935498, 2.8554242, 1.0067806, 1.1311249, 1.36881, 1.8759446, 1.2136268, 1.2112035, 0.9891436, 1.1089825, 0.9937973, 0.9730926, 1.0287588, 1.1275406, 1.5135599, 1.0322888, 1.0746697, 1.0181387, 1.2715467, 0.9196022, 1.1063077, 1.0666201, 1.121323, 1.0850662, 0.9150997, 1.428667, 0.9488952, 1.1007532, 1.2246563, 0.9933742, 1.1263888, 0.985569, 1.0275125, 1.01964, 1.0449989, 0.9767297, 0.9704362, 0.9897834, 1.0246062, 1.0947694, 1.2170169, 1.1323645, 1.2366689, 0.9516316, 1.2727108, 1.0480459, 1.0338822, 1.1418884, 1.0733666, 1.0230934, 0.9149864, 0.9480381, 1.0388333, 1.1266161, 0.9615078, 1.1221968, 0.9750836, 0.978132, 1.1412698, 0.9716957, 1.0675609, 
1.2594503, 1.0633289, 1.1427586, 1.0709402, 1.0393154, 1.3284915, 0.9598698, 1.1755224, 1.2392279, 1.0625965, 1.133851, 1.1631179, 1.4499444, 1.20366, 0.9606104, 0.9921343, 0.8938437, 1.1738624, 1.0131062, 1.0027174, 0.9461069, 0.9717685, 1.0645426, 1.046492, 1.1502628, 0.999057, 0.9758641, 1.1654844, 0.9964193, 1.1066967, 1.1900241, 1.0727625, 1.1304909, 1.0892065, 0.963785, 1.2942228, 1.0619264, 1.2733898, 0.9840458, 1.109005, 1.0437884, 1.0298398, 0.9513221, 1.0823791, 1.0056102, 0.8875967, 1.1385844, 0.8947159, 1.229025, 2.0563263, 0.9387754, 0.9683886, 1.2059569, 0.9923111, 1.4218394, 1.043666, 0.9963639, 1.0610107, 1.0049425, 0.9844978, 1.0292947, 0.9768325, 1.0528094, 1.0155664, 1.1586381, 1.0432875, 1.0382743, 0.9793557, 1.1206471, 0.985182, 1.1138052, 1.3397872, 1.0062782, 0.9474922, 1.2033802, 1.0889565, 0.9172793, 0.9749791, 0.9912765, 1.2617741, 0.9875289, 0.9231973, 1.1543416, 1.084554, 0.9805775, 0.9976991, 1.0076805, 1.0267488, 0.9919245, 1.0627179, 0.9760528, 1.14714, 0.947823, 1.0574966, 1.0560581, 0.9939038, 1.1754719, 0.9804448, 1.1892616, 1.2926922, 1.0381062, 0.9991459, 1.0110192, 1.7982637, 0.9932575, 1.0365072, 1.0476382, 0.9572147, 1.0362918, 0.929587, 1.1575934, 1.0942486, 1.1386353, 1.0484103, 1.0846261, 0.9627105, 1.0514676, 1.0148971, 0.9468566, 1.1103724, 1.0637948, 1.9343892, 1.0520743, 1.0526934, 1.0679818, 1.0045373, 1.3400328, 0.9598806, 1.0309374, 0.9556979, 1.3586868, 0.9806832, 1.0108765, 0.9652751, 1.9171728, 1.1786559, 1.0223136, 0.9491173, 1.0020994, 0.977787, 1.0659739, 1.4374944, 1.0311553, 1.0109194, 1.4310709, 0.9937973, 1.1235442, 1.0475279, 1.0221015, 1.0810464, 1.6977976, 1.0944615, 1.0511645, 1.0957941, 1.4443457, 1.0375637, 1.1045543, 1.0264414, 1.0205876, 1.3753965, 1.0976175, 1.0539255, 1.037731, 1.0592793, 1.0109924, 1.0427939, 1.1111455, 1.04521, 0.9745986, 1.3716186, 1.0089931, 1.0603559, 1.5494147, 0.9854366, 1.2662523, 0.9623836, 1.3929899, 0.999679, 1.0011268, 1.0179427, 1.0416134, 1.7609114, 1.069779, 
1.0366241, 1.1245068, 0.9792311, 0.967655, 0.9542575, 1.1684304, 1.2482993, 1.2640331, 1.0298585, 0.9111223, 1.0672941, 0.9855631, 0.9206366, 1.1058931, 1.0740426, 0.9649612, 1.3460875, 0.9493052, 1.0763382, 1.0750445, 1.1003632, 1.0639591, 1.0930897, 0.9366367, 1.4825478, 0.9872073, 1.0595017, 0.9098508, 0.9132522, 0.9715029, 1.3445599, 0.9442429, 0.9947035, 1.5735628, 1.0179848, 1.1207158, 1.4513845, 0.9971349, 1.0549698, 1.0829184, 0.9570918, 1.1063325, 1.049832, 1.6941119, 0.976464, 1.0548108, 1.0429154, 1.1387078, 1.252386, 1.4497295, 1.2952889, 1.0345598, 1.3188744, 1.059327, 0.9671478, 0.9628657, 0.9935354, 1.2020615, 0.977946, 1.0286028, 0.9360817, 0.9507702, 1.0119649, 1.49294, 0.9929636, 1.0500374, 1.3857874, 1.271137, 1.2183431, 1.0284245, 1.2371945, 1.1308861, 1.386502, 1.0364896, 1.222194, 1.0893758, 1.3687506, 0.9889728, 0.9717685, 0.9804448, 1.0066674, 0.9703385, 1.5495994, 1.0779985, 0.9233493, 1.1049508, 1.0770304, 0.9206519, 1.645557, 1.0494959, 1.1984923, 1.4967244, 0.9976991, 1.0476285, 0.9612643, 0.9270878, 0.9683637, 1.1585881, 1.0376168, 0.9816509, 0.9598896, 1.035713, 1.0170878, 0.9578521, 0.9849839, 0.9363952, 0.9856201, 1.0240401, 1.1739687, 1.1257174, 0.9772498, 0.9539389, 0.9537187, 1.3452872, 0.9888146 ) expect_equal(round(lof_kd, 7), lof_DMwr) expect_equal(round(lof_d, 7), lof_DMwr) ## missing values, but distances are fine x_na <- x x_na[c(1,3,5), 1] <- NA expect_error(lof(x_na), regexp = "NA") res_d1 <- lof(x_na, search = "dist") res_d2 <- lof(dist(x_na)) expect_equal(res_d1, res_d2) x_na[c(1,3,5), 2] <- NA expect_error(lof(x_na), regexp = "NA") expect_error(lof(x_na, search = "dist"), regexp = "NA") expect_error(lof(dist(x_na)), regexp = "NA") ## test with tied distances x <- rbind(1,2,3,4,5,6,7) expect_equal(round(lof(x, minPts = 4), 7), c(1.0679012, 1.0679012, 1.0133929, 0.8730159, 1.0133929, 1.0679012, 1.0679012)) expect_equal(round(lof(dist(x), minPts = 4),7), c(1.0679012, 1.0679012, 1.0133929, 0.8730159, 1.0133929, 1.0679012, 
1.0679012)) }) ================================================ FILE: tests/testthat/test-mst.R ================================================ test_that("mst", { draw_mst <- function(x, m) { plot(x) text(x, labels = 1:nrow(x), pos = 1) for (i in seq(nrow(m))) { from_to <- rbind(x[m[i, 1], ], x[m[i, 2], ]) lines(from_to[, 1], from_to[, 2]) } } x <- rbind(c(0, 0), c(0, 1), c(1, 1)) d <- dist(x) (m <- mst(d, n = nrow(x))) #draw_mst(x, m) expect_equal(m, structure( c(2, 3, 1, 2, 1, 1), dim = 2:3, dimnames = list(NULL, c("from", "to", "weight")) )) x <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1), c(2, 1), c(1, 2), c(.7, 1), c(.7, .7), c(.7, 1.3)) d <- dist(x) (m <- mst(d, n = nrow(x))) #draw_mst(x, m) expect_equal(m, structure( c( 2, 3, 4, 5, 6, 7, 8, 9, 8, 7, 7, 4, 9, 8, 1, 7, 0.761577310586391, 0.7, 0.3, 1, 0.761577310586391, 0.3, 0.989949493661166, 0.3 ), dim = c(8L, 3L), dimnames = list(NULL, c("from", "to", "weight")) )) # data("Dataset_2") # x <- Dataset_2[,1:2] # cl <- Dataset_2[,3] # x_3 <- x[cl==3, ] # # (m <- mst(dist(x_3), n = nrow(x_3))) # max(m[,3]) # draw_mst(x_3, m) }) test_that("dist_subset", { x <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1), c(2, 1), c(1, 2), c(.7, 1), c(.7, .7), c(.7, 1.3)) d <- dist(x) m <- as.matrix(d) s <- c(1:3, 6) (d_sub <- dist_subset(d, s)) (m_sub <- m[s,s]) expect_equal(unname(as.matrix(d_sub)), unname(m_sub)) }) ================================================ FILE: tests/testthat/test-optics.R ================================================ test_that("OPTICS", { load(test_path("fixtures", "test_data.rda")) load(test_path("fixtures", "elki_optics.rda")) x <- test_data ### run OPTICS eps <- .1 #eps <- .06 eps_cl <- .1 minPts <- 10 res <- optics(x, eps = eps, minPts = minPts) expect_length(res$order, nrow(x)) expect_length(res$reachdist, nrow(x)) expect_length(res$coredist, nrow(x)) expect_identical(res$eps, eps) expect_identical(res$minPts, minPts) ### compare with distance based version! 
res_d <- optics(dist(x), eps = eps, minPts = minPts) expect_equal(res, res_d) #plot(res) #plot(res_d) ### compare with elki's result expect_equal(res$order, elki$ID) expect_equal(round(res$reachdist[res$order], 3), round(elki$reachability, 3)) ### compare result with DBSCAN ### "clustering created from a cluster-ordering is nearly indistinguishable ### from a clustering created by DBSCAN. Only some border objects may ### be missed" # extract DBSCAN clustering res <- extractDBSCAN(res, eps_cl = eps_cl) #plot(res) # are there any clusters with only border points? frnn <- frNN(x, eps_cl) good <- vapply(frnn$id, function(x) (length(x) + 1L) >= minPts, logical(1L)) #plot(x, col = (res$cluster+1L)) c_good <- res$cluster[good] c_notgood <- res$cluster[!good] expect_false(setdiff(c_notgood, c_good) != 0L) # compare with DBSCAN db <- dbscan(x, minPts = minPts, eps = eps) #plot(x, col = res$cluster+1L) #plot(x, col = db$cluster+1L) # match clusters (get rid of border points which might differ) pure <- vapply( split(db$cluster, res$cluster), function(x) length(unique(x)), integer(1L) ) expect_true(all(pure[names(pure) != "0"] == 1L)) ## missing values, but distances are fine x_na <- x x_na[c(1,3,5), 1] <- NA expect_error(optics(x_na, eps = .2, minPts = 4), regexp = "NA") res_d1 <- optics(x_na, eps = .2, minPts = 4, search = "dist") res_d2 <- optics(dist(x_na), eps = .2, minPts = 4) expect_equal(res_d1, res_d2) ## introduce NAs into dist x_na[c(1,3,5), 2] <- NA expect_error(optics(x_na, eps = .2, minPts = 4), regexp = "NA") expect_error(optics(x_na, eps = .2, minPts = 4, search = "dist"), regexp = "NA") expect_error(optics(dist(x_na), eps = .2, minPts = 4), regexp = "NA") ## Create OPTICS-converted and single-linkage dendrograms res <- optics(test_data, eps = Inf, minPts = 2) res_dend <- as.dendrogram(res) reference <- as.dendrogram(hclust(dist(test_data), method = "single")) ## Test dendrogram ordering expect_equal(as.integer(unlist(res_dend)), res$order) ## Test Single
Linkage with minPts=2, eps=INF for strict equivalence ## Note: Reordering needed to correct for isomorphisms ref_order <- order.dendrogram(reference) reference <- reorder(reference, ref_order, agglo.FUN = mean) expect_equal(reference, reorder(res_dend, ref_order, agglo.FUN = mean)) # Make sure any epsilon that queries the entire neighborhood works, # error otherwise max_rd <- max(res$reachdist[!is.infinite(res$reachdist)], na.rm = TRUE) expect_error(as.dendrogram(optics(test_data, eps = max_rd-1e-7, minPts = 2)), regexp = "Eps") expect_error(as.dendrogram(optics(test_data, eps = max_rd, minPts = nrow(test_data) + 1)), regexp = "'minPts'") ## Test symmetric relation between reachability <-> dendrogram structures expect_equal(as.reachability(as.dendrogram(res))$reachdist, res$reachdist) expect_equal(as.reachability(as.dendrogram(res))$order, res$order) }) ================================================ FILE: tests/testthat/test-opticsXi.R ================================================ test_that("OPTICS-XI", { load(test_path("fixtures", "test_data.rda")) load(test_path("fixtures", "elki_optics.rda")) load(test_path("fixtures", "elki_optics_xi.rda")) ### run OPTICS XI with parameters: xi=0.01, eps=1.0, minPts=5 x <- test_data res <- optics(x, eps = 1.0, minPts = 5) res <- extractXi(res, xi = 0.10, minimum = FALSE) ### Check to make sure ELKI results match R expected <- res$clusters_xi[, c("start", "end")] class(expected) <- "data.frame" expect_identical(elki_optics_xi, expected) }) ================================================ FILE: tests/testthat/test-predict.R ================================================ test_that("predict", { set.seed(3) n <- 100 x_data <- cbind( x = runif(5, 0, 10) + rnorm(n, sd = 0.2), y = runif(5, 0, 10) + rnorm(n, sd = 0.2) ) x_noise <- cbind( x = runif(n/2, 0, 10), y = runif(n/2, 0, 10) ) x <- rbind(x_data, x_noise) # check if l points with a little noise are assigned to the same cluster l <- 20 newdata <- rbind( x_data[1:l,] + 
rnorm(2*l, 0, .05), x_noise[1:l,] + rnorm(2*l, 0, .05) ) idx <- c(1:l, n + (1:l)) #plot(x, col = rep(c("black", "gray"), each = n)) #points(newdata, col = rep(c("red", "gray"), each = l), pch = 16) # DBSCAN res <- dbscan(x, eps = .3, minPts = 3) pr <- predict(res, newdata, data = x) rbind(true = res$cluster[idx], pred = pr) expect_equal(res$cluster[idx], pr) #plot(x, col = ifelse(res$cluster == 0, "gray", res$cluster)) #points(newdata, col = ifelse(pr == 0, "gray", pr), pch = 16) # OPTICS res <- optics(x, minPts = 3) res <- extractDBSCAN(res, eps = .3) pr <- predict(res, newdata, data = x) rbind(true = res$cluster[idx], pred = pr) expect_equal(res$cluster[idx], pr) # currently no implementation for extractXi # HDBSCAN (note predict is not perfect for the data.) res <- hdbscan(x, minPts = 3) pr <- predict(res, newdata, data = x) rbind(true = res$cluster[idx], pred = pr) accuracy <- sum(res$cluster[idx] == pr)/length(pr) expect_true(accuracy > .9) # show misclassifications #plot(x, col = ifelse(res$cluster == 0, "gray", res$cluster)) #points(newdata, col = ifelse(pr == 0, "gray", pr), pch = 16) #points(newdata[res$cluster[idx] != pr,, drop = FALSE], col = "red", pch = 4, lwd = 2) }) ================================================ FILE: tests/testthat/test-sNN.R ================================================ test_that("sNN", { set.seed(665544) n <- 1000 x <- cbind( x = runif(10, 0, 10) + rnorm(n, sd = 0.2), y = runif(10, 0, 10) + rnorm(n, sd = 0.2), z = runif(10, 0, 10) + rnorm(n, sd = 0.2) ) ## no duplicates first! 
x <- x[!duplicated(x),] rownames(x) <- paste0("Object_", seq_len(nrow(x))) k <- 5L nn <- sNN(x, k=k, sort = TRUE) ## check dimensions expect_equal(nn$k, k) expect_equal(dim(nn$dist), c(nrow(x), k)) expect_equal(dim(nn$id), c(nrow(x), k)) ## check visually #plot(x) #points(x[nn$id[1,],], col="red", lwd=5) #points(x[nn$id[2,],], col="green", lwd=5) ## compare with kNN found using distances nn_d <- sNN(dist(x), k, sort = TRUE) ## check visually #plot(x) #points(x[nn_d$id[1,],], col="red", lwd=5) #points(x[nn_d$id[2,],], col="green", lwd=5) ### will agree except for some ties expect_equal(nn, nn_d) ## calculate dist internally nn_d2 <- sNN(x, k, search = "dist", sort = TRUE) expect_equal(nn, nn_d2) ## missing values, but distances are fine x_na <- x x_na[c(1,3,5), 1] <- NA expect_error(sNN(x_na, k = 3), regexp = "NA") res_d1 <- sNN(x_na, k = 3, search = "dist") res_d2 <- sNN(dist(x_na), k = 3) expect_equal(res_d1, res_d2) ## introduce NAs into dist x_na[c(1,3,5),] <- NA expect_error(sNN(x_na, k = 3), regexp = "NA") expect_error(sNN(x_na, k = 3, search = "dist"), regexp = "NA") expect_error(sNN(dist(x_na), k = 3), regexp = "NA") ## sort and kNN to reduce k nn10 <- sNN(x, k = 10, sort = FALSE) expect_false(nn10$sort_shared) expect_error(sNN(nn10, k = 11)) nn5 <- sNN(nn10, k = 5, sort = TRUE) nn5_x <- sNN(x, k = 5, sort = TRUE) expect_equal(nn5, nn5_x) ## test with simple data x <- data.frame(x = 1:10, check.names = FALSE) nn <- sNN(x, k = 5) i <- 1 j_ind <- 1 j <- nn$id[i,j_ind] intersect(c(i, nn$id[i,]), nn$id[j,]) nn$shared[i,j_ind] # compute the sNN similarity in R ss <- matrix(nrow = nrow(x), ncol = nn$k) for(i in seq_len(nrow(x))) for(j_ind in 1:nn$k) ss[i, j_ind] <- length(intersect(c(i, nn$id[i,]), nn$id[nn$id[i,j_ind],])) expect_equal(nn$shared, ss) }) ================================================ FILE: tests/testthat.R ================================================ library(testthat) library(dbscan) test_check("dbscan")
================================================ FILE: vignettes/dbscan.Rnw ================================================ % !Rnw weave = Sweave \documentclass[nojss]{jss} % Package includes \usepackage[utf8]{inputenc} \usepackage[english]{babel} %\usepackage{esvect} % vv %\usepackage{algorithm} % algorithm tools %\usepackage[noend]{algpseudocode} % algorithmic (pseudocode) tools \usepackage{mathtools} % coloneqq \usepackage{amsthm} %\usepackage[dvipsnames]{xcolor} % for adding color to code %\usepackage{listings} % For pprinting r code blocks (w/o execution) \usepackage{amssymb} \usepackage{pifont} % http://ctan.org/pkg/pifont %\usepackage{float} \usepackage{tabularx} %\usepackage[toc,page]{appendix} % Remove sweave margins if possible %\usepackage[belowskip=-15pt,aboveskip=0pt]{caption} %\setlength{\intextsep}{8pt plus 1pt minus 1pt} %\setlength{\floatsep}{1ex} %\setlength{\textfloatsep}{1ex plus 1pt minus 1pt} %\setlength{\abovecaptionskip}{0ex} %\setlength{\belowcaptionskip}{0ex} % Aliases and commands \newtheorem{mydef}{Definition} \newcommand{\minus}{\scalebox{0.75}[1.0]{$-$}} \newcommand{\exdb}{\texttt{extractDBSCAN} } \mathchardef\mhyphen="2D % Define a "math hyphen" \newcommand{\cmark}{\ding{51}} % checkmark %% \VignetteIndexEntry{Fast Density-based Clustering (DBSCAN and OPTICS)} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %% declarations for jss.cls %% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \author{ Michael Hahsler \\Southern Methodist University \And Matthew Piekenbrock\\Wright State University \AND Derek Doran \\ Wright State University } \title{\pkg{dbscan}: Fast Density-based Clustering with \proglang{R}} \Plainauthor{Michael Hahsler, Matthew Piekenbrock, Derek Doran} \Plaintitle{dbscan: Fast Density-based Clustering with R} \Shorttitle{\pkg{dbscan}: Density-based Clustering with \proglang{R}} \Address{ Michael Hahsler\\ Department of Engineering Management, Information, and Systems\\ Bobby B. Lyle School of Engineering, SMU\\ P. O. 
Box 750123, Dallas, TX 75275\\ E-mail: \email{mhahsler@lyle.smu.edu}\\ URL: \url{https://michael.hahsler.net/} \vspace{5mm} Matt Piekenbrock\\ Department of Computer Science and Engineering\\ Wright State University\\ 3640 Colonel Glenn Hwy, Dayton, OH, 45435\\ E-mail: \email{piekenbrock.5@wright.edu} \vspace{5mm} Derek Doran\\ Department of Computer Science and Engineering\\ Wright State University\\ 3640 Colonel Glenn Hwy, Dayton, OH, 45435\\ E-mail: \email{derek.doran@wright.edu} } \Abstract { This article describes the implementation and use of the \proglang{R} package \pkg{dbscan}, which provides complete and fast implementations of the popular density-based clustering algorithm DBSCAN and the augmented ordering algorithm OPTICS. Compared to other implementations, \pkg{dbscan} offers an open-source implementation using \proglang{C++} and advanced data structures like k-d trees to speed up computation. An important advantage of this implementation is that it is up-to-date with several primary advancements that have been added since their original publications, including artifact corrections and dendrogram extraction methods for OPTICS. Experiments comparing \pkg{dbscan}'s implementation of DBSCAN and OPTICS with other libraries such as FPC, ELKI, WEKA, PyClustering, SciKit-Learn and SPMF suggest that \pkg{dbscan} provides a very efficient implementation. } \Keywords{DBSCAN, OPTICS, Density-based Clustering, Hierarchical Clustering} \begin{document} % Do not move SweaveOpts into preamble \SweaveOpts{concordance=TRUE} % prefix.string=generated/dbscan \section{Introduction} Clustering is typically described as the process of finding structure in data by grouping similar objects together, where the resulting groups are called clusters.
Many clustering algorithms directly apply the idea that clusters can be formed such that objects in the same cluster should be more similar to each other than to objects in other clusters. The notion of similarity (or distance) stems from the fact that objects are assumed to be data points embedded in a data space in which a similarity measure can be defined. Examples are methods based on solving the $k$-means problem or mixture models which find the parameters of a parametric generative probabilistic model from which the observed data are assumed to arise. Another approach is hierarchical clustering, which uses local heuristics to form a hierarchy of nested grouping of objects. Most of these approaches (with the notable exception of single-link hierarchical clustering) are biased towards clusters with convex, hyper-spherical shape. A detailed review of these clustering algorithms is provided in \cite{Kaufman:1990}, \cite{jain1999review}, and the more recent review by \cite{Aggarwal:2013}. Density-based clustering approaches clustering differently. It simply posits that clusters are contiguous `dense' regions in the data space (i.e., regions of high point density), separated by areas of low point density~\citep{kriegel:2011,sander2011density}. Density-based methods find such high-density regions representing clusters of arbitrary shape and typically have a structured means of identifying noise points in low-density regions. These properties provide advantages for many applications compared to other clustering approaches. For example, geospatial data may be fraught with noisy data points due to estimation errors in GPS-enabled sensors~\citep{Chen2014} and may have unique cluster shapes caused by the physical space the data was captured in. 
Density-based clustering is also a promising approach to clustering high-dimensional data~\citep{kailing2004density}, where partitions are difficult to discover, and where the physical shape constraints assumed by model-based methods are more likely to be violated. %While dimensionality reduction techniques enable the use of many clustering algorithms to cluster high dimensional data, density-based clustering enables us to group high-dimensional data without the loss of information and recognizing noisy data. % What DBSCAN has been used for Several density-based clustering algorithms have been proposed, including the DBSCAN algorithm~\citep{ester1996density}, DENCLUE~\citep{hinneburg1998efficient} and many DBSCAN derivatives like HDBSCAN~\citep{campello2015hierarchical}. These clustering algorithms are widely used in practice with applications ranging from finding outliers in datasets for fraud prevention~\citep{breunig2000lof}, to finding patterns in streaming data~\citep{chen2007density, cao2006density}, noisy signals~\citep{kriegel2005density,ester1996density,tran2006knn,hinneburg1998efficient,duan2007local}, gene expression data~\citep{jiang2003dhc}, multimedia databases~\citep{kisilevich2010p}, and road traffic~\citep{li2007traffic}. %%% MFH: I am not sure this is true. Some of these are not pure density-based % What is the aim of the DBSCAN package? %There are many meaningful ways to define 'natural' clusters based on density. As a result, numerous density-based clustering algorithms have been proposed within the past two decades, e.g., %BIRCH~\citep{zhang96}, %DBSCAN algorithm~\citep{ester1996density}, %DENCLUE~\citep{hinneburg1998efficient}, %CURE~\citep{guha1998cure}, %CHAMELEON~\citep{karypis1999chameleon}, %CLARANS~\citep{ng2002clarans}, %and HDBSCAN~\citep{campello2015hierarchical}.
This paper focuses on an efficient implementation of the DBSCAN algorithm~\citep{ester1996density}, one of the most popular density-based clustering algorithms, whose consistent use earned it the 2014 SIGKDD Test of Time Award~\citep{SIGKDDNe30:online}, and OPTICS~\citep{ankerst1999optics}, often referred to as an extension of DBSCAN. %Matt - what do you mean when you say related algorithms? %along with their related algorithms, such as the Local Outlier Factor \citep{breunig2000lof} and the conversion methods between reachability and dendrogram representations\citep{sander2003automatic}. %Matt - you can cite the KAIS 17 paper in this first sentence While surveying software tools that implement various density-based clustering algorithms, it was discovered that in a large number of statistical tools, not only do implementations vary significantly in performance~\citep{kriegel2016black}, but they may also lack important components and corrections. Specifically, for the statistical computing environment \proglang{R}~\citep{team2013r}, only naive DBSCAN implementations without speed-up from spatial data structures are available (e.g., in the well-known Flexible Procedures for Clustering package~\citep{fpc}), and OPTICS is not available. %% Matt, what packages? : fixed (fpc). It's probably not worth mentioning largeVis, doesn't even compile/load properly on my machine. This motivated the development of an \proglang{R} package for density-based clustering with DBSCAN and related algorithms called \pkg{dbscan}. The \pkg{dbscan} package contains complete, correct and fast implementations of DBSCAN and OPTICS. % precisely as intended by the original authors of the algorithms. The package currently enjoys thousands of new installations from the CRAN repository every month.
This article presents an overview of the \proglang{R} package~\pkg{dbscan} focusing on DBSCAN and OPTICS, outlining its operation and experimentally comparing its performance with other open-source implementations. We first review the concept of density-based clustering and present the DBSCAN and OPTICS algorithms in Section~\ref{sec:dbc}. This section concludes with a short review of existing software packages that implement these algorithms. Details about \pkg{dbscan}, with examples of its use, are presented in Section~\ref{sec:dbscan}. A performance evaluation is presented in Section~\ref{sec:eval}. Concluding remarks are offered in Section~\ref{sec:conc}. A version of this article describing the package \pkg{dbscan} was published as \cite{hahsler2019dbscan} and should be cited. <>= options(useFancyQuotes = FALSE) citation("dbscan") @ \section{Density-based clustering}\label{sec:dbc} Density-based clustering is now a well-studied field. Conceptually, the idea behind density-based clustering is simple: given a set of data points, define a structure that accurately reflects the underlying density~\citep{sander2011density}. An important distinction between density-based clustering and alternative approaches to cluster analysis, such as the use of \emph{(Gaussian) mixture models}~\citep[see][]{jain1999review}, is that the latter represents a \emph{parametric} approach in which the observed data are assumed to have been produced by a mixture of Gaussian or other parametric families of distributions. While certainly useful in many applications, parametric approaches naturally assume clusters will exhibit some type of convex (generally hyper-spherical or hyper-elliptical) shape.
Other approaches, such as $k$-means clustering (where the $k$ parameter signifies the user-specified number of clusters to find), share this common theme of `minimum variance', where the underlying assumption is made that ideal clusters are found by minimizing some measure of intra-cluster variance (often referred to as cluster cohesion) and maximizing the inter-cluster variance (cluster separation)~\citep{arbelaitz2013extensive}.
Conversely, the label density-based clustering is used for methods which do not assume parametric distributions, are capable of finding arbitrarily-shaped clusters, handle varying amounts of noise, and require no prior knowledge regarding how to set the number of clusters $k$.
This methodology is best expressed in the DBSCAN algorithm, which we discuss next.

\subsection{DBSCAN: Density Based Spatial Clustering of Applications with Noise}

As one of the most cited density-based clustering algorithms~\citep{acade96:online}, DBSCAN~\citep{ester1996density} is likely the best known density-based clustering algorithm in the scientific community today.
The central idea behind DBSCAN and its extensions and revisions is the notion that points are assigned to the same cluster if they are \emph{density-reachable} from each other.
To understand this concept, we will go through the most important definitions used in DBSCAN and related algorithms.
The definitions follow the original ones by \cite{ester1996density}, but are adapted to provide a more consistent presentation with the other algorithms discussed in this paper.
Clustering starts with a dataset $D$ containing a set of points $p \in D$.
Density-based algorithms need to obtain a density estimate over the data space.
DBSCAN estimates the density around a point using the concept of the $\epsilon$-neighborhood.

\begin{mydef} {\bf $\epsilon$-Neighborhood}.
The $\epsilon$-neighborhood, $N_\epsilon(p)$, of a data point $p$ is the set of points within a specified radius $\epsilon$ around $p$.
$$N_\epsilon(p) = \{q \;|\; d(p,q) \le \epsilon\}$$
where $d$ is some distance measure and $\epsilon \in \mathbb{R}^+$.
Note that the point $p$ is always in its own $\epsilon$-neighborhood, i.e., $p \in N_\epsilon(p)$ always holds.
\end{mydef}

Following this definition, the size of the neighborhood $|N_\epsilon(p)|$ can be seen as a simple unnormalized kernel density estimate around $p$ using a uniform kernel and a bandwidth of $\epsilon$.
DBSCAN uses $N_\epsilon(p)$ and a threshold called $\mathit{minPts}$ to detect dense regions and to classify the points in a data set into {\bf core}, {\bf border}, or {\bf noise} points.

\begin{mydef} {\bf Point classes}. A point $p \in D$ is classified as
\begin{itemize}
\item a {\bf core point} if $N_\epsilon(p)$ has high density, i.e., $|N_\epsilon(p)| \geq \mathit{minPts}$ where $\mathit{minPts} \in \mathbb{Z}^+$ is a user-specified density threshold,
\item a {\bf border point} if $p$ is not a core point, but it is in the neighborhood of a core point $q \in D$, i.e., $p \in N_\epsilon(q)$, or
\item a {\bf noise point}, otherwise.
\end{itemize}
\end{mydef}

\begin{figure}
\minipage{0.49\textwidth}
\includegraphics[height=\linewidth, angle=-90, origin=c]{figures/dbscan_a}\\
\centerline{(a)}
\endminipage\hfill
\minipage{0.49\textwidth}
\includegraphics[height=\linewidth, angle=-90, origin=c]{figures/dbscan_b}\\
\centerline{(b)}
\endminipage\\
\caption{Concepts used in the DBSCAN family of algorithms. (a) shows examples for the three point classes, core, border, and noise points; (b) illustrates the concepts of density-reachability and density-connectivity.}\label{fig:point_classes}
\end{figure}

A visual example is shown in Figure~\ref{fig:point_classes}(a).
The size of the neighborhood for some points is shown as a circle and their class is shown as an annotation.
To form contiguous dense regions from individual points, DBSCAN defines the notions of reachability and connectedness.

\begin{mydef} {\bf Directly density-reachable}.
A point $q \in D$ is directly density-reachable from a point $p \in D$ with respect to $\epsilon$ and $\mathit{minPts}$ if, and only if,
\begin{enumerate}
\item $|N_\epsilon(p)| \geq \mathit{minPts}$, and
\item $q \in N_\epsilon(p)$.
\end{enumerate}
That is, $p$ is a core point and $q$ is in its $\epsilon$-neighborhood.
\end{mydef}

\begin{mydef} {\bf Density-reachable}.
A point $p$ is density-reachable from $q$ if there exists an ordered sequence of points $(p_1, p_2, \ldots, p_n)$ in $D$ with $q = p_1$ and $p = p_n$ such that $p_{i+1}$ is directly density-reachable from $p_i$ for all $i \in \{1, 2, \ldots, n-1\}$.
\end{mydef}

\begin{mydef} {\bf Density-connected}.
A point $p \in D$ is density-connected to a point $q \in D$ if there is a point $o \in D$ such that both $p$ and $q$ are density-reachable from $o$.
\end{mydef}

The notion of density-connection can be used to form clusters as contiguous dense regions.

\begin{mydef} {\bf Cluster}.
A cluster $C$ is a non-empty subset of $D$ satisfying the following conditions:
\begin{enumerate}
\item {\bf Maximality}: If $p \in C$ and $q$ is density-reachable from $p$, then $q \in C$; and
\item {\bf Connectivity}: $\forall$ $p, q \in C$, $p$ is density-connected to $q$.
\end{enumerate}
\end{mydef}

The DBSCAN algorithm identifies all such clusters by finding all core points and expanding each to all density-reachable points.
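The point classes defined above can be computed directly with the fixed-radius nearest neighbor search provided by \pkg{dbscan}. The following sketch is not part of the original text; the values for \code{eps} and \code{minPts} are arbitrary, and \code{x} is assumed to be a numeric data matrix. Note that \code{frNN()} does not include the query point itself, so one is added to the neighborhood size.

<<eval=FALSE>>=
library("dbscan")
eps <- 0.06; minPts <- 3
nn <- frNN(x, eps = eps)             # fixed-radius neighborhoods
size <- sapply(nn$id, length) + 1L   # |N_eps(p)|, including p itself
core <- size >= minPts               # core points
border <- !core & sapply(nn$id, function(ids) any(core[ids]))
noise <- !core & !border             # all remaining points are noise
@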
The algorithm begins with an arbitrary point $p$ and retrieves its $\epsilon$-neighborhood.
If $p$ is a core point, then it starts a new cluster that is expanded by assigning all points in its neighborhood to the cluster.
If an additional core point is found in the neighborhood, then the search is expanded to also include all points in its neighborhood.
If no more core points are found in the expanded neighborhood, then the cluster is complete, and the remaining points are searched to see if another core point can be found to start a new cluster.
After processing all points, points which were not assigned to a cluster are considered noise.
In the DBSCAN algorithm, core points are always part of the same cluster, independent of the order in which the points in the dataset are processed.
This is different for border points.
Border points might be density-reachable from core points in several clusters, and the algorithm assigns them to the first of these clusters that is processed, which depends on the order of the data points and on the particular implementation of the algorithm.
This needs to be taken into account when comparing two different implementations, since they might visit the points in a different order and thus produce different cluster assignments for border points.
To avoid this ambiguity, \cite{campello2015hierarchical} suggest a modification called DBSCAN* which considers all border points as noise instead and leaves them unassigned.

\subsection{OPTICS: Ordering Points To Identify the Clustering Structure}\label{sec:optics}

There are many instances where it would be useful to detect clusters of varying density.
From identifying causes among similar seawater characteristics~\citep{birant2007st} and network intrusion detection~\citep{ertoz2003finding}, to point-of-interest detection using geo-tagged photos~\citep{kisilevich2010p} and classifying cancerous skin lesions~\citep{celebi2005mining}, the motivations for detecting clusters of varying density are numerous.
The inability to find clusters of varying density is a notable drawback of DBSCAN, resulting from the fact that the combination of a specific neighborhood size with a single density threshold $\mathit{minPts}$ is used to determine whether a point resides in a dense neighborhood.
In 1999, some of the original DBSCAN authors developed OPTICS~\citep{ankerst1999optics} to address this concern.
OPTICS borrows the core density-reachability concept from DBSCAN.
But while DBSCAN may be thought of as a clustering algorithm, searching for natural groups in data, OPTICS is an \emph{augmented ordering algorithm} from which either flat or hierarchical clustering results can be derived.
OPTICS requires the same $\epsilon$ and $\mathit{minPts}$ parameters as DBSCAN; however, the $\epsilon$ parameter is theoretically unnecessary and is only used for the practical purpose of reducing the runtime complexity of the algorithm.
To describe OPTICS, we introduce two additional concepts called core-distance and reachability-distance.
All distances are calculated using the same metric (often Euclidean distance) used for the neighborhood calculation.

\begin{mydef} {\bf Core-distance}.
The core-distance of a point $p \in D$ with respect to $\mathit{minPts}$ and $\epsilon$ is defined as
\[ \mathrm{core\mhyphen dist}(p; \epsilon, \mathit{minPts}) =
\begin{cases}
\text{UNDEFINED} & \text{if} \; |N_{\epsilon}(p)| < \mathit{minPts}, \text{and} \\
\mathrm{minPts\mhyphen dist}(p) & \text{otherwise,}
\end{cases} \]
where $\mathrm{minPts\mhyphen dist}(p)$ is the distance from $p$ to its $(\mathit{minPts}-1)$th nearest neighbor, i.e., the minimal radius a neighborhood of size $\mathit{minPts}$ centered at and including $p$ would have.
\end{mydef}

\begin{mydef} {\bf Reachability-distance}.
The reachability-distance of a point $p \in D$ to a point $q \in D$, parameterized by $\epsilon$ and $\mathit{minPts}$, is defined as
\[ \mathrm{reachability\mhyphen dist}(p,q; \epsilon, \mathit{minPts}) =
\begin{cases}
\text{UNDEFINED} & \text{if} \; |N_{\epsilon}(p)| < \mathit{minPts}, \text{and} \\
\max(\mathrm{core\mhyphen dist}(p), d(p, q)) & \text{otherwise.}
\end{cases} \]
\end{mydef}

The reachability-distance of a point $q$ with respect to a core point $p$ is the smallest neighborhood radius such that $q$ would be directly density-reachable from $p$.
Note that $\epsilon$ is typically set much larger than for DBSCAN.
Therefore, $\mathit{minPts}$ behaves differently for OPTICS: more points will be considered core points, and $\mathit{minPts}$ also affects how many nearest neighbors are considered in the core-distance calculation, where larger values lead to larger and smoother reachability distributions.
This needs to be kept in mind when choosing appropriate parameters.
OPTICS provides an augmented ordering.
The algorithm starts with a point and expands its neighborhood like DBSCAN, but it explores new points in the order of lowest to highest reachability-distance.
The order in which the points are explored, along with each point's core- and reachability-distance, is the final result of the algorithm.
An example of the order and the resulting reachability-distances is shown in the form of a reachability plot in Figure~\ref{fig:opticsReachPlot1}.
Low reachability-distances, shown as valleys, represent clusters separated by peaks representing points with larger distances.
This density representation essentially conveys the same information as the often-used dendrogram or `tree-like' structure.
This is why OPTICS is often also described as a visualization tool.
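As an illustrative sketch (not from the original text; the parameter values are arbitrary and \code{x} is an assumed data matrix), the core-distance can be obtained from the kNN distances computed by \pkg{dbscan}. \code{kNNdist()} excludes the query point itself, so the core-distance of a point is its distance to the $(\mathit{minPts}-1)$th nearest neighbor, and it is undefined when this distance exceeds $\epsilon$:

<<eval=FALSE>>=
minPts <- 10; eps <- 1
core_dist <- kNNdist(x, k = minPts - 1)  # distance to the (minPts-1)th NN
core_dist[core_dist > eps] <- NA         # UNDEFINED: |N_eps(p)| < minPts
@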
\cite{sander2003automatic} showed how the output of OPTICS can be converted into an equivalent dendrogram and that, under certain conditions, the dendrogram produced by the well-known hierarchical clustering with single linkage is identical to the result of running OPTICS with the parameter $\mathit{minPts} = 2$.
Due to the widespread usage of dendrograms in the \proglang{R} computing environment, this conversion algorithm between reachability and dendrogram representations is made available in \pkg{dbscan}.

\begin{figure}
\centering
\includegraphics{dbscan-opticsReachPlot}
\caption{OPTICS reachability plot example for a data set with four clusters of 100 data points each.}
\label{fig:opticsReachPlot1}
\end{figure}
From the order discovered by OPTICS, two ways to group points into clusters were discussed in~\cite{ankerst1999optics}: one which we will refer to as the {\bf ExtractDBSCAN} method and one which we will refer to as the {\bf Extract-$\xi$} method, summarized below.
\begin{enumerate}
\item {\bf ExtractDBSCAN} uses a single global reachability-distance threshold $\epsilon'$ to extract a clustering. This can be seen as a horizontal line in the reachability plot in Figure~\ref{fig:opticsReachPlot1}. Peaks above the cut-off represent noise points and separate the clusters.
\item {\bf Extract-$\xi$} identifies clusters \emph{hierarchically} by scanning through the ordering that OPTICS produces to identify significant, relative changes in reachability-distance.
The authors of OPTICS noted that these clusters can be thought of as `dents' in the reachability plot.
\end{enumerate}

The ExtractDBSCAN method extracts a clustering equivalent to DBSCAN* (i.e., DBSCAN where border points stay unassigned).
Because this method extracts clusters like DBSCAN, it cannot identify partitions that exhibit very significant differences in density.
Clusters of significantly different density can only be identified if the data are well separated and very little noise is present.
The second method, which we call Extract-$\xi$\footnote{In the original OPTICS publication \cite{ankerst1999optics}, the algorithm was outlined in Figure 19 and called the `ExtractClusters' algorithm, and the extracted clusters were referred to as $\xi$-clusters. To distinguish the method uniquely, we refer to it as the Extract-$\xi$ method.}, identifies a cluster hierarchy and replaces the data-dependent global threshold parameter with $\xi$, a data-independent density threshold ranging between $0$ and $1$.
One interpretation of $\xi$ is that it describes the relative magnitude of the change in cluster density (i.e., reachability).
Significant changes in relative reachability allow clusters to manifest themselves hierarchically as `dents' in the ordering structure.
The hierarchical representation produced by Extract-$\xi$ can, as opposed to the ExtractDBSCAN method, contain clusters of varying densities.
With its two ways of extracting clusters from the ordering, through either the global $\epsilon'$ or the relative $\xi$ threshold, OPTICS can be seen as a generalization of DBSCAN.
In contexts where one wants to find clusters of similar density, OPTICS's ExtractDBSCAN yields a DBSCAN-like solution, while in other contexts Extract-$\xi$ can generate a hierarchy representing clusters of varying density.
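Both extraction methods are available in the \pkg{dbscan} package and can be applied to the same OPTICS ordering. The following sketch is purely illustrative (the parameter values are arbitrary and \code{x} is an assumed data matrix):

<<eval=FALSE>>=
opt <- optics(x, eps = 10, minPts = 10)       # compute the augmented ordering
res_eps <- extractDBSCAN(opt, eps_cl = 0.065) # global threshold epsilon'
res_xi  <- extractXi(opt, xi = 0.05)          # hierarchical xi extraction
@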
It is thus interesting to note that while DBSCAN has reached critical acclaim, even motivating numerous extensions~\citep{rehman2014dbscan}, OPTICS has received decidedly less attention.
Perhaps one reason for this is that the Extract-$\xi$ method for grouping points into clusters has gone largely unnoticed, as it is not implemented in most open-source software packages that advertise an implementation of OPTICS.
This includes implementations in WEKA~\citep{hall2009weka}, SPMF~\citep{fournier2014spmf}, and the PyClustering~\citep{PyCluste54:online} and scikit-learn~\citep{pedregosa2011scikit} libraries for \proglang{Python}.
To the best of our knowledge, the only other open-source library offering a complete implementation of OPTICS is ELKI~\citep{DBLP:journals/pvldb/SchubertKEZSZ15}, written in \proglang{Java}.
In fact, perhaps due to the incomplete implementations of OPTICS cluster extraction across various software libraries, there has been some confusion regarding the usage of OPTICS and the benefits it offers compared to DBSCAN.
Several papers motivate DBSCAN extensions or devise new algorithms by citing OPTICS as incapable of finding density-heterogeneous clusters~\citep{ghanbarpour2014exdbscan,chowdhury2010efficient,Gupta2010,duan2007local}.
Along the same line of thought, others cite OPTICS as capable of finding clusters of varying density, but either use the DBSCAN-like global density threshold extraction method or refer to OPTICS as a clustering algorithm without mentioning which cluster extraction method was used in their experimentation~\citep{verma2012comparative,roy2005approach,liu2007vdbscan,pei2009decode}.
However, OPTICS fundamentally returns an ordering of the data which can be post-processed to extract either (1) a flat clustering with clusters of relatively similar density or (2) a cluster hierarchy, which adapts to the local densities within the data.
To clear up this confusion, it is important to add complete implementations to existing software packages and to introduce new complete implementations of OPTICS like the \proglang{R} package~\pkg{dbscan} described in this paper.

\subsection{Current implementations of DBSCAN and OPTICS}\label{sec:review}

Implementations of DBSCAN and/or OPTICS are available in many statistical software packages.
We focus here on open-source solutions.
These include the Waikato Environment for Knowledge Analysis (WEKA)~\citep{hall2009weka}, the Sequential Pattern Mining Framework (SPMF)~\citep{fournier2014spmf}, the Environment for Developing KDD-Applications Supported by Index-Structures (ELKI)~\citep{DBLP:journals/pvldb/SchubertKEZSZ15}, the \proglang{Python} library scikit-learn~\citep{pedregosa2011scikit}, the PyClustering data mining library~\citep{PyCluste54:online}, the Flexible Procedures for Clustering (fpc) \proglang{R} package~\citep{fpc}, and the \pkg{dbscan} package~\citep{dbscan-R} introduced in this paper.
\begin{table}
\begin{tabularx}{\textwidth}{ c c c c c X }
\hline
{\bf Library} & {\bf DBSCAN} & {\bf OPTICS} & {\bf ExtractDBSCAN} & {\bf Extract-$\xi$} & \\
\hline \rule{0pt}{3ex}
\pkg{dbscan} & \cmark & \cmark & \cmark & \cmark & \\
ELKI & \cmark & \cmark & \cmark & \cmark & \\
SPMF & \cmark & \cmark & \cmark & & \\
PyClustering & \cmark & \cmark & \cmark & & \\
WEKA & \cmark & \cmark & \cmark & & \\
scikit-learn & \cmark & & & & \\
fpc & \cmark & & & & \\
\hline
\end{tabularx}
\vspace{2mm}
\begin{tabularx}{\textwidth}{ c c c c X }
\hline
{\bf Library} & {\bf Index Acceleration} & {\bf Dendrogram for OPTICS} & {\bf Language} & \\
\hline \rule{0pt}{3ex}
\pkg{dbscan} & \cmark & \cmark & \proglang{R} & \\
ELKI & \cmark & \cmark & \proglang{Java} & \\
SPMF & \cmark & & \proglang{Java} & \\
PyClustering & \cmark & & \proglang{Python} & \\
WEKA & & & \proglang{Java} & \\
scikit-learn & \cmark & & \proglang{Python} & \\
fpc & & & \proglang{R} & \\
\hline
\end{tabularx}
\caption{A comparison of DBSCAN and OPTICS implementations in various open-source statistical software libraries. A \cmark \ symbol denotes availability.}
\label{tab:comp}
\end{table}

Table~\ref{tab:comp} presents a comparison of the features offered by these packages.
All packages support DBSCAN, and most use index acceleration to speed up the $\epsilon$-neighborhood queries involved in both the DBSCAN and OPTICS algorithms.
These queries are the known bottleneck that typically dominates the runtime, and accelerating them is essential for processing larger data sets.
\pkg{dbscan} is the first \proglang{R} implementation offering this improvement.
OPTICS with ExtractDBSCAN is also widely implemented, but the Extract-$\xi$ method, as well as the use of dendrograms with OPTICS, is only available in \pkg{dbscan} and ELKI.
A small experimental runtime comparison is provided in Section~\ref{sec:eval}.

\section{The dbscan package}\label{sec:dbscan}

The package \pkg{dbscan} provides high-performance code for DBSCAN and OPTICS through a \proglang{C++} implementation (interfaced via the \pkg{Rcpp} package by \cite{eddelbuettel2011rcpp}) using the $k$-d tree data structure implemented in the \proglang{C++} library ANN~\citep{mount1998ann} to improve the speed of $k$ nearest neighbor (kNN) and fixed-radius nearest neighbor search.
DBSCAN and OPTICS share a similar interface.

\begin{Schunk}
\begin{Sinput}
dbscan(x, eps, minPts = 5, weights = NULL, borderPoints = TRUE, ...)
optics(x, eps, minPts = 5, ...)
\end{Sinput}
\end{Schunk}

The first argument \code{x} is the data set in form of a \code{data.frame} or a \code{matrix}.
The implementations use Euclidean distance by default for the neighborhood computation.
Alternatively, a precomputed set of pairwise distances between data points stored in a \code{dist} object can be supplied.
With precomputed distances, arbitrary distance metrics can be used; note, however, that $k$-d trees cannot be used for distance data, so lists of nearest neighbors are precomputed instead.
For \code{dbscan()} and \code{optics()}, the parameter \code{eps} represents the radius of the $\epsilon$-neighborhood considered for density estimation and \code{minPts} represents the density threshold to identify core points.
Note that \code{eps} is not strictly necessary for OPTICS; it is only used as an upper limit for the considered neighborhood size to reduce computational complexity.
\code{dbscan()} can also use weights for the data points in \code{x}.
The density in a neighborhood is then the sum of the weights of the points inside the neighborhood.
By default, each data point has a weight of one, so the density estimate for a neighborhood is just the number of data points inside it.
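The interface options just described can be sketched as follows (this example is not from the original text; the metric, the weights, and the parameter values are purely illustrative, and \code{x} is an assumed data matrix):

<<eval=FALSE>>=
## precomputed distances allow an arbitrary metric
d <- dist(x, method = "manhattan")
res_d <- dbscan(d, eps = 0.1, minPts = 3)

## point weights: the density is the sum of weights in the neighborhood
w <- rep(1, nrow(x))
w[1:10] <- 0.5                       # hypothetically down-weight some points
res_w <- dbscan(x, eps = 0.06, minPts = 3, weights = w)
@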
Using weights, the importance of individual points can be changed.
The original DBSCAN implementation assigns each border point to the first cluster it is found to be density-reachable from.
Since this may result in different clustering results if the data points are processed in a different order, \cite{campello2015hierarchical} suggest for DBSCAN* to consider border points as noise.
This behavior can be selected with \code{borderPoints = FALSE}.
All functions accept additional arguments in~\code{...}.
These arguments are passed on to the fixed-radius nearest neighbor search.
More details about the implementation of the nearest neighbor search are presented in Section~\ref{sec:nn} below.
Clusters can be extracted from the linear order produced by OPTICS.
The \pkg{dbscan} implementations of the cluster extraction methods ExtractDBSCAN and Extract-$\xi$ are:

\begin{Schunk}
\begin{Sinput}
extractDBSCAN(object, eps_cl)
extractXi(object, xi, minimum = FALSE, correctPredecessor = TRUE)
\end{Sinput}
\end{Schunk}

\code{extractDBSCAN()} extracts a clustering from an OPTICS ordering that is similar to what DBSCAN would produce with a single global $\epsilon$ set to \code{eps_cl}.
\code{extractXi()} extracts clusters hierarchically based on the steepness of the reachability plot.
\code{minimum} controls whether only the minimal (non-overlapping) clusters are extracted.
\code{correctPredecessor} corrects a known artifact of the original $\xi$ method presented in~\cite{ankerst1999optics} by pruning the steep up area for points that have predecessors not in the cluster (see the Technical Note in Appendix~\ref{sec:technote} for details).

\subsection{Nearest Neighbor Search}\label{sec:nn}

The density-based algorithms in \pkg{dbscan} rely heavily on forming neighborhoods, i.e., finding all points belonging to an $\epsilon$-neighborhood.
A simple approach is to perform a linear search, i.e., to always calculate the distances to all other points to find the closest ones.
This requires $O(n)$ operations, with $n$ being the number of data points, each time a neighborhood is needed.
Since DBSCAN and OPTICS process each data point once, this results in an $O(n^2)$ runtime complexity.
A convenient way in \proglang{R} is to compute a distance matrix with all pairwise distances between points and to sort the distances for each point (row in the distance matrix) to precompute the nearest neighbors of each point.
However, this method has the drawback that the size of the full distance matrix is $O(n^2)$, and it becomes very large and slow to compute for medium to large data sets.
In order to avoid computing the complete distance matrix, \pkg{dbscan} relies on a space-partitioning data structure called a $k$-d tree~\citep{bentley1975multidimensional}.
This data structure allows \pkg{dbscan} to identify the kNN or all neighbors within a fixed radius \code{eps} more efficiently, using on average only $O(\log{n})$ operations per query.
This results in a reduced runtime complexity of $O(n \log{n})$.
However, note that $k$-d trees are known to degenerate for high-dimensional data, requiring $O(n)$ operations per query and leading to a performance no better than linear search.
Fast kNN search and fixed-radius nearest neighbor search are used in DBSCAN and OPTICS, but we also provide a direct interface in \pkg{dbscan}, since they are useful in their own right.
\begin{Schunk}
\begin{Sinput}
kNN(x, k, sort = TRUE, search = "kdtree", bucketSize = 10,
  splitRule = "suggest", approx = 0)
frNN(x, eps, sort = TRUE, search = "kdtree", bucketSize = 10,
  splitRule = "suggest", approx = 0)
\end{Sinput}
\end{Schunk}

The interfaces only differ in that \code{kNN()} requires specifying the number of neighbors \code{k}, while \code{frNN()} needs the radius \code{eps}.
All other arguments are the same.
\code{x} is the data, and the result will be a list of the neighbors in \code{x} for each point in \code{x}.
\code{sort} controls whether the returned points are sorted by distance.
\code{search} controls which search method is used.
Available search methods are \code{"kdtree"}, \code{"linear"} and \code{"dist"}.
The linear search method does not build a search data structure but performs a complete linear search to find the nearest neighbors.
The \code{"dist"} method precomputes a dissimilarity matrix, which is very fast for small data sets but problematic for large ones.
The default method is to build a $k$-d tree.
$k$-d trees are implemented in \proglang{C++} using a modified version of the ANN library~\citep{mount1998ann} compiled for Euclidean distances.
The parameters \code{bucketSize}, \code{splitRule} and \code{approx} are algorithmic parameters which control the way the $k$-d tree is built.
\code{bucketSize} controls the maximal size of the $k$-d tree's leaf nodes.
\code{splitRule} specifies how the $k$-d tree partitions the data space.
We use \code{"suggest"}, which uses the best guess of the ANN library given the data.
Setting \code{approx} to a value greater than zero uses approximate NN search: only nearest neighbors up to a distance of a factor of $(1+\mathrm{approx})\,\mathrm{eps}$ will be returned, and some actual neighbors may be omitted, potentially leading to spurious clusters and noise points.
However, the algorithm will enjoy a significant speedup.
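For illustration, the nearest neighbor interface can be used directly as in the following sketch (the data matrix \code{x} and the parameter values are assumed for illustration). \code{kNN()} returns the neighbor indices and distances as matrices, while \code{frNN()} returns lists, since neighborhood sizes vary between points:

<<eval=FALSE>>=
nn <- kNN(x, k = 3)       # 3 nearest neighbors of every point
nn$id[1, ]                # neighbor indices of the first point
nn$dist[1, ]              # the corresponding distances

fr <- frNN(x, eps = 0.06) # fixed-radius neighborhoods
fr$id[[1]]                # neighbors of the first point within eps
@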
For more details, we refer the reader to the documentation of the ANN library~\citep{mount1998ann}. \code{dbscan()} and \code{optics()} use \code{frNN()} internally, and the additional arguments in~\code{...} are passed on to the nearest neighbor search method.

%
\section{Using the dbscan package}

\subsection{Clustering with DBSCAN}

We use a very simple artificial data set of four slightly overlapping Gaussians in two-dimensional space with 100 points each. We load \pkg{dbscan}, set the seed of the random number generator to make the results reproducible, and create the data set.

<>=
options(width = 75)
@

<<>>=
library("dbscan")
set.seed(2)
n <- 400
x <- cbind(
  x = runif(4, 0, 1) + rnorm(n, sd = 0.1),
  y = runif(4, 0, 1) + rnorm(n, sd = 0.1)
  )
true_clusters <- rep(1:4, times = 100)
@

<>=
plot(x, col = true_clusters, pch = true_clusters)
@

\begin{figure}
\centering
\includegraphics[width=8cm]{dbscan-sampleData}
\caption{The sample dataset, consisting of 4 noisy Gaussian distributions with slight overlap.}
\label{fig:sampleData}
\end{figure}

The resulting data set is shown in Figure~\ref{fig:sampleData}. To apply DBSCAN, we need to decide on the neighborhood radius~\code{eps} and the density threshold~\code{minPts}. A rule of thumb for \code{minPts} is to use at least the number of dimensions of the data set plus one, which in our case is 3. For \code{eps}, we can plot the points' kNN distances (i.e., the distance to the $k$th nearest neighbor) in decreasing order and look for a knee in the plot. The idea behind this heuristic is that points located inside clusters will have a small $k$-nearest neighbor distance, because they are close to other points in the same cluster, while noise points are isolated and will have a rather large kNN distance. \pkg{dbscan} provides the function \code{kNNdistplot()} to make this easier. For $k$ we use \code{minPts} - 1 since DBSCAN's \code{minPts} includes the data point itself, while the $k$th nearest neighbor distance does not.
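The heuristic can also be examined numerically before plotting. As a sketch, \code{kNNdist()} (the function underlying \code{kNNdistplot()}) returns the kNN distances, which can be sorted to inspect candidate knee locations:

\begin{Schunk}
\begin{Sinput}
d <- sort(kNNdist(x, k = 2), decreasing = TRUE)
head(d)    # the largest 2-NN distances belong to isolated noise points
\end{Sinput}
\end{Schunk}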
<>=
kNNdistplot(x, k = 2)
abline(h = .06, col = "red", lty = 2)
@

\begin{figure}
\centering
\includegraphics{dbscan-kNNdistplot}
\caption{$k$-Nearest Neighbor Distance plot.}
\label{fig:kNNdistplot}
\end{figure}

The kNN distance plot is shown in Figure~\ref{fig:kNNdistplot}. A knee is visible at around a 2-NN distance of 0.06. We have manually added a horizontal line for reference. Now we can perform the clustering with the chosen parameters.

<<>>=
res <- dbscan(x, eps = 0.06, minPts = 3)
res
@

The resulting clustering identified one large cluster with 191 member points, two medium clusters with around 90 points each, several very small clusters, and 15 noise points (represented by cluster id 0). The available fields can be directly accessed using the list extraction operator \code{\$}. For example, the cluster assignment information can be used to plot the data with the clusters identified by different labels and colors.

<>=
plot(x, col = res$cluster + 1L, pch = res$cluster + 1L)
@

\begin{figure}
\centering
\includegraphics[width=9cm]{dbscan-dbscanPlot}
\caption{Result of clustering with DBSCAN. Noise is represented as black circles.}
\label{fig:dbscanPlot}
\end{figure}

The scatter plot in Figure~\ref{fig:dbscanPlot} shows that the clustering algorithm correctly identified the upper two clusters, but merged the lower two clusters because the region between them has a high enough density. The small clusters are isolated groups of three points (just satisfying $\mathit{minPts}$), and the noise points are isolated points. These small clusters can be suppressed by using a larger value for \code{minPts}. \pkg{dbscan} also provides a plot that adds convex cluster hulls to the scatter plot, shown in Figure~\ref{fig:dbscanHullPlot}.

<>=
hullplot(x, res)
@

\begin{figure}
\centering
\includegraphics[width=9cm]{dbscan-dbscanHullPlot}
\caption{Convex hull plot of the DBSCAN clustering. Noise points are black.
Note that noise points and points of another cluster may lie within the convex hull of a different cluster.}
\label{fig:dbscanHullPlot}
\vspace{0.1cm}
\end{figure}

A clustering can also be used to find out to which clusters new data points would be assigned using \code{predict(object, newdata = NULL, data, ...)}. The predict method uses nearest neighbor assignment to core points and needs the original dataset. Additional parameters (\code{...}) are passed on to the nearest neighbor search method. Here we obtain the cluster assignment for the first 25 data points. Note that an assignment to cluster~0 means that the data point is considered noise because it is not close enough to a core point.

<<>>=
predict(res, x[1:25,], data = x)
@

\subsection{Clustering with OPTICS}

Unless OPTICS is used purely to extract a DBSCAN clustering, its parameters have a different effect than for DBSCAN: \code{eps} is typically chosen rather large (we use 10 here), and \code{minPts} mostly affects the core and reachability-distance calculations, where larger values have a smoothing effect. We also use 10, i.e., the core-distance is defined as the distance to the 9th nearest neighbor (spanning a neighborhood of 10 points).

<<>>=
res <- optics(x, eps = 10, minPts = 10)
res
@

OPTICS is an augmented ordering algorithm, which stores the computed order of the points in the \code{order} element of the returned object.

<<>>=
head(res$order, n = 15)
@

This means that data point 1 in the data set is the first in the order, data point 363 is the second, and so forth. The density-based order produced by OPTICS can be directly plotted as a reachability plot.

<>=
plot(res)
@

\begin{figure}
\centering
\includegraphics{dbscan-opticsReachPlot}
\caption{OPTICS reachability plot. Note that the first reachability value is always UNDEFINED.}
\label{fig:opticsReachPlot}
\end{figure}

The reachability plot in Figure~\ref{fig:opticsReachPlot} shows the reachability distance for the points ordered by OPTICS.
Valleys represent potential clusters separated by peaks. Very high peaks may indicate noise points. To visualize the order on the original data set, we can plot a line connecting the points in order.

<>=
plot(x, col = "grey")
polygon(x[res$order, ])
@

\begin{figure}
\centering
\includegraphics[width=8cm]{dbscan-opticsOrder}
\caption{OPTICS order of data points represented as a line.}
\label{fig:opticsOrder}
\end{figure}

Figure~\ref{fig:opticsOrder} shows that points in each cluster are visited in consecutive order, starting with the points in the center (the densest region) and then the points in the surrounding area.

As noted in Section~\ref{sec:optics}, OPTICS has two primary cluster extraction methods that use the ordered reachability structure it produces. A DBSCAN-type clustering can be extracted using \code{extractDBSCAN()} by specifying the global \code{eps} parameter. The reachability plot in Figure~\ref{fig:opticsReachPlot} shows four peaks, i.e., points with a high reachability-distance. These points indicate the boundaries between the four clusters. An \code{eps} threshold that separates the four clusters can be determined visually. In this case we use an \code{eps\_cl} of 0.065.

<>=
res <- extractDBSCAN(res, eps_cl = .065)
plot(res)
@

<>=
hullplot(x, res)
@

\begin{figure}
\centering
\includegraphics{dbscan-extractDBSCANReachPlot2}
\caption{Reachability plot for a DBSCAN-type clustering extracted at a global $\epsilon = 0.065$, resulting in four clusters.}
\label{fig:extractDBSCANReachPlot2}
\centering
\includegraphics[width=9cm]{dbscan-extractDBSCANHullPlot2}
\caption{Convex hull plot for a DBSCAN-type clustering extracted at a global $\epsilon = 0.065$, resulting in four clusters.}
\label{fig:extractDBSCANHullPlot2}
\end{figure}

The resulting reachability plot and corresponding clusters are shown in Figures~\ref{fig:extractDBSCANReachPlot2} and \ref{fig:extractDBSCANHullPlot2}.
The clustering closely resembles the original structure of the four clusters with which the data were generated, with the only difference being that points on the boundary of the clusters are marked as noise points.

\pkg{dbscan} also provides \code{extractXi()} to extract a hierarchical cluster structure. We use here a \code{xi} value of 0.05.

<<>>=
res <- extractXi(res, xi = 0.05)
res
@

The $\xi$ method results in a hierarchical clustering structure, and thus points can be members of several nested clusters. Clusters are represented as contiguous ranges in the reachability plot and are available in the field \code{clusters\_xi}.

<<>>=
res$clusters_xi
@

Here we have seven clusters. The clusters are also visible in the reachability plot.

<>=
plot(res)
@

<>=
hullplot(x, res)
@

\begin{figure}
\centering
\includegraphics{dbscan-extractXiReachPlot}
\caption{Reachability plot of a hierarchical clustering extracted with Extract-$\xi$.}
\label{fig:extractXiReachPlot}
%\end{figure}
%\begin{figure}[htb]
\centering
\includegraphics[width=9cm]{dbscan-extractXiHullPlot}
\caption{Convex hull plot of a hierarchical clustering extracted with Extract-$\xi$.}
\label{fig:extractXiHullPlot}
\end{figure}

Figure~\ref{fig:extractXiReachPlot} shows the reachability plot with clusters represented using colors and vertical bars below the plot. The clusters themselves can also be plotted with the convex hull plot function, shown in Figure~\ref{fig:extractXiHullPlot}. Note how the nested structure is shown by clusters inside of clusters. Also note that the convex hull, while useful for visualization, may contain points that are not considered part of the cluster.
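The ranges in \code{clusters\_xi} can be processed further. For example, the span of each $\xi$-cluster, i.e., the number of points between its \code{start} and \code{end} position in the OPTICS order, can be computed directly (a sketch, assuming the \code{start} and \code{end} columns of the data frame shown above):

\begin{Schunk}
\begin{Sinput}
cl <- res$clusters_xi
with(cl, end - start + 1)    # number of points spanned by each cluster
\end{Sinput}
\end{Schunk}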
%\subsection{LOF} %The Local Outlier Factor score can be computed as follows %\ifdefined\USESWEAVE %<<>>= %lof <- lof(x, k=3) %summary(lof) %@ %The distribution of outlier factors can be view simply using the specialized hist function: %<>= %hist(lof, breaks=20) %@ %\begin{figure} % \centering % \includegraphics{dbscan-LOF_hist} % \caption{LOF outlier histogram.} % \label{fig:LOF_hist} %\end{figure} % %The outlier factor can be visualized in a scatter plot through the following: %<>= %plot(x, pch = ".", main = "LOF (k=3)") %points(x, cex = (lof-1)*3, pch = 1, col="red") %text(x[lof>2,], labels = round(lof, 1)[lof>2], pos = 3) %@ %\begin{figure} % \centering % \includegraphics[width=9cm]{dbscan-LOF_plot} % \caption{Visualization of the local outlier factor of each point in the data set.} % \label{fig:LOF_plot} %\end{figure} %\else %\fi \subsection{Reachability and Dendrograms} %The \pkg{dbscan} package contains a variety of visualization options. Reachability plots can be converted into equivalent dendrograms \citep{sander2003automatic}. \pkg{dbscan} contains a fast implementation of the reachability-to-dendrogram conversion algorithm through the use of a disjoint-set data structure~\citep{cormen2001introduction, patwary2010experiments}, allowing the user to choose which hierarchical representation they prefer. The conversion algorithm can be directly called for OPTICS objects using the coercion method \code{as.dendrogram()}. <<>>= dend <- as.dendrogram(res) dend @ The dendrogram can be plotted using the standard plot method. <>= plot(dend, ylab = "Reachability dist.", leaflab = "none") @ \begin{figure}[t] \centering \includegraphics{dbscan-opticsDendrogram} \caption{Dendrogram structure of OPTICS reordering.} \label{fig:opticsDendrogram} \end{figure} Note how the dendrogram in Figure~\ref{fig:opticsDendrogram} closely resembles the reachability plots with added binary splits. 
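Because the result is a base \proglang{R} dendrogram, standard \pkg{stats} functionality applies. For example, \code{cut()} splits the dendrogram at a given reachability height (a sketch; the height 0.065 reuses the threshold from the DBSCAN-type extraction above):

\begin{Schunk}
\begin{Sinput}
parts <- cut(dend, h = 0.065)
length(parts$lower)    # number of branches below the cut height
\end{Sinput}
\end{Schunk}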
Since the object is a standard dendrogram (from package \pkg{stats}), it can be used like any other dendrogram created with hierarchical clustering.

\section{Performance Comparison}\label{sec:eval}

\begin{table}
\begin{center}
\begin{tabular}{ c c c }
\hline
{\bf Data set} & \bf{Size} & \bf{Dimension}\\
\hline
Aggregation & 788 & 2\\
Compound & 399 & 2\\
D31 & 3,100 & 2 \\
flame & 240 & 2 \\
jain & 373 & 2 \\
pathbased & 300 & 2 \\
R15 & 600 & 2 \\
s1 & 5,000 & 2 \\
s4 & 5,000 & 2 \\
spiral & 312 & 2\\
t4.8k & 8,000 & 2 \\
synth1 & 1,000 & 3 \\
synth2 & 1,000 & 10 \\
synth3 & 1,000 & 100 \\
\hline
\end{tabular}
\end{center}
\caption{Datasets used for comparison.}
\label{tab:dsizes}
\end{table}

Finally, we evaluate the performance of \pkg{dbscan}'s implementation of DBSCAN and OPTICS against other open-source implementations. This is not a comprehensive evaluation study, but demonstrates the performance of \pkg{dbscan}'s DBSCAN and OPTICS implementations on datasets of varying sizes as compared to other software packages. A comparative test was performed using both the DBSCAN and OPTICS algorithms, where supported, for the libraries listed in Table~\ref{tab:comp} on page~\pageref{tab:comp}. The datasets used and their sizes are listed in Table~\ref{tab:dsizes}. The data sets tested include s1 and s4, randomly generated but moderately separated Gaussian clusters often used for agglomerative cluster analysis~\citep{Ssets}, the R15 validation data set used for the maximum variance based clustering approach by \cite{veenman2002maximum}, and the well-known spatial data set t4.8k used for validation of the CHAMELEON algorithm~\citep{karypis1999chameleon}, along with a variety of shape data sets commonly found in clustering validation papers~\citep{gionis2007clustering, zahn1971graph, chang2008robust, jain2005law, fu2007flame}.
In 2019, we performed a comparison between \pkg{dbscan} 0.9-8, \pkg{fpc} 2.1-10, ELKI version 0.7, PyClustering 0.6.6, SPMF v2.10, WEKA 3.8.0, and SciKit-Learn 0.17.1 on a MacBook Pro equipped with a 2.5 GHz Intel Core i7 processor, running OS X El Capitan 10.11.6. Note that newer versions of all mentioned software packages have been released since then; changes in data structures and added optimizations may have resulted in significant runtime improvements for individual packages. All data sets were normalized to the unit interval, $[0, 1]$, per dimension to standardize neighbor queries. For all data sets we used $\mathit{minPts} = 2$ and $\epsilon = 0.10$ for DBSCAN. For OPTICS, $\mathit{minPts} = 2$ with a large $\epsilon = 1$ was used. We replicated each run for each data set 15 times and report the average runtime here.

Figures~\ref{fig:dbscan_bench} and \ref{fig:optics_bench} show the runtimes. The datasets are sorted from easiest to hardest, and the algorithms in the legend are sorted from fastest to slowest on average. Dimensionality, the distance function used, data set size, and other data characteristics have a substantial impact on runtime performance. The results show that the implementation in \pkg{dbscan} compares very favorably to the other implementations (but note that we did not enable data indexing in ELKI, and used a very small $\mathit{minPts}$).

\begin{figure}
\centering
\includegraphics[width=0.80\textwidth]{figures/dbscan_benchmark}
\caption{Runtime of DBSCAN in milliseconds (y-axis, logarithmic scale) vs. the name of the data set tested (x-axis).}
\label{fig:dbscan_bench}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=0.80\textwidth]{figures/optics_benchmark}
\caption{Runtime of OPTICS in milliseconds (y-axis, logarithmic scale) vs.
the name of the data set tested (x-axis).}
\label{fig:optics_bench}
\end{figure}

%\clearpage
\section{Concluding Remarks}\label{sec:conc}

The \pkg{dbscan} package offers a set of scalable, robust, and complete implementations of popular density-based clustering algorithms from the DBSCAN family. The main features of \pkg{dbscan} are a simple interface to fast clustering and cluster extraction algorithms, as well as extensible data structures and methods for both density-based clustering visualization and representation, including efficient conversion algorithms between the OPTICS ordering and dendrograms. In addition to DBSCAN and OPTICS discussed in this paper, \pkg{dbscan} also contains a fast version of the local outlier factor (LOF) algorithm~\citep{breunig2000lof}, and an implementation of HDBSCAN~\citep{campello2015hierarchical} is under development.

\section{Acknowledgments}

This work is partially supported by industrial and government partners at the Center for Surveillance Research, a National Science Foundation I/UCRC.

%\clearpage
\bibliography{dbscan}

\clearpage
\appendix

\section{Technical Note on OPTICS cluster extraction}\label{sec:technote}

Of the two cluster extraction methods outlined in the original publication, the flat DBSCAN-type extraction method seems to remain the de facto clustering method implemented for OPTICS across most statistical software. However, this method does not provide any advantage over the original DBSCAN method. To the best of the authors' knowledge, the only (other) library that has implemented the Extract-$\xi$ method for finding $\xi$-clusters is the Environment for Developing KDD-Applications Supported by Index Structures (ELKI) \citep{DBLP:journals/pvldb/SchubertKEZSZ15}.
Much of the reason why nearly every statistical computing framework has neglected the Extract-$\xi$ cluster method may stem from the fact that the original specification (Figure~19 in~\cite{ankerst1999optics}), while mostly complete, lacks important corrections that otherwise produce artifacts when clustering data~\citep{DBLP:conf/lwa/SchubertG18}. In the original specification of the algorithm, the `dents' of the ordering structure OPTICS produces are scanned for significant changes in reachability (hence the $\xi$ threshold), where clusters are represented by contiguous ranges of points that are distinguished by $1 - \xi$ density-reachability changes in the reachability plot. It is possible, however, after the recursive completion of the \code{update} algorithm (Figure~7 in~\cite{ankerst1999optics}), that the next point processed in the ordering is not actually within the reachability distance of the other members of the cluster currently being processed.

To account for the missing details described above, Erich Schubert introduced a small postprocessing step, first added in the ELKI framework and published much later~\citep{DBLP:conf/lwa/SchubertG18}. This filter corrects the artifacts based on the predecessor of each point~\citep{DBLP:conf/lwa/SchubertG18}, thus improving on the $\xi$-cluster method as described in the original OPTICS paper. This correction was not introduced until version 0.7.0 of the ELKI framework, released in 2015, 16 years after the original publication of OPTICS and the Extract-$\xi$ method, and was not published in written form until 2018. \pkg{dbscan} has incorporated these important changes in \code{extractXi()} via the option \code{correctPredecessors}, which is enabled by default.
%% Not included to keep things simple % To further complicate the status of the \opxi algorithm's existing % implementations, the current ELKI implementation, aside from the predecessor % correction, does not match the original specification of the OPTICS algorithm. % Mentioned by~\cite{ankerst1999optics}, \opxi should not include the last % point of a steep-up area inside of each cluster range\footnote{We alerted the % authors of ELKI to our correction, which is to be included in the next major % release.}. The differences on even a small, randomly generated dataset % are shown on Figures~\ref{fig:dbscan_xi} and \ref{fig:elki_xi} using the % \pkg{dbscan} package result. Thus, \pkg{dbscan} offers complete, a correct % \opxi implementation, true to the original specification. % % \begin{figure} % \centering % \begin{minipage}[t]{0.48\textwidth} % \includegraphics[width=\textwidth]{figures/dbscan_xi_bare} % \caption{Excluding the last point in the steep-up area.} % \label{fig:dbscan_xi} % \end{minipage} % \hfill % \begin{minipage}[t]{0.48\textwidth} % \includegraphics[width=\textwidth]{figures/elki_xi_bare} % \caption{Including the last point in the steep-up area. Note the sharp edges caused by points that are clearly not density-connected to their respective clusters.} % \label{fig:elki_xi} % \end{minipage} % \end{figure} % % Much of the complication stems from the fact that the original specification of the \opxi extraction method defined in the paper (Figure 19 of~\cite{}), while mostly complete, lacks important corrections that otherwise produces many artifacts when clustering data. In the original specification of the \opxi algorithm, points within the ``dents'' of the ordering structure represent collections of spatially dense neighborhoods. Its possible, however, after OPTICS finishes ordering a spatially close cluster, that the next point included in the ordering may not be a member of current cluster (there are no more points in the current cluster to add). 
This can be remedied by pruning an area of each cluster known as the steep-up area (see Figure 19 in \citep{ankerst1999optics} for details) of points that do not contain predecessors within the same cluster. \end{document} ================================================ FILE: vignettes/dbscan.bib ================================================ @Article{hahsler2019dbscan, title = {{dbscan}: Fast Density-Based Clustering with {R}}, author = {Michael Hahsler and Matthew Piekenbrock and Derek Doran}, journal = {Journal of Statistical Software}, year = {2019}, volume = {91}, number = {1}, pages = {1--30}, doi = {10.18637/jss.v091.i01}, } @inproceedings{ester1996density, title={A density-based algorithm for discovering clusters in large spatial databases with noise.}, author={Ester, Martin and Kriegel, Hans-Peter and Sander, J{\"o}rg and Xu, Xiaowei and others}, booktitle={Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96)}, pages={226--231}, year={1996}, url = {https://dl.acm.org/doi/10.5555/3001460.3001507} } @Manual{dbscan-R, title = {dbscan: Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms}, author = {Michael Hahsler and Matthew Piekenbrock}, note = {R package version 0.9-8.2}, year={2016} } %% Original OPTICS paper %% ----------------------------------------------------------------------------- @inproceedings{ankerst1999optics, title={OPTICS: ordering points to identify the clustering structure}, author={Ankerst, Mihael and Breunig, Markus M and Kriegel, Hans-Peter and Sander, J{\"o}rg}, booktitle={ACM Sigmod Record}, volume={28}, number={2}, pages={49--60}, year={1999}, organization={ACM}, doi = {10.1145/304181.304187} } % OPTICS cluster extraction improvements % ----------------------------------------------------------------------------- @inproceedings{DBLP:conf/lwa/SchubertG18, author = {Erich Schubert and Michael Gertz}, title = {Improving the Cluster Structure Extracted from {OPTICS} 
Plots}, booktitle = {Lernen, Wissen, Daten, Analysen (LWDA 2018)}, series = {{CEUR} Workshop Proceedings}, volume = {2191}, pages = {318--329}, publisher = {CEUR-WS.org}, year = {2018} } % Original LOF paper % ----------------------------------------------------------------------------- @inproceedings{breunig2000lof, title={LOF: identifying density-based local outliers}, author={Breunig, Markus M and Kriegel, Hans-Peter and Ng, Raymond T and Sander, J{\"o}rg}, booktitle={ACM Int. Conf. on Management of Data}, volume={29}, number={2}, pages={93--104}, year={2000}, organization={ACM}, doi = {10.1145/335191.335388} } % 2003 Reachability <--> Dendrograms Conversions Paper % ----------------------------------------------------------------------------- @inproceedings{sander2003automatic, title={Automatic extraction of clusters from hierarchical clustering representations}, author={Sander, J{\"o}rg and Qin, Xuejie and Lu, Zhiyong and Niu, Nan and Kovarsky, Alex}, booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining}, pages={75--87}, year={2003}, organization={Springer} } % Original BIRCH paper % ----------------------------------------------------------------------------- @inproceedings{zhang96, title={BIRCH: an efficient data clustering method for very large databases}, author={Zhang, Tian and Ramakrishnan, Raghu and Livny, Miron}, booktitle={ACM Sigmod Record}, volume={25}, number={2}, pages={103--114}, year={1996}, organization={ACM} } % GDBSCAN Paper (Generalized DBSCAN, by Sanders) % ----------------------------------------------------------------------------- @article{sander1998density, title={Density-based clustering in spatial databases: The algorithm gdbscan and its applications}, author={Sander, J{\"o}rg and Ester, Martin and Kriegel, Hans-Peter and Xu, Xiaowei}, journal={Data mining and knowledge discovery}, volume={2}, number={2}, pages={169--194}, year={1998}, publisher={Springer} } % HDBSCAN* Newest Paper % 
----------------------------------------------------------------------------- @article{campello2015hierarchical, title={Hierarchical density estimates for data clustering, visualization, and outlier detection}, author={Campello, Ricardo JGB and Moulavi, Davoud and Zimek, Arthur and Sander, Joerg}, journal={ACM Transactions on Knowledge Discovery from Data (TKDD)}, volume={10}, number={1}, pages={5}, year={2015}, publisher={ACM}, doi = {10.1145/2733381} } % First HDBSCAN* introduction paper, later revised in 2015. The newer one is better. % ----------------------------------------------------------------------------- @inproceedings{campello2013density, title={Density-based clustering based on hierarchical density estimates}, author={Campello, Ricardo JGB and Moulavi, Davoud and Sander, J{\"o}rg}, booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining}, pages={160--172}, year={2013}, organization={Springer}, doi = {10.1007/978-3-642-37456-2_14} } % The new-ish 'Standard Methodology' paper of that 'tackles the methodological drawbacks' % of internal clustering validation % ----------------------------------------------------------------------------- @article{gurrutxaga2011towards, title={Towards a standard methodology to evaluate internal cluster validity indices}, author={Gurrutxaga, Ibai and Muguerza, Javier and Arbelaitz, Olatz and P{\'e}rez, Jes{\'u}s M and Mart{\'\i}n, Jos{\'e} I}, journal={Pattern Recognition Letters}, volume={32}, number={3}, pages={505--515}, year={2011}, publisher={Elsevier} } % Original ABACUS - Workaround implementation of mixture modeling for finding % arbitrary shapes % ----------------------------------------------------------------------------- @article{gegick2011abacus, title={ABACUS: mining arbitrary shaped clusters from large datasets based on backbone identification}, author={Gegick, M}, year={2011}, publisher={SIAM} } % Original Silhouette Index Paper % 
----------------------------------------------------------------------------- @article{rousseeuw1987silhouettes, title={Silhouettes: a graphical aid to the interpretation and validation of cluster analysis}, author={Rousseeuw, Peter J}, journal={Journal of computational and applied mathematics}, volume={20}, pages={53--65}, year={1987}, publisher={Elsevier} } % Extensive Comparative Study of IVMS % ----------------------------------------------------------------------------- @article{arbelaitz2013extensive, title={An extensive comparative study of cluster validity indices}, author={Arbelaitz, Olatz and Gurrutxaga, Ibai and Muguerza, Javier and P{\'e}rez, Jes{\'u}S M and Perona, I{\~n}Igo}, journal={Pattern Recognition}, volume={46}, number={1}, pages={243--256}, year={2013}, publisher={Elsevier} } % Graph Theory measures for Internal Cluster Validation % ----------------------------------------------------------------------------- @article{pal1997cluster, title={Cluster validation using graph theoretic concepts}, author={Pal, Nikhil R and Biswas, J}, journal={Pattern Recognition}, volume={30}, number={6}, pages={847--857}, year={1997}, publisher={Elsevier} } % Rankings of research papers by citation count; used for showing DBSCAN % popularity % ----------------------------------------------------------------------------- @misc{acade96:online, author = {{Microsoft Academic Search}}, title = {Top publications in data mining}, month = {}, year = {2016}, note = {(Accessed on 08/29/2016)} } @article{PyCluste54:online, doi = {10.21105/joss.01230}, url = {https://doi.org/10.21105/joss.01230}, year = {2019}, publisher = {The Open Journal}, volume = {4}, number = {36}, pages = {1230}, author = {Novikov, Andrei V.}, title = {PyClustering: Data Mining Library}, journal = {Journal of Open Source Software} } % Hartigans convex density estimation model % ----------------------------------------------------------------------------- @article{hartigan1987estimation, 
title={Estimation of a convex density contour in two dimensions}, author={Hartigan, JA}, journal={Journal of the American Statistical Association}, volume={82}, number={397}, pages={267--270}, year={1987}, publisher={Taylor \& Francis} } % Bentleys Original KDTree Paper % ----------------------------------------------------------------------------- @article{bentley1975multidimensional, title={Multidimensional binary search trees used for associative searching}, author={Bentley, Jon Louis}, journal={Communications of the ACM}, volume={18}, number={9}, pages={509--517}, year={1975}, publisher={ACM} } % Original CLARANS paper % ----------------------------------------------------------------------------- @article{ng2002clarans, title={CLARANS: A method for clustering objects for spatial data mining}, author={Ng, Raymond T. and Han, Jiawei}, journal={IEEE transactions on knowledge and data engineering}, volume={14}, number={5}, pages={1003--1016}, year={2002}, publisher={IEEE} } % Original DENCLUE paper % ----------------------------------------------------------------------------- @inproceedings{hinneburg1998efficient, title={An efficient approach to clustering in large multimedia databases with noise}, author={Hinneburg, Alexander and Keim, Daniel A}, booktitle={KDD}, volume={98}, pages={58--65}, year={1998} } % Original Chameleon Paper % ----------------------------------------------------------------------------- @article{karypis1999chameleon, title={Chameleon: Hierarchical clustering using dynamic modeling}, author={Karypis, George and Han, Eui-Hong and Kumar, Vipin}, journal={Computer}, volume={32}, number={8}, pages={68--75}, year={1999}, publisher={IEEE} } % Original CURE algorithm % ----------------------------------------------------------------------------- @inproceedings{guha1998cure, title={CURE: an efficient clustering algorithm for large databases}, author={Guha, Sudipto and Rastogi, Rajeev and Shim, Kyuseok}, booktitle={ACM SIGMOD Record}, volume={27}, 
number={2}, pages={73--84}, year={1998}, organization={ACM} } % R statistical computing language citation % ----------------------------------------------------------------------------- @article{team2013r, title={R: A language and environment for statistical computing}, author={Team, R Core and others}, year={2013}, publisher={Vienna, Austria} } % WEKA % ----------------------------------------------------------------------------- @article{hall2009weka, title={The WEKA data mining software: an update}, author={Hall, Mark and Frank, Eibe and Holmes, Geoffrey and Pfahringer, Bernhard and Reutemann, Peter and Witten, Ian H}, journal={ACM SIGKDD explorations newsletter}, volume={11}, number={1}, pages={10--18}, year={2009}, publisher={ACM} } % SPMF Java Machine Learning Library % ----------------------------------------------------------------------------- @article{fournier2014spmf, title={SPMF: a Java open-source pattern mining library.}, author={Fournier-Viger, Philippe and Gomariz, Antonio and Gueniche, Ted and Soltani, Azadeh and Wu, Cheng-Wei and Tseng, Vincent S and others}, journal={Journal of Machine Learning Research}, volume={15}, number={1}, pages={3389--3393}, year={2014} } % Python Scikit Learn % ----------------------------------------------------------------------------- @article{pedregosa2011scikit, title={Scikit-learn: Machine learning in Python}, author={Pedregosa, Fabian and Varoquaux, Ga{\"e}l and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and others}, journal={Journal of Machine Learning Research}, volume={12}, number={Oct}, pages={2825--2830}, year={2011} } % MATLAB TOMCAT Toolkit % ----------------------------------------------------------------------------- @article{daszykowski2007tomcat, title={TOMCAT: A MATLAB toolbox for multivariate calibration techniques}, author={Daszykowski, Micha{\l} and Serneels, Sven and Kaczmarek, 
Krzysztof and Van Espen, Piet and Croux, Christophe and Walczak, Beata}, journal={Chemometrics and intelligent laboratory systems}, volume={85}, number={2}, pages={269--277}, year={2007}, publisher={Elsevier} }

% OPTICS code for TOMCAT
% -----------------------------------------------------------------------------
@article{daszykowski2002looking, title={Looking for natural patterns in analytical data. 2. Tracing local density with OPTICS}, author={Daszykowski, Michael and Walczak, Beata and Massart, Desire L}, journal={Journal of chemical information and computer sciences}, volume={42}, number={3}, pages={500--507}, year={2002}, publisher={ACS Publications} }

% Java ML library
% -----------------------------------------------------------------------------
@article{abeel2009journal,
  title = {Java-ML: A Machine Learning Library},
  author = {Abeel, Thomas and Van de Peer, Yves and Saeys, Yvan},
  journal = {Journal of Machine Learning Research},
  volume = {10},
  pages = {931--934},
  year = {2009}
}

% ELKI
% -----------------------------------------------------------------------------
@article{DBLP:journals/pvldb/SchubertKEZSZ15, author = {Erich Schubert and Alexander Koos and Tobias Emrich and Andreas Z{\"{u}}fle and Klaus Arthur Schmid and Arthur Zimek}, title = {A Framework for Clustering Uncertain Data}, journal = {{PVLDB}}, volume = {8}, number = {12}, pages = {1976--1979}, year = {2015}, url = {http://www.vldb.org/pvldb/vol8/p1976-schubert.pdf}, timestamp = {Mon, 30 May 2016 12:01:10 +0200}, biburl = {http://dblp.uni-trier.de/rec/bib/journals/pvldb/SchubertKEZSZ15}, bibsource = {dblp computer science bibliography, http://dblp.org} }

% BIRCH CRAN records
% -----------------------------------------------------------------------------
@misc{CRANPack84:online, author={CRAN}, title = {CRAN - Package birch}, howpublished =
{\url{https://cran.r-project.org/web/packages/birch/index.html}}, month = {}, year = {2016}, note = {(Accessed on 09/16/2016)} } % Spectral Clustering % ---------------------------------------------------------------------------- @inproceedings{dhillon2004kernel, title={Kernel k-means: spectral clustering and normalized cuts}, author={Dhillon, Inderjit S and Guan, Yuqiang and Kulis, Brian}, booktitle={Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining}, pages={551--556}, year={2004}, organization={ACM} } % Disjoint-set data structure (2 citations) % ----------------------------------------------------------------------------- @misc{cormen2001introduction, title={Introduction to algorithms second edition}, author={Cormen, Thomas H and Leiserson, Charles E and Rivest, Ronald L and Stein, Clifford}, year={2001}, publisher={The MIT Press} } @inproceedings{patwary2010experiments, title={Experiments on union-find algorithms for the disjoint-set data structure}, author={Patwary, Md Mostofa Ali and Blair, Jean and Manne, Fredrik}, booktitle={International Symposium on Experimental Algorithms}, pages={411--423}, year={2010}, organization={Springer} } % SUBCLU high-dimensional density based clustering % ----------------------------------------------------------------------------- @inproceedings{kailing2004density, title={Density-connected subspace clustering for high-dimensional data}, author={Kailing, Karin and Kriegel, Hans-Peter and Kr{\"o}ger, Peer}, booktitle={Proc. 
SDM}, volume={4}, year={2004}, organization={SIAM} } % DBSCAN KDD Test of Time award % ----------------------------------------------------------------------------- @misc{SIGKDDNe30:online, author = {SIGKDD}, title = {SIGKDD News : 2014 SIGKDD Test of Time Award}, howpublished = {\url{https://www.kdd.org/News/view/2014-sigkdd-test-of-time-award}}, month = {}, year = {2014}, note = {(Accessed on 10/10/2016)} } % Raftery and Fraley's model-based clustering paper % ----------------------------------------------------------------------------- @article{fraley2002model, title={Model-based clustering, discriminant analysis, and density estimation}, author={Fraley, Chris and Raftery, Adrian E}, journal={Journal of the American statistical Association}, volume={97}, number={458}, pages={611--631}, year={2002}, publisher={Taylor \& Francis} } % FPC: Flexible Procedures for Clustering % ----------------------------------------------------------------------------- @Manual{fpc, title = {fpc: Flexible Procedures for Clustering}, author = {Christian Hennig}, year = {2015}, note = {R package version 2.1-10}, url = {https://CRAN.R-project.org/package=fpc}, } % From the ELKI Benchmarking page % ----------------------------------------------------------------------------- @article{kriegel2016black, title={The (black) art of runtime evaluation: Are we comparing algorithms or implementations?}, author={Kriegel, Hans-Peter and Schubert, Erich and Zimek, Arthur}, journal={Knowledge and Information Systems}, pages={1--38}, year={2016}, publisher={Springer} } % ANN Library % ----------------------------------------------------------------------------- @manual{mount1998ann, title={ANN: library for approximate nearest neighbour searching}, author={Mount, David M and Arya, Sunil}, year={2010}, url = {http://www.cs.umd.edu/~mount/ANN/}, } % Rcpp % ----------------------------------------------------------------------------- @article{eddelbuettel2011rcpp, title={Rcpp: Seamless R and C++ 
integration}, author={Eddelbuettel, Dirk and Fran{\c{c}}ois, Romain and Allaire, J and Chambers, John and Bates, Douglas and Ushey, Kevin}, journal={Journal of Statistical Software}, volume={40}, number={8}, pages={1--18}, year={2011} } % ST-DBCAN: SpatioTemporal DBSCAN % ----------------------------------------------------------------------------- @article{birant2007st, title={ST-DBSCAN: An algorithm for clustering spatial--temporal data}, author={Birant, Derya and Kut, Alp}, journal={Data \& Knowledge Engineering}, volume={60}, number={1}, pages={208--221}, year={2007}, publisher={Elsevier} } % DBSCAN History (small relative to actual number of extensions) % ----------------------------------------------------------------------------- @inproceedings{rehman2014dbscan, title={DBSCAN: Past, present and future}, author={Rehman, Saif Ur and Asghar, Sohail and Fong, Simon and Sarasvady, S}, booktitle={Applications of Digital Information and Web Technologies (ICADIWT), 2014 Fifth International Conference on the}, pages={232--238}, year={2014}, organization={IEEE} } %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Miscellaneous % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% @article{Gupta2010, abstract = {A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster into one or more groups, and the rest need to be ignored. For such situations, we present Automated Hierarchical Density Shaving (Auto-HDS), a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. Auto-HDS can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criteria. 
Our framework also provides a simple yet powerful 2D visualization of the hierarchy of clusters that is useful for further interactive exploration. We present results on Gasch and Lee microarray data sets to show the effectiveness of our methods. Additional results on other biological data are included in the supplemental material.}, author = {Gupta, Gunjan and Liu, Alexander and Ghosh, Joydeep}, doi = {10.1109/TCBB.2008.32}, file = {:Users/mpiekenbrock/ResearchLibrary/Automated Hierarchical Density Shaving- A Robust Automated Clustering and Visualization Framework for Large Biological Data Sets.pdf:pdf}, isbn = {1557-9964}, issn = {15455963}, journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics}, keywords = {Bioinformatics,Clustering,Data and knowledge visualization,Mining methods and algorithms}, number = {2}, pages = {223--237}, pmid = {20431143}, title = {{Automated hierarchical density shaving: A robust automated clustering and visualization framework for large biological data sets}}, volume = {7}, year = {2010} } @article{Ssets, author = {P. Fr\"anti and O. 
Virmajoki}, title = {Iterative shrinking method for clustering problems}, journal = {Pattern Recognition}, year = {2006}, volume = {39}, number = {5}, pages = {761--765} }

% Path and Spiral based
@article{chang2008robust, title={Robust path-based spectral clustering}, author={Chang, Hong and Yeung, Dit-Yan}, journal={Pattern Recognition}, volume={41}, number={1}, pages={191--203}, year={2008}, publisher={Elsevier} }

% Compound dataset
@article{zahn1971graph, title={Graph-theoretical methods for detecting and describing gestalt clusters}, author={Zahn, Charles T}, journal={IEEE Transactions on computers}, volume={100}, number={1}, pages={68--86}, year={1971}, publisher={IEEE} }

% Aggregation dataset
@article{gionis2007clustering, title={Clustering aggregation}, author={Gionis, Aristides and Mannila, Heikki and Tsaparas, Panayiotis}, journal={ACM Transactions on Knowledge Discovery from Data (TKDD)}, volume={1}, number={1}, pages={4}, year={2007}, publisher={ACM} }

% R15 dataset
@article{veenman2002maximum, title={A maximum variance cluster algorithm}, author={Veenman, Cor J. and Reinders, Marcel J. T. and Backer, Eric}, journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, volume={24}, number={9}, pages={1273--1280}, year={2002}, publisher={IEEE} }

@inproceedings{reilly2010detection, title={Detection and tracking of large number of targets in wide area surveillance}, author={Reilly, Vladimir and Idrees, Haroon and Shah, Mubarak}, booktitle={European Conference on Computer Vision}, pages={186--199}, year={2010}, organization={Springer} }

@inproceedings{jain2005law, title={Data clustering: a user’s dilemma}, author={Jain, Anil K and Law, Martin H. C.}, booktitle={Proceedings of the First international conference on Pattern Recognition and Machine Intelligence}, year={2005} }

@article{jain1999review, author = {Jain, A. K. and Murty, M. N. and Flynn, P. J.}, title = {Data Clustering: A Review}, journal = {ACM Computing Surveys}, issue_date = {Sept.
1999}, volume = {31}, number = {3}, month = sep, year = {1999}, issn = {0360-0300}, pages = {264--323}, numpages = {60}, url = {http://doi.acm.org/10.1145/331499.331504}, doi = {10.1145/331499.331504}, acmid = {331504}, publisher = {ACM}, address = {New York, NY, USA}, } % Flame data set @article{fu2007flame, title={FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data}, author={Fu, Limin and Medico, Enzo}, journal={BMC Bioinformatics}, volume={8}, number={1}, pages={1}, year={2007}, publisher={BioMed Central} } % Birch dataset @article{Birchsets, author = {T. Zhang and R. Ramakrishnan and M. Livny}, title = {BIRCH: A new data clustering algorithm and its applications}, journal = {Data Mining and Knowledge Discovery}, year = {1997}, volume = {1}, number = {2}, pages = {141--182} } @inproceedings{kisilevich2010p, title={P-DBSCAN: a density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos}, author={Kisilevich, Slava and Mansmann, Florian and Keim, Daniel}, booktitle={Proceedings of the 1st international conference and exhibition on computing for geospatial research \& application}, pages={38}, year={2010}, organization={ACM} } @inproceedings{celebi2005mining, title={Mining biomedical images with density-based clustering}, author={Celebi, M Emre and Aslandogan, Y Alp and Bergstresser, Paul R}, booktitle={International Conference on Information Technology: Coding and Computing (ITCC'05)-Volume II}, volume={1}, pages={163--168}, year={2005}, organization={IEEE} } @inproceedings{ertoz2003finding, title={Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data.}, author={Ert{\"o}z, Levent and Steinbach, Michael and Kumar, Vipin}, booktitle={SDM}, pages={47--58}, year={2003}, organization={SIAM} } @article{Chen2014, author = {Chen, W and Ji, M H and Wang, J M}, doi = {10.3991/ijoe.v10i6.3881}, file = 
{:Users/mpiekenbrock/ResearchLibrary/TDBSCAN.pdf:pdf}, issn = {18612121}, journal = {International Journal of Online Engineering}, keywords = {Density-based clustering,Personal travel trajectory,T-DBSCAN,Trip segmentation}, number = {6}, pages = {19--24}, title = {{T-DBSCAN: A spatiotemporal density clustering for GPS trajectory segmentation}}, volume = {10}, year = {2014} } @incollection{sander2011density, title={Density-based clustering}, author={Sander, Joerg}, booktitle={Encyclopedia of Machine Learning}, pages={270--273}, year={2011}, publisher={Springer} } % 88 citations @article{verma2012comparative, title={A comparative study of various clustering algorithms in data mining}, author={Verma, Manish and Srivastava, Mauly and Chack, Neha and Diswar, Atul Kumar and Gupta, Nidhi}, journal={International Journal of Engineering Research and Applications (IJERA)}, volume={2}, number={3}, pages={1379--1384}, year={2012} } @inproceedings{roy2005approach, title={An approach to find embedded clusters using density based techniques}, author={Roy, Swarup and Bhattacharyya, DK}, booktitle={International Conference on Distributed Computing and Internet Technology}, pages={523--535}, year={2005}, organization={Springer} } @inproceedings{chowdhury2010efficient, title={An efficient method for subjectively choosing parameter ‘k’automatically in VDBSCAN (Varied Density Based Spatial Clustering of Applications with Noise) algorithm}, author={Chowdhury, AK M Rasheduzzaman and Mollah, Md Elias and Rahman, Md Asikur}, booktitle={Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference on}, volume={1}, pages={38--41}, year={2010}, organization={IEEE} } @inproceedings{ghanbarpour2014exdbscan, title={EXDBSCAN: An extension of DBSCAN to detect clusters in multi-density datasets}, author={Ghanbarpour, Asieh and Minaei, Behrooz}, booktitle={Intelligent Systems (ICIS), 2014 Iranian Conference on}, pages={1--5}, year={2014}, organization={IEEE} } 
@inproceedings{vijayalakshmi2010improved, title={Improved varied density based spatial clustering algorithm with noise}, author={Vijayalakshmi, S and Punithavalli, M}, booktitle={Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on}, pages={1--4}, year={2010}, organization={IEEE} } @article{Wang2013, author = {Wang, Wei}, file = {:Users/mpiekenbrock/Downloads/905067f5314e6073d4779c11572bd8c5.pdf:pdf}, isbn = {978-0-9891305-0-9}, keywords = {clustering algorithm,clustering techniques,data mining,derivative,global optimum k,similarity,similarity and minimizes intergroup,there are four basic,vdbscan}, pages = {225--228}, title = {{Improved VDBSCAN With Global Optimum K}}, year = {2013} } @article{parvez2012data, title={Data set property based ‘K’in VDBSCAN Clustering Algorithm}, author={Parvez, Abu Wahid Md Masud}, journal={World of Computer Science and Information Technology Journal (WCSIT)}, volume={2}, number={3}, pages={115--119}, year={2012} } @inproceedings{liu2007vdbscan, title={VDBSCAN: varied density based spatial clustering of applications with noise}, author={Liu, Peng and Zhou, Dong and Wu, Naijun}, booktitle={2007 International conference on service systems and service management}, pages={1--4}, year={2007}, organization={IEEE} } @article{pei2009decode, title={DECODE: a new method for discovering clusters of different densities in spatial data}, author={Pei, Tao and Jasra, Ajay and Hand, David J and Zhu, A-Xing and Zhou, Chenghu}, journal={Data Mining and Knowledge Discovery}, volume={18}, number={3}, pages={337--369}, year={2009}, publisher={Springer} } @article{duan2007local, title={A local-density based spatial clustering algorithm with noise}, author={Duan, Lian and Xu, Lida and Guo, Feng and Lee, Jun and Yan, Baopin}, journal={Information Systems}, volume={32}, number={7}, pages={978--986}, year={2007}, publisher={Elsevier} } @inproceedings{li2007traffic, title={Traffic density-based discovery of hot routes 
in road networks}, author={Li, Xiaolei and Han, Jiawei and Lee, Jae-Gil and Gonzalez, Hector}, booktitle={International Symposium on Spatial and Temporal Databases}, pages={441--459}, year={2007}, organization={Springer} } @article{tran2006knn, title={KNN-kernel density-based clustering for high-dimensional multivariate data}, author={Tran, Thanh N and Wehrens, Ron and Buydens, Lutgarde MC}, journal={Computational Statistics \& Data Analysis}, volume={51}, number={2}, pages={513--525}, year={2006}, publisher={Elsevier} } @inproceedings{jiang2003dhc, title={DHC: a density-based hierarchical clustering method for time series gene expression data}, author={Jiang, Daxin and Pei, Jian and Zhang, Aidong}, booktitle={Bioinformatics and Bioengineering, 2003. Proceedings. Third IEEE Symposium on}, pages={393--400}, year={2003}, organization={IEEE} } @inproceedings{kriegel2005density, title={Density-based clustering of uncertain data}, author={Kriegel, Hans-Peter and Pfeifle, Martin}, booktitle={Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining}, pages={672--677}, year={2005}, organization={ACM} } @book{agrawal1998automatic, title={Automatic subspace clustering of high dimensional data for data mining applications}, author={Agrawal, Rakesh and Gehrke, Johannes and Gunopulos, Dimitrios and Raghavan, Prabhakar}, volume={27}, number={2}, year={1998}, publisher={ACM} } @inproceedings{cao2006density, title={Density-Based Clustering over an Evolving Data Stream with Noise.}, author={Cao, Feng and Ester, Martin and Qian, Weining and Zhou, Aoying}, booktitle={SDM}, volume={6}, pages={328--339}, year={2006}, organization={SIAM} } @inproceedings{chen2007density, title={Density-based clustering for real-time stream data}, author={Chen, Yixin and Tu, Li}, booktitle={Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining}, pages={133--142}, year={2007}, organization={ACM} } 
@article{kriegel:2011, title={Density-based clustering}, author={Kriegel, Hans-Peter and Kr{\"o}ger, Peer and Sander, J{\"o}rg and Zimek, Arthur}, journal={WIREs Data Mining and Knowledge Discovery}, volume={1}, pages={231--240}, year={2011}, publisher={John Wiley \& Sons} }

@book{Aggarwal:2013, author = {Aggarwal, Charu C. and Reddy, Chandan K.}, title = {Data Clustering: Algorithms and Applications}, year = {2013}, isbn = {1466558210, 9781466558212}, edition = {1st}, publisher = {Chapman \& Hall/CRC}, }

@book{Kaufman:1990, title = "Finding groups in data : an introduction to cluster analysis", author = "Kaufman, Leonard and Rousseeuw, Peter J.", series = "Wiley series in probability and mathematical statistics", publisher = "Wiley", address = "New York", isbn = "0-471-87876-6", year = 1990 }

@ARTICLE{jarvis1973, author={Jarvis, R.A. and Patrick, E.A.}, journal={IEEE Transactions on Computers}, title={Clustering Using a Similarity Measure Based on Shared Near Neighbors}, year={1973}, volume={C-22}, number={11}, pages={1025-1034}, keywords={Clustering, nonparametric, pattern recognition, shared near neighbors, similarity measure.}, doi={10.1109/T-C.1973.223640} }

@inbook{erdoz2003, author = {Levent Ertöz and Michael Steinbach and Vipin Kumar}, title = {Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data}, booktitle = {Proceedings of the 2003 SIAM International Conference on Data Mining (SDM)}, year = {2003}, pages = {47-58}, doi = {10.1137/1.9781611972733.5} }

@inbook{moulavi2014, author = {Davoud Moulavi and Pablo A. Jaskowiak and Ricardo J. G. B.
Campello and Arthur Zimek and Jörg Sander}, title = {Density-Based Clustering Validation}, booktitle = {Proceedings of the 2014 SIAM International Conference on Data Mining (SDM)}, year = {2014}, pages = {839-847}, doi = {10.1137/1.9781611973440.96}, }


================================================
FILE: vignettes/hdbscan.Rmd
================================================
---
title: "HDBSCAN with the dbscan package"
author: "Matt Piekenbrock, Michael Hahsler"
vignette: >
  %\VignetteIndexEntry{Hierarchical DBSCAN (HDBSCAN) with the dbscan package}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
header-includes: \usepackage{animation}
output: html_document
---

The dbscan package [6] includes a fast implementation of Hierarchical DBSCAN (HDBSCAN) and its related algorithms for the R platform. This vignette introduces how to interface with these features. To understand how HDBSCAN works, we refer to an excellent Python notebook resource that goes over the basic concepts of the algorithm (see [the SciKit-learn docs](http://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html)). For the sake of simplicity, consider the same sample dataset from the notebook:

```{r}
library("dbscan")
data("moons")
plot(moons, pch=20)
```

To run the HDBSCAN algorithm, simply pass the dataset and the (single) parameter 'minPts' to the hdbscan function.

```{r}
cl <- hdbscan(moons, minPts = 5)
cl
```

The 'flat' results are stored in the 'cluster' member. Noise points are assigned the value 0, so we increment the labels by 1 for plotting.

```{r}
plot(moons, col=cl$cluster+1, pch=20)
```

The results match intuitive notions of what 'similar' clusters look like when they manifest in arbitrary shapes.

## Hierarchical DBSCAN

The resulting HDBSCAN object contains a hierarchical representation of every possible DBSCAN* clustering.
This hierarchical representation is compactly stored in the familiar 'hc' member of the resulting HDBSCAN object, in the same format as traditional hierarchical clustering objects created with the 'hclust' method from the stats package.

```{r}
cl$hc
```

Note that although this object can be used with any of the methods that work with 'hclust' objects, the distance HDBSCAN uses (the mutual reachability distance, see [2]) is _not_ an available method of the hclust function. This hierarchy, denoted the "HDBSCAN* hierarchy" in [3], can be visualized using the built-in plotting method from the stats package.

```{r}
plot(cl$hc, main="HDBSCAN* Hierarchy")
```

## DBSCAN\* vs. cutting the HDBSCAN\* tree

As the name implies, the fascinating thing about the HDBSCAN\* hierarchy is that any global 'cut' is equivalent to running DBSCAN\* (DBSCAN without border points) at the tree's cutting threshold $eps$ (assuming the same $minPts$ parameter setting was used). This can be verified manually: using a modified cut function that marks points whose core distance exceeds $eps$ as noise (label 0), since the stats cutree method _does not_ assign 0 to singletons, the results can be shown to be identical.
```{r}
cl <- hdbscan(moons, minPts = 5)
check <- rep(FALSE, nrow(moons)-1)
core_dist <- kNNdist(moons, k=5-1)

## cutree doesn't distinguish noise as 0, so we make a new method to do it manually
cut_tree <- function(hcl, eps, core_dist){
  cuts <- unname(cutree(hcl, h=eps))
  cuts[which(core_dist > eps)] <- 0 # Use core distance to distinguish noise
  cuts
}

eps_values <- sort(cl$hc$height, decreasing = TRUE)+.Machine$double.eps ## Machine eps for consistency between cuts
for (i in 1:length(eps_values)) {
  cut_cl <- cut_tree(cl$hc, eps_values[i], core_dist)
  dbscan_cl <- dbscan(moons, eps = eps_values[i], minPts = 5, borderPoints = FALSE) # DBSCAN* doesn't include border points

  ## Use run length encoding as an ID-independent way to check ordering
  check[i] <- (all.equal(rle(cut_cl)$lengths, rle(dbscan_cl$cluster)$lengths) == "TRUE")
}
print(all(check == TRUE))
```

## Simplified Tree

The HDBSCAN\* hierarchy is useful, but for larger datasets it can become overly cumbersome, since every data point is represented as a leaf somewhere in the hierarchy. The hdbscan object comes with a powerful visualization tool that plots the 'simplified' hierarchy (see [2] for more details), which shows __cluster-wide__ changes over an infinite number of $eps$ thresholds. It is the default visualization dispatched by the 'plot' method.

```{r}
plot(cl)
```

You can change the colors

```{r}
plot(cl, gradient = c("yellow", "orange", "red", "blue"))
```

... scale the widths for individual devices appropriately

```{r}
plot(cl, gradient = c("purple", "blue", "green", "yellow"), scale=1.5)
```

... and even outline the most 'stable' clusters reported in the flat solution

```{r}
plot(cl, gradient = c("purple", "blue", "green", "yellow"), show_flat = TRUE)
```

## Cluster Stability Scores

Note that the stability scores correspond to the labels on the condensed tree, but the cluster assignments in the 'cluster' member element do not correspond to the labels in the condensed tree.
Also, note that these scores represent the stability scores _before_ the traversal up the tree that updates the scores based on the children.

```{r}
print(cl$cluster_scores)
```

The individual point membership 'probabilities' are in the 'membership_prob' member element

```{r}
head(cl$membership_prob)
```

These can be used to show the 'degree of cluster membership' by, for example, plotting points with transparencies that correspond to their membership degrees.

```{r}
plot(moons, col=cl$cluster+1, pch=21)
colors <- mapply(function(col, i) adjustcolor(col, alpha.f = cl$membership_prob[i]),
                 palette()[cl$cluster+1], seq_along(cl$cluster))
points(moons, col=colors, pch=20)
```

## Global-Local Outlier Score from Hierarchies

A recent journal publication on HDBSCAN comes with a new outlier measure that computes an outlier score for each point in the data based on local _and_ global properties of the hierarchy, called the Global-Local Outlier Score from Hierarchies (GLOSH) [4]. An example is shown below where, unlike the membership probabilities, a point's opacity represents its degree of 'outlierness'. Traditionally, outliers are considered to be observations that deviate from the expected value of their presumed underlying distribution, where the deviation considered significant is determined by some statistical threshold value.

__Note:__ Because noise points (points that _are not_ assigned to any cluster) are explicitly considered in the definition of an outlier, the computed outlier scores are not simply inversely proportional to the membership probabilities.
```{r}
top_outliers <- order(cl$outlier_scores, decreasing = TRUE)[1:10]
colors <- mapply(function(col, i) adjustcolor(col, alpha.f = cl$outlier_scores[i]),
                 palette()[cl$cluster+1], seq_along(cl$cluster))
plot(moons, col=colors, pch=20)
text(moons[top_outliers, ], labels = top_outliers, pos=3)
```

## A Larger Clustering Example

A larger dataset can reveal the usefulness of HDBSCAN more explicitly. Consider the 'DS3' dataset, originally published as part of a benchmark test dataset for the Chameleon clustering algorithm [5]. The shapes in this dataset can be distinguished sufficiently well by a human; however, it is well known that many clustering algorithms fail to capture the intuitive structure.

```{r}
data("DS3")
plot(DS3, pch=20, cex=0.25)
```

With the single parameter minPts set to, say, 25, HDBSCAN finds 6 clusters.

```{r}
cl2 <- hdbscan(DS3, minPts = 25)
cl2
```

Marking the noise appropriately and highlighting points based on their 'membership probabilities' as before, a visualization of the cluster structure can be easily crafted.

```{r}
plot(DS3, col=cl2$cluster+1,
     pch=ifelse(cl2$cluster == 0, 8, 1),       # Mark noise as star
     cex=ifelse(cl2$cluster == 0, 0.5, 0.75),  # Decrease size of noise
     xlab=NA, ylab=NA)
colors <- sapply(1:length(cl2$cluster), function(i)
  adjustcolor(palette()[(cl2$cluster+1)[i]], alpha.f = cl2$membership_prob[i]))
points(DS3, col=colors, pch=20)
```

The simplified tree can be particularly useful for larger datasets

```{r}
plot(cl2, scale = 3, gradient = c("purple", "orange", "red"), show_flat = TRUE)
```

## Performance

All of the computationally and memory intensive tasks required by HDBSCAN were written in C++ using the Rcpp package. With DBSCAN, the performance depends on the parameter settings, primarily on the radius at which points are considered candidates for clustering ('eps'), and generally less so on the 'minPts' parameter. Intuitively, larger values of eps increase the computation time.
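Since supplying precomputed distances avoids recomputing them across runs, a dist object can be passed to hdbscan in place of the raw data. A minimal sketch (reusing the 'moons' data from above; the two calls should produce the same clustering):

```r
library("dbscan")

data("moons")
d <- dist(moons)                    # precompute the pairwise Euclidean distances once

cl_x <- hdbscan(moons, minPts = 5)  # computes the distances internally
cl_d <- hdbscan(d, minPts = 5)      # reuses the precomputed dist object

identical(cl_x$cluster, cl_d$cluster)
```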
One of the primary computational bottlenecks of HDBSCAN is the computation of the full (Euclidean) pairwise distances between all points, for which HDBSCAN currently relies on base R's 'dist' method. If a precomputed distance matrix is available, the running time of HDBSCAN can be moderately reduced.

## References

1. Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).
2. Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Jörg Sander. "A framework for semi-supervised and unsupervised optimal extraction of clusters from hierarchies." Data Mining and Knowledge Discovery 27, no. 3 (2013): 344-371.
3. Campello, Ricardo JGB, Davoud Moulavi, and Joerg Sander. "Density-based clustering based on hierarchical density estimates." In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 160-172. Springer Berlin Heidelberg, 2013.
4. Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Jörg Sander. "Hierarchical density estimates for data clustering, visualization, and outlier detection." ACM Transactions on Knowledge Discovery from Data (TKDD) 10, no. 1 (2015): 5.
5. Karypis, George, Eui-Hong Han, and Vipin Kumar. "Chameleon: Hierarchical clustering using dynamic modeling." Computer 32, no. 8 (1999): 68-75.
6. Hahsler M, Piekenbrock M, Doran D (2019). "dbscan: Fast Density-Based Clustering with R." Journal of Statistical Software, 91(1), 1-30. doi: [10.18637/jss.v091.i01](https://doi.org/10.18637/jss.v091.i01)