[
  {
    "path": ".Rbuildignore",
    "content": "proj$\n^\\.Rproj\\.user$\n^cran-comments\\.md$\n^appveyor\\.yml$\n^revdep$\n^.*\\.o$\n^.*\\.Rproj$\n^LICENSE\nREADME.Rmd\ndata_src\nignore\n^\\.github$\n"
  },
  {
    "path": ".github/.gitignore",
    "content": "*.html\n"
  },
  {
    "path": ".gitignore",
    "content": "# Generated files \n*.o\n*.so\n\n# History files\n.Rhistory\n.Rapp.history\n.RData\n*.Rcheck\n\n\n# Example code in package build process\n*-Ex.R\n\n# RStudio files\n.Rproj.user/\n\n# produced vignettes\nvignettes/*.html\nvignettes/*.pdf\n.Rproj.user\n\n# OS stuff \n.DS*\n\n# Personal work directories \nWork\nignore\njss\n"
  },
  {
    "path": "DESCRIPTION",
    "content": "Package: dbscan\nTitle: Density-Based Spatial Clustering of Applications with Noise\n    (DBSCAN) and Related Algorithms\nVersion: 1.2.4\nDate: 2025-12-18\nAuthors@R: c(\n    person(\"Michael\", \"Hahsler\", email = \"mhahsler@lyle.smu.edu\", \n           role = c(\"aut\", \"cre\", \"cph\"),\n           comment = c(ORCID = \"0000-0003-2716-1405\")),\n    person(\"Matthew\", \"Piekenbrock\", role = c(\"aut\", \"cph\")),\n    person(\"Sunil\", \"Arya\", role = c(\"ctb\", \"cph\")),\n    person(\"David\", \"Mount\", role = c(\"ctb\", \"cph\")),\n    person(\"Claudia\", \"Malzer\", role = \"ctb\")\n  )\nDescription: A fast reimplementation of several density-based algorithms\n    of the DBSCAN family. Includes the clustering algorithms DBSCAN\n    (density-based spatial clustering of applications with noise) and\n    HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering\n    points to identify the clustering structure), shared nearest neighbor\n    clustering, and the outlier detection algorithms LOF (local outlier\n    factor) and GLOSH (global-local outlier score from hierarchies). The\n    implementations use the kd-tree data structure (from library ANN) for\n    faster k-nearest neighbor search. An R interface to fast kNN and\n    fixed-radius NN search is also provided.  Hahsler, Piekenbrock and\n    Doran (2019) <doi:10.18637/jss.v091.i01>.\nLicense: GPL (>= 2)\nURL: https://github.com/mhahsler/dbscan\nBugReports: https://github.com/mhahsler/dbscan/issues\nDepends:\n    R (>= 3.2.0)\nImports:\n    generics,\n    graphics,\n    Rcpp (>= 1.0.0),\n    stats\nSuggests:\n    dendextend,\n    fpc,\n    igraph,\n    knitr,\n    microbenchmark,\n    rmarkdown,\n    testthat (>= 3.0.0),\n    tibble\nLinkingTo: \n    Rcpp\nVignetteBuilder: \n    knitr\nConfig/testthat/edition: 3\nCopyright: ANN library is copyright by University of Maryland, Sunil Arya\n    and David Mount. 
All other code is copyright by Michael Hahsler and\n    Matthew Piekenbrock.\nEncoding: UTF-8\nRoxygen: list(markdown = TRUE)\nRoxygenNote: 7.3.3\n"
  },
  {
    "path": "LICENSE",
    "content": "                    GNU GENERAL PUBLIC LICENSE\n                       Version 3, 29 June 2007\n\n Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>\n Everyone is permitted to copy and distribute verbatim copies\n of this license document, but changing it is not allowed.\n\n                            Preamble\n\n  The GNU General Public License is a free, copyleft license for\nsoftware and other kinds of works.\n\n  The licenses for most software and other practical works are designed\nto take away your freedom to share and change the works.  By contrast,\nthe GNU General Public License is intended to guarantee your freedom to\nshare and change all versions of a program--to make sure it remains free\nsoftware for all its users.  We, the Free Software Foundation, use the\nGNU General Public License for most of our software; it applies also to\nany other work released this way by its authors.  You can apply it to\nyour programs, too.\n\n  When we speak of free software, we are referring to freedom, not\nprice.  Our General Public Licenses are designed to make sure that you\nhave the freedom to distribute copies of free software (and charge for\nthem if you wish), that you receive source code or can get it if you\nwant it, that you can change the software or use pieces of it in new\nfree programs, and that you know you can do these things.\n\n  To protect your rights, we need to prevent others from denying you\nthese rights or asking you to surrender the rights.  Therefore, you have\ncertain responsibilities if you distribute copies of the software, or if\nyou modify it: responsibilities to respect the freedom of others.\n\n  For example, if you distribute copies of such a program, whether\ngratis or for a fee, you must pass on to the recipients the same\nfreedoms that you received.  You must make sure that they, too, receive\nor can get the source code.  
And you must show them these terms so they\nknow their rights.\n\n  Developers that use the GNU GPL protect your rights with two steps:\n(1) assert copyright on the software, and (2) offer you this License\ngiving you legal permission to copy, distribute and/or modify it.\n\n  For the developers' and authors' protection, the GPL clearly explains\nthat there is no warranty for this free software.  For both users' and\nauthors' sake, the GPL requires that modified versions be marked as\nchanged, so that their problems will not be attributed erroneously to\nauthors of previous versions.\n\n  Some devices are designed to deny users access to install or run\nmodified versions of the software inside them, although the manufacturer\ncan do so.  This is fundamentally incompatible with the aim of\nprotecting users' freedom to change the software.  The systematic\npattern of such abuse occurs in the area of products for individuals to\nuse, which is precisely where it is most unacceptable.  Therefore, we\nhave designed this version of the GPL to prohibit the practice for those\nproducts.  If such problems arise substantially in other domains, we\nstand ready to extend this provision to those domains in future versions\nof the GPL, as needed to protect the freedom of users.\n\n  Finally, every program is threatened constantly by software patents.\nStates should not allow patents to restrict development and use of\nsoftware on general-purpose computers, but in those that do, we wish to\navoid the special danger that patents applied to a free program could\nmake it effectively proprietary.  To prevent this, the GPL assures that\npatents cannot be used to render the program non-free.\n\n  The precise terms and conditions for copying, distribution and\nmodification follow.\n\n                       TERMS AND CONDITIONS\n\n  0. 
Definitions.\n\n  \"This License\" refers to version 3 of the GNU General Public License.\n\n  \"Copyright\" also means copyright-like laws that apply to other kinds of\nworks, such as semiconductor masks.\n\n  \"The Program\" refers to any copyrightable work licensed under this\nLicense.  Each licensee is addressed as \"you\".  \"Licensees\" and\n\"recipients\" may be individuals or organizations.\n\n  To \"modify\" a work means to copy from or adapt all or part of the work\nin a fashion requiring copyright permission, other than the making of an\nexact copy.  The resulting work is called a \"modified version\" of the\nearlier work or a work \"based on\" the earlier work.\n\n  A \"covered work\" means either the unmodified Program or a work based\non the Program.\n\n  To \"propagate\" a work means to do anything with it that, without\npermission, would make you directly or secondarily liable for\ninfringement under applicable copyright law, except executing it on a\ncomputer or modifying a private copy.  Propagation includes copying,\ndistribution (with or without modification), making available to the\npublic, and in some countries other activities as well.\n\n  To \"convey\" a work means any kind of propagation that enables other\nparties to make or receive copies.  Mere interaction with a user through\na computer network, with no transfer of a copy, is not conveying.\n\n  An interactive user interface displays \"Appropriate Legal Notices\"\nto the extent that it includes a convenient and prominently visible\nfeature that (1) displays an appropriate copyright notice, and (2)\ntells the user that there is no warranty for the work (except to the\nextent that warranties are provided), that licensees may convey the\nwork under this License, and how to view a copy of this License.  If\nthe interface presents a list of user commands or options, such as a\nmenu, a prominent item in the list meets this criterion.\n\n  1. 
Source Code.\n\n  The \"source code\" for a work means the preferred form of the work\nfor making modifications to it.  \"Object code\" means any non-source\nform of a work.\n\n  A \"Standard Interface\" means an interface that either is an official\nstandard defined by a recognized standards body, or, in the case of\ninterfaces specified for a particular programming language, one that\nis widely used among developers working in that language.\n\n  The \"System Libraries\" of an executable work include anything, other\nthan the work as a whole, that (a) is included in the normal form of\npackaging a Major Component, but which is not part of that Major\nComponent, and (b) serves only to enable use of the work with that\nMajor Component, or to implement a Standard Interface for which an\nimplementation is available to the public in source code form.  A\n\"Major Component\", in this context, means a major essential component\n(kernel, window system, and so on) of the specific operating system\n(if any) on which the executable work runs, or a compiler used to\nproduce the work, or an object code interpreter used to run it.\n\n  The \"Corresponding Source\" for a work in object code form means all\nthe source code needed to generate, install, and (for an executable\nwork) run the object code and to modify the work, including scripts to\ncontrol those activities.  However, it does not include the work's\nSystem Libraries, or general-purpose tools or generally available free\nprograms which are used unmodified in performing those activities but\nwhich are not part of the work.  
For example, Corresponding Source\nincludes interface definition files associated with source files for\nthe work, and the source code for shared libraries and dynamically\nlinked subprograms that the work is specifically designed to require,\nsuch as by intimate data communication or control flow between those\nsubprograms and other parts of the work.\n\n  The Corresponding Source need not include anything that users\ncan regenerate automatically from other parts of the Corresponding\nSource.\n\n  The Corresponding Source for a work in source code form is that\nsame work.\n\n  2. Basic Permissions.\n\n  All rights granted under this License are granted for the term of\ncopyright on the Program, and are irrevocable provided the stated\nconditions are met.  This License explicitly affirms your unlimited\npermission to run the unmodified Program.  The output from running a\ncovered work is covered by this License only if the output, given its\ncontent, constitutes a covered work.  This License acknowledges your\nrights of fair use or other equivalent, as provided by copyright law.\n\n  You may make, run and propagate covered works that you do not\nconvey, without conditions so long as your license otherwise remains\nin force.  You may convey covered works to others for the sole purpose\nof having them make modifications exclusively for you, or provide you\nwith facilities for running those works, provided that you comply with\nthe terms of this License in conveying all material for which you do\nnot control copyright.  Those thus making or running the covered works\nfor you must do so exclusively on your behalf, under your direction\nand control, on terms that prohibit them from making any copies of\nyour copyrighted material outside their relationship with you.\n\n  Conveying under any other circumstances is permitted solely under\nthe conditions stated below.  Sublicensing is not allowed; section 10\nmakes it unnecessary.\n\n  3. 
Protecting Users' Legal Rights From Anti-Circumvention Law.\n\n  No covered work shall be deemed part of an effective technological\nmeasure under any applicable law fulfilling obligations under article\n11 of the WIPO copyright treaty adopted on 20 December 1996, or\nsimilar laws prohibiting or restricting circumvention of such\nmeasures.\n\n  When you convey a covered work, you waive any legal power to forbid\ncircumvention of technological measures to the extent such circumvention\nis effected by exercising rights under this License with respect to\nthe covered work, and you disclaim any intention to limit operation or\nmodification of the work as a means of enforcing, against the work's\nusers, your or third parties' legal rights to forbid circumvention of\ntechnological measures.\n\n  4. Conveying Verbatim Copies.\n\n  You may convey verbatim copies of the Program's source code as you\nreceive it, in any medium, provided that you conspicuously and\nappropriately publish on each copy an appropriate copyright notice;\nkeep intact all notices stating that this License and any\nnon-permissive terms added in accord with section 7 apply to the code;\nkeep intact all notices of the absence of any warranty; and give all\nrecipients a copy of this License along with the Program.\n\n  You may charge any price or no price for each copy that you convey,\nand you may offer support or warranty protection for a fee.\n\n  5. Conveying Modified Source Versions.\n\n  You may convey a work based on the Program, or the modifications to\nproduce it from the Program, in the form of source code under the\nterms of section 4, provided that you also meet all of these conditions:\n\n    a) The work must carry prominent notices stating that you modified\n    it, and giving a relevant date.\n\n    b) The work must carry prominent notices stating that it is\n    released under this License and any conditions added under section\n    7.  
This requirement modifies the requirement in section 4 to\n    \"keep intact all notices\".\n\n    c) You must license the entire work, as a whole, under this\n    License to anyone who comes into possession of a copy.  This\n    License will therefore apply, along with any applicable section 7\n    additional terms, to the whole of the work, and all its parts,\n    regardless of how they are packaged.  This License gives no\n    permission to license the work in any other way, but it does not\n    invalidate such permission if you have separately received it.\n\n    d) If the work has interactive user interfaces, each must display\n    Appropriate Legal Notices; however, if the Program has interactive\n    interfaces that do not display Appropriate Legal Notices, your\n    work need not make them do so.\n\n  A compilation of a covered work with other separate and independent\nworks, which are not by their nature extensions of the covered work,\nand which are not combined with it such as to form a larger program,\nin or on a volume of a storage or distribution medium, is called an\n\"aggregate\" if the compilation and its resulting copyright are not\nused to limit the access or legal rights of the compilation's users\nbeyond what the individual works permit.  Inclusion of a covered work\nin an aggregate does not cause this License to apply to the other\nparts of the aggregate.\n\n  6. 
Conveying Non-Source Forms.\n\n  You may convey a covered work in object code form under the terms\nof sections 4 and 5, provided that you also convey the\nmachine-readable Corresponding Source under the terms of this License,\nin one of these ways:\n\n    a) Convey the object code in, or embodied in, a physical product\n    (including a physical distribution medium), accompanied by the\n    Corresponding Source fixed on a durable physical medium\n    customarily used for software interchange.\n\n    b) Convey the object code in, or embodied in, a physical product\n    (including a physical distribution medium), accompanied by a\n    written offer, valid for at least three years and valid for as\n    long as you offer spare parts or customer support for that product\n    model, to give anyone who possesses the object code either (1) a\n    copy of the Corresponding Source for all the software in the\n    product that is covered by this License, on a durable physical\n    medium customarily used for software interchange, for a price no\n    more than your reasonable cost of physically performing this\n    conveying of source, or (2) access to copy the\n    Corresponding Source from a network server at no charge.\n\n    c) Convey individual copies of the object code with a copy of the\n    written offer to provide the Corresponding Source.  This\n    alternative is allowed only occasionally and noncommercially, and\n    only if you received the object code with such an offer, in accord\n    with subsection 6b.\n\n    d) Convey the object code by offering access from a designated\n    place (gratis or for a charge), and offer equivalent access to the\n    Corresponding Source in the same way through the same place at no\n    further charge.  You need not require recipients to copy the\n    Corresponding Source along with the object code.  
If the place to\n    copy the object code is a network server, the Corresponding Source\n    may be on a different server (operated by you or a third party)\n    that supports equivalent copying facilities, provided you maintain\n    clear directions next to the object code saying where to find the\n    Corresponding Source.  Regardless of what server hosts the\n    Corresponding Source, you remain obligated to ensure that it is\n    available for as long as needed to satisfy these requirements.\n\n    e) Convey the object code using peer-to-peer transmission, provided\n    you inform other peers where the object code and Corresponding\n    Source of the work are being offered to the general public at no\n    charge under subsection 6d.\n\n  A separable portion of the object code, whose source code is excluded\nfrom the Corresponding Source as a System Library, need not be\nincluded in conveying the object code work.\n\n  A \"User Product\" is either (1) a \"consumer product\", which means any\ntangible personal property which is normally used for personal, family,\nor household purposes, or (2) anything designed or sold for incorporation\ninto a dwelling.  In determining whether a product is a consumer product,\ndoubtful cases shall be resolved in favor of coverage.  For a particular\nproduct received by a particular user, \"normally used\" refers to a\ntypical or common use of that class of product, regardless of the status\nof the particular user or of the way in which the particular user\nactually uses, or expects or is expected to use, the product.  
A product\nis a consumer product regardless of whether the product has substantial\ncommercial, industrial or non-consumer uses, unless such uses represent\nthe only significant mode of use of the product.\n\n  \"Installation Information\" for a User Product means any methods,\nprocedures, authorization keys, or other information required to install\nand execute modified versions of a covered work in that User Product from\na modified version of its Corresponding Source.  The information must\nsuffice to ensure that the continued functioning of the modified object\ncode is in no case prevented or interfered with solely because\nmodification has been made.\n\n  If you convey an object code work under this section in, or with, or\nspecifically for use in, a User Product, and the conveying occurs as\npart of a transaction in which the right of possession and use of the\nUser Product is transferred to the recipient in perpetuity or for a\nfixed term (regardless of how the transaction is characterized), the\nCorresponding Source conveyed under this section must be accompanied\nby the Installation Information.  But this requirement does not apply\nif neither you nor any third party retains the ability to install\nmodified object code on the User Product (for example, the work has\nbeen installed in ROM).\n\n  The requirement to provide Installation Information does not include a\nrequirement to continue to provide support service, warranty, or updates\nfor a work that has been modified or installed by the recipient, or for\nthe User Product in which it has been modified or installed.  
Access to a\nnetwork may be denied when the modification itself materially and\nadversely affects the operation of the network or violates the rules and\nprotocols for communication across the network.\n\n  Corresponding Source conveyed, and Installation Information provided,\nin accord with this section must be in a format that is publicly\ndocumented (and with an implementation available to the public in\nsource code form), and must require no special password or key for\nunpacking, reading or copying.\n\n  7. Additional Terms.\n\n  \"Additional permissions\" are terms that supplement the terms of this\nLicense by making exceptions from one or more of its conditions.\nAdditional permissions that are applicable to the entire Program shall\nbe treated as though they were included in this License, to the extent\nthat they are valid under applicable law.  If additional permissions\napply only to part of the Program, that part may be used separately\nunder those permissions, but the entire Program remains governed by\nthis License without regard to the additional permissions.\n\n  When you convey a copy of a covered work, you may at your option\nremove any additional permissions from that copy, or from any part of\nit.  (Additional permissions may be written to require their own\nremoval in certain cases when you modify the work.)  
You may place\nadditional permissions on material, added by you to a covered work,\nfor which you have or can give appropriate copyright permission.\n\n  Notwithstanding any other provision of this License, for material you\nadd to a covered work, you may (if authorized by the copyright holders of\nthat material) supplement the terms of this License with terms:\n\n    a) Disclaiming warranty or limiting liability differently from the\n    terms of sections 15 and 16 of this License; or\n\n    b) Requiring preservation of specified reasonable legal notices or\n    author attributions in that material or in the Appropriate Legal\n    Notices displayed by works containing it; or\n\n    c) Prohibiting misrepresentation of the origin of that material, or\n    requiring that modified versions of such material be marked in\n    reasonable ways as different from the original version; or\n\n    d) Limiting the use for publicity purposes of names of licensors or\n    authors of the material; or\n\n    e) Declining to grant rights under trademark law for use of some\n    trade names, trademarks, or service marks; or\n\n    f) Requiring indemnification of licensors and authors of that\n    material by anyone who conveys the material (or modified versions of\n    it) with contractual assumptions of liability to the recipient, for\n    any liability that these contractual assumptions directly impose on\n    those licensors and authors.\n\n  All other non-permissive additional terms are considered \"further\nrestrictions\" within the meaning of section 10.  If the Program as you\nreceived it, or any part of it, contains a notice stating that it is\ngoverned by this License along with a term that is a further\nrestriction, you may remove that term.  
If a license document contains\na further restriction but permits relicensing or conveying under this\nLicense, you may add to a covered work material governed by the terms\nof that license document, provided that the further restriction does\nnot survive such relicensing or conveying.\n\n  If you add terms to a covered work in accord with this section, you\nmust place, in the relevant source files, a statement of the\nadditional terms that apply to those files, or a notice indicating\nwhere to find the applicable terms.\n\n  Additional terms, permissive or non-permissive, may be stated in the\nform of a separately written license, or stated as exceptions;\nthe above requirements apply either way.\n\n  8. Termination.\n\n  You may not propagate or modify a covered work except as expressly\nprovided under this License.  Any attempt otherwise to propagate or\nmodify it is void, and will automatically terminate your rights under\nthis License (including any patent licenses granted under the third\nparagraph of section 11).\n\n  However, if you cease all violation of this License, then your\nlicense from a particular copyright holder is reinstated (a)\nprovisionally, unless and until the copyright holder explicitly and\nfinally terminates your license, and (b) permanently, if the copyright\nholder fails to notify you of the violation by some reasonable means\nprior to 60 days after the cessation.\n\n  Moreover, your license from a particular copyright holder is\nreinstated permanently if the copyright holder notifies you of the\nviolation by some reasonable means, this is the first time you have\nreceived notice of violation of this License (for any work) from that\ncopyright holder, and you cure the violation prior to 30 days after\nyour receipt of the notice.\n\n  Termination of your rights under this section does not terminate the\nlicenses of parties who have received copies or rights from you under\nthis License.  
If your rights have been terminated and not permanently\nreinstated, you do not qualify to receive new licenses for the same\nmaterial under section 10.\n\n  9. Acceptance Not Required for Having Copies.\n\n  You are not required to accept this License in order to receive or\nrun a copy of the Program.  Ancillary propagation of a covered work\noccurring solely as a consequence of using peer-to-peer transmission\nto receive a copy likewise does not require acceptance.  However,\nnothing other than this License grants you permission to propagate or\nmodify any covered work.  These actions infringe copyright if you do\nnot accept this License.  Therefore, by modifying or propagating a\ncovered work, you indicate your acceptance of this License to do so.\n\n  10. Automatic Licensing of Downstream Recipients.\n\n  Each time you convey a covered work, the recipient automatically\nreceives a license from the original licensors, to run, modify and\npropagate that work, subject to this License.  You are not responsible\nfor enforcing compliance by third parties with this License.\n\n  An \"entity transaction\" is a transaction transferring control of an\norganization, or substantially all assets of one, or subdividing an\norganization, or merging organizations.  If propagation of a covered\nwork results from an entity transaction, each party to that\ntransaction who receives a copy of the work also receives whatever\nlicenses to the work the party's predecessor in interest had or could\ngive under the previous paragraph, plus a right to possession of the\nCorresponding Source of the work from the predecessor in interest, if\nthe predecessor has it or can get it with reasonable efforts.\n\n  You may not impose any further restrictions on the exercise of the\nrights granted or affirmed under this License.  
For example, you may\nnot impose a license fee, royalty, or other charge for exercise of\nrights granted under this License, and you may not initiate litigation\n(including a cross-claim or counterclaim in a lawsuit) alleging that\nany patent claim is infringed by making, using, selling, offering for\nsale, or importing the Program or any portion of it.\n\n  11. Patents.\n\n  A \"contributor\" is a copyright holder who authorizes use under this\nLicense of the Program or a work on which the Program is based.  The\nwork thus licensed is called the contributor's \"contributor version\".\n\n  A contributor's \"essential patent claims\" are all patent claims\nowned or controlled by the contributor, whether already acquired or\nhereafter acquired, that would be infringed by some manner, permitted\nby this License, of making, using, or selling its contributor version,\nbut do not include claims that would be infringed only as a\nconsequence of further modification of the contributor version.  For\npurposes of this definition, \"control\" includes the right to grant\npatent sublicenses in a manner consistent with the requirements of\nthis License.\n\n  Each contributor grants you a non-exclusive, worldwide, royalty-free\npatent license under the contributor's essential patent claims, to\nmake, use, sell, offer for sale, import and otherwise run, modify and\npropagate the contents of its contributor version.\n\n  In the following three paragraphs, a \"patent license\" is any express\nagreement or commitment, however denominated, not to enforce a patent\n(such as an express permission to practice a patent or covenant not to\nsue for patent infringement).  
To \"grant\" such a patent license to a\nparty means to make such an agreement or commitment not to enforce a\npatent against the party.\n\n  If you convey a covered work, knowingly relying on a patent license,\nand the Corresponding Source of the work is not available for anyone\nto copy, free of charge and under the terms of this License, through a\npublicly available network server or other readily accessible means,\nthen you must either (1) cause the Corresponding Source to be so\navailable, or (2) arrange to deprive yourself of the benefit of the\npatent license for this particular work, or (3) arrange, in a manner\nconsistent with the requirements of this License, to extend the patent\nlicense to downstream recipients.  \"Knowingly relying\" means you have\nactual knowledge that, but for the patent license, your conveying the\ncovered work in a country, or your recipient's use of the covered work\nin a country, would infringe one or more identifiable patents in that\ncountry that you have reason to believe are valid.\n\n  If, pursuant to or in connection with a single transaction or\narrangement, you convey, or propagate by procuring conveyance of, a\ncovered work, and grant a patent license to some of the parties\nreceiving the covered work authorizing them to use, propagate, modify\nor convey a specific copy of the covered work, then the patent license\nyou grant is automatically extended to all recipients of the covered\nwork and works based on it.\n\n  A patent license is \"discriminatory\" if it does not include within\nthe scope of its coverage, prohibits the exercise of, or is\nconditioned on the non-exercise of one or more of the rights that are\nspecifically granted under this License.  
You may not convey a covered\nwork if you are a party to an arrangement with a third party that is\nin the business of distributing software, under which you make payment\nto the third party based on the extent of your activity of conveying\nthe work, and under which the third party grants, to any of the\nparties who would receive the covered work from you, a discriminatory\npatent license (a) in connection with copies of the covered work\nconveyed by you (or copies made from those copies), or (b) primarily\nfor and in connection with specific products or compilations that\ncontain the covered work, unless you entered into that arrangement,\nor that patent license was granted, prior to 28 March 2007.\n\n  Nothing in this License shall be construed as excluding or limiting\nany implied license or other defenses to infringement that may\notherwise be available to you under applicable patent law.\n\n  12. No Surrender of Others' Freedom.\n\n  If conditions are imposed on you (whether by court order, agreement or\notherwise) that contradict the conditions of this License, they do not\nexcuse you from the conditions of this License.  If you cannot convey a\ncovered work so as to satisfy simultaneously your obligations under this\nLicense and any other pertinent obligations, then as a consequence you may\nnot convey it at all.  For example, if you agree to terms that obligate you\nto collect a royalty for further conveying from those to whom you convey\nthe Program, the only way you could satisfy both those terms and this\nLicense would be to refrain entirely from conveying the Program.\n\n  13. Use with the GNU Affero General Public License.\n\n  Notwithstanding any other provision of this License, you have\npermission to link or combine any covered work with a work licensed\nunder version 3 of the GNU Affero General Public License into a single\ncombined work, and to convey the resulting work.  
The terms of this\nLicense will continue to apply to the part which is the covered work,\nbut the special requirements of the GNU Affero General Public License,\nsection 13, concerning interaction through a network will apply to the\ncombination as such.\n\n  14. Revised Versions of this License.\n\n  The Free Software Foundation may publish revised and/or new versions of\nthe GNU General Public License from time to time.  Such new versions will\nbe similar in spirit to the present version, but may differ in detail to\naddress new problems or concerns.\n\n  Each version is given a distinguishing version number.  If the\nProgram specifies that a certain numbered version of the GNU General\nPublic License \"or any later version\" applies to it, you have the\noption of following the terms and conditions either of that numbered\nversion or of any later version published by the Free Software\nFoundation.  If the Program does not specify a version number of the\nGNU General Public License, you may choose any version ever published\nby the Free Software Foundation.\n\n  If the Program specifies that a proxy can decide which future\nversions of the GNU General Public License can be used, that proxy's\npublic statement of acceptance of a version permanently authorizes you\nto choose that version for the Program.\n\n  Later license versions may give you additional or different\npermissions.  However, no additional obligations are imposed on any\nauthor or copyright holder as a result of your choosing to follow a\nlater version.\n\n  15. Disclaimer of Warranty.\n\n  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY\nAPPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT\nHOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM \"AS IS\" WITHOUT WARRANTY\nOF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,\nTHE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR\nPURPOSE.  
THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM\nIS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF\nALL NECESSARY SERVICING, REPAIR OR CORRECTION.\n\n  16. Limitation of Liability.\n\n  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING\nWILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS\nTHE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY\nGENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE\nUSE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF\nDATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD\nPARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),\nEVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF\nSUCH DAMAGES.\n\n  17. Interpretation of Sections 15 and 16.\n\n  If the disclaimer of warranty and limitation of liability provided\nabove cannot be given local legal effect according to their terms,\nreviewing courts shall apply local law that most closely approximates\nan absolute waiver of all civil liability in connection with the\nProgram, unless a warranty or assumption of liability accompanies a\ncopy of the Program in return for a fee.\n\n                     END OF TERMS AND CONDITIONS\n\n            How to Apply These Terms to Your New Programs\n\n  If you develop a new program, and you want it to be of the greatest\npossible use to the public, the best way to achieve this is to make it\nfree software which everyone can redistribute and change under these terms.\n\n  To do so, attach the following notices to the program.  
It is safest\nto attach them to the start of each source file to most effectively\nstate the exclusion of warranty; and each file should have at least\nthe \"copyright\" line and a pointer to where the full notice is found.\n\n    {one line to give the program's name and a brief idea of what it does.}\n    Copyright (C) {year}  {name of author}\n\n    This program is free software: you can redistribute it and/or modify\n    it under the terms of the GNU General Public License as published by\n    the Free Software Foundation, either version 3 of the License, or\n    (at your option) any later version.\n\n    This program is distributed in the hope that it will be useful,\n    but WITHOUT ANY WARRANTY; without even the implied warranty of\n    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n    GNU General Public License for more details.\n\n    You should have received a copy of the GNU General Public License\n    along with this program.  If not, see <http://www.gnu.org/licenses/>.\n\nAlso add information on how to contact you by electronic and paper mail.\n\n  If the program does terminal interaction, make it output a short\nnotice like this when it starts in an interactive mode:\n\n    {project}  Copyright (C) {year}  {fullname}\n    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.\n    This is free software, and you are welcome to redistribute it\n    under certain conditions; type `show c' for details.\n\nThe hypothetical commands `show w' and `show c' should show the appropriate\nparts of the General Public License.  
Of course, your program's commands\nmight be different; for a GUI interface, you would use an \"about box\".\n\n  You should also get your employer (if you work as a programmer) or school,\nif any, to sign a \"copyright disclaimer\" for the program, if necessary.\nFor more information on this, and how to apply and follow the GNU GPL, see\n<http://www.gnu.org/licenses/>.\n\n  The GNU General Public License does not permit incorporating your program\ninto proprietary programs.  If your program is a subroutine library, you\nmay consider it more useful to permit linking proprietary applications with\nthe library.  If this is what you want to do, use the GNU Lesser General\nPublic License instead of this License.  But first, please read\n<http://www.gnu.org/philosophy/why-not-lgpl.html>.\n\n"
  },
  {
    "path": "NAMESPACE",
    "content": "# Generated by roxygen2: do not edit by hand\n\nS3method(adjacencylist,NN)\nS3method(adjacencylist,frNN)\nS3method(adjacencylist,kNN)\nS3method(as.dendrogram,default)\nS3method(as.dendrogram,hclust)\nS3method(as.dendrogram,hdbscan)\nS3method(as.dendrogram,optics)\nS3method(as.dendrogram,reachability)\nS3method(as.reachability,dendrogram)\nS3method(as.reachability,optics)\nS3method(augment,dbscan)\nS3method(augment,general_clustering)\nS3method(augment,hdbscan)\nS3method(comps,dist)\nS3method(comps,frNN)\nS3method(comps,kNN)\nS3method(comps,sNN)\nS3method(glance,dbscan)\nS3method(glance,general_clustering)\nS3method(glance,hdbscan)\nS3method(ncluster,default)\nS3method(nnoise,default)\nS3method(nobs,dbscan)\nS3method(nobs,general_clustering)\nS3method(nobs,hdbscan)\nS3method(plot,NN)\nS3method(plot,hdbscan)\nS3method(plot,optics)\nS3method(plot,reachability)\nS3method(predict,dbscan_fast)\nS3method(predict,hdbscan)\nS3method(predict,optics)\nS3method(print,dbscan_fast)\nS3method(print,frNN)\nS3method(print,general_clustering)\nS3method(print,hdbscan)\nS3method(print,kNN)\nS3method(print,optics)\nS3method(print,reachability)\nS3method(print,sNN)\nS3method(sort,NN)\nS3method(sort,frNN)\nS3method(sort,kNN)\nS3method(sort,sNN)\nS3method(tidy,dbscan)\nS3method(tidy,general_clustering)\nS3method(tidy,hdbscan)\nexport(adjacencylist)\nexport(as.dendrogram)\nexport(as.reachability)\nexport(augment)\nexport(clplot)\nexport(comps)\nexport(coredist)\nexport(dbcv)\nexport(dbscan)\nexport(extractDBSCAN)\nexport(extractFOSC)\nexport(extractXi)\nexport(frNN)\nexport(glance)\nexport(glosh)\nexport(hdbscan)\nexport(hullplot)\nexport(is.corepoint)\nexport(jpclust)\nexport(kNN)\nexport(kNNdist)\nexport(kNNdistplot)\nexport(lof)\nexport(mrdist)\nexport(ncluster)\nexport(nnoise)\nexport(optics)\nexport(pointdensity)\nexport(sNN)\nexport(sNNclust)\nexport(tidy)\nimport(Rcpp)\nimportFrom(generics,augment)\nimportFrom(generics,glance)\nimportFrom(generics,tidy)\nimportFrom(gr
Devices,adjustcolor)\nimportFrom(grDevices,chull)\nimportFrom(grDevices,palette)\nimportFrom(graphics,abline)\nimportFrom(graphics,lines)\nimportFrom(graphics,matplot)\nimportFrom(graphics,par)\nimportFrom(graphics,plot)\nimportFrom(graphics,points)\nimportFrom(graphics,polygon)\nimportFrom(graphics,segments)\nimportFrom(graphics,text)\nimportFrom(stats,as.dendrogram)\nimportFrom(stats,dendrapply)\nimportFrom(stats,dist)\nimportFrom(stats,hclust)\nimportFrom(stats,is.leaf)\nimportFrom(stats,nobs)\nimportFrom(stats,prcomp)\nimportFrom(stats,predict)\nimportFrom(utils,tail)\nuseDynLib(dbscan, .registration=TRUE)\n"
  },
  {
    "path": "NEWS.md",
    "content": "# dbscan 1.2.4 (2025-12-18)\n\n## Bugfixes\n* dbscan now checks for matrices with 0 rows or 0 columns\n  (reported by maldridgeepa).\n* Fixed license information for the ANN library header files (reported by \n  Charles Plessy).\n\n# dbscan 1.2.3 (2025-08-20)\n\n## Bugfixes\n* plot.hdbscan gained parameters main, ylab, and leaflab (reported by nhward).\n\n## Changes\n* Fixed  partial argument matches.\n\n# dbscan 1.2.2 (2025-01-24)\n\n## Changes\n* Removed dependence on the /bits/stdc++.h header. \n\n# dbscan 1.2.1 (2025-01-23)\n\n## Changes\n* Various refactoring by m-muecke\n\n## New Features\n* HDBSCAN gained parameter cluster_selection_epsilon to implement \n  clusters selected from Malzer and Baum (2020).\n* Functions ncluster() and nnoise() were added.\n* hullplot now() marks noise as x.\n* Added clplot().\n* pointdensity now also accepts a dist object as input and has the new type\n  \"gaussian\" to calculate a Gaussian kernel estimate.\n* Added the DBCV index.\n\n## Bugfixes\n* extractFOCS: Fixed total_score.\n* Rewrote minimal spanning tree code.\n\n# dbscan 1.2-0 (2024-06-28)\n\n## New Features\n* dbscan has now tidymodels tidiers (glance, tidy, augment).\n* kNNdistplot can now plot a range of k/minPts values.\n* added stats::nobs methods for the clusterings.\n* kNN and frNN now contains the used distance metric.\n\n## Changes\n* dbscan component dist was renamed to metric. 
\n* Removed redundant sort in kNNdistplot (reported by Natasza Szczypien).\n* Refactoring: use anyNA(x) instead of any(is.na(x))\n  and many more changes (by m-muecke).\n* Reorganized the C++ source code.\n* README now uses bibtex.\n* Tests now use testthat edition 3 (m-muecke).\n\n# dbscan 1.1-12 (2023-11-28)\n\n## Bugfixes\n* pointdensity now checks for missing values (reported by soelderer).\n* Removed C++11 specification.\n* ANN.cpp: fixed Rprintf warning.\n\n# dbscan 1.1-11 (2022-10-26)\n\n## New Features\n* kNNdistplot gained parameter minPts.\n* dbscan now retains information on distance method and border points.\n* HDBSCAN now supports long vectors to work with larger distance matrices. \n* Conversion from dist to kNN and frNN is now more memory efficient. It no longer \n  coerces the dist object into a matrix of double the size, but extracts the distances directly\n  from the dist object.\n* Better description of how predict uses only Euclidean distances, and more error checking.\n* The package now exports a new generic for as.dendrogram().\n\n## Bugfixes\n* is.corepoint() now uses the correct epsilon value (reported by Eng Aun).\n* Functions now check for cluster::dissimilarity objects which have class dist \n  but are missing attributes.\n\n# dbscan 1.1-10 (2022-01-14)\n\n## New Features\n* is.corepoint() for DBSCAN.\n* coredist() and mrdist() for HDBSCAN.\n* Find connected components with comps().\n\n## Changes\n* reachability plot now shows all undefined distances as a dashed line.\n\n## Bugfixes\n* Fixed a memory leak in the mrd calculation.\n\n# dbscan 1.1-9 (2022-01-10)\n\n## Changes\n* We now use roxygen2.  
\n\n## New Features\n* Added predict for hdbscan (as suggested by moredatapls).\n\n# dbscan 1.1-8 (2021-04-26)\n\n## Bugfixes\n* LOF: fixed numerical issues with k-nearest neighbor distance on Solaris.\n\n# dbscan 1.1-7 (2021-04-21)\n\n## Bugfixes\n* Fixed description of k in kNNdistplot and added minPts argument.\n* Fixed bug for tied distances in lof (reported by sverchkov).\n\n## Changes\n* lof: the density parameter was changed to minPts to be consistent with the original paper and dbscan. Note that minPts = k + 1.\n\n# dbscan 1.1-6 (2021-02-24)\n\n## Improvements \n* Improved speed of LOF for large ks (following suggestions by eduardokapp). \n* kNN: results are no longer sorted again for kd-tree queries, which is much faster (by a factor of 10).\n* ANN library: annclose() is now only called once when the package is unloaded. This is in preparation for supporting persistent kd-trees using external pointers.\n* hdbscan lost parameter xdist.\n\n## Bugfixes\n* Removed dependence on methods.\n* Fixed problem in hullplot for singleton clusters (reported by Fernando Archuby).\n* GLOSH now also accepts data.frames.\n* GLOSH now returns 0 instead of NaN if we have k duplicate points in the data.\n\n# dbscan 1.1-5 (2019-10-22)\n\n## New Features\n* kNN and frNN gained parameter query to query neighbors for points not in the data.\n* sNN gained parameter jp to decide if the shared NN should be counted using the definition by Jarvis and Patrick.\n\n\n# dbscan 1.1-4 (2019-08-05)\n\n## New Features\n* kNNdist gained parameter all to indicate if a matrix with the distance to all \n  nearest neighbors up to k should be returned.\n\n## Bugfixes\n* kNNdist now correctly returns the distances to the kth neighbor \n  (reported by zschuster).\n* dbscan: check eps and minPts parameters to avoid undefined results (reported by ArthurPERE).\n\n\n# dbscan 1.1-3 (2018-11-12)\n\n## Bugfixes\n* pointdensity was double counting the query point (reported by Marius Hofert).\n\n# dbscan 1.1-2 
(2018-05-18)\n\n## New Features\n* OPTICS now calculates eps if it is omitted.\n\n## Bugfixes\n* Example now only uses igraph conditionally since it is unavailable \n  on Solaris (reported by B. Ripley).\n\n# dbscan 1.1-1 (2017-03-19)\n\n## Bugfixes\n\n* Fixed problem with constant name on Solaris in ANN code (reported by B. Ripley).\n\n# dbscan 1.1-0 (2017-03-18)\n\n## New Features\n\n* HDBSCAN was added.\n* extractFOSC (optimal selection of clusters for HDBSCAN) was added.\n* GLOSH outlier score was added.\n* hullplot now uses filled polygons as the default.\n* hullplot now uses PCA if the data has more than 2 dimensions.\n* Added NN superclass for kNN and frNN with plot() and adjacencylist().\n* Added shared nearest neighbor clustering as sNNclust() and sNN to calculate\n  the number of shared nearest neighbors.\n* Added pointdensity function.\n* Unsorted kNN and frNN can now be sorted using sort().\n* kNN and frNN now also accept kNN and frNN objects, respectively. This can \n  be used to create a new kNN (frNN) with a reduced k or eps.\n* Datasets added: DS3 and moon.\n\n## Interface Changes\n\n* Improved interface for dbscan() and optics(): ... is now passed on to frNN.\n* OPTICS clustering extraction methods are now called extractDBSCAN and \n  extractXi.\n* kNN and frNN are now objects with a print function.\n* dbscan now also accepts a frNN object as input.\n* jpclust and sNNclust now return a list instead of just the \n  cluster assignments.\n\n# dbscan 1.0-0 (2017-02-02)\n\n## New Features\n\n* The package now has a vignette.\n* Jarvis-Patrick clustering is now available as jpclust().\n* Improved interface for dbscan() and optics(): ... 
is now passed on to frNN.\n* OPTICS clustering extraction methods are now called extractDBSCAN and \n  extractXi.\n* hullplot now uses filled polygons as the default.\n* hullplot now uses PCA if the data has more than 2 dimensions.\n* kNN and frNN are now objects with a print function.\n* dbscan now also accepts a frNN object as input.\n\n\n# dbscan 0.9-8 (2016-08-05)\n\n## New Features\n\n* Added hullplot to plot a scatter plot with added convex cluster hulls.\n* OPTICS: added a predecessor correction step that is used by \n    the ELKI implementation (Matt Piekenbrock).  \n\n## Bugfixes\n\n* Fixed a memory problem in frNN (reported by Yilei He).\n\n# dbscan 0.9-7 (2016-04-14)\n\n* OPTICSXi is now implemented (thanks to Matt Piekenbrock).\n* DBSCAN now also accepts MinPts (with a capital M) to be\n    compatible with the fpc version.\n* DBSCAN objects are now also of class dbscan_fast to avoid clashes with fpc.\n* DBSCAN and OPTICS now have predict functions.\n* Added test for unhandled NAs.\n* Fixed LOF for more than k duplicate points (reported by Samneet Singh).\n\n# dbscan 0.9-6 (2015-12-14)\n\n* OPTICS: fixed second bug reported by Di Pang.\n* All methods now also accept dist objects and have a search\n    method \"dist\" which precomputes distances.\n\n# dbscan 0.9-5 (2015-10-04)\n\n* OPTICS: fixed bug with first observation reported by Di Pang.\n* OPTICS: clusterings can now be extracted using optics_cut.\n\n# dbscan 0.9-4 (2015-09-17)\n\n* Added tests (testthat).\n* Input data is now checked if it can safely be coerced into a\n    numeric matrix (storage.mode double).\n* Fixed self matches in kNN and frNN (now returns the first NN correctly).\n\n# dbscan 0.9-3 (2015-9-2)\n\n* Added weights to DBSCAN.\n\n# dbscan 0.9-2 (2015-08-11)\n\n* Added kNN interface.\n* Added frNN (fixed radius NN) interface.\n* Added LOF.\n* Added OPTICS.\n* All algorithms now check for interrupt (CTRL-C/Esc).\n* DBSCAN now returns a list instead of a numeric vector.\n\n# dbscan 0.9-1 
(2015-07-21)\n\n* DBSCAN: Improved speed by avoiding repeated sorting of point ids.\n* Added linear NN search option.\n* Added fast calculation for kNN distance.\n* fpc and microbenchmark are now used conditionally in the examples.\n\n# dbscan 0.9-0 (2015-07-15)\n\n* initial release\n"
  },
  {
    "path": "R/AAA_dbscan-package.R",
    "content": "#' @keywords internal\n#'\n#' @section Key functions:\n#' - Clustering: [dbscan()], [hdbscan()], [optics()], [jpclust()], [sNNclust()]\n#' - Outliers: [lof()], [glosh()], [pointdensity()]\n#' - Nearest Neighbors: [kNN()], [frNN()], [sNN()]\n#'\n#' @references\n#' Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. Journal of Statistical Software, 91(1), 1-30. \\doi{10.18637/jss.v091.i01}\n#'\n#' @import Rcpp\n#' @importFrom graphics plot points lines text abline polygon par segments matplot\n#' @importFrom grDevices palette chull adjustcolor\n#' @importFrom stats dist hclust dendrapply as.dendrogram is.leaf prcomp\n#' @importFrom utils tail\n#'\n#' @useDynLib dbscan, .registration=TRUE\n\"_PACKAGE\"\n"
  },
  {
    "path": "R/AAA_definitions.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n.ANNsplitRule <- c(\"STD\", \"MIDPT\", \"FAIR\", \"SL_MIDPT\", \"SL_FAIR\", \"SUGGEST\")\n\n.matrixlike <- function(x) {\n  if  (is.null(dim(x)))\n       return(FALSE)\n\n  # check that there is at least one row and one column!\n  if (nrow(x) < 1L) stop(\"the provided data has 0 rows!\")\n  if (ncol(x) < 1L) stop(\"the provided data has 0 columns!\")\n\n  TRUE\n}\n"
  },
  {
    "path": "R/DBCV_datasets.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' DBCV Paper Datasets\n#'\n#' The four synthetic 2D datasets used in Moulavi et al (2014).\n#'\n#' @name DBCV_datasets\n#' @aliases Dataset_1 Dataset_2 Dataset_3 Dataset_4\n#' @docType data\n#' @format Four data frames with the following 3 variables.\n#' \\describe{\n#' \\item{x}{a numeric vector}\n#' \\item{y}{a numeric vector}\n#' \\item{class}{an integer vector indicating the class label. 0 means noise.} }\n#' @references Davoud Moulavi and Pablo A. Jaskowiak and\n#' Ricardo J. G. B. Campello and Arthur Zimek and Jörg Sander (2014).\n#' Density-Based Clustering Validation. 
In\n#' _Proceedings of the 2014 SIAM International Conference on Data Mining,_\n#' pages 839-847\n#' \\doi{10.1137/1.9781611973440.96}\n#' @source https://github.com/pajaskowiak/dbcv\n#' @keywords datasets\n#' @examples\n#' data(\"Dataset_1\")\n#' clplot(Dataset_1[, c(\"x\", \"y\")], cl = Dataset_1$class)\n#'\n#' data(\"Dataset_2\")\n#' clplot(Dataset_2[, c(\"x\", \"y\")], cl = Dataset_2$class)\n#'\n#' data(\"Dataset_3\")\n#' clplot(Dataset_3[, c(\"x\", \"y\")], cl = Dataset_3$class)\n#'\n#' data(\"Dataset_4\")\n#' clplot(Dataset_4[, c(\"x\", \"y\")], cl = Dataset_4$class)\nNULL\n\n\n\n"
  },
  {
    "path": "R/DS3.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n\n#' DS3: Spatial data with arbitrary shapes\n#'\n#' Contains 8000 2-d points, with 6 \"natural\" looking shapes, all of which have\n#' an sinusoid-like shape that intersects with each cluster.\n#' The data set was originally used as a benchmark data set for the Chameleon clustering\n#' algorithm (Karypis, Han and Kumar, 1999) to\n#' illustrate the a data set containing arbitrarily shaped\n#' spatial data surrounded by both noise and artifacts.\n#'\n#' @name DS3\n#' @docType data\n#' @format A data.frame with 8000 observations on the following 2 columns:\n#' \\describe{\n#'   \\item{X}{a numeric vector}\n#'   \\item{Y}{a numeric vector}\n#' }\n#'\n#' @references Karypis, George, Eui-Hong Han, and Vipin Kumar (1999).\n#' Chameleon: Hierarchical clustering using dynamic modeling. _Computer_\n#' 32(8): 68-75.\n#' @source Obtained from \\url{http://cs.joensuu.fi/sipu/datasets/}\n#' @keywords datasets\n#' @examples\n#' data(DS3)\n#' plot(DS3, pch = 20, cex = 0.25)\nNULL\n"
  },
  {
    "path": "R/GLOSH.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler, Matthew Piekenbrock\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Global-Local Outlier Score from Hierarchies\n#'\n#' Calculate the Global-Local Outlier Score from Hierarchies (GLOSH) score for\n#' each data point using a kd-tree to speed up kNN search.\n#'\n#' GLOSH compares the density of a point to densities of any points associated\n#' within current and child clusters (if any). Points that have a substantially\n#' lower density than the density mode (cluster) they most associate with are\n#' considered outliers. GLOSH is computed from a hierarchy a clusters.\n#'\n#' Specifically, consider a point \\emph{x} and a density or distance threshold\n#' \\emph{lambda}. GLOSH is calculated by taking 1 minus the ratio of how long\n#' any of the child clusters of the cluster \\emph{x} belongs to \"survives\"\n#' changes in \\emph{lambda} to the highest \\emph{lambda} threshold of x, above\n#' which x becomes a noise point.\n#'\n#' Scores close to 1 indicate outliers. 
For more details on the motivation for\n#' this calculation, see Campello et al (2015).\n#'\n#' @aliases glosh GLOSH\n#' @family Outlier Detection Functions\n#'\n#' @param x an [hclust] object, data matrix, or [dist] object.\n#' @param k size of the neighborhood.\n#' @param ... further arguments are passed on to [kNN()].\n#' @return A numeric vector of length equal to the size of the original data\n#' set containing GLOSH values for all data points.\n#' @author Matt Piekenbrock\n#'\n#' @references Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg\n#' Sander. Hierarchical density estimates for data clustering, visualization,\n#' and outlier detection. _ACM Transactions on Knowledge Discovery from Data\n#' (TKDD)_ 10, no. 1 (2015).\n#' \\doi{10.1145/2733381}\n#' @keywords model\n#' @examples\n#' set.seed(665544)\n#' n <- 100\n#' x <- cbind(\n#'   x=runif(10, 0, 5) + rnorm(n, sd = 0.4),\n#'   y=runif(10, 0, 5) + rnorm(n, sd = 0.4)\n#'   )\n#'\n#' ### calculate GLOSH score\n#' glosh <- glosh(x, k = 3)\n#'\n#' ### distribution of outlier scores\n#' summary(glosh)\n#' hist(glosh, breaks = 10)\n#'\n#' ### simple function to plot point size is proportional to GLOSH score\n#' plot_glosh <- function(x, glosh){\n#'   plot(x, pch = \".\", main = \"GLOSH (k = 3)\")\n#'   points(x, cex = glosh*3, pch = 1, col = \"red\")\n#'   text(x[glosh > 0.80, ], labels = round(glosh, 3)[glosh > 0.80], pos = 3)\n#' }\n#' plot_glosh(x, glosh)\n#'\n#' ### GLOSH with any hierarchy\n#' x_dist <- dist(x)\n#' x_sl <- hclust(x_dist, method = \"single\")\n#' x_upgma <- hclust(x_dist, method = \"average\")\n#' x_ward <- hclust(x_dist, method = \"ward.D2\")\n#'\n#' ## Compare what different linkage criterion consider as outliers\n#' glosh_sl <- glosh(x_sl, k = 3)\n#' plot_glosh(x, glosh_sl)\n#'\n#' glosh_upgma <- glosh(x_upgma, k = 3)\n#' plot_glosh(x, glosh_upgma)\n#'\n#' glosh_ward <- glosh(x_ward, k = 3)\n#' plot_glosh(x, glosh_ward)\n#'\n#' ## GLOSH is automatically computed with 
HDBSCAN\n#' all(hdbscan(x, minPts = 3)$outlier_scores == glosh(x, k = 3))\n#' @export\nglosh <- function(x, k = 4, ...) {\n  if (inherits(x, \"data.frame\"))\n    x <- as.matrix(x)\n\n  # get n\n  if (inherits(x, \"dist\") || inherits(x, \"matrix\")) {\n    if (inherits(x, \"dist\"))\n      n <- attr(x, \"Size\")\n    else\n      n <- nrow(x)\n    # get k nearest neighbors + distances\n    d <- kNN(x, k - 1, ...)\n    x_dist <-\n      if (inherits(x, \"dist\"))\n        x\n    else\n      dist(x, method = \"euclidean\") # copy since mrd changes by reference!\n\n    .check_dist(x_dist)\n    mrd <- mrd(x_dist, d$dist[, k - 1])\n\n    # need to assemble hclust object manually\n    mst <- mst(mrd, n)\n    hc <- hclustMergeOrder(mst, order(mst[, 3]))\n  } else if (inherits(x, \"hclust\")) {\n    hc <- x\n    n <- nrow(hc$merge) + 1\n  }\n  else\n    stop(\"x needs to be a matrix, dist, or hclust object!\")\n\n  if (k < 2 || k >= n)\n    stop(\"k has to be larger than 1 and smaller than the number of points\")\n\n  res <- computeStability(hc, k, compute_glosh = TRUE)\n\n  # return\n  attr(res, \"glosh\")\n}\n"
  },
  {
    "path": "R/LOF.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n\n#' Local Outlier Factor Score\n#'\n#' Calculate the Local Outlier Factor (LOF) score for each data point using a\n#' kd-tree to speed up kNN search.\n#'\n#' LOF compares the local readability density (lrd) of an point to the lrd of\n#' its neighbors. A LOF score of approximately 1 indicates that the lrd around\n#' the point is comparable to the lrd of its neighbors and that the point is\n#' not an outlier. Points that have a substantially lower lrd than their\n#' neighbors are considered outliers and produce scores significantly larger\n#' than 1.\n#'\n#' If a data matrix is specified, then Euclidean distances and fast nearest\n#' neighbor search using a kd-tree is used.\n#'\n#' **Note on duplicate points:** If there are more than `minPts`\n#' duplicates of a point in the data, then LOF the local readability distance\n#' will be 0 resulting in an undefined LOF score of 0/0. We set LOF in this\n#' case to 1 since there is already enough density from the points in the same\n#' location to make them not outliers. 
The original paper by Breunig et al\n#' (2000) assumes that the points are real duplicates and suggests removing\n#' the duplicates before computing LOF. If duplicate points are removed first,\n#' then this LOF implementation in \\pkg{dbscan} behaves like the one described\n#' by Breunig et al.\n#'\n#' @aliases lof LOF\n#' @family Outlier Detection Functions\n#'\n#' @param x a data matrix or a [dist] object.\n#' @param minPts number of nearest neighbors used in defining the local\n#' neighborhood of a point (includes the point itself).\n#' @param ... further arguments are passed on to [kNN()].\n#' Note: `sort` cannot be specified here since `lof()`\n#' always uses `sort = TRUE`.\n#'\n#' @return A numeric vector of length `nrow(x)` containing LOF values for\n#' all data points.\n#'\n#' @author Michael Hahsler\n#' @references Breunig, M., Kriegel, H., Ng, R., and Sander, J. (2000). LOF:\n#' identifying density-based local outliers. In _ACM Int. Conf. on\n#' Management of Data,_ pages 93-104.\n#' \\doi{10.1145/335191.335388}\n#' @keywords model\n#' @examples\n#' set.seed(665544)\n#' n <- 100\n#' x <- cbind(\n#'   x=runif(10, 0, 5) + rnorm(n, sd = 0.4),\n#'   y=runif(10, 0, 5) + rnorm(n, sd = 0.4)\n#'   )\n#'\n#' ### calculate LOF score with a neighborhood of 3 points\n#' lof <- lof(x, minPts = 3)\n#'\n#' ### distribution of outlier factors\n#' summary(lof)\n#' hist(lof, breaks = 10, main = \"LOF (minPts = 3)\")\n#'\n#' ### plot sorted lof. Looks like outliers start around a LOF of 2.\n#' plot(sort(lof), type = \"l\",  main = \"LOF (minPts = 3)\",\n#'   xlab = \"Points sorted by LOF\", ylab = \"LOF\")\n#'\n#' ### point size is proportional to LOF and mark points with a LOF > 2\n#' plot(x, pch = \".\", main = \"LOF (minPts = 3)\", asp = 1)\n#' points(x, cex = (lof - 1) * 2, pch = 1, col = \"red\")\n#' text(x[lof > 2,], labels = round(lof, 1)[lof > 2], pos = 3)\n#' @export\nlof <- function(x, minPts = 5, ...) 
{\n  ### parse extra parameters\n  extra <- list(...)\n\n  # check for deprecated k\n  if (!is.null(extra[[\"k\"]])) {\n    minPts <- extra[[\"k\"]] + 1\n    extra[[\"k\"]] <- NULL\n    warning(\"lof: k is deprecated. Use minPts = \", minPts, \" instead.\")\n  }\n\n  args <- c(\"search\", \"bucketSize\", \"splitRule\", \"approx\")\n  m <- pmatch(names(extra), args)\n  if (anyNA(m))\n    stop(\"Unknown parameter: \",\n      toString(names(extra)[is.na(m)]))\n  names(extra) <- args[m]\n\n  search <- extra$search %||% \"kdtree\"\n  search <- .parse_search(search)\n  splitRule <- extra$splitRule %||% \"suggest\"\n  splitRule <- .parse_splitRule(splitRule)\n  bucketSize <- if (is.null(extra$bucketSize))\n    10L\n  else\n    as.integer(extra$bucketSize)\n  approx <- if (is.null(extra$approx))\n    0\n  else\n    as.double(extra$approx)\n\n  ### precompute distance matrix for dist search\n  if (search == 3 && !inherits(x, \"dist\")) {\n    if (.matrixlike(x))\n      x <- dist(x)\n    else\n      stop(\"x needs to be a matrix to calculate distances\")\n  }\n\n  # get and check n\n  if (inherits(x, \"dist\"))\n    n <- attr(x, \"Size\")\n  else\n    n <- nrow(x)\n  if (is.null(n))\n    stop(\"x needs to be a matrix or a dist object!\")\n  if (minPts < 2 || minPts > n)\n    stop(\"minPts has to be at least 2 and not larger than the number of points\")\n\n\n  ### get LOF from a dist object\n  if (inherits(x, \"dist\")) {\n    if (anyNA(x))\n      stop(\"NAs not allowed in dist for LOF!\")\n\n    # find k-NN distance, ids and distances\n    x <- as.matrix(x)\n    diag(x) <- Inf ### no self-matches\n    o <- t(apply(x, 1, order, decreasing = FALSE))\n    k_dist <- x[cbind(o[, minPts - 1], seq_len(n))]\n    ids <-\n      lapply(\n        seq_len(n),\n        FUN = function(i)\n          which(x[i,] <= k_dist[i])\n      )\n    dist <-\n      lapply(\n        seq_len(n),\n        FUN = function(i)\n          x[i, x[i,] <= k_dist[i]]\n      )\n\n    ret <- list(k_dist = 
k_dist,\n      ids = ids,\n      dist = dist)\n\n  } else{\n    ### Use kd-tree\n\n    if (anyNA(x))\n      stop(\"NAs not allowed for LOF using kdtree!\")\n\n    ret <- lof_kNN(\n      as.matrix(x),\n      as.integer(minPts),\n      as.integer(search),\n      as.integer(bucketSize),\n      as.integer(splitRule),\n      as.double(approx)\n    )\n  }\n\n  # calculate local reachability density (LRD)\n  # reachability-distance_k(A,B) = max{k-distance(B), d(A,B)}\n  # lrdk(A) = 1/(sum_B \\in N_k(A) reachability-distance_k(A, B) / |N_k(A)|)\n  lrd <- numeric(n)\n  for (A in seq_len(n)) {\n    Bs <- ret$ids[[A]]\n    lrd[A] <-\n      1 / (sum(pmax.int(ret$k_dist[Bs], ret$dist[[A]])) / length(Bs))\n  }\n\n  # calculate local outlier factor (LOF)\n  # LOF_k(A) = sum_B \\in N_k(A) lrd_k(B)/(|N_k(A)| lrdk(A))\n  lof <- numeric(n)\n  for (A in seq_len(n)) {\n    Bs <- ret$ids[[A]]\n    lof[A] <- sum(lrd[Bs]) / length(Bs) / lrd[A]\n  }\n\n  # with more than k duplicates lrd can become infinity\n  # we define them not to be outliers\n  lof[is.nan(lof)] <- 1\n\n  lof\n}\n"
  },
  {
    "path": "R/NN.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' NN --- Nearest Neighbors Superclass\n#'\n#' NN is an abstract S3 superclass for the classes of the objects returned\n#' by [kNN()], [frNN()] and [sNN()]. Methods for sorting, plotting and getting an\n#' adjacency list are defined.\n#'\n#' @name NN\n#' @aliases NN\n#' @family NN functions\n#'\n#' @param x a `NN` object\n#' @param pch plotting character.\n#' @param col color used for the data points (nodes).\n#' @param linecol color used for edges.\n#' @param ... 
further parameters passed on to [plot()].\n#' @param decreasing sort in decreasing order?\n#' @param data the data that was used to create `x`\n#' @param main title\n#'\n#' @section Subclasses:\n#' [kNN], [frNN] and [sNN]\n#'\n#' @author Michael Hahsler\n#' @keywords model\n#' @examples\n#' data(iris)\n#' x <- iris[, -5]\n#'\n#' # finding kNN directly in data (using a kd-tree)\n#' nn <- kNN(x, k=5)\n#' nn\n#'\n#' # plot the kNN where NN are shown as lines connecting points.\n#' plot(nn, x)\n#'\n#' # show the first few elements of the adjacency list\n#' head(adjacencylist(nn))\n#'\n#' \\dontrun{\n#' # create a graph and find connected components (if igraph is installed)\n#' library(\"igraph\")\n#' g <- graph_from_adj_list(adjacencylist(nn))\n#' comp <- components(g)\n#' plot(x, col = comp$membership)\n#'\n#' # detect clusters (communities) with the label propagation algorithm\n#' cl <- membership(cluster_label_prop(g))\n#' plot(x, col = cl)\n#' }\nNULL\n\n#' @rdname NN\n#' @export\nadjacencylist <- function (x, ...)\n  UseMethod(\"adjacencylist\", x)\n\n#' @rdname NN\n#' @export\nadjacencylist.NN <- function (x, ...) {\n  stop(\"needs to be implemented by a subclass\")\n  }\n\n#' @rdname NN\n#' @export\nsort.NN <- function(x, decreasing = FALSE, ...) {\n  stop(\"needs to be implemented by a subclass\")\n  }\n\n\n#' @rdname NN\n#' @export\nplot.NN <- function(x, data, main = NULL, pch = 16, col = NULL, linecol = \"gray\", ...) 
{\n  if (is.null(main)) {\n    if (inherits(x, \"frNN\"))\n      main <- paste0(\"frNN graph (eps = \", x$eps, \")\")\n    if (inherits(x, \"kNN\"))\n      main <- paste0(x$k, \"-NN graph\")\n    if (inherits(x, \"sNN\"))\n      main <- paste0(\"Shared NN graph (k=\", x$k,\n        ifelse(is.null(x$kt), \"\", paste0(\", kt=\", x$kt)), \")\")\n  }\n\n  ## create an empty plot\n  plot(data[, 1:2], main = main, type = \"n\", pch = pch, col = col, ...)\n\n  id <- adjacencylist(x)\n\n  ## use lines if it is from the same data\n  ## FIXME: this test is not perfect, maybe we should have a parameter here or add the query points...\n  if (length(id) == nrow(data)) {\n    for (i in seq_along(id)) {\n      for (j in seq_along(id[[i]]))\n        lines(x = c(data[i, 1], data[id[[i]][j], 1]),\n          y = c(data[i, 2], data[id[[i]][j], 2]), col = linecol,\n          ...)\n    }\n\n    ## add vertices\n    points(data[, 1:2], main = main, pch = pch, col = col, ...)\n\n  } else {\n    ## add vertices\n    points(data[, 1:2], main = main, pch = pch, ...)\n    ## use colors if it was from a query\n    for (i in seq_along(id)) {\n      points(data[id[[i]], ], pch = pch, col = i + 1L)\n    }\n  }\n}\n"
  },
  {
    "path": "R/RcppExports.R",
    "content": "# Generated by using Rcpp::compileAttributes() -> do not edit by hand\n# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393\n\nJP_int <- function(nn, kt) {\n    .Call(`_dbscan_JP_int`, nn, kt)\n}\n\nSNN_sim_int <- function(nn, jp) {\n    .Call(`_dbscan_SNN_sim_int`, nn, jp)\n}\n\nANN_cleanup <- function() {\n    invisible(.Call(`_dbscan_ANN_cleanup`))\n}\n\ncomps_kNN <- function(nn, mutual) {\n    .Call(`_dbscan_comps_kNN`, nn, mutual)\n}\n\ncomps_frNN <- function(nn, mutual) {\n    .Call(`_dbscan_comps_frNN`, nn, mutual)\n}\n\nintToStr <- function(iv) {\n    .Call(`_dbscan_intToStr`, iv)\n}\n\ndist_subset <- function(dist, idx) {\n    .Call(`_dbscan_dist_subset`, dist, idx)\n}\n\nXOR <- function(lhs, rhs) {\n    .Call(`_dbscan_XOR`, lhs, rhs)\n}\n\ndspc <- function(cl_idx, internal_nodes, all_cl_ids, mrd_dist) {\n    .Call(`_dbscan_dspc`, cl_idx, internal_nodes, all_cl_ids, mrd_dist)\n}\n\ndbscan_int <- function(data, eps, minPts, weights, borderPoints, type, bucketSize, splitRule, approx, frNN) {\n    .Call(`_dbscan_dbscan_int`, data, eps, minPts, weights, borderPoints, type, bucketSize, splitRule, approx, frNN)\n}\n\nreach_to_dendrogram <- function(reachability, pl_order) {\n    .Call(`_dbscan_reach_to_dendrogram`, reachability, pl_order)\n}\n\ndendrogram_to_reach <- function(x) {\n    .Call(`_dbscan_dendrogram_to_reach`, x)\n}\n\nmst_to_dendrogram <- function(mst) {\n    .Call(`_dbscan_mst_to_dendrogram`, mst)\n}\n\ndbscan_density_int <- function(data, eps, type, bucketSize, splitRule, approx) {\n    .Call(`_dbscan_dbscan_density_int`, data, eps, type, bucketSize, splitRule, approx)\n}\n\nfrNN_int <- function(data, eps, type, bucketSize, splitRule, approx) {\n    .Call(`_dbscan_frNN_int`, data, eps, type, bucketSize, splitRule, approx)\n}\n\nfrNN_query_int <- function(data, query, eps, type, bucketSize, splitRule, approx) {\n    .Call(`_dbscan_frNN_query_int`, data, query, eps, type, bucketSize, splitRule, approx)\n}\n\ndistToAdjacency <- 
function(constraints, N) {\n    .Call(`_dbscan_distToAdjacency`, constraints, N)\n}\n\nbuildDendrogram <- function(hcl) {\n    .Call(`_dbscan_buildDendrogram`, hcl)\n}\n\nall_children <- function(hier, key, leaves_only = FALSE) {\n    .Call(`_dbscan_all_children`, hier, key, leaves_only)\n}\n\nnode_xy <- function(cl_tree, cl_hierarchy, cid = 0L) {\n    .Call(`_dbscan_node_xy`, cl_tree, cl_hierarchy, cid)\n}\n\nsimplifiedTree <- function(cl_tree) {\n    .Call(`_dbscan_simplifiedTree`, cl_tree)\n}\n\ncomputeStability <- function(hcl, minPts, compute_glosh = FALSE) {\n    .Call(`_dbscan_computeStability`, hcl, minPts, compute_glosh)\n}\n\nvalidateConstraintList <- function(constraints, n) {\n    .Call(`_dbscan_validateConstraintList`, constraints, n)\n}\n\ncomputeVirtualNode <- function(noise, constraints) {\n    .Call(`_dbscan_computeVirtualNode`, noise, constraints)\n}\n\nfosc <- function(cl_tree, cid, sc, cl_hierarchy, prune_unstable_leaves = FALSE, cluster_selection_epsilon = 0.0, alpha = 0, useVirtual = FALSE, n_constraints = 0L, constraints = NULL) {\n    .Call(`_dbscan_fosc`, cl_tree, cid, sc, cl_hierarchy, prune_unstable_leaves, cluster_selection_epsilon, alpha, useVirtual, n_constraints, constraints)\n}\n\nextractUnsupervised <- function(cl_tree, prune_unstable = FALSE, cluster_selection_epsilon = 0.0) {\n    .Call(`_dbscan_extractUnsupervised`, cl_tree, prune_unstable, cluster_selection_epsilon)\n}\n\nextractSemiSupervised <- function(cl_tree, constraints, alpha = 0, prune_unstable_leaves = FALSE, cluster_selection_epsilon = 0.0) {\n    .Call(`_dbscan_extractSemiSupervised`, cl_tree, constraints, alpha, prune_unstable_leaves, cluster_selection_epsilon)\n}\n\nkNN_query_int <- function(data, query, k, type, bucketSize, splitRule, approx) {\n    .Call(`_dbscan_kNN_query_int`, data, query, k, type, bucketSize, splitRule, approx)\n}\n\nkNN_int <- function(data, k, type, bucketSize, splitRule, approx) {\n    .Call(`_dbscan_kNN_int`, data, k, type, bucketSize, 
splitRule, approx)\n}\n\nlof_kNN <- function(data, minPts, type, bucketSize, splitRule, approx) {\n    .Call(`_dbscan_lof_kNN`, data, minPts, type, bucketSize, splitRule, approx)\n}\n\nmrd <- function(dm, cd) {\n    .Call(`_dbscan_mrd`, dm, cd)\n}\n\nmst <- function(x_dist, n) {\n    .Call(`_dbscan_mst`, x_dist, n)\n}\n\nhclustMergeOrder <- function(mst, o) {\n    .Call(`_dbscan_hclustMergeOrder`, mst, o)\n}\n\noptics_int <- function(data, eps, minPts, type, bucketSize, splitRule, approx, frNN) {\n    .Call(`_dbscan_optics_int`, data, eps, minPts, type, bucketSize, splitRule, approx, frNN)\n}\n\nlowerTri <- function(m) {\n    .Call(`_dbscan_lowerTri`, m)\n}\n\n"
  },
  {
    "path": "R/broom-dbscan-tidiers.R",
    "content": "#' Turn an dbscan clustering object into a tidy tibble\n#'\n#' Provides [tidy()][generics::tidy()], [augment()][generics::augment()], and\n#' [glance()][generics::glance()] verbs for clusterings created with algorithms\n#' in package `dbscan` to work with [tidymodels](https://www.tidymodels.org/).\n#'\n#' @param x An `dbscan` object returned from [dbscan::dbscan()].\n#' @param data The data used to create the clustering.\n#' @param newdata New data to predict cluster labels for.\n#' @param ... further arguments are ignored without a warning.\n#'\n#' @name dbscan_tidiers\n#' @aliases dbscan_tidiers glance tidy augment\n#' @family tidiers\n#'\n#' @seealso [generics::tidy()], [generics::augment()],\n#'  [generics::glance()], [dbscan()]\n#'\n#' @examplesIf requireNamespace(\"tibble\", quietly = TRUE) && identical(Sys.getenv(\"NOT_CRAN\"), \"true\")\n#'\n#' data(iris)\n#' x <- scale(iris[, 1:4])\n#'\n#' ## dbscan\n#' db <- dbscan(x, eps = .9, minPts = 5)\n#' db\n#'\n#' # summarize model fit with tidiers\n#' tidy(db)\n#' glance(db)\n#'\n#' # augment for this model needs the original data\n#' augment(db, x)\n#'\n#' # to augment new data, the original data is also needed\n#' augment(db, x, newdata = x[1:5, ])\n#'\n#' ## hdbscan\n#' hdb <- hdbscan(x, minPts = 5)\n#'\n#' # summarize model fit with tidiers\n#' tidy(hdb)\n#' glance(hdb)\n#'\n#' # augment for this model needs the original data\n#' augment(hdb, x)\n#'\n#' # to augment new data, the original data is also needed\n#' augment(hdb, x, newdata = x[1:5, ])\n#'\n#' ## Jarvis-Patrick clustering\n#' cl <- jpclust(x, k = 20, kt = 15)\n#'\n#' # summarize model fit with tidiers\n#' tidy(cl)\n#' glance(cl)\n#'\n#' # augment for this model needs the original data\n#' augment(cl, x)\n#'\n#' ## Shared Nearest Neighbor clustering\n#' cl <- sNNclust(x, k = 20, eps = 0.8, minPts = 15)\n#'\n#' # summarize model fit with tidiers\n#' tidy(cl)\n#' glance(cl)\n#'\n#' # augment for this model needs the original data\n#' 
augment(cl, x)\n#'\nNULL\n\n#' @rdname dbscan_tidiers\n#' @importFrom generics tidy\n#' @export\ngenerics::tidy\n\n\n#' @rdname dbscan_tidiers\n#' @export\ntidy.dbscan <- function(x, ...) {\n  n_cl <- max(x$cluster)\n  size <- table(factor(x$cluster, levels = 0:n_cl))\n\n  tb <- tibble::tibble(cluster = as.factor(0:n_cl),\n         size = as.integer(size))\n\n  tb$noise <- tb$cluster == 0L\n  tb\n}\n\n#' @rdname dbscan_tidiers\n#' @export\ntidy.hdbscan <- function(x, ...) {\n  n_cl <- max(x$cluster)\n  size <- table(factor(x$cluster, levels = 0:n_cl))\n\n  tb <- tibble::tibble(cluster = as.factor(0:n_cl),\n         size = as.integer(size))\n  tb$cluster_score <- as.numeric(x$cluster_scores[as.character(tb$cluster)])\n  tb$noise <- tb$cluster == 0L\n\n  tb\n}\n\n#' @rdname dbscan_tidiers\n#' @export\ntidy.general_clustering <- function(x, ...) {\n  n_cl <- max(x$cluster)\n  size <- table(factor(x$cluster, levels = 0:n_cl))\n\n  tb <- tibble::tibble(cluster = as.factor(0:n_cl),\n         size = as.integer(size))\n  tb$noise <- tb$cluster == 0L\n\n  tb\n}\n\n\n## augment\n\n#' @importFrom generics augment\n#' @rdname dbscan_tidiers\n#' @export\ngenerics::augment\n\n\n#' @rdname dbscan_tidiers\n#' @export\naugment.dbscan <- function(x, data = NULL, newdata = NULL, ...) 
{\n  n_cl <- max(x$cluster)\n\n  if (is.null(data) && is.null(newdata))\n    stop(\"Must specify either `data` or `newdata` argument.\")\n\n  if (is.null(data) || nrow(data) != length(x$cluster)) {\n    stop(\"The original data needs to be passed as data.\")\n  }\n\n  if (is.null(newdata)) {\n    tb <- tibble::as_tibble(data)\n    tb$.cluster <- factor(x$cluster, levels = 0:n_cl)\n  } else {\n    tb <- tibble::as_tibble(newdata)\n    tb$.cluster <- factor(predict(x,\n                                  newdata = newdata,\n                                  data = data), levels = 0:n_cl)\n  }\n\n  tb$noise <- tb$.cluster == 0L\n\n  tb\n}\n\n#' @rdname dbscan_tidiers\n#' @export\naugment.hdbscan <- function(x, data = NULL, newdata = NULL, ...) {\n  n_cl <- max(x$cluster)\n\n  if (is.null(data) || nrow(data) != length(x$cluster)) {\n    stop(\"The original data needs to be passed as data.\")\n  }\n\n  if (is.null(newdata)) {\n    tb <- tibble::as_tibble(data)\n    tb$.cluster <- factor(x$cluster, levels = 0:n_cl)\n    tb$.coredist <- x$coredist\n    tb$.membership_prob <- x$membership_prob\n    tb$.outlier_scores <- x$outlier_scores\n  } else {\n    tb <- tibble::as_tibble(newdata)\n    tb$.cluster <- factor(\n        predict(x, newdata = newdata, data = data), levels = 0:n_cl)\n    tb$.coredist <- NA_real_\n    tb$.membership_prob <- NA_real_\n    tb$.outlier_scores <- NA_real_\n  }\n\n  tb\n}\n\n#' @rdname dbscan_tidiers\n#' @export\naugment.general_clustering <- function(x, data = NULL, newdata = NULL, ...) 
{\n  n_cl <- max(x$cluster)\n\n  if (is.null(data) || nrow(data) != length(x$cluster)) {\n    stop(\"The original data needs to be passed as data.\")\n  }\n\n  if (is.null(newdata)) {\n    tb <- tibble::as_tibble(data)\n    tb$.cluster <- factor(x$cluster, levels = 0:n_cl)\n  } else {\n    stop(\"augmenting new data is not supported.\")\n  }\n\n  tb\n}\n\n\n\n## glance\n#' @importFrom generics glance\n#' @rdname dbscan_tidiers\n#' @export\ngenerics::glance\n\n\n#' @rdname dbscan_tidiers\n#' @export\nglance.dbscan <- function(x, ...) {\n  tibble::tibble(\n    nobs = length(x$cluster),\n    n.clusters = length(table(x$cluster[x$cluster != 0L])),\n    nexcluded = sum(x$cluster == 0L)\n  )\n}\n\n#' @rdname dbscan_tidiers\n#' @export\nglance.hdbscan <- function(x, ...) {\n  tibble::tibble(\n    nobs = length(x$cluster),\n    n.clusters = length(table(x$cluster[x$cluster != 0L])),\n    nexcluded = sum(x$cluster == 0L)\n  )\n}\n\n#' @rdname dbscan_tidiers\n#' @export\nglance.general_clustering <- function(x, ...) {\n  tibble::tibble(\n    nobs = length(x$cluster),\n    n.clusters = length(table(x$cluster[x$cluster != 0L])),\n    nexcluded = sum(x$cluster == 0L)\n  )\n}\n\n"
  },
  {
    "path": "R/comps.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2017 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Find Connected Components in a Nearest-neighbor Graph\n#'\n#' Generic function and methods to find connected components in nearest neighbor graphs.\n#'\n#' Note that for kNN graphs, one point may be in the kNN of the other but nor vice versa.\n#' `mutual = TRUE` requires that both points are in each other's kNN.\n#'\n#' @family NN functions\n#' @aliases components\n#'\n#' @param x the [NN] object representing the graph or a [dist] object\n#' @param eps threshold on the distance\n#' @param mutual for a pair of points, do both have to be in each other's neighborhood?\n#' @param ... 
further arguments are currently unused.\n#'\n#' @return an integer vector with component assignments.\n#'\n#' @author Michael Hahsler\n#' @keywords model\n#' @examples\n#' set.seed(665544)\n#' n <- 100\n#' x <- cbind(\n#'   x=runif(10, 0, 5) + rnorm(n, sd = 0.4),\n#'   y=runif(10, 0, 5) + rnorm(n, sd = 0.4)\n#'   )\n#' plot(x, pch = 16)\n#'\n#' # Connected components on a graph where each pair of points\n#' # with a distance less or equal to eps are connected\n#' d <- dist(x)\n#' components <- comps(d, eps = .8)\n#' plot(x, col = components, pch = 16)\n#'\n#' # Connected components in a fixed radius nearest neighbor graph\n#' # Gives the same result as the threshold on the distances above\n#' frnn <- frNN(x, eps = .8)\n#' components <- comps(frnn)\n#' plot(frnn, data = x, col = components)\n#'\n#' # Connected components on a k nearest neighbors graph\n#' knn <- kNN(x, 3)\n#' components <- comps(knn, mutual = FALSE)\n#' plot(knn, data = x, col = components)\n#'\n#' components <- comps(knn, mutual = TRUE)\n#' plot(knn, data = x, col = components)\n#'\n#' # Connected components in a shared nearest neighbor graph\n#' snn <- sNN(x, k = 10, kt = 5)\n#' components <- comps(snn)\n#' plot(snn, data = x, col = components)\n#' @export\ncomps <- function(x, ...) UseMethod(\"comps\", x)\n\n#' @rdname comps\n#' @export\ncomps.dist <- function(x, eps, ...)\n  stats::cutree(stats::hclust(x, method = \"single\"), h = eps)\n\n#' @rdname comps\n#' @export\ncomps.kNN <- function(x, mutual = FALSE, ...)\n  as.integer(factor(comps_kNN(x$id, as.logical(mutual))))\n\n# sNN and frNN are symmetric so no need for mutual\n#' @rdname comps\n#' @export\ncomps.sNN <- function(x, ...) comps.kNN(x, mutual = FALSE)\n\n#' @rdname comps\n#' @export\ncomps.frNN <- function(x, ...) comps_frNN(x$id, mutual = FALSE)\n"
  },
  {
    "path": "R/dbcv.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2024 Michael Hahsler, Matt Piekenbrock\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n\n#' Density-Based Clustering Validation Index (DBCV)\n#'\n#' Calculate the Density-Based Clustering Validation Index (DBCV)  for a\n#' clustering.\n#'\n#' DBCV (Moulavi et al, 2014) computes a score based on the density sparseness of each cluster\n#' and the density separation of each pair of clusters.\n#'\n#' The density sparseness of a cluster (DSC) is deﬁned as the maximum edge weight of\n#' a minimal spanning tree for the internal points of the cluster using the mutual\n#' reachability distance based on the all-points-core-distance. Internal points\n#' are connected to more than one other point in the cluster. 
Since clusters of\n#' a size less than 3 cannot have internal points, they are ignored (considered\n#' noise) in this implementation.\n#'\n#' The density separation of a pair of clusters (DSPC)\n#' is defined as the minimum reachability distance between the internal nodes of\n#' the spanning trees of the two clusters.\n#'\n#' The validity index for a cluster is calculated using these measures and aggregated\n#' to a validity index for the whole clustering using a weighted average.\n#'\n#' The index is in the range \\eqn{[-1,1]}. If the cluster density compactness is better\n#' than the density separation, a positive value is returned. The actual value depends\n#' on the separability of the data. In general, greater values\n#' of the measure indicate a better density-based clustering solution.\n#'\n#' Noise points are included in the calculation only in the weighted average,\n#' therefore clusterings with more noise points will get a lower index.\n#'\n#' **Performance note:** This implementation calculates a distance matrix and thus\n#' can only be used for small or sampled datasets.\n#'\n#' @aliases dbcv DBCV\n#' @family Evaluation Functions\n#'\n#' @param x a data matrix or a dist object.\n#' @param cl a clustering (e.g., an integer vector)\n#' @param d dimensionality of the original data if a dist object is provided.\n#' @param metric distance metric used. The available metrics are the methods\n#'        implemented by `dist()` plus `\"sqeuclidean\"` for the squared\n#'        Euclidean distance used in the original DBCV implementation.\n#' @param sample sample size used for large datasets.\n#'\n#' @return A list with the DBCV `score` for the clustering,\n#'   the density sparseness of cluster (`dsc`) values,\n#'   the density separation of pairs of clusters (`dspc`) distances,\n#'   and the validity indices of clusters (`v_c`).\n#'\n#' @author Matt Piekenbrock and Michael Hahsler\n#' @references Davoud Moulavi and Pablo A. Jaskowiak and\n#' Ricardo J. G. B. 
Campello and Arthur Zimek and Jörg Sander (2014).\n#' Density-Based Clustering Validation. In\n#' _Proceedings of the 2014 SIAM International Conference on Data Mining,_\n#' pages 839-847\n#' \\doi{10.1137/1.9781611973440.96}\n#'\n#' Pablo A. Jaskowiak (2022). MATLAB implementation of DBCV.\n#' \\url{https://github.com/pajaskowiak/dbcv}\n#' @examples\n#' # Load a test dataset\n#' data(Dataset_1)\n#' x <- Dataset_1[, c(\"x\", \"y\")]\n#' class <- Dataset_1$class\n#'\n#' clplot(x, class)\n#'\n#' # We use MinPts 3 and use the knee at eps = .1 for dbscan\n#' kNNdistplot(x, minPts = 3)\n#'\n#' cl <- dbscan(x, eps = .1, minPts = 3)\n#' clplot(x, cl)\n#'\n#' dbcv(x, cl)\n#'\n#' # compare to the DBCV index on the original class labels and\n#' # with a random partitioning\n#' dbcv(x, class)\n#' dbcv(x, sample(1:4, replace = TRUE, size = nrow(x)))\n#'\n#' # find the best eps using dbcv\n#' eps_grid <- seq(.05,.2, by = .01)\n#' cls <- lapply(eps_grid, FUN = function(e) dbscan(x, eps = e, minPts = 3))\n#' dbcvs <- sapply(cls, FUN = function(cl) dbcv(x, cl)$score)\n#'\n#' plot(eps_grid, dbcvs, type = \"l\")\n#'\n#' eps_opt <- eps_grid[which.max(dbcvs)]\n#' eps_opt\n#'\n#' cl <- dbscan(x, eps = eps_opt, minPts = 3)\n#' clplot(x, cl)\n#' @export\ndbcv <- function(x,\n                 cl,\n                 d,\n                 metric = \"euclidean\",\n                 sample = NULL) {\n  # a clustering with a cluster element\n  if (is.list(cl)) {\n    cl <- cl$cluster\n  }\n\n  if (inherits(x, \"dist\")) {\n    xdist <- x\n    if (missing(d))\n      stop(\"d needs to be specified if a distance matrix is supplied!\")\n\n  } else if (.matrixlike(x)) {\n    if (!is.null(sample)) {\n      take <- sample(nrow(x), size = sample)\n      x <- x[take, ]\n      cl <- cl[take]\n    }\n\n    x <- as.matrix(x)\n    if (!missing(d) && d != ncol(x))\n      stop(\"d does not match the number of columns in x!\")\n    d <- ncol(x)\n\n    if (pmatch(metric, \"sqeuclidean\", nomatch = 0))\n      
xdist <- dist(x, method = \"euclidean\")^2\n    else\n      xdist <- dist(x, method = metric)\n\n  } else\n    stop(\"'dbcv' expects x to be a matrix to calculate distances.\")\n\n  .check_dist(xdist)\n  n <- attr(xdist, \"Size\")\n\n  # in case we get a factor\n  cl <- as.integer(cl)\n\n  if (length(cl) != n)\n    stop(\"cl does not match the number of rows in x!\")\n\n  ## calculate everything for all non-noise points ordered by cluster\n  ## getClusterIdList removes noise points and singleton clusters\n  ## and returns indices reordered by cluster\n  cl_idx_list <- getClusterIdList(cl)\n  n_cl <- length(cl_idx_list)\n  ## reordered distances w/o noise\n  all_dist <- dist_subset(xdist, unlist(cl_idx_list))\n\n  new_cl_idx_list <- list()\n  i <- 1L\n  start <- 1\n  for(l in lengths(cl_idx_list)) {\n    end <- start + l - 1\n    new_cl_idx_list[[i]] <- seq(start, end)\n    start <- end + 1\n    i <- i + 1L\n  }\n\n  cl_idx_list <- new_cl_idx_list\n  all_idx <- unlist(cl_idx_list)\n\n\n  ## 1. Calculate all-points-core-distance\n  ## Calculate the all-points-core-distance for each point, within each cluster\n  ## Note: this needs the dimensionality of the data d\n  all_pts_core_dist <- unlist(lapply(\n    cl_idx_list,\n    FUN = function(ids) {\n      dists <- (rowSums(as.matrix((\n        1 / dist_subset(all_dist, ids)\n      )^d)) / (length(ids) - 1))^(-1 / d)\n    }\n  ))\n\n  ## 2. Create a mutual reachability MST for each cluster\n  all_mrd <- structure(mrd(all_dist, all_pts_core_dist),\n                       class = \"dist\",\n                       Size = length(all_idx))\n  ## Noise points are removed, but the index is affected by dividing by the\n  ## total number of objects including the noise points (n)!\n\n  ## mst is a matrix with columns: from, to and weight\n  mrd_graphs <- lapply(cl_idx_list, function(idx) {\n    mst(x_dist = dist_subset(all_mrd, idx), n = length(idx))\n  })\n\n  ## 3. 
Density Sparseness of a Cluster (DSC):\n  ## The maximum edge weight of the internal edges in the cluster's\n  ## mutual reachability MST.\n\n  ## find internal nodes for DSC and DSPC. Internal nodes have a degree > 1\n  internal_nodes <- lapply(mrd_graphs, function(mst) {\n    node_deg <- table(c(mst[, 1], mst[, 2]))\n    idx <- as.integer(names(node_deg)[node_deg > 1])\n    idx\n  })\n\n  dsc <- mapply(function(mst, int_idx) {\n    # find internal edges\n    int_edge_idx <- which((mst[, 1L] %in% int_idx) &\n                            (mst[, 2L] %in% int_idx))\n    if (length(int_edge_idx) == 0L) {\n      return(max(mst[, 3L]))\n    }\n    max(mst[int_edge_idx, 3L])\n  }, mrd_graphs, internal_nodes)\n\n\n  ## 4. Density Separation of a Pair of Clusters (DSPC):\n  ## The minimum reachability distance between the internal nodes\n  ## of a pair of MST_MRD's of clusters Ci and Cj\n  dspc_dist <- dspc(cl_idx_list, internal_nodes, all_idx, all_mrd)\n  # returns a matrix with Ci, Cj, dist\n\n  # make it into a full distance matrix\n  dspc_dist <- dspc_dist[, 3L]\n  class(dspc_dist) <- \"dist\"\n  attr(dspc_dist, \"Size\") <- n_cl\n  attr(dspc_dist, \"Diag\") <- FALSE\n  attr(dspc_dist, \"Upper\") <- FALSE\n\n  dspc_mm <- as.matrix(dspc_dist)\n  diag(dspc_mm) <- NA\n\n  ## 5. Validity index of a cluster:\n  min_separation <- apply(dspc_mm, MARGIN = 1, min, na.rm = TRUE)\n  v_c <- (min_separation - dsc) / pmax(min_separation, dsc)\n\n\n  ## 6. Validity index for the whole clustering\n  res <- sum(lengths(cl_idx_list) / n * v_c)\n\n  return(list(\n    score = res,\n    n = n,\n    n_c = lengths(cl_idx_list),\n    d = d,\n    dsc = dsc,\n    dspc = dspc_dist,\n    v_c = v_c\n  ))\n}\n\n\ngetClusterIdList <- function(cl) {\n  ## In DBCV, singletons are ambiguously defined. 
However, they cannot be\n  ## considered valid clusters, for reasons listed in section 4 of the\n  ## original paper.\n  ## Clusters with less than 3 points cannot have internal nodes, so we need to\n  ## ignore them as well.\n  ## To ensure coverage, they are assigned into the noise category.\n  cl_freq <- table(cl)\n  cl[cl %in% as.integer(names(which(cl_freq < 3)))] <- 0L\n  if (all(cl == 0)) {\n    return(0)\n  }\n\n  cl_ids <- unique(cl)            # all cluster ids\n  cl_valid <- cl_ids[cl_ids != 0] # valid cluster indices (non-noise)\n  n_cl <- length(cl_valid)        # number of clusters\n\n  ## 1 or 0 clusters results in worst score + a warning\n  if (n_cl <= 1) {\n    warning(\"DBCV is undefined for less than 2 non-noise clusters with more than 2 member points.\")\n    return(-1L)\n  }\n\n  ## Indexes\n  cl_ids_idx <- lapply(cl_valid, function(id)\n    sort(which(cl == id))) ## the sort is important for indexing purposes\n  return(cl_ids_idx)\n}\n"
  },
  {
    "path": "R/dbscan.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n#\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n\n#' Density-based Spatial Clustering of Applications with Noise (DBSCAN)\n#'\n#' Fast reimplementation of the DBSCAN (Density-based spatial clustering of\n#' applications with noise) clustering algorithm using a kd-tree.\n#'\n#' The\n#' implementation is significantly faster and can work with larger data sets\n#' than [fpc::dbscan()] in \\pkg{fpc}. Use `dbscan::dbscan()` (with specifying the package) to\n#' call this implementation when you also load package \\pkg{fpc}.\n#'\n#' **The algorithm**\n#'\n#' This implementation of DBSCAN follows the original\n#' algorithm as described by Ester et al (1996). DBSCAN performs the following steps:\n#'\n#' 1. Estimate the density\n#'   around each data point by counting the number of points in a user-specified\n#'   eps-neighborhood and applies a used-specified minPts thresholds to identify\n#'      - core points (points with more than minPts points in their neighborhood),\n#'      - border points (non-core points with a core point in their neighborhood) and\n#'      - noise points (all other points).\n#' 2. 
Core points form the backbone of clusters by joining them into\n#'   a cluster if they are density-reachable from each other (i.e., there is a chain of core\n#'   points where one falls inside the eps-neighborhood of the next).\n#' 3. Border points are assigned to clusters. The algorithm needs parameters\n#'   `eps` (the radius of the epsilon neighborhood) and `minPts` (the\n#'   density threshold).\n#'\n#' Border points are arbitrarily assigned to clusters in the original\n#' algorithm. DBSCAN* (see Campello et al 2013) treats all border points as\n#' noise points. This is implemented with `borderPoints = FALSE`.\n#'\n#' **Specifying the data**\n#'\n#' If `x` is a matrix or a data.frame, then fast fixed-radius nearest\n#' neighbor computation using a kd-tree is performed using Euclidean distance.\n#' See [frNN()] for more information on the parameters related to\n#' nearest neighbor search. **Note** that only numerical values are allowed in `x`.\n#'\n#' Any precomputed distance matrix (dist object) can be specified as `x`.\n#' You may run into memory issues since distance matrices are large.\n#'\n#' A precomputed frNN object can be supplied as `x`. In this case\n#' `eps` does not need to be specified. This option is useful for large\n#' data sets, where a sparse distance matrix is available. See\n#' [frNN()] for how to create frNN objects.\n#'\n#' **Setting parameters for DBSCAN**\n#'\n#' The parameters `minPts` and `eps` define the minimum density required\n#' in the area around core points which form the backbone of clusters.\n#' `minPts` is the number of points\n#' required in the neighborhood around the point defined by the parameter `eps`\n#' (i.e., the radius around the point). Both parameters\n#' depend on each other and changing one typically requires changing\n#' the other one as well. 
The parameters also depend on the size of the data set with\n#' larger datasets requiring a larger `minPts` or a smaller `eps`.\n#'\n#' * `minPts:` The original\n#' DBSCAN paper (Ester et al, 1996) suggests starting by setting \\eqn{\\text{minPts} \\ge d + 1},\n#' the data dimensionality plus one or higher with a minimum of 3. Larger values\n#' are preferable since increasing the parameter suppresses more noise in the data\n#' by requiring more points to form clusters.\n#' Sander et al (1998) use two times the data dimensionality in their examples.\n#' Note that setting \\eqn{\\text{minPts} \\le 2} is equivalent to hierarchical clustering\n#' with the single link metric and the dendrogram cut at height `eps`.\n#'\n#' * `eps:` A suitable neighborhood size\n#' parameter `eps` given a fixed value for `minPts` can be found\n#' visually by inspecting the [kNNdistplot()] of the data using\n#' \\eqn{k = \\text{minPts} - 1} (`minPts` includes the point itself, while the\n#' k-nearest neighbors distance does not). The k-nearest neighbor distance plot\n#' sorts all data points by their k-nearest neighbor distance. A sudden\n#' increase of the kNN distance (a knee) indicates that the points to the right\n#' are most likely outliers. Choose `eps` for DBSCAN where the knee is.\n#'\n#' **Predict cluster memberships**\n#'\n#' [predict()] can be used to predict cluster memberships for new data\n#' points. A point is considered a member of a cluster if it is within the eps\n#' neighborhood of a core point of the cluster. Points\n#' which cannot be assigned to a cluster will be reported as\n#' noise points (i.e., cluster ID 0).\n#' **Important note:** `predict()` currently can only use Euclidean distance to determine\n#' the neighborhood of core points. If `dbscan()` was called using distances other than Euclidean,\n#' then the neighborhood calculation will not be correct and only approximated by Euclidean\n#' distances. 
If the data contain factor columns (e.g., using Gower's distance), then\n#' the factors in `data` and `query` first need to be converted to numeric to use the\n#' Euclidean approximation.\n#'\n#'\n#' @aliases dbscan DBSCAN print.dbscan_fast\n#' @family clustering functions\n#'\n#' @param x a data matrix, a data.frame, a [dist] object or a [frNN] object with\n#' fixed-radius nearest neighbors.\n#' @param eps size (radius) of the epsilon neighborhood. Can be omitted if\n#' `x` is a frNN object.\n#' @param minPts minimum number of points required in the eps neighborhood for\n#' core points (including the point itself).\n#' @param weights numeric; weights for the data points. Only needed to perform\n#' weighted clustering.\n#' @param borderPoints logical; should border points be assigned to clusters?\n#' The default is `TRUE` for regular DBSCAN. If `FALSE` then border\n#' points are considered noise (see DBSCAN* in Campello et al, 2013).\n#' @param ...  additional arguments are passed on to the fixed-radius nearest\n#' neighbor search algorithm. See [frNN()] for details on how to\n#' control the search strategy.\n#'\n#' @return `dbscan()` returns an object of class `dbscan_fast` with the following components:\n#'\n#' \\item{eps }{ value of the `eps` parameter.}\n#' \\item{minPts }{ value of the `minPts` parameter.}\n#' \\item{metric }{ the distance metric used.}\n#' \\item{cluster }{An integer vector with cluster assignments. Zero indicates noise points.}\n#'\n#' `is.corepoint()` returns a logical vector indicating for each data point if it is a\n#'   core point.\n#'\n#' @author Michael Hahsler\n#' @references Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast\n#' Density-Based Clustering with R.  _Journal of Statistical Software,_\n#' 91(1), 1-30.\n#' \\doi{10.18637/jss.v091.i01}\n#'\n#' Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A\n#' Density-Based Algorithm for Discovering Clusters in Large Spatial Databases\n#' with Noise. 
Institute for Computer Science, University of Munich.\n#' _Proceedings of 2nd International Conference on Knowledge Discovery and\n#' Data Mining (KDD-96),_ 226-231.\n#' \\url{https://dl.acm.org/doi/10.5555/3001460.3001507}\n#'\n#' Campello, R. J. G. B.; Moulavi, D.; Sander, J. (2013). Density-Based\n#' Clustering Based on Hierarchical Density Estimates. Proceedings of the\n#' 17th Pacific-Asia Conference on Knowledge Discovery in Databases, PAKDD\n#' 2013, _Lecture Notes in Computer Science_ 7819, p. 160.\n#' \\doi{10.1007/978-3-642-37456-2_14}\n#'\n#' Sander, J., Ester, M., Kriegel, HP. et al. (1998). Density-Based\n#' Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications.\n#' _Data Mining and Knowledge Discovery_ 2, 169-194.\n#' \\doi{10.1023/A:1009745219419}\n#'\n#' @keywords model clustering\n#' @examples\n#' ## Example 1: use dbscan on the iris data set\n#' data(iris)\n#' iris <- as.matrix(iris[, 1:4])\n#'\n#' ## Find suitable DBSCAN parameters:\n#' ## 1. We use minPts = dim + 1 = 5 for iris. A larger value can also be used.\n#' ## 2. 
We inspect the k-NN distance plot for k = minPts - 1 = 4\n#' kNNdistplot(iris, minPts = 5)\n#'\n#' ## Noise seems to start around a 4-NN distance of .7\n#' abline(h=.7, col = \"red\", lty = 2)\n#'\n#' ## Cluster with the chosen parameters\n#' res <- dbscan(iris, eps = .7, minPts = 5)\n#' res\n#'\n#' pairs(iris, col = res$cluster + 1L)\n#' clplot(iris, res)\n#'\n#' ## Use a precomputed frNN object\n#' fr <- frNN(iris, eps = .7)\n#' dbscan(fr, minPts = 5)\n#'\n#' ## Example 2: use data from fpc\n#' set.seed(665544)\n#' n <- 100\n#' x <- cbind(\n#'   x = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n#'   y = runif(10, 0, 10) + rnorm(n, sd = 0.2)\n#'   )\n#'\n#' res <- dbscan(x, eps = .3, minPts = 3)\n#' res\n#'\n#' ## plot clusters and add noise (cluster 0) as crosses.\n#' plot(x, col = res$cluster)\n#' points(x[res$cluster == 0, ], pch = 3, col = \"grey\")\n#'\n#' clplot(x, res)\n#' hullplot(x, res)\n#'\n#' ## Predict cluster membership for new data points\n#' ## (Note: 0 means it is predicted as noise)\n#' newdata <- x[1:5,] + rnorm(10, 0, .3)\n#' hullplot(x, res)\n#' points(newdata, pch = 3, col = \"red\", lwd = 3)\n#' text(newdata, pos = 1)\n#'\n#' pred_label <- predict(res, newdata, data = x)\n#' pred_label\n#' points(newdata, col = pred_label + 1L,  cex = 2, lwd = 2)\n#'\n#' ## Compare speed against fpc version (if microbenchmark is installed)\n#' ## Note: we use dbscan::dbscan to make sure that we do not run the\n#' ## implementation in fpc.\n#' \\dontrun{\n#' if (requireNamespace(\"fpc\", quietly = TRUE) &&\n#'     requireNamespace(\"microbenchmark\", quietly = TRUE)) {\n#'   t_dbscan <- microbenchmark::microbenchmark(\n#'     dbscan::dbscan(x, .3, 3), times = 10, unit = \"ms\")\n#'   t_dbscan_linear <- microbenchmark::microbenchmark(\n#'     dbscan::dbscan(x, .3, 3, search = \"linear\"), times = 10, unit = \"ms\")\n#'   t_dbscan_dist <- microbenchmark::microbenchmark(\n#'     dbscan::dbscan(x, .3, 3, search = \"dist\"), times = 10, unit = \"ms\")\n#'   t_fpc <- 
microbenchmark::microbenchmark(\n#'     fpc::dbscan(x, .3, 3), times = 10, unit = \"ms\")\n#'\n#'   r <- rbind(t_fpc, t_dbscan_dist, t_dbscan_linear, t_dbscan)\n#'   r\n#'\n#'   boxplot(r,\n#'     names = c('fpc', 'dbscan (dist)', 'dbscan (linear)', 'dbscan (kdtree)'),\n#'     main = \"Runtime comparison in ms\")\n#'\n#'   ## speedup of the kd-tree-based version compared to the fpc implementation\n#'   median(t_fpc$time) / median(t_dbscan$time)\n#' }}\n#'\n#' ## Example 3: manually create a frNN object for dbscan (dbscan only needs ids and eps)\n#' nn <- structure(list(id = list(c(2,3), c(1,3), c(1,2,3), c(3,5), c(4,5)), eps = 1),\n#'   class =  c(\"NN\", \"frNN\"))\n#' nn\n#' dbscan(nn, minPts = 2)\n#'\n#' @export\ndbscan <-\n  function(x,\n    eps,\n    minPts = 5,\n    weights = NULL,\n    borderPoints = TRUE,\n    ...) {\n    ## for frNN input, take eps (if missing) and the metric from the object\n    if (inherits(x, \"frNN\")) {\n      if (missing(eps))\n        eps <- x$eps\n      dist_method <- x$metric\n    } else if (inherits(x, \"dist\")) {\n      .check_dist(x)\n      dist_method <- attr(x, \"method\")\n    } else\n      dist_method <- \"euclidean\"\n\n    dist_method <- dist_method %||% \"unknown\"\n\n    ### extra contains settings for frNN\n    ### search = \"kdtree\", bucketSize = 10, splitRule = \"suggest\", approx = 0\n    ### also check for MinPts for fpc compatibility (does not work for\n    ### search method dist)\n    extra <- list(...)\n    args <-\n      c(\"MinPts\", \"search\", \"bucketSize\", \"splitRule\", \"approx\")\n    m <- pmatch(names(extra), args)\n    if (anyNA(m))\n      stop(\"Unknown parameter: \",\n        toString(names(extra)[is.na(m)]))\n    names(extra) <- args[m]\n\n    # fpc compatibility\n    if (!is.null(extra$MinPts)) {\n      warning(\"converting argument MinPts (fpc) to minPts (dbscan)!\")\n      minPts <- extra$MinPts\n      extra$MinPts <- NULL\n    }\n\n    search <- .parse_search(extra$search %||% \"kdtree\")\n    splitRule <- .parse_splitRule(extra$splitRule %||% \"suggest\")\n    bucketSize <- 
as.integer(extra$bucketSize %||% 10L)\n    approx <- as.integer(extra$approx %||% 0L)\n\n    ### do dist search\n    if (search == 3L && !inherits(x, \"dist\")) {\n      if (.matrixlike(x))\n        x <- dist(x)\n      else\n        stop(\"x needs to be a matrix to calculate distances\")\n    }\n\n    ## for dist we provide the R code with a frNN list and no x\n    frNN <- list()\n    if (inherits(x, \"dist\")) {\n      frNN <- frNN(x, eps, ...)$id\n      x <- matrix(0.0, nrow = 0, ncol = 0)\n    } else if (inherits(x, \"frNN\")) {\n      if (x$eps != eps) {\n        eps <- x$eps\n        warning(\"Using the eps of \",\n          eps,\n          \" provided in the fixed-radius NN object.\")\n      }\n      frNN <- x$id\n      x <- matrix(0.0, nrow = 0, ncol = 0)\n\n    } else {\n      if (!.matrixlike(x))\n        stop(\"x needs to be a matrix or data.frame.\")\n      ## make sure x is numeric\n      x <- as.matrix(x)\n      if (storage.mode(x) == \"integer\")\n        storage.mode(x) <- \"double\"\n      if (storage.mode(x) != \"double\")\n        stop(\"all data in x has to be numeric.\")\n    }\n\n    if (length(frNN) == 0 && anyNA(x))\n      stop(\"data/distances cannot contain NAs for dbscan (with kd-tree)!\")\n\n    ## add self match and use C numbering if frNN is used\n    if (length(frNN) > 0L)\n      frNN <-\n      lapply(\n        seq_along(frNN),\n        FUN = function(i)\n          c(i - 1L, frNN[[i]] - 1L)\n      )\n\n    if (length(minPts) != 1L ||\n        !is.finite(minPts) ||\n        minPts < 0)\n      stop(\"minPts needs to be a single integer >= 0.\")\n\n    if (is.null(eps) ||\n        is.na(eps) || eps < 0)\n      stop(\"eps needs to be >= 0.\")\n\n    ret <- dbscan_int(\n      x,\n      as.double(eps),\n      as.integer(minPts),\n      as.double(weights),\n      as.integer(borderPoints),\n      as.integer(search),\n      as.integer(bucketSize),\n      as.integer(splitRule),\n      as.double(approx),\n      frNN\n    )\n\n    structure(\n      
list(\n        cluster = ret,\n        eps = eps,\n        minPts = minPts,\n        metric = dist_method,\n        borderPoints = borderPoints\n      ),\n      class = c(\"dbscan_fast\", \"dbscan\")\n    )\n  }\n\n#' @export\nprint.dbscan_fast <- function(x, ...) {\n  writeLines(c(\n    paste0(\"DBSCAN clustering for \", nobs(x), \" objects.\"),\n    paste0(\"Parameters: eps = \", x$eps, \", minPts = \", x$minPts),\n    paste0(\n      \"Using \",\n      x$metric,\n      \" distances and borderpoints = \",\n      x$borderPoints\n    ),\n    paste0(\n      \"The clustering contains \",\n      ncluster(x),\n      \" cluster(s) and \",\n      nnoise(x),\n      \" noise points.\"\n    )\n  ))\n\n  print(table(x$cluster))\n  cat(\"\\n\")\n\n  writeLines(strwrap(paste0(\n    \"Available fields: \",\n    toString(names(x))\n  ), exdent = 18))\n}\n\n#' @rdname dbscan\n#' @export\nis.corepoint <- function(x, eps, minPts = 5, ...)\n  lengths(frNN(x, eps = eps, ...)$id) >= (minPts - 1)\n"
  },
  {
    "path": "R/dendrogram.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler, Matt Piekenbrock\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Coersions to Dendrogram\n#'\n#' Provides a new generic function to coerce objects to dendrograms with\n#' [stats::as.dendrogram()] as the default. Additional methods for\n#' [hclust], [hdbscan] and [reachability] objects are provided.\n#'\n#' Coersion methods for\n#' [hclust], [hdbscan] and [reachability] objects to [dendrogram] are provided.\n#'\n#' The coercion from `hclust` is a faster C++ reimplementation of the coercion in\n#' package `stats`. The original implementation can be called\n#' using [stats::as.dendrogram()].\n#'\n#' The coersion from [hdbscan] builds the non-simplified HDBSCAN hierarchy as a\n#' dendrogram object.\n#'\n#' @name dendrogram\n#' @aliases dendrogram\n#'\n#' @param object the object\n#' @param ... further arguments\nNULL\n\n#' @rdname dendrogram\n#' @export\nas.dendrogram <- function (object, ...) 
{\n  UseMethod(\"as.dendrogram\", object)\n}\n\n#' @rdname dendrogram\n#' @export\nas.dendrogram.default <- function (object, ...)\n  stats::as.dendrogram(object, ...)\n\n## this is a replacement for stats::as.dendrogram for hclust\n#' @rdname dendrogram\n#' @export\nas.dendrogram.hclust <- function(object, ...) {\n  return(buildDendrogram(object))\n}\n\n#' @rdname dendrogram\n#' @export\nas.dendrogram.hdbscan <- function(object, ...) {\n  return(buildDendrogram(object$hc))\n}\n\n#' @rdname dendrogram\n#' @export\nas.dendrogram.reachability <- function(object, ...) {\n  if (sum(is.infinite(object$reachdist)) > 1)\n    stop(\n      \"Multiple Infinite reachability distances found. Reachability plots can only be converted if they contain enough information to fully represent the dendrogram structure. If using OPTICS, a larger eps value (such as Inf) may be needed in the parameterization.\"\n    )\n  #dup_x <- object\n  c_order <- order(object$reachdist) - 1\n  # dup_x$order <- dup_x$order - 1\n  #q_order <- sapply(c_order, function(i) which(dup_x$order == i))\n  res <- reach_to_dendrogram(object, c_order)\n  # res <- dendrapply(res, function(leaf) { new_leaf <- leaf[[1]]; attributes(new_leaf) <- attributes(leaf); new_leaf })\n\n  # add mid points for plotting\n  res <- .midcache.dendrogram(res)\n\n  res\n}\n\n# calculate midpoints for dendrogram\n# from stats, but not exported\n# see stats:::midcache.dendrogram\n\n.midcache.dendrogram <- function(x, type = \"hclust\", quiet = FALSE) {\n  type <- match.arg(type)\n  stopifnot(inherits(x, \"dendrogram\"))\n  verbose <- getOption(\"verbose\", 0) >= 2\n  setmid <- function(d, type) {\n    depth <- 0L\n    kk <- integer()\n    jj <- integer()\n    dd <- list()\n    repeat {\n      if (!is.leaf(d)) {\n        k <- length(d)\n        if (k < 1)\n          stop(\"dendrogram node with non-positive #{branches}\")\n        depth <- depth + 1L\n        if (verbose)\n          cat(sprintf(\" depth(+)=%4d, k=%d\\n\", depth,\n       
     k))\n        kk[depth] <- k\n        if (storage.mode(jj) != storage.mode(kk))\n          storage.mode(jj) <- storage.mode(kk)\n        dd[[depth]] <- d\n        d <- d[[jj[depth] <- 1L]]\n        next\n      }\n      while (depth) {\n        k <- kk[depth]\n        j <- jj[depth]\n        r <- dd[[depth]]\n        r[[j]] <- unclass(d)\n        if (j < k)\n          break\n        depth <- depth - 1L\n        if (verbose)\n          cat(sprintf(\" depth(-)=%4d, k=%d\\n\", depth,\n            k))\n        midS <- sum(vapply(r, .midDend, 0))\n        if (!quiet && type == \"hclust\" && k != 2)\n          warning(\"midcache() of non-binary dendrograms only partly implemented\")\n        attr(r, \"midpoint\") <- (.memberDend(r[[1L]]) +\n            midS) / 2\n        d <- r\n      }\n      if (!depth)\n        break\n      dd[[depth]] <- r\n      d <- r[[jj[depth] <- j + 1L]]\n    }\n    d\n  }\n  setmid(x, type = type)\n}\n\n.midDend <- function(x) {\n  attr(x, \"midpoint\") %||% 0\n}\n\n.memberDend <- function(x) {\n  attr(x, \"x.member\") %||% attr(x, \"members\") %||% 1\n}\n"
  },
  {
    "path": "R/extractFOSC.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler, Matt Piekenbrock\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Framework for the Optimal Extraction of Clusters from Hierarchies\n#'\n#' Generic reimplementation of the _Framework for Optimal Selection of Clusters_\n#' (FOSC; Campello et al, 2013) to extract clusterings from hierarchical clustering (i.e.,\n#' [hclust] objects).\n#' Can be parameterized to perform unsupervised\n#' cluster extraction through a stability-based measure, or semisupervised\n#' cluster extraction through either a constraint-based extraction (with a\n#' stability-based tiebreaker) or a mixed (weighted) constraint and\n#' stability-based objective extraction.\n#'\n#' Campello et al (2013) suggested a _Framework for Optimal Selection of\n#' Clusters_ (FOSC) as a framework to make local (non-horizontal) cuts to any\n#' cluster tree hierarchy. This function implements the original extraction\n#' algorithms as described by the framework for hclust objects. 
Traditional\n#' cluster extraction methods from hierarchical representations (such as\n#' [hclust] objects) generally rely on global parameters or cutting values\n#' which are used to partition a cluster hierarchy into a set of disjoint, flat\n#' clusters. This is implemented in R in function [stats::cutree()].\n#' Although such methods are widespread, using global parameter\n#' settings is inherently limited in that they cannot capture patterns within\n#' the cluster hierarchy at varying _local_ levels of granularity.\n#'\n#' Rather than partitioning a hierarchy based on the number of clusters one\n#' expects to find (\\eqn{k}) or based on some linkage distance threshold\n#' (\\eqn{H}), the FOSC proposes that the optimal clusters may exist at varying\n#' distance thresholds in the hierarchy. To enable this idea, FOSC requires one\n#' parameter (minPts) that represents _the minimum number of points that\n#' constitute a valid cluster._ The first step of the FOSC algorithm is to\n#' traverse the given cluster hierarchy divisively, recording new clusters at\n#' each split if both branches contain at least minPts points. Branches\n#' with fewer than minPts points inherit the\n#' parent cluster's identity. 
Note that using FOSC, due to the constraint that\n#' minPts must be greater than or equal to 2, it is possible that the optimal\n#' cluster solution chosen makes local cuts that render parent branches of\n#' sizes less than minPts as noise, which are denoted as 0 in the final\n#' solution.\n#'\n#' Traversing the original cluster tree using minPts creates a new, simplified\n#' cluster tree that is then post-processed recursively to extract clusters\n#' that maximize for each cluster \\eqn{C_i}{Ci} the cost function\n#'\n#' \\deqn{\\max_{\\delta_2, \\dots, \\delta_k} J = \\sum\\limits_{i=2}^{k} \\delta_i\n#' S(C_i)}{ J = \\sum \\delta S(Ci) for all i clusters, } where\n#' \\eqn{S(C_i)}{S(Ci)} is the stability-based measure as \\deqn{ S(C_i) =\n#' \\sum_{x_j \\in C_i}(\\frac{1}{h_{min} (x_j, C_i)} - \\frac{1}{h_{max} (C_i)})\n#' }{ S(Ci) = \\sum (1/Hmin(Xj, Ci) - 1/Hmax(Ci)) for all Xj in Ci.}\n#'\n#' \\eqn{\\delta_i}{\\delta} represents an indicator function, which constrains\n#' the solution space such that clusters must be disjoint (cannot assign more\n#' than 1 label to each cluster). The measure \\eqn{S(C_i)}{S(Ci)} used by FOSC\n#' is an unsupervised validation measure based on the assumption that, if you\n#' vary the linkage/distance threshold across all possible values, more\n#' prominent clusters that survive over many threshold variations should be\n#' considered as stronger candidates of the optimal solution. For this reason,\n#' using this measure to detect clusters is referred to as an unsupervised,\n#' _stability-based_ extraction approach. In some cases it may be useful\n#' to enact _instance-level_ constraints that ensure the solution space\n#' conforms to linkage expectations known _a priori_. This general idea of\n#' using preliminary expectations to augment the clustering solution will be\n#' referred to as _semisupervised clustering_. 
If constraints are given in\n#' the call to `extractFOSC()`, the following alternative objective function\n#' is maximized:\n#'\n#' \\deqn{J = \\frac{1}{2n_c}\\sum\\limits_{j=1}^n \\gamma (x_j)}{J = 1/(2 * nc)\n#' \\sum \\gamma(Xj)}\n#'\n#' \\eqn{n_c}{nc} is the total number of constraints given and\n#' \\eqn{\\gamma(x_j)}{\\gamma(Xj)} represents the number of constraints involving\n#' object \\eqn{x_j}{Xj} that are satisfied. In the case of ties (such as\n#' solutions where no constraints were given), the unsupervised solution is\n#' used as a tiebreaker. See Campello et al (2013) for more details.\n#'\n#' As a third option, if one wishes to prioritize the degree to which the\n#' unsupervised and semisupervised solutions contribute to the overall optimal\n#' solution, the parameter \\eqn{\\alpha} can be set to enable the extraction of\n#' clusters that maximize the `mixed` objective function\n#'\n#' \\deqn{J = \\alpha S(C_i) + (1 - \\alpha) \\gamma(C_i)}{J = \\alpha S(Ci) + (1 -\n#' \\alpha) \\gamma(Ci).}\n#'\n#' FOSC expects the pairwise constraints to be passed as either an\n#' \\eqn{n(n-1)/2} vector of integers representing the constraints, where 1\n#' represents should-link, -1 represents should-not-link, and 0 represents no\n#' preference using the unsupervised solution (see below for examples).\n#' Alternatively, if only a few constraints are needed, a named list\n#' representing the (symmetric) adjacency list can be used, where the names\n#' correspond to indices of the points in the original data, and the values\n#' correspond to integer vectors of constraints (positive indices for\n#' should-link, negative indices for should-not-link). Again, see the examples\n#' section for a demonstration of this.\n#'\n#' The parameters to the input function correspond to the concepts discussed\n#' above. The `minPts` parameter represents the minimum cluster size to\n#' extract. 
The optional `constraints` parameter contains the pairwise,\n#' instance-level constraints of the data. The optional `alpha` parameter\n#' controls whether the mixed objective function is used (if `alpha` is\n#' greater than 0). If the `validate_constraints` parameter is set to\n#' `TRUE`, the constraints are checked (and fixed) for symmetry (if point A has a\n#' should-link constraint with point B, point B should also have the same\n#' constraint). Asymmetric constraints are not supported.\n#'\n#' Unstable branch pruning was not discussed by Campello et al (2013); however,\n#' in some data sets it may be the case that specific subbranch scores are\n#' significantly greater than sibling and parent branches, and thus sibling\n#' branches should be considered as noise if their scores are cumulatively\n#' lower than the parent's. This can happen in extremely nonhomogeneous data\n#' sets, where there exist locally very stable branches surrounded by unstable\n#' branches that contain more than `minPts` points.\n#' `prune_unstable = TRUE` will remove the unstable branches.\n#'\n#' @family clustering functions\n#'\n#' @param x a valid [hclust] object created via [hclust()] or [hdbscan()].\n#' @param constraints Either a list or matrix of pairwise constraints. If\n#' missing, an unsupervised measure of stability is used to make local cuts and\n#' extract the optimal clusters. See details.\n#' @param alpha numeric; weight between \\eqn{[0, 1]} for mixed-objective\n#' semi-supervised extraction. Defaults to 0.\n#' @param minPts numeric; Defaults to 2. Only needed if class-less noise is a\n#' valid label in the model.\n#' @param prune_unstable logical; should significantly unstable subtrees be\n#' pruned? The default is `FALSE` for the original optimal extraction\n#' framework (see Campello et al, 2013). See details for what `TRUE`\n#' implies.\n#' @param validate_constraints logical; should constraints be checked for\n#' validity? 
See details for what are considered valid constraints.\n#'\n#' @returns A list with the elements:\n#'\n#' \\item{cluster }{An integer vector with cluster assignments. Zero\n#' indicates noise points (if any).}\n#' \\item{hc }{The original [hclust] object with additional list elements\n#' `\"stability\"`, `\"constraint\"`, and `\"total\"`\n#' for the \\eqn{n - 1} cluster-wide objective scores from the extraction.}\n#'\n#' @author Matt Piekenbrock\n#' @seealso [hclust()], [hdbscan()], [stats::cutree()]\n#' @references Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg\n#' Sander (2013). A framework for semi-supervised and unsupervised optimal\n#' extraction of clusters from hierarchies. _Data Mining and Knowledge\n#' Discovery_ 27(3): 344-371.\n#' \\doi{10.1007/s10618-013-0311-4}\n#' @keywords model clustering\n#' @examples\n#' data(\"moons\")\n#'\n#' ## Regular HDBSCAN using stability-based extraction (unsupervised)\n#' cl <- hdbscan(moons, minPts = 5)\n#' cl$cluster\n#'\n#' ## Constraint-based extraction from the HDBSCAN hierarchy\n#' ## (w/ stability-based tiebreaker (semisupervised))\n#' cl_con <- extractFOSC(cl$hc, minPts = 5,\n#'   constraints = list(\"12\" = c(49, -47)))\n#' cl_con$cluster\n#'\n#' ## Alternative formulation: Constraint-based extraction from the HDBSCAN hierarchy\n#' ## (w/ stability-based tiebreaker (semisupervised)) using distance thresholds\n#' dist_moons <- dist(moons)\n#' cl_con2 <- extractFOSC(cl$hc, minPts = 5,\n#'   constraints = ifelse(dist_moons < 0.1, 1L,\n#'                 ifelse(dist_moons > 1, -1L, 0L)))\n#'\n#' cl_con2$cluster # same as the second example\n#' @export\nextractFOSC <-\n  function(x,\n    constraints,\n    alpha = 0,\n    minPts = 2L,\n    prune_unstable = FALSE,\n    validate_constraints = FALSE) {\n    if (!inherits(x, \"hclust\"))\n      stop(\"extractFOSC expects 'x' to be a valid hclust object.\")\n\n    # if constraints are given then they need to be a list, a matrix or a vector\n    if (!(\n      
missing(constraints) ||\n        is.list(constraints) ||\n        is.matrix(constraints) ||\n        is.numeric(constraints)\n    ))\n      stop(\"extractFOSC expects constraints to be either an adjacency list or adjacency matrix.\")\n\n    if (!minPts >= 2)\n      stop(\"minPts must be at least 2.\")\n    if (alpha < 0 ||\n        alpha > 1)\n      stop(\"alpha can only take values in [0, 1].\")\n    n <- nrow(x$merge) + 1L\n\n    ## First step for both unsupervised and semisupervised - compute stability scores\n    cl_tree <- computeStability(x, minPts)\n\n    ## Unsupervised Extraction\n    if (missing(constraints)) {\n      cl_tree <- extractUnsupervised(cl_tree, prune_unstable)\n    }\n    ## Semi-supervised Extraction\n    else {\n      ## If given as adjacency-list form\n      if (is.list(constraints)) {\n        ## Checks for proper indexing, symmetry of constraints, etc.\n        if (validate_constraints) {\n          is_valid <- max(as.integer(names(constraints))) <= n\n          is_valid <- is_valid &&\n            all(vapply(constraints, function(ilc) all(abs(ilc) <= n), logical(1L)))\n          if (!is_valid) {\n            stop(\"Detected constraint indices not in the interval [1, n]\")\n          }\n          constraints <- validateConstraintList(constraints, n)\n        }\n        cl_tree <-\n          extractSemiSupervised(cl_tree, constraints, alpha, prune_unstable)\n      }\n      ## Adjacency matrix given (probably from dist object), retrieve adjacency list form\n      else if (is.vector(constraints)) {\n        if (!all(constraints %in% c(-1, 0, 1))) {\n          stop(\n            \"'extractFOSC' only accepts instance-level constraints. 
See ?extractFOSC for more details.\"\n          )\n        }\n        ## Checks for proper integer labels, symmetry of constraints, length of vector, etc.\n        if (validate_constraints) {\n          is_valid <- length(constraints) == choose(n, 2)\n          constraints_list <-\n            validateConstraintList(distToAdjacency(constraints, n), n)\n        } else {\n          constraints_list <- distToAdjacency(constraints, n)\n        }\n        cl_tree <-\n          extractSemiSupervised(cl_tree, constraints_list, alpha, prune_unstable)\n      }\n      ## Full nxn adjacency-matrix given, give warning and retrieve adjacency list form\n      else if (is.matrix(constraints)) {\n        if (!all(constraints %in% c(-1, 0, 1))) {\n          stop(\n            \"'extractFOSC' only accepts instance-level constraints. See ?extractFOSC for more details.\"\n          )\n        }\n        if (!all(dim(constraints) == c(n, n))) {\n          stop(\"Given matrix is not square.\")\n        }\n        warning(\n          \"Full nxn matrix given; extractFOSC does not support asymmetric relational constraints. 
Using lower triangular.\"\n        )\n\n        constraints <- constraints[lower.tri(constraints)]\n\n        ## Checks for proper integer labels, symmetry of constraints, length of vector, etc.\n        if (validate_constraints) {\n          is_valid <- length(constraints) == choose(n, 2)\n          if (!is_valid) {\n            stop(\"Expected a constraint vector of length choose(n, 2).\")\n          }\n          constraints_list <-\n            validateConstraintList(distToAdjacency(constraints, n), n)\n        } else {\n          constraints_list <- distToAdjacency(constraints, n)\n        }\n        cl_tree <-\n          extractSemiSupervised(cl_tree, constraints_list, alpha, prune_unstable)\n      } else {\n        stop(\n          \"'extractFOSC' doesn't know how to handle constraints of type \",\n          class(constraints)\n        )\n      }\n    }\n    cl_track <- attr(cl_tree, \"cl_tracker\")\n    stability_score <-\n      vapply(cl_track, function(cid)\n        cl_tree[[as.character(cid)]]$stability, numeric(1L))\n    constraint_score <-\n      vapply(cl_track, function(cid)\n        cl_tree[[as.character(cid)]]$vscore %||% 0, numeric(1L))\n    total_score <-\n      vapply(cl_track, function(cid)\n        cl_tree[[as.character(cid)]]$score %||% 0, numeric(1L))\n    out <- append(\n      x,\n      list(\n        cluster = cl_track,\n        stability = stability_score,\n        constraint = constraint_score,\n        total = total_score\n      )\n    )\n    extraction_type <-\n      if (missing(constraints)) {\n        \"(w/ stability-based extraction)\"\n      } else if (alpha == 0) {\n        \"(w/ constraint-based extraction)\"\n      } else {\n        \"(w/ mixed-objective extraction)\"\n      }\n    substrs <- strsplit(x$method, split = \" \\\\(w\\\\/\")[[1L]]\n    out[[\"method\"]] <-\n      if (length(substrs) > 1)\n        paste(substrs[[1]], extraction_type)\n      else\n        paste(out[[\"method\"]], extraction_type)\n    class(out) <- \"hclust\"\n    return(list(cluster = attr(cl_tree, \"cluster\"), hc = out))\n  }\n"
  },
  {
    "path": "R/frNN.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n\n#' Find the Fixed Radius Nearest Neighbors\n#'\n#' This function uses a kd-tree to find the fixed radius nearest neighbors\n#' (including distances) fast.\n#'\n#' If `x` is specified as a data matrix, then Euclidean distances and fast\n#' nearest neighbor lookup using a kd-tree are used.\n#'\n#' To create an frNN object from scratch, you need to supply at least the\n#' elements `id` with a list of integer vectors with the nearest neighbor\n#' ids for each point and `eps` (see below).\n#'\n#' **Self-matches:** Self-matches are not returned!\n#'\n#' @aliases frNN frnn print.frnn\n#' @family NN functions\n#'\n#' @param x a data matrix, a dist object or a frNN object.\n#' @param eps neighborhood radius.\n#' @param query a data matrix with the points to query. If query is not\n#' specified, the NN for all the points in `x` are returned. If query is\n#' specified then `x` needs to be a data matrix.\n#' @param sort sort the neighbors by distance? 
This is expensive and can be\n#' done later using `sort()`.\n#' @param search nearest neighbor search strategy (one of `\"kdtree\"`, `\"linear\"` or\n#' `\"dist\"`).\n#' @param bucketSize max size of the kd-tree leaves.\n#' @param splitRule rule to split the kd-tree. One of `\"STD\"`, `\"MIDPT\"`, `\"FAIR\"`,\n#' `\"SL_MIDPT\"`, `\"SL_FAIR\"` or `\"SUGGEST\"` (SL stands for sliding). `\"SUGGEST\"` uses\n#' ANN's best guess.\n#' @param approx use approximate nearest neighbors. All NN up to a distance of\n#' a factor of `1 + approx` times `eps` may be used. Some actual NN may be omitted,\n#' leading to spurious clusters and noise points. However, the algorithm will\n#' enjoy a significant speedup.\n#' @param decreasing sort in decreasing order?\n#' @param ... further arguments\n#'\n#' @returns\n#'\n#' `frNN()` returns an object of class [frNN] (subclass of\n#' [NN]) containing a list with the following components:\n#' \\item{id }{a list of\n#' integer vectors. Each vector contains the ids (row numbers) of the fixed radius nearest\n#' neighbors. }\n#' \\item{dist }{a list with distances (same structure as\n#' `id`). }\n#' \\item{eps }{ neighborhood radius `eps` that was used. }\n#' \\item{metric }{ the distance metric used. }\n#'\n#' `adjacencylist()` returns a list with one entry per data point in `x`. Each entry\n#' contains the ids of the nearest neighbors.\n#'\n#' @author Michael Hahsler\n#'\n#' @references David M. Mount and Sunil Arya (2010). 
ANN: A Library for\n#' Approximate Nearest Neighbor Searching,\n#' \\url{http://www.cs.umd.edu/~mount/ANN/}.\n#' @keywords model\n#' @examples\n#' data(iris)\n#' x <- iris[, -5]\n#'\n#' # Example 1: Find fixed radius nearest neighbors for each point\n#' nn <- frNN(x, eps = .5)\n#' nn\n#'\n#' # Number of neighbors\n#' hist(lengths(adjacencylist(nn)),\n#'   xlab = \"k\", main = \"Number of Neighbors\",\n#'   sub = paste(\"Neighborhood size eps =\", nn$eps))\n#'\n#' # Explore neighbors of point i = 10\n#' i <- 10\n#' nn$id[[i]]\n#' nn$dist[[i]]\n#' plot(x, col = ifelse(seq_len(nrow(iris)) %in% nn$id[[i]], \"red\", \"black\"))\n#'\n#' # get an adjacency list\n#' head(adjacencylist(nn))\n#'\n#' # plot the fixed radius neighbors (and then reduced to a radius of .3)\n#' plot(nn, x)\n#' plot(frNN(nn, eps = .3), x)\n#'\n#' ## Example 2: find fixed-radius NN for query points\n#' q <- x[c(1,100),]\n#' nn <- frNN(x, eps = .5, query = q)\n#'\n#' plot(nn, x, col = \"grey\")\n#' points(q, pch = 3, lwd = 2)\n#' @export frNN\nfrNN <-\n  function(x,\n    eps,\n    query = NULL,\n    sort = TRUE,\n    search = \"kdtree\",\n    bucketSize = 10,\n    splitRule = \"suggest\",\n    approx = 0) {\n    if (is.null(eps) ||\n        is.na(eps) || eps < 0)\n      stop(\"eps needs to be >= 0.\")\n\n    if (inherits(x, \"frNN\")) {\n      if (x$eps < eps)\n        stop(\"frNN object in x does not have a sufficient eps radius.\")\n\n      for (i in seq_along(x$dist)) {\n        take <- x$dist[[i]] <= eps\n        x$dist[[i]] <- x$dist[[i]][take]\n        x$id[[i]] <- x$id[[i]][take]\n      }\n      x$eps <- eps\n\n      return(x)\n    }\n\n    search <- .parse_search(search)\n    splitRule <- .parse_splitRule(splitRule)\n\n    ### dist search\n    if (search == 3 && !inherits(x, \"dist\")) {\n      if (.matrixlike(x))\n        x <- dist(x)\n      else\n        stop(\"x needs to be a matrix to calculate distances\")\n    }\n\n    ### get frNN from a dist object in R\n    if (inherits(x, \"dist\")) {\n      if 
(!is.null(query))\n        stop(\"query can only be used if x contains the data.\")\n\n      if (anyNA(x))\n        stop(\"data/distances cannot contain NAs for frNN (with kd-tree)!\")\n\n      return(dist_to_frNN(x, eps = eps, sort = sort))\n    }\n\n    ## make sure x is numeric\n    if (!.matrixlike(x))\n      stop(\"x needs to be a matrix or a data.frame.\")\n    x <- as.matrix(x)\n    if (storage.mode(x) == \"integer\")\n      storage.mode(x) <- \"double\"\n    if (storage.mode(x) != \"double\")\n      stop(\"all data in x has to be numeric.\")\n\n    if (!is.null(query)) {\n      if (!.matrixlike(query))\n        stop(\"query needs to be a matrix or a data.frame.\")\n      query <- as.matrix(query)\n      if (storage.mode(query) == \"integer\")\n        storage.mode(query) <- \"double\"\n      if (storage.mode(query) != \"double\")\n        stop(\"query has to be NULL or a numeric matrix or data.frame.\")\n      if (ncol(x) != ncol(query))\n        stop(\"x and query need to have the same number of columns!\")\n    }\n\n    if (anyNA(x))\n      stop(\"data/distances cannot contain NAs for frNN (with kd-tree)!\")\n\n    ## returns NO self matches\n    if (!is.null(query)) {\n      ret <-\n        frNN_query_int(\n          as.matrix(x),\n          as.matrix(query),\n          as.double(eps),\n          as.integer(search),\n          as.integer(bucketSize),\n          as.integer(splitRule),\n          as.double(approx)\n        )\n      names(ret$dist) <- rownames(query)\n      names(ret$id) <- rownames(query)\n      ret$metric <- \"euclidean\"\n    } else {\n      ret <- frNN_int(\n        as.matrix(x),\n        as.double(eps),\n        as.integer(search),\n        as.integer(bucketSize),\n        as.integer(splitRule),\n        as.double(approx)\n      )\n      names(ret$dist) <- rownames(x)\n      names(ret$id) <- rownames(x)\n      ret$metric <- \"euclidean\"\n    }\n\n    ret$eps <- eps\n    ret$sort <- FALSE\n    class(ret) <- c(\"frNN\", \"NN\")\n\n    
if (sort)\n      ret <- sort.frNN(ret)\n\n    ret\n  }\n\n# extract a row from a distance matrix without doubling space requirements\ndist_row <- function(x, i, self_val = 0) {\n  n <- attr(x, \"Size\")\n\n  i <- rep(i, times = n)\n  j <- seq_len(n)\n  swap_idx <- i > j\n  tmp <- i[swap_idx]\n  i[swap_idx] <- j[swap_idx]\n  j[swap_idx] <- tmp\n\n  diag_idx <- i == j\n  idx <- n * (i - 1) - i * (i - 1) / 2 + j - i\n  idx[diag_idx] <- NA\n\n  val <- x[idx]\n  val[diag_idx] <- self_val\n  val\n}\n\ndist_to_frNN <- function(x, eps, sort = FALSE) {\n  .check_dist(x)\n\n  n <- attr(x, \"Size\")\n\n  id <- list()\n  d <- list()\n\n  for (i in seq_len(n)) {\n    ### Inf -> no self-matches\n    y <- dist_row(x, i, self_val = Inf)\n    o <- which(y <= eps)\n    id[[i]] <- o\n    d[[i]] <- y[o]\n  }\n  names(id) <- labels(x)\n  names(d) <- labels(x)\n\n  ret <-\n    structure(list(\n      dist = d,\n      id = id,\n      eps = eps,\n      metric = attr(x, \"method\"),\n      sort = FALSE\n    ),\n      class = c(\"frNN\", \"NN\"))\n\n  if (sort)\n    ret <- sort.frNN(ret)\n\n  return(ret)\n}\n\n#' @rdname frNN\n#' @export\nsort.frNN <- function(x, decreasing = FALSE, ...) {\n  if (isTRUE(x$sort))\n    return(x)\n  if (is.null(x$dist))\n    stop(\"Unable to sort. Distances are missing.\")\n\n  ## FIXME: This is slow do this in C++\n  n <- names(x$id)\n\n  o <- lapply(\n    seq_along(x$dist),\n    FUN =\n      function(i)\n        order(x$dist[[i]], x$id[[i]], decreasing = decreasing)\n  )\n  x$dist <-\n    lapply(\n      seq_along(o),\n      FUN = function(p)\n        x$dist[[p]][o[[p]]]\n    )\n  x$id <- lapply(\n    seq_along(o),\n    FUN = function(p)\n      x$id[[p]][o[[p]]]\n  )\n\n  names(x$dist) <- n\n  names(x$id) <- n\n\n  x$sort <- TRUE\n\n  x\n}\n\n#' @rdname frNN\n#' @export\nadjacencylist.frNN <- function(x, ...)\n  x$id\n\n#' @rdname frNN\n#' @export\nprint.frNN <- function(x, ...) 
{\n  cat(\n    \"fixed radius nearest neighbors for \",\n    length(x$id),\n    \" objects (eps=\",\n    x$eps,\n    \").\",\n    \"\\n\",\n    sep = \"\"\n  )\n\n  cat(\"Distance metric:\", x$metric, \"\\n\")\n  cat(\"\\nAvailable fields: \", toString(names(x)), \"\\n\", sep = \"\")\n}\n"
  },
  {
    "path": "R/hdbscan.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler, Matt Piekenbrock\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Hierarchical DBSCAN (HDBSCAN)\n#'\n#' Fast C++ implementation of HDBSCAN (Hierarchical DBSCAN) and related\n#' algorithms.\n#'\n#' This fast implementation of HDBSCAN (Campello et al., 2013) computes the\n#' hierarchical cluster tree representing density estimates along with the\n#' stability-based flat cluster extraction. HDBSCAN essentially computes the\n#' hierarchy of all DBSCAN* clusterings, and\n#' then uses a stability-based extraction method to find optimal cuts in the\n#' hierarchy, thus producing a flat solution.\n#'\n#' HDBSCAN performs the following steps:\n#'\n#' 1. Compute the mutual reachability distance (mrd) between points\n#'    (based on distances and core distances).\n#' 2. Use mrd as a distance measure to construct a minimum spanning tree.\n#' 3. Prune the tree using stability.\n#' 4. 
Extract the clusters.\n#'\n#' Additional related algorithms are supported: the \"Global-Local Outlier Score\n#' from Hierarchies\" (GLOSH; see section 6 of Campello et al., 2015)\n#' is available in function [glosh()],\n#' and clustering based on instance-level constraints (see\n#' section 5.3 of Campello et al., 2015) is also available. The algorithms only need\n#' the parameter `minPts`.\n#'\n#' Note that `minPts` not only acts as a minimum cluster size to detect,\n#' but also as a \"smoothing\" factor of the density estimates implicitly\n#' computed from HDBSCAN.\n#'\n#' When using the optional parameter `cluster_selection_epsilon`,\n#' a combination between DBSCAN* and HDBSCAN* can be achieved\n#' (see Malzer & Baum 2020). This means that part of the\n#' tree is affected by `cluster_selection_epsilon` as if\n#' running DBSCAN* with `eps` = `cluster_selection_epsilon`.\n#' The remaining part (on levels above the threshold) is still\n#' processed by HDBSCAN*'s stability-based selection algorithm\n#' and can therefore return clusters of variable densities.\n#' Note that there is not always a remaining part, especially if\n#' the parameter value is chosen too large, or if there aren't\n#' enough clusters of variable densities. In this case, the result\n#' will be equal to DBSCAN*.\n#' `cluster_selection_epsilon` is especially useful for cases\n#' where HDBSCAN* produces too many small clusters that\n#' need to be merged, while still being able to extract clusters\n#' of variable densities at higher levels.\n#'\n#' `coredist()`: The core distance is defined for each point as\n#' the distance to its `minPts - 1`-th nearest neighbor.\n#' It is a density estimate equivalent to `kNNdist()` with `k = minPts - 1`.\n#'\n#' `mrdist()`: The mutual reachability distance is defined between two points as\n#' `mrd(a, b) = max(coredist(a), coredist(b), dist(a, b))`. This distance metric is used by\n#' HDBSCAN. 
It has the effect of increasing distances in low density areas.\n#'\n#' `predict()` assigns each new data point to the same cluster as the nearest point\n#' if it is not more than that point's core distance away. Otherwise the new point\n#' is classified as a noise point (i.e., cluster ID 0).\n#' @aliases hdbscan HDBSCAN print.hdbscan\n#'\n#' @family HDBSCAN functions\n#' @family clustering functions\n#'\n#' @param x a data matrix (Euclidean distances are used) or a [dist] object\n#' calculated with an arbitrary distance metric.\n#' @param minPts integer; Minimum size of clusters. See details.\n#' @param cluster_selection_epsilon double; a distance threshold below which\n#' no clusters should be selected (see Malzer & Baum 2020).\n#' @param gen_hdbscan_tree logical; should the robust single linkage tree be\n#' explicitly computed (see cluster tree in Chaudhuri et al., 2010).\n#' @param gen_simplified_tree logical; should the simplified hierarchy be\n#' explicitly computed (see Campello et al., 2013).\n#' @param verbose report progress.\n#' @param ...  additional arguments are passed on.\n#' @param scale integer; used to scale condensed tree based on the graphics\n#' device. Lower scale results in wider colored tree lines.\n#' The default `'suggest'` sets scale to the number of clusters.\n#' @param gradient character vector; the colors to build the condensed tree\n#' coloring with.\n#' @param show_flat logical; whether to draw boxes indicating the most stable\n#' clusters.\n#' @param coredist numeric vector with precomputed core distances (optional).\n#'\n#' @return `hdbscan()` returns an object of class `hdbscan` with the following components:\n#' \\item{cluster }{An integer vector with cluster assignments. Zero indicates\n#' noise points.}\n#' \\item{minPts }{ value of the `minPts` parameter.}\n#' \\item{cluster_scores }{The sum of the stability scores for each salient\n#' (flat) cluster. 
Corresponds to cluster IDs given in the `\"cluster\"` element.\n#' }\n#' \\item{membership_prob }{The probability or individual stability of a\n#' point within its cluster. Between 0 and 1.}\n#' \\item{outlier_scores }{The GLOSH outlier score of each point. }\n#' \\item{hc }{An [hclust] object of the HDBSCAN hierarchy. }\n#'\n#' `coredist()` returns a vector with the core distance for each data point.\n#'\n#' `mrdist()` returns a [dist] object containing pairwise mutual reachability distances.\n#'\n#' @author Matt Piekenbrock\n#' @author Claudia Malzer (added cluster_selection_epsilon)\n#'\n#' @references\n#' Campello RJGB, Moulavi D, Sander J (2013). Density-Based Clustering Based on\n#' Hierarchical Density Estimates. Proceedings of the 17th Pacific-Asia\n#' Conference on Knowledge Discovery in Databases, PAKDD 2013, _Lecture Notes\n#' in Computer Science_ 7819, p. 160.\n#' \\doi{10.1007/978-3-642-37456-2_14}\n#'\n#' Campello RJGB, Moulavi D, Zimek A, Sander J (2015). Hierarchical density\n#' estimates for data clustering, visualization, and outlier detection.\n#' _ACM Transactions on Knowledge Discovery from Data (TKDD),_ 10(5):1-51.\n#' \\doi{10.1145/2733381}\n#'\n#' Malzer, C., & Baum, M. (2020). A Hybrid Approach To Hierarchical\n#' Density-based Cluster Selection.\n#' In 2020 IEEE International Conference on Multisensor Fusion\n#' and Integration for Intelligent Systems (MFI), pp. 
223-228.\n#' \\doi{10.1109/MFI49285.2020.9235263}\n#' @keywords model clustering hierarchical\n#' @examples\n#' ## cluster the moons data set with HDBSCAN\n#' data(moons)\n#'\n#' res <- hdbscan(moons, minPts = 5)\n#' res\n#'\n#' plot(res)\n#' clplot(moons, res)\n#'\n#' ## cluster the moons data set with HDBSCAN using Manhattan distances\n#' res <- hdbscan(dist(moons, method = \"manhattan\"), minPts = 5)\n#' plot(res)\n#' clplot(moons, res)\n#'\n#' ## Example for HDBSCAN(e) using cluster_selection_epsilon\n#' # data with clusters of various densities.\n#' X <- data.frame(\n#'  x = c(\n#'   0.08, 0.46, 0.46, 2.95, 3.50, 1.49, 6.89, 6.87, 0.21, 0.15,\n#'   0.15, 0.39, 0.80, 0.80, 0.37, 3.63, 0.35, 0.30, 0.64, 0.59, 1.20, 1.22,\n#'   1.42, 0.95, 2.70, 6.36, 6.36, 6.36, 6.60, 0.04, 0.71, 0.57, 0.24, 0.24,\n#'   0.04, 0.04, 1.35, 0.82, 1.04, 0.62, 0.26, 5.98, 1.67, 1.67, 0.48, 0.15,\n#'   6.67, 6.67, 1.20, 0.21, 3.99, 0.12, 0.19, 0.15, 6.96, 0.26, 0.08, 0.30,\n#'   1.04, 1.04, 1.04, 0.62, 0.04, 0.04, 0.04, 0.82, 0.82, 1.29, 1.35, 0.46,\n#'   0.46, 0.04, 0.04, 5.98, 5.98, 6.87, 0.37, 6.47, 6.47, 6.47, 6.67, 0.30,\n#'   1.49, 3.21, 3.21, 0.75, 0.75, 0.46, 0.46, 0.46, 0.46, 3.63, 0.39, 3.65,\n#'   4.09, 4.01, 3.36, 1.43, 3.28, 5.94, 6.35, 6.87, 5.60, 5.99, 0.12, 0.00,\n#'   0.32, 0.39, 0.00, 1.63, 1.36, 5.67, 5.60, 5.79, 1.10, 2.99, 0.39, 0.18\n#'   ),\n#'  y = c(\n#'   7.41, 8.01, 8.01, 5.44, 7.11, 7.13, 1.83, 1.83, 8.22, 8.08,\n#'   8.08, 7.20, 7.83, 7.83, 8.29, 5.99, 8.32, 8.22, 7.38, 7.69, 8.22, 7.31,\n#'   8.25, 8.39, 6.34, 0.16, 0.16, 0.16, 1.66, 7.55, 7.90, 8.18, 8.32, 8.32,\n#'   7.97, 7.97, 8.15, 8.43, 7.83, 8.32, 8.29, 1.03, 7.27, 7.27, 8.08, 7.27,\n#'   0.79, 0.79, 8.22, 7.73, 6.62, 7.62, 8.39, 8.36, 1.73, 8.29, 8.04, 8.22,\n#'   7.83, 7.83, 7.83, 8.32, 8.11, 7.69, 7.55, 7.20, 7.20, 8.01, 8.15, 7.55,\n#'   7.55, 7.97, 7.97, 1.03, 1.03, 1.24, 7.20, 0.47, 0.47, 0.47, 0.79, 8.22,\n#'   7.13, 6.48, 6.48, 7.10, 7.10, 8.01, 8.01, 8.01, 8.01, 5.99, 8.04, 5.22,\n#'   
5.82, 5.14, 4.81, 7.62, 5.73, 0.55, 1.31, 0.05, 0.95, 1.59, 7.99, 7.48,\n#'   8.38, 7.12, 2.01, 1.40, 0.00, 9.69, 9.47, 9.25, 2.63, 6.89, 0.56, 3.11\n#'  )\n#' )\n#'\n#' ## HDBSCAN splits one cluster\n#' hdb <- hdbscan(X, minPts = 3)\n#' plot(hdb, show_flat = TRUE)\n#' hullplot(X, hdb, main = \"HDBSCAN\")\n#'\n#' ## DBSCAN* marks the least dense cluster as outliers\n#' db <- dbscan(X, eps = 1, minPts = 3, borderPoints = FALSE)\n#' hullplot(X, db, main = \"DBSCAN*\")\n#'\n#' ## HDBSCAN(e) mixes HDBSCAN AND DBSCAN* to find all clusters\n#' hdbe <- hdbscan(X, minPts = 3, cluster_selection_epsilon = 1)\n#' plot(hdbe, show_flat = TRUE)\n#' hullplot(X, hdbe, main = \"HDBSCAN(e)\")\n#' @export\nhdbscan <- function(x,\n                    minPts,\n                    cluster_selection_epsilon = 0.0,\n                    gen_hdbscan_tree = FALSE,\n                    gen_simplified_tree = FALSE,\n                    verbose = FALSE) {\n  if (!inherits(x, \"dist\") && !.matrixlike(x)) {\n    stop(\"hdbscan expects a numeric matrix or a dist object.\")\n  }\n\n  ## 1. Calculate the mutual reachability between points\n  if (verbose) {\n    cat(\"Calculating core distances...\\n\")\n  }\n  coredist <- coredist(x, minPts)\n\n\n  if (verbose) {\n    cat(\"Calculating the mutual reachability matrix distances...\\n\")\n  }\n  mrd <- mrdist(x, minPts, coredist = coredist)\n  n <- attr(mrd, \"Size\")\n\n  ## 2. Construct a minimum spanning tree and convert to RSL representation\n  if (verbose) {\n    cat(\"Constructing the minimum spanning tree...\\n\")\n  }\n  mst <- mst(mrd, n)\n  hc <- hclustMergeOrder(mst, order(mst[, 3]))\n  hc$call <- match.call()\n\n  ## 3. 
Prune the tree\n  ## Process the hierarchy to retrieve all the necessary info needed by HDBSCAN\n  if (verbose) {\n    cat(\"Tree pruning...\\n\")\n  }\n  res <- computeStability(hc, minPts, compute_glosh = TRUE)\n  res <- extractUnsupervised(res, cluster_selection_epsilon = cluster_selection_epsilon)\n  cl <- attr(res, \"cluster\")\n\n  ## 4. Extract the clusters\n  if (verbose) {\n    cat(\"Extract clusters...\\n\")\n  }\n  sl <- attr(res, \"salient_clusters\")\n\n  ## Generate membership 'probabilities' using core distance as the measure of density\n  prob <- rep(0, length(cl))\n  for (cid in sl) {\n    max_f <- max(coredist[which(cl == cid)])\n    pr <- (max_f - coredist[which(cl == cid)]) / max_f\n    prob[cl == cid] <- pr\n  }\n\n  ## Match cluster assignments to be incremental, with 0 representing noise\n  if (any(cl == 0)) {\n    cluster <- match(cl, c(0, sl)) - 1\n  } else {\n    cluster <- match(cl, sl)\n  }\n  cl_map <-\n    structure(sl, names = unique(cluster[hc$order][cluster[hc$order] != 0]))\n\n  ## Stability scores\n  ## NOTE: These scores represent the stability scores -before- the hierarchy traversal\n  cluster_scores <-\n    vapply(sl, function(sl_cid) {\n      res[[as.character(sl_cid)]]$stability\n    }, numeric(1L))\n  names(cluster_scores) <- names(cl_map)\n\n  ## Return everything HDBSCAN does\n  attr(res, \"cl_map\") <-\n    cl_map # Mapping of hierarchical IDS to 'normalized' incremental ids\n  out <- structure(\n    list(\n      cluster = cluster,\n      minPts = minPts,\n      coredist = coredist,\n      cluster_scores = cluster_scores,\n      # (Cluster-wide cumulative) Stability Scores\n      membership_prob = prob,\n      # Individual point membership probabilities\n      outlier_scores = attr(res, \"glosh\"),\n      # Outlier Scores\n      hc = hc # Hclust object of MST (can be cut for quick assignments)\n    ),\n    class = \"hdbscan\",\n    hdbscan = res\n  ) # hdbscan attributes contains actual HDBSCAN hierarchy\n\n  ## The trees 
don't need to be explicitly computed, but they may be useful if the user wants them\n  if (gen_hdbscan_tree) {\n    out$hdbscan_tree <- buildDendrogram(hc)\n  }\n  if (gen_simplified_tree) {\n    out$simplified_tree <- simplifiedTree(res)\n  }\n  return(out)\n}\n\n#' @rdname hdbscan\n#' @export\nprint.hdbscan <- function(x, ...) {\n  writeLines(c(\n    paste0(\"HDBSCAN clustering for \", nobs(x), \" objects.\"),\n    paste0(\"Parameters: minPts = \", x$minPts),\n    paste0(\n      \"The clustering contains \",\n      ncluster(x),\n      \" cluster(s) and \",\n      nnoise(x),\n      \" noise points.\"\n    )\n  ))\n\n  print(table(x$cluster))\n  cat(\"\\n\")\n  writeLines(strwrap(paste0(\"Available fields: \", toString(names(\n    x\n  ))), exdent = 18))\n}\n\n#' @rdname hdbscan\n#' @param leaflab a string specifying how leaves are labeled (see [stats::plot.dendrogram()]).\n#' @param ylab the label for the y axis.\n#' @param main Title of the plot.\n#' @export\nplot.hdbscan <-\n  function(x,\n           scale = \"suggest\",\n           gradient = c(\"yellow\", \"red\"),\n           show_flat = FALSE,\n           main = \"HDBSCAN*\",\n           ylab = \"eps value\",\n           leaflab = \"none\",\n           ...) 
{\n    ## Logic checks\n    if (!(scale == \"suggest\" ||\n          scale > 0)) {\n      stop(\"scale parameter must be greater than 0.\")\n    }\n\n    ## Main information needed\n    hd_info <- attr(x, \"hdbscan\")\n    dend <- x$simplified_tree %||% simplifiedTree(hd_info)\n    coords <-\n      node_xy(hd_info, cl_hierarchy = attr(hd_info, \"cl_hierarchy\"))\n\n    ## Variables to help setup the scaling of the plotting\n    nclusters <- length(hd_info)\n    npoints <- length(x$cluster)\n    nleaves <-\n      length(all_children(\n        attr(hd_info, \"cl_hierarchy\"),\n        key = 0,\n        leaves_only = TRUE\n      ))\n\n    scale <- ifelse(scale == \"suggest\", nclusters, nclusters / scale)\n\n    ## Color variables\n    col_breaks <- seq(0, length(x$cluster) + nclusters, by = nclusters)\n    gcolors <- grDevices::colorRampPalette(gradient)(length(col_breaks))\n\n    ## Depth-first search to recursively plot rectangles\n    eps_dfs <- function(dend, index, parent_height, scale) {\n      coord <- coords[index, ]\n      cl_key <- as.character(attr(dend, \"label\"))\n\n      ## widths == number of points in the cluster at each eps it was alive\n      widths <-\n        vapply(sort(hd_info[[cl_key]]$eps, decreasing = TRUE), function(eps) {\n          sum(hd_info[[cl_key]]$eps <= eps)\n        }, numeric(1L))\n      if (length(widths) > 0) {\n        widths <- c(widths + hd_info[[cl_key]]$n_children,\n                    rep(hd_info[[cl_key]]$n_children, hd_info[[cl_key]]$n_children))\n      } else {\n        widths <-\n          rep(hd_info[[cl_key]]$n_children, hd_info[[cl_key]]$n_children)\n      }\n\n      ## Normalize and scale widths to length of x-axis\n      normalize <- function(x) {\n        (nleaves) * (x - 1) / (npoints - 1)\n      }\n      xleft <- coord[[1]] - normalize(widths) / scale\n      xright <- coord[[1]] + normalize(widths) / scale\n\n      ## Top is always parent height, bottom is when the points died\n      ## Minor adjustment made 
if at the root equivalent to plot.dendrogram(edge.root=T)\n      if (cl_key == \"0\") {\n        ytop <-\n          rep(hd_info[[cl_key]]$eps_birth + 0.0625 * hd_info[[cl_key]]$eps_birth,\n              length(widths))\n        ybottom <- rep(hd_info[[cl_key]]$eps_death, length(widths))\n      } else {\n        ytop <- rep(parent_height, length(widths))\n        ybottom <-\n          c(\n            sort(hd_info[[cl_key]]$eps, decreasing = TRUE),\n            rep(hd_info[[cl_key]]$eps_death, hd_info[[cl_key]]$n_children)\n          )\n      }\n\n      ## Draw the rectangles\n      rect_color <-\n        gcolors[.bincode(length(widths), breaks = col_breaks)]\n      graphics::rect(\n        xleft = xleft,\n        xright = xright,\n        ybottom = ybottom,\n        ytop = ytop,\n        col = rect_color,\n        border = NA,\n        lwd = 0\n      )\n\n      ## Highlight the most 'stable' clusters returned by the default flat cluster extraction\n      if (show_flat) {\n        salient_cl <- attr(hd_info, \"salient_clusters\")\n        if (as.integer(attr(dend, \"label\")) %in% salient_cl) {\n          x_adjust <-\n            (max(xright) - min(xleft)) * 0.10 # 10% left/right border\n          y_adjust <-\n            (max(ytop) - min(ybottom)) * 0.025 # 2.5% above/below border\n          graphics::rect(\n            xleft = min(xleft) - x_adjust,\n            xright = max(xright) + x_adjust,\n            ybottom = min(ybottom) - y_adjust,\n            ytop = max(ytop) + y_adjust,\n            border = \"red\",\n            lwd = 1\n          )\n          n_label <-\n            names(which(attr(hd_info, \"cl_map\") == attr(dend, \"label\")))\n          text(\n            x = coord[[1]],\n            y = min(ybottom),\n            pos = 1,\n            labels = n_label\n          )\n        }\n      }\n\n      ## Recurse in depth-first-manner\n      if (is.leaf(dend)) {\n        return(index)\n      } else {\n        left <-\n          eps_dfs(\n            
dend[[1]],\n            index = index + 1,\n            parent_height = attr(dend, \"height\"),\n            scale = scale\n          )\n        right <-\n          eps_dfs(\n            dend[[2]],\n            index = left + 1,\n            parent_height = attr(dend, \"height\"),\n            scale = scale\n          )\n        return(right)\n      }\n    }\n\n    ## Run the recursive plotting\n    plot(\n      dend,\n      edge.root = TRUE,\n      main = main,\n      ylab = ylab,\n      leaflab = leaflab,\n      ...\n    )\n    eps_dfs(dend,\n            index = 1,\n            parent_height = 0,\n            scale = scale)\n    return(invisible(x))\n  }\n\n#' @rdname hdbscan\n#' @export\ncoredist <- function(x, minPts)\n  kNNdist(x, k = minPts - 1)\n\n#' @rdname hdbscan\n#' @export\nmrdist <- function(x, minPts, coredist = NULL) {\n  if (inherits(x, \"dist\")) {\n    .check_dist(x)\n    x_dist <- x\n  } else {\n    x_dist <- dist(x,\n                   method = \"euclidean\",\n                   diag = FALSE,\n                   upper = FALSE)\n  }\n\n  if (is.null(coredist)) {\n    coredist <- coredist(x, minPts)\n  }\n\n  # mr_dist <- as.vector(pmax(as.dist(outer(coredist, coredist, pmax)), x_dist))\n  # much faster in C++\n  mr_dist <- mrd(x_dist, coredist)\n  class(mr_dist) <- \"dist\"\n  attr(mr_dist, \"Size\") <- attr(x_dist, \"Size\")\n  attr(mr_dist, \"Diag\") <- FALSE\n  attr(mr_dist, \"Upper\") <- FALSE\n  attr(mr_dist, \"method\") <- paste0(\"mutual reachability (\", attr(x_dist, \"method\"), \")\")\n  mr_dist\n}\n"
  },
  {
    "path": "R/hullplot.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Plot Clusters\n#'\n#' This function produces a two-dimensional scatter plot of data points\n#' and colors the data points according to a supplied clustering. Noise points\n#' are marked as `x`. `hullplot()` also adds convex hulls to clusters.\n#'\n#' @name hullplot\n#' @aliases hullplot clplot\n#'\n#' @param x a data matrix. If more than 2 columns are provided, then the data\n#' is plotted using the first two principal components.\n#' @param cl a clustering. Either a numeric cluster assignment vector or a\n#' clustering object (a list with an element named `cluster`).\n#' @param col colors used for clusters. Defaults to the standard palette.  The\n#' first color (default is black) is used for noise/unassigned points (cluster\n#' id 0).\n#' @param pch a vector of plotting characters. 
By default `o` is used for\n#'   points and `x` for noise points.\n#' @param cex expansion factor for symbols.\n#' @param hull_lwd,hull_lty line width and line type used for the convex hull.\n#' @param main main title.\n#' @param solid,alpha draw filled polygons instead of just lines for the convex\n#' hulls? alpha controls the level of alpha shading.\n#' @param ...  additional arguments passed on to plot.\n#' @author Michael Hahsler\n#' @keywords plot clustering\n#' @examples\n#' set.seed(2)\n#' n <- 400\n#'\n#' x <- cbind(\n#'   x = runif(4, 0, 1) + rnorm(n, sd = 0.1),\n#'   y = runif(4, 0, 1) + rnorm(n, sd = 0.1)\n#'   )\n#' cl <- rep(1:4, times = 100)\n#'\n#'\n#' ### original data with true clustering\n#' clplot(x, cl, main = \"True clusters\")\n#' hullplot(x, cl, main = \"True clusters\")\n#' ### use different symbols\n#' hullplot(x, cl, main = \"True clusters\", pch = cl)\n#' ### just the hulls\n#' hullplot(x, cl, main = \"True clusters\", pch = NA)\n#' ### a version suitable for b/w printing\n#' hullplot(x, cl, main = \"True clusters\", solid = FALSE,\n#'   col = c(\"grey\", \"black\"), pch = cl)\n#'\n#'\n#' ### run some clustering algorithms and plot the results\n#' db <- dbscan(x, eps = .07, minPts = 10)\n#' clplot(x, db, main = \"DBSCAN\")\n#' hullplot(x, db, main = \"DBSCAN\")\n#'\n#' op <- optics(x, eps = 10, minPts = 10)\n#' opDBSCAN <- extractDBSCAN(op, eps_cl = .07)\n#' hullplot(x, opDBSCAN, main = \"OPTICS\")\n#'\n#' opXi <- extractXi(op, xi = 0.05)\n#' hullplot(x, opXi, main = \"OPTICSXi\")\n#'\n#' # Extract minimal 'flat' clusters only\n#' opXi <- extractXi(op, xi = 0.05, minimum = TRUE)\n#' hullplot(x, opXi, main = \"OPTICSXi\")\n#'\n#' km <- kmeans(x, centers = 4)\n#' hullplot(x, km, main = \"k-means\")\n#'\n#' hc <- cutree(hclust(dist(x)), k = 4)\n#' hullplot(x, hc, main = \"Hierarchical Clustering\")\n#' @export\nhullplot <- function(x,\n  cl,\n  col = NULL,\n  pch = NULL,\n  cex = 0.5,\n  hull_lwd = 1,\n  hull_lty = 1,\n  solid = TRUE,\n  
alpha = .2,\n  main = \"Convex Cluster Hulls\",\n  ...) {\n  ### handle d>2 by using PCA\n  if (ncol(x) > 2)\n    x <- prcomp(x)$x\n\n  ### extract clustering (keep hierarchical OPTICSXi structure)\n  if (inherits(cl, \"optics\") || \"clusters_xi\" %in% names(cl)) {\n    clusters_xi <- cl$clusters_xi\n    cl_order <- cl$order\n  } else\n    clusters_xi <- NULL\n\n  if (is.list(cl))\n    cl <- cl$cluster\n  if (!is.numeric(cl))\n    stop(\"Could not get cluster assignment vector from cl.\")\n\n  #if(is.null(col)) col <- c(\"#000000FF\", rainbow(n=max(cl)))\n  if (is.null(col))\n    col <- palette()\n\n  # Note: We use the first color for noise points\n  if (length(col) == 1L)\n    col <- c(col, col)\n  col_noise <- col[1]\n  col <- col[-1]\n\n\n  if (max(cl) > length(col)) {\n    warning(\"Not enough colors. Some colors will be reused.\")\n    col <- rep(col, length.out = max(cl))\n  }\n\n  # mark noise points\n  pch <- pch %||% ifelse(cl == 0L, 4L, 1L)\n\n  plot(x[, 1:2],\n    col = c(col_noise, col)[cl + 1L],\n    pch = pch,\n    cex = cex,\n    main = main,\n    ...)\n\n  col_poly <- adjustcolor(col, alpha.f = alpha)\n  border <- col\n\n  ## no border?\n  if (is.null(hull_lwd) || is.na(hull_lwd) || hull_lwd == 0) {\n    hull_lwd <- 1\n    border <- NA\n  }\n\n  ## cl was replaced by the plain cluster vector above, so check clusters_xi directly\n  if (!is.null(clusters_xi)) {\n    ## This is necessary for larger datasets: Ensure largest is plotted first\n    clusters_xi <-\n      clusters_xi[order(-(clusters_xi$end - clusters_xi$start)), ] # Order by size (descending)\n    ci_order <- clusters_xi$cluster_id\n  } else {\n    ci_order <- 1:max(cl)\n  }\n\n  for (i in seq_along(ci_order)) {\n    ### use all the points for OPTICSXi's hierarchical structure\n    if (is.null(clusters_xi)) {\n      d <- x[cl == i, , drop = FALSE]\n    } else {\n      d <-\n        x[cl_order[clusters_xi$start[i]:clusters_xi$end[i]], , drop = FALSE]\n    }\n\n    ch <- chull(d)\n    ch <- c(ch, ch[1])\n    if (!solid) {\n      lines(d[ch, ],\n   
         col = border[ci_order[i]],\n            lwd = hull_lwd,\n            lty = hull_lty)\n    } else {\n      polygon(\n        d[ch, ],\n        col = col_poly[ci_order[i]],\n        lwd = hull_lwd,\n        lty = hull_lty,\n        border = border[ci_order[i]]\n      )\n    }\n  }\n}\n\n#' @rdname hullplot\n#' @export\nclplot <- function(x,\n                   cl,\n                   col = NULL,\n                   pch = NULL,\n                   cex = 0.5,\n                   main = \"Cluster Plot\",\n                   ...)\n  hullplot(x, cl = cl, col = col, pch = pch, cex = cex, main = main,\n          solid = FALSE, hull_lwd = NA)\n"
  },
  {
    "path": "R/jpclust.R",
"content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2017 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Jarvis-Patrick Clustering\n#'\n#' Fast C++ implementation of the Jarvis-Patrick clustering which first builds\n#' a shared nearest neighbor graph (k nearest neighbor sparsification) and then\n#' places two points in the same cluster if they are in each other's nearest\n#' neighbor list and they share at least kt nearest neighbors.\n#'\n#' Following the original paper, the shared nearest neighbor list is\n#' constructed as the k neighbors plus the point itself (as neighbor zero).\n#' Therefore, the threshold `kt` needs to be in the range \\eqn{[1, k]}.\n#'\n#' Fast nearest neighbor search with [kNN()] is only used if `x` is\n#' a matrix. In this case Euclidean distance is used.\n#'\n#' @aliases jpclust print.general_clustering\n#' @family clustering functions\n#'\n#' @param x a data matrix/data.frame (Euclidean distance is used), a\n#' precomputed [dist] object or a kNN object created with [kNN()].\n#' @param k Neighborhood size for nearest neighbor sparsification. 
If `x`\n#' is a kNN object then `k` may be missing.\n#' @param kt threshold on the number of shared nearest neighbors (including the\n#' points themselves) to form clusters. Range: \\eqn{[1, k]}\n#' @param ...  additional arguments are passed on to the k nearest neighbor\n#' search algorithm. See [kNN()] for details on how to control the\n#' search strategy.\n#'\n#' @return An object of class `general_clustering` with the following\n#' components:\n#' \\item{cluster }{An integer vector with cluster assignments. Zero\n#' indicates noise points.}\n#' \\item{type }{ name of the used clustering algorithm.}\n#' \\item{metric }{ the distance metric used for clustering.}\n#' \\item{param }{ list of the used clustering parameters. }\n#'\n#' @author Michael Hahsler\n#' @references R. A. Jarvis and E. A. Patrick. 1973. Clustering Using a\n#' Similarity Measure Based on Shared Near Neighbors. _IEEE Trans. Comput.\n#' 22,_ 11 (November 1973), 1025-1034.\n#' \\doi{10.1109/T-C.1973.223640}\n#' @keywords model clustering\n#' @examples\n#' data(\"DS3\")\n#'\n#' # use a shared neighborhood of 20 points and require 12 shared neighbors\n#' cl <- jpclust(DS3, k = 20, kt = 12)\n#' cl\n#'\n#' clplot(DS3, cl)\n#' # Note: JP clustering does not consider noise and thus,\n#' # the sine wave points chain clusters together.\n#'\n#' # use a precomputed kNN object instead of the original data.\n#' nn <- kNN(DS3, k = 30)\n#' nn\n#'\n#' cl <- jpclust(nn, k = 20, kt = 12)\n#' cl\n#'\n#' # cluster with noise removed (use low pointdensity to identify noise)\n#' d <- pointdensity(DS3, eps = 25)\n#' hist(d, breaks = 20)\n#' DS3_noiseless <- DS3[d > 110,]\n#'\n#' cl <- jpclust(DS3_noiseless, k = 20, kt = 10)\n#' cl\n#'\n#' clplot(DS3_noiseless, cl)\n#' @export\njpclust <- function(x, k, kt, ...) 
{\n  # Create NN graph\n  if (missing(k) && inherits(x, \"kNN\"))\n      k <- x$k\n  if (length(kt) != 1 || kt < 1 || kt > k)\n    stop(\"kt needs to be a threshold in range [1, k].\")\n\n  nn <- kNN(x, k, sort = FALSE, ...)\n\n  # Perform clustering\n  cl <- JP_int(nn$id, kt = as.integer(kt))\n\n  structure(\n    list(\n      cluster = as.integer(factor(cl)),\n      type = \"Jarvis-Patrick clustering\",\n      metric = nn$metric,\n      param = list(k = k, kt = kt)\n    ),\n    class = c(\"general_clustering\")\n  )\n}\n\n#' @export\nprint.general_clustering <- function(x, ...) {\n  cl <- unique(x$cluster)\n  cl <- length(cl[cl != 0L])\n\n  writeLines(c(\n    paste0(x$type, \" for \", length(x$cluster), \" objects.\"),\n    paste0(\"Parameters: \",\n      paste(\n        names(x$param),\n        unlist(x$param, use.names = FALSE),\n        sep = \" = \",\n        collapse = \", \"\n      )),\n    paste0(\n      \"The clustering contains \",\n      cl,\n      \" cluster(s) and \",\n      sum(x$cluster == 0L),\n      \" noise points.\"\n    )\n  ))\n\n  print(table(x$cluster))\n  cat(\"\\n\")\n\n  writeLines(strwrap(paste0(\n    \"Available fields: \",\n    toString(names(x))\n  ), exdent = 18))\n}\n"
  },
  {
    "path": "R/kNN.R",
"content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n\n#' Find the k Nearest Neighbors\n#'\n#' This function uses a kd-tree to find all k nearest neighbors in a data\n#' matrix (including distances) fast.\n#'\n#' **Ties:** If the kth and the (k+1)th nearest neighbor are tied, then the\n#' neighbor found first is returned and the other one is ignored.\n#'\n#' **Self-matches:** If no query is specified, then self-matches are\n#' removed.\n#'\n#' Details on the search parameters:\n#'\n#' * `search` controls if\n#' a kd-tree or linear search is used (both implemented in the ANN library; see Mount\n#' and Arya, 2010). Note that these implementations cannot handle NAs.\n#' `search = \"dist\"` precomputes Euclidean distances first using R. NAs are\n#' handled, but the resulting distance matrix cannot contain NAs. To use other\n#' distance measures, a precomputed distance matrix can be provided as `x`\n#' (`search` is ignored).\n#'\n#' * `bucketSize` and `splitRule` influence how the kd-tree is\n#' built. `approx` uses the approximate nearest neighbor search\n#' implemented in ANN. 
All nearest neighbors up to a distance of\n#' `eps / (1 + approx)` will be considered and all with a distance\n#' greater than `eps` will not be considered. The other points might be\n#' considered. Note that this results in some actual nearest neighbors being\n#' omitted leading to spurious clusters and noise points. However, the\n#' algorithm will enjoy a significant speedup. For more details see Mount and\n#' Arya (2010).\n#'\n#' @aliases kNN knn\n#' @family NN functions\n#'\n#' @param x a data matrix, a [dist] object or a [kNN] object.\n#' @param k number of neighbors to find.\n#' @param query a data matrix with the points to query. If query is not\n#' specified, the NN for all the points in `x` is returned. If query is\n#' specified then `x` needs to be a data matrix.\n#' @param search nearest neighbor search strategy (one of `\"kdtree\"`, `\"linear\"` or\n#' `\"dist\"`).\n#' @param sort sort the neighbors by distance? Note that some search methods\n#' already sort the results. Sorting is expensive and `sort = FALSE` may\n#' be much faster for some search methods. kNN objects can be sorted using\n#' `sort()`.\n#' @param bucketSize max size of the kd-tree leafs.\n#' @param splitRule rule to split the kd-tree. One of `\"STD\"`, `\"MIDPT\"`, `\"FAIR\"`,\n#' `\"SL_MIDPT\"`, `\"SL_FAIR\"` or `\"SUGGEST\"` (SL stands for sliding). `\"SUGGEST\"` uses\n#' ANNs best guess.\n#' @param approx use approximate nearest neighbors. All NN up to a distance of\n#' a factor of `1 + approx` eps may be used. Some actual NN may be omitted\n#' leading to spurious clusters and noise points.  However, the algorithm will\n#' enjoy a significant speedup.\n#' @param decreasing sort in decreasing order?\n#' @param ... further arguments\n#'\n#' @return An object of class `kNN` (subclass of [NN]) containing a\n#' list with the following components:\n#' \\item{dist }{a matrix with distances. }\n#' \\item{id }{a matrix with `ids`. }\n#' \\item{k }{number `k` used. 
}\n#' \\item{metric }{ used distance metric. }\n#'\n#' @author Michael Hahsler\n#' @references David M. Mount and Sunil Arya (2010). ANN: A Library for\n#' Approximate Nearest Neighbor Searching,\n#' \\url{http://www.cs.umd.edu/~mount/ANN/}.\n#' @keywords model\n#' @examples\n#' data(iris)\n#' x <- iris[, -5]\n#'\n#' # Example 1: finding kNN for all points in a data matrix (using a kd-tree)\n#' nn <- kNN(x, k = 5)\n#' nn\n#'\n#' # explore neighborhood of point 10\n#' i <- 10\n#' nn$id[i,]\n#' plot(x, col = ifelse(seq_len(nrow(iris)) %in% nn$id[i,], \"red\", \"black\"))\n#'\n#' # visualize the 5 nearest neighbors\n#' plot(nn, x)\n#'\n#' # visualize a reduced 2-NN graph\n#' plot(kNN(nn, k = 2), x)\n#'\n#' # Example 2: find kNN for query points\n#' q <- x[c(1,100),]\n#' nn <- kNN(x, k = 10, query = q)\n#'\n#' plot(nn, x, col = \"grey\")\n#' points(q, pch = 3, lwd = 2)\n#'\n#' # Example 3: find kNN using distances\n#' d <- dist(x, method = \"manhattan\")\n#' nn <- kNN(d, k = 1)\n#' plot(nn, x)\n#' @export\nkNN <-\n  function(x,\n    k,\n    query = NULL,\n    sort = TRUE,\n    search = \"kdtree\",\n    bucketSize = 10,\n    splitRule = \"suggest\",\n    approx = 0) {\n    if (inherits(x, \"kNN\")) {\n      if (x$k < k)\n        stop(\"kNN in x does not have enough nearest neighbors.\")\n      if (!x$sort)\n        x <- sort(x)\n      x$id <- x$id[, 1:k]\n      if (!is.null(x$dist))\n        x$dist <- x$dist[, 1:k]\n      if (!is.null(x$shared))\n        x$shared <- x$shared[, 1:k]\n      x$k <- k\n      return(x)\n    }\n\n    search <- .parse_search(search)\n    splitRule <- .parse_splitRule(splitRule)\n\n    k <- as.integer(k)\n    if (k < 1)\n      stop(\"Illegal k: needs to be k>=1!\")\n\n    ### dist search\n    if (search == 3 && !inherits(x, \"dist\")) {\n      if (.matrixlike(x))\n        x <- dist(x)\n      else\n        stop(\"x needs to be a matrix to calculate distances\")\n    }\n\n    ### get kNN from a dist object\n    if (inherits(x, \"dist\")) {\n      if 
(!is.null(query))\n        stop(\"query can only be used if x contains a data matrix.\")\n\n      if (anyNA(x))\n        stop(\"distances cannot be NAs for kNN!\")\n\n      return(dist_to_kNN(x, k = k))\n    }\n\n    ## make sure x is numeric\n    if (!.matrixlike(x))\n      stop(\"x needs to be a matrix to calculate distances\")\n    x <- as.matrix(x)\n    if (storage.mode(x) == \"integer\")\n      storage.mode(x) <- \"double\"\n    if (storage.mode(x) != \"double\")\n      stop(\"x has to be a numeric matrix.\")\n\n    if (!is.null(query)) {\n      query <- as.matrix(query)\n      if (storage.mode(query) == \"integer\")\n        storage.mode(query) <- \"double\"\n      if (storage.mode(query) != \"double\")\n        stop(\"query has to be NULL or a numeric matrix.\")\n      if (ncol(x) != ncol(query))\n        stop(\"x and query need to have the same number of columns!\")\n    }\n\n    if (k >= nrow(x))\n      stop(\"Not enough neighbors in data set!\")\n\n\n    if (anyNA(x))\n      stop(\"data/distances cannot contain NAs for kNN (with kd-tree)!\")\n\n    ## returns NO self matches\n    if (!is.null(query)) {\n      ret <- kNN_query_int(\n        as.matrix(x),\n        as.matrix(query),\n        as.integer(k),\n        as.integer(search),\n        as.integer(bucketSize),\n        as.integer(splitRule),\n        as.double(approx)\n      )\n      dimnames(ret$dist) <- list(rownames(query), 1:k)\n      dimnames(ret$id) <- list(rownames(query), 1:k)\n    } else {\n      ret <- kNN_int(\n        as.matrix(x),\n        as.integer(k),\n        as.integer(search),\n        as.integer(bucketSize),\n        as.integer(splitRule),\n        as.double(approx)\n      )\n      dimnames(ret$dist) <- list(rownames(x), 1:k)\n      dimnames(ret$id) <- list(rownames(x), 1:k)\n    }\n\n    class(ret) <- c(\"kNN\", \"NN\")\n\n    ### ANN already returns them sorted (by dist but not by ID)\n    if (sort)\n      ret <- sort(ret)\n\n    ret$metric <- \"euclidean\"\n\n    ret\n  }\n\n# 
make sure we have a lower-triangle representation w/o diagonal\n.check_dist <- function(x) {\n  if (!inherits(x, \"dist\"))\n    stop(\"x needs to be a dist object\")\n\n  # cluster::dissimilarity does not have Diag or Upper attributes, but is a lower triangle\n  # representation\n  if (inherits(x, \"dissimilarity\"))\n    return(TRUE)\n\n  # check that dist objects have diag = FALSE, upper = FALSE\n  if (attr(x, \"Diag\") || attr(x, \"Upper\"))\n    stop(\"x needs to be a dist object with attributes Diag and Upper set to FALSE. Use as.dist(x, diag = FALSE, upper = FALSE) first.\")\n}\n\ndist_to_kNN <- function(x, k) {\n  .check_dist(x)\n\n  n <- attr(x, \"Size\")\n\n  id <- structure(integer(n * k), dim = c(n, k))\n  d <- matrix(NA_real_, nrow = n, ncol = k)\n\n  for (i in seq_len(n)) {\n    ### Inf -> no self-matches\n    y <- dist_row(x, i, self_val = Inf)\n    o <- order(y, decreasing = FALSE)\n    o <- o[seq_len(k)]\n    id[i, ] <- o\n    d[i, ] <- y[o]\n  }\n  dimnames(id) <- list(labels(x), seq_len(k))\n  dimnames(d) <- list(labels(x), seq_len(k))\n\n  ret <-\n    structure(list(\n      dist = d,\n      id = id,\n      k = k,\n      sort = TRUE,\n      metric = attr(x, \"method\")\n    ),\n      class = c(\"kNN\", \"NN\"))\n\n  return(ret)\n}\n\n#' @rdname kNN\n#' @export\nsort.kNN <- function(x, decreasing = FALSE, ...) {\n  if (isTRUE(x$sort))\n    return(x)\n  if (is.null(x$dist))\n    stop(\"Unable to sort. 
Distances are missing.\")\n  if (ncol(x$id) < 2) {\n    x$sort <- TRUE\n    return(x)\n  }\n\n  ## sort first by dist and break ties using id\n  o <- vapply(\n    seq_len(nrow(x$dist)),\n    function(i) order(x$dist[i, ], x$id[i, ], decreasing = decreasing),\n    integer(ncol(x$id))\n  )\n  for (i in seq_len(ncol(o))) {\n    x$dist[i, ] <- x$dist[i, ][o[, i]]\n    x$id[i, ] <- x$id[i, ][o[, i]]\n  }\n  x$sort <- TRUE\n\n  x\n}\n\n#' @rdname kNN\n#' @export\nadjacencylist.kNN <- function(x, ...)\n  lapply(\n    seq_len(nrow(x$id)),\n    FUN = function(i) {\n      ## filter NAs\n      tmp <- x$id[i, ]\n      tmp[!is.na(tmp)]\n    }\n  )\n\n#' @rdname kNN\n#' @export\nprint.kNN <- function(x, ...) {\n  cat(\"k-nearest neighbors for \",\n    nrow(x$id),\n    \" objects (k=\",\n    x$k,\n    \").\",\n    \"\\n\",\n    sep = \"\")\n  cat(\"Distance metric:\", x$metric, \"\\n\")\n  cat(\"\\nAvailable fields: \", toString(names(x)), \"\\n\", sep = \"\")\n}\n\n# Convert names to integers for C++\n.parse_search <- function(search) {\n  search <- pmatch(toupper(search), c(\"KDTREE\", \"LINEAR\", \"DIST\"))\n  if (is.na(search))\n    stop(\"Unknown NN search type!\")\n  search\n}\n\n.parse_splitRule <- function(splitRule) {\n  splitRule <- pmatch(toupper(splitRule), .ANNsplitRule) - 1L\n  if (is.na(splitRule))\n    stop(\"Unknown splitRule!\")\n  splitRule\n}\n"
  },
  {
    "path": "R/kNNdist.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Calculate and Plot k-Nearest Neighbor Distances\n#'\n#' Fast calculation of the k-nearest neighbor distances for a dataset\n#' represented as a matrix of points. The kNN distance is defined as the\n#' distance from a point to its k nearest neighbor. The kNN distance plot\n#' displays the kNN distance of all points sorted from smallest to largest. The\n#' plot can be used to help find suitable parameter values for [dbscan()].\n#'\n#' @family Outlier Detection Functions\n#' @family NN functions\n#'\n#' @param x the data set as a matrix of points (Euclidean distance is used) or\n#' a precalculated [dist] object.\n#' @param k number of nearest neighbors used for the distance calculation. For\n#' `kNNdistplot()` also a range of values for `k` or `minPts` can be specified.\n#' @param minPts to use a k-NN plot to determine a suitable `eps` value for [dbscan()],\n#'    `minPts` used in dbscan can be specified and will set `k = minPts - 1`.\n#' @param all should a matrix with the distances to all k nearest neighbors be\n#' returned?\n#' @param ... 
further arguments (e.g., kd-tree related parameters) are passed\n#' on to [kNN()].\n#'\n#' @return `kNNdist()` returns a numeric vector with the distance to its k\n#' nearest neighbor. If `all = TRUE` then a matrix with k columns\n#' containing the distances to all 1st, 2nd, ..., kth nearest neighbors is\n#' returned instead.\n#'\n#' @author Michael Hahsler\n#' @keywords model plot\n#' @examples\n#' data(iris)\n#' iris <- as.matrix(iris[, 1:4])\n#'\n#' ## Find the 4-NN distance for each observation (see ?kNN\n#' ## for different search strategies)\n#' kNNdist(iris, k = 4)\n#'\n#' ## Get a matrix with distances to the 1st, 2nd, ..., 4th NN.\n#' kNNdist(iris, k = 4, all = TRUE)\n#'\n#' ## Produce a k-NN distance plot to determine a suitable eps for\n#' ## DBSCAN with MinPts = 5. Use k = 4 (= MinPts - 1).\n#' ## The knee is visible around a distance of .7\n#' kNNdistplot(iris, k = 4)\n#'\n#' ## Look at all k-NN distance plots for a k of 1 to 20\n#' ## Note that k-NN distances are increasing in k\n#' kNNdistplot(iris, k = 1:20)\n#'\n#' cl <- dbscan(iris, eps = .7, minPts = 5)\n#' pairs(iris, col = cl$cluster + 1L)\n#' ## Note: black points are noise points\n#' @export\nkNNdist <- function(x, k, all = FALSE, ...) {\n  kNNd <- kNN(x, k, sort = TRUE, ...)$dist\n  if (!all)\n    kNNd <- kNNd[, k]\n  kNNd\n}\n\n#' @rdname kNNdist\n#' @export\nkNNdistplot <- function(x, k, minPts, ...) {\n  if (missing(k) && missing(minPts))\n    stop(\"k or minPts need to be specified.\")\n\n  if (missing(k))\n    k <- minPts - 1\n\n  if (length(k) == 1) {\n    kNNdist <- sort(kNNdist(x, k, ...))\n    plot(\n      kNNdist,\n      type = \"l\",\n      ylab = paste0(k, \"-NN distance\"),\n      xlab = \"Points sorted by distance\"\n    )\n  } else {\n    knnds <- vapply(k, function(i) sort(kNNdist(x, i, ...)), numeric(nrow(x)))\n\n    matplot(knnds, type = \"l\", lty = 1,\n            ylab = \"k-NN distance\",\n            xlab = \"Points sorted by distance\")\n  }\n}\n"
  },
  {
    "path": "R/moons.R",
"content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Moons Data\n#'\n#' Contains 100 2-d points, half of which are contained in two moons or\n#' \"blobs\" (25 points each blob), and the other half in asymmetric facing\n#' crescent shapes. The three shapes are all linearly separable.\n#'\n#' This data was generated with the following Python commands using the\n#' SciKit-Learn library:\n#'\n#' `> import sklearn.datasets as data`\n#'\n#' `> moons = data.make_moons(n_samples=50, noise=0.05)`\n#'\n#' `> blobs = data.make_blobs(n_samples=50, centers=[(-0.75,2.25), (1.0, 2.0)], cluster_std=0.25)`\n#'\n#' `> test_data = np.vstack([moons, blobs])`\n#'\n#' @name moons\n#' @docType data\n#' @format A data frame with 100 observations on the following 2 variables.\n#' \\describe{\n#' \\item{X}{a numeric vector}\n#' \\item{Y}{a numeric vector} }\n#' @references Pedregosa, Fabian, Gael Varoquaux, Alexandre Gramfort,\n#' Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel et al.\n#' Scikit-learn: Machine learning in Python. _Journal of Machine Learning\n#' Research_ 12, no. 
Oct (2011): 2825-2830.\n#' @source See the HDBSCAN notebook from github documentation:\n#' \\url{http://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html}\n#' @keywords datasets\n#' @examples\n#' data(moons)\n#' plot(moons, pch=20)\nNULL\n\n\n\n"
  },
  {
    "path": "R/ncluster.R",
"content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler, Matt Piekenbrock\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Number of Clusters, Noise Points, and Observations\n#'\n#' Extract the number of clusters or the number of noise points for\n#' a clustering. This function works with any clustering result that\n#' contains a list element named `cluster` with a clustering vector. In\n#' addition, `nobs` (see [stats::nobs()]) is also available to retrieve\n#' the number of clustered points.\n#'\n#' @name ncluster\n#' @aliases ncluster nnoise nobs\n#' @family clustering functions\n#'\n#' @param object a clustering result object containing a `cluster` element.\n#' @param ...  additional arguments are unused.\n#'\n#' @return returns the number of clusters or noise points.\n#' @examples\n#' data(iris)\n#' iris <- as.matrix(iris[, 1:4])\n#'\n#' res <- dbscan(iris, eps = .7, minPts = 5)\n#' res\n#'\n#' ncluster(res)\n#' nnoise(res)\n#' nobs(res)\n#'\n#' # the functions also work with kmeans and other clustering algorithms.\n#' cl <- kmeans(iris, centers = 3)\n#' ncluster(cl)\n#' nnoise(cl)\n#' nobs(cl)\n#' @export\nncluster <- function(object, ...) 
{\n  UseMethod(\"ncluster\")\n}\n\n#' @export\nncluster.default <- function(object, ...) {\n  if (!is.list(object) || !is.numeric(object$cluster))\n    stop(\"ncluster() requires a clustering object with a cluster component containing the cluster labels.\")\n\n  length(setdiff(unique(object$cluster), 0L))\n}\n\n#' @rdname ncluster\n#' @export\nnnoise <- function(object, ...) {\n  UseMethod(\"nnoise\")\n}\n\n#' @export\nnnoise.default <- function(object, ...) {\n  if (!is.list(object) || !is.numeric(object$cluster))\n    stop(\"nnoise() requires a clustering object with a cluster component containing the cluster labels.\")\n\n  sum(object$cluster == 0L)\n}\n"
  },
  {
    "path": "R/nobs.R",
    "content": "\n#' @importFrom stats nobs\n#' @export\nnobs.dbscan <- function(object, ...) length(object$cluster)\n\n#' @export\nnobs.hdbscan <- function(object, ...) length(object$cluster)\n\n#' @export\nnobs.general_clustering <- function(object, ...) length(object$cluster)\n\n"
  },
  {
    "path": "R/optics.R",
"content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Ordering Points to Identify the Clustering Structure (OPTICS)\n#'\n#' Implementation of the OPTICS (Ordering points to identify the clustering\n#' structure) point ordering algorithm using a kd-tree.\n#'\n#' **The algorithm**\n#'\n#' This implementation of OPTICS implements the original\n#' algorithm as described by Ankerst et al (1999). OPTICS is an ordering\n#' algorithm with methods to extract a clustering from the ordering.\n#' While using similar concepts as DBSCAN, for OPTICS `eps`\n#' is only an upper limit for the neighborhood size used to reduce\n#' computational complexity. Note that `minPts` in OPTICS has a different\n#' effect than in DBSCAN. It is used to define dense neighborhoods, but since\n#' `eps` is typically set rather high, this does not affect the ordering\n#' much. However, it is also used to calculate the reachability distance and\n#' larger values will make the reachability distance plot smoother.\n#'\n#' OPTICS linearly orders the data points such that points which are spatially\n#' closest become neighbors in the ordering. 
The closest analog to this\n#' ordering is the dendrogram in single-link hierarchical clustering. The algorithm\n#' also calculates the reachability distance for each point.\n#' `plot()` (see [reachability_plot])\n#' produces a reachability plot which shows each point's reachability distance\n#' with the points sorted in OPTICS order. Valleys represent clusters (the\n#' deeper the valley, the more dense the cluster) and high points indicate\n#' points between clusters.\n#'\n#' **Specifying the data**\n#'\n#' If `x` is specified as a data matrix, then Euclidean distances and fast\n#' nearest neighbor lookup using a kd-tree are used. See [kNN()] for\n#' details on the parameters for the kd-tree.\n#'\n#' **Extracting a clustering**\n#'\n#' Several methods to extract a clustering from the order returned by OPTICS are\n#' implemented:\n#'\n#' * `extractDBSCAN()` extracts a clustering from an OPTICS ordering that is\n#'   similar to what DBSCAN would produce with an eps set to `eps_cl` (see\n#'   Ankerst et al, 1999). The only difference to a DBSCAN clustering is that\n#'   OPTICS is not able to assign some border points and reports them instead as\n#'   noise.\n#'\n#' * `extractXi()` extracts clusters hierarchically as specified in Ankerst et al\n#'   (1999) based on the steepness of the reachability plot. One interpretation\n#'   of the `xi` parameter is that it classifies clusters by change in\n#'   relative cluster density. The algorithm used was originally contributed by\n#'   the ELKI framework and is explained in Schubert et al (2018), but contains a\n#'   set of fixes.\n#'\n#' **Predict cluster memberships**\n#'\n#' `predict()` requires an extracted DBSCAN clustering with `extractDBSCAN()` and then\n#' uses predict for `dbscan()`.\n#'\n#' @aliases optics OPTICS\n#' @family clustering functions\n#'\n#' @param x a data matrix or a [dist] object.\n#' @param eps upper limit of the size of the epsilon neighborhood. 
Limiting the\n#' neighborhood size improves performance and has no or very little impact on\n#' the ordering as long as it is not set too low. If not specified, the largest\n#' minPts-distance in the data set is used, which gives the same result as\n#' infinity.\n#' @param minPts the parameter is used to identify dense neighborhoods; the\n#' reachability distance is calculated as the distance to the minPts nearest\n#' neighbor. Controls the smoothness of the reachability distribution. Default\n#' is 5 points.\n#' @param eps_cl Threshold to identify clusters (`eps_cl <= eps`).\n#' @param xi Steepness threshold to identify clusters hierarchically using the\n#' Xi method.\n#' @param object an object of class `optics`.\n#' @param minimum logical, representing whether or not to extract the minimal\n#' (non-overlapping) clusters in the Xi clustering algorithm.\n#' @param correctPredecessors logical, correct a common artifact by pruning\n#' the steep up area for points that have predecessors not in the\n#' cluster--found by the ELKI framework, see details below.\n#' @param ...  additional arguments are passed on to the fixed-radius nearest\n#' neighbor search algorithm. See [frNN()] for details on how to\n#' control the search strategy.\n#' @param cluster,predecessor plot clusters and predecessors.\n#'\n#' @return An object of class `optics` with components:\n#' \\item{eps }{ value of `eps` parameter. }\n#' \\item{minPts }{ value of `minPts` parameter. }\n#' \\item{order }{ optics order for the data points in `x`. }\n#' \\item{reachdist }{ [reachability] distance for each data point in `x`. }\n#' \\item{coredist }{ core distance for each data point in `x`. }\n#'\n#' For `extractDBSCAN()`, in addition the following\n#' components are available:\n#' \\item{eps_cl }{ the value of the `eps_cl` parameter. }\n#' \\item{cluster }{ assigned cluster labels in the order of the data points in `x`. 
}\n#'\n#' For `extractXi()`, in addition the following components\n#' are available:\n#' \\item{xi }{ value of the `xi` (steepness) threshold. }\n#' \\item{cluster }{ assigned cluster labels in the order of the data points in `x`.}\n#' \\item{clusters_xi }{ data.frame containing the start and end of each cluster\n#' found in the OPTICS ordering. }\n#'\n#' @author Michael Hahsler and Matthew Piekenbrock\n#' @seealso Density [reachability].\n#'\n#' @references Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg\n#' Sander (1999). OPTICS: Ordering Points To Identify the Clustering Structure.\n#' _ACM SIGMOD international conference on Management of data._ ACM Press. pp.\n#' 49--60.\n#' \\doi{10.1145/304181.304187}\n#'\n#' Hahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based\n#' Clustering with R.  _Journal of Statistical Software_, 91(1), 1-30.\n#' \\doi{10.18637/jss.v091.i01}\n#'\n#' Erich Schubert, Michael Gertz (2018). Improving the Cluster Structure\n#' Extracted from OPTICS Plots. In _Lernen, Wissen, Daten, Analysen (LWDA 2018),_\n#' pp. 
318-329.\n#' @keywords model clustering\n#' @examples\n#' set.seed(2)\n#' n <- 400\n#'\n#' x <- cbind(\n#'   x = runif(4, 0, 1) + rnorm(n, sd = 0.1),\n#'   y = runif(4, 0, 1) + rnorm(n, sd = 0.1)\n#'   )\n#'\n#' plot(x, col=rep(1:4, times = 100))\n#'\n#' ### run OPTICS (Note: we use the default eps calculation)\n#' res <- optics(x, minPts = 10)\n#' res\n#'\n#' ### get order\n#' res$order\n#'\n#' ### plot produces a reachability plot\n#' plot(res)\n#'\n#' ### plot the order of points in the reachability plot\n#' plot(x, col = \"grey\")\n#' polygon(x[res$order, ])\n#'\n#' ### extract a DBSCAN clustering by cutting the reachability plot at eps_cl\n#' res <- extractDBSCAN(res, eps_cl = .065)\n#' res\n#'\n#' plot(res)  ## black is noise\n#' hullplot(x, res)\n#'\n#' ### re-cut at a higher eps threshold\n#' res <- extractDBSCAN(res, eps_cl = .07)\n#' res\n#' plot(res)\n#' hullplot(x, res)\n#'\n#' ### extract hierarchical clustering of varying density using the Xi method\n#' res <- extractXi(res, xi = 0.01)\n#' res\n#'\n#' plot(res)\n#' hullplot(x, res)\n#'\n#' # Xi cluster structure\n#' res$clusters_xi\n#'\n#' ### use OPTICS on a precomputed distance matrix\n#' d <- dist(x)\n#' res <- optics(d, minPts = 10)\n#' plot(res)\n#' @export\noptics <- function(x, eps = NULL, minPts = 5, ...) 
{\n  ### find eps from minPts\n  eps <- eps %||% max(kNNdist(x, k =  minPts))\n\n  ### extra contains settings for frNN\n  ### search = \"kdtree\", bucketSize = 10, splitRule = \"suggest\", approx = 0\n  extra <- list(...)\n  args <- c(\"search\", \"bucketSize\", \"splitRule\", \"approx\")\n  m <- pmatch(names(extra), args)\n  if (anyNA(m))\n    stop(\"Unknown parameter: \",\n      toString(names(extra)[is.na(m)]))\n  names(extra) <- args[m]\n\n  search <- .parse_search(extra$search %||% \"kdtree\")\n  splitRule <- .parse_splitRule(extra$splitRule %||% \"suggest\")\n  bucketSize <- as.integer(extra$bucketSize %||% 10L)\n  approx <- as.integer(extra$approx %||% 0L)\n\n  ### dist search\n  if (search == 3L && !inherits(x, \"dist\")) {\n    if (.matrixlike(x))\n      x <- dist(x)\n    else\n      stop(\"x needs to be a matrix to calculate distances\")\n  }\n\n  ## for dist we provide the R code with a frNN list and no x\n  frNN <- list()\n  if (inherits(x, \"dist\")) {\n    frNN <- frNN(x, eps, ...)\n    ## add self match and use C numbering\n    frNN$id <- lapply(\n      seq_along(frNN$id),\n      FUN = function(i)\n        c(i - 1L, frNN$id[[i]] - 1L)\n    )\n    frNN$dist <- lapply(\n      seq_along(frNN$dist),\n      FUN = function(i)\n        c(0, frNN$dist[[i]]) ^ 2\n    )\n\n    x <- matrix()\n    storage.mode(x) <- \"double\"\n\n  } else{\n    if (!.matrixlike(x))\n      stop(\"x needs to be a matrix\")\n    ## make sure x is numeric\n    x <- as.matrix(x)\n    if (storage.mode(x) == \"integer\")\n      storage.mode(x) <- \"double\"\n    if (storage.mode(x) != \"double\")\n      stop(\"x has to be a numeric matrix.\")\n  }\n\n  if (length(frNN) == 0 &&\n      anyNA(x))\n    stop(\"data/distances cannot contain NAs for optics (with kd-tree)!\")\n\n  ret <-\n    optics_int(\n      as.matrix(x),\n      as.double(eps),\n      as.integer(minPts),\n      as.integer(search),\n      as.integer(bucketSize),\n      as.integer(splitRule),\n      as.double(approx),\n      
frNN\n    )\n\n  ret$minPts <- minPts\n  ret$eps <- eps\n  ret$eps_cl <- NA_real_\n  ret$xi <- NA_real_\n  class(ret) <- \"optics\"\n\n  ret\n}\n\n#' @rdname optics\n#' @export\nprint.optics <- function(x, ...) {\n  writeLines(c(\n    paste0(\n      \"OPTICS ordering/clustering for \",\n      length(x$order),\n      \" objects.\"\n    ),\n    paste0(\n      \"Parameters: \",\n      \"minPts = \",\n      x$minPts,\n      \", eps = \",\n      x$eps,\n      \", eps_cl = \",\n      x$eps_cl,\n      \", xi = \",\n      x$xi\n    )\n  ))\n\n  if (!is.null(x$cluster)) {\n\n    if (is.na(x$xi)) {\n      writeLines(paste0(\n        \"The clustering contains \",\n        ncluster(x),\n        \" cluster(s) and \",\n        nnoise(x),\n        \" noise points.\"\n      ))\n\n      print(table(x$cluster))\n    } else {\n      writeLines(\n        paste0(\n          \"The clustering contains \",\n          nrow(x$clusters_xi),\n          \" cluster(s) and \",\n          nnoise(x),\n          \" noise points.\"\n        )\n      )\n    }\n    cat(\"\\n\")\n  }\n  writeLines(strwrap(paste0(\n    \"Available fields: \",\n    toString(names(x))\n  ), exdent = 18))\n}\n\n#' @rdname optics\n#' @export\nplot.optics <-\n  function(x,\n    cluster = TRUE,\n    predecessor = FALSE,\n    ...) 
{\n    # OPTICS cluster extraction methods\n    if (inherits(x$cluster, \"xics\") ||\n        all(c(\"start\", \"end\", \"cluster_id\") %in% names(x$clusters_xi))) {\n      # Sort clusters by size\n      hclusters <-\n        x$clusters_xi[order(x$clusters_xi$end - x$clusters_xi$start), ]\n\n      # omd = .15 leaves 15% at the bottom for the cluster lines\n      def.par <- par(no.readonly = TRUE)\n      par(mar = c(2, 4, 4, 2) + 0.1, omd = c(0, 1, .15, 1))\n\n      # Need to know how to spread out lines\n      y_max <- max(x$reachdist[!is.infinite(x$reachdist)])\n      y_increments <- (y_max / 0.85 * .15) / (nrow(hclusters) + 1L)\n\n      # Get top level cluster labels\n      # top_level <- extractClusterLabels(x$clusters_xi, x$order)\n      plot(\n        as.reachability(x),\n        col = x$cluster[x$order] + 1L,\n        xlab = NA,\n        xaxt = 'n',\n        yaxs = \"i\",\n        ylim = c(0, y_max),\n        ...\n      )\n\n      # Lines beneath plotting region indicating Xi clusters\n      i <- seq_len(nrow(hclusters))\n      segments(\n        x0 = hclusters$start[i],\n        y0 = -(y_increments * i),\n        x1 = hclusters$end[i],\n        col = hclusters$cluster_id[i] + 1L,\n        lwd = 2,\n        xpd = NA\n      )\n      ## Restore previous settings\n      par(def.par)\n    } else if (is.numeric(x$cluster) &&\n        !is.null(x$eps_cl)) {\n      # Works for integers too\n      ## extractDBSCAN clustering\n      plot(as.reachability(x), col = x$cluster[x$order] + 1L, ...)\n      lines(\n        x = c(0, length(x$cluster)),\n        y = c(x$eps_cl, x$eps_cl),\n        col = \"black\",\n        lty = 2\n      )\n    } else {\n      # Regular reachability plot\n      plot(as.reachability(x), ...)\n    }\n  }\n\n# Simple conversion between OPTICS objects and reachability objects\n#' @rdname optics\n#' @export\nas.reachability.optics <- function(object, ...) 
{\n  structure(list(reachdist = object$reachdist, order = object$order),\n    class = \"reachability\")\n}\n\n# Conversion between OPTICS objects and dendrograms\n#' @rdname optics\n#' @export\nas.dendrogram.optics <- function(object, ...) {\n  if (object$minPts > length(object$order)) {\n    stop(\"'minPts' should be less than or equal to the number of points in the dataset.\")\n  }\n  if (sum(is.infinite(object$reachdist)) > 1)\n    stop(\n      \"Eps value is not large enough to capture the complete hierarchical structure of the dataset. Please use a larger eps value (such as Inf).\"\n    )\n  as.dendrogram(as.reachability(object))\n}\n\n#' @rdname optics\n#' @export\nextractDBSCAN <- function(object, eps_cl) {\n  if (!inherits(object, \"optics\"))\n    stop(\"extractDBSCAN only accepts objects resulting from dbscan::optics!\")\n\n  reachdist <- object$reachdist[object$order]\n  coredist <- object$coredist[object$order]\n  n <- length(object$order)\n  cluster <- integer(n)\n\n  clusterid <- 0L         ### 0 is noise\n  for (i in 1:n) {\n    if (reachdist[i] > eps_cl) {\n      if (coredist[i] <= eps_cl) {\n        clusterid <- clusterid + 1L\n        cluster[i] <- clusterid\n      } else{\n        cluster[i] <- 0L  ### noise\n      }\n    } else{\n      cluster[i] <- clusterid\n    }\n  }\n\n  object$eps_cl <- eps_cl\n  object$xi <- NA_real_\n  ### fix the order so cluster is in the same order as the rows in x\n  cluster[object$order] <- cluster\n  object$cluster <- cluster\n\n  object\n}\n\n\n#' @rdname optics\n#' @export\nextractXi <-\n  function(object,\n    xi,\n    minimum = FALSE,\n    correctPredecessors = TRUE)\n  {\n    if (!inherits(object, \"optics\"))\n      stop(\"extractXi only accepts objects resulting from dbscan::optics!\")\n    if (xi >= 1.0 ||\n        xi <= 0.0)\n      stop(\"The Xi parameter must be in (0, 1)\")\n\n    # Initial variables\n    object$ord_rd <- object$reachdist[object$order]\n    object$ixi <- (1 - xi)\n    SetOfSteepDownAreas <- list()\n    
SetOfClusters <- list()\n    index <- 1\n    mib <- 0\n    sdaset <- list()\n    while (index <= length(object$order))\n    {\n      mib <- max(mib, object$ord_rd[index])\n      if (!valid(index + 1, object))\n        break\n\n      # Test if this is a steep down area\n      if (steepDown(index, object))\n      {\n        # Update mib values with current mib and filter\n        sdaset <- updateFilterSDASet(mib, sdaset, object$ixi)\n        startval <- object$ord_rd[index]\n        mib <- 0\n        startsteep <- index\n        endsteep <- index + 1\n        while (!is.na(object$order[index + 1])) {\n          index <- index + 1\n          if (steepDown(index, object)) {\n            endsteep <- index + 1\n            next\n          }\n          if (!steepDown(index, object, ixi = 1.0) ||\n              index - endsteep > object$minPts)\n            break\n        }\n        sda <- list(\n          s = startsteep,\n          e = endsteep,\n          maximum = startval,\n          mib = 0\n        )\n        # print(paste(\"New steep down area:\", toString(sda)))\n        sdaset <- append(sdaset, list(sda))\n        next\n      }\n      if (steepUp(index, object))\n      {\n        sdaset <- updateFilterSDASet(mib, sdaset, object$ixi)\n        {\n          startsteep <- index\n          endsteep <- index + 1\n          mib <- object$ord_rd[index]\n          esuccr <-\n            if (!valid(index + 1, object))\n              Inf\n          else\n            object$ord_rd[index + 1]\n          if (!is.infinite(esuccr)) {\n            while (!is.na(object$order[index + 1])) {\n              index <- index + 1\n              if (steepUp(index, object)) {\n                endsteep <- index + 1\n                mib <- object$ord_rd[index]\n                esuccr <-\n                  if (!valid(index + 1, object))\n                    Inf\n                else\n                  object$ord_rd[index + 1]\n                if (is.infinite(esuccr)) {\n                  
endsteep <- endsteep - 1\n                  break\n                }\n                next\n              }\n              if (!steepUp(index, object, ixi = 1.0) ||\n                  index - endsteep > object$minPts)\n                break\n            }\n          } else {\n            endsteep <- endsteep - 1\n            index <- index + 1\n          }\n          sua <- list(s = startsteep,\n            e = endsteep,\n            maximum = esuccr)\n          # print(paste(\"New steep up area:\", toString(sua)))\n        }\n        for (sda in rev(sdaset))\n        {\n          # Condition 3B\n          if (mib * object$ixi < sda$mib)\n            next\n\n          # Default values\n          cstart <- sda$s\n          cend <- sua$e\n\n          # Credit to ELKI\n          if (correctPredecessors) {\n            while (cend > cstart && is.infinite(object$ord_rd[cend])) {\n              cend <- cend - 1\n            }\n          }\n\n          # Condition 4\n          {\n            # Case b\n            if (sda$maximum * object$ixi >= sua$maximum) {\n              while (cstart < cend &&\n                  object$ord_rd[cstart + 1] > sua$maximum)\n                cstart <- cstart + 1\n            }\n            # Case c\n            else if (sua$maximum * object$ixi >= sda$maximum) {\n              while (cend > cstart &&\n                  object$ord_rd[cend - 1] > sda$maximum)\n                cend <- cend - 1\n            }\n          }\n\n          # This is NOT in the original article - credit to ELKI for finding this.\n          # Ensure that the predecessor is in the current cluster. 
This filter\n          # removes common artifacts from the Xi method\n          if (correctPredecessors) {\n            while (cend > cstart) {\n              tmp2 <- object$predecessor[object$order[cend]]\n              if (!is.na(tmp2) &&\n                  any(object$order[cstart:(cend - 1)] == tmp2, na.rm = TRUE))\n                break\n              # Not found.\n              cend <- cend - 1\n            }\n          }\n\n          # Ensure the last steep up point is not included if it's xi significant\n          if (steepUp(index - 1, object)) {\n            cend <- cend - 1\n          }\n\n          # obey minPts\n          if (cend - cstart + 1 < object$minPts)\n            next\n          SetOfClusters <-\n            append(SetOfClusters, list(list(\n              start = cstart, end = cend\n            )))\n          next\n        }\n      } else {\n        index <- index + 1\n      }\n    }\n    # Remove aliases\n    object$ord_rd <- NULL\n    object$ixi <- NULL\n\n    # Keep xi parameter, disable any previous flat clustering parameter\n    object$xi <- xi\n    object$eps_cl <- NA_real_\n\n    # Zero-out clusters (only noise) if none found\n    if (length(SetOfClusters) == 0) {\n      warning(paste(\"No clusters were found with threshold:\", xi))\n      object$clusters_xi <- NULL\n      object$cluster <- rep(0, length(object$cluster))\n      return(invisible(object))\n    }\n    # Cluster data exists; organize it by starting and ending index, give arbitrary id\n    object$clusters_xi <- do.call(rbind, SetOfClusters)\n    object$clusters_xi <-\n      data.frame(\n        start = unlist(object$clusters_xi[, 1], use.names = FALSE),\n        end = unlist(object$clusters_xi[, 2], use.names = FALSE),\n        check.names = FALSE\n      )\n    object$clusters_xi <-\n      object$clusters_xi[order(object$clusters_xi$start, object$clusters_xi$end), ]\n    object$clusters_xi <-\n      cbind(object$clusters_xi, list(cluster_id = 
seq_len(nrow(object$clusters_xi))))\n    row.names(object$clusters_xi) <- NULL\n\n    ## Populate cluster vector with either:\n    ## 1. 'top-level' cluster labels to aid in plotting\n    ## 2. 'local' or non-overlapping cluster labels if minimum == TRUE\n    object$cluster <-\n      extractClusterLabels(object$clusters_xi, object$order, minimum = minimum)\n\n    # Remove non-local clusters if minimum was specified\n    if (minimum) {\n      object$clusters_xi <-\n        object$clusters_xi[sort(unique(object$cluster))[-1], ]\n    }\n\n    class(object$cluster) <-\n      unique(append(class(object$cluster), \"xics\"))\n    class(object$clusters_xi) <-\n      unique(append(class(object$clusters_xi), \"xics\"))\n    object\n  }\n\n# Removes obsolete steep areas\nupdateFilterSDASet <- function(mib, sdaset, ixi) {\n  sdaset <- Filter(function(sda)\n    sda$maximum * ixi > mib, sdaset)\n  lapply(sdaset, function(sda) {\n    if (mib > sda$mib)\n      sda$mib <- mib\n    sda\n  })\n}\n\n# Determines if the reachability distance at the current index 'i' is\n# (xi) significantly lower than the next index\nsteepUp <- function(i, object, ixi = object$ixi) {\n  if (is.infinite(object$ord_rd[i]))\n    return(FALSE)\n  if (!valid(i + 1, object))\n    return(TRUE)\n  return(object$ord_rd[i] <= object$ord_rd[i + 1] * ixi)\n}\n\n# Determines if the reachability distance at the current index 'i' is\n# (xi) significantly higher than the next index\nsteepDown <- function(i, object, ixi = object$ixi) {\n  if (!valid(i + 1, object))\n    return(FALSE)\n  if (is.infinite(object$ord_rd[i + 1]))\n    return(FALSE)\n  return(object$ord_rd[i] * ixi >= object$ord_rd[i + 1])\n}\n\n# Determines if the reachability distance at the current index 'i' is a valid distance\nvalid <- function(index, object) {\n  return(!is.na(object$ord_rd[index]))\n}\n\n### Extract clusters (minimum == T extracts clusters that do not contain other clusters) from a given ordering of points\nextractClusterLabels <- 
function(cl, order, minimum = FALSE) {\n  ## Add cluster_id to clusters\n  if (!all(c(\"start\", \"end\") %in% names(cl)))\n    stop(\"extractClusterLabels expects start and end references\")\n  if (!\"cluster_id\" %in% names(cl))\n    cl <- cbind(cl, cluster_id = seq_len(nrow(cl)))\n\n  ## Sort cl based on minimum parameter / cluster size\n  if (!\"cluster_size\" %in% names(cl))\n    cl <- cbind(cl, list(cluster_size = (cl$end - cl$start)))\n  cl <-\n    if (minimum) {\n      cl[order(cl$cluster_size), ]\n    } else {\n      cl[order(-cl$cluster_size), ]\n    }\n\n  ## Fill in the [cluster] vector with cluster IDs\n  clusters <- rep(0, length(order))\n  for (cid in cl$cluster_id) {\n    cluster <- cl[cl$cluster_id == cid, ]\n    if (minimum) {\n      if (all(clusters[cluster$start:cluster$end] == 0)) {\n        clusters[cluster$start:cluster$end] <- cid\n      }\n    } else\n      clusters[cluster$start:cluster$end] <- cid\n  }\n\n  # Fix the ordering\n  clusters[order] <- clusters\n  return(clusters)\n}\n"
  },
  {
    "path": "R/pointdensity.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2017 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' Calculate Local Density at Each Data Point\n#'\n#' Calculate the local density at each data point as either the number of\n#' points in the eps-neighborhood (as used in `dbscan()`) or perform kernel density\n#' estimation (KDE) using a uniform kernel. The function uses a kd-tree for fast\n#' fixed-radius nearest neighbor search.\n#'\n#' `dbscan()` estimates the density around a point as the number of points in the\n#' eps-neighborhood of the point (including the query point itself).\n#' Kernel density estimation (KDE) using a uniform kernel, which is just this point\n#' count in the eps-neighborhood divided by \\eqn{(2\\,eps\\,n)}{(2 eps n)}, where\n#' \\eqn{n} is the number of points in `x`.\n#'\n#' Alternatively, `type = \"gaussian\"` calculates a Gaussian kernel estimate where\n#' `eps` is used as the standard deviation. 
To speed up computation, a\n#' kd-tree is used to find all points within 3 times the standard deviation and\n#' these points are used for the estimate.\n#'\n#' Points with low local density often indicate noise (see e.g., Wishart (1969)\n#' and Hartigan (1975)).\n#'\n#' @aliases pointdensity density\n#' @family Outlier Detection Functions\n#'\n#' @param x a data matrix or a dist object.\n#' @param eps radius of the eps-neighborhood (i.e., bandwidth of the uniform\n#' kernel). For the Gaussian kde, this parameter specifies the standard deviation of\n#' the kernel.\n#' @param type `\"frequency\"`, `\"density\"`, or `\"gaussian\"`: should the raw count of\n#' points inside the eps-neighborhood, the eps-neighborhood density estimate,\n#' or a Gaussian density estimate be returned?\n#' @param search,bucketSize,splitRule,approx algorithmic parameters for\n#' [frNN()].\n#'\n#' @return A vector of the same length as data points (rows) in `x` with\n#' the count or density values for each data point.\n#'\n#' @author Michael Hahsler\n#' @seealso [frNN()], [stats::density()].\n#' @references Wishart, D. (1969), Mode Analysis: A Generalization of Nearest\n#' Neighbor which Reduces Chaining Effects, in _Numerical Taxonomy,_ Ed., A.J.\n#' Cole, Academic Press, 282-311.\n#'\n#' John A. Hartigan (1975), _Clustering Algorithms,_ John Wiley & Sons, Inc.,\n#' New York, NY, USA.\n#' @keywords model\n#' @examples\n#' set.seed(665544)\n#' n <- 100\n#' x <- cbind(\n#'   x=runif(10, 0, 5) + rnorm(n, sd = 0.4),\n#'   y=runif(10, 0, 5) + rnorm(n, sd = 0.4)\n#'   )\n#' plot(x)\n#'\n#' ### calculate density around points\n#' d <- pointdensity(x, eps = .5, type = \"density\")\n#'\n#' ### density distribution\n#' summary(d)\n#' hist(d, breaks = 10)\n#'\n#' ### plot with point size proportional to density\n#' plot(x, pch = 19, main = \"Density (eps = .5)\", cex = d*5)\n#'\n#' ### Wishart (1969) single-link clustering after removing low-density noise\n#' # 1. 
remove noise with low density\n#' f <- pointdensity(x, eps = .5, type = \"frequency\")\n#' x_nonoise <- x[f >= 5,]\n#'\n#' # 2. use single-linkage on the non-noise points\n#' hc <- hclust(dist(x_nonoise), method = \"single\")\n#' plot(x, pch = 19, cex = .5)\n#' points(x_nonoise, pch = 19, col= cutree(hc, k = 4) + 1L)\n#' @export\npointdensity <- function(x,\n  eps,\n  type = \"frequency\",\n  search = \"kdtree\",\n  bucketSize = 10,\n  splitRule = \"suggest\",\n  approx = 0) {\n  type <- match.arg(type, choices = c(\"frequency\", \"density\", \"gaussian\"))\n\n  if (anyNA(x))\n    stop(\"missing values are not allowed in x.\")\n\n  if (type == \"gaussian\")\n    return (.pointdensity_gaussian(x, sd = eps, search = search,\n                                   bucketSize = bucketSize,\n                                   splitRule = splitRule, approx = approx))\n\n  # regular dbscan density estimation\n  if (inherits(x, \"dist\")) {\n    nn <- frNN(\n      x,\n      eps,\n      sort = FALSE,\n      search = search,\n      bucketSize = bucketSize,\n      splitRule = splitRule,\n      approx = approx\n    )\n    d <- lengths(nn$id) + 1L\n\n  } else {\n    # faster implementation for a data matrix\n    search <- .parse_search(search)\n    splitRule <- .parse_splitRule(splitRule)\n\n    d <- dbscan_density_int(\n      as.matrix(x),\n      as.double(eps),\n      as.integer(search),\n      as.integer(bucketSize),\n      as.integer(splitRule),\n      as.double(approx)\n    )\n  }\n\n  if (type == \"density\")\n    # length(d) is the number of points n; also works when x is a dist object\n    d <- d / (2 * eps * length(d))\n\n  d\n}\n\n.pointdensity_gaussian <- function(x, sd, ...) {\n    ### consider all points within 3 standard deviations\n    nn <- frNN(\n      x,\n      3 * sd,\n      sort = FALSE,\n      ...\n    )\n\n    sigma <- sd^2\n    d <- sapply(nn$dist, FUN = function(ds) sum(exp(-1 * ds^2 / (2 * sigma))))\n    d <- d / (length(d) * sd * 2 * pi)\n    d\n}\n\n#gof <- function(x, eps, ...) 
{\n#  d <- pointdensity(x, eps, ...)\n#  1/(d/mean(d))\n#}\n"
  },
  {
    "path": "R/predict.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2017 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n#' @rdname dbscan\n#' @param object clustering object.\n#' @param data the data set used to create the clustering object.\n#' @param newdata new data points for which the cluster membership should be\n#' predicted.\n#' @importFrom stats predict\n#' @export\npredict.dbscan_fast <- function (object, newdata, data, ...) {\n  if (object$metric != \"euclidean\")\n    warning(\"dbscan used non-Euclidean distances, predict assigns new points using Euclidean distances!\")\n  .predict_frNN(newdata, data, object$cluster, object$eps, ...)\n}\n\n#' @rdname optics\n#' @param object clustering object.\n#' @param data the data set used to create the clustering object.\n#' @param newdata new data points for which the cluster membership should be\n#' predicted.\n#' @export\npredict.optics <- function (object, newdata, data, ...) {\n  if (is.null(object$cluster) ||\n      is.null(object$eps_cl) || is.na(object$eps_cl))\n    stop(\"no extracted clustering available in object! 
run extractDBSCAN() first.\")\n  .predict_frNN(newdata, data, object$cluster, object$eps_cl, ...)\n}\n\n#' @rdname hdbscan\n#' @param object clustering object.\n#' @param data the data set used to create the clustering object.\n#' @param newdata new data points for which the cluster membership should be\n#' predicted.\n#' @export\npredict.hdbscan <- function(object, newdata, data, ...) {\n  clusters <- object$cluster\n\n  if (is.null(newdata))\n    return(clusters)\n\n  # don't use noise\n  coredist <- object$coredist[clusters != 0]\n  data <- data[clusters != 0,]\n  clusters <- clusters[clusters != 0]\n\n  # find the nearest neighbor among the non-noise points\n  nns <- kNN(data, query = newdata, k = 1)\n\n  # choose cluster if dist <= coredist of that point\n  drop(ifelse(nns$dist > coredist[nns$id], 0L, clusters[nns$id]))\n}\n\n## find the cluster id of the closest NN in the eps neighborhood or return 0 otherwise.\n.predict_frNN <- function(newdata, data, clusters, eps, ...) {\n  if (is.null(newdata))\n    return(clusters)\n\n  if (ncol(data) != ncol(newdata))\n    stop(\"Number of columns in data and newdata do not agree!\")\n\n  if (nrow(data) != length(clusters))\n    stop(\"clustering does not agree with the number of data points in data.\")\n\n  if (is.data.frame(data)) {\n    indx <- vapply(data, is.factor, logical(1L))\n    if (any(indx)) {\n      warning(\n        \"data contains factors! The factors are converted to numbers and Euclidean distances are used\"\n      )\n    }\n    data[indx] <- lapply(data[indx], as.numeric)\n    newdata[indx] <- lapply(newdata[indx], as.numeric)\n  }\n\n  # don't use noise\n  data <- data[clusters != 0,]\n  clusters <- clusters[clusters != 0]\n\n  # calculate the frNN between newdata and data (only keep entries for newdata)\n  nn <- frNN(data,\n    query = newdata,\n    eps = eps,\n    sort = TRUE,\n    ...)\n\n  vapply(\n    nn$id, function(nns) if (length(nns) == 0L) 0L else clusters[nns[1L]], integer(1L)\n  )\n}\n"
  },
  {
    "path": "R/reachability.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2015 Michael Hahsler, Matt Piekenbrock\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n\n#' Reachability Distances\n#'\n#' Reachability distances can be plotted to show the hierarchical relationships between data points.\n#' The idea was originally introduced by Ankerst et al (1999) for [OPTICS]. Later,\n#' Sanders et al (2003) showed that the visualization is useful for other hierarchical\n#' structures and introduced an algorithm to convert [dendrogram] representation to\n#' reachability plots.\n#'\n#' A reachability plot displays the points as vertical bars, were the height is the\n#' reachability distance between two consecutive points.\n#' The central idea behind reachability plots is that the ordering in which\n#' points are plotted identifies underlying hierarchical density\n#' representation as mountains and valleys of high and low reachability distance.\n#' The original ordering algorithm OPTICS as described by Ankerst et al (1999)\n#' introduced the notion of reachability plots.\n#'\n#' OPTICS linearly orders the data points such that points\n#' which are spatially closest become neighbors in the ordering. 
Valleys\n#' represent clusters, which can be represented hierarchically. Although the\n#' ordering is crucial to the structure of the reachability plot, it's important\n#' to note that OPTICS, like DBSCAN, is not entirely deterministic and, just\n#' like the dendrogram, isomorphisms may exist.\n#'\n#' Reachability plots were shown by Sander et al (2003) to essentially convey the same\n#' information as the more traditional dendrogram structure, and dendrograms\n#' can be converted into reachability plots.\n#'\n#' Different hierarchical representations, such as dendrograms or reachability\n#' plots, may be preferable depending on the context. In smaller datasets,\n#' cluster memberships may be more easily identifiable through a dendrogram\n#' representation, particularly if the user is already familiar with tree-like\n#' representations. For larger datasets, however, a reachability plot may be\n#' preferred for visualizing macro-level density relationships.\n#'\n#' A variety of cluster extraction methods have been proposed using\n#' reachability plots. Because these cluster extraction methods depend directly on the\n#' ordering OPTICS produces, they are part of the [optics()] interface.\n#' Nonetheless, reachability plots can be created directly from other types of\n#' linkage trees, and vice versa.\n#'\n#' _Note:_ The reachability distance for the first point is by definition not defined\n#' (it has no preceding point).\n#' Also, the reachability distances can be undefined when a point does not have enough\n#' neighbors in the epsilon neighborhood. 
We represent these undefined cases as `Inf`\n#' and show them in the plot as a dashed line.\n#'\n#' @name reachability\n#' @aliases reachability reachability_plot print.reachability\n#'\n#' @param object any object that can be coerced to class\n#' `reachability`, such as an object of class [optics] or [stats::dendrogram].\n#' @param x object of class `reachability`.\n#' @param order_labels whether to plot text labels for each point's reachability\n#' distance.\n#' @param xlab x-axis label.\n#' @param ylab y-axis label.\n#' @param main Title of the plot.\n#' @param ...  graphical parameters are passed on to `plot()`,\n#'   or arguments for other methods.\n#'\n#' @return An object of class `reachability` with components:\n#' \\item{order }{order to use for the data points in `x`. }\n#' \\item{reachdist }{reachability distance for each data point in `x`. }\n#'\n#' @author Matthew Piekenbrock\n#' @seealso [optics()], [as.dendrogram()], and [stats::hclust()].\n#' @references Ankerst, M., M. M. Breunig, H.-P. Kriegel, J. Sander (1999).\n#' OPTICS: Ordering Points To Identify the Clustering Structure. _ACM\n#' SIGMOD international conference on Management of data._ ACM Press. pp.\n#' 49--60.\n#'\n#' Sander, J., X. Qin, Z. Lu, N. Niu, and A. Kovarsky (2003). 
Automatic\n#' extraction of clusters from hierarchical clustering representations.\n#' _Pacific-Asia Conference on Knowledge Discovery and Data Mining._\n#' Springer Berlin Heidelberg.\n#' @keywords model clustering hierarchical clustering\n#' @examples\n#' set.seed(2)\n#' n <- 20\n#'\n#' x <- cbind(\n#'   x = runif(4, 0, 1) + rnorm(n, sd = 0.1),\n#'   y = runif(4, 0, 1) + rnorm(n, sd = 0.1)\n#' )\n#'\n#' plot(x, xlim = range(x), ylim = c(min(x) - sd(x), max(x) + sd(x)), pch = 20)\n#' text(x = x, labels = seq_len(nrow(x)), pos = 3)\n#'\n#' ### run OPTICS\n#' res <- optics(x, eps = 10, minPts = 2)\n#' res\n#'\n#' ### plot produces a reachability plot.\n#' plot(res)\n#'\n#' ### Manually extract reachability components from OPTICS\n#' reach <- as.reachability(res)\n#' reach\n#'\n#' ### plot still produces a reachability plot; point ids\n#' ### (rows in the original data) can be displayed with order_labels = TRUE\n#' plot(reach, order_labels = TRUE)\n#'\n#' ### Reachability objects can be directly converted to dendrograms\n#' dend <- as.dendrogram(reach)\n#' dend\n#' plot(dend)\n#'\n#' ### A dendrogram can be converted back into a reachability object\n#' plot(as.reachability(dend))\nNULL\n\n#' @rdname reachability\n#' @export\nprint.reachability <- function(x, ...) {\n  avg_reach <- mean(x$reachdist[!is.infinite(x$reachdist)], na.rm = TRUE)\n  cat(\n    \"Reachability plot collection for \",\n    length(x$order),\n    \" objects.\\n\",\n    \"Avg minimum reachability distance: \",\n    avg_reach,\n    \"\\n\",\n    \"Available Fields: order, reachdist\",\n    sep = \"\"\n  )\n}\n\n#' @rdname reachability\n#' @export\nplot.reachability <- function(x,\n  order_labels = FALSE,\n  xlab = \"Order\",\n  ylab = \"Reachability dist.\",\n  main = \"Reachability Plot\",\n  ...) 
{\n  if (is.null(x$order) ||\n      is.null(x$reachdist))\n    stop(\"reachability objects need 'reachdist' and 'order' fields\")\n  reachdist <- x$reachdist[x$order]\n\n  plot(\n    reachdist,\n    xlab = xlab,\n    ylab = ylab,\n    main = main,\n    type = \"h\",\n    ...\n  )\n  abline(v = which(is.infinite(reachdist)),\n    lty = 3)\n  if (order_labels) {\n    text(\n      x = seq_along(x$order),\n      y = reachdist,\n      labels = x$order,\n      pos = 3\n    )\n  }\n}\n\n#' @rdname reachability\n#' @export\nas.reachability <-\n  function(object, ...)\n    UseMethod(\"as.reachability\")\n\n\n#' @rdname reachability\n#' @export\nas.reachability.dendrogram <- function(object, ...) {\n  if (!inherits(object, \"dendrogram\"))\n    stop(\"The as.reachability method requires a dendrogram object.\")\n  # Rcpp doesn't seem to import attributes well for vectors\n  fix_x <- dendrapply(object, function(leaf) {\n    new_leaf <-\n      as.list(leaf)\n    attributes(new_leaf) <- attributes(leaf)\n    new_leaf\n  })\n  res <- dendrogram_to_reach(fix_x)\n  # Refix the ordering\n  res$reachdist <- res$reachdist[order(res$order)]\n\n  return(res)\n}\n"
  },
  {
    "path": "R/sNN.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2017 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n# number of shared nearest neighbors including the point itself.\n\n\n#' Find Shared Nearest Neighbors\n#'\n#' Calculates the number of shared nearest neighbors\n#' and creates a shared nearest neighbors graph.\n#'\n#' The number of shared nearest neighbors of two points p and q is the\n#' intersection of the kNN neighborhood of two points.\n#' Note: that each point is considered to be part\n#' of its own kNN neighborhood.\n#' The range for the shared nearest neighbors is\n#' \\eqn{[0, k]}. The result is a n-by-k matrix called `shared`.\n#' Each row is a point and the columns are the point's k nearest neighbors.\n#' The value is the count of the shared neighbors.\n#'\n#' The shared nearest neighbor graph connects a point with all its nearest neighbors\n#' if they have at least one shared neighbor. 
The number of shared neighbors can be used\n#' as an edge weight.\n#' Jarvis and Patrick (1973) use a slightly\n#' modified (see parameter `jp`) shared nearest neighbor graph for\n#' clustering.\n#'\n#' @aliases sNN snn\n#' @family NN functions\n#'\n#' @param x a data matrix, a [dist] object or a [kNN] object.\n#' @param k number of neighbors to consider to calculate the shared nearest\n#' neighbors.\n#' @param kt minimum threshold on the number of shared nearest neighbors to\n#' build the shared nearest neighbor graph. Edges are only preserved if\n#' `kt` or more neighbors are shared.\n#' @param jp In regular sNN graphs, two points that are not neighbors\n#' can have shared neighbors.\n#' Jarvis and Patrick (1973) require the two points to be neighbors, otherwise\n#' the count is zeroed out. `TRUE` uses this behavior.\n#' @param search nearest neighbor search strategy (one of `\"kdtree\"`, `\"linear\"` or\n#' `\"dist\"`).\n#' @param sort sort by the number of shared nearest neighbors? Note that this\n#' is expensive and `sort = FALSE` is much faster. sNN objects can be\n#' sorted using `sort()`.\n#' @param bucketSize max size of the kd-tree leaves.\n#' @param splitRule rule to split the kd-tree. One of `\"STD\"`, `\"MIDPT\"`, `\"FAIR\"`,\n#' `\"SL_MIDPT\"`, `\"SL_FAIR\"` or `\"SUGGEST\"` (SL stands for sliding). `\"SUGGEST\"` uses\n#' ANN's best guess.\n#' @param approx use approximate nearest neighbors. All NN up to a distance of\n#' a factor of `(1 + approx) eps` may be used. Some actual NN may be omitted,\n#' leading to spurious clusters and noise points. However, the algorithm will\n#' enjoy a significant speedup.\n#' @param decreasing logical; sort in decreasing order?\n#' @param ... additional parameters are passed on.\n#' @return An object of class `sNN` (subclass of [kNN] and [NN]) containing a list\n#' with the following components:\n#' \\item{id }{a matrix with ids. }\n#' \\item{dist }{a matrix with the distances. 
}\n#' \\item{shared }{a matrix with the number of shared nearest neighbors. }\n#' \\item{k }{the number of neighbors (`k`) used. }\n#' \\item{metric }{the distance metric used. }\n#'\n#' @author Michael Hahsler\n#' @references R. A. Jarvis and E. A. Patrick. 1973. Clustering Using a\n#' Similarity Measure Based on Shared Near Neighbors. _IEEE Trans. Comput._\n#' 22, 11 (November 1973), 1025-1034.\n#' \\doi{10.1109/T-C.1973.223640}\n#' @keywords model\n#' @examples\n#' data(iris)\n#' x <- iris[, -5]\n#'\n#' # find the kNN and add the number of shared nearest neighbors.\n#' k <- 5\n#' nn <- sNN(x, k = k)\n#' nn\n#'\n#' # shared nearest neighbor distribution\n#' table(as.vector(nn$shared))\n#'\n#' # explore number of shared points for the k-neighborhood of point 10\n#' i <- 10\n#' nn$shared[i,]\n#'\n#' plot(nn, x)\n#'\n#' # apply a threshold to create a sNN graph with edges\n#' # if 3 or more neighbors are shared.\n#' nn_3 <- sNN(nn, kt = 3)\n#' plot(nn_3, x)\n#'\n#' # get an adjacency list for the shared nearest neighbor graph\n#' adjacencylist(nn_3)\n#' @export\nsNN <- function(x,\n  k,\n  kt = NULL,\n  jp = FALSE,\n  sort = TRUE,\n  search = \"kdtree\",\n  bucketSize = 10,\n  splitRule = \"suggest\",\n  approx = 0) {\n  if (missing(k))\n    k <- x$k\n\n  if (inherits(x, \"kNN\")) {\n    if (k != x$k) {\n      if (ncol(x$id) < k)\n        stop(\"kNN object does not contain enough neighbors!\")\n      if (!x$sort)\n        x <- sort.kNN(x)\n      x$id <- x$id[, 1:k]\n      x$dist <- x$dist[, 1:k]\n      x$k <- k\n    }\n\n  } else\n    x <-\n      kNN(\n        x,\n        k,\n        sort = FALSE,\n        search = search,\n        bucketSize = bucketSize,\n        splitRule = splitRule,\n        approx = approx\n      )\n\n  x$shared <- SNN_sim_int(x$id, as.logical(jp[1]))\n  x$sort_shared <- FALSE\n\n  class(x) <- c(\"sNN\", \"kNN\", \"NN\")\n\n  if (sort)\n    x <- sort.sNN(x)\n\n  x$kt <- kt\n\n  if (!is.null(kt)) {\n    if (kt > k)\n      stop(\"kt needs to be less than or equal to 
k.\")\n    rem <- x$shared < kt\n    x$id[rem] <- NA\n    x$dist[rem] <- NA\n    x$shared[rem] <- NA\n  }\n\n  x\n}\n\n#' @rdname sNN\n#' @export\nsort.sNN <- function(x, decreasing = TRUE, ...) {\n  if (isTRUE(x$sort_shared))\n    return(x)\n  if (is.null(x$shared))\n    stop(\"Unable to sort. Number of shared neighbors is missing.\")\n  if (ncol(x$id) < 2) {\n    x$sort <- TRUE\n    x$sort_shared <- TRUE\n    return(x)\n  }\n\n  ## sort first by number of shared points (decreasing) and break ties by id (increasing)\n  k <- ncol(x$shared)\n  o <- vapply(\n    seq_len(nrow(x$shared)),\n    function(i) order(k - x$shared[i, ], x$id[i, ], decreasing = !decreasing),\n    integer(k)\n  )\n  for (i in seq_len(ncol(o))) {\n    x$shared[i, ] <- x$shared[i, ][o[, i]]\n    x$dist[i, ] <- x$dist[i, ][o[, i]]\n    x$id[i, ] <- x$id[i, ][o[, i]]\n  }\n\n  x$sort <- FALSE\n  x$sort_shared <- TRUE\n\n  x\n}\n\n#' @rdname sNN\n#' @export\nprint.sNN <- function(x, ...) {\n  cat(\n    \"shared-nearest neighbors for \",\n    nrow(x$id),\n    \" objects (k=\",\n    x$k,\n    \", kt=\",\n    x$kt %||% \"NULL\",\n    \").\",\n    \"\\n\",\n    sep = \"\"\n  )\n  cat(\"Available fields: \", toString(names(x)), \"\\n\", sep = \"\")\n}\n"
  },
  {
    "path": "R/sNNclust.R",
    "content": "#######################################################################\n# dbscan - Density Based Clustering of Applications with Noise\n#          and Related Algorithms\n# Copyright (C) 2017 Michael Hahsler\n\n# This program is free software; you can redistribute it and/or modify\n# it under the terms of the GNU General Public License as published by\n# the Free Software Foundation; either version 2 of the License, or\n# any later version.\n#\n# This program is distributed in the hope that it will be useful,\n# but WITHOUT ANY WARRANTY; without even the implied warranty of\n# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the\n# GNU General Public License for more details.\n#\n# You should have received a copy of the GNU General Public License along\n# with this program; if not, write to the Free Software Foundation, Inc.,\n# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.\n\n\n#' Shared Nearest Neighbor Clustering\n#'\n#' Implements the shared nearest neighbor clustering algorithm by Ertoz,\n#' Steinbach and Kumar (2003).\n#'\n#' **Algorithm:**\n#'\n#' 1. Constructs a shared nearest neighbor graph for a given k. The edge\n#' weights are the number of shared k nearest neighbors (in the range of\n#' \\eqn{[0, k]}).\n#'\n#' 2. Find each points SNN density, i.e., the number of points which have a\n#' similarity of `eps` or greater.\n#'\n#' 3. Find the core points, i.e., all points that have an SNN density greater\n#' than `MinPts`.\n#'\n#' 4. Form clusters from the core points and assign border points (i.e.,\n#' non-core points which share at least `eps` neighbors with a core point).\n#'\n#' Note that steps 2-4 are equivalent to the DBSCAN algorithm (see [dbscan()])\n#' and that `eps` has a different meaning than for DBSCAN. 
Here it is\n#' a threshold on the number of shared neighbors (see [sNN()])\n#' which defines a similarity.\n#'\n#' @aliases sNNclust snnclust\n#' @family clustering functions\n#'\n#' @param x a data matrix/data.frame (Euclidean distance is used), a\n#' precomputed [dist] object or a kNN object created with [kNN()].\n#' @param k Neighborhood size for nearest neighbor sparsification to create the\n#' shared NN graph.\n#' @param eps Two objects are only reachable from each other if they share at\n#' least `eps` nearest neighbors. Note: this is different from the `eps` in DBSCAN!\n#' @param minPts minimum number of points that share at least `eps`\n#' nearest neighbors for a point to be considered a core point.\n#' @param borderPoints should border points be assigned to clusters like in\n#' [DBSCAN]?\n#' @param ...  additional arguments are passed on to the k nearest neighbor\n#' search algorithm. See [kNN()] for details on how to control the\n#' search strategy.\n#'\n#' @return An object of class `general_clustering` with the following\n#' components:\n#' \\item{cluster }{An integer vector with cluster assignments. Zero\n#' indicates noise points.}\n#' \\item{type }{ name of the clustering algorithm used.}\n#' \\item{param }{ list of the clustering parameters used. 
}\n#'\n#' @author Michael Hahsler\n#'\n#' @references Levent Ertoz, Michael Steinbach, Vipin Kumar, Finding Clusters\n#' of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data,\n#' _SIAM International Conference on Data Mining,_ 2003, 47-59.\n#' \\doi{10.1137/1.9781611972733.5}\n#' @keywords model clustering\n#' @examples\n#' data(\"DS3\")\n#'\n#' # Out of the k = 20 NN, 7 (eps) have to be shared to create a link in the sNN graph.\n#' # A point needs at least 16 (minPts) links in the sNN graph to be a core point.\n#' # Noise points have cluster id 0 and are shown in black.\n#' cl <- sNNclust(DS3, k = 20, eps = 7, minPts = 16)\n#' cl\n#'\n#' clplot(DS3, cl)\n#'\n#' @export\nsNNclust <- function(x, k, eps, minPts, borderPoints = TRUE, ...) {\n  nn <- sNN(x, k = k, jp = TRUE, ...)\n\n  # convert into a frNN object which already enforces eps\n  nn_list <- lapply(seq_len(nrow(nn$id)),\n    FUN = function(i) unname(nn$id[i, nn$shared[i, ] >= eps]))\n  snn <- structure(list(id = nn_list, eps = eps, metric = nn$metric),\n    class = c(\"NN\", \"frNN\"))\n\n  # run dbscan\n  cl <- dbscan(snn, minPts = minPts, borderPoints = borderPoints)\n\n  structure(list(cluster = cl$cluster,\n    type = \"SharedNN clustering\",\n    param = list(k = k, eps = eps, minPts = minPts, borderPoints = borderPoints),\n    metric = cl$metric),\n    class = \"general_clustering\")\n}\n"
  },
  {
    "path": "R/utils.R",
    "content": "`%||%` <- function(x, y) {\n  if (is.null(x)) y else x\n}\n"
  },
  {
    "path": "R/zzz.R",
    "content": "# ANN uses a global KD_TRIVIAL structure which needs to be removed.\n.onUnload <- function(libpath) {\n  ANN_cleanup()\n  #cat(\"Cleaning up after ANN.\\n\")\n}\n"
  },
  {
    "path": "README.Rmd",
    "content": "---\noutput: github_document\nbibliography: vignettes/dbscan.bib\nlink-citations: yes\n---\n\n```{r echo=FALSE, results = 'asis'}\npkg <- 'dbscan'\n\nsource(\"https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R\")\npkg_title(pkg, anaconda = \"r-dbscan\", stackoverflow = \"dbscan%2br\")\n```\n\n## Introduction\n\nThis R package [@hahsler2019dbscan] provides a fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data.\nThe package includes: \n \n__Clustering__\n\n- __DBSCAN:__ Density-based spatial clustering of applications with noise [@ester1996density].\n- __Jarvis-Patrick Clustering__: Clustering using a similarity measure based\non shared near neighbors [@jarvis1973].\n- __SNN Clustering__: Shared nearest neighbor clustering [@erdoz2003].\n- __HDBSCAN:__  Hierarchical DBSCAN with simplified hierarchy extraction [@campello2015hierarchical].\n- __FOSC:__ Framework for optimal selection of clusters for unsupervised and semisupervised clustering of hierarchical cluster tree [@campello2013density].\n- __OPTICS/OPTICSXi:__ Ordering points to identify the clustering structure and cluster extraction methods\n  [@ankerst1999optics].\n\n__Outlier Detection__\n\n- __LOF:__ Local outlier factor algorithm [@breunig2000lof]. \n- __GLOSH:__ Global-Local Outlier Score from Hierarchies algorithm [@campello2015hierarchical]. 
\n\n__Cluster Evaluation__\n\n- __DBCV:__ Density-based clustering validation [@moulavi2014].\n\n__Fast Nearest-Neighbor Search (using kd-trees)__\n\n- __kNN search__\n- __Fixed-radius NN search__\n\n\nThe implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search, and are\nfor Euclidean distance typically faster than the native R implementations (e.g., dbscan in package `fpc`), or the \nimplementations in [WEKA](https://ml.cms.waikato.ac.nz/weka/), [ELKI](https://elki-project.github.io/) and [Python's scikit-learn](https://scikit-learn.org/).\n\n```{r echo=FALSE, results = 'asis'}\npkg_usage(pkg)\npkg_citation(pkg, 2)\npkg_install(pkg)\n```\n\n## Usage\n\nLoad the package and use the numeric variables in the iris dataset\n```{r}\nlibrary(\"dbscan\")\n\ndata(\"iris\")\nx <- as.matrix(iris[, 1:4])\n```\n\nDBSCAN\n```{r}\ndb <- dbscan(x, eps = .42, minPts = 5)\ndb\n```\n\nVisualize the resulting clustering (noise points are shown in black).\n```{r dbscan}\npairs(x, col = db$cluster + 1L)\n```\n\n\nOPTICS\n```{r}\nopt <- optics(x, eps = 1, minPts = 4)\nopt\n```\n\nExtract DBSCAN-like clustering from OPTICS \nand create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored)\n```{r OPTICS_extractDBSCAN, fig.height=3}\nopt <- extractDBSCAN(opt, eps_cl = .4)\nplot(opt)\n```\n\nHDBSCAN\n\n```{r}\nhdb <- hdbscan(x, minPts = 4)\nhdb\n```\n\nVisualize the hierarchical clustering as a simplified tree. 
HDBSCAN finds 2 stable clusters.\n\n```{r hdbscan, fig.height=4}\nplot(hdb, show_flat = TRUE)\n```\n\n## Using dbscan with tidyverse\n\n`dbscan` provides for all clustering algorithms `tidy()`, `augment()`, and `glance()` so they can\nbe easily used with tidyverse, ggplot2 and [tidymodels](https://www.tidymodels.org/learn/statistics/k-means/).\n\n```{r tidyverse, message=FALSE, warning=FALSE}\nlibrary(tidyverse)\ndb <- x %>% dbscan(eps = .42, minPts = 5)\n```\n\nGet cluster statistics as a tibble\n\n```{r tidyverse2}\ntidy(db)\n```\n\nVisualize the clustering with ggplot2 (use an x for noise points)\n```{r tidyverse3}\naugment(db, x) %>% \n  ggplot(aes(x = Petal.Length, y = Petal.Width)) +\n    geom_point(aes(color = .cluster, shape = noise)) +\n    scale_shape_manual(values=c(19, 4))\n\n```\n\n\n\n\n## Using dbscan from Python\nR, the R package `dbscan`, and the Python package `rpy2` need to be installed.\n\n```{python, eval = FALSE, python.reticulate = FALSE}\nimport pandas as pd\nimport numpy as np\n\n### prepare data\niris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', \n                   header = None, \n                   names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species'])\niris_numeric = iris[['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']]\n\n# get R dbscan package\nfrom rpy2.robjects import packages\ndbscan = packages.importr('dbscan')\n\n# enable automatic conversion of pandas dataframes to R dataframes\nfrom rpy2.robjects import pandas2ri\npandas2ri.activate()\n\ndb = dbscan.dbscan(iris_numeric, eps = 0.5, MinPts = 5)\nprint(db)\n```\n\n```\n## DBSCAN clustering for 150 objects.\n## Parameters: eps = 0.5, minPts = 5\n## Using euclidean distances and borderpoints = TRUE\n## The clustering contains 2 cluster(s) and 17 noise points.\n## \n##  0  1  2 \n## 17 49 84 \n## \n## Available fields: cluster, eps, minPts, dist, borderPoints\n```\n\n```{python, eval = FALSE, 
python.reticulate = FALSE}\n# get the cluster assignment vector\nlabels = np.array(db.rx('cluster'))\nlabels\n```\n\n```\n## array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n##         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,\n##         1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 2, 2, 2,\n##         2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0,\n##         2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 0, 0,\n##         2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0,\n##         2, 2, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]],\n##       dtype=int32)\n```\n\n## License \nThe dbscan package is licensed under the [GNU General Public License (GPL) Version 3](https://www.gnu.org/licenses/gpl-3.0.en.html). The __OPTICSXi__ R implementation was directly ported from the ELKI framework's Java implementation (GNU AGPLv3), with permission by the original author, Erich Schubert.  \n\n## Changes\n* List of changes from [NEWS.md](https://github.com/mhahsler/dbscan/blob/master/NEWS.md)\n\n## References\n\n"
  },
  {
    "path": "README.md",
    "content": "\n# <img src=\"man/figures/logo.svg\" align=\"right\" height=\"139\" /> R package dbscan - Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms\n\n[![Package on\nCRAN](https://www.r-pkg.org/badges/version/dbscan)](https://CRAN.R-project.org/package=dbscan)\n[![CRAN RStudio mirror\ndownloads](https://cranlogs.r-pkg.org/badges/dbscan)](https://CRAN.R-project.org/package=dbscan)\n![License](https://img.shields.io/cran/l/dbscan)\n[![Anaconda.org](https://anaconda.org/conda-forge/r-dbscan/badges/version.svg)](https://anaconda.org/conda-forge/r-dbscan)\n[![r-universe\nstatus](https://mhahsler.r-universe.dev/badges/dbscan)](https://mhahsler.r-universe.dev/dbscan)\n[![StackOverflow](https://img.shields.io/badge/stackoverflow-dbscan%2br-orange.svg)](https://stackoverflow.com/questions/tagged/dbscan%2br)\n\n## Introduction\n\nThis R package ([Hahsler, Piekenbrock, and Doran\n2019](#ref-hahsler2019dbscan)) provides a fast C++ (re)implementation of\nseveral density-based algorithms with a focus on the DBSCAN family for\nclustering spatial data. The package includes:\n\n**Clustering**\n\n- **DBSCAN:** Density-based spatial clustering of applications with\n  noise ([Ester et al. 1996](#ref-ester1996density)).\n- **Jarvis-Patrick Clustering**: Clustering using a similarity measure\n  based on shared near neighbors ([Jarvis and Patrick\n  1973](#ref-jarvis1973)).\n- **SNN Clustering**: Shared nearest neighbor clustering ([Ertöz,\n  Steinbach, and Kumar 2003](#ref-erdoz2003)).\n- **HDBSCAN:** Hierarchical DBSCAN with simplified hierarchy extraction\n  ([Campello et al. 
2015](#ref-campello2015hierarchical)).\n- **FOSC:** Framework for optimal selection of clusters for unsupervised\n  and semisupervised clustering of hierarchical cluster tree ([Campello,\n  Moulavi, and Sander 2013](#ref-campello2013density)).\n- **OPTICS/OPTICSXi:** Ordering points to identify the clustering\n  structure and cluster extraction methods ([Ankerst et al.\n  1999](#ref-ankerst1999optics)).\n\n**Outlier Detection**\n\n- **LOF:** Local outlier factor algorithm ([Breunig et al.\n  2000](#ref-breunig2000lof)).\n- **GLOSH:** Global-Local Outlier Score from Hierarchies algorithm\n  ([Campello et al. 2015](#ref-campello2015hierarchical)).\n\n**Cluster Evaluation**\n\n- **DBCV:** Density-based clustering validation ([Moulavi et al.\n  2014](#ref-moulavi2014)).\n\n**Fast Nearest-Neighbor Search (using kd-trees)**\n\n- **kNN search**\n- **Fixed-radius NN search**\n\nThe implementations use the kd-tree data structure (from library ANN)\nfor faster k-nearest neighbor search, and are for Euclidean distance\ntypically faster than the native R implementations (e.g., dbscan in\npackage `fpc`), or the implementations in\n[WEKA](https://ml.cms.waikato.ac.nz/weka/),\n[ELKI](https://elki-project.github.io/) and [Python’s\nscikit-learn](https://scikit-learn.org/).\n\nThe following R packages use 
`dbscan`:\n[AnimalSequences](https://CRAN.R-project.org/package=AnimalSequences),\n[bioregion](https://CRAN.R-project.org/package=bioregion),\n[clayringsmiletus](https://CRAN.R-project.org/package=clayringsmiletus),\n[CLONETv2](https://CRAN.R-project.org/package=CLONETv2),\n[clusterWebApp](https://CRAN.R-project.org/package=clusterWebApp),\n[cordillera](https://CRAN.R-project.org/package=cordillera),\n[CPC](https://CRAN.R-project.org/package=CPC),\n[crosshap](https://CRAN.R-project.org/package=crosshap),\n[crownsegmentr](https://CRAN.R-project.org/package=crownsegmentr),\n[CspStandSegmentation](https://CRAN.R-project.org/package=CspStandSegmentation),\n[daltoolbox](https://CRAN.R-project.org/package=daltoolbox),\n[DataSimilarity](https://CRAN.R-project.org/package=DataSimilarity),\n[diceR](https://CRAN.R-project.org/package=diceR),\n[dobin](https://CRAN.R-project.org/package=dobin),\n[doc2vec](https://CRAN.R-project.org/package=doc2vec),\n[dPCP](https://CRAN.R-project.org/package=dPCP),\n[emcAdr](https://CRAN.R-project.org/package=emcAdr),\n[eventstream](https://CRAN.R-project.org/package=eventstream),\n[evprof](https://CRAN.R-project.org/package=evprof),\n[fastml](https://CRAN.R-project.org/package=fastml),\n[FCPS](https://CRAN.R-project.org/package=FCPS),\n[flowcluster](https://CRAN.R-project.org/package=flowcluster),\n[funtimes](https://CRAN.R-project.org/package=funtimes),\n[FuzzyDBScan](https://CRAN.R-project.org/package=FuzzyDBScan),\n[HaploVar](https://CRAN.R-project.org/package=HaploVar),\n[immunaut](https://CRAN.R-project.org/package=immunaut),\n[karyotapR](https://CRAN.R-project.org/package=karyotapR),\n[ksharp](https://CRAN.R-project.org/package=ksharp),\n[LLMing](https://CRAN.R-project.org/package=LLMing),\n[LOMAR](https://CRAN.R-project.org/package=LOMAR),\n[maotai](https://CRAN.R-project.org/package=maotai),\n[MapperAlgo](https://CRAN.R-project.org/package=MapperAlgo),\n[metaCluster](https://CRAN.R-project.org/package=metaCluster),\n[metasnf](https://C
RAN.R-project.org/package=metasnf),\n[mlr3cluster](https://CRAN.R-project.org/package=mlr3cluster),\n[neuroim2](https://CRAN.R-project.org/package=neuroim2),\n[oclust](https://CRAN.R-project.org/package=oclust),\n[omicsTools](https://CRAN.R-project.org/package=omicsTools),\n[openSkies](https://CRAN.R-project.org/package=openSkies),\n[opticskxi](https://CRAN.R-project.org/package=opticskxi),\n[OTclust](https://CRAN.R-project.org/package=OTclust),\n[outlierensembles](https://CRAN.R-project.org/package=outlierensembles),\n[outlierMBC](https://CRAN.R-project.org/package=outlierMBC),\n[pagoda2](https://CRAN.R-project.org/package=pagoda2),\n[parameters](https://CRAN.R-project.org/package=parameters),\n[ParBayesianOptimization](https://CRAN.R-project.org/package=ParBayesianOptimization),\n[performance](https://CRAN.R-project.org/package=performance),\n[PiC](https://CRAN.R-project.org/package=PiC),\n[rcrisp](https://CRAN.R-project.org/package=rcrisp),\n[rMultiNet](https://CRAN.R-project.org/package=rMultiNet),\n[seriation](https://CRAN.R-project.org/package=seriation),\n[sfdep](https://CRAN.R-project.org/package=sfdep),\n[sfnetworks](https://CRAN.R-project.org/package=sfnetworks),\n[sharp](https://CRAN.R-project.org/package=sharp),\n[smotefamily](https://CRAN.R-project.org/package=smotefamily),\n[snap](https://CRAN.R-project.org/package=snap),\n[spdep](https://CRAN.R-project.org/package=spdep),\n[spNetwork](https://CRAN.R-project.org/package=spNetwork),\n[ssMRCD](https://CRAN.R-project.org/package=ssMRCD),\n[stream](https://CRAN.R-project.org/package=stream),\n[SuperCell](https://CRAN.R-project.org/package=SuperCell),\n[synr](https://CRAN.R-project.org/package=synr),\n[tidySEM](https://CRAN.R-project.org/package=tidySEM),\n[VBphenoR](https://CRAN.R-project.org/package=VBphenoR),\n[VIProDesign](https://CRAN.R-project.org/package=VIProDesign),\n[weird](https://CRAN.R-project.org/package=weird)\n\nTo cite package ‘dbscan’ in publications use:\n\n> Hahsler M, Piekenbrock M, 
Doran D (2019). “dbscan: Fast Density-Based\n> Clustering with R.” *Journal of Statistical Software*, *91*(1), 1-30.\n> <doi:10.18637/jss.v091.i01> <https://doi.org/10.18637/jss.v091.i01>.\n\n    @Article{,\n      title = {{dbscan}: Fast Density-Based Clustering with {R}},\n      author = {Michael Hahsler and Matthew Piekenbrock and Derek Doran},\n      journal = {Journal of Statistical Software},\n      year = {2019},\n      volume = {91},\n      number = {1},\n      pages = {1--30},\n      doi = {10.18637/jss.v091.i01},\n    }\n\n## Installation\n\n**Stable CRAN version:** Install from within R with\n\n``` r\ninstall.packages(\"dbscan\")\n```\n\n**Current development version:** Install from\n[r-universe.](https://mhahsler.r-universe.dev/dbscan)\n\n``` r\ninstall.packages(\"dbscan\",\n    repos = c(\"https://mhahsler.r-universe.dev\",\n              \"https://cloud.r-project.org/\"))\n```\n\n## Usage\n\nLoad the package and use the numeric variables in the iris dataset\n\n``` r\nlibrary(\"dbscan\")\n\ndata(\"iris\")\nx <- as.matrix(iris[, 1:4])\n```\n\nDBSCAN\n\n``` r\ndb <- dbscan(x, eps = 0.42, minPts = 5)\ndb\n```\n\n    ## DBSCAN clustering for 150 objects.\n    ## Parameters: eps = 0.42, minPts = 5\n    ## Using euclidean distances and borderpoints = TRUE\n    ## The clustering contains 3 cluster(s) and 29 noise points.\n    ## \n    ##  0  1  2  3 \n    ## 29 48 37 36 \n    ## \n    ## Available fields: cluster, eps, minPts, metric, borderPoints\n\nVisualize the resulting clustering (noise points are shown in black).\n\n``` r\npairs(x, col = db$cluster + 1L)\n```\n\n![](inst/README_files/dbscan-1.png)<!-- -->\n\nOPTICS\n\n``` r\nopt <- optics(x, eps = 1, minPts = 4)\nopt\n```\n\n    ## OPTICS ordering/clustering for 150 objects.\n    ## Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA\n    ## Available fields: order, reachdist, coredist, predecessor, minPts, eps,\n    ##                   eps_cl, xi\n\nExtract DBSCAN-like clustering from OPTICS and 
create a reachability\nplot (extracted DBSCAN clusters at eps_cl=.4 are colored)\n\n``` r\nopt <- extractDBSCAN(opt, eps_cl = 0.4)\nplot(opt)\n```\n\n![](inst/README_files/OPTICS_extractDBSCAN-1.png)<!-- -->\n\nHDBSCAN\n\n``` r\nhdb <- hdbscan(x, minPts = 4)\nhdb\n```\n\n    ## HDBSCAN clustering for 150 objects.\n    ## Parameters: minPts = 4\n    ## The clustering contains 2 cluster(s) and 0 noise points.\n    ## \n    ##   1   2 \n    ## 100  50 \n    ## \n    ## Available fields: cluster, minPts, coredist, cluster_scores,\n    ##                   membership_prob, outlier_scores, hc\n\nVisualize the hierarchical clustering as a simplified tree. HDBSCAN\nfinds 2 stable clusters.\n\n``` r\nplot(hdb, show_flat = TRUE)\n```\n\n![](inst/README_files/hdbscan-1.png)<!-- -->\n\n## Using dbscan with tidyverse\n\n`dbscan` provides for all clustering algorithms `tidy()`, `augment()`,\nand `glance()` so they can be easily used with tidyverse, ggplot2 and\n[tidymodels](https://www.tidymodels.org/learn/statistics/k-means/).\n\n``` r\nlibrary(tidyverse)\ndb <- x %>%\n    dbscan(eps = 0.42, minPts = 5)\n```\n\nGet cluster statistics as a tibble\n\n``` r\ntidy(db)\n```\n\n    ## # A tibble: 4 × 3\n    ##   cluster  size noise\n    ##   <fct>   <int> <lgl>\n    ## 1 0          29 TRUE \n    ## 2 1          48 FALSE\n    ## 3 2          37 FALSE\n    ## 4 3          36 FALSE\n\nVisualize the clustering with ggplot2 (use an x for noise points)\n\n``` r\naugment(db, x) %>%\n    ggplot(aes(x = Petal.Length, y = Petal.Width)) + geom_point(aes(color = .cluster,\n    shape = noise)) + scale_shape_manual(values = c(19, 4))\n```\n\n![](inst/README_files/tidyverse3-1.png)<!-- -->\n\n## Using dbscan from Python\n\nR, the R package `dbscan`, and the Python package `rpy2` need to be\ninstalled.\n\n``` python\nimport pandas as pd\nimport numpy as np\n\n### prepare data\niris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', \n                   header = 
None, \n                   names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species'])\niris_numeric = iris[['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']]\n\n# get R dbscan package\nfrom rpy2.robjects import packages\ndbscan = packages.importr('dbscan')\n\n# enable automatic conversion of pandas dataframes to R dataframes\nfrom rpy2.robjects import pandas2ri\npandas2ri.activate()\n\ndb = dbscan.dbscan(iris_numeric, eps = 0.5, minPts = 5)\nprint(db)\n```\n\n    ## DBSCAN clustering for 150 objects.\n    ## Parameters: eps = 0.5, minPts = 5\n    ## Using euclidean distances and borderpoints = TRUE\n    ## The clustering contains 2 cluster(s) and 17 noise points.\n    ## \n    ##  0  1  2 \n    ## 17 49 84 \n    ## \n    ## Available fields: cluster, eps, minPts, dist, borderPoints\n\n``` python\n# get the cluster assignment vector\nlabels = np.array(db.rx('cluster'))\nlabels\n```\n\n    ## array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n    ##         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,\n    ##         1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 2, 2, 2,\n    ##         2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0,\n    ##         2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 0, 0,\n    ##         2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0,\n    ##         2, 2, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]],\n    ##       dtype=int32)\n\n## License\n\nThe dbscan package is licensed under the [GNU General Public License\n(GPL) Version 3](https://www.gnu.org/licenses/gpl-3.0.en.html). 
The\n**OPTICSXi** R implementation was directly ported from the ELKI\nframework’s Java implementation (GNU AGPLv3), with permission from the\noriginal author, Erich Schubert.\n\n## Changes\n\n- List of changes from\n  [NEWS.md](https://github.com/mhahsler/dbscan/blob/master/NEWS.md)\n\n## References\n\n<div id=\"refs\" class=\"references csl-bib-body hanging-indent\"\nentry-spacing=\"0\">\n\n<div id=\"ref-ankerst1999optics\" class=\"csl-entry\">\n\nAnkerst, Mihael, Markus M Breunig, Hans-Peter Kriegel, and Jörg Sander.\n1999. “OPTICS: Ordering Points to Identify the Clustering Structure.” In\n*ACM Sigmod Record*, 28:49–60. 2. ACM.\n<https://doi.org/10.1145/304181.304187>.\n\n</div>\n\n<div id=\"ref-breunig2000lof\" class=\"csl-entry\">\n\nBreunig, Markus M, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander.\n2000. “LOF: Identifying Density-Based Local Outliers.” In *ACM Int.\nConf. On Management of Data*, 29:93–104. 2. ACM.\n<https://doi.org/10.1145/335191.335388>.\n\n</div>\n\n<div id=\"ref-campello2013density\" class=\"csl-entry\">\n\nCampello, Ricardo JGB, Davoud Moulavi, and Jörg Sander. 2013.\n“Density-Based Clustering Based on Hierarchical Density Estimates.” In\n*Pacific-Asia Conference on Knowledge Discovery and Data Mining*,\n160–72. Springer. <https://doi.org/10.1007/978-3-642-37456-2_14>.\n\n</div>\n\n<div id=\"ref-campello2015hierarchical\" class=\"csl-entry\">\n\nCampello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg Sander.\n2015. “Hierarchical Density Estimates for Data Clustering,\nVisualization, and Outlier Detection.” *ACM Transactions on Knowledge\nDiscovery from Data (TKDD)* 10 (1): 5.\n<https://doi.org/10.1145/2733381>.\n\n</div>\n\n<div id=\"ref-erdoz2003\" class=\"csl-entry\">\n\nErtöz, Levent, Michael Steinbach, and Vipin Kumar. 2003. 
“Finding\nClusters of Different Sizes, Shapes, and Densities in Noisy, High\nDimensional Data.” In *Proceedings of the 2003 SIAM International\nConference on Data Mining (SDM)*, 47–58.\n<https://doi.org/10.1137/1.9781611972733.5>.\n\n</div>\n\n<div id=\"ref-ester1996density\" class=\"csl-entry\">\n\nEster, Martin, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. 1996.\n“A Density-Based Algorithm for Discovering Clusters in Large Spatial\nDatabases with Noise.” In *Proceedings of 2nd International Conference\non Knowledge Discovery and Data Mining (KDD-96)*, 226–31.\n<https://dl.acm.org/doi/10.5555/3001460.3001507>.\n\n</div>\n\n<div id=\"ref-hahsler2019dbscan\" class=\"csl-entry\">\n\nHahsler, Michael, Matthew Piekenbrock, and Derek Doran. 2019.\n“<span class=\"nocase\">dbscan</span>: Fast Density-Based Clustering with\nR.” *Journal of Statistical Software* 91 (1): 1–30.\n<https://doi.org/10.18637/jss.v091.i01>.\n\n</div>\n\n<div id=\"ref-jarvis1973\" class=\"csl-entry\">\n\nJarvis, R. A., and E. A. Patrick. 1973. “Clustering Using a Similarity\nMeasure Based on Shared Near Neighbors.” *IEEE Transactions on\nComputers* C-22 (11): 1025–34.\n<https://doi.org/10.1109/T-C.1973.223640>.\n\n</div>\n\n<div id=\"ref-moulavi2014\" class=\"csl-entry\">\n\nMoulavi, Davoud, Pablo A. Jaskowiak, Ricardo J. G. B. Campello, Arthur\nZimek, and Jörg Sander. 2014. “Density-Based Clustering Validation.” In\n*Proceedings of the 2014 SIAM International Conference on Data Mining\n(SDM)*, 839–47. <https://doi.org/10.1137/1.9781611973440.96>.\n\n</div>\n\n</div>\n"
  },
  {
    "path": "data_src/data_DBCV/dataset_1.txt",
    "content": "-0.0014755 0.99852 1\n-0.005943 0.98904 1\n0.028184 1.0181 1\n0.019204 1.0041 1\n0.033017 1.0128 1\n0.011014 0.9857 1\n0.033779 1.0033 1\n0.045243 1.0096 1\n0.02493 0.98413 1\n0.064521 1.0185 1\n0.032742 0.98149 1\n0.042959 0.98645 1\n0.049146 0.98734 1\n0.05769 0.99058 1\n0.070368 0.99792 1\n0.070434 0.99262 1\n0.09811 1.0149 1\n0.078285 0.98967 1\n0.096586 1.0025 1\n0.10724 1.0077 1\n0.083108 0.9781 1\n0.088157 0.97763 1\n0.092311 0.97624 1\n0.10984 0.98821 1\n0.12512 0.99789 1\n0.13833 1.0055 1\n0.12534 0.98686 1\n0.13543 0.99127 1\n0.13098 0.98113 1\n0.14075 0.98519 1\n0.16177 1.0005 1\n0.13901 0.97193 1\n0.14619 0.97331 1\n0.14712 0.96842 1\n0.16767 0.98311 1\n0.19442 1.004 1\n0.16394 0.96761 1\n0.1977 0.99543 1\n0.19514 0.98692 1\n0.1946 0.9804 1\n0.19852 0.97831 1\n0.20655 0.98031 1\n0.20457 0.97227 1\n0.22232 0.98393 1\n0.23737 0.99287 1\n0.22462 0.97398 1\n0.23313 0.97632 1\n0.22676 0.96375 1\n0.246 0.97677 1\n0.26077 0.98529 1\n0.26161 0.97986 1\n0.23546 0.9474 1\n0.2654 0.97101 1\n0.24746 0.9467 1\n0.26646 0.95933 1\n0.29237 0.97882 1\n0.26142 0.94142 1\n0.29617 0.9697 1\n0.29783 0.96485 1\n0.27501 0.93551 1\n0.2995 0.95344 1\n0.29481 0.94216 1\n0.31401 0.95475 1\n0.32047 0.95457 1\n0.32755 0.95496 1\n0.31955 0.94027 1\n0.33585 0.94983 1\n0.33838 0.9456 1\n0.32029 0.92072 1\n0.32917 0.92278 1\n0.36377 0.95052 1\n0.34103 0.9209 1\n0.34455 0.9175 1\n0.36578 0.93179 1\n0.36666 0.92569 1\n0.38252 0.93455 1\n0.38847 0.93345 1\n0.40353 0.94145 1\n0.38628 0.91709 1\n0.39619 0.91987 1\n0.40831 0.92482 1\n0.42051 0.92983 1\n0.42992 0.932 1\n0.41207 0.90689 1\n0.41348 0.901 1\n0.41216 0.89236 1\n0.42511 0.89794 1\n0.44358 0.90901 1\n0.44485 0.90285 1\n0.43699 0.88752 1\n0.45736 0.90039 1\n0.44539 0.88088 1\n0.44175 0.86967 1\n0.45383 0.87414 1\n0.47455 0.88721 1\n0.46535 0.87033 1\n0.47352 0.87079 1\n0.48349 0.873 1\n0.48279 0.86452 1\n0.4897 0.86359 1\n0.4966 0.86263 1\n0.5235 0.88162 1\n0.51375 0.86392 1\n0.51293 0.85512 1\n0.51094 0.8451 
1\n0.53526 0.86136 1\n0.52601 0.84401 1\n0.52951 0.83937 1\n0.53659 0.83826 1\n0.54668 0.84011 1\n0.55938 0.84454 1\n0.57416 0.85101 1\n0.56963 0.83812 1\n0.58407 0.84416 1\n0.55567 0.80732 1\n0.56363 0.80678 1\n0.59075 0.82537 1\n0.60254 0.82858 1\n0.60324 0.82064 1\n0.58442 0.79315 1\n0.60202 0.80202 1\n0.60983 0.80106 1\n0.62846 0.81087 1\n0.63324 0.80676 1\n0.6081 0.77271 1\n0.6167 0.77233 1\n0.63294 0.77954 1\n0.63518 0.7727 1\n0.62163 0.75 1\n0.63385 0.75303 1\n0.64162 0.75155 1\n0.63656 0.73719 1\n0.66559 0.75686 1\n0.65921 0.74105 1\n0.67238 0.74474 1\n0.66346 0.72628 1\n0.69846 0.75167 1\n0.6876 0.73114 1\n0.69558 0.72939 1\n0.67529 0.6993 1\n0.69987 0.71402 1\n0.69774 0.70195 1\n0.71685 0.71106 1\n0.69996 0.68408 1\n0.70054 0.67451 1\n0.72628 0.69003 1\n0.72721 0.68066 1\n0.74228 0.68534 1\n0.75923 0.69184 1\n0.73849 0.66055 1\n0.74532 0.65676 1\n0.76487 0.66559 1\n0.76875 0.65868 1\n0.78436 0.66339 1\n0.76745 0.6355 1\n0.77718 0.63414 1\n0.79609 0.64187 1\n0.76966 0.60415 1\n0.77308 0.59619 1\n0.80871 0.62031 1\n0.79292 0.59292 1\n0.80364 0.59192 1\n0.81851 0.59494 1\n0.81311 0.57757 1\n0.81251 0.56488 1\n0.80587 0.546 1\n0.81022 0.53799 1\n0.81768 0.53293 1\n0.82183 0.52442 1\n0.84505 0.53482 1\n0.83976 0.51654 1\n0.8495 0.51313 1\n0.87442 0.52471 1\n0.88198 0.51875 1\n0.88773 0.51078 1\n0.85546 0.46458 1\n0.89719 0.49216 1\n0.87605 0.45664 1\n0.87376 0.43972 1\n0.90767 0.45874 1\n0.90549 0.44138 1\n0.90785 0.42826 1\n0.89003 0.39464 1\n0.90694 0.3954 1\n0.93518 0.4071 1\n0.92258 0.37754 1\n0.91917 0.35673 1\n0.94426 0.36391 1\n0.92657 0.32774 1\n0.94763 0.3297 1\n0.95621 0.31846 1\n0.93664 0.27824 1\n0.94663 0.26663 1\n0.9509 0.24815 1\n0.97853 0.25164 1\n0.98948 0.23668 1\n0.97915 0.19814 1\n0.98452 0.17207 1\n0.99067 0.14174 1\n0.9892 0.094075 1\n0.98787 -0.012127 1\n0.0014755 -0.99852 2\n0.005943 -0.98904 2\n-0.028184 -1.0181 2\n-0.019204 -1.0041 2\n-0.033017 -1.0128 2\n-0.011014 -0.9857 2\n-0.033779 -1.0033 2\n-0.045243 -1.0096 2\n-0.02493 -0.98413 
2\n-0.064521 -1.0185 2\n-0.032742 -0.98149 2\n-0.042959 -0.98645 2\n-0.049146 -0.98734 2\n-0.05769 -0.99058 2\n-0.070368 -0.99792 2\n-0.070434 -0.99262 2\n-0.09811 -1.0149 2\n-0.078285 -0.98967 2\n-0.096586 -1.0025 2\n-0.10724 -1.0077 2\n-0.083108 -0.9781 2\n-0.088157 -0.97763 2\n-0.092311 -0.97624 2\n-0.10984 -0.98821 2\n-0.12512 -0.99789 2\n-0.13833 -1.0055 2\n-0.12534 -0.98686 2\n-0.13543 -0.99127 2\n-0.13098 -0.98113 2\n-0.14075 -0.98519 2\n-0.16177 -1.0005 2\n-0.13901 -0.97193 2\n-0.14619 -0.97331 2\n-0.14712 -0.96842 2\n-0.16767 -0.98311 2\n-0.19442 -1.004 2\n-0.16394 -0.96761 2\n-0.1977 -0.99543 2\n-0.19514 -0.98692 2\n-0.1946 -0.9804 2\n-0.19852 -0.97831 2\n-0.20655 -0.98031 2\n-0.20457 -0.97227 2\n-0.22232 -0.98393 2\n-0.23737 -0.99287 2\n-0.22462 -0.97398 2\n-0.23313 -0.97632 2\n-0.22676 -0.96375 2\n-0.246 -0.97677 2\n-0.26077 -0.98529 2\n-0.26161 -0.97986 2\n-0.23546 -0.9474 2\n-0.2654 -0.97101 2\n-0.24746 -0.9467 2\n-0.26646 -0.95933 2\n-0.29237 -0.97882 2\n-0.26142 -0.94142 2\n-0.29617 -0.9697 2\n-0.29783 -0.96485 2\n-0.27501 -0.93551 2\n-0.2995 -0.95344 2\n-0.29481 -0.94216 2\n-0.31401 -0.95475 2\n-0.32047 -0.95457 2\n-0.32755 -0.95496 2\n-0.31955 -0.94027 2\n-0.33585 -0.94983 2\n-0.33838 -0.9456 2\n-0.32029 -0.92072 2\n-0.32917 -0.92278 2\n-0.36377 -0.95052 2\n-0.34103 -0.9209 2\n-0.34455 -0.9175 2\n-0.36578 -0.93179 2\n-0.36666 -0.92569 2\n-0.38252 -0.93455 2\n-0.38847 -0.93345 2\n-0.40353 -0.94145 2\n-0.38628 -0.91709 2\n-0.39619 -0.91987 2\n-0.40831 -0.92482 2\n-0.42051 -0.92983 2\n-0.42992 -0.932 2\n-0.41207 -0.90689 2\n-0.41348 -0.901 2\n-0.41216 -0.89236 2\n-0.42511 -0.89794 2\n-0.44358 -0.90901 2\n-0.44485 -0.90285 2\n-0.43699 -0.88752 2\n-0.45736 -0.90039 2\n-0.44539 -0.88088 2\n-0.44175 -0.86967 2\n-0.45383 -0.87414 2\n-0.47455 -0.88721 2\n-0.46535 -0.87033 2\n-0.47352 -0.87079 2\n-0.48349 -0.873 2\n-0.48279 -0.86452 2\n-0.4897 -0.86359 2\n-0.4966 -0.86263 2\n-0.5235 -0.88162 2\n-0.51375 -0.86392 2\n-0.51293 -0.85512 2\n-0.51094 -0.8451 
2\n-0.53526 -0.86136 2\n-0.52601 -0.84401 2\n-0.52951 -0.83937 2\n-0.53659 -0.83826 2\n-0.54668 -0.84011 2\n-0.55938 -0.84454 2\n-0.57416 -0.85101 2\n-0.56963 -0.83812 2\n-0.58407 -0.84416 2\n-0.55567 -0.80732 2\n-0.56363 -0.80678 2\n-0.59075 -0.82537 2\n-0.60254 -0.82858 2\n-0.60324 -0.82064 2\n-0.58442 -0.79315 2\n-0.60202 -0.80202 2\n-0.60983 -0.80106 2\n-0.62846 -0.81087 2\n-0.63324 -0.80676 2\n-0.6081 -0.77271 2\n-0.6167 -0.77233 2\n-0.63294 -0.77954 2\n-0.63518 -0.7727 2\n-0.62163 -0.75 2\n-0.63385 -0.75303 2\n-0.64162 -0.75155 2\n-0.63656 -0.73719 2\n-0.66559 -0.75686 2\n-0.65921 -0.74105 2\n-0.67238 -0.74474 2\n-0.66346 -0.72628 2\n-0.69846 -0.75167 2\n-0.6876 -0.73114 2\n-0.69558 -0.72939 2\n-0.67529 -0.6993 2\n-0.69987 -0.71402 2\n-0.69774 -0.70195 2\n-0.71685 -0.71106 2\n-0.69996 -0.68408 2\n-0.70054 -0.67451 2\n-0.72628 -0.69003 2\n-0.72721 -0.68066 2\n-0.74228 -0.68534 2\n-0.75923 -0.69184 2\n-0.73849 -0.66055 2\n-0.74532 -0.65676 2\n-0.76487 -0.66559 2\n-0.76875 -0.65868 2\n-0.78436 -0.66339 2\n-0.76745 -0.6355 2\n-0.77718 -0.63414 2\n-0.79609 -0.64187 2\n-0.76966 -0.60415 2\n-0.77308 -0.59619 2\n-0.80871 -0.62031 2\n-0.79292 -0.59292 2\n-0.80364 -0.59192 2\n-0.81851 -0.59494 2\n-0.81311 -0.57757 2\n-0.81251 -0.56488 2\n-0.80587 -0.546 2\n-0.81022 -0.53799 2\n-0.81768 -0.53293 2\n-0.82183 -0.52442 2\n-0.84505 -0.53482 2\n-0.83976 -0.51654 2\n-0.8495 -0.51313 2\n-0.87442 -0.52471 2\n-0.88198 -0.51875 2\n-0.88773 -0.51078 2\n-0.85546 -0.46458 2\n-0.89719 -0.49216 2\n-0.87605 -0.45664 2\n-0.87376 -0.43972 2\n-0.90767 -0.45874 2\n-0.90549 -0.44138 2\n-0.90785 -0.42826 2\n-0.89003 -0.39464 2\n-0.90694 -0.3954 2\n-0.93518 -0.4071 2\n-0.92258 -0.37754 2\n-0.91917 -0.35673 2\n-0.94426 -0.36391 2\n-0.92657 -0.32774 2\n-0.94763 -0.3297 2\n-0.95621 -0.31846 2\n-0.93664 -0.27824 2\n-0.94663 -0.26663 2\n-0.9509 -0.24815 2\n-0.97853 -0.25164 2\n-0.98948 -0.23668 2\n-0.97915 -0.19814 2\n-0.98452 -0.17207 2\n-0.99067 -0.14174 2\n-0.9892 -0.094075 2\n-0.98787 0.012127 
2\n-0.0029509 1.997 3\n-0.011886 1.9781 3\n0.056369 2.0363 3\n0.038408 2.0082 3\n0.066034 2.0256 3\n0.022028 1.9714 3\n0.067558 2.0067 3\n0.090485 2.0193 3\n0.04986 1.9683 3\n0.12904 2.037 3\n0.065484 1.963 3\n0.085919 1.9729 3\n0.098292 1.9747 3\n0.11538 1.9812 3\n0.14074 1.9958 3\n0.14087 1.9852 3\n0.19622 2.0298 3\n0.15657 1.9793 3\n0.19317 2.0051 3\n0.21449 2.0154 3\n0.16622 1.9562 3\n0.17631 1.9553 3\n0.18462 1.9525 3\n0.21968 1.9764 3\n0.25024 1.9958 3\n0.27666 2.011 3\n0.25069 1.9737 3\n0.27086 1.9825 3\n0.26197 1.9623 3\n0.28151 1.9704 3\n0.32354 2.0009 3\n0.27802 1.9439 3\n0.29239 1.9466 3\n0.29424 1.9368 3\n0.33533 1.9662 3\n0.38883 2.008 3\n0.32789 1.9352 3\n0.39539 1.9909 3\n0.39027 1.9738 3\n0.38919 1.9608 3\n0.39703 1.9566 3\n0.41309 1.9606 3\n0.40914 1.9445 3\n0.44464 1.9679 3\n0.47475 1.9857 3\n0.44924 1.948 3\n0.46626 1.9526 3\n0.45351 1.9275 3\n0.492 1.9535 3\n0.52153 1.9706 3\n0.52323 1.9597 3\n0.47091 1.8948 3\n0.5308 1.942 3\n0.49491 1.8934 3\n0.53293 1.9187 3\n0.58475 1.9576 3\n0.52284 1.8828 3\n0.59234 1.9394 3\n0.59565 1.9297 3\n0.55002 1.871 3\n0.599 1.9069 3\n0.58962 1.8843 3\n0.62803 1.9095 3\n0.64095 1.9091 3\n0.65509 1.9099 3\n0.63911 1.8805 3\n0.6717 1.8997 3\n0.67676 1.8912 3\n0.64059 1.8414 3\n0.65835 1.8456 3\n0.72754 1.901 3\n0.68206 1.8418 3\n0.68909 1.835 3\n0.73157 1.8636 3\n0.73332 1.8514 3\n0.76505 1.8691 3\n0.77693 1.8669 3\n0.80707 1.8829 3\n0.77256 1.8342 3\n0.79239 1.8397 3\n0.81662 1.8496 3\n0.84103 1.8597 3\n0.85983 1.864 3\n0.82415 1.8138 3\n0.82696 1.802 3\n0.82433 1.7847 3\n0.85022 1.7959 3\n0.88716 1.818 3\n0.88971 1.8057 3\n0.87398 1.775 3\n0.91472 1.8008 3\n0.89078 1.7618 3\n0.8835 1.7393 3\n0.90766 1.7483 3\n0.94909 1.7744 3\n0.9307 1.7407 3\n0.94704 1.7416 3\n0.96697 1.746 3\n0.96559 1.729 3\n0.9794 1.7272 3\n0.99321 1.7253 3\n1.047 1.7632 3\n1.0275 1.7278 3\n1.0259 1.7102 3\n1.0219 1.6902 3\n1.0705 1.7227 3\n1.052 1.688 3\n1.059 1.6787 3\n1.0732 1.6765 3\n1.0934 1.6802 3\n1.1188 1.6891 3\n1.1483 1.702 3\n1.1393 
1.6762 3\n1.1681 1.6883 3\n1.1113 1.6146 3\n1.1273 1.6136 3\n1.1815 1.6507 3\n1.2051 1.6572 3\n1.2065 1.6413 3\n1.1688 1.5863 3\n1.204 1.604 3\n1.2197 1.6021 3\n1.2569 1.6217 3\n1.2665 1.6135 3\n1.2162 1.5454 3\n1.2334 1.5447 3\n1.2659 1.5591 3\n1.2704 1.5454 3\n1.2433 1.5 3\n1.2677 1.5061 3\n1.2832 1.5031 3\n1.2731 1.4744 3\n1.3312 1.5137 3\n1.3184 1.4821 3\n1.3448 1.4895 3\n1.3269 1.4526 3\n1.3969 1.5033 3\n1.3752 1.4623 3\n1.3912 1.4588 3\n1.3506 1.3986 3\n1.3997 1.428 3\n1.3955 1.4039 3\n1.4337 1.4221 3\n1.3999 1.3682 3\n1.4011 1.349 3\n1.4526 1.3801 3\n1.4544 1.3613 3\n1.4846 1.3707 3\n1.5185 1.3837 3\n1.477 1.3211 3\n1.4906 1.3135 3\n1.5297 1.3312 3\n1.5375 1.3174 3\n1.5687 1.3268 3\n1.5349 1.271 3\n1.5544 1.2683 3\n1.5922 1.2837 3\n1.5393 1.2083 3\n1.5462 1.1924 3\n1.6174 1.2406 3\n1.5858 1.1858 3\n1.6073 1.1838 3\n1.637 1.1899 3\n1.6262 1.1551 3\n1.625 1.1298 3\n1.6117 1.092 3\n1.6204 1.076 3\n1.6354 1.0659 3\n1.6437 1.0488 3\n1.6901 1.0696 3\n1.6795 1.0331 3\n1.699 1.0263 3\n1.7488 1.0494 3\n1.764 1.0375 3\n1.7755 1.0216 3\n1.7109 0.92917 3\n1.7944 0.98432 3\n1.7521 0.91328 3\n1.7475 0.87945 3\n1.8153 0.91747 3\n1.811 0.88277 3\n1.8157 0.85652 3\n1.7801 0.78928 3\n1.8139 0.79079 3\n1.8704 0.8142 3\n1.8452 0.75509 3\n1.8383 0.71346 3\n1.8885 0.72782 3\n1.8531 0.65549 3\n1.8953 0.65939 3\n1.9124 0.63693 3\n1.8733 0.55649 3\n1.8933 0.53327 3\n1.9018 0.49629 3\n1.9571 0.50328 3\n1.979 0.47337 3\n1.9583 0.39629 3\n1.969 0.34415 3\n1.9813 0.28348 3\n1.9784 0.18815 3\n1.9757 -0.024254 3\n0.0029509 -1.997 4\n0.011886 -1.9781 4\n-0.056369 -2.0363 4\n-0.038408 -2.0082 4\n-0.066034 -2.0256 4\n-0.022028 -1.9714 4\n-0.067558 -2.0067 4\n-0.090485 -2.0193 4\n-0.04986 -1.9683 4\n-0.12904 -2.037 4\n-0.065484 -1.963 4\n-0.085919 -1.9729 4\n-0.098292 -1.9747 4\n-0.11538 -1.9812 4\n-0.14074 -1.9958 4\n-0.14087 -1.9852 4\n-0.19622 -2.0298 4\n-0.15657 -1.9793 4\n-0.19317 -2.0051 4\n-0.21449 -2.0154 4\n-0.16622 -1.9562 4\n-0.17631 -1.9553 4\n-0.18462 -1.9525 4\n-0.21968 -1.9764 
4\n-0.25024 -1.9958 4\n-0.27666 -2.011 4\n-0.25069 -1.9737 4\n-0.27086 -1.9825 4\n-0.26197 -1.9623 4\n-0.28151 -1.9704 4\n-0.32354 -2.0009 4\n-0.27802 -1.9439 4\n-0.29239 -1.9466 4\n-0.29424 -1.9368 4\n-0.33533 -1.9662 4\n-0.38883 -2.008 4\n-0.32789 -1.9352 4\n-0.39539 -1.9909 4\n-0.39027 -1.9738 4\n-0.38919 -1.9608 4\n-0.39703 -1.9566 4\n-0.41309 -1.9606 4\n-0.40914 -1.9445 4\n-0.44464 -1.9679 4\n-0.47475 -1.9857 4\n-0.44924 -1.948 4\n-0.46626 -1.9526 4\n-0.45351 -1.9275 4\n-0.492 -1.9535 4\n-0.52153 -1.9706 4\n-0.52323 -1.9597 4\n-0.47091 -1.8948 4\n-0.5308 -1.942 4\n-0.49491 -1.8934 4\n-0.53293 -1.9187 4\n-0.58475 -1.9576 4\n-0.52284 -1.8828 4\n-0.59234 -1.9394 4\n-0.59565 -1.9297 4\n-0.55002 -1.871 4\n-0.599 -1.9069 4\n-0.58962 -1.8843 4\n-0.62803 -1.9095 4\n-0.64095 -1.9091 4\n-0.65509 -1.9099 4\n-0.63911 -1.8805 4\n-0.6717 -1.8997 4\n-0.67676 -1.8912 4\n-0.64059 -1.8414 4\n-0.65835 -1.8456 4\n-0.72754 -1.901 4\n-0.68206 -1.8418 4\n-0.68909 -1.835 4\n-0.73157 -1.8636 4\n-0.73332 -1.8514 4\n-0.76505 -1.8691 4\n-0.77693 -1.8669 4\n-0.80707 -1.8829 4\n-0.77256 -1.8342 4\n-0.79239 -1.8397 4\n-0.81662 -1.8496 4\n-0.84103 -1.8597 4\n-0.85983 -1.864 4\n-0.82415 -1.8138 4\n-0.82696 -1.802 4\n-0.82433 -1.7847 4\n-0.85022 -1.7959 4\n-0.88716 -1.818 4\n-0.88971 -1.8057 4\n-0.87398 -1.775 4\n-0.91472 -1.8008 4\n-0.89078 -1.7618 4\n-0.8835 -1.7393 4\n-0.90766 -1.7483 4\n-0.94909 -1.7744 4\n-0.9307 -1.7407 4\n-0.94704 -1.7416 4\n-0.96697 -1.746 4\n-0.96559 -1.729 4\n-0.9794 -1.7272 4\n-0.99321 -1.7253 4\n-1.047 -1.7632 4\n-1.0275 -1.7278 4\n-1.0259 -1.7102 4\n-1.0219 -1.6902 4\n-1.0705 -1.7227 4\n-1.052 -1.688 4\n-1.059 -1.6787 4\n-1.0732 -1.6765 4\n-1.0934 -1.6802 4\n-1.1188 -1.6891 4\n-1.1483 -1.702 4\n-1.1393 -1.6762 4\n-1.1681 -1.6883 4\n-1.1113 -1.6146 4\n-1.1273 -1.6136 4\n-1.1815 -1.6507 4\n-1.2051 -1.6572 4\n-1.2065 -1.6413 4\n-1.1688 -1.5863 4\n-1.204 -1.604 4\n-1.2197 -1.6021 4\n-1.2569 -1.6217 4\n-1.2665 -1.6135 4\n-1.2162 -1.5454 4\n-1.2334 -1.5447 4\n-1.2659 
-1.5591 4\n-1.2704 -1.5454 4\n-1.2433 -1.5 4\n-1.2677 -1.5061 4\n-1.2832 -1.5031 4\n-1.2731 -1.4744 4\n-1.3312 -1.5137 4\n-1.3184 -1.4821 4\n-1.3448 -1.4895 4\n-1.3269 -1.4526 4\n-1.3969 -1.5033 4\n-1.3752 -1.4623 4\n-1.3912 -1.4588 4\n-1.3506 -1.3986 4\n-1.3997 -1.428 4\n-1.3955 -1.4039 4\n-1.4337 -1.4221 4\n-1.3999 -1.3682 4\n-1.4011 -1.349 4\n-1.4526 -1.3801 4\n-1.4544 -1.3613 4\n-1.4846 -1.3707 4\n-1.5185 -1.3837 4\n-1.477 -1.3211 4\n-1.4906 -1.3135 4\n-1.5297 -1.3312 4\n-1.5375 -1.3174 4\n-1.5687 -1.3268 4\n-1.5349 -1.271 4\n-1.5544 -1.2683 4\n-1.5922 -1.2837 4\n-1.5393 -1.2083 4\n-1.5462 -1.1924 4\n-1.6174 -1.2406 4\n-1.5858 -1.1858 4\n-1.6073 -1.1838 4\n-1.637 -1.1899 4\n-1.6262 -1.1551 4\n-1.625 -1.1298 4\n-1.6117 -1.092 4\n-1.6204 -1.076 4\n-1.6354 -1.0659 4\n-1.6437 -1.0488 4\n-1.6901 -1.0696 4\n-1.6795 -1.0331 4\n-1.699 -1.0263 4\n-1.7488 -1.0494 4\n-1.764 -1.0375 4\n-1.7755 -1.0216 4\n-1.7109 -0.92917 4\n-1.7944 -0.98432 4\n-1.7521 -0.91328 4\n-1.7475 -0.87945 4\n-1.8153 -0.91747 4\n-1.811 -0.88277 4\n-1.8157 -0.85652 4\n-1.7801 -0.78928 4\n-1.8139 -0.79079 4\n-1.8704 -0.8142 4\n-1.8452 -0.75509 4\n-1.8383 -0.71346 4\n-1.8885 -0.72782 4\n-1.8531 -0.65549 4\n-1.8953 -0.65939 4\n-1.9124 -0.63693 4\n-1.8733 -0.55649 4\n-1.8933 -0.53327 4\n-1.9018 -0.49629 4\n-1.9571 -0.50328 4\n-1.979 -0.47337 4\n-1.9583 -0.39629 4\n-1.969 -0.34415 4\n-1.9813 -0.28348 4\n-1.9784 -0.18815 4\n-1.9757 0.024254 4\n1.4303 -1.0155 -1\n-0.47685 -0.96563 -1\n0.84056 1.4012 -1\n0.093202 -0.41791 -1\n-0.54094 -1.6109 -1\n-0.25885 -1.2472 -1\n0.74337 -0.55785 -1\n-1.0824 1.5259 -1\n1.8981 0.40646 -1\n1.8849 -0.98545 -1\n-0.83407 -0.57677 -1\n-0.64022 1.5788 -1\n1.9672 1.6318 -1\n1.1451 -0.21204 -1\n1.1687 -0.9417 -1\n0.52452 0.21924 -1\n1.2342 -1.3084 -1\n-0.20569 1.4654 -1\n1.3101 -1.0919 -1\n-1.4794 -1.3521 -1\n0.052576 -1.9281 -1\n0.85565 -0.72342 -1\n-0.998 0.22474 -1\n0.12641 1.3221 -1\n-0.46676 1.2395 -1\n1.1958 -1.9376 -1\n0.67705 -0.52349 -1\n1.9134 -0.033122 -1\n1.7309 
-0.1383 -1\n0.30224 -1.8671 -1\n-1.6636 0.47667 -1\n-0.34148 0.31791 -1\n-1.2647 -0.81965 -1\n1.964 -0.2621 -1\n0.080782 -1.4804 -1\n1.5267 -0.81594 -1\n0.58746 1.0648 -1\n-0.13372 -1.8932 -1\n-1.6037 -0.93906 -1\n1.8538 2.0218 -1\n0.47595 -0.21614 -1\n-1.3631 -1.4146 -1\n-0.40273 1.5735 -1\n1.5157 -1.9092 -1\n0.1546 -1.5643 -1\n0.17307 -1.015 -1\n-0.22804 1.0579 -1\n-1.2532 1.6227 -1\n-0.9937 -1.1268 -1\n-0.85152 0.70602 -1\n0.11693 1.2987 -1\n0.23711 1.8289 -1\n-0.33624 1.525 -1\n1.6075 -0.43292 -1\n-0.77214 1.7802 -1\n0.59348 -0.25709 -1\n-0.83697 -1.3749 -1\n-0.96984 -0.77479 -1\n-0.56196 0.73784 -1\n1.2122 1.7683 -1\n0.15425 1.8227 -1\n0.35689 0.40366 -1\n-1.0654 1.8287 -1\n-1.5773 -0.39103 -1\n0.57317 -1.8698 -1\n1.9026 -0.83995 -1\n-1.5782 -1.9069 -1\n-1.2369 1.485 -1\n-1.9441 -0.27481 -1\n1.3406 -1.6589 -1\n-0.073933 -1.4756 -1\n-0.1247 -1.0512 -1\n1.6189 -1.1285 -1\n-0.32831 1.4982 -1\n0.1749 1.0763 -1\n0.78859 -0.63263 -1\n-1.6681 -0.46941 -1\n0.037311 0.38648 -1\n-0.051917 0.14308 -1\n1.4102 -0.67809 -1\n0.45334 1.445 -1\n-1.516 -0.95477 -1\n0.42349 1.7679 -1\n-1.3307 -0.44882 -1\n-0.40012 0.74581 -1\n0.12822 -0.91661 -1\n1.4868 -1.9231 -1\n0.63021 1.7951 -1\n1.1397 0.1384 -1\n-1.4819 0.69736 -1\n0.098963 0.4381 -1\n1.583 1.0221 -1\n-1.549 1.9609 -1\n0.53325 0.92753 -1\n-1.6609 1.4557 -1\n-0.35175 2.0038 -1\n0.84258 1.057 -1\n-1.5834 -1.442 -1\n1.2282 -0.70763 -1\n0.54608 -1.9197 -1\n1.5774 0.7926 -1\n0.48273 1.869 -1\n-0.33838 0.93314 -1\n0.58471 0.96454 -1\n-0.042523 -1.3256 -1\n-1.6098 -0.58906 -1\n0.54416 0.30412 -1\n1.7842 -0.16318 -1\n-0.093611 1.3596 -1\n0.40738 1.2851 -1\n0.36251 -0.71722 -1\n-1.0887 -0.1561 -1\n0.66743 0.70871 -1\n-1.3609 0.38795 -1\n1.0867 -1.4895 -1\n-1.1371 -1.9576 -1\n-1.3111 -1.5273 -1\n0.89457 -1.1274 -1\n-0.96612 -0.20721 -1\n-1.3363 0.14068 -1\n0.4984 1.9978 -1\n"
  },
  {
    "path": "data_src/data_DBCV/dataset_2.txt",
    "content": "191.67 388.02 1\n186.28 383.39 1\n182.22 397.99 1\n194.54 394.76 1\n183.43 393.87 1\n184.23 388.09 1\n192.33 389.85 1\n190.66 379.92 1\n195.57 391.06 1\n191.96 385.75 1\n199.7 389.03 1\n198.24 396.81 1\n193.82 392.53 1\n199.6 389 1\n183.64 380.64 1\n197.05 391.36 1\n184.78 385 1\n191.82 380.51 1\n195.54 391.6 1\n201.36 396.71 1\n191.57 382.8 1\n188.86 394.4 1\n193.18 394.9 1\n193.52 383.23 1\n190.54 390.52 1\n193.54 380.93 1\n190.13 385.28 1\n189.19 389.93 1\n196.69 396.46 1\n184.35 384.27 1\n187.22 388 1\n200.74 378.8 1\n186.19 394.45 1\n183.39 391.57 1\n191.16 391.98 1\n192.17 384.25 1\n191.2 381.73 1\n197.37 390.6 1\n187.18 396.36 1\n185.36 395.56 1\n185 378.42 1\n200.7 378.48 1\n189.43 380.73 1\n201.74 385.78 1\n191.41 393.4 1\n190.49 396.99 1\n183 384.57 1\n192.84 394.45 1\n188.91 393.37 1\n195.13 381.42 1\n192.55 389.33 1\n188.18 379.04 1\n194.63 390.13 1\n195.78 384.71 1\n194 384.32 1\n201.64 392.2 1\n189.08 384.43 1\n193.18 385.76 1\n186.7 380.63 1\n193.05 389.65 1\n192.87 391.51 1\n200.06 383.44 1\n187.65 389.69 1\n185.65 393.05 1\n194.62 385.77 1\n200.27 395.52 1\n190.14 380.87 1\n201.36 386.43 1\n197.19 380.52 1\n194.06 380.94 1\n190.65 381.92 1\n185.66 393.78 1\n192.36 383.31 1\n195.77 390.8 1\n186.62 389.32 1\n188.21 390.11 1\n192.65 396.48 1\n195.48 390.87 1\n200.97 385.22 1\n184.95 393.58 1\n197.78 386.29 1\n186.87 380.27 1\n189.26 386.77 1\n190.07 379.13 1\n200.59 382.33 1\n188.67 396.58 1\n200.17 395.49 1\n201.76 385.96 1\n192.04 397.27 1\n192.75 383.17 1\n187.46 382.41 1\n340.41 481.39 2\n340.1 495.16 2\n344.78 481.21 2\n331.12 496.46 2\n340.69 487.92 2\n335.3 482.55 2\n337.23 499.25 2\n342.68 494.51 2\n328.22 495.53 2\n339.51 486.51 2\n341.08 493.48 2\n328.75 495.01 2\n326.3 488.38 2\n327.81 498.99 2\n334.28 487.46 2\n326.35 492.18 2\n341.09 498.63 2\n338.57 499.69 2\n335.41 492.93 2\n332.2 493.39 2\n337.98 485.14 2\n336.41 483.31 2\n339.96 493.2 2\n343.33 486.77 2\n341.46 485.77 2\n330.26 493.31 2\n332.52 484.28 2\n326.43 499.75 
2\n328.52 499.14 2\n338.5 484.65 2\n344.84 482.29 2\n334.78 494.12 2\n326.04 485.07 2\n329.06 481.9 2\n331.54 494.11 2\n328.48 485.08 2\n337.45 499.97 2\n339.55 499.48 2\n337.43 495.51 2\n327.76 494.22 2\n330.44 492.63 2\n339.49 487.74 2\n336.16 482.82 2\n341.06 485.45 2\n339.48 488.7 2\n330.98 480.68 2\n331.57 484.42 2\n343.33 484.18 2\n328.31 488.15 2\n334.89 498.58 2\n342.32 497.03 2\n332.51 487.29 2\n326.03 491.16 2\n341.69 486.9 2\n338.1 492.02 2\n332.05 497.13 2\n339.75 485.09 2\n333.82 484.08 2\n329.32 499.44 2\n332.68 488.99 2\n327.92 487.37 2\n337.22 480.92 2\n336.15 488.91 2\n333.69 490.45 2\n326.04 486.61 2\n334.56 492.73 2\n333.88 489.53 2\n337.59 488.77 2\n340.21 492.08 2\n339.8 493.36 2\n329.76 497.27 2\n340.95 482.98 2\n338.42 484.3 2\n344.52 499.25 2\n327.57 499.96 2\n329.93 499.83 2\n335.57 480.24 2\n333.34 488.34 2\n337.02 493.53 2\n340.54 482.09 2\n325.4 482.11 2\n334.95 494.09 2\n336.14 495 2\n326.32 495.45 2\n332.61 484.52 2\n338.13 484.78 2\n336.18 494.65 2\n331.86 493.36 2\n332.1 496.01 2\n344.72 488.13 2\n294.98 518.49 3\n291.23 516.72 3\n288.05 515.34 3\n278.89 524.17 3\n275.93 521.61 3\n282.55 514.63 3\n281.36 518.21 3\n279.33 518.85 3\n283.84 525.99 3\n288.43 529.03 3\n291.04 528.4 3\n278.14 522.74 3\n281.71 515.52 3\n278.62 526.96 3\n291.96 525.89 3\n287.65 513.1 3\n286.88 512.23 3\n285.28 527.11 3\n280.91 526.84 3\n283.04 531.88 3\n285.71 523.52 3\n281.09 523.43 3\n294.47 517.23 3\n285.36 515.44 3\n280.82 507.61 3\n292.46 516.93 3\n288.01 519.53 3\n287.09 524.37 3\n289.28 514.53 3\n278.14 515.27 3\n280.88 506.35 3\n290.47 508.46 3\n286.48 501.99 3\n289.54 509.65 3\n284.03 505.59 3\n290.56 510.6 3\n283.11 500.77 3\n292.86 512.83 3\n280.09 510.88 3\n288.59 515.62 3\n293.92 504.93 3\n283.62 505.76 3\n289 500.88 3\n284.38 493.54 3\n281.44 496.12 3\n290.95 496.9 3\n293.05 488.36 3\n276.16 489.8 3\n278.67 505.62 3\n283.59 501.08 3\n286.26 492.08 3\n291.35 490.49 3\n288.35 487.82 3\n282.77 477.32 3\n283.58 480.83 3\n292.1 477.41 3\n294.59 
481.41 3\n285.3 479.59 3\n279.84 489.81 3\n293.43 491.57 3\n280.47 469.46 3\n279.58 471 3\n291.63 475.93 3\n291.74 468.47 3\n288.93 466.17 3\n276.82 482.18 3\n282.36 481.33 3\n288.45 471.95 3\n288.05 469.6 3\n276.47 479.88 3\n300.56 466.5 3\n298.47 471.81 3\n284.07 481.36 3\n287.38 464.73 3\n284.4 473.29 3\n287.97 480.09 3\n298.61 474.17 3\n289.85 469.67 3\n283.63 464.57 3\n298.97 464.74 3\n291.27 458.95 3\n294.18 463.21 3\n294.48 465.85 3\n289.87 468.79 3\n290.94 459.28 3\n296.98 458.08 3\n280.27 454.63 3\n285.93 466.91 3\n278.23 463.41 3\n282.53 454.21 3\n288.59 454.81 3\n276.72 457.2 3\n285.36 461.91 3\n277.37 457.95 3\n278.49 447.63 3\n293.35 448.16 3\n291.39 449.97 3\n294.06 448.98 3\n293.1 455.09 3\n287.31 453.59 3\n282.92 456.89 3\n285.34 460.96 3\n277.37 443.03 3\n285.76 448.16 3\n290.56 452.58 3\n292.64 460.18 3\n280.59 460.58 3\n277.92 446.65 3\n287.88 447.08 3\n286.07 446.54 3\n283.37 453.01 3\n285.11 438.68 3\n278.74 440.95 3\n283.6 443.65 3\n284.92 444.38 3\n287.73 453.63 3\n277.89 445.57 3\n289.52 436.22 3\n295.98 436.38 3\n287.3 436.81 3\n296.1 441.71 3\n292.47 447.83 3\n289.95 450.92 3\n297.99 443.04 3\n297 434.36 3\n296.57 445.3 3\n299.79 440.01 3\n299.96 442.62 3\n299.68 439.13 3\n296.39 436.12 3\n320.92 454.47 3\n310.97 444.94 3\n323.03 452.19 3\n309.97 447.02 3\n311.33 456.66 3\n320.74 452.42 3\n323.85 458.09 3\n305.53 448.24 3\n307.96 450.16 3\n318.1 460.06 3\n307.99 450.18 3\n306.09 447.03 3\n314.32 446.33 3\n310.71 454.38 3\n318.18 440.03 3\n317.17 448.15 3\n314.51 454.5 3\n314.28 444.67 3\n315.05 449.01 3\n310.99 457.54 3\n313.12 441.72 3\n309.27 445.33 3\n309.35 444.71 3\n311.14 443.3 3\n305.75 437.12 3\n309.01 455.41 3\n312.25 437.3 3\n305.43 442.71 3\n309.84 453.82 3\n305.52 444.09 3\n321.73 441.87 3\n314.2 439.02 3\n329.11 440.43 3\n316.15 455.43 3\n316.5 454.81 3\n314.86 452.27 3\n323.51 448.13 3\n324 439.58 3\n322.74 448.47 3\n322.93 447.08 3\n335.27 437.48 3\n338.28 451.23 3\n328.09 447.29 3\n322.51 449.93 3\n323.06 450.62 3\n331.6 
452.04 3\n334.17 449.54 3\n330.58 439.86 3\n327.51 450.65 3\n335.91 449.43 3\n343.39 443.55 3\n331.72 435.02 3\n336.7 447.63 3\n330.01 450.15 3\n328.63 448.64 3\n329.16 436.68 3\n327.53 440.84 3\n332.23 452.85 3\n330.02 447.25 3\n328.79 452.26 3\n340.74 442.92 3\n353.55 435.51 3\n345.18 445.63 3\n337.29 440.98 3\n343.28 435.96 3\n341.23 445.12 3\n355.15 435.15 3\n345.11 444.24 3\n339.09 437.76 3\n340.62 442.32 3\n351.08 437.81 3\n355.65 442.49 3\n357.24 446.59 3\n361.65 444.11 3\n345.92 445.21 3\n349.02 451.95 3\n348.38 438.95 3\n358.49 451.57 3\n345.9 441.06 3\n360.79 449.67 3\n358.19 447.11 3\n351.18 457.04 3\n355.48 453.5 3\n354.18 451.7 3\n351.94 452.31 3\n365.01 439.53 3\n369.88 441.77 3\n359.26 454.45 3\n355.58 455.95 3\n353.81 440.37 3\n361.36 454.94 3\n368.56 447.07 3\n375.49 443.25 3\n366.36 450.91 3\n363.16 455.29 3\n371.11 458.19 3\n372.98 448 3\n373.03 447.92 3\n374.39 457.3 3\n372.26 445.73 3\n377.06 450.13 3\n379.63 439.73 3\n372.1 441.27 3\n383.96 458.63 3\n371.14 456.3 3\n367.27 441.51 3\n381.39 453.48 3\n381.73 455 3\n366.89 451.22 3\n377.3 444.36 3\n380.11 448.47 3\n379.44 449.11 3\n376.58 457.72 3\n372.81 456.82 3\n382.33 462.54 3\n386.07 457.32 3\n378.29 448.74 3\n373.19 450.57 3\n370.29 444.36 3\n383.03 452.18 3\n368.76 464.2 3\n384.78 457.26 3\n383.27 468.2 3\n369.62 471.17 3\n372.26 465.96 3\n371.47 468.03 3\n380.38 467.98 3\n383.79 455.89 3\n385.78 457.04 3\n380.57 470.4 3\n382.35 480.17 3\n386.18 481.18 3\n378.57 471.25 3\n381.73 468.53 3\n376.54 466.42 3\n368.33 466.71 3\n372.74 474.28 3\n382.27 482.73 3\n384.97 466.96 3\n372.55 468.1 3\n378.96 486.39 3\n380.22 496.84 3\n375.67 493.78 3\n366.74 495.22 3\n367.36 484.87 3\n366.04 493.07 3\n366.34 488.55 3\n376.13 492.72 3\n374.27 494.8 3\n371.04 489.64 3\n373.75 473.85 3\n378.51 489.22 3\n385.28 490.17 3\n372.82 490.59 3\n372.1 476.8 3\n370.43 475.76 3\n384.99 474.93 3\n385.69 476.61 3\n381.26 479.32 3\n372.69 488.47 3\n381.36 492.11 3\n384.11 474.87 3\n368.05 475.11 3\n374.83 473.57 
3\n369.97 484.63 3\n371.07 475.87 3\n366.76 489.93 3\n384.18 482.18 3\n385.75 492.76 3\n368.73 488.29 3\n385.87 493.03 3\n377.38 499.81 3\n384.49 495.86 3\n372.36 495.33 3\n375.01 501.77 3\n375.62 488.1 3\n379.96 501.58 3\n370.54 498.8 3\n383.35 503.41 3\n371.08 490.36 3\n370.21 495.93 3\n373.48 514.8 3\n370.55 504.81 3\n370.71 506.46 3\n371.66 499.01 3\n377.41 502.92 3\n367.2 513.39 3\n377.92 514.13 3\n384.16 514.71 3\n385.25 505.71 3\n373.21 523.99 3\n377.33 518.61 3\n385.47 514.76 3\n375.76 521.69 3\n371.97 523.57 3\n372.11 523.87 3\n370.64 515.02 3\n366.48 514.77 3\n378.54 523.61 3\n383.2 513.48 3\n367.42 522.37 3\n385.56 520.12 3\n368.75 523.87 3\n384.73 533.84 3\n377.31 529.91 3\n376.84 529.64 3\n378.41 535.94 3\n369.63 528.13 3\n366.75 524.39 3\n366.24 522.79 3\n377.66 536.96 3\n378.88 541.38 3\n372.15 528.05 3\n373.61 537.04 3\n366.17 522.84 3\n383.47 527.33 3\n383.24 522.12 3\n367.52 527.08 3\n373.86 535.5 3\n382.26 535.66 3\n357.93 535.36 3\n365.68 532.88 3\n363.11 538.57 3\n350.31 534.83 3\n365.66 535.6 3\n359.09 526.82 3\n358.05 533.23 3\n367.55 529.39 3\n361.73 527.94 3\n349.54 536.64 3\n347.31 543.75 3\n346.07 536.35 3\n351.77 531.47 3\n353.46 529.62 3\n354.41 533.82 3\n361.57 541.99 3\n346.78 545.99 3\n344.67 536.04 3\n361.73 540.08 3\n355.75 544.66 3\n346.84 539.93 3\n343.98 538.04 3\n342.01 536.7 3\n335.88 525.71 3\n338 533.94 3\n338.97 526.9 3\n353.23 530.62 3\n338.4 540.83 3\n341.43 533.65 3\n336.62 535.57 3\n338.84 535.99 3\n336.84 529.85 3\n325.93 534.64 3\n329.93 528.94 3\n327.31 526.5 3\n342.67 535.84 3\n325.67 540.26 3\n335.96 529.47 3\n324.81 530.54 3\n323.57 531.35 3\n330.93 539.85 3\n325.2 527.89 3\n314.42 533.91 3\n317.52 532.12 3\n329.36 531.92 3\n318.32 542.56 3\n321.96 540.29 3\n322.88 530.85 3\n328.42 530.82 3\n323.48 524.62 3\n313.88 542.08 3\n319.01 525.49 3\n323.61 529.62 3\n320.88 535.79 3\n306.95 532.76 3\n315.62 541.98 3\n316.32 525.54 3\n307.04 539.87 3\n313.11 543.76 3\n317.78 533.33 3\n304.88 538.15 3\n310.86 537.3 3\n306.53 
527.56 3\n293.92 539.02 3\n295.26 525.31 3\n298.32 530.93 3\n307.76 535.17 3\n303.6 528.51 3\n295.49 540.66 3\n303.73 529.11 3\n302.05 532.02 3\n302.8 531.32 3\n295.21 533.72 3\n286.11 528.52 3\n296.5 531.68 3\n290.35 537.34 3\n302.04 536.13 3\n285.01 531.55 3\n292.5 541.61 3\n302.73 526.73 3\n286.09 543.6 3\n286.95 541.62 3\n288.73 525.62 3\n291.94 525.4 3\n284.91 535.21 3\n281.74 536.65 3\n282.97 536.73 3\n279.31 541.51 3\n282.08 529.34 3\n288.73 525.19 3\n306.37 533.73 3\n290.12 539.37 3\n294.4 534.15 3\n296.61 526.55 3\n306.91 536.5 3\n306 526.44 3\n291.47 542.59 3\n305.58 525.35 3\n297.59 544.89 3\n288.33 527.02 3\n305.29 540.79 3\n311.39 536.2 3\n310.13 526.2 3\n318.07 544.46 3\n309.02 529.11 3\n305.32 536.69 3\n317.93 532.3 3\n320.65 533.34 3\n319.57 542.11 3\n308.2 539.44 3\n333.5 529.38 3\n336.7 535.47 3\n318.03 528.88 3\n327.08 530.98 3\n329.83 531.53 3\n323.47 532.39 3\n328.83 531.74 3\n337.92 543.6 3\n329.95 528.19 3\n326.37 543.61 3\n338.88 534.78 3\n333.09 535.94 3\n342.69 538.79 3\n346.23 542.77 3\n334.57 526.75 3\n341.95 524.57 3\n333.56 541.51 3\n331.04 527.26 3\n345.95 537.33 3\n348.59 528.64 3\n352.97 539.86 3\n346.53 542.73 3\n343.8 542.63 3\n341.86 531.31 3\n340.08 539.25 3\n359.03 532.22 3\n340.99 531.07 3\n340.08 536.54 3\n359.73 533.9 3\n351.63 527.78 3\n372.06 524.72 3\n361.89 541.77 3\n359.29 542.9 3\n356.45 535.49 3\n369.13 525.67 3\n361.68 539.93 3\n364.28 532.1 3\n374.01 538.93 3\n370.34 535.18 3\n358.54 542.05 3\n372.08 528.04 3\n372.39 542.33 3\n370.01 537.32 3\n368.39 525.05 3\n369.99 533.21 3\n371.23 533.56 3\n382.96 526.72 3\n371.94 527.73 3\n368.82 527 3\n377.01 538.67 3\n380.64 509.76 3\n379.15 510.5 3\n381.52 516.9 3\n385.61 509.08 3\n385.72 511.69 3\n385.25 520.56 3\n372.08 521.03 3\n373.05 527.4 3\n375.92 515 3\n380.46 508.53 3\n381.17 509.89 3\n377.5 498.2 3\n378.08 515.71 3\n383.04 504.89 3\n374.27 498.43 3\n371.71 507.72 3\n382.35 500.94 3\n374.14 512.62 3\n372.42 498.97 3\n375.79 497.16 3\n383.88 449.14 3\n375.92 450.35 
3\n378.77 441.93 3\n369.78 441.17 3\n377.04 443.58 3\n382.24 446.89 3\n370.4 456.44 3\n371.94 447.36 3\n369.27 445.18 3\n386.51 454.03 3\n307.62 450.76 3\n293.17 456.52 3\n291.63 444.09 3\n303.18 444.98 3\n308.05 451.8 3\n298.22 448.06 3\n308.96 446.43 3\n306.11 459.63 3\n295.57 453.74 3\n293.5 453.79 3\n305.05 445.18 3\n294.13 455.98 3\n289.97 444.39 3\n296.05 451.41 3\n292.94 442.44 3\n293.6 442.95 3\n306.73 455.38 3\n302.1 441.24 3\n297.24 443.52 3\n305.96 459.49 3\n282.21 479.76 3\n295.87 491.26 3\n285.18 491.68 3\n292.34 478.29 3\n294.6 484.44 3\n295.86 490.94 3\n285.59 490.64 3\n277.5 488.55 3\n282.95 483.96 3\n294.89 478 3\n290.23 498.08 3\n294.37 506.82 3\n283.9 501.09 3\n292.28 502.85 3\n283.46 506.98 3\n293.4 499.57 3\n292.27 500.25 3\n283.43 492.35 3\n289.43 490.77 3\n281.36 509.62 3\n283.61 494.48 3\n278.76 498.29 3\n276.56 482 3\n279.43 485.63 3\n276.21 493.2 3\n279.88 482.74 3\n285 481.02 3\n284.56 487 3\n293.44 490.91 3\n291.58 485.65 3\n198.3 458.14 4\n203.89 469.93 4\n199.19 463.46 4\n198.64 454.27 4\n196.65 451.89 4\n199.64 464.95 4\n207.5 464.11 4\n188.17 464.08 4\n193.09 458.38 4\n203.07 451.7 4\n197.56 457.59 4\n202.17 466.1 4\n200.91 465.19 4\n196.98 459.22 4\n205.31 463.32 4\n195.67 469.62 4\n201.69 461.65 4\n191.84 464.67 4\n191 451.79 4\n200.77 466.85 4\n201.71 451.69 4\n192.59 451.93 4\n200.48 453.18 4\n194.89 447.26 4\n197.26 465.19 4\n200.44 466.05 4\n196.99 465.68 4\n203.72 452.95 4\n206.59 454.63 4\n207.58 464.62 4\n202.88 451.85 4\n204.78 446.27 4\n200.15 456.7 4\n207.99 446.47 4\n200.37 458.39 4\n201.7 446.48 4\n201.3 449.72 4\n200.37 440.41 4\n215.49 457.73 4\n214.85 448.79 4\n201.53 448.19 4\n211.18 454.93 4\n207.18 441.92 4\n212.44 448.85 4\n210.42 452.5 4\n210.99 452.69 4\n218.63 442.57 4\n204.34 451.16 4\n221.99 437.54 4\n218.96 448.85 4\n208.27 450.06 4\n212.25 447.75 4\n217.98 434.97 4\n221.75 431.21 4\n223.15 449.72 4\n219.95 445.2 4\n224.86 440.54 4\n220.52 430.74 4\n225.66 441.67 4\n212.64 445.5 4\n214.78 443.36 4\n218.02 
436 4\n218.18 444.59 4\n218.23 429.54 4\n216.33 431.04 4\n228.17 447.54 4\n214.33 428.31 4\n229.09 428.86 4\n227.48 436.96 4\n227.73 436.69 4\n230.46 435.62 4\n229.82 423.5 4\n234.03 434.96 4\n239.51 436.33 4\n225.41 430.7 4\n225.95 436.85 4\n227.65 432.75 4\n221.91 423.96 4\n227.68 424.42 4\n235.92 434.8 4\n235.65 420.72 4\n228.66 418.43 4\n235.06 419.85 4\n236.03 428.26 4\n229.29 427.44 4\n225.75 414.38 4\n239.79 416.05 4\n243.86 424.58 4\n232.34 418.07 4\n236.91 428.55 4\n241.79 427.96 4\n235.44 409.6 4\n235.16 416.5 4\n244.16 408.77 4\n231.48 423.2 4\n242.9 424.95 4\n246.4 423.32 4\n239.09 408.26 4\n247.89 409.23 4\n244.61 415.54 4\n245.38 410.68 4\n244.9 411.19 4\n236.65 408.73 4\n243.95 404.73 4\n254.25 407.38 4\n251.26 415.82 4\n247.76 399.39 4\n252.76 404.75 4\n240.98 403.86 4\n236.98 413.05 4\n240.26 404.66 4\n255.2 395.54 4\n258.91 404.23 4\n243.74 406.34 4\n252.82 397.57 4\n250.77 401.71 4\n247.93 399.11 4\n252.33 393.87 4\n255.98 391.3 4\n245.39 396.91 4\n246.85 408.06 4\n251.04 401.55 4\n258.05 396.09 4\n246.61 390.08 4\n245.69 393.88 4\n259.65 385 4\n260.94 396.9 4\n245.64 402.94 4\n244.92 403.08 4\n251.78 388.86 4\n243.08 398.84 4\n258.91 382.06 4\n248.48 385.45 4\n259.08 386.77 4\n250.36 382.85 4\n247.62 398.27 4\n261.63 384.05 4\n247.43 381.97 4\n250.61 395.19 4\n264.37 390.65 4\n258.36 391.3 4\n190.06 455.47 4\n195.48 451.46 4\n201.24 458.19 4\n198.89 458.24 4\n203.55 468.81 4\n199.34 466.97 4\n191.85 452.98 4\n203.38 455.55 4\n198.53 464.05 4\n203 455.49 4\n194.69 464.91 4\n186.61 453.88 4\n190.47 462.79 4\n195.8 462.98 4\n189.89 460.86 4\n190.97 452.25 4\n194.53 459.51 4\n183.97 458.15 4\n198.73 449.22 4\n186.47 447.6 4\n184.29 449.35 4\n176.73 455.3 4\n191.18 446.89 4\n179.84 452.04 4\n175.18 443.39 4\n181.11 456.55 4\n190.27 440.32 4\n190.94 453.05 4\n189.19 450.35 4\n179.1 452.01 4\n174.44 449.97 4\n172.03 434.57 4\n174.07 450.58 4\n175.13 449.2 4\n165.88 448.76 4\n168.29 432.29 4\n179.29 450.29 4\n172.98 449.92 4\n172.1 450.39 4\n167.58 
442.8 4\n174.78 439.86 4\n164.13 436.56 4\n176.13 446.54 4\n169.48 447.33 4\n178.52 433.01 4\n166.08 434.11 4\n162.53 429.1 4\n176.36 432.25 4\n168.91 448.69 4\n177.75 445.15 4\n164.68 432.66 4\n166.76 437.64 4\n164.07 435.11 4\n152.04 431.92 4\n168.08 423.64 4\n158.56 432.68 4\n161.75 426.83 4\n170.06 433.08 4\n154.79 423.99 4\n165.09 429.73 4\n154.66 435.48 4\n147.97 434.71 4\n147.19 430.58 4\n154.19 422.82 4\n151.84 435.21 4\n162.41 430.96 4\n153.94 425.48 4\n153.24 435.55 4\n149.07 419.97 4\n146.52 418.66 4\n148.73 424.08 4\n148.69 416 4\n138.89 425.37 4\n153.89 411.67 4\n152.34 417.64 4\n142.22 424.99 4\n153.99 412.53 4\n149.8 408.78 4\n147.58 417.33 4\n144.91 423.09 4\n148.67 411.39 4\n140.61 414.57 4\n134.42 406.89 4\n146.25 408.91 4\n139 418.24 4\n140.38 412.91 4\n139.69 400.23 4\n138.24 409.93 4\n137.67 402.47 4\n149.3 409.35 4\n126.59 412.45 4\n140.89 405.4 4\n129.15 412.94 4\n126.7 400.82 4\n140.11 404.86 4\n138.7 397.16 4\n125.15 403.94 4\n132.31 411.92 4\n138.65 413.69 4\n128.17 395.76 4\n132.45 392.75 4\n140.23 391 4\n135.5 390.57 4\n125.47 394.75 4\n138.57 407.52 4\n131.49 408.38 4\n123.88 393.71 4\n137.26 394.43 4\n126.34 401.15 4\n124.47 400.66 4\n122.16 404.27 4\n136.05 397.37 4\n134.26 394.87 4\n138.96 386.37 4\n136.19 387.27 4\n123.36 388.86 4\n138.88 390.61 4\n139.61 397.8 4\n128.32 386.19 4\n120.12 401.68 4\n123.52 388.45 4\n119.1 392.93 4\n133.36 385.84 4\n120.26 400.77 4\n134.65 392.68 4\n119.76 393.56 4\n124.64 386.31 4\n128.29 396.37 4\n120.41 393.71 4\n123.22 385.37 4\n145.77 393.51 4\n137.14 393.7 4\n138.18 393.07 4\n137.07 397.67 4\n140.52 389.69 4\n135.67 398.87 4\n128.85 408.99 4\n130.66 405.39 4\n134.79 398.03 4\n135.89 406.96 4\n150.18 418.17 4\n142.18 414.92 4\n140.63 411.45 4\n145.11 407.78 4\n147.82 411.9 4\n151 407.72 4\n150.83 415.74 4\n135.16 401.2 4\n136.89 414.08 4\n140.62 404.82 4\n150.2 408.49 4\n152.68 422.47 4\n151.35 412.86 4\n157.05 424.34 4\n148.68 426.44 4\n160.54 408.35 4\n149.52 417.35 4\n153.08 417 4\n155.37 412.9 
4\n159.1 408.44 4\n150.24 427.06 4\n152.83 419.78 4\n160.87 431.75 4\n158.89 428.23 4\n153.08 416.97 4\n167.93 434.65 4\n166.45 424.95 4\n163.38 433.96 4\n160.96 427.63 4\n161.1 433.9 4\n181.14 437.87 4\n176.46 438.86 4\n169.81 438.33 4\n182.45 430.82 4\n163.65 445.6 4\n181.41 431.22 4\n166.67 440.68 4\n178.08 432.8 4\n167.84 440.94 4\n169.44 436.48 4\n171.23 449.72 4\n182.82 444.9 4\n176.3 445.6 4\n188.05 441.99 4\n183.7 439.02 4\n175.06 445.93 4\n180.96 448.71 4\n183.01 442.18 4\n169.45 449.21 4\n187.35 437.74 4\n191.27 444.3 4\n182.97 438.2 4\n185.82 440.49 4\n189.89 441.88 4\n188.1 445.55 4\n182.45 448.23 4\n177.89 452.31 4\n193.3 455.14 4\n195.03 439.95 4\n189.35 439.2 4\n117.72 385.48 4\n122.63 377.76 4\n121.74 387.04 4\n124.2 375.18 4\n127.16 382.54 4\n127.69 382.12 4\n123.44 381.06 4\n121.13 376.28 4\n127.91 371.9 4\n133.42 381.22 4\n130.44 374.66 4\n136.7 380.41 4\n128.86 374.55 4\n136.66 367.58 4\n138.2 382.07 4\n127.34 375.42 4\n140.79 381.27 4\n125.45 380.05 4\n132.13 378.84 4\n131.31 376.54 4\n137.27 366.15 4\n133.56 370.78 4\n138.64 360.85 4\n138.61 361.91 4\n137.03 359.91 4\n142.13 359.84 4\n140.54 361.11 4\n139.26 374.2 4\n130.06 359.98 4\n147.35 368.34 4\n140.84 353.83 4\n134.76 366.96 4\n150.47 356.02 4\n144.82 367.66 4\n151.73 367 4\n139.22 371.1 4\n147.48 372.7 4\n141.74 356.31 4\n147.96 360.38 4\n139.26 357.93 4\n141.8 358.32 4\n156.21 348.31 4\n156.45 363.2 4\n156.08 352.66 4\n150.86 357.83 4\n152.37 350.2 4\n158.09 357.93 4\n156.27 360.5 4\n157.75 363.24 4\n155.08 362.48 4\n162.43 346.75 4\n163.35 349.73 4\n161.67 346.8 4\n148.99 356.4 4\n153.2 348.88 4\n159.99 354.49 4\n156.76 343.47 4\n152.85 362.09 4\n153.94 347.41 4\n154.56 353.59 4\n163.39 342.31 4\n165.92 338.05 4\n157 345.79 4\n172.24 340.97 4\n164.41 345.46 4\n157 348.08 4\n161.71 337.14 4\n154.67 344.41 4\n172.13 348.46 4\n163.47 341.69 4\n164.7 343.18 4\n172.45 337.05 4\n171.88 346.85 4\n163.73 335.13 4\n175.97 338.81 4\n157.38 343.17 4\n156.84 337.49 4\n166.79 351.48 4\n171 345.14 
4\n172.25 346.07 4\n176.58 341.91 4\n169.96 332.16 4\n178.97 325.11 4\n178.44 326.47 4\n169.4 336.15 4\n181.5 328.26 4\n171.77 343.04 4\n176.6 328.48 4\n175.76 340.22 4\n172.65 341.4 4\n174.81 327.5 4\n191.28 324.43 4\n190.37 340.29 4\n175.31 340.67 4\n186.26 340.09 4\n176.14 336.95 4\n184.61 340.22 4\n182.48 338.43 4\n190.87 322.33 4\n176.67 325.27 4\n178.32 337.64 4\n186.63 326.13 4\n176.43 333.61 4\n177.88 335.8 4\n191.76 332.31 4\n179.12 338.13 4\n185.95 329.39 4\n187.96 330.96 4\n175.63 337.27 4\n179.15 321.21 4\n193.52 328.24 4\n178.94 327.4 4\n191.32 338.3 4\n193.89 324.32 4\n179.57 336.73 4\n184.27 335.84 4\n183.85 330.84 4\n192.23 323.22 4\n193.01 325.36 4\n184.32 327.39 4\n200.13 343.46 4\n202.98 332.06 4\n198.42 329.82 4\n188.9 340.01 4\n200.39 344.17 4\n191.98 331.19 4\n187.45 334.11 4\n196.08 342.22 4\n192.31 342.2 4\n192.81 337.94 4\n190.76 349.94 4\n200.67 349.47 4\n208.32 341.83 4\n197.8 346.03 4\n208.77 350.61 4\n201.54 335.23 4\n193.3 346.37 4\n196.31 345.38 4\n202.7 337.96 4\n208.83 333.32 4\n195.91 344.7 4\n212.93 334.53 4\n207.48 342.27 4\n202.13 353 4\n203.85 341.3 4\n199.29 341.41 4\n212.12 341.43 4\n206.18 336.02 4\n200.38 340.09 4\n200.35 350.43 4\n206.29 341.36 4\n217.96 351.75 4\n222.43 337.16 4\n218.31 344.44 4\n211.09 350.99 4\n214.13 346.28 4\n208.15 339.44 4\n218.07 341.14 4\n213.75 349.12 4\n215.24 337.6 4\n224.5 350.74 4\n210.19 344.25 4\n209.7 357.27 4\n211.29 347.59 4\n220.44 348.32 4\n222.85 343.04 4\n219.92 351.36 4\n225.22 345.5 4\n225.96 340.91 4\n222.28 357.91 4\n220.36 363.04 4\n219.53 361.85 4\n226.05 346.9 4\n220.05 353.52 4\n228.9 362.98 4\n225.64 346.26 4\n228.64 353.13 4\n220.18 360 4\n223.64 360.37 4\n228.58 354.81 4\n228.59 367.66 4\n231.4 371.1 4\n242.12 371.48 4\n232.93 371 4\n231.8 363.3 4\n242.2 355.7 4\n228.65 358.03 4\n229.7 372.52 4\n232.95 355.08 4\n229.14 363.95 4\n234.46 370.7 4\n247.11 373.95 4\n243.17 358.13 4\n239.66 359.27 4\n232.77 365.49 4\n243.63 368.23 4\n241.06 373.55 4\n240.9 367.56 4\n248.27 
376.65 4\n237.77 360.71 4\n253.71 364.84 4\n243.26 379.69 4\n254.33 375.17 4\n245.46 373.74 4\n247.71 366.26 4\n240.4 366.04 4\n256.63 382.68 4\n247.55 372.71 4\n248.04 377.17 4\n240.53 363.84 4\n242.42 377.33 4\n257.53 369.08 4\n257.42 370.07 4\n251.96 382.08 4\n248.29 369.64 4\n259.34 385.78 4\n253.46 371.86 4\n255.27 373.52 4\n244.83 369.61 4\n248.63 379.58 4\n235.05 369.8 4\n237.22 372.49 4\n249.32 368.07 4\n242.86 374.1 4\n238.66 362.07 4\n250.89 375.81 4\n241.31 370.12 4\n237.49 362.68 4\n237.23 371.37 4\n246.65 360.96 4\n219.41 365.99 4\n223.52 360.2 4\n228.85 369.91 4\n217.11 361.84 4\n234.9 357.68 4\n222.46 363.06 4\n223.96 361.62 4\n230.89 367.73 4\n229.33 357.19 4\n230.89 369.42 4\n226.28 352.54 4\n213.51 353.54 4\n214.99 363.01 4\n226.78 361.09 4\n217.91 354.33 4\n214.09 357.95 4\n221.93 355.37 4\n229.22 349.4 4\n225.11 358.57 4\n211.71 354.16 4\n210.42 344.9 4\n213.16 343.07 4\n213.08 349.23 4\n206.17 350.93 4\n219.06 343.72 4\n217.43 348.01 4\n206.71 339.37 4\n212.88 345.11 4\n214.92 342.69 4\n210.66 343.11 4\n202.17 343.93 4\n190.27 339.93 4\n191.43 339.48 4\n201.45 328.21 4\n205.77 346.44 4\n206.99 330.85 4\n200.21 339.23 4\n201.98 341.81 4\n193.11 330.56 4\n195.36 338.81 4\n192.18 333.78 4\n178.05 337.99 4\n182.74 327.85 4\n187.46 341.07 4\n191.62 326.92 4\n189.26 333.86 4\n181.52 334.34 4\n177.42 337.54 4\n186.32 326.16 4\n179.18 323.71 4\n130.22 368.24 4\n134.92 370.21 4\n130.08 356.89 4\n137.76 375.17 4\n141.57 356.92 4\n134.31 367.94 4\n145.15 357 4\n148.45 362.05 4\n143.25 359.34 4\n132.38 365.07 4\n130.39 374.35 4\n138.76 368.03 4\n134.69 370.98 4\n132.68 357.75 4\n142.41 365.15 4\n145.76 374.53 4\n138.66 368.6 4\n137.79 363.31 4\n133.72 359.79 4\n142.8 367.33 4\n140.04 358.54 4\n149.51 355.02 4\n150.25 357.86 4\n155.36 345.6 4\n150.69 346.15 4\n155.44 347.88 4\n155.29 357.69 4\n153.12 342.46 4\n148.18 340.92 4\n155.79 349.6 4\n157.19 355.96 4\n153.69 348.89 4\n149.19 351.88 4\n151.05 351.77 4\n149.87 355.65 4\n163.68 354.24 4\n154.32 344.1 
4\n154.39 343.41 4\n156.59 343.95 4\n159.8 350.55 4\n163.18 353.59 4\n157.1 362.28 4\n149.59 356.35 4\n147.21 365.17 4\n144.89 346.11 4\n152.48 361.98 4\n157.29 360.26 4\n149.05 361.91 4\n146.33 358.3 4\n148.53 346.72 4\n139.25 349.31 4\n218.04 437.44 4\n220.29 450.98 4\n225.04 433.06 4\n214.25 434.21 4\n215.99 441.05 4\n208.72 449.19 4\n227.07 442.72 4\n221.21 433.19 4\n222.83 445.46 4\n218.46 438.48 4\n211.28 436.06 4\n217.16 434.23 4\n206.12 451.65 4\n212.12 445.7 4\n203.89 435.37 4\n210.29 452.63 4\n212.07 441.61 4\n215.74 439.71 4\n217.13 434.17 4\n204.02 441.62 4\n223.78 434.98 4\n218.42 437.18 4\n210.56 430.4 4\n217.65 425.46 4\n216.93 430.51 4\n222.84 442.86 4\n213.28 431.26 4\n210.09 434.33 4\n222.73 442.82 4\n214.36 441.69 4\n237.39 418.2 4\n229.47 429.84 4\n244.66 429.93 4\n239.66 429.15 4\n244.76 425.95 4\n237.55 426.22 4\n243.88 422.54 4\n240.95 421.33 4\n229.03 427.66 4\n229.86 420.24 4\n249.26 404.24 4\n251.17 415.68 4\n251.71 419.58 4\n252.39 420.42 4\n251.73 411.15 4\n244.97 401.75 4\n242.29 401.06 4\n238.15 419.37 4\n250.35 412.23 4\n244.49 419.99 4\n259.66 395.81 4\n261.1 407.93 4\n250.81 404.14 4\n254.44 408.97 4\n252.95 405.59 4\n262.43 394.1 4\n255.92 397.37 4\n261.36 395.05 4\n250.06 408.09 4\n262.14 392.31 4\n257.57 396.5 4\n270.06 403.05 4\n262.73 409.5 4\n267.01 408.7 4\n262.4 392.08 4\n267.11 395.43 4\n271.62 396.93 4\n262.14 393.18 4\n271.35 390.59 4\n257.85 406.07 4\n228.08 364.61 4\n231.64 362.28 4\n243.51 364.21 4\n226.72 357.82 4\n230.6 367.68 4\n240.6 362.96 4\n238.25 372.67 4\n229.44 360.23 4\n232.08 364.63 4\n232.92 361.7 4\n252.77 381.88 4\n247.62 392.28 4\n263.44 387.81 4\n253.68 382.19 4\n259.08 392.99 4\n264.67 393.6 4\n255.51 377.17 4\n262.49 379.26 4\n252.66 388 4\n247.5 388.68 4\n256.48 390.36 4\n256.89 377.67 4\n261.53 378.71 4\n255.17 386.52 4\n254.66 392.37 4\n258.6 389.34 4\n247.35 382.35 4\n265.2 376.06 4\n264.45 376.43 4\n250.61 380.42 4\n256.62 386.65 4\n251.43 395.22 4\n266.27 388.11 4\n258.57 391.79 4\n259.73 391 
4\n254.08 399.91 4\n253.78 392.9 4\n257.59 388.64 4\n261.87 397.35 4\n252.47 400.82 4\n264.4 398.63 4\n250 403.1 4\n259.46 396.45 4\n261.72 408.46 4\n252.46 393.86 4\n254.19 409.98 4\n258.04 397.77 4\n247.66 397.47 4\n266.59 403.65 4\n262.47 405.94 4\n247.05 409.1 4\n245.69 420.5 4\n234.86 407.95 4\n240.49 409.16 4\n238.89 408.82 4\n242.22 414 4\n247.82 414.76 4\n238.5 407.32 4\n243.72 422.43 4\n246.33 408.02 4\n231.04 421.65 4\n237.27 421.82 4\n226.69 419.44 4\n224.6 421.41 4\n239.16 426.92 4\n228.72 420.58 4\n227.71 426.17 4\n241.93 429.33 4\n239.63 419.52 4\n234.09 415.81 4\n227.93 434.94 4\n233.62 436.41 4\n217.54 425.56 4\n232.23 439.02 4\n234.67 427.1 4\n218.98 420.79 4\n229.37 429.67 4\n216.5 427.53 4\n233.05 439.01 4\n217.24 436.4 4\n213.08 430.56 4\n203.91 430.68 4\n215.42 444.26 4\n202.52 431.18 4\n220.37 447.48 4\n201.17 431.52 4\n213.4 434.65 4\n213.54 439.4 4\n210.96 434.04 4\n220.72 442.19 4\n210.34 444.44 4\n194.43 450.68 4\n199.3 447.32 4\n194.28 456.99 4\n195.82 444.78 4\n204.58 447.64 4\n200.65 446.53 4\n204.44 454.08 4\n196.37 445.67 4\n213.8 454.18 4\n352.62 446.2 3\n347.57 441.9 3\n351.7 443.21 3\n349.3 435.54 3\n350.36 444.3 3\n348.3 435.53 3\n349.05 449.92 3\n364.43 437.18 3\n353.32 446.58 3\n348.12 437.77 3\n354.46 440.77 3\n347.62 438.56 3\n347.37 433.29 3\n341.16 442.67 3\n354.43 446.37 3\n341.44 435.51 3\n346.13 433.37 3\n354.07 437.71 3\n350.66 441.36 3\n339.25 435.67 3\n354.16 434.2 3\n330.65 452.58 3\n336.68 441.11 3\n334.92 443.7 3\n331.34 453.99 3\n331.77 453.2 3\n335.38 434.56 3\n332.28 437.36 3\n344.19 447.13 3\n344.17 435.99 3\n341.41 444.4 3\n353.29 445.79 3\n342.73 439.77 3\n351.52 448.73 3\n340.21 438.36 3\n334.65 443.17 3\n345.58 443.9 3\n343.81 439.37 3\n342.66 446.82 3\n334.2 449.86 3\n341.76 435.64 3\n358.82 442.96 3\n366.64 442.46 3\n367.76 434.95 3\n368.64 435.45 3\n364.24 444.29 3\n368.97 449.81 3\n356.26 445 3\n362.38 452.6 3\n359.9 444.43 3\n368.63 447.55 3\n365.35 482.83 3\n367.35 484.42 3\n365.03 479.79 3\n380.03 
482.61 3\n381.3 487.57 3\n370.08 475.27 3\n366.23 487.66 3\n375.72 487.36 3\n370.03 479.5 3\n382.27 485.04 3\n365.76 471.48 3\n368.39 469.64 3\n383.48 482.17 3\n384.07 476.72 3\n384.59 473.77 3\n381.4 468.12 3\n384.2 478.41 3\n376.22 467.17 3\n368.14 479.56 3\n383.01 466.91 3\n379.56 515.72 3\n382.6 511.26 3\n381.47 511.99 3\n363.12 504.73 3\n366.87 516.62 3\n365.36 507.87 3\n372.71 513.27 3\n364.66 501.16 3\n365.96 505.78 3\n382.53 497.86 3\n282.64 365.27 -1\n180.87 329.72 -1\n348.24 507.23 -1\n148.71 327.57 -1\n354.93 403.76 -1\n222.26 360.51 -1\n198.31 434.52 -1\n320.1 432.7 -1\n189 515.62 -1\n258.98 449.59 -1\n179.34 359.65 -1\n132.88 525.39 -1\n340.22 470.39 -1\n148.85 434.21 -1\n263.38 506 -1\n285.63 385.88 -1\n117.82 519.24 -1\n161.7 440.67 -1\n120.28 342.03 -1\n320.12 429.9 -1\n306.87 393.55 -1\n213.5 443.29 -1\n297.92 536.26 -1\n142.18 403.83 -1\n338.84 327.16 -1\n231.48 487.44 -1\n145.03 394.24 -1\n143.95 470.76 -1\n146.31 433.12 -1\n284.02 514 -1\n171.74 489.49 -1\n231.47 410.66 -1\n335.18 436.78 -1\n206.13 452.43 -1\n191.19 435.46 -1\n337.81 486.17 -1\n173.22 484.99 -1\n293.66 364.85 -1\n272.55 423.46 -1\n269.24 437.79 -1\n202.3 354.73 -1\n156.87 496.98 -1\n152.72 358.4 -1\n344.22 432.53 -1\n245.29 379.47 -1\n180 460.97 -1\n327.66 423.96 -1\n301.15 378.63 -1\n161.06 441.34 -1\n155.67 341.81 -1\n297.47 447.43 -1\n294.78 512.77 -1\n290.77 484.74 -1\n167.93 442.3 -1\n338.69 472.18 -1\n282.69 514.68 -1\n368.29 445.28 -1\n295.5 525.92 -1\n184.7 446.36 -1\n130.27 514.15 -1\n127.7 422.99 -1\n310.78 357.32 -1\n183.15 374.53 -1\n156.88 458.72 -1\n129.54 418.71 -1\n335.13 457.69 -1\n343.7 374.41 -1\n179.41 531.05 -1\n318.73 479.32 -1\n193.51 473.08 -1\n208.64 409.69 -1\n206.21 466.76 -1\n135.63 422.2 -1\n138.46 543.74 -1\n250.73 342.26 -1\n218.81 426.49 -1\n229.69 477.14 -1\n284.29 532.91 -1\n296.07 463.16 -1\n288.54 459.93 -1\n209.16 465.87 -1\n317.9 355.28 -1\n304.66 376.21 -1\n275.39 444.87 -1\n354.99 375.45 -1\n375.01 434.52 -1\n348.9 501.32 -1\n203.83 470.36 
-1\n296.09 329.48 -1\n361.14 367.97 -1\n301.95 411.83 -1\n386.27 527.61 -1\n251.95 457.88 -1\n191.07 370.28 -1\n304.1 399.64 -1\n385.17 381.44 -1\n174.27 407.84 -1\n229.46 518.63 -1\n161.27 445.97 -1\n184.84 394.25 -1\n223.55 466.99 -1\n209.15 458.42 -1\n367.3 341.21 -1\n239.41 403.66 -1\n259.26 532.49 -1\n252.43 528.64 -1\n335.53 379.85 -1\n334.05 380.68 -1\n149.39 496.62 -1\n283.93 411.55 -1\n325.42 520.47 -1\n194.28 422.86 -1\n190.06 432.93 -1\n243.01 405.05 -1\n217.51 407.01 -1\n246.54 424.18 -1\n235.51 482.56 -1\n186.32 485.29 -1\n155.66 396.77 -1\n189.27 425.58 -1\n211.39 406.11 -1\n185.6 335.59 -1\n122.48 542.1 -1\n238.88 455.25 -1\n360.86 422.39 -1\n294.33 539.7 -1\n156.27 476.01 -1\n187.2 430.9 -1\n330.4 401.54 -1\n313.23 418.2 -1\n338.97 367.56 -1\n369.39 443.68 -1\n169.11 536.44 -1\n259.36 363.25 -1\n193.15 494.34 -1\n227.49 509.44 -1\n182.19 525.38 -1\n213.83 462.97 -1\n161.89 454.64 -1\n118.4 494.88 -1\n227.03 382.59 -1\n192.88 323.07 -1\n165.37 381.55 -1\n275.14 441.21 -1\n311.66 371.82 -1\n311.59 526.32 -1\n254.42 346.56 -1\n205.33 381.08 -1\n239.98 429.06 -1\n197.52 469.78 -1\n301.31 509.12 -1\n366.28 463.88 -1\n196.14 502.57 -1\n245.66 449.88 -1\n222.96 431.74 -1\n203.44 501.38 -1\n225.29 420.35 -1\n353.61 332.2 -1\n373.92 543.17 -1\n118.9 489.63 -1\n279.38 451.28 -1\n328.88 396.33 -1\n289.34 454.59 -1\n149.17 523.88 -1\n308.7 510.69 -1\n144 516.41 -1\n274.9 344.51 -1\n345.54 498.18 -1\n179.74 412.47 -1\n"
  },
  {
    "path": "data_src/data_DBCV/dataset_3.txt",
    "content": "-6.1698 2.2449 1\n-2.6453 6.9494 1\n-4.9691 4.9966 1\n-3.0064 6.868 1\n-4.3216 -3.2774 1\n-0.45173 -4.763 1\n2.2952 1.3859 1\n-2.5657 7.1216 1\n-2.3846 7.1456 1\n-3.6913 6.332 1\n-5.153 -2.4078 1\n-1.3853 7.5652 1\n1.8265 1.6745 1\n3.1154 0.26274 1\n-5.778 -1.4073 1\n-2.4125 7.1112 1\n2.5649 -3.0943 1\n0.67043 -4.389 1\n-0.79545 7.7774 1\n-0.44995 -4.6329 1\n-6.1132 1.9711 1\n-6.1505 2.1439 1\n-5.5912 -1.9206 1\n-6.2993 1.4634 1\n-1.3839 7.7193 1\n-5.9906 2.6321 1\n3.4322 -0.6101 1\n-5.3034 -2.3233 1\n-0.65154 -4.7226 1\n-6.1521 0.39326 1\n-3.8776 -3.7479 1\n-6.3132 1.3069 1\n0.93691 -4.4268 1\n-0.35854 7.9465 1\n-0.78787 -4.6477 1\n-2.1887 7.3713 1\n-0.74915 7.7361 1\n-6.2274 0.0082439 1\n-1.7182 7.5466 1\n2.8764 -2.5084 1\n2.0636 -3.6648 1\n-4.6502 5.4532 1\n2.7572 1.1565 1\n2.5495 1.2955 1\n3.2146 -0.17538 1\n-6.2863 0.82606 1\n-5.3056 4.401 1\n-6.061 -0.26088 1\n-5.2331 -2.2809 1\n-2.6057 -4.3723 1\n-5.6997 -1.5265 1\n-0.87514 -4.7666 1\n-6.1727 0.38501 1\n2.5422 -3.008 1\n-4.1156 5.9038 1\n-2.0388 7.4532 1\n3.4408 -0.76035 1\n-5.2475 -2.2421 1\n-5.5362 4.12 1\n-1.4176 7.7016 1\n2.4145 -3.3185 1\n2.9195 -2.5891 1\n-4.3937 -3.3767 1\n1.7697 1.7792 1\n-2.1413 -4.4583 1\n-2.3857 -4.3793 1\n-2.3096 7.314 1\n-0.63063 -4.7641 1\n-1.9559 -4.6505 1\n-6.2077 0.76146 1\n-5.1073 -2.3705 1\n-5.8922 -1.2146 1\n-3.1161 -4.1828 1\n2.18 1.5453 1\n-2.1073 7.4124 1\n3.2978 -1.6224 1\n1.2467 -4.2636 1\n-3.2396 -4.0549 1\n-5.7833 3.4376 1\n-5.1881 -2.2561 1\n0.60866 -4.403 1\n2.178 -3.671 1\n-1.998 -4.5224 1\n-0.27819 -4.676 1\n-6.1715 1.2513 1\n-2.7158 7.0806 1\n2.8426 -2.5933 1\n-5.9742 -0.66186 1\n-0.7476 7.7348 1\n2.1055 1.7288 1\n-5.8099 3.4073 1\n-6.2355 0.50701 1\n3.0143 -2.3951 1\n-1.5247 7.5515 1\n1.8699 -3.7282 1\n-4.9572 4.9802 1\n-4.3967 5.6445 1\n3.3057 -1.5753 1\n-4.8397 5.0022 1\n-5.0865 -2.5794 1\n3.3492 -1.7359 1\n2.8991 -2.6453 1\n-3.9605 -3.5645 1\n1.5611 1.8471 1\n3.3819 0.011511 1\n0.16344 7.931 1\n-1.5348 -4.6219 1\n-3.9732 -3.56 1\n-1.3544 
-4.715 1\n3.0456 -2.5702 1\n-4.9667 -2.6651 1\n-0.10006 7.8403 1\n-6.1917 0.36259 1\n-3.9011 6.0356 1\n-0.90232 -4.6295 1\n3.021 -2.4461 1\n-6.076 -0.34853 1\n3.214 0.30889 1\n2.6117 1.3262 1\n-0.83203 -4.6298 1\n-6.2157 1.6625 1\n-5.8382 3.1314 1\n-6.1839 1.805 1\n-2.8905 7.0309 1\n0.14707 -4.5635 1\n1.4356 -4.1191 1\n-5.19 -2.3845 1\n-5.1262 4.6174 1\n2.3806 -3.4144 1\n-5.8711 3.4416 1\n-6.1608 -0.093702 1\n0.21929 -4.5793 1\n-1.1652 -4.7321 1\n-3.9254 6.228 1\n-3.7933 -3.8204 1\n-1.6515 -4.5556 1\n-5.9038 -0.9972 1\n-6.2008 0.83282 1\n1.6254 -3.9999 1\n-5.6922 3.7521 1\n-1.4298 7.5338 1\n-2.7265 6.8836 1\n-3.3159 6.6052 1\n1.7764 -3.8709 1\n1.0788 -4.3696 1\n3.365 -0.99269 1\n-3.9803 -3.6076 1\n-4.9048 5.0547 1\n-3.8898 6.2905 1\n-1.4511 7.5179 1\n-5.4893 -1.7314 1\n3.3652 -1.1164 1\n-0.98328 -4.6799 1\n2.2731 1.5816 1\n3.0047 0.36568 1\n-3.2663 -4.1672 1\n1.4962 1.8261 1\n3.2998 -1.8415 1\n-6.094 -0.34383 1\n-5.5513 4.1196 1\n-6.1505 1.898 1\n1.2574 -4.2461 1\n-5.8598 -1.3176 1\n-1.2144 7.7601 1\n-3.7141 6.3007 1\n-5.3108 -2.3238 1\n-5.8959 3.1082 1\n-4.6705 -3.115 1\n-5.3752 -2.1539 1\n-2.2387 7.2425 1\n-5.9466 3.3294 1\n-5.851 -1.2134 1\n2.6023 1.2032 1\n-2.72 -4.3814 1\n-3.4958 -3.9466 1\n-0.93825 -4.7496 1\n-4.1957 -3.5278 1\n1.2663 -4.2248 1\n3.2018 -2.1513 1\n-3.4006 -4.1237 1\n-2.171 7.313 1\n-4.5383 5.4181 1\n-4.0748 -3.5052 1\n0.085267 -4.7325 1\n2.4304 1.2277 1\n-2.9598 6.8878 1\n-6.1139 2.1374 1\n-1.4481 7.5303 1\n-4.8102 5.1402 1\n-5.8818 3.5197 1\n3.3688 -1.2538 1\n-5.1811 4.6182 1\n3.2674 -1.4453 1\n-1.4325 -4.6638 1\n3.3017 -0.12231 1\n-2.1011 -4.5904 1\n-4.0724 6.0534 1\n-4.7317 5.2959 1\n-5.4326 4.3945 1\n-6.1893 2.2533 1\n-2.2982 7.2706 1\n3.0265 -2.3502 1\n-3.7599 -3.904 1\n-5.6157 3.99 1\n-3.724 -3.84 1\n-5.6048 3.8756 1\n2.8111 -2.852 1\n-0.80854 7.7524 1\n-0.89907 7.8053 1\n-6.0955 2.1419 1\n-2.3342 -4.4729 1\n-1.0081 -4.7688 1\n-6.0881 2.6516 1\n-1.6014 7.5111 1\n-2.5625 -4.4832 1\n-4.4079 -3.2629 1\n-6.151 -0.31588 1\n-5.9188 3.2525 
1\n-1.1186 -4.7033 1\n-2.5774 -4.4689 1\n-0.45431 -4.7551 1\n-4.6819 5.3644 1\n-0.77646 -4.7392 1\n-4.583 5.6029 1\n-4.9591 5.0709 1\n3.3449 -1.0402 1\n2.8335 -2.7907 1\n-0.5272 -4.6913 1\n-3.6065 6.3909 1\n0.72579 -4.3608 1\n2.8243 1.0215 1\n-5.1695 4.6522 1\n0.98626 -4.3009 1\n-2.9066 6.8253 1\n0.039289 7.8544 1\n-4.4984 -3.1992 1\n0.20231 -4.548 1\n3.3438 -0.065579 1\n-4.5904 -3.0627 1\n-2.0359 -4.4939 1\n-1.9076 -4.5294 1\n-6.2583 0.89337 1\n-6.1533 2.2783 1\n2.5585 1.1847 1\n-5.8076 3.3952 1\n-4.1019 -3.4957 1\n-6.303 1.606 1\n-1.022 -4.8092 1\n-3.0028 -4.3178 1\n-6.1773 2.0103 1\n-5.697 -1.4977 1\n-6.1036 0.13014 1\n-5.3868 4.384 1\n-2.8206 -4.2493 1\n-5.8641 3.2251 1\n-0.57206 7.8735 1\n-1.687 -4.6287 1\n2.7958 0.98318 1\n-4.4925 -3.149 1\n-2.4689 -4.4222 1\n-5.4706 -2.0264 1\n-3.4792 6.6521 1\n2.0998 -3.7863 1\n-0.08993 -4.575 1\n1.5086 1.9064 1\n-6.0624 2.3474 1\n-5.9216 3.1475 1\n-2.0402 -4.5406 1\n-1.5541 7.4392 1\n-4.3069 -3.5199 1\n-0.19346 -4.6784 1\n-6.2463 0.82607 1\n1.9587 -3.6271 1\n-3.8419 6.3257 1\n-5.8543 3.1637 1\n-3.6623 -3.8544 1\n-4.9279 -2.7225 1\n2.452 1.2546 1\n-4.961 -2.7089 1\n-3.3635 6.6231 1\n-1.7314 -4.6645 1\n-2.1333 7.4257 1\n-3.6074 -3.8437 1\n-4.2607 -3.33 1\n-5.9917 2.8099 1\n-0.95921 -4.7305 1\n-1.4213 7.625 1\n-2.5912 -4.4147 1\n1.2211 1.9544 1\n3.1921 0.26394 1\n2.9753 -2.3685 1\n-6.181 2.5176 1\n1.9384 -3.7714 1\n-0.74784 7.7964 1\n3.3864 -0.72418 1\n0.34031 -4.4723 1\n3.3078 -1.8496 1\n-0.49715 7.8825 1\n-0.83413 -4.7821 1\n-2.8652 6.9821 1\n-4.6518 5.3171 1\n3.3604 -0.73416 1\n-5.9145 -1.1117 1\n-6.2117 1.0877 1\n-5.8441 -1.089 1\n-1.7505 7.5096 1\n2.3767 1.3365 1\n3.2739 -1.9942 1\n2.8171 -2.8902 1\n-6.1911 2.5085 1\n-6.2916 1.5628 1\n3.2681 -1.2102 1\n-6.0338 -0.08985 1\n-1.6683 -4.601 1\n0.096065 -4.5557 1\n-2.7946 6.8945 1\n1.1918 -4.2068 1\n3.4074 -0.2537 1\n-5.9936 2.7372 1\n-4.9291 5.1042 1\n-2.4852 -4.4388 1\n-1.7567 -4.6021 1\n-4.6514 -3.0039 1\n-1.0551 7.6402 1\n-1.6229 7.5916 1\n3.1767 -2.1681 1\n-2.4498 
-4.5642 1\n-4.7123 -2.9139 1\n-5.69 3.6948 1\n3.2183 -0.028644 1\n2.8023 1.0581 1\n3.4585 -0.50888 1\n2.0868 -3.7497 1\n-4.4788 5.4788 1\n3.3178 -0.50115 1\n-3.317 -4.1684 1\n-1.0013 -4.7458 1\n-0.95184 -4.6885 1\n-0.25688 -4.6315 1\n-1.5891 7.565 1\n-5.7677 3.7102 1\n-1.0834 -4.6134 1\n3.1957 -2.1511 1\n-0.018241 -4.6831 1\n-5.9338 2.837 1\n-5.9907 -0.5755 1\n-4.7271 5.3952 1\n-2.897 -4.2621 1\n0.47157 -4.5755 1\n-6.2341 2.1326 1\n-4.4388 -3.1694 1\n-2.6483 7.1807 1\n-5.726 -1.4455 1\n-6.1568 -0.40491 1\n2.2571 -3.3413 1\n-4.0679 6.0436 1\n-2.1194 7.3201 1\n-3.3799 6.5573 1\n-4.197 5.8339 1\n3.0276 -2.2194 1\n-3.9462 -3.6179 1\n-5.2039 4.7859 1\n-4.6198 5.4341 1\n2.8861 0.76578 1\n-3.0231 6.9319 1\n2.8218 -2.8335 1\n-2.8724 -4.2436 1\n-3.9295 6.1823 1\n2.3698 -3.4424 1\n-4.1927 6.0031 1\n-6.2545 1.9851 1\n2.8982 -2.606 1\n-6.2413 0.23513 1\n-6.27 0.9425 1\n-0.53161 7.9212 1\n-4.7209 -3.0555 1\n-6.0107 2.463 1\n0.93173 -4.4139 1\n-6.1823 1.3601 1\n2.2714 -3.5054 1\n-1.0324 7.741 1\n-0.58375 7.8081 1\n1.6579 -3.8829 1\n2.9001 -2.4986 1\n-5.674 -1.5775 1\n-4.9496 -2.659 1\n2.9873 0.69848 1\n-4.3279 -3.3906 1\n-1.3433 7.5808 1\n-2.1915 -4.5722 1\n-6.0963 0.13299 1\n-2.353 7.2748 1\n-5.7501 -1.5378 1\n-3.885 -3.7594 1\n-6.2901 0.44681 1\n3.214 -2.0341 1\n2.9397 0.65879 1\n-5.5431 -1.8092 1\n-6.1994 1.6203 1\n-3.1293 6.7009 1\n-6.2386 1.1145 1\n-5.9896 -0.75084 1\n-5.4608 4.4609 1\n3.3977 -0.21849 1\n2.1834 1.5013 1\n2.152 -3.6281 1\n-5.5111 -1.9358 1\n3.2567 -1.7672 1\n-6.1302 2.5927 1\n-2.5059 7.1524 1\n1.8789 -3.8147 1\n3.3049 -1.0396 1\n-4.602 -2.994 1\n0.95192 -4.3575 1\n-5.6499 3.798 1\n-1.282 -4.756 1\n-5.5084 4.3503 1\n-0.066337 -4.5797 1\n-2.1589 7.3053 1\n2.8958 -2.7035 1\n-5.8562 3.418 1\n3.4431 -0.58257 1\n-0.10927 8.0172 1\n-5.9605 -0.4692 1\n3.163 0.33222 1\n-5.0862 4.8213 1\n2.5286 1.2403 1\n-1.4342 7.6504 1\n-2.154 7.3785 1\n-2.8986 -4.3834 1\n-1.1953 -4.7598 1\n-1.8133 7.4006 1\n3.36 -0.72114 1\n-4.0791 6.0075 1\n-4.2142 -3.3836 1\n0.15949 -4.6017 
1\n3.3754 -0.57769 1\n3.4287 -1.356 1\n3.3247 -1.1705 1\n-3.6661 -3.7816 1\n2.6744 1.1078 1\n0.84009 -4.392 1\n-3.3009 6.7279 1\n-0.31977 7.9051 1\n-4.5216 -3.1992 1\n-0.35988 7.8863 1\n-6.2713 1.3644 1\n-6.0192 3.1203 1\n-2.8101 6.9533 1\n3.2133 -2.1276 1\n0.21598 -4.559 1\n-3.3028 6.6126 1\n-2.2697 -4.4433 1\n-5.4596 4.3848 1\n-3.4816 6.5494 1\n-1.0561 -4.7111 1\n-0.13969 -4.7505 1\n-3.3508 -3.9635 1\n-3.6234 6.4307 1\n1.4755 1.8387 1\n-5.8838 -1.1636 1\n-5.0427 4.8941 1\n-3.5563 -3.9513 1\n-0.48457 -4.7643 1\n-2.276 7.3625 1\n-0.96562 7.6426 1\n-4.0564 6.0772 1\n-4.5298 -3.2606 1\n-5.9273 -1.0439 1\n-2.6546 7.0087 1\n2.1218 -3.5138 1\n-5.5363 4.0746 1\n-3.1884 -4.1543 1\n-4.8921 5.1039 1\n-6.196 2.0557 1\n-3.8481 6.3307 1\n-4.3378 5.7294 1\n-5.0208 -2.7846 1\n-5.8965 -1.0851 1\n3.3137 -0.024915 1\n3.3837 -0.22876 1\n-4.6846 -3.0497 1\n3.0696 -2.3043 1\n-5.9582 3.1293 1\n3.381 -0.70216 1\n-5.7995 3.5909 1\n0.42013 -4.4989 1\n-6.2479 1.7108 1\n-6.2452 1.1284 1\n-3.4988 -4.0339 1\n-1.2256 7.593 1\n0.80094 1.9646 1\n-5.5097 4.2706 1\n-6.011 -0.44955 1\n1.0144 -4.2568 1\n1.7542 -3.8924 1\n-5.514 -1.7172 1\n-5.7481 3.6795 1\n3.3633 -0.26338 1\n-0.41615 7.7809 1\n-2.7292 2.678 2\n6.427 -1.5571 2\n5.6794 1.9406 2\n0.92956 -7.5603 2\n5.5607 -4.2163 2\n-0.381 4.6635 2\n6.3861 -2.1147 2\n-2.2577 3.5306 2\n-2.1152 -1.2694 2\n3.1563 -6.6707 2\n-2.2423 3.3811 2\n4.6982 -5.4378 2\n-3.1782 0.9924 2\n-0.1891 4.7885 2\n-1.9537 3.854 2\n3.3027 -6.5269 2\n3.8727 4.129 2\n6.5243 -0.31321 2\n6.1538 -2.9673 2\n2.4911 4.6972 2\n2.0123 -7.1532 2\n5.5601 -4.3073 2\n3.3382 4.4703 2\n-2.189 3.5106 2\n6.1891 1.02 2\n-1.1432 4.3208 2\n5.0395 2.8845 2\n6.3222 -2.5084 2\n0.80556 -7.6111 2\n-1.3839 -1.5559 2\n3.9052 -6.2384 2\n6.0627 -2.9543 2\n4.5834 -5.5003 2\n5.5657 2.2871 2\n4.7444 -5.3809 2\n6.3834 -2.2209 2\n6.3097 0.46719 2\n5.9742 1.3019 2\n-3.0716 2.026 2\n5.8631 1.5587 2\n1.6789 4.9181 2\n5.1373 2.9149 2\n2.0008 -7.2334 2\n1.3575 -7.4904 2\n6.4563 -0.7725 2\n2.707 4.5995 2\n4.5101 
-5.5125 2\n5.963 1.4898 2\n-0.33966 4.7476 2\n5.2664 -4.7287 2\n1.3812 -7.437 2\n4.5617 -5.4559 2\n5.587 -4.3219 2\n3.0474 -6.7959 2\n6.5192 -0.63595 2\n6.1111 0.81528 2\n-1.8965 3.899 2\n5.7883 2.0388 2\n2.9268 4.5453 2\n6.2632 -2.7394 2\n6.4489 -0.2652 2\n6.4614 -1.1897 2\n0.096154 4.8726 2\n6.3421 -1.5528 2\n5.8443 1.6615 2\n6.4979 -0.40846 2\n5.0686 -5.0257 2\n6.3139 -1.9026 2\n6.0893 -2.8633 2\n-1.6164 -1.4629 2\n2.5612 -7.0908 2\n3.1338 -6.8139 2\n1.8996 4.8209 2\n4.9691 3.2237 2\n2.2112 -7.2106 2\n6.4475 -0.66118 2\n1.7005 4.9799 2\n1.5638 -7.4116 2\n2.3391 -7.1162 2\n6.209 0.51713 2\n5.1163 2.784 2\n1.4049 -7.4508 2\n-2.8691 2.5268 2\n6.4108 -0.012729 2\n-1.7573 3.7635 2\n-3.2408 1.3344 2\n-3.1559 1.8975 2\n1.9652 -7.2147 2\n-3.159 0.98852 2\n6.3528 -1.7104 2\n4.8506 3.0983 2\n-2.7115 -0.68904 2\n0.68375 -7.616 2\n6.4043 -2.163 2\n6.4885 -0.88946 2\n5.7026 -3.8504 2\n1.0582 -7.5153 2\n3.2273 -6.6599 2\n3.1174 4.3946 2\n3.0125 4.4683 2\n6.3448 -0.13963 2\n3.6491 -6.4201 2\n-2.1277 3.6382 2\n6.4675 -1.8384 2\n4.1719 3.9544 2\n4.6025 3.3786 2\n4.0099 -5.9592 2\n1.8536 -7.2324 2\n5.7609 -3.705 2\n6.3335 -1.442 2\n6.4775 -0.7453 2\n6.3857 -1.8588 2\n-1.6809 3.8924 2\n0.70171 -7.7085 2\n5.2946 2.7231 2\n4.9089 3.2015 2\n5.9017 -3.3883 2\n-2.2728 -1.1649 2\n2.9752 4.6285 2\n1.6232 4.9563 2\n-1.0096 4.5489 2\n2.0346 -7.241 2\n6.3326 0.76215 2\n2.7749 -6.8243 2\n-1.8511 3.8155 2\n4.4025 3.7079 2\n4.8452 -5.2712 2\n-2.8393 -0.62294 2\n0.78812 -7.6084 2\n2.3085 -7.0831 2\n6.3191 -1.9666 2\n6.3994 -0.45463 2\n4.981 -4.9138 2\n-2.2605 3.4586 2\n0.72022 4.9324 2\n-3.1875 0.31121 2\n-3.2375 0.98241 2\n5.4447 -4.5013 2\n-2.9895 1.8525 2\n5.9411 1.4263 2\n-0.9885 4.304 2\n2.952 4.6063 2\n2.4579 4.8084 2\n-2.1874 3.4713 2\n-1.3382 4.2388 2\n3.6935 4.3011 2\n-3.251 1.4542 2\n6.4796 -0.50248 2\n-1.229 4.3682 2\n1.6418 -7.4958 2\n3.6038 -6.3414 2\n-1.9447 3.6858 2\n6.2058 -2.6473 2\n1.3316 -7.5185 2\n-3.126 0.82747 2\n6.3842 -0.47861 2\n-2.4151 3.2334 2\n0.20408 -7.6577 
2\n0.62293 -7.716 2\n3.2961 -6.4948 2\n5.1814 2.8042 2\n5.5743 2.2513 2\n4.884 3.1278 2\n2.0676 -7.1324 2\n-2.6794 -0.68037 2\n-2.9712 1.946 2\n-1.2844 4.2258 2\n2.5864 -6.9903 2\n-3.19 1.3984 2\n-2.0651 3.7256 2\n2.7585 4.6294 2\n5.3795 2.6451 2\n6.0866 -3.357 2\n4.0195 3.9389 2\n3.1471 -6.5994 2\n2.9439 4.4323 2\n5.6417 -4.1828 2\n-1.1651 -1.687 2\n5.4889 2.552 2\n1.8462 -7.352 2\n5.2399 -4.6065 2\n4.7348 -5.3232 2\n6.4429 -0.42937 2\n0.33706 -7.6951 2\n3.3616 -6.5868 2\n-2.4246 -0.99087 2\n1.3711 4.8442 2\n-2.7516 -0.67637 2\n6.4497 -1.6127 2\n6.3287 -2.4924 2\n6.2181 0.61233 2\n-1.9148 -1.4401 2\n0.32903 4.8322 2\n-1.0486 4.3948 2\n1.8637 4.9261 2\n5.946 1.6416 2\n5.9782 -3.4719 2\n4.3689 -5.8339 2\n4.8701 -5.1862 2\n-1.9648 3.7916 2\n6.1463 -2.6648 2\n2.493 -7.0228 2\n5.8529 2.0451 2\n0.14263 -7.7574 2\n6.4427 -1.9957 2\n4.5157 -5.6445 2\n-2.8233 2.3889 2\n1.4919 -7.3597 2\n3.8798 4.0237 2\n6.3055 -2.2512 2\n-0.96533 4.3773 2\n-2.9724 2.2722 2\n0.0019711 4.8861 2\n-2.8551 2.768 2\n-2.533 3.1856 2\n5.7456 2.1583 2\n5.3981 2.519 2\n5.5995 -4.2579 2\n6.4312 -0.52667 2\n2.4774 4.6862 2\n4.7547 3.3431 2\n-2.5939 -0.73218 2\n1.2141 4.8501 2\n1.6696 4.8471 2\n-3.1081 0.27703 2\n5.6124 2.4343 2\n1.4382 4.9434 2\n5.6154 2.4123 2\n-2.1319 3.6001 2\n4.9249 -5.107 2\n0.050994 -7.7024 2\n6.3555 -1.6063 2\n-3.0791 2.1865 2\n2.3547 4.7075 2\n3.9816 -6.1968 2\n1.0873 -7.5274 2\n-2.6644 2.9582 2\n2.0233 4.8139 2\n-3.0489 2.1468 2\n6.1675 1.2331 2\n4.0886 3.9927 2\n6.3816 -2.3946 2\n6.3489 -2.3115 2\n6.5316 -1.2236 2\n1.9039 4.756 2\n4.0596 3.8607 2\n0.87403 4.8702 2\n-3.1811 1.6199 2\n-3.0779 0.47123 2\n6.2672 -2.6707 2\n6.0261 -3.0643 2\n2.4272 -7.1491 2\n-0.72962 4.6564 2\n3.0473 4.4605 2\n-1.992 3.6569 2\n2.8542 -6.7082 2\n1.9688 4.7551 2\n4.8398 3.2717 2\n6.5377 -0.83165 2\n-2.7013 2.9882 2\n3.4785 -6.5684 2\n6.0918 -3.3086 2\n4.1054 3.9046 2\n4.2454 -5.8355 2\n4.2297 -5.8555 2\n6.1844 0.78824 2\n-3.1072 0.79734 2\n1.9617 -7.3447 2\n5.6462 -3.983 2\n-2.0419 3.8145 
2\n-1.4388 -1.5715 2\n-3.0998 1.9846 2\n5.8739 -3.4762 2\n2.162 -7.1465 2\n-2.0515 3.7541 2\n4.9595 3.1266 2\n1.3112 4.9388 2\n-3.0044 2.2341 2\n-2.6397 2.9147 2\n5.8411 1.9259 2\n-0.95939 -1.7587 2\n5.5488 2.1736 2\n5.0181 3.107 2\n3.9234 -6.0841 2\n6.2705 0.39543 2\n6.419 -1.8796 2\n3.7692 -6.1608 2\n-0.061244 4.8233 2\n0.85094 4.8976 2\n4.1503 -5.9354 2\n-2.0968 3.7061 2\n0.83894 -7.5353 2\n0.43546 4.8267 2\n-3.141 1.7122 2\n5.2341 2.7274 2\n1.1118 4.8246 2\n4.5534 3.5619 2\n-1.5183 -1.5607 2\n5.792 1.7899 2\n2.1424 -7.1191 2\n-0.54672 -1.6555 2\n3.5336 -6.2971 2\n5.1978 -4.7685 2\n6.4257 -1.3189 2\n-0.9037 -1.7984 2\n-2.3201 -1.0552 2\n0.10409 4.8321 2\n3.3775 -6.5115 2\n4.9017 3.3316 2\n5.0012 -4.9709 2\n3.4367 4.3207 2\n5.8751 -3.5501 2\n-0.34362 4.7043 2\n1.0941 4.9251 2\n5.1286 -4.8403 2\n-3.0528 1.647 2\n4.0627 3.8055 2\n0.78173 4.8959 2\n6.2907 -2.5884 2\n5.7415 1.8189 2\n-2.5618 3.1089 2\n-0.6622 4.6721 2\n4.671 3.4053 2\n4.7148 -5.3409 2\n5.6775 1.9527 2\n3.0686 -6.8194 2\n0.01098 4.9114 2\n-0.75893 4.4866 2\n6.112 1.0048 2\n-3.1864 0.45116 2\n4.5362 3.5128 2\n4.3289 -5.8774 2\n1.7771 -7.4087 2\n1.2887 -7.4498 2\n6.2827 0.83476 2\n-0.65794 -1.674 2\n3.0035 4.4978 2\n-2.0238 3.7158 2\n5.9609 1.7045 2\n6.3526 -2.3443 2\n-0.9987 -1.6175 2\n6.41 -1.6784 2\n1.0806 4.8737 2\n-2.6533 -0.69065 2\n0.1997 4.8505 2\n5.9783 1.3177 2\n5.5211 2.5726 2\n4.3105 -5.7338 2\n4.1865 3.9145 2\n1.5045 4.9257 2\n6.2142 -2.7649 2\n-3.1703 0.47809 2\n3.5251 -6.3682 2\n2.3893 4.6788 2\n-1.5404 -1.5897 2\n3.7329 4.0781 2\n5.6696 -3.8517 2\n6.3815 -0.799 2\n-2.3442 3.2488 2\n3.5217 -6.3678 2\n5.0262 -5.1201 2\n6.4588 0.10579 2\n3.522 -6.4586 2\n4.1355 -5.8115 2\n4.9176 -5.1472 2\n-0.57683 4.5974 2\n6.2585 -2.0077 2\n-3.077 -0.045501 2\n2.4779 -7.094 2\n1.2709 4.9418 2\n6.0997 1.2546 2\n6.5594 -0.69323 2\n-2.0176 3.6406 2\n4.1245 -5.9357 2\n3.6482 4.162 2\n3.3778 4.3628 2\n5.5678 2.1632 2\n-2.6549 -0.58156 2\n5.2243 2.8431 2\n0.11568 4.8433 2\n0.67305 4.9357 2\n4.8975 2.9977 
2\n5.5937 -4.0394 2\n4.4049 3.7283 2\n-0.80157 4.4785 2\n3.3968 4.3121 2\n0.60669 -7.7112 2\n3.5716 -6.4109 2\n2.0704 4.8326 2\n2.3665 -7.0215 2\n0.64364 -7.7117 2\n6.3317 0.14363 2\n-1.8085 -1.5006 2\n6.3736 -1.8954 2\n4.0163 -5.9447 2\n1.2159 -7.4892 2\n-1.924 -1.4097 2\n6.2921 -2.1773 2\n5.3112 2.595 2\n6.3487 -1.3399 2\n-3.2277 0.95387 2\n-0.53785 4.5734 2\n2.1719 4.7195 2\n5.0426 -4.8478 2\n2.5985 4.6486 2\n5.5778 2.289 2\n2.9465 -6.7473 2\n6.3913 -1.6915 2\n-3.1084 0.22066 2\n6.5278 -0.39557 2\n5.2629 2.8711 2\n0.28642 -7.7814 2\n-2.8192 2.7832 2\n5.9679 -3.221 2\n4.0559 -6.0749 2\n3.6671 4.1409 2\n6.407 -1.1673 2\n-1.2133 -1.6807 2\n1.5211 4.9359 2\n5.3294 -4.7386 2\n6.168 -2.8356 2\n0.23884 -7.6468 2\n5.8236 1.9017 2\n0.22286 -7.6584 2\n-1.7286 4.0501 2\n4.326 -5.7087 2\n-0.67922 4.4918 2\n5.4864 -4.3409 2\n-0.92909 -1.6938 2\n0.036188 4.8116 2\n4.749 3.4099 2\n4.3275 -5.7296 2\n3.7241 -6.1536 2\n-2.8217 2.7008 2\n1.7299 4.8835 2\n0.79821 -7.7292 2\n5.9641 -3.2839 2\n4.6373 -5.4592 2\n6.5217 -0.28962 2\n4.6218 -5.4654 2\n6.4736 -1.6102 2\n2.0935 -7.3214 2\n6.2219 -2.6794 2\n3.1983 -6.6586 2\n1.9863 -7.3345 2\n-1.6869 4.071 2\n2.5985 -7.0221 2\n6.1868 -2.8979 2\n5.9955 -3.1368 2\n6.2734 -2.2232 2\n6.4181 -0.037133 2\n2.9992 -6.7782 2\n6.233 -2.8156 2\n6.3801 -1.203 2\n3.7793 -6.1845 2\n1.4727 -7.484 2\n-2.3906 3.3158 2\n-2.5766 -0.76457 2\n5.9565 1.6696 2\n6.4291 -1.5225 2\n0.27945 -7.8163 2\n2.166 4.7129 2\n5.5025 2.3945 2\n3.852 -6.1221 2\n4.3784 3.6777 2\n5.6634 2.0806 2\n4.3991 3.72 2\n2.7868 -6.89 2\n2.6342 -6.9544 2\n-0.8654 4.368 2\n6.0907 1.249 2\n5.8933 -3.5383 2\n4.1241 -5.9865 2\n-2.5399 3.1338 2\n6.0161 1.4765 2\n6.2036 1.1623 2\n4.9095 3.1261 2\n6.4614 -1.1868 2\n6.1981 0.7751 2\n-2.7975 -0.46842 2\n6.4708 -0.26954 2\n-3.2166 0.34857 2\n6.4273 0.32967 2\n6.4009 0.43211 2\n2.8121 -6.8849 2\n-2.7374 -0.64882 2\n6.1106 1.0906 2\n-3.068 1.8386 2\n1.1573 -7.6612 2\n3.5393 -6.3337 2\n6.2485 -2.7311 2\n2.5691 4.5682 2\n-3.0965 0.3 2\n5.7621 1.9902 
2\n-0.56815 4.5058 2\n1.5434 -7.4152 2\n6.2456 -2.5883 2\n3.8803 4.0058 2\n-0.55827 4.5039 2\n4.4997 3.666 2\n0.99704 1.1724 -1\n4.1822 -2.3831 -1\n-2.1785 -1.2334 -1\n0.69431 7.4779 -1\n5.9458 4.7189 -1\n5.9262 7.9366 -1\n-5.7178 -7.4405 -1\n3.0562 -7.0185 -1\n-2.9601 -0.10538 -1\n5.8036 4.6519 -1\n1.7255 -4.3812 -1\n-1.029 -3.1007 -1\n-2.2978 7.6748 -1\n-0.39582 -3.1281 -1\n3.111 -7.2715 -1\n4.5375 -2.682 -1\n-4.3156 -2.0144 -1\n1.7443 1.4558 -1\n1.4728 -5.119 -1\n5.2588 -6.8491 -1\n3.7378 -5.8209 -1\n-6.2272 3.1231 -1\n4.094 -1.1167 -1\n-5.1932 3.6454 -1\n6.0181 2.3473 -1\n-6.1718 2.0509 -1\n-3.3875 -5.644 -1\n0.6686 -2.2634 -1\n2.0191 6.5422 -1\n-3.8391 -6.6635 -1\n1.7566 -3.3627 -1\n0.60887 7.4868 -1\n5.9015 0.56615 -1\n-3.4838 1.9722 -1\n-1.5884 -1.6991 -1\n-1.2795 -5.9099 -1\n-4.1946 1.8443 -1\n-0.29807 -1.0464 -1\n-0.81015 4.1765 -1\n0.50489 -7.3674 -1\n0.38651 7.9087 -1\n-1.8725 -3.107 -1\n-1.6886 -5.5213 -1\n2.2159 7.4489 -1\n6.1353 0.12555 -1\n-0.99317 7.6643 -1\n4.9855 5.5923 -1\n-3.0679 7.9675 -1\n6.3441 2.5904 -1\n2.0811 -0.69084 -1\n-3.535 -4.1746 -1\n-3.5312 -4.2606 -1\n-4.79 7.0115 -1\n-4.1976 1.4007 -1\n-1.5816 2.8529 -1\n-6.2717 3.3302 -1\n4.213 2.6279 -1\n-0.41696 7.7572 -1\n-3.8167 -3.4963 -1\n3.9224 -6.3023 -1\n-1.9521 1.791 -1\n-5.0652 -4.0883 -1\n5.2947 6.2156 -1\n3.7967 -5.6004 -1\n6.4926 6.4649 -1\n5.7136 -7.4243 -1\n0.81967 0.3283 -1\n-2.9895 -0.00085951 -1\n0.15954 5.9809 -1\n-0.33288 -2.5584 -1\n-4.2483 6.1688 -1\n5.2115 -5.1639 -1\n-3.2276 -4.5736 -1\n2.5012 -5.6455 -1\n0.89988 -0.24351 -1\n4.9861 6.2819 -1\n-1.9625 -5.6476 -1\n-5.2149 7.8121 -1\n3.1152 4.2004 -1\n-6.2495 7.3756 -1\n-1.9432 -7.2925 -1\n-0.65084 -0.018711 -1\n-4.0715 -4.323 -1\n-0.53212 -3.5106 -1\n3.933 5.2171 -1\n-5.1819 -6.2939 -1\n-0.98123 -7.1683 -1\n1.1532 -4.7665 -1\n5.3666 2.8373 -1\n-6.181 1.9162 -1\n3.5842 0.20032 -1\n1.8329 3.8181 -1\n3.4293 2.4644 -1\n2.9413 4.0821 -1\n2.4294 3.5826 -1\n0.060931 -6.7637 -1\n3.3046 7.0353 -1\n3.6781 -4.2435 -1\n-3.5133 3.5795 
-1\n1.5065 5.9587 -1\n3.6982 -4.6055 -1\n-0.052565 -2.7876 -1\n5.8367 -3.1035 -1\n1.0768 -7.1842 -1\n-5.8994 5.6035 -1\n2.8926 -6.6268 -1\n2.584 -3.2516 -1\n5.0489 -7.1636 -1\n-3.8496 6.3882 -1\n5.3396 -4.6697 -1\n-5.442 6.8484 -1\n-5.3506 6.4377 -1\n-0.94144 7.9699 -1\n2.672 5.7087 -1\n-5.2121 3.4423 -1\n-4.8627 -1.2793 -1\n4.579 -7.5973 -1\n-1.371 1.0155 -1\n-2.8194 5.4605 -1\n-0.071015 -6.3005 -1\n3.4621 -6.4566 -1\n-1.5589 4.268 -1\n0.83669 1.8137 -1\n-4.9696 -7.4708 -1\n6.3262 -1.9484 -1\n3.3632 -7.1643 -1\n-5.4842 -4.819 -1\n6.3909 -7.7097 -1\n-0.85769 5.5245 -1\n-5.9393 7.5945 -1\n-5.0688 -1.5275 -1\n-2.0687 2.8432 -1\n2.3228 3.5423 -1\n5.2122 6.4419 -1\n5.0437 -2.8339 -1\n4.9325 3.8192 -1\n1.5656 -4.3287 -1\n-0.18476 -1.6329 -1\n-5.8413 -5.0362 -1\n-4.5102 5.595 -1\n4.5549 -0.8138 -1\n-6.0414 5.0219 -1\n-3.627 -2.0888 -1\n2.115 7.7527 -1\n-4.2277 -2.239 -1\n4.1579 -6.7695 -1\n3.7898 0.30789 -1\n-1.4327 5.3565 -1\n-5.321 3.4747 -1\n2.9305 -4.3483 -1\n-0.65996 7.4931 -1\n5.7946 5.2348 -1\n0.71088 -3.4582 -1\n2.514 0.80654 -1\n-3.8192 1.7883 -1\n2.4929 -4.2774 -1\n6.4036 1.7112 -1\n-5.3291 5.499 -1\n0.30721 -0.5677 -1\n-5.4762 0.93261 -1\n2.4093 2.977 -1\n-1.8227 -3.0121 -1\n1.8712 6.9915 -1\n-1.6002 5.9287 -1\n-5.0192 1.8056 -1\n-3.5259 -6.7022 -1\n6.2082 2.0681 -1\n1.8873 -5.8954 -1\n-2.76 -0.7612 -1\n-2.9446 3.4073 -1\n0.20439 -5.1879 -1\n1.0054 5.3538 -1\n3.0021 4.8663 -1\n-3.1108 2.0954 -1\n6.0641 6.7557 -1\n-4.0038 6.3951 -1\n2.4967 5.9365 -1\n3.3355 -6.9662 -1\n-4.1238 7.3905 -1\n5.4149 -0.027801 -1\n-3.1404 -3.3035 -1\n0.87371 -0.91466 -1\n-5.0495 4.1747 -1\n-4.5955 -5.998 -1\n2.2079 0.90491 -1\n6.3577 7.0719 -1\n-5.0772 1.2271 -1\n3.1964 -0.33412 -1\n-0.67638 -2.1462 -1\n5.0756 -0.61201 -1\n-1.7577 4.2782 -1\n0.083412 6.5329 -1\n-2.5483 0.5376 -1\n6.1629 -2.9882 -1\n3.771 0.29035 -1\n-0.030158 6.6956 -1\n-3.2344 4.4634 -1\n1.2983 4.2072 -1\n6.3666 4.9412 -1\n1.3649 4.8555 -1\n-1.2182 0.7369 -1\n2.026 -2.5166 -1\n5.9935 -7.1565 -1\n-4.7681 -6.3109 
-1\n5.3053 -5.3231 -1\n0.97327 6.2888 -1\n-3.1591 0.051957 -1\n-1.0545 -2.9558 -1\n-5.7917 2.8668 -1\n-5.666 3.1302 -1\n3.1989 -7.4469 -1\n2.3884 7.9023 -1\n2.6722 -4.6645 -1\n1.5957 -4.4956 -1\n-0.76716 -5.4389 -1\n-2.0792 6.6362 -1\n4.0531 -4.8692 -1\n4.9185 1.1542 -1\n3.8435 0.35499 -1\n5.601 6.4115 -1\n4.4642 6.4834 -1\n-0.59422 6.7979 -1\n-1.3794 -6.5383 -1\n-3.7656 4.9034 -1\n0.24239 -7.5053 -1\n-5.6226 6.4793 -1\n2.9967 5.7301 -1\n-2.8845 -4.9609 -1\n1.6857 -1.3776 -1\n-0.061039 -6.8821 -1\n-1.7615 -6.323 -1\n0.73737 0.66436 -1\n4.0347 3.5063 -1\n2.9222 -4.4931 -1\n-0.92616 0.67313 -1\n2.5323 3.0703 -1\n-0.41022 6.8608 -1\n1.3276 6.1547 -1\n-2.7322 2.3895 -1\n4.9463 5.5359 -1\n4.7218 -0.60724 -1\n1.4137 -3.0039 -1\n-2.1774 -4.2412 -1\n-4.1458 3.2823 -1\n3.609 3.0428 -1\n-5.9599 -7.427 -1\n-3.6884 2.5647 -1\n-3.2222 6.0661 -1\n-0.67926 6.8599 -1\n-4.5158 4.3019 -1\n-0.92901 -3.9958 -1\n-3.7309 7.9855 -1\n3.8052 -2.7865 -1\n-0.58681 0.71993 -1\n3.6634 7.7778 -1\n2.8205 -4.1545 -1\n5.1298 -2.5519 -1\n5.6018 -6.8103 -1\n-1.2913 4.8921 -1\n1.3781 0.71746 -1\n-2.4163 -4.3378 -1\n5.5181 7.4863 -1\n-3.2052 -4.9288 -1\n4.8516 4.1756 -1\n-4.5146 6.2547 -1\n3.3334 -1.2291 -1\n4.6495 5.4796 -1\n2.8658 0.35381 -1\n-4.8023 -3.2821 -1\n-1.9408 5.8408 -1\n2.1486 -4.3912 -1\n-4.661 6.4978 -1\n-2.1517 -6.3523 -1\n2.1335 6.5482 -1\n-0.41094 1.4984 -1\n0.8161 -2.9263 -1\n-3.3417 0.18804 -1\n-2.0546 7.0922 -1\n5.0153 6.8427 -1\n3.7475 2.4096 -1\n-0.95336 -6.4451 -1\n-1.7757 7.1045 -1\n5.3302 -2.9651 -1\n3.2545 -2.3742 -1\n4.8409 -1.274 -1\n3.7683 -1.4661 -1\n1.6694 6.1417 -1\n-5.7319 -2.5869 -1\n6.5222 0.5423 -1\n-5.0951 -5.5717 -1\n3.2518 6.8628 -1\n-1.9046 3.7882 -1\n-5.5292 5.8553 -1\n-3.2266 7.7296 -1\n1.3355 -5.6115 -1\n-3.8097 6.1073 -1\n3.7864 6.5807 -1\n4.575 -3.234 -1\n-2.7285 1.0933 -1\n0.17763 -0.47547 -1\n-5.3179 7.0361 -1\n0.78796 -7.4162 -1\n4.7834 5.5481 -1\n-5.335 -7.4817 -1\n3.2472 0.16747 -1\n5.1979 -5.2918 -1\n4.9531 -2.7009 -1\n0.76115 -1.2127 -1\n3.3223 
5.3327 -1\n-5.2735 5.2985 -1\n-2.7887 -4.7701 -1\n3.1233 -7.3721 -1\n1.4966 -7.1718 -1\n-3.4529 -2.7676 -1\n6.5528 -1.859 -1\n6.2053 4.9028 -1\n3.4933 7.3344 -1\n-3.0983 4.5351 -1\n1.0965 1.7329 -1\n1.4756 4.7751 -1\n-1.9237 -0.018649 -1\n-3.8355 0.32676 -1\n5.5789 -3.3192 -1\n-2.4511 -0.67166 -1\n5.8085 6.0645 -1\n-3.2578 -4.0852 -1\n-2.9998 0.13506 -1\n-3.0915 1.3737 -1\n-4.1625 -3.009 -1\n-5.5275 -5.7139 -1\n4.6112 -1.8252 -1\n-2.1982 -4.5277 -1\n-4.5409 1.4609 -1\n-5.2343 -3.6335 -1\n6.3133 -5.0417 -1\n-5.3859 -0.23118 -1\n6.3354 -4.479 -1\n-4.7967 0.72734 -1\n-2.966 -6.2462 -1\n1.5387 -0.65715 -1\n-2.6625 7.4906 -1\n-2.4973 3.6248 -1\n4.3438 0.17437 -1\n1.9394 -5.4485 -1\n-3.8528 -1.8972 -1\n1.1483 -6.8406 -1\n4.4759 2.4832 -1\n-4.3747 0.12029 -1\n3.7233 -5.2136 -1\n2.3754 -1.4229 -1\n-1.0381 3.9534 -1\n-0.18974 -1.3719 -1\n1.3579 5.2827 -1\n5.1344 6.9134 -1\n0.55913 0.64429 -1\n-4.818 -1.0339 -1\n-6.1389 -4.66 -1\n-0.54392 1.5351 -1\n4.9994 -0.706 -1\n4.9898 -6.5334 -1\n-5.6496 -2.104 -1\n3.6242 -3.6928 -1\n4.464 -1.6731 -1\n3.3974 0.16034 -1\n-0.94691 4.7007 -1\n-3.7249 -1.5431 -1\n-1.6193 7.6096 -1\n-1.9751 -2.2865 -1\n5.9782 -2.2816 -1\n-5.499 4.7471 -1\n4.4027 -6.5977 -1\n-0.48006 -4.0227 -1\n-4.3616 -3.1293 -1\n2.8308 7.8459 -1\n3.5673 -0.60188 -1\n6.167 2.3977 -1\n6.2279 7.326 -1\n-6.2254 2.6421 -1\n3.3556 3.0495 -1\n-3.8513 -0.50673 -1\n-2.1658 2.2508 -1\n1.2456 -6.8903 -1\n-3.3019 6.1641 -1\n0.070979 -1.5406 -1\n-3.2609 7.5648 -1\n2.9468 2.4309 -1\n-0.56474 -2.3226 -1\n-0.27054 -3.1012 -1\n-2.485 6.2854 -1\n-0.98761 1.3893 -1\n-6.1605 -5.3819 -1\n4.3137 -7.62 -1\n-2.3717 7.4464 -1\n-2.677 0.32133 -1\n0.8667 6.1086 -1\n3.2881 7.6549 -1\n5.949 -5.6174 -1\n3.9371 -5.0215 -1\n-4.8079 -5.5373 -1\n0.46141 -3.8188 -1\n5.8141 -5.019 -1\n1.8518 -4.5696 -1\n4.4702 -3.7805 -1\n-1.1937 -0.39054 -1\n-1.7887 3.3432 -1\n-3.6752 -3.8158 -1\n5.8888 -2.869 -1\n4.7502 -2.2679 -1\n-0.61324 2.1134 -1\n-3.9465 -6.8551 -1\n1.1924 4.1245 -1\n-0.15488 0.57528 -1\n-1.2941 
3.8053 -1\n0.28874 -0.43841 -1\n1.5294 7.9929 -1\n2.1688 6.9842 -1\n-2.1996 0.24757 -1\n0.11939 0.22495 -1\n5.6391 4.5312 -1\n-4.4355 -3.8229 -1\n3.6499 6.6755 -1\n1.3787 -6.4042 -1\n-1.6911 1.0304 -1\n-0.39597 -3.4163 -1\n1.5339 2.5518 -1\n-0.66757 -1.787 -1\n-6.0168 -0.85973 -1\n0.72782 -7.0758 -1\n4.0202 0.61993 -1\n0.84488 3.7026 -1\n-1.183 0.8396 -1\n2.7949 -5.8993 -1\n2.3311 7.1993 -1\n2.0495 4.09 -1\n-5.3304 -3.0831 -1\n-4.2442 -3.5328 -1\n0.79558 -5.2189 -1\n4.0256 5.9052 -1\n-5.5057 -6.5451 -1\n-2.0786 6.6513 -1\n3.1442 1.138 -1\n2.975 5.1619 -1\n-3.3218 4.8784 -1\n0.92004 4.6088 -1\n4.145 7.6729 -1\n-4.8376 4.783 -1\n5.1885 0.32069 -1\n4.8452 -4.8085 -1\n1.4223 -4.4354 -1\n3.7774 -4.2538 -1\n2.5299 7.9499 -1\n3.7285 6.4302 -1\n1.5697 5.617 -1\n-4.161 0.40129 -1\n3.0207 -7.5248 -1\n-3.7894 -6.901 -1\n1.4762 -7.7831 -1\n4.7934 -0.33329 -1\n-3.6731 -5.158 -1\n-0.20239 4.394 -1\n4.8309 -7.8099 -1\n-3.2922 -2.5616 -1\n5.7604 5.5706 -1\n-1.9064 -4.3283 -1\n-5.2855 -4.5455 -1\n4.6664 7.3931 -1\n3.8793 4.1104 -1\n1.1225 -0.44183 -1\n-4.2652 -2.9422 -1\n3.1047 5.2933 -1\n1.8533 -0.23408 -1\n3.8429 6.6068 -1\n1.9521 6.6885 -1\n4.3278 7.3079 -1\n-5.8087 6.2181 -1\n3.6731 -5.3896 -1\n4.3802 -1.8335 -1\n-1.7062 7.662 -1\n-4.3473 7.8695 -1\n-5.9378 -4.1413 -1\n-5.0565 -4.3832 -1\n-2.3798 2.1034 -1\n6.1812 -5.3513 -1\n4.7768 -2.0083 -1\n-5.654 1.1851 -1\n-1.4824 -4.7542 -1\n-3.9498 4.9372 -1\n3.5556 -3.7813 -1\n-6.1515 6.6724 -1\n4.017 6.2756 -1\n5.6379 5.6121 -1\n4.4176 -2.9743 -1\n-5.1328 -3.6111 -1\n3.3823 -7.4463 -1\n-0.21121 0.65504 -1\n-2.6382 -5.5732 -1\n-0.79968 6.8647 -1\n-0.86721 -2.6187 -1\n-0.23394 -6.8208 -1\n1.2919 7.064 -1\n"
  },
  {
    "path": "data_src/data_DBCV/dataset_4.txt",
    "content": "340.080593000166 401.306241000071 1\r\n333.985499000177 395.070042999927 1\r\n335.612031000201 392.773647000082 1\r\n345.092862000223 391.974363999907 1\r\n330.569323000032 392.169848000165 1\r\n339.312822999898 389.298434999771 1\r\n339.686031000223 398.61877400009 1\r\n343.316994000226 400.977901000064 1\r\n333.065419999883 396.446630000137 1\r\n342.511708000209 398.544327999931 1\r\n340.766671999823 400.656415000092 1\r\n337.277087999973 398.673247999977 1\r\n340.923070999794 396.419065000024 1\r\n341.683007000014 390.835886999965 1\r\n342.708281000145 391.703137999866 1\r\n343.201282000169 393.605033 1\r\n341.791383000091 397.986231000163 1\r\n331.578933999874 391.320538000204 1\r\n342.348366000224 393.762180000078 1\r\n338.727878999896 396.267293999903 1\r\n334.738222999964 395.718766000122 1\r\n330.975653999951 394.57460599998 1\r\n344.716322999913 392.09760100022 1\r\n341.841244999785 395.915595000144 1\r\n337.327810999937 394.15234699985 1\r\n337.534911999945 389.968692000024 1\r\n342.663162999786 398.431352999993 1\r\n331.561238999944 391.729294999968 1\r\n334.40383999981 401.985080000013 1\r\n332.629104000051 391.597103000153 1\r\n335.299560000189 396.55072600022 1\r\n336.396044999827 390.704500999767 1\r\n342.243536999915 401.898397999816 1\r\n339.051363999955 401.802294999827 1\r\n343.168777000159 393.770492999814 1\r\n339.352742000017 396.219544999767 1\r\n340.294815000147 397.91381700011 1\r\n335.593735000119 396.100544000044 1\r\n331.089887000155 399.788712000009 1\r\n337.715063000098 399.279529999942 1\r\n335.798527000006 390.623579999898 1\r\n342.880836999975 390.597740999889 1\r\n343.378837999888 392.276885000058 1\r\n336.571469000075 388.70158400014 1\r\n338.360745999962 393.431067000143 1\r\n332.635257999878 388.442526000086 1\r\n343.189617000055 397.453784999903 1\r\n332.511510000098 399.986099000089 1\r\n336.477483999915 397.836385000031 1\r\n332.235915000085 389.441686999984 1\r\n275.564732000232 392.559510999825 
2\r\n271.378120000008 404.168049000204 2\r\n275.536663000006 391.976090000011 2\r\n277.958006000146 397.734982999973 2\r\n281.468803000171 397.988183000125 2\r\n276.999392999802 389.689679999836 2\r\n277.532668000087 400.254569999874 2\r\n283.876598999836 398.569484000094 2\r\n275.246861999854 393.905619999859 2\r\n280.570826000068 400.053100000136 2\r\n284.972879000008 389.935608999804 2\r\n272.625878999941 403.725992999971 2\r\n284.073303999845 403.610801999923 2\r\n275.946336000226 400.726569000166 2\r\n275.631306000054 400.140395000111 2\r\n271.080223999918 394.248447000049 2\r\n271.56240000017 400.297164000105 2\r\n273.661956999917 402.378101999871 2\r\n281.391483999789 400.420233999845 2\r\n273.427796999924 398.024796000216 2\r\n282.834764000028 403.955949999858 2\r\n275.17954300018 398.659824999981 2\r\n283.498360000085 390.48953700019 2\r\n271.657103999984 403.527952000033 2\r\n282.961077999789 398.744140000083 2\r\n284.618311999831 392.174819000065 2\r\n279.342093000188 404.362902000081 2\r\n270.902385999914 403.879670999944 2\r\n280.118470000103 404.34404100012 2\r\n279.850511000026 399.025212000124 2\r\n282.69835200021 416.991049999837 2\r\n284.884364999831 416.506583000068 2\r\n278.547925000079 417.136450000107 2\r\n272.391222000122 406.802037999965 2\r\n277.926847000141 405.936453999951 2\r\n275.040202999953 414.701766999904 2\r\n280.756792999804 414.884978000075 2\r\n276.10025600018 408.918399999849 2\r\n279.437752000056 405.916401000228 2\r\n272.477855999954 415.77772800019 2\r\n284.624309999868 417.78525599977 2\r\n280.52478600014 426.589997000061 2\r\n286.989403000101 414.205672999844 2\r\n274.815353000071 425.063535000198 2\r\n285.541612000205 424.086339999922 2\r\n275.997407999821 417.256587000098 2\r\n278.908342999872 421.349551000167 2\r\n275.412847999949 425.442823000252 2\r\n287.055484999903 418.244719999842 2\r\n277.670307000168 416.050514000002 2\r\n280.128748999909 438.741917999927 2\r\n280.342310000211 429.522609000094 
2\r\n284.103120999876 430.363543000072 2\r\n294.341947000008 441.109651000239 2\r\n287.621638000011 428.60290400032 2\r\n288.48249799991 431.23475799989 2\r\n285.283555000089 436.722258999944 2\r\n286.409955000039 430.631586000323 2\r\n289.852479999885 432.494152000174 2\r\n292.558714999817 429.39255800005 2\r\n300.289115000051 448.096348000225 2\r\n301.998831000179 447.046858000103 2\r\n296.696620999835 444.077189000323 2\r\n292.593574999832 438.624294999987 2\r\n295.824153999798 449.513397000264 2\r\n294.305875000078 444.282610999886 2\r\n303.000893000048 438.451207000297 2\r\n294.69386699982 446.034585000016 2\r\n295.395599999931 447.633481000084 2\r\n298.359889999963 443.035277000163 2\r\n298.105407999828 452.848544000182 2\r\n298.494349000044 456.162745000329 2\r\n307.769902999979 462.346611000132 2\r\n297.570236000232 454.607466999907 2\r\n308.14916000003 448.817819999997 2\r\n306.467536999844 448.400144000072 2\r\n300.060355999973 448.536395000294 2\r\n298.264582999982 448.651820000261 2\r\n301.684843999799 455.66270800028 2\r\n308.0738309999 448.978669000324 2\r\n310.796959000174 454.082526999991 2\r\n312.118670000229 452.805322000291 2\r\n311.059359000064 466.771413000301 2\r\n318.166869999841 457.615817000158 2\r\n314.176175000146 458.207098999992 2\r\n316.558571000118 464.461933000013 2\r\n307.646118999925 453.736591999885 2\r\n317.605855000205 460.297666999977 2\r\n310.377187999897 459.180017000064 2\r\n311.657558000181 459.881522000302 2\r\n328.125591999851 465.369022000115 2\r\n331.674118999857 469.407306999899 2\r\n318.059061000124 459.695964999963 2\r\n327.572842000052 459.722072999924 2\r\n325.065386999864 468.653207000345 2\r\n323.394032999873 459.820330000017 2\r\n327.346989999991 469.411017999984 2\r\n323.62447200017 459.327498000115 2\r\n327.730142999906 471.821191000286 2\r\n320.339424000122 459.946117000189 2\r\n336.860669000074 471.016857000068 2\r\n329.326169000007 473.654308999889 2\r\n336.970894000027 465.487050999887 
2\r\n328.943793999963 460.347212000284 2\r\n341.481104999781 468.662958000321 2\r\n331.426523000002 461.136917999946 2\r\n331.442586999852 459.671700000297 2\r\n340.471007000189 469.628787999973 2\r\n334.635424000211 474.328426000196 2\r\n334.673630000092 468.230307999998 2\r\n345.601218000054 472.548501000274 2\r\n340.310949999839 464.94068400003 2\r\n341.861144000199 466.057520000264 2\r\n342.321235999931 474.119177999906 2\r\n339.784878999926 472.528696999885 2\r\n348.661588000134 468.208697000053 2\r\n344.018592999782 464.765122000128 2\r\n346.396873999853 465.233286000323 2\r\n343.950104000047 462.03741999995 2\r\n344.375392000191 468.694385000039 2\r\n281.591250000056 390.116572000086 2\r\n274.038128000218 390.173677999992 2\r\n274.319044000003 388.317232000176 2\r\n281.458730000071 392.840789000038 2\r\n277.684638000093 395.224723999854 2\r\n282.037909999955 390.391964999959 2\r\n277.54200100014 388.628529000096 2\r\n277.781628999859 381.780561999884 2\r\n275.458581999876 382.77025000006 2\r\n278.33758000005 383.239541999996 2\r\n270.407542999834 375.868768000044 2\r\n276.533646999858 380.289565000217 2\r\n276.392326000147 372.586655000225 2\r\n277.435391000006 374.823960999958 2\r\n281.0096450001 375.529147000052 2\r\n270.199572999962 375.50949899992 2\r\n280.601464000065 379.713717000093 2\r\n272.352343999781 380.397797000129 2\r\n272.74782600021 371.799476999789 2\r\n279.051520000212 385.182934000157 2\r\n276.032953999937 373.474593000021 2\r\n276.04079400003 371.569041000213 2\r\n283.93596199993 372.577940999996 2\r\n282.358872999903 376.170775000006 2\r\n273.135604999959 374.693434000015 2\r\n275.436521000229 366.388801000081 2\r\n284.604973000009 361.595015000086 2\r\n279.457003000192 364.791898999829 2\r\n279.324618000071 373.007619999815 2\r\n283.907215999905 372.13773099985 2\r\n286.724506000057 362.70857599983 2\r\n287.665115000214 356.211618000176 2\r\n285.225418999791 365.002741999924 2\r\n285.955506999977 367.179124999791 2\r\n292.162618999835 
357.082713000011 2\r\n283.527553999797 355.450588999782 2\r\n290.19073599996 354.795212000143 2\r\n281.79827499995 364.437746000011 2\r\n286.016962000169 364.524929000065 2\r\n285.490943999961 356.040585000068 2\r\n289.584323999938 353.874623999931 2\r\n292.188362999819 354.931929999962 2\r\n290.921076999977 351.638449999969 2\r\n295.706486000214 347.77908599982 2\r\n293.876329000108 344.007606999949 2\r\n287.630673000123 356.822922000196 2\r\n292.76003200002 346.940886000171 2\r\n291.874253999908 347.710423999932 2\r\n284.013354999945 351.794995999895 2\r\n292.967625999823 361.836296000052 2\r\n280.614341000095 362.254129999783 2\r\n283.180738000199 359.081335000228 2\r\n288.563686999958 352.996989000123 2\r\n290.206956000067 359.305296000093 2\r\n285.380559999961 360.669691000134 2\r\n280.855977000203 352.008268999867 2\r\n287.071576999966 356.453741999809 2\r\n289.397274999879 358.707884000149 2\r\n292.32484500017 350.750994999893 2\r\n297.766363000032 343.718977000099 2\r\n295.469016999938 352.890833000187 2\r\n286.694364999887 346.289671999868 2\r\n294.82860499993 353.694565999787 2\r\n286.33236300014 341.491194999777 2\r\n293.489422000013 343.918444999959 2\r\n293.222575999796 343.745008000173 2\r\n289.931313999929 341.977434999775 2\r\n292.469142999966 345.385646999814 2\r\n303.328251000028 344.780511000194 2\r\n304.068678000011 340.544828999788 2\r\n293.718036000151 338.427188999951 2\r\n303.741065999959 343.235077999998 2\r\n307.100600000005 336.871704000048 2\r\n307.09996900009 341.692346999887 2\r\n302.759434999898 337.83112199977 2\r\n301.137457000092 346.819325999822 2\r\n298.115906999912 348.274538999889 2\r\n301.878403000068 342.889849000145 2\r\n313.32788700005 344.940977000166 2\r\n301.189420000184 346.244235000107 2\r\n313.613148000091 341.952339000069 2\r\n310.804626999889 345.506359000225 2\r\n307.120536999777 342.781421999913 2\r\n307.54880299978 333.738884000108 2\r\n309.184045999777 341.367395999841 2\r\n309.930445999838 341.127594000194 
2\r\n301.483397000004 340.018436999992 2\r\n314.124737999868 336.966264000162 2\r\n316.484484000131 330.42635500012 2\r\n312.664367000107 342.642039000057 2\r\n316.852740999777 328.654779999983 2\r\n313.073115000036 333.346878999844 2\r\n317.070995000191 341.424494999927 2\r\n322.053408000153 338.632120999973 2\r\n308.055980000179 337.719942000229 2\r\n318.092466000002 332.764543000143 2\r\n316.209414000157 340.604559999891 2\r\n321.762986999936 332.181195999961 2\r\n324.187342999969 336.267490999773 2\r\n330.833409999963 335.663953999989 2\r\n337.203918999992 347.113196999766 2\r\n325.619167000055 343.377601000015 2\r\n336.854183999822 335.616034000181 2\r\n325.100755999796 342.157753999811 2\r\n331.17795300018 337.049343999941 2\r\n326.799414000008 333.708540000021 2\r\n331.353066999931 335.78148799995 2\r\n323.835491000209 346.367407999933 2\r\n328.04509499995 335.579245999921 2\r\n333.326241999865 339.170746000018 2\r\n330.170179999899 334.505876999814 2\r\n330.410356000066 332.77614099998 2\r\n328.951834000181 326.862077000085 2\r\n330.768906999845 332.976048000157 2\r\n339.536853000056 333.274954999797 2\r\n330.71094099991 329.752326999791 2\r\n340.630439999979 340.895986999851 2\r\n342.341614999808 332.016452000011 2\r\n333.59476199979 332.964519000147 2\r\n336.322652000003 330.620709000155 2\r\n335.530784999952 334.798208000138 2\r\n332.734618999995 334.91174999997 2\r\n341.056561000179 333.587681999896 2\r\n345.079256999772 329.809594000224 2\r\n333.05690599978 328.164487000089 2\r\n344.170750999823 340.133692999836 2\r\n339.232667999808 339.78389800014 2\r\n336.185130999889 331.491934999824 2\r\n288.439141000155 429.814830000047 2\r\n277.240346000064 421.534643999767 2\r\n288.705926000141 432.914245000109 2\r\n285.61660300009 420.086494999938 2\r\n284.781975000165 423.352613999974 2\r\n289.356124000158 431.610005999915 2\r\n276.669199999887 428.340828000102 2\r\n282.552151000127 423.650437999982 2\r\n282.833926999941 433.334122000262 2\r\n275.244181999937 
423.181933000218 2\r\n295.111847000197 440.70355800027 2\r\n288.956797000021 441.54499100009 2\r\n294.225846999791 446.283021000214 2\r\n285.297542000189 443.724222999997 2\r\n288.548696999904 435.404792000074 2\r\n286.853298999835 445.89872100018 2\r\n284.027945999987 445.659014000092 2\r\n290.283964000177 445.466033000033 2\r\n296.440758000128 448.085398999974 2\r\n292.013282000087 440.945854000282 2\r\n293.315739000216 454.471840000246 2\r\n291.641048000194 451.673665999901 2\r\n304.441722999793 446.362951000221 2\r\n291.323487999849 453.85565600032 2\r\n304.139917000197 446.935756999999 2\r\n302.618131999858 444.201373000164 2\r\n295.255117000081 447.44844800001 2\r\n293.073137000203 444.524476000108 2\r\n297.196814000141 447.21465400001 2\r\n290.564933999907 451.969909999985 2\r\n316.160480999853 462.161685000174 2\r\n313.438000000082 447.888195000123 2\r\n315.721752999816 451.953838000074 2\r\n315.141911000013 447.759120000061 2\r\n310.845393000171 456.790971000213 2\r\n309.446165000089 449.9552480001 2\r\n310.442799999844 455.088514000177 2\r\n315.661547999829 459.732952999882 2\r\n315.430583000183 460.020981000271 2\r\n310.377650000155 460.583492999896 2\r\n337.770816999953 464.291306000203 2\r\n326.218851999845 469.661887000315 2\r\n339.849760000128 473.900303000119 2\r\n330.132836999837 465.409437000286 2\r\n333.963119000196 466.885577999987 2\r\n328.408026000019 463.216091000009 2\r\n338.064695999958 470.464205000084 2\r\n331.278638999909 464.062855000142 2\r\n337.738963999785 473.016298000235 2\r\n331.581642999779 473.827144999988 2\r\n278.063279999886 382.482284000143 2\r\n283.170090000145 379.762041000184 2\r\n284.882455999963 377.243555999827 2\r\n286.762548999861 376.382792999968 2\r\n285.977713000029 375.200914000161 2\r\n282.663991999812 379.152185999788 2\r\n282.158846000209 377.099086000118 2\r\n284.432273000013 372.623476999812 2\r\n280.196934000123 375.145907999948 2\r\n286.962050000206 381.40418299986 2\r\n310.786869000178 347.103199000005 
2\r\n311.337789000012 346.479408999905 2\r\n310.680124000181 341.763600999955 2\r\n306.215865000151 342.121199000161 2\r\n302.647224000189 334.363262999803 2\r\n304.464583999943 337.528198999818 2\r\n302.930536999833 338.498699000105 2\r\n315.309626000002 339.280749999918 2\r\n311.334650999866 345.365472000092 2\r\n307.930645999964 339.443221000023 2\r\n316.644803000148 331.834722000174 2\r\n329.77025000006 340.136498000007 2\r\n330.69882800011 335.393705999944 2\r\n321.975273999851 338.384099000134 2\r\n316.734904999845 330.654447999783 2\r\n324.408317999914 332.823224999942 2\r\n330.077132000122 333.456172999926 2\r\n321.82465799991 338.670402999967 2\r\n325.928280000109 340.108022000175 2\r\n327.313602000009 336.05405799998 2\r\n331.96966700023 340.252166000195 2\r\n339.144123999868 341.612036999781 2\r\n328.344622999895 337.307053999975 2\r\n335.398107000161 336.816229999997 2\r\n329.37205900019 339.833136000205 2\r\n339.925007999875 341.027007000055 2\r\n330.932599999942 334.75419700006 2\r\n327.721384000033 334.934183999896 2\r\n337.464132000227 329.19598999992 2\r\n337.832541999873 338.690866000019 2\r\n428.810889000073 499.729580000043 3\r\n425.393627000041 493.410670000128 3\r\n424.475703999866 500.755513000302 3\r\n415.510985999834 491.603397000115 3\r\n419.541222000029 490.648371000309 3\r\n428.113410000224 492.412991000339 3\r\n425.608328999951 488.619296000339 3\r\n427.317301000003 498.1468410003 3\r\n427.943053999916 498.07325399993 3\r\n428.672300999984 496.934424000327 3\r\n433.372045000084 502.141772999894 3\r\n435.874388999771 501.390268000308 3\r\n430.950474999845 491.318884999957 3\r\n437.752884999849 492.625738000032 3\r\n439.392155999783 502.221375999972 3\r\n436.655656999908 496.187112000305 3\r\n440.452285999898 491.369898000266 3\r\n434.746292000171 501.172223000322 3\r\n440.312845000066 493.24883500021 3\r\n438.881428000052 495.655058000237 3\r\n438.524873000104 493.75810000021 3\r\n441.167762000114 491.661625999957 3\r\n443.836159999948 
496.010120999999 3\r\n448.038234999869 500.792133000214 3\r\n440.090555999894 493.597029000055 3\r\n448.841490000021 492.925735000055 3\r\n439.478817000054 497.346042000223 3\r\n446.786419000011 499.043122000061 3\r\n438.221760000102 494.281751000322 3\r\n437.356751000043 502.442203999963 3\r\n449.492473000195 492.53163299989 3\r\n459.745124999899 489.286241000053 3\r\n447.223060999997 501.440026999917 3\r\n447.245114000048 487.333358000033 3\r\n450.467995000072 496.733677000273 3\r\n458.580765999854 489.658048000187 3\r\n453.71749900002 491.682899000123 3\r\n450.043326999992 500.339963000268 3\r\n455.143877000082 496.32339300029 3\r\n453.2182169999 496.382433000021 3\r\n462.335330999922 490.786952999886 3\r\n467.451291000005 492.437547000125 3\r\n459.826137000229 496.025311000179 3\r\n464.071967999917 493.899759000167 3\r\n456.547443000134 497.955181000289 3\r\n468.249412999954 486.334616000298 3\r\n465.482481000014 484.134764000308 3\r\n471.070791999809 483.665415999945 3\r\n461.469093999825 496.853117000312 3\r\n464.949471000116 498.282124000136 3\r\n477.476850999985 487.796128000133 3\r\n473.19994200021 481.32956700027 3\r\n475.650100999977 481.682930999901 3\r\n472.407899999991 490.043781999964 3\r\n478.489666999783 484.528574000113 3\r\n477.478930999991 490.609395000152 3\r\n468.97280200012 490.981703000143 3\r\n468.745244000107 487.238164000213 3\r\n472.92670099996 486.226412999909 3\r\n469.936017000116 481.785910000093 3\r\n483.847126000095 475.061950000003 3\r\n480.692332999781 478.47275400022 3\r\n487.595226000063 475.968164999969 3\r\n488.596621000208 468.429765000008 3\r\n487.141553999856 472.961444000248 3\r\n488.096462999936 473.145284000318 3\r\n484.760805000085 472.528414000291 3\r\n488.708178000059 479.04041399993 3\r\n479.584040000103 466.580910000019 3\r\n484.286896999925 468.517949000001 3\r\n495.708829000127 468.662524000276 3\r\n498.306367999874 463.256289000157 3\r\n490.967838000041 466.609476000071 3\r\n489.33916899981 474.366789000109 
3\r\n488.481294000056 461.520047999918 3\r\n488.358967999928 461.594925000332 3\r\n489.752803000156 472.862289000303 3\r\n484.34510200005 472.196117000189 3\r\n488.994771999773 460.876666999888 3\r\n490.832779000048 468.55657200003 3\r\n490.678313000128 455.870581999887 3\r\n489.974518000148 462.944128000177 3\r\n497.031692999881 462.044984000269 3\r\n484.53761800006 457.245074999984 3\r\n488.48476300016 452.290405000094 3\r\n484.257629999891 461.404545000289 3\r\n496.414187000133 461.387347000185 3\r\n493.16295599984 454.375983000267 3\r\n491.825439999811 454.194912000094 3\r\n488.775708000176 463.2106949999 3\r\n481.173328999896 443.654618000146 3\r\n494.338107000105 439.560624999925 3\r\n485.760642999783 452.174839999992 3\r\n484.106364999898 445.605880000163 3\r\n487.434642000124 442.638418999966 3\r\n480.945791000035 443.996179000009 3\r\n485.675890999846 444.819050000049 3\r\n481.355332999956 449.149621000048 3\r\n495.079679000191 451.061327999923 3\r\n493.256802000105 441.983272999991 3\r\n488.721293999813 434.025133000221 3\r\n479.183360000141 430.097661999986 3\r\n484.496737999842 434.698993999977 3\r\n485.407277999911 439.268786000088 3\r\n485.903371999972 432.775454000104 3\r\n477.372338000219 438.229150000028 3\r\n474.560936000198 440.252245000098 3\r\n475.857702000067 432.993828000035 3\r\n482.089610999916 444.30068500014 3\r\n474.89568899991 432.706654000096 3\r\n478.456371000037 430.672968000174 3\r\n472.835409999825 425.344663999975 3\r\n469.408571000211 431.490079999901 3\r\n467.778229999822 430.893924999982 3\r\n468.319089999888 423.78520500008 3\r\n466.404827999882 430.545737999957 3\r\n474.656531999819 436.512926999945 3\r\n473.704237000085 434.295117999893 3\r\n469.99489099998 422.807921000291 3\r\n470.22989499988 431.990844999906 3\r\n468.125308000017 427.692645000294 3\r\n459.370287000202 434.228566000238 3\r\n465.574583999813 427.847649999894 3\r\n471.47603000002 424.857528999913 3\r\n458.833753000014 423.569963000249 3\r\n457.583232000005 
431.679839000106 3\r\n463.169875000138 428.454710999969 3\r\n466.314158000052 427.3446630002 3\r\n462.784620000049 433.005989999976 3\r\n458.836068000179 426.326979000121 3\r\n452.97005000012 423.382490000222 3\r\n458.167239000089 418.44309400022 3\r\n453.854613000061 421.438537000213 3\r\n457.294453000184 421.76099300012 3\r\n457.539230000228 421.425176999997 3\r\n456.41126499977 425.433713000268 3\r\n455.357557999901 424.094777000137 3\r\n450.902567999903 415.553141000215 3\r\n458.018389000092 418.812316999771 3\r\n455.648783999961 422.940563000273 3\r\n440.518015000038 417.753004999831 3\r\n438.012492000125 412.980132000055 3\r\n447.59070800012 422.962691000197 3\r\n438.595366000198 419.423899000045 3\r\n451.007958999835 415.56221299991 3\r\n447.837811000179 423.666684000287 3\r\n439.242604000028 423.527970999945 3\r\n440.404937000014 418.397346999962 3\r\n440.028320999816 416.051675000228 3\r\n447.287671000231 417.663943999913 3\r\n438.35898800008 414.573799000122 3\r\n426.627871000208 416.068214000203 3\r\n426.516830000095 414.542373000178 3\r\n428.430649999995 416.653237999883 3\r\n436.461378000211 416.150005999953 3\r\n440.34707200015 419.265943999868 3\r\n427.584884000011 408.791635999922 3\r\n439.871456000023 407.839774999768 3\r\n429.444986999966 412.304796000011 3\r\n430.937460999936 405.849741999991 3\r\n428.969322999939 415.703420999926 3\r\n423.408722999971 406.841905999929 3\r\n419.644838000182 405.488836999983 3\r\n427.421575000044 403.303555000108 3\r\n428.579102999996 410.244613000192 3\r\n428.30893300008 401.526786999777 3\r\n424.206778999884 408.167326000053 3\r\n424.488547000103 411.169675999787 3\r\n420.753368000034 409.300112999976 3\r\n421.141454000026 403.093392999843 3\r\n422.575889000203 395.963992000092 3\r\n415.008001000155 394.48915300006 3\r\n423.418938000221 402.424097000156 3\r\n420.429361999966 398.279748000205 3\r\n427.444771000184 400.071800000034 3\r\n414.952022999991 393.633016000036 3\r\n421.983272000216 390.16661400022 
3\r\n413.9837460001 393.733527000062 3\r\n426.244646999985 394.742223000154 3\r\n418.182521000039 399.909256999847 3\r\n420.358260999899 375.894729999825 3\r\n427.869353999849 388.467025999911 3\r\n428.307055000216 382.592672999948 3\r\n419.557771999855 382.19579999987 3\r\n428.50125099998 388.969442999922 3\r\n418.244741000235 379.359017999843 3\r\n417.434104999993 376.864184000064 3\r\n419.457435000222 387.456309999805 3\r\n428.179967000149 386.792590999976 3\r\n417.430131000001 385.002572000027 3\r\n422.263489999808 372.909155999776 3\r\n423.511117999908 364.854987000115 3\r\n426.30204299977 373.600267999806 3\r\n425.074638000224 369.74961300008 3\r\n419.22969500022 367.148517999798 3\r\n425.448702000082 367.063457999844 3\r\n427.935936000198 376.258545000106 3\r\n421.599621000234 378.26745699998 3\r\n429.803634000011 379.078275000211 3\r\n428.726489999797 366.422532000113 3\r\n429.293459999841 370.441796000116 3\r\n422.024081000127 365.716256999876 3\r\n422.080856999848 360.452484999783 3\r\n434.971530999988 367.278249000199 3\r\n436.256130999885 362.242738000117 3\r\n424.723292000126 369.823642999865 3\r\n432.911468999926 362.537899000105 3\r\n427.971251000185 357.049992000218 3\r\n434.024201000109 357.875876000151 3\r\n436.15306300018 369.883030999918 3\r\n440.152916000225 351.768310000189 3\r\n436.668070000131 360.135976000223 3\r\n430.671628000215 355.613342000172 3\r\n434.786071000155 355.149490999989 3\r\n435.98977799993 360.07734999992 3\r\n442.275245999917 353.242513000034 3\r\n439.905375999864 348.219847000204 3\r\n444.840218999889 354.630938000046 3\r\n433.893869999796 358.638927000109 3\r\n430.86381699983 354.284175999928 3\r\n442.700879999902 346.138824999798 3\r\n446.513323000167 354.673012999818 3\r\n439.407722999807 357.137496000156 3\r\n446.086984000169 352.812396999914 3\r\n444.458507999778 354.214385000058 3\r\n448.701793000102 351.475484000053 3\r\n438.790421000216 347.653041000012 3\r\n449.342588 357.475145000033 3\r\n445.903801999986 
355.878285000101 3\r\n437.50270299986 349.007557000034 3\r\n450.747810999863 344.582192999776 3\r\n452.843367999885 344.103289999999 3\r\n454.279118999839 344.040136000142 3\r\n444.908001999836 347.374592000153 3\r\n453.119206999894 343.011994999833 3\r\n453.410854999907 336.931435000151 3\r\n445.570268999785 341.626203999855 3\r\n455.76273200009 344.204905999824 3\r\n453.301330999937 346.165316000115 3\r\n443.662442000117 343.893095999956 3\r\n450.592554000206 341.27203800017 3\r\n451.640319999773 344.317819000222 3\r\n457.559148000088 338.631022999994 3\r\n460.726425000001 345.278055999894 3\r\n451.824684000108 341.867209000047 3\r\n452.238888000138 331.685310000088 3\r\n452.757952000014 338.266270000022 3\r\n455.351141999941 333.413383999839 3\r\n448.535099999979 332.97164299991 3\r\n455.475157999899 341.506680000108 3\r\n460.855047000106 338.930375999771 3\r\n457.078596000094 330.556702999864 3\r\n453.813031000085 332.122059999965 3\r\n454.459245000035 335.722775999922 3\r\n463.812841000035 332.088324999902 3\r\n454.220263000112 336.861219000071 3\r\n453.876569999848 335.621797999833 3\r\n464.133146000095 340.191471000202 3\r\n463.772503000218 333.303650999907 3\r\n465.428991000168 337.623219000176 3\r\n468.293345999904 323.258880999871 3\r\n461.778946999926 325.126213999931 3\r\n459.180929000024 325.846413000021 3\r\n462.312686999794 332.107357000001 3\r\n470.991812000051 323.379896999802 3\r\n465.851267999969 322.656903000083 3\r\n460.669528000057 324.675607999787 3\r\n466.893639000133 330.128490000032 3\r\n468.453945999965 327.243917000014 3\r\n470.58936699992 324.017206999939 3\r\n472.794015999883 326.346493000165 3\r\n462.414896000177 327.932380000129 3\r\n472.940739999991 320.993077000137 3\r\n466.85614300007 329.479348999914 3\r\n470.819778999779 326.629040999804 3\r\n468.017585999798 319.480458999984 3\r\n474.199337000027 325.671914999839 3\r\n473.76885400014 317.551202999894 3\r\n461.43383899983 317.940142000094 3\r\n468.314850000199 324.931464999914 
3\r\n423.049947000109 461.602417000104 4\r\n415.882664999925 462.230109999888 4\r\n422.901542000007 463.280180000234 4\r\n424.773157000076 461.591086000204 4\r\n425.97969100019 463.723321999889 4\r\n417.142891000025 458.040789000224 4\r\n426.111376000103 451.80352600012 4\r\n414.850494000129 454.950883999933 4\r\n424.364312000107 456.30208500009 4\r\n414.640132000204 459.097361000255 4\r\n421.098943999968 465.900079000276 4\r\n415.243125999812 460.873907000292 4\r\n426.502336000092 463.425848000217 4\r\n419.553964000195 461.354702000041 4\r\n420.256450999994 462.199783000164 4\r\n426.286921999883 453.050778999925 4\r\n418.101619999856 455.696752000134 4\r\n418.435087999795 457.909536000341 4\r\n426.986647000071 459.493770000059 4\r\n422.758165999781 452.658585000318 4\r\n429.253837999888 454.370699000079 4\r\n426.566916000098 466.358785000164 4\r\n426.117418999784 457.360111000016 4\r\n423.799103000201 465.722519000061 4\r\n419.570911000017 454.25428700028 4\r\n425.907724000048 465.343441000208 4\r\n415.338547000196 462.592960000038 4\r\n425.639667000156 463.009326000232 4\r\n418.12036800012 464.431340000127 4\r\n421.191784000024 464.121416999958 4\r\n466.573479000013 388.56095399987 5\r\n473.242182999849 388.505501999985 5\r\n477.989756000228 380.362852999941 5\r\n468.493040999863 382.565651000012 5\r\n478.035753999837 383.804427000228 5\r\n465.900979999918 382.578755999915 5\r\n467.047054000199 376.6596789998 5\r\n465.586457999889 376.740127999801 5\r\n466.30907699978 384.941691999789 5\r\n466.533137999941 388.636098000221 5\r\n475.669772000052 382.141518999822 5\r\n479.811947000213 376.880171000026 5\r\n474.566833000164 384.886010999791 5\r\n468.031305999961 380.62018099986 5\r\n478.953275999986 386.168779000174 5\r\n477.305238999892 376.998159999959 5\r\n472.392535000108 376.989792999811 5\r\n479.798206999898 378.12027399987 5\r\n470.961451000068 377.676510000136 5\r\n466.5645750002 389.006446000189 5\r\n468.762294999789 378.853031999897 5\r\n477.20013799984 
386.128392999992 5\r\n477.790004000068 389.397962999996 5\r\n472.586430999916 381.000587000046 5\r\n467.618739999831 378.265711000189 5\r\n468.390726000071 390.262949999887 5\r\n474.892611999996 383.943504999857 5\r\n479.888063999824 382.734370000195 5\r\n470.796721999999 382.858122000005 5\r\n469.433087000158 382.949008000083 5\r\n473.588192000054 379.197623000015 5\r\n470.089219999965 388.71069200011 5\r\n479.300497000106 386.932823999785 5\r\n478.489159000106 378.533429000061 5\r\n475.063924999908 388.783824999817 5\r\n465.624900999945 377.243751999922 5\r\n477.853277999908 386.701561999973 5\r\n478.867014999967 376.112596000079 5\r\n466.944978000131 389.898415999953 5\r\n475.431175000034 381.42365900008 5\r\n397.510631000157 318.779397999868 6\r\n393.182192000095 311.378210000228 6\r\n405.59302699985 313.85961999977 6\r\n400.207744999789 314.678629000206 6\r\n399.142376000062 319.893732000142 6\r\n395.942474999931 312.430680999998 6\r\n403.492874000221 311.391166999936 6\r\n403.187113999855 312.641197999939 6\r\n394.807862999849 315.297671999782 6\r\n399.650669999886 318.326419000048 6\r\n398.619932999834 309.981873000041 6\r\n406.447836000007 313.045142000075 6\r\n394.85633599991 306.184570999816 6\r\n406.718776999973 314.228234999813 6\r\n396.452070999891 309.648914000019 6\r\n402.186852000188 312.914503999986 6\r\n398.009571999777 306.306824999861 6\r\n394.72197900014 300.856654000003 6\r\n401.503841000143 306.34066199977 6\r\n394.375514999963 301.732228000183 6\r\n408.25665100012 307.194850999862 6\r\n404.499704999849 310.703956000041 6\r\n398.461693999823 300.209472000133 6\r\n397.59167600004 302.597422000021 6\r\n411.33420599997 303.615460999776 6\r\n409.380115000065 301.257648999803 6\r\n411.549291000236 307.317329999991 6\r\n400.192313999869 299.548905999865 6\r\n408.531942000147 309.208424000069 6\r\n406.719988000114 301.970383999869 6\r\n410.583399999887 302.046819999814 6\r\n404.96436699992 312.242717999965 6\r\n401.784442000091 305.12842199998 
6\r\n410.006078999955 310.366615999956 6\r\n401.597389000002 313.42785899993 6\r\n414.056191000156 310.538738999981 6\r\n407.827461000066 315.803555000108 6\r\n414.649985000025 304.695712000132 6\r\n408.976344999857 302.572995000053 6\r\n402.320390000008 303.321086000185 6\r\n388.082185000181 312.699907000177 6\r\n389.509529999923 308.985661000013 6\r\n384.867300000042 306.26948300004 6\r\n390.55796500016 307.401546000037 6\r\n398.799327000044 310.043293000199 6\r\n387.063124000095 315.284988000058 6\r\n385.934568000026 303.611659999937 6\r\n387.964254000224 307.235400999896 6\r\n388.047687000129 312.992616999894 6\r\n389.964410000015 314.521449999884 6\r\n386.692520000041 295.829528000206 6\r\n394.290382000152 296.759353000205 6\r\n385.995306999888 309.215948000085 6\r\n397.548890999984 306.051862000022 6\r\n398.91705300007 301.137899999972 6\r\n398.024985000025 294.545830999967 6\r\n390.556464000139 299.987350000069 6\r\n395.581344999839 307.565301999915 6\r\n387.356068000197 306.987933999859 6\r\n388.168403000105 308.765101999976 6\r\n397.505044000223 306.712673999835 6\r\n395.604325999971 299.74879400013 6\r\n402.714682000224 300.474165999796 6\r\n406.542411999777 297.360909999814 6\r\n397.990000999998 293.174902999774 6\r\n403.867875999771 305.508750000037 6\r\n398.770523999818 304.696136999875 6\r\n399.030168999918 301.416486999951 6\r\n402.828209000174 300.463953999802 6\r\n399.719417999964 296.280708000064 6\r\n406.255197000224 297.83671199996 6\r\n410.982671999838 293.501579999924 6\r\n406.042142999824 304.310072000138 6\r\n398.845921 301.682546999771 6\r\n399.413953000214 302.254333000164 6\r\n399.175451999996 304.443797000218 6\r\n410.2477330002 296.46242700005 6\r\n408.364271000028 296.781977000181 6\r\n410.001900999807 296.184016000014 6\r\n398.705327000003 298.27604399994 6\r\n364.387544000056 443.955205000006 -1\r\n379.862741999794 467.835111000109 -1\r\n370.757472000085 513.521511000581 -1\r\n416.86848400021 524.035082000308 -1\r\n311.539015999995 
523.082759000361 -1\r\n284.837520000059 487.712935999967 -1\r\n265.537488000002 517.137047000229 -1\r\n265.406068000011 467.304047000129 -1\r\n331.243737000041 503.141515000258 -1\r\n393.481540999841 431.555767999962 -1\r\n455.248877999838 465.751176000107 -1\r\n433.485898999963 440.123194000218 -1\r\n341.20297600003 430.338018999901 -1\r\n361.197627000045 414.171761000063 -1\r\n392.739544999786 411.681011000182 -1\r\n362.739223000128 364.252307000104 -1\r\n380.037742000073 395.467621999793 -1\r\n400.507069000043 365.464221000206 -1\r\n377.944128999952 336.997504000086 -1\r\n416.090559999924 338.672298000194 -1\r\n444.825889000203 307.228933999781 -1\r\n493.477363000158 300.292936999816 -1\r\n450.57735500019 285.058660999872 -1\r\n487.091688999906 274.915289999917 -1\r\n428.451733000111 271.475777999964 -1\r\n353.462441999931 268.77300899988 -1\r\n332.63990199985 300.980235999916 -1\r\n294.568130999804 269.832191000227 -1\r\n286.768153000157 309.40664099995 -1\r\n254.155410999898 273.011171999853 -1\r\n255.736754999962 383.043938999996 -1\r\n255.578135000076 321.797772999853 -1\r\n269.571206999943 291.691631999798 -1\r\n318.953784000129 371.279631999787 -1\r\n322.208068999927 419.043138000183 -1\r\n302.684162000194 410.35325299995 -1\r\n339.387234999798 367.536927999929 -1\r\n297.637213999871 384.627384999767 -1\r\n311.072525999974 393.566060000099 -1\r\n377.536865999922 377.286367000081 -1\r\n417.072476999834 433.68673299998 -1\r\n396.357729999814 452.929053000174 -1\r\n399.783197000157 485.216498000082 -1\r\n362.771583000198 492.148952000309 -1\r\n308.225172999781 492.362728999928 -1\r\n286.942470000125 516.371205000207 -1\r\n264.795897000004 498.772009999957 -1\r\n454.217315000016 522.212020000443 -1\r\n485.410180000123 522.141029000282 -1\r\n515.508918000385 520.376193000004 -1\r\n515.950048999861 493.279626999982 -1\r\n517.4182099998 456.858341000043 -1\r\n517.196076000109 419.766627999954 -1\r\n519.392636000179 382.963473000098 -1\r\n517.66203899961 
356.089052000083 -1\r\n520.467551999725 314.412516000215 -1\r\n518.004772000015 280.905362999998 -1\r\n499.387498000171 337.548436000012 -1\r\n478.931082000025 356.823363000061 -1\r\n497.154240000062 415.408559999894 -1\r\n497.773620999884 377.833358000033 -1\r\n445.298427000176 391.178739000112 -1\r\n470.157571999822 409.287219999824 -1\r\n357.421777000185 314.294974999968 -1\r\n384.430027999915 277.063980999868 -1\r\n268.806625999976 337.208126999903 -1\r\n258.773949999828 431.312564000022 -1"
  },
  {
    "path": "data_src/data_DBCV/read_data.R",
    "content": "library(dbscan)\n\n\nx <- read.table(\"Work/data_DBCV/dataset_1.txt\")\ncolnames(x) <- c(\"x\", \"y\", \"class\")\n\ncl <- x[, 3]\ncl[cl < 0] <- 0\nx[, 3] <- cl\n\nplot(x[, 1:2], col = x[, 3] + 1L, asp = 1)\n\nDataset_1 <- x\nsave(Dataset_1, file=\"data/Dataset_1.rda\", version = 2)\n\nx <- read.table(\"Work/data_DBCV/dataset_2.txt\")\ncolnames(x) <- c(\"x\", \"y\", \"class\")\n\ncl <- x[, 3]\ncl[cl < 0] <- 0\nx[, 3] <- cl\n\nclplot(x[, 1:2], x[, 3])\n\nDataset_2 <- x\nsave(Dataset_2, file=\"data/Dataset_2.rda\", version = 2)\n\n\nx <- read.table(\"Work/data_DBCV/dataset_3.txt\")\ncolnames(x) <- c(\"x\", \"y\", \"class\")\n\ncl <- x[, 3]\ncl[cl < 0] <- 0\nx[, 3] <- cl\n\nclplot(x[, 1:2], x[, 3])\n\nDataset_3 <- x\nsave(Dataset_3, file=\"data/Dataset_3.rda\", version = 2)\n\nx <- read.table(\"Work/data_DBCV/dataset_4.txt\")\ncolnames(x) <- c(\"x\", \"y\", \"class\")\n\ncl <- x[, 3]\ncl[cl < 0] <- 0\nx[, 3] <- cl\n\nclplot(x[, 1:2], x[, 3])\n\nDataset_4 <- x\nsave(Dataset_4, file=\"data/Dataset_4.rda\", version = 2)\n"
  },
  {
    "path": "data_src/data_DBCV/test_DBCV.R",
    "content": "# From: https://github.com/FelSiq/DBCV\n#\n# Dataset\tPython (Scipy's Kruskal's)\tPython (Translated MST algorithm)\tMATLAB\n# dataset_1.txt\t0.8566\t0.8576\t0.8576\n# dataset_2.txt\t0.5405\t0.8103\t0.8103\n# dataset_3.txt\t0.6308\t0.6319\t0.6319\n# dataset_4.txt\t0.8456\t0.8688\t0.8688\n#\n# Original MATLAB implementation is at:\n#     https://github.com/pajaskowiak/dbcv/tree/main/data\n\n\nres <- c()\n\ndata(Dataset_1)\nx <- Dataset_1[, c(\"x\", \"y\")]\nclass <- Dataset_1$class\n#clplot(x, class)\n(db <- dbcv(x, class, metric = \"sqeuclidean\"))\nres[\"ds1\"] <- db$score\n\n\n#dsc [0.00457826 0.00457826 0.0183068  0.0183068 ]\n#dspc [0.85627898 0.85627898 0.85627898 0.85627898]\n#vcs [0.99465331 0.99465331 0.97862052 0.97862052]\n#0.8575741400490697\n\ndata(Dataset_2)\nx <- Dataset_2[, c(\"x\", \"y\")]\nclass <- Dataset_2$class\n#clplot(x, class)\n(db <- dbcv(x, class, metric = \"sqeuclidean\"))\nres[\"ds2\"] <- db$score\n\n#dsc [19.06151967 15.6082     83.71522964 68.969     ]\n#dspc [860.2538 501.4376 501.4376 860.2538]\n#vcs [0.97784198 0.9688731  0.83304956 0.91982715]\n#0.8103343589093096\n\n\ndata(Dataset_3)\nx <- Dataset_3[, c(\"x\", \"y\")]\nclass <- Dataset_3$class\n#clplot(x, class)\n(db <- dbcv(x, class, metric = \"sqeuclidean\"))\nres[\"ds3\"] <- db$score\n\ndata(Dataset_4)\nx <- Dataset_4[, c(\"x\", \"y\")]\nclass <- Dataset_4$class\n#clplot(x, class)\n(db <- dbcv(x, class, metric = \"sqeuclidean\"))\nres[\"ds4\"] <- db$score\n\ncbind(dbscan = round(res, 2), MATLAB = c(0.85, 0.81, 0.63, 0.87))\n\n"
  },
  {
    "path": "data_src/data_chameleon/read.R",
    "content": "# Source: http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download\n\nchameleon_ds4 <- read.table(\"t4.8k.dat\")\nchameleon_ds5 <- read.table(\"t5.8k.dat\")\nchameleon_ds7 <- read.table(\"t7.10k.dat\")\nchameleon_ds8 <- read.table(\"t8.8k.dat\")\n\ncolnames(chameleon_ds4) <- colnames(chameleon_ds5) <- colnames(chameleon_ds7) <- colnames(chameleon_ds8) <- c(\"x\", \"y\")\n\nplot(chameleon_ds4)\nplot(chameleon_ds5)\nplot(chameleon_ds7)\nplot(chameleon_ds8)\n\nsave(chameleon_ds4, chameleon_ds5, chameleon_ds7, chameleon_ds8, \n     file=\"Chameleon.rda\")\n"
  },
  {
    "path": "dbscan.Rproj",
    "content": "Version: 1.0\nProjectId: 6c2ba941-cfaa-4faa-ba72-88eeef0391b8\n\nRestoreWorkspace: Default\nSaveWorkspace: Default\nAlwaysSaveHistory: Default\n\nEnableCodeIndexing: Yes\nUseSpacesForTab: Yes\nNumSpacesForTab: 2\nEncoding: UTF-8\n\nRnwWeave: Sweave\nLaTeX: pdfLaTeX\n\nAutoAppendNewline: Yes\nStripTrailingWhitespace: Yes\n\nBuildType: Package\nPackageUseDevtools: Yes\nPackageCleanBeforeInstall: No\nPackageInstallArgs: --no-multiarch --with-keep.source\nPackageBuildArgs: --compact-vignettes=both\nPackageCheckArgs: --as-cran\nPackageRoxygenize: rd,collate,namespace\n"
  },
  {
    "path": "inst/CITATION",
    "content": "citation(auto = meta)\r\n\r\nbibentry(bibtype = \"Article\",\r\n  title        = \"{dbscan}: Fast Density-Based Clustering with {R}\",\r\n  author       = c(person(given = \"Michael\",\r\n                          family = \"Hahsler\",\r\n                          email = \"mhahsler@lyle.smu.edu\",\r\n\t\t\t  comment = c(ORCID = \"0000-0003-2716-1405\")),\r\n                   person(given = \"Matthew\",\r\n                          family = \"Piekenbrock\"),\r\n                   person(given = \"Derek\",\r\n                          family = \"Doran\",\r\n                          email = \"derek.doran@wright.edu\")),\r\n  journal      = \"Journal of Statistical Software\",\r\n  year         = \"2019\",\r\n  volume       = \"91\",\r\n  number       = \"1\",\r\n  pages        = \"1--30\",\r\n  doi          = \"10.18637/jss.v091.i01\",\r\n  header       = \"To cite dbscan in publications use:\"\r\n)\r\n\r\n"
  },
  {
    "path": "man/DBCV_datasets.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DBCV_datasets.R\n\\docType{data}\n\\name{DBCV_datasets}\n\\alias{DBCV_datasets}\n\\alias{Dataset_1}\n\\alias{Dataset_2}\n\\alias{Dataset_3}\n\\alias{Dataset_4}\n\\title{DBCV Paper Datasets}\n\\format{\nFour data frames with the following 3 variables.\n\\describe{\n\\item{x}{a numeric vector}\n\\item{y}{a numeric vector}\n\\item{class}{an integer vector indicating the class label. 0 means noise.} }\n}\n\\source{\nhttps://github.com/pajaskowiak/dbcv\n}\n\\description{\nThe four synthetic 2D datasets used in Moulavi et al (2014).\n}\n\\examples{\ndata(\"Dataset_1\")\nclplot(Dataset_1[, c(\"x\", \"y\")], cl = Dataset_1$class)\n\ndata(\"Dataset_2\")\nclplot(Dataset_2[, c(\"x\", \"y\")], cl = Dataset_2$class)\n\ndata(\"Dataset_3\")\nclplot(Dataset_3[, c(\"x\", \"y\")], cl = Dataset_3$class)\n\ndata(\"Dataset_4\")\nclplot(Dataset_4[, c(\"x\", \"y\")], cl = Dataset_4$class)\n}\n\\references{\nDavoud Moulavi and Pablo A. Jaskowiak and\nRicardo J. G. B. Campello and Arthur Zimek and Jörg Sander (2014).\nDensity-Based Clustering Validation. In\n\\emph{Proceedings of the 2014 SIAM International Conference on Data Mining,}\npages 839-847\n\\doi{10.1137/1.9781611973440.96}\n}\n\\keyword{datasets}\n"
  },
  {
    "path": "man/DS3.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DS3.R\n\\docType{data}\n\\name{DS3}\n\\alias{DS3}\n\\title{DS3: Spatial data with arbitrary shapes}\n\\format{\nA data.frame with 8000 observations on the following 2 columns:\n\\describe{\n\\item{X}{a numeric vector}\n\\item{Y}{a numeric vector}\n}\n}\n\\source{\nObtained from \\url{http://cs.joensuu.fi/sipu/datasets/}\n}\n\\description{\nContains 8000 2-d points, with 6 \"natural\" looking shapes, all of which have\nan sinusoid-like shape that intersects with each cluster.\nThe data set was originally used as a benchmark data set for the Chameleon clustering\nalgorithm (Karypis, Han and Kumar, 1999) to\nillustrate the a data set containing arbitrarily shaped\nspatial data surrounded by both noise and artifacts.\n}\n\\examples{\ndata(DS3)\nplot(DS3, pch = 20, cex = 0.25)\n}\n\\references{\nKarypis, George, Eui-Hong Han, and Vipin Kumar (1999).\nChameleon: Hierarchical clustering using dynamic modeling. \\emph{Computer}\n32(8): 68-75.\n}\n\\keyword{datasets}\n"
  },
  {
    "path": "man/NN.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/NN.R\n\\name{NN}\n\\alias{NN}\n\\alias{adjacencylist}\n\\alias{adjacencylist.NN}\n\\alias{sort.NN}\n\\alias{plot.NN}\n\\title{NN --- Nearest Neighbors Superclass}\n\\usage{\nadjacencylist(x, ...)\n\n\\method{adjacencylist}{NN}(x, ...)\n\n\\method{sort}{NN}(x, decreasing = FALSE, ...)\n\n\\method{plot}{NN}(x, data, main = NULL, pch = 16, col = NULL, linecol = \"gray\", ...)\n}\n\\arguments{\n\\item{x}{a \\code{NN} object}\n\n\\item{...}{further parameters past on to \\code{\\link[=plot]{plot()}}.}\n\n\\item{decreasing}{sort in decreasing order?}\n\n\\item{data}{that was used to create \\code{x}}\n\n\\item{main}{title}\n\n\\item{pch}{plotting character.}\n\n\\item{col}{color used for the data points (nodes).}\n\n\\item{linecol}{color used for edges.}\n}\n\\description{\nNN is an abstract S3 superclass for the classes of the objects returned\nby \\code{\\link[=kNN]{kNN()}}, \\code{\\link[=frNN]{frNN()}} and \\code{\\link[=sNN]{sNN()}}. 
Methods for sorting, plotting and getting an\nadjacency list are defined.\n}\n\\section{Subclasses}{\n\n\\link{kNN}, \\link{frNN} and \\link{sNN}\n}\n\n\\examples{\ndata(iris)\nx <- iris[, -5]\n\n# finding kNN directly in data (using a kd-tree)\nnn <- kNN(x, k = 5)\nnn\n\n# plot the kNN where NNs are shown as lines connecting points.\nplot(nn, x)\n\n# show the first few elements of the adjacency list\nhead(adjacencylist(nn))\n\n\\dontrun{\n# create a graph and find connected components (if igraph is installed)\nlibrary(\"igraph\")\ng <- graph_from_adj_list(adjacencylist(nn))\ncomp <- components(g)\nplot(x, col = comp$membership)\n\n# detect clusters (communities) with the label propagation algorithm\ncl <- membership(cluster_label_prop(g))\nplot(x, col = cl)\n}\n}\n\\seealso{\nOther NN functions: \n\\code{\\link{comps}()},\n\\code{\\link{frNN}()},\n\\code{\\link{kNN}()},\n\\code{\\link{kNNdist}()},\n\\code{\\link{sNN}()}\n}\n\\author{\nMichael Hahsler\n}\n\\concept{NN functions}\n\\keyword{model}\n"
  },
  {
    "path": "man/comps.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/comps.R\n\\name{comps}\n\\alias{comps}\n\\alias{components}\n\\alias{comps.dist}\n\\alias{comps.kNN}\n\\alias{comps.sNN}\n\\alias{comps.frNN}\n\\title{Find Connected Components in a Nearest-neighbor Graph}\n\\usage{\ncomps(x, ...)\n\n\\method{comps}{dist}(x, eps, ...)\n\n\\method{comps}{kNN}(x, mutual = FALSE, ...)\n\n\\method{comps}{sNN}(x, ...)\n\n\\method{comps}{frNN}(x, ...)\n}\n\\arguments{\n\\item{x}{the \\link{NN} object representing the graph or a \\link{dist} object}\n\n\\item{...}{further arguments are currently unused.}\n\n\\item{eps}{threshold on the distance}\n\n\\item{mutual}{for a pair of points, do both have to be in each other's neighborhood?}\n}\n\\value{\nan integer vector with component assignments.\n}\n\\description{\nGeneric function and methods to find connected components in nearest neighbor graphs.\n}\n\\details{\nNote that for kNN graphs, one point may be in the kNN of the other but nor vice versa.\n\\code{mutual = TRUE} requires that both points are in each other's kNN.\n}\n\\examples{\nset.seed(665544)\nn <- 100\nx <- cbind(\n  x=runif(10, 0, 5) + rnorm(n, sd = 0.4),\n  y=runif(10, 0, 5) + rnorm(n, sd = 0.4)\n  )\nplot(x, pch = 16)\n\n# Connected components on a graph where each pair of points\n# with a distance less or equal to eps are connected\nd <- dist(x)\ncomponents <- comps(d, eps = .8)\nplot(x, col = components, pch = 16)\n\n# Connected components in a fixed radius nearest neighbor graph\n# Gives the same result as the threshold on the distances above\nfrnn <- frNN(x, eps = .8)\ncomponents <- comps(frnn)\nplot(frnn, data = x, col = components)\n\n# Connected components on a k nearest neighbors graph\nknn <- kNN(x, 3)\ncomponents <- comps(knn, mutual = FALSE)\nplot(knn, data = x, col = components)\n\ncomponents <- comps(knn, mutual = TRUE)\nplot(knn, data = x, col = components)\n\n# Connected components in a shared nearest neighbor 
graph\nsnn <- sNN(x, k = 10, kt = 5)\ncomponents <- comps(snn)\nplot(snn, data = x, col = components)\n}\n\\seealso{\nOther NN functions: \n\\code{\\link{NN}},\n\\code{\\link{frNN}()},\n\\code{\\link{kNN}()},\n\\code{\\link{kNNdist}()},\n\\code{\\link{sNN}()}\n}\n\\author{\nMichael Hahsler\n}\n\\concept{NN functions}\n\\keyword{model}\n"
  },
  {
    "path": "man/dbcv.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/dbcv.R\n\\name{dbcv}\n\\alias{dbcv}\n\\alias{DBCV}\n\\title{Density-Based Clustering Validation Index (DBCV)}\n\\usage{\ndbcv(x, cl, d, metric = \"euclidean\", sample = NULL)\n}\n\\arguments{\n\\item{x}{a data matrix or a dist object.}\n\n\\item{cl}{a clustering (e.g., a integer vector)}\n\n\\item{d}{dimensionality of the original data if a dist object is provided.}\n\n\\item{metric}{distance metric used. The available metrics are the methods\nimplemented by \\code{dist()} plus \\code{\"sqeuclidean\"} for the squared\nEuclidean distance used in the original DBCV implementation.}\n\n\\item{sample}{sample size used for large datasets.}\n}\n\\value{\nA list with the DBCV \\code{score} for the clustering,\nthe density sparseness of cluster (\\code{dsc}) values,\nthe density separation of pairs of clusters (\\code{dspc}) distances,\nand the validity indices of clusters (\\code{c_c}).\n}\n\\description{\nCalculate the Density-Based Clustering Validation Index (DBCV)  for a\nclustering.\n}\n\\details{\nDBCV (Moulavi et al, 2014) computes a score based on the density sparseness of each cluster\nand the density separation of each pair of clusters.\n\nThe density sparseness of a cluster (DSC) is deﬁned as the maximum edge weight of\na minimal spanning tree for the internal points of the cluster using the mutual\nreachability distance based on the all-points-core-distance. Internal points\nare connected to more than one other point in the cluster. 
Since clusters of\na size less than 3 cannot have internal points, they are ignored (considered\nnoise) in this implementation.\n\nThe density separation of a pair of clusters (DSPC)\nis defined as the minimum reachability distance between the internal nodes of\nthe spanning trees of the two clusters.\n\nThe validity index for a cluster is calculated using these measures and aggregated\nto a validity index for the whole clustering using a weighted average.\n\nThe index is in the range \\eqn{[-1,1]}. If the cluster density compactness is better\nthan the density separation, a positive value is returned. The actual value depends\non the separability of the data. In general, greater values\nof the measure indicate a better density-based clustering solution.\n\nNoise points are included in the calculation only in the weighted average,\ntherefore a clustering with more noise points will get a lower index.\n\n\\strong{Performance note:} This implementation calculates a distance matrix and thus\ncan only be used for small or sampled datasets.\n}\n\\examples{\n# Load a test dataset\ndata(Dataset_1)\nx <- Dataset_1[, c(\"x\", \"y\")]\nclass <- Dataset_1$class\n\nclplot(x, class)\n\n# We use MinPts 3 and use the knee at eps = .1 for dbscan\nkNNdistplot(x, minPts = 3)\n\ncl <- dbscan(x, eps = .1, minPts = 3)\nclplot(x, cl)\n\ndbcv(x, cl)\n\n# compare to the DBCV index on the original class labels and\n# with a random partitioning\ndbcv(x, class)\ndbcv(x, sample(1:4, replace = TRUE, size = nrow(x)))\n\n# find the best eps using dbcv\neps_grid <- seq(.05, .2, by = .01)\ncls <- lapply(eps_grid, FUN = function(e) dbscan(x, eps = e, minPts = 3))\ndbcvs <- sapply(cls, FUN = function(cl) dbcv(x, cl)$score)\n\nplot(eps_grid, dbcvs, type = \"l\")\n\neps_opt <- eps_grid[which.max(dbcvs)]\neps_opt\n\ncl <- dbscan(x, eps = eps_opt, minPts = 3)\nclplot(x, cl)\n}\n\\references{\nDavoud Moulavi and Pablo A. Jaskowiak and\nRicardo J. G. B. 
Campello and Arthur Zimek and Jörg Sander (2014).\nDensity-Based Clustering Validation. In\n\\emph{Proceedings of the 2014 SIAM International Conference on Data Mining,}\npages 839-847\n\\doi{10.1137/1.9781611973440.96}\n\nPablo A. Jaskowiak (2022). MATLAB implementation of DBCV.\n\\url{https://github.com/pajaskowiak/dbcv}\n}\n\\author{\nMatt Piekenbrock and Michael Hahsler\n}\n\\concept{Evaluation Functions}\n"
  },
  {
    "path": "man/dbscan-package.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/AAA_dbscan-package.R\n\\docType{package}\n\\name{dbscan-package}\n\\alias{dbscan-package}\n\\title{dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms}\n\\description{\nA fast reimplementation of several density-based algorithms of the DBSCAN family. Includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise) and HDBSCAN (hierarchical DBSCAN), the ordering algorithm OPTICS (ordering points to identify the clustering structure), shared nearest neighbor clustering, and the outlier detection algorithms LOF (local outlier factor) and GLOSH (global-local outlier score from hierarchies). The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided. Hahsler, Piekenbrock and Doran (2019) \\doi{10.18637/jss.v091.i01}.\n}\n\\section{Key functions}{\n\n\\itemize{\n\\item Clustering: \\code{\\link[=dbscan]{dbscan()}}, \\code{\\link[=hdbscan]{hdbscan()}}, \\code{\\link[=optics]{optics()}}, \\code{\\link[=jpclust]{jpclust()}}, \\code{\\link[=sNNclust]{sNNclust()}}\n\\item Outliers: \\code{\\link[=lof]{lof()}}, \\code{\\link[=glosh]{glosh()}}, \\code{\\link[=pointdensity]{pointdensity()}}\n\\item Nearest Neighbors: \\code{\\link[=kNN]{kNN()}}, \\code{\\link[=frNN]{frNN()}}, \\code{\\link[=sNN]{sNN()}}\n}\n}\n\n\\references{\nHahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based Clustering with R. Journal of Statistical Software, 91(1), 1-30. 
\\doi{10.18637/jss.v091.i01}\n}\n\\seealso{\nUseful links:\n\\itemize{\n  \\item \\url{https://github.com/mhahsler/dbscan}\n  \\item Report bugs at \\url{https://github.com/mhahsler/dbscan/issues}\n}\n\n}\n\\author{\n\\strong{Maintainer}: Michael Hahsler \\email{mhahsler@lyle.smu.edu} (\\href{https://orcid.org/0000-0003-2716-1405}{ORCID}) [copyright holder]\n\nAuthors:\n\\itemize{\n  \\item Matthew Piekenbrock [copyright holder]\n}\n\nOther contributors:\n\\itemize{\n  \\item Sunil Arya [contributor, copyright holder]\n  \\item David Mount [contributor, copyright holder]\n  \\item Claudia Malzer [contributor]\n}\n\n}\n\\keyword{internal}\n"
  },
  {
    "path": "man/dbscan.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/dbscan.R, R/predict.R\n\\name{dbscan}\n\\alias{dbscan}\n\\alias{DBSCAN}\n\\alias{print.dbscan_fast}\n\\alias{is.corepoint}\n\\alias{predict.dbscan_fast}\n\\title{Density-based Spatial Clustering of Applications with Noise (DBSCAN)}\n\\usage{\ndbscan(x, eps, minPts = 5, weights = NULL, borderPoints = TRUE, ...)\n\nis.corepoint(x, eps, minPts = 5, ...)\n\n\\method{predict}{dbscan_fast}(object, newdata, data, ...)\n}\n\\arguments{\n\\item{x}{a data matrix, a data.frame, a \\link{dist} object or a \\link{frNN} object with\nfixed-radius nearest neighbors.}\n\n\\item{eps}{size (radius) of the epsilon neighborhood. Can be omitted if\n\\code{x} is a frNN object.}\n\n\\item{minPts}{minimum number of points required in the eps neighborhood for\ncore points (including the point itself).}\n\n\\item{weights}{numeric; weights for the data points. Only needed to perform\nweighted clustering.}\n\n\\item{borderPoints}{logical; should border points be assigned to clusters?\nThe default is \\code{TRUE} for regular DBSCAN. If \\code{FALSE} then border\npoints are considered noise (see DBSCAN* in Campello et al, 2013).}\n\n\\item{...}{additional arguments are passed on to the fixed-radius nearest\nneighbor search algorithm. See \\code{\\link[=frNN]{frNN()}} for details on how to\ncontrol the search strategy.}\n\n\\item{object}{clustering object.}\n\n\\item{newdata}{new data points for which the cluster membership should be\npredicted.}\n\n\\item{data}{the data set used to create the clustering object.}\n}\n\\value{\n\\code{dbscan()} returns an object of class \\code{dbscan_fast} with the following components:\n\n\\item{eps }{ value of the \\code{eps} parameter.}\n\\item{minPts }{ value of the \\code{minPts} parameter.}\n\\item{metric }{ used distance metric.}\n\\item{cluster }{An integer vector with cluster assignments. 
Zero indicates noise points.}\n\n\\code{is.corepoint()} returns a logical vector indicating for each data point if it is a\ncore point.\n}\n\\description{\nFast reimplementation of the DBSCAN (Density-based spatial clustering of\napplications with noise) clustering algorithm using a kd-tree.\n}\n\\details{\nThe\nimplementation is significantly faster and can work with larger data sets\nthan \\code{\\link[fpc:dbscan]{fpc::dbscan()}} in \\pkg{fpc}. Use \\code{dbscan::dbscan()} (specifying the package) to\ncall this implementation when you also load package \\pkg{fpc}.\n\n\\strong{The algorithm}\n\nThis implementation of DBSCAN follows the original\nalgorithm as described by Ester et al (1996). DBSCAN performs the following steps:\n\\enumerate{\n\\item Estimate the density\naround each data point by counting the number of points in a user-specified\neps-neighborhood and apply a user-specified minPts threshold to identify\n\\itemize{\n\\item core points (points with at least minPts points in their neighborhood),\n\\item border points (non-core points with a core point in their neighborhood) and\n\\item noise points (all other points).\n}\n\\item Core points form the backbone of clusters by joining them into\na cluster if they are density-reachable from each other (i.e., there is a chain of core\npoints where one falls inside the eps-neighborhood of the next).\n\\item Border points are assigned to clusters. The algorithm needs parameters\n\\code{eps} (the radius of the epsilon neighborhood) and \\code{minPts} (the\ndensity threshold).\n}\n\nBorder points are arbitrarily assigned to clusters in the original\nalgorithm. DBSCAN* (see Campello et al 2013) treats all border points as\nnoise points. 
This is implemented with \\code{borderPoints = FALSE}.\n\n\\strong{Specifying the data}\n\nIf \\code{x} is a matrix or a data.frame, then fast fixed-radius nearest\nneighbor computation using a kd-tree is performed using Euclidean distance.\nSee \\code{\\link[=frNN]{frNN()}} for more information on the parameters related to\nnearest neighbor search. \\strong{Note} that only numerical values are allowed in \\code{x}.\n\nAny precomputed distance matrix (dist object) can be specified as \\code{x}.\nYou may run into memory issues since distance matrices are large.\n\nA precomputed frNN object can be supplied as \\code{x}. In this case\n\\code{eps} does not need to be specified. This option is useful for large\ndata sets, where a sparse distance matrix is available. See\n\\code{\\link[=frNN]{frNN()}} for how to create frNN objects.\n\n\\strong{Setting parameters for DBSCAN}\n\nThe parameters \\code{minPts} and \\code{eps} define the minimum density required\nin the area around core points which form the backbone of clusters.\n\\code{minPts} is the number of points\nrequired in the neighborhood around the point defined by the parameter \\code{eps}\n(i.e., the radius around the point). Both parameters\ndepend on each other and changing one typically requires changing\nthe other one as well. The parameters also depend on the size of the data set with\nlarger datasets requiring a larger \\code{minPts} or a smaller \\code{eps}.\n\\itemize{\n\\item \\verb{minPts:} The original\nDBSCAN paper (Ester et al, 1996) suggests starting by setting \\eqn{\\text{minPts} \\ge d + 1},\nthe data dimensionality plus one or higher with a minimum of 3. 
Larger values\nare preferable since increasing the parameter suppresses more noise in the data\nby requiring more points to form clusters.\nSander et al (1998) use two times the data dimensionality in their examples.\nNote that setting \\eqn{\\text{minPts} \\le 2} is equivalent to hierarchical clustering\nwith the single link metric and the dendrogram cut at height \\code{eps}.\n\\item \\verb{eps:} A suitable neighborhood size\nparameter \\code{eps} given a fixed value for \\code{minPts} can be found\nvisually by inspecting the \\code{\\link[=kNNdistplot]{kNNdistplot()}} of the data using\n\\eqn{k = \\text{minPts} - 1} (\\code{minPts} includes the point itself, while the\nk-nearest neighbors distance does not). The k-nearest neighbor distance plot\nsorts all data points by their k-nearest neighbor distance. A sudden\nincrease of the kNN distance (a knee) indicates that the points to the right\nare most likely outliers. Choose \\code{eps} for DBSCAN where the knee is.\n}\n\n\\strong{Predict cluster memberships}\n\n\\code{\\link[=predict]{predict()}} can be used to predict cluster memberships for new data\npoints. A point is considered a member of a cluster if it is within the eps\nneighborhood of a core point of the cluster. Points\nwhich cannot be assigned to a cluster will be reported as\nnoise points (i.e., cluster ID 0).\n\\strong{Important note:} \\code{predict()} currently can only use Euclidean distance to determine\nthe neighborhood of core points. If \\code{dbscan()} was called using distances other than Euclidean,\nthen the neighborhood calculation will not be correct and only approximated by Euclidean\ndistances. If the data contain factor columns (e.g., using Gower's distance), then\nthe factors in \\code{data} and \\code{newdata} first need to be converted to numeric to use the\nEuclidean approximation.\n}\n\\examples{\n## Example 1: use dbscan on the iris data set\ndata(iris)\niris <- as.matrix(iris[, 1:4])\n\n## Find suitable DBSCAN parameters:\n## 1. 
We use minPts = dim + 1 = 5 for iris. A larger value can also be used.\n## 2. We inspect the k-NN distance plot for k = minPts - 1 = 4\nkNNdistplot(iris, minPts = 5)\n\n## Noise seems to start around a 4-NN distance of .7\nabline(h=.7, col = \"red\", lty = 2)\n\n## Cluster with the chosen parameters\nres <- dbscan(iris, eps = .7, minPts = 5)\nres\n\npairs(iris, col = res$cluster + 1L)\nclplot(iris, res)\n\n## Use a precomputed frNN object\nfr <- frNN(iris, eps = .7)\ndbscan(fr, minPts = 5)\n\n## Example 2: use data from fpc\nset.seed(665544)\nn <- 100\nx <- cbind(\n  x = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n  y = runif(10, 0, 10) + rnorm(n, sd = 0.2)\n  )\n\nres <- dbscan(x, eps = .3, minPts = 3)\nres\n\n## plot clusters and add noise (cluster 0) as crosses.\nplot(x, col = res$cluster)\npoints(x[res$cluster == 0, ], pch = 3, col = \"grey\")\n\nclplot(x, res)\nhullplot(x, res)\n\n## Predict cluster membership for new data points\n## (Note: 0 means it is predicted as noise)\nnewdata <- x[1:5,] + rnorm(10, 0, .3)\nhullplot(x, res)\npoints(newdata, pch = 3, col = \"red\", lwd = 3)\ntext(newdata, pos = 1)\n\npred_label <- predict(res, newdata, data = x)\npred_label\npoints(newdata, col = pred_label + 1L,  cex = 2, lwd = 2)\n\n## Compare speed against fpc version (if microbenchmark is installed)\n## Note: we use dbscan::dbscan to make sure that we do not run the\n## implementation in fpc.\n\\dontrun{\nif (requireNamespace(\"fpc\", quietly = TRUE) &&\n    requireNamespace(\"microbenchmark\", quietly = TRUE)) {\n  t_dbscan <- microbenchmark::microbenchmark(\n    dbscan::dbscan(x, .3, 3), times = 10, unit = \"ms\")\n  t_dbscan_linear <- microbenchmark::microbenchmark(\n    dbscan::dbscan(x, .3, 3, search = \"linear\"), times = 10, unit = \"ms\")\n  t_dbscan_dist <- microbenchmark::microbenchmark(\n    dbscan::dbscan(x, .3, 3, search = \"dist\"), times = 10, unit = \"ms\")\n  t_fpc <- microbenchmark::microbenchmark(\n    fpc::dbscan(x, .3, 3), times = 10, unit = 
\"ms\")\n\n  r <- rbind(t_fpc, t_dbscan_dist, t_dbscan_linear, t_dbscan)\n  r\n\n  boxplot(r,\n    names = c('fpc', 'dbscan (dist)', 'dbscan (linear)', 'dbscan (kdtree)'),\n    main = \"Runtime comparison in ms\")\n\n  ## speedup of the kd-tree-based version compared to the fpc implementation\n  median(t_fpc$time) / median(t_dbscan$time)\n}}\n\n## Example 3: manually create a frNN object for dbscan (dbscan only needs ids and eps)\nnn <- structure(list(id = list(c(2,3), c(1,3), c(1,2,3), c(3,5), c(4,5)), eps = 1),\n  class =  c(\"NN\", \"frNN\"))\nnn\ndbscan(nn, minPts = 2)\n\n}\n\\references{\nHahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast\nDensity-Based Clustering with R.  \\emph{Journal of Statistical Software,}\n91(1), 1-30.\n\\doi{10.18637/jss.v091.i01}\n\nMartin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). A\nDensity-Based Algorithm for Discovering Clusters in Large Spatial Databases\nwith Noise. Institute for Computer Science, University of Munich.\n\\emph{Proceedings of 2nd International Conference on Knowledge Discovery and\nData Mining (KDD-96),} 226-231.\n\\url{https://dl.acm.org/doi/10.5555/3001460.3001507}\n\nCampello, R. J. G. B.; Moulavi, D.; Sander, J. (2013). Density-Based\nClustering Based on Hierarchical Density Estimates. Proceedings of the\n17th Pacific-Asia Conference on Knowledge Discovery in Databases, PAKDD\n2013, \\emph{Lecture Notes in Computer Science} 7819, p. 160.\n\\doi{10.1007/978-3-642-37456-2_14}\n\nSander, J., Ester, M., Kriegel, HP. et al. (1998). 
Density-Based\nClustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications.\n\\emph{Data Mining and Knowledge Discovery} 2, 169-194.\n\\doi{10.1023/A:1009745219419}\n}\n\\seealso{\nOther clustering functions: \n\\code{\\link{extractFOSC}()},\n\\code{\\link{hdbscan}()},\n\\code{\\link{jpclust}()},\n\\code{\\link{ncluster}()},\n\\code{\\link{optics}()},\n\\code{\\link{sNNclust}()}\n}\n\\author{\nMichael Hahsler\n}\n\\concept{clustering functions}\n\\keyword{clustering}\n\\keyword{model}\n"
  },
  {
    "path": "man/dbscan_tidiers.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/broom-dbscan-tidiers.R\n\\name{dbscan_tidiers}\n\\alias{dbscan_tidiers}\n\\alias{glance}\n\\alias{tidy}\n\\alias{augment}\n\\alias{tidy.dbscan}\n\\alias{tidy.hdbscan}\n\\alias{tidy.general_clustering}\n\\alias{augment.dbscan}\n\\alias{augment.hdbscan}\n\\alias{augment.general_clustering}\n\\alias{glance.dbscan}\n\\alias{glance.hdbscan}\n\\alias{glance.general_clustering}\n\\title{Turn a dbscan clustering object into a tidy tibble}\n\\usage{\ntidy(x, ...)\n\n\\method{tidy}{dbscan}(x, ...)\n\n\\method{tidy}{hdbscan}(x, ...)\n\n\\method{tidy}{general_clustering}(x, ...)\n\naugment(x, ...)\n\n\\method{augment}{dbscan}(x, data = NULL, newdata = NULL, ...)\n\n\\method{augment}{hdbscan}(x, data = NULL, newdata = NULL, ...)\n\n\\method{augment}{general_clustering}(x, data = NULL, newdata = NULL, ...)\n\nglance(x, ...)\n\n\\method{glance}{dbscan}(x, ...)\n\n\\method{glance}{hdbscan}(x, ...)\n\n\\method{glance}{general_clustering}(x, ...)\n}\n\\arguments{\n\\item{x}{A \\code{dbscan} object returned from \\code{\\link[=dbscan]{dbscan()}}.}\n\n\\item{...}{further arguments are ignored without a warning.}\n\n\\item{data}{The data used to create the clustering.}\n\n\\item{newdata}{New data to predict cluster labels for.}\n}\n\\description{\nProvides \\link[generics:tidy]{tidy()}, \\link[generics:augment]{augment()}, and\n\\link[generics:glance]{glance()} verbs for clusterings created with algorithms\nin package \\code{dbscan} to work with \\href{https://www.tidymodels.org/}{tidymodels}.\n}\n\\examples{\n\\dontshow{if (requireNamespace(\"tibble\", quietly = TRUE) && identical(Sys.getenv(\"NOT_CRAN\"), \"true\")) withAutoprint(\\{ # examplesIf}\n\ndata(iris)\nx <- scale(iris[, 1:4])\n\n## dbscan\ndb <- dbscan(x, eps = .9, minPts = 5)\ndb\n\n# summarize model fit with tidiers\ntidy(db)\nglance(db)\n\n# augment for this model needs the original data\naugment(db, x)\n\n# to augment new 
data, the original data is also needed\naugment(db, x, newdata = x[1:5, ])\n\n## hdbscan\nhdb <- hdbscan(x, minPts = 5)\n\n# summarize model fit with tidiers\ntidy(hdb)\nglance(hdb)\n\n# augment for this model needs the original data\naugment(hdb, x)\n\n# to augment new data, the original data is also needed\naugment(hdb, x, newdata = x[1:5, ])\n\n## Jarvis-Patrick clustering\ncl <- jpclust(x, k = 20, kt = 15)\n\n# summarize model fit with tidiers\ntidy(cl)\nglance(cl)\n\n# augment for this model needs the original data\naugment(cl, x)\n\n## Shared Nearest Neighbor clustering\ncl <- sNNclust(x, k = 20, eps = 0.8, minPts = 15)\n\n# summarize model fit with tidiers\ntidy(cl)\nglance(cl)\n\n# augment for this model needs the original data\naugment(cl, x)\n\\dontshow{\\}) # examplesIf}\n}\n\\seealso{\n\\code{\\link[generics:tidy]{generics::tidy()}}, \\code{\\link[generics:augment]{generics::augment()}},\n\\code{\\link[generics:glance]{generics::glance()}}, \\code{\\link[=dbscan]{dbscan()}}\n}\n\\concept{tidiers}\n"
  },
  {
    "path": "man/dendrogram.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/dendrogram.R\n\\name{dendrogram}\n\\alias{dendrogram}\n\\alias{as.dendrogram}\n\\alias{as.dendrogram.default}\n\\alias{as.dendrogram.hclust}\n\\alias{as.dendrogram.hdbscan}\n\\alias{as.dendrogram.reachability}\n\\title{Coercions to Dendrogram}\n\\usage{\nas.dendrogram(object, ...)\n\n\\method{as.dendrogram}{default}(object, ...)\n\n\\method{as.dendrogram}{hclust}(object, ...)\n\n\\method{as.dendrogram}{hdbscan}(object, ...)\n\n\\method{as.dendrogram}{reachability}(object, ...)\n}\n\\arguments{\n\\item{object}{the object}\n\n\\item{...}{further arguments}\n}\n\\description{\nProvides a new generic function to coerce objects to dendrograms with\n\\code{\\link[stats:dendrogram]{stats::as.dendrogram()}} as the default. Additional methods for\n\\link{hclust}, \\link{hdbscan} and \\link{reachability} objects are provided.\n}\n\\details{\nCoercion methods for\n\\link{hclust}, \\link{hdbscan} and \\link{reachability} objects to \\link{dendrogram} are provided.\n\nThe coercion from \\code{hclust} is a faster C++ reimplementation of the coercion in\npackage \\code{stats}. The original implementation can be called\nusing \\code{\\link[stats:dendrogram]{stats::as.dendrogram()}}.\n\nThe coercion from \\link{hdbscan} builds the non-simplified HDBSCAN hierarchy as a\ndendrogram object.\n}\n"
  },
  {
    "path": "man/extractFOSC.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/extractFOSC.R\n\\name{extractFOSC}\n\\alias{extractFOSC}\n\\title{Framework for the Optimal Extraction of Clusters from Hierarchies}\n\\usage{\nextractFOSC(\n  x,\n  constraints,\n  alpha = 0,\n  minPts = 2L,\n  prune_unstable = FALSE,\n  validate_constraints = FALSE\n)\n}\n\\arguments{\n\\item{x}{a valid \\link{hclust} object created via \\code{\\link[=hclust]{hclust()}} or \\code{\\link[=hdbscan]{hdbscan()}}.}\n\n\\item{constraints}{Either a list or matrix of pairwise constraints. If\nmissing, an unsupervised measure of stability is used to make local cuts and\nextract the optimal clusters. See details.}\n\n\\item{alpha}{numeric; weight between \\eqn{[0, 1]} for mixed-objective\nsemi-supervised extraction. Defaults to 0.}\n\n\\item{minPts}{numeric; Defaults to 2. Only needed if class-less noise is a\nvalid label in the model.}\n\n\\item{prune_unstable}{logical; should significantly unstable subtrees be\npruned? The default is \\code{FALSE} for the original optimal extraction\nframework (see Campello et al, 2013). See details for what \\code{TRUE}\nimplies.}\n\n\\item{validate_constraints}{logical; should constraints be checked for\nvalidity? See details for what are considered valid constraints.}\n}\n\\value{\nA list with the elements:\n\n\\item{cluster }{An integer vector with cluster assignments. 
Zero\nindicates noise points (if any).}\n\\item{hc }{The original \\link{hclust} object with additional list elements\n\\code{\"stability\"}, \\code{\"constraint\"}, and \\code{\"total\"}\nfor the \\eqn{n - 1} cluster-wide objective scores from the extraction.}\n}\n\\description{\nGeneric reimplementation of the \\emph{Framework for Optimal Selection of Clusters}\n(FOSC; Campello et al, 2013) to extract clusterings from hierarchical clustering (i.e.,\n\\link{hclust} objects).\nCan be parameterized to perform unsupervised\ncluster extraction through a stability-based measure, or semisupervised\ncluster extraction through either a constraint-based extraction (with a\nstability-based tiebreaker) or a mixed (weighted) constraint and\nstability-based objective extraction.\n}\n\\details{\nCampello et al (2013) suggested a \\emph{Framework for Optimal Selection of\nClusters} (FOSC) as a framework to make local (non-horizontal) cuts to any\ncluster tree hierarchy. This function implements the original extraction\nalgorithms as described by the framework for hclust objects. Traditional\ncluster extraction methods from hierarchical representations (such as\n\\link{hclust} objects) generally rely on global parameters or cutting values\nwhich are used to partition a cluster hierarchy into a set of disjoint, flat\nclusters. This is implemented in R in function \\code{\\link[stats:cutree]{stats::cutree()}}.\nAlthough such methods are widespread, using global parameter\nsettings are inherently limited in that they cannot capture patterns within\nthe cluster hierarchy at varying \\emph{local} levels of granularity.\n\nRather than partitioning a hierarchy based on the number of the cluster one\nexpects to find (\\eqn{k}) or based on some linkage distance threshold\n(\\eqn{H}), the FOSC proposes that the optimal clusters may exist at varying\ndistance thresholds in the hierarchy. 
To enable this idea, FOSC requires one\nparameter (minPts) that represents \\emph{the minimum number of points that\nconstitute a valid cluster.} The first step of the FOSC algorithm is to\ntraverse the given cluster hierarchy divisively, recording new clusters at\neach split if both branches contain at least minPts points. Branches\nwhere one or both sides contain fewer than minPts points inherit the\nparent cluster's identity. Note that using FOSC, due to the constraint that\nminPts must be greater than or equal to 2, it is possible that the optimal\ncluster solution chosen makes local cuts that render parent branches of\nsizes less than minPts as noise, which are denoted as 0 in the final\nsolution.\n\nTraversing the original cluster tree using minPts creates a new, simplified\ncluster tree that is then post-processed recursively to extract clusters\nthat maximize for each cluster \\eqn{C_i}{Ci} the cost function\n\n\\deqn{\\max_{\\delta_2, \\dots, \\delta_k} J = \\sum\\limits_{i=2}^{k} \\delta_i\nS(C_i)}{ J = \\sum \\delta S(Ci) for all i clusters, } where\n\\eqn{S(C_i)}{S(Ci)} is the stability-based measure as \\deqn{ S(C_i) =\n\\sum_{x_j \\in C_i}(\\frac{1}{h_{min} (x_j, C_i)} - \\frac{1}{h_{max} (C_i)})\n}{ S(Ci) = \\sum (1/Hmin(Xj, Ci) - 1/Hmax(Ci)) for all Xj in Ci.}\n\n\\eqn{\\delta_i}{\\delta} represents an indicator function, which constrains\nthe solution space such that clusters must be disjoint (cannot assign more\nthan 1 label to each cluster). The measure \\eqn{S(C_i)}{S(Ci)} used by FOSC\nis an unsupervised validation measure based on the assumption that, if you\nvary the linkage/distance threshold across all possible values, more\nprominent clusters that survive over many threshold variations should be\nconsidered as stronger candidates of the optimal solution. For this reason,\nusing this measure to detect clusters is referred to as an unsupervised,\n\\emph{stability-based} extraction approach. 
In some cases it may be useful\nto enact \\emph{instance-level} constraints that ensure the solution space\nconforms to linkage expectations known \\emph{a priori}. This general idea of\nusing preliminary expectations to augment the clustering solution will be\nreferred to as \\emph{semisupervised clustering}. If constraints are given in\nthe call to \\code{extractFOSC()}, the following alternative objective function\nis maximized:\n\n\\deqn{J = \\frac{1}{2n_c}\\sum\\limits_{j=1}^n \\gamma (x_j)}{J = 1/(2 * nc)\n\\sum \\gamma(Xj)}\n\n\\eqn{n_c}{nc} is the total number of constraints given and\n\\eqn{\\gamma(x_j)}{\\gamma(Xj)} represents the number of constraints involving\nobject \\eqn{x_j}{Xj} that are satisfied. In the case of ties (such as\nsolutions where no constraints were given), the unsupervised solution is\nused as a tiebreaker. See Campello et al (2013) for more details.\n\nAs a third option, if one wishes to prioritize the degree at which the\nunsupervised and semisupervised solutions contribute to the overall optimal\nsolution, the parameter \\eqn{\\alpha} can be set to enable the extraction of\nclusters that maximize the \\code{mixed} objective function\n\n\\deqn{J = \\alpha S(C_i) + (1 - \\alpha) \\gamma(C_i))}{J = \\alpha S(Ci) + (1 -\n\\alpha) \\gamma(Ci).}\n\nFOSC expects the pairwise constraints to be passed as either 1) an\n\\eqn{n(n-1)/2} vector of integers representing the constraints, where 1\nrepresents should-link, -1 represents should-not-link, and 0 represents no\npreference using the unsupervised solution (see below for examples).\nAlternatively, if only a few constraints are needed, a named list\nrepresenting the (symmetric) adjacency list can be used, where the names\ncorrespond to indices of the points in the original data, and the values\ncorrespond to integer vectors of constraints (positive indices for\nshould-link, negative indices for should-not-link). 
Again, see the examples\nsection for a demonstration of this.\n\nThe parameters to the input function correspond to the concepts discussed\nabove. The \\code{minPts} parameter represents the minimum cluster size to\nextract. The optional \\code{constraints} parameter contains the pairwise,\ninstance-level constraints of the data. The optional \\code{alpha} parameter\ncontrols whether the mixed objective function is used (if \\code{alpha} is\ngreater than 0). If the \\code{validate_constraints} parameter is set to\n\\code{TRUE}, the constraints are checked (and fixed) for symmetry (if point A has a\nshould-link constraint with point B, point B should also have the same\nconstraint). Asymmetric constraints are not supported.\n\nUnstable branch pruning was not discussed by Campello et al (2013), however\nin some data sets it may be the case that specific subbranches' scores are\nsignificantly greater than those of sibling and parent branches, and thus sibling\nbranches should be considered as noise if their scores are cumulatively\nlower than the parent's. 
This can happen in extremely nonhomogeneous data\nsets, where there exist locally very stable branches surrounded by unstable\nbranches that contain more than \\code{minPts} points.\n\\code{prune_unstable = TRUE} will remove the unstable branches.\n}\n\\examples{\ndata(\"moons\")\n\n## Regular HDBSCAN using stability-based extraction (unsupervised)\ncl <- hdbscan(moons, minPts = 5)\ncl$cluster\n\n## Constraint-based extraction from the HDBSCAN hierarchy\n## (w/ stability-based tiebreaker (semisupervised))\ncl_con <- extractFOSC(cl$hc, minPts = 5,\n  constraints = list(\"12\" = c(49, -47)))\ncl_con$cluster\n\n## Alternative formulation: Constraint-based extraction from the HDBSCAN hierarchy\n## (w/ stability-based tiebreaker (semisupervised)) using distance thresholds\ndist_moons <- dist(moons)\ncl_con2 <- extractFOSC(cl$hc, minPts = 5,\n  constraints = ifelse(dist_moons < 0.1, 1L,\n                ifelse(dist_moons > 1, -1L, 0L)))\n\ncl_con2$cluster # same as the second example\n}\n\\references{\nCampello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg\nSander (2013). A framework for semi-supervised and unsupervised optimal\nextraction of clusters from hierarchies. \\emph{Data Mining and Knowledge\nDiscovery} 27(3): 344-371.\n\\doi{10.1007/s10618-013-0311-4}\n}\n\\seealso{\n\\code{\\link[=hclust]{hclust()}}, \\code{\\link[=hdbscan]{hdbscan()}}, \\code{\\link[stats:cutree]{stats::cutree()}}\n\nOther clustering functions: \n\\code{\\link{dbscan}()},\n\\code{\\link{hdbscan}()},\n\\code{\\link{jpclust}()},\n\\code{\\link{ncluster}()},\n\\code{\\link{optics}()},\n\\code{\\link{sNNclust}()}\n}\n\\author{\nMatt Piekenbrock\n}\n\\concept{clustering functions}\n\\keyword{clustering}\n\\keyword{model}\n"
  },
  {
    "path": "man/frNN.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/frNN.R\n\\name{frNN}\n\\alias{frNN}\n\\alias{frnn}\n\\alias{print.frnn}\n\\alias{sort.frNN}\n\\alias{adjacencylist.frNN}\n\\alias{print.frNN}\n\\title{Find the Fixed Radius Nearest Neighbors}\n\\usage{\nfrNN(\n  x,\n  eps,\n  query = NULL,\n  sort = TRUE,\n  search = \"kdtree\",\n  bucketSize = 10,\n  splitRule = \"suggest\",\n  approx = 0\n)\n\n\\method{sort}{frNN}(x, decreasing = FALSE, ...)\n\n\\method{adjacencylist}{frNN}(x, ...)\n\n\\method{print}{frNN}(x, ...)\n}\n\\arguments{\n\\item{x}{a data matrix, a dist object or a frNN object.}\n\n\\item{eps}{neighborhood radius.}\n\n\\item{query}{a data matrix with the points to query. If query is not\nspecified, the NN for all the points in \\code{x} is returned. If query is\nspecified then \\code{x} needs to be a data matrix.}\n\n\\item{sort}{sort the neighbors by distance? This is expensive and can be\ndone later using \\code{sort()}.}\n\n\\item{search}{nearest neighbor search strategy (one of \\code{\"kdtree\"}, \\code{\"linear\"} or\n\\code{\"dist\"}).}\n\n\\item{bucketSize}{maximum size of the kd-tree leaves.}\n\n\\item{splitRule}{rule to split the kd-tree. One of \\code{\"STD\"}, \\code{\"MIDPT\"}, \\code{\"FAIR\"},\n\\code{\"SL_MIDPT\"}, \\code{\"SL_FAIR\"} or \\code{\"SUGGEST\"} (SL stands for sliding). \\code{\"SUGGEST\"} uses\nANN's best guess.}\n\n\\item{approx}{use approximate nearest neighbors. All NN up to a distance of\na factor of \\code{1 + approx} eps may be used. Some actual NN may be omitted\nleading to spurious clusters and noise points. However, the algorithm will\nenjoy a significant speedup.}\n\n\\item{decreasing}{sort in decreasing order?}\n\n\\item{...}{further arguments}\n}\n\\value{\n\\code{frNN()} returns an object of class \\link{frNN} (subclass of\n\\link{NN}) containing a list with the following components:\n\\item{id }{a list of\ninteger vectors. 
Each vector contains the ids (row numbers) of the fixed radius nearest\nneighbors. }\n\\item{dist }{a list with distances (same structure as\n\\code{id}). }\n\\item{eps }{ neighborhood radius \\code{eps} that was used. }\n\\item{metric }{ used distance metric. }\n\n\\code{adjacencylist()} returns a list with one entry per data point in \\code{x}. Each entry\ncontains the ids of the nearest neighbors.\n}\n\\description{\nThis function uses a kd-tree to find the fixed radius nearest neighbors\n(including distances) fast.\n}\n\\details{\nIf \\code{x} is specified as a data matrix, then Euclidean distances and fast\nnearest neighbor lookup using a kd-tree are used.\n\nTo create a frNN object from scratch, you need to supply at least the\nelements \\code{id} with a list of integer vectors with the nearest neighbor\nids for each point and \\code{eps} (see below).\n\n\\strong{Self-matches:} Self-matches are not returned!\n}\n\\examples{\ndata(iris)\nx <- iris[, -5]\n\n# Example 1: Find fixed radius nearest neighbors for each point\nnn <- frNN(x, eps = .5)\nnn\n\n# Number of neighbors\nhist(lengths(adjacencylist(nn)),\n  xlab = \"k\", main=\"Number of Neighbors\",\n  sub = paste(\"Neighborhood size eps =\", nn$eps))\n\n# Explore neighbors of point i = 10\ni <- 10\nnn$id[[i]]\nnn$dist[[i]]\nplot(x, col = ifelse(seq_len(nrow(iris)) \\%in\\% nn$id[[i]], \"red\", \"black\"))\n\n# get an adjacency list\nhead(adjacencylist(nn))\n\n# plot the fixed radius neighbors (and then reduced to a radius of .3)\nplot(nn, x)\nplot(frNN(nn, eps = .3), x)\n\n## Example 2: find fixed-radius NN for query points\nq <- x[c(1,100),]\nnn <- frNN(x, eps = .5, query = q)\n\nplot(nn, x, col = \"grey\")\npoints(q, pch = 3, lwd = 2)\n}\n\\references{\nDavid M. Mount and Sunil Arya (2010). 
ANN: A Library for\nApproximate Nearest Neighbor Searching,\n\\url{http://www.cs.umd.edu/~mount/ANN/}.\n}\n\\seealso{\nOther NN functions: \n\\code{\\link{NN}},\n\\code{\\link{comps}()},\n\\code{\\link{kNN}()},\n\\code{\\link{kNNdist}()},\n\\code{\\link{sNN}()}\n}\n\\author{\nMichael Hahsler\n}\n\\concept{NN functions}\n\\keyword{model}\n"
  },
  {
    "path": "man/glosh.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/GLOSH.R\n\name{glosh}\n\alias{glosh}\n\alias{GLOSH}\n\title{Global-Local Outlier Score from Hierarchies}\n\usage{\nglosh(x, k = 4, ...)\n}\n\arguments{\n\item{x}{an \link{hclust} object, data matrix, or \link{dist} object.}\n\n\item{k}{size of the neighborhood.}\n\n\item{...}{further arguments are passed on to \code{\link[=kNN]{kNN()}}.}\n}\n\value{\nA numeric vector of length equal to the size of the original data\nset containing GLOSH values for all data points.\n}\n\description{\nCalculate the Global-Local Outlier Score from Hierarchies (GLOSH) score for\neach data point using a kd-tree to speed up kNN search.\n}\n\details{\nGLOSH compares the density of a point to densities of any points associated\nwithin current and child clusters (if any). Points that have a substantially\nlower density than the density mode (cluster) they most associate with are\nconsidered outliers. GLOSH is computed from a hierarchy of clusters.\n\nSpecifically, consider a point \emph{x} and a density or distance threshold\n\emph{lambda}. GLOSH is calculated by taking 1 minus the ratio of how long\nany of the child clusters of the cluster \emph{x} belongs to \"survives\"\nchanges in \emph{lambda} to the highest \emph{lambda} threshold of x, above\nwhich x becomes a noise point.\n\nScores close to 1 indicate outliers. 
For more details on the motivation for\nthis calculation, see Campello et al (2015).\n}\n\examples{\nset.seed(665544)\nn <- 100\nx <- cbind(\n  x=runif(10, 0, 5) + rnorm(n, sd = 0.4),\n  y=runif(10, 0, 5) + rnorm(n, sd = 0.4)\n  )\n\n### calculate GLOSH score\nglosh <- glosh(x, k = 3)\n\n### distribution of outlier scores\nsummary(glosh)\nhist(glosh, breaks = 10)\n\n### simple function to plot points with a size proportional to the GLOSH score\nplot_glosh <- function(x, glosh){\n  plot(x, pch = \".\", main = \"GLOSH (k = 3)\")\n  points(x, cex = glosh*3, pch = 1, col = \"red\")\n  text(x[glosh > 0.80, ], labels = round(glosh, 3)[glosh > 0.80], pos = 3)\n}\nplot_glosh(x, glosh)\n\n### GLOSH with any hierarchy\nx_dist <- dist(x)\nx_sl <- hclust(x_dist, method = \"single\")\nx_upgma <- hclust(x_dist, method = \"average\")\nx_ward <- hclust(x_dist, method = \"ward.D2\")\n\n## Compare what different linkage criteria consider as outliers\nglosh_sl <- glosh(x_sl, k = 3)\nplot_glosh(x, glosh_sl)\n\nglosh_upgma <- glosh(x_upgma, k = 3)\nplot_glosh(x, glosh_upgma)\n\nglosh_ward <- glosh(x_ward, k = 3)\nplot_glosh(x, glosh_ward)\n\n## GLOSH is automatically computed with HDBSCAN\nall(hdbscan(x, minPts = 3)$outlier_scores == glosh(x, k = 3))\n}\n\references{\nCampello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg\nSander. Hierarchical density estimates for data clustering, visualization,\nand outlier detection. \emph{ACM Transactions on Knowledge Discovery from Data\n(TKDD)} 10, no. 1 (2015).\n\doi{10.1145/2733381}\n}\n\seealso{\nOther Outlier Detection Functions: \n\code{\link{kNNdist}()},\n\code{\link{lof}()},\n\code{\link{pointdensity}()}\n}\n\author{\nMatt Piekenbrock\n}\n\concept{Outlier Detection Functions}\n\keyword{model}\n"
  },
  {
    "path": "man/hdbscan.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/hdbscan.R, R/predict.R\n\name{hdbscan}\n\alias{hdbscan}\n\alias{HDBSCAN}\n\alias{print.hdbscan}\n\alias{plot.hdbscan}\n\alias{coredist}\n\alias{mrdist}\n\alias{predict.hdbscan}\n\title{Hierarchical DBSCAN (HDBSCAN)}\n\usage{\nhdbscan(\n  x,\n  minPts,\n  cluster_selection_epsilon = 0,\n  gen_hdbscan_tree = FALSE,\n  gen_simplified_tree = FALSE,\n  verbose = FALSE\n)\n\n\method{print}{hdbscan}(x, ...)\n\n\method{plot}{hdbscan}(\n  x,\n  scale = \"suggest\",\n  gradient = c(\"yellow\", \"red\"),\n  show_flat = FALSE,\n  main = \"HDBSCAN*\",\n  ylab = \"eps value\",\n  leaflab = \"none\",\n  ...\n)\n\ncoredist(x, minPts)\n\nmrdist(x, minPts, coredist = NULL)\n\n\method{predict}{hdbscan}(object, newdata, data, ...)\n}\n\arguments{\n\item{x}{a data matrix (Euclidean distances are used) or a \link{dist} object\ncalculated with an arbitrary distance metric.}\n\n\item{minPts}{integer; Minimum size of clusters. See details.}\n\n\item{cluster_selection_epsilon}{double; a distance threshold below which\nclusters are merged (see Malzer & Baum, 2020).}\n\n\item{gen_hdbscan_tree}{logical; should the robust single linkage tree be\nexplicitly computed (see cluster tree in Chaudhuri et al, 2010).}\n\n\item{gen_simplified_tree}{logical; should the simplified hierarchy be\nexplicitly computed (see Campello et al, 2013).}\n\n\item{verbose}{report progress.}\n\n\item{...}{additional arguments are passed on.}\n\n\item{scale}{integer; used to scale condensed tree based on the graphics\ndevice. 
A lower scale results in wider colored tree lines.\nThe default \code{'suggest'} sets scale to the number of clusters.}\n\n\item{gradient}{character vector; the colors to build the condensed tree\ncoloring with.}\n\n\item{show_flat}{logical; whether to draw boxes indicating the most stable\nclusters.}\n\n\item{main}{Title of the plot.}\n\n\item{ylab}{the label for the y axis.}\n\n\item{leaflab}{a string specifying how leaves are labeled (see \code{\link[stats:dendrogram]{stats::plot.dendrogram()}}).}\n\n\item{coredist}{numeric vector with precomputed core distances (optional).}\n\n\item{object}{clustering object.}\n\n\item{newdata}{new data points for which the cluster membership should be\npredicted.}\n\n\item{data}{the data set used to create the clustering object.}\n}\n\value{\n\code{hdbscan()} returns an object of class \code{hdbscan} with the following components:\n\item{cluster }{An integer vector with cluster assignments. Zero indicates\nnoise points.}\n\item{minPts }{ value of the \code{minPts} parameter.}\n\item{cluster_scores }{The sum of the stability scores for each salient\n(flat) cluster. Corresponds to the cluster IDs given in the \code{\"cluster\"} element.\n}\n\item{membership_prob }{The probability or individual stability of a\npoint within its clusters. Between 0 and 1.}\n\item{outlier_scores }{The GLOSH outlier score of each point. }\n\item{hc }{An \link{hclust} object of the HDBSCAN hierarchy. }\n\n\code{coredist()} returns a vector with the core distance for each data point.\n\n\code{mrdist()} returns a \link{dist} object containing pairwise mutual reachability distances.\n}\n\description{\nFast C++ implementation of the HDBSCAN (Hierarchical DBSCAN) and its related\nalgorithms.\n}\n\details{\nThis fast implementation of HDBSCAN (Campello et al., 2013) computes the\nhierarchical cluster tree representing density estimates along with the\nstability-based flat cluster extraction. 
HDBSCAN essentially computes the\nhierarchy of all DBSCAN* clusterings, and\nthen uses a stability-based extraction method to find optimal cuts in the\nhierarchy, thus producing a flat solution.\n\nHDBSCAN performs the following steps:\n\enumerate{\n\item Compute the mutual reachability distance mrd between points\n(based on distances and core distances).\n\item Use mrd as a distance measure to construct a minimum spanning tree.\n\item Prune the tree using stability.\n\item Extract the clusters.\n}\n\nAdditional related algorithms are also supported: the \"Global-Local Outlier Score\nfrom Hierarchies\" (GLOSH; see section 6 of Campello et al., 2015)\nis available in function \code{\link[=glosh]{glosh()}},\nand clustering based on instance-level constraints (see\nsection 5.3 of Campello et al. 2015) is also available. The algorithms only need\nthe parameter \code{minPts}.\n\nNote that \code{minPts} not only acts as a minimum cluster size to detect,\nbut also as a \"smoothing\" factor of the density estimates implicitly\ncomputed from HDBSCAN.\n\nWhen using the optional parameter \code{cluster_selection_epsilon},\na combination between DBSCAN* and HDBSCAN* can be achieved\n(see Malzer & Baum 2020). This means that part of the\ntree is affected by \code{cluster_selection_epsilon} as if\nrunning DBSCAN* with \code{eps} = \code{cluster_selection_epsilon}.\nThe remaining part (on levels above the threshold) is still\nprocessed by HDBSCAN*'s stability-based selection algorithm\nand can therefore return clusters of variable densities.\nNote that there is not always a remaining part, especially if\nthe parameter value is chosen too large, or if there aren't\nenough clusters of variable densities. 
In this case, the result\nwill be equal to DBSCAN*. The parameter is intended for use cases\nwhere HDBSCAN* produces too many small clusters that\nneed to be merged, while still being able to extract clusters\nof variable densities at higher levels.\n\n\code{coredist()}: The core distance is defined for each point as\nthe distance to its \code{MinPts - 1} nearest neighbor.\nIt is a density estimate equivalent to \code{kNNdist()} with \code{k = MinPts - 1}.\n\n\code{mrdist()}: The mutual reachability distance is defined between two points as\n\code{mrd(a, b) = max(coredist(a), coredist(b), dist(a, b))}. This distance metric is used by\nHDBSCAN. It has the effect of increasing distances in low density areas.\n\n\code{predict()} assigns each new data point to the same cluster as the nearest point\nif it is not more than that point's core distance away. Otherwise the new point\nis classified as a noise point (i.e., cluster ID 0).\n}\n\examples{\n## cluster the moons data set with HDBSCAN\ndata(moons)\n\nres <- hdbscan(moons, minPts = 5)\nres\n\nplot(res)\nclplot(moons, res)\n\n## cluster the moons data set with HDBSCAN using Manhattan distances\nres <- hdbscan(dist(moons, method = \"manhattan\"), minPts = 5)\nplot(res)\nclplot(moons, res)\n\n## Example for HDBSCAN(e) using cluster_selection_epsilon\n# data with clusters of various densities.\nX <- data.frame(\n x = c(\n  0.08, 0.46, 0.46, 2.95, 3.50, 1.49, 6.89, 6.87, 0.21, 0.15,\n  0.15, 0.39, 0.80, 0.80, 0.37, 3.63, 0.35, 0.30, 0.64, 0.59, 1.20, 1.22,\n  1.42, 0.95, 2.70, 6.36, 6.36, 6.36, 6.60, 0.04, 0.71, 0.57, 0.24, 0.24,\n  0.04, 0.04, 1.35, 0.82, 1.04, 0.62, 0.26, 5.98, 1.67, 1.67, 0.48, 0.15,\n  6.67, 6.67, 1.20, 0.21, 3.99, 0.12, 0.19, 0.15, 6.96, 0.26, 0.08, 0.30,\n  1.04, 1.04, 1.04, 0.62, 0.04, 0.04, 0.04, 0.82, 0.82, 1.29, 1.35, 0.46,\n  0.46, 0.04, 0.04, 5.98, 5.98, 6.87, 0.37, 6.47, 6.47, 6.47, 6.67, 0.30,\n  1.49, 3.21, 3.21, 0.75, 0.75, 0.46, 0.46, 0.46, 0.46, 3.63, 0.39, 3.65,\n  4.09, 4.01, 3.36, 1.43, 3.28, 5.94, 6.35, 6.87, 5.60, 5.99, 
0.12, 0.00,\n  0.32, 0.39, 0.00, 1.63, 1.36, 5.67, 5.60, 5.79, 1.10, 2.99, 0.39, 0.18\n  ),\n y = c(\n  7.41, 8.01, 8.01, 5.44, 7.11, 7.13, 1.83, 1.83, 8.22, 8.08,\n  8.08, 7.20, 7.83, 7.83, 8.29, 5.99, 8.32, 8.22, 7.38, 7.69, 8.22, 7.31,\n  8.25, 8.39, 6.34, 0.16, 0.16, 0.16, 1.66, 7.55, 7.90, 8.18, 8.32, 8.32,\n  7.97, 7.97, 8.15, 8.43, 7.83, 8.32, 8.29, 1.03, 7.27, 7.27, 8.08, 7.27,\n  0.79, 0.79, 8.22, 7.73, 6.62, 7.62, 8.39, 8.36, 1.73, 8.29, 8.04, 8.22,\n  7.83, 7.83, 7.83, 8.32, 8.11, 7.69, 7.55, 7.20, 7.20, 8.01, 8.15, 7.55,\n  7.55, 7.97, 7.97, 1.03, 1.03, 1.24, 7.20, 0.47, 0.47, 0.47, 0.79, 8.22,\n  7.13, 6.48, 6.48, 7.10, 7.10, 8.01, 8.01, 8.01, 8.01, 5.99, 8.04, 5.22,\n  5.82, 5.14, 4.81, 7.62, 5.73, 0.55, 1.31, 0.05, 0.95, 1.59, 7.99, 7.48,\n  8.38, 7.12, 2.01, 1.40, 0.00, 9.69, 9.47, 9.25, 2.63, 6.89, 0.56, 3.11\n )\n)\n\n## HDBSCAN splits one cluster\nhdb <- hdbscan(X, minPts = 3)\nplot(hdb, show_flat = TRUE)\nhullplot(X, hdb, main = \"HDBSCAN\")\n\n## DBSCAN* marks the least dense cluster as outliers\ndb <- dbscan(X, eps = 1, minPts = 3, borderPoints = FALSE)\nhullplot(X, db, main = \"DBSCAN*\")\n\n## HDBSCAN(e) mixes HDBSCAN AND DBSCAN* to find all clusters\nhdbe <- hdbscan(X, minPts = 3, cluster_selection_epsilon = 1)\nplot(hdbe, show_flat = TRUE)\nhullplot(X, hdbe, main = \"HDBSCAN(e)\")\n}\n\\references{\nCampello RJGB, Moulavi D, Sander J (2013). Density-Based Clustering Based on\nHierarchical Density Estimates. Proceedings of the 17th Pacific-Asia\nConference on Knowledge Discovery in Databases, PAKDD 2013, \\emph{Lecture Notes\nin Computer Science} 7819, p. 160.\n\\doi{10.1007/978-3-642-37456-2_14}\n\nCampello RJGB, Moulavi D, Zimek A, Sander J (2015). Hierarchical density\nestimates for data clustering, visualization, and outlier detection.\n\\emph{ACM Transactions on Knowledge Discovery from Data (TKDD),} 10(5):1-51.\n\\doi{10.1145/2733381}\n\nMalzer, C., & Baum, M. (2020). 
A Hybrid Approach To Hierarchical\nDensity-based Cluster Selection.\nIn 2020 IEEE International Conference on Multisensor Fusion\nand Integration for Intelligent Systems (MFI), pp. 223-228.\n\\doi{10.1109/MFI49285.2020.9235263}\n}\n\\seealso{\nOther clustering functions: \n\\code{\\link{dbscan}()},\n\\code{\\link{extractFOSC}()},\n\\code{\\link{jpclust}()},\n\\code{\\link{ncluster}()},\n\\code{\\link{optics}()},\n\\code{\\link{sNNclust}()}\n}\n\\author{\nMatt Piekenbrock\n\nClaudia Malzer (added cluster_selection_epsilon)\n}\n\\concept{HDBSCAN functions}\n\\concept{clustering functions}\n\\keyword{clustering}\n\\keyword{hierarchical}\n\\keyword{model}\n"
  },
  {
    "path": "man/hullplot.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/hullplot.R\n\\name{hullplot}\n\\alias{hullplot}\n\\alias{clplot}\n\\title{Plot Clusters}\n\\usage{\nhullplot(\n  x,\n  cl,\n  col = NULL,\n  pch = NULL,\n  cex = 0.5,\n  hull_lwd = 1,\n  hull_lty = 1,\n  solid = TRUE,\n  alpha = 0.2,\n  main = \"Convex Cluster Hulls\",\n  ...\n)\n\nclplot(x, cl, col = NULL, pch = NULL, cex = 0.5, main = \"Cluster Plot\", ...)\n}\n\\arguments{\n\\item{x}{a data matrix. If more than 2 columns are provided, then the data\nis plotted using the first two principal components.}\n\n\\item{cl}{a clustering. Either a numeric cluster assignment vector or a\nclustering object (a list with an element named \\code{cluster}).}\n\n\\item{col}{colors used for clusters. Defaults to the standard palette.  The\nfirst color (default is black) is used for noise/unassigned points (cluster\nid 0).}\n\n\\item{pch}{a vector of plotting characters. By default \\code{o} is used for\npoints and \\code{x} for noise points.}\n\n\\item{cex}{expansion factor for symbols.}\n\n\\item{hull_lwd, hull_lty}{line width and line type used for the convex hull.}\n\n\\item{solid, alpha}{draw filled polygons instead of just lines for the convex\nhulls? alpha controls the level of alpha shading.}\n\n\\item{main}{main title.}\n\n\\item{...}{additional arguments passed on to plot.}\n}\n\\description{\nThis function produces a two-dimensional scatter plot of data points\nand colors the data points according to a supplied clustering. Noise points\nare marked as \\code{x}. 
\code{hullplot()} also adds convex hulls to clusters.\n}\n\examples{\nset.seed(2)\nn <- 400\n\nx <- cbind(\n  x = runif(4, 0, 1) + rnorm(n, sd = 0.1),\n  y = runif(4, 0, 1) + rnorm(n, sd = 0.1)\n  )\ncl <- rep(1:4, times = 100)\n\n\n### original data with true clustering\nclplot(x, cl, main = \"True clusters\")\nhullplot(x, cl, main = \"True clusters\")\n### use different symbols\nhullplot(x, cl, main = \"True clusters\", pch = cl)\n### just the hulls\nhullplot(x, cl, main = \"True clusters\", pch = NA)\n### a version suitable for b/w printing\nhullplot(x, cl, main = \"True clusters\", solid = FALSE,\n  col = c(\"grey\", \"black\"), pch = cl)\n\n\n### run some clustering algorithms and plot the results\ndb <- dbscan(x, eps = .07, minPts = 10)\nclplot(x, db, main = \"DBSCAN\")\nhullplot(x, db, main = \"DBSCAN\")\n\nop <- optics(x, eps = 10, minPts = 10)\nopDBSCAN <- extractDBSCAN(op, eps_cl = .07)\nhullplot(x, opDBSCAN, main = \"OPTICS\")\n\nopXi <- extractXi(op, xi = 0.05)\nhullplot(x, opXi, main = \"OPTICSXi\")\n\n# Extract minimal 'flat' clusters only\nopXi <- extractXi(op, xi = 0.05, minimum = TRUE)\nhullplot(x, opXi, main = \"OPTICSXi\")\n\nkm <- kmeans(x, centers = 4)\nhullplot(x, km, main = \"k-means\")\n\nhc <- cutree(hclust(dist(x)), k = 4)\nhullplot(x, hc, main = \"Hierarchical Clustering\")\n}\n\author{\nMichael Hahsler\n}\n\keyword{clustering}\n\keyword{plot}\n"
  },
  {
    "path": "man/jpclust.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/jpclust.R\n\name{jpclust}\n\alias{jpclust}\n\alias{print.general_clustering}\n\title{Jarvis-Patrick Clustering}\n\usage{\njpclust(x, k, kt, ...)\n}\n\arguments{\n\item{x}{a data matrix/data.frame (Euclidean distance is used), a\nprecomputed \link{dist} object or a kNN object created with \code{\link[=kNN]{kNN()}}.}\n\n\item{k}{Neighborhood size for nearest neighbor sparsification. If \code{x}\nis a kNN object then \code{k} may be missing.}\n\n\item{kt}{threshold on the number of shared nearest neighbors (including the\npoints themselves) to form clusters. Range: \eqn{[1, k]}}\n\n\item{...}{additional arguments are passed on to the k nearest neighbor\nsearch algorithm. See \code{\link[=kNN]{kNN()}} for details on how to control the\nsearch strategy.}\n}\n\value{\nAn object of class \code{general_clustering} with the following\ncomponents:\n\item{cluster }{An integer vector with cluster assignments. Zero\nindicates noise points.}\n\item{type }{ name of used clustering algorithm.}\n\item{metric }{ the distance metric used for clustering.}\n\item{param }{ list of used clustering parameters. }\n}\n\description{\nFast C++ implementation of the Jarvis-Patrick clustering which first builds\na shared nearest neighbor graph (k nearest neighbor sparsification) and then\nplaces two points in the same cluster if they are in each other's nearest\nneighbor list and they share at least kt nearest neighbors.\n}\n\details{\nFollowing the original paper, the shared nearest neighbor list is\nconstructed as the k neighbors plus the point itself (as neighbor zero).\nTherefore, the threshold \code{kt} needs to be in the range \eqn{[1, k]}.\n\nFast nearest neighbors search with \code{\link[=kNN]{kNN()}} is only used if \code{x} is\na matrix. 
In this case Euclidean distance is used.\n}\n\\examples{\ndata(\"DS3\")\n\n# use a shared neighborhood of 20 points and require 12 shared neighbors\ncl <- jpclust(DS3, k = 20, kt = 12)\ncl\n\nclplot(DS3, cl)\n# Note: JP clustering does not consider noise and thus,\n# the sine wave points chain clusters together.\n\n# use a precomputed kNN object instead of the original data.\nnn <- kNN(DS3, k = 30)\nnn\n\ncl <- jpclust(nn, k = 20, kt = 12)\ncl\n\n# cluster with noise removed (use low pointdensity to identify noise)\nd <- pointdensity(DS3, eps = 25)\nhist(d, breaks = 20)\nDS3_noiseless <- DS3[d > 110,]\n\ncl <- jpclust(DS3_noiseless, k = 20, kt = 10)\ncl\n\nclplot(DS3_noiseless, cl)\n}\n\\references{\nR. A. Jarvis and E. A. Patrick. 1973. Clustering Using a\nSimilarity Measure Based on Shared Near Neighbors. \\emph{IEEE Trans. Comput.\n22,} 11 (November 1973), 1025-1034.\n\\doi{10.1109/T-C.1973.223640}\n}\n\\seealso{\nOther clustering functions: \n\\code{\\link{dbscan}()},\n\\code{\\link{extractFOSC}()},\n\\code{\\link{hdbscan}()},\n\\code{\\link{ncluster}()},\n\\code{\\link{optics}()},\n\\code{\\link{sNNclust}()}\n}\n\\author{\nMichael Hahsler\n}\n\\concept{clustering functions}\n\\keyword{clustering}\n\\keyword{model}\n"
  },
  {
    "path": "man/kNN.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/kNN.R\n\\name{kNN}\n\\alias{kNN}\n\\alias{knn}\n\\alias{sort.kNN}\n\\alias{adjacencylist.kNN}\n\\alias{print.kNN}\n\\title{Find the k Nearest Neighbors}\n\\usage{\nkNN(\n  x,\n  k,\n  query = NULL,\n  sort = TRUE,\n  search = \"kdtree\",\n  bucketSize = 10,\n  splitRule = \"suggest\",\n  approx = 0\n)\n\n\\method{sort}{kNN}(x, decreasing = FALSE, ...)\n\n\\method{adjacencylist}{kNN}(x, ...)\n\n\\method{print}{kNN}(x, ...)\n}\n\\arguments{\n\\item{x}{a data matrix, a \\link{dist} object or a \\link{kNN} object.}\n\n\\item{k}{number of neighbors to find.}\n\n\\item{query}{a data matrix with the points to query. If query is not\nspecified, the NN for all the points in \\code{x} is returned. If query is\nspecified then \\code{x} needs to be a data matrix.}\n\n\\item{sort}{sort the neighbors by distance? Note that some search methods\nalready sort the results. Sorting is expensive and \\code{sort = FALSE} may\nbe much faster for some search methods. kNN objects can be sorted using\n\\code{sort()}.}\n\n\\item{search}{nearest neighbor search strategy (one of \\code{\"kdtree\"}, \\code{\"linear\"} or\n\\code{\"dist\"}).}\n\n\\item{bucketSize}{max size of the kd-tree leafs.}\n\n\\item{splitRule}{rule to split the kd-tree. One of \\code{\"STD\"}, \\code{\"MIDPT\"}, \\code{\"FAIR\"},\n\\code{\"SL_MIDPT\"}, \\code{\"SL_FAIR\"} or \\code{\"SUGGEST\"} (SL stands for sliding). \\code{\"SUGGEST\"} uses\nANNs best guess.}\n\n\\item{approx}{use approximate nearest neighbors. All NN up to a distance of\na factor of \\code{1 + approx} eps may be used. Some actual NN may be omitted\nleading to spurious clusters and noise points.  
However, the algorithm will\nenjoy a significant speedup.}\n\n\item{decreasing}{sort in decreasing order?}\n\n\item{...}{further arguments}\n}\n\value{\nAn object of class \code{kNN} (subclass of \link{NN}) containing a\nlist with the following components:\n\item{dist }{a matrix with distances. }\n\item{id }{a matrix with \code{ids}. }\n\item{k }{number \code{k} used. }\n\item{metric }{ used distance metric. }\n}\n\description{\nThis function uses a kd-tree to find all k nearest neighbors in a data\nmatrix (including distances) fast.\n}\n\details{\n\strong{Ties:} If the kth and the (k+1)th nearest neighbor are tied, then the\nneighbor found first is returned and the other one is ignored.\n\n\strong{Self-matches:} If no query is specified, then self-matches are\nremoved.\n\nDetails on the search parameters:\n\itemize{\n\item \code{search} controls if\na kd-tree or linear search is used (both implemented in the ANN library; see Mount\nand Arya, 2010). Note that these implementations cannot handle NAs.\n\code{search = \"dist\"} precomputes Euclidean distances first using R. NAs are\nhandled, but the resulting distance matrix cannot contain NAs. To use other\ndistance measures, a precomputed distance matrix can be provided as \code{x}\n(\code{search} is ignored).\n\item \code{bucketSize} and \code{splitRule} influence how the kd-tree is\nbuilt. \code{approx} uses the approximate nearest neighbor search\nimplemented in ANN. All nearest neighbors up to a distance of\n\code{eps / (1 + approx)} will be considered and all with a distance\ngreater than \code{eps} will not be considered. The other points might be\nconsidered. Note that this results in some actual nearest neighbors being\nomitted leading to spurious clusters and noise points. However, the\nalgorithm will enjoy a significant speedup. 
For more details see Mount and\nArya (2010).\n}\n}\n\\examples{\ndata(iris)\nx <- iris[, -5]\n\n# Example 1: finding kNN for all points in a data matrix (using a kd-tree)\nnn <- kNN(x, k = 5)\nnn\n\n# explore neighborhood of point 10\ni <- 10\nnn$id[i,]\nplot(x, col = ifelse(seq_len(nrow(iris)) \\%in\\% nn$id[i,], \"red\", \"black\"))\n\n# visualize the 5 nearest neighbors\nplot(nn, x)\n\n# visualize a reduced 2-NN graph\nplot(kNN(nn, k = 2), x)\n\n# Example 2: find kNN for query points\nq <- x[c(1,100),]\nnn <- kNN(x, k = 10, query = q)\n\nplot(nn, x, col = \"grey\")\npoints(q, pch = 3, lwd = 2)\n\n# Example 3: find kNN using distances\nd <- dist(x, method = \"manhattan\")\nnn <- kNN(d, k = 1)\nplot(nn, x)\n}\n\\references{\nDavid M. Mount and Sunil Arya (2010). ANN: A Library for\nApproximate Nearest Neighbor Searching,\n\\url{http://www.cs.umd.edu/~mount/ANN/}.\n}\n\\seealso{\nOther NN functions: \n\\code{\\link{NN}},\n\\code{\\link{comps}()},\n\\code{\\link{frNN}()},\n\\code{\\link{kNNdist}()},\n\\code{\\link{sNN}()}\n}\n\\author{\nMichael Hahsler\n}\n\\concept{NN functions}\n\\keyword{model}\n"
  },
  {
    "path": "man/kNNdist.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/kNNdist.R\n\name{kNNdist}\n\alias{kNNdist}\n\alias{kNNdistplot}\n\title{Calculate and Plot k-Nearest Neighbor Distances}\n\usage{\nkNNdist(x, k, all = FALSE, ...)\n\nkNNdistplot(x, k, minPts, ...)\n}\n\arguments{\n\item{x}{the data set as a matrix of points (Euclidean distance is used) or\na precalculated \link{dist} object.}\n\n\item{k}{number of nearest neighbors used for the distance calculation. For\n\code{kNNdistplot()} also a range of values for \code{k} or \code{minPts} can be specified.}\n\n\item{all}{should a matrix with the distances to all k nearest neighbors be\nreturned?}\n\n\item{...}{further arguments (e.g., kd-tree related parameters) are passed\non to \code{\link[=kNN]{kNN()}}.}\n\n\item{minPts}{to use a k-NN plot to determine a suitable \code{eps} value for \code{\link[=dbscan]{dbscan()}},\n\code{minPts} used in dbscan can be specified and will set \code{k = minPts - 1}.}\n}\n\value{\n\code{kNNdist()} returns a numeric vector with the distance to its kth\nnearest neighbor. If \code{all = TRUE} then a matrix with k columns\ncontaining the distances to all 1st, 2nd, ..., kth nearest neighbors is\nreturned instead.\n}\n\description{\nFast calculation of the k-nearest neighbor distances for a dataset\nrepresented as a matrix of points. The kNN distance is defined as the\ndistance from a point to its kth nearest neighbor. The kNN distance plot\ndisplays the kNN distance of all points sorted from smallest to largest. 
The\nplot can be used to help find suitable parameter values for \code{\link[=dbscan]{dbscan()}}.\n}\n\examples{\ndata(iris)\niris <- as.matrix(iris[, 1:4])\n\n## Find the 4-NN distance for each observation (see ?kNN\n## for different search strategies)\nkNNdist(iris, k = 4)\n\n## Get a matrix with distances to the 1st, 2nd, ..., 4th NN.\nkNNdist(iris, k = 4, all = TRUE)\n\n## Produce a k-NN distance plot to determine a suitable eps for\n## DBSCAN with MinPts = 5. Use k = 4 (= MinPts - 1).\n## The knee is visible around a distance of .7\nkNNdistplot(iris, k = 4)\n\n## Look at all k-NN distance plots for a k of 1 to 20\n## Note that k-NN distances are increasing in k\nkNNdistplot(iris, k = 1:20)\n\ncl <- dbscan(iris, eps = .7, minPts = 5)\npairs(iris, col = cl$cluster + 1L)\n## Note: black points are noise points\n}\n\seealso{\nOther Outlier Detection Functions: \n\code{\link{glosh}()},\n\code{\link{lof}()},\n\code{\link{pointdensity}()}\n\nOther NN functions: \n\code{\link{NN}},\n\code{\link{comps}()},\n\code{\link{frNN}()},\n\code{\link{kNN}()},\n\code{\link{sNN}()}\n}\n\author{\nMichael Hahsler\n}\n\concept{NN functions}\n\concept{Outlier Detection Functions}\n\keyword{model}\n\keyword{plot}\n"
  },
  {
    "path": "man/lof.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/LOF.R\n\name{lof}\n\alias{lof}\n\alias{LOF}\n\title{Local Outlier Factor Score}\n\usage{\nlof(x, minPts = 5, ...)\n}\n\arguments{\n\item{x}{a data matrix or a \link{dist} object.}\n\n\item{minPts}{number of nearest neighbors used in defining the local\nneighborhood of a point (includes the point itself).}\n\n\item{...}{further arguments are passed on to \code{\link[=kNN]{kNN()}}.\nNote: \code{sort} cannot be specified here since \code{lof()}\nalways uses \code{sort = TRUE}.}\n}\n\value{\nA numeric vector of length \code{nrow(x)} containing LOF values for\nall data points.\n}\n\description{\nCalculate the Local Outlier Factor (LOF) score for each data point using a\nkd-tree to speed up kNN search.\n}\n\details{\nLOF compares the local reachability density (lrd) of a point to the lrd of\nits neighbors. A LOF score of approximately 1 indicates that the lrd around\nthe point is comparable to the lrd of its neighbors and that the point is\nnot an outlier. Points that have a substantially lower lrd than their\nneighbors are considered outliers and produce scores significantly larger\nthan 1.\n\nIf a data matrix is specified, then Euclidean distances and fast nearest\nneighbor search using a kd-tree are used.\n\n\strong{Note on duplicate points:} If there are more than \code{minPts}\nduplicates of a point in the data, then the local reachability density\nwill be 0 resulting in an undefined LOF score of 0/0. We set LOF in this\ncase to 1 since there is already enough density from the points in the same\nlocation to make them not outliers. The original paper by Breunig et al\n(2000) assumes that the points are real duplicates and suggests removing\nthe duplicates before computing LOF. 
If duplicate points are removed first,\nthen this LOF implementation in \pkg{dbscan} behaves like the one described\nby Breunig et al.\n}\n\examples{\nset.seed(665544)\nn <- 100\nx <- cbind(\n  x=runif(10, 0, 5) + rnorm(n, sd = 0.4),\n  y=runif(10, 0, 5) + rnorm(n, sd = 0.4)\n  )\n\n### calculate LOF score with a neighborhood of 3 points\nlof <- lof(x, minPts = 3)\n\n### distribution of outlier factors\nsummary(lof)\nhist(lof, breaks = 10, main = \"LOF (minPts = 3)\")\n\n### plot sorted lof. Looks like outliers start around a LOF of 2.\nplot(sort(lof), type = \"l\",  main = \"LOF (minPts = 3)\",\n  xlab = \"Points sorted by LOF\", ylab = \"LOF\")\n\n### point size is proportional to LOF and mark points with a LOF > 2\nplot(x, pch = \".\", main = \"LOF (minPts = 3)\", asp = 1)\npoints(x, cex = (lof - 1) * 2, pch = 1, col = \"red\")\ntext(x[lof > 2,], labels = round(lof, 1)[lof > 2], pos = 3)\n}\n\references{\nBreunig, M., Kriegel, H., Ng, R., and Sander, J. (2000). LOF:\nidentifying density-based local outliers. In \emph{ACM Int. Conf. on\nManagement of Data,} pages 93-104.\n\doi{10.1145/335191.335388}\n}\n\seealso{\nOther Outlier Detection Functions: \n\code{\link{glosh}()},\n\code{\link{kNNdist}()},\n\code{\link{pointdensity}()}\n}\n\author{\nMichael Hahsler\n}\n\concept{Outlier Detection Functions}\n\keyword{model}\n"
  },
  {
    "path": "man/moons.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/moons.R\n\docType{data}\n\name{moons}\n\alias{moons}\n\title{Moons Data}\n\format{\nA data frame with 100 observations on the following 2 variables.\n\describe{\n\item{X}{a numeric vector}\n\item{Y}{a numeric vector} }\n}\n\source{\nSee the HDBSCAN notebook from github documentation:\n\url{http://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html}\n}\n\description{\nContains 100 2-d points, half of which are contained in two moons or\n\"blobs\" (25 points each blob), and the other half in asymmetric facing\ncrescent shapes. The three shapes are all linearly separable.\n}\n\details{\nThis data was generated with the following Python commands using the\nSciKit-Learn library:\n\n\verb{> import sklearn.datasets as data}\n\n\verb{> moons = data.make_moons(n_samples=50, noise=0.05)}\n\n\verb{> blobs = data.make_blobs(n_samples=50, centers=[(-0.75,2.25), (1.0, 2.0)], cluster_std=0.25)}\n\n\verb{> test_data = np.vstack([moons, blobs])}\n}\n\examples{\ndata(moons)\nplot(moons, pch=20)\n}\n\references{\nPedregosa, Fabian, Gael Varoquaux, Alexandre Gramfort,\nVincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel et al.\nScikit-learn: Machine learning in Python. \emph{Journal of Machine Learning\nResearch} 12, no. Oct (2011): 2825-2830.\n}\n\keyword{datasets}\n"
  },
  {
    "path": "man/ncluster.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ncluster.R\n\name{ncluster}\n\alias{ncluster}\n\alias{nnoise}\n\alias{nobs}\n\title{Number of Clusters, Noise Points, and Observations}\n\usage{\nncluster(object, ...)\n\nnnoise(object, ...)\n}\n\arguments{\n\item{object}{a clustering result object containing a \code{cluster} element.}\n\n\item{...}{additional arguments are unused.}\n}\n\value{\nreturns the number of clusters or noise points.\n}\n\description{\nExtract the number of clusters or the number of noise points for\na clustering. This function works with any clustering result that\ncontains a list element named \code{cluster} with a clustering vector. In\naddition, \code{nobs} (see \code{\link[stats:nobs]{stats::nobs()}}) is also available to retrieve\nthe number of clustered points.\n}\n\examples{\ndata(iris)\niris <- as.matrix(iris[, 1:4])\n\nres <- dbscan(iris, eps = .7, minPts = 5)\nres\n\nncluster(res)\nnnoise(res)\nnobs(res)\n\n# the functions also work with kmeans and other clustering algorithms.\ncl <- kmeans(iris, centers = 3)\nncluster(cl)\nnnoise(cl)\nnobs(cl)\n}\n\seealso{\nOther clustering functions: \n\code{\link{dbscan}()},\n\code{\link{extractFOSC}()},\n\code{\link{hdbscan}()},\n\code{\link{jpclust}()},\n\code{\link{optics}()},\n\code{\link{sNNclust}()}\n}\n\concept{clustering functions}\n"
  },
  {
    "path": "man/optics.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/optics.R, R/predict.R\n\\name{optics}\n\\alias{optics}\n\\alias{OPTICS}\n\\alias{print.optics}\n\\alias{plot.optics}\n\\alias{as.reachability.optics}\n\\alias{as.dendrogram.optics}\n\\alias{extractDBSCAN}\n\\alias{extractXi}\n\\alias{predict.optics}\n\\title{Ordering Points to Identify the Clustering Structure (OPTICS)}\n\\usage{\noptics(x, eps = NULL, minPts = 5, ...)\n\n\\method{print}{optics}(x, ...)\n\n\\method{plot}{optics}(x, cluster = TRUE, predecessor = FALSE, ...)\n\n\\method{as.reachability}{optics}(object, ...)\n\n\\method{as.dendrogram}{optics}(object, ...)\n\nextractDBSCAN(object, eps_cl)\n\nextractXi(object, xi, minimum = FALSE, correctPredecessors = TRUE)\n\n\\method{predict}{optics}(object, newdata, data, ...)\n}\n\\arguments{\n\\item{x}{a data matrix or a \\link{dist} object.}\n\n\\item{eps}{upper limit of the size of the epsilon neighborhood. Limiting the\nneighborhood size improves performance and has no or very little impact on\nthe ordering as long as it is not set too low. If not specified, the largest\nminPts-distance in the data set is used which gives the same result as\ninfinity.}\n\n\\item{minPts}{the parameter is used to identify dense neighborhoods and the\nreachability distance is calculated as the distance to the minPts nearest\nneighbor. Controls the smoothness of the reachability distribution. Default\nis 5 points.}\n\n\\item{...}{additional arguments are passed on to the fixed-radius nearest\nneighbor search algorithm. 
See \\code{\\link[=frNN]{frNN()}} for details on how to\ncontrol the search strategy.}\n\n\\item{cluster, predecessor}{plot clusters and predecessors.}\n\n\\item{object}{clustering object.}\n\n\\item{eps_cl}{Threshold to identify clusters (\\code{eps_cl <= eps}).}\n\n\\item{xi}{Steepness threshold to identify clusters hierarchically using the\nXi method.}\n\n\\item{minimum}{logical, representing whether or not to extract the minimal\n(non-overlapping) clusters in the Xi clustering algorithm.}\n\n\\item{correctPredecessors}{logical, correct a common artifact by pruning\nthe steep up area for points that have predecessors not in the\ncluster--found by the ELKI framework, see details below.}\n\n\\item{newdata}{new data points for which the cluster membership should be\npredicted.}\n\n\\item{data}{the data set used to create the clustering object.}\n}\n\\value{\nAn object of class \\code{optics} with components:\n\\item{eps }{ value of \\code{eps} parameter. }\n\\item{minPts }{ value of \\code{minPts} parameter. }\n\\item{order }{ optics order for the data points in \\code{x}. }\n\\item{reachdist }{ \\link{reachability} distance for each data point in \\code{x}. }\n\\item{coredist }{ core distance for each data point in \\code{x}. }\n\nFor \\code{extractDBSCAN()}, in addition the following\ncomponents are available:\n\\item{eps_cl }{ the value of the \\code{eps_cl} parameter. }\n\\item{cluster }{ assigned cluster labels in the order of the data points in \\code{x}. }\n\nFor \\code{extractXi()}, in addition the following components\nare available:\n\\item{xi}{ the value of the \\code{xi} steepness threshold. }\n\\item{cluster }{ assigned cluster labels in the order of the data points in \\code{x}.}\n\\item{clusters_xi }{ data.frame containing the start and end of each cluster\nfound in the OPTICS ordering. 
}\n}\n\\description{\nImplementation of the OPTICS (Ordering points to identify the clustering\nstructure) point ordering algorithm using a kd-tree.\n}\n\\details{\n\\strong{The algorithm}\n\nThis implementation of OPTICS follows the original\nalgorithm as described by Ankerst et al (1999). OPTICS is an ordering\nalgorithm with methods to extract a clustering from the ordering.\nWhile using similar concepts as DBSCAN, for OPTICS \\code{eps}\nis only an upper limit for the neighborhood size used to reduce\ncomputational complexity. Note that \\code{minPts} in OPTICS has a different\neffect than in DBSCAN. It is used to define dense neighborhoods, but since\n\\code{eps} is typically set rather high, this does not affect the ordering\nmuch. However, it is also used to calculate the reachability distance and\nlarger values will make the reachability distance plot smoother.\n\nOPTICS linearly orders the data points such that points which are spatially\nclosest become neighbors in the ordering. The closest analog to this\nordering is the dendrogram in single-link hierarchical clustering. The algorithm\nalso calculates the reachability distance for each point.\n\\code{plot()} (see \\link{reachability_plot})\nproduces a reachability plot which shows the reachability distance\nbetween consecutive points in the OPTICS ordering. Valleys represent clusters (the\ndeeper the valley, the denser the cluster) and high points indicate\npoints between clusters.\n\n\\strong{Specifying the data}\n\nIf \\code{x} is specified as a data matrix, then Euclidean distances and fast\nnearest neighbor lookup using a kd-tree are used. 
See \\code{\\link[=kNN]{kNN()}} for\ndetails on the parameters for the kd-tree.\n\n\\strong{Extracting a clustering}\n\nSeveral methods to extract a clustering from the order returned by OPTICS are\nimplemented:\n\\itemize{\n\\item \\code{extractDBSCAN()} extracts a clustering from an OPTICS ordering that is\nsimilar to what DBSCAN would produce with an eps set to \\code{eps_cl} (see\nAnkerst et al, 1999). The only difference to a DBSCAN clustering is that\nOPTICS is not able to assign some border points and reports them instead as\nnoise.\n\\item \\code{extractXi()} extracts clusters hierarchically as specified in Ankerst et al\n(1999) based on the steepness of the reachability plot. One interpretation\nof the \\code{xi} parameter is that it classifies clusters by change in\nrelative cluster density. The algorithm used here was originally contributed by\nthe ELKI framework and is explained in Schubert and Gertz (2018), but contains a\nset of fixes.\n}\n\n\\strong{Predict cluster memberships}\n\n\\code{predict()} requires a DBSCAN clustering extracted with \\code{extractDBSCAN()} and then\nuses the \\code{predict()} method for \\code{dbscan()}.\n}\n\\examples{\nset.seed(2)\nn <- 400\n\nx <- cbind(\n  x = runif(4, 0, 1) + rnorm(n, sd = 0.1),\n  y = runif(4, 0, 1) + rnorm(n, sd = 0.1)\n  )\n\nplot(x, col=rep(1:4, times = 100))\n\n### run OPTICS (Note: we use the default eps calculation)\nres <- optics(x, minPts = 10)\nres\n\n### get order\nres$order\n\n### plot produces a reachability plot\nplot(res)\n\n### plot the order of points in the reachability plot\nplot(x, col = \"grey\")\npolygon(x[res$order, ])\n\n### extract a DBSCAN clustering by cutting the reachability plot at eps_cl\nres <- extractDBSCAN(res, eps_cl = .065)\nres\n\nplot(res)  ## black is noise\nhullplot(x, res)\n\n### re-cut at a higher eps threshold\nres <- extractDBSCAN(res, eps_cl = .07)\nres\nplot(res)\nhullplot(x, res)\n\n### extract hierarchical clustering of varying density using the Xi method\nres <- extractXi(res, xi = 
0.01)\nres\n\nplot(res)\nhullplot(x, res)\n\n# Xi cluster structure\nres$clusters_xi\n\n### use OPTICS on a precomputed distance matrix\nd <- dist(x)\nres <- optics(d, minPts = 10)\nplot(res)\n}\n\\references{\nMihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Joerg\nSander (1999). OPTICS: Ordering Points To Identify the Clustering Structure.\n\\emph{ACM SIGMOD international conference on Management of data.} ACM Press. pp. 49-60.\n\\doi{10.1145/304181.304187}\n\nHahsler M, Piekenbrock M, Doran D (2019). dbscan: Fast Density-Based\nClustering with R.  \\emph{Journal of Statistical Software}, 91(1), 1-30.\n\\doi{10.18637/jss.v091.i01}\n\nErich Schubert, Michael Gertz (2018). Improving the Cluster Structure\nExtracted from OPTICS Plots. In \\emph{Lernen, Wissen, Daten, Analysen (LWDA 2018),}\npp. 318-329.\n}\n\\seealso{\nDensity \\link{reachability}.\n\nOther clustering functions: \n\\code{\\link{dbscan}()},\n\\code{\\link{extractFOSC}()},\n\\code{\\link{hdbscan}()},\n\\code{\\link{jpclust}()},\n\\code{\\link{ncluster}()},\n\\code{\\link{sNNclust}()}\n}\n\\author{\nMichael Hahsler and Matthew Piekenbrock\n}\n\\concept{clustering functions}\n\\keyword{clustering}\n\\keyword{model}\n"
  },
  {
    "path": "man/pointdensity.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/pointdensity.R\n\\name{pointdensity}\n\\alias{pointdensity}\n\\alias{density}\n\\title{Calculate Local Density at Each Data Point}\n\\usage{\npointdensity(\n  x,\n  eps,\n  type = \"frequency\",\n  search = \"kdtree\",\n  bucketSize = 10,\n  splitRule = \"suggest\",\n  approx = 0\n)\n}\n\\arguments{\n\\item{x}{a data matrix or a dist object.}\n\n\\item{eps}{radius of the eps-neighborhood, i.e., bandwidth of the uniform\nkernel. For the Gaussian KDE, this parameter specifies the standard deviation of\nthe kernel.}\n\n\\item{type}{\\code{\"frequency\"}, \\code{\"density\"}, or \\code{\"gaussian\"}. Should the raw count of\npoints inside the eps-neighborhood, the eps-neighborhood density estimate,\nor a Gaussian density estimate be returned?}\n\n\\item{search, bucketSize, splitRule, approx}{algorithmic parameters for\n\\code{\\link[=frNN]{frNN()}}.}\n}\n\\value{\nA vector of the same length as data points (rows) in \\code{x} with\nthe count or density values for each data point.\n}\n\\description{\nCalculate the local density at each data point as either the number of\npoints in the eps-neighborhood (as used in \\code{dbscan()}) or using kernel density\nestimation (KDE) with a uniform kernel. The function uses a kd-tree for fast\nfixed-radius nearest neighbor search.\n}\n\\details{\n\\code{dbscan()} estimates the density around a point as the number of points in the\neps-neighborhood of the point (including the query point itself).\nKernel density estimation (KDE) with a uniform kernel is just this point\ncount in the eps-neighborhood divided by \\eqn{(2\\,eps\\,n)}{(2 eps n)}, where\n\\eqn{n} is the number of points in \\code{x}.\n\nAlternatively, \\code{type = \"gaussian\"} calculates a Gaussian kernel estimate where\n\\code{eps} is used as the standard deviation. 
To speed up computation, a\nkd-tree is used to find all points within 3 times the standard deviation and\nthese points are used for the estimate.\n\nPoints with low local density often indicate noise (see e.g., Wishart (1969)\nand Hartigan (1975)).\n}\n\\examples{\nset.seed(665544)\nn <- 100\nx <- cbind(\n  x=runif(10, 0, 5) + rnorm(n, sd = 0.4),\n  y=runif(10, 0, 5) + rnorm(n, sd = 0.4)\n  )\nplot(x)\n\n### calculate density around points\nd <- pointdensity(x, eps = .5, type = \"density\")\n\n### density distribution\nsummary(d)\nhist(d, breaks = 10)\n\n### plot with point size proportional to density\nplot(x, pch = 19, main = \"Density (eps = .5)\", cex = d*5)\n\n### Wishart (1969) single link clustering after removing low-density noise\n# 1. remove noise with low density\nf <- pointdensity(x, eps = .5, type = \"frequency\")\nx_nonoise <- x[f >= 5,]\n\n# 2. use single-linkage on the non-noise points\nhc <- hclust(dist(x_nonoise), method = \"single\")\nplot(x, pch = 19, cex = .5)\npoints(x_nonoise, pch = 19, col= cutree(hc, k = 4) + 1L)\n}\n\\references{\nWishart, D. (1969), Mode Analysis: A Generalization of Nearest\nNeighbor which Reduces Chaining Effects, in \\emph{Numerical Taxonomy,} Ed., A.J.\nCole, Academic Press, 282-311.\n\nJohn A. Hartigan (1975), \\emph{Clustering Algorithms,} John Wiley & Sons, Inc.,\nNew York, NY, USA.\n}\n\\seealso{\n\\code{\\link[=frNN]{frNN()}}, \\code{\\link[stats:density]{stats::density()}}.\n\nOther Outlier Detection Functions: \n\\code{\\link{glosh}()},\n\\code{\\link{kNNdist}()},\n\\code{\\link{lof}()}\n}\n\\author{\nMichael Hahsler\n}\n\\concept{Outlier Detection Functions}\n\\keyword{model}\n"
  },
  {
    "path": "man/reachability.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/reachability.R\n\\name{reachability}\n\\alias{reachability}\n\\alias{reachability_plot}\n\\alias{print.reachability}\n\\alias{plot.reachability}\n\\alias{as.reachability}\n\\alias{as.reachability.dendrogram}\n\\title{Reachability Distances}\n\\usage{\n\\method{print}{reachability}(x, ...)\n\n\\method{plot}{reachability}(\n  x,\n  order_labels = FALSE,\n  xlab = \"Order\",\n  ylab = \"Reachability dist.\",\n  main = \"Reachability Plot\",\n  ...\n)\n\nas.reachability(object, ...)\n\n\\method{as.reachability}{dendrogram}(object, ...)\n}\n\\arguments{\n\\item{x}{object of class \\code{reachability}.}\n\n\\item{...}{graphical parameters are passed on to \\code{plot()},\nor arguments for other methods.}\n\n\\item{order_labels}{whether to plot text labels for each point's reachability\ndistance.}\n\n\\item{xlab}{x-axis label.}\n\n\\item{ylab}{y-axis label.}\n\n\\item{main}{Title of the plot.}\n\n\\item{object}{any object that can be coerced to class\n\\code{reachability}, such as an object of class \\link{optics} or \\link[stats:dendrogram]{stats::dendrogram}.}\n}\n\\value{\nAn object of class \\code{reachability} with components:\n\\item{order }{order to use for the data points in \\code{x}. }\n\\item{reachdist }{reachability distance for each data point in \\code{x}. }\n}\n\\description{\nReachability distances can be plotted to show the hierarchical relationships between data points.\nThe idea was originally introduced by Ankerst et al (1999) for \\link{OPTICS}. 
Later,\nSander et al (2003) showed that the visualization is useful for other hierarchical\nstructures and introduced an algorithm to convert a \\link{dendrogram} representation into\nreachability plots.\n}\n\\details{\nA reachability plot displays the points as vertical bars, where the height is the\nreachability distance between two consecutive points.\nThe central idea behind reachability plots is that the ordering in which\npoints are plotted identifies the underlying hierarchical density\nrepresentation as mountains and valleys of high and low reachability distance.\nThe original ordering algorithm OPTICS as described by Ankerst et al (1999)\nintroduced the notion of reachability plots.\n\nOPTICS linearly orders the data points such that points\nwhich are spatially closest become neighbors in the ordering. Valleys\nrepresent clusters, which can be represented hierarchically. Although the\nordering is crucial to the structure of the reachability plot, it is important\nto note that OPTICS, like DBSCAN, is not entirely deterministic and, just\nlike the dendrogram, isomorphisms may exist.\n\nReachability plots were shown to essentially convey the same information as\nthe more traditional dendrogram structure by Sander et al (2003), and dendrograms\ncan be converted into reachability plots.\n\nDifferent hierarchical representations, such as dendrograms or reachability\nplots, may be preferable depending on the context. In smaller datasets,\ncluster memberships may be more easily identifiable through a dendrogram\nrepresentation, particularly if the user is already familiar with tree-like\nrepresentations. For larger datasets, however, a reachability plot may be\npreferred for visualizing macro-level density relationships.\n\nA variety of cluster extraction methods have been proposed using\nreachability plots. 
Because these cluster extraction methods depend directly on the\nordering OPTICS produces, they are part of the \\code{\\link[=optics]{optics()}} interface.\nNonetheless, reachability plots can be created directly from other types of\nlinkage trees, and vice versa.\n\n\\emph{Note:} The reachability distance for the first point is by definition not defined\n(it has no preceding point).\nAlso, the reachability distances can be undefined when a point does not have enough\nneighbors in the epsilon neighborhood. We represent these undefined cases as \\code{Inf}\nand represent them in the plot as a dashed line.\n}\n\\examples{\nset.seed(2)\nn <- 20\n\nx <- cbind(\n  x = runif(4, 0, 1) + rnorm(n, sd = 0.1),\n  y = runif(4, 0, 1) + rnorm(n, sd = 0.1)\n)\n\nplot(x, xlim = range(x), ylim = c(min(x) - sd(x), max(x) + sd(x)), pch = 20)\ntext(x = x, labels = seq_len(nrow(x)), pos = 3)\n\n### run OPTICS\nres <- optics(x, eps = 10,  minPts = 2)\nres\n\n### plot produces a reachability plot.\nplot(res)\n\n### Manually extract reachability components from OPTICS\nreach <- as.reachability(res)\nreach\n\n### plot still produces a reachability plot; point ids\n### (rows in the original data) can be displayed with order_labels = TRUE\nplot(reach, order_labels = TRUE)\n\n### Reachability objects can be directly converted to dendrograms\ndend <- as.dendrogram(reach)\ndend\nplot(dend)\n\n### A dendrogram can be converted back into a reachability object\nplot(as.reachability(dend))\n}\n\\references{\nAnkerst, M., M. M. Breunig, H.-P. Kriegel, J. Sander (1999).\nOPTICS: Ordering Points To Identify the Clustering Structure. \\emph{ACM\nSIGMOD international conference on Management of data.} ACM Press. pp.\n49--60.\n\nSander, J., X. Qin, Z. Lu, N. Niu, and A. Kovarsky (2003). 
Automatic\nextraction of clusters from hierarchical clustering representations.\n\\emph{Pacific-Asia Conference on Knowledge Discovery and Data Mining.}\nSpringer Berlin Heidelberg.\n}\n\\seealso{\n\\code{\\link[=optics]{optics()}}, \\code{\\link[=as.dendrogram]{as.dendrogram()}}, and \\code{\\link[stats:hclust]{stats::hclust()}}.\n}\n\\author{\nMatthew Piekenbrock\n}\n\\keyword{clustering}\n\\keyword{hierarchical}\n\\keyword{model}\n"
  },
  {
    "path": "man/sNN.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/sNN.R\n\\name{sNN}\n\\alias{sNN}\n\\alias{snn}\n\\alias{sort.sNN}\n\\alias{print.sNN}\n\\title{Find Shared Nearest Neighbors}\n\\usage{\nsNN(\n  x,\n  k,\n  kt = NULL,\n  jp = FALSE,\n  sort = TRUE,\n  search = \"kdtree\",\n  bucketSize = 10,\n  splitRule = \"suggest\",\n  approx = 0\n)\n\n\\method{sort}{sNN}(x, decreasing = TRUE, ...)\n\n\\method{print}{sNN}(x, ...)\n}\n\\arguments{\n\\item{x}{a data matrix, a \\link{dist} object or a \\link{kNN} object.}\n\n\\item{k}{number of neighbors to consider to calculate the shared nearest\nneighbors.}\n\n\\item{kt}{minimum threshold on the number of shared nearest neighbors to\nbuild the shared nearest neighbor graph. Edges are only preserved if\n\\code{kt} or more neighbors are shared.}\n\n\\item{jp}{In regular sNN graphs, two points that are not neighbors\ncan have shared neighbors.\nJarvis and Patrick (1973) require the two points to be neighbors, otherwise\nthe count is zeroed out. \\code{TRUE} uses this behavior.}\n\n\\item{sort}{sort by the number of shared nearest neighbors? Note that this\nis expensive and \\code{sort = FALSE} is much faster. sNN objects can be\nsorted using \\code{sort()}.}\n\n\\item{search}{nearest neighbor search strategy (one of \\code{\"kdtree\"}, \\code{\"linear\"} or\n\\code{\"dist\"}).}\n\n\\item{bucketSize}{max size of the kd-tree leaves.}\n\n\\item{splitRule}{rule to split the kd-tree. One of \\code{\"STD\"}, \\code{\"MIDPT\"}, \\code{\"FAIR\"},\n\\code{\"SL_MIDPT\"}, \\code{\"SL_FAIR\"} or \\code{\"SUGGEST\"} (SL stands for sliding). \\code{\"SUGGEST\"} uses\nANN's best guess.}\n\n\\item{approx}{use approximate nearest neighbors. All NN up to a distance of\na factor of \\verb{(1 + approx) eps} may be used. Some actual NN may be omitted\nleading to spurious clusters and noise points.  
However, the algorithm will\nenjoy a significant speedup.}\n\n\\item{decreasing}{logical; sort in decreasing order?}\n\n\\item{...}{additional parameters are passed on.}\n}\n\\value{\nAn object of class \\code{sNN} (subclass of \\link{kNN} and \\link{NN}) containing a list\nwith the following components:\n\\item{id }{a matrix with ids. }\n\\item{dist}{a matrix with the distances. }\n\\item{shared }{a matrix with the number of shared nearest neighbors. }\n\\item{k }{the value of \\code{k} used. }\n\\item{metric }{the distance metric used. }\n}\n\\description{\nCalculates the number of shared nearest neighbors\nand creates a shared nearest neighbors graph.\n}\n\\details{\nThe number of shared nearest neighbors of two points p and q is the size of the\nintersection of the two points' kNN neighborhoods.\nNote that each point is considered to be part\nof its own kNN neighborhood.\nThe range for the shared nearest neighbors is\n\\eqn{[0, k]}. The result is an n-by-k matrix called \\code{shared}.\nEach row is a point and the columns are the point's k nearest neighbors.\nThe value is the count of the shared neighbors.\n\nThe shared nearest neighbor graph connects a point with all its nearest neighbors\nif they have at least one shared neighbor. The number of shared neighbors can be used\nas an edge weight.\nJarvis and Patrick (1973) use a slightly\nmodified (see parameter \\code{jp}) shared nearest neighbor graph for\nclustering.\n}\n\\examples{\ndata(iris)\nx <- iris[, -5]\n\n# find kNN and add the number of shared nearest neighbors.\nk <- 5\nnn <- sNN(x, k = k)\nnn\n\n# shared nearest neighbor distribution\ntable(as.vector(nn$shared))\n\n# explore number of shared points for the k-neighborhood of point 10\ni <- 10\nnn$shared[i,]\n\nplot(nn, x)\n\n# apply a threshold to create a sNN graph with edges\n# if 3 or more neighbors are shared.\nnn_3 <- sNN(nn, kt = 3)\nplot(nn_3, x)\n\n# get an adjacency list for the shared nearest neighbor graph\nadjacencylist(nn_3)\n}\n\\references{\nR. 
A. Jarvis and E. A. Patrick. 1973. Clustering Using a\nSimilarity Measure Based on Shared Near Neighbors. \\emph{IEEE Trans. Comput.}\n22, 11 (November 1973), 1025-1034.\n\\doi{10.1109/T-C.1973.223640}\n}\n\\seealso{\nOther NN functions: \n\\code{\\link{NN}},\n\\code{\\link{comps}()},\n\\code{\\link{frNN}()},\n\\code{\\link{kNN}()},\n\\code{\\link{kNNdist}()}\n}\n\\author{\nMichael Hahsler\n}\n\\concept{NN functions}\n\\keyword{model}\n"
  },
  {
    "path": "man/sNNclust.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/sNNclust.R\n\\name{sNNclust}\n\\alias{sNNclust}\n\\alias{snnclust}\n\\title{Shared Nearest Neighbor Clustering}\n\\usage{\nsNNclust(x, k, eps, minPts, borderPoints = TRUE, ...)\n}\n\\arguments{\n\\item{x}{a data matrix/data.frame (Euclidean distance is used), a\nprecomputed \\link{dist} object or a kNN object created with \\code{\\link[=kNN]{kNN()}}.}\n\n\\item{k}{Neighborhood size for nearest neighbor sparsification to create the\nshared NN graph.}\n\n\\item{eps}{Two objects are only reachable from each other if they share at\nleast \\code{eps} nearest neighbors. Note: this is different from the \\code{eps} in DBSCAN!}\n\n\\item{minPts}{minimum number of points that share at least \\code{eps}\nnearest neighbors for a point to be considered a core point.}\n\n\\item{borderPoints}{should border points be assigned to clusters like in\n\\link{DBSCAN}?}\n\n\\item{...}{additional arguments are passed on to the k nearest neighbor\nsearch algorithm. See \\code{\\link[=kNN]{kNN()}} for details on how to control the\nsearch strategy.}\n}\n\\value{\nAn object of class \\code{general_clustering} with the following\ncomponents:\n\\item{cluster }{An integer vector with cluster assignments. Zero\nindicates noise points.}\n\\item{type }{ name of the clustering algorithm used.}\n\\item{param }{ list of the clustering parameters used. }\n}\n\\description{\nImplements the shared nearest neighbor clustering algorithm by Ertoz,\nSteinbach and Kumar (2003).\n}\n\\details{\n\\strong{Algorithm:}\n\\enumerate{\n\\item Constructs a shared nearest neighbor graph for a given k. 
The edge\nweights are the number of shared k nearest neighbors (in the range of\n\\eqn{[0, k]}).\n\\item Find each point's SNN density, i.e., the number of points which have a\nsimilarity of \\code{eps} or greater.\n\\item Find the core points, i.e., all points that have an SNN density greater\nthan \\code{minPts}.\n\\item Form clusters from the core points and assign border points (i.e.,\nnon-core points which share at least \\code{eps} neighbors with a core point).\n}\n\nNote that steps 2-4 are equivalent to the DBSCAN algorithm (see \\code{\\link[=dbscan]{dbscan()}})\nand that \\code{eps} has a different meaning than for DBSCAN. Here it is\na threshold on the number of shared neighbors (see \\code{\\link[=sNN]{sNN()}})\nwhich defines a similarity.\n}\n\\examples{\ndata(\"DS3\")\n\n# Out of the k = 20 NN, 7 (eps) have to be shared to create a link in the sNN graph.\n# A point needs at least 16 (minPts) links in the sNN graph to be a core point.\n# Noise points have cluster id 0 and are shown in black.\ncl <- sNNclust(DS3, k = 20, eps = 7, minPts = 16)\ncl\n\nclplot(DS3, cl)\n\n}\n\\references{\nLevent Ertoz, Michael Steinbach, Vipin Kumar, Finding Clusters\nof Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data,\n\\emph{SIAM International Conference on Data Mining,} 2003, 47-59.\n\\doi{10.1137/1.9781611972733.5}\n}\n\\seealso{\nOther clustering functions: \n\\code{\\link{dbscan}()},\n\\code{\\link{extractFOSC}()},\n\\code{\\link{hdbscan}()},\n\\code{\\link{jpclust}()},\n\\code{\\link{ncluster}()},\n\\code{\\link{optics}()}\n}\n\\author{\nMichael Hahsler\n}\n\\concept{clustering functions}\n\\keyword{clustering}\n\\keyword{model}\n"
  },
  {
    "path": "src/ANN/ANN.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tANN.cpp\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tMethods for ANN.h and ANNx.h\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.0  04/01/05\n//\t\tAdded performance counting to annDist()\n//      Modified 2/28/08\n//              Added cstdlib and std::\n//----------------------------------------------------------------------\n\n#include <cstdlib>\n#include \"ANNx.h\"\t\t\t\t\t// all ANN include\n#include \"ANNperf.h\"\t\t\t\t// ANN performance\n//using namespace std;\t\t\t\t\t// make std:: accessible\n\n#include <R.h>\n\n//----------------------------------------------------------------------\n//\tPoint methods\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\tDistance utility.\n//\t\t(Note: In the nearest neighbor search, most distances are\n//\t\tcomputed using partial distance calculations, not this\n//\t\tprocedure.)\n//----------------------------------------------------------------------\n\nANNdist annDist(\t\t\t\t\t\t// 
interpoint squared distance\n\tint\t\t\t\t\tdim,\n\tANNpoint\t\t\tp,\n\tANNpoint\t\t\tq)\n{\n\tint d;\n\tANNcoord diff;\n\tANNcoord dist;\n\n\tdist = 0;\n\tfor (d = 0; d < dim; d++) {\n\t\tdiff = p[d] - q[d];\n\t\tdist = ANN_SUM(dist, ANN_POW(diff));\n\t}\n\tANN_FLOP(3*dim)\t\t\t\t\t// performance counts\n\tANN_PTS(1)\n\tANN_COORD(dim)\n\treturn dist;\n}\n\n//----------------------------------------------------------------------\n//\tannPrintPoint() prints a point to a given output stream.\n//----------------------------------------------------------------------\n\nvoid annPrintPt(\t\t\t\t\t\t// print a point\n\tANNpoint\t\t\tpt,\t\t\t\t// the point\n\tint\t\t\t\t\tdim,\t\t\t// the dimension\n\tstd::ostream\t\t&out)\t\t\t// output stream\n{\n\tfor (int j = 0; j < dim; j++) {\n\t\tout << pt[j];\n\t\tif (j < dim-1) out << \" \";\n\t}\n}\n\n//----------------------------------------------------------------------\n//\tPoint allocation/deallocation:\n//\n//\t\tBecause points (somewhat like strings in C) are stored\n//\t\tas pointers.  Consequently, creating and destroying\n//\t\tcopies of points may require storage allocation.  These\n//\t\tprocedures do this.\n//\n//\t\tannAllocPt() and annDeallocPt() allocate a deallocate\n//\t\tstorage for a single point, and return a pointer to it.\n//\n//\t\tannAllocPts() allocates an array of points as well a place\n//\t\tto store their coordinates, and initializes the points to\n//\t\tpoint to their respective coordinates.  It allocates point\n//\t\tstorage in a contiguous block large enough to store all the\n//\t\tpoints.  It performs no initialization.\n//\n//\t\tannDeallocPts() should only be used on point arrays allocated\n//\t\tby annAllocPts since it assumes that points are allocated in\n//\t\ta block.\n//\n//\t\tannCopyPt() copies a point taking care to allocate storage\n//\t\tfor the new point.\n//\n//\t\tannAssignRect() assigns the coordinates of one rectangle to\n//\t\tanother.  
The two rectangles must have the same dimension\n//\t\t(and it is not possible to test this here).\n//----------------------------------------------------------------------\n\nANNpoint annAllocPt(int dim, ANNcoord c)\t\t// allocate 1 point\n{\n\tANNpoint p = new ANNcoord[dim];\n\tfor (int i = 0; i < dim; i++) p[i] = c;\n\treturn p;\n}\n\nANNpointArray annAllocPts(int n, int dim)\t\t// allocate n pts in dim\n{\n\tANNpointArray pa = new ANNpoint[n];\t\t\t// allocate points\n\tANNpoint\t  p  = new ANNcoord[n*dim];\t\t// allocate space for coords\n\tfor (int i = 0; i < n; i++) {\n\t\tpa[i] = &(p[i*dim]);\n\t}\n\treturn pa;\n}\n\nvoid annDeallocPt(ANNpoint &p)\t\t\t\t\t// deallocate 1 point\n{\n\tdelete [] p;\n\tp = NULL;\n}\n\nvoid annDeallocPts(ANNpointArray &pa)\t\t\t// deallocate points\n{\n\tdelete [] pa[0];\t\t\t\t\t\t\t// dealloc coordinate storage\n\tdelete [] pa;\t\t\t\t\t\t\t\t// dealloc points\n\tpa = NULL;\n}\n\nANNpoint annCopyPt(int dim, ANNpoint source)\t// copy point\n{\n\tANNpoint p = new ANNcoord[dim];\n\tfor (int i = 0; i < dim; i++) p[i] = source[i];\n\treturn p;\n}\n\n\t\t\t\t\t\t\t\t\t\t\t\t// assign one rect to another\nvoid annAssignRect(int dim, ANNorthRect &dest, const ANNorthRect &source)\n{\n\tfor (int i = 0; i < dim; i++) {\n\t\tdest.lo[i] = source.lo[i];\n\t\tdest.hi[i] = source.hi[i];\n\t}\n}\n\n\t\t\t\t\t\t\t\t\t\t\t\t// is point inside rectangle?\nANNbool ANNorthRect::inside(int dim, ANNpoint p)\n{\n\tfor (int i = 0; i < dim; i++) {\n\t\tif (p[i] < lo[i] || p[i] > hi[i]) return ANNfalse;\n\t}\n\treturn ANNtrue;\n}\n\n//----------------------------------------------------------------------\n//\tError handler\n//----------------------------------------------------------------------\n\nvoid annError(const char *msg, ANNerr level)\n{\n\tif (level == ANNabort) {\n\t  //cerr << \"ANN: ERROR------->\" << msg << \"<-------------ERROR\\n\";\n\t  Rprintf(\"ANN Fatal ERROR: %s\", msg);\n//\t  std::exit(1);\n\t}\n\telse {\n\t  //cerr << \"ANN: 
WARNING----->\" << msg << \"<-------------WARNING\\n\";\n\t  Rprintf(\"ANN WARNING: %s\", msg);\n\t}\n}\n\n//----------------------------------------------------------------------\n//\tLimit on number of points visited\n//\t\tWe have an option for terminating the search early if the\n//\t\tnumber of points visited exceeds some threshold.  If the\n//\t\tthreshold is 0 (its default)  this means there is no limit\n//\t\tand the algorithm applies its normal termination condition.\n//\t\tThis is for applications where there are real time constraints\n//\t\ton the running time of the algorithm.\n//----------------------------------------------------------------------\n\nint\tANNmaxPtsVisited = 0;\t// maximum number of pts visited\nint\tANNptsVisited;\t\t\t// number of pts visited in search\n\n//----------------------------------------------------------------------\n//\tGlobal function declarations\n//----------------------------------------------------------------------\n\nvoid annMaxPtsVisit(\t\t\t// set limit on max. pts to visit in search\n\tint\t\t\t\t\tmaxPts)\t\t\t// the limit\n{\n\tANNmaxPtsVisited = maxPts;\n}\n"
  },
  {
    "path": "src/ANN/ANN.h",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tANN.h\n// Programmer:\t\tSunil Arya and David Mount\n// Last modified:\t05/03/05 (Release 1.1)\n// Description:\t\tBasic include file for approximate nearest\n//\t\t\t\t\tneighbor searching.\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.0  04/01/05\n//\t\tAdded copyright and revision information\n//\t\tAdded ANNcoordPrec for coordinate precision.\n//\t\tAdded methods theDim, nPoints, maxPoints, thePoints to ANNpointSet.\n//\t\tCleaned up C++ structure for modern compilers\n//\tRevision 1.1  05/03/05\n//\t\tAdded fixed-radius k-NN searching\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n// ANN - approximate nearest neighbor searching\n//\tANN is a library for approximate nearest neighbor searching,\n//\tbased on the use of standard and priority search in kd-trees\n//\tand balanced box-decomposition (bbd) trees. 
Here are some\n//\treferences to the main algorithmic techniques used here:\n//\n//\t\tkd-trees:\n//\t\t\tFriedman, Bentley, and Finkel, ``An algorithm for finding\n//\t\t\t\tbest matches in logarithmic expected time,'' ACM\n//\t\t\t\tTransactions on Mathematical Software, 3(3):209-226, 1977.\n//\n//\t\tPriority search in kd-trees:\n//\t\t\tArya and Mount, ``Algorithms for fast vector quantization,''\n//\t\t\t\tProc. of DCC '93: Data Compression Conference, eds. J. A.\n//\t\t\t\tStorer and M. Cohn, IEEE Press, 1993, 381-390.\n//\n//\t\tApproximate nearest neighbor search and bbd-trees:\n//\t\t\tArya, Mount, Netanyahu, Silverman, and Wu, ``An optimal\n//\t\t\t\talgorithm for approximate nearest neighbor searching,''\n//\t\t\t\t5th Ann. ACM-SIAM Symposium on Discrete Algorithms,\n//\t\t\t\t1994, 573-582.\n//----------------------------------------------------------------------\n\n#ifndef ANN_H\n#define ANN_H\n\n#ifdef Win32\n//----------------------------------------------------------------------\n// For Microsoft Visual C++, externally accessible symbols must be\n// explicitly indicated with DLL_API, which is somewhat like \"extern.\"\n//\n// The following ifdef block is the standard way of creating macros\n// which make exporting from a DLL simpler. All files within this DLL\n// are compiled with the DLL_EXPORTS preprocessor symbol defined on the\n// command line. In contrast, projects that use (or import) the DLL\n// objects do not define the DLL_EXPORTS symbol. 
This way any other\n// project whose source files include this file sees DLL_API functions as\n// being imported from a DLL, whereas this DLL sees symbols defined with\n// this macro as being exported.\n//----------------------------------------------------------------------\n#ifdef DLL_EXPORTS\n#define DLL_API __declspec(dllexport)\n#else\n#define DLL_API __declspec(dllimport)\n#endif\n//----------------------------------------------------------------------\n// DLL_API is ignored for all other systems\n//----------------------------------------------------------------------\n#else\n#define DLL_API\n#endif\n\n//----------------------------------------------------------------------\n//  basic includes\n//----------------------------------------------------------------------\n\n#include <cmath>\t\t\t// math includes\n#include <iostream>\t\t\t// I/O streams\n\n#include <vector>\n\n//----------------------------------------------------------------------\n// Limits\n// There are a number of places where we use the maximum double value as\n// default initializers (and others may be used, depending on the\n// data/distance representation). These can usually be found in limits.h\n// (as LONG_MAX, INT_MAX) or in float.h (as DBL_MAX, FLT_MAX).\n//\n// Not all systems have these files.  If you are using such a system,\n// you should set the preprocessor symbol ANN_NO_LIMITS_H when\n// compiling, and modify the statements below to generate the\n// appropriate value. For practical purposes, this does not need to be\n// the maximum double value. 
It is sufficient that it be at least as\n// large as the maximum squared distance between any two\n// points.\n//----------------------------------------------------------------------\n#ifdef ANN_NO_LIMITS_H\t\t\t\t\t// limits.h unavailable\n#include <cvalues>\t\t\t\t\t// replacement for limits.h\nconst double ANN_DBL_MAX = MAXDOUBLE;\t// insert maximum double\n#else\n#include <climits>\n#include <cfloat>\nconst double ANN_DBL_MAX = DBL_MAX;\n#endif\n\n#define ANNversion \t\t\"1.0\"\t\t\t// ANN version and information\n#define ANNversionCmt\t\"\"\n#define ANNcopyright\t\"David M. Mount and Sunil Arya\"\n#define ANNlatestRev\t\"Mar 1, 2005\"\n\n//----------------------------------------------------------------------\n//\tANNbool\n//\tThis is a simple boolean type. Although ANSI C++ is supposed\n//\tto support the type bool, some compilers do not have it.\n//----------------------------------------------------------------------\n\nenum ANNbool {ANNfalse = 0, ANNtrue = 1}; // ANN boolean type (non ANSI C++)\n\n//----------------------------------------------------------------------\n//\tANNcoord, ANNdist\n//\t\tANNcoord and ANNdist are the types used for representing\n//\t\tpoint coordinates and distances.  They can be modified by the\n//\t\tuser, with some care.  It is assumed that they are both numeric\n//\t\ttypes, and that ANNdist is generally of an equal or higher type\n//\t\tthan ANNcoord.\tA variable of type ANNdist should be large\n//\t\tenough to store the sum of squared components of a variable\n//\t\tof type ANNcoord for the number of dimensions needed in the\n//\t\tapplication.  
For example, the following combinations are\n//\t\tlegal:\n//\n//\t\tANNcoord\t\tANNdist\n//\t\t---------\t\t-------------------------------\n//\t\tshort\t\t\tshort, int, long, float, double\n//\t\tint\t\t\t\tint, long, float, double\n//\t\tlong\t\t\tlong, float, double\n//\t\tfloat\t\t\tfloat, double\n//\t\tdouble\t\t\tdouble\n//\n//\t\tIt is the user's responsibility to make sure that overflow does\n//\t\tnot occur in distance calculation.\n//----------------------------------------------------------------------\n\ntypedef double\tANNcoord;\t\t\t\t// coordinate data type\ntypedef double\tANNdist;\t\t\t\t// distance data type\n\n//----------------------------------------------------------------------\n//\tANNidx\n//\t\tANNidx is a point index.  When the data structure is built, the\n//\t\tpoints are given as an array.  Nearest neighbor results are\n//\t\treturned as an integer index into this array.  To make it\n//\t\tclearer when this is happening, we define the integer type\n//\t\tANNidx.\t Indexing starts from 0.\n//\n//\t\tFor fixed-radius near neighbor searching, it is possible that\n//\t\tthere are not k nearest neighbors within the search radius.  To\n//\t\tindicate this, the algorithm returns ANN_NULL_IDX as its result.\n//\t\tIt should be distinguishable from any valid array index.\n//----------------------------------------------------------------------\n\ntypedef int\t\tANNidx;\t\t\t\t\t// point index\nconst ANNidx\tANN_NULL_IDX = -1;\t\t// a NULL point index\n\n//----------------------------------------------------------------------\n//\tInfinite distance:\n//\t\tThe code assumes that there is an \"infinite distance\" which it\n//\t\tuses to initialize distances before performing nearest neighbor\n//\t\tsearches.  It should be as large or larger than any legitimate\n//\t\tnearest neighbor distance.\n//\n//\t\tOn most systems, these should be found in the standard include\n//\t\tfile <limits.h> or possibly <float.h>.  
If you do not have these\n//\t\tfiles, some suggested values are listed below, assuming 64-bit\n//\t\tlong, 32-bit int and 16-bit short.\n//\n//\t\tANNdist ANN_DIST_INF\tValues (see <limits.h> or <float.h>)\n//\t\t------- ------------\t------------------------------------\n//\t\tdouble\tDBL_MAX\t\t\t1.79769313486231570e+308\n//\t\tfloat\tFLT_MAX\t\t\t3.40282346638528860e+38\n//\t\tlong\tLONG_MAX\t\t0x7fffffffffffffff\n//\t\tint\t\tINT_MAX\t\t\t0x7fffffff\n//\t\tshort\tSHRT_MAX\t\t0x7fff\n//----------------------------------------------------------------------\n\nconst ANNdist\tANN_DIST_INF = ANN_DBL_MAX;\n\n//----------------------------------------------------------------------\n//\tSignificant digits for tree dumps:\n//\t\tWhen floating point coordinates are used, the routine that dumps\n//\t\ta tree needs to know roughly how many significant digits there\n//\t\tare in an ANNcoord, so it can output points to full precision.\n//\t\tThis is defined to be ANNcoordPrec.  On most systems these\n//\t\tvalues can be found in the standard include files <limits.h> or\n//\t\t<float.h>.  For integer types, the value is essentially ignored.\n//\n//\t\tANNcoord ANNcoordPrec\tValues (see <limits.h> or <float.h>)\n//\t\t-------- ------------\t------------------------------------\n//\t\tdouble\t DBL_DIG\t\t15\n//\t\tfloat\t FLT_DIG\t\t6\n//\t\tlong\t doesn't matter 19\n//\t\tint\t\t doesn't matter 10\n//\t\tshort\t doesn't matter 5\n//----------------------------------------------------------------------\n\n#ifdef DBL_DIG\t\t\t\t\t\t\t// number of sig. digits in ANNcoord\nconst int\t ANNcoordPrec\t= DBL_DIG;\n#else\nconst int\t ANNcoordPrec\t= 15;\t// default precision\n#endif\n\n//----------------------------------------------------------------------\n// Self match?\n//\tIn some applications, the nearest neighbor of a point is not\n//\tallowed to be the point itself. This occurs, for example, when\n//\tcomputing all nearest neighbors in a set.  
By setting the\n//\tparameter ANN_ALLOW_SELF_MATCH to ANNfalse, the nearest neighbor\n//\tis the closest point whose distance from the query point is\n//\tstrictly positive.\n//----------------------------------------------------------------------\n\nconst ANNbool\tANN_ALLOW_SELF_MATCH\t= ANNtrue;\n//const ANNbool\tANN_ALLOW_SELF_MATCH\t= ANNfalse;\n\n//----------------------------------------------------------------------\n//\tNorms and metrics:\n//\t\tANN supports any Minkowski norm for defining distance.  In\n//\t\tparticular, for any p >= 1, the L_p Minkowski norm defines the\n//\t\tlength of a d-vector (v0, v1, ..., v(d-1)) to be\n//\n//\t\t\t\t(|v0|^p + |v1|^p + ... + |v(d-1)|^p)^(1/p),\n//\n//\t\t(where ^ denotes exponentiation, and |.| denotes absolute\n//\t\tvalue).  The distance between two points is defined to be the\n//\t\tnorm of the vector joining them.  Some common distance metrics\n//\t\tinclude\n//\n//\t\t\t\tEuclidean metric\t\tp = 2\n//\t\t\t\tManhattan metric\t\tp = 1\n//\t\t\t\tMax metric\t\t\t\tp = infinity\n//\n//\t\tIn the case of the max metric, the norm is computed by taking\n//\t\tthe maxima of the absolute values of the components.  ANN is\n//\t\thighly \"coordinate-based\" and does not support general distance\n//\t\tfunctions (e.g. those obeying just the triangle inequality).  It\n//\t\talso does not support distance functions based on\n//\t\tinner-products.\n//\n//\t\tFor the purpose of computing nearest neighbors, it is not\n//\t\tnecessary to compute the final power (1/p).  Thus the only\n//\t\tcomponent that is used by the program is |v(i)|^p.\n//\n//\t\tANN parameterizes the distance computation through the following\n//\t\tmacros.  (Macros are used rather than procedures for\n//\t\tefficiency.) Recall that the distance between two points is\n//\t\tgiven by the length of the vector joining them, and the length\n//\t\tor norm of a vector v is given by the formula:\n//\n//\t\t\t\t|v| = ROOT(POW(v0) # POW(v1) # ... 
# POW(v(d-1)))\n//\n//\t\twhere ROOT, POW are unary functions and # is an associative and\n//\t\tcommutative binary operator mapping the following types:\n//\n//\t\t\t**\tPOW:\tANNcoord\t\t\t\t--> ANNdist\n//\t\t\t**\t#:\t\tANNdist x ANNdist\t\t--> ANNdist\n//\t\t\t**\tROOT:\tANNdist (>0)\t\t\t--> double\n//\n//\t\tFor early termination in distance calculation (partial distance\n//\t\tcalculation) we assume that POW and # together are monotonically\n//\t\tincreasing on sequences of arguments, meaning that for all\n//\t\tv0..vk and y:\n//\n//\t\tPOW(v0) #...# POW(vk) <= (POW(v0) #...# POW(vk)) # POW(y).\n//\n//\tIncremental Distance Calculation:\n//\t\tThe program uses an optimized method of computing distances for\n//\t\tkd-trees and bd-trees, called incremental distance calculation.\n//\t\tIt is used when distances are to be updated when only a single\n//\t\tcoordinate of a point has been changed.  In order to use this,\n//\t\twe assume that there is an incremental update function DIFF(x,y)\n//\t\tfor #, such that if:\n//\n//\t\t\t\t\ts = x0 # ... # xi # ... # xk\n//\n//\t\tthen if s' is equal to s but with xi replaced by y, that is,\n//\n//\t\t\t\t\ts' = x0 # ... # y # ... # xk\n//\n//\t\tthen the length of s' can be computed by:\n//\n//\t\t\t\t\t|s'| = |s| # DIFF(xi,y).\n//\n//\t\tThus, if # is + then DIFF(xi,y) is (y - xi).  
For the L_infinity\n//\t\tnorm we make use of the fact that in the program this function\n//\t\tis only invoked when y > xi, and hence DIFF(xi,y)=y.\n//\n//\t\tFinally, for approximate nearest neighbor queries we assume\n//\t\tthat POW and ROOT are related such that\n//\n//\t\t\t\t\tv*ROOT(x) = ROOT(POW(v)*x)\n//\n//\t\tHere are the values for the various Minkowski norms:\n//\n//\t\tL_p:\tp even:\t\t\t\t\t\t\tp odd:\n//\t\t\t\t-------------------------\t\t------------------------\n//\t\t\t\tPOW(v)\t\t\t= v^p\t\t\tPOW(v)\t\t\t= |v|^p\n//\t\t\t\tROOT(x)\t\t\t= x^(1/p)\t\tROOT(x)\t\t\t= x^(1/p)\n//\t\t\t\t#\t\t\t\t= +\t\t\t\t#\t\t\t\t= +\n//\t\t\t\tDIFF(x,y)\t\t= y - x\t\t\tDIFF(x,y)\t\t= y - x\n//\n//\t\tL_inf:\n//\t\t\t\tPOW(v)\t\t\t= |v|\n//\t\t\t\tROOT(x)\t\t\t= x\n//\t\t\t\t#\t\t\t\t= max\n//\t\t\t\tDIFF(x,y)\t\t= y\n//\n//\t\tBy default the Euclidean norm is assumed.  To change the norm,\n//\t\tuncomment the appropriate set of macros below.\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\tUse the following for the Euclidean norm\n//----------------------------------------------------------------------\n#define ANN_POW(v)\t\t\t((v)*(v))\n#define ANN_ROOT(x)\t\t\tsqrt(x)\n#define ANN_SUM(x,y)\t\t((x) + (y))\n#define ANN_DIFF(x,y)\t\t((y) - (x))\n\n//----------------------------------------------------------------------\n//\tUse the following for the L_1 (Manhattan) norm\n//----------------------------------------------------------------------\n// #define ANN_POW(v)\t\tfabs(v)\n// #define ANN_ROOT(x)\t\t(x)\n// #define ANN_SUM(x,y)\t\t((x) + (y))\n// #define ANN_DIFF(x,y)\t((y) - (x))\n\n//----------------------------------------------------------------------\n//\tUse the following for a general L_p norm\n//----------------------------------------------------------------------\n// #define ANN_POW(v)\t\tpow(fabs(v),p)\n// #define 
ANN_ROOT(x)\t\tpow(fabs(x),1/p)\n// #define ANN_SUM(x,y)\t\t((x) + (y))\n// #define ANN_DIFF(x,y)\t((y) - (x))\n\n//----------------------------------------------------------------------\n//\tUse the following for the L_infinity (Max) norm\n//----------------------------------------------------------------------\n// #define ANN_POW(v)\t\tfabs(v)\n// #define ANN_ROOT(x)\t\t(x)\n// #define ANN_SUM(x,y)\t\t((x) > (y) ? (x) : (y))\n// #define ANN_DIFF(x,y)\t(y)\n\n//----------------------------------------------------------------------\n//\tArray types\n//\t\tThe following array types are of basic interest.  A point is\n//\t\tjust a dimensionless array of coordinates, a point array is a\n//\t\tdimensionless array of points.  A distance array is a\n//\t\tdimensionless array of distances and an index array is a\n//\t\tdimensionless array of point indices.  The latter two are used\n//\t\twhen returning the results of k-nearest neighbor queries.\n//----------------------------------------------------------------------\n\ntypedef ANNcoord* ANNpoint;\t\t\t// a point\ntypedef ANNpoint* ANNpointArray;\t// an array of points\ntypedef ANNdist*  ANNdistArray;\t\t// an array of distances\ntypedef ANNidx*   ANNidxArray;\t\t// an array of point indices\n\n//----------------------------------------------------------------------\n//\tBasic point and array utilities:\n//\t\tThe following procedures are useful supplements to ANN's nearest\n//\t\tneighbor capabilities.\n//\n//\t\tannDist():\n//\t\t\tComputes the (squared) distance between a pair of points.\n//\t\t\tNote that this routine is not used internally by ANN for\n//\t\t\tcomputing distance calculations.  For reasons of efficiency\n//\t\t\tthis is done using incremental distance calculation.  Thus,\n//\t\t\tthis routine cannot be modified as a method of changing the\n//\t\t\tmetric.\n//\n//\t\tBecause points (somewhat like strings in C) are stored as\n//\t\tpointers.  
Consequently, creating and destroying copies of\n//\t\tpoints may require storage allocation.  These procedures do\n//\t\tthis.\n//\n//\t\tannAllocPt() and annDeallocPt():\n//\t\t\t\tAllocate and deallocate storage for a single point, and\n//\t\t\t\treturn a pointer to it.  The argument to AllocPt() is\n//\t\t\t\tused to initialize all components.\n//\n//\t\tannAllocPts() and annDeallocPts():\n//\t\t\t\tAllocate and deallocate an array of points as well as a\n//\t\t\t\tplace to store their coordinates, and initializes the\n//\t\t\t\tpoints to point to their respective coordinates.  It\n//\t\t\t\tallocates point storage in a contiguous block large\n//\t\t\t\tenough to store all the points.  It performs no\n//\t\t\t\tinitialization.\n//\n//\t\tannCopyPt():\n//\t\t\t\tCreates a copy of a given point, allocating space for\n//\t\t\t\tthe new point.  It returns a pointer to the newly\n//\t\t\t\tallocated copy.\n//----------------------------------------------------------------------\n\nDLL_API ANNdist annDist(\n\tint\t\t\t\tdim,\t\t// dimension of space\n\tANNpoint\t\tp,\t\t\t// points\n\tANNpoint\t\tq);\n\nDLL_API ANNpoint annAllocPt(\n\tint\t\t\t\tdim,\t\t// dimension\n\tANNcoord\t\tc = 0);\t\t// coordinate value (all equal)\n\nDLL_API ANNpointArray annAllocPts(\n\tint\t\t\t\tn,\t\t\t// number of points\n\tint\t\t\t\tdim);\t\t// dimension\n\nDLL_API void annDeallocPt(\n\tANNpoint\t\t&p);\t\t// deallocate 1 point\n\nDLL_API void annDeallocPts(\n\tANNpointArray\t&pa);\t\t// point array\n\nDLL_API ANNpoint annCopyPt(\n\tint\t\t\t\tdim,\t\t// dimension\n\tANNpoint\t\tsource);\t// point to copy\n\n//----------------------------------------------------------------------\n//Overall structure: ANN supports a number of different data structures\n//for approximate and exact nearest neighbor searching.  These are:\n//\n//\t\tANNbruteForce\tA simple brute-force search structure.\n//\t\tANNkd_tree\t\tA kd-tree tree search structure.  
\n//\t\tANNbd_tree\t\tA bd-tree tree search structure (a kd-tree with shrink\n//\t\t\t\t\t\tcapabilities).\n//\n//\t\tAt a minimum, each of these data structures supports k-nearest\n//\t\tneighbor queries.  The nearest neighbor query, annkSearch,\n//\t\treturns an integer identifier and the distance to the nearest\n//\t\tneighbor(s) and annRangeSearch returns the nearest points that\n//\t\tlie within a given query ball.\n//\n//\t\tEach structure is built by invoking the appropriate constructor\n//\t\tand passing it (at a minimum) the array of points, the total\n//\t\tnumber of points and the dimension of the space.  Each structure\n//\t\tis also assumed to support a destructor and member functions\n//\t\tthat return basic information about the point set.\n//\n//\t\tNote that the array of points is not copied by the data\n//\t\tstructure (for reasons of space efficiency), and it is assumed\n//\t\tto be constant throughout the lifetime of the search structure.\n//\n//\t\tThe search algorithm, annkSearch, is given the query point (q),\n//\t\tand the desired number of nearest neighbors to report (k), and\n//\t\tthe error bound (eps) (whose default value is 0, implying exact\n//\t\tnearest neighbors).  It returns two arrays which are assumed to\n//\t\tcontain at least k elements: one (nn_idx) contains the indices\n//\t\t(within the point array) of the nearest neighbors and the other\n//\t\t(dd) contains the squared distances to these nearest neighbors.\n//\n//\t\tThe search algorithm, annkFRSearch, is a fixed-radius kNN\n//\t\tsearch.  In addition to a query point, it is given a (squared)\n//\t\tradius bound.  (This is done for consistency, because the search\n//\t\treturns distances as squared quantities.) It does two things.\n//\t\tFirst, it computes the k nearest neighbors within the radius\n//\t\tbound, and second, it returns the total number of points lying\n//\t\twithin the radius bound. 
It is permitted to set k = 0, in which\n//\t\tcase it effectively answers a range counting query.  If the\n//\t\terror bound epsilon is positive, then the search is approximate\n//\t\tin the sense that it is free to ignore any point that lies\n//\t\toutside a ball of radius r/(1+epsilon), where r is the given\n//\t\t(unsquared) radius bound.\n//\n//\t\tThe generic object from which all the search structures are\n//\t\tderived is given below.  It is a virtual object, and is useless\n//\t\tby itself.\n//----------------------------------------------------------------------\n\nclass DLL_API ANNpointSet {\n    public:\n\tvirtual ~ANNpointSet() {}\t\t\t// virtual destructor\n\n\tvirtual void annkSearch(\t\t\t// approx k near neighbor search\n\t\tANNpoint\t\tq,\t\t\t\t// query point\n\t\tint\t\t\t\tk,\t\t\t\t// number of near neighbors to return\n\t\tANNidxArray\t\tnn_idx,\t\t\t// nearest neighbor array (modified)\n\t\tANNdistArray\tdd,\t\t\t\t// dist to near neighbors (modified)\n\t\tdouble\t\t\teps=0.0\t\t\t// error bound\n\t\t) = 0;\t\t\t\t\t\t\t// pure virtual (defined elsewhere)\n\n\tvirtual int annkFRSearch(\t\t\t// approx fixed-radius kNN search\n\t\tANNpoint\t\tq,\t\t\t\t// query point\n\t\tANNdist\t\t\tsqRad,\t\t\t// squared radius\n\t\tint\t\t\t\tk = 0,\t\t\t// number of near neighbors to return\n\t\tANNidxArray\t\tnn_idx = NULL,\t// nearest neighbor array (modified)\n\t\tANNdistArray\tdd = NULL,\t\t// dist to near neighbors (modified)\n\t\tdouble\t\t\teps=0.0\t\t\t// error bound\n\t\t) = 0;\t\t\t\t\t\t\t// pure virtual (defined elsewhere)\n\n\tvirtual  std::pair< std::vector<int>, std::vector<double> >  annkFRSearch2(\t\t\t// approx fixed-radius kNN search\n\t\tANNpoint\t\tq,\t\t\t\t// query point\n\t\tANNdist\t\t\tsqRad,\t\t\t// squared radius\n\t\tdouble\t\t\teps=0.0\t\t\t// error bound\n\t\t) = 0;\t\t\t\t\t\t\t// pure virtual (defined elsewhere)\n\tvirtual int theDim() = 0;\t\t\t// return dimension of space\n\tvirtual int nPoints() = 0;\t\t\t// return 
number of points\n\t// return pointer to points\n\tvirtual ANNpointArray thePoints() = 0;\n};\n\n//----------------------------------------------------------------------\n//\tBrute-force nearest neighbor search:\n//\t\tThe brute-force search structure is very simple but inefficient.\n//\t\tIt has been provided primarily for the sake of comparison with\n//\t\tand validation of the more complex search structures.\n//\n//\t\tQuery processing is the same as described above, but the value\n//\t\tof epsilon is ignored, since all distance calculations are\n//\t\tperformed exactly.\n//\n//\t\tWARNING: This data structure is very slow, and should not be\n//\t\tused unless the number of points is very small.\n//\n//\t\tInternal information:\n//\t\t---------------------\n//\t\tThis data structure basically consists of the array of points\n//\t\t(each a pointer to an array of coordinates).  The search is\n//\t\tperformed by a simple linear scan of all the points.\n//----------------------------------------------------------------------\n\nclass DLL_API ANNbruteForce: public ANNpointSet {\n    int\t\t\t\tdim;\t\t\t\t// dimension\n    int\t\t\t\tn_pts;\t\t\t\t// number of points\n    ANNpointArray\tpts;\t\t\t\t// point array\n    public:\n    ANNbruteForce(\t\t\t\t\t\t// constructor from point array\n\t    ANNpointArray\tpa,\t\t\t\t// point array\n\t    int\t\t\t\tn,\t\t\t\t// number of points\n\t    int\t\t\t\tdd);\t\t\t// dimension\n\n    ~ANNbruteForce();\t\t\t\t\t// destructor\n\n    void annkSearch(\t\t\t\t\t// approx k near neighbor search\n\t    ANNpoint\t\tq,\t\t\t\t// query point\n\t    int\t\t\t\tk,\t\t\t\t// number of near neighbors to return\n\t    ANNidxArray\t\tnn_idx,\t\t\t// nearest neighbor array (modified)\n\t    ANNdistArray\tdd,\t\t\t\t// dist to near neighbors (modified)\n\t    double\t\t\teps=0.0);\t\t// error bound\n\n    int annkFRSearch(\t\t\t\t\t// approx fixed-radius kNN search\n\t    ANNpoint\t\tq,\t\t\t\t// query point\n\t    
ANNdist\t\t\tsqRad,\t\t\t// squared radius\n\t    int\t\t\t\tk = 0,\t\t\t// number of near neighbors to return\n\t    ANNidxArray\t\tnn_idx = NULL,\t// nearest neighbor array (modified)\n\t    ANNdistArray\tdd = NULL,\t\t// dist to near neighbors (modified)\n\t    double\t\t\teps=0.0);\t\t// error bound\n\n    std::pair< std::vector<int>, std::vector<double> >  annkFRSearch2(\t\t\t\t\t// approx fixed-radius kNN search\n\t    ANNpoint\t\tq,\t\t\t\t// query point\n\t    ANNdist\t\t\tsqRad,\t\t\t// squared radius\n\t    double\t\t\teps=0.0);\t\t// error bound\n\n    int theDim()\t\t\t\t\t\t// return dimension of space\n    { return dim; }\n\n    int nPoints()\t\t\t\t\t\t// return number of points\n    { return n_pts; }\n\n    ANNpointArray thePoints()\t\t\t// return pointer to points\n    {  return pts;  }\n};\n\n//----------------------------------------------------------------------\n// kd- and bd-tree splitting and shrinking rules\n//\t\tkd-trees support a collection of different splitting rules.\n//\t\tIn addition to the standard kd-tree splitting rule proposed\n//\t\tby Friedman, Bentley, and Finkel, we have introduced a\n//\t\tnumber of other splitting rules, which seem to perform\n//\t\tas well or better (for the distributions we have tested).\n//\n//\t\tThe splitting methods given below allow the user to tailor\n//\t\tthe data structure to the particular data set.  They are\n//\t\tdescribed in greater detail in the kd_split.cc source\n//\t\tfile.  The method ANN_KD_SUGGEST is the method chosen (rather\n//\t\tsubjectively) by the implementors as the one giving the\n//\t\tfastest performance, and is the default splitting method.\n//\n//\t\tAs with splitting rules, there are a number of different\n//\t\tshrinking rules.  The shrinking rule ANN_BD_NONE does no\n//\t\tshrinking (and hence produces a kd-tree tree).  
The rule\n//\t\tANN_BD_SUGGEST uses the implementors' favorite rule.\n//----------------------------------------------------------------------\n\nenum ANNsplitRule {\n    ANN_KD_STD\t\t\t\t= 0,\t// the optimized kd-splitting rule\n    ANN_KD_MIDPT\t\t\t= 1,\t// midpoint split\n    ANN_KD_FAIR\t\t\t\t= 2,\t// fair split\n    ANN_KD_SL_MIDPT\t\t\t= 3,\t// sliding midpoint splitting method\n    ANN_KD_SL_FAIR\t\t\t= 4,\t// sliding fair split method\n    ANN_KD_SUGGEST\t\t\t= 5};\t// the authors' suggestion for best\nconst int ANN_N_SPLIT_RULES\t\t= 6;\t// number of split rules\n\nenum ANNshrinkRule {\n    ANN_BD_NONE\t\t\t\t= 0,\t// no shrinking at all (just kd-tree)\n    ANN_BD_SIMPLE\t\t\t= 1,\t// simple splitting\n    ANN_BD_CENTROID\t\t\t= 2,\t// centroid splitting\n    ANN_BD_SUGGEST\t\t\t= 3};\t// the authors' suggested choice\nconst int ANN_N_SHRINK_RULES\t= 4;\t// number of shrink rules\n\n//----------------------------------------------------------------------\n//\tkd-tree:\n//\t\tThe main search data structure supported by ANN is a kd-tree.\n//\t\tThe main constructor is given a set of points and a choice of\n//\t\tsplitting method to use in building the tree.\n//\n//\t\tConstruction:\n//\t\t-------------\n//\t\tThe constructor is given the point array, number of points,\n//\t\tdimension, bucket size (default = 1), and the splitting rule\n//\t\t(default = ANN_KD_SUGGEST).  The point array is not copied, and\n//\t\tis assumed to be kept constant throughout the lifetime of the\n//\t\tsearch structure.  
There is also a \"load\" constructor that\n//\t\tbuilds a tree from a file description that was created by the\n//\t\tDump operation.\n//\n//\t\tSearch:\n//\t\t-------\n//\t\tThere are two search methods:\n//\n//\t\t\tStandard search (annkSearch()):\n//\t\t\t\tSearches nodes in tree-traversal order, always visiting\n//\t\t\t\tthe closer child first.\n//\t\t\tPriority search (annkPriSearch()):\n//\t\t\t\tSearches nodes in order of increasing distance of the\n//\t\t\t\tassociated cell from the query point.  For many\n//\t\t\t\tdistributions the standard search seems to work just\n//\t\t\t\tfine, but priority search is safer for worst-case\n//\t\t\t\tperformance.\n//\n//\t\tPrinting:\n//\t\t---------\n//\t\tThere are two methods provided for printing the tree.  Print()\n//\t\tis used to produce a \"human-readable\" display of the tree, with\n//\t\tindenation, which is handy for debugging.  Dump() produces a\n//\t\tformat that is suitable reading by another program.  There is a\n//\t\t\"load\" constructor, which constructs a tree which is assumed to\n//\t\thave been saved by the Dump() procedure.\n//\n//\t\tPerformance and Structure Statistics:\n//\t\t-------------------------------------\n//\t\tThe procedure getStats() collects statistics information on the\n//\t\ttree (its size, height, etc.)  See ANNperf.h for information on\n//\t\tthe stats structure it returns.\n//\n//\t\tInternal information:\n//\t\t---------------------\n//\t\tThe data structure consists of three major chunks of storage.\n//\t\tThe first (implicit) storage are the points themselves (pts),\n//\t\twhich have been provided by the users as an argument to the\n//\t\tconstructor, or are allocated dynamically if the tree is built\n//\t\tusing the load constructor).  These should not be changed during\n//\t\tthe lifetime of the search structure.  
It is the user's\n//\t\tresponsibility to delete these after the tree is destroyed.\n//\n//\t\tThe second is the tree itself (which is dynamically allocated in\n//\t\tthe constructor) and is given as a pointer to its root node\n//\t\t(root).  These nodes are automatically deallocated when the tree\n//\t\tis deleted.  See the file src/kd_tree.h for further information\n//\t\ton the structure of the tree nodes.\n//\n//\t\tEach leaf of the tree does not contain a pointer directly to a\n//\t\tpoint, but rather contains a pointer to a \"bucket\", which is an\n//\t\tarray consisting of point indices.  The third major chunk of\n//\t\tstorage is an array (pidx), which is a large array in which all\n//\t\tthese bucket subarrays reside.  (The reason for storing them\n//\t\tseparately is that the buckets are typically small, but of varying\n//\t\tsizes.  This was done to avoid fragmentation.)  This array is\n//\t\talso deallocated when the tree is deleted.\n//\n//\t\tIn addition to this, the tree consists of a number of other\n//\t\tpieces of information which are used in searching and for\n//\t\tsubsequent tree operations.  These consist of the following:\n//\n//\t\tdim\t\t\t\t\t\tDimension of space\n//\t\tn_pts\t\t\t\t\tNumber of points currently in the tree\n//\t\tn_max\t\t\t\t\tMaximum number of points that are allowed\n//\t\t\t\t\t\t\t\tin the tree\n//\t\tbkt_size\t\t\t\tMaximum bucket size (no. 
of points per leaf)\n//\t\tbnd_box_lo\t\t\t\tBounding box low point\n//\t\tbnd_box_hi\t\t\t\tBounding box high point\n//\t\tsplitRule\t\t\t\tSplitting method used\n//\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n// Some types and objects used by kd-tree functions\n// See src/kd_tree.h and src/kd_tree.cpp for definitions\n//----------------------------------------------------------------------\nclass ANNkdStats;\t\t\t\t// stats on kd-tree\nclass ANNkd_node;\t\t\t\t// generic node in a kd-tree\ntypedef ANNkd_node*\tANNkd_ptr;\t// pointer to a kd-tree node\n\nclass DLL_API ANNkd_tree: public ANNpointSet {\n    protected:\n\tint\t\t\t\tdim;\t\t\t\t// dimension of space\n\tint\t\t\t\tn_pts;\t\t\t\t// number of points in tree\n\tint\t\t\t\tbkt_size;\t\t\t// bucket size\n\tANNpointArray\tpts;\t\t\t\t// the points\n\tANNidxArray\t\tpidx;\t\t\t\t// point indices (to pts array)\n\tANNkd_ptr\t\troot;\t\t\t\t// root of kd-tree\n\tANNpoint\t\tbnd_box_lo;\t\t\t// bounding box low point\n\tANNpoint\t\tbnd_box_hi;\t\t\t// bounding box high point\n\n\tvoid SkeletonTree(\t\t\t\t\t// construct skeleton tree\n\t\tint\t\t\t\tn,\t\t\t\t// number of points\n\t\tint\t\t\t\tdd,\t\t\t\t// dimension\n\t\tint\t\t\t\tbs,\t\t\t\t// bucket size\n\t\tANNpointArray pa = NULL,\t\t// point array (optional)\n\t\tANNidxArray pi = NULL);\t\t\t// point indices (optional)\n\n    public:\n\tANNkd_tree(\t\t\t\t\t\t\t// build skeleton tree\n\t\tint\t\t\t\tn = 0,\t\t\t// number of points\n\t\tint\t\t\t\tdd = 0,\t\t\t// dimension\n\t\tint\t\t\t\tbs = 1);\t\t// bucket size\n\n\tANNkd_tree(\t\t\t\t\t\t\t// build from point array\n\t\tANNpointArray\tpa,\t\t\t\t// point array\n\t\tint\t\t\t\tn,\t\t\t\t// number of points\n\t\tint\t\t\t\tdd,\t\t\t\t// dimension\n\t\tint\t\t\t\tbs = 1,\t\t\t// bucket size\n\t\tANNsplitRule\tsplit = ANN_KD_SUGGEST);\t// splitting method\n\n\tANNkd_tree(\t\t\t\t\t\t\t// build 
from dump file\n\t\tstd::istream&\tin);\t\t\t// input stream for dump file\n\n\t~ANNkd_tree();\t\t\t\t\t\t// tree destructor\n\n\tvoid annkSearch(\t\t\t\t\t// approx k near neighbor search\n\t\tANNpoint\t\tq,\t\t\t\t// query point\n\t\tint\t\t\t\tk,\t\t\t\t// number of near neighbors to return\n\t\tANNidxArray\t\tnn_idx,\t\t\t// nearest neighbor array (modified)\n\t\tANNdistArray\tdd,\t\t\t\t// dist to near neighbors (modified)\n\t\tdouble\t\t\teps=0.0);\t\t// error bound\n\n\tvoid annkPriSearch( \t\t\t\t// priority k near neighbor search\n\t\tANNpoint\t\tq,\t\t\t\t// query point\n\t\tint\t\t\t\tk,\t\t\t\t// number of near neighbors to return\n\t\tANNidxArray\t\tnn_idx,\t\t\t// nearest neighbor array (modified)\n\t\tANNdistArray\tdd,\t\t\t\t// dist to near neighbors (modified)\n\t\tdouble\t\t\teps=0.0);\t\t// error bound\n\n\tint annkFRSearch(\t\t\t\t\t// approx fixed-radius kNN search\n\t\tANNpoint\t\tq,\t\t\t\t// the query point\n\t\tANNdist\t\t\tsqRad,\t\t\t// squared radius of query ball\n\t\tint\t\t\t\tk,\t\t\t\t// number of neighbors to return\n\t\tANNidxArray\t\tnn_idx = NULL,\t// nearest neighbor array (modified)\n\t\tANNdistArray\tdd = NULL,\t\t// dist to near neighbors (modified)\n\t\tdouble\t\t\teps=0.0);\t\t// error bound\n\n\t//MFH 7/15/2015\n\tstd::pair< std::vector<int>, std::vector<double> > annkFRSearch2(\t\t\t\t\t// approx fixed-radius kNN search\n\t\tANNpoint\t\tq,\t\t\t\t// the query point\n\t\tANNdist\t\t\tsqRad,\t\t\t// squared radius of query ball\n\t\tdouble\t\t\teps=0.0);\t\t// error bound\n\n\n\tint theDim()\t\t\t\t\t\t// return dimension of space\n\t{ return dim; }\n\n\tint nPoints()\t\t\t\t\t\t// return number of points\n\t{ return n_pts; }\n\n\tANNpointArray thePoints()\t\t\t// return pointer to points\n\t{  return pts;  }\n\n\tvirtual void Print(\t\t\t\t\t// print the tree (for debugging)\n\t\tANNbool\t\t\twith_pts,\t\t// print points as well?\n\t\tstd::ostream&\tout);\t\t\t// output stream\n\n\tvirtual void Dump(\t\t\t\t\t// dump 
entire tree\n\t\tANNbool\t\t\twith_pts,\t\t// print points as well?\n\t\tstd::ostream&\tout);\t\t\t// output stream\n\n\tvirtual void getStats(\t\t\t\t// compute tree statistics\n\t\tANNkdStats&\t\tst);\t\t\t// the statistics (modified)\n};\n\n//----------------------------------------------------------------------\n//\tBox decomposition tree (bd-tree)\n//\t\tThe bd-tree is inherited from a kd-tree.  The main difference\n//\t\tbetween the bd-tree and the kd-tree is a new type of internal node\n//\t\tcalled a shrinking node (in the kd-tree there is only one type\n//\t\tof internal node, a splitting node).  The shrinking node\n//\t\tmakes it possible to generate balanced trees in which the\n//\t\tcells have bounded aspect ratio, by allowing the decomposition\n//\t\tto zoom in on regions of dense point concentration.  Although\n//\t\tthis is a nice idea in theory, few point distributions are so\n//\t\tdensely clustered that this is really needed.\n//----------------------------------------------------------------------\n\nclass DLL_API ANNbd_tree: public ANNkd_tree {\n    public:\n\tANNbd_tree(\t\t\t\t\t\t\t// build skeleton tree\n\t\tint\t\t\t\tn,\t\t\t\t// number of points\n\t\tint\t\t\t\tdd,\t\t\t\t// dimension\n\t\tint\t\t\t\tbs = 1)\t\t\t// bucket size\n\t    : ANNkd_tree(n, dd, bs) {}\t\t// build base kd-tree\n\n\tANNbd_tree(\t\t\t\t\t\t\t// build from point array\n\t\tANNpointArray\tpa,\t\t\t\t// point array\n\t\tint\t\t\t\tn,\t\t\t\t// number of points\n\t\tint\t\t\t\tdd,\t\t\t\t// dimension\n\t\tint\t\t\t\tbs = 1,\t\t\t// bucket size\n\t\tANNsplitRule\tsplit  = ANN_KD_SUGGEST,\t// splitting rule\n\t\tANNshrinkRule\tshrink = ANN_BD_SUGGEST);\t// shrinking rule\n\n\tANNbd_tree(\t\t\t\t\t\t\t// build from dump file\n\t\tstd::istream&\tin);\t\t\t// input stream for dump file\n};\n\n//----------------------------------------------------------------------\n//\tOther functions\n//\tannMaxPtsVisit\t\tSets a limit on the maximum number of points\n//\t\t\t\t\t\tto visit in 
the search.\n//  annClose\t\t\tCan be called when all use of ANN is finished.\n//\t\t\t\t\t\tIt clears up a minor memory leak.\n//----------------------------------------------------------------------\n\nDLL_API void annMaxPtsVisit(\t// max. pts to visit in search\n\tint\t\t\t\tmaxPts);\t// the limit\n\nDLL_API void annClose();\t\t// called to end use of ANN\n\n#endif\n"
  },
  {
    "path": "src/ANN/ANNperf.h",
    "content": "//----------------------------------------------------------------------\n//\tFile:\t\t\tANNperf.h\n//\tProgrammer:\t\tSunil Arya and David Mount\n//\tLast modified:\t03/04/98 (Release 0.1)\n//\tDescription:\tInclude file for ANN performance stats\n//\n//\tSome of the code for statistics gathering has been adapted\n//\tfrom the SmplStat.h package in the g++ library.\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n//      History:\n//      Revision 0.1  03/04/98\n//          Initial release\n//      Revision 1.0  04/01/05\n//          Added ANN_ prefix to avoid name conflicts.\n//----------------------------------------------------------------------\n\n#ifndef ANNperf_H\n#define ANNperf_H\n\n//----------------------------------------------------------------------\n//\tbasic includes\n//----------------------------------------------------------------------\n\n#include \"ANN.h\"\t\t\t\t\t// basic ANN includes\n\n//----------------------------------------------------------------------\n// kd-tree stats object\n//\tThis object is used for collecting information about a kd-tree\n//\tor bd-tree.\n//----------------------------------------------------------------------\n\nclass ANNkdStats {\t\t\t// stats on kd-tree\npublic:\n\tint\t\tdim;\t\t\t// dimension of 
space\n\tint\t\tn_pts;\t\t\t// no. of points\n\tint\t\tbkt_size;\t\t// bucket size\n\tint\t\tn_lf;\t\t\t// no. of leaves (including trivial)\n\tint\t\tn_tl;\t\t\t// no. of trivial leaves (no points)\n\tint\t\tn_spl;\t\t\t// no. of splitting nodes\n\tint\t\tn_shr;\t\t\t// no. of shrinking nodes (for bd-trees)\n\tint\t\tdepth;\t\t\t// depth of tree\n\tfloat\tsum_ar;\t\t\t// sum of leaf aspect ratios\n\tfloat\tavg_ar;\t\t\t// average leaf aspect ratio\n //\n\t\t\t\t\t\t\t// reset stats\n\tvoid reset(int d=0, int n=0, int bs=0)\n\t{\n\t\tdim = d; n_pts = n; bkt_size = bs;\n\t\tn_lf = n_tl = n_spl = n_shr = depth = 0;\n\t\tsum_ar = avg_ar = 0.0;\n\t}\n\n\tANNkdStats()\t\t\t// basic constructor\n\t{ reset(); }\n\n\tvoid merge(const ANNkdStats &st);\t// merge stats from child \n};\n\n//----------------------------------------------------------------------\n//  ANNsampStat\n//\tA sample stat collects numeric (double) samples and returns some\n//\tsimple statistics.  Its main functions are:\n//\n//\t\treset()\t\tReset to no samples.\n//\t\t+= x\t\tInclude sample x.\n//\t\tsamples()\tReturn number of samples.\n//\t\tmean()\t\tReturn mean of samples.\n//\t\tstdDev()\tReturn standard deviation\n//\t\tmin()\t\tReturn minimum of samples.\n//\t\tmax()\t\tReturn maximum of samples.\n//----------------------------------------------------------------------\nclass DLL_API ANNsampStat {\n\tint\t\t\t\tn;\t\t\t\t// number of samples\n\tdouble\t\t\tsum;\t\t\t// sum\n\tdouble\t\t\tsum2;\t\t\t// sum of squares\n\tdouble\t\t\tminVal, maxVal;\t// min and max\npublic :\n\tvoid reset()\t\t\t\t// reset everything\n\t{  \n\t\tn = 0;\n\t\tsum = sum2 = 0;\n\t\tminVal = ANN_DBL_MAX;\n\t\tmaxVal = -ANN_DBL_MAX; \n\t}\n\n\tANNsampStat() { reset(); }\t\t// constructor\n\n\tvoid operator+=(double x)\t\t// add sample\n\t{\n\t\tn++;  sum += x;  sum2 += x*x;\n\t\tif (x < minVal) minVal = x;\n\t\tif (x > maxVal) maxVal = x;\n\t}\n\n\tint samples() { return n; }\t\t// number of samples\n\n\tdouble mean() { 
return sum/n; } // mean\n\n\t\t\t\t\t\t\t\t\t// standard deviation\n\tdouble stdDev() { return std::sqrt((sum2 - (sum*sum)/n)/(n-1));}\n\n\tdouble min() { return minVal; } // minimum\n\tdouble max() { return maxVal; } // maximum\n};\n\n//----------------------------------------------------------------------\n//\t\tOperation count updates\n//----------------------------------------------------------------------\n\n#ifdef ANN_PERF\n  #define ANN_FLOP(n)\t{ann_Nfloat_ops += (n);}\n  #define ANN_LEAF(n)\t{ann_Nvisit_lfs += (n);}\n  #define ANN_SPL(n)\t{ann_Nvisit_spl += (n);}\n  #define ANN_SHR(n)\t{ann_Nvisit_shr += (n);}\n  #define ANN_PTS(n)\t{ann_Nvisit_pts += (n);}\n  #define ANN_COORD(n)\t{ann_Ncoord_hts += (n);}\n#else\n  #define ANN_FLOP(n)\n  #define ANN_LEAF(n)\n  #define ANN_SPL(n)\n  #define ANN_SHR(n)\n  #define ANN_PTS(n)\n  #define ANN_COORD(n)\n#endif\n\n//----------------------------------------------------------------------\n//\tPerformance statistics\n//\tThe following data and routines are used for computing performance\n//\tstatistics for nearest neighbor searching.  Because these routines\n//\tcan slow the code down, they can be activated and deactivated by\n//\tdefining the ANN_PERF variable, by compiling with the option:\n//\t-DANN_PERF\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\tGlobal counters for performance measurement\n//\n//\tvisit_lfs\tThe number of leaf nodes visited in the\n//\t\t\t\ttree.\n//\n//\tvisit_spl\tThe number of splitting nodes visited in the\n//\t\t\t\ttree.\n//\n//\tvisit_shr\tThe number of shrinking nodes visited in the\n//\t\t\t\ttree.\n//\n//\tvisit_pts\tThe number of points visited in all the\n//\t\t\t\tleaf nodes visited. 
Equivalently, this\n//\t\t\t\tis the number of points for which distance\n//\t\t\t\tcalculations are performed.\n//\n//\tcoord_hts\tThe number of times a coordinate of a \n//\t\t\t\tdata point is accessed. This is generally\n//\t\t\t\tless than visit_pts*d if partial distance\n//\t\t\t\tcalculation is used.  This count is low\n//\t\t\t\tin the sense that if a coordinate is hit\n//\t\t\t\tmany times in the same routine we may\n//\t\t\t\tcount it only once.\n//\n//\tfloat_ops\tThe number of floating point operations.\n//\t\t\t\tThis includes all operations in the heap\n//\t\t\t\tas well as distance calculations to boxes.\n//\n//\taverage_err\tThe average error of each query (the\n//\t\t\t\terror of the reported point to the true\n//\t\t\t\tnearest neighbor).  For k nearest neighbors\n//\t\t\t\tthe error is computed k times.\n//\n//\trank_err\tThe rank error of each query (the difference\n//\t\t\t\tin the rank of the reported point and its\n//\t\t\t\ttrue rank).\n//\n//\tdata_pts\tThe number of data points.  
This is not\n//\t\t\t\ta counter, but used in stats computation.\n//----------------------------------------------------------------------\n\nextern int\t\t\tann_Ndata_pts;\t// number of data points\nextern int\t\t\tann_Nvisit_lfs;\t// number of leaf nodes visited\nextern int\t\t\tann_Nvisit_spl;\t// number of splitting nodes visited\nextern int\t\t\tann_Nvisit_shr;\t// number of shrinking nodes visited\nextern int\t\t\tann_Nvisit_pts;\t// visited points for one query\nextern int\t\t\tann_Ncoord_hts;\t// coordinate hits for one query\nextern int\t\t\tann_Nfloat_ops;\t// floating ops for one query\nextern ANNsampStat\tann_visit_lfs;\t// stats on leaf nodes visits\nextern ANNsampStat\tann_visit_spl;\t// stats on splitting nodes visits\nextern ANNsampStat\tann_visit_shr;\t// stats on shrinking nodes visits\nextern ANNsampStat\tann_visit_nds;\t// stats on total nodes visits\nextern ANNsampStat\tann_visit_pts;\t// stats on points visited\nextern ANNsampStat\tann_coord_hts;\t// stats on coordinate hits\nextern ANNsampStat\tann_float_ops;\t// stats on floating ops\n//----------------------------------------------------------------------\n//  The following need to be part of the public interface, because\n//  they are accessed outside the DLL in ann_test.cpp.\n//----------------------------------------------------------------------\nDLL_API extern ANNsampStat ann_average_err;\t// average error\nDLL_API extern ANNsampStat ann_rank_err;\t// rank error\n\n//----------------------------------------------------------------------\n//\tDeclaration of externally accessible routines for statistics\n//----------------------------------------------------------------------\n\nDLL_API void annResetStats(int data_size);\t// reset stats for a set of queries\n\nDLL_API void annResetCounts();\t\t\t\t// reset counts for one query\n\nDLL_API void annUpdateStats();\t\t\t\t// update stats with current counts\n\nDLL_API void annPrintStats(ANNbool validate); // print statistics for a 
run\n\n#endif\n"
  },
  {
    "path": "src/ANN/ANNx.h",
    "content": "//----------------------------------------------------------------------\n//\tFile:\t\t\tANNx.h\n//\tProgrammer: \tSunil Arya and David Mount\n//\tLast modified:\t03/04/98 (Release 0.1)\n//\tDescription:\tInternal include file for ANN\n//\n//\tThese declarations are of use in manipulating some of\n//\tthe internal data objects appearing in ANN, but are not\n//\tneeded for applications just using the nearest neighbor\n//\tsearch.\n//\n//\tTypical users of ANN should not need to access this file.\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  
It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n//\tHistory:\n//\tRevision 0.1  03/04/98\n//\t    Initial release\n//\tRevision 1.0  04/01/05\n//\t    Changed LO, HI, IN, OUT to ANN_LO, ANN_HI, etc.\n//----------------------------------------------------------------------\n\n#ifndef ANNx_H\n#define ANNx_H\n\n#include <iomanip>\t\t\t\t// I/O manipulators\n#include \"ANN.h\"\t\t\t// ANN includes\n\n//----------------------------------------------------------------------\n//\tGlobal constants and types\n//----------------------------------------------------------------------\nenum\t{ANN_LO=0, ANN_HI=1};\t// splitting indices\nenum\t{ANN_IN=0, ANN_OUT=1};\t// shrinking indices\n\t\t\t\t\t\t\t\t// what to do in case of error\nenum ANNerr {ANNwarn = 0, ANNabort = 1};\n\n//----------------------------------------------------------------------\n//\tMaximum number of points to visit\n//\tWe have an option for terminating the search early if the\n//\tnumber of points visited exceeds some threshold.  
If the\n//\tthreshold is 0 (its default)  this means there is no limit\n//\tand the algorithm applies its normal termination condition.\n//----------------------------------------------------------------------\n\nextern int\t\tANNmaxPtsVisited;\t// maximum number of pts visited\nextern int\t\tANNptsVisited;\t\t// number of pts visited in search\n\n//----------------------------------------------------------------------\n//\tGlobal function declarations\n//----------------------------------------------------------------------\n\nvoid annError(\t\t\t\t\t// ANN error routine\n\tconst char\t\t\t*msg,\t\t// error message\n\tANNerr\t\t\tlevel);\t\t// level of error\n\nvoid annPrintPt(\t\t\t\t// print a point\n\tANNpoint\t\tpt,\t\t\t// the point\n\tint\t\t\t\tdim,\t\t// the dimension\n\tstd::ostream\t&out);\t\t// output stream\n\n//----------------------------------------------------------------------\n//\tOrthogonal (axis aligned) rectangle\n//\tOrthogonal rectangles are represented by two points, one\n//\tfor the lower left corner (min coordinates) and the other\n//\tfor the upper right corner (max coordinates).\n//\n//\tThe constructor initializes from either a pair of coordinates,\n//\tpair of points, or another rectangle.  Note that all constructors\n//\tallocate new point storage. 
The destructor deallocates this\n//\tstorage.\n//\n//\tBEWARE: Orthogonal rectangles should be passed ONLY BY REFERENCE.\n//\t(C++'s default copy constructor will not allocate new point\n//\tstorage, then on return the destructor free's storage, and then\n//\tyou get into big trouble in the calling procedure.)\n//----------------------------------------------------------------------\n\nclass ANNorthRect {\npublic:\n\tANNpoint\t\tlo;\t\t\t// rectangle lower bounds\n\tANNpoint\t\thi;\t\t\t// rectangle upper bounds\n//\n\tANNorthRect(\t\t\t\t// basic constructor\n\tint\t\t\t\tdd,\t\t\t// dimension of space\n\tANNcoord\t\tl=0,\t\t// default is empty\n\tANNcoord\t\th=0)\n\t{  lo = annAllocPt(dd, l);  hi = annAllocPt(dd, h); }\n\n\tANNorthRect(\t\t\t\t// (almost a) copy constructor\n\tint\t\t\t\tdd,\t\t\t// dimension\n\tconst\t\t\tANNorthRect &r) // rectangle to copy\n\t{  lo = annCopyPt(dd, r.lo);  hi = annCopyPt(dd, r.hi);  }\n\n\tANNorthRect(\t\t\t\t// construct from points\n\tint\t\t\t\tdd,\t\t\t// dimension\n\tANNpoint\t\tl,\t\t\t// low point\n\tANNpoint\t\th)\t\t\t// hight point\n\t{  lo = annCopyPt(dd, l);  hi = annCopyPt(dd, h);  }\n\n\t~ANNorthRect()\t\t\t\t// destructor\n    {  annDeallocPt(lo);  annDeallocPt(hi);  }\n\n\tANNbool inside(int dim, ANNpoint p);// is point p inside rectangle?\n};\n\nvoid annAssignRect(\t\t\t\t// assign one rect to another\n\tint\t\t\t\tdim,\t\t// dimension (both must be same)\n\tANNorthRect\t\t&dest,\t\t// destination (modified)\n\tconst ANNorthRect &source);\t// source\n\n//----------------------------------------------------------------------\n//\tOrthogonal (axis aligned) halfspace\n//\tAn orthogonal halfspace is represented by an integer cutting\n//\tdimension cd, coordinate cutting value, cv, and side, sd, which is\n//\teither +1 or -1. 
Our convention is that point q lies in the (closed)\n//\thalfspace if (q[cd] - cv)*sd >= 0.\n//----------------------------------------------------------------------\n\nclass ANNorthHalfSpace {\npublic:\n\tint\t\t\t\tcd;\t\t\t// cutting dimension\n\tANNcoord\t\tcv;\t\t\t// cutting value\n\tint\t\t\t\tsd;\t\t\t// which side\n//\n\tANNorthHalfSpace()\t\t\t// default constructor\n\t{  cd = 0; cv = 0;  sd = 0;  }\n\n\tANNorthHalfSpace(\t\t\t// basic constructor\n\tint\t\t\t\tcdd,\t\t// dimension of space\n\tANNcoord\t\tcvv,\t\t// cutting value\n\tint\t\t\t\tsdd)\t\t// side\n\t{  cd = cdd;  cv = cvv;  sd = sdd;  }\n\n\tANNbool in(ANNpoint q) const\t// is q inside halfspace?\n\t{  return  (ANNbool) ((q[cd] - cv)*sd >= 0);  }\n\n\tANNbool out(ANNpoint q) const\t// is q outside halfspace?\n\t{  return  (ANNbool) ((q[cd] - cv)*sd < 0);  }\n\n\tANNdist dist(ANNpoint q) const\t// (squared) distance from q\n\t{  return  (ANNdist) ANN_POW(q[cd] - cv);  }\n\n\tvoid setLowerBound(int d, ANNpoint p)// set to lower bound at p[i]\n\t{  cd = d;  cv = p[d];  sd = +1;  }\n\n\tvoid setUpperBound(int d, ANNpoint p)// set to upper bound at p[i]\n\t{  cd = d;  cv = p[d];  sd = -1;  }\n\n\tvoid project(ANNpoint &q)\t\t// project q (modified) onto halfspace\n\t{  if (out(q)) q[cd] = cv;  }\n};\n\n\t\t\t\t\t\t\t\t// array of halfspaces\ntypedef ANNorthHalfSpace *ANNorthHSArray;\n\n#endif\n"
  },
  {
    "path": "src/ANN/Copyright.txt",
    "content": "ANN: Approximate Nearest Neighbors\nVersion: 1.1\nRelease Date: May 3, 2005\n----------------------------------------------------------------------------\nCopyright (c) 1997-2005 University of Maryland and Sunil Arya and David\nMount All Rights Reserved.\n\nThis program is free software; you can redistribute it and/or modify it\nunder the terms of the GNU Lesser Public License as published by the\nFree Software Foundation; either version 2.1 of the License, or (at your\noption) any later version.\n\nThis program is distributed in the hope that it will be useful, but\nWITHOUT ANY WARRANTY; without even the implied warranty of\nMERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\nLesser Public License for more details.\n\nA copy of the terms and conditions of the license can be found in\nLicense.txt or online at\n\n    http://www.gnu.org/copyleft/lesser.html\n\nTo obtain a copy, write to the Free Software Foundation, Inc.,\n59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.\n\nDisclaimer\n----------\nThe University of Maryland and the authors make no representations about\nthe suitability or fitness of this software for any purpose.  It is\nprovided \"as is\" without express or implied warranty.\n---------------------------------------------------------------------\n\nAuthors\n-------\nDavid Mount\nDept of Computer Science\nUniversity of Maryland,\nCollege Park, MD 20742 USA\nmount@cs.umd.edu\nhttp://www.cs.umd.edu/~mount/\n\nSunil Arya\nDept of Computer Science\nHong University of Science and Technology\nClearwater Bay, HONG KONG\narya@cs.ust.hk\nhttp://www.cs.ust.hk/faculty/arya/\n"
  },
  {
    "path": "src/ANN/License.txt",
    "content": "----------------------------------------------------------------------\nThe ANN Library (all versions) is provided under the terms and\nconditions of the GNU Lesser General Public Library, which is stated\nbelow.  It can also be found at:\n\n   http://www.gnu.org/copyleft/lesser.html\n\n----------------------------------------------------------------------\n\nGNU LESSER GENERAL PUBLIC LICENSE\n\nVersion 2.1, February 1999\n\nCopyright (C) 1991, 1999 Free Software Foundation, Inc.\n59 Temple Place, Suite 330, Boston, MA  02111-1307  USA\nEveryone is permitted to copy and distribute verbatim copies\nof this license document, but changing it is not allowed.\n\n[This is the first released version of the Lesser GPL.  It also counts\nas the successor of the GNU Library Public License, version 2, hence the\nversion number 2.1.]\n\nPreamble\n\nThe licenses for most software are designed to take away your freedom to\nshare and change it. By contrast, the GNU General Public Licenses are\nintended to guarantee your freedom to share and change free software--to\nmake sure the software is free for all its users.\n\nThis license, the Lesser General Public License, applies to some\nspecially designated software packages--typically libraries--of the Free\nSoftware Foundation and other authors who decide to use it. You can use\nit too, but we suggest you first think carefully about whether this\nlicense or the ordinary General Public License is the better strategy to\nuse in any particular case, based on the explanations below.\n\nWhen we speak of free software, we are referring to freedom of use, not\nprice. 
Our General Public Licenses are designed to make sure that you\nhave the freedom to distribute copies of free software (and charge for\nthis service if you wish); that you receive source code or can get it if\nyou want it; that you can change the software and use pieces of it in\nnew free programs; and that you are informed that you can do these\nthings.\n\nTo protect your rights, we need to make restrictions that forbid\ndistributors to deny you these rights or to ask you to surrender these\nrights. These restrictions translate to certain responsibilities for you\nif you distribute copies of the library or if you modify it.\n\nFor example, if you distribute copies of the library, whether gratis or\nfor a fee, you must give the recipients all the rights that we gave you.\nYou must make sure that they, too, receive or can get the source code.\nIf you link other code with the library, you must provide complete\nobject files to the recipients, so that they can relink them with the\nlibrary after making changes to the library and recompiling it. And you\nmust show them these terms so they know their rights.\n\nWe protect your rights with a two-step method: (1) we copyright the\nlibrary, and (2) we offer you this license, which gives you legal\npermission to copy, distribute and/or modify the library.\n\nTo protect each distributor, we want to make it very clear that there is\nno warranty for the free library. Also, if the library is modified by\nsomeone else and passed on, the recipients should know that what they\nhave is not the original version, so that the original author's\nreputation will not be affected by problems that might be introduced by\nothers.\n\nFinally, software patents pose a constant threat to the existence of any\nfree program. We wish to make sure that a company cannot effectively\nrestrict the users of a free program by obtaining a restrictive license\nfrom a patent holder. 
Therefore, we insist that any patent license\nobtained for a version of the library must be consistent with the full\nfreedom of use specified in this license.\n\nMost GNU software, including some libraries, is covered by the ordinary\nGNU General Public License. This license, the GNU Lesser General Public\nLicense, applies to certain designated libraries, and is quite different\nfrom the ordinary General Public License. We use this license for\ncertain libraries in order to permit linking those libraries into\nnon-free programs.\n\nWhen a program is linked with a library, whether statically or using a\nshared library, the combination of the two is legally speaking a\ncombined work, a derivative of the original library. The ordinary\nGeneral Public License therefore permits such linking only if the entire\ncombination fits its criteria of freedom. The Lesser General Public\nLicense permits more lax criteria for linking other code with the\nlibrary.\n\nWe call this license the \"Lesser\" General Public License because it does\nLess to protect the user's freedom than the ordinary General Public\nLicense. It also provides other free software developers Less of an\nadvantage over competing non-free programs. These disadvantages are the\nreason we use the ordinary General Public License for many libraries.\nHowever, the Lesser license provides advantages in certain special\ncircumstances.\n\nFor example, on rare occasions, there may be a special need to encourage\nthe widest possible use of a certain library, so that it becomes a\nde-facto standard. To achieve this, non-free programs must be allowed to\nuse the library. A more frequent case is that a free library does the\nsame job as widely used non-free libraries. 
In this case, there is\nlittle to gain by limiting the free library to free software only, so we\nuse the Lesser General Public License.\n\nIn other cases, permission to use a particular library in non-free\nprograms enables a greater number of people to use a large body of free\nsoftware. For example, permission to use the GNU C Library in non-free\nprograms enables many more people to use the whole GNU operating system,\nas well as its variant, the GNU/Linux operating system.\n\nAlthough the Lesser General Public License is Less protective of the\nusers' freedom, it does ensure that the user of a program that is linked\nwith the Library has the freedom and the wherewithal to run that program\nusing a modified version of the Library.\n\nThe precise terms and conditions for copying, distribution and\nmodification follow. Pay close attention to the difference between a\n\"work based on the library\" and a \"work that uses the library\". The\nformer contains code derived from the library, whereas the latter must\nbe combined with the library in order to run.\n\nTERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION\n\n0. This License Agreement applies to any software library or other\nprogram which contains a notice placed by the copyright holder or other\nauthorized party saying it may be distributed under the terms of this\nLesser General Public License (also called \"this License\"). Each\nlicensee is addressed as \"you\".\n\nA \"library\" means a collection of software functions and/or data\nprepared so as to be conveniently linked with application programs\n(which use some of those functions and data) to form executables.\n\nThe \"Library\", below, refers to any such software library or work which\nhas been distributed under these terms. 
A \"work based on the Library\"\nmeans either the Library or any derivative work under copyright law:\nthat is to say, a work containing the Library or a portion of it, either\nverbatim or with modifications and/or translated straightforwardly into\nanother language. (Hereinafter, translation is included without\nlimitation in the term \"modification\".)\n\n\"Source code\" for a work means the preferred form of the work for making\nmodifications to it. For a library, complete source code means all the\nsource code for all modules it contains, plus any associated interface\ndefinition files, plus the scripts used to control compilation and\ninstallation of the library.\n\nActivities other than copying, distribution and modification are not\ncovered by this License; they are outside its scope. The act of running\na program using the Library is not restricted, and output from such a\nprogram is covered only if its contents constitute a work based on the\nLibrary (independent of the use of the Library in a tool for writing\nit). Whether that is true depends on what the Library does and what the\nprogram that uses the Library does.\n\n1. You may copy and distribute verbatim copies of the Library's complete\nsource code as you receive it, in any medium, provided that you\nconspicuously and appropriately publish on each copy an appropriate\ncopyright notice and disclaimer of warranty; keep intact all the notices\nthat refer to this License and to the absence of any warranty; and\ndistribute a copy of this License along with the Library.\n\nYou may charge a fee for the physical act of transferring a copy, and\nyou may at your option offer warranty protection in exchange for a fee.\n\n2. 
You may modify your copy or copies of the Library or any portion of\nit, thus forming a work based on the Library, and copy and distribute\nsuch modifications or work under the terms of Section 1 above, provided\nthat you also meet all of these conditions:\n\n    a) The modified work must itself be a software library.\n    b) You must cause the files modified to carry prominent notices\n       stating that you changed the files and the date of any change.\n    c) You must cause the whole of the work to be licensed at no\n       charge to all third parties under the terms of this License.\n    d) If a facility in the modified Library refers to a function or a\n       table of data to be supplied by an application program that uses\n       the facility, other than as an argument passed when the facility\n       is invoked, then you must make a good faith effort to ensure that,\n       in the event an application does not supply such function or\n       table, the facility still operates, and performs whatever part of\n       its purpose remains meaningful.\n\n      (For example, a function in a library to compute square roots has\na purpose that is entirely well-defined independent of the application.\nTherefore, Subsection 2d requires that any application-supplied function\nor table used by this function must be optional: if the application does\nnot supply it, the square root function must still compute square\nroots.)\n\n      These requirements apply to the modified work as a whole. If\nidentifiable sections of that work are not derived from the Library, and\ncan be reasonably considered independent and separate works in\nthemselves, then this License, and its terms, do not apply to those\nsections when you distribute them as separate works. 
But when you\ndistribute the same sections as part of a whole which is a work based on\nthe Library, the distribution of the whole must be on the terms of this\nLicense, whose permissions for other licensees extend to the entire\nwhole, and thus to each and every part regardless of who wrote it.\n\n      Thus, it is not the intent of this section to claim rights or\ncontest your rights to work written entirely by you; rather, the intent\nis to exercise the right to control the distribution of derivative or\ncollective works based on the Library.\n\n      In addition, mere aggregation of another work not based on the\nLibrary with the Library (or with a work based on the Library) on a\nvolume of a storage or distribution medium does not bring the other work\nunder the scope of this License. \n\n3. You may opt to apply the terms of the ordinary GNU General Public\nLicense instead of this License to a given copy of the Library. To do\nthis, you must alter all the notices that refer to this License, so that\nthey refer to the ordinary GNU General Public License, version 2,\ninstead of to this License. (If a newer version than version 2 of the\nordinary GNU General Public License has appeared, then you can specify\nthat version instead if you wish.) Do not make any other change in these\nnotices.\n\nOnce this change is made in a given copy, it is irreversible for that\ncopy, so the ordinary GNU General Public License applies to all\nsubsequent copies and derivative works made from that copy.\n\nThis option is useful when you wish to copy part of the code of the\nLibrary into a program that is not a library.\n\n4. 
You may copy and distribute the Library (or a portion or derivative\nof it, under Section 2) in object code or executable form under the\nterms of Sections 1 and 2 above provided that you accompany it with the\ncomplete corresponding machine-readable source code, which must be\ndistributed under the terms of Sections 1 and 2 above on a medium\ncustomarily used for software interchange.\n\nIf distribution of object code is made by offering access to copy from a\ndesignated place, then offering equivalent access to copy the source\ncode from the same place satisfies the requirement to distribute the\nsource code, even though third parties are not compelled to copy the\nsource along with the object code.\n\n5. A program that contains no derivative of any portion of the Library,\nbut is designed to work with the Library by being compiled or linked\nwith it, is called a \"work that uses the Library\". Such a work, in\nisolation, is not a derivative work of the Library, and therefore falls\noutside the scope of this License.\n\nHowever, linking a \"work that uses the Library\" with the Library creates\nan executable that is a derivative of the Library (because it contains\nportions of the Library), rather than a \"work that uses the library\".\nThe executable is therefore covered by this License. Section 6 states\nterms for distribution of such executables.\n\nWhen a \"work that uses the Library\" uses material from a header file\nthat is part of the Library, the object code for the work may be a\nderivative work of the Library even though the source code is not.\nWhether this is true is especially significant if the work can be linked\nwithout the Library, or if the work is itself a library. 
The threshold\nfor this to be true is not precisely defined by law.\n\nIf such an object file uses only numerical parameters, data structure\nlayouts and accessors, and small macros and small inline functions (ten\nlines or less in length), then the use of the object file is\nunrestricted, regardless of whether it is legally a derivative work.\n(Executables containing this object code plus portions of the Library\nwill still fall under Section 6.)\n\nOtherwise, if the work is a derivative of the Library, you may\ndistribute the object code for the work under the terms of Section 6.\nAny executables containing that work also fall under Section 6, whether\nor not they are linked directly with the Library itself.\n\n6. As an exception to the Sections above, you may also combine or link a\n\"work that uses the Library\" with the Library to produce a work\ncontaining portions of the Library, and distribute that work under terms\nof your choice, provided that the terms permit modification of the work\nfor the customer's own use and reverse engineering for debugging such\nmodifications.\n\nYou must give prominent notice with each copy of the work that the\nLibrary is used in it and that the Library and its use are covered by\nthis License. You must supply a copy of this License. If the work during\nexecution displays copyright notices, you must include the copyright\nnotice for the Library among them, as well as a reference directing the\nuser to the copy of this License. 
Also, you must do one of these things:\n\n    a) Accompany the work with the complete corresponding\n       machine-readable source code for the Library including whatever\n       changes were used in the work (which must be distributed under\n       Sections 1 and 2 above); and, if the work is an executable linked\n       with the Library, with the complete machine-readable \"work that\n       uses the Library\", as object code and/or source code, so that the\n       user can modify the Library and then relink to produce a modified\n       executable containing the modified Library. (It is understood that\n       the user who changes the contents of definitions files in the\n       Library will not necessarily be able to recompile the application\n       to use the modified definitions.)\n    b) Use a suitable shared library mechanism for linking with the\n       Library. A suitable mechanism is one that (1) uses at run time a\n       copy of the library already present on the user's computer system,\n       rather than copying library functions into the executable, and (2)\n       will operate properly with a modified version of the library, if\n       the user installs one, as long as the modified version is\n       interface-compatible with the version that the work was made with.\n    c) Accompany the work with a written offer, valid for at least\n       three years, to give the same user the materials specified in\n       Subsection 6a, above, for a charge no more than the cost of\n       performing this distribution.\n    d) If distribution of the work is made by offering access to copy\n       from a designated place, offer equivalent access to copy the above\n       specified materials from the same place.\n    e) Verify that the user has already received a copy of these\n       materials or that you have already sent this user a copy. 
\n\nFor an executable, the required form of the \"work that uses the Library\"\nmust include any data and utility programs needed for reproducing the\nexecutable from it. However, as a special exception, the materials to be\ndistributed need not include anything that is normally distributed (in\neither source or binary form) with the major components (compiler,\nkernel, and so on) of the operating system on which the executable runs,\nunless that component itself accompanies the executable.\n\nIt may happen that this requirement contradicts the license restrictions\nof other proprietary libraries that do not normally accompany the\noperating system. Such a contradiction means you cannot use both them\nand the Library together in an executable that you distribute.\n\n7. You may place library facilities that are a work based on the Library\nside-by-side in a single library together with other library facilities\nnot covered by this License, and distribute such a combined library,\nprovided that the separate distribution of the work based on the Library\nand of the other library facilities is otherwise permitted, and provided\nthat you do these two things:\n\n    a) Accompany the combined library with a copy of the same work\n       based on the Library, uncombined with any other library\n       facilities. This must be distributed under the terms of the\n       Sections above.\n    b) Give prominent notice with the combined library of the fact\n       that part of it is a work based on the Library, and explaining\n       where to find the accompanying uncombined form of the same work. \n\n8. You may not copy, modify, sublicense, link with, or distribute the\nLibrary except as expressly provided under this License. Any attempt\notherwise to copy, modify, sublicense, link with, or distribute the\nLibrary is void, and will automatically terminate your rights under this\nLicense. 
However, parties who have received copies, or rights, from you\nunder this License will not have their licenses terminated so long as\nsuch parties remain in full compliance.\n\n9. You are not required to accept this License, since you have not\nsigned it. However, nothing else grants you permission to modify or\ndistribute the Library or its derivative works. These actions are\nprohibited by law if you do not accept this License. Therefore, by\nmodifying or distributing the Library (or any work based on the\nLibrary), you indicate your acceptance of this License to do so, and all\nits terms and conditions for copying, distributing or modifying the\nLibrary or works based on it.\n\n10. Each time you redistribute the Library (or any work based on the\nLibrary), the recipient automatically receives a license from the\noriginal licensor to copy, distribute, link with or modify the Library\nsubject to these terms and conditions. You may not impose any further\nrestrictions on the recipients' exercise of the rights granted herein.\nYou are not responsible for enforcing compliance by third parties with\nthis License.\n\n11. If, as a consequence of a court judgment or allegation of patent\ninfringement or for any other reason (not limited to patent issues),\nconditions are imposed on you (whether by court order, agreement or\notherwise) that contradict the conditions of this License, they do not\nexcuse you from the conditions of this License. If you cannot distribute\nso as to satisfy simultaneously your obligations under this License and\nany other pertinent obligations, then as a consequence you may not\ndistribute the Library at all. 
For example, if a patent license would\nnot permit royalty-free redistribution of the Library by all those who\nreceive copies directly or indirectly through you, then the only way you\ncould satisfy both it and this License would be to refrain entirely from\ndistribution of the Library.\n\nIf any portion of this section is held invalid or unenforceable under\nany particular circumstance, the balance of the section is intended to\napply, and the section as a whole is intended to apply in other\ncircumstances.\n\nIt is not the purpose of this section to induce you to infringe any\npatents or other property right claims or to contest validity of any\nsuch claims; this section has the sole purpose of protecting the\nintegrity of the free software distribution system which is implemented\nby public license practices. Many people have made generous\ncontributions to the wide range of software distributed through that\nsystem in reliance on consistent application of that system; it is up to\nthe author/donor to decide if he or she is willing to distribute\nsoftware through any other system and a licensee cannot impose that\nchoice.\n\nThis section is intended to make thoroughly clear what is believed to be\na consequence of the rest of this License.\n\n12. If the distribution and/or use of the Library is restricted in\ncertain countries either by patents or by copyrighted interfaces, the\noriginal copyright holder who places the Library under this License may\nadd an explicit geographical distribution limitation excluding those\ncountries, so that distribution is permitted only in or among countries\nnot thus excluded. In such case, this License incorporates the\nlimitation as if written in the body of this License.\n\n13. The Free Software Foundation may publish revised and/or new versions\nof the Lesser General Public License from time to time. 
Such new\nversions will be similar in spirit to the present version, but may\ndiffer in detail to address new problems or concerns.\n\nEach version is given a distinguishing version number. If the Library\nspecifies a version number of this License which applies to it and \"any\nlater version\", you have the option of following the terms and\nconditions either of that version or of any later version published by\nthe Free Software Foundation. If the Library does not specify a license\nversion number, you may choose any version ever published by the Free\nSoftware Foundation.\n\n14. If you wish to incorporate parts of the Library into other free\nprograms whose distribution conditions are incompatible with these,\nwrite to the author to ask for permission. For software which is\ncopyrighted by the Free Software Foundation, write to the Free Software\nFoundation; we sometimes make exceptions for this. Our decision will be\nguided by the two goals of preserving the free status of all derivatives\nof our free software and of promoting the sharing and reuse of software\ngenerally.\n\nNO WARRANTY\n\n15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY\nFOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN\nOTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES\nPROVIDE THE LIBRARY \"AS IS\" WITHOUT WARRANTY OF ANY KIND, EITHER\nEXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED\nWARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE\nENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH\nYOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL\nNECESSARY SERVICING, REPAIR OR CORRECTION.\n\n16. 
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN\nWRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY\nAND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR\nDAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL\nDAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY\n(INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED\nINACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF\nTHE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR\nOTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. \n"
  },
  {
    "path": "src/ANN/ReadMe.txt",
    "content": "ANN: Approximate Nearest Neighbors\nVersion: 1.1\nRelease date: May 3, 2005\n----------------------------------------------------------------------------\nCopyright (c) 1997-2005 University of Maryland and Sunil Arya and David\nMount. All Rights Reserved.  See Copyright.txt and License.txt for\ncomplete information on terms and conditions of use and distribution of\nthis software.\n----------------------------------------------------------------------------\n\nAuthors\n-------\nDavid Mount\nDept of Computer Science\nUniversity of Maryland,\nCollege Park, MD 20742 USA\nmount@cs.umd.edu\nhttp://www.cs.umd.edu/~mount/\n\nSunil Arya\nDept of Computer Science\nHong Kong University of Science and Technology\nClearwater Bay, HONG KONG\narya@cs.ust.hk\nhttp://www.cs.ust.hk/faculty/arya/\n\nIntroduction\n------------\nANN is a library written in the C++ programming language to support both\nexact and approximate nearest neighbor searching in spaces of various\ndimensions.  It was implemented by David M. Mount of the University of\nMaryland, and Sunil Arya of the Hong Kong University of Science and\nTechnology.  ANN (pronounced like the name ``Ann'') stands for\nApproximate Nearest Neighbors.  
ANN is also a testbed containing\nprograms and procedures for generating data sets, collecting and\nanalyzing statistics on the performance of nearest neighbor algorithms\nand data structures, and visualizing the geometric structure of these\ndata structures.\n\nThe ANN source code and documentation is available from the following\nweb page:\n\n    http://www.cs.umd.edu/~mount/ANN\n\nFor more information on ANN and its use, see the ``ANN Programming\nManual,'' which is provided with the software distribution.\n\n----------------------------------------------------------------------------\nHistory\n  Version 0.1  03/04/98\n    Preliminary release\n  Version 0.2  06/24/98\n    Changes for SGI compiler.\n  Version 1.0  04/01/05\n    Fixed a number of small bugs\n    Added dump/load operations\n    Added annClose to eliminate minor memory leak\n    Improved compatibility with current C++ compilers\n    Added compilation for Microsoft Visual Studio.NET\n    Added compilation for Linux 2.x\n  Version 1.1  05/03/05\n    Added make target for Mac OS X\n    Added fixed-radius range searching and counting\n    Added instructions on compiling/using ANN on Windows platforms\n    Fixed minor output bug in ann2fig\n"
  },
  {
    "path": "src/ANN/bd_fix_rad_search.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tbd_fix_rad_search.cpp\n// Programmer:\t\tDavid Mount\n// Description:\t\tStandard bd-tree search\n// Last modified:\t05/03/05 (Version 1.1)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n// \n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n// \n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 1.1  05/03/05\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#include \"bd_tree.h\"\t\t\t\t\t// bd-tree declarations\n#include \"kd_fix_rad_search.h\"\t\t\t// kd-tree FR search declarations\n\n//----------------------------------------------------------------------\n//\tApproximate searching for bd-trees.\n//\t\tSee the file kd_FR_search.cpp for general information on the\n//\t\tapproximate nearest neighbor search algorithm.  
Here we\n//\t\tinclude the extensions for shrinking nodes.\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\tbd_shrink::ann_FR_search - search a shrinking node\n//----------------------------------------------------------------------\n\nvoid ANNbd_shrink::ann_FR_search(ANNdist box_dist)\n{\n\t\t\t\t\t\t\t\t\t\t\t\t// check dist calc term cond.\n\tif (ANNmaxPtsVisited != 0 && ANNptsVisited > ANNmaxPtsVisited) return;\n\n\tANNdist inner_dist = 0;\t\t\t\t\t\t// distance to inner box\n\tfor (int i = 0; i < n_bnds; i++) {\t\t\t// is query point in the box?\n\t\tif (bnds[i].out(ANNkdFRQ)) {\t\t\t// outside this bounding side?\n\t\t\t\t\t\t\t\t\t\t\t\t// add to inner distance\n\t\t\tinner_dist = (ANNdist) ANN_SUM(inner_dist, bnds[i].dist(ANNkdFRQ));\n\t\t}\n\t}\n\tif (inner_dist <= box_dist) {\t\t\t\t// if inner box is closer\n\t\tchild[ANN_IN]->ann_FR_search(inner_dist);// search inner child first\n\t\tchild[ANN_OUT]->ann_FR_search(box_dist);// ...then outer child\n\t}\n\telse {\t\t\t\t\t\t\t\t\t\t// if outer box is closer\n\t\tchild[ANN_OUT]->ann_FR_search(box_dist);// search outer child first\n\t\tchild[ANN_IN]->ann_FR_search(inner_dist);// ...then inner child\n\t}\n\tANN_FLOP(3*n_bnds)\t\t\t\t\t\t\t// increment floating ops\n\tANN_SHR(1)\t\t\t\t\t\t\t\t\t// one more shrinking node\n}\n"
  },
  {
    "path": "src/ANN/bd_pr_search.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tbd_pr_search.cpp\n// Programmer:\t\tDavid Mount\n// Description:\t\tPriority search for bd-trees\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n// \n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n// \n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n//History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#include \"bd_tree.h\"\t\t\t\t\t// bd-tree declarations\n#include \"kd_pr_search.h\"\t\t\t\t// kd priority search declarations\n\n//----------------------------------------------------------------------\n//\tApproximate priority searching for bd-trees.\n//\t\tSee the file kd_pr_search.cc for general information on the\n//\t\tapproximate nearest neighbor priority search algorithm.  
Here\n//\t\twe include the extensions for shrinking nodes.\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\tbd_shrink::ann_search - search a shrinking node\n//----------------------------------------------------------------------\n\nvoid ANNbd_shrink::ann_pri_search(ANNdist box_dist)\n{\n\tANNdist inner_dist = 0;\t\t\t\t\t\t// distance to inner box\n\tfor (int i = 0; i < n_bnds; i++) {\t\t\t// is query point in the box?\n\t\tif (bnds[i].out(ANNprQ)) {\t\t\t\t// outside this bounding side?\n\t\t\t\t\t\t\t\t\t\t\t\t// add to inner distance\n\t\t\tinner_dist = (ANNdist) ANN_SUM(inner_dist, bnds[i].dist(ANNprQ));\n\t\t}\n\t}\n\tif (inner_dist <= box_dist) {\t\t\t\t// if inner box is closer\n\t\tif (child[ANN_OUT] != KD_TRIVIAL)\t\t// enqueue outer if not trivial\n\t\t\tANNprBoxPQ->insert(box_dist,child[ANN_OUT]);\n\t\t\t\t\t\t\t\t\t\t\t\t// continue with inner child\n\t\tchild[ANN_IN]->ann_pri_search(inner_dist);\n\t}\n\telse {\t\t\t\t\t\t\t\t\t\t// if outer box is closer\n\t\tif (child[ANN_IN] != KD_TRIVIAL)\t\t// enqueue inner if not trivial\n\t\t\tANNprBoxPQ->insert(inner_dist,child[ANN_IN]);\n\t\t\t\t\t\t\t\t\t\t\t\t// continue with outer child\n\t\tchild[ANN_OUT]->ann_pri_search(box_dist);\n\t}\n\tANN_FLOP(3*n_bnds)\t\t\t\t\t\t\t// increment floating ops\n\tANN_SHR(1)\t\t\t\t\t\t\t\t\t// one more shrinking node\n}\n"
  },
  {
    "path": "src/ANN/bd_search.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tbd_search.cpp\n// Programmer:\t\tDavid Mount\n// Description:\t\tStandard bd-tree search\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n// \n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n// \n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#include \"bd_tree.h\"\t\t\t\t\t// bd-tree declarations\n#include \"kd_search.h\"\t\t\t\t\t// kd-tree search declarations\n\n//----------------------------------------------------------------------\n//\tApproximate searching for bd-trees.\n//\t\tSee the file kd_search.cpp for general information on the\n//\t\tapproximate nearest neighbor search algorithm.  
Here we\n//\t\tinclude the extensions for shrinking nodes.\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\tbd_shrink::ann_search - search a shrinking node\n//----------------------------------------------------------------------\n\nvoid ANNbd_shrink::ann_search(ANNdist box_dist)\n{\n\t\t\t\t\t\t\t\t\t\t\t\t// check dist calc term cond.\n\tif (ANNmaxPtsVisited != 0 && ANNptsVisited > ANNmaxPtsVisited) return;\n\n\tANNdist inner_dist = 0;\t\t\t\t\t\t// distance to inner box\n\tfor (int i = 0; i < n_bnds; i++) {\t\t\t// is query point in the box?\n\t\tif (bnds[i].out(ANNkdQ)) {\t\t\t\t// outside this bounding side?\n\t\t\t\t\t\t\t\t\t\t\t\t// add to inner distance\n\t\t\tinner_dist = (ANNdist) ANN_SUM(inner_dist, bnds[i].dist(ANNkdQ));\n\t\t}\n\t}\n\tif (inner_dist <= box_dist) {\t\t\t\t// if inner box is closer\n\t\tchild[ANN_IN]->ann_search(inner_dist);\t// search inner child first\n\t\tchild[ANN_OUT]->ann_search(box_dist);\t// ...then outer child\n\t}\n\telse {\t\t\t\t\t\t\t\t\t\t// if outer box is closer\n\t\tchild[ANN_OUT]->ann_search(box_dist);\t// search outer child first\n\t\tchild[ANN_IN]->ann_search(inner_dist);\t// ...then inner child\n\t}\n\tANN_FLOP(3*n_bnds)\t\t\t\t\t\t\t// increment floating ops\n\tANN_SHR(1)\t\t\t\t\t\t\t\t\t// one more shrinking node\n}\n"
  },
  {
    "path": "src/ANN/bd_tree.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tbd_tree.cpp\n// Programmer:\t\tDavid Mount\n// Description:\t\tBasic methods for bd-trees.\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.0  04/01/05\n//\t\tFixed centroid shrink threshold condition to depend on the\n//\t\t\tdimension.\n//\t\tMoved dump routine to kd_dump.cpp.\n//----------------------------------------------------------------------\n\n#include \"bd_tree.h\"\t\t\t\t\t// bd-tree declarations\n#include \"kd_util.h\"\t\t\t\t\t// kd-tree utilities\n#include \"kd_split.h\"\t\t\t\t\t// kd-tree splitting rules\n\n#include \"ANNperf.h\"\t\t\t\t// performance evaluation\n\n//----------------------------------------------------------------------\n//\tPrinting a bd-tree\n//\t\tThese routines print a bd-tree.   
See the analogous procedure\n//\t\tin kd_tree.cpp for more information.\n//----------------------------------------------------------------------\n\nvoid ANNbd_shrink::print(\t\t\t\t// print shrinking node\n\t\tint level,\t\t\t\t\t\t// depth of node in tree\n\t\tostream &out)\t\t\t\t\t// output stream\n{\n\tchild[ANN_OUT]->print(level+1, out);\t\t// print out-child\n\n\tout << \"    \";\n\tfor (int i = 0; i < level; i++)\t\t\t\t// print indentation\n\t\tout << \"..\";\n\tout << \"Shrink\";\n\tfor (int j = 0; j < n_bnds; j++) {\t\t\t// print sides, 2 per line\n\t\tif (j % 2 == 0) {\n\t\t\tout << \"\\n\";\t\t\t\t\t\t// newline and indentation\n\t\t\tfor (int i = 0; i < level+2; i++) out << \"  \";\n\t\t}\n\t\tout << \"  ([\" << bnds[j].cd << \"]\"\n\t\t\t << (bnds[j].sd > 0 ? \">=\" : \"< \")\n\t\t\t << bnds[j].cv << \")\";\n\t}\n\tout << \"\\n\";\n\n\tchild[ANN_IN]->print(level+1, out);\t\t\t// print in-child\n}\n\n//----------------------------------------------------------------------\n//\tkd_tree statistics utility (for performance evaluation)\n//\t\tThis routine computes various statistics information for\n//\t\tshrinking nodes.  
See file kd_tree.cpp for more information.\n//----------------------------------------------------------------------\n\nvoid ANNbd_shrink::getStats(\t\t\t\t\t// get subtree statistics\n\tint\t\t\t\t\tdim,\t\t\t\t\t// dimension of space\n\tANNkdStats\t\t\t&st,\t\t\t\t\t// stats (modified)\n\tANNorthRect\t\t\t&bnd_box)\t\t\t\t// bounding box\n{\n\tANNkdStats ch_stats;\t\t\t\t\t\t// stats for children\n\tANNorthRect inner_box(dim);\t\t\t\t\t// inner box of shrink\n\n\tannBnds2Box(bnd_box,\t\t\t\t\t\t// enclosing box\n\t\t\t\tdim,\t\t\t\t\t\t\t// dimension\n\t\t\t\tn_bnds,\t\t\t\t\t\t\t// number of bounds\n\t\t\t\tbnds,\t\t\t\t\t\t\t// bounds array\n\t\t\t\tinner_box);\t\t\t\t\t\t// inner box (modified)\n\t\t\t\t\t\t\t\t\t\t\t\t// get stats for inner child\n\tch_stats.reset();\t\t\t\t\t\t\t// reset\n\tchild[ANN_IN]->getStats(dim, ch_stats, inner_box);\n\tst.merge(ch_stats);\t\t\t\t\t\t\t// merge them\n\t\t\t\t\t\t\t\t\t\t\t\t// get stats for outer child\n\tch_stats.reset();\t\t\t\t\t\t\t// reset\n\tchild[ANN_OUT]->getStats(dim, ch_stats, bnd_box);\n\tst.merge(ch_stats);\t\t\t\t\t\t\t// merge them\n\n\tst.depth++;\t\t\t\t\t\t\t\t\t// increment depth\n\tst.n_shr++;\t\t\t\t\t\t\t\t\t// increment number of shrinks\n}\n\n//----------------------------------------------------------------------\n// bd-tree constructor\n//\t\tThis is the main constructor for bd-trees given a set of points.\n//\t\tIt first builds a skeleton kd-tree as a basis, then computes the\n//\t\tbounding box of the data points, and then invokes rbd_tree() to\n//\t\tactually build the tree, passing it the appropriate splitting\n//\t\tand shrinking information.\n//----------------------------------------------------------------------\n\nANNkd_ptr rbd_tree(\t\t\t\t\t\t// recursive construction of bd-tree\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices to store in subtree\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of 
space\n\tint\t\t\t\t\tbsp,\t\t\t// bucket space\n\tANNorthRect\t\t\t&bnd_box,\t\t// bounding box for current node\n\tANNkd_splitter\t\tsplitter,\t\t// splitting routine\n\tANNshrinkRule\t\tshrink);\t\t// shrinking rule\n\nANNbd_tree::ANNbd_tree(\t\t\t\t\t// construct from point array\n\tANNpointArray\t\tpa,\t\t\t\t// point array (with at least n pts)\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdd,\t\t\t\t// dimension\n\tint\t\t\t\t\tbs,\t\t\t\t// bucket size\n\tANNsplitRule\t\tsplit,\t\t\t// splitting rule\n\tANNshrinkRule\t\tshrink)\t\t\t// shrinking rule\n\t: ANNkd_tree(n, dd, bs)\t\t\t\t// build skeleton base tree\n{\n\tpts = pa;\t\t\t\t\t\t\t// where the points are\n\tif (n == 0) return;\t\t\t\t\t// no points--no sweat\n\n\tANNorthRect bnd_box(dd);\t\t\t// bounding box for points\n\t\t\t\t\t\t\t\t\t// construct bounding rectangle\n\tannEnclRect(pa, pidx, n, dd, bnd_box);\n\t\t\t\t\t\t\t\t\t\t// copy to tree structure\n\tbnd_box_lo = annCopyPt(dd, bnd_box.lo);\n\tbnd_box_hi = annCopyPt(dd, bnd_box.hi);\n\n\tswitch (split) {\t\t\t\t\t// build by rule\n\tcase ANN_KD_STD:\t\t\t\t\t// standard kd-splitting rule\n\n\t\troot = rbd_tree(pa, pidx, n, dd, bs, bnd_box, kd_split, shrink);\n\t\tbreak;\n\tcase ANN_KD_MIDPT:\t\t\t\t\t// midpoint split\n\t\troot = rbd_tree(pa, pidx, n, dd, bs, bnd_box, midpt_split, shrink);\n\t\tbreak;\n\tcase ANN_KD_SUGGEST:\t\t\t\t// best (in our opinion)\n\tcase ANN_KD_SL_MIDPT:\t\t\t\t// sliding midpoint split\n\t\troot = rbd_tree(pa, pidx, n, dd, bs, bnd_box, sl_midpt_split, shrink);\n\t\tbreak;\n\tcase ANN_KD_FAIR:\t\t\t\t\t// fair split\n\t\troot = rbd_tree(pa, pidx, n, dd, bs, bnd_box, fair_split, shrink);\n\t\tbreak;\n\tcase ANN_KD_SL_FAIR:\t\t\t\t// sliding fair split\n\t\troot = rbd_tree(pa, pidx, n, dd, bs,\n\t\t\t\t\t\tbnd_box, sl_fair_split, shrink);\n\t\tbreak;\n\tdefault:\n\t\tannError(\"Illegal splitting method\", 
ANNabort);\n\t}\n}\n\n//----------------------------------------------------------------------\n//\tShrinking rules\n//----------------------------------------------------------------------\n\nenum ANNdecomp {SPLIT, SHRINK};\t\t\t// decomposition methods\n\n//----------------------------------------------------------------------\n//\ttrySimpleShrink - Attempt a simple shrink\n//\n//\t\tWe compute the tight bounding box of the points, and compute\n//\t\tthe 2*dim ``gaps'' between the sides of the tight box and the\n//\t\tbounding box.  If any of the gaps is large enough relative to\n//\t\tthe longest side of the tight bounding box, then we shrink\n//\t\tall sides whose gaps are large enough.  (The reason for\n//\t\tcomparing against the tight bounding box, is that after\n//\t\tshrinking the longest box size will decrease, and if we use\n//\t\tthe standard bounding box, we may decide to shrink twice in\n//\t\ta row.  Since the tight box is fixed, we cannot shrink twice\n//\t\tconsecutively.)\n//----------------------------------------------------------------------\nconst float BD_GAP_THRESH = 0.5;\t\t// gap threshold (must be < 1)\nconst int   BD_CT_THRESH  = 2;\t\t\t// min number of shrink sides\n\nANNdecomp trySimpleShrink(\t\t\t\t// try a simple shrink\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices to store in subtree\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tconst ANNorthRect\t&bnd_box,\t\t// current bounding box\n\tANNorthRect\t\t\t&inner_box)\t\t// inner box if shrinking (returned)\n{\n\tint i;\n\t\t\t\t\t\t\t\t\t\t\t\t// compute tight bounding box\n\tannEnclRect(pa, pidx, n, dim, inner_box);\n\n\tANNcoord max_length = 0;\t\t\t\t\t// find longest box side\n\tfor (i = 0; i < dim; i++) {\n\t\tANNcoord length = inner_box.hi[i] - inner_box.lo[i];\n\t\tif (length > max_length) {\n\t\t\tmax_length = length;\n\t\t}\n\t}\n\n\tint shrink_ct = 0;\t\t\t\t\t\t\t// number 
of sides we shrunk\n\tfor (i = 0; i < dim; i++) {\t\t\t\t\t// select which sides to shrink\n\t\t\t\t\t\t\t\t\t\t\t\t// gap between boxes\n\t\tANNcoord gap_hi = bnd_box.hi[i] - inner_box.hi[i];\n\t\t\t\t\t\t\t\t\t\t\t\t// big enough gap to shrink?\n\t\tif (gap_hi < max_length*BD_GAP_THRESH)\n\t\t\tinner_box.hi[i] = bnd_box.hi[i];\t// no - expand\n\t\telse shrink_ct++;\t\t\t\t\t\t// yes - shrink this side\n\n\t\t\t\t\t\t\t\t\t\t\t\t// repeat for low side\n\t\tANNcoord gap_lo = inner_box.lo[i] - bnd_box.lo[i];\n\t\tif (gap_lo < max_length*BD_GAP_THRESH)\n\t\t\tinner_box.lo[i] = bnd_box.lo[i];\t// no - expand\n\t\telse shrink_ct++;\t\t\t\t\t\t// yes - shrink this side\n\t}\n\n\tif (shrink_ct >= BD_CT_THRESH)\t\t\t\t// did we shrink enough sides?\n\t\t return SHRINK;\n\telse return SPLIT;\n}\n\n//----------------------------------------------------------------------\n//\ttryCentroidShrink - Attempt a centroid shrink\n//\n//\tWe repeatedly apply the splitting rule, always to the larger subset\n//\tof points, until the number of points decreases by the constant\n//\tfraction BD_FRACTION.  
If this takes more than dim*BD_MAX_SPLIT_FAC\n//\tsplits, then we shrink to the final inner box.\n//\tOtherwise we split.\n//----------------------------------------------------------------------\n\nconst float\tBD_MAX_SPLIT_FAC = 0.5;\t\t// maximum splits = dim * this factor\nconst float\tBD_FRACTION = 0.5;\t\t\t// ...to reduce points by this fraction\n\t\t\t\t\t\t\t\t\t\t// ...This must be < 1.\n\nANNdecomp tryCentroidShrink(\t\t\t// try a centroid shrink\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices to store in subtree\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tconst ANNorthRect\t&bnd_box,\t\t// current bounding box\n\tANNkd_splitter\t\tsplitter,\t\t// splitting procedure\n\tANNorthRect\t\t\t&inner_box)\t\t// inner box if shrinking (returned)\n{\n\tint n_sub = n;\t\t\t\t\t\t// number of points in subset\n\tint n_goal = (int) (n*BD_FRACTION); // number of points in goal\n\tint n_splits = 0;\t\t\t\t\t// number of splits needed\n\t\t\t\t\t\t\t\t\t\t// initialize inner box to bounding box\n\tannAssignRect(dim, inner_box, bnd_box);\n\n\twhile (n_sub > n_goal) {\t\t\t// keep splitting until goal reached\n\t\tint cd;\t\t\t\t\t\t\t// cut dim from splitter (ignored)\n\t\tANNcoord cv;\t\t\t\t\t// cut value from splitter (ignored)\n\t\tint n_lo;\t\t\t\t\t\t// number of points on low side\n\t\t\t\t\t\t\t\t\t\t// invoke splitting procedure\n\t\t(*splitter)(pa, pidx, inner_box, n_sub, dim, cd, cv, n_lo);\n\t\tn_splits++;\t\t\t\t\t\t// increment split count\n\n\t\tif (n_lo >= n_sub/2) {\t\t\t// most points on low side\n\t\t\tinner_box.hi[cd] = cv;\t\t// collapse high side\n\t\t\tn_sub = n_lo;\t\t\t\t// recurse on lower points\n\t\t}\n\t\telse {\t\t\t\t\t\t\t// most points on high side\n\t\t\tinner_box.lo[cd] = cv;\t\t// collapse low side\n\t\t\tpidx += n_lo;\t\t\t\t// recurse on higher points\n\t\t\tn_sub -= n_lo;\n\t\t}\n\t}\n\tif (n_splits > 
dim*BD_MAX_SPLIT_FAC)// took too many splits\n\t\treturn SHRINK;\t\t\t\t\t// shrink to final subset\n\telse\n\t\treturn SPLIT;\n}\n\n//----------------------------------------------------------------------\n//\tselectDecomp - select which decomposition to use\n//----------------------------------------------------------------------\n\nANNdecomp selectDecomp(\t\t\t// select decomposition method\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices to store in subtree\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tconst ANNorthRect\t&bnd_box,\t\t// current bounding box\n\tANNkd_splitter\t\tsplitter,\t\t// splitting procedure\n\tANNshrinkRule\t\tshrink,\t\t\t// shrinking rule\n\tANNorthRect\t\t\t&inner_box)\t\t// inner box if shrinking (returned)\n{\n\tANNdecomp decomp = SPLIT;\t\t\t// decomposition\n\n\tswitch (shrink) {\t\t\t\t\t// check shrinking rule\n\tcase ANN_BD_NONE:\t\t\t\t\t// no shrinking allowed\n\t\tdecomp = SPLIT;\n\t\tbreak;\n\tcase ANN_BD_SUGGEST:\t\t\t\t// author's suggestion\n\tcase ANN_BD_SIMPLE:\t\t\t\t\t// simple shrink\n\t\tdecomp = trySimpleShrink(\n\t\t\t\tpa, pidx,\t\t\t\t// points and indices\n\t\t\t\tn, dim,\t\t\t\t\t// number of points and dimension\n\t\t\t\tbnd_box,\t\t\t\t// current bounding box\n\t\t\t\tinner_box);\t\t\t\t// inner box if shrinking (returned)\n\t\tbreak;\n\tcase ANN_BD_CENTROID:\t\t\t\t// centroid shrink\n\t\tdecomp = tryCentroidShrink(\n\t\t\t\tpa, pidx,\t\t\t\t// points and indices\n\t\t\t\tn, dim,\t\t\t\t\t// number of points and dimension\n\t\t\t\tbnd_box,\t\t\t\t// current bounding box\n\t\t\t\tsplitter,\t\t\t\t// splitting procedure\n\t\t\t\tinner_box);\t\t\t\t// inner box if shrinking (returned)\n\t\tbreak;\n\tdefault:\n\t\tannError(\"Illegal shrinking rule\", ANNabort);\n\t}\n\treturn decomp;\n}\n\n//----------------------------------------------------------------------\n//\trbd_tree - recursive procedure to build a 
bd-tree\n//\n//\t\tThis is analogous to rkd_tree, but for bd-trees.  See the\n//\t\tprocedure rkd_tree() in kd_split.cpp for more information.\n//\n//\t\tIf the number of points falls below the bucket size, then a\n//\t\tleaf node is created for the points.  Otherwise we invoke the\n//\t\tprocedure selectDecomp() which determines whether we are to\n//\t\tsplit or shrink.  If splitting is chosen, then we essentially\n//\t\tdo exactly as rkd_tree() would, and invoke the specified\n//\t\tsplitting procedure to the points.  Otherwise, the selection\n//\t\tprocedure returns a bounding box, from which we extract the\n//\t\tappropriate shrinking bounds, and create a shrinking node.\n//\t\tFinally the points are subdivided, and the procedure is\n//\t\tinvoked recursively on the two subsets to form the children.\n//----------------------------------------------------------------------\n\nANNkd_ptr rbd_tree(\t\t\t\t// recursive construction of bd-tree\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices to store in subtree\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\tbsp,\t\t\t// bucket space\n\tANNorthRect\t\t\t&bnd_box,\t\t// bounding box for current node\n\tANNkd_splitter\t\tsplitter,\t\t// splitting routine\n\tANNshrinkRule\t\tshrink)\t\t\t// shrinking rule\n{\n\tANNdecomp decomp;\t\t\t\t\t// decomposition method\n\n\tANNorthRect inner_box(dim);\t\t\t// inner box (if shrinking)\n\n\tif (n <= bsp) {\t\t\t\t\t\t// n small, make a leaf node\n\t\tif (n == 0)\t\t\t\t\t\t// empty leaf node\n\t\t\treturn KD_TRIVIAL;\t\t\t// return (canonical) empty leaf\n\t\telse\t\t\t\t\t\t\t// construct the node and return\n\t\t\treturn new ANNkd_leaf(n, pidx);\n\t}\n\n\tdecomp = selectDecomp(\t\t\t\t// select decomposition method\n\t\t\t\tpa, pidx,\t\t\t\t// points and indices\n\t\t\t\tn, dim,\t\t\t\t\t// number of points and dimension\n\t\t\t\tbnd_box,\t\t\t\t// current bounding 
box\n\t\t\t\tsplitter, shrink,\t\t// splitting/shrinking methods\n\t\t\t\tinner_box);\t\t\t\t// inner box if shrinking (returned)\n\n\tif (decomp == SPLIT) {\t\t\t\t// split selected\n\t\tint cd;\t\t\t\t\t\t\t// cutting dimension\n\t\tANNcoord cv;\t\t\t\t\t// cutting value\n\t\tint n_lo;\t\t\t\t\t\t// number on low side of cut\n\t\t\t\t\t\t\t\t\t\t// invoke splitting procedure\n\t\t(*splitter)(pa, pidx, bnd_box, n, dim, cd, cv, n_lo);\n\n\t\tANNcoord lv = bnd_box.lo[cd];\t// save bounds for cutting dimension\n\t\tANNcoord hv = bnd_box.hi[cd];\n\n\t\tbnd_box.hi[cd] = cv;\t\t\t// modify bounds for left subtree\n\t\tANNkd_ptr lo = rbd_tree(\t\t// build left subtree\n\t\t\t\tpa, pidx, n_lo,\t\t\t// ...from pidx[0..n_lo-1]\n\t\t\t\tdim, bsp, bnd_box, splitter, shrink);\n\t\tbnd_box.hi[cd] = hv;\t\t\t// restore bounds\n\n\t\tbnd_box.lo[cd] = cv;\t\t\t// modify bounds for right subtree\n\t\tANNkd_ptr hi = rbd_tree(\t\t// build right subtree\n\t\t\t\tpa, pidx + n_lo, n-n_lo,// ...from pidx[n_lo..n-1]\n\t\t\t\tdim, bsp, bnd_box, splitter, shrink);\n\t\tbnd_box.lo[cd] = lv;\t\t\t// restore bounds\n\t\t\t\t\t\t\t\t\t\t// create the splitting node\n\t\treturn new ANNkd_split(cd, cv, lv, hv, lo, hi);\n\t}\n\telse {\t\t\t\t\t\t\t\t// shrink selected\n\t\tint n_in;\t\t\t\t\t\t// number of points in box\n\t\tint n_bnds;\t\t\t\t\t\t// number of bounding sides\n\n\t\tannBoxSplit(\t\t\t\t\t// split points around inner box\n\t\t\t\tpa,\t\t\t\t\t\t// points to split\n\t\t\t\tpidx,\t\t\t\t\t// point indices\n\t\t\t\tn,\t\t\t\t\t\t// number of points\n\t\t\t\tdim,\t\t\t\t\t// dimension\n\t\t\t\tinner_box,\t\t\t\t// inner box\n\t\t\t\tn_in);\t\t\t\t\t// number of points inside (returned)\n\n\t\tANNkd_ptr in = rbd_tree(\t\t// build inner subtree pidx[0..n_in-1]\n\t\t\t\tpa, pidx, n_in, dim, bsp, inner_box, splitter, shrink);\n\t\tANNkd_ptr out = rbd_tree(\t\t// build outer subtree pidx[n_in..n]\n\t\t\t\tpa, pidx+n_in, n - n_in, dim, bsp, bnd_box, splitter, shrink);\n\n\t\tANNorthHSArray 
bnds = NULL;\t\t// bounds (alloc in Box2Bnds and\n\t\t\t\t\t\t\t\t\t\t// ...freed in bd_shrink destructor)\n\n\t\tannBox2Bnds(\t\t\t\t\t// convert inner box to bounds\n\t\t\t\tinner_box,\t\t\t\t// inner box\n\t\t\t\tbnd_box,\t\t\t\t// enclosing box\n\t\t\t\tdim,\t\t\t\t\t// dimension\n\t\t\t\tn_bnds,\t\t\t\t\t// number of bounds (returned)\n\t\t\t\tbnds);\t\t\t\t\t// bounds array (modified)\n\n\t\t\t\t\t\t\t\t\t\t// return shrinking node\n\t\treturn new ANNbd_shrink(n_bnds, bnds, in, out);\n\t}\n}\n"
  },
  {
    "path": "src/ANN/bd_tree.h",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tbd_tree.h\n// Programmer:\t\tDavid Mount\n// Description:\t\tDeclarations for standard bd-tree routines\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.0  04/01/05\n//\t\tChanged IN, OUT to ANN_IN, ANN_OUT\n//----------------------------------------------------------------------\n\n#ifndef ANN_bd_tree_H\n#define ANN_bd_tree_H\n\n#include \"ANNx.h\"\t\t\t\t\t// all ANN includes\n#include \"kd_tree.h\"\t\t\t\t\t// kd-tree includes\n\n//----------------------------------------------------------------------\n//\tbd-tree shrinking node.\n//\t\tThe main addition in the bd-tree is the shrinking node, which\n//\t\tis declared here.\n//\n//\t\tShrinking nodes are defined by list of orthogonal halfspaces.\n//\t\tThese halfspaces define a (possibly unbounded) orthogonal\n//\t\trectangle.  There are two children, in and out.  
Points that\n//\t\tlie within this rectangle are stored in the in-child, and the\n//\t\tother points are stored in the out-child.\n//\n//\t\tWe use a list of orthogonal halfspaces rather than an\n//\t\torthogonal rectangle object because typically the number of\n//\t\tsides of the shrinking box will be much smaller than the\n//\t\tworst case bound of 2*dim.\n//\n//\t\tBEWARE: Note that constructor just copies the pointer to the\n//\t\tbounding array, but the destructor deallocates it.  This is\n//\t\trather poor practice, but happens to be convenient.  The list\n//\t\tis allocated in the bd-tree building procedure rbd_tree() just\n//\t\tprior to construction, and is used for no other purposes.\n//\n//\t\tWARNING: In the near neighbor searching code it is assumed that\n//\t\tthe list of bounding halfspaces is irredundant, meaning that there\n//\t\tare no two distinct halfspaces in the list with the same outward\n//\t\tpointing normals.\n//----------------------------------------------------------------------\n\nclass ANNbd_shrink : public ANNkd_node\t// shrinking node of a bd-tree\n{\n\tint\t\t\t\t\tn_bnds;\t\t\t// number of bounding halfspaces\n\tANNorthHSArray\t\tbnds;\t\t\t// list of bounding halfspaces\n\tANNkd_ptr\t\t\tchild[2];\t\t// in and out children\npublic:\n\tANNbd_shrink(\t\t\t\t\t\t// constructor\n\t\tint\t\t\t\tnb,\t\t\t\t// number of bounding halfspaces\n\t\tANNorthHSArray\tbds,\t\t\t// list of bounding halfspaces\n\t\tANNkd_ptr ic=NULL, ANNkd_ptr oc=NULL)\t// children\n\t\t{\n\t\t\tn_bnds\t\t\t= nb;\t\t\t\t// number of bounds\n\t\t\tbnds\t\t\t= bds;\t\t\t\t// assign bounds\n\t\t\tchild[ANN_IN]\t= ic;\t\t\t\t// set children\n\t\t\tchild[ANN_OUT]\t= oc;\n\t\t}\n\n\t~ANNbd_shrink()\t\t\t\t\t\t// destructor\n\t\t{\n\t\t\tif (child[ANN_IN] != NULL && child[ANN_IN] != KD_TRIVIAL)\n\t\t\t\tdelete child[ANN_IN];\n\t\t\tif (child[ANN_OUT] != NULL && child[ANN_OUT] != KD_TRIVIAL)\n\t\t\t\tdelete child[ANN_OUT];\n\t\t\tif (bnds != NULL)\n\t\t\t\tdelete [] 
bnds;\t\t\t// delete bounds\n\t\t}\n\n\tvirtual void getStats(\t\t\t\t\t\t// get tree statistics\n\t\t\t\tint dim,\t\t\t\t\t\t// dimension of space\n\t\t\t\tANNkdStats &st,\t\t\t\t\t// statistics\n\t\t\t\tANNorthRect &bnd_box);\t\t\t// bounding box\n\tvirtual void print(int level, ostream &out);// print node\n\tvirtual void dump(ostream &out);\t\t\t// dump node\n\n\tvirtual void ann_search(ANNdist);\t\t\t// standard search\n\tvirtual void ann_pri_search(ANNdist);\t\t// priority search\n\tvirtual void ann_FR_search(ANNdist); \t\t// fixed-radius search\n};\n\n#endif\n"
  },
  {
    "path": "src/ANN/brute.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tbrute.cpp\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tBrute-force nearest neighbors\n// Last modified:\t05/03/05 (Version 1.1)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.1  05/03/05\n//\t\tAdded fixed-radius kNN search\n//----------------------------------------------------------------------\n\n#include \"ANNx.h\"\t\t\t\t\t// all ANN includes\n#include \"pr_queue_k.h\"\t\t\t\t\t// k element priority queue\n\n//----------------------------------------------------------------------\n//\t\tBrute-force search simply stores a pointer to the list of\n//\t\tdata points and searches linearly for the nearest neighbor.\n//\t\tThe k nearest neighbors are stored in a k-element priority\n//\t\tqueue (which is implemented in a pretty dumb way as well).\n//\n//\t\tIf ANN_ALLOW_SELF_MATCH is ANNfalse then data points at distance\n//\t\tzero are not considered.\n//\n//\t\tNote that the error bound eps is passed in, but it is ignored.\n//\t\tThese routines compute exact nearest neighbors (which is needed\n//\t\tfor validation purposes in 
ann_test.cpp).\n//----------------------------------------------------------------------\n\nANNbruteForce::ANNbruteForce(\t\t\t// constructor from point array\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdd)\t\t\t\t// dimension\n{\n\tdim = dd;  n_pts = n;  pts = pa;\n}\n\nANNbruteForce::~ANNbruteForce() { }\t\t// destructor (empty)\n\nvoid ANNbruteForce::annkSearch(\t\t\t// approx k near neighbor search\n\tANNpoint\t\t\tq,\t\t\t\t// query point\n\tint\t\t\t\t\tk,\t\t\t\t// number of near neighbors to return\n\tANNidxArray\t\t\tnn_idx,\t\t\t// nearest neighbor indices (returned)\n\tANNdistArray\t\tdd,\t\t\t\t// dist to near neighbors (returned)\n\tdouble\t\t\t\teps)\t\t\t// error bound (ignored)\n{\n\tANNmin_k mk(k);\t\t\t\t\t\t// construct a k-limited priority queue\n\tint i;\n\n\tif (k > n_pts) {\t\t\t\t\t// too many near neighbors?\n\t\tannError(\"Requesting more near neighbors than data points\", ANNabort);\n\t}\n\t\t\t\t\t\t\t\t\t\t// run every point through queue\n\tfor (i = 0; i < n_pts; i++) {\n\t\t\t\t\t\t\t\t\t\t// compute distance to point\n\t\tANNdist sqDist = annDist(dim, pts[i], q);\n\t\tif (ANN_ALLOW_SELF_MATCH || sqDist != 0)\n\t\t\tmk.insert(sqDist, i);\n\t}\n\tfor (i = 0; i < k; i++) {\t\t\t// extract the k closest points\n\t\tdd[i] = mk.ith_smallest_key(i);\n\t\tnn_idx[i] = mk.ith_smallest_info(i);\n\t}\n}\n\nint ANNbruteForce::annkFRSearch(\t\t// approx fixed-radius kNN search\n\tANNpoint\t\t\tq,\t\t\t\t// query point\n\tANNdist\t\t\t\tsqRad,\t\t\t// squared radius\n\tint\t\t\t\t\tk,\t\t\t\t// number of near neighbors to return\n\tANNidxArray\t\t\tnn_idx,\t\t\t// nearest neighbor array (returned)\n\tANNdistArray\t\tdd,\t\t\t\t// dist to near neighbors (returned)\n\tdouble\t\t\t\teps)\t\t\t// error bound\n{\n\tANNmin_k mk(k);\t\t\t\t\t\t// construct a k-limited priority queue\n\tint i;\n\tint pts_in_range = 0;\t\t\t\t// number of points in query range\n\t\t\t\t\t\t\t\t\t\t// run every 
point through queue\n\tfor (i = 0; i < n_pts; i++) {\n\t\t\t\t\t\t\t\t\t\t// compute distance to point\n\t\tANNdist sqDist = annDist(dim, pts[i], q);\n\t\tif (sqDist <= sqRad &&\t\t\t// within radius bound\n\t\t\t(ANN_ALLOW_SELF_MATCH || sqDist != 0)) { // ...and no self match\n\t\t\tmk.insert(sqDist, i);\n\t\t\tpts_in_range++;\n\t\t}\n\t}\n\tfor (i = 0; i < k; i++) {\t\t\t// extract the k closest points\n\t\tif (dd != NULL)\n\t\t\tdd[i] = mk.ith_smallest_key(i);\n\t\tif (nn_idx != NULL)\n\t\t\tnn_idx[i] = mk.ith_smallest_info(i);\n\t}\n\n\treturn pts_in_range;\n}\n\n// MFH: version that returns all points\nstd::pair< std::vector<int>, std::vector<double> > ANNbruteForce::annkFRSearch2(\t\t// approx fixed-radius kNN search\n\tANNpoint\t\t\tq,\t\t\t\t// query point\n\tANNdist\t\t\t\tsqRad,\t\t\t// squared radius\n\tdouble\t\t\t\teps)\t\t\t// error bound\n{\n\tstd::vector<int> points;\n\tstd::vector<double> dists;\n\tint i;\n\tint pts_in_range = 0;\t\t\t\t// number of points in query range\n\t\t\t\t\t\t\t\t\t\t// run every point through queue\n\tfor (i = 0; i < n_pts; i++) {\n\t\t\t\t\t\t\t\t\t\t// compute distance to point\n\t\tANNdist sqDist = annDist(dim, pts[i], q);\n\t\tif (sqDist <= sqRad &&\t\t\t// within radius bound\n\t\t\t(ANN_ALLOW_SELF_MATCH || sqDist != 0)) { // ...and no self match\n\t\t\tpoints.push_back(i);\n\t\t\tdists.push_back(sqDist);\n\t\t\tpts_in_range++;\n\t\t}\n\t}\n\n\treturn std::make_pair(points, dists);\n}\n"
  },
  {
    "path": "src/ANN/kd_dump.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_dump.cc\n// Programmer:\t\tDavid Mount\n// Description:\t\tDump and Load for kd- and bd-trees\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n// \n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n// \n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.0  04/01/05\n//\t\tMoved dump out of kd_tree.cc into this file.\n//\t\tAdded kd-tree load constructor.\n//      Revision 2/29/08\n//              Added <cstdlib> and std:: qualifiers, along with string.h\n//----------------------------------------------------------------------\n// This file contains routines for dumping kd-trees and bd-trees and\n// reloading them. (It is an abuse of policy to include both kd- and\n// bd-tree routines in the same file, sorry.  
There should be no problem\n// in deleting the bd- versions of the routines if they are not\n// desired.)\n//----------------------------------------------------------------------\n\n#include <cstdlib>\n#include <stdio.h>\n#include <string.h>\n\n//using namespace std;\t\t\t\t\t// make std:: available\n\n#include \"kd_tree.h\"\t\t\t\t\t// kd-tree declarations\n#include \"bd_tree.h\"\t\t\t\t\t// bd-tree declarations\n\n\n//----------------------------------------------------------------------\n//\t\tConstants\n//----------------------------------------------------------------------\n\nconst int\t\tSTRING_LEN\t\t= 500;\t// maximum string length\n// const double\tEPSILON\t\t\t= 1E-5; // small number for float comparison\n\nenum ANNtreeType {KD_TREE, BD_TREE};\t// tree types (used in loading)\n\n//----------------------------------------------------------------------\n//\t\tProcedure declarations\n//----------------------------------------------------------------------\n\nstatic ANNkd_ptr annReadDump(\t\t\t// read dump file\n\tistream\t\t\t\t&in,\t\t\t\t\t// input stream\n\tANNtreeType\t\t\ttree_type,\t\t\t\t// type of tree expected\n\tANNpointArray\t\t&the_pts,\t\t\t\t// new points (if applic)\n\tANNidxArray\t\t\t&the_pidx,\t\t\t\t// point indices (returned)\n\tint\t\t\t\t\t&the_dim,\t\t\t\t// dimension (returned)\n\tint\t\t\t\t\t&the_n_pts,\t\t\t\t// number of points (returned)\n\tint\t\t\t\t\t&the_bkt_size,\t\t\t// bucket size (returned)\n\tANNpoint\t\t\t&the_bnd_box_lo,\t\t// low bounding point\n\tANNpoint\t\t\t&the_bnd_box_hi);\t\t// high bounding point\n\nstatic ANNkd_ptr annReadTree(\t\t\t// read tree-part of dump file\n\tistream\t\t\t\t&in,\t\t\t\t\t// input stream\n\tANNtreeType\t\t\ttree_type,\t\t\t\t// type of tree expected\n\tANNidxArray\t\t\tthe_pidx,\t\t\t\t// point indices (modified)\n\tint\t\t\t\t\t&next_idx);\t\t\t\t// next index (modified)\n\n//----------------------------------------------------------------------\n//\tANN kd- and bd-tree Dump 
Format\n//\t\tThe dump file begins with a header containing the version of\n//\t\tANN, an optional section containing the points, followed by\n//\t\ta description of the tree.\tThe tree is printed in preorder.\n//\n//\t\tFormat:\n//\t\t#ANN <version number> <comments> [END_OF_LINE]\n//\t\tpoints <dim> <n_pts>\t\t\t(point coordinates: this is optional)\n//\t\t0 <xxx> <xxx> ... <xxx>\t\t\t(point indices and coordinates)\n//\t\t1 <xxx> <xxx> ... <xxx>\n//\t\t  ...\n//\t\ttree <dim> <n_pts> <bkt_size>\n//\t\t<xxx> <xxx> ... <xxx>\t\t\t(lower end of bounding box)\n//\t\t<xxx> <xxx> ... <xxx>\t\t\t(upper end of bounding box)\n//\t\t\t\tIf the tree is null, then a single line \"null\" is\n//\t\t\t\toutput.\t Otherwise the nodes of the tree are printed\n//\t\t\t\tone per line in preorder.  Leaves and splitting nodes \n//\t\t\t\thave the following formats:\n//\t\tLeaf node:\n//\t\t\t\tleaf <n_pts> <bkt[0]> <bkt[1]> ... <bkt[n-1]>\n//\t\tSplitting nodes:\n//\t\t\t\tsplit <cut_dim> <cut_val> <lo_bound> <hi_bound>\n//\n//\t\tFor bd-trees:\n//\n//\t\tShrinking nodes:\n//\t\t\t\tshrink <n_bnds>\n//\t\t\t\t\t\t<cut_dim> <cut_val> <side>\n//\t\t\t\t\t\t<cut_dim> <cut_val> <side>\n//\t\t\t\t\t\t... 
(repeated n_bnds times)\n//----------------------------------------------------------------------\n\nvoid ANNkd_tree::Dump(\t\t\t\t\t// dump entire tree\n\t\tANNbool with_pts,\t\t\t\t// print points as well?\n\t\tostream &out)\t\t\t\t\t// output stream\n{\n\tout << \"#ANN \" << ANNversion << \"\\n\";\n\tout.precision(ANNcoordPrec);\t\t// use full precision in dumping\n\tif (with_pts) {\t\t\t\t\t\t// print point coordinates\n\t\tout << \"points \" << dim << \" \" << n_pts << \"\\n\";\n\t\tfor (int i = 0; i < n_pts; i++) {\n\t\t\tout << i << \" \";\n\t\t\tannPrintPt(pts[i], dim, out);\n\t\t\tout << \"\\n\";\n\t\t}\n\t}\n\tout << \"tree \"\t\t\t\t\t\t// print tree elements\n\t\t<< dim << \" \"\n\t\t<< n_pts << \" \"\n\t\t<< bkt_size << \"\\n\";\n\n\tannPrintPt(bnd_box_lo, dim, out);\t// print lower bound\n\tout << \"\\n\";\n\tannPrintPt(bnd_box_hi, dim, out);\t// print upper bound\n\tout << \"\\n\";\n\n\tif (root == NULL)\t\t\t\t\t// empty tree?\n\t\tout << \"null\\n\";\n\telse {\n\t\troot->dump(out);\t\t\t\t// invoke printing at root\n\t}\n\tout.precision(0);\t\t\t\t\t// restore default precision\n}\n\nvoid ANNkd_split::dump(\t\t\t\t\t// dump a splitting node\n\t\tostream &out)\t\t\t\t\t// output stream\n{\n\tout << \"split \" << cut_dim << \" \" << cut_val << \" \";\n\tout << cd_bnds[ANN_LO] << \" \" << cd_bnds[ANN_HI] << \"\\n\";\n\n\tchild[ANN_LO]->dump(out);\t\t\t// print low child\n\tchild[ANN_HI]->dump(out);\t\t\t// print high child\n}\n\nvoid ANNkd_leaf::dump(\t\t\t\t\t// dump a leaf node\n\t\tostream &out)\t\t\t\t\t// output stream\n{\n\tif (this == KD_TRIVIAL) {\t\t\t// canonical trivial leaf node\n\t\tout << \"leaf 0\\n\";\t\t\t\t// leaf no points\n\t}\n\telse{\n\t\tout << \"leaf \" << n_pts;\n\t\tfor (int j = 0; j < n_pts; j++) {\n\t\t\tout << \" \" << bkt[j];\n\t\t}\n\t\tout << \"\\n\";\n\t}\n}\n\nvoid ANNbd_shrink::dump(\t\t\t\t// dump a shrinking node\n\t\tostream &out)\t\t\t\t\t// output stream\n{\n\tout << \"shrink \" << n_bnds << \"\\n\";\n\tfor (int 
j = 0; j < n_bnds; j++) {\n\t\tout << bnds[j].cd << \" \" << bnds[j].cv << \" \" << bnds[j].sd << \"\\n\";\n\t}\n\tchild[ANN_IN]->dump(out);\t\t\t// print in-child\n\tchild[ANN_OUT]->dump(out);\t\t\t// print out-child\n}\n\n//----------------------------------------------------------------------\n// Load kd-tree from dump file\n//\t\tThis rebuilds a kd-tree which was dumped to a file.\t The dump\n//\t\tfile contains all the basic tree information according to a\n//\t\tpreorder traversal.\t We assume that the dump file also contains\n//\t\tpoint data.\t (This is to guarantee the consistency of the tree.)\n//\t\tIf not, then an error is generated.\n//\n//\t\tIndirectly, this procedure allocates space for points, point\n//\t\tindices, all nodes in the tree, and the bounding box for the\n//\t\ttree.  When the tree is destroyed, all but the points are\n//\t\tdeallocated.\n//\n//\t\tThis routine calls annReadDump to do all the work.\n//----------------------------------------------------------------------\n\nANNkd_tree::ANNkd_tree(\t\t\t\t\t// build from dump file\n\tistream\t\t\t\t&in)\t\t\t\t\t// input stream for dump file\n{\n\tint the_dim;\t\t\t\t\t\t\t\t// local dimension\n\tint the_n_pts;\t\t\t\t\t\t\t\t// local number of points\n\tint the_bkt_size;\t\t\t\t\t\t\t// local number of points\n\tANNpoint the_bnd_box_lo;\t\t\t\t\t// low bounding point\n\tANNpoint the_bnd_box_hi;\t\t\t\t\t// high bounding point\n\tANNpointArray the_pts;\t\t\t\t\t\t// point storage\n\tANNidxArray the_pidx;\t\t\t\t\t\t// point index storage\n\tANNkd_ptr the_root;\t\t\t\t\t\t\t// root of the tree\n\n\tthe_root = annReadDump(\t\t\t\t\t\t// read the dump file\n\t\tin,\t\t\t\t\t\t\t\t\t\t// input stream\n\t\tKD_TREE,\t\t\t\t\t\t\t\t// expecting a kd-tree\n\t\tthe_pts,\t\t\t\t\t\t\t\t// point array (returned)\n\t\tthe_pidx,\t\t\t\t\t\t\t\t// point indices (returned)\n\t\tthe_dim, the_n_pts, the_bkt_size,\t\t// basic tree info (returned)\n\t\tthe_bnd_box_lo, the_bnd_box_hi);\t\t// bounding box 
info (returned)\n\n\t\t\t\t\t\t\t\t\t\t\t\t// create a skeletal tree\n\tSkeletonTree(the_n_pts, the_dim, the_bkt_size, the_pts, the_pidx);\n\n\tbnd_box_lo = the_bnd_box_lo;\n\tbnd_box_hi = the_bnd_box_hi;\n\n\troot = the_root;\t\t\t\t\t\t\t// set the root\n}\n\nANNbd_tree::ANNbd_tree(\t\t\t\t\t// build bd-tree from dump file\n\tistream\t\t\t\t&in) : ANNkd_tree()\t\t// input stream for dump file\n{\n\tint the_dim;\t\t\t\t\t\t\t\t// local dimension\n\tint the_n_pts;\t\t\t\t\t\t\t\t// local number of points\n\tint the_bkt_size;\t\t\t\t\t\t\t// local bucket size\n\tANNpoint the_bnd_box_lo;\t\t\t\t\t// low bounding point\n\tANNpoint the_bnd_box_hi;\t\t\t\t\t// high bounding point\n\tANNpointArray the_pts;\t\t\t\t\t\t// point storage\n\tANNidxArray the_pidx;\t\t\t\t\t\t// point index storage\n\tANNkd_ptr the_root;\t\t\t\t\t\t\t// root of the tree\n\n\tthe_root = annReadDump(\t\t\t\t\t\t// read the dump file\n\t\tin,\t\t\t\t\t\t\t\t\t\t// input stream\n\t\tBD_TREE,\t\t\t\t\t\t\t\t// expecting a bd-tree\n\t\tthe_pts,\t\t\t\t\t\t\t\t// point array (returned)\n\t\tthe_pidx,\t\t\t\t\t\t\t\t// point indices (returned)\n\t\tthe_dim, the_n_pts, the_bkt_size,\t\t// basic tree info (returned)\n\t\tthe_bnd_box_lo, the_bnd_box_hi);\t\t// bounding box info (returned)\n\n\t\t\t\t\t\t\t\t\t\t\t\t// create a skeletal tree\n\tSkeletonTree(the_n_pts, the_dim, the_bkt_size, the_pts, the_pidx);\n\tbnd_box_lo = the_bnd_box_lo;\n\tbnd_box_hi = the_bnd_box_hi;\n\n\troot = the_root;\t\t\t\t\t\t\t// set the root\n}\n\n//----------------------------------------------------------------------\n//\tannReadDump - read a dump file\n//\n//\t\tThis procedure reads a dump file, constructs a kd-tree\n//\t\tand returns all the essential information needed to actually\n//\t\tconstruct the tree.\t Because this procedure is used for\n//\t\tconstructing both kd-trees and bd-trees, the second argument\n//\t\tis used to indicate which type of tree we are 
expecting.\n//----------------------------------------------------------------------\n\nstatic ANNkd_ptr annReadDump(\n\tistream\t\t\t\t&in,\t\t\t\t\t// input stream\n\tANNtreeType\t\t\ttree_type,\t\t\t\t// type of tree expected\n\tANNpointArray\t\t&the_pts,\t\t\t\t// new points (returned)\n\tANNidxArray\t\t\t&the_pidx,\t\t\t\t// point indices (returned)\n\tint\t\t\t\t\t&the_dim,\t\t\t\t// dimension (returned)\n\tint\t\t\t\t\t&the_n_pts,\t\t\t\t// number of points (returned)\n\tint\t\t\t\t\t&the_bkt_size,\t\t\t// bucket size (returned)\n\tANNpoint\t\t\t&the_bnd_box_lo,\t\t// low bounding point (ret'd)\n\tANNpoint\t\t\t&the_bnd_box_hi)\t\t// high bounding point (ret'd)\n{\n\tint j;\n\tchar str[STRING_LEN];\t\t\t\t\t\t// storage for string\n\tchar version[STRING_LEN];\t\t\t\t\t// ANN version number\n\tANNkd_ptr the_root = NULL;\n\n\t//------------------------------------------------------------------\n\t//\tInput file header\n\t//------------------------------------------------------------------\n\tin >> str;\t\t\t\t\t\t\t\t\t// input header\n\tif (strcmp(str, \"#ANN\") != 0) {\t\t\t\t// incorrect header\n\t\tannError(\"Incorrect header for dump file\", ANNabort);\n\t}\n\tin.getline(version, STRING_LEN);\t\t\t// get version (ignore)\n\n\t//------------------------------------------------------------------\n\t//\tInput the points\n\t//\t\t\tAn array the_pts is allocated and points are read from\n\t//\t\t\tthe dump file.\n\t//------------------------------------------------------------------\n\tin >> str;\t\t\t\t\t\t\t\t\t// get major heading\n\tif (strcmp(str, \"points\") == 0) {\t\t\t// points section\n\t\tin >> the_dim;\t\t\t\t\t\t\t// input dimension\n\t\tin >> the_n_pts;\t\t\t\t\t\t// number of points\n\t\t\t\t\t\t\t\t\t\t\t\t// allocate point storage\n\t\tthe_pts = annAllocPts(the_n_pts, the_dim);\n\t\tfor (int i = 0; i < the_n_pts; i++) {\t// input point coordinates\n\t\t\tANNidx idx;\t\t\t\t\t\t\t// point index\n\t\t\tin >> idx;\t\t\t\t\t\t\t// input point 
index\n\t\t\tif (idx < 0 || idx >= the_n_pts) {\n\t\t\t\tannError(\"Point index is out of range\", ANNabort);\n\t\t\t}\n\t\t\tfor (j = 0; j < the_dim; j++) {\n\t\t\t\tin >> the_pts[idx][j];\t\t\t// read point coordinates\n\t\t\t}\n\t\t}\n\t\tin >> str;\t\t\t\t\t\t\t\t// get next major heading\n\t}\n\telse {\t\t\t\t\t\t\t\t\t\t// no points were input\n\t\tannError(\"Points must be supplied in the dump file\", ANNabort);\n\t}\n\n\t//------------------------------------------------------------------\n\t//\tInput the tree\n\t//\t\t\tAfter the basic header information, we invoke annReadTree\n\t//\t\t\tto do all the heavy work.  We create our own array of\n\t//\t\t\tpoint indices (so we can pass them to annReadTree())\n\t//\t\t\tbut we do not deallocate them.\tThey will be deallocated\n\t//\t\t\twhen the tree is destroyed.\n\t//------------------------------------------------------------------\n\tif (strcmp(str, \"tree\") == 0) {\t\t\t\t// tree section\n\t\tin >> the_dim;\t\t\t\t\t\t\t// read dimension\n\t\tin >> the_n_pts;\t\t\t\t\t\t// number of points\n\t\tin >> the_bkt_size;\t\t\t\t\t\t// bucket size\n\t\tthe_bnd_box_lo = annAllocPt(the_dim);\t// allocate bounding box pts\n\t\tthe_bnd_box_hi = annAllocPt(the_dim);\n\n\t\tfor (j = 0; j < the_dim; j++) {\t\t\t// read bounding box low\n\t\t\tin >> the_bnd_box_lo[j];\n\t\t}\n\t\tfor (j = 0; j < the_dim; j++) {\t\t\t// read bounding box high\n\t\t\tin >> the_bnd_box_hi[j];\n\t\t}\n\t\tthe_pidx = new ANNidx[the_n_pts];\t\t// allocate point index array\n\t\tint next_idx = 0;\t\t\t\t\t\t// number of indices filled\n\t\t\t\t\t\t\t\t\t\t\t\t// read the tree and indices\n\t\tthe_root = annReadTree(in, tree_type, the_pidx, next_idx);\n\t\tif (next_idx != the_n_pts) {\t\t\t// didn't see all the points?\n\t\t\tannError(\"Didn't see as many points as expected\", ANNwarn);\n\t\t}\n\t}\n\telse {\n\t\tannError(\"Illegal dump format.\tExpecting section heading\", ANNabort);\n\t}\n\treturn 
the_root;\n}\n\n//----------------------------------------------------------------------\n// annReadTree - input tree and return pointer\n//\n//\t\tannReadTree reads in a node of the tree, makes any recursive\n//\t\tcalls as needed to input the children of this node (if internal).\n//\t\tIt returns a pointer to the node that was created.\tAn array\n//\t\tof point indices is given along with a pointer to the next\n//\t\tavailable location in the array.  As leaves are read, their\n//\t\tpoint indices are stored here, and the point buckets point\n//\t\tto the first entry in the array.\n//\n//\t\tRecall that these are the formats.\tThe tree is given in\n//\t\tpreorder.\n//\n//\t\tLeaf node:\n//\t\t\t\tleaf <n_pts> <bkt[0]> <bkt[1]> ... <bkt[n-1]>\n//\t\tSplitting nodes:\n//\t\t\t\tsplit <cut_dim> <cut_val> <lo_bound> <hi_bound>\n//\n//\t\tFor bd-trees:\n//\n//\t\tShrinking nodes:\n//\t\t\t\tshrink <n_bnds>\n//\t\t\t\t\t\t<cut_dim> <cut_val> <side>\n//\t\t\t\t\t\t<cut_dim> <cut_val> <side>\n//\t\t\t\t\t\t... 
(repeated n_bnds times)\n//----------------------------------------------------------------------\n\nstatic ANNkd_ptr annReadTree(\n\tistream\t\t\t\t&in,\t\t\t\t\t// input stream\n\tANNtreeType\t\t\ttree_type,\t\t\t\t// type of tree expected\n\tANNidxArray\t\t\tthe_pidx,\t\t\t\t// point indices (modified)\n\tint\t\t\t\t\t&next_idx)\t\t\t\t// next index (modified)\n{\n\tchar tag[STRING_LEN];\t\t\t\t\t\t// tag (leaf, split, shrink)\n\tint n_pts;\t\t\t\t\t\t\t\t\t// number of points in leaf\n\tint cd;\t\t\t\t\t\t\t\t\t\t// cut dimension\n\tANNcoord cv;\t\t\t\t\t\t\t\t// cut value\n\tANNcoord lb;\t\t\t\t\t\t\t\t// low bound\n\tANNcoord hb;\t\t\t\t\t\t\t\t// high bound\n\tint n_bnds;\t\t\t\t\t\t\t\t\t// number of bounding sides\n\tint sd;\t\t\t\t\t\t\t\t\t\t// which side\n\n\tin >> tag;\t\t\t\t\t\t\t\t\t// input node tag\n\n\tif (strcmp(tag, \"null\") == 0) {\t\t\t\t// null tree\n\t\treturn NULL;\n\t}\n\t//------------------------------------------------------------------\n\t//\tRead a leaf\n\t//------------------------------------------------------------------\n\tif (strcmp(tag, \"leaf\") == 0) {\t\t\t\t// leaf node\n\n\t\tin >> n_pts;\t\t\t\t\t\t\t// input number of points\n\t\tint old_idx = next_idx;\t\t\t\t\t// save next_idx\n\t\tif (n_pts == 0) {\t\t\t\t\t\t// trivial leaf\n\t\t\treturn KD_TRIVIAL;\n\t\t}\n\t\telse {\n\t\t\tfor (int i = 0; i < n_pts; i++) {\t// input point indices\n\t\t\t\tin >> the_pidx[next_idx++];\t\t// store in array of indices\n\t\t\t}\n\t\t}\n\t\treturn new ANNkd_leaf(n_pts, &the_pidx[old_idx]);\n\t}\n\t//------------------------------------------------------------------\n\t//\tRead a splitting node\n\t//------------------------------------------------------------------\n\telse if (strcmp(tag, \"split\") == 0) {\t\t// splitting node\n\n\t\tin >> cd >> cv >> lb >> hb;\n\n\t\t\t\t\t\t\t\t\t\t\t\t// read low and high subtrees\n\t\tANNkd_ptr lc = annReadTree(in, tree_type, the_pidx, next_idx);\n\t\tANNkd_ptr hc = annReadTree(in, tree_type, 
the_pidx, next_idx);\n\t\t\t\t\t\t\t\t\t\t\t\t// create new node and return\n\t\treturn new ANNkd_split(cd, cv, lb, hb, lc, hc);\n\t}\n\t//------------------------------------------------------------------\n\t//\tRead a shrinking node (bd-tree only)\n\t//------------------------------------------------------------------\n\telse if (strcmp(tag, \"shrink\") == 0) {\t\t// shrinking node\n\t\tif (tree_type != BD_TREE) {\n\t\t\tannError(\"Shrinking node not allowed in kd-tree\", ANNabort);\n\t\t}\n\n\t\tin >> n_bnds;\t\t\t\t\t\t\t// number of bounding sides\n\t\t\t\t\t\t\t\t\t\t\t\t// allocate bounds array\n\t\tANNorthHSArray bds = new ANNorthHalfSpace[n_bnds];\n\t\tfor (int i = 0; i < n_bnds; i++) {\n\t\t\tin >> cd >> cv >> sd;\t\t\t\t// input bounding halfspace\n\t\t\t\t\t\t\t\t\t\t\t\t// copy to array\n\t\t\tbds[i] = ANNorthHalfSpace(cd, cv, sd);\n\t\t}\n\t\t\t\t\t\t\t\t\t\t\t\t// read inner and outer subtrees\n\t\tANNkd_ptr ic = annReadTree(in, tree_type, the_pidx, next_idx);\n\t\tANNkd_ptr oc = annReadTree(in, tree_type, the_pidx, next_idx);\n\t\t\t\t\t\t\t\t\t\t\t\t// create new node and return\n\t\treturn new ANNbd_shrink(n_bnds, bds, ic, oc);\n\t}\n\telse {\n\t\tannError(\"Illegal node type in dump file\", ANNabort);\n\t\t//std::exit(0);\t\t // R objects to calling exit(); this approach keeps the compiler happy\n\t\treturn NULL;       // to keep the compiler happy\n\t}\n}\n\n\n\n\n"
  },
  {
    "path": "src/ANN/kd_fix_rad_search.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_fix_rad_search.cpp\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tStandard kd-tree fixed-radius kNN search\n// Last modified:\t05/03/05 (Version 1.1)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 1.1  05/03/05\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n// MFH: the code was changed to return all fixed radius neighbors using\n// a std::vector<int> called closest.\n\n#include \"kd_fix_rad_search.h\"\t\t\t// kd fixed-radius search decls\n#include <vector>\n\n//----------------------------------------------------------------------\n//\tApproximate fixed-radius k nearest neighbor search\n//\t\tThe squared radius is provided, and this procedure finds the\n//\t\tk nearest neighbors within the radius, and returns the total\n//\t\tnumber of points lying within the radius.\n//\n//\t\tThe method used for searching the kd-tree is a variation of the\n//\t\tnearest neighbor search used in kd_search.cpp, except that the\n//\t\tradius of the search ball is known.  
We refer the reader to that\n//\t\tfile for the explanation of the recursive search procedure.\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\t\tTo keep argument lists short, a number of global variables\n//\t\tare maintained which are common to all the recursive calls.\n//\t\tThese are given below.\n//----------------------------------------------------------------------\n\nint\t\t\t\tANNkdFRDim;\t\t\t\t// dimension of space\nANNpoint\t\tANNkdFRQ;\t\t\t\t// query point\nANNdist\t\t\tANNkdFRSqRad;\t\t\t// squared radius search bound\ndouble\t\t\tANNkdFRMaxErr;\t\t\t// max tolerable squared error\nANNpointArray\tANNkdFRPts;\t\t\t\t// the points\nANNmin_k*\t\tANNkdFRPointMK;\t\t\t// set of k closest points\n\nstd::vector<int> closest;\t\t\t  // MFH: indices of all points within the radius\nstd::vector<double> dists;\t\t\t  // MFH: distances to these points\n\nint\t\t\t\tANNkdFRPtsVisited;\t\t// total points visited\nint\t\t\t\tANNkdFRPtsInRange;\t\t// number of points in the range\n\n//----------------------------------------------------------------------\n//\tannkFRSearch - fixed radius search for k nearest neighbors\n//----------------------------------------------------------------------\n\n// Defunct: we use ANNkd_tree::annkFRSearch2, which stores all neighbors in the new structures\n// closest and dists.\nint ANNkd_tree::annkFRSearch(\n\tANNpoint\t\t\tq,\t\t\t\t// the query point\n\tANNdist\t\t\t\tsqRad,\t\t\t// squared radius search bound\n\tint\t\t\t\t\tk,\t\t\t\t// number of near neighbors to return\n\tANNidxArray\t\t\tnn_idx,\t\t\t// nearest neighbor indices (returned)\n\tANNdistArray\t\tdd,\t\t\t\t// the approximate nearest neighbor\n\tdouble\t\t\t\teps)\t\t\t// the error bound\n{\n\tANNkdFRDim = dim;\t\t\t\t\t// copy arguments to static equivs\n\tANNkdFRQ = q;\n\tANNkdFRSqRad = sqRad;\n\tANNkdFRPts = pts;\n\tANNkdFRPtsVisited = 0;\t\t\t\t// initialize count of points 
visited\n\tANNkdFRPtsInRange = 0;\t\t\t\t// ...and points in the range\n\n\tANNkdFRMaxErr = ANN_POW(1.0 + eps);\n\tANN_FLOP(2)\t\t\t\t\t\t\t// increment floating op count\n\n\tANNkdFRPointMK = new ANNmin_k(k);\t// create set for closest k points\n\t\t\t\t\t\t\t\t\t\t// search starting at the root\n\troot->ann_FR_search(annBoxDistance(q, bnd_box_lo, bnd_box_hi, dim));\n\n\tfor (int i = 0; i < k; i++) {\t\t// extract the k-th closest points\n\t\tif (dd != NULL)\n\t\t\tdd[i] = ANNkdFRPointMK->ith_smallest_key(i);\n\t\tif (nn_idx != NULL)\n\t\t\tnn_idx[i] = ANNkdFRPointMK->ith_smallest_info(i);\n\t}\n\n\tdelete ANNkdFRPointMK;\t\t\t\t// deallocate closest point set\n\treturn ANNkdFRPtsInRange;\t\t\t// return final point count\n\n}\n\n// MFH: this function returns all points within the radius\nstd::pair< std::vector<int>, std::vector<double> > ANNkd_tree::annkFRSearch2(\n\tANNpoint\t\t\tq,\t\t\t\t// the query point\n\tANNdist\t\t\t\tsqRad,\t\t\t// squared radius search bound\n\tdouble\t\t\t\teps)\t\t\t// the error bound\n{\n\tANNkdFRDim = dim;\t\t\t\t\t// copy arguments to static equivs\n\tANNkdFRQ = q;\n\tANNkdFRSqRad = sqRad;\n\tANNkdFRPts = pts;\n\tANNkdFRPtsVisited = 0;\t\t\t\t// initialize count of points visited\n\tANNkdFRPtsInRange = 0;\t\t\t\t// ...and points in the range\n\n\tANNkdFRMaxErr = ANN_POW(1.0 + eps);\n\tANN_FLOP(2)\t\t\t\t\t\t\t// increment floating op count\n\n\t//ANNkdFRPointMK = new ANNmin_k(k);\t// create set for closest k points\n\n\tclosest.clear();\n\tdists.clear();\n\n\t// search starting at the root\n\troot->ann_FR_search(annBoxDistance(q, bnd_box_lo, bnd_box_hi, dim));\n\n\treturn std::make_pair(closest, dists);\t\t\t// return all neighbor indices and distances\n\n}\n\n\n//----------------------------------------------------------------------\n//\tkd_split::ann_FR_search - search a splitting node\n//\t\tNote: This routine is similar in structure to the standard kNN\n//\t\tsearch.  It visits the subtree that is closer to the query point\n//\t\tfirst.  
For fixed-radius search, there is no benefit in visiting\n//\t\tone subtree before the other, but we maintain the same basic\n//\t\tcode structure for the sake of uniformity.\n//----------------------------------------------------------------------\n\nvoid ANNkd_split::ann_FR_search(ANNdist box_dist)\n{\n\t\t\t\t\t\t\t\t\t\t// check dist calc term condition\n\tif (ANNmaxPtsVisited != 0 && ANNkdFRPtsVisited > ANNmaxPtsVisited) return;\n\n\t\t\t\t\t\t\t\t\t\t// distance to cutting plane\n\tANNcoord cut_diff = ANNkdFRQ[cut_dim] - cut_val;\n\n\tif (cut_diff < 0) {\t\t\t\t\t// left of cutting plane\n\t\tchild[ANN_LO]->ann_FR_search(box_dist);// visit closer child first\n\n\t\tANNcoord box_diff = cd_bnds[ANN_LO] - ANNkdFRQ[cut_dim];\n\t\tif (box_diff < 0)\t\t\t\t// within bounds - ignore\n\t\t\tbox_diff = 0;\n\t\t\t\t\t\t\t\t\t\t// distance to further box\n\t\tbox_dist = (ANNdist) ANN_SUM(box_dist,\n\t\t\t\tANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff)));\n\n\t\t\t\t\t\t\t\t\t\t// visit further child if in range\n\t\tif (box_dist * ANNkdFRMaxErr <= ANNkdFRSqRad)\n\t\t\tchild[ANN_HI]->ann_FR_search(box_dist);\n\n\t}\n\telse {\t\t\t\t\t\t\t\t// right of cutting plane\n\t\tchild[ANN_HI]->ann_FR_search(box_dist);// visit closer child first\n\n\t\tANNcoord box_diff = ANNkdFRQ[cut_dim] - cd_bnds[ANN_HI];\n\t\tif (box_diff < 0)\t\t\t\t// within bounds - ignore\n\t\t\tbox_diff = 0;\n\t\t\t\t\t\t\t\t\t\t// distance to further box\n\t\tbox_dist = (ANNdist) ANN_SUM(box_dist,\n\t\t\t\tANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff)));\n\n\t\t\t\t\t\t\t\t\t\t// visit further child if close enough\n\t\tif (box_dist * ANNkdFRMaxErr <= ANNkdFRSqRad)\n\t\t\tchild[ANN_LO]->ann_FR_search(box_dist);\n\n\t}\n\tANN_FLOP(13)\t\t\t\t\t\t// increment floating ops\n\tANN_SPL(1)\t\t\t\t\t\t\t// one more splitting node visited\n}\n\n//----------------------------------------------------------------------\n//\tkd_leaf::ann_FR_search - search points in a leaf node\n//\t\tNote: The unreadability of this 
code is the result of\n//\t\tsome fine tuning to replace indexing by pointer operations.\n//----------------------------------------------------------------------\n\nvoid ANNkd_leaf::ann_FR_search(ANNdist box_dist)\n{\n\tANNdist dist;\t\t\t\t// distance to data point\n\tANNcoord* pp;\t\t\t\t// data coordinate pointer\n\tANNcoord* qq;\t\t\t\t// query coordinate pointer\n\tANNcoord t;\n\tint d;\n\n\tfor (int i = 0; i < n_pts; i++) {\t// check points in bucket\n\n\t\tpp = ANNkdFRPts[bkt[i]];\t\t// first coord of next data point\n\t\tqq = ANNkdFRQ;\t\t\t\t\t// first coord of query point\n\t\tdist = 0;\n\n\t\tfor(d = 0; d < ANNkdFRDim; d++) {\n\t\t\tANN_COORD(1)\t\t\t\t// one more coordinate hit\n\t\t\tANN_FLOP(5)\t\t\t\t\t// increment floating ops\n\n\t\t\tt = *(qq++) - *(pp++);\t\t// compute length and adv coordinate\n\t\t\t\t\t\t\t\t\t\t// exceeds dist to k-th smallest?\n\n\n\n\t\t\tif( (dist = ANN_SUM(dist, ANN_POW(t))) > ANNkdFRSqRad) {\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\n\t\tif (d >= ANNkdFRDim &&\t\t\t\t\t// among the k best?\n\t\t   (ANN_ALLOW_SELF_MATCH || dist!=0.0)) { // and no self-match problem\n\t\t\t\t\t\t\t\t\t\t\t\t// add it to the list\n\t\t\t//ANNkdFRPointMK->insert(dist, bkt[i]);\n\n\t\t  // MFH\n\t\t\tclosest.push_back(bkt[i]);\n\t\t\tdists.push_back(dist);\n\n\t\t\tANNkdFRPtsInRange++;\t\t\t\t// increment point count\n\t\t}\n\t}\n\tANN_LEAF(1)\t\t\t\t\t\t\t// one more leaf node visited\n\tANN_PTS(n_pts)\t\t\t\t\t\t// increment points visited\n\tANNkdFRPtsVisited += n_pts;\t\t\t// increment number of points visited\n}\n"
  },
  {
    "path": "src/ANN/kd_fix_rad_search.h",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_fix_rad_search.h\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tStandard kd-tree fixed-radius kNN search\n// Last modified:\t??/??/?? (Version 1.1)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 1.1  ??/??/??\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#ifndef ANN_kd_fix_rad_search_H\n#define ANN_kd_fix_rad_search_H\n\n#include \"kd_tree.h\"\t\t\t\t\t// kd-tree declarations\n#include \"kd_util.h\"\t\t\t\t\t// kd-tree utilities\n#include \"pr_queue_k.h\"\t\t\t\t\t// k-element priority queue\n\n#include \"ANNperf.h\"\t\t\t\t// performance evaluation\n\n//----------------------------------------------------------------------\n//\tGlobal variables\n//\t\tThese are active for the life of each call to\n//\t\tannRangeSearch().  They are set to save the number of\n//\t\tvariables that need to be passed among the various search\n//\t\tprocedures.\n//----------------------------------------------------------------------\n\nextern ANNpoint\t\t\tANNkdFRQ;\t\t\t// query point (static copy)\n\n#endif\n"
  },
  {
    "path": "src/ANN/kd_pr_search.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_pr_search.cpp\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tPriority search for kd-trees\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n// \n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n// \n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#include \"kd_pr_search.h\"\t\t\t\t// kd priority search declarations\n\n//----------------------------------------------------------------------\n//\tApproximate nearest neighbor searching by priority search.\n//\t\tThe kd-tree is searched for an approximate nearest neighbor.\n//\t\tThe point is returned through one of the arguments, and the\n//\t\tdistance returned is the SQUARED distance to this point.\n//\n//\t\tThe method used for searching the kd-tree is called priority\n//\t\tsearch.  (It is described in Arya and Mount, ``Algorithms for\n//\t\tfast vector quantization,'' Proc. of DCC '93: Data Compression\n//\t\tConference}, eds. J. A. Storer and M. 
Cohn, IEEE Press, 1993,\n//\t\t381--390.)\n//\n//\t\tThe cell of the kd-tree containing the query point is located,\n//\t\tand cells are visited in increasing order of distance from the\n//\t\tquery point.  This is done by placing each subtree which has\n//\t\tNOT been visited in a priority queue, according to the closest\n//\t\tdistance of the corresponding enclosing rectangle from the\n//\t\tquery point.  The search stops when the distance to the nearest\n//\t\tremaining rectangle exceeds the distance to the nearest point\n//\t\tseen by a factor of more than 1/(1+eps). (Implying that any\n//\t\tpoint found subsequently in the search cannot be closer by more\n//\t\tthan this factor.)\n//\n//\t\tThe main entry point is annkPriSearch() which sets things up and\n//\t\tthen call the recursive routine ann_pri_search().  This is a\n//\t\trecursive routine which performs the processing for one node in\n//\t\tthe kd-tree.  There are two versions of this virtual procedure,\n//\t\tone for splitting nodes and one for leaves. When a splitting node\n//\t\tis visited, we determine which child to continue the search on\n//\t\t(the closer one), and insert the other child into the priority\n//\t\tqueue.  When a leaf is visited, we compute the distances to the\n//\t\tpoints in the buckets, and update information on the closest\n//\t\tpoints.\n//\n//\t\tSome trickery is used to incrementally update the distance from\n//\t\ta kd-tree rectangle to the query point.  
This comes about from\n//\t\tthe fact that with each successive split, only one component\n//\t\t(along the dimension that is split) of the squared distance to\n//\t\tthe child rectangle is different from the squared distance to\n//\t\tthe parent rectangle.\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\t\tTo keep argument lists short, a number of global variables\n//\t\tare maintained which are common to all the recursive calls.\n//\t\tThese are given below.\n//----------------------------------------------------------------------\n\ndouble\t\t\tANNprEps;\t\t\t\t// the error bound\nint\t\t\t\tANNprDim;\t\t\t\t// dimension of space\nANNpoint\t\tANNprQ;\t\t\t\t\t// query point\ndouble\t\t\tANNprMaxErr;\t\t\t// max tolerable squared error\nANNpointArray\tANNprPts;\t\t\t\t// the points\nANNpr_queue\t\t*ANNprBoxPQ;\t\t\t// priority queue for boxes\nANNmin_k\t\t*ANNprPointMK;\t\t\t// set of k closest points\n\n//----------------------------------------------------------------------\n//\tannkPriSearch - priority search for k nearest neighbors\n//----------------------------------------------------------------------\n\nvoid ANNkd_tree::annkPriSearch(\n\tANNpoint\t\t\tq,\t\t\t\t// query point\n\tint\t\t\t\t\tk,\t\t\t\t// number of near neighbors to return\n\tANNidxArray\t\t\tnn_idx,\t\t\t// nearest neighbor indices (returned)\n\tANNdistArray\t\tdd,\t\t\t\t// dist to near neighbors (returned)\n\tdouble\t\t\t\teps)\t\t\t// error bound (ignored)\n{\n\t\t\t\t\t\t\t\t\t\t// max tolerable squared error\n\tANNprMaxErr = ANN_POW(1.0 + eps);\n\tANN_FLOP(2)\t\t\t\t\t\t\t// increment floating ops\n\n\tANNprDim = dim;\t\t\t\t\t\t// copy arguments to static equivs\n\tANNprQ = q;\n\tANNprPts = pts;\n\tANNptsVisited = 0;\t\t\t\t\t// initialize count of points visited\n\n\tANNprPointMK = new ANNmin_k(k);\t\t// create set for closest k points\n\n\t\t\t\t\t\t\t\t\t\t// distance to 
root box\n\tANNdist box_dist = annBoxDistance(q,\n\t\t\t\tbnd_box_lo, bnd_box_hi, dim);\n\n\tANNprBoxPQ = new ANNpr_queue(n_pts);// create priority queue for boxes\n\tANNprBoxPQ->insert(box_dist, root); // insert root in priority queue\n\n\twhile (ANNprBoxPQ->non_empty() &&\n\t\t(!(ANNmaxPtsVisited != 0 && ANNptsVisited > ANNmaxPtsVisited))) {\n\t\tANNkd_ptr np;\t\t\t\t\t// next box from prior queue\n\n\t\t\t\t\t\t\t\t\t\t// extract closest box from queue\n\t\tANNprBoxPQ->extr_min(box_dist, (void *&) np);\n\n\t\tANN_FLOP(2)\t\t\t\t\t\t// increment floating ops\n\t\tif (box_dist*ANNprMaxErr >= ANNprPointMK->max_key())\n\t\t\tbreak;\n\n\t\tnp->ann_pri_search(box_dist);\t// search this subtree.\n\t}\n\n\tfor (int i = 0; i < k; i++) {\t\t// extract the k-th closest points\n\t\tdd[i] = ANNprPointMK->ith_smallest_key(i);\n\t\tnn_idx[i] = ANNprPointMK->ith_smallest_info(i);\n\t}\n\n\tdelete ANNprPointMK;\t\t\t\t// deallocate closest point set\n\tdelete ANNprBoxPQ;\t\t\t\t\t// deallocate priority queue\n}\n\n//----------------------------------------------------------------------\n//\tkd_split::ann_pri_search - search a splitting node\n//----------------------------------------------------------------------\n\nvoid ANNkd_split::ann_pri_search(ANNdist box_dist)\n{\n\tANNdist new_dist;\t\t\t\t\t// distance to child visited later\n\t\t\t\t\t\t\t\t\t\t// distance to cutting plane\n\tANNcoord cut_diff = ANNprQ[cut_dim] - cut_val;\n\n\tif (cut_diff < 0) {\t\t\t\t\t// left of cutting plane\n\t\tANNcoord box_diff = cd_bnds[ANN_LO] - ANNprQ[cut_dim];\n\t\tif (box_diff < 0)\t\t\t\t// within bounds - ignore\n\t\t\tbox_diff = 0;\n\t\t\t\t\t\t\t\t\t\t// distance to further box\n\t\tnew_dist = (ANNdist) ANN_SUM(box_dist,\n\t\t\t\tANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff)));\n\n\t\tif (child[ANN_HI] != KD_TRIVIAL)// enqueue if not trivial\n\t\t\tANNprBoxPQ->insert(new_dist, child[ANN_HI]);\n\t\t\t\t\t\t\t\t\t\t// continue with closer 
child\n\t\tchild[ANN_LO]->ann_pri_search(box_dist);\n\t}\n\telse {\t\t\t\t\t\t\t\t// right of cutting plane\n\t\tANNcoord box_diff = ANNprQ[cut_dim] - cd_bnds[ANN_HI];\n\t\tif (box_diff < 0)\t\t\t\t// within bounds - ignore\n\t\t\tbox_diff = 0;\n\t\t\t\t\t\t\t\t\t\t// distance to further box\n\t\tnew_dist = (ANNdist) ANN_SUM(box_dist,\n\t\t\t\tANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff)));\n\n\t\tif (child[ANN_LO] != KD_TRIVIAL)// enqueue if not trivial\n\t\t\tANNprBoxPQ->insert(new_dist, child[ANN_LO]);\n\t\t\t\t\t\t\t\t\t\t// continue with closer child\n\t\tchild[ANN_HI]->ann_pri_search(box_dist);\n\t}\n\tANN_SPL(1)\t\t\t\t\t\t\t// one more splitting node visited\n\tANN_FLOP(8)\t\t\t\t\t\t\t// increment floating ops\n}\n\n//----------------------------------------------------------------------\n//\tkd_leaf::ann_pri_search - search points in a leaf node\n//\n//\t\tThis is virtually identical to the ann_search for standard search.\n//----------------------------------------------------------------------\n\nvoid ANNkd_leaf::ann_pri_search(ANNdist box_dist)\n{\n\tANNdist dist;\t\t\t\t// distance to data point\n\tANNcoord* pp;\t\t\t\t// data coordinate pointer\n\tANNcoord* qq;\t\t\t\t// query coordinate pointer\n\tANNdist min_dist;\t\t\t// distance to k-th closest point\n\tANNcoord t;\n\tint d;\n\n\tmin_dist = ANNprPointMK->max_key(); // k-th smallest distance so far\n\n\tfor (int i = 0; i < n_pts; i++) {\t// check points in bucket\n\n\t\tpp = ANNprPts[bkt[i]];\t\t\t// first coord of next data point\n\t\tqq = ANNprQ;\t\t\t\t\t// first coord of query point\n\t\tdist = 0;\n\n\t\tfor(d = 0; d < ANNprDim; d++) {\n\t\t\tANN_COORD(1)\t\t\t\t// one more coordinate hit\n\t\t\tANN_FLOP(4)\t\t\t\t\t// increment floating ops\n\n\t\t\tt = *(qq++) - *(pp++);\t\t// compute length and adv coordinate\n\t\t\t\t\t\t\t\t\t\t// exceeds dist to k-th smallest?\n\t\t\tif( (dist = ANN_SUM(dist, ANN_POW(t))) > min_dist) {\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\n\t\tif (d >= ANNprDim 
&&\t\t\t\t\t// among the k best?\n\t\t   (ANN_ALLOW_SELF_MATCH || dist!=0)) { // and no self-match problem\n\t\t\t\t\t\t\t\t\t\t\t\t// add it to the list\n\t\t\tANNprPointMK->insert(dist, bkt[i]);\n\t\t\tmin_dist = ANNprPointMK->max_key();\n\t\t}\n\t}\n\tANN_LEAF(1)\t\t\t\t\t\t\t// one more leaf node visited\n\tANN_PTS(n_pts)\t\t\t\t\t\t// increment points visited\n\tANNptsVisited += n_pts;\t\t\t\t// increment number of points visited\n}\n"
  },
  {
    "path": "src/ANN/kd_pr_search.h",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_pr_search.h\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tPriority kd-tree search\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#ifndef ANN_kd_pr_search_H\n#define ANN_kd_pr_search_H\n\n#include \"kd_tree.h\"\t\t\t\t\t// kd-tree declarations\n#include \"kd_util.h\"\t\t\t\t\t// kd-tree utilities\n#include \"pr_queue.h\"\t\t\t\t\t// priority queue declarations\n#include \"pr_queue_k.h\"\t\t\t\t\t// k-element priority queue\n\n#include \"ANNperf.h\"\t\t\t\t// performance evaluation\n\n//----------------------------------------------------------------------\n//\tGlobal variables\n//\t\tActive for the life of each call to Appx_Near_Neigh() or\n//\t\tAppx_k_Near_Neigh().\n//----------------------------------------------------------------------\n\nextern double\t\t\tANNprEps;\t\t// the error bound\nextern int\t\t\t\tANNprDim;\t\t// dimension of space\nextern ANNpoint\t\t\tANNprQ;\t\t\t// query point\nextern double\t\t\tANNprMaxErr;\t// max tolerable squared error\nextern 
ANNpointArray\tANNprPts;\t\t// the points\nextern ANNpr_queue\t\t*ANNprBoxPQ;\t// priority queue for boxes\nextern ANNmin_k\t\t\t*ANNprPointMK;\t// set of k closest points\n\n#endif\n"
  },
  {
    "path": "src/ANN/kd_search.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_search.cpp\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tStandard kd-tree search\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n// \n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n// \n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.0  04/01/05\n//\t\tChanged names LO, HI to ANN_LO, ANN_HI\n//----------------------------------------------------------------------\n\n#include \"kd_search.h\"\t\t\t\t\t// kd-search declarations\n\n//----------------------------------------------------------------------\n//\tApproximate nearest neighbor searching by kd-tree search\n//\t\tThe kd-tree is searched for an approximate nearest neighbor.\n//\t\tThe point is returned through one of the arguments, and the\n//\t\tdistance returned is the squared distance to this point.\n//\n//\t\tThe method used for searching the kd-tree is an approximate\n//\t\tadaptation of the search algorithm described by Friedman,\n//\t\tBentley, and Finkel, ``An algorithm for finding best matches\n//\t\tin logarithmic expected time,'' ACM Transactions on Mathematical\n//\t\tSoftware, 3(3):209-226, 1977).\n//\n//\t\tThe algorithm operates recursively.  
When first encountering a\n//\t\tnode of the kd-tree, we first visit the child which is closest to\n//\t\tthe query point.  On return, we decide whether we want to visit\n//\t\tthe other child.  If the box containing the other child exceeds\n//\t\t1/(1+eps) times the current best distance, then we skip it (since\n//\t\tany point found in this child cannot be closer to the query point\n//\t\tby more than this factor).  Otherwise, we visit it recursively.\n//\t\tThe distance between a box and the query point is computed exactly\n//\t\t(not approximated, as is often done in kd-tree implementations),\n//\t\tusing incremental distance updates, as described by Arya and Mount\n//\t\tin ``Algorithms for fast vector quantization,'' Proc. of DCC '93:\n//\t\tData Compression Conference, eds. J. A. Storer and M. Cohn, IEEE\n//\t\tPress, 1993, 381-390.\n//\n//\t\tThe main entry point is annkSearch(), which sets things up and\n//\t\tthen calls the recursive routine ann_search().  This recursive\n//\t\troutine performs the processing for one node in the kd-tree.\n//\t\tThere are two versions of this virtual procedure, one for splitting\n//\t\tnodes and one for leaves.  When a splitting node is visited, we\n//\t\tdetermine which child to visit first (the closer one), and visit\n//\t\tthe other child on return.  When a leaf is visited, we compute\n//\t\tthe distances to the points in the buckets, and update information\n//\t\ton the closest points.\n//\n//\t\tSome trickery is used to incrementally update the distance from\n//\t\ta kd-tree rectangle to the query point.  
This comes about from\n//\t\tthe fact that with each successive split, only one component\n//\t\t(along the dimension that is split) of the squared distance to\n//\t\tthe child rectangle is different from the squared distance to\n//\t\tthe parent rectangle.\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\t\tTo keep argument lists short, a number of global variables\n//\t\tare maintained which are common to all the recursive calls.\n//\t\tThese are given below.\n//----------------------------------------------------------------------\n\nint\t\t\t\tANNkdDim;\t\t\t\t// dimension of space\nANNpoint\t\tANNkdQ;\t\t\t\t\t// query point\ndouble\t\t\tANNkdMaxErr;\t\t\t// max tolerable squared error\nANNpointArray\tANNkdPts;\t\t\t\t// the points\nANNmin_k\t\t*ANNkdPointMK;\t\t\t// set of k closest points\n\n//----------------------------------------------------------------------\n//\tannkSearch - search for the k nearest neighbors\n//----------------------------------------------------------------------\n\nvoid ANNkd_tree::annkSearch(\n\tANNpoint\t\t\tq,\t\t\t\t// the query point\n\tint\t\t\t\t\tk,\t\t\t\t// number of near neighbors to return\n\tANNidxArray\t\t\tnn_idx,\t\t\t// nearest neighbor indices (returned)\n\tANNdistArray\t\tdd,\t\t\t\t// distances to near neighbors (returned)\n\tdouble\t\t\t\teps)\t\t\t// the error bound\n{\n\n\tANNkdDim = dim;\t\t\t\t\t\t// copy arguments to static equivs\n\tANNkdQ = q;\n\tANNkdPts = pts;\n\tANNptsVisited = 0;\t\t\t\t\t// initialize count of points visited\n\n\tif (k > n_pts) {\t\t\t\t\t// too many near neighbors?\n\t\tannError(\"Requesting more near neighbors than data points\", ANNabort);\n\t}\n\n\tANNkdMaxErr = ANN_POW(1.0 + eps);\n\tANN_FLOP(2)\t\t\t\t\t\t\t// increment floating op count\n\n\tANNkdPointMK = new ANNmin_k(k);\t\t// create set for closest k points\n\t\t\t\t\t\t\t\t\t\t// search starting at the 
root\n\troot->ann_search(annBoxDistance(q, bnd_box_lo, bnd_box_hi, dim));\n\n\tfor (int i = 0; i < k; i++) {\t\t// extract the k-th closest points\n\t\tdd[i] = ANNkdPointMK->ith_smallest_key(i);\n\t\tnn_idx[i] = ANNkdPointMK->ith_smallest_info(i);\n\t}\n\tdelete ANNkdPointMK;\t\t\t\t// deallocate closest point set\n}\n\n//----------------------------------------------------------------------\n//\tkd_split::ann_search - search a splitting node\n//----------------------------------------------------------------------\n\nvoid ANNkd_split::ann_search(ANNdist box_dist)\n{\n\t\t\t\t\t\t\t\t\t\t// check dist calc term condition\n\tif (ANNmaxPtsVisited != 0 && ANNptsVisited > ANNmaxPtsVisited) return;\n\n\t\t\t\t\t\t\t\t\t\t// distance to cutting plane\n\tANNcoord cut_diff = ANNkdQ[cut_dim] - cut_val;\n\n\tif (cut_diff < 0) {\t\t\t\t\t// left of cutting plane\n\t\tchild[ANN_LO]->ann_search(box_dist);// visit closer child first\n\n\t\tANNcoord box_diff = cd_bnds[ANN_LO] - ANNkdQ[cut_dim];\n\t\tif (box_diff < 0)\t\t\t\t// within bounds - ignore\n\t\t\tbox_diff = 0;\n\t\t\t\t\t\t\t\t\t\t// distance to further box\n\t\tbox_dist = (ANNdist) ANN_SUM(box_dist,\n\t\t\t\tANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff)));\n\n\t\t\t\t\t\t\t\t\t\t// visit further child if close enough\n\t\tif (box_dist * ANNkdMaxErr < ANNkdPointMK->max_key())\n\t\t\tchild[ANN_HI]->ann_search(box_dist);\n\n\t}\n\telse {\t\t\t\t\t\t\t\t// right of cutting plane\n\t\tchild[ANN_HI]->ann_search(box_dist);// visit closer child first\n\n\t\tANNcoord box_diff = ANNkdQ[cut_dim] - cd_bnds[ANN_HI];\n\t\tif (box_diff < 0)\t\t\t\t// within bounds - ignore\n\t\t\tbox_diff = 0;\n\t\t\t\t\t\t\t\t\t\t// distance to further box\n\t\tbox_dist = (ANNdist) ANN_SUM(box_dist,\n\t\t\t\tANN_DIFF(ANN_POW(box_diff), ANN_POW(cut_diff)));\n\n\t\t\t\t\t\t\t\t\t\t// visit further child if close enough\n\t\tif (box_dist * ANNkdMaxErr < 
ANNkdPointMK->max_key())\n\t\t\tchild[ANN_LO]->ann_search(box_dist);\n\n\t}\n\tANN_FLOP(10)\t\t\t\t\t\t// increment floating ops\n\tANN_SPL(1)\t\t\t\t\t\t\t// one more splitting node visited\n}\n\n//----------------------------------------------------------------------\n//\tkd_leaf::ann_search - search points in a leaf node\n//\t\tNote: The unreadability of this code is the result of\n//\t\tsome fine tuning to replace indexing by pointer operations.\n//----------------------------------------------------------------------\n\nvoid ANNkd_leaf::ann_search(ANNdist box_dist)\n{\n\tANNdist dist;\t\t\t\t// distance to data point\n\tANNcoord* pp;\t\t\t\t// data coordinate pointer\n\tANNcoord* qq;\t\t\t\t// query coordinate pointer\n\tANNdist min_dist;\t\t\t// distance to k-th closest point\n\tANNcoord t;\n\tint d;\n\n\tmin_dist = ANNkdPointMK->max_key(); // k-th smallest distance so far\n\n\tfor (int i = 0; i < n_pts; i++) {\t// check points in bucket\n\n\t\tpp = ANNkdPts[bkt[i]];\t\t\t// first coord of next data point\n\t\tqq = ANNkdQ;\t\t\t\t\t// first coord of query point\n\t\tdist = 0;\n\n\t\tfor(d = 0; d < ANNkdDim; d++) {\n\t\t\tANN_COORD(1)\t\t\t\t// one more coordinate hit\n\t\t\tANN_FLOP(4)\t\t\t\t\t// increment floating ops\n\n\t\t\tt = *(qq++) - *(pp++);\t\t// compute length and adv coordinate\n\t\t\t\t\t\t\t\t\t\t// exceeds dist to k-th smallest?\n\t\t\tif( (dist = ANN_SUM(dist, ANN_POW(t))) > min_dist) {\n\t\t\t\tbreak;\n\t\t\t}\n\t\t}\n\n\t\tif (d >= ANNkdDim &&\t\t\t\t\t// among the k best?\n\t\t   (ANN_ALLOW_SELF_MATCH || dist!=0)) { // and no self-match problem\n\t\t\t\t\t\t\t\t\t\t\t\t// add it to the list\n\t\t\tANNkdPointMK->insert(dist, bkt[i]);\n\t\t\tmin_dist = ANNkdPointMK->max_key();\n\t\t}\n\t}\n\tANN_LEAF(1)\t\t\t\t\t\t\t// one more leaf node visited\n\tANN_PTS(n_pts)\t\t\t\t\t\t// increment points visited\n\tANNptsVisited += n_pts;\t\t\t\t// increment number of points visited\n}\n"
  },
  {
    "path": "src/ANN/kd_search.h",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_search.h\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tStandard kd-tree search\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#ifndef ANN_kd_search_H\n#define ANN_kd_search_H\n\n#include \"kd_tree.h\"\t\t\t\t\t// kd-tree declarations\n#include \"kd_util.h\"\t\t\t\t\t// kd-tree utilities\n#include \"pr_queue_k.h\"\t\t\t\t\t// k-element priority queue\n\n#include \"ANNperf.h\"\t\t\t\t// performance evaluation\n\n//----------------------------------------------------------------------\n//\tMore global variables\n//\t\tThese are active for the life of each call to annkSearch(). 
They\n//\t\tare set to reduce the number of variables that need to be passed\n//\t\tamong the various search procedures.\n//----------------------------------------------------------------------\n\nextern int\t\t\t\tANNkdDim;\t\t// dimension of space (static copy)\nextern ANNpoint\t\t\tANNkdQ;\t\t\t// query point (static copy)\nextern double\t\t\tANNkdMaxErr;\t// max tolerable squared error\nextern ANNpointArray\tANNkdPts;\t\t// the points (static copy)\nextern ANNmin_k\t\t\t*ANNkdPointMK;\t// set of k closest points\nextern int\t\t\t\tANNptsVisited;\t// number of points visited\n\n#endif\n"
  },
  {
    "path": "src/ANN/kd_split.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_split.cpp\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tMethods for splitting kd-trees\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.0  04/01/05\n//----------------------------------------------------------------------\n\n#include \"kd_tree.h\"\t\t\t\t\t// kd-tree definitions\n#include \"kd_util.h\"\t\t\t\t\t// kd-tree utilities\n#include \"kd_split.h\"\t\t\t\t\t// splitting functions\n\n//----------------------------------------------------------------------\n//\tConstants\n//----------------------------------------------------------------------\n\nconst double EPS = 0.001;\t\t\t\t// a small value\nconst double FS_ASPECT_RATIO = 3.0;\t\t// maximum allowed aspect ratio\n\t\t\t\t\t\t\t\t\t\t// in fair split. 
Must be >= 2.\n\n//----------------------------------------------------------------------\n//\tkd_split - Bentley's standard splitting routine for kd-trees\n//\t\tFind the dimension of the greatest spread, and split\n//\t\tjust before the median point along this dimension.\n//----------------------------------------------------------------------\n\nvoid kd_split(\n\tANNpointArray\t\tpa,\t\t\t\t// point array (permuted on return)\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo)\t\t\t// num of points on low side (returned)\n{\n\t\t\t\t\t\t\t\t\t\t// find dimension of maximum spread\n\tcut_dim = annMaxSpread(pa, pidx, n, dim);\n\tn_lo = n/2;\t\t\t\t\t\t\t// median rank\n\t\t\t\t\t\t\t\t\t\t// split about median\n\tannMedianSplit(pa, pidx, n, cut_dim, cut_val, n_lo);\n}\n\n//----------------------------------------------------------------------\n//\tmidpt_split - midpoint splitting rule for box-decomposition trees\n//\n//\t\tThis is the simplest splitting rule that guarantees boxes\n//\t\tof bounded aspect ratio.  It simply cuts the box with the\n//\t\tlongest side through its midpoint.  
If there are ties, it\n//\t\tselects the dimension with the maximum point spread.\n//\n//\t\tWARNING: This routine (while simple) doesn't seem to work\n//\t\twell in practice in high dimensions, because it tends to\n//\t\tgenerate a large number of trivial and/or unbalanced splits.\n//\t\tEither kd_split(), sl_midpt_split(), or fair_split() is\n//\t\trecommended instead.\n//----------------------------------------------------------------------\n\nvoid midpt_split(\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices (permuted on return)\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo)\t\t\t// num of points on low side (returned)\n{\n\tint d;\n\n\tANNcoord max_length = bnds.hi[0] - bnds.lo[0];\n\tfor (d = 1; d < dim; d++) {\t\t\t// find length of longest box side\n\t\tANNcoord length = bnds.hi[d] - bnds.lo[d];\n\t\tif (length > max_length) {\n\t\t\tmax_length = length;\n\t\t}\n\t}\n\tANNcoord max_spread = -1;\t\t\t// find long side with most spread\n\tfor (d = 0; d < dim; d++) {\n\t\t\t\t\t\t\t\t\t\t// is it among longest?\n\t\tif (double(bnds.hi[d] - bnds.lo[d]) >= (1-EPS)*max_length) {\n\t\t\t\t\t\t\t\t\t\t// compute its spread\n\t\t\tANNcoord spr = annSpread(pa, pidx, n, d);\n\t\t\tif (spr > max_spread) {\t\t// is it max so far?\n\t\t\t\tmax_spread = spr;\n\t\t\t\tcut_dim = d;\n\t\t\t}\n\t\t}\n\t}\n\t\t\t\t\t\t\t\t\t\t// split along cut_dim at midpoint\n\tcut_val = (bnds.lo[cut_dim] + bnds.hi[cut_dim]) / 2;\n\t\t\t\t\t\t\t\t\t\t// permute points accordingly\n\tint br1, br2;\n\tannPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2);\n\t//------------------------------------------------------------------\n\t//\tOn return:\t\tpa[0..br1-1] < 
cut_val\n\t//\t\t\t\t\tpa[br1..br2-1] == cut_val\n\t//\t\t\t\t\tpa[br2..n-1] > cut_val\n\t//\n\t//\tWe can set n_lo to any value in the range [br1..br2].\n\t//\tWe choose the split so that points are most evenly divided.\n\t//------------------------------------------------------------------\n\tif (br1 > n/2) n_lo = br1;\n\telse if (br2 < n/2) n_lo = br2;\n\telse n_lo = n/2;\n}\n\n//----------------------------------------------------------------------\n//\tsl_midpt_split - sliding midpoint splitting rule\n//\n//\t\tThis is a modification of midpt_split, which has the nonsensical\n//\t\tname \"sliding midpoint\".  The idea is that we try to use the\n//\t\tmidpoint rule, by bisecting the longest side.  If there are\n//\t\tties, the dimension with the maximum spread is selected.  If,\n//\t\thowever, the midpoint split produces a trivial split (no points\n//\t\ton one side of the splitting plane), then we slide the splitting\n//\t\tplane (maintaining its orientation) until it produces a nontrivial\n//\t\tsplit. 
For example, if the splitting plane is along the x-axis,\n//\t\tand all the data points have x-coordinate less than the x-bisector,\n//\t\tthen the split is taken along the maximum x-coordinate of the\n//\t\tdata points.\n//\n//\t\tIntuitively, this rule cannot generate trivial splits, and\n//\t\thence avoids midpt_split's tendency to produce trees with\n//\t\ta very large number of nodes.\n//\n//----------------------------------------------------------------------\n\nvoid sl_midpt_split(\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices (permuted on return)\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo)\t\t\t// num of points on low side (returned)\n{\n\tint d;\n\n\tANNcoord max_length = bnds.hi[0] - bnds.lo[0];\n\tfor (d = 1; d < dim; d++) {\t\t\t// find length of longest box side\n\t\tANNcoord length = bnds.hi[d] - bnds.lo[d];\n\t\tif (length > max_length) {\n\t\t\tmax_length = length;\n\t\t}\n\t}\n\tANNcoord max_spread = -1;\t\t\t// find long side with most spread\n\tfor (d = 0; d < dim; d++) {\n\t\t\t\t\t\t\t\t\t\t// is it among longest?\n\t\tif ((bnds.hi[d] - bnds.lo[d]) >= (1-EPS)*max_length) {\n\t\t\t\t\t\t\t\t\t\t// compute its spread\n\t\t\tANNcoord spr = annSpread(pa, pidx, n, d);\n\t\t\tif (spr > max_spread) {\t\t// is it max so far?\n\t\t\t\tmax_spread = spr;\n\t\t\t\tcut_dim = d;\n\t\t\t}\n\t\t}\n\t}\n\t\t\t\t\t\t\t\t\t\t// ideal split at midpoint\n\tANNcoord ideal_cut_val = (bnds.lo[cut_dim] + bnds.hi[cut_dim])/2;\n\n\tANNcoord min, max;\n\tannMinMax(pa, pidx, n, cut_dim, min, max);\t// find min/max coordinates\n\n\tif (ideal_cut_val < min)\t\t\t// slide to min or max as needed\n\t\tcut_val = min;\n\telse if (ideal_cut_val > max)\n\t\tcut_val = 
max;\n\telse\n\t\tcut_val = ideal_cut_val;\n\n\t\t\t\t\t\t\t\t\t\t// permute points accordingly\n\tint br1, br2;\n\tannPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2);\n\t//------------------------------------------------------------------\n\t//\tOn return:\t\tpa[0..br1-1] < cut_val\n\t//\t\t\t\t\tpa[br1..br2-1] == cut_val\n\t//\t\t\t\t\tpa[br2..n-1] > cut_val\n\t//\n\t//\tWe can set n_lo to any value in the range [br1..br2] to satisfy\n\t//\tthe exit conditions of the procedure.\n\t//\n\t//\tif ideal_cut_val < min (implying br2 >= 1),\n\t//\t\t\tthen we select n_lo = 1 (so there is one point on left) and\n\t//\tif ideal_cut_val > max (implying br1 <= n-1),\n\t//\t\t\tthen we select n_lo = n-1 (so there is one point on right).\n\t//\tOtherwise, we select n_lo as close to n/2 as possible within\n\t//\t\t\t[br1..br2].\n\t//------------------------------------------------------------------\n\tif (ideal_cut_val < min) n_lo = 1;\n\telse if (ideal_cut_val > max) n_lo = n-1;\n\telse if (br1 > n/2) n_lo = br1;\n\telse if (br2 < n/2) n_lo = br2;\n\telse n_lo = n/2;\n}\n\n//----------------------------------------------------------------------\n//\tfair_split - fair-split splitting rule\n//\n//\t\tThis is a compromise between the kd-tree splitting rule (which\n//\t\talways splits data points at their median) and the midpoint\n//\t\tsplitting rule (which always splits a box through its center).\n//\t\tThe goal of this procedure is to achieve both nicely balanced\n//\t\tsplits and boxes of bounded aspect ratio.\n//\n//\t\tA constant FS_ASPECT_RATIO is defined. Given a box, those sides\n//\t\twhich can be split so that the ratio of the longest to shortest\n//\t\tside does not exceed FS_ASPECT_RATIO are identified.  Among these\n//\t\tsides, we select the one in which the points have the largest\n//\t\tspread. 
We then split the points in a manner which most evenly\n//\t\tdistributes the points on either side of the splitting plane,\n//\t\tsubject to maintaining the bound on the ratio of long to short\n//\t\tsides. To determine that the aspect ratio will be preserved,\n//\t\twe determine the longest side (other than this side), and\n//\t\tdetermine how narrowly we can cut this side, without causing the\n//\t\taspect ratio bound to be exceeded (small_piece).\n//\n//\t\tThis procedure is more robust than either kd_split or midpt_split,\n//\t\tbut is more complicated as well.  When point distribution is\n//\t\textremely skewed, this degenerates to midpt_split (actually\n//\t\t1/3 point split), and when the points are most evenly distributed,\n//\t\tthis degenerates to kd-split.\n//----------------------------------------------------------------------\n\nvoid fair_split(\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices (permuted on return)\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo)\t\t\t// num of points on low side (returned)\n{\n\tint d;\n\tANNcoord max_length = bnds.hi[0] - bnds.lo[0];\n\tcut_dim = 0;\n\tfor (d = 1; d < dim; d++) {\t\t\t// find length of longest box side\n\t\tANNcoord length = bnds.hi[d] - bnds.lo[d];\n\t\tif (length > max_length) {\n\t\t\tmax_length = length;\n\t\t\tcut_dim = d;\n\t\t}\n\t}\n\n\tANNcoord max_spread = 0;\t\t\t// find legal cut with max spread\n\tcut_dim = 0;\n\tfor (d = 0; d < dim; d++) {\n\t\tANNcoord length = bnds.hi[d] - bnds.lo[d];\n\t\t\t\t\t\t\t\t\t\t// is this side midpoint splitable\n\t\t\t\t\t\t\t\t\t\t// without violating aspect ratio?\n\t\tif (((double) max_length)*2.0/((double) length) <= FS_ASPECT_RATIO) {\n\t\t\t\t\t\t\t\t\t\t// 
compute spread along this dim\n\t\t\tANNcoord spr = annSpread(pa, pidx, n, d);\n\t\t\tif (spr > max_spread) {\t\t// best spread so far\n\t\t\t\tmax_spread = spr;\n\t\t\t\tcut_dim = d;\t\t\t// this is dimension to cut\n\t\t\t}\n\t\t}\n\t}\n\n\tmax_length = 0;\t\t\t\t\t\t// find longest side other than cut_dim\n\tfor (d = 0; d < dim; d++) {\n\t\tANNcoord length = bnds.hi[d] - bnds.lo[d];\n\t\tif (d != cut_dim && length > max_length)\n\t\t\tmax_length = length;\n\t}\n\t\t\t\t\t\t\t\t\t\t// consider most extreme splits\n\tANNcoord small_piece = max_length / FS_ASPECT_RATIO;\n\tANNcoord lo_cut = bnds.lo[cut_dim] + small_piece;// lowest legal cut\n\tANNcoord hi_cut = bnds.hi[cut_dim] - small_piece;// highest legal cut\n\n\tint br1, br2;\n\t\t\t\t\t\t\t\t\t\t// is median below lo_cut ?\n\tif (annSplitBalance(pa, pidx, n, cut_dim, lo_cut) >= 0) {\n\t\tcut_val = lo_cut;\t\t\t\t// cut at lo_cut\n\t\tannPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2);\n\t\tn_lo = br1;\n\t}\n\t\t\t\t\t\t\t\t\t\t// is median above hi_cut?\n\telse if (annSplitBalance(pa, pidx, n, cut_dim, hi_cut) <= 0) {\n\t\tcut_val = hi_cut;\t\t\t\t// cut at hi_cut\n\t\tannPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2);\n\t\tn_lo = br2;\n\t}\n\telse {\t\t\t\t\t\t\t\t// median cut preserves asp ratio\n\t\tn_lo = n/2;\t\t\t\t\t\t// split about median\n\t\tannMedianSplit(pa, pidx, n, cut_dim, cut_val, n_lo);\n\t}\n}\n\n//----------------------------------------------------------------------\n//\tsl_fair_split - sliding fair split splitting rule\n//\n//\t\tSliding fair split is a splitting rule that combines the\n//\t\tstrengths of both fair split with sliding midpoint split.\n//\t\tFair split tends to produce balanced splits when the points\n//\t\tare roughly uniformly distributed, but it can produce many\n//\t\ttrivial splits when points are highly clustered.  
Sliding\n//\t\tmidpoint never produces trivial splits, and shrinks boxes\n//\t\tnicely if points are highly clustered, but it may produce\n//\t\trather unbalanced splits when points are unclustered but not\n//\t\tquite uniform.\n//\n//\t\tSliding fair split is based on the theory that there are two\n//\t\ttypes of splits that are \"good\": balanced splits that produce\n//\t\tfat boxes, and unbalanced splits provided the cell with fewer\n//\t\tpoints is fat.\n//\n//\t\tThis splitting rule operates by first computing the longest\n//\t\tside of the current bounding box.  Then it asks which sides\n//\t\tcould be split (at the midpoint) and still satisfy the aspect\n//\t\tratio bound with respect to this side.\tAmong these, it selects\n//\t\tthe side with the largest spread (as fair split would).\t It\n//\t\tthen considers the most extreme cuts that would be allowed by\n//\t\tthe aspect ratio bound.\t This is done by dividing the longest\n//\t\tside of the box by the aspect ratio bound.\tIf the median cut\n//\t\tlies between these extreme cuts, then we use the median cut.\n//\t\tIf not, then consider the extreme cut that is closer to the\n//\t\tmedian.\t If all the points lie to one side of this cut, then\n//\t\twe slide the cut until it hits the first point.\t This may\n//\t\tviolate the aspect ratio bound, but will never generate empty\n//\t\tcells.\tHowever the sibling of every such skinny cell is fat,\n//\t\tand hence packing arguments still apply.\n//\n//----------------------------------------------------------------------\n\nvoid sl_fair_split(\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices (permuted on return)\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo)\t\t\t// 
num of points on low side (returned)\n{\n\tint d;\n\tANNcoord min, max;\t\t\t\t\t// min/max coordinates\n\tint br1, br2;\t\t\t\t\t\t// split break points\n\n\tANNcoord max_length = bnds.hi[0] - bnds.lo[0];\n\tcut_dim = 0;\n\tfor (d = 1; d < dim; d++) {\t\t\t// find length of longest box side\n\t\tANNcoord length = bnds.hi[d] - bnds.lo[d];\n\t\tif (length\t> max_length) {\n\t\t\tmax_length = length;\n\t\t\tcut_dim = d;\n\t\t}\n\t}\n\n\tANNcoord max_spread = 0;\t\t\t// find legal cut with max spread\n\tcut_dim = 0;\n\tfor (d = 0; d < dim; d++) {\n\t\tANNcoord length = bnds.hi[d] - bnds.lo[d];\n\t\t\t\t\t\t\t\t\t\t// is this side midpoint splitable\n\t\t\t\t\t\t\t\t\t\t// without violating aspect ratio?\n\t\tif (((double) max_length)*2.0/((double) length) <= FS_ASPECT_RATIO) {\n\t\t\t\t\t\t\t\t\t\t// compute spread along this dim\n\t\t\tANNcoord spr = annSpread(pa, pidx, n, d);\n\t\t\tif (spr > max_spread) {\t\t// best spread so far\n\t\t\t\tmax_spread = spr;\n\t\t\t\tcut_dim = d;\t\t\t// this is dimension to cut\n\t\t\t}\n\t\t}\n\t}\n\n\tmax_length = 0;\t\t\t\t\t\t// find longest side other than cut_dim\n\tfor (d = 0; d < dim; d++) {\n\t\tANNcoord length = bnds.hi[d] - bnds.lo[d];\n\t\tif (d != cut_dim && length > max_length)\n\t\t\tmax_length = length;\n\t}\n\t\t\t\t\t\t\t\t\t\t// consider most extreme splits\n\tANNcoord small_piece = max_length / FS_ASPECT_RATIO;\n\tANNcoord lo_cut = bnds.lo[cut_dim] + small_piece;// lowest legal cut\n\tANNcoord hi_cut = bnds.hi[cut_dim] - small_piece;// highest legal cut\n\t\t\t\t\t\t\t\t\t\t// find min and max along cut_dim\n\tannMinMax(pa, pidx, n, cut_dim, min, max);\n\t\t\t\t\t\t\t\t\t\t// is median below lo_cut?\n\tif (annSplitBalance(pa, pidx, n, cut_dim, lo_cut) >= 0) {\n\t\tif (max > lo_cut) {\t\t\t\t// are any points above lo_cut?\n\t\t\tcut_val = lo_cut;\t\t\t// cut at lo_cut\n\t\t\tannPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2);\n\t\t\tn_lo = br1;\t\t\t\t\t// balance if there are ties\n\t\t}\n\t\telse 
{\t\t\t\t\t\t\t// all points below lo_cut\n\t\t\tcut_val = max;\t\t\t\t// cut at max value\n\t\t\tannPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2);\n\t\t\tn_lo = n-1;\n\t\t}\n\t}\n\t\t\t\t\t\t\t\t\t\t// is median above hi_cut?\n\telse if (annSplitBalance(pa, pidx, n, cut_dim, hi_cut) <= 0) {\n\t\tif (min < hi_cut) {\t\t\t\t// are any points below hi_cut?\n\t\t\tcut_val = hi_cut;\t\t\t// cut at hi_cut\n\t\t\tannPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2);\n\t\t\tn_lo = br2;\t\t\t\t\t// balance if there are ties\n\t\t}\n\t\telse {\t\t\t\t\t\t\t// all points above hi_cut\n\t\t\tcut_val = min;\t\t\t\t// cut at min value\n\t\t\tannPlaneSplit(pa, pidx, n, cut_dim, cut_val, br1, br2);\n\t\t\tn_lo = 1;\n\t\t}\n\t}\n\telse {\t\t\t\t\t\t\t\t// median cut is good enough\n\t\tn_lo = n/2;\t\t\t\t\t\t// split about median\n\t\tannMedianSplit(pa, pidx, n, cut_dim, cut_val, n_lo);\n\t}\n}\n"
  },
  {
    "path": "src/ANN/kd_split.h",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_split.h\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tMethods for splitting kd-trees\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n// \n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n// \n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#ifndef ANN_KD_SPLIT_H\n#define ANN_KD_SPLIT_H\n\n#include \"kd_tree.h\"\t\t\t\t\t// kd-tree definitions\n\n//----------------------------------------------------------------------\n//\tExternal entry points\n//\t\tThese are all splitting procedures for kd-trees.\n//----------------------------------------------------------------------\n\nvoid kd_split(\t\t\t\t\t\t\t// standard (optimized) kd-splitter\n\tANNpointArray\t\tpa,\t\t\t\t// point array (unaltered)\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices (permuted on return)\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo);\t\t\t// num of points on 
low side (returned)\n\nvoid midpt_split(\t\t\t\t\t\t// midpoint kd-splitter\n\tANNpointArray\t\tpa,\t\t\t\t// point array (unaltered)\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices (permuted on return)\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo);\t\t\t// num of points on low side (returned)\n\nvoid sl_midpt_split(\t\t\t\t\t// sliding midpoint kd-splitter\n\tANNpointArray\t\tpa,\t\t\t\t// point array (unaltered)\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices (permuted on return)\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo);\t\t\t// num of points on low side (returned)\n\nvoid fair_split(\t\t\t\t\t\t// fair-split kd-splitter\n\tANNpointArray\t\tpa,\t\t\t\t// point array (unaltered)\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices (permuted on return)\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo);\t\t\t// num of points on low side (returned)\n\nvoid sl_fair_split(\t\t\t\t\t\t// sliding fair-split kd-splitter\n\tANNpointArray\t\tpa,\t\t\t\t// point array (unaltered)\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices (permuted on return)\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of 
space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo);\t\t\t// num of points on low side (returned)\n\n#endif\n"
  },
  {
    "path": "src/ANN/kd_tree.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_tree.cpp\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tBasic methods for kd-trees.\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.0  04/01/05\n//\t\tIncreased aspect ratio bound (ANN_AR_TOOBIG) from 100 to 1000.\n//\t\tFixed leaf counts to count trivial leaves.\n//\t\tAdded optional pa, pi arguments to Skeleton kd_tree constructor\n//\t\t\tfor use in load constructor.\n//\t\tAdded annClose() to eliminate KD_TRIVIAL memory leak.\n//----------------------------------------------------------------------\n\n#include \"kd_tree.h\"\t\t\t\t\t// kd-tree declarations\n#include \"kd_split.h\"\t\t\t\t\t// kd-tree splitting rules\n#include \"kd_util.h\"\t\t\t\t\t// kd-tree utilities\n#include \"ANNperf.h\"\t\t\t\t// performance evaluation\n\n//----------------------------------------------------------------------\n//\tGlobal data\n//\n//\tFor some splitting rules, especially with small bucket sizes,\n//\tit is possible to generate a large number of empty leaf nodes.\n//\tTo save storage we allocate a single trivial leaf node which\n//\tcontains no 
points.  For messy coding reasons it is convenient\n//\tto have it reference a trivial point index.\n//\n//\tKD_TRIVIAL is allocated when the first kd-tree is created.  It\n//\tmust *never* be deallocated (since it may be shared by more than\n//\tone tree).\n//----------------------------------------------------------------------\nstatic int\t\t\t\tIDX_TRIVIAL[] = {0};\t// trivial point index\nANNkd_leaf\t\t\t\t*KD_TRIVIAL = NULL;\t\t// trivial leaf node\n\n//----------------------------------------------------------------------\n//\tPrinting the kd-tree\n//\t\tThese routines print a kd-tree in reverse inorder (high then\n//\t\troot then low).  (This is so that if you look at the output\n//\t\tfrom the right side it appears from left to right in standard\n//\t\tinorder.)  When outputting leaves we output only the point\n//\t\tindices rather than the point coordinates. There is an option\n//\t\tto print the point coordinates separately.\n//\n//\t\tThe tree printing routine calls the printing routines on the\n//\t\tindividual nodes of the tree, passing in the level or depth\n//\t\tin the tree.  
The level in the tree is used to print indentation\n//\t\tfor readability.\n//----------------------------------------------------------------------\n\nvoid ANNkd_split::print(\t\t\t\t// print splitting node\n\t\tint level,\t\t\t\t\t\t// depth of node in tree\n\t\tostream &out)\t\t\t\t\t// output stream\n{\n\tchild[ANN_HI]->print(level+1, out);\t// print high child\n\tout << \"    \";\n\tfor (int i = 0; i < level; i++)\t\t// print indentation\n\t\tout << \"..\";\n\tout << \"Split cd=\" << cut_dim << \" cv=\" << cut_val;\n\tout << \" lbnd=\" << cd_bnds[ANN_LO];\n\tout << \" hbnd=\" << cd_bnds[ANN_HI];\n\tout << \"\\n\";\n\tchild[ANN_LO]->print(level+1, out);\t// print low child\n}\n\nvoid ANNkd_leaf::print(\t\t\t\t\t// print leaf node\n\t\tint level,\t\t\t\t\t\t// depth of node in tree\n\t\tostream &out)\t\t\t\t\t// output stream\n{\n\n\tout << \"    \";\n\tfor (int i = 0; i < level; i++)\t\t// print indentation\n\t\tout << \"..\";\n\n\tif (this == KD_TRIVIAL) {\t\t\t// canonical trivial leaf node\n\t\tout << \"Leaf (trivial)\\n\";\n\t}\n\telse{\n\t\tout << \"Leaf n=\" << n_pts << \" <\";\n\t\tfor (int j = 0; j < n_pts; j++) {\n\t\t\tout << bkt[j];\n\t\t\tif (j < n_pts-1) out << \",\";\n\t\t}\n\t\tout << \">\\n\";\n\t}\n}\n\nvoid ANNkd_tree::Print(\t\t\t\t\t// print entire tree\n\t\tANNbool with_pts,\t\t\t\t// print points as well?\n\t\tostream &out)\t\t\t\t\t// output stream\n{\n\tout << \"ANN Version \" << ANNversion << \"\\n\";\n\tif (with_pts) {\t\t\t\t\t\t// print point coordinates\n\t\tout << \"    Points:\\n\";\n\t\tfor (int i = 0; i < n_pts; i++) {\n\t\t\tout << \"\\t\" << i << \": \";\n\t\t\tannPrintPt(pts[i], dim, out);\n\t\t\tout << \"\\n\";\n\t\t}\n\t}\n\tif (root == NULL)\t\t\t\t\t// empty tree?\n\t\tout << \"    Null tree.\\n\";\n\telse {\n\t\troot->print(0, out);\t\t\t// invoke printing at root\n\t}\n}\n\n//----------------------------------------------------------------------\n//\tkd_tree statistics (for performance evaluation)\n//\t\tThis routine 
computes various statistics for\n//\t\ta kd-tree.  It is used by the implementors for performance\n//\t\tevaluation of the data structure.\n//----------------------------------------------------------------------\n\n#define MAX(a,b)\t\t((a) > (b) ? (a) : (b))\n\nvoid ANNkdStats::merge(const ANNkdStats &st)\t// merge stats from child\n{\n\tn_lf += st.n_lf;\t\t\tn_tl += st.n_tl;\n\tn_spl += st.n_spl;\t\t\tn_shr += st.n_shr;\n\tdepth = MAX(depth, st.depth);\n\tsum_ar += st.sum_ar;\n}\n\n//----------------------------------------------------------------------\n//\tUpdate statistics for nodes\n//----------------------------------------------------------------------\n\nconst double ANN_AR_TOOBIG = 1000;\t\t\t\t// too big an aspect ratio\n\nvoid ANNkd_leaf::getStats(\t\t\t\t\t\t// get subtree statistics\n\tint\t\t\t\t\tdim,\t\t\t\t\t// dimension of space\n\tANNkdStats\t\t\t&st,\t\t\t\t\t// stats (modified)\n\tANNorthRect\t\t\t&bnd_box)\t\t\t\t// bounding box\n{\n\tst.reset();\n\tst.n_lf = 1;\t\t\t\t\t\t\t\t// count this leaf\n\tif (this == KD_TRIVIAL) st.n_tl = 1;\t\t// count trivial leaf\n\tdouble ar = annAspectRatio(dim, bnd_box);\t// aspect ratio of leaf\n\t\t\t\t\t\t\t\t\t\t\t\t// incr sum (ignore outliers)\n\tst.sum_ar += float(ar < ANN_AR_TOOBIG ? 
ar : ANN_AR_TOOBIG);\n}\n\nvoid ANNkd_split::getStats(\t\t\t\t\t\t// get subtree statistics\n\tint\t\t\t\t\tdim,\t\t\t\t\t// dimension of space\n\tANNkdStats\t\t\t&st,\t\t\t\t\t// stats (modified)\n\tANNorthRect\t\t\t&bnd_box)\t\t\t\t// bounding box\n{\n\tANNkdStats ch_stats;\t\t\t\t\t\t// stats for children\n\t\t\t\t\t\t\t\t\t\t\t\t// get stats for low child\n\tANNcoord hv = bnd_box.hi[cut_dim];\t\t\t// save box bounds\n\tbnd_box.hi[cut_dim] = cut_val;\t\t\t\t// upper bound for low child\n\tch_stats.reset();\t\t\t\t\t\t\t// reset\n\tchild[ANN_LO]->getStats(dim, ch_stats, bnd_box);\n\tst.merge(ch_stats);\t\t\t\t\t\t\t// merge them\n\tbnd_box.hi[cut_dim] = hv;\t\t\t\t\t// restore bound\n\t\t\t\t\t\t\t\t\t\t\t\t// get stats for high child\n\tANNcoord lv = bnd_box.lo[cut_dim];\t\t\t// save box bounds\n\tbnd_box.lo[cut_dim] = cut_val;\t\t\t\t// lower bound for high child\n\tch_stats.reset();\t\t\t\t\t\t\t// reset\n\tchild[ANN_HI]->getStats(dim, ch_stats, bnd_box);\n\tst.merge(ch_stats);\t\t\t\t\t\t\t// merge them\n\tbnd_box.lo[cut_dim] = lv;\t\t\t\t\t// restore bound\n\n\tst.depth++;\t\t\t\t\t\t\t\t\t// increment depth\n\tst.n_spl++;\t\t\t\t\t\t\t\t\t// increment number of splits\n}\n\n//----------------------------------------------------------------------\n//\tgetStats\n//\t\tCollects a number of statistics related to kd_tree or\n//\t\tbd_tree.\n//----------------------------------------------------------------------\n\nvoid ANNkd_tree::getStats(\t\t\t\t\t\t// get tree statistics\n\tANNkdStats\t\t\t&st)\t\t\t\t\t// stats (modified)\n{\n\tst.reset(dim, n_pts, bkt_size);\t\t\t\t// reset stats\n\t\t\t\t\t\t\t\t\t\t\t\t// create bounding box\n\tANNorthRect bnd_box(dim, bnd_box_lo, bnd_box_hi);\n\tif (root != NULL) {\t\t\t\t\t\t\t// if nonempty tree\n\t\troot->getStats(dim, st, bnd_box);\t\t// get statistics\n\t\tst.avg_ar = st.sum_ar / st.n_lf;\t\t// average leaf asp ratio\n\t}\n}\n\n//----------------------------------------------------------------------\n//\tkd_tree 
destructor\n//\t\tThe destructor just frees the various elements that were\n//\t\tallocated in the construction process.\n//----------------------------------------------------------------------\n\nANNkd_tree::~ANNkd_tree()\t\t\t\t// tree destructor\n{\n\tif (root != NULL) delete root;\n\tif (pidx != NULL) delete [] pidx;\n\tif (bnd_box_lo != NULL) annDeallocPt(bnd_box_lo);\n\tif (bnd_box_hi != NULL) annDeallocPt(bnd_box_hi);\n}\n\n//----------------------------------------------------------------------\n//\tThis is called when all use of ANN is finished.  It eliminates the\n//\tminor memory leak caused by the allocation of KD_TRIVIAL.\n//----------------------------------------------------------------------\nvoid annClose()\t\t\t\t// close use of ANN\n{\n\tif (KD_TRIVIAL != NULL) {\n\t\tdelete KD_TRIVIAL;\n\t\tKD_TRIVIAL = NULL;\n\t}\n}\n\n//----------------------------------------------------------------------\n//\tkd_tree constructors\n//\t\tThere is a skeleton kd-tree constructor which sets up a\n//\t\ttrivial empty tree.\t The last optional argument allows\n//\t\tthe routine to be passed a point index array which is\n//\t\tassumed to be of the proper size (n).  
Otherwise, one is\n//\t\tallocated and initialized to the identity.\tWarning: In\n//\t\teither case the destructor will deallocate this array.\n//\n//\t\tAs a kludge, we need to allocate KD_TRIVIAL if one has not\n//\t\talready been allocated.\t (This is because I'm too dumb to\n//\t\tfigure out how to cause a pointer to be allocated at load\n//\t\ttime.)\n//----------------------------------------------------------------------\n\nvoid ANNkd_tree::SkeletonTree(\t\t\t// construct skeleton tree\n\t\tint n,\t\t\t\t\t\t\t// number of points\n\t\tint dd,\t\t\t\t\t\t\t// dimension\n\t\tint bs,\t\t\t\t\t\t\t// bucket size\n\t\tANNpointArray pa,\t\t\t\t// point array\n\t\tANNidxArray pi)\t\t\t\t\t// point indices\n{\n\tdim = dd;\t\t\t\t\t\t\t// initialize basic elements\n\tn_pts = n;\n\tbkt_size = bs;\n\tpts = pa;\t\t\t\t\t\t\t// initialize points array\n\n\troot = NULL;\t\t\t\t\t\t// no associated tree yet\n\n\tif (pi == NULL) {\t\t\t\t\t// point indices provided?\n\t\tpidx = new ANNidx[n];\t\t\t// no, allocate space for point indices\n\t\tfor (int i = 0; i < n; i++) {\n\t\t\tpidx[i] = i;\t\t\t\t// initially identity\n\t\t}\n\t}\n\telse {\n\t\tpidx = pi;\t\t\t\t\t\t// yes, use them\n\t}\n\n\tbnd_box_lo = bnd_box_hi = NULL;\t\t// bounding box is nonexistent\n\tif (KD_TRIVIAL == NULL)\t\t\t\t// no trivial leaf node yet?\n\t\tKD_TRIVIAL = new ANNkd_leaf(0, IDX_TRIVIAL);\t// allocate it\n}\n\nANNkd_tree::ANNkd_tree(\t\t\t\t\t// basic constructor\n\t\tint n,\t\t\t\t\t\t\t// number of points\n\t\tint dd,\t\t\t\t\t\t\t// dimension\n\t\tint bs)\t\t\t\t\t\t\t// bucket size\n{  SkeletonTree(n, dd, bs);  }\t\t\t// construct skeleton tree\n\n//----------------------------------------------------------------------\n//\trkd_tree - recursive procedure to build a kd-tree\n//\n//\t\tBuilds a kd-tree for points in pa as indexed through the\n//\t\tarray pidx[0..n-1] (typically a subarray of the array used in\n//\t\tthe top-level call).  
This routine permutes the array pidx,\n//\t\tbut does not alter pa[].\n//\n//\t\tThe construction is based on a standard algorithm for constructing\n//\t\tthe kd-tree (see Friedman, Bentley, and Finkel, ``An algorithm for\n//\t\tfinding best matches in logarithmic expected time,'' ACM Transactions\n//\t\ton Mathematical Software, 3(3):209-226, 1977).  The procedure\n//\t\toperates by a simple divide-and-conquer strategy, which determines\n//\t\tan appropriate orthogonal cutting plane (see below), and splits\n//\t\tthe points.  When the number of points falls below the bucket size,\n//\t\twe simply store the points in a leaf node's bucket.\n//\n//\t\tOne of the arguments is a pointer to a splitting routine,\n//\t\twhose prototype is:\n//\n//\t\t\t\tvoid split(\n//\t\t\t\t\t\tANNpointArray pa,  // complete point array\n//\t\t\t\t\t\tANNidxArray pidx,  // point array (permuted on return)\n//\t\t\t\t\t\tANNorthRect &bnds, // bounds of current cell\n//\t\t\t\t\t\tint n,\t\t\t   // number of points\n//\t\t\t\t\t\tint dim,\t\t   // dimension of space\n//\t\t\t\t\t\tint &cut_dim,\t   // cutting dimension\n//\t\t\t\t\t\tANNcoord &cut_val, // cutting value\n//\t\t\t\t\t\tint &n_lo)\t\t   // no. 
of points on low side of cut\n//\n//\t\tThis procedure selects a cutting dimension and cutting value,\n//\t\tpartitions pa about these values, and returns the number of\n//\t\tpoints on the low side of the cut.\n//----------------------------------------------------------------------\n\nANNkd_ptr rkd_tree(\t\t\t\t// recursive construction of kd-tree\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices to store in subtree\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\tbsp,\t\t\t// bucket space\n\tANNorthRect\t\t\t&bnd_box,\t\t// bounding box for current node\n\tANNkd_splitter\t\tsplitter)\t\t// splitting routine\n{\n\tif (n <= bsp) {\t\t\t\t\t\t// n small, make a leaf node\n\t\tif (n == 0)\t\t\t\t\t\t// empty leaf node\n\t\t\treturn KD_TRIVIAL;\t\t\t// return (canonical) empty leaf\n\t\telse\t\t\t\t\t\t\t// construct the node and return\n\t\t\treturn new ANNkd_leaf(n, pidx);\n\t}\n\telse {\t\t\t\t\t\t\t\t// n large, make a splitting node\n\t\tint cd;\t\t\t\t\t\t\t// cutting dimension\n\t\tANNcoord cv;\t\t\t\t\t// cutting value\n\t\tint n_lo;\t\t\t\t\t\t// number on low side of cut\n\t\tANNkd_node *lo, *hi;\t\t\t// low and high children\n\n\t\t\t\t\t\t\t\t\t\t// invoke splitting procedure\n\t\t(*splitter)(pa, pidx, bnd_box, n, dim, cd, cv, n_lo);\n\n\t\tANNcoord lv = bnd_box.lo[cd];\t// save bounds for cutting dimension\n\t\tANNcoord hv = bnd_box.hi[cd];\n\n\t\tbnd_box.hi[cd] = cv;\t\t\t// modify bounds for left subtree\n\t\tlo = rkd_tree(\t\t\t\t\t// build left subtree\n\t\t\t\tpa, pidx, n_lo,\t\t\t// ...from pidx[0..n_lo-1]\n\t\t\t\tdim, bsp, bnd_box, splitter);\n\t\tbnd_box.hi[cd] = hv;\t\t\t// restore bounds\n\n\t\tbnd_box.lo[cd] = cv;\t\t\t// modify bounds for right subtree\n\t\thi = rkd_tree(\t\t\t\t\t// build right subtree\n\t\t\t\tpa, pidx + n_lo, n-n_lo,// ...from pidx[n_lo..n-1]\n\t\t\t\tdim, bsp, bnd_box, splitter);\n\t\tbnd_box.lo[cd] = lv;\t\t\t// 
restore bounds\n\n\t\t\t\t\t\t\t\t\t\t// create the splitting node\n\t\tANNkd_split *ptr = new ANNkd_split(cd, cv, lv, hv, lo, hi);\n\n\t\treturn ptr;\t\t\t\t\t\t// return pointer to this node\n\t}\n}\n\n//----------------------------------------------------------------------\n// kd-tree constructor\n//\t\tThis is the main constructor for kd-trees given a set of points.\n//\t\tIt first builds a skeleton tree, then computes the bounding box\n//\t\tof the data points, and then invokes rkd_tree() to actually\n//\t\tbuild the tree, passing it the appropriate splitting routine.\n//----------------------------------------------------------------------\n\nANNkd_tree::ANNkd_tree(\t\t\t\t\t// construct from point array\n\tANNpointArray\t\tpa,\t\t\t\t// point array (with at least n pts)\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdd,\t\t\t\t// dimension\n\tint\t\t\t\t\tbs,\t\t\t\t// bucket size\n\tANNsplitRule\t\tsplit)\t\t\t// splitting method\n{\n\tSkeletonTree(n, dd, bs);\t\t\t// set up the basic stuff\n\tpts = pa;\t\t\t\t\t\t\t// where the points are\n\tif (n == 0) return;\t\t\t\t\t// no points--no sweat\n\n\tANNorthRect bnd_box(dd);\t\t\t// bounding box for points\n\tannEnclRect(pa, pidx, n, dd, bnd_box);// construct bounding rectangle\n\t\t\t\t\t\t\t\t\t\t// copy to tree structure\n\tbnd_box_lo = annCopyPt(dd, bnd_box.lo);\n\tbnd_box_hi = annCopyPt(dd, bnd_box.hi);\n\n\tswitch (split) {\t\t\t\t\t// build by rule\n\tcase ANN_KD_STD:\t\t\t\t\t// standard kd-splitting rule\n\t\troot = rkd_tree(pa, pidx, n, dd, bs, bnd_box, kd_split);\n\t\tbreak;\n\tcase ANN_KD_MIDPT:\t\t\t\t\t// midpoint split\n\t\troot = rkd_tree(pa, pidx, n, dd, bs, bnd_box, midpt_split);\n\t\tbreak;\n\tcase ANN_KD_FAIR:\t\t\t\t\t// fair split\n\t\troot = rkd_tree(pa, pidx, n, dd, bs, bnd_box, fair_split);\n\t\tbreak;\n\tcase ANN_KD_SUGGEST:\t\t\t\t// best (in our opinion)\n\tcase ANN_KD_SL_MIDPT:\t\t\t\t// sliding midpoint split\n\t\troot = rkd_tree(pa, pidx, n, dd, bs, bnd_box, 
sl_midpt_split);\n\t\tbreak;\n\tcase ANN_KD_SL_FAIR:\t\t\t\t// sliding fair split\n\t\troot = rkd_tree(pa, pidx, n, dd, bs, bnd_box, sl_fair_split);\n\t\tbreak;\n\tdefault:\n\t\tannError(\"Illegal splitting method\", ANNabort);\n\t}\n}\n"
  },
  {
    "path": "src/ANN/kd_tree.h",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_tree.h\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tDeclarations for standard kd-tree routines\n// Last modified:\t05/03/05 (Version 1.1)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.1  05/03/05\n//\t\tAdded fixed radius kNN search\n//----------------------------------------------------------------------\n\n#ifndef ANN_kd_tree_H\n#define ANN_kd_tree_H\n\n#include \"ANNx.h\"\t\t\t\t\t// all ANN includes\n\nusing namespace std;\t\t\t\t\t// make std:: available\n\n//----------------------------------------------------------------------\n//\tGeneric kd-tree node\n//\n//\t\tNodes in kd-trees are of two types, splitting nodes which contain\n//\t\tsplitting information (a splitting hyperplane orthogonal to one\n//\t\tof the coordinate axes) and leaf nodes which contain point\n//\t\tinformation (an array of points stored in a bucket).  
This is\n//\t\thandled by making a generic class kd_node, which is essentially an\n//\t\tempty shell, and then deriving the leaf and splitting nodes from\n//\t\tthis.\n//----------------------------------------------------------------------\n\nclass ANNkd_node{\t\t\t\t\t\t// generic kd-tree node (empty shell)\npublic:\n\tvirtual ~ANNkd_node() {}\t\t\t\t\t// virtual destructor\n\n\tvirtual void ann_search(ANNdist) = 0;\t\t// tree search\n\tvirtual void ann_pri_search(ANNdist) = 0;\t// priority search\n\tvirtual void ann_FR_search(ANNdist) = 0;\t// fixed-radius search\n\n\tvirtual void getStats(\t\t\t\t\t\t// get tree statistics\n\t\t\t\tint dim,\t\t\t\t\t\t// dimension of space\n\t\t\t\tANNkdStats &st,\t\t\t\t\t// statistics\n\t\t\t\tANNorthRect &bnd_box) = 0;\t\t// bounding box\n\t\t\t\t\t\t\t\t\t\t\t\t// print node\n\tvirtual void print(int level, ostream &out) = 0;\n\tvirtual void dump(ostream &out) = 0;\t\t// dump node\n\n\tfriend class ANNkd_tree;\t\t\t\t\t// allow kd-tree to access us\n};\n\n//----------------------------------------------------------------------\n//\tkd-splitting function:\n//\t\tkd_splitter is a pointer to a splitting routine for preprocessing.\n//\t\tDifferent splitting procedures result in different strategies\n//\t\tfor building the tree.\n//----------------------------------------------------------------------\n\ntypedef void (*ANNkd_splitter)(\t\t\t// splitting routine for kd-trees\n\tANNpointArray\t\tpa,\t\t\t\t// point array (unaltered)\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices (permuted on return)\n\tconst ANNorthRect\t&bnds,\t\t\t// bounding rectangle for cell\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&cut_dim,\t\t// cutting dimension (returned)\n\tANNcoord\t\t\t&cut_val,\t\t// cutting value (returned)\n\tint\t\t\t\t\t&n_lo);\t\t\t// num of points on low side (returned)\n\n//----------------------------------------------------------------------\n//\tLeaf 
kd-tree node\n//\t\tLeaf nodes of the kd-tree store the set of points associated\n//\t\twith this bucket, stored as an array of point indices.  These\n//\t\tare indices in the array points, which resides with the\n//\t\troot of the kd-tree.  We also store the number of points\n//\t\tthat reside in this bucket.\n//----------------------------------------------------------------------\n\nclass ANNkd_leaf: public ANNkd_node\t\t// leaf node for kd-tree\n{\n\tint\t\t\t\t\tn_pts;\t\t\t// no. points in bucket\n\tANNidxArray\t\t\tbkt;\t\t\t// bucket of points\npublic:\n\tANNkd_leaf(\t\t\t\t\t\t\t// constructor\n\t\tint\t\t\t\tn,\t\t\t\t// number of points\n\t\tANNidxArray\t\tb)\t\t\t\t// bucket\n\t\t{\n\t\t\tn_pts\t\t= n;\t\t\t// number of points in bucket\n\t\t\tbkt\t\t\t= b;\t\t\t// the bucket\n\t\t}\n\n\t~ANNkd_leaf() { }\t\t\t\t\t// destructor (none)\n\n\tvirtual void getStats(\t\t\t\t\t\t// get tree statistics\n\t\t\t\tint dim,\t\t\t\t\t\t// dimension of space\n\t\t\t\tANNkdStats &st,\t\t\t\t\t// statistics\n\t\t\t\tANNorthRect &bnd_box);\t\t\t// bounding box\n\tvirtual void print(int level, ostream &out);// print node\n\tvirtual void dump(ostream &out);\t\t\t// dump node\n\n\tvirtual void ann_search(ANNdist);\t\t\t// standard search\n\tvirtual void ann_pri_search(ANNdist);\t\t// priority search\n\tvirtual void ann_FR_search(ANNdist);\t\t// fixed-radius search\n};\n\n//----------------------------------------------------------------------\n//\t\tKD_TRIVIAL is a special pointer to an empty leaf node. Since\n//\t\tsome splitting rules generate many (more than 50%) trivial\n//\t\tleaves, we use this one shared node to save space.\n//\n//\t\tThe pointer is initialized to NULL, but whenever a kd-tree is\n//\t\tcreated, we allocate this node, if it has not already been\n//\t\tallocated. 
This node is *never* deallocated, so it produces\n//\t\ta small memory leak.\n//----------------------------------------------------------------------\n\nextern ANNkd_leaf *KD_TRIVIAL;\t\t\t\t\t// trivial (empty) leaf node\n\n//----------------------------------------------------------------------\n//\tkd-tree splitting node.\n//\t\tSplitting nodes contain a cutting dimension and a cutting value.\n//\t\tThese indicate the axis-parallel plane which subdivides the\n//\t\tbox for this node. The extent of the bounding box along the\n//\t\tcutting dimension is maintained (this is used to speed up point\n//\t\tto box distance calculations) [we do not store the entire bounding\n//\t\tbox since this may be wasteful of space in high dimensions].\n//\t\tWe also store pointers to the 2 children.\n//----------------------------------------------------------------------\n\nclass ANNkd_split : public ANNkd_node\t// splitting node of a kd-tree\n{\n\tint\t\t\t\t\tcut_dim;\t\t// dim orthogonal to cutting plane\n\tANNcoord\t\t\tcut_val;\t\t// location of cutting plane\n\tANNcoord\t\t\tcd_bnds[2];\t\t// lower and upper bounds of\n\t\t\t\t\t\t\t\t\t\t// rectangle along cut_dim\n\tANNkd_ptr\t\t\tchild[2];\t\t// left and right children\npublic:\n\tANNkd_split(\t\t\t\t\t\t// constructor\n\t\tint cd,\t\t\t\t\t\t\t// cutting dimension\n\t\tANNcoord cv,\t\t\t\t\t// cutting value\n\t\tANNcoord lv, ANNcoord hv,\t\t\t\t// low and high values\n\t\tANNkd_ptr lc=NULL, ANNkd_ptr hc=NULL)\t// children\n\t\t{\n\t\t\tcut_dim\t\t= cd;\t\t\t\t\t// cutting dimension\n\t\t\tcut_val\t\t= cv;\t\t\t\t\t// cutting value\n\t\t\tcd_bnds[ANN_LO] = lv;\t\t\t\t// lower bound for rectangle\n\t\t\tcd_bnds[ANN_HI] = hv;\t\t\t\t// upper bound for rectangle\n\t\t\tchild[ANN_LO]\t= lc;\t\t\t\t// left child\n\t\t\tchild[ANN_HI]\t= hc;\t\t\t\t// right child\n\t\t}\n\n\t~ANNkd_split()\t\t\t\t\t\t// destructor\n\t\t{\n\t\t\tif (child[ANN_LO]!= NULL && child[ANN_LO]!= KD_TRIVIAL)\n\t\t\t\tdelete child[ANN_LO];\n\t\t\tif 
(child[ANN_HI]!= NULL && child[ANN_HI]!= KD_TRIVIAL)\n\t\t\t\tdelete child[ANN_HI];\n\t\t}\n\n\tvirtual void getStats(\t\t\t\t\t\t// get tree statistics\n\t\t\t\tint dim,\t\t\t\t\t\t// dimension of space\n\t\t\t\tANNkdStats &st,\t\t\t\t\t// statistics\n\t\t\t\tANNorthRect &bnd_box);\t\t\t// bounding box\n\tvirtual void print(int level, ostream &out);// print node\n\tvirtual void dump(ostream &out);\t\t\t// dump node\n\n\tvirtual void ann_search(ANNdist);\t\t\t// standard search\n\tvirtual void ann_pri_search(ANNdist);\t\t// priority search\n\tvirtual void ann_FR_search(ANNdist);\t\t// fixed-radius search\n};\n\n//----------------------------------------------------------------------\n//\t\tExternal entry points\n//----------------------------------------------------------------------\n\nANNkd_ptr rkd_tree(\t\t\t\t// recursive construction of kd-tree\n\tANNpointArray\t\tpa,\t\t\t\t// point array (unaltered)\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices to store in subtree\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\tbsp,\t\t\t// bucket space\n\tANNorthRect\t\t\t&bnd_box,\t\t// bounding box for current node\n\tANNkd_splitter\t\tsplitter);\t\t// splitting routine\n\n#endif\n"
  },
  {
    "path": "src/ANN/kd_util.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_util.cpp\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tCommon utilities for kd-trees\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#include \"kd_util.h\"\t\t\t\t\t// kd-utility declarations\n\n#include \"ANNperf.h\"\t\t\t\t// performance evaluation\n\n//----------------------------------------------------------------------\n// The following routines are utility functions for manipulating\n// points sets, used in determining splitting planes for kd-tree\n// construction.\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\tNOTE: Virtually all point indexing is done through an index (i.e.\n//\tpermutation) array pidx.  Consequently, a reference to the d-th\n//\tcoordinate of the i-th point is pa[pidx[i]][d].  
The macro PA(i,d)\n//\tis a shorthand for this.\n//----------------------------------------------------------------------\n\t\t\t\t\t\t\t\t\t\t// standard 2-d indirect indexing\n#define PA(i,d)\t\t\t(pa[pidx[(i)]][(d)])\n\t\t\t\t\t\t\t\t\t\t// accessing a single point\n#define PP(i)\t\t\t(pa[pidx[(i)]])\n\n//----------------------------------------------------------------------\n//\tannAspectRatio\n//\t\tCompute the aspect ratio (ratio of longest to shortest side)\n//\t\tof a rectangle.\n//----------------------------------------------------------------------\n\ndouble annAspectRatio(\n\tint\t\t\t\t\tdim,\t\t\t// dimension\n\tconst ANNorthRect\t&bnd_box)\t\t// bounding cube\n{\n\tANNcoord length = bnd_box.hi[0] - bnd_box.lo[0];\n\tANNcoord min_length = length;\t\t\t\t// min side length\n\tANNcoord max_length = length;\t\t\t\t// max side length\n\tfor (int d = 0; d < dim; d++) {\n\t\tlength = bnd_box.hi[d] - bnd_box.lo[d];\n\t\tif (length < min_length) min_length = length;\n\t\tif (length > max_length) max_length = length;\n\t}\n\treturn max_length/min_length;\n}\n\n//----------------------------------------------------------------------\n//\tannEnclRect, annEnclCube\n//\t\tThese utilities compute the smallest rectangle and cube enclosing\n//\t\ta set of points, respectively.\n//----------------------------------------------------------------------\n\nvoid annEnclRect(\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension\n\tANNorthRect\t\t\t&bnds)\t\t\t// bounding cube (returned)\n{\n\tfor (int d = 0; d < dim; d++) {\t\t// find smallest enclosing rectangle\n\t\tANNcoord lo_bnd = PA(0,d);\t\t// lower bound on dimension d\n\t\tANNcoord hi_bnd = PA(0,d);\t\t// upper bound on dimension d\n\t\tfor (int i = 0; i < n; i++) {\n\t\t\tif (PA(i,d) < lo_bnd) lo_bnd = PA(i,d);\n\t\t\telse if (PA(i,d) > hi_bnd) hi_bnd = PA(i,d);\n\t\t}\n\t\tbnds.lo[d] = 
lo_bnd;\n\t\tbnds.hi[d] = hi_bnd;\n\t}\n}\n\nvoid annEnclCube(\t\t\t\t\t\t// compute smallest enclosing cube\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension\n\tANNorthRect\t\t\t&bnds)\t\t\t// bounding cube (returned)\n{\n\tint d;\n\t\t\t\t\t\t\t\t\t\t// compute smallest enclosing rect\n\tannEnclRect(pa, pidx, n, dim, bnds);\n\n\tANNcoord max_len = 0;\t\t\t\t// max length of any side\n\tfor (d = 0; d < dim; d++) {\t\t\t// determine max side length\n\t\tANNcoord len = bnds.hi[d] - bnds.lo[d];\n\t\tif (len > max_len) {\t\t\t// update max_len if longest\n\t\t\tmax_len = len;\n\t\t}\n\t}\n\tfor (d = 0; d < dim; d++) {\t\t\t// grow sides to match max\n\t\tANNcoord len = bnds.hi[d] - bnds.lo[d];\n\t\tANNcoord half_diff = (max_len - len) / 2;\n\t\tbnds.lo[d] -= half_diff;\n\t\tbnds.hi[d] += half_diff;\n\t}\n}\n\n//----------------------------------------------------------------------\n//\tannBoxDistance - utility routine which computes distance from point to\n//\t\tbox (Note: most distances to boxes are computed using incremental\n//\t\tdistance updates, not this function.)\n//----------------------------------------------------------------------\n\nANNdist annBoxDistance(\t\t\t// compute distance from point to box\n\tconst ANNpoint\t\tq,\t\t\t\t// the point\n\tconst ANNpoint\t\tlo,\t\t\t\t// low point of box\n\tconst ANNpoint\t\thi,\t\t\t\t// high point of box\n\tint\t\t\t\t\tdim)\t\t\t// dimension of space\n{\n\tANNdist dist = 0.0;\t\t// sum of squared distances\n\tANNdist t;\n\n\tfor (int d = 0; d < dim; d++) {\n\t\tif (q[d] < lo[d]) {\t\t\t\t// q is left of box\n\t\t\tt = ANNdist(lo[d]) - ANNdist(q[d]);\n\t\t\tdist = ANN_SUM(dist, ANN_POW(t));\n\t\t}\n\t\telse if (q[d] > hi[d]) {\t\t// q is right of box\n\t\t\tt = ANNdist(q[d]) - ANNdist(hi[d]);\n\t\t\tdist = ANN_SUM(dist, ANN_POW(t));\n\t\t}\n\t}\n\tANN_FLOP(4*dim)\t\t\t\t\t\t// increment 
floating op count\n\n\treturn dist;\n}\n\n//----------------------------------------------------------------------\n//\tannSpread - find spread along given dimension\n//\tannMinMax - find min and max coordinates along given dimension\n//\tannMaxSpread - find dimension of max spread\n//----------------------------------------------------------------------\n\nANNcoord annSpread(\t\t\t\t// compute point spread along dimension\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\td)\t\t\t\t// dimension to check\n{\n\tANNcoord min = PA(0,d);\t\t\t\t// compute max and min coords\n\tANNcoord max = PA(0,d);\n\tfor (int i = 1; i < n; i++) {\n\t\tANNcoord c = PA(i,d);\n\t\tif (c < min) min = c;\n\t\telse if (c > max) max = c;\n\t}\n\treturn (max - min);\t\t\t\t\t// total spread is difference\n}\n\nvoid annMinMax(\t\t\t\t\t// compute min and max coordinates along dim\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\td,\t\t\t\t// dimension to check\n\tANNcoord\t\t\t&min,\t\t\t// minimum value (returned)\n\tANNcoord\t\t\t&max)\t\t\t// maximum value (returned)\n{\n\tmin = PA(0,d);\t\t\t\t\t\t// compute max and min coords\n\tmax = PA(0,d);\n\tfor (int i = 1; i < n; i++) {\n\t\tANNcoord c = PA(i,d);\n\t\tif (c < min) min = c;\n\t\telse if (c > max) max = c;\n\t}\n}\n\nint annMaxSpread(\t\t\t\t\t\t// compute dimension of max spread\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim)\t\t\t// dimension of space\n{\n\tint max_dim = 0;\t\t\t\t\t// dimension of max spread\n\tANNcoord max_spr = 0;\t\t\t\t// amount of max spread\n\n\tif (n == 0) return max_dim;\t\t\t// no points, who cares?\n\n\tfor (int d = 0; d < dim; d++) {\t\t// compute spread along each dim\n\t\tANNcoord 
spr = annSpread(pa, pidx, n, d);\n\t\tif (spr > max_spr) {\t\t\t// bigger than current max\n\t\t\tmax_spr = spr;\n\t\t\tmax_dim = d;\n\t\t}\n\t}\n\treturn max_dim;\n}\n\n//----------------------------------------------------------------------\n//\tannMedianSplit - split point array about its median\n//\t\tSplits a subarray of points pa[0..n] about an element of given\n//\t\trank (median: n_lo = n/2) with respect to dimension d.  It places\n//\t\tthe element of rank n_lo-1 correctly (because our splitting rule\n//\t\ttakes the mean of these two).  On exit, the array is permuted so\n//\t\tthat:\n//\n//\t\tpa[0..n_lo-2][d] <= pa[n_lo-1][d] <= pa[n_lo][d] <= pa[n_lo+1..n-1][d].\n//\n//\t\tThe mean of pa[n_lo-1][d] and pa[n_lo][d] is returned as the\n//\t\tsplitting value.\n//\n//\t\tAll indexing is done indirectly through the index array pidx.\n//\n//\t\tThis function uses the well known selection algorithm due to\n//\t\tC.A.R. Hoare.\n//----------------------------------------------------------------------\n\n\t\t\t\t\t\t\t\t\t\t// swap two points in pa array\n#define PASWAP(a,b) { int tmp = pidx[a]; pidx[a] = pidx[b]; pidx[b] = tmp; }\n\nvoid annMedianSplit(\n\tANNpointArray\t\tpa,\t\t\t\t// points to split\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\td,\t\t\t\t// dimension along which to split\n\tANNcoord\t\t\t&cv,\t\t\t// cutting value\n\tint\t\t\t\t\tn_lo)\t\t\t// split into n_lo and n-n_lo\n{\n\tint l = 0;\t\t\t\t\t\t\t// left end of current subarray\n\tint r = n-1;\t\t\t\t\t\t// right end of current subarray\n\twhile (l < r) {\n\t\tint i = (r+l)/2;\t\t// select middle as pivot\n\t\tint k;\n\n\t\tif (PA(i,d) > PA(r,d))\t\t\t// make sure last > pivot\n\t\t\tPASWAP(i,r)\n\t\tPASWAP(l,i);\t\t\t\t\t// move pivot to first position\n\n\t\tANNcoord c = PA(l,d);\t\t\t// pivot value\n\t\ti = l;\n\t\tk = r;\n\t\tfor(;;) {\t\t\t\t\t\t// pivot about c\n\t\t\twhile (PA(++i,d) < c) ;\n\t\t\twhile (PA(--k,d) > c) 
;\n\t\t\tif (i < k) PASWAP(i,k) else break;\n\t\t}\n\t\tPASWAP(l,k);\t\t\t\t\t// pivot winds up in location k\n\n\t\tif (k > n_lo)\t   r = k-1;\t\t// recurse on proper subarray\n\t\telse if (k < n_lo) l = k+1;\n\t\telse break;\t\t\t\t\t\t// got the median exactly\n\t}\n\tif (n_lo > 0) {\t\t\t\t\t\t// search for next smaller item\n\t\tANNcoord c = PA(0,d);\t\t\t// candidate for max\n\t\tint k = 0;\t\t\t\t\t\t// candidate's index\n\t\tfor (int i = 1; i < n_lo; i++) {\n\t\t\tif (PA(i,d) > c) {\n\t\t\t\tc = PA(i,d);\n\t\t\t\tk = i;\n\t\t\t}\n\t\t}\n\t\tPASWAP(n_lo-1, k);\t\t\t\t// max among pa[0..n_lo-1] to pa[n_lo-1]\n\t}\n\t\t\t\t\t\t\t\t\t\t// cut value is midpoint value\n\tcv = (PA(n_lo-1,d) + PA(n_lo,d))/2.0;\n}\n\n//----------------------------------------------------------------------\n//\tannPlaneSplit - split point array about a cutting plane\n//\t\tSplit the points in an array about a given plane along a\n//\t\tgiven cutting dimension.  On exit, br1 and br2 are set so\n//\t\tthat:\n//\n//\t\t\t\tpa[ 0 ..br1-1] <  cv\n//\t\t\t\tpa[br1..br2-1] == cv\n//\t\t\t\tpa[br2.. 
n -1] >  cv\n//\n//\t\tAll indexing is done indirectly through the index array pidx.\n//\n//----------------------------------------------------------------------\n\nvoid annPlaneSplit(\t\t\t\t// split points by a plane\n\tANNpointArray\t\tpa,\t\t\t\t// points to split\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\td,\t\t\t\t// dimension along which to split\n\tANNcoord\t\t\tcv,\t\t\t\t// cutting value\n\tint\t\t\t\t\t&br1,\t\t\t// first break (values < cv)\n\tint\t\t\t\t\t&br2)\t\t\t// second break (values == cv)\n{\n\tint l = 0;\n\tint r = n-1;\n\tfor(;;) {\t\t\t\t\t\t\t// partition pa[0..n-1] about cv\n\t\twhile (l < n && PA(l,d) < cv) l++;\n\t\twhile (r >= 0 && PA(r,d) >= cv) r--;\n\t\tif (l > r) break;\n\t\tPASWAP(l,r);\n\t\tl++; r--;\n\t}\n\tbr1 = l;\t\t\t\t\t// now: pa[0..br1-1] < cv <= pa[br1..n-1]\n\tr = n-1;\n\tfor(;;) {\t\t\t\t\t\t\t// partition pa[br1..n-1] about cv\n\t\twhile (l < n && PA(l,d) <= cv) l++;\n\t\twhile (r >= br1 && PA(r,d) > cv) r--;\n\t\tif (l > r) break;\n\t\tPASWAP(l,r);\n\t\tl++; r--;\n\t}\n\tbr2 = l;\t\t\t\t\t// now: pa[br1..br2-1] == cv < pa[br2..n-1]\n}\n\n\n//----------------------------------------------------------------------\n//\tannBoxSplit - split point array about an orthogonal rectangle\n//\t\tSplit the points in an array about a given orthogonal\n//\t\trectangle.  
On exit, n_in is set to the number of points\n//\t\tthat are inside (or on the boundary of) the rectangle.\n//\n//\t\tAll indexing is done indirectly through the index array pidx.\n//\n//----------------------------------------------------------------------\n\nvoid annBoxSplit(\t\t\t\t// split points by a box\n\tANNpointArray\t\tpa,\t\t\t\t// points to split\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tANNorthRect\t\t\t&box,\t\t\t// the box\n\tint\t\t\t\t\t&n_in)\t\t\t// number of points inside (returned)\n{\n\tint l = 0;\n\tint r = n-1;\n\tfor(;;) {\t\t\t\t\t\t\t// partition pa[0..n-1] about box\n\t\twhile (l < n && box.inside(dim, PP(l))) l++;\n\t\twhile (r >= 0 && !box.inside(dim, PP(r))) r--;\n\t\tif (l > r) break;\n\t\tPASWAP(l,r);\n\t\tl++; r--;\n\t}\n\tn_in = l;\t\t\t\t\t// now: pa[0..n_in-1] inside and rest outside\n}\n\n//----------------------------------------------------------------------\n//\tannSplitBalance - compute balance factor for a given plane split\n//\t\tBalance factor is defined as the number of points lying\n//\t\tbelow the splitting value minus n/2 (median).  Thus, a\n//\t\tmedian split has balance 0, left of this is negative and\n//\t\tright of this is positive.  
(The points are unchanged.)\n//----------------------------------------------------------------------\n\nint annSplitBalance(\t\t\t// determine balance factor of a split\n\tANNpointArray\t\tpa,\t\t\t\t// points to split\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\td,\t\t\t\t// dimension along which to split\n\tANNcoord\t\t\tcv)\t\t\t\t// cutting value\n{\n\tint n_lo = 0;\n\tfor(int i = 0; i < n; i++) {\t\t// count number less than cv\n\t\tif (PA(i,d) < cv) n_lo++;\n\t}\n\treturn n_lo - n/2;\n}\n\n//----------------------------------------------------------------------\n//\tannBox2Bnds - convert bounding box to list of bounds\n//\t\tGiven two boxes, an inner box enclosed within a bounding\n//\t\tbox, this routine determines all the sides for which the\n//\t\tinner box is strictly contained within the bounding box,\n//\t\tand adds an appropriate entry to a list of bounds.  Then\n//\t\twe allocate storage for the final list of bounds, and return\n//\t\tthe resulting list and its size.\n//----------------------------------------------------------------------\n\nvoid annBox2Bnds(\t\t\t\t\t\t// convert inner box to bounds\n\tconst ANNorthRect\t&inner_box,\t\t// inner box\n\tconst ANNorthRect\t&bnd_box,\t\t// enclosing box\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&n_bnds,\t\t// number of bounds (returned)\n\tANNorthHSArray\t\t&bnds)\t\t\t// bounds array (returned)\n{\n\tint i;\n\tn_bnds = 0;\t\t\t\t\t\t\t\t\t// count number of bounds\n\tfor (i = 0; i < dim; i++) {\n\t\tif (inner_box.lo[i] > bnd_box.lo[i])\t// low bound is inside\n\t\t\t\tn_bnds++;\n\t\tif (inner_box.hi[i] < bnd_box.hi[i])\t// high bound is inside\n\t\t\t\tn_bnds++;\n\t}\n\n\tbnds = new ANNorthHalfSpace[n_bnds];\t\t// allocate appropriate size\n\n\tint j = 0;\n\tfor (i = 0; i < dim; i++) {\t\t\t\t\t// fill the array\n\t\tif (inner_box.lo[i] > bnd_box.lo[i]) {\n\t\t\t\tbnds[j].cd = i;\n\t\t\t\tbnds[j].cv = 
inner_box.lo[i];\n\t\t\t\tbnds[j].sd = +1;\n\t\t\t\tj++;\n\t\t}\n\t\tif (inner_box.hi[i] < bnd_box.hi[i]) {\n\t\t\t\tbnds[j].cd = i;\n\t\t\t\tbnds[j].cv = inner_box.hi[i];\n\t\t\t\tbnds[j].sd = -1;\n\t\t\t\tj++;\n\t\t}\n\t}\n}\n\n//----------------------------------------------------------------------\n//\tannBnds2Box - convert list of bounds to bounding box\n//\t\tGiven an enclosing box and a list of bounds, this routine\n//\t\tcomputes the corresponding inner box.  It is assumed that\n//\t\tthe box points have been allocated already.\n//----------------------------------------------------------------------\n\nvoid annBnds2Box(\n\tconst ANNorthRect\t&bnd_box,\t\t// enclosing box\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\tn_bnds,\t\t\t// number of bounds\n\tANNorthHSArray\t\tbnds,\t\t\t// bounds array\n\tANNorthRect\t\t\t&inner_box)\t\t// inner box (returned)\n{\n\tannAssignRect(dim, inner_box, bnd_box);\t\t// copy bounding box to inner\n\n\tfor (int i = 0; i < n_bnds; i++) {\n\t\tbnds[i].project(inner_box.lo);\t\t\t// project each endpoint\n\t\tbnds[i].project(inner_box.hi);\n\t}\n}\n"
  },
  {
    "path": "src/ANN/kd_util.h",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tkd_util.h\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tCommon utilities for kd- trees\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n// \n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n// \n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#ifndef ANN_kd_util_H\n#define ANN_kd_util_H\n\n#include \"kd_tree.h\"\t\t\t\t\t// kd-tree declarations\n\n//----------------------------------------------------------------------\n//\texternally accessible functions\n//----------------------------------------------------------------------\n\ndouble annAspectRatio(\t\t\t// compute aspect ratio of box\n\tint\t\t\t\t\tdim,\t\t\t// dimension\n\tconst ANNorthRect\t&bnd_box);\t\t// bounding cube\n\nvoid annEnclRect(\t\t\t\t// compute smallest enclosing rectangle\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension\n\tANNorthRect &bnds);\t\t\t\t\t// bounding cube (returned)\n\nvoid annEnclCube(\t\t\t\t// compute smallest enclosing cube\n\tANNpointArray\t\tpa,\t\t\t\t// point 
array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension\n\tANNorthRect &bnds);\t\t\t\t\t// bounding cube (returned)\n\nANNdist annBoxDistance(\t\t\t// compute distance from point to box\n\tconst ANNpoint\t\tq,\t\t\t\t// the point\n\tconst ANNpoint\t\tlo,\t\t\t\t// low point of box\n\tconst ANNpoint\t\thi,\t\t\t\t// high point of box\n\tint\t\t\t\t\tdim);\t\t\t// dimension of space\n\nANNcoord annSpread(\t\t\t\t// compute point spread along dimension\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\td);\t\t\t\t// dimension to check\n\nvoid annMinMax(\t\t\t\t\t// compute min and max coordinates along dim\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\td,\t\t\t\t// dimension to check\n\tANNcoord&\t\t\tmin,\t\t\t// minimum value (returned)\n\tANNcoord&\t\t\tmax);\t\t\t// maximum value (returned)\n\nint annMaxSpread(\t\t\t\t// compute dimension of max spread\n\tANNpointArray\t\tpa,\t\t\t\t// point array\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim);\t\t\t// dimension of space\n\nvoid annMedianSplit(\t\t\t// split points along median value\n\tANNpointArray\t\tpa,\t\t\t\t// points to split\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\td,\t\t\t\t// dimension along which to split\n\tANNcoord\t\t\t&cv,\t\t\t// cutting value\n\tint\t\t\t\t\tn_lo);\t\t\t// split into n_lo and n-n_lo\n\nvoid annPlaneSplit(\t\t\t\t// split points by a plane\n\tANNpointArray\t\tpa,\t\t\t\t// points to split\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\td,\t\t\t\t// dimension along which to 
split\n\tANNcoord\t\t\tcv,\t\t\t\t// cutting value\n\tint\t\t\t\t\t&br1,\t\t\t// first break (values < cv)\n\tint\t\t\t\t\t&br2);\t\t\t// second break (values == cv)\n\nvoid annBoxSplit(\t\t\t\t// split points by a box\n\tANNpointArray\t\tpa,\t\t\t\t// points to split\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tANNorthRect\t\t\t&box,\t\t\t// the box\n\tint\t\t\t\t\t&n_in);\t\t\t// number of points inside (returned)\n\nint annSplitBalance(\t\t\t// determine balance factor of a split\n\tANNpointArray\t\tpa,\t\t\t\t// points to split\n\tANNidxArray\t\t\tpidx,\t\t\t// point indices\n\tint\t\t\t\t\tn,\t\t\t\t// number of points\n\tint\t\t\t\t\td,\t\t\t\t// dimension along which to split\n\tANNcoord\t\t\tcv);\t\t\t// cutting value\n\nvoid annBox2Bnds(\t\t\t\t// convert inner box to bounds\n\tconst ANNorthRect\t&inner_box,\t\t// inner box\n\tconst ANNorthRect\t&bnd_box,\t\t// enclosing box\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\t&n_bnds,\t\t// number of bounds (returned)\n\tANNorthHSArray\t\t&bnds);\t\t\t// bounds array (returned)\n\nvoid annBnds2Box(\t\t\t\t// convert bounds to inner box\n\tconst ANNorthRect\t&bnd_box,\t\t// enclosing box\n\tint\t\t\t\t\tdim,\t\t\t// dimension of space\n\tint\t\t\t\t\tn_bnds,\t\t\t// number of bounds\n\tANNorthHSArray\t\tbnds,\t\t\t// bounds array\n\tANNorthRect\t\t\t&inner_box);\t// inner box (returned)\n\n#endif\n"
  },
  {
    "path": "src/ANN/perf.cpp",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tperf.cpp\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tMethods for performance stats\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//\tRevision 1.0  04/01/05\n//\t\tChanged names to avoid namespace conflicts.\n//\t\tAdded flush after printing performance stats to fix bug\n//\t\t\tin Microsoft Windows version.\n//----------------------------------------------------------------------\n\n#include \"ANN.h\"\t\t\t\t\t// basic ANN includes\n#include \"ANNperf.h\"\t\t\t\t// performance includes\n\nusing namespace std;\t\t\t\t\t// make std:: available\n\n//----------------------------------------------------------------------\n//\tPerformance statistics\n//\t\tThe following data and routines are used for computing\n//\t\tperformance statistics for nearest neighbor searching.\n//\t\tBecause these routines can slow the code down, they can be\n//\t\tactivated and deactivated by defining the PERF variable,\n//\t\tby compiling with the option: 
-DPERF\n//----------------------------------------------------------------------\n\n//----------------------------------------------------------------------\n//\tGlobal counters for performance measurement\n//----------------------------------------------------------------------\n\nint\t\t\t\tann_Ndata_pts  = 0;\t\t// number of data points\nint\t\t\t\tann_Nvisit_lfs = 0;\t\t// number of leaf nodes visited\nint\t\t\t\tann_Nvisit_spl = 0;\t\t// number of splitting nodes visited\nint\t\t\t\tann_Nvisit_shr = 0;\t\t// number of shrinking nodes visited\nint\t\t\t\tann_Nvisit_pts = 0;\t\t// visited points for one query\nint\t\t\t\tann_Ncoord_hts = 0;\t\t// coordinate hits for one query\nint\t\t\t\tann_Nfloat_ops = 0;\t\t// floating ops for one query\nANNsampStat\t\tann_visit_lfs;\t\t\t// stats on leaf nodes visits\nANNsampStat\t\tann_visit_spl;\t\t\t// stats on splitting nodes visits\nANNsampStat\t\tann_visit_shr;\t\t\t// stats on shrinking nodes visits\nANNsampStat\t\tann_visit_nds;\t\t\t// stats on total nodes visits\nANNsampStat\t\tann_visit_pts;\t\t\t// stats on points visited\nANNsampStat\t\tann_coord_hts;\t\t\t// stats on coordinate hits\nANNsampStat\t\tann_float_ops;\t\t\t// stats on floating ops\n//\nANNsampStat\t\tann_average_err;\t\t// average error\nANNsampStat\t\tann_rank_err;\t\t\t// rank error\n\n//----------------------------------------------------------------------\n//\tRoutines for statistics.\n//----------------------------------------------------------------------\n\nDLL_API void annResetStats(int data_size) // reset stats for a set of queries\n{\n\tann_Ndata_pts  = data_size;\n\tann_visit_lfs.reset();\n\tann_visit_spl.reset();\n\tann_visit_shr.reset();\n\tann_visit_nds.reset();\n\tann_visit_pts.reset();\n\tann_coord_hts.reset();\n\tann_float_ops.reset();\n\tann_average_err.reset();\n\tann_rank_err.reset();\n}\n\nDLL_API void annResetCounts()\t\t\t\t// reset counts for one query\n{\n\tann_Nvisit_lfs = 0;\n\tann_Nvisit_spl = 0;\n\tann_Nvisit_shr = 
0;\n\tann_Nvisit_pts = 0;\n\tann_Ncoord_hts = 0;\n\tann_Nfloat_ops = 0;\n}\n\nDLL_API void annUpdateStats()\t\t\t\t// update stats with current counts\n{\n\tann_visit_lfs += ann_Nvisit_lfs;\n\tann_visit_nds += ann_Nvisit_spl + ann_Nvisit_lfs;\n\tann_visit_spl += ann_Nvisit_spl;\n\tann_visit_shr += ann_Nvisit_shr;\n\tann_visit_pts += ann_Nvisit_pts;\n\tann_coord_hts += ann_Ncoord_hts;\n\tann_float_ops += ann_Nfloat_ops;\n}\n\n\t\t\t\t\t\t\t\t\t\t// print a single statistic\nvoid print_one_stat(const char *title, ANNsampStat s, double div)\n{\n//R does not allow:\tcout << title << \"= [ \";\n//R does not allow:\tcout.width(9); cout << s.mean()/div\t\t\t<< \" : \";\n//R does not allow:\tcout.width(9); cout << s.stdDev()/div\t\t<< \" ]<\";\n//R does not allow:\tcout.width(9); cout << s.min()/div\t\t\t<< \" , \";\n//R does not allow: cout.width(9); cout << s.max()/div\t\t\t<< \" >\\n\";\n}\n\nDLL_API void annPrintStats(\t\t\t\t// print statistics for a run\n\tANNbool validate)\t\t\t\t\t// true if average errors desired\n{\n//R does not allow:\tcout.precision(4);\t\t\t\t\t// set floating precision\n//R does not allow:\tcout << \"  (Performance stats: \"\n//R does not allow:\t\t\t << \" [      mean :    stddev ]<      min ,       max >\\n\";\n\tprint_one_stat(\"    leaf_nodes       \", ann_visit_lfs, 1);\n\tprint_one_stat(\"    splitting_nodes  \", ann_visit_spl, 1);\n\tprint_one_stat(\"    shrinking_nodes  \", ann_visit_shr, 1);\n\tprint_one_stat(\"    total_nodes      \", ann_visit_nds, 1);\n\tprint_one_stat(\"    points_visited   \", ann_visit_pts, 1);\n\tprint_one_stat(\"    coord_hits/pt    \", ann_coord_hts, ann_Ndata_pts);\n\tprint_one_stat(\"    floating_ops_(K) \", ann_float_ops, 1000);\n\tif (validate) {\n\t\tprint_one_stat(\"    average_error    \", ann_average_err, 1);\n\t\tprint_one_stat(\"    rank_error       \", ann_rank_err, 1);\n\t}\n//R does not allow:\tcout.precision(0);\t\t\t\t\t// restore the default\n//R does not allow:\tcout << \"  )\\n\";\n//R does 
not allow:\tcout.flush();\n}\n"
  },
  {
    "path": "src/ANN/pr_queue.h",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tpr_queue.h\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tInclude file for priority queue and related\n// \t\t\t\t\tstructures.\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#ifndef PR_QUEUE_H\n#define PR_QUEUE_H\n\n#include \"ANNx.h\"\t\t\t\t\t// all ANN includes\n#include \"ANNperf.h\"\t\t\t\t// performance evaluation\n\n//----------------------------------------------------------------------\n//\tBasic types.\n//----------------------------------------------------------------------\ntypedef void\t\t\t*PQinfo;\t\t// info field is generic pointer\ntypedef ANNdist\t\t\tPQkey;\t\t\t// key field is distance\n\n//----------------------------------------------------------------------\n//\tPriority queue\n//\t\tA priority queue is a list of items, along with associated\n//\t\tpriorities.  
The basic operations are insert and extract_minimum.\n//\n//\t\tThe priority queue is maintained using a standard binary heap.\n//\t\t(Implementation note: Indexing is performed from [1..max] rather\n//\t\tthan the C standard of [0..max-1].  This simplifies parent/child\n//\t\tcomputations.)  User information consists of a void pointer,\n//\t\tand the user is responsible for casting this quantity into whatever\n//\t\tuseful form is desired.\n//\n//\t\tBecause the priority queue is so central to the efficiency of\n//\t\tquery processing, all the code is inline.\n//----------------------------------------------------------------------\n\nclass ANNpr_queue {\n\n\tstruct pq_node {\t\t\t\t\t// node in priority queue\n\t\tPQkey\t\t\tkey;\t\t\t// key value\n\t\tPQinfo\t\t\tinfo;\t\t\t// info field\n\t};\n\tint\t\t\tn;\t\t\t\t\t\t// number of items in queue\n\tint\t\t\tmax_size;\t\t\t\t// maximum queue size\n\tpq_node\t\t*pq;\t\t\t\t\t// the priority queue (array of nodes)\n\npublic:\n\tANNpr_queue(int max)\t\t\t\t// constructor (given max size)\n\t\t{\n\t\t\tn = 0;\t\t\t\t\t\t// initially empty\n\t\t\tmax_size = max;\t\t\t\t// maximum number of items\n\t\t\tpq = new pq_node[max+1];\t// queue is array [1..max] of nodes\n\t\t}\n\n\t~ANNpr_queue()\t\t\t\t\t\t// destructor\n\t\t{ delete [] pq; }\n\n\tANNbool empty()\t\t\t\t\t\t// is queue empty?\n\t\t{ if (n==0) return ANNtrue; else return ANNfalse; }\n\n\tANNbool non_empty()\t\t\t\t\t// is queue nonempty?\n\t\t{ if (n==0) return ANNfalse; else return ANNtrue; }\n\n\tvoid reset()\t\t\t\t\t\t// make existing queue empty\n\t\t{ n = 0; }\n\n\tinline void insert(\t\t\t\t\t// insert item (inlined for speed)\n\t\tPQkey kv,\t\t\t\t\t\t// key value\n\t\tPQinfo inf)\t\t\t\t\t\t// item info\n\t\t{\n\t\t\tif (++n > max_size) annError(\"Priority queue overflow.\", ANNabort);\n\t\t\tint r = n;\n\t\t\twhile (r > 1) {\t\t\t\t// sift up new item\n\t\t\t\tint p = r/2;\n\t\t\t\tANN_FLOP(1)\t\t\t\t// increment floating ops\n\t\t\t\tif 
(pq[p].key <= kv)\t// in proper order\n\t\t\t\t\tbreak;\n\t\t\t\tpq[r] = pq[p];\t\t\t// else swap with parent\n\t\t\t\tr = p;\n\t\t\t}\n\t\t\tpq[r].key = kv;\t\t\t\t// insert new item at final location\n\t\t\tpq[r].info = inf;\n\t\t}\n\n\tinline void extr_min(\t\t\t\t// extract minimum (inlined for speed)\n\t\tPQkey &kv,\t\t\t\t\t\t// key (returned)\n\t\tPQinfo &inf)\t\t\t\t\t// item info (returned)\n\t\t{\n\t\t\tkv = pq[1].key;\t\t\t\t// key of min item\n\t\t\tinf = pq[1].info;\t\t\t// information of min item\n\t\t\tPQkey kn = pq[n--].key;// last item in queue\n\t\t\tint p = 1;\t\t\t// p points to item out of position\n\t\t\tint r = p<<1;\t\t// left child of p\n\t\t\twhile (r <= n) {\t\t\t// while r is still within the heap\n\t\t\t\tANN_FLOP(2)\t\t\t\t// increment floating ops\n\t\t\t\t\t\t\t\t\t\t// set r to smaller child of p\n\t\t\t\tif (r < n  && pq[r].key > pq[r+1].key) r++;\n\t\t\t\tif (kn <= pq[r].key)\t// in proper order\n\t\t\t\t\tbreak;\n\t\t\t\tpq[p] = pq[r];\t\t\t// else swap with child\n\t\t\t\tp = r;\t\t\t\t\t// advance pointers\n\t\t\t\tr = p<<1;\n\t\t\t}\n\t\t\tpq[p] = pq[n+1];\t\t\t// insert last item in proper place\n\t\t}\n};\n\n#endif\n"
  },
  {
    "path": "src/ANN/pr_queue_k.h",
    "content": "//----------------------------------------------------------------------\n// File:\t\t\tpr_queue_k.h\n// Programmer:\t\tSunil Arya and David Mount\n// Description:\t\tInclude file for priority queue with k items.\n// Last modified:\t01/04/05 (Version 1.0)\n//----------------------------------------------------------------------\n// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and\n// David Mount.  All Rights Reserved.\n//\n// This software and related documentation is part of the Approximate\n// Nearest Neighbor Library (ANN).  This software is provided under\n// the provisions of the Lesser GNU Public License (LGPL).  See the\n// file ../ReadMe.txt for further information.\n//\n// The University of Maryland (U.M.) and the authors make no\n// representations about the suitability or fitness of this software for\n// any purpose.  It is provided \"as is\" without express or implied\n// warranty.\n//----------------------------------------------------------------------\n// History:\n//\tRevision 0.1  03/04/98\n//\t\tInitial release\n//----------------------------------------------------------------------\n\n#ifndef PR_QUEUE_K_H\n#define PR_QUEUE_K_H\n\n#include \"ANNx.h\"\t\t\t\t\t// all ANN includes\n#include \"ANNperf.h\"\t\t\t\t// performance evaluation\n\n//----------------------------------------------------------------------\n//\tBasic types\n//----------------------------------------------------------------------\ntypedef ANNdist\t\t\tPQKkey;\t\t\t// key field is distance\ntypedef int\t\t\t\tPQKinfo;\t\t// info field is int\n\n//----------------------------------------------------------------------\n//\tConstants\n//\t\tThe NULL key value is used to initialize the priority queue, and\n//\t\tso it should be larger than any valid distance, so that it will\n//\t\tbe replaced as legal distance values are inserted.  
The NULL\n//\t\tinfo value must be a nonvalid array index; we use ANN_NULL_IDX,\n//\t\twhich is guaranteed to be negative.\n//----------------------------------------------------------------------\n\nconst PQKkey\tPQ_NULL_KEY  =  ANN_DIST_INF;\t// nonexistent key value\nconst PQKinfo\tPQ_NULL_INFO =  ANN_NULL_IDX;\t// nonexistent info value\n\n//----------------------------------------------------------------------\n//\tANNmin_k\n//\t\tAn ANNmin_k structure is one which maintains the smallest\n//\t\tk values (of type PQKkey) and associated information (of type\n//\t\tPQKinfo).  The special info and key values PQ_NULL_INFO and\n//\t\tPQ_NULL_KEY mean that this entry is empty.\n//\n//\t\tIt is currently implemented using an array with k items.\n//\t\tItems are stored in increasing sorted order, and insertions\n//\t\tare made through standard insertion sort.  (This is quite\n//\t\tinefficient, but current applications call for small values\n//\t\tof k and relatively few insertions.)\n//\n//\t\tNote that the list contains k+1 entries, but the last entry\n//\t\tis used as a simple placeholder and is otherwise ignored.\n//----------------------------------------------------------------------\n\nclass ANNmin_k {\n\tstruct mk_node {\t\t\t\t\t// node in min_k structure\n\t\tPQKkey\t\t\tkey;\t\t\t// key value\n\t\tPQKinfo\t\t\tinfo;\t\t\t// info field (user defined)\n\t};\n\n\tint\t\t\tk;\t\t\t\t\t\t// max number of keys to store\n\tint\t\t\tn;\t\t\t\t\t\t// number of keys currently active\n\tmk_node\t\t*mk;\t\t\t\t\t// the list itself\n\npublic:\n\tANNmin_k(int max)\t\t\t\t\t// constructor (given max size)\n\t\t{\n\t\t\tn = 0;\t\t\t\t\t\t// initially no items\n\t\t\tk = max;\t\t\t\t\t// maximum number of items\n\t\t\tmk = new mk_node[max+1];\t// sorted array of keys\n\t\t}\n\n\t~ANNmin_k()\t\t\t\t\t\t\t// destructor\n\t\t{ delete [] mk; }\n\n\tPQKkey ANNmin_key()\t\t\t\t\t// return minimum key\n\t\t{ return (n > 0 ? 
mk[0].key : PQ_NULL_KEY); }\n\n\tPQKkey max_key()\t\t\t\t\t// return maximum key\n\t\t{ return (n == k ? mk[k-1].key : PQ_NULL_KEY); }\n\n\tPQKkey ith_smallest_key(int i)\t\t// ith smallest key (i in [0..n-1])\n\t\t{ return (i < n ? mk[i].key : PQ_NULL_KEY); }\n\n\tPQKinfo ith_smallest_info(int i)\t// info for ith smallest (i in [0..n-1])\n\t\t{ return (i < n ? mk[i].info : PQ_NULL_INFO); }\n\n\tinline void insert(\t\t\t\t\t// insert item (inlined for speed)\n\t\tPQKkey kv,\t\t\t\t\t\t// key value\n\t\tPQKinfo inf)\t\t\t\t\t// item info\n\t\t{\n\t\t\tint i;\n\t\t\t\t\t\t\t\t\t\t// slide larger values up\n\t\t\tfor (i = n; i > 0; i--) {\n\t\t\t\tif (mk[i-1].key > kv)\n\t\t\t\t\tmk[i] = mk[i-1];\n\t\t\t\telse\n\t\t\t\t\tbreak;\n\t\t\t}\n\t\t\tmk[i].key = kv;\t\t\t\t// store element here\n\t\t\tmk[i].info = inf;\n\t\t\tif (n < k) n++;\t\t\t\t// increment number of items\n\t\t\tANN_FLOP(k-i+1)\t\t\t\t// increment floating ops\n\t\t}\n};\n\n#endif\n"
  },
  {
    "path": "src/JP.cpp",
"content": "//----------------------------------------------------------------------\n//                  Jarvis-Patrick Clustering\n//----------------------------------------------------------------------\n// Copyright (c) 2017 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n\n#include <Rcpp.h>\n\n#include <algorithm>  // std::set_intersection\n#include <numeric>    // std::iota\n#include <set>\n#include <vector>\n\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nIntegerVector JP_int(IntegerMatrix nn, unsigned int kt) {\n  R_xlen_t n = nn.nrow();\n\n  // create label vector filled with 1, 2, ..., n\n  std::vector<int> label(n);\n  std::iota(label.begin(), label.end(), 1);\n\n  // create sorted sets so we can use set operations\n  std::vector< std::set<int> > nn_set(n);\n  IntegerVector r;\n  std::vector<int> s;\n  for(R_xlen_t i = 0; i < n; ++i) {\n    r = nn(i,_);\n    s = as<std::vector<int> >(r);\n    nn_set[i].insert(s.begin(), s.end());\n  }\n\n  std::vector<int> z;\n  std::set<int>::iterator it;\n  R_xlen_t i, j;\n  int newlabel, oldlabel;\n\n  for(i = 0; i < n; ++i) {\n    // check all neighbors of i\n    for (it = nn_set[i].begin(); it != nn_set[i].end(); ++it) {\n      j = *it-1; // index in nn starts with 1\n\n      // edge was already checked\n      if(j<i) continue;\n\n      // already in the same cluster\n      if(label[i] == label[j]) continue;\n\n      // check if points are in each other's NN list (i is already in j)\n      if(nn_set[j].find(i+1) != nn_set[j].end()) {\n\n        // calculate link strength as the number of shared points\n        z.clear();\n        std::set_intersection(nn_set[i].begin(), nn_set[i].end(),\n          nn_set[j].begin(), nn_set[j].end(),\n          std::back_inserter(z));\n\n        // this could be done 
faster with set union\n        // +1 since i is in j\n        if(z.size()+1 >= kt) {\n          // update labels\n          if(label[i] > label[j]) {\n            newlabel = label[j]; oldlabel = label[i];\n          }else{\n            newlabel = label[i]; oldlabel = label[j];\n          }\n\n          for(R_xlen_t k = 0; k < n; ++k) {\n            if(label[k] == oldlabel) label[k] = newlabel;\n          }\n        }\n      }\n    }\n  }\n\n  return wrap(label);\n}\n\n\n// jp == true: use the definition by Jarvis-Patrick: a link is created between a pair of\n// points, p and q, if and only if p and q have each other in their k-nearest neighbor lists.\n// jp == false: just count the shared NNs = regular sNN\n// [[Rcpp::export]]\nIntegerMatrix SNN_sim_int(IntegerMatrix nn, LogicalVector jp) {\n  R_xlen_t n = nn.nrow();\n  R_xlen_t k = nn.ncol();\n\n  IntegerMatrix snn(n, k);\n\n  // create sorted sets so we can use set operations\n  std::vector< std::set<int> > nn_set(n);\n  IntegerVector r;\n  std::vector<int> s;\n  for(R_xlen_t i = 0; i < n; ++i) {\n    r = nn(i,_);\n    s = as<std::vector<int> >(r);\n    nn_set[i].insert(s.begin(), s.end());\n  }\n\n  std::vector<int> z;\n  int j;\n\n  for(R_xlen_t i = 0; i < n; ++i) {\n    // check all neighbors of i\n    for (R_xlen_t j_ind = 0; j_ind < k; ++j_ind) {\n      j = nn(i, j_ind)-1;\n\n      bool i_in_j = (nn_set[j].find(i+1) != nn_set[j].end());\n\n      if(is_false(all(jp)) || i_in_j) {\n        // calculate link strength as the number of shared points\n        z.clear();\n        std::set_intersection(nn_set[i].begin(), nn_set[i].end(),\n          nn_set[j].begin(), nn_set[j].end(),\n          std::back_inserter(z));\n        snn(i, j_ind) = z.size();\n        // +1 if i is in j\n        if(i_in_j) snn(i, j_ind)++;\n\n      }else snn(i, j_ind) = 0;\n\n    }\n  }\n\n  return snn;\n}\n"
  },
  {
    "path": "src/Makevars",
    "content": "# CXX_STD = CXX11\n\nSOURCES = \\\n\tANN/perf.cpp ANN/bd_fix_rad_search.cpp ANN/bd_search.cpp \\\n\tANN/kd_split.cpp ANN/kd_pr_search.cpp ANN/kd_search.cpp \\\n\tANN/ANN.cpp ANN/brute.cpp ANN/bd_tree.cpp ANN/kd_fix_rad_search.cpp \\\n\tANN/bd_pr_search.cpp ANN/kd_util.cpp ANN/kd_tree.cpp ANN/kd_dump.cpp \\\n\tutilities.cpp cleanup.cpp \\\n\tkNN.cpp connectedComps.cpp \\\n\tfrNN.cpp regionQuery.cpp density.cpp \\\n\tdbscan.cpp \\\n\toptics.cpp \\\n\tJP.cpp \\\n\thdbscan.cpp \\\n\tdendrogram.cpp UnionFind.cpp \\\n\tmrd.cpp \\\n\tmst.cpp \\\n\tlof.cpp \\\n\tdbcv.cpp \\\n\tRcppExports.cpp\n\nOBJECTS = $(SOURCES:.cpp=.o)\n"
  },
  {
    "path": "src/RcppExports.cpp",
    "content": "// Generated by using Rcpp::compileAttributes() -> do not edit by hand\n// Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393\n\n#include <Rcpp.h>\n\nusing namespace Rcpp;\n\n#ifdef RCPP_USE_GLOBAL_ROSTREAM\nRcpp::Rostream<true>&  Rcpp::Rcout = Rcpp::Rcpp_cout_get();\nRcpp::Rostream<false>& Rcpp::Rcerr = Rcpp::Rcpp_cerr_get();\n#endif\n\n// JP_int\nIntegerVector JP_int(IntegerMatrix nn, unsigned int kt);\nRcppExport SEXP _dbscan_JP_int(SEXP nnSEXP, SEXP ktSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< IntegerMatrix >::type nn(nnSEXP);\n    Rcpp::traits::input_parameter< unsigned int >::type kt(ktSEXP);\n    rcpp_result_gen = Rcpp::wrap(JP_int(nn, kt));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// SNN_sim_int\nIntegerMatrix SNN_sim_int(IntegerMatrix nn, LogicalVector jp);\nRcppExport SEXP _dbscan_SNN_sim_int(SEXP nnSEXP, SEXP jpSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< IntegerMatrix >::type nn(nnSEXP);\n    Rcpp::traits::input_parameter< LogicalVector >::type jp(jpSEXP);\n    rcpp_result_gen = Rcpp::wrap(SNN_sim_int(nn, jp));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// ANN_cleanup\nvoid ANN_cleanup();\nRcppExport SEXP _dbscan_ANN_cleanup() {\nBEGIN_RCPP\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    ANN_cleanup();\n    return R_NilValue;\nEND_RCPP\n}\n// comps_kNN\nIntegerVector comps_kNN(IntegerMatrix nn, bool mutual);\nRcppExport SEXP _dbscan_comps_kNN(SEXP nnSEXP, SEXP mutualSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< IntegerMatrix >::type nn(nnSEXP);\n    Rcpp::traits::input_parameter< bool >::type mutual(mutualSEXP);\n    rcpp_result_gen = Rcpp::wrap(comps_kNN(nn, mutual));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// comps_frNN\nIntegerVector comps_frNN(List nn, bool 
mutual);\nRcppExport SEXP _dbscan_comps_frNN(SEXP nnSEXP, SEXP mutualSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< List >::type nn(nnSEXP);\n    Rcpp::traits::input_parameter< bool >::type mutual(mutualSEXP);\n    rcpp_result_gen = Rcpp::wrap(comps_frNN(nn, mutual));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// intToStr\nStringVector intToStr(IntegerVector iv);\nRcppExport SEXP _dbscan_intToStr(SEXP ivSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< IntegerVector >::type iv(ivSEXP);\n    rcpp_result_gen = Rcpp::wrap(intToStr(iv));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// dist_subset\nNumericVector dist_subset(const NumericVector& dist, IntegerVector idx);\nRcppExport SEXP _dbscan_dist_subset(SEXP distSEXP, SEXP idxSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< const NumericVector& >::type dist(distSEXP);\n    Rcpp::traits::input_parameter< IntegerVector >::type idx(idxSEXP);\n    rcpp_result_gen = Rcpp::wrap(dist_subset(dist, idx));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// XOR\nRcpp::LogicalVector XOR(Rcpp::LogicalVector lhs, Rcpp::LogicalVector rhs);\nRcppExport SEXP _dbscan_XOR(SEXP lhsSEXP, SEXP rhsSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< Rcpp::LogicalVector >::type lhs(lhsSEXP);\n    Rcpp::traits::input_parameter< Rcpp::LogicalVector >::type rhs(rhsSEXP);\n    rcpp_result_gen = Rcpp::wrap(XOR(lhs, rhs));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// dspc\nNumericMatrix dspc(const List& cl_idx, const List& internal_nodes, const IntegerVector& all_cl_ids, const NumericVector& mrd_dist);\nRcppExport SEXP _dbscan_dspc(SEXP cl_idxSEXP, SEXP internal_nodesSEXP, SEXP all_cl_idsSEXP, SEXP mrd_distSEXP) {\nBEGIN_RCPP\n    
Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< const List& >::type cl_idx(cl_idxSEXP);\n    Rcpp::traits::input_parameter< const List& >::type internal_nodes(internal_nodesSEXP);\n    Rcpp::traits::input_parameter< const IntegerVector& >::type all_cl_ids(all_cl_idsSEXP);\n    Rcpp::traits::input_parameter< const NumericVector& >::type mrd_dist(mrd_distSEXP);\n    rcpp_result_gen = Rcpp::wrap(dspc(cl_idx, internal_nodes, all_cl_ids, mrd_dist));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// dbscan_int\nIntegerVector dbscan_int(NumericMatrix data, double eps, int minPts, NumericVector weights, int borderPoints, int type, int bucketSize, int splitRule, double approx, List frNN);\nRcppExport SEXP _dbscan_dbscan_int(SEXP dataSEXP, SEXP epsSEXP, SEXP minPtsSEXP, SEXP weightsSEXP, SEXP borderPointsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP, SEXP frNNSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP);\n    Rcpp::traits::input_parameter< double >::type eps(epsSEXP);\n    Rcpp::traits::input_parameter< int >::type minPts(minPtsSEXP);\n    Rcpp::traits::input_parameter< NumericVector >::type weights(weightsSEXP);\n    Rcpp::traits::input_parameter< int >::type borderPoints(borderPointsSEXP);\n    Rcpp::traits::input_parameter< int >::type type(typeSEXP);\n    Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP);\n    Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP);\n    Rcpp::traits::input_parameter< double >::type approx(approxSEXP);\n    Rcpp::traits::input_parameter< List >::type frNN(frNNSEXP);\n    rcpp_result_gen = Rcpp::wrap(dbscan_int(data, eps, minPts, weights, borderPoints, type, bucketSize, splitRule, approx, frNN));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// reach_to_dendrogram\nList reach_to_dendrogram(const 
Rcpp::List reachability, const NumericVector pl_order);\nRcppExport SEXP _dbscan_reach_to_dendrogram(SEXP reachabilitySEXP, SEXP pl_orderSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< const Rcpp::List >::type reachability(reachabilitySEXP);\n    Rcpp::traits::input_parameter< const NumericVector >::type pl_order(pl_orderSEXP);\n    rcpp_result_gen = Rcpp::wrap(reach_to_dendrogram(reachability, pl_order));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// dendrogram_to_reach\nList dendrogram_to_reach(const Rcpp::List x);\nRcppExport SEXP _dbscan_dendrogram_to_reach(SEXP xSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< const Rcpp::List >::type x(xSEXP);\n    rcpp_result_gen = Rcpp::wrap(dendrogram_to_reach(x));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// mst_to_dendrogram\nList mst_to_dendrogram(const NumericMatrix mst);\nRcppExport SEXP _dbscan_mst_to_dendrogram(SEXP mstSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< const NumericMatrix >::type mst(mstSEXP);\n    rcpp_result_gen = Rcpp::wrap(mst_to_dendrogram(mst));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// dbscan_density_int\nIntegerVector dbscan_density_int(NumericMatrix data, double eps, int type, int bucketSize, int splitRule, double approx);\nRcppExport SEXP _dbscan_dbscan_density_int(SEXP dataSEXP, SEXP epsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP);\n    Rcpp::traits::input_parameter< double >::type eps(epsSEXP);\n    Rcpp::traits::input_parameter< int >::type type(typeSEXP);\n    Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP);\n    
Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP);\n    Rcpp::traits::input_parameter< double >::type approx(approxSEXP);\n    rcpp_result_gen = Rcpp::wrap(dbscan_density_int(data, eps, type, bucketSize, splitRule, approx));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// frNN_int\nList frNN_int(NumericMatrix data, double eps, int type, int bucketSize, int splitRule, double approx);\nRcppExport SEXP _dbscan_frNN_int(SEXP dataSEXP, SEXP epsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP);\n    Rcpp::traits::input_parameter< double >::type eps(epsSEXP);\n    Rcpp::traits::input_parameter< int >::type type(typeSEXP);\n    Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP);\n    Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP);\n    Rcpp::traits::input_parameter< double >::type approx(approxSEXP);\n    rcpp_result_gen = Rcpp::wrap(frNN_int(data, eps, type, bucketSize, splitRule, approx));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// frNN_query_int\nList frNN_query_int(NumericMatrix data, NumericMatrix query, double eps, int type, int bucketSize, int splitRule, double approx);\nRcppExport SEXP _dbscan_frNN_query_int(SEXP dataSEXP, SEXP querySEXP, SEXP epsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP);\n    Rcpp::traits::input_parameter< NumericMatrix >::type query(querySEXP);\n    Rcpp::traits::input_parameter< double >::type eps(epsSEXP);\n    Rcpp::traits::input_parameter< int >::type type(typeSEXP);\n    Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP);\n    Rcpp::traits::input_parameter< int >::type 
splitRule(splitRuleSEXP);\n    Rcpp::traits::input_parameter< double >::type approx(approxSEXP);\n    rcpp_result_gen = Rcpp::wrap(frNN_query_int(data, query, eps, type, bucketSize, splitRule, approx));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// distToAdjacency\nList distToAdjacency(IntegerVector constraints, const int N);\nRcppExport SEXP _dbscan_distToAdjacency(SEXP constraintsSEXP, SEXP NSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< IntegerVector >::type constraints(constraintsSEXP);\n    Rcpp::traits::input_parameter< const int >::type N(NSEXP);\n    rcpp_result_gen = Rcpp::wrap(distToAdjacency(constraints, N));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// buildDendrogram\nList buildDendrogram(List hcl);\nRcppExport SEXP _dbscan_buildDendrogram(SEXP hclSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< List >::type hcl(hclSEXP);\n    rcpp_result_gen = Rcpp::wrap(buildDendrogram(hcl));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// all_children\nIntegerVector all_children(List hier, int key, bool leaves_only);\nRcppExport SEXP _dbscan_all_children(SEXP hierSEXP, SEXP keySEXP, SEXP leaves_onlySEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< List >::type hier(hierSEXP);\n    Rcpp::traits::input_parameter< int >::type key(keySEXP);\n    Rcpp::traits::input_parameter< bool >::type leaves_only(leaves_onlySEXP);\n    rcpp_result_gen = Rcpp::wrap(all_children(hier, key, leaves_only));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// node_xy\nNumericMatrix node_xy(List cl_tree, List cl_hierarchy, int cid);\nRcppExport SEXP _dbscan_node_xy(SEXP cl_treeSEXP, SEXP cl_hierarchySEXP, SEXP cidSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< List >::type 
cl_tree(cl_treeSEXP);\n    Rcpp::traits::input_parameter< List >::type cl_hierarchy(cl_hierarchySEXP);\n    Rcpp::traits::input_parameter< int >::type cid(cidSEXP);\n    rcpp_result_gen = Rcpp::wrap(node_xy(cl_tree, cl_hierarchy, cid));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// simplifiedTree\nList simplifiedTree(List cl_tree);\nRcppExport SEXP _dbscan_simplifiedTree(SEXP cl_treeSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< List >::type cl_tree(cl_treeSEXP);\n    rcpp_result_gen = Rcpp::wrap(simplifiedTree(cl_tree));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// computeStability\nList computeStability(const List hcl, const int minPts, bool compute_glosh);\nRcppExport SEXP _dbscan_computeStability(SEXP hclSEXP, SEXP minPtsSEXP, SEXP compute_gloshSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< const List >::type hcl(hclSEXP);\n    Rcpp::traits::input_parameter< const int >::type minPts(minPtsSEXP);\n    Rcpp::traits::input_parameter< bool >::type compute_glosh(compute_gloshSEXP);\n    rcpp_result_gen = Rcpp::wrap(computeStability(hcl, minPts, compute_glosh));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// validateConstraintList\nList validateConstraintList(List& constraints, int n);\nRcppExport SEXP _dbscan_validateConstraintList(SEXP constraintsSEXP, SEXP nSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< List& >::type constraints(constraintsSEXP);\n    Rcpp::traits::input_parameter< int >::type n(nSEXP);\n    rcpp_result_gen = Rcpp::wrap(validateConstraintList(constraints, n));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// computeVirtualNode\ndouble computeVirtualNode(IntegerVector noise, List constraints);\nRcppExport SEXP _dbscan_computeVirtualNode(SEXP noiseSEXP, SEXP constraintsSEXP) {\nBEGIN_RCPP\n    
Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< IntegerVector >::type noise(noiseSEXP);\n    Rcpp::traits::input_parameter< List >::type constraints(constraintsSEXP);\n    rcpp_result_gen = Rcpp::wrap(computeVirtualNode(noise, constraints));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// fosc\nNumericVector fosc(List cl_tree, std::string cid, std::list<int>& sc, List cl_hierarchy, bool prune_unstable_leaves, double cluster_selection_epsilon, const double alpha, bool useVirtual, const int n_constraints, List constraints);\nRcppExport SEXP _dbscan_fosc(SEXP cl_treeSEXP, SEXP cidSEXP, SEXP scSEXP, SEXP cl_hierarchySEXP, SEXP prune_unstable_leavesSEXP, SEXP cluster_selection_epsilonSEXP, SEXP alphaSEXP, SEXP useVirtualSEXP, SEXP n_constraintsSEXP, SEXP constraintsSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< List >::type cl_tree(cl_treeSEXP);\n    Rcpp::traits::input_parameter< std::string >::type cid(cidSEXP);\n    Rcpp::traits::input_parameter< std::list<int>& >::type sc(scSEXP);\n    Rcpp::traits::input_parameter< List >::type cl_hierarchy(cl_hierarchySEXP);\n    Rcpp::traits::input_parameter< bool >::type prune_unstable_leaves(prune_unstable_leavesSEXP);\n    Rcpp::traits::input_parameter< double >::type cluster_selection_epsilon(cluster_selection_epsilonSEXP);\n    Rcpp::traits::input_parameter< const double >::type alpha(alphaSEXP);\n    Rcpp::traits::input_parameter< bool >::type useVirtual(useVirtualSEXP);\n    Rcpp::traits::input_parameter< const int >::type n_constraints(n_constraintsSEXP);\n    Rcpp::traits::input_parameter< List >::type constraints(constraintsSEXP);\n    rcpp_result_gen = Rcpp::wrap(fosc(cl_tree, cid, sc, cl_hierarchy, prune_unstable_leaves, cluster_selection_epsilon, alpha, useVirtual, n_constraints, constraints));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// extractUnsupervised\nList 
extractUnsupervised(List cl_tree, bool prune_unstable, double cluster_selection_epsilon);\nRcppExport SEXP _dbscan_extractUnsupervised(SEXP cl_treeSEXP, SEXP prune_unstableSEXP, SEXP cluster_selection_epsilonSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< List >::type cl_tree(cl_treeSEXP);\n    Rcpp::traits::input_parameter< bool >::type prune_unstable(prune_unstableSEXP);\n    Rcpp::traits::input_parameter< double >::type cluster_selection_epsilon(cluster_selection_epsilonSEXP);\n    rcpp_result_gen = Rcpp::wrap(extractUnsupervised(cl_tree, prune_unstable, cluster_selection_epsilon));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// extractSemiSupervised\nList extractSemiSupervised(List cl_tree, List constraints, float alpha, bool prune_unstable_leaves, double cluster_selection_epsilon);\nRcppExport SEXP _dbscan_extractSemiSupervised(SEXP cl_treeSEXP, SEXP constraintsSEXP, SEXP alphaSEXP, SEXP prune_unstable_leavesSEXP, SEXP cluster_selection_epsilonSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< List >::type cl_tree(cl_treeSEXP);\n    Rcpp::traits::input_parameter< List >::type constraints(constraintsSEXP);\n    Rcpp::traits::input_parameter< float >::type alpha(alphaSEXP);\n    Rcpp::traits::input_parameter< bool >::type prune_unstable_leaves(prune_unstable_leavesSEXP);\n    Rcpp::traits::input_parameter< double >::type cluster_selection_epsilon(cluster_selection_epsilonSEXP);\n    rcpp_result_gen = Rcpp::wrap(extractSemiSupervised(cl_tree, constraints, alpha, prune_unstable_leaves, cluster_selection_epsilon));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// kNN_query_int\nList kNN_query_int(NumericMatrix data, NumericMatrix query, int k, int type, int bucketSize, int splitRule, double approx);\nRcppExport SEXP _dbscan_kNN_query_int(SEXP dataSEXP, SEXP querySEXP, SEXP kSEXP, SEXP typeSEXP, SEXP 
bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP);\n    Rcpp::traits::input_parameter< NumericMatrix >::type query(querySEXP);\n    Rcpp::traits::input_parameter< int >::type k(kSEXP);\n    Rcpp::traits::input_parameter< int >::type type(typeSEXP);\n    Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP);\n    Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP);\n    Rcpp::traits::input_parameter< double >::type approx(approxSEXP);\n    rcpp_result_gen = Rcpp::wrap(kNN_query_int(data, query, k, type, bucketSize, splitRule, approx));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// kNN_int\nList kNN_int(NumericMatrix data, int k, int type, int bucketSize, int splitRule, double approx);\nRcppExport SEXP _dbscan_kNN_int(SEXP dataSEXP, SEXP kSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP);\n    Rcpp::traits::input_parameter< int >::type k(kSEXP);\n    Rcpp::traits::input_parameter< int >::type type(typeSEXP);\n    Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP);\n    Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP);\n    Rcpp::traits::input_parameter< double >::type approx(approxSEXP);\n    rcpp_result_gen = Rcpp::wrap(kNN_int(data, k, type, bucketSize, splitRule, approx));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// lof_kNN\nList lof_kNN(NumericMatrix data, int minPts, int type, int bucketSize, int splitRule, double approx);\nRcppExport SEXP _dbscan_lof_kNN(SEXP dataSEXP, SEXP minPtsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope 
rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP);\n    Rcpp::traits::input_parameter< int >::type minPts(minPtsSEXP);\n    Rcpp::traits::input_parameter< int >::type type(typeSEXP);\n    Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP);\n    Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP);\n    Rcpp::traits::input_parameter< double >::type approx(approxSEXP);\n    rcpp_result_gen = Rcpp::wrap(lof_kNN(data, minPts, type, bucketSize, splitRule, approx));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// mrd\nNumericVector mrd(NumericVector dm, NumericVector cd);\nRcppExport SEXP _dbscan_mrd(SEXP dmSEXP, SEXP cdSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< NumericVector >::type dm(dmSEXP);\n    Rcpp::traits::input_parameter< NumericVector >::type cd(cdSEXP);\n    rcpp_result_gen = Rcpp::wrap(mrd(dm, cd));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// mst\nRcpp::NumericMatrix mst(const NumericVector x_dist, const R_xlen_t n);\nRcppExport SEXP _dbscan_mst(SEXP x_distSEXP, SEXP nSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< const NumericVector >::type x_dist(x_distSEXP);\n    Rcpp::traits::input_parameter< const R_xlen_t >::type n(nSEXP);\n    rcpp_result_gen = Rcpp::wrap(mst(x_dist, n));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// hclustMergeOrder\nList hclustMergeOrder(NumericMatrix mst, IntegerVector o);\nRcppExport SEXP _dbscan_hclustMergeOrder(SEXP mstSEXP, SEXP oSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< NumericMatrix >::type mst(mstSEXP);\n    Rcpp::traits::input_parameter< IntegerVector >::type o(oSEXP);\n    rcpp_result_gen = Rcpp::wrap(hclustMergeOrder(mst, o));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// optics_int\nList 
optics_int(NumericMatrix data, double eps, int minPts, int type, int bucketSize, int splitRule, double approx, List frNN);\nRcppExport SEXP _dbscan_optics_int(SEXP dataSEXP, SEXP epsSEXP, SEXP minPtsSEXP, SEXP typeSEXP, SEXP bucketSizeSEXP, SEXP splitRuleSEXP, SEXP approxSEXP, SEXP frNNSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< NumericMatrix >::type data(dataSEXP);\n    Rcpp::traits::input_parameter< double >::type eps(epsSEXP);\n    Rcpp::traits::input_parameter< int >::type minPts(minPtsSEXP);\n    Rcpp::traits::input_parameter< int >::type type(typeSEXP);\n    Rcpp::traits::input_parameter< int >::type bucketSize(bucketSizeSEXP);\n    Rcpp::traits::input_parameter< int >::type splitRule(splitRuleSEXP);\n    Rcpp::traits::input_parameter< double >::type approx(approxSEXP);\n    Rcpp::traits::input_parameter< List >::type frNN(frNNSEXP);\n    rcpp_result_gen = Rcpp::wrap(optics_int(data, eps, minPts, type, bucketSize, splitRule, approx, frNN));\n    return rcpp_result_gen;\nEND_RCPP\n}\n// lowerTri\nIntegerVector lowerTri(IntegerMatrix m);\nRcppExport SEXP _dbscan_lowerTri(SEXP mSEXP) {\nBEGIN_RCPP\n    Rcpp::RObject rcpp_result_gen;\n    Rcpp::RNGScope rcpp_rngScope_gen;\n    Rcpp::traits::input_parameter< IntegerMatrix >::type m(mSEXP);\n    rcpp_result_gen = Rcpp::wrap(lowerTri(m));\n    return rcpp_result_gen;\nEND_RCPP\n}\n\nstatic const R_CallMethodDef CallEntries[] = {\n    {\"_dbscan_JP_int\", (DL_FUNC) &_dbscan_JP_int, 2},\n    {\"_dbscan_SNN_sim_int\", (DL_FUNC) &_dbscan_SNN_sim_int, 2},\n    {\"_dbscan_ANN_cleanup\", (DL_FUNC) &_dbscan_ANN_cleanup, 0},\n    {\"_dbscan_comps_kNN\", (DL_FUNC) &_dbscan_comps_kNN, 2},\n    {\"_dbscan_comps_frNN\", (DL_FUNC) &_dbscan_comps_frNN, 2},\n    {\"_dbscan_intToStr\", (DL_FUNC) &_dbscan_intToStr, 1},\n    {\"_dbscan_dist_subset\", (DL_FUNC) &_dbscan_dist_subset, 2},\n    {\"_dbscan_XOR\", (DL_FUNC) &_dbscan_XOR, 2},\n    
{\"_dbscan_dspc\", (DL_FUNC) &_dbscan_dspc, 4},\n    {\"_dbscan_dbscan_int\", (DL_FUNC) &_dbscan_dbscan_int, 10},\n    {\"_dbscan_reach_to_dendrogram\", (DL_FUNC) &_dbscan_reach_to_dendrogram, 2},\n    {\"_dbscan_dendrogram_to_reach\", (DL_FUNC) &_dbscan_dendrogram_to_reach, 1},\n    {\"_dbscan_mst_to_dendrogram\", (DL_FUNC) &_dbscan_mst_to_dendrogram, 1},\n    {\"_dbscan_dbscan_density_int\", (DL_FUNC) &_dbscan_dbscan_density_int, 6},\n    {\"_dbscan_frNN_int\", (DL_FUNC) &_dbscan_frNN_int, 6},\n    {\"_dbscan_frNN_query_int\", (DL_FUNC) &_dbscan_frNN_query_int, 7},\n    {\"_dbscan_distToAdjacency\", (DL_FUNC) &_dbscan_distToAdjacency, 2},\n    {\"_dbscan_buildDendrogram\", (DL_FUNC) &_dbscan_buildDendrogram, 1},\n    {\"_dbscan_all_children\", (DL_FUNC) &_dbscan_all_children, 3},\n    {\"_dbscan_node_xy\", (DL_FUNC) &_dbscan_node_xy, 3},\n    {\"_dbscan_simplifiedTree\", (DL_FUNC) &_dbscan_simplifiedTree, 1},\n    {\"_dbscan_computeStability\", (DL_FUNC) &_dbscan_computeStability, 3},\n    {\"_dbscan_validateConstraintList\", (DL_FUNC) &_dbscan_validateConstraintList, 2},\n    {\"_dbscan_computeVirtualNode\", (DL_FUNC) &_dbscan_computeVirtualNode, 2},\n    {\"_dbscan_fosc\", (DL_FUNC) &_dbscan_fosc, 10},\n    {\"_dbscan_extractUnsupervised\", (DL_FUNC) &_dbscan_extractUnsupervised, 3},\n    {\"_dbscan_extractSemiSupervised\", (DL_FUNC) &_dbscan_extractSemiSupervised, 5},\n    {\"_dbscan_kNN_query_int\", (DL_FUNC) &_dbscan_kNN_query_int, 7},\n    {\"_dbscan_kNN_int\", (DL_FUNC) &_dbscan_kNN_int, 6},\n    {\"_dbscan_lof_kNN\", (DL_FUNC) &_dbscan_lof_kNN, 6},\n    {\"_dbscan_mrd\", (DL_FUNC) &_dbscan_mrd, 2},\n    {\"_dbscan_mst\", (DL_FUNC) &_dbscan_mst, 2},\n    {\"_dbscan_hclustMergeOrder\", (DL_FUNC) &_dbscan_hclustMergeOrder, 2},\n    {\"_dbscan_optics_int\", (DL_FUNC) &_dbscan_optics_int, 8},\n    {\"_dbscan_lowerTri\", (DL_FUNC) &_dbscan_lowerTri, 1},\n    {NULL, NULL, 0}\n};\n\nRcppExport void R_init_dbscan(DllInfo *dll) {\n    R_registerRoutines(dll, NULL, 
CallEntries, NULL, NULL);\n    R_useDynamicSymbols(dll, FALSE);\n}\n"
  },
  {
    "path": "src/UnionFind.cpp",
"content": "//----------------------------------------------------------------------\n//                        Disjoint-set data structure\n// File:                        union_find.cpp\n//----------------------------------------------------------------------\n// Copyright (c) 2016 Michael Hahsler, Matt Piekenbrock. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n// Class definition based on the disjoint-set data structure described here:\n// https://en.wikipedia.org/wiki/Disjoint-set_data_structure\n\n#include \"UnionFind.h\"\n\nUnionFind::UnionFind(const int size) : parent(size), rank(size)\n{\n  for (int i = 0; i < size; ++i)\n  { parent[i] = i, rank[i] = 0; }\n}\n\n// Destructor not needed w/o dynamic allocation\nUnionFind::~UnionFind() { }\n\n// Union by rank: attach the root of the shallower tree to the deeper one\nvoid UnionFind::Union(const int x, const int y)\n{\n  const int xRoot = Find(x);\n  const int yRoot = Find(y);\n  if (xRoot == yRoot)\n    return;\n  else if (rank[xRoot] > rank[yRoot])\n    parent[yRoot] = xRoot;\n  else if (rank[xRoot] < rank[yRoot])\n    parent[xRoot] = yRoot;\n  else\n  {\n    parent[yRoot] = xRoot;\n    rank[xRoot] = rank[xRoot] + 1;\n  }\n}\n\n// Find with path compression\nconst int UnionFind::Find(const int x)\n{\n  if (parent[x] == x)\n    return x;\n  else\n  {\n    parent[x] = Find(parent[x]);\n    return parent[x];\n  }\n}\n"
  },
  {
    "path": "src/UnionFind.h",
    "content": "//----------------------------------------------------------------------\n//                        Disjoint-set data structure\n// File:                        union_find.h\n//----------------------------------------------------------------------\n// Copyright (c) 2016 Michael Hahsler, Matt Piekenbrock. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n// Class definition based off of data-structure described here:\n// https://en.wikipedia.org/wiki/Disjoint-set_data_structure\n\n#ifndef UNIONFIND\n#define UNIONFIND\n\n#include <Rcpp.h>\n\nusing namespace Rcpp;\n\nclass UnionFind\n{\n  Rcpp::IntegerVector parent;\n  Rcpp::IntegerVector rank;\n\n  public:\n  UnionFind(const int size);\n  ~UnionFind();\n  void Union(const int x, const int y);\n  const int Find(const int x);\n\n}; // class UnionFind\n\n#endif\n"
  },
  {
    "path": "src/cleanup.cpp",
    "content": "//----------------------------------------------------------------------\n//              R interface to dbscan using the ANN library\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n#include <Rcpp.h>\n#include \"ANN/ANN.h\"\n\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nvoid ANN_cleanup() {\n  annClose();\n}\n"
  },
  {
    "path": "src/connectedComps.cpp",
    "content": "//----------------------------------------------------------------------\n//              R interface to dbscan using the ANN library\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n#include <Rcpp.h>\n\nusing namespace Rcpp;\n\n// Find connected components in kNN and frNN objects.\n\n// [[Rcpp::export]]\nIntegerVector comps_kNN(IntegerMatrix nn, bool mutual) {\n  R_xlen_t n = nn.nrow();\n\n  // create label vector\n  std::vector<int> label(n);\n  std::iota(std::begin(label), std::end(label), 1); // Fill with 1, 2, ..., n.\n  //iota is C++11 only\n  //int value = 1;\n  //std::vector<int>::iterator first = label.begin(), last = label.end();\n  //while(first != last) *first++ = value++;\n\n  // create sorted sets so we can use set operations\n  std::vector< std::set<int> > nn_set(n);\n  IntegerVector r;\n  std::vector<int> s;\n  for(int i = 0; i < n; ++i) {\n    r = na_omit(nn(i,_));\n    s =  as<std::vector<int> >(r);\n    nn_set[i].insert(s.begin(), s.end());\n  }\n\n  std::set<int>::iterator it;\n  R_xlen_t i, j;\n  int newlabel, oldlabel;\n\n  for(i = 0; i < n; ++i) {\n    // check all neighbors of i\n    for (it = nn_set[i].begin(); it != nn_set[i].end(); ++it) {\n      j = *it-1; // index in nn starts with 1\n\n      // edge was already checked\n      //if(j<i) continue;\n\n      // already in the same cluster\n      if(label[i] == label[j]) continue;\n\n      // check if points are in each others nn list (i is already in j)\n      if(!mutual || nn_set[j].find(i+1) != nn_set[j].end()) {\n        if(label[i] > label[j]) {\n          newlabel = label[j]; oldlabel = label[i];\n        }else{\n          newlabel = label[i]; oldlabel = label[j];\n        }\n\n        // relabel\n        for(int k = 0; 
k < n; ++k) {\n          if(label[k] == oldlabel) label[k] = newlabel;\n        }\n      }\n    }\n  }\n\n  return wrap(label);\n}\n\n// [[Rcpp::export]]\nIntegerVector comps_frNN(List nn, bool mutual) {\n  R_xlen_t n = nn.length();\n\n  // create label vector\n  std::vector<int> label(n);\n  std::iota(std::begin(label), std::end(label), 1); // Fill with 1, 2, ..., n.\n\n  // create sorted sets so we can use set operations\n  std::vector< std::set<int> > nn_set(n);\n  IntegerVector r;\n  std::vector<int> s;\n  for(R_xlen_t i = 0; i < n; ++i) {\n    r = nn[i];\n    s =  as<std::vector<int> >(r);\n    nn_set[i].insert(s.begin(), s.end());\n  }\n\n  std::set<int>::iterator it;\n  R_xlen_t i, j;\n  int newlabel, oldlabel;\n\n  for(i = 0; i < n; ++i) {\n    // check all neighbors of i\n    for (it = nn_set[i].begin(); it != nn_set[i].end(); ++it) {\n      j = *it-1; // index in nn starts with 1\n\n      // already in the same cluster\n      if(label[i] == label[j]) continue;\n\n      // check if the points are in each other's nn list (i is already in j)\n      if(!mutual || nn_set[j].find(i+1) != nn_set[j].end()) {\n        if(label[i] > label[j]) {\n          newlabel = label[j]; oldlabel = label[i];\n        }else{\n          newlabel = label[i]; oldlabel = label[j];\n        }\n\n        // relabel\n        for(R_xlen_t k = 0; k < n; ++k) {\n          if(label[k] == oldlabel) label[k] = newlabel;\n        }\n      }\n    }\n  }\n\n  return wrap(label);\n}\n"
  },
  {
    "path": "src/dbcv.cpp",
    "content": "//----------------------------------------------------------------------\n//                                DBSCAN\n// File:                         dbcv.cpp\n//----------------------------------------------------------------------\n// Copyright (c) 2025 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n#include <Rcpp.h>\n\n// Includes\n#include \"utilities.h\"\n#include \"mst.h\"\n#include \"ANN/ANN.h\"\n#include \"kNN.h\"\n#include <string>\n#include <unordered_map>\n\nusing namespace Rcpp;\n// [[Rcpp::plugins(cpp11)]]\n\n// [[Rcpp::export]]\nStringVector intToStr(IntegerVector iv){\n  StringVector res = StringVector(iv.length());\n  int ci = 0;\n  for (IntegerVector::iterator i = iv.begin(); i != iv.end(); ++i){\n    res[ci++] = std::to_string(*i);\n  }\n  return(res);\n}\n\nstd::unordered_map<std::string, double> toMap(List map){\n  std::vector<std::string> keys = map.names();\n  std::unordered_map<std::string, double> hash_map = std::unordered_map<std::string, double>();\n  const int n = map.size();\n  for (int i = 0; i < n; ++i){\n    hash_map.emplace((std::string) keys.at(i), (double) map.at(i));\n  }\n  return(hash_map);\n}\n\nNumericVector retrieve(StringVector keys, std::unordered_map<std::string, double> map){\n  int n = keys.size(), i = 0;\n  NumericVector res = NumericVector(n);\n  for (StringVector::iterator it = keys.begin(); it != keys.end(); ++it){ res[i++] = map[as< std::string >(*it)]; }\n  return(res);\n}\n\n\nNumericVector dist_subset_arma(const NumericVector& dist, IntegerVector idx){\n  // vec v1 = as<vec>(v1in);\n  // uvec idx = as<uvec>(idxin) - 1;\n  // vec subset = v1.elem(idx);\n  // return(wrap(subset));\n  return(NumericVector::create());\n}\n\n\n\n// Provides a fast of extracting subsets of a dist object. 
Expects as input the full dist\n// object to subset 'dist', and a (1-based!) integer vector 'idx' of the points to keep in the subset\n// [[Rcpp::export]]\nNumericVector dist_subset(const NumericVector& dist, IntegerVector idx){\n  const int n = dist.attr(\"Size\");\n  const int cl_n = idx.length();\n  NumericVector new_dist = Rcpp::no_init((cl_n * (cl_n - 1))/2);\n  int ii = 0;\n  for (IntegerVector::iterator i = idx.begin(); i != idx.end(); ++i){\n    for (IntegerVector::iterator j = i; j != idx.end(); ++j){\n      if (*i == *j) { continue; }\n      const int ij_idx = LT_POS1(n, *i, *j);\n      new_dist[ii++] = dist[ij_idx];\n    }\n  }\n  new_dist.attr(\"Size\") = cl_n;\n  new_dist.attr(\"class\") = \"dist\";\n  return(new_dist);\n}\n\n// Returns true if a given distance is less than 32-bit floating point precision\nbool remove_zero(ANNdist cdist){\n  return(cdist <= std::numeric_limits<float>::epsilon());\n}\n\nANNdist inv_density(ANNdist cdist){\n  return(1.0/cdist);\n}\n\n// // [[Rcpp::export]]\n// List all_pts_core_sorted_dist(const NumericMatrix& sorted_dist, const List& cl, const int d, const bool squared){\n//   // The all core dists to return\n//   List all_core_res = List(cl.size());\n//\n//   // Do the kNN searches per cluster; note that k varies with the cluster\n//   int i = 0;\n//   for (List::const_iterator it = cl.begin(); it < cl.end(); ++it, ++i){\n//     const IntegerVector& cl_pts = (*it);\n//     const int k = cl_pts.length();\n//\n//     // Initial vector to record the per-point all core dists\n//     NumericVector all_core_cl = Rcpp::no_init_vector(k);\n//\n//     // For each point in the cluster, get the all core points dist\n//     int j = 0;\n//     for (IntegerVector::const_iterator pt_id = cl_pts.begin(); pt_id != cl_pts.end(); ++pt_id, ++j){\n//       const NumericMatrix::ConstColumn& knn_dist = sorted_dist.column((*pt_id) - 1);\n//\n//       // Calculate the all core points distance for this point\n//       std::vector<ANNdist> 
ndists = std::vector<ANNdist>(knn_dist.begin(), knn_dist.begin()+k);\n//       std::remove_if(ndists.begin(), ndists.end(), remove_zero);\n//       std::transform(ndists.begin(), ndists.end(), ndists.begin(), [=](ANNdist cdist){ return std::pow(1.0/cdist, d); });\n//       ANNdist sum_inv_density = std::accumulate(ndists.begin(), ndists.end(), (ANNdist) 0.0);\n//       double acdist = std::pow(sum_inv_density/(k - 1.0), -(1.0 / double(d))); // Apply all core points equation\n//       all_core_cl[j] = acdist;\n//       // return(List::create(_[\"ndists\"] = acdist, _[\"denom\"] = sum_inv_density/(k - 1.0), _[\"k\"] = k));\n//     }\n//     all_core_res[i] = all_core_cl;\n//   }\n//   return(all_core_res);\n// }\n\n// // [[Rcpp::export]]\n// List all_pts_core(const NumericMatrix& data, const List& cl, const bool squared){\n//   // copy data\n//   int nrow = data.nrow();\n//   int ncol = data.ncol();\n//   ANNpointArray dataPts = annAllocPts(nrow, ncol);\n//   for(int i = 0; i < nrow; i++){\n//     for(int j = 0; j < ncol; j++){\n//       (dataPts[i])[j] = data(i, j);\n//     }\n//   }\n//\n//   // create kd-tree (1) or linear search structure (2)\n//   ANNpointSet* kdTree = new ANNkd_tree(dataPts, nrow, ncol, 30, (ANNsplitRule)  5);\n//\n//   // The all core dists to\n//   List all_core_res = List(cl.size());\n//\n//   // Do the kNN searches per cluster; note that k varies with the cluster\n//   int i = 0;\n//   for (List::const_iterator it = cl.begin(); it < cl.end(); ++it, ++i){\n//     const IntegerVector& cl_pts = (*it);\n//     const int k = cl_pts.length();\n//\n//     // Initial vector to record the per-point all core dists\n//     NumericVector all_core_cl = Rcpp::no_init_vector(k);\n//\n//     // For each point in the cluster, get the all core points dist\n//     int j = 0;\n//     ANNdistArray dists = new ANNdist[k];\n//     ANNidxArray nnIdx = new ANNidx[k];\n//     for (IntegerVector::const_iterator pt_id = cl_pts.begin(); pt_id != cl_pts.end(); ++pt_id, 
++j){\n//       // Do the search\n//       ANNpoint queryPt = dataPts[(*pt_id) - 1]; // use original data points\n//       kdTree->annkSearch(queryPt, k, nnIdx, dists);\n//\n//       // V2.\n//       std::vector<ANNdist> ndists = std::vector<ANNdist>(dists, dists+k);\n//       std::remove_if(ndists.begin(), ndists.end(), remove_zero);\n//       std::transform(ndists.begin(), ndists.end(), ndists.begin(), [=](ANNdist cdist){ return std::pow(1.0/cdist, ncol); });\n//       ANNdist sum_inv_density = std::accumulate(ndists.begin(), ndists.end(), (ANNdist) 0.0);\n//       double acdist = std::pow(sum_inv_density/(k - 1.0), -(1.0 / double(ncol))); // Apply all core points equation\n//       all_core_cl[j] = acdist;\n//       // return(List::create(_[\"ndists\"] = acdist, _[\"denom\"] = sum_inv_density/(k - 1.0), _[\"k\"] = k));\n//     }\n//     delete [] dists;\n//     delete [] nnIdx;\n//     all_core_res[i] = all_core_cl;\n//   }\n//\n//   // cleanup\n//   delete kdTree;\n//   annDeallocPts(dataPts);\n//   annClose();\n//\n//   // Return the all point core distance\n//   if(!squared){ for (int i = 0; i < cl.size(); ++i){ all_core_res[i] = Rcpp::sqrt(all_core_res[i]); } }\n//   return(all_core_res);\n// }\n\n\n\n// NumericVector all_pts_core(const NumericVector& dist, IntegerVector cl, const int d){\n//   const int n = dist.attr(\"Size\");\n//   const int cl_n = cl.length();\n//   NumericVector all_pts_cd = NumericVector(cl_n);\n//   NumericVector tmp = NumericVector(cl_n);\n//   int knn_i = 0, ii = 0;\n//   for (IntegerVector::iterator i = cl.begin(); i != cl.end(); ++i){\n//     for (IntegerVector::iterator j = cl.begin(); j != cl.end(); ++j){\n//       if (*i == *j) { continue; }\n//       const int idx = INDEX_TF(n, (*i < *j ? *i : *j) - 1, (*i < *j ? *j : *i) - 1);\n//       double dist_ij = dist[idx];\n//       tmp[knn_i++] = 1.0 / (dist_ij == 0.0 ? 
std::numeric_limits<double>::epsilon() : dist_ij);\n//     }\n//     all_pts_cd[ii++] = pow(sum(pow(tmp, d))/(cl_n - 1.0), -(1.0 / d));\n//     knn_i = 0;\n//   }\n//   return(all_pts_cd);\n// }\n\n\n// RCPP does not provide xor!\n// [[Rcpp::export]]\nRcpp::LogicalVector XOR(Rcpp::LogicalVector lhs, Rcpp::LogicalVector rhs) {\n  R_xlen_t i = 0, n = lhs.size();\n  Rcpp::LogicalVector result(n);\n  for ( ; i < n; i++) {  result[i] = (lhs[i] ^ rhs[i]); }\n  return result;\n}\n\n// [[Rcpp::export]]\nNumericMatrix dspc(const List& cl_idx, const List& internal_nodes, const IntegerVector& all_cl_ids, const NumericVector& mrd_dist) {\n\n  // Setup variables\n  const int ncl = cl_idx.length(); // number of clusters\n  NumericMatrix res = Rcpp::no_init_matrix((ncl * (ncl - 1))/2, 3); // resulting separation measures\n\n  // Loop through cluster combinations, and for each combination\n  int c = 0;\n  double min_edge = std::numeric_limits<double>::infinity();\n  for (int ci = 0; ci < ncl; ++ci) {\n    for (int cj = (ci+1); cj < ncl; ++cj){\n      Rcpp::checkUserInterrupt();\n\n      // Do lots of indexing to get the relative indexes corresponding to internal nodes\n      const IntegerVector i_idx = internal_nodes[ci], j_idx = internal_nodes[cj]; // i and j cluster point indices\n\n      // ignore clusters with no internal nodes! 
-> get infinity for minimum edge\n      // this leads to a NaN and should not happen in this implementation since\n      // we have already filtered out clusters of size < 3\n      if(i_idx.length() > 1 || j_idx.length() > 1) {\n\n        const IntegerVector rel_i_idx = match(as<IntegerVector>(cl_idx[ci]), all_cl_ids)[i_idx - 1];\n        const IntegerVector rel_j_idx = match(as<IntegerVector>(cl_idx[cj]), all_cl_ids)[j_idx - 1];\n        IntegerVector int_idx = combine(rel_i_idx, rel_j_idx);\n\n        // Get the pairwise MST\n        NumericMatrix pairwise_mst = mst(dist_subset(mrd_dist, int_idx), int_idx.length());\n\n        // Do lots of indexing / casting\n        const IntegerVector from_int = seq_len(rel_i_idx.length());\n        const NumericVector from_idx = as<NumericVector>(from_int);\n        const NumericVector from = pairwise_mst.column(0), to = pairwise_mst.column(1), height = pairwise_mst.column(2);\n\n        // Find which distances in the MST cross to both clusters\n        LogicalVector cross_edges = XOR(Rcpp::in(from, from_idx), Rcpp::in(to, from_idx));\n\n        // The minimum weighted edge of these cross edges is the density separation between the two clusters\n        min_edge = min(as<NumericVector>(height[cross_edges]));\n\n      }\n\n      // Save the minimum edge\n      res(c++, _) = NumericVector::create(ci+1, cj+1, min_edge);\n      min_edge = std::numeric_limits<double>::infinity();\n    }\n  }\n  return(res);\n}\n\n\n// Density Separation code\n// NumericMatrix dspc(List config, const NumericVector& xdist) {\n//\n//   // Load configuration from list\n//   const int n = config[\"n\"];\n//   const int ncl = config[\"ncl\"];\n//   const int n_pairs = config[\"n_pairs\"];\n//   List node_ids = config[\"node_ids\"];\n//   List acp = config[\"acp\"];\n//\n//   // Conversions and basic setup\n//   std::unordered_map<std::string, double> acp_map = toMap(acp);\n//   double min_mrd = std::numeric_limits<double>::infinity();\n//   
NumericMatrix min_mrd_dist = NumericMatrix(n_pairs, 3);\n//\n//   // Loop through cluster combinations, and for each combination\n//   int c = 0;\n//   for (int ci = 0; ci < ncl; ++ci) {\n//     for (int cj = (ci+1); cj < ncl; ++cj){\n//       Rcpp::checkUserInterrupt();\n//       IntegerVector i_idx = node_ids[ci], j_idx = node_ids[cj]; // i and j cluster point indices\n//       for (IntegerVector::iterator i = i_idx.begin(); i != i_idx.end(); ++i){\n//         for (IntegerVector::iterator j = j_idx.begin(); j != j_idx.end(); ++j){\n//           const int lhs = *i < *j ? *i : *j, rhs = *i < *j ? *j : *i;\n//           double dist_ij = xdist[INDEX_TF(n, lhs - 1, rhs - 1)]; // dist(p_i, p_j)\n//           double acd_i = acp_map[std::to_string(*i)]; // all core distance for p_i\n//           double acd_j = acp_map[std::to_string(*j)]; // all core distance for p_j\n//           double mrd_ij = std::max(std::max(acd_i, acd_j), dist_ij); // mutual reachability distance of the pair\n//           if (mrd_ij < min_mrd){\n//             min_mrd = mrd_ij;\n//           }\n//         }\n//       }\n//       min_mrd_dist(c++, _) = NumericVector::create(ci+1, cj+1, min_mrd);\n//       min_mrd = std::numeric_limits<double>::infinity();\n//     }\n//   }\n//   return(min_mrd_dist);\n// }\n\n\n"
  },
  {
    "path": "src/dbscan.cpp",
    "content": "//----------------------------------------------------------------------\n//                                DBSCAN\n// File:                        R_dbscan.cpp\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n\n#include <Rcpp.h>\n#include \"ANN/ANN.h\"\n#include \"regionQuery.h\"\n\nusing namespace Rcpp;\n\n// call this with either\n// * data and epsilon and an empty frNN list\n// or\n// * empty data and a frNN id list (including selfmatches and using C numbering)\n\n// [[Rcpp::export]]\nIntegerVector dbscan_int(\n    NumericMatrix data, double eps, int minPts, NumericVector weights,\n    int borderPoints, int type, int bucketSize, int splitRule, double approx,\n    List frNN) {\n\n  // kd-tree uses squared distances\n  double eps2 = eps*eps;\n\n  bool weighted = FALSE;\n  double Nweight = 0.0;\n  ANNpointSet* kdTree = NULL;\n  ANNpointArray dataPts = NULL;\n  int nrow = NA_INTEGER;\n  int ncol= NA_INTEGER;\n\n  if(frNN.size()) {\n    // no kd-tree but use frNN list from distances\n    nrow = frNN.size();\n  }else{\n\n    // copy data for kd-tree\n    nrow = data.nrow();\n    ncol = data.ncol();\n    dataPts = annAllocPts(nrow, ncol);\n    for (int i = 0; i < nrow; i++){\n      for (int j = 0; j < ncol; j++){\n        (dataPts[i])[j] = data(i, j);\n      }\n    }\n    //Rprintf(\"Points copied.\\n\");\n\n    // create kd-tree (1) or linear search structure (2)\n    if (type==1) kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize,\n      (ANNsplitRule) splitRule);\n    else kdTree = new ANNbruteForce(dataPts, nrow, ncol);\n    //Rprintf(\"kd-tree ready. 
starting DBSCAN.\\n\");\n  }\n\n  if (weights.size() != 0) {\n    if (weights.size() != nrow)\n      stop(\"length of weights vector is incompatible with data.\");\n    weighted = TRUE;\n  }\n\n  // DBSCAN\n  std::vector<bool> visited(nrow, false);\n  std::vector< std::vector<int> > clusters; // vector of vectors == list\n  std::vector<int>  N, N2;\n\n  for (int i=0; i<nrow; i++) {\n    //Rprintf(\"processing point %d\\n\", i+1);\n    if (!(i % 100)) Rcpp::checkUserInterrupt();\n\n    if (visited[i]) continue;\n\n    //N = regionQuery(i, dataPts, kdTree, eps2, approx);\n    if(frNN.size())   N = Rcpp::as< std::vector<int> >(frNN[i]);\n    else              N = regionQuery(i, dataPts, kdTree, eps2, approx);\n\n    // noise points stay unassigned for now\n    //if (weighted) Nweight = sum(weights[IntegerVector(N.begin(), N.end())]) +\n    if (weighted) {\n      // This should work, but Rcpp has a problem with the sugar expression!\n      // Assigning the subselection forces it to be materialized.\n      // Nweight = sum(weights[IntegerVector(N.begin(), N.end())]) +\n      // weights[i];\n      NumericVector w = weights[IntegerVector(N.begin(), N.end())];\n      Nweight = sum(w);\n    } else Nweight = N.size();\n\n    if (Nweight < minPts) continue;\n\n    // start new cluster and expand\n    std::vector<int> cluster;\n    cluster.push_back(i);\n    visited[i] = true;\n\n    while (!N.empty()) {\n      int j = N.back();\n      N.pop_back();\n\n      if (visited[j]) continue; // point already processed\n      visited[j] = true;\n\n      //N2 = regionQuery(j, dataPts, kdTree, eps2, approx);\n      if(frNN.size())   N2 = Rcpp::as< std::vector<int> >(frNN[j]);\n      else              N2 = regionQuery(j, dataPts, kdTree, eps2, approx);\n\n      if (weighted) {\n        // Nweight = sum(weights(NumericVector(N2.begin(), N2.end())) +\n        // weights[j]\n        NumericVector w = weights[IntegerVector(N2.begin(), N2.end())];\n        Nweight = sum(w);\n      } else 
Nweight = N2.size();\n\n      if (Nweight >= minPts) { // expand neighborhood\n        // this is faster than set_union and does not need sort! visited takes\n        // care of duplicates.\n        std::copy(N2.begin(), N2.end(),\n          std::back_inserter(N));\n      }\n\n      // for DBSCAN* (borderPoints==FALSE) border points are considered noise\n      if(Nweight >= minPts || borderPoints) cluster.push_back(j);\n    }\n\n    // add cluster to list\n    clusters.push_back(cluster);\n  }\n\n  // prepare cluster vector\n  // unassigned points are noise (cluster 0)\n  IntegerVector id(nrow, 0);\n  for (std::size_t i=0; i<clusters.size(); i++) {\n    for (std::size_t j=0; j<clusters[i].size(); j++) {\n      id[clusters[i][j]] = i+1;\n    }\n  }\n\n  // cleanup\n  if (kdTree != NULL) delete kdTree;\n  if (dataPts != NULL)  annDeallocPts(dataPts);\n  // annClose(); is now done globally in the package\n\n  return wrap(id);\n}\n\n"
  },
  {
    "path": "src/dendrogram.cpp",
    "content": "//----------------------------------------------------------------------\n//              R interface to dbscan using the ANN library\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler, Matt Piekenbrock. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n#include <Rcpp.h>\n#include <sstream>\n#include <string>\n#include \"UnionFind.h\"\n\nusing namespace Rcpp;\n\n// Ditto with atoi!\nint fast_atoi( const char * str )\n{\n  int val = 0;\n  while( *str ) {\n    val = val*10 + (*str++ - '0');\n  }\n  return val;\n}\n\nint which_int(IntegerVector x, int target) {\n  int size = (int) x.size();\n  for (int i = 0; i < size; ++i) {\n    if (x(i) == target) return(i);\n  }\n  return(-1);\n}\n\n\n// [[Rcpp::export]]\nList reach_to_dendrogram(const Rcpp::List reachability, const NumericVector pl_order) {\n\n  // Set up sorted reachability distance\n  NumericVector pl = Rcpp::clone(as<NumericVector>(reachability[\"reachdist\"])).sort();\n\n  // Get 0-based order\n  IntegerVector order = Rcpp::clone(as<IntegerVector>(reachability[\"order\"])) - 1;\n\n  /// Initialize disjoint-set structure\n  int n_nodes = order.size();\n  UnionFind uf((size_t) n_nodes);\n\n  // Create leaves\n  List dendrogram(n_nodes);\n  for (int i = 0; i < n_nodes; ++i) {\n    IntegerVector leaf = IntegerVector();\n    leaf.push_back(i+1);\n    leaf.attr(\"label\") = std::to_string(i + 1);\n    leaf.attr(\"members\") = 1;\n    leaf.attr(\"height\") = 0;\n    leaf.attr(\"leaf\") = true;\n    dendrogram.at(i) = leaf;\n  }\n\n  // Precompute the q order\n  IntegerVector q_order(n_nodes);\n  for (int i = 0; i < n_nodes - 1; ++i) {\n    q_order.at(i) = order(which_int(order, pl_order(i)) - 1);\n  }\n\n  // Get the index of the point with next smallest reach dist and its neighbor\n  IntegerVector 
members(n_nodes, 1);\n  int insert = 0, p = 0, q = 0, p_i = 0, q_i = 0;\n  for (int i = 0; i < (n_nodes-1); ++i) {\n    p = pl_order(i);\n    q = q_order(i);  // left neighbor in ordering\n    if (q == -1) { stop(\"Left neighbor not found\"); }\n\n    // Get the actual index of the branch(es) containing the p and q\n    p_i = uf.Find(p), q_i = uf.Find(q);\n    List branch = List::create(dendrogram.at(q_i), dendrogram.at(p_i));\n\n    // generic proxy blocks attr access for mixed types, so keep track of members manually!\n    branch.attr(\"members\") = members.at(p_i) + members.at(q_i);\n    branch.attr(\"height\") = pl(i);\n    branch.attr(\"class\") = \"dendrogram\";\n\n    // Merge the two, retrieving the new index\n    uf.Union(p_i, q_i);\n    insert = uf.Find(q_i); // q because q_branch is first in the new branch\n\n    // Update members reference and insert the branch\n    members.at(insert) = branch.attr(\"members\");\n    dendrogram.at(insert) = branch;\n  }\n  return(dendrogram.at(insert));\n}\n\nint DFS(List d, List& rp, int pnode, NumericVector stack) {\n  if (d.hasAttribute(\"leaf\")) { // If at a leaf node, compare to previous node\n    std::string leaf_label = as<std::string>( d.attr(\"label\") );\n    rp[leaf_label] = stack; // Record the ancestors reachability values\n    std::string pnode_label = std::to_string(pnode);\n    double new_reach = 0.0f;\n    if(!rp.containsElementNamed(pnode_label.c_str())) { // 1st time seeing this point\n      new_reach = INFINITY;\n    } else { // Smallest Common Ancestor\n      NumericVector reachdist_p = rp[pnode_label];\n      new_reach = min(intersect(stack, reachdist_p));\n    }\n    NumericVector reachdist = rp[\"reachdist\"];\n    IntegerVector order = rp[\"order\"];\n    reachdist.push_back(new_reach);\n    int res = fast_atoi(leaf_label.c_str());\n    order.push_back(res);\n    rp[\"order\"] = order;\n    rp[\"reachdist\"] = reachdist;\n    return(res);\n  } else {\n    double cheight = d.attr(\"height\");\n  
  stack.push_back(cheight);\n    List left = d[0];\n    // Recursively go left, recording the reachability distances on the stack\n    pnode = DFS(left, rp, pnode, stack);\n    if (d.length() > 1) {\n      for (int sub_branch = 1; sub_branch < d.length(); ++sub_branch)  {\n        pnode = DFS(d[sub_branch], rp, pnode, stack); // pnode;\n      }\n    }\n    return(pnode);\n  }\n}\n\n// [[Rcpp::export]]\nList dendrogram_to_reach(const Rcpp::List x) {\n  Rcpp::List rp = List::create(_[\"order\"] = IntegerVector::create(),\n                               _[\"reachdist\"] = NumericVector::create());\n  NumericVector stack = NumericVector::create();\n  DFS(x, rp, 0, stack);\n  List res = List::create(_[\"reachdist\"] = rp[\"reachdist\"], _[\"order\"] = rp[\"order\"]);\n  res.attr(\"class\") = \"reachability\";\n  return(res);\n}\n\n// [[Rcpp::export]]\nList mst_to_dendrogram(const NumericMatrix mst) {\n\n  // Set up sorted vector values\n  NumericVector p_order = mst(_, 0);\n  NumericVector q_order = mst(_, 1);\n  NumericVector dist = mst(_, 2);\n  int n_nodes = p_order.length() + 1;\n\n  // Make sure to clone so as to not make changes by reference\n  p_order = Rcpp::clone(p_order);\n  q_order = Rcpp::clone(q_order);\n\n  // UnionFind data structure for fast agglomerative building\n  UnionFind uf((size_t) n_nodes);\n\n  // Create leaves\n  List dendrogram(n_nodes);\n  for (int i = 0; i < n_nodes; ++i) {\n    IntegerVector leaf = IntegerVector();\n    leaf.push_back(i+1);\n    leaf.attr(\"label\") = std::to_string(i + 1);\n    leaf.attr(\"members\") = 1;\n    leaf.attr(\"height\") = 0;\n    leaf.attr(\"leaf\") = true;\n    dendrogram.at(i) = leaf;\n  }\n\n  // Get the index of the point with next smallest reach dist and its neighbor\n  IntegerVector members(n_nodes, 1);\n  int insert = 0, p = 0, q = 0, p_i = 0, q_i = 0;\n  for (int i = 0; i < (n_nodes-1); ++i) {\n    p = p_order(i), q = q_order(i);\n\n    // Get the actual index of the branch(es) containing the p and q\n  
  p_i = uf.Find(p), q_i = uf.Find(q);\n\n    // Merge the two, retrieving the new index\n    uf.Union(p_i, q_i);\n    List branch = List::create(dendrogram.at(q_i), dendrogram.at(p_i));\n\n    insert = uf.Find(q_i); // q because q_branch is first in the new branch\n\n    // Update members in the branch\n    int tmp_members = members.at(p_i) + members.at(q_i);\n\n    // Branches with equivalent distances are merged simultaneously\n    while((i + 1) < (n_nodes-1) && dist(i + 1) == dist(i)){\n      i += 1;\n      p = p_order(i), q = q_order(i);\n      p_i = uf.Find(p), q_i = uf.Find(q);\n\n      // Merge the branches, update current insert index\n      int insert2 = uf.Find(q_i);\n      branch.push_back(insert == insert2 ? dendrogram.at(p_i) : dendrogram.at(q_i));\n      tmp_members += insert == insert2 ? members.at(p_i) : members.at(q_i);\n      uf.Union(p_i, q_i);\n      insert = uf.Find(q_i);\n\n    }\n    // Generic proxy blocks attr access for mixed types, so need to keep track of members manually!\n    branch.attr(\"height\") = dist(i);\n    branch.attr(\"class\") = \"dendrogram\";\n    branch.attr(\"members\") = tmp_members;\n\n    // Update members reference and insert the branch\n    members.at(insert) = branch.attr(\"members\");\n    dendrogram.at(insert) = branch;\n\n  }\n  return(dendrogram.at(insert));\n}\n\n\n"
  },
  {
    "path": "src/density.cpp",
    "content": "//----------------------------------------------------------------------\n//                                DBSCAN density\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n\n#include <Rcpp.h>\n#include \"ANN/ANN.h\"\n#include \"regionQuery.h\"\n\nusing namespace Rcpp;\n\n// faster implementation of counting point densities from a matrix\n// using a kd-tree\n// [[Rcpp::export]]\nIntegerVector dbscan_density_int(\n    NumericMatrix data, double eps,\n    int type, int bucketSize, int splitRule, double approx) {\n\n  // kd-tree uses squared distances\n  double eps2 = eps*eps;\n\n  ANNpointSet* kdTree = NULL;\n  ANNpointArray dataPts = NULL;\n  int nrow = NA_INTEGER;\n  int ncol= NA_INTEGER;\n\n  // copy data for kd-tree\n  nrow = data.nrow();\n  ncol = data.ncol();\n  dataPts = annAllocPts(nrow, ncol);\n  for (int i = 0; i < nrow; i++){\n    for (int j = 0; j < ncol; j++){\n      (dataPts[i])[j] = data(i, j);\n    }\n  }\n  //Rprintf(\"Points copied.\\n\");\n\n  // create kd-tree (1) or linear search structure (2)\n  if (type==1) kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize,\n    (ANNsplitRule) splitRule);\n  else kdTree = new ANNbruteForce(dataPts, nrow, ncol);\n  //Rprintf(\"kd-tree ready. starting DBSCAN.\\n\");\n\n  std::vector<int> N;\n  IntegerVector count(nrow);\n\n  for (int i=0; i<nrow; ++i) {\n    //Rprintf(\"processing point %d\\n\", i+1);\n    if (!(i % 100)) Rcpp::checkUserInterrupt();\n\n    N = regionQuery(i, dataPts, kdTree, eps2, approx);\n    count[i] = N.size();\n  }\n\n  // cleanup\n  if (kdTree != NULL) delete kdTree;\n  if (dataPts != NULL)  annDeallocPts(dataPts);\n  // annClose(); is now done globally in the package\n\n  return count;\n}\n\n"
  },
  {
    "path": "src/frNN.cpp",
    "content": "//----------------------------------------------------------------------\n//                   Fixed Radius Nearest Neighbors\n// File:                        R_frNN.cpp\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n\n#include <Rcpp.h>\n#include \"ANN/ANN.h\"\n#include \"regionQuery.h\"\n\nusing namespace Rcpp;\n\n// [[Rcpp::export]]\nList frNN_int(NumericMatrix data, double eps, int type,\n  int bucketSize, int splitRule, double approx) {\n\n  // kd-tree uses squared distances\n  double eps2 = eps*eps;\n\n  // copy data\n  int nrow = data.nrow();\n  int ncol = data.ncol();\n  ANNpointArray dataPts = annAllocPts(nrow, ncol);\n  for(int i = 0; i < nrow; i++){\n    for(int j = 0; j < ncol; j++){\n      (dataPts[i])[j] = data(i, j);\n    }\n  }\n  //Rprintf(\"Points copied.\\n\");\n\n  // create kd-tree (1) or linear search structure (2)\n  ANNpointSet* kdTree = NULL;\n  if (type==1){\n    kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize,\n      (ANNsplitRule)  splitRule);\n  } else{\n    kdTree = new ANNbruteForce(dataPts, nrow, ncol);\n  }\n  //Rprintf(\"kd-tree ready. 
starting DBSCAN.\\n\");\n\n  // frNN\n  //std::vector< IntegerVector > id; id.resize(nrow);\n  //std::vector< NumericVector > dist; dist.resize(nrow);\n  List id(nrow);\n  List dist(nrow);\n\n  for (int p=0; p<nrow; p++) {\n    if (!(p % 100)) Rcpp::checkUserInterrupt();\n\n    //Rprintf(\"processing point %d\\n\", p+1);\n    nn N = regionQueryDist(p, dataPts, kdTree, eps2, approx);\n\n    // fix index\n    //std::transform(N.first.begin(), N.first.end(),\n    //  N.first.begin(), std::bind2nd( std::plus<int>(), 1 ) );\n\n    // take sqrt of distance since the tree stores d^2\n    //std::transform(N.second.begin(), N.second.end(),\n    //  N.second.begin(), static_cast<double (*)(double)>(std::sqrt));\n\n    IntegerVector ids = IntegerVector(N.first.begin(), N.first.end());\n    NumericVector dists = NumericVector(N.second.begin(), N.second.end());\n\n    // remove self matches\n    LogicalVector take = ids != p;\n    ids = ids[take];\n    dists = dists[take];\n\n    //Rprintf(\"Found neighborhood size %d\\n\", ids.size());\n    id[p] = ids+1;\n    dist[p] = sqrt(dists);\n  }\n\n  // cleanup\n  delete kdTree;\n  annDeallocPts(dataPts);\n  // annClose(); is now done globally in the package\n\n  // prepare results\n  List ret;\n  ret[\"dist\"] = dist;\n  ret[\"id\"] = id;\n  ret[\"eps\"] = eps;\n  return ret;\n}\n\n// [[Rcpp::export]]\nList frNN_query_int(NumericMatrix data, NumericMatrix query, double eps, int type,\n  int bucketSize, int splitRule, double approx) {\n\n  // kd-tree uses squared distances\n  double eps2 = eps*eps;\n\n  // copy data\n  int nrow = data.nrow();\n  int ncol = data.ncol();\n  ANNpointArray dataPts = annAllocPts(nrow, ncol);\n  for(int i = 0; i < nrow; i++){\n    for(int j = 0; j < ncol; j++){\n      (dataPts[i])[j] = data(i, j);\n    }\n  }\n\n  int nrow_q = query.nrow();\n  int ncol_q = query.ncol();\n  ANNpointArray queryPts = annAllocPts(nrow_q, ncol_q);\n  for(int i = 0; i < nrow_q; i++){\n    for(int j = 0; j < ncol_q; j++){\n      
(queryPts[i])[j] = query(i, j);\n    }\n  }\n  //Rprintf(\"Points copied.\\n\");\n\n  // create kd-tree (1) or linear search structure (2)\n  ANNpointSet* kdTree = NULL;\n  if (type==1){\n    kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize,\n      (ANNsplitRule)  splitRule);\n  } else{\n    kdTree = new ANNbruteForce(dataPts, nrow, ncol);\n  }\n  //Rprintf(\"kd-tree ready. starting DBSCAN.\\n\");\n\n  // frNN\n  //std::vector< IntegerVector > id; id.resize(nrow);\n  //std::vector< NumericVector > dist; dist.resize(nrow);\n  List id(nrow_q);\n  List dist(nrow_q);\n\n  for (int p=0; p<nrow_q; p++) {\n    if (!(p % 100)) Rcpp::checkUserInterrupt();\n\n    //Rprintf(\"processing point %d\\n\", p+1);\n    ANNpoint queryPt = queryPts[p];\n    nn N = regionQueryDist_point(queryPt, dataPts, kdTree, eps2, approx);\n\n    // fix index\n    //std::transform(N.first.begin(), N.first.end(),\n    //  N.first.begin(), std::bind2nd( std::plus<int>(), 1 ) );\n\n    // take sqrt of distance since the tree stores d^2\n    //std::transform(N.second.begin(), N.second.end(),\n    //  N.second.begin(), static_cast<double (*)(double)>(std::sqrt));\n\n    IntegerVector ids = IntegerVector(N.first.begin(), N.first.end());\n    NumericVector dists = NumericVector(N.second.begin(), N.second.end());\n\n    // remove self matches -- not an issue with query points\n    //LogicalVector take = ids != p;\n    //ids = ids[take];\n    //dists = dists[take];\n\n    //Rprintf(\"Found neighborhood size %d\\n\", ids.size());\n    id[p] = ids+1;\n    dist[p] = sqrt(dists);\n  }\n\n  // cleanup\n  delete kdTree;\n  annDeallocPts(dataPts);\n  annDeallocPts(queryPts);\n  // annClose(); is now done globally in the package\n\n  // prepare results\n  List ret;\n  ret[\"dist\"] = dist;\n  ret[\"id\"] = id;\n  ret[\"eps\"] = eps;\n  ret[\"sort\"] = false;\n  return ret;\n}\n"
  },
  {
    "path": "src/hdbscan.cpp",
    "content": "//----------------------------------------------------------------------\n//              R interface to dbscan using the ANN library\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler, Matt Piekenbrock. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n#include <Rcpp.h>\n\n// C++ includes\n#include <unordered_map>\n#include <stack>\n#include <queue>\n#include <string> // std::atoi\n\n// Helper functions\n#include \"utilities.h\"\n\nusing namespace Rcpp;\n// [[Rcpp::plugins(cpp11)]]\n\n// Macros\n#define INDEX_TF(N,to,from) (N)*(to) - (to)*(to+1)/2 + (from) - (to) - (1)\n\n// Given a dist vector of \"should-link\" (1), \"should-not-link\" (-1), and \"don't care\" (0)\n// constraints in the form of integers, convert constraints to a more compact adjacency list\n// representation.\n// [[Rcpp::export]]\nList distToAdjacency(IntegerVector constraints, const int N){\n  std::unordered_map<int, std::vector<int> > key_map = std::unordered_map<int, std::vector<int> >();\n  for (int i = 0; i < N; ++i){\n    for (int j = 0; j < N; ++j){\n      if (i == j) continue;\n      int index = i > j ? INDEX_TF(N, j, i) : INDEX_TF(N, i, j);\n      int crule = constraints.at(index);\n      if (crule != 0){\n        if (key_map.count(i+1) != 1){ key_map[i+1] = std::vector<int>(); } // add 1 for base 1\n        key_map[i+1].push_back(crule < 0 ? 
- (j + 1) : j + 1); // add 1 for base 1\n      }\n    }\n  }\n  return(wrap(key_map));\n}\n\n// Given an hclust object, convert to a dendrogram object (but much faster).\n// [[Rcpp::export]]\nList buildDendrogram(List hcl) {\n\n  // Extract hclust info\n  IntegerMatrix merge = hcl[\"merge\"];\n  NumericVector height = hcl[\"height\"];\n  IntegerVector order = hcl[\"order\"];\n  List labels = List(); // avoids type inference\n  if (!hcl.containsElementNamed(\"labels\") || hcl[\"labels\"] == R_NilValue){\n    labels = seq_along(order);\n  } else {\n    labels = as<StringVector>(hcl[\"labels\"]);\n  }\n\n  int n = merge.nrow() + 1, k;\n  List new_br, z = List(n);\n  for (k = 0; k < n-1; k++){\n    int lm = merge(k, 0), rm = merge(k, 1);\n    IntegerVector m = IntegerVector::create(lm, rm);\n\n    // First Case: Both are singletons, so need to create leaves\n    if (all(m < 0).is_true()){\n      // Left\n      IntegerVector left = IntegerVector::create(-lm);\n      left.attr(\"members\") = (int) 1;\n      left.attr(\"height\") = (double) 0.f;\n      left.attr(\"label\") = labels.at(-(lm + 1));\n      left.attr(\"leaf\") = true;\n\n      // Right\n      IntegerVector right = IntegerVector::create(-rm);\n      right.attr(\"members\") = (int) 1;\n      right.attr(\"height\") = (double) 0.f;\n      right.attr(\"label\") = labels.at(-(rm + 1));\n      right.attr(\"leaf\") = true;\n\n      // Merge\n      new_br = List::create(left, right);\n      new_br.attr(\"members\") = 2;\n      new_br.attr(\"midpoint\") = 0.5;\n    }\n    // Second case: one is a singleton, the other is a branch\n    else if (any(m < 0).is_true()){\n      bool isL = lm < 0;\n\n      // Create the leaf from the negative entry\n      IntegerVector leaf = IntegerVector::create(isL ? -lm : -rm);\n      leaf.attr(\"members\") = 1;\n      leaf.attr(\"height\") = 0;\n      leaf.attr(\"label\") = labels.at(isL ? 
-(lm + 1) : -(rm + 1));\n      leaf.attr(\"leaf\") = true;\n\n      // Merge the leaf with the other existing branch\n      int branch_key = isL ? rm - 1 : lm - 1;\n      List sub_branch = z[branch_key];\n      new_br = isL ? List::create(leaf, sub_branch) : List::create(sub_branch, leaf);\n      z.at(branch_key) = R_NilValue;\n\n      // Set attributes of new branch\n      int sub_members = sub_branch.attr(\"members\");\n      double mid_pt = sub_branch.attr(\"midpoint\");\n      new_br.attr(\"members\") = int(sub_members) + 1;\n      new_br.attr(\"midpoint\") = (int(isL ? 1 : sub_members) + mid_pt) / 2;\n    } else {\n      // Create the new branch\n      List l_branch = z.at(lm - 1), r_branch = z.at(rm - 1);\n      new_br = List::create(l_branch, r_branch);\n\n      // Store attribute values in locals to get around proxy\n      int left_members = l_branch.attr(\"members\"), right_members = r_branch.attr(\"members\");\n      double l_mid = l_branch.attr(\"midpoint\"), r_mid = r_branch.attr(\"midpoint\");\n\n      // Set up new branch attributes\n      new_br.attr(\"members\") = left_members + right_members;\n      new_br.attr(\"midpoint\") = (left_members + l_mid + r_mid) / 2;\n\n      // Deallocate unneeded memory along the way\n      z.at(lm - 1) = R_NilValue;\n      z.at(rm - 1) = R_NilValue;\n    }\n    new_br.attr(\"height\") = height.at(k);\n    z.at(k) = new_br;\n  }\n  List res = z.at(k - 1);\n  res.attr(\"class\") = \"dendrogram\";\n  return(res);\n}\n\n// Simple function to iteratively get the sub-children of a nested integer-hierarchy\n// [[Rcpp::export]]\nIntegerVector all_children(List hier, int key, bool leaves_only = false){\n  IntegerVector res = IntegerVector();\n\n  // If the key doesn't exist return an empty vector\n  if (!hier.containsElementNamed(std::to_string(key).c_str())){\n    return(res);\n  }\n\n  // Else, use an iterative 'recursive'-style traversal to extract the IDs of\n  // all sub trees\n  IntegerVector children = 
hier[std::to_string(key).c_str()];\n  std::queue<int> to_do = std::queue<int>();\n  to_do.push(key);\n  while (to_do.size() != 0){\n    int parent = to_do.front();\n    if (!hier.containsElementNamed(std::to_string(parent).c_str())){\n      to_do.pop();\n    } else {\n      children = hier[std::to_string(parent).c_str()];\n      to_do.pop();\n      for (int n_children = 0; n_children < children.length(); ++n_children){\n        int child_id = children.at(n_children);\n        if (leaves_only){\n          if (!hier.containsElementNamed(std::to_string(child_id).c_str())) {\n            res.push_back(child_id);\n          }\n        } else { res.push_back(child_id); }\n        to_do.push(child_id);\n      }\n    }\n  }\n  return(res);\n}\n\n// Extract 'flat' assignments\nIntegerVector getSalientAssignments(List cl_tree, List cl_hierarchy, std::list<int> sc, const int n){\n  IntegerVector cluster = IntegerVector(n, 0);\n  for (std::list<int>::iterator it = sc.begin(); it != sc.end(); it++) {\n    IntegerVector child_cl = all_children(cl_hierarchy, *it);\n\n    // If at a leaf, it's not necessary to recursively get point indices, else need to traverse hierarchy\n    if (child_cl.length() == 0){\n      List cl = cl_tree[std::to_string(*it)];\n      cluster[as<IntegerVector>(cl[\"contains\"]) - 1] = *it;\n    } else {\n      List cl = cl_tree[std::to_string(*it)];\n      cluster[as<IntegerVector>(cl[\"contains\"]) - 1] = *it;\n      for (IntegerVector::iterator child_cid = child_cl.begin(); child_cid != child_cl.end(); ++child_cid){\n        cl = cl_tree[std::to_string(*child_cid)];\n        IntegerVector child_contains = as<IntegerVector>(cl[\"contains\"]);\n        if (child_contains.length() > 0){\n          cluster[child_contains - 1] = *it;\n        }\n      }\n    }\n  }\n  return(cluster);\n}\n\n// Retrieve node (x, y) positions in a cluster tree\n// [[Rcpp::export]]\nNumericMatrix node_xy(List cl_tree, List cl_hierarchy, int cid = 0){\n\n  // Initialize\n  if (cid 
== 0){\n    cl_tree[\"node_xy\"] = NumericMatrix(all_children(cl_hierarchy, 0).size()+1, 2);\n    cl_tree[\"leaf_counter\"] = 0;\n    cl_tree[\"row_counter\"] = 0;\n  }\n\n  // Retrieve/set variables\n  std::string cid_str = std::to_string(cid);\n  NumericMatrix node_xy_ = cl_tree[\"node_xy\"];\n  List cl = cl_tree[cid_str];\n\n  // Increment row index every time\n  int row_index = (int) cl_tree[\"row_counter\"];\n  cl_tree[\"row_counter\"] = row_index+1;\n\n  // base case\n  if (!cl_hierarchy.containsElementNamed(cid_str.c_str())){\n    int leaf_index = (int) cl_tree[\"leaf_counter\"];\n    node_xy_(row_index, _) = NumericVector::create((double) ++leaf_index, (double) cl[\"eps_death\"]);\n    cl_tree[\"leaf_counter\"] = leaf_index;\n    NumericMatrix res = NumericMatrix(1, 1);\n    res[0] = row_index;\n    return(res);\n  } else {\n    IntegerVector children = cl_hierarchy[cid_str];\n    int l_row = (int) node_xy(cl_tree, cl_hierarchy, children.at(0))[0]; // left\n    int r_row = (int) node_xy(cl_tree, cl_hierarchy, children.at(1))[0]; // right\n    double lvalue = (double) (node_xy_(l_row, 0) + node_xy_(r_row, 0)) / 2;\n    node_xy_(row_index, _) = NumericVector::create(lvalue, (double) cl[\"eps_death\"]);\n\n    if (cid != 0){\n      NumericMatrix res = NumericMatrix(1, 1);\n      res[0] = row_index;\n      return(res);\n    }\n  }\n\n  // Cleanup\n  if (cid == 0){\n    cl_tree[\"leaf_counter\"] = R_NilValue;\n    cl_tree[\"row_counter\"] = R_NilValue;\n  }\n  return (node_xy_);\n}\n\n// Given a cluster tree, convert to a simplified dendrogram\n// [[Rcpp::export]]\nList simplifiedTree(List cl_tree) {\n\n  // Hierarchical information\n  List cl_hierarchy = cl_tree.attr(\"cl_hierarchy\");\n  IntegerVector all_childs = all_children(cl_hierarchy, 0);\n\n  // To keep track of members and midpoints\n  std::unordered_map<std::string, int> members = std::unordered_map<std::string, int>();\n  std::unordered_map<std::string, float> mids = std::unordered_map<std::string, 
float>();\n\n  // To keep track of where we are\n  std::stack<int> cid_stack = std::stack<int>();\n  cid_stack.push(0);\n\n  // Iteratively build the hierarchy\n  List dendrogram = List();\n\n  // Premake children\n  for (IntegerVector::iterator it = all_childs.begin(); it != all_childs.end(); ++it){\n    std::string cid_label = std::to_string(*it);\n    List cl = cl_tree[cid_label];\n    if (!cl_hierarchy.containsElementNamed(cid_label.c_str())){\n      // Create leaf\n      IntegerVector leaf = IntegerVector::create(*it);\n      leaf.attr(\"label\") = cid_label;\n      leaf.attr(\"members\") = 1;\n      leaf.attr(\"height\") = cl[\"eps_death\"];\n      leaf.attr(\"midpoint\") = 0;\n      leaf.attr(\"leaf\") = true;\n      dendrogram[cid_label] = leaf;\n      members[cid_label] = 1;\n      mids[cid_label] = 0;\n    }\n  }\n\n  // Building the dendrogram bottom-up\n  while(!cid_stack.empty()) {\n    int cid = cid_stack.top();\n    std::string cid_label = std::to_string(cid);\n    List cl = cl_tree[cid_label];\n\n    // Recursive calls\n    IntegerVector local_children = cl_hierarchy[cid_label];\n\n    // Members and midpoint extraction\n    std::string l_str = std::to_string(local_children.at(0)), r_str = std::to_string(local_children.at(1));\n    // Rcout << \"Comparing: \" << l_str << \", \" << r_str << std::endl;\n    if (!dendrogram.containsElementNamed(l_str.c_str())){ cid_stack.push(local_children.at(0)); continue; }\n    if (!dendrogram.containsElementNamed(r_str.c_str())){ cid_stack.push(local_children.at(1)); continue; }\n\n    // Continue building up the hierarchy\n    List left = dendrogram[l_str], right = dendrogram[r_str];\n\n    int l_members = members[l_str], r_members = members[r_str];\n    float l_mid = mids[l_str], r_mid = mids[r_str];\n\n    // Make the new branch\n    List new_branch = List::create(dendrogram[l_str], dendrogram[r_str]);\n    new_branch.attr(\"label\") = cid_label;\n    new_branch.attr(\"members\") = l_members + r_members;\n    
new_branch.attr(\"height\") = (float) cl[\"eps_death\"];\n    new_branch.attr(\"class\") = \"dendrogram\";\n\n    // Midpoint calculation\n    bool isL = (bool) !cl_hierarchy.containsElementNamed(l_str.c_str()); // is left a leaf\n    if (!isL && cl_hierarchy.containsElementNamed(r_str.c_str())){ // is non-singleton merge\n      new_branch.attr(\"midpoint\") = (l_members + l_mid + r_mid) / 2;\n    } else { // contains a leaf\n      int sub_members = isL ? r_members : l_members;\n      float mid_pt = isL ? r_mid : l_mid;\n      new_branch.attr(\"midpoint\") = ((isL ? 1 : sub_members) + mid_pt) / 2;\n    }\n\n    // Save info for later\n    members[cid_label] = l_members + r_members;\n    mids[cid_label] = (float) new_branch.attr(\"midpoint\");\n    dendrogram[cid_label] = new_branch;\n\n    // Done with this node\n    cid_stack.pop();\n  }\n  return(dendrogram[\"0\"]);\n}\n\n/* Main processing step to compute all the relevant information in the form of the\n * 'cluster tree' for FOSC. The cluster stability scores computed via the tree traversal rely on a separate function.\n * Requires information associated with hclust elements. See ?hclust in R for more info.\n * 1. merge := an (n-1) x d matrix representing the MST computed from any arbitrary similarity matrix\n * 2. height := the (linkage) distance each new set of clusters forms from the MST\n * 3. 
order := the point indices of the original data the negative entries in merge refer to\n * Notation: eps is used to arbitrarily refer to the dissimilarity distance used\n*/\n// [[Rcpp::export]]\nList computeStability(const List hcl, const int minPts, bool compute_glosh = false){\n  // Extract hclust info\n  NumericMatrix merge = hcl[\"merge\"];\n  NumericVector eps_dist = hcl[\"height\"];\n  IntegerVector pt_order = hcl[\"order\"];\n  int n = merge.nrow() + 1, k;\n\n  //  Which cluster does each merge step represent (after the merge, or before the split)\n  IntegerVector cl_tracker = IntegerVector(n-1 , 0),\n                member_sizes = IntegerVector(n-1, 0); // Size each step\n\n  List clusters = List(), // Final cluster information\n       cl_hierarchy = List(); // Keeps track of hierarchy, which cluster contains who\n\n  // The primary information needed\n  std::unordered_map<std::string, IntegerVector> contains = std::unordered_map<std::string, IntegerVector>();\n  std::unordered_map<std::string, NumericVector> eps = std::unordered_map<std::string, NumericVector>();\n\n  // Supplemental information for either convenience or to reduce memory\n  std::unordered_map<std::string, int> n_children = std::unordered_map<std::string, int>();\n  std::unordered_map<std::string, double> eps_death = std::unordered_map<std::string, double>();\n  std::unordered_map<std::string, double> eps_birth = std::unordered_map<std::string, double>();\n  std::unordered_map<std::string, bool> processed = std::unordered_map<std::string, bool>();\n\n  // First pass: Agglomerate up the hierarchy, recording member sizes.\n  // This enables a dynamic programming strategy to improve performance below.\n  for (k = 0; k < n-1; ++k){\n    int lm = merge(k, 0), rm = merge(k, 1);\n    IntegerVector m = IntegerVector::create(lm, rm);\n    if (all(m < 0).is_true()){\n      member_sizes[k] = 2;\n    } else if (any(m < 0).is_true()) {\n      int pos_merge = (lm < 0 ? 
rm : lm), merge_size = member_sizes[pos_merge - 1];\n      member_sizes[k] = merge_size + 1;\n    } else {\n      // Record Member Sizes\n      int merge_size1 = member_sizes[lm-1], merge_size2 = member_sizes[rm-1];\n      member_sizes[k] = merge_size1 + merge_size2;\n    }\n  }\n\n  // Initialize root (unknown size, might be 0, so don't initialize length)\n  std::string root_str = \"0\";\n  contains[root_str] = NumericVector();\n  eps[root_str] = NumericVector();\n  eps_birth[root_str] = eps_dist.at(eps_dist.length()-1);\n\n  int global_cid = 0;\n  // Second pass: Divisively split the hierarchy, recording the epsilon and point index values as needed\n  for (k = n-2; k >= 0; --k){\n    // Current Merge\n    int lm = merge(k, 0), rm = merge(k, 1), cid = cl_tracker.at(k);\n    IntegerVector m = IntegerVector::create(lm, rm);\n    std::string cl_cid = std::to_string(cid);\n\n    // Trivial case: split into singletons, record eps, contains, and ensure eps_death is minimal\n    if (all(m < 0).is_true()){\n      contains[cl_cid].push_back(-lm), contains[cl_cid].push_back(-rm);\n      double noise_eps = processed[cl_cid] ? eps_death[cl_cid] : eps_dist.at(k);\n      eps[cl_cid].push_back(noise_eps), eps[cl_cid].push_back(noise_eps);\n      eps_death[cl_cid] = processed[cl_cid] ? eps_death[cl_cid] : std::min((double) eps_dist.at(k), (double) eps_death[cl_cid]);\n    } else if (any(m < 0).is_true()) {\n      // Record new point info and mark the non-singleton with the cluster id\n      contains[cl_cid].push_back(-(lm < 0 ? lm : rm));\n      eps[cl_cid].push_back(processed[cl_cid] ? eps_death[cl_cid] : eps_dist.at(k));\n      cl_tracker.at((lm < 0 ? 
rm : lm) - 1) = cid;\n    } else {\n      int merge_size1 = member_sizes[lm-1], merge_size2 = member_sizes[rm-1];\n\n      // The minPts step\n      if (merge_size1 >= minPts && merge_size2 >= minPts){\n        // Record death of current cluster\n        eps_death[cl_cid] = eps_dist.at(k);\n        processed[cl_cid] = true;\n\n        // Mark the lower merge steps as new clusters\n        cl_hierarchy[cl_cid] = IntegerVector::create(global_cid+1, global_cid+2);\n        std::string l_index = std::to_string(global_cid+1), r_index = std::to_string(global_cid+2);\n        cl_tracker.at(lm - 1) = ++global_cid, cl_tracker.at(rm - 1) = ++global_cid;\n\n        // Record the distance the new clusters appeared and initialize containers\n        contains[l_index] = IntegerVector(), contains[r_index] = IntegerVector();\n        eps[l_index] = NumericVector(), eps[r_index] = NumericVector();\n        eps_birth[l_index] = eps_dist.at(k), eps_birth[r_index] = eps_dist.at(k);\n        eps_death[l_index] = eps_dist.at(lm - 1), eps_death[r_index] = eps_dist.at(rm - 1);\n        processed[l_index] = false, processed[r_index] = false;\n        n_children[cl_cid] = merge_size1 + merge_size2;\n      } else {\n        // Inherit cluster identity\n        cl_tracker.at(lm - 1) = cid,  cl_tracker.at(rm - 1) = cid;\n      }\n    }\n  }\n\n  // Aggregate data into a returnable list\n  // NOTE: the 'contains' element will be empty for all inner nodes w/ minPts == 1, else\n  // it will contain only the objects that were considered 'noise' at that hierarchical level\n  List res = List();\n  NumericVector outlier_scores;\n  if (compute_glosh) { outlier_scores = NumericVector( n, -1.0); }\n  for (std::unordered_map<std::string, IntegerVector>::iterator key = contains.begin(); key != contains.end(); ++key){\n    int nc = n_children[key->first];\n    res[key->first] = List::create(\n      _[\"contains\"] = key->second,\n      _[\"eps\"] = eps[key->first],\n      _[\"eps_birth\"] = 
eps_birth[key->first],\n      _[\"eps_death\"] = eps_death[key->first],\n      _[\"stability\"] = sum(1/eps[key->first] - 1/eps_birth[key->first]) + (nc * 1/eps_death[key->first] - nc * 1/eps_birth[key->first]),\n      //_[\"_stability\"] = 1/eps[key->first] - 1/eps_birth[key->first],\n      _[\"n_children\"] = n_children[key->first]\n    );\n\n    // Compute GLOSH outlier scores (HDBSCAN only)\n    if (compute_glosh){\n      if (eps[key->first].size() > 0){ // contains noise points\n        double eps_max = std::numeric_limits<double>::infinity();\n        IntegerVector leaf_membership = all_children(cl_hierarchy, atoi(key->first.c_str()), true);\n        if (leaf_membership.length() == 0){ // is itself a leaf\n          eps_max = eps_death[key->first];\n        } else {\n          for (IntegerVector::iterator it = leaf_membership.begin(); it != leaf_membership.end(); ++it){\n            eps_max = std::min(eps_max, eps_death[std::to_string(*it)]);\n          }\n        }\n        NumericVector eps_max_vec =  NumericVector(eps[key->first].size(), eps_max) / as<NumericVector>(eps[key->first]);\n        NumericVector glosh = Rcpp::rep(1.0, key->second.length()) - eps_max_vec;\n        outlier_scores[key->second - 1] = glosh;\n      }\n        // MFH: If the point is never an outlier (0/0) then set GLOSH to 0\n        outlier_scores[is_nan(outlier_scores)] = 0.0;\n    }\n  }\n\n  // Store meta-data as attributes\n  res.attr(\"n\") = n; // number of points in the original data\n  res.attr(\"cl_hierarchy\") = cl_hierarchy;  // Stores parent/child structure\n  res.attr(\"cl_tracker\") = cl_tracker; // stores cluster id formation for each merge step, used for cluster extraction\n  res.attr(\"minPts\") = minPts; // needed later\n  // res.attr(\"root\") = minPts == 1; // needed later to ensure root is not captured as a cluster\n  if (compute_glosh){ res.attr(\"glosh\") = outlier_scores; } // glosh outlier scores (hdbscan only)\n  return(res);\n}\n\n// Validates a given list 
of instance-level constraints for symmetry. Since the number of\n// constraints might change dramatically based on the problem, an initial loop is performed\n// to figure out whether it would be faster to check via an adjacency list or matrix\n// [[Rcpp::export]]\nList validateConstraintList(List& constraints, int n){\n  std::vector< std::string > keys = as< std::vector< std::string > >(constraints.names());\n  bool is_valid = true, tmp_valid, use_matrix = false;\n\n  int n_constraints = 0;\n  for (List::iterator it = constraints.begin(); it != constraints.end(); ++it){\n    n_constraints += as<IntegerVector>(*it).size();\n  }\n\n  // Sparsity check: if the constraints make up a sufficiently large amount of\n  // the solution space, use matrix to check validity\n  if (n_constraints/(double)(n*n) > 0.20){ // cast avoids integer division\n    use_matrix = true;\n  }\n\n  // Check using adjacency matrix\n  if (use_matrix){\n    IntegerMatrix adj_matrix = IntegerMatrix(Dimension(n, n));\n    int from, to;\n    for (std::vector< std::string >::iterator it = keys.begin(); it != keys.end(); ++it){\n      // Get constraints\n      int cid = atoi(it->c_str()); // to base-0\n      IntegerVector cs_ = constraints[*it];\n\n      // Positive \"should-link\" constraints\n      IntegerVector pcons = as<IntegerVector>(cs_[cs_ > 0]);\n      for (IntegerVector::iterator pc = pcons.begin(); pc != pcons.end(); ++pc){\n        from = (*pc < cid ? *pc : cid) - 1;\n        to = (*pc > cid ? *pc : cid) - 1;\n        adj_matrix(from, to) = 1;\n      }\n\n      // Negative \"should-not-link\" constraints\n      IntegerVector ncons = -(as<IntegerVector>(cs_[cs_ < 0]));\n      for (IntegerVector::iterator nc = ncons.begin(); nc != ncons.end(); ++nc){\n        from = (*nc < cid ? *nc : cid) - 1;\n        to = (*nc > cid ? 
*nc : cid) - 1;\n        adj_matrix(from, to) = -1;\n      }\n    }\n\n    // Check symmetry\n    IntegerVector lower = lowerTri(adj_matrix);\n    IntegerMatrix adj_t = Rcpp::transpose(adj_matrix);\n    IntegerVector lower_t = lowerTri(adj_t);\n    LogicalVector valid_check = lower == lower_t;\n    is_valid = all(valid_check == TRUE).is_true();\n\n    // Try to merge the two\n    if (!is_valid){\n      int sum = 0;\n      for (int i = 0; i < lower.size(); ++i){\n        sum = lower.at(i) + lower_t.at(i);\n        lower[i] = sum > 0 ? 1 : sum < 0 ? -1 : 0;\n      }\n    }\n    constraints = distToAdjacency(lower, n);\n  }\n  // Else check using given adjacency list\n  else {\n    for (std::vector< std::string >::iterator it = keys.begin(); it != keys.end(); ++it){\n      // Get constraints\n      int cid = atoi(it->c_str());\n      IntegerVector cs_ = constraints[*it];\n\n      // Positive \"should-link\" constraints\n      IntegerVector pcons = as<IntegerVector>(cs_[cs_ > 0]);\n      for (IntegerVector::iterator pc = pcons.begin(); pc != pcons.end(); ++pc){\n        int ic = *pc < 0 ? -(*pc) : *pc;\n        std::string ic_str = std::to_string(ic);\n        bool exists = constraints.containsElementNamed(ic_str.c_str());\n        tmp_valid = exists ? contains(as<IntegerVector>(constraints[ic_str]), cid) : false;\n        if (!tmp_valid){\n          if (!exists){\n            constraints[ic_str] = IntegerVector::create(cid);\n          } else {\n            IntegerVector con_vec = constraints[ic_str];\n            con_vec.push_back(cid);\n            constraints[ic_str] = con_vec;\n          }\n          is_valid = false;\n        }\n      }\n\n      // Negative \"should-not-link\" constraints\n      IntegerVector ncons = -(as<IntegerVector>(cs_[cs_ < 0]));\n      for (IntegerVector::iterator nc = ncons.begin(); nc != ncons.end(); ++nc){\n        int ic = *nc < 0 ? 
-(*nc) : *nc;\n        std::string ic_str = std::to_string(ic);\n        bool exists = constraints.containsElementNamed(ic_str.c_str());\n        tmp_valid = exists ? contains(as<IntegerVector>(constraints[ic_str]), cid) : false;\n        if (!tmp_valid){\n          if (!exists){\n            constraints[ic_str] = IntegerVector::create(-cid);\n          } else {\n            IntegerVector con_vec = constraints[ic_str];\n            con_vec.push_back(-cid);\n            constraints[ic_str] = con_vec;\n          }\n          is_valid = false;\n        }\n      }\n    }\n  }\n  // Produce warning if asymmetric constraints detected; return attempt at fixing constraints.\n  if (!is_valid){\n      warning(\"Incomplete (asymmetric) constraints detected. Populating constraint list.\");\n  }\n  return(constraints);\n}\n\n// [[Rcpp::export]]\ndouble computeVirtualNode(IntegerVector noise, List constraints){\n  if (noise.length() == 0) return(0);\n  if (Rf_isNull(constraints)) return(0);\n\n  // Semi-supervised extraction\n  int satisfied_constraints = 0;\n  // Rcout << \"Starting constraint based optimization\" << std::endl;\n  for (IntegerVector::iterator it = noise.begin(); it != noise.end(); ++it){\n    std::string cs_str = std::to_string(*it);\n    if (constraints.containsElementNamed(cs_str.c_str())){\n      // Get constraints\n      IntegerVector cs_ = constraints[cs_str];\n\n      // Positive \"should-link\" constraints\n      IntegerVector pcons = as<IntegerVector>(cs_[cs_ > 0]);\n      for (IntegerVector::iterator pc = pcons.begin(); pc != pcons.end(); ++pc){\n        satisfied_constraints += contains(noise, *pc);\n      }\n\n      // Negative \"should-not-link\" constraints\n      IntegerVector ncons = -(as<IntegerVector>(cs_[cs_ < 0]));\n      for (IntegerVector::iterator nc = ncons.begin(); nc != ncons.end(); ++nc){\n        satisfied_constraints += (1 - contains(noise, *nc));\n      }\n    }\n  }\n  return(satisfied_constraints);\n}\n\n\n// Framework for Optimal 
Selection of Clusters (FOSC)\n// Traverses a cluster tree hierarchy to compute a flat solution, maximizing the:\n// - Unsupervised soln: the 'most stable' clusters following the given linkage criterion\n// - SS soln w/ instance level Constraints: constraint-based w/ unsupervised tiebreaker\n// - SS soln w/ mixed objective function: maximizes J = α JU + (1 − α) JSS\n// [[Rcpp::export]]\nNumericVector fosc(List cl_tree, std::string cid, std::list<int>& sc, List cl_hierarchy,\n                   bool prune_unstable_leaves=false, // whether to prune -very- unstable subbranches\n                   double cluster_selection_epsilon = 0.0, // whether to prune subbranches below a given epsilon\n                   const double alpha = 0, // mixed objective case\n                   bool useVirtual = false, // return virtual node as well\n                   const int n_constraints = 0, // number of constraints\n                   List constraints = R_NilValue) // instance-level constraints\n{\n  // Base case: at a leaf\n  if (!cl_hierarchy.containsElementNamed(cid.c_str())){\n    List cl = cl_tree[cid];\n    sc.push_back(std::atoi(cid.c_str())); // assume the leaf will be a salient cluster until proven otherwise\n    return(NumericVector::create((double) cl[\"stability\"],\n                                 (double) useVirtual ? 
cl[\"vscore\"] : 0));\n  } else {\n    // Non-base case: at a merge of clusters, determine which to keep\n    List cl = cl_tree[cid];\n\n    // Get child stability/constraint scores\n    NumericVector scores, stability_scores = NumericVector(), constraint_scores = NumericVector();\n    IntegerVector child_ids = cl_hierarchy[cid];\n    for (int i = 0, clen = child_ids.length(); i < clen; ++i){\n      int child_id = child_ids.at(i);\n      scores = fosc(cl_tree, std::to_string(child_id), sc, cl_hierarchy, prune_unstable_leaves, cluster_selection_epsilon, alpha, useVirtual, n_constraints, constraints);\n      stability_scores.push_back(scores.at(0));\n      constraint_scores.push_back(scores.at(1));\n    }\n\n    // In the semi-supervised scenario, the normalizing constant is stored in 'total_stability'\n    double total_stability = (contains(cl_tree.attributeNames(),\"total_stability\") ? (double) cl_tree.attr(\"total_stability\") : 1.0);\n\n    // Compare and update stability scores\n    double old_stability_score = (double) cl[\"stability\"] / total_stability;\n    double new_stability_score = (double) sum(stability_scores) / total_stability;\n\n    // Compute instance-level constraint scores if necessary\n    double old_constraint_score = 0, new_constraint_score = 0;\n    if (useVirtual){\n      // Rcout << \"old constraint score for \" << cid << \": \" << (double) cl[\"vscore\"] << std::endl;\n      old_constraint_score = (double) cl[\"vscore\"];\n      new_constraint_score = (double) sum(constraint_scores) + (double) computeVirtualNode(cl[\"contains\"], constraints)/n_constraints;\n    }\n\n    bool keep_children = true;\n    // Decide whether to keep the children or to remove them and add the parent\n    if (useVirtual){\n      if (old_constraint_score < new_constraint_score && cid != \"0\"){\n        // Children satisfy more constraints\n        cl[\"vscore\"] = new_constraint_score;\n        cl[\"score\"] = alpha * new_stability_score + (1 - alpha) * new_constraint_score;\n        // 
Rcout << \"1: score for \" << cid << \":\" << (double) cl[\"score\"] << std::endl;\n        // Rcout << \"(old constraint): \" << old_constraint_score << \", (new constraint): \" << new_constraint_score << std::endl;\n      } else if (old_constraint_score > new_constraint_score && cid != \"0\"){\n        // Parent satisfies more constraints\n        cl[\"vscore\"] = old_constraint_score;\n        cl[\"score\"] = alpha * old_stability_score + (1 - alpha) * old_constraint_score;\n        // Rcout << \"2: score for \" << cid << \":\" << (double) cl[\"score\"] << std::endl;\n        keep_children = false;\n      } else {\n        // Resolve tie using unsupervised, stability-based approach\n        if (old_stability_score < new_stability_score){\n          // Children are more stable\n          cl[\"score\"] = new_stability_score / total_stability;\n          // Rcout << \"3: score for \" << cid << \":\" << (double) cl[\"score\"] << std::endl;\n        } else {\n          // Parent is more stable\n          cl[\"score\"] = old_stability_score / total_stability;\n          // Rcout << \"4: score for \" << cid << \":\" << (double) cl[\"score\"] << std::endl;\n          // Rcout << \"(old stability): \" << old_stability_score << \", (total stability): \" << total_stability << std::endl;\n          keep_children = false;\n        }\n        cl[\"vscore\"] = old_constraint_score;\n      }\n    } else {\n      // Use unsupervised, stability-based approach only\n      if (old_stability_score < new_stability_score){\n        cl[\"score\"] = new_stability_score; // keep children\n      } else {\n        cl[\"score\"] = old_stability_score;\n        keep_children = false;\n      }\n    }\n\n    double epsdeath = (double) cl[\"eps_death\"];\n    if (epsdeath < cluster_selection_epsilon){\n      keep_children = false; // prune children that emerge at distance below epsilon\n    }\n\n    // Prune children and add parent (cid) if need be\n    if (!keep_children && cid != \"0\") {\n   
   IntegerVector children = all_children(cl_hierarchy, std::atoi(cid.c_str())); // use all_children to prune subtrees\n      for (int i = 0, clen = children.length(); i < clen; ++i){\n        sc.remove(children.at(i)); // use list for slightly better random deletion performance\n      }\n      sc.push_back(std::atoi(cid.c_str()));\n    } else if (keep_children && prune_unstable_leaves){\n      // If the flag is passed, prune leaves with insignificant stability scores;\n      // this can happen in cases where one leaf has a stability score significantly greater\n      // than both its siblings and its parent (or other ancestors), causing sibling branches\n      // to be considered as clusters even though they may not be significantly more stable than their parent\n      if (all(stability_scores < old_stability_score).is_false()){\n        for (int i = 0, clen = child_ids.length(); i < clen; ++i){\n          if (stability_scores.at(i) < old_stability_score){\n            IntegerVector to_prune = all_children(cl_hierarchy, child_ids.at(i)); // all sub members\n            for (IntegerVector::iterator it = to_prune.begin(); it != to_prune.end(); ++it){\n              //Rcout << \"Pruning: \" << *it << std::endl;\n              sc.remove(*it);\n            }\n          }\n        }\n      }\n    }\n\n    // Save scores for traversal up and for later\n    cl_tree[cid] = cl;\n\n    // Return this subtree's score\n    return(NumericVector::create((double) cl[\"score\"], useVirtual ? (double) cl[\"vscore\"] : 0));\n  }\n}\n\n// Given a cluster tree object with stability scores precomputed by computeStability,\n// extract the 'most stable' or salient flat cluster assignments. 
The large number of\n// arguments is due to fosc being a recursive function.\n// [[Rcpp::export]]\nList extractUnsupervised(List cl_tree, bool prune_unstable = false, double cluster_selection_epsilon = 0.0){\n  // Compute Salient Clusters\n  std::list<int> sc = std::list<int>();\n  List cl_hierarchy = cl_tree.attr(\"cl_hierarchy\");\n  int n = as<int>(cl_tree.attr(\"n\"));\n  fosc(cl_tree, \"0\", sc, cl_hierarchy, prune_unstable, cluster_selection_epsilon); // Assume root node is always id == 0\n\n  // Store results as attributes\n  cl_tree.attr(\"cluster\") = getSalientAssignments(cl_tree, cl_hierarchy, sc, n); // Flat assignments\n  cl_tree.attr(\"salient_clusters\") = wrap(sc); // salient clusters\n  return(cl_tree);\n}\n\n// [[Rcpp::export]]\nList extractSemiSupervised(List cl_tree, List constraints, float alpha = 0, bool prune_unstable_leaves = false, double cluster_selection_epsilon = 0.0){\n  // Rcout << \"Starting semisupervised extraction...\" << std::endl;\n  List root = cl_tree[\"0\"];\n  List cl_hierarchy = cl_tree.attr(\"cl_hierarchy\");\n  int n = as<int>(cl_tree.attr(\"n\"));\n\n  // Compute total number of constraints\n  int n_constraints = 0;\n  for (int i = 0, n = constraints.length(); i < n; ++i){\n    IntegerVector cl_constraints = constraints.at(i);\n    n_constraints += cl_constraints.length();\n  }\n\n  // Initialize root\n  List cl = cl_tree[\"0\"];\n  cl[\"vscore\"] = 0;\n  cl_tree[\"0\"] = cl; // replace to keep changes\n\n  // Compute initial gamma values or \"virtual nodes\" for both leaf and internal nodes\n  IntegerVector cl_ids = all_children(cl_hierarchy, 0);\n  for (IntegerVector::iterator it = cl_ids.begin(); it != cl_ids.end(); ++it){\n    if (*it != 0){\n      std::string cid_str = std::to_string(*it);\n      List cl = cl_tree[cid_str];\n\n      // Store the initial fraction of constraints satisfied for each node as 'vscore'\n      // NOTE: leaf scores represent \\hat{gamma}, internal represent virtual node scores\n      if 
(cl_hierarchy.containsElementNamed(cid_str.c_str())){\n        // Extract the point indices the cluster contains\n        IntegerVector child_cl = all_children(cl_hierarchy, *it), child_ids;\n        List cl_container = List();\n        for (IntegerVector::iterator ch_id = child_cl.begin(); ch_id != child_cl.end(); ++ch_id){\n          List ch_cl = cl_tree[std::to_string(*ch_id)];\n          //child_ids = combine(child_ids, ch_cl[\"contains\"]);\n          cl_container.push_back(as<IntegerVector>(ch_cl[\"contains\"]));\n        }\n        cl_container.push_back(as<IntegerVector>(cl[\"contains\"]));\n        child_ids = concat_int(cl_container);\n        cl[\"vscore\"] = computeVirtualNode(child_ids, constraints)/n_constraints;\n      } else { // is leaf node\n        cl[\"vscore\"] = computeVirtualNode(cl[\"contains\"], constraints)/n_constraints;\n      }\n      cl_tree[cid_str] = cl; // replace to keep changes\n    }\n  }\n\n  // First pass: compute unsupervised soln as a means of extracting normalizing constant J_U^*\n  cl_tree = extractUnsupervised(cl_tree, false, cluster_selection_epsilon);\n  IntegerVector stable_sc = cl_tree.attr(\"salient_clusters\");\n  double total_stability = 0.0f;\n  for (IntegerVector::iterator it = stable_sc.begin(); it != stable_sc.end(); ++it){\n    List cl = cl_tree[std::to_string(*it)];\n    total_stability += (double) cl[\"stability\"];\n  }\n  cl_tree.attr(\"total_stability\") = total_stability;\n  // Rcout << \"Total stability: \" << total_stability << std::endl;\n\n  // Compute stable clusters w/ instance-level constraints\n  std::list<int> sc = std::list<int>();\n  fosc(cl_tree, \"0\", sc, cl_hierarchy, prune_unstable_leaves, cluster_selection_epsilon,\n       alpha, true, n_constraints, constraints); // semi-supervised parameters\n\n  // Store results as attributes and return\n  cl_tree.attr(\"salient_clusters\") = wrap(sc);\n  cl_tree.attr(\"cluster\") = getSalientAssignments(cl_tree, cl_hierarchy, sc, n);\n  
return(cl_tree);\n}\n\n\n\n\n"
  },
  {
    "path": "src/kNN.cpp",
    "content": "//----------------------------------------------------------------------\n//                  Find the k Nearest Neighbors\n// File:                    R_kNNdist.cpp\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n// Note: does not return self-matches!\n\n#include \"kNN.h\"\n\n// returns knn + dist\nList kNN_int(NumericMatrix data, int k,\n  int type, int bucketSize, int splitRule, double approx) {\n\n  // copy data\n  int nrow = data.nrow();\n  int ncol = data.ncol();\n  ANNpointArray dataPts = annAllocPts(nrow, ncol);\n  for(int i = 0; i < nrow; i++){\n    for(int j = 0; j < ncol; j++){\n      (dataPts[i])[j] = data(i, j);\n    }\n  }\n  //Rprintf(\"Points copied.\\n\");\n\n  // create kd-tree (1) or linear search structure (2)\n  ANNpointSet* kdTree = NULL;\n  if (type==1){\n    kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize,\n      (ANNsplitRule)  splitRule);\n  } else{\n    kdTree = new ANNbruteForce(dataPts, nrow, ncol);\n  }\n  //Rprintf(\"kd-tree ready. 
starting DBSCAN.\\n\");\n\n  NumericMatrix d(nrow, k);\n  IntegerMatrix id(nrow, k);\n\n  // Note: the search also returns the point itself (as the first hit)!\n  // So we have to look for k+1 points.\n  ANNdistArray dists = new ANNdist[k+1];\n  ANNidxArray nnIdx = new ANNidx[k+1];\n\n  for (int i=0; i<nrow; i++) {\n    if (!(i % 100)) Rcpp::checkUserInterrupt();\n\n    ANNpoint queryPt = dataPts[i];\n\n    kdTree->annkSearch(queryPt, k+1, nnIdx, dists, approx);\n\n    // remove self match\n    IntegerVector ids = IntegerVector(nnIdx, nnIdx+k+1);\n    LogicalVector take = ids != i;\n    ids = ids[take];\n    id(i, _) = ids + 1;\n\n    NumericVector ndists = NumericVector(dists, dists+k+1)[take];\n    d(i, _) = sqrt(ndists);\n  }\n\n  // cleanup\n  delete kdTree;\n  delete [] dists;\n  delete [] nnIdx;\n  annDeallocPts(dataPts);\n  // annClose(); is now done globally in the package\n\n\n  // prepare results\n  List ret;\n  ret[\"dist\"] = d;\n  ret[\"id\"] = id;\n  ret[\"k\"] = k;\n  ret[\"sort\"] = true;\n  return ret;\n}\n\n// returns knn + dist using data and query\n// [[Rcpp::export]]\nList kNN_query_int(NumericMatrix data, NumericMatrix query, int k,\n  int type, int bucketSize, int splitRule, double approx) {\n\n  // FIXME: check ncol for data and query\n\n  // copy data\n  int nrow = data.nrow();\n  int ncol = data.ncol();\n  ANNpointArray dataPts = annAllocPts(nrow, ncol);\n  for(int i = 0; i < nrow; i++){\n    for(int j = 0; j < ncol; j++){\n      (dataPts[i])[j] = data(i, j);\n    }\n  }\n\n  // copy query\n  int nrow_q = query.nrow();\n  int ncol_q = query.ncol();\n  ANNpointArray queryPts = annAllocPts(nrow_q, ncol_q);\n  for(int i = 0; i < nrow_q; i++){\n    for(int j = 0; j < ncol_q; j++){\n      (queryPts[i])[j] = query(i, j);\n    }\n  }\n  //Rprintf(\"Points copied.\\n\");\n\n  // create kd-tree (1) or linear search structure (2)\n  ANNpointSet* kdTree = NULL;\n  if (type==1){\n    kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize,\n      
(ANNsplitRule)  splitRule);\n  } else{\n    kdTree = new ANNbruteForce(dataPts, nrow, ncol);\n  }\n  //Rprintf(\"kd-tree ready. starting DBSCAN.\\n\");\n\n  NumericMatrix d(nrow_q, k);\n  IntegerMatrix id(nrow_q, k);\n\n  // Note: does not return itself with query\n  ANNdistArray dists = new ANNdist[k];\n  ANNidxArray nnIdx = new ANNidx[k];\n\n  for (int i=0; i<nrow_q; i++) {\n    if (!(i % 100)) Rcpp::checkUserInterrupt();\n\n    ANNpoint queryPt = queryPts[i];\n    kdTree->annkSearch(queryPt, k, nnIdx, dists, approx);\n\n    IntegerVector ids = IntegerVector(nnIdx, nnIdx+k);\n    id(i, _) = ids + 1;\n\n    NumericVector ndists = NumericVector(dists, dists+k);\n    d(i, _) = sqrt(ndists);\n  }\n\n  // cleanup\n  delete kdTree;\n  delete [] dists;\n  delete [] nnIdx;\n  annDeallocPts(dataPts);\n  annDeallocPts(queryPts);\n  // annClose(); is now done globally in the package\n\n  // prepare results (ANN returns points sorted by distance)\n  List ret;\n  ret[\"dist\"] = d;\n  ret[\"id\"] = id;\n  ret[\"k\"] = k;\n  ret[\"sort\"] = true;\n  return ret;\n}\n"
  },
  {
    "path": "src/kNN.h",
    "content": "#ifndef KNN_H\n#define KNN_H\n\n#include <Rcpp.h>\n#include \"ANN/ANN.h\"\n\nusing namespace Rcpp;\n\n// returns knn + dist\n// [[Rcpp::export]]\nList kNN_int(NumericMatrix data, int k,\n             int type, int bucketSize, int splitRule, double approx);\n\n#endif\n"
  },
  {
    "path": "src/lof.cpp",
    "content": "//----------------------------------------------------------------------\n//                  Find the Neighbourhood for LOF\n// File:                    R_lof.cpp\n//----------------------------------------------------------------------\n// Copyright (c) 2021 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n// LOF needs to find the k-NN distance and then how many points are within this\n// neighborhood.\n\n#include <Rcpp.h>\n#include \"regionQuery.h\"\n\nusing namespace Rcpp;\n\n// returns knn-dist and the neighborhood size as a matrix\n// [[Rcpp::export]]\nList lof_kNN(NumericMatrix data, int minPts,\n  int type, int bucketSize, int splitRule, double approx) {\n\n  // minPts includes the point itself; k does not!\n  int k = minPts - 1;\n\n  // copy data\n  int nrow = data.nrow();\n  int ncol = data.ncol();\n  ANNpointArray dataPts = annAllocPts(nrow, ncol);\n  for(int i = 0; i < nrow; i++){\n    for(int j = 0; j < ncol; j++){\n      (dataPts[i])[j] = data(i, j);\n    }\n  }\n  //Rprintf(\"Points copied.\\n\");\n\n  // create kd-tree (1) or linear search structure (2)\n  ANNpointSet* kdTree = NULL;\n  if (type==1){\n    kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize,\n      (ANNsplitRule)  splitRule);\n  } else{\n    kdTree = new ANNbruteForce(dataPts, nrow, ncol);\n  }\n  //Rprintf(\"kd-tree ready. 
starting DBSCAN.\\n\");\n\n  // Note: the search also returns the point itself (as the first hit)!\n  // So we have to look for k+1 points.\n  ANNdistArray dists = new ANNdist[k+1];\n  ANNidxArray nnIdx = new ANNidx[k+1];\n  nn N;\n\n  // results\n  List id(nrow);\n  List dist(nrow);\n  NumericVector k_dist(nrow);\n\n  for (int i=0; i<nrow; i++) {\n    //Rprintf(\"processing point %d\\n\", p+1);\n    if (!(i % 100)) Rcpp::checkUserInterrupt();\n\n    ANNpoint queryPt = dataPts[i];\n\n    // find k-NN distance\n    kdTree->annkSearch(queryPt, k+1, nnIdx, dists, approx);\n    k_dist[i] = ANN_ROOT(dists[k]); // this is a squared distance!\n\n    // find k-NN neighborhood which can be larger than k with tied distances\n    // This works under Linux and Windows, but not under Solaris: The points at the\n    // k_distance may not be included.\n    //nn N = regionQueryDist_point(queryPt, dataPts, kdTree, dists[k], approx);\n\n    // Make the comparison robust.\n    // Compare doubles: http://c-faq.com/fp/fpequal.html\n    double minPts_dist = dists[k] + DBL_EPSILON * dists[k];\n    nn N = regionQueryDist_point(queryPt, dataPts, kdTree, minPts_dist, approx);\n\n    IntegerVector ids = IntegerVector(N.first.begin(), N.first.end());\n    NumericVector dists = NumericVector(N.second.begin(), N.second.end());\n\n    // remove self matches -- not an issue with query points\n    LogicalVector take = ids != i;\n    ids = ids[take];\n    dists = dists[take];\n\n    id[i] = ids+1;\n    dist[i] = sqrt(dists);\n  }\n\n  // cleanup\n  delete kdTree;\n  delete [] dists;\n  delete [] nnIdx;\n  annDeallocPts(dataPts);\n  // annClose(); is now done globally in the package\n\n  // all k_dists are squared\n  //k_dist = sqrt(k_dist);\n\n  // prepare results\n  List ret;\n  ret[\"k_dist\"] = k_dist;\n  ret[\"ids\"] = id;\n  ret[\"dist\"] = dist;\n  return ret;\n}\n"
  },
  {
    "path": "src/lt.h",
    "content": "#ifndef LT\n#define LT\n\n/* LT_POS to access a lower triangle matrix by C. Buchta\n * modified by M. Hahsler\n * n ... number of rows/columns\n * i,j ... column and row index (starts with 1)\n *\n * LT_POS1 ... 1-based indexing\n * LT_POS0 ... 0-based indexing\n */\n\n/* for long vectors, n, i, j need to be  R_xlen_t */\n#define LT_POS1(n, i, j)\t\t\t\t\t\\\n  (i)==(j) ? 0 : (i)<(j) ? (n) * ((i) - 1) - (i)*((i)-1)/2 + (j)-(i) -1\t\\\n        : (n)*((j)-1) - (j)*((j)-1)/2 + (i)-(j) -1\n\n#define LT_POS0(n, i, j)\t\t\t\t\t\\\n  (i)==(j) ? 0 : (i)<(j) ? (n) * (i) - ((i) + 1)*(i)/2 + (j)-(i) -1\t\\\n        : (n)*(j) - ((j) + 1)*(j)/2 + (i)-(j) -1\n\n/* M_POS to access matrix column-major order by i and j index (starts with 1)\n * n is the number of rows\n */\n#define M_POS(n, i, j) ((i)+(n)*(j))\n\n\n/*\n * MIN/MAX\n */\n\n#define MIN(X,Y) ((X) < (Y) ? (X) : (Y))\n#define MAX(X,Y) ((X) > (Y) ? (X) : (Y))\n\n\n#endif\n"
  },
  {
    "path": "src/mrd.cpp",
    "content": "//----------------------------------------------------------------------\n//              R interface to dbscan using the ANN library\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler, Matt Piekenbrock. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n#include <Rcpp.h>\n\nusing namespace Rcpp;\n\n// Computes the mutual reachability distance defined for HDBSCAN\n//\n// The mutual reachability distance summarizes the level at which two points\n// will connect. It is defined as:\n// mrd(a, b) = max[core_distance(a), core_distance(b), distance(a, b)]\n//\n// Input:\n// * dm: distances as a dist object (vector) of size (n*(n-1))/2 where n\n//       is the number of points.\n//       Note: we divide by 2 early to stay within the number range of int.\n// * cd: the core distances as a vector of length n\n//\n// Returns:\n// a vector (dist object) in the same order as dm\n// [[Rcpp::export]]\nNumericVector mrd(NumericVector dm, NumericVector cd) {\n  R_xlen_t n = cd.length();\n  if (dm.length() != (n * (n-1) / 2))\n    stop(\"number of mutual reachability distance values and size of the distance matrix do not agree.\");\n\n  NumericVector res = NumericVector(dm.length());\n  for (R_xlen_t i = 0, idx = 0; i < n; ++i) {\n//    Rprintf(\"i = %ill of %ill, idx = %ill\\n\", i, n, idx);\n    for (R_xlen_t j = i+1; j < n; ++j, ++idx) {\n      res[idx] = std::max(dm[idx], std::max(cd[i], cd[j]));\n    }\n  }\n  return res;\n}\n"
  },
  {
    "path": "src/mst.cpp",
    "content": "//----------------------------------------------------------------------\n//              R interface to dbscan using the ANN library\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler, Matt Piekenbrock. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n#include \"mst.h\"\n\n// coreFromDist indexes through a dist vector to retrieve the core distance;\n// this might be useful in some situations. For example, you can get the core distance\n// from only a dist object, without needing the original data. In experimentation,\n// kNNdist ended up being faster than this.\n//\n// // [[Rcpp::export]]\n// NumericVector coreFromDist(const NumericVector dist, const int n, const int minPts){\n//   NumericVector core_dist = NumericVector(n);\n//   NumericVector row_dist = NumericVector(n - 1);\n//   for (R_xlen_t i = 0; i < n; ++i){\n//     for (R_xlen_t j = 0; j < n; ++j){\n//       if (i == j) continue;\n//       R_xlen_t index = LT_POS0(n, j, i);\n//       row_dist.at(j > i ? 
j  - 1 : j) = dist.at(index);\n//     }\n//     std::sort(row_dist.begin(), row_dist.end());\n//     core_dist[i] = row_dist.at(minPts-2); // one for 0-based indexes, one for inclusive minPts condition\n//   }\n//   return(core_dist);\n// }\n\n\n// Prim's Algorithm\n// this implementation for dense dist objects avoids the use of a min-heap.\n// [[Rcpp::export]]\nRcpp::NumericMatrix mst(const NumericVector x_dist, const R_xlen_t n) {\n  Rcpp::NumericMatrix mst = NumericMatrix(n - 1, 3);\n  colnames(mst) = CharacterVector::create(\"from\", \"to\", \"weight\");\n\n  // vector to store the parent of vertex\n  std::vector<int> parent(n);\n  std::vector<double> weight(n, INFINITY);\n  std::vector<bool> visited(n, false);\n\n  // first node is always the root of MST.\n  parent[0] = -1;\n  weight[0] = 0;\n\n  int next_node = 0;\n  double next_weight;\n  int node;\n\n  while (next_node >= 0) {\n    node = next_node;\n    next_node = -1;\n    next_weight = INFINITY;\n\n    visited[node] = true;\n    mst(node-1, 1) = parent[node] +1;\n    mst(node-1, 0) = node + 1;\n    mst(node-1, 2) = weight[node];\n\n    for (int i = 1; i < n; i++) { // 0 is always the first node\n      if (visited[i] || node == i) continue;\n\n      double the_weight = x_dist[LT_POS0(n, node, i)];\n      if (the_weight < weight[i]) {\n        weight[i] = the_weight;\n        parent[i] = node;\n      }\n\n      // find minimum weight node\n      if (weight[i] < next_weight) {\n        next_weight = weight[i];\n        next_node = i;\n      }\n\n    }\n  }\n\n  return(mst);\n}\n\n//\n// // [[Rcpp::export]]\n// IntegerVector order_(NumericVector x) {\n//   if (is_true(any(duplicated(x)))) {\n//     Rf_warning(\"There are duplicates in 'x'; order not guaranteed to match that of R's base::order\");\n//   }\n//   NumericVector sorted = clone(x).sort();\n//   return match(sorted, x);\n// }\n\n\n// Single link hierarchical clustering\n// used by GLOSH.R and hdbscan.R\n\nvoid visit(const IntegerMatrix& merge, 
IntegerVector& order, int i, int j, int& ind) {\n  // base case\n  if (merge(i, j) < 0) {\n    order.at(ind++) = -merge(i, j);\n  }\n  else {\n    visit(merge, order, merge(i, j) - 1, 0, ind);\n    visit(merge, order, merge(i, j) - 1, 1, ind);\n  }\n}\n\nIntegerVector extractOrder(IntegerMatrix merge){\n  IntegerVector order = IntegerVector(merge.nrow()+1);\n  int ind = 0;\n  visit(merge, order, merge.nrow() - 1, 0, ind);\n  visit(merge, order, merge.nrow() - 1, 1, ind);\n  return(order);\n}\n\n// [[Rcpp::export]]\nList hclustMergeOrder(NumericMatrix mst, IntegerVector o){\n  int npoints = mst.nrow() + 1;\n  NumericVector dist = mst(_, 2);\n\n  // Extract order, reorder indices\n  NumericVector left = mst(_, 0), right = mst(_, 1);\n  IntegerVector left_int = as<IntegerVector>(left[o-1]), right_int = as<IntegerVector>(right[o-1]);\n\n  // Labels and resulting merge matrix\n  IntegerVector labs = -seq_len(npoints);\n  IntegerMatrix merge = IntegerMatrix(npoints - 1, 2);\n\n  // Replace singletons as negative and record merge of non-singletons as positive\n  for (int i = 0; i < npoints - 1; ++i) {\n    int lab_left = labs.at(left_int.at(i)-1), lab_right = labs.at(right_int.at(i)-1);\n    merge(i, _) = IntegerVector::create(lab_left, lab_right);\n    for (int c = 0; c < npoints; ++c){\n      if (labs.at(c) == lab_left || labs.at(c) == lab_right){\n        labs.at(c) = i+1;\n      }\n    }\n  }\n  //IntegerVector int_labels = seq_len(npoints);\n  List res = List::create(\n    _[\"merge\"] = merge,\n    _[\"height\"] = dist[o-1],\n    _[\"order\"] = extractOrder(merge),\n    _[\"labels\"] = R_NilValue, //as<StringVector>(int_labels)\n    _[\"method\"] = \"robust single\",\n    _[\"dist.method\"] = \"mutual reachability\"\n  );\n  res.attr(\"class\") = \"hclust\";\n  return res;\n}\n"
  },
  {
    "path": "src/mst.h",
    "content": "#ifndef MST_H\n#define MST_H\n\n#include <Rcpp.h>\n#include \"lt.h\"\n\nusing namespace Rcpp;\n\n// Functions to compute MST and build hclust object out of the resulting tree\nNumericMatrix mst(const NumericVector x_dist, const R_xlen_t n);\n\nList hclustMergeOrder(NumericMatrix mst, IntegerVector o);\n\n#endif\n"
  },
  {
    "path": "src/optics.cpp",
    "content": "//----------------------------------------------------------------------\n//                                OPTICS\n// File:                        R_optics.cpp\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler, Matt Piekenbrock. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n\n#include <Rcpp.h>\n#include \"ANN/ANN.h\"\n#include \"regionQuery.h\"\n\nusing namespace Rcpp;\n\nvoid update(\n    std::pair< std::vector<int>, std::vector<double> > &N,\n    int p,\n    std::vector<int> &seeds,\n    int minPts,\n    std::vector <bool> &visited,\n    std::vector<int> &orderedPoints,\n    std::vector<double> &reachdist,\n    std::vector<double> &coredist,\n    std::vector<int> &pre){\n\n  std::vector<int>::iterator pos_seeds;\n  double newreachdist;\n  int o;\n  double o_d;\n\n  while(!N.first.empty()) {\n    o = N.first.back();\n    o_d = N.second.back();\n    N.first.pop_back();\n    N.second.pop_back();\n\n    if(visited[o]) continue;\n\n    newreachdist = std::max(coredist[p], o_d);\n\n    if(reachdist[o] == INFINITY) {\n      reachdist[o] = newreachdist;\n      seeds.push_back(o);\n    } else {\n      // o was not visited but already has a reachability distance, so it\n      // must already be in seeds!\n      if(newreachdist < reachdist[o]) {\n        reachdist[o] = newreachdist;\n        pre[o] = p;\n      }\n    }\n  }\n}\n\n\n// [[Rcpp::export]]\nList optics_int(NumericMatrix data, double eps, int minPts,\n  int type, int bucketSize, int splitRule, double approx, List frNN) {\n\n  // kd-tree uses squared distances\n  double eps2 = eps*eps;\n\n  ANNpointSet* kdTree = NULL;\n  ANNpointArray dataPts = NULL;\n  int nrow = NA_INTEGER;\n  int ncol = NA_INTEGER;\n\n  if(frNN.size()) {\n    // no kd-tree\n    nrow = (as<List>(frNN[\"id\"])).size();\n  }else{\n\n    // copy 
data for kd-tree\n    nrow = data.nrow();\n    ncol = data.ncol();\n    dataPts = annAllocPts(nrow, ncol);\n    for (int i = 0; i < nrow; i++){\n      for (int j = 0; j < ncol; j++){\n        (dataPts[i])[j] = data(i, j);\n      }\n    }\n    //Rprintf(\"Points copied.\\n\");\n\n    // create kd-tree (1) or linear search structure (2)\n    if (type==1) kdTree = new ANNkd_tree(dataPts, nrow, ncol, bucketSize,\n      (ANNsplitRule) splitRule);\n    else kdTree = new ANNbruteForce(dataPts, nrow, ncol);\n    //Rprintf(\"kd-tree ready. starting OPTICS.\\n\");\n\n  }\n\n\n  // OPTICS\n  std::vector<bool> visited(nrow, false);\n  std::vector<int> orderedPoints; orderedPoints.reserve(nrow);\n  std::vector<int> pre(nrow, NA_INTEGER);\n  std::vector<double> reachdist(nrow, INFINITY); // we use Inf as undefined\n  std::vector<double> coredist(nrow, INFINITY);\n  nn N;\n  std::vector<int> seeds;\n  std::vector<double> ds;\n\n  for (int p=0; p<nrow; p++) {\n    if (!(p % 10)) Rcpp::checkUserInterrupt();\n    //Rprintf(\"processing point %d\\n\", p+1);\n\n    if (visited[p]) continue;\n\n    // ExpandClusterOrder\n    //N = regionQueryDist(p, dataPts, kdTree, eps2, approx);\n    if(frNN.size())   N = std::make_pair(\n      as<std::vector<int> >(as<List>(frNN[\"id\"])[p]),\n      as<std::vector<double> >(as<List>(frNN[\"dist\"])[p]));\n    else              N = regionQueryDist(p, dataPts, kdTree, eps2, approx);\n\n    visited[p] = true;\n\n    // find core distance\n    if(N.second.size() >= (size_t) minPts) {\n      ds = N.second;\n      std::sort(ds.begin(), ds.end()); // sort increasing\n      coredist[p] = ds[minPts-1];\n    }\n    int tmp_p = NA_INTEGER;\n    if (pre[p] == NA_INTEGER) { tmp_p = p; }\n    orderedPoints.push_back(p);\n\n    if (coredist[p] == INFINITY) continue; // core-dist is undefined\n\n    // an updatable priority queue does not exist in the C++ STL, so we use a vector!\n    //seeds.clear();\n\n    // update\n    update(N, p, seeds, minPts, visited, 
orderedPoints,\n      reachdist, coredist, pre);\n\n    int q;\n    while (!seeds.empty()) {\n      // get smallest dist (to emulate priority queue). All should already have\n      // a reachability distance < Inf from update().\n      std::vector<int>::iterator q_it = seeds.begin();\n      for (std::vector<int>::iterator it = seeds.begin();\n        it!=seeds.end(); ++it) {\n        // Note: The second part of the if statement ensures that ties are\n        // always broken consistently (higher ID wins to produce the same\n        // results as the ELKI implementation)!\n        if (reachdist[*it] < reachdist[*q_it] ||\n          (reachdist[*it] == reachdist[*q_it] && *q_it < *it)) q_it = it;\n      }\n      q = *q_it;\n      seeds.erase(q_it);\n\n      //N2 = regionQueryDist(q, dataPts, kdTree, eps2, approx);\n      if(frNN.size())   N = std::make_pair(\n        as<std::vector<int> >(as<List>(frNN[\"id\"])[q]),\n        as<std::vector<double> >(as<List>(frNN[\"dist\"])[q]));\n      else              N = regionQueryDist(q, dataPts, kdTree, eps2, approx);\n\n      visited[q] = true;\n\n      // update core distance\n      if(N.second.size() >= (size_t) minPts) {\n        ds = N.second;\n        std::sort(ds.begin(), ds.end());\n        coredist[q] = ds[minPts - 1];\n      }\n      if (pre[q] == NA_INTEGER) { pre[q] = tmp_p; }\n      orderedPoints.push_back(q);\n\n      if(N.first.size() < (size_t) minPts) continue; //  == q has no core dist.\n\n      // update seeds\n      update(N, q, seeds, minPts, visited, orderedPoints,\n        reachdist, coredist, pre);\n    }\n  }\n\n  // cleanup\n  if (kdTree != NULL) delete kdTree;\n  if (dataPts != NULL)  annDeallocPts(dataPts);\n  // annClose(); is now done globally in the package\n\n  // prepare results (R index starts with 1)\n  List ret;\n  ret[\"order\"] = IntegerVector(orderedPoints.begin(), orderedPoints.end()) + 1;\n  ret[\"reachdist\"] = sqrt(NumericVector(reachdist.begin(), reachdist.end()));\n  ret[\"coredist\"] 
= sqrt(NumericVector(coredist.begin(), coredist.end()));\n  ret[\"predecessor\"] = IntegerVector(pre.begin(), pre.end()) + 1;\n  return ret;\n}\n\n"
  },
  {
    "path": "src/regionQuery.cpp",
    "content": "//----------------------------------------------------------------------\n//                              Region Query\n// File:                        R_regionQuery.cpp\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n\n\n#include \"regionQuery.h\"\n\nusing namespace Rcpp;\n\n// Note: Region query returns self-matches!\n\n// these function takes an id for the points in the k-d tree\nnn regionQueryDist(int id, ANNpointArray dataPts, ANNpointSet* kdTree,\n  double eps2, double approx) {\n\n  // find fixed radius nearest neighbors\n  ANNpoint queryPt = dataPts[id];\n  std::pair< std::vector<int>, std::vector<double> > ret =\n    kdTree->annkFRSearch2(queryPt, eps2, approx);\n  // Note: the points are not sorted by distance!\n\n  return(ret);\n}\n\nstd::vector<int> regionQuery(int id, ANNpointArray dataPts, ANNpointSet* kdTree,\n  double eps2, double approx) {\n\n  // find fixed radius nearest neighbors\n  ANNpoint queryPt = dataPts[id];\n  std::pair< std::vector<int>, std::vector<double> > ret =\n    kdTree->annkFRSearch2(queryPt, eps2, approx);\n  // Note: the points are not sorted by distance!\n\n  return(ret.first);\n}\n\n\n// these function takes an query point not in the tree\nnn regionQueryDist_point(ANNpoint queryPt, ANNpointArray dataPts,\n\tANNpointSet* kdTree, double eps2, double approx) {\n\n  // find fixed radius nearest neighbors\n  std::pair< std::vector<int>, std::vector<double> > ret =\n    kdTree->annkFRSearch2(queryPt, eps2, approx);\n  // Note: the points are not sorted by distance!\n\n  return(ret);\n}\n\nstd::vector<int> regionQuery_point(ANNpoint queryPt, ANNpointArray dataPts,\n\tANNpointSet* kdTree, double eps2, double approx) {\n\n  // find fixed radius nearest neighbors\n  
std::pair< std::vector<int>, std::vector<double> > ret =\n    kdTree->annkFRSearch2(queryPt, eps2, approx);\n  // Note: the points are not sorted by distance!\n\n  return(ret.first);\n}\n\n"
  },
  {
    "path": "src/regionQuery.h",
    "content": "//----------------------------------------------------------------------\n//                              Region Query\n// File:                        R_regionQuery.h\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n#ifndef REGIONQUERY_H\n#define REGIONQUERY_H\n\n#include <Rcpp.h>\n#include \"ANN/ANN.h\"\n\nusing namespace Rcpp;\n\n// pair of ids and dists\ntypedef std::pair< std::vector<int>, std::vector<double> > nn ;\n\n// Note: Region query returns self-matches!\n\n// these function takes an id for the points in the k-d tree\nnn regionQueryDist(int id, ANNpointArray dataPts, ANNpointSet* kdTree,\n  double eps2, double approx = 0.0);\n\nstd::vector<int> regionQuery(int id, ANNpointArray dataPts, ANNpointSet* kdTree,\n  double eps2, double approx = 0.0);\n\n// these function takes an query point not in the tree\nnn regionQueryDist_point(ANNpoint queryPt, ANNpointArray dataPts, ANNpointSet* kdTree,\n  double eps2, double approx = 0.0);\n\nstd::vector<int> regionQuery_point(ANNpoint queryPt, ANNpointArray dataPts, ANNpointSet* kdTree,\n  double eps2, double approx = 0.0);\n\n#endif\n"
  },
  {
    "path": "src/utilities.cpp",
    "content": "//----------------------------------------------------------------------\n//              R interface to dbscan using the ANN library\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n\n\n#include \"utilities.h\"\n\n// extract the lower triangle from a matrix\nIntegerVector lowerTri(IntegerMatrix m) {\n  int n = m.nrow();\n  IntegerVector lower_tri = IntegerVector(n * (n - 1) / 2);\n  for (int i = 0, c = 0; i < n; ++i) {\n    for (int j = i + 1; j < n; ++j) {\n      if (i < j) lower_tri[c++] = m(i, j);\n    }\n  }\n  return lower_tri;\n}\n\nNumericVector combine(const NumericVector& t1, const NumericVector& t2) {\n  std::size_t n = t1.size() + t2.size();\n  NumericVector output = Rcpp::no_init(n);\n  std::copy(t1.begin(), t1.end(), output.begin());\n  std::copy(t2.begin(), t2.end(), output.begin() + t1.size());\n  return output;\n}\n\nIntegerVector combine(const IntegerVector& t1, const IntegerVector& t2) {\n  std::size_t n = t1.size() + t2.size();\n  IntegerVector output = Rcpp::no_init(n);\n  std::copy(t1.begin(), t1.end(), output.begin());\n  std::copy(t2.begin(), t2.end(), output.begin() + t1.size());\n  return output;\n}\n\n// Faster version of above combine function, assuming you can precompute and store\n// the containers needing to be concatenated\nIntegerVector concat_int(List const& container) {\n  int total_length = 0;\n  for (List::const_iterator it = container.begin(); it != container.end(); ++it) {\n    total_length += as<IntegerVector>(*it).size();\n  }\n  int pos = 0;\n  IntegerVector output = Rcpp::no_init(total_length);\n  for (List::const_iterator it = container.begin(); it != container.end(); ++it) {\n    IntegerVector vec = as<IntegerVector>(*it);\n    std::copy(vec.begin(), 
vec.end(), output.begin() + pos);\n    pos += vec.size();\n  }\n  return output;\n}\n\n"
  },
  {
    "path": "src/utilities.h",
    "content": "//----------------------------------------------------------------------\n//              R interface to dbscan using the ANN library\n//----------------------------------------------------------------------\n// Copyright (c) 2015 Michael Hahsler. All Rights Reserved.\n//\n// This software is provided under the provisions of the\n// GNU General Public License (GPL) Version 3\n// (see: http://www.gnu.org/licenses/gpl-3.0.en.html)\n\n\n#ifndef UTILITIES_H\n#define UTILITIES_H\n\n#include <Rcpp.h>\n\nusing namespace Rcpp;\n\n// contains used in hdbscan.cpp\ntemplate <typename T, typename C>\nbool contains(const T& container, const C& key) {\n  if (std::find(container.begin(), container.end(), key) != container.end()) {\n    return true;\n  }\n  return false;\n}\n\n// extract the lower triangle from a matrix\n// [[Rcpp::export]]\nIntegerVector lowerTri(IntegerMatrix m);\n\n// internal c (combine) for Rcpp vectors\nNumericVector combine(const NumericVector& t1, const NumericVector& t2);\nIntegerVector combine(const IntegerVector& t1, const IntegerVector& t2);\n\n// Faster version of above combine function, assuming you can precompute and store\n// the containers needing to be concatenated\nIntegerVector concat_int(List const& container);\n\n#endif\n"
  },
  {
    "path": "tests/testthat/test-dbcv.R",
    "content": "test_that(\"dbcv\", {\n  # From: https://github.com/FelSiq/DBCV\n  #\n  # Dataset\t      MATLAB\n  # dataset_1.txt\t0.8576\n  # dataset_2.txt\t0.8103\n  # dataset_3.txt\t0.6319\n  # dataset_4.txt\t0.8688\n  #\n  # Original MATLAB implementation is at:\n  #     https://github.com/pajaskowiak/dbcv/tree/main/data\n\n  data(Dataset_1)\n  x <- Dataset_1[, c(\"x\", \"y\")]\n  class <- Dataset_1$class\n  #clplot(x, class)\n  (db <- dbcv(x, class, metric = \"sqeuclidean\"))\n  expect_equal(round(db$score, 2), 0.86)\n\n  # detailed results from the Python implementation\n  #dsc [0.00457826 0.00457826 0.0183068  0.0183068 ]\n  #dspc [0.85627898 0.85627898 0.85627898 0.85627898]\n  #vcs [0.99465331 0.99465331 0.97862052 0.97862052]\n  #0.8575741400490697\n\n  data(Dataset_2)\n  x <- Dataset_2[, c(\"x\", \"y\")]\n  class <- Dataset_2$class\n  #clplot(x, class)\n  (db <- dbcv(x, class, metric = \"sqeuclidean\"))\n  expect_equal(round(db$score, 2), 0.81)\n\n  #dsc [19.06151967 15.6082 83.71522964 68.969]\n  #dspc [860.2538 501.4376 501.4376 860.2538]\n  #vcs [0.97784198 0.9688731  0.83304956 0.91982715]\n  #0.8103343589093096\n\n  # more data sets\n\n  # data(Dataset_3)\n  # x <- Dataset_3[, c(\"x\", \"y\")]\n  # class <- Dataset_3$class\n  # #clplot(x, class)\n  # (db <- dbcv(x, class, metric = \"sqeuclidean\"))\n  #\n  # data(Dataset_4)\n  # x <- Dataset_4[, c(\"x\", \"y\")]\n  # class <- Dataset_4$class\n  # #clplot(x, class)\n  # (db <- dbcv(x, class, metric = \"sqeuclidean\"))\n\n})\n"
  },
  {
    "path": "tests/testthat/test-dbscan.R",
    "content": "test_that(\"dbscan works\", {\n  data(\"iris\")\n  ## Species is a factor\n  expect_error(dbscan(iris))\n\n  iris <- as.matrix(iris[, 1:4])\n\n  res <- dbscan(iris, eps = .4, minPts = 4)\n\n  expect_length(res$cluster, nrow(iris))\n\n  ## expected result of table(res$cluster) is:\n  expect_identical(table(res$cluster, dnn = NULL),\n      as.table(c(\"0\" = 25L, \"1\" = 47L, \"2\" = 38L, \"3\" = 36L, \"4\" = 4L)))\n\n  ## compare with dbscan from package fpc (only if installed)\n  if (requireNamespace(\"fpc\", quietly = TRUE)) {\n      res2 <- fpc::dbscan(iris, eps = .4, MinPts = 4)\n\n      expect_equal(res$cluster, res2$cluster)\n\n      ## test is.corepoint\n      all(res2$isseed == is.corepoint(iris, eps = .4, minPts = 4))\n  }\n\n  ## compare with precomputed frNN\n  fr <- frNN(iris, eps = .4)\n  res9 <- dbscan(fr, minPts = 4)\n  expect_equal(res, res9)\n\n  ## compare on example data from fpc\n  set.seed(665544)\n  n <- 600\n  x <- cbind(\n      x = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n      y = runif(10, 0, 10) + rnorm(n, sd = 0.2)\n      )\n\n  res <- dbscan(x, eps = .2, minPts = 4)\n  expect_length(res$cluster, nrow(x))\n\n  ## compare with dist-based versions\n  res_d <- dbscan(dist(x), eps = .2, minPts = 4)\n  expect_identical(res, res_d)\n  res_d2 <- dbscan(x, eps = .2, minPts = 4, search = \"dist\")\n  expect_identical(res, res_d2)\n\n  ## compare with dbscan from package fpc (only if installed)\n  if (requireNamespace(\"fpc\", quietly = TRUE)) {\n    res2 <- fpc::dbscan(x, eps = .2, MinPts = 4)\n    expect_equal(res$cluster, res2$cluster)\n  }\n\n  ## missing values, but distances are fine\n  x_na <- x\n  x_na[c(1, 3, 5), 1] <- NA\n  expect_error(dbscan(x_na, eps = .2, minPts = 4), regexp = \"NA\")\n  res_d1 <- dbscan(x_na, eps = .2, minPts = 4, search = \"dist\")\n  res_d2 <- dbscan(dist(x_na), eps = .2, minPts = 4)\n  expect_identical(res_d1, res_d2)\n\n  ## introduce NAs into dist\n  x_na[c(1,3,5), 2] <- NA\n  
expect_error(dbscan(x_na, eps = .2, minPts = 4), regexp = \"NA\")\n  expect_error(dbscan(x_na, eps = .2, minPts = 4, search = \"dist\"),\n    regexp = \"NA\")\n  expect_error(dbscan(dist(x_na), eps = .2, minPts = 4), regexp = \"NA\")\n\n\n  ## call with no rows or no columns\n  expect_error(dbscan(matrix(0, nrow = 0, ncol = 2), eps = .2, minPts = 4))\n  expect_error(dbscan(matrix(0, nrow = 2, ncol = 0), eps = .2, minPts = 4))\n  dbscan(matrix(0, nrow = 1, ncol = 1), eps = .2, minPts = 4)\n})\n"
  },
  {
    "path": "tests/testthat/test-fosc.R",
    "content": "test_that(\"FOSC\", {\n  data(\"iris\")\n\n  ## FOSC expects an hclust object\n  expect_error(extractFOSC(iris))\n\n  x <- iris[, 1:4]\n  x_sl <- hclust(dist(x), \"single\")\n\n  ## Should return augmented hclust object and cluster assignments\n  expect_length(extractFOSC(x_sl), 2)\n  res <- extractFOSC(x_sl)\n  expect_identical(res$hc$method, \"single (w/ stability-based extraction)\")\n\n  ## Constraint-checking\n  expect_error(extractFOSC(x_sl, constraints = c(\"1\" = 2)))\n\n  ## Matrix inputs must be nxn\n  expect_error(extractFOSC(x_sl, constraints = matrix(c(1, 2), nrow=1)))\n\n  ## Matrix or vector constraints must be in c(-1, 0, 1)\n  expect_error(extractFOSC(x_sl, constraints = matrix(-2, nrow=nrow(x), ncol=nrow(x))))\n\n  ## Valid constraints\n  expect_warning(extractFOSC(x_sl, constraints = matrix(1, nrow=nrow(x), ncol=nrow(x))))\n  expect_silent(extractFOSC(x_sl, constraints = list(\"1\" = 2, \"2\" = 1)))\n  expect_silent(extractFOSC(x_sl, constraints = ifelse(dist(x) > 2, -1, 1)))\n\n  ## Constraints should be symmetric, but symmetry test is only done if specified. 
Asymmetric\n  ## constraints through warning, but proceeds with manual warning\n  expect_warning(extractFOSC(x_sl, constraints = list(\"1\" = 2), validate_constraints = TRUE))\n\n  ## Make sure that's whats returned\n  res <- extractFOSC(x_sl)\n  expect_type(res$cluster, \"integer\")\n  expect_s3_class(res$hc, \"hclust\")\n\n  ## Test 'Optimal' Clustering using only positive constraints\n  set <- which(iris$Species == \"setosa\")\n  ver <- which(iris$Species == \"versicolor\")\n  vir <- which(iris$Species == \"virginica\")\n  il_constraints <- structure(list(set[-1], ver[-1], vir[-1]), names = as.character(c(set[1], ver[1], vir[1])))\n  res <- extractFOSC(x_sl, il_constraints)\n\n  ## Positive-only constraints should link to best unsupervised solution\n  expect_identical(table(res$cluster, dnn = NULL), as.table(c(`1` = 50L, `2` = 100L)))\n  expect_identical(res$hc$method, \"single (w/ constraint-based extraction)\")\n\n  ## Test negative constraints\n  set2 <- c(il_constraints[[as.character(set[1])]], -unlist(il_constraints[as.character(c(ver[1], vir[1]))], use.names = FALSE))\n  ver2 <- c(il_constraints[[as.character(ver[1])]], -unlist(il_constraints[as.character(c(set[1], vir[1]))], use.names = FALSE))\n  vir2 <- c(il_constraints[[as.character(vir[1])]], -unlist(il_constraints[as.character(c(set[1], ver[1]))], use.names = FALSE))\n  il_constraints2 <- structure(list(set2, ver2, vir2), names = as.character(c(set[1], ver[1], vir[1])))\n  res2 <- extractFOSC(x_sl, constraints = il_constraints2)\n\n  ## Positive and Negative should produce a different solution\n  expect_false(all(res$cluster == res2$cluster))\n  expect_identical(res2$hc$method, \"single (w/ constraint-based extraction)\")\n\n  ## Test minPts parameters\n  expect_error(extractFOSC(x_sl, constraints = il_constraints2, minPts = 1))\n  expect_silent(extractFOSC(x_sl, constraints = il_constraints2, minPts = 5))\n\n  ## Test alpha parameter\n  expect_silent(extractFOSC(x_sl, constraints = il_constraints2, 
alpha = 0.5))\n  expect_error(extractFOSC(x_sl, constraints = il_constraints2, alpha = 1.5))\n  res3 <- extractFOSC(x_sl, constraints = il_constraints2, alpha = 0.5)\n  expect_identical(res3$hc$method, \"single (w/ mixed-objective extraction)\")\n\n  ## Test unstable pruning\n  expect_silent(extractFOSC(x_sl, constraints = il_constraints2, prune_unstable = TRUE))\n})\n"
  },
  {
    "path": "tests/testthat/test-frNN.R",
    "content": "test_that(\"frNN\", {\n  set.seed(665544)\n  n <- 1000\n  x <- cbind(\n    x = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n    y = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n    z = runif(10, 0, 10) + rnorm(n, sd = 0.2)\n  )\n\n  ## no duplicates first!\n  #x <- x[!duplicated(x),]\n\n  rownames(x) <- paste0(\"Object_\", seq_len(nrow(x)))\n\n  eps <- .5\n  nn <- frNN(x, eps = eps, sort = TRUE)\n\n  ## check dimensions\n  expect_identical(nn$eps, eps)\n  expect_length(nn$dist, nrow(x))\n  expect_length(nn$id, nrow(x))\n\n  expect_identical(lengths(nn$dist), lengths(nn$id))\n\n  ## check visually\n  #plot(x)\n  #points(x[nn$id[[1]],], col=\"red\", lwd=5)\n  #points(x[nn$id[[2]],], col=\"green\", lwd=5)\n  #points(x[1:2,, drop = FALSE], col=\"blue\", pch=\"+\", cex=2)\n\n  ## compare with manually found NNs\n  nn_d <- frNN(dist(x), eps = eps, sort = TRUE)\n  expect_equal(nn, nn_d)\n\n  nn_d2 <- frNN(x, eps = eps, sort = TRUE, search = \"dist\")\n  expect_equal(nn, nn_d2)\n\n  ## without sorting\n  nn2 <- frNN(x, eps = eps, sort = FALSE)\n  expect_identical(lapply(nn$id, sort),\n    lapply(nn2$id, sort))\n\n  ## search options\n  nn_linear <- frNN(x, eps=eps, search = \"linear\")\n  expect_equal(nn, nn_linear)\n\n  ## split options\n  for (so in c(\"STD\", \"MIDPT\", \"FAIR\", \"SL_FAIR\")) {\n    nn3 <- frNN(x, eps=eps, splitRule = so)\n    expect_equal(nn, nn3)\n  }\n\n  ## bucket size\n  for (bs in c(5, 10, 15, 100)) {\n    nn3 <- frNN(x, eps=eps, bucketSize = bs)\n    expect_equal(nn, nn3)\n  }\n\n\n  ## add 100 copied points to check if self match filtering works\n  x <- rbind(x, x[sample(seq_len(nrow(x)), 100),])\n  rownames(x) <- paste0(\"Object_\", seq_len(nrow(x)))\n\n  eps <- .5\n  nn <- frNN(x, eps = eps, sort = TRUE)\n\n  ## compare with manually found NNs\n  nn_d <- frNN(x, eps = eps, sort = TRUE, search = \"dist\")\n\n  expect_equal(nn, nn_d)\n\n  ## sort and frNN to reduce eps\n  nn5 <- frNN(x, eps = .5, sort = FALSE)\n  
expect_false(nn5$sort)\n\n  nn5s <- sort(nn5)\n  expect_true(nn5s$sort)\n  expect_true(all(vapply(nn5s$dist, function(x) !is.unsorted(x), logical(1L))))\n\n  expect_error(frNN(nn5, eps = 1))\n  nn2 <- frNN(nn5, eps = .2)\n  expect_true(all(vapply(nn2$dist, function(x) all(x <= 0.2), logical(1L))))\n\n\n  ## test with simple data\n  x <- data.frame(x = 1:10, row.names = LETTERS[1:10], check.names = FALSE)\n  nn <- frNN(x, eps = 2)\n  expect_identical(nn$id[[1]], 2:3)\n  expect_identical(nn$id[[5]], c(4L, 6L, 3L, 7L))\n  expect_identical(nn$id[[10]], 9:8)\n\n  ## test frNN with query\n  x <- data.frame(x = 1:10, row.names = LETTERS[1:10], check.names = FALSE)\n  nn <- frNN(x[1:8, , drop=FALSE], x[9:10, , drop = FALSE], eps = 2)\n\n  expect_length(nn$id, 2L)\n  expect_identical(nn$id[[1]], 8:7)\n  expect_identical(nn$id[[2]], 8L)\n\n  expect_error(frNN(dist(x[1:8, , drop=FALSE]), x[9:10, , drop = FALSE], eps = 2))\n})\n"
  },
  {
    "path": "tests/testthat/test-hdbscan.R",
    "content": "test_that(\"HDBSCAN\", {\n  data(\"iris\")\n\n  ## minPts not given\n  expect_error(hdbscan(iris))\n\n  ## Expects numerical data; species is factor\n  expect_error(dbscan(iris, minPts = 4))\n\n  iris <- as.matrix(iris[,1:4])\n\n  res <- hdbscan(iris, minPts = 4)\n  expect_length(res$cluster, nrow(iris))\n\n  ## expected result of table(res$cluster) is:\n  expect_identical(table(res$cluster, dnn = NULL),\n                    as.table(c(\"1\" = 100L, \"2\" = 50L)))\n\n  ## compare on moons data\n  data(\"moons\")\n  res <- hdbscan(moons, minPts = 5)\n  expect_length(res$cluster, nrow(moons))\n\n  ## Check hierarchy matches dbscan* at every value\n  check <- rep(FALSE, nrow(moons)-1)\n  core_dist <- kNNdist(moons, k=5-1)\n\n  ## cutree doesn't distinguish noise as 0, so we make a new method to do it manually\n  cut_tree <- function(hcl, eps, core_dist){\n    cuts <- unname(cutree(hcl, h=eps))\n    cuts[which(core_dist > eps)] <- 0 # Use core distance to distinguish noise\n    cuts\n  }\n\n  eps_values <- sort(res$hc$height, decreasing = TRUE)+.Machine$double.eps ## Machine eps for consistency between cuts\n  for (i in seq_along(eps_values)) {\n    cut_cl <- cut_tree(res$hc, eps_values[i], core_dist)\n    dbscan_cl <- dbscan(moons, eps = eps_values[i], minPts = 5, borderPoints = FALSE) # DBSCAN* doesn't include border points\n\n    ## Use run length encoding as an ID-independent way to check ordering\n    check[i] <- (all.equal(rle(cut_cl)$lengths, rle(dbscan_cl$cluster)$lengths) == \"TRUE\")\n  }\n\n  expect_true(all(check))\n\n  ## Expect generating extra trees doesn't fail\n  res <- hdbscan(moons, minPts = 5, gen_hdbscan_tree = TRUE, gen_simplified_tree = TRUE)\n  expect_s3_class(res, \"hdbscan\")\n\n  ## Expect hdbscan tree matches stats:::as.dendrogram version of hclust object\n  hc_dend <- as.dendrogram(res$hc)\n  expect_s3_class(hc_dend, \"dendrogram\")\n  expect_identical(hc_dend, res$hdbscan_tree)\n\n  ## Expect hdbscan works with 
non-euclidean distances\n  dist_moons <- dist(moons, method = \"canberra\")\n  res <- hdbscan(dist_moons, minPts = 5)\n  expect_s3_class(res, \"hdbscan\")\n})\n\ntest_that(\"mrdist\", {\n  expect_identical(mrdist(cbind(1:10), 2),  mrdist(dist(cbind(1:10)), 2))\n  expect_identical(mrdist(cbind(1:11), 3), mrdist(dist(cbind(1:11)), 3))\n})\n\ntest_that(\"HDBSCAN(e)\", {\n  X <- data.frame(\n   x = c(\n    0.08, 0.46, 0.46, 2.95, 3.50, 1.49, 6.89, 6.87, 0.21, 0.15,\n    0.15, 0.39, 0.80, 0.80, 0.37, 3.63, 0.35, 0.30, 0.64, 0.59, 1.20, 1.22,\n    1.42, 0.95, 2.70, 6.36, 6.36, 6.36, 6.60, 0.04, 0.71, 0.57, 0.24, 0.24,\n    0.04, 0.04, 1.35, 0.82, 1.04, 0.62, 0.26, 5.98, 1.67, 1.67, 0.48, 0.15,\n    6.67, 6.67, 1.20, 0.21, 3.99, 0.12, 0.19, 0.15, 6.96, 0.26, 0.08, 0.30,\n    1.04, 1.04, 1.04, 0.62, 0.04, 0.04, 0.04, 0.82, 0.82, 1.29, 1.35, 0.46,\n    0.46, 0.04, 0.04, 5.98, 5.98, 6.87, 0.37, 6.47, 6.47, 6.47, 6.67, 0.30,\n    1.49, 3.21, 3.21, 0.75, 0.75, 0.46, 0.46, 0.46, 0.46, 3.63, 0.39, 3.65,\n    4.09, 4.01, 3.36, 1.43, 3.28, 5.94, 6.35, 6.87, 5.60, 5.99, 0.12, 0.00,\n    0.32, 0.39, 0.00, 1.63, 1.36, 5.67, 5.60, 5.79, 1.10, 2.99, 0.39, 0.18\n    ),\n   y = c(\n    7.41, 8.01, 8.01, 5.44, 7.11, 7.13, 1.83, 1.83, 8.22, 8.08,\n    8.08, 7.20, 7.83, 7.83, 8.29, 5.99, 8.32, 8.22, 7.38, 7.69, 8.22, 7.31,\n    8.25, 8.39, 6.34, 0.16, 0.16, 0.16, 1.66, 7.55, 7.90, 8.18, 8.32, 8.32,\n    7.97, 7.97, 8.15, 8.43, 7.83, 8.32, 8.29, 1.03, 7.27, 7.27, 8.08, 7.27,\n    0.79, 0.79, 8.22, 7.73, 6.62, 7.62, 8.39, 8.36, 1.73, 8.29, 8.04, 8.22,\n    7.83, 7.83, 7.83, 8.32, 8.11, 7.69, 7.55, 7.20, 7.20, 8.01, 8.15, 7.55,\n    7.55, 7.97, 7.97, 1.03, 1.03, 1.24, 7.20, 0.47, 0.47, 0.47, 0.79, 8.22,\n    7.13, 6.48, 6.48, 7.10, 7.10, 8.01, 8.01, 8.01, 8.01, 5.99, 8.04, 5.22,\n    5.82, 5.14, 4.81, 7.62, 5.73, 0.55, 1.31, 0.05, 0.95, 1.59, 7.99, 7.48,\n    8.38, 7.12, 2.01, 1.40, 0.00, 9.69, 9.47, 9.25, 2.63, 6.89, 0.56, 3.11\n   )\n  )\n\n  hdbe <- hdbscan(X, minPts = 3, 
cluster_selection_epsilon = 1)\n  #plot(X, col = hdbe$cluster + 1L, main = \"HDBSCAN(e)\")\n\n  expect_equal(ncluster(hdbe), 5L)\n  expect_equal(nnoise(hdbe), 0L)\n})\n\n"
  },
  {
    "path": "tests/testthat/test-kNN.R",
    "content": "test_that(\"kNN\", {\n  set.seed(665544)\n  n <- 1000\n  x <- cbind(\n    x = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n    y = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n    z = runif(10, 0, 10) + rnorm(n, sd = 0.2)\n  )\n\n  ## no duplicates first! All distances should be unique\n  x <- x[!duplicated(x),]\n\n  rownames(x) <- paste0(\"Object_\", seq_len(nrow(x)))\n\n  k <- 5L\n  nn <- kNN(x, k=k, sort = TRUE)\n\n  ## check dimensions\n  expect_identical(nn$k, k)\n  expect_identical(dim(nn$dist), c(nrow(x), k))\n  expect_identical(dim(nn$id), c(nrow(x), k))\n\n  ## check visually\n  #plot(x)\n  #points(x[nn$id[1,],], col=\"red\", lwd=5)\n  #points(x[nn$id[2,],], col=\"green\", lwd=5)\n\n  ## compare with kNN found using distances\n  nn_d <- kNN(dist(x), k, sort = TRUE)\n\n  ## check visually\n  #plot(x)\n  #points(x[nn_d$id[1,],], col=\"red\", lwd=5)\n  #points(x[nn_d$id[2,],], col=\"green\", lwd=5)\n\n  ### will agree since we use sorting\n  expect_equal(nn, nn_d)\n\n  ## calculate dist internally\n  nn_d2 <- kNN(x, k, search = \"dist\", sort = TRUE)\n  expect_equal(nn, nn_d2)\n\n  ## without sorting\n  nn2 <- kNN(x, k=k, sort = FALSE)\n  expect_equal(t(apply(nn$id, MARGIN = 1, sort)),\n    t(apply(nn2$id, MARGIN = 1, sort)))\n\n  ## search options\n  nn_linear <- kNN(x, k=k, search = \"linear\", sort = TRUE)\n  expect_equal(nn, nn_linear)\n\n  ## split options\n  for(so in c(\"STD\", \"MIDPT\", \"FAIR\", \"SL_FAIR\")) {\n    nn3 <- kNN(x, k=k, splitRule = so, sort = TRUE)\n    expect_equal(nn, nn3)\n  }\n\n  ## bucket size\n  for (bs in c(5, 10, 15, 100)) {\n    nn3 <- kNN(x, k=k, bucketSize = bs, sort = TRUE)\n    expect_equal(nn, nn3)\n  }\n\n  ## the order is not stable with matching distances which means that the\n  ## k-NN are not stable. 
We add 100 copied points to check if self match\n  ## filtering and sort works\n  x <- rbind(x, x[sample(seq_len(nrow(x)), 100),])\n  rownames(x) <- paste0(\"Object_\", seq_len(nrow(x)))\n\n  k <- 5L\n  nn <- kNN(x, k=k, sort = TRUE)\n\n  ## compare with manually found NNs\n  nn_d <- kNN(x, k=k, search = \"dist\", sort = TRUE)\n\n  expect_equal(nn$dist, nn_d$dist)\n  ## This is expected to fail: because the ids are not stable for matching distances\n  ## expect_equal(nn$id, nn_d$id)\n  ## FIXME: write some code to check this!\n\n\n  ## missing values, but distances are fine\n  x_na <- x\n  x_na[c(1, 3, 5), 1] <- NA\n  expect_error(kNN(x_na, k = 3), regexp = \"NA\")\n  res_d1 <- kNN(x_na, k = 3, search = \"dist\")\n  res_d2 <- kNN(dist(x_na), k = 3)\n  expect_equal(res_d1, res_d2)\n\n  ## introduce NAs into dist\n  x_na[c(1, 3, 5),] <- NA\n  expect_error(kNN(x_na, k = 3), regexp = \"NA\")\n  expect_error(kNN(x_na, k = 3, search = \"dist\"), regexp = \"NA\")\n  expect_error(kNN(dist(x_na), k = 3), regexp = \"NA\")\n\n  ## inf\n  x_inf <- x\n  x_inf[c(1, 3, 5), 2] <- Inf\n  kNN(x_inf, k = 3)\n  kNN(x_inf, k = 3, search = \"dist\")\n  kNN(dist(x_inf), k = 3)\n\n\n  ## sort and kNN to reduce k\n  nn10 <- kNN(x, k = 10)\n  #nn10 <- kNN(x, k = 10, sort = FALSE)\n  ## knn now returns sorted lists\n  #expect_equal(nn10$sort, FALSE)\n  expect_error(kNN(nn10, k = 11))\n  nn5 <- kNN(nn10, k = 5)\n  expect_true(nn5$sort)\n  expect_identical(ncol(nn5$id), 5L)\n  expect_identical(ncol(nn5$dist), 5L)\n\n  ## test with simple data\n  x <- data.frame(x = 1:10, row.names = LETTERS[1:10], check.names = FALSE)\n  nn <- kNN(x, k = 5)\n  expect_identical(unname(nn$id[1, ]), 2:6)\n  expect_identical(unname(nn$id[5, ]), c(4L, 6L, 3L, 7L, 2L))\n  expect_identical(unname(nn$id[10, ]), 9:5)\n\n  ## test kNN with query\n  x <- data.frame(x = 1:10, row.names = LETTERS[1:10], check.names = FALSE)\n  nn <- kNN(x[1:8, , drop=FALSE], x[9:10, , drop = FALSE], k = 5)\n  expect_identical(nrow(nn$id), 
2L)\n  expect_identical(unname(nn$id[1, ]), 8:4)\n  expect_identical(unname(nn$id[2, ]), 8:4)\n\n  expect_error(kNN(dist(x[1:8, , drop=FALSE]), x[9:10, , drop = FALSE], k = 5))\n})\n"
  },
  {
    "path": "tests/testthat/test-kNNdist.R",
    "content": "test_that(\"kNNdist\", {\n  set.seed(665544)\n  n <- 1000\n  x <- cbind(\n    x = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n    y = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n    z = runif(10, 0, 10) + rnorm(n, sd = 0.2)\n  )\n\n  d <- kNNdist(x, k = 5)\n  expect_length(d, n)\n\n  d <- kNNdist(x, k = 5, all = TRUE)\n  expect_equal(dim(d), c(n, 5))\n\n  # does the plot work?\n  #kNNdistplot(x, 5)\n})\n"
  },
  {
    "path": "tests/testthat/test-lof.R",
    "content": "test_that(\"LOF\", {\n  set.seed(665544)\n  n <- 600\n  x <- cbind(\n    x=runif(10, 0, 5) + rnorm(n, sd=0.4),\n    y=runif(10, 0, 5) + rnorm(n, sd=0.4)\n  )\n\n  ### calculate LOF score\n  system.time(lof_kd <- lof(x, minPts = 5))\n  expect_length(lof_kd, nrow(x))\n\n  system.time(lof_d <- lof(dist(x), minPts = 5))\n  #expect_equal(lof_kd, lof_d)\n\n  ## compare with lofactor from DMwR (k = minPts - 1)\n  #if(requireNamespace(\"DMwR\", quietly = TRUE)) {\n  #  system.time(lof_DMwr <- DMwR::lofactor(x, k = 4))\n  # DMwR is now retired so we have the correct values here\n  #  dput(round(lof_DMwr, 7))\n\n  lof_DMwr <- c(1.0386817, 1.0725475, 1.1440822, 0.9448794, 1.1387918, 2.285202,\n    1.0976862, 1.071325, 0.975922, 0.9549399, 1.0918247, 0.9868736,\n    1.123618, 2.2802129, 0.992019, 1.046492, 1.0729966, 1.6925297,\n    1.0032157, 0.9691323, 1.0561082, 0.9493052, 1.0209116, 0.8897277,\n    1.008681, 1.0711202, 1.053845, 0.9734241, 1.1147289, 0.9351913,\n    1.8674401, 1.097982, 0.9782695, 1.0613472, 0.9988367, 1.4571062,\n    0.9927837, 0.9443716, 1.0014804, 1.0322888, 0.9264795, 0.9509729,\n    0.9757305, 1.0647956, 1.0184634, 1.428911, 1.0166712, 0.9692196,\n    1.0821285, 1.1282936, 0.9874694, 1.1079347, 0.9906487, 0.9972962,\n    1.0594364, 0.9160978, 1.2393862, 1.3578505, 0.930095, 1.0489962,\n    1.1401282, 1.1808566, 1.0380796, 2.0657157, 0.9837392, 0.9712287,\n    1.4754447, 1.3154291, 1.0589814, 1.0486608, 1.0986178, 1.1375705,\n    1.0147473, 1.7615974, 0.9724805, 0.9719851, 0.982247, 1.0591561,\n    1.0862436, 1.0710844, 1.11301, 0.9719126, 1.0455651, 0.9426225,\n    1.0934785, 1.1223749, 1.1734774, 1.0037237, 0.8844162, 0.9131705,\n    1.0728687, 1.0446755, 1.108353, 0.9492501, 1.1704727, 1.1914106,\n    0.9453222, 1.1724001, 1.1827576, 0.9617445, 1.1519398, 1.1480532,\n    1.0268692, 1.0580088, 1.392551, 1.2571354, 0.9703385, 1.5030845,\n    1.0201881, 1.0061842, 0.9919245, 1.2771078, 1.0473407, 1.263149,\n    0.9587146, 1.0235194, 
0.988292, 0.9302287, 1.0593181, 0.978052,\n    1.1026427, 1.0615622, 1.0299466, 1.2200394, 1.0720229, 1.1343499,\n    1.0180289, 1.4500258, 0.9886391, 0.969401, 1.4881191, 1.0775279,\n    1.0380796, 1.2315327, 1.0307432, 0.9615078, 1.2379828, 1.1181202,\n    1.1049541, 1.0786524, 0.9197587, 1.0642223, 0.8073981, 0.9251505,\n    0.9971381, 1.5188771, 1.0679818, 0.9943418, 3.5343815, 0.9559526,\n    1.2129819, 1.0067672, 1.0175442, 1.0875222, 1.0403766, 2.0998678,\n    0.9870077, 1.327542, 1.0081014, 0.9608997, 0.9144311, 1.0016777,\n    1.0465469, 1.5140562, 1.5560253, 1.1125134, 1.0310594, 1.0245521,\n    1.7247798, 1.0586581, 1.0720232, 1.0594747, 0.956174, 1.0540952,\n    1.0889792, 1.050014, 1.0216425, 0.9509729, 0.9740812, 1.3065791,\n    1.0004211, 1.0127932, 0.9796374, 1.0552426, 1.0302613, 0.9524017,\n    0.9554341, 0.9870971, 0.9857225, 0.9699046, 1.1122461, 1.031985,\n    1.0852427, 1.0585017, 0.9733342, 0.9610561, 0.9086219, 1.1570747,\n    1.069232, 0.9747538, 1.0084392, 1.1063077, 0.9573789, 1.3672764,\n    1.3631144, 0.966934, 1.0992401, 0.9943351, 0.9850424, 1.0019623,\n    1.5344698, 0.9592966, 0.9645661, 1.0076189, 1.0056102, 1.0066028,\n    1.0148453, 1.0096178, 1.0963682, 1.0345623, 1.0121158, 1.0816582,\n    1.0068326, 0.9697611, 0.9322887, 1.1414811, 1.0266256, 0.9143263,\n    0.9602328, 1.1100272, 1.0885216, 1.0795966, 1.1165265, 1.1712866,\n    1.1478981, 0.9653769, 1.0419996, 1.0245088, 1.0619264, 1.1729143,\n    0.9756447, 0.9935498, 2.8554242, 1.0067806, 1.1311249, 1.36881,\n    1.8759446, 1.2136268, 1.2112035, 0.9891436, 1.1089825, 0.9937973,\n    0.9730926, 1.0287588, 1.1275406, 1.5135599, 1.0322888, 1.0746697,\n    1.0181387, 1.2715467, 0.9196022, 1.1063077, 1.0666201, 1.121323,\n    1.0850662, 0.9150997, 1.428667, 0.9488952, 1.1007532, 1.2246563,\n    0.9933742, 1.1263888, 0.985569, 1.0275125, 1.01964, 1.0449989,\n    0.9767297, 0.9704362, 0.9897834, 1.0246062, 1.0947694, 1.2170169,\n    1.1323645, 1.2366689, 0.9516316, 1.2727108, 
1.0480459, 1.0338822,\n    1.1418884, 1.0733666, 1.0230934, 0.9149864, 0.9480381, 1.0388333,\n    1.1266161, 0.9615078, 1.1221968, 0.9750836, 0.978132, 1.1412698,\n    0.9716957, 1.0675609, 1.2594503, 1.0633289, 1.1427586, 1.0709402,\n    1.0393154, 1.3284915, 0.9598698, 1.1755224, 1.2392279, 1.0625965,\n    1.133851, 1.1631179, 1.4499444, 1.20366, 0.9606104, 0.9921343,\n    0.8938437, 1.1738624, 1.0131062, 1.0027174, 0.9461069, 0.9717685,\n    1.0645426, 1.046492, 1.1502628, 0.999057, 0.9758641, 1.1654844,\n    0.9964193, 1.1066967, 1.1900241, 1.0727625, 1.1304909, 1.0892065,\n    0.963785, 1.2942228, 1.0619264, 1.2733898, 0.9840458, 1.109005,\n    1.0437884, 1.0298398, 0.9513221, 1.0823791, 1.0056102, 0.8875967,\n    1.1385844, 0.8947159, 1.229025, 2.0563263, 0.9387754, 0.9683886,\n    1.2059569, 0.9923111, 1.4218394, 1.043666, 0.9963639, 1.0610107,\n    1.0049425, 0.9844978, 1.0292947, 0.9768325, 1.0528094, 1.0155664,\n    1.1586381, 1.0432875, 1.0382743, 0.9793557, 1.1206471, 0.985182,\n    1.1138052, 1.3397872, 1.0062782, 0.9474922, 1.2033802, 1.0889565,\n    0.9172793, 0.9749791, 0.9912765, 1.2617741, 0.9875289, 0.9231973,\n    1.1543416, 1.084554, 0.9805775, 0.9976991, 1.0076805, 1.0267488,\n    0.9919245, 1.0627179, 0.9760528, 1.14714, 0.947823, 1.0574966,\n    1.0560581, 0.9939038, 1.1754719, 0.9804448, 1.1892616, 1.2926922,\n    1.0381062, 0.9991459, 1.0110192, 1.7982637, 0.9932575, 1.0365072,\n    1.0476382, 0.9572147, 1.0362918, 0.929587, 1.1575934, 1.0942486,\n    1.1386353, 1.0484103, 1.0846261, 0.9627105, 1.0514676, 1.0148971,\n    0.9468566, 1.1103724, 1.0637948, 1.9343892, 1.0520743, 1.0526934,\n    1.0679818, 1.0045373, 1.3400328, 0.9598806, 1.0309374, 0.9556979,\n    1.3586868, 0.9806832, 1.0108765, 0.9652751, 1.9171728, 1.1786559,\n    1.0223136, 0.9491173, 1.0020994, 0.977787, 1.0659739, 1.4374944,\n    1.0311553, 1.0109194, 1.4310709, 0.9937973, 1.1235442, 1.0475279,\n    1.0221015, 1.0810464, 1.6977976, 1.0944615, 1.0511645, 1.0957941,\n    
1.4443457, 1.0375637, 1.1045543, 1.0264414, 1.0205876, 1.3753965,\n    1.0976175, 1.0539255, 1.037731, 1.0592793, 1.0109924, 1.0427939,\n    1.1111455, 1.04521, 0.9745986, 1.3716186, 1.0089931, 1.0603559,\n    1.5494147, 0.9854366, 1.2662523, 0.9623836, 1.3929899, 0.999679,\n    1.0011268, 1.0179427, 1.0416134, 1.7609114, 1.069779, 1.0366241,\n    1.1245068, 0.9792311, 0.967655, 0.9542575, 1.1684304, 1.2482993,\n    1.2640331, 1.0298585, 0.9111223, 1.0672941, 0.9855631, 0.9206366,\n    1.1058931, 1.0740426, 0.9649612, 1.3460875, 0.9493052, 1.0763382,\n    1.0750445, 1.1003632, 1.0639591, 1.0930897, 0.9366367, 1.4825478,\n    0.9872073, 1.0595017, 0.9098508, 0.9132522, 0.9715029, 1.3445599,\n    0.9442429, 0.9947035, 1.5735628, 1.0179848, 1.1207158, 1.4513845,\n    0.9971349, 1.0549698, 1.0829184, 0.9570918, 1.1063325, 1.049832,\n    1.6941119, 0.976464, 1.0548108, 1.0429154, 1.1387078, 1.252386,\n    1.4497295, 1.2952889, 1.0345598, 1.3188744, 1.059327, 0.9671478,\n    0.9628657, 0.9935354, 1.2020615, 0.977946, 1.0286028, 0.9360817,\n    0.9507702, 1.0119649, 1.49294, 0.9929636, 1.0500374, 1.3857874,\n    1.271137, 1.2183431, 1.0284245, 1.2371945, 1.1308861, 1.386502,\n    1.0364896, 1.222194, 1.0893758, 1.3687506, 0.9889728, 0.9717685,\n    0.9804448, 1.0066674, 0.9703385, 1.5495994, 1.0779985, 0.9233493,\n    1.1049508, 1.0770304, 0.9206519, 1.645557, 1.0494959, 1.1984923,\n    1.4967244, 0.9976991, 1.0476285, 0.9612643, 0.9270878, 0.9683637,\n    1.1585881, 1.0376168, 0.9816509, 0.9598896, 1.035713, 1.0170878,\n    0.9578521, 0.9849839, 0.9363952, 0.9856201, 1.0240401, 1.1739687,\n    1.1257174, 0.9772498, 0.9539389, 0.9537187, 1.3452872, 0.9888146\n  )\n\n  expect_equal(round(lof_kd, 7), lof_DMwr)\n  expect_equal(round(lof_d, 7), lof_DMwr)\n\n  ## missing values, but distances are fine\n  x_na <- x\n  x_na[c(1,3,5), 1] <- NA\n  expect_error(lof(x_na), regexp = \"NA\")\n  res_d1 <- lof(x_na, search = \"dist\")\n  res_d2 <- lof(dist(x_na))\n  expect_equal(res_d1, 
res_d2)\n\n  x_na[c(1,3,5), 2] <- NA\n  expect_error(lof(x_na), regexp = \"NA\")\n  expect_error(lof(x_na, search = \"dist\"),\n    regexp = \"NA\")\n  expect_error(lof(dist(x_na)), regexp = \"NA\")\n\n  ## test with tied distances\n  x <- rbind(1,2,3,4,5,6,7)\n  expect_equal(round(lof(x, minPts = 4), 7),\n    c(1.0679012, 1.0679012, 1.0133929, 0.8730159, 1.0133929, 1.0679012, 1.0679012))\n\n  expect_equal(round(lof(dist(x), minPts = 4),7),\n    c(1.0679012, 1.0679012, 1.0133929, 0.8730159, 1.0133929, 1.0679012, 1.0679012))\n})\n"
  },
  {
    "path": "tests/testthat/test-mst.R",
    "content": "test_that(\"mst\", {\n  draw_mst <- function(x, m) {\n    plot(x)\n    text(x, labels = 1:nrow(x), pos = 1)\n    for (i in seq(nrow(m))) {\n      from_to <- rbind(x[m[i, 1], ], x[m[i, 2], ])\n      lines(from_to[, 1], from_to[, 2])\n    }\n  }\n\n  x <- rbind(c(0, 0), c(0, 1), c(1, 1))\n  d <- dist(x)\n  (m <- mst(d, n = nrow(x)))\n\n  #draw_mst(x, m)\n\n  expect_equal(m, structure(\n    c(2, 3, 1, 2, 1, 1),\n    dim = 2:3,\n    dimnames = list(NULL, c(\"from\", \"to\", \"weight\"))\n  ))\n\n  x <- rbind(c(0, 0),\n             c(1, 0),\n             c(0, 1),\n             c(1, 1),\n             c(2, 1),\n             c(1, 2),\n             c(.7, 1),\n             c(.7, .7),\n             c(.7, 1.3))\n  d <- dist(x)\n  (m <- mst(d, n = nrow(x)))\n\n  #draw_mst(x, m)\n\n  expect_equal(m, structure(\n    c(\n      2,\n      3,\n      4,\n      5,\n      6,\n      7,\n      8,\n      9,\n      8,\n      7,\n      7,\n      4,\n      9,\n      8,\n      1,\n      7,\n      0.761577310586391,\n      0.7,\n      0.3,\n      1,\n      0.761577310586391,\n      0.3,\n      0.989949493661166,\n      0.3\n    ),\n    dim = c(8L, 3L),\n    dimnames = list(NULL, c(\"from\", \"to\", \"weight\"))\n  ))\n\n  # data(\"Dataset_2\")\n  # x <- Dataset_2[,1:2]\n  # cl <- Dataset_2[,3]\n  # x_3 <- x[cl==3, ]\n  #\n  # (m <- mst(dist(x_3), n = nrow(x_3)))\n  # max(m[,3])\n  # draw_mst(x_3, m)\n\n\n})\n\ntest_that(\"dist_subset\", {\n  x <- rbind(c(0, 0),\n             c(1, 0),\n             c(0, 1),\n             c(1, 1),\n             c(2, 1),\n             c(1, 2),\n             c(.7, 1),\n             c(.7, .7),\n             c(.7, 1.3))\n  d <- dist(x)\n  m <- as.matrix(d)\n\n  s <- c(1:3, 6)\n  (d_sub <- dist_subset(d, s))\n  (m_sub <- m[s,s])\n\n  expect_equal(unname(as.matrix(d_sub)), unname(m_sub))\n})\n"
  },
  {
    "path": "tests/testthat/test-optics.R",
    "content": "test_that(\"OPTICS\", {\n  load(test_path(\"fixtures\", \"test_data.rda\"))\n  load(test_path(\"fixtures\", \"elki_optics.rda\"))\n\n  x <- test_data\n\n  ### run OPTICS\n  eps <- .1\n  #eps <- .06\n  eps_cl <- .1\n  minPts <- 10\n  res <- optics(x, eps = eps,  minPts = minPts)\n\n  expect_length(res$order, nrow(x))\n  expect_length(res$reachdist, nrow(x))\n  expect_length(res$coredist, nrow(x))\n  expect_identical(res$eps, eps)\n  expect_identical(res$minPts, minPts)\n\n  ### compare with distance based version!\n  res_d <- optics(dist(x), eps = eps,  minPts = minPts)\n  expect_equal(res, res_d)\n\n  #plot(res)\n  #plot(res_d)\n\n  ### compare with elki's result\n  expect_equal(res$order, elki$ID)\n  expect_equal(round(res$reachdist[res$order], 3), round(elki$reachability, 3))\n\n  ### compare result with DBSCAN\n  ### \"clustering created from a cluster-ordered is nearly indistinguishable\n  ### from a clustering created by DBSCAN. Only some border objects may\n  ### be missed\"\n\n  # extract DBSCAN clustering\n  res <- extractDBSCAN(res, eps_cl = eps_cl)\n  #plot(res)\n\n  # are there any clusters with only border points?\n  frnn <- frNN(x, eps_cl)\n  good <- vapply(frnn$id, function(x) (length(x) + 1L) >= minPts, logical(1L))\n  #plot(x, col = (res$cluster+1L))\n  c_good <- res$cluster[good]\n  c_notgood <- res$cluster[!good]\n  expect_false(setdiff(c_notgood, c_good) != 0L)\n\n  # compare with DBSCAN\n  db <- dbscan(x, minPts = minPts, eps = eps)\n  #plot(x, col = res$cluster+1L)\n  #plot(x, col = db$cluster+1L)\n\n  # match clusters (get rid of border points which might differ)\n  pure <- vapply(\n    split(db$cluster, res$cluster), function(x) length(unique(x)), integer(1L)\n  )\n\n  expect_true(all(pure[names(pure) != \"0\"] == 1L))\n\n  ## missing values, but distances are fine\n  x_na <- x\n  x_na[c(1,3,5), 1] <- NA\n  expect_error(optics(x_na, eps = .2, minPts = 4), regexp = \"NA\")\n  res_d1 <- optics(x_na, eps = .2, minPts = 4, search 
= \"dist\")\n  res_d2 <- optics(dist(x_na), eps = .2, minPts = 4)\n  expect_equal(res_d1, res_d2)\n\n  ## introduce NAs into dist\n  x_na[c(1,3,5), 2] <- NA\n  expect_error(optics(x_na, eps = .2, minPts = 4), regexp = \"NA\")\n  expect_error(optics(x_na, eps = .2, minPts = 4, search = \"dist\"),\n    regexp = \"NA\")\n  expect_error(optics(dist(x_na), eps = .2, minPts = 4), regexp = \"NA\")\n\n  ## Create OPTICS-converted and single-linkage dendrograms\n  res <- optics(test_data, eps = Inf,  minPts = 2)\n  res_dend <- as.dendrogram(res)\n  reference <- as.dendrogram(hclust(dist(test_data), method = \"single\"))\n\n  ## Test dendrogram ordering\n  expect_equal(as.integer(unlist(res_dend)), res$order)\n\n  ## Test Single Linkage with minPts=2, eps=INF for strict equivalence\n  ## Note: Reordering needed to correct for isomorphisms\n  ref_order <- order.dendrogram(reference)\n  reference <- reorder(reference, ref_order, agglo.FUN = mean)\n  expect_equal(reference, reorder(res_dend, ref_order, agglo.FUN = mean))\n\n  # Make sure any epsilon that queries the entire neighborhood works,\n  # error otherwise\n  max_rd <- max(res$reachdist[!is.infinite(res$reachdist)], na.rm = TRUE)\n  expect_error(as.dendrogram(optics(test_data, eps = max_rd-1e-7,  minPts = 2)), regexp = \"Eps\")\n  expect_error(as.dendrogram(optics(test_data, eps = max_rd, minPts = nrow(test_data) + 1)), regexp = \"'minPts'\")\n\n  ## Test symmetric relation between reachability <-> dendrogram structures\n  expect_equal(as.reachability(as.dendrogram(res))$reachdist, res$reachdist)\n  expect_equal(as.reachability(as.dendrogram(res))$order, res$order)\n})\n"
  },
  {
    "path": "tests/testthat/test-opticsXi.R",
    "content": "test_that(\"OPTICS-XI\", {\n  load(test_path(\"fixtures\", \"test_data.rda\"))\n  load(test_path(\"fixtures\", \"elki_optics.rda\"))\n  load(test_path(\"fixtures\", \"elki_optics_xi.rda\"))\n\n  ### run OPTICS XI with parameters: xi=0.01, eps=1.0, minPts=5\n  x <- test_data\n  res <- optics(x, eps = 1.0,  minPts = 5)\n  res <- extractXi(res, xi = 0.10, minimum = FALSE)\n\n  ### Check to make sure ELKI results match R\n  expected <- res$clusters_xi[, c(\"start\", \"end\")]\n  class(expected) <- \"data.frame\"\n  expect_identical(elki_optics_xi, expected)\n})\n"
  },
  {
    "path": "tests/testthat/test-predict.R",
    "content": "test_that(\"predict\", {\n  set.seed(3)\n  n <- 100\n  x_data <- cbind(\n    x = runif(5, 0, 10) + rnorm(n, sd = 0.2),\n    y = runif(5, 0, 10) + rnorm(n, sd = 0.2)\n  )\n\n  x_noise <- cbind(\n    x = runif(n/2, 0, 10),\n    y = runif(n/2, 0, 10)\n  )\n\n  x <- rbind(x_data, x_noise)\n\n  # check if l points with a little noise are assigned to the same cluster\n  l <- 20\n  newdata <- rbind(\n    x_data[1:l,] + rnorm(2*l, 0, .05),\n    x_noise[1:l,] + rnorm(2*l, 0, .05)\n  )\n\n  idx <- c(1:l, n + (1:l))\n\n  #plot(x, col = rep(c(\"black\", \"gray\"), each = n))\n  #points(newdata, col = rep(c(\"red\", \"gray\"), each = l), pch = 16)\n\n  # DBSCAN\n  res <- dbscan(x, eps = .3, minPts = 3)\n  pr <- predict(res, newdata, data = x)\n\n  rbind(true = res$cluster[idx], pred = pr)\n  expect_equal(res$cluster[idx], pr)\n  #plot(x, col = ifelse(res$cluster == 0, \"gray\", res$cluster))\n  #points(newdata, col = ifelse(pr == 0, \"gray\", pr), pch = 16)\n\n  # OPTICS\n  res <- optics(x, minPts = 3)\n  res <- extractDBSCAN(res, eps = .3)\n  pr <- predict(res, newdata, data = x)\n\n  rbind(true = res$cluster[idx], pred = pr)\n  expect_equal(res$cluster[idx], pr)\n\n  # currently no implementation for extractXi\n\n  # HDBSCAN (note predict is not perfect for the data.)\n  res <- hdbscan(x, minPts = 3)\n  pr <- predict(res, newdata, data = x)\n\n  rbind(true = res$cluster[idx], pred = pr)\n  accuracy <- sum(res$cluster[idx] == pr)/length(pr)\n  expect_true(accuracy > .9)\n\n  # show misclassifications\n  #plot(x, col = ifelse(res$cluster == 0, \"gray\", res$cluster))\n  #points(newdata, col = ifelse(pr == 0, \"gray\", pr), pch = 16)\n  #points(newdata[res$cluster[idx] != pr,, drop = FALSE], col = \"red\", pch = 4, lwd = 2)\n})\n"
  },
  {
    "path": "tests/testthat/test-sNN.R",
    "content": "test_that(\"sNN\", {\n  set.seed(665544)\n  n <- 1000\n  x <- cbind(\n    x = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n    y = runif(10, 0, 10) + rnorm(n, sd = 0.2),\n    z = runif(10, 0, 10) + rnorm(n, sd = 0.2)\n  )\n\n  ## no duplicates first!\n  x <- x[!duplicated(x),]\n\n  rownames(x) <- paste0(\"Object_\", seq_len(nrow(x)))\n\n  k <- 5L\n  nn <- sNN(x, k=k, sort = TRUE)\n\n  ## check dimensions\n  expect_equal(nn$k, k)\n  expect_equal(dim(nn$dist), c(nrow(x), k))\n  expect_equal(dim(nn$id), c(nrow(x), k))\n\n  ## check visually\n  #plot(x)\n  #points(x[nn$id[1,],], col=\"red\", lwd=5)\n  #points(x[nn$id[2,],], col=\"green\", lwd=5)\n\n  ## compare with kNN found using distances\n  nn_d <- sNN(dist(x), k, sort = TRUE)\n\n  ## check visually\n  #plot(x)\n  #points(x[nn_d$id[1,],], col=\"red\", lwd=5)\n  #points(x[nn_d$id[2,],], col=\"green\", lwd=5)\n\n  ### will aggree minus some tries\n  expect_equal(nn, nn_d)\n\n  ## calculate dist internally\n  nn_d2 <- sNN(x, k, search = \"dist\", sort = TRUE)\n  expect_equal(nn, nn_d2)\n\n  ## missing values, but distances are fine\n  x_na <- x\n  x_na[c(1,3,5), 1] <- NA\n  expect_error(sNN(x_na, k = 3), regexp = \"NA\")\n  res_d1 <- sNN(x_na, k = 3, search = \"dist\")\n  res_d2 <- sNN(dist(x_na), k = 3)\n  expect_equal(res_d1, res_d2)\n\n  ## introduce NAs into dist\n  x_na[c(1,3,5),] <- NA\n  expect_error(sNN(x_na, k = 3), regexp = \"NA\")\n  expect_error(sNN(x_na, k = 3, search = \"dist\"), regexp = \"NA\")\n  expect_error(sNN(dist(x_na), k = 3), regexp = \"NA\")\n\n\n  ## sort and kNN to reduce k\n  nn10 <- sNN(x, k = 10, sort = FALSE)\n  expect_false(nn10$sort_shared)\n  expect_error(sNN(nn10, k = 11))\n\n  nn5 <- sNN(nn10, k = 5, sort = TRUE)\n  nn5_x <- sNN(x, k = 5, sort = TRUE)\n  expect_equal(nn5, nn5_x)\n\n  ## test with simple data\n  x <- data.frame(x = 1:10, check.names = FALSE)\n  nn <- sNN(x, k = 5)\n\n  i <- 1\n  j_ind <- 1\n  j <- nn$id[i,j_ind]\n  intersect(c(i, nn$id[i,]), nn$id[j,])\n  
nn$shared[i,j_ind]\n\n  # compute the sNN simularity in R\n  ss <- matrix(nrow = nrow(x), ncol = nn$k)\n  for(i in seq_len(nrow(x)))\n    for(j_ind in 1:nn$k)\n      ss[i, j_ind] <- length(intersect(c(i, nn$id[i,]), nn$id[nn$id[i,j_ind],]))\n\n  expect_equal(nn$shared, ss)\n})\n"
  },
  {
    "path": "tests/testthat.R",
    "content": "library(testthat)\nlibrary(dbscan)\n\ntest_check(\"dbscan\")\n"
  },
  {
    "path": "vignettes/dbscan.Rnw",
    "content": "% !Rnw weave = Sweave\n\\documentclass[nojss]{jss}\n\n% Package includes\n\\usepackage[utf8]{inputenc}\n\\usepackage[english]{babel}\n%\\usepackage{esvect} % vv\n%\\usepackage{algorithm} % algorithm tools\n%\\usepackage[noend]{algpseudocode} % algorithmic (pseudocode) tools\n\\usepackage{mathtools} % coloneqq\n\\usepackage{amsthm}\n%\\usepackage[dvipsnames]{xcolor} % for adding color to code\n%\\usepackage{listings} % For pprinting r code blocks (w/o execution)\n\\usepackage{amssymb}\n\\usepackage{pifont} % http://ctan.org/pkg/pifont\n%\\usepackage{float}\n\\usepackage{tabularx}\n%\\usepackage[toc,page]{appendix}\n\n% Remove sweave margins if possible\n%\\usepackage[belowskip=-15pt,aboveskip=0pt]{caption}\n%\\setlength{\\intextsep}{8pt plus 1pt minus 1pt}\n%\\setlength{\\floatsep}{1ex}\n%\\setlength{\\textfloatsep}{1ex plus 1pt minus 1pt}\n%\\setlength{\\abovecaptionskip}{0ex}\n%\\setlength{\\belowcaptionskip}{0ex}\n\n% Aliases and commands\n\\newtheorem{mydef}{Definition}\n\\newcommand{\\minus}{\\scalebox{0.75}[1.0]{$-$}}\n\\newcommand{\\exdb}{\\texttt{extractDBSCAN} }\n\\mathchardef\\mhyphen=\"2D % Define a \"math hyphen\"\n\\newcommand{\\cmark}{\\ding{51}} % checkmark\n\n%% \\VignetteIndexEntry{Fast Density-based Clustering (DBSCAN and OPTICS)}\n\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n%% declarations for jss.cls %%\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n\\author{\n        Michael Hahsler \\\\Southern Methodist University \\And\n        Matthew Piekenbrock\\\\Wright State University \\AND\n        Derek Doran \\\\ Wright State University\n    }\n\\title{\\pkg{dbscan}: Fast Density-based Clustering with \\proglang{R}}\n\\Plainauthor{Michael Hahsler, Matthew Piekenbrock, Derek Doran}\n\\Plaintitle{dbscan: Fast Density-based Clustering with R}\n\\Shorttitle{\\pkg{dbscan}: Density-based Clustering with \\proglang{R}}\n\n\\Address{\n  Michael Hahsler\\\\\n  Department of Engineering Management, Information, and Systems\\\\\n  Bobby B. 
Lyle School of Engineering, SMU\\\\\n  P. O. Box 750123, Dallas, TX 75275\\\\\n  E-mail: \\email{mhahsler@lyle.smu.edu}\\\\\n  URL: \\url{https://michael.hahsler.net/}\n\n  \\vspace{5mm}\n\n  Matt Piekenbrock\\\\\n  Department of Computer Science and Engineering\\\\\n  Wright State University\\\\\n  3640 Colonel Glenn Hwy, Dayton, OH, 45435\\\\\n  E-mail: \\email{piekenbrock.5@wright.edu}\n\n  \\vspace{5mm}\n\n  Derek Doran\\\\\n  Department of Computer Science and Engineering\\\\\n  Wright State University\\\\\n  3640 Colonel Glenn Hwy, Dayton, OH, 45435\\\\\n  E-mail: \\email{derek.doran@wright.edu}\n}\n\n\\Abstract {\n    This article describes the implementation and use of the \\proglang{R} package \\pkg{dbscan}, which provides complete and fast implementations of the popular density-based clustering algorithm DBSCAN and the augmented ordering algorithm OPTICS. Compared to other implementations, \\pkg{dbscan} offers open-source implementations using \\proglang{C++} and advanced data structures like k-d trees to speed up computation. An important advantage of this implementation is that it is up-to-date with several primary advancements that have been added since their original publications, including artifact corrections and dendrogram extraction methods for OPTICS. 
Experiments with \\pkg{dbscan}'s implementation of DBSCAN and OPTICS compared with other libraries such as FPC, ELKI, WEKA, PyClustering, SciKit-Learn and SPMF suggest that \\pkg{dbscan} provides a very efficient implementation.\n}\n\\Keywords{DBSCAN, OPTICS, Density-based Clustering, Hierarchical Clustering}\n\n\\begin{document}\n\n% Do not move SweaveOpts into preamble\n\\SweaveOpts{concordance=TRUE} % prefix.string=generated/dbscan\n\\section{Introduction}\nClustering is typically described as the process of finding structure in data by grouping similar objects together, where the resulting groups are called clusters.\nMany clustering algorithms directly apply the idea that clusters can be formed such that objects in the same cluster should be more similar to each other than to objects in other clusters. The notion of similarity (or distance) stems from the fact that objects are assumed to be data points embedded in a data space in which a similarity measure can be defined. Examples are methods based on solving the $k$-means problem or mixture models\nwhich find the parameters of a parametric generative probabilistic model from which the observed data are assumed to arise. Another approach is hierarchical clustering, which uses local heuristics to form a hierarchy of nested groupings of objects. Most of these approaches (with the notable exception of single-link hierarchical clustering) are biased towards clusters with convex, hyper-spherical shape. A detailed review of these clustering algorithms is provided in \\cite{Kaufman:1990}, \\cite{jain1999review}, and the more recent review by\n\\cite{Aggarwal:2013}.\n\nDensity-based clustering approaches clustering differently. 
It simply posits that clusters are contiguous `dense' regions in the data space (i.e., regions of high point density), separated by areas of low point density~\\citep{kriegel:2011,sander2011density}.\nDensity-based methods find such high-density regions representing clusters of arbitrary shape and typically have a structured means of identifying noise points in low-density regions. These properties provide advantages for many applications compared to other clustering approaches. For example, geospatial data may be fraught with noisy data points due to estimation errors in GPS-enabled sensors~\\citep{Chen2014} and may have unique cluster shapes caused by the physical space the data was captured in. Density-based clustering is also a promising approach to clustering high-dimensional data~\\citep{kailing2004density}, where partitions are difficult to discover, and where the physical shape constraints assumed by model-based methods are more likely to be violated.\n%While dimensionality reduction techniques enable the use of many clustering algorithms to cluster high dimensional data, density-based clustering enables us to group high-dimensional data without the loss of information and recognizing noisy data.\n% What DBSCAN has been used for\n\nSeveral density-based clustering algorithms have been proposed, including the\nDBSCAN algorithm~\\citep{ester1996density},\nDENCLUE~\\citep{hinneburg1998efficient},\nand many DBSCAN derivatives like HDBSCAN~\\citep{campello2015hierarchical}.\nThese clustering algorithms are widely used in practice with applications ranging from finding outliers in datasets for fraud prevention~\\citep{breunig2000lof}, to finding patterns in streaming data~\\citep{chen2007density, cao2006density}, noisy signals~\\citep{kriegel2005density,ester1996density,tran2006knn,hinneburg1998efficient,duan2007local}, gene expression data~\\citep{jiang2003dhc}, multimedia databases~\\citep{kisilevich2010p}, and road traffic~\\citep{li2007traffic}.\n\n%%% MFH: I am not 
sure this is true. Some of these are not pure density-based\n% What is the aim of the DBSCAN package?\n%There are many meaningful ways to define 'natural' clusters based on density. As a result, numerous density-based clustering algorithms have been proposed within the past two decades, e.g.,\n%BIRCH~\\citep{zhang96},\n%DBSCAN algorithm~\\citep{ester1996density},\n%DENCLUE~\\citep{hinneburg1998efficient},\n%CURE~\\citep{guha1998cure},\n%CHAMELEON~\\citep{karypis1999chameleon},\n%CLARANS~\\citep{ng2002clarans},\n%and HDBSCAN~\\citep{campello2015hierarchical}.\n\nThis paper focuses on an efficient implementation of the DBSCAN algorithm~\\citep{ester1996density},\none of the most popular density-based clustering algorithms,\nwhose consistent use earned it the\nSIGKDD 2014 Test of Time Award~\\citep{SIGKDDNe30:online}, and OPTICS~\\citep{ankerst1999optics}, often referred to as an extension of DBSCAN.\n%Matt - what do you mean when you say related algorithms?\n%along with their related algorithms, such as the Local Outlier Factor \\citep{breunig2000lof} and the conversion methods between reachability and dendrogram representations\\citep{sander2003automatic}.\n%Matt - you can cite the KAIS 17 paper in this first sentence\nWhile surveying software tools that implement various density-based clustering algorithms, it was discovered that in a large number of statistical tools, not only do implementations vary significantly in performance~\\citep{kriegel2016black}, but they may also lack important components and corrections. Specifically, for the statistical computing environment \\proglang{R}~\\citep{team2013r}, only naive DBSCAN implementations without speed-up from spatial data structures are available (e.g., in the well-known Flexible Procedures for Clustering package~\\citep{fpc}), and OPTICS is not available. %% Matt, what packages? : fixed (fpc). 
It's probably not worth mentioning largeVis, doesn't even compile/load properly on my machine.\nThis motivated the development of an \\proglang{R} package for density-based clustering with DBSCAN and related algorithms called \\pkg{dbscan}. The \\pkg{dbscan} package contains complete, correct and fast implementations of DBSCAN and OPTICS.\n% precisely as intended by the original authors of the algorithms.\nThe package currently enjoys thousands of new installations from the CRAN repository every month.\n\nThis article presents an overview of the \\proglang{R} package~\\pkg{dbscan}\nfocusing on DBSCAN and OPTICS, outlining its operation and experimentally\ncomparing its performance with other open-source implementations. We first review the concept of density-based clustering and present the DBSCAN and OPTICS algorithms in Section~\\ref{sec:dbc}. This section concludes with a short review of existing software packages that implement these algorithms. Details about \\pkg{dbscan}, with examples of its use, are presented in Section~\\ref{sec:dbscan}. A performance evaluation is presented in Section~\\ref{sec:eval}. Concluding remarks are offered in Section~\\ref{sec:conc}.\n\nA version of this article describing the package \\pkg{dbscan} was published as \\cite{hahsler2019dbscan} and should be cited.\n\n<<echo=FALSE>>=\noptions(useFancyQuotes = FALSE)\ncitation(\"dbscan\")\n@\n\n\\section{Density-based clustering}\\label{sec:dbc}\nDensity-based clustering is now a well-studied field. Conceptually, the idea behind density-based clustering is simple: given a set of data points, define a structure that accurately reflects the underlying density~\\citep{sander2011density}. 
An important distinction between density-based clustering and alternative approaches to cluster analysis, such as the use of \\emph{(Gaussian) mixture models}~\\citep[see][]{jain1999review}, is that the latter represents a \\emph{parametric} approach in which the observed data are assumed to have been produced by a mixture of either Gaussian or other parametric families of distributions.\nWhile certainly useful in many applications, parametric approaches naturally assume clusters will exhibit some type of convex (generally hyper-spherical or hyper-elliptical) shape. Other approaches, such as $k$-means clustering (where the $k$ parameter signifies the user-specified number of clusters to find), share this common theme of `minimum variance', where the underlying assumption is made that ideal clusters are found by minimizing some measure of intra-cluster variance (often referred to as cluster cohesion) and maximizing the inter-cluster variance (cluster separation)~\\citep{arbelaitz2013extensive}. Conversely, the label density-based clustering is used for methods which do not assume parametric distributions, are capable of finding arbitrarily-shaped clusters, handle varying amounts of noise, and require no prior knowledge regarding how to set the number of clusters $k$. This methodology is best expressed in the DBSCAN algorithm, which we discuss next.\n\n\\subsection{DBSCAN: Density Based Spatial Clustering of Applications with Noise}\nAs one of the most cited of the density-based clustering algorithms~\\citep{acade96:online}, DBSCAN~\\citep{ester1996density} is likely the best known density-based clustering algorithm in the scientific community today. The central idea behind DBSCAN and its extensions and revisions is the notion that points are assigned to the same cluster if they are \\emph{density-reachable} from each other. To understand this concept, we will go through the most important definitions used in DBSCAN and related algorithms. 
The definitions and the presented pseudo code follow the original by \\cite{ester1996density}, but are adapted to provide a more consistent presentation with the other algorithms discussed in the paper.\n\nClustering starts with a dataset $D$ containing a set of points\n$p \\in D$.\nDensity-based algorithms need to obtain a density estimate over the data space.\nDBSCAN estimates the density around a point using\nthe concept of $\\epsilon$-neighborhood.\n\n\\begin{mydef} {\\bf $\\epsilon$-Neighborhood}.\nThe $\\epsilon$-neighborhood, $N_\\epsilon(p)$, of a data point $p$ is the set of points within a specified radius $\\epsilon$ around $p$.\n\n    $$N_\\epsilon(p) = \\{q \\;|\\; d(p,q) \\le \\epsilon\\}$$\n\nwhere $d$ is some distance measure and $\\epsilon \\in \\mathbb{R}^+$. Note that the point $p$ is always in its own $\\epsilon$-neighborhood, i.e., $p \\in N_\\epsilon(p)$ always holds.\n\\end{mydef}\n\n% high density definition both below and above\nFollowing this definition, the size of the neighborhood $|N_\\epsilon(p)|$ can\nbe seen as a simple unnormalized kernel density estimate around $p$\nusing a uniform kernel and a bandwidth of $\\epsilon$.\nDBSCAN uses $N_\\epsilon(p)$ and a threshold called $\\mathit{minPts}$\nto detect dense regions and to classify the points in a data set into\n{\\bf core}, {\\bf border}, or {\\bf noise} points.\n\n\\begin{mydef} {\\bf Point classes}.\nA point $p \\in D$ is classified as\n    \\begin{itemize}\n    \\item\n    a {\\bf core point} if $N_\\epsilon(p)$ has\n        high density, i.e., $|N_\\epsilon(p)| \\geq \\mathit{minPts}$ where $\\mathit{minPts} \\in \\mathbb{Z}^+$ is a user-specified density threshold,\n    \\item\n    a {\\bf border point} if $p$ is not a core point, but\n        it is in the neighborhood of a core point $q \\in D$,\n        i.e., $p \\in N_\\epsilon(q)$, or\n    \\item\n    a {\\bf noise point}, otherwise.\n    \\end{itemize}\n\\end{mydef}\n\n\n\\begin{figure}\n    \\minipage{0.49\\textwidth}\n
        \\includegraphics[height=\\linewidth, angle=-90, origin=c]{figures/dbscan_a}\\\\\n        \\centerline{(a)}\n    \\endminipage\\hfill\n    \\minipage{0.49\\textwidth}\n        \\includegraphics[height=\\linewidth, angle=-90, origin=c]{figures/dbscan_b}\\\\\n        \\centerline{(b)}\n    \\endminipage\\\\\n\\caption{Concepts used in the DBSCAN family of algorithms.\n    (a) shows examples for the three point classes, core, border, and noise points,\n    (b) illustrates the concept of density-reachability and density-connectivity.\n    }\\label{fig:point_classes}\n\\end{figure}\n\n\n\nA visual example is shown in\nFigure~\\ref{fig:point_classes}(a). The size of the neighborhood for some points\nis shown as a circle and their class is shown as an annotation.\n\nTo form contiguous dense regions from individual points, DBSCAN defines the\nnotions of reachability and connectedness.\n\n\\begin{mydef} {\\bf Directly density-reachable}.\nA point $q \\in D$ is directly density-reachable from a point $p \\in D$ with respect to\n$\\epsilon$ and $\\mathit{minPts}$ if, and only if,\n\\begin{enumerate}\n  \\item $|N_\\epsilon(p)|$ $\\geq$ $\\mathit{minPts}$, and\n  \\item $q$ $\\in$ $N_\\epsilon(p)$.\n\\end{enumerate}\n      That is,\n      $p$ is a core point and\n      $q$ is in its $\\epsilon$-neighborhood.\n\\end{mydef}\n\n\\begin{mydef} {\\bf Density-reachable}. A point $p$ is density-reachable from $q$ if there exists in $D$ an ordered sequence of points $(p_1, p_2, ..., p_n)$\n with $q=p_1$ and $p=p_n$\n    such that $p_{i+1}$ is directly density-reachable from $p_{i}$ $\\forall$ $i \\in \\{1,2, ..., n-1\\}$.\n\\end{mydef}\n\n\n\\begin{mydef} {\\bf Density-connected}. 
A point $p \\in D$ is density-connected to a point $q \\in D$\nif there is a point $o$ $\\in$ $D$ such that both $p$ and $q$ are density-reachable from $o$.\n\\end{mydef}\n\nThe notion of density-connection can be used to form clusters as\ncontiguous dense regions.\n\n% DBSCAN definition of cluster\n\\begin{mydef} {\\bf Cluster}. A cluster $C$ is a non-empty subset of $D$ satisfying the following conditions:\n\\begin{enumerate}\n    \\item {\\bf Maximality}: If $p \\in C$ and $q$ is density-reachable from $p$, then $q \\in C$; and\n    \\item {\\bf Connectivity}: $\\forall$ $p, q \\in C$, $p$ is density-connected to $q$.\n\\end{enumerate}\n\\end{mydef}\n\nThe DBSCAN algorithm identifies all such clusters by finding all core points and expanding each to all density-reachable points.\n%Algorithm~\\ref{alg:dbscan} presents the details of the DBSCAN implementation in \\pkg{dbscan}. It largely follows the algorithm presented by \\cite{ester1996density}, but presents DBSCAN and cluster expansion in a single function.\nThe algorithm begins with an arbitrary point $p$ and retrieves its $\\epsilon$-neighborhood.\n%, denoted $N_{\\epsilon}(p)$.\nIf it is a core point then it will start a new cluster that is expanded by assigning all points in its neighborhood to the cluster. 
If an additional core point is found in the neighborhood, then the search is expanded to also include all points in its neighborhood.\nIf no more core points are found in the expanded neighborhood, then the cluster is complete and the remaining points are searched to see if another core point can be found to start a new cluster.\n%The algorithm returns the cluster assignments after all data points have been processed.\nAfter processing all points, points which were not assigned to a cluster are considered noise.\n%Note that border points are points which have been assigned a cluster, but are not core points.\n\n%\\begin{algorithm}[t]\n%     \\caption{DBSCAN}\n%     \\begin{algorithmic}[1]\n%     \\Require $D \\coloneqq$ Database of points\n%     \\Require $\\epsilon \\coloneqq$ User-defined neighborhood radius\n%     \\Require $\\mathit{minPts} \\coloneqq$ Minimum number of points in the neighborhood of a core point\n%     \\Function{DBSCAN}{D, eps, $\\mathit{minPts}$}\n%    \\For{$p$ in $D$}% Iterate through the DB of points, arbitrary starting point\n%     \\Comment{Find core points}\n%        \\If {$p$ has already been visited} % Already Processed points are skipped\n%                continue\n%        \\EndIf\n%\n%        \\State Mark $p$ as visited % Mark progress\n%            \\State $N \\gets N_{\\epsilon}(p)$ % Get all points within eps radius\n%            \\If{$|N| < \\mathit{minPts}$} % How many points were found\n%        continue\n%        \\EndIf\n%\n%        \\State $c \\gets$ new cluster label\n%        \\Comment{Start new cluster for core point and expand}\n%        \\State Assign $p$ to cluster $c$\n%        \\While {$N \\ne \\emptyset$}\n%        \\State $p' \\gets pop(N)$\n%          \\If {$p'$ has already been visited} % Already Processed points are skipped\n%                continue\n%          \\EndIf\n%          \\State Mark $p'$ as visited % Mark progress\n%          \\State $N' \\gets N_{\\epsilon}(p')$ % Get all points within eps 
radius\n%          \\State Assign $p'$ to cluster $c$\n%            \\If{$|N'| \\ge \\mathit{minPts}$} % How many points were found\n%        \\Comment{Expand cluster for additional core point}\n%        \\State Mark $p'$ as a core point\n%        \\State $N \\gets N \\cup N'$\n%\n%        \\EndIf\n%\n%        \\EndWhile\n%         \\EndFor\n%     \\State \\Return cluster assignments\n%     \\EndFunction\n%     \\end{algorithmic}\n%\\label{alg:dbscan}\n%\\end{algorithm}\n\nIn the DBSCAN algorithm, core points are always part of the same cluster, independent of the order in which the points in the dataset are processed.\nThis is different for border points. Border points might be density-reachable from core points in several clusters and the algorithm assigns them to the\nfirst of these clusters processed which depends on the order of\nthe data points and the particular implementation of the algorithm.\n%Border points, however, although density-reachable from a core point, do not share the density-reachable property (the relation is asymmetric) and thus their cluster assignment depends on the order of which points are visited in the algorithm. This needs to be taken into account when comparing two different implementations since they might visit the points in a different order and thus end up producing different cluster assignments for border points.\nTo alleviate this behavior, \\cite{campello2015hierarchical} suggest a modification called DBSCAN* which considers all border points as noise instead and leaves\nthem unassigned.\n\n\n\\subsection{OPTICS: Ordering Points To Identify Clustering Structure}\\label{sec:optics}\nThere are many instances where it would be useful to detect clusters of varying density. 
From identifying causes among similar seawater characteristics~\citep{birant2007st} to network intrusion detection systems~\citep{ertoz2003finding}, point-of-interest detection using geo-tagged photos~\citep{kisilevich2010p}, and the classification of cancerous skin lesions~\citep{celebi2005mining}, the motivations for detecting clusters of varying density are numerous. The inability to find clusters of varying density is a notable drawback of DBSCAN, resulting from the fact that a combination of a specific neighborhood size with a single density threshold $\mathit{minPts}$ is used to determine if a point resides in a dense neighborhood.

In 1999, some of the original DBSCAN authors developed OPTICS~\citep{ankerst1999optics} to address this concern. OPTICS borrows the core density-reachability concept from DBSCAN. But while DBSCAN may be thought of as a clustering algorithm, searching for natural groups in data, OPTICS is an \emph{augmented ordering algorithm} from which either flat or hierarchical clustering results can be derived. OPTICS requires the same $\epsilon$ and $\mathit{minPts}$ parameters as DBSCAN; however, the $\epsilon$ parameter is theoretically unnecessary and is only used for the practical purpose of reducing the runtime complexity of the algorithm.

To describe OPTICS, we introduce two additional concepts, called core-distance
and reachability-distance.
All distances are calculated using the same metric (often Euclidean distance) that is also used for the neighborhood calculation.

\begin{mydef} {\bf Core-distance}.
The core-distance of a point $p \in D$ with respect to $\mathit{minPts}$ and $\epsilon$ is defined as
    \[ \mathrm{core\mhyphen dist}(p; \epsilon, \mathit{minPts}) = \begin{cases}
    \text{UNDEFINED} & \text{if} \; |N_{\epsilon}(p)| < \mathit{minPts}, \text{and} \\
    \mathrm{minPts\mhyphen dist}(p) & \text{otherwise,}
   \end{cases}
    \] where $\mathrm{minPts\mhyphen dist}(p)$ is the
    distance from $p$ to its $(\mathit{minPts} - 1)$th nearest neighbor, i.e.,
    the minimal radius that a neighborhood of size $\mathit{minPts}$ centered at and
    including $p$ would have.
\end{mydef}

\begin{mydef} {\bf Reachability-distance}.
 The reachability-distance of a point $p \in D$ to a point $q \in D$
 parameterized by $\epsilon$ and $\mathit{minPts}$ is defined as
    \[ \mathrm{reachability\mhyphen dist}(p,q; \epsilon, \mathit{minPts}) = \begin{cases}
    \text{UNDEFINED} & \text{if} \; |N_{\epsilon}(p)| < \mathit{minPts}, \text{and} \\
    \max(\mathrm{core\mhyphen dist}(p), d(p, q)) & \text{otherwise.}
   \end{cases}
\]
\end{mydef}

The reachability-distance of a core point $p$ with respect to a point $q$
is the smallest neighborhood radius such that $q$ would be directly
density-reachable from $p$.
Note that $\epsilon$ is typically set much larger than for DBSCAN. Therefore,
$\mathit{minPts}$ behaves differently for OPTICS: more points will be considered core points, and the parameter determines how many nearest neighbors enter the core-distance calculation, where larger values lead to smoother reachability-distance distributions.
This needs to be kept in mind when choosing appropriate parameters.

%The OPTICS algorithm pseudocode is shown in Algorithm~\ref{alg:optics}.
OPTICS produces an augmented ordering of the data points.
%sorts points by the reachability-distance to their closest core point.
The algorithm starts with a point and expands its neighborhood like DBSCAN, but it explores the new points in the order of lowest to highest reachability-distance. The order in which the points are explored, along with each point's core- and reachability-distance, is the final result of the algorithm.
An example of the order and the resulting reachability-distances is shown
in the form of a reachability plot
in Figure~\ref{fig:opticsReachPlot1}. Low reachability-distances shown as
valleys represent clusters, which are separated by peaks representing
points with larger distances.
This density representation essentially conveys the same information as the often used dendrogram or `tree-like' structure.
This is why OPTICS is often also regarded as a visualization tool.
\cite{sander2003automatic} showed how the output of OPTICS can
be converted into an equivalent dendrogram, and that,
under certain conditions, the dendrogram produced by the well-known hierarchical clustering with single linkage is identical to the result of running OPTICS with the parameter $\mathit{minPts} = 2$.
%To make this connection explicit, an OPTICS extension~\citep{sander2003automatic} showed how that, under certain conditions, the dendrogram produced by the well known hierarchical clustering with single linkage is identical to running OPTICS with the parameter $\mathit{minPts} = 2$. 

Due to the widespread use of dendrograms in the \proglang{R} computing environment, this conversion between the reachability and dendrogram representations is also made available in \pkg{dbscan}.


\begin{figure}
    \centering
      \includegraphics{dbscan-opticsReachPlot}
      \caption{OPTICS reachability plot example for a data set with four clusters of 100 data points each.}
      \label{fig:opticsReachPlot1}
\end{figure}


%
%OPTICS evaluates each point's reachability-distance with respect to a neighbor, marks the point as processed, and then continues processing nearest neighbors.
%The algorithm is similar to DBSCAN. Where OPTICS differs, however, is in the assignment of reachability-distance, a generalized extension to density-reachability. Rather than assigning cluster labels for each object processed, OPTICS stores reachability-distance and core-distance, in a specific ordering such that neighboring objects that have smaller reachability-distances are prioritized. Due to this prioritization, core objects are naturally grouped up near other core objects in the ordering, where each point is labeled with its minimum reachability-distance. 


%\subsubsection{Cluster Extraction}\label{sub:opt_cluster_ex}
From the order discovered by OPTICS, two ways to group points into clusters
were discussed in~\cite{ankerst1999optics}: one which we will refer to as the {\bf ExtractDBSCAN} method and one which we will refer to as the {\bf Extract-$\xi$} method, summarized below.
\begin{enumerate}
  \item {\bf ExtractDBSCAN} uses a single global
      reachability-distance threshold $\epsilon'$ to extract a clustering.
     This can be seen as a horizontal line in the reachability plot
	in Figure~\ref{fig:opticsReachPlot1}.
	Peaks above the cut-off represent noise points and separate the
	clusters.
  \item {\bf Extract-$\xi$}
      identifies clusters \emph{hierarchically} by scanning through the ordering that OPTICS produces to identify significant, relative changes in reachability-distance. The authors of OPTICS noted that clusters can be thought of as `dents' in the reachability plot.
\end{enumerate}
The ExtractDBSCAN method extracts a clustering
equivalent to DBSCAN* (i.e., DBSCAN where border points stay unassigned).
Because this method extracts clusters like DBSCAN, it cannot identify partitions that exhibit very significant differences in density. Clusters of significantly different density can only be identified if the data is well separated and very little noise is present.
The second method, which we call Extract-$\xi$\footnote{In the original OPTICS publication \cite{ankerst1999optics}, the algorithm was outlined in Figure 19 and called the `ExtractClusters' algorithm, where the extracted clusters were referred to as $\xi$-clusters. To distinguish the method uniquely, we refer to it as the Extract-$\xi$ method.},
identifies a cluster hierarchy and replaces the data-dependent global $\epsilon$ parameter with $\xi$, a data-independent density-threshold parameter ranging between $0$ and $1$.
One interpretation of $\xi$ is that it describes the relative magnitude of the change of cluster density (i.e., reachability). Significant changes in relative reachability allow clusters to manifest themselves hierarchically as `dents' in the ordering structure. In contrast to the ExtractDBSCAN method, the hierarchical representation produced by Extract-$\xi$ can contain clusters of varying density.

With its two ways of extracting clusters from the ordering, through either the global $\epsilon'$ threshold or the relative $\xi$ threshold, OPTICS can be seen as a generalization of DBSCAN. In contexts where one wants to find clusters of similar density, OPTICS's ExtractDBSCAN yields a DBSCAN-like solution, while in other contexts Extract-$\xi$ can generate a hierarchy representing clusters of varying density. It is thus interesting to note that while DBSCAN has received widespread acclaim, even motivating numerous extensions~\citep{rehman2014dbscan}, OPTICS has received decidedly less attention. Perhaps one of the reasons for this is that the Extract-$\xi$ method for grouping points into clusters has gone largely unnoticed, as it is not implemented in most open-source software packages that advertise an implementation of OPTICS. This includes implementations in WEKA~\citep{hall2009weka}, SPMF~\citep{fournier2014spmf}, and the PyClustering~\citep{PyCluste54:online} and scikit-learn~\citep{pedregosa2011scikit} libraries for Python.
To the best of our knowledge, the only other open-source library
offering a complete implementation of OPTICS is ELKI~\citep{DBLP:journals/pvldb/SchubertKEZSZ15}, written in \proglang{Java}.
%\subsection{A Note on DBSCAN and OPTICS Extensions}\label{sec:extensions}

In fact, perhaps due to the incomplete implementations of OPTICS cluster extraction across various software libraries, there has been some confusion regarding the usage of OPTICS and the benefits it offers compared to DBSCAN.
Several papers motivate DBSCAN extensions or devise new algorithms by citing OPTICS as incapable of finding density-heterogeneous clusters~\citep{ghanbarpour2014exdbscan,chowdhury2010efficient,Gupta2010,duan2007local}. Along the same line of thought, others cite OPTICS as capable of finding clusters of varying density, but either use the DBSCAN-like global density threshold extraction method or refer to OPTICS as a clustering algorithm without mentioning which cluster extraction method was used in their experimentation~\citep{verma2012comparative,roy2005approach,liu2007vdbscan,pei2009decode}.
However, OPTICS fundamentally returns an ordering of the data which can be post-processed to extract either
(1) a flat clustering with clusters of relatively similar density or
(2) a cluster hierarchy, which adapts to the local densities within the data.
To clear up this confusion,
it is important to add complete implementations of OPTICS to
existing software packages and to introduce new complete implementations such as the \proglang{R} package~\pkg{dbscan} described in this paper.



\subsection{Current implementations of DBSCAN and OPTICS}\label{sec:review}
Implementations of DBSCAN and/or OPTICS are available in many statistical software packages. We focus here on open-source solutions.
These include the Waikato Environment for Knowledge Analysis (WEKA)~\citep{hall2009weka}, the Sequential Pattern Mining Framework (SPMF)~\citep{fournier2014spmf}, the Environment for Developing KDD-Applications Supported by Index Structures (ELKI)~\citep{DBLP:journals/pvldb/SchubertKEZSZ15}, the Python
library scikit-learn~\citep{pedregosa2011scikit}, the PyClustering data mining library~\citep{PyCluste54:online}, the Flexible Procedures for Clustering \proglang{R} package (\pkg{fpc})~\citep{fpc}, and the \pkg{dbscan} package~\citep{dbscan-R} introduced in this paper.

\begin{table}
  \begin{tabularx}{\textwidth}{ c  c  c  c  c  X }
        \hline
      {\bf Library} & {\bf DBSCAN} & {\bf OPTICS} & {\bf ExtractDBSCAN} & {\bf Extract-$\xi$} & \\
      \hline
      \rule{0pt}{3ex}
\pkg{dbscan}    & \cmark & \cmark & \cmark & \cmark & \\
      ELKI    & \cmark & \cmark & \cmark & \cmark & \\
      SPMF    & \cmark & \cmark & \cmark & & \\
      PyClustering & \cmark & \cmark & \cmark & & \\
      WEKA    & \cmark & \cmark & \cmark & & \\
      scikit-learn & \cmark & & & & \\
      \pkg{fpc}    & \cmark & & & & \\
      \hline
  \end{tabularx}
    \vspace{2mm}
    \begin{tabularx}{\textwidth}{ c  c c  c  X }
     \hline
    {\bf Library} & {\bf Index Acceleration} & {\bf Dendrogram for OPTICS} & {\bf Language} & \\
      \hline
      \rule{0pt}{3ex}
    \pkg{dbscan} & \cmark & \cmark & \proglang{R} & \\
    ELKI & \cmark & \cmark & \proglang{Java} & \\
    SPMF & \cmark & & \proglang{Java} & \\
    PyClustering & \cmark & & \proglang{Python} & \\
    WEKA & & & \proglang{Java} & \\
    scikit-learn & \cmark & & \proglang{Python} & \\
    \pkg{fpc} & & & \proglang{R} & \\
      \hline
  \end{tabularx}
  \caption{A comparison of DBSCAN and OPTICS implementations in various
    open-source statistical software libraries.
A \cmark \ symbol denotes availability.}
  \label{tab:comp}
\end{table}

Table~\ref{tab:comp} presents a comparison of the features offered by these packages. All packages support DBSCAN, and most use index acceleration to speed up the $\epsilon$-neighborhood queries used by both the DBSCAN and OPTICS algorithms. These queries are the known bottleneck that typically dominates the runtime, and accelerating them is essential for processing larger data sets. \pkg{dbscan} is the first \proglang{R} implementation offering this improvement. OPTICS with ExtractDBSCAN is also widely implemented, but the Extract-$\xi$ method, as well as the use of dendrograms with OPTICS, is only available in \pkg{dbscan} and ELKI.
%It is notable that there still remain minor discrepancies between the implementations (see Completeness subsection %in Section~\ref{sec:eval} for details).
A small experimental runtime comparison is provided in Section~\ref{sec:eval}.


\section{The dbscan package}\label{sec:dbscan}
The package \pkg{dbscan} provides high-performance code for DBSCAN and OPTICS through a \proglang{C++} implementation (interfaced via the \pkg{Rcpp} package by \cite{eddelbuettel2011rcpp}) that uses the $k$-d tree data structure implemented in the \proglang{C++} library ANN~\citep{mount1998ann} to improve the speed of $k$ nearest neighbor (kNN) and fixed-radius nearest neighbor search.
DBSCAN and OPTICS share a similar interface.

\begin{Schunk}
\begin{Sinput}
dbscan(x, eps, minPts = 5, weights = NULL, borderPoints = TRUE, ...)
optics(x, eps, minPts = 5, ...)
\end{Sinput}
\end{Schunk}

The first argument \code{x} is the data set in the form of a \code{data.frame} or a \code{matrix}. By default, the implementations use Euclidean distance for the neighborhood computation. Alternatively, a precomputed set of pair-wise distances between data points stored in a \code{dist} object can be supplied.
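
For example, a \code{dist} object with an alternative metric could be created with \code{stats::dist()} and supplied directly (a minimal sketch; the data and the parameter values are arbitrary and only for illustration):
\begin{Schunk}
\begin{Sinput}
library("dbscan")

x <- matrix(runif(100), ncol = 2)         # toy data with 50 points
d <- dist(x, method = "manhattan")        # precomputed non-Euclidean distances
res <- dbscan(d, eps = 0.1, minPts = 5)   # cluster on the dist object
\end{Sinput}
\end{Schunk}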
With precomputed distances, arbitrary distance metrics can be used. Note, however, that $k$-d trees cannot be used for distance data; instead, lists of nearest neighbors are precomputed. For \code{dbscan()} and \code{optics()}, the parameter \code{eps} represents the radius of the $\epsilon$-neighborhood considered for density estimation and \code{minPts} represents the density threshold used to identify core points.
Note that \code{eps} is not strictly necessary for OPTICS; it is only used as an upper limit for the considered neighborhood size in order to reduce computational complexity.
\code{dbscan()} can also use weights for the data points in \code{x}. The density in a neighborhood is then the sum of the weights of the points inside the neighborhood. By default, each data point has a weight of one, so the density estimate for the neighborhood is just the number of data points inside the neighborhood.
%This is the reason why the density threshold is called minPoints, i.e., the minimum number of required points in the eps-neighborhood.
Weights can thus be used to change the importance of individual points.

The original DBSCAN implementation assigns each border point to the first cluster
it is density-reachable from. Since this may result in different clustering results if the data points are processed in a different order, \cite{campello2015hierarchical} propose the variant DBSCAN*, which considers border points as noise instead. This can be achieved by using \code{borderPoints = FALSE}. All functions accept additional arguments. % in~\code{...}.
These arguments are passed on to the fixed-radius nearest neighbor search. More details about the implementation of the nearest neighbor search will be presented in Section~\ref{sec:nn} below.

Clusters can be extracted from the linear order produced by OPTICS.
The \pkg{dbscan} implementations of the cluster extraction methods ExtractDBSCAN and Extract-$\xi$ are:

\begin{Schunk}
\begin{Sinput}
extractDBSCAN(object, eps_cl)
extractXi(object, xi, minimum = FALSE, correctPredecessor = TRUE)
\end{Sinput}
\end{Schunk}

\code{extractDBSCAN()} extracts a clustering from an OPTICS ordering that is similar to what DBSCAN would produce with a single global $\epsilon$ set to \code{eps_cl}. \code{extractXi()} extracts clusters hierarchically based on the steepness of the reachability plot. \code{minimum} controls whether only the minimal (non-overlapping) clusters are extracted. \code{correctPredecessor} corrects a known artifact of the original $\xi$ method presented in~\cite{ankerst1999optics} by pruning the steep-up area for points that have predecessors not in the cluster (see the Technical Note in Appendix~\ref{sec:technote} for details).

\subsection{Nearest Neighbor Search}\label{sec:nn}
The density-based algorithms in \pkg{dbscan} rely heavily on forming neighborhoods, i.e., finding all points belonging to an $\epsilon$-neighborhood. A simple approach is to perform a linear search, i.e., to always calculate the distances to all other points to find the closest ones. This requires $O(n)$ operations, with $n$ being the number of data points, each time a neighborhood is needed. Since DBSCAN and OPTICS process each data point once, this results in an $O(n^2)$ runtime complexity. A convenient way to do this in \proglang{R} is to compute a distance matrix with all pairwise distances between points and to sort the distances for each point (row in the distance matrix) to precompute the nearest neighbors of each point.
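
This precomputation can be sketched in a few lines of base \proglang{R} (\code{knnIDs} is a hypothetical helper used only for illustration, not part of \pkg{dbscan}):
\begin{Schunk}
\begin{Sinput}
knnIDs <- function(x, k) {
  d <- as.matrix(dist(x))  # full matrix of pairwise distances
  diag(d) <- Inf           # exclude each point from its own neighborhood
  # for each row, return the ids of the k closest points
  t(apply(d, 1, function(row) order(row)[1:k]))
}
\end{Sinput}
\end{Schunk}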
However, this method has the drawback that the full distance matrix has size $O(n^2)$ and becomes very large and slow to compute for medium to large data sets.

In order to avoid computing the complete distance matrix,
\pkg{dbscan} relies on a space-partitioning data structure called a $k$-d tree~\citep{bentley1975multidimensional}. This data structure allows \pkg{dbscan} to identify the kNN or all neighbors within a fixed radius \code{eps} more efficiently in sub-linear time, using on average only $O(\log(n))$ operations per query.
This results in a reduced runtime complexity of $O(n \log(n))$.
However, note that $k$-d trees are known to degenerate for high-dimensional data, requiring $O(n)$ operations per query and leading to performance no better than linear search.
% See above
%However, for high-dimensional data, $k$-d trees are known to degenerate
%resulting again in a runtime complexity of $O(n^2)$.
Fast kNN search and fixed-radius nearest neighbor search are used in DBSCAN and OPTICS, but we also provide a direct interface in \pkg{dbscan}, since they are useful in their own right.

\begin{Schunk}
\begin{Sinput}
kNN(x, k, sort = TRUE, search = "kdtree", bucketSize = 10,
     splitRule = "suggest", approx = 0)

frNN(x, eps, sort = TRUE, search = "kdtree", bucketSize = 10,
     splitRule = "suggest", approx = 0)
\end{Sinput}
\end{Schunk}

The interfaces only differ in that \code{kNN()} requires \code{k} to be specified, while \code{frNN()} needs the radius \code{eps}. All other arguments are the same. \code{x} is the data, and the result will be a list of neighbors in \code{x} for each point in \code{x}. \code{sort} controls whether the returned points are sorted by distance. \code{search} selects the search method. Available search methods are \code{"kdtree"}, \code{"linear"} and \code{"dist"}.
The linear search method does not build a search data structure, but performs a complete linear search to find the nearest neighbors.
%This is typically slow for large data sets, however,
The dist method precomputes a dissimilarity matrix, which is very fast for small data sets but problematic for large ones. The default method is to build a $k$-d tree. $k$-d trees are implemented in \proglang{C++} using a modified version of the ANN library \citep{mount1998ann} compiled for Euclidean distances. The parameters \code{bucketSize}, \code{splitRule} and \code{approx} are algorithmic parameters which control how the $k$-d tree is built. \code{bucketSize} controls the maximal size of the $k$-d tree leaf nodes. \code{splitRule} specifies how the $k$-d tree partitions the data space. The default, \code{"suggest"}, uses the best guess of the ANN library given the data. Setting \code{approx} to a value greater than zero enables approximate NN search. Only nearest neighbors up to a distance of a factor of $(1+\mathrm{approx})\mathrm{eps}$ will be returned, and some actual neighbors may be omitted, potentially leading to spurious clusters and noise points. However, the algorithm will enjoy a significant speedup. For more details, we refer the reader to the documentation of the ANN library~\citep{mount1998ann}. \code{dbscan()} and \code{optics()} internally use \code{frNN()}, and the additional arguments in~\code{...} are passed on to the nearest neighbor search method.

% \section{Using the dbscan package}
\subsection{Clustering with DBSCAN}
We use a very simple artificial data set of four slightly overlapping Gaussians in two-dimensional space with 100 points each.
We load \pkg{dbscan},
set the seed of the random number generator to make the results reproducible, and create the data set.

<<echo=FALSE>>=
options(width = 75)
@

<<>>=
library("dbscan")

set.seed(2)
n <- 400

x <- cbind(
  x = runif(4, 0, 1) + rnorm(n, sd = 0.1),
  y = runif(4, 0, 1) + rnorm(n, sd = 0.1)
  )

true_clusters <- rep(1:4, times = 100)
@

<<fig=TRUE, include=FALSE, label=sampleData, width=5, height=5>>=
plot(x, col = true_clusters, pch = true_clusters)
@

\begin{figure}
\centering
\includegraphics[width=8cm]{dbscan-sampleData}
\caption{The sample data set, consisting of four noisy Gaussian distributions with slight overlap.}
\label{fig:sampleData}
\end{figure}

The resulting data set is shown in Figure~\ref{fig:sampleData}.

To apply DBSCAN, we need to decide on the neighborhood radius~\code{eps} and
the density threshold~\code{minPts}. The rule of thumb for \code{minPts} is to use at least the number of dimensions of the data set plus one. In our case, this is 3. For \code{eps}, we can plot the points' kNN distances (i.e., the distance to the $k$th nearest neighbor) in decreasing order and look for a knee in the plot. The idea behind this heuristic is that points located inside clusters will have a small $k$-nearest neighbor distance, because they are close to other points in the same cluster, while noise points are isolated and will have a rather large kNN distance. \pkg{dbscan} provides a function called \code{kNNdistplot()} to make this easier.
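
The curve displayed by \code{kNNdistplot()} is simply the distance of each point to its $k$th nearest neighbor, sorted in decreasing order. This can be sketched in base \proglang{R} (\code{kNNdistCurve} is a hypothetical helper for illustration, not the package function):
\begin{Schunk}
\begin{Sinput}
kNNdistCurve <- function(x, k) {
  d <- as.matrix(dist(x))                           # all pairwise distances
  diag(d) <- Inf                                    # ignore self-distances
  kdist <- apply(d, 1, function(row) sort(row)[k])  # k-NN distance per point
  sort(kdist, decreasing = TRUE)                    # decreasing for the plot
}
\end{Sinput}
\end{Schunk}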
For $k$ we use \code{minPts} $-$ 1, since DBSCAN's \code{minPts} includes the query point itself, while the $k$th nearest neighbor distance does not.

<<fig=TRUE, include=FALSE, label=kNNdistplot, width=7, height=4>>=
kNNdistplot(x, k = 2)
abline(h = .06, col = "red", lty = 2)
@
\begin{figure}
\centering
\includegraphics{dbscan-kNNdistplot}
\caption{$k$-Nearest Neighbor Distance plot.}
\label{fig:kNNdistplot}
\end{figure}

The kNN distance plot is shown in Figure~\ref{fig:kNNdistplot}. A knee is visible at around a 2-NN distance of 0.06. We have manually added a horizontal
line for reference.

Now we can perform the clustering with the chosen parameters.
<<>>=
res <- dbscan(x, eps = 0.06, minPts = 3)
res
@

The resulting clustering identified one large cluster with 191 member points, two medium clusters with around 90 points, several very small
clusters and 15 noise points (represented by cluster id 0). The available
fields can be directly accessed using the list extraction operator \code{$}.
For example, the cluster assignment information can be used to plot the data
with the clusters identified by different labels and colors.

<<fig=TRUE, include=FALSE, label=dbscanPlot, width=5, height=5>>=
plot(x, col = res$cluster + 1L, pch = res$cluster + 1L)
@
\begin{figure}
    \centering
      \includegraphics[width=9cm]{dbscan-dbscanPlot}
      \caption{Result of clustering with DBSCAN. Noise is represented as black circles.}
      \label{fig:dbscanPlot}
\end{figure}

The scatter plot in Figure~\ref{fig:dbscanPlot} shows that the clustering
algorithm correctly identified the upper two clusters, but merged the lower two
clusters because the region between them has a high enough density. The small
clusters are isolated groups of three points (satisfying $\mathit{minPts}$), and the
noise points are isolated points.
These small clusters can be suppressed by using a larger value for \code{minPts}.

\pkg{dbscan} also provides the function \code{hullplot()}, which adds
convex cluster hulls to the scatter plot, as shown in Figure~\ref{fig:dbscanHullPlot}.

<<fig=TRUE, include=FALSE, label=dbscanHullPlot, width=5, height=5>>=
hullplot(x, res)
@
\begin{figure}
    \centering
    \includegraphics[width=9cm]{dbscan-dbscanHullPlot}
    \caption{Convex hull plot of the DBSCAN clustering. Noise points
are black.
Note that noise points and points of another cluster may lie
within the convex hull of a different cluster. }
    \label{fig:dbscanHullPlot}
    \vspace{0.1cm}
\end{figure}

A clustering can also be used to determine to which clusters new data points
would be assigned using
\code{predict(object, newdata = NULL, data, ...)}.
The predict method uses nearest-neighbor assignment to core points and needs the original data set. Additional parameters %(\code{...})
are passed on to the nearest neighbor search method. Here we obtain the cluster assignment for the first 25 data points. Note that an assignment to cluster~0 means that the data point is considered noise because it is not close enough to a core point.

<<>>=
predict(res, x[1:25,], data = x)
@


\subsection{Clustering with OPTICS}

Unless OPTICS is used purely to extract a DBSCAN clustering, its parameters
have a different effect than for DBSCAN: \code{eps} is typically chosen rather large (we use 10 here) and \code{minPts} mostly affects the core-distance and reachability-distance calculation, where larger values have a smoothing effect.
We also use 10, i.e., the core-distance is defined as the distance to the 9th nearest neighbor (spanning a neighborhood of 10 points).\n\n<<>>=\nres <- optics(x, eps = 10, minPts = 10)\nres\n@\n\nOPTICS is an augmented ordering algorithm that stores the computed order of the points in the \\code{order} element of the returned object.\n\n<<>>=\nhead(res$order, n = 15)\n@\n\nThis means that data point 1 in the data set is the first in the order,\ndata point 363 is the second, and so forth.\nThe density-based order produced by OPTICS can be directly plotted\nas a reachability plot.\n\n<<fig=TRUE, include=FALSE, label=opticsReachPlot, width=7, height=4>>=\nplot(res)\n@\n\\begin{figure}\n    \\centering\n      \\includegraphics{dbscan-opticsReachPlot}\n      \\caption{OPTICS reachability plot. Note that the first reachability value is always UNDEFINED.}\n      \\label{fig:opticsReachPlot}\n\\end{figure}\n\n\nThe reachability plot in Figure~\\ref{fig:opticsReachPlot} shows the reachability distance for points\nordered by OPTICS. Valleys represent potential clusters separated by peaks.\nVery high peaks may indicate noise points. To visualize the order\non the original data set, we can plot a line connecting the points in order.\n\n<<fig=TRUE, include=FALSE, label=opticsOrder, width=5, height=5>>=\nplot(x, col = \"grey\")\npolygon(x[res$order, ])\n@\n\\begin{figure}\n    \\centering\n      \\includegraphics[width=8cm]{dbscan-opticsOrder}\n      \\caption{OPTICS order of data points represented as a line.}\n      \\label{fig:opticsOrder}\n\\end{figure}\n\nFigure~\\ref{fig:opticsOrder} shows that points in each cluster are\nvisited in consecutive order, starting with the points in the center (the densest region) and then the points in the surrounding area.\n\n\nAs noted in Section~\\ref{sec:optics}, OPTICS has two primary cluster extraction methods using the ordered reachability structure it produces. 
A DBSCAN-type clustering can be extracted using \\code{extractDBSCAN()} by specifying a global \\code{eps_cl} threshold. The reachability plot in Figure~\\ref{fig:opticsReachPlot} shows four peaks, i.e., points with a high reachability-distance. These points indicate the boundaries between the four clusters. An \\code{eps} threshold that separates the four clusters can be visually determined. In this case we use an \\code{eps_cl} of 0.065.\n<<fig=TRUE, include=FALSE, label=extractDBSCANReachPlot2, width=7, height=4>>=\nres <- extractDBSCAN(res, eps_cl = .065)\nplot(res)\n@\n\n<<fig=TRUE, include=FALSE, label=extractDBSCANHullPlot2>>=\nhullplot(x, res)\n@\n\n\\begin{figure}\n  \\centering\n  \\includegraphics{dbscan-extractDBSCANReachPlot2}\n      \\caption{Reachability plot for a DBSCAN-type clustering extracted with global $\\epsilon = 0.065$, resulting in four clusters.}\n  \\label{fig:extractDBSCANReachPlot2}\n  \\centering\n  \\includegraphics[width=9cm]{dbscan-extractDBSCANHullPlot2}\n      \\caption{Convex hull plot for a DBSCAN-type clustering extracted with global $\\epsilon = 0.065$, resulting in four clusters.}\n  \\label{fig:extractDBSCANHullPlot2}\n\\end{figure}\n\nThe resulting reachability plot and the corresponding clusters are shown in Figures~\\ref{fig:extractDBSCANReachPlot2} and \\ref{fig:extractDBSCANHullPlot2}. The clustering closely resembles the original structure\nof the four clusters with which the data were generated; the only difference is\nthat points on the cluster boundaries are marked as noise points.\n\n\\pkg{dbscan} also provides \\code{extractXi()} to extract a hierarchical cluster\nstructure. Here we use a \\code{xi} value of 0.05.\n<<>>=\nres <- extractXi(res, xi = 0.05)\nres\n@\n\nThe $\\xi$ method results in a hierarchical clustering structure, and thus points can be members of several nested clusters. 
Clusters are represented as contiguous ranges in the reachability plot and are available in the field \\code{clusters_xi}.\n\n<<>>=\nres$clusters_xi\n@\n\nHere we have seven clusters.\nThe clusters are also visible in the reachability plot.\n\n\n<<fig=TRUE, include=FALSE, label=extractXiReachPlot, height=4, width=7>>=\nplot(res)\n@\n<<fig=TRUE, include=FALSE, label=extractXiHullPlot, width=5, height=5>>=\nhullplot(x, res)\n@\n\n\\begin{figure}\n    \\centering\n      \\includegraphics{dbscan-extractXiReachPlot}\n      \\caption{Reachability plot of a hierarchical clustering\n    extracted with Extract-$\\xi$.}\n      \\label{fig:extractXiReachPlot}\n%\\end{figure}\n%\\begin{figure}[htb]\n    \\centering\n      \\includegraphics[width=9cm]{dbscan-extractXiHullPlot}\n      \\caption{Convex hull plot of a hierarchical clustering\n            extracted with Extract-$\\xi$.}\n      \\label{fig:extractXiHullPlot}\n\\end{figure}\n\nFigure~\\ref{fig:extractXiReachPlot} shows the reachability plot with clusters represented using colors and vertical bars below the plot. The clusters themselves can also be plotted with the convex hull plot function, as shown in Figure~\\ref{fig:extractXiHullPlot}. Note how the nested structure is shown by clusters nested inside other clusters. 
Also note that a convex hull, while useful for visualization, may contain points that are not part of the cluster.\n\n%\\subsection{LOF}\n%The Local Outlier Factor score can be computed as follows\n%\\ifdefined\\USESWEAVE\n%<<>>=\n%lof <- lof(x, k=3)\n%summary(lof)\n%@\n%The distribution of outlier factors can be viewed simply using the specialized hist function:\n%<<fig=TRUE, include=FALSE, label=LOF_hist, height=4, width=9>>=\n%hist(lof, breaks=20)\n%@\n%\\begin{figure}\n%    \\centering\n%    \\includegraphics{dbscan-LOF_hist}\n%    \\caption{LOF outlier histogram.}\n%    \\label{fig:LOF_hist}\n%\\end{figure}\n%\n%The outlier factor can be visualized in a scatter plot through the following:\n%<<fig=TRUE, include=FALSE, label=LOF_plot>>=\n%plot(x, pch = \".\", main = \"LOF (k=3)\")\n%points(x, cex = (lof-1)*3, pch = 1, col=\"red\")\n%text(x[lof>2,], labels = round(lof, 1)[lof>2], pos = 3)\n%@\n%\\begin{figure}\n%    \\centering\n%      \\includegraphics[width=9cm]{dbscan-LOF_plot}\n%      \\caption{Visualization of the local outlier factor of each point in the data set.}\n%      \\label{fig:LOF_plot}\n%\\end{figure}\n%\\else\n%\\fi\n\n\\subsection{Reachability and Dendrograms}\n%The \\pkg{dbscan} package contains a variety of visualization options.\nReachability plots can be converted into equivalent dendrograms\n\\citep{sander2003automatic}.\n\\pkg{dbscan} contains a fast implementation of the reachability-to-dendrogram\nconversion algorithm through the use of a disjoint-set data structure~\\citep{cormen2001introduction, patwary2010experiments}, allowing the user to choose which hierarchical representation they prefer.\nThe conversion algorithm can be called directly for OPTICS objects using the coercion method \\code{as.dendrogram()}.\n\n<<>>=\ndend <- as.dendrogram(res)\ndend\n@\n\nThe dendrogram can be plotted using the standard plot method.\n<<fig=TRUE, include=FALSE, label=opticsDendrogram, height=5, 
width=7>>=\nplot(dend, ylab = \"Reachability dist.\", leaflab = \"none\")\n@\n\\begin{figure}[t]\n    \\centering\n      \\includegraphics{dbscan-opticsDendrogram}\n      \\caption{Dendrogram structure of the OPTICS ordering.}\n      \\label{fig:opticsDendrogram}\n\\end{figure}\n\nNote how the dendrogram in Figure~\\ref{fig:opticsDendrogram} closely resembles\nthe reachability plot, with added binary splits. Since the object is a standard dendrogram (from package \\pkg{stats}), it can be used like any other dendrogram\ncreated with hierarchical clustering.\n\n\\section{Performance Comparison}\\label{sec:eval}\n\n\\begin{table}\n\\begin{center}\n    \\begin{tabular}{ c c c }\n    \\hline\n      {\\bf Data set} & {\\bf Size} & {\\bf Dimension}\\\\\n    \\hline\n     Aggregation & 788 & 2 \\\\\n     Compound & 399 & 2 \\\\\n     D31 & 3,100 & 2 \\\\\n     flame & 240 & 2 \\\\\n     jain & 373 & 2 \\\\\n     pathbased & 300 & 2 \\\\\n     R15 & 600 & 2 \\\\\n     s1 & 5,000 & 2 \\\\\n     s4 & 5,000 & 2 \\\\\n     spiral & 312 & 2 \\\\\n     t4.8k & 8,000 & 2 \\\\\n     synth1 & 1,000 & 3 \\\\\n     synth2 & 1,000 & 10 \\\\\n     synth3 & 1,000 & 100 \\\\\n     \\hline\n    \\end{tabular}\n\\end{center}\n    \\caption{Datasets used for comparison.}\n    \\label{tab:dsizes}\n\\end{table}\n\nFinally, we evaluate the performance of \\pkg{dbscan}'s implementation of DBSCAN and OPTICS against other open-source implementations. This is not a comprehensive evaluation study; it demonstrates the performance of \\pkg{dbscan} on datasets of varying sizes as compared to other software packages. A comparative test was performed using both the DBSCAN and OPTICS algorithms, where supported, for the libraries listed in Table~\\ref{tab:comp}\non page~\\pageref{tab:comp}. 
The datasets used and their sizes are listed in Table~\\ref{tab:dsizes}.\nThe data sets tested include s1 and s4,\nthe randomly generated but moderately separated Gaussian clusters often used for agglomerative cluster analysis~\\citep{Ssets},\nthe R15 validation data set used for the maximum-variance-based clustering approach of \\cite{veenman2002maximum},\nthe well-known spatial data set t4.8k used for validation of the CHAMELEON algorithm~\\citep{karypis1999chameleon},\nalong with a variety of shape data sets commonly found in clustering validation papers~\\citep{gionis2007clustering, zahn1971graph, chang2008robust, jain2005law, fu2007flame}.\n\nIn 2019, we performed a comparison of \\pkg{dbscan} 0.9-8, \\pkg{fpc} 2.1-10, ELKI version 0.7, PyClustering 0.6.6, SPMF v2.10, WEKA 3.8.0, and SciKit-Learn 0.17.1 on a MacBook Pro equipped with a 2.5 GHz Intel Core i7 processor, running OS X El Capitan 10.11.6.\nNote that newer versions of all mentioned software packages have been released since then. Changes in data structures and added optimizations may result in significant improvements in runtime for the different packages.\n\nAll data sets were normalized to the unit interval, [0, 1], per dimension to standardize neighbor queries. For all data sets we used $\\mathit{minPts} = 2$ and $\\epsilon = 0.10$ for DBSCAN. For OPTICS, $\\mathit{minPts} = 2$ with a large $\\epsilon = 1$ was used.\nWe replicated each run for each data set 15 times and report\nthe average runtime here.\nFigures~\\ref{fig:dbscan_bench}\nand \\ref{fig:optics_bench}\nshow the runtimes. 
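The per-dimension normalization to the unit interval mentioned above can be sketched in plain R. This is a minimal, self-contained sketch on toy data; the helper `normalize01` and the example matrix are illustrative and are not the actual benchmark preprocessing script.

```r
# Min-max scale each column (dimension) of a numeric matrix to [0, 1].
# normalize01 is a hypothetical helper illustrating the preprocessing
# described in the text, not the benchmark code itself.
normalize01 <- function(x) {
  apply(x, 2, function(col) (col - min(col)) / (max(col) - min(col)))
}

set.seed(1)
m <- cbind(runif(100, 10, 20), rnorm(100, mean = 5))  # toy 2-d data
m01 <- normalize01(m)
range(m01)  # every dimension now spans exactly [0, 1]
```

After this rescaling, a single distance threshold is comparable across dimensions, which is why one fixed $\epsilon$ can be used for all data sets.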
The datasets are sorted from easiest to hardest,\nand the algorithms in the legend are sorted from fastest to slowest on average.\nDimensionality, the distance function used, data set size, and other data characteristics have a substantial impact on runtime performance.\nThe results show that the implementation in \\pkg{dbscan}\ncompares very favorably to the other implementations (but note that we did not enable data indexing in ELKI and used a very small $\\mathit{minPts}$).\n\n\\begin{figure}\n    \\centering\n    \\includegraphics[width=0.80\\textwidth]{figures/dbscan_benchmark}\n    \\caption{Runtime of DBSCAN in milliseconds (y-axis, logarithmic scale) vs. the name of the data set tested (x-axis).}\n    \\label{fig:dbscan_bench}\n\\end{figure}\n\\begin{figure}\n    \\centering\n    \\includegraphics[width=0.80\\textwidth]{figures/optics_benchmark}\n    \\caption{Runtime of OPTICS in milliseconds (y-axis, logarithmic scale) vs. the name of the data set tested (x-axis).}\n    \\label{fig:optics_bench}\n\\end{figure}\n\n% Clear page before Concluding Remarks\n%\\clearpage\n\n\\section{Concluding Remarks}\\label{sec:conc}\nThe \\pkg{dbscan} package offers a set of scalable, robust, and complete implementations of popular density-based clustering algorithms from the DBSCAN family. The main features of \\pkg{dbscan} are a\nsimple interface to fast clustering and cluster extraction algorithms, extensible data structures, and methods for density-based clustering visualization and representation, including efficient conversion algorithms between\nOPTICS orderings and dendrograms. 
In addition to DBSCAN and OPTICS, which are discussed in this paper, \\pkg{dbscan} also contains a fast version of the local outlier factor (LOF) algorithm~\\citep{breunig2000lof}, and an implementation of HDBSCAN~\\citep{campello2015hierarchical} is under development.\n\n\\section{Acknowledgments}\nThis work is partially supported by industrial and government partners at the Center for Surveillance Research, a National Science Foundation I/UCRC.\n\n%\\clearpage\n\\bibliography{dbscan}\n\\clearpage\n\n\\appendix\n\\section{Technical Note on OPTICS cluster extraction}\\label{sec:technote}\nOf the two cluster extraction methods outlined in the publication, the flat DBSCAN-type extraction method seems to remain the de facto clustering method implemented across most statistical software for OPTICS. However, this method does not provide any advantage over the original DBSCAN method. To the best of the authors' knowledge, the only (other) library that has implemented the Extract-$\\xi$ method for finding $\\xi$-clusters is the Environment for Developing KDD-Applications Supported by Index Structures (ELKI) \\citep{DBLP:journals/pvldb/SchubertKEZSZ15}. Perhaps the reason nearly every statistical computing framework has neglected the Extract-$\\xi$ cluster method is that the original specification (Figure~19 in~\\cite{ankerst1999optics}), while mostly complete, lacks important corrections without which artifacts are produced when clustering data~\\citep{DBLP:conf/lwa/SchubertG18}. In the original specification of the algorithm, the `dents' of the ordering structure OPTICS produces are scanned for significant changes in reachability (hence the $\\xi$ threshold), where clusters are represented by contiguous ranges of points that are distinguished by $1 - \\xi$ density-reachability changes in the reachability plot. 
It is possible, however, after the recursive completion of the \\code{update} algorithm\n(Figure~7 in~\\cite{ankerst1999optics})\nthat the next point processed in the ordering is not actually within the reachability distance of other members of the cluster currently being processed. To account for the missing details described above, Erich Schubert introduced a small postprocessing step, first added in the ELKI framework and published much later~\\citep{DBLP:conf/lwa/SchubertG18}. This filter corrects the artifacts based on the predecessor of each point~\\citep{DBLP:conf/lwa/SchubertG18}, thus improving on the $\\xi$-cluster method described in the original OPTICS paper. This correction was not introduced until version 0.7.0 of the ELKI framework, released in 2015, 16 years after the original publication of OPTICS and the Extract-$\\xi$ method, and was not published in written form until 2018. \\pkg{dbscan} has incorporated these important changes\nin \\code{extractXi()}\nvia the option \\code{correctPredecessors}, which is enabled by default.\n\n%% Not included to keep things simple\n% To further complicate the status of the \\opxi algorithm's existing\n% implementations, the current ELKI implementation, aside from the predecessor\n% correction, does not match the original specification of the OPTICS algorithm.\n% Mentioned by~\\cite{ankerst1999optics}, \\opxi should not include the last\n% point of a steep-up area inside of each cluster range\\footnote{We alerted the\n% authors of ELKI to our correction, which is to be included in the next major\n% release.}. The differences on even a small, randomly generated dataset\n% are shown on Figures~\\ref{fig:dbscan_xi} and \\ref{fig:elki_xi} using the\n% \\pkg{dbscan} package result. 
Thus, \\pkg{dbscan} offers complete, a correct\n% \\opxi implementation, true to the original specification.\n%\n% \\begin{figure}\n%   \\centering\n%     \\begin{minipage}[t]{0.48\\textwidth}\n%       \\includegraphics[width=\\textwidth]{figures/dbscan_xi_bare}\n%       \\caption{Excluding the last point in the steep-up area.}\n%       \\label{fig:dbscan_xi}\n%     \\end{minipage}\n%   \\hfill\n%     \\begin{minipage}[t]{0.48\\textwidth}\n%       \\includegraphics[width=\\textwidth]{figures/elki_xi_bare}\n%       \\caption{Including the last point in the steep-up area. Note the sharp edges caused by points that are clearly not density-connected to their respective clusters.}\n%       \\label{fig:elki_xi}\n%     \\end{minipage}\n% \\end{figure}\n% % Much of the complication stems from the fact that the original specification of the \\opxi extraction method defined in the paper (Figure 19 of~\\cite{}), while mostly complete, lacks important corrections that otherwise produces many artifacts when clustering data.  In the original specification of the \\opxi algorithm, points within the ``dents'' of the ordering structure represent collections of spatially dense neighborhoods. Its possible, however, after OPTICS finishes ordering a spatially close cluster, that the next point included in the ordering may not be a member of current cluster (there are no more points in the current cluster to add). This can be remedied by pruning an area of each cluster known as the steep-up area (see Figure 19 in \\citep{ankerst1999optics} for details) of points that do not contain predecessors within the same cluster.\n\n\\end{document}\n"
  },
  {
    "path": "vignettes/dbscan.bib",
    "content": "@Article{hahsler2019dbscan,\n    title = {{dbscan}: Fast Density-Based Clustering with {R}},\n    author = {Michael Hahsler and Matthew Piekenbrock and Derek Doran},\n    journal = {Journal of Statistical Software},\n    year = {2019},\n    volume = {91},\n    number = {1},\n    pages = {1--30},\n    doi = {10.18637/jss.v091.i01},\n  }\n\n\n@inproceedings{ester1996density,\n  title={A density-based algorithm for discovering clusters in large spatial databases with noise.},\n  author={Ester, Martin and Kriegel, Hans-Peter and Sander, J{\\\"o}rg and Xu, Xiaowei and others},\n  booktitle={Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96)},\n  pages={226--231},\n  year={1996},\n  url = {https://dl.acm.org/doi/10.5555/3001460.3001507}\n}\n\n@Manual{dbscan-R,\n  title = {dbscan: Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms},\n  author = {Michael Hahsler and Matthew Piekenbrock},\n  note = {R package version 0.9-8.2},\n  year={2016}\n}\n%% Original OPTICS paper\n%% -----------------------------------------------------------------------------\n@inproceedings{ankerst1999optics,\n  title={OPTICS: ordering points to identify the clustering structure},\n  author={Ankerst, Mihael and Breunig, Markus M and Kriegel, Hans-Peter and Sander, J{\\\"o}rg},\n  booktitle={ACM Sigmod Record},\n  volume={28},\n  number={2},\n  pages={49--60},\n  year={1999},\n  organization={ACM},\n  doi = {10.1145/304181.304187}\n}\n\n% OPTICS cluster extraction improvements\n% -----------------------------------------------------------------------------\n@inproceedings{DBLP:conf/lwa/SchubertG18,\n  author    = {Erich Schubert and\n               Michael Gertz},\n  title     = {Improving the Cluster Structure Extracted from {OPTICS} Plots},\n  booktitle = {Lernen, Wissen, Daten, Analysen (LWDA 2018)},\n  series    = {{CEUR} Workshop Proceedings},\n  volume    = {2191},\n  pages     = {318--329},\n  publisher 
= {CEUR-WS.org},\n  year      = {2018}\n}\n\n% Original LOF paper\n% -----------------------------------------------------------------------------\n@inproceedings{breunig2000lof,\n title={LOF: identifying density-based local outliers},\n author={Breunig, Markus M and Kriegel, Hans-Peter and Ng, Raymond T and Sander, J{\\\"o}rg},\n booktitle={ACM Int. Conf. on Management of Data},\n volume={29},\n number={2},\n pages={93--104},\n year={2000},\n organization={ACM},\n doi = {10.1145/335191.335388}\n}\n\n% 2003 Reachability <--> Dendrograms Conversions Paper\n% -----------------------------------------------------------------------------\n@inproceedings{sander2003automatic,\n title={Automatic extraction of clusters from hierarchical clustering representations},\n author={Sander, J{\\\"o}rg and Qin, Xuejie and Lu, Zhiyong and Niu, Nan and Kovarsky, Alex},\n booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},\n pages={75--87},\n year={2003},\n organization={Springer}\n}\n\n% Original BIRCH paper\n% -----------------------------------------------------------------------------\n@inproceedings{zhang96,\n title={BIRCH: an efficient data clustering method for very large databases},\n author={Zhang, Tian and Ramakrishnan, Raghu and Livny, Miron},\n booktitle={ACM Sigmod Record},\n volume={25},\n number={2},\n pages={103--114},\n year={1996},\n organization={ACM}\n}\n\n% GDBSCAN Paper (Generalized DBSCAN, by Sanders)\n% -----------------------------------------------------------------------------\n@article{sander1998density,\n title={Density-based clustering in spatial databases: The algorithm gdbscan and its applications},\n author={Sander, J{\\\"o}rg and Ester, Martin and Kriegel, Hans-Peter and Xu, Xiaowei},\n journal={Data mining and knowledge discovery},\n volume={2},\n number={2},\n pages={169--194},\n year={1998},\n publisher={Springer}\n}\n\n% HDBSCAN* Newest Paper\n% 
-----------------------------------------------------------------------------\n@article{campello2015hierarchical,\n title={Hierarchical density estimates for data clustering, visualization, and outlier detection},\n author={Campello, Ricardo JGB and Moulavi, Davoud and Zimek, Arthur and Sander, Joerg},\n journal={ACM Transactions on Knowledge Discovery from Data (TKDD)},\n volume={10},\n number={1},\n pages={5},\n year={2015},\n publisher={ACM},\n doi = {10.1145/2733381}\n}\n\n% First HDBSCAN* introduction paper, later revised in 2015. The newer one is better.\n% -----------------------------------------------------------------------------\n@inproceedings{campello2013density,\n title={Density-based clustering based on hierarchical density estimates},\n author={Campello, Ricardo JGB and Moulavi, Davoud and Sander, J{\\\"o}rg},\n booktitle={Pacific-Asia Conference on Knowledge Discovery and Data Mining},\n pages={160--172},\n year={2013},\n organization={Springer},\n doi = {10.1007/978-3-642-37456-2_14}\n}\n\n% The new-ish 'Standard Methodology' paper of that 'tackles the methodological  drawbacks' % of internal clustering validation\n% -----------------------------------------------------------------------------\n@article{gurrutxaga2011towards,\n title={Towards a standard methodology to evaluate internal cluster validity indices},\n author={Gurrutxaga, Ibai and Muguerza, Javier and Arbelaitz, Olatz and P{\\'e}rez, Jes{\\'u}s M and Mart{\\'\\i}n, Jos{\\'e} I},\n journal={Pattern Recognition Letters},\n volume={32},\n number={3},\n pages={505--515},\n year={2011},\n publisher={Elsevier}\n}\n\n% Original ABACUS - Workaround implementation of mixture modeling for finding\n% arbitrary shapes\n% -----------------------------------------------------------------------------\n@article{gegick2011abacus,\n title={ABACUS: mining arbitrary shaped clusters from large datasets based on backbone identification},\n author={Gegick, M},\n year={2011},\n publisher={SIAM}\n}\n\n\n% 
Original Silhouette Index Paper\n% -----------------------------------------------------------------------------\n@article{rousseeuw1987silhouettes,\n title={Silhouettes: a graphical aid to the interpretation and validation of cluster analysis},\n author={Rousseeuw, Peter J},\n journal={Journal of computational and applied mathematics},\n volume={20},\n pages={53--65},\n year={1987},\n publisher={Elsevier}\n}\n\n% Extensive Comparative Study of IVMS\n% -----------------------------------------------------------------------------\n@article{arbelaitz2013extensive,\n title={An extensive comparative study of cluster validity indices},\n author={Arbelaitz, Olatz and Gurrutxaga, Ibai and Muguerza, Javier and P{\\'e}rez, Jes{\\'u}S M and Perona, I{\\~n}Igo},\n journal={Pattern Recognition},\n volume={46},\n number={1},\n pages={243--256},\n year={2013},\n publisher={Elsevier}\n}\n\n% Graph Theory measures for Internal Cluster Validation\n% -----------------------------------------------------------------------------\n@article{pal1997cluster,\n title={Cluster validation using graph theoretic concepts},\n author={Pal, Nikhil R and Biswas, J},\n journal={Pattern Recognition},\n volume={30},\n number={6},\n pages={847--857},\n year={1997},\n publisher={Elsevier}\n}\n\n% Rankings of research papers by citation count; used for showing DBSCAN\n% popularity\n% -----------------------------------------------------------------------------\n@misc{acade96:online,\nauthor = {{Microsoft Academic Search}},\ntitle = {Top publications in data mining},\nmonth = {},\nyear = {2016},\nnote = {(Accessed on 08/29/2016)}\n}\n\n@article{PyCluste54:online,\ndoi = {10.21105/joss.01230},\nurl = {https://doi.org/10.21105/joss.01230},\nyear = {2019},\npublisher = {The Open Journal},\nvolume = {4},\nnumber = {36},\npages = {1230},\nauthor = {Novikov, Andrei V.},\ntitle = {PyClustering: Data Mining Library},\njournal = {Journal of Open Source Software}\n}\n\n\n% Hartigans convex density estimation 
model\n% -----------------------------------------------------------------------------\n@article{hartigan1987estimation,\n title={Estimation of a convex density contour in two dimensions},\n author={Hartigan, JA},\n journal={Journal of the American Statistical Association},\n volume={82},\n number={397},\n pages={267--270},\n year={1987},\n publisher={Taylor \\& Francis}\n}\n\n% Bentleys Original KDTree Paper\n% -----------------------------------------------------------------------------\n@article{bentley1975multidimensional,\n title={Multidimensional binary search trees used for associative searching},\n author={Bentley, Jon Louis},\n journal={Communications of the ACM},\n volume={18},\n number={9},\n pages={509--517},\n year={1975},\n publisher={ACM}\n}\n\n% Original CLARANS paper\n% -----------------------------------------------------------------------------\n@article{ng2002clarans,\n title={CLARANS: A method for clustering objects for spatial data mining},\n author={Ng, Raymond T. 
and Han, Jiawei},\n journal={IEEE transactions on knowledge and data engineering},\n volume={14},\n number={5},\n pages={1003--1016},\n year={2002},\n publisher={IEEE}\n}\n\n% Original DENCLUE paper\n% -----------------------------------------------------------------------------\n@inproceedings{hinneburg1998efficient,\n title={An efficient approach to clustering in large multimedia databases with noise},\n author={Hinneburg, Alexander and Keim, Daniel A},\n booktitle={KDD},\n volume={98},\n pages={58--65},\n year={1998}\n}\n\n% Original Chameleon Paper\n% -----------------------------------------------------------------------------\n@article{karypis1999chameleon,\n title={Chameleon: Hierarchical clustering using dynamic modeling},\n author={Karypis, George and Han, Eui-Hong and Kumar, Vipin},\n journal={Computer},\n volume={32},\n number={8},\n pages={68--75},\n year={1999},\n publisher={IEEE}\n}\n\n% Original CURE algorithm\n% -----------------------------------------------------------------------------\n@inproceedings{guha1998cure,\n title={CURE: an efficient clustering algorithm for large databases},\n author={Guha, Sudipto and Rastogi, Rajeev and Shim, Kyuseok},\n booktitle={ACM SIGMOD Record},\n volume={27},\n number={2},\n pages={73--84},\n year={1998},\n organization={ACM}\n}\n\n% R statistical computing language citation\n% -----------------------------------------------------------------------------\n@article{team2013r,\n title={R: A language and environment for statistical computing},\n author={Team, R Core and others},\n year={2013},\n publisher={Vienna, Austria}\n}\n\n% WEKA\n% -----------------------------------------------------------------------------\n@article{hall2009weka,\n title={The WEKA data mining software: an update},\n author={Hall, Mark and Frank, Eibe and Holmes, Geoffrey and Pfahringer, Bernhard and Reutemann, Peter and Witten, Ian H},\n journal={ACM SIGKDD explorations newsletter},\n volume={11},\n number={1},\n pages={10--18},\n 
year={2009},\n publisher={ACM}\n}\n\n% SPMF Java Machine Learning Library\n% -----------------------------------------------------------------------------\n@article{fournier2014spmf,\n title={SPMF: a Java open-source pattern mining library.},\n author={Fournier-Viger, Philippe and Gomariz, Antonio and Gueniche, Ted and Soltani, Azadeh and Wu, Cheng-Wei and Tseng, Vincent S and others},\n journal={Journal of Machine Learning Research},\n volume={15},\n number={1},\n pages={3389--3393},\n year={2014}\n}\n\n% Python Scikit Learn\n% -----------------------------------------------------------------------------\n@article{pedregosa2011scikit,\n title={Scikit-learn: Machine learning in Python},\n author={Pedregosa, Fabian and Varoquaux, Ga{\\\"e}l and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and others},\n journal={Journal of Machine Learning Research},\n volume={12},\n number={Oct},\n pages={2825--2830},\n year={2011}\n}\n\n% MATLAB TOMCAT Toolkit\n% -----------------------------------------------------------------------------\n@article{daszykowski2007tomcat,\n title={TOMCAT: A MATLAB toolbox for multivariate calibration techniques},\n author={Daszykowski, Micha{\\l} and Serneels, Sven and Kaczmarek, Krzysztof and Van Espen, Piet and Croux, Christophe and Walczak, Beata},\n journal={Chemometrics and intelligent laboratory systems},\n volume={85},\n number={2},\n pages={269--277},\n year={2007},\n publisher={Elsevier}\n}\n\n% OPTICS code for TOMCAT\n% -----------------------------------------------------------------------------\n@article{daszykowski2002looking,\n title={Looking for natural patterns in analytical data. 2. 
Tracing local density with OPTICS},\n author={Daszykowski, Michael and Walczak, Beata and Massart, Desire L},\n journal={Journal of chemical information and computer sciences},\n volume={42},\n number={3},\n pages={500--507},\n year={2002},\n publisher={ACS Publications}\n}\n\n% Java ML library\n% -----------------------------------------------------------------------------\n@comment{ Abeel, T.; de Peer, Y. V. & Saeys, Y. Java-ML: A Machine Learning\n          Library, Journal of Machine Learning Research, 2009, 10, 931-934  }\n@article{abeel2009journal,\n title={Java-ML: A Machine Learning Library},\n author={Abeel, Thomas and Van de Peer, Yves and Saeys, Yvan},\n journal={Journal of Machine Learning Research},\n volume={10},\n pages={931--934},\n year={2009}\n}\n\n\n% ELKI\n% -----------------------------------------------------------------------------\n@article{DBLP:journals/pvldb/SchubertKEZSZ15,\n author    = {Erich Schubert and\n              Alexander Koos and\n              Tobias Emrich and\n              Andreas Z{\\\"{u}}fle and\n              Klaus Arthur Schmid and\n              Arthur Zimek},\n title     = {A Framework for Clustering Uncertain Data},\n journal   = {{PVLDB}},\n volume    = {8},\n number    = {12},\n pages     = {1976--1979},\n year      = {2015},\n url       = {http://www.vldb.org/pvldb/vol8/p1976-schubert.pdf},\n timestamp = {Mon, 30 May 2016 12:01:10 +0200},\n biburl    = {http://dblp.uni-trier.de/rec/bib/journals/pvldb/SchubertKEZSZ15},\n bibsource = {dblp computer science bibliography, http://dblp.org}\n}\n\n% BIRCH CRAN records\n% -----------------------------------------------------------------------------\n@misc{CRANPack84:online, author={CRAN}, title = {CRAN - Package birch}, howpublished = {\\url{https://cran.r-project.org/web/packages/birch/index.html}}, month = {}, year = {2016}, note = {(Accessed on 09/16/2016)} }\n\n% Spectral Clustering\n% 
----------------------------------------------------------------------------\n@inproceedings{dhillon2004kernel,\n title={Kernel k-means: spectral clustering and normalized cuts},\n author={Dhillon, Inderjit S and Guan, Yuqiang and Kulis, Brian},\n booktitle={Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining},\n pages={551--556},\n year={2004},\n organization={ACM}\n}\n\n\n% Disjoint-set data structure (2 citations)\n% -----------------------------------------------------------------------------\n@misc{cormen2001introduction,\n title={Introduction to algorithms second edition},\n author={Cormen, Thomas H and Leiserson, Charles E and Rivest, Ronald L and Stein, Clifford},\n year={2001},\n publisher={The MIT Press}\n}\n@inproceedings{patwary2010experiments,\n title={Experiments on union-find algorithms for the disjoint-set data structure},\n author={Patwary, Md Mostofa Ali and Blair, Jean and Manne, Fredrik},\n booktitle={International Symposium on Experimental Algorithms},\n pages={411--423},\n year={2010},\n organization={Springer}\n}\n\n% SUBCLU high-dimensional density based clustering\n% -----------------------------------------------------------------------------\n@inproceedings{kailing2004density,\n title={Density-connected subspace clustering for high-dimensional data},\n author={Kailing, Karin and Kriegel, Hans-Peter and Kr{\\\"o}ger, Peer},\n booktitle={Proc. 
SDM},\n volume={4},\n year={2004},\n organization={SIAM}\n}\n\n% DBSCAN KDD Test of Time award\n% -----------------------------------------------------------------------------\n@misc{SIGKDDNe30:online,\nauthor = {SIGKDD},\ntitle = {SIGKDD News : 2014 SIGKDD Test of Time Award},\nhowpublished = {\\url{https://www.kdd.org/News/view/2014-sigkdd-test-of-time-award}},\nmonth = {},\nyear = {2014},\nnote = {(Accessed on 10/10/2016)}\n}\n\n% Raftery and Fraley's model-based clustering paper\n% -----------------------------------------------------------------------------\n@article{fraley2002model,\n title={Model-based clustering, discriminant analysis, and density estimation},\n author={Fraley, Chris and Raftery, Adrian E},\n journal={Journal of the American Statistical Association},\n volume={97},\n number={458},\n pages={611--631},\n year={2002},\n publisher={Taylor \\& Francis}\n}\n\n% FPC: Flexible Procedures for Clustering\n% -----------------------------------------------------------------------------\n@Manual{fpc,\ntitle = {fpc: Flexible Procedures for Clustering},\nauthor = {Christian Hennig},\nyear = {2015},\nnote = {R package version 2.1-10},\nurl = {https://CRAN.R-project.org/package=fpc},\n}\n\n% From the ELKI Benchmarking page\n% -----------------------------------------------------------------------------\n@article{kriegel2016black,\n  title={The (black) art of runtime evaluation: Are we comparing algorithms or implementations?},\n  author={Kriegel, Hans-Peter and Schubert, Erich and Zimek, Arthur},\n  journal={Knowledge and Information Systems},\n  pages={1--38},\n  year={2016},\n  publisher={Springer}\n}\n\n% ANN Library\n% -----------------------------------------------------------------------------\n@manual{mount1998ann,\n title={ANN: A Library for Approximate Nearest Neighbor Searching},\n author={Mount, David M and Arya, Sunil},\n year={2010},\n url = {http://www.cs.umd.edu/~mount/ANN/},\n}\n\n% Rcpp\n% 
-----------------------------------------------------------------------------\n@article{eddelbuettel2011rcpp,\n title={Rcpp: Seamless R and C++ integration},\n author={Eddelbuettel, Dirk and Fran{\\c{c}}ois, Romain},\n journal={Journal of Statistical Software},\n volume={40},\n number={8},\n pages={1--18},\n year={2011}\n}\n\n% ST-DBSCAN: SpatioTemporal DBSCAN\n% -----------------------------------------------------------------------------\n@article{birant2007st,\n title={ST-DBSCAN: An algorithm for clustering spatial--temporal data},\n author={Birant, Derya and Kut, Alp},\n journal={Data \\& Knowledge Engineering},\n volume={60},\n number={1},\n pages={208--221},\n year={2007},\n publisher={Elsevier}\n}\n\n% DBSCAN History (small relative to actual number of extensions)\n% -----------------------------------------------------------------------------\n@inproceedings{rehman2014dbscan,\n title={DBSCAN: Past, present and future},\n author={Rehman, Saif Ur and Asghar, Sohail and Fong, Simon and Sarasvady, S},\n booktitle={Applications of Digital Information and Web Technologies (ICADIWT), 2014 Fifth International Conference on the},\n pages={232--238},\n year={2014},\n organization={IEEE}\n}\n\n\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n% \t\t\t\t\t\t\t\t Miscellaneous \t\t\t\t\t\t           %\n%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%\n\n\n\n@article{Gupta2010,\nabstract = {A key application of clustering data obtained from sources such as microarrays, protein mass spectroscopy, and phylogenetic profiles is the detection of functionally related genes. Typically, only a small number of functionally related genes cluster into one or more groups, and the rest need to be ignored. 
For such situations, we present Automated Hierarchical Density Shaving (Auto-HDS), a framework that consists of a fast hierarchical density-based clustering algorithm and an unsupervised model selection strategy. Auto-HDS can automatically select clusters of different densities, present them in a compact hierarchy, and rank individual clusters using an innovative stability criteria. Our framework also provides a simple yet powerful 2D visualization of the hierarchy of clusters that is useful for further interactive exploration. We present results on Gasch and Lee microarray data sets to show the effectiveness of our methods. Additional results on other biological data are included in the supplemental material.},\nauthor = {Gupta, Gunjan and Liu, Alexander and Ghosh, Joydeep},\ndoi = {10.1109/TCBB.2008.32},\nisbn = {1557-9964},\nissn = {15455963},\njournal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics},\nkeywords = {Bioinformatics,Clustering,Data and knowledge visualization,Mining methods and algorithms},\nnumber = {2},\npages = {223--237},\npmid = {20431143},\ntitle = {{Automated hierarchical density shaving: A robust automated clustering and visualization framework for large biological data sets}},\nvolume = {7},\nyear = {2010}\n}\n@article{Ssets,\n   author = {P. Fr\\\"anti and O. 
Virmajoki},\n   title = {Iterative shrinking method for clustering problems},\n   journal = {Pattern Recognition},\n   year = {2006},\n   volume = {39},\n   number = {5},\n   pages = {761--765}\n}\n\n% Path and Spiral based\n@article{chang2008robust,\n title={Robust path-based spectral clustering},\n author={Chang, Hong and Yeung, Dit-Yan},\n journal={Pattern Recognition},\n volume={41},\n number={1},\n pages={191--203},\n year={2008},\n publisher={Elsevier}\n}\n\n% Compound dataset\n@article{zahn1971graph,\n title={Graph-theoretical methods for detecting and describing gestalt clusters},\n author={Zahn, Charles T},\n journal={IEEE Transactions on computers},\n volume={100},\n number={1},\n pages={68--86},\n year={1971},\n publisher={IEEE}\n}\n\n% Aggregation dataset\n@article{gionis2007clustering,\n title={Clustering aggregation},\n author={Gionis, Aristides and Mannila, Heikki and Tsaparas, Panayiotis},\n journal={ACM Transactions on Knowledge Discovery from Data (TKDD)},\n volume={1},\n number={1},\n pages={4},\n year={2007},\n publisher={ACM}\n}\n\n% R15 dataset\n@article{veenman2002maximum,\n title={A maximum variance cluster algorithm},\n author={Veenman, Cor J. and Reinders, Marcel J. T. 
and Backer, Eric},\n journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},\n volume={24},\n number={9},\n pages={1273--1280},\n year={2002},\n publisher={IEEE}\n}\n\n@inproceedings{reilly2010detection,\n title={Detection and tracking of large number of targets in wide area surveillance},\n author={Reilly, Vladimir and Idrees, Haroon and Shah, Mubarak},\n booktitle={European Conference on Computer Vision},\n pages={186--199},\n year={2010},\n organization={Springer}\n}\n\n@inproceedings{jain2005law,\n title={Data clustering: a user’s dilemma},\n author={Jain, Anil K and Law, Martin HC},\n booktitle={Proceedings of the First international conference on Pattern Recognition and Machine Intelligence},\n year={2005}\n}\n\n@article{jain1999review,\n   author = {Jain, A. K. and Murty, M. N. and Flynn, P. J.},\n   title = {Data Clustering: A Review},\n   journal = {ACM Computing Surveys},\n   issue_date = {Sept. 1999},\n   volume = {31},\n   number = {3},\n   month = sep,\n   year = {1999},\n   issn = {0360-0300},\n   pages = {264--323},\n   numpages = {60},\n   url = {http://doi.acm.org/10.1145/331499.331504},\n   doi = {10.1145/331499.331504},\n   acmid = {331504},\n   publisher = {ACM},\n   address = {New York, NY, USA},\n}\n\n% Flame data set\n@article{fu2007flame,\n title={FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data},\n author={Fu, Limin and Medico, Enzo},\n journal={BMC Bioinformatics},\n volume={8},\n number={1},\n pages={1},\n year={2007},\n publisher={BioMed Central}\n}\n\n% Birch dataset\n@article{Birchsets,\n   author = {T. Zhang and R. Ramakrishnan and M. 
Livny},\n   title = {BIRCH: A new data clustering algorithm and its applications},\n   journal = {Data Mining and Knowledge Discovery},\n   year = {1997},\n   volume = {1},\n   number = {2},\n   pages = {141--182}\n}\n\n@inproceedings{kisilevich2010p,\n title={P-DBSCAN: a density based clustering algorithm for exploration and analysis of attractive areas using collections of geo-tagged photos},\n author={Kisilevich, Slava and Mansmann, Florian and Keim, Daniel},\n booktitle={Proceedings of the 1st international conference and exhibition on computing for geospatial research \\& application},\n pages={38},\n year={2010},\n organization={ACM}\n}\n\n@inproceedings{celebi2005mining,\n title={Mining biomedical images with density-based clustering},\n author={Celebi, M Emre and Aslandogan, Y Alp and Bergstresser, Paul R},\n booktitle={International Conference on Information Technology: Coding and Computing (ITCC'05)-Volume II},\n volume={1},\n pages={163--168},\n year={2005},\n organization={IEEE}\n}\n\n@inproceedings{ertoz2003finding,\n title={Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data.},\n author={Ert{\\\"o}z, Levent and Steinbach, Michael and Kumar, Vipin},\n booktitle={SDM},\n pages={47--58},\n year={2003},\n organization={SIAM}\n}\n\n@article{Chen2014,\nauthor = {Chen, W and Ji, M H and Wang, J M},\ndoi = {10.3991/ijoe.v10i6.3881},\nissn = {18612121},\njournal = {International Journal of Online Engineering},\nkeywords = {Density-based clustering,Personal travel trajectory,T-DBSCAN,Trip segmentation},\nnumber = {6},\npages = {19--24},\ntitle = {{T-DBSCAN: A spatiotemporal density clustering for GPS trajectory segmentation}},\nvolume = {10},\nyear = {2014}\n}\n\n\n@incollection{sander2011density,\n title={Density-based clustering},\n author={Sander, Joerg},\n booktitle={Encyclopedia of Machine Learning},\n pages={270--273},\n year={2011},\n 
publisher={Springer}\n}\n\n\n% 88 citations\n@article{verma2012comparative,\n title={A comparative study of various clustering algorithms in data mining},\n author={Verma, Manish and Srivastava, Mauly and Chack, Neha and Diswar, Atul Kumar and Gupta, Nidhi},\n journal={International Journal of Engineering Research and Applications (IJERA)},\n volume={2},\n number={3},\n pages={1379--1384},\n year={2012}\n}\n\n@inproceedings{roy2005approach,\n title={An approach to find embedded clusters using density based techniques},\n author={Roy, Swarup and Bhattacharyya, DK},\n booktitle={International Conference on Distributed Computing and Internet Technology},\n pages={523--535},\n year={2005},\n organization={Springer}\n}\n\n@inproceedings{chowdhury2010efficient,\n title={An efficient method for subjectively choosing parameter ‘k’ automatically in VDBSCAN (Varied Density Based Spatial Clustering of Applications with Noise) algorithm},\n author={Chowdhury, AK M Rasheduzzaman and Mollah, Md Elias and Rahman, Md Asikur},\n booktitle={Computer and Automation Engineering (ICCAE), 2010 The 2nd International Conference on},\n volume={1},\n pages={38--41},\n year={2010},\n organization={IEEE}\n}\n\n@inproceedings{ghanbarpour2014exdbscan,\n title={EXDBSCAN: An extension of DBSCAN to detect clusters in multi-density datasets},\n author={Ghanbarpour, Asieh and Minaei, Behrooz},\n booktitle={Intelligent Systems (ICIS), 2014 Iranian Conference on},\n pages={1--5},\n year={2014},\n organization={IEEE}\n}\n\n@inproceedings{vijayalakshmi2010improved,\n title={Improved varied density based spatial clustering algorithm with noise},\n author={Vijayalakshmi, S and Punithavalli, M},\n booktitle={Computational Intelligence and Computing Research (ICCIC), 2010 IEEE International Conference on},\n pages={1--4},\n year={2010},\n organization={IEEE}\n}\n\n@article{Wang2013,\nauthor = {Wang, Wei},\nisbn = 
{978-0-9891305-0-9},\nkeywords = {clustering algorithm,clustering techniques,data mining,derivative,global optimum k,similarity,similarity and minimizes intergroup,there are four basic,vdbscan},\npages = {225--228},\ntitle = {{Improved VDBSCAN With Global Optimum K}},\nyear = {2013}\n}\n\n@article{parvez2012data,\n title={Data set property based ‘K’ in VDBSCAN Clustering Algorithm},\n author={Parvez, Abu Wahid Md Masud},\n journal={World of Computer Science and Information Technology Journal (WCSIT)},\n volume={2},\n number={3},\n pages={115--119},\n year={2012}\n}\n\n@inproceedings{liu2007vdbscan,\n title={VDBSCAN: varied density based spatial clustering of applications with noise},\n author={Liu, Peng and Zhou, Dong and Wu, Naijun},\n booktitle={2007 International conference on service systems and service management},\n pages={1--4},\n year={2007},\n organization={IEEE}\n}\n\n@article{pei2009decode,\n title={DECODE: a new method for discovering clusters of different densities in spatial data},\n author={Pei, Tao and Jasra, Ajay and Hand, David J and Zhu, A-Xing and Zhou, Chenghu},\n journal={Data Mining and Knowledge Discovery},\n volume={18},\n number={3},\n pages={337--369},\n year={2009},\n publisher={Springer}\n}\n\n@article{duan2007local,\n title={A local-density based spatial clustering algorithm with noise},\n author={Duan, Lian and Xu, Lida and Guo, Feng and Lee, Jun and Yan, Baopin},\n journal={Information Systems},\n volume={32},\n number={7},\n pages={978--986},\n year={2007},\n publisher={Elsevier}\n}\n\n@inproceedings{li2007traffic,\n title={Traffic density-based discovery of hot routes in road networks},\n author={Li, Xiaolei and Han, Jiawei and Lee, Jae-Gil and Gonzalez, Hector},\n booktitle={International Symposium on Spatial and Temporal Databases},\n pages={441--459},\n year={2007},\n organization={Springer}\n}\n\n@article{tran2006knn,\n title={KNN-kernel density-based clustering for high-dimensional multivariate data},\n author={Tran, Thanh N 
and Wehrens, Ron and Buydens, Lutgarde MC},\n journal={Computational Statistics \\& Data Analysis},\n volume={51},\n number={2},\n pages={513--525},\n year={2006},\n publisher={Elsevier}\n}\n\n@inproceedings{jiang2003dhc,\n title={DHC: a density-based hierarchical clustering method for time series gene expression data},\n author={Jiang, Daxin and Pei, Jian and Zhang, Aidong},\n booktitle={Bioinformatics and Bioengineering, 2003. Proceedings. Third IEEE Symposium on},\n pages={393--400},\n year={2003},\n organization={IEEE}\n}\n\n@inproceedings{kriegel2005density,\n title={Density-based clustering of uncertain data},\n author={Kriegel, Hans-Peter and Pfeifle, Martin},\n booktitle={Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining},\n pages={672--677},\n year={2005},\n organization={ACM}\n}\n\n@book{agrawal1998automatic,\n title={Automatic subspace clustering of high dimensional data for data mining applications},\n author={Agrawal, Rakesh and Gehrke, Johannes and Gunopulos, Dimitrios and Raghavan, Prabhakar},\n volume={27},\n number={2},\n year={1998},\n publisher={ACM}\n}\n\n@inproceedings{cao2006density,\n title={Density-Based Clustering over an Evolving Data Stream with Noise.},\n author={Cao, Feng and Ester, Martin and Qian, Weining and Zhou, Aoying},\n booktitle={SDM},\n volume={6},\n pages={328--339},\n year={2006},\n organization={SIAM}\n}\n\n@inproceedings{chen2007density,\n title={Density-based clustering for real-time stream data},\n author={Chen, Yixin and Tu, Li},\n booktitle={Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining},\n pages={133--142},\n year={2007},\n organization={ACM}\n}\n\n\n@article{kriegel:2011,\n title={Density-based clustering},\n author={Kriegel, Hans-Peter and Kr{\\\"o}ger, Peer and Sander, J{\\\"o}rg and Zimek, Arthur},\n journal={Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery},\n volume={1},\n number={3},\n pages={231--240},\n year={2011},\n 
publisher={John Wiley \\& Sons}\n}\n\n@book{Aggarwal:2013,\n    author = {Aggarwal, Charu C. and Reddy, Chandan K.},\n    title = {Data Clustering: Algorithms and Applications},\n    year = {2013},\n    isbn = {1466558210, 9781466558212},\n    edition = {1st},\n    publisher = {Chapman \\& Hall/CRC},\n}\n\n@book{Kaufman:1990,\n    title = \"Finding groups in data : an introduction to cluster analysis\",\n    author = \"Kaufman, Leonard and Rousseeuw, Peter J.\",\n    series = \"Wiley series in probability and mathematical statistics\",\n    publisher = \"Wiley\",\n    address = \"New York\",\n    isbn = \"0-471-87876-6\",\n    year = 1990\n}\n\n@ARTICLE{jarvis1973,\n  author={Jarvis, R.A. and Patrick, E.A.},\n  journal={IEEE Transactions on Computers},\n  title={Clustering Using a Similarity Measure Based on Shared Near Neighbors},\n  year={1973},\n  volume={C-22},\n  number={11},\n  pages={1025-1034},\n  keywords={Clustering, nonparametric, pattern recognition, shared near neighbors, similarity measure.},\n  doi={10.1109/T-C.1973.223640}\n  }\n\n@inbook{erdoz2003,\nauthor = {Levent Ertöz and Michael Steinbach and Vipin Kumar},\ntitle = {Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data},\nbooktitle = {Proceedings of the 2003 SIAM International Conference on Data Mining (SDM)},\nyear = {2003},\npages = {47-58},\ndoi = {10.1137/1.9781611972733.5}\n}\n\n@inbook{moulavi2014,\nauthor = {Davoud Moulavi and Pablo A. Jaskowiak and Ricardo J. G. B. Campello and Arthur Zimek and Jörg Sander},\ntitle = {Density-Based Clustering Validation},\nbooktitle = {Proceedings of the 2014 SIAM International Conference on Data Mining (SDM)},\nyear = {2014},\npages = {839-847},\ndoi = {10.1137/1.9781611973440.96},\n}\n"
  },
  {
    "path": "vignettes/hdbscan.Rmd",
    "content": "---\ntitle: \"HDBSCAN with the dbscan package\"\nauthor: \"Matt Piekenbrock, Michael Hahsler\"\nvignette: >\n  %\\VignetteIndexEntry{Hierarchical DBSCAN (HDBSCAN) with the dbscan package}\n  %\\VignetteEncoding{UTF-8}\n  %\\VignetteEngine{knitr::rmarkdown}\nheader-includes: \\usepackage{animation}\noutput: html_document\n---\nThe dbscan package [6] includes a fast implementation of Hierarchical DBSCAN (HDBSCAN) and its related algorithms for the \nR platform. This vignette introduces how to use these features. To understand how HDBSCAN works, we refer to an excellent Python notebook resource that goes over the basic concepts of the algorithm (see [the hdbscan documentation](http://hdbscan.readthedocs.io/en/latest/how_hdbscan_works.html)). For the sake of simplicity, consider the same sample \ndataset from the notebook:\n```{r}\nlibrary(\"dbscan\")\ndata(\"moons\")\nplot(moons, pch=20)\n```\n\nTo run the HDBSCAN algorithm, simply pass the dataset and the (single) parameter value 'minPts' to the hdbscan function. \n\n```{r}\n  cl <- hdbscan(moons, minPts = 5)\n  cl\n```\n\nThe 'flat' results are stored in the 'cluster' member. Noise points are given the cluster ID 0, so we add 1 to the IDs for plotting. \n```{r}\n plot(moons, col=cl$cluster+1, pch=20)\n```\n\nThe results match the intuitive notion of clusters as dense groups of points, even when the clusters take arbitrary shapes. \n\n## Hierarchical DBSCAN\nThe resulting HDBSCAN object contains a hierarchical representation of every possible DBSCAN* clustering. This hierarchical representation is compactly stored in the familiar 'hc' member of the resulting HDBSCAN object, in the same format as traditional hierarchical clustering objects created with the 'hclust' method from the stats package. 
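This hierarchy is built on the mutual reachability distance rather than the raw Euclidean distance (see [2]). As a minimal illustrative sketch in base R (the `mrd` object below is ours and not part of the package), the mutual reachability distance between two points is the maximum of their two core distances and their pairwise distance:

```{r}
## Sketch: the mutual reachability distance underlying the HDBSCAN* hierarchy.
## Illustrative code only, not the package implementation.
set.seed(42)
x <- matrix(rnorm(20), ncol = 2)  # small toy dataset
minPts <- 5
d <- as.matrix(dist(x))           # pairwise Euclidean distances
## Core distance: distance to the (minPts - 1)-th nearest neighbor.
## Each row of d contains the self-distance 0, so the minPts-th smallest
## value in a row is the (minPts - 1)-th neighbor distance.
core <- apply(d, 1, function(row) sort(row)[minPts])
## Mutual reachability distance: max(core(a), core(b), d(a, b))
mrd <- pmax(outer(core, core, pmax), d)
```

Single-linkage clustering of this matrix should correspond to the hierarchy that hdbscan stores. The stored hierarchy can be printed directly: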
\n```{r}\ncl$hc\n```\n\nNote that although this object is available for use with any of the methods that work with 'hclust' objects, the distance HDBSCAN uses (mutual reachability distance, see [2]) is _not_ an available method of the hclust function. This hierarchy, denoted the \"HDBSCAN* hierarchy\" in [3], can be visualized using the built-in plotting method from the stats package. \n```{r}\nplot(cl$hc, main=\"HDBSCAN* Hierarchy\")\n```\n\n## DBSCAN\\* vs cutting the HDBSCAN\\* tree \nAs the name implies, the fascinating thing about the HDBSCAN\\* hierarchy is that any global 'cut' is equivalent to running DBSCAN\\* (DBSCAN w/o border points) at the tree's cutting threshold $eps$ (assuming the same $minPts$ parameter setting was used). This can be verified manually. Using a modified cutting function that labels noise points with 0 based on their core distance (the stats cutree method _does not_ assign 0 to singletons), the results can be shown to be identical. \n```{r}\ncl <- hdbscan(moons, minPts = 5)\ncheck <- rep(FALSE, nrow(moons)-1)\ncore_dist <- kNNdist(moons, k=5-1)\n\n## cutree doesn't distinguish noise as 0, so we make a new method to do it manually \ncut_tree <- function(hcl, eps, core_dist){\n  cuts <- unname(cutree(hcl, h=eps))\n  cuts[which(core_dist > eps)] <- 0 # Use core distance to distinguish noise\n  cuts\n}\n\neps_values <- sort(cl$hc$height, decreasing = TRUE)+.Machine$double.eps ## Machine eps for consistency between cuts\nfor (i in seq_along(eps_values)) { \n  cut_cl <- cut_tree(cl$hc, eps_values[i], core_dist)\n  dbscan_cl <- dbscan(moons, eps = eps_values[i], minPts = 5, borderPoints = FALSE) # DBSCAN* doesn't include border points\n  \n  ## Use run length encoding as an ID-independent way to check ordering\n  check[i] <- (all.equal(rle(cut_cl)$lengths, rle(dbscan_cl$cluster)$lengths) == \"TRUE\")\n}\nprint(all(check == TRUE))\n```\n\n## Simplified Tree\nThe HDBSCAN\\* hierarchy is useful, but for larger datasets it can become overly 
cumbersome since every data point is represented as a leaf somewhere in the hierarchy. The hdbscan object comes with a powerful visualization tool that plots the 'simplified' hierarchy (see [2] for more details), which shows __cluster-wide__ changes over an infinite number of $eps$ thresholds. It is the default visualization dispatched by the 'plot' method.\n```{r}\n plot(cl)\n```\n\nYou can change the colors\n```{r}\n plot(cl, gradient = c(\"yellow\", \"orange\", \"red\", \"blue\"))\n```\n\n... and scale the widths for individual devices appropriately \n```{r}\nplot(cl, gradient = c(\"purple\", \"blue\", \"green\", \"yellow\"), scale=1.5)\n```\n\n... even outline the most 'stable' clusters reported in the flat solution \n```{r}\nplot(cl, gradient = c(\"purple\", \"blue\", \"green\", \"yellow\"), show_flat = TRUE)\n```\n\n## Cluster Stability Scores\nNote that the stability scores correspond to the labels on the condensed tree, but the cluster assignments in the 'cluster' member element do not. Also, note that these scores represent the stability scores _before_ the traversal up the tree that updates the scores based on the children. \n```{r}\nprint(cl$cluster_scores)\n```\n\nThe individual point membership 'probabilities' are in the 'membership_prob' member element.\n```{r}\n  head(cl$membership_prob)\n```\n\nThese can be used to show the 'degree of cluster membership' by, for example, plotting points with transparencies that correspond to their membership degrees.   
\n```{r}\n  plot(moons, col=cl$cluster+1, pch=21)\n  colors <- mapply(function(col, i) adjustcolor(col, alpha.f = cl$membership_prob[i]), \n                   palette()[cl$cluster+1], seq_along(cl$cluster))\n  points(moons, col=colors, pch=20)\n```\n\n## Global-Local Outlier Score from Hierarchies\nA recent journal publication on HDBSCAN comes with a new outlier measure that computes an outlier score for each point in the data based on local _and_ global properties of the hierarchy, defined as the Global-Local Outlier Score from Hierarchies (GLOSH) [4]. An example is shown below, where, unlike with the membership probabilities, the opacity of a point represents its degree of \"outlierness\". Traditionally, outliers are considered to be observations that deviate from the expected value of their presumed underlying distribution, where the measure of deviation that is considered significant is determined by some statistical threshold value.\n\n__Note:__ Because noise points, i.e., points that _are not_ assigned to any cluster, are explicitly considered in the definition of an outlier, the computed outlier scores are not simply inversely proportional to the membership probabilities. \n```{r}\n  top_outliers <- order(cl$outlier_scores, decreasing = TRUE)[1:10]\n  colors <- mapply(function(col, i) adjustcolor(col, alpha.f = cl$outlier_scores[i]), \n                   palette()[cl$cluster+1], seq_along(cl$cluster))\n  plot(moons, col=colors, pch=20)\n  text(moons[top_outliers, ], labels = top_outliers, pos=3)\n```\n\n## A Larger Clustering Example \nA larger example dataset reveals the usefulness of HDBSCAN more clearly. Consider the 'DS3' \ndataset originally published as part of a benchmark test dataset for the Chameleon clustering algorithm [5]. 
It is\nclear that a human can distinguish the shapes in this dataset sufficiently well; however, it is well known that \nmany clustering algorithms fail to capture this intuitive structure. \n```{r}\ndata(\"DS3\")\nplot(DS3, pch=20, cex=0.25)\n```\n\nUsing a single parameter setting of, say, minPts = 25, HDBSCAN finds 6 clusters.\n```{r}\ncl2 <- hdbscan(DS3, minPts = 25)\ncl2\n```\n\nMarking the noise appropriately and highlighting points based on their 'membership probabilities' as before, a visualization of the cluster structure can be easily crafted.   \n```{r}\n  plot(DS3, col=cl2$cluster+1, \n       pch=ifelse(cl2$cluster == 0, 8, 1), # Mark noise as star\n       cex=ifelse(cl2$cluster == 0, 0.5, 0.75), # Decrease size of noise\n       xlab=NA, ylab=NA)\n  colors <- sapply(seq_along(cl2$cluster), \n                   function(i) adjustcolor(palette()[(cl2$cluster+1)[i]], alpha.f = cl2$membership_prob[i]))\n  points(DS3, col=colors, pch=20)\n```\n\nThe simplified tree can be particularly useful for larger datasets.  \n```{r}\n  plot(cl2, scale = 3, gradient = c(\"purple\", \"orange\", \"red\"), show_flat = TRUE)\n```\n\n## Performance \nAll of the computationally and memory-intensive tasks required by HDBSCAN are implemented in C++ using the Rcpp package. With DBSCAN, the performance depends on the parameter settings, primarily on the radius at which points are considered as candidates for clustering ('eps'), and generally less so on the 'minPts' parameter. Intuitively, larger values of eps increase the computation time. \n\nOne of the primary computational bottlenecks of HDBSCAN is the computation of the full (Euclidean) pairwise distance matrix between all points, for which HDBSCAN currently relies on the base R 'dist' method. If a precomputed distance matrix is available, the running time of HDBSCAN can be moderately reduced. \n\n## References \n1. Martin Ester, Hans-Peter Kriegel, Joerg Sander, Xiaowei Xu (1996). 
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Institute for Computer Science, University of Munich. Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96).\n2. Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Jörg Sander. \"A framework for semi-supervised and unsupervised optimal extraction of clusters from hierarchies.\" Data Mining and Knowledge Discovery 27, no. 3 (2013): 344-371.\n3. Campello, Ricardo JGB, Davoud Moulavi, and Joerg Sander. \"Density-based clustering based on hierarchical density estimates.\" In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 160-172. Springer Berlin Heidelberg, 2013.\n4. Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Jörg Sander. \"Hierarchical density estimates for data clustering, visualization, and outlier detection.\" ACM Transactions on Knowledge Discovery from Data (TKDD) 10, no. 1 (2015): 5.\n5. Karypis, George, Eui-Hong Han, and Vipin Kumar. \"Chameleon: Hierarchical clustering using dynamic modeling.\" Computer 32, no. 8 (1999): 68-75.\n6. Hahsler M, Piekenbrock M, Doran D (2019). \"dbscan: Fast Density-Based Clustering with R.\" Journal of Statistical Software, 91(1), 1-30. doi: [10.18637/jss.v091.i01](https://doi.org/10.18637/jss.v091.i01)\n\n\n"
  }
]