[
  {
    "path": ".github/CODEOWNERS",
    "content": "# These code reviewers should be added by default.\n* @snsinha @omiksik @tien-d\n"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "content": "# Microsoft Open Source Code of Conduct\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\n\nResources:\n\n- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)\n- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)\n- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns\n"
  },
  {
    "path": "LICENSE",
    "content": "﻿Attribution 4.0 International\n\n=======================================================================\n\nCreative Commons Corporation (\"Creative Commons\") is not a law firm and\ndoes not provide legal services or legal advice. Distribution of\nCreative Commons public licenses does not create a lawyer-client or\nother relationship. Creative Commons makes its licenses and related\ninformation available on an \"as-is\" basis. Creative Commons gives no\nwarranties regarding its licenses, any material licensed under their\nterms and conditions, or any related information. Creative Commons\ndisclaims all liability for damages resulting from their use to the\nfullest extent possible.\n\nUsing Creative Commons Public Licenses\n\nCreative Commons public licenses provide a standard set of terms and\nconditions that creators and other rights holders may use to share\noriginal works of authorship and other material subject to copyright\nand certain other rights specified in the public license below. The\nfollowing considerations are for informational purposes only, are not\nexhaustive, and do not form part of our licenses.\n\n     Considerations for licensors: Our public licenses are\n     intended for use by those authorized to give the public\n     permission to use material in ways otherwise restricted by\n     copyright and certain other rights. Our licenses are\n     irrevocable. Licensors should read and understand the terms\n     and conditions of the license they choose before applying it.\n     Licensors should also secure all rights necessary before\n     applying our licenses so that the public can reuse the\n     material as expected. Licensors should clearly mark any\n     material not subject to the license. This includes other CC-\n     licensed material, or material used under an exception or\n     limitation to copyright. 
More considerations for licensors:\n\twiki.creativecommons.org/Considerations_for_licensors\n\n     Considerations for the public: By using one of our public\n     licenses, a licensor grants the public permission to use the\n     licensed material under specified terms and conditions. If\n     the licensor's permission is not necessary for any reason--for\n     example, because of any applicable exception or limitation to\n     copyright--then that use is not regulated by the license. Our\n     licenses grant only permissions under copyright and certain\n     other rights that a licensor has authority to grant. Use of\n     the licensed material may still be restricted for other\n     reasons, including because others have copyright or other\n     rights in the material. A licensor may make special requests,\n     such as asking that all changes be marked or described.\n     Although not required by our licenses, you are encouraged to\n     respect those requests where reasonable. More_considerations\n     for the public: \n\twiki.creativecommons.org/Considerations_for_licensees\n\n=======================================================================\n\nCreative Commons Attribution 4.0 International Public License\n\nBy exercising the Licensed Rights (defined below), You accept and agree\nto be bound by the terms and conditions of this Creative Commons\nAttribution 4.0 International Public License (\"Public License\"). To the\nextent this Public License may be interpreted as a contract, You are\ngranted the Licensed Rights in consideration of Your acceptance of\nthese terms and conditions, and the Licensor grants You such rights in\nconsideration of benefits the Licensor receives from making the\nLicensed Material available under these terms and conditions.\n\n\nSection 1 -- Definitions.\n\n  a. 
Adapted Material means material subject to Copyright and Similar\n     Rights that is derived from or based upon the Licensed Material\n     and in which the Licensed Material is translated, altered,\n     arranged, transformed, or otherwise modified in a manner requiring\n     permission under the Copyright and Similar Rights held by the\n     Licensor. For purposes of this Public License, where the Licensed\n     Material is a musical work, performance, or sound recording,\n     Adapted Material is always produced where the Licensed Material is\n     synched in timed relation with a moving image.\n\n  b. Adapter's License means the license You apply to Your Copyright\n     and Similar Rights in Your contributions to Adapted Material in\n     accordance with the terms and conditions of this Public License.\n\n  c. Copyright and Similar Rights means copyright and/or similar rights\n     closely related to copyright including, without limitation,\n     performance, broadcast, sound recording, and Sui Generis Database\n     Rights, without regard to how the rights are labeled or\n     categorized. For purposes of this Public License, the rights\n     specified in Section 2(b)(1)-(2) are not Copyright and Similar\n     Rights.\n\n  d. Effective Technological Measures means those measures that, in the\n     absence of proper authority, may not be circumvented under laws\n     fulfilling obligations under Article 11 of the WIPO Copyright\n     Treaty adopted on December 20, 1996, and/or similar international\n     agreements.\n\n  e. Exceptions and Limitations means fair use, fair dealing, and/or\n     any other exception or limitation to Copyright and Similar Rights\n     that applies to Your use of the Licensed Material.\n\n  f. Licensed Material means the artistic or literary work, database,\n     or other material to which the Licensor applied this Public\n     License.\n\n  g. 
Licensed Rights means the rights granted to You subject to the\n     terms and conditions of this Public License, which are limited to\n     all Copyright and Similar Rights that apply to Your use of the\n     Licensed Material and that the Licensor has authority to license.\n\n  h. Licensor means the individual(s) or entity(ies) granting rights\n     under this Public License.\n\n  i. Share means to provide material to the public by any means or\n     process that requires permission under the Licensed Rights, such\n     as reproduction, public display, public performance, distribution,\n     dissemination, communication, or importation, and to make material\n     available to the public including in ways that members of the\n     public may access the material from a place and at a time\n     individually chosen by them.\n\n  j. Sui Generis Database Rights means rights other than copyright\n     resulting from Directive 96/9/EC of the European Parliament and of\n     the Council of 11 March 1996 on the legal protection of databases,\n     as amended and/or succeeded, as well as other essentially\n     equivalent rights anywhere in the world.\n\n  k. You means the individual or entity exercising the Licensed Rights\n     under this Public License. Your has a corresponding meaning.\n\n\nSection 2 -- Scope.\n\n  a. License grant.\n\n       1. Subject to the terms and conditions of this Public License,\n          the Licensor hereby grants You a worldwide, royalty-free,\n          non-sublicensable, non-exclusive, irrevocable license to\n          exercise the Licensed Rights in the Licensed Material to:\n\n            a. reproduce and Share the Licensed Material, in whole or\n               in part; and\n\n            b. produce, reproduce, and Share Adapted Material.\n\n       2. Exceptions and Limitations. 
For the avoidance of doubt, where\n          Exceptions and Limitations apply to Your use, this Public\n          License does not apply, and You do not need to comply with\n          its terms and conditions.\n\n       3. Term. The term of this Public License is specified in Section\n          6(a).\n\n       4. Media and formats; technical modifications allowed. The\n          Licensor authorizes You to exercise the Licensed Rights in\n          all media and formats whether now known or hereafter created,\n          and to make technical modifications necessary to do so. The\n          Licensor waives and/or agrees not to assert any right or\n          authority to forbid You from making technical modifications\n          necessary to exercise the Licensed Rights, including\n          technical modifications necessary to circumvent Effective\n          Technological Measures. For purposes of this Public License,\n          simply making modifications authorized by this Section 2(a)\n          (4) never produces Adapted Material.\n\n       5. Downstream recipients.\n\n            a. Offer from the Licensor -- Licensed Material. Every\n               recipient of the Licensed Material automatically\n               receives an offer from the Licensor to exercise the\n               Licensed Rights under the terms and conditions of this\n               Public License.\n\n            b. No downstream restrictions. You may not offer or impose\n               any additional or different terms or conditions on, or\n               apply any Effective Technological Measures to, the\n               Licensed Material if doing so restricts exercise of the\n               Licensed Rights by any recipient of the Licensed\n               Material.\n\n       6. No endorsement. 
Nothing in this Public License constitutes or\n          may be construed as permission to assert or imply that You\n          are, or that Your use of the Licensed Material is, connected\n          with, or sponsored, endorsed, or granted official status by,\n          the Licensor or others designated to receive attribution as\n          provided in Section 3(a)(1)(A)(i).\n\n  b. Other rights.\n\n       1. Moral rights, such as the right of integrity, are not\n          licensed under this Public License, nor are publicity,\n          privacy, and/or other similar personality rights; however, to\n          the extent possible, the Licensor waives and/or agrees not to\n          assert any such rights held by the Licensor to the limited\n          extent necessary to allow You to exercise the Licensed\n          Rights, but not otherwise.\n\n       2. Patent and trademark rights are not licensed under this\n          Public License.\n\n       3. To the extent possible, the Licensor waives any right to\n          collect royalties from You for the exercise of the Licensed\n          Rights, whether directly or through a collecting society\n          under any voluntary or waivable statutory or compulsory\n          licensing scheme. In all other cases the Licensor expressly\n          reserves any right to collect such royalties.\n\n\nSection 3 -- License Conditions.\n\nYour exercise of the Licensed Rights is expressly made subject to the\nfollowing conditions.\n\n  a. Attribution.\n\n       1. If You Share the Licensed Material (including in modified\n          form), You must:\n\n            a. retain the following if it is supplied by the Licensor\n               with the Licensed Material:\n\n                 i. 
identification of the creator(s) of the Licensed\n                    Material and any others designated to receive\n                    attribution, in any reasonable manner requested by\n                    the Licensor (including by pseudonym if\n                    designated);\n\n                ii. a copyright notice;\n\n               iii. a notice that refers to this Public License;\n\n                iv. a notice that refers to the disclaimer of\n                    warranties;\n\n                 v. a URI or hyperlink to the Licensed Material to the\n                    extent reasonably practicable;\n\n            b. indicate if You modified the Licensed Material and\n               retain an indication of any previous modifications; and\n\n            c. indicate the Licensed Material is licensed under this\n               Public License, and include the text of, or the URI or\n               hyperlink to, this Public License.\n\n       2. You may satisfy the conditions in Section 3(a)(1) in any\n          reasonable manner based on the medium, means, and context in\n          which You Share the Licensed Material. For example, it may be\n          reasonable to satisfy the conditions by providing a URI or\n          hyperlink to a resource that includes the required\n          information.\n\n       3. If requested by the Licensor, You must remove any of the\n          information required by Section 3(a)(1)(A) to the extent\n          reasonably practicable.\n\n       4. If You Share Adapted Material You produce, the Adapter's\n          License You apply must not prevent recipients of the Adapted\n          Material from complying with this Public License.\n\n\nSection 4 -- Sui Generis Database Rights.\n\nWhere the Licensed Rights include Sui Generis Database Rights that\napply to Your use of the Licensed Material:\n\n  a. 
for the avoidance of doubt, Section 2(a)(1) grants You the right\n     to extract, reuse, reproduce, and Share all or a substantial\n     portion of the contents of the database;\n\n  b. if You include all or a substantial portion of the database\n     contents in a database in which You have Sui Generis Database\n     Rights, then the database in which You have Sui Generis Database\n     Rights (but not its individual contents) is Adapted Material; and\n\n  c. You must comply with the conditions in Section 3(a) if You Share\n     all or a substantial portion of the contents of the database.\n\nFor the avoidance of doubt, this Section 4 supplements and does not\nreplace Your obligations under this Public License where the Licensed\nRights include other Copyright and Similar Rights.\n\n\nSection 5 -- Disclaimer of Warranties and Limitation of Liability.\n\n  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE\n     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS\n     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF\n     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,\n     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,\n     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR\n     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,\n     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT\n     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT\n     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.\n\n  b. 
TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE\n     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,\n     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,\n     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,\n     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR\n     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN\n     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR\n     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR\n     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.\n\n  c. The disclaimer of warranties and limitation of liability provided\n     above shall be interpreted in a manner that, to the extent\n     possible, most closely approximates an absolute disclaimer and\n     waiver of all liability.\n\n\nSection 6 -- Term and Termination.\n\n  a. This Public License applies for the term of the Copyright and\n     Similar Rights licensed here. However, if You fail to comply with\n     this Public License, then Your rights under this Public License\n     terminate automatically.\n\n  b. Where Your right to use the Licensed Material has terminated under\n     Section 6(a), it reinstates:\n\n       1. automatically as of the date the violation is cured, provided\n          it is cured within 30 days of Your discovery of the\n          violation; or\n\n       2. upon express reinstatement by the Licensor.\n\n     For the avoidance of doubt, this Section 6(b) does not affect any\n     right the Licensor may have to seek remedies for Your violations\n     of this Public License.\n\n  c. For the avoidance of doubt, the Licensor may also offer the\n     Licensed Material under separate terms or conditions or stop\n     distributing the Licensed Material at any time; however, doing so\n     will not terminate this Public License.\n\n  d. 
Sections 1, 5, 6, 7, and 8 survive termination of this Public\n     License.\n\n\nSection 7 -- Other Terms and Conditions.\n\n  a. The Licensor shall not be bound by any additional or different\n     terms or conditions communicated by You unless expressly agreed.\n\n  b. Any arrangements, understandings, or agreements regarding the\n     Licensed Material not stated herein are separate from and\n     independent of the terms and conditions of this Public License.\n\n\nSection 8 -- Interpretation.\n\n  a. For the avoidance of doubt, this Public License does not, and\n     shall not be interpreted to, reduce, limit, restrict, or impose\n     conditions on any use of the Licensed Material that could lawfully\n     be made without permission under this Public License.\n\n  b. To the extent possible, if any provision of this Public License is\n     deemed unenforceable, it shall be automatically reformed to the\n     minimum extent necessary to make it enforceable. If the provision\n     cannot be reformed, it shall be severed from this Public License\n     without affecting the enforceability of the remaining terms and\n     conditions.\n\n  c. No term or condition of this Public License will be waived and no\n     failure to comply consented to unless expressly agreed to by the\n     Licensor.\n\n  d. Nothing in this Public License constitutes or may be interpreted\n     as a limitation upon, or waiver of, any privileges and immunities\n     that apply to the Licensor or You, including from the legal\n     processes of any jurisdiction or authority.\n\n\n=======================================================================\n\nCreative Commons is not a party to its public\nlicenses. 
Notwithstanding, Creative Commons may elect to apply one of\nits public licenses to material it publishes and in those instances\nwill be considered the “Licensor.” The text of the Creative Commons\npublic licenses is dedicated to the public domain under the CC0 Public\nDomain Dedication. Except for the limited purpose of indicating that\nmaterial is shared under a Creative Commons public license or as\notherwise permitted by the Creative Commons policies published at\ncreativecommons.org/policies, Creative Commons does not authorize the\nuse of the trademark \"Creative Commons\" or any other trademark or logo\nof Creative Commons without its prior written consent including,\nwithout limitation, in connection with any unauthorized modifications\nto any of its public licenses or any other arrangements,\nunderstandings, or agreements concerning use of licensed material. For\nthe avoidance of doubt, this paragraph does not form part of the\npublic licenses.\n\nCreative Commons may be contacted at creativecommons.org."
  },
  {
    "path": "LICENSE-CODE",
    "content": "    MIT License\n\n    Copyright (c) Microsoft Corporation.\n\n    Permission is hereby granted, free of charge, to any person obtaining a copy\n    of this software and associated documentation files (the \"Software\"), to deal\n    in the Software without restriction, including without limitation the rights\n    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n    copies of the Software, and to permit persons to whom the Software is\n    furnished to do so, subject to the following conditions:\n\n    The above copyright notice and this permission notice shall be included in all\n    copies or substantial portions of the Software.\n\n    THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n    SOFTWARE\n"
  },
  {
    "path": "README.md",
    "content": "# Scene Landmark Detection for Camera Localization\n\n## Introduction\n\n![teaser](media/teaser_wide.png)\nWe have devised a new method to detect scene-specific _scene landmarks_ for localizing a camera within a pre-mapped scene. Our method is privacy-preserving, has low storage requirements, and achieves high accuracy. **[Left]** Scene landmarks detected in a query image. **[Middle]** A CNN-based heatmap prediction architecture is trained to detect them. **[Right]** The 3D scene \nlandmarks (_in red_) and the estimated camera pose (_in blue_) are shown overlaid on the 3D point cloud (_in gray_). The 3D point \ncloud is shown only for visualization; it is not used for camera localization.\n\n---  \n\n## Papers\n**Improved Scene Landmark Detection for Camera Localization**![new](media/New.png)  \nTien Do and Sudipta N. Sinha  \nInternational Conference on 3D Vision (**3DV**), 2024  \n[pdf](paper/DoSinha3DV2024.pdf)  \n\n**Learning to Detect Scene Landmarks for Camera Localization**  \nTien Do, Ondrej Miksik, Joseph DeGol, Hyun Soo Park, and Sudipta N. 
Sinha  \nIEEE/CVF Conference on Computer Vision and Pattern Recognition (**CVPR**), 2022  \n[pdf](paper/DoEtalCVPR2022.pdf) &nbsp; [video](https://www.youtube.com/watch?v=HM2yLCLz5nY) \n\n**Indoor-6 Dataset**  \n[download](https://drive.google.com/drive/folders/1w7Adnd6MXmNOacT072JnQ6emHUeLrD71?usp=drive_link)\n\n## Bibtex\nIf you find our work useful in your research, please consider citing our papers:\n```\n@InProceedings{Do_Sinha_2024_ImprovedSceneLandmarkLoc,\n    author     = {Do, Tien and Sinha, Sudipta N.},\n    title      = {Improved Scene Landmark Detection for Camera Localization},\n    booktitle  = {Proceedings of the International Conference on 3D Vision (3DV)},\n    month      = {March},\n    year       = {2024}\n}\n\n@InProceedings{Do_2022_SceneLandmarkLoc,\n    author     = {Do, Tien and Miksik, Ondrej and DeGol, Joseph and Park, Hyun Soo and Sinha, Sudipta N.},\n    title      = {Learning to Detect Scene Landmarks for Camera Localization},\n    booktitle  = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    month      = {June},\n    year       = {2022}\n}\n```\n\n# Indoor-6 Dataset\n\nThe Indoor-6 dataset was created from multiple sessions captured in six indoor scenes over multiple days. The pseudo \nground truth (pGT) 3D point clouds and camera poses for each scene are computed using [COLMAP](https://colmap.github.io/). All training data is derived only from the COLMAP reconstruction of the training images. The figure below \nshows the camera poses (in red) and point clouds (in gray) for each scene, along with the number of videos and images in the \ntraining and test splits, respectively. Compared to [7-scenes](https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/), the scenes in Indoor-6 are larger, cover multiple rooms, \nand contain illumination variations, since the images span multiple days and different times of day.\n\n![indoor6_sfm](media/indoor6_sfm.png)\nIndoor-6 dataset SfM reconstructions. 
Train/val/test splits and download URLs per scene are listed below:\n* [scene1](https://drive.google.com/file/d/1AJhPh9nnZO0HJyxuXXZdtKtA7kFRi3LQ/view?usp=drive_link) (6289/798/799 images)\n* <strike>scene2 (3021/283/284 images)</strike> \n* [scene2a](https://drive.google.com/file/d/1DgTQ7fflZJ7DdbHDRZF-6gXdB_vJF7fY/view?usp=drive_link) (4890/256/257 images)\n* [scene3](https://drive.google.com/file/d/12aER7rQkvGS_DPeugTHo_Ma_Fi7JuflS/view?usp=drive_link) (4181/313/315 images)\n* <strike>scene4 (1942/272/272 images)</strike>\n* [scene4a](https://drive.google.com/file/d/1gibneq5ixZ0lmeNAYTmY4Mh8a244T2nl/view?usp=drive_link) (2285/158/158 images)\n* [scene5](https://drive.google.com/file/d/18wHn_69-eV22N4I8R0rWQkcSQ3EtCYMX/view?usp=drive_link) (4946/512/424 images)\n* [scene6](https://drive.google.com/file/d/1mZYnoKo37KXRjREK5CKs5IzDox2G3Prt/view?usp=drive_link) (1761/322/323 images)\n* [colmap](https://drive.google.com/file/d/1oMo552DYo2U5Fvjm5MrTYPMqpMjXEf7m/view?usp=drive_link) (COLMAP reconstructions for all scenes)\n\n**Note**: We added two new scenes (`scene2a` and `scene4a`) to the Indoor-6 dataset after our CVPR 2022 paper was published, because we were unable to release `scene2` and `scene4` from the original dataset for privacy reasons. \nThe two new scenes have been included as replacements. Please refer to our 3DV 2024 paper for a quantitative evaluation of our method and several baselines on the latest version of the dataset.\n\n# Source code\nThe repository contains all the source code for our project. The most recent version can be found in the `3dv24` git branch (which is now the default branch of the repository). The best-performing pretrained models for `SLD-star`, as proposed in our 3DV 2024 paper, are also available (see below). `SLD-star` significantly outperforms the `SLD+NBE` approach proposed in our CVPR 2022 paper. The source code for the `SLD+NBE` method is no longer maintained. 
The older version of the code (pre-3DV 2024) can be found in the `main` branch.\n\n## Environment Setup\n```\npip install -r requirements.txt\n```\n\nOur reference environment:\n* Python 3.9.13 on Windows 11\n* CUDA version: release 11.8 (V11.8.89)\n* PyTorch version: 2.1.0+cu118\n\nFor development purposes, training was tested on both CUDA and CPU on Linux and Windows, as well as with the latest experimental version of PyTorch using Metal Performance Shaders on macOS (see below).\n\nBy default, the code selects hardware acceleration for your device, if available.\n\n### Experimental macOS Metal Performance Shaders (MPS)\n\nTo enable the MPS backend, make sure you are running on Apple Silicon hardware and follow [these instructions](https://pytorch.org/blog/introducing-accelerated-pytorch-training-on-mac/) to install the latest nightly build of PyTorch instead.\n\n_NOTE_: MPS supports a maximum precision of FP32.\n\n## Layout\n\nThe source code expects the following directory structure (currently assumed to be in your home directory).\n\n```\n  └── data\n  |\t└── outputs\n  |\t└── checkpoints\n  |\t└── indoor6\n  |\t\t└── scene1\n  |\t\t└── scene2a\n  |\t\t└── scene3\n  |\t\t└── scene4a\n  |\t\t└── scene5\n  |\t\t└── scene6\t\n  └── SceneLandmarkLocalization\n   \t└── src\n   \t└── README.md (this)\n```\n\n* Download the Indoor-6 dataset and place the contents in the `/data/indoor6/` folder, as indicated above.\n* Download the pretrained `SLD-star` models from our 3DV 2024 paper (link below) and place them in the `/data/checkpoints` folder, as indicated above.  
\n[pretrained models](https://drive.google.com/file/d/1s8bUgAuy2LX4QMcKE8yKz6JRyhL3JgxZ/view?usp=drive_link)\n\n* Clone this repo into `/SceneLandmarkLocalization`.\n* Finally, create the folder `/data/outputs`, which will store the trained models and other files generated when you train your own models.\n\n## Running Inference Using Pretrained Models\n\nInstructions for testing the `SLD-star` models from our 3DV 2024 paper are listed below.\n\n**Step 1.** First, verify the contents of the checkpoints folder. You should see the following files and directories.\n```\n  └── data\n  \t└── checkpoints\n  \t\t└── scene1_1000-125_v10\n  \t\t└── scene1_1000-125_v10.txt\n\t\t└── scene2a_1000-125_v10\n  \t\t└── scene2a_1000-125_v10.txt\n\t\t└── scene3_1000-125_v10\n  \t\t└── scene3_1000-125_v10.txt\n  \t\t└── scene4a_1000-125_v10\n  \t\t└── scene4a_1000-125_v10.txt\n  \t\t└── scene5_1000-125_v10\n  \t\t└── scene5_1000-125_v10.txt\n  \t\t└── scene6_1000-125_v10\t\t\n  \t\t└── scene6_1000-125_v10.txt\n```\n\n**Step 2.** For `1000-125_v10`, each scene has eight model checkpoints. For example, `scene6` has these files.\n```\n  └── scene6_1000-125_v10\n  \t└── scene6-000-125\n\t\t└── model-best_median.ckpt\n  \t└── scene6-125-250\n\t\t└── model-best_median.ckpt\n  \t└── scene6-250-375\n\t\t└── model-best_median.ckpt\n  \t└── scene6-375-500\n\t\t└── model-best_median.ckpt\n  \t└── scene6-500-625\n\t\t└── model-best_median.ckpt\n  \t└── scene6-625-750\n\t\t└── model-best_median.ckpt\n  \t└── scene6-750-875\n\t\t└── model-best_median.ckpt\n  \t└── scene6-875-1000\n\t\t└── model-best_median.ckpt\n```\n\n**Step 3.** Each experiment file for the `1000-125_v10` experiment (e.g., `scene6_1000-125_v10.txt`) contains eight lines, one for each model checkpoint (or landmark subset). Each line contains various attributes of the associated model.\n\n**Step 4.** Check the Python script `/SceneLandmarkLocalization/src/run_inference.py`. 
The relative paths hardcoded in the variables `checkpoint_dir` and `dataset_dir` both assume the directory layout described earlier. The variable `experiment` is set to `1000-125_v10`, which corresponds to the `SLD-star` model trained for 1000 landmarks partitioned into eight subsets of 125 landmarks each. The suffix `v10` is a tag used to keep track of the experiment and the generated model checkpoints.\n\n**Step 5.** Now, run the following commands.\n```\ncd SceneLandmarkLocalization/src\npython run_inference.py\n```\n\n**Step 6.** When the script finishes running, output like the following is displayed on the console. The final accuracy (5cm/5deg recall) in percent is printed along with the mean inference speed.\n![run_inference_screenshot](media/run_inference_screenshot.png)\n\n**Step 7.** The metrics are also written to the file `/data/checkpoints/RESULTS-1000-125_v10.txt`. Note that `1000-125_v10` is the experiment name specified in the `run_inference.py` script.\n\n## Training Models\n\nWe now discuss how to train an `SLD-star` model ensemble. \nAs proposed in our 3DV 2024 paper, the model ensemble is a set of models that share the same architecture (derived from an EfficientNet backbone) but have independent sets of model parameters.\nEach model (or network) in the ensemble is trained on a different subset of scene landmarks. \nIn our implementation, we define the subsets by taking the ordered list of all scene landmarks and partitioning it into blocks of fixed size. For convenience, we choose block sizes that exactly divide the total number of landmarks, so that all subsets have the same size.\n<br>  \nFor example, given 1000 scene landmarks and a block size of 125, we obtain eight subsets. The first subset consists of the landmarks with indices in the range `[0,125)` (i.e., indices 0 to 124) in the ordered list. \nThe second subset consists of the landmarks with indices in the range `[125,250)`, and so on.  
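\nThe partitioning above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository. The helper names `parse_experiment_name`, `landmark_subsets` and `subset_dir_name` are hypothetical. The subdirectory-name format is an assumption inferred from the `scene6` checkpoint listing in the inference section. The sketch also shows how an experiment name such as `1000-125_v10` maps to the ensemble configuration, and how each `subset_index` value selects one half-open block of landmark indices.

```python
# Hypothetical helpers (not part of the repository) that mirror the naming
# conventions described in this README. Assumptions: subsets are contiguous,
# half-open index blocks, and subdirectory names zero-pad the block bounds
# to three digits (e.g. scene6-000-125, scene6-875-1000).


def parse_experiment_name(experiment):
    '''Split a name such as 1000-125_v10 into (num_landmarks, block_size, tag).'''
    sizes, tag = experiment.split('_', 1)
    num_landmarks, block_size = (int(part) for part in sizes.split('-'))
    return num_landmarks, block_size, tag


def landmark_subsets(num_landmarks, block_size):
    '''Partition landmark indices 0..num_landmarks-1 into [start, end) blocks.'''
    if block_size <= 0 or num_landmarks % block_size != 0:
        raise ValueError('block_size must be positive and divide num_landmarks exactly')
    return [(start, start + block_size)
            for start in range(0, num_landmarks, block_size)]


def subset_dir_name(scene_name, start, end):
    '''Build a per-network subdirectory name such as scene6-000-125.'''
    return f'{scene_name}-{start:03d}-{end:03d}'


if __name__ == '__main__':
    num_landmarks, block_size, tag = parse_experiment_name('1000-125_v10')
    # subset_index selects one (start, end) block, i.e. one network of the ensemble.
    for subset_index, (start, end) in enumerate(landmark_subsets(num_landmarks, block_size)):
        print(subset_index, subset_dir_name('scene6', start, end))
```

Run as a script, it prints eight lines, from `0 scene6-000-125` to `7 scene6-875-1000`, which match the subdirectory names listed for `scene6`. Raising an error for a block size that does not divide the landmark count mirrors the equal-size convention stated above.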
\n<br>\nWe will now discuss how to run the training code.\n\n\n**Step 1.** Run the training script.\n\nTo train a single model in the ensemble (for a specific scene), you may first need to modify the default values of certain variables hardcoded in the `SceneLandmarkLocalization/src/run_training.py` script (see Step 2). \nThen, run it as follows.  \n```\ncd SceneLandmarkLocalization/src\npython run_training.py\n```\n\n**Step 2.** Edit the script to modify the parameter values.\n\nThe important hyperparameters and settings that might need to be modified are as follows.\n\n1. ***Paths:*** The default values for the dataset and output paths are as follows (based on the assumed directory structure). These can be modified as needed.\n```\n    dataset_dir = '../../data/indoor6'\n    output_dir = '../../data/outputs'\n```\n\n2. ***Scene ID and landmarks:*** The scene name and the names of the landmark and visibility files.\n```\n    scene_name = 'scene6'\n    landmark_config = 'landmarks/landmarks-1000v10'\n    visibility_config = 'landmarks/visibility-1000v10_depth_normal'\n```\n\n3. ***Ensemble configuration:*** The number of landmarks and the block size of the ensemble. `subset_index` indicates which network within the ensemble will be trained. In the following example, the value `0` indicates that the model will be trained for the landmarks in the index range `[0,125)`. For this `1000-125` ensemble, you will therefore need to repeat training with `subset_index` set to `1, 2, ..., 7` to train all eight networks.\n\n```\n    num_landmarks = 1000\n    block_size = 125\n    subset_index = 0\n```\n\n4. ***Version No.:*** A string tag that is appended to the generated model names and experiment files. This helps avoid name clashes when training and testing multiple sets of models.\n\n**Step 3.** When training completes, check the output directory. You should see a directory that contains the model checkpoints for the specified scene. 
There will also be an experiment text file with the same name. \nInside the scene directory are subdirectories, one for each network in the ensemble. For example, the subdirectories for the `1000-125` ensemble for `scene6` are named `scene6-000-125`, `scene6-125-250`, and so on.\nLook inside these subdirectories for the model checkpoint file `model-best_median.ckpt`. \n\n# Contributing\n\nThis project welcomes contributions and suggestions.  Most contributions require you to agree to a\nContributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us\nthe rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.\n\nWhen you submit a pull request, a CLA bot will automatically determine whether you need to provide\na CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions\nprovided by the bot. You will only need to do this once across all repos using our CLA.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or\ncontact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n# Legal Notices\n\nMicrosoft and any contributors grant you a license to the Microsoft documentation and other content\nin this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode),\nsee the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the\n[LICENSE-CODE](LICENSE-CODE) file.\n\nMicrosoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation\nmay be either trademarks or registered trademarks of Microsoft in the United States and/or 
other countries.\nThe licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks.\nMicrosoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.\n\nPrivacy information can be found at https://privacy.microsoft.com/en-us/\n\nMicrosoft and any contributors reserve all other rights, whether under their respective copyrights, patents,\nor trademarks, whether by implication, estoppel or otherwise.\n"
  },
  {
    "path": "SECURITY.md",
    "content": "<!-- BEGIN MICROSOFT SECURITY.MD V0.0.5 BLOCK -->\n\n## Security\n\nMicrosoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).\n\nIf you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below.\n\n## Reporting Security Issues\n\n**Please do not report security vulnerabilities through public GitHub issues.**\n\nInstead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).\n\nIf you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com).  If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).\n\nYou should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc). \n\nPlease include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:\n\n  * Type of issue (e.g. 
buffer overflow, SQL injection, cross-site scripting, etc.)\n  * Full paths of source file(s) related to the manifestation of the issue\n  * The location of the affected source code (tag/branch/commit or direct URL)\n  * Any special configuration required to reproduce the issue\n  * Step-by-step instructions to reproduce the issue\n  * Proof-of-concept or exploit code (if possible)\n  * Impact of the issue, including how an attacker might exploit the issue\n\nThis information will help us triage your report more quickly.\n\nIf you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs.\n\n## Preferred Languages\n\nWe prefer all communications to be in English.\n\n## Policy\n\nMicrosoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd).\n\n<!-- END MICROSOFT SECURITY.MD BLOCK -->"
  },
  {
    "path": "src/dataloader/indoor6.py",
    "content": "import argparse\nimport copy\nimport fnmatch\nimport numpy as np\nimport os\nimport pickle\nfrom PIL import Image\n\nimport sys\nsys.path.append('../utils')\n\nimport torch\nfrom torch.utils.data.dataset import Dataset\nfrom torch.utils.data import DataLoader\nfrom torchvision import transforms\n\nfrom utils.pnp import Quaternion2Rotation\n\nnp.random.seed(0)\n\nclass Indoor6(Dataset):\n    def __init__(self, root_folder=\"\",\n                 scene_id='', mode='all',\n                 landmark_idx=[None], skip_image_index=1,\n                 input_image_downsample=1, gray_image_output=False,\n                 landmark_config='landmarks/landmarks-50',\n                 visibility_config='landmarks/visibility-50',\n                 use_precomputed_focal_length=False):\n        super(Indoor6, self).__init__()\n\n        self.to_tensor = transforms.ToTensor()\n\n        self.image_folder = os.path.join(root_folder,\n                                        scene_id,\n                                        'images')\n        image_files_all = fnmatch.filter(os.listdir(self.image_folder), '*.color.jpg')\n        image_files_all = sorted(image_files_all)[::skip_image_index]\n\n        self.image_files = []\n        if mode == 'train':\n            self.image_files = \\\n                pickle.load(open('%s/%s/train_test_val.pkl' % (root_folder, scene_id), 'rb'))[\n                    'train'][::skip_image_index]\n            self.image_indices = \\\n                pickle.load(open('%s/%s/train_test_val.pkl' % (root_folder, scene_id), 'rb'))[\n                    'train_idx'][::skip_image_index]\n        elif mode == 'test':\n            self.image_files = \\\n                pickle.load(open('%s/%s/train_test_val.pkl' % (root_folder, scene_id), 'rb'))[\n                    'test'][::skip_image_index]\n            self.image_indices = \\\n                pickle.load(open('%s/%s/train_test_val.pkl' % (root_folder, scene_id), 'rb'))[\n                    
'test_idx'][::skip_image_index]\n        elif mode == 'val':\n            self.image_files = \\\n                pickle.load(open('%s/%s/train_test_val.pkl' % (root_folder, scene_id), 'rb'))[\n                    'val'][::skip_image_index]\n            self.image_indices = \\\n                pickle.load(open('%s/%s/train_test_val.pkl' % (root_folder, scene_id), 'rb'))[\n                    'val_idx'][::skip_image_index]\n        else:\n            self.image_files = image_files_all\n            self.image_indices = np.arange(0, len(image_files_all))\n\n        self.image_indices = np.asarray(self.image_indices)\n        self.num_images = len(self.image_files)\n        self.gray_image_output = gray_image_output\n        self.mode = mode\n\n        landmark_file = open(root_folder + '/' + scene_id\n                                         + '/%s.txt' % landmark_config, 'r')\n        num_landmark = int(landmark_file.readline())\n        self.landmark = []\n        for l in range(num_landmark):\n            pl = landmark_file.readline().split()\n            pl = np.array([float(pl[i]) for i in range(len(pl))])\n            self.landmark.append(pl)\n        self.landmark = np.asarray(self.landmark)[:, 1:]\n\n        self.image_downsampled = input_image_downsample\n\n        visibility_file = root_folder + '/' + scene_id + '/%s.txt' % visibility_config\n        self.visibility = np.loadtxt(visibility_file).astype(bool)\n        \n        if landmark_idx[0] != None:\n            self.landmark = self.landmark[landmark_idx]\n            self.visibility = self.visibility[landmark_idx]\n        \n        self.landmark = self.landmark.transpose()\n        \n        ## Precomputed fixed focal length\n        self.precomputed_focal_length = None\n        if use_precomputed_focal_length:\n            PRECOMPUTED_FOCAL_LENGTH = {'scene1': 900, 'scene2a': 1100, 'scene3': 900, 'scene4a': 900, 'scene5': 900, 'scene6': 900}\n            self.precomputed_focal_length = 
PRECOMPUTED_FOCAL_LENGTH[scene_id]\n\n    \n    def original_image_name(self, index):\n        \n        intrinsics = open(os.path.join(self.image_folder,\n                                        self.image_files[index].replace('color.jpg', 'intrinsics.txt')))\n        intrinsics = intrinsics.readline().split()\n    \n        return intrinsics[6]\n\n    \n    def _modify_intrinsic(self, index, use_precomputed_focal_length=False):\n        W = None\n        H = None\n        K = None\n        K_inv = None\n\n        while K_inv is None:\n            try:\n                intrinsics = open(os.path.join(self.image_folder,\n                                               self.image_files[index].replace('color.jpg', 'intrinsics.txt')))                \n                intrinsics = intrinsics.readline().split()\n\n                W = int(intrinsics[0]) // (self.image_downsampled * 32) * 32\n                H = int(intrinsics[1]) // (self.image_downsampled * 32) * 32\n\n                scale_factor_x = W / float(intrinsics[0])\n                scale_factor_y = H / float(intrinsics[1])\n\n                if use_precomputed_focal_length:                    \n                    fx = self.precomputed_focal_length * scale_factor_x\n                    fy = self.precomputed_focal_length * scale_factor_y\n                else:\n                    fx = float(intrinsics[2]) * scale_factor_x\n                    fy = float(intrinsics[2]) * scale_factor_y\n\n                cx = float(intrinsics[3]) * scale_factor_x\n                cy = float(intrinsics[4]) * scale_factor_y\n\n                K = np.array([[fx, 0., cx],\n                              [0., fy, cy],\n                              [0., 0., 1.]], dtype=float)\n\n                K_inv = np.linalg.inv(K)\n\n            except(RuntimeError, TypeError, NameError):\n                pass\n        return K, K_inv, W, H\n\n    def _load_and_resize_image(self, index, W, H):\n        color_img_rs = None\n        while 
color_img_rs is None:\n            try:\n                # Load color image\n                color_img = Image.open(os.path.join(self.image_folder, self.image_files[index]))\n                color_img_rs = color_img.resize((W, H), resample=Image.BILINEAR)\n            except(RuntimeError, TypeError, NameError):\n                pass\n\n        color_tensor = self.to_tensor(color_img_rs)\n\n        return color_tensor\n\n    def _load_pose(self, index):\n        pose = None\n        while pose is None:\n            try:\n                # Load 3x4 pose matrix and make it 4x4 by appending vector [0., 0., 0., 1.]\n                pose = np.loadtxt(os.path.join(self.image_folder, self.image_files[index].replace('color.jpg', 'pose.txt')))\n            except (RuntimeError, TypeError, NameError):\n                pass\n\n        pose_s = np.vstack((pose, np.array([0., 0., 0., 1.])))\n\n        return pose_s\n\n    def __getitem__(self, index):\n        K, K_inv, W_modified, H_modified = self._modify_intrinsic(index, use_precomputed_focal_length=False if self.precomputed_focal_length is None else True)\n        color_tensor = self._load_and_resize_image(index, W_modified, H_modified)\n        C_T_G = self._load_pose(index)\n\n        landmark3d = C_T_G @ np.vstack((self.landmark, np.ones((1, self.landmark.shape[1]))))\n\n        output = {'pose_gt': torch.tensor(C_T_G),\n                  'image': color_tensor,\n                  'intrinsics': torch.tensor(K, dtype=torch.float32, requires_grad=False),\n                  'inv_intrinsics': torch.tensor(K_inv, dtype=torch.float32, requires_grad=False),\n                  'landmark3d': torch.tensor(landmark3d[:3], dtype=torch.float32, requires_grad=False),\n                  }\n        \n        proj = K @ (C_T_G[:3, :3] @ self.landmark + C_T_G[:3, 3:])\n        landmark2d = proj / proj[2:]\n        output['landmark2d'] = landmark2d[:2]\n\n        inside_patch = (landmark2d[0] < W_modified) * \\\n                        
(landmark2d[0] >= 0) * \\\n                        (landmark2d[1] < H_modified) * \\\n                        (landmark2d[1] >= 0)  # L vector\n\n        # visible by propagated colmap visibility and inside image\n        _mask1 = self.visibility[:, self.image_indices[index]] * inside_patch\n\n        # outside patch\n        # _mask2 = ~inside_patch\n\n        # inside image but not visible by colmap\n        _mask3 = (self.visibility[:, self.image_indices[index]] == 0) * inside_patch\n\n        visibility_mask = 1.0 * _mask1 + 0.5 * _mask3\n        output['visibility'] = visibility_mask\n\n        return output\n\n    def __len__(self):\n        return self.num_images\n\n\nclass Indoor6Patches(Indoor6):\n    def __init__(self, root_folder=\"\",\n                 scene_id='', mode='all',\n                 landmark_idx=[None], skip_image_index=1,\n                 input_image_downsample=1, gray_image_output=False,\n                 patch_size=96,\n                 positive_samples=4, random_samples=4,\n                 landmark_config='landmarks/landmarks-50',\n                 visibility_config='landmarks/visibility-50',\n                 augmentation=True):\n        super().__init__(root_folder=root_folder,\n                         scene_id=scene_id, mode=mode,\n                         landmark_idx=landmark_idx, skip_image_index=skip_image_index,\n                         input_image_downsample=input_image_downsample, gray_image_output=gray_image_output,\n                         landmark_config=landmark_config,\n                         visibility_config=visibility_config)\n        self.patch_size = patch_size\n        self.positive_samples = positive_samples\n        self.random_samples = random_samples\n        self.landmark_idx = landmark_idx\n        self.augmentation = augmentation\n\n        self.num_landmarks = self.landmark.shape[1]\n\n    def _extract_patch(self, C_T_G, lm_idx, K, W_modified, H_modified, center=False, adjust_boundary=True):\n\n        
proj = K @ (C_T_G[:3, :3] @ self.landmark[:, lm_idx:(lm_idx + 1)] + C_T_G[:3, 3:])\n        proj /= copy.copy(proj[2:])\n\n        # Extract patch\n        y = int(proj[1, 0])\n        x = int(proj[0, 0])\n\n        if center:\n            dy = -self.patch_size // 2\n            dx = -self.patch_size // 2\n        else:\n            dy = -np.random.rand(1) * self.patch_size\n            dx = -np.random.rand(1) * self.patch_size\n\n        _top = int(y + dy)\n        _bottom = _top + int(self.patch_size)\n        _left = int(x + dx)\n        _right = _left + int(self.patch_size)\n\n        if adjust_boundary:\n            # Adjust the boundary\n            if _top < 0:\n                _top = 0\n                _bottom = int(self.patch_size)\n            elif _bottom >= H_modified:\n                _top = H_modified - int(self.patch_size)\n                _bottom = H_modified\n\n            if _left < 0:\n                _left = 0\n                _right = int(self.patch_size)\n            elif _right >= W_modified:\n                _left = W_modified - int(self.patch_size)\n                _right = W_modified\n\n        return _left, _right, _top, _bottom\n\n    def _project_landmarks_into_patch(self, K, C_T_G, img_idx, _top, _bottom, _left, _right):\n        proj = K @ (C_T_G[:3, :3] @ self.landmark + C_T_G[:3, 3:])\n        in_front_of_camera = proj[2] > 0.0\n        proj /= copy.copy(proj[2:])\n\n        proj_patch = np.zeros_like(proj[:2])\n        proj_patch[0] = proj[0] - _left\n        proj_patch[1] = proj[1] - _top\n\n        # L vector\n        inside_patch = (proj[0] < _right) * (proj[0] >= _left) * (proj[1] < _bottom) * (\n                    proj[1] >= _top) * in_front_of_camera\n\n        # visible by propagated colmap visibility and inside patch\n        _mask1 = self.visibility[:, self.image_indices[img_idx]] * inside_patch\n\n        # outside patch\n        # _mask2 = ~inside_patch\n\n        # inside patch but not visible by colmap\n        _mask3 
= (self.visibility[:, self.image_indices[img_idx]] == 0) * inside_patch\n\n        visibility_mask = 1.0 * _mask1 + 0.5 * _mask3\n\n        return proj_patch, visibility_mask\n\n    def __getitem__(self, index):\n\n        patches = []\n        keypoint_locations = []\n        landmark_visibility_on_patch = []\n        L = self.landmark.shape[1]  # number of keypoints\n\n        list_landmarks = np.random.permutation(L)[:self.positive_samples]\n        \n        ## Create positive examples\n        for lm_idx in list_landmarks:\n            ## Randomly draw image index from visibility mask\n            training_img_ids_observe_lm_idx = self.visibility[lm_idx, self.image_indices].reshape(-1)\n            total_images_observed_this_lm = np.sum(training_img_ids_observe_lm_idx)\n            if total_images_observed_this_lm == 0:\n                print('no positive example')\n                img_idx_positive_sample_for_lm_idx = np.random.randint(self.num_images)\n            else:\n                # img_idx_observe_lm_idx = (index % int(np.sum(training_img_ids_observe_lm_idx)))\n                random_indices_observe_this_lm = np.random.randint(0, total_images_observed_this_lm)\n                img_idx_positive_sample_for_lm_idx = np.where(training_img_ids_observe_lm_idx==1)[0][random_indices_observe_this_lm]\n\n            K, K_inv, W_modified, H_modified = self._modify_intrinsic(img_idx_positive_sample_for_lm_idx)\n            C_T_G = self._load_pose(img_idx_positive_sample_for_lm_idx)\n            color_tensor = self._load_and_resize_image(img_idx_positive_sample_for_lm_idx, W_modified, H_modified)\n\n            if not self.augmentation:\n                _left, _right, _top, _bottom = self._extract_patch(C_T_G, lm_idx, K, W_modified, H_modified,\n                                                                   center=False, adjust_boundary=True)\n                color_patch = color_tensor.reshape(1, 3, H_modified, W_modified)[:, :, _top:_bottom, _left:_right]\n    
            Cg_T_G = C_T_G\n                K_scale = K\n            else:\n                ## Random rotation, change K, T\n                q = np.random.rand(4) - 0.5\n                q[1] *= 0.1  # pitch\n                q[2] *= 0.1  # yaw\n                q[3] *= 0.1  # roll\n                q[0] = 1.0\n                q /= np.linalg.norm(q)\n                Cg_R_C = Quaternion2Rotation(q)\n                Cg_T_C = np.eye(4)\n                Cg_T_C[:3, :3] = Cg_R_C\n\n                Cg_T_G = Cg_T_C @ C_T_G\n                K_scale = K.copy()\n                K_scale[:2, :2] *= (0.9 + 0.2*np.random.rand())\n                K_scale_inv = np.linalg.inv(K_scale)\n\n                _left, _right, _top, _bottom = self._extract_patch(Cg_T_G, lm_idx, K_scale, W_modified, H_modified,\n                                                                   center=False, adjust_boundary=False)\n\n                ## Extract patch\n                YY_patch, XX_patch = torch.meshgrid(torch.arange(_top, _bottom, 1),\n                                                    torch.arange(_left, _right, 1))\n                XX_patch = XX_patch.reshape(1, self.patch_size, self.patch_size).float()\n                YY_patch = YY_patch.reshape(1, self.patch_size, self.patch_size).float()\n\n                in_H_out = K @ Cg_R_C.T @ K_scale_inv\n                in_H_out = torch.tensor(in_H_out, dtype=torch.float)\n                in_p_out = in_H_out @ torch.cat((XX_patch,\n                                                 YY_patch,\n                                                 torch.ones_like(XX_patch)), dim=1).reshape((3, self.patch_size**2))\n                in_p_out = in_p_out / in_p_out[2:].clone()\n\n                scale = torch.tensor([[2. / W_modified, 0.],\n                                      [0., 2. 
/ H_modified]], dtype=torch.float).reshape(2, 2)\n                center = torch.tensor([0.5 * (W_modified - 1),\n                                       0.5 * (H_modified - 1)], dtype=torch.float).reshape(2, 1)\n                in_p_out_normalized = scale @ (in_p_out[:2] - center)\n\n                invalid_pixel_mask = (in_p_out_normalized[0] < -1) + \\\n                                     (in_p_out_normalized[0] > 1) + \\\n                                     (in_p_out_normalized[1] < -1) + \\\n                                     (in_p_out_normalized[1] > 1)\n\n                if torch.sum(invalid_pixel_mask>0) > 0.25 * self.patch_size ** 2:\n                    _left, _right, _top, _bottom = self._extract_patch(C_T_G, lm_idx, K, W_modified, H_modified,\n                                                                      center=False, adjust_boundary=True)\n                    color_patch = color_tensor.reshape(1, 3, H_modified, W_modified)[:, :, _top:_bottom, _left:_right]\n\n                    # Not using augmented transformation\n                    K_scale = K.copy()\n                    Cg_T_G = C_T_G\n                else:\n                    grid_sampler = in_p_out_normalized.reshape(1, 2, self.patch_size, self.patch_size).permute(0, 2, 3, 1)\n                    color_tensor = color_tensor.reshape(1, 3, H_modified, W_modified)\n                    color_patch = torch.nn.functional.grid_sample(color_tensor, grid_sampler,\n                                                                 padding_mode='zeros', mode='bilinear', align_corners=False)\n                    color_patch = torch.nn.functional.interpolate(color_patch, size=(self.patch_size, self.patch_size))\n\n            keypoints_2d, visibility_mask = self._project_landmarks_into_patch(K_scale, Cg_T_G, img_idx_positive_sample_for_lm_idx, _top, _bottom, _left, _right)\n            patches.append(color_patch)\n            keypoint_locations.append(keypoints_2d.reshape((1, 2, L)))\n            
landmark_visibility_on_patch.append(visibility_mask.reshape((1, L)))\n\n        ## Create random examples\n        patches_random = []\n        keypoint_locations_random = []\n        landmark_visibility_on_patch_random = []\n\n        C_T_G = self._load_pose(index)\n        K, K_inv, W_modified, H_modified = self._modify_intrinsic(index)\n        color_tensor = self._load_and_resize_image(index, W_modified, H_modified)\n\n        for _ in range(self.random_samples):\n            _top = int(np.random.rand(1) * (H_modified - self.patch_size))\n            _bottom = _top + self.patch_size\n            _left = int(np.random.rand(1) * (W_modified - self.patch_size))\n            _right = _left + self.patch_size\n\n            keypoints_2d, visibility_mask = self._project_landmarks_into_patch(K, C_T_G, index, _top, _bottom, _left, _right)\n\n            patches_random.append(color_tensor[:, _top:_bottom, _left:_right].clone().reshape(1, 3, self.patch_size, self.patch_size))\n            keypoint_locations_random.append(keypoints_2d.reshape((1, 2, L)))\n            landmark_visibility_on_patch_random.append(visibility_mask.reshape((1, L)))\n\n        patches = torch.cat(patches+patches_random, dim=0)\n        keypoint_locations = np.concatenate(keypoint_locations+keypoint_locations_random, axis=0)\n        landmark_visibility_on_patch = np.concatenate(landmark_visibility_on_patch+landmark_visibility_on_patch_random, axis=0)\n\n        ## COLOR AUGMENTATION\n        if self.augmentation:\n            if torch.rand(1) > 0.5:\n                patches += 0.02 * (\n                            torch.rand((patches.shape[0], patches.shape[1], 1, 1)) - 0.5) * torch.ones_like(patches)\n            else:\n                patches += 0.2 * (\n                        torch.rand((patches.shape[0], 1, 1, 1)) - 0.5) * torch.ones_like(patches)\n        clipped_patches = torch.clip(patches, 0, 1)\n\n\n        output = {'patches': clipped_patches,\n                  'landmark2d': 
torch.tensor(keypoint_locations, dtype=torch.float, requires_grad=False),\n                  'visibility': torch.tensor(landmark_visibility_on_patch, requires_grad=False),\n                  }\n\n        return output\n"
  },
  {
    "path": "src/inference.py",
    "content": "import copy\nimport numpy as np\nimport os\nimport torch\nfrom torch.utils.data import DataLoader\nfrom tqdm import tqdm\nimport random\nfrom datetime import datetime\n\nfrom dataloader.indoor6 import Indoor6\nfrom models.efficientlitesld import EfficientNetSLD\nfrom utils.pnp import *\n\n\ndef compute_error(C_R_G, C_t_G, C_R_G_hat, C_t_G_hat):\n\n    rot_err = 180 / np.pi * np.arccos(np.clip(0.5 * (np.trace(C_R_G.T @ C_R_G_hat) - 1.0), a_min=-1., a_max=1.))\n    trans_err = np.linalg.norm(C_R_G_hat.T @ C_t_G_hat - C_R_G.T @ C_t_G)\n\n    return rot_err, trans_err\n\n\ndef compute_2d3d(opt, pred_heatmap, peak_threshold, landmark2d, landmark3d, C_b_f_gt, H_hm, W_hm, K_inv,\n                 METRICS_LOGGING=None):\n    N = pred_heatmap.shape[0]\n    G_p_f = np.zeros((3, N))\n    C_b_f_hm = np.zeros((3, N))\n    weights = np.zeros(N)\n    validIdx = 0\n\n    pixel_error = []\n    angular_error = []\n    for l in range(N):\n        pred_heatmap_l = pred_heatmap[l]\n        max_pred_heatmap_l = np.max(pred_heatmap_l)\n\n        if max_pred_heatmap_l > peak_threshold:\n            peak_yx = np.unravel_index(np.argmax(pred_heatmap_l), np.array(pred_heatmap_l).shape)\n            peak_yx = np.array(peak_yx)\n\n            # Patch size extraction\n            P = int(min(1+2*np.min(np.array([peak_yx[0], H_hm-1.0-peak_yx[0], peak_yx[1], W_hm-1.0-peak_yx[1]])),\n                        1+64//opt.output_downsample))\n\n            patch_peak_yx = pred_heatmap_l[peak_yx[0] - P // 2:peak_yx[0] + P // 2 + 1,\n                            peak_yx[1] - P // 2:peak_yx[1] + P // 2 + 1]\n            xx_patch, yy_patch = np.meshgrid(np.arange(peak_yx[1] - P // 2, peak_yx[1] + P // 2 + 1, 1),\n                                             np.arange(peak_yx[0] - P // 2, peak_yx[0] + P // 2 + 1, 1))\n\n            refine_y = np.sum(patch_peak_yx * yy_patch) / np.sum(patch_peak_yx)\n            refine_x = np.sum(patch_peak_yx * xx_patch) / np.sum(patch_peak_yx)\n\n            
\n            pixel_error.append(np.linalg.norm(landmark2d[:2, l] -\n                                              opt.output_downsample * np.array([refine_x, refine_y])))\n\n            pred_bearing = K_inv @ np.array([refine_x, refine_y, 1])\n            pred_bearing = pred_bearing / np.linalg.norm(pred_bearing)\n            gt_bearing = C_b_f_gt[:, l]\n            gt_bearing = gt_bearing / np.linalg.norm(gt_bearing)\n            angular_error_batch = np.arccos(\n                np.clip(pred_bearing @ gt_bearing, a_min=-1, a_max=1)) * 180 / np.pi\n            \n            angular_error.append(angular_error_batch)\n\n            weights[validIdx] = max_pred_heatmap_l\n            C_b_f_hm[:, validIdx] = pred_bearing\n            G_p_f[:, validIdx] = landmark3d[:, l]\n            validIdx += 1\n\n    return G_p_f[:, :validIdx], C_b_f_hm[:, :validIdx], weights[:validIdx], np.asarray(pixel_error), np.asarray(angular_error)\n\n\ndef compute_pose(G_p_f, C_b_f_hm, weights, minimal_tight_thr, opt_tight_thr):\n\n    Ndetected_landmarks = C_b_f_hm.shape[1]\n\n    if Ndetected_landmarks >= 4:\n        ## P3P ransac\n        C_T_G_hat, PnP_inlier = P3PKe_Ransac(G_p_f, C_b_f_hm, weights,\n                                             thres=minimal_tight_thr)\n        \n        if np.sum(PnP_inlier) >= 4:\n            C_T_G_opt = RunPnPNL(C_T_G_hat,\n                                 G_p_f[:, PnP_inlier], \n                                 C_b_f_hm[:, PnP_inlier],\n                                 weights[PnP_inlier],\n                                 cutoff=opt_tight_thr)\n            return np.sum(PnP_inlier), C_T_G_opt\n\n    return 0, None\n\n\ndef inference(opt, minimal_tight_thr=1e-2, opt_tight_thr=5e-3, mode='test'):\n\n    # random.seed(datetime.now().timestamp())\n\n    PRETRAINED_MODEL = opt.pretrained_model\n\n    device = opt.gpu_device\n\n    test_dataset = Indoor6(landmark_idx=np.arange(opt.landmark_indices[0], opt.landmark_indices[-1]),\n                          
 scene_id=opt.scene_id,\n                           mode=mode,\n                           root_folder=opt.dataset_folder,\n                           input_image_downsample=2,\n                           landmark_config=opt.landmark_config,\n                           visibility_config=opt.visibility_config,\n                           skip_image_index=1,\n                           use_precomputed_focal_length=opt.use_precomputed_focal_length)\n\n    test_dataloader = DataLoader(dataset=test_dataset, num_workers=1, batch_size=1, shuffle=False, pin_memory=True)\n\n    num_landmarks = test_dataset.landmark.shape[1]\n    landmark_data = test_dataset.landmark\n\n    cnns = []\n    nLandmarks = opt.landmark_indices\n    num_landmarks = opt.landmark_indices[-1] - opt.landmark_indices[0]\n\n    for idx, pretrained_model in enumerate(PRETRAINED_MODEL):\n        if opt.model == 'efficientnet':\n            cnn = EfficientNetSLD(num_landmarks=nLandmarks[idx+1]-nLandmarks[idx], output_downsample=opt.output_downsample).to(device=device)\n\n        # map_location lets checkpoints saved on a GPU be loaded onto the target device\n        cnn.load_state_dict(torch.load(pretrained_model, map_location=device))\n        cnn = cnn.to(device=device)\n        cnn.eval()\n        \n        # Adding pretrained model\n        cnns.append(cnn)\n\n    peak_threshold = 3e-1\n    img_id = 0\n\n    METRICS_LOGGING = {'image_name': '',\n                       'angular_error': 180.,\n                       'pixel_error': 1800.,\n                       'rot_err_all': 180.,\n                       'trans_err_all': 180.,\n                       'heatmap_peak': 0.0,\n                       'ndetected': 0,                       \n                       }\n    test_image_logging = []    \n\n    with torch.no_grad():\n\n        ## Only works for indoor-6\n        indoor6W = 640 // opt.output_downsample\n        indoor6H = 352 // opt.output_downsample\n        HH, WW = torch.meshgrid(torch.arange(indoor6H), torch.arange(indoor6W))\n        # Keep the pixel-coordinate grids on the same device as the network outputs\n        WW = WW.reshape(1, 1, indoor6H, indoor6W).to(device=device)\n        HH = 
HH.reshape(1, 1, indoor6H, indoor6W).to(device=device)\n\n        with tqdm(test_dataloader) as tq:\n            for idx, batch in enumerate(tq):\n            #for idx, batch in enumerate(tqdm(test_dataloader)):\n\n                image = batch['image'].to(device=device)\n                B, _, H, W = image.shape\n\n                K_inv = batch['inv_intrinsics'].to(device=device)\n                C_T_G_gt = batch['pose_gt'].cpu().numpy()\n\n                landmark2d = batch['intrinsics'] @ batch['landmark3d'].reshape(B, 3, num_landmarks)\n                landmark2d /= landmark2d[:, 2:].clone()\n                landmark2d = landmark2d.numpy()\n\n                pred_heatmap = []\n                for cnn in cnns:\n                    pred = cnn(image)\n                    pred_heatmap.append(pred['1'])\n\n                pred_heatmap = torch.cat(pred_heatmap, axis=1)\n                pred_heatmap *= (pred_heatmap > peak_threshold).float()\n\n                # tmp = torch.sqrt(pred_heatmap)\n                #\n                # w^{1.5}\n                # pred_heatmap *= tmp\n                #\n                # w^{2.5}\n                # pred_heatmap *= tmp\n                # pred_heatmap *= pred_heatmap\n\n                # w^2\n                pred_heatmap *= pred_heatmap\n                \n                K_inv[:, :, :2] *= opt.output_downsample\n\n                ## Compute 2D location of landmarks\n                P = torch.max(torch.max(pred_heatmap, dim=3)[0], dim=2)[0]\n                pred_normalized_heatmap = pred_heatmap / (torch.sum(pred_heatmap, axis=(2, 3), keepdim=True) + 1e-4)\n                projx = torch.sum(WW * pred_normalized_heatmap, axis=(2, 3)).reshape(B, 1, num_landmarks)\n                projy = torch.sum(HH * pred_normalized_heatmap, axis=(2, 3)).reshape(B, 1, num_landmarks)\n                xy1 = torch.cat((projx, projy, torch.ones_like(projx)), axis=1)\n                uv1 = K_inv @ xy1\n                C_B_f = uv1 / torch.sqrt(torch.sum(uv1 ** 
2, axis=1, keepdim=True))\n                C_B_f = C_B_f.cpu().numpy()\n                P = P.cpu().numpy()\n                xy1 = xy1.cpu().numpy()\n                \n                ## Compute error\n                for b in range(B):\n                    Pb = P[b]>peak_threshold\n                    G_p_f = landmark_data[:, Pb]\n                    C_b_f = C_B_f[b][:, Pb]\n\n                    ## MAKING THIS CHANGE FOR ABLATION STUDY IN PAPER: PLEASE REMOVE LATER!\n                    ## weights = np.ones_like(P[b][Pb])\n                    weights = P[b][Pb]\n\n                    xy1b = xy1[b][:2, Pb]\n\n                    pnp_inlier, C_T_G_hat = compute_pose(G_p_f, C_b_f, weights,\n                                                        minimal_tight_thr, opt_tight_thr)\n\n                    rot_err, trans_err = 180., 1800.\n                    if pnp_inlier >= 4:\n                        rot_err, trans_err = compute_error(C_T_G_gt[b][:3, :3], C_T_G_gt[b][:3, 3],\n                                                           C_T_G_hat[:3, :3], C_T_G_hat[:3, 3])\n\n                    ## Logging information\n                    pixel_error = np.linalg.norm(landmark2d[b][:2, Pb] - opt.output_downsample * xy1b, axis=0)\n                    C_b_f_gt = batch['landmark3d'][b]\n                    C_b_f_gt = torch.nn.functional.normalize(C_b_f_gt, dim=0).cpu().numpy()\n                    angular_error = np.arccos(np.clip(np.sum(C_b_f * C_b_f_gt[:, Pb], axis=0), -1, 1)) * 180. 
/ np.pi\n\n                    m = copy.deepcopy(METRICS_LOGGING)\n                    m['image_name'] = test_dataset.image_files[img_id]\n                    m['pixel_error'] = pixel_error\n                    m['angular_error'] = angular_error\n                    m['heatmap_peak'] = weights\n                    m['rot_err_all'] = np.array([rot_err])\n                    m['trans_err_all'] = np.array([trans_err])\n                    test_image_logging.append(m)\n                    img_id += 1\n\n    elapsedtime = tq.format_dict[\"elapsed\"]\n    processing_speed = len(test_dataset)/elapsedtime\n\n    metrics_output = {'angular_error': [], \n                      'pixel_error': [], \n                      'heatmap_peak': [], \n                      'rot_err_all': [], \n                      'trans_err_all': []}\n    \n    for k in metrics_output:        \n        for imgdata in test_image_logging:            \n            metrics_output[k].append(imgdata[k])\n        metrics_output[k] = np.concatenate(metrics_output[k])\n\n    metrics_output['r5'] = np.sum(metrics_output['rot_err_all'] < 5) / len(test_dataset)\n    metrics_output['r10'] = np.sum(metrics_output['rot_err_all'] < 10) / len(test_dataset)\n    metrics_output['p5'] = np.sum(metrics_output['trans_err_all'] < 0.05) / len(test_dataset)\n    metrics_output['p10'] = np.sum(metrics_output['trans_err_all'] < 0.1) / len(test_dataset)\n    metrics_output['r1p1'] = np.sum((metrics_output['rot_err_all'] < 1) * (metrics_output['trans_err_all'] < 0.01))/len(test_dataset)\n    metrics_output['r2p2'] = np.sum((metrics_output['rot_err_all'] < 2) * (metrics_output['trans_err_all'] < 0.02))/len(test_dataset)\n    metrics_output['r5p5'] = np.sum((metrics_output['rot_err_all'] < 5) * (metrics_output['trans_err_all'] < 0.05))/len(test_dataset)\n    metrics_output['r10p10'] = np.sum((metrics_output['rot_err_all'] < 10) * (metrics_output['trans_err_all'] < 0.1)) / len(test_dataset)\n    metrics_output['median_rot_error'] = 
np.median(metrics_output['rot_err_all'])\n    metrics_output['median_trans_error'] = np.median(metrics_output['trans_err_all'])\n    metrics_output['speed'] = processing_speed\n    return metrics_output\n\n\ndef inference_landmark_stats(opt, mode='test'):\n    import pickle\n\n    PRETRAINED_MODEL = opt.pretrained_model\n\n    device = opt.gpu_device\n\n    test_dataset = Indoor6(landmark_idx=np.arange(opt.landmark_indices[0], opt.landmark_indices[-1]),\n                           scene_id=opt.scene_id,\n                           mode=mode,\n                           root_folder=opt.dataset_folder,\n                           input_image_downsample=2,\n                           landmark_config=opt.landmark_config,\n                           visibility_config=opt.visibility_config,\n                           skip_image_index=1)\n\n    test_dataloader = DataLoader(dataset=test_dataset, num_workers=1, batch_size=1, shuffle=False, pin_memory=True)\n\n    num_landmarks = test_dataset.landmark.shape[1]\n\n    cnns = []\n    nLandmarks = opt.landmark_indices\n    num_landmarks = opt.landmark_indices[-1] - opt.landmark_indices[0]\n\n    for idx, pretrained_model in enumerate(PRETRAINED_MODEL):\n        if opt.model == 'efficientnet':\n            cnn = EfficientNetSLD(num_landmarks=nLandmarks[idx+1]-nLandmarks[idx], output_downsample=opt.output_downsample).to(device=device)\n\n        cnn.load_state_dict(torch.load(pretrained_model))\n        cnn = cnn.to(device=device)\n        cnn.eval()\n        \n        # Adding pretrained model\n        cnns.append(cnn)\n\n    peak_threshold = 2e-1\n\n    SINGLE_LANDMARK_STATS = {'image_idx': [],\n                            'pixel_error': [],\n                            }\n    landmark_stats = [copy.deepcopy(SINGLE_LANDMARK_STATS) for _ in range(num_landmarks)]\n    img_idx = 0\n\n    with torch.no_grad():\n\n        ## Only works for indoor-6\n        indoor6W = 640 // opt.output_downsample\n        indoor6H = 352 // 
opt.output_downsample\n        HH, WW = torch.meshgrid(torch.arange(indoor6H), torch.arange(indoor6W))\n        WW = WW.reshape(1, 1, indoor6H, indoor6W).to(device=device)\n        HH = HH.reshape(1, 1, indoor6H, indoor6W).to(device=device)\n\n        for idx, batch in enumerate(tqdm(test_dataloader)):\n\n            image = batch['image'].to(device=device)\n            B, _, H, W = image.shape\n            landmark2d = batch['intrinsics'] @ batch['landmark3d'].reshape(B, 3, num_landmarks)\n            landmark2d /= landmark2d[:, 2:].clone()\n            landmark2d = landmark2d.numpy()\n\n            pred_heatmap = []\n            for cnn in cnns:\n                pred = cnn(image)\n                pred_heatmap.append(pred['1'])\n\n            pred_heatmap = torch.cat(pred_heatmap, axis=1)\n            pred_heatmap *= (pred_heatmap > peak_threshold).float()\n\n            ## Compute 2D location of landmarks\n            P = torch.max(torch.max(pred_heatmap, dim=3)[0], dim=2)[0]\n            pred_normalized_heatmap = pred_heatmap / (torch.sum(pred_heatmap, axis=(2, 3), keepdim=True) + 1e-4)\n            projx = torch.sum(WW * pred_normalized_heatmap, axis=(2, 3)).reshape(B, 1, num_landmarks)\n            projy = torch.sum(HH * pred_normalized_heatmap, axis=(2, 3)).reshape(B, 1, num_landmarks)\n            xy1 = torch.cat((projx, projy, torch.ones_like(projx)), axis=1)\n            P = P.cpu().numpy()\n            xy1 = xy1.cpu().numpy()\n\n            ## Compute error\n            for b in range(B):\n                for l in range(num_landmarks):\n                    if P[b,l] > peak_threshold:\n                        pixel_error = np.linalg.norm(landmark2d[b][:2, l] -\n                                                     opt.output_downsample * xy1[b][:2, l])\n                        landmark_stats[l]['pixel_error'].append(pixel_error)\n                        landmark_stats[l]['image_idx'].append(test_dataset.image_indices[img_idx])\n
                img_idx += 1\n\n        landmark_stats_np = np.zeros((num_landmarks, 5))\n        for l in range(num_landmarks):\n            landmark_stats_np[l, 0] = l\n            landmark_stats_np[l, 1] = len(landmark_stats[l]['image_idx'])\n            if landmark_stats_np[l, 1] > 0:\n                pixel_error = np.array(landmark_stats[l]['pixel_error'])\n                landmark_stats_np[l, 2] = np.mean(pixel_error)\n                landmark_stats_np[l, 3] = np.median(pixel_error)\n                landmark_stats_np[l, 4] = np.max(pixel_error)\n        np.savetxt(os.path.join(opt.output_folder, 'landmark_stats.txt'), landmark_stats_np)\n        with open(os.path.join(opt.output_folder, 'landmark_stats.pkl'), 'wb') as f:\n            pickle.dump(landmark_stats, f)\n\n    return\n"
  },
  {
    "path": "src/local_inference.py",
    "content": "# Copyright (c) Microsoft Corporation. All rights reserved.\n#from __future__ import print_function\nimport argparse\nimport os\nimport time\n\nArgs = None\n\ndef local_inference():\n    cmd = 'python main.py --action test --dataset_folder %s --scene_id %s --landmark_config %s --visibility_config %s' % (Args.dataset_dir, Args.scene_id, Args.landmark_config, Args.visibility_config)\n    cmd += ' --output_downsample 8'\n    cmd += ' --landmark_indices 0'\n    for i in range(0, len(Args.landmark_indices)):\n        cmd += ' --landmark_indices %d' % (Args.landmark_indices[i])\n    for ckpt in Args.checkpoint_names:\n        cmd += ' --pretrained_model %s/%s/%s/model-best_median.ckpt' % (Args.checkpoint_dir, Args.experimentGroupName, ckpt)\n    cmd += ' --output_folder %s/%s' % (Args.checkpoint_dir, Args.experimentGroupName)\n    print(\"Running [\" + cmd + \"]\")\n    os.system(cmd)\n\nif __name__ == '__main__':\n    \n    parser = argparse.ArgumentParser()\n    parser.add_argument(\n        '--experiment_file', default=\"\", type=str, required=True,\n        help=\"Experiment file path.\")\n    parser.add_argument(\n        '--dataset_dir', default=\"\", type=str, required=True,\n        help=\"Dataset path.\")\n    parser.add_argument(\n        '--checkpoint_dir', default=\"\", type=str, required=True,\n        help=\"Checkpoints folder path.\")\n\n    Args = parser.parse_args()\n\n    tmp = os.path.basename(Args.experiment_file)\n    Args.experimentGroupName = tmp[:tmp.rindex('.')]\n    Args.landmark_indices = []\n    Args.checkpoint_names = []\n    exp_file = os.path.join(Args.checkpoint_dir, Args.experiment_file)\n    fd = open(exp_file, 'r')\n    while True:\n        line = fd.readline()\n        if line == '':\n            break\n        split_line = line.split()\n\n        Args.scene_id = split_line[0]\n        expName = split_line[1]\n\n        Args.landmark_config = split_line[2]\n        Args.visibility_config = split_line[3]\n\n        
Args.checkpoint_names.append(expName)\n        fields = expName.split('-')\n        Args.landmark_indices.append(int(fields[2]))\n    fd.close()\n\n    local_inference()"
  },
  {
    "path": "src/local_training.py",
    "content": "# Copyright (c) Microsoft Corporation. All rights reserved.\nimport argparse\nimport os\n#import re\n\nArgs = None\n\ndef launch_training():\n    print(\"Experiment File: %s\" % Args.experiment_file)\n    print(\"Model Dir: %s\" % Args.model_dir)\n    cmd = 'python main.py --action train_patches'\n    cmd += ' --training_batch_size %d' % (Args.training_batch_size)\n    cmd += ' --output_downsample %d' % (Args.output_downsample)\n    cmd += ' --num_epochs %d' % (Args.num_epochs)\n    cmd += ' --dataset_folder %s' % (Args.dataset_dir)\n    cmd += ' --scene_id %s' % (Args.scene_id)\n    cmd += ' --landmark_config %s' % (Args.landmark_config)\n    cmd += ' --visibility_config %s' % (Args.visibility_config)\n    cmd += ' --output_folder %s' % (Args.model_dir)\n    cmd += ' --landmark_indices %d' % (Args.landmark_index_start)\n    cmd += ' --landmark_indices %d' % (Args.landmark_index_stop)\n    os.system(cmd)\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\n        '--dataset_dir', type=str, required=True,\n        help=\"Dataset folder path.\")\n    parser.add_argument(\n        '--experiment_file', type=str, required=True,\n        help=\"Experiment file path.\")\n    parser.add_argument(\n        '--scene_id', type=str, required=True,\n        help=\"name of scene.\")\n    parser.add_argument(\n        '--landmark_config', type=str, required=True,\n        help='Landmark configuration.')\n    parser.add_argument(\n        '--visibility_config', type=str, required=True,\n        help='Visibility configuration.')\n    parser.add_argument(\n        '--num_landmarks', type=int, required=True,\n        help='number of landmarks.')\n    parser.add_argument(\n        '--block_size', type=int, required=True,\n        help='number of landmarks in each block.')\n    parser.add_argument(\n        '--subset_index', type=int, required=True,\n        help='index of landmark subset (starts from 0).')\n    
parser.add_argument(\n        '--output_dir', type=str, required=True,\n        help='folder to save experiment file in.')\n    parser.add_argument(\n        '--model_dir', type=str, required=True,\n        help='folder to save model ckpt file in.')\n    parser.add_argument(\n        '--training_batch_size', type=int, required=True,\n        help='batch size.')\n    parser.add_argument(\n        '--output_downsample', type=int, required=True,\n        help='Downsample factor for heat map resolution.')\n    parser.add_argument(\n        '--num_epochs', type=int, required=True,\n        help='the number of epochs used for training.')\n    Args = parser.parse_args()\n\n    # Write the experiment file: one line per block of landmarks.\n    exp_fn = os.path.join(Args.output_dir, Args.experiment_file)\n    with open(exp_fn, \"w\") as fd:\n        for lid in range(0, Args.num_landmarks, Args.block_size):\n            Args.landmark_index_start = lid\n            Args.landmark_index_stop = lid + Args.block_size\n            line = '%s %s-%03d-%03d %s %s local' % (Args.scene_id, Args.scene_id, Args.landmark_index_start, Args.landmark_index_stop, Args.landmark_config, Args.visibility_config)\n            print(line, file=fd)\n\n    # Launch the training job for the specified subset only.\n    Args.landmark_index_start = Args.block_size * Args.subset_index\n    Args.landmark_index_stop = Args.block_size * (Args.subset_index + 1)\n    launch_training()"
  },
  {
    "path": "src/main.py",
    "content": "import argparse\nfrom inference import *\nfrom train import *\n\nDEVICE = None\n# auto-detect default device\nif torch.backends.mps.is_available():\n    # Code to run on macOS\n    torch.backends.mps.enabled = True\n    DEVICE = \"mps\"\n    print (\"MPS enabled\")\nelif torch.cuda.is_available():\n    # Windows or Linux GPU acceleration\n    torch.backends.cudnn.enabled = True\n    torch.backends.cudnn.benchmark = True\n    DEVICE = \"cuda\"\n    print (\"CUDA enabled\")\nelse:\n    # CPU\n    torch.backends.cudnn.enabled = False\n    DEVICE = \"cpu\"\n    print (\"CPU enabled\")\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser(\n        description='Scene Landmark Detection',\n        formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n    parser.add_argument(\n        '--dataset_folder', type=str, required=True,\n        help='Root directory, where all data is stored')\n    parser.add_argument(\n        '--output_folder', type=str, required=True,\n        help='Output folder')\n    parser.add_argument(\n        '--landmark_config', type=str, default='landmarks/landmarks-300',\n        help='File containing scene-specific 3D landmarks.')\n    parser.add_argument(\n        '--landmark_indices', type=int, action='append',\n        help = 'Landmark indices, specify twice',\n        required=True)\n    parser.add_argument(\n        '--visibility_config', type=str, default='landmarks/visibility_aug-300',\n        help='File containing information about visibility of landmarks in cameras associated with training set.')\n    parser.add_argument(\n        '--scene_id', type=str, default='scene6',\n        help='Scene id')\n    parser.add_argument(\n        '--model', type=str, default='efficientnet',\n        help='Network architecture backbone.')\n    parser.add_argument(\n        '--output_downsample', type=int, default=4,\n        help='Down sampling factor for output resolution')\n    parser.add_argument(\n        
'--gpu_device', type=str, default=DEVICE,\n        help='GPU device')\n    parser.add_argument(\n        '--pretrained_model', type=str, action='append', default=[],\n        help='Pretrained detector model')\n    parser.add_argument(\n        '--num_epochs', type=int, default=200,\n        help='Number of training epochs.')\n    parser.add_argument(\n        '--action', type=str, default='test',\n        help='train/train_patches/test')\n    parser.add_argument(\n        '--use_precomputed_focal_length', type=int, default=0)\n    parser.add_argument(\n        '--training_batch_size', type=int, default=8,\n        help='Batch size used during training.')\n\n    opt = parser.parse_args()\n\n    #print('scene_id: ', opt.scene_id)\n    #print('action: ', opt.action)\n    #print('training_batch_size: ', opt.training_batch_size)\n    #print('output downsample: ', opt.output_downsample)\n\n    if opt.action == 'train':\n        train(opt)\n        opt.pretrained_model = [opt.output_folder + '/model-best_median.ckpt']\n        eval_stats = inference(opt, minimal_tight_thr=1e-3, opt_tight_thr=1e-3)\n        print(\"{:>10} {:>30} {:>30} {:>20}\".format('Scene ID',\n                                                   'Median trans error (cm)',\n                                                   'Median rotation error (deg)',\n                                                   'Recall 5cm5deg (%)'))\n        print(\"{:>10} {:>30.4} {:>30.4} {:>20.2%}\".format(opt.scene_id,\n                                                          100. 
* eval_stats['median_trans_error'],\n                                                          eval_stats['median_rot_error'],\n                                                          eval_stats['r5p5']))\n    elif opt.action == 'train_patches':\n        train_patches(opt)\n        opt.pretrained_model = [opt.output_folder + '/model-best_median.ckpt']\n        eval_stats = inference(opt, minimal_tight_thr=1e-3, opt_tight_thr=1e-3)\n        print(\"{:>10} {:>30} {:>30} {:>20}\".format('Scene ID',\n                                                   'Median trans error (cm)',\n                                                   'Median rotation error (deg)',\n                                                   'Recall 5cm5deg (%)'))\n        print(\"{:>10} {:>30.4} {:>30.4} {:>20.2%}\".format(opt.scene_id,\n                                                          100. * eval_stats['median_trans_error'],\n                                                          eval_stats['median_rot_error'],\n                                                          eval_stats['r5p5']))\n    elif opt.action == 'landmark_stats':\n        inference_landmark_stats(opt, mode='train')\n    elif opt.action == 'test':\n        if opt.scene_id == 'all':\n            eval_stats = {}\n            # --pretrained_model uses action='append' (a list); its first entry is used as the checkpoint path prefix.\n            pretrained_folder = opt.pretrained_model[0]\n            output_folder = opt.output_folder\n            for scene_id in ['1', '2a', '3', '4a', '5', '6']:\n                opt.scene_id = 'scene' + scene_id\n                opt.pretrained_model = [pretrained_folder + 'scene%s.ckpt' % scene_id]\n                opt.output_folder = os.path.join(output_folder, 'scene' + scene_id)\n                eval_stats[opt.scene_id] = inference(opt, minimal_tight_thr=1e-3, opt_tight_thr=1e-3)\n\n            print(\"{:>10} {:>30} {:>30} {:>20}\".format('Scene ID',\n                                                       'Median trans error (cm)',\n                                                       'Median rotation 
error (deg)',\n                                                       'Recall 5cm5deg (%)'))\n            for x in eval_stats:\n                print(\"{:>10} {:>30.4} {:>30.4} {:>20.2%}\".format(x,\n                                                                  100. * eval_stats[x]['median_trans_error'],\n                                                                  eval_stats[x]['median_rot_error'],\n                                                                  eval_stats[x]['r5p5']))\n        else:\n\n            eval_stats = inference(opt, minimal_tight_thr=1e-3, opt_tight_thr=1e-3)\n            metricsFilename = opt.output_folder + '/metrics.txt'\n            print(metricsFilename)\n            fd = open(metricsFilename, \"w\")\n            fd.write(\"%f\\n\" % (eval_stats['r5p5']))\n            fd.write(\"%f\\n\" % (eval_stats['speed']))\n            fd.close()\n\n            print(\"{:>10} {:>30} {:>30} {:>20} {:>15} {:>15} {:>15} {:>15} {:>20} {:>20}\".format('Scene ID',\n                                                                            'Median trans error (cm)',\n                                                                            'Median rotation error (deg)',\n                                                                            'Recall 1cm1deg (%)',\n                                                                            '2cm2deg (%)',\n                                                                            '5cm5deg (%)',\n                                                                            '10cm10deg (%)',\n                                                                            '5deg (%)',\n                                                                            'Median Pixel Error',\n                                                                            'Median Angular Error'))\n            print(\"{:>10} {:>30.4} {:>30.4} {:>20.2%} {:>15.2%} {:>15.2%} {:>15.2%} {:>15.2%} {:>20.4} 
{:>20.4}\".format(opt.scene_id,\n                                                                                100. * eval_stats['median_trans_error'],\n                                                                                eval_stats['median_rot_error'],\n                                                                                eval_stats['r1p1'],\n                                                                                eval_stats['r2p2'],\n                                                                                eval_stats['r5p5'],\n                                                                                eval_stats['r10p10'],\n                                                                                eval_stats['r5'],\n                                                                                np.median(eval_stats['pixel_error']),\n                                                                                np.median(eval_stats['angular_error'])))\n"
  },
  {
    "path": "src/models/blocks.py",
    "content": "import torch\nimport torch.nn as nn\nfrom .conv2d_layers import Conv2dSameExport\n\n\ndef _make_encoder(use_pretrained, exportable=True, output_downsample=4):\n\n    # pretrained = _make_pretrained_efficientnet_lite0(use_pretrained, exportable=exportable)\n    pretrained = torch.load('pretrained_efficientnetlite0.net')\n\n    if output_downsample <= 16:\n        pretrained.layer2[0][0].conv_dw.stride = (1, 1)\n    if output_downsample <= 8:\n        pretrained.layer3[0][0].conv_dw.stride = (1, 1)\n    if output_downsample <= 4:\n        pretrained.layer4[0][0].conv_dw.stride = (1, 1)\n\n    return pretrained, None\n\n\ndef _make_pretrained_efficientnet_lite0(use_pretrained, exportable=False):\n    efficientnet = torch.hub.load(\n        \"rwightman/gen-efficientnet-pytorch\",\n        \"tf_efficientnet_lite0\",\n        pretrained=use_pretrained,\n        exportable=exportable\n    )\n    return _make_efficientnet_backbone(efficientnet)\n\n\ndef _make_efficientnet_backbone(effnet):\n    pretrained = nn.Module()\n\n    pretrained.layer1 = nn.Sequential(\n        effnet.conv_stem, effnet.bn1, effnet.act1, *effnet.blocks[0:2]\n    )\n    pretrained.layer2 = nn.Sequential(*effnet.blocks[2:3])\n    pretrained.layer3 = nn.Sequential(*effnet.blocks[3:5])\n    pretrained.layer4 = nn.Sequential(*effnet.blocks[5:9])\n\n    return pretrained\n\n\ndef _make_resnet_backbone(resnet):\n    pretrained = nn.Module()\n    pretrained.layer1 = nn.Sequential(\n        resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool, resnet.layer1\n    )\n\n    pretrained.layer2 = resnet.layer2\n    pretrained.layer3 = resnet.layer3\n    pretrained.layer4 = resnet.layer4\n\n    return pretrained\n\n\ndef _make_pretrained_resnext101_wsl(use_pretrained):\n    resnet = torch.hub.load(\"facebookresearch/WSL-Images\", \"resnext101_32x8d_wsl\")\n    return _make_resnet_backbone(resnet)\n\n\nclass Interpolate(nn.Module):\n    \"\"\"Interpolation module.\n    \"\"\"\n\n    def 
__init__(self, scale_factor, mode, align_corners=False):\n        \"\"\"Init.\n\n        Args:\n            scale_factor (float): scaling\n            mode (str): interpolation mode\n        \"\"\"\n        super(Interpolate, self).__init__()\n\n        self.interp = nn.functional.interpolate\n        self.scale_factor = scale_factor\n        self.mode = mode\n        self.align_corners = align_corners\n\n    def forward(self, x):\n        \"\"\"Forward pass.\n\n        Args:\n            x (tensor): input\n\n        Returns:\n            tensor: interpolated data\n        \"\"\"\n\n        x = self.interp(\n            x, scale_factor=self.scale_factor, mode=self.mode, align_corners=self.align_corners\n        )\n\n        return x\n\n\nclass ResidualConvUnit(nn.Module):\n    \"\"\"Residual convolution module.\n    \"\"\"\n\n    def __init__(self, features):\n        \"\"\"Init.\n\n        Args:\n            features (int): number of features\n        \"\"\"\n        super().__init__()\n\n        self.conv1 = nn.Conv2d(\n            features, features, kernel_size=3, stride=1, padding=1, bias=True\n        )\n\n        self.conv2 = nn.Conv2d(\n            features, features, kernel_size=3, stride=1, padding=1, bias=True\n        )\n\n        self.relu = nn.ReLU(inplace=True)\n\n    def forward(self, x):\n        \"\"\"Forward pass.\n\n        Args:\n            x (tensor): input\n\n        Returns:\n            tensor: output\n        \"\"\"\n        out = self.relu(x)\n        out = self.conv1(out)\n        out = self.relu(out)\n        out = self.conv2(out)\n\n        return out + x\n\n\nclass FeatureFusionBlock(nn.Module):\n    \"\"\"Feature fusion block.\n    \"\"\"\n\n    def __init__(self, features):\n        \"\"\"Init.\n\n        Args:\n            features (int): number of features\n        \"\"\"\n        super(FeatureFusionBlock, self).__init__()\n\n        self.resConfUnit1 = ResidualConvUnit(features)\n        self.resConfUnit2 = 
ResidualConvUnit(features)\n\n    def forward(self, *xs):\n        \"\"\"Forward pass.\n\n        Returns:\n            tensor: output\n        \"\"\"\n        output = xs[0]\n\n        if len(xs) == 2:\n            output += self.resConfUnit1(xs[1])\n\n        output = self.resConfUnit2(output)\n\n        output = nn.functional.interpolate(\n            output, scale_factor=2, mode=\"bilinear\", align_corners=True\n        )\n\n        return output\n\n\nclass ResidualConvUnit_custom(nn.Module):\n    \"\"\"Residual convolution module.\n    \"\"\"\n\n    def __init__(self, features, activation, bn):\n        \"\"\"Init.\n\n        Args:\n            features (int): number of features\n        \"\"\"\n        super().__init__()\n\n        self.bn = bn\n\n        self.groups = 1\n\n        self.conv1 = nn.Conv2d(\n            features, features, kernel_size=3, stride=1, padding=1, bias=True, groups=self.groups\n        )\n\n        self.conv2 = nn.Conv2d(\n            features, features, kernel_size=3, stride=1, padding=1, bias=True, groups=self.groups\n        )\n\n        if self.bn == True:\n            self.bn1 = nn.BatchNorm2d(features)\n            self.bn2 = nn.BatchNorm2d(features)\n\n        self.activation = activation\n\n        self.skip_add = nn.quantized.FloatFunctional()\n\n    def forward(self, x):\n        \"\"\"Forward pass.\n\n        Args:\n            x (tensor): input\n\n        Returns:\n            tensor: output\n        \"\"\"\n\n        out = self.activation(x)\n        out = self.conv1(out)\n        if self.bn == True:\n            out = self.bn1(out)\n\n        out = self.activation(out)\n        out = self.conv2(out)\n        if self.bn == True:\n            out = self.bn2(out)\n\n        if self.groups > 1:\n            out = self.conv_merge(out)\n\n        return self.skip_add.add(out, x)\n\n        # return out + x\n\n\nclass FeatureFusionBlock_custom(nn.Module):\n    \"\"\"Feature fusion block.\n    \"\"\"\n\n    def __init__(self, 
features, activation, deconv=False, bn=False, expand=False, align_corners=True):\n        \"\"\"Init.\n\n        Args:\n            features (int): number of features\n        \"\"\"\n        super(FeatureFusionBlock_custom, self).__init__()\n\n        self.deconv = deconv\n        self.align_corners = align_corners\n\n        self.groups = 1\n\n        self.expand = expand\n        out_features = features\n        if self.expand == True:\n            out_features = features // 2\n\n        self.out_conv = nn.Conv2d(features, out_features, kernel_size=1, stride=1, padding=0, bias=True, groups=1)\n\n        self.resConfUnit1 = ResidualConvUnit_custom(features, activation, bn)\n        self.resConfUnit2 = ResidualConvUnit_custom(features, activation, bn)\n\n        self.skip_add = nn.quantized.FloatFunctional()\n\n    def forward(self, *xs):\n        \"\"\"Forward pass.\n\n        Returns:\n            tensor: output\n        \"\"\"\n        output = xs[0]\n\n        if len(xs) == 2:\n            res = self.resConfUnit1(xs[1])\n            output = self.skip_add.add(output, res)\n            # output += res\n\n        output = self.resConfUnit2(output)\n\n        output = nn.functional.interpolate(\n            output, scale_factor=2, mode=\"bilinear\", align_corners=self.align_corners\n        )\n\n        output = self.out_conv(output)\n\n        return output\n"
  },
  {
    "path": "src/models/conv2d_layers.py",
    "content": "\"\"\" Conv2D w/ SAME padding, CondConv, MixedConv\n\nA collection of conv layers and padding helpers needed by EfficientNet, MixNet, and\nMobileNetV3 models that maintain weight compatibility with original Tensorflow models.\n\nCopyright 2020 Ross Wightman\n\"\"\"\nimport collections.abc\nimport math\nfrom functools import partial\nfrom itertools import repeat\nfrom typing import Tuple, Optional\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n# From PyTorch internals\ndef _ntuple(n):\n    def parse(x):\n        if isinstance(x, collections.abc.Iterable):\n            return x\n        return tuple(repeat(x, n))\n    return parse\n\n\n_single = _ntuple(1)\n_pair = _ntuple(2)\n_triple = _ntuple(3)\n_quadruple = _ntuple(4)\n\n\ndef _is_static_pad(kernel_size, stride=1, dilation=1, **_):\n    return stride == 1 and (dilation * (kernel_size - 1)) % 2 == 0\n\n\ndef _get_padding(kernel_size, stride=1, dilation=1, **_):\n    padding = ((stride - 1) + dilation * (kernel_size - 1)) // 2\n    return padding\n\n\ndef _calc_same_pad(i: int, k: int, s: int, d: int):\n    return max((-(i // -s) - 1) * s + (k - 1) * d + 1 - i, 0)\n\n\ndef _same_pad_arg(input_size, kernel_size, stride, dilation):\n    ih, iw = input_size\n    kh, kw = kernel_size\n    pad_h = _calc_same_pad(ih, kh, stride[0], dilation[0])\n    pad_w = _calc_same_pad(iw, kw, stride[1], dilation[1])\n    return [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2]\n\n\ndef _split_channels(num_chan, num_groups):\n    split = [num_chan // num_groups for _ in range(num_groups)]\n    split[0] += num_chan - sum(split)\n    return split\n\n\ndef conv2d_same(\n        x, weight: torch.Tensor, bias: Optional[torch.Tensor] = None, stride: Tuple[int, int] = (1, 1),\n        padding: Tuple[int, int] = (0, 0), dilation: Tuple[int, int] = (1, 1), groups: int = 1):\n    ih, iw = x.size()[-2:]\n    kh, kw = weight.size()[-2:]\n    pad_h = _calc_same_pad(ih, 
kh, stride[0], dilation[0])\n    pad_w = _calc_same_pad(iw, kw, stride[1], dilation[1])\n    x = F.pad(x, [pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2])\n    return F.conv2d(x, weight, bias, stride, (0, 0), dilation, groups)\n\n\nclass Conv2dSame(nn.Conv2d):\n    \"\"\" Tensorflow like 'SAME' convolution wrapper for 2D convolutions\n    \"\"\"\n\n    # pylint: disable=unused-argument\n    def __init__(self, in_channels, out_channels, kernel_size, stride=1,\n                 padding=0, dilation=1, groups=1, bias=True):\n        super(Conv2dSame, self).__init__(\n            in_channels, out_channels, kernel_size, stride, 0, dilation, groups, bias)\n\n    def forward(self, x):\n        return conv2d_same(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)\n\n\nclass Conv2dSameExport(nn.Conv2d):\n    \"\"\" ONNX export friendly Tensorflow like 'SAME' convolution wrapper for 2D convolutions\n\n    NOTE: This does not currently work with torch.jit.script\n    \"\"\"\n\n    # pylint: disable=unused-argument\n    def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1, groups=1, bias=True):\n        super(Conv2dSameExport, self).__init__(\n            in_channels, out_channels, kernel_size, stride, 0, dilation, groups, bias)\n        self.pad = None\n        self.pad_input_size = (0, 0)\n\n    def forward(self, x):\n        input_size = x.size()[-2:]\n        if self.pad is None:\n            pad_arg = _same_pad_arg(input_size, self.weight.size()[-2:], self.stride, self.dilation)\n            self.pad = nn.ZeroPad2d(pad_arg)\n            self.pad_input_size = input_size\n\n        if self.pad is not None:\n            x = self.pad(x)\n        return F.conv2d(\n            x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)\n\n\ndef get_padding_value(padding, kernel_size, **kwargs):\n    dynamic = False\n    if isinstance(padding, str):\n        # for any string padding, 
the padding will be calculated for you, one of three ways\n        padding = padding.lower()\n        if padding == 'same':\n            # TF compatible 'SAME' padding, has a performance and GPU memory allocation impact\n            if _is_static_pad(kernel_size, **kwargs):\n                # static case, no extra overhead\n                padding = _get_padding(kernel_size, **kwargs)\n            else:\n                # dynamic padding\n                padding = 0\n                dynamic = True\n        elif padding == 'valid':\n            # 'VALID' padding, same as padding=0\n            padding = 0\n        else:\n            # Default to PyTorch style 'same'-ish symmetric padding\n            padding = _get_padding(kernel_size, **kwargs)\n    return padding, dynamic\n\n\ndef create_conv2d_pad(in_chs, out_chs, kernel_size, **kwargs):\n    padding = kwargs.pop('padding', '')\n    kwargs.setdefault('bias', False)\n    padding, is_dynamic = get_padding_value(padding, kernel_size, **kwargs)\n    if is_dynamic:\n        if is_exportable():\n            assert not is_scriptable()\n            return Conv2dSameExport(in_chs, out_chs, kernel_size, **kwargs)\n        else:\n            return Conv2dSame(in_chs, out_chs, kernel_size, **kwargs)\n    else:\n        return nn.Conv2d(in_chs, out_chs, kernel_size, padding=padding, **kwargs)\n\n\nclass MixedConv2d(nn.ModuleDict):\n    \"\"\" Mixed Grouped Convolution\n    Based on MDConv and GroupedConv in MixNet impl:\n      https://github.com/tensorflow/tpu/blob/master/models/official/mnasnet/mixnet/custom_layers.py\n    \"\"\"\n\n    def __init__(self, in_channels, out_channels, kernel_size=3,\n                 stride=1, padding='', dilation=1, depthwise=False, **kwargs):\n        super(MixedConv2d, self).__init__()\n\n        kernel_size = kernel_size if isinstance(kernel_size, list) else [kernel_size]\n        num_groups = len(kernel_size)\n        in_splits = _split_channels(in_channels, num_groups)\n        out_splits 
= _split_channels(out_channels, num_groups)\n        self.in_channels = sum(in_splits)\n        self.out_channels = sum(out_splits)\n        for idx, (k, in_ch, out_ch) in enumerate(zip(kernel_size, in_splits, out_splits)):\n            conv_groups = out_ch if depthwise else 1\n            self.add_module(\n                str(idx),\n                create_conv2d_pad(\n                    in_ch, out_ch, k, stride=stride,\n                    padding=padding, dilation=dilation, groups=conv_groups, **kwargs)\n            )\n        self.splits = in_splits\n\n    def forward(self, x):\n        x_split = torch.split(x, self.splits, 1)\n        x_out = [conv(x_split[i]) for i, conv in enumerate(self.values())]\n        x = torch.cat(x_out, 1)\n        return x\n\n\ndef get_condconv_initializer(initializer, num_experts, expert_shape):\n    def condconv_initializer(weight):\n        \"\"\"CondConv initializer function.\"\"\"\n        num_params = np.prod(expert_shape)\n        if (len(weight.shape) != 2 or weight.shape[0] != num_experts or\n                weight.shape[1] != num_params):\n            raise (ValueError(\n                'CondConv variables must have shape [num_experts, num_params]'))\n        for i in range(num_experts):\n            initializer(weight[i].view(expert_shape))\n    return condconv_initializer\n\n\nclass CondConv2d(nn.Module):\n    \"\"\" Conditional Convolution\n    Inspired by: https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/condconv/condconv_layers.py\n\n    Grouped convolution hackery for parallel execution of the per-sample kernel filters inspired by this discussion:\n    https://github.com/pytorch/pytorch/issues/17983\n    \"\"\"\n    __constants__ = ['bias', 'in_channels', 'out_channels', 'dynamic_padding']\n\n    def __init__(self, in_channels, out_channels, kernel_size=3,\n                 stride=1, padding='', dilation=1, groups=1, bias=False, num_experts=4):\n        super(CondConv2d, self).__init__()\n\n 
       self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.kernel_size = _pair(kernel_size)\n        self.stride = _pair(stride)\n        padding_val, is_padding_dynamic = get_padding_value(\n            padding, kernel_size, stride=stride, dilation=dilation)\n        self.dynamic_padding = is_padding_dynamic  # if in forward to work with torchscript\n        self.padding = _pair(padding_val)\n        self.dilation = _pair(dilation)\n        self.groups = groups\n        self.num_experts = num_experts\n\n        self.weight_shape = (self.out_channels, self.in_channels // self.groups) + self.kernel_size\n        weight_num_param = 1\n        for wd in self.weight_shape:\n            weight_num_param *= wd\n        self.weight = torch.nn.Parameter(torch.Tensor(self.num_experts, weight_num_param))\n\n        if bias:\n            self.bias_shape = (self.out_channels,)\n            self.bias = torch.nn.Parameter(torch.Tensor(self.num_experts, self.out_channels))\n        else:\n            self.register_parameter('bias', None)\n\n        self.reset_parameters()\n\n    def reset_parameters(self):\n        init_weight = get_condconv_initializer(\n            partial(nn.init.kaiming_uniform_, a=math.sqrt(5)), self.num_experts, self.weight_shape)\n        init_weight(self.weight)\n        if self.bias is not None:\n            fan_in = np.prod(self.weight_shape[1:])\n            bound = 1 / math.sqrt(fan_in)\n            init_bias = get_condconv_initializer(\n                partial(nn.init.uniform_, a=-bound, b=bound), self.num_experts, self.bias_shape)\n            init_bias(self.bias)\n\n    def forward(self, x, routing_weights):\n        B, C, H, W = x.shape\n        weight = torch.matmul(routing_weights, self.weight)\n        new_weight_shape = (B * self.out_channels, self.in_channels // self.groups) + self.kernel_size\n        weight = weight.view(new_weight_shape)\n        bias = None\n        if self.bias is not None:\n            
bias = torch.matmul(routing_weights, self.bias)\n            bias = bias.view(B * self.out_channels)\n        # move batch elements with channels so each batch element can be efficiently convolved with separate kernel\n        x = x.view(1, B * C, H, W)\n        if self.dynamic_padding:\n            out = conv2d_same(\n                x, weight, bias, stride=self.stride, padding=self.padding,\n                dilation=self.dilation, groups=self.groups * B)\n        else:\n            out = F.conv2d(\n                x, weight, bias, stride=self.stride, padding=self.padding,\n                dilation=self.dilation, groups=self.groups * B)\n        out = out.permute([1, 0, 2, 3]).view(B, self.out_channels, out.shape[-2], out.shape[-1])\n\n        # Literal port (from TF definition)\n        # x = torch.split(x, 1, 0)\n        # weight = torch.split(weight, 1, 0)\n        # if self.bias is not None:\n        #     bias = torch.matmul(routing_weights, self.bias)\n        #     bias = torch.split(bias, 1, 0)\n        # else:\n        #     bias = [None] * B\n        # out = []\n        # for xi, wi, bi in zip(x, weight, bias):\n        #     wi = wi.view(*self.weight_shape)\n        #     if bi is not None:\n        #         bi = bi.view(*self.bias_shape)\n        #     out.append(self.conv_fn(\n        #         xi, wi, bi, stride=self.stride, padding=self.padding,\n        #         dilation=self.dilation, groups=self.groups))\n        # out = torch.cat(out, 0)\n        return out\n\n\ndef select_conv2d(in_chs, out_chs, kernel_size, **kwargs):\n    assert 'groups' not in kwargs  # only use 'depthwise' bool arg\n    if isinstance(kernel_size, list):\n        assert 'num_experts' not in kwargs  # MixNet + CondConv combo not supported currently\n        # We're going to use only lists for defining the MixedConv2d kernel groups,\n        # ints, tuples, other iterables will continue to pass to normal conv and specify h, w.\n        m = MixedConv2d(in_chs, out_chs, 
kernel_size, **kwargs)\n    else:\n        depthwise = kwargs.pop('depthwise', False)\n        groups = out_chs if depthwise else 1\n        if 'num_experts' in kwargs and kwargs['num_experts'] > 0:\n            m = CondConv2d(in_chs, out_chs, kernel_size, groups=groups, **kwargs)\n        else:\n            m = create_conv2d_pad(in_chs, out_chs, kernel_size, groups=groups, **kwargs)\n    return m"
  },
  {
    "path": "src/models/efficientlitesld.py",
    "content": "import torch\nimport torch.nn as nn\n\nfrom .blocks import _make_encoder\n\n\nclass ASPP(nn.Module):\n    def __init__(self, in_ch, d1, d2, d3, d4, reduction=4):\n        super(ASPP, self).__init__()\n        self.aspp_d1 = nn.Sequential(\n            nn.Conv2d(in_ch, in_ch // reduction, 3, padding=d1, dilation=d1),\n            nn.BatchNorm2d(in_ch // reduction),\n            nn.ReLU(inplace=True)\n        )\n        self.aspp_d2 = nn.Sequential(\n            nn.Conv2d(in_ch, in_ch // reduction, 3, padding=d2, dilation=d2),\n            nn.BatchNorm2d(in_ch // reduction),\n            nn.ReLU(inplace=True)\n        )\n        self.aspp_d3 = nn.Sequential(\n            nn.Conv2d(in_ch, in_ch // reduction, 3, padding=d3, dilation=d3),\n            nn.BatchNorm2d(in_ch // reduction),\n            nn.ReLU(inplace=True)\n        )\n\n        self.aspp_d4 = nn.Sequential(\n            nn.Conv2d(in_ch, in_ch // reduction, 3, padding=d4, dilation=d4),\n            nn.BatchNorm2d(in_ch // reduction),\n            nn.ReLU(inplace=True)\n        )\n\n    def forward(self, x):\n        d1 = self.aspp_d1(x)\n        d2 = self.aspp_d2(x)\n        d3 = self.aspp_d3(x)\n        d4 = self.aspp_d4(x)\n        return torch.cat((d1, d2, d3, d4), dim=1)\n\n\nclass EfficientNetSLD(torch.nn.Module):\n    \"\"\"Scene landmark detection network: predicts one heatmap per landmark.\n    \"\"\"\n\n    def __init__(self, path=None, num_landmarks=200, output_downsample=4, features=320):\n        \"\"\"Init.\n\n        The encoder backbone is efficientnetlite0 (built by _make_encoder).\n\n        Args:\n            path (str, optional): Path to a saved state dict to load. Defaults to None.\n            num_landmarks (int, optional): Number of landmarks (output heatmaps). Defaults to 200.\n            output_downsample (int, optional): Downsampling factor of the output heatmaps\n                relative to the input image. Defaults to 4.\n            features (int, optional): Number of channels of the encoder output fed to the\n                ASPP module. Defaults to 320.\n
        \"\"\"\n        super(EfficientNetSLD, self).__init__()\n\n        self.pretrained, _ = _make_encoder(use_pretrained=True, output_downsample=output_downsample)\n\n        self.aspp = nn.Sequential(\n            ASPP(in_ch=features, d1=1, d2=2, d3=3, d4=4, reduction=4),\n        )\n\n        self.heatmap_outputs_res1 = nn.Sequential(\n            nn.Conv2d(features, num_landmarks, kernel_size=1, stride=1, padding=0)\n        )\n        self.heatmap_outputs_res2 = None\n\n        if output_downsample == 2:\n            input_channels = features + num_landmarks\n            output_channels = features\n\n            self.heatmap_features_res2 = nn.Sequential(nn.ConvTranspose2d(in_channels=input_channels,\n                                                                          out_channels=output_channels,\n                                                                          kernel_size=4, stride=2, padding=1,\n                                                                          bias=False),\n                                                       nn.BatchNorm2d(output_channels),\n                                                       nn.ReLU(inplace=True)\n                                                       )\n            self.heatmap_outputs_res2 = nn.Conv2d(output_channels, num_landmarks, kernel_size=1, stride=1, bias=False)\n\n        if path:\n            self.load_state_dict(torch.load(path, map_location='cpu'))\n\n    def forward(self, x):\n        \"\"\"Forward pass.\n\n        Args:\n            x (tensor): input data (image)\n\n        Returns:\n            dict: heatmap predictions\n            ['1']: heatmaps at 1/output_downsample of the input spatial resolution\n            ['2']: heatmaps upsampled 2x from ['1'] (only when output_downsample == 2, otherwise None)\n        \"\"\"\n\n        layer_1 = self.pretrained.layer1(x)\n        layer_2 = self.pretrained.layer2(layer_1)\n        layer_3 = self.pretrained.layer3(layer_2)\n        layer_4 = self.pretrained.layer4(layer_3)\n        y1 = self.aspp(layer_4)\n        z1 = 
self.heatmap_outputs_res1(y1)\n\n        z2 = None\n        if self.heatmap_outputs_res2 is not None:\n            y2 = self.heatmap_features_res2(torch.cat((y1, z1), dim=1))\n            z2 = self.heatmap_outputs_res2(y2)\n\n        return {'1': z1, '2': z2}\n"
  },
  {
    "path": "src/requirements.txt",
    "content": "# Scene Landmark Detector Requirements\n# Usage: pip install -r requirements.txt\n\nmatplotlib>=3.2.2\nnumpy>=1.22.3\nPillow>=8.2.0\nscipy>=1.6.2\nopen3d\n#torch==1.10.0+cu113\n#torchvision==0.11.1+cu113\n#torchaudio==0.10.0+cu113\ntqdm>=4.59.0\ngeffnet\n"
  },
  {
    "path": "src/run_inference.py",
    "content": "import os\nimport statistics as st\n\nif __name__ == '__main__':\n\n    home_dir = os.path.expanduser(\"~\")\n    # Specify the dataset path, the location of the checkpoints and the experiment name.\n    checkpoint_dir = os.path.join(home_dir, 'data/checkpoints')\n    dataset_dir = os.path.join(home_dir, 'data/indoor6')\n    experiment = '1000-125_v10'\n    scene_names = ['scene1', 'scene2a', 'scene3', 'scene4a', 'scene5', 'scene6']\n\n    # Run inference for all six scenes of the indoor6 dataset.\n    for scene_name in scene_names:\n        command = 'python ./local_inference.py --experiment_file %s_%s.txt --dataset_dir %s --checkpoint_dir %s' % (scene_name, experiment, dataset_dir, checkpoint_dir)\n        os.system(command)\n\n    # Collect the metrics. The lines of each metrics.txt alternate: even lines hold a metric\n    # stored as a fraction (reported as a percentage below), odd lines hold the rate in images/sec.\n    t1 = []\n    t2 = []\n    for scene_name in scene_names:\n        subfolder = '%s_%s' % (scene_name, experiment)\n        mfn = os.path.join(checkpoint_dir, subfolder, \"metrics.txt\")\n        with open(mfn, 'r') as mfd:\n            for idx, line in enumerate(mfd):\n                if idx % 2 == 0:\n                    t1.append(float(line))\n                else:\n                    t2.append(float(line))\n\n    print(t1)\n    print(t2)\n    metricPcnt = 100.0 * st.fmean(t1)\n    print('   mean = %s pcnt' % str(metricPcnt))\n    print('   rate = %s imgs./sec.' % str(st.fmean(t2)))\n\n    fname = 'RESULTS-%s.txt' % experiment\n    ffn = os.path.join(checkpoint_dir, fname)\n    with open(ffn, 'w') as ffd:\n        ffd.write(f\"{metricPcnt}\\n{st.fmean(t2)}\\n\")"
  },
  {
    "path": "src/run_training.py",
    "content": "import os\n\nif __name__ == '__main__':\n\n    home_dir = os.path.expanduser(\"~\")\n\n    # Specify the paths to the dataset and the output folders.\n    dataset_dir = os.path.join(home_dir, \"data/indoor6\")\n    output_dir = os.path.join(home_dir, \"data/outputs\")\n\n    # Specify a version number, which can be incremented when training multiple variants on\n    # the same scene.\n    version_no = 10\n\n    # Specify the scene name.\n    scene_name = 'scene6'\n\n    # Specify the landmark file.\n    landmark_config = 'landmarks/landmarks-1000v10'\n\n    # Specify the visibility file.\n    visibility_config = 'landmarks/visibility-1000v10_depth_normal'\n\n    # Specify the batch size for the minibatches used for training.\n    training_batch_size = 8\n\n    # Specify the downsampling factor of the output heatmap relative to the input image.\n    output_downsample = 8\n\n    # Specify the number of epochs to use during training.\n    num_epochs = 200\n\n    # Specify the number of landmarks. It should be identical to the number of landmarks in\n    # the landmark file specified by the landmark_config parameter.\n    num_landmarks = 1000\n\n    # Specify the number of landmarks in each subset when the set of landmarks is\n    # partitioned into mutually exclusive subsets. The value should exactly divide the\n    # landmark count. For example, when num_landmarks = 1000 and block_size = 125, we get\n    # 1000/125 = 8 subsets of landmarks.\n    block_size = 125\n\n    # Specify which subset to train the model for. For example, when num_landmarks = 1000\n    # and block_size = 125, subset_index = 0 selects the landmarks with indices in [0, 125),\n    # i.e. 0 to 124, and subset_index = 1 selects the indices in [125, 250).\n
    subset_index = 0\n\n    # Format the experiment name.\n    experiment_name = '%s_%d-%d_v%d' % (scene_name, num_landmarks, block_size, version_no)\n\n    # Compute the landmark index range of the subset and format the model_dir string.\n    landmark_start_index = subset_index * block_size\n    landmark_stop_index = (subset_index + 1) * block_size\n\n    if landmark_start_index < 0 or landmark_stop_index > num_landmarks:\n        raise ValueError('landmark indices are outside valid range!')\n    else:\n        tmp = '%s-%03d-%03d' % (scene_name, landmark_start_index, landmark_stop_index)\n        model_dir = os.path.join(output_dir, experiment_name, tmp)\n\n        # Create the model_dir folder.\n        os.makedirs(model_dir, exist_ok=True)\n\n        # Create the command line string for the training job.\n        cmd = 'python ./local_training.py'\n        cmd += ' --dataset_dir %s' % dataset_dir\n        cmd += ' --scene_id %s' % scene_name\n        cmd += ' --experiment_file %s.txt' % experiment_name\n        cmd += ' --num_landmarks %d' % num_landmarks\n        cmd += ' --block_size %d' % block_size\n        cmd += ' --landmark_config %s' % landmark_config\n        cmd += ' --visibility_config %s' % visibility_config\n        cmd += ' --subset_index %d' % subset_index\n        cmd += ' --output_dir %s' % output_dir\n        cmd += ' --model_dir %s' % model_dir\n        cmd += ' --training_batch_size %d' % training_batch_size\n        cmd += ' --output_downsample %d' % output_downsample\n        cmd += ' --num_epochs %d' % num_epochs\n\n        # Launch training.\n        os.system(cmd)\n"
  },
  {
    "path": "src/train.py",
    "content": "from datetime import datetime\nimport logging\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport os\nimport pickle\nimport torch\nfrom torch.utils.data import DataLoader\nfrom tqdm import tqdm\n\nfrom inference import *\nfrom dataloader.indoor6 import *\nfrom models.efficientlitesld import EfficientNetSLD\nfrom utils.heatmap import generate_heat_maps_gpu\n\n\ndef plotting(ROOT_FOLDER):\n    with open('%s/stats.pkl' % ROOT_FOLDER, 'rb') as f:\n        data = pickle.load(f)\n    fig, axs = plt.subplots(4, 1)\n\n    t = 0\n    s = []\n    epoch = 0\n    for i in range(len(data['train'])-1):\n        if data['train'][i+1]['ep'] == epoch + 1:\n            epoch += 1\n        else:\n            t += 1\n            s.append(data['train'][i]['loss'])\n\n    t = np.arange(0, t)\n    s = np.array(s)\n    s = np.convolve(s, np.ones(10)/10., mode='same')\n\n    axs[0].plot(t, np.log(s))\n    axs[0].set(xlabel='iterations', ylabel='loss', title='')\n    axs[0].grid()\n\n    # 'max_grad' is only logged by train_patches(); entries without it are plotted as NaN.\n    max_grad = np.array([np.squeeze(data['train'][i].get('max_grad', np.nan)) for i in range(len(data['train']))], dtype=float)\n    axs[1].plot(np.arange(0, len(max_grad)), np.log10(max_grad))\n    axs[1].set(xlabel='iterations', ylabel='max gradient', title='')\n    axs[1].grid()\n\n    t = np.array([data['eval'][i]['ep'] for i in range(len(data['eval']))])\n    s = np.array([np.median(data['eval'][i]['pixel_error']) for i in range(len(data['eval']))])\n    axs[2].plot(t, s)\n    axs[2].set(xlabel='epoch', ylabel='Pixel error', title='')\n    axs[2].grid()\n    axs[2].set_yticks(np.arange(0, 20, 5), minor=False)\n    axs[2].set_ylim(0, 20)\n\n    r = np.array([data['eval'][i]['recall'] for i in range(len(data['eval']))])\n    axs[3].plot(t, r)\n    axs[3].set(xlabel='epoch', ylabel='recall', title='')\n    axs[3].grid()\n\n    plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.8, hspace=1.0)\n    plt.close()\n    fig.savefig('%s/curve_train_test.png' % ROOT_FOLDER, format='png', dpi=120)\n\n\ndef train(opt):\n\n    if 
not os.path.exists(opt.output_folder):\n        os.makedirs(opt.output_folder)\n\n    logging.basicConfig(filename='%s/training.log' % opt.output_folder, filemode='a', level=logging.DEBUG, format='')\n    logging.info(\"Scene Landmark Detector Training\")\n    print('Start training ...')\n\n    stats_pkl_logging = {'train': [], 'eval': []}\n\n    device = opt.gpu_device\n\n    assert len(opt.landmark_indices) == 0 or len(opt.landmark_indices) == 2, \"landmark indices must be empty or length 2\"\n\n    train_dataset = Indoor6(landmark_idx=np.arange(opt.landmark_indices[0],\n                                                   opt.landmark_indices[1]) if len(opt.landmark_indices) == 2 else [None],\n                            scene_id=opt.scene_id,\n                            mode='train',\n                            root_folder=opt.dataset_folder,\n                            input_image_downsample=2,\n                            landmark_config=opt.landmark_config,\n                            visibility_config=opt.visibility_config,\n                            skip_image_index=1)\n\n    train_dataloader = DataLoader(dataset=train_dataset, num_workers=4, batch_size=opt.training_batch_size, shuffle=True,\n                                  pin_memory=True)\n    \n    ## Save the trained landmark configurations\n    np.savetxt(os.path.join(opt.output_folder, 'landmarks.txt'), train_dataset.landmark)\n    np.savetxt(os.path.join(opt.output_folder, 'visibility.txt'), train_dataset.visibility, fmt='%d')\n\n    num_landmarks = train_dataset.landmark.shape[1]\n\n    if opt.model == 'efficientnet':\n        cnn = EfficientNetSLD(num_landmarks=num_landmarks, output_downsample=opt.output_downsample).to(device=device)\n\n    optimizer = torch.optim.AdamW(cnn.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-4, weight_decay=0.01)\n    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)\n\n    lowest_median_angular_error = 1e6\n\n    for epoch in 
range(opt.num_epochs):\n        # Training\n        training_loss = 0\n        for idx, batch in enumerate(tqdm(train_dataloader)):\n            cnn.train()\n\n            images = batch['image'].to(device=device)\n            B, _, H, W = images.shape\n            visibility = batch['visibility'].reshape(B, num_landmarks).to(device=device)\n            landmark2d = batch['landmark2d'].reshape(B, 2, num_landmarks).to(device=device)\n\n            # Resolution configure\n            landmark2d /= opt.output_downsample\n            heat_map_size = [H // opt.output_downsample, W // opt.output_downsample]\n\n            gt = generate_heat_maps_gpu(landmark2d,\n                                        visibility,\n                                        heat_map_size,\n                                        sigma=torch.tensor([5.], dtype=torch.float, device=device, requires_grad=False))\n            gt.requires_grad = False\n\n            # Clear gradient\n            optimizer.zero_grad()\n\n            # CNN forward pass\n            pred = cnn(images)['1']\n\n            # Compute loss and do backward pass\n            losses = torch.sum((pred[visibility != 0.5] - gt[visibility != 0.5]) ** 2)\n\n            training_loss += losses.detach().clone().item()\n            losses.backward()\n            optimizer.step()\n\n            logging.info('epoch %d, iter %d, loss %4.4f' % (epoch, idx, losses.item()))\n            stats_pkl_logging['train'].append({'ep': epoch, 'iter': idx, 'loss': losses.item()})\n\n        # Saving the ckpt\n        path = '%s/model-latest.ckpt' % (opt.output_folder)\n        torch.save(cnn.state_dict(), path)\n\n        if scheduler.get_last_lr()[-1] > 5e-5:\n            scheduler.step()\n\n        opt.pretrained_model = path\n        eval_stats = inference(opt, opt_tight_thr=1e-3, minimal_tight_thr=1e-3, mode='val')\n\n        median_angular_error = np.median(eval_stats['angular_error'])\n\n        if (median_angular_error < 
lowest_median_angular_error):\n            lowest_median_angular_error = median_angular_error\n            path = '%s/model-best_median.ckpt' % (opt.output_folder)\n            torch.save(cnn.state_dict(), path)\n\n        # Date and time ('datetime' is imported as the class, not the module)\n        datestring = datetime.now().strftime(\"%Y-%m-%d_%H-%M-%S\")\n\n        # Print, log and update plot\n        stats_pkl_logging['eval'].append(\n            {'ep': epoch,\n             'angular_error': eval_stats['angular_error'],\n             'pixel_error': eval_stats['pixel_error'],\n             'recall': eval_stats['r5p5']\n             })\n\n        str_log = 'epoch %3d: [%s] ' \\\n                  'tr_loss= %10.2f, ' \\\n                  'lowest_median= %8.4f deg. ' \\\n                  'recall= %2.4f ' \\\n                  'angular-err(deg.)= [%7.4f %7.4f %7.4f]  ' \\\n                  'pixel-err= [%4.3f %4.3f %4.3f] [mean/med./min] ' % (epoch, datestring, training_loss,\n                                                                        lowest_median_angular_error,\n                                                                        eval_stats['r5p5'],\n                                                                        np.mean(eval_stats['angular_error']),\n                                                                        np.median(eval_stats['angular_error']),\n                                                                        np.min(eval_stats['angular_error']),\n                                                                        np.mean(eval_stats['pixel_error']),\n                                                                        np.median(eval_stats['pixel_error']),\n                                                                        np.min(eval_stats['pixel_error']))\n        print(str_log)\n        logging.info(str_log)\n\n        with open('%s/stats.pkl' % opt.output_folder, 'wb') 
as f:\n            pickle.dump(stats_pkl_logging, f)\n        plotting(opt.output_folder)\n\n\ndef train_patches(opt):\n\n    if not os.path.exists(opt.output_folder):\n        os.makedirs(opt.output_folder)\n\n    logging.basicConfig(filename='%s/training.log' % opt.output_folder, filemode='a', level=logging.DEBUG, format='')\n    logging.info(\"Scene Landmark Detector Training Patches\")\n    stats_pkl_logging = {'train': [], 'eval': []}\n\n    device = opt.gpu_device\n\n    assert len(opt.landmark_indices) == 0 or len(opt.landmark_indices) == 2, \"landmark indices must be empty or length 2\"\n    train_dataset = Indoor6Patches(landmark_idx=np.arange(opt.landmark_indices[0],\n                                                   opt.landmark_indices[1]) if len(opt.landmark_indices) == 2 else [None],\n                            scene_id=opt.scene_id,\n                            mode='train',\n                            root_folder=opt.dataset_folder,\n                            input_image_downsample=2,\n                            landmark_config=opt.landmark_config,\n                            visibility_config=opt.visibility_config,\n                            skip_image_index=1)\n\n    train_dataloader = DataLoader(dataset=train_dataset, num_workers=4, batch_size=opt.training_batch_size, shuffle=True,\n                                  pin_memory=True)\n    \n    ## Save the trained landmark configurations\n    np.savetxt(os.path.join(opt.output_folder, 'landmarks.txt'), train_dataset.landmark)\n    np.savetxt(os.path.join(opt.output_folder, 'visibility.txt'), train_dataset.visibility, fmt='%d')\n\n    num_landmarks = train_dataset.landmark.shape[1]\n\n    if opt.model == 'efficientnet':\n        cnn = EfficientNetSLD(num_landmarks=num_landmarks, output_downsample=opt.output_downsample).to(device=device)\n\n    optimizer = torch.optim.AdamW(cnn.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-4, weight_decay=0.01)\n    scheduler = 
torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.5)\n\n    lowest_median_angular_error = 1e6\n\n    for epoch in range(opt.num_epochs):\n        # Training\n        training_loss = 0\n        for idx, batch in enumerate(tqdm(train_dataloader)):\n\n            cnn.train()\n\n            B1, B2, _, H, W = batch['patches'].shape\n            B = B1 * B2\n            patches = batch['patches']\n            visibility = batch['visibility']\n            landmark2d = batch['landmark2d']\n\n            # highest supported precision for MPS is FP32\n            if device.lower() == 'mps':\n                patches = patches.float()\n                visibility = visibility.float()\n                landmark2d = landmark2d.float()\n\n            patches = patches.reshape(B, 3, H, W).to(device=device)\n            visibility = visibility.reshape(B, num_landmarks).to(device=device)\n            landmark2d = landmark2d.reshape(B, 2, num_landmarks).to(device=device)\n\n            # Batch randomization\n\n            input_batch_random = np.random.permutation(B)\n            landmark2d_rand = [landmark2d[input_batch_random[b:b + 1]] for b in range(B)]\n            patches_rand = [patches[input_batch_random[b:b + 1]] for b in range(B)]\n            visibility_rand = [visibility[input_batch_random[b:b + 1]] for b in range(B)]\n\n            landmark2d_rand = torch.cat(landmark2d_rand, dim=0)\n            patches_rand = torch.cat(patches_rand, dim=0)\n            visibility_rand = torch.cat(visibility_rand, axis=0)\n\n            # Resolution configure\n            landmark2d_rand /= opt.output_downsample\n            heat_map_size = [H // opt.output_downsample, W // opt.output_downsample]\n\n            gt = generate_heat_maps_gpu(landmark2d_rand,\n                                        visibility_rand,\n                                        heat_map_size,\n                                        sigma=torch.tensor([20. 
/ opt.output_downsample], dtype=torch.float, device=device, requires_grad=False))\n            gt.requires_grad = False\n\n            # Clear gradient\n            optimizer.zero_grad()\n\n            # CNN forward pass\n            pred = cnn(patches_rand)['1']\n\n            # Compute loss and do backward pass\n            losses = torch.sum((pred[visibility_rand != 0.5] - gt[visibility_rand != 0.5]) ** 2)\n\n            training_loss += losses.detach().clone().item()\n            losses.backward()\n\n            # Largest absolute gradient element over all parameters. Parameters that do not\n            # contribute to the loss (e.g. the res2 head when output_downsample == 2, since only\n            # output '1' enters the loss) have no gradient.\n            m = torch.tensor([0.0]).to(device)\n            for p in cnn.parameters():\n                if p.grad is not None:\n                    m = torch.max(torch.max(torch.abs(p.grad.data)), m)\n\n            # After the first epoch, skip batches with a very large gradient element and roll\n            # the model back to the best checkpoint instead of taking the optimizer step.\n            if epoch == 0 or m < 1e4:\n                optimizer.step()\n            else:\n                cnn.load_state_dict(torch.load('%s/model-best_median.ckpt' % (opt.output_folder)))\n                cnn.to(device=device)\n\n            logging.info('epoch %d, iter %d, loss %4.4f' % (epoch, idx, losses.item()))\n            stats_pkl_logging['train'].append({'ep': epoch, 'iter': idx, 'loss': losses.item(), 'max_grad': m.cpu().numpy()})\n\n        # Saving the ckpt\n        path = '%s/model-latest.ckpt' % (opt.output_folder)\n        torch.save(cnn.state_dict(), path)\n\n        if scheduler.get_last_lr()[-1] > 5e-5:\n            scheduler.step()\n\n        opt.pretrained_model = [path]\n        eval_stats = inference(opt, opt_tight_thr=1e-3, minimal_tight_thr=1e-3, mode='val')\n\n        median_angular_error = np.median(eval_stats['angular_error'])\n        path = '%s/model-best_median.ckpt' % (opt.output_folder)\n\n        if (median_angular_error < lowest_median_angular_error):\n            lowest_median_angular_error = median_angular_error\n            torch.save(cnn.state_dict(), path)\n\n        # Make sure a best checkpoint exists even when no correspondences were found.\n        if not os.path.exists(path) and len(eval_stats['angular_error']) == 0:\n            torch.save(cnn.state_dict(), path)\n\n   
     # date time\n        ts = datetime.now().timestamp()\n        dt = datetime.fromtimestamp(ts)\n        datestring = dt.strftime(\"%Y-%m-%d_%H-%M-%S\")\n\n        # Print, log and update plot\n        stats_pkl_logging['eval'].append(\n            {'ep': epoch,\n             'angular_error': eval_stats['angular_error'],\n             'pixel_error': eval_stats['pixel_error'],\n             'recall': eval_stats['r5p5']\n             })\n\n\n        try:\n            str_log = 'epoch %3d: [%s] ' \\\n                    'tr_loss= %10.2f, ' \\\n                    'lowest_median= %8.4f deg. ' \\\n                    'recall= %2.4f ' \\\n                    'angular-err(deg.)= [%7.4f %7.4f %7.4f]  ' \\\n                    'pixel-err= [%4.3f %4.3f %4.3f] [mean/med./min] ' % (epoch, datestring, training_loss,\n                                                                            lowest_median_angular_error,\n                                                                            eval_stats['r5p5'],\n                                                                            np.mean(eval_stats['angular_error']),\n                                                                            np.median(eval_stats['angular_error']),\n                                                                            np.min(eval_stats['angular_error']),\n                                                                            np.mean(eval_stats['pixel_error']),\n                                                                            np.median(eval_stats['pixel_error']),\n                                                                            np.min(eval_stats['pixel_error']))\n            print(str_log)\n            logging.info(str_log)\n        except ValueError:  #raised if array is empty.\n            str_log = 'epoch %3d: [%s] ' \\\n                        'tr_loss= %10.2f, ' \\\n                        'No correspondences found' % (epoch, datestring, 
training_loss)\n            print(str_log)\n            logging.info(str_log)\n\n        with open('%s/stats.pkl' % opt.output_folder, 'wb') as f:\n            pickle.dump(stats_pkl_logging, f)\n        plotting(opt.output_folder)\n"
  },
  {
    "path": "src/utils/generate_visibility_depth_normal.py",
    "content": "import argparse\nimport copy\nimport fnmatch\nimport numpy as np\nimport open3d as o3d\nimport os\nimport pickle\nfrom PIL import Image\nfrom torch.utils.data import DataLoader\nfrom tqdm import tqdm\n\nimport sys\nsys.path.append(os.path.join(sys.path[0], '..'))\nfrom dataloader.indoor6 import Indoor6\n\ndef extract(opt):\n\n    DATASET_FOLDER = os.path.join(opt.dataset_folder)\n\n    test_dataset = Indoor6(scene_id=opt.scene_id,\n                         mode='all',\n                         root_folder=DATASET_FOLDER,\n                         input_image_downsample=1,\n                         landmark_config=opt.landmark_config,\n                         visibility_config=opt.visibility_config,\n                         skip_image_index=1)\n\n    test_dataloader = DataLoader(dataset=test_dataset, num_workers=1, batch_size=1, shuffle=False, pin_memory=True)\n\n    return test_dataloader, test_dataset\n\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser(\n        description='Scene Landmark Detection',\n        formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n    parser.add_argument(\n        '--dataset_folder', type=str, required=False,\n        help='Root directory, where all data is stored')\n    parser.add_argument(\n        '--output_folder', type=str, required=False,\n        help='Output folder')\n    parser.add_argument(\n        '--landmark_config', type=str, default='landmarks/landmarks-300',\n        help='Landmark configuration.')\n    parser.add_argument(\n        '--visibility_config', type=str, default='landmarks/visibility-300',\n        help='Visibility configuration.')\n    parser.add_argument(\n        '--scene_id', type=str, default='scene1',\n        help='Scene id')\n\n    opt = parser.parse_args()\n    monodepth_folder = os.path.join(opt.dataset_folder, opt.scene_id, 'depth')\n\n    from read_write_models import *\n    cameras, images, points = read_model(os.path.join(opt.dataset_folder, 
'indoor6-colmap/%s/sparse/0' % opt.scene_id), ext='.bin')\n    indoor6_name_2to_colmap_index = {}\n    for k in images:\n        indoor6_name_2to_colmap_index[images[k].name] = k\n        # print(images[k])\n\n    dataloader, data = extract(opt)\n\n    augmented_visibility = copy.deepcopy(data.visibility)\n    monodepth_folder = os.path.join(opt.dataset_folder,\n                                    opt.scene_id,\n                                    'depth')\n\n    count_invalid_images = 0\n\n    ##############################################################\n    ### Creating depth images and augment visibility based on ####\n    ### the consistency between depth and 3D points from colmap ##\n    ##############################################################\n\n    for idx, batch in enumerate(tqdm(dataloader)):        \n        _, _, H, W = batch['image'].shape\n        # batch['intrinsic']\n\n        original_image_name = data.original_image_name(idx)\n        colmap_index = indoor6_name_2to_colmap_index[original_image_name]\n        if images[colmap_index].name != original_image_name:\n            print('indoor6 name: ', data.image_files[idx], ', original name ', original_image_name)        \n\n\n        point3D_ids = images[colmap_index].point3D_ids                \n\n        K = batch['intrinsics'][0].cpu().numpy()\n        R = batch['pose_gt'][0, :3, :3].cpu().numpy()\n        t = batch['pose_gt'][0, :3, 3].cpu().numpy()\n\n        xys = images[colmap_index].xys\n\n        monoscaled_depth_path = os.path.join(monodepth_folder, data.image_files[idx].replace('.jpg', '.scaled_depth.npy'))\n        dmonodense_scaled = None\n        if os.path.exists(monoscaled_depth_path):\n            dmonodense_scaled = np.load(monoscaled_depth_path)\n        # else:\n            # dmonodense = np.load(os.path.join(monodepth_folder, data.image_files[idx].replace('jpg', 'npy')))\n\n            # ds = np.zeros(len(point3D_ids))\n            # dmono = np.zeros(len(point3D_ids))\n     
       # validIdx = 0\n\n            # for i, k in enumerate(point3D_ids):            \n            #     if k != -1:\n            #         Cp = R @ points[k].xyz + t\n            #         xyz = K @ Cp\n            #         proj_x = xyz[0] / xyz[2]\n            #         proj_y = xyz[1] / xyz[2]\n\n            #         px = xys[i][0]\n            #         py = xys[i][1]\n\n            #         if Cp[2] < 15.0 and proj_x >= 0 and proj_x < W and proj_y >= 0 and proj_y < H and np.abs(proj_x-px) < 5.0 and np.abs(proj_y-py) < 5.0:\n            #             ds[validIdx] = Cp[2]\n            #             dmono[validIdx] = dmonodense[int(proj_y), int(proj_x)]\n\n            #             ## Doing sth here to compute surface normal\n            #             validIdx += 1\n            \n            # if validIdx < 10:\n            #     dmonodense_scaled = None\n            #     count_invalid_images += 1\n            # else:\n            #     ds = ds[:validIdx]\n            #     dmono = dmono[:validIdx]\n            #     A = np.array([[np.sum(dmono**2), np.sum(dmono)], [np.sum(dmono), validIdx]])\n            #     b = np.array([np.sum(dmono*ds), np.sum(ds)])\n            #     k = np.linalg.solve(A, b)\n\n            #     dmonodense_scaled = k[0] * dmonodense + k[1]\n            #     np.save(monoscaled_depth_path, dmonodense_scaled)            \n\n        if dmonodense_scaled is not None:\n            Cplm = batch['landmark3d'][0].cpu().numpy()            \n            pixlm = K @ Cplm\n            px = pixlm[0] / pixlm[2]\n            py = pixlm[1] / pixlm[2]\n            infront_infrustum = (Cplm[2] > 0.3) * (Cplm[2] < 15.0) * (px >= 0) * (px < W) * (py >=0) * (py < H)\n\n            vis = copy.deepcopy(augmented_visibility[:, data.image_indices[idx]])\n            count_colmap_vs_depth_incompatibility = 0\n            count_infront_infrustum = 0\n            for l in range(data.landmark.shape[1]):\n                if infront_infrustum[l]:\n                 
   count_infront_infrustum += 1\n\n                    depth_from_scaled_mono = dmonodense_scaled[int(py[l]), int(px[l])]\n                    depth_from_lm_proj = Cplm[2, l]\n                    rel_depth = np.abs(depth_from_lm_proj - depth_from_scaled_mono) / depth_from_lm_proj\n\n                    if vis[l]==0:                    \n                        if rel_depth < 0.3: ## 30% depth compatible\n                            vis[l] = True\n\n            augmented_visibility[:, data.image_indices[idx]] = vis\n\n    np.savetxt(os.path.join(opt.dataset_folder, opt.scene_id, opt.visibility_config + '_depth.txt'), augmented_visibility, fmt='%d')    \n\n\n    #########################################################\n    ### Adding visibility refinement using surface normal ###\n    #########################################################\n    root_folder=opt.dataset_folder\n    scene_id=opt.scene_id\n\n    data = pickle.load(open('%s/%s/train_test_val.pkl' % (root_folder, scene_id), 'rb'))\n    imgs = data['train'] + data['val'] + data['test']\n    idx = data['train_idx'] + data['val_idx'] + data['test_idx']\n\n    landmark_config = opt.landmark_config\n    visibility_config = opt.visibility_config\n    visibility_depth_config = visibility_config + '_depth'\n\n    np.random.seed(100)\n    landmark_colors = np.random.rand(10000, 3)\n\n    landmark_file = open(root_folder + '/' + scene_id + '/%s.txt' % landmark_config, 'r')\n    num_landmark = int(landmark_file.readline())\n\n    lm = []\n    for l in range(num_landmark):\n        pl = landmark_file.readline().split()\n        pl = np.array([float(pl[i]) for i in range(len(pl))])\n        lm.append(pl)\n    lm = np.asarray(lm)[:, 1:].T\n\n    visibility_file = root_folder + '/' + scene_id + '/%s.txt' % visibility_config\n    visibility = np.loadtxt(visibility_file).astype(bool)\n\n    visibility_file = root_folder + '/' + scene_id + '/%s.txt' % visibility_depth_config\n    visibility_depth = 
np.loadtxt(visibility_file).astype(bool)\n    new_visibility = copy.deepcopy(visibility_depth)\n\n    lm_spheres = []\n    mesh_arrows = []\n    mesh_arrows_ref = []\n    H = 720\n    W = 1280\n\n    WW, HH = np.meshgrid(np.arange(W), np.arange(H))\n    WW = WW.reshape(1, H, W)\n    HH = HH.reshape(1, H, W)\n    wh1 = np.concatenate((WW, HH, np.ones_like(HH)), axis=0)\n    lm_sn = np.zeros((num_landmark, 6))\n    lm_sn[:, :3] = lm.T\n\n    for lm_idx in tqdm(range(visibility.shape[0])):\n        ## Observe from colmap\n\n        visibility_matrix_ids = [i for i in np.where(visibility[lm_idx, idx])[0]]\n\n        images_observe_lm = [imgs[i] for i in visibility_matrix_ids]\n        pose_paths = [os.path.join(root_folder, scene_id, 'images', ifile.replace('color.jpg', 'pose.txt')) for ifile in images_observe_lm]\n        depth_paths = [os.path.join(root_folder, scene_id, 'depth', ifile.replace('.jpg', '.scaled_depth.npy')) for ifile in images_observe_lm]\n        intrinsic_paths = [os.path.join(root_folder, scene_id, 'images', ifile.replace('color.jpg', 'intrinsics.txt')) for ifile in images_observe_lm]\n\n        depths = np.zeros((len(pose_paths), H, W))\n        Ts = np.zeros((len(pose_paths), 4, 4))\n        Ks = np.zeros((len(pose_paths), 3, 3))\n        for i, pp in enumerate(pose_paths):\n            T = np.loadtxt(pp)\n            T = np.concatenate( (T, np.array([[0, 0, 0, 1]])), axis=0)\n            Ts[i] = T\n\n            intrinsics = open(intrinsic_paths[i])\n            intrinsics = intrinsics.readline().split()\n            fx = float(intrinsics[2])\n            fy = float(intrinsics[2])\n\n            cx = float(intrinsics[3])\n            cy = float(intrinsics[4])\n\n            K = np.array([[fx, 0., cx],\n                            [0., fy, cy],\n                            [0., 0., 1.]])\n            Ks[i] = K\n        \n\n        ## First estimate for surface normal using just visibility vector\n        bsum = np.zeros(3)    \n        for i in 
range(Ts.shape[0]):\n            Gpt = lm[:, lm_idx] + Ts[i, :3, :3].T @ Ts[i, :3, 3]\n            bsum -= (Gpt / np.linalg.norm(Gpt))                        \n        bsum /= np.linalg.norm(bsum)\n        \n        ## Refine the surface normal based on depth image\n        bref = np.zeros(3)\n        patch_size = 50\n        for i in range(Ts.shape[0]):\n            if os.path.exists(depth_paths[i]):\n                cp = Ts[i, :3, :3] @ lm[:, lm_idx] + Ts[i, :3, 3]\n                cp = Ks[i] @ cp\n                cp = cp.reshape(-1)\n                proj_x = int(cp[0] / cp[2])\n                proj_y = int(cp[1] / cp[2])\n\n                if proj_x >= patch_size and proj_x < W-patch_size and proj_y >= patch_size and proj_y < H-patch_size:\n                    patch_x0, patch_x1 = proj_x-patch_size, proj_x+patch_size\n                    patch_y0, patch_y1 = proj_y-patch_size, proj_y+patch_size\n\n                    d = np.load(depth_paths[i])[patch_y0:patch_y1, patch_x0:patch_x1].reshape((1, patch_size * 2, patch_size * 2))\n                    pcd = np.linalg.inv(Ks[i]) @ (wh1[:, patch_y0:patch_y1, patch_x0:patch_x1] * d).reshape(3, 4 * patch_size ** 2)\n\n                    A = np.concatenate((pcd, np.ones((1, 4 * patch_size ** 2))), axis=0)\n                    D, U = np.linalg.eig(A @ A.T)\n                    \n                    sn = Ts[i, :3, :3].T @ U[:3, np.argsort(D)[0]]\n                    sn /= np.linalg.norm(sn)\n\n                    if np.sum(bsum * sn) > 0.0:\n                        bref += sn\n                    elif np.sum(bsum * sn) < 0.0:\n                        bref -= sn\n        \n        if np.linalg.norm(bref) == 0:\n            lm_sn[lm_idx, 3:] = bsum\n        else:\n            bref /= np.linalg.norm(bref)\n            lm_sn[lm_idx, 3:] = bref\n\n        visibility_matrix_ids = [i for i in np.where(visibility_depth[lm_idx, idx])[0]]\n        images_observe_lm = [imgs[i] for i in np.where(visibility_depth[lm_idx, idx])[0]]\n    
    pose_paths = [os.path.join(root_folder, scene_id, 'images', ifile.replace('color.jpg', 'pose.txt')) for ifile in images_observe_lm]\n        for i, pp in enumerate(pose_paths):\n            T = np.loadtxt(pp)\n            if visibility_depth[lm_idx, idx[visibility_matrix_ids[i]]]:\n                Gpt = lm[:, lm_idx] + T[:3, :3].T @ T[:3, 3]\n                Gpt /= np.linalg.norm(Gpt)\n                ## Use the final normal in lm_sn, which falls back to bsum when no depth patch was usable (bref is zero then)\n                if np.sum(lm_sn[lm_idx, 3:] * Gpt) > -0.2: ## viewing direction violates the surface normal\n                    new_visibility[lm_idx, idx[visibility_matrix_ids[i]]] = 0\n    \n    np.savetxt(os.path.join(root_folder, scene_id, '%s_normal.txt' % (landmark_config)), lm_sn)\n    np.savetxt(os.path.join(root_folder, scene_id, '%s_depth_normal.txt' % (visibility_config)), new_visibility, fmt='%d')"
  },
  {
    "path": "src/utils/heatmap.py",
    "content": "import numpy as np\nimport torch\n\n\ndef generate_heat_maps(landmarks, visibility_mask, heatmap_size, K, sigma=3):\n    '''\n    :param landmarks:  [3, L]\n    :param visibility_mask: [L]\n    :return: hms, hms_weight(1: visible, 0: invisible)\n    '''\n\n\n    hms = np.zeros((landmarks.shape[1],\n                       heatmap_size[0],\n                       heatmap_size[1]),\n                       dtype=np.float32)\n\n    hms_weights = np.ones((landmarks.shape[1]), dtype=np.float32)\n\n    tmp_size = sigma * 3\n\n    for lm_id in range(landmarks.shape[1]):\n        landmark_2d = K @ landmarks[:, lm_id]\n        landmark_2d /= landmark_2d[2]\n\n        mu_x = int(landmark_2d[0] + 0.5)\n        mu_y = int(landmark_2d[1] + 0.5)\n        # Check that any part of the gaussian is in-bounds\n        ul = [int(mu_y - tmp_size), int(mu_x - tmp_size)]\n        br = [int(mu_y + tmp_size + 1), int(mu_x + tmp_size + 1)]\n        if ul[0] >= heatmap_size[0] or ul[1] >= heatmap_size[1] \\\n                or br[0] < 0 or br[1] < 0 or landmarks[2, lm_id] < 0:\n            continue\n\n        if visibility_mask[lm_id]:\n            ## Generate gaussian\n            size = 2 * tmp_size + 1\n            x = np.arange(0, size, 1, np.float32)\n            y = x[:, np.newaxis]\n            x0 = y0 = size // 2\n            # The gaussian is not normalized, we want the center value to equal 1\n            g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))\n\n            # Usable gaussian range\n            g_y = max(0, -ul[0]), min(br[0], heatmap_size[0]) - ul[0]\n            g_x = max(0, -ul[1]), min(br[1], heatmap_size[1]) - ul[1]\n\n            # Image range\n            img_y = max(0, ul[0]), min(br[0], heatmap_size[0])\n            img_x = max(0, ul[1]), min(br[1], heatmap_size[1])\n\n            hms[lm_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = \\\n                g[g_y[0]:g_y[1], g_x[0]:g_x[1]]\n        else:\n            hms_weights[lm_id] = 
0.0\n    return hms, hms_weights\n\n\ndef generate_heat_maps_gpu(landmarks_2d, visibility_mask, heatmap_size, sigma=3):\n    '''\n    GPU version of heat map generation\n    :param landmarks_2d: [B, C, L] pixel coordinates (rows 0 and 1 hold x and y)\n    :param visibility_mask: [B, L]\n    :return: hms_vis [B, L, H, W]\n    '''\n\n    B, _, L = landmarks_2d.shape\n    H, W = heatmap_size[0], heatmap_size[1]\n\n    yy_grid, xx_grid = torch.meshgrid(torch.arange(0, heatmap_size[0]),\n                                      torch.arange(0, heatmap_size[1]))\n    xx_grid, yy_grid = xx_grid.to(device=landmarks_2d.device), yy_grid.to(device=landmarks_2d.device)\n    hms = torch.exp(-((xx_grid.reshape(1, 1, H, W)-landmarks_2d[:, 0].reshape(B, L, 1, 1))**2 +\n                      (yy_grid.reshape(1, 1, H, W)-landmarks_2d[:, 1].reshape(B, L, 1, 1))**2)/(2*sigma**2))\n    hms_vis = hms * visibility_mask.reshape(B, L, 1, 1).float()\n    hms_vis[hms_vis < 0.1] = 0.0\n    normalizing_factor, _ = torch.max(hms_vis.reshape(B, L, -1), dim=2)\n    hms_vis[normalizing_factor > 0.5] = hms_vis[normalizing_factor > 0.5] / \\\n                                        normalizing_factor.reshape(B, L, 1, 1)[normalizing_factor > 0.5]\n\n    return hms_vis"
  },
  {
    "path": "src/utils/landmark_selection.py",
    "content": "import argparse\r\nimport numpy as np\r\nimport os\r\nimport pickle\r\nfrom read_write_models import qvec2rotmat, read_model\r\nfrom tqdm import tqdm\r\n\r\n\r\ndef ComputePerPointTimeSpan(image_ids, images):\r\n    timespan = {}\r\n    \r\n    for imageID in image_ids:\r\n        session_id = int(images[imageID].name.split('-')[0])\r\n        if session_id in timespan:\r\n            timespan[session_id] += 1\r\n        else:\r\n            timespan[session_id] = 1\r\n\r\n    return len(timespan)\r\n\r\n\r\ndef ComputePerPointDepth(pointInGlobal, image_ids, images):\r\n    d = np.zeros(len(image_ids))\r\n    for i, imageID in enumerate(image_ids):\r\n        R = qvec2rotmat(images[imageID].qvec)\r\n        t = images[imageID].tvec\r\n        pointInCamerai = R @ pointInGlobal + t\r\n        d[i] = pointInCamerai[2]\r\n    \r\n    pointDepthMean, pointDepthStd = np.mean(d), np.std(d)\r\n\r\n    return pointDepthMean, pointDepthStd\r\n\r\n\r\ndef ComputePerPointAngularSpan(pointInGlobal, image_ids, images):\r\n    N = len(image_ids)\r\n    H = np.zeros((3, 3))\r\n    for i, imageID in enumerate(image_ids):\r\n        Ri = qvec2rotmat(images[imageID].qvec)\r\n        ti = images[imageID].tvec\r\n        bi = Ri.T @ (pointInGlobal - ti)\r\n        bi = bi / np.linalg.norm(bi)\r\n        H += (np.eye(3) - np.outer(bi, bi))\r\n    \r\n    H /= N\r\n    eigH = np.linalg.eigvals(0.5*(H + H.T))\r\n    \r\n    return np.arccos(np.clip(1 - 2.0 * np.min(eigH)/np.max(eigH), 0, 1))\r\n\r\n\r\ndef SaveLandmarksAndVisibilityMask(selected_landmarks, points3D, images, indoor6_imagename_to_index, num_images, root_path, \r\n                                   landmark_config, visibility_config, outformat):\r\n    \r\n    num_landmarks = len(selected_landmarks['id'])\r\n\r\n    visibility_mask = np.zeros((num_landmarks, num_images), dtype=np.uint8)\r\n\r\n    for i, pid in enumerate(selected_landmarks['id']):\r\n        for imgid in points3D[pid].image_ids:\r\n          
  if images[imgid].name in indoor6_imagename_to_index:\r\n                visibility_mask[i, indoor6_imagename_to_index[images[imgid].name]] = 1\r\n\r\n    np.savetxt(os.path.join(root_path, '%s%s.txt' % (visibility_config, outformat)), visibility_mask, fmt='%d')\r\n\r\n    f = open(os.path.join(root_path, '%s%s.txt' % (landmark_config, outformat)), 'w')\r\n    f.write('%d\\n' % num_landmarks)\r\n    for i in range(selected_landmarks['xyz'].shape[1]):\r\n        f.write('%d %4.4f %4.4f %4.4f\\n' % (i, \r\n                                            selected_landmarks['xyz'][0, i], \r\n                                            selected_landmarks['xyz'][1, i], \r\n                                            selected_landmarks['xyz'][2, i]))\r\n    f.close()\r\n\r\n\r\n\r\nif __name__ == '__main__':\r\n    parser = argparse.ArgumentParser(\r\n        description='Scene Landmark Detection',\r\n        formatter_class=argparse.ArgumentDefaultsHelpFormatter)\r\n    parser.add_argument(\r\n        '--dataset_folder', type=str, required=True,\r\n        help='Root directory, where all data is stored')\r\n    parser.add_argument(\r\n        '--scene_id', type=str, default='scene6',\r\n        help='Scene id')\r\n    parser.add_argument(\r\n        '--num_landmarks', type=int, default=300,\r\n        help='Number of selected landmarks.')\r\n    parser.add_argument(\r\n        '--output_format', type=str, default='v2',\r\n        help='Landmark file output.')\r\n\r\n    opt = parser.parse_args()\r\n    opt.landmark_config = \"landmarks/landmarks-%d\" % (opt.num_landmarks)\r\n    opt.visibility_config = \"landmarks/visibility-%d\" % (opt.num_landmarks)\r\n\r\n    scene = opt.scene_id\r\n    path = os.path.join(opt.dataset_folder, 'indoor6-colmap/%s-tr/sparse/0/' % scene)\r\n    cameras, images, points3D = read_model(path, ext='.bin')\r\n    \r\n    ## Max number of sessions\r\n    sessions = {}\r\n    for i in images:\r\n        print(images[i].name)\r\n        session_id = 
int(images[i].name.split('-')[0])\r\n        sessions[session_id] = 1\r\n    maxSession = len(sessions)\r\n\r\n    ## Initialization\r\n    numPoints3D = len(points3D)\r\n    points3D_ids = np.zeros(numPoints3D)\r\n    points3D_scores = np.zeros(numPoints3D)\r\n    validIdx = 0\r\n\r\n    ## Compute score for each landmark    \r\n    for i, k in enumerate(tqdm(points3D)):            \r\n        pointInGlobal = points3D[k].xyz\r\n        image_ids = points3D[k].image_ids\r\n        trackLength = len(image_ids)\r\n            \r\n        if trackLength > 25:        \r\n            depthMean, depthStd = ComputePerPointDepth(pointInGlobal, image_ids, images)        \r\n            timespan = ComputePerPointTimeSpan(image_ids, images)\r\n            anglespan = ComputePerPointAngularSpan(pointInGlobal, image_ids, images)\r\n            \r\n            depthScore = min(1.0, depthStd / depthMean) \r\n            trackLengthScore = 0.25 * np.log2(trackLength)\r\n            timeSpanScore = timespan / maxSession\r\n            \r\n            if timespan >= 1 and depthMean < 10.0 and anglespan > 0.3:\r\n                points3D_ids[validIdx] = k\r\n                points3D_scores[validIdx] = depthScore + trackLengthScore + timeSpanScore + anglespan\r\n                validIdx += 1                \r\n        \r\n    \r\n    ## Sort scores\r\n    points3D_ids = points3D_ids[:validIdx]\r\n    points3D_scores = points3D_scores[:validIdx]\r\n    sorted_indices = np.argsort(points3D_scores)\r\n\r\n\r\n    ## Greedy selection\r\n    selected_landmarks = {'id': np.zeros(opt.num_landmarks), \r\n                        'xyz': np.zeros((3, opt.num_landmarks)), \r\n                        'score': np.zeros(opt.num_landmarks)}\r\n\r\n    ## Selecting first point\r\n    selected_landmarks['id'][0] = points3D_ids[sorted_indices[-1]]\r\n    selected_landmarks['xyz'][:, 0] = points3D[selected_landmarks['id'][0]].xyz\r\n    selected_landmarks['score'][0] = 
points3D_scores[sorted_indices[-1]]\r\n\r\n    nselected = 1\r\n    radius = 5.0\r\n\r\n    while nselected < opt.num_landmarks:\r\n        for i in reversed(sorted_indices):\r\n            id = points3D_ids[i]\r\n            xyz = points3D[id].xyz        \r\n\r\n            if np.sum(np.linalg.norm(xyz.reshape(3, 1) - selected_landmarks['xyz'][:, :nselected], axis=0) < radius):\r\n                continue\r\n            else:\r\n                selected_landmarks['id'][nselected] = id\r\n                selected_landmarks['xyz'][:, nselected] = xyz\r\n                selected_landmarks['score'][nselected] = points3D_scores[i]\r\n                nselected += 1\r\n\r\n            if nselected == opt.num_landmarks:\r\n                break\r\n        radius *= 0.5\r\n\r\n    ## Saving\r\n    indoor6_images = pickle.load(open(os.path.join(opt.dataset_folder, '%s/train_test_val.pkl' % opt.scene_id), 'rb'))\r\n    indoor6_imagename_to_index = {}\r\n\r\n    for i, f in enumerate(indoor6_images['train']):\r\n        image_name = open(os.path.join(opt.dataset_folder, \r\n                                    opt.scene_id, 'images', \r\n                                    f.replace('color.jpg', \r\n                                                'intrinsics.txt'))).readline().split(' ')[-1][:-1]\r\n        indoor6_imagename_to_index[image_name] = indoor6_images['train_idx'][i]\r\n\r\n    num_images = len(indoor6_images['train']) + len(indoor6_images['val']) + len(indoor6_images['test'])\r\n    SaveLandmarksAndVisibilityMask(selected_landmarks, points3D, images, indoor6_imagename_to_index, num_images, \r\n                                   os.path.join(opt.dataset_folder, opt.scene_id), \r\n                                   opt.landmark_config, opt.visibility_config, opt.output_format)"
  },
  {
    "path": "src/utils/merge_landmark_files.py",
    "content": "import argparse\nimport copy\nimport numpy as np\nimport os\n\nimport sys\nsys.path.append(os.path.join(sys.path[0], '..'))\n\nfrom utils.select_additional_landmarks import load_landmark_visibility_files\n\ndef save_landmark_visibility_mask(landmarks, visibility_mask, \n                                  landmark_path, visibility_path):\n    \n    num_landmarks = landmarks.shape[1]\n\n    np.savetxt(visibility_path, visibility_mask, fmt='%d')\n\n    f = open(landmark_path, 'w')\n    f.write('%d\\n' % num_landmarks)\n    for i in range(num_landmarks):\n        f.write('%d %4.4f %4.4f %4.4f\\n' % (i, \n                                            landmarks[0, i], \n                                            landmarks[1, i], \n                                            landmarks[2, i]))\n    f.close()\n\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser(\n        description='Scene Landmark Detection',\n        formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n    parser.add_argument(\n        '--dataset_folder', type=str, required=True,\n        help='Root directory, where all data is stored')\n    parser.add_argument(\n        '--scene_id', type=str, default='scene6',\n        help='Scene id')\n    parser.add_argument(\n        '--landmark_config', type=str, action='append',\n        help='File containing scene-specific 3D landmarks.')\n    parser.add_argument(\n        '--visibility_config', type=str, action='append',\n        help='File containing information about visibility of landmarks in cameras associated with training set.')\n    parser.add_argument(\n        '--output_format', type=str, required=True,\n        help='Output file format.')\n\n    opt = parser.parse_args()\n    \n    assert len(opt.landmark_config) > 1\n    assert len(opt.landmark_config) == len(opt.visibility_config)\n\n    num_landmarks = 0\n    num_files = len(opt.landmark_config)\n    ls = []\n    vs = []\n    for (lp, vp) in zip(opt.landmark_config, 
opt.visibility_config):\n        landmark_path = os.path.join(opt.dataset_folder, opt.scene_id, lp + '.txt')\n        vis_path = os.path.join(opt.dataset_folder, opt.scene_id, vp + '.txt')\n        \n        l, v = load_landmark_visibility_files(landmark_path=landmark_path,\n                                                visibility_path=vis_path)\n        \n        num_landmarks += l.shape[1]\n\n        ls.append(l)\n        vs.append(v)\n\n    ls = np.concatenate(ls, axis=1)\n    vs = np.concatenate(vs, axis=0)\n\n    output_landmark_path = os.path.join(opt.dataset_folder, opt.scene_id, 'landmarks/landmarks-%d%s.txt' % (num_landmarks, opt.output_format))\n    \n    if 'depth_normal' in opt.visibility_config[0]:\n        output_visibility_path = os.path.join(opt.dataset_folder, opt.scene_id, 'landmarks/visibility-%d%s_depth_normal.txt' % (num_landmarks, opt.output_format))\n    else:\n        output_visibility_path = os.path.join(opt.dataset_folder, opt.scene_id, 'landmarks/visibility-%d%s.txt' % (num_landmarks, opt.output_format))\n    save_landmark_visibility_mask(ls, vs, output_landmark_path, output_visibility_path)"
  },
  {
    "path": "src/utils/pnp.py",
    "content": "import numpy as np\nfrom scipy.optimize import least_squares\n\n\ndef Rotation2Quaternion(R):\n    \"\"\"\n    Convert a rotation matrix to quaternion\n\n    Parameters\n    ----------\n    R : ndarray of shape (3, 3)\n        Rotation matrix\n    Returns\n    -------\n    q : ndarray of shape (4,)\n        The unit quaternion (w, x, y, z)\n    \"\"\"\n    q = np.empty([4, ])\n\n    tr = np.trace(R)\n    if tr < 0:\n        i = R.diagonal().argmax()\n        j = (i + 1) % 3\n        k = (j + 1) % 3\n\n        q[i] = np.sqrt(1 - tr + 2 * R[i, i]) / 2\n        q[j] = (R[j, i] + R[i, j]) / (4 * q[i])\n        q[k] = (R[k, i] + R[i, k]) / (4 * q[i])\n        q[3] = (R[k, j] - R[j, k]) / (4 * q[i])\n    else:\n        q[3] = np.sqrt(1 + tr) / 2\n        q[0] = (R[2, 1] - R[1, 2]) / (4 * q[3])\n        q[1] = (R[0, 2] - R[2, 0]) / (4 * q[3])\n        q[2] = (R[1, 0] - R[0, 1]) / (4 * q[3])\n\n    q /= np.linalg.norm(q)\n    # Rearrange (x, y, z, w) to (w, x, y, z)\n    q = q[[3, 0, 1, 2]]\n\n    return q\n\n\ndef Quaternion2Rotation(q):\n    \"\"\"\n    Convert a quaternion to rotation matrix\n\n    Parameters\n    ----------\n    q : ndarray of shape (4,)\n        Unit quaternion (w, x, y, z)\n    Returns\n    -------\n    R : ndarray of shape (3, 3)\n        The rotation matrix\n    \"\"\"\n    q /= np.linalg.norm(q)\n\n    w = q[0]\n    x = q[1]\n    y = q[2]\n    z = q[3]\n\n    R = np.empty([3, 3])\n    R[0, 0] = 1 - 2 * y ** 2 - 2 * z ** 2\n    R[0, 1] = 2 * (x * y - z * w)\n    R[0, 2] = 2 * (x * z + y * w)\n\n    R[1, 0] = 2 * (x * y + z * w)\n    R[1, 1] = 1 - 2 * x ** 2 - 2 * z ** 2\n    R[1, 2] = 2 * (y * z - x * w)\n\n    R[2, 0] = 2 * (x * z - y * w)\n    R[2, 1] = 2 * (y * z + x * w)\n    R[2, 2] = 1 - 2 * x ** 2 - 2 * y ** 2\n\n    return R\n\ndef skewsymm(x):\n    \n    Sx = np.zeros((3, 3))\n    Sx[0, 1] = -x[2]\n    Sx[0, 2] = x[1]\n    Sx[1, 0] = x[2]\n    Sx[2, 0] = -x[1]\n    Sx[1, 2] = -x[0]\n    Sx[2, 1] = x[0]\n\n    return 
Sx\n\n\ndef VectorizeInitialPose(C_T_G):\n\n    R = C_T_G[:3, :3]\n    t = C_T_G[:3, 3]\n    q = Rotation2Quaternion(R)\n    z = np.concatenate([t, q])\n\n    return z\n\n\ndef MeasureReprojectionSinglePose(z, p, b, w):\n\n    n_points = b.shape[1]\n\n    q = z[3:7]\n    q_norm = np.sqrt(np.sum(q ** 2))\n    q = q / q_norm\n    R = Quaternion2Rotation(q)\n    t = z[:3]\n\n    b_hat = R @ p + t.reshape(3, 1)\n    b_hat_normalized = b_hat / np.sqrt(np.sum(b_hat ** 2, axis=0))\n    err = np.repeat(w, 3).reshape(n_points, 3).T * (b_hat_normalized - b)\n\n    return err.reshape(-1)\n\n\ndef UpdatePose(z):\n\n    p = z[0:7]\n    q = p[3:]\n\n    q = q / np.linalg.norm(q)\n    R = Quaternion2Rotation(q)\n    t = p[:3]\n    P_new = np.hstack([R, t[:, np.newaxis]])\n\n    return P_new\n\n\ndef P3PKe(m, X, inlier_thres=1e-5):\n    \"\"\"\n    Perspective-3-point algorithm from\n    Ke, T., & Roumeliotis, S. I. (CVPR'17). An efficient algebraic solution to the perspective-three-point problem.\n\n\n    Parameters\n    ----------\n    m : ndarray of shape (3, 4)\n        unit bearing vectors to the four landmarks, expressed in the camera frame\n    X : ndarray of shape (3, 4)\n        3D positions of the four landmarks, expressed in the global frame\n    inlier_thres : float\n        a candidate solution is accepted only if, for all four points, the cosine between\n        the predicted and measured bearings exceeds 1 - inlier_thres\n    Returns\n    -------\n    R : ndarray of shape (3, 3)\n    t : ndarray of shape (3, 1)\n        (R, t) is the transformation from the global to the camera frame;\n        (None, None) is returned if no consistent solution is found\n    \"\"\"\n    w1 = X[:, 0]\n    w2 = X[:, 1]\n    w3 = X[:, 2]\n\n    u0 = w1 - w2\n    nu0 = np.linalg.norm(u0)\n    if nu0 < 1e-4:\n        return None, None\n    k1 = u0 / nu0\n\n    b1 = m[:, 0]\n    b2 = m[:, 1]\n    b3 = m[:, 2]\n\n    k3 = np.cross(b1, b2)\n    nk3 = np.linalg.norm(k3)\n    if nk3 < 1e-4:\n        return None, None\n    k3 = k3 / nk3\n\n    tz = np.cross(b1, k3)\n    v1 = np.cross(b1, b3)\n    v2 = np.cross(b2, b3)\n\n    u1 = w1 - w3\n    u1k1 = np.sum(u1 * k1)\n    k3b3 = np.sum(k3 * b3)\n    if np.abs(k3b3) < 1e-4:\n        return None, None\n\n\n    f11 = k3.T @ b3\n    f13 = k3.T @ 
v1\n    f15 = -u1k1 * f11\n    nl = np.cross(u1, k1)\n    delta = np.linalg.norm(nl)\n    if delta < 1e-4:\n        return None, None\n    nl = nl / delta\n    f11 = delta * f11\n    f13 = delta * f13\n\n    u2k1 = u1k1 - nu0\n    f21 = np.sum(tz * v2)\n    f22 = nk3 * k3b3\n    f23 = np.sum(k3 * v2)\n    f24 = u2k1 * f22\n    f25 = -u2k1 * f21\n    f21 = delta * f21\n    f22 = delta * f22\n    f23 = delta * f23\n\n    g1 = f13 * f22\n    g2 = f13 * f25 - f15 * f23\n    g3 = f11 * f23 - f13 * f21\n    g4 = -f13 * f24\n    g5 = f11 * f22\n    g6 = f11 * f25 - f15 * f21\n    g7 = -f15 * f24\n    alpha = np.array([g5 * g5 + g1 * g1 + g3 * g3,\n                      2 * (g5 * g6 + g1 * g2 + g3 * g4),\n                      g6 * g6 + 2 * g5 * g7 + g2 * g2 + g4 * g4 - g1 * g1 - g3 * g3,\n                      2 * (g6 * g7 - g1 * g2 - g3 * g4),\n                      g7 * g7 - g2 * g2 - g4 * g4])\n\n    if any(np.isnan(alpha)):\n        return None, None\n\n    sols = np.roots(alpha)\n\n    Ck1nl = np.vstack((k1, nl, np.cross(k1, nl))).T\n    Cb1k3tzT = np.vstack((b1, k3, tz))\n    b3p = (delta / k3b3) * b3\n\n    R = np.zeros((3, 3, 4))\n    t = np.zeros((3, 4))\n    for i in range(sols.shape[0]):\n        if np.imag(sols[i]) != 0:\n            continue\n\n        ctheta1p = np.real(sols[i])\n        if abs(ctheta1p) > 1:\n            continue\n        stheta1p = np.sqrt(1 - ctheta1p * ctheta1p)\n        if k3b3 < 0:\n            stheta1p = -stheta1p\n\n        ctheta3 = g1 * ctheta1p + g2\n        stheta3 = g3 * ctheta1p + g4\n        ntheta3 = stheta1p / ((g5 * ctheta1p + g6) * ctheta1p + g7)\n        ctheta3 = ntheta3 * ctheta3\n        stheta3 = ntheta3 * stheta3\n\n        C13 = np.array([[ctheta3, 0, -stheta3],\n                        [stheta1p * stheta3, ctheta1p, stheta1p * ctheta3],\n                        [ctheta1p * stheta3, -stheta1p, ctheta1p * ctheta3]])\n\n        Ri = (Ck1nl @ C13 @ Cb1k3tzT).T\n        pxstheta1p = stheta1p * b3p\n        ti = 
pxstheta1p - Ri @ w3\n        ti = ti.reshape(3, 1)\n\n        m_hat = Ri @ X + ti\n        m_hat = m_hat / np.linalg.norm(m_hat, axis=0)\n        if np.sum(np.sum(m_hat * m, axis=0) > 1.0 - inlier_thres) == 4:\n            return Ri, ti\n\n    return None, None\n\n\ndef P3PKe_Ransac(G_p_f, C_b_f_hm, w, thres=0.01):\n    inlier_thres = thres\n    C_T_G_best = None\n    inlier_best = np.zeros(G_p_f.shape[1], dtype=bool)\n    Nsample=4\n    inlier_score_best=0\n\n    for _ in range(125):  # number of RANSAC iterations (old value was 10)\n        ## Weighted sampling based on weight factor\n        min_set = np.argpartition(np.exp(w) * np.random.rand(w.shape[0]), -Nsample)[-Nsample:]\n        C_R_G_hat, C_t_G_hat = P3PKe(C_b_f_hm[:, min_set], G_p_f[:, min_set], inlier_thres=thres)\n\n        if C_R_G_hat is None or C_t_G_hat is None:\n            continue\n\n        # Get inliers\n        C_b_f_hat = C_R_G_hat @ G_p_f + C_t_G_hat\n        C_b_f_hat = C_b_f_hat / np.linalg.norm(C_b_f_hat, axis=0)\n        inlier_mask = np.sum(C_b_f_hat * C_b_f_hm, axis=0) > (1.0 - inlier_thres)\n        inlier_score = np.sum(w[inlier_mask])\n\n        if inlier_score > inlier_score_best:\n            inlier_best = inlier_mask\n            C_T_G_best = np.eye(4)\n            C_T_G_best[:3, :3] = C_R_G_hat\n            C_T_G_best[:3, 3:] = C_t_G_hat\n            inlier_score_best = inlier_score\n\n    return C_T_G_best, inlier_best\n\n\ndef RunPnPNL(C_T_G, G_p_f, C_b_f, w, cutoff=0.01):\n    '''\n    Weighted PnP using per-landmark weights w and a bearing (angular) loss.\n    Returns P_new, the optimized 3x4 pose [R | t] (i.e. the refined C_T_G).\n    '''\n\n    z0 = VectorizeInitialPose(C_T_G)\n    res = least_squares(\n        lambda x: MeasureReprojectionSinglePose(x, G_p_f, C_b_f, w),\n        z0,\n        verbose=0,\n        ftol=1e-4,\n        max_nfev=50,\n        xtol=1e-8,\n        loss='huber',\n        f_scale=cutoff\n    )\n    z = res.x\n\n    P_new = UpdatePose(z)\n\n    return P_new"
  },
  {
    "path": "src/utils/read_write_models.py",
    "content": "# Copyright (c) 2018, ETH Zurich and UNC Chapel Hill.\r\n# All rights reserved.\r\n#\r\n# Redistribution and use in source and binary forms, with or without\r\n# modification, are permitted provided that the following conditions are met:\r\n#\r\n#     * Redistributions of source code must retain the above copyright\r\n#       notice, this list of conditions and the following disclaimer.\r\n#\r\n#     * Redistributions in binary form must reproduce the above copyright\r\n#       notice, this list of conditions and the following disclaimer in the\r\n#       documentation and/or other materials provided with the distribution.\r\n#\r\n#     * Neither the name of ETH Zurich and UNC Chapel Hill nor the names of\r\n#       its contributors may be used to endorse or promote products derived\r\n#       from this software without specific prior written permission.\r\n#\r\n# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\"\r\n# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE\r\n# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE\r\n# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE\r\n# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR\r\n# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF\r\n# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS\r\n# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN\r\n# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)\r\n# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE\r\n# POSSIBILITY OF SUCH DAMAGE.\r\n#\r\n# Author: Johannes L. 
Schoenberger (jsch-at-demuc-dot-de)\r\n\r\nimport os\r\nimport collections\r\nimport numpy as np\r\nimport struct\r\nimport argparse\r\nimport logging\r\n\r\nlogger = logging.getLogger(__name__)\r\n\r\n\r\nCameraModel = collections.namedtuple(\r\n    \"CameraModel\", [\"model_id\", \"model_name\", \"num_params\"])\r\nCamera = collections.namedtuple(\r\n    \"Camera\", [\"id\", \"model\", \"width\", \"height\", \"params\"])\r\nBaseImage = collections.namedtuple(\r\n    \"Image\", [\"id\", \"qvec\", \"tvec\", \"camera_id\", \"name\", \"xys\", \"point3D_ids\"])\r\nPoint3D = collections.namedtuple(\r\n    \"Point3D\", [\"id\", \"xyz\", \"rgb\", \"error\", \"image_ids\", \"point2D_idxs\"])\r\n\r\n\r\nclass Image(BaseImage):\r\n    def qvec2rotmat(self):\r\n        return qvec2rotmat(self.qvec)\r\n\r\n\r\nCAMERA_MODELS = {\r\n    CameraModel(model_id=0, model_name=\"SIMPLE_PINHOLE\", num_params=3),\r\n    CameraModel(model_id=1, model_name=\"PINHOLE\", num_params=4),\r\n    CameraModel(model_id=2, model_name=\"SIMPLE_RADIAL\", num_params=4),\r\n    CameraModel(model_id=3, model_name=\"RADIAL\", num_params=5),\r\n    CameraModel(model_id=4, model_name=\"OPENCV\", num_params=8),\r\n    CameraModel(model_id=5, model_name=\"OPENCV_FISHEYE\", num_params=8),\r\n    CameraModel(model_id=6, model_name=\"FULL_OPENCV\", num_params=12),\r\n    CameraModel(model_id=7, model_name=\"FOV\", num_params=5),\r\n    CameraModel(model_id=8, model_name=\"SIMPLE_RADIAL_FISHEYE\", num_params=4),\r\n    CameraModel(model_id=9, model_name=\"RADIAL_FISHEYE\", num_params=5),\r\n    CameraModel(model_id=10, model_name=\"THIN_PRISM_FISHEYE\", num_params=12)\r\n}\r\nCAMERA_MODEL_IDS = dict([(camera_model.model_id, camera_model)\r\n                         for camera_model in CAMERA_MODELS])\r\nCAMERA_MODEL_NAMES = dict([(camera_model.model_name, camera_model)\r\n                           for camera_model in CAMERA_MODELS])\r\n\r\n\r\ndef read_next_bytes(fid, num_bytes, format_char_sequence, 
endian_character=\"<\"):\r\n    \"\"\"Read and unpack the next bytes from a binary file.\r\n    :param fid:\r\n    :param num_bytes: Sum of combination of {2, 4, 8}, e.g. 2, 6, 16, 30, etc.\r\n    :param format_char_sequence: List of {c, e, f, d, h, H, i, I, l, L, q, Q}.\r\n    :param endian_character: Any of {@, =, <, >, !}\r\n    :return: Tuple of read and unpacked values.\r\n    \"\"\"\r\n    data = fid.read(num_bytes)\r\n    return struct.unpack(endian_character + format_char_sequence, data)\r\n\r\n\r\ndef write_next_bytes(fid, data, format_char_sequence, endian_character=\"<\"):\r\n    \"\"\"Pack and write to a binary file.\r\n    :param fid:\r\n    :param data: data to send; if multiple elements are sent at the same time,\r\n    they should be encapsulated either in a list or a tuple\r\n    :param format_char_sequence: List of {c, e, f, d, h, H, i, I, l, L, q, Q}.\r\n    should be the same length as the data list or tuple\r\n    :param endian_character: Any of {@, =, <, >, !}\r\n    \"\"\"\r\n    if isinstance(data, (list, tuple)):\r\n        bytes = struct.pack(endian_character + format_char_sequence, *data)\r\n    else:\r\n        bytes = struct.pack(endian_character + format_char_sequence, data)\r\n    fid.write(bytes)\r\n\r\n\r\ndef read_cameras_text(path):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::WriteCamerasText(const std::string& path)\r\n        void Reconstruction::ReadCamerasText(const std::string& path)\r\n    \"\"\"\r\n    cameras = {}\r\n    with open(path, \"r\") as fid:\r\n        while True:\r\n            line = fid.readline()\r\n            if not line:\r\n                break\r\n            line = line.strip()\r\n            if len(line) > 0 and line[0] != \"#\":\r\n                elems = line.split()\r\n                camera_id = int(elems[0])\r\n                model = elems[1]\r\n                width = int(elems[2])\r\n                height = int(elems[3])\r\n                params = 
np.array(tuple(map(float, elems[4:])))\r\n                cameras[camera_id] = Camera(id=camera_id, model=model,\r\n                                            width=width, height=height,\r\n                                            params=params)\r\n    return cameras\r\n\r\n\r\ndef read_cameras_binary(path_to_model_file):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::WriteCamerasBinary(const std::string& path)\r\n        void Reconstruction::ReadCamerasBinary(const std::string& path)\r\n    \"\"\"\r\n    cameras = {}\r\n    with open(path_to_model_file, \"rb\") as fid:\r\n        num_cameras = read_next_bytes(fid, 8, \"Q\")[0]\r\n        for _ in range(num_cameras):\r\n            camera_properties = read_next_bytes(\r\n                fid, num_bytes=24, format_char_sequence=\"iiQQ\")\r\n            camera_id = camera_properties[0]\r\n            model_id = camera_properties[1]\r\n            model_name = CAMERA_MODEL_IDS[camera_properties[1]].model_name\r\n            width = camera_properties[2]\r\n            height = camera_properties[3]\r\n            num_params = CAMERA_MODEL_IDS[model_id].num_params\r\n            params = read_next_bytes(fid, num_bytes=8*num_params,\r\n                                     format_char_sequence=\"d\"*num_params)\r\n            cameras[camera_id] = Camera(id=camera_id,\r\n                                        model=model_name,\r\n                                        width=width,\r\n                                        height=height,\r\n                                        params=np.array(params))\r\n        assert len(cameras) == num_cameras\r\n    return cameras\r\n\r\n\r\ndef write_cameras_text(cameras, path):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::WriteCamerasText(const std::string& path)\r\n        void Reconstruction::ReadCamerasText(const std::string& path)\r\n    \"\"\"\r\n    HEADER = \"# Camera list with one line of 
data per camera:\\n\" + \\\r\n             \"#   CAMERA_ID, MODEL, WIDTH, HEIGHT, PARAMS[]\\n\" + \\\r\n             \"# Number of cameras: {}\\n\".format(len(cameras))\r\n    with open(path, \"w\") as fid:\r\n        fid.write(HEADER)\r\n        for _, cam in cameras.items():\r\n            to_write = [cam.id, cam.model, cam.width, cam.height, *cam.params]\r\n            line = \" \".join([str(elem) for elem in to_write])\r\n            fid.write(line + \"\\n\")\r\n\r\n\r\ndef write_cameras_binary(cameras, path_to_model_file):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::WriteCamerasBinary(const std::string& path)\r\n        void Reconstruction::ReadCamerasBinary(const std::string& path)\r\n    \"\"\"\r\n    with open(path_to_model_file, \"wb\") as fid:\r\n        write_next_bytes(fid, len(cameras), \"Q\")\r\n        for _, cam in cameras.items():\r\n            model_id = CAMERA_MODEL_NAMES[cam.model].model_id\r\n            camera_properties = [cam.id,\r\n                                 model_id,\r\n                                 cam.width,\r\n                                 cam.height]\r\n            write_next_bytes(fid, camera_properties, \"iiQQ\")\r\n            for p in cam.params:\r\n                write_next_bytes(fid, float(p), \"d\")\r\n    return cameras\r\n\r\n\r\ndef read_images_text(path):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::ReadImagesText(const std::string& path)\r\n        void Reconstruction::WriteImagesText(const std::string& path)\r\n    \"\"\"\r\n    images = {}\r\n    with open(path, \"r\") as fid:\r\n        while True:\r\n            line = fid.readline()\r\n            if not line:\r\n                break\r\n            line = line.strip()\r\n            if len(line) > 0 and line[0] != \"#\":\r\n                elems = line.split()\r\n                image_id = int(elems[0])\r\n                qvec = np.array(tuple(map(float, elems[1:5])))\r\n    
            tvec = np.array(tuple(map(float, elems[5:8])))\r\n                camera_id = int(elems[8])\r\n                image_name = elems[9]\r\n                elems = fid.readline().split()\r\n                xys = np.column_stack([tuple(map(float, elems[0::3])),\r\n                                       tuple(map(float, elems[1::3]))])\r\n                point3D_ids = np.array(tuple(map(int, elems[2::3])))\r\n                images[image_id] = Image(\r\n                    id=image_id, qvec=qvec, tvec=tvec,\r\n                    camera_id=camera_id, name=image_name,\r\n                    xys=xys, point3D_ids=point3D_ids)\r\n    return images\r\n\r\n\r\ndef read_images_binary(path_to_model_file):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::ReadImagesBinary(const std::string& path)\r\n        void Reconstruction::WriteImagesBinary(const std::string& path)\r\n    \"\"\"\r\n    images = {}\r\n    with open(path_to_model_file, \"rb\") as fid:\r\n        num_reg_images = read_next_bytes(fid, 8, \"Q\")[0]\r\n        for _ in range(num_reg_images):\r\n            binary_image_properties = read_next_bytes(\r\n                fid, num_bytes=64, format_char_sequence=\"idddddddi\")\r\n            image_id = binary_image_properties[0]\r\n            qvec = np.array(binary_image_properties[1:5])\r\n            tvec = np.array(binary_image_properties[5:8])\r\n            camera_id = binary_image_properties[8]\r\n            image_name = \"\"\r\n            current_char = read_next_bytes(fid, 1, \"c\")[0]\r\n            while current_char != b\"\\x00\":   # look for the ASCII 0 entry\r\n                image_name += current_char.decode(\"utf-8\")\r\n                current_char = read_next_bytes(fid, 1, \"c\")[0]\r\n            num_points2D = read_next_bytes(fid, num_bytes=8,\r\n                                           format_char_sequence=\"Q\")[0]\r\n            x_y_id_s = read_next_bytes(fid, num_bytes=24*num_points2D,\r\n      
                                 format_char_sequence=\"ddq\"*num_points2D)\r\n            xys = np.column_stack([tuple(map(float, x_y_id_s[0::3])),\r\n                                   tuple(map(float, x_y_id_s[1::3]))])\r\n            point3D_ids = np.array(tuple(map(int, x_y_id_s[2::3])))\r\n            images[image_id] = Image(\r\n                id=image_id, qvec=qvec, tvec=tvec,\r\n                camera_id=camera_id, name=image_name,\r\n                xys=xys, point3D_ids=point3D_ids)\r\n    return images\r\n\r\n\r\ndef write_images_text(images, path):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::ReadImagesText(const std::string& path)\r\n        void Reconstruction::WriteImagesText(const std::string& path)\r\n    \"\"\"\r\n    if len(images) == 0:\r\n        mean_observations = 0\r\n    else:\r\n        mean_observations = sum((len(img.point3D_ids) for _, img in images.items()))/len(images)\r\n    HEADER = \"# Image list with two lines of data per image:\\n\" + \\\r\n             \"#   IMAGE_ID, QW, QX, QY, QZ, TX, TY, TZ, CAMERA_ID, NAME\\n\" + \\\r\n             \"#   POINTS2D[] as (X, Y, POINT3D_ID)\\n\" + \\\r\n             \"# Number of images: {}, mean observations per image: {}\\n\".format(len(images), mean_observations)\r\n\r\n    with open(path, \"w\") as fid:\r\n        fid.write(HEADER)\r\n        for _, img in images.items():\r\n            image_header = [img.id, *img.qvec, *img.tvec, img.camera_id, img.name]\r\n            first_line = \" \".join(map(str, image_header))\r\n            fid.write(first_line + \"\\n\")\r\n\r\n            points_strings = []\r\n            for xy, point3D_id in zip(img.xys, img.point3D_ids):\r\n                points_strings.append(\" \".join(map(str, [*xy, point3D_id])))\r\n            fid.write(\" \".join(points_strings) + \"\\n\")\r\n\r\n\r\ndef write_images_binary(images, path_to_model_file):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void 
Reconstruction::ReadImagesBinary(const std::string& path)\r\n        void Reconstruction::WriteImagesBinary(const std::string& path)\r\n    \"\"\"\r\n    with open(path_to_model_file, \"wb\") as fid:\r\n        write_next_bytes(fid, len(images), \"Q\")\r\n        for _, img in images.items():\r\n            write_next_bytes(fid, img.id, \"i\")\r\n            write_next_bytes(fid, img.qvec.tolist(), \"dddd\")\r\n            write_next_bytes(fid, img.tvec.tolist(), \"ddd\")\r\n            write_next_bytes(fid, img.camera_id, \"i\")\r\n            for char in img.name:\r\n                write_next_bytes(fid, char.encode(\"utf-8\"), \"c\")\r\n            write_next_bytes(fid, b\"\\x00\", \"c\")\r\n            write_next_bytes(fid, len(img.point3D_ids), \"Q\")\r\n            for xy, p3d_id in zip(img.xys, img.point3D_ids):\r\n                write_next_bytes(fid, [*xy, p3d_id], \"ddq\")\r\n\r\n\r\ndef read_points3D_text(path):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::ReadPoints3DText(const std::string& path)\r\n        void Reconstruction::WritePoints3DText(const std::string& path)\r\n    \"\"\"\r\n    points3D = {}\r\n    with open(path, \"r\") as fid:\r\n        while True:\r\n            line = fid.readline()\r\n            if not line:\r\n                break\r\n            line = line.strip()\r\n            if len(line) > 0 and line[0] != \"#\":\r\n                elems = line.split()\r\n                point3D_id = int(elems[0])\r\n                xyz = np.array(tuple(map(float, elems[1:4])))\r\n                rgb = np.array(tuple(map(int, elems[4:7])))\r\n                error = float(elems[7])\r\n                image_ids = np.array(tuple(map(int, elems[8::2])))\r\n                point2D_idxs = np.array(tuple(map(int, elems[9::2])))\r\n                points3D[point3D_id] = Point3D(id=point3D_id, xyz=xyz, rgb=rgb,\r\n                                               error=error, image_ids=image_ids,\r\n                
                               point2D_idxs=point2D_idxs)\r\n    return points3D\r\n\r\n\r\ndef read_points3D_binary(path_to_model_file):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::ReadPoints3DBinary(const std::string& path)\r\n        void Reconstruction::WritePoints3DBinary(const std::string& path)\r\n    \"\"\"\r\n    points3D = {}\r\n    with open(path_to_model_file, \"rb\") as fid:\r\n        num_points = read_next_bytes(fid, 8, \"Q\")[0]\r\n        for _ in range(num_points):\r\n            binary_point_line_properties = read_next_bytes(\r\n                fid, num_bytes=43, format_char_sequence=\"QdddBBBd\")\r\n            point3D_id = binary_point_line_properties[0]\r\n            xyz = np.array(binary_point_line_properties[1:4])\r\n            rgb = np.array(binary_point_line_properties[4:7])\r\n            error = np.array(binary_point_line_properties[7])\r\n            track_length = read_next_bytes(\r\n                fid, num_bytes=8, format_char_sequence=\"Q\")[0]\r\n            track_elems = read_next_bytes(\r\n                fid, num_bytes=8*track_length,\r\n                format_char_sequence=\"ii\"*track_length)\r\n            image_ids = np.array(tuple(map(int, track_elems[0::2])))\r\n            point2D_idxs = np.array(tuple(map(int, track_elems[1::2])))\r\n            points3D[point3D_id] = Point3D(\r\n                id=point3D_id, xyz=xyz, rgb=rgb,\r\n                error=error, image_ids=image_ids,\r\n                point2D_idxs=point2D_idxs)\r\n    return points3D\r\n\r\n\r\ndef write_points3D_text(points3D, path):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::ReadPoints3DText(const std::string& path)\r\n        void Reconstruction::WritePoints3DText(const std::string& path)\r\n    \"\"\"\r\n    if len(points3D) == 0:\r\n        mean_track_length = 0\r\n    else:\r\n        mean_track_length = sum((len(pt.image_ids) for _, pt in 
points3D.items()))/len(points3D)\r\n    HEADER = \"# 3D point list with one line of data per point:\\n\" + \\\r\n             \"#   POINT3D_ID, X, Y, Z, R, G, B, ERROR, TRACK[] as (IMAGE_ID, POINT2D_IDX)\\n\" + \\\r\n             \"# Number of points: {}, mean track length: {}\\n\".format(len(points3D), mean_track_length)\r\n\r\n    with open(path, \"w\") as fid:\r\n        fid.write(HEADER)\r\n        for _, pt in points3D.items():\r\n            point_header = [pt.id, *pt.xyz, *pt.rgb, pt.error]\r\n            fid.write(\" \".join(map(str, point_header)) + \" \")\r\n            track_strings = []\r\n            for image_id, point2D in zip(pt.image_ids, pt.point2D_idxs):\r\n                track_strings.append(\" \".join(map(str, [image_id, point2D])))\r\n            fid.write(\" \".join(track_strings) + \"\\n\")\r\n\r\n\r\ndef write_points3D_binary(points3D, path_to_model_file):\r\n    \"\"\"\r\n    see: src/base/reconstruction.cc\r\n        void Reconstruction::ReadPoints3DBinary(const std::string& path)\r\n        void Reconstruction::WritePoints3DBinary(const std::string& path)\r\n    \"\"\"\r\n    with open(path_to_model_file, \"wb\") as fid:\r\n        write_next_bytes(fid, len(points3D), \"Q\")\r\n        for _, pt in points3D.items():\r\n            write_next_bytes(fid, pt.id, \"Q\")\r\n            write_next_bytes(fid, pt.xyz.tolist(), \"ddd\")\r\n            write_next_bytes(fid, pt.rgb.tolist(), \"BBB\")\r\n            write_next_bytes(fid, pt.error, \"d\")\r\n            track_length = pt.image_ids.shape[0]\r\n            write_next_bytes(fid, track_length, \"Q\")\r\n            for image_id, point2D_id in zip(pt.image_ids, pt.point2D_idxs):\r\n                write_next_bytes(fid, [image_id, point2D_id], \"ii\")\r\n\r\n\r\ndef detect_model_format(path, ext):\r\n    if os.path.isfile(os.path.join(path, \"cameras\"  + ext)) and \\\r\n       os.path.isfile(os.path.join(path, \"images\"   + ext)) and \\\r\n       os.path.isfile(os.path.join(path, 
\"points3D\" + ext)):\r\n        return True\r\n\r\n    return False\r\n\r\n\r\ndef read_model(path, ext=\"\"):\r\n    # try to detect the extension automatically\r\n    if ext == \"\":\r\n        if detect_model_format(path, \".bin\"):\r\n            ext = \".bin\"\r\n        elif detect_model_format(path, \".txt\"):\r\n            ext = \".txt\"\r\n        else:\r\n            try:\r\n                cameras, images, points3D = read_model(os.path.join(path, \"model/\"))\r\n                logger.warning(\r\n                    \"This SfM file structure was deprecated in hloc v1.1\")\r\n                return cameras, images, points3D\r\n            except FileNotFoundError:\r\n                raise FileNotFoundError(\r\n                    f\"Could not find binary or text COLMAP model at {path}\")\r\n\r\n    if ext == \".txt\":\r\n        cameras = read_cameras_text(os.path.join(path, \"cameras\" + ext))\r\n        images = read_images_text(os.path.join(path, \"images\" + ext))\r\n        points3D = read_points3D_text(os.path.join(path, \"points3D\") + ext)\r\n    else:\r\n        cameras = read_cameras_binary(os.path.join(path, \"cameras\" + ext))\r\n        images = read_images_binary(os.path.join(path, \"images\" + ext))\r\n        points3D = read_points3D_binary(os.path.join(path, \"points3D\") + ext)\r\n    return cameras, images, points3D\r\n\r\n\r\ndef write_model(cameras, images, points3D, path, ext=\".bin\"):\r\n    if ext == \".txt\":\r\n        write_cameras_text(cameras, os.path.join(path, \"cameras\" + ext))\r\n        write_images_text(images, os.path.join(path, \"images\" + ext))\r\n        write_points3D_text(points3D, os.path.join(path, \"points3D\") + ext)\r\n    else:\r\n        write_cameras_binary(cameras, os.path.join(path, \"cameras\" + ext))\r\n        write_images_binary(images, os.path.join(path, \"images\" + ext))\r\n        write_points3D_binary(points3D, os.path.join(path, \"points3D\") + ext)\r\n    return cameras, images, 
points3D\r\n\r\n\r\ndef qvec2rotmat(qvec):\r\n    return np.array([\r\n        [1 - 2 * qvec[2]**2 - 2 * qvec[3]**2,\r\n         2 * qvec[1] * qvec[2] - 2 * qvec[0] * qvec[3],\r\n         2 * qvec[3] * qvec[1] + 2 * qvec[0] * qvec[2]],\r\n        [2 * qvec[1] * qvec[2] + 2 * qvec[0] * qvec[3],\r\n         1 - 2 * qvec[1]**2 - 2 * qvec[3]**2,\r\n         2 * qvec[2] * qvec[3] - 2 * qvec[0] * qvec[1]],\r\n        [2 * qvec[3] * qvec[1] - 2 * qvec[0] * qvec[2],\r\n         2 * qvec[2] * qvec[3] + 2 * qvec[0] * qvec[1],\r\n         1 - 2 * qvec[1]**2 - 2 * qvec[2]**2]])\r\n\r\n\r\ndef rotmat2qvec(R):\r\n    Rxx, Ryx, Rzx, Rxy, Ryy, Rzy, Rxz, Ryz, Rzz = R.flat\r\n    K = np.array([\r\n        [Rxx - Ryy - Rzz, 0, 0, 0],\r\n        [Ryx + Rxy, Ryy - Rxx - Rzz, 0, 0],\r\n        [Rzx + Rxz, Rzy + Ryz, Rzz - Rxx - Ryy, 0],\r\n        [Ryz - Rzy, Rzx - Rxz, Rxy - Ryx, Rxx + Ryy + Rzz]]) / 3.0\r\n    eigvals, eigvecs = np.linalg.eigh(K)\r\n    qvec = eigvecs[[3, 0, 1, 2], np.argmax(eigvals)]\r\n    if qvec[0] < 0:\r\n        qvec *= -1\r\n    return qvec\r\n\r\n\r\ndef main():\r\n    parser = argparse.ArgumentParser(description=\"Read and write COLMAP binary and text models\")\r\n    parser.add_argument(\"--input_model\", help=\"path to input model folder\")\r\n    parser.add_argument(\"--input_format\", choices=[\".bin\", \".txt\"],\r\n                        help=\"input model format\", default=\"\")\r\n    parser.add_argument(\"--output_model\",\r\n                        help=\"path to output model folder\")\r\n    parser.add_argument(\"--output_format\", choices=[\".bin\", \".txt\"],\r\n                        help=\"output model format\", default=\".txt\")\r\n    args = parser.parse_args()\r\n\r\n    cameras, images, points3D = read_model(path=args.input_model, ext=args.input_format)\r\n\r\n    print(\"num_cameras:\", len(cameras))\r\n    print(\"num_images:\", len(images))\r\n    print(\"num_points3D:\", len(points3D))\r\n\r\n    if args.output_model is not None:\r\n   
     write_model(cameras, images, points3D, path=args.output_model, ext=args.output_format)\r\n\r\n\r\nif __name__ == \"__main__\":\r\n    main()"
  },
  {
    "path": "src/utils/select_additional_landmarks.py",
    "content": "import argparse\nimport copy\nimport numpy as np\nimport os\nimport scipy.stats as stats\nimport torch\nfrom torch.utils.data import DataLoader\nfrom tqdm import tqdm\n\nimport sys\nsys.path.append(os.path.join(sys.path[0], '..'))\n\nfrom dataloader.indoor6 import Indoor6\nfrom models.efficientlitesld import EfficientNetSLD\nfrom utils.pnp import *\n\nfrom PIL import Image\n\n# import open3d as o3d\n\ndef load_landmark_files(landmark_path, visibility_path):\n    landmark_file = open(landmark_path, 'r')\n    num_landmark = int(landmark_file.readline())\n    landmark = []\n    for l in range(num_landmark):\n        pl = landmark_file.readline().split()\n        pl = np.array([float(pl[i]) for i in range(len(pl))])\n        landmark.append(pl)\n    landmark = np.asarray(landmark)[:, 1:].T\n\n    visibility = np.loadtxt(visibility_path)\n\n    return landmark, visibility\n\n\ndef load_landmark_visibility_files(landmark_path, visibility_path):\n    landmark_file = open(landmark_path, 'r')\n    num_landmark = int(landmark_file.readline())\n    landmark = []\n    for l in range(num_landmark):\n        pl = landmark_file.readline().split()\n        pl = np.array([float(pl[i]) for i in range(len(pl))])\n        landmark.append(pl)\n    landmark = np.asarray(landmark)[:, 1:].T\n\n    visibility = np.loadtxt(visibility_path)\n\n    return landmark, visibility\n\n\ndef visualize_keypoint_np(image_, y, x, kp_color):\n    image = image_.copy()\n    if np.sum(kp_color) == 255:\n        square_size = 5\n    else:\n        square_size = 3\n    for c in range(3):\n        image[y - square_size:y + square_size, x - square_size:x + square_size, c] = kp_color[c]\n\n    return image\n\n\ndef compute_error(C_R_G, C_t_G, C_R_G_hat, C_t_G_hat):\n\n    rot_err = 180 / np.pi * np.arccos(np.clip(0.5 * (np.trace(C_R_G.T @ C_R_G_hat) - 1.0), a_min=-1., a_max=1.))\n    trans_err = np.linalg.norm(C_R_G_hat.T @ C_t_G_hat - C_R_G.T @ C_t_G)\n\n    return rot_err, trans_err\n\n\ndef 
compute_2d3d(opt, pred_heatmap, peak_threshold, landmark2d, landmark3d, C_b_f_gt, H_hm, W_hm, K_inv,\n                 METRICS_LOGGING=None):\n    N = pred_heatmap.shape[0]\n    G_p_f = np.zeros((3, N))\n    C_b_f_hm = np.zeros((3, N))\n    weights = np.zeros(N)\n    validIdx = 0\n\n    pixel_error = []\n    angular_error = []\n    for l in range(N):\n        pred_heatmap_l = pred_heatmap[l]\n        max_pred_heatmap_l = np.max(pred_heatmap_l)\n\n        if max_pred_heatmap_l > peak_threshold:\n            peak_yx = np.unravel_index(np.argmax(pred_heatmap_l), np.array(pred_heatmap_l).shape)\n            peak_yx = np.array(peak_yx)\n\n            # Patch size extraction\n            P = int(min(1+2*np.min(np.array([peak_yx[0], H_hm-1.0-peak_yx[0], peak_yx[1], W_hm-1.0-peak_yx[1]])),\n                        1+64//opt.output_downsample))\n\n            patch_peak_yx = pred_heatmap_l[peak_yx[0] - P // 2:peak_yx[0] + P // 2 + 1,\n                            peak_yx[1] - P // 2:peak_yx[1] + P // 2 + 1]\n            xx_patch, yy_patch = np.meshgrid(np.arange(peak_yx[1] - P // 2, peak_yx[1] + P // 2 + 1, 1),\n                                             np.arange(peak_yx[0] - P // 2, peak_yx[0] + P // 2 + 1, 1))\n\n            refine_y = np.sum(patch_peak_yx * yy_patch) / np.sum(patch_peak_yx)\n            refine_x = np.sum(patch_peak_yx * xx_patch) / np.sum(patch_peak_yx)\n\n            \n            pixel_error.append(np.linalg.norm(landmark2d[:2, l] -\n                                              opt.output_downsample * np.array([refine_x, refine_y])))\n\n            pred_bearing = K_inv @ np.array([refine_x, refine_y, 1])\n            pred_bearing = pred_bearing / np.linalg.norm(pred_bearing)\n            gt_bearing = C_b_f_gt[:, l]\n            gt_bearing = gt_bearing / np.linalg.norm(gt_bearing)\n            angular_error_batch = np.arccos(\n                np.clip(pred_bearing @ gt_bearing, a_min=-1, a_max=1)) * 180 / np.pi\n            \n            
angular_error.append(angular_error_batch)\n\n            weights[validIdx] = max_pred_heatmap_l\n            C_b_f_hm[:, validIdx] = pred_bearing\n            G_p_f[:, validIdx] = landmark3d[:, l]\n            validIdx += 1\n\n    return G_p_f[:, :validIdx], C_b_f_hm[:, :validIdx], weights[:validIdx], np.asarray(pixel_error), np.asarray(angular_error)\n\n\ndef compute_pose(G_p_f, C_b_f_hm, weights, minimal_tight_thr, opt_tight_thr, img_id, OUTPUT_FOLDER):\n    Ndetected_landmarks = C_b_f_hm.shape[1]\n\n    # ### Saving 2D-3D correspondences\n    # if Ndetected_landmarks > 0:\n    #     if not os.path.exists(os.path.join(OUTPUT_FOLDER, 'sld2d3d')):\n    #         os.makedirs(os.path.join(OUTPUT_FOLDER, 'sld2d3d'))\n\n    #     np.savetxt('%s/sld2d3d/%06d.txt' % (OUTPUT_FOLDER, img_id),\n    #                np.concatenate((C_b_f_hm, G_p_f), axis=0))\n    # else:\n    #     C_b_f_hm = None\n    #     G_p_f = None\n    #     weights = None\n\n\n    if Ndetected_landmarks >= 4:\n        ## P3P ransac\n        C_T_G_hat, PnP_inlier = P3PKe_Ransac(G_p_f, C_b_f_hm, weights,\n                                             thres=minimal_tight_thr)\n        # print('inlier: ', np.sum(PnP_inlier))\n        if np.sum(PnP_inlier) >= 4:\n            # C_T_G_opt = PnP(C_T_G_hat, G_p_f[:, PnP_inlier], C_b_f_hm[:, PnP_inlier], weights[PnP_inlier])\n            C_T_G_opt = RunPnPNL(C_T_G_hat,\n                                 G_p_f[:, PnP_inlier], C_b_f_hm[:, PnP_inlier],\n                                 weights[PnP_inlier],\n                                 cutoff=opt_tight_thr)\n            return np.sum(PnP_inlier), C_T_G_opt, PnP_inlier\n\n    return 0, None, np.empty((0))\n\n\ndef select_additional_landmarks(opt, minimal_tight_thr=1e-2, opt_tight_thr=5e-3, mode='test', peak_threshold=0.6):\n\n    PRETRAINED_MODEL = opt.pretrained_model    \n\n    device = opt.gpu_device\n    \n    test_dataset = Indoor6(landmark_idx=np.arange(opt.landmark_indices[0], opt.landmark_indices[-1]),\n 
                          scene_id=opt.scene_id,\n                           mode=mode,\n                           root_folder=opt.dataset_folder,\n                           input_image_downsample=2,\n                           landmark_config=opt.landmark_config,\n                           visibility_config=opt.visibility_config,\n                           skip_image_index=1)\n\n    test_dataloader = DataLoader(dataset=test_dataset, num_workers=1, batch_size=1, shuffle=False, pin_memory=True)\n    \n    landmark_data = test_dataset.landmark\n\n    cnns = []\n    nLandmarks = opt.landmark_indices\n    num_landmarks = opt.landmark_indices[-1]\n\n    if len(PRETRAINED_MODEL) == 0:\n        use_gt_2d3d = True\n    else:\n        use_gt_2d3d = False\n        for idx, pretrained_model in enumerate(PRETRAINED_MODEL):\n            if opt.model == 'efficientnet':\n                cnn = EfficientNetSLD(num_landmarks=nLandmarks[idx+1]-nLandmarks[idx], output_downsample=opt.output_downsample).to(device=device)\n\n            cnn.load_state_dict(torch.load(pretrained_model))\n            cnn = cnn.to(device=device)\n            cnn.eval()\n            \n            # Adding pretrained model\n            cnns.append(cnn)\n    \n    img_id = 0\n\n    METRICS_LOGGING = {'image_name': '',\n                       'angular_error': [],\n                       'pixel_error': [],\n                       'rot_err_all': 180.,\n                       'trans_err_all': 180.,\n                       'heatmap_peak': 0.0,\n                       'ndetected': 0,              \n                       'pnp_inlier': np.zeros(num_landmarks),\n                       'pixel_inlier_error': np.array([1800.]),\n                       }\n    test_image_logging = []    \n\n\n\n    LANDMARKS_METRICS_LOGGING = {'image_name': [],\n                                'angular_error': [],\n                                'pixel_error': [],\n                                'heatmap_peak': 0.0,\n                 
               'ndetected': 0,\n                                }\n    test_landmarks_logging = [copy.deepcopy(LANDMARKS_METRICS_LOGGING) for _ in range(num_landmarks)]\n    print(len(test_landmarks_logging))\n\n    with torch.no_grad():\n\n        ## Only works for indoor-6\n        indoor6W = 640 // opt.output_downsample\n        indoor6H = 352 // opt.output_downsample\n        HH, WW = torch.meshgrid(torch.arange(indoor6H), torch.arange(indoor6W))\n        WW = WW.reshape(1, 1, indoor6H, indoor6W).to(device=device)\n        HH = HH.reshape(1, 1, indoor6H, indoor6W).to(device=device)\n\n        for idx, batch in enumerate(tqdm(test_dataloader)):\n\n            image = batch['image'].to(device=device)\n            B, _, H, W = image.shape\n\n            K_inv = batch['inv_intrinsics'].to(device=device)\n            C_T_G_gt = batch['pose_gt'].cpu().numpy()\n\n            landmark2d = batch['intrinsics'] @ batch['landmark3d'].reshape(B, 3, num_landmarks)\n            landmark2d /= landmark2d[:, 2:].clone()\n            landmark2d = landmark2d.numpy()\n\n            pred_heatmap = []\n            for cnn in cnns:\n                pred = cnn(image)\n                pred_heatmap.append(pred['1'])\n\n            pred_heatmap = torch.cat(pred_heatmap, axis=1)\n            pred_heatmap *= (pred_heatmap > peak_threshold).float()\n\n            K_inv[:, :, :2] *= opt.output_downsample\n\n            ## Compute 2D landmark locations as the heatmap-weighted mean pixel coordinate (soft-argmax)\n            P = torch.max(torch.max(pred_heatmap, dim=3)[0], dim=2)[0]\n            pred_normalized_heatmap = pred_heatmap / (torch.sum(pred_heatmap, axis=(2, 3), keepdim=True) + 1e-4)\n            projx = torch.sum(WW * pred_normalized_heatmap, axis=(2, 3)).reshape(B, 1, num_landmarks)\n            projy = torch.sum(HH * pred_normalized_heatmap, axis=(2, 3)).reshape(B, 1, num_landmarks)\n            xy1 = torch.cat((projx, projy, torch.ones_like(projx)), axis=1)\n            uv1 = K_inv @ xy1\n            C_B_f = uv1 / 
torch.sqrt(torch.sum(uv1 ** 2, axis=1, keepdim=True))\n            C_B_f = C_B_f.cpu().numpy()\n            P = P.cpu().numpy()\n            xy1 = xy1.cpu().numpy()\n\n            ## Compute error\n            for b in range(B):\n                # G_p_f, C_b_f, weights, pixel_error, angular_error = compute_2d3d(\n                #                                         opt, pred_heatmap[b].cpu().numpy(), \n                #                                         peak_threshold, landmark2d[b], landmark_data,\n                #                                         batch['landmark3d'][b].cpu().numpy(),\n                #                                         H_hm, W_hm, K_inv[b].cpu().numpy())\n\n                Pb = P[b]>peak_threshold\n                G_p_f = landmark_data[:, Pb]\n                C_b_f = C_B_f[b][:, Pb]                \n                weights = P[b][Pb]                \n                # xy1b = xy1[b][:2, Pb]\n\n                pnp_inlier, C_T_G_hat, pnp_inlier_mask = compute_pose(G_p_f, C_b_f, weights,\n                                                                        minimal_tight_thr, opt_tight_thr,\n                                                                        img_id, opt.output_folder)\n                \n                rot_err, trans_err = 180., 1800.\n                if pnp_inlier >= 4:\n                    rot_err, trans_err = compute_error(C_T_G_gt[b][:3, :3], C_T_G_gt[b][:3, 3],\n                                                       C_T_G_hat[:3, :3], C_T_G_hat[:3, 3])\n                \n                ## Logging information                \n                pixel_error = np.linalg.norm(landmark2d[b][:2, Pb] - opt.output_downsample * xy1[b][:2, Pb], axis=0)\n                C_b_f_gt = batch['landmark3d'][b]\n                C_b_f_gt = torch.nn.functional.normalize(C_b_f_gt, dim=0).cpu().numpy()\n                angular_error = np.arccos(np.clip(np.sum(C_b_f * C_b_f_gt[:, Pb], axis=0), -1, 1)) * 180. 
/ np.pi                \n\n                m = copy.deepcopy(METRICS_LOGGING)\n                m['image_name'] = test_dataset.image_files[img_id]\n                m['rgb'] = batch['image'][b].cpu().numpy().transpose(1, 2, 0)\n                m['pixel_error'] = pixel_error \n                m['angular_error'] = angular_error\n                m['heatmap_peak'] = P[b]\n                m['pixel_detected'] = xy1[b] * opt.output_downsample\n                m['pixel_gt'] = landmark2d[b]\n                m['visibility_gt'] = batch['visibility'][b] > 0.5\n                m['rot_err_all'] = np.array([rot_err])\n                m['trans_err_all'] = np.array([trans_err])\n                m['K'] = batch['intrinsics'][b].cpu().numpy()\n                m['C_T_G_gt'] = C_T_G_gt[b]\n\n                if len(pnp_inlier_mask):\n                    m['pnp_inlier'][Pb] = pnp_inlier_mask\n                    pixel_inlier_error = np.linalg.norm(landmark2d[b][:2, m['pnp_inlier']==1] - \n                                                        opt.output_downsample * xy1[b][:2, m['pnp_inlier']==1], axis=0)\n                    m['pixel_inlier_error'] = pixel_inlier_error\n                \n                test_image_logging.append(m)\n\n                for l in range(num_landmarks):\n                    if batch['visibility'][b, l] > 0.5:\n                        test_landmarks_logging[l]['image_name'].append(test_dataset.image_files[img_id])\n                        if P[b, l]:\n                            test_landmarks_logging[l]['pixel_error'].append(np.linalg.norm(landmark2d[b][:2, l] - \n                                                                                           opt.output_downsample * xy1[b][:2, l], axis=0))\n                        else:\n                            test_landmarks_logging[l]['pixel_error'].append(1e3)\n\n                test_landmarks_logging.append(m)\n                \n                img_id += 1\n\n\n        ## 2D visualization of images\n        # 
test_image_logging.sort(key = lambda x: x['trans_err_all'][0])\n        # for m in test_image_logging:\n        #     print(m['image_name'], ': ', m['trans_err_all'][0])\n        #     img = np.array(m['rgb'] * 255, dtype=np.uint8)\n        #     for l in range(len(m['visibility_gt'])):\n        #         if m['pnp_inlier'][l]:\n        #             img = visualize_keypoint_np(img, \n        #                                         int(m['pixel_detected'][1, l]),\n        #                                         int(m['pixel_detected'][0, l]),\n        #                                         np.array([0., 255., 0.]))\n        #         if m['visibility_gt'][l]:                                        \n        #             img = visualize_keypoint_np(img, \n        #                                         int(m['pixel_gt'][1, l]),\n        #                                         int(m['pixel_gt'][0, l]),\n        #                                         np.array([200., 0., 0.]))\n                    \n        #     Image.fromarray(img).save('%s/%2.2f_%2.2f_%s.jpg' % (opt.output_folder, \n        #                                                          m['trans_err_all'][0], \n        #                                                          np.mean(m['pixel_inlier_error']),\n        #                                                          m['image_name']))\n\n\n        ###########################################################################################\n        ############################ Extra landmark selection analysis ############################\n        ###########################################################################################\n\n        ## Some more additional points to improve wacky poses\n        # lm_file = os.path.join(opt.dataset_folder, opt.scene_id, 'landmarks/landmarks-2000v8.txt')\n        # vis_file = os.path.join(opt.dataset_folder, opt.scene_id, 'landmarks/visibility-2000v8_depth_normal.txt')\n        # 
full_landmarks, full_vis = load_landmark_files(lm_file, vis_file)\n        # full_landmarks, full_vis = full_landmarks[:, :200], full_vis[:200]\n\n\n        ## Load the COLMAP reconstruction and map image names to COLMAP image ids\n        from utils.read_write_models import read_model\n        cameras, images, points = read_model(os.path.join(opt.dataset_folder, 'indoor6-colmap/%s/sparse/0' % opt.scene_id), ext='.bin')\n        indoor6_name_2to_colmap_index = {}\n        for k in images:\n            indoor6_name_2to_colmap_index[images[k].name] = k\n\n\n        ## For each test image with a bad pose estimate (translation error > 1.0),\n        ## collect extra candidate landmarks: the COLMAP 3D points observed in that image\n        ## whose 2D keypoints lie more than 20 pixels from every PnP-inlier detection\n        ## (or all of them if PnP found no inliers).\n        additional_landmarks = set()\n        for idx, m in enumerate(test_image_logging):\n            if m['trans_err_all'][0] > 1.0:\n\n                ## Add observed points that are not near any detected (PnP-inlier) 2D landmark\n                img_vis_id = test_dataset.image_files.index(m['image_name'])\n\n                # print('---------------------------')\n                # print(m['image_name'])\n                # print(test_dataset.original_image_name(img_vis_id))\n                # print(images[indoor6_name_2to_colmap_index[test_dataset.original_image_name(img_vis_id)]].name)\n\n                xys = images[indoor6_name_2to_colmap_index[test_dataset.original_image_name(img_vis_id)]].xys\n                point3dids = images[indoor6_name_2to_colmap_index[test_dataset.original_image_name(img_vis_id)]].point3D_ids\n                ## Keep only keypoints with a triangulated 3D point (COLMAP stores -1 otherwise)\n                xys = xys[point3dids != -1]\n                point3dids = point3dids[point3dids != -1]\n\n                img = np.array(m['rgb'] * 255, dtype=np.uint8)\n\n                for l in range(xys.shape[0]):\n                    img = visualize_keypoint_np(img,\n                                            int(xys[l, 1] * 352 / 720),\n                                            int(xys[l, 0] * 
0.5),\n                                            np.array([200., 0., 0.]))\n\n                    ## Rescale COLMAP keypoint coordinates to the 640x352 resolution of m['pixel_detected']\n                    xy_scaled = np.array([xys[l, 0] * 0.5, xys[l, 1] * 352 / 720])\n\n                    if np.sum(m['pnp_inlier']) > 0:\n                        dist_other_2d_kpts = np.linalg.norm(xy_scaled.reshape(2, 1) - m['pixel_detected'][:2, m['pnp_inlier']==1], axis=0)\n\n                        if np.min(dist_other_2d_kpts) > 20: # more than 20 pixels from every PnP-inlier detection\n                            additional_landmarks.add(point3dids[l])\n                    else:\n                        additional_landmarks.add(point3dids[l])\n\n                # for l in range(len(m['visibility_gt'])):\n                #     if m['pnp_inlier'][l]:\n                #         img = visualize_keypoint_np(img,\n                #                                     int(m['pixel_detected'][1, l]),\n                #                                     int(m['pixel_detected'][0, l]),\n                #                                     np.array([0., 255., 0.]))\n\n                # visible_landmarks_in_the_next_1k = np.where(full_vis[:, test_dataset.image_indices[img_vis_id]] == 1)[0]\n                # visible_landmarks_in_the_next_1k += 1000\n                # print(visible_landmarks_in_the_next_1k)\n                # img = np.array(m['rgb'] * 255, dtype=np.uint8)\n                # for l in visible_landmarks_in_the_next_1k:\n                #     pix = m['K'] @ (m['C_T_G_gt'][:3, :3] @ full_landmarks[:, l] + m['C_T_G_gt'][:3, 3])\n                #     img = visualize_keypoint_np(img,\n                #                                 int(pix[1] / pix[2]),\n                #                                 int(pix[0] / pix[2]),\n                #                                 np.array([200., 0., 0.]))\n\n\n                ## TODO: re-run PnP and save a new image with the updated translation error\n\n\n                
# Image.fromarray(img).save('%s/%2.2f_%2.2f_%s_after.jpg' % (opt.output_folder,\n                #                                                            m['trans_err_all'][0],\n                #                                                            np.mean(m['pixel_inlier_error']),\n                #                                                            m['image_name']))\n\n\n        ### Given the additional candidate landmarks, re-run landmark selection to pick opt.num_landmarks points\n        from landmark_selection import ComputePerPointAngularSpan, ComputePerPointDepth, SaveLandmarksAndVisibilityMask\n\n        ### Score the bank of candidate points\n        numPoints3D = len(additional_landmarks)\n        points3D_ids = np.zeros(numPoints3D)\n        points3D_scores = np.zeros(numPoints3D)\n        points3D_depth = np.zeros(numPoints3D)\n        points3D_tracklength = np.zeros(numPoints3D)\n        points3D_anglespan = np.zeros(numPoints3D)\n\n        validIdx = 0\n        ## Compute a score for each candidate; keep only points with mean depth < 15 and track length > 3\n        for i, k in enumerate(tqdm(additional_landmarks)):\n            pointInGlobal = points[k].xyz\n            image_ids = points[k].image_ids\n            trackLength = len(image_ids)\n\n            depthMean, depthStd = ComputePerPointDepth(pointInGlobal, image_ids, images)\n            # timespan = ComputePerPointTimeSpan(image_ids, images)\n            anglespan = ComputePerPointAngularSpan(pointInGlobal, image_ids, images)\n\n            if depthMean < 15.0 and trackLength > 3:\n                depthScore = min(1.0, depthStd / depthMean)\n                trackLengthScore = 0.25 * np.log2(trackLength)\n\n                points3D_depth[validIdx] = depthMean\n                points3D_tracklength[validIdx] = trackLength\n                points3D_anglespan[validIdx] = anglespan\n\n                points3D_ids[validIdx] = k\n                points3D_scores[validIdx] = depthScore + trackLengthScore + 
anglespan\n\n                validIdx += 1\n\n        points3D_depth = points3D_depth[:validIdx]\n        points3D_tracklength = points3D_tracklength[:validIdx]\n        points3D_anglespan = points3D_anglespan[:validIdx]\n        points3D_ids = points3D_ids[:validIdx]\n        points3D_scores = points3D_scores[:validIdx]\n\n        print('Number of additional points: ', validIdx)\n        print('[Depth mean] Max: %2.2f/Median: %2.2f/Mean: %2.2f/Min: %2.2f'\n              % (np.max(points3D_depth), np.median(points3D_depth), np.mean(points3D_depth), np.min(points3D_depth)))\n        print('[Track length] Max: %2.2f/Median: %2.2f/Mean: %2.2f/Min: %2.2f'\n              % (np.max(points3D_tracklength), np.median(points3D_tracklength), np.mean(points3D_tracklength), np.min(points3D_tracklength)))\n        print('[Angle span] Max: %2.2f/Median: %2.2f/Mean: %2.2f/Min: %2.2f'\n              % (np.max(points3D_anglespan), np.median(points3D_anglespan), np.mean(points3D_anglespan), np.min(points3D_anglespan)))\n\n\n        ## Cap the selection at the number of valid candidates; otherwise the greedy\n        ## loop below cannot fill every slot with a distinct point\n        num_selected_landmark = min(opt.num_landmarks, validIdx)\n        ## Sort candidates by ascending score (best candidate last)\n        sorted_indices = np.argsort(points3D_scores)\n\n        ## Greedy selection: visit candidates from best to worst, skipping any point\n        ## within radius of an already selected point; halve the radius after each pass\n        selected_landmarks = {'id': np.zeros(num_selected_landmark),\n                            'xyz': np.zeros((3, num_selected_landmark)),\n                            'score': np.zeros(num_selected_landmark)}\n\n        ## Selecting first point\n        selected_landmarks['id'][0] = points3D_ids[sorted_indices[-1]]\n        selected_landmarks['xyz'][:, 0] = points[selected_landmarks['id'][0]].xyz\n        selected_landmarks['score'][0] = points3D_scores[sorted_indices[-1]]\n\n        nselected = 1\n        radius = 5.0\n\n        while nselected < num_selected_landmark:\n            for i in reversed(sorted_indices):\n                id = points3D_ids[i]\n                xyz = points[id].xyz\n\n                if np.sum(np.linalg.norm(xyz.reshape(3, 1) - selected_landmarks['xyz'][:, 
:nselected], axis=0) < radius):\n                    continue\n                else:\n                    selected_landmarks['id'][nselected] = id\n                    selected_landmarks['xyz'][:, nselected] = xyz\n                    selected_landmarks['score'][nselected] = points3D_scores[i]\n                    nselected += 1\n\n                if nselected == num_selected_landmark:\n                    break\n                \n            radius *= 0.5\n\n        ## Saving\n        import pickle\n\n        indoor6_images = pickle.load(open(os.path.join(opt.dataset_folder, '%s/train_test_val.pkl' % opt.scene_id), 'rb'))\n        indoor6_imagename_to_index = {}\n\n        for i, f in enumerate(indoor6_images['train']):\n            image_name = open(os.path.join(opt.dataset_folder, \n                                        opt.scene_id, 'images', \n                                        f.replace('color.jpg', \n                                                    'intrinsics.txt'))).readline().split(' ')[-1][:-1]\n            indoor6_imagename_to_index[image_name] = indoor6_images['train_idx'][i]\n        \n        for i, f in enumerate(indoor6_images['val']):\n            image_name = open(os.path.join(opt.dataset_folder, \n                                        opt.scene_id, 'images', \n                                        f.replace('color.jpg', \n                                                    'intrinsics.txt'))).readline().split(' ')[-1][:-1]\n            indoor6_imagename_to_index[image_name] = indoor6_images['val_idx'][i]\n\n        for i, f in enumerate(indoor6_images['test']):\n            image_name = open(os.path.join(opt.dataset_folder, \n                                        opt.scene_id, 'images', \n                                        f.replace('color.jpg', \n                                                    'intrinsics.txt'))).readline().split(' ')[-1][:-1]\n            indoor6_imagename_to_index[image_name] = 
indoor6_images['test_idx'][i]\n\n        num_images = len(indoor6_images['train']) + len(indoor6_images['val']) + len(indoor6_images['test'])\n\n        SaveLandmarksAndVisibilityMask(selected_landmarks, points, images, indoor6_imagename_to_index, num_images,\n                                    os.path.join(opt.dataset_folder, opt.scene_id),\n                                    opt.landmark_config, opt.visibility_config, opt.output_format)\n\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser(\n        description='Scene Landmark Detection',\n        formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n    parser.add_argument(\n        '--dataset_folder', type=str, required=True,\n        help='Root directory where all data is stored')\n    parser.add_argument(\n        '--scene_id', type=str, default='scene6',\n        help='Scene id')\n    parser.add_argument(\n        '--num_landmarks', type=int, default=300,\n        help='Number of selected landmarks.')\n    parser.add_argument(\n        '--output_format', type=str, default='',\n        help='Landmark file output.')\n    parser.add_argument(\n        '--output_folder', type=str, required=True,\n        help='Output folder')\n    parser.add_argument(\n        '--landmark_config', type=str, default='landmarks/landmarks-300',\n        help='File containing scene-specific 3D landmarks.')\n    parser.add_argument(\n        '--landmark_indices', type=int, action='append',\n        help='Cumulative landmark index boundaries, one value per flag: pretrained model i predicts landmarks indices[i] up to indices[i+1], so N models need N+1 values (e.g. 0 and 300)',\n        required=True)\n    parser.add_argument(\n        '--visibility_config', type=str, default='landmarks/visibility_aug-300',\n        help='File containing information about visibility of landmarks in cameras associated with the training set.')\n    parser.add_argument(\n        '--model', type=str, default='efficientnet',\n        help='Network architecture backbone.')\n    parser.add_argument(\n        '--output_downsample', type=int, default=4,\n        
help='Downsampling factor of the output heatmap resolution relative to the input image')\n    parser.add_argument(\n        '--gpu_device', type=str, default='cuda:0',\n        help='GPU device')\n    parser.add_argument(\n        '--pretrained_model', type=str, action='append', default=[],\n        help='Path to a pretrained detector model; pass once per model')\n\n\n    opt = parser.parse_args()\n    select_additional_landmarks(opt, minimal_tight_thr=1e-3, opt_tight_thr=1e-3)"
  }
]