[
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\ncover/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\n.pybuilder/\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n#   For a library or package, you might want to ignore these files since the code is\n#   intended to run in multiple environments; otherwise, check them in:\n# .python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# poetry\n#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.\n#   This is especially recommended for binary packages to ensure reproducibility, and is more\n#   commonly ignored for libraries.\n#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control\n#poetry.lock\n\n# pdm\n#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.\n#pdm.lock\n#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it\n#   in version control.\n#   https://pdm.fming.dev/#use-with-ide\n.pdm.toml\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# pytype static type analyzer\n.pytype/\n\n# Cython debug symbols\ncython_debug/\n\n# PyCharm\n#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can\n#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore\n#  and can be added to the global gitignore or merged into this file.  For a more nuclear\n#  option (not recommended) you can uncomment the following to ignore the entire idea folder.\n.idea/\n"
  },
  {
    "path": "DESIGN.md",
    "content": "# Study on Design of LiDAR Compression \n\nAll the following experiments are conducted with 4 NVIDIA 3090 GPUs on KITTI-360 (64-beam).\n\nTip: Download the video instead of watching it with the Google Drive's built-in video player provides a better visualization.\n\n### Autoencoders (trained with 40k steps, evaluated on reconstruction):\n\n| Curvewise <br/> Factor | Patchwise <br/> Factor | Output <br/> Size | rFRID(↓) | rFSVD(↓) | rFPVD(↓) | CD(↓) | EMD(↓) | #Params (M) |                                                Directory                                                |                                                                                Visualization of Reconstruction (val)                                                                                 |\n|:----------------------:|:----------------------:|:-----------------:|:--------:|:--------:|:--------:|:-----:|:------:|:-----------:|:-------------------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|\n|          N/A           |          N/A           |   Ground Truth    |    -     |    -     |    -     |   -   |   -    |      -      |                                                    -                                                    | [Range Image](https://drive.google.com/file/d/1wAtQSlVwF2jCpcL3zbXlk2lGUYzo1GBf/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1iHIB7Jw-WS0D_hXgQSOyyDyWCmPVR-6k/view?usp=sharing) |\n|                        |                        |                   |          |          |          |       |        |             |                                                                                                         |                                                                                                                                                                                                      |\n|           4            |           1            |     64x256x2      |   0.2    |   12.9   |   13.8   | 0.069 | 0.151  |    9.52     | [Google Drive](https://drive.google.com/drive/folders/1bLGigdh3oNBTfskdX5yisqJ3fd99wFnR?usp=drive_link) | [Range Image](https://drive.google.com/file/d/1w7slbsRjlU4kb0kl6LyjX-JojJvoWQhG/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/17ewPXoRMeA_HsvEOznsvxy3d6iKk7hC2/view?usp=sharing) |\n|           8            |           1            |     64x128x3      |   0.9    |   21.2   |   17.4   | 0.141 | 0.230  |    10.76    |  [Google Drive](https://drive.google.com/drive/folders/1qPCPJC9TsIEO2UaZqurPu99m4syzfzuq?usp=sharing)   | [Range Image](https://drive.google.com/file/d/17kukYFlJY40_cVBuWXMLHiMe7ls2OLNh/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/116IXDMgrWn6OHtyEYIo6aM1ARloX3BWF/view?usp=sharing) |\n|           16           |           1            |      64x64x4      |   2.8    |   31.1   |   23.9   | 0.220 | 0.265  |    12.43    |  [Google Drive](https://drive.google.com/drive/folders/1IHm3KlwG4lQAa9Ygt3WRUPfDxAQ1Tjia?usp=sharing)   | [Range Image](https://drive.google.com/file/d/12TKyoajTiU_hr1MAdK2PNveddorCshG4/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/18NCV7JoR3W1COaPH96a1ozbh8-58eT6n/view?usp=sharing) |\n|           32           |           1            |      64x32x8      |   16.4   |   49.0   |   38.5   
| 0.438 | 0.344  |    13.72    |  [Google Drive](https://drive.google.com/drive/folders/1CnUGOoAZDrSbDG3DjVx5pcouAT5WQTGN?usp=sharing)   | [Range Image](https://drive.google.com/file/d/1S2DPHfWAljKZrHJlPHIvxAPK2-rpdJ_J/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1yx8V4Qav7sCigcfSHrrrJQOFF-s2PryV/view?usp=sharing) |\n|                        |                        |                   |          |          |          |       |        |             |                                                                                                         |                                                                                                                                                                                                      |\n|           1            |           2            |     32x512x2      |   1.5    |   25.0   |   23.8   | 0.096 | 0.178  |    2.87     |  [Google Drive](https://drive.google.com/drive/folders/16OLfvexGSuOO8zNxkVLvY6rglvLn3HRG?usp=sharing)   | [Range Image](https://drive.google.com/file/d/1tPPD2Pnn_6ge3x2yoJXhkDhe0Wi5Qxhw/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1Xjg0ckVb208BFEgbv4VQtV-fVraEXUNC/view?usp=sharing) |\n|           1            |           4            |     16x256x4      |   0.6    |   15.4   |   15.8   | 0.142 | 0.233  |    12.45    |  [Google Drive](https://drive.google.com/drive/folders/1ArTAar3UM-7eBmkGb2bqDF0MVW6GL0az?usp=sharing)   | [Range Image](https://drive.google.com/file/d/1Q_ZTRKyDOAmP314p9B6Cip79mc-FJ2se/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1-t9zvSrov1OsF_WEIBqH3xkLzTJfxRBr/view?usp=sharing) |\n|           1            |           8            |     8x128x16      |   17.7   |   35.7   |   33.1   | 0.384 | 0.327  |    15.78    |  [Google Drive](https://drive.google.com/drive/folders/1Ol2P6ZYYFjEImLAhIhY8iR_G6bLKI4Yx?usp=sharing)   | [Range Image](https://drive.google.com/file/d/14hPy2utsaxwPxW5PA7gO7ak7f-lcd-X5/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1izj-_1hFkdaRCg2qUzkXByfCD-vBd_1M/view?usp=sharing) |\n|           1            |           16           |      4x64x64      |   37.1   |   68.7   |   63.9   | 0.699 | 0.416  |    16.25    |  [Google Drive](https://drive.google.com/drive/folders/1_vihPf9xgnr4Zib-dYNUZ1n6kTMxT3rG?usp=sharing)   | [Range Image](https://drive.google.com/file/d/1G7evMm3H6WvbHFhBlCa8wxPzwVC3q-8H/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1IdBrEpCIugvxVHyNOsNIg8Y8ZBWrHcWL/view?usp=sharing) |\n|                        |                        |                   |          |          |          |       |        |             |                                                                                                         |                                                                                                                                                                                                      |\n|           2            |           2            |     32x256x3      |   0.4    |   11.2   |   12.2   | 0.094 | 0.199  |    13.09    |  [Google Drive](https://drive.google.com/drive/folders/1SdFEtMGRE9Oi23jlDrtebslc5hxhYLBQ?usp=sharing)   | [Range Image](https://drive.google.com/file/d/1Ac4jVB6RkqMwV1fZcPGDyQhR3eE_Zj6C/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1pg2ezSmXiu3ensvj564JIy6CpB46uZm7/view?usp=sharing) |\n|           4            |           2            |     32x128x4      |   3.9    |   19.6   |   16.6   | 0.197 | 0.236  | 
   14.35    |  [Google Drive](https://drive.google.com/drive/folders/1uWlZPiU9Jw4TFfvI4Avi4r0bEyJ9kw4i?usp=sharing)   | [Range Image](https://drive.google.com/file/d/1yZGqe_DcDXew3JabnN4T1-P27ZlscHba/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1i_q6gVY4gMtzKYlhlMQ9QrRql73VX05j/view?usp=sharing) |\n|           8            |           2            |      32x64x8      |   8.0    |   25.3   |   20.2   | 0.277 | 0.294  |    16.06    |  [Google Drive](https://drive.google.com/drive/folders/1Z9B7PjR5SlgAl2WLGmIPxiYTzmo17J--?usp=sharing)   | [Range Image](https://drive.google.com/file/d/1HVqFbIE1lgotDplc8x7_hJkSU5vLtbRN/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1jSYWZMmPelmfWpVa7V5f2Byr9vN2BKXo/view?usp=sharing) | \n|           16           |           2            |     32x32x16      |   21.5   |   54.2   |   44.6   | 0.491 | 0.371  |    17.44    |  [Google Drive](https://drive.google.com/drive/folders/1jBaEiAymHACWTdy_GbYOiG9e-GFVkIfe?usp=sharing)   | [Range Image](https://drive.google.com/file/d/1flAzjRLcl5Jtc_T--GbbomKWi42DvW9v/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1zfMzu6NFeLJhR1YPU28k7vPy1GX-80QT/view?usp=sharing) |\n|           2            |           4            |     16x128x8      |   2.5    |   16.9   |   15.8   | 0.205 | 0.273  |    15.07    |  [Google Drive](https://drive.google.com/drive/folders/1w-4bF4yORsot6xb5ia95RXWhfHrfpK0T?usp=sharing)   | [Range Image](https://drive.google.com/file/d/1rm0sviRg4LfImgWVCi6THi3pHF4kFccH/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/1gPKB2zj44oLLEBuUXU8uiaXIcSWpyMOi/view?usp=sharing) |\n|           4            |           4            |     16x128x16     |   13.8   |   29.5   |   25.4   | 0.341 | 0.317  |    16.86    |  [Google Drive](https://drive.google.com/drive/folders/1_hY52mbKy4t3U5eWQ4Stq-3wZX1FPXXz?usp=sharing)   | [Range Image](https://drive.google.com/file/d/1ldMRXfUtFNBtjCCc-KYR311dQvCmn0EF/view?usp=sharing), [Point Cloud](https://drive.google.com/file/d/129WcZXW3b6e4UMxZ9x4XCR3BlaKw1Vec/view?usp=sharing) |\n\n### LiDMs (trained with 10k steps, evaluated on generation):\n\n\n| Curvewise <br/> Factor | Patchwise <br/> Factor | Output <br/> Size | FRID(↓) | FSVD(↓) | FPVD(↓) | JSD(↓) | MMD(10$^-4$,↓) |                                                Directory                                                |\n|:----------------------:|:----------------------:|:-----------------:|:-------:|:-------:|:-------:|:------:|:--------------:|:-------------------------------------------------------------------------------------------------------:|\n|          N/A           |          N/A           |   Ground Truth    |    -    |    -    |    -    |   -    |       -        |                                                                                                         |\n|                        |                        |                   |         |         |         |        |                |                                                                                                         |\n|           4            |           1            |     64x256x2      |   271   |   148   |   118   | 0.262  |      5.33      | [Google Drive](https://drive.google.com/drive/folders/1_bf9apVwhhmyaYiUPO5vE1t6Tqwbj2dq?usp=drive_link) |\n|           8            |           1            |     64x128x3      |   162   |   85    |   68    | 0.234  |      5.03      | [Google 
Drive](https://drive.google.com/drive/folders/1M_NVgHNWbWDe6vOMML4ZpoO7-alKGGBl?usp=drive_link) |\n|           16           |           1            |      64x64x4      |   142   |   116   |   106   | 0.232  |      5.15      | [Google Drive](https://drive.google.com/drive/folders/19DkZhHhVj7oa7XITXbdcNLGqqDkfjri-?usp=drive_link) |\n|                        |                        |                   |         |         |         |        |                |                                                                                                         |\n|           1            |           2            |     32x512x2      |   205   |   154   |   132   | 0.248  |      6.15      | [Google Drive](https://drive.google.com/drive/folders/1l5VZRImWDZttHIgM5A6heWjSYocoeujq?usp=drive_link) |\n|           1            |           4            |     16x256x4      |   180   |   60    |   55    | 0.230  |      5.34      | [Google Drive](https://drive.google.com/drive/folders/1sg0iVMFf7EnAcUvpxq57kx7Y2D0Pclq7?usp=drive_link) |\n|           1            |           8            |     8x128x16      |   192   |   88    |   78    | 0.243  |      5.14      | [Google Drive](https://drive.google.com/drive/folders/163yiMd3nEey6igZWk2ldegdlGtJhRppf?usp=drive_link) |\n|                        |                        |                   |         |         |         |        |                |                                                                                                         |\n|           2            |           2            |     32x256x3      |   161   |   73    |   63    | 0.228  |      5.44      | [Google Drive](https://drive.google.com/drive/folders/1cP-ghlv996glNHewCF01iU5lHy9iOgQO?usp=drive_link) |\n|           4            |           2            |     32x128x4      |   145   |   77    |   68    | 0.222  |      5.10      | [Google Drive](https://drive.google.com/drive/folders/1zQf3_fFlp8r2b34ZySpHU4Nd1ilIDqRz?usp=drive_link) |\n|           8            |           2            |      32x64x8      |   188   |   83    |   71    | 0.228  |      5.33      | [Google Drive](https://drive.google.com/drive/folders/1EXK5tw95LOKqxclFNdIbc6qQ7H-whRKp?usp=drive_link) | \n|           2            |           4            |     16x128x8      |   162   |   56    |   49    | 0.228  |      4.82      | [Google Drive](https://drive.google.com/drive/folders/1JIQTswdJ3s4b_w1BHv6WWFs29fTswgRy?usp=drive_link) |\n|           4            |           4            |     16x128x16     |   195   |   80    |   70    | 0.240  |      5.84      | [Google Drive](https://drive.google.com/drive/folders/1F47aSmU2CnWSx8mWZ1ICnKftgIUgpW58?usp=drive_link) |\n\n\n\n### LiDM Performance with Different Scaling Factors:\n\n<p align=\"center\">\n<img src=assets/lidm_frid.png width=\"450\"/>\n<img src=assets/lidm_fsvd.png width=\"450\"/>\n</p>\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2024 Haoxi Ran\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "<div align=\"center\">\n<h1>LiDAR Diffusion Models [CVPR 2024]</h1>\n\n[**Haoxi Ran**](https://hancyran.github.io/) · [**Vitor Guizilini**](https://scholar.google.com.br/citations?user=UH9tP6QAAAAJ&hl=en) · [**Yue Wang**](https://yuewang.xyz/)\n\n\n<a href=\"https://hancyran.github.io/assets/paper/lidar_diffusion.pdf\"><img src='https://img.shields.io/badge/PDF-LiDAR Diffusion-yellow' alt='PDF'></a>\n<a href=\"https://arxiv.org/abs/2404.00815\"><img src='https://img.shields.io/badge/arXiv-2404.00815-red?logo=arXiv' alt='arXiv'></a>\n<a href=\"https://lidar-diffusion.github.io/\"><img src='https://img.shields.io/badge/Project-LiDAR Diffusion-green' alt='Project'></a>\n<a href=\"https://www.youtube.com/watch?v=Vj7DubNZnDo\"><img src='https://img.shields.io/badge/youtube-Video-slateblue?logo=youtube' alt='Video'></a>\n<a href=\"#citation\"><img src='https://img.shields.io/badge/BibTex-LiDAR Diffusion-blue' alt='Paper BibTex'></a>\n\n\n<img src=assets/overview.png width=\"400\"/>\n\n</div>\n\n## :tada: News :tada:\n\n- [**Apr 14, 2024**] Pretrained autoencoders and LiDMs for different tasks are released!\n- [**Apr 5, 2024**] Our codebase and a detailed study of our autoencoder design along with the pretrained models is released! \n\n\n\n## Requirements\n\nWe provide an available [conda](https://conda.io/) environment named `lidar_diffusion`:\n\n```\nsh init/create_env.sh\nconda activate lidar_diffusion\n```\n\n## Evaluation Toolbox\n\n**Overview of evaluation metrics**:\n\n<table>\n<thead>\n  <tr>\n    <th style=\"text-align: center; vertical-align: middle;\" colspan=\"3\">Perceptual Metrics<br>(generation &amp; reconstruction)</th>\n    <th style=\"text-align: center; vertical-align: middle;\" colspan=\"2\">Statistical Metrics<br>(generation only)</th>\n    <th style=\"text-align: center; vertical-align: middle;\" colspan=\"2\">Distance metrics <br> (reconstruction only)</th>\n  </tr>\n</thead>\n<tbody>\n  <tr>\n    <td style=\"text-align: center; vertical-align: middle;\">FRID</td>\n    <td style=\"text-align: center; vertical-align: middle;\">FSVD</td>\n    <td style=\"text-align: center; vertical-align: middle;\">FPVD</td>\n    <td style=\"text-align: center; vertical-align: middle;\">JSD</td>\n    <td style=\"text-align: center; vertical-align: middle;\">MMD</td>\n    <td style=\"text-align: center; vertical-align: middle;\">CD</td>\n    <td style=\"text-align: center; vertical-align: middle;\">EMD</td>\n  </tr>\n</tbody>\n</table>\n<br/>\n\nTo standardize the evaluation of LiDAR generative models, we provide a **self-contained** and **mostly CUDA-accelerated** evaluation toolbox in the directory `./lidm/eval/`. 
It implements and integrates various evaluation metrics, including:\n* Perceptual metrics:\n  * Fréchet Range Image Distance (**FRID**)\n  * Fréchet Sparse Volume Distance (**FSVD**)\n  * Fréchet Point-based Volume Distance (**FPVD**)\n* Statistical metrics:\n  * Minimum Matching Distance (**MMD**)\n  * Jensen-Shannon Divergence (**JSD**)\n* Statistical pairwise metrics (for reconstruction only):\n  * Chamfer Distance (**CD**)\n  * Earth Mover's Distance (**EMD**)\n\n\n\nFor more details about setup and usage, please refer to the [Evaluation Toolbox README](./lidm/eval/README.md).\n\n\n## Model Zoo \n\nTo test different tasks below, please download the pretrained LiDM and its corresponding autoencoder:\n\n### Pretrained Autoencoders\n\n#### 64-beam (evaluated on KITTI-360 val):\n\n| Encoder  | rFRID(↓) | rFSVD(↓) | rFPVD(↓) | CD(↓) | EMD(↓) |                                                        Checkpoint                                                        |                        Rec.&nbsp;Results&nbsp;val<br/>(Point&nbsp;Cloud)                         |         Comment          |\n|:--------:|:--------:|:--------:|:--------:|:-----:|:------:|:------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------:|:------------------------:|\n| f_c2_p4  |   2.15   |   20.2   |   16.2   | 0.160 | 0.203  | [[Google&nbsp;Drive]](https://drive.google.com/file/d/1fUlQVqnShylps4-PnFCRD-sW-6v_rAB4/view?usp=drive_link)<br/>(205MB) | [[Video]](https://drive.google.com/file/d/1bIjRtrF3ljtcR-esjTL79uJisn4cNf2D/view?usp=drive_link) |                          |\n| f_c2_p4* |   2.06   |   20.3   |   15.7   | 0.092 | 0.176  | [[Google&nbsp;Drive]](https://drive.google.com/file/d/1A0zhQQXZTr8IfvpmsXrsG3lISC8KLkka/view?usp=drive_link)<br/>(205MB) | [[Video]](https://drive.google.com/file/d/1P_FbIOmYtS3kgutVAYXr7RShryO5Md7s/view?usp=drive_link) | *: w/o logarithm scaling |\n\n\n### Benchmark for Unconditional LiDAR Generation\n\n#### 64-beam (2k samples):\n\n|         Method         | Encoder  |  FRID(↓)  | FSVD(↓)  | FPVD(↓)  |  JSD(↓)   | MMD<br/>(10^-4,↓) |                                                        Checkpoint                                                        |                                Output&nbsp;LiDAR<br/>Point&nbsp;Clouds                                |\n|:----------------------:|:--------:|:---------:|:--------:|:--------:|:---------:|:-----------------:|:------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------:|\n|       LiDAR-GAN        |          |   1222    |  183.4   |  168.1   |   0.272   |       4.74        |                                                            -                                                             | [[2k samples]](https://drive.google.com/file/d/1lzOqXHxtO83HNMZ_7_dU9GMee_Zm3clO/view?usp=drive_link) |\n|       LiDAR-VAE        |          |   199.1   |  129.9   |  105.8   |   0.237   |       7.07        |                                                            -                                                             | [[2k samples]](https://drive.google.com/file/d/1_6KGATYfzLur9bt8vISLXwzEIsjbXq_k/view?usp=drive_link) |\n|      ProjectedGAN      |          |   149.7   |   44.7   |   33.4   |   0.188   |       2.88   
     |                                                            -                                                             | [[2k samples]](https://drive.google.com/file/d/1LzLhuKpBOIZ6F7SlPtMdYuCSE_8P1qwz/view?usp=drive_link) |\n|      UltraLiDAR§       |          |   370.0   |   72.1   |   66.6   |   0.747   |       17.12       |                                                            -                                                             | [[2k samples]](https://drive.google.com/file/d/17kft5S0nA_lnjECrK_aHzI5q1Erma_T7/view?usp=drive_link) |\n| LiDARGen&nbsp;(1160s)† |          |   129.0   |   39.2   |   33.4   | **0.188** |     **2.88**      |                                                            -                                                             | [[2k samples]](https://drive.google.com/file/d/1N5jTHjM8XnUYAMYkbsOipUQGhZjqMYDD/view?usp=drive_link) |\n|                        |          |           |          |          |           |                   |                                                                                                                          |                                                                                                       |\n|  LiDARGen&nbsp;(50s)†  |          |   2051    |  480.6   |  400.7   |   0.506   |       9.91        |                                                            -                                                             | [[2k samples]](https://drive.google.com/file/d/1qN4T0Jg8P4IJLdaR_7sBdjID3TtzLITy/view?usp=drive_link) |\n|    LiDM&nbsp;(50s)     | f_c2_p4  |   135.8   | **37.9** | **28.7** |   0.211   |       3.87        | [[Google&nbsp;Drive]](https://drive.google.com/file/d/1WKFwXi7xiXr2WCtM3ZX95CqlU-kOhhgC/view?usp=drive_link)<br/>(3.9GB) | [[2k samples]](https://drive.google.com/file/d/1mdWdzXHTW4IONgAYD44EvfUI8aokPfP_/view?usp=drive_link) |\n|    LiDM&nbsp;(50s)     | f_c2_p4* | **125.1** |   38.8   |   29.0   |   0.211   |       3.84        | [[Google&nbsp;Drive]](https://drive.google.com/file/d/1huCr1xQJ6ZRS2VYcJ99vDrCS8QhxVysQ/view?usp=drive_link)<br/>(3.9GB) | [[2k samples]](https://drive.google.com/file/d/18K-9ps9Ej-OACRKe7D30reY4l6CttN6T/view?usp=drive_link) |\n\nNOTE:\n1. Each method is evaluated with **2,000** randomly generated samples. \n2. †: samples generated by the officially released pretrained model in [LiDARGen github repo](https://github.com/vzyrianov/lidargen).\n3. §: samples borrowed from [UltraLiDAR implementation](https://github.com/myc634/UltraLiDAR_nusc_waymo).\n4. All above results are calculated from our [evaluation toolbox](#evaluation-toolbox). For more details, please refer to [Evaluation Toolbox README](./lidm/eval/README.md).\n5. Each .pcd file is a list of point clouds stored by `joblib` package. 
To load those files, use command `joblib.load(path)`.\n\nTo evaluate above methods (except _LiDM_) yourself, download our provided .pcd files in the **Output** column to directory `./models/baseline/kitti/[method]/`:\n\n```\nCUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -f models/baseline/kitti/[method]/samples.pcd --baseline --eval\n```\n\nTo evaluate LiDM through the given .pcd files:\n\n```\nCUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -f models/lidm/kitti/[method]/samples.pcd --eval\n```\n\n### Pretrained LiDMs for Other Tasks\n\n|                 Task                 | Encoder  |                         Dataset                         | FRID(↓) | FSVD(↓) |                                                        Checkpoint                                                        |                                                      Output                                                       |\n|:------------------------------------:|:--------:|:-------------------------------------------------------:|:-------:|:-------:|:------------------------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------------------------------------------------------:|\n| Semantic&nbsp;Map&nbsp;to&nbsp;LiDAR | f_c2_p4* | [SemanticKITTI](http://semantic-kitti.org/dataset.html) |  11.8   |  19.1   | [[Google&nbsp;Drive]](https://drive.google.com/file/d/1Mijx3cRPupsC2d4b2FwlbOXsojeHAXaO/view?usp=drive_link)<br/>(3.9GB) | [[log.tar.gz]](https://drive.google.com/file/d/1N2hMDO0boL5TPmnulApPspIpNnG5d9e5/view?usp=drive_link)<br/>(2.1GB) |\n|      Camera&nbsp;to&nbsp;LiDAR       | f_c2_p4* | [KITTI-360](https://www.cvlibs.net/datasets/kitti-360/) |  38.9   |  32.1   | [[Google&nbsp;Drive]](https://drive.google.com/file/d/1XzY7fSHQz72gWVFcmit-NlkoSwtMWbfz/view?usp=drive_link)<br/>(7.5GB) | [[log.tar.gz]](https://drive.google.com/file/d/1PZrMwiZiVvpYuuMKxMHpWEalt0b1lM17/view?usp=drive_link)<br/>(5.4GB) |\n|       Text&nbsp;to&nbsp;LiDAR        | f_c2_p4* |                       _zero-shot_                       |    -    |    -    |                                               From&nbsp;_Camera-to-LiDAR_                                                |                                                         -                                                         |\n\nNOTE:\n1. The output `log.tar.gz` contains input conditions (`.png`), generated range images (`.png`), generated point clouds (`.txt`), and a collection of all output point clouds (`.pcd`). 
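\n\nAs a quick reference for the `.pcd` files above (both the baseline samples and the point-cloud collection inside `log.tar.gz`), each file is a list of point clouds serialized with the `joblib` package. A minimal loading sketch, where the `[method]` placeholder follows the directory layout above and everything else is illustrative:\n\n```\nimport joblib\n\n# replace [method] with the folder name of the downloaded samples\npcd_path = 'models/baseline/kitti/[method]/samples.pcd'\n\npoint_clouds = joblib.load(pcd_path)  # one point cloud per generated sample\nprint(len(point_clouds))\n```\n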
\n\n\n### Study on Design of LiDAR Compression \n\nFor full details of our studies on the design of LiDAR Compression, please refer to [LiDAR Compression Design README](./DESIGN.md).\n\nTip: Download the video instead of watching it with the Google Drive's built-in video player provides a better visualization.\n\n#### Autoencoders (trained with 40k steps, evaluated on reconstruction):\n\n| Curvewise <br/> Factor | Patchwise <br/> Factor | Output <br/> Size | rFRID(↓) | rFSVD(↓) | #Params (M) |                                                                                          Visualization of Reconstruction (val)                                                                                          |\n|:----------------------:|:----------------------:|:-----------------:|:--------:|:--------:|:-----------:|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|\n|          N/A           |          N/A           |   Ground Truth    |    -     |    -     |      -      | [[Range&nbsp;Image]](https://drive.google.com/file/d/1wAtQSlVwF2jCpcL3zbXlk2lGUYzo1GBf/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1iHIB7Jw-WS0D_hXgQSOyyDyWCmPVR-6k/view?usp=sharing) |\n|                        |                        |                   |          |          |             |                                                                                                                                                                                                                         |\n|           4            |           1            |     64x256x2      |   0.2    |   12.9   |    9.52     | [[Range&nbsp;Image]](https://drive.google.com/file/d/1w7slbsRjlU4kb0kl6LyjX-JojJvoWQhG/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/17ewPXoRMeA_HsvEOznsvxy3d6iKk7hC2/view?usp=sharing) |\n|           8            |           1            |     64x128x3      |   0.9    |   21.2   |    10.76    | [[Range&nbsp;Image]](https://drive.google.com/file/d/17kukYFlJY40_cVBuWXMLHiMe7ls2OLNh/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/116IXDMgrWn6OHtyEYIo6aM1ARloX3BWF/view?usp=sharing) |\n|           16           |           1            |      64x64x4      |   2.8    |   31.1   |    12.43    | [[Range&nbsp;Image]](https://drive.google.com/file/d/12TKyoajTiU_hr1MAdK2PNveddorCshG4/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/18NCV7JoR3W1COaPH96a1ozbh8-58eT6n/view?usp=sharing) |\n|           32           |           1            |      64x32x8      |   16.4   |   49.0   |    13.72    | [[Range&nbsp;Image]](https://drive.google.com/file/d/1S2DPHfWAljKZrHJlPHIvxAPK2-rpdJ_J/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1yx8V4Qav7sCigcfSHrrrJQOFF-s2PryV/view?usp=sharing) |\n|                        |                        |                   |          |          |             |                                                                                                                                                                                                                         |\n|           1            |           2            |     32x512x2      |   1.5    |   25.0   |    2.87     | 
[[Range&nbsp;Image]](https://drive.google.com/file/d/1tPPD2Pnn_6ge3x2yoJXhkDhe0Wi5Qxhw/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1Xjg0ckVb208BFEgbv4VQtV-fVraEXUNC/view?usp=sharing) |\n|           1            |           4            |     16x256x4      |   0.6    |   15.4   |    12.45    | [[Range&nbsp;Image]](https://drive.google.com/file/d/1Q_ZTRKyDOAmP314p9B6Cip79mc-FJ2se/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1-t9zvSrov1OsF_WEIBqH3xkLzTJfxRBr/view?usp=sharing) |\n|           1            |           8            |     8x128x16      |   17.7   |   35.7   |    15.78    | [[Range&nbsp;Image]](https://drive.google.com/file/d/14hPy2utsaxwPxW5PA7gO7ak7f-lcd-X5/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1izj-_1hFkdaRCg2qUzkXByfCD-vBd_1M/view?usp=sharing) |\n|           1            |           16           |      4x64x64      |   37.1   |   68.7   |    16.25    | [[Range&nbsp;Image]](https://drive.google.com/file/d/1G7evMm3H6WvbHFhBlCa8wxPzwVC3q-8H/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1IdBrEpCIugvxVHyNOsNIg8Y8ZBWrHcWL/view?usp=sharing) |\n|                        |                        |                   |          |          |             |                                                                                                                                                                                                                         |\n|           2            |           2            |     32x256x3      |   0.4    |   11.2   |    13.09    | [[Range&nbsp;Image]](https://drive.google.com/file/d/1Ac4jVB6RkqMwV1fZcPGDyQhR3eE_Zj6C/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1pg2ezSmXiu3ensvj564JIy6CpB46uZm7/view?usp=sharing) |\n|           4            |           2            |     32x128x4      |   3.9    |   19.6   |    14.35    | [[Range&nbsp;Image]](https://drive.google.com/file/d/1yZGqe_DcDXew3JabnN4T1-P27ZlscHba/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1i_q6gVY4gMtzKYlhlMQ9QrRql73VX05j/view?usp=sharing) |\n|           8            |           2            |      32x64x8      |   8.0    |   25.3   |    16.06    | [[Range&nbsp;Image]](https://drive.google.com/file/d/1HVqFbIE1lgotDplc8x7_hJkSU5vLtbRN/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1jSYWZMmPelmfWpVa7V5f2Byr9vN2BKXo/view?usp=sharing) | \n|           16           |           2            |     32x32x16      |   21.5   |   54.2   |    17.44    | [[Range&nbsp;Image]](https://drive.google.com/file/d/1flAzjRLcl5Jtc_T--GbbomKWi42DvW9v/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1zfMzu6NFeLJhR1YPU28k7vPy1GX-80QT/view?usp=sharing) |\n|           2            |           4            |     16x128x8      |   2.5    |   16.9   |    15.07    | [[Range&nbsp;Image]](https://drive.google.com/file/d/1rm0sviRg4LfImgWVCi6THi3pHF4kFccH/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/1gPKB2zj44oLLEBuUXU8uiaXIcSWpyMOi/view?usp=sharing) |\n|           4            |           4            |     16x128x16     |   13.8   |   29.5   |    16.86    | [[Range&nbsp;Image]](https://drive.google.com/file/d/1ldMRXfUtFNBtjCCc-KYR311dQvCmn0EF/view?usp=sharing),&nbsp;[[Point&nbsp;Cloud]](https://drive.google.com/file/d/129WcZXW3b6e4UMxZ9x4XCR3BlaKw1Vec/view?usp=sharing) |\n\n\n## Unconditional LiDAR Generation\n\n<p 
align=\"center\">\n<img src=assets/uncond.jpeg width=\"512\"/>\n</p>\n\nTo run sampling on pretrained models (and to evaluate your results with flag \"--eval\"), first download our provided [pretrained autoencoders](#pretrained-autoencoders) to directory `./models/first_stage_models/kitti/[model_name]` and [pretrained LiDMs](#benchmark-for-unconditional-lidar-generation) to directory `./models/lidm/kitti/[model_name]`:\n\n```\nCUDA_VISIBLE_DEVICES=0 python scripts/sample.py -d kitti -r models/lidm/kitti/[model_name]/model.ckpt -n 2000 --eval\n```\n\n\n## Semantic-Map-to-LiDAR\n\n<p align=\"center\">\n<img src=assets/map2lidar.gif width=\"768\"/>\n</p>\n\nTo check the conditional results on a full sequence of semantic maps (sequence '08'), please refer to [this video](https://drive.google.com/file/d/1TtAROAmQVecZm2xDTEkfPGRP1Bbr8U6n/view?usp=drive_link).\n\nBefore running this task, first set up the [SemanticKITTI](http://www.semantic-kitti.org/) dataset to provide the semantic labels used as input.\n\nTo run sampling on pretrained models (and to evaluate your results with flag \"--eval\"):\n```\nCUDA_VISIBLE_DEVICES=0 python scripts/sample_cond.py -r models/lidm/kitti/sem2lidar/model.ckpt -d kitti [--eval]\n```\n\n## Camera-to-LiDAR\n\n<p align=\"center\">\n<img src=assets/cam2lidar.jpeg width=\"768\"/>\n</p>\n\nBefore running this task, first set up the [KITTI-360](https://www.cvlibs.net/datasets/kitti-360/) dataset to provide the camera images used as input.\n\nTo run sampling on pretrained models:\n```\nCUDA_VISIBLE_DEVICES=0 python scripts/sample_cond.py -r models/lidm/kitti/cam2lidar/model.ckpt -d kitti [--eval]\n```\n\n\n## Text-to-LiDAR\n\n<p align=\"center\">\n<img src=assets/text2lidar.jpeg width=\"768\"/>\n</p>\n\n\nTo run sampling on pretrained models:\n```\nCUDA_VISIBLE_DEVICES=0 python scripts/text2lidar.py -r models/lidm/kitti/cam2lidar/model.ckpt -d kitti -p \"an empty road with no object\"\n```\n\n## Training\n\n\nTo train your own LiDAR Diffusion Models, run the commands below (for example, to train both the autoencoder and the LiDM on four GPUs):\n```\n# train an autoencoder\npython main.py -b configs/autoencoder/kitti/autoencoder_c2_p4.yaml -t --gpus 0,1,2,3\n\n# train a LiDM\npython main.py -b configs/lidar_diffusion/kitti/uncond_c2_p4.yaml -t --gpus 0,1,2,3\n```\n\nTo debug the training process, add the flag `-d`:\n```\npython main.py -b path/to/your/config.yaml -t --gpus 0, -d\n```\n\nTo resume your training from an existing log directory or an existing checkpoint file, use the flag `-r`:\n```\n# using a log directory\npython main.py -b path/to/your/config.yaml -t --gpus 0, -r path/to/your/log\n\n# or, using a checkpoint \npython main.py -b path/to/your/config.yaml -t --gpus 0, -r path/to/your/ckpt/file\n```\n\n\n## Acknowledgement\n\n- Our codebase for the diffusion models builds heavily on [Latent Diffusion](https://github.com/CompVis/latent-diffusion).\n\n\n## Citation\n\nIf you find this project useful in your research, please consider citing:\n```\n@inproceedings{ran2024towards,\n    title={Towards Realistic Scene Generation with LiDAR Diffusion Models},\n    author={Ran, Haoxi and Guizilini, Vitor and Wang, Yue},\n    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n    year={2024}\n}\n```\n"
  },
  {
    "path": "configs/autoencoder/kitti/autoencoder_c2_p4.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 8\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: False  # False\n    lossconfig:\n      target: lidm.modules.losses.vqperceptual.VQGeoLPIPSWithDiscriminator\n      params:\n        disc_conditional: false\n        disc_start: 1\n        disc_in_channels: 1\n        disc_num_layers: 3\n        disc_weight: 0.6  # 0.6\n        disc_version: v0  # v1\n        codebook_weight: 1\n        curve_length: 1\n        geo_factor: 0\n        mask_factor: 0  # 0.0\n        perceptual_factor: 0\n        perceptual_type: rangenet_dec\n\n    ddconfig:\n      double_z: false\n      z_channels: 8\n      in_channels: 1\n      out_ch: 1\n      ch: 64\n      ch_mult: [1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[2,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: []\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      log_scale: true\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n"
  },
  {
    "path": "configs/lidar_diffusion/kitti/uncond_c2_p4.yaml",
    "content": "model:\n  base_learning_rate: 1.0e-06\n  target: lidm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0015\n    linear_end: 0.0195\n    num_timesteps_cond: 1\n    log_every_t: 200\n    timesteps: 1000\n    image_size: [16, 128]\n    channels: 8\n    monitor: val/loss_simple_ema\n    first_stage_key: image\n    unet_config:\n      target: lidm.modules.diffusion.openaimodel.UNetModel\n      params:\n        image_size: [16, 128]\n        in_channels: 8\n        out_channels: 8\n        model_channels: 256\n        attention_resolutions: [4, 2, 1]\n        num_res_blocks: 2\n        channel_mult: [1, 2, 4]\n        num_head_channels: 32\n        lib_name: lidm\n    first_stage_config:\n      target: lidm.models.autoencoder.VQModelInterface\n      params:\n        embed_dim: 8\n        n_embed: 16384\n        lib_name: lidm\n        use_mask: True  # False\n        ckpt_path: models/first_stage_models/kitti/f_c2_p4/model.ckpt\n        ddconfig:\n          double_z: false\n          z_channels: 8\n          in_channels: 1\n          out_ch: 2\n          ch: 64\n          ch_mult: [1,2,2,4]\n          strides: [[1,2],[2,2],[2,2]]\n          num_res_blocks: 2\n          attn_levels: []\n          dropout: 0.0\n        lossconfig:\n          target: torch.nn.Identity\n    cond_stage_config: \"__is_unconditional__\"\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 16\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      log_scale: true\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: false\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTI360Train\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTI360Validation\n      params:\n        condition_key: image\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 5000\n        max_images: 8\n        increase_log_steps: False\n\n  trainer:\n    benchmark: true"
  },
  {
    "path": "data/config/semantic-kitti.yaml",
    "content": "# This file is covered by the LICENSE file in the root of this project.\nlabels: \n  0 : \"unlabeled\"\n  1 : \"outlier\"\n  10: \"car\"\n  11: \"bicycle\"\n  13: \"bus\"\n  15: \"motorcycle\"\n  16: \"on-rails\"\n  18: \"truck\"\n  20: \"other-vehicle\"\n  30: \"person\"\n  31: \"bicyclist\"\n  32: \"motorcyclist\"\n  40: \"road\"\n  44: \"parking\"\n  48: \"sidewalk\"\n  49: \"other-ground\"\n  50: \"building\"\n  51: \"fence\"\n  52: \"other-structure\"\n  60: \"lane-marking\"\n  70: \"vegetation\"\n  71: \"trunk\"\n  72: \"terrain\"\n  80: \"pole\"\n  81: \"traffic-sign\"\n  99: \"other-object\"\n  252: \"moving-car\"\n  253: \"moving-bicyclist\"\n  254: \"moving-person\"\n  255: \"moving-motorcyclist\"\n  256: \"moving-on-rails\"\n  257: \"moving-bus\"\n  258: \"moving-truck\"\n  259: \"moving-other-vehicle\"\ncolor_map: # bgr\n  0 : [0, 0, 0]\n  1 : [0, 0, 255]\n  10: [245, 150, 100]\n  11: [245, 230, 100]\n  13: [250, 80, 100]\n  15: [150, 60, 30]\n  16: [255, 0, 0]\n  18: [180, 30, 80]\n  20: [255, 0, 0]\n  30: [30, 30, 255]\n  31: [200, 40, 255]\n  32: [90, 30, 150]\n  40: [255, 0, 255]\n  44: [255, 150, 255]\n  48: [75, 0, 75]\n  49: [75, 0, 175]\n  50: [0, 200, 255]\n  51: [50, 120, 255]\n  52: [0, 150, 255]\n  60: [170, 255, 150]\n  70: [0, 175, 0]\n  71: [0, 60, 135]\n  72: [80, 240, 150]\n  80: [150, 240, 255]\n  81: [0, 0, 255]\n  99: [255, 255, 50]\n  252: [245, 150, 100]\n  256: [255, 0, 0]\n  253: [200, 40, 255]\n  254: [30, 30, 255]\n  255: [90, 30, 150]\n  257: [250, 80, 100]\n  258: [180, 30, 80]\n  259: [255, 0, 0]\ncontent: # as a ratio with the total number of points\n  0: 0.018889854628292943\n  1: 0.0002937197336781505\n  10: 0.040818519255974316\n  11: 0.00016609538710764618\n  13: 2.7879693665067774e-05\n  15: 0.00039838616015114444\n  16: 0.0\n  18: 0.0020633612104619787\n  20: 0.0016218197275284021\n  30: 0.00017698551338515307\n  31: 1.1065903904919655e-08\n  32: 5.532951952459828e-09\n  40: 0.1987493871255525\n  44: 0.014717169549888214\n  48: 0.14392298360372\n  49: 0.0039048553037472045\n  50: 0.1326861944777486\n  51: 0.0723592229456223\n  52: 0.002395131480328884\n  60: 4.7084144280367186e-05\n  70: 0.26681502148037506\n  71: 0.006035012012626033\n  72: 0.07814222006271769\n  80: 0.002855498193863172\n  81: 0.0006155958086189918\n  99: 0.009923127583046915\n  252: 0.001789309418528068\n  253: 0.00012709999297008662\n  254: 0.00016059776092534436\n  255: 3.745553104802113e-05\n  256: 0.0\n  257: 0.00011351574470342043\n  258: 0.00010157861367183268\n  259: 4.3840131989471124e-05\n# classes that are indistinguishable from single scan or inconsistent in\n# ground truth are mapped to their closest equivalent\nlearning_map:\n  0 : 0     # \"unlabeled\"\n  1 : 0     # \"outlier\" mapped to \"unlabeled\" --------------------------mapped\n  10: 1     # \"car\"\n  11: 2     # \"bicycle\"\n  13: 5     # \"bus\" mapped to \"other-vehicle\" --------------------------mapped\n  15: 3     # \"motorcycle\"\n  16: 5     # \"on-rails\" mapped to \"other-vehicle\" ---------------------mapped\n  18: 4     # \"truck\"\n  20: 5     # \"other-vehicle\"\n  30: 6     # \"person\"\n  31: 7     # \"bicyclist\"\n  32: 8     # \"motorcyclist\"\n  40: 9     # \"road\"\n  44: 10    # \"parking\"\n  48: 11    # \"sidewalk\"\n  49: 12    # \"other-ground\"\n  50: 13    # \"building\"\n  51: 14    # \"fence\"\n  52: 0     # \"other-structure\" mapped to \"unlabeled\" ------------------mapped\n  60: 9     # \"lane-marking\" to \"road\" 
---------------------------------mapped\n  70: 15    # \"vegetation\"\n  71: 16    # \"trunk\"\n  72: 17    # \"terrain\"\n  80: 18    # \"pole\"\n  81: 19    # \"traffic-sign\"\n  99: 0     # \"other-object\" to \"unlabeled\" ----------------------------mapped\n  252: 1    # \"moving-car\" to \"car\" ------------------------------------mapped\n  253: 7    # \"moving-bicyclist\" to \"bicyclist\" ------------------------mapped\n  254: 6    # \"moving-person\" to \"person\" ------------------------------mapped\n  255: 8    # \"moving-motorcyclist\" to \"motorcyclist\" ------------------mapped\n  256: 5    # \"moving-on-rails\" mapped to \"other-vehicle\" --------------mapped\n  257: 5    # \"moving-bus\" mapped to \"other-vehicle\" -------------------mapped\n  258: 4    # \"moving-truck\" to \"truck\" --------------------------------mapped\n  259: 5    # \"moving-other\"-vehicle to \"other-vehicle\" ----------------mapped\nlearning_map_inv: # inverse of previous map\n  0: 0      # \"unlabeled\", and others ignored\n  1: 10     # \"car\"\n  2: 11     # \"bicycle\"\n  3: 15     # \"motorcycle\"\n  4: 18     # \"truck\"\n  5: 20     # \"other-vehicle\"\n  6: 30     # \"person\"\n  7: 31     # \"bicyclist\"\n  8: 32     # \"motorcyclist\"\n  9: 40     # \"road\"\n  10: 44    # \"parking\"\n  11: 48    # \"sidewalk\"\n  12: 49    # \"other-ground\"\n  13: 50    # \"building\"\n  14: 51    # \"fence\"\n  15: 70    # \"vegetation\"\n  16: 71    # \"trunk\"\n  17: 72    # \"terrain\"\n  18: 80    # \"pole\"\n  19: 81    # \"traffic-sign\"\nlearning_ignore: # Ignore classes\n  0: True      # \"unlabeled\", and others ignored\n  1: False     # \"car\"\n  2: False     # \"bicycle\"\n  3: False     # \"motorcycle\"\n  4: False     # \"truck\"\n  5: False     # \"other-vehicle\"\n  6: False     # \"person\"\n  7: False     # \"bicyclist\"\n  8: False     # \"motorcyclist\"\n  9: False     # \"road\"\n  10: False    # \"parking\"\n  11: False    # \"sidewalk\"\n  12: False    # \"other-ground\"\n  13: False    # \"building\"\n  14: False    # \"fence\"\n  15: False    # \"vegetation\"\n  16: False    # \"trunk\"\n  17: False    # \"terrain\"\n  18: False    # \"pole\"\n  19: False    # \"traffic-sign\"\nsplit: # sequence numbers\n  train:\n    - 0\n    - 1\n    - 2\n    - 3\n    - 4\n    - 5\n    - 6\n    - 7\n    - 9\n    - 10\n  valid:\n    - 8\n  test:\n    - 11\n    - 12\n    - 13\n    - 14\n    - 15\n    - 16\n    - 17\n    - 18\n    - 19\n    - 20\n    - 21\n"
  },
  {
    "path": "init/create_env.sh",
    "content": "#!/usr/bin/bash\n\n# install rust compiler\ncurl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y\nexport PATH=\"$HOME/.cargo/bin:$PATH\"\n\n# create conda environment\nconda create -n lidar_diffusion python=3.10.11 -y\nconda activate lidar_diffusion\n\n# install dependencies\npip install --upgrade pip\npip install torchmetrics==0.5.0 pytorch-lightning==1.4.2 omegaconf==2.1.1 einops==0.3.0 transformers==4.36.2 imageio==2.9.0 imageio-ffmpeg==0.4.2 opencv-python kornia==0.7.0 wandb more_itertools\npip install gdown\npip install -e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers\npip install -e git+https://github.com/openai/CLIP.git@main#egg=clip\n\n# install torchsparse (optional)\n#apt-get install libsparsehash-dev\n#pip install git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0\n\nmkdir -p dataset/\n"
  },
  {
    "path": "lidm/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/data/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/data/annotated_dataset.py",
    "content": "from pathlib import Path\nfrom typing import Optional, List, Dict, Union, Any\nimport warnings\n\nfrom torch.utils.data import Dataset\n\nfrom .conditional_builder.objects_bbox import ObjectsBoundingBoxConditionalBuilder\nfrom .conditional_builder.objects_center_points import ObjectsCenterPointsConditionalBuilder\n\n\nclass Annotated3DObjectsDataset(Dataset):\n    def __init__(self, min_objects_per_image: int,\n                 max_objects_per_image: int, no_tokens: int, num_beams: int, cats: List[str],\n                 cat_blacklist: Optional[List[str]] = None, **kwargs):\n        self.min_objects_per_image = min_objects_per_image\n        self.max_objects_per_image = max_objects_per_image\n        self.no_tokens = no_tokens\n        self.num_beams = num_beams\n\n        self.categories = [c for c in cats if c not in cat_blacklist] if cat_blacklist is not None else cats\n        self._conditional_builders = None\n\n    @property\n    def no_classes(self) -> int:\n        return len(self.categories)\n\n    @property\n    def conditional_builders(self) -> ObjectsCenterPointsConditionalBuilder:\n        # cannot set this up in init because no_classes is only known after loading data in init of superclass\n        if self._conditional_builders is None:\n            self._conditional_builders = {\n                'center': ObjectsCenterPointsConditionalBuilder(\n                    self.no_classes,\n                    self.max_objects_per_image,\n                    self.no_tokens,\n                    self.num_beams\n                ),\n                'bbox': ObjectsBoundingBoxConditionalBuilder(\n                    self.no_classes,\n                    self.max_objects_per_image,\n                    self.no_tokens,\n                    self.num_beams\n                )\n            }\n        return self._conditional_builders\n\n    def get_textual_label_for_category_id(self, category_id: int) -> str:\n        return self.categories[category_id]\n"
  },
  {
    "path": "lidm/data/base.py",
    "content": "import pdb\nfrom abc import abstractmethod\nfrom functools import partial\n\nimport PIL\nimport numpy as np\nfrom PIL import Image\n\nimport torchvision.transforms.functional as TF\nfrom torch.utils.data import Dataset, IterableDataset\n\nfrom ..utils.aug_utils import get_lidar_transform, get_camera_transform, get_anno_transform\n\n\nclass DatasetBase(Dataset):\n    def __init__(self, data_root, split, dataset_config, aug_config, return_pcd=False, condition_key=None,\n                 scale_factors=None, degradation=None, **kwargs):\n        self.data_root = data_root\n        self.split = split\n        self.data = []\n        self.aug_config = aug_config\n\n        self.img_size = dataset_config.size\n        self.fov = dataset_config.fov\n        self.depth_range = dataset_config.depth_range\n        self.filtered_map_cats = dataset_config.filtered_map_cats\n        self.depth_scale = dataset_config.depth_scale\n        self.log_scale = dataset_config.log_scale\n\n        if self.log_scale:\n            self.depth_thresh = (np.log2(1./255. + 1) / self.depth_scale) * 2. - 1 + 1e-6\n        else:\n            self.depth_thresh = (1./255. / self.depth_scale) * 2. - 1 + 1e-6\n        self.return_pcd = return_pcd\n\n        if degradation is not None and scale_factors is not None:\n            scaled_img_size = (int(self.img_size[0] / scale_factors[0]), int(self.img_size[1] / scale_factors[1]))\n            degradation_fn = {\n                \"pil_nearest\": PIL.Image.NEAREST,\n                \"pil_bilinear\": PIL.Image.BILINEAR,\n                \"pil_bicubic\": PIL.Image.BICUBIC,\n                \"pil_box\": PIL.Image.BOX,\n                \"pil_hamming\": PIL.Image.HAMMING,\n                \"pil_lanczos\": PIL.Image.LANCZOS,\n            }[degradation]\n            self.degradation_transform = partial(TF.resize, size=scaled_img_size, interpolation=degradation_fn)\n        else:\n            self.degradation_transform = None\n        self.condition_key = condition_key\n\n        self.lidar_transform = get_lidar_transform(aug_config, split)\n        self.anno_transform = get_anno_transform(aug_config, split) if condition_key in ['bbox', 'center'] else None\n        self.view_transform = get_camera_transform(aug_config, split) if condition_key in ['camera'] else None\n\n        self.prepare_data()\n\n    def prepare_data(self):\n        raise NotImplementedError\n\n    def process_scan(self, range_img):\n        range_img = np.where(range_img < 0, 0, range_img)\n\n        if self.log_scale:\n            # log scale\n            range_img = np.log2(range_img + 0.0001 + 1)\n\n        range_img = range_img / self.depth_scale\n        range_img = range_img * 2. 
- 1.\n\n        range_img = np.clip(range_img, -1, 1)\n        range_img = np.expand_dims(range_img, axis=0)\n\n        # mask\n        range_mask = np.ones_like(range_img)\n        range_mask[range_img < self.depth_thresh] = -1\n\n        return range_img, range_mask\n\n    @staticmethod\n    def load_lidar_sweep(*args, **kwargs):\n        raise NotImplementedError\n\n    @staticmethod\n    def load_semantic_map(*args, **kwargs):\n        raise NotImplementedError\n\n    @staticmethod\n    def load_camera(*args, **kwargs):\n        raise NotImplementedError\n\n    @staticmethod\n    def load_annotation(*args, **kwargs):\n        raise NotImplementedError\n\n    def __len__(self):\n        return len(self.data)\n\n    def __getitem__(self, idx):\n        example = dict()\n        return example\n\n\nclass Txt2ImgIterableBaseDataset(IterableDataset):\n    \"\"\"\n    Define an interface to make the IterableDatasets for text2img data chainable\n    \"\"\"\n    def __init__(self, num_records=0, valid_ids=None, size=256):\n        super().__init__()\n        self.num_records = num_records\n        self.valid_ids = valid_ids\n        self.sample_ids = valid_ids\n        self.size = size\n\n        print(f'{self.__class__.__name__} dataset contains {self.__len__()} examples.')\n\n    def __len__(self):\n        return self.num_records\n\n    @abstractmethod\n    def __iter__(self):\n        pass"
  },
  {
    "path": "lidm/data/conditional_builder/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/data/conditional_builder/objects_bbox.py",
    "content": "from itertools import cycle\nfrom typing import List, Tuple, Callable, Optional\n\nfrom PIL import Image as pil_image, ImageDraw as pil_img_draw, ImageFont\nfrom more_itertools.recipes import grouper\nfrom torch import LongTensor, Tensor\n\nfrom ..helper_types import BoundingBox, Annotation\nfrom .objects_center_points import ObjectsCenterPointsConditionalBuilder, convert_pil_to_tensor\nfrom .utils import COLOR_PALETTE, WHITE, GRAY_75, BLACK, additional_parameters_string, \\\n    pad_list, get_plot_font_size, absolute_bbox\n\n\nclass ObjectsBoundingBoxConditionalBuilder(ObjectsCenterPointsConditionalBuilder):\n    @property\n    def object_descriptor_length(self) -> int:\n        return 3  # 3/5: object_representation (1) + corners (2/4)\n\n    def _make_object_descriptors(self, annotations: List[Annotation]) -> List[Tuple[int, ...]]:\n        object_tuples = [\n            (self.object_representation(ann), *self.token_pair_from_bbox(ann.bbox))\n            for ann in annotations\n        ]\n        object_tuples = pad_list(object_tuples, self.empty_tuple, self.no_max_objects)\n        return object_tuples\n\n    def inverse_build(self, conditional: LongTensor) -> Tuple[List[Tuple[int, BoundingBox]], Optional[BoundingBox]]:\n        conditional_list = conditional.tolist()\n        object_triples = grouper(conditional_list, 3)\n        assert conditional.shape[0] == self.embedding_dim\n        return [(object_triple[0], self.bbox_from_token_pair(object_triple[1], object_triple[2])) for object_triple in object_triples if object_triple[0] != self.none], None\n\n    def plot(self, conditional: LongTensor, label_for_category_no: Callable[[int], str], figure_size: Tuple[int, int],\n             line_width: int = 3, font_size: Optional[int] = None) -> Tensor:\n        plot = pil_image.new('RGB', figure_size, WHITE)\n        draw = pil_img_draw.Draw(plot)\n        # font = ImageFont.truetype(\n        #     \"/usr/share/fonts/truetype/lato/Lato-Regular.ttf\",\n        #     size=get_plot_font_size(font_size, figure_size)\n        # )\n        font = ImageFont.load_default()\n        width, height = plot.size\n        description, crop_coordinates = self.inverse_build(conditional)\n        for (representation, bbox), color in zip(description, cycle(COLOR_PALETTE)):\n            annotation = self.representation_to_annotation(representation)\n            # class_label = label_for_category_no(annotation.category_id) + ' ' + additional_parameters_string(annotation)\n            class_label = label_for_category_no(annotation.category_id)\n            bbox = absolute_bbox(bbox, width, height)\n            draw.rectangle(bbox, outline=color, width=line_width)\n            draw.text((bbox[0] + line_width, bbox[1] + line_width), class_label, anchor='la', fill=BLACK, font=font)\n        if crop_coordinates is not None:\n            draw.rectangle(absolute_bbox(crop_coordinates, width, height), outline=GRAY_75, width=line_width)\n        return convert_pil_to_tensor(plot) / 127.5 - 1.\n"
  },
  {
    "path": "lidm/data/conditional_builder/objects_center_points.py",
    "content": "import math\nimport random\nimport warnings\nfrom itertools import cycle\nfrom typing import List, Optional, Tuple, Callable\n\nfrom PIL import Image as pil_image, ImageDraw as pil_img_draw, ImageFont\nfrom more_itertools.recipes import grouper\nfrom .utils import COLOR_PALETTE, WHITE, GRAY_75, BLACK, additional_parameters_string, pad_list, get_circle_size, \\\n    get_plot_font_size, absolute_bbox\nfrom ..helper_types import BoundingBox, Annotation, Image\nfrom torch import LongTensor, Tensor\nfrom torchvision.transforms import PILToTensor\n\n\npil_to_tensor = PILToTensor()\n\n\ndef convert_pil_to_tensor(image: Image) -> Tensor:\n    with warnings.catch_warnings():\n        # to filter PyTorch UserWarning as described here: https://github.com/pytorch/vision/issues/2194\n        warnings.simplefilter(\"ignore\")\n        return pil_to_tensor(image)\n\n\nclass ObjectsCenterPointsConditionalBuilder:\n    def __init__(self, no_object_classes: int, no_max_objects: int, no_tokens: int, num_beams: int):\n        self.no_object_classes = no_object_classes\n        self.no_max_objects = no_max_objects\n        self.no_tokens = no_tokens\n        # self.no_sections = int(math.sqrt(self.no_tokens))\n        self.no_sections = (self.no_tokens // num_beams, num_beams)  # (width, height)\n\n    @property\n    def none(self) -> int:\n        return self.no_tokens - 1\n\n    @property\n    def object_descriptor_length(self) -> int:\n        return 2\n\n    @property\n    def empty_tuple(self) -> Tuple:\n        return (self.none,) * self.object_descriptor_length\n\n    @property\n    def embedding_dim(self) -> int:\n        return self.no_max_objects * self.object_descriptor_length\n\n    def tokenize_coordinates(self, x: float, y: float) -> int:\n        \"\"\"\n        Express 2d coordinates with one number.\n        Example: assume self.no_tokens = 16, then no_sections = 4:\n        0  0  0  0\n        0  0  #  0\n        0  0  0  0\n        0  0  0  x\n        Then the # position corresponds to token 6, the x position to token 15.\n        @param x: float in [0, 1]\n        @param y: float in [0, 1]\n        @return: discrete tokenized coordinate\n        \"\"\"\n        x_discrete = int(round(x * (self.no_sections[0] - 1)))\n        y_discrete = int(round(y * (self.no_sections[1] - 1)))\n        return y_discrete * self.no_sections[0] + x_discrete\n\n    def coordinates_from_token(self, token: int) -> (float, float):\n        x = token % self.no_sections[0]\n        y = token // self.no_sections[0]\n        return x / (self.no_sections[0] - 1), y / (self.no_sections[1] - 1)\n\n    def bbox_from_token_pair(self, token1: int, token2: int) -> BoundingBox:\n        x0, y0 = self.coordinates_from_token(token1)\n        x1, y1 = self.coordinates_from_token(token2)\n        # x2, y2 = self.coordinates_from_token(token3)\n        # x3, y3 = self.coordinates_from_token(token4)\n        return x0, y0, x1, y1\n\n    def token_pair_from_bbox(self, bbox: BoundingBox) -> Tuple:\n        # return self.tokenize_coordinates(bbox[0], bbox[1]), self.tokenize_coordinates(bbox[2], bbox[3]), self.tokenize_coordinates(bbox[4], bbox[5]), self.tokenize_coordinates(bbox[6], bbox[7])\n        return self.tokenize_coordinates(bbox[0], bbox[1]), self.tokenize_coordinates(bbox[4], bbox[5])\n\n    def inverse_build(self, conditional: LongTensor) \\\n            -> Tuple[List[Tuple[int, Tuple[float, float]]], Optional[BoundingBox]]:\n        conditional_list = conditional.tolist()\n        table_of_content = 
grouper(conditional_list, self.object_descriptor_length)\n        assert conditional.shape[0] == self.embedding_dim\n        return [\n            (object_tuple[0], self.coordinates_from_token(object_tuple[1]))\n            for object_tuple in table_of_content if object_tuple[0] != self.none\n        ], None\n\n    def plot(self, conditional: LongTensor, label_for_category_no: Callable[[int], str], figure_size: Tuple[int, int],\n             line_width: int = 3, font_size: Optional[int] = None) -> Tensor:\n        plot = pil_image.new('RGB', figure_size, WHITE)\n        draw = pil_img_draw.Draw(plot)\n        circle_size = get_circle_size(figure_size)\n        # font = ImageFont.truetype('/usr/share/fonts/truetype/lato/Lato-Regular.ttf',\n        #                           size=get_plot_font_size(font_size, figure_size))\n        font = ImageFont.load_default()\n        width, height = plot.size\n        description, crop_coordinates = self.inverse_build(conditional)\n        for (representation, (x, y)), color in zip(description, cycle(COLOR_PALETTE)):\n            x_abs, y_abs = x * width, y * height\n            ann = self.representation_to_annotation(representation)\n            label = label_for_category_no(ann.category_id) + ' ' + additional_parameters_string(ann)\n            ellipse_bbox = [x_abs - circle_size, y_abs - circle_size, x_abs + circle_size, y_abs + circle_size]\n            draw.ellipse(ellipse_bbox, fill=color, width=0)\n            draw.text((x_abs, y_abs), label, anchor='md', fill=BLACK, font=font)\n        if crop_coordinates is not None:\n            draw.rectangle(absolute_bbox(crop_coordinates, width, height), outline=GRAY_75, width=line_width)\n        return convert_pil_to_tensor(plot) / 127.5 - 1.\n\n    def object_representation(self, annotation: Annotation) -> int:\n        return annotation.category_id\n\n    def representation_to_annotation(self, representation: int) -> Annotation:\n        category_id = representation % self.no_object_classes\n        # noinspection PyTypeChecker\n        return Annotation(\n            bbox=None,\n            category_id=category_id,\n        )\n\n    def _make_object_descriptors(self, annotations: List[Annotation]) -> List[Tuple[int, ...]]:\n        object_tuples = [\n            (self.object_representation(a),\n             self.tokenize_coordinates(a.center[0], a.center[1]))\n            for a in annotations\n        ]\n        empty_tuple = (self.none, self.none)\n        object_tuples = pad_list(object_tuples, empty_tuple, self.no_max_objects)\n        return object_tuples\n\n    def build(self, annotations: List[Annotation]) \\\n            -> LongTensor:\n        if len(annotations) == 0:\n            warnings.warn('Did not receive any annotations.')\n\n        random.shuffle(annotations)\n        if len(annotations) > self.no_max_objects:\n            warnings.warn('Received more annotations than allowed.')\n            annotations = annotations[:self.no_max_objects]\n\n        object_tuples = self._make_object_descriptors(annotations)\n        flattened = [token for tuple_ in object_tuples for token in tuple_]\n        assert len(flattened) == self.embedding_dim\n        assert all(0 <= value < self.no_tokens for value in flattened)\n\n        return LongTensor(flattened)\n"
  },
  {
    "path": "lidm/data/conditional_builder/utils.py",
    "content": "import importlib\nfrom typing import List, Any, Tuple, Optional\n\nimport numpy as np\nfrom ..helper_types import BoundingBox, Annotation\n\n# source: seaborn, color palette tab10\nCOLOR_PALETTE = [(30, 118, 179), (255, 126, 13), (43, 159, 43), (213, 38, 39), (147, 102, 188),\n                 (139, 85, 74), (226, 118, 193), (126, 126, 126), (187, 188, 33), (22, 189, 206)]\nBLACK = (0, 0, 0)\nGRAY_75 = (63, 63, 63)\nGRAY_50 = (127, 127, 127)\nGRAY_25 = (191, 191, 191)\nWHITE = (255, 255, 255)\nFULL_CROP = (0., 0., 1., 1.)\n\n\ndef corners_3d_to_2d(corners3d):\n    \"\"\"\n    Args:\n        corners3d: (N, 8, 2)\n    Returns:\n        corners2d: (N, 4, 2)\n    \"\"\"\n    # select pairs to reorganize\n    mask_0_3 = corners3d[:, 0:4, 0].argmax(1) // 2 != 0\n    mask_4_7 = corners3d[:, 4:8, 0].argmin(1) // 2 != 0\n\n    # reorganize corners in the order of (bottom-right, bottom-left)\n    corners3d[mask_0_3, 0:4] = corners3d[mask_0_3][:, [2, 3, 0, 1]]\n    # reorganize corners in the order of (top-left, top-right)\n    corners3d[mask_4_7, 4:8] = corners3d[mask_4_7][:, [2, 3, 0, 1]]\n\n    # calculate corners in order\n    bot_r = np.stack([corners3d[:, 0:2, 0].max(1), corners3d[:, 0:2, 1].min(1)], axis=-1)\n    bot_l = np.stack([corners3d[:, 2:4, 0].min(1), corners3d[:, 2:4, 1].min(1)], axis=-1)\n    top_l = np.stack([corners3d[:, 4:6, 0].min(1), corners3d[:, 4:6, 1].max(1)], axis=-1)\n    top_r = np.stack([corners3d[:, 6:8, 0].max(1), corners3d[:, 6:8, 1].max(1)], axis=-1)\n\n    return np.stack([bot_r, bot_l, top_l, top_r], axis=1)\n\n\ndef rotate_points_along_z(points, angle):\n    \"\"\"\n    Args:\n        points: (N, 3 + C)\n        angle: angle along z-axis, angle increases x ==> y\n    Returns:\n\n    \"\"\"\n    cosa = np.cos(angle)\n    sina = np.sin(angle)\n    zeros = np.zeros(points.shape[0])\n    ones = np.ones(points.shape[0])\n    rot_matrix = np.stack((\n        cosa,  sina, zeros,\n        -sina, cosa, zeros,\n        zeros, zeros, ones)).reshape((-1, 3, 3))\n    points_rot = np.matmul(points[:, :, 0:3], rot_matrix)\n    points_rot = np.concatenate((points_rot, points[:, :, 3:]), axis=-1)\n    return points_rot\n\n\ndef boxes_to_corners_3d(boxes3d):\n    \"\"\"\n        7 -------- 4\n       /|         /|\n      6 -------- 5 .\n      | |        | |\n      . 
3 -------- 0\n      |/         |/\n      2 -------- 1\n    Args:\n        boxes3d:  (N, 7) [x, y, z, dx, dy, dz, heading], (x, y, z) is the box center\n\n    Returns:\n        corners3d: (N, 8, 3)\n    \"\"\"\n    template = np.array(\n        [[1, 1, -1], [1, -1, -1], [-1, -1, -1], [-1, 1, -1],\n        [1, 1, 1], [1, -1, 1], [-1, -1, 1], [-1, 1, 1]],\n    ) / 2\n\n    # corners3d = boxes3d[:, None, 3:6].repeat(1, 8, 1) * template[None, :, :]\n    corners3d = np.tile(boxes3d[:, None, 3:6], (1, 8, 1)) * template[None, :, :]\n    corners3d = rotate_points_along_z(corners3d.reshape((-1, 8, 3)), boxes3d[:, 6]).reshape((-1, 8, 3))\n    corners3d += boxes3d[:, None, 0:3]\n\n    return corners3d\n\n\ndef intersection_area(rectangle1: BoundingBox, rectangle2: BoundingBox) -> float:\n    \"\"\"\n    Give intersection area of two rectangles.\n    @param rectangle1: (x0, y0, w, h) of first rectangle\n    @param rectangle2: (x0, y0, w, h) of second rectangle\n    \"\"\"\n    rectangle1 = rectangle1[0], rectangle1[1], rectangle1[0] + rectangle1[2], rectangle1[1] + rectangle1[3]\n    rectangle2 = rectangle2[0], rectangle2[1], rectangle2[0] + rectangle2[2], rectangle2[1] + rectangle2[3]\n    x_overlap = max(0., min(rectangle1[2], rectangle2[2]) - max(rectangle1[0], rectangle2[0]))\n    y_overlap = max(0., min(rectangle1[3], rectangle2[3]) - max(rectangle1[1], rectangle2[1]))\n    return x_overlap * y_overlap\n\n\ndef horizontally_flip_bbox(bbox: BoundingBox) -> BoundingBox:\n    return 1 - (bbox[0] + bbox[2]), bbox[1], bbox[2], bbox[3]\n\n\ndef absolute_bbox(relative_bbox: BoundingBox, width: int, height: int) -> Tuple[int, int, int, int]:\n    bbox = relative_bbox\n    # bbox = bbox[0] * width, bbox[1] * height, (bbox[0] + bbox[2]) * width, (bbox[1] + bbox[3]) * height\n    bbox = bbox[0] * width, bbox[1] * height, bbox[2] * width, bbox[3] * height\n    # return int(bbox[0]), int(bbox[1]), int(bbox[2]), int(bbox[3])\n    x1, x2 = min(int(bbox[2]), int(bbox[0])), max(int(bbox[2]), int(bbox[0]))\n    y1, y2 = min(int(bbox[3]), int(bbox[1])), max(int(bbox[3]), int(bbox[1]))\n    if x1 == x2:\n        x2 += 1\n    if y1 == y2:\n        y2 += 1\n    return x1, y1, x2, y2\n\n\ndef pad_list(list_: List, pad_element: Any, pad_to_length: int) -> List:\n    return list_ + [pad_element for _ in range(pad_to_length - len(list_))]\n\n\ndef rescale_annotations(annotations: List[Annotation], crop_coordinates: BoundingBox, flip: bool) -> \\\n        List[Annotation]:\n    def clamp(x: float):\n        return max(min(x, 1.), 0.)\n\n    def rescale_bbox(bbox: BoundingBox) -> BoundingBox:\n        x0 = clamp((bbox[0] - crop_coordinates[0]) / crop_coordinates[2])\n        y0 = clamp((bbox[1] - crop_coordinates[1]) / crop_coordinates[3])\n        w = min(bbox[2] / crop_coordinates[2], 1 - x0)\n        h = min(bbox[3] / crop_coordinates[3], 1 - y0)\n        if flip:\n            x0 = 1 - (x0 + w)\n        return x0, y0, w, h\n\n    return [a._replace(bbox=rescale_bbox(a.bbox)) for a in annotations]\n\n\ndef filter_annotations(annotations: List[Annotation], crop_coordinates: BoundingBox) -> List:\n    return [a for a in annotations if intersection_area(a.bbox, crop_coordinates) > 0.0]\n\n\ndef additional_parameters_string(annotation: Annotation, short: bool = True) -> str:\n    sl = slice(1) if short else slice(None)\n    string = ''\n    if not (annotation.is_group_of or annotation.is_occluded or annotation.is_depiction or annotation.is_inside):\n        return string\n    if annotation.is_group_of:\n        string += 
'group'[sl] + ','\n    if annotation.is_occluded:\n        string += 'occluded'[sl] + ','\n    if annotation.is_depiction:\n        string += 'depiction'[sl] + ','\n    if annotation.is_inside:\n        string += 'inside'[sl]\n    return '(' + string.strip(\",\") + ')'\n\n\ndef get_plot_font_size(font_size: Optional[int], figure_size: Tuple[int, int]) -> int:\n    if font_size is None:\n        font_size = 10\n        if max(figure_size) >= 256:\n            font_size = 12\n        if max(figure_size) >= 512:\n            font_size = 15\n    return font_size\n\n\ndef get_circle_size(figure_size: Tuple[int, int]) -> int:\n    circle_size = 2\n    if max(figure_size) >= 256:\n        circle_size = 3\n    if max(figure_size) >= 512:\n        circle_size = 4\n    return circle_size\n\n\ndef load_object_from_string(object_string: str) -> Any:\n    \"\"\"\n    Source: https://stackoverflow.com/a/10773699\n    \"\"\"\n    module_name, class_name = object_string.rsplit(\".\", 1)\n    return getattr(importlib.import_module(module_name), class_name)\n"
  },
  {
    "path": "lidm/data/helper_types.py",
    "content": "from typing import Tuple, Optional, NamedTuple, Union, List\nfrom PIL.Image import Image as pil_image\nfrom torch import Tensor\n\ntry:\n  from typing import Literal\nexcept ImportError:\n  from typing_extensions import Literal\n\nImage = Union[Tensor, pil_image]\n# BoundingBox = Tuple[float, float, float, float]  # x0, y0, w, h | x0, y0, x1, y1\n# BoundingBox3D = Tuple[float, float, float, float, float, float]  # x0, y0, z0, l, w, h\nBoundingBox = Tuple[float, float, float, float]  # corner coordinates (x,y) in the order of bottom-right -> bottom-left -> top-left -> top-right\nCenter = Tuple[float, float]\n\n\nclass Annotation(NamedTuple):\n    category_id: int\n    bbox: Optional[BoundingBox] = None\n    center: Optional[Center] = None\n"
  },
  {
    "path": "lidm/data/kitti.py",
    "content": "import glob\nimport os\nimport pickle\nimport numpy as np\nimport yaml\nfrom PIL import Image\nimport xml.etree.ElementTree as ET\n\nfrom lidm.data.base import DatasetBase\nfrom .annotated_dataset import Annotated3DObjectsDataset\nfrom .conditional_builder.utils import corners_3d_to_2d\nfrom .helper_types import Annotation\nfrom ..utils.lidar_utils import pcd2range, pcd2coord2d, range2pcd\n\n# TODO add annotation categories and semantic categories\nCATEGORIES = ['ignore', 'car', 'bicycle', 'motorcycle', 'truck', 'other-vehicle', 'person', 'bicyclist', 'motorcyclist',\n              'road', 'parking', 'sidewalk', 'other-ground', 'building', 'fence', 'vegetation', 'trunk', 'terrain',\n              'pole', 'traffic-sign']\nCATE2LABEL = {k: v for v, k in enumerate(CATEGORIES)}  # 0: invalid, 1~10: categories\nLABEL2RGB = np.array([(0, 0, 0), (0, 0, 142), (119, 11, 32), (0, 0, 230), (0, 0, 70), (0, 0, 90), (220, 20, 60),\n                      (255, 0, 0), (0, 0, 110), (128, 64, 128), (250, 170, 160), (244, 35, 232), (230, 150, 140),\n                      (70, 70, 70), (190, 153, 153), (107, 142, 35), (0, 80, 100), (230, 150, 140), (153, 153, 153),\n                      (220, 220, 0)])\nCAMERAS = ['CAM_FRONT']\nBBOX_CATS = ['car', 'people', 'cycle']\nBBOX_CAT2LABEL = {'car': 0, 'truck': 0, 'bus': 0, 'caravan': 0, 'person': 1, 'rider': 2, 'motorcycle': 2, 'bicycle': 2}\n\n# train + test\nSEM_KITTI_TRAIN_SET = ['00', '01', '02', '03', '04', '05', '06', '07', '09', '10']\nKITTI_TRAIN_SET = SEM_KITTI_TRAIN_SET + ['11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21']\nKITTI360_TRAIN_SET = ['00', '02', '04', '05', '06', '07', '09', '10'] + ['08']  # partial test data at '02' sequence\nCAM_KITTI360_TRAIN_SET = ['00', '04', '05', '06', '07', '08', '09', '10']  # cam mismatch lidar in '02'\n\n# validation\nSEM_KITTI_VAL_SET = KITTI_VAL_SET = ['08']\nCAM_KITTI360_VAL_SET = KITTI360_VAL_SET = ['03']\n\n\nclass KITTIBase(DatasetBase):\n    def __init__(self, **kwargs):\n        super().__init__(**kwargs)\n        self.dataset_name = 'kitti'\n        self.num_sem_cats = kwargs['dataset_config'].num_sem_cats + 1\n\n    @staticmethod\n    def load_lidar_sweep(path):\n        scan = np.fromfile(path, dtype=np.float32)\n        scan = scan.reshape((-1, 4))\n        points = scan[:, 0:3]  # get xyz\n        return points\n\n    def load_semantic_map(self, path, pcd):\n        raise NotImplementedError\n\n    def load_camera(self, path):\n        raise NotImplementedError\n\n    def __getitem__(self, idx):\n        example = dict()\n        data_path = self.data[idx]\n        # lidar point cloud\n        sweep = self.load_lidar_sweep(data_path)\n\n        if self.lidar_transform:\n            sweep, _ = self.lidar_transform(sweep, None)\n\n        if self.condition_key == 'segmentation':\n            # semantic maps\n            proj_range, sem_map = self.load_semantic_map(data_path, sweep)\n            example[self.condition_key] = sem_map\n        else:\n            proj_range, _ = pcd2range(sweep, self.img_size, self.fov, self.depth_range)\n        proj_range, proj_mask = self.process_scan(proj_range)\n        example['image'], example['mask'] = proj_range, proj_mask\n        if self.return_pcd:\n            reproj_sweep, _, _ = range2pcd(proj_range[0] * .5 + .5, self.fov, self.depth_range, self.depth_scale, self.log_scale)\n            example['raw'] = sweep\n            example['reproj'] = reproj_sweep.astype(np.float32)\n\n        # image degradation\n        if 
self.degradation_transform:\n            degraded_proj_range = self.degradation_transform(proj_range)\n            example['degraded_image'] = degraded_proj_range\n\n        # cameras\n        if self.condition_key == 'camera':\n            cameras = self.load_camera(data_path)\n            example[self.condition_key] = cameras\n\n        return example\n\n\nclass SemanticKITTIBase(KITTIBase):\n    def __init__(self, **kwargs):\n        super().__init__(**kwargs)\n        assert self.condition_key in ['segmentation']  # for segmentation input only\n        self.label2rgb = LABEL2RGB\n\n    def prepare_data(self):\n        # read data paths from KITTI\n        for seq_id in eval('SEM_KITTI_%s_SET' % self.split.upper()):\n            self.data.extend(glob.glob(os.path.join(\n                self.data_root, f'dataset/sequences/{seq_id}/velodyne/*.bin')))\n        # read label mapping\n        data_config = yaml.safe_load(open('./data/config/semantic-kitti.yaml', 'r'))\n        remap_dict = data_config[\"learning_map\"]\n        max_key = max(remap_dict.keys())\n        self.learning_map = np.zeros((max_key + 100), dtype=np.int32)\n        self.learning_map[list(remap_dict.keys())] = list(remap_dict.values())\n\n    def load_semantic_map(self, path, pcd):\n        label_path = path.replace('velodyne', 'labels').replace('.bin', '.label')\n        labels = np.fromfile(label_path, dtype=np.uint32)\n        labels = labels.reshape((-1))\n        labels = labels & 0xFFFF  # semantic label in lower half\n        labels = self.learning_map[labels]\n\n        proj_range, sem_map = pcd2range(pcd, self.img_size, self.fov, self.depth_range, labels=labels)\n        # sem_map = np.expand_dims(sem_map, axis=0).astype(np.int64)\n        sem_map = sem_map.astype(np.int64)\n        if self.filtered_map_cats is not None:\n            sem_map[np.isin(sem_map, self.filtered_map_cats)] = 0  # set filtered category as noise\n        onehot = np.eye(self.num_sem_cats, dtype=np.float32)[sem_map].transpose(2, 0, 1)\n        return proj_range, onehot\n\n\nclass SemanticKITTITrain(SemanticKITTIBase):\n    def __init__(self, **kwargs):\n        super().__init__(data_root='./dataset/SemanticKITTI', split='train', **kwargs)\n\n\nclass SemanticKITTIValidation(SemanticKITTIBase):\n    def __init__(self, **kwargs):\n        super().__init__(data_root='./dataset/SemanticKITTI', split='val', **kwargs)\n\n\nclass KITTI360Base(KITTIBase):\n    def __init__(self, split_per_view=None, **kwargs):\n        super().__init__(**kwargs)\n        self.split_per_view = split_per_view\n        if self.condition_key == 'camera':\n            assert self.split_per_view is not None, 'For camera-to-lidar, need to specify split_per_view'\n\n    def prepare_data(self):\n        # read data paths\n        self.data = []\n        if self.condition_key == 'camera':\n            seq_list = eval('CAM_KITTI360_%s_SET' % self.split.upper())\n        else:\n            seq_list = eval('KITTI360_%s_SET' % self.split.upper())\n        for seq_id in seq_list:\n            self.data.extend(glob.glob(os.path.join(\n                self.data_root, f'data_3d_raw/2013_05_28_drive_00{seq_id}_sync/velodyne_points/data/*.bin')))\n\n    def random_drop_camera(self, camera_list):\n        if np.random.rand() < self.aug_config['camera_drop'] and self.split == 'train':\n            camera_list = [np.zeros_like(c) if i != len(camera_list) // 2 else c for i, c in enumerate(camera_list)]  # keep the middle view only\n        return camera_list\n\n    def load_camera(self, 
path):\n        camera_path = path.replace('data_3d_raw', 'data_2d_camera').replace('velodyne_points/data', 'image_00/data_rect').replace('.bin', '.png')\n        camera = np.array(Image.open(camera_path)).astype(np.float32) / 255.\n        camera = camera.transpose(2, 0, 1)\n        if self.view_transform:\n            camera = self.view_transform(camera)\n        camera_list = np.split(camera, self.split_per_view, axis=2)  # split into n chunks as different views\n        camera_list = self.random_drop_camera(camera_list)\n        return camera_list\n\n\nclass KITTI360Train(KITTI360Base):\n    def __init__(self, **kwargs):\n        super().__init__(data_root='./dataset/KITTI-360', split='train', **kwargs)\n\n\nclass KITTI360Validation(KITTI360Base):\n    def __init__(self, **kwargs):\n        super().__init__(data_root='./dataset/KITTI-360', split='val', **kwargs)\n\n\nclass AnnotatedKITTI360Base(Annotated3DObjectsDataset, KITTI360Base):\n    def __init__(self, **kwargs):\n        self.id_bbox_dict = dict()\n        self.id_label_dict = dict()\n\n        Annotated3DObjectsDataset.__init__(self, **kwargs)\n        KITTI360Base.__init__(self, **kwargs)\n        assert self.condition_key in ['center', 'bbox']  # for annotated images only\n\n    @staticmethod\n    def parseOpencvMatrix(node):\n        rows = int(node.find('rows').text)\n        cols = int(node.find('cols').text)\n        data = node.find('data').text.split(' ')\n\n        mat = []\n        for d in data:\n            d = d.replace('\\n', '')\n            if len(d) < 1:\n                continue\n            mat.append(float(d))\n        mat = np.reshape(mat, [rows, cols])\n        return mat\n\n    def parseVertices(self, child):\n        transform = self.parseOpencvMatrix(child.find('transform'))\n        R = transform[:3, :3]\n        T = transform[:3, 3]\n        vertices = self.parseOpencvMatrix(child.find('vertices'))\n        vertices = np.matmul(R, vertices.transpose()).transpose() + T\n        return vertices\n\n    def parse_bbox_xml(self, path):\n        tree = ET.parse(path)\n        root = tree.getroot()\n\n        bbox_dict = dict()\n        label_dict = dict()\n        for child in root:\n            if child.find('transform') is None:\n                continue\n\n            label_name = child.find('label').text\n            if label_name not in BBOX_CAT2LABEL:\n                continue\n\n            label = BBOX_CAT2LABEL[label_name]\n            timestamp = int(child.find('timestamp').text)\n            # verts = self.parseVertices(child)\n            verts = self.parseOpencvMatrix(child.find('vertices'))[:8]\n            if timestamp in bbox_dict:\n                bbox_dict[timestamp].append(verts)\n                label_dict[timestamp].append(label)\n            else:\n                bbox_dict[timestamp] = [verts]\n                label_dict[timestamp] = [label]\n        return bbox_dict, label_dict\n\n    def prepare_data(self):\n        KITTI360Base.prepare_data(self)\n\n        self.data = [p for p in self.data if '2013_05_28_drive_0008_sync' not in p]  # remove unlabeled sequence 08\n        seq_list = eval('KITTI360_%s_SET' % self.split.upper())\n        for seq_id in seq_list:\n            if seq_id != '08':\n                xml_path = os.path.join(self.data_root, f'data_3d_bboxes/train/2013_05_28_drive_00{seq_id}_sync.xml')\n                bbox_dict, label_dict = self.parse_bbox_xml(xml_path)\n                self.id_bbox_dict[seq_id] = bbox_dict\n                self.id_label_dict[seq_id] = 
label_dict\n\n    def load_annotation(self, path):\n        seq_id = path.split('/')[-4].split('_')[-2][-2:]\n        timestamp = int(path.split('/')[-1].replace('.bin', ''))\n        verts_list = self.id_bbox_dict[seq_id][timestamp]\n        label_list = self.id_label_dict[seq_id][timestamp]\n\n        if self.condition_key == 'bbox':\n            points = np.stack(verts_list)\n        elif self.condition_key == 'center':\n            points = (verts_list[0] + verts_list[6]) / 2.\n        else:\n            raise NotImplementedError\n        labels = np.array([label_list])\n        if self.anno_transform:\n            points, labels = self.anno_transform(points, labels)\n        return points, labels\n\n    def __getitem__(self, idx):\n        example = dict()\n        data_path = self.data[idx]\n\n        # lidar point cloud\n        sweep = self.load_lidar_sweep(data_path)\n\n        # annotations\n        bbox_points, bbox_labels = self.load_annotation(data_path)\n\n        if self.lidar_transform:\n            sweep, bbox_points = self.lidar_transform(sweep, bbox_points)\n\n        # point cloud -> range\n        proj_range, _ = pcd2range(sweep, self.img_size, self.fov, self.depth_range)\n        proj_range, proj_mask = self.process_scan(proj_range)\n        example['image'], example['mask'] = proj_range, proj_mask\n        if self.return_pcd:\n            example['reproj'] = sweep\n\n        # annotation -> range\n        # NOTE: do not need to transform bbox points along with lidar, since their coordinates are based on range-image space instead of 3D space\n        proj_bbox_points, proj_bbox_labels = pcd2coord2d(bbox_points, self.fov, self.depth_range, labels=bbox_labels)\n        builder = self.conditional_builders[self.condition_key]\n        if self.condition_key == 'bbox':\n            proj_bbox_points = corners_3d_to_2d(proj_bbox_points)\n            annotations = [Annotation(bbox=bbox.flatten(), category_id=label) for bbox, label in\n                           zip(proj_bbox_points, proj_bbox_labels)]\n        else:\n            annotations = [Annotation(center=center, category_id=label) for center, label in\n                           zip(proj_bbox_points, proj_bbox_labels)]\n        example[self.condition_key] = builder.build(annotations)\n\n        return example\n\n\nclass AnnotatedKITTI360Train(AnnotatedKITTI360Base):\n    def __init__(self, **kwargs):\n        super().__init__(data_root='./dataset/KITTI-360', split='train', cats=BBOX_CATS, **kwargs)\n\n\nclass AnnotatedKITTI360Validation(AnnotatedKITTI360Base):\n    def __init__(self, **kwargs):\n        super().__init__(data_root='./dataset/KITTI-360', split='train', cats=BBOX_CATS, **kwargs)\n\n\nclass KITTIImageBase(KITTIBase):\n    \"\"\"\n    Range ImageSet only combining KITTI-360 and SemanticKITTI\n\n    #Samples (Training): 98014, #Samples (Val): 3511\n\n    \"\"\"\n    def __init__(self, **kwargs):\n        super().__init__(**kwargs)\n        assert self.condition_key in [None, 'image']  # for image input only\n\n    def prepare_data(self):\n        # read data paths from KITTI-360\n        self.data = []\n        for seq_id in eval('KITTI360_%s_SET' % self.split.upper()):\n            self.data.extend(glob.glob(os.path.join(\n                self.data_root, f'KITTI-360/data_3d_raw/2013_05_28_drive_00{seq_id}_sync/velodyne_points/data/*.bin')))\n\n        # read data paths from KITTI\n        for seq_id in eval('KITTI_%s_SET' % self.split.upper()):\n            self.data.extend(glob.glob(os.path.join(\n          
      self.data_root, f'SemanticKITTI/dataset/sequences/{seq_id}/velodyne/*.bin')))\n\n\nclass KITTIImageTrain(KITTIImageBase):\n    def __init__(self, **kwargs):\n        super().__init__(data_root='./dataset', split='train', **kwargs)\n\n\nclass KITTIImageValidation(KITTIImageBase):\n    def __init__(self, **kwargs):\n        super().__init__(data_root='./dataset', split='val', **kwargs)\n"
  },
  {
    "path": "lidm/eval/README.md",
    "content": "# Evaluation Toolbox for LiDAR Generation\n\nThis directory is a **self-contained**, **memory-friendly** and mostly **CUDA-accelerated** toolbox of multiple evaluation metrics for LiDAR generative models, including:\n* Perceptual metrics (our proposed):\n  * Fréchet Range Image Distance (**FRID**)\n  * Fréchet Sparse Volume Distance (**FSVD**)\n  * Fréchet Point-based Volume Distance (**FPVD**)\n* Statistical metrics (proposed in [Learning Representations and Generative Models for 3D Point Clouds](https://arxiv.org/abs/1707.02392)):\n  * Minimum Matching Distance (**MMD**)\n  * Jensen-Shannon Divergence (**JSD**)\n* Statistical pairwise metrics (for reconstruction only):\n  * Chamfer Distance (**CD**)\n  * Earth Mover's Distance (**EMD**)\n\n## Citation\n\nIf you find this project useful in your research, please consider citing:\n```\n@article{ran2024towards,\n  title={Towards Realistic Scene Generation with LiDAR Diffusion Models},\n  author={Ran, Haoxi and Guizilini, Vitor and Wang, Yue},\n  journal={arXiv preprint arXiv:2404.00815},\n  year={2024}\n}\n```\n\n\n## Dependencies\n\n### Basic (install through **pip**):\n* scipy\n* numpy\n* torch\n* pyyaml\n\n### Required by FSVD and FPVD:\n* [Torchsparse v1.4.0](https://github.com/mit-han-lab/torchsparse/tree/v1.4.0) (pip install git+https://github.com/mit-han-lab/torchsparse.git@v1.4.0)\n* [Google Sparse Hash library](https://github.com/sparsehash/sparsehash) (apt-get install libsparsehash-dev **or** compile locally and update variable CPLUS_INCLUDE_PATH with directory path)\n\n\n## Model Zoo \n\nTo evaluate with perceptual metrics on different types of LiDAR data, you can download all models through:\n*  this [google drive link](https://drive.google.com/file/d/1Ml4p4_nMlwLkSp7JB528GJv2_HxO8v1i/view?usp=drive_link) in the .zip file \n\nor\n*  the **full directory** of one specific model:\n\n### 64-beam LiDAR (trained on [SemanticKITTI](http://semantic-kitti.org/dataset.html)):\n\n| Metric |                                            Model                                            |          Arch           |                                                  Link                                                   | Code                                                             | Comments                                                                  |\n|:------:|:-------------------------------------------------------------------------------------------:|:-----------------------:|:-------------------------------------------------------------------------------------------------------:|:-----------------------------------------------------------------|---------------------------------------------------------------------------|\n|  FRID  | [RangeNet++](https://www.ipb.uni-bonn.de/wp-content/papercite-data/pdf/milioto2019iros.pdf) |  DarkNet21-based UNet   | [Google Drive](https://drive.google.com/drive/folders/1ZS8KOoxB9hjB6kwKbH5Zfc8O5qJlKsbl?usp=drive_link) | [./models/rangenet/model.py](./models/rangenet/model.py)         | range image input (our trained model without the need of remission input) |\n|  FSVD  |                      [MinkowskiNet](https://arxiv.org/abs/1904.08755)                       |       Sparse UNet       | [Google Drive](https://drive.google.com/drive/folders/1zN12ZEvjIvo4PCjAsncgC22yvtRrCCMe?usp=drive_link) | [./models/minkowskinet/model.py](./models/minkowskinet/model.py) | point cloud input                                                         |\n|  FPVD  |                         
## Acknowledgement\n\n- The implementation of MinkowskiNet and SPVCNN is borrowed from [2DPASS](https://github.com/yanx27/2DPASS).\n- The implementation of RangeNet++ is borrowed from [the official RangeNet++ codebase](https://github.com/PRBonn/lidar-bonnetal).\n- The implementation of Chamfer Distance is adapted from [CD Pytorch Implementation](https://github.com/ThibaultGROUEIX/ChamferDistancePytorch), and the implementation of Earth Mover's Distance from the [MSN official repo](https://github.com/Colin97/MSN-Point-Cloud-Completion).\n"
  },
  {
    "path": "lidm/eval/__init__.py",
    "content": "\"\"\"\n@Author: Haoxi Ran\n@Date: 01/03/2024\n@Citation: Towards Realistic Scene Generation with LiDAR Diffusion Models\n\n\"\"\"\n\nimport os\n\nimport torch\nimport yaml\n\nfrom lidm.utils.misc_utils import dict2namespace\nfrom ..modules.rangenet.model import Model as rangenet\n\ntry:\n    from ..modules.spvcnn.model import Model as spvcnn\n    from ..modules.minkowskinet.model import Model as minkowskinet\nexcept:\n    print('To install torchsparse 1.4.0, please refer to https://github.com/mit-han-lab/torchsparse/tree/74099d10a51c71c14318bce63d6421f698b24f24')\n\n# user settings\nDEFAULT_ROOT = './pretrained_weights'\nMODAL2BATCHSIZE = {'range': 100, 'voxel': 50, 'point_voxel': 25}\nOUTPUT_TEMPLATE = 50 * '-' + '\\n|' + 16 * ' ' + '{}:{:.4E}' + 17 * ' ' + '|\\n' + 50 * '-'\n\n# eval settings (do not modify)\nVOXEL_SIZE = 0.05\nNUM_SECTORS = 16\nAGG_TYPE = 'depth'\nTYPE2DATASET = {'32': 'nuscenes', '64': 'kitti'}\nDATA_CONFIG = {'64': {'x': [-50, 50], 'y': [-50, 50], 'z': [-3, 1]},\n               '32': {'x': [-30, 30], 'y': [-30, 30], 'z': [-3, 6]}}\nMODALITY2MODEL = {'range': 'rangenet', 'voxel': 'minkowskinet', 'point_voxel': 'spvcnn'}\nDATASET_CONFIG = {'kitti': {'size': [64, 1024], 'fov': [3, -25], 'depth_range': [1.0, 56.0], 'depth_scale': 6},\n                  'nuscenes': {'size': [32, 1024], 'fov': [10, -30], 'depth_range': [1.0, 45.0]}}\n\n\ndef build_model(dataset_name, model_name, device='cpu'):\n    # config\n    model_folder = os.path.join(DEFAULT_ROOT, dataset_name, model_name)\n\n    if not os.path.isdir(model_folder):\n        raise Exception('Not Available Pretrained Weights!')\n\n    config = yaml.safe_load(open(os.path.join(model_folder, 'config.yaml'), 'r'))\n    if model_name != 'rangenet':\n        config = dict2namespace(config)\n\n    # build model\n    model = eval(model_name)(config)\n\n    # load checkpoint\n    if model_name == 'rangenet':\n        model.load_pretrained_weights(model_folder)\n    else:\n        ckpt = torch.load(os.path.join(model_folder, 'model.ckpt'), map_location=\"cpu\")\n        model.load_state_dict(ckpt['state_dict'], strict=False)\n    model.to(device)\n    model.eval()\n\n    return model\n"
  },
  {
    "path": "lidm/eval/compile.sh",
    "content": "#!/bin/sh\n\ncd modules/chamfer\npython setup.py build_ext --inplace\n\ncd ../emd\npython setup.py build_ext --inplace\n\ncd ..\n"
  },
  {
    "path": "lidm/eval/eval_utils.py",
    "content": "\"\"\"\n@Author: Haoxi Ran\n@Date: 01/03/2024\n@Citation: Towards Realistic Scene Generation with LiDAR Diffusion Models\n\n\"\"\"\nimport multiprocessing\nfrom functools import partial\n\nimport numpy as np\nfrom scipy.spatial.distance import jensenshannon\nfrom tqdm import tqdm\n\nfrom . import OUTPUT_TEMPLATE\nfrom .metric_utils import compute_logits, compute_pairwise_cd, \\\n    compute_pairwise_emd, pcd2bev_sum, compute_pairwise_cd_batch, pcd2bev_bin\nfrom .fid_score import calculate_frechet_distance\n\n\ndef evaluate(reference, samples, metrics, data):\n    # perceptual\n    if 'frid' in metrics:\n        compute_frid(reference, samples, data)\n    if 'fsvd' in metrics:\n        compute_fsvd(reference, samples, data)\n    if 'fpvd' in metrics:\n        compute_fpvd(reference, samples, data)\n\n    # reconstruction\n    if 'cd' in metrics:\n        compute_cd(reference, samples)\n    if 'emd' in metrics:\n        compute_emd(reference, samples)\n\n    # statistical\n    if 'jsd' in metrics:\n        compute_jsd(reference, samples, data)\n    if 'mmd' in metrics:\n        compute_mmd(reference, samples, data)\n\n\ndef compute_cd(reference, samples):\n    \"\"\"\n    Calculate score of Chamfer Distance (CD)\n\n    \"\"\"\n    print('Evaluating (CD) ...')\n    results = []\n    for x, y in zip(reference, samples):\n        d = compute_pairwise_cd(x, y)\n        results.append(d)\n    score = sum(results) / len(results)\n    print(OUTPUT_TEMPLATE.format('CD  ', score))\n\n\ndef compute_emd(reference, samples):\n    \"\"\"\n    Calculate score of Earth Mover's Distance (EMD)\n\n    \"\"\"\n    print('Evaluating (EMD) ...')\n    results = []\n    for x, y in zip(reference, samples):\n        d = compute_pairwise_emd(x, y)\n        results.append(d)\n    score = sum(results) / len(results)\n    print(OUTPUT_TEMPLATE.format('EMD ', score))\n\n\ndef compute_mmd(reference, samples, data, dist='cd', verbose=True):\n    \"\"\"\n    Calculate the score of Minimum Matching Distance (MMD)\n\n    \"\"\"\n    print('Evaluating (MMD) ...')\n    assert dist in ['cd', 'emd']\n    reference, samples = pcd2bev_bin(data, reference, samples)\n    compute_dist_func = compute_pairwise_cd_batch if dist == 'cd' else compute_pairwise_emd\n    results = []\n    for r in tqdm(reference, disable=not verbose):\n        dists = compute_dist_func(r, samples)\n        results.append(min(dists))\n    score = sum(results) / len(results)\n    print(OUTPUT_TEMPLATE.format('MMD ', score))\n\n\ndef compute_jsd(reference, samples, data):\n    \"\"\"\n    Calculate the score of Jensen-Shannon Divergence (JSD)\n\n    \"\"\"\n    print('Evaluating (JSD) ...')\n    reference, samples = pcd2bev_sum(data, reference, samples)\n    reference = (reference / np.sum(reference)).flatten()\n    samples = (samples / np.sum(samples)).flatten()\n    score = jensenshannon(reference, samples)\n    print(OUTPUT_TEMPLATE.format('JSD ', score))\n\n\ndef compute_fd(reference, samples):\n    mu1, mu2 = np.mean(reference, axis=0), np.mean(samples, axis=0)\n    sigma1, sigma2 = np.cov(reference, rowvar=False), np.cov(samples, rowvar=False)\n    distance = calculate_frechet_distance(mu1, sigma1, mu2, sigma2)\n    return distance\n\n\ndef compute_frid(reference, samples, data):\n    \"\"\"\n    Calculate the score of Fréchet Range Image Distance (FRID)\n\n    \"\"\"\n    print('Evaluating (FRID) ...')\n    gt_logits, samples_logits = compute_logits(data, 'range', reference, samples)\n    score = compute_fd(gt_logits, samples_logits)\n    
print(OUTPUT_TEMPLATE.format('FRID', score))\n\n\ndef compute_fsvd(reference, samples, data):\n    \"\"\"\n    Calculate the score of Fréchet Sparse Volume Distance (FSVD)\n\n    \"\"\"\n    print('Evaluating (FSVD) ...')\n    gt_logits, samples_logits = compute_logits(data, 'voxel', reference, samples)\n    score = compute_fd(gt_logits, samples_logits)\n    print(OUTPUT_TEMPLATE.format('FSVD', score))\n\n\ndef compute_fpvd(reference, samples, data):\n    \"\"\"\n    Calculate the score of Fréchet Point-based Volume Distance (FPVD)\n\n    \"\"\"\n    print('Evaluating (FPVD) ...')\n    gt_logits, samples_logits = compute_logits(data, 'point_voxel', reference, samples)\n    score = compute_fd(gt_logits, samples_logits)\n    print(OUTPUT_TEMPLATE.format('FPVD', score))\n\n"
  },
  {
    "path": "lidm/eval/fid_score.py",
    "content": "\"\"\"Calculates the Frechet Inception Distance (FID) to evalulate GANs\nThe FID metric calculates the distance between two distributions of images.\nTypically, we have summary statistics (mean & covariance matrix) of one\nof these distributions, while the 2nd distribution is given by a GAN.\nWhen run as a stand-alone program, it compares the distribution of\nimages that are stored as PNG/JPEG at a specified location with a\ndistribution given by summary statistics (in pickle format).\nThe FID is calculated by assuming that X_1 and X_2 are the activations of\nthe pool_3 layer of the inception net for generated samples and real world\nsamples respectively.\nSee --help to see further details.\nCode adapted from https://github.com/bioinf-jku/TTUR to use PyTorch instead\nof Tensorflow\nCopyright 2018 Institute of Bioinformatics, JKU Linz\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n   http://www.apache.org/licenses/LICENSE-2.0\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n\"\"\"\nimport os\nimport pathlib\nfrom argparse import ArgumentDefaultsHelpFormatter, ArgumentParser\n\nimport numpy as np\nimport torch\nimport torchvision.transforms as TF\nfrom PIL import Image\nfrom scipy import linalg\nfrom torch.nn.functional import adaptive_avg_pool2d\n\ntry:\n    from tqdm import tqdm\nexcept ImportError:\n    # If tqdm is not available, provide a mock version of it\n    def tqdm(x):\n        return x\n\nclass ImagePathDataset(torch.utils.data.Dataset):\n    def __init__(self, files, transforms=None):\n        self.files = files\n        self.transforms = transforms\n\n    def __len__(self):\n        return len(self.files)\n\n    def __getitem__(self, i):\n        path = self.files[i]\n        img = Image.open(path).convert('RGB')\n        if self.transforms is not None:\n            img = self.transforms(img)\n        return img\n\n\ndef get_activations(files, model, batch_size=50, dims=2048, device='cpu',\n                    num_workers=1):\n    \"\"\"Calculates the activations of the pool_3 layer for all images.\n    Params:\n    -- files       : List of image files paths\n    -- model       : Instance of inception model\n    -- batch_size  : Batch size of images for the model to process at once.\n                     Make sure that the number of samples is a multiple of\n                     the batch size, otherwise some samples are ignored. This\n                     behavior is retained to match the original FID score\n                     implementation.\n    -- dims        : Dimensionality of features returned by Inception\n    -- device      : Device to run calculations\n    -- num_workers : Number of parallel dataloader workers\n    Returns:\n    -- A numpy array of dimension (num images, dims) that contains the\n       activations of the given tensor when feeding inception with the\n       query tensor.\n    \"\"\"\n    model.eval()\n\n    if batch_size > len(files):\n        print(('Warning: batch size is bigger than the data size. 
'\n               'Setting batch size to data size'))\n        batch_size = len(files)\n\n    dataset = ImagePathDataset(files, transforms=TF.ToTensor())\n    dataloader = torch.utils.data.DataLoader(dataset,\n                                             batch_size=batch_size,\n                                             shuffle=False,\n                                             drop_last=False,\n                                             num_workers=num_workers)\n\n    pred_arr = np.empty((len(files), dims))\n\n    start_idx = 0\n\n    for batch in tqdm(dataloader):\n        batch = batch.to(device)\n\n        with torch.no_grad():\n            pred = model(batch)[0]\n\n        # If model output is not scalar, apply global spatial average pooling.\n        # This happens if you choose a dimensionality not equal 2048.\n        if pred.size(2) != 1 or pred.size(3) != 1:\n            pred = adaptive_avg_pool2d(pred, output_size=(1, 1))\n\n        pred = pred.squeeze(3).squeeze(2).cpu().numpy()\n\n        pred_arr[start_idx:start_idx + pred.shape[0]] = pred\n\n        start_idx = start_idx + pred.shape[0]\n\n    return pred_arr\n\n\ndef calculate_frechet_distance(mu1, sigma1, mu2, sigma2, eps=1e-6):\n    \"\"\"Numpy implementation of the Frechet Distance.\n    The Frechet distance between two multivariate Gaussians X_1 ~ N(mu_1, C_1)\n    and X_2 ~ N(mu_2, C_2) is\n            d^2 = ||mu_1 - mu_2||^2 + Tr(C_1 + C_2 - 2*sqrt(C_1*C_2)).\n    Stable version by Dougal J. Sutherland.\n    Params:\n    -- mu1   : Numpy array containing the activations of a layer of the\n               inception net (like returned by the function 'get_predictions')\n               for generated samples.\n    -- mu2   : The sample mean over activations, precalculated on an\n               representative data set.\n    -- sigma1: The covariance matrix over activations for generated samples.\n    -- sigma2: The covariance matrix over activations, precalculated on an\n               representative data set.\n    Returns:\n    --   : The Frechet Distance.\n    \"\"\"\n\n    mu1 = np.atleast_1d(mu1)\n    mu2 = np.atleast_1d(mu2)\n\n    sigma1 = np.atleast_2d(sigma1)\n    sigma2 = np.atleast_2d(sigma2)\n\n    assert mu1.shape == mu2.shape, \\\n        'Training and test mean vectors have different lengths'\n    assert sigma1.shape == sigma2.shape, \\\n        'Training and test covariances have different dimensions'\n\n    diff = mu1 - mu2\n\n    # Product might be almost singular\n    covmean, _ = linalg.sqrtm(sigma1.dot(sigma2), disp=False)\n    if not np.isfinite(covmean).all():\n        msg = ('fid calculation produces singular product; '\n               'adding %s to diagonal of cov estimates') % eps\n        print(msg)\n        offset = np.eye(sigma1.shape[0]) * eps\n        covmean = linalg.sqrtm((sigma1 + offset).dot(sigma2 + offset))\n\n    # Numerical error might give slight imaginary component\n    if np.iscomplexobj(covmean):\n        if not np.allclose(np.diagonal(covmean).imag, 0, atol=1e-3):\n            m = np.max(np.abs(covmean.imag))\n            raise ValueError('Imaginary component {}'.format(m))\n        covmean = covmean.real\n\n    tr_covmean = np.trace(covmean)\n\n    return (diff.dot(diff) + np.trace(sigma1)\n            + np.trace(sigma2) - 2 * tr_covmean)\n\n\ndef calculate_activation_statistics(files, model, batch_size=50, dims=2048,\n                                    device='cpu', num_workers=1):\n    \"\"\"Calculation of the statistics used by the FID.\n    Params:\n    -- files       : 
List of image files paths\n    -- model       : Instance of inception model\n    -- batch_size  : The images numpy array is split into batches with\n                     batch size batch_size. A reasonable batch size\n                     depends on the hardware.\n    -- dims        : Dimensionality of features returned by Inception\n    -- device      : Device to run calculations\n    -- num_workers : Number of parallel dataloader workers\n    Returns:\n    -- mu    : The mean over samples of the activations of the pool_3 layer of\n               the inception model.\n    -- sigma : The covariance matrix of the activations of the pool_3 layer of\n               the inception model.\n    \"\"\"\n    act = get_activations(files, model, batch_size, dims, device, num_workers)\n    mu = np.mean(act, axis=0)\n    sigma = np.cov(act, rowvar=False)\n    return mu, sigma\n"
  },
  {
    "path": "lidm/eval/metric_utils.py",
    "content": "\"\"\"\n@Author: Haoxi Ran\n@Date: 01/03/2024\n@Citation: Towards Realistic Scene Generation with LiDAR Diffusion Models\n\n\"\"\"\n\nimport math\nfrom itertools import repeat\nfrom typing import List, Tuple, Union\nimport numpy as np\nimport torch\n\nfrom . import build_model, VOXEL_SIZE, MODALITY2MODEL, MODAL2BATCHSIZE, DATASET_CONFIG, AGG_TYPE, NUM_SECTORS, \\\n    TYPE2DATASET, DATA_CONFIG\n\ntry:\n    from torchsparse import SparseTensor, PointTensor\n    from torchsparse.utils.collate import sparse_collate_fn\n    from .modules.chamfer3D.dist_chamfer_3D import chamfer_3DDist\n    from .modules.chamfer2D.dist_chamfer_2D import chamfer_2DDist\n    from .modules.emd.emd_module import emdModule\nexcept:\n    print(\n        'To install torchsparse 1.4.0, please refer to https://github.com/mit-han-lab/torchsparse/tree/74099d10a51c71c14318bce63d6421f698b24f24')\n\n\ndef ravel_hash(x: np.ndarray) -> np.ndarray:\n    assert x.ndim == 2, x.shape\n\n    x = x - np.min(x, axis=0)\n    x = x.astype(np.uint64, copy=False)\n    xmax = np.max(x, axis=0).astype(np.uint64) + 1\n\n    h = np.zeros(x.shape[0], dtype=np.uint64)\n    for k in range(x.shape[1] - 1):\n        h += x[:, k]\n        h *= xmax[k + 1]\n    h += x[:, -1]\n    return h\n\n\ndef sparse_quantize(coords, voxel_size: Union[float, Tuple[float, ...]] = 1, *, return_index: bool = False,\n                    return_inverse: bool = False) -> List[np.ndarray]:\n    \"\"\"\n    Modified based on https://github.com/mit-han-lab/torchsparse/blob/462dea4a701f87a7545afb3616bf2cf53dd404f3/torchsparse/utils/quantize.py\n\n    \"\"\"\n    if isinstance(voxel_size, (float, int)):\n        voxel_size = tuple(repeat(voxel_size, coords.shape[1]))\n    assert isinstance(voxel_size, tuple) and len(voxel_size) in [2, 3]  # support 2D and 3D coordinates only\n\n    voxel_size = np.array(voxel_size)\n    coords = np.floor(coords / voxel_size).astype(np.int32)\n\n    _, indices, inverse_indices = np.unique(\n        ravel_hash(coords), return_index=True, return_inverse=True\n    )\n    coords = coords[indices]\n\n    outputs = [coords]\n    if return_index:\n        outputs += [indices]\n    if return_inverse:\n        outputs += [inverse_indices]\n    return outputs[0] if len(outputs) == 1 else outputs\n\n\ndef pcd2range(pcd, size, fov, depth_range, remission=None, labels=None, **kwargs):\n    # laser parameters\n    fov_up = fov[0] / 180.0 * np.pi  # field of view up in rad\n    fov_down = fov[1] / 180.0 * np.pi  # field of view down in rad\n    fov_range = abs(fov_down) + abs(fov_up)  # get field of view total in rad\n\n    # get depth (distance) of all points\n    depth = np.linalg.norm(pcd, 2, axis=1)\n\n    # mask points out of range\n    mask = np.logical_and(depth > depth_range[0], depth < depth_range[1])\n    depth, pcd = depth[mask], pcd[mask]\n\n    # get scan components\n    scan_x, scan_y, scan_z = pcd[:, 0], pcd[:, 1], pcd[:, 2]\n\n    # get angles of all points\n    yaw = -np.arctan2(scan_y, scan_x)\n    pitch = np.arcsin(scan_z / depth)\n\n    # get projections in image coords\n    proj_x = 0.5 * (yaw / np.pi + 1.0)  # in [0.0, 1.0]\n    proj_y = 1.0 - (pitch + abs(fov_down)) / fov_range  # in [0.0, 1.0]\n\n    # scale to image size using angular resolution\n    proj_x *= size[1]  # in [0.0, W]\n    proj_y *= size[0]  # in [0.0, H]\n\n    # round and clamp for use as index\n    proj_x = np.maximum(0, np.minimum(size[1] - 1, np.floor(proj_x))).astype(np.int32)  # in [0,W-1]\n    proj_y = np.maximum(0, np.minimum(size[0] - 1, 
np.floor(proj_y))).astype(np.int32)  # in [0,H-1]\n\n    # order in decreasing depth\n    order = np.argsort(depth)[::-1]\n    proj_x, proj_y = proj_x[order], proj_y[order]\n\n    # project depth\n    depth = depth[order]\n    proj_range = np.full(size, -1, dtype=np.float32)\n    proj_range[proj_y, proj_x] = depth\n\n    # project point feature\n    if remission is not None:\n        remission = remission[mask][order]\n        proj_feature = np.full(size, -1, dtype=np.float32)\n        proj_feature[proj_y, proj_x] = remission\n    elif labels is not None:\n        labels = labels[mask][order]\n        proj_feature = np.full(size, 0, dtype=np.float32)\n        proj_feature[proj_y, proj_x] = labels\n    else:\n        proj_feature = None\n\n    return proj_range, proj_feature\n\n\ndef range2xyz(range_img, fov, depth_range, depth_scale, log_scale=True, **kwargs):\n    # laser parameters\n    size = range_img.shape\n    fov_up = fov[0] / 180.0 * np.pi  # field of view up in rad\n    fov_down = fov[1] / 180.0 * np.pi  # field of view down in rad\n    fov_range = abs(fov_down) + abs(fov_up)  # get field of view total in rad\n\n    # inverse transform from depth\n    if log_scale:\n        depth = (np.exp2(range_img * depth_scale) - 1)\n    else:\n        depth = range_img\n\n    scan_x, scan_y = np.meshgrid(np.arange(size[1]), np.arange(size[0]))\n    scan_x = scan_x.astype(np.float64) / size[1]\n    scan_y = scan_y.astype(np.float64) / size[0]\n\n    yaw = np.pi * (scan_x * 2 - 1)\n    pitch = (1.0 - scan_y) * fov_range - abs(fov_down)\n\n    xyz = -np.ones((3, *size))\n    xyz[0] = np.cos(yaw) * np.cos(pitch) * depth\n    xyz[1] = -np.sin(yaw) * np.cos(pitch) * depth\n    xyz[2] = np.sin(pitch) * depth\n\n    # mask out invalid points\n    mask = np.logical_and(depth > depth_range[0], depth < depth_range[1])\n    xyz[:, ~mask] = -1\n\n    return xyz\n\n\ndef pcd2voxel(pcd):\n    pcd_voxel = np.round(pcd / VOXEL_SIZE)\n    pcd_voxel = pcd_voxel - pcd_voxel.min(0, keepdims=1)\n    feat = np.concatenate((pcd, -np.ones((pcd.shape[0], 1))), axis=1)  # -1 for remission placeholder\n    _, inds, inverse_map = sparse_quantize(pcd_voxel, 1, return_index=True, return_inverse=True)\n\n    feat = torch.FloatTensor(feat[inds])\n    pcd_voxel = torch.LongTensor(pcd_voxel[inds])\n    lidar = SparseTensor(feat, pcd_voxel)\n    output = {'lidar': lidar}\n    return output\n\n\ndef pcd2voxel_full(data_type, *args):\n    config = DATA_CONFIG[data_type]\n    x_range, y_range, z_range = config['x'], config['y'], config['z']\n    vol_shape = (math.ceil((x_range[1] - x_range[0]) / VOXEL_SIZE), math.ceil((y_range[1] - y_range[0]) / VOXEL_SIZE),\n                 math.ceil((z_range[1] - z_range[0]) / VOXEL_SIZE))\n    min_bound = (math.ceil((x_range[0]) / VOXEL_SIZE), math.ceil((y_range[0]) / VOXEL_SIZE),\n                 math.ceil((z_range[0]) / VOXEL_SIZE))\n\n    output = tuple()\n    for data in args:\n        volume_list = []\n        for pcd in data:\n            # mask out invalid points\n            mask_x = np.logical_and(pcd[:, 0] > x_range[0], pcd[:, 0] < x_range[1])\n            mask_y = np.logical_and(pcd[:, 1] > y_range[0], pcd[:, 1] < y_range[1])\n            mask_z = np.logical_and(pcd[:, 2] > z_range[0], pcd[:, 2] < z_range[1])\n            mask = mask_x & mask_y & mask_z\n            pcd = pcd[mask]\n\n            # voxelize\n            pcd_voxel = np.floor(pcd / VOXEL_SIZE)\n            _, indices, inverse_map = sparse_quantize(pcd_voxel, 1, return_index=True, return_inverse=True)\n            
pcd_voxel = pcd_voxel[indices]\n            pcd_voxel = (pcd_voxel - min_bound).astype(np.int32)\n\n            # 2D bev grid\n            vol = np.zeros(vol_shape, dtype=np.float32)\n            vol[pcd_voxel[:, 0], pcd_voxel[:, 1], pcd_voxel[:, 2]] = 1\n            volume_list.append(vol)\n        output += (volume_list,)\n    return output\n\n\n# def pcd2bev_full(data_type, *args, voxel_size=VOXEL_SIZE):\n#     config = DATA_CONFIG[data_type]\n#     x_range, y_range = config['x'], config['y']\n#     vol_shape = (math.ceil((x_range[1] - x_range[0]) / voxel_size), math.ceil((y_range[1] - y_range[0]) / voxel_size))\n#     min_bound = (math.ceil((x_range[0]) / voxel_size), math.ceil((y_range[0]) / voxel_size))\n#\n#     output = tuple()\n#     for data in args:\n#         volume_list = []\n#         for pcd in data:\n#             # mask out invalid points\n#             mask_x = np.logical_and(pcd[:, 0] > x_range[0], pcd[:, 0] < x_range[1])\n#             mask_y = np.logical_and(pcd[:, 1] > y_range[0], pcd[:, 1] < y_range[1])\n#             mask = mask_x & mask_y\n#             pcd = pcd[mask][:, :2]  # keep x,y coord\n#\n#             # voxelize\n#             pcd_voxel = np.floor(pcd / voxel_size)\n#             _, indices, inverse_map = sparse_quantize(pcd_voxel, 1, return_index=True, return_inverse=True)\n#             pcd_voxel = pcd_voxel[indices]\n#             pcd_voxel = (pcd_voxel - min_bound).astype(np.int32)\n#\n#             # 2D bev grid\n#             vol = np.zeros(vol_shape, dtype=np.float32)\n#             vol[pcd_voxel[:, 0], pcd_voxel[:, 1]] = 1\n#             volume_list.append(vol)\n#         output += (volume_list,)\n#     return output\n\n\ndef pcd2bev_sum(data_type, *args, voxel_size=VOXEL_SIZE):\n    config = DATA_CONFIG[data_type]\n    x_range, y_range = config['x'], config['y']\n    vol_shape = (math.ceil((x_range[1] - x_range[0]) / voxel_size), math.ceil((y_range[1] - y_range[0]) / voxel_size))\n    min_bound = (math.ceil((x_range[0]) / voxel_size), math.ceil((y_range[0]) / voxel_size))\n\n    output = tuple()\n    for data in args:\n        volume_sum = np.zeros(vol_shape, np.float32)\n        for pcd in data:\n            # mask out invalid points\n            mask_x = np.logical_and(pcd[:, 0] > x_range[0], pcd[:, 0] < x_range[1])\n            mask_y = np.logical_and(pcd[:, 1] > y_range[0], pcd[:, 1] < y_range[1])\n            mask = mask_x & mask_y\n            pcd = pcd[mask][:, :2]  # keep x,y coord\n\n            # voxelize\n            pcd_voxel = np.floor(pcd / voxel_size)\n            _, indices, inverse_map = sparse_quantize(pcd_voxel, 1, return_index=True, return_inverse=True)\n            pcd_voxel = pcd_voxel[indices]\n            pcd_voxel = (pcd_voxel - min_bound).astype(np.int32)\n\n            # summation\n            volume_sum[pcd_voxel[:, 0], pcd_voxel[:, 1]] += 1.\n        output += (volume_sum,)\n    return output\n\n\ndef pcd2bev_bin(data_type, *args, voxel_size=0.5):\n    config = DATA_CONFIG[data_type]\n    x_range, y_range = config['x'], config['y']\n    vol_shape = (math.ceil((x_range[1] - x_range[0]) / voxel_size), math.ceil((y_range[1] - y_range[0]) / voxel_size))\n    min_bound = (math.ceil((x_range[0]) / voxel_size), math.ceil((y_range[0]) / voxel_size))\n\n    output = tuple()\n    for data in args:\n        pcd_list = []\n        for pcd in data:\n            # mask out invalid points\n            mask_x = np.logical_and(pcd[:, 0] > x_range[0], pcd[:, 0] < x_range[1])\n            mask_y = np.logical_and(pcd[:, 1] > y_range[0], 
pcd[:, 1] < y_range[1])\n            mask = mask_x & mask_y\n            pcd = pcd[mask][:, :2]  # keep x,y coord\n\n            # voxelize\n            pcd_voxel = np.floor(pcd / voxel_size)\n            _, indices, inverse_map = sparse_quantize(pcd_voxel, 1, return_index=True, return_inverse=True)\n            pcd_voxel = pcd_voxel[indices]\n            pcd_voxel = ((pcd_voxel - min_bound) / vol_shape).astype(np.float32)\n            pcd_list.append(pcd_voxel)\n        output += (pcd_list,)\n    return output\n\n\ndef bev_sample(data_type, *args, voxel_size=0.5):\n    config = DATA_CONFIG[data_type]\n    x_range, y_range = config['x'], config['y']\n\n    output = tuple()\n    for data in args:\n        pcd_list = []\n        for pcd in data:\n            # mask out invalid points\n            mask_x = np.logical_and(pcd[:, 0] > x_range[0], pcd[:, 0] < x_range[1])\n            mask_y = np.logical_and(pcd[:, 1] > y_range[0], pcd[:, 1] < y_range[1])\n            mask = mask_x & mask_y\n            pcd = pcd[mask][:, :2]  # keep x,y coord\n\n            # voxelize\n            pcd_voxel = np.floor(pcd / voxel_size)\n            _, indices, inverse_map = sparse_quantize(pcd_voxel, 1, return_index=True, return_inverse=True)\n            pcd = pcd[indices]\n            pcd_list.append(pcd)\n        output += (pcd_list,)\n    return output\n\n\ndef preprocess_pcd(pcd, **kwargs):\n    depth = np.linalg.norm(pcd, 2, axis=1)\n    mask = np.logical_and(depth > kwargs['depth_range'][0], depth < kwargs['depth_range'][1])\n    pcd = pcd[mask]\n    return pcd\n\n\ndef preprocess_range(pcd, **kwargs):\n    depth_img = pcd2range(pcd, **kwargs)[0]\n    xyz_img = range2xyz(depth_img, log_scale=False, **kwargs)\n    depth_img = depth_img[None]\n    img = np.vstack([depth_img, xyz_img])\n    return img\n\n\ndef batch2list(batch_dict, agg_type='depth', **kwargs):\n    \"\"\"\n    Aggregation Type: Default 'depth', ['all', 'sector', 'depth']\n    \"\"\"\n    output_list = []\n    batch_indices = batch_dict['batch_indices']\n    for b_idx in range(batch_indices.max() + 1):\n        # avg all\n        if agg_type == 'all':\n            logits = batch_dict['logits'][batch_indices == b_idx].mean(0)\n\n        # avg on sectors\n        elif agg_type == 'sector':\n            logits = batch_dict['logits'][batch_indices == b_idx]\n            coords = batch_dict['coords'][batch_indices == b_idx].float()\n            coords = coords - coords.mean(0)\n            angle = torch.atan2(coords[:, 1], coords[:, 0])  # [-pi, pi]\n            sector_range = torch.linspace(-np.pi - 1e-4, np.pi + 1e-4, NUM_SECTORS + 1)\n            logits_list = []\n            for i in range(NUM_SECTORS):\n                sector_indices = torch.where((angle >= sector_range[i]) & (angle < sector_range[i + 1]))[0]\n                sector_logits = logits[sector_indices].mean(0)\n                sector_logits = torch.nan_to_num(sector_logits, 0.)\n                logits_list.append(sector_logits)\n            logits = torch.cat(logits_list)  # dim: 768\n\n        # avg by depth\n        elif agg_type == 'depth':\n            logits = batch_dict['logits'][batch_indices == b_idx]\n            coords = batch_dict['coords'][batch_indices == b_idx].float()\n            coords = coords - coords.mean(0)\n            bev_depth = torch.norm(coords, dim=-1) * VOXEL_SIZE\n            sector_range = torch.linspace(kwargs['depth_range'][0] + 3, kwargs['depth_range'][1], NUM_SECTORS + 1)\n            sector_range[0] = 0.\n            logits_list = []\n           
 for i in range(NUM_SECTORS):\n                sector_indices = torch.where((bev_depth >= sector_range[i]) & (bev_depth < sector_range[i + 1]))[0]\n                sector_logits = logits[sector_indices].mean(0)\n                sector_logits = torch.nan_to_num(sector_logits, 0.)\n                logits_list.append(sector_logits)\n            logits = torch.cat(logits_list)  # dim: 768\n\n        else:\n            raise NotImplementedError\n\n        output_list.append(logits.detach().cpu().numpy())\n    return output_list\n\n\ndef compute_logits(data_type, modality, *args):\n    assert data_type in ['32', '64']\n    assert modality in ['range', 'voxel', 'point_voxel']\n    is_voxel = 'voxel' in modality\n    dataset_name = TYPE2DATASET[data_type]\n    dataset_config = DATASET_CONFIG[dataset_name]\n    bs = MODAL2BATCHSIZE[modality]\n\n    model = build_model(dataset_name, MODALITY2MODEL[modality], device='cuda')\n\n    output = tuple()\n    for data in args:\n        all_logits_list = []\n        for i in range(math.ceil(len(data) / bs)):\n            batch = data[i * bs:(i + 1) * bs]\n            if is_voxel:\n                batch = [pcd2voxel(preprocess_pcd(pcd, **dataset_config)) for pcd in batch]\n                batch = sparse_collate_fn(batch)\n                batch = {k: v.cuda() if isinstance(v, (torch.Tensor, SparseTensor, PointTensor)) else v for k, v in\n                         batch.items()}\n                with torch.no_grad():\n                    batch_out = model(batch, return_final_logits=True)\n                    batch_out = batch2list(batch_out, AGG_TYPE, **dataset_config)\n                    all_logits_list.extend(batch_out)\n            else:\n                batch = [preprocess_range(pcd, **dataset_config) for pcd in batch]\n                batch = torch.from_numpy(np.stack(batch)).float().cuda()\n                with torch.no_grad():\n                    batch_out = model(batch, return_final_logits=True, agg_type=AGG_TYPE)\n                    all_logits_list.append(batch_out)\n        if is_voxel:\n            all_logits = np.stack(all_logits_list)\n        else:\n            all_logits = np.vstack(all_logits_list)\n        output += (all_logits,)\n\n    del model, batch, batch_out\n    torch.cuda.empty_cache()\n    return output\n\n\ndef compute_pairwise_cd(x, y, module=None):\n    if module is None:\n        module = chamfer_3DDist()\n    if x.ndim == 2 and y.ndim == 2:\n        x, y = x[None], y[None]\n    x, y = torch.from_numpy(x).cuda(), torch.from_numpy(y).cuda()\n    dist1, dist2, _, _ = module(x, y)\n    dist = (dist1.mean() + dist2.mean()) / 2\n    return dist.item()\n\n\ndef compute_pairwise_cd_batch(reference, samples):\n    ndim = reference.ndim\n    assert ndim in [2, 3]\n    module = chamfer_3DDist() if ndim == 3 else chamfer_2DDist()\n    len_r, len_s = reference.shape[0], [s.shape[0] for s in samples]\n    max_len = max([len_r] + len_s)\n    reference = torch.from_numpy(\n        np.vstack([reference, np.ones((max_len - reference.shape[0], ndim), dtype=np.float32) * 1e6])).cuda()\n    samples = [np.vstack([s, np.ones((max_len - s.shape[0], ndim), dtype=np.float32) * 1e6]) for s in samples]\n    samples = torch.from_numpy(np.stack(samples)).cuda()\n    reference = reference.expand_as(samples)\n    dist_r, dist_s, _, _ = module(reference, samples)\n\n    results = []\n    for i in range(samples.shape[0]):\n        dist1, dist2, len1, len2 = dist_r[i], dist_s[i], len_r, len_s[i]\n        dist = (dist1[:len1].mean() + dist2[:len2].mean()) / 2.\n   
     results.append(dist.item())\n    return results\n\n\ndef compute_pairwise_emd(x, y, module=None):\n    if module is None:\n        module = emdModule()\n    n_points = min(x.shape[0], y.shape[0])\n    n_points = n_points - n_points % 1024\n    x, y = x[:n_points], y[:n_points]\n    if x.ndim == 2 and y.ndim == 2:\n        x, y = x[None], y[None]\n    x, y = torch.from_numpy(x).cuda(), torch.from_numpy(y).cuda()\n    dist, _ = module(x, y, 0.005, 50)\n    dist = torch.sqrt(dist).mean()\n    return dist.item()\n"
  },
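  {
    "path": "lidm/eval/examples/range_projection_example.py",
    "content": "\"\"\"\nIllustrative usage sketch (not part of the original codebase): demonstrates how the\nnumpy-only helpers pcd2range and range2xyz from lidm.eval.metric_utils could be used to\nproject a LiDAR point cloud into a range image and back. The sensor settings below\n(64x1024 image, +3/-25 degree FOV, 1-56 m depth range) are assumed example values,\nnot authoritative dataset configuration.\n\"\"\"\nimport numpy as np\n\nfrom lidm.eval.metric_utils import pcd2range, range2xyz\n\n# assumed example sensor configuration (hypothetical values)\nSIZE = (64, 1024)          # (H, W) of the range image\nFOV = (3.0, -25.0)         # (fov_up, fov_down) in degrees\nDEPTH_RANGE = (1.0, 56.0)  # keep points with 1 m < depth < 56 m\n\n# synthetic point cloud; in practice this would be a real LiDAR scan of shape (N, 3)\npcd = np.random.uniform(-40.0, 40.0, size=(10000, 3)).astype(np.float32)\n\n# project to an (H, W) image of per-pixel depth; empty pixels are filled with -1\ndepth_img, _ = pcd2range(pcd, SIZE, FOV, DEPTH_RANGE)\n\n# back-project to per-pixel xyz; log_scale=False means depth_img holds raw meters\nxyz_img = range2xyz(depth_img, FOV, DEPTH_RANGE, depth_scale=1.0, log_scale=False)\n\nprint(depth_img.shape, xyz_img.shape)  # (64, 1024) (3, 64, 1024)\n"
  },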
  {
    "path": "lidm/eval/models/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/eval/models/minkowskinet/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/eval/models/minkowskinet/model.py",
    "content": "import torch\nimport torch.nn as nn\n\ntry:\n    import torchsparse\n    import torchsparse.nn as spnn\n    from ..ts import basic_blocks\nexcept ImportError:\n    raise Exception('Required ts lib. Reference: https://github.com/mit-han-lab/torchsparse/tree/v1.4.0')\n\n\nclass Model(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n\n        cr = config.model_params.cr\n        cs = config.model_params.layer_num\n        cs = [int(cr * x) for x in cs]\n\n        self.pres = self.vres = config.model_params.voxel_size\n        self.num_classes = config.model_params.num_class\n\n        self.stem = nn.Sequential(\n            spnn.Conv3d(config.model_params.input_dims, cs[0], kernel_size=3, stride=1),\n            spnn.BatchNorm(cs[0]), spnn.ReLU(True),\n            spnn.Conv3d(cs[0], cs[0], kernel_size=3, stride=1),\n            spnn.BatchNorm(cs[0]), spnn.ReLU(True))\n\n        self.stage1 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[0], cs[0], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[0], cs[1], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[1], cs[1], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage2 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[1], cs[1], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[1], cs[2], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[2], cs[2], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage3 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[2], cs[2], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[2], cs[3], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[3], cs[3], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage4 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[3], cs[3], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[3], cs[4], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[4], cs[4], ks=3, stride=1, dilation=1),\n        )\n\n        self.up1 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[4], cs[5], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[5] + cs[3], cs[5], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[5], cs[5], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up2 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[5], cs[6], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[6] + cs[2], cs[6], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[6], cs[6], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up3 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[6], cs[7], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[7] + cs[1], cs[7], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[7], cs[7], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up4 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[7], cs[8], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[8] + 
cs[0], cs[8], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[8], cs[8], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.classifier = nn.Sequential(nn.Linear(cs[8], self.num_classes))\n\n        self.weight_initialization()\n        self.dropout = nn.Dropout(0.3, True)\n\n    def weight_initialization(self):\n        for m in self.modules():\n            if isinstance(m, nn.BatchNorm1d):\n                nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n\n    def forward(self, data_dict, return_logits=False, return_final_logits=False):\n        x = data_dict['lidar']\n        x.C = x.C.int()\n\n        x0 = self.stem(x)\n        x1 = self.stage1(x0)\n        x2 = self.stage2(x1)\n        x3 = self.stage3(x2)\n        x4 = self.stage4(x3)\n\n        if return_logits:\n            output_dict = dict()\n            output_dict['logits'] = x4.F\n            output_dict['batch_indices'] = x4.C[:, -1]\n            return output_dict\n\n        y1 = self.up1[0](x4)\n        y1 = torchsparse.cat([y1, x3])\n        y1 = self.up1[1](y1)\n\n        y2 = self.up2[0](y1)\n        y2 = torchsparse.cat([y2, x2])\n        y2 = self.up2[1](y2)\n\n        y3 = self.up3[0](y2)\n        y3 = torchsparse.cat([y3, x1])\n        y3 = self.up3[1](y3)\n\n        y4 = self.up4[0](y3)\n        y4 = torchsparse.cat([y4, x0])\n        y4 = self.up4[1](y4)\n        if return_final_logits:\n            output_dict = dict()\n            output_dict['logits'] = y4.F\n            output_dict['coords'] = y4.C[:, :3]\n            output_dict['batch_indices'] = y4.C[:, -1]\n            return output_dict\n\n        # classifier outputs a dense tensor, so store it directly (no sparse .F attribute)\n        output = self.classifier(y4.F)\n        data_dict['output'] = output\n\n        return data_dict\n"
  },
  {
    "path": "lidm/eval/models/rangenet/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/eval/models/rangenet/model.py",
    "content": "#!/usr/bin/env python3\n# This file is covered by the LICENSE file in the root of this project.\nfrom collections import OrderedDict\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass BasicBlock(nn.Module):\n    def __init__(self, inplanes, planes, bn_d=0.1):\n        super(BasicBlock, self).__init__()\n        self.conv1 = nn.Conv2d(inplanes, planes[0], kernel_size=1,\n                               stride=1, padding=0, bias=False)\n        self.bn1 = nn.BatchNorm2d(planes[0], momentum=bn_d)\n        self.relu1 = nn.LeakyReLU(0.1)\n        self.conv2 = nn.Conv2d(planes[0], planes[1], kernel_size=3,\n                               stride=1, padding=1, bias=False)\n        self.bn2 = nn.BatchNorm2d(planes[1], momentum=bn_d)\n        self.relu2 = nn.LeakyReLU(0.1)\n\n    def forward(self, x):\n        residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu1(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu2(out)\n\n        out += residual\n        return out\n\n\n# ******************************************************************************\n\n# number of layers per model\nmodel_blocks = {\n    21: [1, 1, 2, 2, 1],\n    53: [1, 2, 8, 8, 4],\n}\n\n\nclass Backbone(nn.Module):\n    \"\"\"\n       Class for DarknetSeg. Subclasses PyTorch's own \"nn\" module\n    \"\"\"\n\n    def __init__(self, params):\n        super(Backbone, self).__init__()\n        self.use_range = params[\"input_depth\"][\"range\"]\n        self.use_xyz = params[\"input_depth\"][\"xyz\"]\n        self.use_remission = params[\"input_depth\"][\"remission\"]\n        self.drop_prob = params[\"dropout\"]\n        self.bn_d = params[\"bn_d\"]\n        self.OS = params[\"OS\"]\n        self.layers = params[\"extra\"][\"layers\"]\n\n        # input depth calc\n        self.input_depth = 0\n        self.input_idxs = []\n        if self.use_range:\n            self.input_depth += 1\n            self.input_idxs.append(0)\n        if self.use_xyz:\n            self.input_depth += 3\n            self.input_idxs.extend([1, 2, 3])\n        if self.use_remission:\n            self.input_depth += 1\n            self.input_idxs.append(4)\n\n        # stride play\n        self.strides = [2, 2, 2, 2, 2]\n        # check current stride\n        current_os = 1\n        for s in self.strides:\n            current_os *= s\n\n        # make the new stride\n        if self.OS > current_os:\n            print(\"Can't do OS, \", self.OS,\n                  \" because it is bigger than original \", current_os)\n        else:\n            # redo strides according to needed stride\n            for i, stride in enumerate(reversed(self.strides), 0):\n                if int(current_os) != self.OS:\n                    if stride == 2:\n                        current_os /= 2\n                        self.strides[-1 - i] = 1\n                    if int(current_os) == self.OS:\n                        break\n\n        # check that darknet exists\n        assert self.layers in model_blocks.keys()\n\n        # generate layers depending on darknet type\n        self.blocks = model_blocks[self.layers]\n\n        # input layer\n        self.conv1 = nn.Conv2d(self.input_depth, 32, kernel_size=3,\n                               stride=1, padding=1, bias=False)\n        self.bn1 = nn.BatchNorm2d(32, momentum=self.bn_d)\n        self.relu1 = nn.LeakyReLU(0.1)\n\n        # encoder\n        self.enc1 = self._make_enc_layer(BasicBlock, 
[32, 64], self.blocks[0],\n                                         stride=self.strides[0], bn_d=self.bn_d)\n        self.enc2 = self._make_enc_layer(BasicBlock, [64, 128], self.blocks[1],\n                                         stride=self.strides[1], bn_d=self.bn_d)\n        self.enc3 = self._make_enc_layer(BasicBlock, [128, 256], self.blocks[2],\n                                         stride=self.strides[2], bn_d=self.bn_d)\n        self.enc4 = self._make_enc_layer(BasicBlock, [256, 512], self.blocks[3],\n                                         stride=self.strides[3], bn_d=self.bn_d)\n        self.enc5 = self._make_enc_layer(BasicBlock, [512, 1024], self.blocks[4],\n                                         stride=self.strides[4], bn_d=self.bn_d)\n\n        # for a bit of fun\n        self.dropout = nn.Dropout2d(self.drop_prob)\n\n        # last channels\n        self.last_channels = 1024\n\n    # make layer useful function\n    def _make_enc_layer(self, block, planes, blocks, stride, bn_d=0.1):\n        layers = []\n\n        #  downsample\n        layers.append((\"conv\", nn.Conv2d(planes[0], planes[1],\n                                         kernel_size=3,\n                                         stride=[1, stride], dilation=1,\n                                         padding=1, bias=False)))\n        layers.append((\"bn\", nn.BatchNorm2d(planes[1], momentum=bn_d)))\n        layers.append((\"relu\", nn.LeakyReLU(0.1)))\n\n        #  blocks\n        inplanes = planes[1]\n        for i in range(0, blocks):\n            layers.append((\"residual_{}\".format(i),\n                           block(inplanes, planes, bn_d)))\n\n        return nn.Sequential(OrderedDict(layers))\n\n    def run_layer(self, x, layer, skips, os):\n        y = layer(x)\n        if y.shape[2] < x.shape[2] or y.shape[3] < x.shape[3]:\n            skips[os] = x.detach()\n            os *= 2\n        x = y\n        return x, skips, os\n\n    def forward(self, x, return_logits=False, return_list=None):\n        # filter input\n        x = x[:, self.input_idxs]\n\n        # run cnn\n        # store for skip connections\n        skips = {}\n        out_dict = {}\n        os = 1\n\n        # first layer\n        x, skips, os = self.run_layer(x, self.conv1, skips, os)\n        x, skips, os = self.run_layer(x, self.bn1, skips, os)\n        x, skips, os = self.run_layer(x, self.relu1, skips, os)\n        if return_list and 'enc_0' in return_list:\n            out_dict['enc_0'] = x.detach().cpu()  # 32, 64, 1024\n\n        # all encoder blocks with intermediate dropouts\n        x, skips, os = self.run_layer(x, self.enc1, skips, os)\n        if return_list and 'enc_1' in return_list:\n            out_dict['enc_1'] = x.detach().cpu()  # 64, 64, 512\n        x, skips, os = self.run_layer(x, self.dropout, skips, os)\n\n        x, skips, os = self.run_layer(x, self.enc2, skips, os)\n        if return_list and 'enc_2' in return_list:\n            out_dict['enc_2'] = x.detach().cpu()  # 128, 64, 256\n        x, skips, os = self.run_layer(x, self.dropout, skips, os)\n\n        x, skips, os = self.run_layer(x, self.enc3, skips, os)\n        if return_list and 'enc_3' in return_list:\n            out_dict['enc_3'] = x.detach().cpu()  # 256, 64, 128\n        x, skips, os = self.run_layer(x, self.dropout, skips, os)\n\n        x, skips, os = self.run_layer(x, self.enc4, skips, os)\n        if return_list and 'enc_4' in return_list:\n            out_dict['enc_4'] = x.detach().cpu()  # 512, 64, 64\n        x, skips, os = 
self.run_layer(x, self.dropout, skips, os)\n\n        x, skips, os = self.run_layer(x, self.enc5, skips, os)\n        if return_list and 'enc_5' in return_list:\n            out_dict['enc_5'] = x.detach().cpu()  # 1024, 64, 32\n        if return_logits:\n            return x\n\n        x, skips, os = self.run_layer(x, self.dropout, skips, os)\n\n        if return_list is not None:\n            return x, skips, out_dict\n        return x, skips\n\n    def get_last_depth(self):\n        return self.last_channels\n\n    def get_input_depth(self):\n        return self.input_depth\n\n\nclass Decoder(nn.Module):\n    \"\"\"\n       Class for DarknetSeg. Subclasses PyTorch's own \"nn\" module\n    \"\"\"\n\n    def __init__(self, params, OS=32, feature_depth=1024):\n        super(Decoder, self).__init__()\n        self.backbone_OS = OS\n        self.backbone_feature_depth = feature_depth\n        self.drop_prob = params[\"dropout\"]\n        self.bn_d = params[\"bn_d\"]\n        self.index = 0\n\n        # stride play\n        self.strides = [2, 2, 2, 2, 2]\n        # check current stride\n        current_os = 1\n        for s in self.strides:\n            current_os *= s\n        # redo strides according to needed stride\n        for i, stride in enumerate(self.strides):\n            if int(current_os) != self.backbone_OS:\n                if stride == 2:\n                    current_os /= 2\n                    self.strides[i] = 1\n                if int(current_os) == self.backbone_OS:\n                    break\n\n        # decoder\n        self.dec5 = self._make_dec_layer(BasicBlock,\n                                         [self.backbone_feature_depth, 512],\n                                         bn_d=self.bn_d,\n                                         stride=self.strides[0])\n        self.dec4 = self._make_dec_layer(BasicBlock, [512, 256], bn_d=self.bn_d,\n                                         stride=self.strides[1])\n        self.dec3 = self._make_dec_layer(BasicBlock, [256, 128], bn_d=self.bn_d,\n                                         stride=self.strides[2])\n        self.dec2 = self._make_dec_layer(BasicBlock, [128, 64], bn_d=self.bn_d,\n                                         stride=self.strides[3])\n        self.dec1 = self._make_dec_layer(BasicBlock, [64, 32], bn_d=self.bn_d,\n                                         stride=self.strides[4])\n\n        # layer list to execute with skips\n        self.layers = [self.dec5, self.dec4, self.dec3, self.dec2, self.dec1]\n\n        # for a bit of fun\n        self.dropout = nn.Dropout2d(self.drop_prob)\n\n        # last channels\n        self.last_channels = 32\n\n    def _make_dec_layer(self, block, planes, bn_d=0.1, stride=2):\n        layers = []\n\n        #  downsample\n        if stride == 2:\n            layers.append((\"upconv\", nn.ConvTranspose2d(planes[0], planes[1],\n                                                        kernel_size=[1, 4], stride=[1, 2],\n                                                        padding=[0, 1])))\n        else:\n            layers.append((\"conv\", nn.Conv2d(planes[0], planes[1],\n                                             kernel_size=3, padding=1)))\n        layers.append((\"bn\", nn.BatchNorm2d(planes[1], momentum=bn_d)))\n        layers.append((\"relu\", nn.LeakyReLU(0.1)))\n\n        #  blocks\n        layers.append((\"residual\", block(planes[1], planes, bn_d)))\n\n        return nn.Sequential(OrderedDict(layers))\n\n    def run_layer(self, x, layer, skips, os):\n        feats = 
layer(x)  # up\n        if feats.shape[-1] > x.shape[-1]:\n            os //= 2  # match skip\n            feats = feats + skips[os].detach()  # add skip\n        x = feats\n        return x, skips, os\n\n    def forward(self, x, skips, return_logits=False, return_list=None):\n        os = self.backbone_OS\n        out_dict = {}\n\n        # run layers\n        x, skips, os = self.run_layer(x, self.dec5, skips, os)\n        if return_list and 'dec_4' in return_list:\n            out_dict['dec_4'] = x.detach().cpu()  # 512, 64, 64\n        x, skips, os = self.run_layer(x, self.dec4, skips, os)\n        if return_list and 'dec_3' in return_list:\n            out_dict['dec_3'] = x.detach().cpu()  # 256, 64, 128\n        x, skips, os = self.run_layer(x, self.dec3, skips, os)\n        if return_list and 'dec_2' in return_list:\n            out_dict['dec_2'] = x.detach().cpu()  # 128, 64, 256\n        x, skips, os = self.run_layer(x, self.dec2, skips, os)\n        if return_list and 'dec_1' in return_list:\n            out_dict['dec_1'] = x.detach().cpu()  # 64, 64, 512\n        x, skips, os = self.run_layer(x, self.dec1, skips, os)\n        if return_list and 'dec_0' in return_list:\n            out_dict['dec_0'] = x.detach().cpu()  # 32, 64, 1024\n\n        logits = torch.clone(x).detach()\n        x = self.dropout(x)\n\n        if return_logits:\n            return x, logits\n        if return_list is not None:\n            return out_dict\n        return x\n\n    def get_last_depth(self):\n        return self.last_channels\n\n\nclass Model(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        self.config = config\n        self.backbone = Backbone(params=self.config[\"backbone\"])\n        self.decoder = Decoder(params=self.config[\"decoder\"], OS=self.config[\"backbone\"][\"OS\"],\n                               feature_depth=self.backbone.get_last_depth())\n\n    def load_pretrained_weights(self, path):\n        w_dict = torch.load(path + \"/backbone\",\n                            map_location=lambda storage, loc: storage)\n        self.backbone.load_state_dict(w_dict, strict=True)\n        w_dict = torch.load(path + \"/segmentation_decoder\",\n                            map_location=lambda storage, loc: storage)\n        self.decoder.load_state_dict(w_dict, strict=True)\n\n    def forward(self, x, return_logits=False, return_final_logits=False, return_list=None, agg_type='depth'):\n        if return_logits:\n            logits = self.backbone(x, return_logits)\n            logits = F.adaptive_avg_pool2d(logits, (1, 1)).squeeze()\n            logits = torch.clone(logits).detach().cpu().numpy()\n            return logits\n        elif return_list is not None:\n            x, skips, enc_dict = self.backbone(x, return_list=return_list)\n            dec_dict = self.decoder(x, skips, return_list=return_list)\n            out_dict = {**enc_dict, **dec_dict}\n            return out_dict\n        elif return_final_logits:\n            assert agg_type in ['all', 'sector', 'depth']\n            y, skips = self.backbone(x)\n            y, logits = self.decoder(y, skips, True)\n\n            B, C, H, W = logits.shape\n            N = 16\n\n            # avg all\n            if agg_type == 'all':\n                logits = logits.mean([2, 3])\n            # avg in patch\n            elif agg_type == 'sector':\n                logits = logits.view(B, C, H, N, W // N).mean([2, 4]).reshape(B, -1)\n            # avg in row\n            elif agg_type == 'depth':\n                
logits = logits.view(B, C, N, H // N, W).mean([3, 4]).reshape(B, -1)\n\n            logits = torch.clone(logits).detach().cpu().numpy()\n            return logits\n        else:\n            y, skips = self.backbone(x)\n            y = self.decoder(y, skips, False)\n            return y\n"
  },
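  {
    "path": "lidm/eval/examples/rangenet_descriptor_example.py",
    "content": "\"\"\"\nIllustrative usage sketch (not part of the original codebase): builds the RangeNet-style\nModel from lidm/eval/models/rangenet/model.py with an assumed configuration and extracts\nthe fixed-size descriptor used during evaluation. All values in the config below are\nhypothetical examples; only the dictionary keys mirror what Backbone and Decoder read.\n\"\"\"\nimport torch\n\nfrom lidm.eval.models.rangenet.model import Model\n\n# hypothetical config; keys follow the model code, values are assumed\nconfig = {\n    \"backbone\": {\n        \"input_depth\": {\"range\": True, \"xyz\": True, \"remission\": False},\n        \"dropout\": 0.0,\n        \"bn_d\": 0.1,\n        \"OS\": 32,\n        \"extra\": {\"layers\": 21},\n    },\n    \"decoder\": {\"dropout\": 0.0, \"bn_d\": 0.1},\n}\n\nmodel = Model(config).eval()\n# in real evaluation, pretrained weights would be loaded via model.load_pretrained_weights(path)\n\n# one fake range image with 4 channels (depth + xyz), 64 beams x 1024 columns\nx = torch.randn(1, 4, 64, 1024)\n\nwith torch.no_grad():\n    # 'depth' aggregation averages decoder features over 16 row groups -> (B, 32 * 16)\n    feat = model(x, return_final_logits=True, agg_type='depth')\n\nprint(feat.shape)  # (1, 512), returned as a numpy array\n"
  },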
  {
    "path": "lidm/eval/models/spvcnn/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/eval/models/spvcnn/model.py",
    "content": "import torch.nn as nn\n\ntry:\n    import torchsparse\n    import torchsparse.nn as spnn\n    from torchsparse import PointTensor\n    from ..ts.utils import initial_voxelize, point_to_voxel, voxel_to_point\n    from ..ts import basic_blocks\nexcept ImportError:\n    raise Exception('Required torchsparse lib. Reference: https://github.com/mit-han-lab/torchsparse/tree/v1.4.0')\n\n\nclass Model(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        cr = config.model_params.cr\n        cs = config.model_params.layer_num\n        cs = [int(cr * x) for x in cs]\n\n        self.pres = self.vres = config.model_params.voxel_size\n        self.num_classes = config.model_params.num_class\n\n        self.stem = nn.Sequential(\n            spnn.Conv3d(config.model_params.input_dims, cs[0], kernel_size=3, stride=1),\n            spnn.BatchNorm(cs[0]), spnn.ReLU(True),\n            spnn.Conv3d(cs[0], cs[0], kernel_size=3, stride=1),\n            spnn.BatchNorm(cs[0]), spnn.ReLU(True))\n\n        self.stage1 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[0], cs[0], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[0], cs[1], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[1], cs[1], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage2 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[1], cs[1], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[1], cs[2], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[2], cs[2], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage3 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[2], cs[2], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[2], cs[3], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[3], cs[3], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage4 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[3], cs[3], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[3], cs[4], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[4], cs[4], ks=3, stride=1, dilation=1),\n        )\n\n        self.up1 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[4], cs[5], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[5] + cs[3], cs[5], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[5], cs[5], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up2 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[5], cs[6], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[6] + cs[2], cs[6], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[6], cs[6], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up3 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[6], cs[7], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[7] + cs[1], cs[7], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[7], cs[7], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up4 = nn.ModuleList([\n            
basic_blocks.BasicDeconvolutionBlock(cs[7], cs[8], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[8] + cs[0], cs[8], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[8], cs[8], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.classifier = nn.Sequential(nn.Linear(cs[8], self.num_classes))\n\n        self.point_transforms = nn.ModuleList([\n            nn.Sequential(\n                nn.Linear(cs[0], cs[4]),\n                nn.BatchNorm1d(cs[4]),\n                nn.ReLU(True),\n            ),\n            nn.Sequential(\n                nn.Linear(cs[4], cs[6]),\n                nn.BatchNorm1d(cs[6]),\n                nn.ReLU(True),\n            ),\n            nn.Sequential(\n                nn.Linear(cs[6], cs[8]),\n                nn.BatchNorm1d(cs[8]),\n                nn.ReLU(True),\n            )\n        ])\n\n        self.weight_initialization()\n        self.dropout = nn.Dropout(0.3, True)\n\n    def weight_initialization(self):\n        for m in self.modules():\n            if isinstance(m, nn.BatchNorm1d):\n                nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n\n    def forward(self, data_dict, return_logits=False, return_final_logits=False):\n        x = data_dict['lidar']\n\n        # x: SparseTensor z: PointTensor\n        z = PointTensor(x.F, x.C.float())\n\n        x0 = initial_voxelize(z, self.pres, self.vres)\n\n        x0 = self.stem(x0)\n        z0 = voxel_to_point(x0, z, nearest=False)\n        z0.F = z0.F\n\n        x1 = point_to_voxel(x0, z0)\n        x1 = self.stage1(x1)\n        x2 = self.stage2(x1)\n        x3 = self.stage3(x2)\n        x4 = self.stage4(x3)\n        z1 = voxel_to_point(x4, z0)\n        z1.F = z1.F + self.point_transforms[0](z0.F)\n\n        y1 = point_to_voxel(x4, z1)\n\n        if return_logits:\n            output_dict = dict()\n            output_dict['logits'] = y1.F\n            output_dict['batch_indices'] = y1.C[:, -1]\n            return output_dict\n\n        y1.F = self.dropout(y1.F)\n        y1 = self.up1[0](y1)\n        y1 = torchsparse.cat([y1, x3])\n        y1 = self.up1[1](y1)\n\n        y2 = self.up2[0](y1)\n        y2 = torchsparse.cat([y2, x2])\n        y2 = self.up2[1](y2)\n        z2 = voxel_to_point(y2, z1)\n        z2.F = z2.F + self.point_transforms[1](z1.F)\n\n        y3 = point_to_voxel(y2, z2)\n        y3.F = self.dropout(y3.F)\n        y3 = self.up3[0](y3)\n        y3 = torchsparse.cat([y3, x1])\n        y3 = self.up3[1](y3)\n\n        y4 = self.up4[0](y3)\n        y4 = torchsparse.cat([y4, x0])\n        y4 = self.up4[1](y4)\n        z3 = voxel_to_point(y4, z2)\n        z3.F = z3.F + self.point_transforms[2](z2.F)\n\n        if return_final_logits:\n            output_dict = dict()\n            output_dict['logits'] = z3.F\n            output_dict['coords'] = z3.C[:, :3]\n            output_dict['batch_indices'] = z3.C[:, -1].long()\n            return output_dict\n\n        # output = self.classifier(z3.F)\n        data_dict['logits'] = z3.F\n\n        return data_dict\n"
  },
  {
    "path": "lidm/eval/models/ts/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/eval/models/ts/basic_blocks.py",
    "content": "#!/usr/bin/env python\n# encoding: utf-8\n'''\n@author: Xu Yan\n@file: basic_blocks.py\n@time: 2021/4/14 22:53\n'''\nimport torch.nn as nn\n\ntry:\n    import torchsparse.nn as spnn\nexcept:\n    print('To install torchsparse 1.4.0, please refer to https://github.com/mit-han-lab/torchsparse/tree/74099d10a51c71c14318bce63d6421f698b24f24')\n\n\nclass BasicConvolutionBlock(nn.Module):\n    def __init__(self, inc, outc, ks=3, stride=1, dilation=1):\n        super().__init__()\n        self.net = nn.Sequential(\n            spnn.Conv3d(\n                inc,\n                outc,\n                kernel_size=ks,\n                dilation=dilation,\n                stride=stride), spnn.BatchNorm(outc),\n            spnn.ReLU(True))\n\n    def forward(self, x):\n        out = self.net(x)\n        return out\n\n\nclass BasicDeconvolutionBlock(nn.Module):\n    def __init__(self, inc, outc, ks=3, stride=1):\n        super().__init__()\n        self.net = nn.Sequential(\n            spnn.Conv3d(\n                inc,\n                outc,\n                kernel_size=ks,\n                stride=stride,\n                transposed=True),\n            spnn.BatchNorm(outc),\n            spnn.ReLU(True))\n\n    def forward(self, x):\n        return self.net(x)\n\n\nclass ResidualBlock(nn.Module):\n    def __init__(self, inc, outc, ks=3, stride=1, dilation=1):\n        super().__init__()\n        self.net = nn.Sequential(\n            spnn.Conv3d(\n                inc,\n                outc,\n                kernel_size=ks,\n                dilation=dilation,\n                stride=stride), spnn.BatchNorm(outc),\n            spnn.ReLU(True),\n            spnn.Conv3d(\n                outc,\n                outc,\n                kernel_size=ks,\n                dilation=dilation,\n                stride=1),\n            spnn.BatchNorm(outc))\n\n        self.downsample = nn.Sequential() if (inc == outc and stride == 1) else \\\n            nn.Sequential(\n                spnn.Conv3d(inc, outc, kernel_size=1, dilation=1, stride=stride),\n                spnn.BatchNorm(outc)\n            )\n\n        self.ReLU = spnn.ReLU(True)\n\n    def forward(self, x):\n        out = self.ReLU(self.net(x) + self.downsample(x))\n        return out\n"
  },
  {
    "path": "lidm/eval/models/ts/utils.py",
    "content": "import torch\n\ntry:\n    import torchsparse.nn.functional as F\n    from torchsparse import PointTensor, SparseTensor\n    from torchsparse.nn.utils import get_kernel_offsets\nexcept:\n    print('To install torchsparse 1.4.0, please refer to https://github.com/mit-han-lab/torchsparse/tree/74099d10a51c71c14318bce63d6421f698b24f24')\n\n__all__ = ['initial_voxelize', 'point_to_voxel', 'voxel_to_point']\n\n\n# z: PointTensor\n# return: SparseTensor\ndef initial_voxelize(z, init_res, after_res):\n    new_float_coord = torch.cat([(z.C[:, :3] * init_res) / after_res, z.C[:, -1].view(-1, 1)], 1)\n\n    pc_hash = F.sphash(torch.floor(new_float_coord).int())\n    sparse_hash = torch.unique(pc_hash)\n    idx_query = F.sphashquery(pc_hash, sparse_hash)\n    counts = F.spcount(idx_query.int(), len(sparse_hash))\n\n    inserted_coords = F.spvoxelize(torch.floor(new_float_coord), idx_query, counts)\n    inserted_coords = torch.round(inserted_coords).int()\n    inserted_feat = F.spvoxelize(z.F, idx_query, counts)\n\n    new_tensor = SparseTensor(inserted_feat, inserted_coords, 1)\n    new_tensor.cmaps.setdefault(new_tensor.stride, new_tensor.coords)\n    z.additional_features['idx_query'][1] = idx_query\n    z.additional_features['counts'][1] = counts\n    z.C = new_float_coord\n\n    return new_tensor\n\n\n# x: SparseTensor, z: PointTensor\n# return: SparseTensor\ndef point_to_voxel(x, z):\n    if z.additional_features is None or \\\n            z.additional_features.get('idx_query') is None or \\\n            z.additional_features['idx_query'].get(x.s) is None:\n        pc_hash = F.sphash(\n            torch.cat([torch.floor(z.C[:, :3] / x.s[0]).int() * x.s[0], z.C[:, -1].int().view(-1, 1)], 1))\n        sparse_hash = F.sphash(x.C)\n        idx_query = F.sphashquery(pc_hash, sparse_hash)\n        counts = F.spcount(idx_query.int(), x.C.shape[0])\n        z.additional_features['idx_query'][x.s] = idx_query\n        z.additional_features['counts'][x.s] = counts\n    else:\n        idx_query = z.additional_features['idx_query'][x.s]\n        counts = z.additional_features['counts'][x.s]\n\n    inserted_feat = F.spvoxelize(z.F, idx_query, counts)\n    new_tensor = SparseTensor(inserted_feat, x.C, x.s)\n    new_tensor.cmaps = x.cmaps\n    new_tensor.kmaps = x.kmaps\n\n    return new_tensor\n\n\n# x: SparseTensor, z: PointTensor\n# return: PointTensor\ndef voxel_to_point(x, z, nearest=False):\n    if z.idx_query is None or z.weights is None or z.idx_query.get(x.s) is None or z.weights.get(x.s) is None:\n        off = get_kernel_offsets(2, x.s, 1, device=z.F.device)\n        old_hash = F.sphash(\n            torch.cat([\n                torch.floor(z.C[:, :3] / x.s[0]).int() * x.s[0],\n                z.C[:, -1].int().view(-1, 1)], 1), off)\n        pc_hash = F.sphash(x.C.to(z.F.device))\n        idx_query = F.sphashquery(old_hash, pc_hash)\n        weights = F.calc_ti_weights(z.C, idx_query, scale=x.s[0]).transpose(0, 1).contiguous()\n        idx_query = idx_query.transpose(0, 1).contiguous()\n        if nearest:\n            weights[:, 1:] = 0.\n            idx_query[:, 1:] = -1\n        new_feat = F.spdevoxelize(x.F, idx_query, weights)\n        new_tensor = PointTensor(new_feat, z.C, idx_query=z.idx_query, weights=z.weights)\n        new_tensor.additional_features = z.additional_features\n        new_tensor.idx_query[x.s] = idx_query\n        new_tensor.weights[x.s] = weights\n        z.idx_query[x.s] = idx_query\n        z.weights[x.s] = weights\n\n    else:\n        new_feat = 
F.spdevoxelize(x.F, z.idx_query.get(x.s), z.weights.get(x.s))\n        new_tensor = PointTensor(new_feat, z.C, idx_query=z.idx_query, weights=z.weights)\n        new_tensor.additional_features = z.additional_features\n\n    return new_tensor"
  },
  {
    "path": "lidm/eval/modules/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/eval/modules/chamfer2D/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/eval/modules/chamfer2D/chamfer2D.cu",
    "content": "\n#include <stdio.h>\n#include <ATen/ATen.h>\n\n#include <cuda.h>\n#include <cuda_runtime.h>\n\n#include <vector>\n\n\n\n__global__ void NmDistanceKernel(int b,int n,const float * xyz,int m,const float * xyz2,float * result,int * result_i){\n\tconst int batch=512;\n\t__shared__ float buf[batch*2];\n\tfor (int i=blockIdx.x;i<b;i+=gridDim.x){\n\t\tfor (int k2=0;k2<m;k2+=batch){\n\t\t\tint end_k=min(m,k2+batch)-k2;\n\t\t\tfor (int j=threadIdx.x;j<end_k*2;j+=blockDim.x){\n\t\t\t\tbuf[j]=xyz2[(i*m+k2)*2+j];\n\t\t\t}\n\t\t\t__syncthreads();\n\t\t\tfor (int j=threadIdx.x+blockIdx.y*blockDim.x;j<n;j+=blockDim.x*gridDim.y){\n\t\t\t\tfloat x1=xyz[(i*n+j)*2+0];\n\t\t\t\tfloat y1=xyz[(i*n+j)*2+1];\n\t\t\t\tint best_i=0;\n\t\t\t\tfloat best=0;\n\t\t\t\tint end_ka=end_k-(end_k&2);\n\t\t\t\tif (end_ka==batch){\n\t\t\t\t\tfor (int k=0;k<batch;k+=4){\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*2+0]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*2+1]-y1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2;\n\t\t\t\t\t\t\tif (k==0 || d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*2+2]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*2+3]-y1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+1;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*2+4]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*2+5]-y1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+2;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*2+6]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*2+7]-y1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+3;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t}\n\t\t\t\t}else{\n\t\t\t\t\tfor (int k=0;k<end_ka;k+=4){\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*2+0]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*2+1]-y1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2;\n\t\t\t\t\t\t\tif (k==0 || d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*2+2]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*2+3]-y1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+1;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*2+4]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*2+5]-y1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+2;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*2+6]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*2+7]-y1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+3;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t\tfor (int k=end_ka;k<end_k;k++){\n\t\t\t\t\tfloat x2=buf[k*2+0]-x1;\n\t\t\t\t\tfloat y2=buf[k*2+1]-y1;\n\t\t\t\t\tfloat d=x2*x2+y2*y2;\n\t\t\t\t\tif (k==0 || d<best){\n\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\tbest_i=k+k2;\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t\tif (k2==0 || result[(i*n+j)]>best){\n\t\t\t\t\tresult[(i*n+j)]=best;\n\t\t\t\t\tresult_i[(i*n+j)]=best_i;\n\t\t\t\t}\n\t\t\t}\n\t\t\t__syncthreads();\n\t\t}\n\t}\n}\n// int chamfer_cuda_forward(int b,int n,const float * xyz,int m,const float * xyz2,float * result,int * result_i,float * result2,int * result2_i, cudaStream_t stream){\nint 
chamfer_cuda_forward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor dist1, at::Tensor dist2, at::Tensor idx1, at::Tensor idx2){\n\n\tconst auto batch_size = xyz1.size(0);\n\tconst auto n = xyz1.size(1); //num_points point cloud A\n\tconst auto m = xyz2.size(1); //num_points point cloud B\n\n\tNmDistanceKernel<<<dim3(32,16,1),512>>>(batch_size, n, xyz1.data<float>(), m, xyz2.data<float>(), dist1.data<float>(), idx1.data<int>());\n\tNmDistanceKernel<<<dim3(32,16,1),512>>>(batch_size, m, xyz2.data<float>(), n, xyz1.data<float>(), dist2.data<float>(), idx2.data<int>());\n\n\tcudaError_t err = cudaGetLastError();\n\t  if (err != cudaSuccess) {\n\t    printf(\"error in nnd updateOutput: %s\\n\", cudaGetErrorString(err));\n\t    //THError(\"aborting\");\n\t    return 0;\n\t  }\n\t  return 1;\n\n\n}\n__global__ void NmDistanceGradKernel(int b,int n,const float * xyz1,int m,const float * xyz2,const float * grad_dist1,const int * idx1,float * grad_xyz1,float * grad_xyz2){\n\tfor (int i=blockIdx.x;i<b;i+=gridDim.x){\n\t\tfor (int j=threadIdx.x+blockIdx.y*blockDim.x;j<n;j+=blockDim.x*gridDim.y){\n\t\t\tfloat x1=xyz1[(i*n+j)*2+0];\n\t\t\tfloat y1=xyz1[(i*n+j)*2+1];\n\t\t\tint j2=idx1[i*n+j];\n\t\t\tfloat x2=xyz2[(i*m+j2)*2+0];\n\t\t\tfloat y2=xyz2[(i*m+j2)*2+1];\n\t\t\tfloat g=grad_dist1[i*n+j]*2;\n\t\t\tatomicAdd(&(grad_xyz1[(i*n+j)*2+0]),g*(x1-x2));\n\t\t\tatomicAdd(&(grad_xyz1[(i*n+j)*2+1]),g*(y1-y2));\n\t\t\tatomicAdd(&(grad_xyz2[(i*m+j2)*2+0]),-(g*(x1-x2)));\n\t\t\tatomicAdd(&(grad_xyz2[(i*m+j2)*2+1]),-(g*(y1-y2)));\n\t\t}\n\t}\n}\n// int chamfer_cuda_backward(int b,int n,const float * xyz1,int m,const float * xyz2,const float * grad_dist1,const int * idx1,const float * grad_dist2,const int * idx2,float * grad_xyz1,float * grad_xyz2, cudaStream_t stream){\nint chamfer_cuda_backward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor gradxyz1, at::Tensor gradxyz2, at::Tensor graddist1, at::Tensor graddist2, at::Tensor idx1, at::Tensor idx2){\n\t// cudaMemset(grad_xyz1,0,b*n*3*4);\n\t// cudaMemset(grad_xyz2,0,b*m*3*4);\n\t\n\tconst auto batch_size = xyz1.size(0);\n\tconst auto n = xyz1.size(1); //num_points point cloud A\n\tconst auto m = xyz2.size(1); //num_points point cloud B\n\n\tNmDistanceGradKernel<<<dim3(1,16,1),256>>>(batch_size,n,xyz1.data<float>(),m,xyz2.data<float>(),graddist1.data<float>(),idx1.data<int>(),gradxyz1.data<float>(),gradxyz2.data<float>());\n\tNmDistanceGradKernel<<<dim3(1,16,1),256>>>(batch_size,m,xyz2.data<float>(),n,xyz1.data<float>(),graddist2.data<float>(),idx2.data<int>(),gradxyz2.data<float>(),gradxyz1.data<float>());\n\t\n\tcudaError_t err = cudaGetLastError();\n\t  if (err != cudaSuccess) {\n\t    printf(\"error in nnd get grad: %s\\n\", cudaGetErrorString(err));\n\t    //THError(\"aborting\");\n\t    return 0;\n\t  }\n\t  return 1;\n\t\n}\n\n"
  },
  {
    "path": "lidm/eval/modules/chamfer2D/chamfer_cuda.cpp",
    "content": "#include <torch/torch.h>\n#include <vector>\n\n///TMP\n//#include \"common.h\"\n/// NOT TMP\n\t\n\nint chamfer_cuda_forward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor dist1, at::Tensor dist2, at::Tensor idx1, at::Tensor idx2);\n\n\nint chamfer_cuda_backward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor gradxyz1, at::Tensor gradxyz2, at::Tensor graddist1, at::Tensor graddist2, at::Tensor idx1, at::Tensor idx2);\n\n\n\n\nint chamfer_forward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor dist1, at::Tensor dist2, at::Tensor idx1, at::Tensor idx2) {\n    return chamfer_cuda_forward(xyz1, xyz2, dist1, dist2, idx1, idx2);\n}\n\n\nint chamfer_backward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor gradxyz1, at::Tensor gradxyz2, at::Tensor graddist1, \n\t\t\t\t\t  at::Tensor graddist2, at::Tensor idx1, at::Tensor idx2) {\n\n    return chamfer_cuda_backward(xyz1, xyz2, gradxyz1, gradxyz2, graddist1, graddist2, idx1, idx2);\n}\n\n\n\nPYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {\n  m.def(\"forward\", &chamfer_forward, \"chamfer forward (CUDA)\");\n  m.def(\"backward\", &chamfer_backward, \"chamfer backward (CUDA)\");\n}"
  },
  {
    "path": "lidm/eval/modules/chamfer2D/dist_chamfer_2D.py",
    "content": "from torch import nn\nfrom torch.autograd import Function\nimport torch\nimport importlib\nimport os\n\nchamfer_found = importlib.find_loader(\"chamfer_2D\") is not None\nif not chamfer_found:\n    ## Cool trick from https://github.com/chrdiller\n    print(\"Jitting Chamfer 2D\")\n    cur_path = os.path.dirname(os.path.abspath(__file__))\n    build_path = cur_path.replace('chamfer2D', 'tmp')\n    os.makedirs(build_path, exist_ok=True)\n\n    from torch.utils.cpp_extension import load\n\n    chamfer_2D = load(name=\"chamfer_2D\",\n                      sources=[\n                          \"/\".join(os.path.abspath(__file__).split('/')[:-1] + [\"chamfer_cuda.cpp\"]),\n                          \"/\".join(os.path.abspath(__file__).split('/')[:-1] + [\"chamfer2D.cu\"]),\n                      ], build_directory=build_path)\n    print(\"Loaded JIT 2D CUDA chamfer distance\")\n\nelse:\n    import chamfer_2D\n\n    print(\"Loaded compiled 2D CUDA chamfer distance\")\n\n\n# Chamfer's distance module @thibaultgroueix\n# GPU tensors only\nclass chamfer_2DFunction(Function):\n    @staticmethod\n    def forward(ctx, xyz1, xyz2):\n        batchsize, n, dim = xyz1.size()\n        assert dim == 2, \"Wrong last dimension for the chamfer distance 's input! Check with .size()\"\n        _, m, dim = xyz2.size()\n        assert dim == 2, \"Wrong last dimension for the chamfer distance 's input! Check with .size()\"\n        device = xyz1.device\n\n        device = xyz1.device\n\n        dist1 = torch.zeros(batchsize, n)\n        dist2 = torch.zeros(batchsize, m)\n\n        idx1 = torch.zeros(batchsize, n).type(torch.IntTensor)\n        idx2 = torch.zeros(batchsize, m).type(torch.IntTensor)\n\n        dist1 = dist1.to(device)\n        dist2 = dist2.to(device)\n        idx1 = idx1.to(device)\n        idx2 = idx2.to(device)\n        torch.cuda.set_device(device)\n\n        chamfer_2D.forward(xyz1, xyz2, dist1, dist2, idx1, idx2)\n        ctx.save_for_backward(xyz1, xyz2, idx1, idx2)\n        return dist1, dist2, idx1, idx2\n\n    @staticmethod\n    def backward(ctx, graddist1, graddist2, gradidx1, gradidx2):\n        xyz1, xyz2, idx1, idx2 = ctx.saved_tensors\n        graddist1 = graddist1.contiguous()\n        graddist2 = graddist2.contiguous()\n        device = graddist1.device\n\n        gradxyz1 = torch.zeros(xyz1.size())\n        gradxyz2 = torch.zeros(xyz2.size())\n\n        gradxyz1 = gradxyz1.to(device)\n        gradxyz2 = gradxyz2.to(device)\n        chamfer_2D.backward(\n            xyz1, xyz2, gradxyz1, gradxyz2, graddist1, graddist2, idx1, idx2\n        )\n        return gradxyz1, gradxyz2\n\n\nclass chamfer_2DDist(nn.Module):\n    def __init__(self):\n        super(chamfer_2DDist, self).__init__()\n\n    def forward(self, input1, input2):\n        input1 = input1.contiguous()\n        input2 = input2.contiguous()\n        return chamfer_2DFunction.apply(input1, input2)\n"
  },
  {
    "path": "lidm/eval/modules/chamfer2D/setup.py",
    "content": "from setuptools import setup\nfrom torch.utils.cpp_extension import BuildExtension, CUDAExtension\n\nsetup(\n    name='chamfer_2D',\n    ext_modules=[\n        CUDAExtension('chamfer_2D', [\n            \"/\".join(__file__.split('/')[:-1] + ['chamfer_cuda.cpp']),\n            \"/\".join(__file__.split('/')[:-1] + ['chamfer2D.cu']),\n        ]),\n    ],\n    cmdclass={\n        'build_ext': BuildExtension\n    })"
  },
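  {
    "path": "lidm/eval/modules/chamfer2D/example_chamfer2D.py",
    "content": "# Hypothetical usage sketch (not part of the original repository).\n# It illustrates how chamfer_2DDist from dist_chamfer_2D.py might be called;\n# the batch size, point counts and random inputs below are illustrative\n# assumptions, not values used by the project.\nimport torch\n\nfrom lidm.eval.modules.chamfer2D.dist_chamfer_2D import chamfer_2DDist\n\nif __name__ == '__main__':\n    # Inputs are [batch, num_points, 2] CUDA tensors (the extension is GPU-only).\n    pts_a = torch.rand(4, 2048, 2, device='cuda')\n    pts_b = torch.rand(4, 1024, 2, device='cuda')\n\n    chamfer = chamfer_2DDist()\n    dist_a, dist_b, idx_a, idx_b = chamfer(pts_a, pts_b)\n\n    # dist_a / dist_b hold squared nearest-neighbour distances in each direction;\n    # summing their means gives a symmetric Chamfer distance.\n    cd = dist_a.mean() + dist_b.mean()\n    print('Chamfer distance (2D):', cd.item())\n"
  },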
  {
    "path": "lidm/eval/modules/chamfer3D/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/eval/modules/chamfer3D/chamfer3D.cu",
    "content": "\n#include <stdio.h>\n#include <ATen/ATen.h>\n\n#include <cuda.h>\n#include <cuda_runtime.h>\n\n#include <vector>\n\n\n\n__global__ void NmDistanceKernel(int b,int n,const float * xyz,int m,const float * xyz2,float * result,int * result_i){\n\tconst int batch=512;\n\t__shared__ float buf[batch*3];\n\tfor (int i=blockIdx.x;i<b;i+=gridDim.x){\n\t\tfor (int k2=0;k2<m;k2+=batch){\n\t\t\tint end_k=min(m,k2+batch)-k2;\n\t\t\tfor (int j=threadIdx.x;j<end_k*3;j+=blockDim.x){\n\t\t\t\tbuf[j]=xyz2[(i*m+k2)*3+j];\n\t\t\t}\n\t\t\t__syncthreads();\n\t\t\tfor (int j=threadIdx.x+blockIdx.y*blockDim.x;j<n;j+=blockDim.x*gridDim.y){\n\t\t\t\tfloat x1=xyz[(i*n+j)*3+0];\n\t\t\t\tfloat y1=xyz[(i*n+j)*3+1];\n\t\t\t\tfloat z1=xyz[(i*n+j)*3+2];\n\t\t\t\tint best_i=0;\n\t\t\t\tfloat best=0;\n\t\t\t\tint end_ka=end_k-(end_k&3);\n\t\t\t\tif (end_ka==batch){\n\t\t\t\t\tfor (int k=0;k<batch;k+=4){\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*3+0]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*3+1]-y1;\n\t\t\t\t\t\t\tfloat z2=buf[k*3+2]-z1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2+z2*z2;\n\t\t\t\t\t\t\tif (k==0 || d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*3+3]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*3+4]-y1;\n\t\t\t\t\t\t\tfloat z2=buf[k*3+5]-z1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2+z2*z2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+1;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*3+6]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*3+7]-y1;\n\t\t\t\t\t\t\tfloat z2=buf[k*3+8]-z1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2+z2*z2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+2;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*3+9]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*3+10]-y1;\n\t\t\t\t\t\t\tfloat z2=buf[k*3+11]-z1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2+z2*z2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+3;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t}\n\t\t\t\t}else{\n\t\t\t\t\tfor (int k=0;k<end_ka;k+=4){\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*3+0]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*3+1]-y1;\n\t\t\t\t\t\t\tfloat z2=buf[k*3+2]-z1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2+z2*z2;\n\t\t\t\t\t\t\tif (k==0 || d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*3+3]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*3+4]-y1;\n\t\t\t\t\t\t\tfloat z2=buf[k*3+5]-z1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2+z2*z2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+1;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*3+6]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*3+7]-y1;\n\t\t\t\t\t\t\tfloat z2=buf[k*3+8]-z1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2+z2*z2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+2;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\tfloat x2=buf[k*3+9]-x1;\n\t\t\t\t\t\t\tfloat y2=buf[k*3+10]-y1;\n\t\t\t\t\t\t\tfloat z2=buf[k*3+11]-z1;\n\t\t\t\t\t\t\tfloat d=x2*x2+y2*y2+z2*z2;\n\t\t\t\t\t\t\tif (d<best){\n\t\t\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\t\t\tbest_i=k+k2+3;\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t\tfor (int k=end_ka;k<end_k;k++){\n\t\t\t\t\tfloat x2=buf[k*3+0]-x1;\n\t\t\t\t\tfloat y2=buf[k*3+1]-y1;\n\t\t\t\t\tfloat z2=buf[k*3+2]-z1;\n\t\t\t\t\tfloat d=x2*x2+y2*y2+z2*z2;\n\t\t\t\t\tif (k==0 || 
d<best){\n\t\t\t\t\t\tbest=d;\n\t\t\t\t\t\tbest_i=k+k2;\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t\tif (k2==0 || result[(i*n+j)]>best){\n\t\t\t\t\tresult[(i*n+j)]=best;\n\t\t\t\t\tresult_i[(i*n+j)]=best_i;\n\t\t\t\t}\n\t\t\t}\n\t\t\t__syncthreads();\n\t\t}\n\t}\n}\n// int chamfer_cuda_forward(int b,int n,const float * xyz,int m,const float * xyz2,float * result,int * result_i,float * result2,int * result2_i, cudaStream_t stream){\nint chamfer_cuda_forward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor dist1, at::Tensor dist2, at::Tensor idx1, at::Tensor idx2){\n\n\tconst auto batch_size = xyz1.size(0);\n\tconst auto n = xyz1.size(1); //num_points point cloud A\n\tconst auto m = xyz2.size(1); //num_points point cloud B\n\n\tNmDistanceKernel<<<dim3(32,16,1),512>>>(batch_size, n, xyz1.data<float>(), m, xyz2.data<float>(), dist1.data<float>(), idx1.data<int>());\n\tNmDistanceKernel<<<dim3(32,16,1),512>>>(batch_size, m, xyz2.data<float>(), n, xyz1.data<float>(), dist2.data<float>(), idx2.data<int>());\n\n\tcudaError_t err = cudaGetLastError();\n\t  if (err != cudaSuccess) {\n\t    printf(\"error in nnd updateOutput: %s\\n\", cudaGetErrorString(err));\n\t    //THError(\"aborting\");\n\t    return 0;\n\t  }\n\t  return 1;\n\n\n}\n__global__ void NmDistanceGradKernel(int b,int n,const float * xyz1,int m,const float * xyz2,const float * grad_dist1,const int * idx1,float * grad_xyz1,float * grad_xyz2){\n\tfor (int i=blockIdx.x;i<b;i+=gridDim.x){\n\t\tfor (int j=threadIdx.x+blockIdx.y*blockDim.x;j<n;j+=blockDim.x*gridDim.y){\n\t\t\tfloat x1=xyz1[(i*n+j)*3+0];\n\t\t\tfloat y1=xyz1[(i*n+j)*3+1];\n\t\t\tfloat z1=xyz1[(i*n+j)*3+2];\n\t\t\tint j2=idx1[i*n+j];\n\t\t\tfloat x2=xyz2[(i*m+j2)*3+0];\n\t\t\tfloat y2=xyz2[(i*m+j2)*3+1];\n\t\t\tfloat z2=xyz2[(i*m+j2)*3+2];\n\t\t\tfloat g=grad_dist1[i*n+j]*2;\n\t\t\tatomicAdd(&(grad_xyz1[(i*n+j)*3+0]),g*(x1-x2));\n\t\t\tatomicAdd(&(grad_xyz1[(i*n+j)*3+1]),g*(y1-y2));\n\t\t\tatomicAdd(&(grad_xyz1[(i*n+j)*3+2]),g*(z1-z2));\n\t\t\tatomicAdd(&(grad_xyz2[(i*m+j2)*3+0]),-(g*(x1-x2)));\n\t\t\tatomicAdd(&(grad_xyz2[(i*m+j2)*3+1]),-(g*(y1-y2)));\n\t\t\tatomicAdd(&(grad_xyz2[(i*m+j2)*3+2]),-(g*(z1-z2)));\n\t\t}\n\t}\n}\n// int chamfer_cuda_backward(int b,int n,const float * xyz1,int m,const float * xyz2,const float * grad_dist1,const int * idx1,const float * grad_dist2,const int * idx2,float * grad_xyz1,float * grad_xyz2, cudaStream_t stream){\nint chamfer_cuda_backward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor gradxyz1, at::Tensor gradxyz2, at::Tensor graddist1, at::Tensor graddist2, at::Tensor idx1, at::Tensor idx2){\n\t// cudaMemset(grad_xyz1,0,b*n*3*4);\n\t// cudaMemset(grad_xyz2,0,b*m*3*4);\n\t\n\tconst auto batch_size = xyz1.size(0);\n\tconst auto n = xyz1.size(1); //num_points point cloud A\n\tconst auto m = xyz2.size(1); //num_points point cloud B\n\n\tNmDistanceGradKernel<<<dim3(1,16,1),256>>>(batch_size,n,xyz1.data<float>(),m,xyz2.data<float>(),graddist1.data<float>(),idx1.data<int>(),gradxyz1.data<float>(),gradxyz2.data<float>());\n\tNmDistanceGradKernel<<<dim3(1,16,1),256>>>(batch_size,m,xyz2.data<float>(),n,xyz1.data<float>(),graddist2.data<float>(),idx2.data<int>(),gradxyz2.data<float>(),gradxyz1.data<float>());\n\t\n\tcudaError_t err = cudaGetLastError();\n\t  if (err != cudaSuccess) {\n\t    printf(\"error in nnd get grad: %s\\n\", cudaGetErrorString(err));\n\t    //THError(\"aborting\");\n\t    return 0;\n\t  }\n\t  return 1;\n\t\n}\n\n"
  },
  {
    "path": "lidm/eval/modules/chamfer3D/chamfer_cuda.cpp",
    "content": "#include <torch/torch.h>\n#include <vector>\n\n///TMP\n//#include \"common.h\"\n/// NOT TMP\n\t\n\nint chamfer_cuda_forward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor dist1, at::Tensor dist2, at::Tensor idx1, at::Tensor idx2);\n\n\nint chamfer_cuda_backward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor gradxyz1, at::Tensor gradxyz2, at::Tensor graddist1, at::Tensor graddist2, at::Tensor idx1, at::Tensor idx2);\n\n\n\n\nint chamfer_forward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor dist1, at::Tensor dist2, at::Tensor idx1, at::Tensor idx2) {\n    return chamfer_cuda_forward(xyz1, xyz2, dist1, dist2, idx1, idx2);\n}\n\n\nint chamfer_backward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor gradxyz1, at::Tensor gradxyz2, at::Tensor graddist1, \n\t\t\t\t\t  at::Tensor graddist2, at::Tensor idx1, at::Tensor idx2) {\n\n    return chamfer_cuda_backward(xyz1, xyz2, gradxyz1, gradxyz2, graddist1, graddist2, idx1, idx2);\n}\n\n\n\nPYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {\n  m.def(\"forward\", &chamfer_forward, \"chamfer forward (CUDA)\");\n  m.def(\"backward\", &chamfer_backward, \"chamfer backward (CUDA)\");\n}"
  },
  {
    "path": "lidm/eval/modules/chamfer3D/dist_chamfer_3D.py",
    "content": "from torch import nn\nfrom torch.autograd import Function\nimport torch\nimport importlib\nimport os\n\nchamfer_found = importlib.find_loader(\"chamfer_3D\") is not None\nif not chamfer_found:\n    ## Cool trick from https://github.com/chrdiller\n    print(\"Jitting Chamfer 3D\")\n\n    from torch.utils.cpp_extension import load\n\n    chamfer_3D = load(name=\"chamfer_3D\",\n                      sources=[\n                          \"/\".join(os.path.abspath(__file__).split('/')[:-1] + [\"chamfer_cuda.cpp\"]),\n                          \"/\".join(os.path.abspath(__file__).split('/')[:-1] + [\"chamfer3D.cu\"]),\n                      ])\n    print(\"Loaded JIT 3D CUDA chamfer distance\")\n\nelse:\n    import chamfer_3D\n    print(\"Loaded compiled 3D CUDA chamfer distance\")\n\n\n# Chamfer's distance module @thibaultgroueix\n# GPU tensors only\nclass chamfer_3DFunction(Function):\n    @staticmethod\n    def forward(ctx, xyz1, xyz2):\n        batchsize, n, _ = xyz1.size()\n        _, m, _ = xyz2.size()\n        device = xyz1.device\n\n        dist1 = torch.zeros(batchsize, n)\n        dist2 = torch.zeros(batchsize, m)\n\n        idx1 = torch.zeros(batchsize, n).type(torch.IntTensor)\n        idx2 = torch.zeros(batchsize, m).type(torch.IntTensor)\n\n        dist1 = dist1.to(device)\n        dist2 = dist2.to(device)\n        idx1 = idx1.to(device)\n        idx2 = idx2.to(device)\n        torch.cuda.set_device(device)\n\n        chamfer_3D.forward(xyz1, xyz2, dist1, dist2, idx1, idx2)\n        ctx.save_for_backward(xyz1, xyz2, idx1, idx2)\n        return dist1, dist2, idx1, idx2\n\n    @staticmethod\n    def backward(ctx, graddist1, graddist2, gradidx1, gradidx2):\n        xyz1, xyz2, idx1, idx2 = ctx.saved_tensors\n        graddist1 = graddist1.contiguous()\n        graddist2 = graddist2.contiguous()\n        device = graddist1.device\n\n        gradxyz1 = torch.zeros(xyz1.size())\n        gradxyz2 = torch.zeros(xyz2.size())\n\n        gradxyz1 = gradxyz1.to(device)\n        gradxyz2 = gradxyz2.to(device)\n        chamfer_3D.backward(\n            xyz1, xyz2, gradxyz1, gradxyz2, graddist1, graddist2, idx1, idx2\n        )\n        return gradxyz1, gradxyz2\n\n\nclass chamfer_3DDist(nn.Module):\n    def __init__(self):\n        super(chamfer_3DDist, self).__init__()\n\n    def forward(self, input1, input2):\n        input1 = input1.contiguous()\n        input2 = input2.contiguous()\n        return chamfer_3DFunction.apply(input1, input2)\n"
  },
  {
    "path": "lidm/eval/modules/chamfer3D/setup.py",
    "content": "from setuptools import setup\nfrom torch.utils.cpp_extension import BuildExtension, CUDAExtension\n\nsetup(\n    name='chamfer_3D',\n    ext_modules=[\n        CUDAExtension('chamfer_3D', [\n            \"/\".join(__file__.split('/')[:-1] + ['chamfer_cuda.cpp']),\n            \"/\".join(__file__.split('/')[:-1] + ['chamfer3D.cu']),\n        ]),\n    ],\n    cmdclass={\n        'build_ext': BuildExtension\n    })"
  },
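  {
    "path": "lidm/eval/modules/chamfer3D/example_chamfer3D.py",
    "content": "# Hypothetical usage sketch (not part of the original repository).\n# It illustrates how chamfer_3DDist from dist_chamfer_3D.py might be called on\n# two point clouds; shapes and random inputs are illustrative assumptions.\nimport torch\n\nfrom lidm.eval.modules.chamfer3D.dist_chamfer_3D import chamfer_3DDist\n\nif __name__ == '__main__':\n    # Inputs are [batch, num_points, 3] CUDA tensors (the extension is GPU-only).\n    pred = torch.rand(2, 4096, 3, device='cuda')\n    gt = torch.rand(2, 4096, 3, device='cuda')\n\n    chamfer = chamfer_3DDist()\n    dist_pred, dist_gt, idx_pred, idx_gt = chamfer(pred, gt)\n\n    # Squared nearest-neighbour distances in both directions; summing their\n    # means gives a symmetric Chamfer distance between the two clouds.\n    cd = dist_pred.mean() + dist_gt.mean()\n    print('Chamfer distance (3D):', cd.item())\n"
  },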
  {
    "path": "lidm/eval/modules/emd/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/eval/modules/emd/emd.cpp",
    "content": "// EMD approximation module (based on auction algorithm)\n// author: Minghua Liu\n#include <torch/extension.h>\n#include <vector>\n\nint emd_cuda_forward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor dist, at::Tensor assignment, at::Tensor price, \n\t                 at::Tensor assignment_inv, at::Tensor bid, at::Tensor bid_increments, at::Tensor max_increments,\n\t                 at::Tensor unass_idx, at::Tensor unass_cnt, at::Tensor unass_cnt_sum, at::Tensor cnt_tmp, at::Tensor max_idx, float eps, int iters);\n\nint emd_cuda_backward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor gradxyz, at::Tensor graddist, at::Tensor idx);\n\n\n\nint emd_forward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor dist, at::Tensor assignment, at::Tensor price, \n\t                 at::Tensor assignment_inv, at::Tensor bid, at::Tensor bid_increments, at::Tensor max_increments,\n\t                 at::Tensor unass_idx, at::Tensor unass_cnt, at::Tensor unass_cnt_sum, at::Tensor cnt_tmp, at::Tensor max_idx, float eps, int iters) {\n\treturn emd_cuda_forward(xyz1, xyz2, dist, assignment, price, assignment_inv, bid, bid_increments, max_increments, unass_idx, unass_cnt, unass_cnt_sum, cnt_tmp, max_idx, eps, iters);\n}\n\nint emd_backward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor gradxyz, at::Tensor graddist, at::Tensor idx) {\n\n    return emd_cuda_backward(xyz1, xyz2, gradxyz, graddist, idx);\n}\n\n\n\n\nPYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {\n  m.def(\"forward\", &emd_forward, \"emd forward (CUDA)\");\n  m.def(\"backward\", &emd_backward, \"emd backward (CUDA)\");\n}"
  },
  {
    "path": "lidm/eval/modules/emd/emd_cuda.cu",
    "content": "// EMD approximation module (based on auction algorithm)\n// author: Minghua Liu\n#include <stdio.h>\n#include <ATen/ATen.h>\n\n#include <cuda.h>\n#include <iostream>\n#include <cuda_runtime.h>\n\n__device__ __forceinline__ float atomicMax(float *address, float val)\n{\n    int ret = __float_as_int(*address);\n    while(val > __int_as_float(ret))\n    {\n        int old = ret;\n        if((ret = atomicCAS((int *)address, old, __float_as_int(val))) == old)\n            break;\n    }\n    return __int_as_float(ret);\n}\n\n\n__global__ void clear(int b, int * cnt_tmp, int * unass_cnt) {\n\tfor (int i = threadIdx.x; i < b; i += blockDim.x) {\n\t\tcnt_tmp[i] = 0;\n\t\tunass_cnt[i] = 0;\n\t}\n}\n\n__global__ void calc_unass_cnt(int b, int n, int * assignment, int * unass_cnt) { \n\t// count the number of unassigned points in each batch\n\tconst int BLOCK_SIZE = 1024; \n\t__shared__ int scan_array[BLOCK_SIZE];\n\tfor (int i = blockIdx.x; i < b; i += gridDim.x) {\n\t\tscan_array[threadIdx.x] = assignment[i * n + blockIdx.y * BLOCK_SIZE + threadIdx.x] == -1 ? 1 : 0;\n\t\t__syncthreads();\n\t\t\n\t\tint stride = 1;\n\t\twhile(stride <= BLOCK_SIZE / 2) {\n\t\t\tint index = (threadIdx.x + 1) * stride * 2 - 1; \n\t\t\tif(index < BLOCK_SIZE)\n\t\t\t\tscan_array[index] += scan_array[index - stride]; \n\t\t\tstride = stride * 2;\n\t\t\t__syncthreads(); \n\t\t}\n\t\t__syncthreads();\n\t\t\n\t\tif (threadIdx.x == BLOCK_SIZE - 1) {\n\t\t\tatomicAdd(&unass_cnt[i], scan_array[threadIdx.x]);\n\t\t}\n\t\t__syncthreads();\n\t}\n}\n\n__global__ void calc_unass_cnt_sum(int b, int * unass_cnt, int * unass_cnt_sum) {\n\t// count the cumulative sum over over unass_cnt\n\tconst int BLOCK_SIZE = 512; // batch_size <= 512\n\t__shared__ int scan_array[BLOCK_SIZE];\n\tscan_array[threadIdx.x] = unass_cnt[threadIdx.x];\n\t__syncthreads();\n\t\n\tint stride = 1;\n\twhile(stride <= BLOCK_SIZE / 2) {\n\t\tint index = (threadIdx.x + 1) * stride * 2 - 1; \n\t\tif(index < BLOCK_SIZE)\n\t\t\tscan_array[index] += scan_array[index - stride]; \n\t\tstride = stride * 2;\n\t\t__syncthreads(); \n\t}\n\t__syncthreads();\n\tstride = BLOCK_SIZE / 4; \n\twhile(stride > 0) {\n\t\tint index = (threadIdx.x + 1) * stride * 2 - 1; \n\t\tif((index + stride) < BLOCK_SIZE)\n\t\t\tscan_array[index + stride] += scan_array[index];\n\t\tstride = stride / 2;\n\t\t__syncthreads(); \n\t}\n\t__syncthreads(); \n\t\n\t//printf(\"%d\\n\", unass_cnt_sum[b - 1]);\n\tunass_cnt_sum[threadIdx.x] = scan_array[threadIdx.x];\n}\n\n__global__ void calc_unass_idx(int b, int n, int * assignment, int * unass_idx, int * unass_cnt, int * unass_cnt_sum, int * cnt_tmp) {\n\t// list all the unassigned points\n\tfor (int i = blockIdx.x; i < b; i += gridDim.x) {\n\t\tif (assignment[i * n + blockIdx.y * 1024 + threadIdx.x] == -1) {\n\t\t\tint idx = atomicAdd(&cnt_tmp[i], 1);\n\t\t\tunass_idx[unass_cnt_sum[i] - unass_cnt[i] + idx] = blockIdx.y * 1024 + threadIdx.x;\n\t\t} \n\t}\n}\n\n__global__ void Bid(int b, int n, const float * xyz1, const float * xyz2, float eps, int * assignment, int * assignment_inv, float * price, \n\t\t\t\t\tint * bid, float * bid_increments, float * max_increments, int * unass_cnt, int * unass_cnt_sum, int * unass_idx) {\n\tconst int batch = 2048, block_size = 1024, block_cnt = n / 1024;\n\t__shared__ float xyz2_buf[batch * 3];\n\t__shared__ float price_buf[batch];\n\t__shared__ float best_buf[block_size];\n\t__shared__ float better_buf[block_size];\n\t__shared__ int best_i_buf[block_size];\n\tfor (int i = blockIdx.x; i < b; i += 
gridDim.x) {\n\t\tint _unass_cnt = unass_cnt[i];\n\t\tif (_unass_cnt == 0)\n\t\t\tcontinue;\n\t\tint _unass_cnt_sum = unass_cnt_sum[i];\n\t\tint unass_per_block = (_unass_cnt + block_cnt - 1) / block_cnt;\n\t\tint thread_per_unass = block_size / unass_per_block;\n\t\tint unass_this_block = max(min(_unass_cnt - (int) blockIdx.y * unass_per_block, unass_per_block), 0);\n\t\t\t\n\t\tfloat x1, y1, z1, best = -1e9, better = -1e9;\n\t\tint best_i = -1, _unass_id = -1, thread_in_unass;\n\n\t\tif (threadIdx.x < thread_per_unass * unass_this_block) {\n\t\t\t_unass_id = unass_per_block * blockIdx.y + threadIdx.x / thread_per_unass + _unass_cnt_sum - _unass_cnt;\n\t\t\t_unass_id = unass_idx[_unass_id];\n\t\t\tthread_in_unass = threadIdx.x % thread_per_unass;\n\n\t\t\tx1 = xyz1[(i * n + _unass_id) * 3 + 0];\n\t\t\ty1 = xyz1[(i * n + _unass_id) * 3 + 1];\n\t\t\tz1 = xyz1[(i * n + _unass_id) * 3 + 2];\n\t\t}\n\n\t\tfor (int k2 = 0; k2 < n; k2 += batch) {\n\t\t\tint end_k = min(n, k2 + batch) - k2;\n\t\t\tfor (int j = threadIdx.x; j < end_k * 3; j += blockDim.x) {\n\t\t\t\txyz2_buf[j] = xyz2[(i * n + k2) * 3 + j];\n\t\t\t}\n\t\t\tfor (int j = threadIdx.x; j < end_k; j += blockDim.x) {\n\t\t\t\tprice_buf[j] = price[i * n + k2 + j];\n\t\t\t}\n\t\t\t__syncthreads();\n\n\t\t\tif (_unass_id != -1) {\n\t\t\t\tint delta = (end_k + thread_per_unass - 1) / thread_per_unass;\n\t\t\t\tint l = thread_in_unass * delta;\n\t\t\t\tint r = min((thread_in_unass + 1) * delta, end_k);\n\t\t\t\tfor (int k = l; k < r; k++) \n\t\t\t\t//if (!last || assignment_inv[i * n + k + k2] == -1)\n\t\t\t\t{\n\t\t\t\t\tfloat x2 = xyz2_buf[k * 3 + 0] - x1;\n\t\t\t\t\tfloat y2 = xyz2_buf[k * 3 + 1] - y1;\n\t\t\t\t\tfloat z2 = xyz2_buf[k * 3 + 2] - z1;\n\t\t\t\t\t// the coordinates of points should be normalized to [0, 1]\n\t\t\t\t\tfloat d = 3.0 - sqrtf(x2 * x2 + y2 * y2 + z2 * z2) - price_buf[k];\n\t\t\t\t\tif (d > best) {\n\t\t\t\t\t\tbetter = best;\n\t\t\t\t\t\tbest = d;\n\t\t\t\t\t\tbest_i = k + k2;\n\t\t\t\t\t}\n\t\t\t\t\telse if (d > better) {\n\t\t\t\t\t\tbetter = d;\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t}\n\t\t\t__syncthreads();\n\t\t}\n\n\t\tbest_buf[threadIdx.x] = best;\n\t\tbetter_buf[threadIdx.x] = better;\n\t\tbest_i_buf[threadIdx.x] = best_i;\n\t\t__syncthreads();\n\t\t\n\t\tif (_unass_id != -1 && thread_in_unass == 0) {\n\t\t\tfor (int j = threadIdx.x + 1; j < threadIdx.x + thread_per_unass; j++) {\n\t\t\t\tif (best_buf[j] > best) {\n\t\t\t\t\tbetter = max(best, better_buf[j]);\n\t\t\t\t\tbest = best_buf[j];\n\t\t\t\t\tbest_i = best_i_buf[j];\n\t\t\t\t}\n\t\t\t\telse better = max(better, best_buf[j]);\n\t\t\t}\n\t\t\tbid[i * n + _unass_id] = best_i;\n\t\t\tbid_increments[i * n + _unass_id] = best - better + eps; \n\t\t\tatomicMax(&max_increments[i * n + best_i], best - better + eps);\n\t\t}\n\t}\n}\n\n__global__ void GetMax(int b, int n, int * assignment, int * bid, float * bid_increments, float * max_increments, int * max_idx) {\n\tfor (int i = blockIdx.x; i < b; i += gridDim.x) {\n\t\tint j = threadIdx.x + blockIdx.y * blockDim.x;\n\t\tif (assignment[i * n + j] == -1) {\n\t\t\tint bid_id = bid[i * n + j];\n\t\t\tfloat bid_inc = bid_increments[i * n + j];\n\t\t\tfloat max_inc = max_increments[i * n + bid_id];\n\t\t\tif (bid_inc - 1e-6 <= max_inc && max_inc <= bid_inc + 1e-6) \n\t\t\t{\n\t\t\t\tmax_idx[i * n + bid_id] = j;\n\t\t\t}\n\t\t}\n\t}\n}\n\n__global__ void Assign(int b, int n, int * assignment, int * assignment_inv, float * price, int * bid, float * bid_increments, float * max_increments, int * max_idx, bool last) {\n\tfor 
(int i = blockIdx.x; i < b; i += gridDim.x) {\n\t\tint j = threadIdx.x + blockIdx.y * blockDim.x;\n\t\tif (assignment[i * n + j] == -1) {\n\t\t\tint bid_id = bid[i * n + j];\n\t\t\tif (last || max_idx[i * n + bid_id] == j) \n\t\t\t{\n\t\t\t\tfloat bid_inc = bid_increments[i * n + j];\n\t\t\t\tint ass_inv = assignment_inv[i * n + bid_id];\n\t\t\t\tif (!last && ass_inv != -1) {\n\t\t\t\t\tassignment[i * n + ass_inv] = -1;\n\t\t\t\t}\n\t\t\t\tassignment_inv[i * n + bid_id] = j;\n\t\t\t\tassignment[i * n + j] = bid_id;\n\t\t\t\tprice[i * n + bid_id] += bid_inc;\n\t\t\t\tmax_increments[i * n + bid_id] = -1e9;\n\t\t\t}\n\t\t}\n\t}\n}\n\n__global__ void CalcDist(int b, int n, float * xyz1, float * xyz2, float * dist, int * assignment) {\n\tfor (int i = blockIdx.x; i < b; i += gridDim.x) {\n\t\tint j = threadIdx.x + blockIdx.y * blockDim.x;\n\t\tint k = assignment[i * n + j];\n\t\tfloat deltax = xyz1[(i * n + j) * 3 + 0] - xyz2[(i * n + k) * 3 + 0];\n\t\tfloat deltay = xyz1[(i * n + j) * 3 + 1] - xyz2[(i * n + k) * 3 + 1];\n\t\tfloat deltaz = xyz1[(i * n + j) * 3 + 2] - xyz2[(i * n + k) * 3 + 2];\n\t\tdist[i * n + j] = deltax * deltax + deltay * deltay + deltaz * deltaz;\n\t}\n}\n\nint emd_cuda_forward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor dist, at::Tensor assignment, at::Tensor price, \n\t                 at::Tensor assignment_inv, at::Tensor bid, at::Tensor bid_increments, at::Tensor max_increments,\n\t                 at::Tensor unass_idx, at::Tensor unass_cnt, at::Tensor unass_cnt_sum, at::Tensor cnt_tmp, at::Tensor max_idx, float eps, int iters) {\n\n\tconst auto batch_size = xyz1.size(0);\n\tconst auto n = xyz1.size(1); //num_points point cloud A\n\tconst auto m = xyz2.size(1); //num_points point cloud B\n\t\n\tif (n != m) {\n\t\tprintf(\"Input Error! The two point clouds should have the same size.\\n\");\n\t\treturn -1;\n\t}\n\n\tif (batch_size > 512) {\n\t\tprintf(\"Input Error! The batch size should be less than 512.\\n\");\n\t\treturn -1;\n\t}\n\n\tif (n % 1024 != 0) {\n\t\tprintf(\"Input Error! 
The size of the point clouds should be a multiple of 1024.\\n\");\n\t\treturn -1;\n\t}\n\n\t//cudaEvent_t start,stop;\n\t//cudaEventCreate(&start);\n\t//cudaEventCreate(&stop);\n\t//cudaEventRecord(start);\n\t//int iters = 50;\n\tfor (int i = 0; i < iters; i++) {\n\t\tclear<<<1, batch_size>>>(batch_size, cnt_tmp.data<int>(), unass_cnt.data<int>());\n\t\tcalc_unass_cnt<<<dim3(batch_size, n / 1024, 1), 1024>>>(batch_size, n, assignment.data<int>(), unass_cnt.data<int>());\n\t\tcalc_unass_cnt_sum<<<1, batch_size>>>(batch_size, unass_cnt.data<int>(), unass_cnt_sum.data<int>());\n\t\tcalc_unass_idx<<<dim3(batch_size, n / 1024, 1), 1024>>>(batch_size, n, assignment.data<int>(), unass_idx.data<int>(), unass_cnt.data<int>(), \n\t\t\t\t\t\t\t\t\t\t\t unass_cnt_sum.data<int>(), cnt_tmp.data<int>());\n\t\tBid<<<dim3(batch_size, n / 1024, 1), 1024>>>(batch_size, n, xyz1.data<float>(), xyz2.data<float>(), eps, assignment.data<int>(), assignment_inv.data<int>(), \n\t\t\t                          price.data<float>(), bid.data<int>(), bid_increments.data<float>(), max_increments.data<float>(),\n\t\t\t                          unass_cnt.data<int>(), unass_cnt_sum.data<int>(), unass_idx.data<int>());\n\t\tGetMax<<<dim3(batch_size, n / 1024, 1), 1024>>>(batch_size, n, assignment.data<int>(), bid.data<int>(), bid_increments.data<float>(), max_increments.data<float>(), max_idx.data<int>());\n\t\tAssign<<<dim3(batch_size, n / 1024, 1), 1024>>>(batch_size, n, assignment.data<int>(), assignment_inv.data<int>(), price.data<float>(), bid.data<int>(),\n\t\t\t\t\t\t\t\t\t  bid_increments.data<float>(), max_increments.data<float>(), max_idx.data<int>(), i == iters - 1);\n\t}\n\tCalcDist<<<dim3(batch_size, n / 1024, 1), 1024>>>(batch_size, n, xyz1.data<float>(), xyz2.data<float>(), dist.data<float>(), assignment.data<int>());\n\t//cudaEventRecord(stop);\n\t//cudaEventSynchronize(stop);\n\t//float elapsedTime;\n\t//cudaEventElapsedTime(&elapsedTime,start,stop);\n\t//printf(\"%lf\\n\", elapsedTime);\n\n\tcudaError_t err = cudaGetLastError();\n\t  if (err != cudaSuccess) {\n\t    printf(\"error in nnd Output: %s\\n\", cudaGetErrorString(err));\n\t    return 0;\n\t  }\n\t  return 1;\n}\n\n__global__ void NmDistanceGradKernel(int b, int n, const float * xyz1, const float * xyz2, const float * grad_dist, const int * idx, float * grad_xyz){\n\tfor (int i = blockIdx.x; i < b; i += gridDim.x) {\n\t\tfor (int j = threadIdx.x + blockIdx.y * blockDim.x; j < n; j += blockDim.x * gridDim.y) {\n\t\t\tfloat x1 = xyz1[(i * n + j) * 3 + 0];\n\t\t\tfloat y1 = xyz1[(i * n + j) * 3 + 1];\n\t\t\tfloat z1 = xyz1[(i * n + j) * 3 + 2];\n\t\t\tint j2 = idx[i * n + j];\n\t\t\tfloat x2 = xyz2[(i * n + j2) * 3 + 0];\n\t\t\tfloat y2 = xyz2[(i * n + j2) * 3 + 1];\n\t\t\tfloat z2 = xyz2[(i * n + j2) * 3 + 2];\n\t\t\tfloat g = grad_dist[i * n + j] * 2;\n\t\t\tatomicAdd(&(grad_xyz[(i * n + j) * 3 + 0]), g * (x1 - x2));\n\t\t\tatomicAdd(&(grad_xyz[(i * n + j) * 3 + 1]), g * (y1 - y2));\n\t\t\tatomicAdd(&(grad_xyz[(i * n + j) * 3 + 2]), g * (z1 - z2));\n\t\t}\n\t}\n}\n\nint emd_cuda_backward(at::Tensor xyz1, at::Tensor xyz2, at::Tensor gradxyz, at::Tensor graddist, at::Tensor idx){\n\tconst auto batch_size = xyz1.size(0);\n\tconst auto n = xyz1.size(1); \n\tconst auto m = xyz2.size(1); \n\n\tNmDistanceGradKernel<<<dim3(batch_size, n / 1024, 1), 1024>>>(batch_size, n, xyz1.data<float>(), xyz2.data<float>(), graddist.data<float>(), idx.data<int>(), gradxyz.data<float>());\n\t\n\tcudaError_t err = cudaGetLastError();\n\t  if (err != cudaSuccess) {\n\t    
printf(\"error in nnd get grad: %s\\n\", cudaGetErrorString(err));\n\t    return 0;\n\t  }\n\t  return 1;\n\t\n}\n"
  },
  {
    "path": "lidm/eval/modules/emd/emd_module.py",
    "content": "# EMD approximation module (based on auction algorithm)\n# memory complexity: O(n)\n# time complexity: O(n^2 * iter) \n# author: Minghua Liu\n\n# Input:\n# xyz1, xyz2: [#batch, #points, 3]\n# where xyz1 is the predicted point cloud and xyz2 is the ground truth point cloud \n# two point clouds should have same size and be normalized to [0, 1]\n# #points should be a multiple of 1024\n# #batch should be no greater than 512\n# eps is a parameter which balances the error rate and the speed of convergence\n# iters is the number of iteration\n# we only calculate gradient for xyz1\n\n# Output:\n# dist: [#batch, #points],  sqrt(dist) -> L2 distance \n# assignment: [#batch, #points], index of the matched point in the ground truth point cloud\n# the result is an approximation and the assignment is not guranteed to be a bijection\nimport importlib\nimport os\nimport time\nimport numpy as np\nimport torch\nfrom torch import nn\nfrom torch.autograd import Function\n\nemd_found = importlib.find_loader(\"emd\") is not None\nif not emd_found:\n    ## Cool trick from https://github.com/chrdiller\n    print(\"Jitting EMD 3D\")\n\n    from torch.utils.cpp_extension import load\n\n    emd = load(name=\"emd\",\n               sources=[\n                   \"/\".join(os.path.abspath(__file__).split('/')[:-1] + [\"emd.cpp\"]),\n                   \"/\".join(os.path.abspath(__file__).split('/')[:-1] + [\"emd_cuda.cu\"]),\n               ])\n    print(\"Loaded JIT 3D CUDA emd\")\nelse:\n    import emd\n    print(\"Loaded compiled 3D CUDA emd\")\n\n\nclass emdFunction(Function):\n    @staticmethod\n    def forward(ctx, xyz1, xyz2, eps, iters):\n        batchsize, n, _ = xyz1.size()\n        _, m, _ = xyz2.size()\n\n        assert (n == m)\n        assert (xyz1.size()[0] == xyz2.size()[0])\n        # assert(n % 1024 == 0)\n        assert (batchsize <= 512)\n\n        xyz1 = xyz1.contiguous().float().cuda()\n        xyz2 = xyz2.contiguous().float().cuda()\n        dist = torch.zeros(batchsize, n, device='cuda').contiguous()\n        assignment = torch.zeros(batchsize, n, device='cuda', dtype=torch.int32).contiguous() - 1\n        assignment_inv = torch.zeros(batchsize, m, device='cuda', dtype=torch.int32).contiguous() - 1\n        price = torch.zeros(batchsize, m, device='cuda').contiguous()\n        bid = torch.zeros(batchsize, n, device='cuda', dtype=torch.int32).contiguous()\n        bid_increments = torch.zeros(batchsize, n, device='cuda').contiguous()\n        max_increments = torch.zeros(batchsize, m, device='cuda').contiguous()\n        unass_idx = torch.zeros(batchsize * n, device='cuda', dtype=torch.int32).contiguous()\n        max_idx = torch.zeros(batchsize * m, device='cuda', dtype=torch.int32).contiguous()\n        unass_cnt = torch.zeros(512, dtype=torch.int32, device='cuda').contiguous()\n        unass_cnt_sum = torch.zeros(512, dtype=torch.int32, device='cuda').contiguous()\n        cnt_tmp = torch.zeros(512, dtype=torch.int32, device='cuda').contiguous()\n\n        emd.forward(xyz1, xyz2, dist, assignment, price, assignment_inv, bid, bid_increments, max_increments, unass_idx,\n                    unass_cnt, unass_cnt_sum, cnt_tmp, max_idx, eps, iters)\n\n        ctx.save_for_backward(xyz1, xyz2, assignment)\n        return dist, assignment\n\n    @staticmethod\n    def backward(ctx, graddist, gradidx):\n        xyz1, xyz2, assignment = ctx.saved_tensors\n        graddist = graddist.contiguous()\n\n        gradxyz1 = torch.zeros(xyz1.size(), device='cuda').contiguous()\n        gradxyz2 
= torch.zeros(xyz2.size(), device='cuda').contiguous()\n\n        emd.backward(xyz1, xyz2, gradxyz1, graddist, assignment)\n        return gradxyz1, gradxyz2, None, None\n\n\nclass emdModule(nn.Module):\n    def __init__(self):\n        super(emdModule, self).__init__()\n\n    def forward(self, input1, input2, eps, iters):\n        return emdFunction.apply(input1, input2, eps, iters)\n\n\ndef test_emd():\n    x1 = torch.rand(20, 8192, 3).cuda()\n    x2 = torch.rand(20, 8192, 3).cuda()\n    emd = emdModule()\n    start_time = time.perf_counter()\n    dis, assigment = emd(x1, x2, 0.05, 3000)\n    print(\"Input_size: \", x1.shape)\n    print(\"Runtime: %lfs\" % (time.perf_counter() - start_time))\n    print(\"EMD: %lf\" % np.sqrt(dis.cpu()).mean())\n    print(\"|set(assignment)|: %d\" % assigment.unique().numel())\n    assigment = assigment.cpu().numpy()\n    assigment = np.expand_dims(assigment, -1)\n    x2 = np.take_along_axis(x2, assigment, axis=1)\n    d = (x1 - x2) * (x1 - x2)\n    print(\"Verified EMD: %lf\" % np.sqrt(d.cpu().sum(-1)).mean())\n"
  },
  {
    "path": "lidm/eval/modules/emd/setup.py",
    "content": "from setuptools import setup\nfrom torch.utils.cpp_extension import BuildExtension, CUDAExtension\n\nsetup(\n    name='emd',\n    ext_modules=[\n        CUDAExtension('emd', [\n            'emd.cpp',\n            'emd_cuda.cu',\n        ]),\n    ],\n    cmdclass={\n        'build_ext': BuildExtension\n    })"
  },
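  {
    "path": "lidm/eval/modules/emd/example_emd.py",
    "content": "# Hypothetical usage sketch (not part of the original repository); see also\n# test_emd() in emd_module.py. It follows the constraints documented there:\n# equal-sized clouds normalized to [0, 1], #points a multiple of 1024, batch\n# size <= 512. The eps/iters values below are illustrative assumptions.\nimport torch\n\nfrom lidm.eval.modules.emd.emd_module import emdModule\n\nif __name__ == '__main__':\n    pred = torch.rand(8, 2048, 3, device='cuda')  # predicted cloud in [0, 1]\n    gt = torch.rand(8, 2048, 3, device='cuda')    # ground-truth cloud in [0, 1]\n\n    emd = emdModule()\n    dist, assignment = emd(pred, gt, 0.005, 50)\n\n    # dist holds squared distances of the approximate matching; the mean of\n    # sqrt(dist) is the EMD value reported by the module's own test.\n    print('EMD:', dist.sqrt().mean().item())\n"
  },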
  {
    "path": "lidm/models/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/models/autoencoder.py",
    "content": "import numpy as np\nimport torch\nimport pytorch_lightning as pl\nimport torch.nn.functional as F\nfrom contextlib import contextmanager\n\nfrom taming.modules.vqvae.quantize import VectorQuantizer2 as VectorQuantizer\n\nfrom ..modules.diffusion import model_lidm, model_ldm\nfrom ..modules.distributions.distributions import DiagonalGaussianDistribution\nfrom ..modules.ema import LitEma\nfrom ..utils.misc_utils import instantiate_from_config\n\n\nclass VQModel(pl.LightningModule):\n    def __init__(self,\n                 ddconfig,\n                 n_embed,\n                 embed_dim,\n                 lossconfig=None,\n                 ckpt_path=None,\n                 ignore_keys=[],\n                 image_key=\"image\",\n                 colorize_nlabels=None,\n                 monitor=None,\n                 batch_resize_range=None,\n                 scheduler_config=None,\n                 lr_g_factor=1.0,\n                 remap=None,\n                 sane_index_shape=False,  # tell vector quantizer to return indices as bhw\n                 use_ema=False,\n                 lib_name='ldm',\n                 use_mask=False,\n                 **kwargs\n                 ):\n        super().__init__()\n        self.embed_dim = embed_dim\n        self.n_embed = n_embed\n        self.image_key = image_key\n        self.use_mask = use_mask\n        model_lib = eval(f'model_{lib_name}')\n        self.encoder = model_lib.Encoder(**ddconfig)\n        self.decoder = model_lib.Decoder(**ddconfig)\n        if lossconfig is not None:\n            self.loss = instantiate_from_config(lossconfig)\n        self.quantize = VectorQuantizer(n_embed, embed_dim, beta=0.25,\n                                        remap=remap,\n                                        sane_index_shape=sane_index_shape)\n        self.quant_conv = torch.nn.Conv2d(ddconfig[\"z_channels\"], embed_dim, 1)\n        self.post_quant_conv = torch.nn.Conv2d(embed_dim, ddconfig[\"z_channels\"], 1)\n        if colorize_nlabels is not None:\n            assert type(colorize_nlabels) == int\n            self.register_buffer(\"colorize\", torch.randn(3, colorize_nlabels, 1, 1))\n        if monitor is not None:\n            self.monitor = monitor\n        self.batch_resize_range = batch_resize_range\n        if self.batch_resize_range is not None:\n            print(f\"{self.__class__.__name__}: Using per-batch resizing in range {batch_resize_range}.\")\n\n        self.use_ema = use_ema\n        if self.use_ema:\n            self.model_ema = LitEma(self)\n            print(f\"Keeping EMAs of {len(list(self.model_ema.buffers()))}.\")\n\n        if ckpt_path is not None:\n            self.init_from_ckpt(ckpt_path, ignore_keys=ignore_keys)\n        self.scheduler_config = scheduler_config\n        self.lr_g_factor = lr_g_factor\n\n    @contextmanager\n    def ema_scope(self, context=None):\n        if self.use_ema:\n            self.model_ema.store(self.parameters())\n            self.model_ema.copy_to(self)\n            if context is not None:\n                print(f\"{context}: Switched to EMA weights\")\n        try:\n            yield None\n        finally:\n            if self.use_ema:\n                self.model_ema.restore(self.parameters())\n                if context is not None:\n                    print(f\"{context}: Restored training weights\")\n\n    def init_from_ckpt(self, path, ignore_keys=list()):\n        sd = torch.load(path, map_location=\"cpu\")[\"state_dict\"]\n        keys = list(sd.keys())\n        
for k in keys:\n            for ik in ignore_keys:\n                if k.startswith(ik):\n                    print(\"Deleting key {} from state_dict.\".format(k))\n                    del sd[k]\n        missing, unexpected = self.load_state_dict(sd, strict=False)\n        print(f\"Restored from {path} with {len(missing)} missing and {len(unexpected)} unexpected keys\")\n        if len(missing) > 0:\n            print(f\"Missing Keys: {missing}\")\n            print(f\"Unexpected Keys: {unexpected}\")\n\n    def on_train_batch_end(self, *args, **kwargs):\n        if self.use_ema:\n            self.model_ema(self)\n\n    def encode(self, x):\n        h = self.encoder(x)\n        h = self.quant_conv(h)\n        quant, emb_loss, info = self.quantize(h)\n        return quant, emb_loss, info\n\n    def encode_to_prequant(self, x):\n        h = self.encoder(x)\n        h = self.quant_conv(h)\n        return h\n\n    def decode(self, quant):\n        quant = self.post_quant_conv(quant)\n        dec = self.decoder(quant)\n        return dec\n\n    def decode_code(self, code_b):\n        quant_b = self.quantize.embed_code(code_b)\n        dec = self.decode(quant_b)\n        return dec\n\n    def forward(self, input, return_pred_indices=False):\n        quant, diff, (_, _, ind) = self.encode(input)\n        dec = self.decode(quant)\n        if return_pred_indices:\n            return dec, diff, ind\n        return dec, diff\n\n    def get_input(self, batch, k):\n        x = batch[k]\n        # if len(x.shape) == 3:\n        #     x = x[..., None]\n\n        if self.batch_resize_range is not None:\n            lower_size = self.batch_resize_range[0]\n            upper_size = self.batch_resize_range[1]\n            if self.global_step <= 4:\n                # do the first few batches with max size to avoid later oom\n                new_resize = upper_size\n            else:\n                new_resize = np.random.choice(np.arange(lower_size, upper_size + 16, 16))\n            if new_resize != x.shape[2]:\n                x = F.interpolate(x, size=new_resize, mode=\"bicubic\")\n            x = x.detach()\n        return x\n\n    def get_mask(self, batch):\n        mask = batch['mask']\n        # if len(mask.shape) == 3:\n        #     mask = mask[..., None]\n        return mask\n\n    def training_step(self, batch, batch_idx, optimizer_idx):\n        # https://github.com/pytorch/pytorch/issues/37142\n        # try not to fool the heuristics\n        x = self.get_input(batch, self.image_key)\n        m = self.get_mask(batch) if self.use_mask else None\n        x_rec, qloss, ind = self(x, return_pred_indices=True)\n\n        if optimizer_idx == 0:\n            # autoencoder\n            aeloss, log_dict_ae = self.loss(qloss, x, x_rec, optimizer_idx, self.global_step,\n                                            last_layer=self.get_last_layer(), split=\"train\",\n                                            predicted_indices=None, masks=m)\n            self.log_dict(log_dict_ae, prog_bar=False, logger=True, on_step=True, on_epoch=True)\n            return aeloss\n\n        if optimizer_idx == 1:\n            # discriminator\n            discloss, log_dict_disc = self.loss(qloss, x, x_rec, optimizer_idx, self.global_step,\n                                                last_layer=self.get_last_layer(), split=\"train\",\n                                                masks=m)\n            self.log_dict(log_dict_disc, prog_bar=False, logger=True, on_step=True, on_epoch=True)\n            return 
discloss\n\n    def validation_step(self, batch, batch_idx):\n        log_dict = self._validation_step(batch, batch_idx)\n        if self.use_ema:\n            with self.ema_scope():\n                log_dict_ema = self._validation_step(batch, batch_idx, suffix=\"_ema\")\n        return log_dict\n\n    def _validation_step(self, batch, batch_idx, suffix=\"\"):\n        x = self.get_input(batch, self.image_key)\n        m = self.get_mask(batch) if self.use_mask else None\n        xrec, qloss, ind = self(x, return_pred_indices=True)\n        aeloss, log_dict_ae = self.loss(qloss, x, xrec, 0,\n                                        self.global_step,\n                                        last_layer=self.get_last_layer(),\n                                        split=\"val\" + suffix,\n                                        predicted_indices=None,\n                                        masks=m\n                                        )\n\n        discloss, log_dict_disc = self.loss(qloss, x, xrec, 1,\n                                            self.global_step,\n                                            last_layer=self.get_last_layer(),\n                                            split=\"val\" + suffix,\n                                            predicted_indices=None,\n                                            masks=m\n                                            )\n        rec_loss = log_dict_ae[f\"val{suffix}/rec_loss\"]\n        self.log(f\"val{suffix}/rec_loss\", rec_loss,\n                 prog_bar=True, logger=True, on_step=False, on_epoch=True, sync_dist=True)\n        self.log(f\"val{suffix}/aeloss\", aeloss,\n                 prog_bar=True, logger=True, on_step=False, on_epoch=True, sync_dist=True)\n        del log_dict_ae[f\"val{suffix}/rec_loss\"]\n        self.log_dict(log_dict_ae)\n        self.log_dict(log_dict_disc)\n        return self.log_dict\n\n    def configure_optimizers(self):\n        lr_d = self.learning_rate\n        lr_g = self.lr_g_factor * self.learning_rate\n        # print(\"lr_d\", lr_d)\n        # print(\"lr_g\", lr_g)\n        opt_ae = torch.optim.Adam(list(self.encoder.parameters()) +\n                                  list(self.decoder.parameters()) +\n                                  list(self.quantize.parameters()) +\n                                  list(self.quant_conv.parameters()) +\n                                  list(self.post_quant_conv.parameters()),\n                                  lr=lr_g, betas=(0.5, 0.9))\n        opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),\n                                    lr=lr_d, betas=(0.5, 0.9))\n\n        if self.scheduler_config is not None:\n            scheduler = instantiate_from_config(self.scheduler_config)\n\n            print(\"Setting up LambdaLR scheduler...\")\n            scheduler = [\n                {\n                    'scheduler': LambdaLR(opt_ae, lr_lambda=scheduler.schedule),\n                    'interval': 'step',\n                    'frequency': 1\n                },\n                {\n                    'scheduler': LambdaLR(opt_disc, lr_lambda=scheduler.schedule),\n                    'interval': 'step',\n                    'frequency': 1\n                },\n            ]\n            return [opt_ae, opt_disc], scheduler\n        return [opt_ae, opt_disc], []\n\n    def get_last_layer(self):\n        return self.decoder.conv_out.weight\n\n    @torch.no_grad()\n    def log_images(self, batch, only_inputs=False, plot_ema=False, **kwargs):\n        
log = dict()\n        x = self.get_input(batch, self.image_key)\n        x = x.to(self.device)\n        if only_inputs:\n            log[\"inputs\"] = x\n            return log\n        xrec, _ = self(x)\n        if self.use_mask:\n            mask = xrec[:, 1:2] < 0.\n            xrec = xrec[:, 0:1]\n            xrec[mask] = -1.\n        log[\"inputs\"] = x\n        log[\"reconstructions\"] = xrec\n        if plot_ema:\n            with self.ema_scope():\n                xrec_ema, _ = self(x)\n                log[\"reconstructions_ema\"] = xrec_ema\n        return log\n\n    def to_rgb(self, x):\n        assert self.image_key == \"segmentation\"\n        if not hasattr(self, \"colorize\"):\n            self.register_buffer(\"colorize\", torch.randn(3, x.shape[1], 1, 1).to(x))\n        x = F.conv2d(x, weight=self.colorize)\n        x = 2. * (x - x.min()) / (x.max() - x.min()) - 1.\n        return x\n\n\nclass VQModelInterface(VQModel):\n    def __init__(self, embed_dim, *args, **kwargs):\n        super().__init__(embed_dim=embed_dim, *args, **kwargs)\n        self.embed_dim = embed_dim\n\n    def encode(self, x):\n        h = self.encoder(x)\n        h = self.quant_conv(h)\n        return h\n\n    def decode(self, h, force_not_quantize=False):\n        # also go through quantization layer\n        if not force_not_quantize:\n            quant, emb_loss, info = self.quantize(h)\n        else:\n            quant = h\n        quant = self.post_quant_conv(quant)\n        dec = self.decoder(quant)\n        if self.use_mask:\n            mask = dec[:, 1:2] < 0.\n            dec = dec[:, 0:1]\n            dec[mask] = -1.\n        return dec\n\n\nclass AutoencoderKL(pl.LightningModule):\n    def __init__(self,\n                 ddconfig,\n                 lossconfig,\n                 embed_dim,\n                 ckpt_path=None,\n                 ignore_keys=[],\n                 image_key=\"image\",\n                 colorize_nlabels=None,\n                 monitor=None,\n                 lib_name='ldm',\n                 use_mask=False\n                 ):\n        super().__init__()\n        self.image_key = image_key\n        self.use_mask = use_mask\n        model_lib = eval(f'model_{lib_name}')\n        self.encoder = model_lib.Encoder(**ddconfig)\n        self.decoder = model_lib.Decoder(**ddconfig)\n        self.loss = instantiate_from_config(lossconfig)\n        assert ddconfig[\"double_z\"]\n        self.quant_conv = torch.nn.Conv2d(2 * ddconfig[\"z_channels\"], 2 * embed_dim, 1)\n        self.post_quant_conv = torch.nn.Conv2d(embed_dim, ddconfig[\"z_channels\"], 1)\n        self.embed_dim = embed_dim\n        if colorize_nlabels is not None:\n            assert type(colorize_nlabels) == int\n            self.register_buffer(\"colorize\", torch.randn(3, colorize_nlabels, 1, 1))\n        if monitor is not None:\n            self.monitor = monitor\n        if ckpt_path is not None:\n            self.init_from_ckpt(ckpt_path, ignore_keys=ignore_keys)\n\n    def init_from_ckpt(self, path, ignore_keys=list()):\n        sd = torch.load(path, map_location=\"cpu\")[\"state_dict\"]\n        keys = list(sd.keys())\n        for k in keys:\n            for ik in ignore_keys:\n                if k.startswith(ik):\n                    print(\"Deleting key {} from state_dict.\".format(k))\n                    del sd[k]\n        self.load_state_dict(sd, strict=False)\n        print(f\"Restored from {path}\")\n\n    def encode(self, x):\n        h = self.encoder(x)\n        moments = self.quant_conv(h)\n 
       posterior = DiagonalGaussianDistribution(moments)\n        return posterior\n\n    def decode(self, z):\n        z = self.post_quant_conv(z)\n        dec = self.decoder(z)\n        return dec\n\n    def forward(self, input, sample_posterior=True):\n        posterior = self.encode(input)\n        if sample_posterior:\n            z = posterior.sample()\n        else:\n            z = posterior.mode()\n        dec = self.decode(z)\n        return dec, posterior\n\n    def get_input(self, batch, k):\n        x = batch[k]\n        if len(x.shape) == 3:\n            x = x[:, None]\n        return x\n\n    def training_step(self, batch, batch_idx, optimizer_idx):\n        inputs = self.get_input(batch, self.image_key)\n        reconstructions, posterior = self(inputs)\n\n        if optimizer_idx == 0:\n            # train encoder+decoder+logvar\n            aeloss, log_dict_ae = self.loss(inputs, reconstructions, posterior, optimizer_idx, self.global_step,\n                                            last_layer=self.get_last_layer(), split=\"train\")\n            self.log(\"aeloss\", aeloss, prog_bar=True, logger=True, on_step=True, on_epoch=True)\n            self.log_dict(log_dict_ae, prog_bar=False, logger=True, on_step=True, on_epoch=False)\n            return aeloss\n\n        if optimizer_idx == 1:\n            # train the discriminator\n            discloss, log_dict_disc = self.loss(inputs, reconstructions, posterior, optimizer_idx, self.global_step,\n                                                last_layer=self.get_last_layer(), split=\"train\")\n\n            self.log(\"discloss\", discloss, prog_bar=True, logger=True, on_step=True, on_epoch=True)\n            self.log_dict(log_dict_disc, prog_bar=False, logger=True, on_step=True, on_epoch=False)\n            return discloss\n\n    def validation_step(self, batch, batch_idx):\n        inputs = self.get_input(batch, self.image_key)\n        reconstructions, posterior = self(inputs)\n        aeloss, log_dict_ae = self.loss(inputs, reconstructions, posterior, 0, self.global_step,\n                                        last_layer=self.get_last_layer(), split=\"val\")\n        discloss, log_dict_disc = self.loss(inputs, reconstructions, posterior, 1, self.global_step,\n                                            last_layer=self.get_last_layer(), split=\"val\")\n\n        self.log(\"val/rec_loss\", log_dict_ae[\"val/rec_loss\"])\n        self.log_dict(log_dict_ae)\n        self.log_dict(log_dict_disc)\n        return self.log_dict\n\n    def configure_optimizers(self):\n        lr = self.learning_rate\n        opt_ae = torch.optim.Adam(list(self.encoder.parameters()) +\n                                  list(self.decoder.parameters()) +\n                                  list(self.quant_conv.parameters()) +\n                                  list(self.post_quant_conv.parameters()),\n                                  lr=lr, betas=(0.5, 0.9))\n        opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),\n                                    lr=lr, betas=(0.5, 0.9))\n        return [opt_ae, opt_disc], []\n\n    def get_last_layer(self):\n        return self.decoder.conv_out.weight\n\n    @torch.no_grad()\n    def log_images(self, batch, only_inputs=False, **kwargs):\n        log = dict()\n        x = self.get_input(batch, self.image_key)\n        x = x.to(self.device)\n        if not only_inputs:\n            xrec, posterior = self(x)\n            if x.shape[1] > 3:\n                # colorize with random projection\n     
           assert xrec.shape[1] > 3\n                x = self.to_rgb(x)\n                xrec = self.to_rgb(xrec)\n            log[\"samples\"] = self.decode(torch.randn_like(posterior.sample()))\n            log[\"reconstructions\"] = xrec\n        log[\"inputs\"] = x\n        return log\n\n    def to_rgb(self, x):\n        assert self.image_key == \"segmentation\"\n        if not hasattr(self, \"colorize\"):\n            self.register_buffer(\"colorize\", torch.randn(3, x.shape[1], 1, 1).to(x))\n        x = F.conv2d(x, weight=self.colorize)\n        x = 2. * (x - x.min()) / (x.max() - x.min()) - 1.\n        return x\n\n\nclass IdentityFirstStage(torch.nn.Module):\n    def __init__(self, *args, vq_interface=False, **kwargs):\n        self.vq_interface = vq_interface\n        super().__init__()\n\n    def encode(self, x, *args, **kwargs):\n        return x\n\n    def decode(self, x, *args, **kwargs):\n        return x\n\n    def quantize(self, x, *args, **kwargs):\n        if self.vq_interface:\n            return x, None, [None, None, None]\n        return x\n\n    def forward(self, x, *args, **kwargs):\n        return x\n"
  },
  {
    "path": "lidm/models/diffusion/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/models/diffusion/classifier.py",
    "content": "import os\nimport torch\nimport pytorch_lightning as pl\nfrom omegaconf import OmegaConf\nfrom torch.nn import functional as F\nfrom torch.optim import AdamW\nfrom torch.optim.lr_scheduler import LambdaLR\nfrom copy import deepcopy\nfrom einops import rearrange\nfrom glob import glob\nfrom natsort import natsorted\n\nfrom ...modules.diffusion.openaimodel import EncoderUNetModel, UNetModel\nfrom ...utils.misc_utils import log_txt_as_img, default, ismap, instantiate_from_config\n\n__models__ = {\n    'class_label': EncoderUNetModel,\n    'segmentation': UNetModel\n}\n\n\ndef disabled_train(self, mode=True):\n    \"\"\"Overwrite model.train with this function to make sure train/eval mode\n    does not change anymore.\"\"\"\n    return self\n\n\nclass NoisyLatentImageClassifier(pl.LightningModule):\n\n    def __init__(self,\n                 diffusion_path,\n                 num_classes,\n                 ckpt_path=None,\n                 pool='attention',\n                 label_key=None,\n                 diffusion_ckpt_path=None,\n                 scheduler_config=None,\n                 weight_decay=1.e-2,\n                 log_steps=10,\n                 monitor='val/loss',\n                 *args,\n                 **kwargs):\n        super().__init__(*args, **kwargs)\n        self.num_classes = num_classes\n        # get latest config of diffusion model\n        diffusion_config = natsorted(glob(os.path.join(diffusion_path, 'configs', '*-project.yaml')))[-1]\n        self.diffusion_config = OmegaConf.load(diffusion_config).model\n        self.diffusion_config.params.ckpt_path = diffusion_ckpt_path\n        self.load_diffusion()\n\n        self.monitor = monitor\n        self.numd = self.diffusion_model.first_stage_model.encoder.num_resolutions - 1\n        self.log_time_interval = self.diffusion_model.num_timesteps // log_steps\n        self.log_steps = log_steps\n\n        self.label_key = label_key if not hasattr(self.diffusion_model, 'cond_stage_key') \\\n            else self.diffusion_model.cond_stage_key\n\n        assert self.label_key is not None, 'label_key neither in diffusion model nor in model.params'\n\n        if self.label_key not in __models__:\n            raise NotImplementedError()\n\n        self.load_classifier(ckpt_path, pool)\n\n        self.scheduler_config = scheduler_config\n        self.use_scheduler = self.scheduler_config is not None\n        self.weight_decay = weight_decay\n\n    def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):\n        sd = torch.load(path, map_location=\"cpu\")\n        if \"state_dict\" in list(sd.keys()):\n            sd = sd[\"state_dict\"]\n        keys = list(sd.keys())\n        for k in keys:\n            for ik in ignore_keys:\n                if k.startswith(ik):\n                    print(\"Deleting key {} from state_dict.\".format(k))\n                    del sd[k]\n        missing, unexpected = self.load_state_dict(sd, strict=False) if not only_model else self.model.load_state_dict(\n            sd, strict=False)\n        print(f\"Restored from {path} with {len(missing)} missing and {len(unexpected)} unexpected keys\")\n        if len(missing) > 0:\n            print(f\"Missing Keys: {missing}\")\n        if len(unexpected) > 0:\n            print(f\"Unexpected Keys: {unexpected}\")\n\n    def load_diffusion(self):\n        model = instantiate_from_config(self.diffusion_config)\n        self.diffusion_model = model.eval()\n        self.diffusion_model.train = disabled_train\n        for 
param in self.diffusion_model.parameters():\n            param.requires_grad = False\n\n    def load_classifier(self, ckpt_path, pool):\n        model_config = deepcopy(self.diffusion_config.params.unet_config.params)\n        model_config.in_channels = self.diffusion_config.params.unet_config.params.out_channels\n        model_config.out_channels = self.num_classes\n        if self.label_key == 'class_label':\n            model_config.pool = pool\n\n        self.model = __models__[self.label_key](**model_config)\n        if ckpt_path is not None:\n            print('#####################################################################')\n            print(f'load from ckpt \"{ckpt_path}\"')\n            print('#####################################################################')\n            self.init_from_ckpt(ckpt_path)\n\n    @torch.no_grad()\n    def get_x_noisy(self, x, t, noise=None):\n        noise = default(noise, lambda: torch.randn_like(x))\n        continuous_sqrt_alpha_cumprod = None\n        if self.diffusion_model.use_continuous_noise:\n            continuous_sqrt_alpha_cumprod = self.diffusion_model.sample_continuous_noise_level(x.shape[0], t + 1)\n            # todo: make sure t+1 is correct here\n\n        return self.diffusion_model.q_sample(x_start=x, t=t, noise=noise,\n                                             continuous_sqrt_alpha_cumprod=continuous_sqrt_alpha_cumprod)\n\n    def forward(self, x_noisy, t, *args, **kwargs):\n        return self.model(x_noisy, t)\n\n    @torch.no_grad()\n    def get_input(self, batch, k):\n        x = batch[k]\n        if len(x.shape) == 3:\n            x = x[..., None]\n        x = rearrange(x, 'b h w c -> b c h w')\n        x = x.to(memory_format=torch.contiguous_format).float()\n        return x\n\n    @torch.no_grad()\n    def get_conditioning(self, batch, k=None):\n        if k is None:\n            k = self.label_key\n        assert k is not None, 'Needs to provide label key'\n\n        targets = batch[k].to(self.device)\n\n        if self.label_key == 'segmentation':\n            targets = rearrange(targets, 'b h w c -> b c h w')\n            for down in range(self.numd):\n                h, w = targets.shape[-2:]\n                targets = F.interpolate(targets, size=(h // 2, w // 2), mode='nearest')\n\n            # targets = rearrange(targets,'b c h w -> b h w c')\n\n        return targets\n\n    def compute_top_k(self, logits, labels, k, reduction=\"mean\"):\n        _, top_ks = torch.topk(logits, k, dim=1)\n        if reduction == \"mean\":\n            return (top_ks == labels[:, None]).float().sum(dim=-1).mean().item()\n        elif reduction == \"none\":\n            return (top_ks == labels[:, None]).float().sum(dim=-1)\n\n    def on_train_epoch_start(self):\n        # save some memory\n        self.diffusion_model.model.to('cpu')\n\n    @torch.no_grad()\n    def write_logs(self, loss, logits, targets):\n        log_prefix = 'train' if self.training else 'val'\n        log = {}\n        log[f\"{log_prefix}/loss\"] = loss.mean()\n        log[f\"{log_prefix}/acc@1\"] = self.compute_top_k(\n            logits, targets, k=1, reduction=\"mean\"\n        )\n        log[f\"{log_prefix}/acc@5\"] = self.compute_top_k(\n            logits, targets, k=5, reduction=\"mean\"\n        )\n\n        self.log_dict(log, prog_bar=False, logger=True, on_step=self.training, on_epoch=True)\n        self.log('loss', log[f\"{log_prefix}/loss\"], prog_bar=True, logger=False)\n        self.log('global_step', self.global_step, logger=False, 
on_epoch=False, prog_bar=True)\n        lr = self.optimizers().param_groups[0]['lr']\n        self.log('lr_abs', lr, on_step=True, logger=True, on_epoch=False, prog_bar=True)\n\n    def shared_step(self, batch, t=None):\n        x, *_ = self.diffusion_model.get_input(batch, k=self.diffusion_model.first_stage_key)\n        targets = self.get_conditioning(batch)\n        if targets.dim() == 4:\n            targets = targets.argmax(dim=1)\n        if t is None:\n            t = torch.randint(0, self.diffusion_model.num_timesteps, (x.shape[0],), device=self.device).long()\n        else:\n            t = torch.full(size=(x.shape[0],), fill_value=t, device=self.device).long()\n        x_noisy = self.get_x_noisy(x, t)\n        logits = self(x_noisy, t)\n\n        loss = F.cross_entropy(logits, targets, reduction='none')\n\n        self.write_logs(loss.detach(), logits.detach(), targets.detach())\n\n        loss = loss.mean()\n        return loss, logits, x_noisy, targets\n\n    def training_step(self, batch, batch_idx):\n        loss, *_ = self.shared_step(batch)\n        return loss\n\n    def reset_noise_accs(self):\n        self.noisy_acc = {t: {'acc@1': [], 'acc@5': []} for t in\n                          range(0, self.diffusion_model.num_timesteps, self.diffusion_model.log_every_t)}\n\n    def on_validation_start(self):\n        self.reset_noise_accs()\n\n    @torch.no_grad()\n    def validation_step(self, batch, batch_idx):\n        loss, *_ = self.shared_step(batch)\n\n        for t in self.noisy_acc:\n            _, logits, _, targets = self.shared_step(batch, t)\n            self.noisy_acc[t]['acc@1'].append(self.compute_top_k(logits, targets, k=1, reduction='mean'))\n            self.noisy_acc[t]['acc@5'].append(self.compute_top_k(logits, targets, k=5, reduction='mean'))\n\n        return loss\n\n    def configure_optimizers(self):\n        optimizer = AdamW(self.model.parameters(), lr=self.learning_rate, weight_decay=self.weight_decay)\n\n        if self.use_scheduler:\n            scheduler = instantiate_from_config(self.scheduler_config)\n\n            print(\"Setting up LambdaLR scheduler...\")\n            scheduler = [\n                {\n                    'scheduler': LambdaLR(optimizer, lr_lambda=scheduler.schedule),\n                    'interval': 'step',\n                    'frequency': 1\n                }]\n            return [optimizer], scheduler\n\n        return optimizer\n\n    @torch.no_grad()\n    def log_images(self, batch, N=8, *args, **kwargs):\n        log = dict()\n        x = self.get_input(batch, self.diffusion_model.first_stage_key)\n        log['inputs'] = x\n\n        y = self.get_conditioning(batch)\n\n        if self.label_key == 'class_label':\n            y = log_txt_as_img((x.shape[2], x.shape[3]), batch[\"human_label\"])\n            log['labels'] = y\n\n        if ismap(y):\n            log['labels'] = self.diffusion_model.to_rgb(y)\n\n            for step in range(self.log_steps):\n                current_time = step * self.log_time_interval\n\n                _, logits, x_noisy, _ = self.shared_step(batch, t=current_time)\n\n                log[f'inputs@t{current_time}'] = x_noisy\n\n                pred = F.one_hot(logits.argmax(dim=1), num_classes=self.num_classes)\n                pred = rearrange(pred, 'b h w c -> b c h w')\n\n                log[f'pred@t{current_time}'] = self.diffusion_model.to_rgb(pred)\n\n        for key in log:\n            log[key] = log[key][:N]\n\n        return log\n"
  },
  {
    "path": "lidm/models/diffusion/ddim.py",
    "content": "\"\"\"SAMPLING ONLY.\"\"\"\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom functools import partial\n\nfrom ...modules.basic import make_ddim_sampling_parameters, make_ddim_timesteps, noise_like\nfrom ...utils.misc_utils import print_fn\n\n\nclass DDIMSampler(object):\n    def __init__(self, model, schedule=\"linear\", **kwargs):\n        super().__init__()\n        self.model = model\n        self.ddpm_num_timesteps = model.num_timesteps\n        self.schedule = schedule\n\n    def register_buffer(self, name, attr):\n        if type(attr) == torch.Tensor:\n            if attr.device != torch.device(\"cuda\"):\n                attr = attr.to(torch.device(\"cuda\"))\n        setattr(self, name, attr)\n\n    def make_schedule(self, ddim_num_steps, ddim_discretize=\"uniform\", ddim_eta=0., verbose=False):\n        self.ddim_timesteps = make_ddim_timesteps(ddim_discr_method=ddim_discretize, num_ddim_timesteps=ddim_num_steps,\n                                                  num_ddpm_timesteps=self.ddpm_num_timesteps, verbose=verbose)\n        alphas_cumprod = self.model.alphas_cumprod\n        assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'\n        to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.model.device)\n\n        self.register_buffer('betas', to_torch(self.model.betas))\n        self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))\n        self.register_buffer('alphas_cumprod_prev', to_torch(self.model.alphas_cumprod_prev))\n\n        # calculations for diffusion q(x_t | x_{t-1}) and others\n        self.register_buffer('sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod.cpu())))\n        self.register_buffer('sqrt_one_minus_alphas_cumprod', to_torch(np.sqrt(1. - alphas_cumprod.cpu())))\n        self.register_buffer('log_one_minus_alphas_cumprod', to_torch(np.log(1. - alphas_cumprod.cpu())))\n        self.register_buffer('sqrt_recip_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu())))\n        self.register_buffer('sqrt_recipm1_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu() - 1)))\n\n        # ddim sampling parameters\n        ddim_sigmas, ddim_alphas, ddim_alphas_prev = make_ddim_sampling_parameters(alphacums=alphas_cumprod.cpu(),\n                                                                                   ddim_timesteps=self.ddim_timesteps,\n                                                                                   eta=ddim_eta, verbose=verbose)\n        self.register_buffer('ddim_sigmas', ddim_sigmas)\n        self.register_buffer('ddim_alphas', ddim_alphas)\n        self.register_buffer('ddim_alphas_prev', ddim_alphas_prev)\n        self.register_buffer('ddim_sqrt_one_minus_alphas', np.sqrt(1. 
- ddim_alphas))\n        sigmas_for_original_sampling_steps = ddim_eta * torch.sqrt(\n            (1 - self.alphas_cumprod_prev) / (1 - self.alphas_cumprod) * (\n                    1 - self.alphas_cumprod / self.alphas_cumprod_prev))\n        self.register_buffer('ddim_sigmas_for_original_num_steps', sigmas_for_original_sampling_steps)\n\n    @torch.no_grad()\n    def sample(self,\n               S,\n               batch_size,\n               shape,\n               conditioning=None,\n               callback=None,\n               normals_sequence=None,\n               img_callback=None,\n               quantize_x0=False,\n               eta=0.,\n               mask=None,\n               x0=None,\n               temperature=1.,\n               noise_dropout=0.,\n               score_corrector=None,\n               corrector_kwargs=None,\n               verbose=False,\n               disable_tqdm=True,\n               x_T=None,\n               log_every_t=100,\n               unconditional_guidance_scale=1.,\n               unconditional_conditioning=None,\n               # this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...\n               **kwargs\n               ):\n        if conditioning is not None:\n            if isinstance(conditioning, dict):\n                cbs = conditioning[list(conditioning.keys())[0]].shape[0]\n                if cbs != batch_size:\n                    print(f\"Warning: Got {cbs} conditionings but batch-size is {batch_size}\")\n            else:\n                if conditioning.shape[0] != batch_size:\n                    print(f\"Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}\")\n\n        self.make_schedule(ddim_num_steps=S, ddim_eta=eta, verbose=verbose)\n        # sampling\n        C, H, W = shape\n        size = (batch_size, C, H, W)\n        print_fn(f'Data shape for DDIM sampling is {size}, eta {eta}', verbose)\n\n        samples, intermediates = self.ddim_sampling(conditioning, size,\n                                                    callback=callback,\n                                                    img_callback=img_callback,\n                                                    quantize_denoised=quantize_x0,\n                                                    mask=mask, x0=x0,\n                                                    ddim_use_original_steps=False,\n                                                    noise_dropout=noise_dropout,\n                                                    temperature=temperature,\n                                                    score_corrector=score_corrector,\n                                                    corrector_kwargs=corrector_kwargs,\n                                                    x_T=x_T,\n                                                    log_every_t=log_every_t,\n                                                    unconditional_guidance_scale=unconditional_guidance_scale,\n                                                    unconditional_conditioning=unconditional_conditioning,\n                                                    verbose=verbose, disable_tqdm=disable_tqdm)\n        return samples, intermediates\n\n    @torch.no_grad()\n    def ddim_sampling(self, cond, shape,\n                      x_T=None, ddim_use_original_steps=False,\n                      callback=None, timesteps=None, quantize_denoised=False,\n                      mask=None, x0=None, img_callback=None, log_every_t=100,\n                  
    temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,\n                      unconditional_guidance_scale=1., unconditional_conditioning=None, verbose=False, disable_tqdm=True):\n        device = self.model.betas.device\n        b = shape[0]\n        if x_T is None:\n            img = torch.randn(shape, device=device)\n        else:\n            img = x_T\n\n        if timesteps is None:\n            timesteps = self.ddpm_num_timesteps if ddim_use_original_steps else self.ddim_timesteps\n        elif timesteps is not None and not ddim_use_original_steps:\n            subset_end = int(min(timesteps / self.ddim_timesteps.shape[0], 1) * self.ddim_timesteps.shape[0]) - 1\n            timesteps = self.ddim_timesteps[:subset_end]\n\n        intermediates = {'x_inter': [img], 'pred_x0': [img]}\n        time_range = reversed(range(0, timesteps)) if ddim_use_original_steps else np.flip(timesteps)\n        total_steps = timesteps if ddim_use_original_steps else timesteps.shape[0]\n        print_fn(f\"Running DDIM Sampling with {total_steps} timesteps\", verbose)\n\n        iterator = tqdm(time_range, desc='DDIM Sampler', total=total_steps, disable=disable_tqdm)\n\n        for i, step in enumerate(iterator):\n            index = total_steps - i - 1\n            ts = torch.full((b,), step, device=device, dtype=torch.long)\n\n            if mask is not None:\n                assert x0 is not None\n                img_orig = self.model.q_sample(x0, ts)\n                img = img_orig * mask + (1. - mask) * img\n\n            outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,\n                                      quantize_denoised=quantize_denoised, temperature=temperature,\n                                      noise_dropout=noise_dropout, score_corrector=score_corrector,\n                                      corrector_kwargs=corrector_kwargs,\n                                      unconditional_guidance_scale=unconditional_guidance_scale,\n                                      unconditional_conditioning=unconditional_conditioning)\n            img, pred_x0 = outs\n            if callback: callback(i)\n            if img_callback: img_callback(pred_x0, i)\n\n            if index % log_every_t == 0 or index == total_steps - 1:\n                intermediates['x_inter'].append(img)\n                intermediates['pred_x0'].append(pred_x0)\n\n        return img, intermediates\n\n    @torch.no_grad()\n    def p_sample_ddim(self, x, c, t, index, repeat_noise=False, use_original_steps=False, quantize_denoised=False,\n                      temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,\n                      unconditional_guidance_scale=1., unconditional_conditioning=None):\n        b, *_, device = *x.shape, x.device\n\n        if unconditional_conditioning is None or unconditional_guidance_scale == 1.:\n            e_t = self.model.apply_model(x, t, c)\n        else:\n            x_in = torch.cat([x] * 2)\n            t_in = torch.cat([t] * 2)\n            c_in = torch.cat([unconditional_conditioning, c])\n            e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)\n            e_t = e_t_uncond + unconditional_guidance_scale * (e_t - e_t_uncond)\n\n        if score_corrector is not None:\n            assert self.model.parameterization == \"eps\"\n            e_t = score_corrector.modify_score(self.model, e_t, x, t, c, **corrector_kwargs)\n\n        alphas = self.model.alphas_cumprod 
if use_original_steps else self.ddim_alphas\n        alphas_prev = self.model.alphas_cumprod_prev if use_original_steps else self.ddim_alphas_prev\n        sqrt_one_minus_alphas = self.model.sqrt_one_minus_alphas_cumprod if use_original_steps else self.ddim_sqrt_one_minus_alphas\n        sigmas = self.model.ddim_sigmas_for_original_num_steps if use_original_steps else self.ddim_sigmas\n        # select parameters corresponding to the currently considered timestep\n        a_t = torch.full((b, 1, 1, 1), alphas[index], device=device)\n        a_prev = torch.full((b, 1, 1, 1), alphas_prev[index], device=device)\n        sigma_t = torch.full((b, 1, 1, 1), sigmas[index], device=device)\n        sqrt_one_minus_at = torch.full((b, 1, 1, 1), sqrt_one_minus_alphas[index], device=device)\n\n        # current prediction for x_0\n        pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()\n        if quantize_denoised:\n            pred_x0, _, *_ = self.model.first_stage_model.quantize(pred_x0)\n        # direction pointing to x_t\n        dir_xt = (1. - a_prev - sigma_t ** 2).sqrt() * e_t\n        noise = sigma_t * noise_like(x.shape, device, repeat_noise) * temperature\n        if noise_dropout > 0.:\n            noise = torch.nn.functional.dropout(noise, p=noise_dropout)\n        x_prev = a_prev.sqrt() * pred_x0 + dir_xt + noise\n        return x_prev, pred_x0\n"
  },
  {
    "path": "lidm/models/diffusion/ddpm.py",
    "content": "\"\"\"\nwild mixture of\nhttps://github.com/lucidrains/denoising-diffusion-pytorch/blob/7706bdfc6f527f58d33f84b7b522e61e6e3164b3/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py\nhttps://github.com/openai/improved-diffusion/blob/e94489283bb876ac1477d5dd7709bbbd2d9902ce/improved_diffusion/gaussian_diffusion.py\nhttps://github.com/CompVis/taming-transformers\n-- merci\n\"\"\"\n\nimport torch\nimport torch.nn as nn\nimport numpy as np\nimport pytorch_lightning as pl\nfrom torch.optim.lr_scheduler import LambdaLR\nfrom einops import rearrange, repeat\nfrom contextlib import contextmanager\nfrom functools import partial\nfrom tqdm import tqdm\nfrom torchvision.utils import make_grid\nfrom pytorch_lightning.utilities.distributed import rank_zero_only\n\nfrom ...utils.misc_utils import log_txt_as_img, exists, default, ismap, isimage, mean_flat, count_params, instantiate_from_config, print_fn\nfrom ...modules.ema import LitEma\nfrom ...modules.distributions.distributions import normal_kl, DiagonalGaussianDistribution\nfrom ...models.autoencoder import VQModelInterface, IdentityFirstStage, AutoencoderKL\nfrom ...modules.basic import make_beta_schedule, extract_into_tensor, noise_like\nfrom ...models.diffusion.ddim import DDIMSampler\n\n__conditioning_keys__ = {'concat': 'c_concat',\n                         'crossattn': 'c_crossattn',\n                         'adm': 'y'}\n\n\ndef disabled_train(self, mode=True):\n    \"\"\"Overwrite model.train with this function to make sure train/eval mode\n    does not change anymore.\"\"\"\n    return self\n\n\ndef uniform_on_device(r1, r2, shape, device):\n    return (r1 - r2) * torch.rand(*shape, device=device) + r2\n\n\nclass DDPM(pl.LightningModule):\n    # classic DDPM with Gaussian diffusion, in image space\n    def __init__(self,\n                 unet_config,\n                 timesteps=1000,\n                 beta_schedule=\"linear\",\n                 loss_type=\"l2\",\n                 ckpt_path=None,\n                 ignore_keys=[],\n                 load_only_unet=False,\n                 monitor=\"val/loss\",\n                 use_ema=True,\n                 first_stage_key=\"image\",\n                 image_size=[256, 256],\n                 channels=3,\n                 log_every_t=100,\n                 clip_denoised=True,\n                 linear_start=1e-4,\n                 linear_end=2e-2,\n                 cosine_s=8e-3,\n                 given_betas=None,\n                 original_elbo_weight=0.,\n                 v_posterior=0.,  # weight for choosing posterior variance as sigma = (1-v) * beta_tilde + v * beta\n                 l_simple_weight=1.,\n                 conditioning_key=None,\n                 parameterization=\"eps\",  # all assuming fixed variance schedules\n                 scheduler_config=None,\n                 use_positional_encodings=False,\n                 learn_logvar=False,\n                 logvar_init=0.,\n                 verbose=False\n                 ):\n        super().__init__()\n        assert parameterization in [\"eps\", \"x0\"], 'currently only supporting \"eps\" and \"x0\"'\n        self.print_fn = partial(print_fn, verbose=verbose)\n        self.verbose = verbose\n        self.parameterization = parameterization\n        self.print_fn(f\"{self.__class__.__name__}: Running in {self.parameterization}-prediction mode\")\n        self.cond_stage_model = None\n        self.clip_denoised = clip_denoised\n        self.log_every_t = log_every_t\n        self.first_stage_key = 
first_stage_key\n        self.image_size = image_size  # try conv?\n        self.channels = channels\n        self.use_positional_encodings = use_positional_encodings\n        self.model = DiffusionWrapper(unet_config, conditioning_key)\n        count_params(self.model, verbose=True)\n        self.use_ema = use_ema\n        if self.use_ema:\n            self.model_ema = LitEma(self.model)\n            self.print_fn(f\"Keeping EMAs of {len(list(self.model_ema.buffers()))}.\")\n\n        self.use_scheduler = scheduler_config is not None\n        if self.use_scheduler:\n            self.scheduler_config = scheduler_config\n\n        self.v_posterior = v_posterior\n        self.original_elbo_weight = original_elbo_weight\n        self.l_simple_weight = l_simple_weight\n\n        if monitor is not None:\n            self.monitor = monitor\n        if ckpt_path is not None:\n            self.init_from_ckpt(ckpt_path, ignore_keys=ignore_keys, only_model=load_only_unet)\n\n        self.register_schedule(given_betas=given_betas, beta_schedule=beta_schedule, timesteps=timesteps,\n                               linear_start=linear_start, linear_end=linear_end, cosine_s=cosine_s)\n\n        self.loss_type = loss_type\n\n        self.learn_logvar = learn_logvar\n        self.logvar = torch.full(fill_value=logvar_init, size=(self.num_timesteps,))\n        self.logvar = nn.Parameter(self.logvar, requires_grad=self.learn_logvar)\n\n    def register_schedule(self, given_betas=None, beta_schedule=\"linear\", timesteps=1000,\n                          linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3):\n        if exists(given_betas):\n            betas = given_betas\n        else:\n            betas = make_beta_schedule(beta_schedule, timesteps, linear_start=linear_start, linear_end=linear_end,\n                                       cosine_s=cosine_s)\n        alphas = 1. - betas\n        alphas_cumprod = np.cumprod(alphas, axis=0)\n        alphas_cumprod_prev = np.append(1., alphas_cumprod[:-1])\n\n        timesteps, = betas.shape\n        self.num_timesteps = int(timesteps)\n        self.linear_start = linear_start\n        self.linear_end = linear_end\n        assert alphas_cumprod.shape[0] == self.num_timesteps, 'alphas have to be defined for each timestep'\n\n        to_torch = partial(torch.tensor, dtype=torch.float32)\n\n        self.register_buffer('betas', to_torch(betas))\n        self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))\n        self.register_buffer('alphas_cumprod_prev', to_torch(alphas_cumprod_prev))\n\n        # calculations for diffusion q(x_t | x_{t-1}) and others\n        self.register_buffer('sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod)))\n        self.register_buffer('sqrt_one_minus_alphas_cumprod', to_torch(np.sqrt(1. - alphas_cumprod)))\n        self.register_buffer('log_one_minus_alphas_cumprod', to_torch(np.log(1. - alphas_cumprod)))\n        self.register_buffer('sqrt_recip_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod)))\n        self.register_buffer('sqrt_recipm1_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod - 1)))\n\n        # calculations for posterior q(x_{t-1} | x_t, x_0)\n        posterior_variance = (1 - self.v_posterior) * betas * (1. - alphas_cumprod_prev) / (\n                1. - alphas_cumprod) + self.v_posterior * betas\n        # above: equal to 1. / (1. / (1. 
- alpha_cumprod_tm1) + alpha_t / beta_t)\n        self.register_buffer('posterior_variance', to_torch(posterior_variance))\n        # below: log calculation clipped because the posterior variance is 0 at the beginning of the diffusion chain\n        self.register_buffer('posterior_log_variance_clipped', to_torch(np.log(np.maximum(posterior_variance, 1e-20))))\n        self.register_buffer('posterior_mean_coef1', to_torch(\n            betas * np.sqrt(alphas_cumprod_prev) / (1. - alphas_cumprod)))\n        self.register_buffer('posterior_mean_coef2', to_torch(\n            (1. - alphas_cumprod_prev) * np.sqrt(alphas) / (1. - alphas_cumprod)))\n\n        if self.parameterization == \"eps\":\n            lvlb_weights = self.betas ** 2 / (\n                    2 * self.posterior_variance * to_torch(alphas) * (1 - self.alphas_cumprod))\n        elif self.parameterization == \"x0\":\n            lvlb_weights = 0.5 * np.sqrt(torch.Tensor(alphas_cumprod)) / (2. * 1 - torch.Tensor(alphas_cumprod))\n        else:\n            raise NotImplementedError(\"mu not supported\")\n\n        lvlb_weights[0] = lvlb_weights[1]\n        self.register_buffer('lvlb_weights', lvlb_weights, persistent=False)\n        assert not torch.isnan(self.lvlb_weights).all()\n\n    @contextmanager\n    def ema_scope(self, context=None):\n        if self.use_ema:\n            self.model_ema.store(self.model.parameters())\n            self.model_ema.copy_to(self.model)\n            if context is not None:\n                self.print_fn(f\"{context}: Switched to EMA weights\")\n        try:\n            yield None\n        finally:\n            if self.use_ema:\n                self.model_ema.restore(self.model.parameters())\n                if context is not None:\n                    self.print_fn(f\"{context}: Restored training weights\")\n\n    def init_from_ckpt(self, path, ignore_keys=list(), only_model=False):\n        sd = torch.load(path, map_location=\"cpu\")\n        if \"state_dict\" in list(sd.keys()):\n            sd = sd[\"state_dict\"]\n        keys = list(sd.keys())\n        for k in keys:\n            for ik in ignore_keys:\n                if k.startswith(ik):\n                    self.print_fn(\"Deleting key {} from state_dict.\".format(k))\n                    del sd[k]\n        missing, unexpected = self.load_state_dict(sd, strict=False) if not only_model else self.model.load_state_dict(\n            sd, strict=False)\n        self.print_fn(f\"Restored from {path} with {len(missing)} missing and {len(unexpected)} unexpected keys\")\n        if len(missing) > 0:\n            self.print_fn(f\"Missing Keys: {missing}\")\n        if len(unexpected) > 0:\n            self.print_fn(f\"Unexpected Keys: {unexpected}\")\n\n    def q_mean_variance(self, x_start, t):\n        \"\"\"\n        Get the distribution q(x_t | x_0).\n        :param x_start: the [N x C x ...] tensor of noiseless inputs.\n        :param t: the number of diffusion steps (minus 1). 
Here, 0 means one step.\n        :return: A tuple (mean, variance, log_variance), all of x_start's shape.\n        \"\"\"\n        mean = (extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start)\n        variance = extract_into_tensor(1.0 - self.alphas_cumprod, t, x_start.shape)\n        log_variance = extract_into_tensor(self.log_one_minus_alphas_cumprod, t, x_start.shape)\n        return mean, variance, log_variance\n\n    def predict_start_from_noise(self, x_t, t, noise):\n        return (\n                extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t -\n                extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape) * noise\n        )\n\n    def q_posterior(self, x_start, x_t, t):\n        posterior_mean = (\n                extract_into_tensor(self.posterior_mean_coef1, t, x_t.shape) * x_start +\n                extract_into_tensor(self.posterior_mean_coef2, t, x_t.shape) * x_t\n        )\n        posterior_variance = extract_into_tensor(self.posterior_variance, t, x_t.shape)\n        posterior_log_variance_clipped = extract_into_tensor(self.posterior_log_variance_clipped, t, x_t.shape)\n        return posterior_mean, posterior_variance, posterior_log_variance_clipped\n\n    def p_mean_variance(self, x, t, clip_denoised: bool):\n        model_out = self.model(x, t)\n        if self.parameterization == \"eps\":\n            x_recon = self.predict_start_from_noise(x, t=t, noise=model_out)\n        elif self.parameterization == \"x0\":\n            x_recon = model_out\n        if clip_denoised:\n            x_recon.clamp_(-1., 1.)\n\n        model_mean, posterior_variance, posterior_log_variance = self.q_posterior(x_start=x_recon, x_t=x, t=t)\n        return model_mean, posterior_variance, posterior_log_variance\n\n    @torch.no_grad()\n    def p_sample(self, x, t, clip_denoised=True, repeat_noise=False):\n        b, *_, device = *x.shape, x.device\n        model_mean, _, model_log_variance = self.p_mean_variance(x=x, t=t, clip_denoised=clip_denoised)\n        noise = noise_like(x.shape, device, repeat_noise)\n        # no noise when t == 0\n        nonzero_mask = (1 - (t == 0).float()).reshape(b, *((1,) * (len(x.shape) - 1)))\n        return model_mean + nonzero_mask * (0.5 * model_log_variance).exp() * noise\n\n    @torch.no_grad()\n    def p_sample_loop(self, shape, return_intermediates=False):\n        device = self.betas.device\n        b = shape[0]\n        img = torch.randn(shape, device=device)\n        intermediates = [img]\n        for i in tqdm(reversed(range(0, self.num_timesteps)), desc='Sampling t', total=self.num_timesteps):\n            img = self.p_sample(img, torch.full((b,), i, device=device, dtype=torch.long),\n                                clip_denoised=self.clip_denoised)\n            if i % self.log_every_t == 0 or i == self.num_timesteps - 1:\n                intermediates.append(img)\n        if return_intermediates:\n            return img, intermediates\n        return img\n\n    @torch.no_grad()\n    def sample(self, batch_size=16, return_intermediates=False):\n        image_size = self.image_size\n        channels = self.channels\n        return self.p_sample_loop((batch_size, channels, *image_size),\n                                  return_intermediates=return_intermediates)\n\n    def q_sample(self, x_start, t, noise=None):\n        noise = default(noise, lambda: torch.randn_like(x_start))\n        return (extract_into_tensor(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start +\n         
       extract_into_tensor(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape) * noise)\n\n    def get_loss(self, pred, target, mean=True):\n        if self.loss_type == 'l1':\n            loss = (target - pred).abs()\n            if mean:\n                loss = loss.mean()\n        elif self.loss_type == 'l2':\n            if mean:\n                loss = torch.nn.functional.mse_loss(target, pred)\n            else:\n                loss = torch.nn.functional.mse_loss(target, pred, reduction='none')\n        else:\n            raise NotImplementedError(f\"unknown loss type '{self.loss_type}'\")\n\n        return loss\n\n    def p_losses(self, x_start, t, noise=None):\n        noise = default(noise, lambda: torch.randn_like(x_start))\n        x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)\n        model_out = self.model(x_noisy, t)\n\n        loss_dict = {}\n        if self.parameterization == \"eps\":\n            target = noise\n        elif self.parameterization == \"x0\":\n            target = x_start\n        else:\n            raise NotImplementedError(f\"Parameterization {self.parameterization} not yet supported\")\n\n        loss = self.get_loss(model_out, target, mean=False).mean(dim=[1, 2, 3])\n\n        log_prefix = 'train' if self.training else 'val'\n\n        loss_dict.update({f'{log_prefix}/loss_simple': loss.mean()})\n        loss_simple = loss.mean() * self.l_simple_weight\n\n        loss_vlb = (self.lvlb_weights[t] * loss).mean()\n        loss_dict.update({f'{log_prefix}/loss_vlb': loss_vlb})\n\n        loss = loss_simple + self.original_elbo_weight * loss_vlb\n\n        loss_dict.update({f'{log_prefix}/loss': loss})\n\n        return loss, loss_dict\n\n    def forward(self, x, *args, **kwargs):\n        # b, c, h, w, device, img_size, = *x.shape, x.device, self.image_size\n        # assert h == img_size and w == img_size, f'height and width of image must be {img_size}'\n        t = torch.randint(0, self.num_timesteps, (x.shape[0],), device=self.device).long()\n        return self.p_losses(x, t, *args, **kwargs)\n\n    def get_input(self, batch, k):\n        x = batch[k]\n        # if len(x.shape) == 3:\n        #     x = x[..., None]\n        return x\n\n    def shared_step(self, batch):\n        x = self.get_input(batch, self.first_stage_key)\n        loss, loss_dict = self(x)\n        return loss, loss_dict\n\n    def training_step(self, batch, batch_idx):\n        loss, loss_dict = self.shared_step(batch)\n\n        self.log_dict(loss_dict, prog_bar=True,\n                      logger=True, on_step=True, on_epoch=True)\n\n        self.log(\"global_step\", self.global_step,\n                 prog_bar=True, logger=True, on_step=True, on_epoch=False)\n\n        if self.use_scheduler:\n            lr = self.optimizers().param_groups[0]['lr']\n            self.log('lr_abs', lr, prog_bar=True, logger=True, on_step=True, on_epoch=False)\n\n        return loss\n\n    @torch.no_grad()\n    def validation_step(self, batch, batch_idx):\n        _, loss_dict_no_ema = self.shared_step(batch)\n        with self.ema_scope():\n            _, loss_dict_ema = self.shared_step(batch)\n            loss_dict_ema = {key + '_ema': loss_dict_ema[key] for key in loss_dict_ema}\n        self.log_dict(loss_dict_no_ema, prog_bar=False, logger=True, on_step=False, on_epoch=True)\n        self.log_dict(loss_dict_ema, prog_bar=False, logger=True, on_step=False, on_epoch=True)\n\n    def on_train_batch_end(self, *args, **kwargs):\n        if self.use_ema:\n            
self.model_ema(self.model)\n\n    def _get_rows_from_list(self, samples):\n        n_imgs_per_row = len(samples)\n        denoise_grid = rearrange(samples, 'n b c h w -> b n c h w')\n        denoise_grid = rearrange(denoise_grid, 'b n c h w -> (b n) c h w')\n        denoise_grid = make_grid(denoise_grid, nrow=n_imgs_per_row)\n        return denoise_grid\n\n    @torch.no_grad()\n    def log_images(self, batch, N=8, n_row=2, sample=True, return_keys=None, **kwargs):\n        log = dict()\n        x = self.get_input(batch, self.first_stage_key)\n        N = min(x.shape[0], N)\n        n_row = min(x.shape[0], n_row)\n        x = x.to(self.device)[:N]\n        log[\"inputs\"] = x\n\n        # get diffusion row\n        diffusion_row = list()\n        x_start = x[:n_row]\n\n        for t in range(self.num_timesteps):\n            if t % self.log_every_t == 0 or t == self.num_timesteps - 1:\n                t = repeat(torch.tensor([t]), '1 -> b', b=n_row)\n                t = t.to(self.device).long()\n                noise = torch.randn_like(x_start)\n                x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)\n                diffusion_row.append(x_noisy)\n\n        log[\"diffusion_row\"] = self._get_rows_from_list(diffusion_row)\n\n        if sample:\n            # get denoise row\n            with self.ema_scope(\"Plotting\"):\n                samples, denoise_row = self.sample(batch_size=N, return_intermediates=True)\n\n            log[\"samples\"] = samples\n            log[\"denoise_row\"] = self._get_rows_from_list(denoise_row)\n\n        if return_keys:\n            if np.intersect1d(list(log.keys()), return_keys).shape[0] == 0:\n                return log\n            else:\n                return {key: log[key] for key in return_keys}\n        return log\n\n    def configure_optimizers(self):\n        lr = self.learning_rate\n        params = list(self.model.parameters())\n        if self.learn_logvar:\n            params = params + [self.logvar]\n        opt = torch.optim.AdamW(params, lr=lr)\n        return opt\n\n\nclass LatentDiffusion(DDPM):\n    \"\"\"main class\"\"\"\n\n    def __init__(self,\n                 first_stage_config,\n                 cond_stage_config,\n                 num_timesteps_cond=None,\n                 cond_stage_key=\"image\",\n                 cond_stage_trainable=False,\n                 concat_mode=True,\n                 cond_stage_forward=None,\n                 conditioning_key=None,\n                 scale_factor=1.0,\n                 scale_by_std=False,\n                 use_mask=False,\n                 *args, **kwargs):\n        self.num_timesteps_cond = default(num_timesteps_cond, 1)\n        self.scale_by_std = scale_by_std\n        assert self.num_timesteps_cond <= kwargs['timesteps']\n        # for backwards compatibility after implementation of DiffusionWrapper\n        if conditioning_key is None:\n            conditioning_key = 'concat' if concat_mode else 'crossattn'\n        if cond_stage_config == '__is_unconditional__':\n            conditioning_key = None\n        ckpt_path = kwargs.pop(\"ckpt_path\", None)\n        ignore_keys = kwargs.pop(\"ignore_keys\", [])\n        super().__init__(conditioning_key=conditioning_key, *args, **kwargs)\n        self.concat_mode = concat_mode\n        self.cond_stage_trainable = cond_stage_trainable\n        self.cond_stage_key = cond_stage_key\n        try:\n            self.num_downs = len(first_stage_config.params.ddconfig.ch_mult) - 1\n        except:\n            self.num_downs 
= 0\n        if not scale_by_std:\n            self.scale_factor = scale_factor\n        else:\n            self.register_buffer('scale_factor', torch.tensor(scale_factor))\n        self.instantiate_first_stage(first_stage_config)\n        self.instantiate_cond_stage(cond_stage_config)\n        self.cond_stage_forward = cond_stage_forward\n        self.clip_denoised = False\n        self.bbox_tokenizer = None\n        self.use_mask = use_mask\n\n        self.restarted_from_ckpt = False\n        if ckpt_path is not None:\n            self.init_from_ckpt(ckpt_path, ignore_keys)\n            self.restarted_from_ckpt = True\n\n    def make_cond_schedule(self, ):\n        self.cond_ids = torch.full(size=(self.num_timesteps,), fill_value=self.num_timesteps - 1, dtype=torch.long)\n        ids = torch.round(torch.linspace(0, self.num_timesteps - 1, self.num_timesteps_cond)).long()\n        self.cond_ids[:self.num_timesteps_cond] = ids\n\n    @rank_zero_only\n    @torch.no_grad()\n    def on_train_batch_start(self, batch, batch_idx, dataloader_idx):\n        # only for very first batch\n        if self.scale_by_std and self.current_epoch == 0 and self.global_step == 0 and batch_idx == 0 and not self.restarted_from_ckpt:\n            assert self.scale_factor == 1., 'rather not use custom rescaling and std-rescaling simultaneously'\n            # set rescale weight to 1./std of encodings\n            self.print_fn(\"### USING STD-RESCALING ###\")\n            x = super().get_input(batch, self.first_stage_key)\n            x = x.to(self.device)\n            encoder_posterior = self.encode_first_stage(x)\n            z = self.get_first_stage_encoding(encoder_posterior).detach()\n            del self.scale_factor\n            self.register_buffer('scale_factor', 1. 
/ z.flatten().std())\n            self.print_fn(f\"setting self.scale_factor to {self.scale_factor}\")\n            self.print_fn(\"### USING STD-RESCALING ###\")\n\n    def register_schedule(self,\n                          given_betas=None, beta_schedule=\"linear\", timesteps=1000,\n                          linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3):\n        super().register_schedule(given_betas, beta_schedule, timesteps, linear_start, linear_end, cosine_s)\n\n        self.shorten_cond_schedule = self.num_timesteps_cond > 1\n        if self.shorten_cond_schedule:\n            self.make_cond_schedule()\n\n    def instantiate_first_stage(self, config):\n        model = instantiate_from_config(config)\n        self.first_stage_model = model.eval()\n        self.first_stage_model.train = disabled_train\n        for param in self.first_stage_model.parameters():\n            param.requires_grad = False\n\n    def instantiate_cond_stage(self, config):\n        if not self.cond_stage_trainable:\n            if config == \"__is_first_stage__\":\n                self.print_fn(\"Using first stage also as cond stage.\")\n                self.cond_stage_model = self.first_stage_model\n            elif config == \"__is_unconditional__\":\n                self.print_fn(f\"Training {self.__class__.__name__} as an unconditional model.\")\n                self.cond_stage_model = None\n                # self.be_unconditional = True\n            else:\n                model = instantiate_from_config(config)\n                self.cond_stage_model = model.eval()\n                self.cond_stage_model.train = disabled_train\n                for param in self.cond_stage_model.parameters():\n                    param.requires_grad = False\n        else:\n            assert config != '__is_first_stage__'\n            assert config != '__is_unconditional__'\n            model = instantiate_from_config(config)\n            self.cond_stage_model = model\n\n    def _get_denoise_row_from_list(self, samples, desc='', force_no_decoder_quantization=False):\n        denoise_row = []\n        for zd in tqdm(samples, desc=desc):\n            denoise_row.append(self.decode_first_stage(zd.to(self.device),\n                                                       force_not_quantize=force_no_decoder_quantization))\n        n_imgs_per_row = len(denoise_row)\n        denoise_row = torch.stack(denoise_row)  # n_log_step, n_row, C, H, W\n        denoise_grid = rearrange(denoise_row, 'n b c h w -> b n c h w')\n        denoise_grid = rearrange(denoise_grid, 'b n c h w -> (b n) c h w')\n        denoise_grid = make_grid(denoise_grid, nrow=n_imgs_per_row)\n        return denoise_grid\n\n    def get_first_stage_encoding(self, encoder_posterior):\n        if isinstance(encoder_posterior, DiagonalGaussianDistribution):\n            z = encoder_posterior.sample()\n        elif isinstance(encoder_posterior, torch.Tensor):\n            z = encoder_posterior\n        else:\n            raise NotImplementedError(f\"encoder_posterior of type '{type(encoder_posterior)}' not yet implemented\")\n        return self.scale_factor * z\n\n    def get_learned_conditioning(self, c):\n        if self.cond_stage_forward is None:\n            if hasattr(self.cond_stage_model, 'encode') and callable(self.cond_stage_model.encode):\n                c = self.cond_stage_model.encode(c)\n                if isinstance(c, DiagonalGaussianDistribution):\n                    c = c.mode()\n            else:\n                c = self.cond_stage_model(c)\n       
 else:\n            assert hasattr(self.cond_stage_model, self.cond_stage_forward)\n            c = getattr(self.cond_stage_model, self.cond_stage_forward)(c)\n        return c\n\n    def meshgrid(self, h, w):\n        y = torch.arange(0, h).view(h, 1, 1).repeat(1, w, 1)\n        x = torch.arange(0, w).view(1, w, 1).repeat(h, 1, 1)\n\n        arr = torch.cat([y, x], dim=-1)\n        return arr\n\n    def delta_border(self, h, w):\n        \"\"\"\n        :param h: height\n        :param w: width\n        :return: normalized distance to image border,\n         wtith min distance = 0 at border and max dist = 0.5 at image center\n        \"\"\"\n        lower_right_corner = torch.tensor([h - 1, w - 1]).view(1, 1, 2)\n        arr = self.meshgrid(h, w) / lower_right_corner\n        dist_left_up = torch.min(arr, dim=-1, keepdims=True)[0]\n        dist_right_down = torch.min(1 - arr, dim=-1, keepdims=True)[0]\n        edge_dist = torch.min(torch.cat([dist_left_up, dist_right_down], dim=-1), dim=-1)[0]\n        return edge_dist\n\n    def get_weighting(self, h, w, Ly, Lx, device):\n        weighting = self.delta_border(h, w)\n        weighting = torch.clip(weighting, self.split_input_params[\"clip_min_weight\"],\n                               self.split_input_params[\"clip_max_weight\"], )\n        weighting = weighting.view(1, h * w, 1).repeat(1, 1, Ly * Lx).to(device)\n\n        if self.split_input_params[\"tie_braker\"]:\n            L_weighting = self.delta_border(Ly, Lx)\n            L_weighting = torch.clip(L_weighting,\n                                     self.split_input_params[\"clip_min_tie_weight\"],\n                                     self.split_input_params[\"clip_max_tie_weight\"])\n\n            L_weighting = L_weighting.view(1, 1, Ly * Lx).to(device)\n            weighting = weighting * L_weighting\n        return weighting\n\n    def get_fold_unfold(self, x, kernel_size, stride, uf=1, df=1):  # todo load once not every time, shorten code\n        \"\"\"\n        :param x: img of size (bs, c, h, w)\n        :return: n img crops of size (n, bs, c, kernel_size[0], kernel_size[1])\n        \"\"\"\n        bs, nc, h, w = x.shape\n\n        # number of crops in image\n        Ly = (h - kernel_size[0]) // stride[0] + 1\n        Lx = (w - kernel_size[1]) // stride[1] + 1\n\n        if uf == 1 and df == 1:\n            fold_params = dict(kernel_size=kernel_size, dilation=1, padding=0, stride=stride)\n            unfold = torch.nn.Unfold(**fold_params)\n\n            fold = torch.nn.Fold(output_size=x.shape[2:], **fold_params)\n\n            weighting = self.get_weighting(kernel_size[0], kernel_size[1], Ly, Lx, x.device).to(x.dtype)\n            normalization = fold(weighting).view(1, 1, h, w)  # normalizes the overlap\n            weighting = weighting.view((1, 1, kernel_size[0], kernel_size[1], Ly * Lx))\n\n        elif uf > 1 and df == 1:\n            fold_params = dict(kernel_size=kernel_size, dilation=1, padding=0, stride=stride)\n            unfold = torch.nn.Unfold(**fold_params)\n\n            fold_params2 = dict(kernel_size=(kernel_size[0] * uf, kernel_size[0] * uf),\n                                dilation=1, padding=0,\n                                stride=(stride[0] * uf, stride[1] * uf))\n            fold = torch.nn.Fold(output_size=(x.shape[2] * uf, x.shape[3] * uf), **fold_params2)\n\n            weighting = self.get_weighting(kernel_size[0] * uf, kernel_size[1] * uf, Ly, Lx, x.device).to(x.dtype)\n            normalization = fold(weighting).view(1, 1, h * uf, w * uf)  
# normalizes the overlap\n            weighting = weighting.view((1, 1, kernel_size[0] * uf, kernel_size[1] * uf, Ly * Lx))\n\n        elif df > 1 and uf == 1:\n            fold_params = dict(kernel_size=kernel_size, dilation=1, padding=0, stride=stride)\n            unfold = torch.nn.Unfold(**fold_params)\n\n            fold_params2 = dict(kernel_size=(kernel_size[0] // df, kernel_size[0] // df),\n                                dilation=1, padding=0,\n                                stride=(stride[0] // df, stride[1] // df))\n            fold = torch.nn.Fold(output_size=(x.shape[2] // df, x.shape[3] // df), **fold_params2)\n\n            weighting = self.get_weighting(kernel_size[0] // df, kernel_size[1] // df, Ly, Lx, x.device).to(x.dtype)\n            normalization = fold(weighting).view(1, 1, h // df, w // df)  # normalizes the overlap\n            weighting = weighting.view((1, 1, kernel_size[0] // df, kernel_size[1] // df, Ly * Lx))\n\n        else:\n            raise NotImplementedError\n\n        return fold, unfold, normalization, weighting\n\n    @torch.no_grad()\n    def get_input(self, batch, k, return_first_stage_outputs=False, force_c_encode=False,\n                  cond_key=None, return_original_cond=False, bs=None):\n        # ground truth\n        # import pdb; pdb.set_trace()\n        x = super().get_input(batch, k)\n        if bs is not None:\n            x = x[:bs]\n        x = x.to(self.device)\n\n        # encoding\n        encoder_posterior = self.encode_first_stage(x)\n        z = self.get_first_stage_encoding(encoder_posterior).detach()\n        if self.model.conditioning_key is not None:\n            if cond_key is None:\n                cond_key = self.cond_stage_key\n            if cond_key != self.first_stage_key:\n                if cond_key in ['caption', 'bbox', 'center', 'camera']:\n                    xc = batch[cond_key]\n                elif cond_key in ['class_label']:\n                    xc = batch\n                else:\n                    xc = super().get_input(batch, cond_key).to(self.device)\n            else:\n                xc = x\n            # if bs is not None:\n            #     xc = xc[:bs]\n            if not self.cond_stage_trainable or force_c_encode:\n                if isinstance(xc, (dict, list)):\n                    c = self.get_learned_conditioning(xc)\n                else:\n                    c = self.get_learned_conditioning(xc.to(self.device))\n            else:\n                c = xc\n            if bs is not None:\n                c = c[:bs]\n\n            if self.use_positional_encodings:\n                pos_x, pos_y = self.compute_latent_shifts(batch)\n                ckey = __conditioning_keys__[self.model.conditioning_key]\n                c = {ckey: c, 'pos_x': pos_x, 'pos_y': pos_y}\n\n        else:\n            c = None\n            xc = None\n            if self.use_positional_encodings:\n                pos_x, pos_y = self.compute_latent_shifts(batch)\n                c = {'pos_x': pos_x, 'pos_y': pos_y}\n        out = [z, c]\n        if return_first_stage_outputs:\n            xrec = self.decode_first_stage(z)\n            out.extend([x, xrec])\n        if return_original_cond:\n            out.append(xc)\n        return out\n\n    @torch.no_grad()\n    def decode_first_stage(self, z, predict_cids=False, force_not_quantize=False):\n        if predict_cids:\n            if z.dim() == 4:\n                z = torch.argmax(z.exp(), dim=1).long()\n            z = 
self.first_stage_model.quantize.get_codebook_entry(z, shape=None)\n            z = rearrange(z, 'b h w c -> b c h w').contiguous()\n\n        z = 1. / self.scale_factor * z\n\n        if hasattr(self, \"split_input_params\"):\n            if self.split_input_params[\"patch_distributed_vq\"]:\n                ks = self.split_input_params[\"ks\"]  # eg. (128, 128)\n                stride = self.split_input_params[\"stride\"]  # eg. (64, 64)\n                uf = self.split_input_params[\"vqf\"]\n                bs, nc, h, w = z.shape\n                if ks[0] > h or ks[1] > w:\n                    ks = (min(ks[0], h), min(ks[1], w))\n                    self.print_fn(\"reducing Kernel\")\n\n                if stride[0] > h or stride[1] > w:\n                    stride = (min(stride[0], h), min(stride[1], w))\n                    self.print_fn(\"reducing stride\")\n\n                fold, unfold, normalization, weighting = self.get_fold_unfold(z, ks, stride, uf=uf)\n\n                z = unfold(z)  # (bn, nc * prod(**ks), L)\n                # 1. Reshape to img shape\n                z = z.view((z.shape[0], -1, ks[0], ks[1], z.shape[-1]))  # (bn, nc, ks[0], ks[1], L )\n\n                # 2. apply model loop over last dim\n                if isinstance(self.first_stage_model, VQModelInterface):\n                    output_list = [self.first_stage_model.decode(z[:, :, :, :, i],\n                                                                 force_not_quantize=predict_cids or force_not_quantize)\n                                   for i in range(z.shape[-1])]\n                else:\n\n                    output_list = [self.first_stage_model.decode(z[:, :, :, :, i])\n                                   for i in range(z.shape[-1])]\n\n                o = torch.stack(output_list, axis=-1)  # # (bn, nc, ks[0], ks[1], L)\n                o = o * weighting\n                # Reverse 1. reshape to img shape\n                o = o.view((o.shape[0], -1, o.shape[-1]))  # (bn, nc * ks[0] * ks[1], L)\n                # stitch crops together\n                decoded = fold(o)\n                decoded = decoded / normalization  # norm is shape (1, 1, h, w)\n                return decoded\n            else:\n                if isinstance(self.first_stage_model, VQModelInterface):\n                    return self.first_stage_model.decode(z, force_not_quantize=predict_cids or force_not_quantize)\n                else:\n                    return self.first_stage_model.decode(z)\n\n        else:\n            if isinstance(self.first_stage_model, VQModelInterface):\n                return self.first_stage_model.decode(z, force_not_quantize=predict_cids or force_not_quantize)\n            else:\n                return self.first_stage_model.decode(z)\n\n    # same as above but without decorator\n    def differentiable_decode_first_stage(self, z, predict_cids=False, force_not_quantize=False):\n        if predict_cids:\n            if z.dim() == 4:\n                z = torch.argmax(z.exp(), dim=1).long()\n            z = self.first_stage_model.quantize.get_codebook_entry(z, shape=None)\n            z = rearrange(z, 'b h w c -> b c h w').contiguous()\n\n        z = 1. / self.scale_factor * z\n\n        if hasattr(self, \"split_input_params\"):\n            if self.split_input_params[\"patch_distributed_vq\"]:\n                ks = self.split_input_params[\"ks\"]  # eg. (128, 128)\n                stride = self.split_input_params[\"stride\"]  # eg. 
(64, 64)\n                uf = self.split_input_params[\"vqf\"]\n                bs, nc, h, w = z.shape\n                if ks[0] > h or ks[1] > w:\n                    ks = (min(ks[0], h), min(ks[1], w))\n                    self.print_fn(\"reducing Kernel\")\n\n                if stride[0] > h or stride[1] > w:\n                    stride = (min(stride[0], h), min(stride[1], w))\n                    self.print_fn(\"reducing stride\")\n\n                fold, unfold, normalization, weighting = self.get_fold_unfold(z, ks, stride, uf=uf)\n\n                z = unfold(z)  # (bn, nc * prod(**ks), L)\n                # 1. Reshape to img shape\n                z = z.view((z.shape[0], -1, ks[0], ks[1], z.shape[-1]))  # (bn, nc, ks[0], ks[1], L )\n\n                # 2. apply model loop over last dim\n                if isinstance(self.first_stage_model, VQModelInterface):\n                    output_list = [self.first_stage_model.decode(z[:, :, :, :, i],\n                                                                 force_not_quantize=predict_cids or force_not_quantize)\n                                   for i in range(z.shape[-1])]\n                else:\n                    output_list = [self.first_stage_model.decode(z[:, :, :, :, i])\n                                   for i in range(z.shape[-1])]\n\n                o = torch.stack(output_list, axis=-1)  # # (bn, nc, ks[0], ks[1], L)\n                o = o * weighting\n                # Reverse 1. reshape to img shape\n                o = o.view((o.shape[0], -1, o.shape[-1]))  # (bn, nc * ks[0] * ks[1], L)\n                # stitch crops together\n                decoded = fold(o)\n                decoded = decoded / normalization  # norm is shape (1, 1, h, w)\n                return decoded\n            else:\n                if isinstance(self.first_stage_model, VQModelInterface):\n                    return self.first_stage_model.decode(z, force_not_quantize=predict_cids or force_not_quantize)\n                else:\n                    return self.first_stage_model.decode(z)\n\n        else:\n            if isinstance(self.first_stage_model, VQModelInterface):\n                return self.first_stage_model.decode(z, force_not_quantize=predict_cids or force_not_quantize)\n            else:\n                return self.first_stage_model.decode(z)\n\n    @torch.no_grad()\n    def encode_first_stage(self, x):\n        if hasattr(self, \"split_input_params\"):\n            if self.split_input_params[\"patch_distributed_vq\"]:\n                ks = self.split_input_params[\"ks\"]  # eg. (128, 128)\n                stride = self.split_input_params[\"stride\"]  # eg. 
(64, 64)\n                df = self.split_input_params[\"vqf\"]\n                self.split_input_params['original_image_size'] = x.shape[-2:]\n                bs, nc, h, w = x.shape\n                if ks[0] > h or ks[1] > w:\n                    ks = (min(ks[0], h), min(ks[1], w))\n                    self.print_fn(\"reducing kernel\")\n\n                if stride[0] > h or stride[1] > w:\n                    stride = (min(stride[0], h), min(stride[1], w))\n                    self.print_fn(\"reducing stride\")\n\n                fold, unfold, normalization, weighting = self.get_fold_unfold(x, ks, stride, df=df)\n                z = unfold(x)  # (bn, nc * prod(**ks), L)\n                # Reshape to img shape\n                z = z.view((z.shape[0], -1, ks[0], ks[1], z.shape[-1]))  # (bn, nc, ks[0], ks[1], L )\n\n                output_list = [self.first_stage_model.encode(z[:, :, :, :, i]) for i in range(z.shape[-1])]\n\n                o = torch.stack(output_list, axis=-1)\n                o = o * weighting\n\n                # Reverse reshape to img shape\n                o = o.view((o.shape[0], -1, o.shape[-1]))  # (bn, nc * ks[0] * ks[1], L)\n                # stitch crops together\n                decoded = fold(o)\n                decoded = decoded / normalization\n                return decoded\n            else:\n                return self.first_stage_model.encode(x)\n        else:\n            return self.first_stage_model.encode(x)\n\n    def shared_step(self, batch, **kwargs):\n        x, c = self.get_input(batch, self.first_stage_key)\n        loss = self(x, c)\n        return loss\n\n    def forward(self, x, c, *args, **kwargs):\n        t = torch.randint(0, self.num_timesteps, (x.shape[0],), device=self.device).long()\n        if self.model.conditioning_key is not None:\n            assert c is not None\n            if self.cond_stage_trainable:\n                c = self.get_learned_conditioning(c)\n            if self.shorten_cond_schedule:  # TODO: drop this option\n                tc = self.cond_ids[t].to(self.device)\n                c = self.q_sample(x_start=c, t=tc, noise=torch.randn_like(c.float()))\n        return self.p_losses(x, c, t, *args, **kwargs)\n\n    def _rescale_annotations(self, bboxes, crop_coordinates):  # TODO: move to dataset\n        def rescale_bbox(bbox):\n            x0 = clamp((bbox[0] - crop_coordinates[0]) / crop_coordinates[2])\n            y0 = clamp((bbox[1] - crop_coordinates[1]) / crop_coordinates[3])\n            w = min(bbox[2] / crop_coordinates[2], 1 - x0)\n            h = min(bbox[3] / crop_coordinates[3], 1 - y0)\n            return x0, y0, w, h\n\n        return [rescale_bbox(b) for b in bboxes]\n\n    def apply_model(self, x_noisy, t, cond, return_ids=False):\n\n        if isinstance(cond, dict):\n            # hybrid case, cond is expected to be a dict\n            pass\n        else:\n            if not isinstance(cond, list):\n                cond = [cond]\n            key = 'c_concat' if self.model.conditioning_key == 'concat' else 'c_crossattn'\n            cond = {key: cond}\n\n        if hasattr(self, \"split_input_params\"):\n            assert len(cond) == 1\n            assert not return_ids\n            ks = self.split_input_params[\"ks\"]  # eg. (128, 128)\n            stride = self.split_input_params[\"stride\"]  # eg. 
(64, 64)\n\n            h, w = x_noisy.shape[-2:]\n\n            fold, unfold, normalization, weighting = self.get_fold_unfold(x_noisy, ks, stride)\n\n            z = unfold(x_noisy)  # (bn, nc * prod(**ks), L)\n            # Reshape to img shape\n            z = z.view((z.shape[0], -1, ks[0], ks[1], z.shape[-1]))  # (bn, nc, ks[0], ks[1], L )\n            z_list = [z[:, :, :, :, i] for i in range(z.shape[-1])]\n\n            if self.cond_stage_key in [\"image\", \"LR_image\", \"segmentation\", 'bbox_img'] and self.model.conditioning_key:\n                c_key = next(iter(cond.keys()))  # get key\n                c = next(iter(cond.values()))  # get value\n                assert (len(c) == 1)\n                c = c[0]  # get element\n\n                c = unfold(c)\n                c = c.view((c.shape[0], -1, ks[0], ks[1], c.shape[-1]))  # (bn, nc, ks[0], ks[1], L )\n\n                cond_list = [{c_key: [c[:, :, :, :, i]]} for i in range(c.shape[-1])]\n\n            elif self.cond_stage_key in ['bbox', 'center']:\n                assert 'original_image_size' in self.split_input_params, 'BoundingBoxRescaling is missing original_image_size'\n\n                # assuming padding of unfold is always 0 and its dilation is always 1\n                n_patches_per_row = int((w - ks[0]) / stride[0] + 1)\n                full_img_h, full_img_w = self.split_input_params['original_image_size']\n                # as we are operating on latents, we need the factor from the original image size to the\n                # spatial latent size to properly rescale the crops for regenerating the bbox annotations\n                num_downs = self.first_stage_model.encoder.num_resolutions - 1\n                rescale_latent = 2 ** num_downs\n\n                # get top-left positions of patches as expected by the bbox tokenizer; therefore we\n                # need to rescale the tl patch coordinates to lie in (0, 1)\n                tl_patch_coordinates = [(rescale_latent * stride[0] * (patch_nr % n_patches_per_row) / full_img_w,\n                                         rescale_latent * stride[1] * (patch_nr // n_patches_per_row) / full_img_h)\n                                        for patch_nr in range(z.shape[-1])]\n\n                # patch_limits are tl_coord, width and height coordinates as (x_tl, y_tl, h, w)\n                patch_limits = [(x_tl, y_tl,\n                                 rescale_latent * ks[0] / full_img_w,\n                                 rescale_latent * ks[1] / full_img_h) for x_tl, y_tl in tl_patch_coordinates]\n                # patch_values = [(np.arange(x_tl,min(x_tl+ks, 1.)),np.arange(y_tl,min(y_tl+ks, 1.))) for x_tl, y_tl in tl_patch_coordinates]\n\n                # tokenize crop coordinates for the bounding boxes of the respective patches\n                patch_limits_tknzd = [torch.LongTensor(self.bbox_tokenizer.crop_encoder(bbox))[None].to(self.device)\n                                      for bbox in patch_limits]  # list of length l with tensors of shape (1, 2)\n                self.print_fn(patch_limits_tknzd[0].shape)\n                # cut tknzd crop position from conditioning\n                assert isinstance(cond, dict), 'cond must be dict to be fed into model'\n                cut_cond = cond['c_crossattn'][0][..., :-2].to(self.device)\n                self.print_fn(cut_cond.shape)\n\n                adapted_cond = torch.stack([torch.cat([cut_cond, p], dim=1) for p in patch_limits_tknzd])\n                adapted_cond = rearrange(adapted_cond, 'l b n -> 
(l b) n')\n                self.print_fn(adapted_cond.shape)\n                adapted_cond = self.get_learned_conditioning(adapted_cond)\n                self.print_fn(adapted_cond.shape)\n                adapted_cond = rearrange(adapted_cond, '(l b) n d -> l b n d', l=z.shape[-1])\n                self.print_fn(adapted_cond.shape)\n\n                cond_list = [{'c_crossattn': [e]} for e in adapted_cond]\n\n            else:\n                cond_list = [cond for i in range(z.shape[-1])]  # todo make this more efficient\n\n            # apply model by loop over crops\n            output_list = [self.model(z_list[i], t, **cond_list[i]) for i in range(z.shape[-1])]\n            assert not isinstance(output_list[0],\n                                  tuple)  # todo cant deal with multiple model outputs check this never happens\n\n            o = torch.stack(output_list, axis=-1)\n            o = o * weighting\n            # Reverse reshape to img shape\n            o = o.view((o.shape[0], -1, o.shape[-1]))  # (bn, nc * ks[0] * ks[1], L)\n            # stitch crops together\n            x_recon = fold(o) / normalization\n\n        else:\n            x_recon = self.model(x_noisy, t, **cond)\n\n        if isinstance(x_recon, tuple) and not return_ids:\n            return x_recon[0]\n        else:\n            return x_recon\n\n    def _predict_eps_from_xstart(self, x_t, t, pred_xstart):\n        return (extract_into_tensor(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t - pred_xstart) / \\\n               extract_into_tensor(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape)\n\n    def _prior_bpd(self, x_start):\n        \"\"\"\n        Get the prior KL term for the variational lower-bound, measured in\n        bits-per-dim.\n        This term can't be optimized, as it only depends on the encoder.\n        :param x_start: the [N x C x ...] 
tensor of inputs.\n        :return: a batch of [N] KL values (in bits), one per batch element.\n        \"\"\"\n        batch_size = x_start.shape[0]\n        t = torch.tensor([self.num_timesteps - 1] * batch_size, device=x_start.device)\n        qt_mean, _, qt_log_variance = self.q_mean_variance(x_start, t)\n        kl_prior = normal_kl(mean1=qt_mean, logvar1=qt_log_variance, mean2=0.0, logvar2=0.0)\n        return mean_flat(kl_prior) / np.log(2.0)\n\n    def p_losses(self, x_start, cond, t, noise=None):\n        noise = default(noise, lambda: torch.randn_like(x_start))\n        x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)\n        model_output = self.apply_model(x_noisy, t, cond)\n\n        loss_dict = {}\n        prefix = 'train' if self.training else 'val'\n\n        if self.parameterization == \"x0\":\n            target = x_start\n        elif self.parameterization == \"eps\":\n            target = noise\n        else:\n            raise NotImplementedError()\n\n        # simple loss\n        loss_simple = self.get_loss(model_output, target, mean=False).mean([1, 2, 3])\n        loss_dict.update({f'{prefix}/loss_simple': loss_simple.mean()})\n\n        logvar_t = self.logvar[t].to(self.device)\n        loss = loss_simple / torch.exp(logvar_t) + logvar_t\n        # loss = loss_simple / torch.exp(self.logvar) + self.logvar\n        if self.learn_logvar:\n            loss_dict.update({f'{prefix}/loss_gamma': loss.mean()})\n            loss_dict.update({'logvar': self.logvar.data.mean()})\n\n        loss = self.l_simple_weight * loss.mean()\n\n        # vlb loss\n        loss_vlb = self.get_loss(model_output, target, mean=False).mean(dim=(1, 2, 3))\n        loss_vlb = (self.lvlb_weights[t] * loss_vlb).mean()\n        loss_dict.update({f'{prefix}/loss_vlb': loss_vlb})\n\n        # total loss\n        loss += (self.original_elbo_weight * loss_vlb)\n        loss_dict.update({f'{prefix}/loss': loss})\n\n        return loss, loss_dict\n\n    def p_mean_variance(self, x, c, t, clip_denoised: bool, return_codebook_ids=False, quantize_denoised=False,\n                        return_x0=False, score_corrector=None, corrector_kwargs=None):\n        t_in = t\n        model_out = self.apply_model(x, t_in, c, return_ids=return_codebook_ids)\n\n        if score_corrector is not None:\n            assert self.parameterization == \"eps\"\n            model_out = score_corrector.modify_score(self, model_out, x, t, c, **corrector_kwargs)\n\n        if return_codebook_ids:\n            model_out, logits = model_out\n\n        if self.parameterization == \"eps\":\n            x_recon = self.predict_start_from_noise(x, t=t, noise=model_out)\n        elif self.parameterization == \"x0\":\n            x_recon = model_out\n        else:\n            raise NotImplementedError()\n\n        if clip_denoised:\n            x_recon.clamp_(-1., 1.)\n        if quantize_denoised:\n            x_recon, _, [_, _, indices] = self.first_stage_model.quantize(x_recon)\n        model_mean, posterior_variance, posterior_log_variance = self.q_posterior(x_start=x_recon, x_t=x, t=t)\n        if return_codebook_ids:\n            return model_mean, posterior_variance, posterior_log_variance, logits\n        elif return_x0:\n            return model_mean, posterior_variance, posterior_log_variance, x_recon\n        else:\n            return model_mean, posterior_variance, posterior_log_variance\n\n    @torch.no_grad()\n    def p_sample(self, x, c, t, clip_denoised=False, repeat_noise=False,\n                 
return_codebook_ids=False, quantize_denoised=False, return_x0=False,\n                 temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None):\n        b, *_, device = *x.shape, x.device\n        outputs = self.p_mean_variance(x=x, c=c, t=t, clip_denoised=clip_denoised,\n                                       return_codebook_ids=return_codebook_ids,\n                                       quantize_denoised=quantize_denoised,\n                                       return_x0=return_x0,\n                                       score_corrector=score_corrector, corrector_kwargs=corrector_kwargs)\n        if return_codebook_ids:\n            raise DeprecationWarning(\"Support dropped.\")\n            model_mean, _, model_log_variance, logits = outputs\n        elif return_x0:\n            model_mean, _, model_log_variance, x0 = outputs\n        else:\n            model_mean, _, model_log_variance = outputs\n\n        noise = noise_like(x.shape, device, repeat_noise) * temperature\n        if noise_dropout > 0.:\n            noise = torch.nn.functional.dropout(noise, p=noise_dropout)\n        # no noise when t == 0\n        nonzero_mask = (1 - (t == 0).float()).reshape(b, *((1,) * (len(x.shape) - 1)))\n\n        if return_codebook_ids:\n            return model_mean + nonzero_mask * (0.5 * model_log_variance).exp() * noise, logits.argmax(dim=1)\n        if return_x0:\n            return model_mean + nonzero_mask * (0.5 * model_log_variance).exp() * noise, x0\n        else:\n            return model_mean + nonzero_mask * (0.5 * model_log_variance).exp() * noise\n\n    @torch.no_grad()\n    def progressive_denoising(self, cond, shape, verbose=False, callback=None, quantize_denoised=False,\n                              img_callback=None, mask=None, x0=None, temperature=1., noise_dropout=0.,\n                              score_corrector=None, corrector_kwargs=None, batch_size=None, x_T=None, start_T=None,\n                              log_every_t=None):\n        if not log_every_t:\n            log_every_t = self.log_every_t\n        timesteps = self.num_timesteps\n        if batch_size is not None:\n            b = batch_size if batch_size is not None else shape[0]\n            shape = [batch_size] + list(shape)\n        else:\n            b = batch_size = shape[0]\n        if x_T is None:\n            img = torch.randn(shape, device=self.device)\n        else:\n            img = x_T\n        intermediates = []\n        if cond is not None:\n            if isinstance(cond, dict):\n                cond = {key: cond[key][:batch_size] if not isinstance(cond[key], list) else\n                list(map(lambda x: x[:batch_size], cond[key])) for key in cond}\n            else:\n                cond = [c[:batch_size] for c in cond] if isinstance(cond, list) else cond[:batch_size]\n\n        if start_T is not None:\n            timesteps = min(timesteps, start_T)\n        iterator = tqdm(reversed(range(0, timesteps)), desc='Progressive Generation',\n                        total=timesteps) if verbose else reversed(\n            range(0, timesteps))\n        if type(temperature) == float:\n            temperature = [temperature] * timesteps\n\n        for i in iterator:\n            ts = torch.full((b,), i, device=self.device, dtype=torch.long)\n            if self.shorten_cond_schedule:\n                assert self.model.conditioning_key != 'hybrid'\n                tc = self.cond_ids[ts].to(cond.device)\n                cond = self.q_sample(x_start=cond, t=tc, 
noise=torch.randn_like(cond))\n\n            img, x0_partial = self.p_sample(img, cond, ts,\n                                            clip_denoised=self.clip_denoised,\n                                            quantize_denoised=quantize_denoised, return_x0=True,\n                                            temperature=temperature[i], noise_dropout=noise_dropout,\n                                            score_corrector=score_corrector, corrector_kwargs=corrector_kwargs)\n            if mask is not None:\n                assert x0 is not None\n                img_orig = self.q_sample(x0, ts)\n                img = img_orig * mask + (1. - mask) * img\n\n            if i % log_every_t == 0 or i == timesteps - 1:\n                intermediates.append(x0_partial)\n            if callback: callback(i)\n            if img_callback: img_callback(img, i)\n        return img, intermediates\n\n    @torch.no_grad()\n    def p_sample_loop(self, cond, shape, return_intermediates=False,\n                      x_T=None, verbose=False, callback=None, timesteps=None, quantize_denoised=False,\n                      mask=None, x0=None, img_callback=None, start_T=None,\n                      log_every_t=None):\n\n        if not log_every_t:\n            log_every_t = self.log_every_t\n        device = self.betas.device\n        b = shape[0]\n        if x_T is None:\n            img = torch.randn(shape, device=device)\n        else:\n            img = x_T\n\n        intermediates = [img]\n        if timesteps is None:\n            timesteps = self.num_timesteps\n\n        if start_T is not None:\n            timesteps = min(timesteps, start_T)\n        iterator = tqdm(reversed(range(0, timesteps)), desc='Sampling t', total=timesteps) if verbose else reversed(\n            range(0, timesteps))\n\n        if mask is not None:\n            assert x0 is not None\n            assert x0.shape[2:3] == mask.shape[2:3]  # spatial size has to match\n\n        for i in iterator:\n            ts = torch.full((b,), i, device=device, dtype=torch.long)\n            if self.shorten_cond_schedule:\n                assert self.model.conditioning_key != 'hybrid'\n                tc = self.cond_ids[ts].to(cond.device)\n                cond = self.q_sample(x_start=cond, t=tc, noise=torch.randn_like(cond))\n\n            img = self.p_sample(img, cond, ts,\n                                clip_denoised=self.clip_denoised,\n                                quantize_denoised=quantize_denoised)\n            if mask is not None:\n                img_orig = self.q_sample(x0, ts)\n                img = img_orig * mask + (1. 
- mask) * img\n\n            if i % log_every_t == 0 or i == timesteps - 1:\n                intermediates.append(img)\n            if callback: callback(i)\n            if img_callback: img_callback(img, i)\n\n        if return_intermediates:\n            return img, intermediates\n        return img\n\n    @torch.no_grad()\n    def sample(self, cond, batch_size=16, return_intermediates=False, x_T=None,\n               verbose=False, timesteps=None, quantize_denoised=False,\n               mask=None, x0=None, shape=None, **kwargs):\n        if shape is None:\n            shape = (batch_size, self.channels, *self.image_size)\n        if cond is not None:\n            if isinstance(cond, dict):\n                cond = {key: cond[key][:batch_size] if not isinstance(cond[key], list) else\n                list(map(lambda x: x[:batch_size], cond[key])) for key in cond}\n            else:\n                cond = [c[:batch_size] for c in cond] if isinstance(cond, list) else cond[:batch_size]\n        return self.p_sample_loop(cond,\n                                  shape,\n                                  return_intermediates=return_intermediates, x_T=x_T,\n                                  verbose=verbose, timesteps=timesteps, quantize_denoised=quantize_denoised,\n                                  mask=mask, x0=x0)\n\n    @torch.no_grad()\n    def sample_log(self, cond, batch_size, ddim, ddim_steps, **kwargs):\n        if ddim:\n            ddim_sampler = DDIMSampler(self)\n            shape = (self.channels, *self.image_size)\n            samples, intermediates = ddim_sampler.sample(ddim_steps, batch_size,\n                                                         shape, cond, verbose=self.verbose, **kwargs)\n\n        else:\n            samples, intermediates = self.sample(cond=cond, batch_size=batch_size,\n                                                 return_intermediates=True, **kwargs)\n\n        return samples, intermediates\n\n    @torch.no_grad()\n    def log_images(self, batch, N=8, n_row=4, sample=True, ddim_steps=200, ddim_eta=1., return_keys=None,\n                   quantize_denoised=False, inpaint=False, plot_denoise_rows=False, plot_progressive_rows=False,\n                   plot_diffusion_rows=False, dset=None, **kwargs):\n\n        use_ddim = ddim_steps is not None\n\n        log = dict()\n        z, c, x, xrec, xc = self.get_input(batch, self.first_stage_key,\n                                           return_first_stage_outputs=True,\n                                           force_c_encode=True,\n                                           return_original_cond=True,\n                                           bs=N)\n\n        N = min(x.shape[0], N)\n        n_row = min(x.shape[0], n_row)\n        log[\"inputs\"] = x\n        log[\"reconstruction\"] = xrec\n        if self.model.conditioning_key is not None:\n            if hasattr(self.cond_stage_model, \"decode\"):\n                xc = self.cond_stage_model.decode(c)\n                log[\"conditioning\"] = xc\n            elif self.cond_stage_key in [\"caption\"]:\n                xc = log_txt_as_img((x.shape[2], x.shape[3]), batch[\"caption\"])\n                log[\"conditioning\"] = xc\n            elif self.cond_stage_key in ['class_label']:\n                xc = log_txt_as_img((x.shape[2], x.shape[3]), batch[\"human_label\"])\n                log['conditioning'] = xc\n            elif self.cond_stage_key in ['camera']:\n                if isinstance(batch[\"camera\"], list):\n                    xc = 
torch.cat(batch[\"camera\"], -1)\n                else:\n                    xc = batch[\"camera\"].permute(0, 2, 3, 1, 4)\n                    xc = xc.reshape(*xc.shape[:3], -1) * 2. - 1.\n                log['conditioning'] = xc\n            elif isimage(xc):\n                log[\"conditioning\"] = xc\n            if ismap(xc):\n                if dset is None:\n                    key = 'train' if self.training else 'validation'\n                    dset = self.trainer.datamodule.datasets[key].data\n                label2rgb = torch.from_numpy(dset.label2rgb).to(self.device) / 127.5 - 1.\n                # log[\"original_conditioning\"] = self.to_rgb(xc)\n                log[\"original_conditioning\"] = label2rgb[xc.argmax(1)].permute(0, 3, 1, 2)\n\n        if plot_diffusion_rows:\n            # get diffusion row\n            diffusion_row = list()\n            z_start = z[:n_row]\n            for t in range(self.num_timesteps):\n                if t % self.log_every_t == 0 or t == self.num_timesteps - 1:\n                    t = repeat(torch.tensor([t]), '1 -> b', b=n_row)\n                    t = t.to(self.device).long()\n                    noise = torch.randn_like(z_start)\n                    z_noisy = self.q_sample(x_start=z_start, t=t, noise=noise)\n                    diffusion_row.append(self.decode_first_stage(z_noisy))\n\n            diffusion_row = torch.stack(diffusion_row)  # n_log_step, n_row, C, H, W\n            diffusion_grid = rearrange(diffusion_row, 'n b c h w -> b n c h w')\n            diffusion_grid = rearrange(diffusion_grid, 'b n c h w -> (b n) c h w')\n            diffusion_grid = make_grid(diffusion_grid, nrow=diffusion_row.shape[0])\n            log[\"diffusion_row\"] = diffusion_grid\n\n        if sample:\n            # get denoise row\n            with self.ema_scope(\"Plotting\"):\n                samples, z_denoise_row = self.sample_log(cond=c, batch_size=N, ddim=use_ddim,\n                                                         ddim_steps=ddim_steps, eta=ddim_eta)\n            x_samples = self.decode_first_stage(samples)\n            log[\"samples\"] = x_samples\n            if plot_denoise_rows:\n                denoise_grid = self._get_denoise_row_from_list(z_denoise_row)\n                log[\"denoise_row\"] = denoise_grid\n\n            if quantize_denoised and not isinstance(self.first_stage_model, AutoencoderKL) and not isinstance(\n                    self.first_stage_model, IdentityFirstStage):\n                # also display when quantizing x0 while sampling\n                with self.ema_scope(\"Plotting Quantized Denoised\"):\n                    samples, z_denoise_row = self.sample_log(cond=c, batch_size=N, ddim=use_ddim,\n                                                             ddim_steps=ddim_steps, eta=ddim_eta,\n                                                             quantize_denoised=True)\n                x_samples = self.decode_first_stage(samples.to(self.device))\n                log[\"samples_x0_quantized\"] = x_samples\n\n            if inpaint:\n                # make a simple center square\n                b, h, w = z.shape[0], z.shape[2], z.shape[3]\n                mask = torch.ones(N, h, w).to(self.device)\n                # zeros will be filled in\n                mask[:, h // 4:3 * h // 4, w // 4:3 * w // 4] = 0.\n                mask = mask[:, None, ...]\n                with self.ema_scope(\"Plotting Inpaint\"):\n                    samples, _ = self.sample_log(cond=c, batch_size=N, ddim=use_ddim, 
eta=ddim_eta,\n                                                 ddim_steps=ddim_steps, x0=z[:N], mask=mask)\n                x_samples = self.decode_first_stage(samples.to(self.device))\n                log[\"samples_inpainting\"] = x_samples\n                log[\"mask\"] = mask\n\n                # outpaint\n                with self.ema_scope(\"Plotting Outpaint\"):\n                    samples, _ = self.sample_log(cond=c, batch_size=N, ddim=use_ddim, eta=ddim_eta,\n                                                 ddim_steps=ddim_steps, x0=z[:N], mask=mask)\n                x_samples = self.decode_first_stage(samples.to(self.device))\n                log[\"samples_outpainting\"] = x_samples\n\n        if plot_progressive_rows:\n            with self.ema_scope(\"Plotting Progressives\"):\n                img, progressives = self.progressive_denoising(c, shape=(self.channels, *self.image_size), batch_size=N)\n            prog_row = self._get_denoise_row_from_list(progressives, desc=\"Progressive Generation\")\n            log[\"progressive_row\"] = prog_row\n\n        if return_keys:\n            if np.intersect1d(list(log.keys()), return_keys).shape[0] == 0:\n                return log\n            else:\n                return {key: log[key] for key in return_keys}\n        return log\n\n    def configure_optimizers(self):\n        lr = self.learning_rate\n        params = list(self.model.parameters())\n        if self.cond_stage_trainable:\n            self.print_fn(f\"{self.__class__.__name__}: Also optimizing conditioner params!\")\n            params = params + list(self.cond_stage_model.parameters())\n        if self.learn_logvar:\n            self.print_fn('Diffusion model optimizing logvar')\n            params.append(self.logvar)\n        opt = torch.optim.AdamW(params, lr=lr)\n        if self.use_scheduler:\n            assert 'target' in self.scheduler_config\n            scheduler = instantiate_from_config(self.scheduler_config)\n\n            self.print_fn(\"Setting up LambdaLR scheduler...\")\n            scheduler = [\n                {\n                    'scheduler': LambdaLR(opt, lr_lambda=scheduler.schedule),\n                    'interval': 'step',\n                    'frequency': 1\n                }]\n            return [opt], scheduler\n        return opt\n\n    @torch.no_grad()\n    def to_rgb(self, x):\n        x = x.float()\n        if not hasattr(self, \"colorize\"):\n            self.colorize = torch.randn(3, x.shape[1], 1, 1).to(x)\n        x = nn.functional.conv2d(x, weight=self.colorize)\n        x = 2. 
* (x - x.min()) / (x.max() - x.min()) - 1.\n        return x\n\n\nclass DiffusionWrapper(pl.LightningModule):\n    def __init__(self, diff_model_config, conditioning_key):\n        super().__init__()\n        self.diffusion_model = instantiate_from_config(diff_model_config)\n        self.conditioning_key = conditioning_key\n        assert self.conditioning_key in [None, 'concat', 'crossattn', 'hybrid', 'adm']\n\n    def forward(self, x, t, c_concat: list = None, c_crossattn: list = None):\n        if self.conditioning_key is None:\n            out = self.diffusion_model(x, t)\n        elif self.conditioning_key == 'concat':\n            xc = torch.cat([x] + c_concat, dim=1)\n            out = self.diffusion_model(xc, t)\n        elif self.conditioning_key == 'crossattn':\n            cc = torch.cat(c_crossattn, 1)\n            out = self.diffusion_model(x, t, context=cc)\n        elif self.conditioning_key == 'hybrid':\n            xc = torch.cat([x] + c_concat, dim=1)\n            cc = torch.cat(c_crossattn, 1)\n            out = self.diffusion_model(xc, t, context=cc)\n        elif self.conditioning_key == 'adm':\n            cc = c_crossattn[0]\n            out = self.diffusion_model(x, t, y=cc)\n        else:\n            raise NotImplementedError()\n\n        return out\n\n\nclass Layout2ImgDiffusion(LatentDiffusion):\n    def __init__(self, cond_stage_key, *args, **kwargs):\n        assert cond_stage_key in ['bbox', 'center'], 'Layout2ImgDiffusion only for cond_stage_key=\"bbox\" or \"center\"'\n        super().__init__(cond_stage_key=cond_stage_key, *args, **kwargs)\n\n    def log_images(self, batch, N=8, dset=None, *args, **kwargs):\n        logs = super().log_images(batch=batch, N=N, *args, **kwargs)\n\n        key = 'train' if self.training else 'validation'\n        if dset is None:\n            dset = self.trainer.datamodule.datasets[key].data\n        mapper = dset.conditional_builders[self.cond_stage_key]\n        H, W = batch['image'].shape[2:]\n\n        bbox_imgs = []\n        map_fn = lambda catno: dset.get_textual_label_for_category_id(catno)\n        for tknzd_bbox in batch[self.cond_stage_key][:N]:\n            bboximg = mapper.plot(tknzd_bbox.detach().cpu(), map_fn, (W, H))\n            bbox_imgs.append(bboximg)\n\n        cond_img = torch.stack(bbox_imgs, dim=0)\n        logs['bbox_image'] = cond_img\n        return logs\n"
  },
  {
    "path": "lidm/models/diffusion/plms.py",
    "content": "\"\"\"SAMPLING ONLY.\"\"\"\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom functools import partial\n\nfrom ...modules.diffusion.util import make_ddim_sampling_parameters, make_ddim_timesteps, noise_like\n\n\nclass PLMSSampler(object):\n    def __init__(self, model, schedule=\"linear\", **kwargs):\n        super().__init__()\n        self.model = model\n        self.ddpm_num_timesteps = model.num_timesteps\n        self.schedule = schedule\n\n    def register_buffer(self, name, attr):\n        if type(attr) == torch.Tensor:\n            if attr.device != torch.device(\"cuda\"):\n                attr = attr.to(torch.device(\"cuda\"))\n        setattr(self, name, attr)\n\n    def make_schedule(self, ddim_num_steps, ddim_discretize=\"uniform\", ddim_eta=0., verbose=False):\n        if ddim_eta != 0:\n            raise ValueError('ddim_eta must be 0 for PLMS')\n        self.ddim_timesteps = make_ddim_timesteps(ddim_discr_method=ddim_discretize, num_ddim_timesteps=ddim_num_steps,\n                                                  num_ddpm_timesteps=self.ddpm_num_timesteps,verbose=verbose)\n        alphas_cumprod = self.model.alphas_cumprod\n        assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'\n        to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.model.device)\n\n        self.register_buffer('betas', to_torch(self.model.betas))\n        self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))\n        self.register_buffer('alphas_cumprod_prev', to_torch(self.model.alphas_cumprod_prev))\n\n        # calculations for diffusion q(x_t | x_{t-1}) and others\n        self.register_buffer('sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod.cpu())))\n        self.register_buffer('sqrt_one_minus_alphas_cumprod', to_torch(np.sqrt(1. - alphas_cumprod.cpu())))\n        self.register_buffer('log_one_minus_alphas_cumprod', to_torch(np.log(1. - alphas_cumprod.cpu())))\n        self.register_buffer('sqrt_recip_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu())))\n        self.register_buffer('sqrt_recipm1_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu() - 1)))\n\n        # ddim sampling parameters\n        ddim_sigmas, ddim_alphas, ddim_alphas_prev = make_ddim_sampling_parameters(alphacums=alphas_cumprod.cpu(),\n                                                                                   ddim_timesteps=self.ddim_timesteps,\n                                                                                   eta=ddim_eta,verbose=verbose)\n        self.register_buffer('ddim_sigmas', ddim_sigmas)\n        self.register_buffer('ddim_alphas', ddim_alphas)\n        self.register_buffer('ddim_alphas_prev', ddim_alphas_prev)\n        self.register_buffer('ddim_sqrt_one_minus_alphas', np.sqrt(1. 
- ddim_alphas))\n        sigmas_for_original_sampling_steps = ddim_eta * torch.sqrt(\n            (1 - self.alphas_cumprod_prev) / (1 - self.alphas_cumprod) * (\n                        1 - self.alphas_cumprod / self.alphas_cumprod_prev))\n        self.register_buffer('ddim_sigmas_for_original_num_steps', sigmas_for_original_sampling_steps)\n\n    @torch.no_grad()\n    def sample(self,\n               S,\n               batch_size,\n               shape,\n               conditioning=None,\n               callback=None,\n               normals_sequence=None,\n               img_callback=None,\n               quantize_x0=False,\n               eta=0.,\n               mask=None,\n               x0=None,\n               temperature=1.,\n               noise_dropout=0.,\n               score_corrector=None,\n               corrector_kwargs=None,\n               verbose=False,\n               x_T=None,\n               log_every_t=100,\n               unconditional_guidance_scale=1.,\n               unconditional_conditioning=None,\n               # this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...\n               **kwargs\n               ):\n        if conditioning is not None:\n            if isinstance(conditioning, dict):\n                cbs = conditioning[list(conditioning.keys())[0]].shape[0]\n                if cbs != batch_size:\n                    print(f\"Warning: Got {cbs} conditionings but batch-size is {batch_size}\")\n            else:\n                if conditioning.shape[0] != batch_size:\n                    print(f\"Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}\")\n\n        self.make_schedule(ddim_num_steps=S, ddim_eta=eta, verbose=verbose)\n        # sampling\n        C, H, W = shape\n        size = (batch_size, C, H, W)\n        print(f'Data shape for PLMS sampling is {size}')\n\n        samples, intermediates = self.plms_sampling(conditioning, size,\n                                                    callback=callback,\n                                                    img_callback=img_callback,\n                                                    quantize_denoised=quantize_x0,\n                                                    mask=mask, x0=x0,\n                                                    ddim_use_original_steps=False,\n                                                    noise_dropout=noise_dropout,\n                                                    temperature=temperature,\n                                                    score_corrector=score_corrector,\n                                                    corrector_kwargs=corrector_kwargs,\n                                                    x_T=x_T,\n                                                    log_every_t=log_every_t,\n                                                    unconditional_guidance_scale=unconditional_guidance_scale,\n                                                    unconditional_conditioning=unconditional_conditioning,\n                                                    )\n        return samples, intermediates\n\n    @torch.no_grad()\n    def plms_sampling(self, cond, shape,\n                      x_T=None, ddim_use_original_steps=False,\n                      callback=None, timesteps=None, quantize_denoised=False,\n                      mask=None, x0=None, img_callback=None, log_every_t=100,\n                      temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,\n            
          unconditional_guidance_scale=1., unconditional_conditioning=None,):\n        device = self.model.betas.device\n        b = shape[0]\n        if x_T is None:\n            img = torch.randn(shape, device=device)\n        else:\n            img = x_T\n\n        if timesteps is None:\n            timesteps = self.ddpm_num_timesteps if ddim_use_original_steps else self.ddim_timesteps\n        elif timesteps is not None and not ddim_use_original_steps:\n            subset_end = int(min(timesteps / self.ddim_timesteps.shape[0], 1) * self.ddim_timesteps.shape[0]) - 1\n            timesteps = self.ddim_timesteps[:subset_end]\n\n        intermediates = {'x_inter': [img], 'pred_x0': [img]}\n        time_range = list(reversed(range(0,timesteps))) if ddim_use_original_steps else np.flip(timesteps)\n        total_steps = timesteps if ddim_use_original_steps else timesteps.shape[0]\n        print(f\"Running PLMS Sampling with {total_steps} timesteps\")\n\n        iterator = tqdm(time_range, desc='PLMS Sampler', total=total_steps)\n        old_eps = []\n\n        for i, step in enumerate(iterator):\n            index = total_steps - i - 1\n            ts = torch.full((b,), step, device=device, dtype=torch.long)\n            ts_next = torch.full((b,), time_range[min(i + 1, len(time_range) - 1)], device=device, dtype=torch.long)\n\n            if mask is not None:\n                assert x0 is not None\n                img_orig = self.model.q_sample(x0, ts)  # TODO: deterministic forward pass?\n                img = img_orig * mask + (1. - mask) * img\n\n            outs = self.p_sample_plms(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,\n                                      quantize_denoised=quantize_denoised, temperature=temperature,\n                                      noise_dropout=noise_dropout, score_corrector=score_corrector,\n                                      corrector_kwargs=corrector_kwargs,\n                                      unconditional_guidance_scale=unconditional_guidance_scale,\n                                      unconditional_conditioning=unconditional_conditioning,\n                                      old_eps=old_eps, t_next=ts_next)\n            img, pred_x0, e_t = outs\n            old_eps.append(e_t)\n            if len(old_eps) >= 4:\n                old_eps.pop(0)\n            if callback: callback(i)\n            if img_callback: img_callback(pred_x0, i)\n\n            if index % log_every_t == 0 or index == total_steps - 1:\n                intermediates['x_inter'].append(img)\n                intermediates['pred_x0'].append(pred_x0)\n\n        return img, intermediates\n\n    @torch.no_grad()\n    def p_sample_plms(self, x, c, t, index, repeat_noise=False, use_original_steps=False, quantize_denoised=False,\n                      temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,\n                      unconditional_guidance_scale=1., unconditional_conditioning=None, old_eps=None, t_next=None):\n        b, *_, device = *x.shape, x.device\n\n        def get_model_output(x, t):\n            if unconditional_conditioning is None or unconditional_guidance_scale == 1.:\n                e_t = self.model.apply_model(x, t, c)\n            else:\n                x_in = torch.cat([x] * 2)\n                t_in = torch.cat([t] * 2)\n                c_in = torch.cat([unconditional_conditioning, c])\n                e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)\n                e_t = e_t_uncond + 
 unconditional_guidance_scale * (e_t - e_t_uncond)\n\n            if score_corrector is not None:\n                assert self.model.parameterization == \"eps\"\n                e_t = score_corrector.modify_score(self.model, e_t, x, t, c, **corrector_kwargs)\n\n            return e_t\n\n        alphas = self.model.alphas_cumprod if use_original_steps else self.ddim_alphas\n        alphas_prev = self.model.alphas_cumprod_prev if use_original_steps else self.ddim_alphas_prev\n        sqrt_one_minus_alphas = self.model.sqrt_one_minus_alphas_cumprod if use_original_steps else self.ddim_sqrt_one_minus_alphas\n        sigmas = self.model.ddim_sigmas_for_original_num_steps if use_original_steps else self.ddim_sigmas\n\n        def get_x_prev_and_pred_x0(e_t, index):\n            # select parameters corresponding to the currently considered timestep\n            a_t = torch.full((b, 1, 1, 1), alphas[index], device=device)\n            a_prev = torch.full((b, 1, 1, 1), alphas_prev[index], device=device)\n            sigma_t = torch.full((b, 1, 1, 1), sigmas[index], device=device)\n            sqrt_one_minus_at = torch.full((b, 1, 1, 1), sqrt_one_minus_alphas[index],device=device)\n\n            # current prediction for x_0\n            pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()\n            if quantize_denoised:\n                pred_x0, _, *_ = self.model.first_stage_model.quantize(pred_x0)\n            # direction pointing to x_t\n            dir_xt = (1. - a_prev - sigma_t**2).sqrt() * e_t\n            noise = sigma_t * noise_like(x.shape, device, repeat_noise) * temperature\n            if noise_dropout > 0.:\n                noise = torch.nn.functional.dropout(noise, p=noise_dropout)\n            x_prev = a_prev.sqrt() * pred_x0 + dir_xt + noise\n            return x_prev, pred_x0\n\n        e_t = get_model_output(x, t)\n        if len(old_eps) == 0:\n            # Pseudo Improved Euler (2nd order)\n            x_prev, pred_x0 = get_x_prev_and_pred_x0(e_t, index)\n            e_t_next = get_model_output(x_prev, t_next)\n            e_t_prime = (e_t + e_t_next) / 2\n        elif len(old_eps) == 1:\n            # 2nd order Pseudo Linear Multistep (Adams-Bashforth)\n            e_t_prime = (3 * e_t - old_eps[-1]) / 2\n        elif len(old_eps) == 2:\n            # 3rd order Pseudo Linear Multistep (Adams-Bashforth)\n            e_t_prime = (23 * e_t - 16 * old_eps[-1] + 5 * old_eps[-2]) / 12\n        elif len(old_eps) >= 3:\n            # 4th order Pseudo Linear Multistep (Adams-Bashforth)\n            e_t_prime = (55 * e_t - 59 * old_eps[-1] + 37 * old_eps[-2] - 9 * old_eps[-3]) / 24\n\n        x_prev, pred_x0 = get_x_prev_and_pred_x0(e_t_prime, index)\n\n        return x_prev, pred_x0, e_t\n"
  },
  {
    "path": "lidm/modules/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/modules/attention.py",
    "content": "from inspect import isfunction\nimport math\nimport torch\nimport torch.nn.functional as F\nfrom torch import nn, einsum\nfrom einops import rearrange, repeat\n\nfrom .basic import checkpoint\n\n\ndef exists(val):\n    return val is not None\n\n\ndef uniq(arr):\n    return{el: True for el in arr}.keys()\n\n\ndef default(val, d):\n    if exists(val):\n        return val\n    return d() if isfunction(d) else d\n\n\ndef max_neg_value(t):\n    return -torch.finfo(t.dtype).max\n\n\ndef init_(tensor):\n    dim = tensor.shape[-1]\n    std = 1 / math.sqrt(dim)\n    tensor.uniform_(-std, std)\n    return tensor\n\n\n# feedforward\nclass GEGLU(nn.Module):\n    def __init__(self, dim_in, dim_out):\n        super().__init__()\n        self.proj = nn.Linear(dim_in, dim_out * 2)\n\n    def forward(self, x):\n        x, gate = self.proj(x).chunk(2, dim=-1)\n        return x * F.gelu(gate)\n\n\nclass FeedForward(nn.Module):\n    def __init__(self, dim, dim_out=None, mult=4, glu=False, dropout=0.):\n        super().__init__()\n        inner_dim = int(dim * mult)\n        dim_out = default(dim_out, dim)\n        project_in = nn.Sequential(\n            nn.Linear(dim, inner_dim),\n            nn.GELU()\n        ) if not glu else GEGLU(dim, inner_dim)\n\n        self.net = nn.Sequential(\n            project_in,\n            nn.Dropout(dropout),\n            nn.Linear(inner_dim, dim_out)\n        )\n\n    def forward(self, x):\n        return self.net(x)\n\n\ndef zero_module(module):\n    \"\"\"\n    Zero out the parameters of a module and return it.\n    \"\"\"\n    for p in module.parameters():\n        p.detach().zero_()\n    return module\n\n\ndef Normalize(in_channels):\n    return torch.nn.GroupNorm(num_groups=32, num_channels=in_channels, eps=1e-6, affine=True)\n\n\nclass LinearAttention(nn.Module):\n    def __init__(self, dim, heads=4, dim_head=32):\n        super().__init__()\n        self.heads = heads\n        hidden_dim = dim_head * heads\n        self.to_qkv = nn.Conv2d(dim, hidden_dim * 3, 1, bias=False)\n        self.to_out = nn.Conv2d(hidden_dim, dim, 1)\n\n    def forward(self, x):\n        b, c, h, w = x.shape\n        qkv = self.to_qkv(x)\n        q, k, v = rearrange(qkv, 'b (qkv heads c) h w -> qkv b heads c (h w)', heads = self.heads, qkv=3)\n        k = k.softmax(dim=-1)  \n        context = torch.einsum('bhdn,bhen->bhde', k, v)\n        out = torch.einsum('bhde,bhdn->bhen', context, q)\n        out = rearrange(out, 'b heads c (h w) -> b (heads c) h w', heads=self.heads, h=h, w=w)\n        return self.to_out(out)\n\n\nclass SpatialSelfAttention(nn.Module):\n    def __init__(self, in_channels):\n        super().__init__()\n        self.in_channels = in_channels\n\n        self.norm = Normalize(in_channels)\n        self.q = torch.nn.Conv2d(in_channels,\n                                 in_channels,\n                                 kernel_size=1,\n                                 stride=1,\n                                 padding=0)\n        self.k = torch.nn.Conv2d(in_channels,\n                                 in_channels,\n                                 kernel_size=1,\n                                 stride=1,\n                                 padding=0)\n        self.v = torch.nn.Conv2d(in_channels,\n                                 in_channels,\n                                 kernel_size=1,\n                                 stride=1,\n                                 padding=0)\n        self.proj_out = torch.nn.Conv2d(in_channels,\n                               
         in_channels,\n                                        kernel_size=1,\n                                        stride=1,\n                                        padding=0)\n\n    def forward(self, x):\n        h_ = x\n        h_ = self.norm(h_)\n        q = self.q(h_)\n        k = self.k(h_)\n        v = self.v(h_)\n\n        # compute attention\n        b,c,h,w = q.shape\n        q = rearrange(q, 'b c h w -> b (h w) c')\n        k = rearrange(k, 'b c h w -> b c (h w)')\n        w_ = torch.einsum('bij,bjk->bik', q, k)\n\n        w_ = w_ * (int(c)**(-0.5))\n        w_ = torch.nn.functional.softmax(w_, dim=2)\n\n        # attend to values\n        v = rearrange(v, 'b c h w -> b c (h w)')\n        w_ = rearrange(w_, 'b i j -> b j i')\n        h_ = torch.einsum('bij,bjk->bik', v, w_)\n        h_ = rearrange(h_, 'b c (h w) -> b c h w', h=h)\n        h_ = self.proj_out(h_)\n\n        return x+h_\n\n\nclass CrossAttention(nn.Module):\n    def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64, dropout=0.):\n        super().__init__()\n        inner_dim = dim_head * heads\n        context_dim = default(context_dim, query_dim)\n\n        self.scale = dim_head ** -0.5\n        self.heads = heads\n\n        self.to_q = nn.Linear(query_dim, inner_dim, bias=False)\n        self.to_k = nn.Linear(context_dim, inner_dim, bias=False)\n        self.to_v = nn.Linear(context_dim, inner_dim, bias=False)\n\n        self.to_out = nn.Sequential(\n            nn.Linear(inner_dim, query_dim),\n            nn.Dropout(dropout)\n        )\n\n    def forward(self, x, context=None, mask=None):\n        h = self.heads\n\n        q = self.to_q(x)\n        context = default(context, x)\n        k = self.to_k(context)\n        v = self.to_v(context)\n\n        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v))\n\n        sim = einsum('b i d, b j d -> b i j', q, k) * self.scale\n\n        if exists(mask):\n            mask = rearrange(mask, 'b ... 
-> b (...)')\n            max_neg_value = -torch.finfo(sim.dtype).max\n            mask = repeat(mask, 'b j -> (b h) () j', h=h)\n            sim.masked_fill_(~mask, max_neg_value)\n\n        # attention, what we cannot get enough of\n        attn = sim.softmax(dim=-1)\n\n        out = einsum('b i j, b j d -> b i d', attn, v)\n        out = rearrange(out, '(b h) n d -> b n (h d)', h=h)\n        return self.to_out(out)\n\n\nclass BasicTransformerBlock(nn.Module):\n    def __init__(self, dim, n_heads, d_head, dropout=0., context_dim=None, gated_ff=True, checkpoint=True):\n        super().__init__()\n        self.attn1 = CrossAttention(query_dim=dim, heads=n_heads, dim_head=d_head, dropout=dropout)  # is a self-attention\n        self.ff = FeedForward(dim, dropout=dropout, glu=gated_ff)\n        self.attn2 = CrossAttention(query_dim=dim, context_dim=context_dim,\n                                    heads=n_heads, dim_head=d_head, dropout=dropout)  # is self-attn if context is none\n        self.norm1 = nn.LayerNorm(dim)\n        self.norm2 = nn.LayerNorm(dim)\n        self.norm3 = nn.LayerNorm(dim)\n        self.checkpoint = checkpoint\n\n    def forward(self, x, context=None):\n        return checkpoint(self._forward, (x, context), self.parameters(), self.checkpoint)\n\n    def _forward(self, x, context=None):\n        x = self.attn1(self.norm1(x)) + x\n        x = self.attn2(self.norm2(x), context=context) + x\n        x = self.ff(self.norm3(x)) + x\n        return x\n\n\nclass SpatialTransformer(nn.Module):\n    \"\"\"\n    Transformer block for image-like data.\n    First, project the input (aka embedding)\n    and reshape to b, t, d.\n    Then apply standard transformer action.\n    Finally, reshape to image\n    \"\"\"\n    def __init__(self, in_channels, n_heads, d_head,\n                 depth=1, dropout=0., context_dim=None):\n        super().__init__()\n        self.in_channels = in_channels\n        inner_dim = n_heads * d_head\n        self.norm = Normalize(in_channels)\n\n        self.proj_in = nn.Conv2d(in_channels,\n                                 inner_dim,\n                                 kernel_size=1,\n                                 stride=1,\n                                 padding=0)\n\n        self.transformer_blocks = nn.ModuleList(\n            [BasicTransformerBlock(inner_dim, n_heads, d_head, dropout=dropout, context_dim=context_dim)\n                for d in range(depth)]\n        )\n\n        self.proj_out = zero_module(nn.Conv2d(inner_dim,\n                                              in_channels,\n                                              kernel_size=1,\n                                              stride=1,\n                                              padding=0))\n\n    def forward(self, x, context=None):\n        # note: if no context is given, cross-attention defaults to self-attention\n        b, c, h, w = x.shape\n        x_in = x\n        x = self.norm(x)\n        x = self.proj_in(x)\n        x = rearrange(x, 'b c h w -> b (h w) c')\n        for block in self.transformer_blocks:\n            x = block(x, context=context)\n        x = rearrange(x, 'b (h w) c -> b c h w', h=h, w=w)\n        x = self.proj_out(x)\n        return x + x_in"
  },
  {
    "path": "lidm/modules/basic.py",
    "content": "# adopted from\n# https://github.com/openai/improved-diffusion/blob/main/improved_diffusion/gaussian_diffusion.py\n# and\n# https://github.com/lucidrains/denoising-diffusion-pytorch/blob/7706bdfc6f527f58d33f84b7b522e61e6e3164b3/denoising_diffusion_pytorch/denoising_diffusion_pytorch.py\n# and\n# https://github.com/openai/guided-diffusion/blob/0ba878e517b276c45d1195eb29f6f5f72659a05b/guided_diffusion/nn.py\n#\n# thanks!\n\n\nimport math\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom einops import repeat\nfrom torch import Tensor\n\nfrom ..utils.misc_utils import instantiate_from_config, print_fn\n\n\nclass CircularPad(nn.Module):\n    def __init__(self, pad_size):\n        super(CircularPad, self).__init__()\n        h1, h2, v1, v2 = pad_size\n        self.h_pad, self.v_pad = (h1, h2, 0, 0), (0, 0, v1, v2)\n\n    def forward(self, x):\n        if sum(self.h_pad) > 0:\n            x = nn.functional.pad(x, self.h_pad, mode=\"circular\")  # horizontal pad\n        if sum(self.v_pad) > 0:\n            x = nn.functional.pad(x, self.v_pad, mode=\"constant\")  # vertical pad\n        return x\n\n\nclass CircularConv2d(nn.Conv2d):\n    def __init__(self, *args, **kwargs):\n        if 'padding' in kwargs:\n            self.is_pad = True\n            if isinstance(kwargs['padding'], int):\n                h1 = h2 = v1 = v2 = kwargs['padding']\n            elif isinstance(kwargs['padding'], tuple):\n                h1, h2, v1, v2 = kwargs['padding']\n            else:\n                raise NotImplementedError\n            self.h_pad, self.v_pad = (h1, h2, 0, 0), (0, 0, v1, v2)\n            del kwargs['padding']\n        else:\n            self.is_pad = False\n\n        super().__init__(*args, **kwargs)\n\n    def forward(self, x: Tensor) -> Tensor:\n        if self.is_pad:\n            if sum(self.h_pad) > 0:\n                x = nn.functional.pad(x, self.h_pad, mode=\"circular\")  # horizontal pad\n            if sum(self.v_pad) > 0:\n                x = nn.functional.pad(x, self.v_pad, mode=\"constant\")  # vertical pad\n        x = self._conv_forward(x, self.weight, self.bias)\n        return x\n\n\nclass ActNorm(nn.Module):\n    def __init__(self, num_features, logdet=False, affine=True,\n                 allow_reverse_init=False):\n        assert affine\n        super().__init__()\n        self.logdet = logdet\n        self.loc = nn.Parameter(torch.zeros(1, num_features, 1, 1))\n        self.scale = nn.Parameter(torch.ones(1, num_features, 1, 1))\n        self.allow_reverse_init = allow_reverse_init\n\n        self.register_buffer('initialized', torch.tensor(0, dtype=torch.uint8))\n\n    def initialize(self, input):\n        with torch.no_grad():\n            flatten = input.permute(1, 0, 2, 3).contiguous().view(input.shape[1], -1)\n            mean = (\n                flatten.mean(1)\n                .unsqueeze(1)\n                .unsqueeze(2)\n                .unsqueeze(3)\n                .permute(1, 0, 2, 3)\n            )\n            std = (\n                flatten.std(1)\n                .unsqueeze(1)\n                .unsqueeze(2)\n                .unsqueeze(3)\n                .permute(1, 0, 2, 3)\n            )\n\n            self.loc.data.copy_(-mean)\n            self.scale.data.copy_(1 / (std + 1e-6))\n\n    def forward(self, input, reverse=False):\n        if reverse:\n            return self.reverse(input)\n        if len(input.shape) == 2:\n            input = input[:,:,None,None]\n            squeeze = True\n        else:\n            squeeze = 
False\n\n        _, _, height, width = input.shape\n\n        if self.training and self.initialized.item() == 0:\n            self.initialize(input)\n            self.initialized.fill_(1)\n\n        h = self.scale * (input + self.loc)\n\n        if squeeze:\n            h = h.squeeze(-1).squeeze(-1)\n\n        if self.logdet:\n            log_abs = torch.log(torch.abs(self.scale))\n            logdet = height*width*torch.sum(log_abs)\n            logdet = logdet * torch.ones(input.shape[0]).to(input)\n            return h, logdet\n\n        return h\n\n    def reverse(self, output):\n        if self.training and self.initialized.item() == 0:\n            if not self.allow_reverse_init:\n                raise RuntimeError(\n                    \"Initializing ActNorm in reverse direction is \"\n                    \"disabled by default. Use allow_reverse_init=True to enable.\"\n                )\n            else:\n                self.initialize(output)\n                self.initialized.fill_(1)\n\n        if len(output.shape) == 2:\n            output = output[:, :, None, None]\n            squeeze = True\n        else:\n            squeeze = False\n\n        h = output / self.scale - self.loc\n\n        if squeeze:\n            h = h.squeeze(-1).squeeze(-1)\n        return h\n\n\ndef make_beta_schedule(schedule, n_timestep, linear_start=1e-4, linear_end=2e-2, cosine_s=8e-3):\n    if schedule == \"linear\":\n        betas = (\n                torch.linspace(linear_start ** 0.5, linear_end ** 0.5, n_timestep, dtype=torch.float64) ** 2\n        )\n\n    elif schedule == \"cosine\":\n        timesteps = (\n                torch.arange(n_timestep + 1, dtype=torch.float64) / n_timestep + cosine_s\n        )\n        alphas = timesteps / (1 + cosine_s) * np.pi / 2\n        alphas = torch.cos(alphas).pow(2)\n        alphas = alphas / alphas[0]\n        betas = 1 - alphas[1:] / alphas[:-1]\n        betas = np.clip(betas, a_min=0, a_max=0.999)\n\n    elif schedule == \"sqrt_linear\":\n        betas = torch.linspace(linear_start, linear_end, n_timestep, dtype=torch.float64)\n    elif schedule == \"sqrt\":\n        betas = torch.linspace(linear_start, linear_end, n_timestep, dtype=torch.float64) ** 0.5\n    else:\n        raise ValueError(f\"schedule '{schedule}' unknown.\")\n    return betas.numpy()\n\n\ndef make_ddim_timesteps(ddim_discr_method, num_ddim_timesteps, num_ddpm_timesteps, verbose=False):\n    if ddim_discr_method == 'uniform':\n        c = num_ddpm_timesteps // num_ddim_timesteps\n        ddim_timesteps = np.asarray(list(range(0, num_ddpm_timesteps, c)))\n    elif ddim_discr_method == 'quad':\n        ddim_timesteps = ((np.linspace(0, np.sqrt(num_ddpm_timesteps * .8), num_ddim_timesteps)) ** 2).astype(int)\n    else:\n        raise NotImplementedError(f'There is no ddim discretization method called \"{ddim_discr_method}\"')\n\n    # assert ddim_timesteps.shape[0] == num_ddim_timesteps\n    # add one to get the final alpha values right (the ones from first scale to data during sampling)\n    steps_out = ddim_timesteps + 1\n    print_fn(f'Selected timesteps for ddim sampler: {steps_out}', False)\n    return steps_out\n\n\ndef make_ddim_sampling_parameters(alphacums, ddim_timesteps, eta, verbose=False):\n    # select alphas for computing the variance schedule\n    alphas = alphacums[ddim_timesteps]\n    alphas_prev = np.asarray([alphacums[0]] + alphacums[ddim_timesteps[:-1]].tolist())\n\n    # according the the formula provided in https://arxiv.org/abs/2010.02502\n    sigmas = eta * 
np.sqrt((1 - alphas_prev) / (1 - alphas) * (1 - alphas / alphas_prev))\n    print_fn(f'Selected alphas for ddim sampler: a_t: {alphas}; a_(t-1): {alphas_prev}', False)\n    print_fn(f'For the chosen value of eta, which is {eta}, this results in the following sigma_t schedule for ddim sampler {sigmas}', False)\n    return sigmas, alphas, alphas_prev\n\n\ndef betas_for_alpha_bar(num_diffusion_timesteps, alpha_bar, max_beta=0.999):\n    \"\"\"\n    Create a beta schedule that discretizes the given alpha_t_bar function,\n    which defines the cumulative product of (1-beta) over time from t = [0,1].\n    :param num_diffusion_timesteps: the number of betas to produce.\n    :param alpha_bar: a lambda that takes an argument t from 0 to 1 and\n                      produces the cumulative product of (1-beta) up to that\n                      part of the diffusion process.\n    :param max_beta: the maximum beta to use; use values lower than 1 to\n                     prevent singularities.\n    \"\"\"\n    betas = []\n    for i in range(num_diffusion_timesteps):\n        t1 = i / num_diffusion_timesteps\n        t2 = (i + 1) / num_diffusion_timesteps\n        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))\n    return np.array(betas)\n\n\ndef extract_into_tensor(a, t, x_shape):\n    b, *_ = t.shape\n    out = a.gather(-1, t)\n    return out.reshape(b, *((1,) * (len(x_shape) - 1)))\n\n\ndef checkpoint(func, inputs, params, flag):\n    \"\"\"\n    Evaluate a function without caching intermediate activations, allowing for\n    reduced memory at the expense of extra compute in the backward pass.\n    :param func: the function to evaluate.\n    :param inputs: the argument sequence to pass to `func`.\n    :param params: a sequence of parameters `func` depends on but does not\n                   explicitly take as arguments.\n    :param flag: if False, disable gradient checkpointing.\n    \"\"\"\n    if flag:\n        args = tuple(inputs) + tuple(params)\n        return CheckpointFunction.apply(func, len(inputs), *args)\n    else:\n        return func(*inputs)\n\n\nclass CheckpointFunction(torch.autograd.Function):\n    @staticmethod\n    def forward(ctx, run_function, length, *args):\n        ctx.run_function = run_function\n        ctx.input_tensors = list(args[:length])\n        ctx.input_params = list(args[length:])\n\n        with torch.no_grad():\n            output_tensors = ctx.run_function(*ctx.input_tensors)\n        return output_tensors\n\n    @staticmethod\n    def backward(ctx, *output_grads):\n        ctx.input_tensors = [x.detach().requires_grad_(True) for x in ctx.input_tensors]\n        with torch.enable_grad():\n            # Fixes a bug where the first op in run_function modifies the\n            # Tensor storage in place, which is not allowed for detach()'d\n            # Tensors.\n            shallow_copies = [x.view_as(x) for x in ctx.input_tensors]\n            output_tensors = ctx.run_function(*shallow_copies)\n        input_grads = torch.autograd.grad(\n            output_tensors,\n            ctx.input_tensors + ctx.input_params,\n            output_grads,\n            allow_unused=True,\n        )\n        del ctx.input_tensors\n        del ctx.input_params\n        del output_tensors\n        return (None, None) + input_grads\n\n\ndef timestep_embedding(timesteps, dim, max_period=10000, repeat_only=False):\n    \"\"\"\n    Create sinusoidal timestep embeddings.\n    :param timesteps: a 1-D Tensor of N indices, one per batch element.\n                      These 
may be fractional.\n    :param dim: the dimension of the output.\n    :param max_period: controls the minimum frequency of the embeddings.\n    :return: an [N x dim] Tensor of positional embeddings.\n    \"\"\"\n    if not repeat_only:\n        half = dim // 2\n        freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half).to(device=timesteps.device)\n        args = timesteps[:, None].float() * freqs[None]\n        embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)\n        if dim % 2:\n            embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)\n    else:\n        embedding = repeat(timesteps, 'b -> b d', d=dim)\n    return embedding\n\n\ndef zero_module(module):\n    \"\"\"\n    Zero out the parameters of a module and return it.\n    \"\"\"\n    for p in module.parameters():\n        p.detach().zero_()\n    return module\n\n\ndef scale_module(module, scale):\n    \"\"\"\n    Scale the parameters of a module and return it.\n    \"\"\"\n    for p in module.parameters():\n        p.detach().mul_(scale)\n    return module\n\n\ndef mean_flat(tensor):\n    \"\"\"\n    Take the mean over all non-batch dimensions.\n    \"\"\"\n    return tensor.mean(dim=list(range(1, len(tensor.shape))))\n\n\ndef normalization(channels):\n    \"\"\"\n    Make a standard normalization layer.\n    :param channels: number of input channels.\n    :return: an nn.Module for normalization.\n    \"\"\"\n    return GroupNorm32(32, channels)\n\n\n# PyTorch 1.7 has SiLU, but we support PyTorch 1.5.\nclass SiLU(nn.Module):\n    def forward(self, x):\n        return x * torch.sigmoid(x)\n\n\nclass GroupNorm32(nn.GroupNorm):\n    def forward(self, x):\n        return super().forward(x.float()).type(x.dtype)\n\n\ndef conv_nd(dims, *args, cconv=False, **kwargs):\n    \"\"\"\n    Create a 1D, 2D, or 3D convolution module.\n    \"\"\"\n    if dims == 1:\n        return nn.Conv1d(*args, **kwargs)\n    elif dims == 2:\n        if cconv:\n            return CircularConv2d(*args, **kwargs)\n        else:\n            return nn.Conv2d(*args, **kwargs)\n    elif dims == 3:\n        return nn.Conv3d(*args, **kwargs)\n    raise ValueError(f\"unsupported dimensions: {dims}\")\n\n\ndef linear(*args, **kwargs):\n    \"\"\"\n    Create a linear module.\n    \"\"\"\n    return nn.Linear(*args, **kwargs)\n\n\ndef avg_pool_nd(dims, *args, **kwargs):\n    \"\"\"\n    Create a 1D, 2D, or 3D average pooling module.\n    \"\"\"\n    if dims == 1:\n        return nn.AvgPool1d(*args, **kwargs)\n    elif dims == 2:\n        return nn.AvgPool2d(*args, **kwargs)\n    elif dims == 3:\n        return nn.AvgPool3d(*args, **kwargs)\n    raise ValueError(f\"unsupported dimensions: {dims}\")\n\n\nclass HybridConditioner(nn.Module):\n\n    def __init__(self, c_concat_config, c_crossattn_config):\n        super().__init__()\n        self.concat_conditioner = instantiate_from_config(c_concat_config)\n        self.crossattn_conditioner = instantiate_from_config(c_crossattn_config)\n\n    def forward(self, c_concat, c_crossattn):\n        c_concat = self.concat_conditioner(c_concat)\n        c_crossattn = self.crossattn_conditioner(c_crossattn)\n        return {'c_concat': [c_concat], 'c_crossattn': [c_crossattn]}\n\n\ndef noise_like(shape, device, repeat=False):\n    repeat_noise = lambda: torch.randn((1, *shape[1:]), device=device).repeat(shape[0], *((1,) * (len(shape) - 1)))\n    noise = lambda: torch.randn(shape, device=device)\n    return repeat_noise() if repeat 
else noise()"
  },
  {
    "path": "lidm/modules/diffusion/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/modules/diffusion/model_ldm.py",
    "content": "# pytorch_diffusion + derived encoder decoder\nimport math\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom einops import rearrange\n\nfrom ...utils.misc_utils import instantiate_from_config\nfrom ...modules.attention import LinearAttention\n\n\ndef get_timestep_embedding(timesteps, embedding_dim):\n    \"\"\"\n    This matches the implementation in Denoising Diffusion Probabilistic Models:\n    From Fairseq.\n    Build sinusoidal embeddings.\n    This matches the implementation in tensor2tensor, but differs slightly\n    from the description in Section 3.5 of \"Attention Is All You Need\".\n    \"\"\"\n    assert len(timesteps.shape) == 1\n\n    half_dim = embedding_dim // 2\n    emb = math.log(10000) / (half_dim - 1)\n    emb = torch.exp(torch.arange(half_dim, dtype=torch.float32) * -emb)\n    emb = emb.to(device=timesteps.device)\n    emb = timesteps.float()[:, None] * emb[None, :]\n    emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=1)\n    if embedding_dim % 2 == 1:  # zero pad\n        emb = torch.nn.functional.pad(emb, (0, 1, 0, 0))\n    return emb\n\n\ndef nonlinearity(x):\n    # swish\n    return x * torch.sigmoid(x)\n\n\ndef Normalize(in_channels, num_groups=32):\n    return torch.nn.GroupNorm(num_groups=num_groups, num_channels=in_channels, eps=1e-6, affine=True)\n\n\nclass Upsample(nn.Module):\n    def __init__(self, in_channels, with_conv):\n        super().__init__()\n        self.with_conv = with_conv\n        if self.with_conv:\n            self.conv = torch.nn.Conv2d(in_channels,\n                                        in_channels,\n                                        kernel_size=3,\n                                        stride=1,\n                                        padding=1)\n\n    def forward(self, x):\n        x = torch.nn.functional.interpolate(x, scale_factor=2.0, mode=\"nearest\")\n        if self.with_conv:\n            x = self.conv(x)\n        return x\n\n\nclass Downsample(nn.Module):\n    def __init__(self, in_channels, with_conv):\n        super().__init__()\n        self.with_conv = with_conv\n        if self.with_conv:\n            # no asymmetric padding in torch conv, must do it ourselves\n            self.conv = torch.nn.Conv2d(in_channels,\n                                        in_channels,\n                                        kernel_size=3,\n                                        stride=2,\n                                        padding=0)\n\n    def forward(self, x):\n        if self.with_conv:\n            pad = (0, 1, 0, 1)\n            x = torch.nn.functional.pad(x, pad, mode=\"constant\", value=0)\n            x = self.conv(x)\n        else:\n            x = torch.nn.functional.avg_pool2d(x, kernel_size=2, stride=2)\n        return x\n\n\nclass ResnetBlock(nn.Module):\n    def __init__(self, *, in_channels, out_channels=None, conv_shortcut=False,\n                 dropout, temb_channels=512):\n        super().__init__()\n        self.in_channels = in_channels\n        out_channels = in_channels if out_channels is None else out_channels\n        self.out_channels = out_channels\n        self.use_conv_shortcut = conv_shortcut\n\n        self.norm1 = Normalize(in_channels)\n        self.conv1 = torch.nn.Conv2d(in_channels,\n                                     out_channels,\n                                     kernel_size=3,\n                                     stride=1,\n                                     padding=1)\n        if temb_channels > 0:\n            self.temb_proj = 
torch.nn.Linear(temb_channels,\n                                             out_channels)\n        self.norm2 = Normalize(out_channels)\n        self.dropout = torch.nn.Dropout(dropout)\n        self.conv2 = torch.nn.Conv2d(out_channels,\n                                     out_channels,\n                                     kernel_size=3,\n                                     stride=1,\n                                     padding=1)\n        if self.in_channels != self.out_channels:\n            if self.use_conv_shortcut:\n                self.conv_shortcut = torch.nn.Conv2d(in_channels,\n                                                     out_channels,\n                                                     kernel_size=3,\n                                                     stride=1,\n                                                     padding=1)\n            else:\n                self.nin_shortcut = torch.nn.Conv2d(in_channels,\n                                                    out_channels,\n                                                    kernel_size=1,\n                                                    stride=1,\n                                                    padding=0)\n\n    def forward(self, x, temb):\n        h = x\n        h = self.norm1(h)\n        h = nonlinearity(h)\n        h = self.conv1(h)\n\n        if temb is not None:\n            h = h + self.temb_proj(nonlinearity(temb))[:, :, None, None]\n\n        h = self.norm2(h)\n        h = nonlinearity(h)\n        h = self.dropout(h)\n        h = self.conv2(h)\n\n        if self.in_channels != self.out_channels:\n            if self.use_conv_shortcut:\n                x = self.conv_shortcut(x)\n            else:\n                x = self.nin_shortcut(x)\n\n        return x + h\n\n\nclass LinAttnBlock(LinearAttention):\n    \"\"\"to match AttnBlock usage\"\"\"\n\n    def __init__(self, in_channels):\n        super().__init__(dim=in_channels, heads=1, dim_head=in_channels)\n\n\nclass AttnBlock(nn.Module):\n    def __init__(self, in_channels):\n        super().__init__()\n        self.in_channels = in_channels\n\n        self.norm = Normalize(in_channels)\n        self.q = torch.nn.Conv2d(in_channels,\n                                 in_channels,\n                                 kernel_size=1,\n                                 stride=1,\n                                 padding=0)\n        self.k = torch.nn.Conv2d(in_channels,\n                                 in_channels,\n                                 kernel_size=1,\n                                 stride=1,\n                                 padding=0)\n        self.v = torch.nn.Conv2d(in_channels,\n                                 in_channels,\n                                 kernel_size=1,\n                                 stride=1,\n                                 padding=0)\n        self.proj_out = torch.nn.Conv2d(in_channels,\n                                        in_channels,\n                                        kernel_size=1,\n                                        stride=1,\n                                        padding=0)\n\n    def forward(self, x):\n        h_ = x\n        h_ = self.norm(h_)\n        q = self.q(h_)\n        k = self.k(h_)\n        v = self.v(h_)\n\n        # compute attention\n        b, c, h, w = q.shape\n        q = q.reshape(b, c, h * w)\n        q = q.permute(0, 2, 1)  # b,hw,c\n        k = k.reshape(b, c, h * w)  # b,c,hw\n        w_ = torch.bmm(q, k)  # b,hw,hw    w[b,i,j]=sum_c q[b,i,c]k[b,c,j]\n        w_ = w_ 
* (int(c) ** (-0.5))\n        w_ = torch.nn.functional.softmax(w_, dim=2)\n\n        # attend to values\n        v = v.reshape(b, c, h * w)\n        w_ = w_.permute(0, 2, 1)  # b,hw,hw (first hw of k, second of q)\n        h_ = torch.bmm(v, w_)  # b, c,hw (hw of q) h_[b,c,j] = sum_i v[b,c,i] w_[b,i,j]\n        h_ = h_.reshape(b, c, h, w)\n\n        h_ = self.proj_out(h_)\n\n        return x + h_\n\n\ndef make_attn(in_channels, attn_type=\"vanilla\"):\n    assert attn_type in [\"vanilla\", \"linear\", \"none\"], f'attn_type {attn_type} unknown'\n    # print(f\"making attention of type '{attn_type}' with {in_channels} in_channels\")\n    if attn_type == \"vanilla\":\n        return AttnBlock(in_channels)\n    elif attn_type == \"none\":\n        return nn.Identity(in_channels)\n    else:\n        return LinAttnBlock(in_channels)\n\n\nclass Model(nn.Module):\n    def __init__(self, *, ch, out_ch, ch_mult=(1, 2, 4, 8), num_res_blocks,\n                 attn_levels, dropout=0.0, resamp_with_conv=True, in_channels,\n                 use_timestep=True, use_linear_attn=False, attn_type=\"vanilla\"):\n        super().__init__()\n        if use_linear_attn: attn_type = \"linear\"\n        self.ch = ch\n        self.temb_ch = self.ch * 4\n        self.num_resolutions = len(ch_mult)\n        self.num_res_blocks = num_res_blocks\n        self.in_channels = in_channels\n\n        self.use_timestep = use_timestep\n        if self.use_timestep:\n            # timestep embedding\n            self.temb = nn.Module()\n            self.temb.dense = nn.ModuleList([\n                torch.nn.Linear(self.ch,\n                                self.temb_ch),\n                torch.nn.Linear(self.temb_ch,\n                                self.temb_ch),\n            ])\n\n        # downsampling\n        self.conv_in = torch.nn.Conv2d(in_channels,\n                                       self.ch,\n                                       kernel_size=3,\n                                       stride=1,\n                                       padding=1)\n\n        in_ch_mult = (1,) + tuple(ch_mult)\n        self.down = nn.ModuleList()\n        for i_level in range(self.num_resolutions):\n            block = nn.ModuleList()\n            attn = nn.ModuleList()\n            block_in = ch * in_ch_mult[i_level]\n            block_out = ch * ch_mult[i_level]\n            for i_block in range(self.num_res_blocks):\n                block.append(ResnetBlock(in_channels=block_in,\n                                         out_channels=block_out,\n                                         temb_channels=self.temb_ch,\n                                         dropout=dropout))\n                block_in = block_out\n                if i_level in attn_levels:\n                    attn.append(make_attn(block_in, attn_type=attn_type))\n            down = nn.Module()\n            down.block = block\n            down.attn = attn\n            if i_level != self.num_resolutions - 1:\n                down.downsample = Downsample(block_in, resamp_with_conv)\n            self.down.append(down)\n\n        # middle\n        self.mid = nn.Module()\n        self.mid.block_1 = ResnetBlock(in_channels=block_in,\n                                       out_channels=block_in,\n                                       temb_channels=self.temb_ch,\n                                       dropout=dropout)\n        self.mid.attn_1 = make_attn(block_in, attn_type=attn_type)\n        self.mid.block_2 = ResnetBlock(in_channels=block_in,\n                               
        out_channels=block_in,\n                                       temb_channels=self.temb_ch,\n                                       dropout=dropout)\n\n        # upsampling\n        self.up = nn.ModuleList()\n        for i_level in reversed(range(self.num_resolutions)):\n            block = nn.ModuleList()\n            attn = nn.ModuleList()\n            block_out = ch * ch_mult[i_level]\n            skip_in = ch * ch_mult[i_level]\n            for i_block in range(self.num_res_blocks + 1):\n                if i_block == self.num_res_blocks:\n                    skip_in = ch * in_ch_mult[i_level]\n                block.append(ResnetBlock(in_channels=block_in + skip_in,\n                                         out_channels=block_out,\n                                         temb_channels=self.temb_ch,\n                                         dropout=dropout))\n                block_in = block_out\n                if i_level in attn_levels:\n                    attn.append(make_attn(block_in, attn_type=attn_type))\n            up = nn.Module()\n            up.block = block\n            up.attn = attn\n            if i_level != 0:\n                up.upsample = Upsample(block_in, resamp_with_conv)\n            self.up.insert(0, up)  # prepend to get consistent order\n\n        # end\n        self.norm_out = Normalize(block_in)\n        self.conv_out = torch.nn.Conv2d(block_in,\n                                        out_ch,\n                                        kernel_size=3,\n                                        stride=1,\n                                        padding=1)\n\n    def forward(self, x, t=None, context=None):\n        # assert x.shape[2] == x.shape[3] == self.resolution\n        if context is not None:\n            # assume aligned context, cat along channel axis\n            x = torch.cat((x, context), dim=1)\n        if self.use_timestep:\n            # timestep embedding\n            assert t is not None\n            temb = get_timestep_embedding(t, self.ch)\n            temb = self.temb.dense[0](temb)\n            temb = nonlinearity(temb)\n            temb = self.temb.dense[1](temb)\n        else:\n            temb = None\n\n        # downsampling\n        hs = [self.conv_in(x)]\n        for i_level in range(self.num_resolutions):\n            for i_block in range(self.num_res_blocks):\n                h = self.down[i_level].block[i_block](hs[-1], temb)\n                if len(self.down[i_level].attn) > 0:\n                    h = self.down[i_level].attn[i_block](h)\n                hs.append(h)\n            if i_level != self.num_resolutions - 1:\n                hs.append(self.down[i_level].downsample(hs[-1]))\n\n        # middle\n        h = hs[-1]\n        h = self.mid.block_1(h, temb)\n        h = self.mid.attn_1(h)\n        h = self.mid.block_2(h, temb)\n\n        # upsampling\n        for i_level in reversed(range(self.num_resolutions)):\n            for i_block in range(self.num_res_blocks + 1):\n                h = self.up[i_level].block[i_block](\n                    torch.cat([h, hs.pop()], dim=1), temb)\n                if len(self.up[i_level].attn) > 0:\n                    h = self.up[i_level].attn[i_block](h)\n            if i_level != 0:\n                h = self.up[i_level].upsample(h)\n\n        # end\n        h = self.norm_out(h)\n        h = nonlinearity(h)\n        h = self.conv_out(h)\n        return h\n\n    def get_last_layer(self):\n        return self.conv_out.weight\n\n\nclass Encoder(nn.Module):\n    def __init__(self, *, ch, 
out_ch, ch_mult=(1, 2, 4, 8), num_res_blocks,\n                 attn_levels, dropout=0.0, resamp_with_conv=True, in_channels,\n                 z_channels, double_z=True, use_linear_attn=False, attn_type=\"vanilla\",\n                 **ignore_kwargs):\n        super().__init__()\n        if use_linear_attn: attn_type = \"linear\"\n        self.ch = ch\n        self.temb_ch = 0\n        self.num_resolutions = len(ch_mult)\n        self.num_res_blocks = num_res_blocks\n        self.in_channels = in_channels\n\n        # downsampling\n        self.conv_in = torch.nn.Conv2d(in_channels,\n                                       self.ch,\n                                       kernel_size=3,\n                                       stride=1,\n                                       padding=1)\n\n        in_ch_mult = (1,) + tuple(ch_mult)\n        self.in_ch_mult = in_ch_mult\n        self.down = nn.ModuleList()\n        for i_level in range(self.num_resolutions):\n            block = nn.ModuleList()\n            attn = nn.ModuleList()\n            block_in = ch * in_ch_mult[i_level]\n            block_out = ch * ch_mult[i_level]\n            for i_block in range(self.num_res_blocks):\n                block.append(ResnetBlock(in_channels=block_in,\n                                         out_channels=block_out,\n                                         temb_channels=self.temb_ch,\n                                         dropout=dropout))\n                block_in = block_out\n                if i_level in attn_levels:\n                    attn.append(make_attn(block_in, attn_type=attn_type))\n            down = nn.Module()\n            down.block = block\n            down.attn = attn\n            if i_level != self.num_resolutions - 1:\n                down.downsample = Downsample(block_in, resamp_with_conv)\n            self.down.append(down)\n\n        # middle\n        self.mid = nn.Module()\n        self.mid.block_1 = ResnetBlock(in_channels=block_in,\n                                       out_channels=block_in,\n                                       temb_channels=self.temb_ch,\n                                       dropout=dropout)\n        self.mid.attn_1 = make_attn(block_in, attn_type=attn_type)\n        self.mid.block_2 = ResnetBlock(in_channels=block_in,\n                                       out_channels=block_in,\n                                       temb_channels=self.temb_ch,\n                                       dropout=dropout)\n\n        # end\n        self.norm_out = Normalize(block_in)\n        self.conv_out = torch.nn.Conv2d(block_in,\n                                        2 * z_channels if double_z else z_channels,\n                                        kernel_size=3,\n                                        stride=1,\n                                        padding=1)\n\n    def forward(self, x):\n        # timestep embedding\n        temb = None\n        # downsampling\n        hs = [self.conv_in(x)]\n        for i_level in range(self.num_resolutions):\n            for i_block in range(self.num_res_blocks):\n                h = self.down[i_level].block[i_block](hs[-1], temb)\n                if len(self.down[i_level].attn) > 0:\n                    h = self.down[i_level].attn[i_block](h)\n                hs.append(h)\n            if i_level != self.num_resolutions - 1:\n                hs.append(self.down[i_level].downsample(hs[-1]))\n\n        # middle\n        h = hs[-1]\n        h = self.mid.block_1(h, temb)\n        h = self.mid.attn_1(h)\n        h = 
self.mid.block_2(h, temb)\n\n        # end\n        h = self.norm_out(h)\n        h = nonlinearity(h)\n        h = self.conv_out(h)\n        return h\n\n\nclass Decoder(nn.Module):\n    def __init__(self, *, ch, out_ch, ch_mult=(1, 2, 4, 8), num_res_blocks,\n                 attn_levels, dropout=0.0, resamp_with_conv=True, in_channels,\n                 z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,\n                 attn_type=\"vanilla\", **ignorekwargs):\n        super().__init__()\n        if use_linear_attn: attn_type = \"linear\"\n        self.ch = ch\n        self.temb_ch = 0\n        self.num_resolutions = len(ch_mult)\n        self.num_res_blocks = num_res_blocks\n        self.in_channels = in_channels\n        self.give_pre_end = give_pre_end\n        self.tanh_out = tanh_out\n\n        # compute in_ch_mult, block_in and curr_res at lowest res\n        in_ch_mult = (1,) + tuple(ch_mult)\n        block_in = ch * ch_mult[self.num_resolutions - 1]\n\n        # z to block_in\n        self.conv_in = torch.nn.Conv2d(z_channels,\n                                       block_in,\n                                       kernel_size=3,\n                                       stride=1,\n                                       padding=1)\n\n        # middle\n        self.mid = nn.Module()\n        self.mid.block_1 = ResnetBlock(in_channels=block_in,\n                                       out_channels=block_in,\n                                       temb_channels=self.temb_ch,\n                                       dropout=dropout)\n        self.mid.attn_1 = make_attn(block_in, attn_type=attn_type)\n        self.mid.block_2 = ResnetBlock(in_channels=block_in,\n                                       out_channels=block_in,\n                                       temb_channels=self.temb_ch,\n                                       dropout=dropout)\n\n        # upsampling\n        self.up = nn.ModuleList()\n        for i_level in reversed(range(self.num_resolutions)):\n            block = nn.ModuleList()\n            attn = nn.ModuleList()\n            block_out = ch * ch_mult[i_level]\n            for i_block in range(self.num_res_blocks + 1):\n                block.append(ResnetBlock(in_channels=block_in,\n                                         out_channels=block_out,\n                                         temb_channels=self.temb_ch,\n                                         dropout=dropout))\n                block_in = block_out\n                if i_level in attn_levels:\n                    attn.append(make_attn(block_in, attn_type=attn_type))\n            up = nn.Module()\n            up.block = block\n            up.attn = attn\n            if i_level != 0:\n                up.upsample = Upsample(block_in, resamp_with_conv)\n            self.up.insert(0, up)  # prepend to get consistent order\n\n        # end\n        self.norm_out = Normalize(block_in)\n        self.conv_out = torch.nn.Conv2d(block_in,\n                                        out_ch,\n                                        kernel_size=3,\n                                        stride=1,\n                                        padding=1)\n\n    def forward(self, z):\n        self.last_z_shape = z.shape\n\n        # timestep embedding\n        temb = None\n\n        # z to block_in\n        h = self.conv_in(z)\n\n        # middle\n        h = self.mid.block_1(h, temb)\n        h = self.mid.attn_1(h)\n        h = self.mid.block_2(h, temb)\n\n        # upsampling\n        for i_level in 
reversed(range(self.num_resolutions)):\n            for i_block in range(self.num_res_blocks + 1):\n                h = self.up[i_level].block[i_block](h, temb)\n                if len(self.up[i_level].attn) > 0:\n                    h = self.up[i_level].attn[i_block](h)\n            if i_level != 0:\n                h = self.up[i_level].upsample(h)\n\n        # end\n        if self.give_pre_end:\n            return h\n\n        h = self.norm_out(h)\n        h = nonlinearity(h)\n        h = self.conv_out(h)\n        if self.tanh_out:\n            h = torch.tanh(h)\n        return h\n\n\nclass SimpleDecoder(nn.Module):\n    def __init__(self, in_channels, out_channels, *args, **kwargs):\n        super().__init__()\n        self.model = nn.ModuleList([nn.Conv2d(in_channels, in_channels, 1),\n                                    ResnetBlock(in_channels=in_channels,\n                                                out_channels=2 * in_channels,\n                                                temb_channels=0, dropout=0.0),\n                                    ResnetBlock(in_channels=2 * in_channels,\n                                                out_channels=4 * in_channels,\n                                                temb_channels=0, dropout=0.0),\n                                    ResnetBlock(in_channels=4 * in_channels,\n                                                out_channels=2 * in_channels,\n                                                temb_channels=0, dropout=0.0),\n                                    nn.Conv2d(2 * in_channels, in_channels, 1),\n                                    Upsample(in_channels, with_conv=True)])\n        # end\n        self.norm_out = Normalize(in_channels)\n        self.conv_out = torch.nn.Conv2d(in_channels,\n                                        out_channels,\n                                        kernel_size=3,\n                                        stride=1,\n                                        padding=1)\n\n    def forward(self, x):\n        for i, layer in enumerate(self.model):\n            if i in [1, 2, 3]:\n                x = layer(x, None)\n            else:\n                x = layer(x)\n\n        h = self.norm_out(x)\n        h = nonlinearity(h)\n        x = self.conv_out(h)\n        return x\n\n\nclass UpsampleDecoder(nn.Module):\n    def __init__(self, in_channels, out_channels, ch, num_res_blocks,\n                 ch_mult=(2, 2), dropout=0.0):\n        super().__init__()\n        # upsampling\n        self.temb_ch = 0\n        self.num_resolutions = len(ch_mult)\n        self.num_res_blocks = num_res_blocks\n        block_in = in_channels\n        self.res_blocks = nn.ModuleList()\n        self.upsample_blocks = nn.ModuleList()\n        for i_level in range(self.num_resolutions):\n            res_block = []\n            block_out = ch * ch_mult[i_level]\n            for i_block in range(self.num_res_blocks + 1):\n                res_block.append(ResnetBlock(in_channels=block_in,\n                                             out_channels=block_out,\n                                             temb_channels=self.temb_ch,\n                                             dropout=dropout))\n                block_in = block_out\n            self.res_blocks.append(nn.ModuleList(res_block))\n            if i_level != self.num_resolutions - 1:\n                self.upsample_blocks.append(Upsample(block_in, True))\n\n        # end\n        self.norm_out = Normalize(block_in)\n        self.conv_out = torch.nn.Conv2d(block_in,\n                                        
out_channels,\n                                        kernel_size=3,\n                                        stride=1,\n                                        padding=1)\n\n    def forward(self, x):\n        # upsampling\n        h = x\n        for k, i_level in enumerate(range(self.num_resolutions)):\n            for i_block in range(self.num_res_blocks + 1):\n                h = self.res_blocks[i_level][i_block](h, None)\n            if i_level != self.num_resolutions - 1:\n                h = self.upsample_blocks[k](h)\n        h = self.norm_out(h)\n        h = nonlinearity(h)\n        h = self.conv_out(h)\n        return h\n\n\nclass LatentRescaler(nn.Module):\n    def __init__(self, factor, in_channels, mid_channels, out_channels, depth=2):\n        super().__init__()\n        # residual block, interpolate, residual block\n        self.factor = factor\n        self.conv_in = nn.Conv2d(in_channels,\n                                 mid_channels,\n                                 kernel_size=3,\n                                 stride=1,\n                                 padding=1)\n        self.res_block1 = nn.ModuleList([ResnetBlock(in_channels=mid_channels,\n                                                     out_channels=mid_channels,\n                                                     temb_channels=0,\n                                                     dropout=0.0) for _ in range(depth)])\n        self.attn = AttnBlock(mid_channels)\n        self.res_block2 = nn.ModuleList([ResnetBlock(in_channels=mid_channels,\n                                                     out_channels=mid_channels,\n                                                     temb_channels=0,\n                                                     dropout=0.0) for _ in range(depth)])\n\n        self.conv_out = nn.Conv2d(mid_channels,\n                                  out_channels,\n                                  kernel_size=1,\n                                  )\n\n    def forward(self, x):\n        x = self.conv_in(x)\n        for block in self.res_block1:\n            x = block(x, None)\n        x = torch.nn.functional.interpolate(x, size=(\n        int(round(x.shape[2] * self.factor)), int(round(x.shape[3] * self.factor))))\n        x = self.attn(x)\n        for block in self.res_block2:\n            x = block(x, None)\n        x = self.conv_out(x)\n        return x\n\n\nclass MergedRescaleEncoder(nn.Module):\n    def __init__(self, in_channels, ch, out_ch, num_res_blocks,\n                 attn_levels, dropout=0.0, resamp_with_conv=True,\n                 ch_mult=(1, 2, 4, 8), rescale_factor=1.0, rescale_module_depth=1):\n        super().__init__()\n        intermediate_chn = ch * ch_mult[-1]\n        self.encoder = Encoder(in_channels=in_channels, num_res_blocks=num_res_blocks, ch=ch, ch_mult=ch_mult,\n                               z_channels=intermediate_chn, double_z=False,\n                               attn_levels=attn_levels, dropout=dropout, resamp_with_conv=resamp_with_conv,\n                               out_ch=None)\n        self.rescaler = LatentRescaler(factor=rescale_factor, in_channels=intermediate_chn,\n                                       mid_channels=intermediate_chn, out_channels=out_ch, depth=rescale_module_depth)\n\n    def forward(self, x):\n        x = self.encoder(x)\n        x = self.rescaler(x)\n        return x\n\n\nclass MergedRescaleDecoder(nn.Module):\n    def __init__(self, z_channels, out_ch, num_res_blocks, attn_levels, ch, ch_mult=(1, 2, 4, 8),\n                 
dropout=0.0, resamp_with_conv=True, rescale_factor=1.0, rescale_module_depth=1):\n        super().__init__()\n        tmp_chn = z_channels * ch_mult[-1]\n        self.decoder = Decoder(out_ch=out_ch, z_channels=tmp_chn, attn_levels=attn_levels, dropout=dropout,\n                               resamp_with_conv=resamp_with_conv, in_channels=None, num_res_blocks=num_res_blocks,\n                               ch_mult=ch_mult, ch=ch)\n        self.rescaler = LatentRescaler(factor=rescale_factor, in_channels=z_channels, mid_channels=tmp_chn,\n                                       out_channels=tmp_chn, depth=rescale_module_depth)\n\n    def forward(self, x):\n        x = self.rescaler(x)\n        x = self.decoder(x)\n        return x\n\n\nclass Upsampler(nn.Module):\n    def __init__(self, in_size, out_size, in_channels, out_channels, ch_mult=2):\n        super().__init__()\n        assert out_size >= in_size\n        num_blocks = int(np.log2(out_size // in_size)) + 1\n        factor_up = 1. + (out_size % in_size)\n        print(\n            f\"Building {self.__class__.__name__} with in_size: {in_size} --> out_size {out_size} and factor {factor_up}\")\n        self.rescaler = LatentRescaler(factor=factor_up, in_channels=in_channels, mid_channels=2 * in_channels,\n                                       out_channels=in_channels)\n        self.decoder = Decoder(out_ch=out_channels, z_channels=in_channels, num_res_blocks=2,\n                               attn_levels=[], in_channels=None, ch=in_channels,\n                               ch_mult=[ch_mult for _ in range(num_blocks)])\n\n    def forward(self, x):\n        x = self.rescaler(x)\n        x = self.decoder(x)\n        return x\n\n\nclass Resize(nn.Module):\n    def __init__(self, in_channels=None, learned=False, mode=\"bilinear\"):\n        super().__init__()\n        self.with_conv = learned\n        self.mode = mode\n        if self.with_conv:\n            print(f\"Note: {self.__class__.__name__} uses learned downsampling and will ignore the fixed {mode} mode\")\n            raise NotImplementedError()\n            assert in_channels is not None\n            # no asymmetric padding in torch conv, must do it ourselves\n            self.conv = torch.nn.Conv2d(in_channels,\n                                        in_channels,\n                                        kernel_size=4,\n                                        stride=2,\n                                        padding=1)\n\n    def forward(self, x, scale_factor=1.0):\n        if scale_factor == 1.0:\n            return x\n        else:\n            x = torch.nn.functional.interpolate(x, mode=self.mode, align_corners=False, scale_factor=scale_factor)\n        return x\n\n\nclass FirstStagePostProcessor(nn.Module):\n\n    def __init__(self, ch_mult: list, in_channels,\n                 pretrained_model: nn.Module = None,\n                 reshape=False,\n                 n_channels=None,\n                 dropout=0.,\n                 pretrained_config=None):\n        super().__init__()\n        if pretrained_config is None:\n            assert pretrained_model is not None, 'Either \"pretrained_model\" or \"pretrained_config\" must not be None'\n            self.pretrained_model = pretrained_model\n        else:\n            assert pretrained_config is not None, 'Either \"pretrained_model\" or \"pretrained_config\" must not be None'\n            self.instantiate_pretrained(pretrained_config)\n\n        self.do_reshape = reshape\n\n        if n_channels is None:\n            
n_channels = self.pretrained_model.encoder.ch\n\n        self.proj_norm = Normalize(in_channels, num_groups=in_channels // 2)\n        self.proj = nn.Conv2d(in_channels, n_channels, kernel_size=3,\n                              stride=1, padding=1)\n\n        blocks = []\n        downs = []\n        ch_in = n_channels\n        for m in ch_mult:\n            blocks.append(ResnetBlock(in_channels=ch_in, out_channels=m * n_channels, dropout=dropout))\n            ch_in = m * n_channels\n            downs.append(Downsample(ch_in, with_conv=False))\n\n        self.model = nn.ModuleList(blocks)\n        self.downsampler = nn.ModuleList(downs)\n\n    def instantiate_pretrained(self, config):\n        model = instantiate_from_config(config)\n        self.pretrained_model = model.eval()\n        # self.pretrained_model.train = False\n        for param in self.pretrained_model.parameters():\n            param.requires_grad = False\n\n    @torch.no_grad()\n    def encode_with_pretrained(self, x):\n        c = self.pretrained_model.encode(x)\n        if isinstance(c, DiagonalGaussianDistribution):\n            c = c.mode()\n        return c\n\n    def forward(self, x):\n        z_fs = self.encode_with_pretrained(x)\n        z = self.proj_norm(z_fs)\n        z = self.proj(z)\n        z = nonlinearity(z)\n\n        for submodel, downmodel in zip(self.model, self.downsampler):\n            z = submodel(z, temb=None)\n            z = downmodel(z)\n\n        if self.do_reshape:\n            z = rearrange(z, 'b c h w -> b (h w) c')\n        return z\n"
  },
  {
    "path": "lidm/modules/diffusion/model_lidm.py",
    "content": "# pytorch_diffusion + derived encoder decoder\nimport math\n\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom einops import rearrange\n\nfrom ..basic import CircularConv2d\nfrom ...utils.misc_utils import instantiate_from_config\nfrom ...modules.attention import LinearAttention\n\n\ndef get_timestep_embedding(timesteps, embedding_dim):\n    \"\"\"\n    This matches the implementation in Denoising Diffusion Probabilistic Models:\n    From Fairseq.\n    Build sinusoidal embeddings.\n    This matches the implementation in tensor2tensor, but differs slightly\n    from the description in Section 3.5 of \"Attention Is All You Need\".\n    \"\"\"\n    assert len(timesteps.shape) == 1\n\n    half_dim = embedding_dim // 2\n    emb = math.log(10000) / (half_dim - 1)\n    emb = torch.exp(torch.arange(half_dim, dtype=torch.float32) * -emb)\n    emb = emb.to(device=timesteps.device)\n    emb = timesteps.float()[:, None] * emb[None, :]\n    emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=1)\n    if embedding_dim % 2 == 1:  # zero pad\n        emb = torch.nn.functional.pad(emb, (0, 1, 0, 0))\n    return emb\n\n\ndef nonlinearity(x):\n    # swish\n    return x * torch.sigmoid(x)\n\n\ndef Normalize(in_channels, num_groups=32):\n    return torch.nn.GroupNorm(num_groups=num_groups, num_channels=in_channels, eps=1e-6, affine=True)\n\n\nUPSAMPLE_STRIDE2KERNEL_DICT = {(1, 2): (1, 5), (1, 4): (1, 7), (2, 1): (5, 1), (2, 2): (3, 3)}\nUPSAMPLE_STRIDE2PAD_DICT = {(1, 2): (2, 2, 0, 0), (1, 4): (3, 3, 0, 0), (2, 1): (0, 0, 2, 2), (2, 2): (1, 1, 1, 1)}\n\n\nclass Upsample(nn.Module):\n    def __init__(self, in_channels, with_conv, stride):\n        super().__init__()\n        self.with_conv = with_conv\n        self.stride = stride\n        if self.with_conv:\n            k, p = UPSAMPLE_STRIDE2KERNEL_DICT[stride], UPSAMPLE_STRIDE2PAD_DICT[stride]\n            self.conv = CircularConv2d(in_channels, in_channels, kernel_size=k, padding=p)\n\n    def forward(self, x):\n        x = torch.nn.functional.interpolate(x, scale_factor=self.stride, mode='bilinear', align_corners=True)\n        if self.with_conv:\n            x = self.conv(x)\n        return x\n\n\nDOWNSAMPLE_STRIDE2KERNEL_DICT = {(1, 2): (3, 3), (1, 4): (3, 5), (2, 1): (3, 3), (2, 2): (3, 3)}\nDOWNSAMPLE_STRIDE2PAD_DICT = {(1, 2): (0, 1, 1, 1), (1, 4): (1, 1, 1, 1), (2, 1): (1, 1, 1, 1), (2, 2): (0, 1, 0, 1)}\n\n\nclass Downsample(nn.Module):\n    def __init__(self, in_channels, with_conv, stride):\n        super().__init__()\n        self.with_conv = with_conv\n        self.stride = stride\n        if self.with_conv:\n            k, p = DOWNSAMPLE_STRIDE2KERNEL_DICT[stride], DOWNSAMPLE_STRIDE2PAD_DICT[stride]\n            self.conv = CircularConv2d(in_channels, in_channels, kernel_size=k, stride=stride, padding=p)\n\n    def forward(self, x):\n        if self.with_conv:\n            x = self.conv(x)\n        else:\n            x = torch.nn.functional.avg_pool2d(x, kernel_size=self.stride, stride=self.stride)  # modified for lidar\n        return x\n\n\nUNIFORM_KERNEL2PAD_DICT = {(3, 3): (1, 1, 1, 1), (1, 4): (1, 2, 0, 0)}\n\n\nclass ResnetBlock(nn.Module):\n    def __init__(self, *, in_channels, out_channels=None, kernel_size=(3, 3), conv_shortcut=False,\n                 dropout, temb_channels=512):\n        super().__init__()\n        self.in_channels = in_channels\n        out_channels = in_channels if out_channels is None else out_channels\n        self.out_channels = out_channels\n        self.use_conv_shortcut = 
conv_shortcut\n        pad = UNIFORM_KERNEL2PAD_DICT[kernel_size]\n\n        self.norm1 = Normalize(in_channels)\n        self.conv1 = CircularConv2d(in_channels,\n                                    out_channels,\n                                    kernel_size=kernel_size,\n                                    stride=1,\n                                    padding=pad)\n        if temb_channels > 0:\n            self.temb_proj = torch.nn.Linear(temb_channels, out_channels)\n        self.norm2 = Normalize(out_channels)\n        self.dropout = torch.nn.Dropout(dropout)\n        self.conv2 = CircularConv2d(out_channels,\n                                    out_channels,\n                                    kernel_size=kernel_size,\n                                    stride=1,\n                                    padding=pad)\n        if self.in_channels != self.out_channels:\n            if self.use_conv_shortcut:\n                self.conv_shortcut = CircularConv2d(in_channels,\n                                                    out_channels,\n                                                    kernel_size=kernel_size,\n                                                    stride=1,\n                                                    padding=pad)\n            else:\n                self.nin_shortcut = torch.nn.Conv2d(in_channels,\n                                                    out_channels,\n                                                    kernel_size=1,\n                                                    stride=1,\n                                                    padding=0)\n\n    def forward(self, x, temb):\n        h = x\n        h = self.norm1(h)\n        h = nonlinearity(h)\n        h = self.conv1(h)\n\n        if temb is not None:\n            h = h + self.temb_proj(nonlinearity(temb))[:, :, None, None]\n\n        h = self.norm2(h)\n        h = nonlinearity(h)\n        h = self.dropout(h)\n        h = self.conv2(h)\n\n        if self.in_channels != self.out_channels:\n            if self.use_conv_shortcut:\n                x = self.conv_shortcut(x)\n            else:\n                x = self.nin_shortcut(x)\n\n        return x + h\n\n\nclass LinAttnBlock(LinearAttention):\n    \"\"\"to match AttnBlock usage\"\"\"\n\n    def __init__(self, in_channels):\n        super().__init__(dim=in_channels, heads=1, dim_head=in_channels)\n\n\nclass AttnBlock(nn.Module):\n    def __init__(self, in_channels):\n        super().__init__()\n        self.in_channels = in_channels\n\n        self.norm = Normalize(in_channels)\n        self.q = torch.nn.Conv2d(in_channels,\n                                 in_channels,\n                                 kernel_size=1,\n                                 stride=1,\n                                 padding=0)\n        self.k = torch.nn.Conv2d(in_channels,\n                                 in_channels,\n                                 kernel_size=1,\n                                 stride=1,\n                                 padding=0)\n        self.v = torch.nn.Conv2d(in_channels,\n                                 in_channels,\n                                 kernel_size=1,\n                                 stride=1,\n                                 padding=0)\n        self.proj_out = torch.nn.Conv2d(in_channels,\n                                        in_channels,\n                                        kernel_size=1,\n                                        stride=1,\n                                        padding=0)\n\n    def 
forward(self, x):\n        h_ = x\n        h_ = self.norm(h_)\n        q = self.q(h_)\n        k = self.k(h_)\n        v = self.v(h_)\n\n        # compute attention\n        b, c, h, w = q.shape\n        q = q.reshape(b, c, h * w)\n        q = q.permute(0, 2, 1)  # b,hw,c\n        k = k.reshape(b, c, h * w)  # b,c,hw\n        w_ = torch.bmm(q, k)  # b,hw,hw    w[b,i,j]=sum_c q[b,i,c]k[b,c,j]\n        w_ = w_ * (int(c) ** (-0.5))\n        w_ = torch.nn.functional.softmax(w_, dim=2)\n\n        # attend to values\n        v = v.reshape(b, c, h * w)\n        w_ = w_.permute(0, 2, 1)  # b,hw,hw (first hw of k, second of q)\n        h_ = torch.bmm(v, w_)  # b, c,hw (hw of q) h_[b,c,j] = sum_i v[b,c,i] w_[b,i,j]\n        h_ = h_.reshape(b, c, h, w)\n\n        h_ = self.proj_out(h_)\n\n        return x + h_\n\n\ndef make_attn(in_channels, attn_type=\"vanilla\"):\n    assert attn_type in [\"vanilla\", \"linear\", \"none\"], f'attn_type {attn_type} unknown'\n    # print(f\"making attention of type '{attn_type}' with {in_channels} in_channels\")\n    if attn_type == \"vanilla\":\n        return AttnBlock(in_channels)\n    elif attn_type == \"none\":\n        return nn.Identity(in_channels)\n    else:\n        return LinAttnBlock(in_channels)\n\n\nclass Encoder(nn.Module):\n    def __init__(self, *, ch, out_ch, ch_mult, strides, num_res_blocks,\n                 attn_levels, dropout=0.0, resamp_with_conv=True, in_channels, z_channels,\n                 double_z=True, use_linear_attn=False, attn_type=\"vanilla\", use_mask=False,\n                 **ignore_kwargs):\n        super().__init__()\n        if use_mask:\n            assert out_ch == in_channels + 1, 'Set \"out_ch = out_ch + 1\" for mask prediction.'\n        if use_linear_attn: attn_type = \"linear\"\n        self.ch = ch\n        self.temb_ch = 0\n        self.num_resolutions = len(ch_mult)\n        self.num_res_blocks = num_res_blocks\n        self.in_channels = in_channels\n\n        # downsampling\n        self.conv_in = CircularConv2d(in_channels,\n                                      self.ch,\n                                      kernel_size=3,\n                                      stride=1,\n                                      padding=1)\n        in_ch_mult = (1,) + tuple(ch_mult)\n        self.in_ch_mult = in_ch_mult\n        self.down = nn.ModuleList()\n        for i_level in range(self.num_resolutions):\n            block = nn.ModuleList()\n            attn = nn.ModuleList()\n            block_in = ch * in_ch_mult[i_level]\n            block_out = ch * ch_mult[i_level]\n            for i_block in range(self.num_res_blocks):\n                block.append(ResnetBlock(in_channels=block_in,\n                                         out_channels=block_out,\n                                         temb_channels=self.temb_ch,\n                                         dropout=dropout))\n                block_in = block_out\n                if i_level in attn_levels:\n                    attn.append(make_attn(block_in, attn_type=attn_type))\n            down = nn.Module()\n            down.block = block\n            down.attn = attn\n            if i_level != self.num_resolutions - 1:\n                stride = tuple(strides[i_level])\n                down.downsample = Downsample(block_in, resamp_with_conv, stride)\n            self.down.append(down)\n\n        # middle\n        self.mid = nn.Module()\n        self.mid.block_1 = ResnetBlock(in_channels=block_in,\n                                       out_channels=block_in,\n            
                           temb_channels=self.temb_ch,\n                                       dropout=dropout)\n        self.mid.attn_1 = make_attn(block_in, attn_type=attn_type)\n        self.mid.block_2 = ResnetBlock(in_channels=block_in,\n                                       out_channels=block_in,\n                                       temb_channels=self.temb_ch,\n                                       dropout=dropout)\n\n        # end\n        self.norm_out = Normalize(block_in)\n        self.conv_out = CircularConv2d(block_in,\n                                       2 * z_channels if double_z else z_channels,\n                                       kernel_size=3,\n                                       stride=1,\n                                       padding=1)\n\n    def forward(self, x):\n        # timestep embedding\n        temb = None\n\n        # downsampling\n        hs = [self.conv_in(x)]\n        for i_level in range(self.num_resolutions):\n            for i_block in range(self.num_res_blocks):\n                h = self.down[i_level].block[i_block](hs[-1], temb)\n                if len(self.down[i_level].attn) > 0:\n                    h = self.down[i_level].attn[i_block](h)\n                hs.append(h)\n            if i_level != self.num_resolutions - 1:\n                hs.append(self.down[i_level].downsample(hs[-1]))\n\n        # middle\n        h = hs[-1]\n        h = self.mid.block_1(h, temb)\n        h = self.mid.attn_1(h)\n        h = self.mid.block_2(h, temb)\n\n        # end\n        h = self.norm_out(h)\n        h = nonlinearity(h)\n        h = self.conv_out(h)\n        return h\n\n\nclass Decoder(nn.Module):\n    def __init__(self, *, ch, out_ch, ch_mult, strides, num_res_blocks, attn_levels,\n                 dropout=0.0, resamp_with_conv=True, in_channels, z_channels, give_pre_end=False,\n                 tanh_out=False, use_linear_attn=False, attn_type=\"vanilla\", use_mask=False,\n                 **ignorekwargs):\n        super().__init__()\n        stride2kernel = {(2, 2): (3, 3), (1, 2): (1, 4)}\n        if use_linear_attn: attn_type = \"linear\"\n        self.ch = ch\n        self.temb_ch = 0\n        self.num_resolutions = len(ch_mult)\n        self.num_res_blocks = num_res_blocks\n        self.in_channels = in_channels\n        self.give_pre_end = give_pre_end\n        self.tanh_out = tanh_out\n\n        # compute in_ch_mult, block_in and curr_res at lowest res\n        block_in = ch * ch_mult[self.num_resolutions - 1]\n\n        # z to block_in\n        self.conv_in = CircularConv2d(z_channels,\n                                      block_in,\n                                      kernel_size=3,\n                                      stride=1,\n                                      padding=1)\n\n        # middle\n        self.mid = nn.Module()\n        self.mid.block_1 = ResnetBlock(in_channels=block_in,\n                                       out_channels=block_in,\n                                       temb_channels=self.temb_ch,\n                                       dropout=dropout)\n        self.mid.attn_1 = make_attn(block_in, attn_type=attn_type)\n        self.mid.block_2 = ResnetBlock(in_channels=block_in,\n                                       out_channels=block_in,\n                                       temb_channels=self.temb_ch,\n                                       dropout=dropout)\n\n        # upsampling\n        self.up = nn.ModuleList()\n        for i_level in reversed(range(self.num_resolutions)):\n            stride = 
tuple(strides[i_level - 1]) if i_level > 0 else None\n            kernel = stride2kernel[stride] if stride is not None else (1, 4)\n            block = nn.ModuleList()\n            attn = nn.ModuleList()\n            block_out = ch * ch_mult[i_level]\n            for i_block in range(self.num_res_blocks + 1):\n                block.append(ResnetBlock(in_channels=block_in,\n                                         out_channels=block_out,\n                                         kernel_size=kernel,\n                                         temb_channels=self.temb_ch,\n                                         dropout=dropout))\n                block_in = block_out\n                if i_level in attn_levels:\n                    attn.append(make_attn(block_in, attn_type=attn_type))\n            up = nn.Module()\n            up.block = block\n            up.attn = attn\n            if stride is not None:\n                up.upsample = Upsample(block_in, resamp_with_conv, stride)\n            self.up.insert(0, up)  # prepend to get consistent order\n\n        # end\n        self.norm_out = Normalize(block_in)\n        self.conv_out = CircularConv2d(block_in,\n                                       out_ch,\n                                       kernel_size=(1, 4),\n                                       stride=1,\n                                       padding=(1, 2, 0, 0))\n\n    def forward(self, z):\n        self.last_z_shape = z.shape\n\n        # timestep embedding\n        temb = None\n\n        # z to block_in\n        h = self.conv_in(z)\n\n        # middle\n        h = self.mid.block_1(h, temb)\n        h = self.mid.attn_1(h)\n        h = self.mid.block_2(h, temb)\n\n        # upsampling\n        for i_level in reversed(range(self.num_resolutions)):\n            for i_block in range(self.num_res_blocks + 1):\n                h = self.up[i_level].block[i_block](h, temb)\n                if len(self.up[i_level].attn) > 0:\n                    h = self.up[i_level].attn[i_block](h)\n            if i_level != 0:\n                h = self.up[i_level].upsample(h)\n\n        # end\n        if self.give_pre_end:\n            return h\n\n        h = self.norm_out(h)\n        h = nonlinearity(h)\n        h = self.conv_out(h)\n        if self.tanh_out:\n            h = torch.tanh(h)\n        return h\n\n\nclass SimpleDecoder(nn.Module):\n    def __init__(self, in_channels, out_channels, *args, **kwargs):\n        super().__init__()\n        self.model = nn.ModuleList([nn.Conv2d(in_channels, in_channels, 1),\n                                    ResnetBlock(in_channels=in_channels,\n                                                out_channels=2 * in_channels,\n                                                temb_channels=0, dropout=0.0),\n                                    ResnetBlock(in_channels=2 * in_channels,\n                                                out_channels=4 * in_channels,\n                                                temb_channels=0, dropout=0.0),\n                                    ResnetBlock(in_channels=4 * in_channels,\n                                                out_channels=2 * in_channels,\n                                                temb_channels=0, dropout=0.0),\n                                    nn.Conv2d(2 * in_channels, in_channels, 1),\n                                    Upsample(in_channels, with_conv=True, stride=(2, 2))])  # stride assumed to be (2, 2): this Upsample requires an explicit stride\n        # end\n        self.norm_out = Normalize(in_channels)\n        self.conv_out = torch.nn.Conv2d(in_channels,\n                           
             out_channels,\n                                        kernel_size=3,\n                                        stride=1,\n                                        padding=1)\n\n    def forward(self, x):\n        for i, layer in enumerate(self.model):\n            if i in [1, 2, 3]:\n                x = layer(x, None)\n            else:\n                x = layer(x)\n\n        h = self.norm_out(x)\n        h = nonlinearity(h)\n        x = self.conv_out(h)\n        return x\n\n\nclass UpsampleDecoder(nn.Module):\n    def __init__(self, in_channels, out_channels, ch, num_res_blocks,\n                 ch_mult=(2, 2), dropout=0.0):\n        super().__init__()\n        # upsampling\n        self.temb_ch = 0\n        self.num_resolutions = len(ch_mult)\n        self.num_res_blocks = num_res_blocks\n        block_in = in_channels\n        self.res_blocks = nn.ModuleList()\n        self.upsample_blocks = nn.ModuleList()\n        for i_level in range(self.num_resolutions):\n            res_block = []\n            block_out = ch * ch_mult[i_level]\n            for i_block in range(self.num_res_blocks + 1):\n                res_block.append(ResnetBlock(in_channels=block_in,\n                                             out_channels=block_out,\n                                             temb_channels=self.temb_ch,\n                                             dropout=dropout))\n                block_in = block_out\n            self.res_blocks.append(nn.ModuleList(res_block))\n            if i_level != self.num_resolutions - 1:\n                self.upsample_blocks.append(Upsample(block_in, True, (2, 2)))  # stride assumed to be (2, 2): this Upsample requires an explicit stride\n\n        # end\n        self.norm_out = Normalize(block_in)\n        self.conv_out = torch.nn.Conv2d(block_in,\n                                        out_channels,\n                                        kernel_size=3,\n                                        stride=1,\n                                        padding=1)\n\n    def forward(self, x):\n        # upsampling\n        h = x\n        for k, i_level in enumerate(range(self.num_resolutions)):\n            for i_block in range(self.num_res_blocks + 1):\n                h = self.res_blocks[i_level][i_block](h, None)\n            if i_level != self.num_resolutions - 1:\n                h = self.upsample_blocks[k](h)\n        h = self.norm_out(h)\n        h = nonlinearity(h)\n        h = self.conv_out(h)\n        return h\n\n\nclass LatentRescaler(nn.Module):\n    def __init__(self, factor, in_channels, mid_channels, out_channels, depth=2):\n        super().__init__()\n        # residual block, interpolate, residual block\n        self.factor = factor\n        self.conv_in = nn.Conv2d(in_channels,\n                                 mid_channels,\n                                 kernel_size=3,\n                                 stride=1,\n                                 padding=1)\n        self.res_block1 = nn.ModuleList([ResnetBlock(in_channels=mid_channels,\n                                                     out_channels=mid_channels,\n                                                     temb_channels=0,\n                                                     dropout=0.0) for _ in range(depth)])\n        self.attn = AttnBlock(mid_channels)\n        self.res_block2 = nn.ModuleList([ResnetBlock(in_channels=mid_channels,\n                                                     out_channels=mid_channels,\n                                                     temb_channels=0,\n                                                     dropout=0.0) for 
_ in range(depth)])\n\n        self.conv_out = nn.Conv2d(mid_channels,\n                                  out_channels,\n                                  kernel_size=1,\n                                  )\n\n    def forward(self, x):\n        x = self.conv_in(x)\n        for block in self.res_block1:\n            x = block(x, None)\n        x = torch.nn.functional.interpolate(x, size=(\n            int(round(x.shape[2] * self.factor)), int(round(x.shape[3] * self.factor))))\n        x = self.attn(x)\n        for block in self.res_block2:\n            x = block(x, None)\n        x = self.conv_out(x)\n        return x\n\n\nclass MergedRescaleEncoder(nn.Module):\n    def __init__(self, in_channels, ch, out_ch, num_res_blocks,\n                 attn_levels, dropout=0.0, resamp_with_conv=True,\n                 ch_mult=(1, 2, 4, 8), rescale_factor=1.0, rescale_module_depth=1):\n        super().__init__()\n        intermediate_chn = ch * ch_mult[-1]\n        self.encoder = Encoder(in_channels=in_channels, num_res_blocks=num_res_blocks, ch=ch, ch_mult=ch_mult,\n                               z_channels=intermediate_chn, double_z=False, attn_levels=attn_levels, dropout=dropout,\n                               resamp_with_conv=resamp_with_conv, out_ch=None)\n        self.rescaler = LatentRescaler(factor=rescale_factor, in_channels=intermediate_chn,\n                                       mid_channels=intermediate_chn, out_channels=out_ch, depth=rescale_module_depth)\n\n    def forward(self, x):\n        x = self.encoder(x)\n        x = self.rescaler(x)\n        return x\n\n\nclass MergedRescaleDecoder(nn.Module):\n    def __init__(self, z_channels, out_ch, num_res_blocks, attn_levels, ch, ch_mult=(1, 2, 4, 8),\n                 dropout=0.0, resamp_with_conv=True, rescale_factor=1.0, rescale_module_depth=1):\n        super().__init__()\n        tmp_chn = z_channels * ch_mult[-1]\n        self.decoder = Decoder(out_ch=out_ch, z_channels=tmp_chn, attn_levels=attn_levels, dropout=dropout,\n                               resamp_with_conv=resamp_with_conv, in_channels=None, num_res_blocks=num_res_blocks,\n                               ch_mult=ch_mult, ch=ch)\n        self.rescaler = LatentRescaler(factor=rescale_factor, in_channels=z_channels, mid_channels=tmp_chn,\n                                       out_channels=tmp_chn, depth=rescale_module_depth)\n\n    def forward(self, x):\n        x = self.rescaler(x)\n        x = self.decoder(x)\n        return x\n\n\nclass Upsampler(nn.Module):\n    def __init__(self, in_size, out_size, in_channels, out_channels, ch_mult=2):\n        super().__init__()\n        assert out_size >= in_size\n        num_blocks = int(np.log2(out_size // in_size)) + 1\n        factor_up = 1. 
+ (out_size % in_size)\n        print(\n            f\"Building {self.__class__.__name__} with in_size: {in_size} --> out_size {out_size} and factor {factor_up}\")\n        self.rescaler = LatentRescaler(factor=factor_up, in_channels=in_channels, mid_channels=2 * in_channels,\n                                       out_channels=in_channels)\n        self.decoder = Decoder(out_ch=out_channels, z_channels=in_channels, num_res_blocks=2,\n                               attn_levels=[], in_channels=None, ch=in_channels,\n                               ch_mult=[ch_mult for _ in range(num_blocks)])\n\n    def forward(self, x):\n        x = self.rescaler(x)\n        x = self.decoder(x)\n        return x\n\n\nclass Resize(nn.Module):\n    def __init__(self, in_channels=None, learned=False, mode=\"bilinear\"):\n        super().__init__()\n        self.with_conv = learned\n        self.mode = mode\n        if self.with_conv:\n            print(f\"Note: {self.__class__.__name__} uses learned downsampling and will ignore the fixed {mode} mode\")\n            raise NotImplementedError()\n            assert in_channels is not None\n            # no asymmetric padding in torch conv, must do it ourselves\n            self.conv = torch.nn.Conv2d(in_channels,\n                                        in_channels,\n                                        kernel_size=4,\n                                        stride=2,\n                                        padding=1)\n\n    def forward(self, x, scale_factor=1.0):\n        if scale_factor == 1.0:\n            return x\n        else:\n            x = torch.nn.functional.interpolate(x, mode=self.mode, align_corners=False, scale_factor=scale_factor)\n        return x\n\n\nclass FirstStagePostProcessor(nn.Module):\n\n    def __init__(self, ch_mult: list, in_channels,\n                 pretrained_model: nn.Module = None,\n                 reshape=False,\n                 n_channels=None,\n                 dropout=0.,\n                 pretrained_config=None):\n        super().__init__()\n        if pretrained_config is None:\n            assert pretrained_model is not None, 'Either \"pretrained_model\" or \"pretrained_config\" must not be None'\n            self.pretrained_model = pretrained_model\n        else:\n            assert pretrained_config is not None, 'Either \"pretrained_model\" or \"pretrained_config\" must not be None'\n            self.instantiate_pretrained(pretrained_config)\n\n        self.do_reshape = reshape\n\n        if n_channels is None:\n            n_channels = self.pretrained_model.encoder.ch\n\n        self.proj_norm = Normalize(in_channels, num_groups=in_channels // 2)\n        self.proj = nn.Conv2d(in_channels, n_channels, kernel_size=3,\n                              stride=1, padding=1)\n\n        blocks = []\n        downs = []\n        ch_in = n_channels\n        for m in ch_mult:\n            blocks.append(ResnetBlock(in_channels=ch_in, out_channels=m * n_channels, dropout=dropout))\n            ch_in = m * n_channels\n            downs.append(Downsample(ch_in, with_conv=False))\n\n        self.model = nn.ModuleList(blocks)\n        self.downsampler = nn.ModuleList(downs)\n\n    def instantiate_pretrained(self, config):\n        model = instantiate_from_config(config)\n        self.pretrained_model = model.eval()\n        # self.pretrained_model.train = False\n        for param in self.pretrained_model.parameters():\n            param.requires_grad = False\n\n    @torch.no_grad()\n    def encode_with_pretrained(self, x):\n        c 
= self.pretrained_model.encode(x)\n        if isinstance(c, DiagonalGaussianDistribution):\n            c = c.mode()\n        return c\n\n    def forward(self, x):\n        z_fs = self.encode_with_pretrained(x)\n        z = self.proj_norm(z_fs)\n        z = self.proj(z)\n        z = nonlinearity(z)\n\n        for submodel, downmodel in zip(self.model, self.downsampler):\n            z = submodel(z, temb=None)\n            z = downmodel(z)\n\n        if self.do_reshape:\n            z = rearrange(z, 'b c h w -> b (h w) c')\n        return z\n"
  },
  {
    "path": "lidm/modules/diffusion/openaimodel.py",
    "content": "from abc import abstractmethod\nimport math\n\nimport numpy as np\nimport torch as th\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom ..basic import (\n    checkpoint,\n    conv_nd,\n    linear,\n    avg_pool_nd,\n    zero_module,\n    normalization,\n    timestep_embedding,\n)\nfrom ...modules.attention import SpatialTransformer\n\n\n# dummy replace\ndef convert_module_to_f16(x):\n    pass\n\n\ndef convert_module_to_f32(x):\n    pass\n\n\n## go\nclass AttentionPool2d(nn.Module):\n    \"\"\"\n    Adapted from CLIP: https://github.com/openai/CLIP/blob/main/clip/model.py\n    \"\"\"\n\n    def __init__(\n            self,\n            spacial_dim: int,\n            embed_dim: int,\n            num_heads_channels: int,\n            output_dim: int = None,\n    ):\n        super().__init__()\n        self.positional_embedding = nn.Parameter(th.randn(embed_dim, spacial_dim ** 2 + 1) / embed_dim ** 0.5)\n        self.qkv_proj = conv_nd(1, embed_dim, 3 * embed_dim, 1)\n        self.c_proj = conv_nd(1, embed_dim, output_dim or embed_dim, 1)\n        self.num_heads = embed_dim // num_heads_channels\n        self.attention = QKVAttention(self.num_heads)\n\n    def forward(self, x):\n        b, c, *_spatial = x.shape\n        x = x.reshape(b, c, -1)  # NC(HW)\n        x = th.cat([x.mean(dim=-1, keepdim=True), x], dim=-1)  # NC(HW+1)\n        x = x + self.positional_embedding[None, :, :].to(x.dtype)  # NC(HW+1)\n        x = self.qkv_proj(x)\n        x = self.attention(x)\n        x = self.c_proj(x)\n        return x[:, :, 0]\n\n\nclass TimestepBlock(nn.Module):\n    \"\"\"\n    Any module where forward() takes timestep embeddings as a second argument.\n    \"\"\"\n\n    @abstractmethod\n    def forward(self, x, emb):\n        \"\"\"\n        Apply the module to `x` given `emb` timestep embeddings.\n        \"\"\"\n\n\nclass TimestepEmbedSequential(nn.Sequential, TimestepBlock):\n    \"\"\"\n    A sequential module that passes timestep embeddings to the children that\n    support it as an extra input.\n    \"\"\"\n\n    def forward(self, x, emb, context=None):\n        for layer in self:\n            if isinstance(layer, TimestepBlock):\n                x = layer(x, emb)\n            elif isinstance(layer, SpatialTransformer):\n                x = layer(x, context)\n            else:\n                x = layer(x)\n        return x\n\n\nclass Upsample(nn.Module):\n    \"\"\"\n    An upsampling layer with an optional convolution.\n    :param channels: channels in the inputs and outputs.\n    :param use_conv: a bool determining if a convolution is applied.\n    :param dims: determines if the signal is 1D, 2D, or 3D. 
If 3D, then\n                 upsampling occurs in the inner-two dimensions.\n    \"\"\"\n\n    def __init__(self, channels, use_conv, dims=2, out_channels=None, padding=1, cconv=False):\n        super().__init__()\n        self.channels = channels\n        self.out_channels = out_channels or channels\n        self.use_conv = use_conv\n        self.dims = dims\n        if use_conv:\n            self.conv = conv_nd(dims, self.channels, self.out_channels, 3, padding=padding, cconv=cconv)\n\n    def forward(self, x):\n        assert x.shape[1] == self.channels\n        if self.dims == 3:\n            x = F.interpolate(\n                x, (x.shape[2], x.shape[3] * 2, x.shape[4] * 2), mode=\"nearest\"\n            )\n        else:\n            x = F.interpolate(x, scale_factor=2, mode=\"nearest\")\n        if self.use_conv:\n            x = self.conv(x)\n        return x\n\n\nclass TransposedUpsample(nn.Module):\n    'Learned 2x upsampling without padding'\n\n    def __init__(self, channels, out_channels=None, ks=5):\n        super().__init__()\n        self.channels = channels\n        self.out_channels = out_channels or channels\n\n        self.up = nn.ConvTranspose2d(self.channels, self.out_channels, kernel_size=ks, stride=2)\n\n    def forward(self, x):\n        return self.up(x)\n\n\nclass Downsample(nn.Module):\n    \"\"\"\n    A downsampling layer with an optional convolution.\n    :param channels: channels in the inputs and outputs.\n    :param use_conv: a bool determining if a convolution is applied.\n    :param dims: determines if the signal is 1D, 2D, or 3D. If 3D, then\n                 downsampling occurs in the inner-two dimensions.\n    \"\"\"\n\n    def __init__(self, channels, use_conv, dims=2, out_channels=None, padding=1, cconv=False):\n        super().__init__()\n        self.channels = channels\n        self.out_channels = out_channels or channels\n        self.use_conv = use_conv\n        self.dims = dims\n        stride = 2 if dims != 3 else (1, 2, 2)\n        if use_conv:\n            self.op = conv_nd(\n                dims, self.channels, self.out_channels, 3, stride=stride, padding=padding, cconv=cconv\n            )\n        else:\n            assert self.channels == self.out_channels\n            self.op = avg_pool_nd(dims, kernel_size=stride, stride=stride)\n\n    def forward(self, x):\n        assert x.shape[1] == self.channels\n        return self.op(x)\n\n\nclass ResBlock(TimestepBlock):\n    \"\"\"\n    A residual block that can optionally change the number of channels.\n    :param channels: the number of input channels.\n    :param emb_channels: the number of timestep embedding channels.\n    :param dropout: the rate of dropout.\n    :param out_channels: if specified, the number of out channels.\n    :param use_conv: if True and out_channels is specified, use a spatial\n        convolution instead of a smaller 1x1 convolution to change the\n        channels in the skip connection.\n    :param dims: determines if the signal is 1D, 2D, or 3D.\n    :param use_checkpoint: if True, use gradient checkpointing on this module.\n    :param up: if True, use this block for upsampling.\n    :param down: if True, use this block for downsampling.\n    \"\"\"\n\n    def __init__(\n            self,\n            channels,\n            emb_channels,\n            dropout,\n            out_channels=None,\n            use_conv=False,\n            use_scale_shift_norm=False,\n            dims=2,\n            use_checkpoint=False,\n            up=False,\n            down=False,\n  
          cconv=False\n    ):\n        super().__init__()\n        self.channels = channels\n        self.emb_channels = emb_channels\n        self.dropout = dropout\n        self.out_channels = out_channels or channels\n        self.use_conv = use_conv\n        self.use_checkpoint = use_checkpoint\n        self.use_scale_shift_norm = use_scale_shift_norm\n\n        self.in_layers = nn.Sequential(\n            normalization(channels),\n            nn.SiLU(),\n            conv_nd(dims, channels, self.out_channels, 3, padding=1, cconv=cconv),\n        )\n\n        self.updown = up or down\n\n        if up:\n            self.h_upd = Upsample(channels, False, dims, cconv=cconv)\n            self.x_upd = Upsample(channels, False, dims, cconv=cconv)\n        elif down:\n            self.h_upd = Downsample(channels, False, dims, cconv=cconv)\n            self.x_upd = Downsample(channels, False, dims, cconv=cconv)\n        else:\n            self.h_upd = self.x_upd = nn.Identity()\n\n        self.emb_layers = nn.Sequential(\n            nn.SiLU(),\n            linear(\n                emb_channels,\n                2 * self.out_channels if use_scale_shift_norm else self.out_channels,\n            ),\n        )\n        self.out_layers = nn.Sequential(\n            normalization(self.out_channels),\n            nn.SiLU(),\n            nn.Dropout(p=dropout),\n            zero_module(\n                conv_nd(dims, self.out_channels, self.out_channels, 3, padding=1, cconv=cconv)\n            ),\n        )\n\n        if self.out_channels == channels:\n            self.skip_connection = nn.Identity()\n        elif use_conv:\n            self.skip_connection = conv_nd(\n                dims, channels, self.out_channels, 3, padding=1, cconv=cconv\n            )\n        else:\n            self.skip_connection = conv_nd(dims, channels, self.out_channels, 1, cconv=cconv)\n\n    def forward(self, x, emb):\n        \"\"\"\n        Apply the block to a Tensor, conditioned on a timestep embedding.\n        :param x: an [N x C x ...] Tensor of features.\n        :param emb: an [N x emb_channels] Tensor of timestep embeddings.\n        :return: an [N x C x ...] 
Tensor of outputs.\n        \"\"\"\n        return checkpoint(\n            self._forward, (x, emb), self.parameters(), self.use_checkpoint\n        )\n\n    def _forward(self, x, emb):\n        if self.updown:\n            in_rest, in_conv = self.in_layers[:-1], self.in_layers[-1]\n            h = in_rest(x)\n            h = self.h_upd(h)\n            x = self.x_upd(x)\n            h = in_conv(h)\n        else:\n            h = self.in_layers(x)\n        emb_out = self.emb_layers(emb).type(h.dtype)\n        while len(emb_out.shape) < len(h.shape):\n            emb_out = emb_out[..., None]\n        if self.use_scale_shift_norm:\n            out_norm, out_rest = self.out_layers[0], self.out_layers[1:]\n            scale, shift = th.chunk(emb_out, 2, dim=1)\n            h = out_norm(h) * (1 + scale) + shift\n            h = out_rest(h)\n        else:\n            h = h + emb_out\n            h = self.out_layers(h)\n        return self.skip_connection(x) + h\n\n\nclass AttentionBlock(nn.Module):\n    \"\"\"\n    An attention block that allows spatial positions to attend to each other.\n    Originally ported from here, but adapted to the N-d case.\n    https://github.com/hojonathanho/diffusion/blob/1e0dceb3b3495bbe19116a5e1b3596cd0706c543/diffusion_tf/models/unet.py#L66.\n    \"\"\"\n\n    def __init__(\n            self,\n            channels,\n            num_heads=1,\n            num_head_channels=-1,\n            use_checkpoint=False,\n            use_new_attention_order=False,\n    ):\n        super().__init__()\n        self.channels = channels\n        if num_head_channels == -1:\n            self.num_heads = num_heads\n        else:\n            assert (\n                    channels % num_head_channels == 0\n            ), f\"q,k,v channels {channels} is not divisible by num_head_channels {num_head_channels}\"\n            self.num_heads = channels // num_head_channels\n        self.use_checkpoint = use_checkpoint\n        self.norm = normalization(channels)\n        self.qkv = conv_nd(1, channels, channels * 3, 1)\n        if use_new_attention_order:\n            # split qkv before split heads\n            self.attention = QKVAttention(self.num_heads)\n        else:\n            # split heads before split qkv\n            self.attention = QKVAttentionLegacy(self.num_heads)\n\n        self.proj_out = zero_module(conv_nd(1, channels, channels, 1))\n\n    def forward(self, x):\n        return checkpoint(self._forward, (x,), self.parameters(),\n                          True)  # TODO: check checkpoint usage, is True # TODO: fix the .half call!!!\n        # return pt_checkpoint(self._forward, x)  # pytorch\n\n    def _forward(self, x):\n        b, c, *spatial = x.shape\n        x = x.reshape(b, c, -1)\n        qkv = self.qkv(self.norm(x))\n        h = self.attention(qkv)\n        h = self.proj_out(h)\n        return (x + h).reshape(b, c, *spatial)\n\n\ndef count_flops_attn(model, _x, y):\n    \"\"\"\n    A counter for the `thop` package to count the operations in an\n    attention operation.\n    Meant to be used like:\n        macs, params = thop.profile(\n            model,\n            inputs=(inputs, timestamps),\n            custom_ops={QKVAttention: QKVAttention.count_flops},\n        )\n    \"\"\"\n    b, c, *spatial = y[0].shape\n    num_spatial = int(np.prod(spatial))\n    # We perform two matmuls with the same number of ops.\n    # The first computes the weight matrix, the second computes\n    # the combination of the value vectors.\n    matmul_ops = 2 * b * (num_spatial ** 2) * 
c\n    model.total_ops += th.DoubleTensor([matmul_ops])\n\n\nclass QKVAttentionLegacy(nn.Module):\n    \"\"\"\n    A module which performs QKV attention. Matches legacy QKVAttention + input/output heads shaping\n    \"\"\"\n\n    def __init__(self, n_heads):\n        super().__init__()\n        self.n_heads = n_heads\n\n    def forward(self, qkv):\n        \"\"\"\n        Apply QKV attention.\n        :param qkv: an [N x (H * 3 * C) x T] tensor of Qs, Ks, and Vs.\n        :return: an [N x (H * C) x T] tensor after attention.\n        \"\"\"\n        bs, width, length = qkv.shape\n        assert width % (3 * self.n_heads) == 0\n        ch = width // (3 * self.n_heads)\n        q, k, v = qkv.reshape(bs * self.n_heads, ch * 3, length).split(ch, dim=1)\n        scale = 1 / math.sqrt(math.sqrt(ch))\n        weight = th.einsum(\n            \"bct,bcs->bts\", q * scale, k * scale\n        )  # More stable with f16 than dividing afterwards\n        weight = th.softmax(weight.float(), dim=-1).type(weight.dtype)\n        a = th.einsum(\"bts,bcs->bct\", weight, v)\n        return a.reshape(bs, -1, length)\n\n    @staticmethod\n    def count_flops(model, _x, y):\n        return count_flops_attn(model, _x, y)\n\n\nclass QKVAttention(nn.Module):\n    \"\"\"\n    A module which performs QKV attention and splits in a different order.\n    \"\"\"\n\n    def __init__(self, n_heads):\n        super().__init__()\n        self.n_heads = n_heads\n\n    def forward(self, qkv):\n        \"\"\"\n        Apply QKV attention.\n        :param qkv: an [N x (3 * H * C) x T] tensor of Qs, Ks, and Vs.\n        :return: an [N x (H * C) x T] tensor after attention.\n        \"\"\"\n        bs, width, length = qkv.shape\n        assert width % (3 * self.n_heads) == 0\n        ch = width // (3 * self.n_heads)\n        q, k, v = qkv.chunk(3, dim=1)\n        scale = 1 / math.sqrt(math.sqrt(ch))\n        weight = th.einsum(\n            \"bct,bcs->bts\",\n            (q * scale).view(bs * self.n_heads, ch, length),\n            (k * scale).view(bs * self.n_heads, ch, length),\n        )  # More stable with f16 than dividing afterwards\n        weight = th.softmax(weight.float(), dim=-1).type(weight.dtype)\n        a = th.einsum(\"bts,bcs->bct\", weight, v.reshape(bs * self.n_heads, ch, length))\n        return a.reshape(bs, -1, length)\n\n    @staticmethod\n    def count_flops(model, _x, y):\n        return count_flops_attn(model, _x, y)\n\n\nclass UNetModel(nn.Module):\n    \"\"\"\n    The full UNet model with attention and timestep embedding.\n    :param in_channels: channels in the input Tensor.\n    :param model_channels: base channel count for the model.\n    :param out_channels: channels in the output Tensor.\n    :param num_res_blocks: number of residual blocks per downsample.\n    :param attention_resolutions: a collection of downsample rates at which\n        attention will take place. 
May be a set, list, or tuple.\n        For example, if this contains 4, then at 4x downsampling, attention\n        will be used.\n    :param dropout: the dropout probability.\n    :param channel_mult: channel multiplier for each level of the UNet.\n    :param conv_resample: if True, use learned convolutions for upsampling and\n        downsampling.\n    :param dims: determines if the signal is 1D, 2D, or 3D.\n    :param num_classes: if specified (as an int), then this model will be\n        class-conditional with `num_classes` classes.\n    :param use_checkpoint: use gradient checkpointing to reduce memory usage.\n    :param num_heads: the number of attention heads in each attention layer.\n    :param num_heads_channels: if specified, ignore num_heads and instead use\n                               a fixed channel width per attention head.\n    :param num_heads_upsample: works with num_heads to set a different number\n                               of heads for upsampling. Deprecated.\n    :param use_scale_shift_norm: use a FiLM-like conditioning mechanism.\n    :param resblock_updown: use residual blocks for up/downsampling.\n    :param use_new_attention_order: use a different attention pattern for potentially\n                                    increased efficiency.\n    \"\"\"\n\n    def __init__(\n            self,\n            image_size,\n            in_channels,\n            model_channels,\n            out_channels,\n            num_res_blocks,\n            attention_resolutions,\n            dropout=0,\n            channel_mult=(1, 2, 4, 8),\n            conv_resample=True,\n            dims=2,\n            num_classes=None,\n            use_checkpoint=False,\n            use_fp16=False,\n            num_heads=-1,\n            num_head_channels=-1,\n            num_heads_upsample=-1,\n            use_scale_shift_norm=False,\n            resblock_updown=False,\n            use_new_attention_order=False,\n            use_spatial_transformer=False,  # custom transformer support\n            transformer_depth=1,  # custom transformer support\n            context_dim=None,  # custom transformer support\n            n_embed=None,  # custom support for prediction of discrete ids into codebook of first stage vq model\n            legacy=True,\n            lib_name='ldm'\n    ):\n        super().__init__()\n        if use_spatial_transformer:\n            assert context_dim is not None, 'Fool!! You forgot to include the dimension of your cross-attention conditioning...'\n        if context_dim is not None:\n            assert use_spatial_transformer, 'Fool!! 
You forgot to use the spatial transformer for your cross-attention conditioning...'\n            from omegaconf.listconfig import ListConfig\n            if type(context_dim) == ListConfig:\n                context_dim = list(context_dim)\n\n        if num_heads_upsample == -1:\n            num_heads_upsample = num_heads\n\n        if num_heads == -1:\n            assert num_head_channels != -1, 'Either num_heads or num_head_channels has to be set'\n\n        if num_head_channels == -1:\n            assert num_heads != -1, 'Either num_heads or num_head_channels has to be set'\n\n        self.image_size = image_size\n        self.in_channels = in_channels\n        self.model_channels = model_channels\n        self.out_channels = out_channels\n        self.num_res_blocks = num_res_blocks\n        self.attention_resolutions = attention_resolutions\n        self.dropout = dropout\n        self.channel_mult = channel_mult\n        self.conv_resample = conv_resample\n        self.num_classes = num_classes\n        self.use_checkpoint = use_checkpoint\n        self.dtype = th.float16 if use_fp16 else th.float32\n        self.num_heads = num_heads\n        self.num_head_channels = num_head_channels\n        self.num_heads_upsample = num_heads_upsample\n        self.predict_codebook_ids = n_embed is not None\n        self.cconv = lib_name in ['lidm', 'lidm_v0']\n\n        time_embed_dim = model_channels * 4\n        self.time_embed = nn.Sequential(\n            linear(model_channels, time_embed_dim),\n            nn.SiLU(),\n            linear(time_embed_dim, time_embed_dim),\n        )\n\n        if self.num_classes is not None:\n            self.label_emb = nn.Embedding(num_classes, time_embed_dim)\n\n        self.input_blocks = nn.ModuleList(\n            [\n                TimestepEmbedSequential(\n                    conv_nd(dims, in_channels, model_channels, 3, padding=1, cconv=self.cconv)\n                )\n            ]\n        )\n        self._feature_size = model_channels\n        input_block_chans = [model_channels]\n        ch = model_channels\n        ds = 1\n        for level, mult in enumerate(channel_mult):\n            for _ in range(num_res_blocks):\n                layers = [\n                    ResBlock(\n                        ch,\n                        time_embed_dim,\n                        dropout,\n                        out_channels=mult * model_channels,\n                        dims=dims,\n                        use_checkpoint=use_checkpoint,\n                        use_scale_shift_norm=use_scale_shift_norm,\n                        cconv=self.cconv\n                    )\n                ]\n                ch = mult * model_channels\n                if ds in attention_resolutions:\n                    if num_head_channels == -1:\n                        dim_head = ch // num_heads\n                    else:\n                        num_heads = ch // num_head_channels\n                        dim_head = num_head_channels\n                    if legacy:\n                        # num_heads = 1\n                        dim_head = ch // num_heads if use_spatial_transformer else num_head_channels\n                    layers.append(\n                        AttentionBlock(\n                            ch,\n                            use_checkpoint=use_checkpoint,\n                            num_heads=num_heads,\n                            num_head_channels=dim_head,\n                            use_new_attention_order=use_new_attention_order,\n                     
   ) if not use_spatial_transformer else SpatialTransformer(\n                            ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim\n                        )\n                    )\n                self.input_blocks.append(TimestepEmbedSequential(*layers))\n                self._feature_size += ch\n                input_block_chans.append(ch)\n            if level != len(channel_mult) - 1:\n                out_ch = ch\n                self.input_blocks.append(\n                    TimestepEmbedSequential(\n                        ResBlock(\n                            ch,\n                            time_embed_dim,\n                            dropout,\n                            out_channels=out_ch,\n                            dims=dims,\n                            use_checkpoint=use_checkpoint,\n                            use_scale_shift_norm=use_scale_shift_norm,\n                            down=True,\n                            cconv=self.cconv\n                        )\n                        if resblock_updown\n                        else Downsample(\n                            ch, conv_resample, dims=dims, out_channels=out_ch, cconv=self.cconv\n                        )\n                    )\n                )\n                ch = out_ch\n                input_block_chans.append(ch)\n                ds *= 2\n                self._feature_size += ch\n\n        if num_head_channels == -1:\n            dim_head = ch // num_heads\n        else:\n            num_heads = ch // num_head_channels\n            dim_head = num_head_channels\n        if legacy:\n            # num_heads = 1\n            dim_head = ch // num_heads if use_spatial_transformer else num_head_channels\n        self.middle_block = TimestepEmbedSequential(\n            ResBlock(\n                ch,\n                time_embed_dim,\n                dropout,\n                dims=dims,\n                use_checkpoint=use_checkpoint,\n                use_scale_shift_norm=use_scale_shift_norm,\n                cconv=self.cconv\n            ),\n            AttentionBlock(\n                ch,\n                use_checkpoint=use_checkpoint,\n                num_heads=num_heads,\n                num_head_channels=dim_head,\n                use_new_attention_order=use_new_attention_order,\n            ) if not use_spatial_transformer else SpatialTransformer(\n                ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim\n            ),\n            ResBlock(\n                ch,\n                time_embed_dim,\n                dropout,\n                dims=dims,\n                use_checkpoint=use_checkpoint,\n                use_scale_shift_norm=use_scale_shift_norm,\n                cconv=self.cconv\n            ),\n        )\n        self._feature_size += ch\n\n        self.output_blocks = nn.ModuleList([])\n        for level, mult in list(enumerate(channel_mult))[::-1]:\n            for i in range(num_res_blocks + 1):\n                ich = input_block_chans.pop()\n                layers = [\n                    ResBlock(\n                        ch + ich,\n                        time_embed_dim,\n                        dropout,\n                        out_channels=model_channels * mult,\n                        dims=dims,\n                        use_checkpoint=use_checkpoint,\n                        use_scale_shift_norm=use_scale_shift_norm,\n                        cconv=self.cconv\n                    )\n                ]\n                ch 
= model_channels * mult\n                if ds in attention_resolutions:\n                    if num_head_channels == -1:\n                        dim_head = ch // num_heads\n                    else:\n                        num_heads = ch // num_head_channels\n                        dim_head = num_head_channels\n                    if legacy:\n                        # num_heads = 1\n                        dim_head = ch // num_heads if use_spatial_transformer else num_head_channels\n                    layers.append(\n                        AttentionBlock(\n                            ch,\n                            use_checkpoint=use_checkpoint,\n                            num_heads=num_heads_upsample,\n                            num_head_channels=dim_head,\n                            use_new_attention_order=use_new_attention_order,\n                        ) if not use_spatial_transformer else SpatialTransformer(\n                            ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim\n                        )\n                    )\n                if level and i == num_res_blocks:\n                    out_ch = ch\n                    layers.append(\n                        ResBlock(\n                            ch,\n                            time_embed_dim,\n                            dropout,\n                            out_channels=out_ch,\n                            dims=dims,\n                            use_checkpoint=use_checkpoint,\n                            use_scale_shift_norm=use_scale_shift_norm,\n                            up=True,\n                            cconv=self.cconv\n                        )\n                        if resblock_updown\n                        else Upsample(ch, conv_resample, dims=dims, out_channels=out_ch, cconv=self.cconv)\n                    )\n                    ds //= 2\n                self.output_blocks.append(TimestepEmbedSequential(*layers))\n                self._feature_size += ch\n\n        self.out = nn.Sequential(\n            normalization(ch),\n            nn.SiLU(),\n            zero_module(conv_nd(dims, model_channels, out_channels, 3, padding=1, cconv=self.cconv)),\n        )\n        if self.predict_codebook_ids:\n            self.id_predictor = nn.Sequential(\n                normalization(ch),\n                conv_nd(dims, model_channels, n_embed, 1, cconv=self.cconv),\n                # nn.LogSoftmax(dim=1)  # change to cross_entropy and produce non-normalized logits\n            )\n\n    def convert_to_fp16(self):\n        \"\"\"\n        Convert the torso of the model to float16.\n        \"\"\"\n        self.input_blocks.apply(convert_module_to_f16)\n        self.middle_block.apply(convert_module_to_f16)\n        self.output_blocks.apply(convert_module_to_f16)\n\n    def convert_to_fp32(self):\n        \"\"\"\n        Convert the torso of the model to float32.\n        \"\"\"\n        self.input_blocks.apply(convert_module_to_f32)\n        self.middle_block.apply(convert_module_to_f32)\n        self.output_blocks.apply(convert_module_to_f32)\n\n    def forward(self, x, timesteps=None, context=None, y=None, **kwargs):\n        \"\"\"\n        Apply the model to an input batch.\n        :param x: an [N x C x ...] Tensor of inputs.\n        :param timesteps: a 1-D batch of timesteps.\n        :param context: conditioning plugged in via crossattn\n        :param y: an [N] Tensor of labels, if class-conditional.\n        :return: an [N x C x ...] 
Tensor of outputs.\n        \"\"\"\n        assert (y is not None) == (\n                self.num_classes is not None\n        ), \"must specify y if and only if the model is class-conditional\"\n        hs = []\n        t_emb = timestep_embedding(timesteps, self.model_channels, repeat_only=False)\n        emb = self.time_embed(t_emb)\n\n        if self.num_classes is not None:\n            assert y.shape == (x.shape[0],)\n            emb = emb + self.label_emb(y)\n\n        h = x.type(self.dtype)\n        for module in self.input_blocks:\n            h = module(h, emb, context)\n            hs.append(h)\n        h = self.middle_block(h, emb, context)\n        for module in self.output_blocks:\n            h = th.cat([h, hs.pop()], dim=1)\n            h = module(h, emb, context)\n        h = h.type(x.dtype)\n        if self.predict_codebook_ids:\n            return self.id_predictor(h)\n        else:\n            return self.out(h)\n\n\nclass EncoderUNetModel(nn.Module):\n    \"\"\"\n    The half UNet model with attention and timestep embedding.\n    For usage, see UNet.\n    \"\"\"\n\n    def __init__(\n            self,\n            image_size,\n            in_channels,\n            model_channels,\n            out_channels,\n            num_res_blocks,\n            attention_resolutions,\n            dropout=0,\n            channel_mult=(1, 2, 4, 8),\n            conv_resample=True,\n            dims=2,\n            use_checkpoint=False,\n            use_fp16=False,\n            num_heads=1,\n            num_head_channels=-1,\n            num_heads_upsample=-1,\n            use_scale_shift_norm=False,\n            resblock_updown=False,\n            use_new_attention_order=False,\n            pool=\"adaptive\",\n            lib_name='ldm',\n            *args,\n            **kwargs\n    ):\n        super().__init__()\n\n        if num_heads_upsample == -1:\n            num_heads_upsample = num_heads\n\n        self.in_channels = in_channels\n        self.model_channels = model_channels\n        self.out_channels = out_channels\n        self.num_res_blocks = num_res_blocks\n        self.attention_resolutions = attention_resolutions\n        self.dropout = dropout\n        self.channel_mult = channel_mult\n        self.conv_resample = conv_resample\n        self.use_checkpoint = use_checkpoint\n        self.dtype = th.float16 if use_fp16 else th.float32\n        self.num_heads = num_heads\n        self.num_head_channels = num_head_channels\n        self.num_heads_upsample = num_heads_upsample\n        self.cconv = lib_name == 'lidm'\n\n        time_embed_dim = model_channels * 4\n        self.time_embed = nn.Sequential(\n            linear(model_channels, time_embed_dim),\n            nn.SiLU(),\n            linear(time_embed_dim, time_embed_dim),\n        )\n\n        self.input_blocks = nn.ModuleList(\n            [\n                TimestepEmbedSequential(\n                    conv_nd(dims, in_channels, model_channels, 3, padding=1, cconv=self.cconv)\n                )\n            ]\n        )\n        self._feature_size = model_channels\n        input_block_chans = [model_channels]\n        ch = model_channels\n        ds = 1\n        for level, mult in enumerate(channel_mult):\n            for _ in range(num_res_blocks):\n                layers = [\n                    ResBlock(\n                        ch,\n                        time_embed_dim,\n                        dropout,\n                        out_channels=mult * model_channels,\n                        dims=dims,\n       
                 use_checkpoint=use_checkpoint,\n                        use_scale_shift_norm=use_scale_shift_norm,\n                    )\n                ]\n                ch = mult * model_channels\n                if ds in attention_resolutions:\n                    layers.append(\n                        AttentionBlock(\n                            ch,\n                            use_checkpoint=use_checkpoint,\n                            num_heads=num_heads,\n                            num_head_channels=num_head_channels,\n                            use_new_attention_order=use_new_attention_order,\n                        )\n                    )\n                self.input_blocks.append(TimestepEmbedSequential(*layers))\n                self._feature_size += ch\n                input_block_chans.append(ch)\n            if level != len(channel_mult) - 1:\n                out_ch = ch\n                self.input_blocks.append(\n                    TimestepEmbedSequential(\n                        ResBlock(\n                            ch,\n                            time_embed_dim,\n                            dropout,\n                            out_channels=out_ch,\n                            dims=dims,\n                            use_checkpoint=use_checkpoint,\n                            use_scale_shift_norm=use_scale_shift_norm,\n                            down=True,\n                        )\n                        if resblock_updown\n                        else Downsample(\n                            ch, conv_resample, dims=dims, out_channels=out_ch\n                        )\n                    )\n                )\n                ch = out_ch\n                input_block_chans.append(ch)\n                ds *= 2\n                self._feature_size += ch\n\n        self.middle_block = TimestepEmbedSequential(\n            ResBlock(\n                ch,\n                time_embed_dim,\n                dropout,\n                dims=dims,\n                use_checkpoint=use_checkpoint,\n                use_scale_shift_norm=use_scale_shift_norm,\n            ),\n            AttentionBlock(\n                ch,\n                use_checkpoint=use_checkpoint,\n                num_heads=num_heads,\n                num_head_channels=num_head_channels,\n                use_new_attention_order=use_new_attention_order,\n            ),\n            ResBlock(\n                ch,\n                time_embed_dim,\n                dropout,\n                dims=dims,\n                use_checkpoint=use_checkpoint,\n                use_scale_shift_norm=use_scale_shift_norm,\n            ),\n        )\n        self._feature_size += ch\n        self.pool = pool\n        if pool == \"adaptive\":\n            self.out = nn.Sequential(\n                normalization(ch),\n                nn.SiLU(),\n                nn.AdaptiveAvgPool2d((1, 1)),\n                zero_module(conv_nd(dims, ch, out_channels, 1)),\n                nn.Flatten(),\n            )\n        elif pool == \"attention\":\n            assert num_head_channels != -1\n            self.out = nn.Sequential(\n                normalization(ch),\n                nn.SiLU(),\n                AttentionPool2d(\n                    (image_size // ds), ch, num_head_channels, out_channels\n                ),\n            )\n        elif pool == \"spatial\":\n            self.out = nn.Sequential(\n                nn.Linear(self._feature_size, 2048),\n                nn.ReLU(),\n                nn.Linear(2048, 
self.out_channels),\n            )\n        elif pool == \"spatial_v2\":\n            self.out = nn.Sequential(\n                nn.Linear(self._feature_size, 2048),\n                normalization(2048),\n                nn.SiLU(),\n                nn.Linear(2048, self.out_channels),\n            )\n        else:\n            raise NotImplementedError(f\"Unexpected {pool} pooling\")\n\n    def convert_to_fp16(self):\n        \"\"\"\n        Convert the torso of the model to float16.\n        \"\"\"\n        self.input_blocks.apply(convert_module_to_f16)\n        self.middle_block.apply(convert_module_to_f16)\n\n    def convert_to_fp32(self):\n        \"\"\"\n        Convert the torso of the model to float32.\n        \"\"\"\n        self.input_blocks.apply(convert_module_to_f32)\n        self.middle_block.apply(convert_module_to_f32)\n\n    def forward(self, x, timesteps):\n        \"\"\"\n        Apply the model to an input batch.\n        :param x: an [N x C x ...] Tensor of inputs.\n        :param timesteps: a 1-D batch of timesteps.\n        :return: an [N x K] Tensor of outputs.\n        \"\"\"\n        emb = self.time_embed(timestep_embedding(timesteps, self.model_channels))\n\n        results = []\n        h = x.type(self.dtype)\n        for module in self.input_blocks:\n            h = module(h, emb)\n            if self.pool.startswith(\"spatial\"):\n                results.append(h.type(x.dtype).mean(dim=(2, 3)))\n        h = self.middle_block(h, emb)\n        if self.pool.startswith(\"spatial\"):\n            results.append(h.type(x.dtype).mean(dim=(2, 3)))\n            h = th.cat(results, axis=-1)\n            return self.out(h)\n        else:\n            h = h.type(x.dtype)\n            return self.out(h)\n"
  },
  {
    "path": "lidm/modules/distributions/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/modules/distributions/distributions.py",
    "content": "import torch\nimport numpy as np\n\n\nclass AbstractDistribution:\n    def sample(self):\n        raise NotImplementedError()\n\n    def mode(self):\n        raise NotImplementedError()\n\n\nclass DiracDistribution(AbstractDistribution):\n    def __init__(self, value):\n        self.value = value\n\n    def sample(self):\n        return self.value\n\n    def mode(self):\n        return self.value\n\n\nclass DiagonalGaussianDistribution(object):\n    def __init__(self, parameters, deterministic=False):\n        self.parameters = parameters\n        self.mean, self.logvar = torch.chunk(parameters, 2, dim=1)\n        self.logvar = torch.clamp(self.logvar, -30.0, 20.0)\n        self.deterministic = deterministic\n        self.std = torch.exp(0.5 * self.logvar)\n        self.var = torch.exp(self.logvar)\n        if self.deterministic:\n            self.var = self.std = torch.zeros_like(self.mean).to(device=self.parameters.device)\n\n    def sample(self):\n        x = self.mean + self.std * torch.randn(self.mean.shape).to(device=self.parameters.device)\n        return x\n\n    def kl(self, other=None):\n        if self.deterministic:\n            return torch.Tensor([0.])\n        else:\n            if other is None:\n                return 0.5 * torch.sum(torch.pow(self.mean, 2)\n                                       + self.var - 1.0 - self.logvar,\n                                       dim=[1, 2, 3])\n            else:\n                return 0.5 * torch.sum(\n                    torch.pow(self.mean - other.mean, 2) / other.var\n                    + self.var / other.var - 1.0 - self.logvar + other.logvar,\n                    dim=[1, 2, 3])\n\n    def nll(self, sample, dims=[1,2,3]):\n        if self.deterministic:\n            return torch.Tensor([0.])\n        logtwopi = np.log(2.0 * np.pi)\n        return 0.5 * torch.sum(\n            logtwopi + self.logvar + torch.pow(sample - self.mean, 2) / self.var,\n            dim=dims)\n\n    def mode(self):\n        return self.mean\n\n\ndef normal_kl(mean1, logvar1, mean2, logvar2):\n    \"\"\"\n    source: https://github.com/openai/guided-diffusion/blob/27c20a8fab9cb472df5d6bdd6c8d11c8f430b924/guided_diffusion/losses.py#L12\n    Compute the KL divergence between two gaussians.\n    Shapes are automatically broadcasted, so batches can be compared to\n    scalars, among other use cases.\n    \"\"\"\n    tensor = None\n    for obj in (mean1, logvar1, mean2, logvar2):\n        if isinstance(obj, torch.Tensor):\n            tensor = obj\n            break\n    assert tensor is not None, \"at least one argument must be a Tensor\"\n\n    # Force variances to be Tensors. Broadcasting helps convert scalars to\n    # Tensors, but it does not work for torch.exp().\n    logvar1, logvar2 = [\n        x if isinstance(x, torch.Tensor) else torch.tensor(x).to(tensor)\n        for x in (logvar1, logvar2)\n    ]\n\n    return 0.5 * (\n        -1.0\n        + logvar2\n        - logvar1\n        + torch.exp(logvar1 - logvar2)\n        + ((mean1 - mean2) ** 2) * torch.exp(-logvar2)\n    )\n"
  },
  {
    "path": "lidm/modules/ema.py",
    "content": "import torch\nfrom torch import nn\n\n\nclass LitEma(nn.Module):\n    def __init__(self, model, decay=0.9999, use_num_upates=True):\n        super().__init__()\n        if decay < 0.0 or decay > 1.0:\n            raise ValueError('Decay must be between 0 and 1')\n\n        self.m_name2s_name = {}\n        self.register_buffer('decay', torch.tensor(decay, dtype=torch.float32))\n        self.register_buffer('num_updates', torch.tensor(0, dtype=torch.int) if use_num_upates\n        else torch.tensor(-1, dtype=torch.int))\n\n        for name, p in model.named_parameters():\n            if p.requires_grad:\n                # remove as '.'-character is not allowed in buffers\n                s_name = name.replace('.', '')\n                self.m_name2s_name.update({name: s_name})\n                self.register_buffer(s_name, p.clone().detach().data)\n\n        self.collected_params = []\n\n    def forward(self, model):\n        decay = self.decay\n\n        if self.num_updates >= 0:\n            self.num_updates += 1\n            decay = min(self.decay, (1 + self.num_updates) / (10 + self.num_updates))\n\n        one_minus_decay = 1.0 - decay\n\n        with torch.no_grad():\n            m_param = dict(model.named_parameters())\n            shadow_params = dict(self.named_buffers())\n\n            for key in m_param:\n                if m_param[key].requires_grad:\n                    sname = self.m_name2s_name[key]\n                    shadow_params[sname] = shadow_params[sname].type_as(m_param[key])\n                    shadow_params[sname].sub_(one_minus_decay * (shadow_params[sname] - m_param[key]))\n                else:\n                    assert not key in self.m_name2s_name\n\n    def copy_to(self, model):\n        m_param = dict(model.named_parameters())\n        shadow_params = dict(self.named_buffers())\n        for key in m_param:\n            if m_param[key].requires_grad:\n                m_param[key].data.copy_(shadow_params[self.m_name2s_name[key]].data)\n            else:\n                assert not key in self.m_name2s_name\n\n    def store(self, parameters):\n        \"\"\"\n        Save the current parameters for restoring later.\n        Args:\n          parameters: Iterable of `torch.nn.Parameter`; the parameters to be\n            temporarily stored.\n        \"\"\"\n        self.collected_params = [param.clone() for param in parameters]\n\n    def restore(self, parameters):\n        \"\"\"\n        Restore the parameters stored with the `store` method.\n        Useful to validate the model with EMA parameters without affecting the\n        original optimization process. Store the parameters before the\n        `copy_to` method. After validation (or model saving), use this to\n        restore the former parameters.\n        Args:\n          parameters: Iterable of `torch.nn.Parameter`; the parameters to be\n            updated with the stored parameters.\n        \"\"\"\n        for c_param, param in zip(self.collected_params, parameters):\n            param.data.copy_(c_param.data)\n"
  },
  {
    "path": "lidm/modules/encoders/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/modules/encoders/modules.py",
    "content": "import torch\nimport torch.nn as nn\nfrom functools import partial\nimport clip\nfrom einops import rearrange, repeat\nimport kornia\n\nfrom ...modules.x_transformer import Encoder, TransformerWrapper\n\n\nclass AbstractEncoder(nn.Module):\n    def __init__(self):\n        super().__init__()\n\n    def encode(self, *args, **kwargs):\n        raise NotImplementedError\n\n\nclass ClassEmbedder(nn.Module):\n    def __init__(self, embed_dim, n_classes=1000, key='class'):\n        super().__init__()\n        self.key = key\n        self.embedding = nn.Embedding(n_classes, embed_dim)\n\n    def forward(self, batch, key=None):\n        if key is None:\n            key = self.key\n        # this is for use in crossattn\n        c = batch[key][:, None]\n        c = self.embedding(c)\n        return c\n\n\nclass TransformerEmbedder(AbstractEncoder):\n    \"\"\"Some transformer encoder layers\"\"\"\n\n    def __init__(self, n_embed, n_layer, vocab_size, max_seq_len=77, device=\"cuda\"):\n        super().__init__()\n        self.device = device\n        self.transformer = TransformerWrapper(num_tokens=vocab_size, max_seq_len=max_seq_len,\n                                              attn_layers=Encoder(dim=n_embed, depth=n_layer))\n\n    def forward(self, tokens):\n        tokens = tokens.to(self.device)  # meh\n        z = self.transformer(tokens, return_embeddings=True)\n        return z\n\n    def encode(self, x):\n        return self(x)\n\n\nclass BERTTokenizer(AbstractEncoder):\n    \"\"\" Uses a pretrained BERT tokenizer by huggingface. Vocab size: 30522 (?)\"\"\"\n\n    def __init__(self, device=\"cuda\", vq_interface=True, max_length=77):\n        super().__init__()\n        from transformers import BertTokenizerFast\n        # self.tokenizer = BertTokenizerFast.from_pretrained(\"bert-base-uncased\")\n        self.tokenizer = BertTokenizerFast.from_pretrained('./models/bert')\n        self.device = device\n        self.vq_interface = vq_interface\n        self.max_length = max_length\n\n    def forward(self, text):\n        batch_encoding = self.tokenizer(text, truncation=True, max_length=self.max_length, return_length=True,\n                                        return_overflowing_tokens=False, padding=\"max_length\", return_tensors=\"pt\")\n        tokens = batch_encoding[\"input_ids\"].to(self.device)\n        return tokens\n\n    @torch.no_grad()\n    def encode(self, text):\n        tokens = self(text)\n        if not self.vq_interface:\n            return tokens\n        return None, None, [None, None, tokens]\n\n    def decode(self, text):\n        return text\n\n\nclass BERTEmbedder(AbstractEncoder):\n    \"\"\"Uses the BERT tokenizr model and add some transformer encoder layers\"\"\"\n\n    def __init__(self, n_embed, n_layer, vocab_size=30522, max_seq_len=77,\n                 device=\"cuda\", use_tokenizer=True, embedding_dropout=0.0):\n        super().__init__()\n        self.use_tknz_fn = use_tokenizer\n        if self.use_tknz_fn:\n            self.tknz_fn = BERTTokenizer(vq_interface=False, max_length=max_seq_len)\n        self.device = device\n        self.transformer = TransformerWrapper(num_tokens=vocab_size, max_seq_len=max_seq_len,\n                                              attn_layers=Encoder(dim=n_embed, depth=n_layer),\n                                              emb_dropout=embedding_dropout)\n\n    def forward(self, text):\n        if self.use_tknz_fn:\n            tokens = self.tknz_fn(text)  # .to(self.device)\n        else:\n            
tokens = text\n        z = self.transformer(tokens, return_embeddings=True)\n        return z\n\n    def encode(self, text):\n        # output of length 77\n        return self(text)\n\n\nclass SpatialRescaler(nn.Module):\n    def __init__(self,\n                 strides=[],\n                 method='bilinear',\n                 in_channels=3,\n                 out_channels=None,\n                 bias=False):\n        super().__init__()\n        self.strides = strides\n        assert method in ['nearest', 'linear', 'bilinear', 'trilinear', 'bicubic', 'area']\n        self.interpolator = partial(torch.nn.functional.interpolate, mode=method, align_corners=True)\n        self.remap_output = out_channels is not None\n        if self.remap_output:\n            print(f'Spatial Rescaler mapping from {in_channels} to {out_channels} channels after resizing.')\n            self.channel_mapper = nn.Conv2d(in_channels, out_channels, 1, bias=bias)\n\n    def forward(self, x):\n        for h_s, w_s in self.strides:\n            x = self.interpolator(x, scale_factor=(1/h_s, 1/w_s))\n\n        if self.remap_output:\n            x = self.channel_mapper(x)\n        return x\n\n    def encode(self, x):\n        return self(x)\n\n\nclass FrozenCLIPTextEmbedder(nn.Module):\n    \"\"\"\n    Uses the CLIP transformer encoder for text.\n    \"\"\"\n\n    def __init__(self, version='ViT-L/14', device=\"cuda\", max_length=77, n_repeat=1, normalize=True):\n        super().__init__()\n        self.model, _ = clip.load(version, jit=False, device=\"cpu\")\n        self.model.to(device)\n        self.device = device\n        self.max_length = max_length\n        self.n_repeat = n_repeat\n        self.normalize = normalize\n\n    def freeze(self):\n        self.model = self.model.eval()\n        for param in self.parameters():\n            param.requires_grad = False\n\n    def forward(self, text):\n        tokens = clip.tokenize(text).to(self.device)\n        z = self.model.encode_text(tokens)\n        if self.normalize:\n            z = z / torch.linalg.norm(z, dim=1, keepdim=True)\n        return z\n\n    def encode(self, text):\n        z = self(text)\n        if z.ndim == 2:\n            z = z[:, None, :]\n        z = repeat(z, 'b 1 d -> b k d', k=self.n_repeat)\n        return z\n\n\nclass FrozenClipMultiTextEmbedder(FrozenCLIPTextEmbedder):\n    def __init__(self, num_views=1, apply_all=False, **kwargs):\n        super().__init__(**kwargs)\n        self.num_views = num_views\n        self.apply_all = apply_all\n\n    def encode(self, text):\n        z = self(text)\n        if z.ndim == 2:\n            z = z[:, None, :]\n\n        if not self.apply_all:\n            new_z = torch.zeros(*z.shape[:2], z.shape[2] * self.num_views, device=z.device)\n            new_z[:, :, self.num_views // 2 * z.shape[2]: (self.num_views // 2 + 1) * z.shape[2]] = z\n        else:\n            new_z = repeat(z, 'b 1 d -> b 1 (d m)', m=self.num_views)\n\n        return new_z\n\n\nclass FrozenClipImageEmbedder(nn.Module):\n    \"\"\"\n    Uses the CLIP image encoder.\n    \"\"\"\n\n    def __init__(\n            self,\n            model,\n            jit=False,\n            device='cuda' if torch.cuda.is_available() else 'cpu',\n            antialias=False,\n    ):\n        super().__init__()\n        self.model, _ = clip.load(name=model, device='cpu', jit=jit)\n        self.init()\n\n        self.antialias = antialias\n\n        self.register_buffer('mean', torch.Tensor([0.48145466, 0.4578275, 0.40821073]), persistent=False)\n        
self.register_buffer('std', torch.Tensor([0.26862954, 0.26130258, 0.27577711]), persistent=False)\n\n    def init(self):\n        for param in self.model.parameters():\n            param.requires_grad = False\n        self.model.eval()\n\n    def preprocess(self, x):\n        x = kornia.geometry.resize(x, (224, 224),\n                                   interpolation='bicubic', align_corners=True,\n                                   antialias=self.antialias)\n        # x = (x + 1.) / 2.\n\n        # renormalize according to clip\n        x = kornia.enhance.normalize(x, self.mean, self.std)\n        return x\n\n    def forward(self, x):\n        # x is assumed to be in range [0,1]\n        return self.model.encode_image(self.preprocess(x))\n\n\nclass FrozenClipMultiImageEmbedder(FrozenClipImageEmbedder):\n    \"\"\"\n    Uses the CLIP image encoder with multi-image as input.\n    \"\"\"\n\n    def __init__(self, num_views=1, split_per_view=1, img_dim=768, out_dim=512, key='camera', **kwargs):\n        super().__init__(**kwargs)\n        self.split_per_view = split_per_view\n        self.key = key\n        self.linear = nn.Linear(img_dim, out_dim)\n        self.view_embedding = nn.Parameter(img_dim ** -0.5 * torch.randn((1, num_views * split_per_view, img_dim)))\n\n    def forward(self, x):\n        # x is assumed to be in range [0,1]\n        if isinstance(x, torch.Tensor) and x.ndim == 5:\n            x = x.permute(1, 0, 2, 3, 4)\n        elif isinstance(x, dict):\n            x = x[self.key]\n        elif isinstance(x, torch.Tensor) and x.ndim == 3:\n            x = self.linear(x)\n            return x\n\n        with torch.no_grad():\n            img_feats = [self.model.encode_image(self.preprocess(img))[:, None] for img in x]\n            x = torch.cat(img_feats, 1).float() + self.view_embedding\n            x = self.linear(x)\n\n        return x\n\n\nclass FrozenClipImagePatchEmbedder(nn.Module):\n    \"\"\"\n    Uses the CLIP image encoder.\n    \"\"\"\n\n    def __init__(\n            self,\n            model,\n            jit=False,\n            device='cuda' if torch.cuda.is_available() else 'cpu',\n            antialias=False,\n            img_dim=1024,\n            out_dim=512,\n            num_views=1,\n            split_per_view=1\n    ):\n        super().__init__()\n        self.model, _ = clip.load(name=model, device='cpu', jit=jit)\n        self.init()\n\n        self.antialias = antialias\n\n        self.register_buffer('mean', torch.Tensor([0.48145466, 0.4578275, 0.40821073]), persistent=False)\n        self.register_buffer('std', torch.Tensor([0.26862954, 0.26130258, 0.27577711]), persistent=False)\n        self.view_embedding = nn.Parameter(img_dim ** -0.5 * torch.randn((1, num_views * split_per_view, 1, img_dim)))\n\n        self.linear = nn.Linear(img_dim, out_dim)\n\n    def init(self):\n        for param in self.model.parameters():\n            param.requires_grad = False\n        self.model.eval()\n\n    def preprocess(self, x):\n        x = kornia.geometry.resize(x, (224, 224),\n                                   interpolation='bicubic', align_corners=True,\n                                   antialias=self.antialias)\n        # x = (x + 1.) 
/ 2.\n\n        # renormalize according to clip\n        x = kornia.enhance.normalize(x, self.mean, self.std)\n        return x\n\n    def encode_image_patch(self, x):\n        visual_encoder = self.model.visual\n        x = x.type(self.model.dtype)\n        x = visual_encoder.conv1(x)  # shape = [*, width, grid, grid]\n        x = x.reshape(x.shape[0], x.shape[1], -1)  # shape = [*, width, grid ** 2]\n        x = x.permute(0, 2, 1)  # shape = [*, grid ** 2, width]\n        x = torch.cat([visual_encoder.class_embedding.to(x.dtype) + torch.zeros(x.shape[0], 1, x.shape[-1], dtype=x.dtype, device=x.device), x], dim=1)  # shape = [*, grid ** 2 + 1, width]\n        x = x + visual_encoder.positional_embedding.to(x.dtype)\n        x = visual_encoder.ln_pre(x)\n\n        x = x.permute(1, 0, 2)  # NLD -> LND\n        x = visual_encoder.transformer(x)\n        x = x.permute(1, 0, 2)  # LND -> NLD\n        x = x[:, 1:, :]\n\n        return x\n\n    def forward(self, x):\n        # x is assumed to be in range [0,1]\n        img_feats = [self.encode_image_patch(self.preprocess(img))[:, None] for img in x]\n        x = torch.cat(img_feats, 1).float() + self.view_embedding\n        x = rearrange(x, 'b v n c -> b (v n) c')\n        x = self.linear(x)\n        return x\n"
  },
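The conditioning encoders in the file above (class, text, and CLIP image embedders) all expose the same `encode()`/`forward()` interface, so the diffusion model can swap between them via config. A minimal usage sketch follows; it assumes the `clip` and `kornia` packages are installed, that the 'ViT-L/14' weights can be loaded, and that the module path in the import matches where this file lives in the repository (the path is not shown in this excerpt, so treat it as an assumption).

```python
# Minimal sketch, not the project's training entry point.
import torch
# NOTE: assumed module path; adjust to wherever the encoders file actually lives.
from lidm.modules.encoders.modules import FrozenClipImageEmbedder

embedder = FrozenClipImageEmbedder(model='ViT-L/14')  # CLIP weights are frozen by init()
images = torch.rand(2, 3, 256, 256)                   # dummy batch of RGB images in [0, 1]
with torch.no_grad():
    feats = embedder(images)                          # CLIP image features, e.g. (2, 768) for ViT-L/14
print(feats.shape)
```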
  {
    "path": "lidm/modules/image_degradation/__init__.py",
    "content": "from .bsrgan import degradation_bsrgan_variant as degradation_fn_bsr\nfrom .bsrgan_light import degradation_bsrgan_variant as degradation_fn_bsr_light\n"
  },
  {
    "path": "lidm/modules/image_degradation/bsrgan.py",
    "content": "# -*- coding: utf-8 -*-\n\"\"\"\n# --------------------------------------------\n# Super-Resolution\n# --------------------------------------------\n#\n# Kai Zhang (cskaizhang@gmail.com)\n# https://github.com/cszn\n# From 2019/03--2021/08\n# --------------------------------------------\n\"\"\"\n\nimport numpy as np\nimport cv2\nimport torch\n\nfrom functools import partial\nimport random\nfrom scipy import ndimage\nimport scipy\nimport scipy.stats as ss\nfrom scipy.interpolate import interp2d\nfrom scipy.linalg import orth\nimport albumentations\n\nfrom . import utils_image as util\n\n\ndef modcrop_np(img, sf):\n    '''\n    Args:\n        img: numpy image, WxH or WxHxC\n        sf: scale factor\n    Return:\n        cropped image\n    '''\n    w, h = img.shape[:2]\n    im = np.copy(img)\n    return im[:w - w % sf, :h - h % sf, ...]\n\n\n\"\"\"\n# --------------------------------------------\n# anisotropic Gaussian kernels\n# --------------------------------------------\n\"\"\"\n\n\ndef analytic_kernel(k):\n    \"\"\"Calculate the X4 kernel from the X2 kernel (for proof see appendix in paper)\"\"\"\n    k_size = k.shape[0]\n    # Calculate the big kernels size\n    big_k = np.zeros((3 * k_size - 2, 3 * k_size - 2))\n    # Loop over the small kernel to fill the big one\n    for r in range(k_size):\n        for c in range(k_size):\n            big_k[2 * r:2 * r + k_size, 2 * c:2 * c + k_size] += k[r, c] * k\n    # Crop the edges of the big kernel to ignore very small values and increase run time of SR\n    crop = k_size // 2\n    cropped_big_k = big_k[crop:-crop, crop:-crop]\n    # Normalize to 1\n    return cropped_big_k / cropped_big_k.sum()\n\n\ndef anisotropic_Gaussian(ksize=15, theta=np.pi, l1=6, l2=6):\n    \"\"\" generate an anisotropic Gaussian kernel\n    Args:\n        ksize : e.g., 15, kernel size\n        theta : [0,  pi], rotation angle range\n        l1    : [0.1,50], scaling of eigenvalues\n        l2    : [0.1,l1], scaling of eigenvalues\n        If l1 = l2, will get an isotropic Gaussian kernel.\n    Returns:\n        k     : kernel\n    \"\"\"\n\n    v = np.dot(np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]), np.array([1., 0.]))\n    V = np.array([[v[0], v[1]], [v[1], -v[0]]])\n    D = np.array([[l1, 0], [0, l2]])\n    Sigma = np.dot(np.dot(V, D), np.linalg.inv(V))\n    k = gm_blur_kernel(mean=[0, 0], cov=Sigma, size=ksize)\n\n    return k\n\n\ndef gm_blur_kernel(mean, cov, size=15):\n    center = size / 2.0 + 0.5\n    k = np.zeros([size, size])\n    for y in range(size):\n        for x in range(size):\n            cy = y - center + 1\n            cx = x - center + 1\n            k[y, x] = ss.multivariate_normal.pdf([cx, cy], mean=mean, cov=cov)\n\n    k = k / np.sum(k)\n    return k\n\n\ndef shift_pixel(x, sf, upper_left=True):\n    \"\"\"shift pixel for super-resolution with different scale factors\n    Args:\n        x: WxHxC or WxH\n        sf: scale factor\n        upper_left: shift direction\n    \"\"\"\n    h, w = x.shape[:2]\n    shift = (sf - 1) * 0.5\n    xv, yv = np.arange(0, w, 1.0), np.arange(0, h, 1.0)\n    if upper_left:\n        x1 = xv + shift\n        y1 = yv + shift\n    else:\n        x1 = xv - shift\n        y1 = yv - shift\n\n    x1 = np.clip(x1, 0, w - 1)\n    y1 = np.clip(y1, 0, h - 1)\n\n    if x.ndim == 2:\n        x = interp2d(xv, yv, x)(x1, y1)\n    if x.ndim == 3:\n        for i in range(x.shape[-1]):\n            x[:, :, i] = interp2d(xv, yv, x[:, :, i])(x1, y1)\n\n    return x\n\n\ndef blur(x, k):\n  
  '''\n    x: image, NxcxHxW\n    k: kernel, Nx1xhxw\n    '''\n    n, c = x.shape[:2]\n    p1, p2 = (k.shape[-2] - 1) // 2, (k.shape[-1] - 1) // 2\n    x = torch.nn.functional.pad(x, pad=(p1, p2, p1, p2), mode='replicate')\n    k = k.repeat(1, c, 1, 1)\n    k = k.view(-1, 1, k.shape[2], k.shape[3])\n    x = x.view(1, -1, x.shape[2], x.shape[3])\n    x = torch.nn.functional.conv2d(x, k, bias=None, stride=1, padding=0, groups=n * c)\n    x = x.view(n, c, x.shape[2], x.shape[3])\n\n    return x\n\n\ndef gen_kernel(k_size=np.array([15, 15]), scale_factor=np.array([4, 4]), min_var=0.6, max_var=10., noise_level=0):\n    \"\"\"\"\n    # modified version of https://github.com/assafshocher/BlindSR_dataset_generator\n    # Kai Zhang\n    # min_var = 0.175 * sf  # variance of the gaussian kernel will be sampled between min_var and max_var\n    # max_var = 2.5 * sf\n    \"\"\"\n    # Set random eigen-vals (lambdas) and angle (theta) for COV matrix\n    lambda_1 = min_var + np.random.rand() * (max_var - min_var)\n    lambda_2 = min_var + np.random.rand() * (max_var - min_var)\n    theta = np.random.rand() * np.pi  # random theta\n    noise = -noise_level + np.random.rand(*k_size) * noise_level * 2\n\n    # Set COV matrix using Lambdas and Theta\n    LAMBDA = np.diag([lambda_1, lambda_2])\n    Q = np.array([[np.cos(theta), -np.sin(theta)],\n                  [np.sin(theta), np.cos(theta)]])\n    SIGMA = Q @ LAMBDA @ Q.T\n    INV_SIGMA = np.linalg.inv(SIGMA)[None, None, :, :]\n\n    # Set expectation position (shifting kernel for aligned image)\n    MU = k_size // 2 - 0.5 * (scale_factor - 1)  # - 0.5 * (scale_factor - k_size % 2)\n    MU = MU[None, None, :, None]\n\n    # Create meshgrid for Gaussian\n    [X, Y] = np.meshgrid(range(k_size[0]), range(k_size[1]))\n    Z = np.stack([X, Y], 2)[:, :, :, None]\n\n    # Calcualte Gaussian for every pixel of the kernel\n    ZZ = Z - MU\n    ZZ_t = ZZ.transpose(0, 1, 3, 2)\n    raw_kernel = np.exp(-0.5 * np.squeeze(ZZ_t @ INV_SIGMA @ ZZ)) * (1 + noise)\n\n    # shift the kernel so it will be centered\n    # raw_kernel_centered = kernel_shift(raw_kernel, scale_factor)\n\n    # Normalize the kernel and return\n    # kernel = raw_kernel_centered / np.sum(raw_kernel_centered)\n    kernel = raw_kernel / np.sum(raw_kernel)\n    return kernel\n\n\ndef fspecial_gaussian(hsize, sigma):\n    hsize = [hsize, hsize]\n    siz = [(hsize[0] - 1.0) / 2.0, (hsize[1] - 1.0) / 2.0]\n    std = sigma\n    [x, y] = np.meshgrid(np.arange(-siz[1], siz[1] + 1), np.arange(-siz[0], siz[0] + 1))\n    arg = -(x * x + y * y) / (2 * std * std)\n    h = np.exp(arg)\n    h[h < scipy.finfo(float).eps * h.max()] = 0\n    sumh = h.sum()\n    if sumh != 0:\n        h = h / sumh\n    return h\n\n\ndef fspecial_laplacian(alpha):\n    alpha = max([0, min([alpha, 1])])\n    h1 = alpha / (alpha + 1)\n    h2 = (1 - alpha) / (alpha + 1)\n    h = [[h1, h2, h1], [h2, -4 / (alpha + 1), h2], [h1, h2, h1]]\n    h = np.array(h)\n    return h\n\n\ndef fspecial(filter_type, *args, **kwargs):\n    '''\n    python code from:\n    https://github.com/ronaldosena/imagens-medicas-2/blob/40171a6c259edec7827a6693a93955de2bd39e76/Aulas/aula_2_-_uniform_filter/matlab_fspecial.py\n    '''\n    if filter_type == 'gaussian':\n        return fspecial_gaussian(*args, **kwargs)\n    if filter_type == 'laplacian':\n        return fspecial_laplacian(*args, **kwargs)\n\n\n\"\"\"\n# --------------------------------------------\n# degradation models\n# --------------------------------------------\n\"\"\"\n\n\ndef 
bicubic_degradation(x, sf=3):\n    '''\n    Args:\n        x: HxWxC image, [0, 1]\n        sf: down-scale factor\n    Return:\n        bicubicly downsampled LR image\n    '''\n    x = util.imresize_np(x, scale=1 / sf)\n    return x\n\n\ndef srmd_degradation(x, k, sf=3):\n    ''' blur + bicubic downsampling\n    Args:\n        x: HxWxC image, [0, 1]\n        k: hxw, double\n        sf: down-scale factor\n    Return:\n        downsampled LR image\n    Reference:\n        @inproceedings{zhang2018learning,\n          title={Learning a single convolutional super-resolution network for multiple degradations},\n          author={Zhang, Kai and Zuo, Wangmeng and Zhang, Lei},\n          booktitle={IEEE Conference on Computer Vision and Pattern Recognition},\n          pages={3262--3271},\n          year={2018}\n        }\n    '''\n    x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap')  # 'nearest' | 'mirror'\n    x = bicubic_degradation(x, sf=sf)\n    return x\n\n\ndef dpsr_degradation(x, k, sf=3):\n    ''' bicubic downsampling + blur\n    Args:\n        x: HxWxC image, [0, 1]\n        k: hxw, double\n        sf: down-scale factor\n    Return:\n        downsampled LR image\n    Reference:\n        @inproceedings{zhang2019deep,\n          title={Deep Plug-and-Play Super-Resolution for Arbitrary Blur Kernels},\n          author={Zhang, Kai and Zuo, Wangmeng and Zhang, Lei},\n          booktitle={IEEE Conference on Computer Vision and Pattern Recognition},\n          pages={1671--1681},\n          year={2019}\n        }\n    '''\n    x = bicubic_degradation(x, sf=sf)\n    x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap')\n    return x\n\n\ndef classical_degradation(x, k, sf=3):\n    ''' blur + downsampling\n    Args:\n        x: HxWxC image, [0, 1]/[0, 255]\n        k: hxw, double\n        sf: down-scale factor\n    Return:\n        downsampled LR image\n    '''\n    x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap')\n    # x = filters.correlate(x, np.expand_dims(np.flip(k), axis=2))\n    st = 0\n    return x[st::sf, st::sf, ...]\n\n\ndef add_sharpening(img, weight=0.5, radius=50, threshold=10):\n    \"\"\"USM sharpening. borrowed from real-ESRGAN\n    Input image: I; Blurry image: B.\n    1. K = I + weight * (I - B)\n    2. Mask = 1 if abs(I - B) > threshold, else: 0\n    3. Blur mask:\n    4. Out = Mask * K + (1 - Mask) * I\n    Args:\n        img (Numpy array): Input image, HWC, BGR; float32, [0, 1].\n        weight (float): Sharp weight. Default: 1.\n        radius (float): Kernel size of Gaussian blur. 
Default: 50.\n        threshold (int):\n    \"\"\"\n    if radius % 2 == 0:\n        radius += 1\n    blur = cv2.GaussianBlur(img, (radius, radius), 0)\n    residual = img - blur\n    mask = np.abs(residual) * 255 > threshold\n    mask = mask.astype('float32')\n    soft_mask = cv2.GaussianBlur(mask, (radius, radius), 0)\n\n    K = img + weight * residual\n    K = np.clip(K, 0, 1)\n    return soft_mask * K + (1 - soft_mask) * img\n\n\ndef add_blur(img, sf=4):\n    wd2 = 4.0 + sf\n    wd = 2.0 + 0.2 * sf\n    if random.random() < 0.5:\n        l1 = wd2 * random.random()\n        l2 = wd2 * random.random()\n        k = anisotropic_Gaussian(ksize=2 * random.randint(2, 11) + 3, theta=random.random() * np.pi, l1=l1, l2=l2)\n    else:\n        k = fspecial('gaussian', 2 * random.randint(2, 11) + 3, wd * random.random())\n    img = ndimage.filters.convolve(img, np.expand_dims(k, axis=2), mode='mirror')\n\n    return img\n\n\ndef add_resize(img, sf=4):\n    rnum = np.random.rand()\n    if rnum > 0.8:  # up\n        sf1 = random.uniform(1, 2)\n    elif rnum < 0.7:  # down\n        sf1 = random.uniform(0.5 / sf, 1)\n    else:\n        sf1 = 1.0\n    img = cv2.resize(img, (int(sf1 * img.shape[1]), int(sf1 * img.shape[0])), interpolation=random.choice([1, 2, 3]))\n    img = np.clip(img, 0.0, 1.0)\n\n    return img\n\n\n# def add_Gaussian_noise(img, noise_level1=2, noise_level2=25):\n#     noise_level = random.randint(noise_level1, noise_level2)\n#     rnum = np.random.rand()\n#     if rnum > 0.6:  # add color Gaussian noise\n#         img += np.random.normal(0, noise_level / 255.0, img.shape).astype(np.float32)\n#     elif rnum < 0.4:  # add grayscale Gaussian noise\n#         img += np.random.normal(0, noise_level / 255.0, (*img.shape[:2], 1)).astype(np.float32)\n#     else:  # add  noise\n#         L = noise_level2 / 255.\n#         D = np.diag(np.random.rand(3))\n#         U = orth(np.random.rand(3, 3))\n#         conv = np.dot(np.dot(np.transpose(U), D), U)\n#         img += np.random.multivariate_normal([0, 0, 0], np.abs(L ** 2 * conv), img.shape[:2]).astype(np.float32)\n#     img = np.clip(img, 0.0, 1.0)\n#     return img\n\ndef add_Gaussian_noise(img, noise_level1=2, noise_level2=25):\n    noise_level = random.randint(noise_level1, noise_level2)\n    rnum = np.random.rand()\n    if rnum > 0.6:  # add color Gaussian noise\n        img = img + np.random.normal(0, noise_level / 255.0, img.shape).astype(np.float32)\n    elif rnum < 0.4:  # add grayscale Gaussian noise\n        img = img + np.random.normal(0, noise_level / 255.0, (*img.shape[:2], 1)).astype(np.float32)\n    else:  # add  noise\n        L = noise_level2 / 255.\n        D = np.diag(np.random.rand(3))\n        U = orth(np.random.rand(3, 3))\n        conv = np.dot(np.dot(np.transpose(U), D), U)\n        img = img + np.random.multivariate_normal([0, 0, 0], np.abs(L ** 2 * conv), img.shape[:2]).astype(np.float32)\n    img = np.clip(img, 0.0, 1.0)\n    return img\n\n\ndef add_speckle_noise(img, noise_level1=2, noise_level2=25):\n    noise_level = random.randint(noise_level1, noise_level2)\n    img = np.clip(img, 0.0, 1.0)\n    rnum = random.random()\n    if rnum > 0.6:\n        img += img * np.random.normal(0, noise_level / 255.0, img.shape).astype(np.float32)\n    elif rnum < 0.4:\n        img += img * np.random.normal(0, noise_level / 255.0, (*img.shape[:2], 1)).astype(np.float32)\n    else:\n        L = noise_level2 / 255.\n        D = np.diag(np.random.rand(3))\n        U = orth(np.random.rand(3, 3))\n        conv = 
np.dot(np.dot(np.transpose(U), D), U)\n        img += img * np.random.multivariate_normal([0, 0, 0], np.abs(L ** 2 * conv), img.shape[:2]).astype(np.float32)\n    img = np.clip(img, 0.0, 1.0)\n    return img\n\n\ndef add_Poisson_noise(img):\n    img = np.clip((img * 255.0).round(), 0, 255) / 255.\n    vals = 10 ** (2 * random.random() + 2.0)  # [2, 4]\n    if random.random() < 0.5:\n        img = np.random.poisson(img * vals).astype(np.float32) / vals\n    else:\n        img_gray = np.dot(img[..., :3], [0.299, 0.587, 0.114])\n        img_gray = np.clip((img_gray * 255.0).round(), 0, 255) / 255.\n        noise_gray = np.random.poisson(img_gray * vals).astype(np.float32) / vals - img_gray\n        img += noise_gray[:, :, np.newaxis]\n    img = np.clip(img, 0.0, 1.0)\n    return img\n\n\ndef add_JPEG_noise(img):\n    quality_factor = random.randint(30, 95)\n    img = cv2.cvtColor(util.single2uint(img), cv2.COLOR_RGB2BGR)\n    result, encimg = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), quality_factor])\n    img = cv2.imdecode(encimg, 1)\n    img = cv2.cvtColor(util.uint2single(img), cv2.COLOR_BGR2RGB)\n    return img\n\n\ndef random_crop(lq, hq, sf=4, lq_patchsize=64):\n    h, w = lq.shape[:2]\n    rnd_h = random.randint(0, h - lq_patchsize)\n    rnd_w = random.randint(0, w - lq_patchsize)\n    lq = lq[rnd_h:rnd_h + lq_patchsize, rnd_w:rnd_w + lq_patchsize, :]\n\n    rnd_h_H, rnd_w_H = int(rnd_h * sf), int(rnd_w * sf)\n    hq = hq[rnd_h_H:rnd_h_H + lq_patchsize * sf, rnd_w_H:rnd_w_H + lq_patchsize * sf, :]\n    return lq, hq\n\n\ndef degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):\n    \"\"\"\n    This is the degradation model of BSRGAN from the paper\n    \"Designing a Practical Degradation Model for Deep Blind Image Super-Resolution\"\n    ----------\n    img: HXWXC, [0, 1], its size should be large than (lq_patchsizexsf)x(lq_patchsizexsf)\n    sf: scale factor\n    isp_model: camera ISP model\n    Returns\n    -------\n    img: low-quality patch, size: lq_patchsizeXlq_patchsizeXC, range: [0, 1]\n    hq: corresponding high-quality patch, size: (lq_patchsizexsf)X(lq_patchsizexsf)XC, range: [0, 1]\n    \"\"\"\n    isp_prob, jpeg_prob, scale2_prob = 0.25, 0.9, 0.25\n    sf_ori = sf\n\n    h1, w1 = img.shape[:2]\n    img = img.copy()[:w1 - w1 % sf, :h1 - h1 % sf, ...]  
# mod crop\n    h, w = img.shape[:2]\n\n    if h < lq_patchsize * sf or w < lq_patchsize * sf:\n        raise ValueError(f'img size ({h1}X{w1}) is too small!')\n\n    hq = img.copy()\n\n    if sf == 4 and random.random() < scale2_prob:  # downsample1\n        if np.random.rand() < 0.5:\n            img = cv2.resize(img, (int(1 / 2 * img.shape[1]), int(1 / 2 * img.shape[0])),\n                             interpolation=random.choice([1, 2, 3]))\n        else:\n            img = util.imresize_np(img, 1 / 2, True)\n        img = np.clip(img, 0.0, 1.0)\n        sf = 2\n\n    shuffle_order = random.sample(range(7), 7)\n    idx1, idx2 = shuffle_order.index(2), shuffle_order.index(3)\n    if idx1 > idx2:  # keep downsample3 last\n        shuffle_order[idx1], shuffle_order[idx2] = shuffle_order[idx2], shuffle_order[idx1]\n\n    for i in shuffle_order:\n\n        if i == 0:\n            img = add_blur(img, sf=sf)\n\n        elif i == 1:\n            img = add_blur(img, sf=sf)\n\n        elif i == 2:\n            a, b = img.shape[1], img.shape[0]\n            # downsample2\n            if random.random() < 0.75:\n                sf1 = random.uniform(1, 2 * sf)\n                img = cv2.resize(img, (int(1 / sf1 * img.shape[1]), int(1 / sf1 * img.shape[0])),\n                                 interpolation=random.choice([1, 2, 3]))\n            else:\n                k = fspecial('gaussian', 25, random.uniform(0.1, 0.6 * sf))\n                k_shifted = shift_pixel(k, sf)\n                k_shifted = k_shifted / k_shifted.sum()  # blur with shifted kernel\n                img = ndimage.filters.convolve(img, np.expand_dims(k_shifted, axis=2), mode='mirror')\n                img = img[0::sf, 0::sf, ...]  # nearest downsampling\n            img = np.clip(img, 0.0, 1.0)\n\n        elif i == 3:\n            # downsample3\n            img = cv2.resize(img, (int(1 / sf * a), int(1 / sf * b)), interpolation=random.choice([1, 2, 3]))\n            img = np.clip(img, 0.0, 1.0)\n\n        elif i == 4:\n            # add Gaussian noise\n            img = add_Gaussian_noise(img, noise_level1=2, noise_level2=25)\n\n        elif i == 5:\n            # add JPEG noise\n            if random.random() < jpeg_prob:\n                img = add_JPEG_noise(img)\n\n        elif i == 6:\n            # add processed camera sensor noise\n            if random.random() < isp_prob and isp_model is not None:\n                with torch.no_grad():\n                    img, hq = isp_model.forward(img.copy(), hq)\n\n    # add final JPEG compression noise\n    img = add_JPEG_noise(img)\n\n    # random crop\n    img, hq = random_crop(img, hq, sf_ori, lq_patchsize)\n\n    return img, hq\n\n\n# todo no isp_model?\ndef degradation_bsrgan_variant(image, sf=4, isp_model=None):\n    \"\"\"\n    This is the degradation model of BSRGAN from the paper\n    \"Designing a Practical Degradation Model for Deep Blind Image Super-Resolution\"\n    ----------\n    sf: scale factor\n    isp_model: camera ISP model\n    Returns\n    -------\n    img: low-quality patch, size: lq_patchsizeXlq_patchsizeXC, range: [0, 1]\n    hq: corresponding high-quality patch, size: (lq_patchsizexsf)X(lq_patchsizexsf)XC, range: [0, 1]\n    \"\"\"\n    image = util.uint2single(image)\n    isp_prob, jpeg_prob, scale2_prob = 0.25, 0.9, 0.25\n    sf_ori = sf\n\n    h1, w1 = image.shape[:2]\n    image = image.copy()[:w1 - w1 % sf, :h1 - h1 % sf, ...]  
# mod crop\n    h, w = image.shape[:2]\n\n    hq = image.copy()\n\n    if sf == 4 and random.random() < scale2_prob:  # downsample1\n        if np.random.rand() < 0.5:\n            image = cv2.resize(image, (int(1 / 2 * image.shape[1]), int(1 / 2 * image.shape[0])),\n                               interpolation=random.choice([1, 2, 3]))\n        else:\n            image = util.imresize_np(image, 1 / 2, True)\n        image = np.clip(image, 0.0, 1.0)\n        sf = 2\n\n    shuffle_order = random.sample(range(7), 7)\n    idx1, idx2 = shuffle_order.index(2), shuffle_order.index(3)\n    if idx1 > idx2:  # keep downsample3 last\n        shuffle_order[idx1], shuffle_order[idx2] = shuffle_order[idx2], shuffle_order[idx1]\n\n    for i in shuffle_order:\n\n        if i == 0:\n            image = add_blur(image, sf=sf)\n\n        elif i == 1:\n            image = add_blur(image, sf=sf)\n\n        elif i == 2:\n            a, b = image.shape[1], image.shape[0]\n            # downsample2\n            if random.random() < 0.75:\n                sf1 = random.uniform(1, 2 * sf)\n                image = cv2.resize(image, (int(1 / sf1 * image.shape[1]), int(1 / sf1 * image.shape[0])),\n                                   interpolation=random.choice([1, 2, 3]))\n            else:\n                k = fspecial('gaussian', 25, random.uniform(0.1, 0.6 * sf))\n                k_shifted = shift_pixel(k, sf)\n                k_shifted = k_shifted / k_shifted.sum()  # blur with shifted kernel\n                image = ndimage.filters.convolve(image, np.expand_dims(k_shifted, axis=2), mode='mirror')\n                image = image[0::sf, 0::sf, ...]  # nearest downsampling\n            image = np.clip(image, 0.0, 1.0)\n\n        elif i == 3:\n            # downsample3\n            image = cv2.resize(image, (int(1 / sf * a), int(1 / sf * b)), interpolation=random.choice([1, 2, 3]))\n            image = np.clip(image, 0.0, 1.0)\n\n        elif i == 4:\n            # add Gaussian noise\n            image = add_Gaussian_noise(image, noise_level1=2, noise_level2=25)\n\n        elif i == 5:\n            # add JPEG noise\n            if random.random() < jpeg_prob:\n                image = add_JPEG_noise(image)\n\n        # elif i == 6:\n        #     # add processed camera sensor noise\n        #     if random.random() < isp_prob and isp_model is not None:\n        #         with torch.no_grad():\n        #             img, hq = isp_model.forward(img.copy(), hq)\n\n    # add final JPEG compression noise\n    image = add_JPEG_noise(image)\n    image = util.single2uint(image)\n    example = {\"image\":image}\n    return example\n\n\n# TODO incase there is a pickle error one needs to replace a += x with a = a + x in add_speckle_noise etc...\ndef degradation_bsrgan_plus(img, sf=4, shuffle_prob=0.5, use_sharp=True, lq_patchsize=64, isp_model=None):\n    \"\"\"\n    This is an extended degradation model by combining\n    the degradation models of BSRGAN and Real-ESRGAN\n    ----------\n    img: HXWXC, [0, 1], its size should be large than (lq_patchsizexsf)x(lq_patchsizexsf)\n    sf: scale factor\n    use_shuffle: the degradation shuffle\n    use_sharp: sharpening the img\n    Returns\n    -------\n    img: low-quality patch, size: lq_patchsizeXlq_patchsizeXC, range: [0, 1]\n    hq: corresponding high-quality patch, size: (lq_patchsizexsf)X(lq_patchsizexsf)XC, range: [0, 1]\n    \"\"\"\n\n    h1, w1 = img.shape[:2]\n    img = img.copy()[:w1 - w1 % sf, :h1 - h1 % sf, ...]  
# mod crop\n    h, w = img.shape[:2]\n\n    if h < lq_patchsize * sf or w < lq_patchsize * sf:\n        raise ValueError(f'img size ({h1}X{w1}) is too small!')\n\n    if use_sharp:\n        img = add_sharpening(img)\n    hq = img.copy()\n\n    if random.random() < shuffle_prob:\n        shuffle_order = random.sample(range(13), 13)\n    else:\n        shuffle_order = list(range(13))\n        # local shuffle for noise, JPEG is always the last one\n        shuffle_order[2:6] = random.sample(shuffle_order[2:6], len(range(2, 6)))\n        shuffle_order[9:13] = random.sample(shuffle_order[9:13], len(range(9, 13)))\n\n    poisson_prob, speckle_prob, isp_prob = 0.1, 0.1, 0.1\n\n    for i in shuffle_order:\n        if i == 0:\n            img = add_blur(img, sf=sf)\n        elif i == 1:\n            img = add_resize(img, sf=sf)\n        elif i == 2:\n            img = add_Gaussian_noise(img, noise_level1=2, noise_level2=25)\n        elif i == 3:\n            if random.random() < poisson_prob:\n                img = add_Poisson_noise(img)\n        elif i == 4:\n            if random.random() < speckle_prob:\n                img = add_speckle_noise(img)\n        elif i == 5:\n            if random.random() < isp_prob and isp_model is not None:\n                with torch.no_grad():\n                    img, hq = isp_model.forward(img.copy(), hq)\n        elif i == 6:\n            img = add_JPEG_noise(img)\n        elif i == 7:\n            img = add_blur(img, sf=sf)\n        elif i == 8:\n            img = add_resize(img, sf=sf)\n        elif i == 9:\n            img = add_Gaussian_noise(img, noise_level1=2, noise_level2=25)\n        elif i == 10:\n            if random.random() < poisson_prob:\n                img = add_Poisson_noise(img)\n        elif i == 11:\n            if random.random() < speckle_prob:\n                img = add_speckle_noise(img)\n        elif i == 12:\n            if random.random() < isp_prob and isp_model is not None:\n                with torch.no_grad():\n                    img, hq = isp_model.forward(img.copy(), hq)\n        else:\n            print('check the shuffle!')\n\n    # resize to desired size\n    img = cv2.resize(img, (int(1 / sf * hq.shape[1]), int(1 / sf * hq.shape[0])),\n                     interpolation=random.choice([1, 2, 3]))\n\n    # add final JPEG compression noise\n    img = add_JPEG_noise(img)\n\n    # random crop\n    img, hq = random_crop(img, hq, sf, lq_patchsize)\n\n    return img, hq\n\n\nif __name__ == '__main__':\n\tprint(\"hey\")\n\timg = util.imread_uint('utils/test.png', 3)\n\tprint(img)\n\timg = img[:448, :448]\n\th = img.shape[0] // 4\n\tprint(\"resizing to\", h)\n\tsf = 4\n\tdeg_fn = partial(degradation_bsrgan_variant, sf=sf)\n\tfor i in range(20):\n\t\tprint(i)\n\t\t# keep the uint8 HR reference; degradation_bsrgan_variant expects uint8 input and returns a dict\n\t\timg_hq = img\n\t\timg_lq = deg_fn(img)[\"image\"]\n\t\timg_hq, img_lq = util.uint2single(img_hq), util.uint2single(img_lq)\n\t\tprint(img_lq)\n\t\timg_lq_bicubic = albumentations.SmallestMaxSize(max_size=h, interpolation=cv2.INTER_CUBIC)(image=img_hq)[\"image\"]\n\t\tprint(img_lq.shape)\n\t\tprint(\"bicubic\", img_lq_bicubic.shape)\n\t\tprint(img_hq.shape)\n\t\tlq_nearest = cv2.resize(util.single2uint(img_lq), (int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),\n\t\t                        interpolation=0)\n\t\tlq_bicubic_nearest = cv2.resize(util.single2uint(img_lq_bicubic), (int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),\n\t\t                        interpolation=0)\n\t\timg_concat = np.concatenate([lq_bicubic_nearest, lq_nearest, util.single2uint(img_hq)], axis=1)\n\t\tutil.imsave(img_concat, str(i) + '.png')\n\n\n"
  },
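For orientation, the pipeline defined in bsrgan.py above is what the package `__init__.py` re-exports as `degradation_fn_bsr`. A minimal sketch of applying it to a dummy uint8 array follows (the random array is illustrative only; in practice the input is a loaded HxWx3 RGB image whose sides are larger than the crop size implied by `sf`).

```python
import numpy as np
from lidm.modules.image_degradation import degradation_fn_bsr  # alias of degradation_bsrgan_variant

hq = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)  # dummy HxWxC uint8 image
lq = degradation_fn_bsr(hq, sf=4)["image"]                  # degraded uint8 output
print(hq.shape, lq.shape)                                   # LR side ends up roughly 1/sf of the HR side
```

Note that the heavier `degradation_bsrgan` / `degradation_bsrgan_plus` functions return an `(img, hq)` pair rather than a dict.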
  {
    "path": "lidm/modules/image_degradation/bsrgan_light.py",
    "content": "# -*- coding: utf-8 -*-\nimport numpy as np\nimport cv2\nimport torch\n\nfrom functools import partial\nimport random\nfrom scipy import ndimage\nimport scipy\nimport scipy.stats as ss\nfrom scipy.interpolate import interp2d\nfrom scipy.linalg import orth\nimport albumentations\n\nfrom . import utils_image as util\n\n\"\"\"\n# --------------------------------------------\n# Super-Resolution\n# --------------------------------------------\n#\n# Kai Zhang (cskaizhang@gmail.com)\n# https://github.com/cszn\n# From 2019/03--2021/08\n# --------------------------------------------\n\"\"\"\n\n\ndef modcrop_np(img, sf):\n    '''\n    Args:\n        img: numpy image, WxH or WxHxC\n        sf: scale factor\n    Return:\n        cropped image\n    '''\n    w, h = img.shape[:2]\n    im = np.copy(img)\n    return im[:w - w % sf, :h - h % sf, ...]\n\n\n\"\"\"\n# --------------------------------------------\n# anisotropic Gaussian kernels\n# --------------------------------------------\n\"\"\"\n\n\ndef analytic_kernel(k):\n    \"\"\"Calculate the X4 kernel from the X2 kernel (for proof see appendix in paper)\"\"\"\n    k_size = k.shape[0]\n    # Calculate the big kernels size\n    big_k = np.zeros((3 * k_size - 2, 3 * k_size - 2))\n    # Loop over the small kernel to fill the big one\n    for r in range(k_size):\n        for c in range(k_size):\n            big_k[2 * r:2 * r + k_size, 2 * c:2 * c + k_size] += k[r, c] * k\n    # Crop the edges of the big kernel to ignore very small values and increase run time of SR\n    crop = k_size // 2\n    cropped_big_k = big_k[crop:-crop, crop:-crop]\n    # Normalize to 1\n    return cropped_big_k / cropped_big_k.sum()\n\n\ndef anisotropic_Gaussian(ksize=15, theta=np.pi, l1=6, l2=6):\n    \"\"\" generate an anisotropic Gaussian kernel\n    Args:\n        ksize : e.g., 15, kernel size\n        theta : [0,  pi], rotation angle range\n        l1    : [0.1,50], scaling of eigenvalues\n        l2    : [0.1,l1], scaling of eigenvalues\n        If l1 = l2, will get an isotropic Gaussian kernel.\n    Returns:\n        k     : kernel\n    \"\"\"\n\n    v = np.dot(np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]), np.array([1., 0.]))\n    V = np.array([[v[0], v[1]], [v[1], -v[0]]])\n    D = np.array([[l1, 0], [0, l2]])\n    Sigma = np.dot(np.dot(V, D), np.linalg.inv(V))\n    k = gm_blur_kernel(mean=[0, 0], cov=Sigma, size=ksize)\n\n    return k\n\n\ndef gm_blur_kernel(mean, cov, size=15):\n    center = size / 2.0 + 0.5\n    k = np.zeros([size, size])\n    for y in range(size):\n        for x in range(size):\n            cy = y - center + 1\n            cx = x - center + 1\n            k[y, x] = ss.multivariate_normal.pdf([cx, cy], mean=mean, cov=cov)\n\n    k = k / np.sum(k)\n    return k\n\n\ndef shift_pixel(x, sf, upper_left=True):\n    \"\"\"shift pixel for super-resolution with different scale factors\n    Args:\n        x: WxHxC or WxH\n        sf: scale factor\n        upper_left: shift direction\n    \"\"\"\n    h, w = x.shape[:2]\n    shift = (sf - 1) * 0.5\n    xv, yv = np.arange(0, w, 1.0), np.arange(0, h, 1.0)\n    if upper_left:\n        x1 = xv + shift\n        y1 = yv + shift\n    else:\n        x1 = xv - shift\n        y1 = yv - shift\n\n    x1 = np.clip(x1, 0, w - 1)\n    y1 = np.clip(y1, 0, h - 1)\n\n    if x.ndim == 2:\n        x = interp2d(xv, yv, x)(x1, y1)\n    if x.ndim == 3:\n        for i in range(x.shape[-1]):\n            x[:, :, i] = interp2d(xv, yv, x[:, :, i])(x1, y1)\n\n    return x\n\n\ndef blur(x, k):\n  
  '''\n    x: image, NxcxHxW\n    k: kernel, Nx1xhxw\n    '''\n    n, c = x.shape[:2]\n    p1, p2 = (k.shape[-2] - 1) // 2, (k.shape[-1] - 1) // 2\n    x = torch.nn.functional.pad(x, pad=(p1, p2, p1, p2), mode='replicate')\n    k = k.repeat(1, c, 1, 1)\n    k = k.view(-1, 1, k.shape[2], k.shape[3])\n    x = x.view(1, -1, x.shape[2], x.shape[3])\n    x = torch.nn.functional.conv2d(x, k, bias=None, stride=1, padding=0, groups=n * c)\n    x = x.view(n, c, x.shape[2], x.shape[3])\n\n    return x\n\n\ndef gen_kernel(k_size=np.array([15, 15]), scale_factor=np.array([4, 4]), min_var=0.6, max_var=10., noise_level=0):\n    \"\"\"\"\n    # modified version of https://github.com/assafshocher/BlindSR_dataset_generator\n    # Kai Zhang\n    # min_var = 0.175 * sf  # variance of the gaussian kernel will be sampled between min_var and max_var\n    # max_var = 2.5 * sf\n    \"\"\"\n    # Set random eigen-vals (lambdas) and angle (theta) for COV matrix\n    lambda_1 = min_var + np.random.rand() * (max_var - min_var)\n    lambda_2 = min_var + np.random.rand() * (max_var - min_var)\n    theta = np.random.rand() * np.pi  # random theta\n    noise = -noise_level + np.random.rand(*k_size) * noise_level * 2\n\n    # Set COV matrix using Lambdas and Theta\n    LAMBDA = np.diag([lambda_1, lambda_2])\n    Q = np.array([[np.cos(theta), -np.sin(theta)],\n                  [np.sin(theta), np.cos(theta)]])\n    SIGMA = Q @ LAMBDA @ Q.T\n    INV_SIGMA = np.linalg.inv(SIGMA)[None, None, :, :]\n\n    # Set expectation position (shifting kernel for aligned image)\n    MU = k_size // 2 - 0.5 * (scale_factor - 1)  # - 0.5 * (scale_factor - k_size % 2)\n    MU = MU[None, None, :, None]\n\n    # Create meshgrid for Gaussian\n    [X, Y] = np.meshgrid(range(k_size[0]), range(k_size[1]))\n    Z = np.stack([X, Y], 2)[:, :, :, None]\n\n    # Calcualte Gaussian for every pixel of the kernel\n    ZZ = Z - MU\n    ZZ_t = ZZ.transpose(0, 1, 3, 2)\n    raw_kernel = np.exp(-0.5 * np.squeeze(ZZ_t @ INV_SIGMA @ ZZ)) * (1 + noise)\n\n    # shift the kernel so it will be centered\n    # raw_kernel_centered = kernel_shift(raw_kernel, scale_factor)\n\n    # Normalize the kernel and return\n    # kernel = raw_kernel_centered / np.sum(raw_kernel_centered)\n    kernel = raw_kernel / np.sum(raw_kernel)\n    return kernel\n\n\ndef fspecial_gaussian(hsize, sigma):\n    hsize = [hsize, hsize]\n    siz = [(hsize[0] - 1.0) / 2.0, (hsize[1] - 1.0) / 2.0]\n    std = sigma\n    [x, y] = np.meshgrid(np.arange(-siz[1], siz[1] + 1), np.arange(-siz[0], siz[0] + 1))\n    arg = -(x * x + y * y) / (2 * std * std)\n    h = np.exp(arg)\n    h[h < scipy.finfo(float).eps * h.max()] = 0\n    sumh = h.sum()\n    if sumh != 0:\n        h = h / sumh\n    return h\n\n\ndef fspecial_laplacian(alpha):\n    alpha = max([0, min([alpha, 1])])\n    h1 = alpha / (alpha + 1)\n    h2 = (1 - alpha) / (alpha + 1)\n    h = [[h1, h2, h1], [h2, -4 / (alpha + 1), h2], [h1, h2, h1]]\n    h = np.array(h)\n    return h\n\n\ndef fspecial(filter_type, *args, **kwargs):\n    '''\n    python code from:\n    https://github.com/ronaldosena/imagens-medicas-2/blob/40171a6c259edec7827a6693a93955de2bd39e76/Aulas/aula_2_-_uniform_filter/matlab_fspecial.py\n    '''\n    if filter_type == 'gaussian':\n        return fspecial_gaussian(*args, **kwargs)\n    if filter_type == 'laplacian':\n        return fspecial_laplacian(*args, **kwargs)\n\n\n\"\"\"\n# --------------------------------------------\n# degradation models\n# --------------------------------------------\n\"\"\"\n\n\ndef 
bicubic_degradation(x, sf=3):\n    '''\n    Args:\n        x: HxWxC image, [0, 1]\n        sf: down-scale factor\n    Return:\n        bicubicly downsampled LR image\n    '''\n    x = util.imresize_np(x, scale=1 / sf)\n    return x\n\n\ndef srmd_degradation(x, k, sf=3):\n    ''' blur + bicubic downsampling\n    Args:\n        x: HxWxC image, [0, 1]\n        k: hxw, double\n        sf: down-scale factor\n    Return:\n        downsampled LR image\n    Reference:\n        @inproceedings{zhang2018learning,\n          title={Learning a single convolutional super-resolution network for multiple degradations},\n          author={Zhang, Kai and Zuo, Wangmeng and Zhang, Lei},\n          booktitle={IEEE Conference on Computer Vision and Pattern Recognition},\n          pages={3262--3271},\n          year={2018}\n        }\n    '''\n    x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap')  # 'nearest' | 'mirror'\n    x = bicubic_degradation(x, sf=sf)\n    return x\n\n\ndef dpsr_degradation(x, k, sf=3):\n    ''' bicubic downsampling + blur\n    Args:\n        x: HxWxC image, [0, 1]\n        k: hxw, double\n        sf: down-scale factor\n    Return:\n        downsampled LR image\n    Reference:\n        @inproceedings{zhang2019deep,\n          title={Deep Plug-and-Play Super-Resolution for Arbitrary Blur Kernels},\n          author={Zhang, Kai and Zuo, Wangmeng and Zhang, Lei},\n          booktitle={IEEE Conference on Computer Vision and Pattern Recognition},\n          pages={1671--1681},\n          year={2019}\n        }\n    '''\n    x = bicubic_degradation(x, sf=sf)\n    x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap')\n    return x\n\n\ndef classical_degradation(x, k, sf=3):\n    ''' blur + downsampling\n    Args:\n        x: HxWxC image, [0, 1]/[0, 255]\n        k: hxw, double\n        sf: down-scale factor\n    Return:\n        downsampled LR image\n    '''\n    x = ndimage.filters.convolve(x, np.expand_dims(k, axis=2), mode='wrap')\n    # x = filters.correlate(x, np.expand_dims(np.flip(k), axis=2))\n    st = 0\n    return x[st::sf, st::sf, ...]\n\n\ndef add_sharpening(img, weight=0.5, radius=50, threshold=10):\n    \"\"\"USM sharpening. borrowed from real-ESRGAN\n    Input image: I; Blurry image: B.\n    1. K = I + weight * (I - B)\n    2. Mask = 1 if abs(I - B) > threshold, else: 0\n    3. Blur mask:\n    4. Out = Mask * K + (1 - Mask) * I\n    Args:\n        img (Numpy array): Input image, HWC, BGR; float32, [0, 1].\n        weight (float): Sharp weight. Default: 1.\n        radius (float): Kernel size of Gaussian blur. 
Default: 50.\n        threshold (int):\n    \"\"\"\n    if radius % 2 == 0:\n        radius += 1\n    blur = cv2.GaussianBlur(img, (radius, radius), 0)\n    residual = img - blur\n    mask = np.abs(residual) * 255 > threshold\n    mask = mask.astype('float32')\n    soft_mask = cv2.GaussianBlur(mask, (radius, radius), 0)\n\n    K = img + weight * residual\n    K = np.clip(K, 0, 1)\n    return soft_mask * K + (1 - soft_mask) * img\n\n\ndef add_blur(img, sf=4):\n    wd2 = 4.0 + sf\n    wd = 2.0 + 0.2 * sf\n\n    wd2 = wd2/4\n    wd = wd/4\n\n    if random.random() < 0.5:\n        l1 = wd2 * random.random()\n        l2 = wd2 * random.random()\n        k = anisotropic_Gaussian(ksize=random.randint(2, 11) + 3, theta=random.random() * np.pi, l1=l1, l2=l2)\n    else:\n        k = fspecial('gaussian', random.randint(2, 4) + 3, wd * random.random())\n    img = ndimage.filters.convolve(img, np.expand_dims(k, axis=2), mode='mirror')\n\n    return img\n\n\ndef add_resize(img, sf=4):\n    rnum = np.random.rand()\n    if rnum > 0.8:  # up\n        sf1 = random.uniform(1, 2)\n    elif rnum < 0.7:  # down\n        sf1 = random.uniform(0.5 / sf, 1)\n    else:\n        sf1 = 1.0\n    img = cv2.resize(img, (int(sf1 * img.shape[1]), int(sf1 * img.shape[0])), interpolation=random.choice([1, 2, 3]))\n    img = np.clip(img, 0.0, 1.0)\n\n    return img\n\n\n# def add_Gaussian_noise(img, noise_level1=2, noise_level2=25):\n#     noise_level = random.randint(noise_level1, noise_level2)\n#     rnum = np.random.rand()\n#     if rnum > 0.6:  # add color Gaussian noise\n#         img += np.random.normal(0, noise_level / 255.0, img.shape).astype(np.float32)\n#     elif rnum < 0.4:  # add grayscale Gaussian noise\n#         img += np.random.normal(0, noise_level / 255.0, (*img.shape[:2], 1)).astype(np.float32)\n#     else:  # add  noise\n#         L = noise_level2 / 255.\n#         D = np.diag(np.random.rand(3))\n#         U = orth(np.random.rand(3, 3))\n#         conv = np.dot(np.dot(np.transpose(U), D), U)\n#         img += np.random.multivariate_normal([0, 0, 0], np.abs(L ** 2 * conv), img.shape[:2]).astype(np.float32)\n#     img = np.clip(img, 0.0, 1.0)\n#     return img\n\ndef add_Gaussian_noise(img, noise_level1=2, noise_level2=25):\n    noise_level = random.randint(noise_level1, noise_level2)\n    rnum = np.random.rand()\n    if rnum > 0.6:  # add color Gaussian noise\n        img = img + np.random.normal(0, noise_level / 255.0, img.shape).astype(np.float32)\n    elif rnum < 0.4:  # add grayscale Gaussian noise\n        img = img + np.random.normal(0, noise_level / 255.0, (*img.shape[:2], 1)).astype(np.float32)\n    else:  # add  noise\n        L = noise_level2 / 255.\n        D = np.diag(np.random.rand(3))\n        U = orth(np.random.rand(3, 3))\n        conv = np.dot(np.dot(np.transpose(U), D), U)\n        img = img + np.random.multivariate_normal([0, 0, 0], np.abs(L ** 2 * conv), img.shape[:2]).astype(np.float32)\n    img = np.clip(img, 0.0, 1.0)\n    return img\n\n\ndef add_speckle_noise(img, noise_level1=2, noise_level2=25):\n    noise_level = random.randint(noise_level1, noise_level2)\n    img = np.clip(img, 0.0, 1.0)\n    rnum = random.random()\n    if rnum > 0.6:\n        img += img * np.random.normal(0, noise_level / 255.0, img.shape).astype(np.float32)\n    elif rnum < 0.4:\n        img += img * np.random.normal(0, noise_level / 255.0, (*img.shape[:2], 1)).astype(np.float32)\n    else:\n        L = noise_level2 / 255.\n        D = np.diag(np.random.rand(3))\n        U = orth(np.random.rand(3, 3))\n        
conv = np.dot(np.dot(np.transpose(U), D), U)\n        img += img * np.random.multivariate_normal([0, 0, 0], np.abs(L ** 2 * conv), img.shape[:2]).astype(np.float32)\n    img = np.clip(img, 0.0, 1.0)\n    return img\n\n\ndef add_Poisson_noise(img):\n    img = np.clip((img * 255.0).round(), 0, 255) / 255.\n    vals = 10 ** (2 * random.random() + 2.0)  # [2, 4]\n    if random.random() < 0.5:\n        img = np.random.poisson(img * vals).astype(np.float32) / vals\n    else:\n        img_gray = np.dot(img[..., :3], [0.299, 0.587, 0.114])\n        img_gray = np.clip((img_gray * 255.0).round(), 0, 255) / 255.\n        noise_gray = np.random.poisson(img_gray * vals).astype(np.float32) / vals - img_gray\n        img += noise_gray[:, :, np.newaxis]\n    img = np.clip(img, 0.0, 1.0)\n    return img\n\n\ndef add_JPEG_noise(img):\n    quality_factor = random.randint(80, 95)\n    img = cv2.cvtColor(util.single2uint(img), cv2.COLOR_RGB2BGR)\n    result, encimg = cv2.imencode('.jpg', img, [int(cv2.IMWRITE_JPEG_QUALITY), quality_factor])\n    img = cv2.imdecode(encimg, 1)\n    img = cv2.cvtColor(util.uint2single(img), cv2.COLOR_BGR2RGB)\n    return img\n\n\ndef random_crop(lq, hq, sf=4, lq_patchsize=64):\n    h, w = lq.shape[:2]\n    rnd_h = random.randint(0, h - lq_patchsize)\n    rnd_w = random.randint(0, w - lq_patchsize)\n    lq = lq[rnd_h:rnd_h + lq_patchsize, rnd_w:rnd_w + lq_patchsize, :]\n\n    rnd_h_H, rnd_w_H = int(rnd_h * sf), int(rnd_w * sf)\n    hq = hq[rnd_h_H:rnd_h_H + lq_patchsize * sf, rnd_w_H:rnd_w_H + lq_patchsize * sf, :]\n    return lq, hq\n\n\ndef degradation_bsrgan(img, sf=4, lq_patchsize=72, isp_model=None):\n    \"\"\"\n    This is the degradation model of BSRGAN from the paper\n    \"Designing a Practical Degradation Model for Deep Blind Image Super-Resolution\"\n    ----------\n    img: HXWXC, [0, 1], its size should be large than (lq_patchsizexsf)x(lq_patchsizexsf)\n    sf: scale factor\n    isp_model: camera ISP model\n    Returns\n    -------\n    img: low-quality patch, size: lq_patchsizeXlq_patchsizeXC, range: [0, 1]\n    hq: corresponding high-quality patch, size: (lq_patchsizexsf)X(lq_patchsizexsf)XC, range: [0, 1]\n    \"\"\"\n    isp_prob, jpeg_prob, scale2_prob = 0.25, 0.9, 0.25\n    sf_ori = sf\n\n    h1, w1 = img.shape[:2]\n    img = img.copy()[:w1 - w1 % sf, :h1 - h1 % sf, ...]  
# mod crop\n    h, w = img.shape[:2]\n\n    if h < lq_patchsize * sf or w < lq_patchsize * sf:\n        raise ValueError(f'img size ({h1}X{w1}) is too small!')\n\n    hq = img.copy()\n\n    if sf == 4 and random.random() < scale2_prob:  # downsample1\n        if np.random.rand() < 0.5:\n            img = cv2.resize(img, (int(1 / 2 * img.shape[1]), int(1 / 2 * img.shape[0])),\n                             interpolation=random.choice([1, 2, 3]))\n        else:\n            img = util.imresize_np(img, 1 / 2, True)\n        img = np.clip(img, 0.0, 1.0)\n        sf = 2\n\n    shuffle_order = random.sample(range(7), 7)\n    idx1, idx2 = shuffle_order.index(2), shuffle_order.index(3)\n    if idx1 > idx2:  # keep downsample3 last\n        shuffle_order[idx1], shuffle_order[idx2] = shuffle_order[idx2], shuffle_order[idx1]\n\n    for i in shuffle_order:\n\n        if i == 0:\n            img = add_blur(img, sf=sf)\n\n        elif i == 1:\n            img = add_blur(img, sf=sf)\n\n        elif i == 2:\n            a, b = img.shape[1], img.shape[0]\n            # downsample2\n            if random.random() < 0.75:\n                sf1 = random.uniform(1, 2 * sf)\n                img = cv2.resize(img, (int(1 / sf1 * img.shape[1]), int(1 / sf1 * img.shape[0])),\n                                 interpolation=random.choice([1, 2, 3]))\n            else:\n                k = fspecial('gaussian', 25, random.uniform(0.1, 0.6 * sf))\n                k_shifted = shift_pixel(k, sf)\n                k_shifted = k_shifted / k_shifted.sum()  # blur with shifted kernel\n                img = ndimage.filters.convolve(img, np.expand_dims(k_shifted, axis=2), mode='mirror')\n                img = img[0::sf, 0::sf, ...]  # nearest downsampling\n            img = np.clip(img, 0.0, 1.0)\n\n        elif i == 3:\n            # downsample3\n            img = cv2.resize(img, (int(1 / sf * a), int(1 / sf * b)), interpolation=random.choice([1, 2, 3]))\n            img = np.clip(img, 0.0, 1.0)\n\n        elif i == 4:\n            # add Gaussian noise\n            img = add_Gaussian_noise(img, noise_level1=2, noise_level2=8)\n\n        elif i == 5:\n            # add JPEG noise\n            if random.random() < jpeg_prob:\n                img = add_JPEG_noise(img)\n\n        elif i == 6:\n            # add processed camera sensor noise\n            if random.random() < isp_prob and isp_model is not None:\n                with torch.no_grad():\n                    img, hq = isp_model.forward(img.copy(), hq)\n\n    # add final JPEG compression noise\n    img = add_JPEG_noise(img)\n\n    # random crop\n    img, hq = random_crop(img, hq, sf_ori, lq_patchsize)\n\n    return img, hq\n\n\n# todo no isp_model?\ndef degradation_bsrgan_variant(image, sf=4, isp_model=None):\n    \"\"\"\n    This is the degradation model of BSRGAN from the paper\n    \"Designing a Practical Degradation Model for Deep Blind Image Super-Resolution\"\n    ----------\n    sf: scale factor\n    isp_model: camera ISP model\n    Returns\n    -------\n    img: low-quality patch, size: lq_patchsizeXlq_patchsizeXC, range: [0, 1]\n    hq: corresponding high-quality patch, size: (lq_patchsizexsf)X(lq_patchsizexsf)XC, range: [0, 1]\n    \"\"\"\n    image = util.uint2single(image)\n    isp_prob, jpeg_prob, scale2_prob = 0.25, 0.9, 0.25\n    sf_ori = sf\n\n    h1, w1 = image.shape[:2]\n    image = image.copy()[:w1 - w1 % sf, :h1 - h1 % sf, ...]  
# mod crop\n    h, w = image.shape[:2]\n\n    hq = image.copy()\n\n    if sf == 4 and random.random() < scale2_prob:  # downsample1\n        if np.random.rand() < 0.5:\n            image = cv2.resize(image, (int(1 / 2 * image.shape[1]), int(1 / 2 * image.shape[0])),\n                               interpolation=random.choice([1, 2, 3]))\n        else:\n            image = util.imresize_np(image, 1 / 2, True)\n        image = np.clip(image, 0.0, 1.0)\n        sf = 2\n\n    shuffle_order = random.sample(range(7), 7)\n    idx1, idx2 = shuffle_order.index(2), shuffle_order.index(3)\n    if idx1 > idx2:  # keep downsample3 last\n        shuffle_order[idx1], shuffle_order[idx2] = shuffle_order[idx2], shuffle_order[idx1]\n\n    for i in shuffle_order:\n\n        if i == 0:\n            image = add_blur(image, sf=sf)\n\n        # elif i == 1:\n        #     image = add_blur(image, sf=sf)\n\n        if i == 0:\n            pass\n\n        elif i == 2:\n            a, b = image.shape[1], image.shape[0]\n            # downsample2\n            if random.random() < 0.8:\n                sf1 = random.uniform(1, 2 * sf)\n                image = cv2.resize(image, (int(1 / sf1 * image.shape[1]), int(1 / sf1 * image.shape[0])),\n                                   interpolation=random.choice([1, 2, 3]))\n            else:\n                k = fspecial('gaussian', 25, random.uniform(0.1, 0.6 * sf))\n                k_shifted = shift_pixel(k, sf)\n                k_shifted = k_shifted / k_shifted.sum()  # blur with shifted kernel\n                image = ndimage.filters.convolve(image, np.expand_dims(k_shifted, axis=2), mode='mirror')\n                image = image[0::sf, 0::sf, ...]  # nearest downsampling\n\n            image = np.clip(image, 0.0, 1.0)\n\n        elif i == 3:\n            # downsample3\n            image = cv2.resize(image, (int(1 / sf * a), int(1 / sf * b)), interpolation=random.choice([1, 2, 3]))\n            image = np.clip(image, 0.0, 1.0)\n\n        elif i == 4:\n            # add Gaussian noise\n            image = add_Gaussian_noise(image, noise_level1=1, noise_level2=2)\n\n        elif i == 5:\n            # add JPEG noise\n            if random.random() < jpeg_prob:\n                image = add_JPEG_noise(image)\n        #\n        # elif i == 6:\n        #     # add processed camera sensor noise\n        #     if random.random() < isp_prob and isp_model is not None:\n        #         with torch.no_grad():\n        #             img, hq = isp_model.forward(img.copy(), hq)\n\n    # add final JPEG compression noise\n    image = add_JPEG_noise(image)\n    image = util.single2uint(image)\n    example = {\"image\": image}\n    return example\n\n\n\n\nif __name__ == '__main__':\n    print(\"hey\")\n    img = util.imread_uint('utils/test.png', 3)\n    img = img[:448, :448]\n    h = img.shape[0] // 4\n    print(\"resizing to\", h)\n    sf = 4\n    deg_fn = partial(degradation_bsrgan_variant, sf=sf)\n    for i in range(20):\n        print(i)\n        img_hq = img\n        img_lq = deg_fn(img)[\"image\"]\n        img_hq, img_lq = util.uint2single(img_hq), util.uint2single(img_lq)\n        print(img_lq)\n        img_lq_bicubic = albumentations.SmallestMaxSize(max_size=h, interpolation=cv2.INTER_CUBIC)(image=img_hq)[\"image\"]\n        print(img_lq.shape)\n        print(\"bicubic\", img_lq_bicubic.shape)\n        print(img_hq.shape)\n        lq_nearest = cv2.resize(util.single2uint(img_lq), (int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),\n                                
interpolation=0)\n        lq_bicubic_nearest = cv2.resize(util.single2uint(img_lq_bicubic),\n                                        (int(sf * img_lq.shape[1]), int(sf * img_lq.shape[0])),\n                                        interpolation=0)\n        img_concat = np.concatenate([lq_bicubic_nearest, lq_nearest, util.single2uint(img_hq)], axis=1)\n        util.imsave(img_concat, str(i) + '.png')\n"
  },
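bsrgan_light.py mirrors bsrgan.py with the same interface but milder corruption: the blur kernel widths are scaled down by a factor of 4, the Gaussian noise levels in the variant are lower (1-2 instead of 2-25), and JPEG quality is drawn from 80-95 instead of 30-95. Because both are re-exported by the package `__init__.py`, switching between them is a one-line change (sketch below; the dummy array is illustrative only).

```python
import numpy as np
from lidm.modules.image_degradation import degradation_fn_bsr, degradation_fn_bsr_light

hq = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)
lq_heavy = degradation_fn_bsr(hq, sf=4)["image"]        # stronger blur/noise, JPEG quality 30-95
lq_light = degradation_fn_bsr_light(hq, sf=4)["image"]  # lighter blur/noise, JPEG quality 80-95
```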
  {
    "path": "lidm/modules/image_degradation/utils_image.py",
    "content": "import os\nimport math\nimport random\nimport numpy as np\nimport torch\nimport cv2\nfrom torchvision.utils import make_grid\nfrom datetime import datetime\n#import matplotlib.pyplot as plt   # TODO: check with Dominik, also bsrgan.py vs bsrgan_light.py\n\n\nos.environ[\"KMP_DUPLICATE_LIB_OK\"]=\"TRUE\"\n\n\n'''\n# --------------------------------------------\n# Kai Zhang (github: https://github.com/cszn)\n# 03/Mar/2019\n# --------------------------------------------\n# https://github.com/twhui/SRGAN-pyTorch\n# https://github.com/xinntao/BasicSR\n# --------------------------------------------\n'''\n\n\nIMG_EXTENSIONS = ['.jpg', '.JPG', '.jpeg', '.JPEG', '.png', '.PNG', '.ppm', '.PPM', '.bmp', '.BMP', '.tif']\n\n\ndef is_image_file(filename):\n    return any(filename.endswith(extension) for extension in IMG_EXTENSIONS)\n\n\ndef get_timestamp():\n    return datetime.now().strftime('%y%m%d-%H%M%S')\n\n\ndef imshow(x, title=None, cbar=False, figsize=None):\n    plt.figure(figsize=figsize)\n    plt.imshow(np.squeeze(x), interpolation='nearest', cmap='gray')\n    if title:\n        plt.title(title)\n    if cbar:\n        plt.colorbar()\n    plt.show()\n\n\ndef surf(Z, cmap='rainbow', figsize=None):\n    plt.figure(figsize=figsize)\n    ax3 = plt.axes(projection='3d')\n\n    w, h = Z.shape[:2]\n    xx = np.arange(0,w,1)\n    yy = np.arange(0,h,1)\n    X, Y = np.meshgrid(xx, yy)\n    ax3.plot_surface(X,Y,Z,cmap=cmap)\n    #ax3.contour(X,Y,Z, zdim='z',offset=-2，cmap=cmap)\n    plt.show()\n\n\n'''\n# --------------------------------------------\n# get image pathes\n# --------------------------------------------\n'''\n\n\ndef get_image_paths(dataroot):\n    paths = None  # return None if dataroot is None\n    if dataroot is not None:\n        paths = sorted(_get_paths_from_images(dataroot))\n    return paths\n\n\ndef _get_paths_from_images(path):\n    assert os.path.isdir(path), '{:s} is not a valid directory'.format(path)\n    images = []\n    for dirpath, _, fnames in sorted(os.walk(path)):\n        for fname in sorted(fnames):\n            if is_image_file(fname):\n                img_path = os.path.join(dirpath, fname)\n                images.append(img_path)\n    assert images, '{:s} has no valid image file'.format(path)\n    return images\n\n\n'''\n# --------------------------------------------\n# split large images into small images \n# --------------------------------------------\n'''\n\n\ndef patches_from_image(img, p_size=512, p_overlap=64, p_max=800):\n    w, h = img.shape[:2]\n    patches = []\n    if w > p_max and h > p_max:\n        w1 = list(np.arange(0, w-p_size, p_size-p_overlap, dtype=np.int))\n        h1 = list(np.arange(0, h-p_size, p_size-p_overlap, dtype=np.int))\n        w1.append(w-p_size)\n        h1.append(h-p_size)\n#        print(w1)\n#        print(h1)\n        for i in w1:\n            for j in h1:\n                patches.append(img[i:i+p_size, j:j+p_size,:])\n    else:\n        patches.append(img)\n\n    return patches\n\n\ndef imssave(imgs, img_path):\n    \"\"\"\n    imgs: list, N images of size WxHxC\n    \"\"\"\n    img_name, ext = os.path.splitext(os.path.basename(img_path))\n\n    for i, img in enumerate(imgs):\n        if img.ndim == 3:\n            img = img[:, :, [2, 1, 0]]\n        new_path = os.path.join(os.path.dirname(img_path), img_name+str('_s{:04d}'.format(i))+'.png')\n        cv2.imwrite(new_path, img)\n\n\ndef split_imageset(original_dataroot, taget_dataroot, n_channels=3, p_size=800, p_overlap=96, p_max=1000):\n    \"\"\"\n    split 
the large images from original_dataroot into small overlapped images with size (p_size)x(p_size),\n    and save them into taget_dataroot; only the images with larger size than (p_max)x(p_max)\n    will be splitted.\n    Args:\n        original_dataroot:\n        taget_dataroot:\n        p_size: size of small images\n        p_overlap: patch size in training is a good choice\n        p_max: images with smaller size than (p_max)x(p_max) keep unchanged.\n    \"\"\"\n    paths = get_image_paths(original_dataroot)\n    for img_path in paths:\n        # img_name, ext = os.path.splitext(os.path.basename(img_path))\n        img = imread_uint(img_path, n_channels=n_channels)\n        patches = patches_from_image(img, p_size, p_overlap, p_max)\n        imssave(patches, os.path.join(taget_dataroot,os.path.basename(img_path)))\n        #if original_dataroot == taget_dataroot:\n        #del img_path\n\n'''\n# --------------------------------------------\n# makedir\n# --------------------------------------------\n'''\n\n\ndef mkdir(path):\n    if not os.path.exists(path):\n        os.makedirs(path)\n\n\ndef mkdirs(paths):\n    if isinstance(paths, str):\n        mkdir(paths)\n    else:\n        for path in paths:\n            mkdir(path)\n\n\ndef mkdir_and_rename(path):\n    if os.path.exists(path):\n        new_name = path + '_archived_' + get_timestamp()\n        print('Path already exists. Rename it to [{:s}]'.format(new_name))\n        os.rename(path, new_name)\n    os.makedirs(path)\n\n\n'''\n# --------------------------------------------\n# read image from path\n# opencv is fast, but read BGR numpy image\n# --------------------------------------------\n'''\n\n\n# --------------------------------------------\n# get uint8 image of size HxWxn_channles (RGB)\n# --------------------------------------------\ndef imread_uint(path, n_channels=3):\n    #  input: path\n    # output: HxWx3(RGB or GGG), or HxWx1 (G)\n    if n_channels == 1:\n        img = cv2.imread(path, 0)  # cv2.IMREAD_GRAYSCALE\n        img = np.expand_dims(img, axis=2)  # HxWx1\n    elif n_channels == 3:\n        img = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # BGR or G\n        if img.ndim == 2:\n            img = cv2.cvtColor(img, cv2.COLOR_GRAY2RGB)  # GGG\n        else:\n            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # RGB\n    return img\n\n\n# --------------------------------------------\n# matlab's imwrite\n# --------------------------------------------\ndef imsave(img, img_path):\n    img = np.squeeze(img)\n    if img.ndim == 3:\n        img = img[:, :, [2, 1, 0]]\n    cv2.imwrite(img_path, img)\n\ndef imwrite(img, img_path):\n    img = np.squeeze(img)\n    if img.ndim == 3:\n        img = img[:, :, [2, 1, 0]]\n    cv2.imwrite(img_path, img)\n\n\n\n# --------------------------------------------\n# get single image of size HxWxn_channles (BGR)\n# --------------------------------------------\ndef read_img(path):\n    # read image by cv2\n    # return: Numpy float32, HWC, BGR, [0,1]\n    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)  # cv2.IMREAD_GRAYSCALE\n    img = img.astype(np.float32) / 255.\n    if img.ndim == 2:\n        img = np.expand_dims(img, axis=2)\n    # some images have 4 channels\n    if img.shape[2] > 3:\n        img = img[:, :, :3]\n    return img\n\n\n'''\n# --------------------------------------------\n# image format conversion\n# --------------------------------------------\n# numpy(single) <--->  numpy(unit)\n# numpy(single) <--->  tensor\n# numpy(unit)   <--->  tensor\n# 
--------------------------------------------\n'''\n\n\n# --------------------------------------------\n# numpy(single) [0, 1] <--->  numpy(unit)\n# --------------------------------------------\n\n\ndef uint2single(img):\n\n    return np.float32(img/255.)\n\n\ndef single2uint(img):\n\n    return np.uint8((img.clip(0, 1)*255.).round())\n\n\ndef uint162single(img):\n\n    return np.float32(img/65535.)\n\n\ndef single2uint16(img):\n\n    return np.uint16((img.clip(0, 1)*65535.).round())\n\n\n# --------------------------------------------\n# numpy(unit) (HxWxC or HxW) <--->  tensor\n# --------------------------------------------\n\n\n# convert uint to 4-dimensional torch tensor\ndef uint2tensor4(img):\n    if img.ndim == 2:\n        img = np.expand_dims(img, axis=2)\n    return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1).float().div(255.).unsqueeze(0)\n\n\n# convert uint to 3-dimensional torch tensor\ndef uint2tensor3(img):\n    if img.ndim == 2:\n        img = np.expand_dims(img, axis=2)\n    return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1).float().div(255.)\n\n\n# convert 2/3/4-dimensional torch tensor to uint\ndef tensor2uint(img):\n    img = img.data.squeeze().float().clamp_(0, 1).cpu().numpy()\n    if img.ndim == 3:\n        img = np.transpose(img, (1, 2, 0))\n    return np.uint8((img*255.0).round())\n\n\n# --------------------------------------------\n# numpy(single) (HxWxC) <--->  tensor\n# --------------------------------------------\n\n\n# convert single (HxWxC) to 3-dimensional torch tensor\ndef single2tensor3(img):\n    return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1).float()\n\n\n# convert single (HxWxC) to 4-dimensional torch tensor\ndef single2tensor4(img):\n    return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1).float().unsqueeze(0)\n\n\n# convert torch tensor to single\ndef tensor2single(img):\n    img = img.data.squeeze().float().cpu().numpy()\n    if img.ndim == 3:\n        img = np.transpose(img, (1, 2, 0))\n\n    return img\n\n# convert torch tensor to single\ndef tensor2single3(img):\n    img = img.data.squeeze().float().cpu().numpy()\n    if img.ndim == 3:\n        img = np.transpose(img, (1, 2, 0))\n    elif img.ndim == 2:\n        img = np.expand_dims(img, axis=2)\n    return img\n\n\ndef single2tensor5(img):\n    return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1, 3).float().unsqueeze(0)\n\n\ndef single32tensor5(img):\n    return torch.from_numpy(np.ascontiguousarray(img)).float().unsqueeze(0).unsqueeze(0)\n\n\ndef single42tensor4(img):\n    return torch.from_numpy(np.ascontiguousarray(img)).permute(2, 0, 1, 3).float()\n\n\n# from skimage.io import imread, imsave\ndef tensor2img(tensor, out_type=np.uint8, min_max=(0, 1)):\n    '''\n    Converts a torch Tensor into an image Numpy array of BGR channel order\n    Input: 4D(B,(3/1),H,W), 3D(C,H,W), or 2D(H,W), any range, RGB channel order\n    Output: 3D(H,W,C) or 2D(H,W), [0,255], np.uint8 (default)\n    '''\n    tensor = tensor.squeeze().float().cpu().clamp_(*min_max)  # squeeze first, then clamp\n    tensor = (tensor - min_max[0]) / (min_max[1] - min_max[0])  # to range [0,1]\n    n_dim = tensor.dim()\n    if n_dim == 4:\n        n_img = len(tensor)\n        img_np = make_grid(tensor, nrow=int(math.sqrt(n_img)), normalize=False).numpy()\n        img_np = np.transpose(img_np[[2, 1, 0], :, :], (1, 2, 0))  # HWC, BGR\n    elif n_dim == 3:\n        img_np = tensor.numpy()\n        img_np = np.transpose(img_np[[2, 1, 0], :, :], (1, 2, 0))  
# HWC, BGR\n    elif n_dim == 2:\n        img_np = tensor.numpy()\n    else:\n        raise TypeError(\n            'Only support 4D, 3D and 2D tensor. But received with dimension: {:d}'.format(n_dim))\n    if out_type == np.uint8:\n        img_np = (img_np * 255.0).round()\n        # Important. Unlike matlab, numpy.unit8() WILL NOT round by default.\n    return img_np.astype(out_type)\n\n\n'''\n# --------------------------------------------\n# Augmentation, flipe and/or rotate\n# --------------------------------------------\n# The following two are enough.\n# (1) augmet_img: numpy image of WxHxC or WxH\n# (2) augment_img_tensor4: tensor image 1xCxWxH\n# --------------------------------------------\n'''\n\n\ndef augment_img(img, mode=0):\n    '''Kai Zhang (github: https://github.com/cszn)\n    '''\n    if mode == 0:\n        return img\n    elif mode == 1:\n        return np.flipud(np.rot90(img))\n    elif mode == 2:\n        return np.flipud(img)\n    elif mode == 3:\n        return np.rot90(img, k=3)\n    elif mode == 4:\n        return np.flipud(np.rot90(img, k=2))\n    elif mode == 5:\n        return np.rot90(img)\n    elif mode == 6:\n        return np.rot90(img, k=2)\n    elif mode == 7:\n        return np.flipud(np.rot90(img, k=3))\n\n\ndef augment_img_tensor4(img, mode=0):\n    '''Kai Zhang (github: https://github.com/cszn)\n    '''\n    if mode == 0:\n        return img\n    elif mode == 1:\n        return img.rot90(1, [2, 3]).flip([2])\n    elif mode == 2:\n        return img.flip([2])\n    elif mode == 3:\n        return img.rot90(3, [2, 3])\n    elif mode == 4:\n        return img.rot90(2, [2, 3]).flip([2])\n    elif mode == 5:\n        return img.rot90(1, [2, 3])\n    elif mode == 6:\n        return img.rot90(2, [2, 3])\n    elif mode == 7:\n        return img.rot90(3, [2, 3]).flip([2])\n\n\ndef augment_img_tensor(img, mode=0):\n    '''Kai Zhang (github: https://github.com/cszn)\n    '''\n    img_size = img.size()\n    img_np = img.data.cpu().numpy()\n    if len(img_size) == 3:\n        img_np = np.transpose(img_np, (1, 2, 0))\n    elif len(img_size) == 4:\n        img_np = np.transpose(img_np, (2, 3, 1, 0))\n    img_np = augment_img(img_np, mode=mode)\n    img_tensor = torch.from_numpy(np.ascontiguousarray(img_np))\n    if len(img_size) == 3:\n        img_tensor = img_tensor.permute(2, 0, 1)\n    elif len(img_size) == 4:\n        img_tensor = img_tensor.permute(3, 2, 0, 1)\n\n    return img_tensor.type_as(img)\n\n\ndef augment_img_np3(img, mode=0):\n    if mode == 0:\n        return img\n    elif mode == 1:\n        return img.transpose(1, 0, 2)\n    elif mode == 2:\n        return img[::-1, :, :]\n    elif mode == 3:\n        img = img[::-1, :, :]\n        img = img.transpose(1, 0, 2)\n        return img\n    elif mode == 4:\n        return img[:, ::-1, :]\n    elif mode == 5:\n        img = img[:, ::-1, :]\n        img = img.transpose(1, 0, 2)\n        return img\n    elif mode == 6:\n        img = img[:, ::-1, :]\n        img = img[::-1, :, :]\n        return img\n    elif mode == 7:\n        img = img[:, ::-1, :]\n        img = img[::-1, :, :]\n        img = img.transpose(1, 0, 2)\n        return img\n\n\ndef augment_imgs(img_list, hflip=True, rot=True):\n    # horizontal flip OR rotate\n    hflip = hflip and random.random() < 0.5\n    vflip = rot and random.random() < 0.5\n    rot90 = rot and random.random() < 0.5\n\n    def _augment(img):\n        if hflip:\n            img = img[:, ::-1, :]\n        if vflip:\n            img = img[::-1, :, :]\n        if rot90:\n      
      img = img.transpose(1, 0, 2)\n        return img\n\n    return [_augment(img) for img in img_list]\n\n\n'''\n# --------------------------------------------\n# modcrop and shave\n# --------------------------------------------\n'''\n\n\ndef modcrop(img_in, scale):\n    # img_in: Numpy, HWC or HW\n    img = np.copy(img_in)\n    if img.ndim == 2:\n        H, W = img.shape\n        H_r, W_r = H % scale, W % scale\n        img = img[:H - H_r, :W - W_r]\n    elif img.ndim == 3:\n        H, W, C = img.shape\n        H_r, W_r = H % scale, W % scale\n        img = img[:H - H_r, :W - W_r, :]\n    else:\n        raise ValueError('Wrong img ndim: [{:d}].'.format(img.ndim))\n    return img\n\n\ndef shave(img_in, border=0):\n    # img_in: Numpy, HWC or HW\n    img = np.copy(img_in)\n    h, w = img.shape[:2]\n    img = img[border:h-border, border:w-border]\n    return img\n\n\n'''\n# --------------------------------------------\n# image processing process on numpy image\n# channel_convert(in_c, tar_type, img_list):\n# rgb2ycbcr(img, only_y=True):\n# bgr2ycbcr(img, only_y=True):\n# ycbcr2rgb(img):\n# --------------------------------------------\n'''\n\n\ndef rgb2ycbcr(img, only_y=True):\n    '''same as matlab rgb2ycbcr\n    only_y: only return Y channel\n    Input:\n        uint8, [0, 255]\n        float, [0, 1]\n    '''\n    in_img_type = img.dtype\n    img.astype(np.float32)\n    if in_img_type != np.uint8:\n        img *= 255.\n    # convert\n    if only_y:\n        rlt = np.dot(img, [65.481, 128.553, 24.966]) / 255.0 + 16.0\n    else:\n        rlt = np.matmul(img, [[65.481, -37.797, 112.0], [128.553, -74.203, -93.786],\n                              [24.966, 112.0, -18.214]]) / 255.0 + [16, 128, 128]\n    if in_img_type == np.uint8:\n        rlt = rlt.round()\n    else:\n        rlt /= 255.\n    return rlt.astype(in_img_type)\n\n\ndef ycbcr2rgb(img):\n    '''same as matlab ycbcr2rgb\n    Input:\n        uint8, [0, 255]\n        float, [0, 1]\n    '''\n    in_img_type = img.dtype\n    img.astype(np.float32)\n    if in_img_type != np.uint8:\n        img *= 255.\n    # convert\n    rlt = np.matmul(img, [[0.00456621, 0.00456621, 0.00456621], [0, -0.00153632, 0.00791071],\n                          [0.00625893, -0.00318811, 0]]) * 255.0 + [-222.921, 135.576, -276.836]\n    if in_img_type == np.uint8:\n        rlt = rlt.round()\n    else:\n        rlt /= 255.\n    return rlt.astype(in_img_type)\n\n\ndef bgr2ycbcr(img, only_y=True):\n    '''bgr version of rgb2ycbcr\n    only_y: only return Y channel\n    Input:\n        uint8, [0, 255]\n        float, [0, 1]\n    '''\n    in_img_type = img.dtype\n    img.astype(np.float32)\n    if in_img_type != np.uint8:\n        img *= 255.\n    # convert\n    if only_y:\n        rlt = np.dot(img, [24.966, 128.553, 65.481]) / 255.0 + 16.0\n    else:\n        rlt = np.matmul(img, [[24.966, 112.0, -18.214], [128.553, -74.203, -93.786],\n                              [65.481, -37.797, 112.0]]) / 255.0 + [16, 128, 128]\n    if in_img_type == np.uint8:\n        rlt = rlt.round()\n    else:\n        rlt /= 255.\n    return rlt.astype(in_img_type)\n\n\ndef channel_convert(in_c, tar_type, img_list):\n    # conversion among BGR, gray and y\n    if in_c == 3 and tar_type == 'gray':  # BGR to gray\n        gray_list = [cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) for img in img_list]\n        return [np.expand_dims(img, axis=2) for img in gray_list]\n    elif in_c == 3 and tar_type == 'y':  # BGR to y\n        y_list = [bgr2ycbcr(img, only_y=True) for img in img_list]\n        return 
[np.expand_dims(img, axis=2) for img in y_list]\n    elif in_c == 1 and tar_type == 'RGB':  # gray/y to BGR\n        return [cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) for img in img_list]\n    else:\n        return img_list\n\n\n'''\n# --------------------------------------------\n# metric, PSNR and SSIM\n# --------------------------------------------\n'''\n\n\n# --------------------------------------------\n# PSNR\n# --------------------------------------------\ndef calculate_psnr(img1, img2, border=0):\n    # img1 and img2 have range [0, 255]\n    #img1 = img1.squeeze()\n    #img2 = img2.squeeze()\n    if not img1.shape == img2.shape:\n        raise ValueError('Input images must have the same dimensions.')\n    h, w = img1.shape[:2]\n    img1 = img1[border:h-border, border:w-border]\n    img2 = img2[border:h-border, border:w-border]\n\n    img1 = img1.astype(np.float64)\n    img2 = img2.astype(np.float64)\n    mse = np.mean((img1 - img2)**2)\n    if mse == 0:\n        return float('inf')\n    return 20 * math.log10(255.0 / math.sqrt(mse))\n\n\n# --------------------------------------------\n# SSIM\n# --------------------------------------------\ndef calculate_ssim(img1, img2, border=0):\n    '''calculate SSIM\n    the same outputs as MATLAB's\n    img1, img2: [0, 255]\n    '''\n    #img1 = img1.squeeze()\n    #img2 = img2.squeeze()\n    if not img1.shape == img2.shape:\n        raise ValueError('Input images must have the same dimensions.')\n    h, w = img1.shape[:2]\n    img1 = img1[border:h-border, border:w-border]\n    img2 = img2[border:h-border, border:w-border]\n\n    if img1.ndim == 2:\n        return ssim(img1, img2)\n    elif img1.ndim == 3:\n        if img1.shape[2] == 3:\n            ssims = []\n            for i in range(3):\n                ssims.append(ssim(img1[:,:,i], img2[:,:,i]))\n            return np.array(ssims).mean()\n        elif img1.shape[2] == 1:\n            return ssim(np.squeeze(img1), np.squeeze(img2))\n    else:\n        raise ValueError('Wrong input image dimensions.')\n\n\ndef ssim(img1, img2):\n    C1 = (0.01 * 255)**2\n    C2 = (0.03 * 255)**2\n\n    img1 = img1.astype(np.float64)\n    img2 = img2.astype(np.float64)\n    kernel = cv2.getGaussianKernel(11, 1.5)\n    window = np.outer(kernel, kernel.transpose())\n\n    mu1 = cv2.filter2D(img1, -1, window)[5:-5, 5:-5]  # valid\n    mu2 = cv2.filter2D(img2, -1, window)[5:-5, 5:-5]\n    mu1_sq = mu1**2\n    mu2_sq = mu2**2\n    mu1_mu2 = mu1 * mu2\n    sigma1_sq = cv2.filter2D(img1**2, -1, window)[5:-5, 5:-5] - mu1_sq\n    sigma2_sq = cv2.filter2D(img2**2, -1, window)[5:-5, 5:-5] - mu2_sq\n    sigma12 = cv2.filter2D(img1 * img2, -1, window)[5:-5, 5:-5] - mu1_mu2\n\n    ssim_map = ((2 * mu1_mu2 + C1) * (2 * sigma12 + C2)) / ((mu1_sq + mu2_sq + C1) *\n                                                            (sigma1_sq + sigma2_sq + C2))\n    return ssim_map.mean()\n\n\n'''\n# --------------------------------------------\n# matlab's bicubic imresize (numpy and torch) [0, 1]\n# --------------------------------------------\n'''\n\n\n# matlab 'imresize' function, now only support 'bicubic'\ndef cubic(x):\n    absx = torch.abs(x)\n    absx2 = absx**2\n    absx3 = absx**3\n    return (1.5*absx3 - 2.5*absx2 + 1) * ((absx <= 1).type_as(absx)) + \\\n        (-0.5*absx3 + 2.5*absx2 - 4*absx + 2) * (((absx > 1)*(absx <= 2)).type_as(absx))\n\n\ndef calculate_weights_indices(in_length, out_length, scale, kernel, kernel_width, antialiasing):\n    if (scale < 1) and (antialiasing):\n        # Use a modified kernel to 
simultaneously interpolate and antialias- larger kernel width\n        kernel_width = kernel_width / scale\n\n    # Output-space coordinates\n    x = torch.linspace(1, out_length, out_length)\n\n    # Input-space coordinates. Calculate the inverse mapping such that 0.5\n    # in output space maps to 0.5 in input space, and 0.5+scale in output\n    # space maps to 1.5 in input space.\n    u = x / scale + 0.5 * (1 - 1 / scale)\n\n    # What is the left-most pixel that can be involved in the computation?\n    left = torch.floor(u - kernel_width / 2)\n\n    # What is the maximum number of pixels that can be involved in the\n    # computation?  Note: it's OK to use an extra pixel here; if the\n    # corresponding weights are all zero, it will be eliminated at the end\n    # of this function.\n    P = math.ceil(kernel_width) + 2\n\n    # The indices of the input pixels involved in computing the k-th output\n    # pixel are in row k of the indices matrix.\n    indices = left.view(out_length, 1).expand(out_length, P) + torch.linspace(0, P - 1, P).view(\n        1, P).expand(out_length, P)\n\n    # The weights used to compute the k-th output pixel are in row k of the\n    # weights matrix.\n    distance_to_center = u.view(out_length, 1).expand(out_length, P) - indices\n    # apply cubic kernel\n    if (scale < 1) and (antialiasing):\n        weights = scale * cubic(distance_to_center * scale)\n    else:\n        weights = cubic(distance_to_center)\n    # Normalize the weights matrix so that each row sums to 1.\n    weights_sum = torch.sum(weights, 1).view(out_length, 1)\n    weights = weights / weights_sum.expand(out_length, P)\n\n    # If a column in weights is all zero, get rid of it. only consider the first and last column.\n    weights_zero_tmp = torch.sum((weights == 0), 0)\n    if not math.isclose(weights_zero_tmp[0], 0, rel_tol=1e-6):\n        indices = indices.narrow(1, 1, P - 2)\n        weights = weights.narrow(1, 1, P - 2)\n    if not math.isclose(weights_zero_tmp[-1], 0, rel_tol=1e-6):\n        indices = indices.narrow(1, 0, P - 2)\n        weights = weights.narrow(1, 0, P - 2)\n    weights = weights.contiguous()\n    indices = indices.contiguous()\n    sym_len_s = -indices.min() + 1\n    sym_len_e = indices.max() - in_length\n    indices = indices + sym_len_s - 1\n    return weights, indices, int(sym_len_s), int(sym_len_e)\n\n\n# --------------------------------------------\n# imresize for tensor image [0, 1]\n# --------------------------------------------\ndef imresize(img, scale, antialiasing=True):\n    # Now the scale should be the same for H and W\n    # input: img: pytorch tensor, CHW or HW [0,1]\n    # output: CHW or HW [0,1] w/o round\n    need_squeeze = True if img.dim() == 2 else False\n    if need_squeeze:\n        img.unsqueeze_(0)\n    in_C, in_H, in_W = img.size()\n    out_C, out_H, out_W = in_C, math.ceil(in_H * scale), math.ceil(in_W * scale)\n    kernel_width = 4\n    kernel = 'cubic'\n\n    # Return the desired dimension order for performing the resize.  
The\n    # strategy is to perform the resize first along the dimension with the\n    # smallest scale factor.\n    # Now we do not support this.\n\n    # get weights and indices\n    weights_H, indices_H, sym_len_Hs, sym_len_He = calculate_weights_indices(\n        in_H, out_H, scale, kernel, kernel_width, antialiasing)\n    weights_W, indices_W, sym_len_Ws, sym_len_We = calculate_weights_indices(\n        in_W, out_W, scale, kernel, kernel_width, antialiasing)\n    # process H dimension\n    # symmetric copying\n    img_aug = torch.FloatTensor(in_C, in_H + sym_len_Hs + sym_len_He, in_W)\n    img_aug.narrow(1, sym_len_Hs, in_H).copy_(img)\n\n    sym_patch = img[:, :sym_len_Hs, :]\n    inv_idx = torch.arange(sym_patch.size(1) - 1, -1, -1).long()\n    sym_patch_inv = sym_patch.index_select(1, inv_idx)\n    img_aug.narrow(1, 0, sym_len_Hs).copy_(sym_patch_inv)\n\n    sym_patch = img[:, -sym_len_He:, :]\n    inv_idx = torch.arange(sym_patch.size(1) - 1, -1, -1).long()\n    sym_patch_inv = sym_patch.index_select(1, inv_idx)\n    img_aug.narrow(1, sym_len_Hs + in_H, sym_len_He).copy_(sym_patch_inv)\n\n    out_1 = torch.FloatTensor(in_C, out_H, in_W)\n    kernel_width = weights_H.size(1)\n    for i in range(out_H):\n        idx = int(indices_H[i][0])\n        for j in range(out_C):\n            out_1[j, i, :] = img_aug[j, idx:idx + kernel_width, :].transpose(0, 1).mv(weights_H[i])\n\n    # process W dimension\n    # symmetric copying\n    out_1_aug = torch.FloatTensor(in_C, out_H, in_W + sym_len_Ws + sym_len_We)\n    out_1_aug.narrow(2, sym_len_Ws, in_W).copy_(out_1)\n\n    sym_patch = out_1[:, :, :sym_len_Ws]\n    inv_idx = torch.arange(sym_patch.size(2) - 1, -1, -1).long()\n    sym_patch_inv = sym_patch.index_select(2, inv_idx)\n    out_1_aug.narrow(2, 0, sym_len_Ws).copy_(sym_patch_inv)\n\n    sym_patch = out_1[:, :, -sym_len_We:]\n    inv_idx = torch.arange(sym_patch.size(2) - 1, -1, -1).long()\n    sym_patch_inv = sym_patch.index_select(2, inv_idx)\n    out_1_aug.narrow(2, sym_len_Ws + in_W, sym_len_We).copy_(sym_patch_inv)\n\n    out_2 = torch.FloatTensor(in_C, out_H, out_W)\n    kernel_width = weights_W.size(1)\n    for i in range(out_W):\n        idx = int(indices_W[i][0])\n        for j in range(out_C):\n            out_2[j, :, i] = out_1_aug[j, :, idx:idx + kernel_width].mv(weights_W[i])\n    if need_squeeze:\n        out_2.squeeze_()\n    return out_2\n\n\n# --------------------------------------------\n# imresize for numpy image [0, 1]\n# --------------------------------------------\ndef imresize_np(img, scale, antialiasing=True):\n    # Now the scale should be the same for H and W\n    # input: img: Numpy, HWC or HW [0,1]\n    # output: HWC or HW [0,1] w/o round\n    img = torch.from_numpy(img)\n    need_squeeze = True if img.dim() == 2 else False\n    if need_squeeze:\n        img.unsqueeze_(2)\n\n    in_H, in_W, in_C = img.size()\n    out_C, out_H, out_W = in_C, math.ceil(in_H * scale), math.ceil(in_W * scale)\n    kernel_width = 4\n    kernel = 'cubic'\n\n    # Return the desired dimension order for performing the resize.  
The\n    # strategy is to perform the resize first along the dimension with the\n    # smallest scale factor.\n    # Now we do not support this.\n\n    # get weights and indices\n    weights_H, indices_H, sym_len_Hs, sym_len_He = calculate_weights_indices(\n        in_H, out_H, scale, kernel, kernel_width, antialiasing)\n    weights_W, indices_W, sym_len_Ws, sym_len_We = calculate_weights_indices(\n        in_W, out_W, scale, kernel, kernel_width, antialiasing)\n    # process H dimension\n    # symmetric copying\n    img_aug = torch.FloatTensor(in_H + sym_len_Hs + sym_len_He, in_W, in_C)\n    img_aug.narrow(0, sym_len_Hs, in_H).copy_(img)\n\n    sym_patch = img[:sym_len_Hs, :, :]\n    inv_idx = torch.arange(sym_patch.size(0) - 1, -1, -1).long()\n    sym_patch_inv = sym_patch.index_select(0, inv_idx)\n    img_aug.narrow(0, 0, sym_len_Hs).copy_(sym_patch_inv)\n\n    sym_patch = img[-sym_len_He:, :, :]\n    inv_idx = torch.arange(sym_patch.size(0) - 1, -1, -1).long()\n    sym_patch_inv = sym_patch.index_select(0, inv_idx)\n    img_aug.narrow(0, sym_len_Hs + in_H, sym_len_He).copy_(sym_patch_inv)\n\n    out_1 = torch.FloatTensor(out_H, in_W, in_C)\n    kernel_width = weights_H.size(1)\n    for i in range(out_H):\n        idx = int(indices_H[i][0])\n        for j in range(out_C):\n            out_1[i, :, j] = img_aug[idx:idx + kernel_width, :, j].transpose(0, 1).mv(weights_H[i])\n\n    # process W dimension\n    # symmetric copying\n    out_1_aug = torch.FloatTensor(out_H, in_W + sym_len_Ws + sym_len_We, in_C)\n    out_1_aug.narrow(1, sym_len_Ws, in_W).copy_(out_1)\n\n    sym_patch = out_1[:, :sym_len_Ws, :]\n    inv_idx = torch.arange(sym_patch.size(1) - 1, -1, -1).long()\n    sym_patch_inv = sym_patch.index_select(1, inv_idx)\n    out_1_aug.narrow(1, 0, sym_len_Ws).copy_(sym_patch_inv)\n\n    sym_patch = out_1[:, -sym_len_We:, :]\n    inv_idx = torch.arange(sym_patch.size(1) - 1, -1, -1).long()\n    sym_patch_inv = sym_patch.index_select(1, inv_idx)\n    out_1_aug.narrow(1, sym_len_Ws + in_W, sym_len_We).copy_(sym_patch_inv)\n\n    out_2 = torch.FloatTensor(out_H, out_W, in_C)\n    kernel_width = weights_W.size(1)\n    for i in range(out_W):\n        idx = int(indices_W[i][0])\n        for j in range(out_C):\n            out_2[:, i, j] = out_1_aug[:, idx:idx + kernel_width, j].mv(weights_W[i])\n    if need_squeeze:\n        out_2.squeeze_()\n\n    return out_2.numpy()\n\n\nif __name__ == '__main__':\n    print('---')\n#    img = imread_uint('test.bmp', 3)\n#    img = uint2single(img)\n#    img_bicubic = imresize_np(img, 1/4)"
  },
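The metric helpers above follow MATLAB conventions: `calculate_psnr`/`calculate_ssim` expect images in [0, 255] and optionally crop a border, while `imresize_np` reproduces MATLAB's bicubic resize for float images in [0, 1]. Below is a minimal sketch on random data, assuming the `lidm` package is importable; the array sizes are arbitrary.

```python
# Minimal sketch (assumes the lidm package is importable): exercises the
# MATLAB-style metrics and the bicubic resize from utils_image.py on random data.
import numpy as np
from lidm.modules.image_degradation.utils_image import calculate_psnr, calculate_ssim, imresize_np

rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float64)        # range [0, 255]
noisy = np.clip(clean + rng.normal(0, 5, size=clean.shape), 0, 255)

print('PSNR:', calculate_psnr(clean, noisy, border=4))                   # in dB, border cropped
print('SSIM:', calculate_ssim(clean, noisy, border=4))

# imresize_np expects a float image in [0, 1] (HWC or HW) and a scale factor.
half = imresize_np(clean / 255., 1 / 2)
print('resized:', half.shape)                                            # (32, 32, 3)
```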
  {
    "path": "lidm/modules/losses/__init__.py",
    "content": "import torch\nimport torch.nn.functional as F\nfrom torch import nn\n\n\ndef adopt_weight(weight, global_step, threshold=0, value=0.):\n    if global_step < threshold:\n        weight = value\n    return weight\n\n\ndef hinge_d_loss(logits_real, logits_fake):\n    loss_real = torch.mean(F.relu(1. - logits_real))\n    loss_fake = torch.mean(F.relu(1. + logits_fake))\n    d_loss = 0.5 * (loss_real + loss_fake)\n    return d_loss\n\n\ndef vanilla_d_loss(logits_real, logits_fake):\n    d_loss = 0.5 * (\n        torch.mean(torch.nn.functional.softplus(-logits_real)) +\n        torch.mean(torch.nn.functional.softplus(logits_fake)))\n    return d_loss\n\n\ndef measure_perplexity(predicted_indices, n_embed):\n    # src: https://github.com/karpathy/deep-vector-quantization/blob/main/model.py\n    # eval cluster perplexity. when perplexity == num_embeddings then all clusters are used exactly equally\n    encodings = F.one_hot(predicted_indices, n_embed).float().reshape(-1, n_embed)\n    avg_probs = encodings.mean(0)\n    perplexity = (-(avg_probs * torch.log(avg_probs + 1e-10)).sum()).exp()\n    cluster_use = torch.sum(avg_probs > 0)\n    return perplexity, cluster_use\n\n\ndef l1(x, y):\n    return torch.abs(x - y)\n\n\ndef l2(x, y):\n    return torch.pow((x - y), 2)\n\n\ndef square_dist_loss(x, y):\n    return torch.sum((x - y) ** 2, dim=1, keepdim=True)\n\n\ndef weights_init(m):\n    classname = m.__class__.__name__\n    if classname.find('Conv') != -1:\n        nn.init.normal_(m.weight.data, 0.0, 0.02)\n    elif classname.find('BatchNorm') != -1:\n        nn.init.normal_(m.weight.data, 1.0, 0.02)\n        nn.init.constant_(m.bias.data, 0)"
  },
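`adopt_weight` is the warm-up gate used by the loss modules that follow: it returns `value` (zero by default) until `global_step` reaches `threshold`, and the configured weight afterwards; `hinge_d_loss` and `measure_perplexity` are the discriminator objective and the codebook-usage diagnostic. A small sketch on dummy tensors, assuming `lidm` is importable:

```python
# Minimal sketch of the helpers in lidm/modules/losses/__init__.py.
import torch
from lidm.modules.losses import adopt_weight, hinge_d_loss, measure_perplexity

# The GAN weight is zero before the warm-up threshold and 1.0 afterwards.
print(adopt_weight(1.0, global_step=100, threshold=500))     # 0.0
print(adopt_weight(1.0, global_step=1000, threshold=500))    # 1.0

# Hinge loss on dummy patch logits: small when real logits > 1 and fake logits < -1.
logits_real = torch.randn(4, 1, 8, 32) + 2.0
logits_fake = torch.randn(4, 1, 8, 32) - 2.0
print(hinge_d_loss(logits_real, logits_fake))

# Codebook usage: perplexity approaches n_embed when all codes are used uniformly.
idx = torch.randint(0, 512, (4096,))
print(measure_perplexity(idx, n_embed=512))
```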
  {
    "path": "lidm/modules/losses/contperceptual.py",
    "content": "import torch\nimport torch.nn as nn\n\nfrom . import weights_init, hinge_d_loss, vanilla_d_loss\nfrom .discriminator import LiDARNLayerDiscriminator\nfrom .lpips import LPIPS\n\n\nclass LPIPSWithDiscriminator(nn.Module):\n    def __init__(self, disc_start, logvar_init=0.0, kl_weight=1.0, pixelloss_weight=1.0,\n                 disc_num_layers=3, disc_in_channels=3, disc_factor=1.0, disc_weight=1.0,\n                 p_weight=1.0, use_actnorm=False, disc_conditional=False,\n                 disc_loss=\"hinge\", **kwargs):\n\n        super().__init__()\n        assert disc_loss in [\"hinge\", \"vanilla\"]\n        self.kl_weight = kl_weight\n        self.pixel_weight = pixelloss_weight\n        self.perceptual_weight = p_weight\n        if p_weight > 0.:\n            self.perceptual_loss = LPIPS().eval()\n        # output log variance\n        self.logvar = nn.Parameter(torch.ones(size=()) * logvar_init)\n\n        self.discriminator = LiDARNLayerDiscriminator(\n            input_nc=disc_in_channels, n_layers=disc_num_layers, use_actnorm=use_actnorm).apply(weights_init)\n        self.discriminator_iter_start = disc_start\n        self.disc_loss = hinge_d_loss if disc_loss == \"hinge\" else vanilla_d_loss\n        self.disc_factor = disc_factor\n        self.discriminator_weight = disc_weight\n        self.disc_conditional = disc_conditional\n\n    def calculate_adaptive_weight(self, nll_loss, g_loss, last_layer=None):\n        if last_layer is not None:\n            nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]\n            g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]\n        else:\n            nll_grads = torch.autograd.grad(nll_loss, self.last_layer[0], retain_graph=True)[0]\n            g_grads = torch.autograd.grad(g_loss, self.last_layer[0], retain_graph=True)[0]\n\n        d_weight = torch.norm(nll_grads) / (torch.norm(g_grads) + 1e-4)\n        d_weight = torch.clamp(d_weight, 0.0, 1e4).detach()\n        d_weight = d_weight * self.discriminator_weight\n        return d_weight\n\n    def forward(self, inputs, reconstructions, posteriors, optimizer_idx,\n                global_step, last_layer=None, cond=None, split=\"train\",\n                weights=None):\n        rec_loss = torch.abs(inputs.contiguous() - reconstructions.contiguous())\n        if self.perceptual_weight > 0:\n            p_loss = self.perceptual_loss(inputs.contiguous(), reconstructions.contiguous())\n            rec_loss = rec_loss + self.perceptual_weight * p_loss\n\n        nll_loss = rec_loss / torch.exp(self.logvar) + self.logvar\n        weighted_nll_loss = nll_loss\n        if weights is not None:\n            weighted_nll_loss = weights*nll_loss\n        weighted_nll_loss = torch.sum(weighted_nll_loss) / weighted_nll_loss.shape[0]\n        nll_loss = torch.sum(nll_loss) / nll_loss.shape[0]\n        kl_loss = posteriors.kl()\n        kl_loss = torch.sum(kl_loss) / kl_loss.shape[0]\n\n        # now the GAN part\n        disc_factor = 0. 
if global_step < self.discriminator_iter_start else self.disc_factor  # GAN term is off until discriminator_iter_start\n        if optimizer_idx == 0:\n            # generator update\n            if cond is None:\n                assert not self.disc_conditional\n                logits_fake = self.discriminator(reconstructions.contiguous())\n            else:\n                assert self.disc_conditional\n                logits_fake = self.discriminator(torch.cat((reconstructions.contiguous(), cond), dim=1))\n            g_loss = -torch.mean(logits_fake)\n\n            if self.disc_factor > 0.0:\n                try:\n                    d_weight = self.calculate_adaptive_weight(nll_loss, g_loss, last_layer=last_layer)\n                except RuntimeError:\n                    assert not self.training\n                    d_weight = torch.tensor(0.0)\n            else:\n                d_weight = torch.tensor(0.0)\n\n            loss = weighted_nll_loss + self.kl_weight * kl_loss + disc_factor * d_weight * g_loss\n\n            log = {\"{}/total_loss\".format(split): loss.clone().detach().mean(), \"{}/logvar\".format(split): self.logvar.detach(),\n                   \"{}/kl_loss\".format(split): kl_loss.detach().mean(), \"{}/nll_loss\".format(split): nll_loss.detach().mean(),\n                   \"{}/rec_loss\".format(split): rec_loss.detach().mean(),\n                   \"{}/d_weight\".format(split): d_weight.detach(),\n                   \"{}/disc_factor\".format(split): torch.tensor(disc_factor),\n                   \"{}/g_loss\".format(split): g_loss.detach().mean(),\n                   }\n            return loss, log\n\n        if optimizer_idx == 1:\n            # second pass for discriminator update\n            if cond is None:\n                logits_real = self.discriminator(inputs.contiguous().detach())\n                logits_fake = self.discriminator(reconstructions.contiguous().detach())\n            else:\n                logits_real = self.discriminator(torch.cat((inputs.contiguous().detach(), cond), dim=1))\n                logits_fake = self.discriminator(torch.cat((reconstructions.contiguous().detach(), cond), dim=1))\n\n            d_loss = disc_factor * self.disc_loss(logits_real, logits_fake)\n\n            log = {\"{}/disc_loss\".format(split): d_loss.clone().detach().mean(),\n                   \"{}/logits_real\".format(split): logits_real.detach().mean(),\n                   \"{}/logits_fake\".format(split): logits_fake.detach().mean()\n                   }\n            return d_loss, log\n"
  },
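`calculate_adaptive_weight` rescales the adversarial term so that its gradient at the decoder's last layer has roughly the same norm as the reconstruction term's. The standalone sketch below reproduces only that balancing rule on a toy layer; all names in it are illustrative and not part of the repo.

```python
# Sketch of the gradient-norm balancing used by calculate_adaptive_weight,
# reproduced on a toy "last layer" (illustrative stand-ins only).
import torch

last_layer = torch.nn.Parameter(torch.randn(8, 4))
x = torch.randn(16, 4)
out = x @ last_layer.t()

nll_loss = (out - 1.0).abs().mean()   # stand-in for the reconstruction/NLL term
g_loss = -out.mean()                  # stand-in for the generator (adversarial) term

nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]
g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]

# d_weight scales the GAN term so its gradient magnitude matches the NLL term's.
d_weight = torch.norm(nll_grads) / (torch.norm(g_grads) + 1e-4)
d_weight = torch.clamp(d_weight, 0.0, 1e4).detach()
print(d_weight)
```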
  {
    "path": "lidm/modules/losses/discriminator.py",
    "content": "import functools\nimport torch.nn as nn\n\n\nfrom ..basic import ActNorm, CircularConv2d\n\n\nclass NLayerDiscriminator(nn.Module):\n    \"\"\"Defines a PatchGAN discriminator as in Pix2Pix\n        --> see https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/models/networks.py\n    \"\"\"\n    def __init__(self, input_nc=1, output_nc=1, ndf=64, n_layers=3, use_actnorm=False):\n        \"\"\"Construct a PatchGAN discriminator\n        Parameters:\n            input_nc (int)  -- the number of channels in input images\n            ndf (int)       -- the number of filters in the last conv layer\n            n_layers (int)  -- the number of conv layers in the discriminator\n            norm_layer      -- normalization layer\n        \"\"\"\n        super(NLayerDiscriminator, self).__init__()\n        if not use_actnorm:\n            norm_layer = nn.BatchNorm2d\n        else:\n            norm_layer = ActNorm\n        if type(norm_layer) == functools.partial:  # no need to use bias as BatchNorm2d has affine parameters\n            use_bias = norm_layer.func != nn.BatchNorm2d\n        else:\n            use_bias = norm_layer != nn.BatchNorm2d\n\n        kw = 4\n        padw = 1\n        sequence = [nn.Conv2d(input_nc, ndf, kernel_size=kw, stride=2, padding=padw), nn.LeakyReLU(0.2, True)]\n        nf_mult = 1\n        for n in range(1, n_layers):  # gradually increase the number of filters\n            nf_mult_prev = nf_mult\n            nf_mult = min(2 ** n, 8)\n            sequence += [\n                nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=2, padding=padw, bias=use_bias),\n                norm_layer(ndf * nf_mult),\n                nn.LeakyReLU(0.2, True)\n            ]\n\n        nf_mult_prev = nf_mult\n        nf_mult = min(2 ** n_layers, 8)\n        sequence += [\n            nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=1, padding=padw, bias=use_bias),\n            norm_layer(ndf * nf_mult),\n            nn.LeakyReLU(0.2, True)\n        ]\n\n        sequence += [\n            nn.Conv2d(ndf * nf_mult, output_nc, kernel_size=kw, stride=1, padding=padw)]  # output 1 channel prediction map\n        self.main = nn.Sequential(*sequence)\n\n    def forward(self, input):\n        \"\"\"Standard forward.\"\"\"\n        return self.main(input)\n\n\nclass LiDARNLayerDiscriminator(nn.Module):\n    \"\"\"Modified PatchGAN discriminator from Pix2Pix\n        --> see https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/models/networks.py\n    \"\"\"\n    def __init__(self, input_nc=1, output_nc=1, ndf=64, n_layers=3, use_actnorm=False):\n        \"\"\"Construct a PatchGAN discriminator\n        Parameters:\n            input_nc (int)  -- the number of channels in input images\n            ndf (int)       -- the number of filters in the last conv layer\n            n_layers (int)  -- the number of conv layers in the discriminator\n            norm_layer      -- normalization layer\n        \"\"\"\n        super(LiDARNLayerDiscriminator, self).__init__()\n        if not use_actnorm:\n            norm_layer = nn.BatchNorm2d\n        else:\n            norm_layer = ActNorm\n        if type(norm_layer) == functools.partial:  # no need to use bias as BatchNorm2d has affine parameters\n            use_bias = norm_layer.func != nn.BatchNorm2d\n        else:\n            use_bias = norm_layer != nn.BatchNorm2d\n\n        kw = (4, 4)\n        sequence = [CircularConv2d(input_nc, ndf, kernel_size=kw, stride=(1, 2), 
padding=(1, 2, 1, 2)), nn.LeakyReLU(0.2, True)]\n        nf_mult = 1\n        nf_mult_prev = 1\n        for n in range(1, n_layers):  # gradually increase the number of filters\n            nf_mult_prev = nf_mult\n            nf_mult = min(2 ** n, 8)\n            sequence += [\n                CircularConv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=(1, 2), bias=use_bias, padding=(1, 2, 1, 2)),\n                norm_layer(ndf * nf_mult),\n                nn.LeakyReLU(0.2, True)\n            ]\n\n        nf_mult_prev = nf_mult\n        nf_mult = min(2 ** n_layers, 8)\n        sequence += [\n            CircularConv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=1, bias=use_bias, padding=(1, 2, 1, 2)),\n            norm_layer(ndf * nf_mult),\n            nn.LeakyReLU(0.2, True)\n        ]\n\n        sequence += [\n            CircularConv2d(ndf * nf_mult, output_nc, kernel_size=kw, stride=1, padding=(1, 2, 1, 2))]  # output 1 channel prediction map\n        self.main = nn.Sequential(*sequence)\n\n    def forward(self, input):\n        \"\"\"Standard forward.\"\"\"\n        return self.main(input)\n\n\nclass LiDARNLayerDiscriminatorV2(nn.Module):\n    \"\"\"Modified PatchGAN discriminator from Pix2Pix (larger receptive field)\n        --> see https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/models/networks.py\n    \"\"\"\n    def __init__(self, input_nc=1, output_nc=1, ndf=64, n_layers=3, use_actnorm=False):\n        \"\"\"Construct a PatchGAN discriminator\n        Parameters:\n            input_nc (int)  -- the number of channels in input images\n            ndf (int)       -- the number of filters in the last conv layer\n            n_layers (int)  -- the number of conv layers in the discriminator\n            norm_layer      -- normalization layer\n        \"\"\"\n        super(LiDARNLayerDiscriminatorV2, self).__init__()\n        if not use_actnorm:\n            norm_layer = nn.BatchNorm2d\n        else:\n            norm_layer = ActNorm\n        if type(norm_layer) == functools.partial:  # no need to use bias as BatchNorm2d has affine parameters\n            use_bias = norm_layer.func != nn.BatchNorm2d\n        else:\n            use_bias = norm_layer != nn.BatchNorm2d\n\n        kw = (4, 4)\n        sequence = [CircularConv2d(input_nc, ndf, kernel_size=kw, stride=(1, 2), padding=(1, 2, 1, 2)), nn.LeakyReLU(0.2, True),\n                    CircularConv2d(ndf, ndf, kernel_size=kw, stride=(1, 2), padding=(1, 2, 1, 2)), nn.LeakyReLU(0.2, True)]\n        nf_mult = 1\n        nf_mult_prev = 1\n        for n in range(1, n_layers):  # gradually increase the number of filters\n            nf_mult_prev = nf_mult\n            nf_mult = min(2 ** n, 8)\n            sequence += [\n                CircularConv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=(2, 2), bias=use_bias, padding=(1, 2, 1, 2)),\n                norm_layer(ndf * nf_mult),\n                nn.LeakyReLU(0.2, True)\n            ]\n\n        nf_mult_prev = nf_mult\n        nf_mult = min(2 ** n_layers, 8)\n        sequence += [\n            CircularConv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=1, bias=use_bias, padding=(1, 2, 1, 2)),\n            norm_layer(ndf * nf_mult),\n            nn.LeakyReLU(0.2, True)\n        ]\n\n        sequence += [\n            CircularConv2d(ndf * nf_mult, output_nc, kernel_size=kw, stride=1, padding=(1, 2, 1, 2))]  # output 1 channel prediction map\n        self.main = nn.Sequential(*sequence)\n\n    def forward(self, 
input):\n        \"\"\"Standard forward.\"\"\"\n        return self.main(input)\n\n\nclass LiDARNLayerDiscriminatorV3(nn.Module):\n    \"\"\"Modified PatchGAN discriminator from Pix2Pix (larger receptive field)\n        --> see https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/models/networks.py\n    \"\"\"\n    def __init__(self, input_nc=1, output_nc=1, ndf=64, n_layers=3, use_actnorm=False):\n        \"\"\"Construct a PatchGAN discriminator\n        Parameters:\n            input_nc (int)  -- the number of channels in input images\n            ndf (int)       -- the number of filters in the last conv layer\n            n_layers (int)  -- the number of conv layers in the discriminator\n            norm_layer      -- normalization layer\n        \"\"\"\n        super(LiDARNLayerDiscriminatorV3, self).__init__()\n        if not use_actnorm:\n            norm_layer = nn.BatchNorm2d\n        else:\n            norm_layer = ActNorm\n        if type(norm_layer) == functools.partial:  # no need to use bias as BatchNorm2d has affine parameters\n            use_bias = norm_layer.func != nn.BatchNorm2d\n        else:\n            use_bias = norm_layer != nn.BatchNorm2d\n\n        kw = (4, 4)\n        sequence = [CircularConv2d(input_nc, ndf, kernel_size=(1, 4), stride=(1, 1), padding=(1, 2, 1, 2)), nn.LeakyReLU(0.2, True),\n                    CircularConv2d(ndf, ndf, kernel_size=kw, stride=(2, 2), padding=(1, 2, 1, 2)), nn.LeakyReLU(0.2, True)]\n        nf_mult = 1\n        nf_mult_prev = 1\n        for n in range(1, n_layers):  # gradually increase the number of filters\n            nf_mult_prev = nf_mult\n            nf_mult = min(2 ** n, 8)\n            sequence += [\n                CircularConv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=(2, 2), bias=use_bias, padding=(1, 2, 1, 2)),\n                norm_layer(ndf * nf_mult),\n                nn.LeakyReLU(0.2, True)\n            ]\n\n        nf_mult_prev = nf_mult\n        nf_mult = min(2 ** n_layers, 8)\n        sequence += [\n            CircularConv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=1, bias=use_bias, padding=(1, 2, 1, 2)),\n            norm_layer(ndf * nf_mult),\n            nn.LeakyReLU(0.2, True)\n        ]\n\n        sequence += [\n            CircularConv2d(ndf * nf_mult, output_nc, kernel_size=kw, stride=1, padding=(1, 2, 1, 2))]  # output 1 channel prediction map\n        self.main = nn.Sequential(*sequence)\n\n    def forward(self, input):\n        \"\"\"Standard forward.\"\"\"\n        return self.main(input)\n"
  },
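All of the discriminators above are patch-wise: they map a range image (plus optional extra channels) to a single-channel grid of real/fake logits rather than one scalar per sample, with the LiDAR variants striding mostly along the horizontal (azimuth) axis and using circular padding. A quick shape check, assuming `lidm` and its `CircularConv2d`/`ActNorm` building blocks are importable; the 64x1024 input is only an example size.

```python
# Shape sketch for the PatchGAN-style discriminators (assumes lidm is importable).
import torch
from lidm.modules.losses.discriminator import NLayerDiscriminator, LiDARNLayerDiscriminator

x = torch.randn(2, 1, 64, 1024)   # (B, C, H, W) range image; example size only

d_plain = NLayerDiscriminator(input_nc=1, n_layers=3)
d_lidar = LiDARNLayerDiscriminator(input_nc=1, n_layers=3)

with torch.no_grad():
    # Both return a 1-channel map of patch logits; the exact spatial size
    # depends on the stride/padding scheme of each variant.
    print(d_plain(x).shape)
    print(d_lidar(x).shape)
```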
  {
    "path": "lidm/modules/losses/geometric.py",
    "content": "from functools import partial\n\nimport numpy as np\nimport torch\nfrom torch import nn\nimport torch.nn.functional as F\n\n\nclass GeoConverter(nn.Module):\n    def __init__(self, curve_length=4, bev_only=False, dataset_config=dict()):\n        super().__init__()\n        self.curve_length = curve_length\n        self.coord_dim = 3 if not bev_only else 2\n        self.convert_fn = self.batch_range2bev if bev_only else self.batch_range2xyz\n\n        fov = dataset_config.fov\n        self.fov_up = fov[0] / 180.0 * np.pi  # field of view up in rad\n        self.fov_down = fov[1] / 180.0 * np.pi  # field of view down in rad\n        self.fov_range = abs(self.fov_down) + abs(self.fov_up)  # get field of view total in rad\n        self.depth_scale = dataset_config.depth_scale\n        self.depth_min, self.depth_max = dataset_config.depth_range\n        self.log_scale = dataset_config.log_scale\n        self.size = dataset_config['size']\n        self.register_conversion()\n\n    def register_conversion(self):\n        scan_x, scan_y = np.meshgrid(np.arange(self.size[1]), np.arange(self.size[0]))\n        scan_x = scan_x.astype(np.float64) / self.size[1]\n        scan_y = scan_y.astype(np.float64) / self.size[0]\n\n        yaw = (np.pi * (scan_x * 2 - 1))\n        pitch = ((1.0 - scan_y) * self.fov_range - abs(self.fov_down))\n\n        to_torch = partial(torch.tensor, dtype=torch.float32)\n\n        self.register_buffer('cos_yaw', torch.cos(to_torch(yaw)))\n        self.register_buffer('sin_yaw', torch.sin(to_torch(yaw)))\n        self.register_buffer('cos_pitch', torch.cos(to_torch(pitch)))\n        self.register_buffer('sin_pitch', torch.sin(to_torch(pitch)))\n\n    def batch_range2xyz(self, imgs):\n        batch_depth = (imgs * 0.5 + 0.5) * self.depth_scale\n        if self.log_scale:\n            batch_depth = torch.exp2(batch_depth) - 1\n        batch_depth = batch_depth.clamp(self.depth_min, self.depth_max)\n\n        batch_x = self.cos_yaw * self.cos_pitch * batch_depth\n        batch_y = -self.sin_yaw * self.cos_pitch * batch_depth\n        batch_z = self.sin_pitch * batch_depth\n        batch_xyz = torch.cat([batch_x, batch_y, batch_z], dim=1)\n\n        return batch_xyz\n\n    def batch_range2bev(self, imgs):\n        batch_depth = (imgs * 0.5 + 0.5) * self.depth_scale\n        if self.log_scale:\n            batch_depth = torch.exp2(batch_depth) - 1\n        batch_depth = batch_depth.clamp(self.depth_min, self.depth_max)\n\n        batch_x = self.cos_yaw * self.cos_pitch * batch_depth\n        batch_y = -self.sin_yaw * self.cos_pitch * batch_depth\n        batch_bev = torch.cat([batch_x, batch_y], dim=1)\n\n        return batch_bev\n\n    def curve_compress(self, batch_coord):\n        compressed_batch_coord = F.avg_pool2d(batch_coord, (1, self.curve_length))\n\n        return compressed_batch_coord\n\n    def forward(self, input):\n        input = input / 2. + .5  # [-1, 1] -> [0, 1]\n\n        input_coord = self.convert_fn(input)\n        if self.curve_length > 1:\n            input_coord = self.curve_compress(input_coord)\n\n        return input_coord\n"
  },
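`GeoConverter` projects a normalized range image back to per-pixel Cartesian (or BEV) coordinates from the sensor's vertical field of view, and optionally average-pools along the azimuth direction in chunks of `curve_length`. Below is a sketch with a placeholder 64-beam configuration; the config values and the use of `OmegaConf` (so that both attribute and key access work, as the module expects) are assumptions rather than the repo's actual settings.

```python
# Sketch of GeoConverter on a dummy range image; the dataset_config values
# below are placeholders for a 64-beam setup, not the repo's actual config.
import torch
from omegaconf import OmegaConf
from lidm.modules.losses.geometric import GeoConverter

cfg = OmegaConf.create({
    'fov': [3.0, -25.0],          # vertical field of view (up, down) in degrees
    'depth_scale': 6.0,           # log2 depth scale (since log_scale=True)
    'depth_range': [1.0, 56.0],   # clamp range in meters
    'log_scale': True,
    'size': [64, 1024],           # (H, W) of the range image
})

geo = GeoConverter(curve_length=4, bev_only=False, dataset_config=cfg)

rimg = torch.rand(2, 1, 64, 1024) * 2 - 1   # normalized range image in [-1, 1]
coords = geo(rimg)                          # (B, 3, 64, 1024 / curve_length)
print(coords.shape)
```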
  {
    "path": "lidm/modules/losses/perceptual.py",
    "content": "import hashlib\nimport os\n\nimport requests\nimport torch\nimport torch.nn as nn\n\nfrom tqdm import tqdm\n\nfrom . import l1, l2\nfrom ...utils.model_utils import build_model\n\nURL_MAP = {\n}\n\nCKPT_MAP = {\n}\n\nMD5_MAP = {\n}\n\nPERCEPTUAL_TYPE = {\n    'rangenet_full': [('enc_0', 32), ('enc_1', 64), ('enc_2', 128), ('enc_3', 256), ('enc_4', 512), ('enc_5', 1024),\n                      ('dec_4', 512), ('dec_3', 256), ('dec_2', 128), ('dec_1', 64), ('dec_0', 32)],\n    'rangenet_enc': [('enc_0', 32), ('enc_1', 64), ('enc_2', 128), ('enc_3', 256), ('enc_4', 512), ('enc_5', 1024)],\n    'rangenet_dec': [('dec_4', 512), ('dec_3', 256), ('dec_2', 128), ('dec_1', 64), ('dec_0', 32)],\n    'rangenet_final': [('dec_0', 32)]\n}\n\n\ndef download(url, local_path, chunk_size=1024):\n    os.makedirs(os.path.split(local_path)[0], exist_ok=True)\n    with requests.get(url, stream=True) as r:\n        total_size = int(r.headers.get(\"content-length\", 0))\n        with tqdm(total=total_size, unit=\"B\", unit_scale=True) as pbar:\n            with open(local_path, \"wb\") as f:\n                for data in r.iter_content(chunk_size=chunk_size):\n                    if data:\n                        f.write(data)\n                        pbar.update(chunk_size)\n\n\ndef md5_hash(path):\n    with open(path, \"rb\") as f:\n        content = f.read()\n    return hashlib.md5(content).hexdigest()\n\n\ndef get_ckpt_path(name, root, check=False):\n    assert name in URL_MAP\n    path = os.path.join(root, CKPT_MAP[name])\n    if not os.path.exists(path) or (check and not md5_hash(path) == MD5_MAP[name]):\n        print(\"Downloading {} model from {} to {}\".format(name, URL_MAP[name], path))\n        download(URL_MAP[name], path)\n        md5 = md5_hash(path)\n        assert md5 == MD5_MAP[name], md5\n    return path\n\n\nclass NetLinLayer(nn.Module):\n    \"\"\" A single linear layer which does a 1x1 conv \"\"\"\n\n    def __init__(self, chn_in, chn_out=1, use_dropout=False):\n        super(NetLinLayer, self).__init__()\n        layers = [nn.Dropout(), ] if (use_dropout) else []\n        layers += [nn.Conv2d(chn_in, chn_out, 1, stride=1, padding=0, bias=False), ]\n        self.model = nn.Sequential(*layers)\n\n\nclass PerceptualLoss(nn.Module):\n    def __init__(self, ptype, depth_scale, log_scale=True, use_dropout=True, lpips=False, p_loss='l1'):\n        super().__init__()\n        self.depth_scale = depth_scale\n        self.log_scale = log_scale\n\n        if p_loss == \"l1\":\n            self.p_loss = l1\n        else:\n            self.p_loss = l2\n\n        self.chns = PERCEPTUAL_TYPE[ptype]\n        self.return_list = [name for name, _ in self.chns]\n        self.loss_scale = [5.0, 3.39, 2.29, 1.61, 0.895]  # predefined based on the loss of each stage after a few epochs (refer )\n        self.net = build_model('kitti', 'rangenet')\n        self.lin_list = nn.ModuleList([NetLinLayer(ch, use_dropout=use_dropout) for _, ch in self.chns]) if lpips else None\n        for param in self.parameters():\n            param.requires_grad = False\n\n    @staticmethod\n    def normalize_tensor(x, eps=1e-10):\n        norm_factor = torch.sqrt(torch.sum(x ** 2, dim=1, keepdim=True))\n        return x / (norm_factor + eps)\n\n    @staticmethod\n    def spatial_average(x, keepdim=True):\n        return x.mean([2, 3], keepdim=keepdim)\n\n    def preprocess(self, *inputs):\n        assert len(inputs) == 2, 'input with both depth images and coord images'\n        depth_img, xyz_img = inputs\n\n        
# scale to standard rangenet input\n        depth_img = (depth_img * 0.5 + 0.5) * self.depth_scale\n        if self.log_scale:\n            depth_img = torch.exp2(depth_img) - 1\n\n        img = torch.cat([depth_img, xyz_img], 1)\n        return img\n\n    def forward(self, target, input):\n        in0_input, in1_input = self.preprocess(*input), self.preprocess(*target)\n        outs0, outs1 = self.net(in0_input, return_list=self.return_list), self.net(in1_input, return_list=self.return_list)\n\n        val_list = []\n        for i, (name, _) in enumerate(self.chns):\n            feats0, feats1 = self.normalize_tensor(outs0[name].to(in0_input.device)), \\\n                             self.normalize_tensor(outs1[name].to(in0_input.device))\n            diffs = self.p_loss(feats0, feats1)\n            res = self.lin_list[i].model(diffs) if self.lin_list is not None else diffs.mean(1, keepdim=True)\n            res = self.spatial_average(res, keepdim=True) * self.loss_scale[i]\n            val_list.append(res)\n        val = sum(val_list)\n        return val\n"
  },
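`PerceptualLoss` compares RangeNet feature maps of the target and the reconstruction: each map is channel-normalized, differenced with `l1`/`l2`, reduced over channels (or through a learned 1x1 layer when `lpips=True`), spatially averaged, and scaled per stage by `loss_scale`. The sketch below reproduces that distance on random stand-in features, so the RangeNet backbone does not need to be loaded.

```python
# Self-contained sketch of the LPIPS-style distance used in PerceptualLoss,
# applied to random stand-in feature maps instead of RangeNet activations.
import torch

def normalize_tensor(x, eps=1e-10):
    norm = torch.sqrt(torch.sum(x ** 2, dim=1, keepdim=True))
    return x / (norm + eps)

def spatial_average(x):
    return x.mean([2, 3], keepdim=True)

feats0 = torch.randn(2, 32, 64, 256)               # features of the reconstruction
feats1 = feats0 + 0.1 * torch.randn_like(feats0)   # features of the target

diff = torch.abs(normalize_tensor(feats0) - normalize_tensor(feats1))   # l1, as in the module
stage_dist = spatial_average(diff.mean(1, keepdim=True)) * 5.0          # per-stage scale
print(stage_dist.squeeze())                         # one scalar per sample
```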
  {
    "path": "lidm/modules/losses/vqperceptual.py",
    "content": "import torch\nfrom torch import nn\n\nfrom . import weights_init, l1, l2, hinge_d_loss, vanilla_d_loss, measure_perplexity, square_dist_loss\nfrom .geometric import GeoConverter\nfrom .discriminator import NLayerDiscriminator, LiDARNLayerDiscriminator, LiDARNLayerDiscriminatorV2\nfrom .perceptual import PerceptualLoss\n\nVERSION2DISC = {'v0': NLayerDiscriminator, 'v1': LiDARNLayerDiscriminator, 'v2': LiDARNLayerDiscriminatorV2}\n\n\nclass VQGeoLPIPSWithDiscriminator(nn.Module):\n    def __init__(self, disc_start, codebook_weight=1.0, pixelloss_weight=1.0,\n                 disc_num_layers=3, disc_in_channels=3, disc_out_channels=1, disc_factor=1.0, disc_weight=1.0,\n                 mask_factor=0.0, use_actnorm=False, disc_conditional=False,\n                 disc_ndf=64, disc_loss=\"hinge\", n_classes=None, pixel_loss=\"l1\", disc_version='v1',\n                 geo_factor=1.0, curve_length=4, perceptual_factor=1.0, perceptual_type='rangenet_final',\n                 dataset_config=dict()):\n        super().__init__()\n        assert disc_loss in [\"hinge\", \"vanilla\"]\n        assert pixel_loss in [\"l1\", \"l2\"]\n        self.codebook_weight = codebook_weight\n        self.pixel_weight = pixelloss_weight\n        self.mask_factor = mask_factor\n        self.geo_factor = geo_factor\n\n        # scale of reconstruction loss\n        self.rec_scale = 1\n        if mask_factor > 0:\n            self.rec_scale += 1.\n        if geo_factor > 0:\n            self.rec_scale += 1.\n        if perceptual_factor > 0:\n            self.rec_scale += 1.\n\n        if pixel_loss == \"l1\":\n            self.pixel_loss = l1\n        else:\n            self.pixel_loss = l2\n\n        self.perceptual_factor = perceptual_factor\n        if perceptual_factor > 0.:\n            print(f\"{self.__class__.__name__}: Running with LPIPS.\")\n            self.perceptual_loss = PerceptualLoss(perceptual_type, dataset_config.depth_scale,\n                                                  dataset_config.log_scale).eval()\n\n        disc_cls = VERSION2DISC[disc_version]\n        self.discriminator = disc_cls(input_nc=disc_in_channels,\n                                      output_nc=disc_out_channels,\n                                      n_layers=disc_num_layers,\n                                      use_actnorm=use_actnorm,\n                                      ndf=disc_ndf).apply(weights_init)\n        self.discriminator_iter_start = disc_start\n        if disc_loss == \"hinge\":\n            self.disc_loss = hinge_d_loss\n        elif disc_loss == \"vanilla\":\n            self.disc_loss = vanilla_d_loss\n        else:\n            raise ValueError(f\"Unknown GAN loss '{disc_loss}'.\")\n        print(f\"VQGeoLPIPSWithDiscriminator running with {disc_loss} loss.\")\n        self.disc_factor = disc_factor\n        self.discriminator_weight = disc_weight\n        self.disc_conditional = disc_conditional\n        self.n_classes = n_classes\n\n        self.geometry_converter = GeoConverter(curve_length, False, dataset_config)  # force converting xyz output\n        self.geo_loss = square_dist_loss\n\n    def calculate_adaptive_weight(self, nll_loss, g_loss, last_layer=None):\n        if last_layer is not None:\n            nll_grads = torch.autograd.grad(nll_loss, last_layer, retain_graph=True)[0]\n            g_grads = torch.autograd.grad(g_loss, last_layer, retain_graph=True)[0]\n        else:\n            nll_grads = torch.autograd.grad(nll_loss, self.last_layer[0], retain_graph=True)[0]\n     
       g_grads = torch.autograd.grad(g_loss, self.last_layer[0], retain_graph=True)[0]\n\n        d_weight = torch.norm(nll_grads) / (torch.norm(g_grads) + 1e-4)\n        d_weight = torch.clamp(d_weight, 0.0, 1e4).detach()\n        d_weight = d_weight * self.discriminator_weight\n        return d_weight\n\n    def forward(self, codebook_loss, inputs, reconstructions, optimizer_idx,\n                global_step, last_layer=None, cond=None, split=\"train\", predicted_indices=None, masks=None):\n        input_coord = self.geometry_converter(inputs)\n        rec_coord = self.geometry_converter(reconstructions[:, 0:1].contiguous())\n\n        ############# Reconstruction #############\n        # pixel reconstruction loss\n        if self.mask_factor > 0 and masks is not None:\n            pixel_rec_loss = self.pixel_loss(inputs.contiguous(), reconstructions[:, 0:1].contiguous())\n            mask_rec_loss = self.pixel_loss(masks.contiguous(), reconstructions[:, 1:2].contiguous()) * self.mask_factor\n        else:\n            pixel_rec_loss = self.pixel_loss(inputs.contiguous(), reconstructions.contiguous())\n            mask_rec_loss = torch.tensor(0.0)\n\n        # geometry reconstruction loss (bev)\n        if self.geo_factor > 0:\n            geo_rec_loss = self.geo_loss(input_coord[:, :2], rec_coord[:, :2]) * self.geo_factor\n        else:\n            geo_rec_loss = torch.tensor(0.0)\n\n        # perceptual loss\n        if self.perceptual_factor > 0:\n            perceptual_loss = self.perceptual_loss((inputs.contiguous(), input_coord),\n                                                   (reconstructions[:, 0:1].contiguous(), rec_coord)) * self.perceptual_factor\n        else:\n            perceptual_loss = torch.tensor(0.0)\n\n        # overall reconstruction loss\n        rec_loss = (pixel_rec_loss + mask_rec_loss + geo_rec_loss + perceptual_loss) / self.rec_scale\n        nll_loss = rec_loss\n        nll_loss = torch.mean(nll_loss)\n\n        ############# GAN #############\n        disc_factor = 0. 
if global_step < self.discriminator_iter_start else self.disc_factor\n        # update generator (input: img, mask, coord, [cond])\n        if optimizer_idx == 0:\n            disc_recons = reconstructions.contiguous()\n            if self.geo_factor > 0:\n                disc_recons = torch.cat((disc_recons, rec_coord[:, :2].contiguous()), dim=1)\n            if cond is not None and self.disc_conditional:\n                disc_recons = torch.cat((disc_recons, cond), dim=1)\n            logits_fake = self.discriminator(disc_recons)\n\n            # adversarial loss\n            g_loss = -torch.mean(logits_fake)\n\n            try:\n                d_weight = self.calculate_adaptive_weight(nll_loss, g_loss, last_layer=last_layer)\n            except RuntimeError:\n                assert not self.training\n                d_weight = torch.tensor(0.0)\n\n            loss = nll_loss + d_weight * disc_factor * g_loss + self.codebook_weight * codebook_loss.mean()\n\n            log = {\"{}/total_loss\".format(split): loss.clone().detach().mean(),\n                   \"{}/quant_loss\".format(split): codebook_loss.detach().mean(),\n                   \"{}/rec_loss\".format(split): rec_loss.detach().mean(),\n                   \"{}/pix_rec_loss\".format(split): pixel_rec_loss.detach().mean(),\n                   \"{}/geo_rec_loss\".format(split): geo_rec_loss.detach().mean(),\n                   \"{}/mask_rec_loss\".format(split): mask_rec_loss.detach().mean(),\n                   \"{}/perceptual_loss\".format(split): perceptual_loss.detach().mean(),\n                   \"{}/d_weight\".format(split): d_weight.detach(),\n                   \"{}/disc_factor\".format(split): torch.tensor(disc_factor),\n                   \"{}/g_loss\".format(split): g_loss.detach().mean()}\n\n            if predicted_indices is not None:\n                assert self.n_classes is not None\n                with torch.no_grad():\n                    perplexity, cluster_usage = measure_perplexity(predicted_indices, self.n_classes)\n                log[f\"{split}/perplexity\"] = perplexity\n                log[f\"{split}/cluster_usage\"] = cluster_usage\n            return loss, log\n\n        # update discriminator (input: img, mask, coord, [cond])\n        if optimizer_idx == 1:\n            disc_inputs, disc_recons = inputs.contiguous().detach(), reconstructions.contiguous().detach()\n            if self.mask_factor > 0 and masks is not None:\n                disc_inputs = torch.cat((disc_inputs, masks.contiguous().detach()), dim=1)\n            if self.geo_factor > 0:\n                disc_inputs = torch.cat((disc_inputs, input_coord[:, :2].contiguous()), dim=1)\n                disc_recons = torch.cat((disc_recons, rec_coord[:, :2].contiguous()), dim=1)\n            if cond is not None:\n                disc_inputs = torch.cat((disc_inputs, cond), dim=1)\n                disc_recons = torch.cat((disc_recons, cond), dim=1)\n            logits_real = self.discriminator(disc_inputs)\n            logits_fake = self.discriminator(disc_recons)\n\n            # gan loss\n            d_loss = self.disc_loss(logits_real, logits_fake) * disc_factor\n\n            log = {\"{}/disc_loss\".format(split): d_loss.clone().detach().mean(),\n                   \"{}/logits_real\".format(split): logits_real.detach().mean(),\n                   \"{}/logits_fake\".format(split): logits_fake.detach().mean()}\n\n            return d_loss, log\n"
  },
  {
    "path": "lidm/modules/minkowskinet/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/modules/minkowskinet/model.py",
    "content": "import torch\nimport torch.nn as nn\n\ntry:\n    import torchsparse\n    import torchsparse.nn as spnn\n    from ..ts import basic_blocks\nexcept ImportError:\n    raise Exception('Required ts lib. Reference: https://github.com/mit-han-lab/torchsparse/tree/v1.4.0')\n\n\nclass Model(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n\n        cr = config.model_params.cr\n        cs = config.model_params.layer_num\n        cs = [int(cr * x) for x in cs]\n\n        self.pres = self.vres = config.model_params.voxel_size\n        self.num_classes = config.model_params.num_class\n\n        self.stem = nn.Sequential(\n            spnn.Conv3d(config.model_params.input_dims, cs[0], kernel_size=3, stride=1),\n            spnn.BatchNorm(cs[0]), spnn.ReLU(True),\n            spnn.Conv3d(cs[0], cs[0], kernel_size=3, stride=1),\n            spnn.BatchNorm(cs[0]), spnn.ReLU(True))\n\n        self.stage1 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[0], cs[0], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[0], cs[1], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[1], cs[1], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage2 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[1], cs[1], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[1], cs[2], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[2], cs[2], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage3 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[2], cs[2], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[2], cs[3], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[3], cs[3], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage4 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[3], cs[3], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[3], cs[4], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[4], cs[4], ks=3, stride=1, dilation=1),\n        )\n\n        self.up1 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[4], cs[5], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[5] + cs[3], cs[5], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[5], cs[5], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up2 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[5], cs[6], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[6] + cs[2], cs[6], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[6], cs[6], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up3 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[6], cs[7], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[7] + cs[1], cs[7], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[7], cs[7], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up4 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[7], cs[8], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[8] + 
cs[0], cs[8], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[8], cs[8], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.classifier = nn.Sequential(nn.Linear(cs[8], self.num_classes))\n\n        self.weight_initialization()\n        self.dropout = nn.Dropout(0.3, True)\n\n    def weight_initialization(self):\n        for m in self.modules():\n            if isinstance(m, nn.BatchNorm1d):\n                nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n\n    def forward(self, data_dict, return_logits=False, return_final_logits=False):\n        x = data_dict['lidar']\n        x.C = x.C.int()\n\n        x0 = self.stem(x)\n        x1 = self.stage1(x0)\n        x2 = self.stage2(x1)\n        x3 = self.stage3(x2)\n        x4 = self.stage4(x3)\n\n        if return_logits:\n            output_dict = dict()\n            output_dict['logits'] = x4.F\n            output_dict['batch_indices'] = x4.C[:, -1]\n            return output_dict\n\n        y1 = self.up1[0](x4)\n        y1 = torchsparse.cat([y1, x3])\n        y1 = self.up1[1](y1)\n\n        y2 = self.up2[0](y1)\n        y2 = torchsparse.cat([y2, x2])\n        y2 = self.up2[1](y2)\n\n        y3 = self.up3[0](y2)\n        y3 = torchsparse.cat([y3, x1])\n        y3 = self.up3[1](y3)\n\n        y4 = self.up4[0](y3)\n        y4 = torchsparse.cat([y4, x0])\n        y4 = self.up4[1](y4)\n        if return_final_logits:\n            output_dict = dict()\n            output_dict['logits'] = y4.F\n            output_dict['coords'] = y4.C[:, :3]\n            output_dict['batch_indices'] = y4.C[:, -1]\n            return output_dict\n\n        # classifier operates on dense per-point features, so it returns a plain tensor\n        output = self.classifier(y4.F)\n        data_dict['output'] = output\n\n        return data_dict\n"
  },
  {
    "path": "lidm/modules/rangenet/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/modules/rangenet/model.py",
    "content": "#!/usr/bin/env python3\n# This file is covered by the LICENSE file in the root of this project.\nfrom collections import OrderedDict\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass BasicBlock(nn.Module):\n    def __init__(self, inplanes, planes, bn_d=0.1):\n        super(BasicBlock, self).__init__()\n        self.conv1 = nn.Conv2d(inplanes, planes[0], kernel_size=1,\n                               stride=1, padding=0, bias=False)\n        self.bn1 = nn.BatchNorm2d(planes[0], momentum=bn_d)\n        self.relu1 = nn.LeakyReLU(0.1)\n        self.conv2 = nn.Conv2d(planes[0], planes[1], kernel_size=3,\n                               stride=1, padding=1, bias=False)\n        self.bn2 = nn.BatchNorm2d(planes[1], momentum=bn_d)\n        self.relu2 = nn.LeakyReLU(0.1)\n\n    def forward(self, x):\n        residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu1(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu2(out)\n\n        out += residual\n        return out\n\n\n# ******************************************************************************\n\n# number of layers per model\nmodel_blocks = {\n    21: [1, 1, 2, 2, 1],\n    53: [1, 2, 8, 8, 4],\n}\n\n\nclass Backbone(nn.Module):\n    \"\"\"\n       Class for DarknetSeg. Subclasses PyTorch's own \"nn\" module\n    \"\"\"\n\n    def __init__(self, params):\n        super(Backbone, self).__init__()\n        self.use_range = params[\"input_depth\"][\"range\"]\n        self.use_xyz = params[\"input_depth\"][\"xyz\"]\n        self.use_remission = params[\"input_depth\"][\"remission\"]\n        self.drop_prob = params[\"dropout\"]\n        self.bn_d = params[\"bn_d\"]\n        self.OS = params[\"OS\"]\n        self.layers = params[\"extra\"][\"layers\"]\n\n        # input depth calc\n        self.input_depth = 0\n        self.input_idxs = []\n        if self.use_range:\n            self.input_depth += 1\n            self.input_idxs.append(0)\n        if self.use_xyz:\n            self.input_depth += 3\n            self.input_idxs.extend([1, 2, 3])\n        if self.use_remission:\n            self.input_depth += 1\n            self.input_idxs.append(4)\n\n        # stride play\n        self.strides = [2, 2, 2, 2, 2]\n        # check current stride\n        current_os = 1\n        for s in self.strides:\n            current_os *= s\n\n        # make the new stride\n        if self.OS > current_os:\n            print(\"Can't do OS, \", self.OS,\n                  \" because it is bigger than original \", current_os)\n        else:\n            # redo strides according to needed stride\n            for i, stride in enumerate(reversed(self.strides), 0):\n                if int(current_os) != self.OS:\n                    if stride == 2:\n                        current_os /= 2\n                        self.strides[-1 - i] = 1\n                    if int(current_os) == self.OS:\n                        break\n\n        # check that darknet exists\n        assert self.layers in model_blocks.keys()\n\n        # generate layers depending on darknet type\n        self.blocks = model_blocks[self.layers]\n\n        # input layer\n        self.conv1 = nn.Conv2d(self.input_depth, 32, kernel_size=3,\n                               stride=1, padding=1, bias=False)\n        self.bn1 = nn.BatchNorm2d(32, momentum=self.bn_d)\n        self.relu1 = nn.LeakyReLU(0.1)\n\n        # encoder\n        self.enc1 = self._make_enc_layer(BasicBlock, 
[32, 64], self.blocks[0],\n                                         stride=self.strides[0], bn_d=self.bn_d)\n        self.enc2 = self._make_enc_layer(BasicBlock, [64, 128], self.blocks[1],\n                                         stride=self.strides[1], bn_d=self.bn_d)\n        self.enc3 = self._make_enc_layer(BasicBlock, [128, 256], self.blocks[2],\n                                         stride=self.strides[2], bn_d=self.bn_d)\n        self.enc4 = self._make_enc_layer(BasicBlock, [256, 512], self.blocks[3],\n                                         stride=self.strides[3], bn_d=self.bn_d)\n        self.enc5 = self._make_enc_layer(BasicBlock, [512, 1024], self.blocks[4],\n                                         stride=self.strides[4], bn_d=self.bn_d)\n\n        # for a bit of fun\n        self.dropout = nn.Dropout2d(self.drop_prob)\n\n        # last channels\n        self.last_channels = 1024\n\n    # make layer useful function\n    def _make_enc_layer(self, block, planes, blocks, stride, bn_d=0.1):\n        layers = []\n\n        #  downsample\n        layers.append((\"conv\", nn.Conv2d(planes[0], planes[1],\n                                         kernel_size=3,\n                                         stride=[1, stride], dilation=1,\n                                         padding=1, bias=False)))\n        layers.append((\"bn\", nn.BatchNorm2d(planes[1], momentum=bn_d)))\n        layers.append((\"relu\", nn.LeakyReLU(0.1)))\n\n        #  blocks\n        inplanes = planes[1]\n        for i in range(0, blocks):\n            layers.append((\"residual_{}\".format(i),\n                           block(inplanes, planes, bn_d)))\n\n        return nn.Sequential(OrderedDict(layers))\n\n    def run_layer(self, x, layer, skips, os):\n        y = layer(x)\n        if y.shape[2] < x.shape[2] or y.shape[3] < x.shape[3]:\n            skips[os] = x.detach()\n            os *= 2\n        x = y\n        return x, skips, os\n\n    def forward(self, x, return_logits=False, return_list=None):\n        # filter input\n        x = x[:, self.input_idxs]\n\n        # run cnn\n        # store for skip connections\n        skips = {}\n        out_dict = {}\n        os = 1\n\n        # first layer\n        x, skips, os = self.run_layer(x, self.conv1, skips, os)\n        x, skips, os = self.run_layer(x, self.bn1, skips, os)\n        x, skips, os = self.run_layer(x, self.relu1, skips, os)\n        if return_list and 'enc_0' in return_list:\n            out_dict['enc_0'] = x.detach().cpu()  # 32, 64, 1024\n\n        # all encoder blocks with intermediate dropouts\n        x, skips, os = self.run_layer(x, self.enc1, skips, os)\n        if return_list and 'enc_1' in return_list:\n            out_dict['enc_1'] = x.detach().cpu()  # 64, 64, 512\n        x, skips, os = self.run_layer(x, self.dropout, skips, os)\n\n        x, skips, os = self.run_layer(x, self.enc2, skips, os)\n        if return_list and 'enc_2' in return_list:\n            out_dict['enc_2'] = x.detach().cpu()  # 128, 64, 256\n        x, skips, os = self.run_layer(x, self.dropout, skips, os)\n\n        x, skips, os = self.run_layer(x, self.enc3, skips, os)\n        if return_list and 'enc_3' in return_list:\n            out_dict['enc_3'] = x.detach().cpu()  # 256, 64, 128\n        x, skips, os = self.run_layer(x, self.dropout, skips, os)\n\n        x, skips, os = self.run_layer(x, self.enc4, skips, os)\n        if return_list and 'enc_4' in return_list:\n            out_dict['enc_4'] = x.detach().cpu()  # 512, 64, 64\n        x, skips, os = 
self.run_layer(x, self.dropout, skips, os)\n\n        x, skips, os = self.run_layer(x, self.enc5, skips, os)\n        if return_list and 'enc_5' in return_list:\n            out_dict['enc_5'] = x.detach().cpu()  # 1024, 64, 32\n        if return_logits:\n            return x\n\n        x, skips, os = self.run_layer(x, self.dropout, skips, os)\n\n        if return_list is not None:\n            return x, skips, out_dict\n        return x, skips\n\n    def get_last_depth(self):\n        return self.last_channels\n\n    def get_input_depth(self):\n        return self.input_depth\n\n\nclass Decoder(nn.Module):\n    \"\"\"\n       Class for DarknetSeg. Subclasses PyTorch's own \"nn\" module\n    \"\"\"\n\n    def __init__(self, params, OS=32, feature_depth=1024):\n        super(Decoder, self).__init__()\n        self.backbone_OS = OS\n        self.backbone_feature_depth = feature_depth\n        self.drop_prob = params[\"dropout\"]\n        self.bn_d = params[\"bn_d\"]\n        self.index = 0\n\n        # stride play\n        self.strides = [2, 2, 2, 2, 2]\n        # check current stride\n        current_os = 1\n        for s in self.strides:\n            current_os *= s\n        # redo strides according to needed stride\n        for i, stride in enumerate(self.strides):\n            if int(current_os) != self.backbone_OS:\n                if stride == 2:\n                    current_os /= 2\n                    self.strides[i] = 1\n                if int(current_os) == self.backbone_OS:\n                    break\n\n        # decoder\n        self.dec5 = self._make_dec_layer(BasicBlock,\n                                         [self.backbone_feature_depth, 512],\n                                         bn_d=self.bn_d,\n                                         stride=self.strides[0])\n        self.dec4 = self._make_dec_layer(BasicBlock, [512, 256], bn_d=self.bn_d,\n                                         stride=self.strides[1])\n        self.dec3 = self._make_dec_layer(BasicBlock, [256, 128], bn_d=self.bn_d,\n                                         stride=self.strides[2])\n        self.dec2 = self._make_dec_layer(BasicBlock, [128, 64], bn_d=self.bn_d,\n                                         stride=self.strides[3])\n        self.dec1 = self._make_dec_layer(BasicBlock, [64, 32], bn_d=self.bn_d,\n                                         stride=self.strides[4])\n\n        # layer list to execute with skips\n        self.layers = [self.dec5, self.dec4, self.dec3, self.dec2, self.dec1]\n\n        # for a bit of fun\n        self.dropout = nn.Dropout2d(self.drop_prob)\n\n        # last channels\n        self.last_channels = 32\n\n    def _make_dec_layer(self, block, planes, bn_d=0.1, stride=2):\n        layers = []\n\n        #  downsample\n        if stride == 2:\n            layers.append((\"upconv\", nn.ConvTranspose2d(planes[0], planes[1],\n                                                        kernel_size=[1, 4], stride=[1, 2],\n                                                        padding=[0, 1])))\n        else:\n            layers.append((\"conv\", nn.Conv2d(planes[0], planes[1],\n                                             kernel_size=3, padding=1)))\n        layers.append((\"bn\", nn.BatchNorm2d(planes[1], momentum=bn_d)))\n        layers.append((\"relu\", nn.LeakyReLU(0.1)))\n\n        #  blocks\n        layers.append((\"residual\", block(planes[1], planes, bn_d)))\n\n        return nn.Sequential(OrderedDict(layers))\n\n    def run_layer(self, x, layer, skips, os):\n        feats = 
layer(x)  # up\n        if feats.shape[-1] > x.shape[-1]:\n            os //= 2  # match skip\n            feats = feats + skips[os].detach()  # add skip\n        x = feats\n        return x, skips, os\n\n    def forward(self, x, skips, return_logits=False, return_list=None):\n        os = self.backbone_OS\n        out_dict = {}\n\n        # run layers\n        x, skips, os = self.run_layer(x, self.dec5, skips, os)\n        if return_list and 'dec_4' in return_list:\n            out_dict['dec_4'] = x.detach().cpu()  # 512, 64, 64\n        x, skips, os = self.run_layer(x, self.dec4, skips, os)\n        if return_list and 'dec_3' in return_list:\n            out_dict['dec_3'] = x.detach().cpu()  # 256, 64, 128\n        x, skips, os = self.run_layer(x, self.dec3, skips, os)\n        if return_list and 'dec_2' in return_list:\n            out_dict['dec_2'] = x.detach().cpu()  # 128, 64, 256\n        x, skips, os = self.run_layer(x, self.dec2, skips, os)\n        if return_list and 'dec_1' in return_list:\n            out_dict['dec_1'] = x.detach().cpu()  # 64, 64, 512\n        x, skips, os = self.run_layer(x, self.dec1, skips, os)\n        if return_list and 'dec_0' in return_list:\n            out_dict['dec_0'] = x.detach().cpu()  # 32, 64, 1024\n\n        logits = torch.clone(x).detach()\n        x = self.dropout(x)\n\n        if return_logits:\n            return x, logits\n        if return_list is not None:\n            return out_dict\n        return x\n\n    def get_last_depth(self):\n        return self.last_channels\n\n\nclass Model(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        self.config = config\n        self.backbone = Backbone(params=self.config[\"backbone\"])\n        self.decoder = Decoder(params=self.config[\"decoder\"], OS=self.config[\"backbone\"][\"OS\"],\n                               feature_depth=self.backbone.get_last_depth())\n\n    def load_pretrained_weights(self, path):\n        w_dict = torch.load(path + \"/backbone\",\n                            map_location=lambda storage, loc: storage)\n        self.backbone.load_state_dict(w_dict, strict=True)\n        w_dict = torch.load(path + \"/segmentation_decoder\",\n                            map_location=lambda storage, loc: storage)\n        self.decoder.load_state_dict(w_dict, strict=True)\n\n    def forward(self, x, return_logits=False, return_final_logits=False, return_list=None, agg_type='depth'):\n        if return_logits:\n            logits = self.backbone(x, return_logits)\n            logits = F.adaptive_avg_pool2d(logits, (1, 1)).squeeze()\n            logits = torch.clone(logits).detach().cpu().numpy()\n            return logits\n        elif return_list is not None:\n            x, skips, enc_dict = self.backbone(x, return_list=return_list)\n            dec_dict = self.decoder(x, skips, return_list=return_list)\n            out_dict = {**enc_dict, **dec_dict}\n            return out_dict\n        elif return_final_logits:\n            assert agg_type in ['all', 'sector', 'depth']\n            y, skips = self.backbone(x)\n            y, logits = self.decoder(y, skips, True)\n\n            B, C, H, W = logits.shape\n            N = 16\n\n            # avg all\n            if agg_type == 'all':\n                logits = logits.mean([2, 3])\n            # avg in patch\n            elif agg_type == 'sector':\n                logits = logits.view(B, C, H, N, W // N).mean([2, 4]).reshape(B, -1)\n            # avg in row\n            elif agg_type == 'depth':\n                
logits = logits.view(B, C, N, H // N, W).mean([3, 4]).reshape(B, -1)\n\n            logits = torch.clone(logits).detach().cpu().numpy()\n            return logits\n        else:\n            y, skips = self.backbone(x)\n            y = self.decoder(y, skips, False)\n            return y\n"
  },
  {
    "path": "lidm/modules/spvcnn/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/modules/spvcnn/model.py",
    "content": "import torch.nn as nn\n\ntry:\n    import torchsparse\n    import torchsparse.nn as spnn\n    from torchsparse import PointTensor\n    from ..ts.utils import initial_voxelize, point_to_voxel, voxel_to_point\n    from ..ts import basic_blocks\nexcept ImportError:\n    raise Exception('Required torchsparse lib. Reference: https://github.com/mit-han-lab/torchsparse/tree/v1.4.0')\n\n\nclass Model(nn.Module):\n    def __init__(self, config):\n        super().__init__()\n        cr = config.model_params.cr\n        cs = config.model_params.layer_num\n        cs = [int(cr * x) for x in cs]\n\n        self.pres = self.vres = config.model_params.voxel_size\n        self.num_classes = config.model_params.num_class\n\n        self.stem = nn.Sequential(\n            spnn.Conv3d(config.model_params.input_dims, cs[0], kernel_size=3, stride=1),\n            spnn.BatchNorm(cs[0]), spnn.ReLU(True),\n            spnn.Conv3d(cs[0], cs[0], kernel_size=3, stride=1),\n            spnn.BatchNorm(cs[0]), spnn.ReLU(True))\n\n        self.stage1 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[0], cs[0], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[0], cs[1], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[1], cs[1], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage2 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[1], cs[1], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[1], cs[2], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[2], cs[2], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage3 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[2], cs[2], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[2], cs[3], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[3], cs[3], ks=3, stride=1, dilation=1),\n        )\n\n        self.stage4 = nn.Sequential(\n            basic_blocks.BasicConvolutionBlock(cs[3], cs[3], ks=2, stride=2, dilation=1),\n            basic_blocks.ResidualBlock(cs[3], cs[4], ks=3, stride=1, dilation=1),\n            basic_blocks.ResidualBlock(cs[4], cs[4], ks=3, stride=1, dilation=1),\n        )\n\n        self.up1 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[4], cs[5], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[5] + cs[3], cs[5], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[5], cs[5], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up2 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[5], cs[6], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[6] + cs[2], cs[6], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[6], cs[6], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up3 = nn.ModuleList([\n            basic_blocks.BasicDeconvolutionBlock(cs[6], cs[7], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[7] + cs[1], cs[7], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[7], cs[7], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.up4 = nn.ModuleList([\n            
basic_blocks.BasicDeconvolutionBlock(cs[7], cs[8], ks=2, stride=2),\n            nn.Sequential(\n                basic_blocks.ResidualBlock(cs[8] + cs[0], cs[8], ks=3, stride=1,\n                                           dilation=1),\n                basic_blocks.ResidualBlock(cs[8], cs[8], ks=3, stride=1, dilation=1),\n            )\n        ])\n\n        self.classifier = nn.Sequential(nn.Linear(cs[8], self.num_classes))\n\n        self.point_transforms = nn.ModuleList([\n            nn.Sequential(\n                nn.Linear(cs[0], cs[4]),\n                nn.BatchNorm1d(cs[4]),\n                nn.ReLU(True),\n            ),\n            nn.Sequential(\n                nn.Linear(cs[4], cs[6]),\n                nn.BatchNorm1d(cs[6]),\n                nn.ReLU(True),\n            ),\n            nn.Sequential(\n                nn.Linear(cs[6], cs[8]),\n                nn.BatchNorm1d(cs[8]),\n                nn.ReLU(True),\n            )\n        ])\n\n        self.weight_initialization()\n        self.dropout = nn.Dropout(0.3, True)\n\n    def weight_initialization(self):\n        for m in self.modules():\n            if isinstance(m, nn.BatchNorm1d):\n                nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n\n    def forward(self, data_dict, return_logits=False, return_final_logits=False):\n        x = data_dict['lidar']\n\n        # x: SparseTensor z: PointTensor\n        z = PointTensor(x.F, x.C.float())\n\n        x0 = initial_voxelize(z, self.pres, self.vres)\n\n        x0 = self.stem(x0)\n        z0 = voxel_to_point(x0, z, nearest=False)\n        z0.F = z0.F\n\n        x1 = point_to_voxel(x0, z0)\n        x1 = self.stage1(x1)\n        x2 = self.stage2(x1)\n        x3 = self.stage3(x2)\n        x4 = self.stage4(x3)\n        z1 = voxel_to_point(x4, z0)\n        z1.F = z1.F + self.point_transforms[0](z0.F)\n\n        y1 = point_to_voxel(x4, z1)\n\n        if return_logits:\n            output_dict = dict()\n            output_dict['logits'] = y1.F\n            output_dict['batch_indices'] = y1.C[:, -1]\n            return output_dict\n\n        y1.F = self.dropout(y1.F)\n        y1 = self.up1[0](y1)\n        y1 = torchsparse.cat([y1, x3])\n        y1 = self.up1[1](y1)\n\n        y2 = self.up2[0](y1)\n        y2 = torchsparse.cat([y2, x2])\n        y2 = self.up2[1](y2)\n        z2 = voxel_to_point(y2, z1)\n        z2.F = z2.F + self.point_transforms[1](z1.F)\n\n        y3 = point_to_voxel(y2, z2)\n        y3.F = self.dropout(y3.F)\n        y3 = self.up3[0](y3)\n        y3 = torchsparse.cat([y3, x1])\n        y3 = self.up3[1](y3)\n\n        y4 = self.up4[0](y3)\n        y4 = torchsparse.cat([y4, x0])\n        y4 = self.up4[1](y4)\n        z3 = voxel_to_point(y4, z2)\n        z3.F = z3.F + self.point_transforms[2](z2.F)\n\n        if return_final_logits:\n            output_dict = dict()\n            output_dict['logits'] = z3.F\n            output_dict['coords'] = z3.C[:, :3]\n            output_dict['batch_indices'] = z3.C[:, -1].long()\n            return output_dict\n\n        # output = self.classifier(z3.F)\n        data_dict['logits'] = z3.F\n\n        return data_dict\n"
  },
  {
    "path": "lidm/modules/ts/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/modules/ts/basic_blocks.py",
    "content": "#!/usr/bin/env python\n# encoding: utf-8\n'''\n@author: Xu Yan\n@file: basic_blocks.py\n@time: 2021/4/14 22:53\n'''\nimport torch.nn as nn\nimport torchsparse.nn as spnn\n\n\nclass BasicConvolutionBlock(nn.Module):\n    def __init__(self, inc, outc, ks=3, stride=1, dilation=1):\n        super().__init__()\n        self.net = nn.Sequential(\n            spnn.Conv3d(\n                inc,\n                outc,\n                kernel_size=ks,\n                dilation=dilation,\n                stride=stride), spnn.BatchNorm(outc),\n            spnn.ReLU(True))\n\n    def forward(self, x):\n        out = self.net(x)\n        return out\n\n\nclass BasicDeconvolutionBlock(nn.Module):\n    def __init__(self, inc, outc, ks=3, stride=1):\n        super().__init__()\n        self.net = nn.Sequential(\n            spnn.Conv3d(\n                inc,\n                outc,\n                kernel_size=ks,\n                stride=stride,\n                transposed=True),\n            spnn.BatchNorm(outc),\n            spnn.ReLU(True))\n\n    def forward(self, x):\n        return self.net(x)\n\n\nclass ResidualBlock(nn.Module):\n    def __init__(self, inc, outc, ks=3, stride=1, dilation=1):\n        super().__init__()\n        self.net = nn.Sequential(\n            spnn.Conv3d(\n                inc,\n                outc,\n                kernel_size=ks,\n                dilation=dilation,\n                stride=stride), spnn.BatchNorm(outc),\n            spnn.ReLU(True),\n            spnn.Conv3d(\n                outc,\n                outc,\n                kernel_size=ks,\n                dilation=dilation,\n                stride=1),\n            spnn.BatchNorm(outc))\n\n        self.downsample = nn.Sequential() if (inc == outc and stride == 1) else \\\n            nn.Sequential(\n                spnn.Conv3d(inc, outc, kernel_size=1, dilation=1, stride=stride),\n                spnn.BatchNorm(outc)\n            )\n\n        self.ReLU = spnn.ReLU(True)\n\n    def forward(self, x):\n        out = self.ReLU(self.net(x) + self.downsample(x))\n        return out\n"
  },
  {
    "path": "lidm/modules/ts/utils.py",
    "content": "import torch\nimport torchsparse.nn.functional as F\nfrom torchsparse import PointTensor, SparseTensor\nfrom torchsparse.nn.utils import get_kernel_offsets\n\n__all__ = ['initial_voxelize', 'point_to_voxel', 'voxel_to_point']\n\n\n# z: PointTensor\n# return: SparseTensor\ndef initial_voxelize(z, init_res, after_res):\n    new_float_coord = torch.cat([(z.C[:, :3] * init_res) / after_res, z.C[:, -1].view(-1, 1)], 1)\n\n    pc_hash = F.sphash(torch.floor(new_float_coord).int())\n    sparse_hash = torch.unique(pc_hash)\n    idx_query = F.sphashquery(pc_hash, sparse_hash)\n    counts = F.spcount(idx_query.int(), len(sparse_hash))\n\n    inserted_coords = F.spvoxelize(torch.floor(new_float_coord), idx_query, counts)\n    inserted_coords = torch.round(inserted_coords).int()\n    inserted_feat = F.spvoxelize(z.F, idx_query, counts)\n\n    new_tensor = SparseTensor(inserted_feat, inserted_coords, 1)\n    new_tensor.cmaps.setdefault(new_tensor.stride, new_tensor.coords)\n    z.additional_features['idx_query'][1] = idx_query\n    z.additional_features['counts'][1] = counts\n    z.C = new_float_coord\n\n    return new_tensor\n\n\n# x: SparseTensor, z: PointTensor\n# return: SparseTensor\ndef point_to_voxel(x, z):\n    if z.additional_features is None or \\\n            z.additional_features.get('idx_query') is None or \\\n            z.additional_features['idx_query'].get(x.s) is None:\n        pc_hash = F.sphash(\n            torch.cat([torch.floor(z.C[:, :3] / x.s[0]).int() * x.s[0], z.C[:, -1].int().view(-1, 1)], 1))\n        sparse_hash = F.sphash(x.C)\n        idx_query = F.sphashquery(pc_hash, sparse_hash)\n        counts = F.spcount(idx_query.int(), x.C.shape[0])\n        z.additional_features['idx_query'][x.s] = idx_query\n        z.additional_features['counts'][x.s] = counts\n    else:\n        idx_query = z.additional_features['idx_query'][x.s]\n        counts = z.additional_features['counts'][x.s]\n\n    inserted_feat = F.spvoxelize(z.F, idx_query, counts)\n    new_tensor = SparseTensor(inserted_feat, x.C, x.s)\n    new_tensor.cmaps = x.cmaps\n    new_tensor.kmaps = x.kmaps\n\n    return new_tensor\n\n\n# x: SparseTensor, z: PointTensor\n# return: PointTensor\ndef voxel_to_point(x, z, nearest=False):\n    if z.idx_query is None or z.weights is None or z.idx_query.get(x.s) is None or z.weights.get(x.s) is None:\n        off = get_kernel_offsets(2, x.s, 1, device=z.F.device)\n        old_hash = F.sphash(\n            torch.cat([\n                torch.floor(z.C[:, :3] / x.s[0]).int() * x.s[0],\n                z.C[:, -1].int().view(-1, 1)], 1), off)\n        pc_hash = F.sphash(x.C.to(z.F.device))\n        idx_query = F.sphashquery(old_hash, pc_hash)\n        weights = F.calc_ti_weights(z.C, idx_query, scale=x.s[0]).transpose(0, 1).contiguous()\n        idx_query = idx_query.transpose(0, 1).contiguous()\n        if nearest:\n            weights[:, 1:] = 0.\n            idx_query[:, 1:] = -1\n        new_feat = F.spdevoxelize(x.F, idx_query, weights)\n        new_tensor = PointTensor(new_feat, z.C, idx_query=z.idx_query, weights=z.weights)\n        new_tensor.additional_features = z.additional_features\n        new_tensor.idx_query[x.s] = idx_query\n        new_tensor.weights[x.s] = weights\n        z.idx_query[x.s] = idx_query\n        z.weights[x.s] = weights\n\n    else:\n        new_feat = F.spdevoxelize(x.F, z.idx_query.get(x.s), z.weights.get(x.s))\n        new_tensor = PointTensor(new_feat, z.C, idx_query=z.idx_query, weights=z.weights)\n        
new_tensor.additional_features = z.additional_features\n\n    return new_tensor\n"
  },
  {
    "path": "lidm/modules/x_transformer.py",
    "content": "\"\"\"shout-out to https://github.com/lucidrains/x-transformers/tree/main/x_transformers\"\"\"\nimport torch\nfrom torch import nn, einsum\nimport torch.nn.functional as F\nfrom functools import partial\nfrom inspect import isfunction\nfrom collections import namedtuple\nfrom einops import rearrange, repeat, reduce\n\n# constants\n\nDEFAULT_DIM_HEAD = 64\n\nIntermediates = namedtuple('Intermediates', [\n    'pre_softmax_attn',\n    'post_softmax_attn'\n])\n\nLayerIntermediates = namedtuple('Intermediates', [\n    'hiddens',\n    'attn_intermediates'\n])\n\n\nclass AbsolutePositionalEmbedding(nn.Module):\n    def __init__(self, dim, max_seq_len):\n        super().__init__()\n        self.emb = nn.Embedding(max_seq_len, dim)\n        self.init_()\n\n    def init_(self):\n        nn.init.normal_(self.emb.weight, std=0.02)\n\n    def forward(self, x):\n        n = torch.arange(x.shape[1], device=x.device)\n        return self.emb(n)[None, :, :]\n\n\nclass FixedPositionalEmbedding(nn.Module):\n    def __init__(self, dim):\n        super().__init__()\n        inv_freq = 1. / (10000 ** (torch.arange(0, dim, 2).float() / dim))\n        self.register_buffer('inv_freq', inv_freq)\n\n    def forward(self, x, seq_dim=1, offset=0):\n        t = torch.arange(x.shape[seq_dim], device=x.device).type_as(self.inv_freq) + offset\n        sinusoid_inp = torch.einsum('i , j -> i j', t, self.inv_freq)\n        emb = torch.cat((sinusoid_inp.sin(), sinusoid_inp.cos()), dim=-1)\n        return emb[None, :, :]\n\n\n# helpers\n\ndef exists(val):\n    return val is not None\n\n\ndef default(val, d):\n    if exists(val):\n        return val\n    return d() if isfunction(d) else d\n\n\ndef always(val):\n    def inner(*args, **kwargs):\n        return val\n\n    return inner\n\n\ndef not_equals(val):\n    def inner(x):\n        return x != val\n\n    return inner\n\n\ndef equals(val):\n    def inner(x):\n        return x == val\n\n    return inner\n\n\ndef max_neg_value(tensor):\n    return -torch.finfo(tensor.dtype).max\n\n\n# keyword argument helpers\n\ndef pick_and_pop(keys, d):\n    values = list(map(lambda key: d.pop(key), keys))\n    return dict(zip(keys, values))\n\n\ndef group_dict_by_key(cond, d):\n    return_val = [dict(), dict()]\n    for key in d.keys():\n        match = bool(cond(key))\n        ind = int(not match)\n        return_val[ind][key] = d[key]\n    return (*return_val,)\n\n\ndef string_begins_with(prefix, str):\n    return str.startswith(prefix)\n\n\ndef group_by_key_prefix(prefix, d):\n    return group_dict_by_key(partial(string_begins_with, prefix), d)\n\n\ndef groupby_prefix_and_trim(prefix, d):\n    kwargs_with_prefix, kwargs = group_dict_by_key(partial(string_begins_with, prefix), d)\n    kwargs_without_prefix = dict(map(lambda x: (x[0][len(prefix):], x[1]), tuple(kwargs_with_prefix.items())))\n    return kwargs_without_prefix, kwargs\n\n\n# classes\nclass Scale(nn.Module):\n    def __init__(self, value, fn):\n        super().__init__()\n        self.value = value\n        self.fn = fn\n\n    def forward(self, x, **kwargs):\n        x, *rest = self.fn(x, **kwargs)\n        return (x * self.value, *rest)\n\n\nclass Rezero(nn.Module):\n    def __init__(self, fn):\n        super().__init__()\n        self.fn = fn\n        self.g = nn.Parameter(torch.zeros(1))\n\n    def forward(self, x, **kwargs):\n        x, *rest = self.fn(x, **kwargs)\n        return (x * self.g, *rest)\n\n\nclass ScaleNorm(nn.Module):\n    def __init__(self, dim, eps=1e-5):\n        super().__init__()\n       
 self.scale = dim ** -0.5\n        self.eps = eps\n        self.g = nn.Parameter(torch.ones(1))\n\n    def forward(self, x):\n        norm = torch.norm(x, dim=-1, keepdim=True) * self.scale\n        return x / norm.clamp(min=self.eps) * self.g\n\n\nclass RMSNorm(nn.Module):\n    def __init__(self, dim, eps=1e-8):\n        super().__init__()\n        self.scale = dim ** -0.5\n        self.eps = eps\n        self.g = nn.Parameter(torch.ones(dim))\n\n    def forward(self, x):\n        norm = torch.norm(x, dim=-1, keepdim=True) * self.scale\n        return x / norm.clamp(min=self.eps) * self.g\n\n\nclass Residual(nn.Module):\n    def forward(self, x, residual):\n        return x + residual\n\n\nclass GRUGating(nn.Module):\n    def __init__(self, dim):\n        super().__init__()\n        self.gru = nn.GRUCell(dim, dim)\n\n    def forward(self, x, residual):\n        gated_output = self.gru(\n            rearrange(x, 'b n d -> (b n) d'),\n            rearrange(residual, 'b n d -> (b n) d')\n        )\n\n        return gated_output.reshape_as(x)\n\n\n# feedforward\n\nclass GEGLU(nn.Module):\n    def __init__(self, dim_in, dim_out):\n        super().__init__()\n        self.proj = nn.Linear(dim_in, dim_out * 2)\n\n    def forward(self, x):\n        x, gate = self.proj(x).chunk(2, dim=-1)\n        return x * F.gelu(gate)\n\n\nclass FeedForward(nn.Module):\n    def __init__(self, dim, dim_out=None, mult=4, glu=False, dropout=0.):\n        super().__init__()\n        inner_dim = int(dim * mult)\n        dim_out = default(dim_out, dim)\n        project_in = nn.Sequential(\n            nn.Linear(dim, inner_dim),\n            nn.GELU()\n        ) if not glu else GEGLU(dim, inner_dim)\n\n        self.net = nn.Sequential(\n            project_in,\n            nn.Dropout(dropout),\n            nn.Linear(inner_dim, dim_out)\n        )\n\n    def forward(self, x):\n        return self.net(x)\n\n\n# attention.\nclass Attention(nn.Module):\n    def __init__(\n            self,\n            dim,\n            dim_head=DEFAULT_DIM_HEAD,\n            heads=8,\n            causal=False,\n            mask=None,\n            talking_heads=False,\n            sparse_topk=None,\n            use_entmax15=False,\n            num_mem_kv=0,\n            dropout=0.,\n            on_attn=False\n    ):\n        super().__init__()\n        if use_entmax15:\n            raise NotImplementedError(\"Check out entmax activation instead of softmax activation!\")\n        self.scale = dim_head ** -0.5\n        self.heads = heads\n        self.causal = causal\n        self.mask = mask\n\n        inner_dim = dim_head * heads\n\n        self.to_q = nn.Linear(dim, inner_dim, bias=False)\n        self.to_k = nn.Linear(dim, inner_dim, bias=False)\n        self.to_v = nn.Linear(dim, inner_dim, bias=False)\n        self.dropout = nn.Dropout(dropout)\n\n        # talking heads\n        self.talking_heads = talking_heads\n        if talking_heads:\n            self.pre_softmax_proj = nn.Parameter(torch.randn(heads, heads))\n            self.post_softmax_proj = nn.Parameter(torch.randn(heads, heads))\n\n        # explicit topk sparse attention\n        self.sparse_topk = sparse_topk\n\n        # entmax\n        # self.attn_fn = entmax15 if use_entmax15 else F.softmax\n        self.attn_fn = F.softmax\n\n        # add memory key / values\n        self.num_mem_kv = num_mem_kv\n        if num_mem_kv > 0:\n            self.mem_k = nn.Parameter(torch.randn(heads, num_mem_kv, dim_head))\n            self.mem_v = nn.Parameter(torch.randn(heads, 
num_mem_kv, dim_head))\n\n        # attention on attention\n        self.attn_on_attn = on_attn\n        self.to_out = nn.Sequential(nn.Linear(inner_dim, dim * 2), nn.GLU()) if on_attn else nn.Linear(inner_dim, dim)\n\n    def forward(\n            self,\n            x,\n            context=None,\n            mask=None,\n            context_mask=None,\n            rel_pos=None,\n            sinusoidal_emb=None,\n            prev_attn=None,\n            mem=None\n    ):\n        b, n, _, h, talking_heads, device = *x.shape, self.heads, self.talking_heads, x.device\n        kv_input = default(context, x)\n\n        q_input = x\n        k_input = kv_input\n        v_input = kv_input\n\n        if exists(mem):\n            k_input = torch.cat((mem, k_input), dim=-2)\n            v_input = torch.cat((mem, v_input), dim=-2)\n\n        if exists(sinusoidal_emb):\n            # in shortformer, the query would start at a position offset depending on the past cached memory\n            offset = k_input.shape[-2] - q_input.shape[-2]\n            q_input = q_input + sinusoidal_emb(q_input, offset=offset)\n            k_input = k_input + sinusoidal_emb(k_input)\n\n        q = self.to_q(q_input)\n        k = self.to_k(k_input)\n        v = self.to_v(v_input)\n\n        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h=h), (q, k, v))\n\n        input_mask = None\n        if any(map(exists, (mask, context_mask))):\n            q_mask = default(mask, lambda: torch.ones((b, n), device=device).bool())\n            k_mask = q_mask if not exists(context) else context_mask\n            k_mask = default(k_mask, lambda: torch.ones((b, k.shape[-2]), device=device).bool())\n            q_mask = rearrange(q_mask, 'b i -> b () i ()')\n            k_mask = rearrange(k_mask, 'b j -> b () () j')\n            input_mask = q_mask * k_mask\n\n        if self.num_mem_kv > 0:\n            mem_k, mem_v = map(lambda t: repeat(t, 'h n d -> b h n d', b=b), (self.mem_k, self.mem_v))\n            k = torch.cat((mem_k, k), dim=-2)\n            v = torch.cat((mem_v, v), dim=-2)\n            if exists(input_mask):\n                input_mask = F.pad(input_mask, (self.num_mem_kv, 0), value=True)\n\n        dots = einsum('b h i d, b h j d -> b h i j', q, k) * self.scale\n        mask_value = max_neg_value(dots)\n\n        if exists(prev_attn):\n            dots = dots + prev_attn\n\n        pre_softmax_attn = dots\n\n        if talking_heads:\n            dots = einsum('b h i j, h k -> b k i j', dots, self.pre_softmax_proj).contiguous()\n\n        if exists(rel_pos):\n            dots = rel_pos(dots)\n\n        if exists(input_mask):\n            dots.masked_fill_(~input_mask, mask_value)\n            del input_mask\n\n        if self.causal:\n            i, j = dots.shape[-2:]\n            r = torch.arange(i, device=device)\n            mask = rearrange(r, 'i -> () () i ()') < rearrange(r, 'j -> () () () j')\n            mask = F.pad(mask, (j - i, 0), value=False)\n            dots.masked_fill_(mask, mask_value)\n            del mask\n\n        if exists(self.sparse_topk) and self.sparse_topk < dots.shape[-1]:\n            top, _ = dots.topk(self.sparse_topk, dim=-1)\n            vk = top[..., -1].unsqueeze(-1).expand_as(dots)\n            mask = dots < vk\n            dots.masked_fill_(mask, mask_value)\n            del mask\n\n        attn = self.attn_fn(dots, dim=-1)\n        post_softmax_attn = attn\n\n        attn = self.dropout(attn)\n\n        if talking_heads:\n            attn = einsum('b h i j, h k -> b k i j', 
attn, self.post_softmax_proj).contiguous()\n\n        out = einsum('b h i j, b h j d -> b h i d', attn, v)\n        out = rearrange(out, 'b h n d -> b n (h d)')\n\n        intermediates = Intermediates(\n            pre_softmax_attn=pre_softmax_attn,\n            post_softmax_attn=post_softmax_attn\n        )\n\n        return self.to_out(out), intermediates\n\n\nclass AttentionLayers(nn.Module):\n    def __init__(\n            self,\n            dim,\n            depth,\n            heads=8,\n            causal=False,\n            cross_attend=False,\n            only_cross=False,\n            use_scalenorm=False,\n            use_rmsnorm=False,\n            use_rezero=False,\n            rel_pos_num_buckets=32,\n            rel_pos_max_distance=128,\n            position_infused_attn=False,\n            custom_layers=None,\n            sandwich_coef=None,\n            par_ratio=None,\n            residual_attn=False,\n            cross_residual_attn=False,\n            macaron=False,\n            pre_norm=True,\n            gate_residual=False,\n            **kwargs\n    ):\n        super().__init__()\n        ff_kwargs, kwargs = groupby_prefix_and_trim('ff_', kwargs)\n        attn_kwargs, _ = groupby_prefix_and_trim('attn_', kwargs)\n\n        dim_head = attn_kwargs.get('dim_head', DEFAULT_DIM_HEAD)\n\n        self.dim = dim\n        self.depth = depth\n        self.layers = nn.ModuleList([])\n\n        self.has_pos_emb = position_infused_attn\n        self.pia_pos_emb = FixedPositionalEmbedding(dim) if position_infused_attn else None\n        self.rotary_pos_emb = always(None)\n\n        assert rel_pos_num_buckets <= rel_pos_max_distance, 'number of relative position buckets must be less than the relative position max distance'\n        self.rel_pos = None\n\n        self.pre_norm = pre_norm\n\n        self.residual_attn = residual_attn\n        self.cross_residual_attn = cross_residual_attn\n\n        norm_class = ScaleNorm if use_scalenorm else nn.LayerNorm\n        norm_class = RMSNorm if use_rmsnorm else norm_class\n        norm_fn = partial(norm_class, dim)\n\n        norm_fn = nn.Identity if use_rezero else norm_fn\n        branch_fn = Rezero if use_rezero else None\n\n        if cross_attend and not only_cross:\n            default_block = ('a', 'c', 'f')\n        elif cross_attend and only_cross:\n            default_block = ('c', 'f')\n        else:\n            default_block = ('a', 'f')\n\n        if macaron:\n            default_block = ('f',) + default_block\n\n        if exists(custom_layers):\n            layer_types = custom_layers\n        elif exists(par_ratio):\n            par_depth = depth * len(default_block)\n            assert 1 < par_ratio <= par_depth, 'par ratio out of range'\n            default_block = tuple(filter(not_equals('f'), default_block))\n            par_attn = par_depth // par_ratio\n            depth_cut = par_depth * 2 // 3  # 2 / 3 attention layer cutoff suggested by PAR paper\n            par_width = (depth_cut + depth_cut // par_attn) // par_attn\n            assert len(default_block) <= par_width, 'default block is too large for par_ratio'\n            par_block = default_block + ('f',) * (par_width - len(default_block))\n            par_head = par_block * par_attn\n            layer_types = par_head + ('f',) * (par_depth - len(par_head))\n        elif exists(sandwich_coef):\n            assert sandwich_coef > 0 and sandwich_coef <= depth, 'sandwich coefficient should be less than the depth'\n            layer_types = ('a',) * sandwich_coef 
+ default_block * (depth - sandwich_coef) + ('f',) * sandwich_coef\n        else:\n            layer_types = default_block * depth\n\n        self.layer_types = layer_types\n        self.num_attn_layers = len(list(filter(equals('a'), layer_types)))\n\n        for layer_type in self.layer_types:\n            if layer_type == 'a':\n                layer = Attention(dim, heads=heads, causal=causal, **attn_kwargs)\n            elif layer_type == 'c':\n                layer = Attention(dim, heads=heads, **attn_kwargs)\n            elif layer_type == 'f':\n                layer = FeedForward(dim, **ff_kwargs)\n                layer = layer if not macaron else Scale(0.5, layer)\n            else:\n                raise Exception(f'invalid layer type {layer_type}')\n\n            if isinstance(layer, Attention) and exists(branch_fn):\n                layer = branch_fn(layer)\n\n            if gate_residual:\n                residual_fn = GRUGating(dim)\n            else:\n                residual_fn = Residual()\n\n            self.layers.append(nn.ModuleList([\n                norm_fn(),\n                layer,\n                residual_fn\n            ]))\n\n    def forward(\n            self,\n            x,\n            context=None,\n            mask=None,\n            context_mask=None,\n            mems=None,\n            return_hiddens=False\n    ):\n        hiddens = []\n        intermediates = []\n        prev_attn = None\n        prev_cross_attn = None\n\n        mems = mems.copy() if exists(mems) else [None] * self.num_attn_layers\n\n        for ind, (layer_type, (norm, block, residual_fn)) in enumerate(zip(self.layer_types, self.layers)):\n            is_last = ind == (len(self.layers) - 1)\n\n            if layer_type == 'a':\n                hiddens.append(x)\n                layer_mem = mems.pop(0)\n\n            residual = x\n\n            if self.pre_norm:\n                x = norm(x)\n\n            if layer_type == 'a':\n                out, inter = block(x, mask=mask, sinusoidal_emb=self.pia_pos_emb, rel_pos=self.rel_pos,\n                                   prev_attn=prev_attn, mem=layer_mem)\n            elif layer_type == 'c':\n                out, inter = block(x, context=context, mask=mask, context_mask=context_mask, prev_attn=prev_cross_attn)\n            elif layer_type == 'f':\n                out = block(x)\n\n            x = residual_fn(out, residual)\n\n            if layer_type in ('a', 'c'):\n                intermediates.append(inter)\n\n            if layer_type == 'a' and self.residual_attn:\n                prev_attn = inter.pre_softmax_attn\n            elif layer_type == 'c' and self.cross_residual_attn:\n                prev_cross_attn = inter.pre_softmax_attn\n\n            if not self.pre_norm and not is_last:\n                x = norm(x)\n\n        if return_hiddens:\n            intermediates = LayerIntermediates(\n                hiddens=hiddens,\n                attn_intermediates=intermediates\n            )\n\n            return x, intermediates\n\n        return x\n\n\nclass Encoder(AttentionLayers):\n    def __init__(self, **kwargs):\n        assert 'causal' not in kwargs, 'cannot set causality on encoder'\n        super().__init__(causal=False, **kwargs)\n\n\nclass TransformerWrapper(nn.Module):\n    def __init__(\n            self,\n            *,\n            num_tokens,\n            max_seq_len,\n            attn_layers,\n            emb_dim=None,\n            max_mem_len=0.,\n            emb_dropout=0.,\n            num_memory_tokens=None,\n    
        tie_embedding=False,\n            use_pos_emb=True\n    ):\n        super().__init__()\n        assert isinstance(attn_layers, AttentionLayers), 'attention layers must be one of Encoder or Decoder'\n\n        dim = attn_layers.dim\n        emb_dim = default(emb_dim, dim)\n\n        self.max_seq_len = max_seq_len\n        self.max_mem_len = max_mem_len\n        self.num_tokens = num_tokens\n\n        self.token_emb = nn.Embedding(num_tokens, emb_dim)\n        self.pos_emb = AbsolutePositionalEmbedding(emb_dim, max_seq_len) if (\n                use_pos_emb and not attn_layers.has_pos_emb) else always(0)\n        self.emb_dropout = nn.Dropout(emb_dropout)\n\n        self.project_emb = nn.Linear(emb_dim, dim) if emb_dim != dim else nn.Identity()\n        self.attn_layers = attn_layers\n        self.norm = nn.LayerNorm(dim)\n\n        self.init_()\n\n        self.to_logits = nn.Linear(dim, num_tokens) if not tie_embedding else lambda t: t @ self.token_emb.weight.t()\n\n        # memory tokens (like [cls]) from Memory Transformers paper\n        num_memory_tokens = default(num_memory_tokens, 0)\n        self.num_memory_tokens = num_memory_tokens\n        if num_memory_tokens > 0:\n            self.memory_tokens = nn.Parameter(torch.randn(num_memory_tokens, dim))\n\n            # let funnel encoder know number of memory tokens, if specified\n            if hasattr(attn_layers, 'num_memory_tokens'):\n                attn_layers.num_memory_tokens = num_memory_tokens\n\n    def init_(self):\n        nn.init.normal_(self.token_emb.weight, std=0.02)\n\n    def forward(\n            self,\n            x,\n            return_embeddings=False,\n            mask=None,\n            return_mems=False,\n            return_attn=False,\n            mems=None,\n            **kwargs\n    ):\n        b, n, device, num_mem = *x.shape, x.device, self.num_memory_tokens\n        x = self.token_emb(x)\n        x += self.pos_emb(x)\n        x = self.emb_dropout(x)\n\n        x = self.project_emb(x)\n\n        if num_mem > 0:\n            mem = repeat(self.memory_tokens, 'n d -> b n d', b=b)\n            x = torch.cat((mem, x), dim=1)\n\n            # auto-handle masking after appending memory tokens\n            if exists(mask):\n                mask = F.pad(mask, (num_mem, 0), value=True)\n\n        x, intermediates = self.attn_layers(x, mask=mask, mems=mems, return_hiddens=True, **kwargs)\n        x = self.norm(x)\n\n        mem, x = x[:, :num_mem], x[:, num_mem:]\n\n        out = self.to_logits(x) if not return_embeddings else x\n\n        if return_mems:\n            hiddens = intermediates.hiddens\n            new_mems = list(map(lambda pair: torch.cat(pair, dim=-2), zip(mems, hiddens))) if exists(mems) else hiddens\n            new_mems = list(map(lambda t: t[..., -self.max_mem_len:, :].detach(), new_mems))\n            return out, new_mems\n\n        if return_attn:\n            attn_maps = list(map(lambda t: t.post_softmax_attn, intermediates.attn_intermediates))\n            return out, attn_maps\n\n        return out\n"
  },
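  {
    "path": "examples/sketch_x_transformer_usage.py",
    "content": "# Editor's illustrative sketch, not part of the original codebase: minimal usage of the\n# TransformerWrapper/Encoder pair defined in the x_transformer module above. The import\n# path 'lidm.modules.x_transformer' and the dim/depth/heads values are assumptions for\n# illustration only.\nimport torch\n\nfrom lidm.modules.x_transformer import Encoder, TransformerWrapper\n\n# token-level encoder: embeddings -> 4 self-attention blocks -> logits over the vocab\nmodel = TransformerWrapper(\n    num_tokens=8192,\n    max_seq_len=256,\n    attn_layers=Encoder(dim=512, depth=4, heads=8),\n)\n\ntokens = torch.randint(0, 8192, (2, 256))\nlogits = model(tokens)  # (2, 256, 8192)\nfeats = model(tokens, return_embeddings=True)  # (2, 256, 512) pre-logits features\nprint(logits.shape, feats.shape)\n"
  },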
  {
    "path": "lidm/utils/__init__.py",
    "content": ""
  },
  {
    "path": "lidm/utils/aug_utils.py",
    "content": "import numpy as np\n\n\ndef get_lidar_transform(config, split):\n    transform_list = []\n    if config['rotate']:\n        transform_list.append(RandomRotateAligned())\n    if config['flip']:\n        transform_list.append(RandomFlip())\n    return Compose(transform_list) if len(transform_list) > 0 and split == 'train' else None\n\n\ndef get_camera_transform(config, split):\n    # import open_clip\n    # transform = open_clip.image_transform((224, 224), split == 'train', resize_longest_max=True)\n    # TODO\n    transform = None\n    return transform\n\n\ndef get_anno_transform(config, split):\n    if config['keypoint_drop'] and split == 'train':\n        drop_range = config['keypoint_drop_range'] if 'keypoint_drop_range' in config else (5, 60)\n        transform = RandomKeypointDrop(drop_range)\n    else:\n        transform = None\n    return transform\n\n\nclass Compose(object):\n    def __init__(self, transforms):\n        self.transforms = transforms\n\n    def __call__(self, pcd, pcd1=None):\n        for t in self.transforms:\n            pcd, pcd1 = t(pcd, pcd1)\n        return pcd, pcd1\n\n\nclass RandomFlip(object):\n    def __init__(self, p=1.):\n        self.p = p\n\n    def __call__(self, coord, coord1=None):\n        if np.random.rand() < self.p:\n            if np.random.rand() < 0.5:\n                coord[:, 0] = -coord[:, 0]\n                if coord1 is not None:\n                    coord1[:, 0] = -coord1[:, 0]\n            if np.random.rand() < 0.5:\n                coord[:, 1] = -coord[:, 1]\n                if coord1 is not None:\n                    coord1[:, 1] = -coord1[:, 1]\n        return coord, coord1\n\n\nclass RandomRotateAligned(object):\n    def __init__(self, rot=np.pi / 4, p=1.):\n        self.rot = rot\n        self.p = p\n\n    def __call__(self, coord, coord1=None):\n        if np.random.rand() < self.p:\n            angle_z = np.random.uniform(-self.rot, self.rot)\n            cos_z, sin_z = np.cos(angle_z), np.sin(angle_z)\n            R = np.array([[cos_z, -sin_z, 0], [sin_z, cos_z, 0], [0, 0, 1]])\n            coord = np.dot(coord, R)\n            if coord1 is not None:\n                coord1 = np.dot(coord1, R)\n        return coord, coord1\n\n\nclass RandomKeypointDrop(object):\n    def __init__(self, num_range=(5, 60), p=.5):\n        self.num_range = num_range\n        self.p = p\n\n    def __call__(self, center, category=None):\n        if np.random.rand() < self.p:\n            num = len(center)\n            if num > self.num_range[0]:\n                num_kept = np.random.randint(self.num_range[0], min(self.num_range[1], num))\n                idx_kept = np.random.choice(num, num_kept, replace=False)\n                center, category = center[idx_kept], category[idx_kept]\n        return center, category\n\n\n# class ResizeMaxSize(object):\n#     def __init__(self, max_size, interpolation=InterpolationMode.BICUBIC, fn='max', fill=0):\n#         super().__init__()\n#         if not isinstance(max_size, int):\n#             raise TypeError(f\"Size should be int. 
Got {type(max_size)}\")\n#         self.max_size = max_size\n#         self.interpolation = interpolation\n#         self.fn = min if fn == 'min' else max\n#         self.fill = fill\n#\n#     def forward(self, img):\n#         width, height = img.size\n#         scale = self.max_size / float(max(height, width))\n#         if scale != 1.0:\n#             new_size = tuple(round(dim * scale) for dim in (height, width))\n#             img = F.resize(img, new_size, self.interpolation)\n#             pad_h = self.max_size - new_size[0]\n#             pad_w = self.max_size - new_size[1]\n#             img = F.pad(img, padding=[pad_w//2, pad_h//2, pad_w - pad_w//2, pad_h - pad_h//2], fill=self.fill)\n#         return img\n"
  },
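  {
    "path": "examples/sketch_aug_usage.py",
    "content": "# Editor's illustrative sketch, not part of the original codebase: minimal usage of the\n# point-cloud augmentations in lidm/utils/aug_utils.py. The aug keys mirror the 'aug'\n# block of the template configs; the random point cloud is only a stand-in for real\n# LiDAR data.\nimport numpy as np\n\nfrom lidm.utils.aug_utils import get_lidar_transform\n\naug_config = {'rotate': True, 'flip': True}\ntransform = get_lidar_transform(aug_config, 'train')  # returns None for non-train splits\n\npcd = np.random.uniform(-40., 40., size=(10000, 3))\npcd_aug, _ = transform(pcd)  # second slot is an optional, jointly transformed cloud\nprint(pcd_aug.shape)\n"
  },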
  {
    "path": "lidm/utils/lidar_utils.py",
    "content": "import math\n\nimport numpy as np\n\n\ndef pcd2coord2d(pcd, fov, depth_range, labels=None):\n    # laser parameters\n    fov_up = fov[0] / 180.0 * np.pi  # field of view up in rad\n    fov_down = fov[1] / 180.0 * np.pi  # field of view down in rad\n    fov_range = abs(fov_down) + abs(fov_up)  # get field of view total in rad\n\n    # get depth (distance) of all points\n    depth = np.linalg.norm(pcd, 2, axis=-1)\n\n    # mask points out of range\n    mask = np.logical_and(depth > depth_range[0], depth < depth_range[1])\n    if pcd.ndim == 3:\n        mask = mask.all(axis=1)\n    depth, pcd = depth[mask], pcd[mask]\n\n    # get scan components\n    scan_x, scan_y, scan_z = pcd[..., 0], pcd[..., 1], pcd[..., 2]\n\n    # get angles of all points\n    yaw = -np.arctan2(scan_y, scan_x)\n    pitch = np.arcsin(scan_z / depth)\n\n    # get projections in image coords\n    proj_x = np.clip(0.5 * (yaw / np.pi + 1.0), 0., 1.)  # in [0.0, 1.0]\n    proj_y = np.clip(1.0 - (pitch + abs(fov_down)) / fov_range, 0., 1.)  # in [0.0, 1.0]\n    proj_coord2d = np.stack([proj_x, proj_y], axis=-1)\n\n    if labels is not None:\n        proj_labels = labels[mask]\n    else:\n        proj_labels = None\n\n    return proj_coord2d, proj_labels\n\n\ndef pcd2range(pcd, size, fov, depth_range, remission=None, labels=None, **kwargs):\n    # laser parameters\n    fov_up = fov[0] / 180.0 * np.pi  # field of view up in rad\n    fov_down = fov[1] / 180.0 * np.pi  # field of view down in rad\n    fov_range = abs(fov_down) + abs(fov_up)  # get field of view total in rad\n\n    # get depth (distance) of all points\n    depth = np.linalg.norm(pcd, 2, axis=1)\n\n    # mask points out of range\n    mask = np.logical_and(depth > depth_range[0], depth < depth_range[1])\n    depth, pcd = depth[mask], pcd[mask]\n\n    # get scan components\n    scan_x, scan_y, scan_z = pcd[:, 0], pcd[:, 1], pcd[:, 2]\n\n    # get angles of all points\n    yaw = -np.arctan2(scan_y, scan_x)\n    pitch = np.arcsin(scan_z / depth)\n\n    # get projections in image coords\n    proj_x = 0.5 * (yaw / np.pi + 1.0)  # in [0.0, 1.0]\n    proj_y = 1.0 - (pitch + abs(fov_down)) / fov_range  # in [0.0, 1.0]\n\n    # scale to image size using angular resolution\n    proj_x *= size[1]  # in [0.0, W]\n    proj_y *= size[0]  # in [0.0, H]\n\n    # round and clamp for use as index\n    proj_x = np.maximum(0, np.minimum(size[1] - 1, np.floor(proj_x))).astype(np.int32)  # in [0,W-1]\n    proj_y = np.maximum(0, np.minimum(size[0] - 1, np.floor(proj_y))).astype(np.int32)  # in [0,H-1]\n\n    # order in decreasing depth\n    order = np.argsort(depth)[::-1]\n    proj_x, proj_y = proj_x[order], proj_y[order]\n\n    # project depth\n    depth = depth[order]\n    proj_range = np.full(size, -1, dtype=np.float32)\n    proj_range[proj_y, proj_x] = depth\n\n    # project point feature\n    if remission is not None:\n        remission = remission[mask][order]\n        proj_feature = np.full(size, -1, dtype=np.float32)\n        proj_feature[proj_y, proj_x] = remission\n    elif labels is not None:\n        labels = labels[mask][order]\n        proj_feature = np.full(size, 0, dtype=np.float32)\n        proj_feature[proj_y, proj_x] = labels\n    else:\n        proj_feature = None\n\n    return proj_range, proj_feature\n\n\ndef range2pcd(range_img, fov, depth_range, depth_scale, log_scale=True, label=None, color=None, **kwargs):\n    # laser parameters\n    size = range_img.shape\n    fov_up = fov[0] / 180.0 * np.pi  # field of view up in rad\n    fov_down = fov[1] / 
180.0 * np.pi  # field of view down in rad\n    fov_range = abs(fov_down) + abs(fov_up)  # get field of view total in rad\n\n    # inverse transform from depth\n    depth = (range_img * depth_scale).flatten()\n    if log_scale:\n        depth = np.exp2(depth) - 1\n\n    scan_x, scan_y = np.meshgrid(np.arange(size[1]), np.arange(size[0]))\n    scan_x = scan_x.astype(np.float64) / size[1]\n    scan_y = scan_y.astype(np.float64) / size[0]\n\n    yaw = (np.pi * (scan_x * 2 - 1)).flatten()\n    pitch = ((1.0 - scan_y) * fov_range - abs(fov_down)).flatten()\n\n    pcd = np.zeros((len(yaw), 3))\n    pcd[:, 0] = np.cos(yaw) * np.cos(pitch) * depth\n    pcd[:, 1] = -np.sin(yaw) * np.cos(pitch) * depth\n    pcd[:, 2] = np.sin(pitch) * depth\n\n    # mask out invalid points\n    mask = np.logical_and(depth > depth_range[0], depth < depth_range[1])\n    pcd = pcd[mask, :]\n\n    # label\n    if label is not None:\n        label = label.flatten()[mask]\n\n    # default point color\n    if color is not None:\n        color = color.reshape(-1, 3)[mask, :]\n    else:\n        color = np.ones((pcd.shape[0], 3)) * [0.7, 0.7, 1]\n\n    return pcd, color, label\n\n\ndef range2xyz(range_img, fov, depth_range, depth_scale, log_scale=True, **kwargs):\n    # laser parameters\n    size = range_img.shape\n    fov_up = fov[0] / 180.0 * np.pi  # field of view up in rad\n    fov_down = fov[1] / 180.0 * np.pi  # field of view down in rad\n    fov_range = abs(fov_down) + abs(fov_up)  # get field of view total in rad\n\n    # inverse transform from depth\n    if log_scale:\n        depth = (np.exp2(range_img * depth_scale) - 1)\n    else:\n        depth = range_img\n\n    scan_x, scan_y = np.meshgrid(np.arange(size[1]), np.arange(size[0]))\n    scan_x = scan_x.astype(np.float64) / size[1]\n    scan_y = scan_y.astype(np.float64) / size[0]\n\n    yaw = np.pi * (scan_x * 2 - 1)\n    pitch = (1.0 - scan_y) * fov_range - abs(fov_down)\n\n    xyz = -np.ones((3, *size))\n    xyz[0] = np.cos(yaw) * np.cos(pitch) * depth\n    xyz[1] = -np.sin(yaw) * np.cos(pitch) * depth\n    xyz[2] = np.sin(pitch) * depth\n\n    # mask out invalid points\n    mask = np.logical_and(depth > depth_range[0], depth < depth_range[1])\n    xyz[:, ~mask] = -1\n\n    return xyz\n\n\ndef pcd2bev(pcd, x_range, y_range, z_range, resolution, **kwargs):\n    # mask out invalid points\n    mask_x = np.logical_and(pcd[:, 0] > x_range[0], pcd[:, 0] < x_range[1])\n    mask_y = np.logical_and(pcd[:, 1] > y_range[0], pcd[:, 1] < y_range[1])\n    mask_z = np.logical_and(pcd[:, 2] > z_range[0], pcd[:, 2] < z_range[1])\n    mask = mask_x & mask_y & mask_z\n    pcd = pcd[mask]\n\n    # points to bev coords\n    bev_x = np.floor((pcd[:, 0] - x_range[0]) / resolution).astype(np.int32)\n    bev_y = np.floor((pcd[:, 1] - y_range[0]) / resolution).astype(np.int32)\n\n    # 2D bev grid\n    bev_shape = (math.ceil((x_range[1] - x_range[0]) // resolution), math.ceil((y_range[1] - y_range[0]) // resolution))\n    bev_grid = np.zeros(bev_shape, dtype=np.float64)\n\n    # populate the BEV grid with bev coords\n    bev_grid[bev_x, bev_y] = 1\n\n    return bev_grid\n\n\nif __name__ == '__main__':\n    # test = np.loadtxt('test_range.txt')\n    # pcd, _, _ = range2pcd(test, (32, 1024), (10, -30))\n    # np.savetxt('test_pcd.txt', pcd, fmt='%.4f')\n\n    # import matplotlib.pyplot as plt\n    # pcd = np.loadtxt('test_origin.txt')\n    # bev_grid = pcd2bev(pcd)\n    # plt.imshow(bev_grid[:, :, 0], cmap='gray')  # Display the BEV for the first height level\n    # 
plt.savefig('test.png', dpi=300, bbox_inches='tight', pad_inches=0, transparent=True)\n\n    from PIL import Image\n    img = Image.open('assets/kitti/range.png')\n    img = img.convert('L')\n    img = np.array(img) / 255.\n"
  },
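  {
    "path": "examples/sketch_range_projection.py",
    "content": "# Editor's illustrative sketch, not part of the original codebase: round-trip a point\n# cloud through the range-image projection in lidm/utils/lidar_utils.py. The size/fov/\n# depth settings mirror the 64-beam KITTI template config; the log2 depth encoding is\n# inferred from range2pcd's inverse transform and the 'np.log2(depth_max + 1)' comment\n# in the configs, so treat it as an assumption.\nimport numpy as np\n\nfrom lidm.utils.lidar_utils import pcd2range, range2pcd\n\nSIZE = (64, 1024)          # (H, W) of the range image\nFOV = (3, -25)             # vertical field of view in degrees (up, down)\nDEPTH_RANGE = (1.0, 56.0)  # valid depth interval in meters\nDEPTH_SCALE = 5.84         # ~= np.log2(56 + 1)\n\n# random stand-in point cloud; only points inside the valid depth interval are kept\npcd = np.random.uniform(-40., 40., size=(10000, 3))\n\n# project to a range image; pixels without a point stay at -1\nproj_range, _ = pcd2range(pcd, SIZE, FOV, DEPTH_RANGE)\n\n# normalize depth to [0, 1] with the assumed log2 encoding\nrange_img = np.zeros_like(proj_range)\nvalid = proj_range > 0\nrange_img[valid] = np.log2(proj_range[valid] + 1) / DEPTH_SCALE\n\n# back-project to a (masked) point cloud\npcd_rec, _, _ = range2pcd(range_img, FOV, DEPTH_RANGE, DEPTH_SCALE, log_scale=True)\nprint(proj_range.shape, pcd_rec.shape)\n"
  },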
  {
    "path": "lidm/utils/lr_scheduler.py",
    "content": "import numpy as np\n\n\nclass LambdaWarmUpCosineScheduler:\n    \"\"\"\n    note: use with a base_lr of 1.0\n    \"\"\"\n    def __init__(self, warm_up_steps, lr_min, lr_max, lr_start, max_decay_steps, verbosity_interval=0):\n        self.lr_warm_up_steps = warm_up_steps\n        self.lr_start = lr_start\n        self.lr_min = lr_min\n        self.lr_max = lr_max\n        self.lr_max_decay_steps = max_decay_steps\n        self.last_lr = 0.\n        self.verbosity_interval = verbosity_interval\n\n    def schedule(self, n, **kwargs):\n        if self.verbosity_interval > 0:\n            if n % self.verbosity_interval == 0: print(f\"current step: {n}, recent lr-multiplier: {self.last_lr}\")\n        if n < self.lr_warm_up_steps:\n            lr = (self.lr_max - self.lr_start) / self.lr_warm_up_steps * n + self.lr_start\n            self.last_lr = lr\n            return lr\n        else:\n            t = (n - self.lr_warm_up_steps) / (self.lr_max_decay_steps - self.lr_warm_up_steps)\n            t = min(t, 1.0)\n            lr = self.lr_min + 0.5 * (self.lr_max - self.lr_min) * (\n                    1 + np.cos(t * np.pi))\n            self.last_lr = lr\n            return lr\n\n    def __call__(self, n, **kwargs):\n        return self.schedule(n,**kwargs)\n\n\nclass LambdaWarmUpCosineScheduler2:\n    \"\"\"\n    supports repeated iterations, configurable via lists\n    note: use with a base_lr of 1.0.\n    \"\"\"\n    def __init__(self, warm_up_steps, f_min, f_max, f_start, cycle_lengths, verbosity_interval=0):\n        assert len(warm_up_steps) == len(f_min) == len(f_max) == len(f_start) == len(cycle_lengths)\n        self.lr_warm_up_steps = warm_up_steps\n        self.f_start = f_start\n        self.f_min = f_min\n        self.f_max = f_max\n        self.cycle_lengths = cycle_lengths\n        self.cum_cycles = np.cumsum([0] + list(self.cycle_lengths))\n        self.last_f = 0.\n        self.verbosity_interval = verbosity_interval\n\n    def find_in_interval(self, n):\n        interval = 0\n        for cl in self.cum_cycles[1:]:\n            if n <= cl:\n                return interval\n            interval += 1\n\n    def schedule(self, n, **kwargs):\n        cycle = self.find_in_interval(n)\n        n = n - self.cum_cycles[cycle]\n        if self.verbosity_interval > 0:\n            if n % self.verbosity_interval == 0: print(f\"current step: {n}, recent lr-multiplier: {self.last_f}, \"\n                                                       f\"current cycle {cycle}\")\n        if n < self.lr_warm_up_steps[cycle]:\n            f = (self.f_max[cycle] - self.f_start[cycle]) / self.lr_warm_up_steps[cycle] * n + self.f_start[cycle]\n            self.last_f = f\n            return f\n        else:\n            t = (n - self.lr_warm_up_steps[cycle]) / (self.cycle_lengths[cycle] - self.lr_warm_up_steps[cycle])\n            t = min(t, 1.0)\n            f = self.f_min[cycle] + 0.5 * (self.f_max[cycle] - self.f_min[cycle]) * (\n                    1 + np.cos(t * np.pi))\n            self.last_f = f\n            return f\n\n    def __call__(self, n, **kwargs):\n        return self.schedule(n, **kwargs)\n\n\nclass LambdaLinearScheduler(LambdaWarmUpCosineScheduler2):\n\n    def schedule(self, n, **kwargs):\n        cycle = self.find_in_interval(n)\n        n = n - self.cum_cycles[cycle]\n        if self.verbosity_interval > 0:\n            if n % self.verbosity_interval == 0: print(f\"current step: {n}, recent lr-multiplier: {self.last_f}, \"\n                                             
          f\"current cycle {cycle}\")\n\n        if n < self.lr_warm_up_steps[cycle]:\n            f = (self.f_max[cycle] - self.f_start[cycle]) / self.lr_warm_up_steps[cycle] * n + self.f_start[cycle]\n            self.last_f = f\n            return f\n        else:\n            f = self.f_min[cycle] + (self.f_max[cycle] - self.f_min[cycle]) * (self.cycle_lengths[cycle] - n) / (self.cycle_lengths[cycle])\n            self.last_f = f\n            return f\n\n"
  },
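  {
    "path": "examples/sketch_lr_schedule.py",
    "content": "# Editor's illustrative sketch, not part of the original codebase: hooking\n# LambdaLinearScheduler from lidm/utils/lr_scheduler.py into a PyTorch LambdaLR. As the\n# docstrings note, these schedulers return LR multipliers ('use with a base_lr of 1.0'),\n# so here they scale the optimizer's real learning rate. All numbers are placeholders.\nimport torch\n\nfrom lidm.utils.lr_scheduler import LambdaLinearScheduler\n\nmodel = torch.nn.Linear(8, 8)\noptimizer = torch.optim.Adam(model.parameters(), lr=4.5e-6)\n\n# a single cycle: 1k warm-up steps, then linear decay over the remaining steps\nschedule = LambdaLinearScheduler(\n    warm_up_steps=[1000], f_min=[0.0], f_max=[1.0], f_start=[1e-6], cycle_lengths=[40000],\n)\nlr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=schedule)\n\nfor step in range(3):\n    optimizer.step()\n    lr_scheduler.step()\n    print(step, lr_scheduler.get_last_lr())\n"
  },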
  {
    "path": "lidm/utils/misc_utils.py",
    "content": "import argparse\nimport importlib\nimport random\n\nimport torch\nimport numpy as np\nfrom collections import abc\nfrom einops import rearrange\nfrom functools import partial\n\nimport multiprocessing as mp\nfrom threading import Thread\nfrom queue import Queue\n\nfrom inspect import isfunction\nfrom PIL import Image, ImageDraw, ImageFont\n\n\ndef set_seed(seed):\n    \"\"\"\n    Setting of Global Seed for Reproducibility (for inference only)\n\n    refer to: https://pytorch.org/docs/stable/notes/randomness.html\n\n    \"\"\"\n    np.random.seed(seed)\n    random.seed(seed)\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed(seed)\n\n    torch.backends.cudnn.deterministic = True\n    torch.backends.cudnn.benchmark = False\n\n\ndef print_fn(msg, verbose):\n    if verbose:\n        print(msg)\n\n\ndef dict2namespace(config):\n    namespace = argparse.Namespace()\n    for key, value in config.items():\n        if isinstance(value, dict):\n            new_value = dict2namespace(value)\n        else:\n            new_value = value\n        setattr(namespace, key, new_value)\n    return namespace\n\n\ndef log_txt_as_img(wh, xc, size=10):\n    # wh a tuple of (width, height)\n    # xc a list of captions to plot\n    b = len(xc)\n    txts = list()\n    for bi in range(b):\n        txt = Image.new(\"RGB\", wh, color=\"white\")\n        draw = ImageDraw.Draw(txt)\n        font = ImageFont.truetype('data/DejaVuSans.ttf', size=size)\n        nc = int(40 * (wh[0] / 256))\n        lines = \"\\n\".join(xc[bi][start:start + nc] for start in range(0, len(xc[bi]), nc))\n\n        try:\n            draw.text((0, 0), lines, fill=\"black\", font=font)\n        except UnicodeEncodeError:\n            print(\"Cant encode string for logging. Skipping.\")\n\n        txt = np.array(txt).transpose(2, 0, 1) / 127.5 - 1.0\n        txts.append(txt)\n    txts = np.stack(txts)\n    txts = torch.tensor(txts)\n    return txts\n\n\ndef isdepth(x):\n    if not isinstance(x, (torch.Tensor, np.ndarray)):\n        return False\n    return ((len(x.shape) == 4) and (x.shape[1] == 1)) or (len(x.shape) == 3)\n\n\ndef ismap(x):\n    if not isinstance(x, (torch.Tensor, np.ndarray)):\n        return False\n    return (len(x.shape) == 4) and (x.shape[1] > 3)\n\n\ndef isimage(x):\n    if not isinstance(x, (torch.Tensor, np.ndarray)):\n        return False\n    return (len(x.shape) == 4) and (x.shape[1] == 3 or x.shape[1] == 1)\n\n\ndef exists(x):\n    return x is not None\n\n\ndef default(val, d):\n    if exists(val):\n        return val\n    return d() if isfunction(d) else d\n\n\ndef mean_flat(tensor):\n    \"\"\"\n    https://github.com/openai/guided-diffusion/blob/27c20a8fab9cb472df5d6bdd6c8d11c8f430b924/guided_diffusion/nn.py#L86\n    Take the mean over all non-batch dimensions.\n    \"\"\"\n    return tensor.mean(dim=list(range(1, len(tensor.shape))))\n\n\ndef count_params(model, verbose=False):\n    total_params = sum(p.numel() for p in model.parameters())\n    if verbose:\n        print(f\"{model.__class__.__name__} has {total_params * 1.e-6:.2f} M params.\")\n    return total_params\n\n\ndef instantiate_from_config(config):\n    if not \"target\" in config:\n        if config == '__is_first_stage__':\n            return None\n        elif config == \"__is_unconditional__\":\n            return None\n        raise KeyError(\"Expected key `target` to instantiate.\")\n    return get_obj_from_str(config[\"target\"])(**config.get(\"params\", dict()))\n\n\ndef get_obj_from_str(string, reload=False):\n    module, 
cls = string.rsplit(\".\", 1)\n    if reload:\n        module_imp = importlib.import_module(module)\n        importlib.reload(module_imp)\n    return getattr(importlib.import_module(module, package=None), cls)\n\n\ndef _do_parallel_data_prefetch(func, Q, data, idx, idx_to_fn=False):\n    # create dummy dataset instance\n\n    # run prefetching\n    if idx_to_fn:\n        res = func(data, worker_id=idx)\n    else:\n        res = func(data)\n    Q.put([idx, res])\n    Q.put(\"Done\")\n\n\ndef parallel_data_prefetch(\n        func: callable, data, n_proc, target_data_type=\"ndarray\", cpu_intensive=True, use_worker_id=False\n):\n    # if target_data_type not in [\"ndarray\", \"list\"]:\n    #     raise ValueError(\n    #         \"Data, which is passed to parallel_data_prefetch has to be either of type list or ndarray.\"\n    #     )\n    if isinstance(data, np.ndarray) and target_data_type == \"list\":\n        raise ValueError(\"list expected but function got ndarray.\")\n    elif isinstance(data, abc.Iterable):\n        if isinstance(data, dict):\n            print(\n                f'WARNING:\"data\" argument passed to parallel_data_prefetch is a dict: Using only its values and disregarding keys.'\n            )\n            data = list(data.values())\n        if target_data_type == \"ndarray\":\n            data = np.asarray(data)\n        else:\n            data = list(data)\n    else:\n        raise TypeError(\n            f\"The data, that shall be processed parallel has to be either an np.ndarray or an Iterable, but is actually {type(data)}.\"\n        )\n\n    if cpu_intensive:\n        Q = mp.Queue(1000)\n        proc = mp.Process\n    else:\n        Q = Queue(1000)\n        proc = Thread\n    # spawn processes\n    if target_data_type == \"ndarray\":\n        arguments = [\n            [func, Q, part, i, use_worker_id]\n            for i, part in enumerate(np.array_split(data, n_proc))\n        ]\n    else:\n        step = (\n            int(len(data) / n_proc + 1)\n            if len(data) % n_proc != 0\n            else int(len(data) / n_proc)\n        )\n        arguments = [\n            [func, Q, part, i, use_worker_id]\n            for i, part in enumerate(\n                [data[i: i + step] for i in range(0, len(data), step)]\n            )\n        ]\n    processes = []\n    for i in range(n_proc):\n        p = proc(target=_do_parallel_data_prefetch, args=arguments[i])\n        processes += [p]\n\n    # start processes\n    print(f\"Start prefetching...\")\n    import time\n\n    start = time.time()\n    gather_res = [[] for _ in range(n_proc)]\n    try:\n        for p in processes:\n            p.start()\n\n        k = 0\n        while k < n_proc:\n            # get result\n            res = Q.get()\n            if res == \"Done\":\n                k += 1\n            else:\n                gather_res[res[0]] = res[1]\n\n    except Exception as e:\n        print(\"Exception: \", e)\n        for p in processes:\n            p.terminate()\n\n        raise e\n    finally:\n        for p in processes:\n            p.join()\n        print(f\"Prefetching complete. 
[{time.time() - start} sec.]\")\n\n    if target_data_type == 'ndarray':\n        if not isinstance(gather_res[0], np.ndarray):\n            return np.concatenate([np.asarray(r) for r in gather_res], axis=0)\n\n        # order outputs\n        return np.concatenate(gather_res, axis=0)\n    elif target_data_type == 'list':\n        out = []\n        for r in gather_res:\n            out.extend(r)\n        return out\n    else:\n        return gather_res\n"
  },
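  {
    "path": "examples/sketch_instantiate_from_config.py",
    "content": "# Editor's illustrative sketch, not part of the original codebase: the config-driven\n# instantiation helper in lidm/utils/misc_utils.py resolves a dotted 'target' path and\n# passes 'params' as keyword arguments; main.py uses the same mechanism for models,\n# data modules and callbacks. torch.nn.Linear is only a stand-in target here.\nfrom lidm.utils.misc_utils import instantiate_from_config\n\nconfig = {\n    'target': 'torch.nn.Linear',\n    'params': {'in_features': 4, 'out_features': 2},\n}\nlayer = instantiate_from_config(config)\nprint(layer)  # Linear(in_features=4, out_features=2, bias=True)\n"
  },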
  {
    "path": "lidm/utils/model_utils.py",
    "content": "import os\n\nimport torch\nimport yaml\n\nfrom lidm.utils.misc_utils import dict2namespace\nfrom ..modules.rangenet.model import Model as rangenet\n\ntry:\n    from ..modules.spvcnn.model import Model as spvcnn\n    from ..modules.minkowskinet.model import Model as minkowskinet\nexcept:\n    print('To install torchsparse 1.4.0, please refer to https://github.com/mit-han-lab/torchsparse/tree/74099d10a51c71c14318bce63d6421f698b24f24')\n\nDEFAULT_ROOT = './pretrained_weights'\n\n\ndef build_model(dataset_name, model_name, device='cpu'):\n    # config\n    model_folder = os.path.join(DEFAULT_ROOT, dataset_name, model_name)\n\n    if not os.path.isdir(model_folder):\n        raise Exception('Not Available Pretrained Weights!')\n\n    config = yaml.safe_load(open(os.path.join(model_folder, 'config.yaml'), 'r'))\n    if model_name != 'rangenet':\n        config = dict2namespace(config)\n\n    # build model\n    model = eval(model_name)(config)\n\n    # load checkpoint\n    if model_name == 'rangenet':\n        model.load_pretrained_weights(model_folder)\n    else:\n        ckpt = torch.load(os.path.join(model_folder, 'model.ckpt'), map_location=\"cpu\")\n        model.load_state_dict(ckpt['state_dict'], strict=False)\n    model.to(device)\n    model.eval()\n\n    return model\n"
  },
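  {
    "path": "examples/sketch_build_model.py",
    "content": "# Editor's illustrative sketch, not part of the original codebase: loading a pretrained\n# perception backbone through lidm/utils/model_utils.py. It assumes weights were placed\n# under ./pretrained_weights/<dataset>/<model>/ (a config.yaml plus the checkpoint), as\n# implied by build_model; the 'kitti'/'rangenet' folder names are assumptions.\nimport torch\n\nfrom lidm.utils.model_utils import build_model\n\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\nmodel = build_model('kitti', 'rangenet', device=device)  # or 'spvcnn' / 'minkowskinet'\nprint(type(model).__name__)\n"
  },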
  {
    "path": "main.py",
    "content": "import argparse, os, sys, datetime, glob, importlib, csv\nimport numpy as np\nimport time\nimport torch\nimport torchvision\nimport pytorch_lightning as pl\n\nfrom packaging import version\nfrom omegaconf import OmegaConf\nfrom pytorch_lightning.utilities.warnings import LightningDeprecationWarning\nfrom torch.utils.data import random_split, DataLoader, Dataset, Subset\nfrom functools import partial\nfrom PIL import Image\n\nfrom pytorch_lightning import seed_everything\nfrom pytorch_lightning.trainer import Trainer\nfrom pytorch_lightning.callbacks import ModelCheckpoint, Callback, LearningRateMonitor\nfrom pytorch_lightning.utilities.distributed import rank_zero_only\nfrom pytorch_lightning.utilities import rank_zero_info\n\nfrom lidm.data.base import Txt2ImgIterableBaseDataset\nfrom lidm.utils.lidar_utils import range2pcd\nfrom lidm.utils.misc_utils import instantiate_from_config, isdepth\n\n# remove annoying user warnings\nimport warnings\nwarnings.filterwarnings(\"ignore\", category=UserWarning)\nwarnings.filterwarnings(\"ignore\", category=RuntimeWarning)\nwarnings.filterwarnings(\"ignore\", category=LightningDeprecationWarning)\n\n\ndef get_parser(**parser_kwargs):\n    def str2bool(v):\n        if isinstance(v, bool):\n            return v\n        if v.lower() in (\"yes\", \"true\", \"t\", \"y\", \"1\"):\n            return True\n        elif v.lower() in (\"no\", \"false\", \"f\", \"n\", \"0\"):\n            return False\n        else:\n            raise argparse.ArgumentTypeError(\"Boolean value expected.\")\n\n    parser = argparse.ArgumentParser(**parser_kwargs)\n    parser.add_argument(\n        \"-n\",\n        \"--name\",\n        type=str,\n        const=True,\n        default=\"\",\n        nargs=\"?\",\n        help=\"postfix for logdir\",\n    )\n    parser.add_argument(\n        \"-r\",\n        \"--resume\",\n        type=str,\n        const=True,\n        default=\"\",\n        nargs=\"?\",\n        help=\"resume from logdir or checkpoint in logdir\",\n    )\n    parser.add_argument(\n        \"-b\",\n        \"--base\",\n        nargs=\"*\",\n        metavar=\"base_config.yaml\",\n        help=\"paths to base configs. Loaded from left-to-right. 
\"\n             \"Parameters can be overwritten or added with command-line options of the form `--key value`.\",\n        default=list(),\n    )\n    parser.add_argument(\n        \"-t\",\n        \"--train\",\n        type=str2bool,\n        const=True,\n        default=False,\n        nargs=\"?\",\n        help=\"train\",\n    )\n    parser.add_argument(\n        \"--no-test\",\n        type=str2bool,\n        const=True,\n        default=False,\n        nargs=\"?\",\n        help=\"disable test\",\n    )\n    parser.add_argument(\n        \"-p\",\n        \"--project\",\n        help=\"name of new or path to existing project\"\n    )\n    parser.add_argument(\n        \"-d\",\n        \"--debug\",\n        type=str2bool,\n        nargs=\"?\",\n        const=True,\n        default=False,\n        help=\"enable post-mortem debugging\",\n    )\n    parser.add_argument(\n        \"-s\",\n        \"--seed\",\n        type=int,\n        default=23,\n        help=\"seed for seed_everything\",\n    )\n    parser.add_argument(\n        \"-f\",\n        \"--postfix\",\n        type=str,\n        default=\"\",\n        help=\"post-postfix for default name\",\n    )\n    parser.add_argument(\n        \"-l\",\n        \"--logdir\",\n        type=str,\n        default=\"logs\",\n        help=\"directory for logging dat shit\",\n    )\n    parser.add_argument(\n        \"--scale_lr\",\n        type=str2bool,\n        nargs=\"?\",\n        const=True,\n        default=True,\n        help=\"scale base-lr by ngpu * batch_size * n_accumulate\",\n    )\n    return parser\n\n\ndef nondefault_trainer_args(opt):\n    parser = argparse.ArgumentParser()\n    parser = Trainer.add_argparse_args(parser)\n    args = parser.parse_args([])\n    return sorted(k for k in vars(args) if getattr(opt, k) != getattr(args, k))\n\n\nclass WrappedDataset(Dataset):\n    \"\"\"Wraps an arbitrary object with __len__ and __getitem__ into a pytorch dataset\"\"\"\n\n    def __init__(self, dataset):\n        self.data = dataset\n\n    def __len__(self):\n        return len(self.data)\n\n    def __getitem__(self, idx):\n        return self.data[idx]\n\n\ndef worker_init_fn(_):\n    worker_info = torch.utils.data.get_worker_info()\n\n    dataset = worker_info.dataset\n    worker_id = worker_info.id\n\n    if isinstance(dataset, Txt2ImgIterableBaseDataset):\n        split_size = dataset.num_records // worker_info.num_workers\n        # reset num_records to the true number to retain reliable length information\n        dataset.sample_ids = dataset.valid_ids[worker_id * split_size:(worker_id + 1) * split_size]\n        current_id = np.random.choice(len(np.random.get_state()[1]), 1)\n        return np.random.seed(np.random.get_state()[1][current_id] + worker_id)\n    else:\n        return np.random.seed(np.random.get_state()[1][0] + worker_id)\n\n\nclass DataModuleFromConfig(pl.LightningDataModule):\n    def __init__(self, batch_size, train=None, validation=None, test=None, predict=None, dataset=None, aug=None,\n                 wrap=False, num_workers=None, shuffle_test_loader=False, use_worker_init_fn=False,\n                 shuffle_val_dataloader=False):\n        super().__init__()\n        self.datasets = None\n        self.batch_size = batch_size\n        self.dataset_configs = dict()\n        self.num_workers = num_workers if num_workers is not None else batch_size\n        self.use_worker_init_fn = use_worker_init_fn\n        basic_config = {'dataset_config': dataset, 'aug_config': aug}\n        if train is not None:\n            
train['params'] = {**train['params'], **basic_config}\n            self.dataset_configs[\"train\"] = train\n            self.train_dataloader = self._train_dataloader\n        if validation is not None:\n            validation['params'] = {**validation['params'], **basic_config}\n            self.dataset_configs[\"validation\"] = validation\n            self.val_dataloader = partial(self._val_dataloader, shuffle=shuffle_val_dataloader)\n        if test is not None:\n            test['params'] = {**test['params'], **basic_config}\n            self.dataset_configs[\"test\"] = test\n            self.test_dataloader = partial(self._test_dataloader, shuffle=shuffle_test_loader)\n        if predict is not None:\n            predict['params'] = {**predict['params'], **basic_config}\n            self.dataset_configs[\"predict\"] = predict\n            self.predict_dataloader = self._predict_dataloader\n        self.wrap = wrap\n\n    def prepare_data(self):\n        for data_cfg in self.dataset_configs.values():\n            instantiate_from_config(data_cfg)\n\n    def setup(self, stage=None):\n        self.datasets = dict((k, instantiate_from_config(self.dataset_configs[k])) for k in self.dataset_configs)\n        if self.wrap:\n            for k in self.datasets:\n                self.datasets[k] = WrappedDataset(self.datasets[k])\n\n    def _train_dataloader(self):\n        is_iterable_dataset = isinstance(self.datasets['train'], Txt2ImgIterableBaseDataset)\n        if is_iterable_dataset or self.use_worker_init_fn:\n            init_fn = worker_init_fn\n        else:\n            init_fn = None\n        return DataLoader(self.datasets[\"train\"], batch_size=self.batch_size,\n                          num_workers=self.num_workers, shuffle=False if is_iterable_dataset else True,\n                          worker_init_fn=init_fn)\n\n    def _val_dataloader(self, shuffle=False):\n        if isinstance(self.datasets['validation'], Txt2ImgIterableBaseDataset) or self.use_worker_init_fn:\n            init_fn = worker_init_fn\n        else:\n            init_fn = None\n        return DataLoader(self.datasets[\"validation\"],\n                          batch_size=self.batch_size,\n                          num_workers=self.num_workers,\n                          worker_init_fn=init_fn,\n                          shuffle=shuffle)\n\n    def _test_dataloader(self, shuffle=False):\n        is_iterable_dataset = isinstance(self.datasets['train'], Txt2ImgIterableBaseDataset)\n        if is_iterable_dataset or self.use_worker_init_fn:\n            init_fn = worker_init_fn\n        else:\n            init_fn = None\n\n        # do not shuffle dataloader for iterable dataset\n        shuffle = shuffle and (not is_iterable_dataset)\n\n        return DataLoader(self.datasets[\"test\"], batch_size=self.batch_size,\n                          num_workers=self.num_workers, worker_init_fn=init_fn, shuffle=shuffle)\n\n    def _predict_dataloader(self, shuffle=False):\n        if isinstance(self.datasets['predict'], Txt2ImgIterableBaseDataset) or self.use_worker_init_fn:\n            init_fn = worker_init_fn\n        else:\n            init_fn = None\n        return DataLoader(self.datasets[\"predict\"], batch_size=self.batch_size,\n                          num_workers=self.num_workers, worker_init_fn=init_fn)\n\n\nclass SetupCallback(Callback):\n    def __init__(self, resume, now, logdir, ckptdir, cfgdir, config, lightning_config):\n        super().__init__()\n        self.resume = resume\n        self.now = now\n    
    self.logdir = logdir\n        self.ckptdir = ckptdir\n        self.cfgdir = cfgdir\n        self.config = config\n        self.lightning_config = lightning_config\n\n    def on_keyboard_interrupt(self, trainer, pl_module):\n        if trainer.global_rank == 0:\n            print(\"Summoning checkpoint.\")\n            ckpt_path = os.path.join(self.ckptdir, \"last.ckpt\")\n            trainer.save_checkpoint(ckpt_path)\n\n    def on_pretrain_routine_start(self, trainer, pl_module):\n        if trainer.global_rank == 0:\n            # Create logdirs and save configs\n            os.makedirs(self.logdir, exist_ok=True)\n            os.makedirs(self.ckptdir, exist_ok=True)\n            os.makedirs(self.cfgdir, exist_ok=True)\n\n            if \"callbacks\" in self.lightning_config:\n                if 'metrics_over_trainsteps_checkpoint' in self.lightning_config['callbacks']:\n                    os.makedirs(os.path.join(self.ckptdir, 'trainstep_checkpoints'), exist_ok=True)\n            print(\"Project config\")\n            print(OmegaConf.to_yaml(self.config))\n            OmegaConf.save(self.config,\n                           os.path.join(self.cfgdir, \"{}-project.yaml\".format(self.now)))\n\n            print(\"Lightning config\")\n            print(OmegaConf.to_yaml(self.lightning_config))\n            OmegaConf.save(OmegaConf.create({\"lightning\": self.lightning_config}),\n                           os.path.join(self.cfgdir, \"{}-lightning.yaml\".format(self.now)))\n\n        else:\n            # ModelCheckpoint callback created log directory --- remove it\n            if not self.resume and os.path.exists(self.logdir):\n                dst, name = os.path.split(self.logdir)\n                dst = os.path.join(dst, \"child_runs\", name)\n                os.makedirs(os.path.split(dst)[0], exist_ok=True)\n                try:\n                    os.rename(self.logdir, dst)\n                except FileNotFoundError:\n                    pass\n\n\nclass ImageLogger(Callback):\n    def __init__(self, batch_frequency, max_images, clamp=True, increase_log_steps=True,\n                 rescale=True, disabled=False, log_on_batch_idx=False, log_first_step=False,\n                 log_images_kwargs=None, dataset_kwargs=None):\n        super().__init__()\n        self.rescale = rescale\n        self.batch_freq = batch_frequency\n        self.max_images = max_images\n        self.logger_log_images = {\n            pl.loggers.TestTubeLogger: self._testtube,\n        }\n        self.log_steps = [2 ** n for n in range(int(np.log2(self.batch_freq)) + 1)]\n        if not increase_log_steps:\n            self.log_steps = [self.batch_freq]\n        self.clamp = clamp\n        self.disabled = disabled\n        self.log_on_batch_idx = log_on_batch_idx\n        self.log_images_kwargs = log_images_kwargs if log_images_kwargs else {}\n        self.log_first_step = log_first_step\n        self.dataset_kwargs = dataset_kwargs\n\n    @rank_zero_only\n    def _testtube(self, pl_module, images, batch_idx, split):\n        for k in images:\n            grid = torchvision.utils.make_grid(images[k])\n            grid = (grid + 1.0) / 2.0  # -1,1 -> 0,1; c,h,w\n\n            tag = f\"{split}/{k}\"\n            pl_module.logger.experiment.add_image(\n                tag, grid,\n                global_step=pl_module.global_step)\n\n    @rank_zero_only\n    def log_local(self, save_dir, split, images,\n                  global_step, current_epoch, batch_idx):\n        root = os.path.join(save_dir, \"images\", 
split)\n        for k in images:\n            # image grids\n            grid = torchvision.utils.make_grid(images[k], nrow=4)\n            if self.rescale:\n                grid = (grid + 1.0) / 2.0  # -1,1 -> 0,1; c,h,w\n            grid = grid.transpose(0, 1).transpose(1, 2).squeeze(-1)\n            grid = grid.numpy()\n            grid = (grid * 255).astype(np.uint8)\n\n            # save a batch of images\n            img_name = \"{}-{:06}_e-{:06}_b-{:06}.png\".format(k, global_step, current_epoch, batch_idx)\n            img_path = os.path.join(root, img_name)\n            os.makedirs(os.path.split(img_path)[0], exist_ok=True)\n            Image.fromarray(grid).save(img_path)\n\n            # save a batch of point clouds\n            imgs = images[k].squeeze().detach().cpu().numpy()\n            if isdepth(imgs):\n                for i, img in enumerate(imgs[:2]):\n                    img = (img + 1.) / 2.\n                    xyz, _, _ = range2pcd(img, **self.dataset_kwargs)\n                    pcd_name = \"{}_pcd_{:02}_gs-{:06}_e-{:06}_b-{:06}.txt\".format(k, i, global_step, current_epoch, batch_idx)\n                    pcd_path = os.path.join(root, pcd_name)\n                    np.savetxt(pcd_path, xyz, fmt='%.6f')\n\n    def log_img(self, pl_module, batch, batch_idx, split=\"train\"):\n        check_idx = batch_idx if self.log_on_batch_idx else pl_module.global_step\n        if (self.check_frequency(check_idx, split) and  # batch_idx % self.batch_freq == 0\n                hasattr(pl_module, \"log_images\") and\n                callable(pl_module.log_images) and\n                self.max_images > 0):\n            logger = type(pl_module.logger)\n\n            is_train = pl_module.training\n            if is_train:\n                pl_module.eval()\n\n            with torch.no_grad():\n                images = pl_module.log_images(batch, split=split, **self.log_images_kwargs)\n\n            for k in images:\n                N = min(images[k].shape[0], self.max_images)\n                images[k] = images[k][:N]\n                if isinstance(images[k], torch.Tensor):\n                    images[k] = images[k].detach().cpu()\n                    if self.clamp:\n                        images[k] = torch.clamp(images[k], -1., 1.)\n\n            self.log_local(pl_module.logger.save_dir, split, images,\n                           pl_module.global_step, pl_module.current_epoch, batch_idx)\n\n            logger_log_images = self.logger_log_images.get(logger, lambda *args, **kwargs: None)\n            logger_log_images(pl_module, images, pl_module.global_step, split)\n\n            if is_train:\n                pl_module.train()\n\n    def check_frequency(self, check_idx, split):\n        # TODO check validation output\n        if ((check_idx % self.batch_freq) == 0 or (check_idx in self.log_steps) or (check_idx == 1 and split == 'val')) \\\n                and (check_idx > 0 or self.log_first_step):\n            try:\n                self.log_steps.pop(0)\n            except IndexError as e:\n                # print(e)\n                pass\n            return True\n        return False\n\n    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):\n        if not self.disabled and (pl_module.global_step > 0 or self.log_first_step):\n            self.log_img(pl_module, batch, batch_idx, split=\"train\")\n\n    def on_validation_batch_end(self, trainer, pl_module, outputs, batch, batch_idx, dataloader_idx):\n        if not self.disabled and 
pl_module.global_step > 0:\n            self.log_img(pl_module, batch, batch_idx, split=\"val\")\n        if hasattr(pl_module, 'calibrate_grad_norm'):\n            if (pl_module.calibrate_grad_norm and batch_idx % 25 == 0) and batch_idx > 0:\n                self.log_gradients(trainer, pl_module, batch_idx=batch_idx)\n\n\nclass CUDACallback(Callback):\n    # see https://github.com/SeanNaren/minGPT/blob/master/mingpt/callback.py\n    def on_train_epoch_start(self, trainer, pl_module):\n        # Reset the memory use counter\n        torch.cuda.reset_peak_memory_stats(trainer.root_gpu)\n        torch.cuda.synchronize(trainer.root_gpu)\n        self.start_time = time.time()\n\n    def on_train_epoch_end(self, trainer, pl_module, outputs):\n        torch.cuda.synchronize(trainer.root_gpu)\n        max_memory = torch.cuda.max_memory_allocated(trainer.root_gpu) / 2 ** 20\n        epoch_time = time.time() - self.start_time\n\n        try:\n            max_memory = trainer.training_type_plugin.reduce(max_memory)\n            epoch_time = trainer.training_type_plugin.reduce(epoch_time)\n\n            rank_zero_info(f\"Average Epoch time: {epoch_time:.2f} seconds\")\n            rank_zero_info(f\"Average Peak memory {max_memory:.2f}MiB\")\n        except AttributeError:\n            pass\n\n\nif __name__ == \"__main__\":\n    # custom parser to specify config files, train, test and debug mode,\n    # postfix, resume.\n    # `--key value` arguments are interpreted as arguments to the trainer.\n    # `nested.key=value` arguments are interpreted as config parameters.\n    # configs are merged from left-to-right followed by command line parameters.\n\n    # model:\n    #   base_learning_rate: float\n    #   target: path to lightning module\n    #   params:\n    #       key: value\n    # data:\n    #   target: main.DataModuleFromConfig\n    #   params:\n    #      batch_size: int\n    #      wrap: bool\n    #      train:\n    #          target: path to train dataset\n    #          params:\n    #              key: value\n    #      validation:\n    #          target: path to validation dataset\n    #          params:\n    #              key: value\n    #      test:\n    #          target: path to test dataset\n    #          params:\n    #              key: value\n    # lightning: (optional, has sane defaults and can be specified on cmdline)\n    #   trainer:\n    #       additional arguments to trainer\n    #   logger:\n    #       logger to instantiate\n    #   modelcheckpoint:\n    #       modelcheckpoint to instantiate\n    #   callbacks:\n    #       callback1:\n    #           target: importpath\n    #           params:\n    #               key: value\n\n    now = datetime.datetime.now().strftime(\"%Y-%m-%dT%H-%M-%S\")\n\n    # add cwd for convenience and to make classes in this file available when\n    # running as `python main.py`\n    # (in particular `main.DataModuleFromConfig`)\n    sys.path.append(os.getcwd())\n\n    parser = get_parser()\n    parser = Trainer.add_argparse_args(parser)\n\n    opt, unknown = parser.parse_known_args()\n    dataset_name = opt.base[0].split('/')[-2]\n\n    if opt.name and opt.resume:\n        raise ValueError(\n            \"-n/--name and -r/--resume cannot be specified both.\"\n            \"If you want to resume training in a new log folder, \"\n            \"use -n/--name in combination with --resume_from_checkpoint\"\n        )\n    if opt.resume:\n        if not os.path.exists(opt.resume):\n            raise ValueError(\"Cannot find 
{}\".format(opt.resume))\n        if os.path.isfile(opt.resume):\n            paths = opt.resume.split(\"/\")\n            logdir = \"/\".join(paths[:-2])\n            ckpt = opt.resume\n        else:\n            assert os.path.isdir(opt.resume), opt.resume\n            logdir = opt.resume.rstrip(\"/\")\n            ckpt = os.path.join(logdir, \"checkpoints\", \"last.ckpt\")\n\n        opt.resume_from_checkpoint = ckpt\n        base_configs = sorted(glob.glob(os.path.join(logdir, \"configs/*.yaml\")))\n        opt.base = base_configs + opt.base\n        _tmp = logdir.split(\"/\")\n        nowname = _tmp[-1]\n    else:\n        if opt.name:\n            name = \"_\" + opt.name\n        elif opt.base:\n            cfg_fname = os.path.split(opt.base[0])[-1]\n            cfg_name = os.path.splitext(cfg_fname)[0]\n            name = \"_\" + cfg_name\n        else:\n            name = \"\"\n        nowname = now + name + opt.postfix\n        logdir = os.path.join(opt.logdir, dataset_name, nowname)\n\n    ckptdir = os.path.join(logdir, \"checkpoints\")\n    cfgdir = os.path.join(logdir, \"configs\")\n    seed_everything(opt.seed)\n\n    try:\n        # init and save configs\n        configs = [OmegaConf.load(cfg) for cfg in opt.base]\n        cli = OmegaConf.from_dotlist(unknown)\n        config = OmegaConf.merge(*configs, cli)\n        lightning_config = config.pop(\"lightning\", OmegaConf.create())\n        # merge trainer cli with config\n        trainer_config = lightning_config.get(\"trainer\", OmegaConf.create())\n        # default to ddp\n        trainer_config[\"accelerator\"] = \"ddp\"\n        for k in nondefault_trainer_args(opt):\n            trainer_config[k] = getattr(opt, k)\n        if not \"gpus\" in trainer_config:\n            del trainer_config[\"accelerator\"]\n            cpu = True\n        else:\n            gpuinfo = trainer_config[\"gpus\"]\n            print(f\"Running on GPUs {gpuinfo}\")\n            cpu = False\n        trainer_opt = argparse.Namespace(**trainer_config)\n        lightning_config.trainer = trainer_config\n\n        # model\n        if 'autoencoder' in config.model.target:\n            config.model.params.lossconfig.params.dataset_config = config.data.params.dataset\n        model = instantiate_from_config(config.model)\n\n        # trainer and callbacks\n        trainer_kwargs = dict()\n\n        # default logger configs\n        default_logger_cfgs = {\n            \"wandb\": {\n                \"target\": \"pytorch_lightning.loggers.WandbLogger\",\n                \"params\": {\n                    \"project\": f\"lidar_diffusion_{dataset_name}\",\n                    \"entity\": \"hancyran\",\n                    \"name\": nowname,\n                    \"save_dir\": logdir,\n                    \"offline\": opt.debug,\n                    \"id\": nowname,\n                }\n            },\n            \"testtube\": {\n                \"target\": \"pytorch_lightning.loggers.TestTubeLogger\",\n                \"params\": {\n                    \"name\": \"testtube\",\n                    \"save_dir\": logdir,\n                }\n            },\n        }\n        default_logger_cfg = default_logger_cfgs[\"wandb\"]\n        if \"logger\" in lightning_config:\n            logger_cfg = lightning_config.logger\n        else:\n            logger_cfg = OmegaConf.create()\n        logger_cfg = OmegaConf.merge(default_logger_cfg, logger_cfg)\n        trainer_kwargs[\"logger\"] = instantiate_from_config(logger_cfg)\n\n        # model checkpoint - use 
TrainResult/EvalResult(checkpoint_on=metric) to\n        # specify which metric is used to determine best models\n        default_modelckpt_cfg = {\n            \"target\": \"pytorch_lightning.callbacks.ModelCheckpoint\",\n            \"params\": {\n                \"dirpath\": ckptdir,\n                \"filename\": \"{epoch:06}\",\n                \"verbose\": True,\n                \"save_last\": True,\n            }\n        }\n        if hasattr(model, \"monitor\"):\n            print(f\"Monitoring {model.monitor} as checkpoint metric.\")\n            default_modelckpt_cfg[\"params\"][\"monitor\"] = model.monitor\n            default_modelckpt_cfg[\"params\"][\"save_top_k\"] = 3\n\n        if \"modelcheckpoint\" in lightning_config:\n            modelckpt_cfg = lightning_config.modelcheckpoint\n        else:\n            modelckpt_cfg = OmegaConf.create()\n        modelckpt_cfg = OmegaConf.merge(default_modelckpt_cfg, modelckpt_cfg)\n        print(f\"Merged modelckpt-cfg: \\n{modelckpt_cfg}\")\n        if version.parse(pl.__version__) < version.parse('1.4.0'):\n            trainer_kwargs[\"checkpoint_callback\"] = instantiate_from_config(modelckpt_cfg)\n\n        # add callback which sets up log directory\n        default_callbacks_cfg = {\n            \"setup_callback\": {\n                \"target\": \"main.SetupCallback\",\n                \"params\": {\n                    \"resume\": opt.resume,\n                    \"now\": now,\n                    \"logdir\": logdir,\n                    \"ckptdir\": ckptdir,\n                    \"cfgdir\": cfgdir,\n                    \"config\": config,\n                    \"lightning_config\": lightning_config,\n                }\n            },\n            \"image_logger\": {\n                \"target\": \"main.ImageLogger\",\n                \"params\": {\n                    \"batch_frequency\": 750,\n                    \"max_images\": 4,\n                    \"clamp\": True,\n                    \"dataset_kwargs\": config.data.params.dataset\n                }\n            },\n            \"learning_rate_logger\": {\n                \"target\": \"main.LearningRateMonitor\",\n                \"params\": {\n                    \"logging_interval\": \"step\",\n                    # \"log_momentum\": True\n                }\n            },\n            \"cuda_callback\": {\n                \"target\": \"main.CUDACallback\"\n            },\n        }\n        if version.parse(pl.__version__) >= version.parse('1.4.0'):\n            default_callbacks_cfg.update({'checkpoint_callback': modelckpt_cfg})\n\n        if \"callbacks\" in lightning_config:\n            callbacks_cfg = lightning_config.callbacks\n        else:\n            callbacks_cfg = OmegaConf.create()\n\n        if 'metrics_over_trainsteps_checkpoint' in callbacks_cfg:\n            print(\n                'Caution: Saving checkpoints every n train steps without deleting. 
This might require some free space.')\n            default_metrics_over_trainsteps_ckpt_dict = {\n                'metrics_over_trainsteps_checkpoint':\n                    {\"target\": 'pytorch_lightning.callbacks.ModelCheckpoint',\n                     'params': {\n                         \"dirpath\": os.path.join(ckptdir, 'trainstep_checkpoints'),\n                         \"filename\": \"{epoch:06}-{step:09}\",\n                         \"verbose\": True,\n                         'save_top_k': -1,\n                         'every_n_train_steps': 10000,\n                         'save_weights_only': True\n                     }\n                     }\n            }\n            default_callbacks_cfg.update(default_metrics_over_trainsteps_ckpt_dict)\n\n        callbacks_cfg = OmegaConf.merge(default_callbacks_cfg, callbacks_cfg)\n        if 'ignore_keys_callback' in callbacks_cfg and hasattr(trainer_opt, 'resume_from_checkpoint'):\n            callbacks_cfg.ignore_keys_callback.params['ckpt_path'] = trainer_opt.resume_from_checkpoint\n        elif 'ignore_keys_callback' in callbacks_cfg:\n            del callbacks_cfg['ignore_keys_callback']\n\n        trainer_kwargs[\"callbacks\"] = [instantiate_from_config(callbacks_cfg[k]) for k in callbacks_cfg]\n        # trainer_kwargs[\"progress_bar_refresh_rate\"] = 0 if opt.train else None  # prevent redundant logs in wandb\n        # trainer_kwargs[\"plugins\"] = DDPPlugin(find_unused_parameters=False)  # prevent warning of \"no unused params\"\n        # trainer_kwargs[\"precision\"] = 32  # mixed precision\n\n        trainer = Trainer.from_argparse_args(trainer_opt, **trainer_kwargs)\n        trainer.logdir = logdir\n\n        # data\n        data = instantiate_from_config(config.data)\n        # NOTE according to https://pytorch-lightning.readthedocs.io/en/latest/datamodules.html\n        # calling these ourselves should not be necessary but it is.\n        # lightning still takes care of proper multiprocessing though\n        data.prepare_data()\n        data.setup()\n        test = data.datasets['train'][0]  # for debug\n\n        print(\"#### Data #####\")\n        for k in data.datasets:\n            print(f\"{k}, {data.datasets[k].__class__.__name__}, {len(data.datasets[k])}\")\n\n        # configure learning rate\n        bs, base_lr = config.data.params.batch_size, config.model.base_learning_rate\n        if not cpu:\n            ngpu = len(lightning_config.trainer.gpus.strip(\",\").split(','))\n        else:\n            ngpu = 1\n        if 'accumulate_grad_batches' in lightning_config.trainer:\n            accumulate_grad_batches = lightning_config.trainer.accumulate_grad_batches\n        else:\n            accumulate_grad_batches = 1\n        print(f\"accumulate_grad_batches = {accumulate_grad_batches}\")\n        lightning_config.trainer.accumulate_grad_batches = accumulate_grad_batches\n        if opt.scale_lr:\n            model.learning_rate = accumulate_grad_batches * ngpu * bs * base_lr\n            print(\n                \"Setting learning rate to {:.2e} = {} (accumulate_grad_batches) * {} (num_gpus) * {} (batchsize) * {:.2e} (base_lr)\".format(\n                    model.learning_rate, accumulate_grad_batches, ngpu, bs, base_lr))\n        else:\n            model.learning_rate = base_lr\n            print(\"++++ NOT USING LR SCALING ++++\")\n            print(f\"Setting learning rate to {model.learning_rate:.2e}\")\n\n\n        def melk(*args, **kwargs):\n            # run all checkpoint hooks\n            if 
trainer.global_rank == 0:\n                print(\"Summoning checkpoint.\")\n                ckpt_path = os.path.join(ckptdir, \"last.ckpt\")\n                trainer.save_checkpoint(ckpt_path)\n\n\n        def divein(*args, **kwargs):\n            if trainer.global_rank == 0:\n                import pudb;\n                pudb.set_trace()\n\n\n        import signal\n\n        signal.signal(signal.SIGUSR1, melk)\n        signal.signal(signal.SIGUSR2, divein)\n\n        # run\n        if opt.train:\n            try:\n                trainer.fit(model, data)\n            except Exception:\n                melk()\n                raise\n        if not opt.no_test and not trainer.interrupted:\n            trainer.test(model, data)\n    except Exception:\n        if opt.debug and trainer.global_rank == 0:\n            import pdb as debugger\n            debugger.post_mortem()\n        raise\n    finally:\n        # move newly created debug project to debug_runs\n        if opt.debug and not opt.resume and trainer.global_rank == 0:\n            dst, name = os.path.split(logdir)\n            dst = os.path.join(dst, \"debug_runs\", name)\n            os.makedirs(os.path.split(dst)[0], exist_ok=True)\n            os.rename(logdir, dst)\n        if trainer.global_rank == 0:\n            print(trainer.profiler.summary())\n"
  },
  {
    "path": "models/baseline/kitti/template/config.yaml",
    "content": "data:\n  target: main.DataModuleFromConfig\n  params:\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 6\n      log_scale: true\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: false\n      rotate: false\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    validation:\n      target: lidm.data.kitti.KITTI360Validation\n      params:\n        condition_key: image\n"
  },
  {
    "path": "models/baseline/nuscenes/template/config.yaml",
    "content": "data:\n  target: main.DataModuleFromConfig\n  params:\n    dataset:\n      size: [32, 1024]\n      fov: [ 10,-30 ]\n      depth_range: [ 1.0,45.0 ]\n      depth_scale: 6.5\n      log_scale: true\n      x_range: [ -30.0, 30.0 ]\n      y_range: [ -30.0, 30.0 ]\n      z_range: [ -3.0, 6.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 6\n      num_sem_cats: 16\n      filtered_map_cats: [ ]\n    aug:\n      flip: false\n      rotate: false\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    validation:\n      target: lidm.data.nuscenes.NuScenesValidation\n      params:\n        condition_key: image\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_c16/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 4\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 4\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[1,2],[1,2],[1,2]]\n      num_res_blocks: 2\n      attn_levels: [4]\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
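Across the ablation configs that follow, the main thing that changes is `strides`: each `[sh, sw]` pair halves the corresponding axis of the range image, so the product over the list gives the vertical and horizontal downsampling factors of the latent grid, which the `f_c*`/`f_p*` folder names appear to encode. A small sanity-check sketch (hypothetical helper, not part of the repo):

```python
# Hypothetical helper (not repo code): latent grid implied by a config's
# `strides` list, applied to the [64, 1024] KITTI range image used above.
def latent_size(image_size, strides):
    h, w = image_size
    for sh, sw in strides:
        h //= sh
        w //= sw
    return h, w

# ablate/f_c16: [[1,2],[1,2],[1,2],[1,2]] keeps all 64 rows, compresses columns 16x
print(latent_size([64, 1024], [[1, 2], [1, 2], [1, 2], [1, 2]]))  # (64, 64)
# kitti/f_c2_p4: [[1,2],[2,2],[2,2]] yields the 16x128 latent the LiDM configs use
print(latent_size([64, 1024], [[1, 2], [2, 2], [2, 2]]))          # (16, 128)
```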
  {
    "path": "models/first_stage_models/ablate/f_c16_p2/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 16\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 16\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,1,2,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[1,2],[1,2],[1,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: [5]\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      log_scale: true\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_c2_p2/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 3\n    n_embed: 8192\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 3\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: []\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      log_scale: true\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_c2_p4/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 8\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 8\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[2,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: []\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_c32/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 8\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 8\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,1,2,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[1,2],[1,2],[1,2],[1,2]]\n      num_res_blocks: 2\n      attn_levels: [5]\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_c4/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 2\n    n_embed: 4096\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 2\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[1,2]]\n      num_res_blocks: 2\n      attn_levels: []\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 2\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_c4_p2/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 4\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 4\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[1,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: []\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      log_scale: true\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_c4_p4/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 16\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 16\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[1,2],[2,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: [4]\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      log_scale: true\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_c64/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 16\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 16\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,1,2,2,2,4,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[1,2],[1,2],[1,2],[1,2],[1,2]]\n      num_res_blocks: 2\n      attn_levels: [5,6]\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_c8/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 3\n    n_embed: 8192\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 3\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[1,2],[1,2]]\n      num_res_blocks: 2\n      attn_levels: []\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_c8_p2/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 8\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 8\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[1,2],[1,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: [4]\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_p16/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 16\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 16\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[2,2],[2,2],[2,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: [4]\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_p2/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 2\n    n_embed: 4096\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 2\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,2]  # num_down = len(ch_mult)-1\n      strides: [[2,2]]\n      num_res_blocks: 2\n      attn_levels: []\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 2\n    num_workers: 4\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_p4/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 4\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 4\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,2,4]  # num_down = len(ch_mult)-1\n      strides: [[2,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: []\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/ablate/f_p8/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 16\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: True  # False\n\n    ddconfig:\n      double_z: false\n      z_channels: 16\n      in_channels: 1\n      out_ch: 2\n      ch: 64\n      ch_mult: [1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[2,2],[2,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: [3]\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 1000\n        max_images: 8\n        increase_log_steps: true\n\n  trainer:\n    benchmark: true\n    accumulate_grad_batches: 2\n    max_steps: 40000\n    sync_batchnorm: true\n"
  },
  {
    "path": "models/first_stage_models/kitti/f_c2_p4/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 8\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: False  # False\n    ddconfig:\n      double_z: false\n      z_channels: 8\n      in_channels: 1\n      out_ch: 1\n      ch: 64\n      ch_mult: [1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[2,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: []\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      log_scale: true\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n"
  },
  {
    "path": "models/first_stage_models/kitti/f_c2_p4_wo_logscale/config.yaml",
    "content": "model:\n  base_learning_rate: 4.5e-6\n  target: lidm.models.autoencoder.VQModel\n  params:\n    monitor: val/rec_loss\n    embed_dim: 8\n    n_embed: 16384\n    lib_name: lidm\n    use_mask: False  # False\n    ddconfig:\n      double_z: false\n      z_channels: 8\n      in_channels: 1\n      out_ch: 1\n      ch: 64\n      ch_mult: [1,2,2,4]  # num_down = len(ch_mult)-1\n      strides: [[1,2],[2,2],[2,2]]\n      num_res_blocks: 2\n      attn_levels: []\n      dropout: 0.0\n\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 56  # np.log2(depth_max + 1)\n      log_scale: false\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: true\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTIImageTrain\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTIImageValidation\n      params:\n        condition_key: image\n"
  },
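The only difference between this config and `kitti/f_c2_p4` is the depth encoding: `log_scale: true` with `depth_scale: 5.84` (≈ `np.log2(56 + 1)`) versus `log_scale: false` with `depth_scale: 56` (the raw maximum depth). A hedged sketch of what those two normalizations plausibly look like; the exact formula lives in the `lidm.data` loaders, which are not shown in this dump:

```python
import numpy as np

# Hypothetical illustration (not repo code) of the two depth encodings implied
# by the configs; the real normalization is implemented in the lidm data loaders.
def encode_depth(depth, depth_scale, log_scale):
    if log_scale:
        return np.log2(depth + 1.0) / depth_scale   # depth_scale ~= log2(depth_max + 1)
    return depth / depth_scale                      # depth_scale = depth_max

print(encode_depth(56.0, 5.84, log_scale=True))    # ~0.999 at the far end of depth_range
print(encode_depth(56.0, 56.0, log_scale=False))   # 1.0
```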
  {
    "path": "models/lidm/kitti/cam2lidar/config.yaml",
    "content": "model:\n  base_learning_rate: 2.0e-06\n  target: lidm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0015\n    linear_end: 0.0195\n    num_timesteps_cond: 1\n    log_every_t: 100\n    timesteps: 1000\n    image_size: [16, 128]\n    channels: 8\n    monitor: val/loss_simple_ema\n    first_stage_key: image\n    cond_stage_key: camera\n    conditioning_key: crossattn\n    cond_stage_trainable: true\n    verbose: false\n    unet_config:\n      target: lidm.modules.diffusion.openaimodel.UNetModel\n      params:\n        image_size: [16, 128]\n        in_channels: 8\n        out_channels: 8\n        model_channels: 256\n        attention_resolutions: [4, 2, 1]\n        num_res_blocks: 2\n        channel_mult: [1, 2, 4]\n        num_head_channels: 32\n        use_spatial_transformer: true\n        context_dim: 512\n        lib_name: lidm\n    first_stage_config:\n      target: lidm.models.autoencoder.VQModelInterface\n      params:\n        embed_dim: 8\n        n_embed: 16384\n        lib_name: lidm\n        use_mask: False  # False\n        ckpt_path: models/first_stage_models/kitti/f_c2_p4_wo_ls/model.ckpt\n        ddconfig:\n          double_z: false\n          z_channels: 8\n          in_channels: 1\n          out_ch: 1\n          ch: 64\n          ch_mult: [1,2,2,4]\n          strides: [[1,2],[2,2],[2,2]]\n          num_res_blocks: 2\n          attn_levels: []\n          dropout: 0.0\n        lossconfig:\n          target: torch.nn.Identity\n    cond_stage_config:\n      target: lidm.modules.encoders.modules.FrozenClipMultiImageEmbedder\n      params:\n        model: ViT-L/14\n        out_dim: 512\n        split_per_view: 4\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 8\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 56  # np.log2(depth_max + 1)\n      log_scale: false\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 1\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: false\n      rotate: false\n      keypoint_drop: false\n      keypoint_drop_range:\n      randaug: false\n      camera_drop: 0.5\n    train:\n      target: lidm.data.kitti.KITTI360Train\n      params:\n        condition_key: camera\n        split_per_view: 4\n    validation:\n      target: lidm.data.kitti.KITTI360Validation\n      params:\n        condition_key: camera\n        split_per_view: 4\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 5000\n        max_images: 8\n        increase_log_steps: False\n\n  trainer:\n    benchmark: True"
  },
  {
    "path": "models/lidm/kitti/sem2lidar/config.yaml",
    "content": "model:\n  base_learning_rate: 1.0e-06\n  target: lidm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0015\n    linear_end: 0.0205\n    num_timesteps_cond: 1\n    log_every_t: 100\n    timesteps: 1000\n    image_size: [16, 128]\n    channels: 8\n    monitor: val/loss_simple_ema\n    first_stage_key: image\n    cond_stage_key: segmentation\n    concat_mode: true\n    cond_stage_trainable: true\n    verbose: false\n    unet_config:\n      target: lidm.modules.diffusion.openaimodel.UNetModel\n      params:\n        image_size: [16, 128]\n        in_channels: 16\n        out_channels: 8\n        model_channels: 256\n        attention_resolutions: [4, 2, 1]\n        num_res_blocks: 2\n        channel_mult: [1, 2, 4]\n        num_head_channels: 32\n        lib_name: lidm\n    first_stage_config:\n      target: lidm.models.autoencoder.VQModelInterface\n      params:\n        embed_dim: 8\n        n_embed: 16384\n        lib_name: lidm\n        use_mask: False  # False\n        ckpt_path: models/first_stage_models/kitti/f_c2_p4_wo_ls/model.ckpt\n        ddconfig:\n          double_z: false\n          z_channels: 8\n          in_channels: 1\n          out_ch: 1\n          ch: 64\n          ch_mult: [1,2,2,4]\n          strides: [[1,2],[2,2],[2,2]]\n          num_res_blocks: 2\n          attn_levels: []\n          dropout: 0.0\n        lossconfig:\n          target: torch.nn.Identity\n    cond_stage_config:\n      target: lidm.modules.encoders.modules.SpatialRescaler\n      params:\n        strides: [[1,2],[2,2],[2,2]]\n        in_channels: 20\n        out_channels: 8\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 16\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 56  # np.log2(depth_max + 1)\n      log_scale: false\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: false\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.SemanticKITTITrain\n      params:\n        condition_key: segmentation\n    validation:\n      target: lidm.data.kitti.SemanticKITTIValidation\n      params:\n        condition_key: segmentation\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 5000\n        max_images: 8\n        increase_log_steps: False\n\n  trainer:\n    benchmark: true"
  },
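For the semantic-map conditioning above, `concat_mode: true` means the condition is concatenated with the latent along the channel axis, which is why the UNet's `in_channels` is 16 while the diffusion `channels` is 8: the `SpatialRescaler` turns the 20-channel semantic input (19 classes from `num_sem_cats`, presumably plus one unlabeled channel) into 8 channels on the same 16x128 grid via the first stage's `strides`. A quick sanity check of that bookkeeping (not repo code):

```python
# Channel arithmetic implied by the sem2lidar config above.
latent_channels = 8        # model.params.channels == first stage z_channels
cond_channels = 8          # cond_stage_config (SpatialRescaler) out_channels
sem_input_channels = 20    # SpatialRescaler in_channels; num_sem_cats (19) + 1, presumably
unet_in_channels = latent_channels + cond_channels  # concat_mode: true
assert unet_in_channels == 16  # matches unet_config.params.in_channels
```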
  {
    "path": "models/lidm/kitti/text2lidar/config.yaml",
    "content": "model:\n  base_learning_rate: 2.0e-06\n  target: lidm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0015\n    linear_end: 0.0195\n    num_timesteps_cond: 1\n    log_every_t: 100\n    timesteps: 1000\n    image_size: [16, 128]\n    channels: 8\n    monitor: val/loss_simple_ema\n    first_stage_key: image\n    cond_stage_key: camera\n    conditioning_key: crossattn\n    cond_stage_trainable: true\n    verbose: false\n    unet_config:\n      target: lidm.modules.diffusion.openaimodel.UNetModel\n      params:\n        image_size: [16, 128]\n        in_channels: 8\n        out_channels: 8\n        model_channels: 256\n        attention_resolutions: [4, 2, 1]\n        num_res_blocks: 2\n        channel_mult: [1, 2, 4]\n        num_head_channels: 32\n        use_spatial_transformer: true\n        context_dim: 512\n        lib_name: lidm\n    first_stage_config:\n      target: lidm.models.autoencoder.VQModelInterface\n      params:\n        embed_dim: 8\n        n_embed: 16384\n        lib_name: lidm\n        use_mask: False  # False\n        ckpt_path: models/first_stage_models/kitti/f_c2_p4_wo_ls/model.ckpt\n        ddconfig:\n          double_z: false\n          z_channels: 8\n          in_channels: 1\n          out_ch: 1\n          ch: 64\n          ch_mult: [1,2,2,4]\n          strides: [[1,2],[2,2],[2,2]]\n          num_res_blocks: 2\n          attn_levels: []\n          dropout: 0.0\n        lossconfig:\n          target: torch.nn.Identity\n    cond_stage_config:\n      target: lidm.modules.encoders.modules.FrozenClipMultiImageEmbedder\n      params:\n        model: ViT-L/14\n        split_per_view: 4\n        key: camera\n        out_dim: 512\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 8\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 56  # np.log2(depth_max + 1)\n      log_scale: false\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 1\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: false\n      rotate: false\n      keypoint_drop: false\n      keypoint_drop_range:\n      randaug: false\n      camera_drop: 0.5\n    train:\n      target: lidm.data.kitti.KITTI360Train\n      params:\n        condition_key: camera\n        split_per_view: 4\n    validation:\n      target: lidm.data.kitti.KITTI360Validation\n      params:\n        condition_key: camera\n        split_per_view: 4\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 5000\n        max_images: 8\n        increase_log_steps: False\n\n  trainer:\n    benchmark: True"
  },
  {
    "path": "models/lidm/kitti/uncond/config.yaml",
    "content": "model:\n  base_learning_rate: 1.0e-06\n  target: lidm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0015\n    linear_end: 0.0195\n    num_timesteps_cond: 1\n    log_every_t: 200\n    timesteps: 1000\n    image_size: [16, 128]\n    channels: 8\n    monitor: val/loss_simple_ema\n    first_stage_key: image\n    unet_config:\n      target: lidm.modules.diffusion.openaimodel.UNetModel\n      params:\n        image_size: [16, 128]\n        in_channels: 8\n        out_channels: 8\n        model_channels: 256\n        attention_resolutions: [4, 2, 1]\n        num_res_blocks: 2\n        channel_mult: [1, 2, 4]\n        num_head_channels: 32\n        lib_name: lidm\n    first_stage_config:\n      target: lidm.models.autoencoder.VQModelInterface\n      params:\n        embed_dim: 8\n        n_embed: 16384\n        lib_name: lidm\n        use_mask: False  # False\n        ckpt_path: models/first_stage_models/kitti/f_c2_p4/model.ckpt\n        ddconfig:\n          double_z: false\n          z_channels: 8\n          in_channels: 1\n          out_ch: 1\n          ch: 64\n          ch_mult: [1,2,2,4]\n          strides: [[1,2],[2,2],[2,2]]\n          num_res_blocks: 2\n          attn_levels: []\n          dropout: 0.0\n        lossconfig:\n          target: torch.nn.Identity\n    cond_stage_config: \"__is_unconditional__\"\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 5.84  # np.log2(depth_max + 1)\n      log_scale: true\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: false\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTI360Train\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTI360Validation\n      params:\n        condition_key: image\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 5000\n        max_images: 8\n        increase_log_steps: False\n\n  trainer:\n    benchmark: true\n"
  },
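These LiDM configs are consumed through OmegaConf and `instantiate_from_config`, the same way `scripts/sample.py` below builds its model. A condensed sketch of that loading path; the checkpoint filename and location are assumptions taken from the scripts' `<logdir>/config.yaml` and `<logdir>/model.ckpt` convention:

```python
import torch
from omegaconf import OmegaConf
from lidm.utils.misc_utils import instantiate_from_config

# Load the YAML above and resolve its `target` classes, mirroring scripts/sample.py.
config = OmegaConf.load("models/lidm/kitti/uncond/config.yaml")
model = instantiate_from_config(config.model)

# Assumed checkpoint layout: <logdir>/model.ckpt with a Lightning "state_dict" entry.
pl_sd = torch.load("models/lidm/kitti/uncond/model.ckpt", map_location="cpu")
model.load_state_dict(pl_sd["state_dict"], strict=False)
model.cuda().eval()
```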
  {
    "path": "models/lidm/kitti/uncond_wo_logscale/config.yaml",
    "content": "model:\n  base_learning_rate: 1.0e-06\n  target: lidm.models.diffusion.ddpm.LatentDiffusion\n  params:\n    linear_start: 0.0015\n    linear_end: 0.0195\n    num_timesteps_cond: 1\n    log_every_t: 200\n    timesteps: 1000\n    image_size: [16, 128]\n    channels: 8\n    monitor: val/loss_simple_ema\n    first_stage_key: image\n    unet_config:\n      target: lidm.modules.diffusion.openaimodel.UNetModel\n      params:\n        image_size: [16, 128]\n        in_channels: 8\n        out_channels: 8\n        model_channels: 256\n        attention_resolutions: [4, 2, 1]\n        num_res_blocks: 2\n        channel_mult: [1, 2, 4]\n        num_head_channels: 32\n        lib_name: lidm\n    first_stage_config:\n      target: lidm.models.autoencoder.VQModelInterface\n      params:\n        embed_dim: 8\n        n_embed: 16384\n        lib_name: lidm\n        use_mask: False  # False\n        ckpt_path: models/first_stage_models/kitti/f_c2_p4_wo_ls/model.ckpt\n        ddconfig:\n          double_z: false\n          z_channels: 8\n          in_channels: 1\n          out_ch: 1\n          ch: 64\n          ch_mult: [1,2,2,4]\n          strides: [[1,2],[2,2],[2,2]]\n          num_res_blocks: 2\n          attn_levels: []\n          dropout: 0.0\n        lossconfig:\n          target: torch.nn.Identity\n    cond_stage_config: \"__is_unconditional__\"\n\ndata:\n  target: main.DataModuleFromConfig\n  params:\n    batch_size: 4\n    num_workers: 8\n    wrap: true\n    dataset:\n      size: [64, 1024]\n      fov: [ 3,-25 ]\n      depth_range: [ 1.0,56.0 ]\n      depth_scale: 56  # np.log2(depth_max + 1)\n      log_scale: false\n      x_range: [ -50.0, 50.0 ]\n      y_range: [ -50.0, 50.0 ]\n      z_range: [ -3.0, 1.0 ]\n      resolution: 1\n      num_channels: 1\n      num_cats: 10\n      num_views: 2\n      num_sem_cats: 19\n      filtered_map_cats: [ ]\n    aug:\n      flip: true\n      rotate: false\n      keypoint_drop: false\n      keypoint_drop_range: [ 5,20 ]\n      randaug: false\n    train:\n      target: lidm.data.kitti.KITTI360Train\n      params:\n        condition_key: image\n    validation:\n      target: lidm.data.kitti.KITTI360Validation\n      params:\n        condition_key: image\n\n\nlightning:\n  callbacks:\n    image_logger:\n      target: main.ImageLogger\n      params:\n        batch_frequency: 5000\n        max_images: 8\n        increase_log_steps: False\n\n  trainer:\n    benchmark: true\n"
  },
  {
    "path": "scripts/eval_ae.py",
    "content": "import math\nimport sys\n\nsys.path.append('./')\n\nimport os, argparse, glob, datetime, yaml\nimport torch\nfrom torch.utils.data import DataLoader\nimport time\nimport numpy as np\nfrom tqdm import tqdm\nimport joblib\n\nfrom omegaconf import OmegaConf\nfrom PIL import Image\n\nfrom lidm.utils.misc_utils import instantiate_from_config, set_seed\nfrom lidm.utils.lidar_utils import range2pcd\nfrom lidm.eval.eval_utils import evaluate\n\n# remove annoying user warnings\nimport warnings\nwarnings.filterwarnings(\"ignore\", category=UserWarning)\nwarnings.filterwarnings(\"ignore\", category=RuntimeWarning)\n\ntry:\n    import open3d as o3d\n    ALLOW_POST_PROCESS = True\nexcept ImportError:\n    ALLOW_POST_PROCESS = False\n\nDATASET2METRICS = {'kitti': ['frid', 'fsvd', 'fpvd', 'cd', 'emd'], 'nuscenes': ['fsvd', 'fpvd', 'cd', 'emd']}\nDATASET2TYPE = {'kitti': '64', 'nuscenes': '32'}\n\ncustom_to_range = lambda x: (x * 255.).clamp(0, 255).floor() / 255.\n\n\ndef custom_to_pcd(x, config, rgb=None):\n    x = x.squeeze().detach().cpu().numpy()\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    if rgb is not None:\n        rgb = rgb.squeeze().detach().cpu().numpy()\n        rgb = (np.clip(rgb, -1., 1.) + 1.) / 2.\n        rgb = rgb.transpose(1, 2, 0)\n    xyz, rgb, _ = range2pcd(x, color=rgb, **config['data']['params']['dataset'])\n\n    return xyz, rgb\n\n\ndef custom_to_pil(x):\n    x = x.detach().cpu().squeeze().numpy()\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    x = (255 * x).astype(np.uint8)\n\n    if x.ndim == 3:\n        x = x.transpose(1, 2, 0)\n    x = Image.fromarray(x)\n\n    return x\n\n\ndef custom_to_np(x):\n    x = x.detach().cpu().squeeze().numpy()\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    x = x.astype(np.float32)  # NOTE: use predicted continuous depth instead of np.uint8 depth\n    return x\n\n\ndef logs2pil(logs, keys=[\"sample\"]):\n    imgs = dict()\n    for k in logs:\n        try:\n            if len(logs[k].shape) == 4:\n                img = custom_to_pil(logs[k][0, ...])\n            elif len(logs[k].shape) == 3:\n                img = custom_to_pil(logs[k])\n            else:\n                print(f\"Unknown format for key {k}. 
\")\n                img = None\n        except:\n            img = None\n        imgs[k] = img\n    return imgs\n\n\ndef run(model, dataloader, imglogdir, pcdlogdir, nplog=None, config=None, verbose=False):\n    tstart = time.time()\n    n_saved = len(glob.glob(os.path.join(imglogdir, '*.png')))\n\n    all_samples, all_gt = [], []\n    print(f\"Running conditional sampling\")\n    for batch in tqdm(dataloader, desc=\"Reconstructing Batches\"):\n        all_gt.extend(batch['reproj'])\n        N = len(batch['reproj'])\n        logs = model.log_images(batch)\n        n_saved = save_logs(logs, imglogdir, pcdlogdir, N, n_saved=n_saved, config=config)\n        all_samples.extend([custom_to_pcd(img, config)[0].astype(np.float32) for img in logs[\"reconstructions\"]])\n    joblib.dump(all_samples, os.path.join(nplog, f\"samples.pcd\"))\n\n    print(f\"Sampling of {n_saved} images finished in {(time.time() - tstart) / 60.:.2f} minutes.\")\n    return all_samples, all_gt\n\n\ndef save_logs(logs, imglogdir, pcdlogdir, num, n_saved=0, key_list=None, config=None):\n    key_list = logs.keys() if key_list is None else key_list\n    for i in range(num):\n        for k in key_list:\n            x = logs[k][i]\n            # save as image\n            img = custom_to_pil(x)\n            imgpath = os.path.join(imglogdir, f\"{k}_{n_saved:06}.png\")\n            img.save(imgpath)\n            # save as point cloud\n            xyz, rgb = custom_to_pcd(x, config)\n            pcdpath = os.path.join(pcdlogdir, f\"{k}_{n_saved:06}.txt\")\n            np.savetxt(pcdpath, np.hstack([xyz, rgb]), fmt='%.3f')\n        n_saved += 1\n    return n_saved\n\n\ndef get_parser():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\n        \"-r\",\n        \"--resume\",\n        type=str,\n        nargs=\"?\",\n        help=\"load from logdir or checkpoint in logdir\",\n        default=\"none\"\n    )\n    parser.add_argument(\n        \"-l\",\n        \"--logdir\",\n        type=str,\n        nargs=\"?\",\n        help=\"extra logdir\",\n        default=\"none\"\n    )\n    parser.add_argument(\n        \"-b\",\n        \"--batch_size\",\n        type=int,\n        nargs=\"?\",\n        help=\"the bs\",\n        default=32\n    )\n    parser.add_argument(\n        \"-f\",\n        \"--file\",\n        help=\"the file path of samples\",\n        default=None\n    )\n    parser.add_argument(\n        \"-s\",\n        \"--seed\",\n        type=int,\n        help=\"the numpy file path\",\n        default=1000\n    )\n    parser.add_argument(\n        \"-d\",\n        \"--dataset\",\n        type=str,\n        help=\"dataset name [nuscenes, kitti]\",\n        required=True\n    )\n    parser.add_argument(\n        \"-v\",\n        \"--verbose\",\n        default=False,\n        action='store_true',\n        help=\"print status?\",\n    )\n    return parser\n\n\ndef load_model_from_config(config, sd):\n    model = instantiate_from_config(config)\n    model.load_state_dict(sd, strict=False)\n    model.cuda()\n    model.eval()\n    return model\n\n\ndef load_model(config, ckpt):\n    if ckpt:\n        print(f\"Loading model from {ckpt}\")\n        pl_sd = torch.load(ckpt, map_location=\"cpu\")\n        global_step = pl_sd[\"global_step\"]\n    else:\n        pl_sd = {\"state_dict\": None}\n        global_step = None\n    del config.model.params.lossconfig\n    model = load_model_from_config(config.model, pl_sd[\"state_dict\"])\n    return model, global_step\n\n\ndef test_collate_fn(data):\n    output = {}\n    keys = 
data[0].keys()\n    for k in keys:\n        v = [d[k] for d in data]\n        if k not in ['reproj', 'raw']:\n            v = torch.from_numpy(np.stack(v, 0))\n        else:\n            v = [d[k] for d in data]\n        output[k] = v\n    return output\n\n\nif __name__ == \"__main__\":\n    now = datetime.datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\")\n    sys.path.append(os.getcwd())\n    command = \" \".join(sys.argv)\n\n    parser = get_parser()\n    opt, unknown = parser.parse_known_args()\n    ckpt = None\n    set_seed(opt.seed)\n\n    if not os.path.exists(opt.resume) and not os.path.exists(opt.file):\n        raise FileNotFoundError\n    if os.path.isfile(opt.resume):\n        try:\n            logdir = '/'.join(opt.resume.split('/')[:-1])\n            print(f'Logdir is {logdir}')\n        except ValueError:\n            paths = opt.resume.split(\"/\")\n            idx = -2  # take a guess: path/to/logdir/checkpoints/model.ckpt\n            logdir = \"/\".join(paths[:idx])\n        ckpt = opt.resume\n    elif os.path.isfile(opt.file):\n        try:\n            logdir = '/'.join(opt.file.split('/')[:-5])\n            if len(logdir) == 0:\n                logdir = '/'.join(opt.file.split('/')[:-1])\n            print(f'Logdir is {logdir}')\n        except ValueError:\n            paths = opt.resume.split(\"/\")\n            idx = -5  # take a guess: path/to/logdir/samples/step_num/date/numpy/*.npz\n            logdir = \"/\".join(paths[:idx])\n        ckpt = None\n    else:\n        assert os.path.isdir(opt.resume), f\"{opt.resume} is not a directory\"\n        logdir = opt.resume.rstrip(\"/\")\n        ckpt = os.path.join(logdir, \"model.ckpt\")\n\n    base_configs = [f'{logdir}/config.yaml']\n    opt.base = base_configs\n\n    configs = [OmegaConf.load(cfg) for cfg in opt.base]\n    cli = OmegaConf.from_dotlist(unknown)\n    config = OmegaConf.merge(*configs, cli)\n\n    gpu = True\n    eval_mode = True\n    if opt.logdir != \"none\":\n        locallog = logdir.split(os.sep)[-1]\n        if locallog == \"\": locallog = logdir.split(os.sep)[-2]\n        print(f\"Switching logdir from '{logdir}' to '{os.path.join(opt.logdir, locallog)}'\")\n        logdir = os.path.join(opt.logdir, locallog)\n\n    print(config)\n\n    if opt.file is None:\n        model, global_step = load_model(config, ckpt)\n        print(f\"global step: {global_step}\")\n        print(75 * \"=\")\n        print(\"logging to:\")\n        logdir = os.path.join(logdir, \"samples\", f\"{global_step:08}\", now)\n        imglogdir = os.path.join(logdir, \"img\")\n        pcdlogdir = os.path.join(logdir, \"pcd\")\n        numpylogdir = os.path.join(logdir, \"numpy\")\n\n        os.makedirs(imglogdir)\n        os.makedirs(pcdlogdir)\n        os.makedirs(numpylogdir)\n        print(logdir)\n        print(75 * \"=\")\n\n        # write config out\n        sampling_file = os.path.join(logdir, \"sampling_config.yaml\")\n        sampling_conf = vars(opt)\n\n        with open(sampling_file, 'w') as f:\n            yaml.dump(sampling_conf, f, default_flow_style=False)\n        print(sampling_conf)\n\n        # traverse all validation data\n        data_config = config['data']['params']['validation']\n        data_config['params'].update({'dataset_config': config['data']['params']['dataset'],\n                                      'aug_config': config['data']['params']['aug'], 'return_pcd': True})\n        dataset = instantiate_from_config(data_config)\n        dataloader = DataLoader(dataset, batch_size=opt.batch_size, 
num_workers=8, shuffle=False, drop_last=False,\n                                collate_fn=test_collate_fn)\n\n        # settings\n        all_samples, all_gt = run(model, dataloader, imglogdir, pcdlogdir, nplog=numpylogdir,\n                                  config=config, verbose=opt.verbose)\n\n        # recycle gpu memory\n        del model\n        torch.cuda.empty_cache()\n    else:\n        all_samples = joblib.load(opt.file)\n        all_samples = [sample.astype(np.float32) for sample in all_samples]\n\n        # traverse all validation data\n        data_config = config['data']['params']['validation']\n        data_config['params'].update({'dataset_config': config['data']['params']['dataset'],\n                                      'aug_config': config['data']['params']['aug'], 'return_pcd': True})\n        dataset = instantiate_from_config(data_config)\n        test = dataset[0]\n        dataloader = DataLoader(dataset, batch_size=64, num_workers=8, shuffle=False, drop_last=False,\n                                collate_fn=test_collate_fn)\n        all_gt = []\n        for batch in dataloader:\n            all_gt.extend(batch['reproj'])\n\n    # evaluation\n    metrics, data_type = DATASET2METRICS[opt.dataset], DATASET2TYPE[opt.dataset]\n    evaluate(all_gt, all_samples, metrics, data_type)\n"
  },
  {
    "path": "scripts/sample.py",
    "content": "import math\nimport sys\n\nsys.path.append('./')\n\nimport os, argparse, glob, datetime, yaml\nimport torch\nfrom torch.utils.data import DataLoader\nimport time\nimport numpy as np\nfrom tqdm import trange\nimport joblib\n\nfrom omegaconf import OmegaConf\nfrom PIL import Image\n\nfrom lidm.models.diffusion.ddim import DDIMSampler\nfrom lidm.utils.misc_utils import instantiate_from_config, set_seed, count_params\nfrom lidm.utils.lidar_utils import range2pcd\nfrom lidm.eval.eval_utils import evaluate\n\nDATASET2METRICS = {'kitti': ['frid', 'fsvd', 'fpvd', 'jsd', 'mmd'], 'nuscenes': ['fsvd', 'fpvd', 'jsd', 'mmd']}\nDATASET2TYPE = {'kitti': '64', 'nuscenes': '32'}\n\n\ncustom_to_range = lambda x: (x * 255.).clamp(0, 255).floor() / 255.\n\n\ndef custom_to_pcd(x, config):\n    x = x.squeeze().detach().cpu().numpy()\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    xyz, _, _ = range2pcd(x, **config['data']['params']['dataset'])\n\n    rgb = np.zeros_like(xyz)\n    return xyz, rgb\n\n\ndef custom_to_pil(x):\n    x = x.detach().cpu().squeeze().numpy()\n    # x = np.clip(x, 0., 1.)\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    x = (255 * x).astype(np.uint8)\n    x = Image.fromarray(x)\n\n    return x\n\n\ndef custom_to_np(x):\n    x = x.detach().cpu().squeeze().numpy()\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    x = x.astype(np.float32)  # NOTE: use predicted continuous depth instead of np.uint8 depth\n    return x\n\n\ndef logs2pil(logs, keys=[\"samples\"]):\n    imgs = dict()\n    for k in logs:\n        try:\n            if len(logs[k].shape) == 4:\n                img = custom_to_pil(logs[k][0, ...])\n            elif len(logs[k].shape) == 3:\n                img = custom_to_pil(logs[k])\n            else:\n                print(f\"Unknown format for key {k}. 
\")\n                img = None\n        except:\n            img = None\n        imgs[k] = img\n    return imgs\n\n\n@torch.no_grad()\ndef convsample(model, shape, return_intermediates=True, verbose=True, make_prog_row=False):\n    if not make_prog_row:\n        return model.p_sample_loop(None, shape, return_intermediates=return_intermediates, verbose=verbose)\n    else:\n        return model.progressive_denoising(None, shape, verbose=verbose)\n\n\n@torch.no_grad()\ndef convsample_ddim(model, steps, shape, eta=1.0, verbose=False):\n    ddim = DDIMSampler(model)\n    bs = shape[0]\n    shape = shape[1:]\n    samples, intermediates = ddim.sample(steps, batch_size=bs, shape=shape, eta=eta, verbose=verbose, disable_tqdm=True)\n    return samples, intermediates\n\n\n@torch.no_grad()\ndef make_convolutional_sample(model, batch_size, image_size=None, vanilla=False, custom_steps=None, eta=1.0, verbose=False):\n    log = dict()\n    if image_size is None:\n        image_size = model.model.diffusion_model.image_size\n    shape = [batch_size, model.model.diffusion_model.in_channels, *image_size]\n\n    with model.ema_scope(\"Plotting\"):\n        t0 = time.time()\n        if vanilla:\n            sample, progrow = convsample(model, shape, make_prog_row=True, verbose=verbose)\n        else:\n            sample, intermediates = convsample_ddim(model, custom_steps, shape, eta, verbose)\n        t1 = time.time()\n    x_sample = model.decode_first_stage(sample)\n\n    log[\"samples\"] = x_sample\n    log[\"time\"] = t1 - t0\n    log['throughput'] = sample.shape[0] / (t1 - t0)\n    if verbose:\n        print(f'Throughput for this batch: {log[\"throughput\"]}')\n    return log\n\n\ndef run(model, imglogdir, pcdlogdir, batch_size=50, image_size=None, vanilla=False, custom_steps=None, eta=None, n_samples=50000,\n        nplog=None, config=None, verbose=False):\n    if vanilla:\n        print(f'Using Vanilla DDPM sampling with {model.num_timesteps} sampling steps.')\n    else:\n        print(f'Using DDIM sampling with {custom_steps} sampling steps and eta={eta}')\n\n    tstart = time.time()\n    n_saved = len(glob.glob(os.path.join(imglogdir, '*.png')))\n\n    if model.cond_stage_model is None:\n        all_samples = []\n        print(f\"Running unconditional sampling for {n_samples} samples\")\n        for _ in trange(math.ceil(n_samples / batch_size), desc=\"Sampling Batches (unconditional)\"):\n            logs = make_convolutional_sample(model, batch_size, image_size, vanilla, custom_steps, eta, verbose)\n            n_saved = save_logs(logs, imglogdir, pcdlogdir, n_saved=n_saved, key=\"samples\", config=config)\n            all_samples.extend([custom_to_pcd(img, config)[0].astype(np.float32) for img in logs[\"samples\"]])\n            if n_saved >= n_samples:\n                print(f'Finish after generating {n_saved} samples')\n                break\n        joblib.dump(all_samples, os.path.join(nplog, f\"samples.pcd\"))\n    else:\n       raise NotImplementedError('Currently only sampling for unconditional models supported.')\n\n    print(f\"Sampling of {n_saved} images finished in {(time.time() - tstart) / 60.:.2f} minutes.\")\n    return all_samples\n\n\ndef save_logs(logs, imglogdir, pcdlogdir, n_saved=0, key=\"samples\", np_path=None, config=None):\n    for k in logs:\n        if k == key:\n            batch = logs[key]\n            if np_path is None:\n                for x in batch:\n                    # save as image\n                    img = custom_to_pil(x)\n                    imgpath = 
os.path.join(imglogdir, f\"{key}_{n_saved:06}.png\")\n                    img.save(imgpath)\n                    # save as point cloud\n                    xyz, rgb = custom_to_pcd(x, config)\n                    pcdpath = os.path.join(pcdlogdir, f\"{key}_{n_saved:06}.txt\")\n                    np.savetxt(pcdpath, np.hstack([xyz, rgb]), fmt='%.3f')\n                    n_saved += 1\n            else:\n                npbatch = custom_to_np(batch)\n                shape_str = \"x\".join([str(x) for x in npbatch.shape])\n                nppath = os.path.join(np_path, f\"{n_saved}-{shape_str}-samples.npz\")\n                np.savez(nppath, npbatch)\n                n_saved += npbatch.shape[0]\n    return n_saved\n\n\ndef get_parser():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\n        \"-r\",\n        \"--resume\",\n        type=str,\n        nargs=\"?\",\n        help=\"load from logdir or checkpoint in logdir\",\n        default=\"none\"\n    )\n    parser.add_argument(\n        \"-n\",\n        \"--n_samples\",\n        type=int,\n        nargs=\"?\",\n        help=\"number of samples to draw\",\n        default=1000\n    )\n    parser.add_argument(\n        \"-e\",\n        \"--eta\",\n        type=float,\n        nargs=\"?\",\n        help=\"eta for ddim sampling (0.0 yields deterministic sampling)\",\n        default=1.0\n    )\n    parser.add_argument(\n        \"--vanilla\",\n        default=False,\n        action='store_true',\n        help=\"vanilla sampling (default option is DDIM sampling)?\",\n    )\n    parser.add_argument(\n        '--image_size',\n        nargs='+',\n        type=int,\n        default=None)\n    parser.add_argument(\n        \"-l\",\n        \"--logdir\",\n        type=str,\n        nargs=\"?\",\n        help=\"extra logdir\",\n        default=\"none\"\n    )\n    parser.add_argument(\n        \"-c\",\n        \"--custom_steps\",\n        type=int,\n        nargs=\"?\",\n        help=\"number of steps for ddim and fastdpm sampling\",\n        default=50\n    )\n    parser.add_argument(\n        \"-b\",\n        \"--batch_size\",\n        type=int,\n        nargs=\"?\",\n        help=\"the bs\",\n        default=10\n    )\n    parser.add_argument(\n        \"-f\",\n        \"--file\",\n        help=\"the file path of samples\",\n        default=None\n    )\n    parser.add_argument(\n        \"-s\",\n        \"--seed\",\n        type=int,\n        help=\"the numpy file path\",\n        default=1000\n    )\n    parser.add_argument(\n        \"-d\",\n        \"--dataset\",\n        type=str,\n        help=\"dataset name [nuscenes, kitti]\",\n        required=True\n    )\n    parser.add_argument(\n        \"--baseline\",\n        default=False,\n        action='store_true',\n        help=\"baseline provided by other sources (default option is not baseline)?\",\n    )\n    parser.add_argument(\n        \"-v\",\n        \"--verbose\",\n        default=False,\n        action='store_true',\n        help=\"print status?\",\n    )\n    parser.add_argument(\n        \"--eval\",\n        default=False,\n        action='store_true',\n        help=\"evaluation results?\",\n    )\n    return parser\n\n\ndef load_model_from_config(config, sd):\n    model = instantiate_from_config(config)\n    model.load_state_dict(sd, strict=False)\n    model.cuda()\n    model.eval()\n    return model\n\n\ndef load_model(config, ckpt):\n    if ckpt:\n        print(f\"Loading model from {ckpt}\")\n        pl_sd = torch.load(ckpt, map_location=\"cpu\")\n        
global_step = pl_sd[\"global_step\"]\n    else:\n        pl_sd = {\"state_dict\": None}\n        global_step = None\n    model = load_model_from_config(config.model, pl_sd[\"state_dict\"])\n    count_params(model.first_stage_model, verbose=True)\n    return model, global_step\n\n\ndef visualize(samples, logdir):\n    pcdlogdir = os.path.join(logdir, \"pcd\")\n    os.makedirs(pcdlogdir, exist_ok=True)\n    for i, pcd in enumerate(samples):\n        # save as point cloud\n        pcdpath = os.path.join(pcdlogdir, f\"{i:06}.txt\")\n        np.savetxt(pcdpath, pcd, fmt='%.3f')\n\n\ndef test_collate_fn(data):\n    pcd_list = [example['reproj'].astype(np.float32) for example in data]\n    return pcd_list\n\n\nif __name__ == \"__main__\":\n    now = datetime.datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\")\n    sys.path.append(os.getcwd())\n    command = \" \".join(sys.argv)\n\n    parser = get_parser()\n    opt, unknown = parser.parse_known_args()\n    ckpt = None\n    set_seed(opt.seed)\n\n    if not os.path.exists(opt.resume) and not os.path.exists(opt.file):\n        raise FileNotFoundError\n    if os.path.isfile(opt.resume):\n        try:\n            logdir = '/'.join(opt.resume.split('/')[:-1])\n            print(f'Logdir is {logdir}')\n        except ValueError:\n            paths = opt.resume.split(\"/\")\n            idx = -2  # take a guess: path/to/logdir/checkpoints/model.ckpt\n            logdir = \"/\".join(paths[:idx])\n        ckpt = opt.resume\n    elif os.path.isfile(opt.file):\n        try:\n            logdir = '/'.join(opt.file.split('/')[:-5])\n            if len(logdir) == 0:\n                logdir = '/'.join(opt.file.split('/')[:-1])\n            print(f'Logdir is {logdir}')\n        except ValueError:\n            paths = opt.resume.split(\"/\")\n            idx = -5  # take a guess: path/to/logdir/samples/step_num/date/numpy/*.npz\n            logdir = \"/\".join(paths[:idx])\n        ckpt = None\n    else:\n        assert os.path.isdir(opt.resume), f\"{opt.resume} is not a directory\"\n        logdir = opt.resume.rstrip(\"/\")\n        ckpt = os.path.join(logdir, \"model.ckpt\")\n\n    if not opt.baseline:\n        base_configs = [f'{logdir}/config.yaml']\n    else:\n        base_configs = [f'models/baseline/{opt.dataset}/template/config.yaml']\n    opt.base = base_configs\n\n    configs = [OmegaConf.load(cfg) for cfg in opt.base]\n    cli = OmegaConf.from_dotlist(unknown)\n    config = OmegaConf.merge(*configs, cli)\n\n    gpu = True\n    eval_mode = True\n    if opt.logdir != \"none\":\n        locallog = logdir.split(os.sep)[-1]\n        if locallog == \"\": locallog = logdir.split(os.sep)[-2]\n        print(f\"Switching logdir from '{logdir}' to '{os.path.join(opt.logdir, locallog)}'\")\n        logdir = os.path.join(opt.logdir, locallog)\n\n    print(config)\n\n    if opt.file is None:\n        if opt.eval:\n            # n_samples is an int, so compare its value directly (len() on an int raises TypeError)\n            assert opt.n_samples == 2000, \"Specify n_samples=2000 for evaluation\"\n        model, global_step = load_model(config, ckpt)\n        print(f\"global step: {global_step}\")\n        print(75 * \"=\")\n        print(\"logging to:\")\n        logdir = os.path.join(logdir, \"samples\", f\"{global_step:08}\", now)\n        imglogdir = os.path.join(logdir, \"img\")\n        pcdlogdir = os.path.join(logdir, \"pcd\")\n        numpylogdir = os.path.join(logdir, \"numpy\")\n\n        os.makedirs(imglogdir)\n        os.makedirs(pcdlogdir)\n        os.makedirs(numpylogdir)\n        print(logdir)\n        print(75 * \"=\")\n\n        # write config out\n
        sampling_file = os.path.join(logdir, \"sampling_config.yaml\")\n        sampling_conf = vars(opt)\n\n        with open(sampling_file, 'w') as f:\n            yaml.dump(sampling_conf, f, default_flow_style=False)\n        print(sampling_conf)\n        all_samples = run(model, imglogdir, pcdlogdir, eta=opt.eta, vanilla=opt.vanilla,\n                          n_samples=opt.n_samples, custom_steps=opt.custom_steps,\n                          batch_size=opt.batch_size, image_size=opt.image_size,\n                          nplog=numpylogdir, config=config, verbose=opt.verbose)\n        # recycle gpu memory\n        del model\n        torch.cuda.empty_cache()\n    else:\n        all_samples = joblib.load(opt.file)\n        if opt.eval:\n            assert len(all_samples) == 2000, \"Prepare 2000 samples before evaluation\"\n        all_samples = [sample.astype(np.float32) for sample in all_samples]\n\n    if opt.image_size is None:\n        # traverse all validation data\n        data_config = config['data']['params']['validation']\n        data_config['params'].update({'dataset_config': config['data']['params']['dataset'],\n                                      'aug_config': config['data']['params']['aug'], 'return_pcd': True})\n        dataset = instantiate_from_config(data_config)\n        dataloader = DataLoader(dataset, batch_size=64, num_workers=8, shuffle=False, drop_last=False,\n                                collate_fn=test_collate_fn)\n        all_gt = []\n        for batch in dataloader:\n            all_gt.extend(batch)\n\n    # evaluation\n    if opt.eval:\n        metrics, data_type = DATASET2METRICS[opt.dataset], DATASET2TYPE[opt.dataset]\n        evaluate(all_gt, all_samples, metrics, data_type)\n"
  },
  {
    "path": "scripts/sample_cond.py",
    "content": "import math\nimport sys\n\nsys.path.append('./')\n\nimport os, argparse, glob, datetime, yaml\nimport torch\nfrom torch.utils.data import DataLoader\nimport time\nimport numpy as np\nfrom tqdm import tqdm\nimport joblib\n\nfrom omegaconf import OmegaConf\nfrom PIL import Image\n\nfrom lidm.utils.misc_utils import instantiate_from_config, set_seed\nfrom lidm.utils.lidar_utils import range2pcd\nfrom lidm.eval.eval_utils import evaluate\n\n# remove annoying user warnings\nimport warnings\nwarnings.filterwarnings(\"ignore\", category=UserWarning)\nwarnings.filterwarnings(\"ignore\", category=RuntimeWarning)\n\nDATASET2METRICS = {'kitti': ['frid', 'fsvd', 'fpvd', 'jsd', 'mmd'], 'nuscenes': ['fsvd', 'fpvd']}\nDATASET2TYPE = {'kitti': '64', 'nuscenes': '32'}\n\ncustom_to_range = lambda x: (x * 255.).clamp(0, 255).floor() / 255.\n\n\ndef custom_to_pcd(x, config, rgb=None):\n    x = x.squeeze().detach().cpu().numpy()\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    if rgb is not None:\n        rgb = rgb.squeeze().detach().cpu().numpy()\n        rgb = (np.clip(rgb, -1., 1.) + 1.) / 2.\n        rgb = rgb.transpose(1, 2, 0)\n    xyz, rgb, _ = range2pcd(x, color=rgb, **config['data']['params']['dataset'])\n\n    return xyz, rgb\n\n\ndef custom_to_pil(x):\n    x = x.detach().cpu().squeeze().numpy()\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    x = (255 * x).astype(np.uint8)\n\n    if x.ndim == 3:\n        x = x.transpose(1, 2, 0)\n    x = Image.fromarray(x)\n\n    return x\n\n\ndef logs2pil(logs, keys=[\"sample\"]):\n    imgs = dict()\n    for k in logs:\n        try:\n            if len(logs[k].shape) == 4:\n                img = custom_to_pil(logs[k][0, ...])\n            elif len(logs[k].shape) == 3:\n                img = custom_to_pil(logs[k])\n            else:\n                print(f\"Unknown format for key {k}. 
\")\n                img = None\n        except:\n            img = None\n        imgs[k] = img\n    return imgs\n\n\ndef run(model, dataloader, imglogdir, pcdlogdir, nplog=None, config=None, verbose=False, log_config={}):\n    tstart = time.time()\n    n_saved = len(glob.glob(os.path.join(imglogdir, '*.png')))\n\n    all_samples, all_gt = [], []\n    print(f\"Running conditional sampling\")\n    for batch in tqdm(dataloader, desc=\"Sampling Batches (conditional)\"):\n        all_gt.extend(batch['reproj'])\n        N = len(batch['reproj'])\n        logs = model.log_images(batch, N=N, split='val', **log_config)\n        n_saved = save_logs(logs, imglogdir, pcdlogdir, N, n_saved=n_saved, config=config)\n        all_samples.extend([custom_to_pcd(img, config)[0].astype(np.float32) for img in logs[\"samples\"]])\n    joblib.dump(all_samples, os.path.join(nplog, f\"samples.pcd\"))\n\n    print(f\"Sampling of {n_saved} images finished in {(time.time() - tstart) / 60.:.2f} minutes.\")\n    return all_samples, all_gt\n\n\ndef save_logs(logs, imglogdir, pcdlogdir, num, n_saved=0, key_list=None, config=None):\n    key_list = logs.keys() if key_list is None else key_list\n    for i in range(num):\n        for k in key_list:\n            if k in ['reconstruction']:\n                continue\n            x = logs[k][i]\n            # save as image\n            if x.ndim == 3 and x.shape[0] in [1, 3]:\n                img = custom_to_pil(x)\n                imgpath = os.path.join(imglogdir, f\"{k}_{n_saved:06}.png\")\n                img.save(imgpath)\n            # save as point cloud\n            if k in ['samples', 'inputs']:\n                if config.model.params.cond_stage_key == 'segmentation':\n                    xyz, rgb = custom_to_pcd(x, config, logs['original_conditioning'][i])\n                else:\n                    xyz, rgb = custom_to_pcd(x, config)\n                pcdpath = os.path.join(pcdlogdir, f\"{k}_{n_saved:06}.txt\")\n                np.savetxt(pcdpath, np.hstack([xyz, rgb]), fmt='%.3f')\n        n_saved += 1\n    return n_saved\n\n\ndef get_parser():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\n        \"-r\",\n        \"--resume\",\n        type=str,\n        nargs=\"?\",\n        help=\"load from logdir or checkpoint in logdir\",\n        default=\"none\"\n    )\n    parser.add_argument(\n        \"-e\",\n        \"--eta\",\n        type=float,\n        nargs=\"?\",\n        help=\"eta for ddim sampling (0.0 yields deterministic sampling)\",\n        default=1.0\n    )\n    parser.add_argument(\n        \"--vanilla\",\n        default=False,\n        action='store_true',\n        help=\"vanilla sampling (default option is DDIM sampling)?\",\n    )\n    parser.add_argument(\n        \"-l\",\n        \"--logdir\",\n        type=str,\n        nargs=\"?\",\n        help=\"extra logdir\",\n        default=\"none\"\n    )\n    parser.add_argument(\n        \"-c\",\n        \"--custom_steps\",\n        type=int,\n        nargs=\"?\",\n        help=\"number of steps for ddim and fastdpm sampling\",\n        default=50\n    )\n    parser.add_argument(\n        \"-b\",\n        \"--batch_size\",\n        type=int,\n        nargs=\"?\",\n        help=\"the bs\",\n        default=10\n    )\n    parser.add_argument(\n        \"-f\",\n        \"--file\",\n        help=\"the file path of samples\",\n        default=None\n    )\n    parser.add_argument(\n        \"-s\",\n        \"--seed\",\n        type=int,\n        help=\"the numpy file path\",\n        
default=1000\n    )\n    parser.add_argument(\n        \"-d\",\n        \"--dataset\",\n        type=str,\n        help=\"dataset name [nuscenes, kitti]\",\n        required=True\n    )\n    parser.add_argument(\n        \"--baseline\",\n        default=False,\n        action='store_true',\n        help=\"baseline provided by other sources (default option is not baseline)?\",\n    )\n    parser.add_argument(\n        \"-v\",\n        \"--verbose\",\n        default=False,\n        action='store_true',\n        help=\"print status?\",\n    )\n    parser.add_argument(\n        \"--eval\",\n        default=False,\n        action='store_true',\n        help=\"evaluation results?\",\n    )\n    return parser\n\n\ndef load_model_from_config(config, sd):\n    model = instantiate_from_config(config)\n    model.load_state_dict(sd, strict=False)\n    model.cuda()\n    model.eval()\n    return model\n\n\ndef load_model(config, ckpt):\n    if ckpt:\n        print(f\"Loading model from {ckpt}\")\n        pl_sd = torch.load(ckpt, map_location=\"cpu\")\n        global_step = pl_sd[\"global_step\"]\n    else:\n        pl_sd = {\"state_dict\": None}\n        global_step = None\n    model = load_model_from_config(config.model, pl_sd[\"state_dict\"])\n    return model, global_step\n\n\ndef visualize(samples, logdir):\n    pcdlogdir = os.path.join(logdir, \"pcd\")\n    os.makedirs(pcdlogdir, exist_ok=True)\n    for i, pcd in enumerate(samples):\n        # save as point cloud\n        pcdpath = os.path.join(pcdlogdir, f\"{i:06}.txt\")\n        np.savetxt(pcdpath, pcd, fmt='%.3f')\n\n\ndef test_collate_fn(data):\n    output = {}\n    keys = data[0].keys()\n    for k in keys:\n        v = [d[k] for d in data]\n        if k not in ['reproj', 'raw']:\n            v = torch.from_numpy(np.stack(v, 0))\n        else:\n            v = [d[k] for d in data]\n        output[k] = v\n    return output\n\n\ndef traverse_collate_fn(data):\n    pcd_list = [example['reproj'].astype(np.float32) for example in data]\n    return pcd_list\n\n\nif __name__ == \"__main__\":\n    now = datetime.datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\")\n    sys.path.append(os.getcwd())\n    command = \" \".join(sys.argv)\n\n    parser = get_parser()\n    opt, unknown = parser.parse_known_args()\n    ckpt = None\n    set_seed(opt.seed)\n\n    if not os.path.exists(opt.resume) and not os.path.exists(opt.file):\n        raise FileNotFoundError\n    if os.path.isfile(opt.resume):\n        try:\n            logdir = '/'.join(opt.resume.split('/')[:-1])\n            print(f'Logdir is {logdir}')\n        except ValueError:\n            paths = opt.resume.split(\"/\")\n            idx = -2  # take a guess: path/to/logdir/checkpoints/model.ckpt\n            logdir = \"/\".join(paths[:idx])\n        ckpt = opt.resume\n    elif os.path.isfile(opt.file):\n        try:\n            logdir = '/'.join(opt.file.split('/')[:-5])\n            if len(logdir) == 0:\n                logdir = '/'.join(opt.file.split('/')[:-1])\n            print(f'Logdir is {logdir}')\n        except ValueError:\n            paths = opt.resume.split(\"/\")\n            idx = -5  # take a guess: path/to/logdir/samples/step_num/date/numpy/*.npz\n            logdir = \"/\".join(paths[:idx])\n        ckpt = None\n    else:\n        assert os.path.isdir(opt.resume), f\"{opt.resume} is not a directory\"\n        logdir = opt.resume.rstrip(\"/\")\n        ckpt = os.path.join(logdir, \"model.ckpt\")\n\n    if not opt.baseline:\n        base_configs = [f'{logdir}/config.yaml']\n    else:\n    
    base_configs = [f'models/baseline/{opt.dataset}/template/config.yaml']\n    opt.base = base_configs\n\n    configs = [OmegaConf.load(cfg) for cfg in opt.base]\n    cli = OmegaConf.from_dotlist(unknown)\n    config = OmegaConf.merge(*configs, cli)\n\n    gpu = True\n    eval_mode = True\n    if opt.logdir != \"none\":\n        locallog = logdir.split(os.sep)[-1]\n        if locallog == \"\": locallog = logdir.split(os.sep)[-2]\n        print(f\"Switching logdir from '{logdir}' to '{os.path.join(opt.logdir, locallog)}'\")\n        logdir = os.path.join(opt.logdir, locallog)\n\n    print(config)\n\n    if opt.file is None:\n        model, global_step = load_model(config, ckpt)\n        print(f\"global step: {global_step}\")\n        print(75 * \"=\")\n        print(\"logging to:\")\n        logdir = os.path.join(logdir, \"samples\", f\"{global_step:08}\", now)\n        imglogdir = os.path.join(logdir, \"img\")\n        pcdlogdir = os.path.join(logdir, \"pcd\")\n        numpylogdir = os.path.join(logdir, \"numpy\")\n\n        os.makedirs(imglogdir)\n        os.makedirs(pcdlogdir)\n        os.makedirs(numpylogdir)\n        print(logdir)\n        print(75 * \"=\")\n\n        # write config out\n        sampling_file = os.path.join(logdir, \"sampling_config.yaml\")\n        sampling_conf = vars(opt)\n\n        with open(sampling_file, 'w') as f:\n            yaml.dump(sampling_conf, f, default_flow_style=False)\n        print(sampling_conf)\n\n        # traverse all validation data\n        data_config = config['data']['params']['validation']\n        data_config['params'].update({'dataset_config': config['data']['params']['dataset'],\n                                      'aug_config': config['data']['params']['aug'], 'return_pcd': True,\n                                      'max_objects_per_image': 5})\n        dataset = instantiate_from_config(data_config)\n        dataloader = DataLoader(dataset, batch_size=opt.batch_size, num_workers=8, shuffle=False, drop_last=False,\n                                collate_fn=test_collate_fn)\n\n        # settings\n        log_config = {'sample': True, 'ddim_steps': opt.custom_steps,\n                      'quantize_denoised': False, 'inpaint': False, 'plot_progressive_rows': False,\n                      'plot_diffusion_rows': False, 'dset': dataset}\n        # test = dataset[0]\n        all_samples, all_gt = run(model, dataloader, imglogdir, pcdlogdir, nplog=numpylogdir,\n                                  config=config, verbose=opt.verbose, log_config=log_config)\n\n        # recycle gpu memory\n        del model\n        torch.cuda.empty_cache()\n    else:\n        all_samples = joblib.load(opt.file)\n        all_samples = [sample.astype(np.float32) for sample in all_samples]\n\n        # traverse all validation data\n        data_config = config['data']['params']['validation']\n        data_config['params'].update({'dataset_config': config['data']['params']['dataset'],\n                                      'aug_config': config['data']['params']['aug'], 'return_pcd': True})\n        dataset = instantiate_from_config(data_config)\n        dataloader = DataLoader(dataset, batch_size=64, num_workers=8, shuffle=False, drop_last=False,\n                                collate_fn=traverse_collate_fn)\n        all_gt = []\n        for batch in dataloader:\n            all_gt.extend(batch)\n\n    # evaluation\n    if opt.eval:\n        metrics, data_type = DATASET2METRICS[opt.dataset], DATASET2TYPE[opt.dataset]\n        evaluate(all_gt, all_samples, 
metrics, data_type)\n"
  },
  {
    "path": "scripts/text2lidar.py",
    "content": "import math\nimport sys\n\nsys.path.append('./')\n\nimport os, argparse, glob, datetime, yaml\nimport torch\nfrom torch.utils.data import DataLoader\nimport time\nimport numpy as np\nfrom tqdm import tqdm, trange\nimport joblib\n\nfrom omegaconf import OmegaConf\nfrom PIL import Image\n\nfrom lidm.models.diffusion.ddim import DDIMSampler\nfrom lidm.utils.misc_utils import instantiate_from_config, set_seed, isimage\nfrom lidm.utils.lidar_utils import range2pcd\nfrom lidm.modules.encoders.modules import FrozenCLIPTextEmbedder, FrozenClipMultiTextEmbedder\n\n# remove annoying user warnings\nimport warnings\nwarnings.filterwarnings(\"ignore\", category=UserWarning)\nwarnings.filterwarnings(\"ignore\", category=RuntimeWarning)\n\nDATASET2METRICS = {'kitti': ['frid', 'fsvd', 'fpvd'], 'nuscenes': ['fsvd', 'fpvd']}\n\ncustom_to_range = lambda x: (x * 255.).clamp(0, 255).floor() / 255.\n\n\ndef custom_to_pcd(x, config, rgb=None):\n    x = x.squeeze().detach().cpu().numpy()\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    if rgb is not None:\n        rgb = rgb.squeeze().detach().cpu().numpy()\n        rgb = (np.clip(rgb, -1., 1.) + 1.) / 2.\n        rgb = rgb.transpose(1, 2, 0)\n    xyz, rgb, _ = range2pcd(x, color=rgb, **config['data']['params']['dataset'])\n\n    return xyz, rgb\n\n\ndef custom_to_pil(x):\n    x = x.detach().cpu().squeeze().numpy()\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    x = (255 * x).astype(np.uint8)\n\n    if x.ndim == 3:\n        x = x.transpose(1, 2, 0)\n    x = Image.fromarray(x)\n\n    return x\n\n\ndef custom_to_np(x):\n    x = x.detach().cpu().squeeze().numpy()\n    x = (np.clip(x, -1., 1.) + 1.) / 2.\n    x = x.astype(np.float32)  # NOTE: use predicted continuous depth instead of np.uint8 depth\n    return x\n\n\ndef logs2pil(logs, keys=[\"sample\"]):\n    imgs = dict()\n    for k in logs:\n        try:\n            if len(logs[k].shape) == 4:\n                img = custom_to_pil(logs[k][0, ...])\n            elif len(logs[k].shape) == 3:\n                img = custom_to_pil(logs[k])\n            else:\n                print(f\"Unknown format for key {k}. 
\")\n                img = None\n        except:\n            img = None\n        imgs[k] = img\n    return imgs\n\n\n@torch.no_grad()\ndef convsample(model, cond, shape, return_intermediates=True, verbose=True, make_prog_row=False):\n    if not make_prog_row:\n        return model.p_sample_loop(cond, shape, return_intermediates=return_intermediates, verbose=verbose)\n    else:\n        return model.progressive_denoising(cond, shape, verbose=verbose)\n\n\n@torch.no_grad()\ndef convsample_ddim(model, cond, steps, shape, eta=1.0, verbose=False):\n    ddim = DDIMSampler(model)\n    bs = shape[0]\n    shape = shape[1:]\n    samples, intermediates = ddim.sample(steps, conditioning=cond, batch_size=bs, shape=shape, eta=eta, verbose=verbose, disable_tqdm=True)\n    return samples, intermediates\n\n\n@torch.no_grad()\ndef make_convolutional_sample(model, cond, batch_size, vanilla=False, custom_steps=None, eta=1.0, verbose=False):\n    log = dict()\n    shape = [batch_size,\n             model.model.diffusion_model.in_channels,\n             *model.model.diffusion_model.image_size]\n\n    with model.ema_scope(\"Plotting\"):\n        t0 = time.time()\n        if vanilla:\n            sample, progrow = convsample(model, cond, shape, make_prog_row=True, verbose=verbose)\n        else:\n            sample, intermediates = convsample_ddim(model, cond, custom_steps, shape, eta, verbose)\n        t1 = time.time()\n    x_sample = model.decode_first_stage(sample)\n\n    log[\"sample\"] = x_sample\n    log[\"time\"] = t1 - t0\n    log['throughput'] = sample.shape[0] / (t1 - t0)\n    if verbose:\n        print(f'Throughput for this batch: {log[\"throughput\"]}')\n    return log\n\n\ndef run(model, text_encoder, prompt, imglogdir, pcdlogdir, custom_steps=50, batch_size=10, n_samples=50, config=None, verbose=False):\n    tstart = time.time()\n    n_saved = len(glob.glob(os.path.join(imglogdir, '*.png')))\n\n    all_samples = []\n    print(f\"Running conditional sampling\")\n    for _ in trange(math.ceil(n_samples / batch_size), desc=\"Sampling Batches (unconditional)\"):\n        with torch.no_grad():\n            cond = text_encoder.encode(batch_size * [prompt])\n            cond = model.cond_stage_model(cond)\n        try:\n            logs = make_convolutional_sample(model, cond, batch_size, custom_steps=custom_steps, verbose=verbose)\n        except Exception:\n            import pdb as debugger\n            debugger.post_mortem()\n        n_saved = save_logs(logs, imglogdir, pcdlogdir, n_saved=n_saved, key=\"sample\", config=config)\n\n    print(f\"Sampling of {n_saved} images finished in {(time.time() - tstart) / 60.:.2f} minutes.\")\n    return all_samples\n\n\ndef save_logs(logs, imglogdir, pcdlogdir, n_saved=0, key=\"sample\", np_path=None, config=None):\n    batch = logs[key]\n    if np_path is None:\n        for x in batch:\n            # save as image\n            img = custom_to_pil(x)\n            imgpath = os.path.join(imglogdir, f\"{key}_{n_saved:06}.png\")\n            img.save(imgpath)\n            # save as point cloud\n            xyz, rgb = custom_to_pcd(x, config)\n            pcdpath = os.path.join(pcdlogdir, f\"{key}_{n_saved:06}.txt\")\n            np.savetxt(pcdpath, np.hstack([xyz, rgb]), fmt='%.6f')\n            n_saved += 1\n    return n_saved\n\n\ndef get_parser():\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\n        \"-r\",\n        \"--resume\",\n        type=str,\n        nargs=\"?\",\n        help=\"load from logdir or checkpoint in logdir\",\n        
default=\"none\"\n    )\n    parser.add_argument(\n        \"-p\",\n        \"--prompt\",\n        type=str,\n        nargs=\"?\",\n        default=\"walls surrounded\",\n        help=\"the prompt to render\"\n    )\n    parser.add_argument(\n        \"-n\",\n        \"--n_samples\",\n        type=int,\n        nargs=\"?\",\n        help=\"number of samples to draw\",\n        default=50\n    )\n    parser.add_argument(\n        \"-e\",\n        \"--eta\",\n        type=float,\n        nargs=\"?\",\n        help=\"eta for ddim sampling (0.0 yields deterministic sampling)\",\n        default=1.0\n    )\n    parser.add_argument(\n        \"--vanilla\",\n        default=False,\n        action='store_true',\n        help=\"vanilla sampling (default option is DDIM sampling)?\",\n    )\n    parser.add_argument(\n        \"-l\",\n        \"--logdir\",\n        type=str,\n        nargs=\"?\",\n        help=\"extra logdir\",\n        default=\"none\"\n    )\n    parser.add_argument(\n        \"-c\",\n        \"--custom_steps\",\n        type=int,\n        nargs=\"?\",\n        help=\"number of steps for ddim and fastdpm sampling\",\n        default=50\n    )\n    parser.add_argument(\n        \"-b\",\n        \"--batch_size\",\n        type=int,\n        nargs=\"?\",\n        help=\"the bs\",\n        default=10\n    )\n    parser.add_argument(\n        \"--num_views\",\n        type=int,\n        nargs=\"?\",\n        help=\"num of views\",\n        default=4\n    )\n    parser.add_argument(\n        \"--apply_all\",\n        default=False,\n        action='store_true',\n        help=\"print status?\",\n    )\n    parser.add_argument(\n        \"-s\",\n        \"--seed\",\n        type=int,\n        help=\"the numpy file path\",\n        default=1000\n    )\n    parser.add_argument(\n        \"-d\",\n        \"--dataset\",\n        type=str,\n        help=\"dataset name [nuscenes, kitti]\",\n        required=True\n    )\n    parser.add_argument(\n        \"-v\",\n        \"--verbose\",\n        default=False,\n        action='store_true',\n        help=\"print status?\",\n    )\n    return parser\n\n\ndef load_model_from_config(config, sd):\n    model = instantiate_from_config(config)\n    model.load_state_dict(sd, strict=False)\n    model.cuda()\n    model.eval()\n    return model\n\n\ndef load_model(config, ckpt):\n    if ckpt:\n        print(f\"Loading model from {ckpt}\")\n        pl_sd = torch.load(ckpt, map_location=\"cpu\")\n        global_step = pl_sd[\"global_step\"]\n    else:\n        pl_sd = {\"state_dict\": None}\n        global_step = None\n    model = load_model_from_config(config.model, pl_sd[\"state_dict\"])\n    return model, global_step\n\n\ndef build_text_encoder(num_views, apply_all):\n    model = FrozenClipMultiTextEmbedder(num_views=num_views, apply_all=apply_all)\n    model.freeze()\n    return model\n\n\ndef visualize(samples, logdir):\n    pcdlogdir = os.path.join(logdir, \"pcd\")\n    os.makedirs(pcdlogdir, exist_ok=True)\n    for i, pcd in enumerate(samples):\n        # save as point cloud\n        pcdpath = os.path.join(pcdlogdir, f\"{i:06}.txt\")\n        np.savetxt(pcdpath, pcd, fmt='%.3f')\n\n\ndef test_collate_fn(data):\n    output = {}\n    keys = data[0].keys()\n    for k in keys:\n        v = [d[k] for d in data]\n        if k not in ['reproj']:\n            v = torch.from_numpy(np.stack(v, 0))\n        else:\n            v = [d[k] for d in data]\n        output[k] = v\n    return output\n\n\nif __name__ == \"__main__\":\n    now = 
datetime.datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\")\n    sys.path.append(os.getcwd())\n    command = \" \".join(sys.argv)\n\n    parser = get_parser()\n    opt, unknown = parser.parse_known_args()\n    ckpt = None\n    set_seed(opt.seed)\n\n    # this script has no --file option, so the logdir is resolved from --resume only\n    if not os.path.exists(opt.resume):\n        raise ValueError(\"Cannot find {}\".format(opt.resume))\n    if os.path.isfile(opt.resume):\n        try:\n            logdir = '/'.join(opt.resume.split('/')[:-1])\n            print(f'Logdir is {logdir}')\n        except ValueError:\n            paths = opt.resume.split(\"/\")\n            idx = -2  # take a guess: path/to/logdir/checkpoints/model.ckpt\n            logdir = \"/\".join(paths[:idx])\n        ckpt = opt.resume\n    else:\n        assert os.path.isdir(opt.resume), f\"{opt.resume} is not a directory\"\n        logdir = opt.resume.rstrip(\"/\")\n        ckpt = os.path.join(logdir, \"model.ckpt\")\n\n    base_configs = [f'{logdir}/config.yaml']\n    opt.base = base_configs\n\n    configs = [OmegaConf.load(cfg) for cfg in opt.base]\n    cli = OmegaConf.from_dotlist(unknown)\n    config = OmegaConf.merge(*configs, cli)\n\n    gpu = True\n    eval_mode = True\n    if opt.logdir != \"none\":\n        locallog = logdir.split(os.sep)[-1]\n        if locallog == \"\": locallog = logdir.split(os.sep)[-2]\n        print(f\"Switching logdir from '{logdir}' to '{os.path.join(opt.logdir, locallog)}'\")\n        logdir = os.path.join(opt.logdir, locallog)\n\n    print(config)\n\n    model, global_step = load_model(config, ckpt)\n    print(f\"global step: {global_step}\")\n    print(75 * \"=\")\n    print(\"logging to:\")\n    logdir = os.path.join(logdir, \"samples\", f\"{global_step:08}\", opt.prompt.replace(' ', '_'))\n    imglogdir = os.path.join(logdir, \"img\")\n    pcdlogdir = os.path.join(logdir, \"pcd\")\n    numpylogdir = os.path.join(logdir, \"numpy\")\n\n    os.makedirs(imglogdir, exist_ok=True)\n    os.makedirs(pcdlogdir, exist_ok=True)\n    os.makedirs(numpylogdir, exist_ok=True)\n    print(logdir)\n    print(75 * \"=\")\n\n    # write config out\n    sampling_file = os.path.join(logdir, \"sampling_config.yaml\")\n    sampling_conf = vars(opt)\n\n    with open(sampling_file, 'w') as f:\n        yaml.dump(sampling_conf, f, default_flow_style=False)\n    print(sampling_conf)\n\n    text_encoder = build_text_encoder(opt.num_views, opt.apply_all)\n    # forward batch size and sample count so the -b/-n arguments take effect\n    run(model, text_encoder, opt.prompt, imglogdir, pcdlogdir, custom_steps=opt.custom_steps,\n        batch_size=opt.batch_size, n_samples=opt.n_samples, config=config, verbose=opt.verbose)\n"
  }
]