[
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\ncover/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\n.pybuilder/\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# pyenv\n#   For a library or package, you might want to ignore these files since the code is\n#   intended to run in multiple environments; otherwise, check them in:\n# .python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# poetry\n#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.\n#   This is especially recommended for binary packages to ensure reproducibility, and is more\n#   commonly ignored for libraries.\n#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control\n#poetry.lock\n\n# pdm\n#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.\n#pdm.lock\n#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it\n#   in version control.\n#   https://pdm.fming.dev/#use-with-ide\n.pdm.toml\n\n# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# pytype static type analyzer\n.pytype/\n\n# Cython debug symbols\ncython_debug/\n\n# PyCharm\n#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can\n#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore\n#  and can be added to the global gitignore or merged into this file.  For a more nuclear\n#  option (not recommended) you can uncomment the following to ignore the entire idea folder.\n#.idea/\n"
  },
  {
    "path": "DATA.md",
    "content": "\n## Data\n\n* Note: As some of our users requested the mapping between HM3D object id in SceneVerse to HM3D-semantics, we have added an additional file ([HM3D_tgtID2objID.zip](assets/HM3D_tgtID2objID.zip)) to obtain this mapping. The json file for each scene contains a dictionary of ```{<sceneverse_objid>:[hm3d_objid, hm3d_label]}```.\n\n### Data Processing\n\nWe release a data preprocessing exemplar for 3RScan, MultiScan, ARKitScenes and Structured3D, with more details [here](preprocess/README.md).\n\nWe also release the [scripts](preprocess/ssg/README.md) for scene graph generation.\n\n### Data Download\nWe currently host our data on G-drive and request all applicants to fill out the form from [here](https://forms.gle/AXMk7MH6bFXpCqd99).\n\nYou should see one or multiple zip file segments for each dataset we provided. For datasets with multiple segments (e.g., ARKitScenes), you can unzip the files with:\n\n```shell\n# Directories with multiple zip segments\n$ ls ARKitScenes/\n  -> ARKitScenes.zip  ARKitScenes.z01\n\n# Unzip from all zip segments\n$ cd ARKitScenes/\n$ zip -F ARKitScenes.zip --out combined.zip\n$ unzip combined.zip\n```\n\nAfter unzipping, the files are organized as:\n```shell\nARKitScenes/\n|-- scan_data                   # Point cloud data\n  |-- instance_id_to_label      # Reorganized instance id to label mapping\n  |-- pcd_with_global_alignment # Aligned scene point clouds\n|-- annotations                 # Language annotations\n  |-- splits\n    |-- train_split.txt         # For all datasets, we provide training split\n    |-- val_split.txt           # For datasets with evaluation sets\n  |-- <language_type>.json      # For datasets except for ScanNet, language for ScanNet is located at annotations/refer\n```\n\n### Data Visualization\nFor data browsing, we experimented with NVIDIA CUDA 11.8 on Ubuntu 22.04 and require the following steps:\n```shell\n$ conda create -n sceneverse python=3.9\n$ pip install torch==2.2.0 torchvision==0.17.0 --index-url https://download.pytorch.org/whl/cu118\n$ pip install numpy open3d\n```\n\nWe provide a short script for visualizing scene and language data, you can use it with:\n```shell\n# Visualize scene and instance data\n$ python visualize_data.py --root <PATH_TO_DOWNLOAD> --dataset <DATASET>\n# Visualize language data\n$ python visualize_data.py --root <PATH_TO_DOWNLOAD> --dataset <DATASET> --vis_refer\n```\n\nAs our data contains scenes from existing datasets, please read carefully about the term of use for each dataset we provided in the form.\n\n### Provided Language Types\n\nWe list the available data in the current version of SceneVerse in the table below:\n\n|   Dataset    | Object Caption | Scene Caption | Ref-Annotation   | Ref-Pairwise<br>```rel2``` | Ref-MultiObject<br>```relm``` | Ref-Star<br>```star``` | Ref-Chain (Optional)<br>```chain``` |\n|:------------:|:--------------:|:-------------:|------------------|-------------------------|-------------------------------|-----------------------|------------------------------------|\n|   ScanNet    |       ✅        |       ✅       | ScanRefer<br>Nr3D | ✅              | ✅                             | ✅           | ✅       |\n|  MultiScan   |       ✅        |       ✅       | ✅ | ✅              | ✅                             | ✅           | ✅       |\n| ARKitScenes  |       ✅        |       ✅       | ✅ | ✅              | ✅                             | ✅           | ✅       |\n|     HM3D     |  ```template```   |       ✅       | ✅ | ✅              | ✅           
\n### Data Processing\n\nWe release data preprocessing exemplars for 3RScan, MultiScan, ARKitScenes and Structured3D, with more details [here](preprocess/README.md).\n\nWe also release the [scripts](preprocess/ssg/README.md) for scene graph generation.\n\n### Data Download\nWe currently host our data on G-drive and ask all applicants to fill out the form [here](https://forms.gle/AXMk7MH6bFXpCqd99).\n\nYou should see one or more zip file segments for each dataset we provide. For datasets with multiple segments (e.g., ARKitScenes), you can unzip the files with:\n\n```shell\n# Directories with multiple zip segments\n$ ls ARKitScenes/\n  -> ARKitScenes.zip  ARKitScenes.z01\n\n# Combine all zip segments and unzip\n$ cd ARKitScenes/\n$ zip -F ARKitScenes.zip --out combined.zip\n$ unzip combined.zip\n```\n\nAfter unzipping, the files are organized as:\n```shell\nARKitScenes/\n|-- scan_data                   # Point cloud data\n  |-- instance_id_to_label      # Reorganized instance id to label mapping\n  |-- pcd_with_global_alignment # Aligned scene point clouds\n|-- annotations                 # Language annotations\n  |-- splits\n    |-- train_split.txt         # Provided for all datasets\n    |-- val_split.txt           # Provided for datasets with evaluation sets\n  |-- <language_type>.json      # For all datasets except ScanNet; ScanNet language is located at annotations/refer\n```\n\n### Data Visualization\nFor data browsing, we tested on Ubuntu 22.04 with NVIDIA CUDA 11.8; the environment can be set up with the following steps:\n```shell\n$ conda create -n sceneverse python=3.9\n$ conda activate sceneverse\n$ pip install torch==2.2.0 torchvision==0.17.0 --index-url https://download.pytorch.org/whl/cu118\n$ pip install numpy open3d\n```\n\nWe provide a short script for visualizing scene and language data; you can use it with:\n```shell\n# Visualize scene and instance data\n$ python visualize_data.py --root <PATH_TO_DOWNLOAD> --dataset <DATASET>\n# Visualize language data\n$ python visualize_data.py --root <PATH_TO_DOWNLOAD> --dataset <DATASET> --vis_refer\n```\n\nAs our data contains scenes from existing datasets, please read the terms of use carefully for each dataset we provide in the form.\n\n### Provided Language Types\n\nWe list the available data in the current version of SceneVerse in the table below:\n\n|   Dataset    | Object Caption | Scene Caption | Ref-Annotation   | Ref-Pairwise<br>```rel2``` | Ref-MultiObject<br>```relm``` | Ref-Star<br>```star``` | Ref-Chain (Optional)<br>```chain``` |\n|:------------:|:--------------:|:-------------:|------------------|----------------------------|-------------------------------|------------------------|-------------------------------------|\n|   ScanNet    |       ✅        |       ✅       | ScanRefer<br>Nr3D | ✅ | ✅ | ✅ | ✅ |\n|  MultiScan   |       ✅        |       ✅       | ✅ | ✅ | ✅ | ✅ | ✅ |\n| ARKitScenes  |       ✅        |       ✅       | ✅ | ✅ | ✅ | ✅ | ✅ |\n|     HM3D     | ```template``` |       ✅       | ✅ | ✅ | ✅ | ✅ | ✅ |\n|    3RScan    |       ✅        |       ✅       | ❌ | ✅ | ✅ | ✅ | ✅ |\n| Structured3D | ```template``` |       ✅       | ❌ | ✅ | ✅ | ✅ | ❌ |\n|   ProcTHOR   | ```template``` |       ❌       | ❌ | ```template``` | ```template``` | ```template``` | ❌ |\n\nFor the generated object referrals, we provide both the direct template-based generations ```template``` and the LLM-refined versions ```gpt```.\nPlease refer to our supplementary material for descriptions of the selected ```pair-wise``` / ```multi-object``` / ```star``` types. We also\nprovide the ```chain``` type, which contains language that uses object A to refer to B and then B to refer to the target object C. As we found\nthat the ```chain``` type could sometimes lead to unnatural descriptions, we did not discuss it in the main paper. Feel free to inspect\nand use it in your projects.\n\nFor the remaining data, we hope to further refine and update our data in the following weeks, stay tuned!\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2024 scene-verse\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "<h2 align=\"center\">\n  <span><img src=\"assets/logo025.png\" width=\"4%\" style=\"transform: translate(0,9px)\"></span>\n  <b>SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding</b>\n</h2>\n\n<div align=\"center\" margin-bottom=\"6em\">\n<a target=\"_blank\" href=\"https://buzz-beater.github.io/\">Baoxiong Jia<sup>✶</sup></a>,\n<a target=\"_blank\" href=\"https://yixchen.github.io/\">Yixin Chen<sup>✶</sup></a>,\n<a target=\"_blank\" href=\"https://scholar.google.com/citations?user=fKRgnIMAAAAJ/\">Huangyue Yu</a>,\n<a target=\"_blank\" href=\"https://github.com/jetpackfirstme\">Yan Wang</a>,\n<a target=\"_blank\" href=\"https://nxsedson.github.io/\">Xuesong Niu</a>,\n<a target=\"_blank\" href=\"https://tengyu.ai/\">Tengyu Liu</a>,\n<a target=\"_blank\" href=\"https://liqing-ustc.github.io/\">Qing Li</a>,\n<a target=\"_blank\" href=\"https://siyuanhuang.com/\">Siyuan Huang</a>\n\n</div>\n&nbsp;\n\n<div align=\"center\">\n    <a href=\"https://arxiv.org/abs/2401.09340\" target=\"_blank\">\n    <img src=\"https://img.shields.io/badge/Paper-arXiv-deepgreen\" alt=\"Paper arXiv\"></a>\n    <a href=\"https://scene-verse.github.io\" target=\"_blank\">\n    <img src=\"https://img.shields.io/badge/Project-Page-9cf\" alt=\"Project Page\"></a>\n    <a href=\"https://youtu.be/UnujS0EVxKU\" target=\"_blank\">\n    <img src=\"https://img.shields.io/badge/Video-YouTube-9966ff\" alt=\"Video\"></a>\n    <a href=\"https://scene-verse.github.io\" target=\"_blank\">\n    <img src=\"https://img.shields.io/badge/Data-SceneVerse-blue\" alt=\"Data\"></a>\n    <a href=\"https://scene-verse.github.io\" target=\"_blank\">\n    <img src=\"https://img.shields.io/badge/Model-GPS-darkorange\" alt=\"Model\"></a>\n</div>\n&nbsp;\n\n<div align=\"left\">\n<img src=\"assets/overview.png\" width=\"99%\" alt=\"SceneVerse Teaser\">\n</div>\n\nWe propose SceneVerse, the first million-scale 3D vision-language dataset with 68K 3D indoor scenes and 2.5M vision-language pairs.  We demonstrate the scaling effect by (i) achieving state-of-the-art on all existing 3D visual grounding benchmarks and (ii) showcasing zero-shot transfer capabilities with our GPS (Grounded Pre-training for Scenes) model.\n\n## News\n- ![](https://img.shields.io/badge/New!-8A2BE2) [2024-12] Our follow-up work on situated question answering on SceneVerse is out, check it out [here](https://msr3d.github.io/)!\n- [2024-10] Pre-trained checkpoints are now available, find detailed instructions in [TRAIN.md](TRAIN.md)!\n- [2024-09] The scripts for scene graph generation are released.\n- [2024-07] Training & Inference code as well as preprocessing code is released and checkpoints & logs are on the way!\n- [2024-07] Preprocessing codes for scenes used in SceneVerse are released.\n- [2024-07] SceneVerse is accepted by ECCV 2024! Training and inference codes/checkpoints will come shortly, stay tuned!\n- [2024-03] We release the data used in SceneVerse. Fill out the [form](https://forms.gle/AXMk7MH6bFXpCqd99) for the download link!\n- [2024-01] We release SceneVerse on ArXiv. Checkout our [paper](https://arxiv.org/abs/2401.09340) and [website](https://scene-verse.github.io/).\n\n## Data\nSee [DATA.md](DATA.md) for detailed instructions on data download, processing, visualization. 
The data inventory is listed below:\n\n|   Dataset    | Object Caption | Scene Caption | Ref-Annotation   | Ref-Pairwise<br>```rel2``` | Ref-MultiObject<br>```relm``` | Ref-Star<br>```star``` | Ref-Chain (Optional)<br>```chain``` |\n|:------------:|:--------------:|:-------------:|------------------|-------------------------|-------------------------------|-----------------------|------------------------------------|\n|   ScanNet    |       ✅        |       ✅       | ScanRefer<br>Nr3D | ✅              | ✅                             | ✅           | ✅       |\n|  MultiScan   |       ✅        |       ✅       | ✅ | ✅              | ✅                             | ✅           | ✅       |\n| ARKitScenes  |       ✅        |       ✅       | ✅ | ✅              | ✅                             | ✅           | ✅       |\n|     HM3D     |  ```template```   |       ✅       | ✅ | ✅              | ✅                             | ✅           | ✅       |\n|    3RScan    |       ✅        |       ✅       | ❌ | ✅              | ✅                             | ✅           | ✅       |\n| Structured3D | ```template``` |       ✅       | ❌ | ✅              | ✅                             | ✅           |    ❌     |\n|   ProcTHOR   | ```template``` |    ❌     | ❌ | ```template```              | ```template```                   | ```template```            |    ❌     |\n\n\n## Training and Inference\nSee [TRAIN.md](TRAIN.md) for the inventory of available checkpoints and detailed instructions on training and testing \nwith pre-trained checkpoints. The checkpoint inventory is listed below:\n\n\n| Setting              | Description                                                             | Corresponding Experiment                            | Checkpoint based on experiment setting                                                                                                                                                                                                                                                                           |\n|----------------------|-------------------------------------------------------------------------|-----------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| ```pre-trained```    | GPS model pre-trained on SceneVerse                                     | 3D-VL grounding (Tab.2)                             | [Model](https://drive.google.com/drive/folders/1FDjVaYZxHdMJgxB8stSHfI34Q7crItJc?usp=sharing)                                                                                                                                                                                                                                                                                                        |\n| ```scratch```        | GPS model trained on datasets from scratch                              | 3D-VL grounding (Tab.2)<br/>SceneVerse-val (Tab. 
3) | [ScanRefer](https://drive.google.com/drive/folders/1d7sGm_D7kyj6Fmo0f8b6DPrhWYUCtWVq?usp=sharing), [Sr3D](https://drive.google.com/drive/folders/1bKGgXot8Sc6BB2MWAfW_OGdu0iq0RWZt?usp=sharing), [Nr3D](https://drive.google.com/drive/folders/14K-UaIeg0GHWFoaonIFHTHZZbDotukzV?usp=sharing), [SceneVerse-val](https://drive.google.com/drive/folders/1CeWwLIPEuK0b35I_gbiwu_OiaUEE42jD?usp=drive_link)                                                                                                                                                                                                                                                            |\n| ```fine-tuned```     | GPS model fine-tuned on datasets with grounding heads                   | 3D-VL grounding (Tab.2)                             | [ScanRefer](https://drive.google.com/drive/folders/1P5YprjIlBMAl0OQ38jgTDJyuFVIGiCMS?usp=sharing), [Sr3D](https://drive.google.com/drive/folders/1-LMYW6jy5wpqL_KlQQuvuSM7TDyo7M3g?usp=sharing), [Nr3D](https://drive.google.com/drive/folders/1sw-_hhF2__JgGCHE1yfyAQeNZ7jSrID0?usp=sharing)                                                                                                                                                                                                                                                                                |\n| ```zero-shot```      | GPS model trained on SceneVerse without data from ScanNet and MultiScan | Zero-shot Transfer (Tab.3)                          | [Model](https://drive.google.com/drive/folders/11824oiZnaU8ChsNpH8zZKIT2i1PdJWSA?usp=sharing)                                                                                                                                                                                                                                                                                                        |\n| ```zero-shot text``` | GPS                                                                     | Zero-shot Transfer (Tab.3)                          | [ScanNet](https://drive.google.com/drive/folders/1TKIhb7xgGzwDiAdvznwTpKzkcJnG7GD0?usp=sharing), [SceneVerse-val](https://drive.google.com/drive/folders/18f65Q6313sa-blLCyspqjZRmWpKJPh3M?usp=sharing)                                                                                                                                                                                              |\n| ```text-ablation```  | Ablations on the type of language used during pre-training              | Ablation on Text (Tab.7)                            | [Template only](https://drive.google.com/drive/folders/1Xo6FkbThHP3uLUJMblt3zgJiM0n3RbVK?usp=sharing), [Template+LLM](https://drive.google.com/drive/folders/1w9Oi8nWKZXOW3BcA0eiC1bgp7snk8ZKS?usp=sharing)                                                                                                      |\n| ```scene-ablation``` | Ablations on the use of synthetic scenes during pre-training            | Ablation on Scene (Tab.8)                           | [Real only](https://drive.google.com/drive/folders/1WZDf2BS7eG36NgGEdTuChICmVHF377is?usp=sharing), [S3D only](https://drive.google.com/drive/folders/1Zh4QfCs6l67ZeltvzOPZtokKkgkvxATc?usp=sharing), [ProcTHOR only](https://drive.google.com/drive/folders/1H9zm7vYxVn_zd2HYi49Js9R34AHnGi1d?usp=sharing)                                                                                                                                                                                            
                                                                       |\n| ```model-ablation``` | Ablations on the use of losses during pre-training                      | Ablation on Model Design (Tab.9)                    | [Refer only](https://drive.google.com/drive/folders/1yKF8dVPlcbKb-COcfUZbwcqWxt_uvzuc?usp=sharing), [Refer+Obj-lvl](https://drive.google.com/drive/folders/1C5L20UvTQj2my2t0BnqHZPsb_VaXxVjX?usp=sharing), [w/o Scene-lvl](https://drive.google.com/drive/folders/14jR43ils1-jop6K84hu1AqPqU9DcHucx?usp=sharing) |\n| ```3d-qa```          | Results for QA fine-tuning on ScanQA and SQA3D                          | 3D-QA Experiments (Tab.5)                           | [ScanQA](https://drive.google.com/drive/folders/1_Qluyeu-gvfyQSRoPNcPg7qss5IxFRwO?usp=sharing), [SQA3D](https://drive.google.com/drive/folders/1DGVqsqP12Y2Un10UAC5u9HLij0NJVzJC?usp=sharing)                                                                                                    |\n\n\n## BibTex\n```bibtex\n@inproceedings{jia2024sceneverse,\n  title={Sceneverse: Scaling 3d vision-language learning for grounded scene understanding},\n  author={Jia, Baoxiong and Chen, Yixin and Yu, Huangyue and Wang, Yan and Niu, Xuesong and Liu, Tengyu and Li, Qing and Huang, Siyuan},\n  booktitle={European Conference on Computer Vision (ECCV)},\n  year={2024}\n}\n```\n\n## Acknowledgements\nWe thank the authors from [ScanRefer](https://github.com/daveredrum/ScanRefer), \n[ScanNet](https://github.com/ScanNet/ScanNet), \n[3RScan](https://github.com/WaldJohannaU/3RScan), [ReferIt3D](https://github.com/referit3d/referit3d), \n[Structured3D](https://github.com/bertjiazheng/Structured3D), \n[HM3D](https://github.com/matterport/habitat-matterport-3dresearch),\n[ProcTHOR](https://github.com/allenai/procthor),\n[ARKitScenes](https://github.com/apple/ARKitScenes), [MultiScan](https://github.com/smartscenes/multiscan) for\nopen-sourcing their awesome datasets. We also heavily adapted codes from [ScanQA](https://github.com/ATR-DBI/ScanQA), \n[SQA3D](https://github.com/SilongYong/SQA3D), and \n[3D-VisTA](https://github.com/3d-vista/3D-VisTA) for training and inference.\n"
  },
  {
    "path": "TRAIN.md",
    "content": "# Training and Inference\r\n\r\n## Environment Setup\r\nTo install the environment requirements needed for SceneVerse, you can run the installation scripts provided by:\r\n```bash\r\n$ conda env create -n sceneverse python=3.9\r\n$ conda activate sceneverse\r\n$ pip install --r requirements.txt\r\n```\r\nMeanwhile, SceneVerse depends on an efficient implementation of PointNet2  which is located in ```modules```. Remember to install it with\r\n```bash\r\n$ cd modules/third_party/pointnet2\r\n$ python setup.py install\r\n$ cd ../..\r\n```\r\n\r\n## Model Configurations\r\n### 1. Experiment Setup\r\nWe provide all experiment configurations in ```configs/final```, you can find the experiment setting in the top of comment\r\neach experiment file. To correctly use the configuration files, you need to change the following fields in the configuration\r\nfile to load paths correctly: \r\n- ```base_dir```: save path for model checkpoints, configurations, and logs.\r\n- ```logger.entity```: we used W&B for logging experiments, change it to your corresponding account.\r\n- ```data.{DATASET}_familiy_base```: path to ```{Dataset}``` related data.\r\n- ```model.vision.args.path```: path to the pre-trained object encoder (PointNet++).\r\n- ```model.vision.args.lang_path```: deprecated, but basically text embeddings of the 607 classes in ScanNet.\r\n\r\nYou can walk through the ```configs/final/all_pretrain.yaml``` and compare it with other files to see how we controlled\r\ndata and objectives used in training.\r\n\r\n## Experiments\r\n### 1. Training and Inference\r\nThis codebase leverages [Huggingface Accelerate](https://huggingface.co/docs/accelerate/index) package and \r\n[Facebook Submitit](https://github.com/facebookincubator/submitit) package for efficient model training on multi-node clusters.\r\nWe provide a launcher file ```launch.py``` which provides three ways of launching experiment:\r\n```bash\r\n# Launching using submitit on a SLURM cluster (e.g. 10 hour 1 node 4 GPU experiment with config file $CONFIG)\r\n$ python launch.py --mode submitit --time 10 --qos $QOS --partition $PARTITION --mem_per_gpu 80 \\\r\n                   --gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME\r\n                   \r\n# Launching using accelerator with a multi-gpu instance\r\n$ python launch.py --mode accelerate --gpu_per_node 4 --num_nodes 1 -- config $CONFIG note=$NOTE name=$EXP_NAME \r\n```\r\nBasically, ```launch.py``` set up process(es) to run the main entry point ```run.py``` under multi GPU settings. You can\r\ndirectly overwrite configurations in the configuration file ```$CONFIG``` by setting property fields using ```=``` after\r\nall command line arguments. (e.g., ```name=$EXP_NAME```,```solver.epochs=400```,```dataloader.batchsize=4```)\r\n\r\nFor testing and inference, remember to set up the testing data correctly under each configuration files and switch the\r\n```mode``` field in the configurations into ```test``` (i.e., ```mode=test```).\r\n\r\n### 2. Debugging\r\nIf you want to debug your code without an additional job launcher, you can also directly run the file ```run.py``` . \r\nAs an example, you can directly run the file for debugging with\r\n```bash\r\n# Single card direct run for debugging purposes\r\n$ python run.py --config-path ${PROJ_PATH}/configs/final/ --config-name ${EXP_CONFIG_NAME}.yaml \\\r\n                num_gpu=1 hydra.run.dir=. 
\r\n## Experiments\r\n### 1. Training and Inference\r\nThis codebase leverages the [Huggingface Accelerate](https://huggingface.co/docs/accelerate/index) and\r\n[Facebook Submitit](https://github.com/facebookincubator/submitit) packages for efficient model training on multi-node clusters.\r\nWe provide a launcher file ```launch.py``` which supports three ways of launching experiments (```submitit```, ```accelerate```, and plain ```python``` for debugging; the first two are shown below):\r\n```bash\r\n# Launching using submitit on a SLURM cluster (e.g. 10 hour 1 node 4 GPU experiment with config file $CONFIG)\r\n$ python launch.py --mode submitit --time 10 --qos $QOS --partition $PARTITION --mem_per_gpu 80 \\\r\n                   --gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME\r\n\r\n# Launching using accelerate on a multi-GPU instance\r\n$ python launch.py --mode accelerate --gpu_per_node 4 --num_nodes 1 --config $CONFIG note=$NOTE name=$EXP_NAME\r\n```\r\nBasically, ```launch.py``` sets up process(es) to run the main entry point ```run.py``` under multi-GPU settings. You can\r\ndirectly overwrite configurations in the configuration file ```$CONFIG``` by setting property fields using ```=``` after\r\nall command line arguments (e.g., ```name=$EXP_NAME```, ```solver.epochs=400```, ```dataloader.batchsize=4```).\r\n\r\nFor testing and inference, remember to set up the testing data correctly in each configuration file and switch the\r\n```mode``` field in the configuration to ```test``` (i.e., ```mode=test```).\r\n\r\n### 2. Debugging\r\nIf you want to debug your code without an additional job launcher, you can also run ```run.py``` directly.\r\nAs an example, you can run the file for debugging with\r\n```bash\r\n# Single card direct run for debugging purposes\r\n$ python run.py --config-path ${PROJ_PATH}/configs/final/ --config-name ${EXP_CONFIG_NAME}.yaml \\\r\n                num_gpu=1 hydra.run.dir=. hydra.output_subdir=null hydra/job_logging=disabled hydra/hydra_logging=disabled \\\r\n                debug.flag=True debug.debug_size=1 dataloader.batchsize=2 debug.hard_debug=True name=Debug_test\r\n```\r\n\r\n## Checkpoints\r\nWe provide all available checkpoints under the same data directory, in a folder named ```Checkpoints```. Detailed\r\ndescriptions of each checkpoint are given in the table below:\r\n\r\n| Setting              | Description                                                              | Corresponding Experiment                            | Checkpoint based on experiment setting |\r\n|----------------------|--------------------------------------------------------------------------|-----------------------------------------------------|----------------------------------------|\r\n| ```pre-trained```    | GPS model pre-trained on SceneVerse                                      | 3D-VL grounding (Tab.2)                             | [Model](https://drive.google.com/drive/folders/1FDjVaYZxHdMJgxB8stSHfI34Q7crItJc?usp=sharing) |\r\n| ```scratch```        | GPS model trained on datasets from scratch                               | 3D-VL grounding (Tab.2)<br/>SceneVerse-val (Tab. 
3) | [ScanRefer](https://drive.google.com/drive/folders/1d7sGm_D7kyj6Fmo0f8b6DPrhWYUCtWVq?usp=sharing), [Sr3D](https://drive.google.com/drive/folders/1bKGgXot8Sc6BB2MWAfW_OGdu0iq0RWZt?usp=sharing), [Nr3D](https://drive.google.com/drive/folders/14K-UaIeg0GHWFoaonIFHTHZZbDotukzV?usp=sharing), [SceneVerse-val](https://drive.google.com/drive/folders/1CeWwLIPEuK0b35I_gbiwu_OiaUEE42jD?usp=drive_link) |\r\n| ```fine-tuned```     | GPS model fine-tuned on datasets with grounding heads                   | 3D-VL grounding (Tab.2)                             | [ScanRefer](https://drive.google.com/drive/folders/1P5YprjIlBMAl0OQ38jgTDJyuFVIGiCMS?usp=sharing), [Sr3D](https://drive.google.com/drive/folders/1-LMYW6jy5wpqL_KlQQuvuSM7TDyo7M3g?usp=sharing), [Nr3D](https://drive.google.com/drive/folders/1sw-_hhF2__JgGCHE1yfyAQeNZ7jSrID0?usp=sharing)                                                                                                            |\r\n| ```zero-shot```      | GPS model trained on SceneVerse without data from ScanNet and MultiScan | Zero-shot Transfer (Tab.3)                          | [Model](https://drive.google.com/drive/folders/11824oiZnaU8ChsNpH8zZKIT2i1PdJWSA?usp=sharing)                                                                                                                                                                                                                                                                                                            |\r\n| ```zero-shot text``` | GPS                                                                     | Zero-shot Transfer (Tab.3)                          | [ScanNet](https://drive.google.com/drive/folders/1TKIhb7xgGzwDiAdvznwTpKzkcJnG7GD0?usp=sharing), [SceneVerse-val](https://drive.google.com/drive/folders/18f65Q6313sa-blLCyspqjZRmWpKJPh3M?usp=sharing)                                                                                                                                                                                                  |\r\n| ```text-ablation```  | Ablations on the type of language used during pre-training              | Ablation on Text (Tab.7)                            | [Template only](https://drive.google.com/drive/folders/1Xo6FkbThHP3uLUJMblt3zgJiM0n3RbVK?usp=sharing), [Template+LLM](https://drive.google.com/drive/folders/1w9Oi8nWKZXOW3BcA0eiC1bgp7snk8ZKS?usp=sharing)                                                                                                                                                                                              |\r\n| ```scene-ablation``` | Ablations on the use of synthetic scenes during pre-training            | Ablation on Scene (Tab.8)                           | [Real only](https://drive.google.com/drive/folders/1WZDf2BS7eG36NgGEdTuChICmVHF377is?usp=sharing), [S3D only](https://drive.google.com/drive/folders/1Zh4QfCs6l67ZeltvzOPZtokKkgkvxATc?usp=sharing), [ProcTHOR only](https://drive.google.com/drive/folders/1H9zm7vYxVn_zd2HYi49Js9R34AHnGi1d?usp=sharing)                                                                                               |\r\n| ```model-ablation``` | Ablations on the use of losses during pre-training                      | Ablation on Model Design (Tab.9)                    | [Refer only](https://drive.google.com/drive/folders/1yKF8dVPlcbKb-COcfUZbwcqWxt_uvzuc?usp=sharing), [Refer+Obj-lvl](https://drive.google.com/drive/folders/1C5L20UvTQj2my2t0BnqHZPsb_VaXxVjX?usp=sharing), [w/o 
Scene-lvl](https://drive.google.com/drive/folders/14jR43ils1-jop6K84hu1AqPqU9DcHucx?usp=sharing)                                                                                         |\r\n| ```3d-qa```          | Results for QA fine-tuning on ScanQA and SQA3D                          | 3D-QA Experiments (Tab.5)                           | [ScanQA](https://drive.google.com/drive/folders/1_Qluyeu-gvfyQSRoPNcPg7qss5IxFRwO?usp=sharing), [SQA3D](https://drive.google.com/drive/folders/1DGVqsqP12Y2Un10UAC5u9HLij0NJVzJC?usp=sharing)                                                                                                    |\r\n\r\n\r\nTo properly use the pre-trained checkpoints, you can use the ```pretrain_ckpt_path``` key in the configs:\r\n```shell\r\n# Directly testing the checkpoint\r\n$ python launch.py --mode submitit --qos $QOS --partition $PARTITION --mem_per_gpu 80 \\\r\n                   --gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME mode=test \\\r\n                   pretrain_ckpt_path=$PRETRAIN_CKPT\r\n\r\n# Fine-tuning with pre-trained checkpoint\r\n$ python launch.py --mode submitit --qos $QOS --partition $PARTITION --mem_per_gpu 80 \\\r\n                   --gpu_per_node 4 --config $CONFIG note=$NOTE name=$EXP_NAME \\\r\n                   pretrain_ckpt_path=$PRETRAIN_CKPT\r\n```\r\nFor fine-tuning the pre-trained checkpoint on datasets, you can use the fine-tuning config files provided under \r\n```configs/final/finetune```."
  },
  {
    "path": "common/box_utils.py",
    "content": "import numpy as np\n\n\ndef box3d_iou(corners1, corners2):\n    ''' Compute 3D bounding box IoU.\n\n    Input:\n        corners1: numpy array (8,3), assume up direction is Z\n        corners2: numpy array (8,3), assume up direction is Z\n    Output:\n        iou: 3D bounding box IoU\n\n    '''\n    x_min_1, x_max_1, y_min_1, y_max_1, z_min_1, z_max_1 = get_box3d_min_max(corners1)\n    x_min_2, x_max_2, y_min_2, y_max_2, z_min_2, z_max_2 = get_box3d_min_max(corners2)\n    xA = np.maximum(x_min_1, x_min_2)\n    yA = np.maximum(y_min_1, y_min_2)\n    zA = np.maximum(z_min_1, z_min_2)\n    xB = np.minimum(x_max_1, x_max_2)\n    yB = np.minimum(y_max_1, y_max_2)\n    zB = np.minimum(z_max_1, z_max_2)\n    inter_vol = np.maximum((xB - xA), 0) * np.maximum((yB - yA), 0) * np.maximum((zB - zA), 0)\n    box_vol_1 = (x_max_1 - x_min_1) * (y_max_1 - y_min_1) * (z_max_1 - z_min_1)\n    box_vol_2 = (x_max_2 - x_min_2) * (y_max_2 - y_min_2) * (z_max_2 - z_min_2)\n    iou = inter_vol / (box_vol_1 + box_vol_2 - inter_vol + 1e-8)\n\n    return iou\n\n\ndef get_box3d_min_max(corner):\n    ''' Compute min and max coordinates for 3D bounding box\n        Note: only for axis-aligned bounding boxes\n\n    Input:\n        corners: numpy array (8,3), assume up direction is Z (batch of N samples)\n    Output:\n        box_min_max: an array for min and max coordinates of 3D bounding box IoU\n\n    '''\n    min_coord = corner.min(axis=0)\n    max_coord = corner.max(axis=0)\n    x_min, x_max = min_coord[0], max_coord[0]\n    y_min, y_max = min_coord[1], max_coord[1]\n    z_min, z_max = min_coord[2], max_coord[2]\n    \n    return x_min, x_max, y_min, y_max, z_min, z_max\n\n\ndef get_3d_box(center, box_size):\n    ''' box_size is array(l,w,h), heading_angle is radius clockwise from pos x axis, center is xyz of box center\n        output (8,3) array for 3D box cornders\n        Similar to utils/compute_orientation_3d\n    '''\n    l,w,h = box_size\n    # x_corners = [l/2,l/2,-l/2,-l/2,l/2,l/2,-l/2,-l/2]\n    # y_corners = [h/2,h/2,h/2,h/2,-h/2,-h/2,-h/2,-h/2]\n    # z_corners = [w/2,-w/2,-w/2,w/2,w/2,-w/2,-w/2,w/2]\n    x_corners = [l/2,l/2,-l/2,-l/2,l/2,l/2,-l/2,-l/2]\n    y_corners = [w/2,-w/2,-w/2,w/2,w/2,-w/2,-w/2,w/2]\n    z_corners = [h/2,h/2,h/2,h/2,-h/2,-h/2,-h/2,-h/2]\n    corners_3d = np.vstack([x_corners,y_corners,z_corners])\n    corners_3d[0,:] = corners_3d[0,:] + center[0]\n    corners_3d[1,:] = corners_3d[1,:] + center[1]\n    corners_3d[2,:] = corners_3d[2,:] + center[2]\n    corners_3d = np.transpose(corners_3d)\n    return corners_3d"
  },
  {
    "path": "common/dist_utils.py",
    "content": "import functools\nimport pickle\nimport torch\nimport torch.distributed as dist\n\nimport logging\nlogger = logging.getLogger(__name__)\n\n########################### Basic utility for distributed info ################################\n\ndef is_dist_avail_and_initialized():\n    if not dist.is_available():\n        return False\n    if not dist.is_initialized():\n        return False\n    return True\n\n\ndef get_rank():\n    \"\"\"\n    Get the rank of the current process.\n    \"\"\"\n    if not is_dist_avail_and_initialized():\n        return 0\n    return dist.get_rank()\n\n\ndef get_world_size():\n    \"\"\"\n    Get the size of the world.\n    \"\"\"\n    if not is_dist_avail_and_initialized():\n        return 1\n    return dist.get_world_size()\n\n\ndef is_master_proc(num_gpus=8):\n    \"\"\"\n    Determines if the current process is the master process on each node.\n    \"\"\"\n    if is_dist_avail_and_initialized():\n            return dist.get_rank() % num_gpus == 0\n    else:\n        return True\n\n\ndef is_root_proc():\n    \"\"\"\n    Determines if the current process is the root process.\n    \"\"\"\n    if is_dist_avail_and_initialized():\n        return dist.get_rank() == 0\n    else:\n        return True\n\n\n############################## Data gathering across devices ##################################\n\ndef _serialize_to_tensor(data, group, max_size=1024):\n    \"\"\"\n    Serialize the tensor to ByteTensor. Note that only `gloo` and `nccl`\n        backend is supported.\n    Args:\n        data (data): data to be serialized.\n        group (group): pytorch dist group.\n    Returns:\n        tensor (ByteTensor): tensor that serialized.\n    \"\"\"\n    backend = dist.get_backend(group)\n    assert backend in [\"gloo\", \"nccl\"]\n    device = torch.device(\"cpu\" if backend == \"gloo\" else \"cuda\")\n\n    buffer = pickle.dumps(data)\n    if len(buffer) > max_size ** 3:\n        logger.warning(\n            \"Rank {} trying to all-gather {:.2f} GB of data on device {}\".format(\n                get_rank(), len(buffer) / (max_size ** 3), device\n            )\n        )\n    storage = torch.ByteStorage.from_buffer(buffer)\n    tensor = torch.ByteTensor(storage).to(device=device)\n    return tensor\n\n\ndef _pad_to_largest_tensor(tensor, group):\n    \"\"\"\n    Padding all the tensors from different GPUs to the largest ones.\n    Args:\n        tensor (tensor): tensor to pad.\n        group (group): pytorch dist group.\n    Returns:\n        list[int]: size of the tensor, on each rank\n        Tensor: padded tensor that has the max size\n    \"\"\"\n    world_size = dist.get_world_size(group=group)\n    assert (\n        world_size >= 1\n    ), \"comm.gather/all_gather must be called from ranks within the given group!\"\n    local_size = torch.tensor(\n        [tensor.numel()], dtype=torch.int64, device=tensor.device\n    )\n    size_list = [\n        torch.zeros([1], dtype=torch.int64, device=tensor.device)\n        for _ in range(world_size)\n    ]\n    dist.all_gather(size_list, local_size, group=group)\n    size_list = [int(size.item()) for size in size_list]\n\n    max_size = max(size_list)\n\n    # we pad the tensor because torch all_gather does not support\n    # gathering tensors of different shapes\n    if local_size != max_size:\n        padding = torch.zeros(\n            (max_size - local_size,), dtype=torch.uint8, device=tensor.device\n        )\n        tensor = torch.cat((tensor, padding), dim=0)\n    return size_list, tensor\n\n\ndef 
broadcast(obj):\n    # renamed parameter from `object` to avoid shadowing the builtin\n    if isinstance(obj, torch.Tensor):\n        dist.broadcast(tensor=obj, src=0)\n    else:\n        sync_tensor = torch.Tensor([obj]).cuda()\n        dist.broadcast(tensor=sync_tensor, src=0)\n        obj = sync_tensor[0].item()\n    return obj\n\n\ndef all_gather(tensors):\n    \"\"\"\n    All gathers the provided tensors from all processes across machines.\n    Args:\n        tensors (list): tensors to perform all gather across all processes in\n        all machines.\n    \"\"\"\n    gather_list = []\n    output_tensor = []\n    world_size = dist.get_world_size()\n    for tensor in tensors:\n        tensor_placeholder = [\n            torch.ones_like(tensor) for _ in range(world_size)\n        ]\n        dist.all_gather(tensor_placeholder, tensor, async_op=False)\n        gather_list.append(tensor_placeholder)\n    for gathered_tensor in gather_list:\n        output_tensor.append(torch.cat(gathered_tensor, dim=0))\n    return output_tensor\n\n\ndef all_reduce(tensors, average=True):\n    \"\"\"\n    All reduce the provided tensors from all processes across machines.\n    Args:\n        tensors (list): tensors to perform all reduce across all processes in\n        all machines.\n        average (bool): scales the reduced tensor by the number of overall\n        processes across all machines.\n    \"\"\"\n    for tensor in tensors:\n        dist.all_reduce(tensor, async_op=False)\n    if average:\n        world_size = dist.get_world_size()\n        for tensor in tensors:\n            tensor.mul_(1.0 / world_size)\n    return tensors\n\n\n@functools.lru_cache()\ndef _get_global_gloo_group():\n    \"\"\"\n    Return a process group based on the gloo backend, containing all ranks.\n    The result is cached.\n    Returns:\n        (group): pytorch dist group.\n    \"\"\"\n    if dist.get_backend() == \"nccl\":\n        return dist.new_group(backend=\"gloo\")\n    else:\n        return dist.group.WORLD\n\n\ndef all_gather_unaligned(data, group=None):\n    \"\"\"\n    Run all_gather on arbitrary picklable data (not necessarily tensors).\n\n    Args:\n        data: any picklable object\n        group: a torch process group. By default, will use a group which\n            contains all ranks on gloo backend.\n\n    Returns:\n        list[data]: list of data gathered from each rank\n    \"\"\"\n    if get_world_size() == 1:\n        return [data]\n    if group is None:\n        group = _get_global_gloo_group()\n    if dist.get_world_size(group) == 1:\n        return [data]\n\n    tensor = _serialize_to_tensor(data, group)\n\n    size_list, tensor = _pad_to_largest_tensor(tensor, group)\n    max_size = max(size_list)\n\n    # receiving Tensor from all ranks\n    tensor_list = [\n        torch.empty((max_size,), dtype=torch.uint8, device=tensor.device)\n        for _ in size_list\n    ]\n    dist.all_gather(tensor_list, tensor, group=group)\n\n    data_list = []\n    for size, tensor in zip(size_list, tensor_list):\n        buffer = tensor.cpu().numpy().tobytes()[:size]\n        data_list.append(pickle.loads(buffer))\n\n    return data_list
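\n\n\nif __name__ == \"__main__\":\n    # Single-process sketch (illustrative): without torch.distributed\n    # initialized, the helpers fall back to trivial answers, so this runs anywhere.\n    print(get_world_size())                            # 1\n    print(all_gather_unaligned({\"rank\": get_rank()}))  # [{'rank': 0}]\n"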
  },
  {
    "path": "common/io_utils.py",
    "content": "import csv\nimport pickle\nimport json\nimport cv2\nimport yaml\nimport numpy as np\nfrom pathlib import Path\nimport torch\nimport open3d\nfrom plyfile import PlyData\n\ndef make_dir(dir_path):\n    if not Path(dir_path).exists():\n        Path(dir_path).mkdir(parents=True, exist_ok=True)\n\n\ndef load_imgs(img_paths, option=cv2.IMREAD_COLOR):\n    imgs = [cv2.imread(img_path, option) for img_path in img_paths]\n    return imgs\n\n\ndef load_pickle(filename):\n    with Path(filename).open(\"rb\") as f:\n        return pickle.load(f)\n\n\ndef save_pickle(data, filename):\n    with Path(filename).open(\"wb\") as f:\n        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)\n\n\ndef load_json(filename):\n    with Path(filename).open(\"rb\") as f:\n        return json.load(f)\n\n\ndef save_json(data, filename, save_pretty=True, sort_keys=False):\n    with Path(filename).open(\"w\") as f:\n        if save_pretty:\n            f.write(json.dumps(data, indent=4, sort_keys=sort_keys))\n        else:\n            json.dump(data, f)\n\n\ndef load_jsonl(filename):\n    with Path(filename).open(\"r\") as f:\n        return [json.loads(l.strip(\"\\n\")) for l in f.readlines()]\n\n\ndef save_jsonl(data, filename):\n    with Path(filename).open(\"w\") as f:\n        f.write(\"\\n\".join([json.dumps(e) for e in data]))\n\n\ndef load_yaml(filename):\n    with Path(filename).open(\"r\") as f:\n        return yaml.load(f, Loader=yaml.SafeLoader)\n\n\ndef save_yaml(data, filename):\n    with Path(filename).open(\"w\") as f:\n        yaml.dump(data, f, default_flow_style=False)\n\n\ndef load_csv(filename, delimiter=\",\"):\n    idx2key = None\n    contents = {}\n    with Path(filename).open(\"r\") as f:\n        reader = csv.reader(f, delimiter=delimiter)\n        for l_idx, row in reader:\n            if l_idx == 0:\n                idx2key = row\n                for k_idx, key in enumerate(idx2key):\n                    contents[key] = []\n            else:\n                for c_idx, col in enumerate(row):\n                    contents[idx2key[c_idx]].append(col)\n    return contents, idx2key\n\n\ndef save_csv(data, filename, cols=None, delimiter=\",\"):\n    with Path(filename).open(\"w\") as f:\n        writer = csv.writer(f, delimiter=delimiter)\n        num_entries = len(data[list(data.keys())[0]])\n        assert cols is not None, \"Must have column names for dumping csv files.\"\n        writer.writerow(cols)\n        for l_idx in range(num_entries):\n            row = [data[key][l_idx] for key in cols]\n            writer.writerow(row)\n\n\ndef load_numpy(filename):\n    return np.load(filename, allow_pickle=True)\n\n\ndef save_numpy(data, filename):\n    np.save(filename, data, allow_pickle=True)\n\n\ndef load_tensor(filename):\n    return torch.load(filename)\n\n\ndef save_tensor(data, filename):\n    torch.save(data, filename)\n\n\ndef load_ply(filepath):\n    with open(filepath, \"rb\") as f:\n        plydata = PlyData.read(f)\n    data = plydata.elements[0].data\n    coords = np.array([data[\"x\"], data[\"y\"], data[\"z\"]], dtype=np.float32).T\n    feats = None\n    labels = None\n    if ({\"red\", \"green\", \"blue\"} - set(data.dtype.names)) == set():\n        feats = np.array([data[\"red\"], data[\"green\"], data[\"blue\"]], dtype=np.uint8).T\n    if \"label\" in data.dtype.names:\n        labels = np.array(data[\"label\"], dtype=np.uint32)\n    return coords, feats, labels\n\n    \ndef load_ply_with_normals(filepath):\n    mesh = 
open3d.io.read_triangle_mesh(str(filepath))\n    if not mesh.has_vertex_normals():\n        mesh.compute_vertex_normals()\n    vertices = np.asarray(mesh.vertices)\n    normals = np.asarray(mesh.vertex_normals)\n\n    coords, feats, labels = load_ply(filepath)\n    assert np.allclose(coords, vertices), \"different coordinates\"\n    # guard against PLY files without color channels: fall back to normals only\n    feats = normals if feats is None else np.hstack((feats, normals))\n\n    return coords, feats, labels\n
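\n\nif __name__ == \"__main__\":\n    # Minimal round-trip sketch (illustrative, not part of the original module):\n    # write a tiny colored PLY with plyfile, then read it back through load_ply.\n    import tempfile\n    from plyfile import PlyElement\n\n    vertex = np.array(\n        [(0.0, 0.0, 0.0, 255, 0, 0), (1.0, 1.0, 1.0, 0, 255, 0)],\n        dtype=[(\"x\", \"f4\"), (\"y\", \"f4\"), (\"z\", \"f4\"), (\"red\", \"u1\"), (\"green\", \"u1\"), (\"blue\", \"u1\")],\n    )\n    with tempfile.NamedTemporaryFile(suffix=\".ply\", delete=False) as f:\n        PlyData([PlyElement.describe(vertex, \"vertex\")]).write(f.name)\n    coords, feats, labels = load_ply(f.name)\n    print(coords.shape, feats.shape, labels)  # (2, 3) (2, 3) None\n"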
  },
  {
    "path": "common/launch_utils.py",
    "content": "import os\nfrom pathlib import Path\nimport subprocess\n\nimport submitit\n\n\nhuggingface_fix = f\"TRANSFORMERS_OFFLINE=1 CURL_CA_BUNDLE=''\"\n\n\nclass SubmititLauncher:\n    def __init__(self, args):\n        self.args = args\n\n    def __call__(self):\n        host_name = os.popen(\n            \"scontrol show hostnames $SLURM_JOB_NODELIST\"\n        ).read().split(\"\\n\")[0]\n        self._set_gpu_args()\n        # Using Accelerate for launching\n        multi_gpu = \"--multi_gpu\" if self.args.num_nodes * self.args.gpu_per_node > 1 else \"\"\n        opts = \" \".join(self.args.opts) if len(self.args.opts) > 0 else \"\"\n        opts += f\" num_gpu={self.args.num_nodes * self.args.gpu_per_node} \"\n        full_cfg_path = Path(self.args.config)\n        cfg_path, cfg_file = str(full_cfg_path.parent), str(full_cfg_path.name)\n        cmd = f\"{huggingface_fix} accelerate launch --num_machines {self.args.num_nodes} \\\n                        --mixed_precision {self.args.mixed_precision} {multi_gpu} \\\n                        --num_processes {self.args.gpu_per_node * self.args.num_nodes} \\\n                        --num_cpu_threads_per_process {self.args.cpu_per_task} \\\n                        --main_process_ip {host_name} \\\n                        --main_process_port {self.args.port} \\\n                        --machine_rank {self.args.node_id} \\\n                        --dynamo_backend no \\\n                        {self.args.run_file} \\\n                        --config-path {cfg_path} \\\n                        --config-name {cfg_file} \\\n                        num_gpu={self.args.num_nodes * self.args.gpu_per_node} \\\n                        hydra.run.dir=. \\\n                        hydra.output_subdir=null \\\n                        hydra/job_logging=disabled \\\n                        hydra/hydra_logging=disabled {opts}\"\n        subprocess.run(cmd, shell=True)\n    \n    def _set_gpu_args(self):\n        job_env = submitit.JobEnvironment()\n        self.args.job_dir = str(self.args.job_dir).replace(\"%j\", job_env.job_id)\n        self.args.node_id = int(job_env.global_rank / self.args.gpu_per_node)\n\n\ndef submitit_launch(args):\n    \"\"\"\n    Multi node script launching with Submitit\n    \"\"\"\n    additional_parameters = {}\n    if args.nodelist != \"\":\n        # if specifying node id\n        nodelist = f\"{str(args.nodelist)}\"\n        additional_parameters[\"nodelist\"] = nodelist\n\n    executor = submitit.AutoExecutor(folder=args.job_dir, slurm_max_num_timeout=30)\n    executor.update_parameters(\n        name=args.name,\n        mem_gb=args.mem_per_gpu * args.gpu_per_node * args.num_nodes,\n        gpus_per_node=args.gpu_per_node,\n        tasks_per_node=1,\n        cpus_per_task=args.gpu_per_node * args.cpu_per_task,\n        nodes=args.num_nodes,\n        slurm_qos=args.qos,\n        slurm_partition=args.partition,\n        slurm_account=args.account,\n        slurm_time=args.time * 60,\n        slurm_signal_delay_s=120,\n        slurm_additional_parameters=additional_parameters\n    )\n    launcher = SubmititLauncher(args)\n    job = executor.submit(launcher)\n    print(f\"submitted job: {job.job_id}\")\n\n\ndef accelerate_launch(args):\n    \"\"\"\n    Single node script launching with Accelerate\n    \"\"\"\n    opts = \" \".join(args.opts) if len(args.opts) > 0 else \"\"\n    opts += f\" num_gpu={args.num_nodes * args.gpu_per_node} \"\n    multi_gpu = \"--multi_gpu\" if args.num_nodes * args.gpu_per_node > 1 else \"\"\n 
   full_cfg_path = Path(args.config)\n    cfg_path, cfg_file = str(full_cfg_path.parent), str(full_cfg_path.name)\n    cmd = f\"{huggingface_fix} accelerate launch --num_machines {args.num_nodes} \\\n        {multi_gpu} \\\n        --mixed_precision {args.mixed_precision} \\\n        --num_processes {args.gpu_per_node * args.num_nodes} \\\n        --num_cpu_threads_per_process {args.cpu_per_task} \\\n        --dynamo_backend no \\\n        {args.run_file} \\\n        --config-path {cfg_path} \\\n        --config-name {cfg_file} \\\n        num_gpu={args.num_nodes * args.gpu_per_node} \\\n        hydra.run.dir=. \\\n        hydra.output_subdir=null \\\n        hydra/job_logging=disabled \\\n        hydra/hydra_logging=disabled {opts}\"\n    subprocess.run(cmd, shell=True)\n\n\ndef python_launch(args):\n    \"\"\"\n    Vanilla python launcher for debugging purposes\n    \"\"\"\n    opts = \" \".join(args.opts) if len(args.opts) > 0 else \"\"\n    full_cfg_path = Path(args.config)\n    cfg_path, cfg_file = str(full_cfg_path.parent), str(full_cfg_path.name)\n    cmd = f\"{huggingface_fix} python {args.run_file} \" \\\n          f\"--config-path {cfg_path} \" \\\n          f\"--config-name {cfg_file} \" \\\n          f\"num_gpu=1 \" \\\n          f\"hydra.run.dir=. \" \\\n          f\"hydra.output_subdir=null \" \\\n          f\"hydra/job_logging=disabled \" \\\n          f\"hydra/hydra_logging=disabled {opts}\"\n    subprocess.run(cmd, shell=True)"
  },
  {
    "path": "common/misc.py",
    "content": "import os\nimport glob\nimport importlib\nimport functools\nimport torch\nfrom typing import Any\nfrom accelerate.logging import get_logger\nfrom accelerate.state import PartialState\nfrom accelerate.utils import recursively_apply\nfrom accelerate.utils.constants import TORCH_DISTRIBUTED_OPERATION_TYPES\nfrom accelerate.utils.dataclasses import DistributedType\n\nlogger = get_logger(__name__)\n\n\ndef rsetattr(obj, attr, val):\n    pre, _, post = attr.rpartition('.')\n    return setattr(rgetattr(obj, pre) if pre else obj, post, val)\n\n# using wonder's beautiful simplification: https://stackoverflow.com/questions/31174295/getattr-and-setattr-on-nested-objects/31174427?noredirect=1#comment86638618_31174427\n\ndef rgetattr(obj, attr, *args):\n    def _getattr(obj, attr):\n        return getattr(obj, attr, *args)\n    return functools.reduce(_getattr, [obj] + attr.split('.'))\n\n\n# def import_all(exclude_list=None):\n#     if exclude_list is None:\n#         exclude_list = [\"__init__.py\", \"build.py\"]\n#     print(f\"file: {__file__}\")\n#     current_directory = os.path.dirname(__file__)\n#     module_names = [\n#         os.path.splitext(file)[0] for file in os.listdir(current_directory)\n#         if file.endswith(\".py\") and file not in exclude_list\n#     ]\n#     for module_name in module_names:\n#         module = importlib.import_module(f\".{module_name}\", package=__name__)\n#         globals().update({name: getattr(module, name) for name in getattr(module, '__all__', [])})\n#     __all__ = [name for name in globals() if not name.startswith(\"_\")]\n\n\ndef _gpu_gather_object(object: Any):\n    # by JY Huang: re-implement the method for gathering non-tensor objects\n    output_objects = [None for _ in range(PartialState().num_processes)]\n    torch.distributed.all_gather_object(output_objects, object)\n    if isinstance(object, (list, tuple)):\n        output_list = []\n        for item in output_objects:\n            output_list.extend(item)\n        return output_list\n    elif isinstance(object, dict):\n        template = output_objects[0]\n        output_dict = {}\n        for k, v in template.items():\n            output_dict[k] = []\n            for item in output_objects:\n                if isinstance(item[k], list):\n                    output_dict[k].extend(item[k])\n                else:\n                    output_dict[k].append(item[k])\n        return output_dict\n\n\ndef gather_object(object: Any):\n    \"\"\"\n    Recursively gather object in a nested list/tuple/dictionary of objects from all devices.\n\n    Args:\n        object (nested list/tuple/dictionary of picklable object):\n            The data to gather.\n\n    Returns:\n        The same data structure as `object` with all the objects sent to every device.\n    \"\"\"\n    if PartialState().distributed_type == DistributedType.TPU:\n        raise NotImplementedError(\"gather objects in TPU is not supported\")\n    elif PartialState().distributed_type in TORCH_DISTRIBUTED_OPERATION_TYPES:\n        return _gpu_gather_object(object)\n    else:\n        return object\n\n\ndef gather_for_metrics(accelerator, input_data):\n    \"\"\"\n    by JY Huang: re-implement this method for gathering non-tensor objects\n    Refer source code to https://huggingface.co/docs/accelerate/package_reference/accelerator#accelerate.Accelerator.gather_for_metrics\n    \"\"\"\n\n    try:\n        recursively_apply(lambda x: x, input_data, error_on_other_type=True)\n        all_tensors = True\n    except TypeError:\n  
      all_tensors = False\n\n    if not all_tensors:\n        data = gather_object(input_data)\n    else:\n        data = accelerator.gather(input_data)\n\n    try:\n        if accelerator.gradient_state.end_of_dataloader:\n            # At the end of a dataloader, `gather_for_metrics` degrades to a plain\n            # `gather` unless the dataset has a remainder that must be truncated.\n            if accelerator.gradient_state.remainder == -1:\n                logger.info(\n                    \"The used dataset had no length, returning gathered tensors. You should drop the remainder yourself.\"\n                )\n                return data\n            elif accelerator.gradient_state.remainder > 0:\n                # Last batch needs to be truncated on distributed systems as it contains additional samples\n                def _adjust_samples(tensor):\n                    return tensor[: accelerator.gradient_state.remainder] if tensor is not None else None\n                if all_tensors:\n                    # This only applies to tensors, as defined in `recursively_apply`\n                    return recursively_apply(_adjust_samples, data)\n                else:\n                    if isinstance(data, (list, tuple)):\n                        return _adjust_samples(data)\n                    elif isinstance(data, dict):\n                        return {k: _adjust_samples(v) for k, v in data.items()}\n                    else:\n                        raise NotImplementedError(\"Non-tensor gather only supports list, tuple or dict\")\n            else:  # remainder is 0\n                # no remainder even though at end of dataloader, so nothing to do\n                return data\n        else:\n            # Not at the end of the dataloader, no need to adjust the tensors\n            return data\n    except Exception:\n        # Dataset had no length or raised an error\n        return data\n\n\ndef gather_dict(accelerator, data_dict):\n    data_dict_non_tensor = {k: v for k, v in data_dict.items() if not isinstance(v, torch.Tensor)}\n    data_dict_non_tensor = gather_for_metrics(accelerator, data_dict_non_tensor)\n    data_dict = {k: v for k, v in data_dict.items() if isinstance(v, torch.Tensor)}\n    data_dict = gather_for_metrics(accelerator, data_dict)\n    data_dict.update(data_dict_non_tensor)\n    return data_dict\n
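\n\nif __name__ == \"__main__\":\n    # Single-process sketch (illustrative, not part of the original module):\n    # outside a distributed run the gather helpers return their inputs\n    # unchanged, so this runs anywhere with accelerate installed.\n    from accelerate import Accelerator\n    acc = Accelerator()\n    out = gather_dict(acc, {\"loss\": torch.zeros(2), \"ids\": [\"a\", \"b\"]})\n    print(out)\n"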
  },
  {
    "path": "common/type_utils.py",
    "content": "import torch\n\nfrom omegaconf import OmegaConf\n\n\ndef cfg2dict(cfg):\n    return OmegaConf.to_container(cfg, resolve=True)\n\n\ndef _to_device(state, device):\n    \"\"\" usually load from cpu checkpoint but need to load to cuda \"\"\"\n    if isinstance(state, torch.Tensor):\n        new_state = state.to(device, non_blocking=True)  # assume propoerly set py torch.cuda.set_device\n    elif isinstance(state, list):\n        new_state = torch.tensor([_to_device(t, device) for t in state]).to(device)\n    elif isinstance(state, tuple):\n        new_state = torch.tensor(tuple(_to_device(t, device) for t in state)).to(device)\n    elif isinstance(state, dict):\n        new_state = {n: _to_device(t, device) for n, t in state.items()}\n    else:\n        try:\n            if not isinstance(state, str):\n                new_state = torch.tensor(state).to(device)\n            else:\n                new_state = state\n        except:\n            raise ValueError(f\"The provided tensor can not be transfered to {device}\")\n    return new_state"
  },
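  {
    "path": "common/type_utils_example.py",
    "content": "# NOTE: hypothetical usage sketch, not part of the original release.\n# Demonstrates the two helpers in common/type_utils.py: cfg2dict resolves an\n# OmegaConf config into plain Python containers, and _to_device moves a\n# nested checkpoint-like state (tensors, scalars, strings) onto a device.\nimport torch\nfrom omegaconf import OmegaConf\n\nfrom common.type_utils import cfg2dict, _to_device\n\ncfg = OmegaConf.create({'solver': {'lr': 5e-4, 'epochs': 150}})\nassert cfg2dict(cfg) == {'solver': {'lr': 5e-4, 'epochs': 150}}\n\n# A CPU checkpoint-like state; strings pass through, scalars become tensors.\nstate = {'weight': torch.zeros(3), 'step': 10, 'name': 'best'}\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\nstate_on_device = _to_device(state, device)\nprint(state_on_device['weight'].device, state_on_device['name'])\n"
  },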
  {
    "path": "configs/final/all_anno.yaml",
    "content": "###\n# Pretrain with human annotation only\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all_anno\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d' ]\n      referit3d:\n        anno_type: ['nr3d']\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt', 'template']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno']\n    val:\n      sources: [ 'anno']\n    test:\n      sources: [ 'anno']\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno']\n    val:\n      sources: [ 'anno' ]\n    test:\n      sources: [ 'anno' ]\n  HMSpatialRefer:\n    train:\n      sources: [ 'anno' ]\n    val:\n      sources: [ 'anno' ]\n    test:\n      sources: [ 'anno' ]\n  use_voxel: False\n  scan_family_base: \"/scratch2/generalvision/chenyixin/datasets/SceneVerse/ScanNet\"\n  rscan_base: \"/scratch2/generalvision/chenyixin/datasets/SceneVerse/3RScan\"\n  arkitscene_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/ARKitScenes'\n  multiscan_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/MultiScan'\n  hm_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/HM3D'\n  procthor_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/ProcThor'\n  s3d_base: /scratch2/generalvision/chenyixin/datasets/SceneVerse/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: 
True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
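  {
    "path": "configs/final/all_anno_load_example.py",
    "content": "# NOTE: hypothetical loading sketch, not part of the original release; the\n# override keys below are illustrative. The configs in configs/final/ are\n# plain YAML, so a minimal way to inspect or tweak one is with OmegaConf\n# (already a dependency of the codebase, see common/type_utils.py).\nfrom omegaconf import OmegaConf\n\ncfg = OmegaConf.load('configs/final/all_anno.yaml')\n# Command-line style overrides, merged on top of the file values.\ncfg = OmegaConf.merge(cfg, OmegaConf.from_dotlist(['dataloader.batchsize=32', 'num_gpu=4']))\nprint(cfg.task, cfg.dataloader.batchsize)  # -> Pretrain 32\n"
  },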
  {
    "path": "configs/final/all_nomlm.yaml",
    "content": "###\n# Pretrain on all data without MLM loss\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all_nomlm\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: ['rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: ['rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: ['rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno','rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno','rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno','rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno','rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: 
'/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n#      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n#      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n#      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n#      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_noobj.yaml",
    "content": "###\n# Pretrain on all data without object-level alignment\n###\n\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: ['rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: ['rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: ['rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  
multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: False\n        path: ''\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n#      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n#      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n#      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n#      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_noscene.yaml",
    "content": "###\n# Pretrain on all data without scene-level alignment\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all_noscene\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: ['rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno','rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno','rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n 
 multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n#      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n#      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_pretrain.yaml",
    "content": "###\n# Pretrain on all data with all losses\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt', 'template']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch2/generalvision/chenyixin/datasets/SceneVerse/ScanNet\"\n  rscan_base: \"/scratch2/generalvision/chenyixin/datasets/SceneVerse/3RScan\"\n  arkitscene_base: 
'/scratch2/generalvision/chenyixin/datasets/SceneVerse/ARKitScenes'\n  multiscan_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/MultiScan'\n  hm_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/HM3D'\n  procthor_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/ProcThor'\n  s3d_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_pretrain_125.yaml",
    "content": "###\n# Pretrain on 12.5% of all data\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all0.125\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    subset_ratio: 0.125\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: 
'/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_pretrain_25.yaml",
    "content": "###\n# Pretrain on 25% of all data\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all0.25\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    subset_ratio: 0.25\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n 
 multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_pretrain_50.yaml",
    "content": "###\n# Pretrain on 50% of all data\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all0.50\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    subset_ratio: 0.5\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  
multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_pretrain_75.yaml",
    "content": "###\n# Pretrain on 75% all data\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all.75\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    subset_ratio: 0.75\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  
multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_pretrain_objcap.yaml",
    "content": "###\n# Pretrain on all data adding all object captions\n###\n\n# Experiment general info\nname: \"Debug\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','RScanSpatialRefer','HMSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt', 'template']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt','obj_caption_gpt','obj_caption_template']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt','obj_caption_gpt','obj_caption_template']\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt','obj_caption_gpt','obj_caption_template']\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt','obj_caption_gpt','obj_caption_template']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt','obj_caption_gpt','obj_caption_template']\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt','obj_caption_gpt','obj_caption_template']\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt','obj_caption_gpt','obj_caption_template']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template','obj_caption_gpt','obj_caption_template']\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template','obj_caption_gpt','obj_caption_template']\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt', 'obj_caption_template']\n    val:\n      sources: [ 'anno', 
'rel2_template', 'relm_gpt', 'relm_template', 'star_template','obj_caption_template']\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template','obj_caption_template']\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 
'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_pretrain_objcap_notemplate.yaml",
    "content": "###\n# Pretrain on all data without template-based object captions\n###\n\n# Experiment general info\nname: \"OV_w_Cap\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','RScanSpatialRefer','HMSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt','obj_caption_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt','obj_caption_gpt']\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt','obj_caption_gpt']\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt','obj_caption_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt','obj_caption_gpt']\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt','obj_caption_gpt']\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt','obj_caption_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template','obj_caption_gpt']\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template','obj_caption_gpt']\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt', 'obj_caption_template']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template','obj_caption_template']\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template','obj_caption_template']\n  
use_voxel: False\n  scan_family_base: \"/scratch2/generalvision/chenyixin/datasets/SceneVerse/ScanNet\"\n  rscan_base: \"/scratch2/generalvision/chenyixin/datasets/SceneVerse/3RScan\"\n  arkitscene_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/ARKitScenes'\n  multiscan_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/MultiScan'\n  hm_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/HM3D'\n  procthor_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/ProcThor'\n  s3d_base: '/scratch2/generalvision/chenyixin/datasets/SceneVerse/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 200\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_pretrain_s3d.yaml",
    "content": "###\n# Pretrain on all data with Structured 3D\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer','S3DSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  S3DSpatialRefer:\n    train:\n      sources: [ 'rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt' ]\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' 
]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n  s3d_base: '/scratch/masaccio/existing_datasets/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 250\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 1000\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/results/ALLObjPretrain_b512_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj+S3DPretrainObj_1113scannetws3d/2023-11-14-09:29:10.796592/ckpt/best.pth'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 
'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_pretrain_unfreeze.yaml",
    "content": "###\n# Pretrain on all data with object encoder unfrozen\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  
multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 250\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 100\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: False\n        path: '/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_rewrite.yaml",
    "content": "###\n# Pretrain on all LLM-refined data only\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all_rewrite\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: ['scanrefer','sgrefer','sgcaption']\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: ['rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: 
'/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_template.yaml",
    "content": "###\n# Pretrain on all template-based generated data only\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"template_only\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: ['scanrefer', 'sgrefer']\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: ['rel2_template', 'relm_template', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','relm_template','star_template']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['rel2_template','relm_template','star_template']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['rel2_template','relm_template','star_template']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['rel2_template','relm_template','star_template']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: 
False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_wo_both.yaml",
    "content": "###\n# Pretrain on all data without ScanNet and MultiScan data\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"wo_scannet_multiscan\"\n  train: ['ARKitSceneSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: 
'/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_wo_both_125.yaml",
    "content": "###\n# Pretrain on 12.5% of all data without ScanNet and MultiScan\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"wo_scannet_multiscan.125\"\n  train: ['ARKitSceneSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    subset_ratio: 0.125\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: 
'/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_wo_both_25.yaml",
    "content": "###\n# Pretrain on 25% of all data without ScanNet and MultiScan\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"wo_scannet_multiscan.25\"\n  train: ['ARKitSceneSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    subset_ratio: 0.25\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n 
 multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_wo_both_50.yaml",
    "content": "###\n# Pretrain on 50% of all data without ScanNet and MultiScan\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"wo_scannet_multiscan.50\"\n  train: ['ARKitSceneSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    subset_ratio: 0.50\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n 
 multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_wo_multiscan.yaml",
    "content": "###\n# Pretrain on all data without MultiScan\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"wo_multiscan\"\n  train: ['ScanNetSpatialRefer','ARKitSceneSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['MultiScanSpatialRefer']\n  test: ['MultiScanSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: ['scanrefer','referit3d','sgrefer','sgcaption']\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno' ]\n    test:\n      sources: [ 'anno' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    
translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ReferIt3DEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/all_wo_scannet.yaml",
    "content": "###\n# Pretrain on all data without ScanNet\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"wo_scannet\"\n  train: ['ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','RScanSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [''rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: 
'/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/code/pretrained_weights/objpretrain/pointnetpp-open-bert'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/debug.yaml",
    "content": "###\n# Debugging\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 1\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/mnt/fillipo/baoxiong/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 3\n  hard_debug: True\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"all\"\n  train: ['RScanSpatialRefer','ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer']\n  # train: ['RScanSpatialRefer','ScanNetSpatialRefer','ARKitSceneSpatialRefer','MultiScanSpatialRefer','HMSpatialRefer','ProcThorSpatialRefer','S3DSpatialRefer']\n  # train: ['ScanNetSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: False\n    max_scene_cap_len: 300\n    load_scene_pcds: True\n    max_pcd_num_points: 240000\n    subset_ratio: 0\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer','referit3d','sgrefer','sgcaption']\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: ['rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  S3DSpatialRefer:\n    train:\n      # sources: [ 'rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt' ]\n      sources: [ 'relm_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'rel2_template', 
'relm_gpt', 'relm_template', 'star_template' ]\n  ProcThorSpatialRefer:\n    train:\n      sources: ['star_template' ]\n    val:\n      sources: [ 'rel2_template', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'rel2_template', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/mnt/fillipo/Datasets/SceneVerse/ScanNet\"\n  rscan_base: \"/mnt/fillipo/Datasets/SceneVerse/3RScan\"\n  arkitscene_base: '/mnt/fillipo/Datasets/SceneVerse/ARKitScenes'\n  multiscan_base: '/mnt/fillipo/Datasets/SceneVerse/MultiScan'\n  hm_base: '/mnt/fillipo/Datasets/SceneVerse/HM3D'\n  procthor_base: '/mnt/fillipo/Datasets/SceneVerse/ProcThor'\n  s3d_base: '/mnt/fillipo/Datasets/SceneVerse/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'VisualizeDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"DebugTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: \"/mnt/fillipo/baoxiong/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth\"\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/mnt/fillipo/baoxiong/results/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      
hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/finetune/multiscan_finetune.yaml",
    "content": "###\n# Finetune on MultiScan\n###\n\n# Experiment general info\nname: \"FinalOVFinetune\"\nrng_seed: 42\nnum_gpu: 2\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"multiscan\"\n  train: ['MultiScanSpatialRefer']\n  val: ['MultiScanSpatialRefer']\n  test: ['MultiScanSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: ['sgrefer','sgcaption']\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: ['rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: [ 'anno' ]\n    val:\n      sources: [ 'anno' ]\n    test:\n      sources: [ 'anno' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n  
    p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: True\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'ScanFamilyDatasetWrapperOld'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  # This is a per-gpu batchsize\n  batchsize: 256\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 50\n  epochs_per_eval: 1\n  lr: 1e-4\n  grad_norm: 5.0\n  epochs: 250\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n\neval:\n  train:\n    name: 'ReferIt3DEval'\n  val:\n    name: 'ReferIt3DEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n#    name: 'pointnet_point_encoder'\n#    args:\n#      path: None\n#      freeze: False\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: \"/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth\"\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['ground_head']\n    pretrain_head:\n      name: 'PretrainHeadV1'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n    ground_head:\n      name: \"GroundHeadV1\"\n      args:\n        hidden_size: 384\n        input_size: 768\n        sem_cls_size: 607\n        dropout: 0.3\n        detach_all_aux_loss: True\n  loss_type: 'ListLoss'\n  loss_list: [\n#       'TextObjWithinBatch'\n        'og3d_loss'\n  ]\n  vis_loss_list: [\n#        'TextObjWithinBatch'\n        'og3d_loss'\n  ]"
  },
  {
    "path": "configs/final/finetune/multiscan_woL.yaml",
    "content": "###\n# Finetune on MultiScan (unseen language)\n###\n\n# Experiment general info\nname: \"FinalOVFinetune\"\nrng_seed: 42\nnum_gpu: 2\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"multiscan_wo_L\"\n  train: ['MultiScanSpatialRefer']\n  val: ['MultiScanSpatialRefer']\n  test: ['MultiScanSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: ['sgrefer','sgcaption']\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: ['rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno' ]\n    test:\n      sources: [ 'anno' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n 
     enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: True\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'ScanFamilyDatasetWrapperOld'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  # This is a per-gpu batchsize\n  batchsize: 256\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 50\n  epochs_per_eval: 1\n  lr: 1e-4\n  grad_norm: 5.0\n  epochs: 250\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 5000\n\neval:\n  train:\n    name: 'ReferIt3DEval'\n  val:\n    name: 'ReferIt3DEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n#    name: 'pointnet_point_encoder'\n#    args:\n#      path: None\n#      freeze: False\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: \"/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth\"\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['ground_head']\n    pretrain_head:\n      name: 'PretrainHeadV1'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n    ground_head:\n      name: \"GroundHeadV1\"\n      args:\n        hidden_size: 384\n        input_size: 768\n        sem_cls_size: 607\n        dropout: 0.3\n        detach_all_aux_loss: True\n  loss_type: 'ListLoss'\n  loss_list: [\n#       'TextObjWithinBatch'\n        'og3d_loss'\n  ]\n  vis_loss_list: [\n#        'TextObjWithinBatch'\n        'og3d_loss'\n  ]"
  },
  {
    "path": "configs/final/finetune/nr3d_finetune.yaml",
    "content": "###\n# Finetune on Nr3D\n###\n\n# Experiment general info\nname: \"FinalOVFinetune\"\nrng_seed: 42\nnum_gpu: 2\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"nr3d_whead\"\n  train: ['ScanNetSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n  ScanNetSpatialRefer:\n    train:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['nr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: ['chain_gpt', 'chain_template', 'rel2_template', 'relm_template', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['nr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['nr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    val:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: 
'/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: True\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'ScanFamilyDatasetWrapperOld'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  # This is a per-gpu batchsize\n  batchsize: 256\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 50\n  epochs_per_eval: 1\n  lr: 1e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 5000\n\neval:\n  train:\n    name: 'ReferIt3DEval'\n  val:\n    name: 'ReferIt3DEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n#    name: 'pointnet_point_encoder'\n#    args:\n#      path: None\n#      freeze: False\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: \"/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth\"\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['ground_head']\n    pretrain_head:\n      name: 'PretrainHeadV1'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n    ground_head:\n      name: \"GroundHeadV1\"\n      args:\n        hidden_size: 384\n        input_size: 768\n        sem_cls_size: 607\n        dropout: 0.3\n        detach_all_aux_loss: True\n  loss_type: 'ListLoss'\n  loss_list: [\n#       'TextObjWithinBatch'\n        'og3d_loss'\n  ]\n  
vis_loss_list: [\n#        'TextObjWithinBatch'\n        'og3d_loss'\n  ]"
  },
  {
    "path": "configs/final/finetune/scannet_woL.yaml",
    "content": "###\n# Finetune on ScanRefer (unseen language)\n###\n\n# Experiment general info\nname: \"FinalOVFinetune\"\nrng_seed: 42\nnum_gpu: 2\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"scannet_wo_L\"\n  train: ['ScanNetSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: ['sgrefer','sgcaption']\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: ['chain_gpt', 'chain_template', 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['nr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['nr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['chain_template','chain_gpt','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','chain_template','chain_gpt','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','chain_template','chain_gpt','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','chain_template','chain_gpt','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: 
\"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: True\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'ScanFamilyDatasetWrapperOld'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  # This is a per-gpu batchsize\n  batchsize: 256\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 10\n  epochs_per_eval: 1\n  lr: 1e-4\n  grad_norm: 5.0\n  epochs: 100\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 1500\n\neval:\n  train:\n    name: 'ReferIt3DEval'\n  val:\n    name: 'ReferIt3DEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n#    name: 'pointnet_point_encoder'\n#    args:\n#      path: None\n#      freeze: False\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: \"/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth\"\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['ground_head']\n    pretrain_head:\n      name: 'PretrainHeadV1'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n    ground_head:\n      name: \"GroundHeadV1\"\n      args:\n        hidden_size: 384\n       
 input_size: 768\n        sem_cls_size: 607\n        dropout: 0.3\n        detach_all_aux_loss: True\n  loss_type: 'ListLoss'\n  loss_list: [\n#       'TextObjWithinBatch'\n        'og3d_loss'\n  ]\n  vis_loss_list: [\n#        'TextObjWithinBatch'\n        'og3d_loss'\n  ]"
  },
  {
    "path": "configs/final/finetune/scanqa_finetune.yaml",
    "content": "# Experiment general info\nname: \"OV_ScanQA\"\nrng_seed: 42\nnum_gpu: 2\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/mnt/fillipo/baoxiong/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  hard_debug: False\n  debug_size: 20\n\nlogger:\n  name: \"wandb\"\n  entity: \"buzz-beater\"\n\n# dataset details\ndata:\n  train: ['ScanNetScanQAOld']\n  val: ['ScanNetScanQAOld']\n  test: ['ScanNetScanQAOld']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    rot_aug: True\n  ScanNetScanQAOld:\n    train:\n      use_unanswer: True\n    val:\n      use_unanswer: True\n    test:\n      use_unanswer: True\n      test_file: \"test_w_obj\" # or \"test_wo_obj\"\n\n  use_voxel: False\n  scan_family_base: \"/mnt/fillipo/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/mnt/fillipo/scratch/masaccio/existing_datasets/3RScan-base\"\n\n# task details: 'pretrain', 'scanrefer', 'referit3d', 'scanqa', 'default'\ntask: 'ScanQA'\ndata_wrapper:\n  train: 'ScanFamilyDatasetWrapperOld'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"DefaultTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  # This is a per-gpu batchsize\n  batchsize: 32\n  num_workers: 2\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 10\n  epochs_per_eval: 5\n  lr: 1e-4\n  grad_norm: 5.0\n  epochs: 100\n  optim:\n    name: \"AdamW\"\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: \"warmup_cosine\"\n    args:\n      warmup_steps: 5000\n\neval:\n  name: \"ScanQAEval\"\n  save: False\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: \"BERTLanguageEncoder\"\n    args:\n      weights: \"bert-base-uncased\"\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n#    name: \"pointnet_point_encoder\"\n#    args:\n#      path: None\n#      freeze: False\n    name: \"PointOpenVocabEncoder\"\n    args:\n        backbone: \"pointnet++\"\n        hidden_size: 768\n        freeze: True\n        path: \"/mnt/fillipo/baoxiong/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth\"\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/mnt/fillipo/baoxiong/results/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: [\"qa_head\"]\n    qa_head:\n      name: \"QAHeadV1\"\n      args:\n        hidden_size: 768\n        mlp_size: 256\n        glimpse: 1\n        flat_out_size: 512\n        num_answers: 8864\n  loss_type: \"ListLoss\"\n  
loss_list: [\n      \"answer_loss\",\n      'TextObjWithinBatch',\n  ]\n  vis_loss_list: [\n      \"answer_loss\",\n      'TextObjWithinBatch',\n  ]\n"
  },
  {
    "path": "configs/final/finetune/scanrefer_finetune.yaml",
    "content": "###\n# Finetune on ScanRefer\n###\n\n# Experiment general info\nname: \"FinalOVFinetune\"\nrng_seed: 42\nnum_gpu: 2\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"scanrefer_whead\"\n  train: ['ScanNetSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n  ScanNetSpatialRefer:\n    train:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['chain_gpt', 'chain_template', 'rel2_template', 'relm_template', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    val:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: 
'/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: True\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'ScanFamilyDatasetWrapperOld'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  # This is a per-gpu batchsize\n  batchsize: 256\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 50\n  epochs_per_eval: 1\n  lr: 1e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 5000\n\neval:\n  train:\n    name: 'ScanReferEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n#    name: 'pointnet_point_encoder'\n#    args:\n#      path: None\n#      freeze: False\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: \"/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth\"\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['ground_head']\n    pretrain_head:\n      name: 'PretrainHeadV1'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n    ground_head:\n      name: \"GroundHeadV1\"\n      args:\n        hidden_size: 384\n        input_size: 768\n        sem_cls_size: 607\n        dropout: 0.3\n        detach_all_aux_loss: True\n  loss_type: 'ListLoss'\n  loss_list: [\n#       'TextObjWithinBatch'\n        'og3d_loss'\n  ]\n  
vis_loss_list: [\n#        'TextObjWithinBatch'\n        'og3d_loss'\n  ]"
  },
  {
    "path": "configs/final/finetune/sqa3d_finetune.yaml",
    "content": "# Experiment general info\nname: \"OV_SQA3D\"\nrng_seed: 42\nnum_gpu: 2\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/mnt/fillipo/baoxiong/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  hard_debug: False\n  debug_size: 20\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  train: ['ScanNetSQA3D']\n  val: ['ScanNetSQA3D']\n  test: ['ScanNetSQA3D']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    rot_aug: True\n  ScanNetSQA3D:\n    train:\n      use_unanswer: True\n    val:\n      use_unanswer: True\n    test:\n      use_unanswer: True\n\n  use_voxel: False\n  scan_family_base: \"/mnt/fillipo/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/mnt/fillipo/scratch/masaccio/existing_datasets/3RScan-base\"\n\n# task details: 'pretrain', 'scanrefer', 'referit3d', 'scanqa', 'default'\ntask: 'SQA3D'\ndata_wrapper:\n  train: 'ScanFamilyDatasetWrapperOld'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"DefaultTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  # This is a per-gpu batchsize\n  batchsize: 32\n  num_workers: 2\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 10\n  epochs_per_eval: 5\n  lr: 1e-4\n  grad_norm: 5.0\n  epochs: 100\n  optim:\n    name: \"AdamW\"\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: \"warmup_cosine\"\n    args:\n      warmup_steps: 5000\n\neval:\n  name: \"SQA3DEval\"\n  save: False\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: \"BERTLanguageEncoder\"\n    args:\n      weights: \"bert-base-uncased\"\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n#    name: \"pointnet_point_encoder\"\n#    args:\n#      path: None\n#      freeze: False\n    name: \"PointOpenVocabEncoder\"\n    args:\n        backbone: \"pointnet++\"\n        hidden_size: 768\n        freeze: True\n        path: \"/mnt/fillipo/baoxiong/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth\"\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/mnt/fillipo/baoxiong/results/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: [\"qa_head\"]\n    qa_head:\n      name: \"QAHeadV1\"\n      args:\n        hidden_size: 768\n        mlp_size: 256\n        glimpse: 1\n        flat_out_size: 512\n        num_answers: 706\n  loss_type: \"ListLoss\"\n  loss_list: [\n      \"answer_loss\"\n  ]\n  vis_loss_list: [\n      \"answer_loss\"\n 
 ]\n"
  },
  {
    "path": "configs/final/finetune/sr3d_finetune.yaml",
    "content": "###\n# Finetune on Sr3D\n###\n\n# Experiment general info\nname: \"FinalOVFinetune\"\nrng_seed: 42\nnum_gpu: 2\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"sr3d_whead\"\n  train: ['ScanNetSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n  ScanNetSpatialRefer:\n    train:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['chain_gpt', 'chain_template', 'rel2_template', 'relm_template', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    val:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    val:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'chain_template', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: 
'/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: True\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'ScanFamilyDatasetWrapperOld'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  # This is a per-gpu batchsize\n  batchsize: 256\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 50\n  epochs_per_eval: 1\n  lr: 1e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 5000\n\neval:\n  train:\n    name: 'ReferIt3DEval'\n  val:\n    name: 'ReferIt3DEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n#    name: 'pointnet_point_encoder'\n#    args:\n#      path: None\n#      freeze: False\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: \"/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth\"\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['ground_head']\n    pretrain_head:\n      name: 'PretrainHeadV1'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n    ground_head:\n      name: \"GroundHeadV1\"\n      args:\n        hidden_size: 384\n        input_size: 768\n        sem_cls_size: 607\n        dropout: 0.3\n        detach_all_aux_loss: True\n  loss_type: 'ListLoss'\n  loss_list: [\n#       'TextObjWithinBatch'\n        'og3d_loss'\n  ]\n  
vis_loss_list: [\n#        'TextObjWithinBatch'\n        'og3d_loss'\n  ]"
  },
  {
    "path": "configs/final/multiscan_only.yaml",
    "content": "###\n# MultiScan pretrain from scratch\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"multiscan\"\n  train: ['MultiScanSpatialRefer']\n  val: ['MultiScanSpatialRefer']\n  test: ['MultiScanSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: False\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'referit3d' ]\n      referit3d:\n        anno_type: ['nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['nr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['nr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: [ 'anno' ]\n    val:\n      sources: [ 'anno' ]\n    test:\n      sources: [ 'anno' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  S3DSpatialRefer:\n    train:\n      sources: [ 'rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt' ]\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: 
'/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n  s3d_base: '/scratch/masaccio/existing_datasets/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'ReferIt3DEval'\n  val:\n    name: 'ReferIt3DEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      
'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/nr3d_only.yaml",
    "content": "###\n# Nr3D pretrain from scratch\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"nr3d\"\n  train: ['ScanNetSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: False\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'referit3d' ]\n      referit3d:\n        anno_type: ['nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['nr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['nr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  S3DSpatialRefer:\n    train:\n      sources: [ 'rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt' ]\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: 
\"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n  s3d_base: '/scratch/masaccio/existing_datasets/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'ReferIt3DEval'\n  val:\n    name: 'ReferIt3DEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      
'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/procthor_only.yaml",
    "content": "###\n# ProcTHOR pretrain from scratch\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"procthoronly\"\n  train: ['ProcThorSpatialRefer']\n  val: ['ProcThorSpatialRefer']\n  test: ['ProcThorSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer', 'referit3d', 'sgrefer', 'sgcaption' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  S3DSpatialRefer:\n    train:\n      sources: [ 'rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt' ]\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  
ProcThorSpatialRefer:\n    train:\n      sources: [ 'rel2_template','relm_template','star_template' ]\n    val:\n      sources: [ 'rel2_template', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'rel2_template', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n  s3d_base: '/scratch/masaccio/existing_datasets/Structured3D'\n  procthor_base: '/scratch/masaccio/Procthor'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/results/ALLObjPretrain_b512_Pretrain_ProcThorPretrainObj_1115procthor/2023-11-15-12:02:33.790890/ckpt/best.pth'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      
dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/s3d_only.yaml",
    "content": "###\n# Structured3D pretrain from scratch\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"s3donly\"\n  train: ['ScanNetSpatialRefer']\n  val: ['S3DSpatialRefer']\n  test: ['S3DSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: True\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  S3DSpatialRefer:\n    train:\n      sources: [ 'rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt' ]\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: 
\"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n  s3d_base: '/scratch/masaccio/existing_datasets/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/results/ALLObjPretrain_b512_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj+S3DPretrainObj_1113scannetws3d/2023-11-14-09:29:10.796592/ckpt/best.pth'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      
'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/scanrefer_only.yaml",
    "content": "###\n# ScanRefer pretrain from scratch\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"scanrefer\"\n  train: ['ScanNetSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: False\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  S3DSpatialRefer:\n    train:\n      sources: [ 'rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt' ]\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: 
\"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n  s3d_base: '/scratch/masaccio/existing_datasets/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      
'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/scanrefer_only_gttest.yaml",
    "content": "###\n# ScanRefer pretrain from scratch\n###\n\n# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/home/baoxiong/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"scanrefer\"\n  train: ['ScanNetSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'pred'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: False\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'scanrefer' ]\n      referit3d:\n        anno_type: ['sr3d', 'nr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['scanrefer']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  S3DSpatialRefer:\n    train:\n      sources: [ 'rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt' ]\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: 
\"/mnt/fillipo/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: \"/mnt/fillipo/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/mnt/fillipo/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/mnt/fillipo/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/mnt/fillipo/scratch/masaccio/existing_datasets/HM3D'\n  s3d_base: '/mnt/fillipo/scratch/masaccio/existing_datasets/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'PretrainEval'\n  val:\n    name: 'ScanReferEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: \"\"\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/mnt/fillipo/baoxiong/results/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      
'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
  {
    "path": "configs/final/sr3d_only.yaml",
    "content": "# Experiment general info\nname: \"FinalOVPretrain\"\nrng_seed: 42\nnum_gpu: 8\nmode: \"train\"\nnote: \"\"\n# Choose keywords to feature your saving directory\nnaming_keywords: [\"dataloader.batchsize\", \"task\", \"note\", \"time\"]\nbase_dir: \"/scratch/masaccio/results\"\nexp_dir: \"\"\nsave_frequency: 10\n\nresume: False\n\ndebug:\n  flag: False\n  debug_size: 20\n  hard_debug: False\n\nlogger:\n  name: \"wandb\"\n  entity: \"bigai-gvl\"\n\n# dataset details\ndata:\n  note: \"sr3d\"\n  train: ['ScanNetSpatialRefer']\n  val: ['ScanNetSpatialRefer']\n  test: ['ScanNetSpatialRefer']\n  args:\n    max_obj_len: 80\n    max_seq_len: 50\n    num_points: 1024\n    pc_type: 'gt'\n    sem_type: '607'\n    filter_lang: False\n    txt_mask_ratio: 0.15\n    pc_mask_ratio: 0.1\n    rot_aug: True\n    mask_strategy: random\n    use_scene_cap: False\n    max_scene_cap_len: 300\n  ScanNetSpatialRefer:\n    train:\n      sources: [ 'referit3d' ]\n      referit3d:\n        anno_type: ['sr3d']\n        sr3d_plus_aug: True\n      sgrefer:\n        anno_type: [ 'rel2_gpt', 'rel2_template', 'relm_gpt', 'relm_template', 'star_gpt', 'star_template'] #\n      sgcaption:\n        anno_type: ['gpt']\n    val:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n    test:\n      sources: ['referit3d']\n      referit3d:\n        anno_type: ['sr3d'] # 'nr3d', 'sr3d'\n        sr3d_plus_aug: False\n      sgrefer:\n        anno_type: ['template'] # 'template', 'gpt', 'gpt_chain'\n      sgcaption:\n        anno_type: ['gpt']\n  RScanSpatialRefer:\n    train:\n      sources: ['rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  MultiScanSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template', 'star_gpt' ]\n  ARKitSceneSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  HMSpatialRefer:\n    train:\n      sources: ['anno','rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt']\n    val:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'anno', 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  S3DSpatialRefer:\n    train:\n      sources: [ 'rel2_template','rel2_gpt','relm_template','relm_gpt','star_template','star_gpt' ]\n    val:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n    test:\n      sources: [ 'rel2_template', 'relm_gpt', 'relm_template', 'star_template' ]\n  use_voxel: False\n  scan_family_base: \"/scratch/masaccio/existing_datasets/scannet\"\n  rscan_base: 
\"/scratch/masaccio/existing_datasets/3RScan-base\"\n  arkitscene_base: '/scratch/masaccio/existing_datasets/ARKitScenes'\n  multiscan_base: '/scratch/masaccio/existing_datasets/multiscan'\n  hm_base: '/scratch/masaccio/existing_datasets/HM3D'\n  s3d_base: '/scratch/masaccio/existing_datasets/Structured3D'\n\ndata_aug:\n  aug_list: ['scene_aug']\n  scene_aug:\n    translation:\n      enabled: False\n      value: [1.0, 1.0, 1.0]\n      p: 1.0\n    scaling:\n      enabled: False\n      p: 1.0\n      value: [0.9, 1.1]\n    flip:\n      enabled: False\n      p: 0.5\n    rotation:\n      enabled: True\n      p: 1.0\n      axis_align: True\n      value: [0.0, 0.0, 1.0]\n      shuffle: True\n    color_jitter: False\n    order_shuffle: False\n  obj_aug:\n    translation:\n      enabled: False\n      value: [0.1, 0.1, 0.1]\n      p: 1.0\n    rotation:\n      enabled: False\n      p: 1.0\n      axis_align: False\n      value: [0.0, 0.0, 0.1]\n      shuffle: True\n    random_jitter:\n      enabled: False\n      value: 0.01\n      accord_to_size: False\n      p: 1.0\n    pts_shuffle: True\n\n# task details: 'Pretrain', 'scanqa', 'spatialrefer'\ntask: 'Pretrain'\n# 'MaskDatasetWrapper', 'ScanFamilyDatasetWrapper', 'MaskMVDatasetWrapper'\ndata_wrapper:\n  train: 'MaskDatasetWrapper'\n  val: 'ScanFamilyDatasetWrapperOld'\n  test: 'ScanFamilyDatasetWrapperOld'\n\n# Training details\ntrainer: \"OpenVocabTrainer\"\nckpt_path: \"\"\npretrain_ckpt_path: \"\"\n\n# dataloader details\ndataloader:\n  batchsize: 64\n  num_workers: 4\n  balance_dataset: False\n  filter_empty_annotations: False\n\nsolver:\n  gradient_accumulation_steps: 1\n  epochs_per_save: 20\n  epochs_per_eval: 1\n  lr: 5e-4\n  grad_norm: 5.0\n  epochs: 150\n  optim:\n    name: 'AdamW'\n    args:\n      betas: [0.9, 0.98]\n  sched:\n    name: 'warmup_cosine'\n    args:\n      warmup_steps: 500\n      minimum_ratio: 0.1\n\neval:\n  train:\n    name: 'ReferIt3DEval'\n  val:\n    name: 'ReferIt3DEval'\n  save: False\n\n\n# Model details\nmodel:\n  name: OpenVocab\n  language:\n    # This part could be further optimized to be using\n    # huggingface yaml config files\n    name: 'BERTLanguageEncoder'\n    args:\n      weights: 'bert-base-uncased'\n      hidden_size: 768\n      num_hidden_layers: 4\n      num_attention_heads: 12\n      type_vocab_size: 2\n    lr: 1e-5\n  vision:\n    name: 'PointOpenVocabEncoder'\n    args:\n        backbone: 'pointnet++'\n        hidden_size: 768\n        freeze: True\n        path: '/scratch/masaccio/results/ALLObjPretrain_b64_Pretrain_ScanNetPretrainObj+RScanPretrainObj+ARKitScenePretrainObj+MultiScanPretrainObj+HMPretrainObj_1113real_all/2023-11-13-12:17:35.068482/ckpt/best.pth'\n        num_attention_heads: 12\n        spatial_dim: 5\n        num_layers: 4\n        dim_loc: 6\n        dim_feedforward: 2048\n        attn_type: spatial\n        pairwise_rel_type: 'center'\n        use_matmul_label: False\n        lang_type: 'bert'\n        lang_path: '/scratch/masaccio/607_text_embeddings'\n    lr: 1e-4\n  grounding:\n    name: 'UnifiedSpatialCrossEncoderV2'\n    args:\n      hidden_size: 768\n      num_attention_heads: 12\n      num_layers: 4\n      dim_feedforward: 2048\n      dim_loc: 6\n    lr: 1e-4\n  inter: before\n  heads:\n    head_list: ['pretrain_head']\n    pretrain_head:\n      name: 'OVPretrainHead'\n      args:\n        hidden_size: 768\n        vocab_size: 30522\n  loss_type: 'ListLoss'\n  loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      
'TextSceneBetweenBatch'\n  ]\n  vis_loss_list: [\n      'lm_cls_loss',\n      'TextObjWithinBatch',\n#      'TextObjBetweenBatch',\n      'TextSceneBetweenBatch'\n  ]"
  },
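  {
    "path": "examples/load_config.py",
    "content": "\"\"\"Illustrative sketch (not part of the original release): shows how the YAML\nconfigs above can be loaded and overridden with OmegaConf, which this repo\nalready uses in data/build.py. The config path and override keys come from the\nreleased configs; this script itself (and its path) is hypothetical.\n\"\"\"\nfrom omegaconf import OmegaConf\n\nif __name__ == '__main__':\n    # Load one of the released pretraining configs\n    cfg = OmegaConf.load('configs/final/scanrefer_only.yaml')\n    # Command-line style dot-list overrides, e.g. a smaller batch for debugging\n    overrides = OmegaConf.from_dotlist(['dataloader.batchsize=8', 'debug.flag=True'])\n    cfg = OmegaConf.merge(cfg, overrides)\n    print(cfg.task, cfg.dataloader.batchsize, cfg.model.name)\n"
  },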
  {
    "path": "data/__init__.py",
    "content": "from .datasets import *\nfrom .data_utils import *\n"
  },
  {
    "path": "data/build.py",
    "content": "from omegaconf import OmegaConf\nfrom torch.utils.data import DataLoader, default_collate, ConcatDataset\nfrom fvcore.common.registry import Registry\n\nfrom .datasets.dataset_wrapper import DATASETWRAPPER_REGISTRY\n\nDATASET_REGISTRY = Registry(\"dataset\")\nDATASET_REGISTRY.__doc__ = \"\"\"\nRegistry for datasets, which takes a list of dataset names and returns a dataset object.\nCurrently it performs similar as registering dataset loading functions, but remains in a\nform of object class for future purposes.\n\"\"\"\n\ndef get_dataset(cfg, split):\n    assert cfg.data.get(split), f\"No valid dataset name in {split}.\"\n    dataset_list = []\n    print(split, ': ', ', '.join(cfg.data.get(split)))\n    for dataset_name in cfg.data.get(split):\n        _dataset = DATASET_REGISTRY.get(dataset_name)(cfg, split)\n        assert len(_dataset), f\"Dataset '{dataset_name}' is empty!\"\n        wrapper = cfg.data_wrapper.get(split, cfg.data_wrapper) if not isinstance(cfg.data_wrapper, str) else cfg.data_wrapper\n        _dataset = DATASETWRAPPER_REGISTRY.get(wrapper)(cfg, _dataset, split=split)\n        # Conduct voxelization\n        # TODO: fix voxel config\n        if cfg.data.get('use_voxel', None):\n            _dataset = DATASETWRAPPER_REGISTRY.get('VoxelDatasetWrapper')(cfg, _dataset)\n        dataset_list.append(_dataset)\n\n    print('='*50)\n    print('Dataset\\t\\t\\tSize')\n    total = sum([len(dataset) for dataset in dataset_list])\n    for dataset_name, dataset in zip(cfg.data.get(split), dataset_list):\n        print(f'{dataset_name:<20} {len(dataset):>6} ({len(dataset) / total * 100:.1f}%)')\n    print(f'Total\\t\\t\\t{total}')\n    print('='*50)\n    if split == 'train':\n        dataset_list = ConcatDataset(dataset_list)\n\n    return dataset_list\n\n\ndef build_dataloader(cfg, split='train'):\n    \"\"\"_summary_\n    Unittest:\n        dataloader_train = build_dataloader(default_cfg, split='train')\n        for _item in dataloader_train:\n            print(_item.keys())\n\n    Args:\n        cfg (_type_): _description_\n        split (str, optional): _description_. 
Defaults to 'train'.\n\n    Returns:\n        _type_: _description_\n    \"\"\"\n    if split == 'train':\n        dataset = get_dataset(cfg, split)\n        return DataLoader(dataset,\n                          batch_size=cfg.dataloader.batchsize,\n                          num_workers=cfg.dataloader.num_workers,\n                          collate_fn=getattr(dataset.datasets[0], 'collate_fn', default_collate),\n                          pin_memory=True, # TODO: Test speed\n                        #   prefetch_factor=2 if not cfg.debug.flag else None,\n                          persistent_workers=True if not cfg.debug.flag else None,\n                          shuffle=True,\n                          drop_last=True)\n    else:\n        loader_list = []\n        for dataset in get_dataset(cfg, split):\n            loader_list.append(\n                DataLoader(dataset,\n                    batch_size=cfg.dataloader.get('batchsize_eval', cfg.dataloader.batchsize),\n                    num_workers=cfg.dataloader.num_workers,\n                    collate_fn=getattr(dataset, 'collate_fn', default_collate),\n                    pin_memory=True, # TODO: Test speed\n                    # prefetch_factor=2 if not cfg.debug.flag else None,\n                    persistent_workers=True if not cfg.debug.flag else None,\n                    shuffle=False))\n        # TODO: temporary solution for backward compatibility.\n        if len(loader_list) == 1:\n            return loader_list[0]\n        else:\n            return loader_list\n\n\nif __name__ == '__main__':\n    pass\n"
  },
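  {
    "path": "examples/build_dataloader_example.py",
    "content": "\"\"\"Illustrative sketch (not part of the original release): expands the 'Unittest'\nsnippet in build_dataloader's docstring into a runnable script. It assumes a\nconfig such as configs/final/scanrefer_only.yaml and that the dataset files\nreferenced by its *_base paths exist locally.\n\"\"\"\nfrom omegaconf import OmegaConf\n\nfrom data.build import build_dataloader\n\nif __name__ == '__main__':\n    cfg = OmegaConf.load('configs/final/scanrefer_only.yaml')\n    # For split='train', get_dataset concatenates all cfg.data.train datasets and\n    # build_dataloader returns a single shuffled DataLoader over the result.\n    dataloader_train = build_dataloader(cfg, split='train')\n    for _item in dataloader_train:\n        print(_item.keys())\n        break\n"
  },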
  {
    "path": "data/data_utils.py",
    "content": "import random\nimport csv\nfrom collections import Counter\nimport re\n\nimport numpy as np\nimport torch\n\nfrom data.datasets.constant import VALID_CLASS_IDS_200\n\n\ndef per_scene_pad(lang_list, max_len=64, tokenizer=None, max_seq_len=50):\n    \"\"\"\n    @param lang_list: lang json for all sentences, must include scan_id in the json element\n    @param max_len: the number for padding, default is 64\n    @return: a list of list, with each element in the list containing max_len number of sentences corresponding to\n             one scene\n    \"\"\"\n    scene_list = {}\n    if tokenizer is not None:\n        for key in [\"utterance\", \"question\", \"description\"]:\n            if key in lang_list[0].keys():\n                encoded_input = tokenizer(\n                    [item[key] for item in lang_list], padding=\"max_length\", truncation=True, max_length=max_seq_len\n                )\n                lang_list = [\n                    {\n                        k : (v, encoded_input[\"input_ids\"][idx], encoded_input[\"attention_mask\"][idx])\n                        if k == key else v for k, v in item.items()\n                    } for idx, item in enumerate(lang_list)\n                ]\n    for item in lang_list:\n        scan_id = item[\"scan_id\"]\n        if scan_id not in scene_list.keys():\n            scene_list[scan_id] = [item]\n        else:\n            scene_list[scan_id].append(item)\n    final_list = []\n    for key, value in scene_list.items():\n        for index in range(0, len(value), max_len):\n            if index + max_len < len(value):\n                final_list.append(value[index:index + max_len])\n            else:\n                content = value[index:]\n                sampled = random.choices(content, k=max_len)\n                final_list.append(sampled)\n    return final_list\n\n\ndef merge_tokens(token1, mask1, token2, mask2, max_len=300, tokenizer=None):\n    assert len(token1) > len(token2), \"not appendable\"\n    assert tokenizer is not None, \"should pass in a tokenizer\"\n    len_token1 = sum(mask1) - 1  # remove the last [CLS]\n    len_token2 = sum(mask2) - 1  # remove the first [BOS]\n    insert_length = min(max_len - len_token1, len_token2)\n    token1[len_token1: len_token1 + insert_length] = token2[1: 1 + insert_length]\n    mask1[len_token1: len_token1 + insert_length] = mask2[1: 1 + insert_length]\n    if token1[sum(mask1) - 1] != tokenizer.sep_token_id:\n        token1[sum(mask1) - 1] = tokenizer.sep_token_id\n    return token1, mask1\n\n\ndef convert_pc_to_box(obj_pc):\n    xmin = np.min(obj_pc[:,0])\n    ymin = np.min(obj_pc[:,1])\n    zmin = np.min(obj_pc[:,2])\n    xmax = np.max(obj_pc[:,0])\n    ymax = np.max(obj_pc[:,1])\n    zmax = np.max(obj_pc[:,2])\n    center = [(xmin+xmax)/2, (ymin+ymax)/2, (zmin+zmax)/2]\n    box_size = [xmax-xmin, ymax-ymin, zmax-zmin]\n    return center, box_size\n\n\n# input txt_ids, txt_masks\ndef random_word(tokens, tokens_mask, tokenizer, mask_ratio):\n    output_label = []\n    output_tokens = tokens.clone()\n    for i, token in enumerate(tokens):\n        if tokens_mask[i] == 0:\n            output_label.append(-1)\n        else:\n            prob = random.random()\n            # mask token with 15% probability\n            if prob < mask_ratio:\n                prob /= mask_ratio\n\n                # 80% randomly change token to mask token\n                if prob < 0.8:\n                    output_tokens[i] = tokenizer.mask_token_id\n\n                # 10% randomly change token 
to random token\n                elif prob < 0.9:\n                    output_tokens[i] = random.choice(list(tokenizer.vocab.items()))[1]\n\n                # -> rest 10% randomly keep current token\n\n                # append current token to output (we will predict these later)\n                output_label.append(token.item())\n            else:\n                # no masking token (will be ignored by loss function later)\n                output_label.append(-1)\n    output_label = torch.Tensor(output_label).long()\n    return output_tokens, output_label\n\n\ndef random_point_cloud(pcd, pcd_mask, mask_ratio):\n    assert len(pcd) == len(pcd_mask)\n    output_mask = []\n    for i in range(len(pcd)):\n        if pcd_mask[i] == 0:\n            output_mask.append(0)\n        else:\n            prob = random.random()\n            if prob < mask_ratio:\n                output_mask.append(0)\n            else:\n                output_mask.append(1)\n\n    output_mask = torch.tensor(output_mask, dtype=torch.bool)\n    return output_mask\n\n\nclass LabelConverter(object):\n    def __init__(self, file_path):\n        self.raw_name_to_id = {}\n        self.nyu40id_to_id = {}\n        self.nyu40_name_to_id = {}\n        self.scannet_name_to_scannet_id = {'cabinet':0, 'bed':1, 'chair':2, 'sofa':3, 'table':4,\n            'door':5, 'window':6,'bookshelf':7,'picture':8, 'counter':9, 'desk':10, 'curtain':11,\n            'refrigerator':12, 'shower curtain':13, 'toilet':14, 'sink':15, 'bathtub':16, 'others':17}  \n        self.id_to_scannetid = {}\n        self.scannet_raw_id_to_raw_name = {}\n\n        with open(file_path, encoding='utf-8') as fd:\n            rd = list(csv.reader(fd, delimiter=\"\\t\", quotechar='\"'))\n            for i in range(1, len(rd)):\n                raw_id = i - 1\n                scannet_raw_id = int(rd[i][0])\n                raw_name = rd[i][1]\n                nyu40_id = int(rd[i][4])\n                nyu40_name = rd[i][7]\n                self.raw_name_to_id[raw_name] = raw_id\n                self.scannet_raw_id_to_raw_name[scannet_raw_id] = raw_name\n                self.nyu40id_to_id[nyu40_id] = raw_id\n                self.nyu40_name_to_id[nyu40_name] = raw_id\n                if nyu40_name not in self.scannet_name_to_scannet_id:\n                    self.id_to_scannetid[raw_id] = self.scannet_name_to_scannet_id['others']\n                else:\n                    self.id_to_scannetid[raw_id] = self.scannet_name_to_scannet_id[nyu40_name]\n        \n        ### add instance id from org image to pth file\n        self.orgInstID_to_id = {id : id - 1 for id in range(1, 257)}\n        self.orgInstID_to_id[0] = -100\n        \n        # add map for scannet 200\n        self.scannet_raw_id_to_scannet200_id = {}\n        self.scannet200_id_to_scannet_raw_id = {}\n        for v, k in enumerate(VALID_CLASS_IDS_200):\n            self.scannet_raw_id_to_scannet200_id[k] = v\n            self.scannet200_id_to_scannet_raw_id[v] = k\n\ndef build_rotate_mat(split, rot_aug=True, rand_angle='axis'):\n    if rand_angle == 'random':\n        theta = np.random.rand() * np.pi * 2\n    else:\n        ROTATE_ANGLES = [0, np.pi/2, np.pi, np.pi*3/2]\n        theta_idx = np.random.randint(len(ROTATE_ANGLES))\n        theta = ROTATE_ANGLES[theta_idx]\n    if (theta is not None) and (theta != 0) and (split == 'train') and rot_aug:\n        rot_matrix = np.array([\n            [np.cos(theta), -np.sin(theta), 0],\n            [np.sin(theta), np.cos(theta), 0],\n            [0, 0, 1]\n        ], 
dtype=np.float32)\n    else:\n        rot_matrix = None\n    return rot_matrix\n\n\ndef eval_ref_one_sample(pred_bbox, gt_bbox):\n    \"\"\" Evaluate one reference prediction\n    Args:\n        pred_bbox: 8 corners of prediction bounding box, (8, 3)\n        gt_bbox: 8 corners of ground truth bounding box, (8, 3)\n    Returns:\n        iou: intersection over union score\n    \"\"\"\n\n    iou = box3d_iou(pred_bbox, gt_bbox)\n\n    return iou\n\ndef get_box3d_min_max(corner):\n    ''' Compute min and max coordinates for 3D bounding box\n        Note: only for axis-aligned bounding boxes\n    Input:\n        corner: numpy array (8,3), assume up direction is Z\n    Output:\n        box_min_max: min and max coordinates of the 3D bounding box\n    '''\n\n    min_coord = corner.min(axis=0)\n    max_coord = corner.max(axis=0)\n    x_min, x_max = min_coord[0], max_coord[0]\n    y_min, y_max = min_coord[1], max_coord[1]\n    z_min, z_max = min_coord[2], max_coord[2]\n\n    return x_min, x_max, y_min, y_max, z_min, z_max\n\n\ndef box3d_iou(corners1, corners2):\n    ''' Compute 3D bounding box IoU.\n    Input:\n        corners1: numpy array (8,3), assume up direction is Z\n        corners2: numpy array (8,3), assume up direction is Z\n    Output:\n        iou: 3D bounding box IoU\n    '''\n\n    x_min_1, x_max_1, y_min_1, y_max_1, z_min_1, z_max_1 = get_box3d_min_max(corners1)\n    x_min_2, x_max_2, y_min_2, y_max_2, z_min_2, z_max_2 = get_box3d_min_max(corners2)\n    xA = np.maximum(x_min_1, x_min_2)\n    yA = np.maximum(y_min_1, y_min_2)\n    zA = np.maximum(z_min_1, z_min_2)\n    xB = np.minimum(x_max_1, x_max_2)\n    yB = np.minimum(y_max_1, y_max_2)\n    zB = np.minimum(z_max_1, z_max_2)\n    inter_vol = np.maximum((xB - xA), 0) * np.maximum((yB - yA), 0) * np.maximum((zB - zA), 0)\n    box_vol_1 = (x_max_1 - x_min_1) * (y_max_1 - y_min_1) * (z_max_1 - z_min_1)\n    box_vol_2 = (x_max_2 - x_min_2) * (y_max_2 - y_min_2) * (z_max_2 - z_min_2)\n    iou = inter_vol / (box_vol_1 + box_vol_2 - inter_vol + 1e-8)\n\n    return iou\n\n\ndef transform_points(points, transform, translate=True, mode=\"numpy\"):\n    \"\"\" Apply linear transform to a np array of points.\n    Args:\n        points (np array [..., 3]): Points to transform.\n        transform (np array [3, 4] or [4, 4]): Linear map.\n        translate (bool): If false, do not apply translation component of transform.\n    Returns:\n        transformed points (np array [..., 3])\n    \"\"\"\n    # Append ones or zeros to get homogeneous coordinates\n    if translate:\n        if mode == \"numpy\":\n            constant_term = np.ones_like(points[..., :1])\n        else:\n            constant_term = torch.ones_like(points[..., :1])\n    else:\n        if mode == \"numpy\":\n            constant_term = np.zeros_like(points[..., :1])\n        else:\n            constant_term = torch.zeros_like(points[..., :1])\n    if mode == \"numpy\":\n        points = np.concatenate((points, constant_term), axis=-1)\n        points = np.einsum('nm,...m->...n', transform, points)\n    else:\n        points = torch.cat((points, constant_term), dim=-1)\n        points = torch.einsum('...nm,...m->...n', transform, points)\n    return points[..., :3]\n\n\ndef construct_bbox_corners(center, box_size):\n    sx, sy, sz = box_size\n    x_corners = [sx/2, sx/2, -sx/2, -sx/2, sx/2, sx/2, -sx/2, -sx/2]\n    y_corners = [sy/2, -sy/2, -sy/2, sy/2, sy/2, -sy/2, -sy/2, sy/2]\n    z_corners = [sz/2, sz/2, sz/2, sz/2, -sz/2, -sz/2, -sz/2, -sz/2]\n    corners_3d = np.vstack([x_corners, y_corners, z_corners])\n    corners_3d[0,:] = corners_3d[0,:] + center[0]\n    corners_3d[1,:] = corners_3d[1,:] + center[1]\n    corners_3d[2,:] = corners_3d[2,:] + center[2]\n    corners_3d = np.transpose(corners_3d)\n\n    return corners_3d\n\n\ndef is_explicitly_view_dependent(tokens):\n    \"\"\"\n    :return: True if any token is an explicitly view-dependent word, else False\n    \"\"\"\n    target_words = {'front', 'behind', 'back', 'right', 'left', 'facing', 'leftmost', 'rightmost',\n                    'looking', 'across'}\n    for token in tokens:\n        if token in target_words:\n            return True\n    return False\n\n\nclass ScanQAAnswer(object):\n    def __init__(self, answers=None, unk_token='<unk>', ignore_idx=-100):\n        if answers is None:\n            answers = []\n        self.unk_token = unk_token\n        self.ignore_idx = ignore_idx\n        self.vocab = {x: i for i, x in enumerate(answers)}\n        self.rev_vocab = dict((v, k) for k, v in self.vocab.items())\n\n    def itos(self, i):\n        if i == self.ignore_idx:\n            return self.unk_token\n        return self.rev_vocab[i]\n\n    def stoi(self, v):\n        if v not in self.vocab:\n            return self.ignore_idx\n        return self.vocab[v]\n\n    def __len__(self):\n        return len(self.vocab)\n\n\nclass SQA3DAnswer(object):\n    def __init__(self, answers=None, unk_token='u'):\n        if answers is None:\n            answers = []\n        self.vocab = {x: i for i, x in enumerate(answers)}\n        self.rev_vocab = dict((v, k) for k, v in self.vocab.items())\n        self.unk_token = unk_token\n        self.ignore_idx = self.vocab['u']\n\n    def itos(self, i):\n        if i == self.ignore_idx:\n            return self.unk_token\n        return self.rev_vocab[i]\n\n    def stoi(self, v):\n        if v not in self.vocab:\n            return self.ignore_idx\n        return self.vocab[v]\n\n    def __len__(self):\n        return len(self.vocab)\n\n\ndef load_matrix_from_txt(path, shape=(4, 4)):\n    with open(path) as f:\n        txt = f.readlines()\n    txt = ''.join(txt).replace('\\n', ' ')\n    matrix = [float(v) for v in txt.split()]\n    return np.array(matrix).reshape(shape)\n\ndef pad_tensors(tensors, lens=None, pad=0):\n    assert tensors.shape[0] <= lens\n    if tensors.shape[0] == lens:\n        return tensors\n    shape = list(tensors.shape)\n    shape[0] = lens - shape[0]\n    res = torch.ones(shape, dtype=tensors.dtype) * pad\n    res = torch.cat((tensors, res), dim=0)\n    return res\n\ndef get_sqa_question_type(question):\n    question = question.lstrip()\n    if question[:4].lower() == 'what':\n        return 0\n    elif question[:2].lower() == 'is':\n        return 1\n    elif question[:3].lower() == 'how':\n        return 2\n    elif question[:3].lower() == 'can':\n        return 3\n    elif question[:5].lower() == 'which':\n        return 4\n    else:\n        return 5   # others\n\n\nclass Vocabulary(object):\n    def __init__(self, path=None):\n        self.vocab = {}\n        self.id_to_vocab = {}\n        self.id_to_bert = {}\n\n        if path is not None:\n            load_dict = torch.load(path)\n            self.vocab = load_dict['vocab']\n            self.id_to_vocab = load_dict['id_to_vocab']\n            self.id_to_bert = load_dict['id_to_bert']\n\n    def add_token(self, token, bert_id):\n        if token in self.vocab.keys():\n            return\n        id = len(self.vocab) \n        self.vocab[token] = id\n        self.id_to_vocab[id] = token\n        
self.id_to_bert[id] = bert_id\n\n    def token_to_id(self, token):\n        return self.vocab[token]\n\n    def id_to_token(self, id):\n        return self.id_to_vocab[id]\n\n    def id_to_bert_id(self, id):\n        return self.id_to_bert[id]\n\n    def save_vocab(self, path):\n        save_dict = {'vocab': self.vocab, \"id_to_vocab\": self.id_to_vocab,\n                     \"id_to_bert\": self.id_to_bert}\n        torch.save(save_dict, path)\n\n\ndef random_caption_word(tokens, tokens_mask, tokenizer, vocab, mask_ratio):\n    output_label = []\n    output_tokens = tokens.clone()\n    for i, token in enumerate(tokens): # 101 cls 102 sep use them as SOS and EOS token\n        if tokens_mask[i] == 0 or token == 101:\n            output_label.append(-1)\n        elif token == 102:\n            output_tokens[i] = tokenizer.mask_token_id\n            output_label.append(vocab.token_to_id('[EOS]'))\n        else:\n            prob = random.random()\n            # mask token with 15% probability\n            if prob < mask_ratio:\n                output_tokens[i] = tokenizer.mask_token_id\n                output_label.append(vocab.token_to_id(tokenizer.decode([tokens[i]])))\n            else:\n                # no masking token (will be ignored by loss function later)\n                output_label.append(-1)\n    output_label = torch.Tensor(output_label).long()\n    return output_tokens, output_label\n\n\ndef clean_answer(data):\n    data = data.lower()\n    data = re.sub('[ ]+$' ,'', data)\n    data = re.sub('^[ ]+' ,'', data)\n    data = re.sub(' {2,}', ' ', data)\n\n    data = re.sub('\\.[ ]{2,}', '. ', data)\n    data = re.sub('[^a-zA-Z0-9,\\'\\s\\-:]+', '', data)\n    data = re.sub('ç' ,'c', data)\n    data = re.sub('’' ,'\\'', data)\n    data = re.sub(r'\\bletf\\b' ,'left', data)\n    data = re.sub(r'\\blet\\b' ,'left', data)\n    data = re.sub(r'\\btehre\\b' ,'there', data)\n    data = re.sub(r'\\brigth\\b' ,'right', data)\n    data = re.sub(r'\\brght\\b' ,'right', data)\n    data = re.sub(r'\\bbehine\\b', 'behind', data)\n    data = re.sub(r'\\btv\\b' ,'TV', data)\n    data = re.sub(r'\\bchai\\b' ,'chair', data)\n    data = re.sub(r'\\bwasing\\b' ,'washing', data)\n    data = re.sub(r'\\bwaslked\\b' ,'walked', data)\n    data = re.sub(r'\\boclock\\b' ,'o\\'clock', data)\n    data = re.sub(r'\\bo\\'[ ]+clock\\b' ,'o\\'clock', data)\n\n    # digit to word, only for answer\n    data = re.sub(r'\\b0\\b', 'zero', data)\n    data = re.sub(r'\\bnone\\b', 'zero', data)\n    data = re.sub(r'\\b1\\b', 'one', data)\n    data = re.sub(r'\\b2\\b', 'two', data)\n    data = re.sub(r'\\b3\\b', 'three', data)\n    data = re.sub(r'\\b4\\b', 'four', data)\n    data = re.sub(r'\\b5\\b', 'five', data)\n    data = re.sub(r'\\b6\\b', 'six', data)\n    data = re.sub(r'\\b7\\b', 'seven', data)\n    data = re.sub(r'\\b8\\b', 'eight', data)\n    data = re.sub(r'\\b9\\b', 'nine', data)\n    data = re.sub(r'\\b10\\b', 'ten', data)\n    data = re.sub(r'\\b11\\b', 'eleven', data)\n    data = re.sub(r'\\b12\\b', 'twelve', data)\n    data = re.sub(r'\\b13\\b', 'thirteen', data)\n    data = re.sub(r'\\b14\\b', 'fourteen', data)\n    data = re.sub(r'\\b15\\b', 'fifteen', data)\n    data = re.sub(r'\\b16\\b', 'sixteen', data)\n    data = re.sub(r'\\b17\\b', 'seventeen', data)\n    data = re.sub(r'\\b18\\b', 'eighteen', data)\n    data = re.sub(r'\\b19\\b', 'nineteen', data)\n    data = re.sub(r'\\b20\\b', 'twenty', data)\n    data = re.sub(r'\\b23\\b', 'twenty-three', data)\n\n    # misc\n    # no1, mat2, etc\n    data = 
re.sub(r'\\b([a-zA-Z]+)([0-9])\\b' ,r'\\g<1>', data)\n    data = re.sub(r'\\ba\\b ([a-zA-Z]+)' ,r'\\g<1>', data)\n    data = re.sub(r'\\ban\\b ([a-zA-Z]+)' ,r'\\g<1>', data)\n    data = re.sub(r'\\bthe\\b ([a-zA-Z]+)' ,r'\\g<1>', data)\n\n    data = re.sub(r'\\bbackwards\\b', 'backward', data)\n\n    return data\n\n\nif __name__ == \"__main__\":\n    path = \"/home/baoxiong/Desktop/gpt_gen_language.json\"\n    import json\n    with open(path, \"r\") as f:\n        data = json.load(f)\n        padded = per_scene_pad(data)\n        print(padded)\n"
  },
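  {
    "path": "examples/box_utils_example.py",
    "content": "\"\"\"Illustrative sketch (not part of the original release): a self-contained round\ntrip through the box helpers in data/data_utils.py. A random point cloud is\nreduced to an axis-aligned box, the box is expanded back to its eight corners,\nand box3d_iou of the box with itself comes out as ~1.0.\n\"\"\"\nimport numpy as np\n\nfrom data.data_utils import convert_pc_to_box, construct_bbox_corners, box3d_iou\n\nif __name__ == '__main__':\n    rng = np.random.default_rng(0)\n    obj_pc = rng.uniform(-1.0, 1.0, size=(1024, 3))         # fake object points\n    center, box_size = convert_pc_to_box(obj_pc)            # axis-aligned box\n    corners = construct_bbox_corners(center, box_size)      # (8, 3) corners\n    print('IoU with itself:', box3d_iou(corners, corners))  # ~1.0\n"
  },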
  {
    "path": "data/datasets/__init__.py",
    "content": "from .scannet import *\nfrom .rscan import *\nfrom .arkitscene import *\nfrom .hm import *\nfrom .multiscan import *\nfrom .procthor import *\nfrom .structure3d import *\n"
  },
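  {
    "path": "examples/registry_example.py",
    "content": "\"\"\"Illustrative sketch (not part of the original release): the registration\npattern used by this package. Importing data.datasets executes the\n@DATASET_REGISTRY.register() decorators (e.g. ARKitSceneSpatialRefer in\narkitscene.py), after which get_dataset in data/build.py can look classes up\nby name. The toy registry below reproduces that mechanism with fvcore's\nRegistry.\n\"\"\"\nfrom fvcore.common.registry import Registry\n\nTOY_REGISTRY = Registry('toy_dataset')\n\n\n@TOY_REGISTRY.register()\nclass ToyDataset:\n    def __init__(self, cfg, split):\n        self.cfg, self.split = cfg, split\n\n\nif __name__ == '__main__':\n    # The same lookup-by-name call that data/build.py performs per dataset name\n    dataset = TOY_REGISTRY.get('ToyDataset')(cfg=None, split='train')\n    print(type(dataset).__name__, dataset.split)\n"
  },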
  {
    "path": "data/datasets/arkitscene.py",
    "content": "import collections\n\nfrom ..build import DATASET_REGISTRY\nfrom .base import ScanBase\n\n\n@DATASET_REGISTRY.register()\nclass ARKitScenePretrainObj(ScanBase):\n    def __init__(self, cfg, split):\n        super(ARKitScenePretrainObj, self).__init__(cfg, split)\n        self.base_dir = cfg.data.arkitscene_base\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        self.scan_ids = sorted(list(self._load_split(self.split)))\n        if cfg.debug.flag and cfg.debug.debug_size != -1:\n            self.scan_ids = self.scan_ids[:cfg.debug.debug_size]\n\n        print(f\"Loading ARKitScene {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        self.scan_ids = sorted(list(self.scan_data.keys()))\n        print(f\"Finish loading ARKitScene {split}-set scans of length {len(self.scan_ids)}\")\n\n    def __len__(self):\n        return len(self.scan_ids)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_obj_pretrain(index)\n        dataset = 'arkitscene'\n        data_dict['source'] = dataset\n        return data_dict\n\n\n@DATASET_REGISTRY.register()\nclass ARKitSceneSpatialRefer(ScanBase):\n    def __init__(self, cfg, split):\n        super(ARKitSceneSpatialRefer, self).__init__(cfg, split)\n        self.base_dir = cfg.data.arkitscene_base\n        self.max_obj_len = cfg.data.args.max_obj_len - 1\n        self.filter_lang = cfg.data.args.filter_lang\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        split_cfg = cfg.data.get(self.__class__.__name__).get(split)\n        all_scan_ids = self._load_split(self.split)\n\n        print(f\"Loading ARKitScene {split}-set language\")\n        self.lang_data, self.scan_ids = self._load_lang(split_cfg, all_scan_ids)\n        print(f\"Finish loading ARKitScene {split}-set language of size {self.__len__()}\")\n\n        print(f\"Loading ARKitScene {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        print(f\"Finish loading ARKitScene {split}-set scans\")\n\n         # build unique multiple look up\n        for scan_id in self.scan_ids:\n            inst_labels = self.scan_data[scan_id]['inst_labels']\n            self.scan_data[scan_id]['label_count'] = collections.Counter(\n                                    [l for l in inst_labels])\n            self.scan_data[scan_id]['label_count_multi'] = collections.Counter(\n                                    [self.label_converter.id_to_scannetid[l] for l in inst_labels])\n\n    def __len__(self):\n        return len(self.lang_data)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = 
self._getitem_refer(index)\n        return data_dict\n"
  },
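  {
    "path": "examples/masking_example.py",
    "content": "\"\"\"Illustrative sketch (not part of the original release): exercises the masking\nhelpers from data/data_utils.py that back the MaskDatasetWrapper configs\n(txt_mask_ratio: 0.15, pc_mask_ratio: 0.1). random_point_cloud needs no extra\ndependencies; the text-side call is left as a comment because it additionally\nrequires a HuggingFace BERT tokenizer.\n\"\"\"\nimport torch\n\nfrom data.data_utils import random_point_cloud\n\nif __name__ == '__main__':\n    torch.manual_seed(0)\n    pcd = torch.randn(8, 1024, 6)               # 8 object point clouds\n    pcd_mask = torch.ones(8, dtype=torch.long)  # all 8 objects are valid\n    keep = random_point_cloud(pcd, pcd_mask, mask_ratio=0.1)\n    print('kept objects:', keep.sum().item(), 'of', len(keep))\n    # Text masking works analogously, e.g.:\n    # from transformers import AutoTokenizer\n    # from data.data_utils import random_word\n    # tok = AutoTokenizer.from_pretrained('bert-base-uncased')\n    # enc = tok('the chair next to the table', return_tensors='pt')\n    # txt, lbl = random_word(enc.input_ids[0], enc.attention_mask[0], tok, 0.15)\n"
  },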
  {
    "path": "data/datasets/base.py",
    "content": "import os\nimport copy\nimport json\nimport jsonlines\nimport random\n\nfrom tqdm import tqdm\nimport numpy as np\nfrom scipy import sparse\nimport torch\nfrom torch.utils.data import Dataset\n\nfrom ..data_utils import LabelConverter, build_rotate_mat\nfrom ..data_utils import convert_pc_to_box, construct_bbox_corners, \\\n                            merge_tokens, eval_ref_one_sample, is_explicitly_view_dependent\nfrom .data_augmentor import DataAugmentor\nfrom .constant import CLASS_LABELS_200\n\n\nclass ScanBase(Dataset):\n    def __init__(self, cfg, split):\n        self.cfg = cfg\n        self.split = split\n        self.pc_type = cfg.data.args.pc_type\n        self.max_obj_len = cfg.data.args.max_obj_len\n        self.num_points = cfg.data.args.num_points\n        self.rot_aug = cfg.data.args.rot_aug\n        self.aug_cfg = getattr(cfg, 'data_aug', None)\n        self.debug = cfg.debug.flag\n        self.debug_size = cfg.debug.debug_size\n        self.subset_ratio = getattr(cfg.data.args, 'subset_ratio', 0)\n        if self.aug_cfg:\n            self.augmentor = DataAugmentor(self.aug_cfg, self.split)\n        self.scannet_dir = cfg.data.scan_family_base\n\n        assert self.split in ['train', 'val', 'test']\n        if self.split == 'train':\n            self.pc_type = 'gt'\n        # TODO: hack test split to be the same as val\n        if self.split == 'test':\n            self.split = 'val'\n\n        self.int2cat = json.load(open(os.path.join(self.scannet_dir,\n                                            \"annotations/meta_data/scannetv2_raw_categories.json\"),\n                                            'r', encoding=\"utf-8\"))\n        self.cat2int = {w: i for i, w in enumerate(self.int2cat)}\n        self.label_converter = LabelConverter(os.path.join(self.scannet_dir,\n                                            \"annotations/meta_data/scannetv2-labels.combined.tsv\"))\n\n        self.use_scene_cap = getattr(cfg.data.args, 'use_scene_cap', False)\n\n    def _load_split(self, split):\n        # TODO: temporarily reproducing\n        # split_file = os.path.join(self.base_dir, 'annotations/splits/'+ split + \"_split_non_overlap.txt\")\n        if 'scannet' in self.__class__.__name__.lower():\n            split_file = os.path.join(self.base_dir, 'annotations/splits/scannetv2_'+ split + \".txt\")\n        else:\n            split_file = os.path.join(self.base_dir, 'annotations/splits/'+ split + \"_split.txt\")\n\n        scan_ids = {x.strip() for x in open(split_file, 'r', encoding=\"utf-8\")}\n        scan_ids = sorted(scan_ids)\n\n        return scan_ids\n\n    def _load_scan(self, scan_ids, filter_bkg=False):\n        scans = {}\n        for scan_id in tqdm(scan_ids):\n            pcd_path = os.path.join(self.base_dir, 'scan_data', 'pcd_with_global_alignment', f'{scan_id}.pth')\n            inst2label_path = os.path.join(self.base_dir, 'scan_data', 'instance_id_to_label', f'{scan_id}.pth')\n\n            if not os.path.exists(pcd_path):\n                continue\n            pcd_data = torch.load(pcd_path)\n            points, colors, instance_labels = pcd_data[0], pcd_data[1], pcd_data[-1]\n            colors = colors / 127.5 - 1\n            pcds = np.concatenate([points, colors], 1)\n            # build obj_pcds\n            inst_to_label = torch.load(inst2label_path)\n            obj_pcds = []\n            inst_ids = []\n            inst_labels = []\n            bg_indices = np.full((points.shape[0], ), 1, dtype=np.bool_)\n            for inst_id in 
inst_to_label.keys():\n                if inst_to_label[inst_id] in self.cat2int.keys():\n                    mask = instance_labels == inst_id\n                    if np.sum(mask) == 0:\n                        continue\n                    obj_pcds.append(pcds[mask])\n                    inst_ids.append(inst_id)\n                    inst_labels.append(self.cat2int[inst_to_label[inst_id]])\n                    if inst_to_label[inst_id] not in ['wall', 'floor', 'ceiling']:\n                        bg_indices[mask] = False\n            if filter_bkg:\n                selected_obj_idxs = [i for i, obj_label in enumerate(inst_labels)\n                                if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])]\n                if len(selected_obj_idxs) == 0:\n                    continue\n            scans[scan_id] = {}\n            # scans[scan_id]['scene_pcds'] = pcds\n            scans[scan_id]['obj_pcds'] = obj_pcds\n            scans[scan_id]['inst_labels'] = inst_labels\n            scans[scan_id]['inst_ids'] = inst_ids\n            scans[scan_id]['bg_pcds'] = pcds[bg_indices]\n            # calculate box for matching\n            obj_center = []\n            obj_box_size = []\n            for obj_pcd in obj_pcds:\n                _c, _b = convert_pc_to_box(obj_pcd)\n                obj_center.append(_c)\n                obj_box_size.append(_b)\n            scans[scan_id]['obj_center'] = obj_center\n            scans[scan_id]['obj_box_size'] = obj_box_size\n\n            # load pred pcds\n            obj_mask_path = os.path.join(self.base_dir, \"mask\", str(scan_id) + \".mask\" + \".npz\")\n            if os.path.exists(obj_mask_path):\n                obj_label_path = os.path.join(self.base_dir, \"mask\", str(scan_id) + \".label\" + \".npy\")\n                obj_pcds = []\n                obj_mask = np.array(sparse.load_npz(obj_mask_path).todense())[:50, :]\n                obj_labels = np.load(obj_label_path)[:50]\n                obj_l = []\n                bg_indices = np.full((pcds.shape[0], ), 1, dtype=np.bool_)\n                for i in range(obj_mask.shape[0]):\n                    mask = obj_mask[i]\n                    if pcds[mask == 1, :].shape[0] > 0:\n                        obj_pcds.append(pcds[mask == 1, :])\n                        obj_l.append(obj_labels[i])\n                        # if not self.int2cat[obj_labels[i]] in ['wall', 'floor', 'ceiling']:\n                        bg_indices[mask == 1] = False\n                scans[scan_id]['obj_pcds_pred'] = obj_pcds\n                scans[scan_id]['inst_labels_pred'] = obj_l\n                scans[scan_id]['bg_pcds_pred'] = pcds[bg_indices]\n                # calculate box for pred\n                obj_center_pred = []\n                obj_box_size_pred = []\n                for obj_pcd in obj_pcds:\n                    _c, _b = convert_pc_to_box(obj_pcd)\n                    obj_center_pred.append(_c)\n                    obj_box_size_pred.append(_b)\n                scans[scan_id]['obj_center_pred'] = obj_center_pred\n                scans[scan_id]['obj_box_size_pred'] = obj_box_size_pred\n        return scans\n\n    def _load_lang(self, cfg, scan_ids):\n        caption_source = cfg.sources\n        json_data = []\n        lang_data = []\n        valid_scan_ids = []\n\n        if self.use_scene_cap:\n            scene_cap_file = os.path.join(self.base_dir, 'annotations/scene_cap.json')\n            if not os.path.exists(scene_cap_file):\n                self.scene_caps = {}\n            else:\n    
            with open(scene_cap_file, 'r') as f:\n                    self.scene_caps = json.load(f)\n        else:\n            self.scene_caps = None\n\n        for anno_type in caption_source:\n            if anno_type == 'anno':\n                anno_file = os.path.join(self.base_dir, 'annotations/anno.json')\n                json_data.extend(json.load(open(anno_file, 'r', encoding='utf-8')))\n            elif anno_type == 'referit3d':\n                for anno_type in cfg.referit3d.anno_type:\n                    anno_file = os.path.join(self.base_dir,\n                                             f'annotations/refer/{anno_type}.jsonl')\n                    with jsonlines.open(anno_file, 'r') as _f:\n                        for item in _f:\n                            if len(item['tokens']) <= 24:\n                                json_data.append(item)\n                if cfg.referit3d.sr3d_plus_aug:\n                    anno_file = os.path.join(self.base_dir, 'annotations/refer/sr3d+.jsonl')\n                    with jsonlines.open(anno_file, 'r') as _f:\n                        for item in _f:\n                            if len(item['tokens']) <= 24:\n                                json_data.append(item)\n            elif anno_type == 'scanrefer':\n                anno_file = os.path.join(self.base_dir, 'annotations/refer/scanrefer.jsonl')\n                with jsonlines.open(anno_file, 'r') as _f:\n                    for item in _f:\n                        json_data.append(item)\n            elif anno_type == 'sgrefer':\n                for anno_type in cfg.sgrefer.anno_type:\n                    anno_file = os.path.join(self.base_dir,\n                                             f'annotations/refer/ssg_ref_{anno_type}.json')\n                    json_data.extend(json.load(open(anno_file, 'r', encoding='utf-8')))\n            elif anno_type == 'sgcaption':\n                for anno_type in cfg.sgcaption.anno_type:\n                    anno_file = os.path.join(self.base_dir,\n                                             f'annotations/refer/ssg_obj_caption_{anno_type}.json')\n                    json_data.extend(json.load(open(anno_file, 'r', encoding='utf-8')))\n            else:\n                if 'obj_caption' in anno_type:\n                    anno_file = os.path.join(self.base_dir, f'annotations/ssg_{anno_type}.json')\n                else:\n                    anno_file = os.path.join(self.base_dir, f'annotations/ssg_ref_{anno_type}.json')\n                json_data.extend(json.load(open(anno_file, 'r', encoding='utf-8')))\n        for item in json_data:\n            if item['scan_id'] in scan_ids and item['instance_type'] not in ['wall', 'floor', 'ceiling']:\n                lang_data.append(item)\n                if item['scan_id'] not in valid_scan_ids:\n                    valid_scan_ids.append(item['scan_id'])\n        valid_scan_ids = sorted(valid_scan_ids)\n        if self.subset_ratio > 0:\n            valid_scan_ids = valid_scan_ids[:int(self.subset_ratio*len(valid_scan_ids))]\n            lang_data = [item for item in lang_data if item['scan_id'] in valid_scan_ids]\n\n        if self.debug and self.debug_size != -1:\n            valid_scan_ids = valid_scan_ids[:self.debug_size]\n            lang_data = [item for item in lang_data if item['scan_id'] in valid_scan_ids]\n\n        return lang_data, valid_scan_ids\n\n    def _getitem_pretrain(self, index, is_rscan=False):\n        item = self.lang_data[index]\n        scan_id = item['scan_id']\n        if is_rscan 
and 'sentence' in item:  # dict membership; hasattr() never matches dict keys\n            sentence = item['sentence']\n        else:\n            sentence = item['utterance']\n\n        # load pcds and labels\n        if self.pc_type == 'gt':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds'] # N, 6\n            obj_labels = self.scan_data[scan_id]['inst_labels'] # N\n        elif self.pc_type == 'pred':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds_pred']\n            obj_labels = self.scan_data[scan_id]['inst_labels_pred']\n\n        # filter out background\n        selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                                if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])]\n        obj_pcds = [obj_pcds[id] for id in selected_obj_idxs]\n        obj_labels = [obj_labels[id] for id in selected_obj_idxs]\n\n        # crop objects\n        if self.max_obj_len < len(obj_pcds):\n            remained_obj_idx = [i for i in range(len(obj_pcds))]\n            random.shuffle(remained_obj_idx)\n            selected_obj_idxs = remained_obj_idx[:self.max_obj_len]\n            # reorganize ids\n            obj_pcds = [obj_pcds[i] for i in selected_obj_idxs]\n            obj_labels = [obj_labels[i] for i in selected_obj_idxs]\n            assert len(obj_pcds) == self.max_obj_len\n\n        if not self.aug_cfg:\n            obj_fts, obj_locs, _, obj_labels = self._obj_processing_post(obj_pcds, obj_labels,\n                                                                         is_need_bbox=True,\n                                                                         rot_aug=self.rot_aug)\n        else:\n            obj_fts, obj_locs, _, obj_labels = self._obj_processing_aug(obj_pcds, obj_labels,\n                                                                        is_need_bbox=True)\n\n        data_dict = {'scan_id': scan_id,\n                     'sentence': sentence,\n                     'obj_fts': obj_fts,\n                     'obj_locs': obj_locs,\n                     'obj_labels': obj_labels}\n\n        return data_dict\n\n    def _getitem_obj_pretrain(self, index):\n        scan_id = self.scan_ids[index]\n        sentence = 'placeholder'\n\n        # load pcds and labels\n        if self.pc_type == 'gt':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds'] # N, 6\n            obj_labels = self.scan_data[scan_id]['inst_labels'] # N\n        elif self.pc_type == 'pred':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds_pred']\n            obj_labels = self.scan_data[scan_id]['inst_labels_pred']\n\n        # filter out background\n        selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                             if (self.int2cat[obj_label] in CLASS_LABELS_200) and (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])]\n        obj_pcds = [obj_pcds[id] for id in selected_obj_idxs]\n        obj_labels = [obj_labels[id] for id in selected_obj_idxs]\n\n        # crop objects\n        if self.max_obj_len < len(obj_pcds):\n            remained_obj_idx = [i for i in range(len(obj_pcds))]\n            random.shuffle(remained_obj_idx)\n            selected_obj_idxs = remained_obj_idx[:self.max_obj_len]\n            # reorganize ids\n            obj_pcds = [obj_pcds[i] for i in selected_obj_idxs]\n            obj_labels = [obj_labels[i] for i in selected_obj_idxs]\n            assert len(obj_pcds) == self.max_obj_len\n\n        if not self.load_scene_pcds:\n            if not self.aug_cfg:\n                obj_fts, obj_locs, _, 
obj_labels = self._obj_processing_post(obj_pcds,\n                                                                             obj_labels,\n                                                                             is_need_bbox=True,\n                                                                             rot_aug=self.rot_aug)\n            else:\n                obj_fts, obj_locs, _, obj_labels = self._obj_processing_aug(obj_pcds,\n                                                                            obj_labels,\n                                                                            is_need_bbox=True)\n        else:\n            assert self.aug_cfg\n            bg_pcds = self.scan_data[scan_id]['bg_pcds']\n            obj_locs, _, obj_labels, obj_pcds_masks, scene_pcds = self._scene_processing_aug(obj_pcds,\n                                                                                             bg_pcds,\n                                                                                             obj_labels,\n                                                                                             is_need_bbox=True)\n\n        if not self.load_scene_pcds:\n            data_dict = {'scan_id': scan_id,\n                        'sentence': sentence,\n                        'obj_fts': obj_fts,\n                        'obj_locs': obj_locs,\n                        'obj_labels': obj_labels}\n        else:\n            data_dict = {'scan_id': scan_id,\n                         'sentence': sentence, \n                         'obj_locs': obj_locs, \n                         'obj_labels': obj_labels, \n                         'obj_pcds_masks': obj_pcds_masks, \n                         'scene_pcds': scene_pcds}\n        return data_dict\n\n    def _getitem_refer(self, index):\n        item = self.lang_data[index]\n        item_id = item['item_id']\n        scan_id =  item['scan_id']\n        tgt_object_instance = int(item['target_id'])\n        tgt_object_name = item['instance_type']\n        sentence = item['utterance']\n        is_view_dependent = is_explicitly_view_dependent(item['utterance'].split(' '))\n\n        if self.use_scene_cap:\n            scene_caps = self.scene_caps.get(scan_id)\n            if scene_caps is not None:\n                scene_caps = scene_caps['captions']\n                scene_cap = scene_caps[np.random.choice(len(scene_caps))]\n            else:\n                scene_cap = \"This is a scene.\"\n\n        # load pcds and labels\n        if self.pc_type == 'gt':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds'] # N, 6\n            obj_labels = self.scan_data[scan_id]['inst_labels'] # N\n            obj_ids = self.scan_data[scan_id]['inst_ids'] # N\n            assert tgt_object_instance in obj_ids, str(tgt_object_instance) + ' not in ' + str(obj_ids) + '-' + scan_id\n            tgt_object_id = obj_ids.index(tgt_object_instance)\n        elif self.pc_type == 'pred':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds_pred']\n            # obj_labels = self.scan_data[scan_id]['inst_labels_pred']\n            # obj_ids = self.scan_data[scan_id]['inst_ids_pred'] # N\n            obj_labels = self.scan_data[scan_id]['inst_labels_pred']\n            # get obj labels by matching\n            gt_obj_labels = self.scan_data[scan_id]['inst_labels'] # N\n            obj_center = self.scan_data[scan_id]['obj_center']\n            obj_box_size = self.scan_data[scan_id]['obj_box_size']\n            obj_center_pred = 
self.scan_data[scan_id]['obj_center_pred']\n            obj_box_size_pred = self.scan_data[scan_id]['obj_box_size_pred']\n            for i, _ in enumerate(obj_center_pred):\n                for j, _ in enumerate(obj_center):\n                    if eval_ref_one_sample(construct_bbox_corners(obj_center[j],\n                                                                  obj_box_size[j]),\n                                           construct_bbox_corners(obj_center_pred[i],\n                                                                  obj_box_size_pred[i])) >= 0.25:\n                        obj_labels[i] = gt_obj_labels[j]\n                        break\n\n        # filter out background or language\n        if self.filter_lang:\n            if self.pc_type == 'gt':\n                selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                                if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])\n                                and (self.int2cat[obj_label] in sentence)]\n                if tgt_object_id not in selected_obj_idxs:\n                    selected_obj_idxs.append(tgt_object_id)\n            else:\n                selected_obj_idxs = [i for i in range(len(obj_pcds))]\n        else:\n            if self.pc_type == 'gt':\n                selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                                if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])]\n                if tgt_object_id not in selected_obj_idxs:\n                    selected_obj_idxs.append(tgt_object_id)\n            else:\n                selected_obj_idxs = [i for i in range(len(obj_pcds))]\n\n        obj_pcds = [obj_pcds[id] for id in selected_obj_idxs]\n        obj_labels = [obj_labels[id] for id in selected_obj_idxs]\n\n        # build tgt object id and box\n        if self.pc_type == 'gt':\n            tgt_object_id = selected_obj_idxs.index(tgt_object_id)\n            tgt_object_label = obj_labels[tgt_object_id]\n            tgt_object_id_iou25_list = [tgt_object_id]\n            tgt_object_id_iou50_list = [tgt_object_id]\n        elif self.pc_type == 'pred':\n            obj_ids = self.scan_data[scan_id]['inst_ids'] # N\n            tgt_object_id = obj_ids.index(tgt_object_instance)\n            gt_pcd = self.scan_data[scan_id][\"obj_pcds\"][tgt_object_id]\n            gt_center, gt_box_size = convert_pc_to_box(gt_pcd)\n            tgt_object_id = -1\n            tgt_object_id_iou25_list = []\n            tgt_object_id_iou50_list = []\n            tgt_object_label = self.cat2int[tgt_object_name]\n            # find tgt iou 25\n            for i, _ in enumerate(obj_pcds):\n                obj_center, obj_box_size = convert_pc_to_box(obj_pcds[i])\n                if eval_ref_one_sample(construct_bbox_corners(obj_center, obj_box_size),\n                                       construct_bbox_corners(gt_center, gt_box_size)) >= 0.25:\n                    tgt_object_id = i\n                    tgt_object_id_iou25_list.append(i)\n            # find tgt iou 50\n            for i, _ in enumerate(obj_pcds):\n                obj_center, obj_box_size = convert_pc_to_box(obj_pcds[i])\n                if eval_ref_one_sample(construct_bbox_corners(obj_center, obj_box_size),\n                                       construct_bbox_corners(gt_center, gt_box_size)) >= 0.5:\n                    tgt_object_id_iou50_list.append(i)\n        assert len(obj_pcds) == len(obj_labels)\n\n        # crop objects\n        if 
self.max_obj_len < len(obj_pcds):\n            # select target first\n            if tgt_object_id != -1:\n                selected_obj_idxs = [tgt_object_id]\n            else:\n                selected_obj_idxs = []  # no IoU-matched target: start the crop list empty instead of reusing the stale filter indices\n            selected_obj_idxs.extend(tgt_object_id_iou25_list)\n            selected_obj_idxs.extend(tgt_object_id_iou50_list)\n            selected_obj_idxs = list(set(selected_obj_idxs))\n            # select object with same semantic class with tgt_object\n            remained_obj_idx = []\n            for kobj, klabel in enumerate(obj_labels):\n                if kobj not in selected_obj_idxs:\n                    if klabel == tgt_object_label:\n                        selected_obj_idxs.append(kobj)\n                    else:\n                        remained_obj_idx.append(kobj)\n                if len(selected_obj_idxs) == self.max_obj_len:\n                    break\n            if len(selected_obj_idxs) < self.max_obj_len:\n                random.shuffle(remained_obj_idx)\n                selected_obj_idxs += remained_obj_idx[:(self.max_obj_len - len(selected_obj_idxs))]\n            # reorganize ids\n            obj_pcds = [obj_pcds[i] for i in selected_obj_idxs]\n            obj_labels = [obj_labels[i] for i in selected_obj_idxs]\n            if tgt_object_id != -1:\n                tgt_object_id = selected_obj_idxs.index(tgt_object_id)\n            tgt_object_id_iou25_list = [selected_obj_idxs.index(id)\n                                        for id in tgt_object_id_iou25_list]\n            tgt_object_id_iou50_list = [selected_obj_idxs.index(id)\n                                        for id in tgt_object_id_iou50_list]\n            assert len(obj_pcds) == self.max_obj_len\n\n        # rebuild tgt_object_id\n        if tgt_object_id == -1:\n            tgt_object_id = len(obj_pcds)\n\n        if not self.load_scene_pcds:\n            if not self.aug_cfg:\n                obj_fts, obj_locs, obj_boxes, obj_labels = self._obj_processing_post(obj_pcds,\n                                                                                     obj_labels,\n                                                                                     is_need_bbox=True,\n                                                                                     rot_aug=self.rot_aug)\n            else:\n                obj_fts, obj_locs, obj_boxes, obj_labels = self._obj_processing_aug(obj_pcds,\n                                                                                    obj_labels,\n                                                                                    is_need_bbox=True)\n        else:\n            assert self.aug_cfg\n            if self.pc_type == 'pred':\n                bg_pcds = self.scan_data[scan_id]['bg_pcds_pred']\n            else:\n                bg_pcds = self.scan_data[scan_id]['bg_pcds']\n            obj_locs, obj_boxes, obj_labels, obj_pcds_masks, scene_pcds = self._scene_processing_aug(obj_pcds,\n                                                                                                     bg_pcds,\n                                                                                                     obj_labels,\n                                                                                                     is_need_bbox=True)\n\n        # build iou25 and iou50\n        tgt_object_id_iou25 = torch.zeros(len(obj_pcds) + 1).long()\n        tgt_object_id_iou50 = torch.zeros(len(obj_pcds) + 1).long()\n        for _id in tgt_object_id_iou25_list:\n            tgt_object_id_iou25[_id] = 1\n        for _id in 
tgt_object_id_iou50_list:\n            tgt_object_id_iou50[_id] = 1\n\n        # build unique multiple\n        is_multiple = self.scan_data[scan_id]['label_count_multi'][self.label_converter.id_to_scannetid\n                                                                  [tgt_object_label]] > 1\n        is_hard = self.scan_data[scan_id]['label_count'][tgt_object_label] > 2\n\n        data_dict = {\n            \"sentence\": sentence,\n            \"tgt_object_id\": torch.LongTensor([tgt_object_id]),  # 1\n            \"tgt_object_label\": torch.LongTensor([tgt_object_label]),  # 1\n            \"obj_locs\": obj_locs,  # N, 3\n            \"obj_labels\": obj_labels,  # N,\n            \"obj_boxes\": obj_boxes,  # N, 6\n            \"data_idx\": item_id,\n            \"tgt_object_id_iou25\": tgt_object_id_iou25,\n            \"tgt_object_id_iou50\": tgt_object_id_iou50,\n            'is_multiple': is_multiple,\n            'is_view_dependent': is_view_dependent,\n            'is_hard': is_hard\n        }\n        if self.load_scene_pcds:\n            data_dict[\"scene_pcds\"] = scene_pcds\n            data_dict[\"obj_pcds_masks\"] = obj_pcds_masks\n        else:\n            data_dict[\"obj_fts\"] = obj_fts  # N, 6\n\n        if self.use_scene_cap:\n            data_dict[\"scene_cap\"] = scene_cap\n        return data_dict\n\n    def _getitem_perscene(self, index):\n        item = self.lang_data[index]\n        scan_id =  item[0]['scan_id']\n        # load lang list\n        list_item_id = [_i['item_id'] for _i in item]\n        list_tgt_object_instance = [int(_i['target_id']) for _i in item]\n        list_tgt_object_name = [_i['instance_type'] for _i in item]\n        # (sentence, token_seq, token_mask)\n        list_sentence = [_i['utterance'][0] for _i in item]\n        list_token = [_i['utterance'][1] for _i in item]\n        list_mask = [_i['utterance'][2] for _i in item]\n        list_is_view_dependent = [is_explicitly_view_dependent(sentence.split(' ')) for sentence in list_sentence]\n\n        # load pcds and labels\n        if self.pc_type == 'gt':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds'] # N, 6\n            obj_labels = self.scan_data[scan_id]['inst_labels'] # N\n            obj_ids = self.scan_data[scan_id]['inst_ids'] # N\n        elif self.pc_type == 'pred':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds_pred']\n            obj_labels = self.scan_data[scan_id]['inst_labels_pred']\n            # get obj labels by matching\n            gt_obj_labels = self.scan_data[scan_id]['inst_labels'] # N\n            obj_center = self.scan_data[scan_id]['obj_center']\n            obj_box_size = self.scan_data[scan_id]['obj_box_size']\n            obj_center_pred = self.scan_data[scan_id]['obj_center_pred']\n            obj_box_size_pred = self.scan_data[scan_id]['obj_box_size_pred']\n            for i, _ in enumerate(obj_center_pred):\n                for j, _ in enumerate(obj_center):\n                    if eval_ref_one_sample(construct_bbox_corners(obj_center[j],\n                                                                  obj_box_size[j]),\n                                           construct_bbox_corners(obj_center_pred[i],\n                                                                  obj_box_size_pred[i])) >= 0.25:\n                        obj_labels[i] = gt_obj_labels[j]\n                        break\n\n        list_tgt_object_id = [obj_ids.index(_ins) for _ins in list_tgt_object_instance]\n\n        if self.pc_type == 'gt':\n            
selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                            if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])]\n        else:\n            selected_obj_idxs = [i for i in range(len(obj_pcds))]\n\n        obj_pcds = [obj_pcds[id] for id in selected_obj_idxs]\n        obj_labels = [obj_labels[id] for id in selected_obj_idxs]\n\n        # build tgt object id and box\n        list_tgt_object_label = []\n        list_tgt_object_id_iou25_list = []\n        list_tgt_object_id_iou50_list = []\n        list_is_multiple = []\n        list_is_hard = []\n        for idx, _ in enumerate(list_item_id):\n            item_id = list_item_id[idx]\n            tgt_object_id = list_tgt_object_id[idx]\n            tgt_object_name = list_tgt_object_name[idx]\n\n            if self.pc_type == 'gt':\n                tgt_object_id = selected_obj_idxs.index(tgt_object_id)\n                tgt_object_label = obj_labels[tgt_object_id]\n                tgt_object_id_iou25_list = [tgt_object_id]\n                tgt_object_id_iou50_list = [tgt_object_id]\n                assert self.int2cat[tgt_object_label] == tgt_object_name, str(self.int2cat[tgt_object_label]) + '-' + tgt_object_name\n            elif self.pc_type == 'pred':\n                gt_pcd = self.scan_data[scan_id][\"obj_pcds\"][tgt_object_id]\n                gt_center, gt_box_size = convert_pc_to_box(gt_pcd)\n                tgt_object_id = -1\n                tgt_object_id_iou25_list = []\n                tgt_object_id_iou50_list = []\n                tgt_object_label = self.cat2int[tgt_object_name]\n                # find tgt iou 25\n                for i, _ in enumerate(obj_pcds):\n                    obj_center, obj_box_size = convert_pc_to_box(obj_pcds[i])\n                    if eval_ref_one_sample(construct_bbox_corners(obj_center, obj_box_size),\n                                        construct_bbox_corners(gt_center, gt_box_size)) >= 0.25:\n                        tgt_object_id = i\n                        tgt_object_id_iou25_list.append(i)\n                # find tgt iou 50\n                for i, _ in enumerate(obj_pcds):\n                    obj_center, obj_box_size = convert_pc_to_box(obj_pcds[i])\n                    if eval_ref_one_sample(construct_bbox_corners(obj_center, obj_box_size),\n                                        construct_bbox_corners(gt_center, gt_box_size)) >= 0.5:\n                        tgt_object_id_iou50_list.append(i)\n            # build unique multiple; label_count_multi is keyed by ScanNet id, matching _getitem_refer\n            is_multiple = self.scan_data[scan_id]['label_count_multi'][self.label_converter.id_to_scannetid\n                                                                    [tgt_object_label]] > 1\n            is_hard = self.scan_data[scan_id]['label_count'][tgt_object_label] > 2\n\n            list_tgt_object_id[idx] = tgt_object_id\n            list_tgt_object_label.append(tgt_object_label)\n            list_tgt_object_id_iou25_list.append(tgt_object_id_iou25_list)\n            list_tgt_object_id_iou50_list.append(tgt_object_id_iou50_list)\n            list_is_multiple.append(is_multiple)\n            list_is_hard.append(is_hard)\n\n        # crop objects\n        if self.max_obj_len < len(obj_pcds):\n            # select target first\n            selected_obj_idxs = [x for x in list_tgt_object_id if x != -1]\n            for idx, _ in enumerate(list_tgt_object_id):\n                selected_obj_idxs.extend(list_tgt_object_id_iou25_list[idx])\n                
selected_obj_idxs.extend(list_tgt_object_id_iou50_list[idx])\n            selected_obj_idxs = list(set(selected_obj_idxs))\n            # select object with same semantic class with tgt_object\n            remained_obj_idx = []\n            for kobj, klabel in enumerate(obj_labels):\n                if kobj not in selected_obj_idxs:\n                    if klabel == tgt_object_label:\n                        selected_obj_idxs.append(kobj)\n                    else:\n                        remained_obj_idx.append(kobj)\n                if len(selected_obj_idxs) == self.max_obj_len:\n                    break\n            if len(selected_obj_idxs) < self.max_obj_len:\n                random.shuffle(remained_obj_idx)\n                selected_obj_idxs += remained_obj_idx[:(self.max_obj_len - len(selected_obj_idxs))]\n            # reorganize ids\n            obj_pcds = [obj_pcds[i] for i in selected_obj_idxs]\n            obj_labels = [obj_labels[i] for i in selected_obj_idxs]\n\n            list_tgt_object_id_tmp = []\n            for tgt_object_id in list_tgt_object_id:\n                list_tgt_object_id_tmp.append(selected_obj_idxs.index(tgt_object_id)  if tgt_object_id != -1 else -1)\n            list_tgt_object_id = list_tgt_object_id_tmp\n            list_tgt_object_id_iou25_list_tmp = []\n            for tgt_object_id_iou25_list in list_tgt_object_id_iou25_list:\n                list_tgt_object_id_iou25_list_tmp.append([selected_obj_idxs.index(id)\n                                        for id in tgt_object_id_iou25_list])\n            list_tgt_object_id_iou25_list = list_tgt_object_id_iou25_list_tmp\n            list_tgt_object_id_iou50_list_tmp = []\n            for tgt_object_id_iou50_list in list_tgt_object_id_iou50_list:\n                list_tgt_object_id_iou50_list_tmp.append([selected_obj_idxs.index(id)\n                                        for id in tgt_object_id_iou50_list])\n            list_tgt_object_id_iou50_list = list_tgt_object_id_iou50_list_tmp\n            assert len(obj_pcds) == self.max_obj_len\n\n        # rebuild tgt_object_id\n        for idx, _id in enumerate(list_tgt_object_id):\n            if _id == -1:\n                list_tgt_object_id[idx] = len(obj_pcds)\n\n        # build scene\n        assert self.aug_cfg\n        if self.pc_type == 'pred':\n            bg_pcds = self.scan_data[scan_id]['bg_pcds_pred']\n        else:\n            bg_pcds = self.scan_data[scan_id]['bg_pcds']\n        obj_locs, obj_boxes, obj_labels, obj_pcds_masks, scene_pcds = self._scene_processing_aug(obj_pcds,\n                                                                                                 bg_pcds,\n                                                                                                 obj_labels,\n                                                                                                 is_need_bbox=True)\n\n        # build iou25 and iou50\n        list_tgt_object_id_iou25 = torch.zeros((len(list_sentence), len(obj_pcds) + 1)).long()\n        list_tgt_object_id_iou50 = torch.zeros((len(list_sentence), len(obj_pcds) + 1)).long()\n        for _rid, tgt_id in enumerate(list_tgt_object_id_iou25_list):\n            for _cid in tgt_id:\n                list_tgt_object_id_iou25[_rid, _cid] = 1\n        for _rid, tgt_id in enumerate(list_tgt_object_id_iou50_list):\n            for _cid in tgt_id:\n                list_tgt_object_id_iou50[_rid, _cid] = 1\n\n        data_dict = {\n            \"sentence\": list_sentence,  # list, len L\n            
\"txt_ids\": list_token, # Tensor, L\n            \"txt_masks\": torch.LongTensor(list_mask), # Tensor, L\n            \"tgt_object_id\": torch.LongTensor(list_tgt_object_id), # Tensor, L\n            \"tgt_object_label\": torch.LongTensor(list_tgt_object_label), # Tensor, L\n            \"scene_pcds\": scene_pcds, # N_pts, 6\n            \"obj_locs\": obj_locs, # Tensor N, 6\n            \"obj_labels\": obj_labels, # Tensor, N\n            \"obj_boxes\": obj_boxes, # Tensor, N, 6\n            \"data_idx\": item_id, # str, 1\n            \"tgt_object_id_iou25\": list_tgt_object_id_iou25, # Tensor, (L, N+1)\n            \"tgt_object_id_iou50\": list_tgt_object_id_iou50, # Tensor, (L, N+1)\n            \"is_multiple\": torch.LongTensor(list_is_multiple), # list, L\n            \"is_view_dependent\": torch.LongTensor(list_is_view_dependent), # List, L\n            \"is_hard\": torch.LongTensor(list_is_hard), # List, L\n            \"obj_pcds_masks\": obj_pcds_masks # Tensor, (N, 1024)\n        }\n        return data_dict\n\n    def _obj_processing_post(self, obj_pcds, obj_labels, is_need_bbox=False, rot_aug=False):\n        # rotate obj\n        rot_matrix = build_rotate_mat(self.split, rot_aug)\n\n        # normalize pc and calculate location\n        obj_fts = []\n        obj_locs = []\n        obj_boxes = []\n        for obj_pcd in obj_pcds:\n            # build locs\n            if rot_matrix is not None:\n                obj_pcd[:, :3] = np.matmul(obj_pcd[:, :3], rot_matrix.transpose())\n            obj_center = obj_pcd[:, :3].mean(0)\n            obj_size = obj_pcd[:, :3].max(0) - obj_pcd[:, :3].min(0)\n            obj_locs.append(np.concatenate([obj_center, obj_size], 0))\n\n            # build box\n            if is_need_bbox:\n                obj_box_center = (obj_pcd[:, :3].max(0) + obj_pcd[:, :3].min(0)) / 2\n                obj_box_size = obj_pcd[:, :3].max(0) - obj_pcd[:, :3].min(0)\n                obj_boxes.append(np.concatenate([obj_box_center, obj_box_size], 0))\n\n            # subsample\n            pcd_idxs = np.random.choice(len(obj_pcd), size=self.num_points,\n                                        replace=len(obj_pcd) < self.num_points)\n            obj_pcd = obj_pcd[pcd_idxs]\n            # normalize\n            obj_pcd[:, :3] = obj_pcd[:, :3] - obj_pcd[:, :3].mean(0)\n            max_dist = np.max(np.sqrt(np.sum(obj_pcd[:, :3]**2, 1)))\n            if max_dist < 1e-6: # take care of tiny point-clouds, i.e., padding\n                max_dist = 1\n            obj_pcd[:, :3] = obj_pcd[:, :3] / max_dist\n            obj_fts.append(obj_pcd)\n\n        # convert to torch\n        obj_fts = torch.from_numpy(np.stack(obj_fts, 0))\n        obj_locs = torch.from_numpy(np.array(obj_locs))\n        obj_boxes = torch.from_numpy(np.array(obj_boxes))\n        obj_labels = torch.LongTensor(obj_labels)\n\n        assert obj_labels.shape[0] == obj_locs.shape[0]\n        assert obj_fts.shape[0] == obj_locs.shape[0]\n\n        return obj_fts, obj_locs, obj_boxes, obj_labels\n\n    def _obj_processing_aug(self, obj_pcds, obj_labels, is_need_bbox=False):\n        # augment objects\n        if self.augmentor:\n            data_dict = self.augmentor.forward({'obj_pcds': obj_pcds,\n                                                'num_points': self.num_points})\n\n        obj_pcds = data_dict['obj_pcds']\n        if isinstance(obj_pcds, list):\n            obj_pcds = torch.Tensor(np.array(obj_pcds))\n        obj_sizes = torch.Tensor(np.array(data_dict['obj_sizes']))\n\n        xyz = 
obj_pcds[:, :, :3]\n        center = xyz.mean(1)\n        xyz_min = xyz.min(1).values\n        xyz_max = xyz.max(1).values\n        box_center = (xyz_min + xyz_max) / 2\n        size = torch.Tensor(obj_sizes)\n        # size = xyz_max - xyz_min\n        obj_locs = torch.cat([center, size], dim=1)\n        obj_boxes = torch.cat([box_center, size], dim=1)\n\n        # centering\n        obj_pcds[:, :, :3].sub_(obj_pcds[:, :, :3].mean(1, keepdim=True))\n\n        # normalization\n        max_dist = (obj_pcds[:, :, :3]**2).sum(2).sqrt().max(1).values\n        max_dist.clamp_(min=1e-6)\n        obj_pcds[:, :, :3].div_(max_dist[:, None, None])\n\n        # convert to torch\n        obj_labels = torch.LongTensor(obj_labels)\n\n        assert obj_labels.shape[0] == obj_locs.shape[0]\n\n        return obj_pcds, obj_locs, obj_boxes, obj_labels\n\n    def _scene_processing_aug(self, obj_pcds, bg_pcds, obj_labels, is_need_bbox=False):\n        obj_len = len(obj_pcds)\n        # sample background points\n        fg_points_num = len(obj_pcds) * self.num_points\n        assert fg_points_num < self.max_pcd_num_points\n        bg_points_num = min(self.max_pcd_num_points - fg_points_num, self.bg_points_num)\n        assert len(bg_pcds) > 0\n        assert bg_points_num > 0\n        bg_points_indices = np.random.choice(len(bg_pcds), size=bg_points_num, replace=len(bg_pcds) < bg_points_num)\n        bg_pcds = bg_pcds[bg_points_indices]\n\n        # augment objects\n        if self.augmentor:\n            data_dict = self.augmentor.forward({'obj_pcds': obj_pcds,\n                                                'bg_pcds': torch.Tensor(bg_pcds), \n                                                'num_points': self.num_points})\n\n        obj_pcds = data_dict['obj_pcds']\n        if isinstance(obj_pcds, list):\n            obj_pcds = torch.Tensor(np.array(obj_pcds))\n        obj_sizes = torch.Tensor(np.array(data_dict['obj_sizes']))\n        bg_pcds = data_dict['bg_pcds']\n        assert len(obj_pcds) * obj_pcds[0].shape[0] == fg_points_num\n\n        scene_pcds = np.vstack([np.array(obj_pcds.reshape(-1, 6)), np.array(bg_pcds)])\n\n        xyz = obj_pcds[:, :, :3]\n        center = xyz.mean(1)\n        xyz_min = xyz.min(1).values\n        xyz_max = xyz.max(1).values\n        box_center = (xyz_min + xyz_max) / 2\n        size = torch.Tensor(obj_sizes)\n        # size = xyz_max - xyz_min\n        obj_locs = torch.cat([center, size], dim=1)\n        obj_boxes = torch.cat([box_center, size], dim=1)\n\n        # centering\n        obj_pcds[:, :, :3].sub_(obj_pcds[:, :, :3].mean(1, keepdim=True))\n\n        # normalization\n        max_dist = (obj_pcds[:, :, :3]**2).sum(2).sqrt().max(1).values\n        max_dist.clamp_(min=1e-6)\n        obj_pcds[:, :, :3].div_(max_dist[:, None, None])\n\n        # generate obj point indices masks\n        obj_pcds_masks = []\n        offset = 0\n        for _j in range(obj_len):\n            mask = np.arange(self.num_points) + offset\n            assert len(mask) == len(obj_pcds[_j])\n            obj_pcds_masks.append(mask)\n            offset += self.num_points\n\n        # convert to torch\n        obj_labels = torch.LongTensor(obj_labels)\n        obj_pcds_masks = torch.from_numpy(np.array(obj_pcds_masks))\n\n        assert obj_labels.shape[0] == obj_locs.shape[0]\n        assert obj_pcds_masks.shape[0] == obj_locs.shape[0]\n\n        return obj_locs, obj_boxes, obj_labels, obj_pcds_masks, scene_pcds\n\n    def _getitem_finalrefer(self, index):\n        item = self.lang_data[index]\n 
       item_id = item['item_id']\n        scan_id = item['scan_id']\n        tgt_object_instance = int(item['target_id'])\n        tgt_object_name = item['instance_type']\n        sentence = item['utterance']\n        is_view_dependent = is_explicitly_view_dependent(item['utterance'].split(' '))\n\n        txt_ids = item['txt_ids']\n        txt_masks = item['txt_masks']\n        if self.use_scene_cap:\n            scene_caps = self.scene_caps.get(scan_id)\n            if scene_caps is not None:\n                scene_cap = copy.deepcopy(scene_caps[np.random.choice(len(scene_caps))])\n            else:\n                scene_cap = copy.deepcopy(self.default_scene_cap)\n        if self.use_scene_cap:\n            scene_txt_ids = scene_cap['scene_txt_ids']\n            scene_txt_masks = scene_cap[\"scene_txt_masks\"]\n            scene_txt_ids, scene_txt_masks = merge_tokens(\n                scene_txt_ids, scene_txt_masks, txt_ids, txt_masks,\n                max_len=self.max_scene_cap_len, tokenizer=self.tokenizer\n            )\n\n        # load pcds and labels\n        if self.pc_type == 'gt':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds']  # N, 6\n            obj_labels = self.scan_data[scan_id]['inst_labels']  # N\n            obj_ids = self.scan_data[scan_id]['inst_ids']  # N\n        elif self.pc_type == 'pred':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds_pred']\n            obj_labels = self.scan_data[scan_id]['inst_labels_pred']\n            obj_ids = self.scan_data[scan_id]['inst_ids_pred']  # N\n\n        assert tgt_object_instance in obj_ids, str(tgt_object_instance) + ' not in ' + str(obj_ids) + '-' + scan_id\n        tgt_object_id = obj_ids.index(tgt_object_instance)\n        # filter out background or language\n        if self.filter_lang:\n            if self.pc_type == 'gt':\n                selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                                     if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])\n                                     and (self.int2cat[obj_label] in sentence)]\n                if tgt_object_id not in selected_obj_idxs:\n                    selected_obj_idxs.append(tgt_object_id)\n            else:\n                selected_obj_idxs = [i for i in range(len(obj_pcds))]\n        else:\n            if self.pc_type == 'gt':\n                selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                                     if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])]\n                if tgt_object_id not in selected_obj_idxs:\n                    selected_obj_idxs.append(tgt_object_id)\n            else:\n                selected_obj_idxs = [i for i in range(len(obj_pcds))]\n\n        obj_pcds = [obj_pcds[id] for id in selected_obj_idxs]\n        obj_labels = [obj_labels[id] for id in selected_obj_idxs]\n\n        # build tgt object id and box\n        if self.pc_type == 'gt':\n            tgt_object_id = selected_obj_idxs.index(tgt_object_id)\n            tgt_object_label = obj_labels[tgt_object_id]\n            tgt_object_id_iou25_list = [tgt_object_id]\n            tgt_object_id_iou50_list = [tgt_object_id]\n        elif self.pc_type == 'pred':\n            gt_pcd = self.scan_data[scan_id][\"obj_pcds\"][tgt_object_id]\n            gt_center, gt_box_size = convert_pc_to_box(gt_pcd)\n            tgt_object_id = -1\n            tgt_object_id_iou25_list = []\n            tgt_object_id_iou50_list = []\n            tgt_object_label = 
self.cat2int[tgt_object_name]\n            # find tgt iou 25\n            for i, _ in enumerate(obj_pcds):\n                obj_center, obj_box_size = convert_pc_to_box(obj_pcds[i])\n                if eval_ref_one_sample(construct_bbox_corners(obj_center, obj_box_size),\n                                       construct_bbox_corners(gt_center, gt_box_size)) >= 0.25:\n                    tgt_object_id = i\n                    tgt_object_id_iou25_list.append(i)\n            # find tgt iou 50\n            for i, _ in enumerate(obj_pcds):\n                obj_center, obj_box_size = convert_pc_to_box(obj_pcds[i])\n                if eval_ref_one_sample(construct_bbox_corners(obj_center, obj_box_size),\n                                       construct_bbox_corners(gt_center, gt_box_size)) >= 0.5:\n                    tgt_object_id_iou50_list.append(i)\n        assert len(obj_pcds) == len(obj_labels)\n\n        # crop objects\n        if self.max_obj_len < len(obj_pcds):\n            # select target first\n            if tgt_object_id != -1:\n                selected_obj_idxs = [tgt_object_id]\n            else:\n                selected_obj_idxs = []  # no IoU-matched target: start the crop list empty instead of reusing the stale filter indices\n            selected_obj_idxs.extend(tgt_object_id_iou25_list)\n            selected_obj_idxs.extend(tgt_object_id_iou50_list)\n            selected_obj_idxs = list(set(selected_obj_idxs))\n            # select object with same semantic class with tgt_object\n            remained_obj_idx = []\n            for kobj, klabel in enumerate(obj_labels):\n                if kobj not in selected_obj_idxs:\n                    if klabel == tgt_object_label:\n                        selected_obj_idxs.append(kobj)\n                    else:\n                        remained_obj_idx.append(kobj)\n                if len(selected_obj_idxs) == self.max_obj_len:\n                    break\n            if len(selected_obj_idxs) < self.max_obj_len:\n                random.shuffle(remained_obj_idx)\n                selected_obj_idxs += remained_obj_idx[:(self.max_obj_len - len(selected_obj_idxs))]\n            # reorganize ids\n            obj_pcds = [obj_pcds[i] for i in selected_obj_idxs]\n            obj_labels = [obj_labels[i] for i in selected_obj_idxs]\n            if tgt_object_id != -1:\n                tgt_object_id = selected_obj_idxs.index(tgt_object_id)\n            tgt_object_id_iou25_list = [selected_obj_idxs.index(id)\n                                        for id in tgt_object_id_iou25_list]\n            tgt_object_id_iou50_list = [selected_obj_idxs.index(id)\n                                        for id in tgt_object_id_iou50_list]\n            assert len(obj_pcds) == self.max_obj_len\n\n        # rebuild tgt_object_id\n        if tgt_object_id == -1:\n            tgt_object_id = len(obj_pcds)\n\n        if not self.load_scene_pcds:\n            if not self.aug_cfg:\n                obj_fts, obj_locs, obj_boxes, obj_labels = self._obj_processing_post(obj_pcds,\n                                                                                     obj_labels,\n                                                                                     is_need_bbox=True,\n                                                                                     rot_aug=self.rot_aug)\n            else:\n                obj_fts, obj_locs, obj_boxes, obj_labels = self._obj_processing_aug(obj_pcds,\n                                                                                    obj_labels,\n                                                                                    is_need_bbox=True)\n        else:\n            
assert self.aug_cfg\n            if self.pc_type == 'pred':\n                bg_pcds = self.scan_data[scan_id]['bg_pcds_pred']\n            else:\n                bg_pcds = self.scan_data[scan_id]['bg_pcds']\n            obj_locs, obj_boxes, obj_labels, obj_pcds_masks, scene_pcds = self._scene_processing_aug(obj_pcds,\n                                                                                                     bg_pcds,\n                                                                                                     obj_labels,\n                                                                                                     is_need_bbox=True)\n\n        # build iou25 and iou50\n        tgt_object_id_iou25 = torch.zeros(len(obj_pcds) + 1).long()\n        tgt_object_id_iou50 = torch.zeros(len(obj_pcds) + 1).long()\n        for _id in tgt_object_id_iou25_list:\n            tgt_object_id_iou25[_id] = 1\n        for _id in tgt_object_id_iou50_list:\n            tgt_object_id_iou50[_id] = 1\n\n        # build unique multiple\n        is_multiple = self.scan_data[scan_id]['label_count'][tgt_object_label] > 1\n        is_hard = self.scan_data[scan_id]['label_count'][tgt_object_label] > 2\n\n        data_dict = {\n            \"sentence\": sentence,\n            \"txt_ids\": torch.LongTensor(txt_ids),\n            \"txt_masks\": torch.LongTensor(txt_masks),\n            \"tgt_object_id\": torch.LongTensor([tgt_object_id]),  # 1\n            \"tgt_object_label\": torch.LongTensor([tgt_object_label]),  # 1\n            \"obj_locs\": obj_locs,  # N, 3\n            \"obj_labels\": obj_labels,  # N,\n            \"obj_boxes\": obj_boxes,  # N, 6\n            \"data_idx\": item_id,\n            \"tgt_object_id_iou25\": tgt_object_id_iou25,\n            \"tgt_object_id_iou50\": tgt_object_id_iou50,\n            'is_multiple': is_multiple,\n            'is_view_dependent': is_view_dependent,\n            'is_hard': is_hard\n        }\n        if self.load_scene_pcds:\n            data_dict[\"scene_pcds\"] = scene_pcds\n            data_dict[\"obj_pcds_masks\"] = obj_pcds_masks\n        else:\n            data_dict[\"obj_fts\"] = obj_fts  # N, 6\n\n        if self.use_scene_cap:\n            data_dict[\"scene_cap\"] = scene_cap[\"scene_cap\"]\n            data_dict[\"scene_txt_ids\"] = torch.LongTensor(scene_txt_ids)\n            data_dict[\"scene_txt_masks\"] = torch.LongTensor(scene_txt_masks)\n        return data_dict\n"
  },
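  {
    "path": "examples/box_matching_sketch.py",
    "content": "# NOTE: hypothetical usage sketch, not part of the original release. It\n# illustrates the axis-aligned box-IoU matching that _getitem_refer performs\n# when pc_type == 'pred': each predicted instance inherits the label of the\n# first ground-truth box it overlaps with IoU >= 0.25. The repo helpers\n# (convert_pc_to_box, construct_bbox_corners, eval_ref_one_sample) are assumed\n# to compute a corner-based IoU; here the IoU is recomputed directly from\n# (center, size) boxes so this file stays self-contained.\nimport numpy as np\n\n\ndef aabb_iou(center_a, size_a, center_b, size_b):\n    # IoU of two axis-aligned 3D boxes given as (center, size) arrays.\n    min_a, max_a = center_a - size_a / 2, center_a + size_a / 2\n    min_b, max_b = center_b - size_b / 2, center_b + size_b / 2\n    inter = np.clip(np.minimum(max_a, max_b) - np.maximum(min_a, min_b), 0, None)\n    inter_vol = inter.prod()\n    union_vol = size_a.prod() + size_b.prod() - inter_vol\n    return inter_vol / union_vol\n\n\ndef match_pred_to_gt(pred_boxes, gt_boxes, gt_labels, iou_thresh=0.25):\n    # Mirror of the nested loop in _getitem_refer: for each predicted box,\n    # copy the label of the first GT box whose IoU clears the threshold.\n    labels = [None] * len(pred_boxes)\n    for i, (p_center, p_size) in enumerate(pred_boxes):\n        for j, (g_center, g_size) in enumerate(gt_boxes):\n            if aabb_iou(g_center, g_size, p_center, p_size) >= iou_thresh:\n                labels[i] = gt_labels[j]\n                break\n    return labels\n\n\nif __name__ == '__main__':\n    gt_boxes = [(np.zeros(3), np.ones(3))]\n    pred_boxes = [(np.array([0.1, 0.0, 0.0]), np.ones(3))]\n    print(match_pred_to_gt(pred_boxes, gt_boxes, ['chair']))  # -> ['chair']\n"
  },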
  {
    "path": "data/datasets/constant.py",
    "content": "### ScanNet200 Benchmark constants ###\nVALID_CLASS_IDS_200 = (\n    1,\n    2,\n    3,\n    4,\n    5,\n    6,\n    7,\n    8,\n    9,\n    10,\n    11,\n    13,\n    14,\n    15,\n    16,\n    17,\n    18,\n    19,\n    21,\n    22,\n    23,\n    24,\n    26,\n    27,\n    28,\n    29,\n    31,\n    32,\n    33,\n    34,\n    35,\n    36,\n    38,\n    39,\n    40,\n    41,\n    42,\n    44,\n    45,\n    46,\n    47,\n    48,\n    49,\n    50,\n    51,\n    52,\n    54,\n    55,\n    56,\n    57,\n    58,\n    59,\n    62,\n    63,\n    64,\n    65,\n    66,\n    67,\n    68,\n    69,\n    70,\n    71,\n    72,\n    73,\n    74,\n    75,\n    76,\n    77,\n    78,\n    79,\n    80,\n    82,\n    84,\n    86,\n    87,\n    88,\n    89,\n    90,\n    93,\n    95,\n    96,\n    97,\n    98,\n    99,\n    100,\n    101,\n    102,\n    103,\n    104,\n    105,\n    106,\n    107,\n    110,\n    112,\n    115,\n    116,\n    118,\n    120,\n    121,\n    122,\n    125,\n    128,\n    130,\n    131,\n    132,\n    134,\n    136,\n    138,\n    139,\n    140,\n    141,\n    145,\n    148,\n    154,\n    155,\n    156,\n    157,\n    159,\n    161,\n    163,\n    165,\n    166,\n    168,\n    169,\n    170,\n    177,\n    180,\n    185,\n    188,\n    191,\n    193,\n    195,\n    202,\n    208,\n    213,\n    214,\n    221,\n    229,\n    230,\n    232,\n    233,\n    242,\n    250,\n    261,\n    264,\n    276,\n    283,\n    286,\n    300,\n    304,\n    312,\n    323,\n    325,\n    331,\n    342,\n    356,\n    370,\n    392,\n    395,\n    399,\n    408,\n    417,\n    488,\n    540,\n    562,\n    570,\n    572,\n    581,\n    609,\n    748,\n    776,\n    1156,\n    1163,\n    1164,\n    1165,\n    1166,\n    1167,\n    1168,\n    1169,\n    1170,\n    1171,\n    1172,\n    1173,\n    1174,\n    1175,\n    1176,\n    1178,\n    1179,\n    1180,\n    1181,\n    1182,\n    1183,\n    1184,\n    1185,\n    1186,\n    1187,\n    1188,\n    1189,\n    1190,\n    1191,\n)\n\nCLASS_LABELS_200 = (\n    \"wall\",\n    \"chair\",\n    \"floor\",\n    \"table\",\n    \"door\",\n    \"couch\",\n    \"cabinet\",\n    \"shelf\",\n    \"desk\",\n    \"office chair\",\n    \"bed\",\n    \"pillow\",\n    \"sink\",\n    \"picture\",\n    \"window\",\n    \"toilet\",\n    \"bookshelf\",\n    \"monitor\",\n    \"curtain\",\n    \"book\",\n    \"armchair\",\n    \"coffee table\",\n    \"box\",\n    \"refrigerator\",\n    \"lamp\",\n    \"kitchen cabinet\",\n    \"towel\",\n    \"clothes\",\n    \"tv\",\n    \"nightstand\",\n    \"counter\",\n    \"dresser\",\n    \"stool\",\n    \"cushion\",\n    \"plant\",\n    \"ceiling\",\n    \"bathtub\",\n    \"end table\",\n    \"dining table\",\n    \"keyboard\",\n    \"bag\",\n    \"backpack\",\n    \"toilet paper\",\n    \"printer\",\n    \"tv stand\",\n    \"whiteboard\",\n    \"blanket\",\n    \"shower curtain\",\n    \"trash can\",\n    \"closet\",\n    \"stairs\",\n    \"microwave\",\n    \"stove\",\n    \"shoe\",\n    \"computer tower\",\n    \"bottle\",\n    \"bin\",\n    \"ottoman\",\n    \"bench\",\n    \"board\",\n    \"washing machine\",\n    \"mirror\",\n    \"copier\",\n    \"basket\",\n    \"sofa chair\",\n    \"file cabinet\",\n    \"fan\",\n    \"laptop\",\n    \"shower\",\n    \"paper\",\n    \"person\",\n    \"paper towel dispenser\",\n    \"oven\",\n    \"blinds\",\n    \"rack\",\n    \"plate\",\n    \"blackboard\",\n    \"piano\",\n    \"suitcase\",\n    \"rail\",\n    \"radiator\",\n    \"recycling bin\",\n    \"container\",\n    
\"wardrobe\",\n    \"soap dispenser\",\n    \"telephone\",\n    \"bucket\",\n    \"clock\",\n    \"stand\",\n    \"light\",\n    \"laundry basket\",\n    \"pipe\",\n    \"clothes dryer\",\n    \"guitar\",\n    \"toilet paper holder\",\n    \"seat\",\n    \"speaker\",\n    \"column\",\n    \"bicycle\",\n    \"ladder\",\n    \"bathroom stall\",\n    \"shower wall\",\n    \"cup\",\n    \"jacket\",\n    \"storage bin\",\n    \"coffee maker\",\n    \"dishwasher\",\n    \"paper towel roll\",\n    \"machine\",\n    \"mat\",\n    \"windowsill\",\n    \"bar\",\n    \"toaster\",\n    \"bulletin board\",\n    \"ironing board\",\n    \"fireplace\",\n    \"soap dish\",\n    \"kitchen counter\",\n    \"doorframe\",\n    \"toilet paper dispenser\",\n    \"mini fridge\",\n    \"fire extinguisher\",\n    \"ball\",\n    \"hat\",\n    \"shower curtain rod\",\n    \"water cooler\",\n    \"paper cutter\",\n    \"tray\",\n    \"shower door\",\n    \"pillar\",\n    \"ledge\",\n    \"toaster oven\",\n    \"mouse\",\n    \"toilet seat cover dispenser\",\n    \"furniture\",\n    \"cart\",\n    \"storage container\",\n    \"scale\",\n    \"tissue box\",\n    \"light switch\",\n    \"crate\",\n    \"power outlet\",\n    \"decoration\",\n    \"sign\",\n    \"projector\",\n    \"closet door\",\n    \"vacuum cleaner\",\n    \"candle\",\n    \"plunger\",\n    \"stuffed animal\",\n    \"headphones\",\n    \"dish rack\",\n    \"broom\",\n    \"guitar case\",\n    \"range hood\",\n    \"dustpan\",\n    \"hair dryer\",\n    \"water bottle\",\n    \"handicap bar\",\n    \"purse\",\n    \"vent\",\n    \"shower floor\",\n    \"water pitcher\",\n    \"mailbox\",\n    \"bowl\",\n    \"paper bag\",\n    \"alarm clock\",\n    \"music stand\",\n    \"projector screen\",\n    \"divider\",\n    \"laundry detergent\",\n    \"bathroom counter\",\n    \"object\",\n    \"bathroom vanity\",\n    \"closet wall\",\n    \"laundry hamper\",\n    \"bathroom stall door\",\n    \"ceiling light\",\n    \"trash bin\",\n    \"dumbbell\",\n    \"stair rail\",\n    \"tube\",\n    \"bathroom cabinet\",\n    \"cd case\",\n    \"closet rod\",\n    \"coffee kettle\",\n    \"structure\",\n    \"shower head\",\n    \"keyboard piano\",\n    \"case of water bottles\",\n    \"coat rack\",\n    \"storage organizer\",\n    \"folded chair\",\n    \"fire alarm\",\n    \"power strip\",\n    \"calendar\",\n    \"poster\",\n    \"potted plant\",\n    \"luggage\",\n    \"mattress\",\n)\n\nSCANNET_COLOR_MAP_200 = {\n    0: (0.0, 0.0, 0.0),\n    1: (174.0, 199.0, 232.0),\n    2: (188.0, 189.0, 34.0),\n    3: (152.0, 223.0, 138.0),\n    4: (255.0, 152.0, 150.0),\n    5: (214.0, 39.0, 40.0),\n    6: (91.0, 135.0, 229.0),\n    7: (31.0, 119.0, 180.0),\n    8: (229.0, 91.0, 104.0),\n    9: (247.0, 182.0, 210.0),\n    10: (91.0, 229.0, 110.0),\n    11: (255.0, 187.0, 120.0),\n    13: (141.0, 91.0, 229.0),\n    14: (112.0, 128.0, 144.0),\n    15: (196.0, 156.0, 148.0),\n    16: (197.0, 176.0, 213.0),\n    17: (44.0, 160.0, 44.0),\n    18: (148.0, 103.0, 189.0),\n    19: (229.0, 91.0, 223.0),\n    21: (219.0, 219.0, 141.0),\n    22: (192.0, 229.0, 91.0),\n    23: (88.0, 218.0, 137.0),\n    24: (58.0, 98.0, 137.0),\n    26: (177.0, 82.0, 239.0),\n    27: (255.0, 127.0, 14.0),\n    28: (237.0, 204.0, 37.0),\n    29: (41.0, 206.0, 32.0),\n    31: (62.0, 143.0, 148.0),\n    32: (34.0, 14.0, 130.0),\n    33: (143.0, 45.0, 115.0),\n    34: (137.0, 63.0, 14.0),\n    35: (23.0, 190.0, 207.0),\n    36: (16.0, 212.0, 139.0),\n    38: (90.0, 119.0, 201.0),\n    39: (125.0, 30.0, 
141.0),\n    40: (150.0, 53.0, 56.0),\n    41: (186.0, 197.0, 62.0),\n    42: (227.0, 119.0, 194.0),\n    44: (38.0, 100.0, 128.0),\n    45: (120.0, 31.0, 243.0),\n    46: (154.0, 59.0, 103.0),\n    47: (169.0, 137.0, 78.0),\n    48: (143.0, 245.0, 111.0),\n    49: (37.0, 230.0, 205.0),\n    50: (14.0, 16.0, 155.0),\n    51: (196.0, 51.0, 182.0),\n    52: (237.0, 80.0, 38.0),\n    54: (138.0, 175.0, 62.0),\n    55: (158.0, 218.0, 229.0),\n    56: (38.0, 96.0, 167.0),\n    57: (190.0, 77.0, 246.0),\n    58: (208.0, 49.0, 84.0),\n    59: (208.0, 193.0, 72.0),\n    62: (55.0, 220.0, 57.0),\n    63: (10.0, 125.0, 140.0),\n    64: (76.0, 38.0, 202.0),\n    65: (191.0, 28.0, 135.0),\n    66: (211.0, 120.0, 42.0),\n    67: (118.0, 174.0, 76.0),\n    68: (17.0, 242.0, 171.0),\n    69: (20.0, 65.0, 247.0),\n    70: (208.0, 61.0, 222.0),\n    71: (162.0, 62.0, 60.0),\n    72: (210.0, 235.0, 62.0),\n    73: (45.0, 152.0, 72.0),\n    74: (35.0, 107.0, 149.0),\n    75: (160.0, 89.0, 237.0),\n    76: (227.0, 56.0, 125.0),\n    77: (169.0, 143.0, 81.0),\n    78: (42.0, 143.0, 20.0),\n    79: (25.0, 160.0, 151.0),\n    80: (82.0, 75.0, 227.0),\n    82: (253.0, 59.0, 222.0),\n    84: (240.0, 130.0, 89.0),\n    86: (123.0, 172.0, 47.0),\n    87: (71.0, 194.0, 133.0),\n    88: (24.0, 94.0, 205.0),\n    89: (134.0, 16.0, 179.0),\n    90: (159.0, 32.0, 52.0),\n    93: (213.0, 208.0, 88.0),\n    95: (64.0, 158.0, 70.0),\n    96: (18.0, 163.0, 194.0),\n    97: (65.0, 29.0, 153.0),\n    98: (177.0, 10.0, 109.0),\n    99: (152.0, 83.0, 7.0),\n    100: (83.0, 175.0, 30.0),\n    101: (18.0, 199.0, 153.0),\n    102: (61.0, 81.0, 208.0),\n    103: (213.0, 85.0, 216.0),\n    104: (170.0, 53.0, 42.0),\n    105: (161.0, 192.0, 38.0),\n    106: (23.0, 241.0, 91.0),\n    107: (12.0, 103.0, 170.0),\n    110: (151.0, 41.0, 245.0),\n    112: (133.0, 51.0, 80.0),\n    115: (184.0, 162.0, 91.0),\n    116: (50.0, 138.0, 38.0),\n    118: (31.0, 237.0, 236.0),\n    120: (39.0, 19.0, 208.0),\n    121: (223.0, 27.0, 180.0),\n    122: (254.0, 141.0, 85.0),\n    125: (97.0, 144.0, 39.0),\n    128: (106.0, 231.0, 176.0),\n    130: (12.0, 61.0, 162.0),\n    131: (124.0, 66.0, 140.0),\n    132: (137.0, 66.0, 73.0),\n    134: (250.0, 253.0, 26.0),\n    136: (55.0, 191.0, 73.0),\n    138: (60.0, 126.0, 146.0),\n    139: (153.0, 108.0, 234.0),\n    140: (184.0, 58.0, 125.0),\n    141: (135.0, 84.0, 14.0),\n    145: (139.0, 248.0, 91.0),\n    148: (53.0, 200.0, 172.0),\n    154: (63.0, 69.0, 134.0),\n    155: (190.0, 75.0, 186.0),\n    156: (127.0, 63.0, 52.0),\n    157: (141.0, 182.0, 25.0),\n    159: (56.0, 144.0, 89.0),\n    161: (64.0, 160.0, 250.0),\n    163: (182.0, 86.0, 245.0),\n    165: (139.0, 18.0, 53.0),\n    166: (134.0, 120.0, 54.0),\n    168: (49.0, 165.0, 42.0),\n    169: (51.0, 128.0, 133.0),\n    170: (44.0, 21.0, 163.0),\n    177: (232.0, 93.0, 193.0),\n    180: (176.0, 102.0, 54.0),\n    185: (116.0, 217.0, 17.0),\n    188: (54.0, 209.0, 150.0),\n    191: (60.0, 99.0, 204.0),\n    193: (129.0, 43.0, 144.0),\n    195: (252.0, 100.0, 106.0),\n    202: (187.0, 196.0, 73.0),\n    208: (13.0, 158.0, 40.0),\n    213: (52.0, 122.0, 152.0),\n    214: (128.0, 76.0, 202.0),\n    221: (187.0, 50.0, 115.0),\n    229: (180.0, 141.0, 71.0),\n    230: (77.0, 208.0, 35.0),\n    232: (72.0, 183.0, 168.0),\n    233: (97.0, 99.0, 203.0),\n    242: (172.0, 22.0, 158.0),\n    250: (155.0, 64.0, 40.0),\n    261: (118.0, 159.0, 30.0),\n    264: (69.0, 252.0, 148.0),\n    276: (45.0, 103.0, 173.0),\n    283: (111.0, 38.0, 149.0),\n    286: 
(184.0, 9.0, 49.0),\n    300: (188.0, 174.0, 67.0),\n    304: (53.0, 206.0, 53.0),\n    312: (97.0, 235.0, 252.0),\n    323: (66.0, 32.0, 182.0),\n    325: (236.0, 114.0, 195.0),\n    331: (241.0, 154.0, 83.0),\n    342: (133.0, 240.0, 52.0),\n    356: (16.0, 205.0, 144.0),\n    370: (75.0, 101.0, 198.0),\n    392: (237.0, 95.0, 251.0),\n    395: (191.0, 52.0, 49.0),\n    399: (227.0, 254.0, 54.0),\n    408: (49.0, 206.0, 87.0),\n    417: (48.0, 113.0, 150.0),\n    488: (125.0, 73.0, 182.0),\n    540: (229.0, 32.0, 114.0),\n    562: (158.0, 119.0, 28.0),\n    570: (60.0, 205.0, 27.0),\n    572: (18.0, 215.0, 201.0),\n    581: (79.0, 76.0, 153.0),\n    609: (134.0, 13.0, 116.0),\n    748: (192.0, 97.0, 63.0),\n    776: (108.0, 163.0, 18.0),\n    1156: (95.0, 220.0, 156.0),\n    1163: (98.0, 141.0, 208.0),\n    1164: (144.0, 19.0, 193.0),\n    1165: (166.0, 36.0, 57.0),\n    1166: (212.0, 202.0, 34.0),\n    1167: (23.0, 206.0, 34.0),\n    1168: (91.0, 211.0, 236.0),\n    1169: (79.0, 55.0, 137.0),\n    1170: (182.0, 19.0, 117.0),\n    1171: (134.0, 76.0, 14.0),\n    1172: (87.0, 185.0, 28.0),\n    1173: (82.0, 224.0, 187.0),\n    1174: (92.0, 110.0, 214.0),\n    1175: (168.0, 80.0, 171.0),\n    1176: (197.0, 63.0, 51.0),\n    1178: (175.0, 199.0, 77.0),\n    1179: (62.0, 180.0, 98.0),\n    1180: (8.0, 91.0, 150.0),\n    1181: (77.0, 15.0, 130.0),\n    1182: (154.0, 65.0, 96.0),\n    1183: (197.0, 152.0, 11.0),\n    1184: (59.0, 155.0, 45.0),\n    1185: (12.0, 147.0, 145.0),\n    1186: (54.0, 35.0, 219.0),\n    1187: (210.0, 73.0, 181.0),\n    1188: (221.0, 124.0, 77.0),\n    1189: (149.0, 214.0, 66.0),\n    1190: (72.0, 185.0, 134.0),\n    1191: (42.0, 94.0, 198.0),\n}\n\nHEAD_CATS_SCANNET_200 = ['tv stand', 'curtain', 'blinds', 'shower curtain', 'bookshelf', 'tv', 'kitchen cabinet', 'pillow', 'lamp', 'dresser', 'monitor', 'object', 'ceiling', 'board', 'stove', 'closet wall', 'couch', 'office chair', 'kitchen counter', 'shower', 'closet', 'doorframe', 'sofa chair', 'mailbox', 'nightstand', 'washing machine', 'picture', 'book', 'sink', 'recycling bin', 'table', 'backpack', 'shower wall', 'toilet', 'copier', 'counter', 'stool', 'refrigerator', 'window', 'file cabinet', 'chair', 'wall', 'plant', 'coffee table', 'stairs', 'armchair', 'cabinet', 'bathroom vanity', 'bathroom stall', 'mirror', 'blackboard', 'trash can', 'stair rail', 'box', 'towel', 'door', 'clothes', 'whiteboard', 'bed', 'floor', 'bathtub', 'desk', 'wardrobe', 'clothes dryer', 'radiator', 'shelf']\nCOMMON_CATS_SCANNET_200 = [\"cushion\", \"end table\", \"dining table\", \"keyboard\", \"bag\", \"toilet paper\", \"printer\", \"blanket\", \"microwave\", \"shoe\", \"computer tower\", \"bottle\", \"bin\", \"ottoman\", \"bench\", \"basket\", \"fan\", \"laptop\", \"person\", \"paper towel dispenser\", \"oven\", \"rack\", \"piano\", \"suitcase\", \"rail\", \"container\", \"telephone\", \"stand\", \"light\", \"laundry basket\", \"pipe\", \"seat\", \"column\", \"bicycle\", \"ladder\", \"jacket\", \"storage bin\", \"coffee maker\", \"dishwasher\", \"machine\", \"mat\", \"windowsill\", \"bulletin board\", \"fireplace\", \"mini fridge\", \"water cooler\", \"shower door\", \"pillar\", \"ledge\", \"furniture\", \"cart\", \"decoration\", \"closet door\", \"vacuum cleaner\", \"dish rack\", \"range hood\", \"projector screen\", \"divider\", \"bathroom counter\", \"laundry hamper\", \"bathroom stall door\", \"ceiling light\", \"trash bin\", \"bathroom cabinet\", \"structure\", \"storage organizer\", \"potted plant\", 
\"mattress\"]\nTAIL_CATS_SCANNET_200 = [\"paper\", \"plate\", \"soap dispenser\", \"bucket\", \"clock\", \"guitar\", \"toilet paper holder\", \"speaker\", \"cup\", \"paper towel roll\", \"bar\", \"toaster\", \"ironing board\", \"soap dish\", \"toilet paper dispenser\", \"fire extinguisher\", \"ball\", \"hat\", \"shower curtain rod\", \"paper cutter\", \"tray\", \"toaster oven\", \"mouse\", \"toilet seat cover dispenser\", \"storage container\", \"scale\", \"tissue box\", \"light switch\", \"crate\", \"power outlet\", \"sign\", \"projector\", \"candle\", \"plunger\", \"stuffed animal\", \"headphones\", \"broom\", \"guitar case\", \"dustpan\", \"hair dryer\", \"water bottle\", \"handicap bar\", \"purse\", \"vent\", \"shower floor\", \"water pitcher\", \"bowl\", \"paper bag\", \"alarm clock\", \"music stand\", \"laundry detergent\", \"dumbbell\", \"tube\", \"cd case\", \"closet rod\", \"coffee kettle\", \"shower head\", \"keyboard piano\", \"case of water bottles\", \"coat rack\", \"folded chair\", \"fire alarm\", \"power strip\", \"calendar\", \"poster\", \"luggage\"]\n\nVALID_CLASS_IDS_200_VALIDATION = ('wall', 'chair', 'floor', 'table', 'door', 'couch', 'cabinet', 'shelf', 'desk', 'office chair', 'bed', 'pillow', 'sink', 'picture', 'window', 'toilet', 'bookshelf', 'monitor', 'curtain', 'book', 'armchair', 'coffee table', 'box', 'refrigerator', 'lamp', 'kitchen cabinet', 'towel', 'clothes', 'tv', 'nightstand', 'counter', 'dresser', 'stool', 'cushion', 'plant', 'ceiling', 'bathtub', 'end table', 'dining table', 'keyboard', 'bag', 'backpack', 'toilet paper', 'printer', 'tv stand', 'whiteboard', 'blanket', 'shower curtain', 'trash can', 'closet', 'stairs', 'microwave', 'stove', 'shoe', 'computer tower', 'bottle', 'bin', 'ottoman', 'bench', 'board', 'washing machine', 'mirror', 'copier', 'basket', 'sofa chair', 'file cabinet', 'fan', 'laptop', 'shower', 'paper', 'person', 'paper towel dispenser', 'oven', 'blinds', 'rack', 'plate', 'blackboard', 'piano', 'suitcase', 'rail', 'radiator', 'recycling bin', 'container', 'wardrobe', 'soap dispenser', 'telephone', 'bucket', 'clock', 'stand', 'light', 'laundry basket', 'pipe', 'clothes dryer', 'guitar', 'toilet paper holder', 'seat', 'speaker', 'column', 'ladder', 'bathroom stall', 'shower wall', 'cup', 'jacket', 'storage bin', 'coffee maker', 'dishwasher', 'paper towel roll', 'machine', 'mat', 'windowsill', 'bar', 'toaster', 'bulletin board', 'ironing board', 'fireplace', 'soap dish', 'kitchen counter', 'doorframe', 'toilet paper dispenser', 'mini fridge', 'fire extinguisher', 'ball', 'hat', 'shower curtain rod', 'water cooler', 'paper cutter', 'tray', 'shower door', 'pillar', 'ledge', 'toaster oven', 'mouse', 'toilet seat cover dispenser', 'furniture', 'cart', 'scale', 'tissue box', 'light switch', 'crate', 'power outlet', 'decoration', 'sign', 'projector', 'closet door', 'vacuum cleaner', 'plunger', 'stuffed animal', 'headphones', 'dish rack', 'broom', 'range hood', 'dustpan', 'hair dryer', 'water bottle', 'handicap bar', 'vent', 'shower floor', 'water pitcher', 'mailbox', 'bowl', 'paper bag', 'projector screen', 'divider', 'laundry detergent', 'bathroom counter', 'object', 'bathroom vanity', 'closet wall', 'laundry hamper', 'bathroom stall door', 'ceiling light', 'trash bin', 'dumbbell', 'stair rail', 'tube', 'bathroom cabinet', 'closet rod', 'coffee kettle', 'shower head', 'keyboard piano', 'case of water bottles', 'coat rack', 'folded chair', 'fire alarm', 'power strip', 'calendar', 'poster', 'potted plant', 'mattress')"
  },
  {
    "path": "data/datasets/data_augmentor.py",
    "content": "from functools import partial\n\nimport math\nimport numpy as np\nimport torch\n\n\nclass DataAugmentor(object):\n    def __init__(self, cfg, split, **kwargs):\n        self.data_augmentor_queue = []\n        self.aug_cfg = cfg\n        self.kwargs = kwargs\n        aug_config_list = self.aug_cfg.aug_list\n\n        self.data_augmentor_queue = []\n        if split == 'train':\n            for aug in aug_config_list:\n                if aug not in self.aug_cfg:\n                    continue\n                cur_augmentor = partial(getattr(self, aug), config=self.aug_cfg[aug])\n                self.data_augmentor_queue.append(cur_augmentor)\n\n    def forward(self, data_dict):\n        \"\"\"\n        Args:\n            data_dict:\n                obj_pcds: (N, 3 + C_in)\n                num_points: (1,)\n                ...\n\n        Returns:\n        \"\"\"\n        aug_dict = self.init_aug(len(data_dict['obj_pcds']))\n        for cur_augmentor in self.data_augmentor_queue:\n            aug_dict = cur_augmentor(aug_dict=aug_dict)\n        data_dict = self.update_data_dict(data_dict, aug_dict)\n        return data_dict\n\n    def scene_aug(self, aug_dict, config):\n        # scene translation\n        if self.check_key(config.translation) and self.check_p(config.translation):\n            n = np.zeros((3))\n            for i in range(3):\n                n[i] = np.random.randn() * config.translation.value[i]\n            aug_dict['scene_trans'] = n\n        # scene scaling\n        if self.check_key(config.scaling) and self.check_p(config.scaling):\n            scaling_fac = np.random.rand() \\\n                        * (config.scaling.value[1] - config.scaling.value[0]) \\\n                        + config.scaling.value[0]\n            aug_dict['scene_scale'] = scaling_fac\n        # scene flip\n        if self.check_key(config.flip) and self.check_p(config.flip):\n            m = np.eye(3)\n            flip_type = np.random.choice(4, 1)\n            if flip_type == 0:\n                # flip x only\n                m[0][0] *= -1\n            elif flip_type == 1:\n                # flip y only\n                m[1][1] *= -1\n            elif flip_type == 2:\n                # flip x+y\n                m[0][0] *= -1\n                m[1][1] *= -1\n            aug_dict['scene_flip'] = m\n        # scene rotation\n        if self.check_key(config.rotation) and self.check_p(config.rotation):\n            if config.rotation.axis_align:\n                _r_angles = [0, math.pi/2, math.pi, math.pi*3/2]\n                theta_x = np.random.choice(_r_angles) * config.rotation.value[0]\n                theta_y = np.random.choice(_r_angles) * config.rotation.value[1]\n                theta_z = np.random.choice(_r_angles) * config.rotation.value[2]\n            else:\n                theta_x = (np.random.rand() * 2 * math.pi - math.pi) * config.rotation.value[0]\n                theta_y = (np.random.rand() * 2 * math.pi - math.pi) * config.rotation.value[1]\n                theta_z = (np.random.rand() * 2 * math.pi - math.pi) * config.rotation.value[2]\n            rx = np.array \\\n                ([[1, 0, 0],\n                  [0, math.cos(theta_x), -math.sin(theta_x)],\n                  [0, math.sin(theta_x), math.cos(theta_x)]])\n            ry = np.array \\\n                ([[math.cos(theta_y), 0, math.sin(theta_y)],\n                  [0, 1, 0],\n                  [-math.sin(theta_y), 0, math.cos(theta_y)]])\n            rz = np.array \\\n                
([[math.cos(theta_z), math.sin(theta_z), 0],\n                  [-math.sin(theta_z), math.cos(theta_z), 0],\n                  [0, 0, 1]])\n            rot_mats = [rx, ry, rz]\n            if config.rotation.get('shuffle', False):\n                np.random.shuffle(rot_mats)\n            r = rot_mats[0].dot(rot_mats[1]).dot(rot_mats[2])\n            aug_dict['scene_rot'] = r\n        # scene color jitter\n        if self.check_key(config.color_jitter):\n            rgb_delta = np.random.randn(3) * 0.1\n            aug_dict['rgb_delta'] = rgb_delta\n        # scene order suffle\n        if self.check_key(config.order_shuffle):\n            aug_dict['obj_order'] = np.random.permutation(len(aug_dict['obj_order']))\n\n        return aug_dict\n\n    def obj_aug(self, aug_dict, config):\n        obj_len = len(aug_dict['obj_order'])\n        obj_trans = []\n        obj_rot = []\n        for _ in range(obj_len):\n            n = None\n            r = None\n            # object translation\n            if self.check_key(config.translation) and self.check_p(config.translation):\n                n = np.zeros((3))\n                for i in range(3):\n                    n[i] = np.random.randn() * config.translation.value[i]\n            obj_trans.append(n)\n            # object rotation\n            if self.check_key(config.rotation) and self.check_p(config.rotation):\n                if config.rotation.axis_align:\n                    _r_angles = [0, math.pi/2, math.pi, math.pi*3/2]\n                    theta_x = np.random.choice(_r_angles) * config.rotation.value[0]\n                    theta_y = np.random.choice(_r_angles) * config.rotation.value[1]\n                    theta_z = np.random.choice(_r_angles) * config.rotation.value[2]\n                else:\n                    theta_x = (np.random.rand() * 2 * math.pi - math.pi) * config.rotation.value[0]\n                    theta_y = (np.random.rand() * 2 * math.pi - math.pi) * config.rotation.value[1]\n                    theta_z = (np.random.rand() * 2 * math.pi - math.pi) * config.rotation.value[2]\n                rx = np.array \\\n                    ([[1, 0, 0], \n                      [0, math.cos(theta_x), -math.sin(theta_x)], \n                      [0, math.sin(theta_x), math.cos(theta_x)]])\n                ry = np.array \\\n                    ([[math.cos(theta_y), 0, math.sin(theta_y)],\n                      [0, 1, 0],\n                      [-math.sin(theta_y), 0, math.cos(theta_y)]])\n                rz = np.array \\\n                    ([[math.cos(theta_z), math.sin(theta_z), 0],\n                      [-math.sin(theta_z), math.cos(theta_z), 0],\n                      [0, 0, 1]])\n                rot_mats = [rx, ry, rz]\n                if config.rotation.get('shuffle', False):\n                    np.random.shuffle(rot_mats)\n                r = rot_mats[0].dot(rot_mats[1]).dot(rot_mats[2])\n            obj_rot.append(r)\n        aug_dict['obj_trans'] = obj_trans\n        aug_dict['obj_rot'] = obj_rot\n        # object jitter\n        if self.check_key(config.random_jitter):\n            aug_dict['obj_jitter'] = config.random_jitter.value\n        # object pts shuffle\n        if self.check_key(config.pts_shuffle):\n            aug_dict['pts_shuffle'] = True\n        return aug_dict\n\n    def update_data_dict(self, data_dict, aug_dict):\n        data_dict['obj_sizes'] = []\n        for i, _ in enumerate(data_dict['obj_pcds']):\n            # scene flip\n            if aug_dict['scene_flip'] is not None:\n                
data_dict['obj_pcds'][i][:, :3] = self.rot_fn(data_dict['obj_pcds'][i][:, :3],\n                                                               aug_dict['scene_flip'])\n            # scene scaling\n            if aug_dict['scene_scale'] is not None:\n                data_dict['obj_pcds'][i][:, :3] = self.scaling_fn(data_dict['obj_pcds'][i][:, :3],\n                                                                  aug_dict['scene_scale'])\n            # subsample\n            data_dict['obj_pcds'][i] = self.subsample_fn(data_dict['obj_pcds'][i],\n                                                         data_dict['num_points'])\n            # jitter\n            if aug_dict['obj_jitter'] is not None:\n                data_dict['obj_pcds'][i][:, :3] = self.jitter_fn(data_dict['obj_pcds'][i][:, :3],\n                                                                 aug_dict['obj_jitter'])\n            # calc obj size\n            data_dict['obj_sizes'].append(data_dict['obj_pcds'][i][:, :3].max(0)\n                                        - data_dict['obj_pcds'][i][:, :3].min(0))\n            # scene translation\n            if aug_dict['scene_trans'] is not None:\n                data_dict['obj_pcds'][i][:, :3] += aug_dict['scene_trans']\n            # obj translation\n            if aug_dict['obj_trans'] and aug_dict['obj_trans'][i] is not None:\n                data_dict['obj_pcds'][i][:, :3] += aug_dict['obj_trans'][i]\n\n        #     # scene rotation\n        #     if aug_dict['scene_rot'] is not None:\n        #         data_dict['obj_pcds'][i][:, :3] = self.rot_fn(data_dict['obj_pcds'][i][:, :3],\n        #                                                       aug_dict['scene_rot'])\n        #         if 'bg_pcds' in data_dict.keys():\n        #             data_dict['bg_pcds'][:, :3] = self.rot_fn(data_dict['bg_pcds'][:, :3],\n        #                                                               aug_dict['scene_rot'])\n\n        # scene rotation\n        if aug_dict['scene_rot'] is not None:\n            data_dict['obj_pcds'] = torch.Tensor(np.array(data_dict['obj_pcds']))\n            data_dict['obj_pcds'][:, :, :3] @= aug_dict['scene_rot']\n            # data_dict['obj_pcds'][:, :3] = self.rot_fn(data_dict['obj_pcds'][i][:, :3],\n            #                                                 aug_dict['scene_rot'])\n            if 'bg_pcds' in data_dict.keys():\n                data_dict['bg_pcds'][:, :3] @= aug_dict['scene_rot']\n                # ['scene_rot']= self.rot_fn(data_dict['bg_pcds'][:, :3],\n                #                                             aug_dict['scene_rot'])\n        for i, _ in enumerate(data_dict['obj_pcds']):\n            # obj rotation\n            if aug_dict['obj_rot'] and aug_dict['obj_rot'][i] is not None:\n                data_dict['obj_pcds'][i][:, :3] = self.obj_rot_fn(data_dict['obj_pcds'][i][:, :3],\n                                                                  aug_dict['obj_rot'][i])\n            # scene color jitter\n            if aug_dict['rgb_delta'] is not None:\n                data_dict['obj_pcds'][i][:, 3:] += aug_dict['rgb_delta']\n            # pts shuffle\n            if aug_dict['pts_shuffle']:\n                data_dict['obj_pcds'][i] = self.pts_shuffle_fn(data_dict['obj_pcds'][i])\n        # object order\n        data_dict['obj_order'] = aug_dict['obj_order']\n        return data_dict\n\n    @staticmethod\n    def init_aug(obj_len):\n        keys = ['scene_trans', 'scene_flip', 'scene_rot', 'scene_scale', 'rgb_delta',\n      
          'obj_trans', 'obj_rot', 'obj_jitter', 'pts_shuffle']\n        aug_dict = {key: None for key in keys}\n        aug_dict['obj_order'] = list(np.arange(obj_len))\n        return aug_dict\n\n    @staticmethod\n    def check_key(key):\n        exist = key is not None\n        if not exist:\n            return False\n        if isinstance(key, bool):\n            enabled = key\n        elif isinstance(key, dict):\n            enabled = key.get('enabled', True)\n        elif hasattr(key, 'enabled'):\n            enabled = key.get('enabled')\n        else:\n            enabled = True\n        return enabled\n\n    @staticmethod\n    def check_p(key):\n        return (not isinstance(key, dict)) or ('p' not in key) or (np.random.rand() < key['p'])\n\n    @staticmethod\n    def rot_fn(x, mat):\n        return np.matmul(x, mat)\n\n    @staticmethod\n    def obj_rot_fn(x, mat):\n        center = x.mean(0)\n        return np.matmul(x - center, mat) + center\n\n    @staticmethod\n    def scaling_fn(x, scale):\n        center = x.mean(0)\n        return (x - center) * scale + center\n\n    @staticmethod\n    def jitter_fn(x, scale):\n        return x + (np.random.randn(len(x), 3) - 0.5) * scale\n\n    @staticmethod\n    def subsample_fn(x, num_points):\n        pcd_idxs = np.random.choice(len(x), size=num_points, replace=len(x) < num_points)\n        return x[pcd_idxs]\n\n    @staticmethod\n    def pts_shuffle_fn(x):\n        return x[np.random.permutation(len(x))]\n"
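\n\nif __name__ == '__main__':\n    # Editor's usage sketch: the real configs come from the project's config files;\n    # this ad-hoc attribute-style dict only mimics the keys read above, and the\n    # values below are illustrative assumptions, not the released settings.\n    class _Cfg(dict):\n        __getattr__ = dict.get\n\n    _cfg = _Cfg(aug_list=['scene_aug'],\n                scene_aug=_Cfg(translation=_Cfg(value=[0.1, 0.1, 0.0]),\n                               scaling=_Cfg(value=[0.9, 1.1])))\n    _augmentor = DataAugmentor(_cfg, split='train')\n    _out = _augmentor.forward({'obj_pcds': [np.random.rand(2048, 6) for _ in range(4)],\n                               'num_points': 1024})\n    print(len(_out['obj_pcds']), _out['obj_pcds'][0].shape)  # 4 (1024, 6)\n"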
  },
  {
    "path": "data/datasets/dataset_wrapper.py",
    "content": "import random\n\nimport torch\nfrom fvcore.common.registry import Registry\nfrom transformers import BertTokenizer, T5Tokenizer, AutoTokenizer\nfrom torch.utils.data import Dataset, default_collate\n\nfrom ..data_utils import random_word, random_point_cloud, pad_tensors, Vocabulary, random_caption_word\n\n\nDATASETWRAPPER_REGISTRY = Registry(\"dataset_wrapper\")\nDATASETWRAPPER_REGISTRY.__doc__ = \"\"\" \"\"\"\n\n\n@DATASETWRAPPER_REGISTRY.register()\nclass MaskDatasetWrapper(Dataset):\n    def __init__(self, cfg, dataset, split=\"train\"):\n        # tokenizer, max_seq_length=80, max_obj_len=80,\n        #  mask_strategy='random', txt_mask_ratio=0.15, pc_mask_ratio=0.1\n        assert cfg.data.args.mask_strategy in ['random']\n        self.dataset = dataset\n        self.tokenizer = BertTokenizer.from_pretrained(\"bert-base-uncased\", do_lower_case=True)\n        self.max_seq_length = cfg.data.args.max_seq_len\n        self.max_obj_len = cfg.data.args.max_obj_len\n        self.txt_mask_ratio = cfg.data.args.txt_mask_ratio\n        self.pc_mask_ratio = cfg.data.args.pc_mask_ratio\n\n        self.use_voxel = cfg.data.args.get('use_voxel', None)\n        if self.use_voxel:\n            self.voxel_size = cfg.data.args.get('voxel_size', 0.02)\n\n        self.use_scene_cap = cfg.data.args.get(\"use_scene_cap\", False)\n        self.max_scene_cap_len = cfg.data.args.get(\"max_scene_cap_len\", self.max_seq_length)\n\n    def __len__(self):\n        return len(self.dataset)\n\n    def __getitem__(self, idx):\n        data_dict = self.dataset[idx]\n        sentence = data_dict['sentence']\n        encoded_input = self.tokenizer(sentence, max_length=self.max_seq_length,\n                          add_special_tokens=True, truncation=True,\n                          padding='max_length', return_tensors=\"pt\")\n        # build txt\n        data_dict['txt_ids'] = encoded_input['input_ids'].squeeze(0) # L\n        data_dict['txt_masks'] = encoded_input['attention_mask'].squeeze(0) # L\n\n        if self.use_scene_cap:\n            scene_cap = data_dict[\"scene_cap\"] + \" \" + sentence\n            encoded_scene_cap = self.tokenizer(scene_cap, max_length=self.max_scene_cap_len,\n                          add_special_tokens=True, truncation=True,\n                          padding='max_length', return_tensors=\"pt\")\n            data_dict['scene_txt_ids'] = encoded_scene_cap['input_ids'].squeeze(0)          # L\n            data_dict['scene_txt_masks'] = encoded_scene_cap['attention_mask'].squeeze(0)   # L\n\n        # mask txt\n        masked_txt_ids, masked_lm_labels = random_word(data_dict['txt_ids'], data_dict['txt_masks'],\n                                                       self.tokenizer, self.txt_mask_ratio)\n        data_dict['txt_ids'] = masked_txt_ids\n        data_dict['masked_lm_labels'] = masked_lm_labels\n        # build object\n        data_dict['obj_masks'] = (torch.arange(self.max_obj_len) < len(data_dict['obj_locs'])) # O\n        if 'obj_fts' in data_dict.keys():\n            data_dict['obj_fts'] = pad_tensors(data_dict['obj_fts'], lens=self.max_obj_len,\n                                                    pad=1.0).float() # O, 1024, 6\n        if 'obj_pcds_masks' in data_dict.keys():\n            data_dict['obj_pcds_masks'] = pad_tensors(data_dict['obj_pcds_masks'], lens=self.max_obj_len, \n                                                      pad=1.0).float()\n        data_dict['obj_locs']= pad_tensors(data_dict['obj_locs'], lens=self.max_obj_len,\n               
                                 pad=0.0).float() # O, 3\n        data_dict['obj_labels'] = pad_tensors(data_dict['obj_labels'], lens=self.max_obj_len,\n                                                   pad=-100).long() # O\n        # mask object, 0 means mask object, 1 means keep object\n        if 'obj_fts' in data_dict.keys():\n            obj_sem_masks = random_point_cloud(data_dict['obj_fts'], data_dict['obj_masks'],\n                                            self.pc_mask_ratio)\n            data_dict['obj_sem_masks'] = obj_sem_masks\n        else:\n            obj_sem_masks = []\n            for i in range(self.max_obj_len):\n                if i >= len(data_dict['obj_locs']):\n                    obj_sem_masks.append(0)\n                else:\n                    prob = random.random()\n                    if prob < self.pc_mask_ratio:\n                        obj_sem_masks.append(0)\n                    else:\n                        obj_sem_masks.append(1)\n            data_dict['obj_sem_masks'] = torch.tensor(obj_sem_masks).long()\n        if 'tgt_object_id' in data_dict.keys():\n            data_dict['tgt_object_id'] = data_dict['tgt_object_id'].long() # 1 or O\n\n        # # Scene pcds\n        # data_dict[\"scene_pcds\"] = torch.from_numpy(data_dict[\"scene_pcds\"]).float()\n        key_list = [\n            'txt_ids', 'txt_masks', 'masked_lm_labels', 'obj_masks', 'obj_fts',\n            'obj_locs', 'obj_labels', 'obj_sem_masks', 'tgt_object_id'\n        ]\n        if 'obj_fts' not in data_dict.keys():\n            key_list.remove('obj_fts')\n            # key_list.remove('obj_sem_masks')\n        if 'obj_pcds_masks' in data_dict.keys():\n            key_list.append('obj_pcds_masks')\n        if 'scene_pcds' in data_dict.keys():\n            key_list.append('scene_pcds')\n        if 'scene_txt_ids' in data_dict.keys():\n            key_list.append('scene_txt_ids')\n        if 'scene_txt_masks' in data_dict.keys():\n            key_list.append('scene_txt_masks')\n        data_dict = {k : v for k, v in data_dict.items() if k in key_list}\n        return data_dict\n    \n    def collate_fn(self, batch_list):\n        ret = default_collate(batch_list)\n        return ret\n\n\n@DATASETWRAPPER_REGISTRY.register()\nclass ScanFamilyDatasetWrapperOld(Dataset):\n    def __init__(self, cfg, dataset, split=\"train\"):\n        # stokenizer, max_seq_length=80, max_obj_len=80\n        self.dataset = dataset\n        self.tokenizer = AutoTokenizer.from_pretrained(\"bert-base-uncased\", do_lower_case=True)\n        self.max_seq_length = cfg.data.args.max_seq_len\n        self.max_obj_len = cfg.data.args.max_obj_len\n\n        self.use_voxel = cfg.data.args.get('use_voxel', None)\n        if self.use_voxel:\n            self.voxel_size = cfg.data.args.get('voxel_size', 0.02)\n\n        self.use_scene_cap = cfg.data.args.get(\"use_scene_cap\", False)\n        self.max_scene_cap_len = cfg.data.args.get(\"max_scene_cap_len\", self.max_seq_length)\n\n    def __len__(self):\n        return len(self.dataset)\n\n    def pad_tensors(self, tensors, lens=None, pad=0):\n        assert tensors.shape[0] <= lens\n        if tensors.shape[0] == lens:\n            return tensors\n        shape = list(tensors.shape)\n        shape[0] = lens - shape[0]\n        res = torch.ones(shape, dtype=tensors.dtype) * pad\n        res = torch.cat((tensors, res), dim=0)\n        return res\n\n    def __getitem__(self, idx):\n        data_dict = self.dataset[idx]\n        sentence = data_dict['sentence']\n        
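# Tokenize once per sample; token ids and attention masks are padded to\n        # max_seq_length so that default_collate can stack them into batch tensors.\n        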
encoded_input = self.tokenizer(sentence, max_length=self.max_seq_length,\n                          add_special_tokens=True, truncation=True,\n                          padding='max_length', return_tensors=\"pt\")\n        # build txt\n        data_dict['txt_ids'] = encoded_input['input_ids'].squeeze(0) # L\n        data_dict['txt_masks'] = encoded_input['attention_mask'].squeeze(0) # L\n        if self.use_scene_cap:\n            scene_cap = data_dict[\"scene_cap\"] + \" \" + sentence\n            encoded_scene_cap = self.tokenizer(scene_cap, max_length=self.max_scene_cap_len,\n                          add_special_tokens=True, truncation=True,\n                          padding='max_length', return_tensors=\"pt\")\n            data_dict['scene_txt_ids'] = encoded_scene_cap['input_ids'].squeeze(0)          # L\n            data_dict['scene_txt_masks'] = encoded_scene_cap['attention_mask'].squeeze(0)   # L\n        # build object\n        data_dict['obj_masks'] = (torch.arange(self.max_obj_len) < len(data_dict['obj_locs'])) # O\n        if 'obj_fts' in data_dict.keys():\n            data_dict['obj_fts'] = self.pad_tensors(data_dict['obj_fts'], lens=self.max_obj_len,\n                                                    pad=1.0).float() # O, 1024, 6\n        if 'obj_pcds_masks' in data_dict.keys():\n            data_dict['obj_pcds_masks'] = pad_tensors(data_dict['obj_pcds_masks'], lens=self.max_obj_len, \n                                                      pad=1.0).float()\n        data_dict['obj_locs']= self.pad_tensors(data_dict['obj_locs'], lens=self.max_obj_len,\n                                                pad=0.0).float() # O, 3\n        data_dict['obj_boxes']= self.pad_tensors(data_dict['obj_boxes'], lens=self.max_obj_len,\n                                                 pad=0.0).float() # O, 3\n        data_dict['obj_labels'] = self.pad_tensors(data_dict['obj_labels'], lens=self.max_obj_len,\n                                                   pad=-100).long() # O\n        # build sem mask, no mask\n        data_dict['obj_sem_masks'] = (torch.arange(self.max_obj_len) < len(data_dict['obj_locs']))\n        # build label for refer\n        data_dict['tgt_object_label'] = data_dict['tgt_object_label'].long() # 1 or C\n        data_dict['tgt_object_id'] = data_dict['tgt_object_id'].long() # 1 or O\n        if len(data_dict['tgt_object_id']) > 1: # O, pad to max objet length\n            data_dict['tgt_object_id'] = self.pad_tensors(data_dict['tgt_object_id'].long(),\n                                                          lens=self.max_obj_len, pad=0).long() # O\n        # build target\n        if data_dict.get('tgt_object_id_iou25') is not None:\n            data_dict['tgt_object_id_iou25'] = self.pad_tensors(data_dict['tgt_object_id_iou25'],\n                                                                lens=self.max_obj_len, pad=0).long()\n        if data_dict.get('tgt_object_id_iou50') is not None:\n            data_dict['tgt_object_id_iou50'] = self.pad_tensors(data_dict['tgt_object_id_iou50'],\n                                                                lens=self.max_obj_len, pad=0).long()\n        # build label for qa\n        if \"answer_label\" in data_dict:\n            data_dict['answer_label'] = data_dict['answer_label'].long() # N, C\n        return data_dict\n    \n    def collate_fn(self, batch_list):\n        ret = default_collate(batch_list)\n        return ret\n\n\n@DATASETWRAPPER_REGISTRY.register()\nclass VisualizeDatasetWrapper(Dataset):\n    def 
__init__(self, cfg, dataset, split=\"train\"):\n        self.dataset = dataset\n\n    def __len__(self):\n        return len(self.dataset)\n\n    def __getitem__(self, idx):\n        data_dict = self.dataset[idx]\n        ret = {}\n        ret['scene_pcds'] = data_dict['scene_pcds']\n        ret['item_id'] = data_dict['data_idx']\n        return ret\n\n    def collate_fn(self, batch_list):\n        # scenes differ in point count, so keep them in plain lists instead of\n        # stacking them the way default_collate would\n        ret = {}\n        ret['scene_pcds'] = [b['scene_pcds'] for b in batch_list]\n        ret['item_id'] = [b['item_id'] for b in batch_list]\n        return ret
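\n\n# Editor's sketch (assumptions: wrapper resolution mirrors the project's build\n# code, using fvcore's Registry.get; cfg and dataset come from that pipeline):\n#   wrapper_cls = DATASETWRAPPER_REGISTRY.get('MaskDatasetWrapper')\n#   wrapped = wrapper_cls(cfg, dataset, split='train')\n#   loader = torch.utils.data.DataLoader(wrapped, batch_size=8,\n#                                        collate_fn=wrapped.collate_fn)\n"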
  },
  {
    "path": "data/datasets/hm.py",
    "content": "import collections\n\nfrom ..build import DATASET_REGISTRY\nfrom .base import ScanBase\n\n\n@DATASET_REGISTRY.register()\nclass HMPretrainObj(ScanBase):\n    def __init__(self, cfg, split):\n        super(HMPretrainObj, self).__init__(cfg, split)\n        self.base_dir = cfg.data.hm_base\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        self.scan_ids = sorted(list(self._load_split(self.split)))\n        if cfg.debug.flag and cfg.debug.debug_size != -1:\n            self.scan_ids = self.scan_ids[:cfg.debug.debug_size]\n\n        print(f\"Loading HM3D {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        self.scan_ids = sorted(list(self.scan_data.keys()))\n        print(f\"Finish loading HM3D {split}-set scans of length {len(self.scan_ids)}\")\n\n    def __len__(self):\n        return len(self.scan_ids)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_obj_pretrain(index)\n        dataset = 'hm3d'\n        data_dict['source'] = dataset\n        return data_dict\n\n\n@DATASET_REGISTRY.register()\nclass HMSpatialRefer(ScanBase):\n    def __init__(self, cfg, split):\n        super(HMSpatialRefer, self).__init__(cfg, split)\n        self.base_dir = cfg.data.hm_base\n        self.max_obj_len = cfg.data.args.max_obj_len - 1\n        self.filter_lang = cfg.data.args.filter_lang\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        split_cfg = cfg.data.get(self.__class__.__name__).get(split)\n        all_scan_ids = self._load_split(self.split)\n\n        print(f\"Loading HM3D {split}-set language\")\n        self.lang_data, self.scan_ids = self._load_lang(split_cfg, all_scan_ids)\n        print(f\"Finish loading HM3D {split}-set language of size {self.__len__()}\")\n\n        print(f\"Loading HM3D {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        print(f\"Finish loading HM3D {split}-set scans\")\n\n         # build unique multiple look up\n        for scan_id in self.scan_ids:\n            inst_labels = self.scan_data[scan_id]['inst_labels']\n            self.scan_data[scan_id]['label_count'] = collections.Counter(\n                                    [l for l in inst_labels])\n            self.scan_data[scan_id]['label_count_multi'] = collections.Counter(\n                                    [self.label_converter.id_to_scannetid[l] for l in inst_labels])\n\n    def __len__(self):\n        return len(self.lang_data)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_refer(index)\n        return data_dict\n"
  },
  {
    "path": "data/datasets/multiscan.py",
    "content": "import collections\n\nfrom ..build import DATASET_REGISTRY\nfrom .base import ScanBase\n\n\n@DATASET_REGISTRY.register()\nclass MultiScanPretrainObj(ScanBase):\n    def __init__(self, cfg, split):\n        super(MultiScanPretrainObj, self).__init__(cfg, split)\n        self.base_dir = cfg.data.multiscan_base\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        self.scan_ids = sorted(list(self._load_split(self.split)))\n        if cfg.debug.flag and cfg.debug.debug_size != -1:\n            self.scan_ids = self.scan_ids[:cfg.debug.debug_size]\n\n        print(f\"Loading MultiScan {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        self.scan_ids = sorted(list(self.scan_data.keys()))\n        print(f\"Finish loading MultiScan {split}-set scans of length {len(self.scan_ids)}\")\n\n    def __len__(self):\n        return len(self.scan_ids)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_obj_pretrain(index)\n        dataset = 'multiscan'\n        data_dict['source'] = dataset\n        return data_dict\n\n\n@DATASET_REGISTRY.register()\nclass MultiScanSpatialRefer(ScanBase):\n    def __init__(self, cfg, split):\n        super(MultiScanSpatialRefer, self).__init__(cfg, split)\n        self.base_dir = cfg.data.multiscan_base\n        self.max_obj_len = cfg.data.args.max_obj_len - 1\n        self.filter_lang = cfg.data.args.filter_lang\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        split_cfg = cfg.data.get(self.__class__.__name__).get(split)\n        all_scan_ids = self._load_split(self.split)\n\n        print(f\"Loading MultiScan {split}-set language\")\n        self.lang_data, self.scan_ids = self._load_lang(split_cfg, all_scan_ids)\n        print(f\"Finish loading MultiScan {split}-set language of size {self.__len__()}\")\n\n        print(f\"Loading Multiscan {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        print(f\"Finish loading MultiScan {split}-set scans\")\n\n         # build unique multiple look up\n        for scan_id in self.scan_ids:\n            inst_labels = self.scan_data[scan_id]['inst_labels']\n            self.scan_data[scan_id]['label_count'] = collections.Counter(\n                                    [l for l in inst_labels])\n            self.scan_data[scan_id]['label_count_multi'] = collections.Counter(\n                                    [self.label_converter.id_to_scannetid[l] for l in inst_labels])\n\n    def __len__(self):\n        return len(self.lang_data)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_refer(index)\n    
    return data_dict\n"
  },
  {
    "path": "data/datasets/procthor.py",
    "content": "import collections\n\nfrom ..build import DATASET_REGISTRY\nfrom .base import ScanBase\n\n\n@DATASET_REGISTRY.register()\nclass ProcThorPretrainObj(ScanBase):\n    def __init__(self, cfg, split):\n        super(ProcThorPretrainObj, self).__init__(cfg, split)\n        self.base_dir = cfg.data.procthor_base\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        self.scan_ids = sorted(list(self._load_split(self.split)))\n        if cfg.debug.flag and cfg.debug.debug_size != -1:\n            self.scan_ids = self.scan_ids[:cfg.debug.debug_size]\n\n        print(f\"Loading ProcThor {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        self.scan_ids = sorted(list(self.scan_data.keys()))\n        print(f\"Finish loading ProcThor {split}-set scans of length {len(self.scan_ids)}\")\n\n    def __len__(self):\n        return len(self.scan_ids)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_obj_pretrain(index)\n        dataset = 'procthor'\n        data_dict['source'] = dataset\n        return data_dict\n\n\n@DATASET_REGISTRY.register()\nclass ProcThorSpatialRefer(ScanBase):\n    def __init__(self, cfg, split):\n        super(ProcThorSpatialRefer, self).__init__(cfg, split)\n        self.base_dir = cfg.data.procthor_base\n        self.max_obj_len = cfg.data.args.max_obj_len - 1\n        self.filter_lang = cfg.data.args.filter_lang\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        split_cfg = cfg.data.get(self.__class__.__name__).get(split)\n        all_scan_ids = self._load_split(self.split)\n\n        print(f\"Loading ProcThor {split}-set language\")\n        self.lang_data, self.scan_ids = self._load_lang(split_cfg, all_scan_ids)\n        print(f\"Finish loading ProcThor {split}-set language of size {self.__len__()}\")\n\n        print(f\"Loading ProcThor {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        print(f\"Finish loading ProcThor {split}-set scans\")\n\n         # build unique multiple look up\n        for scan_id in self.scan_ids:\n            inst_labels = self.scan_data[scan_id]['inst_labels']\n            self.scan_data[scan_id]['label_count_multi'] = collections.Counter(\n                                    [self.label_converter.id_to_scannetid[l] for l in inst_labels])\n            self.scan_data[scan_id]['label_count'] = collections.Counter(\n                                    [l for l in inst_labels])\n\n    def __len__(self):\n        return len(self.lang_data)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_refer(index)\n        return 
data_dict\n"
  },
  {
    "path": "data/datasets/rscan.py",
    "content": "import collections\n\nfrom ..build import DATASET_REGISTRY\nfrom .base import ScanBase\n\n\n@DATASET_REGISTRY.register()\nclass RScanPretrainObj(ScanBase):\n    def __init__(self, cfg, split):\n        super(RScanPretrainObj, self).__init__(cfg, split)\n        self.base_dir = cfg.data.rscan_base\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        self.scan_ids = sorted(list(self._load_split(self.split)))\n        if cfg.debug.flag and cfg.debug.debug_size != -1:\n            self.scan_ids = self.scan_ids[:cfg.debug.debug_size]\n\n        print(f\"Loading 3RScan {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        self.scan_ids = sorted(list(self.scan_data.keys()))\n        print(f\"Finish loading 3RScan {split}-set scans of length {len(self.scan_ids)}\")\n\n    def __len__(self):\n        return len(self.scan_ids)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_obj_pretrain(index)\n        dataset = 'rscan'\n        data_dict['source'] = dataset\n        return data_dict\n\n\n@DATASET_REGISTRY.register()\nclass RScanSpatialRefer(ScanBase):\n    def __init__(self, cfg, split):\n        super(RScanSpatialRefer, self).__init__(cfg, split)\n        self.base_dir = cfg.data.rscan_base\n        self.max_obj_len = cfg.data.args.max_obj_len - 1\n        self.filter_lang = cfg.data.args.filter_lang\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        split_cfg = cfg.data.get(self.__class__.__name__).get(split)\n        all_scan_ids = self._load_split(self.split)\n\n        print(f\"Loading 3RScanSpatialRefer {split}-set language\")\n        self.lang_data, self.scan_ids = self._load_lang(split_cfg, all_scan_ids)\n        print(f\"Finish loading 3RScanSpatialRefer {split}-set language of size {self.__len__()}\")\n\n        print(f\"Loading 3RScan {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        print(f\"Finish loading 3RScan {split}-set scans\")\n\n         # build unique multiple look up\n        for scan_id in self.scan_ids:\n            inst_labels = self.scan_data[scan_id]['inst_labels']\n            self.scan_data[scan_id]['label_count'] = collections.Counter(\n                                    [l for l in inst_labels])\n            self.scan_data[scan_id]['label_count_multi'] = collections.Counter(\n                                    [self.label_converter.id_to_scannetid[l] for l in inst_labels])\n\n    def __len__(self):\n        return len(self.lang_data)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_refer(index)\n        return 
data_dict\n"
  },
  {
    "path": "data/datasets/scannet.py",
    "content": "import os\nimport collections\n\nfrom ..build import DATASET_REGISTRY\nfrom .base import ScanBase\n\n\n@DATASET_REGISTRY.register()\nclass ScanNetPretrainObj(ScanBase):\n    def __init__(self, cfg, split):\n        super(ScanNetPretrainObj, self).__init__(cfg, split)\n        self.base_dir = cfg.data.scan_family_base\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        self.scan_ids = sorted(list(self._load_split(self.split)))\n        if cfg.debug.flag and cfg.debug.debug_size != -1:\n            self.scan_ids = self.scan_ids[:cfg.debug.debug_size]\n\n        print(f\"Loading 3RScan {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        self.scan_ids = sorted(list(self.scan_data.keys()))\n        print(f\"Finish loading 3RScan {split}-set scans of length {len(self.scan_ids)}\")\n\n    def __len__(self):\n        return len(self.scan_ids)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_obj_pretrain(index)\n        dataset = 'rscan'\n        data_dict['source'] = dataset\n        return data_dict\n\n\n@DATASET_REGISTRY.register()\nclass ScanNetSpatialRefer(ScanBase):\n    def __init__(self, cfg, split):\n        super(ScanNetSpatialRefer, self).__init__(cfg, split)\n        self.base_dir = cfg.data.scan_family_base\n        self.max_obj_len = cfg.data.args.max_obj_len - 1\n        self.filter_lang = cfg.data.args.filter_lang\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        split_cfg = cfg.data.get(self.__class__.__name__).get(split)\n        all_scan_ids = self._load_split(self.split)\n\n        print(f\"Loading ScanNetSpatialRefer {split}-set language\")\n        self.lang_data, self.scan_ids = self._load_lang(split_cfg, all_scan_ids)\n        print(f\"Finish loading ScanNetSpatialRefer {split}-set language of size {self.__len__()}\")\n\n        print(f\"Loading ScanNet {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        print(f\"Finish loading ScanNet {split}-set scans\")\n\n         # build unique multiple look up\n        for scan_id in self.scan_ids:\n            inst_labels = self.scan_data[scan_id]['inst_labels']\n            self.scan_data[scan_id]['label_count'] = collections.Counter(\n                                    [l for l in inst_labels])\n            self.scan_data[scan_id]['label_count_multi'] = collections.Counter(\n                                    [self.label_converter.id_to_scannetid[l] for l in inst_labels])\n\n    def __len__(self):\n        return len(self.lang_data)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = 
self._getitem_refer(index)\n        return data_dict"
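\n\n# Editor's note (sketch): datasets self-register on import; the assumed call site\n# in the project's build step then resolves them by config name, e.g.\n#   dataset = DATASET_REGISTRY.get('ScanNetSpatialRefer')(cfg, split='train')\n"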
  },
  {
    "path": "data/datasets/scannet_base.py",
    "content": "import os\nimport collections\nimport json\nimport pickle\nimport random\n\nimport jsonlines\nfrom tqdm import tqdm\nfrom scipy import sparse\nimport numpy as np\nimport torch\nfrom torch.utils.data import Dataset\n\nfrom common.misc import rgetattr\nfrom ..data_utils import convert_pc_to_box, LabelConverter, build_rotate_mat, load_matrix_from_txt, \\\n                        construct_bbox_corners, eval_ref_one_sample  \nimport einops \n\n\nSCAN_DATA = {}\n\nclass ScanNetBase(Dataset):\n    def __init__(self, cfg, split):\n        self.cfg = cfg\n        self.split = split\n        self.base_dir = cfg.data.scan_family_base\n        assert self.split in ['train', 'val', 'test']\n\n        self.int2cat = json.load(open(os.path.join(self.base_dir,\n                                            \"annotations/meta_data/scannetv2_raw_categories.json\"),\n                                            'r', encoding=\"utf-8\"))\n        self.cat2int = {w: i for i, w in enumerate(self.int2cat)}\n        self.label_converter = LabelConverter(os.path.join(self.base_dir,\n                                            \"annotations/meta_data/scannetv2-labels.combined.tsv\"))\n\n        self.rot_matrix = build_rotate_mat(self.split)\n        self.use_cache = rgetattr(self.cfg.data, 'mvdatasettings.use_cache', False)\n        self.cache = {}\n\n    def __len__(self):\n        return len(self.lang_data)\n\n    def __getitem__(self, index):\n        raise NotImplementedError\n\n    def _load_one_scan(self, scan_id, pc_type = 'gt', load_inst_info = False, \n                      load_multiview_info = False, load_mask3d_voxel = False, load_pc_info = True, load_segment_info=False,\n                      load_offline_segment_voxel=False, load_offline_segment_image=False, load_offline_segment_point=False, load_nocls=False):\n        one_scan = {}\n        if load_inst_info:\n            inst_labels, inst_locs, inst_colors = self._load_inst_info(scan_id)\n            one_scan['inst_labels'] = inst_labels # (n_obj, )\n            one_scan['inst_locs'] = inst_locs # (n_obj, 6) center xyz, whl\n            one_scan['inst_colors'] = inst_colors # (n_obj, 3x4) cluster * (weight, mean rgb)\n\n        if load_pc_info:\n            # load pcd data\n            pcd_data = torch.load(os.path.join(self.base_dir, \"scan_data\",\n                                            \"pcd_with_global_alignment\", f'{scan_id}.pth'))\n            points, colors, instance_labels = pcd_data[0], pcd_data[1], pcd_data[-1]\n            colors = colors / 127.5 - 1\n            pcds = np.concatenate([points, colors], 1)\n            # convert to gt object\n            if load_inst_info:\n                obj_pcds = []\n                bg_indices = np.full((points.shape[0], ), 1, dtype=np.bool_)\n                for i in range(instance_labels.max() + 1):\n                    mask = instance_labels == i     # time consuming\n                    obj_pcds.append(pcds[mask])\n                    if not self.int2cat[inst_labels[i]] in ['wall', 'floor', 'ceiling']:\n                        bg_indices[mask] = False\n                one_scan['obj_pcds'] = obj_pcds\n                # assert sum([len(obj_pcd) for obj_pcd in obj_pcds]) + bg_indices.sum() == points.shape[0]\n                one_scan['bg_pcds'] = pcds[bg_indices]\n                # calculate box for matching\n                obj_center = []\n                obj_box_size = []\n                for obj_pcd in obj_pcds:\n                    _c, _b = convert_pc_to_box(obj_pcd)\n         
           obj_center.append(_c)\n                    obj_box_size.append(_b)\n                one_scan['obj_center'] = obj_center\n                one_scan['obj_box_size'] = obj_box_size\n\n            obj_mask_path = os.path.join(self.base_dir, \"mask\", str(scan_id) + \".mask\" + \".npz\")\n            if os.path.exists(obj_mask_path):\n                obj_label_path = os.path.join(self.base_dir, \"mask\", str(scan_id) + \".label\" + \".npy\")\n                obj_pcds = []\n                obj_mask = np.array(sparse.load_npz(obj_mask_path).todense())[:50, :]\n                obj_labels = np.load(obj_label_path)[:50]\n                obj_l = []\n                bg_indices = np.full((pcds.shape[0], ), 1, dtype=np.bool_)\n                for i in range(obj_mask.shape[0]):\n                    mask = obj_mask[i]\n                    if pcds[mask == 1, :].shape[0] > 0:\n                        obj_pcds.append(pcds[mask == 1, :])\n                        obj_l.append(obj_labels[i])\n                        # if not self.int2cat[obj_labels[i]] in ['wall', 'floor', 'ceiling']:\n                        bg_indices[mask == 1] = False\n                one_scan['obj_pcds_pred'] = obj_pcds\n                one_scan['inst_labels_pred'] = obj_l\n                one_scan['bg_pcds_pred'] = pcds[bg_indices]\n                # calculate box for pred\n                obj_center_pred = []\n                obj_box_size_pred = []\n                for obj_pcd in obj_pcds:\n                    _c, _b = convert_pc_to_box(obj_pcd)\n                    obj_center_pred.append(_c)\n                    obj_box_size_pred.append(_b)\n                one_scan['obj_center_pred'] = obj_center_pred\n                one_scan['obj_box_size_pred'] = obj_box_size_pred\n\n        if load_multiview_info:\n            one_scan['multiview_info'] = self._load_multiview_info(scan_id)\n\n        if load_mask3d_voxel:\n            one_scan['mask3d_voxel'] = self._load_mask3d_voxel(scan_id)\n        \n        # load segment for mask3d\n        if load_segment_info:\n            one_scan[\"scene_pcds\"] = np.load(os.path.join(self.base_dir, \"scan_data\", \"pcd_mask3d\", f'{scan_id[-7:]}.npy'))\n        \n        # load offline feature \n        if load_offline_segment_voxel:\n            if load_nocls:\n                one_scan['offline_segment_voxel'] = torch.load(os.path.join(self.base_dir, \"scan_data\", \"mask3d_voxel_feature_nocls\", f'{scan_id}.pth'))\n            else:\n                one_scan['offline_segment_voxel'] = torch.load(os.path.join(self.base_dir, \"scan_data\", \"mask3d_voxel_feature\", f'{scan_id}.pth'))\n            \n        if load_offline_segment_image:\n            one_scan['offline_segment_image'] = torch.load(os.path.join(self.base_dir, \"scan_data\", \"mask3d_image_feature\", f'{scan_id}.pth'))\n\n        return (scan_id, one_scan)\n\n    def _load_scannet(self, scan_ids, pc_type = 'gt', load_inst_info = False, \n                      load_multiview_info = False, load_mask3d_voxel = False, load_pc_info = True, load_segment_info = False, \n                      load_offline_segment_voxel=False, load_offline_segment_image=False, load_offline_segment_point=False, load_nocls=False,\n                      process_num = 1):\n        unloaded_scan_ids = [scan_id for scan_id in scan_ids if scan_id not in SCAN_DATA]\n        print(f\"Loading scans: {len(unloaded_scan_ids)} / {len(scan_ids)}\")\n        scans = {}\n        if process_num > 1:\n            from joblib import Parallel, delayed\n            res_all = 
Parallel(n_jobs=process_num)(\n                delayed(self._load_one_scan)(scan_id, pc_type = pc_type,\n                                        load_inst_info = load_inst_info,\n                                        load_multiview_info = load_multiview_info,\n                                        load_mask3d_voxel = load_mask3d_voxel,\n                                        load_offline_segment_voxel=load_offline_segment_voxel, load_offline_segment_image=load_offline_segment_image,\n                                        load_offline_segment_point=load_offline_segment_point, load_nocls=load_nocls,\n                                        load_pc_info = load_pc_info, load_segment_info=load_segment_info) for scan_id in tqdm(unloaded_scan_ids))\n            for scan_id, one_scan in tqdm(res_all):\n                scans[scan_id] = one_scan\n\n        else:\n            for scan_id in tqdm(unloaded_scan_ids):\n                _, one_scan = self._load_one_scan(scan_id, pc_type = pc_type,\n                                                  load_inst_info = load_inst_info, \n                                                  load_multiview_info = load_multiview_info, \n                                                  load_mask3d_voxel = load_mask3d_voxel,\n                                                  load_pc_info = load_pc_info, load_segment_info=load_segment_info,\n                                                  load_offline_segment_voxel=load_offline_segment_voxel, load_offline_segment_image=load_offline_segment_image,\n                                                  load_offline_segment_point=load_offline_segment_point, load_nocls=load_nocls)\n                scans[scan_id] = one_scan\n\n        SCAN_DATA.update(scans)\n        scans = {scan_id: SCAN_DATA[scan_id] for scan_id in scan_ids}\n        return scans\n\n    def _load_lang(self, cfg):\n        caption_source = cfg.sources\n        lang_data = []\n        if caption_source:\n            if 'scanrefer' in caption_source:\n                anno_file = os.path.join(self.base_dir, 'annotations/refer/scanrefer.jsonl')\n                with jsonlines.open(anno_file, 'r') as _f:\n                    for item in _f:\n                        if item['scan_id'] in self.scannet_scan_ids:\n                            lang_data.append(('scannet', item['scan_id'], item['utterance']))\n\n            if 'referit3d' in caption_source:\n                for anno_type in cfg.referit3d.anno_type:\n                    anno_file = os.path.join(self.base_dir,\n                                             f'annotations/refer/{anno_type}.jsonl')\n                    with jsonlines.open(anno_file, 'r') as _f:\n                        for item in _f:\n                            if item['scan_id'] in self.scannet_scan_ids:\n                                lang_data.append(('scannet', item['scan_id'], item['utterance']))\n\n            if 'scanqa' in caption_source:\n                anno_file_list = ['annotations/qa/ScanQA_v1.0_train.json',\n                                  'annotations/qa/ScanQA_v1.0_val.json']\n                for anno_file in anno_file_list:\n                    anno_file = os.path.join(self.base_dir, anno_file)\n                    json_data = json.load(open(anno_file, 'r', encoding='utf-8'))\n                    for item in json_data:\n                        if item['scene_id'] in self.scannet_scan_ids:\n                            for i in range(len(item['answers'])):\n                                
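# ScanQA: emit one language sample per answer (question text + answer)\n                                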
lang_data.append(('scannet', item['scene_id'],\n                                                  item['question'] + \" \" + item['answers'][i]))\n\n            if 'sgrefer' in caption_source:\n                for anno_type in cfg.sgrefer.anno_type:\n                    anno_file = os.path.join(self.base_dir,\n                                             f'annotations/refer/ssg_ref_{anno_type}.json')\n                    json_data = json.load(open(anno_file, 'r', encoding='utf-8'))\n                    for item in json_data:\n                        if item['scan_id'] in self.scannet_scan_ids:\n                            lang_data.append(('scannet', item['scan_id'], item['utterance']))\n\n            if 'sgcaption' in caption_source:\n                for anno_type in cfg.sgcaption.anno_type:\n                    anno_file = os.path.join(self.base_dir,\n                                             f'annotations/refer/ssg_caption_{anno_type}.json')\n                    json_data = json.load(open(anno_file, 'r', encoding='utf-8'))\n                    for item in json_data:\n                        if item['scan_id'] in self.scannet_scan_ids:\n                            lang_data.append(('scannet', item['scan_id'], item['utterance']))\n        return lang_data\n\n    def _load_split(self, cfg, split, use_multi_process = False):\n        if use_multi_process and split in ['train']:\n            split_file = os.path.join(self.base_dir, 'annotations/splits/scannetv2_'+ split + \"_sort.json\")\n            with open(split_file, 'r') as f:\n                scannet_scan_ids = json.load(f)\n        else:\n            split_file = os.path.join(self.base_dir, 'annotations/splits/scannetv2_'+ split + \".txt\")\n            scannet_scan_ids = {x.strip() for x in open(split_file, 'r', encoding=\"utf-8\")}\n        scannet_scan_ids = sorted(scannet_scan_ids)\n\n        if cfg.debug.flag and cfg.debug.debug_size != -1:\n            scannet_scan_ids = list(scannet_scan_ids)[:cfg.debug.debug_size]\n        return scannet_scan_ids\n\n    def _load_inst_info(self, scan_id):\n        # inst_labels = json.load(open(os.path.join(self.base_dir, 'scan_data',\n        #                                           'instance_id_to_label',\n        #                                          f'{scan_id}.json'), encoding=\"utf-8\"))\n        inst_labels = torch.load(os.path.join(self.base_dir, 'scan_data',\n                                                'instance_id_to_label',\n                                                f'{scan_id}.pth'))\n        inst_labels = [self.cat2int[i] for i in inst_labels.values()]\n\n        inst_loc_path = os.path.join(self.base_dir, 'scan_data',\n                                     'instance_id_to_loc', f'{scan_id}.npy')\n        if os.path.exists(inst_loc_path):\n            inst_locs = np.load(inst_loc_path)\n        else:\n            inst_locs = None\n\n        inst_color_path = os.path.join(self.base_dir, 'scan_data',\n                                       'instance_id_to_gmm_color', f'{scan_id}.json') \n        if os.path.exists(inst_color_path):\n            inst_colors = json.load(open(inst_color_path, encoding=\"utf-8\"))\n            inst_colors = [np.concatenate(\n                [np.array(x['weights'])[:, None], np.array(x['means'])],\n                axis=1).astype(np.float32) for x in inst_colors]\n        else:\n            inst_colors = None\n\n        return inst_labels, inst_locs, inst_colors\n\n    def _obj_processing_post(self, obj_pcds, obj_labels, 
is_need_bbox=False, rot_aug=True):\n        # rotate obj\n        rot_matrix = build_rotate_mat(self.split, rot_aug)\n\n        # normalize pc and calculate location\n        obj_fts = []\n        obj_locs = []\n        obj_boxes = []\n        for obj_pcd in obj_pcds:\n            # build locs\n            if rot_matrix is not None:\n                obj_pcd[:, :3] = np.matmul(obj_pcd[:, :3], rot_matrix.transpose())\n            obj_center = obj_pcd[:, :3].mean(0)\n            obj_size = obj_pcd[:, :3].max(0) - obj_pcd[:, :3].min(0)\n            obj_locs.append(np.concatenate([obj_center, obj_size], 0))\n\n            # build box\n            if is_need_bbox:\n                obj_box_center = (obj_pcd[:, :3].max(0) + obj_pcd[:, :3].min(0)) / 2\n                obj_box_size = obj_pcd[:, :3].max(0) - obj_pcd[:, :3].min(0)\n                obj_boxes.append(np.concatenate([obj_box_center, obj_box_size], 0))\n\n            # subsample\n            pcd_idxs = np.random.choice(len(obj_pcd), size=self.num_points,\n                                        replace=len(obj_pcd) < self.num_points)\n            obj_pcd = obj_pcd[pcd_idxs]\n            # normalize\n            obj_pcd[:, :3] = obj_pcd[:, :3] - obj_pcd[:, :3].mean(0)\n            max_dist = np.max(np.sqrt(np.sum(obj_pcd[:, :3]**2, 1)))\n            if max_dist < 1e-6: # take care of tiny point-clouds, i.e., padding\n                max_dist = 1\n            obj_pcd[:, :3] = obj_pcd[:, :3] / max_dist\n            obj_fts.append(obj_pcd)\n\n        # convert to torch\n        obj_fts = torch.from_numpy(np.stack(obj_fts, 0))\n        obj_locs = torch.from_numpy(np.array(obj_locs))\n        obj_boxes = torch.from_numpy(np.array(obj_boxes))\n        obj_labels = torch.LongTensor(obj_labels)\n\n        assert obj_labels.shape[0] == obj_locs.shape[0]\n        assert obj_fts.shape[0] == obj_locs.shape[0]\n\n        return obj_fts, obj_locs, obj_boxes, obj_labels\n\n    def _obj_processing_aug(self, obj_pcds, obj_labels, is_need_bbox=False):\n        # augment objects\n        if self.augmentor:\n            data_dict = self.augmentor.forward({'obj_pcds': obj_pcds,\n                                                'num_points': self.num_points})\n\n        obj_pcds = data_dict['obj_pcds']\n        if isinstance(obj_pcds, list):\n            obj_pcds = torch.Tensor(np.array(obj_pcds))\n        obj_sizes = torch.Tensor(np.array(data_dict['obj_sizes']))\n\n        xyz = obj_pcds[:, :, :3]\n        center = xyz.mean(1)\n        xyz_min = xyz.min(1).values\n        xyz_max = xyz.max(1).values\n        box_center = (xyz_min + xyz_max) / 2\n        size = torch.Tensor(obj_sizes)\n        # size = xyz_max - xyz_min\n        obj_locs = torch.cat([center, size], dim=1)\n        obj_boxes = torch.cat([box_center, size], dim=1)\n\n        # centering\n        obj_pcds[:, :, :3].sub_(obj_pcds[:, :, :3].mean(1, keepdim=True))\n\n        # normalization\n        max_dist = (obj_pcds[:, :, :3]**2).sum(2).sqrt().max(1).values\n        max_dist.clamp_(min=1e-6)\n        obj_pcds[:, :, :3].div_(max_dist[:, None, None])\n\n        # convert to torch\n        obj_labels = torch.LongTensor(obj_labels)\n\n        assert obj_labels.shape[0] == obj_locs.shape[0]\n\n        return obj_pcds, obj_locs, obj_boxes, obj_labels\n    \n    def _scene_processing_aug(self, obj_pcds, bg_pcds, obj_labels, is_need_bbox=False):\n        obj_len = len(obj_pcds)\n        # sample background points\n        fg_points_num = len(obj_pcds) * self.num_points\n        assert fg_points_num < 
self.max_pcd_num_points\n        bg_points_num = min(self.max_pcd_num_points - fg_points_num, self.bg_points_num)\n        assert len(bg_pcds) > 0\n        assert bg_points_num > 0\n        bg_points_indices = np.random.choice(len(bg_pcds), size=bg_points_num, replace=len(bg_pcds) < bg_points_num)\n        bg_pcds = bg_pcds[bg_points_indices]\n\n        # augment objects; an augmentor is required here, so fail loudly instead of\n        # reaching an undefined data_dict below\n        assert self.augmentor is not None\n        data_dict = self.augmentor.forward({'obj_pcds': obj_pcds,\n                                            'bg_pcds': torch.Tensor(bg_pcds),\n                                            'num_points': self.num_points})\n\n        obj_pcds = data_dict['obj_pcds']\n        if isinstance(obj_pcds, list):\n            obj_pcds = torch.Tensor(np.array(obj_pcds))\n        obj_sizes = torch.Tensor(np.array(data_dict['obj_sizes']))\n        bg_pcds = data_dict['bg_pcds']\n        assert len(obj_pcds) * obj_pcds[0].shape[0] == fg_points_num\n\n        scene_pcds = np.vstack([np.array(obj_pcds.reshape(-1, 6)), np.array(bg_pcds)])\n\n        xyz = obj_pcds[:, :, :3]\n        center = xyz.mean(1)\n        xyz_min = xyz.min(1).values\n        xyz_max = xyz.max(1).values\n        box_center = (xyz_min + xyz_max) / 2\n        size = obj_sizes  # already a Tensor\n        # size = xyz_max - xyz_min\n        obj_locs = torch.cat([center, size], dim=1)\n        obj_boxes = torch.cat([box_center, size], dim=1)\n\n        # centering\n        obj_pcds[:, :, :3].sub_(obj_pcds[:, :, :3].mean(1, keepdim=True))\n\n        # normalization\n        max_dist = (obj_pcds[:, :, :3]**2).sum(2).sqrt().max(1).values\n        max_dist.clamp_(min=1e-6)\n        obj_pcds[:, :, :3].div_(max_dist[:, None, None])\n\n        # generate obj point indices masks\n        obj_pcds_masks = []\n        offset = 0\n        for _j in range(obj_len):\n            mask = np.arange(self.num_points) + offset\n            assert len(mask) == len(obj_pcds[_j])\n            obj_pcds_masks.append(mask)\n            offset += self.num_points\n\n        # convert to torch\n        obj_labels = torch.LongTensor(obj_labels)\n        obj_pcds_masks = torch.from_numpy(np.array(obj_pcds_masks))\n\n        assert obj_labels.shape[0] == obj_locs.shape[0]\n        assert obj_pcds_masks.shape[0] == obj_locs.shape[0]\n\n        return obj_locs, obj_boxes, obj_labels, obj_pcds_masks, scene_pcds\n\n    def _get_pooling_obj_feature(self, args, mv_info_all, sampled_frame_names, scan_id):\n        obj_dict = {}\n        for i in range(len(sampled_frame_names)):\n            frame_info = mv_info_all[sampled_frame_names[i]]\n            inst_all = [x for x in frame_info['instance_info'] if x['is_need_process']]\n            for one_inst in inst_all:\n                tmp_inst_id = one_inst['org_inst_id']\n                feat = one_inst[args.inst_feat_type]\n                feat = feat[0] if len(feat) == 1 else feat\n\n                inst_id = self.label_converter.orgInstID_to_id[tmp_inst_id]\n                if inst_id in obj_dict.keys():\n                    obj_dict[inst_id]['feat'].append(feat)\n                    assert self.scan_data[scan_id]['inst_labels'][inst_id] == obj_dict[inst_id]['label']\n                else:\n                    obj_pcd = self.scan_data[scan_id]['obj_pcds'][inst_id]\n                    if self.rot_matrix is not None:\n                        # copy first so the rotation does not mutate the cached point cloud in place\n                        obj_pcd = obj_pcd.copy()\n                        obj_pcd[:, :3] = np.matmul(obj_pcd[:, :3], self.rot_matrix.transpose())\n                    obj_center = obj_pcd[:, :3].mean(0)\n                    obj_size = obj_pcd[:, :3].max(0) 
- obj_pcd[:, :3].min(0)\n                    obj_loc = np.concatenate([obj_center, obj_size], 0)\n\n                    obj_box_center = (obj_pcd[:, :3].max(0) + obj_pcd[:, :3].min(0)) / 2\n                    obj_box_size = obj_pcd[:, :3].max(0) - obj_pcd[:, :3].min(0)\n                    obj_box = np.concatenate([obj_box_center, obj_box_size], 0)\n\n                    obj_dict[inst_id] = {\n                        'feat': [feat],\n                        'location': obj_loc,\n                        'label': self.scan_data[scan_id]['inst_labels'][inst_id],\n                        'box': obj_box,\n                    }\n\n        if args.pooling_strategy == 'average_all':\n            for key in obj_dict.keys():\n                feat_all = np.array(obj_dict[key]['feat'])\n                obj_dict[key]['feat'] = np.mean(feat_all, axis=0)\n\n        return obj_dict\n\n    def init_dataset_params(self, dataset_cfg):\n        if dataset_cfg is None:\n            dataset_cfg = {}\n        self.pc_type = dataset_cfg.get('pc_type', 'gt')\n        self.sem_type = dataset_cfg.get('sem_type', '607')\n        self.max_obj_len = dataset_cfg.get('max_obj_len', 80)\n        self.num_points = dataset_cfg.get('num_points', 1024)\n        self.filter_lang = dataset_cfg.get('filter_lang', False)\n        self.rot_aug = dataset_cfg.get('rot_aug', True)\n        self.train_duplicate = dataset_cfg.get('train_duplicate', 1)\n\n        self.load_multiview_info = self.cfg.data.get('load_multiview_info', False)\n        self.load_mask3d_voxel = self.cfg.data.get('load_mask3d_voxel', False)\n        self.process_num = self.cfg.data.get('process_num', 20)\n        assert self.pc_type in ['gt', 'pred']\n        assert self.sem_type in ['607']\n\n    def init_scan_data(self):\n        self.scan_data = self._load_scannet(self.scan_ids, self.pc_type,\n                                            load_inst_info=self.split != 'test',\n                                            load_multiview_info=self.load_multiview_info,\n                                            load_mask3d_voxel=self.load_mask3d_voxel,\n                                            process_num=self.process_num)\n\n        # build unique multiple look up\n        for scan_id in self.scan_ids:\n            inst_labels = self.scan_data[scan_id]['inst_labels']\n            self.scan_data[scan_id]['label_count'] = collections.Counter(\n                                    [self.label_converter.id_to_scannetid[l] for l in inst_labels])\n\n    def get_scene(self, scan_id, tgt_object_id_list, tgt_object_name_list, sentence):\n        if not isinstance(tgt_object_id_list, list):\n            tgt_object_id_list = [tgt_object_id_list]\n        if not isinstance(tgt_object_name_list, list):\n            tgt_object_name_list = [tgt_object_name_list]\n        tgt_obj_boxes = [np.concatenate(convert_pc_to_box(self.scan_data[scan_id][\"obj_pcds\"][i])) for i in tgt_object_id_list]\n\n        # load pcds and labels\n        if self.pc_type == 'gt':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds'] # N, 6\n            obj_labels = self.scan_data[scan_id]['inst_labels'] # N\n        elif self.pc_type == 'pred':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds_pred']\n            obj_labels = self.scan_data[scan_id]['inst_labels_pred']\n            # get obj labels by matching\n            gt_obj_labels = 
self.scan_data[scan_id]['inst_labels'] # N\n            obj_center = self.scan_data[scan_id]['obj_center']\n            obj_box_size = self.scan_data[scan_id]['obj_box_size']\n            obj_center_pred = self.scan_data[scan_id]['obj_center_pred']\n            obj_box_size_pred = self.scan_data[scan_id]['obj_box_size_pred']\n            for i, _ in enumerate(obj_center_pred):\n                for j, _ in enumerate(obj_center):\n                    if eval_ref_one_sample(construct_bbox_corners(obj_center[j],\n                                                                  obj_box_size[j]),\n                                           construct_bbox_corners(obj_center_pred[i],\n                                                                  obj_box_size_pred[i])) >= 0.25:\n                        obj_labels[i] = gt_obj_labels[j]\n                        break\n\n        # filter out background or language\n        # do not filter for predicted labels, because these labels are not accurate\n        excluded_labels = ['wall', 'floor', 'ceiling']\n        def keep_obj(i, obj_label):\n            if self.pc_type != 'gt' or i in tgt_object_id_list:\n                return True\n            category = self.int2cat[obj_label]\n            if category in excluded_labels:\n                return False\n            if self.filter_lang and category not in sentence:\n                return False\n            return True\n        selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels) if keep_obj(i, obj_label)]\n\n        # build tgt object id and box\n        if self.pc_type == 'gt':\n            tgt_object_label_list = [obj_labels[x] for x in tgt_object_id_list]\n            tgt_object_id_iou25_list = tgt_object_id_list\n            tgt_object_id_iou50_list = tgt_object_id_list\n            # for i, _ in enumerate(tgt_object_label_list):\n            #     assert self.int2cat[tgt_object_label_list[i]] == tgt_object_name_list[i]\n        elif self.pc_type == 'pred':\n            tgt_object_label_list = [self.cat2int[x] for x in tgt_object_name_list]\n            tgt_object_id_list_matched = []\n            tgt_object_id_iou25_list = []\n            tgt_object_id_iou50_list = []\n            for cur_id in tgt_object_id_list:\n                gt_pcd = self.scan_data[scan_id][\"obj_pcds\"][cur_id]\n                gt_center, gt_box_size = convert_pc_to_box(gt_pcd)\n                max_iou = -1\n                for i in selected_obj_idxs:\n                    obj_center, obj_box_size = convert_pc_to_box(obj_pcds[i])\n                    iou = eval_ref_one_sample(construct_bbox_corners(obj_center, obj_box_size),\n                                            construct_bbox_corners(gt_center, gt_box_size))\n                    if iou > max_iou:\n                        max_iou = iou\n                        tgt_object_id_matched = i\n                    # find tgt iou 25\n                    if iou >= 0.25:\n                        tgt_object_id_iou25_list.append(i)\n                    # find tgt iou 50\n                    if iou >= 0.5:\n                        tgt_object_id_iou50_list.append(i)\n                tgt_object_id_list_matched.append(tgt_object_id_matched)\n            tgt_object_id_list = tgt_object_id_list_matched\n\n            tgt_object_id_list = list(set(tgt_object_id_list))\n            tgt_object_id_iou25_list = list(set(tgt_object_id_iou25_list))\n            tgt_object_id_iou50_list = list(set(tgt_object_id_iou50_list))\n\n        # crop objects to max_obj_len\n        if 
self.max_obj_len < len(selected_obj_idxs):\n            pre_selected_obj_idxs = selected_obj_idxs\n            # select targets first; start from the targets only (an empty list when there\n            # are none) so that the extends below never mutate pre_selected_obj_idxs\n            selected_obj_idxs = tgt_object_id_list[:]\n            selected_obj_idxs.extend(tgt_object_id_iou25_list)\n            selected_obj_idxs.extend(tgt_object_id_iou50_list)\n            selected_obj_idxs = list(set(selected_obj_idxs))\n            # select objects with the same semantic class as tgt_object\n            remained_obj_idx = []\n            for i in pre_selected_obj_idxs:\n                label = obj_labels[i]\n                if i not in selected_obj_idxs:\n                    if label in tgt_object_label_list:\n                        selected_obj_idxs.append(i)\n                    else:\n                        remained_obj_idx.append(i)\n                if len(selected_obj_idxs) >= self.max_obj_len:\n                    break\n            if len(selected_obj_idxs) < self.max_obj_len:\n                random.shuffle(remained_obj_idx)\n                selected_obj_idxs += remained_obj_idx[:(self.max_obj_len - len(selected_obj_idxs))]\n            # assert len(selected_obj_idxs) == self.max_obj_len\n\n        # reorganize ids\n        tgt_object_id_list = [selected_obj_idxs.index(idx) for idx in tgt_object_id_list]\n        tgt_object_id_iou25_list = [selected_obj_idxs.index(idx) for idx in tgt_object_id_iou25_list]\n        tgt_object_id_iou50_list = [selected_obj_idxs.index(idx) for idx in tgt_object_id_iou50_list]\n\n        # build unique/multiple flag\n        is_multiple = sum([self.scan_data[scan_id]['label_count'][self.label_converter.id_to_scannetid[x]]\n                           for x in tgt_object_label_list]) > 1\n\n        obj_pcds = [obj_pcds[idx] for idx in selected_obj_idxs]\n        obj_labels = [obj_labels[idx] for idx in selected_obj_idxs]\n        obj_fts, obj_locs, obj_boxes, obj_labels = self._obj_processing_post(obj_pcds, obj_labels, is_need_bbox=True, rot_aug=self.rot_aug)\n\n        data_dict = {\n            \"scan_id\": scan_id,\n            \"tgt_object_id\": torch.LongTensor(tgt_object_id_list),\n            \"tgt_object_label\": torch.LongTensor(tgt_object_label_list),\n            \"tgt_obj_boxes\": tgt_obj_boxes, # only used for evaluation, because it is w/o augmentation\n            \"obj_fts\": obj_fts,\n            \"obj_locs\": obj_locs,\n            \"obj_labels\": obj_labels,\n            \"obj_boxes\": obj_boxes,\n            \"tgt_object_id_iou25\": torch.LongTensor(tgt_object_id_iou25_list),\n            \"tgt_object_id_iou50\": torch.LongTensor(tgt_object_id_iou50_list),\n            \"is_multiple\": is_multiple\n        }\n\n        if 'multiview_info' in self.scan_data[scan_id]:\n            mv_out_dict = self._get_multiview_info(scan_id)\n            obj_mv_fts = [mv_out_dict[oid]['feat'] if oid in mv_out_dict else\n                      np.zeros_like(next(iter(mv_out_dict.values()))['feat']) for oid in selected_obj_idxs]\n            data_dict['obj_mv_fts'] = torch.from_numpy(np.array(obj_mv_fts)).float()\n\n        if 'mask3d_voxel' in self.scan_data[scan_id]:\n            voxel_out_dict = self.scan_data[scan_id]['mask3d_voxel']\n            obj_voxel_fts = [voxel_out_dict[idx] if idx in voxel_out_dict else\n                        np.zeros_like(next(iter(voxel_out_dict.values()))) for idx in selected_obj_idxs]\n            data_dict['obj_voxel_fts'] = torch.from_numpy(np.array(obj_voxel_fts)).float()\n\n        return data_dict"
  },
  {
    "path": "data/datasets/scannet_old.py",
    "content": "import os\nimport collections\nimport json\nimport random\n\nimport jsonlines\nfrom tqdm import tqdm\nimport numpy as np\nimport albumentations as A\nimport volumentations as V\nimport torch\nfrom torch.utils.data import Dataset\nfrom pathlib import Path\nfrom copy import deepcopy\n\nfrom ..build import DATASET_REGISTRY\nfrom ..data_utils import convert_pc_to_box, ScanQAAnswer, SQA3DAnswer, construct_bbox_corners, \\\n                         eval_ref_one_sample, is_explicitly_view_dependent, get_sqa_question_type\nfrom .scannet_base import ScanNetBase\n\n\n@DATASET_REGISTRY.register()\nclass ScanNetSQA3D(ScanNetBase):\n    r\"\"\"\n    questions json file: dict_keys(['info', 'license', 'data_type', 'data_subtype', 'task_type', 'questions'])\n        'questions': List\n        'questions'[0]: {\n            'scene_id': 'scene0050_00',\n            'situation': 'I am standing by the ottoman on my right facing a couple of toolboxes.',\n            'alternative_situation': [\n                'I just placed two backpacks on the ottoman on my right side before I went to play the piano in front of me to the right.',\n                'I stood up from the ottoman and walked over to the piano ahead of me.'\n            ],\n            'question': 'What instrument in front of me is ebony and ivory?',\n            'question_id': 220602000002\n        }\n\n    annotations json file: dict_keys(['info', 'license', 'data_type', 'data_subtype', 'annotations'])\n        'annotations': List\n        'annotations'[0]: {\n            'scene_id': 'scene0050_00',\n            'question_type': 'N/A',\n            'answer_type': 'other',\n            'question_id': 220602000002,\n            'answers': [{'answer': 'piano', 'answer_confidence': 'yes', 'answer_id': 1}],\n            'rotation': {'_x': 0, '_y': 0, '_z': -0.9995736030415032, '_w': -0.02919952230128897},\n            'position': {'x': 0.7110268899979686, 'y': -0.03219739162793617, 'z': 0}\n        }\n    \"\"\"\n    def __init__(self, cfg, split):\n        super().__init__(cfg, split)\n\n        self.pc_type = cfg.data.args.pc_type\n        self.sem_type = cfg.data.args.sem_type\n        self.max_obj_len = cfg.data.args.max_obj_len - 1\n        self.num_points = cfg.data.args.num_points\n        self.filter_lang = cfg.data.args.filter_lang\n        self.rot_aug = cfg.data.args.rot_aug\n        self.use_unanswer = cfg.data.get(self.__class__.__name__).get(split).use_unanswer\n\n        assert self.pc_type in ['gt', 'pred']\n        assert self.sem_type in ['607']\n        assert self.split in ['train', 'val', 'test']\n        if self.split == 'train':\n            self.pc_type = 'gt'\n        # use test set for validation\n        elif self.split == 'val':\n            self.split = 'test'\n\n        print(f\"Loading ScanNet SQA3D {split}-set language\")\n        # build answer\n        self.num_answers, self.answer_vocab, self.answer_cands = self.build_answer()\n\n        # load annotations\n        lang_data, self.scan_ids, self.scan_to_item_idxs = self._load_lang()\n        if cfg.debug.flag:\n            self.lang_data = []\n            self.scan_ids = sorted(list(self.scan_ids))[:cfg.debug.debug_size]\n            for item in lang_data:\n                if item['scene_id'] in self.scan_ids:\n                    self.lang_data.append(item)\n        else:\n            self.lang_data = lang_data\n\n        # load question engine\n        self.questions_map = self._load_question()\n        print(f\"Finish loading ScanNet SQA3D 
{split}-set language\")\n\n        # load scans\n        print(f\"Loading ScanNet SQA3D {split}-set scans\")\n        self.scan_data = self._load_scannet(self.scan_ids, self.pc_type, self.pc_type == 'gt')\n        print(f\"Finish loading ScanNet SQA3D {split}-set data\")\n\n    def __getitem__(self, index):\n        item = self.lang_data[index]\n        item_id = item['question_id']\n        scan_id = item['scene_id']\n\n        tgt_object_id_list = []\n        tgt_object_name_list = []\n        answer_list = [answer['answer'] for answer in item['answers']]\n        answer_id_list = [self.answer_vocab.stoi(answer)\n                          for answer in answer_list if self.answer_vocab.stoi(answer) >= 0]\n\n        if self.split == 'train':\n            # augment with random situation for train\n            situation = random.choice(self.questions_map[scan_id][item_id]['situation'])\n        else:\n            # fix for eval\n            situation = self.questions_map[scan_id][item_id]['situation'][0]\n        question = self.questions_map[scan_id][item_id]['question']\n        concat_sentence = situation + question\n        question_type = get_sqa_question_type(question)\n\n        # load pcds and labels\n        if self.pc_type == 'gt':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds'] # N, 6\n            obj_labels = self.scan_data[scan_id]['inst_labels'] # N\n        elif self.pc_type == 'pred':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds_pred']\n            obj_labels = self.scan_data[scan_id]['inst_labels_pred']\n\n        # filter out background or language\n        if self.filter_lang:\n            if self.pc_type == 'gt':\n                selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                                if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])\n                                and (self.int2cat[obj_label] in concat_sentence)]\n                for _id in tgt_object_id_list:\n                    if _id not in selected_obj_idxs:\n                        selected_obj_idxs.append(_id)\n            else:\n                selected_obj_idxs = [i for i in range(len(obj_pcds))]\n        else:\n            if self.pc_type == 'gt':\n                selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                                if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])]\n            else:\n                selected_obj_idxs = [i for i in range(len(obj_pcds))]\n\n        obj_pcds = [obj_pcds[id] for id in selected_obj_idxs]\n        obj_labels = [obj_labels[id] for id in selected_obj_idxs]\n\n        # build tgt object id and box\n        if self.pc_type == 'gt':\n            tgt_object_id_list = [selected_obj_idxs.index(x) for x in tgt_object_id_list]\n            tgt_object_label_list = [obj_labels[x] for x in tgt_object_id_list]\n            for i in range(len(tgt_object_label_list)):\n                assert self.int2cat[tgt_object_label_list[i]] == tgt_object_name_list[i]\n        elif self.pc_type == 'pred':\n            # build gt box\n            gt_center = []\n            gt_box_size = []\n            for cur_id in tgt_object_id_list:\n                gt_pcd = self.scan_data[scan_id][\"obj_pcds\"][cur_id]\n                center, box_size = convert_pc_to_box(gt_pcd)\n                gt_center.append(center)\n                gt_box_size.append(box_size)\n\n            # start filtering\n            tgt_object_id_list = []\n            tgt_object_label_list = []\n       
     for i in range(len(obj_pcds)):\n                obj_center, obj_box_size = convert_pc_to_box(obj_pcds[i])\n                for j in range(len(gt_center)):\n                    if eval_ref_one_sample(construct_bbox_corners(obj_center, obj_box_size), construct_bbox_corners(gt_center[j], gt_box_size[j])) >= 0.25:\n                        tgt_object_id_list.append(i)\n                        tgt_object_label_list.append(self.cat2int[tgt_object_name_list[j]])\n                        break\n        assert len(obj_pcds) == len(obj_labels)\n\n        # crop objects\n        if self.max_obj_len < len(obj_labels):\n            selected_obj_idxs = tgt_object_id_list.copy()\n            remained_obj_idx = []\n            for kobj, klabel in enumerate(obj_labels):\n                if kobj not in tgt_object_id_list:\n                    if klabel in tgt_object_label_list:\n                        selected_obj_idxs.append(kobj)\n                    else:\n                        remained_obj_idx.append(kobj)\n                if len(selected_obj_idxs) == self.max_obj_len:\n                    break\n            if len(selected_obj_idxs) < self.max_obj_len:\n                random.shuffle(remained_obj_idx)\n                selected_obj_idxs += remained_obj_idx[:(self.max_obj_len - len(selected_obj_idxs))]\n            obj_pcds = [obj_pcds[i] for i in selected_obj_idxs]\n            obj_labels = [obj_labels[i] for i in selected_obj_idxs]\n            tgt_object_id_list = [i for i in range(len(tgt_object_id_list))]\n            assert len(obj_pcds) == self.max_obj_len\n\n        # rebuild tgt_object_id\n        if len(tgt_object_id_list) == 0:\n            tgt_object_id_list.append(len(obj_pcds))\n            tgt_object_label_list.append(5)\n\n        obj_fts, obj_locs, obj_boxes, obj_labels = self._obj_processing_post(obj_pcds, obj_labels, is_need_bbox=True, rot_aug=self.rot_aug)\n\n        # convert answer format\n        answer_label = torch.zeros(self.num_answers).long()\n        for _id in answer_id_list:\n            answer_label[_id] = 1\n        # tgt object id\n        tgt_object_id = torch.zeros(len(obj_fts) + 1).long() # add 1 for pad placeholder\n        for _id in tgt_object_id_list:\n            tgt_object_id[_id] = 1\n        # tgt object semantic\n        if self.sem_type == '607':\n            tgt_object_label = torch.zeros(607).long()\n        else:\n            raise NotImplementedError(\"semantic type \" + self.sem_type)\n        for _id in tgt_object_label_list:\n            tgt_object_label[_id] = 1\n\n        data_dict = {\n            \"situation\": situation,\n            \"situation_pos\": item['position'],\n            \"situation_rot\": item['rotation'],\n            \"question\": question,\n            \"sentence\": concat_sentence,\n            \"scan_dir\": os.path.join(self.base_dir, 'scans'),\n            \"scan_id\": scan_id,\n            \"answer\": \"[answer_seq]\".join(answer_list),\n            \"answer_label\": answer_label, # A\n            \"tgt_object_id\": torch.LongTensor(tgt_object_id), # N\n            \"tgt_object_label\": torch.LongTensor(tgt_object_label), # L\n            \"obj_fts\": obj_fts,\n            \"obj_locs\": obj_locs,\n            \"obj_labels\": obj_labels,\n            \"obj_boxes\": obj_boxes, # N, 6\n            \"data_idx\": item_id,\n            \"sqa_type\": question_type\n        }\n\n        return data_dict\n\n    def build_answer(self):\n        answer_data = json.load(\n            open(os.path.join(self.base_dir,\n             
                  'annotations/sqa_task/answer_dict.json'), encoding='utf-8')\n            )[0]\n        answer_counter = collections.Counter(sorted(answer_data.keys()))\n        num_answers = len(answer_counter)\n        answer_cands = answer_counter.keys()\n        answer_vocab = SQA3DAnswer(answer_cands)\n        print(f\"total answers: {num_answers}\")\n        return num_answers, answer_vocab, answer_cands\n\n    def _load_lang(self):\n        lang_data = []\n        scan_ids = set()\n        scan_to_item_idxs = collections.defaultdict(list)\n\n        anno_file = os.path.join(self.base_dir,\n            f'annotations/sqa_task/balanced/v1_balanced_sqa_annotations_{self.split}_scannetv2.json')\n        json_data = json.load(open(anno_file, 'r', encoding='utf-8'))['annotations']\n        for item in json_data:\n            if self.use_unanswer or (len(set(item['answers']) & set(self.answer_cands)) > 0):\n                scan_ids.add(item['scene_id'])\n                scan_to_item_idxs[item['scene_id']].append(len(lang_data))\n                lang_data.append(item)\n        print(f'{self.split} unanswerable questions {len(json_data) - len(lang_data)}, '\n              + f'answerable questions {len(lang_data)}')\n\n        return lang_data, scan_ids, scan_to_item_idxs\n\n    def _load_question(self):\n        questions_map = {}\n        anno_file = os.path.join(self.base_dir,\n            f'annotations/sqa_task/balanced/v1_balanced_questions_{self.split}_scannetv2.json')\n        json_data = json.load(open(anno_file, 'r', encoding='utf-8'))['questions']\n        for item in json_data:\n            if item['scene_id'] not in questions_map.keys():\n                questions_map[item['scene_id']] = {}\n            questions_map[item['scene_id']][item['question_id']] = {\n                'situation': [item['situation']] + item['alternative_situation'],   # list of sentences\n                'question': item['question']   # sentence\n            }\n\n        return questions_map\n\n\n@DATASET_REGISTRY.register()\nclass ScanNetScanQAOld(ScanNetBase):\n    def __init__(self, cfg, split):\n        super(ScanNetScanQAOld, self).__init__(cfg, split)\n\n        self.pc_type = cfg.data.args.pc_type\n        self.sem_type = cfg.data.args.sem_type\n        self.max_obj_len = cfg.data.args.max_obj_len - 1\n        self.num_points = cfg.data.args.num_points\n        self.filter_lang = cfg.data.args.filter_lang\n        self.use_unanswer = cfg.data.get(self.__class__.__name__).get(split).use_unanswer\n\n        assert self.pc_type in ['gt', 'pred']\n        assert self.sem_type in ['607']\n        assert self.split in ['train', 'val', 'test']\n        if self.split == 'train':\n            self.pc_type = 'gt'\n        # TODO: hack test split to be the same as val\n        if self.split == 'test':\n            self.split = cfg.data.ScanNetScanQAOld.test.test_file\n        self.is_test = ('test' in self.split)\n\n        print(f\"Loading ScanNet ScanQA {split}-set language\")\n        self.num_answers, self.answer_vocab, self.answer_cands = self.build_answer()\n        lang_data, self.scan_ids, self.scan_to_item_idxs = self._load_lang()\n        if cfg.debug.flag and cfg.debug.debug_size != -1:\n            self.lang_data = []\n            self.scan_ids = sorted(list(self.scan_ids))[:cfg.debug.debug_size]\n            for item in lang_data:\n                if item['scene_id'] in self.scan_ids:\n         
            self.lang_data.append(item)\n        else:\n            self.lang_data = lang_data\n        print(f\"Finish loading ScanNet ScanQA {split}-set language\")\n\n        print(f\"Loading ScanNet ScanQA {split}-set scans\")\n        self.scan_data = self._load_scannet(self.scan_ids, self.pc_type,\n                                           load_inst_info=('test' not in self.split))\n        print(f\"Finish loading ScanNet ScanQA {split}-set data\")\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, normalization,\n        rotation, etc.\n        Args:\n            index (int): _description_\n        \"\"\"\n        item = self.lang_data[index]\n        item_id = item['question_id']\n        # item_id = ''.join([i for i in item_id if i.isdigit()])\n        # item_id = int(item_id[:-1].lstrip('0') + item_id[-1])\n        scan_id = item['scene_id']\n        if not self.is_test:\n            tgt_object_id_list = item['object_ids']\n            tgt_object_name_list = item['object_names']\n            answer_list = item['answers']\n            answer_id_list = [self.answer_vocab.stoi(answer)\n                              for answer in answer_list if self.answer_vocab.stoi(answer) >= 0]\n        else:\n            tgt_object_id_list = []\n            tgt_object_name_list = []\n            answer_list = []\n            answer_id_list = []\n        question = item['question']\n\n        # load pcds and labels\n        if self.pc_type == 'gt':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds'] # N, 6\n            obj_labels = self.scan_data[scan_id]['inst_labels'] # N\n        elif self.pc_type == 'pred':\n            obj_pcds = self.scan_data[scan_id]['obj_pcds_pred']\n            obj_labels = self.scan_data[scan_id]['inst_labels_pred']\n            # get obj labels by matching\n            if not self.is_test:\n                gt_obj_labels = self.scan_data[scan_id]['inst_labels'] # N\n                obj_center = self.scan_data[scan_id]['obj_center']\n                obj_box_size = self.scan_data[scan_id]['obj_box_size']\n                obj_center_pred = self.scan_data[scan_id]['obj_center_pred']\n                obj_box_size_pred = self.scan_data[scan_id]['obj_box_size_pred']\n                for i, _ in enumerate(obj_center_pred):\n                    for j, _ in enumerate(obj_center):\n                        if eval_ref_one_sample(construct_bbox_corners(obj_center[j],\n                                                                      obj_box_size[j]),\n                                            construct_bbox_corners(obj_center_pred[i],\n                                                                   obj_box_size_pred[i])) >= 0.25:\n                            obj_labels[i] = gt_obj_labels[j]\n                            break\n\n        # filter out background or language\n        if self.filter_lang:\n            if self.pc_type == 'gt':\n                selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                                    if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])\n                                    and (self.int2cat[obj_label] in question)]\n                for _id in tgt_object_id_list:\n                    if _id not in selected_obj_idxs:\n                        selected_obj_idxs.append(_id)\n            else:\n                selected_obj_idxs = [i for i in range(len(obj_pcds))]\n        else:\n            if self.pc_type == 'gt':\n                
selected_obj_idxs = [i for i, obj_label in enumerate(obj_labels)\n                                if (self.int2cat[obj_label] not in ['wall', 'floor', 'ceiling'])]\n            else:\n                selected_obj_idxs = [i for i in range(len(obj_pcds))]\n\n        obj_pcds = [obj_pcds[idx] for idx in selected_obj_idxs]\n        obj_labels = [obj_labels[idx] for idx in selected_obj_idxs]\n\n        # build tgt object id and box\n        if self.pc_type == 'gt':\n            tgt_object_id_list = [selected_obj_idxs.index(x) for x in tgt_object_id_list]\n            tgt_object_label_list = [obj_labels[x] for x in tgt_object_id_list]\n            for i, _ in enumerate(tgt_object_label_list):\n                assert self.int2cat[tgt_object_label_list[i]] == tgt_object_name_list[i]\n        elif self.pc_type == 'pred':\n            # build gt box\n            gt_center = []\n            gt_box_size = []\n            for cur_id in tgt_object_id_list:\n                gt_pcd = self.scan_data[scan_id][\"obj_pcds\"][cur_id]\n                center, box_size = convert_pc_to_box(gt_pcd)\n                gt_center.append(center)\n                gt_box_size.append(box_size)\n\n            # start filtering\n            tgt_object_id_list = []\n            tgt_object_label_list = []\n            for i, _ in enumerate(obj_pcds):\n                obj_center, obj_box_size = convert_pc_to_box(obj_pcds[i])\n                for j, _ in enumerate(gt_center):\n                    if eval_ref_one_sample(construct_bbox_corners(obj_center,\n                                                                  obj_box_size),\n                                           construct_bbox_corners(gt_center[j],\n                                                                  gt_box_size[j])) >= 0.25:\n                        tgt_object_id_list.append(i)\n                        tgt_object_label_list.append(self.cat2int[tgt_object_name_list[j]])\n                        break\n        assert len(obj_pcds) == len(obj_labels)\n\n        # crop objects\n        if self.max_obj_len < len(obj_labels):\n            selected_obj_idxs = tgt_object_id_list.copy()\n            remained_obj_idx = []\n            for kobj, klabel in enumerate(obj_labels):\n                if kobj not in tgt_object_id_list:\n                    if klabel in tgt_object_label_list:\n                        selected_obj_idxs.append(kobj)\n                    else:\n                        remained_obj_idx.append(kobj)\n                if len(selected_obj_idxs) == self.max_obj_len:\n                    break\n            if len(selected_obj_idxs) < self.max_obj_len:\n                random.shuffle(remained_obj_idx)\n                selected_obj_idxs += remained_obj_idx[:(self.max_obj_len - len(selected_obj_idxs))]\n            obj_pcds = [obj_pcds[i] for i in selected_obj_idxs]\n            obj_labels = [obj_labels[i] for i in selected_obj_idxs]\n            tgt_object_id_list = [i for i in range(len(tgt_object_id_list))]\n            assert len(obj_pcds) == self.max_obj_len\n\n        # rebuild tgt_object_id\n        if len(tgt_object_id_list) == 0:\n            tgt_object_id_list.append(len(obj_pcds))\n            tgt_object_label_list.append(5)\n\n        obj_fts, obj_locs, obj_boxes, obj_labels = self._obj_processing_post(obj_pcds, obj_labels,\n                                                                             is_need_bbox=True)\n\n        # convert answer format\n        answer_label = torch.zeros(self.num_answers)\n        for _id in 
answer_id_list:\n            answer_label[_id] = 1\n        # tgt object id\n        tgt_object_id = torch.zeros(len(obj_fts) + 1) # add 1 for pad placeholder\n        for _id in tgt_object_id_list:\n            tgt_object_id[_id] = 1\n        # tgt object semantic\n        if self.sem_type == '607':\n            tgt_object_label = torch.zeros(607)\n        else:\n            raise NotImplementedError(\"semantic type \" + self.sem_type)\n        for _id in tgt_object_label_list:\n            tgt_object_label[_id] = 1\n\n        data_dict = {\n            \"sentence\": question,\n            \"scan_dir\": os.path.join(self.base_dir, 'scans'),\n            \"scan_id\": scan_id,\n            \"answers\": \"[answer_seq]\".join(answer_list),\n            \"answer_label\": answer_label.float(), # A\n            \"tgt_object_id\": tgt_object_id.float(), # N\n            \"tgt_object_label\": tgt_object_label.float(), # L\n            \"obj_fts\": obj_fts,\n            \"obj_locs\": obj_locs,\n            \"obj_labels\": obj_labels,\n            \"obj_boxes\": obj_boxes, # N, 6\n            \"data_idx\": item_id\n        }\n\n        return data_dict\n\n    def _load_lang(self):\n        lang_data = []\n        scan_ids = set()\n        scan_to_item_idxs = collections.defaultdict(list)\n\n        anno_file = os.path.join(self.base_dir,\n                                 f'annotations/qa/ScanQA_v1.0_{self.split}.json')\n\n        json_data = json.load(open(anno_file, 'r', encoding='utf-8'))\n        for item in json_data:\n            if self.use_unanswer or (len(set(item['answers']) & set(self.answer_cands)) > 0):\n                scan_ids.add(item['scene_id'])\n                scan_to_item_idxs[item['scene_id']].append(len(lang_data))\n                lang_data.append(item)\n        print(f'{self.split} unanswerable questions {len(json_data) - len(lang_data)}, '\n              + f'answerable questions {len(lang_data)}')\n        return lang_data, scan_ids, scan_to_item_idxs\n\n    def build_answer(self):\n        train_data = json.load(open(os.path.join(self.base_dir,\n                                'annotations/qa/ScanQA_v1.0_train.json'), encoding='utf-8'))\n        answer_counter = sum([data['answers'] for data in train_data], [])\n        answer_counter = collections.Counter(sorted(answer_counter))\n        num_answers = len(answer_counter)\n        answer_cands = answer_counter.keys()\n        answer_vocab = ScanQAAnswer(answer_cands)\n        print(f\"total answers: {num_answers}\")\n        return num_answers, answer_vocab, answer_cands\n"
  },
  {
    "path": "data/datasets/structure3d.py",
    "content": "import collections\n\nfrom ..build import DATASET_REGISTRY\nfrom .base import ScanBase\n\n\n@DATASET_REGISTRY.register()\nclass S3DPretrainObj(ScanBase):\n    def __init__(self, cfg, split):\n        super(S3DPretrainObj, self).__init__(cfg, split)\n        self.base_dir = cfg.data.s3d_base\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        self.scan_ids = sorted(list(self._load_split(self.split)))\n        if cfg.debug.flag and cfg.debug.debug_size != -1:\n            self.scan_ids = self.scan_ids[:cfg.debug.debug_size]\n\n        print(f\"Loading Structure3D {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        self.scan_ids = sorted(list(self.scan_data.keys()))\n        print(f\"Finish loading Structure3D {split}-set scans of length {len(self.scan_ids)}\")\n\n    def __len__(self):\n        return len(self.scan_ids)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_obj_pretrain(index)\n        dataset = 'rscan'\n        data_dict['source'] = dataset\n        return data_dict\n\n\n@DATASET_REGISTRY.register()\nclass S3DSpatialRefer(ScanBase):\n    def __init__(self, cfg, split):\n        super(S3DSpatialRefer, self).__init__(cfg, split)\n        self.base_dir = cfg.data.s3d_base\n        self.max_obj_len = cfg.data.args.max_obj_len - 1\n        self.filter_lang = cfg.data.args.filter_lang\n\n        self.load_scene_pcds = cfg.data.args.get('load_scene_pcds', False)\n        if self.load_scene_pcds:\n            self.max_pcd_num_points = cfg.data.args.get('max_pcd_num_points', None)\n            assert self.max_pcd_num_points is not None\n        self.bg_points_num = cfg.data.args.get('bg_points_num', 1000)\n\n        split_cfg = cfg.data.get(self.__class__.__name__).get(split)\n        all_scan_ids = self._load_split(self.split)\n\n        print(f\"Loading Structure3D SpatialRefer {split}-set language\")\n        self.lang_data, self.scan_ids = self._load_lang(split_cfg, all_scan_ids)\n        print(f\"Finish loading Structure3D SpatialRefer {split}-set language of size {self.__len__()}\")\n\n        print(f\"Loading Structure3D {split}-set scans\")\n        self.scan_data = self._load_scan(self.scan_ids)\n        print(f\"Finish loading Structure3D {split}-set scans\")\n\n         # build unique multiple look up\n        for scan_id in self.scan_ids:\n            inst_labels = self.scan_data[scan_id]['inst_labels']\n            self.scan_data[scan_id]['label_count'] = collections.Counter(\n                                    [l for l in inst_labels])\n            self.scan_data[scan_id]['label_count_multi'] = collections.Counter(\n                                    [self.label_converter.id_to_scannetid[l] for l in inst_labels])\n\n    def __len__(self):\n        return len(self.lang_data)\n\n    def __getitem__(self, index):\n        \"\"\"Data dict post-processing, for example, filtering, crop, nomalization,\n        rotation, etc.\n\n        Args:\n            index (int): _description_\n        \"\"\"\n        data_dict = self._getitem_refer(index)\n      
  return data_dict\n"
  },
  {
    "path": "evaluator/__init__.py",
    "content": "from .pretrain_eval import *\nfrom .referit3d_eval import *\nfrom .scanrefer_eval import *\nfrom .scanqa_eval import *\nfrom .objcls_eval import *\nfrom .sqa3d_eval import *\n"
  },
  {
    "path": "evaluator/build.py",
    "content": "import json\nimport numpy as np\nfrom omegaconf import open_dict\nfrom fvcore.common.registry import Registry\n\nfrom common.misc import gather_dict\n\nEVALUATOR_REGISTRY = Registry(\"EVALUATOR\")\n\n\nclass BaseEvaluator():\n    def __init__(self, cfg, accelerator):\n        self.accelerator = accelerator\n        self.best_result = -np.inf\n        self.save = cfg.eval.save\n        self.save_dir.mkdir(parents=True, exist_ok=True)\n        self.reset()\n\n    def reset(self):\n        self.eval_results = []\n        self.eval_dict = {}\n\n    def batch_metrics(self, data_dict, include_count=False):\n        raise NotImplementedError(\"Per batch metrics calculation is required for evaluation\")\n\n    def update(self, data_dict):\n        metrics = self.batch_metrics(data_dict, include_count=True)\n        for key in metrics.keys():\n            if key not in self.eval_dict:\n                self.eval_dict[key] = []\n            self.eval_dict[key].append(metrics[key])\n\n    def record(self):\n        self.eval_dict = gather_dict(self.accelerator, self.eval_dict)\n        for k, metrics in self.eval_dict.items():\n            if not isinstance(metrics, list):\n                continue\n            # metrics is a list of (value, count)\n            total_value = sum(x[0] for x in metrics)\n            total_count = sum(x[1] for x in metrics)\n            self.eval_dict[k] = total_value / max(total_count, 1)\n\n        if self.save and self.accelerator.is_main_process:\n            with (self.save_dir / \"results.json\").open(\"w\") as f:\n                json.dump(self.eval_results, f)\n        \n        self.eval_dict['target_metric'] = self.eval_dict[self.target_metric]\n        if self.eval_dict[\"target_metric\"] > self.best_result:\n            is_best = True\n            self.best_result = self.eval_dict[\"target_metric\"]\n        else:\n            is_best = False\n        self.eval_dict['best_result'] = self.best_result\n        return is_best, self.eval_dict\n\n\ndef get_eval(name, cfg, accelerator, **kwargs):\n    \"\"\"Get an evaluator or a list of evaluators.\"\"\"\n    if isinstance(name, str):\n        eval = EVALUATOR_REGISTRY.get(name)(cfg, accelerator, **kwargs)\n    else:\n        eval = [EVALUATOR_REGISTRY.get(i)(cfg, accelerator, **kwargs) for i in name]\n    return eval\n\ndef build_eval(cfg, accelerator, **kwargs):\n    if cfg.eval.get(\"train\", None) is not None:\n        train_eval = get_eval(cfg.eval.train.name, cfg, accelerator, **kwargs)\n        val_eval = get_eval(cfg.eval.val.name, cfg, accelerator, **kwargs)\n        return {\"train\": train_eval, \"val\": val_eval}\n    elif cfg.eval.get(\"name\", None) is not None:\n        return get_eval(cfg.eval.name, cfg, accelerator, **kwargs)\n    else:\n        with open_dict(cfg):\n            cfg.eval.name = [cfg.data.get(dataset).evaluator for dataset in cfg.data.val]\n        return get_eval(cfg.eval.name, cfg, accelerator, **kwargs)"
  },
  {
    "path": "evaluator/objcls_eval.py",
    "content": "import torch\nfrom pathlib import Path\n\nfrom evaluator.build import EVALUATOR_REGISTRY, BaseEvaluator\n\n\n@EVALUATOR_REGISTRY.register()\nclass PretrainObjEval(BaseEvaluator):\n    def __init__(self, cfg, accelerator, **kwargs):\n        self.target_metric = \"accuracy\"\n        self.save_dir = Path(cfg.exp_dir) / \"eval_results\" / self.__class__.__name__\n        super().__init__(cfg, accelerator, **kwargs)\n\n    def batch_metrics(self, data_dict, include_count=False):\n        metrics = {}\n        logits = data_dict[\"obj_logits\"][data_dict[\"obj_masks\"]].view(-1, data_dict[\"obj_logits\"].shape[-1])\n        labels = data_dict[\"obj_labels\"][data_dict[\"obj_masks\"]].view(-1)\n        _, pred = torch.max(logits, 1)\n        metrics[\"accuracy\"] = ((pred == labels.view(-1)).sum().item(), labels.shape[0])\n        if not include_count:\n            for key, v in metrics.items():\n                metrics[key] = v[0] / max(v[1], 1)\n        return metrics"
  },
  {
    "path": "evaluator/pretrain_eval.py",
    "content": "import torch\nimport numpy as np\n\nfrom evaluator.build import EVALUATOR_REGISTRY, BaseEvaluator\n\n\n@EVALUATOR_REGISTRY.register()\nclass PretrainEval(BaseEvaluator):\n    def __init__(self, cfg, accelerator, **kwargs):\n        self.cfg = cfg\n        self.eval_dict = {\n            \"target_metric\": [], \"og_acc\": [], \"lang_cls_acc_mask\": [], \"obj_cls_post_acc\": [], \"obj_cls_pre_acc\": [],\n            \"obj_cls_raw_acc\": [], \"obj_cls_pre_acc_unmask\": [], \"obj_cls_pre_acc_mask\": [],\n            \"obj_cls_post_acc_unmask\": [], \"obj_cls_post_acc_mask\": []\n        }\n        self.accelerator = accelerator\n        self.device = self.accelerator.device\n        self.total_count = 0\n        self.best_result = -np.inf\n\n    def batch_metrics(self, data_dict):\n        metrics = {}\n        txt_token_mask = (data_dict['masked_lm_labels'] != -1)\n        if 'tgt_object_id' in data_dict.keys():\n            metrics['og_acc'] = (torch.argmax(data_dict['og3d_logits'], dim=-1) == data_dict['tgt_object_id'].squeeze(\n                1)).sum().item() / float(len(data_dict['tgt_object_id']))\n        metrics['lang_cls_acc_mask'] = torch.sum(\n            torch.argmax(data_dict['txt_lm_cls_logits'], dim=2)[txt_token_mask] == data_dict['masked_lm_labels'][\n                txt_token_mask]).item() / float(txt_token_mask.sum().item() + 1e-8)\n        if 'obj_cls_post_logits' in data_dict.keys():\n            metrics['obj_cls_post_acc'] = torch.sum(\n                torch.argmax(data_dict['obj_cls_post_logits'], dim=2)[data_dict['obj_masks']] == data_dict[\"obj_labels\"][\n                    data_dict['obj_masks']]).item() / float(data_dict['obj_masks'].sum().item() + 1e-8)\n            metrics['obj_cls_post_acc_unmask'] = torch.sum(\n                torch.argmax(data_dict['obj_cls_post_logits'], dim=2)[\n                    data_dict['obj_masks'] * data_dict['obj_sem_masks']] ==\n                data_dict[\"obj_labels\"][data_dict['obj_masks'] * data_dict['obj_sem_masks']]).item() / float(\n                (data_dict['obj_masks'] * data_dict['obj_sem_masks']).sum().item() + 1e-8)\n            metrics['obj_cls_post_acc_mask'] = torch.sum(torch.argmax(data_dict['obj_cls_post_logits'], dim=2)[\n                                                             data_dict['obj_masks'] * data_dict[\n                                                                 'obj_sem_masks'].logical_not()] ==\n                                                         data_dict[\"obj_labels\"][\n                                                             data_dict['obj_masks'] * data_dict[\n                                                                 'obj_sem_masks'].logical_not()]).item() / float(\n                (data_dict['obj_masks'] * data_dict['obj_sem_masks'].logical_not()).sum().item() + 1e-8)\n        if 'obj_cls_raw_logits' in data_dict.keys():\n            metrics['obj_cls_raw_acc'] = torch.sum(\n                torch.argmax(data_dict['obj_cls_raw_logits'], dim=2)[data_dict['obj_masks']] == data_dict[\"obj_labels\"][\n                    data_dict['obj_masks']]).item() / float(data_dict['obj_masks'].sum().item() + 1e-8)\n        if 'obj_cls_pre_logits' in data_dict.keys():\n            metrics['obj_cls_pre_acc'] = torch.sum(\n                torch.argmax(data_dict['obj_cls_pre_logits'], dim=2)[data_dict['obj_masks']] == data_dict[\"obj_labels\"][\n                    data_dict['obj_masks']]).item() / float(data_dict['obj_masks'].sum().item() + 1e-8)\n            
metrics['obj_cls_pre_acc_unmask'] = torch.sum(\n                torch.argmax(data_dict['obj_cls_pre_logits'], dim=2)[data_dict['obj_masks'] * data_dict['obj_sem_masks']] ==\n                data_dict[\"obj_labels\"][data_dict['obj_masks'] * data_dict['obj_sem_masks']]).item() / float(\n                (data_dict['obj_masks'] * data_dict['obj_sem_masks']).sum().item() + 1e-8)\n            metrics['obj_cls_pre_acc_mask'] = torch.sum(torch.argmax(data_dict['obj_cls_pre_logits'], dim=2)[\n                                                            data_dict['obj_masks'] * data_dict[\n                                                                'obj_sem_masks'].logical_not()] == data_dict[\"obj_labels\"][\n                                                            data_dict['obj_masks'] * data_dict[\n                                                                'obj_sem_masks'].logical_not()]).item() / float(\n                (data_dict['obj_masks'] * data_dict['obj_sem_masks'].logical_not()).sum().item() + 1e-8)\n        all_acc = [v for k, v in metrics.items()]\n        metrics[\"target_metric\"] = float(sum(all_acc)) / len(all_acc)\n        metrics[\"total_count\"] = data_dict[\"txt_lm_cls_logits\"].shape[0]\n        return metrics\n\n    def update(self, data_dict):\n        metrics = self.batch_metrics(data_dict)\n        self.total_count += metrics[\"total_count\"]\n        for key in self.eval_dict.keys():\n            if key not in metrics.keys():\n                continue\n            self.eval_dict[key].append(float(metrics[key]) * metrics[\"total_count\"])\n\n    def record(self):\n        # Average\n        for k, v in self.eval_dict.items():\n            self.eval_dict[k] = sum(v) / self.total_count\n        if self.eval_dict[\"target_metric\"] > self.best_result:\n            is_best = True\n            self.best_result = self.eval_dict[\"target_metric\"]\n        else:\n            is_best = False\n        return is_best, self.eval_dict\n\n    def reset(self):\n        for key in self.eval_dict.keys():\n            self.eval_dict[key] = []\n        self.total_count = 0"
  },
  {
    "path": "evaluator/referit3d_eval.py",
    "content": "from pathlib import Path\nimport torch\n\nfrom evaluator.build import EVALUATOR_REGISTRY, BaseEvaluator\n\n\n@EVALUATOR_REGISTRY.register()\nclass ReferIt3DEval(BaseEvaluator):\n    def __init__(self, cfg, accelerator, **kwargs):\n        self.target_metric = 'og_acc'\n        self.save_dir = Path(cfg.exp_dir) / \"eval_results\" / self.__class__.__name__\n        super().__init__(cfg, accelerator, **kwargs)\n\n    def batch_metrics(self, data_dict, include_count=False):\n        # Per-scene eval\n        if len(data_dict['og3d_logits'].shape) == 3:\n            data_dict['tgt_object_id'] = data_dict['tgt_object_id'].flatten(0, 1).unsqueeze(1)\n            data_dict['is_hard'] = data_dict['is_hard'].flatten(0, 1)\n            data_dict['is_view_dependent'] = data_dict['is_view_dependent'].flatten(0, 1)\n            data_dict['og3d_logits'] = data_dict['og3d_logits'].flatten(0, 1)\n\n        metrics = {}\n        og_pred = torch.argmax(data_dict['og3d_logits'], dim=-1)\n        total_count = len(og_pred)\n\n        # Easy and hard counts\n        hard_count = data_dict['is_hard'].sum().item()\n        easy_count = total_count - hard_count\n\n        # View-dependent and view-independent counts\n        view_dep_count = data_dict['is_view_dependent'].sum().item()\n        view_indep_count = total_count - view_dep_count\n\n        # Correct counts\n        correct_preds = data_dict['tgt_object_id'].flatten() == og_pred\n        correct = correct_preds.sum().item()\n\n        # Correct counts for easy and hard\n        hard_correct = (correct_preds & data_dict['is_hard']).sum().item()\n        easy_correct = correct - hard_correct\n\n        # Correct counts for view-dependent and view-independent\n        view_dep_correct = (correct_preds & data_dict['is_view_dependent']).sum().item()\n        view_indep_correct = correct - view_dep_correct\n\n        metrics['og_acc_easy'] = (easy_correct, easy_count)\n        metrics['og_acc_hard'] = (hard_correct, hard_count)\n        metrics['og_acc_view_dep'] = (view_dep_correct, view_dep_count)\n        metrics['og_acc_view_indep'] = (view_indep_correct, view_indep_count)\n\n        metrics['og_acc'] = (og_pred == data_dict['tgt_object_id'].squeeze(1)).sum().item()\n        if 'txt_cls_logits' in data_dict:\n            metrics['txt_acc'] = (torch.argmax(data_dict['txt_cls_logits'], dim=1) == data_dict[\"tgt_object_label\"].squeeze(1)).sum().item() \n        \n        # get obj cls acc\n        gt = data_dict['obj_labels']\n        mask = data_dict['obj_masks']\n        for key in data_dict.keys():\n            if key.endswith('logits') and data_dict[key].ndim == 3 and data_dict[key].shape[:2] == data_dict['obj_labels'].shape:\n                new_key = key.replace('logits', 'acc')\n                pred = torch.argmax(data_dict[key], dim=2)\n                metrics[new_key] = ((pred[mask] == gt[mask]).sum().item(), data_dict['obj_masks'].sum().item())\n\n        for key in metrics:\n            if isinstance(metrics[key], tuple):\n                # already has count\n                continue\n            metrics[key] = (metrics[key], total_count)\n        \n        if self.save:\n            item_ids = data_dict['data_idx']\n            for i in range(len(item_ids)):\n                self.eval_results.append({\n                    \"scene_id\": item_ids[i],\n                    \"bbox\": data_dict['obj_boxes'][i][og_pred[i]].cpu().numpy().tolist(),\n                    \"correct\": og_pred[i].item() == 
data_dict['tgt_object_id'][i].item()\n                })\n\n        if not include_count:\n            for key, v in metrics.items():\n                metrics[key] = v[0] / max(v[1], 1)\n\n        return metrics\n"
  },
  {
    "path": "evaluator/scanqa_eval.py",
    "content": "import os\nimport json\nimport collections\nfrom pathlib import Path\nimport torch\n\nfrom evaluator.build import EVALUATOR_REGISTRY, BaseEvaluator\n\nfrom data.data_utils import ScanQAAnswer, clean_answer\nfrom common.box_utils import get_3d_box\nfrom evaluator.build import EVALUATOR_REGISTRY\n\n\n@EVALUATOR_REGISTRY.register()\nclass ScanQAEval(BaseEvaluator):\n    def __init__(self, cfg, accelerator, **kwargs):\n        self.target_metric = 'ans1_acc'\n        self.save_dir = Path(cfg.exp_dir) / \"eval_results\" / self.__class__.__name__\n        super().__init__(cfg, accelerator, **kwargs)\n\n        if self.save:\n            train_data = json.load(open(os.path.join(cfg.data.scan_family_base,\n                                    'annotations/qa/ScanQA_v1.0_train.json'), encoding='utf-8'))\n            answer_counter = sum([data['answers'] for data in train_data], [])\n            answer_counter = collections.Counter(sorted(answer_counter))\n            answer_cands = answer_counter.keys()\n            self.answer_vocab = ScanQAAnswer(answer_cands)\n\n    def batch_metrics(self, data_dict, include_count=False):\n        metrics = {}\n        total_count = len(data_dict['answer_scores'])\n        # ans\n        choice_1 = data_dict['answer_scores'].argmax(dim=-1)\n        choice_10 = torch.topk(data_dict['answer_scores'].detach(), 10, -1)[1]\n        correct1 = 0\n        correct10 = 0\n        for i in range(data_dict['answer_label'].shape[0]):\n            if data_dict['answer_label'][i, choice_1[i]] == 1:\n                correct1 += 1\n            for j in range(10):\n                if data_dict['answer_label'][i, choice_10[i, j]] == 1:\n                    correct10 += 1\n                    break\n        metrics['ans1_acc'] = correct1\n        metrics['ans10_acc'] = correct10 \n        \n        # get obj cls acc\n        for key in data_dict.keys():\n            if key.endswith('logits') and data_dict[key].ndim == 3 and data_dict[key].shape[:2] == data_dict['obj_labels'].shape:\n                new_key = key.replace('logits', 'acc')\n                pred = torch.argmax(data_dict[key], dim=2)\n                gt = data_dict['obj_labels']\n                mask = data_dict['obj_masks']\n                metrics[new_key] = ((pred[mask] == gt[mask]).sum().item(), data_dict['obj_masks'].sum().item())\n\n        for key in metrics:\n            if isinstance(metrics[key], tuple):\n                # already has count\n                continue\n            metrics[key] = (metrics[key], total_count)\n\n        if self.save:\n            for i in range(total_count):\n                answer_top10 = [self.answer_vocab.itos(choice_10[i, j].item()) for j in range(10)]\n                og3d_pred = torch.argmax(data_dict['og3d_logits'], dim=1)\n                box = data_dict['obj_boxes'][i, og3d_pred[i]].cpu().numpy()\n                box_center = box[0:3]\n                box_size = box[3:6]\n                pred_data = {\n                    \"scene_id\": data_dict[\"scan_id\"][i],\n                    \"question_id\": data_dict[\"data_idx\"][i],\n                    \"answer_top10\": answer_top10,\n                    \"bbox\": get_3d_box(box_center, box_size).tolist()\n                }\n                self.eval_results.append(pred_data)\n\n        if not include_count:\n            for key, v in metrics.items():\n                metrics[key] = v[0] / max(v[1], 1)\n\n        return metrics\n\n\n@EVALUATOR_REGISTRY.register()\nclass ScanQAGenEval(ScanQAEval):\n    def 
__init__(self, cfg, accelerator, **kwargs):\n        super().__init__(cfg, accelerator, **kwargs)\n\n    def batch_metrics(self, data_dict, include_count=False):\n        metrics = {}\n        answer_preds = [clean_answer(a) for a in data_dict['answer_pred']]\n        answer_gts = [list(map(clean_answer, a)) for a in data_dict['answers']]\n        correct = len([1 for pred, gts in zip(answer_preds, answer_gts) if pred in gts])\n\n        metrics['ans1_acc'] = (correct, len(answer_preds))\n\n        if not include_count:\n            for key, v in metrics.items():\n                metrics[key] = v[0] / max(v[1], 1)\n        \n        return metrics"
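\n\n# Note (added): when include_count=True each metric is a (correct, count) tuple rather\n# than a ratio; this lets the caller (presumably BaseEvaluator) sum numerators and\n# denominators across batches before dividing.\n"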
  },
  {
    "path": "evaluator/scanrefer_eval.py",
    "content": "from pathlib import Path\nimport torch\n\nfrom evaluator.build import EVALUATOR_REGISTRY, BaseEvaluator\n\n\n@EVALUATOR_REGISTRY.register()\nclass ScanReferEval(BaseEvaluator):\n    def __init__(self, cfg, accelerator, **kwargs):\n        self.target_metric = 'og_acc_iou25'\n        self.save_dir = Path(cfg.exp_dir) / \"eval_results\" / self.__class__.__name__\n        super().__init__(cfg, accelerator, **kwargs)\n\n    def batch_metrics(self, data_dict, include_count=False):\n        # Per-scene eval\n        if len(data_dict['tgt_object_id_iou25'].shape) == 3:\n            data_dict['tgt_object_id_iou25'] = data_dict['tgt_object_id_iou25'].flatten(0, 1)\n            data_dict['tgt_object_id_iou50'] = data_dict['tgt_object_id_iou50'].flatten(0, 1)\n            data_dict['tgt_object_id'] = data_dict['tgt_object_id'].flatten(0, 1).unsqueeze(1)\n            data_dict['is_multiple'] = data_dict['is_multiple'].flatten(0, 1)\n            data_dict['og3d_logits'] = data_dict['og3d_logits'].flatten(0, 1)\n\n        metrics = {}\n        og_pred = torch.argmax(data_dict['og3d_logits'], dim=-1)\n        total_count = len(og_pred)\n\n        multiple_count = data_dict['is_multiple'].sum().item()\n        unique_count = total_count - multiple_count\n\n        # Correct counts for iou25 and iou50\n        iou25_correct_mask = data_dict['tgt_object_id_iou25'][torch.arange(len(og_pred)), og_pred].to(bool)\n        iou50_correct_mask = data_dict['tgt_object_id_iou50'][torch.arange(len(og_pred)), og_pred].to(bool)\n        iou25_correct = iou25_correct_mask.sum().item()\n        iou50_correct = iou50_correct_mask.sum().item()\n\n        # Correct counts for unique and multiple iou25 and iou50\n        iou25_multiple_correct = (iou25_correct_mask & data_dict['is_multiple']).sum().item()\n        iou25_unique_correct = iou25_correct - iou25_multiple_correct\n\n        iou50_multiple_correct = (iou50_correct_mask & data_dict['is_multiple']).sum().item()\n        iou50_unique_correct = iou50_correct - iou50_multiple_correct\n\n        metrics['og_acc_iou25'] = iou25_correct\n        metrics['og_acc_iou50'] = iou50_correct\n        metrics['og_acc_iou25_unique'] = iou25_unique_correct\n        metrics['og_acc_iou50_unique'] = iou50_unique_correct\n        metrics['og_acc_iou25_multiple'] = iou25_multiple_correct\n        metrics['og_acc_iou50_multiple'] = iou50_multiple_correct\n\n        metrics['og_acc'] = (og_pred == data_dict['tgt_object_id'].squeeze(1)).sum().item()\n        if 'txt_cls_logits' in data_dict:\n            metrics['txt_acc'] = (torch.argmax(data_dict['txt_cls_logits'], dim=1) == data_dict[\"tgt_object_label\"].squeeze(1)).sum().item() \n        \n        # get obj cls acc\n        gt = data_dict['obj_labels']\n        mask = data_dict['obj_masks']\n        for key in data_dict.keys():\n            if key.endswith('logits') and data_dict[key].ndim == 3 and data_dict[key].shape[:2] == data_dict['obj_labels'].shape:\n                new_key = key.replace('logits', 'acc')\n                pred = torch.argmax(data_dict[key], dim=2)\n                metrics[new_key] = ((pred[mask] == gt[mask]).sum().item(), mask.sum().item())\n\n        for key in metrics:\n            if isinstance(metrics[key], tuple):\n                # already has count\n                continue\n            if 'unique' in key:\n                metrics[key] = (metrics[key], unique_count)\n            elif 'multiple' in key:\n                metrics[key] = (metrics[key], multiple_count)\n            else:\n     
           metrics[key] = (metrics[key], total_count)\n\n        if self.save:\n            item_ids = data_dict['data_idx']\n            for i in range(len(item_ids)):\n                self.eval_results.append({\n                    \"scene_id\": item_ids[i],\n                    \"bbox\": data_dict['obj_boxes'][i][og_pred[i]].cpu().numpy().tolist(),\n                    \"correct\": og_pred[i].item() == data_dict['tgt_object_id'][i].item()\n                })\n\n        if not include_count:\n            for key, v in metrics.items():\n                metrics[key] = v[0] / max(v[1], 1)\n\n        return metrics\n"
  },
  {
    "path": "evaluator/sqa3d_eval.py",
    "content": "import os\nimport json\nimport collections\nfrom pathlib import Path\n\nimport numpy as np\nimport torch\n\nfrom data.data_utils import SQA3DAnswer\nfrom evaluator.build import EVALUATOR_REGISTRY\n\n\n@EVALUATOR_REGISTRY.register()\nclass SQA3DEval():\n    # 0: what, 1: is, 2: how, 3: can, 4: which, 5: others\n    def __init__(self, cfg, task_name):\n        self.eval_dict = {\n            'target_metric': [], 'obj_cls_raw_acc': [],'ans1_acc': [], 'ans10_acc': [],\n            'type0_acc': [], 'type1_acc': [], 'type2_acc': [],\n            'type0_acc': [], 'type1_acc': [], 'type2_acc': [],\n            'type3_acc': [], 'type4_acc': [], 'type5_acc': []\n        }\n        # run\n        self.total_count = 0\n        self.type_count = {\n            'type0_count': 1e-10, 'type1_count': 1e-10, 'type2_count': 1e-10,\n            'type3_count': 1e-10, 'type4_count': 1e-10, 'type5_count': 1e-10\n        }\n        self.best_result = -np.inf\n        self.base_dir = cfg.data.scan_family_base\n\n        answer_data = json.load(\n            open(os.path.join(self.base_dir,\n                              'annotations/sqa_task/answer_dict.json'), encoding='utf-8')\n        )[0]\n        answer_counter = []\n        for data in answer_data.keys():\n            answer_counter.append(data)\n        answer_counter = collections.Counter(sorted(answer_counter))\n        answer_cands = answer_counter.keys()\n        self.answer_vocab = SQA3DAnswer(answer_cands)\n\n        self.save = cfg.eval.save\n        if self.save:\n            self.eval_results = []\n            self.save_dir = Path(cfg.exp_dir) / \"eval_results\" / task_name\n            self.save_dir.mkdir(parents=True, exist_ok=True)\n\n    def update(self, data_dict):\n        metrics = self.batch_metrics(data_dict)\n        batch_count = metrics['total_count']\n        self.total_count += batch_count\n        for key in metrics:\n            if 'type' in key and 'count' in key:\n                self.type_count[key] += metrics[key]\n\n        if self.save:\n            for i in range(metrics[\"total_count\"]):\n                self.eval_results.append({\n                    # vision\n                    \"source\": data_dict['source'][i],\n                    \"scan_id\": data_dict['scan_id'][i],\n                    \"anchor\": data_dict['anchor_locs'][i],\n                    'anchor_ort': data_dict['anchor_orientation'][i],\n                    # language\n                    \"instruction\": data_dict['prompt_after_obj'][i],\n                    \"response_gt\": data_dict['answer_list'][i].split('[answer_seq]'),\n                    \"response_pred\": data_dict['output_text'][i]\n                })\n\n        # save eval dict\n        for key in self.eval_dict.keys():\n            if 'type' in key:\n                self.eval_dict[key].append(float(metrics[key]) * metrics['type' + key[4] + '_count'])\n            else:\n                self.eval_dict[key].append(float(metrics[key]) * batch_count)\n\n    def batch_metrics(self, data_dict):\n        metrics = {}\n\n        # ans\n        choice_1 = data_dict['answer_scores'].argmax(dim=-1)\n        choice_10 = torch.topk(data_dict['answer_scores'].detach(), 10, -1)[1]\n        correct1 = 0\n        correct10 = 0\n        correct_type = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0}\n        count_type = {0: 1e-10, 1: 1e-10, 2: 1e-10, 3: 1e-10, 4: 1e-10, 5: 1e-10}\n        for i in range(data_dict['answer_label'].shape[0]):\n            count_type[data_dict['sqa_type'][i].item()] += 1\n     
       if data_dict['answer_label'][i, choice_1[i]] == 1:\n                correct1 += 1\n                correct_type[data_dict['sqa_type'][i].item()] += 1\n            for j in range(10):\n                if data_dict['answer_label'][i, choice_10[i, j]] == 1:\n                    correct10 += 1\n                    break\n        metrics['ans1_acc'] = correct1 / float(len(choice_1))\n        metrics['ans10_acc'] = correct10 / float(len(choice_1))\n        # metrics['answer_top10'] = [\n        #     # TODO: add this answer vocabulary in dataloader\n        #     [self.answer_vocab.itos(choice_10[i, j].item()) for j in range(10)] for i in\n        #     range(choice_10.shape[0])\n        # ]\n\n        metrics['obj_cls_raw_acc'] = torch.sum(\n            torch.argmax(data_dict['obj_cls_raw_logits'], dim=2)[data_dict['obj_masks']] == data_dict[\"obj_labels\"][\n                data_dict['obj_masks']]).item() / float(data_dict['obj_masks'].sum().item())\n\n        # question type acc\n        for key in count_type.keys():\n            metrics['type' + str(key) + '_acc'] = correct_type[key] / count_type[key]\n            metrics['type' + str(key) + '_count'] = count_type[key]\n\n        metrics['target_metric'] = metrics['ans1_acc']\n        metrics[\"total_count\"] = data_dict[\"answer_scores\"].shape[0]\n        return metrics\n\n    def reset(self):\n        for key in self.eval_dict.keys():\n            self.eval_dict[key] = []\n        self.total_count = 0\n        self.type_count = {\n            'type0_count': 1e-10, 'type1_count': 1e-10, 'type2_count': 1e-10, \n            'type3_count': 1e-10, 'type4_count': 1e-10, 'type5_count': 1e-10\n        }\n        if self.save:\n            self.eval_results = []\n\n    def record(self, split='val'):\n        # record\n        for k, v in self.eval_dict.items():\n            if k == \"answer_top10\":\n                continue\n            if 'type' in k:\n                self.eval_dict[k] = sum(v) / self.type_count['type' + k[4] + '_count']\n            else:\n                self.eval_dict[k] = sum(v) / self.total_count\n\n        if self.eval_dict[\"target_metric\"] > self.best_result:\n            is_best = True\n            self.best_result = self.eval_dict[\"target_metric\"]\n        else:\n            is_best = False\n\n        if self.save and (is_best or split == 'test'):\n            torch.save(self.eval_results, str(self.save_dir / 'results.pt'))\n\n        return is_best, self.eval_dict\n"
  },
  {
    "path": "launch.py",
    "content": "import argparse\n\nimport common.launch_utils as lu\n\n\ndef parse_args():\n    def str2bool(v):\n        if v.lower() in ('yes', 'true', 't', 'y', '1'):\n            return True\n        elif v.lower() in ('no', 'false', 'f', 'n', '0'):\n            return False\n        else:\n            return argparse.ArgumentTypeError('Unsupported value encountered')\n    parser = argparse.ArgumentParser()\n\n    # General settings\n    parser.add_argument(\"--mode\", default=\"submitit\", type=str,\n                        help=\"Launch mode (submitit | accelerate | python)\")\n    parser.add_argument(\"--debug\", default=False, type=str2bool,\n                        help=\"Debug mode (True | False)\")\n\n    # Slurm settings\n    parser.add_argument(\"--name\", default=\"masaccio\", type=str,\n                        help=\"Name of the job\")\n    parser.add_argument(\"--run_file\", default=\"run.py\", type=str,\n                        help=\"File position of launcher file\")\n    parser.add_argument(\"--job_dir\", default=\"jobs/%j\", type=str,\n                        help=\"Directory to save the job logs\")\n    parser.add_argument(\"--num_nodes\", default=1, type=int, \n                        help=\"Number of nodes to use in SLURM\")\n    parser.add_argument(\"--gpu_per_node\", default=2, type=int,\n                        help=\"Number of gpus to use in each node\")\n    parser.add_argument(\"--cpu_per_task\", default=32, type=int,\n                        help=\"Number of cpus to use for each gpu\")\n    parser.add_argument(\"--qos\", default=\"level0\", type=str,\n                        help=\"Qos of the job\")\n    parser.add_argument(\"--partition\", default=\"gpu\", type=str,\n                        help=\"Partition of the job\")\n    parser.add_argument(\"--account\", default=\"research\", type=str,\n                        help=\"Account of the job\")\n    parser.add_argument(\"--mem_per_gpu\", default=80, type=int,\n                        help=\"Memory allocated for each gpu in GB\")\n    parser.add_argument(\"--time\", default=24, type=int,\n                        help=\"Time allocated for the job in hours\")\n    parser.add_argument(\"--port\", default=1234, type=int,\n                        help=\"Default port for distributed training\")\n    parser.add_argument(\"--nodelist\", default=\"\", type=str,\n                        help=\"Default node id for distributed training\")\n\n    # Accelerate settings\n    parser.add_argument(\"--mixed_precision\", default=\"no\", type=str,\n                        help=\"Mixed precision training, options (no | fp16 | bf16)\")\n\n    # Additional Training settings\n    parser.add_argument(\"--config\", default=\"configs/default.yaml\", type=str,\n                        help=\"Path to the config file\")\n    parser.add_argument(\"opts\", default=None, nargs=argparse.REMAINDER,\n                        help=\"Additional options to change configureation\")\n    return parser.parse_args()\n\n\ndef main():\n    args = parse_args()\n    getattr(lu, f\"{args.mode}_launch\")(args)\n    print(\"launched\")\n\n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "model/__init__.py",
    "content": "from .objcls import *\nfrom .openvocab import *\n"
  },
  {
    "path": "model/build.py",
    "content": "import torch.nn as nn\nfrom fvcore.common.registry import Registry\n\n\nMODEL_REGISTRY = Registry(\"model\")\n\n\nclass BaseModel(nn.Module):\n    def __init__(self, cfg):\n        super().__init__()\n\n    def get_opt_params(self):\n        raise NotImplementedError(\"Function to obtain all default parameters for optimization\")\n\n\ndef build_model(cfg):\n    model = MODEL_REGISTRY.get(cfg.model.name)(cfg)\n    return model"
  },
  {
    "path": "model/objcls.py",
    "content": "import torch\nimport torch.nn as nn\nimport json\nfrom pathlib import Path\n\nimport clip\nfrom transformers import BertConfig, BertModel, BertTokenizer\nfrom einops import rearrange\n\nfrom model.build import MODEL_REGISTRY, BaseModel\nfrom modules.layers.pointnet import PointNetPP\nfrom modules.utils import get_mlp_head\nfrom optim.utils import no_decay_param_group\n\n\n@MODEL_REGISTRY.register()\nclass ObjCls(BaseModel):\n    def __init__(self, cfg):\n        super().__init__(cfg)\n        self.cfg = cfg\n        self.model_name = cfg.model.get(\"model_name\", \"pointnext\")\n        self.language_type = cfg.model.get(\"language_type\", \"clip\")\n        self.pre_extract_path = cfg.model.get(\"pre_extract_path\", None)\n\n        cls_in_channel = 512 if self.language_type == \"clip\" else 768\n        self.point_feature_extractor = PointNetPP(\n            sa_n_points=[32, 16, None],\n            sa_n_samples=[32, 32, None],\n            sa_radii=[0.2, 0.4, None],\n            sa_mlps=[[3, 64, 64, 128], [128, 128, 128, 256], [256, 256, 512, cls_in_channel]],\n        )\n\n        if cfg.num_gpu > 1:\n            self.point_feature_extractor = torch.nn.SyncBatchNorm.convert_sync_batchnorm(self.point_feature_extractor)\n\n        if not cfg.model.open_vocab:\n            cls_hidden = cfg.model.get(\"cls_hidden\", 1024)\n            num_classes = cfg.model.num_classes\n            self.cls_head = get_mlp_head(cls_in_channel, cls_hidden, num_classes)\n        else:\n            if self.pre_extract_path is not None:\n                file_name = f\"scannet_607_{'clip-ViT-B16' if self.language_type == 'clip' else 'bert-base-uncased'}_id.pth\"\n                self.register_buffer(\"text_embeds\", torch.load(Path(self.pre_extract_path) / file_name).float())\n            else:\n                self.int2cat = json.load(open(cfg.model.vocab_path, \"r\"))\n                if self.language_type == \"clip\":\n                    self.clip_head = clip.load(\"ViT-B/16\")\n                    self.text_embeds = self.clip_head.encode_text(clip.tokenize(self.int2cat)).detach()\n                elif self.language_type == \"bert\":\n                    self.tokenizer = BertTokenizer.from_pretrained(\"bert-base-uncased\", do_lower_case=True)\n                    self.bert_config = BertConfig(\n                        hidden_size=768, num_hidden_layers=3, num_attention=12, type_vocab_size=2\n                    )\n                    self.model = BertModel.from_pretrained(\"bert-base-uncased\", config=self.bert_config)\n                    self.encoded_input = self.tokenizer(\n                        self.int2cat, padding=True, truncation=True, add_special_tokens=True, return_tensors=\"pt\"\n                    )\n                    self.text_embeds = self.model(**self.encoded_input).last_hidden_state\n                    self.text_embeds = self.text_embeds.detach()\n                else:\n                    raise NotImplementedError\n        self.dropout = nn.Dropout(0.1)\n\n    def forward(self, data_dict):\n        # prepare dict\n        if 'cur_step' not in data_dict.keys():\n            data_dict['cur_step'] = 1\n            data_dict['total_steps'] = 1\n\n        obj_pcds = data_dict[\"obj_fts\"]\n        batch_size, num_objs, _, _ = obj_pcds.size()\n        if self.model_name == \"pointnext\":\n            obj_locs = rearrange(obj_pcds[..., :3], 'b o p d -> (b o) p d')\n            obj_fts = rearrange(obj_pcds[..., 3:], 'b o p d -> (b o) d p').contiguous()\n            obj_embeds = 
self.point_feature_extractor(obj_locs, obj_fts, type=\"cls\")\n        elif self.model_name == \"pointnet++\":\n            obj_pcds = rearrange(obj_pcds, 'b o p d -> (b o) p d')\n            obj_embeds = self.point_feature_extractor(obj_pcds)\n        elif self.model_name == \"pointmlp\":\n            obj_pcds = rearrange(obj_pcds, 'b o p d -> (b o) p d')\n            obj_embeds = self.point_feature_extractor(obj_pcds)\n        obj_embeds = self.dropout(obj_embeds)\n        if self.cfg.model.open_vocab:\n            logits = obj_embeds @ self.text_embeds.t()\n            data_dict[\"obj_logits\"] = rearrange(logits, '(b o) c -> b o c', b=batch_size)\n        else:\n            data_dict[\"obj_logits\"] = rearrange(self.cls_head(obj_embeds), '(b o) d -> b o d', b=batch_size)\n        return data_dict\n\n    def get_opt_params(self):\n        optimizer_grouped_parameters = []\n        optimizer_grouped_parameters.append({\n            \"params\": self.parameters(),\n            \"weight_decay\": self.cfg.solver.get(\"weight_decay\", 0.0),\n            \"lr\": self.cfg.solver.lr\n        })\n        return optimizer_grouped_parameters\n"
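\n# Shape reference (added): obj_fts is (B, O, P, C) with xyz in the first three\n# channels; each of the B*O objects is encoded to a cls_in_channel-dim embedding,\n# and open-vocab logits are the dot products obj_embeds @ text_embeds.T, giving\n# (B, O, num_classes) after the final rearrange.\n"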
  },
  {
    "path": "model/openvocab.py",
    "content": "import numpy as np\nimport torch\nimport torch.nn as nn\nfrom einops import einsum\n\nfrom model.build import MODEL_REGISTRY, BaseModel\nfrom modules.build import build_module\nfrom optim.utils import no_decay_param_group\n\n\n@MODEL_REGISTRY.register()\nclass OpenVocab(BaseModel):\n    def __init__(self, cfg):\n        super().__init__(cfg)\n        self.cfg = cfg\n        self.lang_encoder = build_module(\"language\", self.cfg.model.language)\n        self.point_encoder = build_module(\"vision\", self.cfg.model.vision)\n        self.unified_encoder = build_module(\"grounding\", self.cfg.model.grounding)\n        self.head_list = self.cfg.model.heads.head_list\n        for head in self.head_list:\n            setattr(self, head, build_module(\"heads\", getattr(self.cfg.model.heads, head)))\n        self.use_scene_cap = self.cfg.data.args.get(\"use_scene_cap\", False)\n        if self.use_scene_cap:\n            self.object_pool = lambda x : x.mean(dim=1)\n\n    def forward(self, data_dict):\n        # prepare dict\n        if 'cur_step' not in data_dict.keys():\n            data_dict['cur_step'] = 1\n            data_dict['total_steps'] = 1\n        # basic feature extractor\n        # point_features_pre_spatial is point features before spatial reasonging\n\n        lang_basic_features = self.lang_encoder(data_dict['txt_ids'], data_dict['txt_masks'])\n        if self.use_scene_cap:\n            scene_txt_ids = data_dict['scene_txt_ids']\n            scene_txt_masks = data_dict['scene_txt_masks']\n            scene_lang_basic_features = self.lang_encoder(scene_txt_ids, scene_txt_masks)\n            data_dict['scene_text_embed'] = scene_lang_basic_features[:, 0]\n\n        if not \"Scene\" in self.cfg.model.vision.name:\n            point_basic_features, point_features_pre, obj_cls_raw_logits = self.point_encoder(data_dict['obj_fts'].float(),\n                                                                                                data_dict['obj_locs'],\n                                                                                                data_dict['obj_masks'],\n                                                                                                data_dict['obj_sem_masks'],\n                                                                                                data_dict['obj_labels'],\n                                                                                                data_dict['cur_step'],\n                                                                                                data_dict['total_steps'])\n        else:\n            point_basic_features, point_features_pre, obj_cls_raw_logits = self.point_encoder(data_dict)\n\n        if self.use_scene_cap:\n            scene_feature = self.object_pool(point_basic_features)\n            data_dict[\"scene_embed\"] = scene_feature\n\n        if self.cfg.model.inter == \"before\":\n            data_dict[\"inter_text_embed\"] = lang_basic_features[:, 0]\n            data_dict[\"inter_obj_embeds\"] = point_basic_features\n\n        # unifed language entity transformer\n        language_fuse_feature, point_fuse_feature = self.unified_encoder(lang_basic_features, data_dict['txt_masks'],\n                                                                         point_basic_features, data_dict['obj_locs'],\n                                                                         data_dict['obj_masks'])\n        if self.cfg.model.inter != \"before\":\n            
data_dict[\"inter_text_embed\"] = language_fuse_feature[:, 0]\n            data_dict[\"inter_obj_embeds\"] = point_fuse_feature\n\n        # # TODO: check if this is correct and if an additional mlp head is needed\n        language_summarize_feature = language_fuse_feature[:, 0]\n        data_dict[\"intra_text_embed\"] = language_summarize_feature\n        data_dict[\"intra_obj_embeds\"] = point_fuse_feature\n\n        data_dict['obj_cls_raw_logits'] = obj_cls_raw_logits\n        data_dict['og3d_logits'] = einsum(point_fuse_feature, language_summarize_feature, \"b o d, b d -> b o\")\n\n        # task head\n        if getattr(self, \"ground_head\", None) is not None:\n            txt_cls_logits, obj_cls_post_logits, obj_cls_pre_logits, og3d_logits = self.ground_head(language_fuse_feature,\n                                                                                                point_fuse_feature,\n                                                                                                point_features_pre,\n                                                                                                data_dict['obj_masks'])\n            data_dict['txt_cls_logits'] = txt_cls_logits\n            data_dict['obj_cls_post_logits'] = obj_cls_post_logits\n            data_dict['obj_cls_pre_logits'] = obj_cls_pre_logits\n            # reload og3d_logits for head concatenated finetuning\n            data_dict['og3d_logits'] = og3d_logits\n\n        if getattr(self, \"qa_head\", None) is not None:\n            answer_scores = self.qa_head(point_fuse_feature, data_dict['obj_masks'], language_fuse_feature,\n                                         data_dict['txt_masks'])\n            data_dict['answer_scores'] = answer_scores\n        if getattr(self, \"pretrain_head\", None) is not None:\n            output = self.pretrain_head(language_fuse_feature, point_fuse_feature)\n            if isinstance(output, tuple):\n                txt_lm_cls_logits, obj_lm_cls_logits = output\n                data_dict['obj_cls_post_logits'] = obj_lm_cls_logits\n            else:\n                txt_lm_cls_logits = output\n            data_dict['txt_lm_cls_logits'] = txt_lm_cls_logits\n\n        return data_dict\n\n    def get_opt_params(self):\n        def get_lr(cfg, default_lr):\n            return default_lr if cfg.get(\"lr\") is None else cfg.get(\"lr\")\n\n        optimizer_grouped_parameters = []\n        optimizer_grouped_parameters += no_decay_param_group(self.lang_encoder.named_parameters(),\n                                                             get_lr(self.cfg.model.language, self.cfg.solver.lr))\n        optimizer_grouped_parameters += no_decay_param_group(self.point_encoder.named_parameters(),\n                                                             get_lr(self.cfg.model.vision, self.cfg.solver.lr))\n        optimizer_grouped_parameters += no_decay_param_group(self.unified_encoder.named_parameters(),\n                                                             get_lr(self.cfg.model.grounding, self.cfg.solver.lr))\n        if \"ground_head\" in self.head_list:\n            optimizer_grouped_parameters += no_decay_param_group(\n                self.ground_head.named_parameters(), get_lr(self.cfg.model.heads.ground_head, self.cfg.solver.lr)\n            )\n        if \"qa_head\" in self.head_list:\n            optimizer_grouped_parameters += no_decay_param_group(\n                self.qa_head.named_parameters(), get_lr(self.cfg.model.heads.qa_head, self.cfg.solver.lr)\n            
)\n        if \"pretrain_head\" in self.head_list:\n            optimizer_grouped_parameters += no_decay_param_group(\n                self.pretrain_head.named_parameters(), get_lr(self.cfg.model.heads.pretrain_head, self.cfg.solver.lr)\n            )\n        return optimizer_grouped_parameters\n\n\n@MODEL_REGISTRY.register()\nclass OpenVocabPerScene(BaseModel):\n    def __init__(self, cfg):\n        super().__init__(cfg)\n        self.cfg = cfg\n        self.lang_encoder = build_module(\"language\", self.cfg.model.language)\n        self.point_encoder = build_module(\"vision\", self.cfg.model.vision)\n        self.unified_encoder = build_module(\"grounding\", self.cfg.model.grounding)\n        self.head_list = self.cfg.model.heads.head_list\n        for head in self.head_list:\n            setattr(self, head, build_module(\"heads\", getattr(self.cfg.model.heads, head)))\n\n    def forward(self, data_dict):\n        # prepare dict\n        if 'cur_step' not in data_dict.keys():\n            data_dict['cur_step'] = 1\n            data_dict['total_steps'] = 1\n\n        use_per_scene = (len(data_dict['txt_ids'].shape) == 3)\n\n        if use_per_scene:\n            B, L, _ = data_dict['txt_ids'].shape\n            B, O = data_dict['obj_masks'].shape\n\n        # basic feature extracter\n        # point_features_pre_spatial is point features before spatial reasonging\n        txt_ids = data_dict['txt_ids'].view(B * L, -1) if use_per_scene else data_dict['txt_ids']\n        txt_masks = data_dict['txt_masks'].view(B * L, -1) if use_per_scene else data_dict['txt_masks']\n\n        lang_basic_features = self.lang_encoder(txt_ids, txt_masks)   # (B, L), D\n        if not \"Scene\" in self.cfg.model.vision.name:\n            point_basic_features, point_features_pre, obj_cls_raw_logits = self.point_encoder(data_dict['obj_fts'].float(),\n                                                                                                data_dict['obj_locs'],\n                                                                                                data_dict['obj_masks'],\n                                                                                                data_dict['obj_sem_masks'],\n                                                                                                data_dict['obj_labels'],\n                                                                                                data_dict['cur_step'],\n                                                                                                data_dict['total_steps'])\n        else:\n            point_basic_features, point_features_pre, obj_cls_raw_logits = self.point_encoder(data_dict)\n\n        point_basic_features = point_basic_features.unsqueeze(1).repeat(1, L, 1, 1) \\\n                                        if use_per_scene else point_basic_features\n        point_basic_features = point_basic_features.view(B * L, O, point_basic_features.shape[-1]) \\\n                                        if use_per_scene else point_basic_features\n        if use_per_scene:\n            obj_locs = data_dict['obj_locs'].unsqueeze(1).repeat(1, L, 1, 1)\n            obj_locs = obj_locs.view(B * L, O, obj_locs.shape[-1])\n            obj_masks = data_dict['obj_masks'].unsqueeze(1).repeat(1, L, 1)\n            obj_masks = obj_masks.view(B * L, O)\n        else:\n            obj_locs = data_dict['obj_locs']\n            obj_masks = data_dict['obj_masks']\n\n        if self.cfg.model.inter == \"before\":\n            
data_dict[\"inter_text_embed\"] = lang_basic_features[:, 0]\n            data_dict[\"inter_obj_embeds\"] = point_basic_features\n\n        # unifed language entity transformer\n        language_fuse_feature, point_fuse_feature = self.unified_encoder(lang_basic_features, txt_masks,\n                                                                         point_basic_features, obj_locs,\n                                                                         obj_masks)\n        if self.cfg.model.inter != \"before\":\n            data_dict[\"inter_text_embed\"] = language_fuse_feature[:, 0]\n            data_dict[\"inter_obj_embeds\"] = point_fuse_feature\n\n        # # TODO: check if this is correct and if an additional mlp head is needed\n        language_summarize_feature = language_fuse_feature[:, 0]\n        data_dict[\"intra_text_embed\"] = language_summarize_feature\n        data_dict[\"intra_obj_embeds\"] = point_fuse_feature\n\n        data_dict['obj_cls_raw_logits'] = obj_cls_raw_logits\n        data_dict['og3d_logits'] = einsum(point_fuse_feature, language_summarize_feature, \"b o d, b d -> b o\")\n\n        if use_per_scene:\n            data_dict['og3d_logits'] = data_dict['og3d_logits'].view(B, L, O)\n\n        # # task head\n        # if getattr(self, \"ground_head\", None) is not None:\n        #     txt_cls_logits, obj_cls_post_logits, obj_cls_pre_logits, og3d_logits = self.ground_head(language_fuse_feature,\n        #                                                                                         point_fuse_feature,\n        #                                                                                         point_features_pre,\n        #                                                                                         data_dict['obj_masks'])\n        #     data_dict['txt_cls_logits'] = txt_cls_logits\n        #     data_dict['obj_cls_post_logits'] = obj_cls_post_logits\n        #     data_dict['obj_cls_pre_logits'] = obj_cls_pre_logits\n        #     # reload og3d_logits for head concatenated finetuning\n        #     data_dict['og3d_logits'] = og3d_logits\n        #\n        if getattr(self, \"qa_head\", None) is not None:\n            answer_scores = self.qa_head(point_fuse_feature, data_dict['obj_masks'], language_fuse_feature,\n                                         data_dict['txt_masks'])\n            data_dict['answer_scores'] = answer_scores\n        if getattr(self, \"pretrain_head\", None) is not None:\n            output = self.pretrain_head(language_fuse_feature, point_fuse_feature)\n            if isinstance(output, tuple):\n                txt_lm_cls_logits, obj_lm_cls_logits = output\n                data_dict['obj_cls_post_logits'] = obj_lm_cls_logits\n            else:\n                txt_lm_cls_logits = output\n            data_dict['txt_lm_cls_logits'] = txt_lm_cls_logits\n        return data_dict\n\n    def get_opt_params(self):\n        def get_lr(cfg, default_lr):\n            return default_lr if cfg.get(\"lr\") is None else cfg.get(\"lr\")\n\n        optimizer_grouped_parameters = []\n        optimizer_grouped_parameters += no_decay_param_group(self.lang_encoder.named_parameters(),\n                                                             get_lr(self.cfg.model.language, self.cfg.solver.lr))\n        optimizer_grouped_parameters += no_decay_param_group(self.point_encoder.named_parameters(),\n                                                             get_lr(self.cfg.model.vision, self.cfg.solver.lr))\n        
optimizer_grouped_parameters += no_decay_param_group(self.unified_encoder.named_parameters(),\n                                                             get_lr(self.cfg.model.grounding, self.cfg.solver.lr))\n        if \"ground_head\" in self.head_list:\n            optimizer_grouped_parameters += no_decay_param_group(\n                self.ground_head.named_parameters(), get_lr(self.cfg.model.heads.ground_head, self.cfg.solver.lr)\n            )\n        if \"qa_head\" in self.head_list:\n            optimizer_grouped_parameters += no_decay_param_group(\n                self.qa_head.named_parameters(), get_lr(self.cfg.model.heads.qa_head, self.cfg.solver.lr)\n            )\n        if \"pretrain_head\" in self.head_list:\n            optimizer_grouped_parameters += no_decay_param_group(\n                self.pretrain_head.named_parameters(), get_lr(self.cfg.model.heads.pretrain_head, self.cfg.solver.lr)\n            )\n        return optimizer_grouped_parameters"
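\n\n# Note (added): the grounding score og3d_logits is a plain dot product between each\n# fused object embedding and the sentence-level ([CLS]) text embedding, i.e.\n# einsum(\"b o d, b d -> b o\"), which ranks the O candidate objects per sample.\n"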
  },
  {
    "path": "modules/__init__.py",
    "content": "from .language import *\nfrom .vision import *\nfrom .grounding import *\nfrom .heads import *"
  },
  {
    "path": "modules/build.py",
    "content": "from fvcore.common.registry import Registry\n\nfrom common.type_utils import cfg2dict\n\n\nVISION_REGISTRY = Registry(\"vision\")\nLANGUAGE_REGISTRY = Registry(\"language\")\nGROUNDING_REGISTRY = Registry(\"grounding\")\nHEADS_REGISTRY = Registry(\"heads\")\n\n\ndef build_module(module_type, cfg):\n    if module_type == \"vision\":\n        return VISION_REGISTRY.get(cfg.name)(cfg, **cfg2dict(cfg.args))\n    elif module_type == \"language\":\n        return LANGUAGE_REGISTRY.get(cfg.name)(cfg, **cfg2dict(cfg.args))\n    elif module_type == \"grounding\":\n        return GROUNDING_REGISTRY.get(cfg.name)(cfg, **cfg2dict(cfg.args))\n    elif module_type == \"heads\":\n        return HEADS_REGISTRY.get(cfg.name)(cfg, **cfg2dict(cfg.args))\n    else:\n        raise NotImplementedError(f\"module type {module_type} not implemented\")\n\ndef build_module_by_name(cfg):\n    module_registries = [VISION_REGISTRY, LANGUAGE_REGISTRY, GROUNDING_REGISTRY, HEADS_REGISTRY]\n    for registry in module_registries:\n        if cfg.name in registry:\n            print(f\"Using {cfg.name} module from Registry {registry._name}\")\n            kwargs = cfg2dict(cfg.args) if hasattr(cfg, \"args\") else {}\n            return registry.get(cfg.name)(cfg, **kwargs)\n    raise NotImplementedError(f\"Unknown module: {cfg.name}\")"
  },
  {
    "path": "modules/grounding/__init__.py",
    "content": "from .unified_encoder import *\n"
  },
  {
    "path": "modules/grounding/unified_encoder.py",
    "content": "import torch\nimport torch.nn as nn\n\nfrom modules.build import GROUNDING_REGISTRY\nfrom modules.layers.transformers import (TransformerDecoderLayer,\n                                         TransformerEncoderLayer,\n                                         TransformerSpatialDecoderLayer)\nfrom modules.utils import layer_repeat, calc_pairwise_locs\nfrom modules.weights import _init_weights_bert\n\n\n@GROUNDING_REGISTRY.register()\nclass EntitySpatialCrossEncoder(nn.Module):\n    \"\"\"\n       spatial_dim: spatial feature dim, used to modify attention\n       dim_loc:\n    \"\"\"\n\n    def __init__(self, cfg, hidden_size=768, num_attention_heads=12, spatial_dim=5, num_layers=4, dim_loc=6,\n                 pairwise_rel_type='center'):\n        super().__init__()\n        decoder_layer = TransformerSpatialDecoderLayer(hidden_size, num_attention_heads, dim_feedforward=2048,\n                                                       dropout=0.1, activation='gelu',\n                                                       spatial_dim=spatial_dim, spatial_multihead=True,\n                                                       spatial_attn_fusion='cond')\n        self.layers = layer_repeat(decoder_layer, num_layers)\n        loc_layer = nn.Sequential(\n            nn.Linear(dim_loc, hidden_size),\n            nn.LayerNorm(hidden_size),\n        )\n        self.loc_layers = layer_repeat(loc_layer, 1)\n        self.pairwise_rel_type = pairwise_rel_type\n        self.spatial_dim = spatial_dim\n        self.spatial_dist_norm = True\n        self.apply(_init_weights_bert)\n\n    def forward(\n            self, txt_embeds, txt_masks, obj_embeds, obj_locs, obj_masks,\n            output_attentions=False, output_hidden_states=False, **kwargs\n    ):\n        pairwise_locs = calc_pairwise_locs(\n            obj_locs[:, :, :3], obj_locs[:, :, 3:],\n            pairwise_rel_type=self.pairwise_rel_type\n        )\n\n        out_embeds = obj_embeds\n        for i, layer in enumerate(self.layers):\n            query_pos = self.loc_layers[0](obj_locs)\n            out_embeds = out_embeds + query_pos\n\n            out_embeds, self_attn_matrices, cross_attn_matrices = layer(\n                out_embeds, txt_embeds, pairwise_locs,\n                tgt_key_padding_mask=obj_masks.logical_not(),\n                memory_key_padding_mask=txt_masks.logical_not(),\n            )\n\n        return txt_embeds, out_embeds\n\n\n@GROUNDING_REGISTRY.register()\nclass UnifiedSpatialCrossEncoderV1(nn.Module):\n    \"\"\"\n       spatial_dim: spatial feature dim, used to modify attention\n       dim_loc:\n    \"\"\"\n\n    def __init__(self, cfg, hidden_size=768, num_attention_heads=12, spatial_dim=5, num_layers=4, dim_loc=6,\n                 pairwise_rel_type='center'):\n        super().__init__()\n\n        pc_encoder_layer = TransformerSpatialDecoderLayer(hidden_size, num_attention_heads, dim_feedforward=2048,\n                                                          dropout=0.1, activation='gelu',\n                                                          spatial_dim=spatial_dim, spatial_multihead=True,\n                                                          spatial_attn_fusion='cond')\n        lang_encoder_layer = TransformerDecoderLayer(hidden_size, num_attention_heads)\n        self.pc_encoder = layer_repeat(pc_encoder_layer, num_layers)\n        self.lang_encoder = layer_repeat(lang_encoder_layer, num_layers)\n\n        loc_layer = nn.Sequential(\n            nn.Linear(dim_loc, hidden_size),\n         
   nn.LayerNorm(hidden_size),\n        )\n        self.loc_layers = layer_repeat(loc_layer, 1)\n\n        self.pairwise_rel_type = pairwise_rel_type\n        self.spatial_dim = spatial_dim\n        self.spatial_dist_norm = True\n        self.apply(_init_weights_bert)\n\n    def forward(\n            self, txt_embeds, txt_masks, obj_embeds, obj_locs, obj_masks,\n            output_attentions=False, output_hidden_states=False, **kwargs\n    ):\n        pairwise_locs = calc_pairwise_locs(\n            obj_locs[:, :, :3], obj_locs[:, :, 3:],\n            pairwise_rel_type=self.pairwise_rel_type\n        )\n\n        for i, (pc_layer, lang_layer) in enumerate(zip(self.pc_encoder, self.lang_encoder)):\n            query_pos = self.loc_layers[0](obj_locs)\n            obj_embeds = obj_embeds + query_pos\n\n            obj_embeds_out, self_attn_matrices, cross_attn_matrices = pc_layer(\n                obj_embeds, txt_embeds, pairwise_locs,\n                tgt_key_padding_mask=obj_masks.logical_not(),\n                memory_key_padding_mask=txt_masks.logical_not(),\n            )\n\n            txt_embeds_out, self_attn_matrices, cross_attn_matrices = lang_layer(\n                txt_embeds, obj_embeds,\n                tgt_key_padding_mask=txt_masks.logical_not(),\n                memory_key_padding_mask=obj_masks.logical_not(),\n            )\n\n            obj_embeds = obj_embeds_out\n            txt_embeds = txt_embeds_out\n\n        return txt_embeds, obj_embeds\n\n\n@GROUNDING_REGISTRY.register()\nclass UnifiedSpatialCrossEncoderV2(nn.Module):\n    \"\"\"\n       spatial_dim: spatial feature dim, used to modify attention\n       dim_loc:\n    \"\"\"\n\n    def __init__(self, cfg, hidden_size=768, dim_feedforward=2048, num_attention_heads=12, num_layers=4, dim_loc=6):\n        super().__init__()\n\n        # unfied encoder\n        unified_encoder_layer = TransformerEncoderLayer(hidden_size, num_attention_heads, dim_feedforward=dim_feedforward)\n        self.unified_encoder = layer_repeat(unified_encoder_layer, num_layers)\n\n        # loc layer\n        loc_layer = nn.Sequential(\n            nn.Linear(dim_loc, hidden_size),\n            nn.LayerNorm(hidden_size),\n        )\n        self.loc_layers = layer_repeat(loc_layer, 1)\n\n        # token embedding\n        self.token_type_embeddings = nn.Embedding(2, hidden_size)\n\n        self.apply(_init_weights_bert)\n\n    def forward(\n            self, txt_embeds, txt_masks, obj_embeds, obj_locs, obj_masks,\n            output_attentions=False, output_hidden_states=False, **kwargs\n    ):\n        txt_len = txt_embeds.shape[1]\n        obj_len = obj_embeds.shape[1]\n\n        for i, unified_layer in enumerate(self.unified_encoder):\n            # add embeddings for points\n            query_pos = self.loc_layers[0](obj_locs)\n            pc_token_type_ids = torch.ones((obj_embeds.shape[0:2])).long().cuda()\n            pc_type_embeds = self.token_type_embeddings(pc_token_type_ids)\n            obj_embeds = obj_embeds + query_pos + pc_type_embeds\n\n            # add embeddings for languages\n            lang_token_type_ids = torch.zeros((txt_embeds.shape[0:2])).long().cuda()\n            lang_type_embeds = self.token_type_embeddings(lang_token_type_ids)\n            txt_embeds = txt_embeds + lang_type_embeds\n\n            # fuse embeddings\n            joint_embeds = torch.cat((txt_embeds, obj_embeds), dim=1)\n            joint_masks = torch.cat((txt_masks, obj_masks), dim=1)\n\n            # transformer\n            joint_embeds, 
self_attn_matrices = unified_layer(joint_embeds,\n                                                             tgt_key_padding_mask=joint_masks.logical_not())\n\n            # split\n            txt_embeds, obj_embeds = torch.split(joint_embeds, [txt_len, obj_len], dim=1)\n\n        return txt_embeds, obj_embeds\n\n\nif __name__ == '__main__':\n    x = UnifiedSpatialCrossEncoderV2().cuda()\n    txt_embeds = torch.zeros((3, 10, 768)).cuda()\n    txt_masks = torch.ones((3, 10)).cuda()\n    obj_embeds = torch.zeros((3, 10, 768)).cuda()\n    obj_locs = torch.ones((3, 10, 6)).cuda()\n    obj_masks = torch.ones((3, 10)).cuda()\n    x(txt_embeds, txt_masks, obj_embeds, obj_locs, obj_masks)\n"
  },
  {
    "path": "modules/heads/__init__.py",
    "content": "from .grounding_head import *\nfrom .pretrain_head import *\nfrom .qa_head import *"
  },
  {
    "path": "modules/heads/grounding_head.py",
    "content": "import torch.nn as nn\n\nfrom modules.build import HEADS_REGISTRY\nfrom modules.utils import get_mlp_head\n\n\n@HEADS_REGISTRY.register()\nclass GroundHeadV1(nn.Module):\n    def __init__(self, cfg, input_size=768, hidden_size=768, sem_cls_size=607, dropout=0.3, detach_all_aux_loss=False):\n        super().__init__()\n        self.og3d_head = get_mlp_head(\n            input_size, hidden_size,\n            1, dropout=dropout\n        )\n        self.txt_clf_head = get_mlp_head(\n            input_size, hidden_size,\n            sem_cls_size, dropout=dropout\n        )\n        self.obj3d_clf_head = get_mlp_head(\n            input_size, hidden_size,\n            sem_cls_size, dropout=dropout\n        )\n        self.obj3d_clf_pre_head = get_mlp_head(\n            input_size, hidden_size,\n            sem_cls_size, dropout=dropout\n        )\n        self.detach_all_aux_loss = detach_all_aux_loss\n\n    def forward(self, txt_embeds, obj_embeds, obj_pre_embeds, obj_masks, **kwargs):\n        og3d_logits = self.og3d_head(obj_embeds).squeeze(2)\n        og3d_logits = og3d_logits.masked_fill_(obj_masks.logical_not(), -float('inf'))\n        if self.detach_all_aux_loss:\n            txt_embeds = txt_embeds.detach()\n            obj_embeds = obj_embeds.detach()\n            obj_pre_embeds = obj_pre_embeds.detach()\n        txt_cls_logits = self.txt_clf_head(txt_embeds[:, 0])\n        obj_cls_logits = self.obj3d_clf_head(obj_embeds)\n        obj_cls_pre_logits = self.obj3d_clf_pre_head(obj_pre_embeds)\n        return txt_cls_logits, obj_cls_logits, obj_cls_pre_logits, og3d_logits\n\n\n@HEADS_REGISTRY.register()\nclass GroundHead(nn.Module):\n    def __init__(self, cfg, input_size=768, hidden_size=768, dropout=0.3):\n        super().__init__()\n        self.og3d_head = get_mlp_head(\n            input_size, hidden_size,\n            1, dropout=dropout\n        )\n\n    def forward(self, obj_embeds, obj_masks=None, **kwargs):\n        og3d_logits = self.og3d_head(obj_embeds).squeeze(2)\n        if obj_masks is not None:\n            og3d_logits = og3d_logits.masked_fill_(obj_masks.logical_not(), -float('inf'))\n        return og3d_logits\n"
  },
  {
    "path": "modules/heads/pretrain_head.py",
    "content": "import torch\nimport torch.nn as nn\n\nfrom modules.build import HEADS_REGISTRY\nfrom modules.utils import get_activation_fn\n\n\nclass BertPredictionHeadTransform(nn.Module):\n    def __init__(self, hidden_size, hidden_act='gelu'):\n        super().__init__()\n        self.dense = nn.Linear(hidden_size, hidden_size)\n        self.transform_act_fn = get_activation_fn(hidden_act)\n        self.LayerNorm = nn.LayerNorm(hidden_size)\n\n    def forward(self, hidden_states):\n        hidden_states = self.dense(hidden_states)\n        hidden_states = self.transform_act_fn(hidden_states)\n        hidden_states = self.LayerNorm(hidden_states)\n        return hidden_states\n\n\nclass BertLMPredictionHead(nn.Module):\n    def __init__(self, hidden_size, vocab_size):\n        super().__init__()\n        self.transform = BertPredictionHeadTransform(hidden_size=hidden_size, hidden_act='gelu')\n        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)\n        self.bias = nn.Parameter(torch.zeros(vocab_size))\n\n    def forward(self, hidden_states):\n        hidden_states = self.transform(hidden_states)\n        hidden_states = self.decoder(hidden_states) + self.bias\n        return hidden_states\n\n\n@HEADS_REGISTRY.register()\nclass PretrainHeadV1(nn.Module):\n    def __init__(self, cfg, hidden_size=768, vocab_size=30522):\n        super().__init__()\n        self.lm_pred_head = BertLMPredictionHead(hidden_size, vocab_size)\n\n    def forward(self, txt_embeds, **kwargs):\n        txt_lm_cls_logits = self.lm_pred_head(txt_embeds)\n        return txt_lm_cls_logits\n\n\n@HEADS_REGISTRY.register()\nclass OVPretrainHead(nn.Module):\n    def __init__(self, cfg, hidden_size=768, vocab_size=30522, obj_vocab_size=607):\n        super().__init__()\n        self.lm_pred_head = BertLMPredictionHead(hidden_size, vocab_size)\n        self.obj_pred_head = BertLMPredictionHead(hidden_size, obj_vocab_size)\n\n    def forward(self, txt_embeds, obj_embeds, **kwargs):\n        txt_lm_cls_logits = self.lm_pred_head(txt_embeds)\n        obj_lm_cls_logits = self.obj_pred_head(obj_embeds)\n        return (txt_lm_cls_logits, obj_lm_cls_logits)"
  },
  {
    "path": "modules/heads/qa_head.py",
    "content": "import torch\nimport torch.nn.functional as F\nfrom torch import nn\n\nfrom modules.build import HEADS_REGISTRY\n\n\nclass FC(nn.Module):\n    def __init__(self, in_size, out_size, pdrop=0., use_gelu=True):\n        super(FC, self).__init__()\n        self.pdrop = pdrop\n        self.use_gelu = use_gelu\n        self.linear = nn.Linear(in_size, out_size)\n        if use_gelu:\n            # self.relu = nn.Relu(inplace=True)\n            self.gelu = nn.GELU()\n        if pdrop > 0:\n            self.dropout = nn.Dropout(pdrop)\n\n    def forward(self, x):\n        x = self.linear(x)\n        if self.use_gelu:\n            # x = self.relu(x)\n            x = self.gelu(x)\n        if self.pdrop > 0:\n            x = self.dropout(x)\n        return x\n\n\nclass MLP(nn.Module):\n    def __init__(self, in_size, mid_size, out_size, pdrop=0., use_gelu=True):\n        super().__init__()\n        self.fc = FC(in_size, mid_size, pdrop=pdrop, use_gelu=use_gelu)\n        self.linear = nn.Linear(mid_size, out_size)\n\n    def forward(self, x):\n        return self.linear(self.fc(x))\n\n\nclass AttFlat(nn.Module):\n    def __init__(self, hidden_size, flat_mlp_size=512, flat_glimpses=1, flat_out_size=1024, pdrop=0.1):\n        super().__init__()\n        self.mlp = MLP(\n            in_size=hidden_size,\n            mid_size=flat_mlp_size,\n            out_size=flat_glimpses,\n            pdrop=pdrop,\n            use_gelu=True\n        )\n        self.flat_glimpses = flat_glimpses\n        self.linear_merge = nn.Linear(\n            hidden_size * flat_glimpses,\n            flat_out_size\n        )\n\n    def forward(self, x, x_mask):\n        att = self.mlp(x)\n        if x_mask is not None:\n            # att = att.masked_fill(x_mask.squeeze(1).squeeze(1).unsqueeze(2), -1e9)\n            att = att.masked_fill(x_mask.unsqueeze(2), -1e9)\n        att = F.softmax(att, dim=1)\n        att_list = []\n        for i in range(self.flat_glimpses):\n            att_list.append(\n                torch.sum(att[:, :, i: i + 1] * x, dim=1)\n            )\n        x_atted = torch.cat(att_list, dim=1)\n        x_atted = self.linear_merge(x_atted)\n        return x_atted\n\n\n@HEADS_REGISTRY.register()\nclass QAHeadV1(nn.Module):\n    def __init__(self, cfg, hidden_size=768, mlp_size=256, glimpse=1, flat_out_size=512, num_answers=8864):\n        super().__init__()\n        self.attflat_visual = AttFlat(hidden_size, mlp_size, glimpse, flat_out_size, 0.1)\n        self.attflat_lang = AttFlat(hidden_size, mlp_size, glimpse, flat_out_size, 0.1)\n        self.answer_cls = nn.Sequential(\n            nn.Linear(flat_out_size, hidden_size),\n            nn.GELU(),\n            nn.Dropout(0.3),\n            nn.Linear(hidden_size, num_answers)\n        )\n        self.fusion_norm = nn.LayerNorm(flat_out_size)\n\n    def forward(self, obj_embeds, obj_masks, txt_embeds, txt_masks, **kwargs):\n        object_feat = self.attflat_visual(obj_embeds, obj_masks.logical_not())\n        lang_feat = self.attflat_lang(txt_embeds, txt_masks.logical_not())\n        fuse_feat = self.fusion_norm(lang_feat + object_feat)\n        answer_scores = self.answer_cls(fuse_feat)\n        return answer_scores"
  },
  {
    "path": "modules/language/__init__.py",
    "content": "from .bert import *\nfrom .clip import *"
  },
  {
    "path": "modules/language/bert.py",
    "content": "import torch.nn as nn\nfrom transformers import BertConfig, BertModel, BertTokenizer\n\nfrom modules.build import LANGUAGE_REGISTRY\n\n\n@LANGUAGE_REGISTRY.register()\nclass BERTLanguageEncoder(nn.Module):\n    def __init__(self, cfg, weights=\"bert-base-uncased\", hidden_size=768,\n                 num_hidden_layers=4, num_attention_heads=12, type_vocab_size=2):\n        super().__init__()\n        self.tokenizer = BertTokenizer.from_pretrained(\n            weights, do_lower_case=True\n        )\n        self.bert_config = BertConfig(\n            hidden_size=hidden_size,\n            num_hidden_layers=num_hidden_layers,\n            num_attention_heads=num_attention_heads,\n            type_vocab_size=type_vocab_size\n        )\n        self.model = BertModel.from_pretrained(\n            weights, config=self.bert_config\n        )\n\n    def forward(self, txt_ids, txt_masks, **kwargs):\n        return self.model(txt_ids, txt_masks).last_hidden_state\n"
  },
  {
    "path": "modules/language/clip.py",
    "content": "from contextlib import nullcontext\nimport torch\nimport torch.nn as nn\nfrom transformers import CLIPTextModelWithProjection\n\nfrom modules.build import LANGUAGE_REGISTRY\nfrom modules.utils import get_mlp_head\n\n@LANGUAGE_REGISTRY.register()\nclass CLIPLanguageEncoder(nn.Module):\n    def __init__(self, cfg, weights=\"openai/clip-vit-large-patch14\", output_dim=768, freeze_backbone=True, use_projection=False, dropout=0.1):\n        super().__init__()\n        self.context = torch.no_grad if freeze_backbone else nullcontext\n        self.model = CLIPTextModelWithProjection.from_pretrained(weights)\n        self.use_projection = use_projection\n        if use_projection:\n            self.projection = get_mlp_head(self.model.config.hidden_size, output_dim, output_dim, dropout=dropout)\n        #self.attention = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)\n        \n    def forward(self, txt_ids, txt_masks):\n        with self.context():\n            txt = self.model(txt_ids, txt_masks).last_hidden_state\n            txt = self.model.text_projection(txt)\n            txt = torch.nn.functional.normalize(txt, p=2, dim=2)\n        #txt = self.attention(txt, txt, txt, key_padding_mask=txt_masks.logical_not())[0]\n        if self.use_projection:\n            txt = self.projection(txt)\n        return txt"
  },
  {
    "path": "modules/layers/pointnet.py",
    "content": "import torch.nn as nn\n\nfrom modules.third_party.pointnet2.pointnet2_modules import PointnetSAModule\n\n\ndef break_up_pc(pc):\n    \"\"\"\n    Split the pointcloud into xyz positions and features tensors.\n    This method is taken from VoteNet codebase (https://github.com/facebookresearch/votenet)\n\n    @param pc: pointcloud [N, 3 + C]\n    :return: the xyz tensor and the feature tensor\n    \"\"\"\n    xyz = pc[..., 0:3].contiguous()\n    features = (\n        pc[..., 3:].transpose(1, 2).contiguous()\n        if pc.size(-1) > 3 else None\n    )\n    return xyz, features\n\n\nclass PointNetPP(nn.Module):\n    \"\"\"\n    Pointnet++ encoder.\n    For the hyper parameters please advise the paper (https://arxiv.org/abs/1706.02413)\n    \"\"\"\n\n    def __init__(self, sa_n_points: list,\n                 sa_n_samples: list,\n                 sa_radii: list,\n                 sa_mlps: list,\n                 bn=True,\n                 use_xyz=True):\n        super().__init__()\n\n        n_sa = len(sa_n_points)\n        if not (n_sa == len(sa_n_samples) == len(sa_radii) == len(sa_mlps)):\n            raise ValueError('Lens of given hyper-params are not compatible')\n\n        self.encoder = nn.ModuleList()\n\n        for i in range(n_sa):\n            self.encoder.append(PointnetSAModule(\n                npoint=sa_n_points[i],\n                nsample=sa_n_samples[i],\n                radius=sa_radii[i],\n                mlp=sa_mlps[i],\n                bn=bn,\n                use_xyz=use_xyz,\n            ))\n\n        out_n_points = sa_n_points[-1] if sa_n_points[-1] is not None else 1\n        self.fc = nn.Linear(out_n_points * sa_mlps[-1][-1], sa_mlps[-1][-1])\n\n    def forward(self, features):\n        \"\"\"\n        @param features: B x N_objects x N_Points x 3 + C\n        \"\"\"\n        xyz, features = break_up_pc(features)\n        for i in range(len(self.encoder)):\n            xyz, features = self.encoder[i](xyz, features)\n\n        return self.fc(features.view(features.size(0), -1))"
  },
  {
    "path": "modules/layers/transformers.py",
    "content": "from typing import Optional\n\nimport einops\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom torch import Tensor, nn\n\nfrom modules.utils import get_activation_fn\n\n\nclass CrossAttentionLayer(nn.Module):\n\n    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1, activation=\"relu\",\n                 k_dim=None, v_dim=None, prenorm=True):\n        super().__init__()\n        if k_dim is None:\n            k_dim = d_model\n        if v_dim is None:\n            v_dim = d_model\n        self.prenorm = prenorm\n        self.multihead_attn = nn.MultiheadAttention(\n            d_model, nhead, dropout=dropout, batch_first=True, kdim=k_dim, vdim=v_dim\n        )\n        # Implementation of Feedforward modules\n        self.linear1 = nn.Linear(d_model, dim_feedforward)\n        self.dropout = nn.Dropout(dropout)\n        self.linear2 = nn.Linear(dim_feedforward, d_model)\n\n        self.norm1 = nn.LayerNorm(d_model)\n        self.norm2 = nn.LayerNorm(d_model)\n        self.norm3 = nn.LayerNorm(d_model)\n        self.dropout1 = nn.Dropout(dropout)\n        self.dropout2 = nn.Dropout(dropout)\n        self.dropout3 = nn.Dropout(dropout)\n\n        self.activation = get_activation_fn(activation)\n\n    def forward(\n            self, tgt, memory,\n            tgt_mask: Optional[Tensor] = None,\n            memory_mask: Optional[Tensor] = None,\n            tgt_key_padding_mask: Optional[Tensor] = None,\n            memory_key_padding_mask: Optional[Tensor] = None,\n    ):\n        tgt2 = tgt\n        if self.prenorm:\n            tgt2 = self.norm1(tgt2)\n        tgt2, cross_attn_matrices = self.multihead_attn(\n            query=tgt2, key=memory,\n            value=memory, attn_mask=memory_mask,\n            key_padding_mask=memory_key_padding_mask\n        )\n        tgt = tgt + self.dropout2(tgt2)\n        if not self.prenorm:\n            tgt = self.norm1(tgt)\n        if self.prenorm:\n            tgt2 = self.norm3(tgt)\n        tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))\n        tgt = tgt + self.dropout3(tgt2)\n        if not self.prenorm:\n            tgt = self.norm3(tgt)\n        return tgt, cross_attn_matrices\n\n\nclass TransformerDecoderLayer(nn.Module):\n    def __init__(self, d_model, nhead, dim_feedforward=2048, dropout=0.1, activation=\"relu\"):\n        super().__init__()\n        self.self_attn = nn.MultiheadAttention(\n            d_model, nhead, dropout=dropout, batch_first=True\n        )\n        self.multihead_attn = nn.MultiheadAttention(\n            d_model, nhead, dropout=dropout, batch_first=True\n        )\n        # Implementation of Feedforward modules\n        self.linear1 = nn.Linear(d_model, dim_feedforward)\n        self.dropout = nn.Dropout(dropout)\n        self.linear2 = nn.Linear(dim_feedforward, d_model)\n\n        self.norm1 = nn.LayerNorm(d_model)\n        self.norm2 = nn.LayerNorm(d_model)\n        self.norm3 = nn.LayerNorm(d_model)\n        self.dropout1 = nn.Dropout(dropout)\n        self.dropout2 = nn.Dropout(dropout)\n        self.dropout3 = nn.Dropout(dropout)\n\n        self.activation = get_activation_fn(activation)\n\n    def forward(\n            self, tgt, memory,\n            tgt_mask: Optional[Tensor] = None,\n            memory_mask: Optional[Tensor] = None,\n            tgt_key_padding_mask: Optional[Tensor] = None,\n            memory_key_padding_mask: Optional[Tensor] = None,\n    ):\n        tgt2 = self.norm1(tgt)\n        tgt2, self_attn_matrices = 
self.self_attn(\n            query=tgt2, key=tgt2, value=tgt2, attn_mask=tgt_mask,\n            key_padding_mask=tgt_key_padding_mask\n        )\n        tgt = tgt + self.dropout1(tgt2)\n        tgt2 = self.norm2(tgt)\n        tgt2, cross_attn_matrices = self.multihead_attn(\n            query=tgt2, key=memory,\n            value=memory, attn_mask=memory_mask,\n            key_padding_mask=memory_key_padding_mask\n        )\n        tgt = tgt + self.dropout2(tgt2)\n        tgt2 = self.norm3(tgt)\n        tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))\n        tgt = tgt + self.dropout3(tgt2)\n        return tgt, self_attn_matrices, cross_attn_matrices\n\n\nclass TransformerEncoderLayer(nn.Module):\n    def __init__(self, d_model, nhead, dim_feedforward=2048, batch_first=True, dropout=0.1, activation=\"relu\", prenorm=False):\n        super().__init__()\n        self.self_attn = nn.MultiheadAttention(\n            d_model, nhead, dropout=dropout, batch_first=batch_first\n        )\n        # Feed-forward network\n        self.linear1 = nn.Linear(d_model, dim_feedforward)\n        self.dropout = nn.Dropout(dropout)\n        self.linear2 = nn.Linear(dim_feedforward, d_model)\n\n        self.norm1 = nn.LayerNorm(d_model)\n        self.norm2 = nn.LayerNorm(d_model)\n        self.dropout1 = nn.Dropout(dropout)\n        self.dropout2 = nn.Dropout(dropout)\n\n        self.activation = get_activation_fn(activation)\n        self.prenorm = prenorm\n\n    def forward(\n            self, tgt, tgt_mask: Optional[Tensor] = None,\n            tgt_key_padding_mask: Optional[Tensor] = None,\n    ):\n        # Self-attention block\n        tgt2 = tgt\n        if self.prenorm:\n            tgt2 = self.norm1(tgt2)\n        tgt2, self_attn_matrices = self.self_attn(\n            query=tgt2, key=tgt2, value=tgt2, attn_mask=tgt_mask,\n            key_padding_mask=tgt_key_padding_mask\n        )\n        tgt = tgt + self.dropout1(tgt2)\n        if not self.prenorm:\n            tgt = self.norm1(tgt)\n        # Feed-forward block; the residual stream bypasses the normalization\n        if self.prenorm:\n            tgt2 = self.norm2(tgt)\n        else:\n            tgt2 = tgt\n        tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))\n        tgt = tgt + self.dropout2(tgt2)\n        if not self.prenorm:\n            tgt = self.norm2(tgt)\n        return tgt, self_attn_matrices\n\n\nclass MultiHeadAttentionSpatial(nn.Module):\n    def __init__(\n            self, d_model, n_head, dropout=0.1, spatial_multihead=True, spatial_dim=5,\n            spatial_attn_fusion='mul',\n    ):\n        super().__init__()\n        assert d_model % n_head == 0, 'd_model: %d, n_head: %d' % (d_model, n_head)\n        # NOTE: the dropout argument is kept for interface symmetry but is not\n        # applied inside this module\n\n        self.n_head = n_head\n        self.d_model = d_model\n        self.d_per_head = d_model // n_head\n        self.spatial_multihead = spatial_multihead\n        self.spatial_dim = spatial_dim\n        self.spatial_attn_fusion = spatial_attn_fusion\n\n        self.w_qs = nn.Linear(d_model, d_model)\n        self.w_ks = nn.Linear(d_model, d_model)\n        self.w_vs = nn.Linear(d_model, d_model)\n\n        self.fc = nn.Linear(d_model, d_model)\n\n        self.spatial_n_head = n_head if spatial_multihead else 1\n        if self.spatial_attn_fusion in ['mul', 'bias', 'add']:\n            self.pairwise_loc_fc = nn.Linear(spatial_dim, self.spatial_n_head)\n        elif self.spatial_attn_fusion == 'ctx':\n            self.pairwise_loc_fc = nn.Linear(spatial_dim, d_model)\n        elif self.spatial_attn_fusion == 'cond':\n            self.lang_cond_fc = nn.Linear(d_model, self.spatial_n_head * (spatial_dim + 1))\n     
   else:\n            raise NotImplementedError('unsupported spatial_attn_fusion %s' % (self.spatial_attn_fusion))\n\n    def forward(self, q, k, v, pairwise_locs, key_padding_mask=None, txt_embeds=None):\n        # q: (b, l, d_model); k, v: (b, t, d_model); pairwise_locs: (b, l, t, spatial_dim);\n        # key_padding_mask: (b, t), True marks padded keys\n        residual = q\n        q = einops.rearrange(self.w_qs(q), 'b l (head k) -> head b l k', head=self.n_head)\n        k = einops.rearrange(self.w_ks(k), 'b t (head k) -> head b t k', head=self.n_head)\n        v = einops.rearrange(self.w_vs(v), 'b t (head v) -> head b t v', head=self.n_head)\n        attn = torch.einsum('hblk,hbtk->hblt', q, k) / np.sqrt(q.shape[-1])\n\n        if self.spatial_attn_fusion in ['mul', 'bias', 'add']:\n            loc_attn = self.pairwise_loc_fc(pairwise_locs)\n            loc_attn = einops.rearrange(loc_attn, 'b l t h -> h b l t')\n            if self.spatial_attn_fusion == 'mul':\n                loc_attn = F.relu(loc_attn)\n            if not self.spatial_multihead:\n                loc_attn = einops.repeat(loc_attn, 'h b l t -> (h nh) b l t', nh=self.n_head)\n        elif self.spatial_attn_fusion == 'ctx':\n            loc_attn = self.pairwise_loc_fc(pairwise_locs)\n            loc_attn = einops.rearrange(loc_attn, 'b l t (h k) -> h b l t k', h=self.n_head)\n            loc_attn = torch.einsum('hblk,hbltk->hblt', q, loc_attn) / np.sqrt(q.shape[-1])\n        elif self.spatial_attn_fusion == 'cond':\n            spatial_weights = self.lang_cond_fc(residual)\n            spatial_weights = einops.rearrange(spatial_weights, 'b l (h d) -> h b l d', h=self.spatial_n_head,\n                                               d=self.spatial_dim + 1)\n            if self.spatial_n_head == 1:\n                spatial_weights = einops.repeat(spatial_weights, '1 b l d -> h b l d', h=self.n_head)\n            spatial_bias = spatial_weights[..., :1]\n            spatial_weights = spatial_weights[..., 1:]\n            loc_attn = torch.einsum('hbld,bltd->hblt', spatial_weights, pairwise_locs) + spatial_bias\n            loc_attn = torch.sigmoid(loc_attn)\n\n        if key_padding_mask is not None:\n            mask = einops.repeat(key_padding_mask, 'b t -> h b l t', h=self.n_head, l=q.size(2))\n            attn = attn.masked_fill(mask, -np.inf)\n            if self.spatial_attn_fusion in ['mul', 'cond']:\n                loc_attn = loc_attn.masked_fill(mask, 0)\n            else:\n                loc_attn = loc_attn.masked_fill(mask, -np.inf)\n\n        if self.spatial_attn_fusion == 'add':\n            fused_attn = (torch.softmax(attn, 3) + torch.softmax(loc_attn, 3)) / 2\n        else:\n            if self.spatial_attn_fusion in ['mul', 'cond']:\n                fused_attn = torch.log(torch.clamp(loc_attn, min=1e-6)) + attn\n            else:\n                fused_attn = loc_attn + attn\n            fused_attn = torch.softmax(fused_attn, 3)\n\n        assert not torch.isnan(fused_attn).any(), 'fused attention weights contain NaN'\n\n        output = torch.einsum('hblt,hbtv->hblv', fused_attn, v)\n        output = einops.rearrange(output, 'head b l v -> b l (head v)')\n        output = self.fc(output)\n        return output, fused_attn\n\n\nclass TransformerSpatialDecoderLayer(TransformerDecoderLayer):\n    def __init__(\n            self, d_model, nhead, dim_feedforward=2048, dropout=0.1, activation=\"relu\",\n            spatial_multihead=True, spatial_dim=5, spatial_attn_fusion='mul'\n    ):\n        super().__init__(\n            d_model, nhead, dim_feedforward=dim_feedforward, dropout=dropout, activation=activation\n        )\n        del self.self_attn\n        self.self_attn 
= MultiHeadAttentionSpatial(\n            d_model, nhead, dropout=dropout,\n            spatial_multihead=spatial_multihead,\n            spatial_dim=spatial_dim,\n            spatial_attn_fusion=spatial_attn_fusion,\n        )\n\n    def forward(\n            self, tgt, memory,\n            tgt_pairwise_locs: Optional[Tensor] = None,\n            tgt_mask: Optional[Tensor] = None,\n            memory_mask: Optional[Tensor] = None,\n            tgt_key_padding_mask: Optional[Tensor] = None,\n            memory_key_padding_mask: Optional[Tensor] = None,\n    ):\n        tgt2 = self.norm1(tgt)\n        tgt2, self_attn_matrices = self.self_attn(\n            tgt2, tgt2, tgt2, tgt_pairwise_locs,\n            key_padding_mask=tgt_key_padding_mask\n        )\n        tgt = tgt + self.dropout1(tgt2)\n        tgt2 = self.norm2(tgt)\n        tgt2, cross_attn_matrices = self.multihead_attn(\n            query=tgt2, key=memory,\n            value=memory, attn_mask=memory_mask,\n            key_padding_mask=memory_key_padding_mask\n        )\n        tgt = tgt + self.dropout2(tgt2)\n        tgt2 = self.norm3(tgt)\n        tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))\n        tgt = tgt + self.dropout3(tgt2)\n        return tgt, self_attn_matrices, cross_attn_matrices\n\n\nclass TransformerSpatialEncoderLayer(TransformerEncoderLayer):\n    def __init__(\n            self, d_model, nhead, dim_feedforward=2048, dropout=0.1, activation=\"relu\",\n            spatial_multihead=True, spatial_dim=5, spatial_attn_fusion='mul'\n    ):\n        super().__init__(\n            d_model, nhead, dim_feedforward=dim_feedforward, dropout=dropout, activation=activation\n        )\n        del self.self_attn\n        self.self_attn = MultiHeadAttentionSpatial(\n            d_model, nhead, dropout=dropout,\n            spatial_multihead=spatial_multihead,\n            spatial_dim=spatial_dim,\n            spatial_attn_fusion=spatial_attn_fusion,\n        )\n\n    def forward(\n            self, tgt, tgt_pairwise_locs,\n            tgt_mask: Optional[Tensor] = None,\n            tgt_key_padding_mask: Optional[Tensor] = None,\n    ):\n        tgt2 = tgt\n        tgt2, self_attn_matrices = self.self_attn(\n            tgt2, tgt2, tgt2, tgt_pairwise_locs,\n            key_padding_mask=tgt_key_padding_mask\n        )\n        tgt = tgt + self.dropout1(tgt2)\n        tgt = self.norm1(tgt)\n        tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt))))\n        tgt = tgt + self.dropout2(tgt2)\n        tgt = self.norm2(tgt)\n        return tgt, self_attn_matrices\n"
  },
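  {
    "path": "modules/layers/examples/spatial_encoder_example.py",
    "content": "# Illustrative usage sketch (this hypothetical example file is not part of the\n# original sources): it shows the tensor shapes that\n# TransformerSpatialEncoderLayer in modules/layers/transformers.py expects.\n# Object tokens are batch-first (B, L, d_model); pairwise_locs carries a\n# spatial feature vector for every ordered token pair, shaped\n# (B, L, L, spatial_dim).\nimport torch\n\nfrom modules.layers.transformers import TransformerSpatialEncoderLayer\n\nB, L, D = 2, 8, 256\nlayer = TransformerSpatialEncoderLayer(d_model=D, nhead=8, spatial_dim=5, spatial_attn_fusion='mul')\nlayer.eval()\n\ntokens = torch.randn(B, L, D)                  # per-object features\npairwise_locs = torch.randn(B, L, L, 5)        # e.g. pairwise distances/angles\npadding = torch.zeros(B, L, dtype=torch.bool)  # True marks padded tokens\n\nwith torch.no_grad():\n    out, attn = layer(tokens, pairwise_locs, tgt_key_padding_mask=padding)\nprint(out.shape)   # torch.Size([2, 8, 256])\nprint(attn.shape)  # torch.Size([8, 2, 8, 8]), per-head fused attention\n"
  },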
  {
    "path": "modules/third_party/__init__.py",
    "content": ""
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/include/ball_query.h",
    "content": "#pragma once\n#include <torch/extension.h>\n\nat::Tensor ball_query(at::Tensor new_xyz, at::Tensor xyz, const float radius,\n                      const int nsample);\n"
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/include/cuda_utils.h",
    "content": "#ifndef _CUDA_UTILS_H\n#define _CUDA_UTILS_H\n\n#include <ATen/ATen.h>\n#include <ATen/cuda/CUDAContext.h>\n#include <cmath>\n\n#include <cuda.h>\n#include <cuda_runtime.h>\n\n#include <vector>\n\n#define TOTAL_THREADS 512\n\ninline int opt_n_threads(int work_size) {\n  const int pow_2 = std::log(static_cast<double>(work_size)) / std::log(2.0);\n\n  return max(min(1 << pow_2, TOTAL_THREADS), 1);\n}\n\ninline dim3 opt_block_config(int x, int y) {\n  const int x_threads = opt_n_threads(x);\n  const int y_threads =\n      max(min(opt_n_threads(y), TOTAL_THREADS / x_threads), 1);\n  dim3 block_config(x_threads, y_threads, 1);\n\n  return block_config;\n}\n\n#define CUDA_CHECK_ERRORS()                                           \\\n  do {                                                                \\\n    cudaError_t err = cudaGetLastError();                             \\\n    if (cudaSuccess != err) {                                         \\\n      fprintf(stderr, \"CUDA kernel failed : %s\\n%s at L:%d in %s\\n\",  \\\n              cudaGetErrorString(err), __PRETTY_FUNCTION__, __LINE__, \\\n              __FILE__);                                              \\\n      exit(-1);                                                       \\\n    }                                                                 \\\n  } while (0)\n\n#endif\n"
  },
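  {
    "path": "modules/third_party/pointnet2/examples/opt_n_threads_sketch.py",
    "content": "# Hypothetical sketch (not part of the compiled extension): a Python mirror of\n# opt_n_threads() from cuda_utils.h, to make the kernel launch rule explicit --\n# the wrappers use the largest power of two <= work_size, clamped to the range\n# [1, TOTAL_THREADS]. Assumes work_size >= 1, as the C++ version does.\nimport math\n\nTOTAL_THREADS = 512\n\ndef opt_n_threads(work_size: int) -> int:\n    pow_2 = int(math.log(work_size) / math.log(2.0))  # floor(log2(work_size))\n    return max(min(1 << pow_2, TOTAL_THREADS), 1)\n\nassert opt_n_threads(1) == 1\nassert opt_n_threads(100) == 64    # 2^6 = 64 <= 100 < 128\nassert opt_n_threads(5000) == 512  # capped at TOTAL_THREADS\n"
  },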
  {
    "path": "modules/third_party/pointnet2/_ext_src/include/group_points.h",
    "content": "#pragma once\n#include <torch/extension.h>\n\nat::Tensor group_points(at::Tensor points, at::Tensor idx);\nat::Tensor group_points_grad(at::Tensor grad_out, at::Tensor idx, const int n);\n"
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/include/interpolate.h",
    "content": "#pragma once\n\n#include <torch/extension.h>\n#include <vector>\n\nstd::vector<at::Tensor> three_nn(at::Tensor unknowns, at::Tensor knows);\nat::Tensor three_interpolate(at::Tensor points, at::Tensor idx,\n                             at::Tensor weight);\nat::Tensor three_interpolate_grad(at::Tensor grad_out, at::Tensor idx,\n                                  at::Tensor weight, const int m);\n"
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/include/sampling.h",
    "content": "#pragma once\n#include <torch/extension.h>\n\nat::Tensor gather_points(at::Tensor points, at::Tensor idx);\nat::Tensor gather_points_grad(at::Tensor grad_out, at::Tensor idx, const int n);\nat::Tensor furthest_point_sampling(at::Tensor points, const int nsamples);\n"
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/include/utils.h",
    "content": "#pragma once\n#include <ATen/cuda/CUDAContext.h>\n#include <torch/extension.h>\n\n#define CHECK_CUDA(x)                                    \\\n  do {                                                   \\\n    AT_ASSERT(x.is_cuda(), #x \" must be a CUDA tensor\"); \\\n  } while (0)\n\n#define CHECK_CONTIGUOUS(x)                                          \\\n  do {                                                               \\\n    AT_ASSERT(x.is_contiguous(), #x \" must be a contiguous tensor\"); \\\n  } while (0)\n\n#define CHECK_IS_INT(x)                               \\\n  do {                                                \\\n    AT_ASSERT(x.scalar_type() == at::ScalarType::Int, \\\n              #x \" must be an int tensor\");           \\\n  } while (0)\n\n#define CHECK_IS_FLOAT(x)                               \\\n  do {                                                  \\\n    AT_ASSERT(x.scalar_type() == at::ScalarType::Float, \\\n              #x \" must be a float tensor\");            \\\n  } while (0)\n"
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/src/ball_query.cpp",
    "content": "#include \"ball_query.h\"\n#include \"utils.h\"\n\nvoid query_ball_point_kernel_wrapper(int b, int n, int m, float radius,\n                                     int nsample, const float *new_xyz,\n                                     const float *xyz, int *idx);\n\nat::Tensor ball_query(at::Tensor new_xyz, at::Tensor xyz, const float radius,\n                      const int nsample) {\n  CHECK_CONTIGUOUS(new_xyz);\n  CHECK_CONTIGUOUS(xyz);\n  CHECK_IS_FLOAT(new_xyz);\n  CHECK_IS_FLOAT(xyz);\n\n  if (new_xyz.is_cuda()) {\n    CHECK_CUDA(xyz);\n  }\n\n  at::Tensor idx =\n      torch::zeros({new_xyz.size(0), new_xyz.size(1), nsample},\n                   at::device(new_xyz.device()).dtype(at::ScalarType::Int));\n\n  if (new_xyz.is_cuda()) {\n    query_ball_point_kernel_wrapper(xyz.size(0), xyz.size(1), new_xyz.size(1),\n                                    radius, nsample, new_xyz.data_ptr<float>(),\n                                    xyz.data_ptr<float>(), idx.data_ptr<int>());\n  } else {\n    AT_ASSERT(false, \"CPU not supported\");\n  }\n\n  return idx;\n}\n"
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/src/ball_query_gpu.cu",
    "content": "#include <math.h>\n#include <stdio.h>\n#include <stdlib.h>\n\n#include \"cuda_utils.h\"\n\n// input: new_xyz(b, m, 3) xyz(b, n, 3)\n// output: idx(b, m, nsample)\n__global__ void query_ball_point_kernel(int b, int n, int m, float radius,\n                                        int nsample,\n                                        const float *__restrict__ new_xyz,\n                                        const float *__restrict__ xyz,\n                                        int *__restrict__ idx) {\n  int batch_index = blockIdx.x;\n  xyz += batch_index * n * 3;\n  new_xyz += batch_index * m * 3;\n  idx += m * nsample * batch_index;\n\n  int index = threadIdx.x;\n  int stride = blockDim.x;\n\n  float radius2 = radius * radius;\n  for (int j = index; j < m; j += stride) {\n    float new_x = new_xyz[j * 3 + 0];\n    float new_y = new_xyz[j * 3 + 1];\n    float new_z = new_xyz[j * 3 + 2];\n    for (int k = 0, cnt = 0; k < n && cnt < nsample; ++k) {\n      float x = xyz[k * 3 + 0];\n      float y = xyz[k * 3 + 1];\n      float z = xyz[k * 3 + 2];\n      float d2 = (new_x - x) * (new_x - x) + (new_y - y) * (new_y - y) +\n                 (new_z - z) * (new_z - z);\n      if (d2 < radius2) {\n        if (cnt == 0) {\n          for (int l = 0; l < nsample; ++l) {\n            idx[j * nsample + l] = k;\n          }\n        }\n        idx[j * nsample + cnt] = k;\n        ++cnt;\n      }\n    }\n  }\n}\n\nvoid query_ball_point_kernel_wrapper(int b, int n, int m, float radius,\n                                     int nsample, const float *new_xyz,\n                                     const float *xyz, int *idx) {\n  cudaStream_t stream = at::cuda::getCurrentCUDAStream();\n  query_ball_point_kernel<<<b, opt_n_threads(m), 0, stream>>>(\n      b, n, m, radius, nsample, new_xyz, xyz, idx);\n\n  CUDA_CHECK_ERRORS();\n}\n"
  },
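  {
    "path": "modules/third_party/pointnet2/examples/ball_query_reference.py",
    "content": "# Hypothetical pure-PyTorch reference (not part of the compiled extension):\n# mirrors the semantics of query_ball_point_kernel in ball_query_gpu.cu so the\n# CUDA output can be sanity-checked on CPU. Rows with fewer than nsample\n# neighbours are padded with the first neighbour found, exactly as the kernel\n# does; indices are produced in scan order.\nimport torch\n\ndef ball_query_reference(new_xyz, xyz, radius, nsample):\n    # new_xyz: (B, M, 3) query centers; xyz: (B, N, 3) points -> idx: (B, M, nsample)\n    B, M, _ = new_xyz.shape\n    idx = torch.zeros(B, M, nsample, dtype=torch.int64)\n    r2 = radius * radius\n    for b in range(B):\n        d2 = torch.cdist(new_xyz[b], xyz[b]).pow(2)  # (M, N) squared distances\n        for j in range(M):\n            hits = torch.nonzero(d2[j] < r2).flatten()[:nsample]\n            if len(hits) > 0:\n                idx[b, j, :] = hits[0]          # pad with the first neighbour\n                idx[b, j, :len(hits)] = hits\n    return idx\n"
  },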
  {
    "path": "modules/third_party/pointnet2/_ext_src/src/bindings.cpp",
    "content": "#include \"ball_query.h\"\n#include \"group_points.h\"\n#include \"interpolate.h\"\n#include \"sampling.h\"\n\nPYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {\n  m.def(\"gather_points\", &gather_points);\n  m.def(\"gather_points_grad\", &gather_points_grad);\n  m.def(\"furthest_point_sampling\", &furthest_point_sampling);\n\n  m.def(\"three_nn\", &three_nn);\n  m.def(\"three_interpolate\", &three_interpolate);\n  m.def(\"three_interpolate_grad\", &three_interpolate_grad);\n\n  m.def(\"ball_query\", &ball_query);\n\n  m.def(\"group_points\", &group_points);\n  m.def(\"group_points_grad\", &group_points_grad);\n}\n"
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/src/group_points.cpp",
    "content": "#include \"group_points.h\"\n#include \"utils.h\"\n\nvoid group_points_kernel_wrapper(int b, int c, int n, int npoints, int nsample,\n                                 const float *points, const int *idx,\n                                 float *out);\n\nvoid group_points_grad_kernel_wrapper(int b, int c, int n, int npoints,\n                                      int nsample, const float *grad_out,\n                                      const int *idx, float *grad_points);\n\nat::Tensor group_points(at::Tensor points, at::Tensor idx) {\n  CHECK_CONTIGUOUS(points);\n  CHECK_CONTIGUOUS(idx);\n  CHECK_IS_FLOAT(points);\n  CHECK_IS_INT(idx);\n\n  if (points.is_cuda()) {\n    CHECK_CUDA(idx);\n  }\n\n  at::Tensor output =\n      torch::zeros({points.size(0), points.size(1), idx.size(1), idx.size(2)},\n                   at::device(points.device()).dtype(at::ScalarType::Float));\n\n  if (points.is_cuda()) {\n    group_points_kernel_wrapper(points.size(0), points.size(1), points.size(2),\n                                idx.size(1), idx.size(2),\n                                points.data_ptr<float>(), idx.data_ptr<int>(),\n                                output.data_ptr<float>());\n  } else {\n    AT_ASSERT(false, \"CPU not supported\");\n  }\n\n  return output;\n}\n\nat::Tensor group_points_grad(at::Tensor grad_out, at::Tensor idx, const int n) {\n  CHECK_CONTIGUOUS(grad_out);\n  CHECK_CONTIGUOUS(idx);\n  CHECK_IS_FLOAT(grad_out);\n  CHECK_IS_INT(idx);\n\n  if (grad_out.is_cuda()) {\n    CHECK_CUDA(idx);\n  }\n\n  at::Tensor output =\n      torch::zeros({grad_out.size(0), grad_out.size(1), n},\n                   at::device(grad_out.device()).dtype(at::ScalarType::Float));\n\n  if (grad_out.is_cuda()) {\n    group_points_grad_kernel_wrapper(\n        grad_out.size(0), grad_out.size(1), n, idx.size(1), idx.size(2),\n        grad_out.data_ptr<float>(), idx.data_ptr<int>(),\n        output.data_ptr<float>());\n  } else {\n    AT_ASSERT(false, \"CPU not supported\");\n  }\n\n  return output;\n}\n"
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/src/group_points_gpu.cu",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n\n#include \"cuda_utils.h\"\n\n// input: points(b, c, n) idx(b, npoints, nsample)\n// output: out(b, c, npoints, nsample)\n__global__ void group_points_kernel(int b, int c, int n, int npoints,\n                                    int nsample,\n                                    const float *__restrict__ points,\n                                    const int *__restrict__ idx,\n                                    float *__restrict__ out) {\n  int batch_index = blockIdx.x;\n  points += batch_index * n * c;\n  idx += batch_index * npoints * nsample;\n  out += batch_index * npoints * nsample * c;\n\n  const int index = threadIdx.y * blockDim.x + threadIdx.x;\n  const int stride = blockDim.y * blockDim.x;\n  for (int i = index; i < c * npoints; i += stride) {\n    const int l = i / npoints;\n    const int j = i % npoints;\n    for (int k = 0; k < nsample; ++k) {\n      int ii = idx[j * nsample + k];\n      out[(l * npoints + j) * nsample + k] = points[l * n + ii];\n    }\n  }\n}\n\nvoid group_points_kernel_wrapper(int b, int c, int n, int npoints, int nsample,\n                                 const float *points, const int *idx,\n                                 float *out) {\n  cudaStream_t stream = at::cuda::getCurrentCUDAStream();\n\n  group_points_kernel<<<b, opt_block_config(npoints, c), 0, stream>>>(\n      b, c, n, npoints, nsample, points, idx, out);\n\n  CUDA_CHECK_ERRORS();\n}\n\n// input: grad_out(b, c, npoints, nsample), idx(b, npoints, nsample)\n// output: grad_points(b, c, n)\n__global__ void group_points_grad_kernel(int b, int c, int n, int npoints,\n                                         int nsample,\n                                         const float *__restrict__ grad_out,\n                                         const int *__restrict__ idx,\n                                         float *__restrict__ grad_points) {\n  int batch_index = blockIdx.x;\n  grad_out += batch_index * npoints * nsample * c;\n  idx += batch_index * npoints * nsample;\n  grad_points += batch_index * n * c;\n\n  const int index = threadIdx.y * blockDim.x + threadIdx.x;\n  const int stride = blockDim.y * blockDim.x;\n  for (int i = index; i < c * npoints; i += stride) {\n    const int l = i / npoints;\n    const int j = i % npoints;\n    for (int k = 0; k < nsample; ++k) {\n      int ii = idx[j * nsample + k];\n      atomicAdd(grad_points + l * n + ii,\n                grad_out[(l * npoints + j) * nsample + k]);\n    }\n  }\n}\n\nvoid group_points_grad_kernel_wrapper(int b, int c, int n, int npoints,\n                                      int nsample, const float *grad_out,\n                                      const int *idx, float *grad_points) {\n  cudaStream_t stream = at::cuda::getCurrentCUDAStream();\n\n  group_points_grad_kernel<<<b, opt_block_config(npoints, c), 0, stream>>>(\n      b, c, n, npoints, nsample, grad_out, idx, grad_points);\n\n  CUDA_CHECK_ERRORS();\n}\n"
  },
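  {
    "path": "modules/third_party/pointnet2/examples/group_points_reference.py",
    "content": "# Hypothetical pure-PyTorch reference (not part of the compiled extension):\n# group_points in group_points_gpu.cu is a batched gather,\n# out[b, c, j, k] = points[b, c, idx[b, j, k]]; the backward kernel is the\n# matching scatter-add of grad_out into grad_points via atomicAdd.\nimport torch\n\ndef group_points_reference(points: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:\n    # points: (B, C, N) float; idx: (B, npoints, nsample) int -> (B, C, npoints, nsample)\n    B, C, N = points.shape\n    _, npoints, nsample = idx.shape\n    flat = idx.reshape(B, 1, npoints * nsample).expand(-1, C, -1).long()\n    return points.gather(2, flat).reshape(B, C, npoints, nsample)\n"
  },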
  {
    "path": "modules/third_party/pointnet2/_ext_src/src/interpolate.cpp",
    "content": "#include \"interpolate.h\"\n#include \"utils.h\"\n\nvoid three_nn_kernel_wrapper(int b, int n, int m, const float *unknown,\n                             const float *known, float *dist2, int *idx);\nvoid three_interpolate_kernel_wrapper(int b, int c, int m, int n,\n                                      const float *points, const int *idx,\n                                      const float *weight, float *out);\nvoid three_interpolate_grad_kernel_wrapper(int b, int c, int n, int m,\n                                           const float *grad_out,\n                                           const int *idx, const float *weight,\n                                           float *grad_points);\n\nstd::vector<at::Tensor> three_nn(at::Tensor unknowns, at::Tensor knows) {\n  CHECK_CONTIGUOUS(unknowns);\n  CHECK_CONTIGUOUS(knows);\n  CHECK_IS_FLOAT(unknowns);\n  CHECK_IS_FLOAT(knows);\n\n  if (unknowns.is_cuda()) {\n    CHECK_CUDA(knows);\n  }\n\n  at::Tensor idx =\n      torch::zeros({unknowns.size(0), unknowns.size(1), 3},\n                   at::device(unknowns.device()).dtype(at::ScalarType::Int));\n  at::Tensor dist2 =\n      torch::zeros({unknowns.size(0), unknowns.size(1), 3},\n                   at::device(unknowns.device()).dtype(at::ScalarType::Float));\n\n  if (unknowns.is_cuda()) {\n    three_nn_kernel_wrapper(unknowns.size(0), unknowns.size(1), knows.size(1),\n                            unknowns.data_ptr<float>(), knows.data_ptr<float>(),\n                            dist2.data_ptr<float>(), idx.data_ptr<int>());\n  } else {\n    AT_ASSERT(false, \"CPU not supported\");\n  }\n\n  return {dist2, idx};\n}\n\nat::Tensor three_interpolate(at::Tensor points, at::Tensor idx,\n                             at::Tensor weight) {\n  CHECK_CONTIGUOUS(points);\n  CHECK_CONTIGUOUS(idx);\n  CHECK_CONTIGUOUS(weight);\n  CHECK_IS_FLOAT(points);\n  CHECK_IS_INT(idx);\n  CHECK_IS_FLOAT(weight);\n\n  if (points.is_cuda()) {\n    CHECK_CUDA(idx);\n    CHECK_CUDA(weight);\n  }\n\n  at::Tensor output =\n      torch::zeros({points.size(0), points.size(1), idx.size(1)},\n                   at::device(points.device()).dtype(at::ScalarType::Float));\n\n  if (points.is_cuda()) {\n    three_interpolate_kernel_wrapper(\n        points.size(0), points.size(1), points.size(2), idx.size(1),\n        points.data_ptr<float>(), idx.data_ptr<int>(), weight.data_ptr<float>(),\n        output.data_ptr<float>());\n  } else {\n    AT_ASSERT(false, \"CPU not supported\");\n  }\n\n  return output;\n}\nat::Tensor three_interpolate_grad(at::Tensor grad_out, at::Tensor idx,\n                                  at::Tensor weight, const int m) {\n  CHECK_CONTIGUOUS(grad_out);\n  CHECK_CONTIGUOUS(idx);\n  CHECK_CONTIGUOUS(weight);\n  CHECK_IS_FLOAT(grad_out);\n  CHECK_IS_INT(idx);\n  CHECK_IS_FLOAT(weight);\n\n  if (grad_out.is_cuda()) {\n    CHECK_CUDA(idx);\n    CHECK_CUDA(weight);\n  }\n\n  at::Tensor output =\n      torch::zeros({grad_out.size(0), grad_out.size(1), m},\n                   at::device(grad_out.device()).dtype(at::ScalarType::Float));\n\n  if (grad_out.is_cuda()) {\n    three_interpolate_grad_kernel_wrapper(\n        grad_out.size(0), grad_out.size(1), grad_out.size(2), m,\n        grad_out.data_ptr<float>(), idx.data_ptr<int>(),\n        weight.data_ptr<float>(), output.data_ptr<float>());\n  } else {\n    AT_ASSERT(false, \"CPU not supported\");\n  }\n\n  return output;\n}\n"
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/src/interpolate_gpu.cu",
    "content": "#include <math.h>\n#include <stdio.h>\n#include <stdlib.h>\n\n#include \"cuda_utils.h\"\n\n// input: unknown(b, n, 3) known(b, m, 3)\n// output: dist2(b, n, 3), idx(b, n, 3)\n__global__ void three_nn_kernel(int b, int n, int m,\n                                const float *__restrict__ unknown,\n                                const float *__restrict__ known,\n                                float *__restrict__ dist2,\n                                int *__restrict__ idx) {\n  int batch_index = blockIdx.x;\n  unknown += batch_index * n * 3;\n  known += batch_index * m * 3;\n  dist2 += batch_index * n * 3;\n  idx += batch_index * n * 3;\n\n  int index = threadIdx.x;\n  int stride = blockDim.x;\n  for (int j = index; j < n; j += stride) {\n    float ux = unknown[j * 3 + 0];\n    float uy = unknown[j * 3 + 1];\n    float uz = unknown[j * 3 + 2];\n\n    double best1 = 1e40, best2 = 1e40, best3 = 1e40;\n    int besti1 = 0, besti2 = 0, besti3 = 0;\n    for (int k = 0; k < m; ++k) {\n      float x = known[k * 3 + 0];\n      float y = known[k * 3 + 1];\n      float z = known[k * 3 + 2];\n      float d = (ux - x) * (ux - x) + (uy - y) * (uy - y) + (uz - z) * (uz - z);\n      if (d < best1) {\n        best3 = best2;\n        besti3 = besti2;\n        best2 = best1;\n        besti2 = besti1;\n        best1 = d;\n        besti1 = k;\n      } else if (d < best2) {\n        best3 = best2;\n        besti3 = besti2;\n        best2 = d;\n        besti2 = k;\n      } else if (d < best3) {\n        best3 = d;\n        besti3 = k;\n      }\n    }\n    dist2[j * 3 + 0] = best1;\n    dist2[j * 3 + 1] = best2;\n    dist2[j * 3 + 2] = best3;\n\n    idx[j * 3 + 0] = besti1;\n    idx[j * 3 + 1] = besti2;\n    idx[j * 3 + 2] = besti3;\n  }\n}\n\nvoid three_nn_kernel_wrapper(int b, int n, int m, const float *unknown,\n                             const float *known, float *dist2, int *idx) {\n  cudaStream_t stream = at::cuda::getCurrentCUDAStream();\n  three_nn_kernel<<<b, opt_n_threads(n), 0, stream>>>(b, n, m, unknown, known,\n                                                      dist2, idx);\n\n  CUDA_CHECK_ERRORS();\n}\n\n// input: points(b, c, m), idx(b, n, 3), weight(b, n, 3)\n// output: out(b, c, n)\n__global__ void three_interpolate_kernel(int b, int c, int m, int n,\n                                         const float *__restrict__ points,\n                                         const int *__restrict__ idx,\n                                         const float *__restrict__ weight,\n                                         float *__restrict__ out) {\n  int batch_index = blockIdx.x;\n  points += batch_index * m * c;\n\n  idx += batch_index * n * 3;\n  weight += batch_index * n * 3;\n\n  out += batch_index * n * c;\n\n  const int index = threadIdx.y * blockDim.x + threadIdx.x;\n  const int stride = blockDim.y * blockDim.x;\n  for (int i = index; i < c * n; i += stride) {\n    const int l = i / n;\n    const int j = i % n;\n    float w1 = weight[j * 3 + 0];\n    float w2 = weight[j * 3 + 1];\n    float w3 = weight[j * 3 + 2];\n\n    int i1 = idx[j * 3 + 0];\n    int i2 = idx[j * 3 + 1];\n    int i3 = idx[j * 3 + 2];\n\n    out[i] = points[l * m + i1] * w1 + points[l * m + i2] * w2 +\n             points[l * m + i3] * w3;\n  }\n}\n\nvoid three_interpolate_kernel_wrapper(int b, int c, int m, int n,\n                                      const float *points, const int *idx,\n                                      const float *weight, float *out) {\n  cudaStream_t stream = 
at::cuda::getCurrentCUDAStream();\n  three_interpolate_kernel<<<b, opt_block_config(n, c), 0, stream>>>(\n      b, c, m, n, points, idx, weight, out);\n\n  CUDA_CHECK_ERRORS();\n}\n\n// input: grad_out(b, c, n), idx(b, n, 3), weight(b, n, 3)\n// output: grad_points(b, c, m)\n\n__global__ void three_interpolate_grad_kernel(\n    int b, int c, int n, int m, const float *__restrict__ grad_out,\n    const int *__restrict__ idx, const float *__restrict__ weight,\n    float *__restrict__ grad_points) {\n  int batch_index = blockIdx.x;\n  grad_out += batch_index * n * c;\n  idx += batch_index * n * 3;\n  weight += batch_index * n * 3;\n  grad_points += batch_index * m * c;\n\n  const int index = threadIdx.y * blockDim.x + threadIdx.x;\n  const int stride = blockDim.y * blockDim.x;\n  for (int i = index; i < c * n; i += stride) {\n    const int l = i / n;\n    const int j = i % n;\n    float w1 = weight[j * 3 + 0];\n    float w2 = weight[j * 3 + 1];\n    float w3 = weight[j * 3 + 2];\n\n    int i1 = idx[j * 3 + 0];\n    int i2 = idx[j * 3 + 1];\n    int i3 = idx[j * 3 + 2];\n\n    atomicAdd(grad_points + l * m + i1, grad_out[i] * w1);\n    atomicAdd(grad_points + l * m + i2, grad_out[i] * w2);\n    atomicAdd(grad_points + l * m + i3, grad_out[i] * w3);\n  }\n}\n\nvoid three_interpolate_grad_kernel_wrapper(int b, int c, int n, int m,\n                                           const float *grad_out,\n                                           const int *idx, const float *weight,\n                                           float *grad_points) {\n  cudaStream_t stream = at::cuda::getCurrentCUDAStream();\n  three_interpolate_grad_kernel<<<b, opt_block_config(n, c), 0, stream>>>(\n      b, c, n, m, grad_out, idx, weight, grad_points);\n\n  CUDA_CHECK_ERRORS();\n}\n"
  },
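  {
    "path": "modules/third_party/pointnet2/examples/interpolate_reference.py",
    "content": "# Hypothetical pure-PyTorch reference (not part of the compiled extension) for\n# the two kernels in interpolate_gpu.cu: three_nn finds the three nearest\n# known points per unknown point (sorted ascending by squared distance, as in\n# the kernel), and three_interpolate takes the weighted sum\n# out[b, c, j] = sum_k points[b, c, idx[b, j, k]] * weight[b, j, k].\nimport torch\n\ndef three_nn_reference(unknown: torch.Tensor, known: torch.Tensor):\n    # unknown: (B, n, 3); known: (B, m, 3) -> dist2, idx each of shape (B, n, 3)\n    d2 = torch.cdist(unknown, known).pow(2)  # (B, n, m)\n    dist2, idx = torch.topk(d2, k=3, dim=2, largest=False)\n    return dist2, idx\n\ndef three_interpolate_reference(points: torch.Tensor, idx: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:\n    # points: (B, C, m); idx, weight: (B, n, 3) -> (B, C, n)\n    B, C, m = points.shape\n    n = idx.shape[1]\n    flat = idx.reshape(B, 1, n * 3).expand(-1, C, -1).long()\n    gathered = points.gather(2, flat).reshape(B, C, n, 3)\n    return (gathered * weight.unsqueeze(1)).sum(-1)\n"
  },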
  {
    "path": "modules/third_party/pointnet2/_ext_src/src/sampling.cpp",
    "content": "#include \"sampling.h\"\n#include \"utils.h\"\n\nvoid gather_points_kernel_wrapper(int b, int c, int n, int npoints,\n                                  const float *points, const int *idx,\n                                  float *out);\nvoid gather_points_grad_kernel_wrapper(int b, int c, int n, int npoints,\n                                       const float *grad_out, const int *idx,\n                                       float *grad_points);\n\nvoid furthest_point_sampling_kernel_wrapper(int b, int n, int m,\n                                            const float *dataset, float *temp,\n                                            int *idxs);\n\nat::Tensor gather_points(at::Tensor points, at::Tensor idx) {\n  CHECK_CONTIGUOUS(points);\n  CHECK_CONTIGUOUS(idx);\n  CHECK_IS_FLOAT(points);\n  CHECK_IS_INT(idx);\n\n  if (points.is_cuda()) {\n    CHECK_CUDA(idx);\n  }\n\n  at::Tensor output =\n      torch::zeros({points.size(0), points.size(1), idx.size(1)},\n                   at::device(points.device()).dtype(at::ScalarType::Float));\n\n  if (points.is_cuda()) {\n    gather_points_kernel_wrapper(points.size(0), points.size(1), points.size(2),\n                                 idx.size(1), points.data_ptr<float>(),\n                                 idx.data_ptr<int>(), output.data_ptr<float>());\n  } else {\n    AT_ASSERT(false, \"CPU not supported\");\n  }\n\n  return output;\n}\n\nat::Tensor gather_points_grad(at::Tensor grad_out, at::Tensor idx,\n                              const int n) {\n  CHECK_CONTIGUOUS(grad_out);\n  CHECK_CONTIGUOUS(idx);\n  CHECK_IS_FLOAT(grad_out);\n  CHECK_IS_INT(idx);\n\n  if (grad_out.is_cuda()) {\n    CHECK_CUDA(idx);\n  }\n\n  at::Tensor output =\n      torch::zeros({grad_out.size(0), grad_out.size(1), n},\n                   at::device(grad_out.device()).dtype(at::ScalarType::Float));\n\n  if (grad_out.is_cuda()) {\n    gather_points_grad_kernel_wrapper(grad_out.size(0), grad_out.size(1), n,\n                                      idx.size(1), grad_out.data_ptr<float>(),\n                                      idx.data_ptr<int>(),\n                                      output.data_ptr<float>());\n  } else {\n    AT_ASSERT(false, \"CPU not supported\");\n  }\n\n  return output;\n}\nat::Tensor furthest_point_sampling(at::Tensor points, const int nsamples) {\n  CHECK_CONTIGUOUS(points);\n  CHECK_IS_FLOAT(points);\n\n  at::Tensor output =\n      torch::zeros({points.size(0), nsamples},\n                   at::device(points.device()).dtype(at::ScalarType::Int));\n\n  at::Tensor tmp =\n      torch::full({points.size(0), points.size(1)}, 1e10,\n                  at::device(points.device()).dtype(at::ScalarType::Float));\n\n  if (points.is_cuda()) {\n    furthest_point_sampling_kernel_wrapper(\n        points.size(0), points.size(1), nsamples, points.data_ptr<float>(),\n        tmp.data_ptr<float>(), output.data_ptr<int>());\n  } else {\n    AT_ASSERT(false, \"CPU not supported\");\n  }\n\n  return output;\n}\n"
  },
  {
    "path": "modules/third_party/pointnet2/_ext_src/src/sampling_gpu.cu",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n\n#include \"cuda_utils.h\"\n\n// input: points(b, c, n) idx(b, m)\n// output: out(b, c, m)\n__global__ void gather_points_kernel(int b, int c, int n, int m,\n                                     const float *__restrict__ points,\n                                     const int *__restrict__ idx,\n                                     float *__restrict__ out) {\n  for (int i = blockIdx.x; i < b; i += gridDim.x) {\n    for (int l = blockIdx.y; l < c; l += gridDim.y) {\n      for (int j = threadIdx.x; j < m; j += blockDim.x) {\n        int a = idx[i * m + j];\n        out[(i * c + l) * m + j] = points[(i * c + l) * n + a];\n      }\n    }\n  }\n}\n\nvoid gather_points_kernel_wrapper(int b, int c, int n, int npoints,\n                                  const float *points, const int *idx,\n                                  float *out) {\n  gather_points_kernel<<<dim3(b, c, 1), opt_n_threads(npoints), 0,\n                         at::cuda::getCurrentCUDAStream()>>>(b, c, n, npoints,\n                                                             points, idx, out);\n\n  CUDA_CHECK_ERRORS();\n}\n\n// input: grad_out(b, c, m) idx(b, m)\n// output: grad_points(b, c, n)\n__global__ void gather_points_grad_kernel(int b, int c, int n, int m,\n                                          const float *__restrict__ grad_out,\n                                          const int *__restrict__ idx,\n                                          float *__restrict__ grad_points) {\n  for (int i = blockIdx.x; i < b; i += gridDim.x) {\n    for (int l = blockIdx.y; l < c; l += gridDim.y) {\n      for (int j = threadIdx.x; j < m; j += blockDim.x) {\n        int a = idx[i * m + j];\n        atomicAdd(grad_points + (i * c + l) * n + a,\n                  grad_out[(i * c + l) * m + j]);\n      }\n    }\n  }\n}\n\nvoid gather_points_grad_kernel_wrapper(int b, int c, int n, int npoints,\n                                       const float *grad_out, const int *idx,\n                                       float *grad_points) {\n  gather_points_grad_kernel<<<dim3(b, c, 1), opt_n_threads(npoints), 0,\n                              at::cuda::getCurrentCUDAStream()>>>(\n      b, c, n, npoints, grad_out, idx, grad_points);\n\n  CUDA_CHECK_ERRORS();\n}\n\n__device__ void __update(float *__restrict__ dists, int *__restrict__ dists_i,\n                         int idx1, int idx2) {\n  const float v1 = dists[idx1], v2 = dists[idx2];\n  const int i1 = dists_i[idx1], i2 = dists_i[idx2];\n  dists[idx1] = max(v1, v2);\n  dists_i[idx1] = v2 > v1 ? 
i2 : i1;\n}\n\n// Input dataset: (b, n, 3), tmp: (b, n)\n// Output idxs (b, m)\ntemplate <unsigned int block_size>\n__global__ void furthest_point_sampling_kernel(\n    int b, int n, int m, const float *__restrict__ dataset,\n    float *__restrict__ temp, int *__restrict__ idxs) {\n  if (m <= 0) return;\n  __shared__ float dists[block_size];\n  __shared__ int dists_i[block_size];\n\n  int batch_index = blockIdx.x;\n  dataset += batch_index * n * 3;\n  temp += batch_index * n;\n  idxs += batch_index * m;\n\n  int tid = threadIdx.x;\n  const int stride = block_size;\n\n  int old = 0;\n  if (threadIdx.x == 0) idxs[0] = old;\n\n  __syncthreads();\n  for (int j = 1; j < m; j++) {\n    int besti = 0;\n    float best = -1;\n    float x1 = dataset[old * 3 + 0];\n    float y1 = dataset[old * 3 + 1];\n    float z1 = dataset[old * 3 + 2];\n    for (int k = tid; k < n; k += stride) {\n      float x2, y2, z2;\n      x2 = dataset[k * 3 + 0];\n      y2 = dataset[k * 3 + 1];\n      z2 = dataset[k * 3 + 2];\n      float mag = (x2 * x2) + (y2 * y2) + (z2 * z2);\n      if (mag <= 1e-3) continue;\n\n      float d =\n          (x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1) + (z2 - z1) * (z2 - z1);\n\n      float d2 = min(d, temp[k]);\n      temp[k] = d2;\n      besti = d2 > best ? k : besti;\n      best = d2 > best ? d2 : best;\n    }\n    dists[tid] = best;\n    dists_i[tid] = besti;\n    __syncthreads();\n\n    if (block_size >= 512) {\n      if (tid < 256) {\n        __update(dists, dists_i, tid, tid + 256);\n      }\n      __syncthreads();\n    }\n    if (block_size >= 256) {\n      if (tid < 128) {\n        __update(dists, dists_i, tid, tid + 128);\n      }\n      __syncthreads();\n    }\n    if (block_size >= 128) {\n      if (tid < 64) {\n        __update(dists, dists_i, tid, tid + 64);\n      }\n      __syncthreads();\n    }\n    if (block_size >= 64) {\n      if (tid < 32) {\n        __update(dists, dists_i, tid, tid + 32);\n      }\n      __syncthreads();\n    }\n    if (block_size >= 32) {\n      if (tid < 16) {\n        __update(dists, dists_i, tid, tid + 16);\n      }\n      __syncthreads();\n    }\n    if (block_size >= 16) {\n      if (tid < 8) {\n        __update(dists, dists_i, tid, tid + 8);\n      }\n      __syncthreads();\n    }\n    if (block_size >= 8) {\n      if (tid < 4) {\n        __update(dists, dists_i, tid, tid + 4);\n      }\n      __syncthreads();\n    }\n    if (block_size >= 4) {\n      if (tid < 2) {\n        __update(dists, dists_i, tid, tid + 2);\n      }\n      __syncthreads();\n    }\n    if (block_size >= 2) {\n      if (tid < 1) {\n        __update(dists, dists_i, tid, tid + 1);\n      }\n      __syncthreads();\n    }\n\n    old = dists_i[0];\n    if (tid == 0) idxs[j] = old;\n  }\n}\n\nvoid furthest_point_sampling_kernel_wrapper(int b, int n, int m,\n                                            const float *dataset, float *temp,\n                                            int *idxs) {\n  unsigned int n_threads = opt_n_threads(n);\n\n  cudaStream_t stream = at::cuda::getCurrentCUDAStream();\n\n  switch (n_threads) {\n    case 512:\n      furthest_point_sampling_kernel<512>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n      break;\n    case 256:\n      furthest_point_sampling_kernel<256>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n      break;\n    case 128:\n      furthest_point_sampling_kernel<128>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n      break;\n    case 64:\n      
furthest_point_sampling_kernel<64>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n      break;\n    case 32:\n      furthest_point_sampling_kernel<32>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n      break;\n    case 16:\n      furthest_point_sampling_kernel<16>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n      break;\n    case 8:\n      furthest_point_sampling_kernel<8>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n      break;\n    case 4:\n      furthest_point_sampling_kernel<4>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n      break;\n    case 2:\n      furthest_point_sampling_kernel<2>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n      break;\n    case 1:\n      furthest_point_sampling_kernel<1>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n      break;\n    default:\n      furthest_point_sampling_kernel<512>\n          <<<b, n_threads, 0, stream>>>(b, n, m, dataset, temp, idxs);\n  }\n\n  CUDA_CHECK_ERRORS();\n}\n"
  },
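  {
    "path": "modules/third_party/pointnet2/examples/fps_reference.py",
    "content": "# Hypothetical pure-PyTorch reference (not part of the compiled extension):\n# mirrors furthest_point_sampling_kernel in sampling_gpu.cu -- start from\n# point 0, then greedily pick the point whose minimum distance to the\n# already-selected set is largest. (The CUDA kernel additionally skips points\n# with squared norm <= 1e-3; that quirk is omitted here.)\nimport torch\n\ndef furthest_point_sampling_reference(xyz: torch.Tensor, npoint: int) -> torch.Tensor:\n    # xyz: (B, N, 3) -> idx: (B, npoint)\n    B, N, _ = xyz.shape\n    idx = torch.zeros(B, npoint, dtype=torch.int64)\n    dist = torch.full((B, N), 1e10)           # matches the temp buffer init\n    farthest = torch.zeros(B, dtype=torch.int64)\n    for j in range(npoint):\n        idx[:, j] = farthest\n        centroid = xyz[torch.arange(B), farthest].unsqueeze(1)  # (B, 1, 3)\n        dist = torch.minimum(dist, ((xyz - centroid) ** 2).sum(-1))\n        farthest = dist.argmax(dim=1)\n    return idx\n"
  },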
  {
    "path": "modules/third_party/pointnet2/_version.py",
    "content": "__version__ = \"3.0.0\"\n"
  },
  {
    "path": "modules/third_party/pointnet2/pointnet2_modules.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# \n# This source code is licensed under the MIT license found in the\n# LICENSE file in the root directory of this source tree.\n\n''' Pointnet2 layers.\nModified based on: https://github.com/erikwijmans/Pointnet2_PyTorch\nExtended with the following:\n1. Uniform sampling in each local region (sample_uniformly)\n2. Return sampled points indices to support votenet.\n'''\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nimport os\nimport sys\nBASE_DIR = os.path.dirname(os.path.abspath(__file__))\nsys.path.append(BASE_DIR)\n\nimport pointnet2_utils\nimport pytorch_utils as pt_utils\nfrom typing import List\n\n\nclass _PointnetSAModuleBase(nn.Module):\n\n    def __init__(self):\n        super().__init__()\n        self.npoint = None\n        self.groupers = None\n        self.mlps = None\n\n    def forward(self, xyz: torch.Tensor,\n                features: torch.Tensor = None) -> (torch.Tensor, torch.Tensor):\n        r\"\"\"\n        Parameters\n        ----------\n        xyz : torch.Tensor\n            (B, N, 3) tensor of the xyz coordinates of the features\n        features : torch.Tensor\n            (B, N, C) tensor of the descriptors of the the features\n\n        Returns\n        -------\n        new_xyz : torch.Tensor\n            (B, npoint, 3) tensor of the new features' xyz\n        new_features : torch.Tensor\n            (B, npoint, \\sum_k(mlps[k][-1])) tensor of the new_features descriptors\n        \"\"\"\n\n        new_features_list = []\n\n        xyz_flipped = xyz.transpose(1, 2).contiguous()\n        new_xyz = pointnet2_utils.gather_operation(\n            xyz_flipped,\n            pointnet2_utils.furthest_point_sample(xyz, self.npoint)\n        ).transpose(1, 2).contiguous() if self.npoint is not None else None\n\n        for i in range(len(self.groupers)):\n            new_features = self.groupers[i](\n                xyz, new_xyz, features\n            )  # (B, C, npoint, nsample)\n\n            new_features = self.mlps[i](\n                new_features\n            )  # (B, mlp[-1], npoint, nsample)\n            new_features = F.max_pool2d(\n                new_features, kernel_size=[1, new_features.size(3)]\n            )  # (B, mlp[-1], npoint, 1)\n            new_features = new_features.squeeze(-1)  # (B, mlp[-1], npoint)\n\n            new_features_list.append(new_features)\n\n        return new_xyz, torch.cat(new_features_list, dim=1)\n\n\nclass PointnetSAModuleMSG(_PointnetSAModuleBase):\n    r\"\"\"Pointnet set abstrction layer with multiscale grouping\n\n    Parameters\n    ----------\n    npoint : int\n        Number of features\n    radii : list of float32\n        list of radii to group with\n    nsamples : list of int32\n        Number of samples in each ball query\n    mlps : list of list of int32\n        Spec of the pointnet before the global max_pool for each scale\n    bn : bool\n        Use batchnorm\n    \"\"\"\n\n    def __init__(\n            self,\n            *,\n            npoint: int,\n            radii: List[float],\n            nsamples: List[int],\n            mlps: List[List[int]],\n            bn: bool = True,\n            use_xyz: bool = True, \n            sample_uniformly: bool = False\n    ):\n        super().__init__()\n\n        assert len(radii) == len(nsamples) == len(mlps)\n\n        self.npoint = npoint\n        self.groupers = nn.ModuleList()\n        self.mlps = nn.ModuleList()\n        for i in range(len(radii)):\n            radius = 
radii[i]\n            nsample = nsamples[i]\n            self.groupers.append(\n                pointnet2_utils.QueryAndGroup(radius, nsample, use_xyz=use_xyz, sample_uniformly=sample_uniformly)\n                if npoint is not None else pointnet2_utils.GroupAll(use_xyz)\n            )\n            mlp_spec = mlps[i]\n            if use_xyz:\n                mlp_spec[0] += 3\n\n            self.mlps.append(pt_utils.SharedMLP(mlp_spec, bn=bn))\n\n\nclass PointnetSAModule(PointnetSAModuleMSG):\n    r\"\"\"Pointnet set abstraction layer\n\n    Parameters\n    ----------\n    npoint : int\n        Number of features\n    radius : float\n        Radius of ball\n    nsample : int\n        Number of samples in the ball query\n    mlp : list\n        Spec of the pointnet before the global max_pool\n    bn : bool\n        Use batchnorm\n    \"\"\"\n\n    def __init__(\n            self,\n            *,\n            mlp: List[int],\n            npoint: int = None,\n            radius: float = None,\n            nsample: int = None,\n            bn: bool = True,\n            use_xyz: bool = True\n    ):\n        super().__init__(\n            mlps=[mlp],\n            npoint=npoint,\n            radii=[radius],\n            nsamples=[nsample],\n            bn=bn,\n            use_xyz=use_xyz\n        )\n\n\nclass PointnetSAModuleVotes(nn.Module):\n    ''' Modified based on _PointnetSAModuleBase and PointnetSAModuleMSG\n    with extra support for returning point indices for getting their GT votes '''\n\n    def __init__(\n            self,\n            *,\n            mlp: List[int],\n            npoint: int = None,\n            radius: float = None,\n            nsample: int = None,\n            bn: bool = True,\n            use_xyz: bool = True,\n            pooling: str = 'max',\n            sigma: float = None, # for RBF pooling\n            normalize_xyz: bool = False, # normalize local XYZ with radius\n            sample_uniformly: bool = False,\n            ret_unique_cnt: bool = False\n    ):\n        super().__init__()\n\n        self.npoint = npoint\n        self.radius = radius\n        self.nsample = nsample\n        self.pooling = pooling\n        self.mlp_module = None\n        self.use_xyz = use_xyz\n        self.sigma = sigma\n        if self.sigma is None:\n            self.sigma = self.radius/2\n        self.normalize_xyz = normalize_xyz\n        self.ret_unique_cnt = ret_unique_cnt\n\n        if npoint is not None:\n            self.grouper = pointnet2_utils.QueryAndGroup(radius, nsample,\n                                                         use_xyz=use_xyz, ret_grouped_xyz=True, normalize_xyz=normalize_xyz,\n                                                         sample_uniformly=sample_uniformly, ret_unique_cnt=ret_unique_cnt)\n        else:\n            self.grouper = pointnet2_utils.GroupAll(use_xyz, ret_grouped_xyz=True)\n\n        mlp_spec = mlp\n        if use_xyz and len(mlp_spec)>0:\n            mlp_spec[0] += 3\n        self.mlp_module = pt_utils.SharedMLP(mlp_spec, bn=bn)\n\n\n    def forward(self, xyz: torch.Tensor,\n                features: torch.Tensor = None,\n                inds: torch.Tensor = None) -> (torch.Tensor, torch.Tensor):\n        r\"\"\"\n        Parameters\n        ----------\n        xyz : torch.Tensor\n            (B, N, 3) tensor of the xyz coordinates of the features\n        features : torch.Tensor\n            (B, C, N) tensor of the descriptors of the features\n        inds : torch.Tensor\n            (B, npoint) tensor that stores index 
to the xyz points (values in 0-N-1)\n\n        Returns\n        -------\n        new_xyz : torch.Tensor\n            (B, npoint, 3) tensor of the new features' xyz\n        new_features : torch.Tensor\n            (B, \\sum_k(mlps[k][-1]), npoint) tensor of the new_features descriptors\n        inds: torch.Tensor\n            (B, npoint) tensor of the inds\n        \"\"\"\n\n        xyz_flipped = xyz.transpose(1, 2).contiguous()\n        if inds is None:\n            inds = pointnet2_utils.furthest_point_sample(xyz, self.npoint)\n        else:\n            assert(inds.shape[1] == self.npoint)\n        new_xyz = pointnet2_utils.gather_operation(\n            xyz_flipped, inds\n        ).transpose(1, 2).contiguous() if self.npoint is not None else None\n\n        if not self.ret_unique_cnt:\n            grouped_features, grouped_xyz = self.grouper(\n                xyz, new_xyz, features\n            )  # (B, C, npoint, nsample)\n        else:\n            grouped_features, grouped_xyz, unique_cnt = self.grouper(\n                xyz, new_xyz, features\n            )  # (B, C, npoint, nsample), (B,3,npoint,nsample), (B,npoint)\n\n        new_features = self.mlp_module(\n            grouped_features\n        )  # (B, mlp[-1], npoint, nsample)\n        if self.pooling == 'max':\n            new_features = F.max_pool2d(\n                new_features, kernel_size=[1, new_features.size(3)]\n            )  # (B, mlp[-1], npoint, 1)\n        elif self.pooling == 'avg':\n            new_features = F.avg_pool2d(\n                new_features, kernel_size=[1, new_features.size(3)]\n            )  # (B, mlp[-1], npoint, 1)\n        elif self.pooling == 'rbf': \n            # Use radial basis function kernel for weighted sum of features (normalized by nsample and sigma)\n            # Ref: https://en.wikipedia.org/wiki/Radial_basis_function_kernel\n            rbf = torch.exp(-1 * grouped_xyz.pow(2).sum(1,keepdim=False) / (self.sigma**2) / 2) # (B, npoint, nsample)\n            new_features = torch.sum(new_features * rbf.unsqueeze(1), -1, keepdim=True) / float(self.nsample) # (B, mlp[-1], npoint, 1)\n        new_features = new_features.squeeze(-1)  # (B, mlp[-1], npoint)\n\n        if not self.ret_unique_cnt:\n            return new_xyz, new_features, inds\n        else:\n            return new_xyz, new_features, inds, unique_cnt\n\nclass PointnetSAModuleMSGVotes(nn.Module):\n    ''' Modified based on _PointnetSAModuleBase and PointnetSAModuleMSG\n    with extra support for returning point indices for getting their GT votes '''\n\n    def __init__(\n            self,\n            *,\n            mlps: List[List[int]],\n            npoint: int,\n            radii: List[float],\n            nsamples: List[int],\n            bn: bool = True,\n            use_xyz: bool = True,\n            sample_uniformly: bool = False\n    ):\n        super().__init__()\n\n        assert(len(mlps) == len(nsamples) == len(radii))\n\n        self.npoint = npoint\n        self.groupers = nn.ModuleList()\n        self.mlps = nn.ModuleList()\n        for i in range(len(radii)):\n            radius = radii[i]\n            nsample = nsamples[i]\n            self.groupers.append(\n                pointnet2_utils.QueryAndGroup(radius, nsample, use_xyz=use_xyz, sample_uniformly=sample_uniformly)\n                if npoint is not None else pointnet2_utils.GroupAll(use_xyz)\n            )\n            mlp_spec = mlps[i]\n            if use_xyz:\n                mlp_spec[0] += 3\n\n            
self.mlps.append(pt_utils.SharedMLP(mlp_spec, bn=bn))\n\n    def forward(self, xyz: torch.Tensor,\n                features: torch.Tensor = None, inds: torch.Tensor = None) -> (torch.Tensor, torch.Tensor):\n        r\"\"\"\n        Parameters\n        ----------\n        xyz : torch.Tensor\n            (B, N, 3) tensor of the xyz coordinates of the features\n        features : torch.Tensor\n            (B, C, N) tensor of the descriptors of the features\n        inds : torch.Tensor\n            (B, npoint) tensor that stores index to the xyz points (values in 0-N-1)\n\n        Returns\n        -------\n        new_xyz : torch.Tensor\n            (B, npoint, 3) tensor of the new features' xyz\n        new_features : torch.Tensor\n            (B, \\sum_k(mlps[k][-1]), npoint) tensor of the new_features descriptors\n        inds: torch.Tensor\n            (B, npoint) tensor of the inds\n        \"\"\"\n        new_features_list = []\n\n        xyz_flipped = xyz.transpose(1, 2).contiguous()\n        if inds is None:\n            inds = pointnet2_utils.furthest_point_sample(xyz, self.npoint)\n        new_xyz = pointnet2_utils.gather_operation(\n            xyz_flipped, inds\n        ).transpose(1, 2).contiguous() if self.npoint is not None else None\n\n        for i in range(len(self.groupers)):\n            new_features = self.groupers[i](\n                xyz, new_xyz, features\n            )  # (B, C, npoint, nsample)\n            new_features = self.mlps[i](\n                new_features\n            )  # (B, mlp[-1], npoint, nsample)\n            new_features = F.max_pool2d(\n                new_features, kernel_size=[1, new_features.size(3)]\n            )  # (B, mlp[-1], npoint, 1)\n            new_features = new_features.squeeze(-1)  # (B, mlp[-1], npoint)\n\n            new_features_list.append(new_features)\n\n        return new_xyz, torch.cat(new_features_list, dim=1), inds\n\n\nclass PointnetFPModule(nn.Module):\n    r\"\"\"Propagates the features of one set to another\n\n    Parameters\n    ----------\n    mlp : list\n        Pointnet module parameters\n    bn : bool\n        Use batchnorm\n    \"\"\"\n\n    def __init__(self, *, mlp: List[int], bn: bool = True):\n        super().__init__()\n        self.mlp = pt_utils.SharedMLP(mlp, bn=bn)\n\n    def forward(\n            self, unknown: torch.Tensor, known: torch.Tensor,\n            unknow_feats: torch.Tensor, known_feats: torch.Tensor\n    ) -> torch.Tensor:\n        r\"\"\"\n        Parameters\n        ----------\n        unknown : torch.Tensor\n            (B, n, 3) tensor of the xyz positions of the unknown features\n        known : torch.Tensor\n            (B, m, 3) tensor of the xyz positions of the known features\n        unknow_feats : torch.Tensor\n            (B, C1, n) tensor of the features to be propagated to\n        known_feats : torch.Tensor\n            (B, C2, m) tensor of features to be propagated\n\n        Returns\n        -------\n        new_features : torch.Tensor\n            (B, mlp[-1], n) tensor of the features of the unknown features\n        \"\"\"\n\n        if known is not None:\n            dist, idx = pointnet2_utils.three_nn(unknown, known)\n            dist_recip = 1.0 / (dist + 1e-8)\n            norm = torch.sum(dist_recip, dim=2, keepdim=True)\n            weight = dist_recip / norm\n\n            interpolated_feats = pointnet2_utils.three_interpolate(\n                known_feats, idx, weight\n            )\n        else:\n            interpolated_feats = known_feats.expand(\n          
      *known_feats.size()[0:2], unknown.size(1)\n            )\n\n        if unknow_feats is not None:\n            new_features = torch.cat([interpolated_feats, unknow_feats],\n                                   dim=1)  #(B, C2 + C1, n)\n        else:\n            new_features = interpolated_feats\n\n        new_features = new_features.unsqueeze(-1)\n        new_features = self.mlp(new_features)\n\n        return new_features.squeeze(-1)\n\nclass PointnetLFPModuleMSG(nn.Module):\n    ''' Modified based on _PointnetSAModuleBase and PointnetSAModuleMSG\n    learnable feature propagation layer.'''\n\n    def __init__(\n            self,\n            *,\n            mlps: List[List[int]],\n            radii: List[float],\n            nsamples: List[int],\n            post_mlp: List[int],\n            bn: bool = True,\n            use_xyz: bool = True,\n            sample_uniformly: bool = False\n    ):\n        super().__init__()\n\n        assert(len(mlps) == len(nsamples) == len(radii))\n        \n        self.post_mlp = pt_utils.SharedMLP(post_mlp, bn=bn)\n\n        self.groupers = nn.ModuleList()\n        self.mlps = nn.ModuleList()\n        for i in range(len(radii)):\n            radius = radii[i]\n            nsample = nsamples[i]\n            self.groupers.append(\n                pointnet2_utils.QueryAndGroup(radius, nsample, use_xyz=use_xyz,\n                                              sample_uniformly=sample_uniformly)\n            )\n            mlp_spec = mlps[i]\n            if use_xyz:\n                mlp_spec[0] += 3\n\n            self.mlps.append(pt_utils.SharedMLP(mlp_spec, bn=bn))\n\n    def forward(self, xyz2: torch.Tensor, xyz1: torch.Tensor,\n                features2: torch.Tensor, features1: torch.Tensor) -> torch.Tensor:\n        r\"\"\" Propagate features from xyz1 to xyz2.\n        Parameters\n        ----------\n        xyz2 : torch.Tensor\n            (B, N2, 3) tensor of the xyz coordinates of the features\n        xyz1 : torch.Tensor\n            (B, N1, 3) tensor of the xyz coordinates of the features\n        features2 : torch.Tensor\n            (B, C2, N2) tensor of the descriptors of the features\n        features1 : torch.Tensor\n            (B, C1, N1) tensor of the descriptors of the features\n\n        Returns\n        -------\n        new_features : torch.Tensor\n            (B, len(mlps) * post_mlp[-1], N2) tensor of the propagated descriptors at xyz2\n        \"\"\"\n        new_features_list = []\n\n        for i in range(len(self.groupers)):\n            new_features = self.groupers[i](\n                xyz1, xyz2, features1\n            )  # (B, C1, N2, nsample)\n            new_features = self.mlps[i](\n                new_features\n            )  # (B, mlp[-1], N2, nsample)\n            new_features = F.max_pool2d(\n                new_features, kernel_size=[1, new_features.size(3)]\n            )  # (B, mlp[-1], N2, 1)\n            new_features = new_features.squeeze(-1)  # (B, mlp[-1], N2)\n\n            if features2 is not None:\n                new_features = torch.cat([new_features, features2],\n                                           dim=1)  #(B, mlp[-1] + C2, N2)\n\n            new_features = new_features.unsqueeze(-1)\n            new_features = self.post_mlp(new_features)\n\n            new_features_list.append(new_features)\n\n        return torch.cat(new_features_list, dim=1).squeeze(-1)\n\n\nif __name__ == \"__main__\":\n    from torch.autograd import Variable\n    torch.manual_seed(1)\n    torch.cuda.manual_seed_all(1)\n    
\n    xyz = Variable(torch.randn(2, 9, 3).cuda(), requires_grad=True)\n    # features are channel-first (B, C, N): 6-dim descriptors for the 9 points\n    xyz_feats = Variable(torch.randn(2, 6, 9).cuda(), requires_grad=True)\n\n    test_module = PointnetSAModuleMSG(\n        npoint=2, radii=[5.0, 10.0], nsamples=[6, 3], mlps=[[6, 3], [6, 6]]\n    )\n    test_module.cuda()\n    print(test_module(xyz, xyz_feats))\n\n    for _ in range(1):\n        _, new_features, _ = test_module(xyz, xyz_feats)\n        new_features.backward(\n            torch.cuda.FloatTensor(*new_features.size()).fill_(1)\n        )\n        print(new_features)\n        print(xyz.grad)\n"
  },
  {
    "path": "modules/third_party/pointnet2/pointnet2_test.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# \n# This source code is licensed under the MIT license found in the\n# LICENSE file in the root directory of this source tree.\n\n''' Testing customized ops. '''\n\nimport torch\nfrom torch.autograd import gradcheck\nimport numpy as np\n\nimport os\nimport sys\nBASE_DIR = os.path.dirname(os.path.abspath(__file__))\nsys.path.append(BASE_DIR)\nimport pointnet2_utils\n\ndef test_interpolation_grad():\n    batch_size = 1\n    feat_dim = 2\n    m = 4\n    feats = torch.randn(batch_size, feat_dim, m, requires_grad=True).float().cuda()\n    \n    def interpolate_func(inputs):\n        idx = torch.from_numpy(np.array([[[0,1,2],[1,2,3]]])).int().cuda()\n        weight = torch.from_numpy(np.array([[[1,1,1],[2,2,2]]])).float().cuda()\n        interpolated_feats = pointnet2_utils.three_interpolate(inputs, idx, weight)\n        return interpolated_feats\n    \n    assert (gradcheck(interpolate_func, feats, atol=1e-1, rtol=1e-1))\n\nif __name__=='__main__':\n    test_interpolation_grad()\n"
  },
  {
    "path": "modules/third_party/pointnet2/pointnet2_utils.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# \n# This source code is licensed under the MIT license found in the\n# LICENSE file in the root directory of this source tree.\n\n''' Modified based on: https://github.com/erikwijmans/Pointnet2_PyTorch '''\nfrom __future__ import (\n    division,\n    absolute_import,\n    with_statement,\n    print_function,\n    unicode_literals,\n)\nimport torch\nfrom torch.autograd import Function\nimport torch.nn as nn\nimport modules.third_party.pointnet2.pytorch_utils as pt_utils\n\n\nimport builtins\n\ntry:\n    import pointnet2._ext as _ext\nexcept ImportError:\n    if not getattr(builtins, \"__POINTNET2_SETUP__\", False):\n        raise ImportError(\n            \"Could not import _ext module.\\n\"\n            \"Please see the setup instructions in the README: \"\n            \"https://github.com/erikwijmans/Pointnet2_PyTorch/blob/master/README.rst\"\n        )\n\nif False:\n    # Workaround for type hints without depending on the `typing` module\n    from typing import *\n\n\nclass RandomDropout(nn.Module):\n    def __init__(self, p=0.5, inplace=False):\n        super(RandomDropout, self).__init__()\n        self.p = p\n        self.inplace = inplace\n\n    def forward(self, X):\n        theta = torch.Tensor(1).uniform_(0, self.p)[0]\n        return pt_utils.feature_dropout_no_scaling(X, theta, self.train, self.inplace)\n\n\nclass FurthestPointSampling(Function):\n    @staticmethod\n    def forward(ctx, xyz, npoint):\n        # type: (Any, torch.Tensor, int) -> torch.Tensor\n        r\"\"\"\n        Uses iterative furthest point sampling to select a set of npoint features that have the largest\n        minimum distance\n\n        Parameters\n        ----------\n        xyz : torch.Tensor\n            (B, N, 3) tensor where N > npoint\n        npoint : int32\n            number of features in the sampled set\n\n        Returns\n        -------\n        torch.Tensor\n            (B, npoint) tensor containing the set\n        \"\"\"\n        fps_inds = _ext.furthest_point_sampling(xyz, npoint)\n        ctx.mark_non_differentiable(fps_inds)\n        return fps_inds\n\n    @staticmethod\n    def backward(xyz, a=None):\n        return None, None\n\n\nfurthest_point_sample = FurthestPointSampling.apply\n\n\nclass GatherOperation(Function):\n    @staticmethod\n    def forward(ctx, features, idx):\n        # type: (Any, torch.Tensor, torch.Tensor) -> torch.Tensor\n        r\"\"\"\n\n        Parameters\n        ----------\n        features : torch.Tensor\n            (B, C, N) tensor\n\n        idx : torch.Tensor\n            (B, npoint) tensor of the features to gather\n\n        Returns\n        -------\n        torch.Tensor\n            (B, C, npoint) tensor\n        \"\"\"\n\n        _, C, N = features.size()\n\n        ctx.for_backwards = (idx, C, N)\n\n        return _ext.gather_points(features, idx)\n\n    @staticmethod\n    def backward(ctx, grad_out):\n        idx, C, N = ctx.for_backwards\n\n        grad_features = _ext.gather_points_grad(grad_out.contiguous(), idx, N)\n        return grad_features, None\n\n\ngather_operation = GatherOperation.apply\n\n\nclass ThreeNN(Function):\n    @staticmethod\n    def forward(ctx, unknown, known):\n        # type: (Any, torch.Tensor, torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]\n        r\"\"\"\n            Find the three nearest neighbors of unknown in known\n        Parameters\n        ----------\n        unknown : torch.Tensor\n            (B, n, 3) tensor of known features\n    
    known : torch.Tensor\n            (B, m, 3) tensor of unknown features\n\n        Returns\n        -------\n        dist : torch.Tensor\n            (B, n, 3) l2 distance to the three nearest neighbors\n        idx : torch.Tensor\n            (B, n, 3) index of 3 nearest neighbors\n        \"\"\"\n        dist2, idx = _ext.three_nn(unknown, known)\n\n        return torch.sqrt(dist2), idx\n\n    @staticmethod\n    def backward(ctx, a=None, b=None):\n        return None, None\n\n\nthree_nn = ThreeNN.apply\n\n\nclass ThreeInterpolate(Function):\n    @staticmethod\n    def forward(ctx, features, idx, weight):\n        # type(Any, torch.Tensor, torch.Tensor, torch.Tensor) -> Torch.Tensor\n        r\"\"\"\n            Performs weight linear interpolation on 3 features\n        Parameters\n        ----------\n        features : torch.Tensor\n            (B, c, m) Features descriptors to be interpolated from\n        idx : torch.Tensor\n            (B, n, 3) three nearest neighbors of the target features in features\n        weight : torch.Tensor\n            (B, n, 3) weights\n\n        Returns\n        -------\n        torch.Tensor\n            (B, c, n) tensor of the interpolated features\n        \"\"\"\n        B, c, m = features.size()\n        n = idx.size(1)\n\n        ctx.three_interpolate_for_backward = (idx, weight, m)\n\n        return _ext.three_interpolate(features, idx, weight)\n\n    @staticmethod\n    def backward(ctx, grad_out):\n        # type: (Any, torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]\n        r\"\"\"\n        Parameters\n        ----------\n        grad_out : torch.Tensor\n            (B, c, n) tensor with gradients of ouputs\n\n        Returns\n        -------\n        grad_features : torch.Tensor\n            (B, c, m) tensor with gradients of features\n\n        None\n\n        None\n        \"\"\"\n        idx, weight, m = ctx.three_interpolate_for_backward\n\n        grad_features = _ext.three_interpolate_grad(\n            grad_out.contiguous(), idx, weight, m\n        )\n\n        return grad_features, None, None\n\n\nthree_interpolate = ThreeInterpolate.apply\n\n\nclass GroupingOperation(Function):\n    @staticmethod\n    def forward(ctx, features, idx):\n        # type: (Any, torch.Tensor, torch.Tensor) -> torch.Tensor\n        r\"\"\"\n\n        Parameters\n        ----------\n        features : torch.Tensor\n            (B, C, N) tensor of features to group\n        idx : torch.Tensor\n            (B, npoint, nsample) tensor containing the indicies of features to group with\n\n        Returns\n        -------\n        torch.Tensor\n            (B, C, npoint, nsample) tensor\n        \"\"\"\n        B, nfeatures, nsample = idx.size()\n        _, C, N = features.size()\n\n        ctx.for_backwards = (idx, N)\n\n        return _ext.group_points(features, idx)\n\n    @staticmethod\n    def backward(ctx, grad_out):\n        # type: (Any, torch.tensor) -> Tuple[torch.Tensor, torch.Tensor]\n        r\"\"\"\n\n        Parameters\n        ----------\n        grad_out : torch.Tensor\n            (B, C, npoint, nsample) tensor of the gradients of the output from forward\n\n        Returns\n        -------\n        torch.Tensor\n            (B, C, N) gradient of the features\n        None\n        \"\"\"\n        idx, N = ctx.for_backwards\n\n        grad_features = _ext.group_points_grad(grad_out.contiguous(), idx, N)\n\n        return grad_features, None\n\n\ngrouping_operation = GroupingOperation.apply\n\n\nclass BallQuery(Function):\n    
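\n    # Fixed-radius neighborhood query run by the CUDA extension; as in the\n    # standard PointNet++ kernels, regions with fewer than nsample neighbors\n    # are padded by repeating an already-found index.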
\n    @staticmethod\n    def forward(ctx, radius, nsample, xyz, new_xyz):\n        # type: (Any, float, int, torch.Tensor, torch.Tensor) -> torch.Tensor\n        r\"\"\"\n\n        Parameters\n        ----------\n        radius : float\n            radius of the balls\n        nsample : int\n            maximum number of features in the balls\n        xyz : torch.Tensor\n            (B, N, 3) xyz coordinates of the features\n        new_xyz : torch.Tensor\n            (B, npoint, 3) centers of the ball query\n\n        Returns\n        -------\n        torch.Tensor\n            (B, npoint, nsample) tensor with the indices of the features that form the query balls\n        \"\"\"\n        inds = _ext.ball_query(new_xyz, xyz, radius, nsample)\n        ctx.mark_non_differentiable(inds)\n        return inds\n\n    @staticmethod\n    def backward(ctx, a=None):\n        return None, None, None, None\n\n\nball_query = BallQuery.apply\n\n\nclass QueryAndGroup(nn.Module):\n    r\"\"\"\n    Groups with a ball query of radius\n\n    Parameters\n    ----------\n    radius : float32\n        Radius of ball\n    nsample : int32\n        Maximum number of features to gather in the ball\n    \"\"\"\n\n    def __init__(self, radius, nsample, use_xyz=True, ret_grouped_xyz=False, normalize_xyz=False, sample_uniformly=False, ret_unique_cnt=False):\n        # type: (QueryAndGroup, float, int, bool, bool, bool, bool, bool) -> None\n        super(QueryAndGroup, self).__init__()\n        self.radius, self.nsample, self.use_xyz = radius, nsample, use_xyz\n        self.ret_grouped_xyz = ret_grouped_xyz\n        self.normalize_xyz = normalize_xyz\n        self.sample_uniformly = sample_uniformly\n        self.ret_unique_cnt = ret_unique_cnt\n        if self.ret_unique_cnt:\n            assert(self.sample_uniformly)\n\n    def forward(self, xyz, new_xyz, features=None):\n        # type: (QueryAndGroup, torch.Tensor, torch.Tensor, torch.Tensor) -> Tuple[torch.Tensor]\n        r\"\"\"\n        Parameters\n        ----------\n        xyz : torch.Tensor\n            xyz coordinates of the features (B, N, 3)\n        new_xyz : torch.Tensor\n            centroids (B, npoint, 3)\n        features : torch.Tensor\n            Descriptors of the features (B, C, N)\n\n        Returns\n        -------\n        new_features : torch.Tensor\n            (B, 3 + C, npoint, nsample) tensor\n        \"\"\"\n        idx = ball_query(self.radius, self.nsample, xyz, new_xyz)\n\n        if self.sample_uniformly:\n            unique_cnt = torch.zeros((idx.shape[0], idx.shape[1]))\n            for i_batch in range(idx.shape[0]):\n                for i_region in range(idx.shape[1]):\n                    unique_ind = torch.unique(idx[i_batch, i_region, :])\n                    num_unique = unique_ind.shape[0]\n                    unique_cnt[i_batch, i_region] = num_unique\n                    sample_ind = torch.randint(0, num_unique, (self.nsample - num_unique,), dtype=torch.long)\n                    all_ind = torch.cat((unique_ind, unique_ind[sample_ind]))\n                    idx[i_batch, i_region, :] = all_ind\n\n        xyz_trans = xyz.transpose(1, 2).contiguous()\n        grouped_xyz = grouping_operation(xyz_trans, idx)  # (B, 3, npoint, nsample)\n        grouped_xyz -= new_xyz.transpose(1, 2).unsqueeze(-1)\n        if self.normalize_xyz:\n            grouped_xyz /= self.radius\n\n        if features is not None:\n            grouped_features = grouping_operation(features, idx)\n            if self.use_xyz:\n                new_features = torch.cat(\n                    [grouped_xyz, grouped_features], dim=1\n                )  # (B, C + 3, npoint, nsample)\n            else:\n                new_features = grouped_features\n        else:\n            assert (\n                self.use_xyz\n            ), \"Cannot have features=None while use_xyz is False!\"\n            new_features = grouped_xyz\n\n        ret = [new_features]\n        if self.ret_grouped_xyz:\n            ret.append(grouped_xyz)\n        if self.ret_unique_cnt:\n            ret.append(unique_cnt)\n        if len(ret) == 1:\n            return ret[0]\n        else:\n            return tuple(ret)\n\n\nclass GroupAll(nn.Module):\n    r\"\"\"\n    Groups all features\n    \"\"\"\n\n    def __init__(self, use_xyz=True, ret_grouped_xyz=False):\n        # type: (GroupAll, bool, bool) -> None\n        super(GroupAll, self).__init__()\n        self.use_xyz = use_xyz\n\n    def forward(self, xyz, new_xyz, features=None):\n        # type: (GroupAll, torch.Tensor, torch.Tensor, torch.Tensor) -> Tuple[torch.Tensor]\n        r\"\"\"\n        Parameters\n        ----------\n        xyz : torch.Tensor\n            xyz coordinates of the features (B, N, 3)\n        new_xyz : torch.Tensor\n            Ignored\n        features : torch.Tensor\n            Descriptors of the features (B, C, N)\n\n        Returns\n        -------\n        new_features : torch.Tensor\n            (B, C + 3, 1, N) tensor\n        \"\"\"\n\n        grouped_xyz = xyz.transpose(1, 2).unsqueeze(2)\n        if features is not None:\n            grouped_features = features.unsqueeze(2)\n            if self.use_xyz:\n                new_features = torch.cat(\n                    [grouped_xyz, grouped_features], dim=1\n                )  # (B, 3 + C, 1, N)\n            else:\n                new_features = grouped_features\n        else:\n            new_features = grouped_xyz\n\n        return new_features\n"
  },
  {
    "path": "modules/third_party/pointnet2/pytorch_utils.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates.\n# \n# This source code is licensed under the MIT license found in the\n# LICENSE file in the root directory of this source tree.\n\n''' Modified based on Ref: https://github.com/erikwijmans/Pointnet2_PyTorch '''\nimport torch\nimport torch.nn as nn\nfrom typing import List, Tuple\n\nclass SharedMLP(nn.Sequential):\n\n    def __init__(\n            self,\n            args: List[int],\n            *,\n            bn: bool = False,\n            activation=nn.ReLU(inplace=True),\n            preact: bool = False,\n            first: bool = False,\n            name: str = \"\"\n    ):\n        super().__init__()\n\n        for i in range(len(args) - 1):\n            self.add_module(\n                name + 'layer{}'.format(i),\n                Conv2d(\n                    args[i],\n                    args[i + 1],\n                    bn=(not first or not preact or (i != 0)) and bn,\n                    activation=activation\n                    if (not first or not preact or (i != 0)) else None,\n                    preact=preact\n                )\n            )\n\n\nclass _BNBase(nn.Sequential):\n\n    def __init__(self, in_size, batch_norm=None, name=\"\"):\n        super().__init__()\n        self.add_module(name + \"bn\", batch_norm(in_size))\n\n        nn.init.constant_(self[0].weight, 1.0)\n        nn.init.constant_(self[0].bias, 0)\n\n\nclass BatchNorm1d(_BNBase):\n\n    def __init__(self, in_size: int, *, name: str = \"\"):\n        super().__init__(in_size, batch_norm=nn.BatchNorm1d, name=name)\n\n\nclass BatchNorm2d(_BNBase):\n\n    def __init__(self, in_size: int, name: str = \"\"):\n        super().__init__(in_size, batch_norm=nn.BatchNorm2d, name=name)\n\n\nclass BatchNorm3d(_BNBase):\n\n    def __init__(self, in_size: int, name: str = \"\"):\n        super().__init__(in_size, batch_norm=nn.BatchNorm3d, name=name)\n\n\nclass _ConvBase(nn.Sequential):\n\n    def __init__(\n            self,\n            in_size,\n            out_size,\n            kernel_size,\n            stride,\n            padding,\n            activation,\n            bn,\n            init,\n            conv=None,\n            batch_norm=None,\n            bias=True,\n            preact=False,\n            name=\"\"\n    ):\n        super().__init__()\n\n        bias = bias and (not bn)\n        conv_unit = conv(\n            in_size,\n            out_size,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=padding,\n            bias=bias\n        )\n        init(conv_unit.weight)\n        if bias:\n            nn.init.constant_(conv_unit.bias, 0)\n\n        if bn:\n            if not preact:\n                bn_unit = batch_norm(out_size)\n            else:\n                bn_unit = batch_norm(in_size)\n\n        if preact:\n            if bn:\n                self.add_module(name + 'bn', bn_unit)\n\n            if activation is not None:\n                self.add_module(name + 'activation', activation)\n\n        self.add_module(name + 'conv', conv_unit)\n\n        if not preact:\n            if bn:\n                self.add_module(name + 'bn', bn_unit)\n\n            if activation is not None:\n                self.add_module(name + 'activation', activation)\n\n\nclass Conv1d(_ConvBase):\n\n    def __init__(\n            self,\n            in_size: int,\n            out_size: int,\n            *,\n            kernel_size: int = 1,\n            stride: int = 1,\n            padding: int = 0,\n            
activation=nn.ReLU(inplace=True),\n            bn: bool = False,\n            init=nn.init.kaiming_normal_,\n            bias: bool = True,\n            preact: bool = False,\n            name: str = \"\"\n    ):\n        super().__init__(\n            in_size,\n            out_size,\n            kernel_size,\n            stride,\n            padding,\n            activation,\n            bn,\n            init,\n            conv=nn.Conv1d,\n            batch_norm=BatchNorm1d,\n            bias=bias,\n            preact=preact,\n            name=name\n        )\n\n\nclass Conv2d(_ConvBase):\n\n    def __init__(\n            self,\n            in_size: int,\n            out_size: int,\n            *,\n            kernel_size: Tuple[int, int] = (1, 1),\n            stride: Tuple[int, int] = (1, 1),\n            padding: Tuple[int, int] = (0, 0),\n            activation=nn.ReLU(inplace=True),\n            bn: bool = False,\n            init=nn.init.kaiming_normal_,\n            bias: bool = True,\n            preact: bool = False,\n            name: str = \"\"\n    ):\n        super().__init__(\n            in_size,\n            out_size,\n            kernel_size,\n            stride,\n            padding,\n            activation,\n            bn,\n            init,\n            conv=nn.Conv2d,\n            batch_norm=BatchNorm2d,\n            bias=bias,\n            preact=preact,\n            name=name\n        )\n\n\nclass Conv3d(_ConvBase):\n\n    def __init__(\n            self,\n            in_size: int,\n            out_size: int,\n            *,\n            kernel_size: Tuple[int, int, int] = (1, 1, 1),\n            stride: Tuple[int, int, int] = (1, 1, 1),\n            padding: Tuple[int, int, int] = (0, 0, 0),\n            activation=nn.ReLU(inplace=True),\n            bn: bool = False,\n            init=nn.init.kaiming_normal_,\n            bias: bool = True,\n            preact: bool = False,\n            name: str = \"\"\n    ):\n        super().__init__(\n            in_size,\n            out_size,\n            kernel_size,\n            stride,\n            padding,\n            activation,\n            bn,\n            init,\n            conv=nn.Conv3d,\n            batch_norm=BatchNorm3d,\n            bias=bias,\n            preact=preact,\n            name=name\n        )\n\n\nclass FC(nn.Sequential):\n\n    def __init__(\n            self,\n            in_size: int,\n            out_size: int,\n            *,\n            activation=nn.ReLU(inplace=True),\n            bn: bool = False,\n            init=None,\n            preact: bool = False,\n            name: str = \"\"\n    ):\n        super().__init__()\n\n        fc = nn.Linear(in_size, out_size, bias=not bn)\n        if init is not None:\n            init(fc.weight)\n        if not bn:\n            nn.init.constant_(fc.bias, 0)\n\n        if preact:\n            if bn:\n                self.add_module(name + 'bn', BatchNorm1d(in_size))\n\n            if activation is not None:\n                self.add_module(name + 'activation', activation)\n\n        self.add_module(name + 'fc', fc)\n\n        if not preact:\n            if bn:\n                self.add_module(name + 'bn', BatchNorm1d(out_size))\n\n            if activation is not None:\n                self.add_module(name + 'activation', activation)\n\ndef set_bn_momentum_default(bn_momentum):\n\n    def fn(m):\n        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):\n            m.momentum = bn_momentum\n\n    return fn\n\n\nclass 
BNMomentumScheduler(object):\n\n    def __init__(\n            self, model, bn_lambda, last_epoch=-1,\n            setter=set_bn_momentum_default\n    ):\n        if not isinstance(model, nn.Module):\n            raise RuntimeError(\n                \"Class '{}' is not a PyTorch nn Module\".format(\n                    type(model).__name__\n                )\n            )\n\n        self.model = model\n        self.setter = setter\n        self.lmbd = bn_lambda\n\n        self.step(last_epoch + 1)\n        self.last_epoch = last_epoch\n\n    def step(self, epoch=None):\n        if epoch is None:\n            epoch = self.last_epoch + 1\n\n        self.last_epoch = epoch\n        self.model.apply(self.setter(self.lmbd(epoch)))\n\n\n"
  },
  {
    "path": "modules/third_party/pointnet2/requirements_new.txt",
    "content": "accelerate==0.28.0\naddict==2.4.0\nantlr4-python3-runtime==4.9.3\nappdirs==1.4.4\nasttokens==2.4.1\nattrs==23.2.0\nblinker==1.7.0\ncertifi==2024.2.2\ncharset-normalizer==3.3.2\nclick==8.1.7\nclip @ git+https://github.com/openai/CLIP.git@a1d071733d7111c9c014f024669f959182114e33\ncomm==0.2.2\nConfigArgParse==1.7\ncontourpy==1.2.0\ncycler==0.12.1\ndash==2.16.1\ndash-core-components==2.0.0\ndash-html-components==2.0.0\ndash-table==5.0.0\ndecorator==5.1.1\ndocker-pycreds==0.4.0\neinops==0.7.0\nexceptiongroup==1.2.0\nexecuting==2.0.1\nfastjsonschema==2.19.1\nfilelock==3.13.3\nFlask==3.0.2\nfonttools==4.50.0\nfsspec==2024.3.1\nftfy==6.2.0\nfvcore==0.1.5.post20221221\ngitdb==4.0.11\nGitPython==3.1.42\nhuggingface-hub==0.22.1\nhydra-core==1.3.2\nidna==3.6\nimportlib_metadata==7.1.0\nimportlib_resources==6.4.0\niopath==0.1.10\nipython==8.18.1\nipywidgets==8.1.2\nitsdangerous==2.1.2\njedi==0.19.1\nJinja2==3.1.3\njoblib==1.3.2\njsonlines==4.0.0\njsonschema==4.21.1\njsonschema-specifications==2023.12.1\njupyter_core==5.7.2\njupyterlab_widgets==3.0.10\nkiwisolver==1.4.5\nMarkupSafe==2.1.5\nmatplotlib==3.8.3\nmatplotlib-inline==0.1.6\nmpmath==1.3.0\nnbformat==5.10.3\nnest-asyncio==1.6.0\nnetworkx==3.2.1\nnumpy==1.26.4\nnvidia-cublas-cu11==11.11.3.6\nnvidia-cublas-cu12==12.1.3.1\nnvidia-cuda-cupti-cu11==11.8.87\nnvidia-cuda-cupti-cu12==12.1.105\nnvidia-cuda-nvrtc-cu11==11.8.89\nnvidia-cuda-nvrtc-cu12==12.1.105\nnvidia-cuda-runtime-cu11==11.8.89\nnvidia-cuda-runtime-cu12==12.1.105\nnvidia-cudnn-cu11==8.7.0.84\nnvidia-cudnn-cu12==8.9.2.26\nnvidia-cufft-cu11==10.9.0.58\nnvidia-cufft-cu12==11.0.2.54\nnvidia-curand-cu11==10.3.0.86\nnvidia-curand-cu12==10.3.2.106\nnvidia-cusolver-cu11==11.4.1.48\nnvidia-cusolver-cu12==11.4.5.107\nnvidia-cusparse-cu11==11.7.5.86\nnvidia-cusparse-cu12==12.1.0.106\nnvidia-nccl-cu11==2.19.3\nnvidia-nccl-cu12==2.19.3\nnvidia-nvjitlink-cu12==12.4.99\nnvidia-nvtx-cu11==11.8.86\nnvidia-nvtx-cu12==12.1.105\nomegaconf==2.3.0\nopen3d==0.18.0\nopencv-python==4.9.0.80\npackaging==24.0\npandas==2.2.1\nparso==0.8.3\npexpect==4.9.0\npillow==10.2.0\nplatformdirs==4.2.0\nplotly==5.20.0\nplyfile==1.0.3\npointnet2==3.0.0\nportalocker==2.8.2\nprompt-toolkit==3.0.43\nprotobuf==4.25.3\npsutil==5.9.8\nptyprocess==0.7.0\npure-eval==0.2.2\nPygments==2.17.2\npyparsing==3.1.2\npyquaternion==0.9.9\npython-dateutil==2.9.0.post0\npytz==2024.1\nPyYAML==6.0.1\nreferencing==0.34.0\nregex==2023.12.25\nrequests==2.31.0\nretrying==1.3.4\nrpds-py==0.18.0\nsafetensors==0.4.2\nscikit-learn==1.4.1.post1\nscipy==1.12.0\nsentry-sdk==1.42.0\nsetproctitle==1.3.3\nsix==1.16.0\nsmmap==5.0.1\nstack-data==0.6.3\nsympy==1.12\ntabulate==0.9.0\ntenacity==8.2.3\ntermcolor==2.4.0\nthreadpoolctl==3.4.0\ntokenizers==0.15.2\ntorch==2.2.0+cu118\ntorchvision==0.17.0+cu118\ntqdm==4.66.2\ntraitlets==5.14.2\ntransformers==4.39.1\ntriton==2.2.0\ntyping_extensions==4.10.0\ntzdata==2024.1\nurllib3==2.2.1\nwandb==0.16.4\nwcwidth==0.2.13\nWerkzeug==3.0.1\nwidgetsnbextension==4.0.10\nyacs==0.1.8\nzipp==3.18.1\n"
  },
  {
    "path": "modules/third_party/pointnet2/setup.py",
    "content": "import glob\nimport os\nimport os.path as osp\n\nfrom setuptools import find_packages, setup\nfrom torch.utils.cpp_extension import BuildExtension, CUDAExtension\n\n_this_dir = osp.dirname(osp.abspath(__file__))\n_ext_src_root = \"_ext_src\"\n_ext_sources = glob.glob(\"{}/src/*.cpp\".format(_ext_src_root)) + glob.glob(\n    \"{}/src/*.cu\".format(_ext_src_root)\n)\n_ext_headers = glob.glob(\"{}/include/*\".format(_ext_src_root))\n\nrequirements = [\"torch>=1.4\"]\n\nos.environ[\"TORCH_CUDA_ARCH_LIST\"] = \"3.7+PTX;5.0;6.0;6.1;6.2;7.0;7.5\"\n\nexec(open(\"_version.py\").read())\n\nsetup(\n    name='pointnet2',\n    version=__version__,\n    packages=find_packages(),\n    install_requires=requirements,\n    ext_modules=[\n        CUDAExtension(\n            name='pointnet2._ext',\n            sources=_ext_sources,\n            extra_compile_args={\n                \"cxx\": [\"-O3\"],\n                \"nvcc\": [\"-O3\", \"-Xfatbin\", \"-compress-all\"],\n            },\n            include_dirs=[osp.join(_this_dir, _ext_src_root, \"include\")],\n        )\n    ],\n    cmdclass={\"build_ext\": BuildExtension},\n    include_package_data=True,\n)"
  },
  {
    "path": "modules/utils.py",
    "content": "import copy\n\nimport einops\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\n#########################################################\n# General modules helpers\n#########################################################\ndef get_activation_fn(activation_type):\n    if activation_type not in [\"relu\", \"gelu\", \"glu\"]:\n        raise RuntimeError(f\"activation function currently support relu/gelu, not {activation_type}\")\n    return getattr(F, activation_type)\n\n\ndef get_mlp_head(input_size, hidden_size, output_size, dropout=0):\n    return nn.Sequential(*[\n        nn.Linear(input_size, hidden_size),\n        nn.ReLU(),\n        nn.LayerNorm(hidden_size, eps=1e-12),\n        nn.Dropout(dropout),\n        nn.Linear(hidden_size, output_size)\n    ])\n\n\ndef layer_repeat(module, N, share_layer=False):\n    if share_layer:\n        return nn.ModuleList([module] * N)\n    else:\n        return nn.ModuleList([copy.deepcopy(module) for _ in range(N - 1)] + [module])\n\n\n#########################################################\n# Specific modules helpers\n#########################################################\ndef calc_pairwise_locs(obj_centers, obj_whls, eps=1e-10, pairwise_rel_type='center', spatial_dist_norm=True,\n                       spatial_dim=5):\n    if pairwise_rel_type == 'mlp':\n        obj_locs = torch.cat([obj_centers, obj_whls], 2)\n        pairwise_locs = torch.cat(\n            [einops.repeat(obj_locs, 'b l d -> b l x d', x=obj_locs.size(1)),\n             einops.repeat(obj_locs, 'b l d -> b x l d', x=obj_locs.size(1))],\n            dim=3\n        )\n        return pairwise_locs\n\n    pairwise_locs = einops.repeat(obj_centers, 'b l d -> b l 1 d') \\\n                    - einops.repeat(obj_centers, 'b l d -> b 1 l d')\n    pairwise_dists = torch.sqrt(torch.sum(pairwise_locs ** 2, 3) + eps)  # (b, l, l)\n    if spatial_dist_norm:\n        max_dists = torch.max(pairwise_dists.view(pairwise_dists.size(0), -1), dim=1)[0]\n        norm_pairwise_dists = pairwise_dists / einops.repeat(max_dists, 'b -> b 1 1')\n    else:\n        norm_pairwise_dists = pairwise_dists\n\n    if spatial_dim == 1:\n        return norm_pairwise_dists.unsqueeze(3)\n\n    pairwise_dists_2d = torch.sqrt(torch.sum(pairwise_locs[..., :2] ** 2, 3) + eps)\n    if pairwise_rel_type == 'center':\n        pairwise_locs = torch.stack(\n            [norm_pairwise_dists, pairwise_locs[..., 2] / pairwise_dists,\n             pairwise_dists_2d / pairwise_dists, pairwise_locs[..., 1] / pairwise_dists_2d,\n             pairwise_locs[..., 0] / pairwise_dists_2d],\n            dim=3\n        )\n    elif pairwise_rel_type == 'vertical_bottom':\n        bottom_centers = torch.clone(obj_centers)\n        bottom_centers[:, :, 2] -= obj_whls[:, :, 2]\n        bottom_pairwise_locs = einops.repeat(bottom_centers, 'b l d -> b l 1 d') \\\n                               - einops.repeat(bottom_centers, 'b l d -> b 1 l d')\n        bottom_pairwise_dists = torch.sqrt(torch.sum(bottom_pairwise_locs ** 2, 3) + eps)  # (b, l, l)\n        bottom_pairwise_dists_2d = torch.sqrt(torch.sum(bottom_pairwise_locs[..., :2] ** 2, 3) + eps)\n        pairwise_locs = torch.stack(\n            [norm_pairwise_dists,\n             bottom_pairwise_locs[..., 2] / bottom_pairwise_dists,\n             bottom_pairwise_dists_2d / bottom_pairwise_dists,\n             pairwise_locs[..., 1] / pairwise_dists_2d,\n             pairwise_locs[..., 0] / pairwise_dists_2d],\n            dim=3\n        )\n\n    if 
spatial_dim == 4:\n        pairwise_locs = pairwise_locs[..., 1:]\n    return pairwise_locs\n\ndef calc_pairwise_locs_mv(obj_centers, pairwise_rel_type='center', spatial_dist_norm=True, spatial_dim=5):\n    eps=1e-10\n    pairwise_locs = einops.repeat(obj_centers, 'b l d -> b l 1 d') \\\n                    - einops.repeat(obj_centers, 'b l d -> b 1 l d')\n    pairwise_dists = torch.sqrt(torch.sum(pairwise_locs ** 2, 3) + eps)  # (b, l, l)\n    if spatial_dist_norm:\n        max_dists = torch.max(pairwise_dists.view(pairwise_dists.size(0), -1), dim=1)[0]\n        norm_pairwise_dists = pairwise_dists / einops.repeat(max_dists, 'b -> b 1 1')\n    else:\n        norm_pairwise_dists = pairwise_dists\n\n    if spatial_dim == 1:\n        return norm_pairwise_dists.unsqueeze(3)\n\n    pairwise_dists_2d = torch.sqrt(torch.sum(pairwise_locs[..., :2] ** 2, 3) + eps)\n    if pairwise_rel_type == 'center':\n        pairwise_locs = torch.stack(\n            [norm_pairwise_dists, pairwise_locs[..., 2] / pairwise_dists,\n             pairwise_dists_2d / pairwise_dists, pairwise_locs[..., 1] / pairwise_dists_2d,\n             pairwise_locs[..., 0] / pairwise_dists_2d],\n            dim=3\n        )\n\n    if spatial_dim == 4:\n        pairwise_locs = pairwise_locs[..., 1:]\n    return pairwise_locs\n\n# TODO: need to generalize this function to more use cases to be in modules/utils.py\ndef get_mixup_function(mixup_strategy, mixup_stage1, mixup_stage2):\n    if mixup_strategy is None:\n        return None\n    assert mixup_strategy in ['linear_decay', 'all_mixup']\n\n    if mixup_strategy == 'linear_decay':\n        return LinearDecayMixup(mixup_stage1, mixup_stage2)\n    elif mixup_strategy == 'all_mixup':\n        return AllMixup()\n\n\nclass AllMixup(nn.Module):\n    def __init__(self) -> None:\n        super().__init__()\n\n    def forward(self, obj_sem_cls_pred, obj_labels, cur_step, total_steps):\n        mixup_sem_cls_pred = torch.zeros_like(obj_sem_cls_pred)\n        for i in range(mixup_sem_cls_pred.shape[0]):\n            for j in range(mixup_sem_cls_pred.shape[1]):\n                if obj_labels[i, j] >= 0:\n                    mixup_sem_cls_pred[i, j, obj_labels[i, j]] = 1.0\n        return mixup_sem_cls_pred\n\n\nclass LinearDecayMixup(nn.Module):\n    def __init__(self, mixup_stage1, mixup_stage2) -> None:\n        super().__init__()\n        self.stage1_rate = mixup_stage1\n        self.stage2_rate = mixup_stage2\n        assert self.stage2_rate > self.stage1_rate\n\n    def forward(self, obj_sem_cls_pred, obj_labels, cur_step, total_steps):\n        if cur_step < total_steps * self.stage1_rate:\n            mixup_ratio = 1.0\n        elif cur_step < total_steps * self.stage2_rate:\n            mixup_ratio = (total_steps * self.stage2_rate - cur_step) / (\n                        (self.stage2_rate - self.stage1_rate) * total_steps)\n        else:\n            mixup_ratio = 0.0\n        # mixup\n        mixup_sem_cls_pred = obj_sem_cls_pred.clone()  # B, O, 607\n        random_numer = torch.rand(mixup_sem_cls_pred.shape[0:2])  # B, O\n        mixup_mask = random_numer < mixup_ratio\n        for i in range(mixup_sem_cls_pred.shape[0]):\n            for j in range(mixup_sem_cls_pred.shape[1]):\n                if mixup_mask[i, j] and obj_labels[i, j] >= 0:\n                    mixup_sem_cls_pred[i, j, :] = 0.0\n                    mixup_sem_cls_pred[i, j, obj_labels[i, j]] = 1.0\n        return mixup_sem_cls_pred"
  },
  {
    "path": "modules/vision/__init__.py",
    "content": "from .pcd_openvocab_encoder import *\nfrom .obj_cls_encoder import *\n"
  },
  {
    "path": "modules/vision/obj_cls_encoder.py",
    "content": "import torch.nn as nn\nfrom modules.build import VISION_REGISTRY\nfrom modules.utils import get_mlp_head\n\n@VISION_REGISTRY.register()\nclass ObjClsEncoder(nn.Module):\n    def __init__(self, cfg, input_feat_size=768, hidden_size=768, tgt_cls_num=607):\n        super().__init__()\n        self.cfg = cfg\n        self.vis_cls_head = get_mlp_head(input_feat_size, hidden_size // 2, tgt_cls_num, dropout = 0.3)\n\n    def forward(self, obj_feats, **kwargs):\n        obj_logits = self.vis_cls_head(obj_feats)\n        return obj_logits\n"
  },
  {
    "path": "modules/vision/pcd_openvocab_encoder.py",
    "content": "import os\nimport glob\n\nimport einops\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom modules.build import VISION_REGISTRY\nfrom modules.layers.pointnet import PointNetPP\nfrom modules.layers.transformers import TransformerSpatialEncoderLayer\nfrom modules.utils import get_mlp_head, layer_repeat, calc_pairwise_locs, get_mixup_function\nfrom modules.weights import _init_weights_bert\n\n\n@VISION_REGISTRY.register()\nclass PointOpenVocabEncoder(nn.Module):\n    def __init__(self, cfg, backbone='pointnet++', hidden_size=768, path=None, freeze=False, dim_feedforward=2048,\n                 num_attention_heads=12, spatial_dim=5, num_layers=4, dim_loc=6, pairwise_rel_type='center',\n                 use_matmul_label=False, mixup_strategy=None, mixup_stage1=None, mixup_stage2=None,\n                 lang_type='bert', lang_path=None, attn_type='spatial'):\n        super().__init__()\n        assert backbone in ['pointnet++']\n\n        # build backbone\n        if backbone == 'pointnet++':\n            self.point_feature_extractor = PointNetPP(\n                sa_n_points=[32, 16, None],\n                sa_n_samples=[32, 32, None],\n                sa_radii=[0.2, 0.4, None],\n                sa_mlps=[[3, 64, 64, 128], [128, 128, 128, 256], [256, 256, 512, 768]],\n            )\n        elif backbone == 'pointnext':\n            self.point_feature_extractor = PointNextEncoder(\n                blocks=[1, 1, 1, 1, 1, 1],\n                strides=[1, 2, 2, 2, 2, 1],\n                sa_layers=2,\n                sa_use_res=True,\n                width=32,\n                radius=0.15,\n                radius_scaling=1.5,\n                mlp_head=[1024, 768] if lang_type == 'bert' else []\n            )\n\n        # Open vocab grounding head\n        vocab_file_name = f\"scannet_607_{'bert-base-uncased' if lang_type == 'bert' else 'clip-ViT-B16'}_id.pth\"\n        self.register_buffer(\"text_features\", torch.load(os.path.join(lang_path, vocab_file_name)))\n        self.point_cls_head = lambda x: x @ self.text_features.t()\n        self.dropout = nn.Dropout(0.1)\n\n        self.attn_type = attn_type\n\n        # freeze feature\n        self.freeze = freeze\n        if freeze:\n            for p in self.parameters():\n                p.requires_grad = False\n\n        # build semantic cls embeds\n        self.sem_cls_embed_layer = nn.Sequential(nn.Linear(hidden_size, hidden_size),\n                                                 nn.LayerNorm(hidden_size),\n                                                 nn.Dropout(0.1))\n\n        # self.int2cat = json.load(\n        #     open(os.path.join(glove_path, \"annotations/meta_data/scannetv2_raw_categories.json\"), 'r'))\n        # self.cat2int = {w: i for i, w in enumerate(self.int2cat)}\n        # self.cat2vec = json.load(open(os.path.join(glove_path, \"annotations/meta_data/cat2glove42b.json\"), 'r'))\n        # self.register_buffer(\"int2mat\", torch.ones(607, 300))\n        # for i in range(607):\n        #     self.int2mat[i, :] = torch.Tensor(self.cat2vec[self.int2cat[i]])\n\n        self.use_matmul_label = use_matmul_label\n        # build mask embedes\n        self.sem_mask_embeddings = nn.Embedding(1, 768)\n\n        # build spatial encoder layer\n        if self.attn_type == 'spatial':\n            pc_encoder_layer = TransformerSpatialEncoderLayer(hidden_size, num_attention_heads,\n                                                              dim_feedforward=dim_feedforward,\n                    
                                          dropout=0.1, activation='gelu',\n                                                              spatial_dim=spatial_dim, spatial_multihead=True,\n                                                              spatial_attn_fusion='cond')\n            self.spatial_encoder = layer_repeat(pc_encoder_layer, num_layers)\n            loc_layer = nn.Sequential(\n                nn.Linear(dim_loc, hidden_size),\n                nn.LayerNorm(hidden_size),\n            )\n            self.loc_layers = layer_repeat(loc_layer, 1)\n            self.pairwise_rel_type = pairwise_rel_type\n            self.spatial_dim = spatial_dim\n        else:\n            pass\n\n        # # build mixup strategy\n        # self.mixup_strategy = mixup_strategy\n        # self.mixup_function = get_mixup_function(mixup_strategy, mixup_stage1, mixup_stage2)\n\n        # load weights\n        self.apply(_init_weights_bert)\n        if path is not None:\n            # pre_dict = {}\n            # for name, p in self.named_parameters():\n            #     pre_dict[name] = p\n            # TODO: change this to accelerator loading multiple model files\n            print(\"loaded\")\n            ckpts = glob.glob(os.path.join(path, '*.bin'))\n            if len(ckpts) != 0:\n                for ckpt in ckpts:\n                    state_dict = torch.load(ckpt, map_location='cpu')\n                    self.load_state_dict(state_dict, strict=False)\n                print(\"loaded checkpoint files\")\n            elif path.endswith('.pth'):\n                print(\"loaded checkpoint file\")\n                state_dict = torch.load(path)\n                self.load_state_dict(state_dict, strict=False)\n            # for name, p in self.named_parameters():\n            #     if name in state_dict.keys():\n            #         print(name, pre_dict[name] - layer_repeat(p, 1))\n            # exit()\n\n    def freeze_bn(self, m):\n        for layer in m.modules():\n            if isinstance(layer, nn.BatchNorm2d):\n                layer.eval()\n\n    def forward(self, obj_pcds, obj_locs, obj_masks, obj_sem_masks,\n                obj_labels=None, cur_step=None, max_steps=None, **kwargs):\n        if self.freeze:\n            self.freeze_bn(self.point_feature_extractor)\n\n        # get obj_embdes\n        batch_size, num_objs, _, _ = obj_pcds.size()\n        obj_embeds = self.point_feature_extractor(einops.rearrange(obj_pcds, 'b o p d -> (b o) p d'))\n        # obj_sem_embeds = self.sem_cls_embed_layer(obj_embeds)\n        # obj_sem_embeds = einops.rearrange(obj_sem_embeds, '(b o) d -> b o d', b=batch_size)\n        obj_embeds = einops.rearrange(obj_embeds, '(b o) d -> b o d', b=batch_size)\n        obj_embeds = self.dropout(obj_embeds)\n        if self.freeze:\n            obj_embeds = obj_embeds.detach()\n\n        # get semantic cls embeds\n        obj_sem_cls = F.softmax(self.point_cls_head(obj_embeds), dim=2).detach()\n\n        # TODO: check if this sem_cls is still needed, switch this to cross attention\n        # if self.mixup_strategy != None:\n        #     obj_sem_cls_mix = self.mixup_function(obj_sem_cls, obj_labels, cur_step, max_steps)\n        # else:\n        #     obj_sem_cls_mix = obj_sem_cls\n        # if self.use_matmul_label:\n        #     obj_sem_cls_embeds = torch.matmul(obj_sem_cls_mix, self.int2mat)  # N, O, 607 matmul ,607, 300\n        # else:\n        #     obj_sem_cls_mix = torch.argmax(obj_sem_cls_mix, dim=2)\n        #     obj_sem_cls_embeds = torch.Tensor(\n        # 
        [self.cat2vec[self.int2cat[int(i)]] for i in obj_sem_cls_mix.view(batch_size * num_objs)])\n        #     obj_sem_cls_embeds = obj_sem_cls_embeds.view(batch_size, num_objs, 300).cuda()\n        # obj_sem_cls_embeds = self.sem_cls_embed_layer(obj_sem_cls_embeds)\n        # obj_embeds = obj_embeds + obj_sem_embeds\n\n        # # get semantic mask embeds\n        # obj_embeds = obj_embeds.masked_fill(obj_sem_masks.unsqueeze(2).logical_not(), 0.0)\n        # obj_sem_mask_embeds = self.sem_mask_embeddings(\n        #     torch.zeros((batch_size, num_objs)).long().cuda()\n        # ) * obj_sem_masks.logical_not().unsqueeze(2)\n        # obj_embeds = obj_embeds + obj_sem_mask_embeds\n\n        # record pre embedes\n        # note: in our implementation, there are three types of embds, raw embeds from PointNet,\n        # pre embeds after tokenization, post embeds after transformers\n        obj_embeds_pre = obj_embeds\n\n        # spatial reasoning, spatial attention transformer\n        if self.attn_type == 'spatial':\n            pairwise_locs = calc_pairwise_locs(obj_locs[:, :, :3], obj_locs[:, :, 3:],\n                                               pairwise_rel_type=self.pairwise_rel_type, spatial_dist_norm=True,\n                                               spatial_dim=self.spatial_dim)\n            for i, pc_layer in enumerate(self.spatial_encoder):\n                query_pos = self.loc_layers[0](obj_locs)\n                obj_embeds = obj_embeds + query_pos\n                obj_embeds, self_attn_matrices = pc_layer(obj_embeds, pairwise_locs,\n                                                          tgt_key_padding_mask=obj_masks.logical_not())\n        else:\n            pass\n\n        return obj_embeds, obj_embeds_pre, obj_sem_cls"
  },
  {
    "path": "modules/weights.py",
    "content": "import torch.nn as nn\n\ndef _init_weights_bert(module, std=0.02):\n    \"\"\"\n        Huggingface transformer weight initialization,\n        most commonly for bert initialization\n    \"\"\"\n    if isinstance(module, nn.Linear):\n        # Slightly different from the TF version which uses truncated_normal for initialization\n        # cf https://github.com/pytorch/pytorch/pull/5617\n        module.weight.data.normal_(mean=0.0, std=std)\n        if module.bias is not None:\n            module.bias.data.zero_()\n    elif isinstance(module, nn.Embedding):\n        module.weight.data.normal_(mean=0.0, std=std)\n        if module.padding_idx is not None:\n            module.weight.data[module.padding_idx].zero_()\n    elif isinstance(module, nn.LayerNorm):\n        module.bias.data.zero_()\n        module.weight.data.fill_(1.0)"
  },
  {
    "path": "optim/__init__.py",
    "content": "from .loss import *"
  },
  {
    "path": "optim/build.py",
    "content": "from optim.loss.loss import Loss\nfrom optim.optimizer.optim import get_optimizer\nfrom optim.scheduler import get_scheduler\n\n\ndef build_optim(cfg, params, total_steps):\n    loss = Loss(cfg)\n    optimizer = get_optimizer(cfg, params)\n    scheduler = get_scheduler(cfg, optimizer, total_steps)\n    return loss, optimizer, scheduler\n"
  },
  {
    "path": "optim/loss/__init__.py",
    "content": "from .contra_loss import *\n"
  },
  {
    "path": "optim/loss/contra_loss.py",
    "content": "import numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom optim.loss.loss import LOSS_REGISTRY\nfrom einops import rearrange, einsum\n\nfrom common.dist_utils import all_gather\n\n\n@LOSS_REGISTRY.register()\nclass TextObjWithinBatch(nn.Module):\n    def __init__(self, cfg):\n        super().__init__()\n        self.distributed = cfg.num_gpu > 1\n        self.bce = cfg.task in [\"ScanQA\"]\n\n    def forward(self, data_dict):\n        obj_feats = data_dict[\"intra_obj_embeds\"]     # B, O, D\n        text_feats = data_dict[\"intra_text_embed\"]    # B, D, feature of CLS token\n        labels = data_dict[\"tgt_object_id\"]            # B, 1\n        masks = data_dict[\"obj_masks\"]\n\n        # B*L vs. B in per-scene scenario\n        if obj_feats.shape[0] != masks.shape[0]:\n            masks = masks.unsqueeze(1).repeat(1, int(obj_feats.shape[0] / masks.shape[0]), 1).view(-1, masks.shape[1])\n            labels = labels.view(-1, 1)\n\n        obj_feats = F.normalize(obj_feats, dim=-1, p=2)\n        text_feats = F.normalize(text_feats, dim=-1, p=2)\n        obj2text_logits = einsum(obj_feats, text_feats, \"b o d, b d -> b o\")\n        obj2text_logits = obj2text_logits\n        labels = labels.squeeze(-1)\n        if self.bce:\n            loss = F.binary_cross_entropy_with_logits(obj2text_logits, labels.float(), reduction=\"sum\", weight=masks) / float(labels.shape[0])\n        else:\n            obj2text_logits.masked_fill_(masks.logical_not(), -float('inf'))\n            loss = F.cross_entropy(obj2text_logits, labels)\n        return loss\n\n\n@LOSS_REGISTRY.register()\nclass TextObjBetweenBatch(nn.Module):\n    def __init__(self, cfg):\n        super().__init__()\n        self.distributed = cfg.num_gpu > 1\n        self.logit_scale = nn.Parameter((torch.ones([]) * np.log(1 / 0.07)).exp())\n\n    def forward(self, data_dict):\n        logit_scale = torch.clamp(self.logit_scale, max=100)\n        obj_feats = data_dict[\"inter_obj_embeds\"]     # B, O, D\n        text_feats = data_dict[\"inter_text_embed\"]    # B, D, feature of CLS token\n        labels = data_dict[\"tgt_object_id\"]            # B, 1\n\n        if obj_feats.shape[0] != labels.shape[0]:\n            labels = labels.view(-1, 1)\n\n        tgt_obj_feats = obj_feats[torch.arange(labels.size(0)), labels[:, 0], :]       # B, D\n        tgt_obj_feats = F.normalize(tgt_obj_feats, dim=-1, p=2)\n        text_feats = F.normalize(text_feats, dim=-1, p=2)\n        if self.distributed:\n            tgt_obj_feats, text_feats = all_gather([\n                tgt_obj_feats, text_feats\n            ])\n        pseudo_labels = torch.arange(text_feats.shape[0]).to(text_feats.device)  # B,\n        text2obj_logits = logit_scale * text_feats @ tgt_obj_feats.t()  # B, B\n        obj2text_logits = logit_scale * tgt_obj_feats @ text_feats.t()  # B, B\n        t2o = F.cross_entropy(text2obj_logits, pseudo_labels)\n        o2t = F.cross_entropy(obj2text_logits, pseudo_labels)\n        loss = (t2o + o2t) / 2\n        return loss\n\n\n@LOSS_REGISTRY.register()\nclass TextSceneBetweenBatch(nn.Module):\n    def __init__(self, cfg):\n        super().__init__()\n        self.distributed = cfg.num_gpu > 1\n        self.logit_scale = nn.Parameter((torch.ones([]) * np.log(1 / 0.07)).exp())\n\n    def forward(self, data_dict):\n        logit_scale = torch.clamp(self.logit_scale, max=100)\n        scene_feats = data_dict[\"scene_embed\"]     # B, O, D\n        text_feats = data_dict[\"scene_text_embed\"]    
# B, D, feature of CLS token\n\n        scene_feats = F.normalize(scene_feats, dim=-1, p=2)\n        text_feats = F.normalize(text_feats, dim=-1, p=2)\n        if self.distributed:\n            scene_feats, text_feats = all_gather([\n                scene_feats, text_feats\n            ])\n        pseudo_labels = torch.arange(text_feats.shape[0]).to(text_feats.device)  # B,\n        text2scene_logits = logit_scale * text_feats @ scene_feats.t()  # B, B\n        scene2text_logits = logit_scale * scene_feats @ text_feats.t()  # B, B\n        t2s = F.cross_entropy(text2scene_logits, pseudo_labels)\n        s2t = F.cross_entropy(scene2text_logits, pseudo_labels)\n        loss = (t2s + s2t) / 2\n        return loss\n\n\nif __name__ == \"__main__\":\n    # smoke test with the keys the losses actually read\n    B, O, D = 32, 10, 512\n    data_dict = {\n        \"intra_obj_embeds\": torch.randn(B, O, D),\n        \"intra_text_embed\": torch.randn(B, D),\n        \"inter_obj_embeds\": torch.randn(B, O, D),\n        \"inter_text_embed\": torch.randn(B, D),\n        \"tgt_object_id\": torch.randint(0, O, (B, 1)),\n        \"obj_masks\": torch.ones(B, O).bool(),\n    }\n    from omegaconf import OmegaConf\n    cfg = OmegaConf.create({\"num_gpu\": 1, \"task\": \"ScanRefer\"})  # task only toggles the BCE branch\n    text2obj_loss = TextObjWithinBatch(cfg)(data_dict)\n    obj2text_loss = TextObjBetweenBatch(cfg)(data_dict)\n    print(text2obj_loss, obj2text_loss)\n"
  },
  {
    "path": "optim/loss/loss.py",
    "content": "import torch.nn as nn\nimport torch.nn.functional as F\nfrom fvcore.common.registry import Registry\n\n\nLOSS_REGISTRY = Registry(\"loss\")\n\ndef og3d_loss(data_dict):\n    return F.cross_entropy(data_dict[\"og3d_logits\"], data_dict[\"tgt_object_id\"].squeeze(1))\n\n\ndef og3d_multi_loss(data_dict):\n    return F.binary_cross_entropy_with_logits(\n        data_dict[\"og3d_logits\"],\n        data_dict[\"tgt_object_id\"].float(),\n        reduction=\"sum\") / float(data_dict[\"tgt_object_id\"].shape[0])\n\n\ndef txt_cls_multi_loss(data_dict):\n    return F.binary_cross_entropy_with_logits(\n        data_dict[\"txt_cls_logits\"],\n        data_dict[\"tgt_object_label\"].float(),\n        reduction='sum') / float(data_dict[\"tgt_object_label\"].shape[0])\n\n\ndef obj_cls_raw_loss(data_dict):\n    return (\n        F.cross_entropy(\n            data_dict[\"obj_cls_raw_logits\"].permute(0, 2, 1), data_dict[\"obj_labels\"], reduction='none'\n        ) * data_dict[\"obj_masks\"]\n    ).sum() / data_dict[\"obj_masks\"].sum()\n\n\ndef obj_cls_pre_loss(data_dict):\n    return (\n        F.cross_entropy(\n            data_dict[\"obj_cls_pre_logits\"].permute(0, 2, 1), data_dict[\"obj_labels\"], reduction='none'\n        ) * data_dict[\"obj_masks\"]\n    ).sum() / data_dict[\"obj_masks\"].sum()\n\n\ndef obj_cls_post_loss(data_dict):\n    return (\n        F.cross_entropy(\n            data_dict[\"obj_cls_post_logits\"].permute(0, 2, 1), data_dict[\"obj_labels\"], reduction='none'\n        ) * data_dict[\"obj_masks\"]\n    ).sum() / data_dict[\"obj_masks\"].sum()\n\n\ndef answer_loss(data_dict):\n    return F.binary_cross_entropy_with_logits(\n            data_dict[\"answer_scores\"], data_dict[\"answer_label\"].float(), reduction='sum'\n        ) / data_dict[\"answer_scores\"].shape[0]\n\n\ndef lm_cls_loss(data_dict):\n    target_labels = data_dict[\"masked_lm_labels\"]\n    target_labels = target_labels.view(-1, target_labels.size(-1)) if len(target_labels.size()) == 3 else target_labels\n    return F.cross_entropy(\n            data_dict[\"txt_lm_cls_logits\"].permute(0, 2, 1), target_labels, ignore_index=-1\n        )\n\n\ndef obj_cls_pre_loss_mask(data_dict):\n    return (\n        F.cross_entropy(\n            data_dict[\"obj_cls_pre_logits\"].permute(0, 2, 1), data_dict[\"obj_labels\"], reduction='none'\n        ) * data_dict[\"obj_masks\"] * data_dict[\"obj_sem_masks\"].logical_not()\n    ).sum() / (data_dict[\"obj_masks\"] * data_dict[\"obj_sem_masks\"].logical_not()).sum()\n\n\ndef obj_cls_pre_loss_unmask(data_dict):\n    return (\n        F.cross_entropy(\n            data_dict[\"obj_cls_pre_logits\"].permute(0, 2, 1), data_dict[\"obj_labels\"], reduction='none'\n        ) * data_dict[\"obj_masks\"] * data_dict[\"obj_sem_masks\"]\n    ).sum() / (data_dict[\"obj_masks\"] * data_dict[\"obj_sem_masks\"]).sum()\n\n\ndef obj_cls_post_loss_mask(data_dict):\n    return (\n        F.cross_entropy(\n            data_dict[\"obj_cls_post_logits\"].permute(0, 2, 1), data_dict[\"obj_labels\"], reduction='none'\n        ) * data_dict[\"obj_masks\"] * data_dict[\"obj_sem_masks\"].logical_not()\n    ).sum() / (data_dict[\"obj_masks\"] * data_dict[\"obj_sem_masks\"].logical_not()).sum()\n\n\ndef obj_cls_post_loss_unmask(data_dict):\n    return (\n        F.cross_entropy(\n            data_dict[\"obj_cls_post_logits\"].permute(0, 2, 1), data_dict[\"obj_labels\"], reduction='none'\n        ) * data_dict[\"obj_masks\"] * data_dict[\"obj_sem_masks\"]\n    ).sum() / (data_dict[\"obj_masks\"] * 
data_dict[\"obj_sem_masks\"]).sum()\n\n\ndef obj_cls_loss(data_dict, smoothing=0.3):\n    return (\n        F.cross_entropy(\n            data_dict[\"obj_logits\"].permute(0, 2, 1), data_dict[\"obj_labels\"],\n            reduction='none', label_smoothing=smoothing\n        ) * data_dict[\"obj_masks\"]\n    ).sum() / data_dict[\"obj_masks\"].sum()\n\n\ndef mse_loss(data_dict):\n    return (\n        ((data_dict[\"pred_images\"] - data_dict[\"target_images\"]) ** 2).mean()\n    )\n\n\nclass Loss(nn.Module):\n    def __init__(self, cfg):\n        # e.g.  refer_loss_v1: [\"og3d_loss\", \"txt_cls_loss\", \"obj_cls_raw_loss\", \"obj_cls_pre_loss\", \"obj_cls_post_loss\"]\n        #       qa_loss_v1: [\"og3d_loss\", \"txt_cls_loss\", \"obj_cls_raw_loss\", \"obj_cls_pre_loss\", \"obj_cls_post_loss\", \"answer_loss\"]\n        #       pretrain_loss_v1: [\"lm_cls_loss\", \"obj_cls_raw_loss\", \"obj_cls_pre_loss\", \"obj_cls_post_loss\", \"obj_cls_pre_loss_mask\",\n        #                           \"obj_cls_pre_loss_unmask\", \"obj_cls_post_loss_mask\", \"obj_cls_post_loss_unmask\"]\n        super().__init__()\n        self.all_keys = list(set(cfg.model.vis_loss_list + cfg.model.loss_list))\n        self.selected_keys = cfg.model.loss_list\n\n        self.loss_fn = {}\n        for k in self.all_keys:\n            if k in globals().keys():\n                self.loss_fn[k] = globals()[k]\n                print(f\"Using {k} from loss.globals()\")\n            else:\n                self.loss_fn[k] = LOSS_REGISTRY.get(k)(cfg)\n                setattr(self, k, self.loss_fn[k]) # register the loss module, otherwise its parameters will not be the same device as the model\n                print(f\"Using {k} from Registry {LOSS_REGISTRY._name}\")\n\n    def forward(self, data_dict):\n        all_losses = {}\n        for k, fn in self.loss_fn.items():\n            if k == 'txt_cls_loss' and 'txt_cls_label' not in data_dict: # compatible with old version of txt_cls_loss\n                data_dict['txt_cls_label'] = data_dict[\"tgt_object_label\"].squeeze(1)\n            cur_loss = fn(data_dict)\n            if not isinstance(cur_loss, list):\n                cur_dict_loss = {k : cur_loss}\n            else:\n                cur_dict_loss = {k: cur_loss[0]}\n                for ck, cv in cur_loss[1].items():\n                    cur_dict_loss[k + \"_\" + ck] = cv\n            all_losses.update(cur_dict_loss)\n        \n        selected_losses = {k: all_losses[k] for k in self.selected_keys}\n        total_loss = sum(selected_losses.values())\n        all_losses[\"total_loss\"] = total_loss\n        return total_loss, all_losses\n"
  },
  {
    "path": "optim/optimizer/__init__.py",
    "content": ""
  },
  {
    "path": "optim/optimizer/optim.py",
    "content": "import torch.optim as optim\n\nfrom fvcore.common.registry import Registry\nOPTIM_REGISTRY = Registry(\"loss\")\n\nfrom common.type_utils import cfg2dict\n\n\ndef get_optimizer(cfg, params):\n  if getattr(optim, cfg.solver.optim.name, None) is not None:\n    optimizer = getattr(optim, cfg.solver.optim.name)(params, **cfg2dict(cfg.solver.optim.args))\n  else:\n    optimizer = OPTIM_REGISTRY.get(cfg.solver.optim.name)(params, **cfg2dict(cfg.solver.optim.args))\n  return optimizer\n"
  },
  {
    "path": "optim/scheduler.py",
    "content": "import math\nfrom torch.optim.lr_scheduler import LambdaLR\n\n\ndef warmup_cosine(step, warmup_step, total_step, minimum_ratio=1e-5):\n    if step <= warmup_step and warmup_step > 0:\n        return step / warmup_step\n    return max(\n        0.5 * (1 + math.cos((step - warmup_step) / (total_step - warmup_step) * math.pi)),\n        minimum_ratio\n    )\n\n\ndef warmup_exp(step, warmup_step, total_step, **kwargs):\n    if step <= warmup_step and warmup_step > 0:\n        return step / warmup_step\n    return kwargs[\"gamma\"] ** (step * 1. / (total_step - warmup_step))\n\n\ndef get_scheduler(cfg, optimizer, total_steps):\n    warmup_steps = cfg.solver.sched.args.warmup_steps * cfg.num_gpu\n    minimum_ratio = cfg.solver.sched.args.get(\"minimum_ratio\", 1e-5)\n    lambda_func = lambda step: globals()[cfg.solver.sched.name](\n        step, warmup_steps, total_steps, minimum_ratio=minimum_ratio\n    )\n    return LambdaLR(optimizer=optimizer, lr_lambda=lambda_func)\n"
  },
  {
    "path": "optim/utils.py",
    "content": "def no_decay_param_group(parameters, lr):\n    no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight']\n    decay_params = []\n    no_decay_params = []\n    for n, p in parameters:\n        if p.requires_grad == False:\n            continue\n        if not any(nd in n for nd in no_decay):\n            decay_params.append(p)\n        else:\n            no_decay_params.append(p)\n    optimizer_grouped_parameters = [\n        {'params': decay_params,\n         'weight_decay': 0.01, 'lr': lr},\n        {'params': no_decay_params,\n         'weight_decay': 0.0, 'lr': lr}\n    ]\n    return optimizer_grouped_parameters\n"
  },
  {
    "path": "preprocess/README.md",
    "content": "## Data Processing\n\nWe have released the preprocessing scripts for 3RScan, MultiScan, ARKitScenes and Structured3D. They are designed to provide a comprehensive framework for data preparation. Taking 3RScan as an example, the process involves the following steps:\n\n- Import raw meshes and annotations from each dataset.  \n- Extract vertices from the mesh and assign both instance and semantic labels to them.  \n- Map the dataset-specific semantic labels to ScanNet 607. This is optional for SceneVerse training but may be required for closed-vocab training ([example](https://github.com/scene-verse/SceneVerse/blob/b936f96b61614bec32282e5eed7de844d1a7a330/preprocess/rscan.py#L58)).\n- Axis Alignment: Rotate the 3D point clouds so that most 3D object bounding boxes are axis-aligned. This follows ScanRefer, and is currently implemented as a heuristic search ([example](https://github.com/scene-verse/SceneVerse/blob/b936f96b61614bec32282e5eed7de844d1a7a330/preprocess/rscan.py#L95)).  \n- Translation Alignment: Translate the 3D point clouds so that its origin at the center on the floor ([example](https://github.com/scene-verse/SceneVerse/blob/b936f96b61614bec32282e5eed7de844d1a7a330/preprocess/rscan.py#L102)).  \n- Color Alignment: The color value should be within the [0, 255] range ([example](https://github.com/scene-verse/SceneVerse/blob/b936f96b61614bec32282e5eed7de844d1a7a330/preprocess/rscan.py#L98)).\n- Point subsampling: subsample the point clouds if the number of points exceeds 240K.\n    ```python\n    PTS_LIMIT = 240000\n    if out_points.shape[0] > PTS_LIMIT:\n        pcd_idxs = np.random.choice(out_points.shape[0], size=PTS_LIMIT, replace=False)\n        out_points = out_points[pcd_idxs]\n        out_colors = out_colors[pcd_idxs]\n        instance_labels = instance_labels[pcd_idxs]\n    ```\n\nThe detailed steps may vary between datasets. Please note the translation and color alignment are critical as they can significantly impact performance. Axis alignment, which requires 3D bounding box annotations, may result in slight but not severe degradation.\n\n### 3RScan\nTo reproduce the data preprocessing, download [3RScan](https://waldjohannau.github.io/RIO/) and run:\n```shell\n# Preprocess 3RScan \n$ python rscan.py\n```\nAdjust the `data_root`, `save_root` and `num_workers` accordingly.\n\n### HM3D\nAs some of our users requested the mapping between HM3D object id in SceneVerse to HM3D-semantics, we have added an additional file ([HM3D_tgtID2objID.zip](assets/HM3D_tgtID2objID.zip)) to obtain this mapping. The json file for each scene contains a dictionary of ```{<sceneverse_objid>:[hm3d_objid, hm3d_label]}```.\n* Note: The script ```sceneverse2hmsemantic.py``` has been deprecated as it cannot reproduce the mappings above. It currently points out how we read the semantics from the annotations in HM3D-semantics.\n\n\n## Prepare for your custom datasets\nTo prepare your custom data for inference, follow the previous steps and  the example script for 3RScan. A convenient way for verification is to use the `visualize_data.py`. If everything is correct, you should observe the colored point clouds displayed similarly to those in the released version of SceneVerse.\n\n## Scene Graph Generation\nWe also release the [scripts](preprocess/ssg/README.md) for 3D scene graph generation."
  },
  {
    "path": "preprocess/__init__.py",
    "content": "from .build import *\nfrom .utils import *\nfrom .rscan import *\nfrom .multiscan import *\nfrom .arkitscenes import *\n"
  },
  {
    "path": "preprocess/arkitscenes.py",
    "content": "import json\nfrom glob import glob\nfrom omegaconf import OmegaConf\nfrom joblib import Parallel, delayed, parallel_backend\n\nimport torch\nimport numpy as np\nimport trimesh\nfrom tqdm import tqdm\nfrom scipy.spatial.transform import Rotation\n\nfrom preprocess.build import ProcessorBase\nfrom preprocess.utils.label_convert import ARKITSCENE_SCANNET as label_convert\nfrom preprocess.utils.align_utils import compute_box_3d, calc_align_matrix, rotate_z_axis_by_degrees\nfrom preprocess.utils.constant import *\n\n\nclass ARKitScenesProcessor(ProcessorBase):\n    def record_splits(self, scan_ids):\n        split_dir = self.save_root / 'split'\n        split_dir.mkdir(exist_ok=True)\n        if (split_dir / 'train_split.txt').exists() and (split_dir / 'val_split.txt').exists():\n            return\n        split = {\n            'train': [],\n            'val': []}\n        split['train'] = [scan_id[1] for scan_id in scan_ids if scan_id[0] == 'Training']\n        split['val'] = [scan_id[1] for scan_id in scan_ids if scan_id[0] == 'Validation']\n        for _s, _c in split.items():\n            with open(split_dir / f'{_s}_split.txt', 'w', encoding='utf-8') as fp:\n                fp.write('\\n'.join(_c))\n\n    def read_all_scans(self):\n        scan_ids = []\n        for split in ['Training', 'Validation']:\n            scan_paths = glob(str(self.data_root) + f'/{split}/*')\n            scan_ids.extend([(split, path.split('/')[-1]) for path in scan_paths])\n        return scan_ids\n\n    def process_point_cloud(self, scan_id, plydata, annotations):\n        vertices = plydata.vertices\n        vertex_colors = plydata.visual.vertex_colors\n        vertex_colors = vertex_colors[:, :3]\n\n        vertex_instance = np.zeros((vertices.shape[0]))\n        inst_to_label = {}\n        bbox_list = []\n\n        for _i, label_info in enumerate(annotations[\"data\"]):\n            obj_label = label_info[\"label\"]\n            object_id = _i + 1\n            rotation = np.array(label_info[\"segments\"][\"obbAligned\"][\"normalizedAxes\"]).reshape(3, 3)\n            r = Rotation.from_matrix(rotation)\n\n            transform = np.array(label_info[\"segments\"][\"obbAligned\"][\"centroid\"]).reshape(-1, 3)\n            scale = np.array(label_info[\"segments\"][\"obbAligned\"][\"axesLengths\"]).reshape(-1, 3)\n            trns = np.eye(4)\n            trns[0:3, 3] = transform\n            trns[0:3, 0:3] = rotation.T\n            box_trimesh_fmt = trimesh.creation.box(scale.reshape(3,), trns)\n            obj_containment = np.argwhere(box_trimesh_fmt.contains(vertices))\n\n            vertex_instance[obj_containment] = object_id\n            inst_to_label[object_id] = label_convert[obj_label]\n\n            box3d = compute_box_3d(scale.reshape(3).tolist(), transform, rotation)\n            bbox_list.append(box3d)\n        if len(bbox_list) == 0:\n            return\n\n        align_angle = calc_align_matrix(bbox_list)\n        vertices = rotate_z_axis_by_degrees(np.array(vertices), align_angle)\n        if np.max(vertex_colors) <= 1:\n            vertex_colors = vertex_colors * 255.0\n        center_points = np.mean(vertices, axis=0)\n        center_points[2] = np.min(vertices[:, 2])\n        vertices = vertices - center_points\n\n        assert vertex_colors.shape == vertices.shape\n        assert vertex_colors.shape[0] == vertex_instance.shape[0]\n\n        if self.check_key(self.output.pcd):\n            torch.save(inst_to_label, self.inst2label_path / f\"{scan_id}.pth\")\n            
torch.save((vertices, vertex_colors, vertex_instance), self.pcd_path / f\"{scan_id}.pth\")\n            np.save(self.pcd_path / f\"{scan_id}_align_angle.npy\", align_angle)\n\n    def scene_proc(self, scan_id):\n        split = scan_id[0]\n        scan_id = scan_id[1]\n        data_root = self.data_root / split / scan_id\n\n        if not (data_root / f'{scan_id}_3dod_mesh.ply').exists():\n            return\n        if not (data_root / f'{scan_id}_3dod_annotation.json').exists():\n            return\n\n        plydata = trimesh.load(data_root / f'{scan_id}_3dod_mesh.ply', process=False)\n        with open((data_root / f'{scan_id}_3dod_annotation.json'), \"r\", encoding='utf-8') as f:\n            annotations = json.load(f)\n\n        # process point cloud\n        self.process_point_cloud(scan_id, plydata, annotations)\n\n    def process_scans(self):\n        scan_ids = self.read_all_scans()\n        self.log_starting_info(len(scan_ids))\n\n        if self.num_workers > 1:\n            with parallel_backend('multiprocessing', n_jobs=self.num_workers):\n                Parallel()(delayed(self.scene_proc)(scan_id) for scan_id in tqdm(scan_ids))\n        else:\n            for scan_id in tqdm(scan_ids):\n                self.scene_proc(scan_id)\n\n\nif __name__ == '__main__':\n    cfg = OmegaConf.create({\n        'data_root': '/path/to/ARKitScenes',\n        'save_root': '/output/path/to/ARKitScenes',\n        'num_workers': 1,\n        'output': {\n            'pcd': True,\n        }\n    })\n    processor = ARKitScenesProcessor(cfg)\n    processor.process_scans()\n"
  },
  {
    "path": "preprocess/build.py",
    "content": "from pathlib import Path\nfrom fvcore.common.registry import Registry\n\nPROCESSOR_REGISTRY = Registry(\"Processor\")\n\n\nclass ProcessorBase:\n    def __init__(self, cfg):\n        self.data_root = Path(cfg.data_root)\n        self.save_root = Path(cfg.save_root) if cfg.get('save_root', None) else self.data_root.parent / 'scan_data'\n        self.num_workers = cfg.num_workers\n        self.inst2label_path = self.save_root / 'scan_data' / 'instance_id_to_label'\n        self.pcd_path = self.save_root / 'scan_data' / 'pcd_with_global_alignment'\n        self.segm_path = self.save_root / 'scan_data' / 'segm'\n        self.obj_path = self.save_root / 'scan_data' / 'obj'\n        self.sp_path = self.save_root / 'scan_data' / 'super_points'\n\n        self.output = cfg.output\n\n        self.setup_directories()\n\n    def setup_directories(self):\n        if self.check_key(self.output.pcd):\n            self.inst2label_path.mkdir(parents=True, exist_ok=True)\n            self.pcd_path.mkdir(parents=True, exist_ok=True)\n\n    def log_starting_info(self, scan_len, e=''):\n        print('='*50)\n        print(f'Preprocessing in {self.__class__.__name__} with {scan_len} scans')\n        o = [str(i) for i in self.output if i]\n        assert len(o) > 0, 'Please specify at least one output type'\n        print(f\"Output: {', '.join(o)}\")\n        if len(e) > 0:\n            print(e)\n        print('='*50)\n\n    @staticmethod\n    def check_key(key):\n        exist = key is not None\n        if not exist:\n            return False\n        if isinstance(key, bool):\n            enabled = key\n        elif isinstance(key, dict):\n            enabled = key.get('enabled', True)\n        elif hasattr(key, 'enabled'):\n            enabled = key.get('enabled')\n        else:\n            enabled = True\n        return enabled\n"
  },
  {
    "path": "preprocess/multiscan.py",
    "content": "import re\nimport json\nfrom glob import glob\nfrom omegaconf import OmegaConf\nfrom joblib import Parallel, delayed, parallel_backend\n\nimport torch\nfrom plyfile import PlyData\nimport numpy as np\nimport pandas as pd\nfrom tqdm import tqdm\n\nfrom preprocess.build import ProcessorBase\nfrom preprocess.utils.label_convert import MULTISCAN_SCANNET as label_convert\nfrom preprocess.utils.constant import *\n\n\nclass MultiScanProcessor(ProcessorBase):\n    def record_splits(self, scan_ids, ratio=0.8):\n        split_dir = self.save_root / 'split'\n        split_dir.mkdir(exist_ok=True)\n        if (split_dir / 'train_split.txt').exists() and (split_dir / 'val_split.txt').exists():\n            return\n        scan_len = len(scan_ids)\n        split = {\n            'train': [],\n            'val': []}\n        cur_split = 'train'\n        for scan_id in tqdm(sorted(scan_ids)):\n            split[cur_split].append(scan_id)\n            if len(split['train']) > ratio*scan_len:\n                cur_split = 'val'\n        for _s, _c in split.items():\n            with open(split_dir / f'{_s}_split.txt', 'w', encoding='utf-8') as fp:\n                fp.write('\\n'.join(_c))\n\n    def read_all_scans(self):\n        scan_paths = glob(str(self.data_root) + '/*')\n        scans_df = []\n        for scan_path in scan_paths:\n            scan_id = re.findall(r\"scene\\_[0-9]{5}\\_[0-9]{2}\", scan_path)[0]\n            scene_id = '_'.join(scan_id.split('_')[:-1])\n            row = pd.DataFrame([[scene_id, scan_id, scan_path]],\n                                columns=['sceneId', 'scanId', 'scanPath'])\n            scans_df.append(row)\n        scans_df = pd.concat(scans_df)\n        return scans_df\n\n    def process_point_cloud(self, scan_id, plydata, annotations):\n        inst_to_label = {}\n        _x = np.asarray(plydata['vertex']['x'])\n        _y = np.asarray(plydata['vertex']['y'])\n        _z = np.asarray(plydata['vertex']['z'])\n        _nx = np.asarray(plydata['vertex']['nx'])\n        _ny = np.asarray(plydata['vertex']['ny'])\n        _nz = np.asarray(plydata['vertex']['nz'])\n        _red = plydata['vertex']['red'].astype('float64')\n        _green = plydata['vertex']['green'].astype('float64')\n        _blue = plydata['vertex']['blue'].astype('float64')\n\n        vertices = np.column_stack((_x, _y, _z))\n        vertex_colors = np.column_stack((_red, _green, _blue))\n        vertex_instance = np.zeros((vertices.shape[0]))\n        triangles = np.vstack(plydata['face'].data['vertex_indices'])\n\n        object_ids = plydata['face'].data['objectId']\n        part_ids = plydata['face'].data['partId']\n        semseg_df = pd.DataFrame({'objectId': object_ids, 'partId': part_ids})\n\n        df = self.annotations_to_dataframe_obj(annotations)\n        for _, row in df.iterrows():\n            object_id = row['objectId']\n            assert object_id > 0, f\"object id should be greater than 0, but got {object_id}\"\n            object_label = row['objectLabel'].split('.')[0]\n            object_label_sn607 = label_convert[object_label]\n\n            condition1 = semseg_df['objectId'] == object_id\n            tri_indices = semseg_df[condition1].index.values\n            object_vertices = np.unique(triangles[tri_indices])\n            vertex_instance[object_vertices] = object_id\n            inst_to_label[object_id] = object_label_sn607\n\n        if np.max(vertex_colors) <= 1:\n            vertex_colors = vertex_colors * 255.0\n        center_points = np.mean(vertices, 
axis=0)\n        center_points[2] = np.min(vertices[:, 2])\n        vertices = vertices - center_points\n        assert vertex_colors.shape == vertices.shape\n        assert vertex_colors.shape[0] == vertex_instance.shape[0]\n\n        if self.check_key(self.output.pcd):\n            torch.save(inst_to_label, self.inst2label_path / f\"{scan_id}.pth\")\n            torch.save((vertices, vertex_colors, vertex_instance), self.pcd_path / f\"{scan_id}.pth\")\n\n    @staticmethod\n    def annotations_to_dataframe_obj(annotations):\n        objects = annotations['objects']\n        df_list = []\n        for obj in objects:\n            object_id = obj['objectId']\n            object_label = obj['label']\n            df_row = pd.DataFrame(\n                [[object_id, object_label]],\n                columns=['objectId', 'objectLabel']\n            )\n            df_list.append(df_row)\n        df = pd.concat(df_list)\n        return df\n\n    def scene_proc(self, scan_id):\n        data_root = self.data_root / scan_id\n        plydata = PlyData.read(data_root / f'{scan_id}.ply')\n        with open((data_root / f'{scan_id}.annotations.json'), \"r\", encoding='utf-8') as f:\n            annotations = json.load(f)\n\n        # process point cloud\n        self.process_point_cloud(scan_id, plydata, annotations)\n\n    def process_scans(self):\n        scans_df = self.read_all_scans()\n        scan_ids = scans_df['scanId'].unique()\n        self.log_starting_info(len(scan_ids))\n\n        if self.num_workers > 1:\n            with parallel_backend('multiprocessing', n_jobs=self.num_workers):\n                Parallel()(delayed(self.scene_proc)(scan_id) for scan_id in tqdm(scan_ids))\n        else:\n            for scan_id in tqdm(scan_ids):\n                print(scan_id)\n                self.scene_proc(scan_id)\n\n\nif __name__ == '__main__':\n    cfg = OmegaConf.create({\n        'data_root': '/path/to/MultiScan',\n        'save_root': '/output/path/to/MultiScan',\n        'num_workers': 1,\n        'output': {\n            'pcd': True,\n        }\n    })\n    processor = MultiScanProcessor(cfg)\n    processor.process_scans()\n"
  },
  {
    "path": "preprocess/rscan.py",
    "content": "import json\nfrom glob import glob\nfrom omegaconf import OmegaConf\nfrom joblib import Parallel, delayed, parallel_backend\n\nimport torch\nimport numpy as np\nimport trimesh\nimport open3d as o3d\nfrom tqdm import tqdm\n\nfrom preprocess.build import ProcessorBase\nfrom preprocess.utils.label_convert import RSCAN_SCANNET as label_convert\nfrom preprocess.utils.align_utils import compute_box_3d, calc_align_matrix, rotate_z_axis_by_degrees\nfrom preprocess.utils.constant import *\n\n\nclass RScanProcessor(ProcessorBase):\n    def record_splits(self, scan_ids, ratio=0.8):\n        split_dir = self.save_root / 'split'\n        split_dir.mkdir(exist_ok=True)\n        if (split_dir / 'train_split.txt').exists() and (split_dir / 'val_split.txt').exists():\n            return\n        scan_len = len(scan_ids)\n        split = {\n            'train': [],\n            'val': []}\n        cur_split = 'train'\n        for scan_id in tqdm(sorted(scan_ids)):\n            split[cur_split].append(scan_id)\n            if len(split['train']) > ratio*scan_len:\n                cur_split = 'val'\n        for _s, _c in split.items():\n            with open(split_dir / f'{_s}_split.txt', 'w', encoding='utf-8') as fp:\n                fp.write('\\n'.join(_c))\n\n    def read_all_scans(self):\n        scan_paths = glob(str(self.data_root) + '/*')\n        scan_ids = [path.split('/')[-1] for path in scan_paths]\n        return scan_ids\n\n    def process_point_cloud(self, scan_id, plydata, annotations):\n        plylabel, segments, aggregation = annotations\n        vertices = plydata.vertices\n        vertex_colors = trimesh.visual.uv_to_color(plydata.visual.uv, plydata.visual.material.image)\n        vertex_colors = vertex_colors[:, :3] / 255.0\n\n        none_list = list()\n        seg_to_inst = {} # segment id to object id\n        inst_to_label = {} # object id to label name\n        seg_indices = segments['segIndices']\n        seg_group = aggregation['segGroups']\n        bbox_list = []\n        for i, _ in enumerate(seg_group):\n            if seg_group[i]['label'] not in label_convert:\n                none_list.append(seg_group[i]['label'])\n                continue\n            inst_to_label[seg_group[i]['id']] = label_convert[seg_group[i]['label']]\n\n            rotation = np.array(seg_group[i][\"obb\"][\"normalizedAxes\"]).reshape(3, 3)\n            transform = np.array(seg_group[i][\"obb\"][\"centroid\"]).reshape(-1, 3)\n            scale = np.array(seg_group[i][\"obb\"][\"axesLengths\"]).reshape(-1, 3)\n            trns = np.eye(4)\n            trns[0:3, 3] = transform\n            trns[0:3, 0:3] = rotation.T\n            box3d = compute_box_3d(scale.reshape(3).tolist(), transform, rotation)\n            bbox_list.append(box3d)\n\n            for j in seg_group[i]['segments']:\n                seg_to_inst[j] = seg_group[i]['id']\n                assert seg_group[i]['id'] == seg_group[i]['objectId']\n                assert seg_group[i]['id'] > 0\n\n        query_points = vertices\n        pcd = o3d.geometry.PointCloud()\n        pcd.points = o3d.utility.Vector3dVector(np.array(plylabel.vertices, dtype=np.float64))\n        tree = o3d.geometry.KDTreeFlann(pcd)\n\n        out_instance = []\n\n        for i, _ in enumerate(query_points):\n            point = query_points[i]\n            [k, idx, distance] = tree.search_radius_vector_3d(point,0.1)\n            if k == 0:\n                out_instance.append(-1)\n            else:\n                nn_idx = idx[0]\n                if 
seg_indices[nn_idx] not in seg_to_inst.keys():\n                    out_instance.append(-1)\n                else:\n                    out_instance.append(seg_to_inst[seg_indices[nn_idx]])\n\n        # alignment: axis-aligned rotation\n        align_angle = calc_align_matrix(bbox_list)\n        vertices = rotate_z_axis_by_degrees(np.array(vertices), align_angle)\n        # alignment: color range\n        if np.max(vertex_colors) <= 1:\n            vertex_colors = vertex_colors * 255.0\n        # alignment: translation\n        center_points = np.mean(vertices, axis=0)\n        center_points[2] = np.min(vertices[:, 2])\n        vertices= vertices - center_points\n        vertex_instance = np.array(out_instance)\n\n        assert vertex_colors.shape == vertices.shape\n        assert vertex_colors.shape[0] == vertex_instance.shape[0]\n\n        if self.check_key(self.output.pcd):\n            torch.save(inst_to_label, self.inst2label_path / f\"{scan_id}.pth\")\n            torch.save((vertices, vertex_colors, vertex_instance), self.pcd_path / f\"{scan_id}.pth\")\n            np.save(self.pcd_path / f\"{scan_id}_align_angle.npy\", align_angle)\n\n    def scene_proc(self, scan_id):\n        data_root = self.data_root / scan_id\n        plydata = trimesh.load(data_root / 'mesh.refined.v2.obj', process=False)\n        if not (data_root / 'labels.instances.annotated.v2.ply').exists():\n            return\n        plylabel = trimesh.load(data_root / 'labels.instances.annotated.v2.ply', process=False)\n        with open((data_root / 'mesh.refined.0.010000.segs.v2.json'), \"r\", encoding='utf-8') as f:\n            segments = json.load(f)\n        with open((data_root / 'semseg.v2.json'), \"r\", encoding='utf-8') as f:\n            aggregation = json.load(f)\n\n        # process point cloud\n        self.process_point_cloud(scan_id, plydata, (plylabel, segments, aggregation))\n\n    def process_scans(self):\n        scan_ids = self.read_all_scans()\n        self.log_starting_info(len(scan_ids))\n\n        if self.num_workers > 1:\n            with parallel_backend('multiprocessing', n_jobs=self.num_workers):\n                Parallel()(delayed(self.scene_proc)(scan_id) for scan_id in tqdm(scan_ids))\n        else:\n            for scan_id in tqdm(scan_ids):\n                self.scene_proc(scan_id)\n\n\nif __name__ == '__main__':\n    cfg = OmegaConf.create({\n        'data_root': '/path/to/3RScan',\n        'save_root': '/output/path/to/3RScan',\n        'num_workers': 1,\n        'output': {\n            'pcd': True,\n        }\n    })\n    processor = RScanProcessor(cfg)\n    processor.process_scans()\n"
  },
  {
    "path": "preprocess/sceneverse2hmsemantic.py",
    "content": "import os\nimport json\nfrom joblib import Parallel, delayed, parallel_backend\nfrom glob import glob\nfrom tqdm import tqdm\nimport numpy as np\nimport argparse\n\n\ndef load_semantic_anno(semantic_txt):\n    semantic_color = []\n    obj_name_list = []\n    color_2_name = {}\n    color_2_id = {}\n    with open(semantic_txt) as f:\n        lines = f.readlines()[1:]\n        for line in lines:\n            obj_id = int(line.split(',')[0])\n            color_str = line.split(',')[1]\n            if len(color_str) != 6:\n                color_str = '0' * (6 - len(color_str)) + color_str\n            r = int(color_str[0:2], 16)\n            g = int(color_str[2:4], 16)\n            b = int(color_str[4:6], 16)\n            obj_name = line.split(',')[2][1:-1]\n            obj_name_list.append(obj_name)\n            rgb_value = np.array([r, g, b], dtype=np.uint8).reshape(1, 3)\n            semantic_color.append(rgb_value)\n            color_2_name[(r, g, b)] = obj_name\n            color_2_id[(r, g, b)] = obj_id\n    return np.concatenate(semantic_color, axis=0), obj_name_list, color_2_name, color_2_id\n\n\ndef scene_proc(scene_input):\n    scene_name = scene_input.split('/')[-1]\n    scene_uid = scene_name.split('-')[1]\n    sem_dir = scene_input + '/' + scene_uid + '.semantic'\n    print(scene_name)\n\n    # load obj semantics anno\n    semantic_anno_color, obj_name_list, color_2_name, color_2_id = load_semantic_anno(sem_dir+'.txt')\n\n    tgt_id2obj_id = {}\n    # obj assignment and export\n    semantic_anno_set = set(list(zip(*(semantic_anno_color.T))))\n    for _i, sem in enumerate(tqdm(semantic_anno_set)):\n        obj_name = color_2_name[(sem[0], sem[1], sem[2])]\n        obj_id = color_2_id[(sem[0], sem[1], sem[2])]\n        tgt_id2obj_id[_i+1] = (obj_id, obj_name)\n    json.dump(tgt_id2obj_id, open(os.path.join(scene_input, 'tgt_id2obj_id.json'), 'w'), indent=4)\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--data_root', type=str, default='./hm3d-train-annots', help='data root for hm-semantics data')\n    args = parser.parse_args()\n    scene_list = glob(args.data_root + '/*')\n    with parallel_backend('multiprocessing', n_jobs=1):\n        Parallel()(delayed(scene_proc)(scene) for scene in scene_list)"
  },
  {
    "path": "preprocess/ssg/README.md",
    "content": "## Scene Graph Generation\n\nWe have released the scripts to generate 3D scene graphs for the datasets released in SceneVerse. \n\n### Example Usage\nConstruct the following OmegaConfig and run ```python ssg_main.py```\n```python\n    cfg = OmegaConf.create({\n        'dataset': 'MultiScan',\n        'scene_path': 'path/to/SceneVerse',\n        'rels_save_path': './tmp',\n        'visualize': True,\n        'num_workers': 1,\n    })\n```\nNote that the current implementation of scene graph generation assumes a default viewing direction of \"+y\" from outside the 3D scan. Therefore, it can be adapted for situated understanding by allowing manual presetting of the position and viewing direction."
  },
  {
    "path": "preprocess/ssg/relationships/camera.py",
    "content": "import numpy as np\nimport ssg_utils as utils\n\n\ndef getLinearEquation(p1x, p1y, p2x, p2y):\n    sign = 1\n    a = p2y - p1y\n    if a < 0:\n        sign = -1\n        a = sign * a\n    b = sign * (p1x - p2x)\n    c = sign * (p1y * p2x - p1x * p2y)\n    return [a, b, c]\n\n\ndef cal_glocal_position(object, floor, distance_rate=1.6):\n    tgt_pos = object.position\n    room_pos = floor.position\n    room_rect = floor.bottom_rect\n\n    # center\n    center_dis = utils.euclideanDistance(tgt_pos, room_pos, 2)\n    if center_dis < distance_rate:\n        return 'in the center'\n\n    # corner\n    for point in room_rect:\n        if utils.euclideanDistance(tgt_pos, point, 2) < distance_rate:\n            return 'in the corner'\n\n    return None\n\n\ndef cal_camera_relations(ObjNode_dict, camera_position, camera_view, inst_dict, floor_idx, fov = 60):\n    relationships = []\n    for obj_id in ObjNode_dict:\n        if ObjNode_dict[obj_id].label == 'floor': continue\n\n        # camera relation\n        obj_position = ObjNode_dict[obj_id].position\n        vector = obj_position - camera_position\n        vector = vector / np.linalg.norm(vector)\n        angle = utils.get_theta(vector, camera_view)\n\n        a, b, c = getLinearEquation(camera_view[0]+camera_position[0],\n                                    camera_view[1]+camera_position[1],\n                                    camera_position[0],\n                                    camera_position[1])\n\n        if abs(angle) < fov/2:\n            rela = 'in front of'\n        elif abs(angle) > 180 - fov/2:\n            rela = 'behind'\n        elif a*obj_position[0] + b*obj_position[1] + c > 0:\n            rela = 'right' if camera_view[1] > 0 else 'left'\n        else:\n            rela = 'left' if camera_view[1] > 0 else 'right'\n\n        relationships.append(['-1', obj_id, rela])\n\n        # global relation\n        if inst_dict[ObjNode_dict[obj_id].label] > 1:\n            rela = cal_glocal_position(ObjNode_dict[obj_id], ObjNode_dict[floor_idx])\n            if rela is not None:\n\n                # print(ObjNode_dict[obj_id].label, rela)\n                # ObjNode_dict[obj_id].display_obb_box()\n                relationships.append([obj_id, obj_id, rela])\n\n    return relationships\n"
  },
  {
    "path": "preprocess/ssg/relationships/hanging.py",
    "content": "import ssg_utils as utils\n\n\ndef cal_above_below_relationships(ObjNode_dict, src, scene_high):\n\n    above_below_relationships  = []\n\n    rect = src.bottom_rect\n    src_max = src.z_max\n    src_min = src.z_min\n    src_pos = src.position\n\n    for tgt_id in ObjNode_dict:\n        tgt = ObjNode_dict[tgt_id]\n\n        if tgt.label == 'floor' : continue\n\n        tgt_max = tgt.z_max\n        tgt_min = tgt.z_min\n        tgt_pos = tgt.position\n        tgt_rect = tgt.bottom_rect\n\n        if utils.euclideanDistance (tgt.position, src.position, 2) < scene_high * 0.85: # make sure in same room\n            # above\n            if src_min > tgt_max and ( utils.if_inPoly(rect, tgt_pos) or utils.if_inPoly(tgt_rect, src_pos) ) :\n                above_below_relationships.extend(utils.generate_relation(src.id, tgt_id, 'high'))\n\n    return above_below_relationships\n\n\ndef filter_labels(obj_label):\n\n    no_hanging_labels = ['floor', 'table', 'chair', 'desk', 'bottle']\n    for l in no_hanging_labels:\n        if l in obj_label:return False\n\n    return True\n\n\ndef cal_hanging_relationships (ObjNode_dict, no_supported_objs, camera_angle,scene_high, dataset='scannet'):\n    hanging_relationships = []\n\n    for obj_id in ObjNode_dict:\n        if obj_id not in no_supported_objs:\n            obj = ObjNode_dict[obj_id]\n            if not filter_labels(obj.label): continue\n            desp = utils.generate_relation(obj.id, -2, 'hang')\n            if 'tv' in obj.label:\n                desp[2] = 'mounted on'\n            if 'mirror' in obj.label:\n                desp[2] = 'affixed to'\n\n            hanging_relationships.append(desp)\n            hanging_relationships.extend(cal_above_below_relationships(ObjNode_dict, obj, scene_high))\n\n    return hanging_relationships\n"
  },
  {
    "path": "preprocess/ssg/relationships/init.py",
    "content": ""
  },
  {
    "path": "preprocess/ssg/relationships/multi_objs.py",
    "content": "import numpy as np\nimport networkx as nx\nimport itertools\n\nimport ssg_utils as utils\n\n\ndef are_furniture_aligned(furniture1, furniture2, offset_threshold):\n    x1, y1, z1 = furniture1['center']\n    x2, y2, z2 = furniture2['center']\n    h1 = furniture1['size']\n    h2 = furniture2['size']\n    rect1 = furniture1['rect']\n    rect2 = furniture2['rect']\n\n    # x_offset\n    x_offset = abs(x1 - x2)\n    # y_offset\n    y_offset = abs(y1 - y2)\n    # z_offset\n    z_offset = abs(z1 - z2)\n\n    # volumn\n    volumn_diff = abs(utils.get_Poly_Area(rect1) - utils.get_Poly_Area(rect2))\n\n    if volumn_diff > offset_threshold:\n        return False\n    if z_offset > offset_threshold:\n        return False\n\n    if x_offset > offset_threshold and y_offset > offset_threshold:\n        return False\n\n    if x_offset < offset_threshold:\n        return 'x'\n\n    if y_offset < offset_threshold:\n        return 'y'\n\n\ndef find_aligned_furniture(furniture_list, ObjNode_dict, offset_threshold):\n    aligned_furniture = []\n\n    for i, object_id1 in enumerate(furniture_list):\n        obj1 = ObjNode_dict[object_id1]\n        furniture1 = {'center': np.array(obj1.position), 'size': obj1.z_max - obj1.z_min, 'rect': obj1.bottom_rect}\n\n        for j, object_id2 in enumerate(furniture_list[i+1:]):\n            obj2 = ObjNode_dict[object_id2]\n            furniture2 = {'center': np.array(obj2.position), 'size': obj2.z_max - obj2.z_min, 'rect': obj2.bottom_rect}\n            is_aligned = are_furniture_aligned(furniture1, furniture2, offset_threshold)\n            if is_aligned:\n                aligned_group = [obj1.id, obj2.id, is_aligned]\n                aligned_furniture.append(aligned_group)\n\n    aligned_furniture_merge = furniture_merge_lists(aligned_furniture)\n    return aligned_furniture_merge\n\ndef furniture_merge_lists(lists):\n    merged_lists = []\n\n    x_list = [lst[:2] for lst in lists if 'x' in lst]\n    y_list = [lst[:2] for lst in lists if 'y' in lst]\n\n    merged_x_list = merge_sublists(x_list)\n    merged_y_list = merge_sublists(y_list)\n\n    merged_lists.extend(merged_x_list)\n    merged_lists.extend(merged_y_list)\n\n    return merged_lists\n\n\ndef merge_sublists(L):\n    length = len(L)\n    for i in range(1, length):\n        for j in range(i):\n            if 0 in L[i] or 0 in L[j]:\n                continue\n            x = set(L[i]).union(set(L[j]))\n            y = len(L[i]) + len(L[j])\n            if len(x) < y:\n                L[i] = list(x)\n                L[j] = [0]\n\n    return [i for i in L if 0 not in i]\n\n\ndef find_middle_furniture (proximity_relations, ObjNode_dict):\n    # in the middle of\n    middle_relationships = []\n    G = nx.DiGraph()\n    for (src, tgt, rel) in proximity_relations:\n        G.add_edge(src, tgt, label=rel)\n\n    edage_dict = G.edges.data()._adjdict\n    for src_id in ObjNode_dict:\n        if src_id not in edage_dict: continue\n        if ObjNode_dict[src_id].label == 'floor' :continue\n        neighbors = edage_dict[src_id]\n        tgt_ids = list(neighbors.keys())\n        combinations = list(itertools.combinations(tgt_ids, 2))\n\n        for group in combinations:\n            idx1, idx2 = group\n            if 'near' in neighbors[idx1]['label'] and 'near' in neighbors[idx2]['label']:\n\n                direction1 = int(neighbors[idx1]['label'].split(' ')[0])\n                direction2 = int(neighbors[idx2]['label'].split(' ')[0])\n                if abs(direction1 - direction2) == 6:\n              
      middle_relationships.append([[src_id,idx1,idx2], 'in the middle of'])\n\n    return middle_relationships\n\n\nif __name__ == '__main__':\n    # UnitTest\n    lists = [['26', '36', 'x'], ['26', '30', 'x'], ['29', '28', 'y'], ['29', '30', 'y'],\n             ['28', '30', 'y'], ['28', '33', 'x'], ['35', '36', 'y'], ['35', '32', 'y'],\n             ['35', '33', 'y'], ['31', '37', 'x'], ['2', '4', 'y'], ['2', '3', 'y'],\n             ['34', '32', 'y'], ['34', '33', 'y'], ['37', '3', 'x'], ['36', '30', 'x'],\n             ['4', '3', 'y'], ['32', '33', 'y']]\n    output = furniture_merge_lists(lists)\n    print(output)\n"
  },
  {
    "path": "preprocess/ssg/relationships/proximity.py",
    "content": "import numpy as np\nimport itertools\nimport ssg_utils as utils\n\ndef get_direction(src_obj, tgt_obj):\n\n    sx, sy = src_obj\n    tx, ty = tgt_obj\n\n    y = np.array((tx - sx, ty - sy))\n    y = y / np.linalg.norm(y)\n\n    angle_d = utils.get_theta(y, [1, 0])\n\n    direction = round(angle_d / 30)\n\n\n    if ty > sy : # tgt is up\n        if direction == 0: return \"3\"\n        elif direction == 1: return \"2\"\n        elif direction == 2: return \"1\"\n        elif direction == 3: return \"12\"\n        elif direction == 4: return \"11\"\n        elif direction == 5: return \"10\"\n        elif direction == 6: return \"9\"\n    else:\n        if direction == 0: return \"3\"\n        elif direction == 1: return \"4\"\n        elif direction == 2: return \"5\"\n        elif direction == 3: return \"6\"\n        elif direction == 4: return \"7\"\n        elif direction == 5: return \"8\"\n        elif direction == 6: return \"9\"\n\ndef get_oppo_direction(direction):\n\n    if direction in ['2', '3', '4']:\n        return 'to the left of'\n    elif direction in ['8', '9', '10']:\n        return 'to the right of'\n    elif direction in ['11','12','1']:\n        return 'behind'\n    else:\n        return 'in front of'\n\ndef get_space_relations(src, tgt):\n    overlap_point = 0\n    tgt_rect = tgt.bottom_rect\n    for point in tgt_rect:\n        if utils.if_inPoly(src.bottom_rect, point): # have overlap\n            overlap_point += 1\n\n    return overlap_point\n\ndef get_distance(src, tgt):\n\n    dis_of_center = utils.euclideanDistance(src.position[:2], tgt.position[:2], 2)\n    src_w = utils.euclideanDistance(src.position[:2], src.bottom_rect[0][:2], 2)\n    tgt_w = utils.euclideanDistance(tgt.position[:2], tgt.bottom_rect[0][:2], 2)\n\n    return dis_of_center > 1.5 * (src_w + tgt_w)\n\ndef cal_proximity_relationships(neighbor_objs_id, camera_angle, ObjNode_dict, scene_high):\n    proximity_relations = []\n\n    relations = ''\n\n    neighbor_objs_id_list = [i for i in range(len(neighbor_objs_id))]\n    combinations = list(itertools.combinations(neighbor_objs_id_list, 2))\n\n    for combination in combinations:\n\n        src_idx, tgt_idx = combination\n        src = neighbor_objs_id[src_idx]\n        tgt = neighbor_objs_id[tgt_idx]\n\n        if ObjNode_dict[src].room_id != ObjNode_dict[tgt].room_id:\n            continue\n\n        # is overlap\n        overlap_points = get_space_relations(src=ObjNode_dict[src], tgt=ObjNode_dict[tgt])\n\n        if overlap_points > 0 :\n            # bulid in\n            if overlap_points >=3:\n                relations = 'under'\n            # close to\n            else:\n                relations = 'close to'\n            proximity_relations.append(utils.generate_relation(ObjNode_dict[src].id, ObjNode_dict[tgt].id, relations))\n            proximity_relations.append(utils.generate_relation(ObjNode_dict[tgt].id, ObjNode_dict[src].id, relations))\n\n        else:\n            # direction\n            src_obj_center = ObjNode_dict[src].position\n            tgt_obj_center = ObjNode_dict[tgt].position\n\n            src_obj_center_new = utils.cw_rotate(src_obj_center, camera_angle)\n            tgt_obj_center_new = utils.cw_rotate(tgt_obj_center, camera_angle)\n\n            if src_obj_center_new == tgt_obj_center_new:\n                print ('src_obj_center_new == tgt_obj_center_new ', ObjNode_dict[src].id , ObjNode_dict[tgt].id)\n                break\n            direction = get_direction(src_obj_center_new, 
tgt_obj_center_new)\n\n            oppo_direction = get_oppo_direction(direction)\n            if get_distance(src=ObjNode_dict[src], tgt=ObjNode_dict[tgt]):\n                relations = direction + ' o‘clock direction far from'\n\n            else:\n                relations = direction + ' o‘clock direction near'\n            proximity_relations.append([ObjNode_dict[tgt].id, ObjNode_dict[src].id, relations])\n            if oppo_direction is not None:\n                proximity_relations.append([ObjNode_dict[src].id, ObjNode_dict[tgt].id, oppo_direction])\n\n    return proximity_relations\n"
  },
  {
    "path": "preprocess/ssg/relationships/support.py",
    "content": "from ssg_data.dictionary import always_supported, hanging\nimport ssg_utils as utils\n\ndef is_supported(target_obj, obj, camera_angle, radius_range = 0.1, threshold_of_z_rate=0.8):\n\n    z_min = obj.z_min\n    z_max = obj.z_max\n    tz_max = target_obj.z_max\n    tz_min = target_obj.z_min\n\n    # overlap of z\n    diff_z = z_min - tz_max\n    height = z_max - z_min\n    z_rate = abs(diff_z) / height\n\n    # must be larger\n    if not utils.get_Poly_Area(target_obj.bottom_rect[:, 0:2]) > utils.get_Poly_Area(obj.bottom_rect[:, 0:2]):\n        return False\n\n    if target_obj.label == 'floor':\n        if not z_min < tz_max:\n            return False\n    else:\n        # must be higher\n        # if tz_max > z_max:\n        #     return False\n        if z_min > (tz_max*0.05 if tz_max > 0 else tz_max*0.95): # floating\n            return False\n        if z_min < tz_min:\n            return False\n        if not diff_z < height*0.2:\n            return False\n\n    # must be centered\n    center = obj.position\n    if not utils.if_inPoly(target_obj.bottom_rect, center):\n        return False\n\n    if target_obj.label == 'floor':\n        return 'support_express'\n    else:\n        if z_rate < threshold_of_z_rate :\n            return 'support_express'\n        elif z_rate >= threshold_of_z_rate and z_rate < 0.95:\n            return 'embed_express'\n        else:\n            return 'inside_express'\n\n\ndef optimaze_support_loops(support_relations_dict):\n    relationships = []\n    for obj_id, tgts in support_relations_dict.items():\n        if len(tgts)>1:\n            positions = [tgt.position[2] for tgt in tgts]\n            hightest_tgt_inedx = positions.index(max(positions))\n            hightest_tgt = tgts[hightest_tgt_inedx]\n            relationships.append(utils.generate_relation(hightest_tgt.id, obj_id, 'support'))\n        else:\n            relationships.append(utils.generate_relation(tgts[0].id, obj_id, 'support'))\n\n    return relationships\n\ndef cal_support_relations(ObjNode_list, camera_angle):\n    support_relations_dict = {}\n    embedded_relationships = []\n    hanging_objs = {}\n\n    for target_obj_id in ObjNode_list:\n        target_obj = ObjNode_list[target_obj_id]\n\n        for obj_id in ObjNode_list:\n            obj = ObjNode_list[obj_id]\n\n            if target_obj.id == obj.id: continue\n            if target_obj.label in always_supported or obj.label in always_supported: continue\n            if target_obj.label in hanging or obj.label in hanging: continue\n\n            is_support = is_supported(target_obj, obj, camera_angle)\n\n            if is_support:\n\n                if is_support in ['embed_express', 'inside_express']:\n                    embedded_relationships.append(utils.generate_relation(target_obj.id, obj.id, is_support))\n                else:\n                    if obj.id not in support_relations_dict:\n                        support_relations_dict[obj.id] = [target_obj]\n                    else:\n                        support_relations_dict[obj.id].append(target_obj)\n\n                hanging_objs[obj.id] = 1\n\n    return optimaze_support_loops(support_relations_dict), embedded_relationships, hanging_objs\n"
  },
  {
    "path": "preprocess/ssg/ssg_data/dictionary.py",
    "content": "hanging = ['window', 'curtain', 'curtains', 'shower curtain', 'curtain rod', 'shower curtain rod']\n\nalways_supported = ['wall', 'wall hanging', 'bath walls', 'closet wall', 'closet walls', 'closet wall',\n                    'closet walls', 'door wall', 'pantry wall', 'pantry walls', 'shower wall', 'shower walls',\n                    'door','sliding door', 'sliding wood door', 'bathroom stall door', 'doors', 'door frame']\n\ncomponent = {\n    'closet' : [\"closet ceiling\" ,\"closet door\",\"closet doorframe\",\"closet doors\" , \"closet rod\" ,\"closet shelf\" ],\n    'cabinet': ['cabinet door', 'cabinet doors'],\n}\n\nadded_hanging = {\n    'curtain rod': ['curtain', ],\n    'shower curtain rod': ['shower curtain'],\n}\n\n# word diversity\nsupport_express = ['support']\nopp_support_express = ['resting on', 'placed on', 'on', 'supported by', 'on the top of']\n\nembed_express = ['']\nopp_embed_express = ['embedded into', 'placed within the area of']\n\ninside_express = ['']\nopp_inside_express = ['inside', 'placed within the area of']\n\nhanging_express = ['hanging on', 'hung on']\n\nclose_express = ['close to', 'adjacent to', 'beside', 'next to']\n\nunder_express = ['above']\n\nabove_express = ['above', 'higher than']\nbelow_express = ['below', 'lower than']\n\nmust_support_scannetpp = ['chair', 'sofa', 'table', 'bookshelf', 'standing lamp',\n                          'shoe', 'backpack', 'bag', 'mat', 'barbell','dumbbell',\n                          'trash bin', 'basket', 'tv stand', 'tablet', 'mop', 'vacum cleaner']\n"
  },
  {
    "path": "preprocess/ssg/ssg_data/script/ObjNode.py",
    "content": "import networkx as nx\nimport trimesh\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pyvista as pv\n\n\nclass ObjNode(object):\n    def __init__(self, id=None, label=None, mesh=None, position=None, size=None, children=[], room_id = None,dataset='scannet'):\n        self.id = id\n        self.label = label\n        self.obj_mesh = mesh\n        self.size = size\n        self.position = position\n        self.children = children\n        self.room_id = room_id\n\n        self.align_matrix, self.position, self.z_min, self.z_max, self.bottom_rect, self.top_rect = self.get_object_information(dataset)\n\n    def __str__(self):\n        return \"[{}:{},{},{}]\".format(self.id, self.label, self.position, self.angle)\n\n    def get_object_information(self, dataset):\n        position = self.position # - bias\n        axis_align_matrix = None\n        x_min = position[0] - self.size[0] / 2\n        x_max = position[0] + self.size[0] / 2\n        y_min = position[1] - self.size[1] / 2\n        y_max = position[1] + self.size[1] / 2\n        z_min = position[2] - self.size[2] / 2\n        z_max = position[2] + self.size[2] / 2\n        top_vertics = np.array([[x_min, y_min, z_min], [x_max, y_min, z_min],\n                                [x_max, y_max, z_min], [x_min, y_max, z_min]])\n        bottom_vertics = np.array([[x_min, y_min, z_max], [x_max, y_min, z_max], [x_max, y_max, z_max],\n                               [x_min, y_max, z_max]])\n\n        return axis_align_matrix, position, z_min, z_max, bottom_vertics, top_vertics\n\n    def display_obb_box(self, scene_visible = True):\n\n        axis_align_matrix = self.align_matrix\n\n        obj_mesh = trimesh.load(self.obj_mesh)\n        scene_ply = pv.read(self.scan_ply)\n\n        # rotate to axis align\n        obj_mesh.apply_transform(axis_align_matrix)\n\n        if self.label == 'floor':\n\n            scene_mesh = trimesh.load(self.scan_mesh)\n            scene_mesh.apply_transform(axis_align_matrix)\n\n            # draw aabb\n            tgt_points = np.array(scene_mesh.bounding_box.as_outline().vertices)\n            tgt_edges = np.array(scene_mesh.bounding_box.as_outline().vertex_nodes)\n            tgt_points_new = []\n            for edge in tgt_edges:\n                tgt_points_new.append(tgt_points[edge[0]])\n                tgt_points_new.append(tgt_points[edge[1]])\n\n            # show results\n            plotter = pv.Plotter(off_screen=False)\n            light = pv.Light(light_type='headlight', intensity=0.2)\n            plotter.add_light(light)\n\n            plotter.add_mesh(scene_ply.transform(axis_align_matrix), rgb=True)\n            plotter.add_lines(np.array(tgt_points_new), color='red', width=3)\n\n        else:\n            # draw bbox\n            tgt_points = np.array(obj_mesh.bounding_box_oriented.as_outline().vertices)\n            tgt_edges = np.array(obj_mesh.bounding_box_oriented.as_outline().vertex_nodes)\n            tgt_points_new = []\n            for edge in tgt_edges:\n                tgt_points_new.append(tgt_points[edge[0]])\n                tgt_points_new.append(tgt_points[edge[1]])\n\n            # draw aabb\n            aa_tgt_points = np.array(obj_mesh.bounding_box.as_outline().vertices)\n            aa_tgt_edges = np.array(obj_mesh.bounding_box.as_outline().vertex_nodes)\n            aa_tgt_points_new = []\n            for edge in aa_tgt_edges:\n                aa_tgt_points_new.append(aa_tgt_points[edge[0]])\n                
aa_tgt_points_new.append(aa_tgt_points[edge[1]])\n\n            # show results\n            plotter = pv.Plotter(off_screen=False)\n            light = pv.Light(light_type='headlight', intensity=0.2)\n            plotter.add_light(light)\n\n            plotter.add_mesh(scene_ply.transform(axis_align_matrix), rgb=True)\n            plotter.add_lines(np.array(tgt_points_new), color='red', width=3)\n            plotter.add_lines(np.array(aa_tgt_points_new), color='yellow', width=3)\n\n        plotter.camera.zoom(1.2)\n        plotter.show()\n\n\nif __name__ == '__main__':\n    obj_sample = ObjNode(id=1, label='', size='',\n                         mesh='../../DataAnnotation/data/scannet_objs/scene0000_00/45/mesh.obj')\n    G = nx.DiGraph()\n    G.add_node(obj_sample.id, desc = 'here1')\n    G.add_node(obj_sample.id+1, desc = 'here2')\n    G.add_node(obj_sample.id +3, desc = 'here3')\n    G.add_edge(1, 2, name ='support')\n    G.add_edge(2, 1, name='support2')\n    pos = nx.spring_layout(G)\n    nx.draw(G,pos)\n    node_labels = nx.get_node_attributes(G, 'desc')\n    nx.draw_networkx_labels(G, pos, labels=node_labels)\n    edge_labels = nx.get_edge_attributes(G, 'name')\n    nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)\n\n    plt.show()\n"
  },
  {
    "path": "preprocess/ssg/ssg_data/ssg_visualize.py",
    "content": "import open3d as o3d\nimport numpy as np\nimport torch\n\n\ndef vis_dataset(ObjNode_dict, relation, scene_path, scan_id, scene_center):\n    coordinate = o3d.geometry.TriangleMesh.create_coordinate_frame(size=0.6, origin=[-0, -0, -0])\n    pcd_data = torch.load(scene_path / 'pcd_with_global_alignment' / f'{scan_id}.pth')\n    points, colors, _ = pcd_data[0], pcd_data[1], pcd_data[-1]\n    o3d_pcd = o3d.geometry.PointCloud()\n    o3d_pcd.points = o3d.utility.Vector3dVector(points)\n    o3d_pcd.colors = o3d.utility.Vector3dVector(colors/255.0)\n    scene_show = [o3d_pcd]\n\n    np.random.shuffle(relation)\n    for rel in relation:\n        if len(rel) == 3:\n            if rel[1] == -2:\n                src = ObjNode_dict[rel[0]]\n                gt_o3d_box_src = o3d.geometry.OrientedBoundingBox(src.position + scene_center, np.eye(3, 3),\n                                                                  src.size)\n                gt_o3d_box_src.color = [0, 1, 0]\n                obj_label = f'''{src.label} {rel[2]}'''\n                bbox_show_list = [gt_o3d_box_src]\n                o3d.visualization.draw_geometries(scene_show + [coordinate] + bbox_show_list,\n                                                  window_name=obj_label)\n\n            else:\n                src = ObjNode_dict[rel[0]]\n                tgt = ObjNode_dict[rel[1]]\n\n\n                gt_o3d_box_src = o3d.geometry.OrientedBoundingBox(src.position + scene_center, np.eye(3, 3),\n                                                                  src.size)\n                gt_o3d_box_src.color = [0, 1, 0]\n                gt_o3d_box_tgt = o3d.geometry.OrientedBoundingBox(tgt.position + scene_center, np.eye(3, 3),\n                                                                  tgt.size)\n                gt_o3d_box_tgt.color = [1, 0, 0]\n\n                obj_label = f'''{tgt.label} - {rel[2]} - {src.label} '''\n                bbox_show_list = [gt_o3d_box_src, gt_o3d_box_tgt]\n\n                o3d.visualization.draw_geometries(scene_show + [coordinate] + bbox_show_list,\n                                                  window_name=obj_label)\n        else:\n            tgts = [ObjNode_dict[tgt] for tgt in rel[0]]\n            gt_o3d_box_tgts = [o3d.geometry.OrientedBoundingBox(tgt.position + scene_center, np.eye(3, 3), tgt.size)\n                               for tgt in tgts]\n            # gt_o3d_box_tgt.color = [1, 0, 0]\n\n            obj_label = f''' {ObjNode_dict[tgts[0].id].label} {rel[1]}'''\n            bbox_show_list = [gt_o3d_box_tgts]\n            o3d.visualization.draw_geometries(scene_show + [coordinate] + bbox_show_list,\n                                              window_name=obj_label)\n"
  },
  {
    "path": "preprocess/ssg/ssg_main.py",
    "content": "import json\nimport pickle\nfrom tqdm import tqdm\nfrom pathlib import Path\nfrom omegaconf import OmegaConf\n\nimport torch\nimport networkx as nx\nimport numpy as np\n\nimport ssg_utils as utils\nfrom ssg_data import dictionary\nfrom ssg_data.ssg_visualize import vis_dataset\nfrom ssg_data.script.ObjNode import ObjNode\nfrom relationships.support import cal_support_relations\nfrom relationships.proximity import cal_proximity_relationships\nfrom relationships.hanging import cal_hanging_relationships\nfrom relationships.multi_objs import find_aligned_furniture, find_middle_furniture\n\n\ndef default_dump(obj):\n    \"\"\"Convert numpy classes to JSON serializable objects.\"\"\"\n    if isinstance(obj, (np.integer, np.floating, np.bool_)):\n        return obj.item()\n    elif isinstance(obj, np.ndarray):\n        return obj.tolist()\n    else:\n        return obj\n\ndef convert_pc_to_box(obj_pc):\n    xmin = np.min(obj_pc[:,0])\n    ymin = np.min(obj_pc[:,1])\n    zmin = np.min(obj_pc[:,2])\n    xmax = np.max(obj_pc[:,0])\n    ymax = np.max(obj_pc[:,1])\n    zmax = np.max(obj_pc[:,2])\n    center = [(xmin+xmax)/2, (ymin+ymax)/2, (zmin+zmax)/2]\n    box_size = [xmax-xmin, ymax-ymin, zmax-zmin]\n    return center, box_size\n\ndef init_camera_view():\n    camera_view = [0, -1, 0]\n    camera_pos = [0, 0, 0]\n    camera_view = camera_view / np.linalg.norm(camera_view)\n\n    if camera_view[0] < 0:\n        camera_angle = -utils.get_theta(camera_view, [0, 1, 0])\n    else:\n        camera_angle = utils.get_theta(camera_view, [0, 1, 0])\n\n    return camera_view, camera_pos, camera_angle\n\ndef filter_bad_label(input_label):\n    bad_label_list = ['ceiling', 'wall', 'door', 'doorframe', 'object']\n    for o in bad_label_list:\n        if o in input_label:\n            return False\n\n    return True\n\ndef get_obj_room_id (org_id):\n    infos = org_id.split('|')\n    if infos[1] == 'surface':\n        return int(infos[2])\n    else:\n        return int(infos[1])\n\ndef generate_object_info(save_root, scene_name) :\n    object_json_list = []\n\n    inst2label_path = save_root / 'instance_id_to_label'\n    pcd_path = save_root / 'pcd_with_global_alignment'\n\n    inst_to_label = torch.load(inst2label_path / f\"{scene_name}.pth\")\n    pcd_data = torch.load(pcd_path / f'{scene_name}.pth')\n\n    points, colors, instance_labels = pcd_data[0], pcd_data[1], pcd_data[-1]\n    pcds = np.concatenate([points, colors], 1)\n\n    x_max, y_max, z_max = points.max(axis=0)\n    x_min, y_min, z_min = points.min(axis=0)\n\n    obj_pcds = []\n    for i in np.unique(instance_labels):\n        if i < 0:\n            continue\n        mask = instance_labels == i     # time consuming\n        obj_pcds.append((pcds[mask], inst_to_label[int(i)], i))\n\n    for _, (obj, obj_label, i) in enumerate(obj_pcds):\n        gt_center, gt_size = convert_pc_to_box(obj)\n        object_json = {\n            'id': int(i),\n            'label': obj_label,\n            'position': gt_center,\n            'size': gt_size,\n            'mesh': None\n        }\n        object_json_list.append(object_json)\n\n    # add scan_id\n    object_json_string = {\n        'scan': scene_name,\n        'point_max': [x_max, y_max, z_max],\n        'point_min': [x_min, y_min, z_min],\n        'object_json_string': object_json_list,\n        'inst_to_label': inst_to_label,\n    }\n\n    return object_json_string\n\ndef generate_ssg_data(dataset, scene_path, pre_load_path):\n    ssg_data = {}\n    pre_load_file_save_path = pre_load_path 
/ (dataset + '.pkl')\n    if pre_load_file_save_path.exists():\n        print('Using preprocessed scene data')\n        with open(pre_load_file_save_path, 'rb') as f:\n            ssg_data = pickle.load(f)\n    else:\n        print('Preprocessing scene data')\n        scans = [s.stem for s in (scene_path / 'pcd_with_global_alignment').glob('*.pth')]\n        scans.sort()\n        for scan_id in tqdm(scans):\n            object_json_string = generate_object_info(scene_path, scan_id)\n            if object_json_string is not None:\n                ssg_data[scan_id] = object_json_string\n        with open(pre_load_file_save_path, 'wb') as f:\n            pickle.dump(ssg_data, f)\n\n    return ssg_data\n\ndef main(cfg):\n    cfg.rels_save_path.mkdir(parents=True, exist_ok=True)\n    ssg_data = generate_ssg_data(cfg.dataset, cfg.scene_path, cfg.rels_save_path)\n    scans_all = list(ssg_data.keys())\n\n    ### init camera ###\n    camera_view, camera_pos, camera_angle = init_camera_view()\n    for scan_id in scans_all:\n        objects_save = {}\n        relationship_save = {}\n        inst_dict = {}\n\n        print('Processing ', scan_id)\n\n        objects_info = ssg_data[scan_id]['object_json_string']\n        inst_labels = ssg_data[scan_id]['inst_to_label']\n        # bad case\n        if len(objects_info) == 0:\n            continue\n\n        # construct object graph\n        G = nx.DiGraph()\n        # create nodes\n        ObjNode_dict = {}\n\n        # log objects of the same category\n        for inst in inst_labels:\n            if inst_labels[inst] not in inst_dict:\n                inst_dict[inst_labels[inst]] = 1\n            else:\n                inst_dict[inst_labels[inst]] += 1\n\n        x_max, y_max, z_max = ssg_data[scan_id]['point_max']\n        x_min, y_min, z_min = ssg_data[scan_id]['point_min']\n        scene_center = np.array([(x_max + x_min) / 2, (y_max + y_min) / 2, (z_max + z_min) / 2])\n        # floor bad\n        if z_max == z_min:\n            z_max = z_min + 5\n        scene_high = z_max - z_min\n\n        # generate object node for graph\n        obj_z_min = 1000\n        floor_idx = -100\n        for obj in objects_info:\n            if np.array(obj['size']).sum() == 0:\n                continue\n            if not filter_bad_label(obj['label']):\n                continue\n            if obj['label'] == 'floor':\n                floor_idx = int(obj['id'])\n            node = ObjNode(id=int(obj['id']),\n                           position=obj['position']-scene_center,\n                           label=obj['label'],\n                           mesh=obj['mesh'] if 'mesh' in obj else None,\n                           size=np.array(obj['size']),\n                           children=obj['children'] if 'children' in obj else None,\n                           room_id=get_obj_room_id (obj['id_org']) if 'id_org' in obj else None,\n                           dataset=cfg.dataset)\n\n            if obj['position'][2] - obj['size'][2]/2 < obj_z_min:\n                obj_z_min = obj['position'][2]-obj['size'][2]/2\n\n            obj['count'] = inst_dict[node.label]\n            obj['caption'] = ''\n\n            ObjNode_dict[int(obj['id'])] = node\n            G.add_node(node.id, label=node.label)\n\n        # added special nodes (wall camera)\n        G.add_node(-1, label='CAMERA')\n        G.add_node(-2, label='wall')\n\n        # special node for floor\n        if floor_idx == -100:\n            G.add_node(-3, label='floor')\n            fx, fy, fz = scene_center[0], 
scene_center[1], obj_z_min\n            node = ObjNode(id=-3,\n                           position=np.array([fx, fy, fz]) - scene_center,\n                           label='floor',\n                           size=[(x_max-x_min)*1.2, (y_max-y_min)*1.2, (z_max-z_min)*0.1],\n                           dataset=cfg.dataset)\n            ObjNode_dict[-3] = node\n            floor_idx = -3\n        else:\n            fx, fy, fz = scene_center[0], scene_center[1], obj_z_min\n            node_ = ObjNode_dict[floor_idx]\n            if node_.size[2] > 0:\n                node = ObjNode(id=floor_idx,\n                               position=np.array([fx, fy, fz]) - scene_center,\n                               label='floor',\n                               size=[max((x_max-x_min)*1.2, node_.size[0]),\n                                     max((y_max-y_min)*1.2, node_.size[1]),\n                                     node_.size[2]],\n                               dataset=cfg.dataset)\n            else:\n                node = ObjNode(id=floor_idx,\n                               position=np.array([fx, fy, fz]) - scene_center,\n                               label='floor',\n                               size=[max((x_max-x_min)*1.2, node_.size[0]),\n                                     max((y_max-y_min)*1.2, node_.size[1]),\n                                     (z_max-z_min)*0.1],\n                               dataset=cfg.dataset)\n\n            ObjNode_dict[floor_idx] = node\n\n        # support / embedded relationships\n        if cfg.dataset.lower() in ['procthor']:\n            support_relations = []\n            embedded_relations = []\n            hanging_objs_dict = {}\n            for src_id, _ in ObjNode_dict.items():\n                src_obj = ObjNode_dict[src_id]\n                if src_obj.z_min <= ObjNode_dict[floor_idx].z_max and src_obj.id != floor_idx:\n                    support_relations.append(utils.generate_relation(floor_idx, src_id, 'support'))\n                    hanging_objs_dict[src_id] = 1\n\n                if src_obj.children:\n                    for child in src_obj.children:\n                        hanging_objs_dict[child] = 1\n                        if child not in ObjNode_dict:\n                            continue\n                        if ObjNode_dict[child].z_max < src_obj.z_max:\n                            embedded_relations.append(utils.generate_relation(src_id, child, 'inside_express'))\n                        else:\n                            support_relations.append(utils.generate_relation(src_id, child, 'support'))\n\n        else:\n            support_relations, embedded_relations, hanging_objs_dict = cal_support_relations(ObjNode_dict, camera_angle)\n        for rela in support_relations:\n            target_obj_id, obj_id, _ = rela\n            G.add_edge(target_obj_id, obj_id, label='support')\n\n        # hanging relationships\n        hanging_relationships = cal_hanging_relationships(ObjNode_dict, hanging_objs_dict, camera_angle,\n                                                          scene_high, dataset=cfg.dataset)\n\n        # iterate over the graph to calculate spatial relationships\n        proximity_relations = []\n        for node in G:\n            neighbor = dict(nx.bfs_successors(G, source=node, depth_limit=1))\n            if len(neighbor[node]) > 1:\n                neighbor_objs = neighbor[node]\n                proximity = cal_proximity_relationships(neighbor_objs, camera_angle, ObjNode_dict, scene_high)\n                
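# accumulate pairwise relations among this node's children (siblings on the same support parent)\n                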
proximity_relations += proximity\n\n        # add opposite support relations and relabel special hanging pairs\n        oppo_support_relations = []\n        objects_rels = support_relations + embedded_relations + hanging_relationships\n        for idx, rels in enumerate(objects_rels):\n            src, tgt, rela = rels\n            if rela == 'support':\n                oppo_support_relations.append(utils.generate_relation(src, tgt, 'oppo_support'))\n\n            if src == -2 or tgt == -2:\n                continue\n            src_label = ObjNode_dict[src].label\n            tgt_label = ObjNode_dict[tgt].label\n\n            if src_label in dictionary.added_hanging and dictionary.added_hanging[src_label] == tgt_label:\n                objects_rels[idx][2] = 'hanging'\n            if tgt_label in dictionary.added_hanging and dictionary.added_hanging[tgt_label] == src_label:\n                objects_rels[idx][2] = 'hanging'\n\n        # multi-object relationships\n        multi_objs_relationships = []\n\n        # add 'aligned' relationships\n        furniture_list = list(ObjNode_dict.keys())\n        aligned_furniture = find_aligned_furniture(furniture_list, ObjNode_dict, 0.065)\n\n        for _, aligned_furni in enumerate(aligned_furniture):\n            multi_objs_relationships.append([aligned_furni, 'Aligned'])\n\n        # add 'in the middle of' relationships\n        middle_relationships = find_middle_furniture(proximity_relations, ObjNode_dict)\n\n        # output json\n        relationships_json_string = {\n            'scan': scan_id,\n            'camera_view': camera_view,\n            'camera_position': camera_pos,\n            'relationships': objects_rels + proximity_relations + oppo_support_relations,\n            'multi_objs_relationships': multi_objs_relationships + middle_relationships,\n        }\n\n        np.random.shuffle(objects_rels)\n        # visualize scene\n        if cfg.visualize:\n            vis_dataset(ObjNode_dict=ObjNode_dict,\n                        relation=proximity_relations,\n                        scene_path=cfg.scene_path,\n                        scan_id=scan_id,\n                        scene_center=scene_center)\n\n        relationship_save[scan_id] = relationships_json_string\n        objects_save[scan_id] = {\"objects_info\": objects_info,\n                                 \"inst_to_label\": inst_labels}\n\n        print('==> DONE')\n        print('SCENE ', scan_id)\n        print('OBJECTS ', len(ObjNode_dict))\n\n        scan_path = cfg.rels_save_path / scan_id\n\n        scan_path.mkdir(parents=True, exist_ok=True)\n        print('SAVE', scan_path)\n        with (scan_path / 'relationships.json').open('w') as file:\n            json.dump(relationship_save, file, default=default_dump)\n        with (scan_path / 'objects.json').open('w') as file:\n            json.dump(objects_save, file, default=default_dump)\n        print('=====================\\n')\n\n\nif __name__ == '__main__':\n    cfg = OmegaConf.create({\n        'dataset': 'dataset',\n        'scene_path': '/path/to/dir',\n        'rels_save_path': '/output/path/to/dir',\n        'visualize': True,\n        'num_workers': 1,\n    })\n\n    cfg.scene_path = Path(cfg.scene_path) / cfg.dataset / 'scan_data'\n    cfg.rels_save_path = Path(cfg.rels_save_path) / cfg.dataset\n\n    main(cfg)\n"
  },
  {
    "path": "preprocess/ssg/ssg_utils.py",
    "content": "import trimesh\nimport math\nfrom shapely import geometry\nimport os\nimport numpy as np\nimport pyvista as pv\nfrom ssg_data.dictionary import *\nimport random\nimport open3d as o3d\n\n\ndef cw_rotate(point, ang):\n    x,y,_ = point\n    ang = math.radians(ang)\n    new_x = round(x * math.cos(ang) - y * math.sin(ang), 5)\n    new_y = round(x * math.sin(ang) + y * math.cos(ang), 5)\n    return new_x, new_y\n\ndef euclideanDistance(instance1, instance2, dimension):\n    distance = 0\n    for i in range(dimension):\n        distance += (instance1[i] - instance2[i])**2\n\n    return math.sqrt(distance)\n\ndef if_inPoly(polygon, Points):\n    line = geometry.LineString(polygon)\n    point = geometry.Point(Points)\n    polygon = geometry.Polygon(line)\n    return polygon.contains(point)\n\ndef get_Poly_Area(polygon):\n\n    line = geometry.LineString(polygon)\n    polygon = geometry.Polygon(line)\n    return polygon.area\n\ndef get_theta (x, y):\n\n    x = np.array(x)\n    y = np.array(y)\n\n    l_x = np.sqrt(x.dot(x))\n    l_y = np.sqrt(y.dot(y))\n\n    dian = x.dot(y)\n\n    cos_ = dian / (l_x * l_y)\n\n    angle_hu = np.arccos(cos_)\n    angle_d = angle_hu * 180 / np.pi\n\n    return angle_d\n\ndef generate_relation(src, tgt, express):\n    if 'oppo_support' in express:\n        oppo_rels = [tgt, src, random.choice(opp_support_express)]\n        return oppo_rels\n    elif 'support' in express:\n        rels = [src, tgt, random.choice(support_express)]\n        return rels\n    elif 'embed' in express:\n        oppo_rels = [tgt, src, random.choice(opp_embed_express)]\n        return oppo_rels\n    elif 'inside' in express:\n        oppo_rels = [tgt, src, random.choice(opp_inside_express)]\n        return oppo_rels\n    elif 'hang' in express:\n        oppo_rels = [src, tgt, random.choice(hanging_express)]\n        return oppo_rels\n    elif 'under' in express:\n        oppo_rels = [src, tgt, random.choice(under_express)]\n        return oppo_rels\n    elif 'close' in express:\n        oppo_rels = [src, tgt, random.choice(close_express)]\n        return oppo_rels\n    elif 'high' in express:\n        rels = [src, tgt, random.choice(above_express)]\n        oppo_rels = [tgt, src, random.choice(below_express)]\n        return [rels,oppo_rels]\n\ndef visualize_relations(target_obj, obj, relationship, camera_angle, camera_position = np.array([0,0,0]), save = False):\n    if save:\n        render_bbox_pyvista(obj, target_obj, relationship, camera_angle, camera_position)\n    else:\n        axis_align_matrix = target_obj.align_matrix\n\n        tgt_mesh = trimesh.load(target_obj.obj_mesh)\n        src_mesh = trimesh.load(obj.obj_mesh)\n        tgt_mesh.apply_transform(axis_align_matrix)\n        src_mesh.apply_transform(axis_align_matrix)\n\n        tgt_p = tgt_mesh.bounding_box.as_outline()\n        tgt_p.entities[0].color = (255, 0, 0, 255)\n\n        src_p = src_mesh.bounding_box.as_outline()\n        src_p.entities[0].color = (255, 255, 0, 255)\n\n        scene_mesh = trimesh.load_mesh(target_obj.scan_mesh)\n\n        scene_mesh.apply_transform(axis_align_matrix)\n\n\n        # draw line of two objects\n        lines_of_center = [[np.array(target_obj.position), np.array(obj.position)]]\n        p = trimesh.load_path(lines_of_center)\n\n        # rotate from camera view\n        camera_rotate = trimesh.transformations.rotation_matrix(\n            np.deg2rad(camera_angle), [0,0,1], point=(0,0,0)\n        )\n\n        scene_mesh.apply_transform(camera_rotate)\n        
tgt_p.apply_transform(camera_rotate)\n        src_p.apply_transform(camera_rotate)\n        p.apply_transform(camera_rotate)\n\n        # draw camera center\n        camera = trimesh.primitives.Sphere(radius=0.2, center=camera_position)\n        camera.apply_transform(camera_rotate)\n\n        Scene = trimesh.Scene()\n\n        # tilt the view slightly for display (rotation_matrix expects radians)\n        camera_rotate = trimesh.transformations.rotation_matrix(\n            np.deg2rad(-20), [1,0,0], point=(0,0,0)\n        )\n        Scene.add_geometry([scene_mesh, src_p, tgt_p, p])\n        Scene.apply_transform(camera_rotate)\n\n        Scene.show()\n\ndef visualize_relations_multi_objs(objs, relationship, item, camera_angle, camera_position=np.array([0,0,0]), save=False):\n    # img save name\n    save_img_name = '_'.join([relationship, objs[0].label]) + str(item)\n\n    # load mesh\n    scene_mesh = pv.read(objs[0].scan_ply)\n    axis_align_matrix = objs[0].align_matrix\n    tgt_meshs = [trimesh.load(obj.obj_mesh) for obj in objs]\n\n    # show results\n    plotter = pv.Plotter(off_screen=True)\n    light = pv.Light(light_type='headlight', intensity=0.3)\n    plotter.add_light(light)\n\n    # draw camera\n    camera_look_at = cw_rotate(camera_position+np.array([0,1,0]), -camera_angle)\n    camera_look_at = np.array([camera_look_at[0], camera_look_at[1], 0])\n    # plotter.add_lines(np.array([camera_position, camera_look_at]), color='blue', width=3)\n    mesh = pv.Arrow(start=camera_position, direction=camera_look_at)\n    plotter.add_mesh(mesh)\n\n    # add the scene mesh\n    plotter.add_mesh(scene_mesh.transform(axis_align_matrix), rgb=True)\n\n    # rotate to the axis-aligned frame and add to the scene\n    for tgt_mesh in tgt_meshs:\n        tgt_mesh.apply_transform(axis_align_matrix)\n        # draw bbox\n        tgt_points = np.array(tgt_mesh.bounding_box.as_outline().vertices)\n        tgt_edges = np.array(tgt_mesh.bounding_box.as_outline().vertex_nodes)\n        tgt_points_new = []\n        for edge in tgt_edges:\n            tgt_points_new.append(tgt_points[edge[0]])\n            tgt_points_new.append(tgt_points[edge[1]])\n\n        plotter.add_lines(np.array(tgt_points_new), color='yellow', width=3)\n\n    plotter.add_point_labels(\n        [np.array(obj.position) for obj in objs],\n        [obj.label for obj in objs],\n        margin=0,\n        fill_shape=True,\n        font_size=18,\n        shape_color=\"black\",\n        point_color=\"red\",\n        text_color=\"white\",\n        always_visible=True,\n    )\n\n    plotter.add_text(\n        save_img_name,\n        position='upper_right',\n        color='Blue',\n        shadow=True,\n        font_size=19,\n    )\n\n    plotter.camera_position = 'yz'\n    plotter.camera.azimuth = 90 - camera_angle + 180\n    plotter.camera.elevation = 65\n\n    plotter.camera.zoom(1.2)\n    # off-screen plotter: write the rendered frame to disk\n    plotter.show(screenshot=save_img_name + '.png')\n\ndef render_bbox_pyvista(tgt, src, relationship, camera_angle, camera_position):\n\n    # img save name\n    save_img_name = '_'.join([relationship, src.label, str(src.id), tgt.label, str(tgt.id)])\n\n    # load mesh\n    tgt_mesh = trimesh.load(tgt.obj_mesh)\n    src_mesh = trimesh.load(src.obj_mesh)\n    scene_mesh = pv.read(tgt.scan_ply)\n    axis_align_matrix = tgt.align_matrix\n\n    # rotate to axis align\n    tgt_mesh.apply_transform(axis_align_matrix)\n    src_mesh.apply_transform(axis_align_matrix)\n\n    # draw bbox\n    tgt_points = np.array(tgt_mesh.bounding_box.as_outline().vertices)\n    tgt_edges = np.array(tgt_mesh.bounding_box.as_outline().vertex_nodes)\n    tgt_points_new = []\n    for edge in tgt_edges:\n        
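# expand each outline edge into an endpoint pair for pv.Plotter.add_lines\n        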
tgt_points_new.append(tgt_points[edge[0]])\n        tgt_points_new.append(tgt_points[edge[1]])\n\n    src_points = np.array(src_mesh.bounding_box.as_outline().vertices)\n    src_edges = np.array(src_mesh.bounding_box.as_outline().vertex_nodes)\n    src_points_new = []\n    for edge in src_edges:\n        src_points_new.append(src_points[edge[0]])\n        src_points_new.append(src_points[edge[1]])\n\n    # show results\n    plotter = pv.Plotter(off_screen=True)\n    light = pv.Light(light_type='headlight', intensity=0.3)\n    plotter.add_light(light)\n\n    # draw camera\n    camera_look_at = cw_rotate(camera_position+np.array([0,1,0]), -camera_angle)\n    camera_look_at = np.array([camera_look_at[0], camera_look_at[1], 0])\n    # plotter.add_lines(np.array([camera_position, camera_look_at]), color='blue', width=3)\n    mesh = pv.Arrow(start=camera_position, direction=camera_look_at)\n    plotter.add_mesh(mesh)\n\n    plotter.add_mesh(scene_mesh.transform(axis_align_matrix), rgb=True)\n    plotter.add_lines(np.array([src.position, tgt.position]), color='red', width=3)\n    plotter.add_lines(np.array(src_points_new), color='red', width=3)\n    plotter.add_lines(np.array(tgt_points_new), color='yellow', width=3)\n    # plotter.add_axes_at_origin()\n\n    plotter.add_point_labels(\n        [\n            src.position,\n            tgt.position,\n            camera_position\n        ],\n        [src.label, tgt.label, 'Camera View'],\n        margin=0,\n        fill_shape=True,\n        font_size=18,\n        shape_color=\"black\",\n        point_color=\"red\",\n        text_color=\"white\",\n        always_visible=True,\n    )\n\n    plotter.add_text(\n        save_img_name,\n        position='upper_right',\n        color='Blue',\n        shadow=True,\n        font_size=19,\n    )\n\n    plotter.camera_position = 'yz'\n    plotter.camera.azimuth = 90 - camera_angle + 180\n    plotter.camera.elevation = 65\n\n    plotter.camera.zoom(1.2)\n    # off-screen plotter: write the rendered frame to disk\n    plotter.show(screenshot=save_img_name + '.png')\n\ndef visualize_camera_relations(ObjNode_dict, camera_relations, camera_position, camera_view, save=False):\n    tgt = ObjNode_dict[camera_relations[0][1]]\n    scene_mesh = trimesh.load(tgt.scan_mesh)\n    axis_align_matrix = tgt.align_matrix\n    objs_mesh = []\n    for rela in camera_relations:\n        _, obj, desc = rela\n        obj = ObjNode_dict[obj]\n        src_mesh = trimesh.load(obj.obj_mesh)\n        src_mesh.apply_transform(axis_align_matrix)\n        src_p = src_mesh.bounding_box.as_outline()\n\n        if desc == 'behind':\n            src_p.entities[0].color = (0, 255, 0, 255)\n        elif desc == 'in front of':\n            src_p.entities[0].color = (255, 0, 0, 255)\n        elif desc == 'left':\n            src_p.entities[0].color = (0, 0, 255, 255)\n        else:\n            src_p.entities[0].color = (0, 255, 255, 255)\n\n        objs_mesh.append(src_p)\n\n    end_point = np.array(camera_position) + np.array(camera_view)\n    # draw the camera viewing direction\n    lines_of_center = [[end_point, np.array(camera_position)],]\n    p = trimesh.load_path(lines_of_center)\n    scene_mesh.apply_transform(axis_align_matrix)\n\n    # camera position\n    camera_pos = trimesh.primitives.Sphere(radius=0.2, center=np.array(camera_position))\n\n    Scene = trimesh.Scene()\n    Scene.add_geometry([scene_mesh, p, camera_pos])\n    Scene.add_geometry(objs_mesh)\n\n    if not save:\n        Scene.show()\n    else:\n        data = Scene.save_image(resolution=(640, 640))\n        save_img_name = tgt.scan_id + 'camera_view.png'\n        
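# Scene.save_image returns raw PNG bytes; the output directory is assumed to exist\n        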
save_path = os.path.join('../SSGResults/cameras', save_img_name)\n        with open(save_path, 'wb') as f:\n            f.write(data)\n        #Scene.show()\n\n\ndef read_one_obj(bbox_points, scene_file):\n    scene_mesh = pv.read(scene_file)\n    scene_points = scene_mesh.points\n\n    # visualize scene\n    o3d_pcd = o3d.geometry.PointCloud()\n    o3d_pcd.points = o3d.utility.Vector3dVector(scene_points)\n\n    bbox_center = np.mean(bbox_points, axis=0)\n    bbox_size = np.max(bbox_points, axis=0) - np.min(bbox_points, axis=0)\n    gt_o3d_box = o3d.geometry.OrientedBoundingBox(bbox_center, np.eye(3), bbox_size)\n    gt_o3d_box.color = [0, 1, 0]\n\n    mesh_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=0.6, origin=[0, 0, 0])\n    o3d.visualization.draw_geometries([o3d_pcd, gt_o3d_box, mesh_frame])\n"
  },
  {
    "path": "preprocess/structured3d.py",
    "content": "import pickle\nfrom glob import glob\nfrom omegaconf import OmegaConf\nfrom joblib import Parallel, delayed, parallel_backend\n\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\n\nfrom preprocess.build import ProcessorBase\nfrom preprocess.utils.label_convert import S3D_SCANNET as label_convert\nfrom preprocess.utils.constant import *\n\n\nPTS_LIMIT = 480000\n\n\nclass S3DProcessor(ProcessorBase):\n    def record_splits(self, scan_ids):\n        split_dir = self.save_root / 'split'\n        split_dir.mkdir(exist_ok=True)\n        split = {\n            'train': [],\n            'val': [],\n            'test': []}\n        split['train'] = [scan_id[1] for scan_id in scan_ids if scan_id[0] == 'train']\n        split['val'] = [scan_id[1] for scan_id in scan_ids if scan_id[0] == 'val']\n        split['test'] = [scan_id[1] for scan_id in scan_ids if scan_id[0] == 'test']\n        for _s, _c in split.items():\n            with open(split_dir / f'{_s}_split.txt', 'w', encoding='utf-8') as fp:\n                fp.write('\\n'.join(_c))\n\n    def read_all_scans(self):\n        scan_ids = []\n        for split in ['train', 'val', 'test']:\n            scan_paths = glob(str(self.data_root) + f'/{split}/*')\n            scan_ids.extend([(split, '_'.join(path.split('/')[-1].split('_')[:-2])) for path in scan_paths])\n        return scan_ids\n\n    def process_point_cloud(self, scan_id, plydata, annotations):\n        vertices = plydata[0]\n        vertex_colors = (plydata[1][:,:3] + 1) / 2.0 * 255.0\n\n        vertex_instance = - np.ones((vertices.shape[0]))\n        inst_to_label = {}\n\n        for _id, _box in enumerate(annotations['gt_boxes_upright_depth']):\n            if annotations['class'][_id] in [38, 39, 40]:\n                continue\n            centroid = _box[:3]\n            dimension = _box[3:6]\n            box_max = centroid + dimension/2\n            box_min = centroid - dimension/2\n            point_max_mask = np.all(vertices < box_max, axis=1)\n            point_min_mask = np.all(vertices > box_min, axis=1)\n            point_mask = np.logical_and(point_max_mask, point_min_mask)\n            vertex_instance[point_mask] = _id\n            inst_to_label[_id] = label_convert[annotations['class'][_id]]\n\n        center_points = np.mean(vertices, axis=0)\n        center_points[2] = np.min(vertices[:, 2])\n        vertices = vertices - center_points\n        assert vertex_colors.shape == vertices.shape\n        assert vertex_colors.shape[0] == vertex_instance.shape[0]\n\n        if vertices.shape[0] > PTS_LIMIT:\n            pcd_idxs = np.random.choice(vertices.shape[0], size=PTS_LIMIT, replace=False)\n            vertices = vertices[pcd_idxs]\n            colors = colors[pcd_idxs]\n            vertex_instance = vertex_instance[pcd_idxs]\n\n        if self.check_key(self.output.pcd):\n            torch.save(inst_to_label, self.inst2label_path / f\"{scan_id}.pth\")\n            torch.save((vertices, vertex_colors, vertex_instance), self.pcd_path / f\"{scan_id}.pth\")\n\n    def scene_proc(self, scan_id):\n        split = scan_id[0]\n        scan_id = scan_id[1]\n        data_root = self.data_root / split\n\n        if not (data_root / f'{scan_id}_1cm_seg.pth').exists():\n            return\n        if not (self.data_root.parent / 'anno_mask' / f'{scan_id}_1cm.bin').exists():\n            return\n        plydata = torch.load(data_root / f'{scan_id}_1cm_seg.pth')\n        with open(self.data_root.parent / 'anno_mask' / f'{scan_id}_1cm.bin', 'rb') as f:\n           
 annotations = pickle.load(f)\n\n        # process point cloud\n        self.process_point_cloud(scan_id, plydata, annotations)\n\n    def process_scans(self):\n        scan_ids = self.read_all_scans()\n        self.log_starting_info(len(scan_ids))\n\n        if self.num_workers > 1:\n            with parallel_backend('multiprocessing', n_jobs=self.num_workers):\n                Parallel()(delayed(self.scene_proc)(scan_id) for scan_id in tqdm(scan_ids))\n        else:\n            for scan_id in tqdm(scan_ids):\n                self.scene_proc(scan_id)\n\n\nif __name__ == '__main__':\n    # we use the data processing for Structured3D from Swin3D,\n    # please refer to https://github.com/yuxiaoguo/Uni3DScenes for more details.\n    cfg = OmegaConf.create({\n        'data_root': '/path/to/Structured3D/data_out/swin3d_new',\n        'save_root': '/output/path/to/Structured3D',\n        'num_workers': 1,\n        'output': {\n            'pcd': True,\n        }\n    })\n    processor = S3DProcessor(cfg)\n    processor.process_scans()\n"
  },
  {
    "path": "preprocess/utils/__init__.py",
    "content": ""
  },
  {
    "path": "preprocess/utils/align_utils.py",
    "content": "import numpy as np\nimport math\n\ndef compute_box_3d(size, center, rotmat):\n    \"\"\"Compute corners of a single box from rotation matrix\n    Args:\n        size: list of float [dx, dy, dz]\n        center: np.array [x, y, z]\n        rotmat: np.array (3, 3)\n    Returns:\n        corners: (8, 3)\n    \"\"\"\n    l, h, w = [i / 2 for i in size]\n    center = np.reshape(center, (-1, 3))\n    center = center.reshape(3)\n    x_corners = [l, l, -l, -l, l, l, -l, -l]\n    y_corners = [h, -h, -h, h, h, -h, -h, h]\n    z_corners = [w, w, w, w, -w, -w, -w, -w]\n    corners_3d = np.dot(\n        np.transpose(rotmat), np.vstack([x_corners, y_corners, z_corners])\n    )\n    corners_3d[0, :] += center[0]\n    corners_3d[1, :] += center[1]\n    corners_3d[2, :] += center[2]\n    return np.transpose(corners_3d)\n\n\ndef rotate_z_axis_by_degrees(pointcloud, theta, clockwise=True):\n    theta = np.deg2rad(theta)\n    cos_t = np.cos(theta)\n    sin_t = np.sin(theta)\n    rot_matrix = np.array([[cos_t, -sin_t, 0],\n                           [sin_t, cos_t, 0],\n                           [0, 0, 1]], pointcloud.dtype)\n    if not clockwise:\n        rot_matrix = rot_matrix.T\n    return pointcloud.dot(rot_matrix)\n\n\ndef eulerAnglesToRotationMatrix(theta):\n    \"\"\"Euler rotation matrix with clockwise logic.\n    Rotation\n\n    Args:\n        theta: list of float\n            [theta_x, theta_y, theta_z]\n    Returns:\n        R: np.array (3, 3)\n            rotation matrix of Rz*Ry*Rx\n    \"\"\"\n    R_x = np.array(\n        [\n            [1, 0, 0],\n            [0, math.cos(theta[0]), -math.sin(theta[0])],\n            [0, math.sin(theta[0]), math.cos(theta[0])],\n        ]\n    )\n\n    R_y = np.array(\n        [\n            [math.cos(theta[1]), 0, math.sin(theta[1])],\n            [0, 1, 0],\n            [-math.sin(theta[1]), 0, math.cos(theta[1])],\n        ]\n    )\n\n    R_z = np.array(\n        [\n            [math.cos(theta[2]), -math.sin(theta[2]), 0],\n            [math.sin(theta[2]), math.cos(theta[2]), 0],\n            [0, 0, 1],\n        ]\n    )\n\n    R = np.dot(R_z, np.dot(R_y, R_x))\n    return R\n\n\ndef is_axis_aligned(rotated_box, thres=0.05):\n    x_diff = abs(rotated_box[0][0] - rotated_box[1][0])\n    y_diff = abs(rotated_box[0][1] - rotated_box[3][1])\n    return x_diff < thres and y_diff < thres\n\n\ndef calc_align_matrix(bbox_list):\n    RANGE = [-45, 45]\n    NUM_BIN = 90\n    angles = np.linspace(RANGE[0], RANGE[1], NUM_BIN)\n    angle_counts = {}\n    for _a in angles:\n        bucket = round(_a, 3)\n        for box in bbox_list:\n            box_r = rotate_z_axis_by_degrees(box, bucket)\n            bottom = box_r[4:]\n            if is_axis_aligned(bottom):\n                angle_counts[bucket] = angle_counts.get(bucket, 0) + 1\n    if len(angle_counts) == 0:\n        RANGE = [-90, 90]\n        NUM_BIN = 180\n        angles = np.linspace(RANGE[0], RANGE[1], NUM_BIN)\n        for _a in angles:\n            bucket = round(_a, 3)\n            for box in bbox_list:\n                box_r = rotate_z_axis_by_degrees(box, bucket)\n                bottom = box_r[4:]\n                if is_axis_aligned(bottom, thres=0.15):\n                    angle_counts[bucket] = angle_counts.get(bucket, 0) + 1\n    most_common_angle = max(angle_counts, key=angle_counts.get)\n    return most_common_angle\n"
  },
  {
    "path": "preprocess/utils/constant.py",
    "content": "from enum import Enum\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n### ScanNet200 Benchmark constants ###\nVALID_CLASS_IDS_200 = (\n    1,\n    2,\n    3,\n    4,\n    5,\n    6,\n    7,\n    8,\n    9,\n    10,\n    11,\n    13,\n    14,\n    15,\n    16,\n    17,\n    18,\n    19,\n    21,\n    22,\n    23,\n    24,\n    26,\n    27,\n    28,\n    29,\n    31,\n    32,\n    33,\n    34,\n    35,\n    36,\n    38,\n    39,\n    40,\n    41,\n    42,\n    44,\n    45,\n    46,\n    47,\n    48,\n    49,\n    50,\n    51,\n    52,\n    54,\n    55,\n    56,\n    57,\n    58,\n    59,\n    62,\n    63,\n    64,\n    65,\n    66,\n    67,\n    68,\n    69,\n    70,\n    71,\n    72,\n    73,\n    74,\n    75,\n    76,\n    77,\n    78,\n    79,\n    80,\n    82,\n    84,\n    86,\n    87,\n    88,\n    89,\n    90,\n    93,\n    95,\n    96,\n    97,\n    98,\n    99,\n    100,\n    101,\n    102,\n    103,\n    104,\n    105,\n    106,\n    107,\n    110,\n    112,\n    115,\n    116,\n    118,\n    120,\n    121,\n    122,\n    125,\n    128,\n    130,\n    131,\n    132,\n    134,\n    136,\n    138,\n    139,\n    140,\n    141,\n    145,\n    148,\n    154,\n    155,\n    156,\n    157,\n    159,\n    161,\n    163,\n    165,\n    166,\n    168,\n    169,\n    170,\n    177,\n    180,\n    185,\n    188,\n    191,\n    193,\n    195,\n    202,\n    208,\n    213,\n    214,\n    221,\n    229,\n    230,\n    232,\n    233,\n    242,\n    250,\n    261,\n    264,\n    276,\n    283,\n    286,\n    300,\n    304,\n    312,\n    323,\n    325,\n    331,\n    342,\n    356,\n    370,\n    392,\n    395,\n    399,\n    408,\n    417,\n    488,\n    540,\n    562,\n    570,\n    572,\n    581,\n    609,\n    748,\n    776,\n    1156,\n    1163,\n    1164,\n    1165,\n    1166,\n    1167,\n    1168,\n    1169,\n    1170,\n    1171,\n    1172,\n    1173,\n    1174,\n    1175,\n    1176,\n    1178,\n    1179,\n    1180,\n    1181,\n    1182,\n    1183,\n    1184,\n    1185,\n    1186,\n    1187,\n    1188,\n    1189,\n    1190,\n    1191,\n)\n\nCLASS_LABELS_200 = (\n    \"wall\",\n    \"chair\",\n    \"floor\",\n    \"table\",\n    \"door\",\n    \"couch\",\n    \"cabinet\",\n    \"shelf\",\n    \"desk\",\n    \"office chair\",\n    \"bed\",\n    \"pillow\",\n    \"sink\",\n    \"picture\",\n    \"window\",\n    \"toilet\",\n    \"bookshelf\",\n    \"monitor\",\n    \"curtain\",\n    \"book\",\n    \"armchair\",\n    \"coffee table\",\n    \"box\",\n    \"refrigerator\",\n    \"lamp\",\n    \"kitchen cabinet\",\n    \"towel\",\n    \"clothes\",\n    \"tv\",\n    \"nightstand\",\n    \"counter\",\n    \"dresser\",\n    \"stool\",\n    \"cushion\",\n    \"plant\",\n    \"ceiling\",\n    \"bathtub\",\n    \"end table\",\n    \"dining table\",\n    \"keyboard\",\n    \"bag\",\n    \"backpack\",\n    \"toilet paper\",\n    \"printer\",\n    \"tv stand\",\n    \"whiteboard\",\n    \"blanket\",\n    \"shower curtain\",\n    \"trash can\",\n    \"closet\",\n    \"stairs\",\n    \"microwave\",\n    \"stove\",\n    \"shoe\",\n    \"computer tower\",\n    \"bottle\",\n    \"bin\",\n    \"ottoman\",\n    \"bench\",\n    \"board\",\n    \"washing machine\",\n    \"mirror\",\n    \"copier\",\n    \"basket\",\n    \"sofa chair\",\n    \"file cabinet\",\n    \"fan\",\n    \"laptop\",\n    \"shower\",\n    \"paper\",\n    \"person\",\n    \"paper towel dispenser\",\n    \"oven\",\n    \"blinds\",\n    \"rack\",\n    \"plate\",\n    \"blackboard\",\n    \"piano\",\n    \"suitcase\",\n    
\"rail\",\n    \"radiator\",\n    \"recycling bin\",\n    \"container\",\n    \"wardrobe\",\n    \"soap dispenser\",\n    \"telephone\",\n    \"bucket\",\n    \"clock\",\n    \"stand\",\n    \"light\",\n    \"laundry basket\",\n    \"pipe\",\n    \"clothes dryer\",\n    \"guitar\",\n    \"toilet paper holder\",\n    \"seat\",\n    \"speaker\",\n    \"column\",\n    \"bicycle\",\n    \"ladder\",\n    \"bathroom stall\",\n    \"shower wall\",\n    \"cup\",\n    \"jacket\",\n    \"storage bin\",\n    \"coffee maker\",\n    \"dishwasher\",\n    \"paper towel roll\",\n    \"machine\",\n    \"mat\",\n    \"windowsill\",\n    \"bar\",\n    \"toaster\",\n    \"bulletin board\",\n    \"ironing board\",\n    \"fireplace\",\n    \"soap dish\",\n    \"kitchen counter\",\n    \"doorframe\",\n    \"toilet paper dispenser\",\n    \"mini fridge\",\n    \"fire extinguisher\",\n    \"ball\",\n    \"hat\",\n    \"shower curtain rod\",\n    \"water cooler\",\n    \"paper cutter\",\n    \"tray\",\n    \"shower door\",\n    \"pillar\",\n    \"ledge\",\n    \"toaster oven\",\n    \"mouse\",\n    \"toilet seat cover dispenser\",\n    \"furniture\",\n    \"cart\",\n    \"storage container\",\n    \"scale\",\n    \"tissue box\",\n    \"light switch\",\n    \"crate\",\n    \"power outlet\",\n    \"decoration\",\n    \"sign\",\n    \"projector\",\n    \"closet door\",\n    \"vacuum cleaner\",\n    \"candle\",\n    \"plunger\",\n    \"stuffed animal\",\n    \"headphones\",\n    \"dish rack\",\n    \"broom\",\n    \"guitar case\",\n    \"range hood\",\n    \"dustpan\",\n    \"hair dryer\",\n    \"water bottle\",\n    \"handicap bar\",\n    \"purse\",\n    \"vent\",\n    \"shower floor\",\n    \"water pitcher\",\n    \"mailbox\",\n    \"bowl\",\n    \"paper bag\",\n    \"alarm clock\",\n    \"music stand\",\n    \"projector screen\",\n    \"divider\",\n    \"laundry detergent\",\n    \"bathroom counter\",\n    \"object\",\n    \"bathroom vanity\",\n    \"closet wall\",\n    \"laundry hamper\",\n    \"bathroom stall door\",\n    \"ceiling light\",\n    \"trash bin\",\n    \"dumbbell\",\n    \"stair rail\",\n    \"tube\",\n    \"bathroom cabinet\",\n    \"cd case\",\n    \"closet rod\",\n    \"coffee kettle\",\n    \"structure\",\n    \"shower head\",\n    \"keyboard piano\",\n    \"case of water bottles\",\n    \"coat rack\",\n    \"storage organizer\",\n    \"folded chair\",\n    \"fire alarm\",\n    \"power strip\",\n    \"calendar\",\n    \"poster\",\n    \"potted plant\",\n    \"luggage\",\n    \"mattress\",\n)\n\nSCANNET_COLOR_MAP_200 = {\n    0: (0.0, 0.0, 0.0),\n    1: (174.0, 199.0, 232.0),\n    2: (188.0, 189.0, 34.0),\n    3: (152.0, 223.0, 138.0),\n    4: (255.0, 152.0, 150.0),\n    5: (214.0, 39.0, 40.0),\n    6: (91.0, 135.0, 229.0),\n    7: (31.0, 119.0, 180.0),\n    8: (229.0, 91.0, 104.0),\n    9: (247.0, 182.0, 210.0),\n    10: (91.0, 229.0, 110.0),\n    11: (255.0, 187.0, 120.0),\n    13: (141.0, 91.0, 229.0),\n    14: (112.0, 128.0, 144.0),\n    15: (196.0, 156.0, 148.0),\n    16: (197.0, 176.0, 213.0),\n    17: (44.0, 160.0, 44.0),\n    18: (148.0, 103.0, 189.0),\n    19: (229.0, 91.0, 223.0),\n    21: (219.0, 219.0, 141.0),\n    22: (192.0, 229.0, 91.0),\n    23: (88.0, 218.0, 137.0),\n    24: (58.0, 98.0, 137.0),\n    26: (177.0, 82.0, 239.0),\n    27: (255.0, 127.0, 14.0),\n    28: (237.0, 204.0, 37.0),\n    29: (41.0, 206.0, 32.0),\n    31: (62.0, 143.0, 148.0),\n    32: (34.0, 14.0, 130.0),\n    33: (143.0, 45.0, 115.0),\n    34: (137.0, 63.0, 14.0),\n    35: (23.0, 190.0, 207.0),\n    36: (16.0, 
212.0, 139.0),\n    38: (90.0, 119.0, 201.0),\n    39: (125.0, 30.0, 141.0),\n    40: (150.0, 53.0, 56.0),\n    41: (186.0, 197.0, 62.0),\n    42: (227.0, 119.0, 194.0),\n    44: (38.0, 100.0, 128.0),\n    45: (120.0, 31.0, 243.0),\n    46: (154.0, 59.0, 103.0),\n    47: (169.0, 137.0, 78.0),\n    48: (143.0, 245.0, 111.0),\n    49: (37.0, 230.0, 205.0),\n    50: (14.0, 16.0, 155.0),\n    51: (196.0, 51.0, 182.0),\n    52: (237.0, 80.0, 38.0),\n    54: (138.0, 175.0, 62.0),\n    55: (158.0, 218.0, 229.0),\n    56: (38.0, 96.0, 167.0),\n    57: (190.0, 77.0, 246.0),\n    58: (208.0, 49.0, 84.0),\n    59: (208.0, 193.0, 72.0),\n    62: (55.0, 220.0, 57.0),\n    63: (10.0, 125.0, 140.0),\n    64: (76.0, 38.0, 202.0),\n    65: (191.0, 28.0, 135.0),\n    66: (211.0, 120.0, 42.0),\n    67: (118.0, 174.0, 76.0),\n    68: (17.0, 242.0, 171.0),\n    69: (20.0, 65.0, 247.0),\n    70: (208.0, 61.0, 222.0),\n    71: (162.0, 62.0, 60.0),\n    72: (210.0, 235.0, 62.0),\n    73: (45.0, 152.0, 72.0),\n    74: (35.0, 107.0, 149.0),\n    75: (160.0, 89.0, 237.0),\n    76: (227.0, 56.0, 125.0),\n    77: (169.0, 143.0, 81.0),\n    78: (42.0, 143.0, 20.0),\n    79: (25.0, 160.0, 151.0),\n    80: (82.0, 75.0, 227.0),\n    82: (253.0, 59.0, 222.0),\n    84: (240.0, 130.0, 89.0),\n    86: (123.0, 172.0, 47.0),\n    87: (71.0, 194.0, 133.0),\n    88: (24.0, 94.0, 205.0),\n    89: (134.0, 16.0, 179.0),\n    90: (159.0, 32.0, 52.0),\n    93: (213.0, 208.0, 88.0),\n    95: (64.0, 158.0, 70.0),\n    96: (18.0, 163.0, 194.0),\n    97: (65.0, 29.0, 153.0),\n    98: (177.0, 10.0, 109.0),\n    99: (152.0, 83.0, 7.0),\n    100: (83.0, 175.0, 30.0),\n    101: (18.0, 199.0, 153.0),\n    102: (61.0, 81.0, 208.0),\n    103: (213.0, 85.0, 216.0),\n    104: (170.0, 53.0, 42.0),\n    105: (161.0, 192.0, 38.0),\n    106: (23.0, 241.0, 91.0),\n    107: (12.0, 103.0, 170.0),\n    110: (151.0, 41.0, 245.0),\n    112: (133.0, 51.0, 80.0),\n    115: (184.0, 162.0, 91.0),\n    116: (50.0, 138.0, 38.0),\n    118: (31.0, 237.0, 236.0),\n    120: (39.0, 19.0, 208.0),\n    121: (223.0, 27.0, 180.0),\n    122: (254.0, 141.0, 85.0),\n    125: (97.0, 144.0, 39.0),\n    128: (106.0, 231.0, 176.0),\n    130: (12.0, 61.0, 162.0),\n    131: (124.0, 66.0, 140.0),\n    132: (137.0, 66.0, 73.0),\n    134: (250.0, 253.0, 26.0),\n    136: (55.0, 191.0, 73.0),\n    138: (60.0, 126.0, 146.0),\n    139: (153.0, 108.0, 234.0),\n    140: (184.0, 58.0, 125.0),\n    141: (135.0, 84.0, 14.0),\n    145: (139.0, 248.0, 91.0),\n    148: (53.0, 200.0, 172.0),\n    154: (63.0, 69.0, 134.0),\n    155: (190.0, 75.0, 186.0),\n    156: (127.0, 63.0, 52.0),\n    157: (141.0, 182.0, 25.0),\n    159: (56.0, 144.0, 89.0),\n    161: (64.0, 160.0, 250.0),\n    163: (182.0, 86.0, 245.0),\n    165: (139.0, 18.0, 53.0),\n    166: (134.0, 120.0, 54.0),\n    168: (49.0, 165.0, 42.0),\n    169: (51.0, 128.0, 133.0),\n    170: (44.0, 21.0, 163.0),\n    177: (232.0, 93.0, 193.0),\n    180: (176.0, 102.0, 54.0),\n    185: (116.0, 217.0, 17.0),\n    188: (54.0, 209.0, 150.0),\n    191: (60.0, 99.0, 204.0),\n    193: (129.0, 43.0, 144.0),\n    195: (252.0, 100.0, 106.0),\n    202: (187.0, 196.0, 73.0),\n    208: (13.0, 158.0, 40.0),\n    213: (52.0, 122.0, 152.0),\n    214: (128.0, 76.0, 202.0),\n    221: (187.0, 50.0, 115.0),\n    229: (180.0, 141.0, 71.0),\n    230: (77.0, 208.0, 35.0),\n    232: (72.0, 183.0, 168.0),\n    233: (97.0, 99.0, 203.0),\n    242: (172.0, 22.0, 158.0),\n    250: (155.0, 64.0, 40.0),\n    261: (118.0, 159.0, 30.0),\n    264: (69.0, 252.0, 148.0),\n    276: 
(45.0, 103.0, 173.0),\n    283: (111.0, 38.0, 149.0),\n    286: (184.0, 9.0, 49.0),\n    300: (188.0, 174.0, 67.0),\n    304: (53.0, 206.0, 53.0),\n    312: (97.0, 235.0, 252.0),\n    323: (66.0, 32.0, 182.0),\n    325: (236.0, 114.0, 195.0),\n    331: (241.0, 154.0, 83.0),\n    342: (133.0, 240.0, 52.0),\n    356: (16.0, 205.0, 144.0),\n    370: (75.0, 101.0, 198.0),\n    392: (237.0, 95.0, 251.0),\n    395: (191.0, 52.0, 49.0),\n    399: (227.0, 254.0, 54.0),\n    408: (49.0, 206.0, 87.0),\n    417: (48.0, 113.0, 150.0),\n    488: (125.0, 73.0, 182.0),\n    540: (229.0, 32.0, 114.0),\n    562: (158.0, 119.0, 28.0),\n    570: (60.0, 205.0, 27.0),\n    572: (18.0, 215.0, 201.0),\n    581: (79.0, 76.0, 153.0),\n    609: (134.0, 13.0, 116.0),\n    748: (192.0, 97.0, 63.0),\n    776: (108.0, 163.0, 18.0),\n    1156: (95.0, 220.0, 156.0),\n    1163: (98.0, 141.0, 208.0),\n    1164: (144.0, 19.0, 193.0),\n    1165: (166.0, 36.0, 57.0),\n    1166: (212.0, 202.0, 34.0),\n    1167: (23.0, 206.0, 34.0),\n    1168: (91.0, 211.0, 236.0),\n    1169: (79.0, 55.0, 137.0),\n    1170: (182.0, 19.0, 117.0),\n    1171: (134.0, 76.0, 14.0),\n    1172: (87.0, 185.0, 28.0),\n    1173: (82.0, 224.0, 187.0),\n    1174: (92.0, 110.0, 214.0),\n    1175: (168.0, 80.0, 171.0),\n    1176: (197.0, 63.0, 51.0),\n    1178: (175.0, 199.0, 77.0),\n    1179: (62.0, 180.0, 98.0),\n    1180: (8.0, 91.0, 150.0),\n    1181: (77.0, 15.0, 130.0),\n    1182: (154.0, 65.0, 96.0),\n    1183: (197.0, 152.0, 11.0),\n    1184: (59.0, 155.0, 45.0),\n    1185: (12.0, 147.0, 145.0),\n    1186: (54.0, 35.0, 219.0),\n    1187: (210.0, 73.0, 181.0),\n    1188: (221.0, 124.0, 77.0),\n    1189: (149.0, 214.0, 66.0),\n    1190: (72.0, 185.0, 134.0),\n    1191: (42.0, 94.0, 198.0),\n}\n\nHEAD_CATS_SCANNET_200 = ['tv stand', 'curtain', 'blinds', 'shower curtain', 'bookshelf', 'tv', 'kitchen cabinet', 'pillow', 'lamp', 'dresser', 'monitor', 'object', 'ceiling', 'board', 'stove', 'closet wall', 'couch', 'office chair', 'kitchen counter', 'shower', 'closet', 'doorframe', 'sofa chair', 'mailbox', 'nightstand', 'washing machine', 'picture', 'book', 'sink', 'recycling bin', 'table', 'backpack', 'shower wall', 'toilet', 'copier', 'counter', 'stool', 'refrigerator', 'window', 'file cabinet', 'chair', 'wall', 'plant', 'coffee table', 'stairs', 'armchair', 'cabinet', 'bathroom vanity', 'bathroom stall', 'mirror', 'blackboard', 'trash can', 'stair rail', 'box', 'towel', 'door', 'clothes', 'whiteboard', 'bed', 'floor', 'bathtub', 'desk', 'wardrobe', 'clothes dryer', 'radiator', 'shelf']\nCOMMON_CATS_SCANNET_200 = [\"cushion\", \"end table\", \"dining table\", \"keyboard\", \"bag\", \"toilet paper\", \"printer\", \"blanket\", \"microwave\", \"shoe\", \"computer tower\", \"bottle\", \"bin\", \"ottoman\", \"bench\", \"basket\", \"fan\", \"laptop\", \"person\", \"paper towel dispenser\", \"oven\", \"rack\", \"piano\", \"suitcase\", \"rail\", \"container\", \"telephone\", \"stand\", \"light\", \"laundry basket\", \"pipe\", \"seat\", \"column\", \"bicycle\", \"ladder\", \"jacket\", \"storage bin\", \"coffee maker\", \"dishwasher\", \"machine\", \"mat\", \"windowsill\", \"bulletin board\", \"fireplace\", \"mini fridge\", \"water cooler\", \"shower door\", \"pillar\", \"ledge\", \"furniture\", \"cart\", \"decoration\", \"closet door\", \"vacuum cleaner\", \"dish rack\", \"range hood\", \"projector screen\", \"divider\", \"bathroom counter\", \"laundry hamper\", \"bathroom stall door\", \"ceiling light\", \"trash bin\", \"bathroom cabinet\", \"structure\", \"storage 
organizer\", \"potted plant\", \"mattress\"]\nTAIL_CATS_SCANNET_200 = [\"paper\", \"plate\", \"soap dispenser\", \"bucket\", \"clock\", \"guitar\", \"toilet paper holder\", \"speaker\", \"cup\", \"paper towel roll\", \"bar\", \"toaster\", \"ironing board\", \"soap dish\", \"toilet paper dispenser\", \"fire extinguisher\", \"ball\", \"hat\", \"shower curtain rod\", \"paper cutter\", \"tray\", \"toaster oven\", \"mouse\", \"toilet seat cover dispenser\", \"storage container\", \"scale\", \"tissue box\", \"light switch\", \"crate\", \"power outlet\", \"sign\", \"projector\", \"candle\", \"plunger\", \"stuffed animal\", \"headphones\", \"broom\", \"guitar case\", \"dustpan\", \"hair dryer\", \"water bottle\", \"handicap bar\", \"purse\", \"vent\", \"shower floor\", \"water pitcher\", \"bowl\", \"paper bag\", \"alarm clock\", \"music stand\", \"laundry detergent\", \"dumbbell\", \"tube\", \"cd case\", \"closet rod\", \"coffee kettle\", \"shower head\", \"keyboard piano\", \"case of water bottles\", \"coat rack\", \"folded chair\", \"fire alarm\", \"power strip\", \"calendar\", \"poster\", \"luggage\"]\n\nVALID_CLASS_IDS_200_VALIDATION = ('wall', 'chair', 'floor', 'table', 'door', 'couch', 'cabinet', 'shelf', 'desk', 'office chair', 'bed', 'pillow', 'sink', 'picture', 'window', 'toilet', 'bookshelf', 'monitor', 'curtain', 'book', 'armchair', 'coffee table', 'box', 'refrigerator', 'lamp', 'kitchen cabinet', 'towel', 'clothes', 'tv', 'nightstand', 'counter', 'dresser', 'stool', 'cushion', 'plant', 'ceiling', 'bathtub', 'end table', 'dining table', 'keyboard', 'bag', 'backpack', 'toilet paper', 'printer', 'tv stand', 'whiteboard', 'blanket', 'shower curtain', 'trash can', 'closet', 'stairs', 'microwave', 'stove', 'shoe', 'computer tower', 'bottle', 'bin', 'ottoman', 'bench', 'board', 'washing machine', 'mirror', 'copier', 'basket', 'sofa chair', 'file cabinet', 'fan', 'laptop', 'shower', 'paper', 'person', 'paper towel dispenser', 'oven', 'blinds', 'rack', 'plate', 'blackboard', 'piano', 'suitcase', 'rail', 'radiator', 'recycling bin', 'container', 'wardrobe', 'soap dispenser', 'telephone', 'bucket', 'clock', 'stand', 'light', 'laundry basket', 'pipe', 'clothes dryer', 'guitar', 'toilet paper holder', 'seat', 'speaker', 'column', 'ladder', 'bathroom stall', 'shower wall', 'cup', 'jacket', 'storage bin', 'coffee maker', 'dishwasher', 'paper towel roll', 'machine', 'mat', 'windowsill', 'bar', 'toaster', 'bulletin board', 'ironing board', 'fireplace', 'soap dish', 'kitchen counter', 'doorframe', 'toilet paper dispenser', 'mini fridge', 'fire extinguisher', 'ball', 'hat', 'shower curtain rod', 'water cooler', 'paper cutter', 'tray', 'shower door', 'pillar', 'ledge', 'toaster oven', 'mouse', 'toilet seat cover dispenser', 'furniture', 'cart', 'scale', 'tissue box', 'light switch', 'crate', 'power outlet', 'decoration', 'sign', 'projector', 'closet door', 'vacuum cleaner', 'plunger', 'stuffed animal', 'headphones', 'dish rack', 'broom', 'range hood', 'dustpan', 'hair dryer', 'water bottle', 'handicap bar', 'vent', 'shower floor', 'water pitcher', 'mailbox', 'bowl', 'paper bag', 'projector screen', 'divider', 'laundry detergent', 'bathroom counter', 'object', 'bathroom vanity', 'closet wall', 'laundry hamper', 'bathroom stall door', 'ceiling light', 'trash bin', 'dumbbell', 'stair rail', 'tube', 'bathroom cabinet', 'closet rod', 'coffee kettle', 'shower head', 'keyboard piano', 'case of water bottles', 'coat rack', 'folded chair', 'fire alarm', 'power strip', 'calendar', 'poster', 'potted plant', 
'mattress')\n\nBASE_CLASS_1 = ['cabinet', 'bed', 'chair', 'table', 'door', 'window', 'picture', 'counter', 'curtain', 'refrigerator', 'shower curtain', 'sink', 'bathtub']\nNOVEL_CLASS_1 = ['sofa chair', 'bookshelf', 'desk', 'toilet']\nBASE_CLASS_2 = ['cabinet', 'sofa chair', 'door', 'window', 'counter', 'desk', 'curtain', 'refrigerator', 'shower curtain', 'toilet']\nNOVEL_CLASS_2 = ['bed', 'chair', 'table', 'bookshelf', 'picture', 'sink', 'bathtub']\nBASE_CLASS_3 = ['cabinet', 'bed', 'chair', 'sofa chair', 'table', 'door', 'window', 'curtain']\nNOVEL_CLASS_3 = ['bookshelf', 'picture', 'counter', 'desk', 'refrigerator', 'shower curtain', 'toilet', 'sink', 'bathtub']\nALL = list(CLASS_LABELS_200)\n\nRANDOM_COLOR_MAP = []\nfor _ in range(1000):\n    for k in range(12):\n        RANDOM_COLOR_MAP.append(plt.cm.Set3(k))\n    for k in range(9):\n        RANDOM_COLOR_MAP.append(plt.cm.Set1(k))\n    for k in range(8):\n        RANDOM_COLOR_MAP.append(plt.cm.Set2(k))\nRANDOM_COLOR_MAP.append((0, 0, 0, 0))\nRANDOM_COLOR_MAP = np.array(RANDOM_COLOR_MAP) * 255\n\nclass PromptType(Enum):\n    TEXT = 1\n    IMAGE = 2\n    POINT = 3\n"
  },
  {
    "path": "preprocess/utils/label_convert.py",
    "content": "ARKITSCENE_SCANNET= {\n'bed': 'bed',\n'cabinet': 'cabinet',\n'refrigerator': 'refrigerator',\n'table': 'table',\n'chair': 'chair',\n'sink': 'sink',\n'stove': 'stove',\n'oven': 'oven',\n'washer': 'washing machine',\n'shelf': 'shelf',\n'tv_monitor': 'tv',\n'bathtub': 'bathtub',\n'toilet': 'toilet',\n'sofa': 'sofa',\n'stool': 'stool',\n'fireplace': 'fireplace',\n'build_in_cabinet': 'cabinet',\n'dishwasher': 'dishwasher',\n'stairs': 'stairs'\n}\n\n\nMULTISCAN_SCANNET = {\n    \"wall\": \"wall\",\n    \"door\": \"door\",\n    \"slippers\": \"shoe\",\n    \"mop\": \"broom\",\n    \"rug\": \"rug\",\n    \"floor\": \"floor\",\n    \"basin\": \"sink\",\n    \"basin_stand\": \"sink\",\n    \"bucket\": \"bucket\",\n    \"shower\": \"shower\",\n    \"water_tank\": \"container\",\n    \"beam\": \"wood beam\",\n    \"pillar\": \"pillar\",\n    \"ceiling\": \"ceiling\",\n    \"sink\": \"sink\",\n    \"toilet\": \"toilet\",\n    \"cabinet\": \"cabinet\",\n    \"remove\": \"object\",\n    \"towel\": \"towel\",\n    \"pillow\": \"pillow\",\n    \"sofa\": \"sofa\",\n    \"footstool\": \"footstool\",\n    \"picture\": \"picture\",\n    \"window\": \"window\",\n    \"heater\": \"heater\",\n    \"mirror\": \"mirror\",\n    \"pipe\": \"pipe\",\n    \"scarf\": \"cloth\",\n    \"ceiling_light\": \"ceiling light\",\n    \"chair\": \"chair\",\n    \"table\": \"table\",\n    \"vent\": \"vent\",\n    \"bag\": \"bag\",\n    \"wall_cabinet\": \"cabinet\",\n    \"range\": \"stove\",\n    \"ricemaker\": \"rice cooker\",\n    \"pan\": \"cooking pan\",\n    \"coffee_machine\": \"coffee maker\",\n    \"rice_bag\": \"bag\",\n    \"light\": \"light\",\n    \"trashbin\": \"trash bin\",\n    \"kettle\": \"kettle\",\n    \"refrigerator\": \"refrigerator\",\n    \"microwave\": \"microwave\",\n    \"light_switch\": \"light switch\",\n    \"rice_cooker\": \"rice cooker\",\n    \"box\": \"box\",\n    \"shoe\": \"shoe\",\n    \"range_hood\": \"range hood\",\n    \"wok\": \"cooking pan\",\n    \"router\": \"object\",\n    \"paper_towel\": \"paper towel roll\",\n    \"stock_pot\": \"pot\",\n    \"cutting_board\": \"cutting board\",\n    \"wall_calendar\": \"calendar\",\n    \"baseboard\": \"object\",\n    \"coke_box\": \"box\",\n    \"printer\": \"printer\",\n    \"bowl\": \"bowl\",\n    \"backpack\": \"backpack\",\n    \"baseboard_heater\": \"heater\",\n    \"broom\": \"broom\",\n    \"dust_pan\": \"dustpan\",\n    \"trash_bin\": \"trash bin\",\n    \"rigid_duct\": \"vent\",\n    \"electric_range\": \"stove\",\n    \"spatula\": \"object\",\n    \"faucet\": \"faucet\",\n    \"bottle\": \"bottle\",\n    \"countertop\": \"counter\",\n    \"railing\": \"railing\",\n    \"suitcase\": \"suitcase\",\n    \"trash\": \"trash can\",\n    \"pot\": \"pot\",\n    \"kitchen_tool\": \"object\",\n    \"vegetable\": \"object\",\n    \"board\": \"board\",\n    \"washing_machine\": \"washing machine\",\n    \"jar\": \"jar\",\n    \"object\": \"object\",\n    \"notebook\": \"book\",\n    \"induction_cooker\": \"stove\",\n    \"instant_pot_lid\": \"cooking pot\",\n    \"oven\": \"oven\",\n    \"air_fryer\": \"object\",\n    \"lid\": \"pot\",\n    \"sponge\": \"sponge\",\n    \"blender\": \"object\",\n    \"spoon\": \"object\",\n    \"dishwasher\": \"dishwasher\",\n    \"detergent\": \"laundry detergent\",\n    \"watermelon\": \"bananas\",\n    \"yard_waste_bag\": \"garbage bag\",\n    \"container\": \"container\",\n    \"newspapers\": \"paper\",\n    \"rag\": \"cloth\",\n    \"ladder\": \"ladder\",\n    \"gate\": \"door\",\n    
\"napkin_box\": \"tissue box\",\n    \"jacket\": \"jacket\",\n    \"windowsill\": \"windowsill\",\n    \"water_faucet\": \"faucet\",\n    \"steel_ball\": \"ball\",\n    \"rice_maker\": \"rice cooker\",\n    \"watter_bottle\": \"water bottle\",\n    \"plastic_bag\": \"bag\",\n    \"paper_bag\": \"paper bag\",\n    \"cuttting_board\": \"cutting board\",\n    \"trash_bin_lid\": \"trash bin\",\n    \"hair_dryer\": \"hair dryer\",\n    \"electric_socket\": \"power outlet\",\n    \"electric_panel\": \"electric panel\",\n    \"wash_stand\": \"sink\",\n    \"soap\": \"soap\",\n    \"curtain\": \"curtain\",\n    \"bathtub\": \"bathtub\",\n    \"smoke_detector\": \"smoke detector\",\n    \"roll_paper\": \"paper towel roll\",\n    \"chandelier\": \"chandelier\",\n    \"hand_sanitizer\": \"hand sanitzer dispenser\",\n    \"plate\": \"plate\",\n    \"sticker\": \"sticker\",\n    \"power_socket\": \"power outlet\",\n    \"stacked_cups\": \"stack of cups\",\n    \"stacked_chairs\": \"stack of chairs\",\n    \"air_vent\": \"vent\",\n    \"cornice\": \"cabinet\",\n    \"wine_cabinet\": \"kitchen cabinet\",\n    \"crock\": \"bowl\",\n    \"liquor_box\": \"cabinet\",\n    \"shampoo\": \"shampoo\",\n    \"shower_curtain\": \"shower curtain\",\n    \"wall_light\": \"wall lamp\",\n    \"sink_cabinet\": \"sink\",\n    \"toilet_roll\": \"toilet paper\",\n    \"shelf\": \"shelf\",\n    \"paper_bin\": \"recycling bin\",\n    \"toilet_brush\": \"toilet brush\",\n    \"shower_head\": \"shower head\",\n    \"tv\": \"tv\",\n    \"remote_control\": \"remote\",\n    \"tv_box\": \"tv stand\",\n    \"nightstand\": \"nightstand\",\n    \"bed\": \"bed\",\n    \"quilt\": \"blanket\",\n    \"telephone\": \"telephone\",\n    \"monitor\": \"monitor\",\n    \"desk\": \"desk\",\n    \"radiator_shell\": \"radiator\",\n    \"calendar\": \"calendar\",\n    \"clock\": \"clock\",\n    \"keyboard\": \"keyboard\",\n    \"speaker\": \"speaker\",\n    \"clothes\": \"clothes\",\n    \"door_frame\": \"doorframe\",\n    \"sliding_door\": \"sliding door\",\n    \"ceiling_lamp\": \"ceiling lamp\",\n    \"scale\": \"scale\",\n    \"power_strip\": \"power strip\",\n    \"switch\": \"light switch\",\n    \"basket\": \"basket\",\n    \"stool\": \"stool\",\n    \"shoes\": \"shoe\",\n    \"slipper\": \"slippers\",\n    \"bifold_door\": \"door\",\n    \"rangehood\": \"range hood\",\n    \"books\": \"books\",\n    \"toilet_paper\": \"toilet paper\",\n    \"mouse_pad\": \"mouse\",\n    \"ipad\": \"ipad\",\n    \"scissor\": \"knife block\",\n    \"radiator\": \"radiator\",\n    \"pc\": \"computer tower\",\n    \"bicycle\": \"bicycle\",\n    \"wardrobe\": \"wardrobe\",\n    \"mouse\": \"mouse\",\n    \"advertising_board\": \"poster\",\n    \"banner\": \"banner\",\n    \"ceiling_decoration\": \"ceiling light\",\n    \"whiteboard\": \"whiteboard\",\n    \"wall_storage_set\": \"shelf\",\n    \"traffic_cone\": \"traffic cone\",\n    \"wall_decoration\": \"decoration\",\n    \"papers\": \"papers\",\n    \"hat\": \"hat\",\n    \"velvet_hangers\": \"clothes hanger\",\n    \"circular_plate\": \"plate\",\n    \"cellphone\": \"telephone\",\n    \"pen\": \"keyboard piano\",\n    \"paper\": \"paper\",\n    \"lamp\": \"lamp\",\n    \"curtain_box\": \"curtains\",\n    \"woodcarving\": \"wood\",\n    \"scissors\": \"knife block\",\n    \"hand_dryer\": \"hand dryer\",\n    \"machine\": \"machine\",\n    \"vase\": \"vase\",\n    \"plant\": \"plant\",\n    \"power_socket_case\": \"power outlet\",\n    \"gloves\": \"clothes\",\n    \"dishcloth\": \"cloth\",\n    
\"painting\": \"painting\",\n    \"shower_wall\": \"shower wall\",\n    \"showerhead\": \"shower head\",\n    \"tooth_mug\": \"cup\",\n    \"map\": \"map\",\n    \"knot_artwork\": \"decoration\",\n    \"fan\": \"fan\",\n    \"sphygmomanometer\": \"scale\",\n    \"electric_kettle\": \"kettle\",\n    \"bread_maker\": \"oven\",\n    \"knife_set\": \"knife block\",\n    \"soup_pot\": \"cooking pot\",\n    \"flatware_set\": \"cutting board\",\n    \"candle\": \"candle\",\n    \"lid_rack\": \"dish rack\",\n    \"flower\": \"flowerpot\",\n    \"can\": \"can\",\n    \"scoop\": \"bowl\",\n    \"laptop\": \"laptop\",\n    \"glass\": \"glass doors\",\n    \"wet_floor_sign\": \"wet floor sign\",\n    \"shower_enclosure\": \"shower doors\",\n    \"jewelry_box\": \"jewelry box\",\n    \"bath_brush\": \"hair brush\",\n    \"sofa_cushion\": \"couch cushions\",\n    \"tv_cabinet\": \"tv stand\",\n    \"wood_fence\": \"wood beam\",\n    \"floor_lamp\": \"lamp\",\n    \"computer_case\": \"computer tower\",\n    \"waste_container\": \"trash bin\",\n    \"roadblock\": \"barricade\",\n    \"trash_can_lids\": \"trash can\",\n    \"hand_sanitizer_stand\": \"soap dispenser\",\n    \"air_conditioner\": \"conditioner bottle\",\n    \"pattern\": \"rug\",\n    \"remote_controller\": \"remote\",\n    \"phone\": \"telephone\",\n    \"speakers\": \"speaker\",\n    \"table_divider\": \"divider\",\n    \"table_card\": \"card\",\n    \"paper_trimmer\": \"paper cutter\",\n    \"stapler\": \"stapler\",\n    \"cup\": \"cup\",\n    \"bathroom_heater\": \"heater\",\n    \"wall_shelf\": \"shelf\",\n    \"towel_rack\": \"towel\",\n    \"sink_drain\": \"sink\",\n    \"floor_drain\": \"floor\",\n    \"broom_head\": \"broom\",\n    \"door_curtain\": \"curtain\",\n    \"refill_pouch\": \"plastic container\",\n    \"bin\": \"bin\",\n    \"stall_wall\": \"bathroom stall door\",\n    \"wall_speaker\": \"speaker\",\n    \"laundry_basket\": \"laundry basket\",\n    \"tissue_box\": \"tissue box\",\n    \"document_holder\": \"file cabinet\",\n    \"yoga_mat\": \"yoga mat\",\n    \"gas_range\": \"stove\",\n    \"chopping_board\": \"cutting board\",\n    \"book_scanner\": \"scanner\",\n    \"payment_terminal\": \"vending machine\",\n    \"napkin_roll\": \"paper towel roll\",\n    \"faucet_switch\": \"faucet\",\n    \"glass_door\": \"glass doors\",\n    \"carpet\": \"carpet\",\n    \"shower_floor\": \"shower floor\",\n    \"toilet_plunger\": \"plunger\",\n    \"plug_panel\": \"power outlet\",\n    \"stand\": \"stand\",\n    \"potted_plant\": \"potted plant\",\n    \"poster\": \"poster\",\n    \"isolation_board\": \"divider\",\n    \"soap_holder\": \"soap dish\",\n    \"plug\": \"power outlet\",\n    \"brush\": \"hair brush\",\n    \"threshold\": \"doorframe\",\n    \"air_conditioner_controller\": \"remote\",\n    \"iron\": \"iron\",\n    \"ironing_board\": \"ironing board\",\n    \"safe\": \"suitcase\",\n    \"gas_cooker\": \"stove\",\n    \"pressure_cooker\": \"cooking pot\",\n    \"steamer_pot\": \"pot\",\n    \"soy_sauce_bottle\": \"bottle\",\n    \"dishwashing_liquid\": \"dishwashing soap bottle\",\n    \"water_ladle\": \"bowl\",\n    \"power_socket_set\": \"power strip\",\n    \"kitchen_tool_holder\": \"kitchen cabinet\",\n    \"case\": \"case\",\n    \"wall_paper\": \"wall\",\n    \"comb\": \"hair brush\",\n    \"paper_cutter\": \"paper cutter\",\n    \"pencil_sharpener\": \"pen holder\",\n    \"sealing_machine\": \"machine\",\n    \"poster_board\": \"poster\",\n    \"shredder\": \"shredder\",\n    \"footstep\": \"stair\",\n    
\"planter\": \"plant\",\n    \"floor_light\": \"lamp\",\n    \"paper_cup\": \"cup\",\n    \"divider\": \"divider\",\n    \"hanger\": \"clothes hanger\",\n    \"glove\": \"clothing\",\n    \"blanket\": \"blanket\",\n    \"remote\": \"remote\",\n    \"cloth\": \"cloth\",\n    \"clutter\": \"object\",\n    \"extinguisher\": \"fire extinguisher\",\n    \"dryer\": \"clothes dryer\",\n    \"soap_bottle\": \"soap bottle\",\n    \"fabric_softener_box\": \"box\",\n    \"dryer_sheet_box\": \"box\",\n    \"detergent_bottle\": \"laundry detergent\",\n    \"toaster\": \"toaster\",\n    \"stacked_bowls\": \"bowl\",\n    \"pot_lid\": \"pot\",\n    \"electric_pressure_cooker\": \"rice cooker\",\n    \"bread\": \"food display\",\n    \"bagels\": \"object\",\n    \"oranges\": \"bananas\",\n    \"card_reader\": \"card\",\n    \"whiteboard_detergent\": \"soap dispenser\",\n    \"power_outlet\": \"power outlet\",\n    \"bouquet\": \"vase\",\n    \"water_bottle\": \"water bottle\",\n    \"wall_mounted_telephone\": \"telephone\",\n    \"fridge\": \"refrigerator\",\n    \"toy\": \"toy dinosaur\",\n    \"shoe_box\": \"box\",\n    \"hole_puncher\": \"paper cutter\",\n    \"landline_telephone\": \"telephone\",\n    \"base\": \"stand\",\n    \"handkerchief\": \"cloth\",\n    \"cornice_molding\": \"frame\",\n    \"bathtub_base\": \"bathtub\",\n    \"bidet\": \"toilet\",\n    \"pedestal_urinal\": \"urinal\",\n    \"pedestal_urinal_covered\": \"urinal\",\n    \"pit_toilet\": \"toilet\",\n    \"low_wall\": \"wall\",\n    \"rail\": \"rail\",\n    \"bottles\": \"bottles\",\n    \"floor_otherroom\": \"floor\",\n    \"wall_otherroom\": \"wall\",\n    \"canopy\": \"canopy\",\n    \"cable_manager\": \"cable\",\n    \"sneakers\": \"shoes\",\n    \"purse\": \"purse\",\n    \"cushion\": \"cushion\",\n    \"napkin\": \"towel\",\n    \"plush_toy\": \"stuffed animal\",\n    \"adjustable_desk\": \"desk\",\n    \"tableware\": \"plates\",\n    \"computer_desk\": \"desk\",\n    \"cat_kennel\": \"cat litter box\",\n    \"back_cushion\": \"pillow\",\n    \"ukulele_bag\": \"guitar case\",\n    \"litter_box\": \"trash can\",\n    \"storage_box\": \"storage bin\",\n    \"toy_doll\": \"doll\",\n    \"drawer_unit\": \"drawer\",\n    \"doll\": \"stuffed animal\",\n    \"laptop_bag\": \"messenger bag\",\n    \"clothing_rack\": \"clothing rack\",\n    \"bookshelf\": \"bookshelves\",\n    \"mask\": \"cloth\",\n    \"watch\": \"clock\",\n    \"book\": \"books\",\n    \"ashtray\": \"tray\",\n    \"car_key\": \"car\",\n    \"wallet\": \"purse\",\n    \"tea_pot\": \"tea kettle\",\n    \"wire\": \"cable\",\n    \"rake\": \"broom\",\n    \"dispenser\": \"soap dispenser\",\n    \"toilet_tank\": \"toilet\",\n    \"door_sill\": \"doorframe\",\n    \"cleanser\": \"soap\",\n    \"armrest\": \"armchair\",\n    \"short_wall\": \"wall\",\n    \"suspended_ceiling\": \"ceiling\",\n    \"fire_extinguisher_cabinet\": \"fire extinguisher\",\n    \"plastic_box\": \"plastic container\",\n    \"sanitation_station\": \"soap dispenser\",\n    \"plant_pot\": \"flowerpot\",\n    \"fireplace\": \"fireplace\",\n    \"computer_table\": \"desk\",\n    \"tissue_bag\": \"tissue box\",\n    \"wall_frame\": \"frame\",\n    \"map_board\": \"map\",\n    \"automated_teller_machine\": \"vending machine\",\n    \"ticket\": \"card\",\n    \"tablet\": \"ipad\",\n    \"blankets\": \"blanket\",\n    \"bags\": \"bag\",\n    \"flag\": \"flag\",\n    \"blackboard\": \"blackboard\",\n    \"bar_table\": \"bar\",\n    \"cardboard_holder\": \"cardboard\",\n    \"potted_planet\": \"potted 
plant\",\n    \"tray\": \"tray\",\n    \"utensil_holder\": \"kitchen counter\",\n    \"bird_ceramics\": \"statue\",\n    \"shirt\": \"shirt\",\n    \"clothes_rail\": \"clothes hanger\",\n    \"power_strips\": \"power strip\",\n    \"card_board\": \"board\",\n    \"pile_of_blankets\": \"blanket\",\n    \"bed_net\": \"bed\",\n    \"umbrella\": \"umbrella\",\n    \"dragon_fruit\": \"bananas\",\n    \"tissue\": \"tissue box\",\n    \"electrical_panel\": \"electric panel\",\n    \"panel\": \"door\",\n    \"tube\": \"tube\",\n    \"pile_of_cloth\": \"cloth\",\n    \"surface\": \"table\",\n    \"chair_cushion\": \"cushion\",\n    \"guide\": \"book\",\n    \"parapet\": \"railing\",\n    \"camera\": \"camera\",\n    \"light_base\": \"lamp base\",\n    \"first_aid\": \"object\",\n    \"bench\": \"bench\",\n    \"potted_plants\": \"potted plant\",\n    \"pot_cover\": \"pot\",\n    \"yoga_mat_roll\": \"yoga mat\",\n    \"panda_doll\": \"stuffed animal\",\n    \"window_trim\": \"window\",\n    \"shoe_cabinet\": \"shoe rack\",\n    \"toilet_paper_holder\": \"toilet paper dispenser\",\n    \"shower_faucet\": \"shower faucet handle\",\n    \"bath_sponge\": \"sponge\",\n    \"ornament\": \"decoration\",\n    \"planter_box\": \"plant\",\n    \"cooktop\": \"stove\",\n    \"knife_block\": \"knife block\",\n    \"step_stool\": \"step stool\",\n    \"touchpad\": \"keyboard\",\n    \"light_box\": \"light\",\n    \"sound\": \"speaker\",\n    \"exhaust_fan_vent\": \"vent\",\n    \"paperbin\": \"recycling bin\",\n    \"mop_bucket\": \"bucket\",\n    \"sneaker\": \"shoes\",\n    \"objects\": \"object\",\n    \"cd_tray\": \"cd case\",\n    \"wall_board\": \"board\",\n    \"room_divider\": \"divider\",\n    \"paiting\": \"painting\",\n    \"cabinet_otherroom\": \"cabinet\",\n    \"electric_switch\": \"light switch\",\n    \"sign\": \"exit sign\",\n    \"hand_soap\": \"soap bottle\",\n    \"window_blinds\": \"blinds\"\n}\n\n\nRSCAN_SCANNET = {\n'pillow': 'pillow',\n'box': 'box',\n'item': 'object',\n'curtain': 'curtain',\n'towel': 'towel',\n'garbage bin': 'trash bin',\n'wall': 'wall',\n'floor': 'floor',\n'figure': 'statue',\n'frame': 'frame',\n'shelf': 'shelf',\n'clothes': 'clothing',\n'picture': 'picture',\n'organizer': 'organizer shelf',\n'ceiling': 'ceiling',\n'object': 'object',\n'cabinet': 'cabinet',\n'blanket': 'blanket',\n'monitor': 'monitor',\n'door': 'door',\n'roll': 'paper towel roll',\n'bed': 'bed',\n'desk': 'desk',\n'window': 'window',\n'nightstand': 'nightstand',\n'rack': 'rack stand',\n'plant': 'potted plant',\n'cushion': 'cushion',\n'light': 'light',\n'table': 'table',\n'windowsill': 'windowsill',\n'shades': 'blinds',\n'sofa': 'sofa',\n'beanbag': 'beanbag chair',\n'commode': 'toilet',\n'heater': 'heater',\n'trash can': 'trash can',\n'child chair': 'stool',\n'mirror': 'mirror',\n'lamp': 'lamp',\n'sink': 'sink',\n'cupboard': 'kitchen cabinet',\n'toilet paper': 'toilet paper rolls',\n'toilet': 'toilet',\n'handhold': 'hand rail',\n'vase': 'vase',\n'toilet brush': 'toilet brush',\n'armchair': 'armchair',\n'doorframe': 'doorframe',\n'bathtub': 'bathtub',\n'bath cabinet': 'bathroom cabinet',\n'basket': 'basket',\n'shower curtain': 'shower curtain',\n'bin': 'trash bin',\n'kitchen hood': 'range hood',\n'kitchen cabinet': 'kitchen cabinet',\n'kitchen sofa': 'sofa',\n'chair': 'chair',\n'rag': 'towel',\n'kitchen counter': 'kitchen counter',\n'oven': 'oven',\n'microwave': 'microwave',\n'fruit plate': 'plate',\n'player': 'keyboard piano',\n'kitchen appliance': 'microwave',\n'kettle': 'kettle',\n'wardrobe': 'wardrobe 
closet',\n'stool': 'stool',\n'stand': 'stand',\n'shoes': 'shoes',\n'counter': 'counter',\n'hand dryer': 'hand dryer',\n'suitcase': 'suitcase',\n'closet': 'closet',\n'tv': 'tv',\n'bag': 'bag',\n'laptop': 'laptop',\n'jalousie': 'blinds',\n'whiteboard': 'whiteboard',\n'planter': 'flowerpot',\n'shower': 'shower',\n'hanging cabinet': 'kitchen cabinets',\n'flower': 'plant',\n'washbasin': 'sink',\n'clothes dryer': 'clothes dryers',\n'sack': 'bag',\n'basin': 'sink',\n'radiator': 'radiator',\n'refrigerator': 'refrigerator',\n'clutter': 'object',\n'vacuum cleaner': 'vacuum cleaner',\n'shelf unit': 'shelf',\n'mop': 'broom',\n'ironing board': 'ironing board',\n'iron': 'iron',\n'bucket': 'bucket',\n'toy': 'doll',\n'stairs': 'stairs',\n'barrel': 'container',\n'washing machine': 'washing machine',\n'carpet': 'carpet',\n'sidecouch': 'couch',\n'tv stand': 'tv stand',\n'bench': 'bench',\n'humidifier': 'humidifier',\n'hanger': 'clothes hanger',\n'backpack': 'backpack',\n'drawer': 'drawer',\n'console': 'computer tower',\n'hangers': 'clothes hanger',\n'blinds': 'blinds',\n'balcony door': 'door',\n'upholstered wall': 'wall',\n'coffee table': 'coffee table',\n'blackboard': 'blackboard',\n'glass wall': 'window',\n'bottles': 'bottle',\n'pack': 'bag',\n'scale': 'scale',\n'ventilation': 'fan',\n'paper towel': 'paper towel roll',\n'bottle': 'bottle',\n'dish dryer': 'dish rack',\n'candle': 'candle',\n'pc': 'computer tower',\n'washing': 'washing machine',\n'tube': 'tube',\n'snowboard': 'board',\n'board': 'board',\n'pipe': 'pipe',\n'water heater': 'water heater',\n'vacuum': 'vacuum cleaner',\n'stuffed animal': 'stuffed animal',\n'decoration': 'decoration',\n'shower wall': 'shower wall',\n'telephone': 'telephone',\n'plate': 'plate',\n'watering can': 'can',\n'device': 'object',\n'stove': 'stove',\n'kitchen towel': 'towel',\n'garbage': 'trash can',\n'shampoo': 'shampoo bottle',\n'statue': 'statue',\n'shower door': 'shower door',\n'book': 'book',\n'fan': 'fan',\n'speaker': 'speaker',\n'pile of books': 'books',\n'side table': 'end table',\n'table lamp': 'table lamp',\n'couch': 'couch',\n'magazine': 'magazine',\n'papers': 'papers',\n'books': 'books',\n'furniture': 'furniture',\n'magazine files': 'magazine rack',\n'mannequin': 'object',\n'boxes': 'boxes',\n'clock': 'clock',\n'cube': 'object',\n'napkins': 'cloth',\n'stuffed': 'stuffed animal',\n'luggage': 'luggage',\n'partition': 'divider',\n'trash': 'trash can',\n'coffee': 'coffee maker',\n'bar': 'bar',\n'newspaper': 'paper',\n'wood': 'wood',\n'fireplace': 'fireplace',\n'dining': 'dining table',\n'dining table': 'dining table',\n'bread': 'object',\n'fruits': 'object',\n'kitchen': 'kitchen cabinet',\n'can': 'can',\n'squeezer': 'object',\n'bowl': 'bowl',\n'recycle': 'recycling bin',\n'barstool': 'stool',\n'computer': 'computer tower',\n'umbrella': 'umbrella',\n'bath': 'bathtub',\n'hanging': 'hanging',\n'rocking': 'object',\n'objects': 'object',\n'flowers': 'plant',\n'plants': 'plant',\n'jar': 'jar',\n'bedside': 'nightstand',\n'buggy': 'object',\n'side': 'object',\n'socket': 'power outlet',\n'showcase': 'display case',\n'drying': 'drying rack',\n'ottoman': 'ottoman',\n'pictures': 'pictures',\n'storage': 'storage bin',\n'footstool': 'footstool',\n'folding': 'folded chair',\n'ladder': 'ladder',\n'shoe': 'shoes',\n'pet': 'object',\n'medical': 'object',\n'soap': 'soap',\n'balcony': 'object',\n'foosball': 'foosball table',\n'hand': 'object',\n'bookshelf': 'bookshelf',\n'pile': 'object',\n'cleaning': 'object',\n'flush': 'toilet flush button',\n'towels': 'towels',\n'candlestick': 
'candle',\n'puf': 'object',\n'printer': 'printer',\n'shelves': 'shelf',\n'stair': 'stair',\n'cleanser': 'soap bottle',\n'armoire': 'wardrobe closet',\n'bidet': 'object',\n'exit': 'exit sign',\n'toaster': 'toaster',\n'laundry': 'laundry basket',\n'hood': 'range hood',\n'sponge': 'sponge',\n'fridge': 'refrigerator',\n'breadboard': 'cutting board',\n'pan': 'frying pan',\n'water': 'water bottle',\n'teapot': 'tea kettle',\n'projector': 'projector',\n'juicer': 'kitchen mixer',\n'cutting': 'cutting board',\n'windows': 'windowsill',\n'food': 'food container',\n'cup': 'cup',\n'rug': 'rug',\n'column': 'column',\n'keyboard': 'keyboard',\n'office': 'office chair',\n'exhaust': 'range hood',\n'apron': 'kitchen apron',\n'pepper': 'salt',\n'knife': 'knife block',\n'cooking': 'cooking pot',\n'tablet': 'ipad',\n'bicycle': 'bicycle',\n'pillar': 'pillar',\n'machine': 'washing machine',\n'meter': 'scale',\n'cut': 'paper cutter',\n'salt': 'salt',\n'candles': 'candle',\n'grass': 'plant',\n'sidetable': 'end table',\n'sewing': 'sewing machine',\n'guitar': 'guitar',\n'flag': 'flag',\n'paper': 'paper',\n'sugar': 'bowl',\n'cups': 'cups',\n'packs': 'boxes',\n'plates': 'plates',\n'tray': 'tray',\n'chandelier': 'chandelier',\n'mandarins': 'bananas',\n'puppet': 'doll',\n'painting': 'painting',\n'cradle': 'crib',\n'price': 'tag',\n'dish': 'dish rack',\n'boiler': 'boiler',\n'fruit': 'bananas',\n'multicooker': 'rice cooker',\n'items': 'object',\n'extractor': 'juicer',\n'air': 'fan',\n'dressing': 'mirror',\n'round': 'round table',\n'screen': 'screen',\n'mattress': 'mattress',\n'bike': 'bicycle',\n'rolled': 'rolled poster',\n'locker': 'cabinet',\n'tennis': 'tennis racket',\n'cap': 'cap',\n'ball': 'ball',\n'folder': 'folder',\n'milk': 'refridgerator',\n'dishdrainer': 'dish rack',\n'dishwasher': 'dishwasher',\n'piano': 'piano',\n'stereo': 'speaker',\n'upholstered': 'couch',\n'folded': 'folded chairs',\n'loft': 'loft bed',\n'aquarium': 'fish',\n'dispenser': 'soap dispenser',\n'body': 'person',\n'sign': 'sign',\n'baby': 'crib',\n'chest': 'chest',\n'pot': 'pot',\n'drawers': 'drawer',\n'rail': 'rail',\n'platform': 'platform',\n'tree': 'plant',\n'armor': 'helmet',\n'ironing': 'ironing board',\n'headboard': 'headboard',\n'crib': 'crib',\n'beverage': 'bottle',\n'plank': 'wood',\n'generator': 'machine',\n'file': 'file cabinet',\n'coat': 'coat rack',\n'tool': 'toolbox',\n'rolling': 'cart',\n'tire': 'tire',\n'cable': 'cable',\n'fence': 'gate',\n'handrail': 'handrail',\n't-shirt': 'shirt',\n'ramp': 'stairs',\n'seat': 'seat',\n'sideboard': 'cabinet',\n'lounger': 'chair',\n'discs': 'cd case',\n'drum': 'drum set',\n'drinks': 'soda can',\n'chairs': 'chair',\n'dishes': 'dish rack',\n'linen': 'towel',\n'glass': 'glass',\n'xbox': 'xbox controller',\n'ukulele': 'guitar',\n'pin': 'needle'\n}\n\n\nHM3D_SCANNET = {\n\"wall\": \"wall\",\n\"picture\": \"picture\",\n\"floor\": \"floor\",\n\"fireplace\": \"fireplace\",\n\"window\": \"window\",\n\"window frame\": \"window\",\n\"door\": \"door\",\n\"door knob\": \"door\",\n\"door frame\": \"doorframe\",\n\"ceiling\": \"ceiling\",\n\"ceiling fan\": \"ceiling fan\",\n\"fireplace shelf\": \"fireplace\",\n\"hearth\": \"fireplace\",\n\"fireplace floor\": \"fireplace\",\n\"armchair\": \"armchair\",\n\"table\": \"table\",\n\"coffee table\": \"coffee table\",\n\"table lamp\": \"table lamp\",\n\"sofa\": \"sofa\",\n\"pillow\": \"pillow\",\n\"tv stand\": \"tv stand\",\n\"tv\": \"tv\",\n\"device\": \"object\",\n\"chair\": \"chair\",\n\"cutlery\": \"knife block\",\n\"plate\": \"plate\",\n\"napkins\": 
\"towel\",\n\"ceiling lamp\": \"ceiling light\",\n\"kitchen cabinet\": \"kitchen cabinet\",\n\"shelf\": \"shelf\",\n\"fridge\": \"refrigerator\",\n\"microwave\": \"microwave\",\n\"kitchen cabinet lower\": \"kitchen cabinet\",\n\"coffee machine\": \"coffee maker\",\n\"oven\": \"oven\",\n\"kettle\": \"kettle\",\n\"tray\": \"tray\",\n\"knife set\": \"kitchen cabinet\",\n\"wall lamp\": \"wall lamp\",\n\"sink\": \"sink\",\n\"tap\": \"faucet\",\n\"detergent\": \"soap\",\n\"unknown\": \"object\",\n\"toaster\": \"toaster\",\n\"dishwasher\": \"dishwasher\",\n\"cabinet\": \"cabinet\",\n\"bed\": \"bed\",\n\"bedside table\": \"nightstand\",\n\"clock\": \"clock\",\n\"air vent\": \"vent\",\n\"hanger\": \"clothes hanger\",\n\"mirror\": \"mirror\",\n\"wash cabinet\": \"bathroom vanity\",\n\"washbasin\": \"sink\",\n\"faucet\": \"faucet\",\n\"cosmetic\": \"toiletry\",\n\"soap\": \"soap\",\n\"cosmetics\": \"toiletry\",\n\"towel\": \"towel\",\n\"bin\": \"bin\",\n\"toilet paper\": \"toilet paper\",\n\"toilet\": \"toilet\",\n\"bath\": \"bathtub\",\n\"bath curtain\": \"shower curtain\",\n\"curtain bar\": \"curtain rod\",\n\"bath shelf\": \"shelf\",\n\"bath dial\": \"shower faucet handle\",\n\"bath faucet\": \"faucet\",\n\"decoration\": \"decoration\",\n\"tissue box\": \"tissue box\",\n\"hand towel\": \"hand towel\",\n\"storage unit\": \"storage bin\",\n\"shower wall\": \"shower wall\",\n\"shower seat\": \"seat\",\n\"shower floor\": \"shower floor\",\n\"shower cabin\": \"shower\",\n\"shower curtain\": \"shower curtain\",\n\"shower bar\": \"handrail\",\n\"showerhead\": \"shower head\",\n\"shower dial\": \"shower faucet handle\",\n\"shower hanger\": \"clothes hanger\",\n\"rug\": \"rug\",\n\"fire detector\": \"smoke detector\",\n\"banister\": \"banister\",\n\"plunger\": \"plunger\",\n\"rod\": \"rod\",\n\"washer-dryer\": \"washing machine\",\n\"couch\": \"couch\",\n\"lamp\": \"lamp\",\n\"pouffe\": \"ottoman\",\n\"furniture\": \"furniture\",\n\"fan\": \"fan\",\n\"chest of drawers\": \"dresser\",\n\"curtain\": \"curtain\",\n\"curtain rod\": \"curtain rod\",\n\"desk\": \"desk\",\n\"hanging clothes\": \"clothing\",\n\"barrel\": \"container\",\n\"bathtub\": \"bathtub\",\n\"drawer\": \"drawer\",\n\"countertop\": \"counter\",\n\"bathroom shelf\": \"shelf\",\n\"knob\": \"door\",\n\"toilet brush\": \"toilet\",\n\"shower knob\": \"shower\",\n\"stairs\": \"staircase\",\n\"handrail\": \"banister\",\n\"bathroom cabinet\": \"bathroom vanity\",\n\"paper towel\": \"paper towel dispenser\",\n\"towel ring\": \"towel\",\n\"towel bar\": \"towel\",\n\"kitchen counter\": \"kitchen cabinet\",\n\"refrigerator\": \"mini fridge\",\n\"vase\": \"flowerpot\",\n\"kitchen utensil\": \"cooking pot\",\n\"kitchen countertop item\": \"kitchen counter\",\n\"shelving\": \"shelf\",\n\"pitcher\": \"water pitcher\",\n\"bowl\": \"bowl\",\n\"kitchen island\": \"kitchen counter\",\n\"trashcan\": \"trash can\",\n\"stove\": \"oven\",\n\"box\": \"storage box\",\n\"clutter\": \"object\",\n\"painting\": \"picture\",\n\"book\": \"bookshelf\",\n\"toy\": \"stuffed animal\",\n\"heater\": \"radiator\",\n\"ceiling vent\": \"vent\",\n\"floor mat\": \"mat\",\n\"hand soap\": \"soap dispenser\",\n\"flower\": \"potted plant\",\n\"toilet paper dispenser\": \"toilet paper\",\n\"bathrobe\": \"bathrobe\",\n\"bed table\": \"nightstand\",\n\"bedside lamp\": \"lamp\",\n\"folding chair\": \"folded chair\",\n\"patio chair\": \"chair\",\n\"grill\": \"oven\",\n\"balustrade\": \"banister\",\n\"attic door\": \"door\",\n\"sensor\": \"smoke detector\",\n\"wall hanging decoration\": 
\"decoration\",\n\"doormat\": \"mat\",\n\"clothes hanger\": \"clothes\",\n\"wall cabinet\": \"cabinet\",\n\"wall clock\": \"clock\",\n\"led tv\": \"tv\",\n\"fireplace wall\": \"fireplace\",\n\"firewood holder\": \"wood\",\n\"floor lamp\": \"lamp\",\n\"curtain rail\": \"curtain rod\",\n\"wine rack\": \"bar\",\n\"wine bottle\": \"bottle\",\n\"wall electronics\": \"power outlet\",\n\"washing machine\": \"washing machines\",\n\"kitchen appliance\": \"kitchen cabinets\",\n\"bed light\": \"lamp\",\n\"electric box\": \"electric panel\",\n\"guitar\": \"guitar case\",\n\"media console\": \"tv stand\",\n\"newspaper\": \"magazine\",\n\"wardrobe\": \"wardrobe closet\",\n\"bottle of soap\": \"soap bottle\",\n\"ventilation hood\": \"range hood\",\n\"sauna heater\": \"heater\",\n\"sauna bowl\": \"bowl\",\n\"rail\": \"handrail\",\n\"spa bench\": \"bench\",\n\"bathroom utensil\": \"toiletry\",\n\"recessed wall\": \"wall\",\n\"art frame\": \"picture\",\n\"appliance\": \"machine\",\n\"decorative plant\": \"potted plant\",\n\"flowerpot\": \"vase\",\n\"door/window frame\": \"doorframe\",\n\"cardboard box\": \"box\",\n\"shoe\": \"shoes\",\n\"clothes\": \"clothing\",\n\"clothes rack\": \"clothing rack\",\n\"shelf with clutter\": \"shelf\",\n\"case\": \"suitcase\",\n\"backpack\": \"messenger bag\",\n\"hat\": \"cap\",\n\"storage cabinet\": \"cabinet\",\n\"bag\": \"grocery bag\",\n\"basket of something\": \"basket\",\n\"blanket\": \"blanket\",\n\"vacuum cleaner\": \"roomba\",\n\"window shutter\": \"blinds\",\n\"exercise bike\": \"exercise machine\",\n\"door/window\": \"doorframe\",\n\"plant\": \"potted plant\",\n\"bath sink\": \"sink\",\n\"shower hose\": \"shower head\",\n\"ironing board\": \"iron\",\n\"stand\": \"podium\",\n\"rack\": \"rack stand\",\n\"bottle of detergent\": \"laundry detergent\",\n\"storage box\": \"storage container\",\n\"jar\": \"container\",\n\"ceiling dome\": \"ceiling\",\n\"container\": \"box\",\n\"refrigerator cabinet\": \"refrigerator\",\n\"compound wall\": \"wall\",\n\"glass\": \"cup\",\n\"bottle\": \"bottle\",\n\"speaker\": \"speaker\",\n\"telephone\": \"telephone\",\n\"sofa chair\": \"sofa chair\",\n\"small table/stand\": \"end table\",\n\"blinds\": \"blinds\",\n\"stairs railing\": \"stair rail\",\n\"bar\": \"bar\",\n\"air conditioner\": \"vent\",\n\"bed stand\": \"nightstand\",\n\"shower door frame\": \"shower door\",\n\"bathroom accessory\": \"soap dispenser\",\n\"trash can\": \"trash can\",\n\"liquid soap\": \"soap\",\n\"desk chair\": \"office chair\",\n\"desk clutter\": \"papers\",\n\"book rack\": \"bookshelf\",\n\"wall panel\": \"wall\",\n\"closet area for hanging clothes\": \"wardrobe closet\",\n\"laundry basket\": \"laundry basket\",\n\"shower rod\": \"shower curtain rod\",\n\"shower hose/head\": \"shower head\",\n\"box of tissues\": \"tissue box\",\n\"shower ceiling\": \"ceiling\",\n\"plush toy\": \"stuffed animal\",\n\"dresser\": \"dresser\",\n\"recycle bin\": \"recycling bin\",\n\"desk lamp\": \"desk lamp\",\n\"basket\": \"basket\",\n\"drum\": \"instrument case\",\n\"stack of papers\": \"papers\",\n\"bath towel\": \"towel\",\n\"bath cabinet\": \"bathroom cabinet\",\n\"carpet\": \"carpet\",\n\"printer\": \"printer\",\n\"bench\": \"bench\",\n\"flower vase\": \"vase\",\n\"furnace\": \"heater\",\n\"sink cabinet\": \"bathroom vanity\",\n\"paper\": \"paper\",\n\"bath wall\": \"bath walls\",\n\"shower soap shelf\": \"soap dish\",\n\"chandelier\": \"chandelier\",\n\"window curtain\": \"curtain\",\n\"monitor\": \"monitor\",\n\"keyboard\": \"keyboard\",\n\"computer mouse\": 
\"mouse\",\n\"computer desk\": \"desk\",\n\"board\": \"board\",\n\"snack\": \"food container\",\n\"machine\": \"machine\",\n\"mini fridge\": \"mini fridge\",\n\"arcade game\": \"machine\",\n\"exercise machine\": \"exercise machine\",\n\"shutter\": \"door\",\n\"pad\": \"pillow\",\n\"computer equipment\": \"computer tower\",\n\"computer chair\": \"office chair\",\n\"wall shelf\": \"shelf\",\n\"staircase handrail\": \"handrail\",\n\"bed small\": \"bed\",\n\"dog bed\": \"pillow\",\n\"photo\": \"picture\",\n\"lamp table\": \"table lamp\",\n\"footstool\": \"footstool\",\n\"bath utensil\": \"bath products\",\n\"brush\": \"hair brush\",\n\"chest\": \"chest\",\n\"throw blanket\": \"blanket\",\n\"bowl of fruit\": \"bowl\",\n\"highchair\": \"seat\",\n\"laptop\": \"laptop\",\n\"file cabinet\": \"file cabinet\",\n\"food stand\": \"food display\",\n\"dining table\": \"dining table\",\n\"wall toilet paper\": \"toilet paper holder\",\n\"cabinet table\": \"cabinet\",\n\"ventilation\": \"vent\",\n\"wall indent\": \"wall\",\n\"window glass\": \"window\",\n\"bath tub\": \"bathtub\",\n\"cabinet door\": \"cabinet door\",\n\"support beam\": \"wood beam\",\n\"holder\": \"soap dish\",\n\"side table\": \"end table\",\n\"smoke detector\": \"smoke detector\",\n\"dinnerware\": \"plates\",\n\"dinner table\": \"dining table\",\n\"mirror /otherroom\": \"mirror\",\n\"shower\": \"shower\",\n\"pipe\": \"pipe\",\n\"motion detector\": \"alarm\",\n\"paper towel holder\": \"paper towel dispenser\",\n\"ornament\": \"decoration\",\n\"bedside cabinet\": \"nightstand\",\n\"ceiling door\": \"door\",\n\"stool\": \"stool\",\n\"countertop item\": \"kitchen counter\",\n\"island\": \"kitchen island\",\n\"range hood\": \"range hood\",\n\"door cabinet\": \"cabinet door\",\n\"window /outside\": \"window\",\n\"frame\": \"frame\",\n\"rocking chair\": \"chair\",\n\"fruit\": \"banana holder\",\n\"tissue\": \"tissue box\",\n\"plate of food\": \"plate\",\n\"shower curtain rod\": \"shower curtain rod\",\n\"shower tap\": \"shower faucet handle\",\n\"soapbox\": \"soap bar\",\n\"soap dispenser\": \"soap dispenser\",\n\"remote control\": \"remote\",\n\"ceiling duct\": \"vent\",\n\"pool stick\": \"pool table\",\n\"statue\": \"statue\",\n\"mat\": \"mat\",\n\"fire alarm\": \"fire alarm\",\n\"toilet brush holder\": \"toilet brush\",\n\"vent\": \"vent\",\n\"step\": \"step\",\n\"shower pipe\": \"shower head\",\n\"laundry machine\": \"washing machine\",\n\"bucket\": \"bucket\",\n\"broom\": \"broom\",\n\"cutting board\": \"cutting board\",\n\"oven and stove\": \"oven\",\n\"oven vent\": \"vent\",\n\"kitchen countertop items\": \"kitchen counter\",\n\"kitchen lower cabinet\": \"kitchen cabinet\",\n\"grate\": \"vent\",\n\"beam\": \"wood beam\",\n\"pillar\": \"pillar\",\n\"dining chair\": \"chair\",\n\"basket of towels\": \"towels\",\n\"stair handle\": \"handrail\",\n\"stair\": \"stair\",\n\"stair wall\": \"wall\",\n\"exercise ball\": \"exercise ball\",\n\"exercise equipment\": \"exercise machine\",\n\"exercise mat\": \"yoga mat\",\n\"bedframe\": \"bedframe\",\n\"lamp stand\": \"lamp base\",\n\"pool table\": \"pool table\",\n\"dartboard\": \"dart board\",\n\"bar cabinet\": \"bar\",\n\"bar chair\": \"chair\",\n\"display cabinet\": \"display case\",\n\"display table\": \"table\",\n\"screen\": \"screen\",\n\"teapot\": \"tea kettle\",\n\"railing\": \"railing\",\n\"unknown/remove\": \"object\",\n\"alarm clock\": \"alarm clock\",\n\"alarm\": \"alarm\",\n\"hunting trophy\": \"decoration\",\n\"water dispenser\": \"water cooler\",\n\"antique telephone\": 
\"telephone\",\n\"calendar\": \"calendar\",\n\"knife holder\": \"rack\",\n\"spice rack\": \"rack\",\n\"fireplace tool set\": \"fireplace\",\n\"firewood\": \"wood\",\n\"ceiling under staircase\": \"ceiling\",\n\"can\": \"can\",\n\"bathtub platform\": \"bathtub\",\n\"scale\": \"scale\",\n\"tile\": \"floor\",\n\"belt\": \"clothing\",\n\"parapet\": \"railing\",\n\"three\": \"object\",\n\"computer\": \"computer tower\",\n\"treadmill\": \"treadmill\",\n\"duct\": \"vent\",\n\"electrical box\": \"electric panel\",\n\"stair frame\": \"stair\",\n\"ladder\": \"ladder\",\n\"water heater\": \"water heater\",\n\"heater piping\": \"pipe\",\n\"exhaust pipe\": \"pipe\",\n\"pump\": \"water cooler\",\n\"hose\": \"hose\",\n\"switch\": \"light switch\",\n\"storage bin\": \"storage bin\",\n\"plastic bag\": \"bag\",\n\"wall board\": \"board\",\n\"washing machine and dryer\": \"washing machine\",\n\"flag\": \"flag\",\n\"can of paint\": \"can\",\n\"broomstick\": \"broom\",\n\"plumbing\": \"pipe\",\n\"column\": \"column\",\n\"ceiling pipe\": \"pipe\",\n\"cables\": \"power strip\",\n\"landing\": \"stairs\",\n\"ledge\": \"ledge\",\n\"mop\": \"broom\",\n\"tv remote\": \"remote\",\n\"closet door\": \"closet door\",\n\"door window\": \"door\",\n\"door hinge\": \"door\",\n\"doorway\": \"doorframe\",\n\"alarm control\": \"alarm\",\n\"radiator\": \"radiator\",\n\"ceiling light fixture connection\": \"ceiling light\",\n\"mirror frame\": \"mirror\",\n\"mantel\": \"fireplace\",\n\"firebox\": \"fireplace\",\n\"fireplace sconce\": \"fireplace\",\n\"fume cupboard\": \"cabinet\",\n\"decorative bowl\": \"bowl\",\n\"stonework\": \"wall\",\n\"kitchen shelf\": \"kitchen cabinet\",\n\"window shade\": \"curtain\",\n\"radio\": \"speaker\",\n\"cooker\": \"stove\",\n\"kitchen table\": \"dining table\",\n\"cup\": \"cup\",\n\"tiles\": \"floor\",\n\"washbasin counter\": \"sink\",\n\"doorstep\": \"door\",\n\"stairs wall\": \"stairs\",\n\"stairs trim\": \"stairs\",\n\"stacked chair\": \"stack of chairs\",\n\"carpet roll\": \"carpet\",\n\"safe\": \"cabinet\",\n\"briefcase\": \"briefcase\",\n\"door stopper\": \"door\",\n\"shades\": \"blinds\",\n\"electric cable\": \"power strip\",\n\"nightstand\": \"nightstand\",\n\"clothes hanger rod\": \"closet rod\",\n\"shower stall\": \"shower\",\n\"sink/basin\": \"sink\",\n\"worktop\": \"kitchen counter\",\n\"basin\": \"sink\",\n\"diploma\": \"frame\",\n\"garage door\": \"garage door\",\n\"garage door frame\": \"garage door\",\n\"garage door opener\": \"garage door\",\n\"boxes\": \"boxes\",\n\"jacket\": \"jacket\",\n\"tree\": \"plant\",\n\"tree branch\": \"plant\",\n\"shovel\": \"broom\",\n\"tool\": \"toolbox\",\n\"binder\": \"binders\",\n\"paper storage\": \"paper tray\",\n\"folder\": \"file cabinet\",\n\"flower stand\": \"plant\",\n\"coat hanger\": \"clothes hanger\",\n\"wall detail\": \"wall\",\n\"boiler\": \"boiler\",\n\"shower-bath cabinet\": \"bathroom cabinet\",\n\"sofa seat\": \"sofa\",\n\"platform\": \"platform\",\n\"panel\": \"wall\",\n\"coffee maker\": \"coffee maker\",\n\"dish cabinet\": \"dish rack\",\n\"candle holder\": \"candle\",\n\"ceiling molding\": \"ceiling\",\n\"bed comforter\": \"bed\",\n\"cuddly toy\": \"teddy bear\",\n\"decorative plate\": \"plate\",\n\"figure\": \"statue\",\n\"bouquet\": \"flowerpot\",\n\"kitchen extractor\": \"range hood\",\n\"pot\": \"pot\",\n\"vessel\": \"bowl\",\n\"kitchen top\": \"kitchen counter\",\n\"painting frame\": \"painting\",\n\"casket\": \"chest\",\n\"bathroom towel\": \"towel\",\n\"washing stuff\": \"soap\",\n\"shelf cubby\": \"shelf\",\n\"shower 
rail\": \"shower curtain rod\",\n\"cloth\": \"cloth\",\n\"stick\": \"rod\",\n\"luggage\": \"luggage\",\n\"sconce\": \"wall lamp\",\n\"lounge chair\": \"armchair\",\n\"patio floor\": \"floor\",\n\"roof\": \"ceiling\",\n\"jewelry\": \"jewelry box\",\n\"bedroom ceiling\": \"ceiling\",\n\"cushion\": \"pillow\",\n\"ceiling bedroom\": \"ceiling\",\n\"record player\": \"speaker\",\n\"perfume\": \"soap\",\n\"shower handle\": \"shower faucet handle\",\n\"shampoo\": \"shampoo bottle\",\n\"weight\": \"dumbbell\",\n\"candlestick\": \"candle\",\n\"cabinet kitchen\": \"kitchen cabinet\",\n\"antique clock\": \"clock\",\n\"picture frame\": \"picture\",\n\"candle\": \"candle\",\n\"air conditioning\": \"fan\",\n\"floor /outside\": \"floor\",\n\"fence\": \"rail\",\n\"canopy\": \"canopy\",\n\"end table\": \"end table\",\n\"shelf with art\": \"shelf\",\n\"dishrag\": \"towel\",\n\"staircase trim\": \"staircase\",\n\"bookshelf\": \"bookshelf\",\n\"decorative quilt\": \"blanket\",\n\"shower tub\": \"bathtub\",\n\"clothes dryer\": \"clothes dryer\",\n\"clothes hamper\": \"laundry hamper\",\n\"bathroom counter\": \"bathroom counter\",\n\"cart\": \"cart\",\n\"weight bench\": \"bench\",\n\"rack of weights\": \"dumbbell\",\n\"ceiling under stairs\": \"ceiling\",\n\"storage shelving\": \"storage shelf\",\n\"office chair\": \"office chair\",\n\"doll\": \"doll\",\n\"step stool\": \"step stool\",\n\"pc tower\": \"computer tower\",\n\"control panel\": \"electric panel\",\n\"umbrella\": \"umbrella\",\n\"food\": \"food container\",\n\"closet\": \"closet\",\n\"fireplace utensil\": \"fireplace\",\n\"holy cross\": \"decoration\",\n\"tablet\": \"ipad\",\n\"mug\": \"mug\",\n\"box of tissue\": \"tissue box\",\n\"sled\": \"luggage stand\",\n\"electrical controller\": \"electric panel\",\n\"dressing table\": \"dresser\",\n\"freezer\": \"refrigerator\",\n\"paneling\": \"wall\",\n\"object\": \"object\",\n\"pile of magazines\": \"magazine\",\n\"objects\": \"object\",\n\"counter\": \"counter\",\n\"outlet\": \"power outlet\",\n\"fire extinguisher\": \"fire extinguisher\",\n\"entertainment set\": \"tv stand\",\n\"window shutters\": \"blinds\",\n\"window valence\": \"curtain\",\n\"shower curtain bar\": \"shower curtain rod\",\n\"statue/art\": \"statue\",\n\"brochure\": \"paper\",\n\"trinket\": \"decoration\",\n\"toilet cleaner\": \"toilet brush\",\n\"air duct\": \"vent\",\n\"plank\": \"wood\",\n\"bed cabinet\": \"nightstand\",\n\"ottoman\": \"ottoman\",\n\"air refresher\": \"spray bottle\",\n\"cleaner\": \"vacuum cleaner\",\n\"shower shelf\": \"shower\",\n\"ceiling lower\": \"ceiling\",\n\"cases\": \"suitcases\",\n\"magazines\": \"magazine\",\n\"grab bar\": \"grab bar\",\n\"bath door frame\": \"doorframe\",\n\"shower door knob\": \"shower door\",\n\"document\": \"paper\",\n\"cleaning paper\": \"paper towel roll\",\n\"bath floor\": \"floor\",\n\"shower bench\": \"bench\",\n\"shower step\": \"step\",\n\"slippers\": \"slippers\",\n\"boots\": \"shoes\",\n\"weights\": \"dumbbell\",\n\"cap\": \"cap\",\n\"note\": \"paper\",\n\"golf sticks\": \"golf bag\",\n\"glasses\": \"mirror\",\n\"medal collection\": \"decoration\",\n\"decorative dinnerware\": \"plate\",\n\"music player\": \"speaker\",\n\"jug\": \"pitcher\",\n\"sliding door\": \"door\",\n\"kitchen knife set\": \"kitchen cabinet\",\n\"sitting bench\": \"bench\",\n\"bowl with sweets\": \"bowl\",\n\"decorative vase\": \"vase\",\n\"rolled carpet\": \"carpet\",\n\"ball\": \"ball\",\n\"dumbbell\": \"dumbbell\",\n\"folded chair\": \"folded chair\",\n\"screen frame\": \"screen\",\n\"speaker stand\": 
\"speaker\",\n\"projector\": \"projector\",\n\"bath tap\": \"faucet\",\n\"stuffed animal\": \"stuffed animal\",\n\"ceiling window\": \"window\",\n\"kitchen handle\": \"kitchen cabinet\",\n\"podium\": \"podium\",\n\"office table\": \"desk\",\n\"button\": \"toilet flush button\",\n\"decoder\": \"computer tower\",\n\"food tray\": \"tray\",\n\"elevator\": \"elevator\",\n\"bathtub utensil\": \"bathtub\",\n\"footrest\": \"footrest\",\n\"stair step\": \"stairs\",\n\"ceiling light\": \"ceiling light\",\n\"telescope\": \"telescope\",\n\"ping pong table\": \"ping pong table\",\n\"photo mount\": \"picture\",\n\"bookstand\": \"bookshelf\",\n\"curtain rod cover\": \"curtain rod\",\n\"baby changing table\": \"changing station\",\n\"bulletin board\": \"bulletin board\",\n\"stack of jackets\": \"jacket\",\n\"crib\": \"crib\",\n\"kitchen cabinet door\": \"cabinet door\",\n\"kitchen cabinet drawer\": \"kitchen cabinet\",\n\"potty\": \"toilet\",\n\"bottles of wine\": \"beer bottles\",\n\"table cloth\": \"table\",\n\"iron board\": \"ironing board\",\n\"shoes\": \"shoes\",\n\"bottles of water\": \"water bottle\",\n\"watch\": \"clock\",\n\"cable\": \"power outlet\",\n\"desk door\": \"desk\",\n\"notebook\": \"book\",\n\"file binder\": \"binders\",\n\"bathroom stuff\": \"bath products\",\n\"bath mat\": \"mat\",\n\"handbag\": \"bag\",\n\"robe\": \"bathrobe\",\n\"sculpture\": \"statue\",\n\"kitchen decoration\": \"decoration\",\n\"pan\": \"cooking pan\",\n\"handle\": \"door\",\n\"bicycle\": \"bicycle\",\n\"piano\": \"piano\",\n\"piano stool\": \"piano bench\",\n\"wall post\": \"wall\",\n\"magazine\": \"magazine\",\n\"cover\": \"cover\",\n\"tool box\": \"toolbox\",\n\"wood\": \"wood\",\n\"support\": \"structure\",\n\"hook\": \"rack\",\n\"gravel\": \"floor\",\n\"rafter\": \"wood beam\",\n\"soap bottle\": \"soap bottle\",\n\"dustbin\": \"trash bin\",\n\"light\": \"light\",\n\"sign\": \"sign\",\n\"plates\": \"plate\",\n\"thermostat\": \"thermostat\",\n\"mascot\": \"stuffed animal\",\n\"stack of books\": \"stack of chairs\",\n\"stack of books / papers\": \"papers\",\n\"decorative cloth\": \"cloth\",\n\"locker\": \"cabinet\",\n\"wine storage\": \"cabinet\",\n\"cleaning liquid\": \"dishwashing soap bottle\",\n\"cleaning sponge\": \"sponge\",\n\"stack of pots\": \"pot\",\n\"seat\": \"seat\",\n\"air vent fan\": \"fan\",\n\"shower handrail\": \"handrail\",\n\"ceiling fire detector\": \"smoke detector\",\n\"ceiling fan vent\": \"vent\",\n\"curtain hanger\": \"curtain rod\",\n\"bath bar\": \"bath walls\",\n\"book cabinet\": \"bookshelf\",\n\"desk cabinet\": \"desk\",\n\"papers\": \"papers\",\n\"sheets / clothes\": \"clothes\",\n\"wall hanger\": \"wall mounted coat rack\",\n\"ceiling lamp hanger\": \"ceiling lamp\",\n\"stack of stuff\": \"stack of chairs\",\n\"socket\": \"power outlet\",\n\"shower door\": \"shower door\",\n\"fan air vent\": \"vent\",\n\"crate\": \"crate\",\n\"wire\": \"power outlet\",\n\"installation\": \"structure\",\n\"sideboard\": \"cabinet\",\n\"wall sign\": \"sign\",\n\"camera\": \"camera\",\n\"urinal\": \"urinal\",\n\"stone support structure\": \"structure\",\n\"bottles\": \"bottles\",\n\"bathroom mat\": \"mat\",\n\"shower ceiling lamp\": \"ceiling lamp\",\n\"shower cosmetics\": \"bath products\",\n\"fire sprinkler\": \"fire extinguisher\",\n\"bowl of sweets\": \"bowl\",\n\"sprinkler\": \"hose\",\n\"self-closing mechanism\": \"door\",\n\"utensil\": \"kitchen cabinet\",\n\"control\": \"light switch\",\n\"headphones\": \"headphones\",\n\"bidet\": \"toilet\",\n\"arch\": \"doorframe\",\n\"closet shelving\": 
\"closet\",\n\"bean bag chair\": \"beanbag chair\",\n\"scarf\": \"clothes\",\n\"shelf clutter\": \"shelf\",\n\"bed sheet\": \"bed\",\n\"bowl of fruits\": \"bowl\",\n\"cabinet clutter\": \"cabinet\",\n\"light fixture\": \"ceiling light\",\n\"dvd player\": \"tv\",\n\"surface\": \"table\",\n\"surfboard\": \"board\",\n\"whine shelf\": \"shelf\",\n\"amplifier\": \"speaker\",\n\"stereo\": \"speaker\",\n\"fire screen\": \"fireplace\",\n\"ice maker\": \"refrigerator\",\n\"knife\": \"kitchen cabinet\",\n\"closet mirror wall\": \"mirror\",\n\"table tennis table\": \"ping pong table\",\n\"baby changing station\": \"changing station\",\n\"book display\": \"bookshelf\",\n\"sliding glass door\": \"door\",\n\"l-shaped sofa\": \"sofa\",\n\"soap dish\": \"soap dish\",\n\"mixer\": \"kitchen mixer\",\n\"puppet\": \"doll\",\n\"electric guitar\": \"guitar\",\n\"music album shelf\": \"shelf\",\n\"bathroom floor\": \"floor\",\n\"powder soap\": \"soap\",\n\"decorative lamp\": \"lamp\",\n\"toothbrush\": \"toothbrush\",\n\"shower grab bar\": \"grab bar\",\n\"candle stand\": \"candle\",\n\"electric outlet\": \"power outlet\",\n\"photos\": \"pictures\",\n\"shower caddy\": \"shower\",\n\"mouse\": \"mouse\",\n\"car model\": \"car\",\n\"guitar frame\": \"guitar case\",\n\"ship toy\": \"boat\",\n\"platter\": \"plate\",\n\"decorative bottle\": \"bottle\",\n\"audio player\": \"speaker\",\n\"firewood chest\": \"chest\",\n\"kitchen seating\": \"seat\",\n\"hats\": \"hat\",\n\"cat toilet\": \"cat litter box\",\n\"electric kettle\": \"kettle\",\n\"security camera\": \"camera\",\n\"light switch\": \"light switch\",\n\"schedule\": \"calendar\",\n\"globe\": \"globe\",\n\"magazine rack\": \"magazine\",\n\"ceiling/west wall\": \"ceiling\",\n\"shelf with shoes\": \"shoe rack\",\n\"kitchen lower shelf\": \"kitchen cabinet\",\n\"sewing machine\": \"sewing machine\",\n\"sewing set\": \"sewing machine\",\n\"sewing tools\": \"sewing machine\",\n\"strings\": \"rope\",\n\"decorative lantern\": \"lamp\",\n\"paper towel dispenser\": \"paper towel dispenser\",\n\"stack of cds\": \"cd case\",\n\"stationery\": \"paper\",\n\"shutters\": \"blinds\",\n\"keys\": \"cabinet\",\n\"rope\": \"rope\",\n\"iron\": \"iron\",\n\"trolley\": \"trolley\",\n\"closet shelf\": \"closet\",\n\"ceiling wall\": \"ceiling\",\n\"coach\": \"sofa\",\n\"sink table\": \"sink\",\n\"pool\": \"pool table\",\n\"entrance arch\": \"doorframe\",\n\"ceiling arch\": \"ceiling\",\n\"arcade\": \"object\",\n\"ceiling support\": \"pillar\",\n\"map\": \"map\",\n\"stained glass\": \"window\",\n\"ceiling decorative lamp\": \"ceiling light\",\n\"bust\": \"statue\",\n\"framed text\": \"picture\",\n\"ceiling boarder\": \"ceiling\",\n\"exhibition window frame\": \"window\",\n\"exhibition window\": \"window\",\n\"elevator door\": \"elevator\",\n\"exhibition panel\": \"display case\",\n\"information\": \"sign\",\n\"chassis\": \"structure\",\n\"ceiling ladder\": \"ladder\",\n\"exhibition picture\": \"picture\",\n\"exhibition table\": \"table\",\n\"ship model\": \"boat\",\n\"garage door opener motor\": \"garage door\",\n\"garage door opener bar\": \"bar\",\n\"garage door railing\": \"railing\",\n\"riser\": \"stairs\",\n\"dustpan\": \"dustpan\",\n\"coat\": \"coat\",\n\"doorpost\": \"doorframe\",\n\"backrest\": \"chair\",\n\"fruit bowl\": \"bowl\",\n\"easy chair\": \"armchair\",\n\"stone\": \"object\",\n\"ashtray\": \"container\",\n\"smoke alarm\": \"smoke detector\",\n\"headboard\": \"headboard\",\n\"bedpost\": \"bedframe\",\n\"wall cubby\": \"shelf\",\n\"toiletry\": \"toiletry\",\n\"globe stand\": 
\"stand\",\n\"swivel chair\": \"office chair\",\n\"bar soap\": \"soap\",\n\"sheet\": \"bed\",\n\"shirt\": \"shirt\",\n\"coffee mug\": \"mug\",\n\"toilet seat\": \"toilet\",\n\"door handle\": \"door\",\n\"exit sign\": \"exit sign\",\n\"kitchen wall\": \"wall\",\n\"washcloth\": \"washcloth\",\n\"table stand\": \"stand\",\n\"wall /outside\": \"wall\",\n\"gym rope\": \"rope\",\n\"gym equipment\": \"dumbbell\",\n\"barbell\": \"dumbbell\",\n\"gym stepper\": \"stair\",\n\"gym mat\": \"mat\",\n\"wall beam\": \"wood beam\",\n\"meshwork\": \"object\",\n\"punchbag\": \"bag\",\n\"boxing ring\": \"structure\",\n\"water tank\": \"container\",\n\"exercise mat roll\": \"mat\",\n\"gate\": \"door\",\n\"trophy\": \"statue\",\n\"payment terminal\": \"object\",\n\"cash register\": \"object\",\n\"power strip\": \"power strip\",\n\"cloth hanger\": \"clothes hanger\",\n\"jewelry box\": \"jewelry box\",\n\"extension cord\": \"power outlet\",\n\"gramophone\": \"music stand\",\n\"stereo set\": \"speaker\",\n\"newspaper basket\": \"basket\",\n\"phone\": \"telephone\",\n\"hourglass\": \"clock\",\n\"cat\": \"cat litter box\",\n\"balcony railing\": \"railing\",\n\"bed curtain\": \"curtain\",\n\"swing\": \"furniture\",\n\"skateboard\": \"board\",\n\"cat tree\": \"furniture\",\n\"laundry\": \"clothes\",\n\"clothing stand\": \"clothing rack\",\n\"sink pipe\": \"pipe\",\n\"trash bag\": \"trash bag\",\n\"lid\": \"cover\",\n\"foosball game table\": \"foosball table\",\n\"workout bike\": \"exercise machine\",\n\"canvas\": \"picture\",\n\"cat food\": \"food container\",\n\"chair /w clutter\": \"chair\",\n\"wreath\": \"decoration\",\n\"shade\": \"curtain\",\n\"wall soap shelf\": \"soap dish\",\n\"hair brush\": \"hair brush\",\n\"hair dryer\": \"hair dryer\",\n\"shower mat\": \"mat\",\n\"beanbag chair\": \"beanbag chair\",\n\"yoga mat\": \"yoga mat\",\n\"computer tower\": \"computer tower\",\n\"canister\": \"container\",\n\"motorcycle\": \"bicycle\",\n\"cabinet /w clutter\": \"cabinet\",\n\"tire\": \"tire\",\n\"axe\": \"toolbox\",\n\"partition\": \"divider\",\n\"electric wire casing\": \"power outlet\",\n\"cleaning clutter\": \"broom\",\n\"shower tray\": \"shower\",\n\"fuse box\": \"fuse box\",\n\"mortar\": \"container\",\n\"soap dish cubby\": \"soap dish\",\n\"curtain valence\": \"curtain\",\n\"beanbag\": \"beanbag chair\",\n\"boxes with books\": \"boxes\",\n\"wine\": \"bottle\",\n\"box with books\": \"box\",\n\"electric heater\": \"heater\",\n\"hot water/cold water knob\": \"faucet\",\n\"wall tv\": \"tv\",\n\"box of something\": \"box\",\n\"garage door motor\": \"garage door\",\n\"fuse panel\": \"electric panel\",\n\"closet storage area\": \"closet\",\n\"lampshade\": \"lamp\",\n\"dvd\": \"object\",\n\"curtain box\": \"curtain\",\n\"board game\": \"object\",\n\"easel\": \"easel\",\n\"blackboard\": \"blackboard\",\n\"overhang\": \"structure\",\n\"drainpipe\": \"pipe\",\n\"grass\": \"floor\",\n\"patio\": \"floor\",\n\"wall statue\": \"statue\",\n\"recessed shelving\": \"shelf\",\n\"cornice\": \"wall\",\n\"coat rack\": \"coat rack\",\n\"candelabra\": \"chandelier\",\n\"tea set\": \"cup\",\n\"purse\": \"purse\",\n\"notes\": \"paper\",\n\"wall outside\": \"wall\",\n\"office wall\": \"wall\",\n\"floor stand\": \"stand\",\n\"banner\": \"banner\",\n\"cups\": \"cups\",\n\"drawer sink table\": \"sink\",\n\"liquid cleaner\": \"spray bottle\",\n\"wall panel frame\": \"frame\",\n\"poster\": \"poster\",\n\"brochures\": \"papers\",\n\"calculator\": \"object\",\n\"pavement\": \"floor\",\n\"chest drawer\": \"dresser\",\n\"pen cup\": 
\"cup\",\n\"pendant\": \"object\",\n\"box pen\": \"box\",\n\"bathtub tap\": \"bathtub\",\n\"sink tap\": \"sink\",\n\"sewing box\": \"box\",\n\"attic hatch\": \"ceiling\",\n\"shredder\": \"shredder\",\n\"decor\": \"decoration\",\n\"board games\": \"object\",\n\"chess\": \"object\",\n\"piano bench\": \"piano bench\",\n\"basket with books\": \"basket\",\n\"cross\": \"object\",\n\"kitchen utensils\": \"kitchen cabinet\",\n\"christmas tree\": \"decoration\",\n\"containers\": \"container\",\n\"cans of paint\": \"can\",\n\"cardboard\": \"cardboard\",\n\"basketballs\": \"ball\",\n\"sleeping bag\": \"sleeping bag\",\n\"cloth hangers\": \"clothes hanger\",\n\"scoop\": \"object\",\n\"electrical device\": \"electric panel\",\n\"chimney\": \"structure\",\n\"bags\": \"bag\",\n\"box of food\": \"box\",\n\"liquid container\": \"bottle\",\n\"staircase wall\": \"wall\",\n\"garage door opener railing\": \"garage door\",\n\"electric wire\": \"electric panel\",\n\"silicone gun\": \"object\",\n\"wheel\": \"wheel\",\n\"drawers\": \"drawer\",\n\"boards\": \"board\",\n\"spray\": \"spray bottle\",\n\"car\": \"car\",\n\"bicycle helmets\": \"helmet\",\n\"baseball bat\": \"object\",\n\"hammock\": \"object\",\n\"camping chair\": \"chair\",\n\"poles\": \"structure\",\n\"saw\": \"object\",\n\"baskets\": \"basket\",\n\"watering can\": \"can\",\n\"football\": \"ball\",\n\"notebooks\": \"book\",\n\"shoes on shelf\": \"shoes\",\n\"clothes on shelf\": \"clothes\",\n\"rack with shoes\": \"shoe rack\",\n\"set of hangers\": \"clothes hanger\",\n\"stack of clothes\": \"clothes\",\n\"figurine\": \"object\",\n\"door screen\": \"door\",\n\"subwoofer\": \"speaker\",\n\"dish\": \"plate\",\n\"stack of towels\": \"towel\",\n\"shower wall cubby\": \"shower wall\",\n\"stack of shoes\": \"shoes\",\n\"stack of blankets\": \"blanket\",\n\"duvet\": \"blanket\",\n\"stack of pillows\": \"pillows\",\n\"stack of bags\": \"bag\",\n\"set of cosmetics\": \"object\",\n\"detergent bottle\": \"bottle\",\n\"central heating furnace\": \"heater\",\n\"set of valves\": \"pipe\",\n\"closet rod\": \"closet rod\",\n\"chair stand\": \"chair\",\n\"bathroom art\": \"painting\",\n\"ceiling fixture\": \"ceiling light\",\n\"shower glass\": \"shower door\",\n\"bath curtain bar\": \"shower curtain rod\",\n\"bath grab bar\": \"grab bar\",\n\"mascots\": \"statue\",\n\"attic entrance\": \"door\",\n\"shoe shelf\": \"shoe rack\",\n\"guitar stand\": \"guitar case\",\n\"keyboard stand\": \"keyboard piano\",\n\"air vent installation\": \"vent\",\n\"washing powder\": \"laundry detergent\",\n\"electric installation\": \"electric panel\",\n\"tank\": \"container\",\n\"stack of boxes\": \"boxes\",\n\"title\": \"sign\",\n\"dinner chair\": \"chair\",\n\"cabinet /otherroom\": \"cabinet\",\n\"plug\": \"power outlet\",\n\"countertop /otherroom\": \"counter\",\n\"whiteboard\": \"whiteboard\",\n\"dishes\": \"plate\",\n\"flatware\": \"kitchen cabinet\",\n\"garage light\": \"ceiling light\",\n\"aquarium\": \"fish\",\n\"high shelf\": \"shelf\",\n\"clothes bag\": \"laundry bag\",\n\"copier machine\": \"copier\",\n\"partial\": \"structure\",\n\"stairs skirt\": \"stairs\",\n\"roomba\": \"roomba\",\n\"hutch\": \"cabinet\",\n\"icebox\": \"refrigerator\",\n\"stack\": \"stack of chairs\",\n\"wine cabinet\": \"cabinet\",\n\"plant ornament\": \"potted plant\",\n\"fireplace mirror\": \"mirror\",\n\"stand/small table\": \"end table\",\n\"artwork\": \"painting\",\n\"leg rest\": \"footrest\",\n\"counter door\": \"cabinet door\",\n\"artwork frame\": \"frame\",\n\"oil lamp\": \"lamp\",\n\"round chair\": 
\"chair\",\n\"violin case\": \"instrument case\",\n\"folding stand\": \"folded chair\",\n\"dress\": \"clothing\",\n\"art/statue\": \"statue\",\n\"cassette\": \"cd case\",\n\"clothes container\": \"clothing rack\",\n\"shelf / cabinet\": \"shelf\",\n\"gap\": \"divider\",\n\"alarm controller\": \"alarm clock\",\n\"wall control\": \"light switch\",\n\"flush button\": \"toilet flush button\",\n\"bread bin\": \"container\",\n\"soundbar\": \"speaker\",\n\"game board\": \"board\",\n\"chaise longue\": \"sofa chair\",\n\"lantern\": \"lamp\",\n\"glass door\": \"door\",\n\"shower battery\": \"shower control valve\",\n\"dispenser\": \"soap dispenser\",\n\"glass pane\": \"window\",\n\"basin faucet\": \"faucet\",\n\"lawn\": \"plant\",\n\"garden chair\": \"bench\",\n\"garden bench\": \"bench\",\n\"sky\": \"ceiling\",\n\"garden deck\": \"object\",\n\"ceiling corridor\": \"ceiling\",\n\"router\": \"object\",\n\"cleaning spray\": \"spray bottle\",\n\"elephant sculpture\": \"statue\",\n\"salt and pepper grinder\": \"salt\",\n\"projector screen\": \"projector screen\",\n\"gift\": \"box\",\n\"typewriter\": \"keyboard\",\n\"violin\": \"guitar\",\n\"sheet music\": \"music book\",\n\"kitchen chair\": \"chair\",\n\"cabinet drawer\": \"drawer\",\n\"pitchfork\": \"object\",\n\"hammer\": \"toolbox\",\n\"salver\": \"tray\",\n\"napkin\": \"towel\",\n\"fork\": \"knife block\",\n\"apron\": \"kitchen apron\",\n\"sledge\": \"object\",\n\"chamber pot\": \"toilet\",\n\"floor /otherroom\": \"floor\",\n\"door /otherroom\": \"door\",\n\"painting /otherroom\": \"painting\",\n\"window frame /otherroom\": \"window\",\n\"mannequin\": \"clothing\",\n\"cloth holder\": \"clothes hanger\",\n\"storage\": \"storage box\",\n\"bread\": \"food container\",\n\"bottom of stairs\": \"stairs\",\n\"fire pit\": \"fireplace\",\n\"balcony\": \"platform\",\n\"night lamp\": \"lamp\",\n\"bathroom window\": \"window\",\n\"skirting board\": \"wall\",\n\"pencil\": \"pen holder\",\n\"sticky notes\": \"paper\",\n\"pen\": \"pen holder\",\n\"spoon\": \"plate\",\n\"desk organizer\": \"organizer\",\n\"stapler\": \"paper\",\n\"flush\": \"toilet flush button\",\n\"document holder\": \"file cabinet\",\n\"conference phone\": \"telephone\",\n\"identifier\": \"object\",\n\"shower frame\": \"shower\",\n\"compressor\": \"machine\",\n\"recuperator\": \"machine\",\n\"terrace\": \"platform\",\n\"cook book\": \"book\",\n\"rocking horse\": \"object\",\n\"kitchen counter support\": \"kitchen counter\",\n\"keyboard piano\": \"piano\",\n\"power breaker box\": \"electric panel\",\n\"stack of yarns\": \"storage box\",\n\"den\": \"object\",\n\"sponge\": \"washcloth\",\n\"deck chair\": \"chair\",\n\"torch\": \"lamp\",\n\"trough\": \"bucket\",\n\"gutter\": \"pipe\",\n\"basket of fruits\": \"basket\",\n\"file\": \"paper\",\n\"stack of files\": \"file cabinet\",\n\"shower cockpit\": \"shower\",\n\"shower mirror\": \"mirror\",\n\"shower window frame\": \"window\",\n\"bathroom glass\": \"mirror\",\n\"circular sofa\": \"sofa\",\n\"soap dispenser shelf in shower\": \"soap dispenser\",\n\"recessed cubby\": \"shelf\",\n\"set of armchairs\": \"armchair\",\n\"umbrella stand\": \"stand\",\n\"post\": \"pillar\",\n\"bed cabinet lamp\": \"lamp\",\n\"cross-trainer\": \"exercise machine\",\n\"modem\": \"machine\",\n\"paper holder\": \"paper organizer\",\n\"rock\": \"object\",\n\"bathmat\": \"mat\",\n\"vanity\": \"bathroom vanity\",\n\"window /otherroom\": \"window\",\n\"junk\": \"trash can\",\n\"decorative mask\": \"decoration\",\n\"slab\": \"platform\",\n\"sofa set\": \"sofa\",\n\"pedestal\": 
\"stand\",\n\"display\": \"monitor\",\n\"lighting fixture\": \"lamp\",\n\"molding\": \"wall\",\n\"model\": \"object\",\n\"niche\": \"shelf\",\n\"billiard balls\": \"ball\",\n\"billiard cues\": \"stick\",\n\"ball pouffe\": \"seat\",\n\"air freshener\": \"spray bottle\",\n\"press\": \"machine\",\n\"pencil holder\": \"organizer\",\n\"hatch\": \"door\",\n\"jacuzzi\": \"bathtub\",\n\"bedding\": \"bed\",\n\"lounger\": \"chair\",\n\"salt and pepper\": \"dispenser\",\n\"kitchen sink cabinet\": \"sink\",\n\"lighter\": \"object\",\n\"fur carpet\": \"carpet\",\n\"bath side table\": \"end table\",\n\"bell\": \"object\",\n\"panel screen\": \"divider\",\n\"fish tank\": \"container\",\n\"skylight\": \"window\",\n\"material\": \"cloth\",\n\"mantle\": \"shelf\",\n\"note board\": \"bulletin board\",\n\"cd\": \"cd case\",\n\"pet bowl\": \"bowl\",\n\"wall top\": \"wall\",\n\"pet bed\": \"bed\",\n\"box with toys\": \"box\",\n\"cage\": \"container\",\n\"birdhouse\": \"decoration\",\n\"box with shoes\": \"box\",\n\"fluorescent light\": \"light\",\n\"medical lamp\": \"lamp\",\n\"paper towels\": \"towel\",\n\"relief\": \"object\",\n\"massage bed\": \"bed\",\n\"hand dryer\": \"hand dryer\",\n\"makeup accessories\": \"toiletry\",\n\"double armchair\": \"armchair\",\n\"bathroom wall\": \"wall\",\n\"art piece\": \"painting\",\n\"foot spa\": \"bathtub\",\n\"spa armchair\": \"armchair\",\n\"spa bathtub\": \"bathtub\",\n\"thermometer\": \"thermostat\",\n\"sauna seat\": \"seat\",\n\"sauna support\": \"handrail\",\n\"sauna oven\": \"oven\",\n\"foam\": \"sponge\",\n\"beverage dispenser\": \"water cooler\",\n\"box with tea\": \"box\",\n\"shelf with cosmetics\": \"shelf\",\n\"product\": \"object\",\n\"advertisement\": \"poster\",\n\"steel plate\": \"plate\",\n\"electric hub\": \"power outlet\",\n\"solarium\": \"structure\",\n\"flush push\": \"toilet flush button\",\n\"cleaner brush\": \"broom\",\n\"solarium door\": \"door\",\n\"bottle dispenser\": \"dispenser\",\n\"floor vent\": \"vent\",\n\"staircase\": \"stairs\",\n\"emergency sign\": \"exit sign\",\n\"bed base\": \"bed\",\n\"dehumidifier\": \"humidifier\",\n\"package\": \"box\",\n\"cyp\": \"plant\",\n\"stack of magazines\": \"magazine\",\n\"stack of binders\": \"binders\",\n\"planner\": \"calendar\",\n\"box of paper\": \"boxes of paper\",\n\"groceries\": \"food container\",\n\"alcohol bottles\": \"bottles\",\n\"cookies\": \"food display\",\n\"kitchenware\": \"cooking pan\",\n\"bag with something\": \"bag\",\n\"set of knives\": \"knife block\",\n\"display of pictures\": \"pictures\",\n\"toothpaste\": \"toothpaste\",\n\"toiletry bag\": \"toiletry\",\n\"sandals\": \"shoes\",\n\"mobile\": \"telephone\",\n\"cd player\": \"cd case\",\n\"breadbox\": \"storage box\",\n\"electric toothbrush\": \"toothbrush\",\n\"eyeglasses\": \"glass\",\n\"art/clutter\": \"decoration\",\n\"photo stand\": \"picture\",\n\"records\": \"music book\",\n\"microphone\": \"speaker\",\n\"shoe cabinet\": \"shoe rack\",\n\"dog leash\": \"object\",\n\"stairwell\": \"staircase\",\n\"flashlight\": \"light\",\n\"backsplash\": \"wall\",\n\"rag\": \"cloth\",\n\"chaise\": \"seat\",\n\"shoe rack\": \"shoe rack\",\n\"playpen\": \"crib\",\n\"shower cabinet\": \"shower\",\n\"chain\": \"chain\",\n\"drill\": \"object\",\n\"tool rack\": \"rack\",\n\"cork board\": \"bulletin board\",\n\"dish dryer\": \"dish rack\",\n\"hand cloth\": \"hand towel\",\n\"chest of drawer\": \"drawer\",\n\"secretary\": \"desk\",\n\"air hockey\": \"object\",\n\"medal\": \"object\",\n\"shower valve\": \"shower control valve\",\n\"ceiling border\": 
\"ceiling\",\n\"toaster oven\": \"toaster oven\",\n\"foot stand\": \"footrest\",\n\"closet floor\": \"floor\",\n\"recliner\": \"armchair\",\n\"condiment\": \"salt\",\n\"security detector\": \"smoke detector\",\n\"porcelain\": \"vase\",\n\"mail\": \"mail\",\n\"ceiling panel\": \"ceiling\",\n\"cleaning fluid\": \"laundry detergent\",\n\"ceiling lamp rail\": \"ceiling light\",\n\"medical object\": \"object\",\n\"electricity box\": \"electric panel\",\n\"meter\": \"scale\",\n\"electrical installation\": \"power outlet\",\n\"pole\": \"pillar\",\n\"extension lead\": \"power strip\",\n\"coaster\": \"mat\",\n\"bonsai tree\": \"plant\",\n\"window seat\": \"seat\",\n\"dish rack\": \"dish rack\",\n\"cradle\": \"crib\",\n\"tent\": \"structure\",\n\"headset\": \"headphones\",\n\"table pad\": \"table\",\n\"archway\": \"doorframe\",\n\"box of fruit\": \"box\",\n\"laundry bag\": \"laundry bag\",\n\"fireplace brush\": \"broom\",\n\"pantry\": \"kitchen cabinet\",\n\"shower case\": \"shower\",\n\"drawer desk\": \"desk\",\n\"soft chair\": \"armchair\",\n\"workstation\": \"desk\",\n\"moose head/sculpture/hunting trophy\": \"decoration\",\n\"plane\": \"airplane\",\n\"storage space\": \"storage bin\",\n\"cat litter box\": \"cat litter box\",\n\"separator\": \"divider\",\n\"bridge\": \"structure\",\n\"ball pool\": \"ball\",\n\"stage\": \"platform\",\n\"curb\": \"structure\",\n\"sunbed\": \"couch\",\n\"mailbox\": \"mailbox\",\n\"exercise ladder\": \"ladder\",\n\"stones\": \"structure\",\n\"sauna heat rocks\": \"structure\",\n\"barbecue\": \"object\",\n\"bath towels\": \"towel\",\n\"boat model\": \"boat\",\n\"copier\": \"copier\",\n\"bureau\": \"dresser\",\n\"kitchen sink\": \"sink\",\n\"bedside cabinet drawer\": \"drawer\",\n\"bedside cabinet door\": \"cabinet doors\",\n\"bath carpet\": \"rug\",\n\"bathroom cabinet drawer\": \"drawer\",\n\"bathroom cabinet door\": \"cabinet doors\",\n\"shower base\": \"shower\",\n\"tennis racket\": \"tennis racket\",\n\"window/door\": \"door\",\n\"bathtub knob\": \"bathtub\",\n\"set of towels\": \"towels\",\n\"glass container\": \"container\",\n\"stack of trays\": \"tray\",\n\"bathroom rug\": \"rug\",\n\"binders\": \"binders\",\n\"hole puncher\": \"stapler\",\n\"stools\": \"stool\",\n\"mousepad\": \"keyboard\",\n\"shoe case\": \"shoe rack\",\n\"screw box\": \"toolbox\",\n\"electric cord\": \"power outlet\",\n\"bricks\": \"structure\",\n\"plenum box\": \"box\",\n\"water meter\": \"object\",\n\"gauge\": \"object\",\n\"gas meter\": \"object\",\n\"water pump\": \"machine\",\n\"water outlet\": \"faucet\",\n\"crutches\": \"crutches\",\n\"beer crate\": \"crate\",\n\"bread box\": \"box\",\n\"electric plug\": \"power outlet\",\n\"air purifier\": \"fan\",\n\"bicycle helmet\": \"helmet\",\n\"extractor hood\": \"range hood\",\n\"brushes\": \"toilet brush\",\n\"vegetables\": \"object\",\n\"waffle iron\": \"iron\",\n\"bag with sheets\": \"bag\",\n\"lamp shade\": \"lamp\",\n\"cat bed\": \"bed\",\n\"row of theater chairs\": \"seat\",\n\"rice cooker\": \"rice cooker\",\n\"blanket basket\": \"basket\",\n\"blankets\": \"blanket\",\n\"scanner\": \"scanner\",\n\"shower utensil\": \"soap\",\n\"cabinet /w cluttered art\": \"cabinet\",\n\"place mat\": \"mat\",\n\"draw\": \"drawer\",\n\"water fountain\": \"water fountain\",\n\"aisle frame\": \"frame\",\n\"box with jewelry\": \"jewelry box\",\n\"decorative tray\": \"tray\",\n\"decorative vessel\": \"vase\",\n\"ceramics\": \"vase\",\n\"charger\": \"power outlet\",\n\"wifi router\": \"object\",\n\"prop\": \"decoration\",\n\"fishing rod\": \"rod\",\n\"fitness 
ball\": \"exercise ball\",\n\"exercising blocks\": \"dumbbell\",\n\"drawing\": \"picture\",\n\"foosball table\": \"foosball table\",\n\"baseboard\": \"board\",\n\"gas box\": \"box\",\n\"stone bench\": \"bench\",\n\"ceiling fan lamp\": \"ceiling fan\",\n\"chest bench\": \"bench\",\n\"fur\": \"rug\",\n\"shoulder bag\": \"bag\",\n\"decorative window\": \"window\",\n\"lace doily\": \"table\",\n\"bedroom table\": \"end table\",\n\"blouse\": \"clothing\",\n\"drawer cabinet\": \"cabinet\",\n\"dried flowers\": \"plant\",\n\"decorative frame\": \"frame\",\n\"box with photos\": \"box\",\n\"electric device\": \"power outlet\",\n\"electric freshener\": \"object\",\n\"fishing pole\": \"rod\",\n\"vinyl records\": \"object\",\n\"apple\": \"object\",\n\"card\": \"card\",\n\"star\": \"star\",\n\"ragdoll\": \"doll\",\n\"doily\": \"table\",\n\"ragdoll cat\": \"object\",\n\"pot lid\": \"pot\",\n\"pouches\": \"bag\",\n\"gun\": \"nerf gun\",\n\"handkerchiefs\": \"cloth\",\n\"training mat\": \"mat\",\n\"painting roll\": \"painting\",\n\"painting rolls\": \"painting\",\n\"painting tray\": \"painting\",\n\"controller\": \"controller\",\n\"electrical switchboard\": \"electric panel\",\n\"glue\": \"tape\",\n\"measuring tape\": \"tape\",\n\"ruler\": \"object\",\n\"wrench\": \"object\",\n\"pliers\": \"object\",\n\"screwdriver\": \"object\",\n\"level\": \"object\",\n\"spatula\": \"object\",\n\"square\": \"object\",\n\"roll\": \"paper towel roll\",\n\"cans\": \"can\",\n\"base\": \"object\",\n\"rods\": \"rod\",\n\"baseball cap\": \"cap\",\n\"shade rail\": \"rail\",\n\"crayon\": \"object\",\n\"drawer cart\": \"cart\",\n\"gas furnace\": \"furnace\",\n\"silicone tube\": \"tube\",\n\"drywall board\": \"wall\",\n\"canal\": \"object\",\n\"cat food bag\": \"object\",\n\"buckets\": \"bucket\",\n\"spirit level\": \"object\",\n\"air heater\": \"heater\",\n\"trampoline\": \"object\",\n\"folded table\": \"folded table\",\n\"hoverboard\": \"hoverboard\",\n\"kitchen towel\": \"towel\",\n\"knife stand\": \"knife block\",\n\"food processor\": \"kitchen mixer\",\n\"bottle of wine\": \"bottle\",\n\"frying pan\": \"frying pan\",\n\"cooker hood\": \"range hood\",\n\"ball chair\": \"chair\",\n\"leaflets\": \"paper\",\n\"strongbox\": \"chest\",\n\"dog toy\": \"toy dinosaur\",\n\"console pad\": \"controller\",\n\"console pad charger\": \"object\",\n\"console\": \"media center\",\n\"baby seat\": \"carseat\",\n\"ceiling floor\": \"ceiling\",\n\"drawers for clothes\": \"dresser\",\n\"drinking fountain\": \"water fountain\",\n\"cone\": \"traffic cone\",\n\"albums\": \"cd case\",\n\"sticker book\": \"book\",\n\"price tag\": \"object\",\n\"stack of chairs\": \"stack of chairs\",\n\"magic marker\": \"pen holder\",\n\"music equipment stand\": \"music stand\",\n\"electric percussion\": \"drum set\",\n\"magic marker box\": \"pen holder\",\n\"ceiling chassis\": \"ceiling\",\n\"ceiling hanger\": \"ceiling\",\n\"guitar pedals\": \"guitar\",\n\"ukulele\": \"guitar\",\n\"guitar cases\": \"guitar case\",\n\"guitar case\": \"guitar case\",\n\"guitar case cover\": \"guitar case\",\n\"keyboard cover\": \"keyboard\",\n\"product box\": \"box\",\n\"stack of product boxes\": \"boxes\",\n\"board with keys\": \"keyboard\",\n\"trombone\": \"instrument case\",\n\"trumpet\": \"instrument case\",\n\"saxophone\": \"instrument case\",\n\"clarinet\": \"instrument case\",\n\"tambourine\": \"instrument case\",\n\"product boxes\": \"boxes\",\n\"electric drum\": \"drum set\",\n\"cabinet counter\": \"kitchen cabinet\",\n\"counter desk\": \"desk\",\n\"album\": \"book\",\n\"stack 
of albums\": \"stack of chairs\",\n\"shop shelf\": \"shelf\",\n\"microphone accessory\": \"music stand\",\n\"audio cable\": \"power outlet\",\n\"audio cables\": \"power strip\",\n\"guitar straps\": \"guitar case\",\n\"t-shirt\": \"clothing\",\n\"stack of t-shirts\": \"stacks of cups\",\n\"socks\": \"sock\",\n\"products\": \"container\",\n\"keyboard box\": \"keyboard piano\",\n\"stack of music stands\": \"stack of chairs\",\n\"round cushion\": \"pillow\",\n\"tripod\": \"tripod\",\n\"baby chair\": \"armchair\",\n\"shoes rack\": \"shoe rack\",\n\"acoustic panel\": \"divider\",\n\"sauna\": \"shower\",\n\"game\": \"board\",\n\"electronics\": \"computer tower\",\n\"music equipment\": \"keyboard piano\",\n\"bunk bed\": \"bunk bed\",\n\"stack of plates\": \"plates\",\n\"antlers\": \"statue\",\n\"menu board\": \"board\",\n\"jars\": \"jar\",\n\"menu\": \"board\",\n\"cleaner bottle\": \"spray bottle\",\n\"receipt printer\": \"printer\",\n\"sombrero\": \"hat\",\n\"shisha\": \"pipe\",\n\"keg\": \"container\",\n\"saturator\": \"humidifier\",\n\"gas container\": \"container\",\n\"spices\": \"salt\",\n\"receipt spike\": \"paper\",\n\"spray can\": \"spray bottle\",\n\"bowls\": \"bowl\",\n\"bottle opener\": \"bottle\",\n\"gloves\": \"clothing\",\n\"stovetop\": \"stovetop\"\n}\n\nS3D_SCANNET = {\n    1: 'wall',\n    2: 'floor',\n    3: 'cabinet',\n    4: 'bed',\n    5: 'chair',\n    6: 'sofa',\n    7: 'table',\n    8: 'door',\n    9: 'window',\n    10: 'bookshelf',\n    11: 'picture',\n    12: 'counter',\n    13: 'blinds',\n    14: 'desk',\n    15: 'shelf',\n    16: 'curtain',\n    17: 'dresser',\n    18: 'pillow',\n    19: 'mirror',\n    20: 'mat',\n    21: 'clothes',\n    22: 'ceiling',\n    23: 'books',\n    24: 'refrigerator',\n    25: 'tv',\n    26: 'paper',\n    27: 'towel',\n    28: 'shower curtain',\n    29: 'box',\n    30: 'whiteboard',\n    31: 'person',\n    32: 'nightstand',\n    33: 'toilet',\n    34: 'sink',\n    35: 'lamp',\n    36: 'bathtub',\n    37: 'bag',\n    38: 'otherstructure',\n    39: 'otherfurniture',\n    40: 'otherprop'}\n"
  },
  {
    "path": "requirements.txt",
    "content": "git+https://github.com/openai/CLIP.git\ngit+https://github.com/facebookresearch/fvcore\naccelerate==0.28.0\naddict==2.4.0\nantlr4-python3-runtime==4.9.3\nappdirs==1.4.4\nasttokens==2.4.1\nattrs==23.2.0\nblinker==1.7.0\ncertifi==2024.2.2\ncharset-normalizer==3.3.2\nclick==8.1.7\ncomm==0.2.2\nConfigArgParse==1.7\ncontourpy==1.2.0\ncycler==0.12.1\ndash==2.16.1\ndash-core-components==2.0.0\ndash-html-components==2.0.0\ndash-table==5.0.0\ndecorator==5.1.1\ndocker-pycreds==0.4.0\neinops==0.7.0\nexceptiongroup==1.2.0\nexecuting==2.0.1\nfastjsonschema==2.19.1\nfilelock==3.13.3\nFlask==3.0.2\nfonttools==4.50.0\nfsspec==2024.3.1\nftfy==6.2.0\nfvcore==0.1.5.post20221221\ngitdb==4.0.11\nGitPython==3.1.42\nhuggingface-hub==0.22.1\nhydra-core==1.3.2\nidna==3.6\nimportlib_metadata==7.1.0\nimportlib_resources==6.4.0\niopath==0.1.10\nipython==8.18.1\nipywidgets==8.1.2\nitsdangerous==2.1.2\njedi==0.19.1\nJinja2==3.1.3\njoblib==1.3.2\njsonlines==4.0.0\njsonschema==4.21.1\njsonschema-specifications==2023.12.1\njupyter_core==5.7.2\njupyterlab_widgets==3.0.10\nkiwisolver==1.4.5\nMarkupSafe==2.1.5\nmatplotlib==3.8.3\nmatplotlib-inline==0.1.6\nmpmath==1.3.0\nnbformat==5.10.3\nnest-asyncio==1.6.0\nnetworkx==3.2.1\nnumpy==1.26.4\nnvidia-cublas-cu12==12.1.3.1\nnvidia-cuda-cupti-cu12==12.1.105\nnvidia-cuda-nvrtc-cu12==12.1.105\nnvidia-cuda-runtime-cu12==12.1.105\nnvidia-cudnn-cu12==8.9.2.26\nnvidia-cufft-cu12==11.0.2.54\nnvidia-curand-cu12==10.3.2.106\nnvidia-cusolver-cu12==11.4.5.107\nnvidia-cusparse-cu12==12.1.0.106\nnvidia-nccl-cu12==2.19.3\nnvidia-nvjitlink-cu12==12.4.99\nnvidia-nvtx-cu12==12.1.105\nomegaconf==2.3.0\nopen3d==0.18.0\nopencv-python==4.9.0.80\npackaging==24.0\npandas==2.2.1\nparso==0.8.3\npexpect==4.9.0\npillow==10.2.0\nplatformdirs==4.2.0\nplotly==5.20.0\nplyfile==1.0.3\nportalocker==2.8.2\nprompt-toolkit==3.0.43\nprotobuf==4.25.3\npsutil==5.9.8\nptyprocess==0.7.0\npure-eval==0.2.2\nPygments==2.17.2\npyparsing==3.1.2\npyquaternion==0.9.9\npython-dateutil==2.9.0.post0\npytz==2024.1\nPyYAML==6.0.1\nreferencing==0.34.0\nregex==2023.12.25\nrequests==2.31.0\nretrying==1.3.4\nrpds-py==0.18.0\nsafetensors==0.4.2\nscikit-learn==1.4.1.post1\nscipy==1.12.0\nsentry-sdk==1.42.0\nsetproctitle==1.3.3\nsix==1.16.0\nsmmap==5.0.1\nstack-data==0.6.3\nsympy==1.12\ntabulate==0.9.0\ntenacity==8.2.3\ntermcolor==2.4.0\nthreadpoolctl==3.4.0\ntokenizers==0.15.2\ntorch==2.2.1\ntorchvision==0.17.1\ntqdm==4.66.2\ntraitlets==5.14.2\ntransformers==4.39.1\ntriton==2.2.0\ntyping_extensions==4.10.0\ntzdata==2024.1\nurllib3==2.2.1\nwandb==0.16.4\nwcwidth==0.2.13\nWerkzeug==3.0.1\nwidgetsnbextension==4.0.10\nyacs==0.1.8\nzipp==3.18.1\n"
  },
  {
    "path": "run.py",
    "content": "from pathlib import Path\nimport hydra\nfrom datetime import datetime\nfrom omegaconf import OmegaConf, open_dict\nimport wandb\n\nimport common.io_utils as iu\nfrom common.misc import rgetattr\nfrom trainer.build import build_trainer\n\n\n@hydra.main(version_base=None, config_path=\"./configs\", config_name=\"default\")\ndef main(cfg):\n    if cfg.resume:\n        assert Path(cfg.exp_dir).exists(), f\"Resuming failed: {cfg.exp_dir} does not exist.\"\n        print(f\"Resuming from {cfg.exp_dir}\")\n        cfg = OmegaConf.load(Path(cfg.exp_dir) / 'config.yaml')\n        cfg.resume = True\n    else:\n        run_id = wandb.util.generate_id()\n        with open_dict(cfg):\n            cfg.logger.run_id = run_id\n    \n    OmegaConf.resolve(cfg)\n    naming_keys = [cfg.name]\n    for name in cfg.get('naming_keywords', []):\n        if name == \"time\":\n            continue\n        elif name == \"task\":\n            naming_keys.append(cfg.task)\n            if rgetattr(cfg, \"data.note\", None) is not None:\n                naming_keys.append(rgetattr(cfg, \"data.note\"))\n            else:\n                datasets = rgetattr(cfg, \"data.train\")\n                dataset_names = \"+\".join([str(x) for x in datasets])\n                naming_keys.append(dataset_names)\n        elif name == \"dataloader.batchsize\":\n            naming_keys.append(f\"b{rgetattr(cfg, name) * rgetattr(cfg, 'num_gpu')}\")\n        else:\n            if str(rgetattr(cfg, name)) != \"\":\n                naming_keys.append(str(rgetattr(cfg, name)))\n    exp_name = \"_\".join(naming_keys)\n\n    if rgetattr(cfg, \"debug.flag\", False):\n        exp_name = \"Debug_test\"\n    print(exp_name)\n\n    # Record the experiment\n    if not cfg.exp_dir:\n        cfg.exp_dir = Path(cfg.base_dir) / exp_name / f\"{datetime.now().strftime('%Y-%m-%d-%H:%M:%S.%f')}\" \n    else:\n        cfg.exp_dir = Path(cfg.exp_dir)\n    iu.make_dir(cfg.exp_dir)\n    OmegaConf.save(cfg, cfg.exp_dir / \"config.yaml\")\n\n    trainer = build_trainer(cfg)\n    trainer.run()\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "trainer/__init__.py",
    "content": "from .default_trainer import *\nfrom .openvocab_trainer import *\nfrom .objpretrain_trainer import *\nfrom .debug_trainer import *"
  },
  {
    "path": "trainer/build.py",
    "content": "import copy as cp\nimport glob\nfrom datetime import timedelta\nfrom pathlib import Path\nfrom omegaconf import OmegaConf\nfrom omegaconf import open_dict\nfrom tqdm import tqdm\nimport numpy as np\n\nfrom accelerate import Accelerator, DistributedDataParallelKwargs\nfrom accelerate.logging import get_logger\nfrom accelerate.utils import set_seed, InitProcessGroupKwargs\nfrom fvcore.common.registry import Registry\nimport torch\nimport wandb\n\nimport common.io_utils as iu\nfrom common.io_utils import make_dir\nimport common.misc as misc\nfrom data.build import build_dataloader\nfrom evaluator.build import build_eval\nfrom model.build import build_model\nfrom optim.build import build_optim\n\n\nTRAINER_REGISTRY = Registry(\"Trainer\")\n\n\nclass Tracker():\n    def __init__(self, cfg):\n        self.reset(cfg)\n\n    def step(self):\n        self.epoch += 1\n\n    def reset(self, cfg):\n        self.exp_name = f\"{cfg.exp_dir.parent.name.replace(f'{cfg.name}', '').lstrip('_')}/{cfg.exp_dir.name}\"\n        self.epoch = 0\n        self.best_result = -np.inf\n\n    def state_dict(self):\n        return {k: v for k, v in self.__dict__.items() if not k.startswith('__')}\n    \n    def load_state_dict(self, state_dict):\n        self.__dict__.update(state_dict)\n\n@TRAINER_REGISTRY.register()\nclass BaseTrainer():\n    def __init__(self, cfg):\n        set_seed(cfg.rng_seed)\n        self.debug = cfg.debug.get(\"flag\", False)\n        self.hard_debug = cfg.debug.get(\"hard_debug\", False)\n        self.epochs_per_eval = cfg.solver.get(\"epochs_per_eval\", None)\n        self.epochs_per_save = cfg.solver.get(\"epochs_per_save\", None)\n        self.global_step = 0\n        \n        # Initialize accelerator\n        self.exp_tracker = Tracker(cfg)\n        wandb_args = {\"entity\": cfg.logger.entity, \"id\": cfg.logger.run_id, \"resume\": cfg.resume}\n        if not cfg.logger.get('autoname'):\n            wandb_args[\"name\"] = self.exp_tracker.exp_name\n        # There is bug in logger setting, needs fixing from accelerate side\n        self.logger = get_logger(__name__)\n        self.mode = cfg.mode\n\n        ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)\n        init_kwargs = InitProcessGroupKwargs(timeout=timedelta(seconds=5400))\n        kwargs = ([ddp_kwargs] if cfg.num_gpu > 1 else []) + [init_kwargs]\n\n        gradient_accumulation_steps = cfg.solver.get(\"gradient_accumulation_steps\", 1)\n        self.accelerator = Accelerator(\n            gradient_accumulation_steps=gradient_accumulation_steps,\n            log_with=cfg.logger.name,\n            kwargs_handlers=kwargs\n        )\n        if not self.hard_debug:\n            self.accelerator.init_trackers(\n                    project_name=cfg.name if not self.debug else \"Debug\",\n                    config=OmegaConf.to_container(cfg, resolve=True, throw_on_missing=True) if not cfg.resume else None,\n                    init_kwargs={\"wandb\": wandb_args}\n                )\n        print(OmegaConf.to_yaml(cfg))\n\n        if cfg.model.name == 'Query3D':\n            # choose whether to load mv or voxel features based on model.memories for Query3D\n            # TODO: a better way to do this?\n            if 'mv' in cfg.model.memories or 'sem' in cfg.model.memories:\n                cfg.data.load_multiview_info = True\n            if 'voxel' in cfg.model.memories or 'sem' in cfg.model.memories:\n                cfg.data.load_mask3d_voxel = True\n            txt_model2tokenizer = 
{'BERTLanguageEncoder': 'bert-base-uncased', 'CLIPLanguageEncoder': 'openai/clip-vit-large-patch14'}\n            cfg.data_wrapper.tokenizer = txt_model2tokenizer[cfg.model.txt_encoder.name]\n\n        keys = [\"train\", \"val\", \"test\"] if self.mode == \"train\" else [\"test\"]\n        self.data_loaders = {key: build_dataloader(cfg, split=key) for key in keys}\n        self.model = build_model(cfg)\n        if self.mode == \"test\":\n            total_steps = 1\n        else:\n            total_steps = (len(self.data_loaders[\"train\"]) * cfg.solver.epochs) // gradient_accumulation_steps\n        self.loss, self.optimizer, self.scheduler = build_optim(cfg, self.model.get_opt_params(),\n                                                                total_steps=total_steps)\n\n        if misc.rgetattr(cfg, \"eval.pass_kwargs\", False):\n            kwargs = {\"dataloaders\": self.data_loaders}\n        else:\n            kwargs = {}\n        self.evaluator = build_eval(cfg, self.accelerator, **kwargs)\n\n        # Training details\n        self.epochs = cfg.solver.epochs\n        self.total_steps = 1 if self.mode == \"test\" else len(self.data_loaders[\"train\"]) * cfg.solver.epochs\n        self.grad_norm = cfg.solver.get(\"grad_norm\")\n\n        # Load pretrained model weights\n        if cfg.get('pretrain_ckpt_path'):\n            self.pretrain_ckpt_path = Path(cfg.pretrain_ckpt_path)\n            self.load_pretrain()\n\n        # Accelerator preparation\n        self.model, self.loss, self.optimizer, self.scheduler = self.accelerator.prepare(self.model, self.loss, self.optimizer, self.scheduler)\n        for name, loader in self.data_loaders.items():\n            if isinstance(loader, list):\n                loader = self.accelerator.prepare(*loader)\n            else:\n                loader = self.accelerator.prepare(loader)\n            self.data_loaders[name] = loader\n        self.accelerator.register_for_checkpointing(self.exp_tracker)\n\n        # Check if resuming from a previous checkpoint is needed\n        self.ckpt_path = Path(cfg.ckpt_path) if cfg.get(\"ckpt_path\") else Path(cfg.exp_dir) / \"ckpt\" / \"best.pth\"\n        if cfg.resume:\n            self.resume()\n\n    def forward(self, data_dict):\n        return self.model(data_dict)\n\n    def backward(self, loss):\n        # Needs to be reimplemented when using a different set of optimizers and schedulers\n        self.optimizer.zero_grad()\n        self.accelerator.backward(loss)\n        if self.grad_norm is not None and self.accelerator.sync_gradients:\n            self.accelerator.clip_grad_norm_(self.model.parameters(), self.grad_norm)\n        self.optimizer.step()\n        self.scheduler.step()\n\n    def log(self, results, mode=\"train\"):\n        if not self.hard_debug:\n            log_dict = {}\n            for key, val in results.items():\n                if isinstance(val, torch.Tensor):\n                    val = val.item()\n                log_dict[f\"{mode}/{key}\"] = val\n            if mode == \"train\":\n                lrs = self.scheduler.get_lr()\n                for i, lr in enumerate(lrs):\n                    log_dict[f\"{mode}/lr/group_{i}\"] = lr\n            self.accelerator.log(log_dict, step=self.global_step)\n\n    def save(self, name):\n        make_dir(self.ckpt_path.parent)\n        self.save_func(str(self.ckpt_path.parent / name))\n\n    def resume(self):\n        if self.ckpt_path.exists():\n            print(f\"Resuming from {str(self.ckpt_path)}\")\n            # 
self.logger.info(f\"Resuming from {str(self.ckpt_path)}\")\n            self.accelerator.load_state(str(self.ckpt_path))\n            # self.logger.info(f\"Successfully resumed from {self.ckpt_path}\")\n            print(f\"Successfully resumed from {self.ckpt_path}\")\n        else:\n            self.logger.info(\"training from scratch\")\n\n    def load_pretrain(self):\n        self.logger.info(f\"Loading pretrained weights from {str(self.pretrain_ckpt_path)}\")\n        model_weight_path_pattern = str(self.pretrain_ckpt_path / \"pytorch_model*.bin\")\n        model_weight_paths = glob.glob(model_weight_path_pattern)\n        if len(model_weight_paths) == 0:\n            raise FileNotFoundError(f\"Cannot find pytorch_model.bin in {str(self.pretrain_ckpt_path)}\")\n        weights = {}\n        for model_weight_path in model_weight_paths:\n            weights.update(torch.load(model_weight_path, map_location=\"cpu\"))\n        warning = self.model.load_state_dict(weights, strict=False)\n        self.logger.info(f\"Successfully loaded from {str(self.pretrain_ckpt_path)}: {warning}\")\n\n    def save_func(self, path):\n        self.accelerator.save_state(path)\n    \n\ndef build_trainer(cfg):\n    return TRAINER_REGISTRY.get(cfg.trainer)(cfg)\n"
  },
  {
    "path": "trainer/debug_trainer.py",
    "content": "import copy\nfrom tqdm import tqdm\n\nimport torch\nfrom trainer.build import TRAINER_REGISTRY\nfrom trainer.build import BaseTrainer\n\n\n@TRAINER_REGISTRY.register()\nclass DebugTrainer(BaseTrainer):\n    def __init__(self, cfg):\n        super().__init__(cfg)\n        self.best_metric = -1\n\n    def forward(self, data_dict):\n        return self.model(data_dict)\n\n    def backward(self, loss):\n        self.optimizer.zero_grad()\n        self.accelerator.backward(loss)\n        if self.grad_norm is not None and self.accelerator.sync_gradients:\n            self.accelerator.clip_grad_norm_(self.model.parameters(), self.grad_norm)\n        self.optimizer.step()\n        self.scheduler.step()\n\n    def train_step(self, epoch):\n        self.model.train()\n        loader = self.data_loaders[\"train\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process),\n                    desc=f\"[Epoch {epoch + 1}/{self.epochs}]\")\n        for i, data_dict in enumerate(loader):\n            with self.accelerator.accumulate(self.model):\n                data_dict['cur_step'] = epoch * len(loader) + i\n                data_dict['total_steps'] = self.total_steps\n                # forward\n                pbar.update(1)\n\n    @torch.no_grad()\n    def eval_step(self, epoch):\n        self.model.eval()\n        loader = self.data_loaders[\"val\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process))\n        for i, data_dict in enumerate(loader):\n            pbar.update(1)\n        return\n\n    @torch.no_grad()\n    def test_step(self):\n        self.model.eval()\n        loader = self.data_loaders[\"test\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process))\n        for i, data_dict in enumerate(loader):\n            pbar.update(1)\n        return\n\n    def run(self):\n        if self.mode == \"train\":\n            start_epoch = self.exp_tracker.epoch\n            self.global_step = start_epoch * len(self.data_loaders[\"train\"])\n            for epoch in range(start_epoch, self.epochs):\n                self.exp_tracker.step()\n                self.train_step(epoch)\n\n                if self.epochs_per_eval and (epoch + 1) % self.epochs_per_eval == 0:\n                    self.eval_step(epoch)\n                break\n\n        self.test_step()\n        if self.mode == \"train\":\n            self.accelerator.end_training()\n"
  },
  {
    "path": "trainer/default_trainer.py",
    "content": "import copy\nfrom tqdm import tqdm\n\nimport torch\nfrom trainer.build import TRAINER_REGISTRY\nfrom trainer.build import BaseTrainer\n\n\n@TRAINER_REGISTRY.register()\nclass DefaultTrainer(BaseTrainer):\n    def __init__(self, cfg):\n        super().__init__(cfg)\n        self.best_metric = -1\n\n    def forward(self, data_dict):\n        return self.model(data_dict)\n\n    def backward(self, loss):\n        self.optimizer.zero_grad()\n        self.accelerator.backward(loss)\n        if self.grad_norm is not None and self.accelerator.sync_gradients:\n            self.accelerator.clip_grad_norm_(self.model.parameters(), self.grad_norm)\n        self.optimizer.step()\n        self.scheduler.step()\n\n    def train_step(self, epoch):\n        self.model.train()\n        loader = self.data_loaders[\"train\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process), desc=f\"[Epoch {epoch + 1}/{self.epochs}]\")\n        for i, data_dict in enumerate(loader):\n            with self.accelerator.accumulate(self.model):\n                data_dict['cur_step'] = epoch * len(loader) + i\n                data_dict['total_steps'] = self.total_steps\n                # forward\n                data_dict = self.forward(data_dict)\n                # calculate loss\n                loss, losses = self.loss(data_dict)\n                # calculate evaluator\n                metrics = self.evaluator.batch_metrics(data_dict)\n                # optimize\n                self.backward(loss)\n                # record\n                self.global_step += 1\n                log_dict = {'step': self.global_step}\n                log_dict.update(losses)\n                log_dict.update(metrics)\n                self.log(log_dict, mode=\"train\")\n                pbar.update(1)\n\n    @torch.no_grad()\n    def eval_step(self, epoch):\n        self.model.eval()\n        loader = self.data_loaders[\"val\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process))\n        for i, data_dict in enumerate(loader):\n            data_dict = self.forward(data_dict)\n            self.evaluator.update(data_dict)\n            pbar.update(1)\n        is_best, results = self.evaluator.record()\n        if is_best:\n            self.best_metric = results[\"target_metric\"]\n        self.log(results, mode=\"val\")\n        self.evaluator.reset()\n        return is_best\n\n    @torch.no_grad()\n    def test_step(self):\n        self.model.eval()\n        loader = self.data_loaders[\"test\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process))\n        for i, data_dict in enumerate(loader):\n            data_dict = self.forward(data_dict)\n            self.evaluator.update(data_dict)\n            pbar.update(1)\n        is_best, results = self.evaluator.record()\n        self.log(results, mode=\"test\")\n        self.evaluator.reset()\n        return results\n\n    def run(self):\n        if self.mode == \"train\":\n            start_epoch = self.exp_tracker.epoch\n            self.global_step = start_epoch * len(self.data_loaders[\"train\"])\n            for epoch in range(start_epoch, self.epochs):\n                self.exp_tracker.step()\n                self.train_step(epoch)\n\n                if self.epochs_per_eval and (epoch + 1) % self.epochs_per_eval == 0:\n                    is_best = self.eval_step(epoch)\n                    self.accelerator.print(f\"[Epoch {epoch + 1}/{self.epochs}] finished eval, is_best: 
{is_best}\")\n                else:\n                    is_best = False\n\n                self.accelerator.wait_for_everyone()\n                if self.accelerator.is_main_process:\n                    self.save(\"latest.pth\")\n                    if is_best:\n                        self.save(\"best.pth\")\n                    if self.epochs_per_save and (epoch + 1) % self.epochs_per_save == 0:\n                        self.save(f\"ckpt_{epoch+1}.pth\")\n\n        self.test_step()\n        if self.mode == \"train\":\n            self.accelerator.end_training()\n"
  },
  {
    "path": "trainer/objpretrain_trainer.py",
    "content": "import copy\nfrom tqdm import tqdm\n\nimport torch\nfrom trainer.build import TRAINER_REGISTRY\nfrom trainer.build import BaseTrainer\n\n\n@TRAINER_REGISTRY.register()\nclass ObjPretrainTrainer(BaseTrainer):\n    def __init__(self, cfg):\n        super().__init__(cfg)\n        self.best_metric = -1\n\n    def forward(self, data_dict):\n        return self.model(data_dict)\n\n    def backward(self, loss):\n        self.optimizer.zero_grad()\n        self.accelerator.backward(loss)\n        if self.grad_norm is not None and self.accelerator.sync_gradients:\n            self.accelerator.clip_grad_norm_(self.model.parameters(), self.grad_norm)\n        self.optimizer.step()\n        self.scheduler.step()\n\n    def train_step(self, epoch):\n        self.model.train()\n        loader = self.data_loaders[\"train\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process), desc=f\"[Epoch {epoch + 1}/{self.epochs}]\")\n        for i, data_dict in enumerate(loader):\n            with self.accelerator.accumulate(self.model):\n                # forward\n                data_dict = self.forward(data_dict)\n                # calculate loss\n                loss, losses = self.loss(data_dict)\n                # calculate evaluator\n                metrics = self.evaluator.batch_metrics(data_dict)\n                # optimize\n                self.backward(loss)\n                # record\n                self.global_step += 1\n                log_dict = {'step': self.global_step}\n                log_dict.update(losses)\n                log_dict.update(metrics)\n                self.log(log_dict, mode=\"train\")\n                pbar.update(1)\n\n    @torch.no_grad()\n    def eval_step(self, epoch):\n        self.model.eval()\n        loader = self.data_loaders[\"val\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process))\n        for i, data_dict in enumerate(loader):\n            data_dict = self.forward(data_dict)\n            # data_dict = {\n            #     k : v.contiguous() for k, v in data_dict.items() if isinstance(v, torch.Tensor)\n            #     and k not in ['voxel_features', 'v2p_map', 'voxel_coords']\n            # }\n            # data_dict = self.accelerator.gather_for_metrics(data_dict)\n            self.evaluator.update(data_dict)\n            pbar.update(1)\n        is_best, results = self.evaluator.record()\n        if is_best:\n            self.best_metric = results[\"target_metric\"]\n        self.log(results, mode=\"val\")\n        self.evaluator.reset()\n        return is_best\n\n    @torch.no_grad()\n    def test_step(self):\n        self.model.eval()\n        loader = self.data_loaders[\"test\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process))\n        for i, data_dict in enumerate(loader):\n            data_dict = self.forward(data_dict)\n            # data_dict = {\n            #     k : v.contiguous() for k, v in data_dict.items() if isinstance(v, torch.Tensor)\n            #     and k not in ['voxel_features', 'v2p_map', 'voxel_coords']\n            # }\n            self.evaluator.update(data_dict)\n            pbar.update(1)\n        is_best, results = self.evaluator.record()\n        self.log(results, mode=\"test\")\n        self.evaluator.reset()\n        return results\n\n    def run(self):\n        if self.mode == \"train\":\n            start_epoch = self.exp_tracker.epoch\n            self.global_step = start_epoch * 
len(self.data_loaders[\"train\"])\n            for epoch in range(start_epoch, self.epochs):\n                self.exp_tracker.step()\n                self.train_step(epoch)\n\n                if self.epochs_per_eval and (epoch + 1) % self.epochs_per_eval == 0:\n                    is_best = self.eval_step(epoch)\n                    self.accelerator.print(f\"[Epoch {epoch + 1}/{self.epochs}] finished eval, is_best: {is_best}\")\n                else:\n                    is_best = False\n\n                self.accelerator.wait_for_everyone()\n                if self.accelerator.is_main_process:\n                    if is_best:\n                        self.save(\"best.pth\")\n                    if self.epochs_per_save and (epoch + 1) % self.epochs_per_save == 0:\n                        self.save(f\"ckpt_{epoch+1}.pth\")\n\n        self.test_step()\n        if self.mode == \"train\":\n            self.accelerator.end_training()\n"
  },
  {
    "path": "trainer/openvocab_trainer.py",
    "content": "import copy\nfrom tqdm import tqdm\n\nimport torch\nfrom trainer.build import TRAINER_REGISTRY\nfrom trainer.build import BaseTrainer\n\n\n@TRAINER_REGISTRY.register()\nclass OpenVocabTrainer(BaseTrainer):\n    def __init__(self, cfg):\n        super().__init__(cfg)\n        self.best_metric = -1\n\n    def forward(self, data_dict):\n        return self.model(data_dict)\n\n    def backward(self, loss):\n        self.optimizer.zero_grad()\n        self.accelerator.backward(loss)\n        if self.grad_norm is not None and self.accelerator.sync_gradients:\n            self.accelerator.clip_grad_norm_(self.model.parameters(), self.grad_norm)\n        self.optimizer.step()\n        self.scheduler.step()\n\n    def train_step(self, epoch):\n        self.model.train()\n        loader = self.data_loaders[\"train\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process), desc=f\"[Epoch {epoch + 1}/{self.epochs}]\")\n        for i, data_dict in enumerate(loader):\n            with self.accelerator.accumulate(self.model):\n                # forward\n                data_dict = self.forward(data_dict)\n                # calculate loss\n                loss, losses = self.loss(data_dict)\n                # calculate evaluator\n                metrics = self.evaluator[\"train\"].batch_metrics(data_dict)\n                # optimize\n                self.backward(loss)\n                # record\n                self.global_step += 1\n                log_dict = {'step': self.global_step}\n                log_dict.update(losses)\n                log_dict.update(metrics)\n                self.log(log_dict, mode=\"train\")\n                pbar.update(1)\n\n    @torch.no_grad()\n    def eval_step(self, epoch):\n        self.model.eval()\n        loader = self.data_loaders[\"val\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process))\n        for i, data_dict in enumerate(loader):\n            data_dict = self.forward(data_dict)\n            self.evaluator[\"val\"].update(data_dict)\n            pbar.update(1)\n        is_best, results = self.evaluator[\"val\"].record()\n        if is_best:\n            self.best_metric = results[\"target_metric\"]\n        self.log(results, mode=\"val\")\n        self.evaluator[\"val\"].reset()\n        return is_best\n\n    @torch.no_grad()\n    def test_step(self):\n        self.model.eval()\n        loader = self.data_loaders[\"test\"]\n        pbar = tqdm(range(len(loader)), disable=(not self.accelerator.is_main_process))\n        for i, data_dict in enumerate(loader):\n            data_dict = self.forward(data_dict)\n            # data_dict = {\n            #     k: v.contiguous() for k, v in data_dict.items() if isinstance(v, torch.Tensor)\n            # }\n            # data_dict = self.accelerator.gather_for_metrics(data_dict)\n            self.evaluator[\"val\"].update(data_dict)\n            pbar.update(1)\n        is_best, results = self.evaluator[\"val\"].record()\n        self.log(results, mode=\"test\")\n        self.evaluator[\"val\"].reset()\n        return results\n\n    def run(self):\n        if self.mode == \"train\":\n            start_epoch = self.exp_tracker.epoch\n            self.global_step = start_epoch * len(self.data_loaders[\"train\"])\n            for epoch in range(start_epoch, self.epochs):\n                self.exp_tracker.step()\n                self.train_step(epoch)\n                # with torch.profiler.profile(record_shapes=True) as prof_train:\n         
       #     with torch.profiler.record_function(\"model_inference\"):\n                #         self.train_step(epoch)\n                # print(prof_train.key_averages().table(sort_by=\"cpu_time_total\", row_limit=20))\n\n                if self.epochs_per_eval and (epoch + 1) % self.epochs_per_eval == 0:\n                    is_best = self.eval_step(epoch)\n                    # with torch.profiler.profile(record_shapes=True) as prof:\n                    #     with torch.profiler.record_function(\"model_inference\"):\n                    #         is_best = self.eval_step(epoch)\n                    # print(prof.key_averages().table(sort_by=\"cpu_time_total\", row_limit=20))\n                    self.accelerator.print(f\"[Epoch {epoch + 1}/{self.epochs}] finished eval, is_best: {is_best}\")\n                else:\n                    is_best = False\n\n                self.accelerator.wait_for_everyone()\n                if self.accelerator.is_main_process:\n                    if is_best:\n                        self.save(\"best.pth\")\n                    if self.epochs_per_save and (epoch + 1) % self.epochs_per_save == 0:\n                        self.save(f\"ckpt_{epoch+1}.pth\")\n\n        self.test_step()\n        if self.mode == \"train\":\n            self.accelerator.end_training()\n"
  },
  {
    "path": "visualize_data.py",
    "content": "import argparse\nimport random\nimport json\nfrom pathlib import Path\n\nimport numpy as np\nimport torch\nimport open3d as o3d\n\n\ndef convert_pc_to_box(obj_pc):\n    xmin = np.min(obj_pc[:,0])\n    ymin = np.min(obj_pc[:,1])\n    zmin = np.min(obj_pc[:,2])\n    xmax = np.max(obj_pc[:,0])\n    ymax = np.max(obj_pc[:,1])\n    zmax = np.max(obj_pc[:,2])\n    center = [(xmin+xmax)/2, (ymin+ymax)/2, (zmin+zmax)/2]\n    box_size = [xmax-xmin, ymax-ymin, zmax-zmin]\n    return center, box_size\n\n\ndef load_scan(pcd_path, inst2label_path, scene_name):\n    pcd_data = torch.load(pcd_path / f'{scene_name}.pth')\n    inst_to_label = torch.load(inst2label_path / f\"{scene_name}.pth\")\n    points, colors, instance_labels = pcd_data[0], pcd_data[1], pcd_data[-1]\n    pcds = np.concatenate([points, colors], 1)\n    return points, colors, pcds, instance_labels, inst_to_label\n\n\ndef visualize_one_scene(obj_pcds, points, colors, caption):\n    # visualize scene\n    o3d_pcd = o3d.geometry.PointCloud()\n    o3d_pcd.points = o3d.utility.Vector3dVector(points)\n    o3d_pcd.colors = o3d.utility.Vector3dVector(colors / 255.0)\n    # visualize gt\n    for idx, (obj, obj_label) in enumerate(obj_pcds):\n        if idx > 3:\n            break\n        gt_center, gt_size = convert_pc_to_box(obj)\n        gt_o3d_box = o3d.geometry.OrientedBoundingBox(gt_center, np.eye(3,3), gt_size)\n        gt_o3d_box.color = [0, 1, 0]\n        mesh_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=0.6, origin=[-0, -0, -0])\n        o3d.visualization.draw_geometries([o3d_pcd, gt_o3d_box, mesh_frame], window_name=obj_label+'_'+caption)\n\n\ndef visualize_data(save_root, scene_name, vis_obj=True):\n    inst2label_path = save_root / 'instance_id_to_label'\n    pcd_path = save_root / 'pcd_with_global_alignment'\n\n    points, colors, pcds, instance_labels, inst_to_label = load_scan(pcd_path, inst2label_path, scene_name)\n\n    if not vis_obj:\n        o3d_pcd = o3d.geometry.PointCloud()\n        mesh_frame = o3d.geometry.TriangleMesh.create_coordinate_frame(size=0.6, origin=[-0, -0, -0])\n        o3d_pcd.points = o3d.utility.Vector3dVector(points)\n        o3d_pcd.colors = o3d.utility.Vector3dVector(colors / 255.0)\n        o3d.visualization.draw_geometries([mesh_frame, o3d_pcd])\n        return\n\n    obj_pcds = []\n    for i in inst_to_label.keys():\n        mask = instance_labels == i     # time consuming\n        if np.sum(mask) == 0:\n            continue\n        obj_pcds.append((pcds[mask], inst_to_label[i]))\n\n    visualize_one_scene(obj_pcds, points, colors, scene_name)\n\ndef visualize_refer(save_root, anno_file):\n    inst2label_path = save_root / 'instance_id_to_label'\n    pcd_path = save_root / 'pcd_with_global_alignment'\n    json_data = json.load(open(anno_file, 'r', encoding='utf8'))\n    for item in json_data:\n        scan_id = item['scan_id']\n        inst2label_path = save_root / 'instance_id_to_label'\n        pcd_path = save_root / 'pcd_with_global_alignment'\n\n        inst_to_label = torch.load(inst2label_path / f\"{scan_id}.pth\")\n        pcd_data = torch.load(pcd_path / f'{scan_id}.pth')\n        points, colors, instance_labels = pcd_data[0], pcd_data[1], pcd_data[-1]\n        pcds = np.concatenate([points, colors], 1)\n\n        target_id = int(item['target_id'])\n        mask = instance_labels == target_id\n        if np.sum(mask) == 0:\n            continue\n\n        obj_pcds = [(pcds[mask], inst_to_label[target_id])]\n        visualize_one_scene(obj_pcds, points, 
colors, scan_id+'-'+item['utterance'])\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"-r\", \"--root\", required=True, type=str, help=\"path of dataset dir\")\n    parser.add_argument(\"-d\", \"--dataset\", type=str,\n                        help=\"available datasets in ['ARKitScenes', 'HM3D', 'MultiScan', 'ProcThor', \\\n                        'Structured3D', 'ScanNet', '3RScan']\")\n    parser.add_argument(\"--vis_refer\", action=\"store_true\",\n                        help=\"visualize reference data\")\n    parser.add_argument(\"-a\", \"--anno\", type=str, default=\"ssg_ref_rel2_template.json\",\n                        help=\"the annotation file for reference\")\n    args = parser.parse_args()\n    dataset = args.dataset\n    assert dataset in ['ARKitScenes', 'HM3D', 'MultiScan', 'ProcThor', 'Structured3D', 'ScanNet', '3RScan']\n    print(dataset)\n    data_root = Path(args.root) / dataset\n    if args.vis_refer:\n        if dataset == 'ScanNet':\n            anno_file = data_root / 'annotations/refer' / args.anno\n        else:\n            anno_file = data_root / 'annotations' / args.anno\n\n        visualize_refer(data_root / 'scan_data', anno_file)\n    else:\n        all_scans = (data_root / 'scan_data' / 'pcd_with_global_alignment').glob('*.pth')\n        scene_id = Path(random.choice(list(all_scans))).stem\n        visualize_data(data_root / 'scan_data', scene_id)"
  }
]