[
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2023 HAOMO.AI\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# Cam4DOcc\n\nThe official code an data for the benchmark with baselines for our paper: [Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications](https://arxiv.org/abs/2311.17663)\n\nThis work has been accepted by CVPR 2024 :tada:\n\n[Junyi Ma#](https://github.com/BIT-MJY), [Xieyuanli Chen#](https://github.com/Chen-Xieyuanli), Jiawei Huang, [Jingyi Xu](https://github.com/BIT-XJY), [Zhen Luo](https://github.com/Blurryface0814), Jintao Xu, Weihao Gu, Rui Ai, [Hesheng Wang*](https://scholar.google.com/citations?hl=en&user=q6AY9XsAAAAJ)\n\n<img src=\"https://github.com/haomo-ai/Cam4DOcc/blob/main/benchmark.png\" width=\"49%\"/> <img src=\"https://github.com/haomo-ai/Cam4DOcc/blob/main/OCFNet.png\" width=\"49%\"/>\n\n\n\n## Citation\nIf you use Cam4DOcc in an academic work, please cite our paper:\n\n\t@inproceedings{ma2024cvpr,\n\t\tauthor = {Junyi Ma and Xieyuanli Chen and Jiawei Huang and Jingyi Xu and Zhen Luo and Jintao Xu and Weihao Gu and Rui Ai and Hesheng Wang},\n\t\ttitle = {{Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications}},\n\t\tbooktitle = {Proc.~of the IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR)},\n\t\tyear = 2024\n\t}\n \n## Installation\n\n<details>\n\t\n<summary>We follow the installation instructions of our codebase OpenOccupancy, which are also posted here\n</summary>\n\n* Create a conda virtual environment and activate it\n```bash\nconda create -n cam4docc python=3.7 -y\nconda activate cam4docc\n```\n* Install PyTorch and torchvision (tested on torch==1.10.1 & cuda=11.3)\n```bash\nconda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge\n```\n* Install gcc>=5 in conda env\n```bash\nconda install -c omgarcia gcc-6\n```\n* Install mmcv, mmdet, and mmseg\n```bash\npip install mmcv-full==1.4.0\npip install mmdet==2.14.0\npip install mmsegmentation==0.14.1\n```\n* Install mmdet3d from the source code\n```bash\ngit clone https://github.com/open-mmlab/mmdetection3d.git\ncd mmdetection3d\ngit checkout v0.17.1 # Other versions may not be compatible.\npython setup.py install\n```\n* Install other dependencies\n```bash\npip install timm\npip install open3d-python\npip install PyMCubes\npip install spconv-cu113\npip install fvcore\npip install setuptools==59.5.0\n\npip install lyft_dataset_sdk # for lyft dataset\n```\n* Install occupancy pooling\n```\ngit clone git@github.com:haomo-ai/Cam4DOcc.git\ncd Cam4DOcc\nexport PYTHONPATH=“.”\npython setup.py develop\n```\n\n</details>\n\n## Data Structure\n\n### nuScenes dataset\n* Please link your [nuScenes V1.0 full dataset](https://www.nuscenes.org/nuscenes#download) to the data folder. \n* [nuScenes-Occupancy](https://drive.google.com/file/d/1vTbgddMzUN6nLyWSsCZMb9KwihS7nPoH/view?usp=sharing), [nuscenes_occ_infos_train.pkl](https://github.com/JeffWang987/OpenOccupancy/releases/tag/train_pkl), and [nuscenes_occ_infos_val.pkl](https://github.com/JeffWang987/OpenOccupancy/releases/tag/val_pkl) are also provided by the previous work. 
If you only want to reproduce the forecasting results with the \"inflated\" form, the nuScenes dataset and Cam4DOcc are all you need.\n\n### Lyft dataset\n* Please link your [Lyft dataset](https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles/data) to the data folder.\n* The required folders are listed below.\n\nNote that the folders under `cam4docc` will be generated automatically the first time you run our training or evaluation scripts.\n\n```bash\nCam4DOcc\n├── data/\n│   ├── nuscenes/\n│   │   ├── maps/\n│   │   ├── samples/\n│   │   ├── sweeps/\n│   │   ├── lidarseg/\n│   │   ├── v1.0-test/\n│   │   ├── v1.0-trainval/\n│   │   ├── nuscenes_occ_infos_train.pkl\n│   │   ├── nuscenes_occ_infos_val.pkl\n│   ├── nuScenes-Occupancy/\n│   ├── lyft/\n│   │   ├── maps/\n│   │   ├── train_data/\n│   │   ├── images/   # from train images, containing xxx.jpeg\n│   ├── cam4docc\n│   │   ├── GMO/\n│   │   │   ├── segmentation/\n│   │   │   ├── instance/\n│   │   │   ├── flow/\n│   │   ├── MMO/\n│   │   │   ├── segmentation/\n│   │   │   ├── instance/\n│   │   │   ├── flow/\n│   │   ├── GMO_lyft/\n│   │   │   ├── ...\n│   │   ├── MMO_lyft/\n│   │   │   ├── ...\n```\nAlternatively, you can manually modify the path parameters in the [config files](https://github.com/haomo-ai/Cam4DOcc/tree/main/projects/configs/baselines) instead of using the default data structure. The default paths are listed here:\n```\nocc_path = \"./data/nuScenes-Occupancy\"\ndepth_gt_path = './data/depth_gt'\ntrain_ann_file = \"./data/nuscenes/nuscenes_occ_infos_train.pkl\"\nval_ann_file = \"./data/nuscenes/nuscenes_occ_infos_val.pkl\"\ncam4docc_dataset_path = \"./data/cam4docc/\"\nnusc_root = './data/nuscenes/'\n```\n\n## Training and Evaluation\n\nWe integrate the Cam4DOcc dataset generation pipeline directly into the dataloader, so you can simply run the training or evaluation scripts and wait :smirk:\n\nOptionally, you can set `only_generate_dataset=True` in the [config files](https://github.com/haomo-ai/Cam4DOcc/tree/main/projects/configs/baselines) to only generate the Cam4DOcc data without model training and inference.\n\n### Train OCFNetV1.1 with 8 GPUs\n\nOCFNetV1.1 can forecast inflated GMO and others. In this case, _vehicle_ and _human_ are treated as one unified category.\n\nFor the nuScenes dataset, please run\n\n```bash\nbash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py 8\n```\n\nFor the Lyft dataset, please run\n\n```bash\nbash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1_lyft.py 8\n```\n\n### Train OCFNetV1.2 with 8 GPUs\n\nOCFNetV1.2 can forecast inflated GMO including _bicycle_, _bus_, _car_, _construction_, _motorcycle_, _trailer_, _truck_, _pedestrian_, and others. In this case, _vehicle_ and _human_ are divided into multiple categories for a clearer evaluation of forecasting performance.\n\nFor the nuScenes dataset, please run\n\n```bash\nbash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2.py 8\n```\n\nFor the Lyft dataset, please run\n\n```bash\nbash run.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2_lyft.py 8\n```\n\n* The training/test process becomes several times faster once the datasets have been generated during the first epoch.\n\n### Test OCFNet for different tasks\n\nIf you only want to test the performance of occupancy prediction for the present frame (current observation), please set `test_present=True` in the [config files](https://github.com/haomo-ai/Cam4DOcc/tree/main/projects/configs/baselines). 
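For example, you can toggle it in [OCFNet_in_Cam4DOcc_V1.1.py](https://github.com/haomo-ai/Cam4DOcc/blob/main/projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py), where the flag is already defined:\n\n```\ntest_present = True   # evaluate occupancy prediction for the present frame only\n```\n\n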
Otherwise (the default, `test_present=False`), forecasting performance over the future interval is evaluated.\n\n```bash\nbash run_eval.sh $PATH_TO_CFG $PATH_TO_CKPT $GPU_NUM\n# e.g. bash run_eval.sh ./projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py ./work_dirs/OCFNet_in_Cam4DOcc_V1.1/epoch_20.pth  8\n```\nPlease set `save_pred` and `save_path` in the config files if you need to save prediction results.\n\n`VPQ` evaluation of 3D instance prediction will be refined in the future.\n\n### Visualization\n\nPlease install the dependencies as follows:\n\n```bash\nsudo apt-get install Xvfb\npip install xvfbwrapper\npip install mayavi\n```\nwhere `Xvfb` may be needed for visualization on your server.\n\n**Visualize ground-truth occupancy labels**. Set `show_time_change = True` if you want to show how the occupancy state changes over the time interval. \n\n```bash\ncd viz\npython viz_gt.py\n```\n<img src=\"https://github.com/haomo-ai/Cam4DOcc/blob/main/viz_occupancy.png\" width=\"100%\"/>\n\n**Visualize occupancy forecasting results**. Set `show_time_change = True` if you want to show how the occupancy state changes over the time interval. \n\n```bash\ncd viz\npython viz_pred.py\n```\n<img src=\"https://github.com/haomo-ai/Cam4DOcc/blob/main/viz_pred.png\" width=\"100%\"/>\n\nThere is still room for improvement. Camera-only 4D occupancy forecasting remains challenging, especially for predicting over longer time intervals with many moving objects. We envision this benchmark as a valuable evaluation tool, and our OCFNet can serve as a foundational codebase for future research on 4D occupancy forecasting.\n\n## Basic Information\n\nSome basic information as well as key parameters of our current version are listed below.\n\n| Type |  Info | Parameter |\n| :----: | :----: | :----: |\n| train           | 23,930 sequences | train_capacity |\n| val             | 5,119 frames | test_capacity |\n| voxel size      | 0.2m | voxel_x/y/z |\n| range           | [-51.2m, -51.2m, -5m, 51.2m, 51.2m, 3m]| point_cloud_range |\n| volume size     | [512, 512, 40]| occ_size |\n| classes         | 2 for V1.1 / 9 for V1.2 | num_cls |\n| observation frames | 3 | time_receptive_field |\n| future frames | 4 | n_future_frames |\n| extension frames | 6 | n_future_frames_plus |\n\nOur proposed OCFNet still performs well when trained with only part of the data. Try decreasing `train_capacity` if you want to explore its behavior with sparser supervision signals. \n\nIn addition, please make sure that `n_future_frames_plus <= time_receptive_field + n_future_frames`, because `n_future_frames_plus` is the actual number of predicted frames: we estimate more frames, including the past ones, rather than only `n_future_frames`. With the defaults above, the constraint holds since 6 <= 3 + 4.\n\n## Pretrained Models\n\nWe will provide our pretrained models for the erratum version. 
Your patience is appreciated.\n\n**Deprecated:**\n\n~~Please download our pretrained models (for epoch=20) to resume training or reproduce results.~~\n\n| Version | Google Drive <img src=\"https://ssl.gstatic.com/docs/doclist/images/drive_2022q3_32dp.png\" alt=\"Google Drive\" width=\"18\"/> | Baidu Cloud <img src=\"https://nd-static.bdstatic.com/m-static/v20-main/favicon-main.ico\" alt=\"Baidu Yun\" width=\"18\"/> | Config |\n| :---: | :---: | :---: | :---: |\n| ~~V1.0~~ | ~~link~~ | ~~link~~ | ~~only vehicle~~ |\n| V1.1 | [link](https://drive.google.com/file/d/1IXRqOQk3RKpIjGgBBqV9D9vgSt58QDr8/view?usp=sharing) | [link](https://pan.baidu.com/s/18gODsVnBAXEJ4pzv2-LqGA?pwd=m99b) | [OCFNet_in_Cam4DOcc_V1.1.py](https://github.com/haomo-ai/Cam4DOcc/blob/main/projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py) |\n| V1.2 | [link](https://drive.google.com/file/d/1q1XnRt0wYE3oq6YBMBnagpGL7h2I46uN/view?usp=sharing) | [link](https://pan.baidu.com/s/1OPc1-a2McOO_0QPX63J7WQ?pwd=adic) | [OCFNet_in_Cam4DOcc_V1.2.py](https://github.com/haomo-ai/Cam4DOcc/blob/main/projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2.py) |\n\n\n## Other Baselines\n\nWe also provide the evaluation of the forecasting performance of [other baselines](https://github.com/haomo-ai/Cam4DOcc/tree/main/other_baselines) in Cam4DOcc.\n\n## TODO\nThe tutorial is being updated ...\n\nWe will release our pretrained models as soon as possible. OCFNetV1.3 and OCFNetV2 are on their way ...\n\n\n### Acknowledgement\n\nWe thank the fantastic works [OpenOccupancy](https://github.com/JeffWang987/OpenOccupancy), [PowerBEV](https://github.com/EdwardLeeLPZ/PowerBEV), and [FIERY](https://anthonyhu.github.io/fiery) for their pioneering code releases, which provide the codebase for this benchmark.\n"
  },
  {
    "path": "data/README.md",
    "content": "### Data Structure\n\nPlease link your [nuScenes V1.0 full dataset ](https://www.nuscenes.org/nuscenes#download) to the data folder. \n\n[nuScenes-Occupancy](https://drive.google.com/file/d/1vTbgddMzUN6nLyWSsCZMb9KwihS7nPoH/view?usp=sharing), [nuscenes_occ_infos_train.pkl](https://github.com/JeffWang987/OpenOccupancy/releases/tag/train_pkl), and [nuscenes_occ_infos_val.pkl](https://github.com/JeffWang987/OpenOccupancy/releases/tag/val_pkl) are also provided by the previous work. If you only want to reproduce the forecasting results with \"inflated\" form, nuScenes dataset and Cam4DOcc are all you need.\n\nNote that the folders under `cam4docc` will be generated automatically once you first run our training or evaluation scripts.\n\n```bash\nCam4DOcc\n├── data/\n│   ├── nuscenes/\n│   │   ├── maps/\n│   │   ├── samples/\n│   │   ├── sweeps/\n│   │   ├── lidarseg/\n│   │   ├── v1.0-test/\n│   │   ├── v1.0-trainval/\n│   │   ├── nuscenes_occ_infos_train.pkl/\n│   │   ├── nuscenes_occ_infos_val.pkl/\n│   ├── nuScenes-Occupancy/\n│   ├── cam4docc\n│   │   ├── GMO/\n│   │   │   ├── segmentation/\n│   │   │   ├── instance/\n│   │   │   ├── flow/\n│   │   ├── MMO/\n│   │   │   ├── segmentation/\n│   │   │   ├── instance/\n│   │   │   ├── flow/\n```\nThe GMO folder will contain the data where vehicle and human are considered as one unified category.\n\nThe MMO folder will contain the data where vehicle and human are divided into multiple categories for clearer evaluation on forecasting performance.\n\nIn near future, we will unify GMO and MMO for easier usage.\n"
  },
  {
    "path": "data/cam4docc/.gitkeep",
    "content": ""
  },
  {
    "path": "data/cam4docc/GMO/.gitkeep",
    "content": ""
  },
  {
    "path": "data/cam4docc/GMO_lyft/.gitkeep",
    "content": ""
  },
  {
    "path": "data/cam4docc/MMO/.gitkeep",
    "content": ""
  },
  {
    "path": "data/cam4docc/MMO_lyft/.gitkeep",
    "content": ""
  },
  {
    "path": "data/nuscenes/.gitkeep",
    "content": ""
  },
  {
    "path": "other_baselines/README.md",
    "content": "## I. Static World\r\n\r\nThe static world model is built based on the identity hypothesis.\r\n\r\n```bash\r\ncd other_baselines/static_world\r\npython ./eval_static_world.py\r\n```\r\n#### Parameters:\r\n* **test_idx_dir**: Path of test indexes, which is generated by the standard OCFNet evaluation process. \r\n* **test_results_dir**: Path of occupancy prediction results. Here we simply set it to the path of OCFNet forecasting results and use the present occupancy prediction results for evaluation. You can also replace them with [OpenOccupancy](https://github.com/JeffWang987/OpenOccupancy) estimation results.\r\n* **gt_dir**: Path of ground-truth segmentations.\r\n\r\n##  II. Voxelization of PCP\r\n\r\nVoxelization of point cloud prediction requires the outputs of [PCPNet](https://github.com/Blurryface0814/PCPNet). Here we use nuScenes-Occupancy as ground-truth since predicted points are limited by sparsity.\r\n```bash\r\ncd other_baselines/voxel_pcp\r\npython ./eval_voxel_pcp.py\r\n```\r\n#### Parameters:\r\n* **test_idx_dir**: Path of test indexes, which is generated by the standard OCFNet evaluation process.\r\n* **occ_path**: Path of nuScenes-Occupancy.\r\n* **test_results_dir**: Path of point cloud prediction results. The data is organized as follows:\r\n\r\n```bash\r\nCam4DOcc\r\n├── data/\r\n│   ├── cam4docc/\r\n│   │   ├── pcpnet_results/\r\n│   │   │   ├── point_clouds/\r\n│   │   │   │   ├── past/\r\n│   │   │   │   │   ├── 000000.ply\r\n│   │   │   │   │   ├── 000001.ply\r\n│   │   │   │   │   ├── 000002.ply\r\n│   │   │   │   │   ├── 000003.ply\r\n│   │   │   │   ├── pred/\r\n│   │   │   │   │   ├── 000000.ply\r\n│   │   │   │   │   ├── ...\r\n│   │   │   ├── saved_labels/\r\n│   │   │   │   ├── past/\r\n│   │   │   │   │   ├── 000000.label\r\n│   │   │   │   │   ├── 000001.label\r\n│   │   │   │   │   ├── 000002.label\r\n│   │   │   │   │   ├── 000003.label\r\n│   │   │   │   ├── pred/\r\n│   │   │   │   │   ├── 000000.ply\r\n│   │   │   │   │   ├── ...\r\n```\r\nWe will provide our PCPNet predictions soon and please open an issue [here](https://github.com/Blurryface0814/PCPNet) if you have questions about how PCPNet is implemented for points forecasting.\r\n\r\n## III. 2D-3D Lifted Prediction\r\n\r\n2D-3D lifted prediction requires the outputs of [PowerBEV](https://github.com/EdwardLeeLPZ/PowerBEV). \r\n\r\n```bash\r\ncd other_baselines/lifted_2d\r\npython ./eval_lifted_2d.py\r\n```\r\n#### Parameters:\r\n* **test_idx_dir**: Path of test indexes, which is generated by the standard OCFNet evaluation process.\r\n* **gt_dir**: Path of ground-truth segmentations.\r\n* **hmin**: minimum height for lifting operation.\r\n* **hmax**: maximum height for lifting operation.\r\n* **test_results_dir**: Path of point cloud prediction results. The data is organized as follows:\r\n```bash\r\nCam4DOcc\r\n├── data/\r\n│   ├── cam4docc/\r\n│   │   ├── powerbev_results/\r\n│   │   │   ├── {scene_token}_{lidar_token}.npz\r\n│   │   │   ├── ...\r\n```\r\nWe have provided our [PowerBEV predictions](https://drive.google.com/file/d/1X_N-GwU2ZB65UI9-EYpeQrb2BzS44VVX/view?usp=sharing) and please open an issue [here](https://github.com/EdwardLeeLPZ/PowerBEV) if you have questions about how PowerBEV is implemented for BEV-based instance prediction.\r\n\r\nMore refinement strategies for the baselines will be released ... 
Until then, please simply use the scripts here for fast evaluation.\r\n\r\n## Publications\r\n\r\nIf you use our proposed baselines in your work, please cite as:\r\n\r\n* Cam4DOcc\r\n```\r\n@inproceedings{ma2024cvpr,\r\n\tauthor = {Junyi Ma and Xieyuanli Chen and Jiawei Huang and Jingyi Xu and Zhen Luo and Jintao Xu and Weihao Gu and Rui Ai and Hesheng Wang},\r\n\ttitle = {{Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications}},\r\n\tbooktitle = {Proc.~of the IEEE/CVF Conf.~on Computer Vision and Pattern Recognition (CVPR)},\r\n\tyear = 2024\r\n}\r\n```\r\n\r\n* OpenOccupancy\r\n```\r\n@article{wang2023openoccupancy,\r\n  title={Openoccupancy: A large scale benchmark for surrounding semantic occupancy perception},\r\n  author={Wang, Xiaofeng and Zhu, Zheng and Xu, Wenbo and Zhang, Yunpeng and Wei, Yi and Chi, Xu and Ye, Yun and Du, Dalong and Lu, Jiwen and Wang, Xingang},\r\n  journal={arXiv preprint arXiv:2303.03991},\r\n  year={2023}\r\n}\r\n```\r\n\r\n* PCPNet\r\n```\r\n@ARTICLE{10141631,\r\n  author={Luo, Zhen and Ma, Junyi and Zhou, Zijie and Xiong, Guangming},\r\n  journal={IEEE Robotics and Automation Letters}, \r\n  title={PCPNet: An Efficient and Semantic-Enhanced Transformer Network for Point Cloud Prediction}, \r\n  year={2023},\r\n  volume={8},\r\n  number={7},\r\n  pages={4267-4274},\r\n  doi={10.1109/LRA.2023.3281937}}\r\n```\r\n\r\n* PowerBEV\r\n```\r\n@inproceedings{ijcai2023p120,\r\n  title     = {PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View},\r\n  author    = {Li, Peizheng and Ding, Shuxiao and Chen, Xieyuanli and Hanselmann, Niklas and Cordts, Marius and Gall, Juergen},\r\n  booktitle = {Proceedings of the Thirty-Second International Joint Conference on\r\n               Artificial Intelligence, {IJCAI-23}},\r\n  pages     = {1080--1088},\r\n  year      = {2023},\r\n  month     = {8},\r\n  doi       = {10.24963/ijcai.2023/120},\r\n}\r\n```\r\n"
  },
  {
    "path": "other_baselines/lifted_2d/eval_lifted_2d.py",
    "content": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nfrom tqdm import trange\nimport numpy as np\nfrom nuscenes import NuScenes\nimport os\nimport torch\nimport torch.nn.functional as F\nimport copy\nfrom pyquaternion import Quaternion\n\n# Setups =================================================================================================\ntest_idx_dir = \"../../data/cam4docc/test_ids/\"\ntest_results_dir = \"../../data/cam4docc/powerbev_results/\"\ngt_dir = \"../../data/cam4docc/MMO/segmentation/\"\n\ntest_seqs = os.listdir(test_idx_dir)\ntest_segmentations = os.listdir(test_results_dir)\ndimension = [512, 512, 40]\nfuture_ious = [0, 0, 0, 0]\n\nvoxel_size = np.array([0.2,0.2,0.2])\npc_range = np.array([-50, -50, 0, 50, 50, 0])\nvoxel_size_new = np.array([0.2,0.2,0.2])\npc_range_new = np.array([-51.2, -51.2, -5, 51.2, 51.2, 3])\n\n# 10*0.2=2m \n# You can modify the parameters to show the changes with variable heights for lifting\nhmin = -1\nhmax = 9\n\nnusc = NuScenes(version='v1.0-trainval', dataroot=\"../../data/nuscenes\", verbose=False)\n# ========================================================================================================\n\ndef cm_to_ious(cm):\n    mean_ious = []\n    cls_num = len(cm)\n    for i in range(cls_num):\n        tp = cm[i, i]\n        p = cm[:, i].sum()\n        g = cm[i, :].sum()\n        union = p + g - tp\n        mean_ious.append(tp / union)\n    return mean_ious\n\ndef fast_hist(pred, label, max_label=18): \n    pred = copy.deepcopy(pred.flatten())\n    label = copy.deepcopy(label.flatten())\n    bin_count = np.bincount(max_label * label.astype(int) + pred, minlength=max_label ** 2)\n    iou_per_pred = (bin_count[-1]/(bin_count[-1]+bin_count[1]+bin_count[2]))\n    return bin_count[:max_label ** 2].reshape(max_label, max_label),iou_per_pred\n\nfor i in trange(len(test_seqs)):\n    segmentation_file = test_results_dir + test_seqs[i]\n    instance_seq = np.load(segmentation_file)['arr_0']\n    instance_seq = torch.from_numpy(instance_seq)\n\n    test_seqs_idxs = np.load(test_idx_dir+test_seqs[i])[\"arr_0\"]\n    gt_segmentation_file = os.path.join(gt_dir, test_seqs[i])\n    gt_segmentation_seqs = np.load(gt_segmentation_file, allow_pickle=True)['arr_0']\n\n    for t in range(3, 7):\n        scene_token_cur = test_seqs_idxs[t].split(\"_\")[0]\n        lidar_token_cur = test_seqs_idxs[t].split(\"_\")[1]\n\n        instance_ = instance_seq[0,(t-1)].unsqueeze(0)  # t-1 -> t\n        instance_ = instance_.unsqueeze(0)\n        instance_ = F.interpolate(instance_.float(), size=[500, 500], mode='nearest').contiguous() # Note: default PowerBEV has different ranges with OCFNet\n        instance_ = instance_.squeeze(0)\n\n        x_grid = torch.linspace(0, 500-1, 500, dtype=torch.float)\n        x_grid = x_grid.view(500, 1).expand(500,500)\n        y_grid = torch.linspace(0, 500-1,500, dtype=torch.float)\n        y_grid = y_grid.view(1, 500).expand(500,500)\n        mesh_grid_2d = torch.stack((x_grid, y_grid), -1)\n        mesh_grid_2d = mesh_grid_2d.view(-1, 2)\n        instance_ = instance_.view(-1, 1)\n\n        semantics_lifted = []\n        for ii in range(hmin, hmax): \n            semantics_lifted_ = torch.cat((mesh_grid_2d, ii*torch.ones_like(mesh_grid_2d[:,0:1])),dim=-1)\n            semantics_lifted_ = torch.cat((semantics_lifted_, instance_),dim=-1)\n            semantics_lifted.append(semantics_lifted_)\n\n        
semantics_lifted = np.array(torch.cat(semantics_lifted, dim=0))\n\n        kept = semantics_lifted[:,-1]!=0\n        semantics_lifted = semantics_lifted[kept]\n        if semantics_lifted.shape[0] == 0:\n            semantics_lifted = np.zeros((1,4))\n\n        lidar_sample = nusc.get('sample_data', lidar_token_cur)\n        lidar_sample_calib = nusc.get('calibrated_sensor', lidar_sample['calibrated_sensor_token'])\n        lidar_sensor_rotation = Quaternion(lidar_sample_calib['rotation'])\n        lidar_sensor_translation = np.array(lidar_sample_calib['translation'])[:, None]\n        lidar_to_lidarego = np.vstack([\n            np.hstack((lidar_sensor_rotation.rotation_matrix, lidar_sensor_translation)),\n            np.array([0, 0, 0, 1])\n        ])\n        lidarego_to_lidar = np.linalg.inv(lidar_to_lidarego)\n        points = np.ones_like(semantics_lifted)\n        points[:,:3] = semantics_lifted[:,:3]\n        points[:,:3] = points[:,:3] * voxel_size[None, :] + pc_range[:3][None, :] \n        points = lidarego_to_lidar @ points.T\n        semantics_lifted_transformed = np.ones_like(semantics_lifted)\n        semantics_lifted_transformed[:,:3] = (points.T)[:,:3]\n        semantics_lifted_transformed[:,-1] = semantics_lifted[:,-1]\n        semantics_lifted_transformed[:,:3] = (semantics_lifted_transformed[:,:3] - pc_range_new[:3][None, :]) / voxel_size_new[None, :] \n\n        pred_segmentation = np.zeros((dimension[0], dimension[1], dimension[2]))\n        for j in range(semantics_lifted_transformed.shape[0]):\n            cur_ind = semantics_lifted_transformed[j, :3].astype(int)\n            cur_label = semantics_lifted_transformed[j, -1]\n            if cur_label != 0:\n                pred_segmentation[cur_ind[0],cur_ind[1],cur_ind[2]] = 1\n\n        gt_segmentation = np.zeros((dimension[0], dimension[1], dimension[2]))\n        gt_segmentation_raw = gt_segmentation_seqs[t].cpu().numpy()\n        gt_segmentation[gt_segmentation_raw[:,0].astype(int),gt_segmentation_raw[:,1].astype(int),gt_segmentation_raw[:,2].astype(int)] = gt_segmentation_raw[:, -1]\n\n        hist_cur, iou_per_pred = fast_hist(pred_segmentation.astype(int), gt_segmentation.astype(int), max_label=2)\n        \n        if t <= 3:\n            future_ious[0] = future_ious[0] + hist_cur\n        if t <= 4:\n            future_ious[1] = future_ious[1] + hist_cur\n        if t <= 5:\n            future_ious[2] = future_ious[2] + hist_cur\n        if t <= 6:\n            future_ious[3] = future_ious[3] + hist_cur\n\nfor t in range(len(future_ious)):\n    print(\"iou for step \"+str(t), cm_to_ious(future_ious[t]))"
  },
  {
    "path": "other_baselines/static_world/eval_static_world.py",
    "content": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nimport numpy as np\nimport os\nimport copy\nfrom tqdm import trange\n\n# Setups =================================================================================================\ntest_idx_dir = \"../../data/cam4docc/test_ids/\"\ntest_results_dir = \"../../data/cam4docc/results/\"\ngt_dir = \"../../data/cam4docc/MMO/segmentation/\"\n\nobjects_max_label = 9\n\ntest_seqs = os.listdir(test_idx_dir)\ntest_segmentations = os.listdir(test_results_dir)\ndimension = [512, 512, 40]\nfuture_ious = [0, 0, 0, 0]\n# ========================================================================================================\n\ndef cm_to_ious(cm):\n    mean_ious = []\n    cls_num = len(cm)\n    for i in range(cls_num):\n        tp = cm[i, i]\n        p = cm[:, i].sum()\n        g = cm[i, :].sum()\n        union = p + g - tp\n        mean_ious.append(tp / union)\n    return mean_ious\n\ndef fast_hist(pred, label, max_label=18):\n    pred = copy.deepcopy(pred.flatten())\n    label = copy.deepcopy(label.flatten())\n    bin_count = np.bincount(max_label * label.astype(int) + pred, minlength=max_label ** 2)\n    iou_per_pred = (bin_count[-1]/(bin_count[-1]+bin_count[1]+bin_count[2]))\n    return bin_count[:max_label ** 2].reshape(max_label, max_label),iou_per_pred\n\nfor i in trange(len(test_seqs)):\n    segmentation_file = test_results_dir + test_seqs[i]\n    if not os.path.exists(segmentation_file):\n        continue\n    segmentation = np.load(segmentation_file,allow_pickle=True)['arr_0']\n    test_seqs_idxs = np.load(os.path.join(test_idx_dir, test_seqs[i]))[\"arr_0\"]\n    gt_segmentation_file = os.path.join(gt_dir, test_seqs[i])\n    gt_segmentation_seqs = np.load(gt_segmentation_file,allow_pickle=True)['arr_0']\n\n    # hard coding for input:3 output:4\n    for t in range(3,7):\n        # static world using present predictions\n        segmentation_t = segmentation[0]\n        pred_segmentation = np.zeros((dimension[0], dimension[1], dimension[2]))\n        for j in range(segmentation_t.shape[0]):\n            cur_ind = segmentation_t[j, :3].astype(int)\n            cur_label = segmentation_t[j, -1]\n            pred_segmentation[cur_ind[0],cur_ind[1],cur_ind[2]] = cur_label\n\n        gt_segmentation = np.zeros((dimension[0], dimension[1], dimension[2]))\n        gt_segmentation_raw = gt_segmentation_seqs[t]\n        for k in range(gt_segmentation_raw.shape[0]):\n            cur_ind = gt_segmentation_raw[k, :3].astype(int)\n            cur_label = gt_segmentation_raw[k, -1]\n            gt_segmentation[cur_ind[0],cur_ind[1],cur_ind[2]] = cur_label\n        hist_cur, iou_per_pred = fast_hist(pred_segmentation.astype(int), gt_segmentation.astype(int), max_label=objects_max_label)\n        if t <= 3:\n            future_ious[0] = future_ious[0] + hist_cur\n        if t <= 4:\n            future_ious[1] = future_ious[1] + hist_cur\n        if t <= 5:\n            future_ious[2] = future_ious[2] + hist_cur\n        if t <= 6:\n            future_ious[3] = future_ious[3] + hist_cur\n\nfor t in range(len(future_ious)):\n    print(\"iou for step \"+str(t), cm_to_ious(future_ious[t]))"
  },
  {
    "path": "other_baselines/voxel_pcp/eval_voxel_pcp.py",
    "content": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nimport numpy as np\nimport os\nimport copy\nfrom tqdm import trange\nimport open3d as o3d\nfrom nuscenes import NuScenes\nfrom pyquaternion import Quaternion\n\n# Setups =================================================================================================\ntest_idx_dir = \"../../data/cam4docc/test_ids/\"\ntest_results_dir = \"../../data/cam4docc/pcpnet_results/\"\nocc_path = \"../../data/nuScenes-Occupancy\"\n\ntest_seqs = os.listdir(test_idx_dir)\ntest_segmentations = os.listdir(test_results_dir)\npc_range= np.array([-51.2, -51.2, -5.0, 51.2, 51.2, 3.0])\ndimension = [512, 512, 40]\ngrid_size= np.array(dimension)\nvoxel_size = (pc_range[3:] -pc_range[:3]) / grid_size\nfuture_ious = [0, 0, 0, 0]\n\nnusc = NuScenes(version='v1.0-trainval', dataroot=\"../../data/nuscenes\", verbose=False)\n# ========================================================================================================\n\nlidar_token2sample_token = {}\nfor i in range(len(nusc.sample)):  \n    my_sample = nusc.sample[i]\n    frame_token = my_sample['token']\n    lidar_token = my_sample['data']['LIDAR_TOP']\n    lidar_token2sample_token[lidar_token] = frame_token\n\ndef voxel2world(voxel):\n    \"\"\"\n    voxel: [N, 3]\n    \"\"\"\n    return voxel *voxel_size[None, :] + pc_range[:3][None, :]\n\ndef world2voxel(world):\n    \"\"\"\n    world: [N, 3]\n    \"\"\"\n    return (world - pc_range[:3][None, :]) / voxel_size[None, :]\n\ndef cm_to_ious(cm):\n    mean_ious = []\n    cls_num = len(cm)\n    for i in range(cls_num):\n        tp = cm[i, i]\n        p = cm[:, i].sum()\n        g = cm[i, :].sum()\n        union = p + g - tp\n        mean_ious.append(tp / union)\n    return mean_ious\n\ndef fast_hist(pred, label, max_label=18):\n    pred = copy.deepcopy(pred.flatten())\n    label = copy.deepcopy(label.flatten())\n    bin_count = np.bincount(max_label * label.astype(int) + pred, minlength=max_label ** 2)\n    iou_per_pred = (bin_count[-1]/(bin_count[-1]+bin_count[1]+bin_count[2]))\n    return bin_count[:max_label ** 2].reshape(max_label, max_label),iou_per_pred\n\ndef nb_process_label(processed_label, sorted_label_voxel_pair):\n    label_size = 256\n    counter = np.zeros((label_size,), dtype=np.uint16)\n    counter[sorted_label_voxel_pair[0, 3]] = 1\n    cur_sear_ind = sorted_label_voxel_pair[0, :3]\n    for i in range(1, sorted_label_voxel_pair.shape[0]):\n        cur_ind = sorted_label_voxel_pair[i, :3]\n        if not np.all(np.equal(cur_ind, cur_sear_ind)):\n            processed_label[cur_sear_ind[0], cur_sear_ind[1], cur_sear_ind[2]] = np.argmax(counter)\n            counter = np.zeros((label_size,), dtype=np.uint16)\n            cur_sear_ind = cur_ind\n        counter[sorted_label_voxel_pair[i, 3]] += 1\n    processed_label[cur_sear_ind[0], cur_sear_ind[1], cur_sear_ind[2]] = np.argmax(counter)\n    \n    return processed_label\n\ndef get_ego2lidar_pose(rec):\n    lidar_top_data = nusc.get('sample_data', rec['data']['LIDAR_TOP'])\n    lidar2ego_translation = nusc.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['translation']\n    lidar2ego_rotation =  nusc.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['rotation']\n    trans = -np.array(lidar2ego_translation)\n    rot = Quaternion(lidar2ego_rotation).inverse\n    return trans, rot\n\ndef get_lidar_pose(rec):\n    current_sample = 
nusc.get('sample', rec['token'])\n    egopose = nusc.get('ego_pose', nusc.get('sample_data', current_sample['data']['LIDAR_TOP'])['ego_pose_token'])\n    trans = -np.array(egopose['translation'])\n    rot = Quaternion(egopose['rotation']).inverse  \n    return trans, rot\n\nfor i in trange(len(test_seqs)):\n    test_seqs_idxs = np.load(os.path.join(test_idx_dir, test_seqs[i]))['arr_0']\n    scene_token_present = test_seqs[i].split(\"_\")[0]\n    lidar_token_present = test_seqs[i].split(\"_\")[1][:-4]\n\n    # transform past point clouds to the present frame\n    # point cloud prediction baseline is limited by sparsity of laser points, so we aggregate\n    # past point clouds to mitigate in this version\n    # More reasonable versions will be released\n    past_voxels = []\n    for t in range(1, 4):\n        scene_token_ = test_seqs_idxs[t-1].split(\"_\")[0]\n        lidar_token_ = test_seqs_idxs[t-1].split(\"_\")[1]\n        point_file = test_results_dir+\"point_clouds/\"+scene_token_present+\"_\"+lidar_token_present+\"/past/00000\"+str(t)+\".ply\"\n        label_file = test_results_dir+\"saved_labels/\"+scene_token_present+\"_\"+lidar_token_present+\"/past/00000\"+str(t)+\".label\"\n        \n        pcd_load = o3d.io.read_point_cloud(point_file)\n        xyz_load = np.asarray(pcd_load.points)\n\n        sample_token_present = lidar_token2sample_token[lidar_token_present]\n        rec_present = nusc.get('sample', sample_token_present)\n        translation_present, rotation_present = get_lidar_pose(rec_present)\n        ego2lidar_translation_present, ego2lidar_rotation_present = get_ego2lidar_pose(rec_present)\n\n        sample_token_ = lidar_token2sample_token[lidar_token_]\n        rec_ = nusc.get('sample', sample_token_)\n        translation_, rotation_ = get_lidar_pose(rec_)\n        ego2lidar_translation_, ego2lidar_rotation_ = get_ego2lidar_pose(rec_)\n\n        present_global2ego = [translation_present, rotation_present]\n        present_ego2lidar = [ego2lidar_translation_present, ego2lidar_rotation_present]\n        cur_global2ego = [translation_, rotation_]\n        cur_ego2lidar = [ego2lidar_translation_, ego2lidar_rotation_]\n        pcd_np_cor = np.dot(cur_ego2lidar[1].inverse.rotation_matrix, xyz_load.T)\n        pcd_np_cor = pcd_np_cor.T\n        pcd_np_cor = pcd_np_cor - cur_ego2lidar[0]\n        pcd_np_cor = np.dot(cur_global2ego[1].inverse.rotation_matrix, pcd_np_cor.T)\n        pcd_np_cor = pcd_np_cor.T\n        pcd_np_cor = pcd_np_cor - cur_global2ego[0]\n        pcd_np_cor = pcd_np_cor + present_global2ego[0]\n        pcd_np_cor = np.dot(present_global2ego[1].rotation_matrix, pcd_np_cor.T)\n        pcd_np_cor = pcd_np_cor.T\n        pcd_np_cor = pcd_np_cor + present_ego2lidar[0]   # trans\n        pcd_np_cor = np.dot(present_ego2lidar[1].rotation_matrix, pcd_np_cor.T)\n        xyz_load = pcd_np_cor.T   \n\n        xyz_load = world2voxel(xyz_load)\n        label = np.fromfile(label_file, dtype=np.uint32)\n        label = label.reshape((-1,1))\n        segmentation_t = np.concatenate((xyz_load, label), axis=-1)\n        kept = (segmentation_t[:,0]>0) & (segmentation_t[:,0]<dimension[0]) & (segmentation_t[:,1]>0) & (segmentation_t[:,1]<dimension[1])  & (segmentation_t[:,2]>0) & (segmentation_t[:,2]<dimension[2])\n        segmentation_t = segmentation_t[kept]\n        past_voxels.append(segmentation_t)\n\n    past_voxel_aggregated = np.concatenate(past_voxels, axis=0)\n\n    # for future forecasting\n    for t in range(3, 7):\n        scene_token_ = 
test_seqs_idxs[t].split(\"_\")[0]\n        lidar_token_ = test_seqs_idxs[t].split(\"_\")[1]\n\n        point_file = test_results_dir+\"point_clouds/\"+scene_token_present+\"_\"+lidar_token_present+\"/pred/00000\"+str(t-3)+\".ply\"\n        label_file = test_results_dir+\"saved_labels/\"+scene_token_present+\"_\"+lidar_token_present+\"/pred/00000\"+str(t-3)+\".label\"\n\n        pcd_load = o3d.io.read_point_cloud(point_file)\n        xyz_load = np.asarray(pcd_load.points)\n        xyz_load = world2voxel(xyz_load)\n\n        label = np.fromfile(label_file, dtype=np.uint32)\n        label = label.reshape((-1,1))\n\n        segmentation_t = np.concatenate((xyz_load, label), axis=-1)\n        kept = (segmentation_t[:,0]>0) & (segmentation_t[:,0]<dimension[0]) & (segmentation_t[:,1]>0) & (segmentation_t[:,1]<dimension[1])  & (segmentation_t[:,2]>0) & (segmentation_t[:,2]<dimension[2])\n        segmentation_t = segmentation_t[kept]\n        segmentation_t = np.concatenate((segmentation_t, past_voxel_aggregated), axis=0)\n\n        pred_segmentation = np.zeros((dimension[0], dimension[1], dimension[2]))\n        pred_segmentation[segmentation_t[:, 0].astype(int), segmentation_t[:, 1].astype(int), segmentation_t[:, 2].astype(int)] = segmentation_t[:, -1]\n\n        # eval according to setups\n        # hardcoding for classes of interest\n        for otheridx in [0,1,8,11,12,13,14,15,16,17,18,255]:\n            pred_segmentation[pred_segmentation==otheridx] = 0\n        for vehidx in [2,3,4,5,6,7,9,10]:\n            pred_segmentation[pred_segmentation==vehidx] = 1\n\n        # load nuScenes-Occupancy\n        rel_path = 'scene_{0}/occupancy/{1}.npy'.format(scene_token_, lidar_token_)\n        gt_segmentation_file = os.path.join(occ_path, rel_path)\n        pcd = np.load(gt_segmentation_file)\n\n        pcd_label = pcd[..., -1:]\n        pcd_label[pcd_label==0] = 255\n        pcd_np_cor = voxel2world(pcd[..., [2,1,0]] + 0.5)\n        pcd_np_cor = world2voxel(pcd_np_cor)\n\n        # make sure the point is in the grid\n        pcd_np_cor = np.clip(pcd_np_cor, np.array([0,0,0]), grid_size - 1)\n        transformed_occ = copy.deepcopy(pcd_np_cor)\n        pcd_np = np.concatenate([pcd_np_cor, pcd_label], axis=-1)\n\n        # 255: noise, 1-16 normal classes, 0 unoccupied\n        pcd_np = pcd_np[np.lexsort((pcd_np_cor[:, 0], pcd_np_cor[:, 1], pcd_np_cor[:, 2])), :]\n        pcd_np = pcd_np.astype(np.int64)\n        processed_label = np.ones(grid_size, dtype=np.uint8) * 0.0\n        processed_label = nb_process_label(processed_label, pcd_np)\n        # Opt.\n        # processed_label[pcd_np[:, 0].astype(int), pcd_np[:, 1].astype(int), pcd_np[:, 2].astype(int)] = pcd_np[:, -1]\n\n        for otheridx in [0,1,8,11,12,13,14,15,16,17,18,255]:\n            processed_label[processed_label==otheridx] = 0\n        for vehidx in [2,3,4,5,6,7,9,10]:\n            processed_label[processed_label==vehidx] = 1\n\n        hist_cur, iou_per_pred = fast_hist(pred_segmentation.astype(int), processed_label.astype(int), max_label=2)\n\n        if t <= 3:\n            future_ious[0] = future_ious[0] + hist_cur\n        if t <= 4:\n            future_ious[1] = future_ious[1] + hist_cur\n        if t <= 5:\n            future_ious[2] = future_ious[2] + hist_cur\n        if t <= 6:\n            future_ious[3] = future_ious[3] + hist_cur\n            \nfor t in range(len(future_ious)):\n    print(\"ious for step \"+str(t), cm_to_ious(future_ious[t]))"
  },
  {
    "path": "projects/__init__.py",
    "content": ""
  },
  {
    "path": "projects/configs/_base_/datasets/custom_lyft-3d.py",
    "content": "# If point cloud range is changed, the models should also change their point\n# cloud range accordingly\npoint_cloud_range = [-80, -80, -5, 80, 80, 3]\n# For Lyft we usually do 9-class detection\nclass_names = [\n    'car', 'truck', 'bus', 'emergency_vehicle', 'other_vehicle', 'motorcycle',\n    'bicycle', 'pedestrian', 'animal'\n]\ndataset_type = 'CustomLyftDataset'\ndata_root = 'data/lyft/'\n# Input modality for Lyft dataset, this is consistent with the submission\n# format which requires the information in input_modality.\ninput_modality = dict(\n    use_lidar=True,\n    use_camera=False,\n    use_radar=False,\n    use_map=False,\n    use_external=True)\nfile_client_args = dict(backend='disk')\n# Uncomment the following if use ceph or other file clients.\n# See https://mmcv.readthedocs.io/en/latest/api.html#mmcv.fileio.FileClient\n# for more details.\n# file_client_args = dict(\n#     backend='petrel',\n#     path_mapping=dict({\n#         './data/lyft/': 's3://lyft/lyft/',\n#         'data/lyft/': 's3://lyft/lyft/'\n#    }))\ntrain_pipeline = [\n    dict(\n        type='LoadPointsFromFile',\n        coord_type='LIDAR',\n        load_dim=5,\n        use_dim=5,\n        file_client_args=file_client_args),\n    dict(\n        type='LoadPointsFromMultiSweeps',\n        sweeps_num=10,\n        file_client_args=file_client_args),\n    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),\n    dict(\n        type='GlobalRotScaleTrans',\n        rot_range=[-0.3925, 0.3925],\n        scale_ratio_range=[0.95, 1.05],\n        translation_std=[0, 0, 0]),\n    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),\n    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),\n    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),\n    dict(type='PointShuffle'),\n    dict(type='DefaultFormatBundle3D', class_names=class_names),\n    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])\n]\ntest_pipeline = [\n    dict(\n        type='LoadPointsFromFile',\n        coord_type='LIDAR',\n        load_dim=5,\n        use_dim=5,\n        file_client_args=file_client_args),\n    dict(\n        type='LoadPointsFromMultiSweeps',\n        sweeps_num=10,\n        file_client_args=file_client_args),\n    dict(\n        type='MultiScaleFlipAug3D',\n        img_scale=(1333, 800),\n        pts_scale_ratio=1,\n        flip=False,\n        transforms=[\n            dict(\n                type='GlobalRotScaleTrans',\n                rot_range=[0, 0],\n                scale_ratio_range=[1., 1.],\n                translation_std=[0, 0, 0]),\n            dict(type='RandomFlip3D'),\n            dict(\n                type='PointsRangeFilter', point_cloud_range=point_cloud_range),\n            dict(\n                type='DefaultFormatBundle3D',\n                class_names=class_names,\n                with_label=False),\n            dict(type='Collect3D', keys=['points'])\n        ])\n]\n# construct a pipeline for data and gt loading in show function\n# please keep its loading function consistent with test_pipeline (e.g. 
client)\neval_pipeline = [\n    dict(\n        type='LoadPointsFromFile',\n        coord_type='LIDAR',\n        load_dim=5,\n        use_dim=5,\n        file_client_args=file_client_args),\n    dict(\n        type='LoadPointsFromMultiSweeps',\n        sweeps_num=10,\n        file_client_args=file_client_args),\n    dict(\n        type='DefaultFormatBundle3D',\n        class_names=class_names,\n        with_label=False),\n    dict(type='Collect3D', keys=['points'])\n]\n\ndata = dict(\n    samples_per_gpu=2,\n    workers_per_gpu=2,\n    train=dict(\n        type=dataset_type,\n        data_root=data_root,\n        ann_file=data_root + 'lyft_infos_train.pkl',\n        pipeline=train_pipeline,\n        classes=class_names,\n        modality=input_modality,\n        test_mode=False),\n    val=dict(\n        type=dataset_type,\n        data_root=data_root,\n        ann_file=data_root + 'lyft_infos_val.pkl',\n        pipeline=test_pipeline,\n        classes=class_names,\n        modality=input_modality,\n        test_mode=True),\n    test=dict(\n        type=dataset_type,\n        data_root=data_root,\n        ann_file=data_root + 'lyft_infos_val.pkl',\n        pipeline=test_pipeline,\n        classes=class_names,\n        modality=input_modality,\n        test_mode=True))\n# For Lyft dataset, we usually evaluate the model at the end of training.\n# Since the models are trained by 24 epochs by default, we set evaluation\n# interval to be 24. Please change the interval accordingly if you do not\n# use a default schedule.\nevaluation = dict(interval=24, pipeline=eval_pipeline)"
  },
  {
    "path": "projects/configs/_base_/datasets/custom_nus-3d.py",
    "content": "# If point cloud range is changed, the models should also change their point\n# cloud range accordingly\npoint_cloud_range = [-50, -50, -5, 50, 50, 3]\n# For nuScenes we usually do 10-class detection\nclass_names = [\n    'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',\n    'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'\n]\ndataset_type = 'NuScenesDataset_eval_modified'\ndata_root = 'data/nuscenes/'\n# Input modality for nuScenes dataset, this is consistent with the submission\n# format which requires the information in input_modality.\ninput_modality = dict(\n    use_lidar=True,\n    use_camera=False,\n    use_radar=False,\n    use_map=False,\n    use_external=False)\nfile_client_args = dict(backend='disk')\n# Uncomment the following if use ceph or other file clients.\n# See https://mmcv.readthedocs.io/en/latest/api.html#mmcv.fileio.FileClient\n# for more details.\n# file_client_args = dict(\n#     backend='petrel',\n#     path_mapping=dict({\n#         './data/nuscenes/': 's3://nuscenes/nuscenes/',\n#         'data/nuscenes/': 's3://nuscenes/nuscenes/'\n#     }))\ntrain_pipeline = [\n    dict(\n        type='LoadPointsFromFile',\n        coord_type='LIDAR',\n        load_dim=5,\n        use_dim=5,\n        file_client_args=file_client_args),\n    dict(\n        type='LoadPointsFromMultiSweeps',\n        sweeps_num=10,\n        file_client_args=file_client_args),\n    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),\n    dict(\n        type='GlobalRotScaleTrans',\n        rot_range=[-0.3925, 0.3925],\n        scale_ratio_range=[0.95, 1.05],\n        translation_std=[0, 0, 0]),\n    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),\n    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),\n    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),\n    dict(type='ObjectNameFilter', classes=class_names),\n    dict(type='PointShuffle'),\n    dict(type='DefaultFormatBundle3D', class_names=class_names),\n    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])\n]\ntest_pipeline = [\n    dict(\n        type='LoadPointsFromFile',\n        coord_type='LIDAR',\n        load_dim=5,\n        use_dim=5,\n        file_client_args=file_client_args),\n    dict(\n        type='LoadPointsFromMultiSweeps',\n        sweeps_num=10,\n        file_client_args=file_client_args),\n    dict(\n        type='MultiScaleFlipAug3D',\n        img_scale=(1333, 800),\n        pts_scale_ratio=1,\n        flip=False,\n        transforms=[\n            dict(\n                type='GlobalRotScaleTrans',\n                rot_range=[0, 0],\n                scale_ratio_range=[1., 1.],\n                translation_std=[0, 0, 0]),\n            dict(type='RandomFlip3D'),\n            dict(\n                type='PointsRangeFilter', point_cloud_range=point_cloud_range),\n            dict(\n                type='DefaultFormatBundle3D',\n                class_names=class_names,\n                with_label=False),\n            dict(type='Collect3D', keys=['points'])\n        ])\n]\n# construct a pipeline for data and gt loading in show function\n# please keep its loading function consistent with test_pipeline (e.g. 
client)\neval_pipeline = [\n    dict(\n        type='LoadPointsFromFile',\n        coord_type='LIDAR',\n        load_dim=5,\n        use_dim=5,\n        file_client_args=file_client_args),\n    dict(\n        type='LoadPointsFromMultiSweeps',\n        sweeps_num=10,\n        file_client_args=file_client_args),\n    dict(\n        type='DefaultFormatBundle3D',\n        class_names=class_names,\n        with_label=False),\n    dict(type='Collect3D', keys=['points'])\n]\n\ndata = dict(\n    samples_per_gpu=4,\n    workers_per_gpu=4,\n    train=dict(\n        type=dataset_type,\n        data_root=data_root,\n        ann_file=data_root + 'nuscenes_infos_train.pkl',\n        pipeline=train_pipeline,\n        classes=class_names,\n        modality=input_modality,\n        test_mode=False,\n        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset\n        # and box_type_3d='Depth' in sunrgbd and scannet dataset.\n        box_type_3d='LiDAR'),\n    val=dict(\n        type=dataset_type,\n        ann_file=data_root + 'nuscenes_infos_val.pkl',\n        pipeline=test_pipeline,\n        classes=class_names,\n        modality=input_modality,\n        test_mode=True,\n        box_type_3d='LiDAR'),\n    test=dict(\n        type=dataset_type,\n        data_root=data_root,\n        ann_file=data_root + 'nuscenes_infos_val.pkl',\n        pipeline=test_pipeline,\n        classes=class_names,\n        modality=input_modality,\n        test_mode=True,\n        box_type_3d='LiDAR'))\n# For nuScenes dataset, we usually evaluate the model at the end of training.\n# Since the models are trained by 24 epochs by default, we set evaluation\n# interval to be 24. Please change the interval accordingly if you do not\n# use a default schedule.\nevaluation = dict(interval=24, pipeline=eval_pipeline)\n"
  },
  {
    "path": "projects/configs/_base_/datasets/custom_waymo-3d.py",
    "content": "# dataset settings\n# D5 in the config name means the whole dataset is divided into 5 folds\n# We only use one fold for efficient experiments\ndataset_type = 'CustomWaymoDataset'\ndata_root = 'data/waymo/kitti_format/'\nfile_client_args = dict(backend='disk')\n# Uncomment the following if use ceph or other file clients.\n# See https://mmcv.readthedocs.io/en/latest/api.html#mmcv.fileio.FileClient\n# for more details.\n# file_client_args = dict(\n#     backend='petrel', path_mapping=dict(data='s3://waymo_data/'))\n\nimg_norm_cfg = dict(\n    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)\nclass_names = ['Car', 'Pedestrian', 'Cyclist']\npoint_cloud_range = [-74.88, -74.88, -2, 74.88, 74.88, 4]\ninput_modality = dict(use_lidar=False, use_camera=True)\ndb_sampler = dict(\n    data_root=data_root,\n    info_path=data_root + 'waymo_dbinfos_train.pkl',\n    rate=1.0,\n    prepare=dict(\n        filter_by_difficulty=[-1],\n        filter_by_min_points=dict(Car=5, Pedestrian=10, Cyclist=10)),\n    classes=class_names,\n    sample_groups=dict(Car=15, Pedestrian=10, Cyclist=10),\n    points_loader=dict(\n        type='LoadPointsFromFile',\n        coord_type='LIDAR',\n        load_dim=5,\n        use_dim=[0, 1, 2, 3, 4],\n        file_client_args=file_client_args))\n\n\n\ntrain_pipeline = [\n    dict(type='LoadMultiViewImageFromFiles', to_float32=True),\n    dict(type='PhotoMetricDistortionMultiViewImage'),\n    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True, with_attr_label=False),\n    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),\n    dict(type='ObjectNameFilter', classes=class_names),\n    dict(type='NormalizeMultiviewImage', **img_norm_cfg),\n    dict(type='PadMultiViewImage', size_divisor=32),\n    dict(type='DefaultFormatBundle3D', class_names=class_names),\n    dict(type='CustomCollect3D', keys=['gt_bboxes_3d', 'gt_labels_3d', 'img'])\n]\n\n\ntest_pipeline = [\n    dict(type='LoadMultiViewImageFromFiles', to_float32=True),\n    dict(type='NormalizeMultiviewImage', **img_norm_cfg),\n    dict(type='PadMultiViewImage', size_divisor=32),\n    dict(\n        type='MultiScaleFlipAug3D',\n        img_scale=(1920, 1280),\n        pts_scale_ratio=1,\n        flip=False,\n        transforms=[\n            dict(\n                type='DefaultFormatBundle3D',\n                class_names=class_names,\n                with_label=False),\n            dict(type='CustomCollect3D', keys=['img'])\n        ])\n]\n\n\n# construct a pipeline for data and gt loading in show function\n# please keep its loading function consistent with test_pipeline (e.g. 
client)\n\ndata = dict(\n    samples_per_gpu=2,\n    workers_per_gpu=4,\n    train=dict(\n        type='RepeatDataset',\n        times=2,\n        dataset=dict(\n            type=dataset_type,\n            data_root=data_root,\n            ann_file=data_root + 'waymo_infos_train.pkl',\n            split='training',\n            pipeline=train_pipeline,\n            modality=input_modality,\n            classes=class_names,\n            test_mode=False,\n            # we use box_type_3d='LiDAR' in kitti and nuscenes dataset\n            # and box_type_3d='Depth' in sunrgbd and scannet dataset.\n            box_type_3d='LiDAR',\n            # load one frame every five frames\n            load_interval=5)),\n    val=dict(\n        type=dataset_type,\n        data_root=data_root,\n        ann_file=data_root + 'waymo_infos_val.pkl',\n        split='training',\n        pipeline=test_pipeline,\n        modality=input_modality,\n        classes=class_names,\n        test_mode=True,\n        box_type_3d='LiDAR'),\n    test=dict(\n        type=dataset_type,\n        data_root=data_root,\n        ann_file=data_root + 'waymo_infos_val.pkl',\n        split='training',\n        pipeline=test_pipeline,\n        modality=input_modality,\n        classes=class_names,\n        test_mode=True,\n        box_type_3d='LiDAR'))\n\nevaluation = dict(interval=24, pipeline=test_pipeline)"
  },
  {
    "path": "projects/configs/_base_/default_runtime.py",
    "content": "checkpoint_config = dict(interval=1)\n# yapf:disable push\n# By default we use textlogger hook and tensorboard\n# For more loggers see\n# https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.LoggerHook\nlog_config = dict(\n    interval=1,\n    hooks=[\n        dict(type='TextLoggerHook'),\n        dict(type='TensorboardLoggerHook')\n    ])\n# yapf:enable\ndist_params = dict(backend='nccl')\nlog_level = 'INFO'\nwork_dir = None\nload_from = None\nresume_from = None\nworkflow = [('train', 1)]\n"
  },
  {
    "path": "projects/configs/_base_/schedules/cosine.py",
    "content": "# This schedule is mainly used by models with dynamic voxelization\n# optimizer\nlr = 0.003  # max learning rate\noptimizer = dict(\n    type='AdamW',\n    lr=lr,\n    betas=(0.95, 0.99),  # the momentum is change during training\n    weight_decay=0.001)\noptimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2))\n\nlr_config = dict(\n    policy='CosineAnnealing',\n    warmup='linear',\n    warmup_iters=1000,\n    warmup_ratio=1.0 / 10,\n    min_lr_ratio=1e-5)\n\nmomentum_config = None\n\nrunner = dict(type='EpochBasedRunner', max_epochs=40)\n"
  },
  {
    "path": "projects/configs/_base_/schedules/cyclic_20e.py",
    "content": "# For nuScenes dataset, we usually evaluate the model at the end of training.\n# Since the models are trained by 24 epochs by default, we set evaluation\n# interval to be 20. Please change the interval accordingly if you do not\n# use a default schedule.\n# optimizer\n# This schedule is mainly used by models on nuScenes dataset\noptimizer = dict(type='AdamW', lr=1e-4, weight_decay=0.01)\n# max_norm=10 is better for SECOND\noptimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))\nlr_config = dict(\n    policy='cyclic',\n    target_ratio=(10, 1e-4),\n    cyclic_times=1,\n    step_ratio_up=0.4,\n)\nmomentum_config = dict(\n    policy='cyclic',\n    target_ratio=(0.85 / 0.95, 1),\n    cyclic_times=1,\n    step_ratio_up=0.4,\n)\n\n# runtime settings\nrunner = dict(type='EpochBasedRunner', max_epochs=20)\n"
  },
  {
    "path": "projects/configs/_base_/schedules/cyclic_40e.py",
    "content": "# The schedule is usually used by models trained on KITTI dataset\n\n# The learning rate set in the cyclic schedule is the initial learning rate\n# rather than the max learning rate. Since the target_ratio is (10, 1e-4),\n# the learning rate will change from 0.0018 to 0.018, than go to 0.0018*1e-4\nlr = 0.0018\n# The optimizer follows the setting in SECOND.Pytorch, but here we use\n# the offcial AdamW optimizer implemented by PyTorch.\noptimizer = dict(type='AdamW', lr=lr, betas=(0.95, 0.99), weight_decay=0.01)\noptimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2))\n# We use cyclic learning rate and momentum schedule following SECOND.Pytorch\n# https://github.com/traveller59/second.pytorch/blob/3aba19c9688274f75ebb5e576f65cfe54773c021/torchplus/train/learning_schedules_fastai.py#L69  # noqa\n# We implement them in mmcv, for more details, please refer to\n# https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327  # noqa\n# https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130  # noqa\nlr_config = dict(\n    policy='cyclic',\n    target_ratio=(10, 1e-4),\n    cyclic_times=1,\n    step_ratio_up=0.4,\n)\nmomentum_config = dict(\n    policy='cyclic',\n    target_ratio=(0.85 / 0.95, 1),\n    cyclic_times=1,\n    step_ratio_up=0.4,\n)\n# Although the max_epochs is 40, this schedule is usually used we\n# RepeatDataset with repeat ratio N, thus the actual max epoch\n# number could be Nx40\nrunner = dict(type='EpochBasedRunner', max_epochs=40)\n"
  },
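  {
    "path": "docs/examples/cyclic_lr_sketch.py",
    "content": "# Hypothetical example (not part of the original repo): the endpoints of the\n# one-cycle schedule configured in cyclic_40e.py above. mmcv's\n# CyclicLrUpdaterHook anneals between phase endpoints (cosine by default);\n# plain linear interpolation is used here only to make the endpoints visible:\n# 0.0018 -> 0.018 over the first 40% of training, then down to 0.0018*1e-4.\nlr = 0.0018\ntarget_ratio = (10, 1e-4)\nstep_ratio_up = 0.4\n\ndef lr_at(progress):\n    # progress in [0, 1] across the single cycle (cyclic_times=1)\n    if progress < step_ratio_up:\n        frac = progress / step_ratio_up\n        return lr * (1 + frac * (target_ratio[0] - 1))\n    frac = (progress - step_ratio_up) / (1 - step_ratio_up)\n    return lr * (target_ratio[0] + frac * (target_ratio[1] - target_ratio[0]))\n\nprint(lr_at(0.0), lr_at(0.4), lr_at(1.0))  # roughly 0.0018, 0.018, 1.8e-07\n"
  },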
  {
    "path": "projects/configs/_base_/schedules/mmdet_schedule_1x.py",
    "content": "# optimizer\noptimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)\noptimizer_config = dict(grad_clip=None)\n# learning policy\nlr_config = dict(\n    policy='step',\n    warmup='linear',\n    warmup_iters=500,\n    warmup_ratio=0.001,\n    step=[8, 11])\nrunner = dict(type='EpochBasedRunner', max_epochs=12)\n"
  },
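  {
    "path": "docs/examples/step_lr_sketch.py",
    "content": "# Hypothetical example (not part of the original repo): the classic mmdet '1x'\n# step policy above decays the learning rate by the default factor of 0.1 each\n# time an epoch listed in `step` is reached.\nbase_lr = 0.02\nsteps = [8, 11]\ngamma = 0.1\n\ndef lr_at_epoch(epoch):\n    return base_lr * gamma ** sum(epoch >= s for s in steps)\n\nprint([round(lr_at_epoch(e), 6) for e in (0, 8, 11)])  # [0.02, 0.002, 0.0002]\n"
  },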
  {
    "path": "projects/configs/_base_/schedules/schedule_2x.py",
    "content": "# optimizer\n# This schedule is mainly used by models on nuScenes dataset\noptimizer = dict(type='AdamW', lr=0.001, weight_decay=0.01)\n# max_norm=10 is better for SECOND\noptimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))\nlr_config = dict(\n    policy='step',\n    warmup='linear',\n    warmup_iters=1000,\n    warmup_ratio=1.0 / 1000,\n    step=[20, 23])\nmomentum_config = None\n# runtime settings\nrunner = dict(type='EpochBasedRunner', max_epochs=24)\n"
  },
  {
    "path": "projects/configs/_base_/schedules/schedule_3x.py",
    "content": "# optimizer\n# This schedule is mainly used by models on indoor dataset,\n# e.g., VoteNet on SUNRGBD and ScanNet\nlr = 0.008  # max learning rate\noptimizer = dict(type='AdamW', lr=lr, weight_decay=0.01)\noptimizer_config = dict(grad_clip=dict(max_norm=10, norm_type=2))\nlr_config = dict(policy='step', warmup=None, step=[24, 32])\n# runtime settings\nrunner = dict(type='EpochBasedRunner', max_epochs=36)\n"
  },
  {
    "path": "projects/configs/_base_/schedules/seg_cosine_150e.py",
    "content": "# optimizer\n# This schedule is mainly used on S3DIS dataset in segmentation task\noptimizer = dict(type='SGD', lr=0.2, weight_decay=0.0001, momentum=0.9)\noptimizer_config = dict(grad_clip=None)\nlr_config = dict(policy='CosineAnnealing', warmup=None, min_lr=0.002)\nmomentum_config = None\n\n# runtime settings\nrunner = dict(type='EpochBasedRunner', max_epochs=150)\n"
  },
  {
    "path": "projects/configs/_base_/schedules/seg_cosine_200e.py",
    "content": "# optimizer\n# This schedule is mainly used on ScanNet dataset in segmentation task\noptimizer = dict(type='Adam', lr=0.001, weight_decay=0.01)\noptimizer_config = dict(grad_clip=None)\nlr_config = dict(policy='CosineAnnealing', warmup=None, min_lr=1e-5)\nmomentum_config = None\n\n# runtime settings\nrunner = dict(type='EpochBasedRunner', max_epochs=200)\n"
  },
  {
    "path": "projects/configs/_base_/schedules/seg_cosine_50e.py",
    "content": "# optimizer\n# This schedule is mainly used on S3DIS dataset in segmentation task\noptimizer = dict(type='Adam', lr=0.001, weight_decay=0.001)\noptimizer_config = dict(grad_clip=None)\nlr_config = dict(policy='CosineAnnealing', warmup=None, min_lr=1e-5)\nmomentum_config = None\n\n# runtime settings\nrunner = dict(type='EpochBasedRunner', max_epochs=50)\n"
  },
  {
    "path": "projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1.py",
    "content": "# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n# 2 classes: inflated GMO and others\n\n# Basic params ******************************************\n_base_ = [\n    '../datasets/custom_nus-3d.py',\n    '../_base_/default_runtime.py'\n]\nfind_unused_parameters = True\n# whether training and test together with dataset generation\nonly_generate_dataset = False\n# we only consider use_camera in Cam4DOcc in the current version\ninput_modality = dict(\n    use_lidar=False,\n    use_camera=True,\n    use_radar=False,\n    use_map=False,\n    use_external=False)\nplugin = True\nplugin_dir = \"projects/occ_plugin/\"\nocc_path = \"./data/nuScenes-Occupancy\"\ndepth_gt_path = './data/depth_gt'\ntrain_ann_file = \"./data/nuscenes/nuscenes_occ_infos_train.pkl\"\nval_ann_file = \"./data/nuscenes/nuscenes_occ_infos_val.pkl\"\ncam4docc_dataset_path = \"./data/cam4docc/\"\nnusc_root = './data/nuscenes/'\n# GMO class names\nclass_names = ['vehicle', 'human']\nuse_separate_classes = False\nuse_fine_occ = False\n\n# Forecasting-related params ******************************************\n# we use *time_receptive_field* past frames to forecast future *n_future_frames* frames\n# for 3D instance prediction, n_future_frames_plus > n_future_frames has to be set\ntime_receptive_field = 3\nn_future_frames = 4\nn_future_frames_plus = 6\niou_thresh_for_vpq = 0.2\ntest_present = False\n\n# Occupancy-related params ******************************************\npoint_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]\nocc_size = [512, 512, 40]\nlss_downsample = [4, 4, 4]\nvoxel_x = (point_cloud_range[3] - point_cloud_range[0]) / occ_size[0]\nvoxel_y = (point_cloud_range[4] - point_cloud_range[1]) / occ_size[1]\nvoxel_z = (point_cloud_range[5] - point_cloud_range[2]) / occ_size[2]\nempty_idx = 0\nif use_separate_classes:\n    num_cls = len(class_names) + 1\nelse:\n    num_cls = 2\nimg_norm_cfg = None\n\n# Save params ******************************************\nsave_pred = False\nsave_path = \"./data/cam4docc/results\"\n\n# Data-generation and pipeline params ******************************************\ndataset_type = 'Cam4DOccDataset'\nfile_client_args = dict(backend='disk')\ndata_config={\n    'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',\n             'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],\n    'Ncams': 6,\n    'input_size': (896, 1600),\n    'src_size': (900, 1600),\n    # image-view augmentation\n    'resize': (-0.06, 0.11),\n    'rot': (-5.4, 5.4),\n    'flip': False,\n    'crop_h': (0.0, 0.0),\n    'resize_test': 0.00,\n}\n\nbda_aug_conf = dict(\n            rot_lim=(-0, 0),\n            scale_lim=(0.95, 1.05),\n            flip_dx_ratio=0.5,\n            flip_dy_ratio=0.5)\n\ntrain_capacity = 23930 # default: use all sequences\ntest_capacity = 5119 # default: use all sequences\ntrain_pipeline = [\n    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,\n                use_separate_classes=use_separate_classes),\n    dict(type='LoadMultiViewImageFromFiles_BEVDet', is_train=True, data_config=data_config,\n                sequential=False, aligned=True, trans_only=False, depth_gt_path=depth_gt_path, data_root=nusc_root,\n                mmlabnorm=True, load_depth=True, img_norm_cfg=img_norm_cfg),\n    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, 
unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=False),\n    dict(type='OccDefaultFormatBundle3D', class_names=class_names),\n    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion']),\n]\n\ntest_pipeline = [\n    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,\n         use_separate_classes=use_separate_classes),\n    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config, depth_gt_path=depth_gt_path, data_root=nusc_root,\n         sequential=False, aligned=True, trans_only=False, mmlabnorm=True, img_norm_cfg=img_norm_cfg, test_mode=True),\n    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=True),\n    dict(type='OccDefaultFormatBundle3D', class_names=class_names, with_label=False), \n    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion'], meta_keys=['pc_range', 'occ_size', 'scene_token', 'lidar_token']),\n]\n\ntrain_config=dict(\n        type=dataset_type,\n        data_root=nusc_root,\n        occ_root=occ_path,\n        idx_root=cam4docc_dataset_path,\n        ori_data_root=cam4docc_dataset_path,\n        ann_file=train_ann_file,\n        pipeline=train_pipeline,\n        classes=class_names,\n        use_separate_classes=use_separate_classes,\n        modality=input_modality,\n        test_mode=False,\n        use_valid_flag=True,\n        occ_size=occ_size,\n        pc_range=point_cloud_range,\n        box_type_3d='LiDAR',\n        time_receptive_field=time_receptive_field,\n        n_future_frames=n_future_frames,\n        train_capacity=train_capacity,\n        test_capacity=test_capacity,\n        ) \n\ntest_config=dict(\n    type=dataset_type,\n    occ_root=occ_path,\n    data_root=nusc_root,\n    idx_root=cam4docc_dataset_path,\n    ori_data_root=cam4docc_dataset_path,\n    ann_file=val_ann_file,\n    pipeline=test_pipeline,\n    classes=class_names,\n    use_separate_classes=use_separate_classes,\n    modality=input_modality,\n    occ_size=occ_size,\n    pc_range=point_cloud_range,\n    time_receptive_field=time_receptive_field, \n    n_future_frames=n_future_frames,\n    train_capacity=train_capacity,\n    test_capacity=test_capacity,\n    ) \n\n# in our work we use 8 NVIDIA A100 GPUs\ndata = dict(\n    samples_per_gpu=1,\n    workers_per_gpu=1,\n    train=train_config,\n    val=test_config,\n    test=test_config,\n    shuffler_sampler=dict(type='DistributedGroupSampler'),\n    nonshuffler_sampler=dict(type='DistributedSampler'),\n)\n\n# Model params ******************************************\ngrid_config = {\n    'xbound': [point_cloud_range[0], point_cloud_range[3], voxel_x*lss_downsample[0]],\n    'ybound': [point_cloud_range[1], point_cloud_range[4], voxel_y*lss_downsample[1]],\n    'zbound': [point_cloud_range[2], point_cloud_range[5], voxel_z*lss_downsample[2]],\n    'dbound': [2.0, 58.0, 0.5],\n}\n\nvoxel_channels = [32*(time_receptive_field), 32*2*(time_receptive_field), 32*4*(time_receptive_field), 32*8*(time_receptive_field)]\npred_channels = [32, 32*2, 32*4, 32*8]\ndecoder_channels = [32*(n_future_frames_plus), 32*2*(n_future_frames_plus), 32*4*(n_future_frames_plus), 32*8*(n_future_frames_plus)]\n\nnumC_Trans = 64\nocc_encoder_input_channel = 
(numC_Trans+6)*time_receptive_field\nvoxel_out_channel = 32*(n_future_frames_plus)\nflow_out_channel = 32*(n_future_frames_plus)\nvoxel_out_channel_per_frame = 32\nvoxel_out_indices = (0, 1, 2, 3)\nmy_voxel_out_indices = (0, 1, 2, 3)\n\nmodel = dict(\n    type='OCFNet',\n    only_generate_dataset=only_generate_dataset,\n    loss_norm=False,\n    disable_loss_depth=True,\n    point_cloud_range=point_cloud_range,\n    time_receptive_field=time_receptive_field,\n    n_future_frames=n_future_frames,\n    n_future_frames_plus=n_future_frames_plus,\n    max_label=num_cls,\n    iou_thresh_for_vpq=iou_thresh_for_vpq,\n    test_present=test_present,\n    record_time=False,\n    save_pred=save_pred,\n    save_path=save_path,\n    img_backbone=dict(\n        pretrained='torchvision://resnet50',\n        type='ResNet',\n        depth=50,\n        num_stages=4,\n        out_indices=(0, 1, 2, 3),\n        frozen_stages=0,\n        with_cp=False,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        norm_eval=False,\n        style='pytorch'),\n    img_neck=dict(\n        type='SECONDFPN',\n        in_channels=[256, 512, 1024, 2048],\n        upsample_strides=[0.25, 0.5, 1, 2],\n        out_channels=[128, 128, 128, 128]),\n    img_view_transformer=dict(type='ViewTransformerLiftSplatShootVoxel',\n                              norm_cfg=dict(type='SyncBN', requires_grad=True),\n                              loss_depth_weight=3.,\n                              loss_depth_type='kld',\n                              grid_config=grid_config,\n                              data_config=data_config,\n                              numC_Trans=numC_Trans,\n                              vp_megvii=False),\n    occ_encoder_backbone=dict(\n        type='CustomResNet3D',\n        depth=18,\n        n_input_channels=occ_encoder_input_channel,\n        block_inplanes=voxel_channels,\n        out_indices=my_voxel_out_indices,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    occ_predictor=dict(\n        type='Predictor',\n        n_input_channels=pred_channels,\n        in_timesteps=time_receptive_field,\n        out_timesteps=n_future_frames_plus,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    occ_encoder_neck=dict(\n        type='FPN3D',\n        with_cp=False,\n        in_channels=decoder_channels,\n        out_channels=voxel_out_channel,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_encoder_backbone=dict(\n        type='CustomResNet3D',\n        depth=18,\n        n_input_channels=occ_encoder_input_channel,\n        block_inplanes=voxel_channels,\n        out_indices=my_voxel_out_indices,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_predictor=dict(\n        type='Predictor',\n        n_input_channels=pred_channels,\n        in_timesteps=time_receptive_field,\n        out_timesteps=n_future_frames_plus,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_encoder_neck=dict(\n        type='FPN3D',\n        with_cp=False,\n        in_channels=decoder_channels,\n        out_channels=flow_out_channel,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_head=dict(\n        type='FlowHead',\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        soft_weights=True, \n        final_occ_size=occ_size,\n        fine_topk=15000,\n        empty_idx=empty_idx,\n        num_level=len(my_voxel_out_indices),\n        in_channels=[voxel_out_channel_per_frame] 
* len(my_voxel_out_indices),\n        out_channel=3,  # 3-dim flow\n        point_cloud_range=point_cloud_range,\n    ),\n    pts_bbox_head=dict(\n        type='OccHead',\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        soft_weights=True,\n        final_occ_size=occ_size,\n        fine_topk=15000,\n        empty_idx=empty_idx,\n        num_level=len(my_voxel_out_indices),\n        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),\n        out_channel=num_cls,\n        point_cloud_range=point_cloud_range,\n        loss_weight_cfg=dict(\n            loss_voxel_ce_weight=1.0,\n            loss_voxel_sem_scal_weight=1.0,\n            loss_voxel_geo_scal_weight=1.0,\n            loss_voxel_lovasz_weight=1.0,\n        ),\n    ),\n    empty_idx=empty_idx,\n)\n\n# Learning policy params ******************************************\noptimizer = dict(\n    type='AdamW',\n    lr=3e-4,   \n    paramwise_cfg=dict(\n        custom_keys={\n            'img_backbone': dict(lr_mult=0.1),\n        }),\n    weight_decay=0.01)\n\noptimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))\nlr_config = dict(\n    policy='CosineAnnealing',\n    warmup='linear',\n    warmup_iters=500,\n    warmup_ratio=1.0 / 3,\n    min_lr_ratio=1e-3)\n\nrunner = dict(type='EpochBasedRunner', max_epochs=24)\nevaluation = dict(\n    interval=1,\n    pipeline=test_pipeline,\n    save_best='SSC_mean',\n    rule='greater',\n)\n\ncustom_hooks = [\n    dict(type='OccEfficiencyHook'),\n]"
  },
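  {
    "path": "docs/examples/ocfnet_v1_1_derived_shapes.py",
    "content": "# Hypothetical sanity check (not part of the original repo): reproduces the\n# arithmetic behind the derived quantities in OCFNet_in_Cam4DOcc_V1.1.py so the\n# resulting grid and channel sizes are easy to read off.\npoint_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]\nocc_size = [512, 512, 40]\nlss_downsample = [4, 4, 4]\ntime_receptive_field = 3\nn_future_frames_plus = 6\nnumC_Trans = 64\n\n# output voxel resolution: 102.4 m / 512 voxels = 0.2 m (z: 8.0 m / 40 = 0.2 m)\nprint(round((point_cloud_range[3] - point_cloud_range[0]) / occ_size[0], 3))  # 0.2\n\n# grid used by the LSS view transformer (occ_size downsampled by lss_downsample)\nprint([s // d for s, d in zip(occ_size, lss_downsample)])  # [128, 128, 10]\n\n# number of depth bins implied by dbound = [2.0, 58.0, 0.5]\nprint(int((58.0 - 2.0) / 0.5))  # 112\n\n# 3D encoder input channels: (numC_Trans + 6) per past frame, stacked over time\nprint((numC_Trans + 6) * time_receptive_field)  # 210\nprint([32 * m * time_receptive_field for m in (1, 2, 4, 8)])  # voxel_channels: [96, 192, 384, 768]\nprint([32 * m * n_future_frames_plus for m in (1, 2, 4, 8)])  # decoder_channels: [192, 384, 768, 1536]\n"
  },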
  {
    "path": "projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.1_lyft.py",
    "content": "# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n# 2 classes: inflated GMO and others\n\n# Basic params ******************************************\n_base_ = [\n    '../datasets/custom_nus-3d.py',\n    '../_base_/default_runtime.py'\n]\nfind_unused_parameters = True\n# whether training and test together with dataset generation\nonly_generate_dataset = False\n# we only consider use_camera in Cam4DOcc in the current version\ninput_modality = dict(\n    use_lidar=False,\n    use_camera=True,\n    use_radar=False,\n    use_map=False,\n    use_external=False)\nplugin = True\nplugin_dir = \"projects/occ_plugin/\"\n\n# path unused for lyft\nocc_path = \" \"\ndepth_gt_path = \" \"\ntrain_ann_file = \" \"\nval_ann_file = \" \"\n\ncam4docc_dataset_path = \"./data/cam4docc/\"\nnusc_root = './data/lyft/'\n# GMO class names\nclass_names = ['vehicle', 'human']\nuse_separate_classes = False\nuse_fine_occ = False\n\n# Forecasting-related params ******************************************\n# we use *time_receptive_field* past frames to forecast future *n_future_frames* frames\n# for 3D instance prediction, n_future_frames_plus > n_future_frames has to be set\ntime_receptive_field = 3\nn_future_frames = 4\nn_future_frames_plus = 6\niou_thresh_for_vpq = 0.2\ntest_present = False\n\n# Occupancy-related params ******************************************\npoint_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]\nocc_size = [512, 512, 40]\nlss_downsample = [4, 4, 4]\nvoxel_x = (point_cloud_range[3] - point_cloud_range[0]) / occ_size[0]\nvoxel_y = (point_cloud_range[4] - point_cloud_range[1]) / occ_size[1]\nvoxel_z = (point_cloud_range[5] - point_cloud_range[2]) / occ_size[2]\nempty_idx = 0\nif use_separate_classes:\n    num_cls = len(class_names) + 1\nelse:\n    num_cls = 2\nimg_norm_cfg = None\n\n# Save params ******************************************\nsave_pred = False\nsave_path = \"./data/cam4docc/results\"\n\n# Data-generation and pipeline params ******************************************\ndataset_type = 'Cam4DOccLyftDataset'\nfile_client_args = dict(backend='disk')\ndata_config={\n    'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',\n             'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],\n    'Ncams': 6,\n    'input_size': (896, 1600),\n    'src_size': (900, 1600),\n    # image-view augmentation\n    'resize': (-0.06, 0.11),\n    'rot': (-5.4, 5.4),\n    'flip': False,\n    'crop_h': (0.0, 0.0),\n    'resize_test': 0.00,\n}\n\nbda_aug_conf = dict(\n            rot_lim=(-0, 0),\n            scale_lim=(0.95, 1.05),\n            flip_dx_ratio=0.5,\n            flip_dy_ratio=0.5)\n\ntrain_capacity = 15720 # default: use all sequences\ntest_capacity = 5880 # default: use all sequences\ntrain_pipeline = [\n    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,\n                use_separate_classes=use_separate_classes, use_lyft=True),\n    dict(type='LoadMultiViewImageFromFiles_BEVDet', is_train=True, data_config=data_config,\n                sequential=False, aligned=True, trans_only=False, depth_gt_path=depth_gt_path, data_root=nusc_root,\n                mmlabnorm=True, load_depth=True, img_norm_cfg=img_norm_cfg, use_lyft=True),\n    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, 
use_fine_occ=use_fine_occ, test_mode=False),\n    dict(type='OccDefaultFormatBundle3D', class_names=class_names),\n    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion']),\n]\n\ntest_pipeline = [\n    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,\n         use_separate_classes=use_separate_classes, use_lyft=True),\n    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config, depth_gt_path=depth_gt_path, data_root=nusc_root,\n         sequential=False, aligned=True, trans_only=False, mmlabnorm=True, img_norm_cfg=img_norm_cfg, test_mode=True, use_lyft=True),\n    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=True),\n    dict(type='OccDefaultFormatBundle3D', class_names=class_names, with_label=False), \n    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion'], meta_keys=['pc_range', 'occ_size', 'scene_token', 'lidar_token']),\n]\n\ntrain_config=dict(\n        type=dataset_type,\n        data_root=nusc_root,\n        occ_root=occ_path,\n        idx_root=cam4docc_dataset_path,\n        ori_data_root=cam4docc_dataset_path,\n        ann_file=train_ann_file,\n        pipeline=train_pipeline,\n        classes=class_names,\n        use_separate_classes=use_separate_classes,\n        modality=input_modality,\n        test_mode=False,\n        use_valid_flag=True,\n        occ_size=occ_size,\n        pc_range=point_cloud_range,\n        box_type_3d='LiDAR',\n        time_receptive_field=time_receptive_field,\n        n_future_frames=n_future_frames,\n        train_capacity=train_capacity,\n        test_capacity=test_capacity,\n        ) \n\ntest_config=dict(\n    type=dataset_type,\n    occ_root=occ_path,\n    data_root=nusc_root,\n    idx_root=cam4docc_dataset_path,\n    ori_data_root=cam4docc_dataset_path,\n    ann_file=val_ann_file,\n    pipeline=test_pipeline,\n    classes=class_names,\n    use_separate_classes=use_separate_classes,\n    modality=input_modality,\n    occ_size=occ_size,\n    pc_range=point_cloud_range,\n    time_receptive_field=time_receptive_field, \n    n_future_frames=n_future_frames,\n    train_capacity=train_capacity,\n    test_capacity=test_capacity,\n    ) \n\n# in our work we use 8 NVIDIA A100 GPUs\ndata = dict(\n    samples_per_gpu=1,\n    workers_per_gpu=1,\n    train=train_config,\n    val=test_config,\n    test=test_config,\n    shuffler_sampler=dict(type='DistributedGroupSampler'),\n    nonshuffler_sampler=dict(type='DistributedSampler'),\n)\n\n# Model params ******************************************\ngrid_config = {\n    'xbound': [point_cloud_range[0], point_cloud_range[3], voxel_x*lss_downsample[0]],\n    'ybound': [point_cloud_range[1], point_cloud_range[4], voxel_y*lss_downsample[1]],\n    'zbound': [point_cloud_range[2], point_cloud_range[5], voxel_z*lss_downsample[2]],\n    'dbound': [2.0, 58.0, 0.5],\n}\n\nvoxel_channels = [32*(time_receptive_field), 32*2*(time_receptive_field), 32*4*(time_receptive_field), 32*8*(time_receptive_field)]\npred_channels = [32, 32*2, 32*4, 32*8]\ndecoder_channels = [32*(n_future_frames_plus), 32*2*(n_future_frames_plus), 32*4*(n_future_frames_plus), 32*8*(n_future_frames_plus)]\n\nnumC_Trans = 64\nocc_encoder_input_channel = 
(numC_Trans+6)*time_receptive_field\nvoxel_out_channel = 32*(n_future_frames_plus)\nflow_out_channel = 32*(n_future_frames_plus)\nvoxel_out_channel_per_frame = 32\nvoxel_out_indices = (0, 1, 2, 3)\nmy_voxel_out_indices = (0, 1, 2, 3)\n\nmodel = dict(\n    type='OCFNet',\n    only_generate_dataset=only_generate_dataset,\n    loss_norm=False,\n    disable_loss_depth=True,\n    point_cloud_range=point_cloud_range,\n    time_receptive_field=time_receptive_field,\n    n_future_frames=n_future_frames,\n    n_future_frames_plus=n_future_frames_plus,\n    max_label=num_cls,\n    iou_thresh_for_vpq=iou_thresh_for_vpq,\n    test_present=test_present,\n    record_time=False,\n    save_pred=save_pred,\n    save_path=save_path,\n    img_backbone=dict(\n        pretrained='torchvision://resnet50',\n        type='ResNet',\n        depth=50,\n        num_stages=4,\n        out_indices=(0, 1, 2, 3),\n        frozen_stages=0,\n        with_cp=False,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        norm_eval=False,\n        style='pytorch'),\n    img_neck=dict(\n        type='SECONDFPN',\n        in_channels=[256, 512, 1024, 2048],\n        upsample_strides=[0.25, 0.5, 1, 2],\n        out_channels=[128, 128, 128, 128]),\n    img_view_transformer=dict(type='ViewTransformerLiftSplatShootVoxel',\n                              norm_cfg=dict(type='SyncBN', requires_grad=True),\n                              loss_depth_weight=3.,\n                              loss_depth_type='kld',\n                              grid_config=grid_config,\n                              data_config=data_config,\n                              numC_Trans=numC_Trans,\n                              vp_megvii=False),\n    occ_encoder_backbone=dict(\n        type='CustomResNet3D',\n        depth=18,\n        n_input_channels=occ_encoder_input_channel,\n        block_inplanes=voxel_channels,\n        out_indices=my_voxel_out_indices,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    occ_predictor=dict(\n        type='Predictor',\n        n_input_channels=pred_channels,\n        in_timesteps=time_receptive_field,\n        out_timesteps=n_future_frames_plus,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    occ_encoder_neck=dict(\n        type='FPN3D',\n        with_cp=False,\n        in_channels=decoder_channels,\n        out_channels=voxel_out_channel,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_encoder_backbone=dict(\n        type='CustomResNet3D',\n        depth=18,\n        n_input_channels=occ_encoder_input_channel,\n        block_inplanes=voxel_channels,\n        out_indices=my_voxel_out_indices,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_predictor=dict(\n        type='Predictor',\n        n_input_channels=pred_channels,\n        in_timesteps=time_receptive_field,\n        out_timesteps=n_future_frames_plus,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_encoder_neck=dict(\n        type='FPN3D',\n        with_cp=False,\n        in_channels=decoder_channels,\n        out_channels=flow_out_channel,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_head=dict(\n        type='FlowHead',\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        soft_weights=True, \n        final_occ_size=occ_size,\n        fine_topk=15000,\n        empty_idx=empty_idx,\n        num_level=len(my_voxel_out_indices),\n        in_channels=[voxel_out_channel_per_frame] 
* len(my_voxel_out_indices),\n        out_channel=3,  # 3-dim flow\n        point_cloud_range=point_cloud_range,\n    ),\n    pts_bbox_head=dict(\n        type='OccHead',\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        soft_weights=True,\n        final_occ_size=occ_size,\n        fine_topk=15000,\n        empty_idx=empty_idx,\n        num_level=len(my_voxel_out_indices),\n        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),\n        out_channel=num_cls,\n        point_cloud_range=point_cloud_range,\n        loss_weight_cfg=dict(\n            loss_voxel_ce_weight=1.0,\n            loss_voxel_sem_scal_weight=1.0,\n            loss_voxel_geo_scal_weight=1.0,\n            loss_voxel_lovasz_weight=1.0,\n        ),\n    ),\n    empty_idx=empty_idx,\n)\n\n# Learning policy params ******************************************\noptimizer = dict(\n    type='AdamW',\n    lr=3e-4,   \n    paramwise_cfg=dict(\n        custom_keys={\n            'img_backbone': dict(lr_mult=0.1),\n        }),\n    weight_decay=0.01)\n\noptimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))\nlr_config = dict(\n    policy='CosineAnnealing',\n    warmup='linear',\n    warmup_iters=500,\n    warmup_ratio=1.0 / 3,\n    min_lr_ratio=1e-3)\n\nrunner = dict(type='EpochBasedRunner', max_epochs=24)\nevaluation = dict(\n    interval=1,\n    pipeline=test_pipeline,\n    save_best='SSC_mean',\n    rule='greater',\n)\n\ncustom_hooks = [\n    dict(type='OccEfficiencyHook'),\n]"
  },
  {
    "path": "projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2.py",
    "content": "# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n# multiple classes: inflated multiple MO classes\n\n# Basic params ******************************************\n_base_ = [\n    '../datasets/custom_nus-3d.py',\n    '../_base_/default_runtime.py'\n]\nfind_unused_parameters = True\n# whether training and test together with dataset generation\nonly_generate_dataset = False\n# we only consider use_camera in Cam4DOcc in the current version\ninput_modality = dict(\n    use_lidar=False,\n    use_camera=True,\n    use_radar=False,\n    use_map=False,\n    use_external=False)\nplugin = True\nplugin_dir = \"projects/occ_plugin/\"\nocc_path = \"./data/nuScenes-Occupancy\"\ndepth_gt_path = './data/depth_gt'\ntrain_ann_file = \"./data/nuscenes/nuscenes_occ_infos_train.pkl\"\nval_ann_file = \"./data/nuscenes/nuscenes_occ_infos_val.pkl\"\ncam4docc_dataset_path = \"./data/cam4docc/\"\nnusc_root = './data/nuscenes/'\n# GMO class names\nclass_names = [\n    'vehicle.bicycle', 'bus', 'car', 'construction', 'motorcycle', 'trailer', 'truck', 'pedestrian'\n]\nuse_separate_classes = True\nuse_fine_occ = False\n\n# Forecasting-related params ******************************************\n# we use *time_receptive_field* past frames to forecast future *n_future_frames* frames\n# for 3D instance prediction, n_future_frames_plus > n_future_frames has to be set\ntime_receptive_field = 3\nn_future_frames = 4\nn_future_frames_plus = 6\niou_thresh_for_vpq = 0.2\ntest_present = False\n\n# Occupancy-related params ******************************************\npoint_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]\nocc_size = [512, 512, 40]\nlss_downsample = [4, 4, 4]\nvoxel_x = (point_cloud_range[3] - point_cloud_range[0]) / occ_size[0]\nvoxel_y = (point_cloud_range[4] - point_cloud_range[1]) / occ_size[1]\nvoxel_z = (point_cloud_range[5] - point_cloud_range[2]) / occ_size[2]\nempty_idx = 0\nif use_separate_classes:\n    num_cls = len(class_names) + 1\nelse:\n    num_cls = 2\nimg_norm_cfg = None\n\n# Save params ******************************************\nsave_pred = False\nsave_path = \"./data/cam4docc/results\"\n\n# Data-generation and pipeline params ******************************************\ndataset_type = 'Cam4DOccDataset'\nfile_client_args = dict(backend='disk')\ndata_config={\n    'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',\n             'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],\n    'Ncams': 6,\n    'input_size': (896, 1600),\n    'src_size': (900, 1600),\n    # image-view augmentation\n    'resize': (-0.06, 0.11),\n    'rot': (-5.4, 5.4),\n    'flip': False,\n    'crop_h': (0.0, 0.0),\n    'resize_test': 0.00,\n}\n\nbda_aug_conf = dict(\n            rot_lim=(-0, 0),\n            scale_lim=(0.95, 1.05),\n            flip_dx_ratio=0.5,\n            flip_dy_ratio=0.5)\n\n\ntrain_capacity = 23930 # default: use all sequences\ntest_capacity = 5119 # default: use all sequences\ntrain_pipeline = [\n    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,\n                use_separate_classes=use_separate_classes),\n    dict(type='LoadMultiViewImageFromFiles_BEVDet', is_train=True, data_config=data_config,\n                sequential=False, aligned=True, trans_only=False, depth_gt_path=depth_gt_path, data_root=nusc_root,\n                mmlabnorm=True, load_depth=True, 
img_norm_cfg=img_norm_cfg),\n    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=False),\n    dict(type='OccDefaultFormatBundle3D', class_names=class_names),\n    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion']),\n]\n\ntest_pipeline = [\n    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,\n         use_separate_classes=use_separate_classes),\n    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config, depth_gt_path=depth_gt_path, data_root=nusc_root,\n         sequential=False, aligned=True, trans_only=False, mmlabnorm=True, img_norm_cfg=img_norm_cfg, test_mode=True),\n    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=True),\n    dict(type='OccDefaultFormatBundle3D', class_names=class_names, with_label=False), \n    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion'], meta_keys=['pc_range', 'occ_size', 'scene_token', 'lidar_token']),\n]\n\ntrain_config=dict(\n        type=dataset_type,\n        data_root=nusc_root,\n        occ_root=occ_path,\n        idx_root=cam4docc_dataset_path,\n        ori_data_root=cam4docc_dataset_path,\n        ann_file=train_ann_file,\n        pipeline=train_pipeline,\n        classes=class_names,\n        use_separate_classes=use_separate_classes,\n        modality=input_modality,\n        test_mode=False,\n        use_valid_flag=True,\n        occ_size=occ_size,\n        pc_range=point_cloud_range,\n        box_type_3d='LiDAR',\n        time_receptive_field=time_receptive_field,\n        n_future_frames=n_future_frames,\n        train_capacity=train_capacity,\n        test_capacity=test_capacity,\n        ) \n\ntest_config=dict(\n    type=dataset_type,\n    occ_root=occ_path,\n    data_root=nusc_root,\n    idx_root=cam4docc_dataset_path,\n    ori_data_root=cam4docc_dataset_path,\n    ann_file=val_ann_file,\n    pipeline=test_pipeline,\n    classes=class_names,\n    use_separate_classes=use_separate_classes,\n    modality=input_modality,\n    occ_size=occ_size,\n    pc_range=point_cloud_range,\n    time_receptive_field=time_receptive_field, \n    n_future_frames=n_future_frames,\n    train_capacity=train_capacity,\n    test_capacity=test_capacity,\n    ) \n\n# in our work we use 8 NVIDIA A100 GPUs\ndata = dict(\n    samples_per_gpu=1,\n    workers_per_gpu=1,\n    train=train_config,\n    val=test_config,\n    test=test_config,\n    shuffler_sampler=dict(type='DistributedGroupSampler'),\n    nonshuffler_sampler=dict(type='DistributedSampler'),\n)\n\n# Model params ******************************************\ngrid_config = {\n    'xbound': [point_cloud_range[0], point_cloud_range[3], voxel_x*lss_downsample[0]],\n    'ybound': [point_cloud_range[1], point_cloud_range[4], voxel_y*lss_downsample[1]],\n    'zbound': [point_cloud_range[2], point_cloud_range[5], voxel_z*lss_downsample[2]],\n    'dbound': [2.0, 58.0, 0.5],\n}\n\nvoxel_channels = [32*(time_receptive_field), 32*2*(time_receptive_field), 32*4*(time_receptive_field), 32*8*(time_receptive_field)]\npred_channels = [32, 32*2, 32*4, 32*8]\ndecoder_channels = [32*(n_future_frames_plus), 32*2*(n_future_frames_plus), 
32*4*(n_future_frames_plus), 32*8*(n_future_frames_plus)]\n\nnumC_Trans = 64\nocc_encoder_input_channel = (numC_Trans+6)*time_receptive_field\nvoxel_out_channel = 32*(n_future_frames_plus)\nflow_out_channel = 32*(n_future_frames_plus)\nvoxel_out_channel_per_frame = 32\nvoxel_out_indices = (0, 1, 2, 3)\nmy_voxel_out_indices = (0, 1, 2, 3)\n\nmodel = dict(\n    type='OCFNet',\n    only_generate_dataset=only_generate_dataset,\n    loss_norm=False,\n    disable_loss_depth=True,\n    point_cloud_range=point_cloud_range,\n    time_receptive_field=time_receptive_field,\n    n_future_frames=n_future_frames,\n    n_future_frames_plus=n_future_frames_plus,\n    max_label=num_cls,\n    iou_thresh_for_vpq=iou_thresh_for_vpq,\n    test_present=test_present,\n    record_time=False,\n    save_pred=save_pred,\n    save_path=save_path,\n    img_backbone=dict(\n        pretrained='torchvision://resnet50',\n        type='ResNet',\n        depth=50,\n        num_stages=4,\n        out_indices=(0, 1, 2, 3),\n        frozen_stages=0,\n        with_cp=False,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        norm_eval=False,\n        style='pytorch'),\n    img_neck=dict(\n        type='SECONDFPN',\n        in_channels=[256, 512, 1024, 2048],\n        upsample_strides=[0.25, 0.5, 1, 2],\n        out_channels=[128, 128, 128, 128]),\n    img_view_transformer=dict(type='ViewTransformerLiftSplatShootVoxel',\n                              norm_cfg=dict(type='SyncBN', requires_grad=True),\n                              loss_depth_weight=3.,\n                              loss_depth_type='kld',\n                              grid_config=grid_config,\n                              data_config=data_config,\n                              numC_Trans=numC_Trans,\n                              vp_megvii=False),\n    occ_encoder_backbone=dict(\n        type='CustomResNet3D',\n        depth=18,\n        n_input_channels=occ_encoder_input_channel,\n        block_inplanes=voxel_channels,\n        out_indices=my_voxel_out_indices,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    occ_predictor=dict(\n        type='Predictor',\n        n_input_channels=pred_channels,\n        in_timesteps=time_receptive_field,\n        out_timesteps=n_future_frames_plus,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    occ_encoder_neck=dict(\n        type='FPN3D',\n        with_cp=False,\n        in_channels=decoder_channels,\n        out_channels=voxel_out_channel,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_encoder_backbone=dict(\n        type='CustomResNet3D',\n        depth=18,\n        n_input_channels=occ_encoder_input_channel,\n        block_inplanes=voxel_channels,\n        out_indices=my_voxel_out_indices,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_predictor=dict(\n        type='Predictor',\n        n_input_channels=pred_channels,\n        in_timesteps=time_receptive_field,\n        out_timesteps=n_future_frames_plus,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_encoder_neck=dict(\n        type='FPN3D',\n        with_cp=False,\n        in_channels=decoder_channels,\n        out_channels=flow_out_channel,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_head=dict(\n        type='FlowHead',\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        soft_weights=True, \n        final_occ_size=occ_size,\n        fine_topk=15000,\n        
empty_idx=empty_idx,\n        num_level=len(my_voxel_out_indices),\n        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),\n        out_channel=3,  # 3-dim flow\n        point_cloud_range=point_cloud_range,\n    ),\n    pts_bbox_head=dict(\n        type='OccHead',\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        soft_weights=True,\n        final_occ_size=occ_size,\n        fine_topk=15000,\n        empty_idx=empty_idx,\n        num_level=len(my_voxel_out_indices),\n        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),\n        out_channel=num_cls,\n        point_cloud_range=point_cloud_range,\n        loss_weight_cfg=dict(\n            loss_voxel_ce_weight=1.0,\n            loss_voxel_sem_scal_weight=1.0,\n            loss_voxel_geo_scal_weight=1.0,\n            loss_voxel_lovasz_weight=1.0,\n        ),\n    ),\n    empty_idx=empty_idx,\n)\n\n# Learning policy params ******************************************\noptimizer = dict(\n    type='AdamW',\n    lr=3e-4,   \n    paramwise_cfg=dict(\n        custom_keys={\n            'img_backbone': dict(lr_mult=0.1),\n        }),\n    weight_decay=0.01)\n\noptimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))\nlr_config = dict(\n    policy='CosineAnnealing',\n    warmup='linear',\n    warmup_iters=500,\n    warmup_ratio=1.0 / 3,\n    min_lr_ratio=1e-3)\n\nrunner = dict(type='EpochBasedRunner', max_epochs=24)\nevaluation = dict(\n    interval=1,\n    pipeline=test_pipeline,\n    save_best='SSC_mean',\n    rule='greater',\n)\n\ncustom_hooks = [\n    dict(type='OccEfficiencyHook'),\n]"
  },
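  {
    "path": "docs/examples/num_cls_sketch.py",
    "content": "# Hypothetical illustration (not part of the original repo): how num_cls is\n# resolved in the V1.1 vs. V1.2 configs above. Index 0 is empty_idx in both;\n# how the remaining indices map onto class names is handled by the dataset\n# pipeline and is not spelled out here.\ndef resolve_num_cls(class_names, use_separate_classes):\n    # one channel per class plus 'empty', or binary inflated-GMO vs. others\n    return len(class_names) + 1 if use_separate_classes else 2\n\nprint(resolve_num_cls(['vehicle', 'human'], False))  # 2 (V1.1)\nprint(resolve_num_cls(['vehicle.bicycle', 'bus', 'car', 'construction',\n                       'motorcycle', 'trailer', 'truck', 'pedestrian'], True))  # 9 (V1.2)\n"
  },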
  {
    "path": "projects/configs/baselines/OCFNet_in_Cam4DOcc_V1.2_lyft.py",
    "content": "# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n# multiple classes: inflated multiple MO classes\n\n# Basic params ******************************************\n_base_ = [\n    '../datasets/custom_nus-3d.py',\n    '../_base_/default_runtime.py'\n]\nfind_unused_parameters = True\n# whether training and test together with dataset generation\nonly_generate_dataset = False\n# we only consider use_camera in Cam4DOcc in the current version\ninput_modality = dict(\n    use_lidar=False,\n    use_camera=True,\n    use_radar=False,\n    use_map=False,\n    use_external=False)\nplugin = True\nplugin_dir = \"projects/occ_plugin/\"\n\n# path unused for lyft\nocc_path = \" \"\ndepth_gt_path = \" \"\ntrain_ann_file = \" \"\nval_ann_file = \" \"\n\ncam4docc_dataset_path = \"./data/cam4docc/\"\nnusc_root = './data/lyft/'\n# GMO class names\n# refine the classes for lyft datasets according to your needs\nclass_names = [\n    'bicycle', 'bus', 'car', 'construction', 'motorcycle', 'trailer', 'truck', 'pedestrian'\n]\nuse_separate_classes = True\nuse_fine_occ = False\n\n# Forecasting-related params ******************************************\n# we use *time_receptive_field* past frames to forecast future *n_future_frames* frames\n# for 3D instance prediction, n_future_frames_plus > n_future_frames has to be set\ntime_receptive_field = 3\nn_future_frames = 4\nn_future_frames_plus = 6\niou_thresh_for_vpq = 0.2\ntest_present = False\n\n# Occupancy-related params ******************************************\npoint_cloud_range = [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]\nocc_size = [512, 512, 40]\nlss_downsample = [4, 4, 4]\nvoxel_x = (point_cloud_range[3] - point_cloud_range[0]) / occ_size[0]\nvoxel_y = (point_cloud_range[4] - point_cloud_range[1]) / occ_size[1]\nvoxel_z = (point_cloud_range[5] - point_cloud_range[2]) / occ_size[2]\nempty_idx = 0\nif use_separate_classes:\n    num_cls = len(class_names) + 1\nelse:\n    num_cls = 2\nimg_norm_cfg = None\n\n# Save params ******************************************\nsave_pred = False\nsave_path = \"./data/cam4docc/results\"\n\n# Data-generation and pipeline params ******************************************\ndataset_type = 'Cam4DOccLyftDataset'\nfile_client_args = dict(backend='disk')\ndata_config={\n    'cams': ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',\n             'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],\n    'Ncams': 6,\n    'input_size': (896, 1600),\n    'src_size': (900, 1600),\n    # image-view augmentation\n    'resize': (-0.06, 0.11),\n    'rot': (-5.4, 5.4),\n    'flip': False,\n    'crop_h': (0.0, 0.0),\n    'resize_test': 0.00,\n}\n\nbda_aug_conf = dict(\n            rot_lim=(-0, 0),\n            scale_lim=(0.95, 1.05),\n            flip_dx_ratio=0.5,\n            flip_dy_ratio=0.5)\n\n\ntrain_capacity = 15720 # default: use all sequences\ntest_capacity = 5880 # default: use all sequences\ntrain_pipeline = [\n    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,\n                use_separate_classes=use_separate_classes, use_lyft=True),\n    dict(type='LoadMultiViewImageFromFiles_BEVDet', is_train=True, data_config=data_config,\n                sequential=False, aligned=True, trans_only=False, depth_gt_path=depth_gt_path, data_root=nusc_root,\n                mmlabnorm=True, load_depth=True, img_norm_cfg=img_norm_cfg, 
use_lyft=True),\n    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=False),\n    dict(type='OccDefaultFormatBundle3D', class_names=class_names),\n    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion']),\n]\n\ntest_pipeline = [\n    dict(type='LoadInstanceWithFlow', cam4docc_dataset_path=cam4docc_dataset_path, grid_size=occ_size, use_flow=True, background=empty_idx, pc_range=point_cloud_range,\n         use_separate_classes=use_separate_classes, use_lyft=True),\n    dict(type='LoadMultiViewImageFromFiles_BEVDet', data_config=data_config, depth_gt_path=depth_gt_path, data_root=nusc_root,\n         sequential=False, aligned=True, trans_only=False, mmlabnorm=True, img_norm_cfg=img_norm_cfg, test_mode=True, use_lyft=True),\n    dict(type='LoadOccupancy', to_float32=True, occ_path=occ_path, grid_size=occ_size, unoccupied=empty_idx, pc_range=point_cloud_range, use_fine_occ=use_fine_occ, test_mode=True),\n    dict(type='OccDefaultFormatBundle3D', class_names=class_names, with_label=False), \n    dict(type='Collect3D', keys=['img_inputs_seq', 'gt_occ', 'segmentation', 'instance', 'flow', 'future_egomotion'], meta_keys=['pc_range', 'occ_size', 'scene_token', 'lidar_token']),\n]\n\ntrain_config=dict(\n        type=dataset_type,\n        data_root=nusc_root,\n        occ_root=occ_path,\n        idx_root=cam4docc_dataset_path,\n        ori_data_root=cam4docc_dataset_path,\n        ann_file=train_ann_file,\n        pipeline=train_pipeline,\n        classes=class_names,\n        use_separate_classes=use_separate_classes,\n        modality=input_modality,\n        test_mode=False,\n        use_valid_flag=True,\n        occ_size=occ_size,\n        pc_range=point_cloud_range,\n        box_type_3d='LiDAR',\n        time_receptive_field=time_receptive_field,\n        n_future_frames=n_future_frames,\n        train_capacity=train_capacity,\n        test_capacity=test_capacity,\n        ) \n\ntest_config=dict(\n    type=dataset_type,\n    occ_root=occ_path,\n    data_root=nusc_root,\n    idx_root=cam4docc_dataset_path,\n    ori_data_root=cam4docc_dataset_path,\n    ann_file=val_ann_file,\n    pipeline=test_pipeline,\n    classes=class_names,\n    use_separate_classes=use_separate_classes,\n    modality=input_modality,\n    occ_size=occ_size,\n    pc_range=point_cloud_range,\n    time_receptive_field=time_receptive_field, \n    n_future_frames=n_future_frames,\n    train_capacity=train_capacity,\n    test_capacity=test_capacity,\n    ) \n\n# in our work we use 8 NVIDIA A100 GPUs\ndata = dict(\n    samples_per_gpu=1,\n    workers_per_gpu=1,\n    train=train_config,\n    val=test_config,\n    test=test_config,\n    shuffler_sampler=dict(type='DistributedGroupSampler'),\n    nonshuffler_sampler=dict(type='DistributedSampler'),\n)\n\n# Model params ******************************************\ngrid_config = {\n    'xbound': [point_cloud_range[0], point_cloud_range[3], voxel_x*lss_downsample[0]],\n    'ybound': [point_cloud_range[1], point_cloud_range[4], voxel_y*lss_downsample[1]],\n    'zbound': [point_cloud_range[2], point_cloud_range[5], voxel_z*lss_downsample[2]],\n    'dbound': [2.0, 58.0, 0.5],\n}\n\nvoxel_channels = [32*(time_receptive_field), 32*2*(time_receptive_field), 32*4*(time_receptive_field), 32*8*(time_receptive_field)]\npred_channels = [32, 32*2, 32*4, 32*8]\ndecoder_channels = [32*(n_future_frames_plus), 
32*2*(n_future_frames_plus), 32*4*(n_future_frames_plus), 32*8*(n_future_frames_plus)]\n\nnumC_Trans = 64\nocc_encoder_input_channel = (numC_Trans+6)*time_receptive_field\nvoxel_out_channel = 32*(n_future_frames_plus)\nflow_out_channel = 32*(n_future_frames_plus)\nvoxel_out_channel_per_frame = 32\nvoxel_out_indices = (0, 1, 2, 3)\nmy_voxel_out_indices = (0, 1, 2, 3)\n\nmodel = dict(\n    type='OCFNet',\n    only_generate_dataset=only_generate_dataset,\n    loss_norm=False,\n    disable_loss_depth=True,\n    point_cloud_range=point_cloud_range,\n    time_receptive_field=time_receptive_field,\n    n_future_frames=n_future_frames,\n    n_future_frames_plus=n_future_frames_plus,\n    max_label=num_cls,\n    iou_thresh_for_vpq=iou_thresh_for_vpq,\n    test_present=test_present,\n    record_time=False,\n    save_pred=save_pred,\n    save_path=save_path,\n    img_backbone=dict(\n        pretrained='torchvision://resnet50',\n        type='ResNet',\n        depth=50,\n        num_stages=4,\n        out_indices=(0, 1, 2, 3),\n        frozen_stages=0,\n        with_cp=False,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        norm_eval=False,\n        style='pytorch'),\n    img_neck=dict(\n        type='SECONDFPN',\n        in_channels=[256, 512, 1024, 2048],\n        upsample_strides=[0.25, 0.5, 1, 2],\n        out_channels=[128, 128, 128, 128]),\n    img_view_transformer=dict(type='ViewTransformerLiftSplatShootVoxel',\n                              norm_cfg=dict(type='SyncBN', requires_grad=True),\n                              loss_depth_weight=3.,\n                              loss_depth_type='kld',\n                              grid_config=grid_config,\n                              data_config=data_config,\n                              numC_Trans=numC_Trans,\n                              vp_megvii=False),\n    occ_encoder_backbone=dict(\n        type='CustomResNet3D',\n        depth=18,\n        n_input_channels=occ_encoder_input_channel,\n        block_inplanes=voxel_channels,\n        out_indices=my_voxel_out_indices,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    occ_predictor=dict(\n        type='Predictor',\n        n_input_channels=pred_channels,\n        in_timesteps=time_receptive_field,\n        out_timesteps=n_future_frames_plus,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    occ_encoder_neck=dict(\n        type='FPN3D',\n        with_cp=False,\n        in_channels=decoder_channels,\n        out_channels=voxel_out_channel,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_encoder_backbone=dict(\n        type='CustomResNet3D',\n        depth=18,\n        n_input_channels=occ_encoder_input_channel,\n        block_inplanes=voxel_channels,\n        out_indices=my_voxel_out_indices,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_predictor=dict(\n        type='Predictor',\n        n_input_channels=pred_channels,\n        in_timesteps=time_receptive_field,\n        out_timesteps=n_future_frames_plus,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_encoder_neck=dict(\n        type='FPN3D',\n        with_cp=False,\n        in_channels=decoder_channels,\n        out_channels=flow_out_channel,\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n    ),\n    flow_head=dict(\n        type='FlowHead',\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        soft_weights=True, \n        final_occ_size=occ_size,\n        
fine_topk=15000,\n        empty_idx=empty_idx,\n        num_level=len(my_voxel_out_indices),\n        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),\n        out_channel=3,  # 3-dim flow\n        point_cloud_range=point_cloud_range,\n    ),\n    pts_bbox_head=dict(\n        type='OccHead',\n        norm_cfg=dict(type='SyncBN', requires_grad=True),\n        soft_weights=True,\n        final_occ_size=occ_size,\n        fine_topk=15000,\n        empty_idx=empty_idx,\n        num_level=len(my_voxel_out_indices),\n        in_channels=[voxel_out_channel_per_frame] * len(my_voxel_out_indices),\n        out_channel=num_cls,\n        point_cloud_range=point_cloud_range,\n        loss_weight_cfg=dict(\n            loss_voxel_ce_weight=1.0,\n            loss_voxel_sem_scal_weight=1.0,\n            loss_voxel_geo_scal_weight=1.0,\n            loss_voxel_lovasz_weight=1.0,\n        ),\n    ),\n    empty_idx=empty_idx,\n)\n\n# Learning policy params ******************************************\noptimizer = dict(\n    type='AdamW',\n    lr=3e-4,   \n    paramwise_cfg=dict(\n        custom_keys={\n            'img_backbone': dict(lr_mult=0.1),\n        }),\n    weight_decay=0.01)\n\noptimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))\nlr_config = dict(\n    policy='CosineAnnealing',\n    warmup='linear',\n    warmup_iters=500,\n    warmup_ratio=1.0 / 3,\n    min_lr_ratio=1e-3)\n\nrunner = dict(type='EpochBasedRunner', max_epochs=24)\nevaluation = dict(\n    interval=1,\n    pipeline=test_pipeline,\n    save_best='SSC_mean',\n    rule='greater',\n)\n\ncustom_hooks = [\n    dict(type='OccEfficiencyHook'),\n]"
  },
  {
    "path": "projects/configs/datasets/custom_nus-3d.py",
    "content": "# If point cloud range is changed, the models should also change their point\n# cloud range accordingly\npoint_cloud_range = [-50, -50, -5, 50, 50, 3]\n# For nuScenes we usually do 10-class detection\nclass_names = [\n    'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',\n    'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'\n]\ndataset_type = 'NuScenesDataset_eval_modified'\ndata_root = 'data/nuscenes/'\n# Input modality for nuScenes dataset, this is consistent with the submission\n# format which requires the information in input_modality.\ninput_modality = dict(\n    use_lidar=True,\n    use_camera=False,\n    use_radar=False,\n    use_map=False,\n    use_external=False)\nfile_client_args = dict(backend='disk')\n# Uncomment the following if use ceph or other file clients.\n# See https://mmcv.readthedocs.io/en/latest/api.html#mmcv.fileio.FileClient\n# for more details.\n# file_client_args = dict(\n#     backend='petrel',\n#     path_mapping=dict({\n#         './data/nuscenes/': 's3://nuscenes/nuscenes/',\n#         'data/nuscenes/': 's3://nuscenes/nuscenes/'\n#     }))\ntrain_pipeline = [\n    dict(\n        type='LoadPointsFromFile',\n        coord_type='LIDAR',\n        load_dim=5,\n        use_dim=5,\n        file_client_args=file_client_args),\n    dict(\n        type='LoadPointsFromMultiSweeps',\n        sweeps_num=10,\n        file_client_args=file_client_args),\n    dict(type='LoadAnnotations3D', with_bbox_3d=True, with_label_3d=True),\n    dict(\n        type='GlobalRotScaleTrans',\n        rot_range=[-0.3925, 0.3925],\n        scale_ratio_range=[0.95, 1.05],\n        translation_std=[0, 0, 0]),\n    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),\n    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),\n    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),\n    dict(type='ObjectNameFilter', classes=class_names),\n    dict(type='PointShuffle'),\n    dict(type='DefaultFormatBundle3D', class_names=class_names),\n    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])\n]\ntest_pipeline = [\n    dict(\n        type='LoadPointsFromFile',\n        coord_type='LIDAR',\n        load_dim=5,\n        use_dim=5,\n        file_client_args=file_client_args),\n    dict(\n        type='LoadPointsFromMultiSweeps',\n        sweeps_num=10,\n        file_client_args=file_client_args),\n    dict(\n        type='MultiScaleFlipAug3D',\n        img_scale=(1333, 800),\n        pts_scale_ratio=1,\n        flip=False,\n        transforms=[\n            dict(\n                type='GlobalRotScaleTrans',\n                rot_range=[0, 0],\n                scale_ratio_range=[1., 1.],\n                translation_std=[0, 0, 0]),\n            dict(type='RandomFlip3D'),\n            dict(\n                type='PointsRangeFilter', point_cloud_range=point_cloud_range),\n            dict(\n                type='DefaultFormatBundle3D',\n                class_names=class_names,\n                with_label=False),\n            dict(type='Collect3D', keys=['points'])\n        ])\n]\n# construct a pipeline for data and gt loading in show function\n# please keep its loading function consistent with test_pipeline (e.g. 
client)\neval_pipeline = [\n    dict(\n        type='LoadPointsFromFile',\n        coord_type='LIDAR',\n        load_dim=5,\n        use_dim=5,\n        file_client_args=file_client_args),\n    dict(\n        type='LoadPointsFromMultiSweeps',\n        sweeps_num=10,\n        file_client_args=file_client_args),\n    dict(\n        type='DefaultFormatBundle3D',\n        class_names=class_names,\n        with_label=False),\n    dict(type='Collect3D', keys=['points'])\n]\n\ndata = dict(\n    samples_per_gpu=4,\n    workers_per_gpu=4,\n    train=dict(\n        type=dataset_type,\n        data_root=data_root,\n        ann_file=data_root + 'nuscenes_infos_train.pkl',\n        pipeline=train_pipeline,\n        classes=class_names,\n        modality=input_modality,\n        test_mode=False,\n        # we use box_type_3d='LiDAR' in kitti and nuscenes dataset\n        # and box_type_3d='Depth' in sunrgbd and scannet dataset.\n        box_type_3d='LiDAR'),\n    val=dict(\n        type=dataset_type,\n        ann_file=data_root + 'nuscenes_infos_val.pkl',\n        pipeline=test_pipeline,\n        classes=class_names,\n        modality=input_modality,\n        test_mode=True,\n        box_type_3d='LiDAR'),\n    test=dict(\n        type=dataset_type,\n        data_root=data_root,\n        ann_file=data_root + 'nuscenes_infos_val.pkl',\n        pipeline=test_pipeline,\n        classes=class_names,\n        modality=input_modality,\n        test_mode=True,\n        box_type_3d='LiDAR'))\n# For nuScenes dataset, we usually evaluate the model at the end of training.\n# Since the models are trained by 24 epochs by default, we set evaluation\n# interval to be 24. Please change the interval accordingly if you do not\n# use a default schedule.\nevaluation = dict(interval=24, pipeline=eval_pipeline)\n"
  },
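The config above only describes the dataset; it is consumed by mmcv/mmdet3d's builders at train or test time. A minimal sketch of that step, assuming the repo is installed and the file is saved at a hypothetical path (the `NuScenesDataset_eval_modified` type must already be registered, e.g. by importing the plugin package, which is an assumption here):

```python
# Hedged sketch, not the project's launch script: build the train split from
# the dataset config above. The config path here is hypothetical.
from mmcv import Config
from mmdet3d.datasets import build_dataset

import projects.occ_plugin  # assumed to register the custom dataset types

cfg = Config.fromfile('configs/_base_/nus-3d.py')  # hypothetical path
train_set = build_dataset(cfg.data.train)
print(len(train_set), train_set.CLASSES)
```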
  {
    "path": "projects/occ_plugin/__init__.py",
    "content": "from .core.evaluation.eval_hooks import OccDistEvalHook, OccEvalHook\nfrom .core.evaluation.efficiency_hooks import OccEfficiencyHook\nfrom .core.visualizer import save_occ\nfrom .datasets.pipelines import (\n  PhotoMetricDistortionMultiViewImage, PadMultiViewImage, \n  NormalizeMultiviewImage,  CustomCollect3D)\nfrom .occupancy import *"
  },
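These imports register the plugin's hooks, pipelines, and models the moment the package is imported. In OpenOccupancy-style projects that import is usually triggered by a plugin switch in the config; a sketch of that convention (field names assumed from the mmdet3d plugin mechanism):

```python
# Assumed convention: the train/test scripts read these config fields and
# import the module at plugin_dir, which runs the registrations above.
plugin = True
plugin_dir = 'projects/occ_plugin/'
```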
  {
    "path": "projects/occ_plugin/core/__init__.py",
    "content": "from .evaluation import *\nfrom .visualizer import *"
  },
  {
    "path": "projects/occ_plugin/core/evaluation/__init__.py",
    "content": "from .eval_hooks import OccDistEvalHook, OccEvalHook\nfrom .efficiency_hooks import OccEfficiencyHook"
  },
  {
    "path": "projects/occ_plugin/core/evaluation/efficiency_hooks.py",
    "content": "import copy\nfrom mmcv.runner import HOOKS, Hook\nimport time\ntry:\n    from mmcv.cnn import get_model_complexity_info\nexcept ImportError:\n    raise ImportError('Please upgrade mmcv to >0.6.2')\nimport torch\nimport torch.distributed as dist\n\n@HOOKS.register_module()\nclass OccEfficiencyHook(Hook):\n    def __init__(self, dataloader,  **kwargs):\n        self.dataloader = dataloader \n        self.warm_up = 5\n        \n    def construct_input(self, DUMMY_SHAPE=None, m_info=None):\n        if m_info is None:\n            m_info = next(iter(self.dataloader))\n        img_metas = m_info['img_metas'].data\n        input = dict(\n            img_metas=img_metas,\n        )\n        if 'img_inputs' in m_info.keys():\n            img_inputs = m_info['img_inputs']\n            for i in range(len(img_inputs)):\n                if isinstance(img_inputs[i], list):\n                    for j in range(len(img_inputs[i])):\n                        img_inputs[i][j] = img_inputs[i][j].cuda()\n                else:\n                    img_inputs[i] = img_inputs[i].cuda()\n            input['img_inputs'] = img_inputs\n            \n        if 'points' in m_info.keys():\n            points = m_info['points'].data[0]\n            points[0] = points[0].cuda()\n            input['points'] = points\n        return input\n    \n    def before_run(self, runner):\n        torch.cuda.reset_peak_memory_stats()\n        \n        # model = copy.deepcopy(runner.model)\n        # if hasattr(model, 'module'):\n        #     model = model.module\n        # if hasattr(model, 'forward_dummy'):\n        #     model.forward_train = model.forward_dummy\n        #     model.forward_test = model.forward_dummy\n        #     model.eval()\n        # else:\n        #     raise NotImplementedError(\n        #         'FLOPs counter is currently not supported for {}'.format(\n        #             model.__class__.__name__))\n        # # inf time\n        # pure_inf_time = 0\n        # itv_sample = 10\n        # for i, data in enumerate(self.dataloader):\n        #     torch.cuda.synchronize()\n        #     start_time = time.perf_counter()\n\n        #     with torch.no_grad():\n        #         model(return_loss=False, rescale=True, **self.construct_input(m_info=data))\n\n        #     torch.cuda.synchronize()\n        #     elapsed = time.perf_counter() - start_time\n\n        #     if i >= self.warm_up:\n        #         pure_inf_time += elapsed\n        #         if (i + 1) % itv_sample == 0:\n        #             fps = (i + 1 - self.warm_up) / pure_inf_time\n        #             if runner.rank == 0:\n        #                 runner.logger.info(f'Done sample [{i + 1:<3}/ {itv_sample*5}], '\n        #                     f'fps: {fps:.1f} sample / s')\n\n        #     if (i + 1) == itv_sample*5:\n        #         pure_inf_time += elapsed\n        #         fps = (i + 1 - self.warm_up) / pure_inf_time\n        #         if runner.rank == 0:\n        #             runner.logger.info(f'Overall fps: {fps:.1f} sample / s')\n        #         break\n            \n        # # flops and params   \n        # if runner.rank == 0:\n        #     flops, params = get_model_complexity_info(\n        #         model, (None, None), input_constructor=self.construct_input)\n            \n        #     split_line = '=' * 30\n        #     gpu_measure = torch.cuda.max_memory_allocated() / 1024. / 1024. /1024. 
\n        #     runner.logger.info(f'{split_line}\\n' f'Flops: {flops}\\nParams: {params}\\nGPU memory: {gpu_measure:.2f}GB\\n{split_line}')\n            \n        if dist.is_available() and dist.is_initialized():\n            dist.barrier() \n        \n        \n    def after_run(self, runner):\n        pass\n\n    def before_epoch(self, runner):\n        pass\n\n    def after_epoch(self, runner):\n        pass\n\n    def before_iter(self, runner):\n        pass\n\n    def after_iter(self, runner):\n        pass\n  \n"
  },
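Because `OccEfficiencyHook` is registered in mmcv's `HOOKS` registry, it can be built from a plain config dict and attached to a runner. A sketch under the assumption that `val_loader` and `runner` are already constructed elsewhere:

```python
# Hedged sketch: build the registered hook from a config dict and attach it.
# val_loader and runner are assumed to exist; only before_run does real work.
from mmcv.runner import HOOKS
from mmcv.utils import build_from_cfg

hook = build_from_cfg(dict(type='OccEfficiencyHook', dataloader=val_loader), HOOKS)
runner.register_hook(hook)
```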
  {
    "path": "projects/occ_plugin/core/evaluation/eval_hooks.py",
    "content": "\n# Note: Considering that MMCV's EvalHook updated its interface in V1.3.16,\n# in order to avoid strong version dependency, we did not directly\n# inherit EvalHook but BaseDistEvalHook.\n\nimport os.path as osp\nimport torch.distributed as dist\nfrom mmcv.runner import DistEvalHook as BaseDistEvalHook\nfrom torch.nn.modules.batchnorm import _BatchNorm\nfrom mmcv.runner import EvalHook as BaseEvalHook\n\n\nclass OccEvalHook(BaseEvalHook):\n    def __init__(self, *args,  **kwargs):\n        super(OccEvalHook, self).__init__(*args, **kwargs)  \n            \n    def _do_evaluate(self, runner):\n        \"\"\"perform evaluation and save ckpt.\"\"\"\n        if not self._should_evaluate(runner):\n            return\n\n        from projects.occ_plugin.occupancy.apis.test import custom_single_gpu_test\n        results = custom_single_gpu_test(runner.model, self.dataloader, show=False)\n        \n        runner.log_buffer.output['eval_iter_num'] = len(self.dataloader)\n        key_score = self.evaluate(runner, results)\n        if self.save_best:\n            self._save_ckpt(runner, key_score)\n            \n            \nclass OccDistEvalHook(BaseDistEvalHook):\n    def __init__(self, *args,  **kwargs):\n        super(OccDistEvalHook, self).__init__(*args, **kwargs)       \n\n    def _do_evaluate(self, runner):\n        \"\"\"perform evaluation and save ckpt.\"\"\"\n        # Synchronization of BatchNorm's buffer (running_mean\n        # and running_var) is not supported in the DDP of pytorch,\n        # which may cause the inconsistent performance of models in\n        # different ranks, so we broadcast BatchNorm's buffers\n        # of rank 0 to other ranks to avoid this.\n        if self.broadcast_bn_buffer:\n            model = runner.model\n            for name, module in model.named_modules():\n                if isinstance(module,\n                              _BatchNorm) and module.track_running_stats:\n                    dist.broadcast(module.running_var, 0)\n                    dist.broadcast(module.running_mean, 0)\n\n        if not self._should_evaluate(runner):\n            return\n\n        tmpdir = self.tmpdir\n        if tmpdir is None:\n            tmpdir = osp.join(runner.work_dir, '.eval_hook')\n\n        from projects.occ_plugin.occupancy.apis.test import custom_multi_gpu_test # to solve circlur  import\n\n        results = custom_multi_gpu_test(\n            runner.model,\n            self.dataloader,\n            tmpdir=tmpdir,\n            gpu_collect=self.gpu_collect)\n        \n        if runner.rank == 0:\n            print('\\n')\n            runner.log_buffer.output['eval_iter_num'] = len(self.dataloader)\n            \n            key_score = self.evaluate(runner, results)\n\n            if self.save_best:\n                self._save_ckpt(runner, key_score)\n  \n"
  },
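Both hooks keep the constructors of their mmcv base classes, so they are attached like mmcv's own eval hooks. A sketch assuming `runner` and `val_dataloader` are already built:

```python
# Hedged sketch: distributed evaluation every epoch, collecting results on
# GPU; mirrors how mmcv's DistEvalHook is normally registered.
from projects.occ_plugin.core.evaluation import OccDistEvalHook

eval_hook = OccDistEvalHook(val_dataloader, interval=1, gpu_collect=True)
runner.register_hook(eval_hook, priority='LOW')
```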
  {
    "path": "projects/occ_plugin/core/visualizer/__init__.py",
    "content": "from .show_occ import save_occ"
  },
  {
    "path": "projects/occ_plugin/core/visualizer/show_occ.py",
    "content": "\nimport torch.nn.functional as F\nimport torch\nimport numpy as np\nfrom os import path as osp\nimport os\n\ndef save_occ(pred_c, pred_f, img_metas, path, visible_mask=None, gt_occ=None, free_id=0, thres_low=0.4, thres_high=0.99):\n\n    \"\"\"\n    visualization saving for paper:\n    1. gt\n    2. pred_f pred_c\n    3. gt visible\n    4. pred_f visible\n    \"\"\"\n    pred_f = F.softmax(pred_f, dim=1)\n    pred_f = pred_f[0].cpu().numpy()  # C W H D\n    pred_c = F.softmax(pred_c, dim=1)\n    pred_c = pred_c[0].cpu().numpy()  # C W H D\n    visible_mask = visible_mask[0].cpu().numpy().reshape(-1) > 0  # WHD\n    gt_occ = gt_occ.data[0][0].cpu().numpy()  # W H D\n    gt_occ[gt_occ==255] = 0\n    _, W, H, D = pred_f.shape\n    coordinates_3D_f = np.stack(np.meshgrid(np.arange(W), np.arange(H), np.arange(D), indexing='ij'), axis=-1).reshape(-1, 3) # (W*H*D, 3)\n    _, W, H, D = pred_c.shape\n    coordinates_3D_c = np.stack(np.meshgrid(np.arange(W), np.arange(H), np.arange(D), indexing='ij'), axis=-1).reshape(-1, 3) # (W*H*D, 3)\n    pred_f = np.argmax(pred_f, axis=0) # (W, H, D)\n    pred_c = np.argmax(pred_c, axis=0) # (W, H, D)\n    occ_pred_f_mask = (pred_f.reshape(-1))!=free_id\n    occ_pred_c_mask = (pred_c.reshape(-1))!=free_id\n    occ_gt_mask = (gt_occ.reshape(-1))!=free_id\n    pred_f_save = np.concatenate([coordinates_3D_f[occ_pred_f_mask], pred_f.reshape(-1)[occ_pred_f_mask].reshape(-1, 1)], axis=1)[:, [2,1,0,3]]  # zyx cls\n    pred_c_save = np.concatenate([coordinates_3D_c[occ_pred_c_mask], pred_c.reshape(-1)[occ_pred_c_mask].reshape(-1, 1)], axis=1)[:, [2,1,0,3]]  # zyx cls\n    pred_f_visible_save = np.concatenate([coordinates_3D_f[occ_pred_f_mask&visible_mask], pred_f.reshape(-1)[occ_pred_f_mask&visible_mask].reshape(-1, 1)], axis=1)[:, [2,1,0,3]]  # zyx cls\n    gt_save = np.concatenate([coordinates_3D_f[occ_gt_mask], gt_occ.reshape(-1)[occ_gt_mask].reshape(-1, 1)], axis=1)[:, [2,1,0,3]]  # zyx cls\n    gt_visible_save = np.concatenate([coordinates_3D_f[occ_gt_mask&visible_mask], gt_occ.reshape(-1)[occ_gt_mask&visible_mask].reshape(-1, 1)], axis=1)[:, [2,1,0,3]]  # zyx cls\n    \n    scene_token = img_metas.data[0][0]['scene_token']\n    lidar_token = img_metas.data[0][0]['lidar_token']\n    save_path = osp.join(path, scene_token, lidar_token)\n    if not osp.exists(save_path):\n        os.makedirs(save_path)\n    save_pred_f_path = osp.join(save_path, 'pred_f.npy')\n    save_pred_c_path = osp.join(save_path, 'pred_c.npy')\n    save_pred_f_v_path = osp.join(save_path, 'pred_f_visible.npy')\n    save_gt_path = osp.join(save_path, 'gt.npy')\n    save_gt_v_path = osp.join(save_path, 'gt_visible.npy')\n    np.save(save_pred_f_path, pred_f_save)\n    np.save(save_pred_c_path, pred_c_save)\n    np.save(save_pred_f_v_path, pred_f_visible_save)\n    np.save(save_gt_path, gt_save)\n    np.save(save_gt_v_path, gt_visible_save)\n"
  },
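Each array written by `save_occ` has one row per occupied voxel in `(z, y, x, class_id)` order, as the `zyx cls` comments note. A small sketch of reading one back (the output root and tokens are placeholders):

```python
# Hedged sketch: inspect a saved forecast. Replace the placeholders with the
# actual output root, scene token, and lidar token used at save time.
import numpy as np

pred_f = np.load('out/<scene_token>/<lidar_token>/pred_f.npy')
z, y, x, cls_id = pred_f.T
print('occupied voxels:', pred_f.shape[0], 'classes:', np.unique(cls_id))
```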
  {
    "path": "projects/occ_plugin/datasets/__init__.py",
    "content": "from .nuscenes_dataset import CustomNuScenesDataset\nfrom .cam4docc_dataset import Cam4DOccDataset\nfrom .cam4docc_lyft_dataset import Cam4DOccLyftDataset\nfrom .builder import custom_build_dataset\n\n__all__ = [\n    'CustomNuScenesDataset', 'NuscOCCDataset'\n]\n"
  },
  {
    "path": "projects/occ_plugin/datasets/builder.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nimport copy\nimport platform\nimport random\nfrom functools import partial\n\nimport numpy as np\nfrom mmcv.parallel import collate\nfrom mmcv.runner import get_dist_info\nfrom mmcv.utils import Registry, build_from_cfg\nfrom torch.utils.data import DataLoader\n\nfrom mmdet.datasets.samplers import GroupSampler\nfrom projects.occ_plugin.datasets.samplers.group_sampler import DistributedGroupSampler\nfrom projects.occ_plugin.datasets.samplers.distributed_sampler import DistributedSampler\nfrom projects.occ_plugin.datasets.samplers.sampler import build_sampler\n\ndef build_dataloader(dataset,\n                     samples_per_gpu,\n                     workers_per_gpu,\n                     num_gpus=1,\n                     dist=True,\n                     shuffle=True,\n                     seed=None,\n                     shuffler_sampler=None,\n                     nonshuffler_sampler=None,\n                     **kwargs):\n    \"\"\"Build PyTorch DataLoader.\n    In distributed training, each GPU/process has a dataloader.\n    In non-distributed training, there is only one dataloader for all GPUs.\n    Args:\n        dataset (Dataset): A PyTorch dataset.\n        samples_per_gpu (int): Number of training samples on each GPU, i.e.,\n            batch size of each GPU.\n        workers_per_gpu (int): How many subprocesses to use for data loading\n            for each GPU.\n        num_gpus (int): Number of GPUs. Only used in non-distributed training.\n        dist (bool): Distributed training/test or not. Default: True.\n        shuffle (bool): Whether to shuffle the data at every epoch.\n            Default: True.\n        kwargs: any keyword argument to be used to initialize DataLoader\n    Returns:\n        DataLoader: A PyTorch dataloader.\n    \"\"\"\n    rank, world_size = get_dist_info()\n    if dist:\n        # DistributedGroupSampler will definitely shuffle the data to satisfy\n        # that images on each GPU are in the same group\n        if shuffle:\n            sampler = build_sampler(shuffler_sampler if shuffler_sampler is not None else dict(type='DistributedGroupSampler'),\n                                     dict(\n                                         dataset=dataset,\n                                         samples_per_gpu=samples_per_gpu,\n                                         num_replicas=world_size,\n                                         rank=rank,\n                                         seed=seed)\n                                     )\n\n        else:\n            sampler = build_sampler(nonshuffler_sampler if nonshuffler_sampler is not None else dict(type='DistributedSampler'),\n                                     dict(\n                                         dataset=dataset,\n                                         num_replicas=world_size,\n                                         rank=rank,\n                                         shuffle=shuffle,\n                                         seed=seed)\n                                     )\n\n        batch_size = samples_per_gpu\n        num_workers = workers_per_gpu\n    else:\n        print('WARNING!!!!, Only can be used for obtain inference speed!!!!')\n        sampler = GroupSampler(dataset, samples_per_gpu) if shuffle else None\n        batch_size = num_gpus * samples_per_gpu\n        num_workers = num_gpus * workers_per_gpu\n\n    init_fn = partial(\n        worker_init_fn, num_workers=num_workers, rank=rank,\n        
seed=seed) if seed is not None else None\n\n    data_loader = DataLoader(\n        dataset,\n        batch_size=batch_size,\n        sampler=sampler,\n        num_workers=num_workers,\n        collate_fn=partial(collate, samples_per_gpu=samples_per_gpu),\n        pin_memory=False,\n        worker_init_fn=init_fn,\n        **kwargs)\n\n    return data_loader\n\n\ndef worker_init_fn(worker_id, num_workers, rank, seed):\n    # The seed of each worker equals to\n    # num_worker * rank + worker_id + user_seed\n    worker_seed = num_workers * rank + worker_id + seed\n    np.random.seed(worker_seed)\n    random.seed(worker_seed)\n\n\n# Copyright (c) OpenMMLab. All rights reserved.\nimport platform\nfrom mmcv.utils import Registry, build_from_cfg\n\nfrom mmdet.datasets import DATASETS\nfrom mmdet.datasets.builder import _concat_dataset\n\nif platform.system() != 'Windows':\n    # https://github.com/pytorch/pytorch/issues/973\n    import resource\n    rlimit = resource.getrlimit(resource.RLIMIT_NOFILE)\n    base_soft_limit = rlimit[0]\n    hard_limit = rlimit[1]\n    soft_limit = min(max(4096, base_soft_limit), hard_limit)\n    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard_limit))\n\nOBJECTSAMPLERS = Registry('Object sampler')\n\n\ndef custom_build_dataset(cfg, default_args=None):\n    from mmdet3d.datasets.dataset_wrappers import CBGSDataset\n    from mmdet.datasets.dataset_wrappers import (ClassBalancedDataset,\n                                                 ConcatDataset, RepeatDataset)\n    if isinstance(cfg, (list, tuple)):\n        dataset = ConcatDataset([custom_build_dataset(c, default_args) for c in cfg])\n    elif cfg['type'] == 'ConcatDataset':\n        dataset = ConcatDataset(\n            [custom_build_dataset(c, default_args) for c in cfg['datasets']],\n            cfg.get('separate_eval', True))\n    elif cfg['type'] == 'RepeatDataset':\n        dataset = RepeatDataset(\n            custom_build_dataset(cfg['dataset'], default_args), cfg['times'])\n    elif cfg['type'] == 'ClassBalancedDataset':\n        dataset = ClassBalancedDataset(\n            custom_build_dataset(cfg['dataset'], default_args), cfg['oversample_thr'])\n    elif cfg['type'] == 'CBGSDataset':\n        dataset = CBGSDataset(custom_build_dataset(cfg['dataset'], default_args))\n    elif isinstance(cfg.get('ann_file'), (list, tuple)):\n        dataset = _concat_dataset(cfg, default_args)\n    else:\n        dataset = build_from_cfg(cfg, DATASETS, default_args)\n\n    return dataset\n"
  },
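A usage sketch for `build_dataloader` in the non-distributed branch, with a toy stand-in dataset (real usage passes a dataset built by `custom_build_dataset` from the config):

```python
# Hedged sketch: single-process loader matching the dist=False branch above;
# ToyDataset stands in for a dataset built from the config.
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        return {'x': torch.zeros(3)}

loader = build_dataloader(ToyDataset(), samples_per_gpu=4, workers_per_gpu=0,
                          dist=False, shuffle=False, seed=42)
batch = next(iter(loader))
print(batch['x'].shape)  # one collated batch of 4 samples
```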
  {
    "path": "projects/occ_plugin/datasets/cam4docc_dataset.py",
    "content": "# Developed by Junyi Ma based on the codebase of OpenOccupancy and PowerBEV\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nimport numpy as np\nfrom mmcv.runner import get_dist_info\nfrom mmdet.datasets import DATASETS\nfrom mmdet3d.datasets import NuScenesDataset\nimport os\nfrom nuscenes.eval.common.utils import quaternion_yaw, Quaternion\nfrom projects.occ_plugin.utils.formating import cm_to_ious, format_iou_results\nfrom projects.occ_plugin.utils.geometry import convert_egopose_to_matrix_numpy, invert_matrix_egopose_numpy\nfrom nuscenes import NuScenes\nfrom pyquaternion import Quaternion\nimport torch\nimport random\nimport time\n\n@DATASETS.register_module()\nclass Cam4DOccDataset(NuScenesDataset):\n    def __init__(self, occ_size, pc_range, occ_root, idx_root, ori_data_root, data_root, time_receptive_field, n_future_frames, classes, use_separate_classes,\n                  train_capacity, test_capacity, **kwargs):\n        \n        '''\n        Cam4DOccDataset contains sequential occupancy states as well as instance flow for training occupancy forecasting models. We unify the related operations in the LiDAR coordinate system following OpenOccupancy.\n\n        occ_size: number of grids along H W L, default: [512, 512, 40]\n        pc_range: predefined ranges along H W L, default: [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]\n        occ_root: data path of nuScenes-Occupancy\n        idx_root: save path of test indexes\n        time_receptive_field: number of historical frames used for forecasting (including the present one), default: 3\n        n_future_frames: number of forecasted future frames, default: 4\n        classes: predefiend categories in GMO\n        use_separate_classes: separate movable objects instead of the general one\n        train_capacity: number of sequences used for training, default: 23930\n        test_capacity: number of sequences used for testing, default: 5119\n        '''\n        \n        self.train_capacity = train_capacity\n        self.test_capacity = test_capacity\n\n        super().__init__(**kwargs)\n\n        rank, world_size = get_dist_info()\n\n        self.time_receptive_field = time_receptive_field\n        self.n_future_frames = n_future_frames\n        self.sequence_length = time_receptive_field + n_future_frames\n\n        if rank == 0:\n            print(\"-------------\")\n            print(\"use past \" + str(self.time_receptive_field) + \" frames to forecast future \" + str(self.n_future_frames) + \" frames\")\n            print(\"-------------\")\n\n        self.data_infos = list(sorted(self.data_infos, key=lambda e: e['timestamp']))\n        self.data_infos = self.data_infos[::self.load_interval]\n        self.occ_size = occ_size\n        self.pc_range = pc_range\n        self.occ_root = occ_root\n        self.idx_root = idx_root\n        self.ori_data_root = ori_data_root\n        self.data_root = data_root\n        self.classes = classes\n        self.use_separate_classes = use_separate_classes\n\n        self.indices = self.get_indices()\n        self.present_scene_lidar_token = \" \"\n        self._set_group_flag()\n\n        # load origin nusc dataset for instance annotation\n        self.nusc = NuScenes(version='v1.0-trainval', dataroot=self.data_root, verbose=False)\n        if self.test_mode:\n            self.chosen_list = random.sample(range(0, self.test_capacity) , self.test_capacity)\n            self.chosen_list_num = 
len(self.chosen_list)\n        else:\n            self.chosen_list = random.sample(range(0, self.train_capacity), self.train_capacity)\n            self.chosen_list_num = len(self.chosen_list)\n    \n    def _set_group_flag(self):\n        if self.test_mode:\n            self.flag = np.zeros(self.test_capacity, dtype=np.uint8)\n        else:\n            self.flag = np.zeros(self.train_capacity, dtype=np.uint8)\n\n    def __len__(self):\n        if self.test_mode:\n            return self.test_capacity\n        else:\n            return self.train_capacity\n\n    def __getitem__(self, idx):\n     \n        idx = int(self.chosen_list[idx])\n\n        self.egopose_list = []\n        self.ego2lidar_list = []\n        self.visible_instance_set = set()\n        self.instance_dict = {}\n\n        if self.test_mode:\n            return self.prepare_test_data(idx)\n            \n        while True:\n            data = self.prepare_train_data(idx)\n            if data is None:\n                idx = self._rand_another(idx)\n                idx = int(self.chosen_list[idx])\n                continue\n            \n            return data\n\n    def get_indices(self):\n        '''\n        Generate sequential indexes for training and testing\n        '''\n        indices = []\n        for index in range(len(self.data_infos)): \n            is_valid_data = True\n            previous_rec = None\n            current_indices = []\n            for t in range(self.sequence_length):\n                index_t = index + t\n                # Going over the dataset size limit.\n                if index_t >= len(self.data_infos):\n                    is_valid_data = False\n                    break\n                rec = self.data_infos[index_t]\n                # Check if the scene is the same\n                if (previous_rec is not None) and (rec['scene_token'] != previous_rec['scene_token']):\n                    is_valid_data = False\n                    break\n\n                current_indices.append(index_t)\n                previous_rec = rec\n\n            if is_valid_data:\n                indices.append(current_indices)\n\n        return np.asarray(indices)\n\n    def get_lidar_pose(self, rec):\n        '''\n        Get global poses for subsequent bbox transformation\n        '''\n        ego2global_translation = rec['ego2global_translation']\n        ego2global_rotation = rec['ego2global_rotation']\n        trans = -np.array(ego2global_translation)\n        rot = Quaternion(ego2global_rotation).inverse\n        \n        return trans, rot\n    \n    def get_ego2lidar_pose(self, rec):\n        '''\n        Get LiDAR poses in the ego system\n        '''\n        lidar2ego_translation = rec['lidar2ego_translation']\n        lidar2ego_rotation = rec['lidar2ego_rotation']\n        trans = -np.array(lidar2ego_translation)\n        rot = Quaternion(lidar2ego_rotation).inverse\n        return trans, rot\n\n    def record_instance(self, idx, instance_map):\n        \"\"\"\n        Record information about each visible instance in the sequence and assign a unique ID to it\n        \"\"\"\n        rec = self.data_infos[idx]\n        translation, rotation = self.get_lidar_pose(rec)\n        self.egopose_list.append([translation, rotation])\n        ego2lidar_translation, ego2lidar_rotation = self.get_ego2lidar_pose(rec)\n        self.ego2lidar_list.append([ego2lidar_translation, ego2lidar_rotation])\n\n        current_sample = self.nusc.get('sample', rec['token'])\n        for annotation_token in current_sample['anns']:\n
            annotation = self.nusc.get('sample_annotation', annotation_token)\n            # Instance extraction for Cam4DOcc-V1\n            # Filter out all non-vehicle instances\n            # if 'vehicle' not in annotation['category_name']:\n            #     continue\n            gmo_flag = False\n            for class_name in self.classes:\n                if class_name in annotation['category_name']:\n                    gmo_flag = True\n                    break\n            if not gmo_flag:\n                continue\n            # Specify the semantic id if use_separate_classes\n            semantic_id = 1\n            if self.use_separate_classes:\n                if 'vehicle.bicycle' in annotation['category_name']: # excludes static_object.bicycle_rack\n                    semantic_id = 1\n                elif 'bus' in annotation['category_name']:\n                    semantic_id = 2\n                elif 'car' in annotation['category_name']:\n                    semantic_id = 3\n                elif 'construction' in annotation['category_name']:\n                    semantic_id = 4\n                elif 'motorcycle' in annotation['category_name']:\n                    semantic_id = 5\n                elif 'trailer' in annotation['category_name']:\n                    semantic_id = 6\n                elif 'truck' in annotation['category_name']:\n                    semantic_id = 7\n                elif 'pedestrian' in annotation['category_name']:\n                    semantic_id = 8\n\n            # Filter out invisible vehicles\n            FILTER_INVISIBLE_VEHICLES = True\n            if FILTER_INVISIBLE_VEHICLES and int(annotation['visibility_token']) == 1 and annotation['instance_token'] not in self.visible_instance_set:\n                continue\n            # Filter out vehicles that have not been seen in the past\n            if self.counter >= self.time_receptive_field and annotation['instance_token'] not in self.visible_instance_set:\n                continue\n            self.visible_instance_set.add(annotation['instance_token'])\n\n            if annotation['instance_token'] not in instance_map:\n                instance_map[annotation['instance_token']] = len(instance_map) + 1\n            instance_id = instance_map[annotation['instance_token']]\n            instance_attribute = int(annotation['visibility_token'])\n\n            if annotation['instance_token'] not in self.instance_dict:\n                # For the first occurrence of an instance\n                self.instance_dict[annotation['instance_token']] = {\n                    'timestep': [self.counter],\n                    'translation': [annotation['translation']],\n                    'rotation': [annotation['rotation']],\n                    'size': annotation['size'],\n                    'instance_id': instance_id,\n                    'semantic_id': semantic_id,\n                    'attribute_label': [instance_attribute],\n                }\n            else:\n                # For an instance that has appeared before\n                self.instance_dict[annotation['instance_token']]['timestep'].append(self.counter)\n                self.instance_dict[annotation['instance_token']]['translation'].append(annotation['translation'])\n                self.instance_dict[annotation['instance_token']]['rotation'].append(annotation['rotation'])\n                self.instance_dict[annotation['instance_token']]['attribute_label'].append(instance_attribute)\n\n        return instance_map\n\n    def
get_future_egomotion(self, idx):\n        '''\n        Calculate the LiDAR pose update between idx and idx+1\n        '''\n        rec_t0 = self.data_infos[idx]\n        future_egomotion = np.eye(4, dtype=np.float32)\n\n        if idx < len(self.data_infos) - 1:\n            rec_t1 = self.data_infos[idx + 1]\n\n            if rec_t0['scene_token'] == rec_t1['scene_token']:\n                egopose_t0_trans = rec_t0['ego2global_translation']\n                egopose_t0_rot = rec_t0['ego2global_rotation']\n                egopose_t1_trans = rec_t1['ego2global_translation']\n                egopose_t1_rot = rec_t1['ego2global_rotation']\n                egopose_t0 = convert_egopose_to_matrix_numpy(egopose_t0_trans, egopose_t0_rot)\n                egopose_t1 = convert_egopose_to_matrix_numpy(egopose_t1_trans, egopose_t1_rot)\n\n                lidar2ego_t0_trans = rec_t0['lidar2ego_translation']\n                lidar2ego_t0_rot = rec_t0['lidar2ego_rotation']\n                lidar2ego_t1_trans = rec_t1['lidar2ego_translation']\n                lidar2ego_t1_rot = rec_t1['lidar2ego_rotation']\n                lidar2ego_t0 = convert_egopose_to_matrix_numpy(lidar2ego_t0_trans, lidar2ego_t0_rot)\n                lidar2ego_t1 = convert_egopose_to_matrix_numpy(lidar2ego_t1_trans, lidar2ego_t1_rot)\n\n                future_egomotion = invert_matrix_egopose_numpy(lidar2ego_t1).dot(invert_matrix_egopose_numpy(egopose_t1)).dot(egopose_t0).dot(lidar2ego_t0)\n\n        future_egomotion = torch.Tensor(future_egomotion).float()\n\n        # Return as a (1, 4, 4) transformation matrix\n        return future_egomotion.unsqueeze(0)\n\n    @staticmethod\n    def _check_consistency(translation, prev_translation, threshold=1.0):\n        \"\"\"\n        Check for significant displacement of the instance between adjacent moments\n        \"\"\"\n        x, y = translation[:2]\n        prev_x, prev_y = prev_translation[:2]\n\n        if abs(x - prev_x) > threshold or abs(y - prev_y) > threshold:\n            return False\n        return True\n\n    def refine_instance_poly(self, instance):\n        \"\"\"\n        Fix the missing frames and disturbances of ground truth caused by noise\n        \"\"\"\n        pointer = 1\n        for i in range(instance['timestep'][0] + 1, self.sequence_length):\n            # Fill in the missing frames\n            if i not in instance['timestep']:\n                instance['timestep'].insert(pointer, i)\n                instance['translation'].insert(pointer, instance['translation'][pointer-1])\n                instance['rotation'].insert(pointer, instance['rotation'][pointer-1])\n                instance['attribute_label'].insert(pointer, instance['attribute_label'][pointer-1])\n                pointer += 1\n                continue\n            \n            # Eliminate observation disturbances\n            if self._check_consistency(instance['translation'][pointer], instance['translation'][pointer-1]):\n                instance['translation'][pointer] = instance['translation'][pointer-1]\n                instance['rotation'][pointer] = instance['rotation'][pointer-1]\n                instance['attribute_label'][pointer] = instance['attribute_label'][pointer-1]\n            pointer += 1\n        \n        return instance\n\n    def prepare_train_data(self, index):\n        '''\n        Generate a training sequence\n        '''\n        input_dict = self.get_data_info(index)\n        if input_dict is None:\n            return None\n        \n        example = self.prepare_sequential_data(index)\n        
return example\n\n    def prepare_test_data(self, index):\n        '''\n        Generate a test sequence\n        TODO: Add extra functionality here, such as visualization\n        '''\n        input_dict = self.get_data_info(index)\n        if input_dict is None:\n            return None\n        \n        example = self.prepare_sequential_data(index)\n        # TODO: visualize example data\n        return example\n    \n    def prepare_sequential_data(self, index):\n        '''\n        Use the predefined pipeline to generate inputs of the baseline network and ground truth for the standard evaluation protocol in Cam4DOcc\n        '''\n        instance_map = {}\n        input_seq_data = {}\n        keys = ['input_dict', 'future_egomotion', 'sample_token']\n        for key in keys:\n            input_seq_data[key] = []\n        scene_lidar_token = []\n\n        for self.counter, index_t in enumerate(self.indices[index]):\n            input_dict_per_frame = self.get_data_info(index_t)\n            if input_dict_per_frame is None:\n                return None\n            \n            input_seq_data['input_dict'].append(input_dict_per_frame)\n            input_seq_data['sample_token'].append(input_dict_per_frame['sample_idx'])\n\n            instance_map = self.record_instance(index_t, instance_map)\n            future_egomotion = self.get_future_egomotion(index_t)\n            input_seq_data['future_egomotion'].append(future_egomotion)\n\n            scene_lidar_token.append(input_dict_per_frame['scene_token']+\"_\"+input_dict_per_frame['lidar_token'])\n            if self.counter == self.time_receptive_field - 1:\n                self.present_scene_lidar_token = input_dict_per_frame['scene_token']+\"_\"+input_dict_per_frame['lidar_token']\n\n        # save sequential test indexes for possible evaluation\n        if self.test_mode:\n            test_idx_path = os.path.join(self.idx_root, \"test_ids\")\n            if not os.path.exists(test_idx_path):\n                os.mkdir(test_idx_path)\n            np.savez(os.path.join(test_idx_path, self.present_scene_lidar_token), scene_lidar_token)\n\n        for token in self.instance_dict.keys():\n            self.instance_dict[token] = self.refine_instance_poly(self.instance_dict[token])\n\n        input_seq_data.update(\n            dict(\n                time_receptive_field=self.time_receptive_field,\n                sequence_length=self.sequence_length,\n                egopose_list=self.egopose_list,\n                ego2lidar_list=self.ego2lidar_list,\n                instance_dict=self.instance_dict,\n                instance_map=instance_map,\n                indices=self.indices[index],\n                scene_token=self.present_scene_lidar_token,\n            ))\n\n        example = self.pipeline(input_seq_data)\n\n        return example\n\n\n    def get_data_info(self, index):\n        '''\n        Load per-frame information from the .pkl files, as also used by OpenOccupancy\n        '''\n        \n        info = self.data_infos[index]\n        \n        # standard protocol modified from SECOND.Pytorch\n        input_dict = dict(\n            sample_idx=info['token'],\n            pts_filename=info['lidar_path'],\n            sweeps=info['sweeps'],\n            lidar2ego_translation=info['lidar2ego_translation'],\n            lidar2ego_rotation=info['lidar2ego_rotation'],\n            ego2global_translation=info['ego2global_translation'],\n            ego2global_rotation=info['ego2global_rotation'],\n            prev_idx=info['prev'],\n
next_idx=info['next'],\n            scene_token=info['scene_token'],\n            can_bus=info['can_bus'],\n            # frame_idx=info['frame_idx'],\n            timestamp=info['timestamp'] / 1e6,\n            occ_size=np.array(self.occ_size),\n            pc_range=np.array(self.pc_range),\n            lidar_token=info['lidar_token'],\n            lidarseg=info['lidarseg'],\n            curr=info,\n        )\n\n        if self.modality['use_camera']:\n            image_paths = []\n            lidar2img_rts = []\n            lidar2cam_rts = []\n            cam_intrinsics = []\n            \n            lidar2cam_dic = {}\n            \n            for cam_type, cam_info in info['cams'].items():\n                image_paths.append(cam_info['data_path'])\n                # obtain the lidar-to-image transformation matrix\n                lidar2cam_r = np.linalg.inv(cam_info['sensor2lidar_rotation'])\n                lidar2cam_t = cam_info[\n                    'sensor2lidar_translation'] @ lidar2cam_r.T\n                lidar2cam_rt = np.eye(4)\n                lidar2cam_rt[:3, :3] = lidar2cam_r.T\n                lidar2cam_rt[3, :3] = -lidar2cam_t\n                intrinsic = cam_info['cam_intrinsic']\n                viewpad = np.eye(4)\n                viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic\n                lidar2img_rt = (viewpad @ lidar2cam_rt.T)\n                lidar2img_rts.append(lidar2img_rt)\n\n                cam_intrinsics.append(viewpad)\n                lidar2cam_rts.append(lidar2cam_rt.T)\n                \n                lidar2cam_dic[cam_type] = lidar2cam_rt.T\n\n            input_dict.update(\n                dict(\n                    img_filename=image_paths,\n                    lidar2img=lidar2img_rts,\n                    cam_intrinsic=cam_intrinsics,\n                    lidar2cam=lidar2cam_rts,\n                    lidar2cam_dic=lidar2cam_dic,\n                ))\n\n        return input_dict\n\n\n    def evaluate(self, results, logger=None, **kwargs):\n        '''\n        Evaluate by IOU and VPQ metrics for model evaluation\n        '''\n        eval_results = {}\n        \n        ''' calculate IOU '''\n        hist_for_iou = sum(results['hist_for_iou'])\n        ious = cm_to_ious(hist_for_iou)\n        res_table, res_dic = format_iou_results(ious, return_dic=True)\n        for key, val in res_dic.items():\n            eval_results['IOU_{}'.format(key)] = val\n        if logger is not None:\n            logger.info('IOU Evaluation')\n            logger.info(res_table)\n\n        ''' calculate VPQ '''\n        if 'vpq_metric' in results.keys() and 'vpq_len' in results.keys():\n            vpq_sum = sum(results['vpq_metric'])\n            eval_results['VPQ'] = vpq_sum / results['vpq_len']\n\n        return eval_results\n"
  },
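The sliding-window logic in `get_indices` is easy to verify in isolation: a window of `sequence_length` consecutive records is kept only if it stays inside the dataset and inside a single scene. A standalone toy illustration (toy records stand in for `self.data_infos`):

```python
# Standalone illustration of get_indices: keep a window only when all
# sequence_length frames exist and share one scene_token.
import numpy as np

def toy_get_indices(records, sequence_length):
    indices = []
    for i in range(len(records)):
        window = records[i:i + sequence_length]
        if len(window) == sequence_length and \
                len({r['scene_token'] for r in window}) == 1:
            indices.append(list(range(i, i + sequence_length)))
    return np.asarray(indices)

recs = [{'scene_token': 'a'}] * 4 + [{'scene_token': 'b'}] * 3
print(toy_get_indices(recs, 3))  # [[0 1 2], [1 2 3], [4 5 6]]
```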
  {
    "path": "projects/occ_plugin/datasets/cam4docc_lyft_dataset.py",
    "content": "# Developed by Junyi Ma based on the codebase of OpenOccupancy and PowerBEV\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nimport numpy as np\nfrom mmcv.runner import get_dist_info\nfrom mmdet.datasets import DATASETS\nfrom mmdet3d.datasets import NuScenesDataset\nfrom mmdet3d.datasets.pipelines import Compose\nfrom torch.utils.data import Dataset\nfrom lyft_dataset_sdk.lyftdataset import LyftDataset\nimport os\nfrom nuscenes.eval.common.utils import quaternion_yaw, Quaternion\nfrom projects.occ_plugin.utils.formating import cm_to_ious, format_iou_results\nfrom projects.occ_plugin.utils.geometry import convert_egopose_to_matrix_numpy, invert_matrix_egopose_numpy\nfrom nuscenes import NuScenes\nfrom pyquaternion import Quaternion\nimport torch\nimport random\nimport time\n\n@DATASETS.register_module()\nclass Cam4DOccLyftDataset(Dataset):\n    def __init__(self, occ_size, pc_range, occ_root, idx_root, ori_data_root, data_root, time_receptive_field, n_future_frames, classes, use_separate_classes,\n                  train_capacity, test_capacity, test_mode=False, pipeline=None, **kwargs):\n        \n        '''\n        Cam4DOccLyftDataset contains sequential occupancy states as well as instance flow for training occupancy forecasting models. We unify the related operations in the LiDAR coordinate system following OpenOccupancy.\n\n        occ_size: number of grids along H W L, default: [512, 512, 40]\n        pc_range: predefined ranges along H W L, default: [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]\n        occ_root: data path of nuScenes-Occupancy\n        idx_root: save path of test indexes\n        time_receptive_field: number of historical frames used for forecasting (including the present one), default: 3\n        n_future_frames: number of forecasted future frames, default: 4\n        classes: predefiend categories in GMO\n        use_separate_classes: separate movable objects instead of the general one\n        train_capacity: number of sequences used for training, default: 23930\n        test_capacity: number of sequences used for testing, default: 5119\n        '''\n\n        self.test_mode = test_mode\n        self.CLASSES = classes\n        \n        self.train_capacity = train_capacity\n        self.test_capacity = test_capacity\n\n        super().__init__()\n\n        # training and test indexes following PowerBEV\n        self.TRAIN_LYFT_INDICES = [1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16,\n                      17, 18, 19, 20, 21, 23, 24, 27, 28, 29, 30, 31, 32,\n                      33, 35, 36, 37, 39, 41, 43, 44, 45, 46, 47, 48, 49,\n                      50, 51, 52, 53, 55, 56, 59, 60, 62, 63, 65, 68, 69,\n                      70, 71, 72, 73, 74, 75, 76, 78, 79, 81, 82, 83, 84,\n                      86, 87, 88, 89, 93, 95, 97, 98, 99, 103, 104, 107, 108,\n                      109, 110, 111, 113, 114, 115, 116, 117, 118, 119, 121, 122, 124,\n                      127, 128, 130, 131, 132, 134, 135, 136, 137, 138, 139, 143, 144,\n                      146, 147, 148, 149, 150, 151, 152, 153, 154, 156, 157, 158, 159,\n                      161, 162, 165, 166, 167, 171, 172, 173, 174, 175, 176, 177, 178,\n                      179]\n\n        self.VAL_LYFT_INDICES = [0, 2, 4, 13, 22, 25, 26, 34, 38, 40, 42, 54, 57,\n                    58, 61, 64, 66, 67, 77, 80, 85, 90, 91, 92, 94, 96,\n                    100, 101, 102, 105, 106, 112, 120, 123, 125, 126, 129, 133, 
140,\n                    141, 142, 145, 155, 160, 163, 164, 168, 169, 170]\n\n\n        rank, world_size = get_dist_info()\n\n        self.time_receptive_field = time_receptive_field\n        self.n_future_frames = n_future_frames\n        self.sequence_length = time_receptive_field + n_future_frames\n\n        if rank == 0:\n            print(\"-------------\")\n            print(\"use past \" + str(self.time_receptive_field) + \" frames to forecast future \" + str(self.n_future_frames) + \" frames\")\n            print(\"-------------\")\n\n        self.occ_size = occ_size\n        self.pc_range = pc_range\n        self.occ_root = occ_root\n        self.idx_root = idx_root\n        self.ori_data_root = ori_data_root\n        self.data_root = data_root\n        self.classes = classes\n        self.use_separate_classes = use_separate_classes\n\n        self.pipeline = Compose(pipeline)\n\n        # load origin nusc dataset for instance annotation\n        self.lyft = LyftDataset(data_path=self.data_root, json_path=os.path.join(self.data_root, 'train_data'), verbose=False)\n\n        self.scenes = self.get_scenes()\n        self.ixes = self.get_samples()\n        self.indices = self.get_indices()\n\n        self.present_scene_lidar_token = \" \"\n        self._set_group_flag()\n\n        if self.test_mode:\n            self.chosen_list = random.sample(range(0, self.test_capacity) , self.test_capacity)\n            self.chosen_list_num = len(self.chosen_list)\n        else:\n            self.chosen_list = random.sample(range(0, self.train_capacity) , self.train_capacity)\n            self.chosen_list_num = len(self.chosen_list)\n    \n    def _set_group_flag(self):\n        if self.test_mode:\n            self.flag = np.zeros(self.test_capacity, dtype=np.uint8)\n        else:\n            self.flag = np.zeros(self.train_capacity, dtype=np.uint8)\n\n    def __len__(self):\n        if self.test_mode:\n            return self.test_capacity\n        else:\n            return self.train_capacity\n\n    def __getitem__(self, idx):\n     \n        idx = int(self.chosen_list[idx])\n\n        self.egopose_list = []\n        self.ego2lidar_list = []\n        self.visible_instance_set = set()\n        self.instance_dict = {}\n\n        if self.test_mode:\n            return self.prepare_test_data(idx)\n            \n        while True:\n            data = self.prepare_train_data(idx)\n            if data is None:\n                idx = self._rand_another(idx)\n                idx = int(self.chosen_list[idx])\n                continue\n            \n            return data\n\n    def get_scenes(self):\n        \"\"\"\n        Obtain the list of scenes names in the given split.\n        \"\"\"\n        scenes = [row['name'] for row in self.lyft.scene]\n        # split in train/val\n        indices = self.VAL_LYFT_INDICES  if self.test_mode else  self.TRAIN_LYFT_INDICES\n        scenes = [scenes[i] for i in indices]\n        return scenes\n\n    def get_samples(self):\n        \"\"\"\n        Find and sort the samples in the given split by scene.\n        \"\"\"\n        samples = [sample for sample in self.lyft.sample]\n        # remove samples that aren't in this split\n        samples = [sample for sample in samples if self.lyft.get('scene', sample['scene_token'])['name'] in self.scenes]\n        # sort by scene, timestamp (only to make chronological viz easier)\n        samples.sort(key=lambda x: (x['scene_token'], x['timestamp']))\n\n        return samples\n\n    def get_indices(self):\n        
'''\n        Generate sequential indexes for training and testing\n        '''\n        indices = []\n        for index in range(len(self.ixes)):\n            is_valid_data = True\n            previous_rec = None\n            current_indices = []\n            for t in range(self.sequence_length):\n                index_t = index + t\n                # Going over the dataset size limit.\n                if index_t >= len(self.ixes):\n                    is_valid_data = False\n                    break\n                rec = self.ixes[index_t]\n                # Check if scene is the same\n                if (previous_rec is not None) and (rec['scene_token'] != previous_rec['scene_token']):\n                    is_valid_data = False\n                    break\n\n                current_indices.append(index_t)\n                previous_rec = rec\n\n            if is_valid_data:\n                indices.append(current_indices)\n\n        return np.asarray(indices)\n\n    def get_lidar_pose(self, rec):\n        '''\n        Get global poses for following bbox transforming\n        '''\n        current_sample = self.lyft.get('sample', rec['token'])\n        egopose = self.lyft.get('ego_pose', self.lyft.get('sample_data', current_sample['data']['LIDAR_TOP'])['ego_pose_token'])\n        ego2global_translation = egopose['translation']\n        ego2global_rotation = egopose['rotation']\n        trans = -np.array(ego2global_translation)\n        rot = Quaternion(ego2global_rotation).inverse\n        \n        return trans, rot\n    \n    def get_ego2lidar_pose(self, rec):\n        '''\n        Get LiDAR poses in ego system\n        '''\n        current_sample = self.lyft.get('sample', rec['token'])\n        lidar_top_data = self.lyft.get('sample_data', current_sample['data']['LIDAR_TOP'])\n        lidar2ego_translation = self.lyft.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['translation']\n        lidar2ego_rotation =  self.lyft.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['rotation']\n\n        trans = -np.array(lidar2ego_translation)\n        rot = Quaternion(lidar2ego_rotation).inverse\n        return trans, rot\n\n    def record_instance(self, idx, instance_map):\n        \"\"\"\n        Record information about each visible instance in the sequence and assign a unique ID to it\n        \"\"\"\n        rec = self.ixes[idx]\n        translation, rotation = self.get_lidar_pose(rec)\n        self.egopose_list.append([translation, rotation])\n        ego2lidar_translation, ego2lidar_rotation = self.get_ego2lidar_pose(rec)\n        self.ego2lidar_list.append([ego2lidar_translation, ego2lidar_rotation])\n\n        current_sample = self.lyft.get('sample', rec['token'])\n        for annotation_token in current_sample['anns']:\n            annotation = self.lyft.get('sample_annotation', annotation_token)\n            # Instance extraction for Cam4DOcc-V1 \n            # Filter out all non vehicle instances\n            # if 'vehicle' not in annotation['category_name']:\n            #     continue\n            gmo_flag = False\n            for class_name in self.classes:\n                if class_name in annotation['category_name']:\n                    gmo_flag = True\n                    break\n            if not gmo_flag:\n                continue\n            # Specify semantic id if use_separate_classes\n            semantic_id = 1\n            if self.use_separate_classes:\n                if 'bicycle' in annotation['category_name']:\n                    
semantic_id = 1\n                elif 'bus' in annotation['category_name']:\n                    semantic_id = 2\n                elif 'car' in annotation['category_name']:\n                    semantic_id = 3\n                elif 'construction' in annotation['category_name']:\n                    semantic_id = 4\n                elif 'motorcycle' in annotation['category_name']:\n                    semantic_id = 5\n                elif 'trailer' in annotation['category_name']:\n                    semantic_id = 6\n                elif 'truck' in annotation['category_name']:\n                    semantic_id = 7\n                elif 'pedestrian' in annotation['category_name']:\n                    semantic_id = 8\n\n            if annotation['instance_token'] not in instance_map:\n                instance_map[annotation['instance_token']] = len(instance_map) + 1\n            instance_id = instance_map[annotation['instance_token']]\n            instance_attribute = 1  # deprecated\n\n            if annotation['instance_token'] not in self.instance_dict:\n                # For the first occurrence of an instance\n                self.instance_dict[annotation['instance_token']] = {\n                    'timestep': [self.counter],\n                    'translation': [annotation['translation']],\n                    'rotation': [annotation['rotation']],\n                    'size': annotation['size'],\n                    'instance_id': instance_id,\n                    'semantic_id': semantic_id,\n                    'attribute_label': [instance_attribute],\n                }\n            else:\n                # For an instance that has appeared before\n                self.instance_dict[annotation['instance_token']]['timestep'].append(self.counter)\n                self.instance_dict[annotation['instance_token']]['translation'].append(annotation['translation'])\n                self.instance_dict[annotation['instance_token']]['rotation'].append(annotation['rotation'])\n                self.instance_dict[annotation['instance_token']]['attribute_label'].append(instance_attribute)\n\n        return instance_map\n\n    def get_future_egomotion(self, idx):\n        '''\n        Calculate the LiDAR pose update between idx and idx+1\n        '''\n        rec_t0 = self.ixes[idx]\n        future_egomotion = np.eye(4, dtype=np.float32)\n\n        if idx < len(self.ixes) - 1:\n            rec_t1 = self.ixes[idx + 1]\n\n            if rec_t0['scene_token'] == rec_t1['scene_token']:\n                egopose_t0 = self.lyft.get('ego_pose', self.lyft.get('sample_data', rec_t0['data']['LIDAR_TOP'])['ego_pose_token'])\n                egopose_t0_trans = egopose_t0['translation']\n                egopose_t0_rot = egopose_t0['rotation']\n\n                egopose_t1 = self.lyft.get('ego_pose', self.lyft.get('sample_data', rec_t1['data']['LIDAR_TOP'])['ego_pose_token'])\n                egopose_t1_trans = egopose_t1['translation']\n                egopose_t1_rot = egopose_t1['rotation']\n\n                egopose_t0 = convert_egopose_to_matrix_numpy(egopose_t0_trans, egopose_t0_rot)\n                egopose_t1 = convert_egopose_to_matrix_numpy(egopose_t1_trans, egopose_t1_rot)\n\n                lidar_top_data_t0 = self.lyft.get('sample_data', rec_t0['data']['LIDAR_TOP'])\n                lidar2ego_t0_trans = self.lyft.get('calibrated_sensor', lidar_top_data_t0['calibrated_sensor_token'])['translation']\n                lidar2ego_t0_rot = self.lyft.get('calibrated_sensor',
lidar_top_data_t0['calibrated_sensor_token'])['rotation']\n                lidar_top_data_t1 = self.lyft.get('sample_data', rec_t1['data']['LIDAR_TOP'])\n                lidar2ego_t1_trans = self.lyft.get('calibrated_sensor', lidar_top_data_t1['calibrated_sensor_token'])['translation']\n                lidar2ego_t1_rot =  self.lyft.get('calibrated_sensor', lidar_top_data_t1['calibrated_sensor_token'])['rotation']\n\n\n                lidar2ego_t0 = convert_egopose_to_matrix_numpy(lidar2ego_t0_trans, lidar2ego_t0_rot)\n                lidar2ego_t1 = convert_egopose_to_matrix_numpy(lidar2ego_t1_trans, lidar2ego_t1_rot)\n\n                future_egomotion = invert_matrix_egopose_numpy(lidar2ego_t1).dot(invert_matrix_egopose_numpy(egopose_t1)).dot(egopose_t0).dot(lidar2ego_t0)   \n\n        future_egomotion = torch.Tensor(future_egomotion).float()\n        return future_egomotion.unsqueeze(0)\n\n    @staticmethod\n    def _check_consistency(translation, prev_translation, threshold=1.0):\n        \"\"\"\n        Check for significant displacement of the instance adjacent moments\n        \"\"\"\n        x, y = translation[:2]\n        prev_x, prev_y = prev_translation[:2]\n\n        if abs(x - prev_x) > threshold or abs(y - prev_y) > threshold:\n            return False\n        return True\n\n    def refine_instance_poly(self, instance):\n        \"\"\"\n        Fix the missing frames and disturbances of ground truth caused by noise\n        \"\"\"\n        pointer = 1\n        for i in range(instance['timestep'][0] + 1, self.sequence_length):\n            # Fill in the missing frames\n            if i not in instance['timestep']:\n                instance['timestep'].insert(pointer, i)\n                instance['translation'].insert(pointer, instance['translation'][pointer-1])\n                instance['rotation'].insert(pointer, instance['rotation'][pointer-1])\n                instance['attribute_label'].insert(pointer, instance['attribute_label'][pointer-1])\n                pointer += 1\n                continue\n            \n            # Eliminate observation disturbances\n            if self._check_consistency(instance['translation'][pointer], instance['translation'][pointer-1]):\n                instance['translation'][pointer] = instance['translation'][pointer-1]\n                instance['rotation'][pointer] = instance['rotation'][pointer-1]\n                instance['attribute_label'][pointer] = instance['attribute_label'][pointer-1]\n            pointer += 1\n        \n        return instance\n\n    def prepare_train_data(self, index):\n        '''\n        Generate a training sequence\n        '''\n        \n        example = self.prepare_sequential_data(index)\n        return example\n\n    def prepare_test_data(self, index):\n        '''\n        Generate a test sequence\n        TODO: Give additional functions here such as visualization\n        '''\n        \n        example = self.prepare_sequential_data(index)\n        # TODO: visualize example data\n        return example\n    \n    def prepare_sequential_data(self, index):\n        '''\n        Use the predefined pipeline to generate inputs of the baseline network and ground truth for the standard evaluation protocol in Cam4DOcc\n        '''\n        instance_map = {}\n        input_seq_data = {}\n        keys = ['input_dict','future_egomotion', 'sample_token']\n        for key in keys:\n            input_seq_data[key] = []\n        scene_lidar_token = []\n\n        for self.counter, index_t in 
enumerate(self.indices[index]):\n\n            input_dict_per_frame = {}\n            rec = self.ixes[index_t]  # sample\n\n            lidar_top_data = self.lyft.get('sample_data', rec['data']['LIDAR_TOP'])\n            lidar2ego_translation = self.lyft.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['translation']\n            lidar2ego_rotation = self.lyft.get('calibrated_sensor', lidar_top_data['calibrated_sensor_token'])['rotation']\n\n            egopose = self.lyft.get('ego_pose', self.lyft.get('sample_data', rec['data']['LIDAR_TOP'])['ego_pose_token'])\n            ego2global_translation = egopose['translation']\n            ego2global_rotation = egopose['rotation']\n\n            input_dict_per_frame['lidar2ego_translation'] = lidar2ego_translation\n            input_dict_per_frame['lidar2ego_rotation'] = lidar2ego_rotation\n            input_dict_per_frame['ego2global_translation'] = ego2global_translation\n            input_dict_per_frame['ego2global_rotation'] = ego2global_rotation\n            input_dict_per_frame['scene_token'] = rec['scene_token']\n            input_dict_per_frame['lidar_token'] = rec['data']['LIDAR_TOP']\n            input_dict_per_frame['occ_size'] = np.array(self.occ_size)\n            input_dict_per_frame['pc_range'] = np.array(self.pc_range)\n            input_dict_per_frame['sample_idx'] = rec['token']\n\n\n            image_paths = []\n            lidar2img_rts = []\n            lidar2cam_rts = []\n            cam_intrinsics = []\n            cam_intrinsics_ori = []\n            lidar2cam_dic = {}\n\n            lidar_sample = self.lyft.get('sample_data', rec['data']['LIDAR_TOP'])\n            lidar_pose = self.lyft.get('ego_pose', lidar_sample['ego_pose_token'])\n            lidar_rotation = Quaternion(lidar_pose['rotation'])\n            lidar_translation = np.array(lidar_pose['translation'])[:, None]\n            lidar_to_world = np.vstack([\n                np.hstack((lidar_rotation.rotation_matrix, lidar_translation)),\n                np.array([0, 0, 0, 1])\n            ])\n\n            lidar_sample_calib = self.lyft.get('calibrated_sensor', lidar_sample['calibrated_sensor_token'])\n            lidar_sensor_rotation = Quaternion(lidar_sample_calib['rotation'])\n            lidar_sensor_translation = np.array(lidar_sample_calib['translation'])[:, None]\n            lidar_to_lidarego = np.vstack([\n                np.hstack((lidar_sensor_rotation.rotation_matrix, lidar_sensor_translation)),\n                np.array([0, 0, 0, 1])\n            ])\n\n            cameras = ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT']\n\n            for cam in cameras:\n                camera_sample = self.lyft.get('sample_data', rec['data'][cam])\n                image_paths.append(os.path.join(self.data_root, camera_sample['filename']))\n\n                car_egopose = self.lyft.get('ego_pose', camera_sample['ego_pose_token'])\n                egopose_rotation = Quaternion(car_egopose['rotation']).inverse\n                egopose_translation = -np.array(car_egopose['translation'])[:, None]\n                world_to_car_egopose = np.vstack([\n                    np.hstack((egopose_rotation.rotation_matrix, egopose_rotation.rotation_matrix @ egopose_translation)),\n                    np.array([0, 0, 0, 1])\n                ])\n\n                sensor_sample = self.lyft.get('calibrated_sensor',
camera_sample['calibrated_sensor_token'])\n                intrinsic = torch.Tensor(sensor_sample['camera_intrinsic'])\n                cam_intrinsics_ori.append(intrinsic)\n                sensor_rotation = Quaternion(sensor_sample['rotation'])\n                sensor_translation = np.array(sensor_sample['translation'])[:, None]\n                car_egopose_to_sensor = np.vstack([\n                    np.hstack((sensor_rotation.rotation_matrix, sensor_translation)),\n                    np.array([0, 0, 0, 1])\n                ])\n                car_egopose_to_sensor = np.linalg.inv(car_egopose_to_sensor)\n\n                lidar_to_sensor = car_egopose_to_sensor @ world_to_car_egopose @ lidar_to_world @ lidar_to_lidarego\n                sensor_to_lidar =np.linalg.inv(lidar_to_sensor)\n\n                lidar2cam_r = lidar_to_sensor[:3, :3] \n                lidar2cam_t = sensor_to_lidar[:3, -1].reshape(1,3) @ lidar2cam_r.T\n                lidar2cam_rt = np.eye(4)\n                lidar2cam_rt[:3, :3] = lidar2cam_r.T\n                lidar2cam_rt[3, :3] = -lidar2cam_t\n                viewpad = np.eye(4)\n                viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic\n                lidar2img_rt = (viewpad @ lidar2cam_rt.T)\n                lidar2img_rts.append(lidar2img_rt)\n\n                cam_intrinsics.append(viewpad)\n                lidar2cam_rts.append(lidar2cam_rt.T)\n                \n                lidar2cam_dic[cam] = lidar2cam_rt.T\n\n            input_dict_per_frame.update(\n                dict(\n                    img_filename=image_paths,\n                    lidar2img=lidar2img_rts,\n                    cam_intrinsic=cam_intrinsics,\n                    cam_intrinsics=cam_intrinsics_ori,\n                    lidar2cam=lidar2cam_rts,\n                    lidar2cam_dic=lidar2cam_dic,\n                ))\n                        \n            input_seq_data['input_dict'].append(input_dict_per_frame)\n\n            instance_map = self.record_instance(index_t, instance_map)\n            future_egomotion = self.get_future_egomotion(index_t)\n            input_seq_data['future_egomotion'].append(future_egomotion)\n            input_seq_data['sample_token'].append(input_dict_per_frame['sample_idx'])\n\n            scene_lidar_token.append(input_dict_per_frame['scene_token']+\"_\"+input_dict_per_frame['lidar_token'])\n            if self.counter == self.time_receptive_field - 1:\n                self.present_scene_lidar_token = input_dict_per_frame['scene_token']+\"_\"+input_dict_per_frame['lidar_token']\n\n        for token in self.instance_dict.keys():\n            self.instance_dict[token] = self.refine_instance_poly(self.instance_dict[token])\n\n        input_seq_data.update(\n            dict(\n                time_receptive_field=self.time_receptive_field,\n                sequence_length=self.sequence_length,\n                egopose_list=self.egopose_list,\n                ego2lidar_list=self.ego2lidar_list,\n                instance_dict=self.instance_dict,\n                instance_map=instance_map,\n                indices=self.indices[index],\n                scene_token=self.present_scene_lidar_token,\n            ))\n\n        example = self.pipeline(input_seq_data)\n        return example\n\n\n    def evaluate(self, results, logger=None, **kawrgs):\n        '''\n        Evaluate by IOU and VPQ metrics for model evaluation\n        '''\n        eval_results = {}\n        \n        ''' calculate IOU '''\n        hist_for_iou = 
sum(results['hist_for_iou'])\n        ious = cm_to_ious(hist_for_iou)\n        res_table, res_dic = format_iou_results(ious, return_dic=True)\n        for key, val in res_dic.items():\n            eval_results['IOU_{}'.format(key)] = val\n        if logger is not None:\n            logger.info('IOU Evaluation')\n            logger.info(res_table)        \n\n        ''' calculate VPQ '''\n        if 'vpq_metric' in results.keys() and 'vpq_len' in results.keys():\n            vpq_sum = sum(results['vpq_metric'])\n            eval_results['VPQ'] = vpq_sum/results['vpq_len']\n\n        return eval_results\n"
  },
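  {
    "path": "docs/examples/future_egomotion_sketch.py",
    "content": "# NOTE: Illustrative sketch added for documentation; not part of the original Cam4DOcc release.\n# It re-implements, as an assumption, the two pose helpers used by the Lyft dataset code above\n# and shows how the lidar-frame future egomotion is composed from the two global ego poses and\n# the lidar-to-ego calibration, mirroring get_future_egomotion.\nimport numpy as np\nfrom pyquaternion import Quaternion\n\n\ndef convert_egopose_to_matrix_numpy(translation, rotation):\n    # Build a 4x4 homogeneous pose from a translation and a [w, x, y, z] quaternion.\n    mat = np.eye(4, dtype=np.float64)\n    mat[:3, :3] = Quaternion(rotation).rotation_matrix\n    mat[:3, 3] = translation\n    return mat\n\n\ndef invert_matrix_egopose_numpy(mat):\n    # Invert a rigid-body transform analytically: R -> R^T, t -> -R^T t.\n    rot, trans = mat[:3, :3], mat[:3, 3]\n    inv = np.eye(4, dtype=np.float64)\n    inv[:3, :3] = rot.T\n    inv[:3, 3] = -rot.T @ trans\n    return inv\n\n\nif __name__ == '__main__':\n    # Hypothetical poses: the ego vehicle moves 2 m along +x between t0 and t1.\n    egopose_t0 = convert_egopose_to_matrix_numpy([10.0, 2.0, 0.0], [1, 0, 0, 0])\n    egopose_t1 = convert_egopose_to_matrix_numpy([12.0, 2.0, 0.0], [1, 0, 0, 0])\n    lidar2ego = convert_egopose_to_matrix_numpy([0.9, 0.0, 1.8], [1, 0, 0, 0])\n    # Chain lidar_t0 -> ego_t0 -> global -> ego_t1 -> lidar_t1, as in the dataset code\n    # (here with a shared lidar2ego calibration at both timestamps).\n    future_egomotion = (invert_matrix_egopose_numpy(lidar2ego)\n                        @ invert_matrix_egopose_numpy(egopose_t1)\n                        @ egopose_t0\n                        @ lidar2ego)\n    # A static world point lands ~2 m behind the sensor at t1.\n    print(future_egomotion[:3, 3])  # -> [-2.  0.  0.]\n"
  },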
  {
    "path": "projects/occ_plugin/datasets/nuscenes_dataset.py",
    "content": "import copy\nimport numpy as np\nfrom mmdet.datasets import DATASETS\nfrom mmdet3d.datasets import NuScenesDataset\nimport mmcv\nfrom os import path as osp\nfrom mmdet.datasets import DATASETS\nimport torch\nimport numpy as np\nfrom nuscenes.eval.common.utils import quaternion_yaw, Quaternion\nfrom mmcv.parallel import DataContainer as DC\nimport random\n\n\n@DATASETS.register_module()\nclass CustomNuScenesDataset(NuScenesDataset):\n    r\"\"\"NuScenes Dataset.\n\n    This datset only add camera intrinsics and extrinsics to the results.\n    \"\"\"\n\n    def __init__(self, queue_length=4, bev_size=(200, 200), overlap_test=False, *args, **kwargs):\n        super().__init__(*args, **kwargs)\n        self.queue_length = queue_length\n        self.overlap_test = overlap_test\n        self.bev_size = bev_size\n        \n    def prepare_train_data(self, index):\n        \"\"\"\n        Training data preparation.\n        Args:\n            index (int): Index for accessing the target data.\n        Returns:\n            dict: Training data dict of the corresponding index.\n        \"\"\"\n        queue = []\n        index_list = list(range(index-self.queue_length, index))\n        random.shuffle(index_list)\n        index_list = sorted(index_list[1:])\n        index_list.append(index)\n        for i in index_list:\n            i = max(0, i)\n            input_dict = self.get_data_info(i)\n            if input_dict is None:\n                return None\n            self.pre_pipeline(input_dict)\n            example = self.pipeline(input_dict)\n            if self.filter_empty_gt and \\\n                    (example is None or ~(example['gt_labels_3d']._data != -1).any()):\n                return None\n            queue.append(example)\n        return self.union2one(queue)\n\n\n    def union2one(self, queue):\n        imgs_list = [each['img'].data for each in queue]\n        metas_map = {}\n        prev_scene_token = None\n        prev_pos = None\n        prev_angle = None\n        for i, each in enumerate(queue):\n            metas_map[i] = each['img_metas'].data\n            if metas_map[i]['scene_token'] != prev_scene_token:\n                metas_map[i]['prev_bev_exists'] = False\n                prev_scene_token = metas_map[i]['scene_token']\n                prev_pos = copy.deepcopy(metas_map[i]['can_bus'][:3])\n                prev_angle = copy.deepcopy(metas_map[i]['can_bus'][-1])\n                metas_map[i]['can_bus'][:3] = 0\n                metas_map[i]['can_bus'][-1] = 0\n            else:\n                metas_map[i]['prev_bev_exists'] = True\n                tmp_pos = copy.deepcopy(metas_map[i]['can_bus'][:3])\n                tmp_angle = copy.deepcopy(metas_map[i]['can_bus'][-1])\n                metas_map[i]['can_bus'][:3] -= prev_pos\n                metas_map[i]['can_bus'][-1] -= prev_angle\n                prev_pos = copy.deepcopy(tmp_pos)\n                prev_angle = copy.deepcopy(tmp_angle)\n        queue[-1]['img'] = DC(torch.stack(imgs_list), cpu_only=False, stack=True)\n        queue[-1]['img_metas'] = DC(metas_map, cpu_only=True)\n        queue = queue[-1]\n        return queue\n\n    def get_data_info(self, index):\n        \"\"\"Get data info according to the given index.\n\n        Args:\n            index (int): Index of the sample data to get.\n\n        Returns:\n            dict: Data information that will be passed to the data \\\n                preprocessing pipelines. 
It includes the following keys:\n\n                - sample_idx (str): Sample index.\n                - pts_filename (str): Filename of point clouds.\n                - sweeps (list[dict]): Infos of sweeps.\n                - timestamp (float): Sample timestamp.\n                - img_filename (str, optional): Image filename.\n                - lidar2img (list[np.ndarray], optional): Transformations \\\n                    from lidar to different cameras.\n                - ann_info (dict): Annotation info.\n        \"\"\"\n        info = self.data_infos[index]\n        # standard protocal modified from SECOND.Pytorch\n        input_dict = dict(\n            sample_idx=info['token'],\n            pts_filename=info['lidar_path'],\n            sweeps=info['sweeps'],\n            ego2global_translation=info['ego2global_translation'],\n            ego2global_rotation=info['ego2global_rotation'],\n            prev_idx=info['prev'],\n            next_idx=info['next'],\n            scene_token=info['scene_token'],\n            can_bus=info['can_bus'],\n            frame_idx=info['frame_idx'],\n            timestamp=info['timestamp'] / 1e6,\n        )\n\n        if self.modality['use_camera']:\n            image_paths = []\n            lidar2img_rts = []\n            lidar2cam_rts = []\n            cam_intrinsics = []\n            for cam_type, cam_info in info['cams'].items():\n                image_paths.append(cam_info['data_path'])\n                # obtain lidar to image transformation matrix\n                lidar2cam_r = np.linalg.inv(cam_info['sensor2lidar_rotation'])\n                lidar2cam_t = cam_info[\n                    'sensor2lidar_translation'] @ lidar2cam_r.T\n                lidar2cam_rt = np.eye(4)\n                lidar2cam_rt[:3, :3] = lidar2cam_r.T\n                lidar2cam_rt[3, :3] = -lidar2cam_t\n                intrinsic = cam_info['cam_intrinsic']\n                viewpad = np.eye(4)\n                viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic\n                lidar2img_rt = (viewpad @ lidar2cam_rt.T)\n                lidar2img_rts.append(lidar2img_rt)\n\n                cam_intrinsics.append(viewpad)\n                lidar2cam_rts.append(lidar2cam_rt.T)\n\n            input_dict.update(\n                dict(\n                    img_filename=image_paths,\n                    lidar2img=lidar2img_rts,\n                    cam_intrinsic=cam_intrinsics,\n                    lidar2cam=lidar2cam_rts,\n                ))\n\n        if not self.test_mode:\n            annos = self.get_ann_info(index)\n            input_dict['ann_info'] = annos\n\n        rotation = Quaternion(input_dict['ego2global_rotation'])\n        translation = input_dict['ego2global_translation']\n        can_bus = input_dict['can_bus']\n        can_bus[:3] = translation\n        can_bus[3:7] = rotation\n        patch_angle = quaternion_yaw(rotation) / np.pi * 180\n        if patch_angle < 0:\n            patch_angle += 360\n        can_bus[-2] = patch_angle / 180 * np.pi\n        can_bus[-1] = patch_angle\n\n        return input_dict\n\n    def __getitem__(self, idx):\n        \"\"\"Get item from infos according to the given index.\n        Returns:\n            dict: Data dictionary of the corresponding index.\n        \"\"\"\n        if self.test_mode:\n            return self.prepare_test_data(idx)\n        while True:\n\n            data = self.prepare_train_data(idx)\n            if data is None:\n                idx = self._rand_another(idx)\n                continue\n       
     return data\n\n   "
  },
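  {
    "path": "docs/examples/temporal_queue_sketch.py",
    "content": "# NOTE: Illustrative sketch added for documentation; not part of the original Cam4DOcc release.\n# It isolates the temporal-queue index sampling performed by\n# CustomNuScenesDataset.prepare_train_data above: of the queue_length frames preceding the\n# current index, one is dropped at random, the survivors stay in temporal order, and the\n# current frame is always appended last.\nimport random\n\n\ndef sample_queue_indices(index, queue_length=4):\n    index_list = list(range(index - queue_length, index))\n    random.shuffle(index_list)\n    index_list = sorted(index_list[1:])     # drop one past frame at random\n    index_list.append(index)                # the current frame closes the queue\n    return [max(0, i) for i in index_list]  # clamp, as the dataset loop does\n\n\nif __name__ == '__main__':\n    print(sample_queue_indices(10))  # e.g. [6, 8, 9, 10]\n    print(sample_queue_indices(1))   # early indices are clamped, e.g. [0, 0, 0, 1]\n"
  },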
  {
    "path": "projects/occ_plugin/datasets/pipelines/__init__.py",
    "content": "from .transform_3d import (\n    PadMultiViewImage, NormalizeMultiviewImage, \n    PhotoMetricDistortionMultiViewImage, CustomCollect3D, CustomOccCollect3D, RandomScaleImageMultiViewImage)\nfrom .formating import OccDefaultFormatBundle3D\nfrom .loading_occupancy import LoadOccupancy\nfrom .loading_bevdet import LoadAnnotationsBEVDepth, LoadMultiViewImageFromFiles_BEVDet\nfrom .loading_instance import LoadInstanceWithFlow\n\n__all__ = [\n    'PadMultiViewImage', 'NormalizeMultiviewImage', 'CustomOccCollect3D', 'LoadAnnotationsBEVDepth', 'LoadMultiViewImageFromFiles_BEVDet', 'LoadOccupancy',\n    'PhotoMetricDistortionMultiViewImage', 'OccDefaultFormatBundle3D', 'CustomCollect3D', 'RandomScaleImageMultiViewImage', \"LoadInstanceWithFlow\",\n]"
  },
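  {
    "path": "docs/examples/pipeline_config_sketch.py",
    "content": "# NOTE: Illustrative sketch added for documentation; not part of the original Cam4DOcc release.\n# It shows how the pipeline stages exported above could be chained in an mmdet-style config.\n# All argument values below are placeholders picked for the example (including the key list\n# passed to CustomOccCollect3D), not the benchmark's actual settings.\ndata_config = dict(\n    cams=['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT',\n          'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT'],\n    Ncams=6,\n    input_size=(256, 704),   # placeholder network input resolution\n    resize=(-0.06, 0.11),    # placeholder random-resize jitter range\n    crop_h=(0.0, 0.0),\n    resize_test=0.0,\n)\n\ntrain_pipeline = [\n    dict(type='LoadMultiViewImageFromFiles_BEVDet',\n         data_config=data_config, is_train=True),\n    dict(type='LoadInstanceWithFlow',\n         cam4docc_dataset_path='./data/cam4docc/',  # placeholder path\n         use_flow=True),\n    dict(type='OccDefaultFormatBundle3D', class_names=None),\n    dict(type='CustomOccCollect3D',\n         keys=['img_inputs_seq', 'segmentation', 'instance', 'flow']),\n]\n"
  },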
  {
    "path": "projects/occ_plugin/datasets/pipelines/formating.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\r\nimport numpy as np\r\nfrom mmcv.parallel import DataContainer as DC\r\n\r\nfrom mmdet.datasets.builder import PIPELINES\r\nfrom mmdet.datasets.pipelines import to_tensor\r\nfrom mmdet3d.datasets.pipelines import DefaultFormatBundle3D\r\n\r\n\r\n@PIPELINES.register_module()\r\nclass OccDefaultFormatBundle3D(DefaultFormatBundle3D):\r\n    \"\"\"Default formatting bundle.\r\n    It simplifies the pipeline of formatting common fields for voxels,\r\n    including \"proposals\", \"gt_bboxes\", \"gt_labels\", \"gt_masks\" and\r\n    \"gt_semantic_seg\".\r\n    These fields are formatted as follows.\r\n    - img: (1)transpose, (2)to tensor, (3)to DataContainer (stack=True)\r\n    - proposals: (1)to tensor, (2)to DataContainer\r\n    - gt_bboxes: (1)to tensor, (2)to DataContainer\r\n    - gt_bboxes_ignore: (1)to tensor, (2)to DataContainer\r\n    - gt_labels: (1)to tensor, (2)to DataContainer\r\n    \"\"\"\r\n    def __init__(self, *args, **kwargs):\r\n        super().__init__(*args, **kwargs)\r\n\r\n        \r\n    def __call__(self, results):\r\n        \"\"\"Call function to transform and format common fields in results.\r\n        Args:\r\n            results (dict): Result dict contains the data to convert.\r\n        Returns:\r\n            dict: The result dict contains the data that is formatted with\r\n                default bundle.\r\n        \"\"\"\r\n        # Format 3D data\r\n        results = super(OccDefaultFormatBundle3D, self).__call__(results)\r\n\r\n        if 'gt_occ' in results.keys():\r\n            results['gt_occ'] = DC(to_tensor(results['gt_occ']), stack=True)\r\n        if 'gt_occ' in results.keys():\r\n            results['segmentation'] = DC(to_tensor(results['segmentation']), stack=True)\r\n        if 'gt_occ' in results.keys():\r\n            results['instance'] = DC(to_tensor(results['instance']), stack=True)\r\n        if 'gt_occ' in results.keys():\r\n            results['attribute_label'] = DC(to_tensor(results['attribute_label']), stack=True)\r\n        if 'gt_occ' in results.keys():\r\n            results['flow'] = DC(to_tensor(results['flow']), stack=True)\r\n        if 'gt_vel' in results.keys():\r\n            results['gt_vel'] = DC(to_tensor(results['gt_vel']), stack=False)\r\n\r\n        return results"
  },
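  {
    "path": "docs/examples/datacontainer_sketch.py",
    "content": "# NOTE: Illustrative sketch added for documentation; not part of the original Cam4DOcc release.\n# It shows the effect of OccDefaultFormatBundle3D above on one ground-truth field: the array\n# is converted to a tensor and wrapped in an mmcv DataContainer with stack=True, so the\n# collate function stacks it batch-wise. The toy shape below is a placeholder.\nimport numpy as np\nimport torch\nfrom mmcv.parallel import DataContainer as DC\n\nresults = {'segmentation': np.zeros((2, 8, 8, 4), dtype=np.int64)}  # toy (T, X, Y, Z) grid\nresults['segmentation'] = DC(torch.from_numpy(results['segmentation']), stack=True)\nprint(results['segmentation'].stack, results['segmentation'].data.shape)\n# -> True torch.Size([2, 8, 8, 4])\n"
  },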
  {
    "path": "projects/occ_plugin/datasets/pipelines/loading_bevdet.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\n\nimport mmcv\nimport numpy as np\nfrom mmdet.datasets.builder import PIPELINES\nimport os\nimport torch\nfrom PIL import Image\nfrom pyquaternion import Quaternion\nfrom mmdet3d.core.bbox import LiDARInstance3DBoxes\n\nfrom numpy import random\nimport pdb\n\ndef mmlabNormalize(img, img_norm_cfg=None):\n    from mmcv.image.photometric import imnormalize\n    \n    if img_norm_cfg is None:\n        mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)\n        std = np.array([58.395, 57.12, 57.375], dtype=np.float32)\n        to_rgb = True\n    else:\n        mean = np.array(img_norm_cfg['mean'], dtype=np.float32)\n        std = np.array(img_norm_cfg['std'], dtype=np.float32)\n        to_rgb = img_norm_cfg['to_rgb']\n    \n    img = imnormalize(np.array(img), mean, std, to_rgb)\n    img = torch.tensor(img).float().permute(2, 0, 1).contiguous()\n    \n    return img\n\ndef depth_transform(cam_depth, resize, resize_dims, crop, flip, rotate):\n    \"\"\"Transform depth based on ida augmentation configuration.\n\n    Args:\n        cam_depth (np array): Nx3, 3: x,y,d.\n        resize (float): Resize factor.\n        resize_dims (list): Final dimension.\n        crop (list): x1, y1, x2, y2\n        flip (bool): Whether to flip.\n        rotate (float): Rotation value.\n\n    Returns:\n        np array: [h/down_ratio, w/down_ratio, d]\n    \"\"\"\n\n    H, W = resize_dims\n    cam_depth[:, :2] = cam_depth[:, :2] * resize\n    cam_depth[:, 0] -= crop[0]\n    cam_depth[:, 1] -= crop[1]\n    if flip:\n        cam_depth[:, 0] = resize_dims[1] - cam_depth[:, 0]\n\n    cam_depth[:, 0] -= W / 2.0\n    cam_depth[:, 1] -= H / 2.0\n\n    h = rotate / 180 * np.pi\n    rot_matrix = [\n        [np.cos(h), np.sin(h)],\n        [-np.sin(h), np.cos(h)],\n    ]\n    cam_depth[:, :2] = np.matmul(rot_matrix, cam_depth[:, :2].T).T\n\n    cam_depth[:, 0] += W / 2.0\n    cam_depth[:, 1] += H / 2.0\n\n    depth_coords = cam_depth[:, :2].astype(np.int16)\n\n    depth_map = np.zeros(resize_dims)\n    valid_mask = ((depth_coords[:, 1] < resize_dims[0])\n                  & (depth_coords[:, 0] < resize_dims[1])\n                  & (depth_coords[:, 1] >= 0)\n                  & (depth_coords[:, 0] >= 0))\n    depth_map[depth_coords[valid_mask, 1],\n              depth_coords[valid_mask, 0]] = cam_depth[valid_mask, 2]\n\n    return torch.Tensor(depth_map)\n\n@PIPELINES.register_module()\nclass LoadMultiViewImageFromFiles_BEVDet(object):\n    \"\"\"Load multi channel images from a list of separate channel files.\n\n    Expects results['img_filename'] to be a list of filenames.\n\n    Args:\n        to_float32 (bool): Whether to convert the img to float32.\n            Defaults to False.\n        color_type (str): Color type of the file. 
Defaults to 'unchanged'.\n    \"\"\"\n\n    def __init__(self, data_config, is_train=False, using_ego=True, colorjitter=False,\n                 sequential=False, aligned=False, trans_only=True, img_norm_cfg=None,\n                 mmlabnorm=False, load_depth=False, depth_gt_path=None, data_root=None, test_mode=False, use_lyft=False):\n        self.is_train = is_train\n        self.data_config = data_config\n        \n        # using mean camera ego frame, rather than the lidar coordinates\n        self.using_ego = using_ego\n        self.normalize_img = mmlabNormalize\n        self.img_norm_cfg = img_norm_cfg\n        \n        self.sequential = sequential\n        self.aligned = aligned\n        self.trans_only = trans_only\n        self.load_depth = load_depth\n        \n        self.depth_gt_path = depth_gt_path\n        self.data_root = data_root\n        \n        self.colorjitter = colorjitter\n        self.pipeline_colorjitter = PhotoMetricDistortionMultiViewImage()\n\n        self.test_mode = test_mode\n\n        self.use_lyft = use_lyft\n\n    def get_rot(self,h):\n        return torch.Tensor([\n            [np.cos(h), np.sin(h)],\n            [-np.sin(h), np.cos(h)],\n        ])\n\n    def img_transform(self, img, post_rot, post_tran,\n                      resize, resize_dims, crop,\n                      flip, rotate):\n        # adjust image\n        img = self.img_transform_core(img, resize_dims, crop, flip, rotate)\n\n        # post-homography transformation\n        post_rot *= resize\n        post_tran -= torch.Tensor(crop[:2])\n        if flip:\n            A = torch.Tensor([[-1, 0], [0, 1]])\n            b = torch.Tensor([crop[2] - crop[0], 0])\n            post_rot = A.matmul(post_rot)\n            post_tran = A.matmul(post_tran) + b\n        A = self.get_rot(rotate / 180 * np.pi)\n        b = torch.Tensor([crop[2] - crop[0], crop[3] - crop[1]]) / 2\n        b = A.matmul(-b) + b\n        post_rot = A.matmul(post_rot)\n        post_tran = A.matmul(post_tran) + b\n\n        return img, post_rot, post_tran\n\n    def img_transform_core(self, img, resize_dims, crop, flip, rotate):\n        # adjust image\n        img = img.resize(resize_dims)\n        img = img.crop(crop)\n        if flip:\n            img = img.transpose(method=Image.FLIP_LEFT_RIGHT)\n        img = img.rotate(rotate)\n        return img\n\n    def choose_cams(self):\n        if self.is_train and self.data_config['Ncams'] < len(self.data_config['cams']):\n            cam_names = np.random.choice(self.data_config['cams'], self.data_config['Ncams'],\n                                    replace=False)\n        else:\n            cam_names = self.data_config['cams']\n        return cam_names\n\n    def sample_augmentation(self, H , W, flip=None, scale=None):\n        fH, fW = self.data_config['input_size']\n        if self.is_train:\n            resize = float(fW)/float(W)\n            resize += np.random.uniform(*self.data_config['resize'])\n            resize_dims = (int(W * resize), int(H * resize))\n            newW, newH = resize_dims\n            crop_h = int((1 - np.random.uniform(*self.data_config['crop_h'])) * newH) - fH\n            crop_w = int(np.random.uniform(0, max(0, newW - fW)))\n            crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)\n            # We do not use flip here to keep right forecasting\n            flip = None\n            rotate = 0\n        else:\n            resize = float(fW)/float(W)\n            resize += self.data_config.get('resize_test', 0.0)\n            if scale 
is not None:\n                resize = scale\n            resize_dims = (int(W * resize), int(H * resize))\n            newW, newH = resize_dims\n            crop_h = int((1 - np.mean(self.data_config['crop_h'])) * newH) - fH\n            crop_w = int(max(0, newW - fW) / 2)\n            crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)\n            flip = None\n            rotate = 0\n        return resize, resize_dims, crop, flip, rotate\n\n    def get_sensor2ego_transformation(self, cam_info, key_info, cam_name):\n        w, x, y, z = cam_info['cams'][cam_name]['sensor2ego_rotation']\n        # sweep sensor to sweep ego\n        sweepsensor2sweepego_rot = torch.Tensor(\n            Quaternion(w, x, y, z).rotation_matrix)\n        sweepsensor2sweepego_tran = torch.Tensor(\n            cam_info['cams'][cam_name]['sensor2ego_translation'])\n        sweepsensor2sweepego = sweepsensor2sweepego_rot.new_zeros(\n            (4, 4))\n        sweepsensor2sweepego[3, 3] = 1\n        sweepsensor2sweepego[:3, :3] = sweepsensor2sweepego_rot\n        sweepsensor2sweepego[:3, -1] = sweepsensor2sweepego_tran\n        # sweep ego to global\n        w, x, y, z = cam_info['cams'][cam_name]['ego2global_rotation']\n        sweepego2global_rot = torch.Tensor(\n            Quaternion(w, x, y, z).rotation_matrix)\n        sweepego2global_tran = torch.Tensor(\n            cam_info['cams'][cam_name]['ego2global_translation'])\n        sweepego2global = sweepego2global_rot.new_zeros((4, 4))\n        sweepego2global[3, 3] = 1\n        sweepego2global[:3, :3] = sweepego2global_rot\n        sweepego2global[:3, -1] = sweepego2global_tran\n\n        # global sensor to cur ego\n        w, x, y, z = key_info['cams'][cam_name]['ego2global_rotation']\n        keyego2global_rot = torch.Tensor(\n            Quaternion(w, x, y, z).rotation_matrix)\n        keyego2global_tran = torch.Tensor(\n            key_info['cams'][cam_name]['ego2global_translation'])\n        keyego2global = keyego2global_rot.new_zeros((4, 4))\n        keyego2global[3, 3] = 1\n        keyego2global[:3, :3] = keyego2global_rot\n        keyego2global[:3, -1] = keyego2global_tran\n        global2keyego = keyego2global.inverse()\n\n        # cur ego to sensor\n        w, x, y, z = key_info['cams'][cam_name]['sensor2ego_rotation']\n        keysensor2keyego_rot = torch.Tensor(\n            Quaternion(w, x, y, z).rotation_matrix)\n        keysensor2keyego_tran = torch.Tensor(\n            key_info['cams'][cam_name]['sensor2ego_translation'])\n        keysensor2keyego = keysensor2keyego_rot.new_zeros((4, 4))\n        keysensor2keyego[3, 3] = 1\n        keysensor2keyego[:3, :3] = keysensor2keyego_rot\n        keysensor2keyego[:3, -1] = keysensor2keyego_tran\n        keyego2keysensor = keysensor2keyego.inverse()\n        keysensor2sweepsensor = (\n                keyego2keysensor @ global2keyego @ sweepego2global\n                @ sweepsensor2sweepego).inverse()\n        sweepsensor2keyego = global2keyego @ sweepego2global @ \\\n                             sweepsensor2sweepego\n        return sweepsensor2keyego, keysensor2sweepsensor\n    \n    def get_sensor2lidar_transformation(self, cam_info, cam_name, sample_info):\n        w, x, y, z = cam_info['cams'][cam_name]['sensor2ego_rotation']\n        # sweep sensor to sweep ego\n        sweepsensor2sweepego_rot = torch.Tensor(\n            Quaternion(w, x, y, z).rotation_matrix)\n        sweepsensor2sweepego_tran = torch.Tensor(\n            cam_info['cams'][cam_name]['sensor2ego_translation'])\n        
sweepsensor2sweepego = sweepsensor2sweepego_rot.new_zeros(\n            (4, 4))\n        sweepsensor2sweepego[3, 3] = 1\n        sweepsensor2sweepego[:3, :3] = sweepsensor2sweepego_rot\n        sweepsensor2sweepego[:3, -1] = sweepsensor2sweepego_tran\n        # sweep ego to global\n        w, x, y, z = cam_info['cams'][cam_name]['ego2global_rotation']\n        sweepego2global_rot = torch.Tensor(\n            Quaternion(w, x, y, z).rotation_matrix)\n        sweepego2global_tran = torch.Tensor(\n            cam_info['cams'][cam_name]['ego2global_translation'])\n        sweepego2global = sweepego2global_rot.new_zeros((4, 4))\n        sweepego2global[3, 3] = 1\n        sweepego2global[:3, :3] = sweepego2global_rot\n        sweepego2global[:3, -1] = sweepego2global_tran\n        \n        # global to lidar ego\n        w, x, y, z = sample_info['ego2global_rotation']\n        lidarego2global_rot = torch.Tensor(\n            Quaternion(w, x, y, z).rotation_matrix)\n        lidarego2global_tran = torch.Tensor(sample_info['ego2global_translation'])\n        lidarego2global = lidarego2global_rot.new_zeros((4, 4))\n        lidarego2global[3, 3] = 1\n        lidarego2global[:3, :3] = lidarego2global_rot\n        lidarego2global[:3, -1] = lidarego2global_tran\n        global2lidarego = lidarego2global.inverse()\n        \n        # lidar ego to lidar\n        w, x, y, z = sample_info['lidar2ego_rotation']\n        lidar2ego_rot = torch.Tensor(\n            Quaternion(w, x, y, z).rotation_matrix)\n        lidar2ego_tran = torch.Tensor(sample_info['lidar2ego_translation'])\n        lidar2ego = lidar2ego_rot.new_zeros((4, 4))\n        lidar2ego[3, 3] = 1\n        lidar2ego[:3, :3] = lidar2ego_rot\n        lidar2ego[:3, -1] = lidar2ego_tran\n        ego2lidar = lidar2ego.inverse()\n        \n        # camera to lidar\n        sweepsensor2lidar = ego2lidar @ global2lidarego @ sweepego2global @ sweepsensor2sweepego\n        \n        return sweepsensor2lidar\n\n\n    def get_seq_inputs(self, results, flip=None, scale=None):\n        \n        cam_names = self.choose_cams()\n        results['cam_names'] = cam_names\n\n        if self.use_lyft:\n            filename = results['input_dict'][0]['img_filename'][0]\n        else:\n            cam_data = results['input_dict'][0]['curr']['cams'][cam_names[0]]\n            filename = cam_data['data_path']\n            filename = os.path.join(self.data_root, filename.split('/')[-3], filename.split('/')[-2], filename.split('/')[-1])\n            \n        img = Image.open(filename)\n\n        img_augs = self.sample_augmentation(H=img.height,\n                                            W=img.width,\n                                            flip=flip,\n                                            scale=scale)\n        resize, resize_dims, crop, flip, rotate = img_augs\n\n        sequence_length = results['sequence_length']\n\n        imgs_seq = []\n        rots_seq = []\n        trans_seq = []\n        intrins_seq = []\n        post_rots_seq = []\n        post_trans_seq = []\n        gt_depths_seq = list()\n        canvas_seq = []\n        sensor2sensors_seq = []\n\n        for counter in range(sequence_length):\n\n            input_dict_curr = results['input_dict'][counter]\n\n            imgs = []\n            rots = []\n            trans = []\n            intrins = []\n            post_rots = []\n            post_trans = []\n            gt_depths = list()\n            canvas = []\n            sensor2sensors = []\n\n            for cam_idx, cam_name in 
enumerate(cam_names):\n                if self.use_lyft:\n                    cam_data = None\n                    filename = input_dict_curr['img_filename'][cam_idx]\n                else:\n                    cam_data = input_dict_curr['curr']['cams'][cam_name]\n                    filename = cam_data['data_path']\n                    filename = os.path.join(self.data_root, filename.split('/')[-3], filename.split('/')[-2], filename.split('/')[-1])\n                \n                img = Image.open(filename)\n                post_rot = torch.eye(2)\n                post_tran = torch.zeros(2)\n\n                if self.use_lyft:\n                    intrin = torch.Tensor(input_dict_curr['cam_intrinsics'][cam_idx])\n                else:\n                    intrin = torch.Tensor(cam_data['cam_intrinsic'])\n                \n                # from camera to lidar \n                sensor2lidar = torch.tensor(input_dict_curr['lidar2cam_dic'][cam_name]).inverse().float()\n                rot = sensor2lidar[:3, :3]\n                tran = sensor2lidar[:3, 3]\n\n                img, post_rot2, post_tran2 = \\\n                    self.img_transform(img, post_rot,\n                                    post_tran,\n                                    resize=resize,\n                                    resize_dims=resize_dims,\n                                    crop=crop,\n                                    flip=flip,\n                                    rotate=rotate)\n\n                # for convenience, make augmentation matrices 3x3\n                post_tran = torch.zeros(3)\n                post_rot = torch.eye(3)\n                post_tran[:2] = post_tran2\n                post_rot[:2, :2] = post_rot2\n\n                # TODO: open source depth enhancement\n                gt_depths.append(torch.zeros(1))\n                \n                canvas.append(np.array(img))\n                \n                if self.colorjitter and self.is_train:\n                    img = self.pipeline_colorjitter(img)\n                \n                imgs.append(self.normalize_img(img, img_norm_cfg=self.img_norm_cfg))\n                intrins.append(intrin)\n                rots.append(rot)\n                trans.append(tran)\n                post_rots.append(post_rot)\n                post_trans.append(post_tran)\n                sensor2sensors.append(sensor2lidar)\n\n            imgs = torch.stack(imgs)\n            rots = torch.stack(rots)\n            trans = torch.stack(trans)\n            intrins = torch.stack(intrins)\n            post_rots = torch.stack(post_rots)\n            post_trans = torch.stack(post_trans)\n            gt_depths = torch.stack(gt_depths)\n            sensor2sensors = torch.stack(sensor2sensors)\n\n            imgs_seq.append(imgs)\n            rots_seq.append(rots)\n            trans_seq.append(trans)\n            intrins_seq.append(intrins)\n            post_rots_seq.append(post_rots)\n            post_trans_seq.append(post_trans)\n            gt_depths_seq.append(gt_depths)\n            canvas_seq.append(canvas)\n            sensor2sensors_seq.append(sensor2sensors)\n\n\n        imgs_seq = torch.stack(imgs_seq)\n        rots_seq = torch.stack(rots_seq)\n        trans_seq = torch.stack(trans_seq)\n        intrins_seq = torch.stack(intrins_seq)\n        post_rots_seq = torch.stack(post_rots_seq)\n        post_trans_seq = torch.stack(post_trans_seq)\n        gt_depths_seq = torch.stack(gt_depths_seq)\n        sensor2sensors_seq = torch.stack(sensor2sensors_seq)\n\n        
results['canvas'] = canvas\n\n        return imgs_seq, rots_seq, trans_seq, intrins_seq, post_rots_seq, post_trans_seq, gt_depths_seq, sensor2sensors_seq\n\n\n    def __call__(self, results):\n\n        results['img_inputs_seq'] = self.get_seq_inputs(results)\n    \n        return results\n\ndef bev_transform(rotate_angle, scale_ratio, flip_dx, flip_dy):\n    rotate_angle = torch.tensor(rotate_angle / 180 * np.pi)\n    rot_sin = torch.sin(rotate_angle)\n    rot_cos = torch.cos(rotate_angle)\n    rot_mat = torch.Tensor([[rot_cos, -rot_sin, 0], [rot_sin, rot_cos, 0],\n                            [0, 0, 1]])\n    scale_mat = torch.Tensor([[scale_ratio, 0, 0], [0, scale_ratio, 0],\n                              [0, 0, scale_ratio]])\n    flip_mat = torch.Tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])\n    if flip_dx:\n        flip_mat = flip_mat @ torch.Tensor([[-1, 0, 0], [0, 1, 0], [0, 0, 1]])\n    if flip_dy:\n        flip_mat = flip_mat @ torch.Tensor([[1, 0, 0], [0, -1, 0], [0, 0, 1]])\n    rot_mat = flip_mat @ (scale_mat @ rot_mat)\n    return rot_mat\n\n@PIPELINES.register_module()\nclass LoadAnnotationsBEVDepth():\n    def __init__(self, bda_aug_conf, classes, is_train=True,\n                 input_modality=None):\n        self.bda_aug_conf = bda_aug_conf\n        self.is_train = is_train\n        self.classes = classes\n        if input_modality == None:\n            input_modality = dict(\n                use_lidar=True,\n                use_camera=True,\n                use_radar=False,\n                use_map=False,\n                use_external=False)\n        self.input_modality = input_modality\n\n    def sample_bda_augmentation(self):\n        \"\"\"Generate bda augmentation values based on bda_config.\"\"\"\n        if self.is_train:\n            rotate_bda = np.random.uniform(*self.bda_aug_conf['rot_lim'])\n            scale_bda = np.random.uniform(*self.bda_aug_conf['scale_lim'])\n            flip_dx = np.random.uniform() < self.bda_aug_conf['flip_dx_ratio']\n            flip_dy = np.random.uniform() < self.bda_aug_conf['flip_dy_ratio']\n        else:\n            rotate_bda = 0\n            scale_bda = 1.0\n            flip_dx = False\n            flip_dy = False\n        return rotate_bda, scale_bda, flip_dx, flip_dy\n\n    def __call__(self, results):\n        rotate_bda, scale_bda, flip_dx, flip_dy = self.sample_bda_augmentation()\n        \n        bda_mat = torch.zeros(4, 4)\n        bda_mat[3, 3] = 1\n        bda_rot = bev_transform(rotate_bda, scale_bda, flip_dx, flip_dy)\n        bda_mat[:3, :3] = bda_rot\n\n        results['bda_mat'] = bda_rot\n        if 'points' in results.keys():\n            results['points'].rotate(bda_rot)\n        \n        if self.input_modality['use_camera']:\n            assert len(results['img_inputs']) == 8\n            imgs, rots, trans, intrins, post_rots, post_trans, gt_depths, sensor2sensors = results['img_inputs']\n            results['img_inputs'] = (imgs, rots, trans, intrins, post_rots, post_trans, bda_rot, imgs.shape[-2:], gt_depths, sensor2sensors)\n        \n        return results\n\nclass PhotoMetricDistortionMultiViewImage(object):\n    \"\"\"Apply photometric distortion to image sequentially, every transformation\n    is applied with a probability of 0.5. The position of random contrast is in\n    second or second to last.\n    1. random brightness\n    2. random contrast (mode 0)\n    3. convert color from BGR to HSV\n    4. random saturation\n    5. random hue\n    6. convert color from HSV to BGR\n    7. 
random contrast (mode 1)\n    8. randomly swap channels\n    Args:\n        brightness_delta (int): delta of brightness.\n        contrast_range (tuple): range of contrast.\n        saturation_range (tuple): range of saturation.\n        hue_delta (int): delta of hue.\n    \"\"\"\n\n    def __init__(self,\n                 brightness_delta=32,\n                 contrast_range=(0.5, 1.5),\n                 saturation_range=(0.5, 1.5),\n                 hue_delta=18):\n        self.brightness_delta = brightness_delta\n        self.contrast_lower, self.contrast_upper = contrast_range\n        self.saturation_lower, self.saturation_upper = saturation_range\n        self.hue_delta = hue_delta\n\n    def __call__(self, img):\n        \"\"\"Call function to perform photometric distortion on images.\n        Args:\n            results (dict): Result dict from loading pipeline.\n        Returns:\n            dict: Result dict with images distorted.\n        \"\"\"\n        \n        # convert PIL Image to Ndarray float32\n        img = np.array(img, dtype=np.float32)\n\n        assert img.dtype == np.float32, \\\n            'PhotoMetricDistortion needs the input image of dtype np.float32,'\\\n            ' please set \"to_float32=True\" in \"LoadImageFromFile\" pipeline'\n        # random brightness\n        if random.randint(2):\n            delta = random.uniform(-self.brightness_delta,\n                                self.brightness_delta)\n            img += delta\n\n        # mode == 0 --> do random contrast first\n        # mode == 1 --> do random contrast last\n        mode = random.randint(2)\n        if mode == 1:\n            if random.randint(2):\n                alpha = random.uniform(self.contrast_lower,\n                                    self.contrast_upper)\n                img *= alpha\n\n        # convert color from BGR to HSV\n        img = mmcv.bgr2hsv(img)\n\n        # random saturation\n        if random.randint(2):\n            img[..., 1] *= random.uniform(self.saturation_lower,\n                                        self.saturation_upper)\n\n        # random hue\n        if random.randint(2):\n            img[..., 0] += random.uniform(-self.hue_delta, self.hue_delta)\n            img[..., 0][img[..., 0] > 360] -= 360\n            img[..., 0][img[..., 0] < 0] += 360\n\n        # convert color from HSV to BGR\n        img = mmcv.hsv2bgr(img)\n\n        # random contrast\n        if mode == 0:\n            if random.randint(2):\n                alpha = random.uniform(self.contrast_lower,\n                                    self.contrast_upper)\n                img *= alpha\n\n        # randomly swap channels\n        if random.randint(2):\n            img = img[..., random.permutation(3)]\n        \n        img = Image.fromarray(img.astype(np.uint8))\n        \n        return img"
  },
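  {
    "path": "docs/examples/post_homography_sketch.py",
    "content": "# NOTE: Illustrative sketch added for documentation; not part of the original Cam4DOcc release.\n# It demonstrates the 2D bookkeeping done by LoadMultiViewImageFromFiles_BEVDet.img_transform\n# above for the flip-free, rotation-free case used in training: after resizing by `resize`\n# and cropping at `crop`, a pixel of the raw image is recovered as post_rot @ p + post_tran.\nimport torch\n\nresize = 0.5\ncrop = (10, 20, 138, 84)  # (x1, y1, x2, y2) in the resized image; placeholder values\n\n# The updates img_transform applies when flip is None and rotate == 0\n# (the rotation block then reduces to the identity):\npost_rot = torch.eye(2) * resize      # post_rot *= resize\npost_tran = -torch.Tensor(crop[:2])   # post_tran -= crop[:2]\n\np_src = torch.Tensor([100.0, 60.0])   # pixel in the raw image\np_dst = post_rot @ p_src + post_tran  # same pixel in the augmented image\nprint(p_dst)  # -> tensor([40., 10.])\n"
  },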
  {
    "path": "projects/occ_plugin/datasets/pipelines/loading_instance.py",
    "content": "# Developed by Junyi Ma based on the codebase of PowerBEV\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nimport numpy as np\nfrom mmdet.datasets.builder import PIPELINES\nimport os\nimport torch\nfrom pyquaternion import Quaternion\nfrom nuscenes.utils.data_classes import Box\nimport time\n\n@PIPELINES.register_module()\nclass LoadInstanceWithFlow(object):\n    def __init__(self, cam4docc_dataset_path, grid_size=[512, 512, 40], pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], background=0, \n                    use_flow=True, use_separate_classes=False, use_lyft=False):\n        '''\n        Loading sequential occupancy labels and instance flows for training and testing\n\n        cam4docc_dataset_path: data path of Cam4DOcc dataset, including 'segmentation', 'instance', and 'flow'\n        grid_size: number of grids along H W L, default: [512, 512, 40]\n        pc_range: predefined ranges along H W L, default: [-51.2, -51.2, -5.0, 51.2, 51.2, 3.0]\n        background: background pixel value for segmentation/instance/flow maps, default: 0\n        use_flow: whether use flow for training schemes, default: True\n        '''\n\n        self.cam4docc_dataset_path = cam4docc_dataset_path\n\n        self.pc_range = pc_range\n        self.resolution = [(self.pc_range[3+i] - self.pc_range[i])/grid_size[i] for i in range(len(self.pc_range[:3]))]\n        self.start_position = [self.pc_range[i] + self.resolution[i] / 2.0 for i in range(len(self.pc_range[:3]))]\n        self.dimension = grid_size\n\n        self.pc_range = np.array(self.pc_range)\n        self.resolution = np.array(self.resolution)\n        self.start_position = np.array(self.start_position)\n        self.dimension = np.array(self.dimension)\n\n        self.background = background\n        self.use_flow = use_flow\n        self.use_separate_classes = use_separate_classes\n\n        self.use_lyft = use_lyft\n\n    def get_poly_region(self, instance_annotation, present_egopose, present_ego2lidar):\n        \"\"\"\n        Obtain the bounding box polygon of the instance\n        \"\"\"\n        present_ego_translation, present_ego_rotation = present_egopose\n        present_ego2lidar_translation, present_ego2lidar_rotation = present_ego2lidar\n\n        box = Box(\n            instance_annotation['translation'], instance_annotation['size'], Quaternion(instance_annotation['rotation'])\n        )\n        box.translate(present_ego_translation)\n        box.rotate(present_ego_rotation)\n\n        box.translate(present_ego2lidar_translation)\n        box.rotate(present_ego2lidar_rotation)\n        pts=box.corners().T\n\n        X_min_box = pts.min(axis=0)[0]\n        X_max_box = pts.max(axis=0)[0]\n        Y_min_box = pts.min(axis=0)[1]\n        Y_max_box = pts.max(axis=0)[1]\n        Z_min_box = pts.min(axis=0)[2]\n        Z_max_box = pts.max(axis=0)[2]\n\n        if self.pc_range[0] <= X_min_box and X_max_box <= self.pc_range[3] \\\n                and self.pc_range[1] <= Y_min_box and Y_max_box <= self.pc_range[4] \\\n                and self.pc_range[2] <= Z_min_box and Z_max_box <= self.pc_range[5]:\n            pts = np.round((pts - self.start_position[:3] + self.resolution[:3] / 2.0) / self.resolution[:3]).astype(np.int32)\n\n            return pts\n        else:\n            return None\n\n    def fill_occupancy(self, occ_instance, occ_segmentation, occ_attribute_label, instance_fill_info):\n\n        x_grid = 
torch.linspace(0, self.dimension[0]-1, self.dimension[0], dtype=torch.float)\n        x_grid = x_grid.view(self.dimension[0], 1, 1).expand(self.dimension[0], self.dimension[1], self.dimension[2])\n        y_grid = torch.linspace(0, self.dimension[1]-1, self.dimension[1], dtype=torch.float)\n        y_grid = y_grid.view(1, self.dimension[1], 1).expand(self.dimension[0], self.dimension[1], self.dimension[2])\n        z_grid = torch.linspace(0, self.dimension[2]-1, self.dimension[2], dtype=torch.float)\n        z_grid = z_grid.view(1, 1, self.dimension[2]).expand(self.dimension[0], self.dimension[1], self.dimension[2])\n        mesh_grid_3d = torch.stack((x_grid, y_grid, z_grid), -1)\n        mesh_grid_3d = mesh_grid_3d.view(-1, 3)\n\n        occ_instance = torch.from_numpy(occ_instance).view(-1, 1)\n        occ_segmentation = torch.from_numpy(occ_segmentation).view(-1, 1)\n        occ_attribute_label = torch.from_numpy(occ_attribute_label).view(-1, 1)\n\n        for instance_info in instance_fill_info:\n            poly_region_pts = instance_info['poly_region']\n            semantic_id = instance_info['semantic_id']\n            instance_id = instance_info['instance_id']\n            attribute_label=instance_info['attribute_label']\n\n            X_min_box = poly_region_pts.min(axis=0)[0]\n            X_max_box = poly_region_pts.max(axis=0)[0]\n            Y_min_box = poly_region_pts.min(axis=0)[1]\n            Y_max_box = poly_region_pts.max(axis=0)[1]\n            Z_min_box = poly_region_pts.min(axis=0)[2]\n            Z_max_box = poly_region_pts.max(axis=0)[2]\n\n            mask_cur_instance = (mesh_grid_3d[:,0] >= X_min_box) & (X_max_box >= mesh_grid_3d[:,0]) \\\n                                & (mesh_grid_3d[:,1] >= Y_min_box) & (Y_max_box >= mesh_grid_3d[:,1]) \\\n                                & (mesh_grid_3d[:,2] >= Z_min_box) & (Z_max_box >= mesh_grid_3d[:,2])\n            occ_instance[mask_cur_instance] = instance_id\n            occ_segmentation[mask_cur_instance] = semantic_id\n            occ_attribute_label[mask_cur_instance] = attribute_label\n        \n        occ_instance = occ_instance.view(self.dimension[0], self.dimension[1], self.dimension[2]).long()\n        occ_segmentation = occ_segmentation.view(self.dimension[0], self.dimension[1], self.dimension[2]).long()\n        occ_attribute_label = occ_attribute_label.view(self.dimension[0], self.dimension[1], self.dimension[2]).long()\n\n        return occ_instance, occ_segmentation, occ_attribute_label\n\n\n    def get_label(self, input_seq_data):\n        \"\"\"\n        Generate labels for semantic segmentation, instance segmentation, z position, attribute from the raw data of nuScenes\n        \"\"\"\n        timestep = self.counter\n        # Background is ID 0\n        segmentation = np.ones((self.dimension[0], self.dimension[1], self.dimension[2])) * self.background\n        instance = np.ones((self.dimension[0], self.dimension[1], self.dimension[2])) * self.background\n        attribute_label = np.ones((self.dimension[0], self.dimension[1], self.dimension[2]))  * self.background\n        \n        instance_dict = input_seq_data['instance_dict']\n        egopose_list = input_seq_data['egopose_list']\n        ego2lidar_list = input_seq_data['ego2lidar_list']\n        time_receptive_field = input_seq_data['time_receptive_field']\n\n        instance_fill_info = []\n        \n        for instance_token, instance_annotation in instance_dict.items():\n            if timestep not in instance_annotation['timestep']:\n       
         continue\n            pointer = instance_annotation['timestep'].index(timestep)\n            annotation = {\n                'translation': instance_annotation['translation'][pointer],\n                'rotation': instance_annotation['rotation'][pointer],\n                'size': instance_annotation['size'],\n            }\n            \n            poly_region = self.get_poly_region(annotation, egopose_list[time_receptive_field - 1], ego2lidar_list[time_receptive_field - 1]) \n\n            if isinstance(poly_region, np.ndarray):\n                if self.counter >= time_receptive_field and instance_token not in self.visible_instance_set:\n                    continue\n                self.visible_instance_set.add(instance_token)\n\n                prepare_for_fill = dict(\n                    poly_region=poly_region,\n                    instance_id=instance_annotation['instance_id'],\n                    attribute_label=instance_annotation['attribute_label'][pointer],\n                    semantic_id=instance_annotation['semantic_id'],\n                )\n\n                instance_fill_info.append(prepare_for_fill)\n\n        instance, segmentation, attribute_label = self.fill_occupancy(instance, segmentation, attribute_label, instance_fill_info)\n\n        segmentation = segmentation.unsqueeze(0)\n        instance = instance.unsqueeze(0)\n        attribute_label = attribute_label.unsqueeze(0).unsqueeze(0)\n\n        return segmentation, instance, attribute_label\n\n\n    @staticmethod\n    def generate_flow(flow, occ_instance_seq, instance, instance_id):\n        \"\"\"\n        Generate ground truth for the flow of each instance based on instance segmentation\n        \"\"\"\n        seg_len, wx, wy, wz = occ_instance_seq.shape\n        ratio = 4\n        occ_instance_seq = occ_instance_seq.reshape(seg_len, wx//ratio, ratio, wy//ratio, ratio, wz//ratio, ratio).permute(0,1,3,5,2,4,6).reshape(seg_len, wx//ratio, wy//ratio, wz//ratio, ratio**3)\n        empty_mask = occ_instance_seq.sum(-1) == 0\n        occ_instance_seq = occ_instance_seq.to(torch.int64)\n        occ_space = occ_instance_seq[~empty_mask]\n        occ_space[occ_space==0] = -torch.arange(len(occ_space[occ_space==0])).to(occ_space.device) - 1 \n        occ_instance_seq[~empty_mask] = occ_space\n        occ_instance_seq = torch.mode(occ_instance_seq, dim=-1)[0]\n        occ_instance_seq[occ_instance_seq<0] = 0\n        occ_instance_seq = occ_instance_seq.long()\n\n        _, wx, wy, wz = occ_instance_seq.shape \n        x, y, z = torch.meshgrid(torch.arange(wx, dtype=torch.float), torch.arange(wy, dtype=torch.float), torch.arange(wz, dtype=torch.float))\n        \n        grid = torch.stack((x, y, z), dim=0)\n\n        # Set the first frame\n        init_pointer = instance['timestep'][0]\n        instance_mask = (occ_instance_seq[init_pointer] == instance_id)\n\n        flow[init_pointer, 0, instance_mask] = grid[0, instance_mask].mean(dim=0, keepdim=True).round() - grid[0, instance_mask]\n        flow[init_pointer, 1, instance_mask] = grid[1, instance_mask].mean(dim=0, keepdim=True).round() - grid[1, instance_mask]\n        flow[init_pointer, 2, instance_mask] = grid[2, instance_mask].mean(dim=0, keepdim=True).round() - grid[2, instance_mask]\n\n        for i, timestep in enumerate(instance['timestep']):\n            if i == 0:\n                continue\n\n            instance_mask = (occ_instance_seq[timestep] == instance_id)\n            prev_instance_mask = (occ_instance_seq[timestep-1] == instance_id)\n        
    if instance_mask.sum() == 0 or prev_instance_mask.sum() == 0:\n                continue\n\n            flow[timestep, 0, instance_mask] = grid[0, prev_instance_mask].mean(dim=0, keepdim=True).round() - grid[0, instance_mask]\n            flow[timestep, 1, instance_mask] = grid[1, prev_instance_mask].mean(dim=0, keepdim=True).round() - grid[1, instance_mask]\n            flow[timestep, 2, instance_mask] = grid[2, prev_instance_mask].mean(dim=0, keepdim=True).round() - grid[2, instance_mask]\n\n        return flow\n\n    def get_flow_label(self, input_seq_data, ignore_index=255):\n        \"\"\"\n        Generate the global map of the flow ground truth\n        \"\"\"\n        occ_instance = input_seq_data['instance']\n        instance_dict = input_seq_data['instance_dict']\n        instance_map = input_seq_data['instance_map']\n\n        seq_len, wx, wy, wz = occ_instance.shape\n        ratio = 4\n        flow = ignore_index * torch.ones(seq_len, 3, wx//ratio, wy//ratio, wz//ratio)\n\n        # ignore flow generation for faster pipelines\n        if not self.use_flow:\n            return flow\n\n        for token, instance in instance_dict.items():\n            flow = self.generate_flow(flow, occ_instance, instance, instance_map[token])\n        return flow.float()\n\n    # set ignore index to 0 for vis\n    @staticmethod\n    def convert_instance_mask_to_center_and_offset_label(input_seq_data, ignore_index=255, sigma=3):\n        occ_instance = input_seq_data['instance']\n        num_instances = len(input_seq_data['instance_map'])\n\n        seq_len, wx, wy, wz = occ_instance.shape\n        center_label = torch.zeros(seq_len, 1, wx, wy, wz)\n        offset_label = ignore_index * torch.ones(seq_len, 3, wx, wy, wz)\n        # x, y, z index the voxel grid along each axis\n        x, y, z = torch.meshgrid(torch.arange(wx, dtype=torch.float), torch.arange(wy, dtype=torch.float), torch.arange(wz, dtype=torch.float))\n\n        # Ignore id 0 which is the background\n        for instance_id in range(1, num_instances+1):\n            for t in range(seq_len):\n                instance_mask = (occ_instance[t] == instance_id)\n\n                xc = x[instance_mask].mean().round().long()\n                yc = y[instance_mask].mean().round().long()\n                zc = z[instance_mask].mean().round().long()\n\n                off_x = xc - x\n                off_y = yc - y\n                off_z = zc - z\n\n                g = torch.exp(-(off_x ** 2 + off_y ** 2 + off_z ** 2) / sigma ** 2)\n                center_label[t, 0] = torch.maximum(center_label[t, 0], g)\n                offset_label[t, 0, instance_mask] = off_x[instance_mask]\n                offset_label[t, 1, instance_mask] = off_y[instance_mask]\n                offset_label[t, 2, instance_mask] = off_z[instance_mask]\n\n        return center_label, offset_label\n\n    def __call__(self, results):\n        assert 'segmentation' not in results.keys()\n        assert 'instance' not in results.keys()\n        assert 'attribute_label' not in results.keys()\n\n        time_receptive_field = results['time_receptive_field']\n\n        prefix = \"MMO\" if self.use_separate_classes else \"GMO\"\n        if self.use_lyft:\n            prefix = prefix + \"_lyft\"\n\n        # makedirs with exist_ok=True creates nested label dirs safely, also under\n        # multi-worker data loading\n        seg_label_dir = os.path.join(self.cam4docc_dataset_path, prefix, \"segmentation\")\n        if not os.path.exists(seg_label_dir):\n            os.makedirs(seg_label_dir, exist_ok=True)\n        seg_label_path = os.path.join(seg_label_dir, \\\n            results['input_dict'][time_receptive_field-1]['scene_token']+\"_\"+results['input_dict'][time_receptive_field-1]['lidar_token'])\n\n        instance_label_dir = os.path.join(self.cam4docc_dataset_path, prefix, \"instance\")\n        if not os.path.exists(instance_label_dir):\n            os.makedirs(instance_label_dir, exist_ok=True)\n        instance_label_path = os.path.join(instance_label_dir, \\\n            results['input_dict'][time_receptive_field-1]['scene_token']+\"_\"+results['input_dict'][time_receptive_field-1]['lidar_token'])\n\n        flow_label_dir = os.path.join(self.cam4docc_dataset_path, prefix, \"flow\")\n        if not os.path.exists(flow_label_dir):\n            os.makedirs(flow_label_dir, exist_ok=True)\n        flow_label_path = os.path.join(flow_label_dir, \\\n            results['input_dict'][time_receptive_field-1]['scene_token']+\"_\"+results['input_dict'][time_receptive_field-1]['lidar_token'])\n\n        segmentation_list = []\n        if os.path.exists(seg_label_path+\".npz\"):\n            gt_segmentation_arr = np.load(seg_label_path+\".npz\",allow_pickle=True)['arr_0']\n            for j in range(len(gt_segmentation_arr)):\n                # np.ones so that a non-zero background value is respected\n                # (np.zeros * background would always be 0)\n                segmentation = np.ones((self.dimension[0], self.dimension[1], self.dimension[2])) * self.background\n                gt_segmentation = gt_segmentation_arr[j]\n                gt_segmentation = torch.from_numpy(gt_segmentation)\n                # for i in range(gt_segmentation.shape[0]):\n                #     cur_ind = gt_segmentation[i, :3].long()\n                #     cur_label = gt_segmentation[i, -1]\n                #     segmentation[cur_ind[0],cur_ind[1],cur_ind[2]] = cur_label\n                segmentation[gt_segmentation[:, 0].long(), gt_segmentation[:, 1].long(), gt_segmentation[:, 2].long()] = gt_segmentation[:, -1]\n                segmentation = torch.from_numpy(segmentation).unsqueeze(0)\n                segmentation_list.append(segmentation)\n\n        instance_list = []\n        if os.path.exists(instance_label_path+\".npz\"):\n            gt_instance_arr = np.load(instance_label_path+\".npz\",allow_pickle=True)['arr_0']\n\n            for j in range(len(gt_instance_arr)):\n                instance = np.ones((self.dimension[0], self.dimension[1], self.dimension[2])) * self.background\n                gt_instance = gt_instance_arr[j]\n                gt_instance = torch.from_numpy(gt_instance)\n                # for i in range(gt_instance.shape[0]):\n                #     cur_ind = gt_instance[i, :3].long()\n                #     cur_label = gt_instance[i, -1]\n                #     instance[cur_ind[0],cur_ind[1],cur_ind[2]] = cur_label\n                instance[gt_instance[:, 0].long(), gt_instance[:, 1].long(), gt_instance[:, 2].long()] = gt_instance[:, -1]\n                instance = torch.from_numpy(instance).unsqueeze(0)\n                instance_list.append(instance)\n\n        flow_list = []\n        if os.path.exists(flow_label_path+\".npz\"):\n            gt_flow_arr = np.load(flow_label_path+\".npz\",allow_pickle=True)['arr_0']\n\n            for j in range(len(gt_flow_arr)):\n                flow = np.ones((3, self.dimension[0]//4, self.dimension[1]//4, self.dimension[2]//4)) * 255\n                gt_flow = gt_flow_arr[j]\n                gt_flow = torch.from_numpy(gt_flow)\n                # for i in range(gt_flow.shape[0]):\n                #     cur_ind = gt_flow[i, :3].long()\n                #     cur_label = gt_flow[i, 3:]\n                #     flow[0, cur_ind[0],cur_ind[1],cur_ind[2]] = cur_label[0]\n     
           #     flow[1, cur_ind[0],cur_ind[1],cur_ind[2]] = cur_label[1]\n                #     flow[2, cur_ind[0],cur_ind[1],cur_ind[2]] = cur_label[2]\n                flow[:, gt_flow[:, 0].long(), gt_flow[:, 1].long(), gt_flow[:, 2].long()] = gt_flow[:, 3:].permute(1, 0)\n                flow = torch.from_numpy(flow).unsqueeze(0)\n                flow_list.append(flow)\n\n\n        if os.path.exists(seg_label_path+\".npz\") and os.path.exists(instance_label_path+\".npz\") and os.path.exists(flow_label_path+\".npz\"):\n            results['segmentation'] = torch.cat(segmentation_list, dim=0)\n            results['instance'] = torch.cat(instance_list, dim=0)\n            results['attribute_label'] =  torch.from_numpy(np.zeros((self.dimension[0], self.dimension[1], self.dimension[2]))).unsqueeze(0)\n            results['flow'] = torch.cat(flow_list, dim=0).float()\n\n            for key, value in results.items():\n                if key in ['sample_token', 'centerness', 'offset', 'flow', 'time_receptive_field', \"indices\", \\\n                  'segmentation','instance','attribute_label','sequence_length', 'instance_dict', 'instance_map', 'input_dict', 'egopose_list','ego2lidar_list','scene_token']:\n                    continue\n                results[key] = torch.cat(value, dim=0)\n            return results\n\n        else:\n            results['segmentation'] = []\n            results['instance'] = []\n            results['attribute_label'] = []\n\n            segmentation_saved_list = []\n            instance_saved_list = []\n\n            sequence_length = results['sequence_length']\n            self.visible_instance_set = set()\n            for self.counter in range(sequence_length):\n                segmentation, instance, attribute_label = self.get_label(results)\n                results['segmentation'].append(segmentation)\n                results['instance'].append(instance)\n                results['attribute_label'].append(attribute_label)\n\n                x_grid = torch.linspace(0, self.dimension[0]-1, self.dimension[0], dtype=torch.long)\n                x_grid = x_grid.view(self.dimension[0], 1, 1).expand(self.dimension[0], self.dimension[1], self.dimension[2])\n                y_grid = torch.linspace(0, self.dimension[1]-1, self.dimension[1], dtype=torch.long)\n                y_grid = y_grid.view(1, self.dimension[1], 1).expand(self.dimension[0], self.dimension[1], self.dimension[2])\n                z_grid = torch.linspace(0, self.dimension[2]-1, self.dimension[2], dtype=torch.long)\n                z_grid = z_grid.view(1, 1, self.dimension[2]).expand(self.dimension[0], self.dimension[1], self.dimension[2])\n                segmentation_for_save = torch.stack((x_grid, y_grid, z_grid), -1)\n                segmentation_for_save = segmentation_for_save.view(-1, 3)\n                segmentation_label = segmentation.squeeze(0).view(-1,1)\n                segmentation_for_save = torch.cat((segmentation_for_save, segmentation_label), dim=-1)\n                kept = segmentation_for_save[:,-1]!=0\n                segmentation_for_save= segmentation_for_save[kept]\n                segmentation_saved_list.append(segmentation_for_save)\n\n\n                x_grid = torch.linspace(0, self.dimension[0]-1, self.dimension[0], dtype=torch.long)\n                x_grid = x_grid.view(self.dimension[0], 1, 1).expand(self.dimension[0], self.dimension[1], self.dimension[2])\n                y_grid = torch.linspace(0, self.dimension[1]-1, self.dimension[1], dtype=torch.long)\n           
     y_grid = y_grid.view(1, self.dimension[1], 1).expand(self.dimension[0], self.dimension[1], self.dimension[2])\n                z_grid = torch.linspace(0, self.dimension[2]-1, self.dimension[2], dtype=torch.long)\n                z_grid = z_grid.view(1, 1, self.dimension[2]).expand(self.dimension[0], self.dimension[1], self.dimension[2])\n                instance_for_save = torch.stack((x_grid, y_grid, z_grid), -1)\n                instance_for_save = instance_for_save.view(-1, 3)\n                instance_label = instance.squeeze(0).view(-1,1)\n                instance_for_save = torch.cat((instance_for_save, instance_label), dim=-1)\n                kept = instance_for_save[:,-1]!=0\n                instance_for_save= instance_for_save[kept]\n                instance_saved_list.append(instance_for_save)\n            \n            segmentation_saved_list2 = [item.cpu().detach().numpy() for item in segmentation_saved_list]\n            instance_saved_list2 = [item.cpu().detach().numpy() for item in instance_saved_list]\n\n            np.savez(seg_label_path, segmentation_saved_list2)\n            np.savez(instance_label_path, instance_saved_list2)\n\n            results['segmentation'] = torch.cat(results['segmentation'], dim=0)\n            results['instance'] = torch.cat(results['instance'], dim=0)\n            results['attribute_label'] =  torch.from_numpy(np.zeros((self.dimension[0], self.dimension[1], self.dimension[2]))).unsqueeze(0)\n\n            results['flow'] = self.get_flow_label(results, ignore_index=255)\n            \n            flow_saved_list = []\n            sequence_length = results['sequence_length']\n            d0 = self.dimension[0]//4\n            d1 = self.dimension[1]//4 \n            d2 = self.dimension[2]//4 \n            for cnt in range(sequence_length):\n                flow = results['flow'][cnt, ...]\n                x_grid = torch.linspace(0, d0-1, d0, dtype=torch.long)\n                x_grid = x_grid.view(d0, 1, 1).expand(d0, d1, d2)\n                y_grid = torch.linspace(0, d1-1, d1, dtype=torch.long)\n                y_grid = y_grid.view(1, d1, 1).expand(d0, d1, d2)\n                z_grid = torch.linspace(0, d2-1, d2, dtype=torch.long)\n                z_grid = z_grid.view(1, 1, d2).expand(d0, d1, d2)\n                flow_for_save = torch.stack((x_grid, y_grid, z_grid), -1)\n                flow_for_save = flow_for_save.view(-1, 3)\n                flow_label = flow.permute(1,2,3,0).view(-1,3)\n                flow_for_save = torch.cat((flow_for_save, flow_label), dim=-1)\n                kept = (flow_for_save[:,-1]!=255) & (flow_for_save[:,-2]!=255) & (flow_for_save[:,-3]!=255)\n                flow_for_save= flow_for_save[kept]\n                flow_saved_list.append(flow_for_save)\n\n            flow_saved_list2 = [item.cpu().detach().numpy() for item in flow_saved_list]\n            np.savez(flow_label_path, flow_saved_list2)\n\n            for key, value in results.items():\n                if key in ['sample_token', 'centerness', 'offset', 'flow', 'time_receptive_field', \"indices\", \\\n                   'segmentation','instance','attribute_label','sequence_length', 'instance_dict', 'instance_map', 'input_dict', 'egopose_list','ego2lidar_list','scene_token']:\n                    continue\n                results[key] = torch.cat(value, dim=0)\n\n        return results"
  },
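The cache branch above stores each frame's segmentation/instance/flow labels as sparse (x, y, z, value) rows in `.npz` archives and rebuilds dense grids from them on later runs. A minimal sketch of that round trip for a single dense grid, written with `torch.nonzero` instead of the meshgrid-and-filter construction above (helper names are illustrative, not from the repo):

```python
import numpy as np
import torch

def dense_to_sparse(label: torch.Tensor) -> torch.Tensor:
    """(X, Y, Z) dense grid -> (N, 4) rows of (x, y, z, value), non-empty voxels only."""
    idx = torch.nonzero(label, as_tuple=False)                # (N, 3) voxel indices
    val = label[idx[:, 0], idx[:, 1], idx[:, 2]].view(-1, 1)  # (N, 1) labels
    return torch.cat((idx, val), dim=-1)

def sparse_to_dense(rows: np.ndarray, dims) -> torch.Tensor:
    """Rebuild the dense grid; voxels absent from `rows` stay 0 (free space)."""
    grid = torch.zeros(tuple(dims), dtype=torch.long)
    t = torch.from_numpy(rows).long()
    grid[t[:, 0], t[:, 1], t[:, 2]] = t[:, 3]
    return grid

dense = torch.zeros((8, 8, 4), dtype=torch.long)
dense[1, 2, 3] = 5
np.savez("/tmp/seg_label", dense_to_sparse(dense).numpy())    # saved under key arr_0
restored = sparse_to_dense(np.load("/tmp/seg_label.npz")["arr_0"], (8, 8, 4))
assert torch.equal(dense, restored)
```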
  {
    "path": "projects/occ_plugin/datasets/pipelines/loading_occupancy.py",
    "content": "# Developed by Junyi Ma based on the codebase of OpenOccupancy\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nimport numpy as np\nimport numba as nb\n\nfrom mmdet.datasets.builder import PIPELINES\nimport yaml, os\nimport torch\nimport torch.nn.functional as F\nimport copy\n\n@PIPELINES.register_module()\nclass LoadOccupancy(object):\n\n    def __init__(self, to_float32=True, occ_path=None, grid_size=[512, 512, 40], pc_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0], unoccupied=0, gt_resize_ratio=1, use_fine_occ=False, test_mode=False):\n        '''\n        Read sequential fine-grained occupancy labels from nuScenes-Occupancy if use_fine_occ=True\n        '''\n        self.to_float32 = to_float32\n        self.occ_path = occ_path\n\n        self.grid_size = np.array(grid_size)\n        self.unoccupied = unoccupied\n        self.pc_range = np.array(pc_range)\n        self.voxel_size = (self.pc_range[3:] - self.pc_range[:3]) / self.grid_size\n        self.gt_resize_ratio = gt_resize_ratio\n        self.use_fine_occ = use_fine_occ\n\n        self.test_mode = test_mode\n\n\n    def get_seq_pseudo_occ(self, results):\n        sequence_length = results['sequence_length']\n        gt_occ_seq = []\n\n        for count in range(sequence_length):\n            processed_label = np.ones(self.grid_size, dtype=np.uint8) * self.unoccupied\n            processed_label = torch.from_numpy(processed_label)\n            gt_occ_seq.append(processed_label)\n\n        gt_occ_seq = torch.stack(gt_occ_seq)\n        return gt_occ_seq\n\n    def get_seq_occ(self, results):\n        sequence_length = results['sequence_length']\n        gt_occ_seq = []\n\n        for count in range(sequence_length):\n\n            scene_token_cur = results['input_dict'][count]['scene_token']\n            lidar_token_cur = results['input_dict'][count]['lidar_token']\n\n            rel_path = 'scene_{0}/occupancy/{1}.npy'.format(scene_token_cur, lidar_token_cur)\n            #  [z y x cls] or [z y x vx vy vz cls]\n            pcd = np.load(os.path.join(self.occ_path, rel_path))\n            pcd_label = pcd[..., -1:]\n            pcd_label[pcd_label==0] = 255\n            pcd_np_cor = self.voxel2world(pcd[..., [2,1,0]] + 0.5)\n            untransformed_occ = copy.deepcopy(pcd_np_cor)\n\n            egopose_list = results['egopose_list']\n            ego2lidar_list = results['ego2lidar_list']\n            time_receptive_field = results['time_receptive_field']\n            present_global2ego = egopose_list[time_receptive_field - 1]\n            present_ego2lidar = ego2lidar_list[time_receptive_field - 1]\n            cur_global2ego = egopose_list[count]\n            cur_ego2lidar = ego2lidar_list[count]\n\n            pcd_np_cor = np.dot(cur_ego2lidar[1].inverse.rotation_matrix, pcd_np_cor.T)\n            pcd_np_cor = pcd_np_cor.T\n            pcd_np_cor = pcd_np_cor - cur_ego2lidar[0]   # trans\n            # cur_ego -> global \n            pcd_np_cor = np.dot(cur_global2ego[1].inverse.rotation_matrix, pcd_np_cor.T)    # rot    \n            pcd_np_cor = pcd_np_cor.T\n            pcd_np_cor = pcd_np_cor - cur_global2ego[0]   # trans\n            # global -> present_ego  \n            pcd_np_cor = pcd_np_cor + present_global2ego[0]   # trans\n            pcd_np_cor = np.dot(present_global2ego[1].rotation_matrix, pcd_np_cor.T)\n            pcd_np_cor = pcd_np_cor.T\n            # present_ego -> present_lidar\n            pcd_np_cor = 
pcd_np_cor + present_ego2lidar[0]   # trans\n            pcd_np_cor = np.dot(present_ego2lidar[1].rotation_matrix, pcd_np_cor.T)    # rot    \n            pcd_np_cor = pcd_np_cor.T            \n\n            pcd_np_cor = self.world2voxel(pcd_np_cor)\n\n            # make sure the point is in the grid\n            pcd_np_cor = np.clip(pcd_np_cor, np.array([0,0,0]), self.grid_size - 1)\n            transformed_occ = copy.deepcopy(pcd_np_cor)\n            pcd_np = np.concatenate([pcd_np_cor, pcd_label], axis=-1)\n\n            # 255: noise, 1-16 normal classes, 0 unoccupied\n            pcd_np = pcd_np[np.lexsort((pcd_np_cor[:, 0], pcd_np_cor[:, 1], pcd_np_cor[:, 2])), :]\n            pcd_np = pcd_np.astype(np.int64)\n            processed_label = np.ones(self.grid_size, dtype=np.uint8) * self.unoccupied\n            processed_label = nb_process_label(processed_label, pcd_np)\n\n            processed_label = torch.from_numpy(processed_label)\n\n            # TODO: hard coding\n            for otheridx in [0,1,7,8,11,12,13,14,15,16,17,18,255]:\n                processed_label[processed_label==otheridx] = 0\n            for vehidx in [2,3,4,5,6,9,10]:\n                processed_label[processed_label==vehidx] = 1\n\n            gt_occ_seq.append(processed_label)\n\n        gt_occ_seq = torch.stack(gt_occ_seq)\n        return gt_occ_seq\n    \n\n    def __call__(self, results):\n        if self.use_fine_occ:\n            results['gt_occ'] = self.get_seq_occ(results)\n        else:\n            results['gt_occ'] = self.get_seq_pseudo_occ(results)\n\n        return results\n\n    def voxel2world(self, voxel):\n        \"\"\"\n        voxel: [N, 3]\n        \"\"\"\n        return voxel * self.voxel_size[None, :] + self.pc_range[:3][None, :]\n\n\n    def world2voxel(self, world):\n        \"\"\"\n        world: [N, 3]\n        \"\"\"\n        return (world - self.pc_range[:3][None, :]) / self.voxel_size[None, :]\n\n\n    def __repr__(self):\n        \"\"\"str: Return a string that describes the module.\"\"\"\n        repr_str = self.__class__.__name__\n        repr_str += f'(to_float32={self.to_float32}'\n        return repr_str\n\n    def project_points(self, points, rots, trans, intrins, post_rots, post_trans):\n        \n        # from lidar to camera\n        points = points.reshape(-1, 1, 3)\n        points = points - trans.reshape(1, -1, 3)\n        inv_rots = rots.inverse().unsqueeze(0)\n        points = (inv_rots @ points.unsqueeze(-1))\n        \n        # from camera to raw pixel\n        points = (intrins.unsqueeze(0) @ points).squeeze(-1)\n        points_d = points[..., 2:3]\n        points_uv = points[..., :2] / points_d\n        \n        # from raw pixel to transformed pixel\n        points_uv = post_rots[:, :2, :2].unsqueeze(0) @ points_uv.unsqueeze(-1)\n        points_uv = points_uv.squeeze(-1) + post_trans[..., :2].unsqueeze(0)\n        points_uvd = torch.cat((points_uv, points_d), dim=2)\n        \n        return points_uvd\n    \n# b1:boolean, u1: uint8, i2: int16, u2: uint16\n@nb.jit('b1[:](i2[:,:],u2[:,:],b1[:])', nopython=True, cache=True, parallel=False)\ndef nb_process_img_points(basic_valid_occ, depth_canva, nb_valid_mask):\n    # basic_valid_occ M 3\n    # depth_canva H W\n    # label_size = M   # for original occ, small: 2w mid: ~8w base: ~30w\n    canva_idx = -1 * np.ones_like(depth_canva, dtype=np.int16)\n    for i in range(basic_valid_occ.shape[0]):\n        occ = basic_valid_occ[i]\n        if occ[2] < depth_canva[occ[1], occ[0]]:\n            if canva_idx[occ[1], 
occ[0]] != -1:\n                nb_valid_mask[canva_idx[occ[1], occ[0]]] = False\n\n            canva_idx[occ[1], occ[0]] = i\n            depth_canva[occ[1], occ[0]] = occ[2]\n            nb_valid_mask[i] = True\n    return nb_valid_mask\n\n# u1: uint8, u2: uint16, i8: int64\n# per-voxel majority vote over voxel-sorted (x, y, z, label) rows; currently identical to nb_process_label below\n@nb.jit('u1[:,:,:](u1[:,:,:],i8[:,:])', nopython=True, cache=True, parallel=False)\ndef nb_process_label_withvel(processed_label, sorted_label_voxel_pair):\n    label_size = 256\n    counter = np.zeros((label_size,), dtype=np.uint16)\n    counter[sorted_label_voxel_pair[0, 3]] = 1\n    cur_sear_ind = sorted_label_voxel_pair[0, :3]\n    for i in range(1, sorted_label_voxel_pair.shape[0]):\n        cur_ind = sorted_label_voxel_pair[i, :3]\n        if not np.all(np.equal(cur_ind, cur_sear_ind)):\n            processed_label[cur_sear_ind[0], cur_sear_ind[1], cur_sear_ind[2]] = np.argmax(counter)\n            counter = np.zeros((label_size,), dtype=np.uint16)\n            cur_sear_ind = cur_ind\n        counter[sorted_label_voxel_pair[i, 3]] += 1\n    processed_label[cur_sear_ind[0], cur_sear_ind[1], cur_sear_ind[2]] = np.argmax(counter)\n    \n    return processed_label\n\n\n# u1: uint8, u2: uint16, i8: int64\n# assign each voxel the most frequent label among the points that fall into it\n@nb.jit('u1[:,:,:](u1[:,:,:],i8[:,:])', nopython=True, cache=True, parallel=False)\ndef nb_process_label(processed_label, sorted_label_voxel_pair):\n    label_size = 256\n    counter = np.zeros((label_size,), dtype=np.uint16)\n    counter[sorted_label_voxel_pair[0, 3]] = 1\n    cur_sear_ind = sorted_label_voxel_pair[0, :3]\n    for i in range(1, sorted_label_voxel_pair.shape[0]):\n        cur_ind = sorted_label_voxel_pair[i, :3]\n        if not np.all(np.equal(cur_ind, cur_sear_ind)):\n            processed_label[cur_sear_ind[0], cur_sear_ind[1], cur_sear_ind[2]] = np.argmax(counter)\n            counter = np.zeros((label_size,), dtype=np.uint16)\n            cur_sear_ind = cur_ind\n        counter[sorted_label_voxel_pair[i, 3]] += 1\n    processed_label[cur_sear_ind[0], cur_sear_ind[1], cur_sear_ind[2]] = np.argmax(counter)\n    \n    return processed_label"
  },
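`LoadOccupancy` converts between metric and voxel coordinates with `world2voxel`/`voxel2world`, then rasterizes the labeled points with a per-voxel majority vote (`nb_process_label`). A small NumPy-only illustration of both steps with the class defaults above (no numba, purely for intuition):

```python
import numpy as np

pc_range = np.array([-51.2, -51.2, -5.0, 51.2, 51.2, 3.0])
grid_size = np.array([512, 512, 40])
voxel_size = (pc_range[3:] - pc_range[:3]) / grid_size   # 0.2 m per voxel here

def world2voxel(world):
    # same formula as the class method
    return (world - pc_range[:3]) / voxel_size

# three points fall into the same voxel; the vote keeps the most frequent label
points = np.array([[0.05, 0.0, 0.0], [0.15, 0.0, 0.0], [0.10, 0.0, 0.0]])
labels = np.array([4, 4, 9])
vox = np.clip(world2voxel(points).astype(np.int64), 0, grid_size - 1)

grid = np.zeros(grid_size, dtype=np.uint8)               # 0 == unoccupied
for v in np.unique(vox, axis=0):
    in_voxel = labels[(vox == v).all(axis=1)]
    grid[tuple(v)] = np.bincount(in_voxel).argmax()
print(grid[256, 256, 25])                                # -> 4
```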
  {
    "path": "projects/occ_plugin/datasets/pipelines/transform_3d.py",
    "content": "import numpy as np\nfrom numpy import random\nimport mmcv\nfrom mmdet.datasets.builder import PIPELINES\nfrom mmcv.parallel import DataContainer as DC\n\n@PIPELINES.register_module()\nclass PadMultiViewImage(object):\n    \"\"\"Pad the multi-view image.\n    There are two padding modes: (1) pad to a fixed size and (2) pad to the\n    minimum size that is divisible by some number.\n    Added keys are \"pad_shape\", \"pad_fixed_size\", \"pad_size_divisor\",\n    Args:\n        size (tuple, optional): Fixed padding size.\n        size_divisor (int, optional): The divisor of padded size.\n        pad_val (float, optional): Padding value, 0 by default.\n    \"\"\"\n\n    def __init__(self, size=None, size_divisor=None, pad_val=0):\n        self.size = size\n        self.size_divisor = size_divisor\n        self.pad_val = pad_val\n        # only one of size and size_divisor should be valid\n        assert size is not None or size_divisor is not None\n        assert size is None or size_divisor is None\n\n    def _pad_img(self, results):\n        \"\"\"Pad images according to ``self.size``.\"\"\"\n        if self.size is not None:\n            padded_img = [mmcv.impad(\n                img, shape=self.size, pad_val=self.pad_val) for img in results['img']]\n        elif self.size_divisor is not None:\n            padded_img = [mmcv.impad_to_multiple(\n                img, self.size_divisor, pad_val=self.pad_val) for img in results['img']]\n        \n        results['ori_shape'] = [img.shape for img in results['img']]\n        results['img'] = padded_img\n        results['img_shape'] = [img.shape for img in padded_img]\n        results['pad_shape'] = [img.shape for img in padded_img]\n        results['pad_fixed_size'] = self.size\n        results['pad_size_divisor'] = self.size_divisor\n\n    def __call__(self, results):\n        \"\"\"Call function to pad images, masks, semantic segmentation maps.\n        Args:\n            results (dict): Result dict from loading pipeline.\n        Returns:\n            dict: Updated result dict.\n        \"\"\"\n        self._pad_img(results)\n        return results\n\n    def __repr__(self):\n        repr_str = self.__class__.__name__\n        repr_str += f'(size={self.size}, '\n        repr_str += f'size_divisor={self.size_divisor}, '\n        repr_str += f'pad_val={self.pad_val})'\n        return repr_str\n\n\n@PIPELINES.register_module()\nclass NormalizeMultiviewImage(object):\n    \"\"\"Normalize the image.\n    Added key is \"img_norm_cfg\".\n    Args:\n        mean (sequence): Mean values of 3 channels.\n        std (sequence): Std values of 3 channels.\n        to_rgb (bool): Whether to convert the image from BGR to RGB,\n            default is true.\n    \"\"\"\n\n    def __init__(self, mean, std, to_rgb=True):\n        self.mean = np.array(mean, dtype=np.float32)\n        self.std = np.array(std, dtype=np.float32)\n        self.to_rgb = to_rgb\n\n\n    def __call__(self, results):\n        \"\"\"Call function to normalize images.\n        Args:\n            results (dict): Result dict from loading pipeline.\n        Returns:\n            dict: Normalized results, 'img_norm_cfg' key is added into\n                result dict.\n        \"\"\"\n\n        results['img'] = [mmcv.imnormalize(img, self.mean, self.std, self.to_rgb) for img in results['img']]\n        results['img_norm_cfg'] = dict(\n            mean=self.mean, std=self.std, to_rgb=self.to_rgb)\n        return results\n\n    def __repr__(self):\n        repr_str = 
self.__class__.__name__\n        repr_str += f'(mean={self.mean}, std={self.std}, to_rgb={self.to_rgb})'\n        return repr_str\n\n\n@PIPELINES.register_module()\nclass PhotoMetricDistortionMultiViewImage:\n    \"\"\"Apply photometric distortion to each image sequentially; every transformation\n    is applied with a probability of 0.5. Random contrast is applied either second\n    or second to last.\n    1. random brightness\n    2. random contrast (mode 0)\n    3. convert color from BGR to HSV\n    4. random saturation\n    5. random hue\n    6. convert color from HSV to BGR\n    7. random contrast (mode 1)\n    8. randomly swap channels\n    Args:\n        brightness_delta (int): delta of brightness.\n        contrast_range (tuple): range of contrast.\n        saturation_range (tuple): range of saturation.\n        hue_delta (int): delta of hue.\n    \"\"\"\n\n    def __init__(self,\n                 brightness_delta=32,\n                 contrast_range=(0.5, 1.5),\n                 saturation_range=(0.5, 1.5),\n                 hue_delta=18):\n        self.brightness_delta = brightness_delta\n        self.contrast_lower, self.contrast_upper = contrast_range\n        self.saturation_lower, self.saturation_upper = saturation_range\n        self.hue_delta = hue_delta\n\n    def __call__(self, results):\n        \"\"\"Call function to perform photometric distortion on images.\n        Args:\n            results (dict): Result dict from loading pipeline.\n        Returns:\n            dict: Result dict with images distorted.\n        \"\"\"\n        imgs = results['img']\n        new_imgs = []\n        for img in imgs:\n            assert img.dtype == np.float32, \\\n                'PhotoMetricDistortion requires input images of dtype np.float32;'\\\n                ' please set \"to_float32=True\" in the \"LoadImageFromFile\" pipeline'\n            # random brightness\n            if random.randint(2):\n                delta = random.uniform(-self.brightness_delta,\n                                    self.brightness_delta)\n                img += delta\n\n            # mode == 0 --> do random contrast first\n            # mode == 1 --> do random contrast last\n            mode = random.randint(2)\n            if mode == 1:\n                if random.randint(2):\n                    alpha = random.uniform(self.contrast_lower,\n                                        self.contrast_upper)\n                    img *= alpha\n\n            # convert color from BGR to HSV\n            img = mmcv.bgr2hsv(img)\n\n            # random saturation\n            if random.randint(2):\n                img[..., 1] *= random.uniform(self.saturation_lower,\n                                            self.saturation_upper)\n\n            # random hue\n            if random.randint(2):\n                img[..., 0] += random.uniform(-self.hue_delta, self.hue_delta)\n                img[..., 0][img[..., 0] > 360] -= 360\n                img[..., 0][img[..., 0] < 0] += 360\n\n            # convert color from HSV to BGR\n            img = mmcv.hsv2bgr(img)\n\n            # random contrast\n            if mode == 0:\n                if random.randint(2):\n                    alpha = random.uniform(self.contrast_lower,\n                                        self.contrast_upper)\n                    img *= alpha\n\n            # randomly swap channels\n            if random.randint(2):\n                img = img[..., random.permutation(3)]\n            new_imgs.append(img)\n        results['img'] = new_imgs\n       
 return results\n\n    def __repr__(self):\n        repr_str = self.__class__.__name__\n        repr_str += f'(\\nbrightness_delta={self.brightness_delta},\\n'\n        repr_str += 'contrast_range='\n        repr_str += f'{(self.contrast_lower, self.contrast_upper)},\\n'\n        repr_str += 'saturation_range='\n        repr_str += f'{(self.saturation_lower, self.saturation_upper)},\\n'\n        repr_str += f'hue_delta={self.hue_delta})'\n        return repr_str\n\n\n\n@PIPELINES.register_module()\nclass CustomCollect3D(object):\n    \"\"\"Collect data from the loader relevant to the specific task.\n    This is usually the last stage of the data loader pipeline. Typically keys\n    is set to some subset of \"img\", \"proposals\", \"gt_bboxes\",\n    \"gt_bboxes_ignore\", \"gt_labels\", and/or \"gt_masks\".\n    The \"img_meta\" item is always populated.  The contents of the \"img_meta\"\n    dictionary depends on \"meta_keys\". By default this includes:\n        - 'img_shape': shape of the image input to the network as a tuple \\\n            (h, w, c).  Note that images may be zero padded on the \\\n            bottom/right if the batch tensor is larger than this shape.\n        - 'scale_factor': a float indicating the preprocessing scale\n        - 'flip': a boolean indicating if image flip transform was used\n        - 'filename': path to the image file\n        - 'ori_shape': original shape of the image as a tuple (h, w, c)\n        - 'pad_shape': image shape after padding\n        - 'lidar2img': transform from lidar to image\n        - 'depth2img': transform from depth to image\n        - 'cam2img': transform from camera to image\n        - 'pcd_horizontal_flip': a boolean indicating if point cloud is \\\n            flipped horizontally\n        - 'pcd_vertical_flip': a boolean indicating if point cloud is \\\n            flipped vertically\n        - 'box_mode_3d': 3D box mode\n        - 'box_type_3d': 3D box type\n        - 'img_norm_cfg': a dict of normalization information:\n            - mean: per channel mean subtraction\n            - std: per channel std divisor\n            - to_rgb: bool indicating if bgr was converted to rgb\n        - 'pcd_trans': point cloud transformations\n        - 'sample_idx': sample index\n        - 'pcd_scale_factor': point cloud scale factor\n        - 'pcd_rotation': rotation applied to point cloud\n        - 'pts_filename': path to point cloud file.\n    Args:\n        keys (Sequence[str]): Keys of results to be collected in ``data``.\n        meta_keys (Sequence[str], optional): Meta keys to be converted to\n            ``mmcv.DataContainer`` and collected in ``data[img_metas]``.\n            Default: ('filename', 'ori_shape', 'img_shape', 'lidar2img',\n            'depth2img', 'cam2img', 'pad_shape', 'scale_factor', 'flip',\n            'pcd_horizontal_flip', 'pcd_vertical_flip', 'box_mode_3d',\n            'box_type_3d', 'img_norm_cfg', 'pcd_trans',\n            'sample_idx', 'pcd_scale_factor', 'pcd_rotation', 'pts_filename')\n    \"\"\"\n\n    def __init__(self,\n                 keys,\n                 meta_keys=('filename', 'ori_shape', 'img_shape', 'lidar2img',\n                            'depth2img', 'cam2img', 'pad_shape',\n                            'scale_factor', 'flip', 'pcd_horizontal_flip',\n                            'pcd_vertical_flip', 'box_mode_3d', 'box_type_3d',\n                            'img_norm_cfg', 'pcd_trans', 'sample_idx', 'prev_idx', 'next_idx',\n                            'pcd_scale_factor', 'pcd_rotation', 
'pts_filename',\n                            'transformation_3d_flow', 'scene_token',\n                            'can_bus'\n                            )):\n        self.keys = keys\n        self.meta_keys = meta_keys\n\n    def __call__(self, results):\n        \"\"\"Call function to collect keys in results. The keys in ``meta_keys``\n        will be converted to :obj:`mmcv.DataContainer`.\n        Args:\n            results (dict): Result dict contains the data to collect.\n        Returns:\n            dict: The result dict contains the following keys\n                - keys in ``self.keys``\n                - ``img_metas``\n        \"\"\"\n       \n        data = {}\n        img_metas = {}\n      \n        for key in self.meta_keys:\n            if key in results:\n                img_metas[key] = results[key]\n\n        data['img_metas'] = DC(img_metas, cpu_only=True)\n        for key in self.keys:\n            data[key] = results[key]\n        return data\n\n    def __repr__(self):\n        \"\"\"str: Return a string that describes the module.\"\"\"\n        return self.__class__.__name__ + \\\n            f'(keys={self.keys}, meta_keys={self.meta_keys})'\n\n@PIPELINES.register_module()\nclass CustomOccCollect3D(object):\n    \"\"\"Collect data from the loader relevant to the specific task.\n    This is usually the last stage of the data loader pipeline. Typically keys\n    is set to some subset of \"img\", \"proposals\", \"gt_bboxes\",\n    \"gt_bboxes_ignore\", \"gt_labels\", and/or \"gt_masks\".\n    The \"img_meta\" item is always populated.  The contents of the \"img_meta\"\n    dictionary depends on \"meta_keys\". By default this includes:\n        - 'img_shape': shape of the image input to the network as a tuple \\\n            (h, w, c).  
Note that images may be zero padded on the \\\n            bottom/right if the batch tensor is larger than this shape.\n        - 'scale_factor': a float indicating the preprocessing scale\n        - 'flip': a boolean indicating if image flip transform was used\n        - 'filename': path to the image file\n        - 'ori_shape': original shape of the image as a tuple (h, w, c)\n        - 'pad_shape': image shape after padding\n        - 'lidar2img': transform from lidar to image\n        - 'depth2img': transform from depth to image\n        - 'cam2img': transform from camera to image\n        - 'pcd_horizontal_flip': a boolean indicating if point cloud is \\\n            flipped horizontally\n        - 'pcd_vertical_flip': a boolean indicating if point cloud is \\\n            flipped vertically\n        - 'box_mode_3d': 3D box mode\n        - 'box_type_3d': 3D box type\n        - 'img_norm_cfg': a dict of normalization information:\n            - mean: per channel mean subtraction\n            - std: per channel std divisor\n            - to_rgb: bool indicating if bgr was converted to rgb\n        - 'pcd_trans': point cloud transformations\n        - 'sample_idx': sample index\n        - 'pcd_scale_factor': point cloud scale factor\n        - 'pcd_rotation': rotation applied to point cloud\n        - 'pts_filename': path to point cloud file.\n    Args:\n        keys (Sequence[str]): Keys of results to be collected in ``data``.\n        meta_keys (Sequence[str], optional): Meta keys to be converted to\n            ``mmcv.DataContainer`` and collected in ``data[img_metas]``.\n            Default: ('filename', 'ori_shape', 'img_shape', 'lidar2img',\n            'depth2img', 'cam2img', 'pad_shape', 'scale_factor', 'flip',\n            'pcd_horizontal_flip', 'pcd_vertical_flip', 'box_mode_3d',\n            'box_type_3d', 'img_norm_cfg', 'pcd_trans',\n            'sample_idx', 'pcd_scale_factor', 'pcd_rotation', 'pts_filename')\n    \"\"\"\n\n    def __init__(self,\n                 keys,\n                 meta_keys=('filename', 'ori_shape', 'img_shape', 'lidar2img',\n                            'depth2img', 'cam2img', 'pad_shape',\n                            'scale_factor', 'flip', 'pcd_horizontal_flip',\n                            'pcd_vertical_flip', 'box_mode_3d', 'box_type_3d',\n                            'img_norm_cfg', 'pcd_trans', 'sample_idx', 'prev_idx', 'next_idx',\n                            'pcd_scale_factor', 'pcd_rotation', 'pts_filename',\n                            'transformation_3d_flow', 'scene_token',\n                            'can_bus', 'pc_range', 'occ_size', 'lidar_token'\n                            )):\n        self.keys = keys\n        self.meta_keys = meta_keys\n\n    def __call__(self, results):\n        \"\"\"Call function to collect keys in results. 
The keys in ``meta_keys``\n        will be converted to :obj:`mmcv.DataContainer`.\n        Args:\n            results (dict): Result dict contains the data to collect.\n        Returns:\n            dict: The result dict contains the following keys\n                - keys in ``self.keys``\n                - ``img_metas``\n        \"\"\"\n       \n        data = {}\n        img_metas = {}\n      \n        for key in self.meta_keys:\n            if key in results:\n                img_metas[key] = results[key]\n\n        data['img_metas'] = DC(img_metas, cpu_only=True)\n        for key in self.keys:\n            if key in results.keys():\n                data[key] = results[key]\n        return data\n\n    def __repr__(self):\n        \"\"\"str: Return a string that describes the module.\"\"\"\n        return self.__class__.__name__ + \\\n            f'(keys={self.keys}, meta_keys={self.meta_keys})'\n\n@PIPELINES.register_module()\nclass RandomScaleImageMultiViewImage(object):\n    \"\"\"Randomly scale the multi-view images and update the lidar2img\n    projection matrices accordingly.\n    Args:\n        scales (list[float]): candidate scale factors (currently restricted\n            to a single value).\n    \"\"\"\n\n    def __init__(self, scales=[]):\n        self.scales = scales\n        assert len(self.scales)==1\n\n    def __call__(self, results):\n        \"\"\"Call function to resize images and rescale the camera projections.\n        Args:\n            results (dict): Result dict from loading pipeline.\n        Returns:\n            dict: Updated result dict.\n        \"\"\"\n        rand_ind = np.random.permutation(range(len(self.scales)))[0]\n        rand_scale = self.scales[rand_ind]\n\n        y_size = [int(img.shape[0] * rand_scale) for img in results['img']]\n        x_size = [int(img.shape[1] * rand_scale) for img in results['img']]\n        scale_factor = np.eye(4)\n        scale_factor[0, 0] *= rand_scale\n        scale_factor[1, 1] *= rand_scale\n        results['img'] = [mmcv.imresize(img, (x_size[idx], y_size[idx]), return_scale=False) for idx, img in\n                          enumerate(results['img'])]\n        lidar2img = [scale_factor @ l2i for l2i in results['lidar2img']]\n        results['lidar2img'] = lidar2img\n        results['img_shape'] = [img.shape for img in results['img']]\n        results['ori_shape'] = [img.shape for img in results['img']]\n\n        return results\n\n\n    def __repr__(self):\n        repr_str = self.__class__.__name__\n        repr_str += f'(scales={self.scales})'\n        return repr_str"
  },
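In an mmdet-style config these transforms are chained by their registered names. A minimal, illustrative train-pipeline fragment (values are placeholders, not the benchmark's actual settings):

```python
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type='PhotoMetricDistortionMultiViewImage'),
    dict(type='NormalizeMultiviewImage', **img_norm_cfg),
    # pad each view to a multiple of 32 so downsampling strides divide evenly
    dict(type='PadMultiViewImage', size_divisor=32),
    dict(type='CustomOccCollect3D', keys=['img', 'gt_occ']),
]
```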
  {
    "path": "projects/occ_plugin/datasets/samplers/__init__.py",
    "content": "from .group_sampler import DistributedGroupSampler\nfrom .distributed_sampler import DistributedSampler\nfrom .sampler import SAMPLER, build_sampler\n\n"
  },
  {
    "path": "projects/occ_plugin/datasets/samplers/distributed_sampler.py",
    "content": "import math\n\nimport torch\nfrom torch.utils.data import DistributedSampler as _DistributedSampler\nfrom .sampler import SAMPLER\n\n\n@SAMPLER.register_module()\nclass DistributedSampler(_DistributedSampler):\n\n    def __init__(self,\n                 dataset=None,\n                 num_replicas=None,\n                 rank=None,\n                 shuffle=True,\n                 seed=0):\n        super().__init__(\n            dataset, num_replicas=num_replicas, rank=rank, shuffle=shuffle)\n        # for the compatibility from PyTorch 1.3+\n        self.seed = seed if seed is not None else 0\n\n    def __iter__(self):\n        # deterministically shuffle based on epoch\n        if self.shuffle:\n            assert False\n        else:\n            indices = torch.arange(len(self.dataset)).tolist()\n\n        # add extra samples to make it evenly divisible\n        # in case that indices is shorter than half of total_size\n        indices = (indices *\n                   math.ceil(self.total_size / len(indices)))[:self.total_size]\n        assert len(indices) == self.total_size\n\n        # subsample\n        per_replicas = self.total_size//self.num_replicas\n        # indices = indices[self.rank:self.total_size:self.num_replicas]\n        indices = indices[self.rank*per_replicas:(self.rank+1)*per_replicas]\n        assert len(indices) == self.num_samples\n\n        return iter(indices)\n"
  },
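Note that `__iter__` hands each rank a contiguous block of indices instead of the strided `rank::num_replicas` slice used by the stock PyTorch sampler (kept above as a comment), so temporally adjacent samples stay on the same GPU. A quick comparison of the two slicings:

```python
# 10 samples across 2 replicas (total_size already divisible)
indices = list(range(10))
num_replicas, total_size = 2, 10
per_replicas = total_size // num_replicas
for rank in range(num_replicas):
    contiguous = indices[rank * per_replicas:(rank + 1) * per_replicas]
    strided = indices[rank:total_size:num_replicas]
    print(rank, contiguous, strided)
# 0 [0, 1, 2, 3, 4] [0, 2, 4, 6, 8]
# 1 [5, 6, 7, 8, 9] [1, 3, 5, 7, 9]
```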
  {
    "path": "projects/occ_plugin/datasets/samplers/group_sampler.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nimport math\n\nimport numpy as np\nimport torch\nfrom mmcv.runner import get_dist_info\nfrom torch.utils.data import Sampler\nfrom .sampler import SAMPLER\nimport random\nfrom IPython import embed\n\n\n@SAMPLER.register_module()\nclass DistributedGroupSampler(Sampler):\n    \"\"\"Sampler that restricts data loading to a subset of the dataset.\n    It is especially useful in conjunction with\n    :class:`torch.nn.parallel.DistributedDataParallel`. In such case, each\n    process can pass a DistributedSampler instance as a DataLoader sampler,\n    and load a subset of the original dataset that is exclusive to it.\n    .. note::\n        Dataset is assumed to be of constant size.\n    Arguments:\n        dataset: Dataset used for sampling.\n        num_replicas (optional): Number of processes participating in\n            distributed training.\n        rank (optional): Rank of the current process within num_replicas.\n        seed (int, optional): random seed used to shuffle the sampler if\n            ``shuffle=True``. This number should be identical across all\n            processes in the distributed group. Default: 0.\n    \"\"\"\n\n    def __init__(self,\n                 dataset,\n                 samples_per_gpu=1,\n                 num_replicas=None,\n                 rank=None,\n                 seed=0):\n        _rank, _num_replicas = get_dist_info()\n        if num_replicas is None:\n            num_replicas = _num_replicas\n        if rank is None:\n            rank = _rank\n        self.dataset = dataset\n        self.samples_per_gpu = samples_per_gpu\n        self.num_replicas = num_replicas\n        self.rank = rank\n        self.epoch = 0\n        self.seed = seed if seed is not None else 0\n\n        assert hasattr(self.dataset, 'flag')\n        self.flag = self.dataset.flag\n        self.group_sizes = np.bincount(self.flag)\n\n        self.num_samples = 0\n        for i, j in enumerate(self.group_sizes):\n            self.num_samples += int(\n                math.ceil(self.group_sizes[i] * 1.0 / self.samples_per_gpu /\n                          self.num_replicas)) * self.samples_per_gpu\n        self.total_size = self.num_samples * self.num_replicas\n\n    def __iter__(self):\n        # deterministically shuffle based on epoch\n        g = torch.Generator()\n        g.manual_seed(self.epoch + self.seed)\n\n        indices = []\n        for i, size in enumerate(self.group_sizes):\n            if size > 0:\n                indice = np.where(self.flag == i)[0]\n                assert len(indice) == size\n                # add .numpy() to avoid bug when selecting indice in parrots.\n                # TODO: check whether torch.randperm() can be replaced by\n                # numpy.random.permutation().\n                indice = indice[list(\n                    torch.randperm(int(size), generator=g).numpy())].tolist()\n                extra = int(\n                    math.ceil(\n                        size * 1.0 / self.samples_per_gpu / self.num_replicas)\n                ) * self.samples_per_gpu * self.num_replicas - len(indice)\n                # pad indice\n                tmp = indice.copy()\n                for _ in range(extra // size):\n                    indice.extend(tmp)\n                indice.extend(tmp[:extra % size])\n                indices.extend(indice)\n\n        assert len(indices) == self.total_size\n\n        indices = [\n            indices[j] for i in list(\n                
torch.randperm(\n                    len(indices) // self.samples_per_gpu, generator=g))\n            for j in range(i * self.samples_per_gpu, (i + 1) *\n                           self.samples_per_gpu)\n        ]\n\n        # subsample\n        offset = self.num_samples * self.rank\n        indices = indices[offset:offset + self.num_samples]\n        assert len(indices) == self.num_samples\n\n        return iter(indices)\n\n    def __len__(self):\n        return self.num_samples\n\n    def set_epoch(self, epoch):\n        self.epoch = epoch\n\n"
  },
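Each group is padded to a multiple of `samples_per_gpu * num_replicas`, which guarantees that every per-GPU batch is drawn from a single group. The padding arithmetic from `__iter__`, worked on a small example:

```python
import math

group_size, samples_per_gpu, num_replicas = 10, 4, 2
padded = int(math.ceil(group_size * 1.0 / samples_per_gpu / num_replicas)) \
    * samples_per_gpu * num_replicas
extra = padded - group_size
print(padded, extra)   # 16 6 -> six indices are repeated to fill the last batches
```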
  {
    "path": "projects/occ_plugin/datasets/samplers/sampler.py",
    "content": "from mmcv.utils.registry import Registry, build_from_cfg\r\n\r\nSAMPLER = Registry('sampler')\r\n\r\n\r\ndef build_sampler(cfg, default_args):\r\n    return build_from_cfg(cfg, SAMPLER, default_args)\r\n"
  },
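`SAMPLER` is an ordinary mmcv `Registry`, so samplers are constructed from config dicts like every other component. A hedged usage sketch (assumes the repo root is on `PYTHONPATH`; importing the package runs `__init__.py` and registers the sampler classes):

```python
from projects.occ_plugin.datasets.samplers import build_sampler

dataset = list(range(100))   # any sized, indexable dataset works here
sampler = build_sampler(
    dict(type='DistributedSampler', shuffle=False),
    default_args=dict(dataset=dataset, num_replicas=2, rank=0))
print(len(list(sampler)))    # 50 -> this rank's contiguous half of the data
```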
  {
    "path": "projects/occ_plugin/occupancy/__init__.py",
    "content": "from .dense_heads import *\r\nfrom .detectors import *\r\nfrom .backbones import *\r\nfrom .image2bev import *\r\nfrom .voxel_encoder import *\r\nfrom .necks import *\r\nfrom .fuser import *\r\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/apis/__init__.py",
    "content": "from .train import custom_train_model\nfrom .mmdet_train import custom_train_detector\n# from .test import custom_multi_gpu_test"
  },
  {
    "path": "projects/occ_plugin/occupancy/apis/mmdet_train.py",
    "content": "# ---------------------------------------------\n# Copyright (c) OpenMMLab. All rights reserved.\n# ---------------------------------------------\n#  Modified by Junyi Ma, following OpenOccupancy of Xiaofeng Wang\n# ---------------------------------------------\nimport random\nimport warnings\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nfrom mmcv.parallel import MMDataParallel, MMDistributedDataParallel\nfrom mmcv.runner import (HOOKS, DistSamplerSeedHook, EpochBasedRunner,\n                         Fp16OptimizerHook, OptimizerHook, build_optimizer,\n                         build_runner, get_dist_info)\nfrom mmcv.utils import build_from_cfg\nfrom mmdet.core import EvalHook\nfrom mmdet.datasets import (build_dataset,\n                            replace_ImageToTensor)\nfrom mmdet.utils import get_root_logger\nimport time\nimport os.path as osp\nfrom projects.occ_plugin.datasets.builder import build_dataloader\nfrom projects.occ_plugin.core.evaluation.eval_hooks import OccDistEvalHook, OccEvalHook\nfrom projects.occ_plugin.datasets import custom_build_dataset\n\ndef custom_train_detector(model,\n                   dataset,\n                   cfg,\n                   distributed=False,\n                   validate=False,\n                   timestamp=None,\n                   meta=None):\n    \n    logger = get_root_logger(cfg.log_level)\n    \n    dataset = dataset if isinstance(dataset, (list, tuple)) else [dataset]\n    data_loaders = [\n        build_dataloader(\n            ds,\n            cfg.data.samples_per_gpu,\n            cfg.data.workers_per_gpu,\n            # cfg.gpus will be ignored if distributed\n            len(cfg.gpu_ids),\n            dist=distributed,\n            seed=cfg.seed,\n            shuffler_sampler=cfg.data.shuffler_sampler,\n            nonshuffler_sampler=cfg.data.nonshuffler_sampler,\n        ) for ds in dataset\n    ]\n    # torch.distributed.init_process_group(backend='nccl')\n\n    \n    if distributed:\n        find_unused_parameters = cfg.get('find_unused_parameters', False)\n        model = MMDistributedDataParallel(\n            model.cuda(),\n            device_ids=[torch.cuda.current_device()],\n            broadcast_buffers=False,\n            find_unused_parameters=find_unused_parameters)\n    else:\n        model = MMDataParallel(\n            model.cuda(cfg.gpu_ids[0]), device_ids=cfg.gpu_ids)\n\n    # build runner\n    optimizer = build_optimizer(model, cfg.optimizer)\n\n    assert 'runner' in cfg\n    runner = build_runner(\n        cfg.runner,\n        default_args=dict(\n            model=model,\n            optimizer=optimizer,\n            work_dir=cfg.work_dir,\n            logger=logger,\n            meta=meta))\n\n    # an ugly workaround to make .log and .log.json filenames the same\n    runner.timestamp = timestamp\n\n    # fp16 setting TODO\n    fp16_cfg = cfg.get('fp16', None)\n    if fp16_cfg is not None:\n        optimizer_config = Fp16OptimizerHook(\n            **cfg.optimizer_config, **fp16_cfg, distributed=distributed)\n    elif distributed and 'type' not in cfg.optimizer_config:\n        optimizer_config = OptimizerHook(**cfg.optimizer_config)\n    else:\n        optimizer_config = cfg.optimizer_config\n\n    # register hooks\n    runner.register_training_hooks(cfg.lr_config, optimizer_config,\n                                   cfg.checkpoint_config, cfg.log_config,\n                                   cfg.get('momentum_config', None))\n    \n    if distributed:\n        if 
isinstance(runner, EpochBasedRunner):\n            runner.register_hook(DistSamplerSeedHook())\n\n    rank, world_size = get_dist_info()\n    if cfg.resume_from:\n        if rank == 0:\n            print(\"-------------\")\n            print(\"resume from \" + cfg.resume_from)\n            print(\"-------------\")\n        runner.resume(cfg.resume_from)\n    elif cfg.load_from:\n        if rank == 0:\n            print(\"-------------\")\n            print(\"load from \" + cfg.load_from)\n            print(\"-------------\")\n        runner.load_checkpoint(cfg.load_from)\n    runner.run(data_loaders, cfg.workflow)\n\n"
  },
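`custom_train_detector` forwards `cfg.data.shuffler_sampler` and `cfg.data.nonshuffler_sampler` to the project's `build_dataloader`, which picks between them depending on whether shuffling is requested. An illustrative config fragment (the field values are an assumption, not necessarily the benchmark's exact choice):

```python
data = dict(
    samples_per_gpu=1,
    workers_per_gpu=4,
    shuffler_sampler=dict(type='DistributedGroupSampler'),
    nonshuffler_sampler=dict(type='DistributedSampler'),
)
```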
  {
    "path": "projects/occ_plugin/occupancy/apis/test.py",
    "content": "# ---------------------------------------------\n# Copyright (c) OpenMMLab. All rights reserved.\n# ---------------------------------------------\n#  Modified by Zhiqi Li\n# ---------------------------------------------\nimport os.path as osp\nimport pickle\nimport shutil\nimport tempfile\nimport time\n\nimport mmcv\nimport torch\nimport torch.distributed as dist\nfrom mmcv.image import tensor2imgs\nfrom mmcv.runner import get_dist_info\nfrom mmdet.utils import get_root_logger\nfrom mmdet.core import encode_mask_results\n\n\nimport numpy as np\nimport pycocotools.mask as mask_util\nfrom fvcore.nn import FlopCountAnalysis, parameter_count_table\n\ndef custom_encode_mask_results(mask_results):\n    \"\"\"Encode bitmap mask to RLE code. Semantic Masks only\n    Args:\n        mask_results (list | tuple[list]): bitmap mask results.\n            In mask scoring rcnn, mask_results is a tuple of (segm_results,\n            segm_cls_score).\n    Returns:\n        list | tuple: RLE encoded mask.\n    \"\"\"\n    cls_segms = mask_results\n    num_classes = len(cls_segms)\n    encoded_mask_results = []\n    for i in range(len(cls_segms)):\n        encoded_mask_results.append(\n            mask_util.encode(\n                np.array(\n                    cls_segms[i][:, :, np.newaxis], order='F',\n                        dtype='uint8'))[0])  # encoded with RLE\n    return [encoded_mask_results]\n\ndef custom_single_gpu_test(model, data_loader, show=False, out_dir=None, show_score_thr=0.3):\n    model.eval()\n    \n    iou_metric = 0\n    vpq_metric = 0\n    dataset = data_loader.dataset\n    prog_bar = mmcv.ProgressBar(len(dataset))\n    logger = get_root_logger()\n    \n    logger.info(parameter_count_table(model))\n    \n    for i, data in enumerate(data_loader):\n        with torch.no_grad():\n            result = model(return_loss=False, rescale=True, **data)\n            \n            if 'hist_for_iou' in result.keys():\n                iou_metric += result['hist_for_iou']\n                vpq_metric += result['vpq']\n\n        prog_bar.update()\n\n    res = {\n        'hist_for_iou': iou_metric,\n        'vpq_len': len(dataset),\n        'vpq_metric': vpq_metric,\n    }\n\n    return res\n\ndef custom_multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False, show=False, out_dir=None):\n    \"\"\"Test model with multiple gpus.\n    This method tests model with multiple gpus and collects the results\n    under two different modes: gpu and cpu modes. By setting 'gpu_collect=True'\n    it encodes results to gpu tensors and use gpu communication for results\n    collection. 
On cpu mode it saves the results on different gpus to 'tmpdir'\n    and collects them by the rank 0 worker.\n    Args:\n        model (nn.Module): Model to be tested.\n        data_loader (nn.Dataloader): Pytorch data loader.\n        tmpdir (str): Path of directory to save the temporary results from\n            different gpus under cpu mode.\n        gpu_collect (bool): Option to use either gpu or cpu to collect results.\n    Returns:\n        list: The prediction results.\n    \"\"\"\n\n    model.eval()\n    \n    # init predictions\n    iou_metric = []\n    vpq_metric = []\n\n    dataset = data_loader.dataset\n    rank, world_size = get_dist_info()\n    if rank == 0:\n        prog_bar = mmcv.ProgressBar(len(dataset))\n    \n    time.sleep(2)  # This line can prevent deadlock problem in some cases.\n    \n    logger = get_root_logger()\n    logger.info(parameter_count_table(model))\n    \n    for i, data in enumerate(data_loader):\n\n        with torch.no_grad():\n\n            result = model(return_loss=False, rescale=True, **data)\n            \n            if 'hist_for_iou' in result.keys():\n                iou_metric.append(result['hist_for_iou'])\n            if 'vpq' in result.keys():\n                vpq_metric.append(result['vpq'])\n\n            batch_size = 1\n                \n        if rank == 0:\n            for _ in range(batch_size * world_size):\n                prog_bar.update()\n\n    # collect lists from multi-GPUs\n    res = {}\n\n    if 'hist_for_iou' in result.keys():\n        iou_metric = [sum(iou_metric)]\n        iou_metric = collect_results_cpu(iou_metric, len(dataset), tmpdir)\n        res['hist_for_iou'] = iou_metric\n\n    if 'vpq' in result.keys():\n        res['vpq_len'] = len(dataset)\n        vpq_metric = [sum(vpq_metric)]\n        vpq_metric = collect_results_cpu(vpq_metric, len(dataset), tmpdir)\n        res['vpq_metric'] = vpq_metric\n\n    return res\n\n\ndef collect_results_cpu(result_part, size, tmpdir=None, type='list'):\n    rank, world_size = get_dist_info()\n    # create a tmp dir if it is not specified\n    \n    if tmpdir is None:\n        MAX_LEN = 512\n        # 32 is whitespace\n        dir_tensor = torch.full((MAX_LEN,), 32, dtype=torch.uint8, device='cuda')\n        if rank == 0:\n            mmcv.mkdir_or_exist('.dist_test')\n            tmpdir = tempfile.mkdtemp(dir='.dist_test')\n            tmpdir = torch.tensor(\n                bytearray(tmpdir.encode()), dtype=torch.uint8, device='cuda')\n            dir_tensor[:len(tmpdir)] = tmpdir\n        dist.broadcast(dir_tensor, 0)\n        tmpdir = dir_tensor.cpu().numpy().tobytes().decode().rstrip()\n    else:\n        mmcv.mkdir_or_exist(tmpdir)\n    \n    # dump the part result to the dir\n    mmcv.dump(result_part, osp.join(tmpdir, f'part_{rank}.pkl'))\n    dist.barrier()\n\n    # collect all parts\n    if rank == 0:\n    \n        # load results of all parts from tmp dir\n        part_list = []\n        for i in range(world_size):\n            part_file = osp.join(tmpdir, f'part_{i}.pkl')\n            part_list.append(mmcv.load(part_file))\n\n        # sort the results\n        if type == 'list':\n            ordered_results = []\n            for res in part_list:  \n                ordered_results.extend(list(res))\n            # the dataloader may pad some samples\n            ordered_results = ordered_results[:size]\n        \n        else:\n            raise NotImplementedError\n        \n        # remove tmp dir\n        shutil.rmtree(tmpdir)\n    \n    dist.barrier()\n\n    if 
rank != 0:\n        return None\n    \n    return ordered_results\n\n"
  },
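Both test loops only accumulate the confusion histogram (`hist_for_iou`) across batches and leave the reduction to the caller. A minimal sketch of turning such a histogram into per-class IoU, under the common convention that rows index ground truth and columns index predictions (an assumption here; the repo's evaluation utilities define the exact layout):

```python
import numpy as np

def iou_from_hist(hist: np.ndarray) -> np.ndarray:
    """Per-class IoU = TP / (GT + Pred - TP) from an accumulated confusion matrix."""
    tp = np.diag(hist)
    denom = hist.sum(axis=1) + hist.sum(axis=0) - tp
    return tp / np.maximum(denom, 1)

hist = np.array([[50, 5],
                 [10, 35]])            # toy 2-class case: free vs. occupied
print(iou_from_hist(hist))             # [0.7692..., 0.7]
```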
  {
    "path": "projects/occ_plugin/occupancy/apis/train.py",
    "content": "from .mmdet_train import custom_train_detector\nfrom mmseg.apis import train_segmentor\nfrom mmdet.apis import train_detector\n\ndef custom_train_model(model,\n                dataset,\n                cfg,\n                distributed=False,\n                validate=False,\n                timestamp=None,\n                meta=None):\n    \"\"\"A function wrapper for launching model training according to cfg.\n\n    Because we need different eval_hook in runner. Should be deprecated in the\n    future.\n    \"\"\"\n    if cfg.model.type in ['EncoderDecoder3D']:\n        assert False\n    else:\n        custom_train_detector(\n            model,\n            dataset,\n            cfg,\n            distributed=distributed,\n            validate=validate,\n            timestamp=timestamp,\n            meta=meta)\n\n\ndef train_model(model,\n                dataset,\n                cfg,\n                distributed=False,\n                validate=False,\n                timestamp=None,\n                meta=None):\n    \"\"\"A function wrapper for launching model training according to cfg.\n\n    Because we need different eval_hook in runner. Should be deprecated in the\n    future.\n    \"\"\"\n    if cfg.model.type in ['EncoderDecoder3D']:\n        train_segmentor(\n            model,\n            dataset,\n            cfg,\n            distributed=distributed,\n            validate=validate,\n            timestamp=timestamp,\n            meta=meta)\n    else:\n        train_detector(\n            model,\n            dataset,\n            cfg,\n            distributed=distributed,\n            validate=validate,\n            timestamp=timestamp,\n            meta=meta)\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/backbones/__init__.py",
    "content": "from .resnet3d import CustomResNet3D\nfrom .pred_block import Predictor"
  },
  {
    "path": "projects/occ_plugin/occupancy/backbones/pred_block.py",
    "content": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nimport copy\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom mmdet3d.models.builder import BACKBONES\nfrom collections import OrderedDict\nfrom mmcv.cnn import build_norm_layer\n\n\nclass Residual(nn.Module):\n    def __init__(\n        self,\n        in_channels,\n        out_channels,\n        kernel_size=(3,3,1),\n        dilation=1,\n        norm_cfg=None\n    ):\n        super().__init__()\n        out_channels = out_channels or in_channels\n        # padding_size = ((kernel_size - 1) * dilation + 1) // 2\n        padding_size = [0,0,0]\n        if dilation!=0:\n            padding_size[0] = ((kernel_size[0] - 1) * dilation + 1) // 2\n            padding_size[1] = ((kernel_size[1] - 1) * dilation + 1) // 2\n            padding_size[2] = ((kernel_size[2] - 1) * dilation + 1) // 2\n        padding_size = tuple(padding_size)\n\n        conv = nn.Conv3d(in_channels, out_channels, kernel_size=kernel_size, bias=False, dilation=dilation, padding=padding_size)\n        self.layers = nn.Sequential(conv, build_norm_layer(norm_cfg, out_channels)[1], nn.LeakyReLU(inplace=True))\n\n\n        if out_channels == in_channels :\n            self.projection = None\n        else:\n            projection = OrderedDict()\n            projection.update(\n                {\n                    'conv_skip_proj': nn.Conv3d(in_channels, out_channels, kernel_size=1, bias=False),\n                    'bn_skip_proj': build_norm_layer(norm_cfg, out_channels)[1],\n                }\n            )\n            self.projection = nn.Sequential(projection)\n\n\n    def forward(self, x):\n        x_residual = self.layers(x)\n        if self.projection is not None:\n            x_projected = self.projection(x)\n            return x_residual + x_projected\n        return x_residual + x\n\n\n@BACKBONES.register_module()\nclass Predictor(nn.Module):\n    def __init__(\n        self,\n        n_input_channels=None,\n        in_timesteps=None,\n        out_timesteps=None,\n        norm_cfg=None,\n    ):\n        super(Predictor, self).__init__()\n        \n        self.predictor = nn.ModuleList()\n        for nf in n_input_channels:\n            self.predictor.append(nn.Sequential(\n                Residual(nf * in_timesteps, nf * in_timesteps, norm_cfg=norm_cfg),\n                Residual(nf * in_timesteps, nf * in_timesteps, norm_cfg=norm_cfg),\n                Residual(nf * in_timesteps, nf * out_timesteps, norm_cfg=norm_cfg),\n                Residual(nf * out_timesteps, nf * out_timesteps, norm_cfg=norm_cfg),\n                Residual(nf * out_timesteps, nf * out_timesteps, norm_cfg=norm_cfg),\n            ))\n\n    def forward(self, x):\n        assert len(x) == len(self.predictor), f'The number of input feature tensors ({len(x)}) must be the same as the number of STPredictor blocks {len(self.predictor)}.'\n        \n        y = []\n        \n        for i in range(len(x)):\n            b, c, _, _, _ = x[i].shape\n            y.append(self.predictor[i](x[i]))\n                \n        return y"
  },
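`Predictor` folds time into the channel dimension: a pyramid level with `nf` channels over `in_timesteps` observed frames enters as `nf * in_timesteps` channels and leaves as `nf * out_timesteps`, with one stack of `Residual` blocks per level. A shape check (sizes are illustrative; `num_groups=32` requires the folded channel counts to be divisible by 32, as they are here):

```python
import torch
from projects.occ_plugin.occupancy.backbones.pred_block import Predictor

net = Predictor(n_input_channels=[64, 128], in_timesteps=3, out_timesteps=5,
                norm_cfg=dict(type='GN', num_groups=32, requires_grad=True))
feats = [torch.randn(1, 64 * 3, 32, 32, 4),    # level 0: (B, nf*T_in, X, Y, Z)
         torch.randn(1, 128 * 3, 16, 16, 2)]   # level 1
out = net(feats)
print([f.shape[1] for f in out])               # [320, 640] == nf * out_timesteps
```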
  {
    "path": "projects/occ_plugin/occupancy/backbones/resnet3d.py",
    "content": "import math\nfrom functools import partial\nfrom mmdet3d.models.builder import BACKBONES\nfrom mmcv.cnn import build_conv_layer, build_norm_layer, build_upsample_layer\nfrom mmcv.runner import BaseModule\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nimport pdb\n\ndef get_inplanes():\n    return [64, 128, 256, 512]\n\n\ndef conv3x3x3(in_planes, out_planes, stride=1):\n    return nn.Conv3d(in_planes,\n                     out_planes,\n                     kernel_size=3,\n                     stride=stride,\n                     padding=1,\n                     bias=False)\n\n\ndef conv1x1x1(in_planes, out_planes, stride=1):\n    return nn.Conv3d(in_planes,\n                     out_planes,\n                     kernel_size=1,\n                     stride=stride,\n                     bias=False)\n\n\nclass BasicBlock(nn.Module):\n    expansion = 1\n\n    def __init__(self, in_planes, planes, stride=1, downsample=None, norm_cfg=None):\n        super().__init__()\n\n        self.conv1 = conv3x3x3(in_planes, planes, stride)\n        self.bn1 = build_norm_layer(norm_cfg, planes)[1]\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = conv3x3x3(planes, planes)\n        self.bn2 = build_norm_layer(norm_cfg, planes)[1]\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n\n        if self.downsample is not None:\n            residual = self.downsample(x)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\n\nclass Bottleneck(nn.Module):\n    expansion = 4\n\n    def __init__(self, in_planes, planes, stride=1, downsample=None, norm_cfg=None):\n        super().__init__()\n\n        self.conv1 = conv1x1x1(in_planes, planes)\n        self.bn1 = build_norm_layer(norm_cfg, planes)[1]\n        self.conv2 = conv3x3x3(planes, planes, stride)\n        self.bn2 = build_norm_layer(norm_cfg, planes)[1]\n        self.conv3 = conv1x1x1(planes, planes * self.expansion)\n        self.bn3 = build_norm_layer(norm_cfg, planes * self.expansion)[1]\n        self.relu = nn.ReLU(inplace=True)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu(out)\n\n        out = self.conv3(out)\n        out = self.bn3(out)\n\n        if self.downsample is not None:\n            residual = self.downsample(x)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\n@BACKBONES.register_module()\nclass CustomResNet3D(BaseModule):\n    def __init__(self,\n                 depth,\n                 block_inplanes=[64, 128, 256, 512],\n                 block_strides=[1, 2, 2, 2],\n                 out_indices=(0, 1, 2, 3),\n                 n_input_channels=3,\n                 shortcut_type='B',\n                 norm_cfg=dict(type='BN3d', requires_grad=True),\n                 widen_factor=1.0):\n        super().__init__()\n        \n        layer_metas = {\n            10: [1, 1, 1, 1],\n            18: [2, 2, 2, 2],\n            34: [3, 4, 6, 3],\n            50: [3, 4, 6, 3],\n            101: [3, 4, 23, 3],\n        }\n        \n        if depth in [10, 18, 34]:\n            
block = BasicBlock\n        else:\n            assert depth in [50, 101]\n            block = Bottleneck\n        \n        layers = layer_metas[depth]\n        block_inplanes = [int(x * widen_factor) for x in block_inplanes]\n        self.in_planes = block_inplanes[0]\n        self.out_indices = out_indices\n        \n        # replace the first several downsampling layers with the channel-squeeze layers\n        self.input_proj = nn.Sequential(\n            nn.Conv3d(n_input_channels, self.in_planes, kernel_size=(1, 1, 1),\n                      stride=(1, 1, 1), bias=False),\n            build_norm_layer(norm_cfg, self.in_planes)[1],\n            nn.ReLU(inplace=True),\n        )\n        \n        self.layers = nn.ModuleList()\n        for i in range(len(block_inplanes)):\n            self.layers.append(self._make_layer(block, block_inplanes[i], layers[i], \n                                shortcut_type, block_strides[i], norm_cfg=norm_cfg))\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv3d):\n                nn.init.kaiming_normal_(m.weight,\n                                        mode='fan_out',\n                                        nonlinearity='relu')\n            \n            elif isinstance(m, nn.BatchNorm3d):\n                nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n\n    def _downsample_basic_block(self, x, planes, stride):\n        out = F.avg_pool3d(x, kernel_size=1, stride=stride)\n        zero_pads = torch.zeros(out.size(0), planes - out.size(1), out.size(2),\n                                out.size(3), out.size(4))\n        if isinstance(out.data, torch.cuda.FloatTensor):\n            zero_pads = zero_pads.cuda()\n\n        out = torch.cat([out.data, zero_pads], dim=1)\n\n        return out\n\n    def _make_layer(self, block, planes, blocks, shortcut_type, stride=1, norm_cfg=None):\n        downsample = None\n        if stride != 1 or self.in_planes != planes * block.expansion:\n            if shortcut_type == 'A':\n                downsample = partial(self._downsample_basic_block,\n                                     planes=planes * block.expansion,\n                                     stride=stride)\n            else:\n                downsample = nn.Sequential(\n                    conv1x1x1(self.in_planes, planes * block.expansion, stride),\n                    build_norm_layer(norm_cfg, planes * block.expansion)[1])\n\n        layers = []\n        layers.append(\n            block(in_planes=self.in_planes,\n                  planes=planes,\n                  stride=stride,\n                  downsample=downsample,\n                  norm_cfg=norm_cfg))\n        self.in_planes = planes * block.expansion\n        for i in range(1, blocks):\n            layers.append(block(self.in_planes, planes, norm_cfg=norm_cfg))\n\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        x = self.input_proj(x)\n        res = []\n        for index, layer in enumerate(self.layers):\n            x = layer(x)\n            \n            if index in self.out_indices:\n                res.append(x)\n            \n        return res\n\ndef generate_model(model_depth, **kwargs):\n    assert model_depth in [10, 18, 34, 50, 101, 152, 200]\n\n    if model_depth == 10:\n        model = ResNet(BasicBlock, [1, 1, 1, 1], get_inplanes(), **kwargs)\n    elif model_depth == 18:\n        model = ResNet(BasicBlock, [2, 2, 2, 2], get_inplanes(), **kwargs)\n    elif model_depth == 34:\n        model = 
\n\ndef generate_model(model_depth, **kwargs):\n    # CustomResNet3D resolves the per-stage block counts from its depth\n    # argument, so the factory only needs to validate the requested depth\n    assert model_depth in [10, 18, 34, 50, 101]\n    return CustomResNet3D(model_depth, **kwargs)"
  },
  {
    "path": "projects/occ_plugin/occupancy/dense_heads/__init__.py",
    "content": "from .occ_head import OccHead\nfrom .flow_head import FlowHead"
  },
  {
    "path": "projects/occ_plugin/occupancy/dense_heads/flow_head.py",
    "content": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nimport copy\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom mmdet.core import reduce_mean\nfrom mmdet.models import HEADS\nfrom mmcv.cnn import build_conv_layer, build_norm_layer\nfrom .lovasz_softmax import lovasz_softmax\nfrom projects.occ_plugin.utils.nusc_param import nusc_class_names\nfrom projects.occ_plugin.utils.semkitti import Smooth_L1_loss\n\n@HEADS.register_module()\nclass FlowHead(nn.Module):\n    def __init__(\n        self,\n        in_channels,\n        out_channel,\n        num_level=1,\n        num_img_level=1,\n        soft_weights=False,\n        loss_weight_cfg=None,\n        conv_cfg=dict(type='Conv3d', bias=False),\n        norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),\n        fine_topk=20000,\n        point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0],\n        final_occ_size=[256, 256, 20],\n        empty_idx=0,\n        visible_loss=False,\n        balance_cls_weight=True,\n        train_cfg=None,\n        test_cfg=None,\n    ):\n        super(FlowHead, self).__init__()\n        \n        if type(in_channels) is not list:\n            in_channels = [in_channels]\n        \n        self.in_channels = in_channels \n        self.out_channel = out_channel\n        self.num_level = num_level\n        self.fine_topk = fine_topk\n        \n        self.point_cloud_range = torch.tensor(np.array(point_cloud_range)).float()\n        self.final_occ_size = final_occ_size\n        self.visible_loss = visible_loss\n\n        # voxel-level prediction\n        self.occ_convs = nn.ModuleList()\n        for i in range(self.num_level):\n            mid_channel = self.in_channels[i]\n            occ_conv = nn.Sequential(\n                build_conv_layer(conv_cfg, in_channels=self.in_channels[i], \n                        out_channels=mid_channel, kernel_size=3, stride=1, padding=1),\n                build_norm_layer(norm_cfg, mid_channel)[1],\n                nn.ReLU(inplace=True))\n            self.occ_convs.append(occ_conv)\n\n\n        self.occ_pred_conv = nn.Sequential(\n                build_conv_layer(conv_cfg, in_channels=mid_channel, \n                        out_channels=mid_channel//2, kernel_size=1, stride=1, padding=0),\n                build_norm_layer(norm_cfg, mid_channel//2)[1],\n                nn.ReLU(inplace=True),)\n        \n        self.last_conv = build_conv_layer(conv_cfg, in_channels=mid_channel//2, \n                        out_channels=out_channel, kernel_size=1, stride=1, padding=0)\n        self.last_conv.bias = nn.parameter.Parameter(torch.tensor([0.0, 0.0, 0.0], requires_grad=True))\n\n        self.soft_weights = soft_weights\n        self.num_img_level = num_img_level\n        self.num_point_sampling_feat = self.num_level\n        if self.soft_weights:\n            soft_in_channel = mid_channel\n            self.voxel_soft_weights = nn.Sequential(\n                build_conv_layer(conv_cfg, in_channels=soft_in_channel, \n                        out_channels=soft_in_channel//2, kernel_size=1, stride=1, padding=0),\n                build_norm_layer(norm_cfg, soft_in_channel//2)[1],\n                nn.ReLU(inplace=True),\n                build_conv_layer(conv_cfg, in_channels=soft_in_channel//2, \n                        out_channels=self.num_point_sampling_feat, kernel_size=1, stride=1, padding=0))\n       
        self.class_names = nusc_class_names\n        self.empty_idx = empty_idx\n\n    def forward_coarse_voxel(self, voxel_feats):\n        output_occs = []\n        output = {}\n        for feats, occ_conv in zip(voxel_feats, self.occ_convs):\n            output_occs.append(occ_conv(feats))\n\n        if self.soft_weights:\n            voxel_soft_weights = self.voxel_soft_weights(output_occs[0])\n            voxel_soft_weights = torch.softmax(voxel_soft_weights, dim=1)\n        else:\n            voxel_soft_weights = torch.ones([output_occs[0].shape[0], self.num_point_sampling_feat, 1, 1, 1],).to(output_occs[0].device) / self.num_point_sampling_feat\n\n        out_voxel_feats = 0\n        _, _, H, W, D = output_occs[0].shape\n        for feats, weights in zip(output_occs, torch.unbind(voxel_soft_weights, dim=1)):\n            feats = F.interpolate(feats, size=[H, W, D], mode='trilinear', align_corners=False).contiguous()\n            out_voxel_feats += feats * weights.unsqueeze(1)\n        output['out_voxel_feats'] = [out_voxel_feats]\n\n        out_voxel = self.occ_pred_conv(out_voxel_feats)\n        out_voxel = self.last_conv(out_voxel)\n        output['occ'] = [out_voxel]\n\n        return output\n\n    def forward(self, voxel_feats, img_feats=None, transform=None, **kwargs):\n        assert type(voxel_feats) is list and len(voxel_feats) == self.num_level\n\n        # forward voxel\n        output = self.forward_coarse_voxel(voxel_feats)\n\n        res = {\n            'output_voxels': output['occ'],\n        }\n\n        return res\n\n    def loss_voxel(self, output_voxels, target_voxels, tag):\n        B, C, H, W, D = output_voxels.shape\n        tB, tC, tF, tH, tW, tD = target_voxels.shape\n        target_voxels = target_voxels.view(tB*tC, tF, tH, tW, tD)\n\n        assert torch.isnan(output_voxels).sum().item() == 0\n\n        output_voxels = output_voxels.permute(0,2,3,4,1)\n        target_voxels = target_voxels.permute(0,2,3,4,1)\n\n        loss_dict = {}\n\n        # overall weight 0.5 * 0.1 = 0.05 on the smooth-L1 flow regression\n        loss_dict['loss_flow_l1_{}'.format(tag)] = (0.5) * (0.1) * Smooth_L1_loss(output_voxels, target_voxels, ignore_index=255)\n\n        return loss_dict\n
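\n    # Minimal usage sketch (illustrative only; the channel sizes are assumptions):\n    #   head = FlowHead(in_channels=[128], out_channel=3)\n    #   res = head([torch.rand(1, 128, 64, 64, 8)])\n    #   res['output_voxels'][0]  # [1, 3, 64, 64, 8], a 3D flow vector per voxel\n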
\n    def loss_point(self, fine_coord, fine_output, target_voxels, tag):\n\n        selected_gt = target_voxels[:, fine_coord[0,:], fine_coord[1,:], fine_coord[2,:]].long()[0]\n        assert torch.isnan(selected_gt).sum().item() == 0, torch.isnan(selected_gt).sum().item()\n        assert torch.isnan(fine_output).sum().item() == 0, torch.isnan(fine_output).sum().item()\n\n        loss_dict = {}\n\n        # ignore 255 = ignore noise; the loss still backpropagates for label=0 (free voxels)\n        loss_dict['loss_voxel_ce_{}'.format(tag)] = self.loss_voxel_ce_weight * CE_ssc_loss(fine_output, selected_gt, ignore_index=255)\n        loss_dict['loss_voxel_sem_scal_{}'.format(tag)] = self.loss_voxel_sem_scal_weight * sem_scal_loss(fine_output, selected_gt, ignore_index=255)\n        loss_dict['loss_voxel_geo_scal_{}'.format(tag)] = self.loss_voxel_geo_scal_weight * geo_scal_loss(fine_output, selected_gt, ignore_index=255, non_empty_idx=self.empty_idx)\n        loss_dict['loss_voxel_lovasz_{}'.format(tag)] = self.loss_voxel_lovasz_weight * lovasz_softmax(torch.softmax(fine_output, dim=1), selected_gt, ignore=255)\n\n        return loss_dict\n\n    def loss(self, output_voxels=None,\n                output_coords_fine=None, output_voxels_fine=None,\n                target_voxels=None, **kwargs):\n        loss_dict = {}\n        for index, output_voxel in enumerate(output_voxels):\n            loss_dict.update(self.loss_voxel(output_voxel, target_voxels, tag='c_{}'.format(index)))\n\n        return loss_dict"
  },
  {
    "path": "projects/occ_plugin/occupancy/dense_heads/lovasz_softmax.py",
    "content": "# -*- coding:utf-8 -*-\n# author: Xinge\n\n\n\"\"\"\nLovasz-Softmax and Jaccard hinge loss in PyTorch\nMaxim Berman 2018 ESAT-PSI KU Leuven (MIT License)\n\"\"\"\n\nfrom __future__ import print_function, division\n\nimport torch\nfrom torch.autograd import Variable\nimport torch.nn.functional as F\nimport numpy as np\ntry:\n    from itertools import  ifilterfalse\nexcept ImportError: # py3k\n    from itertools import  filterfalse as ifilterfalse\n\ndef lovasz_grad(gt_sorted):\n    \"\"\"\n    Computes gradient of the Lovasz extension w.r.t sorted errors\n    See Alg. 1 in paper\n    \"\"\"\n    p = len(gt_sorted)\n    gts = gt_sorted.sum()\n    intersection = gts - gt_sorted.float().cumsum(0)\n    union = gts + (1 - gt_sorted).float().cumsum(0)\n    jaccard = 1. - intersection / union\n    if p > 1: # cover 1-pixel case\n        jaccard[1:p] = jaccard[1:p] - jaccard[0:-1]\n    return jaccard\n\n\ndef iou_binary(preds, labels, EMPTY=1., ignore=None, per_image=True):\n    \"\"\"\n    IoU for foreground class\n    binary: 1 foreground, 0 background\n    \"\"\"\n    if not per_image:\n        preds, labels = (preds,), (labels,)\n    ious = []\n    for pred, label in zip(preds, labels):\n        intersection = ((label == 1) & (pred == 1)).sum()\n        union = ((label == 1) | ((pred == 1) & (label != ignore))).sum()\n        if not union:\n            iou = EMPTY\n        else:\n            iou = float(intersection) / float(union)\n        ious.append(iou)\n    iou = mean(ious)    # mean accross images if per_image\n    return 100 * iou\n\n\ndef iou(preds, labels, C, EMPTY=1., ignore=None, per_image=False):\n    \"\"\"\n    Array of IoU for each (non ignored) class\n    \"\"\"\n    if not per_image:\n        preds, labels = (preds,), (labels,)\n    ious = []\n    for pred, label in zip(preds, labels):\n        iou = []    \n        for i in range(C):\n            if i != ignore: # The ignored label is sometimes among predicted classes (ENet - CityScapes)\n                intersection = ((label == i) & (pred == i)).sum()\n                union = ((label == i) | ((pred == i) & (label != ignore))).sum()\n                if not union:\n                    iou.append(EMPTY)\n                else:\n                    iou.append(float(intersection) / float(union))\n        ious.append(iou)\n    ious = [mean(iou) for iou in zip(*ious)] # mean accross images if per_image\n    return 100 * np.array(ious)\n\n\n# --------------------------- BINARY LOSSES ---------------------------\n\n\ndef lovasz_hinge(logits, labels, per_image=True, ignore=None):\n    \"\"\"\n    Binary Lovasz hinge loss\n      logits: [B, H, W] Variable, logits at each pixel (between -\\infty and +\\infty)\n      labels: [B, H, W] Tensor, binary ground truth masks (0 or 1)\n      per_image: compute the loss per image instead of per batch\n      ignore: void class id\n    \"\"\"\n    if per_image:\n        loss = mean(lovasz_hinge_flat(*flatten_binary_scores(log.unsqueeze(0), lab.unsqueeze(0), ignore))\n                          for log, lab in zip(logits, labels))\n    else:\n        loss = lovasz_hinge_flat(*flatten_binary_scores(logits, labels, ignore))\n    return loss\n\n\ndef lovasz_hinge_flat(logits, labels):\n    \"\"\"\n    Binary Lovasz hinge loss\n      logits: [P] Variable, logits at each prediction (between -\\infty and +\\infty)\n      labels: [P] Tensor, binary ground truth labels (0 or 1)\n      ignore: label to ignore\n    \"\"\"\n    if len(labels) == 0:\n        # only void pixels, the gradients 
\n\ndef iou_binary(preds, labels, EMPTY=1., ignore=None, per_image=True):\n    \"\"\"\n    IoU for foreground class\n    binary: 1 foreground, 0 background\n    \"\"\"\n    if not per_image:\n        preds, labels = (preds,), (labels,)\n    ious = []\n    for pred, label in zip(preds, labels):\n        intersection = ((label == 1) & (pred == 1)).sum()\n        union = ((label == 1) | ((pred == 1) & (label != ignore))).sum()\n        if not union:\n            iou = EMPTY\n        else:\n            iou = float(intersection) / float(union)\n        ious.append(iou)\n    iou = mean(ious)    # mean across images if per_image\n    return 100 * iou\n\n\ndef iou(preds, labels, C, EMPTY=1., ignore=None, per_image=False):\n    \"\"\"\n    Array of IoU for each (non ignored) class\n    \"\"\"\n    if not per_image:\n        preds, labels = (preds,), (labels,)\n    ious = []\n    for pred, label in zip(preds, labels):\n        iou = []\n        for i in range(C):\n            if i != ignore: # The ignored label is sometimes among predicted classes (ENet - CityScapes)\n                intersection = ((label == i) & (pred == i)).sum()\n                union = ((label == i) | ((pred == i) & (label != ignore))).sum()\n                if not union:\n                    iou.append(EMPTY)\n                else:\n                    iou.append(float(intersection) / float(union))\n        ious.append(iou)\n    ious = [mean(iou) for iou in zip(*ious)] # mean across images if per_image\n    return 100 * np.array(ious)\n\n\n# --------------------------- BINARY LOSSES ---------------------------\n\n\ndef lovasz_hinge(logits, labels, per_image=True, ignore=None):\n    \"\"\"\n    Binary Lovasz hinge loss\n      logits: [B, H, W] Variable, logits at each pixel (between -\\infty and +\\infty)\n      labels: [B, H, W] Tensor, binary ground truth masks (0 or 1)\n      per_image: compute the loss per image instead of per batch\n      ignore: void class id\n    \"\"\"\n    if per_image:\n        loss = mean(lovasz_hinge_flat(*flatten_binary_scores(log.unsqueeze(0), lab.unsqueeze(0), ignore))\n                          for log, lab in zip(logits, labels))\n    else:\n        loss = lovasz_hinge_flat(*flatten_binary_scores(logits, labels, ignore))\n    return loss\n\n\ndef lovasz_hinge_flat(logits, labels):\n    \"\"\"\n    Binary Lovasz hinge loss\n      logits: [P] Variable, logits at each prediction (between -\\infty and +\\infty)\n      labels: [P] Tensor, binary ground truth labels (0 or 1)\n      ignore: label to ignore\n    \"\"\"\n    if len(labels) == 0:\n        # only void pixels, the gradients should be 0\n        return logits.sum() * 0.\n    signs = 2. * labels.float() - 1.\n    errors = (1. - logits * Variable(signs))\n    errors_sorted, perm = torch.sort(errors, dim=0, descending=True)\n    perm = perm.data\n    gt_sorted = labels[perm]\n    grad = lovasz_grad(gt_sorted)\n    loss = torch.dot(F.relu(errors_sorted), Variable(grad))\n    return loss\n\n\ndef flatten_binary_scores(scores, labels, ignore=None):\n    \"\"\"\n    Flattens predictions in the batch (binary case)\n    Remove labels equal to 'ignore'\n    \"\"\"\n    scores = scores.view(-1)\n    labels = labels.view(-1)\n    if ignore is None:\n        return scores, labels\n    valid = (labels != ignore)\n    vscores = scores[valid]\n    vlabels = labels[valid]\n    return vscores, vlabels\n\n\nclass StableBCELoss(torch.nn.modules.Module):\n    def __init__(self):\n        super(StableBCELoss, self).__init__()\n\n    def forward(self, input, target):\n        neg_abs = - input.abs()\n        loss = input.clamp(min=0) - input * target + (1 + neg_abs.exp()).log()\n        return loss.mean()\n\n\ndef binary_xloss(logits, labels, ignore=None):\n    \"\"\"\n    Binary Cross entropy loss\n      logits: [B, H, W] Variable, logits at each pixel (between -\\infty and +\\infty)\n      labels: [B, H, W] Tensor, binary ground truth masks (0 or 1)\n      ignore: void class id\n    \"\"\"\n    logits, labels = flatten_binary_scores(logits, labels, ignore)\n    loss = StableBCELoss()(logits, Variable(labels.float()))\n    return loss\n\n\n# --------------------------- MULTICLASS LOSSES ---------------------------\n\n\ndef lovasz_softmax(probas, labels, classes='present', per_image=False, ignore=None):\n    \"\"\"\n    Multi-class Lovasz-Softmax loss\n      probas: [B, C, H, W] Variable, class probabilities at each prediction (between 0 and 1).\n              Interpreted as binary (sigmoid) output with outputs of size [B, H, W].\n      labels: [B, H, W] Tensor, ground truth labels (between 0 and C - 1)\n      classes: 'all' for all, 'present' for classes present in labels, or a list of classes to average.\n      per_image: compute the loss per image instead of per batch\n      ignore: void class labels\n    \"\"\"\n    if per_image:\n        loss = mean(lovasz_softmax_flat(*flatten_probas(prob.unsqueeze(0), lab.unsqueeze(0), ignore), classes=classes)\n                          for prob, lab in zip(probas, labels))\n    else:\n        loss = lovasz_softmax_flat(*flatten_probas(probas, labels, ignore), classes=classes)\n    return loss\n
\n\ndef lovasz_softmax_flat(probas, labels, classes='present'):\n    \"\"\"\n    Multi-class Lovasz-Softmax loss\n      probas: [P, C] Variable, class probabilities at each prediction (between 0 and 1)\n      labels: [P] Tensor, ground truth labels (between 0 and C - 1)\n      classes: 'all' for all, 'present' for classes present in labels, or a list of classes to average.\n    \"\"\"\n    if probas.numel() == 0:\n        # only void pixels, the gradients should be 0\n        return probas * 0.\n    C = probas.size(1)\n    losses = []\n    class_to_sum = list(range(C)) if classes in ['all', 'present'] else classes\n    for c in class_to_sum:\n        fg = (labels == c).float() # foreground for class c\n        if (classes == 'present' and fg.sum() == 0):\n            continue\n        if C == 1:\n            if len(classes) > 1:\n                raise ValueError('Sigmoid output possible only with 1 class')\n            class_pred = probas[:, 0]\n        else:\n            class_pred = probas[:, c]\n        errors = (Variable(fg) - class_pred).abs()\n        errors_sorted, perm = torch.sort(errors, 0, descending=True)\n        perm = perm.data\n        fg_sorted = fg[perm]\n        losses.append(torch.dot(errors_sorted, Variable(lovasz_grad(fg_sorted))))\n    return mean(losses)\n\n\ndef flatten_probas(probas, labels, ignore=None):\n    \"\"\"\n    Flattens predictions in the batch\n    \"\"\"\n    if probas.dim() == 2:\n        if ignore is not None:\n            valid = (labels != ignore)\n            probas = probas[valid]\n            labels = labels[valid]\n        return probas, labels\n\n    elif probas.dim() == 3:\n        # assumes output of a sigmoid layer\n        B, H, W = probas.size()\n        probas = probas.view(B, 1, H, W)\n    elif probas.dim() == 5:\n        # 3D segmentation\n        B, C, L, H, W = probas.size()\n        probas = probas.contiguous().view(B, C, L, H*W)\n    B, C, H, W = probas.size()\n    probas = probas.permute(0, 2, 3, 1).contiguous().view(-1, C)  # B * H * W, C = P, C\n    labels = labels.view(-1)\n    if ignore is None:\n        return probas, labels\n    valid = (labels != ignore)\n    vprobas = probas[valid.nonzero().squeeze()]\n    vlabels = labels[valid]\n    return vprobas, vlabels\n\n\ndef xloss(logits, labels, ignore=None):\n    \"\"\"\n    Cross entropy loss\n    \"\"\"\n    return F.cross_entropy(logits, Variable(labels), ignore_index=255)\n\n\ndef jaccard_loss(probas, labels, ignore=None, smooth=100, bk_class=None):\n    \"\"\"\n    Multi-class smoothed Jaccard loss\n      probas: [B, C, H, W] Variable, class probabilities at each prediction (between 0 and 1).\n      labels: [B, H, W] Tensor, ground truth labels (between 0 and C - 1)\n      ignore: void class labels\n    \"\"\"\n    vprobas, vlabels = flatten_probas(probas, labels, ignore)\n\n    true_1_hot = torch.eye(vprobas.shape[1])[vlabels]\n\n    if bk_class:\n        one_hot_assignment = torch.ones_like(vlabels)\n        one_hot_assignment[vlabels == bk_class] = 0\n        one_hot_assignment = one_hot_assignment.float().unsqueeze(1)\n        true_1_hot = true_1_hot * one_hot_assignment\n\n    true_1_hot = true_1_hot.to(vprobas.device)\n    intersection = torch.sum(vprobas * true_1_hot)\n    cardinality = torch.sum(vprobas + true_1_hot)\n    # the smoothing term belongs to both the numerator and the denominator\n    loss = ((intersection + smooth) / (cardinality - intersection + smooth)).mean()\n    return (1 - loss) * smooth\n\n\ndef hinge_jaccard_loss(probas, labels, ignore=None, classes='present', hinge=0.1, smooth=100):\n    \"\"\"\n    Multi-class Hinge Jaccard loss\n      probas: [B, C, H, W] Variable, class probabilities at each prediction (between 0 and 1).\n      labels: [B, H, W] Tensor, ground truth labels (between 0 and C - 1)\n      classes: 'all' for all, 'present' for classes present in labels, or a list of classes to average.\n      ignore: void class labels\n    \"\"\"\n    vprobas, vlabels = flatten_probas(probas, labels, ignore)\n    C = vprobas.size(1)\n    losses = []\n    class_to_sum = list(range(C)) if classes in ['all', 'present'] else classes\n    for c in class_to_sum:\n        if c in vlabels:\n            c_sample_ind = vlabels == c\n            cprobas = vprobas[c_sample_ind, :]\n            non_c_ind = np.array([a for a in class_to_sum if a != c])\n            class_pred = cprobas[:, c]\n            max_non_class_pred = torch.max(cprobas[:, non_c_ind], dim=1)[0]\n            TP = torch.sum(torch.clamp(class_pred - max_non_class_pred, max=hinge) + 1.) + smooth\n            FN = torch.sum(torch.clamp(max_non_class_pred - class_pred, min=-hinge) + hinge)\n\n            if (~c_sample_ind).sum() == 0:\n                FP = 0\n            else:\n                nonc_probas = vprobas[~c_sample_ind, :]\n                class_pred = nonc_probas[:, c]\n                max_non_class_pred = torch.max(nonc_probas[:, non_c_ind], dim=1)[0]\n                FP = torch.sum(torch.clamp(class_pred - max_non_class_pred, max=hinge) + 1.)\n\n            losses.append(1 - TP/(TP+FP+FN))\n\n    if len(losses) == 0: return 0\n    return mean(losses)\n\n\n# --------------------------- HELPER FUNCTIONS ---------------------------\ndef isnan(x):\n    return x != x\n\n\ndef mean(l, ignore_nan=False, empty=0):\n    \"\"\"\n    nanmean compatible with generators.\n    \"\"\"\n    l = iter(l)\n    if ignore_nan:\n        l = ifilterfalse(isnan, l)\n    try:\n        n = 1\n        acc = next(l)\n    except StopIteration:\n        if empty == 'raise':\n            raise ValueError('Empty mean')\n        return empty\n    for n, v in enumerate(l, 2):\n        acc += v\n    if n == 1:\n        return acc\n    return acc / n\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/dense_heads/occ_head.py",
    "content": "# Developed by Junyi Ma based on the codebase of OpenOccupancy\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom mmdet.core import reduce_mean\nfrom mmdet.models import HEADS\nfrom mmcv.cnn import build_conv_layer, build_norm_layer\nfrom .lovasz_softmax import lovasz_softmax\nfrom projects.occ_plugin.utils.nusc_param import nusc_class_names\nfrom projects.occ_plugin.utils.semkitti import geo_scal_loss, sem_scal_loss, CE_ssc_loss\n\n@HEADS.register_module()\nclass OccHead(nn.Module):\n    def __init__(\n        self,\n        in_channels,\n        out_channel,\n        num_level=1,\n        num_img_level=1,\n        soft_weights=False,\n        loss_weight_cfg=None,\n        conv_cfg=dict(type='Conv3d', bias=False),\n        norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),\n        fine_topk=20000,\n        point_cloud_range=[-51.2, -51.2, -5.0, 51.2, 51.2, 3.0],\n        final_occ_size=[256, 256, 20],\n        empty_idx=0,\n        visible_loss=False,\n        balance_cls_weight=True,\n        train_cfg=None,\n        test_cfg=None,\n    ):\n        super(OccHead, self).__init__()\n        \n        if type(in_channels) is not list:\n            in_channels = [in_channels]\n        \n        self.in_channels = in_channels\n        self.out_channel = out_channel\n        self.num_level = num_level\n        self.fine_topk = fine_topk\n        \n        self.point_cloud_range = torch.tensor(np.array(point_cloud_range)).float()\n        self.final_occ_size = final_occ_size\n        self.visible_loss = visible_loss\n\n\n        if loss_weight_cfg is None:\n            self.loss_weight_cfg = {\n                \"loss_voxel_ce_weight\": 1.0,\n                \"loss_voxel_sem_scal_weight\": 1.0,\n                \"loss_voxel_geo_scal_weight\": 1.0,\n                \"loss_voxel_lovasz_weight\": 1.0,\n            }\n        else:\n            self.loss_weight_cfg = loss_weight_cfg\n        \n        # voxel losses\n        self.loss_voxel_ce_weight = self.loss_weight_cfg.get('loss_voxel_ce_weight', 1.0)\n        self.loss_voxel_sem_scal_weight = self.loss_weight_cfg.get('loss_voxel_sem_scal_weight', 1.0)\n        self.loss_voxel_geo_scal_weight = self.loss_weight_cfg.get('loss_voxel_geo_scal_weight', 1.0)\n        self.loss_voxel_lovasz_weight = self.loss_weight_cfg.get('loss_voxel_lovasz_weight', 1.0)\n        \n        # voxel-level prediction\n        self.occ_convs = nn.ModuleList()\n        for i in range(self.num_level):\n            mid_channel = self.in_channels[i]\n            occ_conv = nn.Sequential(\n                build_conv_layer(conv_cfg, in_channels=self.in_channels[i], \n                        out_channels=mid_channel, kernel_size=3, stride=1, padding=1),\n                build_norm_layer(norm_cfg, mid_channel)[1],\n                nn.ReLU(inplace=True))\n            self.occ_convs.append(occ_conv)\n\n\n        self.occ_pred_conv = nn.Sequential(\n                build_conv_layer(conv_cfg, in_channels=mid_channel, \n                        out_channels=mid_channel//2, kernel_size=1, stride=1, padding=0),\n                build_norm_layer(norm_cfg, mid_channel//2)[1],\n                nn.ReLU(inplace=True),\n                build_conv_layer(conv_cfg, in_channels=mid_channel//2, \n                        out_channels=out_channel, kernel_size=1, stride=1, padding=0))\n\n  
        self.soft_weights = soft_weights\n        self.num_img_level = num_img_level\n        self.num_point_sampling_feat = self.num_level\n        if self.soft_weights:\n            soft_in_channel = mid_channel\n            self.voxel_soft_weights = nn.Sequential(\n                build_conv_layer(conv_cfg, in_channels=soft_in_channel,\n                        out_channels=soft_in_channel//2, kernel_size=1, stride=1, padding=0),\n                build_norm_layer(norm_cfg, soft_in_channel//2)[1],\n                nn.ReLU(inplace=True),\n                build_conv_layer(conv_cfg, in_channels=soft_in_channel//2,\n                        out_channels=self.num_point_sampling_feat, kernel_size=1, stride=1, padding=0))\n\n        if balance_cls_weight:\n            # occupied classes weigh 5x more than free space (index 0)\n            self.class_weights = np.ones((out_channel,))\n            self.class_weights[1:] = 5\n            self.class_weights = torch.from_numpy(self.class_weights)\n        else:\n            # keep a tensor here as well so type_as() in loss_voxel works in both branches\n            self.class_weights = torch.ones(out_channel)\n\n        self.class_names = nusc_class_names\n        self.empty_idx = empty_idx\n\n    def forward_coarse_voxel(self, voxel_feats):\n        output_occs = []\n        output = {}\n        for feats, occ_conv in zip(voxel_feats, self.occ_convs):\n            output_occs.append(occ_conv(feats))\n\n        if self.soft_weights:\n            voxel_soft_weights = self.voxel_soft_weights(output_occs[0])\n            voxel_soft_weights = torch.softmax(voxel_soft_weights, dim=1)\n        else:\n            voxel_soft_weights = torch.ones([output_occs[0].shape[0], self.num_point_sampling_feat, 1, 1, 1],).to(output_occs[0].device) / self.num_point_sampling_feat\n\n        out_voxel_feats = 0\n        _, _, H, W, D = output_occs[0].shape\n        for feats, weights in zip(output_occs, torch.unbind(voxel_soft_weights, dim=1)):\n            feats = F.interpolate(feats, size=[H, W, D], mode='trilinear', align_corners=False).contiguous()\n            out_voxel_feats += feats * weights.unsqueeze(1)\n        output['out_voxel_feats'] = [out_voxel_feats]\n\n        out_voxel = self.occ_pred_conv(out_voxel_feats)\n        output['occ'] = [out_voxel]\n\n        return output\n\n    def forward(self, voxel_feats, img_feats=None, transform=None, **kwargs):\n        assert type(voxel_feats) is list and len(voxel_feats) == self.num_level\n\n        # forward voxel\n        output = self.forward_coarse_voxel(voxel_feats)\n\n        res = {\n            'output_voxels': output['occ'],\n        }\n\n        return res\n
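\n    # Minimal usage sketch (illustrative only; the channel sizes are assumptions):\n    #   head = OccHead(in_channels=[128], out_channel=2)\n    #   res = head([torch.rand(1, 128, 64, 64, 8)])\n    #   res['output_voxels'][0]  # [1, 2, 64, 64, 8] occupancy logits\n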
\n    def loss_voxel(self, output_voxels, target_voxels, tag):\n        B, C, H, W, D = output_voxels.shape\n        tB, tC, tH, tW, tD = target_voxels.shape\n        target_voxels = target_voxels.view(tB*tC, tH, tW, tD)\n        ratio = target_voxels.shape[2] // H\n        if ratio != 1:\n            # downsample the GT by majority vote within each ratio^3 cell:\n            # free voxels (label 0) in non-empty cells get unique negative ids\n            # so no free label can dominate the vote, and cells where such an\n            # id wins are set to 255 and ignored by the loss below\n            target_voxels = target_voxels.reshape(B, H, ratio, W, ratio, D, ratio).permute(0,1,3,5,2,4,6).reshape(B, H, W, D, ratio**3)\n            empty_mask = target_voxels.sum(-1) == self.empty_idx\n            target_voxels = target_voxels.to(torch.int64)\n            occ_space = target_voxels[~empty_mask]\n            occ_space[occ_space==0] = -torch.arange(len(occ_space[occ_space==0])).to(occ_space.device) - 1\n            target_voxels[~empty_mask] = occ_space\n            target_voxels = torch.mode(target_voxels, dim=-1)[0]\n            target_voxels[target_voxels<0] = 255\n            target_voxels = target_voxels.long()\n\n        assert torch.isnan(output_voxels).sum().item() == 0\n        assert torch.isnan(target_voxels).sum().item() == 0\n\n        loss_dict = {}\n\n        loss_dict['loss_voxel_ce_{}'.format(tag)] = (0.5) * CE_ssc_loss(output_voxels, target_voxels, self.class_weights.type_as(output_voxels), ignore_index=255)\n\n        return loss_dict\n\n    def loss_point(self, fine_coord, fine_output, target_voxels, tag):\n\n        selected_gt = target_voxels[:, fine_coord[0,:], fine_coord[1,:], fine_coord[2,:]].long()[0]\n        assert torch.isnan(selected_gt).sum().item() == 0, torch.isnan(selected_gt).sum().item()\n        assert torch.isnan(fine_output).sum().item() == 0, torch.isnan(fine_output).sum().item()\n\n        loss_dict = {}\n\n        # ignore 255 = ignore noise; the loss still backpropagates for label=0 (free voxels)\n        loss_dict['loss_voxel_ce_{}'.format(tag)] = self.loss_voxel_ce_weight * CE_ssc_loss(fine_output, selected_gt, ignore_index=255)\n        loss_dict['loss_voxel_sem_scal_{}'.format(tag)] = self.loss_voxel_sem_scal_weight * sem_scal_loss(fine_output, selected_gt, ignore_index=255)\n        loss_dict['loss_voxel_geo_scal_{}'.format(tag)] = self.loss_voxel_geo_scal_weight * geo_scal_loss(fine_output, selected_gt, ignore_index=255, non_empty_idx=self.empty_idx)\n        loss_dict['loss_voxel_lovasz_{}'.format(tag)] = self.loss_voxel_lovasz_weight * lovasz_softmax(torch.softmax(fine_output, dim=1), selected_gt, ignore=255)\n\n        return loss_dict\n\n    def loss(self, output_voxels=None,\n                output_coords_fine=None, output_voxels_fine=None,\n                target_voxels=None, **kwargs):\n        loss_dict = {}\n        for index, output_voxel in enumerate(output_voxels):\n            loss_dict.update(self.loss_voxel(output_voxel, target_voxels, tag='c_{}'.format(index)))\n\n        return loss_dict\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/dense_heads/utils.py",
    "content": "# borrowed from https://github.com/GuoPingPan/RPVNet/blob/main/core/models/utils/utils.py\n\nimport time\nimport numpy as np\n\nimport torch\nfrom torch.nn.functional import grid_sample\n\nimport torchsparse.nn.functional as F\nfrom torchsparse import PointTensor, SparseTensor\nfrom torchsparse.nn.utils import get_kernel_offsets\n\n__all__ = ['initial_voxelize', 'point_to_voxel', 'voxel_to_point',\n           'range_to_point','point_to_range']\n\n\ndef initial_voxelize(z: PointTensor, after_res) -> SparseTensor:\n\n    new_float_coord = torch.cat(\n        [z.C[:, :3]  / after_res, z.C[:, -1].view(-1, 1)], 1)\n\n    pc_hash = F.sphash(torch.round(new_float_coord).int())\n\n    sparse_hash = torch.unique(pc_hash)\n\n    idx_query = F.sphashquery(pc_hash, sparse_hash)\n\n    counts = F.spcount(idx_query.int(), len(sparse_hash))\n\n    inserted_coords = F.spvoxelize(torch.round(new_float_coord), idx_query,counts)\n\n    inserted_coords = torch.round(inserted_coords).int()\n\n    inserted_feat = F.spvoxelize(z.F, idx_query, counts)\n\n    new_tensor = SparseTensor(inserted_feat, inserted_coords, 1)\n\n    new_tensor.cmaps.setdefault((1,1,1), new_tensor.coords)\n\n    z.additional_features['idx_query'][(1,1,1)] = idx_query\n    z.additional_features['counts'][(1,1,1)] = counts\n\n    return new_tensor.to(z.F.device)\n\n\ndef point_to_voxel(x: SparseTensor, z: PointTensor) -> SparseTensor:\n    if z.additional_features is None or z.additional_features['idx_query'] is None \\\n            or z.additional_features['idx_query'].get(x.s) is None:\n\n        pc_hash = F.sphash(\n            torch.cat([\n                torch.round(z.C[:, :3] / x.s[0]).int(),\n                z.C[:, -1].int().view(-1, 1)\n            ], 1))\n        sparse_hash = F.sphash(x.C)\n\n        idx_query = F.sphashquery(pc_hash, sparse_hash)\n        counts = F.spcount(idx_query.int(), x.C.shape[0])\n    else:\n        idx_query = z.additional_features['idx_query'][x.s]\n        counts = z.additional_features['counts'][x.s]\n\n    inserted_feat = F.spvoxelize(z.F, idx_query, counts)\n    new_tensor = SparseTensor(inserted_feat, x.C, x.s)\n    new_tensor.cmaps = x.cmaps\n    new_tensor.kmaps = x.kmaps\n\n    return new_tensor\n\n\ndef voxel_to_point(x: SparseTensor, z: PointTensor, nearest=False) -> torch.Tensor:\n    if z.idx_query is None or z.weights is None or z.idx_query.get(x.s) is None \\\n            or z.weights.get(x.s) is None:\n\n        off = get_kernel_offsets(2, x.s, 1, device=z.F.device)\n\n        old_hash = F.sphash(\n            torch.cat([\n                torch.round(z.C[:, :3] / x.s[0]).int(),\n                z.C[:, -1].int().view(-1, 1)\n            ], 1), off)\n\n\n        pc_hash = F.sphash(x.C.to(z.F.device))\n\n        idx_query = F.sphashquery(old_hash, pc_hash)\n\n        weights = F.calc_ti_weights(z.C, idx_query,\n                                    scale=x.s[0]).transpose(0, 1).contiguous()\n\n        idx_query = idx_query.transpose(0, 1).contiguous()\n\n        if nearest:\n            weights[:, 1:] = 0.\n            idx_query[:, 1:] = -1\n\n        new_feat = F.spdevoxelize(x.F, idx_query, weights)\n\n        if x.s == (1,1,1):\n            z.idx_query[x.s] = idx_query\n            z.weights[x.s] = weights\n    else:\n        new_feat = F.spdevoxelize(x.F, z.idx_query.get(x.s), z.weights.get(x.s))\n\n    return new_feat\n\ndef range_to_point(x,px,py):\n\n    r2p = []\n\n    for batch,(p_x,p_y) in enumerate(zip(px,py)):\n        pypx = 
torch.stack([p_x,p_y],dim=2).to(px[0].device)\n        resampled = grid_sample(x[batch].unsqueeze(0), pypx.unsqueeze(0), align_corners=False)\n        r2p.append(resampled.squeeze().permute(1,0))\n    return torch.concat(r2p,dim=0)\n\n\ndef point_to_range(range_shape,pF,px,py):\n    H, W = range_shape\n    cnt = 0\n    r = []\n    # t1 = time.time()\n    for batch,(p_x,p_y) in enumerate(zip(px,py)):\n        image = torch.zeros(size=(H,W,pF.shape[1])).to(px[0].device)\n        image_cumsum = torch.zeros(size=(H,W,pF.shape[1])) + 1e-5\n\n        p_x = torch.floor((p_x/2. + 0.5) * W).long()\n        p_y = torch.floor((p_y/2. + 0.5) * H).long()\n\n        ''' v1: directly assign '''\n        # image[p_y,p_x] = pF[cnt:cnt+p_x.shape[1]]\n\n        ''' v2: use average '''\n        # caveat: advanced-indexing += does not accumulate repeated indices, so\n        # a range pixel hit by several points keeps a single (arbitrary) value\n        # and its denominator stays at ~1\n        image[p_y,p_x] += pF[cnt:cnt+p_x.shape[1]]\n        image_cumsum[p_y,p_x] += torch.ones(pF.shape[1])\n        image = image/image_cumsum.to(px[0].device)\n\n        r.append(image.permute(2,0,1))\n        cnt += p_x.shape[1]\n    return torch.stack(r,dim=0).to(px[0].device)"
  },
  {
    "path": "projects/occ_plugin/occupancy/detectors/__init__.py",
    "content": "from .ocfnet import OCFNet\r\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/detectors/bevdepth.py",
    "content": "# Copyright (c) Phigent Robotics. All rights reserved.\nimport math\nimport torch\nfrom mmcv.runner import force_fp32\nimport torch.nn.functional as F\n\nfrom mmdet.models import DETECTORS\nfrom mmdet3d.models import builder\nfrom torch.utils.checkpoint import checkpoint\nfrom mmdet3d.models.detectors import CenterPoint\n\nimport pdb\n\n@DETECTORS.register_module()\nclass BEVDet(CenterPoint):\n    def __init__(self, img_view_transformer=None,\n                 img_bev_encoder_backbone=None,\n                 img_bev_encoder_neck=None, **kwargs):\n        super(BEVDet, self).__init__(**kwargs)\n        \n        if img_view_transformer is not None:\n            self.img_view_transformer = builder.build_neck(img_view_transformer)\n        else:\n            self.img_view_transformer = None\n        \n        if img_bev_encoder_backbone is not None:\n            self.img_bev_encoder_backbone = builder.build_backbone(img_bev_encoder_backbone)\n        else:\n            self.img_bev_encoder_backbone = torch.nn.Identity()\n        \n        if img_bev_encoder_neck is not None:\n            self.img_bev_encoder_neck = builder.build_neck(img_bev_encoder_neck)\n        else:\n            self.img_bev_encoder_neck = torch.nn.Identity()\n\n    def image_encoder(self, img):\n        imgs = img\n        B, N, C, imH, imW = imgs.shape\n        imgs = imgs.view(B * N, C, imH, imW)\n        x = self.img_backbone(imgs)\n        if self.with_img_neck:\n            x = self.img_neck(x)\n            if type(x) in [list, tuple]:\n                x = x[0]\n        _, output_dim, ouput_H, output_W = x.shape\n        x = x.view(B, N, output_dim, ouput_H, output_W)\n        return x\n\n    @force_fp32()\n    def bev_encoder(self, x):\n        x = self.img_bev_encoder_backbone(x)\n        x = self.img_bev_encoder_neck(x)\n        if type(x) in [list, tuple]:\n            x = x[0]\n        return x\n\n    def extract_img_feat(self, img, img_metas):\n        \"\"\"Extract features of images.\"\"\"\n        x = self.image_encoder(img[0])\n        x = self.img_view_transformer([x] + img[1:7])\n        x = self.bev_encoder(x)\n        return [x]\n\n    def extract_feat(self, points, img, img_metas):\n        \"\"\"Extract features from images and points.\"\"\"\n        img_feats = self.extract_img_feat(img, img_metas)\n        pts_feats = None\n        return (img_feats, pts_feats)\n\n    def forward_train(self,\n                      points=None,\n                      img_metas=None,\n                      gt_bboxes_3d=None,\n                      gt_labels_3d=None,\n                      gt_labels=None,\n                      gt_bboxes=None,\n                      img_inputs=None,\n                      proposals=None,\n                      gt_bboxes_ignore=None):\n        \"\"\"Forward training function.\n\n        Args:\n            points (list[torch.Tensor], optional): Points of each sample.\n                Defaults to None.\n            img_metas (list[dict], optional): Meta information of each sample.\n                Defaults to None.\n            gt_bboxes_3d (list[:obj:`BaseInstance3DBoxes`], optional):\n                Ground truth 3D boxes. Defaults to None.\n            gt_labels_3d (list[torch.Tensor], optional): Ground truth labels\n                of 3D boxes. Defaults to None.\n            gt_labels (list[torch.Tensor], optional): Ground truth labels\n                of 2D boxes in images. 
Defaults to None.\n            gt_bboxes (list[torch.Tensor], optional): Ground truth 2D boxes in\n                images. Defaults to None.\n            img (torch.Tensor optional): Images of each sample with shape\n                (N, C, H, W). Defaults to None.\n            proposals ([list[torch.Tensor], optional): Predicted proposals\n                used for training Fast RCNN. Defaults to None.\n            gt_bboxes_ignore (list[torch.Tensor], optional): Ground truth\n                2D boxes in images to be ignored. Defaults to None.\n\n        Returns:\n            dict: Losses of different branches.\n        \"\"\"\n        img_feats, pts_feats = self.extract_feat(\n            points, img=img_inputs, img_metas=img_metas)\n        assert self.with_pts_bbox\n        losses = dict()\n        losses_pts = self.forward_pts_train(img_feats, gt_bboxes_3d,\n                                            gt_labels_3d, img_metas,\n                                            gt_bboxes_ignore)\n        losses.update(losses_pts)\n        return losses\n\n    def forward_test(self, points=None, img_metas=None, img_inputs=None, **kwargs):\n        \"\"\"\n        Args:\n            points (list[torch.Tensor]): the outer list indicates test-time\n                augmentations and inner torch.Tensor should have a shape NxC,\n                which contains all points in the batch.\n            img_metas (list[list[dict]]): the outer list indicates test-time\n                augs (multiscale, flip, etc.) and the inner list indicates\n                images in a batch\n            img (list[torch.Tensor], optional): the outer\n                list indicates test-time augmentations and inner\n                torch.Tensor should have a shape NxCxHxW, which contains\n                all images in the batch. 
Defaults to None.\n        \"\"\"\n        for var, name in [(img_inputs, 'img_inputs'), (img_metas, 'img_metas')]:\n            if not isinstance(var, list):\n                raise TypeError('{} must be a list, but got {}'.format(\n                    name, type(var)))\n\n        num_augs = len(img_inputs)\n        if num_augs != len(img_metas):\n            raise ValueError(\n                'num of augmentations ({}) != num of image meta ({})'.format(\n                    len(img_inputs), len(img_metas)))\n\n        if not isinstance(img_inputs[0][0],list):\n            img_inputs = [img_inputs] if img_inputs is None else img_inputs\n            points = [points] if points is None else points\n            return self.simple_test(points[0], img_metas[0], img_inputs[0], **kwargs)\n        else:\n            return self.aug_test(None, img_metas[0], img_inputs[0], **kwargs)\n\n    def aug_test(self, points, img_metas, img=None, rescale=False):\n        \"\"\"Test function without augmentaiton.\"\"\"\n        combine_type = self.test_cfg.get('combine_type','output')\n        if combine_type=='output':\n            return self.aug_test_combine_output(points, img_metas, img, rescale)\n        elif combine_type=='feature':\n            return self.aug_test_combine_feature(points, img_metas, img, rescale)\n        else:\n            assert False\n\n    def simple_test(self, points, img_metas, img=None, rescale=False):\n        \"\"\"Test function without augmentaiton.\"\"\"\n        img_feats, _ = self.extract_feat(points, img=img, img_metas=img_metas)\n        bbox_list = [dict() for _ in range(len(img_metas))]\n        bbox_pts = self.simple_test_pts(img_feats, img_metas, rescale=rescale)\n        for result_dict, pts_bbox in zip(bbox_list, bbox_pts):\n            result_dict['pts_bbox'] = pts_bbox\n        return bbox_list\n\n\n    def forward_dummy(self, points=None, img_metas=None, img_inputs=None, **kwargs):\n        img_feats, _ = self.extract_feat(points, img=img_inputs, img_metas=img_metas)\n        from mmdet3d.core.bbox.structures.box_3d_mode import LiDARInstance3DBoxes\n        img_metas=[dict(box_type_3d=LiDARInstance3DBoxes)]\n        bbox_list = [dict() for _ in range(1)]\n        assert self.with_pts_bbox\n        bbox_pts = self.simple_test_pts(\n            img_feats, img_metas, rescale=False)\n        for result_dict, pts_bbox in zip(bbox_list, bbox_pts):\n            result_dict['pts_bbox'] = pts_bbox\n        return bbox_list\n\n\n@DETECTORS.register_module()\nclass BEVDet4D(BEVDet):\n    def __init__(self, pre_process=None,\n                 align_after_view_transfromation=False,\n                 detach=True,\n                 detach_pre_process=False, **kwargs):\n        super(BEVDet4D, self).__init__(**kwargs)\n        self.pre_process = pre_process is not None\n        if self.pre_process:\n            self.pre_process_net = builder.build_backbone(pre_process)\n        self.align_after_view_transfromation = align_after_view_transfromation\n        self.detach = detach\n        self.detach_pre_process = detach_pre_process\n\n    @force_fp32()\n    def shift_feature(self, input, trans, rots):\n        n, c, h, w = input.shape\n        _, v, _ = trans[0].shape\n\n        # generate grid\n        xs = torch.linspace(0, w - 1, w, dtype=input.dtype,\n                            device=input.device).view(1, w).expand(h, w)\n        ys = torch.linspace(0, h - 1, h, dtype=input.dtype,\n                            device=input.device).view(h, 1).expand(h, w)\n        grid = 
torch.stack((xs, ys, torch.ones_like(xs)), -1)\n        grid = grid.view(1, h, w, 3).expand(n,h,w,3).view(n, h, w, 3, 1)\n\n        # get transformation from current lidar frame to adjacent lidar frame\n        # transformation from current camera frame to current lidar frame\n        c02l0 = torch.zeros((n, v, 4, 4), dtype=grid.dtype).to(grid)\n        c02l0[:, :, :3, :3] = rots[0]\n        c02l0[:, :, :3, 3] = trans[0]\n        c02l0[:, :, 3, 3] = 1\n\n        # transformation from adjacent camera frame to current lidar frame\n        c12l0 = torch.zeros((n, v, 4, 4), dtype=grid.dtype).to(grid)\n        c12l0[:, :, :3, :3] = rots[1]\n        c12l0[:, :, :3, 3] = trans[1]\n        c12l0[:, :, 3, 3] = 1\n\n        # transformation from current lidar frame to adjacent lidar frame\n        l02l1 = c02l0.matmul(torch.inverse(c12l0))[:, 0, :, :].view(n, 1, 1, 4, 4)\n        '''\n          c02l0 * inv(c12l0)\n        = c02l0 * inv(l12l0 * c12l1)\n        = c02l0 * inv(c12l1) * inv(l12l0)\n        = l02l1 # c02l0==c12l1\n        '''\n\n        l02l1 = l02l1[:, :, :, [True, True, False, True], :][:, :, :, :,\n                [True, True, False, True]]\n\n        feat2bev = torch.zeros((3, 3), dtype=grid.dtype).to(grid)\n        feat2bev[0, 0] = self.img_view_transformer.dx[0]\n        feat2bev[1, 1] = self.img_view_transformer.dx[1]\n        feat2bev[0, 2] = self.img_view_transformer.bx[0] - \\\n                         self.img_view_transformer.dx[0] / 2.\n        feat2bev[1, 2] = self.img_view_transformer.bx[1] - \\\n                         self.img_view_transformer.dx[1] / 2.\n        feat2bev[2, 2] = 1\n        feat2bev = feat2bev.view(1, 3, 3)\n        tf = torch.inverse(feat2bev).matmul(l02l1).matmul(feat2bev)\n\n        # transform and normalize\n        grid = tf.matmul(grid)\n        normalize_factor = torch.tensor([w - 1.0, h - 1.0], dtype=input.dtype,\n                                        device=input.device)\n        grid = grid[:, :, :, :2, 0] / normalize_factor.view(1, 1, 1,\n                                                            2) * 2.0 - 1.0\n        output = F.grid_sample(input, grid.to(input.dtype), align_corners=True)\n        return output\n\n    def prepare_bev_feat(self, img, rot, tran, intrin, post_rot, post_tran, bda):\n        x = self.image_encoder(img)\n        bev_feat = self.img_view_transformer([x, rot, tran, intrin,\n                                              post_rot, post_tran, bda])\n        if self.pre_process:\n            bev_feat = self.pre_process_net(bev_feat)[0]\n        return bev_feat\n\n    def extract_img_feat(self, img, img_metas):\n        inputs = img\n        \"\"\"Extract features of images.\"\"\"\n        B, N, _, H, W = inputs[0].shape\n        N = N//2\n        imgs = inputs[0].view(B,N,2,3,H,W)\n        imgs = torch.split(imgs,1,2)\n        imgs = [t.squeeze(2) for t in imgs]\n        rots, trans, intrins, post_rots, post_trans, bda = inputs[1:7]\n        extra = [rots.view(B,2,N,3,3),\n                 trans.view(B,2,N,3),\n                 intrins.view(B,2,N,3,3),\n                 post_rots.view(B,2,N,3,3),\n                 post_trans.view(B,2,N,3)]\n        extra = [torch.split(t, 1, 1) for t in extra]\n        extra = [[p.squeeze(1) for p in t] for t in extra]\n        rots, trans, intrins, post_rots, post_trans = extra\n        bev_feat_list = []\n        key_frame=True # back propagation for key frame only\n        for img, rot, tran, intrin, post_rot, \\\n            post_tran in zip(imgs, rots, trans, intrins, post_rots, 
post_trans):\n            if self.align_after_view_transfromation:\n                rot, tran = rots[0], trans[0]\n            inputs_curr = (img, rot, tran, intrin, post_rot, post_tran, bda)\n            if not key_frame and self.detach:\n                with torch.no_grad():\n                    bev_feat = self.prepare_bev_feat(*inputs_curr)\n            else:\n                bev_feat = self.prepare_bev_feat(*inputs_curr)\n            bev_feat_list.append(bev_feat)\n            key_frame = False\n        if self.align_after_view_transfromation:\n            bev_feat_list[1] = self.shift_feature(bev_feat_list[1],\n                                                  trans, rots)\n        bev_feat = torch.cat(bev_feat_list, dim=1)\n        x = self.bev_encoder(bev_feat)\n        return [x]\n\n\nclass BEVDepth_Base(object):\n    def extract_feat(self, points, img, img_metas):\n        \"\"\"Extract features from images and points.\"\"\"\n        img_feats, depth = self.extract_img_feat(img, img_metas)\n        pts_feats = None\n        return (img_feats, pts_feats, depth)\n\n    def simple_test(self, points, img_metas, img=None, rescale=False):\n        \"\"\"Test function without augmentation.\"\"\"\n        img_feats, _, _ = self.extract_feat(points, img=img, img_metas=img_metas)\n        bbox_list = [dict() for _ in range(len(img_metas))]\n        bbox_pts = self.simple_test_pts(img_feats, img_metas, rescale=rescale)\n        for result_dict, pts_bbox in zip(bbox_list, bbox_pts):\n            result_dict['pts_bbox'] = pts_bbox\n        return bbox_list\n\n    def forward_train(self,\n                      points=None,\n                      img_metas=None,\n                      gt_bboxes_3d=None,\n                      gt_labels_3d=None,\n                      gt_labels=None,\n                      gt_bboxes=None,\n                      img_inputs=None,\n                      proposals=None,\n                      gt_bboxes_ignore=None):\n        \"\"\"Forward training function.\n\n        Args:\n            points (list[torch.Tensor], optional): Points of each sample.\n                Defaults to None.\n            img_metas (list[dict], optional): Meta information of each sample.\n                Defaults to None.\n            gt_bboxes_3d (list[:obj:`BaseInstance3DBoxes`], optional):\n                Ground truth 3D boxes. Defaults to None.\n            gt_labels_3d (list[torch.Tensor], optional): Ground truth labels\n                of 3D boxes. Defaults to None.\n            gt_labels (list[torch.Tensor], optional): Ground truth labels\n                of 2D boxes in images. Defaults to None.\n            gt_bboxes (list[torch.Tensor], optional): Ground truth 2D boxes in\n                images. Defaults to None.\n            img (torch.Tensor, optional): Images of each sample with shape\n                (N, C, H, W). Defaults to None.\n            proposals (list[torch.Tensor], optional): Predicted proposals\n                used for training Fast RCNN. Defaults to None.\n            gt_bboxes_ignore (list[torch.Tensor], optional): Ground truth\n                2D boxes in images to be ignored. Defaults to None.\n\n        Returns:\n            dict: Losses of different branches.\n        \"\"\"\n\n        img_feats, pts_feats, depth = self.extract_feat(\n            points, img=img_inputs, img_metas=img_metas)\n        assert self.with_pts_bbox\n        # assert len(img_inputs) == 8\n        depth_gt = img_inputs[7]\n        loss_depth = self.img_view_transformer.get_depth_loss(depth_gt, depth)\n        losses = dict(loss_depth=loss_depth)\n        losses_pts = self.forward_pts_train(img_feats, gt_bboxes_3d,\n                                            gt_labels_3d, img_metas,\n                                            gt_bboxes_ignore)\n        losses.update(losses_pts)\n\n        # some modifications\n        if hasattr(self.img_view_transformer, 'loss_depth_reg_weight') and self.img_view_transformer.loss_depth_reg_weight > 0:\n            losses['loss_depth_reg'] = self.img_view_transformer.get_depth_reg_loss(depth_gt, depth)\n\n        return losses\n\n\n@DETECTORS.register_module()\nclass BEVDepth(BEVDepth_Base, BEVDet):\n    def extract_img_feat(self, img, img_metas):\n        \"\"\"Extract features of images.\"\"\"\n        x = self.image_encoder(img[0])\n\n        # img: imgs, rots, trans, intrins, post_rots, post_trans, gt_depths, sensor2sensors\n        rots, trans, intrins, post_rots, post_trans, bda = img[1:7]\n\n        mlp_input = self.img_view_transformer.get_mlp_input(rots, trans, intrins, post_rots, post_trans, bda)\n        geo_inputs = [rots, trans, intrins, post_rots, post_trans, bda, mlp_input]\n\n        x, depth = self.img_view_transformer([x] + geo_inputs)\n        x = self.bev_encoder(x)\n\n        return [x], depth\n\n\n@DETECTORS.register_module()\nclass BEVDepth4D(BEVDepth_Base, BEVDet4D):\n    def prepare_bev_feat(self, img, rot, tran, intrin,\n                         post_rot, post_tran, bda, mlp_input):\n        x = self.image_encoder(img)\n        bev_feat, depth = self.img_view_transformer([x, rot, tran, intrin,\n                                              post_rot, post_tran, bda, mlp_input])\n        if self.detach_pre_process and self.pre_process:\n            bev_feat = self.pre_process_net(bev_feat)[0]\n        return bev_feat, depth\n\n    def extract_img_feat(self, img, img_metas):\n        inputs = img\n        \"\"\"Extract features of images.\"\"\"\n        B, N, _, H, W = inputs[0].shape\n        N = N//2\n        imgs = inputs[0].view(B,N,2,3,H,W)\n        imgs = torch.split(imgs,1,2)\n        imgs = [t.squeeze(2) for t in imgs]\n        rots, trans, intrins, post_rots, post_trans, bda = inputs[1:7]\n        extra = [rots.view(B,2,N,3,3),\n                 trans.view(B,2,N,3),\n                 intrins.view(B,2,N,3,3),\n                 post_rots.view(B,2,N,3,3),\n                 post_trans.view(B,2,N,3)]\n        extra = [torch.split(t, 1, 1) for t in extra]\n        extra = [[p.squeeze(1) for p in t] for t in extra]\n        rots, trans, intrins, post_rots, post_trans = extra\n        bev_feat_list = []\n        depth_list = []\n        key_frame=True # back propagation for key frame only\n        for img, rot, tran, intrin, post_rot, \\\n            post_tran in zip(imgs, rots, trans, intrins, post_rots, post_trans):\n            if self.align_after_view_transfromation:\n                rot, tran = rots[0], trans[0]\n            mlp_input = self.img_view_transformer.get_mlp_input(\n                rots[0], trans[0], intrin, post_rot, post_tran, bda)\n            inputs_curr = (img, rot, tran,
 intrin, post_rot, post_tran, bda, mlp_input)\n            if not key_frame and self.detach:\n                with torch.no_grad():\n                    bev_feat, depth = self.prepare_bev_feat(*inputs_curr)\n            else:\n                bev_feat, depth = self.prepare_bev_feat(*inputs_curr)\n            if not self.detach_pre_process and self.pre_process:\n                bev_feat = self.pre_process_net(bev_feat)[0]\n            bev_feat_list.append(bev_feat)\n            depth_list.append(depth)\n            key_frame = False\n        if self.align_after_view_transfromation:\n            bev_feat_list[1] = self.shift_feature(bev_feat_list[1],\n                                                  trans, rots)\n\n        bev_feat = torch.cat(bev_feat_list, dim=1)\n        x = self.bev_encoder(bev_feat)\n        return [x], depth_list[0]\n\n\n@DETECTORS.register_module()\nclass BEVStereo(BEVDepth4D):\n    def __init__(self, bevdet_model=False, **kwargs):\n        super(BEVStereo, self).__init__(**kwargs)\n        self.bevdet_model = bevdet_model\n\n    def image_encoder(self, img):\n        imgs = img\n        B, N, C, imH, imW = imgs.shape\n        imgs = imgs.view(B * N, C, imH, imW)\n        x = self.img_backbone(imgs)\n        stereo_feat = x[0].detach()\n\n        # if isinstance(self.img_backbone, CustomSwin):\n        #     stereo_feat = stereo_feat.permute(0,2,3,1)\n        #     stereo_feat = self.img_backbone.norm0(stereo_feat)\n        #     stereo_feat = stereo_feat.permute(0,3,1,2)\n\n        if self.bevdet_model:\n            x = x[-2:]\n\n        if self.with_img_neck:\n            x = self.img_neck(x)\n            if type(x) in [list, tuple]:\n                x = x[0]\n\n        _, output_dim, output_H, output_W = x.shape\n        x = x.view(B, N, output_dim, output_H, output_W)\n        return x, stereo_feat\n\n    def extract_img_feat(self, img, img_metas):\n        inputs = img\n        \"\"\"Extract features of images.\"\"\"\n        B, N, _, H, W = inputs[0].shape\n        N = N//2\n        imgs = inputs[0].view(B,N,2,3,H,W)\n        imgs = torch.split(imgs,1,2)\n        imgs = [t.squeeze(2) for t in imgs]\n        rots, trans, intrins, post_rots, post_trans, bda, _, sensor2sensors = inputs[1:9]\n        extra = [rots.view(B,2,N,3,3),\n                 trans.view(B,2,N,3),\n                 intrins.view(B,2,N,3,3),\n                 post_rots.view(B,2,N,3,3),\n                 post_trans.view(B,2,N,3),\n                 sensor2sensors.view(B,2,N,4,4)]\n\n        sensor2ego_mats = torch.eye(4).view(1,1,1,4,4).repeat(B,2,N,1,1).to(rots)\n        sensor2ego_mats[:,:,:,:3,:3] = extra[0]\n        sensor2ego_mats[:,:,:,:3,3] = extra[1]\n        intrin_mats = torch.eye(4).view(1,1,1,4,4).repeat(B,2,N,1,1).to(rots)\n        intrin_mats[:,:,:,:3,:3] = extra[2]\n        ida_mats = torch.eye(4).view(1,1,1,4,4).repeat(B,2,N,1,1).to(rots)\n        ida_mats[:,:,:,:3,:3] = extra[3]\n        ida_mats[:,:,:,:3,3] = extra[4]\n        mats_dict = dict(sensor2ego_mats=sensor2ego_mats,\n                         intrin_mats=intrin_mats,\n                         ida_mats=ida_mats,\n                         sensor2sensor_mats=extra[5],\n                         bda_mat=bda)\n        extra = [torch.split(t, 1, 1) for t in extra]\n        extra = [[p.squeeze(1) for p in t] for t in extra]\n        rots, trans, intrins, post_rots, post_trans, sensor2sensors = extra\n\n        # forward stereo depth\n        context_all_sweeps = list()\n        
depth_feat_all_sweeps = list()\n        img_feats_all_sweeps = list()\n        stereo_feats_all_sweeps = list()\n        mu_all_sweeps = list()\n        sigma_all_sweeps = list()\n        mono_depth_all_sweeps = list()\n        range_score_all_sweeps = list()\n        key_frame=True # back propagation for key frame only\n        for img, rot, tran, intrin, post_rot, post_tran in zip(imgs, rots, trans, intrins, post_rots, post_trans):\n            if not key_frame:\n                with torch.no_grad():\n                    img_feats, stereo_feats = self.image_encoder(img)\n                    img_feats = img_feats.view(B * N, *img_feats.shape[2:])\n                    mlp_input = \\\n                        self.img_view_transformer.get_mlp_input(rots[0], trans[0], intrin, post_rot, post_tran, bda)\n                    depth_feat, context, mu, sigma, range_score, mono_depth = \\\n                        self.img_view_transformer.depth_net(img_feats,\n                                                            mlp_input)\n                    context = self.img_view_transformer.context_downsample_net(\n                        context)\n            else:\n                img_feats, stereo_feats = self.image_encoder(img)\n                img_feats = img_feats.view(B * N, *img_feats.shape[2:])\n                mlp_input = \\\n                    self.img_view_transformer.get_mlp_input(rots[0], trans[0], intrin,\n                                                            post_rot,\n                                                            post_tran, bda)\n                depth_feat, context, mu, sigma, range_score, mono_depth = \\\n                    self.img_view_transformer.depth_net(img_feats,\n                                                        mlp_input)\n                context = self.img_view_transformer.context_downsample_net(\n                    context)\n            img_feats_all_sweeps.append(img_feats)\n            stereo_feats_all_sweeps.append(stereo_feats)\n            depth_feat_all_sweeps.append(depth_feat)\n            context_all_sweeps.append(context)\n            mu_all_sweeps.append(mu)\n            sigma_all_sweeps.append(sigma)\n            mono_depth_all_sweeps.append(mono_depth)\n            range_score_all_sweeps.append(range_score)\n            key_frame = False\n\n        depth_score_all_sweeps = list()\n        num_sweeps = 2\n        for ref_idx in range(num_sweeps):\n            sensor2sensor_mats = list()\n            for src_idx in range(num_sweeps):\n                ref2keysensor_mats = sensor2sensors[ref_idx].inverse()\n                key2srcsensor_mats = sensor2sensors[src_idx]\n                ref2srcsensor_mats = key2srcsensor_mats @ ref2keysensor_mats\n                sensor2sensor_mats.append(ref2srcsensor_mats)\n            if ref_idx == 0:\n                # last iteration on stage 1 does not have propagation\n                # (photometric consistency filtering)\n                if self.img_view_transformer.use_mask:\n                    stereo_depth, mask = self.img_view_transformer._forward_stereo(\n                        ref_idx,\n                        stereo_feats_all_sweeps,\n                        mono_depth_all_sweeps,\n                        mats_dict,\n                        sensor2sensor_mats,\n                        mu_all_sweeps,\n                        sigma_all_sweeps,\n                        range_score_all_sweeps,\n                        depth_feat_all_sweeps,\n                    )\n                else:\n           
         stereo_depth = self.img_view_transformer._forward_stereo(\n                        ref_idx,\n                        stereo_feats_all_sweeps,\n                        mono_depth_all_sweeps,\n                        mats_dict,\n                        sensor2sensor_mats,\n                        mu_all_sweeps,\n                        sigma_all_sweeps,\n                        range_score_all_sweeps,\n                        depth_feat_all_sweeps,\n                    )\n            else:\n                with torch.no_grad():\n                    # last iteration on stage 1 does not have\n                    # propagation (photometric consistency filtering)\n                    if self.img_view_transformer.use_mask:\n                        stereo_depth, mask = self.img_view_transformer._forward_stereo(\n                            ref_idx,\n                            stereo_feats_all_sweeps,\n                            mono_depth_all_sweeps,\n                            mats_dict,\n                            sensor2sensor_mats,\n                            mu_all_sweeps,\n                            sigma_all_sweeps,\n                            range_score_all_sweeps,\n                            depth_feat_all_sweeps,\n                        )\n                    else:\n                        stereo_depth = self.img_view_transformer._forward_stereo(\n                            ref_idx,\n                            stereo_feats_all_sweeps,\n                            mono_depth_all_sweeps,\n                            mats_dict,\n                            sensor2sensor_mats,\n                            mu_all_sweeps,\n                            sigma_all_sweeps,\n                            range_score_all_sweeps,\n                            depth_feat_all_sweeps,\n                        )\n            if self.img_view_transformer.use_mask:\n                depth_score = (\n                        mono_depth_all_sweeps[ref_idx] +\n                        self.img_view_transformer.depth_downsample_net(\n                            stereo_depth) * mask).softmax(1)\n            else:\n                depth_score = (\n                        mono_depth_all_sweeps[ref_idx] +\n                        self.img_view_transformer.depth_downsample_net(stereo_depth)).softmax(1)\n            depth_score_all_sweeps.append(depth_score)\n\n        # forward view transformation\n        bev_feat_list = []\n        key_frame=True # back propagation for key frame only\n        for image_feat, depth_prob, rot, tran, intrin, post_rot, post_tran in \\\n            zip(context_all_sweeps, depth_score_all_sweeps, rots, trans,\n                intrins, post_rots, post_trans):\n            if not key_frame:\n                with torch.no_grad():\n                    input_curr = (image_feat.view(B,N,*image_feat.shape[1:]),\n                                  depth_prob, rot, tran, intrin, post_rot,\n                                  post_tran, bda)\n                    bev_feat = self.img_view_transformer(input_curr)\n            else:\n                input_curr = (image_feat.view(B,N,*image_feat.shape[1:]),\n                                  depth_prob, rot, tran, intrin, post_rot,\n                                  post_tran, bda)\n                bev_feat = self.img_view_transformer(input_curr)\n            if self.pre_process:\n                bev_feat = self.pre_process_net(bev_feat)[0]\n            bev_feat_list.append(bev_feat)\n            key_frame = False\n\n        bev_feat = 
torch.cat(bev_feat_list, dim=1)\n        x = self.bev_encoder(bev_feat)\n        return [x], depth_score_all_sweeps[0]"
  },
  {
    "path": "projects/occ_plugin/occupancy/detectors/ocfnet.py",
    "content": "# Developed by Junyi Ma based on the codebase of OpenOccupancy and PowerBEV\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nfrom sys import api_version\nimport torch\nimport collections \nimport torch.nn.functional as F\nimport os\n\nfrom mmdet.models import DETECTORS\nfrom mmcv.runner import auto_fp16, force_fp32\nfrom .bevdepth import BEVDepth\nfrom mmdet3d.models import builder\n\nimport numpy as np\nimport time\nimport copy\nfrom typing import Tuple\n\n\n@DETECTORS.register_module()\nclass OCFNet(BEVDepth):\n    def __init__(self, \n            loss_cfg=None,\n            only_generate_dataset=False,\n            disable_loss_depth=False,\n            test_present=False,\n            empty_idx=0,\n            max_label=2,\n            occ_encoder_backbone=None,\n            occ_predictor=None,\n            occ_encoder_neck=None,\n            flow_encoder_backbone=None,\n            flow_predictor=None,\n            flow_encoder_neck=None,\n            flow_head=None,\n            loss_norm=False,\n            point_cloud_range=None,\n            time_receptive_field=None,\n            n_future_frames=None,\n            n_future_frames_plus=None,\n            iou_thresh_for_vpq=None,\n            record_time=False,\n            save_pred=False,\n            save_path=None,\n            **kwargs):\n        '''\n        OCFNet is our end-to-end baseline for 4D camera-only occupancy forecasting\n        \n        there are two streams for the forecasting task with aggregated voxel features as inputs:\n            1. occ_encoder_backbone -> occ_predictor -> occ_encoder_neck -> pts_bbox_head\n            2. flow_encoder_backbone -> flow_predictor -> flow_encoder_neck -> flow_head\n        \n        time_receptive_field: number of historical frames used for forecasting (including the present one), default: 3\n        n_future_frames: number of forecasted future frames, default: 4\n        n_future_frames_plus: number of estimated frames (> n_future_frames), default: 6 (if only forecasting occupancy states rather than instances, n_future_frames=n_future_frames_plus can be set)\n        iou_thresh_for_vpq: iou threshold to associate instances in 3D instance prediction, default: 0.2 (adjusted by occupancy forecasting performance)\n        '''\n        super().__init__(**kwargs)\n\n        self.loss_cfg = loss_cfg\n        self.disable_loss_depth = disable_loss_depth\n        self.only_generate_dataset = only_generate_dataset\n        self.loss_norm = loss_norm\n        self.time_receptive_field = time_receptive_field\n        self.n_future_frames = n_future_frames\n        self.n_future_frames_plus = n_future_frames_plus\n        self.eval_start_moment = self.n_future_frames_plus - self.n_future_frames - 1\n\n        self.iou_thresh_for_vpq = iou_thresh_for_vpq\n        \n        self.record_time = record_time\n        self.time_stats = collections.defaultdict(list)\n        self.empty_idx = empty_idx\n        self.max_label = max_label\n\n        self.occ_encoder_backbone = builder.build_backbone(occ_encoder_backbone)\n        self.occ_predictor = builder.build_neck(occ_predictor)\n        self.occ_encoder_neck = builder.build_neck(occ_encoder_neck)\n\n        self.flow_encoder_backbone = builder.build_backbone(flow_encoder_backbone)\n        self.flow_encoder_neck = builder.build_neck(flow_encoder_neck)\n        self.flow_predictor = builder.build_neck(flow_predictor)\n        
self.flow_head = builder.build_head(flow_head)\n\n        self.point_cloud_range = point_cloud_range\n        self.spatial_extent3d = (self.point_cloud_range[3]-self.point_cloud_range[0], \\\n                                    self.point_cloud_range[4]-self.point_cloud_range[1], \\\n                                         self.point_cloud_range[5]-self.point_cloud_range[2])\n        self.ego_center_shift_proportion_x = abs(self.point_cloud_range[0])/(self.point_cloud_range[3]-self.point_cloud_range[0])\n        self.ego_center_shift_proportion_y = abs(self.point_cloud_range[1])/(self.point_cloud_range[4]-self.point_cloud_range[1])\n        self.ego_center_shift_proportion_z = abs(self.point_cloud_range[2])/(self.point_cloud_range[5]-self.point_cloud_range[2])\n\n        self.n_cam = 6\n        self.fine_grained = False\n        self.vehicles_id = 1\n\n        self.test_present = test_present\n        self.save_pred = save_pred\n        self.save_path = save_path\n\n    def image_encoder(self, img):\n        imgs = img\n        B, N, C, imH, imW = imgs.shape\n        imgs = imgs.view(B * N, C, imH, imW)\n        \n        backbone_feats = self.img_backbone(imgs)\n\n        if self.with_img_neck:\n            x = self.img_neck(backbone_feats)\n            if type(x) in [list, tuple]:\n                x = x[0]\n        else:\n            x = backbone_feats\n        _, output_dim, output_H, output_W = x.shape\n        x = x.view(B, N, output_dim, output_H, output_W)\n        \n        return {'x': x,\n                'img_feats': [x.clone()]}\n    \n    @force_fp32()\n    def occ_encoder(self, x):\n        b, t, _, _, _, _ = x.shape\n        x = x.reshape(b, -1, *x.shape[3:])\n        x = self.occ_encoder_backbone(x)\n        x = self.occ_predictor(x)\n        x = self.occ_encoder_neck(x) \n        return x\n\n    @force_fp32()\n    def flow_encoder(self, x):\n        b, t, _, _, _, _ = x.shape\n        x = x.reshape(b, -1, *x.shape[3:])\n        x = self.flow_encoder_backbone(x)\n        x = self.flow_predictor(x)\n        x = self.flow_encoder_neck(x) \n        return x\n\n    def mat2pose_vec(self, matrix: torch.Tensor):\n        \"\"\"\n        Converts a 4x4 pose matrix into a 6-dof pose vector\n        Args:\n            matrix (torch.Tensor): 4x4 pose matrix\n        Returns:\n            vector (torch.Tensor): 6-dof pose vector comprising translation components (tx, ty, tz) and\n            rotation components (rx, ry, rz)\n        \"\"\"\n\n        # M[1, 2] = -sinx*cosy, M[2, 2] = +cosx*cosy\n        rotx = torch.atan2(-matrix[..., 1, 2], matrix[..., 2, 2])\n\n        # M[0, 2] = +siny, M[1, 2] = -sinx*cosy, M[2, 2] = +cosx*cosy\n        cosy = torch.sqrt(matrix[..., 1, 2] ** 2 + matrix[..., 2, 2] ** 2)\n        roty = torch.atan2(matrix[..., 0, 2], cosy)\n\n        # M[0, 0] = +cosy*cosz, M[0, 1] = -cosy*sinz\n        rotz = torch.atan2(-matrix[..., 0, 1], matrix[..., 0, 0])\n\n        rotation = torch.stack((rotx, roty, rotz), dim=-1)\n\n        # Extract translation params\n        translation = matrix[..., :3, 3]\n        return torch.cat((translation, rotation), dim=-1)\n\n    def pack_dbatch_and_dtime(self, x):\n        b = x.shape[0]\n        s = x.shape[1]\n        x = x.view(b*s, *x.shape[2:])\n        return x\n    \n    def unpack_dbatch_and_dtime(self, x, b, s):\n        assert (b*s) == x.shape[0]\n        x = x.view(b, s, *x.shape[1:])\n        return x\n\n    def extract_img_feat(self, img_inputs_seq, img_metas):\n        '''\n        Extract features of sequential input images\n 
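       img_inputs_seq packs (imgs, rots, trans, intrins, post_rots, post_trans, gt_depths, sensor2sensors);\n        only the first time_receptive_field frames of each tensor are encoded\n 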
       '''\n        \n        if self.record_time:\n            torch.cuda.synchronize()\n            t0 = time.time()\n        \n        imgs_seq, rots_seq, trans_seq, intrins_seq, post_rots_seq, post_trans_seq, gt_depths_seq, sensor2sensors_seq = img_inputs_seq\n\n        self.batch_size = imgs_seq.shape[0]\n        self.sequence_length = imgs_seq.shape[1]\n\n        imgs_seq = imgs_seq[:,0:self.time_receptive_field,...].contiguous()\n        rots_seq = rots_seq[:,0:self.time_receptive_field,...].contiguous()\n        trans_seq = trans_seq[:,0:self.time_receptive_field,...].contiguous()\n        intrins_seq = intrins_seq[:,0:self.time_receptive_field,...].contiguous()\n        post_rots_seq = post_rots_seq[:,0:self.time_receptive_field,...].contiguous()\n        post_trans_seq = post_trans_seq[:,0:self.time_receptive_field,...].contiguous()\n        gt_depths_seq = gt_depths_seq[:,0:self.time_receptive_field,...].contiguous()\n        sensor2sensors_seq = sensor2sensors_seq[:,0:self.time_receptive_field,...].contiguous()\n        \n        imgs_seq = self.pack_dbatch_and_dtime(imgs_seq)\n        rots_seq = self.pack_dbatch_and_dtime(rots_seq)\n        trans_seq = self.pack_dbatch_and_dtime(trans_seq)\n        intrins_seq = self.pack_dbatch_and_dtime(intrins_seq)\n        post_rots_seq = self.pack_dbatch_and_dtime(post_rots_seq)\n        post_trans_seq = self.pack_dbatch_and_dtime(post_trans_seq)\n        gt_depths_seq = self.pack_dbatch_and_dtime(gt_depths_seq)\n        sensor2sensors_seq = self.pack_dbatch_and_dtime(sensor2sensors_seq)\n        \n        self.n_cam = imgs_seq.shape[1]\n        \n        img_enc_feats = self.image_encoder(imgs_seq) \n        x = img_enc_feats['x']\n        img_feats = img_enc_feats['img_feats']\n        \n        if self.record_time:\n            torch.cuda.synchronize()\n            t1 = time.time()\n            self.time_stats['img_encoder'].append(t1 - t0)\n        \n        mlp_input_seq = self.img_view_transformer.get_mlp_input(rots_seq, trans_seq, intrins_seq, post_rots_seq, post_trans_seq)\n        geo_inputs = [rots_seq, trans_seq, intrins_seq, post_rots_seq, post_trans_seq, None, mlp_input_seq]\n\n        x, depth = self.img_view_transformer([x] + geo_inputs)\n\n        if self.record_time:\n            torch.cuda.synchronize()\n            t2 = time.time()\n            self.time_stats['view_transformer'].append(t2 - t1)\n        \n        return x, depth, img_feats   \n\n    def warp_features(self, x, flow, tseq):\n        '''\n        Warp features by motion flow\n        '''\n\n        if flow is None:\n            return x\n\n        b, dc, dx, dy, dz = x.shape\n\n        # normalize 3D motion flow\n        flow[:,0,-1] =flow[:,0,-1]*dx/self.spatial_extent3d[0]\n        flow[:,1,-1] =flow[:,1,-1]*dy/self.spatial_extent3d[1]\n        flow[:,2,-1] =flow[:,2,-1]*dz/self.spatial_extent3d[2]\n\n        nx, ny, nz = torch.meshgrid(torch.arange(dx, dtype=torch.float, device=x.device), \\\n                                    torch.arange(dy, dtype=torch.float, device=x.device), \\\n                                    torch.arange(dz, dtype=torch.float, device=x.device))\n        tmp = torch.ones((dx, dy, dz), device=x.device)\n        grid = torch.stack((nx, ny, nz, tmp), dim=-1)\n\n        # centralize shift\n        shift_x = self.ego_center_shift_proportion_x * dx\n        shift_y = self.ego_center_shift_proportion_y * dy\n        shift_z = self.ego_center_shift_proportion_z * dz\n\n        grid[:, :, :, 0] = grid[:, :, :, 0] - shift_x\n        
grid[:, :, :, 1] = grid[:, :, :, 1] - shift_y\n        grid[:, :, :, 2] = grid[:, :, :, 2] - shift_z\n        grid = grid.view(dx*dy*dz, grid.shape[-1]).unsqueeze(-1)   #[N,4,1] \n\n        transformation = flow.unsqueeze(1)  # [bs, 1, 4, 4]\n        transformed_grid = transformation @ grid  # [bs, N, 4, 1]\n        transformed_grid = transformed_grid.squeeze(-1) # [bs, N, 4]\n        transformed_grid = transformed_grid.view(-1, 4)\n\n        # de-centralize\n        transformed_grid[:, 0] = (transformed_grid[:, 0] + shift_x)\n        transformed_grid[:, 1] = (transformed_grid[:, 1] + shift_y)\n        transformed_grid[:, 2] = (transformed_grid[:, 2] + shift_z)\n        transformed_grid = transformed_grid.round().long()\n\n        # de-normalize\n        grid = grid.squeeze(-1)\n        grid = grid.view(-1, 4)\n        grid[:, 0] = (grid[:, 0] + shift_x)\n        grid[:, 1] = (grid[:, 1] + shift_y)\n        grid[:, 2] = (grid[:, 2] + shift_z)\n        grid = grid.round().long()\n\n        batch_ix = torch.cat([torch.full([transformed_grid.shape[0] // b, 1], ix, device=x.device, dtype=torch.long) for ix in range(b)])\n\n        kept = (transformed_grid[:,0] >= 0) & (transformed_grid[:,0] < dx) \\\n               & (transformed_grid[:,1] >= 0) & (transformed_grid[:,1] < dy) \\\n               & (transformed_grid[:,2] >= 0) & (transformed_grid[:,2] < dz)\n\n        transformed_grid = transformed_grid[kept]\n        batch_ix = batch_ix[kept]\n        grid = grid[kept]\n\n        warped_x = torch.zeros_like(x, device=x.device)\n\n        # chunked assignment to reduce peak memory usage (erratum for the per-voxel loop kept below)\n        split_num = 32\n        gap = transformed_grid.shape[0] // split_num\n        for tt in range(split_num):\n            start_idx_tt = int(tt * gap)\n            # the last chunk also takes the remainder so no transformed voxels are dropped\n            end_idx_tt = transformed_grid.shape[0] if tt == split_num - 1 else int((tt + 1) * gap)\n            current_batch = batch_ix[start_idx_tt:end_idx_tt]\n            ixx = transformed_grid[start_idx_tt:end_idx_tt, 0]\n            ixy = transformed_grid[start_idx_tt:end_idx_tt, 1]\n            ixz = transformed_grid[start_idx_tt:end_idx_tt, 2]\n\n            ixx_ori = grid[start_idx_tt:end_idx_tt, 0]\n            ixy_ori = grid[start_idx_tt:end_idx_tt, 1]\n            ixz_ori = grid[start_idx_tt:end_idx_tt, 2]\n\n            warped_x[current_batch, :, ixx, ixy, ixz] = x[current_batch, :, ixx_ori, ixy_ori, ixz_ori]\n            \n        # for i in range(transformed_grid.shape[0]):  \n        #     current_batch = batch_ix[i]\n        #     ixx = transformed_grid[i, 0]\n        #     ixy = transformed_grid[i, 1]\n        #     ixz = transformed_grid[i, 2]\n\n        #     ixx_ori = grid[i, 0]\n        #     ixy_ori = grid[i, 1]\n        #     ixz_ori = grid[i, 2]\n\n        #     warped_x[current_batch, :, ixx, ixy, ixz] = x[current_batch, :, ixx_ori, ixy_ori, ixz_ori]\n\n        return warped_x \n\n    def cumulative_warp_occ(self, lifted_feature_seq, future_egomotion, mode='bilinear'):\n        '''\n        Warp sequential voxel features to the present frame by ego pose updates\n        '''\n        future_egomotion = future_egomotion[:, :self.time_receptive_field, ...].contiguous()\n        \n        out = [lifted_feature_seq[:, -1]]\n        cum_future_egomotion = future_egomotion[:, -2]\n        for t in reversed(range(self.time_receptive_field - 1)):\n            out.append(self.warp_features(lifted_feature_seq[:, t], cum_future_egomotion, t))\n            cum_future_egomotion = cum_future_egomotion @ future_egomotion[:, t - 1]\n\n        return torch.stack(out[::-1], 1)\n\n\n 
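   # A minimal usage sketch of the temporal alignment above (illustrative shapes only):\n    #   feats = torch.rand(1, 3, 64, 128, 128, 10)            # [B, T=time_receptive_field, C, dx, dy, dz]\n    #   poses = torch.eye(4).expand(1, 3, 4, 4).contiguous()  # [B, T, 4, 4] frame-to-frame ego updates\n    #   aligned = self.cumulative_warp_occ(feats, poses)      # [B, T, C, dx, dy, dz], aligned to the present frame\n\n 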
   def extract_feat(self, img_inputs_seq, img_metas, future_egomotion):\n        '''\n        Extract voxel features from input sequential images\n        '''\n        voxel_feats = None\n        depth, img_feats = None, None\n        if img_inputs_seq is not None:\n            voxel_feats, depth, img_feats = self.extract_img_feat(img_inputs_seq, img_metas)\n\n        if self.record_time:\n            torch.cuda.synchronize()\n            t0 = time.time()\n        \n        voxel_feats = self.unpack_dbatch_and_dtime(voxel_feats, self.batch_size, self.time_receptive_field)\n        voxel_feats = self.cumulative_warp_occ(voxel_feats.clone(), future_egomotion, mode='bilinear')\n\n        if self.record_time:\n            torch.cuda.synchronize()\n            t1 = time.time()\n            self.time_stats['feature warping'].append(t1 - t0)\n\n        # egomotion-aware\n        future_egomotion_vec = self.mat2pose_vec(future_egomotion)\n        batch_size, sequence_length, nbr_pose_channels = future_egomotion_vec.shape\n        dx, dy, dz = voxel_feats.shape[-3:]\n        future_egomotions_spatial = future_egomotion_vec.view(batch_size, sequence_length, nbr_pose_channels, 1, 1, 1).expand(batch_size, sequence_length, nbr_pose_channels, dx, dy, dz)\n        # at time 0, no egomotion so feed zero vector\n        future_egomotions_spatial = torch.cat([torch.zeros_like(future_egomotions_spatial[:, :1]),\n                                            future_egomotions_spatial[:, :(self.time_receptive_field-1)]], dim=1)\n        voxel_feats = torch.cat([voxel_feats, future_egomotions_spatial], dim=-4)\n\n        voxel_feats_enc = self.occ_encoder(voxel_feats)\n        if type(voxel_feats_enc) is not list:\n            voxel_feats_enc = [voxel_feats_enc]\n        \n        if self.record_time:\n            torch.cuda.synchronize()\n            t2 = time.time()\n            self.time_stats['occ_encoder'].append(t2 - t1)\n\n        flow_feats_enc = self.flow_encoder(voxel_feats)\n        if type(flow_feats_enc) is not list:\n            flow_feats_enc = [flow_feats_enc]\n\n        if self.record_time:\n            torch.cuda.synchronize()\n            t3 = time.time()\n            self.time_stats['flow_encoder'].append(t3 - t2)\n\n        depth = depth.view(-1, self.n_cam, *depth.shape[-3:])\n        return (voxel_feats_enc, flow_feats_enc, img_feats, depth)\n    \n    @force_fp32(apply_to=('voxel_feats'))\n    def forward_pts_train(\n            self,\n            voxel_feats,\n            gt_occ=None,\n            points_occ=None,\n            img_metas=None,\n            transform=None,\n            img_feats=None,\n        ):\n        \n        if self.record_time:\n            torch.cuda.synchronize()\n            t0 = time.time()\n        \n        outs = self.pts_bbox_head(\n            voxel_feats=voxel_feats,\n            points=points_occ,\n            img_metas=img_metas,\n            img_feats=img_feats,\n            transform=transform,\n        )\n        \n        if self.record_time:\n            torch.cuda.synchronize()\n            t1 = time.time()\n            self.time_stats['occ_head'].append(t1 - t0)\n        \n        losses = self.pts_bbox_head.loss(\n            output_voxels=outs['output_voxels'],\n            target_voxels=gt_occ,\n            target_points=points_occ,\n            img_metas=img_metas,\n        )\n        \n        if self.record_time:\n            torch.cuda.synchronize()\n            t2 = time.time()\n            self.time_stats['loss_occ'].append(t2 - t1)\n     
   \n        return losses\n\n    @force_fp32(apply_to=('voxel_feats'))\n    def forward_flow_train(\n            self,\n            voxel_feats,\n            gt_occ=None,\n            points_occ=None,\n            img_metas=None,\n            transform=None,\n            img_feats=None,\n        ):\n\n        if self.record_time:\n            torch.cuda.synchronize()\n            t0 = time.time()\n\n        outs = self.flow_head(\n            voxel_feats=voxel_feats,\n            points=points_occ,\n            img_metas=img_metas,\n            img_feats=img_feats,\n            transform=transform,\n        )\n\n        if self.record_time:\n            torch.cuda.synchronize()\n            t1 = time.time()\n            self.time_stats['flow_head'].append(t1 - t0)\n        \n        losses = self.flow_head.loss(\n            output_voxels=outs['output_voxels'],\n            target_voxels=gt_occ,\n            target_points=points_occ,\n            img_metas=img_metas,\n        )\n        \n        if self.record_time:\n            torch.cuda.synchronize()\n            t2 = time.time()\n            self.time_stats['loss_flow'].append(t2 - t1)\n        \n        return losses\n\n    def forward_train(self,\n            img_inputs_seq=None,\n            segmentation=None,\n            instance=None,\n            attribute_label=None,\n            flow=None,\n            future_egomotion=None,\n            gt_occ=None,\n            img_metas=None,\n            points_occ=None,\n            **kwargs,\n        ):\n        '''\n        Train OCFNet using bbox-wise occupancy labels if self.fine_grained=False, else using voxel-wise labels from nuScenes-Occupancy\n        '''\n\n        # manually stop forward\n        if self.only_generate_dataset:\n            return {\"pseudo_loss\": torch.tensor(0.0, device=segmentation.device, requires_grad=True)}\n\n        if not self.fine_grained:\n            gt_occ = segmentation\n\n        voxel_feats, flow_feats, img_feats, depth = self.extract_feat(\n            img_inputs_seq=img_inputs_seq, img_metas=img_metas, future_egomotion=future_egomotion)\n        \n        # training losses\n        losses = dict()\n        \n        if self.record_time:        \n            torch.cuda.synchronize()\n            t0 = time.time()\n        \n        # TODO: we will release the version with depth fine-tuning in the future\n        if not self.disable_loss_depth and depth is not None:\n            depth_gt = img_inputs_seq[-2][:,0:self.time_receptive_field,...].contiguous()\n            depth_gt = depth_gt.view(depth_gt.shape[0]*depth_gt.shape[1],*depth_gt.shape[2:])\n            depth = depth.view(-1, *depth.shape[2:])\n            losses['loss_depth'] = self.img_view_transformer.get_depth_loss(depth_gt, depth)\n        \n        if self.record_time:\n            torch.cuda.synchronize()\n            t1 = time.time()\n            self.time_stats['loss_depth'].append(t1 - t0)\n        \n        transform = img_inputs_seq[1:8] if img_inputs_seq is not None else None\n        voxel_feats_seq = []\n        for voxel_feats_stage in voxel_feats:\n            bs, sfeatures = voxel_feats_stage.shape[:2]\n            voxel_feats_stage_ = voxel_feats_stage.view(bs*self.n_future_frames_plus, sfeatures//self.n_future_frames_plus, *voxel_feats_stage.shape[2:])\n            voxel_feats_seq.append(voxel_feats_stage_)\n        gt_occ = gt_occ[:, -self.n_future_frames_plus:, ...] \n        flow = flow[:, -self.n_future_frames_plus:, ...] 
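\n        # Each stage of voxel_feats was reshaped above from [bs, n_future_frames_plus*C, dx, dy, dz]\n        # to [bs*n_future_frames_plus, C, dx, dy, dz], so the occupancy and flow heads score every\n        # forecast step as an independent batch element; gt_occ and flow are sliced to the same window.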
\n\n        losses_occupancy = self.forward_pts_train(voxel_feats_seq, gt_occ,\n                        points_occ, img_metas, img_feats=img_feats, transform=transform)\n        losses.update(losses_occupancy)\n\n        flow_feats_seq = []\n        for flow_feats_stage in flow_feats:\n            bs, sfeatures = flow_feats_stage.shape[:2]\n            flow_feats_stage_ = flow_feats_stage.view(bs*self.n_future_frames_plus, sfeatures//self.n_future_frames_plus, *flow_feats_stage.shape[2:])\n            flow_feats_seq.append(flow_feats_stage_)\n\n        losses_flow = self.forward_flow_train(flow_feats_seq, flow,\n                        points_occ, img_metas, img_feats=img_feats, transform=transform)\n        losses.update(losses_flow)\n\n        if self.loss_norm:\n            for loss_key in losses.keys():\n                if loss_key.startswith('loss'):\n                    losses[loss_key] = losses[loss_key] / (losses[loss_key].detach() + 1e-9)\n\n        def logging_latencies():\n            # logging latencies\n            avg_time = {key: sum(val) / len(val) for key, val in self.time_stats.items()}\n            sum_time = sum(list(avg_time.values()))\n            out_res = ''\n            for key, val in avg_time.items():\n                out_res += '{}: {:.4f}, {:.1f}, '.format(key, val, val / sum_time)\n            \n            print(out_res)\n        \n        if self.record_time:\n            logging_latencies()\n        \n        return losses\n        \n    def forward_test(self,\n            img_inputs_seq=None,\n            segmentation=None,\n            instance=None,\n            attribute_label=None,\n            flow=None,\n            future_egomotion=None,\n            gt_occ=None,\n            img_metas=None,\n            points_occ=None,\n            **kwargs,\n        ):\n        '''\n        Test OCFNet using IOU and VPQ metrics\n        '''\n\n        # batch size must equal 1 during testing\n        assert segmentation.shape[0] == 1\n\n        return self.simple_test(img_metas, img_inputs_seq, gt_occ=gt_occ, gt_flow=flow, segmentation=segmentation, instance=instance, future_egomotion=future_egomotion, **kwargs)\n    \n    def simple_test(self, img_metas, img_inputs_seq=None, rescale=False, points_occ=None, \n            gt_occ=None, gt_flow=None, segmentation=None, instance=None, future_egomotion=None):\n        \n        # manually stop forward\n        if self.only_generate_dataset:\n            return {'hist_for_iou': 0, 'pred_c': 0, 'vpq':0}\n\n        if not self.fine_grained:\n            gt_occ = segmentation\n                \n        voxel_feats, flow_feats, img_feats, depth = self.extract_feat(\n            img_inputs_seq=img_inputs_seq, img_metas=img_metas, future_egomotion=future_egomotion)\n\n        transform = img_inputs_seq[1:8] if img_inputs_seq is not None else None\n\n        voxel_feats_seq = []\n        for voxel_feats_stage in voxel_feats:\n            bs, sfeatures = voxel_feats_stage.shape[:2]\n            voxel_feats_stage_ = voxel_feats_stage.view(bs*self.n_future_frames_plus, sfeatures//self.n_future_frames_plus, *voxel_feats_stage.shape[2:])\n            voxel_feats_seq.append(voxel_feats_stage_)\n        \n        gt_occ = gt_occ[:, -self.n_future_frames_plus:, ...].contiguous()\n        gt_occ = gt_occ.view(gt_occ.shape[0]*gt_occ.shape[1], *gt_occ.shape[2:])\n        instance = instance[:, -self.n_future_frames_plus:, ...].contiguous()\n        instance = instance.view(instance.shape[0]*instance.shape[1], *instance.shape[2:])\n\n     
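   # gt_occ and instance are flattened to [bs*n_future_frames_plus, H, W, D] so that they\n        # index one-to-one with the per-frame voxel outputs returned by the heads below\n     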
   output = self.pts_bbox_head(\n            voxel_feats=voxel_feats_seq,\n            points=points_occ,\n            img_metas=img_metas,\n            img_feats=img_feats,\n            transform=transform,\n        )\n\n        pred_c = output['output_voxels'][0]  \n        \n        flow_feats_seq = []\n        for flow_feats_stage in flow_feats:\n            bs, sfeatures = flow_feats_stage.shape[:2]\n            flow_feats_stage_ = flow_feats_stage.view(bs*self.n_future_frames_plus, sfeatures//self.n_future_frames_plus, *flow_feats_stage.shape[2:])\n            flow_feats_seq.append(flow_feats_stage_)\n\n        output_flow = self.flow_head(\n            voxel_feats=flow_feats_seq,\n            points=points_occ,\n            img_metas=img_metas,\n            img_feats=img_feats,\n            transform=transform,\n        )\n\n        gt_flow = gt_flow[:, -self.n_future_frames_plus:, ...].contiguous()\n        gt_flow = gt_flow.view(gt_flow.shape[0]*gt_flow.shape[1], *gt_flow.shape[2:])\n\n        # instance association is skipped here and a placeholder VPQ value is returned;\n        # uncomment the two lines below to evaluate VPQ with the predicted flow\n        # pred_flow = output_flow['output_voxels'][0]\n        # vpq = self.evaluate_instance_prediction(pred_c, pred_flow, gt_occ, instance)\n        vpq = 0.1\n\n        if self.test_present:\n            pred_c = pred_c[self.eval_start_moment:(self.eval_start_moment+1), ...]\n            gt_occ = gt_occ[self.eval_start_moment:(self.eval_start_moment+1), ...]\n        else:\n            pred_c = pred_c[self.eval_start_moment+1:, ...]\n            gt_occ = gt_occ[self.eval_start_moment+1:, ...]\n        \n        hist_for_iou = self.evaluate_occupancy_forecasting(pred_c, gt_occ, img_metas=img_metas, save_pred=self.save_pred, save_path=self.save_path)\n\n        test_output = {\n            'hist_for_iou': hist_for_iou,\n            'pred_c': pred_c,\n            'vpq': vpq,\n        }\n\n        return test_output\n\n\n    def evaluate_occupancy_forecasting(self, pred, gt, img_metas=None, save_pred=False, save_path=None):\n\n        B, H, W, D = gt.shape\n        pred = F.interpolate(pred, size=[H, W, D], mode='trilinear', align_corners=False).contiguous()\n\n        hist_all = 0\n        iou_per_pred_list = []\n        pred_list = []\n        gt_list = []\n        for i in range(B):\n            pred_cur = pred[i,...]\n            pred_cur = torch.argmax(pred_cur, dim=0).cpu().numpy()\n            gt_cur = gt[i, ...].cpu().numpy()\n            gt_cur = gt_cur.astype(int)\n\n            pred_list.append(pred_cur)\n            gt_list.append(gt_cur)\n\n            # ignore noise\n            noise_mask = gt_cur != 255\n\n            # GMO and others for max_label=2\n            # multiple movable objects for max_label=9\n            hist_cur, iou_per_pred = fast_hist(pred_cur[noise_mask], gt_cur[noise_mask], max_label=self.max_label)\n            hist_all = hist_all + hist_cur\n            iou_per_pred_list.append(iou_per_pred)\n\n        # optionally save prediction results\n        if save_pred:\n            if not os.path.exists(save_path):\n                os.mkdir(save_path)\n            pred_for_save_list = []\n            for k in range(B):\n                pred_for_save = torch.argmax(pred[k], dim=0).cpu()\n                x_grid = torch.linspace(0, H-1, H, dtype=torch.long)\n                x_grid = x_grid.view(H, 1, 1).expand(H, W, D)\n                y_grid = torch.linspace(0, W-1, W, dtype=torch.long)\n                y_grid = y_grid.view(1, W, 1).expand(H, W, D)\n                z_grid = torch.linspace(0, D-1, D, dtype=torch.long)\n                z_grid = z_grid.view(1, 1, D).expand(H, 
W, D)\n                segmentation_for_save = torch.stack((x_grid, y_grid, z_grid), -1)\n                segmentation_for_save = segmentation_for_save.view(-1, 3)\n                segmentation_label = pred_for_save.squeeze(0).view(-1,1)\n                segmentation_for_save = torch.cat((segmentation_for_save, segmentation_label), dim=-1) # N,4\n                kept = segmentation_for_save[:,-1]!=0\n                segmentation_for_save= segmentation_for_save[kept].cpu().numpy()\n                pred_for_save_list.append(segmentation_for_save)\n            np.savez(os.path.join(save_path, img_metas[0][\"scene_token\"]), pred_for_save_list)\n\n        return hist_all\n\n\n    def find_instance_centers(self, center_prediction: torch.Tensor, conf_threshold: float = 0.1, nms_kernel_size: float = 3):\n        assert len(center_prediction.shape) == 4\n\n        center_prediction = F.threshold(center_prediction, threshold=conf_threshold, value=-1)\n\n        nms_padding = (nms_kernel_size - 1) // 2\n        maxpooled_center_prediction = F.max_pool3d(\n            center_prediction, kernel_size=nms_kernel_size, stride=1, padding=nms_padding\n        )\n\n        # Filter all elements that are not the maximum (i.e. the center of the heatmap instance)\n        center_prediction[center_prediction != maxpooled_center_prediction] = -1\n        return torch.nonzero(center_prediction > 0)[:, 1:]\n\n    def group_pixels(self, centers: torch.Tensor, offset_predictions: torch.Tensor) -> torch.Tensor:\n        dx, dy, dz = offset_predictions.shape[-3:]\n        x_grid = (\n            torch.arange(dx, dtype=offset_predictions.dtype, device=offset_predictions.device)\n            .view(1, dx, 1, 1)\n            .repeat(1, 1, dy, dz)\n        )\n        y_grid = (\n            torch.arange(dy, dtype=offset_predictions.dtype, device=offset_predictions.device)\n            .view(1, 1, dy, 1)\n            .repeat(1, dx, 1, dz)\n        )\n        z_grid = (\n            torch.arange(dz, dtype=offset_predictions.dtype, device=offset_predictions.device)\n            .view(1, 1, 1, dz)\n            .repeat(1, dx, dy, 1)\n        )\n\n        pixel_grid = torch.cat((x_grid, y_grid, z_grid), dim=0)\n        center_locations = (pixel_grid + offset_predictions).view(3, dx*dy*dz, 1).permute(2, 1, 0)\n        centers = centers.view(-1, 1, 3)\n\n        distances = torch.norm(centers - center_locations, dim=-1)\n\n        instance_id = torch.argmin(distances, dim=0).reshape(1, dx, dy, dz) + 1\n        return instance_id\n\n    def update_instance_ids(self, instance_seg, old_ids, new_ids):\n        indices = torch.arange(old_ids.max() + 1, device=instance_seg.device)\n        for old_id, new_id in zip(old_ids, new_ids):\n            indices[old_id] = new_id\n\n        return indices[instance_seg].long()\n\n    def make_instance_seg_consecutive(self, instance_seg):\n        # Make the indices of instance_seg consecutive\n        unique_ids = torch.unique(instance_seg)\n        new_ids = torch.arange(len(unique_ids), device=instance_seg.device)\n        instance_seg = self.update_instance_ids(instance_seg, unique_ids, new_ids)\n        return instance_seg\n\n    def get_instance_segmentation_and_centers(self,\n        center_predictions: torch.Tensor,\n        offset_predictions: torch.Tensor,\n        foreground_mask: torch.Tensor,\n        conf_threshold: float = 0.1,\n        nms_kernel_size: float = 5,\n        max_n_instance_centers: int = 100,\n    ) -> Tuple[torch.Tensor, torch.Tensor]:\n        dx, dy, dz = 
offset_predictions.shape[-3:]\n        center_predictions = center_predictions.view(1, dx, dy, dz)\n        offset_predictions = offset_predictions.view(3, dx, dy, dz)\n        foreground_mask = foreground_mask.view(1, dx, dy, dz)\n\n        centers = self.find_instance_centers(center_predictions, conf_threshold=conf_threshold, nms_kernel_size=nms_kernel_size)\n        if not len(centers):\n            return torch.zeros(center_predictions.shape, dtype=torch.int64, device=center_predictions.device)\n\n        if len(centers) > max_n_instance_centers:\n            centers = centers[:max_n_instance_centers].clone()\n        \n        instance_ids = self.group_pixels(centers, offset_predictions * foreground_mask.float()) \n        instance_seg = (instance_ids * foreground_mask.float()).long()\n\n        # Make the indices of instance_seg consecutive\n        instance_seg = self.make_instance_seg_consecutive(instance_seg) \n\n        return instance_seg.long()\n\n    def flow_warp(self, occupancy, flow, mode='nearest', padding_mode='zeros'):\n        '''\n        Warp ground-truth flow-origin occupancies according to predicted flows\n        '''\n\n        _, num_waypoints, _, grid_dx_cells, grid_dy_cells, grid_dz_cells = occupancy.size()\n\n        dx = torch.linspace(-1, 1, steps=grid_dx_cells)\n        dy = torch.linspace(-1, 1, steps=grid_dy_cells)\n        dz = torch.linspace(-1, 1, steps=grid_dz_cells)\n\n        x_idx, y_idx, z_idx = torch.meshgrid(dx, dy, dz)\n        identity_indices = torch.stack((x_idx, y_idx, z_idx), dim=0).to(device=occupancy.device)\n\n        warped_occupancy = []\n        for k in range(num_waypoints):  # 1\n            flow_origin_occupancy = occupancy[:, k]  # B T 1 dx dy dz -> B 1 dx dy dz\n            pred_flow = flow[:, k]  # B T 3 dx dy dz -> B 3 dx dy dz\n            # Normalize along the width and height direction\n            normalize_pred_flow = torch.stack(\n                (2.0 * pred_flow[:, 0] / (grid_dx_cells - 1),  \n                2.0 * pred_flow[:, 1] / (grid_dy_cells - 1),\n                2.0 * pred_flow[:, 2] / (grid_dz_cells - 1),),\n                dim=1,\n            )\n\n            warped_indices = identity_indices + normalize_pred_flow\n            warped_indices = warped_indices.permute(0, 2, 3, 4, 1)\n\n            flow_origin_occupancy = flow_origin_occupancy.permute(0, 1, 4, 3, 2)\n\n            sampled_occupancy = F.grid_sample(\n                input=flow_origin_occupancy,\n                grid=warped_indices,\n                mode=mode,\n                padding_mode='zeros',\n                align_corners=True,\n            )\n            warped_occupancy.append(sampled_occupancy)\n        return warped_occupancy[0]\n\n    def make_instance_id_temporally_consecutive(self, pred_inst, preds, backward_flow, ignore_index=255.0):\n\n        assert pred_inst.shape[0] == 1, 'Assumes batch size = 1'\n\n        # Initialise instance segmentations with prediction corresponding to the present\n        consistent_instance_seg = [pred_inst.unsqueeze(0)]\n        backward_flow = backward_flow.clone().detach()\n        backward_flow[backward_flow == ignore_index] = 0.0\n        seq_len, _, dx, dy, dz = preds.shape\n\n        for t in range(1, seq_len):\n\n            init_warped_instance_seg = self.flow_warp(consistent_instance_seg[-1].unsqueeze(0).float(), backward_flow[t:t+1].unsqueeze(0)).int()\n\n            warped_instance_seg = init_warped_instance_seg * preds[t:t+1, 0]\n        \n            
consistent_instance_seg.append(warped_instance_seg)\n        \n        consistent_instance_seg = torch.cat(consistent_instance_seg, dim=1)\n        return consistent_instance_seg\n\n\n    def predict_instance_segmentation(self, pred_seg, pred_flow):\n        pred_seg_sm = pred_seg.detach()\n        pred_seg_sm = torch.argmax(pred_seg_sm, dim=1, keepdim=True)\n        foreground_masks = pred_seg_sm.squeeze(1) == self.vehicles_id\n\n        pred_inst_batch = self.get_instance_segmentation_and_centers(\n            torch.softmax(pred_seg, dim=1)[0:1, self.vehicles_id].detach(),\n            pred_flow[1:2].detach(), \n            foreground_masks[1:2].detach(),\n            nms_kernel_size=7,\n        )  \n        \n        consistent_instance_seg = self.make_instance_id_temporally_consecutive(\n                pred_inst_batch,\n                pred_seg_sm[1:],\n                pred_flow[1:].detach(),\n                )\n\n        consistent_instance_seg = torch.cat([torch.zeros_like(pred_inst_batch.unsqueeze(0)), consistent_instance_seg], dim=1)\n\n        return consistent_instance_seg.permute(1, 0, 2, 3, 4).long()  # [1, 6, 512, 512, 40]\n\n    def combine_mask(self, segmentation: torch.Tensor, instance: torch.Tensor, n_classes: int, n_all_things: int):\n        '''\n        Shift all thing ids by n_classes and combine things and stuff into a single mask\n        '''\n        instance = instance.view(-1)\n        instance_mask = instance > 0\n        instance = instance - 1 + n_classes\n\n        segmentation = segmentation.clone().view(-1)\n        segmentation_mask = segmentation < n_classes\n\n        # Build an index from instance id to class id.\n        instance_id_to_class_tuples = torch.cat(\n            (\n                instance[instance_mask & segmentation_mask].unsqueeze(1),\n                segmentation[instance_mask & segmentation_mask].unsqueeze(1),\n            ),\n            dim=1,\n        )\n\n        instance_id_to_class = -instance_id_to_class_tuples.new_ones((n_all_things,))\n        instance_id_to_class[instance_id_to_class_tuples[:, 0]] = instance_id_to_class_tuples[:, 1]\n        instance_id_to_class[torch.arange(n_classes, device=segmentation.device)] = torch.arange(\n            n_classes, device=segmentation.device\n        )\n\n        segmentation[instance_mask] = instance[instance_mask]\n        segmentation += 1\n        segmentation[~segmentation_mask] = 0\n\n        return segmentation, instance_id_to_class\n\n\n    def panoptic_metrics(self, pred_segmentation, pred_instance, gt_segmentation, gt_instance, unique_id_mapping):\n        # GMO and others\n        n_classes = 2 \n        self.keys = ['iou', 'true_positive', 'false_positive', 'false_negative'] # hard coding\n        result = {key: torch.zeros(n_classes, dtype=torch.float32, device=gt_instance.device) for key in self.keys}\n\n        assert pred_segmentation.dim() == 3\n        assert pred_segmentation.shape == pred_instance.shape == gt_segmentation.shape == gt_instance.shape\n\n        n_instances = int(torch.cat([pred_instance, gt_instance]).max().item())\n        n_all_things = n_instances + n_classes  # Classes + instances.\n        n_things_and_void = n_all_things + 1\n\n        pred_segmentation = pred_segmentation.long().detach().cpu()\n        pred_instance = pred_instance.long().detach().cpu()\n        gt_segmentation = gt_segmentation.long().detach().cpu()\n        gt_instance = gt_instance.long().detach().cpu()\n        \n        prediction, pred_to_cls = 
self.combine_mask(pred_segmentation, pred_instance, n_classes, n_all_things)\n        target, target_to_cls = self.combine_mask(gt_segmentation, gt_instance, n_classes, n_all_things)\n\n        # Compute ious between all stuff and things\n        # hack for bincounting 2 arrays together\n        x = prediction + n_things_and_void * target  \n        bincount_2d = torch.bincount(x.long(), minlength=n_things_and_void ** 2) \n        if bincount_2d.shape[0] != n_things_and_void ** 2:\n            raise ValueError('Incorrect bincount size.')\n        conf = bincount_2d.reshape((n_things_and_void, n_things_and_void))\n        # Drop void class\n        conf = conf[1:, 1:]  \n        # Confusion matrix contains intersections between all combinations of classes\n        union = conf.sum(0).unsqueeze(0) + conf.sum(1).unsqueeze(1) - conf\n        iou = torch.where(union > 0, (conf.float() + 1e-9) / (union.float() + 1e-9), torch.zeros_like(union).float())\n\n        mapping = (iou > self.iou_thresh_for_vpq).nonzero(as_tuple=False)\n \n        # Check that classes match.\n        is_matching = pred_to_cls[mapping[:, 1]] == target_to_cls[mapping[:, 0]]\n        mapping = mapping[is_matching.detach().cpu().numpy()]\n        tp_mask = torch.zeros_like(conf, dtype=torch.bool)\n        tp_mask[mapping[:, 0], mapping[:, 1]] = True\n\n        # First ids correspond to \"stuff\" i.e. semantic seg.\n        # Instance ids are offset accordingly\n        for target_id, pred_id in mapping:\n            cls_id = pred_to_cls[pred_id]\n\n            self.temporally_consistent = True # hard coding !\n            if self.temporally_consistent and cls_id == self.vehicles_id:\n                if target_id.item() in unique_id_mapping and unique_id_mapping[target_id.item()] != pred_id.item():\n                    # Not temporally consistent\n                    result['false_negative'][target_to_cls[target_id]] += 1\n                    result['false_positive'][pred_to_cls[pred_id]] += 1\n                    unique_id_mapping[target_id.item()] = pred_id.item()\n                    continue\n\n            result['true_positive'][cls_id] += 1\n            result['iou'][cls_id] += iou[target_id][pred_id]\n            unique_id_mapping[target_id.item()] = pred_id.item()\n\n        for target_id in range(n_classes, n_all_things):\n            # If this is a true positive do nothing.\n            if tp_mask[target_id, n_classes:].any():\n                continue\n            # If this target instance didn't match any prediction and was present, count it as a false negative.\n            if target_to_cls[target_id] != -1:\n                result['false_negative'][target_to_cls[target_id]] += 1\n\n        for pred_id in range(n_classes, n_all_things):\n            # If this is a true positive do nothing.\n            if tp_mask[n_classes:, pred_id].any():\n                continue\n            # If this predicted instance didn't match any target, count it as a false positive.\n            if pred_to_cls[pred_id] != -1 and (conf[:, pred_id] > 0).any():\n                result['false_positive'][pred_to_cls[pred_id]] += 1\n\n        return result\n\n    def evaluate_instance_prediction(self, pred_seg, pred_flow, gt_seg, gt_instance):\n\n        B, H, W, D = gt_seg.shape\n\n        pred_consistent_instance_seg = self.predict_instance_segmentation(pred_seg, pred_flow)\n\n        # add one feature dimension for interpolate\n        pred_consistent_instance_seg = F.interpolate(pred_consistent_instance_seg.float(), 
size=[H, W, D], mode='nearest').contiguous()\n        pred_consistent_instance_seg = pred_consistent_instance_seg.squeeze(1) # [6,512,512,40]\n\n        iou = 0\n        true_positive = 0\n        false_positive = 0\n        false_negative = 0\n\n        # starting from the present frame\n        pred_instance = pred_consistent_instance_seg[self.eval_start_moment:]\n        gt_instance = gt_instance[self.eval_start_moment:].long()\n\n        assert gt_instance.min() == 0, 'ID 0 of gt_instance must be background'\n        pred_segmentation = (pred_instance > 0).long()\n        gt_segmentation = (gt_instance > 0).long()\n\n        unique_id_mapping = {}\n        for t in range(pred_segmentation.shape[0]):\n            result = self.panoptic_metrics(\n                pred_segmentation[t].detach(),\n                pred_instance[t].detach(),\n                gt_segmentation[t],\n                gt_instance[t],\n                unique_id_mapping,\n            )\n\n            iou += result['iou']\n            true_positive += result['true_positive']\n            false_positive += result['false_positive']\n            false_negative += result['false_negative']\n\n        denominator = torch.maximum(\n            (true_positive + false_positive / 2 + false_negative / 2),\n            torch.ones_like(true_positive)\n        )\n        pq = iou / denominator\n\n        return pq.cpu().numpy()\n\n    def forward_dummy(self,\n            points=None,\n            img_metas=None,\n            img_inputs=None,\n            points_occ=None,\n            **kwargs,\n        ):\n\n        # extract_feat requires future_egomotion for temporal warping; take it from kwargs if provided\n        voxel_feats, flow_feats, img_feats, depth = self.extract_feat(\n            img_inputs_seq=img_inputs, img_metas=img_metas, future_egomotion=kwargs.get('future_egomotion', None))\n\n        transform = img_inputs[1:8] if img_inputs is not None else None\n        output = self.pts_bbox_head(\n            voxel_feats=voxel_feats,\n            points=points_occ,\n            img_metas=img_metas,\n            img_feats=img_feats,\n            transform=transform,\n        )\n        \n        return output\n    \n    \ndef fast_hist(pred, label, max_label=18):\n    pred = copy.deepcopy(pred.flatten())\n    label = copy.deepcopy(label.flatten())\n    bin_count = np.bincount(max_label * label.astype(int) + pred, minlength=max_label ** 2) \n    # iou_per_pred assumes binary labels (max_label=2), where bin_count = [TN, FP, FN, TP]\n    iou_per_pred = (bin_count[-1]/(bin_count[-1]+bin_count[1]+bin_count[2]))\n    return bin_count[:max_label ** 2].reshape(max_label, max_label), iou_per_pred\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/fuser/__init__.py",
    "content": "from .addfuse import AddFuser\nfrom .visfuse import VisFuser\nfrom .convfuse import ConvFuser\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/fuser/addfuse.py",
    "content": "import random\nfrom typing import List\n\nimport torch\nfrom torch import nn\n\nfrom mmdet3d.models.builder import FUSION_LAYERS\n\n\n@FUSION_LAYERS.register_module()\nclass AddFuser(nn.Module):\n    def __init__(self, in_channels, out_channels, dropout, input_modality=None) -> None:\n        super().__init__()\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.dropout = dropout\n        if input_modality == None:\n            input_modality = dict(\n                use_lidar=True,\n                use_camera=True,\n                use_radar=False,\n                use_map=False,\n                use_external=False)\n        self.use_lidar = input_modality['use_lidar']\n        self.use_img = input_modality['use_camera']\n\n        if self.use_img:\n            self.img_enc = nn.Sequential(\n                nn.Conv3d(in_channels, out_channels, 3, padding=1, bias=False),\n                nn.BatchNorm3d(out_channels),\n                nn.ReLU(True),\n            )\n        if self.use_lidar:\n            self.pts_enc = nn.Sequential(\n                nn.Conv3d(in_channels, out_channels, 3, padding=1, bias=False),\n                nn.BatchNorm3d(out_channels),\n                nn.ReLU(True),\n            )\n\n    def forward(self, img_voxel_feats, pts_voxel_feats):\n        features = []\n        if self.use_img:\n            img_voxel_feats = self.img_enc(img_voxel_feats)\n            features.append(img_voxel_feats)\n        if self.use_lidar:\n            pts_voxel_feats = self.pts_enc(pts_voxel_feats)\n            features.append(pts_voxel_feats)\n\n        weights = [1] * len(features)\n        if self.training and random.random() < self.dropout:\n            index = random.randint(0, len(features) - 1)\n            weights[index] = 0\n\n        return sum(w * f for w, f in zip(weights, features)) / sum(weights)\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/fuser/convfuse.py",
    "content": "import random\nfrom typing import List\n\nimport torch\nfrom torch import nn\n\nfrom mmdet3d.models.builder import FUSION_LAYERS\n\n\n@FUSION_LAYERS.register_module()\nclass ConvFuser(nn.Module):\n    def __init__(self, in_channels, out_channels) -> None:\n        super().__init__()\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n\n        self.occ_enc = nn.Sequential(\n            nn.Conv3d(in_channels*2, out_channels, 3, padding=1, bias=False),\n            nn.BatchNorm3d(out_channels),\n            nn.ReLU(True),\n        )\n    def forward(self, img_voxel_feats, pts_voxel_feats):\n        return self.occ_enc(torch.cat([img_voxel_feats, pts_voxel_feats], dim=1))\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/fuser/visfuse.py",
    "content": "import random\nfrom typing import List\n\nimport torch\nfrom torch import nn\nimport torch.nn.functional as F\n\nfrom mmdet3d.models.builder import FUSION_LAYERS\nfrom mmcv.cnn import build_norm_layer\n\n\n@FUSION_LAYERS.register_module()\nclass VisFuser(nn.Module):\n    def __init__(self, in_channels, out_channels, norm_cfg=None) -> None:\n        super().__init__()\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        if norm_cfg is None:\n            norm_cfg = dict(type='BN3d', eps=1e-3, momentum=0.01)\n\n        self.img_enc = nn.Sequential(\n            nn.Conv3d(in_channels, out_channels, 7, padding=3, bias=False),\n            build_norm_layer(norm_cfg, out_channels)[1],\n            # nn.BatchNorm3d(out_channels),\n            nn.ReLU(True),\n        )\n        self.pts_enc = nn.Sequential(\n            nn.Conv3d(in_channels, out_channels, 7, padding=3, bias=False),\n            build_norm_layer(norm_cfg, out_channels)[1],\n            # nn.BatchNorm3d(out_channels),\n            nn.ReLU(True),\n        )\n        self.vis_enc = nn.Sequential(\n            nn.Conv3d(2*out_channels, 16, 3, padding=1, bias=False),\n            build_norm_layer(norm_cfg, 16)[1],\n            # nn.BatchNorm3d(16),\n            nn.ReLU(True),\n            nn.Conv3d(16, 1, 1, padding=0, bias=False),\n            nn.Sigmoid(),\n        )\n\n    def forward(self, img_voxel_feats, pts_voxel_feats):\n\n        img_voxel_feats = self.img_enc(img_voxel_feats)\n        pts_voxel_feats = self.pts_enc(pts_voxel_feats)\n        vis_weight = self.vis_enc(torch.cat([img_voxel_feats, pts_voxel_feats], dim=1))\n        voxel_feats = vis_weight * img_voxel_feats + (1 - vis_weight) * pts_voxel_feats\n\n        return voxel_feats\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/image2bev/ViewTransformerLSSBEVDepth.py",
    "content": "# Copyright (c) Phigent Robotics. All rights reserved.\nimport math\nimport torch\nimport torch.nn as nn\nfrom mmcv.runner import BaseModule\nfrom mmdet3d.models.builder import NECKS\nfrom projects.occ_plugin.ops.occ_pooling import occ_pool\nfrom mmcv.cnn import build_conv_layer, build_norm_layer\nfrom mmcv.runner import force_fp32\nfrom torch.cuda.amp.autocast_mode import autocast\nfrom mmdet.models.backbones.resnet import BasicBlock\nimport torch.nn.functional as F\nfrom torch.utils.checkpoint import checkpoint\nfrom scipy.special import erf\nfrom scipy.stats import norm\nimport numpy as np\nimport copy\nimport pdb\n\ndef gen_dx_bx(xbound, ybound, zbound):\n    dx = torch.Tensor([row[2] for row in [xbound, ybound, zbound]])\n    bx = torch.Tensor([row[0] + row[2]/2.0 for row in [xbound, ybound, zbound]])\n    nx = torch.Tensor([(row[1] - row[0]) / row[2] for row in [xbound, ybound, zbound]])\n    return dx, bx, nx\n\ndef cumsum_trick(x, geom_feats, ranks):\n    x = x.cumsum(0)\n    kept = torch.ones(x.shape[0], device=x.device, dtype=torch.bool)\n    kept[:-1] = (ranks[1:] != ranks[:-1])\n    x, geom_feats = x[kept], geom_feats[kept]\n    x = torch.cat((x[:1], x[1:] - x[:-1]))\n    return x, geom_feats\n\n\nclass QuickCumsum(torch.autograd.Function):\n    @staticmethod\n    def forward(ctx, x, geom_feats, ranks):\n        x = x.cumsum(0)\n        kept = torch.ones(x.shape[0], device=x.device, dtype=torch.bool)\n        kept[:-1] = (ranks[1:] != ranks[:-1])\n\n        x, geom_feats = x[kept], geom_feats[kept]\n        x = torch.cat((x[:1], x[1:] - x[:-1]))\n\n        # save kept for backward\n        ctx.save_for_backward(kept)\n\n        # no gradient for geom_feats\n        ctx.mark_non_differentiable(geom_feats)\n\n        return x, geom_feats\n\n    @staticmethod\n    def backward(ctx, gradx, gradgeom):\n        kept, = ctx.saved_tensors\n        back = torch.cumsum(kept, 0)\n        back[kept] -= 1\n\n        val = gradx[back]\n\n        return val, None, None\n\nclass ViewTransformerLiftSplatShoot(BaseModule):\n    def __init__(self, grid_config=None, data_config=None,\n                 numC_input=512, numC_Trans=64, downsample=16,\n                 accelerate=False, use_bev_pool=True, vp_megvii=False,\n                 vp_stero=False, **kwargs):\n        super(ViewTransformerLiftSplatShoot, self).__init__()\n        if grid_config is None:\n            grid_config = {\n                'xbound': [-51.2, 51.2, 0.8],\n                'ybound': [-51.2, 51.2, 0.8],\n                'zbound': [-10.0, 10.0, 20.0],\n                'dbound': [1.0, 60.0, 1.0],}\n        self.grid_config = grid_config\n        dx, bx, nx = gen_dx_bx(self.grid_config['xbound'],\n                               self.grid_config['ybound'],\n                               self.grid_config['zbound'],\n                               )\n        self.dx = nn.Parameter(dx, requires_grad=False)\n        self.bx = nn.Parameter(bx, requires_grad=False)\n        self.nx = nn.Parameter(nx, requires_grad=False)\n\n        if data_config is None:\n            data_config = {'input_size': (256, 704)}\n        self.data_config = data_config\n        self.downsample = downsample\n\n        self.frustum = self.create_frustum()   # D x H x W x 3\n        self.D, _, _, _ = self.frustum.shape\n        self.numC_input = numC_input\n        self.numC_Trans = numC_Trans\n        self.depth_net = nn.Conv2d(self.numC_input, self.D + self.numC_Trans, kernel_size=1, padding=0)\n        self.geom_feats = None\n        
self.accelerate = accelerate\n        self.use_bev_pool = use_bev_pool\n        self.vp_megvii = vp_megvii\n        self.vp_stereo = vp_stero\n\n    def get_depth_dist(self, x):\n        return x.softmax(dim=1)\n\n    def create_frustum(self):\n        # make grid in image plane\n        ogfH, ogfW = self.data_config['input_size']\n        fH, fW = ogfH // self.downsample, ogfW // self.downsample\n        ds = torch.arange(*self.grid_config['dbound'], dtype=torch.float).view(-1, 1, 1).expand(-1, fH, fW)  # dbound=[2.0, 58.0, 0.5]\n        D, _, _ = ds.shape\n        xs = torch.linspace(0, ogfW - 1, fW, dtype=torch.float).view(1, 1, fW).expand(D, fH, fW)\n        ys = torch.linspace(0, ogfH - 1, fH, dtype=torch.float).view(1, fH, 1).expand(D, fH, fW)\n\n        # D x H x W x 3\n        frustum = torch.stack((xs, ys, ds), -1)\n        return nn.Parameter(frustum, requires_grad=False)\n\n    def get_geometry(self, rots_seq, trans_seq, intrins_seq, post_rots_seq, post_trans_seq, bda):\n        \"\"\"Determine the (x,y,z) locations (in the ego frame)\n        of the points in the point cloud.\n        Returns B x N x D x H/downsample x W/downsample x 3\n        \"\"\"\n\n        B, N, _ = trans_seq.shape\n\n        # undo post-transformation\n        # B x N x D x H x W x 3\n        points = self.frustum - post_trans_seq.view(B, N, 1, 1, 1, 3)\n        points = torch.inverse(post_rots_seq).view(B, N, 1, 1, 1, 3, 3).matmul(points.unsqueeze(-1))\n\n        # cam_to_ego\n        points = torch.cat((points[:, :, :, :, :, :2] * points[:, :, :, :, :, 2:3],\n                            points[:, :, :, :, :, 2:3]\n                            ), 5)\n        \n        if intrins_seq.shape[3] == 4:\n            shift = intrins_seq[:, :, :3, 3]\n            points = points - shift.view(B, N, 1, 1, 1, 3, 1)\n            intrins_seq = intrins_seq[:, :, :3, :3]\n        \n        combine = rots_seq.matmul(torch.inverse(intrins_seq))\n        points = combine.view(B, N, 1, 1, 1, 3, 3).matmul(points).squeeze(-1)\n        points += trans_seq.view(B, N, 1, 1, 1, 3)\n        \n        return points\n\n    def voxel_pooling(self, geom_feats, x):\n        B, N, D, H, W, C = x.shape\n        Nprime = B * N * D * H * W\n        nx = self.nx.to(torch.long)\n        # flatten x\n        x = x.reshape(Nprime, C)\n\n        # flatten indices\n        geom_feats = ((geom_feats - (self.bx - self.dx / 2.)) / self.dx).long()\n        geom_feats = geom_feats.view(Nprime, 3)\n        batch_ix = torch.cat([torch.full([Nprime // B, 1], ix,\n                                         device=x.device, dtype=torch.long) for ix in range(B)])\n        geom_feats = torch.cat((geom_feats, batch_ix), 1)\n\n        # filter out points that are outside box\n        kept = (geom_feats[:, 0] >= 0) & (geom_feats[:, 0] < self.nx[0]) \\\n               & (geom_feats[:, 1] >= 0) & (geom_feats[:, 1] < self.nx[1]) \\\n               & (geom_feats[:, 2] >= 0) & (geom_feats[:, 2] < self.nx[2])\n        x = x[kept]\n        geom_feats = geom_feats[kept]\n\n        if self.use_bev_pool:\n            final = occ_pool(x, geom_feats, B, self.nx[2], self.nx[0],\n                                   self.nx[1])\n            final = final.transpose(dim0=-2, dim1=-1)\n        else:\n            # get tensors from the same voxel next to each other\n            ranks = geom_feats[:, 0] * (self.nx[1] * self.nx[2] * B) \\\n                    + geom_feats[:, 1] * (self.nx[2] * B) \\\n                    + geom_feats[:, 2] * B \\\n                    + geom_feats[:, 
3]\n            sorts = ranks.argsort()\n            x, geom_feats, ranks = x[sorts], geom_feats[sorts], ranks[sorts]\n\n            # cumsum trick\n            x, geom_feats = QuickCumsum.apply(x, geom_feats, ranks)\n\n            # griddify (B x C x Z x X x Y)\n            final = torch.zeros((B, C, nx[2], nx[1], nx[0]), device=x.device)\n            final[geom_feats[:, 3], :, geom_feats[:, 2], geom_feats[:, 1], geom_feats[:, 0]] = x\n        # collapse Z\n        final = torch.cat(final.unbind(dim=2), 1)\n\n        return final\n\n    def voxel_pooling_accelerated(self, rots, trans, intrins, post_rots, post_trans, bda, x):\n        B, N, D, H, W, C = x.shape\n        Nprime = B * N * D * H * W\n        nx = self.nx.to(torch.long)\n        # flatten x\n        x = x.reshape(Nprime, C)\n        max_repeat = 300  # cap on points per voxel; renamed from 'max' to avoid shadowing the builtin\n        # flatten indices\n        if self.geom_feats is None:\n            geom_feats = self.get_geometry(rots, trans, intrins,\n                                           post_rots, post_trans, bda)\n            geom_feats = ((geom_feats - (self.bx - self.dx / 2.)) /\n                          self.dx).long()\n            geom_feats = geom_feats.view(Nprime, 3)\n            batch_ix = torch.cat([torch.full([Nprime // B, 1], ix,\n                                             device=x.device, dtype=torch.long)\n                                  for ix in range(B)])\n            geom_feats = torch.cat((geom_feats, batch_ix), 1)\n\n            # filter out points that are outside box\n            kept1 = (geom_feats[:, 0] >= 0) & (geom_feats[:, 0] < self.nx[0]) \\\n                    & (geom_feats[:, 1] >= 0) & (geom_feats[:, 1] < self.nx[1]) \\\n                    & (geom_feats[:, 2] >= 0) & (geom_feats[:, 2] < self.nx[2])\n            idx = torch.arange(x.shape[0], dtype=torch.long, device=x.device)  # was deprecated torch.range, created on CPU\n            x = x[kept1]\n            idx = idx[kept1]\n            geom_feats = geom_feats[kept1]\n\n            # get tensors from the same voxel next to each other\n            ranks = geom_feats[:, 0] * (self.nx[1] * self.nx[2] * B) \\\n                    + geom_feats[:, 1] * (self.nx[2] * B) \\\n                    + geom_feats[:, 2] * B \\\n                    + geom_feats[:, 3]\n            sorts = ranks.argsort()\n            x, geom_feats, ranks, idx = x[sorts], geom_feats[sorts], ranks[sorts], idx[sorts]\n            repeat_id = torch.ones(geom_feats.shape[0], device=geom_feats.device, dtype=geom_feats.dtype)\n            curr = 0\n            repeat_id[0] = 0\n            curr_rank = ranks[0]\n\n            for i in range(1, ranks.shape[0]):\n                if curr_rank == ranks[i]:\n                    curr += 1\n                    repeat_id[i] = curr\n                else:\n                    curr_rank = ranks[i]\n                    curr = 0\n                    repeat_id[i] = curr\n            kept2 = repeat_id < max_repeat\n            repeat_id, geom_feats, x, idx = repeat_id[kept2], geom_feats[kept2], x[kept2], idx[kept2]\n\n            geom_feats = torch.cat([geom_feats,\n                                    repeat_id.unsqueeze(-1)], dim=-1)\n            self.geom_feats = geom_feats\n            self.idx = idx\n        else:\n            geom_feats = self.geom_feats\n            idx = self.idx\n            x = x[idx]\n\n        # griddify (B x C x Z x X x Y)\n        final = torch.zeros((B, C, nx[2], nx[1], nx[0], max_repeat), device=x.device)\n        final[geom_feats[:, 3], :, geom_feats[:, 2], geom_feats[:, 1],\n        geom_feats[:, 0], geom_feats[:, 4]] = x\n        final = 
final.sum(-1)\n        # collapse Z\n        final = torch.cat(final.unbind(dim=2), 1)\n        return final\n\n    def voxel_pooling_bevdepth(self, geom_feats, x):\n        # The dedicated BEVDepth pooling kernel is not available here, so fall\n        # back to the generic voxel_pooling, which discretizes the raw geometry\n        # itself (the previous pre-conversion plus extra nx argument made this\n        # call fail).\n        final = self.voxel_pooling(geom_feats, x.contiguous())\n        return final\n\n    def forward(self, input):\n        x, rots, trans, intrins, post_rots, post_trans, bda = input\n        B, N, C, H, W = x.shape\n        x = x.view(B * N, C, H, W)\n        x = self.depth_net(x)\n        depth = self.get_depth_dist(x[:, :self.D])\n        img_feat = x[:, self.D:(self.D + self.numC_Trans)]\n\n        # Lift\n        volume = depth.unsqueeze(1) * img_feat.unsqueeze(2)\n        volume = volume.view(B, N, self.numC_Trans, self.D, H, W)\n        volume = volume.permute(0, 1, 3, 4, 5, 2)\n\n        # Splat\n        if self.accelerate:\n            bev_feat = self.voxel_pooling_accelerated(rots, trans, intrins,\n                                                      post_rots, post_trans,\n                                                      bda, volume)\n        else:\n            geom = self.get_geometry(rots, trans, intrins,\n                                     post_rots, post_trans, bda)\n            if self.vp_megvii:\n                bev_feat = self.voxel_pooling_bevdepth(geom, volume)\n            else:\n                bev_feat = self.voxel_pooling(geom, volume)\n        return bev_feat\n\n\nclass _ASPPModule(nn.Module):\n    def __init__(self, inplanes, planes, kernel_size, padding, dilation,\n                 BatchNorm):\n        super(_ASPPModule, self).__init__()\n        self.atrous_conv = nn.Conv2d(inplanes,\n                                     planes,\n                                     kernel_size=kernel_size,\n                                     stride=1,\n                                     padding=padding,\n                                     dilation=dilation,\n                                     bias=False)\n        self.bn = BatchNorm\n        self.relu = nn.ReLU()\n\n        self._init_weight()\n\n    def forward(self, x):\n        x = self.atrous_conv(x)\n        x = self.bn(x)\n\n        return self.relu(x)\n\n    def _init_weight(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                torch.nn.init.kaiming_normal_(m.weight)\n            elif isinstance(m, nn.BatchNorm2d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n\n\nclass ASPP(nn.Module):\n    def __init__(self, inplanes, mid_channels=256, norm_cfg=dict(type='BN2d')):\n        super(ASPP, self).__init__()\n\n        dilations = [1, 6, 12, 18]\n\n        self.aspp1 = _ASPPModule(inplanes,\n                                 mid_channels,\n                                 1,\n                                 padding=0,\n                                 dilation=dilations[0],\n                                 BatchNorm=build_norm_layer(norm_cfg, mid_channels)[1])\n        self.aspp2 = _ASPPModule(inplanes,\n                                 mid_channels,\n                                 3,\n                                 padding=dilations[1],\n                                 dilation=dilations[1],\n                                 BatchNorm=build_norm_layer(norm_cfg, mid_channels)[1])\n        self.aspp3 = _ASPPModule(inplanes,\n                   
              mid_channels,\n                                 3,\n                                 padding=dilations[2],\n                                 dilation=dilations[2],\n                                 BatchNorm=build_norm_layer(norm_cfg, mid_channels)[1])\n        self.aspp4 = _ASPPModule(inplanes,\n                                 mid_channels,\n                                 3,\n                                 padding=dilations[3],\n                                 dilation=dilations[3],\n                                 BatchNorm=build_norm_layer(norm_cfg, mid_channels)[1])\n\n        self.global_avg_pool = nn.Sequential(\n            nn.AdaptiveAvgPool2d((1, 1)),\n            nn.Conv2d(inplanes, mid_channels, 1, stride=1, bias=False),\n            build_norm_layer(norm_cfg, mid_channels)[1],\n            nn.ReLU(),\n        )\n        self.conv1 = nn.Conv2d(int(mid_channels * 5),\n                               mid_channels,\n                               1,\n                               bias=False)\n        self.bn1 = build_norm_layer(norm_cfg, mid_channels)[1]\n        self.relu = nn.ReLU()\n        self.dropout = nn.Dropout(0.5)\n        self._init_weight()\n\n    def forward(self, x):\n        x1 = self.aspp1(x)\n        x2 = self.aspp2(x)\n        x3 = self.aspp3(x)\n        x4 = self.aspp4(x)\n        x5 = self.global_avg_pool(x)\n        x5 = F.interpolate(x5,\n                           size=x4.size()[2:],\n                           mode='bilinear',\n                           align_corners=True)\n        x = torch.cat((x1, x2, x3, x4, x5), dim=1)\n\n        x = self.conv1(x)\n        x = self.bn1(x)\n        x = self.relu(x)\n\n        return self.dropout(x)\n\n    def _init_weight(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                torch.nn.init.kaiming_normal_(m.weight)\n            elif isinstance(m, nn.BatchNorm2d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n\n\nclass Mlp(nn.Module):\n    def __init__(self,\n                 in_features,\n                 hidden_features=None,\n                 out_features=None,\n                 act_layer=nn.ReLU,\n                 drop=0.0):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Linear(in_features, hidden_features)\n        self.act = act_layer()\n        self.drop1 = nn.Dropout(drop)\n        self.fc2 = nn.Linear(hidden_features, out_features)\n        self.drop2 = nn.Dropout(drop)\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.act(x)\n        x = self.drop1(x)\n        x = self.fc2(x)\n        x = self.drop2(x)\n        return x\n\n\nclass SELayer(nn.Module):\n    def __init__(self, channels, act_layer=nn.ReLU, gate_layer=nn.Sigmoid):\n        super().__init__()\n        self.conv_reduce = nn.Conv2d(channels, channels, 1, bias=True)\n        self.act1 = act_layer()\n        self.conv_expand = nn.Conv2d(channels, channels, 1, bias=True)\n        self.gate = gate_layer()\n\n    def forward(self, x, x_se):\n        x_se = self.conv_reduce(x_se)\n        x_se = self.act1(x_se)\n        x_se = self.conv_expand(x_se)\n        return x * self.gate(x_se)\n    \n\nclass DepthNet(nn.Module):\n    def __init__(self, in_channels, mid_channels, context_channels,\n                 depth_channels, cam_channels=27, norm_cfg=None):\n        super(DepthNet, self).__init__()\n\n        self.reduce_conv = 
nn.Sequential(\n            nn.Conv2d(in_channels,\n                      mid_channels,\n                      kernel_size=3,\n                      stride=1,\n                      padding=1),\n            build_norm_layer(norm_cfg, mid_channels)[1],\n            nn.ReLU(inplace=True),\n        )\n        self.context_conv = nn.Conv2d(mid_channels,\n                                      context_channels,\n                                      kernel_size=1,\n                                      stride=1,\n                                      padding=0)\n        \n        self.bn = build_norm_layer(dict(type='GN', num_groups=9, requires_grad=True), cam_channels)[1]\n        self.depth_mlp = Mlp(cam_channels, mid_channels, mid_channels)\n        self.depth_se = SELayer(mid_channels)  # NOTE: add camera-aware\n        self.context_mlp = Mlp(cam_channels, mid_channels, mid_channels)\n        self.context_se = SELayer(mid_channels)  # NOTE: add camera-aware\n        self.depth_conv = nn.Sequential(\n            BasicBlock(mid_channels, mid_channels, norm_cfg=norm_cfg),\n            BasicBlock(mid_channels, mid_channels, norm_cfg=norm_cfg),\n            BasicBlock(mid_channels, mid_channels, norm_cfg=norm_cfg),\n            ASPP(mid_channels, mid_channels, norm_cfg=norm_cfg),\n            build_conv_layer(cfg=dict(\n                type='DCN',\n                in_channels=mid_channels,\n                out_channels=mid_channels,\n                kernel_size=3,\n                padding=1,\n                groups=4,\n                im2col_step=128,\n            )),\n            nn.Conv2d(mid_channels,\n                      depth_channels,\n                      kernel_size=1,\n                      stride=1,\n                      padding=0),\n        )\n\n    def forward(self, x, mlp_input):\n        mlp_input = self.bn(mlp_input.reshape(-1, mlp_input.shape[-1]))\n        x = self.reduce_conv(x)\n        context_se = self.context_mlp(mlp_input)[..., None, None]\n        context = self.context_se(x, context_se)\n        context = self.context_conv(context)\n        depth_se = self.depth_mlp(mlp_input)[..., None, None]\n        depth = self.depth_se(x, depth_se)\n        depth = self.depth_conv(depth)\n        return torch.cat([depth, context], dim=1)\n\nclass DepthAggregation(nn.Module):\n    \"\"\"\n    pixel cloud feature extraction\n    \"\"\"\n    def __init__(self, in_channels, mid_channels, out_channels, norm_cfg):\n        super(DepthAggregation, self).__init__()\n\n        self.reduce_conv = nn.Sequential(\n            nn.Conv2d(in_channels,\n                      mid_channels,\n                      kernel_size=3,\n                      stride=1,\n                      padding=1,\n                      bias=False),\n            build_norm_layer(norm_cfg, mid_channels)[1],\n            nn.ReLU(inplace=True),\n        )\n\n        self.conv = nn.Sequential(\n            nn.Conv2d(mid_channels,\n                      mid_channels,\n                      kernel_size=3,\n                      stride=1,\n                      padding=1,\n                      bias=False),\n            build_norm_layer(norm_cfg, mid_channels)[1],\n            nn.ReLU(inplace=True),\n            nn.Conv2d(mid_channels,\n                      mid_channels,\n                      kernel_size=3,\n                      stride=1,\n                      padding=1,\n                      bias=False),\n            build_norm_layer(norm_cfg, mid_channels)[1],\n            nn.ReLU(inplace=True),\n        )\n\n        
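# out_conv projects back to out_channels; the residual addition happens\n        # in forward() before this conv, hence no norm/activation here.\n        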
self.out_conv = nn.Sequential(\n            nn.Conv2d(mid_channels,\n                      out_channels,\n                      kernel_size=3,\n                      stride=1,\n                      padding=1,\n                      bias=True),\n            # nn.BatchNorm3d(out_channels),\n            # nn.ReLU(inplace=True),\n        )\n\n    @autocast(False)\n    def forward(self, x):\n        x = checkpoint(self.reduce_conv, x)\n        short_cut = x\n        x = checkpoint(self.conv, x)\n        x = short_cut + x\n        x = self.out_conv(x)\n        return x\n\n\n@NECKS.register_module()\nclass ViewTransformerLSSBEVDepth(ViewTransformerLiftSplatShoot):\n    def __init__(self, loss_depth_weight, cam_channels=27, loss_depth_reg_weight=0.0, use_voxel_net=False, \n                 norm_cfg=dict(type='BN2d', eps=1e-3, momentum=0.01), **kwargs):\n        super(ViewTransformerLSSBEVDepth, self).__init__(**kwargs)\n        self.loss_depth_weight = loss_depth_weight\n        self.loss_depth_reg_weight = loss_depth_reg_weight\n        self.cam_channels = cam_channels\n        \n        self.depth_net = DepthNet(self.numC_input, self.numC_input,\n                                  self.numC_Trans, self.D, cam_channels=self.cam_channels,\n                                  norm_cfg=norm_cfg)\n        self.depth_aggregation_net = DepthAggregation(self.numC_Trans,\n                                                      self.numC_Trans,\n                                                      self.numC_Trans,\n                                                      norm_cfg=norm_cfg) if use_voxel_net else None\n\n    def _forward_voxel_net(self, img_feat_with_depth):\n        # BEVConv2D [n, c, d, h, w] -> [n, h, c, w, d]\n        if self.depth_aggregation_net is None:\n            return img_feat_with_depth\n        img_feat_with_depth = img_feat_with_depth.permute(\n            0, 3, 1, 4, 2).contiguous()  # [n, c, d, h, w] -> [n, h, c, w, d]\n        n, h, c, w, d = img_feat_with_depth.shape\n        img_feat_with_depth = img_feat_with_depth.view(-1, c, w, d)\n        img_feat_with_depth = (\n            self.depth_aggregation_net(img_feat_with_depth).view(\n                n, h, c, w, d).permute(0, 2, 4, 1, 3).contiguous().float())\n        return img_feat_with_depth\n\n    def get_mlp_input(self, rot, tran, intrin, post_rot, post_tran, bda=None):\n        B,N,_,_ = rot.shape\n        if bda is None:\n            bda = torch.eye(3).to(rot).view(1,3,3).repeat(B,1,1)\n        bda = bda.view(B,1,3,3).repeat(1,N,1,1)\n        \n        if intrin.shape[-1] == 4:\n            # for KITTI, the intrin matrix is 3x4\n            mlp_input = torch.stack([\n                intrin[:, :, 0, 0],\n                intrin[:, :, 1, 1],\n                intrin[:, :, 0, 2],\n                intrin[:, :, 1, 2],\n                intrin[:, :, 0, 3],\n                intrin[:, :, 1, 3],\n                intrin[:, :, 2, 3],\n                post_rot[:, :, 0, 0],\n                post_rot[:, :, 0, 1],\n                post_tran[:, :, 0],\n                post_rot[:, :, 1, 0],\n                post_rot[:, :, 1, 1],\n                post_tran[:, :, 1],\n                bda[:, :, 0, 0],\n                bda[:, :, 0, 1],\n                bda[:, :, 1, 0],\n                bda[:, :, 1, 1],\n                bda[:, :, 2, 2],\n            ], dim=-1)\n        else:\n            mlp_input = torch.stack([\n                intrin[:, :, 0, 0],\n                intrin[:, :, 1, 1],\n                intrin[:, :, 0, 2],\n                
intrin[:, :, 1, 2],\n                post_rot[:, :, 0, 0],\n                post_rot[:, :, 0, 1],\n                post_tran[:, :, 0],\n                post_rot[:, :, 1, 0],\n                post_rot[:, :, 1, 1],\n                post_tran[:, :, 1],\n                bda[:, :, 0, 0],\n                bda[:, :, 0, 1],\n                bda[:, :, 1, 0],\n                bda[:, :, 1, 1],\n                bda[:, :, 2, 2],\n            ], dim=-1)\n        \n        sensor2ego = torch.cat([rot, tran.reshape(B, N, 3, 1)], dim=-1).reshape(B, N, -1)\n        mlp_input = torch.cat([mlp_input, sensor2ego], dim=-1)\n        \n        return mlp_input\n\n    def get_downsampled_gt_depth(self, gt_depths):\n        \"\"\"\n        Input:\n            gt_depths: [B, N, H, W]\n        Output:\n            gt_depths: [B*N*h*w, d]\n        \"\"\"\n        B, N, H, W = gt_depths.shape\n        gt_depths = gt_depths.view(B * N,\n                                   H // self.downsample, self.downsample,\n                                   W // self.downsample, self.downsample, 1)\n        gt_depths = gt_depths.permute(0, 1, 3, 5, 2, 4).contiguous()\n        gt_depths = gt_depths.view(-1, self.downsample * self.downsample)\n        gt_depths_tmp = torch.where(gt_depths == 0.0, 1e5 * torch.ones_like(gt_depths), gt_depths)\n        gt_depths = torch.min(gt_depths_tmp, dim=-1).values\n        gt_depths = gt_depths.view(B * N, H // self.downsample, W // self.downsample)\n        \n        # [min - step / 2, min + step / 2] creates min depth\n        gt_depths = (gt_depths - (self.grid_config['dbound'][0] - self.grid_config['dbound'][2] / 2)) / self.grid_config['dbound'][2]\n        gt_depths = torch.where((gt_depths < self.D + 1) & (gt_depths >= 0.0), gt_depths, torch.zeros_like(gt_depths))\n        gt_depths = F.one_hot(gt_depths.long(), num_classes=self.D + 1).view(-1, self.D + 1)[:, 1:]\n        \n        return gt_depths.float()\n\n    def _prepare_depth_gt(self, gt_depths):\n        \"\"\"\n        Input:\n            gt_depths: [B, N, H, W]\n        Output:\n            gt_depths: [B*N*H*W, d]\n        \"\"\"\n        gt_depths = (gt_depths - (self.grid_config['dbound'][0] -\n                                  self.grid_config['dbound'][2])) / \\\n                    self.grid_config['dbound'][2]\n        gt_depths = torch.where((gt_depths < self.D + 1) & (gt_depths >= 0.0),\n                                gt_depths, torch.zeros_like(gt_depths))\n        gt_depths = F.one_hot(gt_depths.long(),\n                              num_classes=self.D + 1).view(-1,\n                                                           self.D + 1)[:, 1:]\n        return gt_depths.float()\n\n    @force_fp32()\n    def get_depth_reg_loss(self, depth_labels, depth_preds):\n        depth_labels = self.get_downsampled_gt_depth(depth_labels)\n        # depth_labels = self._prepare_depth_gt(depth_labels)\n        depth_preds = depth_preds.permute(0, 2, 3, 1).contiguous().view(-1, self.D)\n        # foreground predictions & labels\n        fg_mask = torch.max(depth_labels, dim=1).values > 0.0\n        depth_labels = depth_labels[fg_mask]\n        depth_preds = depth_preds[fg_mask]\n        \n        # cls_targets ==> reg_targets\n        ds = torch.arange(*self.grid_config['dbound'], dtype=torch.float).view(1, -1).type_as(depth_preds)\n        depth_reg_labels = torch.sum(depth_labels * ds, dim=1)\n        depth_reg_preds = torch.sum(depth_preds * ds, dim=1)\n        \n        with autocast(enabled=False):\n            loss_depth = 
F.smooth_l1_loss(depth_reg_preds, depth_reg_labels, reduction='mean')\n\n        return self.loss_depth_reg_weight * loss_depth\n\n    @force_fp32()\n    def get_depth_loss(self, depth_labels, depth_preds):\n        depth_labels = self.get_downsampled_gt_depth(depth_labels)\n        # depth_labels = self._prepare_depth_gt(depth_labels)\n        depth_preds = depth_preds.permute(0, 2, 3, 1).contiguous().view(\n            -1, self.D)\n        fg_mask = torch.max(depth_labels, dim=1).values > 0.0\n        depth_labels = depth_labels[fg_mask]\n        depth_preds = depth_preds[fg_mask]\n        with autocast(enabled=False):\n            depth_loss = F.binary_cross_entropy(\n                depth_preds,\n                depth_labels,\n                reduction='none',\n            ).sum() / max(1.0, fg_mask.sum())\n        \n        return self.loss_depth_weight * depth_loss\n\n    def forward(self, input):\n        (x, rots, trans, intrins, post_rots, post_trans, bda, mlp_input) = input[:8]\n\n        B, N, C, H, W = x.shape\n        x = x.view(B * N, C, H, W)\n        x = self.depth_net(x, mlp_input)\n        depth_digit = x[:, :self.D, ...]\n        img_feat = x[:, self.D:self.D+self.numC_Trans, ...]\n        depth_prob = self.get_depth_dist(depth_digit)\n        # Lift\n        volume = depth_prob.unsqueeze(1) * img_feat.unsqueeze(2)\n        volume = self._forward_voxel_net(volume)\n        volume = volume.view(B, N, self.numC_Trans, self.D, H, W)\n        volume = volume.permute(0, 1, 3, 4, 5, 2)\n\n        # Splat\n        if self.accelerate:\n            bev_feat = self.voxel_pooling_accelerated(rots, trans, intrins,\n                                                      post_rots, post_trans,\n                                                      bda, volume)\n        else:\n            geom = self.get_geometry(rots, trans, intrins,\n                                     post_rots, post_trans, bda)\n            if self.vp_megvii:\n                bev_feat = self.voxel_pooling_bevdepth(geom, volume)\n            else:\n                bev_feat = self.voxel_pooling(geom, volume)\n        return bev_feat, depth_prob\n\n\nclass ConvBnReLU3D(nn.Module):\n    \"\"\"Implements 3D convolution + batch normalization + ReLU.\"\"\"\n    def __init__(\n        self,\n        in_channels: int,\n        out_channels: int,\n        kernel_size: int = 3,\n        stride: int = 1,\n        pad: int = 1,\n        dilation: int = 1,\n    ) -> None:\n        \"\"\"Initialization method for the 3D convolution + batch normalization + ReLU module.\n\n        Args:\n            in_channels: input channel number of convolution layer\n            out_channels: output channel number of convolution layer\n            kernel_size: kernel size of convolution layer\n            stride: stride of convolution layer\n            pad: pad of convolution layer\n            dilation: dilation of convolution layer\n        \"\"\"\n        super(ConvBnReLU3D, self).__init__()\n        self.conv = nn.Conv3d(in_channels,\n                              out_channels,\n                              kernel_size,\n                              stride=stride,\n                              padding=pad,\n                              dilation=dilation,\n                              bias=False)\n        self.bn = nn.BatchNorm3d(out_channels)\n\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        \"\"\"Forward method.\"\"\"\n        return F.relu(self.bn(self.conv(x)), inplace=True)\n\n\nclass DepthNetStereo(nn.Module):\n    
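\"\"\"Camera-aware depth head used by the BEVStereo-style view transformer.\n\n    In addition to context features and a monocular depth map, it predicts\n    per-range Gaussian parameters (mu, sigma) and range scores that drive\n    the iterative stereo depth sampling in ViewTransformerLSSBEVStereo.\n    \"\"\"\n\n    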
def __init__(self,\n                 in_channels,\n                 mid_channels,\n                 context_channels,\n                 depth_channels,\n                 d_bound,\n                 num_ranges=4,\n                 norm_cfg=dict(type='BN', requires_grad=True)):\n        super(DepthNetStereo, self).__init__()\n        self.reduce_conv = nn.Sequential(\n            nn.Conv2d(in_channels,\n                      mid_channels,\n                      kernel_size=3,\n                      stride=1,\n                      padding=1),\n            nn.BatchNorm2d(mid_channels),\n            nn.ReLU(inplace=True),\n        )\n        self.context_conv = nn.Conv2d(mid_channels,\n                                      context_channels,\n                                      kernel_size=1,\n                                      stride=1,\n                                      padding=0)\n        self.bn = nn.BatchNorm1d(27)\n        self.depth_mlp = Mlp(27, mid_channels, mid_channels)\n        self.depth_se = SELayer(mid_channels)  # NOTE: add camera-aware\n        self.context_mlp = Mlp(27, mid_channels, mid_channels)\n        self.context_se = SELayer(mid_channels)  # NOTE: add camera-aware\n        self.depth_feat_conv = nn.Sequential(\n            BasicBlock(mid_channels, mid_channels, norm_cfg=norm_cfg),\n            BasicBlock(mid_channels, mid_channels, norm_cfg=norm_cfg),\n            ASPP(mid_channels, mid_channels, norm_cfg=norm_cfg),\n            build_conv_layer(cfg=dict(\n                type='DCN',\n                in_channels=mid_channels,\n                out_channels=mid_channels,\n                kernel_size=3,\n                padding=1,\n                groups=4,\n                im2col_step=128,\n            )),\n        )\n        self.mu_sigma_range_net = nn.Sequential(\n            BasicBlock(mid_channels, mid_channels),\n            nn.ConvTranspose2d(mid_channels,\n                               mid_channels,\n                               3,\n                               stride=2,\n                               padding=1,\n                               output_padding=1),\n            nn.BatchNorm2d(mid_channels),\n            nn.ReLU(inplace=True),\n            nn.ConvTranspose2d(mid_channels,\n                               mid_channels,\n                               3,\n                               stride=2,\n                               padding=1,\n                               output_padding=1),\n            nn.BatchNorm2d(mid_channels),\n            nn.ReLU(inplace=True),\n            nn.Conv2d(mid_channels,\n                      num_ranges * 3,\n                      kernel_size=1,\n                      stride=1,\n                      padding=0),\n        )\n        self.mono_depth_net = nn.Sequential(\n            BasicBlock(mid_channels, mid_channels),\n            nn.Conv2d(mid_channels,\n                      depth_channels,\n                      kernel_size=1,\n                      stride=1,\n                      padding=0),\n        )\n        self.d_bound = d_bound\n        self.num_ranges = num_ranges\n\n    # @autocast(False)\n    def forward(self, x, mlp_input):\n        B, _, H, W = x.shape\n\n        mlp_input = self.bn(mlp_input.reshape(-1, mlp_input.shape[-1]))\n        x = self.reduce_conv(x)\n        context_se = self.context_mlp(mlp_input)[..., None, None]\n        context = self.context_se(x, context_se)\n        context = self.context_conv(context)\n        depth_se = self.depth_mlp(mlp_input)[..., None, None]\n        
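# SE-style modulation: the camera-parameter embedding gates the shared\n        # features channel-wise for the depth branch, mirroring the context\n        # branch above.\n        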
depth_feat = self.depth_se(x, depth_se)\n        depth_feat = checkpoint(self.depth_feat_conv, depth_feat)\n        mono_depth = checkpoint(self.mono_depth_net, depth_feat)\n        mu_sigma_score = checkpoint(self.mu_sigma_range_net, depth_feat)\n        mu = mu_sigma_score[:, 0:self.num_ranges, ...]\n        sigma = mu_sigma_score[:, self.num_ranges:2 * self.num_ranges, ...]\n        range_score = mu_sigma_score[:,\n                                     2 * self.num_ranges:3 * self.num_ranges,\n                                     ...]\n        sigma = F.elu(sigma) + 1.0 + 1e-10\n        return x, context, mu, sigma, range_score, mono_depth\n\n\n@NECKS.register_module()\nclass ViewTransformerLSSBEVStereo(ViewTransformerLSSBEVDepth):\n    def __init__(self, num_ranges=4, use_mask=True, em_iteration=3,\n                 range_list=[[2, 8], [8, 16], [16, 28], [28, 58]],\n                 sampling_range=3, num_samples=3,\n                 k_list=None, min_sigma=1.0,\n                 num_groups=8,\n                 stereo_downsample_factor=4, \n                 norm_cfg=dict(type='BN2d'), **kwargs):\n        super(ViewTransformerLSSBEVStereo, self).__init__(**kwargs)\n        self.num_ranges = num_ranges\n        self.depth_net = DepthNetStereo(self.numC_input, self.numC_input,\n                                  self.numC_Trans, self.D,\n                                  self.grid_config['dbound'],\n                                  self.num_ranges,\n                                  norm_cfg=norm_cfg)\n        self.context_downsample_net = nn.Identity()\n        self.use_mask = use_mask\n        self.stereo_downsample_factor = stereo_downsample_factor\n        self.num_ranges = num_ranges\n        self.min_sigma = min_sigma\n        self.sampling_range = sampling_range\n        self.num_samples = num_samples\n        self.num_groups=num_groups\n        self.similarity_net = nn.Sequential(\n            ConvBnReLU3D(in_channels=num_groups,\n                         out_channels=16,\n                         kernel_size=1,\n                         stride=1,\n                         pad=0),\n            ConvBnReLU3D(in_channels=16,\n                         out_channels=8,\n                         kernel_size=1,\n                         stride=1,\n                         pad=0),\n            nn.Conv3d(in_channels=8,\n                      out_channels=1,\n                      kernel_size=1,\n                      stride=1,\n                      padding=0),\n        )\n        self.depth_downsample_net = nn.Sequential(\n            nn.Conv2d(self.D, 256, 3, 2, 1),\n            nn.BatchNorm2d(256),\n            nn.ReLU(),\n            nn.Conv2d(256, 256, 3, 2, 1),\n            nn.BatchNorm2d(256),\n            nn.ReLU(),\n            nn.Conv2d(256, self.D, 1, 1, 0),\n        )\n        if range_list is None:\n            range_length = (self.grid_config['dbound'][1] -\n                            self.grid_config['dbound'][0]) / num_ranges\n            self.range_list = [[\n                self.grid_config['dbound'][0] + range_length * i,\n                self.grid_config['dbound'][0] + range_length * (i + 1)\n            ] for i in range(num_ranges)]\n        else:\n            assert len(range_list) == num_ranges\n            self.range_list = range_list\n        self.em_iteration = em_iteration\n        if k_list is None:\n            self.register_buffer('k_list', torch.Tensor(self.depth_sampling()))\n        else:\n            self.register_buffer('k_list', torch.Tensor(k_list))\n      
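\n        # mask_net scores the consistency between the key frame's monocular\n        # depth and the mono depth warped from other sweeps (see _forward_mask).\n      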
  if self.use_mask:\n            self.mask_net = nn.Sequential(\n                nn.Conv2d(self.D*2, 64, 3, 1, 1),\n                nn.BatchNorm2d(64),\n                nn.ReLU(inplace=True),\n                BasicBlock(64, 64),\n                BasicBlock(64, 64),\n                nn.Conv2d(64, 1, 1, 1, 0),\n                nn.Sigmoid(),\n            )\n\n    def depth_sampling(self):\n        \"\"\"Generate sampling range of candidates.\n\n        Returns:\n            list[float]: List of all candidates.\n        \"\"\"\n        P_total = erf(self.sampling_range /\n                      np.sqrt(2))  # Probability covered by the sampling range\n        idx_list = np.arange(0, self.num_samples + 1)\n        p_list = (1 - P_total) / 2 + ((idx_list / self.num_samples) * P_total)\n        k_list = norm.ppf(p_list)\n        k_list = (k_list[1:] + k_list[:-1]) / 2\n        return list(k_list)\n\n    def create_depth_sample_frustum(self, depth_sample, downsample_factor=16):\n        \"\"\"Generate frustum\"\"\"\n        # make grid in image plane\n        ogfH, ogfW = self.data_config['input_size']\n        fH, fW = ogfH // downsample_factor, ogfW // downsample_factor\n        batch_size, num_depth, _, _ = depth_sample.shape\n        x_coords = (torch.linspace(0,\n                                   ogfW - 1,\n                                   fW,\n                                   dtype=torch.float,\n                                   device=depth_sample.device).view(\n                                       1, 1, 1,\n                                       fW).expand(batch_size, num_depth, fH,\n                                                  fW))\n        y_coords = (torch.linspace(0,\n                                   ogfH - 1,\n                                   fH,\n                                   dtype=torch.float,\n                                   device=depth_sample.device).view(\n                                       1, 1, fH,\n                                       1).expand(batch_size, num_depth, fH,\n                                                 fW))\n        paddings = torch.ones_like(depth_sample)\n\n        # D x H x W x 3\n        frustum = torch.stack((x_coords, y_coords, depth_sample, paddings), -1)\n        return frustum\n\n    def homo_warping(\n        self,\n        stereo_feat,\n        key_intrin_mats,\n        sweep_intrin_mats,\n        sensor2sensor_mats,\n        key_ida_mats,\n        sweep_ida_mats,\n        depth_sample,\n        frustum,\n    ):\n        \"\"\"Used for mvs method to transfer sweep image feature to\n            key image feature.\n\n        Args:\n            src_fea(Tensor): image features.\n            key_intrin_mats(Tensor): Intrin matrix for key sensor.\n            sweep_intrin_mats(Tensor): Intrin matrix for sweep sensor.\n            sensor2sensor_mats(Tensor): Transformation matrix from key\n                sensor to sweep sensor.\n            key_ida_mats(Tensor): Ida matrix for key frame.\n            sweep_ida_mats(Tensor): Ida matrix for sweep frame.\n            depth_sample (Tensor): Depth map of all candidates.\n            depth_sample_frustum (Tensor): Pre-generated frustum.\n        \"\"\"\n        batch_size_with_num_cams, channels = stereo_feat.shape[\n            0], stereo_feat.shape[1]\n        height, width = stereo_feat.shape[2], stereo_feat.shape[3]\n        with torch.no_grad():\n            points = frustum\n            points = points.reshape(points.shape[0], -1, points.shape[-1])\n            points[..., 
2] = 1\n            # Undo ida for key frame.\n            points = key_ida_mats.reshape(batch_size_with_num_cams,\n                                          *key_ida_mats.shape[2:]).inverse(\n                                          ).unsqueeze(1) @ points.unsqueeze(-1)\n            # Convert points from pixel coord to key camera coord.\n            points[..., :3, :] *= depth_sample.reshape(\n                batch_size_with_num_cams, -1, 1, 1)\n            num_depth = frustum.shape[1]\n            points = (key_intrin_mats.reshape(\n                batch_size_with_num_cams,\n                *key_intrin_mats.shape[2:]).inverse().unsqueeze(1) @ points)\n            points = (sensor2sensor_mats.reshape(\n                batch_size_with_num_cams,\n                *sensor2sensor_mats.shape[2:]).unsqueeze(1) @ points)\n            # points in sweep sensor coord.\n            points = (sweep_intrin_mats.reshape(\n                batch_size_with_num_cams,\n                *sweep_intrin_mats.shape[2:]).unsqueeze(1) @ points)\n            # points in sweep pixel coord.\n            points[..., :2, :] = points[..., :2, :] / points[\n                ..., 2:3, :]  # [B, 2, Ndepth, H*W]\n\n            points = (sweep_ida_mats.reshape(\n                batch_size_with_num_cams,\n                *sweep_ida_mats.shape[2:]).unsqueeze(1) @ points).squeeze(-1)\n            neg_mask = points[..., 2] < 1e-3\n            points[..., 0][neg_mask] = width * self.stereo_downsample_factor\n            points[..., 1][neg_mask] = height * self.stereo_downsample_factor\n            points[..., 2][neg_mask] = 1\n            proj_x_normalized = points[..., 0] / (\n                (width * self.stereo_downsample_factor - 1) / 2) - 1\n            proj_y_normalized = points[..., 1] / (\n                (height * self.stereo_downsample_factor - 1) / 2) - 1\n            grid = torch.stack([proj_x_normalized, proj_y_normalized],\n                               dim=2)  # [B, Ndepth, H*W, 2]\n\n        warped_stereo_fea = F.grid_sample(\n            stereo_feat,\n            grid.view(batch_size_with_num_cams, num_depth * height, width, 2),\n            mode='bilinear',\n            padding_mode='zeros',\n        )\n        warped_stereo_fea = warped_stereo_fea.view(batch_size_with_num_cams,\n                                                   channels, num_depth, height,\n                                                   width)\n\n        return warped_stereo_fea\n\n    def _forward_mask(\n        self,\n        sweep_index,\n        mono_depth_all_sweeps,\n        mats_dict,\n        depth_sample,\n        depth_sample_frustum,\n        sensor2sensor_mats,\n    ):\n        \"\"\"Forward function to generate mask.\n\n        Args:\n            sweep_index (int): Index of sweep.\n            mono_depth_all_sweeps (list[Tensor]): List of mono_depth for\n                all sweeps.\n            mats_dict (dict):\n                sensor2ego_mats (Tensor): Transformation matrix from\n                    camera to ego with shape of (B, num_sweeps,\n                    num_cameras, 4, 4).\n                intrin_mats (Tensor): Intrinsic matrix with shape\n                    of (B, num_sweeps, num_cameras, 4, 4).\n                ida_mats (Tensor): Transformation matrix for ida with\n                    shape of (B, num_sweeps, num_cameras, 4, 4).\n                sensor2sensor_mats (Tensor): Transformation matrix\n                    from key frame camera to sweep frame camera with\n                    shape of (B, num_sweeps, 
num_cameras, 4, 4).\n                bda_mat (Tensor): Rotation matrix for bda with shape\n                    of (B, 4, 4).\n            depth_sample (Tensor): Depth map of all candidates.\n            depth_sample_frustum (Tensor): Pre-generated frustum.\n            sensor2sensor_mats (Tensor): Transformation matrix from reference\n                sensor to source sensor.\n\n        Returns:\n            Tensor: Generated mask.\n        \"\"\"\n        num_sweeps = len(mono_depth_all_sweeps)\n        mask_all_sweeps = list()\n        for idx in range(num_sweeps):\n            if idx == sweep_index:\n                continue\n            warped_mono_depth = self.homo_warping(\n                mono_depth_all_sweeps[idx],\n                mats_dict['intrin_mats'][:, sweep_index, ...],\n                mats_dict['intrin_mats'][:, idx, ...],\n                sensor2sensor_mats[idx],\n                mats_dict['ida_mats'][:, sweep_index, ...],\n                mats_dict['ida_mats'][:, idx, ...],\n                depth_sample,\n                depth_sample_frustum.type_as(mono_depth_all_sweeps[idx]),\n            )\n            mask = self.mask_net(\n                torch.cat([\n                    mono_depth_all_sweeps[sweep_index].detach(),\n                    warped_mono_depth.mean(2).detach()\n                ], 1))\n            mask_all_sweeps.append(mask)\n        return torch.stack(mask_all_sweeps).mean(0)\n\n    def _generate_cost_volume(\n            self,\n            sweep_index,\n            stereo_feats_all_sweeps,\n            mats_dict,\n            depth_sample,\n            depth_sample_frustum,\n            sensor2sensor_mats,\n    ):\n        \"\"\"Generate cost volume based on depth sample.\n\n        Args:\n            sweep_index (int): Index of sweep.\n            stereo_feats_all_sweeps (list[Tensor]): Stereo feature\n                of all sweeps.\n            mats_dict (dict):\n                sensor2ego_mats (Tensor): Transformation matrix from\n                    camera to ego with shape of (B, num_sweeps,\n                    num_cameras, 4, 4).\n                intrin_mats (Tensor): Intrinsic matrix with shape\n                    of (B, num_sweeps, num_cameras, 4, 4).\n                ida_mats (Tensor): Transformation matrix for ida with\n                    shape of (B, num_sweeps, num_cameras, 4, 4).\n                sensor2sensor_mats (Tensor): Transformation matrix\n                    from key frame camera to sweep frame camera with\n                    shape of (B, num_sweeps, num_cameras, 4, 4).\n                bda_mat (Tensor): Rotation matrix for bda with shape\n                    of (B, 4, 4).\n            depth_sample (Tensor): Depth map of all candidates.\n            depth_sample_frustum (Tensor): Pre-generated frustum.\n            sensor2sensor_mats (Tensor): Transformation matrix from reference\n                sensor to source sensor.\n\n        Returns:\n            Tensor: Depth score for all sweeps.\n        \"\"\"\n        batch_size, num_channels, height, width = stereo_feats_all_sweeps[\n            0].shape\n        # thres = int(self.mvs_weighting.split(\"CW\")[1])\n        num_sweeps = len(stereo_feats_all_sweeps)\n        depth_score_all_sweeps = list()\n        for idx in range(num_sweeps):\n            if idx == sweep_index:\n                continue\n            warped_stereo_fea = self.homo_warping(\n                stereo_feats_all_sweeps[idx],\n                mats_dict['intrin_mats'][:, sweep_index, ...],\n                
mats_dict['intrin_mats'][:, idx, ...],\n                sensor2sensor_mats[idx],\n                mats_dict['ida_mats'][:, sweep_index, ...],\n                mats_dict['ida_mats'][:, idx, ...],\n                depth_sample,\n                depth_sample_frustum.type_as(stereo_feats_all_sweeps[idx]),\n            )\n            warped_stereo_fea = warped_stereo_fea.reshape(\n                batch_size, self.num_groups, num_channels // self.num_groups,\n                self.num_samples, height, width)\n            ref_stereo_feat = stereo_feats_all_sweeps[sweep_index].reshape(\n                batch_size, self.num_groups, num_channels // self.num_groups,\n                height, width)\n            feat_cost = torch.mean(\n                (ref_stereo_feat.unsqueeze(3) * warped_stereo_fea), axis=2)\n            depth_score = self.similarity_net(feat_cost).squeeze(1)\n            depth_score_all_sweeps.append(depth_score)\n        return torch.stack(depth_score_all_sweeps).mean(0)\n\n    def _forward_stereo(\n        self,\n        sweep_index,\n        stereo_feats_all_sweeps,\n        mono_depth_all_sweeps,\n        mats_dict,\n        sensor2sensor_mats,\n        mu_all_sweeps,\n        sigma_all_sweeps,\n        range_score_all_sweeps,\n        depth_feat_all_sweeps,\n    ):\n        \"\"\"Forward function to generate stereo depth.\n\n        Args:\n            sweep_index (int): Index of sweep.\n            stereo_feats_all_sweeps (list[Tensor]): Stereo feature\n                of all sweeps.\n            mono_depth_all_sweeps (list[Tensor]):\n            mats_dict (dict):\n                sensor2ego_mats (Tensor): Transformation matrix from\n                    camera to ego with shape of (B, num_sweeps,\n                    num_cameras, 4, 4).\n                intrin_mats (Tensor): Intrinsic matrix with shape\n                    of (B, num_sweeps, num_cameras, 4, 4).\n                ida_mats (Tensor): Transformation matrix for ida with\n                    shape of (B, num_sweeps, num_cameras, 4, 4).\n                sensor2sensor_mats (Tensor): Transformation matrix\n                    from key frame camera to sweep frame camera with\n                    shape of (B, num_sweeps, num_cameras, 4, 4).\n                bda_mat (Tensor): Rotation matrix for bda with shape\n                    of (B, 4, 4).\n            sensor2sensor_mats(Tensor): Transformation matrix from key\n                sensor to sweep sensor.\n            mu_all_sweeps (list[Tensor]): List of mu for all sweeps.\n            sigma_all_sweeps (list[Tensor]): List of sigma for all sweeps.\n            range_score_all_sweeps (list[Tensor]): List of all range score\n                for all sweeps.\n            depth_feat_all_sweeps (list[Tensor]): List of all depth feat for\n                all sweeps.\n\n        Returns:\n            Tensor: stereo_depth\n        \"\"\"\n        batch_size_with_cams, _, feat_height, feat_width = \\\n            stereo_feats_all_sweeps[0].shape\n        device = stereo_feats_all_sweeps[0].device\n        d_coords = torch.arange(*self.grid_config['dbound'],\n                                dtype=torch.float,\n                                device=device).reshape(1, -1, 1, 1)\n        d_coords = d_coords.repeat(batch_size_with_cams, 1, feat_height,\n                                   feat_width)\n        stereo_depth = stereo_feats_all_sweeps[0].new_zeros(\n            batch_size_with_cams, self.D, feat_height, feat_width)\n        mask_score = stereo_feats_all_sweeps[0].new_zeros(\n 
           batch_size_with_cams,\n            self.D,\n            feat_height * self.stereo_downsample_factor //\n            self.downsample,\n            feat_width * self.stereo_downsample_factor //\n            self.downsample,\n        )\n        score_all_ranges = list()\n        range_score = range_score_all_sweeps[sweep_index].softmax(1)\n        for range_idx in range(self.num_ranges):\n            # Map mu to the corresponding interval.\n            range_start = self.range_list[range_idx][0]\n            mu_all_sweeps_single_range = [\n                mu[:, range_idx:range_idx + 1, ...].sigmoid() *\n                (self.range_list[range_idx][1] - self.range_list[range_idx][0])\n                + range_start for mu in mu_all_sweeps\n            ]\n            sigma_all_sweeps_single_range = [\n                sigma[:, range_idx:range_idx + 1, ...]\n                for sigma in sigma_all_sweeps\n            ]\n            batch_size_with_cams, _, feat_height, feat_width =\\\n                stereo_feats_all_sweeps[0].shape\n            mu = mu_all_sweeps_single_range[sweep_index]\n            sigma = sigma_all_sweeps_single_range[sweep_index]\n            for _ in range(self.em_iteration):\n                depth_sample = torch.cat([mu + sigma * k for k in self.k_list],\n                                         1)\n                depth_sample_frustum = self.create_depth_sample_frustum(\n                    depth_sample, self.stereo_downsample_factor)\n                mu_score = self._generate_cost_volume(\n                    sweep_index,\n                    stereo_feats_all_sweeps,\n                    mats_dict,\n                    depth_sample,\n                    depth_sample_frustum,\n                    sensor2sensor_mats,\n                )\n                mu_score = mu_score.softmax(1)\n                scale_factor = torch.clamp(\n                    0.5 / (1e-4 + mu_score[:, self.num_samples //\n                                           2:self.num_samples // 2 + 1, ...]),\n                    min=0.1,\n                    max=10)\n\n                sigma = torch.clamp(sigma * scale_factor, min=0.1, max=10)\n                mu = (depth_sample * mu_score).sum(1, keepdim=True)\n                del depth_sample\n                del depth_sample_frustum\n            mu = torch.clamp(mu,\n                             max=self.range_list[range_idx][1],\n                             min=self.range_list[range_idx][0])\n            range_length = int(\n                (self.range_list[range_idx][1] - self.range_list[range_idx][0])\n                // self.grid_config['dbound'][2])\n            if self.use_mask:\n                depth_sample = F.avg_pool2d(\n                    mu,\n                    self.downsample // self.stereo_downsample_factor,\n                    self.downsample // self.stereo_downsample_factor,\n                )\n                depth_sample_frustum = self.create_depth_sample_frustum(\n                    depth_sample, self.downsample)\n                mask = self._forward_mask(\n                    sweep_index,\n                    mono_depth_all_sweeps,\n                    mats_dict,\n                    depth_sample,\n                    depth_sample_frustum,\n                    sensor2sensor_mats,\n                )\n                mask_score[:,\n                           int((range_start - self.grid_config['dbound'][0]) //\n                               self.grid_config['dbound'][2]):range_length +\n                           
int((range_start - self.grid_config['dbound'][0]) //\n                               self.grid_config['dbound'][2]), ..., ] += mask\n                del depth_sample\n                del depth_sample_frustum\n            sigma = torch.clamp(sigma, self.min_sigma)\n            mu_repeated = mu.repeat(1, range_length, 1, 1)\n            eps = 1e-6\n            depth_score_single_range = (-1 / 2 * (\n                (d_coords[:,\n                          int((range_start - self.grid_config['dbound'][0]) //\n                              self.grid_config['dbound'][2]):range_length + int(\n                                  (range_start - self.grid_config['dbound'][0]) //\n                                  self.grid_config['dbound'][2]), ..., ] - mu_repeated) /\n                torch.sqrt(sigma))**2)\n            depth_score_single_range = depth_score_single_range.exp()\n            score_all_ranges.append(mu_score.sum(1).unsqueeze(1))\n            depth_score_single_range = depth_score_single_range / (\n                sigma * math.sqrt(2 * math.pi) + eps)\n            stereo_depth[:,\n                         int((range_start - self.grid_config['dbound'][0]) //\n                             self.grid_config['dbound'][2]):range_length +\n                         int((range_start - self.grid_config['dbound'][0]) //\n                             self.grid_config['dbound'][2]), ..., ] = (\n                                 depth_score_single_range *\n                                 range_score[:, range_idx:range_idx + 1, ...])\n            # del range_score\n            del depth_score_single_range\n            del mu_repeated\n        if self.use_mask:\n            return stereo_depth, mask_score\n        else:\n            return stereo_depth\n\n    def forward(self, input):\n        img_feat, depth_prob, rots, trans, intrins, post_rots, post_trans, bda = input\n        B, N, C, H, W = img_feat.shape\n        img_feat = img_feat.view(B*N,C,H,W)\n        # Lift\n        volume = depth_prob.unsqueeze(1) * img_feat.unsqueeze(2)\n        volume = self._forward_voxel_net(volume)\n        volume = volume.view(B, N, self.numC_Trans, self.D, H, W)\n        volume = volume.permute(0, 1, 3, 4, 5, 2)\n\n        # Splat\n        if self.accelerate:\n            bev_feat = self.voxel_pooling_accelerated(rots, trans, intrins,\n                                                      post_rots, post_trans,\n                                                      bda, volume)\n        else:\n            geom = self.get_geometry(rots, trans, intrins,\n                                     post_rots, post_trans, bda)\n            if self.vp_megvii:\n                bev_feat = self.voxel_pooling_bevdepth(geom, volume)\n            else:\n                bev_feat = self.voxel_pooling(geom, volume)\n        return bev_feat"
  },
  {
    "path": "projects/occ_plugin/occupancy/image2bev/ViewTransformerLSSVoxel.py",
    "content": "# Copyright (c) Phigent Robotics. All rights reserved.\nimport math\nimport torch\nimport torch.nn as nn\nfrom mmcv.runner import BaseModule\nfrom mmdet3d.models.builder import NECKS\nfrom projects.occ_plugin.ops.occ_pooling import occ_pool\nfrom mmcv.cnn import build_conv_layer\nfrom mmcv.runner import force_fp32\nfrom torch.cuda.amp.autocast_mode import autocast\nfrom projects.occ_plugin.utils.gaussian import generate_guassian_depth_target\n\nimport torch.nn.functional as F\nimport numpy as np\n\nimport pdb\n\nfrom .ViewTransformerLSSBEVDepth import *\n\nimport torch.cuda as cuda\n\ndef get_gpu_memory_usage():\n    allocated = cuda.memory_allocated()\n    reserved = cuda.memory_reserved()\n    return allocated, reserved\n\n@NECKS.register_module()\nclass ViewTransformerLiftSplatShootVoxel(ViewTransformerLSSBEVDepth):\n    def __init__(self, loss_depth_weight, loss_depth_type='bce', **kwargs):\n        super(ViewTransformerLiftSplatShootVoxel, self).__init__(loss_depth_weight=loss_depth_weight, **kwargs)\n        \n        self.loss_depth_type = loss_depth_type\n        self.cam_depth_range = self.grid_config['dbound']\n        self.constant_std = 0.5\n    \n    def get_downsampled_gt_depth(self, gt_depths):\n        \"\"\"\n        Input:\n            gt_depths: [B, N, H, W]\n        Output:\n            gt_depths: [B*N*h*w, d]\n        \"\"\"\n        B, N, H, W = gt_depths.shape\n        gt_depths = gt_depths.view(B * N,\n                                   H // self.downsample, self.downsample,\n                                   W // self.downsample, self.downsample, 1)\n        gt_depths = gt_depths.permute(0, 1, 3, 5, 2, 4).contiguous()\n        gt_depths = gt_depths.view(-1, self.downsample * self.downsample)\n        gt_depths_tmp = torch.where(gt_depths == 0.0, 1e5 * torch.ones_like(gt_depths), gt_depths)\n        gt_depths = torch.min(gt_depths_tmp, dim=-1).values\n        gt_depths = gt_depths.view(B * N, H // self.downsample, W // self.downsample)\n        \n        # [min - step / 2, min + step / 2] creates min depth\n        gt_depths = (gt_depths - (self.grid_config['dbound'][0] - self.grid_config['dbound'][2] / 2)) / self.grid_config['dbound'][2]\n        gt_depths_vals = gt_depths.clone()\n        \n        gt_depths = torch.where((gt_depths < self.D + 1) & (gt_depths >= 0.0), gt_depths, torch.zeros_like(gt_depths))\n        gt_depths = F.one_hot(gt_depths.long(), num_classes=self.D + 1).view(-1, self.D + 1)[:, 1:]\n        \n        return gt_depths_vals, gt_depths.float()\n    \n    @force_fp32()\n    def get_bce_depth_loss(self, depth_labels, depth_preds):\n        _, depth_labels = self.get_downsampled_gt_depth(depth_labels)\n        depth_preds = depth_preds.permute(0, 2, 3, 1).contiguous().view(-1, self.D)\n        fg_mask = torch.max(depth_labels, dim=1).values > 0.0\n        depth_labels = depth_labels[fg_mask]\n        depth_preds = depth_preds[fg_mask]\n        \n        with autocast(enabled=False):\n            depth_loss = F.binary_cross_entropy(depth_preds, depth_labels, reduction='none').sum() / max(1.0, fg_mask.sum())\n        \n        return depth_loss\n\n    @force_fp32()\n    def get_klv_depth_loss(self, depth_labels, depth_preds):\n        depth_gaussian_labels, depth_values = generate_guassian_depth_target(depth_labels,\n            self.downsample, self.cam_depth_range, constant_std=self.constant_std)\n\n        depth_values = depth_values.view(-1)\n        fg_mask = (depth_values >= self.cam_depth_range[0]) & (depth_values <= 
(self.cam_depth_range[1] - self.cam_depth_range[2]))        \n        \n        depth_gaussian_labels = depth_gaussian_labels.view(-1, self.D)[fg_mask]\n        depth_preds = depth_preds.permute(0, 2, 3, 1).contiguous().view(-1, self.D)[fg_mask]\n        \n        depth_loss = F.kl_div(torch.log(depth_preds + 1e-4), depth_gaussian_labels, reduction='batchmean', log_target=False)\n        \n        return depth_loss\n    \n    @force_fp32()\n    def get_depth_loss(self, depth_labels, depth_preds):\n        if self.loss_depth_type == 'bce':\n            depth_loss = self.get_bce_depth_loss(depth_labels, depth_preds)\n        \n        elif self.loss_depth_type == 'kld':\n            depth_loss = self.get_klv_depth_loss(depth_labels, depth_preds)\n        \n        else:\n            pdb.set_trace()\n        \n        return self.loss_depth_weight * depth_loss\n        \n    def voxel_pooling(self, geom_feats, x):\n        B, N, D, H, W, C = x.shape\n\n        Nprime = B * N * D * H * W\n\n        x = x.contiguous().view(Nprime, C)\n\n        # flatten indices\n        geom_feats = ((geom_feats - (self.bx - self.dx / 2.)) / self.dx).long()\n        geom_feats = geom_feats.view(Nprime, 3)\n        batch_ix = torch.cat([torch.full([Nprime // B, 1], ix, device=x.device, dtype=torch.long) for ix in range(B)])\n        geom_feats = torch.cat((geom_feats, batch_ix), 1)\n\n        # filter out points that are outside box\n        kept = (geom_feats[:, 0] >= 0) & (geom_feats[:, 0] < self.nx[0]) \\\n               & (geom_feats[:, 1] >= 0) & (geom_feats[:, 1] < self.nx[1]) \\\n               & (geom_feats[:, 2] >= 0) & (geom_feats[:, 2] < self.nx[2])\n\n        x = x[kept]\n        geom_feats = geom_feats[kept]\n        \n        final = occ_pool(x, geom_feats, B, self.nx[2], self.nx[0], self.nx[1])\n        final = final.permute(0, 1, 3, 4, 2)\n\n        return final\n\n    def forward(self, input):\n        (x, rots_seq, trans_seq, intrins_seq, post_rots_seq, post_trans_seq, bda, mlp_input_seq) = input[:8]\n\n        B, N, C, H, W = x.shape\n        x = x.view(B * N, C, H, W)\n        x = self.depth_net(x, mlp_input_seq)\n        depth_digit = x[:, :self.D, ...]\n        img_feat = x[:, self.D:self.D + self.numC_Trans, ...]\n        depth_prob = self.get_depth_dist(depth_digit)\n\n        volume = depth_prob.unsqueeze(1) * img_feat.unsqueeze(2)\n        volume = volume.view(B, N, self.numC_Trans, self.D, H, W)\n        volume = volume.permute(0, 1, 3, 4, 5, 2)\n\n        geom = self.get_geometry(rots_seq, trans_seq, intrins_seq, post_rots_seq, post_trans_seq, bda)\n\n        bev_feat = self.voxel_pooling(geom, volume) \n\n        return bev_feat, depth_prob\n"
  },
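The depth-bin mapping in `get_downsampled_gt_depth` above is easy to misread, so here is a minimal standalone sketch of it. The `dbound = [2.0, 58.0, 0.5]` values and the toy tensor are illustrative assumptions, not taken from a config in this repo.

```python
# Minimal sketch (illustrative values, not repo code) of the depth binning in
# get_downsampled_gt_depth, assuming dbound = [2.0, 58.0, 0.5] -> D = 112 bins.
import torch
import torch.nn.functional as F

dbound = [2.0, 58.0, 0.5]                      # assumed grid_config['dbound']
D = int((dbound[1] - dbound[0]) / dbound[2])   # 112 discrete depth bins

gt = torch.tensor([0.0, 10.0, 57.9])           # 0.0 means "no lidar return"
idx = (gt - (dbound[0] - dbound[2] / 2)) / dbound[2]
idx = torch.where((idx < D + 1) & (idx >= 0.0), idx, torch.zeros_like(idx))
onehot = F.one_hot(idx.long(), num_classes=D + 1)[:, 1:]  # class 0 = invalid, dropped

print(idx.long().tolist())   # [0, 16, 112] -> the invalid pixel contributes no target
print(onehot.shape)          # torch.Size([3, 112])
```

The extra class 0 acts as a bucket for pixels without a valid depth; slicing it off afterwards is what makes the `fg_mask` test in `get_bce_depth_loss` work.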
  {
    "path": "projects/occ_plugin/occupancy/image2bev/__init__.py",
    "content": "from .ViewTransformerLSSBEVDepth import ViewTransformerLSSBEVDepth\nfrom .ViewTransformerLSSVoxel import ViewTransformerLiftSplatShootVoxel"
  },
  {
    "path": "projects/occ_plugin/occupancy/necks/__init__.py",
    "content": "from .second_fpn_3d import SECONDFPN3D\nfrom .fpn3d import FPN3D"
  },
  {
    "path": "projects/occ_plugin/occupancy/necks/fpn3d.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nimport numpy as np\nimport torch\nfrom mmcv.cnn import build_conv_layer, build_norm_layer, build_upsample_layer\nfrom mmcv.runner import BaseModule, auto_fp16\nfrom torch import nn as nn\nfrom mmcv.cnn import ConvModule\nfrom mmdet.models import NECKS\n\nimport torch.nn.functional as F\nimport pdb\n\n@NECKS.register_module()\nclass FPN3D(BaseModule):\n    \"\"\"FPN used in SECOND/PointPillars/PartA2/MVXNet.\n\n    Args:\n        in_channels (list[int]): Input channels of multi-scale feature maps.\n        out_channels (list[int]): Output channels of feature maps.\n        upsample_strides (list[int]): Strides used to upsample the\n            feature maps.\n        norm_cfg (dict): Config dict of normalization layers.\n        upsample_cfg (dict): Config dict of upsample layers.\n        conv_cfg (dict): Config dict of conv layers.\n        use_conv_for_no_stride (bool): Whether to use conv when stride is 1.\n    \"\"\"\n    def __init__(self,\n                 in_channels=[80, 160, 320, 640],\n                 out_channels=256,\n                 norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),\n                 conv_cfg=dict(type='Conv3d'),\n                 act_cfg=dict(type='ReLU'),\n                 with_cp=False,\n                 upsample_cfg=dict(mode='trilinear'),\n                 init_cfg=None):\n        super(FPN3D, self).__init__(init_cfg=init_cfg)\n        \n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.fp16_enabled = False\n        self.upsample_cfg = upsample_cfg\n        self.with_cp = with_cp\n        \n        self.num_out = len(self.in_channels)\n        self.lateral_convs = nn.ModuleList()\n        self.fpn_convs = nn.ModuleList()\n        \n        for i in range(self.num_out):\n            l_conv = nn.Sequential(\n                ConvModule(in_channels[i], out_channels, \n                    kernel_size=1, padding=0,\n                    conv_cfg=conv_cfg, norm_cfg=norm_cfg, \n                    act_cfg=act_cfg, bias=False, \n                    inplace=True),\n            )\n            \n            fpn_conv = nn.Sequential(\n                ConvModule(out_channels, out_channels, \n                    kernel_size=3, padding=1,\n                    conv_cfg=conv_cfg, norm_cfg=norm_cfg, \n                    act_cfg=act_cfg, bias=False, \n                    inplace=True),\n            )\n\n            self.lateral_convs.append(l_conv)\n            self.fpn_convs.append(fpn_conv)\n\n    @auto_fp16()\n    def forward(self, inputs):\n        \"\"\"Forward function.\n\n        Args:\n            x (torch.Tensor): 4D Tensor in (N, C, H, W) shape.\n\n        Returns:\n            list[torch.Tensor]: Multi-level feature maps.\n        \"\"\"\n        assert len(inputs) == len(self.in_channels)\n\n        # build laterals\n        laterals = []\n        for i, lateral_conv in enumerate(self.lateral_convs):\n            if self.with_cp:\n                lateral_i = torch.utils.checkpoint.checkpoint(lateral_conv, inputs[i])\n            else:\n                lateral_i = lateral_conv(inputs[i])\n            laterals.append(lateral_i)\n\n        # build down-top path\n        for i in range(self.num_out - 1, 0, -1):\n            prev_shape = laterals[i - 1].shape[2:]\n            laterals[i - 1] = laterals[i - 1] + F.interpolate(laterals[i], \n                    size=prev_shape, align_corners=False, **self.upsample_cfg)\n        \n        # outs = [\n  
      #     self.fpn_convs[i](laterals[i]) for i in range(self.num_out)\n        # ]\n        \n        outs = []\n        for i, fpn_conv in enumerate(self.fpn_convs):\n            if self.with_cp:\n                out_i = torch.utils.checkpoint.checkpoint(fpn_conv, laterals[i])\n            else:\n                out_i = fpn_conv(laterals[i])\n            outs.append(out_i)\n        \n        return outs\n"
  },
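A toy illustration of FPN3D's top-down fusion step (the shapes are made up): each coarser lateral is trilinearly upsampled to the next finer lateral's spatial size and added, exactly as in the loop above.

```python
# Toy illustration (assumed shapes) of FPN3D's top-down fusion: upsample the
# coarser level to the finer level's size with trilinear interpolation, then add.
import torch
import torch.nn.functional as F

lat_fine = torch.randn(1, 256, 64, 64, 8)     # lateral at level i-1
lat_coarse = torch.randn(1, 256, 32, 32, 4)   # lateral at level i

fused = lat_fine + F.interpolate(
    lat_coarse, size=lat_fine.shape[2:], mode='trilinear', align_corners=False)
print(fused.shape)  # torch.Size([1, 256, 64, 64, 8])
```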
  {
    "path": "projects/occ_plugin/occupancy/necks/second_fpn_3d.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nimport numpy as np\nimport torch\nfrom mmcv.cnn import build_conv_layer, build_norm_layer, build_upsample_layer\nfrom mmcv.runner import BaseModule, auto_fp16\nfrom torch import nn as nn\n\nfrom mmdet.models import NECKS\n\nimport pdb\n\n@NECKS.register_module()\nclass SECONDFPN3D(BaseModule):\n    \"\"\"FPN used in SECOND/PointPillars/PartA2/MVXNet.\n\n    Args:\n        in_channels (list[int]): Input channels of multi-scale feature maps.\n        out_channels (list[int]): Output channels of feature maps.\n        upsample_strides (list[int]): Strides used to upsample the\n            feature maps.\n        norm_cfg (dict): Config dict of normalization layers.\n        upsample_cfg (dict): Config dict of upsample layers.\n        conv_cfg (dict): Config dict of conv layers.\n        use_conv_for_no_stride (bool): Whether to use conv when stride is 1.\n    \"\"\"\n    def __init__(self,\n                 in_channels=[128, 128, 256],\n                 out_channels=[256, 256, 256],\n                 upsample_strides=[1, 2, 4],\n                 norm_cfg=dict(type='GN', num_groups=32, requires_grad=True),\n                 upsample_cfg=dict(type='deconv3d', bias=False),\n                 conv_cfg=dict(type='Conv3d', bias=False),\n                 use_conv_for_no_stride=False,\n                 init_cfg=None):\n        \n        # replacing GN with BN3D, performance drops from 42.5 to 40.9. \n        # the difference may be exaggerated because the performance can fluncate a lot\n        \n        super(SECONDFPN3D, self).__init__(init_cfg=init_cfg)\n        assert len(out_channels) == len(upsample_strides) == len(in_channels)\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.fp16_enabled = False\n\n        deblocks = []\n        for i, out_channel in enumerate(out_channels):\n            stride = upsample_strides[i]\n            if stride > 1 or (stride == 1 and not use_conv_for_no_stride):\n                upsample_layer = build_upsample_layer(\n                    upsample_cfg,\n                    in_channels=in_channels[i],\n                    out_channels=out_channel,\n                    kernel_size=upsample_strides[i],\n                    stride=upsample_strides[i])\n            else:\n                stride = np.round(1 / stride).astype(np.int64)\n                upsample_layer = build_conv_layer(\n                    conv_cfg,\n                    in_channels=in_channels[i],\n                    out_channels=out_channel,\n                    kernel_size=stride,\n                    stride=stride)\n\n            deblock = nn.Sequential(upsample_layer,\n                                    build_norm_layer(norm_cfg, out_channel)[1],\n                                    nn.ReLU(inplace=True))\n            deblocks.append(deblock)\n        \n        self.deblocks = nn.ModuleList(deblocks)\n\n        if init_cfg is None:\n            self.init_cfg = [\n                dict(type='Kaiming', layer='ConvTranspose2d'),\n                dict(type='Constant', layer='NaiveSyncBatchNorm2d', val=1.0)\n            ]\n\n    @auto_fp16()\n    def forward(self, x):\n        \"\"\"Forward function.\n\n        Args:\n            x (torch.Tensor): 4D Tensor in (N, C, H, W) shape.\n\n        Returns:\n            list[torch.Tensor]: Multi-level feature maps.\n        \"\"\"\n        assert len(x) == len(self.in_channels)\n        ups = [deblock(x[i]) for i, deblock in enumerate(self.deblocks)]\n   
     \n        if len(ups) > 1:\n            out = torch.cat(ups, dim=1)\n        else:\n            out = ups[0]\n        \n        return [out]\n"
  },
  {
    "path": "projects/occ_plugin/occupancy/voxel_encoder/__init__.py",
    "content": "from .sparse_lidar_enc import SparseLiDAREnc4x, SparseLiDAREnc8x"
  },
  {
    "path": "projects/occ_plugin/occupancy/voxel_encoder/sparse_lidar_enc.py",
    "content": "import math\nfrom functools import partial\nfrom mmcv.cnn import build_conv_layer, build_norm_layer, build_upsample_layer\nfrom mmcv.runner import BaseModule\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nimport spconv.pytorch as spconv\nfrom spconv.pytorch import functional as Fsp\n\nfrom mmdet3d.models.builder import MIDDLE_ENCODERS\n\nimport copy\n\ndef post_act_block(in_channels, out_channels, kernel_size, indice_key=None, stride=1, padding=0,\n                   conv_type='subm', norm_cfg=None):\n\n    if conv_type == 'subm':\n        conv = spconv.SubMConv3d(in_channels, out_channels, kernel_size, bias=False, indice_key=indice_key)\n    elif conv_type == 'spconv':\n        conv = spconv.SparseConv3d(in_channels, out_channels, kernel_size, stride=stride, padding=padding,\n                                   bias=False, indice_key=indice_key)\n    elif conv_type == 'inverseconv':\n        conv = spconv.SparseInverseConv3d(in_channels, out_channels, kernel_size, indice_key=indice_key, bias=False)\n    else:\n        raise NotImplementedError\n\n    m = spconv.SparseSequential(\n        conv,\n        build_norm_layer(norm_cfg, out_channels)[1],\n        nn.ReLU(inplace=True),\n    )\n\n    return m\n\n\n\nclass SparseBasicBlock(spconv.SparseModule):\n\n    def __init__(self, inplanes, planes, stride=1, norm_cfg=None, indice_key=None):\n        super(SparseBasicBlock, self).__init__()\n\n        self.net = spconv.SparseSequential(\n            spconv.SubMConv3d(inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=False, indice_key=indice_key),\n            build_norm_layer(norm_cfg, planes)[1],\n            nn.ReLU(inplace=True),\n            spconv.SubMConv3d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False, indice_key=indice_key),\n            build_norm_layer(norm_cfg, planes)[1],\n        )\n\n        self.relu = nn.ReLU(inplace=True)\n\n    def forward(self, x):\n        identity = x\n        out = self.net(x)\n        out = out.replace_feature(out.features + identity.features)\n        out = out.replace_feature(self.relu(out.features))\n\n        return out\n\n\n\n@MIDDLE_ENCODERS.register_module()\nclass SparseLiDAREnc4x(nn.Module):\n    def __init__(self, input_channel, norm_cfg, base_channel, out_channel, \n                sparse_shape_xyz, **kwargs):\n        super().__init__()\n\n        block = post_act_block\n        self.sparse_shape_xyz = sparse_shape_xyz\n\n        self.conv_input = spconv.SparseSequential(\n            spconv.SubMConv3d(input_channel, base_channel, 3),\n            nn.GroupNorm(16, base_channel),\n            nn.ReLU(inplace=True))\n\n        self.conv1 = spconv.SparseSequential(\n            SparseBasicBlock(base_channel, base_channel, norm_cfg=norm_cfg, indice_key='res1'),\n            SparseBasicBlock(base_channel, base_channel, norm_cfg=norm_cfg, indice_key='res1'),\n        )\n\n        self.conv2 = spconv.SparseSequential(\n            block(base_channel, base_channel*2, 3, norm_cfg=norm_cfg, stride=2, padding=1, indice_key='spconv2', conv_type='spconv'),\n            SparseBasicBlock(base_channel*2, base_channel*2, norm_cfg=norm_cfg, indice_key='res2'),\n            SparseBasicBlock(base_channel*2, base_channel*2, norm_cfg=norm_cfg, indice_key='res2'),\n        )\n\n        self.conv3 = spconv.SparseSequential(\n            block(base_channel*2, base_channel*4, 3, norm_cfg=norm_cfg, stride=2, padding=1, indice_key='spconv3', conv_type='spconv'),\n            
SparseBasicBlock(base_channel*4, base_channel*4, norm_cfg=norm_cfg, indice_key='res3'),\n            SparseBasicBlock(base_channel*4, base_channel*4, norm_cfg=norm_cfg, indice_key='res3'),\n        )\n\n        self.conv_out = spconv.SparseSequential(\n            spconv.SubMConv3d(base_channel*4, out_channel, 3),\n            nn.GroupNorm(16, out_channel),\n            nn.ReLU(inplace=True))\n\n\n\n    def forward(self, voxel_features, coors, batch_size):\n        # spconv encoding\n        coors = coors.int()\n        # FIXME bs=1 hardcode\n        input_sp_tensor = spconv.SparseConvTensor(voxel_features, coors, self.sparse_shape_xyz[::-1], batch_size)\n        x = self.conv_input(input_sp_tensor)\n\n        x_conv1 = self.conv1(x)\n        x_conv2 = self.conv2(x_conv1)\n        x_conv3 = self.conv3(x_conv2)\n        \n        x = self.conv_out(x_conv3)\n\n        return {'x': x.dense().permute(0,1,4,3,2), # B, C, W, H, D \n                'pts_feats': [x]}\n\n\n\n\n\n@MIDDLE_ENCODERS.register_module()\nclass SparseLiDAREnc8x(nn.Module):\n    def __init__(self, input_channel, norm_cfg, base_channel, out_channel, \n                sparse_shape_xyz, **kwargs):\n        super().__init__()\n\n        block = post_act_block\n        self.sparse_shape_xyz = sparse_shape_xyz\n\n        self.conv_input = spconv.SparseSequential(\n            spconv.SubMConv3d(input_channel, base_channel, 3),\n            nn.GroupNorm(16, base_channel),\n            nn.ReLU(inplace=True))\n\n        self.conv1 = spconv.SparseSequential(\n            block(base_channel, base_channel*2, 3, norm_cfg=norm_cfg, stride=2, padding=1, indice_key='spconv1', conv_type='spconv'),\n            SparseBasicBlock(base_channel*2, base_channel*2, norm_cfg=norm_cfg, indice_key='res1'),\n            SparseBasicBlock(base_channel*2, base_channel*2, norm_cfg=norm_cfg, indice_key='res1'),\n        )\n\n        self.conv2 = spconv.SparseSequential(\n            block(base_channel*2, base_channel*4, 3, norm_cfg=norm_cfg, stride=2, padding=1, indice_key='spconv2', conv_type='spconv'),\n            SparseBasicBlock(base_channel*4, base_channel*4, norm_cfg=norm_cfg, indice_key='res2'),\n            SparseBasicBlock(base_channel*4, base_channel*4, norm_cfg=norm_cfg, indice_key='res2'),\n        )\n\n        self.conv3 = spconv.SparseSequential(\n            block(base_channel*4, base_channel*8, 3, norm_cfg=norm_cfg, stride=2, padding=1, indice_key='spconv3', conv_type='spconv'),\n            SparseBasicBlock(base_channel*8, base_channel*8, norm_cfg=norm_cfg, indice_key='res3'),\n            SparseBasicBlock(base_channel*8, base_channel*8, norm_cfg=norm_cfg, indice_key='res3'),\n        )\n\n        self.conv_out = spconv.SparseSequential(\n            spconv.SubMConv3d(base_channel*8, out_channel, 3),\n            nn.GroupNorm(16, out_channel),\n            nn.ReLU(inplace=True))\n\n\n\n    def forward(self, voxel_features, coors, batch_size):\n        # spconv encoding\n        coors = coors.int()\n        # FIXME bs=1 hardcode\n        input_sp_tensor = spconv.SparseConvTensor(voxel_features, coors, self.sparse_shape_xyz[::-1], batch_size)\n        x = self.conv_input(input_sp_tensor)\n\n        x_conv1 = self.conv1(x)\n        x_conv2 = self.conv2(x_conv1)\n        x_conv3 = self.conv3(x_conv2)\n        \n        x = self.conv_out(x_conv3)\n\n        return {'x': x.dense().permute(0,1,4,3,2), # B, C, W, H, D \n                'pts_feats': [x]}\n\n"
  },
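One detail worth calling out in both encoders, as a dependency-free sketch below: spconv takes its `spatial_shape` in ZYX order, which is why `sparse_shape_xyz` is reversed, and why the final `permute(0, 1, 4, 3, 2)` flips `.dense()`'s (B, C, Z, Y, X) layout into the (B, C, W, H, D) layout noted in the return comment. The sizes here are toy assumptions.

```python
# Shape bookkeeping sketch (toy sizes; plain torch stand-in for x.dense()):
# spconv's spatial_shape is ZYX, so .dense() yields (B, C, Z, Y, X); the
# permute restores (B, C, X, Y, Z), i.e. the (B, C, W, H, D) in the comment.
import torch

sparse_shape_xyz = [6, 5, 4]                         # toy X, Y, Z grid size
dense = torch.zeros(1, 8, *sparse_shape_xyz[::-1])   # (B, C, Z, Y, X)
print(dense.permute(0, 1, 4, 3, 2).shape)            # torch.Size([1, 8, 6, 5, 4])
```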
  {
    "path": "projects/occ_plugin/ops/__init__.py",
    "content": "from .occ_pooling import *"
  },
  {
    "path": "projects/occ_plugin/ops/occ_pooling/OCC_Pool.py",
    "content": "import torch\n\nfrom projects.occ_plugin.ops.occ_pooling import occ_pool_ext\n\n__all__ = [\"occ_pool\"]\n\n\nclass QuickCumsum(torch.autograd.Function):\n    @staticmethod\n    def forward(ctx, x, geom_feats, ranks):\n        x = x.cumsum(0)\n        kept = torch.ones(x.shape[0], device=x.device, dtype=torch.bool)\n        kept[:-1] = ranks[1:] != ranks[:-1]\n\n        x, geom_feats = x[kept], geom_feats[kept]\n        x = torch.cat((x[:1], x[1:] - x[:-1]))\n\n        # save kept for backward\n        ctx.save_for_backward(kept)\n\n        # no gradient for geom_feats\n        ctx.mark_non_differentiable(geom_feats)\n\n        return x, geom_feats\n\n    @staticmethod\n    def backward(ctx, gradx, gradgeom):\n        (kept,) = ctx.saved_tensors\n        back = torch.cumsum(kept, 0)\n        back[kept] -= 1\n\n        val = gradx[back]\n\n        return val, None, None\n\n\nclass QuickCumsumCuda(torch.autograd.Function):\n    @staticmethod\n    def forward(ctx, x, geom_feats, ranks, B, D, H, W):\n        kept = torch.ones(x.shape[0], device=x.device, dtype=torch.bool)\n        kept[1:] = ranks[1:] != ranks[:-1]\n        interval_starts = torch.where(kept)[0].int()\n        interval_lengths = torch.zeros_like(interval_starts)\n        interval_lengths[:-1] = interval_starts[1:] - interval_starts[:-1]\n        interval_lengths[-1] = x.shape[0] - interval_starts[-1]\n        geom_feats = geom_feats.int()\n\n        out = occ_pool_ext.occ_pool_forward(\n            x,\n            geom_feats,\n            interval_lengths,\n            interval_starts,\n            B,\n            D,\n            H,\n            W,\n        )\n\n        ctx.save_for_backward(interval_starts, interval_lengths, geom_feats)\n        ctx.saved_shapes = B, D, H, W\n        return out\n\n    @staticmethod\n    def backward(ctx, out_grad):\n        interval_starts, interval_lengths, geom_feats = ctx.saved_tensors\n        B, D, H, W = ctx.saved_shapes\n\n        out_grad = out_grad.contiguous()\n        x_grad = occ_pool_ext.occ_pool_backward(\n            out_grad,\n            geom_feats,\n            interval_lengths,\n            interval_starts,\n            B,\n            D,\n            H,\n            W,\n        )\n\n        return x_grad, None, None, None, None, None, None\n\n\ndef occ_pool(feats, coords, B, D, H, W):\n    assert feats.shape[0] == coords.shape[0]\n\n    ranks = (\n        coords[:, 0] * (W * D * B)\n        + coords[:, 1] * (D * B)\n        + coords[:, 2] * B\n        + coords[:, 3]\n    )\n    indices = ranks.argsort()\n    feats, coords, ranks = feats[indices], coords[indices], ranks[indices]\n\n    x = QuickCumsumCuda.apply(feats, coords, ranks, B, D, H, W)\n    x = x.permute(0, 4, 1, 2, 3).contiguous()\n\n    return x"
  },
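For sanity-checking the CUDA extension on small inputs, the same sum-pooling semantics can be written in pure PyTorch with `index_put_(..., accumulate=True)`. This reference is a sketch, not part of the repo; it assumes the `(x, y, z, batch)` column order that `occ_pool` above passes to the kernel.

```python
# Pure-PyTorch reference (a sketch, not the repo's op) for occ_pool's sum
# pooling; coords columns are (x, y, z, batch), matching the CUDA kernel's
# output layout out[b, z, x, y, c].
import torch

def occ_pool_reference(feats, coords, B, D, H, W):
    idx = coords.long()
    out = torch.zeros(B, D, H, W, feats.shape[1],
                      dtype=feats.dtype, device=feats.device)
    # accumulate=True sums the features of all points landing in the same voxel
    out.index_put_((idx[:, 3], idx[:, 2], idx[:, 0], idx[:, 1]),
                   feats, accumulate=True)
    return out.permute(0, 4, 1, 2, 3).contiguous()  # (B, C, D, H, W), like occ_pool
```

Because `occ_pool` sorts by rank and sums each run, its output should match this reference up to floating-point accumulation order.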
  {
    "path": "projects/occ_plugin/ops/occ_pooling/__init__.py",
    "content": "from .OCC_Pool import occ_pool"
  },
  {
    "path": "projects/occ_plugin/ops/occ_pooling/src/occ_pool.cpp",
    "content": "#include <torch/torch.h>\n#include <c10/cuda/CUDAGuard.h>\n\n// CUDA function declarations\nvoid occ_pool(int b, int d, int h, int w, int n, int c, int n_intervals, const float* x,\n    const int* geom_feats, const int* interval_starts, const int* interval_lengths, float* out);\n\nvoid occ_pool_grad(int b, int d, int h, int w, int n, int c, int n_intervals, const float* out_grad,\n  const int* geom_feats, const int* interval_starts, const int* interval_lengths, float* x_grad);\n\n\n/*\n  Function: pillar pooling (forward, cuda)\n  Args:\n    x                : input features, FloatTensor[n, c]\n    geom_feats       : input coordinates, IntTensor[n, 4]\n    interval_lengths : starting position for pooled point, IntTensor[n_intervals]\n    interval_starts  : how many points in each pooled point, IntTensor[n_intervals]\n  Return:\n    out              : output features, FloatTensor[b, d, h, w, c]\n*/\nat::Tensor occ_pool_forward(\n  const at::Tensor _x,\n  const at::Tensor _geom_feats, \n  const at::Tensor _interval_lengths, \n  const at::Tensor _interval_starts,\n  int b, int d, int h, int w\n) {\n  int n = _x.size(0);\n  int c = _x.size(1);\n  int n_intervals = _interval_lengths.size(0);\n  const at::cuda::OptionalCUDAGuard device_guard(device_of(_x));\n  const float* x = _x.data_ptr<float>();\n  const int* geom_feats = _geom_feats.data_ptr<int>();\n  const int* interval_lengths = _interval_lengths.data_ptr<int>();\n  const int* interval_starts = _interval_starts.data_ptr<int>();\n  \n  auto options =\n      torch::TensorOptions().dtype(_x.dtype()).device(_x.device());\n  at::Tensor _out = torch::zeros({b, d, h, w, c}, options);\n  float* out = _out.data_ptr<float>();\n  occ_pool(\n    b, d, h, w, n, c, n_intervals, x,\n    geom_feats, interval_starts, interval_lengths, out\n  );\n  return _out;\n}\n\n\n/*\n  Function: pillar pooling (backward, cuda)\n  Args:\n    out_grad         : input features, FloatTensor[b, d, h, w, c]\n    geom_feats       : input coordinates, IntTensor[n, 4]\n    interval_lengths : starting position for pooled point, IntTensor[n_intervals]\n    interval_starts  : how many points in each pooled point, IntTensor[n_intervals]\n  Return:\n    x_grad           : output features, FloatTensor[n, 4]\n*/\nat::Tensor occ_pool_backward(\n  const at::Tensor _out_grad,\n  const at::Tensor _geom_feats, \n  const at::Tensor _interval_lengths, \n  const at::Tensor _interval_starts,\n  int b, int d, int h, int w\n) {\n  int n = _geom_feats.size(0);\n  int c = _out_grad.size(4);\n  int n_intervals = _interval_lengths.size(0);\n  const at::cuda::OptionalCUDAGuard device_guard(device_of(_out_grad));\n  const float* out_grad = _out_grad.data_ptr<float>();\n  const int* geom_feats = _geom_feats.data_ptr<int>();\n  const int* interval_lengths = _interval_lengths.data_ptr<int>();\n  const int* interval_starts = _interval_starts.data_ptr<int>();\n\n  auto options =\n      torch::TensorOptions().dtype(_out_grad.dtype()).device(_out_grad.device());\n  at::Tensor _x_grad = torch::zeros({n, c}, options);\n  float* x_grad = _x_grad.data_ptr<float>();\n  \n  occ_pool_grad(\n    b, d, h, w, n, c, n_intervals, out_grad,\n    geom_feats, interval_starts, interval_lengths, x_grad\n  );\n  \n  return _x_grad;\n}\n\nPYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {\n  m.def(\"occ_pool_forward\", &occ_pool_forward,\n        \"occ_pool_forward\");\n  m.def(\"occ_pool_backward\", &occ_pool_backward,\n        \"occ_pool_backward\");\n}"
  },
  {
    "path": "projects/occ_plugin/ops/occ_pooling/src/occ_pool_cuda.cu",
    "content": "#include <stdio.h>\n#include <stdlib.h>\n\n/*\n  Function: pillar pooling\n  Args:\n    b                : batch size\n    d                : depth of the feature map\n    h                : height of pooled feature map\n    w                : width of pooled feature map\n    n                : number of input points\n    c                : number of channels\n    n_intervals      : number of unique points\n    x                : input features, FloatTensor[n, c]\n    geom_feats       : input coordinates, IntTensor[n, 4]\n    interval_lengths : starting position for pooled point, IntTensor[n_intervals]\n    interval_starts  : how many points in each pooled point, IntTensor[n_intervals]\n    out              : output features, FloatTensor[b, d, h, w, c]\n*/\n__global__ void occ_pool_kernel(int b, int d, int h, int w, int n, int c, int n_intervals,\n                                  const float *__restrict__ x,\n                                  const int *__restrict__ geom_feats,\n                                  const int *__restrict__ interval_starts,\n                                  const int *__restrict__ interval_lengths,\n                                  float* __restrict__ out) {\n  int idx = blockIdx.x * blockDim.x + threadIdx.x;\n  int index = idx / c;\n  int cur_c = idx % c;\n  if (index >= n_intervals) return;\n  int interval_start = interval_starts[index];\n  int interval_length = interval_lengths[index];\n  const int* cur_geom_feats = geom_feats + interval_start * 4;\n  const float* cur_x = x + interval_start * c + cur_c;\n  float* cur_out = out + cur_geom_feats[3] * d * h * w * c + \n    cur_geom_feats[2] * h * w * c + cur_geom_feats[0] * w * c + \n    cur_geom_feats[1] * c + cur_c;\n  float psum = 0;\n  for(int i = 0; i < interval_length; i++){\n    psum += cur_x[i * c];\n  }\n  *cur_out = psum;\n}\n\n\n/*\n  Function: pillar pooling backward\n  Args:\n    b                : batch size\n    d                : depth of the feature map\n    h                : height of pooled feature map\n    w                : width of pooled feature map\n    n                : number of input points\n    c                : number of channels\n    n_intervals      : number of unique points\n    out_grad         : gradient of the BEV fmap from top, FloatTensor[b, d, h, w, c]\n    geom_feats       : input coordinates, IntTensor[n, 4]\n    interval_lengths : starting position for pooled point, IntTensor[n_intervals]\n    interval_starts  : how many points in each pooled point, IntTensor[n_intervals]\n    x_grad           : gradient of the image fmap, FloatTensor\n*/\n__global__ void occ_pool_grad_kernel(int b, int d, int h, int w, int n, int c, int n_intervals,\n                                  const float *__restrict__ out_grad,\n                                  const int *__restrict__ geom_feats,\n                                  const int *__restrict__ interval_starts,\n                                  const int *__restrict__ interval_lengths,\n                                  float* __restrict__ x_grad) {\n  int idx = blockIdx.x * blockDim.x + threadIdx.x;\n  int index = idx / c;\n  int cur_c = idx % c;\n  if (index >= n_intervals) return;\n  int interval_start = interval_starts[index];\n  int interval_length = interval_lengths[index];\n  \n  const int* cur_geom_feats = geom_feats + interval_start * 4;\n  float* cur_x_grad = x_grad + interval_start * c + cur_c;\n  \n  const float* cur_out_grad = out_grad + cur_geom_feats[3] * d * h * w * c + \n    cur_geom_feats[2] * 
h * w * c + cur_geom_feats[0] * w * c + \n    cur_geom_feats[1] * c + cur_c;\n  for(int i = 0; i < interval_length; i++){\n    cur_x_grad[i * c] = *cur_out_grad;\n  }\n  \n}\n\nvoid occ_pool(int b, int d, int h, int w, int n, int c, int n_intervals, const float* x,\n  const int* geom_feats, const int* interval_starts, const int* interval_lengths, float* out) {\n  occ_pool_kernel<<<(int)ceil(((double)n_intervals * c / 256)), 256>>>(\n    b, d, h, w, n, c, n_intervals, x, geom_feats, interval_starts, interval_lengths, out\n  );\n}\n\nvoid occ_pool_grad(int b, int d, int h, int w, int n, int c, int n_intervals, const float* out_grad,\n  const int* geom_feats, const int* interval_starts, const int* interval_lengths, float* x_grad) {\n  occ_pool_grad_kernel<<<(int)ceil(((double)n_intervals * c / 256)), 256>>>(\n    b, d, h, w, n, c, n_intervals, out_grad, geom_feats, interval_starts, interval_lengths, x_grad\n  );\n}"
  },
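The `interval_starts` / `interval_lengths` pair that both kernels consume is just run-length bookkeeping over the sorted ranks. A small numeric illustration (toy ranks, not repo code):

```python
# Numeric illustration (toy ranks) of the interval bookkeeping: after sorting,
# equal ranks form contiguous runs, and each run is summed into one output voxel.
import torch

ranks = torch.tensor([3, 3, 3, 7, 9, 9])            # already sorted
kept = torch.ones_like(ranks, dtype=torch.bool)
kept[1:] = ranks[1:] != ranks[:-1]
interval_starts = torch.where(kept)[0].int()
interval_lengths = torch.zeros_like(interval_starts)
interval_lengths[:-1] = interval_starts[1:] - interval_starts[:-1]
interval_lengths[-1] = ranks.shape[0] - interval_starts[-1]
print(interval_starts.tolist(), interval_lengths.tolist())  # [0, 3, 4] [3, 1, 2]
```

Each thread of `occ_pool_kernel` then sums `interval_lengths[i]` consecutive features starting at `interval_starts[i]` for one channel of one voxel.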
  {
    "path": "projects/occ_plugin/utils/__init__.py",
    "content": "from .formating import cm_to_ious, format_results\nfrom .metric_util import per_class_iu, fast_hist_crop\nfrom .coordinate_transform import coarse_to_fine_coordinates, project_points_on_img\nfrom .geometry import convert_egopose_to_matrix_numpy, invert_matrix_egopose_numpy"
  },
  {
    "path": "projects/occ_plugin/utils/coordinate_transform.py",
    "content": "\nimport torch\n\ndef coarse_to_fine_coordinates(coarse_cor, ratio, topk=30000):\n    \"\"\"\n    Args:\n        coarse_cor (torch.Tensor): [3, N]\"\"\"\n\n    fine_cor = coarse_cor * ratio\n    fine_cor = fine_cor[None].repeat(ratio**3, 1, 1)  # [8, 3, N]\n\n    device = fine_cor.device\n    value = torch.meshgrid([torch.arange(ratio).to(device), torch.arange(ratio).to(device), torch.arange(ratio).to(device)])\n    value = torch.stack(value, dim=3).reshape(-1, 3)\n\n    fine_cor = fine_cor + value[:,:,None]\n\n    if fine_cor.shape[-1] < topk:\n        return fine_cor.permute(1,0,2).reshape(3,-1)\n    else:\n        fine_cor = fine_cor[:,:,torch.randperm(fine_cor.shape[-1])[:topk]]\n        return fine_cor.permute(1,0,2).reshape(3,-1)\n\n\n\ndef project_points_on_img(points, rots, trans, intrins, post_rots, post_trans, bda_mat, pts_range,\n                        W_img, H_img, W_occ, H_occ, D_occ):\n    with torch.no_grad():\n        voxel_size = ((pts_range[3:] - pts_range[:3]) / torch.tensor([W_occ-1, H_occ-1, D_occ-1])).to(points.device)\n        points = points * voxel_size[None, None] + pts_range[:3][None, None].to(points.device)\n\n        # project 3D point cloud (after bev-aug) onto multi-view images for corresponding 2D coordinates\n        inv_bda = bda_mat.inverse()\n        points = (inv_bda @ points.unsqueeze(-1)).squeeze(-1)\n        \n        # from lidar to camera\n        points = points.view(-1, 1, 3)\n        points = points - trans.view(1, -1, 3)\n        inv_rots = rots.inverse().unsqueeze(0)\n        points = (inv_rots @ points.unsqueeze(-1))\n        \n        # from camera to raw pixel\n        points = (intrins.unsqueeze(0) @ points).squeeze(-1)\n        points_d = points[..., 2:3]\n        points_uv = points[..., :2] / (points_d + 1e-5)\n        \n        # from raw pixel to transformed pixel\n        points_uv = post_rots[..., :2, :2].unsqueeze(0) @ points_uv.unsqueeze(-1)\n        points_uv = points_uv.squeeze(-1) + post_trans[..., :2].unsqueeze(0)\n\n        points_uv[..., 0] = (points_uv[..., 0] / (W_img-1) - 0.5) * 2\n        points_uv[..., 1] = (points_uv[..., 1] / (H_img-1) - 0.5) * 2\n\n        mask = (points_d[..., 0] > 1e-5) \\\n            & (points_uv[..., 0] > -1) & (points_uv[..., 0] < 1) \\\n            & (points_uv[..., 1] > -1) & (points_uv[..., 1] < 1)\n    \n    return points_uv.permute(2,1,0,3), mask"
  },
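A quick usage sketch for `coarse_to_fine_coordinates` (toy input; the import assumes PYTHONPATH is set to the repo root as in the installation instructions): with `ratio=2`, every coarse voxel expands into `2**3 = 8` fine voxels.

```python
# Usage sketch (toy input): [3, N] coarse coordinates -> [3, N * ratio**3] fine ones.
import torch
from projects.occ_plugin.utils.coordinate_transform import coarse_to_fine_coordinates

coarse = torch.tensor([[0, 1],
                       [0, 0],
                       [0, 2]])                 # [3, N=2] voxel coordinates
fine = coarse_to_fine_coordinates(coarse, ratio=2)
print(fine.shape)                               # torch.Size([3, 16])
```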
  {
    "path": "projects/occ_plugin/utils/formating.py",
    "content": "from prettytable import PrettyTable\nimport numpy as np\n\ndef cm_to_ious(cm):\n    # SC：[TN FP \\n FN TP]\n    mean_ious = []\n    cls_num = len(cm)\n    for i in range(cls_num):\n        tp = cm[i, i]\n        p = cm[:, i].sum()\n        g = cm[i, :].sum()\n        union = p + g - tp\n        mean_ious.append(tp / union)\n    \n    return mean_ious\n\ndef format_results(mean_ious, return_dic=False):\n    class_map = {\n        1: 'barrier',\n        2: 'bicycle',\n        3: 'bus',\n        4: 'car',\n        5: 'construction_vehicle',\n        6: 'motorcycle',\n        7: 'pedestrian',\n        8: 'traffic_cone',\n        9: 'trailer',\n        10: 'truck',\n        11: 'driveable_surface',\n        12: 'other_flat',\n        13: 'sidewalk',\n        14: 'terrain',\n        15: 'manmade',\n        16: 'vegetation',\n    }\n    \n    x = PrettyTable()\n    x.field_names = ['class', 'IoU']\n    class_names = list(class_map.values()) + ['mean']\n    class_ious = mean_ious + [sum(mean_ious) / len(mean_ious)]\n    dic = {}\n    \n    for cls_name, cls_iou in zip(class_names, class_ious):\n        dic[cls_name] = round(cls_iou, 3)\n        x.add_row([cls_name, round(cls_iou, 3)])\n    \n    if return_dic:\n        return x, dic \n    else:\n        return x\n\n\n\n\ndef format_iou_results(mean_ious, return_dic=False):\n    if len(mean_ious) == 2:\n        class_map = {\n            0: 'free',\n            1: 'movable objects',\n        }\n    else:\n        class_map = {\n            0: 'free',\n            1: 'bicycle',\n            2: 'bus',\n            3: 'car',\n            4: 'construction',\n            5: 'motorcycle',\n            6: 'trailer',\n            7: 'truck',\n            8: 'pedestrian',\n        }\n    x = PrettyTable()\n    x.field_names = ['class', 'IoU']\n    class_names = list(class_map.values())\n    class_ious = mean_ious\n    dic = {}\n    \n    for cls_name, cls_iou in zip(class_names, class_ious):\n        dic[cls_name] = np.round(cls_iou, 3)\n        x.add_row([cls_name, np.round(cls_iou, 3)])\n    \n    mean_ious = sum(mean_ious[1:]) / len(mean_ious[1:])\n    dic['mean'] = np.round(mean_ious, 3)\n    x.add_row(['mean', np.round(mean_ious, 3)])\n    \n    if return_dic:\n        return x, dic \n    else:\n        return x\n\ndef format_vel_results(mean_epe, return_dic=False):\n    class_map = {\n        0: 'barrier',\n        1: 'bicycle',\n        2: 'bus',\n        3: 'car',\n        4: 'construction_vehicle',\n        5: 'motorcycle',\n        6: 'pedestrian',\n        7: 'traffic_cone',\n        8: 'trailer',\n        9: 'truck',\n    }\n    x = PrettyTable()\n    x.field_names = ['class', 'EPE']\n    class_names = list(class_map.values())\n    class_epes = mean_epe\n    dic = {}\n    \n    for cls_name, cls_iou in zip(class_names, class_epes):\n        dic[cls_name] = np.round(cls_iou, 3)\n        x.add_row([cls_name, np.round(cls_iou, 3)])\n\n    mean_all_epe = mean_epe.mean()\n    dic['mean'] = np.round(mean_all_epe, 3)\n    x.add_row(['mean', np.round(mean_all_epe, 3)])\n    if return_dic:\n        return x, dic \n    else:\n        return x"
  },
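A worked example of `cm_to_ious` with toy numbers, since the row/column convention is easy to flip: rows index ground truth, columns index predictions, and IoU_i = TP_i / (pred_i + gt_i - TP_i).

```python
# Worked example (toy confusion matrix): rows = GT, columns = predictions.
import numpy as np
from projects.occ_plugin.utils.formating import cm_to_ious

cm = np.array([[50, 10],    # GT class 0: 50 predicted as 0, 10 predicted as 1
               [ 5, 35]])   # GT class 1:  5 predicted as 0, 35 predicted as 1
ious = cm_to_ious(cm)
# class 0: 50 / (55 + 60 - 50) = 50 / 65 ~= 0.769
# class 1: 35 / (45 + 40 - 35) = 35 / 50  = 0.700
print([round(v, 3) for v in ious])  # [0.769, 0.7]
```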
  {
    "path": "projects/occ_plugin/utils/gaussian.py",
    "content": "import numpy as np\nimport torch\nimport torch.nn.functional as F\nfrom torch.distributions import Normal\nimport pdb\n\ndef gaussian_2d(shape, sigma=1):\n    \"\"\"Generate gaussian map.\n\n    Args:\n        shape (list[int]): Shape of the map.\n        sigma (float): Sigma to generate gaussian map.\n            Defaults to 1.\n\n    Returns:\n        np.ndarray: Generated gaussian map.\n    \"\"\"\n    m, n = [(ss - 1.) / 2. for ss in shape]\n    y, x = np.ogrid[-m:m + 1, -n:n + 1]\n\n    h = np.exp(-(x * x + y * y) / (2 * sigma * sigma))\n    h[h < np.finfo(h.dtype).eps * h.max()] = 0\n    return h\n\n\ndef draw_heatmap_gaussian(heatmap, center, radius, k=1):\n    \"\"\"Get gaussian masked heatmap.\n\n    Args:\n        heatmap (torch.Tensor): Heatmap to be masked.\n        center (torch.Tensor): Center coord of the heatmap.\n        radius (int): Radius of gausian.\n        K (int): Multiple of masked_gaussian. Defaults to 1.\n\n    Returns:\n        torch.Tensor: Masked heatmap.\n    \"\"\"\n    diameter = 2 * radius + 1\n    gaussian = gaussian_2d((diameter, diameter), sigma=diameter / 6)\n\n    x, y = int(center[0]), int(center[1])\n\n    height, width = heatmap.shape[0:2]\n\n    left, right = min(x, radius), min(width - x, radius + 1)\n    top, bottom = min(y, radius), min(height - y, radius + 1)\n\n    masked_heatmap = heatmap[y - top:y + bottom, x - left:x + right]\n    masked_gaussian = torch.from_numpy(\n        gaussian[radius - top:radius + bottom,\n                 radius - left:radius + right]).to(heatmap.device,\n                                                   torch.float32)\n    if min(masked_gaussian.shape) > 0 and min(masked_heatmap.shape) > 0:\n        torch.max(masked_heatmap, masked_gaussian * k, out=masked_heatmap)\n    return heatmap\n\n\ndef gaussian_radius(det_size, min_overlap=0.5):\n    \"\"\"Get radius of gaussian.\n\n    Args:\n        det_size (tuple[torch.Tensor]): Size of the detection result.\n        min_overlap (float): Gaussian_overlap. 
Defaults to 0.5.\n\n    Returns:\n        torch.Tensor: Computed radius.\n    \"\"\"\n    height, width = det_size\n\n    a1 = 1\n    b1 = (height + width)\n    c1 = width * height * (1 - min_overlap) / (1 + min_overlap)\n    sq1 = torch.sqrt(b1**2 - 4 * a1 * c1)\n    r1 = (b1 + sq1) / 2\n\n    a2 = 4\n    b2 = 2 * (height + width)\n    c2 = (1 - min_overlap) * width * height\n    sq2 = torch.sqrt(b2**2 - 4 * a2 * c2)\n    r2 = (b2 + sq2) / 2\n\n    a3 = 4 * min_overlap\n    b3 = -2 * min_overlap * (height + width)\n    c3 = (min_overlap - 1) * width * height\n    sq3 = torch.sqrt(b3**2 - 4 * a3 * c3)\n    r3 = (b3 + sq3) / 2\n    return min(r1, r2, r3)\n\n\ndef generate_guassian_depth_target(depth, stride, cam_depth_range, constant_std=None):\n    depth = depth.flatten(0, 1)  # [bs*s, 6, 896, 1600] -> [bs*s*6, 896, 1600]\n    B, tH, tW = depth.shape\n    kernel_size = stride  # [4,4,4]\n    center_idx = kernel_size * kernel_size // 2\n    H = tH // stride  # 896//4 = 248\n    W = tW // stride  # 1600//4 = 400\n    \n    unfold_depth = F.unfold(depth.unsqueeze(1), kernel_size, dilation=1, padding=0, stride=stride) #B, Cxkxk, HxW\n    unfold_depth = unfold_depth.view(B, -1, H, W).permute(0, 2, 3, 1).contiguous() # B, H, W, kxk\n    valid_mask = (unfold_depth != 0) # BN, H, W, kxk\n    \n    if constant_std is None:\n        valid_mask_f = valid_mask.float() # BN, H, W, kxk\n        valid_num = torch.sum(valid_mask_f, dim=-1) # BN, H, W\n        valid_num[valid_num == 0] = 1e10\n        \n        mean = torch.sum(unfold_depth, dim=-1) / valid_num\n        var_sum = torch.sum(((unfold_depth - mean.unsqueeze(-1))**2) * valid_mask_f, dim=-1) # BN, H, W\n        std_var = torch.sqrt(var_sum / valid_num)\n        std_var[valid_num == 1] = 1 # set std_var to 1 when only one point in patch\n    else:\n        std_var = torch.ones((B, H, W)).type_as(depth).float() * constant_std\n\n    unfold_depth[~valid_mask] = 1e10\n    min_depth = torch.min(unfold_depth, dim=-1)[0] #BN, H, W\n    min_depth[min_depth == 1e10] = 0\n    \n    # x in raw depth \n    x = torch.arange(cam_depth_range[0] - cam_depth_range[2] / 2, cam_depth_range[1], cam_depth_range[2])\n    # normalized by intervals\n    dist = Normal(min_depth / cam_depth_range[2], std_var / cam_depth_range[2]) # BN, H, W, D\n    cdfs = []\n    for i in x:\n        cdf = dist.cdf(i)\n        cdfs.append(cdf)\n    \n    cdfs = torch.stack(cdfs, dim=-1)\n    depth_dist = cdfs[..., 1:] - cdfs[...,:-1]\n    \n    return depth_dist, min_depth"
  },
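The core idea of `generate_guassian_depth_target` is to replace a one-hot depth bin with the probability mass of a Normal, centered at the minimum patch depth, integrated over each bin via CDF differences. A standalone sketch of that idea with assumed values (`cam_depth_range = [2.0, 58.0, 0.5]` is illustrative; this sketch evaluates the CDF at edges normalized by the bin step, matching the normalized Normal):

```python
# Standalone sketch (assumed values) of the soft depth target: CDF differences
# of a Normal centered at the min patch depth, one probability mass per bin.
import torch
from torch.distributions import Normal

cam_depth_range = [2.0, 58.0, 0.5]      # assumed [min, max, step]
step = cam_depth_range[2]
min_depth, std = 10.0, 0.5              # one pixel's min depth and std

edges = torch.arange(cam_depth_range[0] - step / 2, cam_depth_range[1], step)
dist = Normal(min_depth / step, std / step)   # work in units of bins
cdfs = dist.cdf(edges / step)
soft_target = cdfs[1:] - cdfs[:-1]            # one mass per depth bin (D = 112)
print(soft_target.argmax().item())            # 16 -> the bin containing 10 m
```

Note that the peak bin (16) matches the hard bin that the BCE-style target would assign to a 10 m depth, but neighboring bins now receive mass as well, which is what the KL-divergence depth loss consumes.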
  {
    "path": "projects/occ_plugin/utils/geometry.py",
    "content": "import numpy as np\nimport PIL\nimport torch\nimport torch.nn.functional as F\nfrom pyquaternion import Quaternion\n\n\ndef convert_egopose_to_matrix_numpy(trans, rot):\n    transformation_matrix = np.zeros((4, 4), dtype=np.float32)\n    rotation = Quaternion(rot).rotation_matrix\n    translation = np.array(trans)\n    transformation_matrix[:3, :3] = rotation\n    transformation_matrix[:3, 3] = translation\n    transformation_matrix[3, 3] = 1.0\n    return transformation_matrix\n\n\ndef invert_matrix_egopose_numpy(egopose):\n    \"\"\" Compute the inverse transformation of a 4x4 egopose numpy matrix.\"\"\"\n    inverse_matrix = np.zeros((4, 4), dtype=np.float32)\n    rotation = egopose[:3, :3]\n    translation = egopose[:3, 3]\n    inverse_matrix[:3, :3] = rotation.T\n    inverse_matrix[:3, 3] = -np.dot(rotation.T, translation)\n    inverse_matrix[3, 3] = 1.0\n    return inverse_matrix"
  },
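A quick sanity check (toy pose) that the two helpers above are mutual inverses; the quaternion is an arbitrary example, and the import assumes the repo layout.

```python
# Sanity check (toy pose): the closed-form inverse composes with the original
# pose to the identity, up to float32 precision.
import numpy as np
from projects.occ_plugin.utils.geometry import (
    convert_egopose_to_matrix_numpy, invert_matrix_egopose_numpy)

trans = [1.0, 2.0, 0.5]
rot = [0.966, 0.0, 0.0, 0.259]   # (w, x, y, z), roughly a 30 degree yaw
M = convert_egopose_to_matrix_numpy(trans, rot)
print(np.allclose(invert_matrix_egopose_numpy(M) @ M, np.eye(4), atol=1e-5))  # True
```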
  {
    "path": "projects/occ_plugin/utils/metric_util.py",
    "content": "# -*- coding:utf-8 -*-\n# author: Xinge\n# @file: metric_util.py \n\nimport numpy as np\n\ndef fast_hist(pred, label, n):\n    k = (label >= 0) & (label < n)\n    bin_count = np.bincount(\n        n * label[k].astype(int) + pred[k], minlength=n ** 2)\n    return bin_count[:n ** 2].reshape(n, n)\n\ndef per_class_iu(hist):\n    return np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))\n\ndef fast_hist_crop(output, target, unique_label):\n    hist = fast_hist(output.flatten(), target.flatten(), np.max(unique_label) + 2)\n    \n    hist = hist[unique_label + 1, :]\n    hist = hist[:, unique_label + 1]\n    \n    return hist\n\nclass SSCMetrics:\n    def __init__(self, class_names, ignore_idx=255, empty_idx=None):\n        self.class_names = class_names\n        self.n_classes = len(class_names)\n        self.ignore_idx = ignore_idx\n        self.empty_idx = empty_idx\n        self.reset()\n\n    def hist_info(self, n_cl, pred, gt):\n        assert pred.shape == gt.shape\n        k = (gt >= 0) & (gt < n_cl)  # exclude 255\n        labeled = np.sum(k)\n        correct = np.sum((pred[k] == gt[k]))\n\n        return (\n            np.bincount(\n                n_cl * gt[k].astype(int) + pred[k].astype(int), minlength=n_cl ** 2\n            ).reshape(n_cl, n_cl),\n            correct,\n            labeled,\n        )\n\n    @staticmethod\n    def compute_score(hist, correct, labeled):\n        iu = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))\n        mean_IU = np.nanmean(iu)\n        mean_IU_no_back = np.nanmean(iu[1:])\n        freq = hist.sum(1) / hist.sum()\n        freq_IU = (iu[freq > 0] * freq[freq > 0]).sum()\n        mean_pixel_acc = correct / labeled if labeled != 0 else 0\n\n        return iu, mean_IU, mean_IU_no_back, mean_pixel_acc\n\n    def add_batch(self, y_pred, y_true, nonsurface=None):\n        self.count += 1\n        mask = y_true != self.ignore_idx\n        if self.empty_idx is not None:\n            mask = mask & (y_true != self.empty_idx)\n        if nonsurface is not None:\n            mask = mask & nonsurface\n        tp, fp, fn = self.get_score_completion(y_pred, y_true, mask)\n\n        self.completion_tp += tp\n        self.completion_fp += fp\n        self.completion_fn += fn\n\n        mask = y_true != self.ignore_idx\n        if self.empty_idx is not None:\n            mask = mask & (y_true != self.empty_idx)\n        tp_sum, fp_sum, fn_sum = self.get_score_semantic_and_completion(\n            y_pred, y_true, mask\n        )\n        self.tps += tp_sum\n        self.fps += fp_sum\n        self.fns += fn_sum\n\n    def get_stats(self):\n        if self.completion_tp != 0:\n            precision = self.completion_tp / (self.completion_tp + self.completion_fp)\n            recall = self.completion_tp / (self.completion_tp + self.completion_fn)\n            iou = self.completion_tp / (\n                self.completion_tp + self.completion_fp + self.completion_fn\n            )\n        else:\n            precision, recall, iou = 0, 0, 0\n        iou_ssc = self.tps / (self.tps + self.fps + self.fns + 1e-5)\n        return {\n            \"precision\": precision,\n            \"recall\": recall,\n            \"iou\": iou,\n            \"iou_ssc\": iou_ssc,\n            \"iou_ssc_mean\": np.mean(iou_ssc[1:]),\n        }\n\n    def reset(self):\n\n        self.completion_tp = 0\n        self.completion_fp = 0\n        self.completion_fn = 0\n        self.tps = np.zeros(self.n_classes)\n        self.fps = np.zeros(self.n_classes)\n     
   self.fns = np.zeros(self.n_classes)\n\n        self.hist_ssc = np.zeros((self.n_classes, self.n_classes))\n        self.labeled_ssc = 0\n        self.correct_ssc = 0\n\n        self.precision = 0\n        self.recall = 0\n        self.iou = 0\n        self.count = 1e-8\n        self.iou_ssc = np.zeros(self.n_classes, dtype=np.float32)\n        self.cnt_class = np.zeros(self.n_classes, dtype=np.float32)\n\n    def get_score_completion(self, predict, target, nonempty=None):\n        predict = np.copy(predict)\n        target = np.copy(target)\n\n        \"\"\"for scene completion, treat the task as two-classes problem, just empty or occupancy\"\"\"\n        _bs = predict.shape[0]  # batch size\n        # ---- ignore\n        predict[target == self.ignore_idx] = 0\n        target[target == self.ignore_idx] = 0\n        # ---- flatten\n        target = target.reshape(_bs, -1)  # (_bs, 129600)\n        predict = predict.reshape(_bs, -1)  # (_bs, _C, 129600), 60*36*60=129600\n        # ---- treat all non-empty object class as one category, set them to label 1\n        b_pred = np.zeros(predict.shape)\n        b_true = np.zeros(target.shape)\n        b_pred[predict != self.empty_idx] = 1\n        b_true[target != self.empty_idx] = 1\n        p, r, iou = 0.0, 0.0, 0.0\n        tp_sum, fp_sum, fn_sum = 0, 0, 0\n        for idx in range(_bs):\n            y_true = b_true[idx, :]  # GT\n            y_pred = b_pred[idx, :]\n            if nonempty is not None:\n                nonempty_idx = nonempty[idx, :].reshape(-1)\n                y_true = y_true[nonempty_idx == 1]\n                y_pred = y_pred[nonempty_idx == 1]\n\n            tp = np.array(np.where(np.logical_and(y_true == 1, y_pred == 1))).size\n            fp = np.array(np.where(np.logical_and(y_true != 1, y_pred == 1))).size\n            fn = np.array(np.where(np.logical_and(y_true == 1, y_pred != 1))).size\n            tp_sum += tp\n            fp_sum += fp\n            fn_sum += fn\n        return tp_sum, fp_sum, fn_sum\n\n    def get_score_semantic_and_completion(self, predict, target, nonempty=None):\n        target = np.copy(target)\n        predict = np.copy(predict)\n        _bs = predict.shape[0]  # batch size\n        _C = self.n_classes  # _C = 12\n        # ---- ignore\n        predict[target == self.ignore_idx] = 0\n        target[target == self.ignore_idx] = 0\n        # ---- flatten\n        target = target.reshape(_bs, -1)  # (_bs, 129600)\n        predict = predict.reshape(_bs, -1)  # (_bs, 129600), 60*36*60=129600\n\n        cnt_class = np.zeros(_C, dtype=np.int32)  # count for each class\n        iou_sum = np.zeros(_C, dtype=np.float32)  # sum of iou for each class\n        tp_sum = np.zeros(_C, dtype=np.int32)  # tp\n        fp_sum = np.zeros(_C, dtype=np.int32)  # fp\n        fn_sum = np.zeros(_C, dtype=np.int32)  # fn\n\n        for idx in range(_bs):\n            y_true = target[idx, :]  # GT\n            y_pred = predict[idx, :]\n            if nonempty is not None:\n                nonempty_idx = nonempty[idx, :].reshape(-1)\n                y_pred = y_pred[\n                    np.where(np.logical_and(nonempty_idx == 1, y_true != self.ignore_idx))\n                ]\n                y_true = y_true[\n                    np.where(np.logical_and(nonempty_idx == 1, y_true != self.ignore_idx))\n                ]\n            for j in range(_C):  # for each class\n                tp = np.array(np.where(np.logical_and(y_true == j, y_pred == j))).size\n                fp = np.array(np.where(np.logical_and(y_true != 
j, y_pred == j))).size\n                fn = np.array(np.where(np.logical_and(y_true == j, y_pred != j))).size\n\n                tp_sum[j] += tp\n                fp_sum[j] += fp\n                fn_sum[j] += fn\n\n        return tp_sum, fp_sum, fn_sum"
  },
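A toy example of `fast_hist`, whose bincount trick is compact but opaque: entry (i, j) of the histogram counts voxels with ground-truth class i predicted as class j, and `per_class_iu` reads IoU off its diagonal.

```python
# Toy example: fast_hist builds an n x n confusion matrix with one bincount;
# rows index ground truth, columns index predictions.
import numpy as np
from projects.occ_plugin.utils.metric_util import fast_hist, per_class_iu

pred = np.array([0, 1, 1, 2])
label = np.array([0, 1, 2, 2])
hist = fast_hist(pred, label, 3)
print(hist)                 # [[1 0 0]
                            #  [0 1 0]
                            #  [0 1 1]]
print(per_class_iu(hist))   # [1.  0.5 0.5]
```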
  {
    "path": "projects/occ_plugin/utils/nusc_param.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\n\n\nnusc_class_frequencies = np.array([2242961742295, 25985376, 1561108, 28862014, 196106643, 15920504,\n                2158753, 26539491, 4004729, 34838681, 75173306, 2255027978, 50959399, 646022466, 869055679,\n                1446141335, 1724391378])\n\n\nnusc_class_names = [\n    \"empty\",\n    \"barrier\",\n    \"bicycle\",\n    \"bus\",\n    \"car\",\n    \"construction\",\n    \"motorcycle\",\n    \"pedestrian\",\n    \"trafficcone\",\n    \"trailer\",\n    \"truck\",\n    \"driveable_surface\",\n    \"other\",\n    \"sidewalk\",\n    \"terrain\",\n    \"mannade\",\n    \"vegetation\",\n]\n\n\nclassname_to_color = {  # RGB.\n    # 0: (0, 0, 0),  # Black. noise\n    1: (112, 128, 144),  # Slategrey barrier\n    2: (220, 20, 60),  # Crimson bicycle\n    3: (255, 127, 80),  # Orangered bus\n    4: (255, 158, 0),  # Orange car\n    5: (233, 150, 70),  # Darksalmon construction\n    6: (255, 61, 99),  # Red motorcycle\n    7: (0, 0, 230),  # Blue pedestrian\n    8: (47, 79, 79),  # Darkslategrey trafficcone\n    9: (255, 140, 0),  # Darkorange trailer\n    10: (255, 99, 71),  # Tomato truck\n    11: (0, 207, 191),  # nuTonomy green driveable_surface\n    12: (175, 0, 75),  # flat other\n    13: (75, 0, 75),  # sidewalk\n    14: (112, 180, 60),  # terrain\n    15: (222, 184, 135),  # Burlywood mannade\n    16: (0, 175, 0),  # Green vegetation\n}\n\ndef KL_sep(p, target):\n    \"\"\"\n    KL divergence on nonzeros classes\n    \"\"\"\n    nonzeros = target != 0\n    nonzero_p = p[nonzeros]\n    kl_term = F.kl_div(torch.log(nonzero_p), target[nonzeros], reduction=\"sum\")\n    return kl_term\n\n\ndef geo_scal_loss(pred, ssc_target):\n\n    # Get softmax probabilities\n    pred = F.softmax(pred, dim=1)\n\n    # Compute empty and nonempty probabilities\n    empty_probs = pred[:, 0, :, :, :]\n    nonempty_probs = 1 - empty_probs\n\n    # Remove unknown voxels\n    mask = ssc_target != 255\n    nonempty_target = ssc_target != 0\n    nonempty_target = nonempty_target[mask].float()\n    nonempty_probs = nonempty_probs[mask]\n    empty_probs = empty_probs[mask]\n\n    intersection = (nonempty_target * nonempty_probs).sum()\n    precision = intersection / nonempty_probs.sum()\n    recall = intersection / nonempty_target.sum()\n    spec = ((1 - nonempty_target) * (empty_probs)).sum() / (1 - nonempty_target).sum()\n    return (\n        F.binary_cross_entropy(precision, torch.ones_like(precision))\n        + F.binary_cross_entropy(recall, torch.ones_like(recall))\n        + F.binary_cross_entropy(spec, torch.ones_like(spec))\n    )\n\n\ndef sem_scal_loss(pred, ssc_target):\n    # Get softmax probabilities\n    pred = F.softmax(pred, dim=1)\n    loss = 0\n    count = 0\n    mask = ssc_target != 255\n    n_classes = pred.shape[1]\n    for i in range(0, n_classes):\n\n        # Get probability of class i\n        p = pred[:, i, :, :, :]\n\n        # Remove unknown voxels\n        target_ori = ssc_target\n        p = p[mask]\n        target = ssc_target[mask]\n\n        completion_target = torch.ones_like(target)\n        completion_target[target != i] = 0\n        completion_target_ori = torch.ones_like(target_ori).float()\n        completion_target_ori[target_ori != i] = 0\n        if torch.sum(completion_target) > 0:\n            count += 1.0\n            nominator = torch.sum(p * completion_target)\n            loss_class = 0\n            if torch.sum(p) > 0:\n                precision = 
nominator / (torch.sum(p))\n                loss_precision = F.binary_cross_entropy(\n                    precision, torch.ones_like(precision)\n                )\n                loss_class += loss_precision\n            if torch.sum(completion_target) > 0:\n                recall = nominator / (torch.sum(completion_target))\n                loss_recall = F.binary_cross_entropy(recall, torch.ones_like(recall))\n                loss_class += loss_recall\n            if torch.sum(1 - completion_target) > 0:\n                specificity = torch.sum((1 - p) * (1 - completion_target)) / (\n                    torch.sum(1 - completion_target)\n                )\n                loss_specificity = F.binary_cross_entropy(\n                    specificity, torch.ones_like(specificity)\n                )\n                loss_class += loss_specificity\n            loss += loss_class\n    return loss / count\n\n\ndef CE_ssc_loss(pred, target, class_weights):\n    \"\"\"\n    :param: prediction: the predicted tensor, must be [BS, C, H, W, D]\n    \"\"\"\n    criterion = nn.CrossEntropyLoss(\n        weight=class_weights, ignore_index=255, reduction=\"mean\"\n    )\n    loss = criterion(pred, target.long())\n\n    return loss\n"
  },
  {
    "path": "projects/occ_plugin/utils/semkitti.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\n\nsemantic_kitti_class_frequencies = np.array(\n    [\n        5.41773033e09,\n        1.57835390e07,\n        1.25136000e05,\n        1.18809000e05,\n        6.46799000e05,\n        8.21951000e05,\n        2.62978000e05,\n        2.83696000e05,\n        2.04750000e05,\n        6.16887030e07,\n        4.50296100e06,\n        4.48836500e07,\n        2.26992300e06,\n        5.68402180e07,\n        1.57196520e07,\n        1.58442623e08,\n        2.06162300e06,\n        3.69705220e07,\n        1.15198800e06,\n        3.34146000e05,\n    ]\n)\n\nkitti_class_names = [\n    \"empty\",\n    \"car\",\n    \"bicycle\",\n    \"motorcycle\",\n    \"truck\",\n    \"other-vehicle\",\n    \"person\",\n    \"bicyclist\",\n    \"motorcyclist\",\n    \"road\",\n    \"parking\",\n    \"sidewalk\",\n    \"other-ground\",\n    \"building\",\n    \"fence\",\n    \"vegetation\",\n    \"trunk\",\n    \"terrain\",\n    \"pole\",\n    \"traffic-sign\",\n]\n\n\ndef KL_sep(p, target):\n    \"\"\"\n    KL divergence on nonzeros classes\n    \"\"\"\n    nonzeros = target != 0\n    nonzero_p = p[nonzeros]\n    kl_term = F.kl_div(torch.log(nonzero_p), target[nonzeros], reduction=\"sum\")\n    return kl_term\n\n\ndef geo_scal_loss(pred, ssc_target, ignore_index=255, non_empty_idx=0):\n\n    # Get softmax probabilities\n    pred = F.softmax(pred, dim=1)\n\n    # Compute empty and nonempty probabilities\n    empty_probs = pred[:, non_empty_idx]\n    nonempty_probs = 1 - empty_probs\n\n    # Remove unknown voxels\n    mask = ssc_target != ignore_index\n    nonempty_target = ssc_target != non_empty_idx\n    nonempty_target = nonempty_target[mask].float()\n    nonempty_probs = nonempty_probs[mask]\n    empty_probs = empty_probs[mask]\n\n    eps = 1e-5\n    intersection = (nonempty_target * nonempty_probs).sum()\n    precision = intersection / (nonempty_probs.sum()+eps)\n    recall = intersection / (nonempty_target.sum()+eps)\n    spec = ((1 - nonempty_target) * (empty_probs)).sum() / ((1 - nonempty_target).sum()+eps)\n    return (\n        F.binary_cross_entropy(precision, torch.ones_like(precision))\n        + F.binary_cross_entropy(recall, torch.ones_like(recall))\n        + F.binary_cross_entropy(spec, torch.ones_like(spec))\n    )\n\n\ndef sem_scal_loss(pred, ssc_target, ignore_index=255):\n    # Get softmax probabilities\n    pred = F.softmax(pred, dim=1)\n    loss = 0\n    count = 0\n    mask = ssc_target != ignore_index\n    n_classes = pred.shape[1]\n    for i in range(0, n_classes):\n\n        # Get probability of class i\n        p = pred[:, i]\n\n        # Remove unknown voxels\n        target_ori = ssc_target\n        p = p[mask]\n        target = ssc_target[mask]\n\n        completion_target = torch.ones_like(target)\n        completion_target[target != i] = 0\n        completion_target_ori = torch.ones_like(target_ori).float()\n        completion_target_ori[target_ori != i] = 0\n        if torch.sum(completion_target) > 0:\n            count += 1.0\n            nominator = torch.sum(p * completion_target)\n            loss_class = 0\n            if torch.sum(p) > 0:\n                precision = nominator / (torch.sum(p))\n                loss_precision = F.binary_cross_entropy(\n                    precision, torch.ones_like(precision)\n                )\n                loss_class += loss_precision\n            if torch.sum(completion_target) > 0:\n                recall = nominator / 
(torch.sum(completion_target))\n                loss_recall = F.binary_cross_entropy(recall, torch.ones_like(recall))\n                loss_class += loss_recall\n            if torch.sum(1 - completion_target) > 0:\n                specificity = torch.sum((1 - p) * (1 - completion_target)) / (\n                    torch.sum(1 - completion_target)\n                )\n                loss_specificity = F.binary_cross_entropy(\n                    specificity, torch.ones_like(specificity)\n                )\n                loss_class += loss_specificity\n            loss += loss_class\n    return loss / count\n\n\ndef CE_ssc_loss(pred, target, class_weights=None, ignore_index=255):\n    \"\"\"\n    :param: prediction: the predicted tensor, must be [BS, C, ...]\n    \"\"\"\n    criterion = nn.CrossEntropyLoss(\n        weight=class_weights, ignore_index=ignore_index, reduction=\"mean\"\n    )\n    loss = criterion(pred, target.long())\n\n    return loss\n\ndef Smooth_L1_loss(pred, target, ignore_index=255):\n    # pred/target B, H, W, D, 3\n    kept = (target[:, :, :, :, 0] != ignore_index) & (target[:, :, :, :, 1] != ignore_index) & (target[:, :, :, :, 2] != ignore_index)\n\n    criterion = nn.SmoothL1Loss( reduction=\"mean\" )\n    loss = criterion(pred[kept], target[kept])\n\n    if torch.isnan(loss):\n        pred = pred * 0\n        target = target * 0\n        loss = criterion(pred, target)\n        return loss\n    return loss\n\ndef vel_loss(pred, gt):\n    return F.l1_loss(pred, gt)"
  },
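  {
    "path": "projects/occ_plugin/utils/semkitti_class_weights_demo.py",
    "content": "# Hypothetical usage sketch: this file is NOT part of the original Cam4DOcc\n# release. It shows one common way to turn semantic_kitti_class_frequencies\n# (defined in projects/occ_plugin/utils/semkitti.py) into class weights for\n# CE_ssc_loss: the inverse-log-frequency weighting widely used in semantic\n# scene completion. Treat the weighting as an illustrative assumption, not\n# the benchmark's official recipe.\nimport numpy as np\nimport torch\n\nfrom projects.occ_plugin.utils.semkitti import (CE_ssc_loss,\n                                                semantic_kitti_class_frequencies)\n\n# Rarer classes receive larger weights; +0.001 keeps log() well-defined.\nclass_weights = torch.from_numpy(\n    1 / np.log(semantic_kitti_class_frequencies + 0.001)).float()\n\n# Dummy [BS, C, X, Y, Z] logits and voxel-wise labels, just to show the call.\npred = torch.randn(1, 20, 4, 4, 4)\ntarget = torch.randint(0, 20, (1, 4, 4, 4))\nprint(CE_ssc_loss(pred, target, class_weights=class_weights).item())\n"
  },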
  {
    "path": "projects/occ_plugin/utils/voxel_to_points.py",
    "content": "import open3d as o3d\nimport numpy as np\n\ndef query_points_from_voxels(pred, gt, img_metas):\n    # pred, [tensor of shape (num_class, x, y, z)]: predicted classes\n    # gt, [tensor of shape (batch, num_points)]: target points with semantic labels\n    \n    # logits to pred cls_id\n    pred = np.argmax(pred.detach().cpu().numpy(), axis=0)\n    gt_ = gt.detach().cpu().numpy()\n    \n    pred_fore_mask = pred > 0\n    if pred_fore_mask.sum() == 0:\n        return None\n    \n    # select foreground 3d voxel vertex\n    x = np.linspace(0, pred.shape[0] - 1, pred.shape[0])\n    y = np.linspace(0, pred.shape[1] - 1, pred.shape[1])\n    z = np.linspace(0, pred.shape[2] - 1, pred.shape[2])\n    X, Y, Z = np.meshgrid(x, y, z,  indexing='ij')\n    vv = np.stack([X, Y, Z], axis=-1)\n    \n    # foreground predictions & coordinates\n    pred = pred[pred_fore_mask]\n    vv = vv[pred_fore_mask]\n    \n    vv[:, 0] = (vv[:, 0] + 0.5) * (img_metas['pc_range'][3] - img_metas['pc_range'][0]) /  img_metas['occ_size'][0]  + img_metas['pc_range'][0]\n    vv[:, 1] = (vv[:, 1] + 0.5) * (img_metas['pc_range'][4] - img_metas['pc_range'][1]) /  img_metas['occ_size'][1]  + img_metas['pc_range'][1]\n    vv[:, 2] = (vv[:, 2] + 0.5) * (img_metas['pc_range'][5] - img_metas['pc_range'][2]) /  img_metas['occ_size'][2]  + img_metas['pc_range'][2]\n\n    pcd = o3d.geometry.PointCloud()\n    pcd.points = o3d.utility.Vector3dVector(vv)\n    \n    # for every lidar point, search its nearest *foreground* voxel vertex as the semantic prediction\n    kdtree = o3d.geometry.KDTreeFlann(pcd)\n    indices = []\n    for vert in gt_[:, :3]:\n        _, inds, _ = kdtree.search_knn_vector_3d(vert, 1)\n        indices.append(inds[0])\n    \n    pred_valid = pred[np.array(indices)]\n    \n    return pred_valid\n    "
  },
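  {
    "path": "projects/occ_plugin/utils/voxel_to_points_demo.py",
    "content": "# Hypothetical usage sketch: this file is NOT part of the original Cam4DOcc\n# release. It illustrates the inputs expected by query_points_from_voxels in\n# projects/occ_plugin/utils/voxel_to_points.py. The img_metas dict below only\n# mimics the two keys the function reads ('pc_range' and 'occ_size'); real\n# metadata comes from the dataset pipeline.\nimport torch\n\nfrom projects.occ_plugin.utils.voxel_to_points import query_points_from_voxels\n\nnum_class, X, Y, Z = 3, 16, 16, 4\npred_logits = torch.randn(num_class, X, Y, Z)  # per-voxel class logits\ngt_points = torch.rand(100, 4) * 8 - 4         # (x, y, z, label) per lidar point\n\nimg_metas = {\n    'pc_range': [-4.0, -4.0, -1.0, 4.0, 4.0, 1.0],  # xmin, ymin, zmin, xmax, ymax, zmax\n    'occ_size': [X, Y, Z],\n}\n\n# One predicted class id per lidar point, or None if no voxel is foreground.\npred_per_point = query_points_from_voxels(pred_logits, gt_points, img_metas)\nprint(None if pred_per_point is None else pred_per_point.shape)\n"
  },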
  {
    "path": "run.sh",
    "content": "echo \"-------------\"\necho \"load config from local path:\" $1\nif [ -f $1 ]; then\n  config=$1\nelse\n  echo \"need a config file\"\n  exit\nfi\n\nbash tools/dist_train.sh $config $2 ${@:3}"
  },
  {
    "path": "run_eval.sh",
    "content": "echo \"-------------\"\necho \"load config from local path:\" $1\nif [ -f $1 ]; then\n  config=$1\nelse\n  echo \"need a config file\"\n  exit\nfi\n\nexport PYTHONPATH=\".\"\n\nckpt=$2\ngpu=$3\nbash tools/dist_test.sh $config $ckpt $gpu ${@:4}"
  },
  {
    "path": "setup.py",
    "content": "from setuptools import find_packages, setup\n\nimport os\nimport torch\nfrom os import path as osp\nfrom torch.utils.cpp_extension import (BuildExtension, CppExtension,\n                                       CUDAExtension)\n\n\ndef make_cuda_ext(name,\n                  module,\n                  sources,\n                  sources_cuda=[],\n                  extra_args=[],\n                  extra_include_path=[]):\n\n    define_macros = []\n    extra_compile_args = {'cxx': [] + extra_args}\n\n    if torch.cuda.is_available() or os.getenv('FORCE_CUDA', '0') == '1':\n        define_macros += [('WITH_CUDA', None)]\n        extension = CUDAExtension\n        extra_compile_args['nvcc'] = extra_args + [\n            '-D__CUDA_NO_HALF_OPERATORS__',\n            '-D__CUDA_NO_HALF_CONVERSIONS__',\n            '-D__CUDA_NO_HALF2_OPERATORS__',\n        ]\n        sources += sources_cuda\n    else:\n        print('Compiling {} without CUDA'.format(name))\n        extension = CppExtension\n        # raise EnvironmentError('CUDA is required to compile MMDetection!')\n\n    return extension(\n        name='{}.{}'.format(module, name),\n        sources=[os.path.join(*module.split('.'), p) for p in sources],\n        include_dirs=extra_include_path,\n        define_macros=define_macros,\n        extra_compile_args=extra_compile_args)\n\n\n\nif __name__ == '__main__':\n    # add_mim_extention()\n    setup(\n        name='OpenOccupancy',\n        version='0.0',\n        description=(\"OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception\"),\n        author='OpenOccupancy Contributors',\n        author_email='wangxiaofeng2020@ia.ac.cn',\n        keywords='Occupancy Perception',\n        packages=find_packages(),\n        include_package_data=True,\n        package_data={'projects.occ_plugin.ops': ['*/*.so']},\n        classifiers=[\n            \"Development Status :: 4 - Beta\",\n            \"License :: OSI Approved :: Apache Software License\",\n            \"Operating System :: OS Independent\",\n            \"Programming Language :: Python :: 3\",\n            \"Programming Language :: Python :: 3.6\",\n            \"Programming Language :: Python :: 3.7\",\n        ],\n        license=\"Apache License 2.0\",\n\n        ext_modules=[\n            make_cuda_ext(\n                name=\"occ_pool_ext\",\n                module=\"projects.occ_plugin.ops.occ_pooling\",\n                sources=[\n                    \"src/occ_pool.cpp\",\n                    \"src/occ_pool_cuda.cu\",\n                ]),\n        ],\n        cmdclass={'build_ext': BuildExtension})"
  },
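  {
    "path": "tools/check_occ_pool_ext.py",
    "content": "# Hypothetical sanity check: this file is NOT part of the original Cam4DOcc\n# release. setup.py builds the CUDA occupancy-pooling op as the extension\n# module projects.occ_plugin.ops.occ_pooling.occ_pool_ext, so a bare import\n# from the repo root is a quick way to verify that the build step succeeded.\nimport importlib\n\ntry:\n    importlib.import_module('projects.occ_plugin.ops.occ_pooling.occ_pool_ext')\n    print('occ_pool_ext is built and importable')\nexcept ImportError as err:\n    print('occ_pool_ext is missing; re-run the build from the repo root:', err)\n"
  },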
  {
    "path": "tools/dist_test.sh",
    "content": "#!/usr/bin/env bash\n\nCONFIG=$1\nCHECKPOINT=$2\nGPUS=$3\nPORT=${PORT:-29504}\n\nPYTHONPATH=\"$(dirname $0)/..\":$PYTHONPATH \\\npython -m torch.distributed.run --nproc_per_node=$GPUS --master_port=$PORT \\\n    $(dirname \"$0\")/test.py $CONFIG $CHECKPOINT --launcher pytorch ${@:4} --deterministic --eval bbox"
  },
  {
    "path": "tools/dist_train.sh",
    "content": "#!/usr/bin/env bash\n\nCONFIG=$1\nGPUS=$2\nNNODES=${NNODES:-1}\nNODE_RANK=${NODE_RANK:-0}\nPORT=${PORT:-29501}\nMASTER_ADDR=${MASTER_ADDR:-\"127.0.0.1\"}\n\nPYTHONPATH=\"$(dirname $0)/..\":$PYTHONPATH \\\npython -m torch.distributed.run \\\n    --nnodes=$NNODES \\\n    --node_rank=$NODE_RANK \\\n    --master_addr=$MASTER_ADDR \\\n    --nproc_per_node=$GPUS \\\n    --master_port=$PORT \\\n    $(dirname \"$0\")/train.py \\\n    $CONFIG \\\n    --seed 2 \\\n    --resume ./work_dirs/OCFNet_in_Cam4DOcc_V1.2/epoch_15.pth\n    --launcher pytorch ${@:3}\n"
  },
  {
    "path": "tools/gen_data/gen_depth_gt.py",
    "content": "import os\nfrom multiprocessing import Pool\n\nimport mmcv\nimport numpy as np\nfrom nuscenes.utils.data_classes import LidarPointCloud\nfrom nuscenes.utils.geometry_utils import view_points\nfrom pyquaternion import Quaternion\nimport copy\n\n# https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/nuscenes.py#L834\ndef map_pointcloud_to_image(\n    pc,\n    im,\n    lidar2ego_translation,\n    lidar2ego_rotation,\n    ego2global_translation,\n    ego2global_rotation,\n    sensor2ego_translation, \n    sensor2ego_rotation,\n    cam_ego2global_translation,\n    cam_ego2global_rotation,\n    cam_intrinsic,\n    min_dist: float = 0.0,\n):\n\n    # Points live in the point sensor frame. So they need to be\n    # transformed via global to the image plane.\n    # First step: transform the pointcloud to the ego vehicle\n    # frame for the timestamp of the sweep.\n\n    pc = LidarPointCloud(pc.T)\n    pc.rotate(Quaternion(lidar2ego_rotation).rotation_matrix)\n    pc.translate(np.array(lidar2ego_translation))\n\n    # Second step: transform from ego to the global frame.\n    pc.rotate(Quaternion(ego2global_rotation).rotation_matrix)\n    pc.translate(np.array(ego2global_translation))\n\n    # Third step: transform from global into the ego vehicle\n    # frame for the timestamp of the image.\n    pc.translate(-np.array(cam_ego2global_translation))\n    pc.rotate(Quaternion(cam_ego2global_rotation).rotation_matrix.T)\n\n    # Fourth step: transform from ego into the camera.\n    pc.translate(-np.array(sensor2ego_translation))\n    pc.rotate(Quaternion(sensor2ego_rotation).rotation_matrix.T)\n\n    # Fifth step: actually take a \"picture\" of the point cloud.\n    # Grab the depths (camera frame z axis points away from the camera).\n    depths = pc.points[2, :]\n    coloring = depths\n\n    # Take the actual picture (matrix multiplication with camera-matrix\n    # + renormalization).\n    points = view_points(pc.points[:3, :],\n                         cam_intrinsic,\n                         normalize=True)\n\n    # Remove points that are either outside or behind the camera.\n    # Leave a margin of 1 pixel for aesthetic reasons. 
Also make\n    # sure points are at least 1m in front of the camera to avoid\n    # seeing the lidar points on the camera casing for non-keyframes\n    # which are slightly out of sync.\n    mask = np.ones(depths.shape[0], dtype=bool)\n    mask = np.logical_and(mask, depths > min_dist)\n    mask = np.logical_and(mask, points[0, :] > 1)\n    mask = np.logical_and(mask, points[0, :] < im.shape[1] - 1)\n    mask = np.logical_and(mask, points[1, :] > 1)\n    mask = np.logical_and(mask, points[1, :] < im.shape[0] - 1)\n    points = points[:, mask]\n    coloring = coloring[mask]\n\n    return points, coloring\n\n\ndata_root = './data/nuscenes'\ninfo_path_train = './data/nuscenes/nuscenes_occ_infos_train.pkl'\ninfo_path_val = './data/nuscenes/nuscenes_occ_infos_val.pkl'\n\n# data3d_nusc = NuscMVDetData()\n\nlidar_key = 'LIDAR_TOP'\ncam_keys = [\n    'CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_BACK_RIGHT',\n    'CAM_BACK', 'CAM_BACK_LEFT'\n]\n\n\ndef worker(info):\n    lidar_path = info['lidar_path']\n    points = np.fromfile(lidar_path,\n                         dtype=np.float32,\n                         count=-1).reshape(-1, 5)[..., :4]\n    \n    lidar2ego_translation = info['lidar2ego_translation']\n    lidar2ego_rotation = info['lidar2ego_rotation']\n    ego2global_translation = info['ego2global_translation']\n    ego2global_rotation = info['ego2global_rotation']\n    for i, cam_key in enumerate(cam_keys):\n        sensor2ego_translation = info['cams'][cam_key]['sensor2ego_translation']\n        sensor2ego_rotation = info['cams'][cam_key]['sensor2ego_rotation']\n        cam_ego2global_translation = info['cams'][cam_key]['ego2global_translation']\n        cam_ego2global_rotation = info['cams'][cam_key]['ego2global_rotation']\n        cam_intrinsic = info['cams'][cam_key]['cam_intrinsic']\n        img = mmcv.imread(\n            os.path.join(info['cams'][cam_key]['data_path']))\n        pts_img, depth = map_pointcloud_to_image(\n            points.copy(), img, \n            copy.deepcopy(lidar2ego_translation), \n            copy.deepcopy(lidar2ego_rotation), \n            copy.deepcopy(ego2global_translation),\n            copy.deepcopy(ego2global_rotation),\n            copy.deepcopy(sensor2ego_translation), \n            copy.deepcopy(sensor2ego_rotation), \n            copy.deepcopy(cam_ego2global_translation), \n            copy.deepcopy(cam_ego2global_rotation),\n            copy.deepcopy(cam_intrinsic))\n        \n        file_name = os.path.split(info['cams'][cam_key]['data_path'])[-1]\n        np.concatenate([pts_img[:2, :].T, depth[:, None]],\n                       axis=1).astype(np.float32).flatten().tofile(\n                           os.path.join('./data', 'depth_gt',\n                                        f'{file_name}.bin'))\n\nif __name__ == '__main__':\n    po = Pool(12)\n    mmcv.mkdir_or_exist(os.path.join('./data', 'depth_gt'))\n    infos = mmcv.load(info_path_train)['infos']\n    for info in infos:\n        po.apply_async(func=worker, args=(info, ))\n    po.close()\n    po.join()\n    \n    po2 = Pool(12)\n    infos = mmcv.load(info_path_val)['infos']\n    for info in infos:\n        po2.apply_async(func=worker, args=(info, ))\n    po2.close()\n    po2.join()\n"
  },
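  {
    "path": "tools/gen_data/load_depth_gt_example.py",
    "content": "# Hypothetical reader: this file is NOT part of the original Cam4DOcc release.\n# tools/gen_data/gen_depth_gt.py stores each camera's depth ground truth as a\n# flattened float32 buffer of (u, v, depth) triplets, so the records can be\n# recovered with a simple reshape. The example path below is made up.\nimport numpy as np\n\ndef load_depth_gt(bin_path):\n    # (N, 3): image column u, image row v, metric depth in the camera frame\n    pts = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 3)\n    return pts[:, :2], pts[:, 2]\n\nif __name__ == '__main__':\n    uv, depth = load_depth_gt('./data/depth_gt/CAM_FRONT_example.jpg.bin')\n    print(uv.shape, float(depth.min()), float(depth.max()))\n"
  },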
  {
    "path": "tools/misc/browse_dataset.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nimport argparse\nimport numpy as np\nimport warnings\nfrom mmcv import Config, DictAction, mkdir_or_exist, track_iter_progress\nfrom os import path as osp\n\nfrom mmdet3d.core.bbox import (Box3DMode, CameraInstance3DBoxes, Coord3DMode,\n                               DepthInstance3DBoxes, LiDARInstance3DBoxes)\nfrom mmdet3d.core.visualizer import (show_multi_modality_result, show_result,\n                                     show_seg_result)\nfrom mmdet3d.datasets import build_dataset\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description='Browse a dataset')\n    parser.add_argument('config', help='train config file path')\n    parser.add_argument(\n        '--skip-type',\n        type=str,\n        nargs='+',\n        default=['Normalize'],\n        help='skip some useless pipeline')\n    parser.add_argument(\n        '--output-dir',\n        default=None,\n        type=str,\n        help='If there is no display interface, you can save it')\n    parser.add_argument(\n        '--task',\n        type=str,\n        choices=['det', 'seg', 'multi_modality-det', 'mono-det'],\n        help='Determine the visualization method depending on the task.')\n    parser.add_argument(\n        '--online',\n        action='store_true',\n        help='Whether to perform online visualization. Note that you often '\n        'need a monitor to do so.')\n    parser.add_argument(\n        '--cfg-options',\n        nargs='+',\n        action=DictAction,\n        help='override some settings in the used config, the key-value pair '\n        'in xxx=yyy format will be merged into config file. If the value to '\n        'be overwritten is a list, it should be like key=\"[a,b]\" or key=a,b '\n        'It also allows nested list/tuple values, e.g. 
key=\"[(a,b),(c,d)]\" '\n        'Note that the quotation marks are necessary and that no white space '\n        'is allowed.')\n    args = parser.parse_args()\n    return args\n\n\ndef build_data_cfg(config_path, skip_type, cfg_options):\n    \"\"\"Build data config for loading visualization data.\"\"\"\n    cfg = Config.fromfile(config_path)\n    if cfg_options is not None:\n        cfg.merge_from_dict(cfg_options)\n    # import modules from string list.\n    if cfg.get('custom_imports', None):\n        from mmcv.utils import import_modules_from_strings\n        import_modules_from_strings(**cfg['custom_imports'])\n    # extract inner dataset of `RepeatDataset` as `cfg.data.train`\n    # so we don't need to worry about it later\n    if cfg.data.train['type'] == 'RepeatDataset':\n        cfg.data.train = cfg.data.train.dataset\n    # use only first dataset for `ConcatDataset`\n    if cfg.data.train['type'] == 'ConcatDataset':\n        cfg.data.train = cfg.data.train.datasets[0]\n    train_data_cfg = cfg.data.train\n    # eval_pipeline purely consists of loading functions\n    # use eval_pipeline for data loading\n    train_data_cfg['pipeline'] = [\n        x for x in cfg.eval_pipeline if x['type'] not in skip_type\n    ]\n\n    return cfg\n\n\ndef to_depth_mode(points, bboxes):\n    \"\"\"Convert points and bboxes to Depth Coord and Depth Box mode.\"\"\"\n    if points is not None:\n        points = Coord3DMode.convert_point(points.copy(), Coord3DMode.LIDAR,\n                                           Coord3DMode.DEPTH)\n    if bboxes is not None:\n        bboxes = Box3DMode.convert(bboxes.clone(), Box3DMode.LIDAR,\n                                   Box3DMode.DEPTH)\n    return points, bboxes\n\n\ndef show_det_data(idx, dataset, out_dir, filename, show=False):\n    \"\"\"Visualize 3D point cloud and 3D bboxes.\"\"\"\n    example = dataset.prepare_train_data(idx)\n    points = example['points']._data.numpy()\n    gt_bboxes = dataset.get_ann_info(idx)['gt_bboxes_3d'].tensor\n    if dataset.box_mode_3d != Box3DMode.DEPTH:\n        points, gt_bboxes = to_depth_mode(points, gt_bboxes)\n    show_result(\n        points,\n        gt_bboxes.clone(),\n        None,\n        out_dir,\n        filename,\n        show=show,\n        snapshot=True)\n\n\ndef show_seg_data(idx, dataset, out_dir, filename, show=False):\n    \"\"\"Visualize 3D point cloud and segmentation mask.\"\"\"\n    example = dataset.prepare_train_data(idx)\n    points = example['points']._data.numpy()\n    gt_seg = example['pts_semantic_mask']._data.numpy()\n    show_seg_result(\n        points,\n        gt_seg.copy(),\n        None,\n        out_dir,\n        filename,\n        np.array(dataset.PALETTE),\n        dataset.ignore_index,\n        show=show,\n        snapshot=True)\n\n\ndef show_proj_bbox_img(idx,\n                       dataset,\n                       out_dir,\n                       filename,\n                       show=False,\n                       is_nus_mono=False):\n    \"\"\"Visualize 3D bboxes on 2D image by projection.\"\"\"\n    try:\n        example = dataset.prepare_train_data(idx)\n    except AttributeError:  # for Mono-3D datasets\n        example = dataset.prepare_train_img(idx)\n    gt_bboxes = dataset.get_ann_info(idx)['gt_bboxes_3d']\n    img_metas = example['img_metas']._data\n    img = example['img']._data.numpy()\n    # need to transpose channel to first dim\n    img = img.transpose(1, 2, 0)\n    # no 3D gt bboxes, just show img\n    if gt_bboxes.tensor.shape[0] == 0:\n        gt_bboxes = 
None\n    if isinstance(gt_bboxes, DepthInstance3DBoxes):\n        show_multi_modality_result(\n            img,\n            gt_bboxes,\n            None,\n            None,\n            out_dir,\n            filename,\n            box_mode='depth',\n            img_metas=img_metas,\n            show=show)\n    elif isinstance(gt_bboxes, LiDARInstance3DBoxes):\n        show_multi_modality_result(\n            img,\n            gt_bboxes,\n            None,\n            img_metas['lidar2img'],\n            out_dir,\n            filename,\n            box_mode='lidar',\n            img_metas=img_metas,\n            show=show)\n    elif isinstance(gt_bboxes, CameraInstance3DBoxes):\n        show_multi_modality_result(\n            img,\n            gt_bboxes,\n            None,\n            img_metas['cam2img'],\n            out_dir,\n            filename,\n            box_mode='camera',\n            img_metas=img_metas,\n            show=show)\n    else:\n        # can't project, just show img\n        warnings.warn(\n            f'unrecognized gt box type {type(gt_bboxes)}, only show image')\n        show_multi_modality_result(\n            img, None, None, None, out_dir, filename, show=show)\n\n\ndef main():\n    args = parse_args()\n\n    if args.output_dir is not None:\n        mkdir_or_exist(args.output_dir)\n\n    cfg = build_data_cfg(args.config, args.skip_type, args.cfg_options)\n    try:\n        dataset = build_dataset(\n            cfg.data.train, default_args=dict(filter_empty_gt=False))\n    except TypeError:  # seg dataset doesn't have `filter_empty_gt` key\n        dataset = build_dataset(cfg.data.train)\n    data_infos = dataset.data_infos\n    dataset_type = cfg.dataset_type\n\n    # configure visualization mode\n    vis_task = args.task  # 'det', 'seg', 'multi_modality-det', 'mono-det'\n\n    for idx, data_info in enumerate(track_iter_progress(data_infos)):\n        if dataset_type in ['KittiDataset', 'WaymoDataset']:\n            data_path = data_info['point_cloud']['velodyne_path']\n        elif dataset_type in [\n                'ScanNetDataset', 'SUNRGBDDataset', 'ScanNetSegDataset',\n                'S3DISSegDataset', 'S3DISDataset'\n        ]:\n            data_path = data_info['pts_path']\n        elif dataset_type in ['NuScenesDataset', 'LyftDataset']:\n            data_path = data_info['lidar_path']\n        elif dataset_type in ['NuScenesMonoDataset']:\n            data_path = data_info['file_name']\n        else:\n            raise NotImplementedError(\n                f'unsupported dataset type {dataset_type}')\n\n        file_name = osp.splitext(osp.basename(data_path))[0]\n\n        if vis_task in ['det', 'multi_modality-det']:\n            # show 3D bboxes on 3D point clouds\n            show_det_data(\n                idx, dataset, args.output_dir, file_name, show=args.online)\n        if vis_task in ['multi_modality-det', 'mono-det']:\n            # project 3D bboxes to 2D image\n            show_proj_bbox_img(\n                idx,\n                dataset,\n                args.output_dir,\n                file_name,\n                show=args.online,\n                is_nus_mono=(dataset_type == 'NuScenesMonoDataset'))\n        elif vis_task in ['seg']:\n            # show 3D segmentation mask on 3D point clouds\n            show_seg_data(\n                idx, dataset, args.output_dir, file_name, show=args.online)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "tools/misc/fuse_conv_bn.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nimport argparse\nimport torch\nfrom mmcv.runner import save_checkpoint\nfrom torch import nn as nn\n\nfrom mmdet.apis import init_model\n\n\ndef fuse_conv_bn(conv, bn):\n    \"\"\"During inference, the functionary of batch norm layers is turned off but\n    only the mean and var alone channels are used, which exposes the chance to\n    fuse it with the preceding conv layers to save computations and simplify\n    network structures.\"\"\"\n    conv_w = conv.weight\n    conv_b = conv.bias if conv.bias is not None else torch.zeros_like(\n        bn.running_mean)\n\n    factor = bn.weight / torch.sqrt(bn.running_var + bn.eps)\n    conv.weight = nn.Parameter(conv_w *\n                               factor.reshape([conv.out_channels, 1, 1, 1]))\n    conv.bias = nn.Parameter((conv_b - bn.running_mean) * factor + bn.bias)\n    return conv\n\n\ndef fuse_module(m):\n    last_conv = None\n    last_conv_name = None\n\n    for name, child in m.named_children():\n        if isinstance(child, (nn.BatchNorm2d, nn.SyncBatchNorm)):\n            if last_conv is None:  # only fuse BN that is after Conv\n                continue\n            fused_conv = fuse_conv_bn(last_conv, child)\n            m._modules[last_conv_name] = fused_conv\n            # To reduce changes, set BN as Identity instead of deleting it.\n            m._modules[name] = nn.Identity()\n            last_conv = None\n        elif isinstance(child, nn.Conv2d):\n            last_conv = child\n            last_conv_name = name\n        else:\n            fuse_module(child)\n    return m\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(\n        description='fuse Conv and BN layers in a model')\n    parser.add_argument('config', help='config file path')\n    parser.add_argument('checkpoint', help='checkpoint file path')\n    parser.add_argument('out', help='output path of the converted model')\n    args = parser.parse_args()\n    return args\n\n\ndef main():\n    args = parse_args()\n    # build the model from a config file and a checkpoint file\n    model = init_model(args.config, args.checkpoint)\n    # fuse conv and bn layers of the model\n    fused_model = fuse_module(model)\n    save_checkpoint(fused_model, args.out)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "tools/misc/print_config.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nimport argparse\nfrom mmcv import Config, DictAction\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description='Print the whole config')\n    parser.add_argument('config', help='config file path')\n    parser.add_argument(\n        '--options', nargs='+', action=DictAction, help='arguments in dict')\n    args = parser.parse_args()\n\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    cfg = Config.fromfile(args.config)\n    if args.options is not None:\n        cfg.merge_from_dict(args.options)\n    print(f'Config:\\n{cfg.pretty_text}')\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "tools/misc/visualize_results.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nimport argparse\nimport mmcv\nfrom mmcv import Config\n\nfrom mmdet3d.datasets import build_dataset\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(\n        description='MMDet3D visualize the results')\n    parser.add_argument('config', help='test config file path')\n    parser.add_argument('--result', help='results file in pickle format')\n    parser.add_argument(\n        '--show-dir', help='directory where visualize results will be saved')\n    args = parser.parse_args()\n\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    if args.result is not None and \\\n            not args.result.endswith(('.pkl', '.pickle')):\n        raise ValueError('The results file must be a pkl file.')\n\n    cfg = Config.fromfile(args.config)\n    cfg.data.test.test_mode = True\n\n    # build the dataset\n    dataset = build_dataset(cfg.data.test)\n    results = mmcv.load(args.result)\n\n    if getattr(dataset, 'show', None) is not None:\n        # data loading pipeline for showing\n        eval_pipeline = cfg.get('eval_pipeline', {})\n        if eval_pipeline:\n            dataset.show(results, args.show_dir, pipeline=eval_pipeline)\n        else:\n            dataset.show(results, args.show_dir)  # use default pipeline\n    else:\n        raise NotImplementedError(\n            'Show is not implemented for dataset {}!'.format(\n                type(dataset).__name__))\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "tools/test.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\n#  Modified by Junyi Ma, following OpenOccupancy of Zhiqi Li\n\nimport argparse\nimport mmcv\nimport os\nimport torch\nfrom mmcv import Config, DictAction\nfrom mmcv.cnn import fuse_conv_bn\nfrom mmcv.parallel import MMDataParallel, MMDistributedDataParallel\nfrom mmcv.runner import (get_dist_info, init_dist, load_checkpoint,\n                         wrap_fp16_model)\n\nfrom mmdet3d.apis import single_gpu_test\nfrom mmdet3d.datasets import build_dataset\nfrom projects.occ_plugin.datasets.builder import build_dataloader\nfrom mmdet3d.models import build_model\nfrom mmdet.apis import set_random_seed\nfrom projects.occ_plugin.occupancy.apis.test import custom_single_gpu_test, custom_multi_gpu_test\n\nfrom mmdet.datasets import replace_ImageToTensor\nimport time\nimport os.path as osp\nimport warnings\nwarnings.filterwarnings(\"ignore\")\nwarnings.simplefilter(action=\"ignore\",category=FutureWarning)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(\n        description='MMDet test (and eval) a model')\n    parser.add_argument('config', help='test config file path')\n    parser.add_argument('checkpoint', help='checkpoint file')\n    parser.add_argument('--out', help='output result file in pickle format')\n    parser.add_argument(\n        '--fuse-conv-bn',\n        action='store_true',\n        help='Whether to fuse conv and bn, this will slightly increase'\n        'the inference speed')\n    parser.add_argument(\n        '--format-only',\n        action='store_true',\n        help='Format the output results without perform evaluation. It is'\n        'useful when you want to format the result to a specific format and '\n        'submit it to the test server')\n    parser.add_argument(\n        '--eval',\n        type=str,\n        nargs='+',\n        help='evaluation metrics, which depends on the dataset, e.g., \"bbox\",'\n        ' \"segm\", \"proposal\" for COCO, and \"mAP\", \"recall\" for PASCAL VOC')\n    parser.add_argument('--show', action='store_true', help='show results')\n    parser.add_argument(\n        '--show-dir', help='directory where results will be saved')\n    parser.add_argument(\n        '--gpu-collect',\n        action='store_true',\n        help='whether to use gpu to collect results.')\n    parser.add_argument(\n        '--tmpdir',\n        help='tmp directory used for collecting results from multiple '\n        'workers, available when gpu-collect is not specified')\n    parser.add_argument('--seed', type=int, default=0, help='random seed')\n    parser.add_argument(\n        '--deterministic',\n        action='store_true',\n        help='whether to set deterministic options for CUDNN backend.')\n    parser.add_argument(\n        '--cfg-options',\n        nargs='+',\n        action=DictAction,\n        help='override some settings in the used config, the key-value pair '\n        'in xxx=yyy format will be merged into config file. If the value to '\n        'be overwritten is a list, it should be like key=\"[a,b]\" or key=a,b '\n        'It also allows nested list/tuple values, e.g. 
key=\"[(a,b),(c,d)]\" '\n        'Note that the quotation marks are necessary and that no white space '\n        'is allowed.')\n    parser.add_argument(\n        '--options',\n        nargs='+',\n        action=DictAction,\n        help='custom options for evaluation, the key-value pair in xxx=yyy '\n        'format will be kwargs for dataset.evaluate() function (deprecate), '\n        'change to --eval-options instead.')\n    parser.add_argument(\n        '--eval-options',\n        nargs='+',\n        action=DictAction,\n        help='custom options for evaluation, the key-value pair in xxx=yyy '\n        'format will be kwargs for dataset.evaluate() function')\n    parser.add_argument(\n        '--launcher',\n        choices=['none', 'pytorch', 'slurm', 'mpi'],\n        default='none',\n        help='job launcher')\n    parser.add_argument('--local_rank', type=int, default=0)\n    args = parser.parse_args()\n    if 'LOCAL_RANK' not in os.environ:\n        os.environ['LOCAL_RANK'] = str(args.local_rank)\n\n    if args.options and args.eval_options:\n        raise ValueError(\n            '--options and --eval-options cannot be both specified, '\n            '--options is deprecated in favor of --eval-options')\n    if args.options:\n        warnings.warn('--options is deprecated in favor of --eval-options')\n        args.eval_options = args.options\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    assert args.out or args.eval or args.format_only or args.show \\\n        or args.show_dir, \\\n        ('Please specify at least one operation (save/eval/format/show the '\n         'results / save the results) with the argument \"--out\", \"--eval\"'\n         ', \"--format-only\", \"--show\" or \"--show-dir\"')\n\n    if args.eval and args.format_only:\n        raise ValueError('--eval and --format_only cannot be both specified')\n\n    if args.out is not None and not args.out.endswith(('.pkl', '.pickle')):\n        raise ValueError('The output file must be a pkl file.')\n\n    cfg = Config.fromfile(args.config)\n    if args.cfg_options is not None:\n        cfg.merge_from_dict(args.cfg_options)\n    # import modules from string list.\n    if cfg.get('custom_imports', None):\n        from mmcv.utils import import_modules_from_strings\n        import_modules_from_strings(**cfg['custom_imports'])\n\n    # import modules from plguin/xx, registry will be updated\n    if hasattr(cfg, 'plugin'):\n        if cfg.plugin:\n            import importlib\n            if hasattr(cfg, 'plugin_dir'):\n                plugin_dir = cfg.plugin_dir\n                _module_dir = os.path.dirname(plugin_dir)\n                _module_dir = _module_dir.split('/')\n                _module_path = _module_dir[0]\n\n                for m in _module_dir[1:]:\n                    _module_path = _module_path + '.' + m\n                # print(_module_path)\n                plg_lib = importlib.import_module(_module_path)\n            else:\n                # import dir is the dirpath for the config file\n                _module_dir = os.path.dirname(args.config)\n                _module_dir = _module_dir.split('/')\n                _module_path = _module_dir[0]\n                for m in _module_dir[1:]:\n                    _module_path = _module_path + '.' 
+ m\n                # print(_module_path)\n                plg_lib = importlib.import_module(_module_path)\n\n    # set cudnn_benchmark\n    if cfg.get('cudnn_benchmark', False):\n        torch.backends.cudnn.benchmark = True\n\n    cfg.model.pretrained = None\n    # in case the test dataset is concatenated\n    samples_per_gpu = 1\n    if isinstance(cfg.data.test, dict):\n        cfg.data.test.test_mode = True\n        samples_per_gpu = cfg.data.test.pop('samples_per_gpu', 1)\n        if samples_per_gpu > 1:\n            # Replace 'ImageToTensor' to 'DefaultFormatBundle'\n            cfg.data.test.pipeline = replace_ImageToTensor(\n                cfg.data.test.pipeline)\n    elif isinstance(cfg.data.test, list):\n        for ds_cfg in cfg.data.test:\n            ds_cfg.test_mode = True\n        samples_per_gpu = max(\n            [ds_cfg.pop('samples_per_gpu', 1) for ds_cfg in cfg.data.test])\n        if samples_per_gpu > 1:\n            for ds_cfg in cfg.data.test:\n                ds_cfg.pipeline = replace_ImageToTensor(ds_cfg.pipeline)\n\n    # init distributed env first, since logger depends on the dist info.\n    # print(\"args.launcher\", args.launcher)\n    if args.launcher == 'none':\n        distributed = False\n    else:\n        distributed = True\n        init_dist(args.launcher, **cfg.dist_params)\n\n    # set random seeds\n    if args.seed is not None:\n        set_random_seed(args.seed, deterministic=args.deterministic)\n\n    # build the dataloader\n    dataset = build_dataset(cfg.data.test)\n    data_loader = build_dataloader(\n        dataset,\n        samples_per_gpu=samples_per_gpu,\n        workers_per_gpu=cfg.data.workers_per_gpu,\n        dist=distributed,\n        shuffle=True,\n        num_gpus=2, \n    )\n\n    # build the model and load checkpoint\n    cfg.model.train_cfg = None\n    model = build_model(cfg.model, test_cfg=cfg.get('test_cfg'))\n    fp16_cfg = cfg.get('fp16', None)\n    if fp16_cfg is not None:\n        wrap_fp16_model(model)\n    checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')\n    if args.fuse_conv_bn:\n        model = fuse_conv_bn(model)\n    # old versions did not save class info in checkpoints, this walkaround is\n    # for backward compatibility\n    if 'CLASSES' in checkpoint.get('meta', {}):\n        model.CLASSES = checkpoint['meta']['CLASSES']\n    else:\n        model.CLASSES = dataset.CLASSES\n    # palette for visualization in segmentation tasks\n    if 'PALETTE' in checkpoint.get('meta', {}):\n        model.PALETTE = checkpoint['meta']['PALETTE']\n    elif hasattr(dataset, 'PALETTE'):\n        # segmentation dataset has `PALETTE` attribute\n        model.PALETTE = dataset.PALETTE\n\n    if args.show:\n        if args.show_dir is None:\n            args.show_dir = osp.join('./work_dirs',\n                            osp.splitext(osp.basename(args.config))[0],\n                            'visualization')\n        print('save dir: ', args.show_dir)               \n        os.makedirs(args.show_dir, exist_ok=True)\n    if not distributed:\n        model = MMDataParallel(model, device_ids=[0])\n        outputs = custom_single_gpu_test(model, data_loader, args.show, args.show_dir)\n    else:\n        model = MMDistributedDataParallel(\n            model.cuda(),\n            device_ids=[torch.cuda.current_device()],\n            broadcast_buffers=False)\n        outputs = custom_multi_gpu_test(model, data_loader, args.tmpdir,\n                                        args.gpu_collect, args.show, 
args.show_dir)\n\n    rank, _ = get_dist_info()\n    if rank == 0 and distributed:\n        \n        kwargs = {} if args.eval_options is None else args.eval_options\n        kwargs['jsonfile_prefix'] = osp.join('test', args.config.split(\n            '/')[-1].split('.')[-2], time.ctime().replace(' ', '_').replace(':', '_'))\n        \n        if args.format_only:\n            dataset.format_results(outputs, **kwargs)\n\n        if args.eval:\n            eval_kwargs = cfg.get('evaluation', {}).copy()\n            # hard-code way to remove EvalHook args\n            for key in [\n                    'interval', 'tmpdir', 'start', 'gpu_collect', 'save_best',\n                    'rule'\n            ]:\n                eval_kwargs.pop(key, None)\n            eval_kwargs.update(dict(metric=args.eval, **kwargs))\n\n            print(dataset.evaluate(outputs, **eval_kwargs))\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "tools/train.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\n# Cam4DOcc refers to OpenOccupancy of Zhiqi Li\n \nfrom __future__ import division\n\nimport argparse\nimport copy\nimport mmcv\nimport os\nimport time\nimport torch\nimport warnings\nfrom mmcv import Config, DictAction\nfrom mmcv.runner import get_dist_info, init_dist\nfrom os import path as osp\n\nfrom mmdet import __version__ as mmdet_version\nfrom mmdet3d import __version__ as mmdet3d_version\nfrom mmseg import __version__ as mmseg_version\n\nfrom mmdet3d.datasets import build_dataset\nfrom mmdet3d.models import build_model\nfrom mmdet3d.utils import collect_env, get_root_logger\nfrom mmdet.apis import set_random_seed\n\nfrom mmcv.utils import TORCH_VERSION, digit_version\nfrom projects.occ_plugin.occupancy.apis.train import custom_train_model\n\nimport warnings\nwarnings.filterwarnings(\"ignore\")\nwarnings.simplefilter(action=\"ignore\",category=FutureWarning)\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser(description='Train a detector')\n    parser.add_argument('config', help='train config file path')\n    parser.add_argument('--work-dir', help='the dir to save logs and models')\n    parser.add_argument(\n        '--resume', help='the checkpoint file to resume from')\n    parser.add_argument(\n        '--no-validate',\n        action='store_true',\n        help='whether not to evaluate the checkpoint during training')\n    group_gpus = parser.add_mutually_exclusive_group()\n    group_gpus.add_argument(\n        '--gpus',\n        type=int,\n        help='number of gpus to use '\n        '(only applicable to non-distributed training)')\n    group_gpus.add_argument(\n        '--gpu-ids',\n        type=int,\n        nargs='+',\n        help='ids of gpus to use '\n        '(only applicable to non-distributed training)')\n    parser.add_argument('--seed', type=int, default=0, help='random seed')\n    parser.add_argument(\n        '--deterministic',\n        action='store_true',\n        help='whether to set deterministic options for CUDNN backend.')\n    parser.add_argument(\n        '--options',\n        nargs='+',\n        action=DictAction,\n        help='override some settings in the used config, the key-value pair '\n        'in xxx=yyy format will be merged into config file (deprecate), '\n        'change to --cfg-options instead.')\n    parser.add_argument(\n        '--cfg-options',\n        nargs='+',\n        action=DictAction,\n        help='override some settings in the used config, the key-value pair '\n        'in xxx=yyy format will be merged into config file. If the value to '\n        'be overwritten is a list, it should be like key=\"[a,b]\" or key=a,b '\n        'It also allows nested list/tuple values, e.g. 
key=\"[(a,b),(c,d)]\" '\n        'Note that the quotation marks are necessary and that no white space '\n        'is allowed.')\n    parser.add_argument(\n        '--launcher',\n        choices=['none', 'pytorch', 'slurm', 'mpi'],\n        default='pytorch',\n        help='job launcher')\n    parser.add_argument('--local_rank', type=int, default=0)\n    parser.add_argument(\n        '--autoscale-lr',\n        action='store_true',\n        help='automatically scale lr with the number of gpus')\n    args = parser.parse_args()\n    if 'LOCAL_RANK' not in os.environ:\n        os.environ['LOCAL_RANK'] = str(args.local_rank)\n\n    return args\n\n\ndef main():\n    args = parse_args()\n\n    cfg = Config.fromfile(args.config)\n    if args.cfg_options is not None:\n        cfg.merge_from_dict(args.cfg_options)\n\n    # import modules from plguin/xx, registry will be updated\n    if hasattr(cfg, 'plugin') and cfg.plugin:\n        assert cfg.plugin_dir is not None\n        import importlib\n        plugin_dir = cfg.plugin_dir\n        _module_dir = os.path.dirname(plugin_dir)\n        _module_dir = _module_dir.split('/')\n        _module_path = _module_dir[0]\n\n        for m in _module_dir[1:]:\n            _module_path = _module_path + '.' + m\n        # print(_module_path)\n        plg_lib = importlib.import_module(_module_path)\n    \n    # set cudnn_benchmark\n    if cfg.get('cudnn_benchmark', False):\n        torch.backends.cudnn.benchmark = True\n\n    # work_dir is determined in this priority: CLI > segment in file > filename\n    if args.work_dir is not None:\n        cfg.work_dir = args.work_dir\n    elif cfg.get('work_dir', None) is None:\n        cfg.work_dir = osp.join('./work_dirs', osp.splitext(osp.basename(args.config))[0])\n    \n    if args.resume is not None and osp.isfile(args.resume):\n        cfg.resume_from = args.resume\n        \n    if args.gpu_ids is not None:\n        cfg.gpu_ids = args.gpu_ids\n    else:\n        cfg.gpu_ids = range(1) if args.gpus is None else range(args.gpus)\n\n    if args.autoscale_lr:\n        cfg.optimizer['lr'] = cfg.optimizer['lr'] * len(cfg.gpu_ids) / 8\n\n    # init distributed env first, since logger depends on the dist info.\n    if args.launcher == 'none':\n        distributed = False\n    else:\n        distributed = True\n        init_dist(args.launcher, **cfg.dist_params)\n        # re-set gpu_ids with distributed training mode\n        _, world_size = get_dist_info()\n        cfg.gpu_ids = range(world_size)\n\n    # create work_dir\n    mmcv.mkdir_or_exist(osp.abspath(cfg.work_dir))\n    # dump config\n    cfg.dump(osp.join(cfg.work_dir, osp.basename(args.config)))\n    # init the logger before other steps\n    timestamp = time.strftime('%Y%m%d_%H%M%S', time.localtime())\n    log_file = osp.join(cfg.work_dir, f'{timestamp}.log')\n    logger = get_root_logger(\n        log_file=log_file, log_level=cfg.log_level, name='mmdet')\n\n    # init the meta dict to record some important information such as\n    # environment info and seed, which will be logged\n    meta = dict()\n    # log env info\n    env_info_dict = collect_env()\n    env_info = '\\n'.join([(f'{k}: {v}') for k, v in env_info_dict.items()])\n    dash_line = '-' * 60 + '\\n'\n    logger.info('Environment info:\\n' + dash_line + env_info + '\\n' + dash_line)\n    meta['env_info'] = env_info\n    meta['config'] = cfg.pretty_text\n\n    # log some basic info\n    logger.info(f'Distributed training: {distributed}')\n\n    # set random seeds\n    if args.seed is not None:\n        
logger.info(f'Set random seed to {args.seed}, '\n                    f'deterministic: {args.deterministic}')\n        set_random_seed(args.seed, deterministic=args.deterministic)\n    cfg.seed = args.seed\n    meta['seed'] = args.seed\n    meta['exp_name'] = osp.basename(args.config)\n\n    model = build_model(\n        cfg.model,\n        train_cfg=cfg.get('train_cfg'),\n        test_cfg=cfg.get('test_cfg'))\n    n_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)\n    logger.info(f'Number of params: {n_parameters}')\n    model.init_weights()\n\n    datasets = [build_dataset(cfg.data.train)]\n        \n    # add an attribute for visualization convenience\n    model.CLASSES = datasets[0].CLASSES\n    \n    custom_train_model(\n        model,\n        datasets,\n        cfg,\n        distributed=distributed,\n        validate=(not args.no_validate),\n        timestamp=timestamp,\n        meta=meta)\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "viz/viz_gt.py",
    "content": "# Developed by Junyi Ma\n# Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications\n# https://github.com/haomo-ai/Cam4DOcc\n\nfrom tqdm import tqdm\nimport pickle\nimport numpy as np\nfrom mayavi import mlab\nfrom tqdm import trange\nimport os\nfrom xvfbwrapper import Xvfb\n\n# export QT_QPA_PLATFORM='offscreen' \nmlab.options.offscreen = True\n\ndef viz_occ(occ, occ_mo, file_name, voxel_size, show_occ, show_time_change):\n\n    vdisplay = Xvfb(width=1, height=1)\n    vdisplay.start()\n\n    mlab.figure(size=(800,800), bgcolor=(1,1,1))\n\n    plt_plot_occ = mlab.points3d(\n        occ[:, 0] * voxel_size,\n        occ[:, 1] * voxel_size,\n        occ[:, 2] * voxel_size,\n        occ[:, 3],\n        colormap=\"viridis\",\n        scale_factor=voxel_size - 0.05 * voxel_size,\n        mode=\"cube\",\n        opacity=0.9,\n        vmin=1,\n    )\n    colors_occ = np.array(\n        [\n            [152, 251, 152, 255],\n            [152, 251, 152, 255],\n            [152, 251, 152, 255],\n            [152, 251, 152, 255],\n            [152, 251, 152, 255],\n        ]\n    ).astype(np.uint8)    \n    plt_plot_occ.glyph.scale_mode = \"scale_by_vector\"\n    plt_plot_occ.module_manager.scalar_lut_manager.lut.table = colors_occ\n\n    plt_plot_mov = mlab.points3d(\n        occ_mo[:, 0] * voxel_size,\n        occ_mo[:, 1] * voxel_size,\n        occ_mo[:, 2] * voxel_size,\n        occ_mo[:, 3],\n        colormap=\"viridis\",\n        scale_factor=voxel_size - 0.05 * voxel_size,\n        mode=\"cube\",\n        opacity=0.9,\n        vmin=1,\n    )\n    if show_time_change:\n        colors_occ_mo = np.array(\n            [\n                [255, 70, 255, 255],\n                [255, 110, 255, 255],\n                [255, 150, 255, 255],\n                [255, 190, 255, 255],\n                [255, 250, 250, 255],\n            ]\n        ).astype(np.uint8)\n    else:\n        colors_occ_mo = np.array(\n            [\n                [220, 20, 60, 255],\n                [255, 127, 80, 255],\n                [0, 0, 230, 255],\n                [255, 158, 0, 255],\n                [233, 150, 70, 255],\n                [47, 79, 79, 255],\n                [255, 99, 71, 255],\n                [175, 0, 75, 255],\n                [255, 61, 99, 255],\n            ]\n        ).astype(np.uint8)    \n    plt_plot_mov.glyph.scale_mode = \"scale_by_vector\"\n    plt_plot_mov.module_manager.scalar_lut_manager.lut.table = colors_occ_mo\n\n    fig_dir = \"./figs\"\n    if not os.path.exists(fig_dir):\n        os.mkdir(fig_dir)\n    mlab.savefig(os.path.join(fig_dir, file_name[:-4]+\".png\"))\n    vdisplay.stop()\n\n\ndef main():\n\n    show_time_change = True\n\n    nuscocc_path = \"../data/nuScenes-Occupancy/\"\n    cam4docc_path = \"../data/cam4docc/GMO/segmentation/\"\n\n    segmentation_files = os.listdir(cam4docc_path)\n    segmentation_files.sort(key=lambda x: (x.split(\"_\")[1]))\n    index = 0\n\n    for file_ in tqdm(segmentation_files):\n\n        scene_token = file_.split(\"_\")[0]\n        lidar_token = file_.split(\"_\")[1]\n\n        gt_file = nuscocc_path+\"scene_\"+scene_token+\"/occupancy/\"+lidar_token[:-4]+\".npy\"\n        gt_occ_semantic =  np.load(gt_file,allow_pickle=True)\n        gt_occ_semantic = gt_occ_semantic[gt_occ_semantic[:, -1]!=0]\n        gt_occ_semantic = gt_occ_semantic[::2]\n        gt_occ_semantic_refine = np.zeros_like(gt_occ_semantic)\n        gt_occ_semantic_refine[:, 0] = gt_occ_semantic[:, 2]\n        
gt_occ_semantic_refine[:, 1] = gt_occ_semantic[:, 1]\n        gt_occ_semantic_refine[:, 2] = gt_occ_semantic[:, 0]\n        gt_occ_semantic_refine[:, 3] = 1\n\n        gt_mo_semantic =  np.load(cam4docc_path+file_,allow_pickle=True)['arr_0']\n\n        gt_mo_semantic_to_draw=np.zeros((0,4))\n        for t in range(0,4):\n            gt_mo_cur = gt_mo_semantic[t]\n            gt_mo_cur = np.array(gt_mo_cur)\n            gt_mo_cur = gt_mo_cur[::2]\n            if show_time_change:\n                gt_mo_cur[:, -1] = int(t+1)\n            gt_mo_semantic_to_draw = np.concatenate((gt_mo_semantic_to_draw, gt_mo_cur))\n\n        viz_occ(gt_occ_semantic_refine, gt_mo_semantic_to_draw, file_, voxel_size=0.2, show_occ=True, show_time_change=show_time_change)\n\n        index += 1\n\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "viz/viz_pred.py",
    "content": "from tqdm import tqdm\nimport pickle\nimport numpy as np\nfrom mayavi import mlab\nfrom tqdm import trange\nimport os\nfrom xvfbwrapper import Xvfb\n\nmlab.options.offscreen = True\n\ndef viz_occ(occ, occ_mo, file_name, voxel_size, show_occ, show_time_change):\n\n    vdisplay = Xvfb(width=1, height=1)\n    vdisplay.start()\n\n    mlab.figure(size=(800,800), bgcolor=(1,1,1))\n\n    plt_plot_occ = mlab.points3d(\n        occ[:, 0] * voxel_size,\n        occ[:, 1] * voxel_size,\n        occ[:, 2] * voxel_size,\n        occ[:, 3],\n        colormap=\"viridis\",\n        scale_factor=voxel_size - 0.05 * voxel_size,\n        mode=\"cube\",\n        opacity=0.9,\n        vmin=1,\n    )\n    colors_occ = np.array(\n        [\n            [152, 251, 152, 255],\n            [152, 251, 152, 255],\n            [152, 251, 152, 255],\n            [152, 251, 152, 255],\n            [152, 251, 152, 255],\n        ]\n    ).astype(np.uint8)    \n    plt_plot_occ.glyph.scale_mode = \"scale_by_vector\"\n    plt_plot_occ.module_manager.scalar_lut_manager.lut.table = colors_occ\n\n    plt_plot_mov = mlab.points3d(\n        occ_mo[:, 0] * voxel_size,\n        occ_mo[:, 1] * voxel_size,\n        occ_mo[:, 2] * voxel_size,\n        occ_mo[:, 3],\n        colormap=\"viridis\",\n        scale_factor=voxel_size - 0.05 * voxel_size,\n        mode=\"cube\",\n        opacity=0.9,\n        vmin=1,\n    )\n    if show_time_change:\n        colors_occ_mo = np.array(\n            [\n                [255, 70, 255, 255],\n                [255, 110, 255, 255],\n                [255, 150, 255, 255],\n                [255, 190, 255, 255],\n                [255, 250, 250, 255],\n            ]\n        ).astype(np.uint8)\n    else:\n        colors_occ_mo = np.array(\n            [\n                [220, 20, 60, 255],\n                [255, 127, 80, 255],\n                [0, 0, 230, 255],\n                [255, 158, 0, 255],\n                [233, 150, 70, 255],\n                [47, 79, 79, 255],\n                [255, 99, 71, 255],\n                [175, 0, 75, 255],\n                [255, 61, 99, 255],\n            ]\n        ).astype(np.uint8)    \n    plt_plot_mov.glyph.scale_mode = \"scale_by_vector\"\n    plt_plot_mov.module_manager.scalar_lut_manager.lut.table = colors_occ_mo\n\n    fig_dir = \"./figs\"\n    if not os.path.exists(fig_dir):\n        os.mkdir(fig_dir)\n    mlab.savefig(os.path.join(fig_dir, file_name[:-4]+\".png\"))\n    vdisplay.stop()\n\n\ndef main():\n\n    show_time_change = True\n\n    nuscocc_path = \"../data/nuScenes-Occupancy/\"\n    pred_path = \"../data/cam4docc/results/\"\n\n    segmentation_files = os.listdir(pred_path)\n    segmentation_files.sort(key=lambda x: (x.split(\"_\")[1]))\n    index = 0\n\n    segmentation_files = segmentation_files[::10]\n    for file_ in tqdm(segmentation_files):\n\n        scene_token = file_.split(\"_\")[0]\n        lidar_token = file_.split(\"_\")[1]\n        gt_file = nuscocc_path+\"scene_\"+scene_token+\"/occupancy/\"+lidar_token[:-4]+\".npy\"\n        gt_occ_semantic =  np.load(gt_file,allow_pickle=True)\n        gt_occ_semantic = gt_occ_semantic[gt_occ_semantic[:, -1]!=0]\n        gt_occ_semantic = gt_occ_semantic[::2]\n        gt_occ_semantic_refine = np.zeros_like(gt_occ_semantic)\n        gt_occ_semantic_refine[:, 0] = gt_occ_semantic[:, 2]\n        gt_occ_semantic_refine[:, 1] = gt_occ_semantic[:, 1]\n        gt_occ_semantic_refine[:, 2] = gt_occ_semantic[:, 0]\n        gt_occ_semantic_refine[:, 3] = 1\n\n\n        pred_mo_semantic =  
np.load(pred_path+file_,allow_pickle=True)['arr_0']\n        pred_mo_semantic_to_draw=np.zeros((0,4))\n        for t in range(0,4):\n            pred_mo_cur = pred_mo_semantic[t]\n            pred_mo_cur = np.array(pred_mo_cur)\n            pred_mo_cur = pred_mo_cur[::2]\n            if show_time_change:\n                pred_mo_cur[:, -1] = int(t+1)\n            pred_mo_semantic_to_draw = np.concatenate((pred_mo_semantic_to_draw, pred_mo_cur))\n\n        viz_occ(gt_occ_semantic_refine, pred_mo_semantic_to_draw, file_, voxel_size=0.2, show_occ=True, show_time_change=show_time_change)\n\n        index += 1\n\n\nif __name__ == \"__main__\":\n    main()\n    # export QT_QPA_PLATFORM='offscreen' "
  }
]