Repository: MCG-NJU/SparseOcc
Branch: main
Commit: af4d9df83bfd
Files: 58
Total size: 409.0 KB
Directory structure:
gitextract_8j2gi4im/
├── .gitignore
├── LICENSE
├── README.md
├── configs/
│   ├── r50_nuimg_704x256_8f.py
│   ├── r50_nuimg_704x256_8f_60e.py
│   ├── r50_nuimg_704x256_8f_openocc.py
│   └── r50_nuimg_704x256_8f_pano.py
├── gen_instance_info.py
├── gen_sweep_info.py
├── lib/
│   └── dvr/
│       ├── dvr.cpp
│       └── dvr.cu
├── loaders/
│   ├── __init__.py
│   ├── builder.py
│   ├── ego_pose_dataset.py
│   ├── nuscenes_dataset.py
│   ├── nuscenes_occ_dataset.py
│   ├── old_metrics.py
│   ├── pipelines/
│   │   ├── __init__.py
│   │   ├── loading.py
│   │   └── transforms.py
│   ├── ray_metrics.py
│   └── ray_pq.py
├── models/
│   ├── __init__.py
│   ├── backbones/
│   │   ├── __init__.py
│   │   └── vovnet.py
│   ├── bbox/
│   │   ├── __init__.py
│   │   ├── assigners/
│   │   │   ├── __init__.py
│   │   │   └── hungarian_assigner_3d.py
│   │   ├── coders/
│   │   │   ├── __init__.py
│   │   │   └── nms_free_coder.py
│   │   ├── match_costs/
│   │   │   ├── __init__.py
│   │   │   └── match_cost.py
│   │   └── utils.py
│   ├── checkpoint.py
│   ├── csrc/
│   │   ├── __init__.py
│   │   ├── msmv_sampling/
│   │   │   ├── msmv_sampling.cpp
│   │   │   ├── msmv_sampling.h
│   │   │   ├── msmv_sampling_backward.cu
│   │   │   └── msmv_sampling_forward.cu
│   │   ├── setup.py
│   │   └── wrapper.py
│   ├── loss_utils.py
│   ├── matcher.py
│   ├── sparse_voxel_decoder.py
│   ├── sparsebev_head.py
│   ├── sparsebev_sampling.py
│   ├── sparsebev_transformer.py
│   ├── sparseocc.py
│   ├── sparseocc_head.py
│   ├── sparseocc_transformer.py
│   └── utils.py
├── old_metrics.py
├── ray_metrics.py
├── timing.py
├── train.py
├── utils.py
├── val.py
└── viz_prediction.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Compiled source
build
debug
Debug
release
Release
x64
*.so
*.whl
# VS project files
*.sln
*.vcxproj
*.vcxproj.filters
*.vcxproj.user
*.rc
.vs
# Byte-compiled / optimized / DLL files
*__pycache__*
*.py[cod]
*$py.class
# Distribution / packaging
.Python
build
develop-eggs
dist
downloads
# IDE
.idea
.vscode
pyrightconfig.json
# Custom
data
outputs
prediction
submission
checkpoints
pretrain
ckpts
occ_result
wandb
================================================
FILE: LICENSE
================================================
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
================================================
FILE: README.md
================================================
# SparseOcc
This is the official PyTorch implementation for our paper:
> [**Fully Sparse 3D Panoptic Occupancy Prediction**](https://arxiv.org/abs/2312.17118)<br>
> :school: Presented by Nanjing University and Shanghai AI Lab<br>
> :email: Primary contact: Haisong Liu (afterthat97@gmail.com)<br>
> :trophy: [CVPR 2024 Autonomous Driving Challenge - Occupancy and Flow](https://opendrivelab.com/challenge2024/#occupancy_and_flow)<br>
> :book: Chinese write-up (official): https://zhuanlan.zhihu.com/p/709576252<br>
> :book: Chinese write-ups (third-party): [AIming](https://zhuanlan.zhihu.com/p/691549750), [自动驾驶之心](https://zhuanlan.zhihu.com/p/675811281)
## :warning: Important Notes
There is another concurrent project titled *"SparseOcc: Rethinking sparse latent representation for vision-based semantic occupancy prediction"* by Tang et al., which shares the same name SparseOcc with ours. However, this repository is **unrelated** to the aforementioned paper.
If you cite our research, please ensure that you reference the correct version (arXiv **2312.17118**, authored by **Liu et al.**):
```
@article{liu2023fully,
  title={Fully sparse 3d panoptic occupancy prediction},
  author={Liu, Haisong and Wang, Haiguang and Chen, Yang and Yang, Zetong and Zeng, Jia and Chen, Li and Wang, Limin},
  journal={arXiv preprint arXiv:2312.17118},
  year={2023}
}
```
> In arXiv 2312.17118v3, we removed the word "panoptic" from the title. However, Google Scholar's database has not been updated and still shows the old one. Therefore, we still recommend citing the old title - "Fully sparse 3d panoptic occupancy prediction" - so that Google Scholar can index it correctly. Thank you all.
## News
* **2024-07-19**: We released an updated version of SparseOcc on [arXiv](https://arxiv.org/abs/2312.17118). All charts and colors have been carefully adjusted. Delete the old version and download the new one!
* **2024-07-01**: SparseOcc is accepted to ECCV 2024.
* **2024-06-27**: SparseOcc v1.1 is released. In this change, we introduce BEV data augmentation (BDA) and Lovasz-Softmax loss to further enhance the performance. Compared with [v1.0](https://github.com/MCG-NJU/SparseOcc/tree/v1.0) (35.0 RayIoU with 48 epochs), SparseOcc v1.1 can achieve 36.8 RayIoU with 24 epochs!
* **2024-05-29**: We add support for [OpenOcc v2](configs/r50_nuimg_704x256_8f_openocc.py) dataset (without occupancy flow).
* **2024-04-11**: The panoptic version of SparseOcc ([configs/r50_nuimg_704x256_8f_pano.py](configs/r50_nuimg_704x256_8f_pano.py)) is released.
* **2024-04-09**: An updated arXiv version [https://arxiv.org/abs/2312.17118v3](https://arxiv.org/abs/2312.17118v3) has been released.
* **2024-03-31**: We release the code and pretrained weights.
* **2023-12-30**: We release the paper.
## Highlights
**New model** :1st_place_medal:: SparseOcc initially reconstructs a sparse 3D representation from visual inputs and subsequently predicts semantic/instance occupancy from this sparse representation using sparse queries.

**New evaluation metric** :chart_with_upwards_trend:: We design a thoughtful ray-based evaluation metric, namely RayIoU, to resolve the inconsistent depth penalty inherent in the traditional voxel-level mIoU criterion.

Some FAQs from the community about the evaluation metrics:
1. **Why does training with visible masks result in significant improvements in the old mIoU metric, but not in the new RayIoU metric?** As mentioned in the paper, when the visible mask is used during training, the area behind the surface is not supervised, so the model tends to fill this area with duplicated predictions, producing a thicker surface. The old metric penalizes inconsistently along the depth axis when the prediction has a thick surface, so this ''improvement'' mainly stems from a vulnerability of the old metric.
2. **Why can't SparseOcc exploit the vulnerability of the old metric?** Since SparseOcc employs a fully sparse architecture, it always predicts a thin surface. Thus, there are two ways to make a fair comparison: (a) use the old metric, but require all methods to predict a thin surface, which implies they cannot use the visible mask during training; or (b) use RayIoU, which is more principled and compares thick and thin surfaces fairly. Our method achieves SOTA performance in both cases.
3. **Does RayIoU overlook interior reconstruction?** First, we are unable to obtain interior occupancy ground truth, because the ground truth is derived by voxelizing LiDAR point clouds, and LiDAR can only scan the thin surface of an object. Second, the query ray in RayIoU can originate from any position within the scene (see the figure above), which allows it to evaluate overall reconstruction performance, unlike depth estimation. We would like to emphasize that the evaluation logic of RayIoU aligns with the ground-truth generation process.
If you have other questions, feel free to contact me (Haisong Liu, afterthat97@gmail.com).
## Model Zoo
These results are from our latest version, v1.1, which outperforms the results reported in the paper. Additionally, our implementation differs slightly from the original paper. If you wish to reproduce the paper exactly, please refer to the [v1.0](https://github.com/MCG-NJU/SparseOcc/tree/v1.0) tag.
| Setting | Epochs | Training Cost | RayIoU | RayPQ | FPS | Weights |
|----------|:--------:|:-------------:|:------:|:-----:|:---:|:-------:|
| [r50_nuimg_704x256_8f](configs/r50_nuimg_704x256_8f.py) | 24 | 15h, ~12GB | 36.8 | - | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_24e_v1.1.pth) |
| [r50_nuimg_704x256_8f_60e](configs/r50_nuimg_704x256_8f_60e.py) | 60 | 37h, ~12GB | 37.7 | - | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_60e_v1.1.pth) |
| [r50_nuimg_704x256_8f_pano](configs/r50_nuimg_704x256_8f_pano.py) | 24 | 15h, ~12GB | 35.9 | 14.0 | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_pano_24e_v1.1.pth) |
* The backbone is pretrained on [nuImages](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth). Download the weights to `pretrain/xxx.pth` before you start training.
* FPS is measured with Intel(R) Xeon(R) Platinum 8369B CPU and NVIDIA A100-SXM4-80GB GPU (PyTorch `fp32` backend, including data loading).
* We will release more settings in the future.
## Environment
> The requirements are the same as those of [SparseBEV](https://github.com/MCG-NJU/SparseBEV).
Install PyTorch 2.0 + CUDA 11.8:
```
conda create -n sparseocc python=3.8
conda activate sparseocc
conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia
```
Install other dependencies:
```
pip install openmim
mim install mmcv-full==1.6.0
mim install mmdet==2.28.2
mim install mmsegmentation==0.30.0
mim install mmdet3d==1.0.0rc6
pip install setuptools==59.5.0
pip install numpy==1.23.5
```
Install turbojpeg and pillow-simd to speed up data loading (optional but important):
```
sudo apt-get update
sudo apt-get install -y libturbojpeg
pip install pyturbojpeg
pip uninstall pillow
pip install pillow-simd==9.0.0.post1
```
Compile CUDA extensions:
```
cd models/csrc
python setup.py build_ext --inplace
```
## Prepare Dataset
> The first two steps are the same as those of [SparseBEV](https://github.com/MCG-NJU/SparseBEV).
1. Download nuScenes from [https://www.nuscenes.org/nuscenes](https://www.nuscenes.org/nuscenes), put it to `data/nuscenes` and preprocess it with [mmdetection3d](https://github.com/open-mmlab/mmdetection3d/tree/v1.0.0rc6).
2. Download the generated info file from [gdrive](https://drive.google.com/file/d/1uyoUuSRIVScrm_CUpge6V_UzwDT61ODO/view?usp=sharing) and unzip it. These `*.pkl` files can also be generated with our script: `gen_sweep_info.py`.
3. Download Occ3D-nuScenes occupancy GT from [gdrive](https://drive.google.com/file/d/1kiXVNSEi3UrNERPMz_CfiJXKkgts_5dY/view?usp=drive_link), unzip it, and save it to `data/nuscenes/occ3d`.
4. Folder structure:
```
data/nuscenes
├── maps
├── nuscenes_infos_test_sweep.pkl
├── nuscenes_infos_train_sweep.pkl
├── nuscenes_infos_val_sweep.pkl
├── samples
├── sweeps
├── v1.0-test
├── v1.0-trainval
└── occ3d
    ├── scene-0001
    │   ├── 0037a705a2e04559b1bba6c01beca1cf
    │   │   └── labels.npz
    │   ├── 026155aa1c554e2f87914ec9ba80acae
    │   │   └── labels.npz
    ...
```
5. (Optional) Generate the panoptic occupancy ground truth with `gen_instance_info.py`. The panoptic version of Occ3D will be saved to `data/nuscenes/occ3d_panoptic`.
## Training
Train SparseOcc with 8 GPUs:
```
torchrun --nproc_per_node 8 train.py --config configs/r50_nuimg_704x256_8f.py
```
Train SparseOcc with 4 GPUs (i.e., the last four GPUs):
```
export CUDA_VISIBLE_DEVICES=4,5,6,7
torchrun --nproc_per_node 4 train.py --config configs/r50_nuimg_704x256_8f.py
```
The batch size for each GPU will be scaled automatically, so there is no need to modify `batch_size` in the config files.
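The scaling above can be sketched as follows; this is a minimal illustration of the idea, not the project's actual launcher code, and `per_gpu_batch_size` is a hypothetical helper name:

```python
def per_gpu_batch_size(global_batch_size: int, world_size: int) -> int:
    """Split the config's global batch size evenly across the launched GPUs."""
    assert global_batch_size % world_size == 0, "batch size must divide evenly"
    return global_batch_size // world_size

# The default config sets batch_size = 8:
print(per_gpu_batch_size(8, 8))  # 8 GPUs -> 1 sample per GPU
print(per_gpu_batch_size(8, 4))  # 4 GPUs -> 2 samples per GPU
```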
## Evaluation
Single-GPU evaluation:
```
export CUDA_VISIBLE_DEVICES=0
python val.py --config configs/r50_nuimg_704x256_8f.py --weights checkpoints/sparseocc_r50_nuimg_704x256_8f.pth
```
Multi-GPU evaluation:
```
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
torchrun --nproc_per_node 8 val.py --config configs/r50_nuimg_704x256_8f.py --weights checkpoints/sparseocc_r50_nuimg_704x256_8f.pth
```
## Standalone Evaluation
If you want to evaluate your own model using RayIoU, please follow the steps below:
1. Save the predictions (shape `[200, 200, 16]`, dtype `np.uint8`) in the compressed `npz` format. For example:
```
import os
import numpy as np
save_path = os.path.join(save_dir, sample_token + '.npz')
np.savez_compressed(save_path, pred=sem_pred)
```
2. The filename for each sample is `sample_token.npz`, for example:
```
prediction/your_model
├── 000681a060c04755a1537cf83b53ba57.npz
├── 000868a72138448191b4092f75ed7776.npz
├── 0017c2623c914571a1ff2a37f034ffd7.npz
├── ...
```
3. Run `ray_metrics.py` to compute RayIoU:
```
python ray_metrics.py --pred-dir prediction/your_model
```
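Before running the evaluation, it can be worth sanity-checking the file format; a minimal sketch assuming only NumPy, using a dummy token rather than a real nuScenes sample (class id 17 corresponds to `free` in the default config's `occ_class_names`):

```python
import os
import tempfile

import numpy as np

# Write a dummy prediction in the expected format and read it back.
sem_pred = np.full((200, 200, 16), 17, dtype=np.uint8)  # all 'free' voxels
with tempfile.TemporaryDirectory() as save_dir:
    save_path = os.path.join(save_dir, 'dummy_sample_token.npz')
    np.savez_compressed(save_path, pred=sem_pred)
    loaded = np.load(save_path)['pred']

assert loaded.shape == (200, 200, 16) and loaded.dtype == np.uint8
print('format ok')
```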
## Timing
FPS is measured with a single GPU:
```
export CUDA_VISIBLE_DEVICES=0
python timing.py --config configs/r50_nuimg_704x256_8f.py --weights checkpoints/sparseocc_r50_nuimg_704x256_8f.pth
```
## Acknowledgements
Many thanks to these excellent open-source projects:
* [MaskFormer](https://github.com/facebookresearch/MaskFormer)
* [NeuralRecon](https://github.com/zju3dv/NeuralRecon)
* [4D-Occ](https://github.com/tarashakhurana/4d-occ-forecasting)
* [MMDetection3D](https://github.com/open-mmlab/mmdetection3d)
================================================
FILE: configs/r50_nuimg_704x256_8f.py
================================================
dataset_type = 'NuSceneOcc'
dataset_root = 'data/nuscenes/'
occ_gt_root = 'data/nuscenes/occ3d'
# If point cloud range is changed, the models should also change their point
# cloud range accordingly
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
occ_size = [200, 200, 16]
img_norm_cfg = dict(
mean=[123.675, 116.280, 103.530],
std=[58.395, 57.120, 57.375],
to_rgb=True
)
# For nuScenes we usually do 10-class detection
det_class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
occ_class_names = [
'others', 'barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation', 'free'
]
input_modality = dict(
use_lidar=False,
use_camera=True,
use_radar=False,
use_map=False,
use_external=False
)
_dim_ = 256
_num_points_ = 4
_num_groups_ = 4
_num_layers_ = 2
_num_frames_ = 8
_num_queries_ = 100
_topk_training_ = [4000, 16000, 64000]
_topk_testing_ = [2000, 8000, 32000]
model = dict(
type='SparseOcc',
data_aug=dict(
img_color_aug=True, # Move some augmentations to GPU
img_norm_cfg=img_norm_cfg,
img_pad_cfg=dict(size_divisor=32)),
use_mask_camera=False,
img_backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
norm_cfg=dict(type='BN2d', requires_grad=True),
norm_eval=True,
style='pytorch',
with_cp=True),
img_neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=_dim_,
num_outs=4),
pts_bbox_head=dict(
type='SparseOccHead',
class_names=occ_class_names,
embed_dims=_dim_,
occ_size=occ_size,
pc_range=point_cloud_range,
transformer=dict(
type='SparseOccTransformer',
embed_dims=_dim_,
num_layers=_num_layers_,
num_frames=_num_frames_,
num_points=_num_points_,
num_groups=_num_groups_,
num_queries=_num_queries_,
num_levels=4,
num_classes=len(occ_class_names),
pc_range=point_cloud_range,
occ_size=occ_size,
topk_training=_topk_training_,
topk_testing=_topk_testing_),
loss_cfgs=dict(
loss_mask2former=dict(
type='Mask2FormerLoss',
num_classes=len(occ_class_names),
no_class_weight=0.1,
loss_cls_weight=2.0,
loss_mask_weight=5.0,
loss_dice_weight=5.0,
),
loss_geo_scal=dict(
type='GeoScalLoss',
num_classes=len(occ_class_names),
loss_weight=1.0
),
loss_sem_scal=dict(
type='SemScalLoss',
num_classes=len(occ_class_names),
loss_weight=1.0
)
),
),
)
ida_aug_conf = {
'resize_lim': (0.38, 0.55),
'final_dim': (256, 704),
'bot_pct_lim': (0.0, 0.0),
'rot_lim': (0.0, 0.0),
'H': 900, 'W': 1600,
'rand_flip': True,
}
bda_aug_conf = dict(
rot_lim=(-22.5, 22.5),
scale_lim=(1., 1.),
flip_dx_ratio=0.5,
flip_dy_ratio=0.5
)
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=True),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names)),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=True),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'], # other keys: 'mask_camera'
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1, test_mode=True),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=False),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names)),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=False),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'],
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
data = dict(
workers_per_gpu=8,
train=dict(
type=dataset_type,
data_root=dataset_root,
occ_gt_root=occ_gt_root,
ann_file=dataset_root + 'nuscenes_infos_train_sweep.pkl',
pipeline=train_pipeline,
classes=det_class_names,
modality=input_modality,
test_mode=False
),
val=dict(
type=dataset_type,
data_root=dataset_root,
occ_gt_root=occ_gt_root,
ann_file=dataset_root + 'nuscenes_infos_val_sweep.pkl',
pipeline=test_pipeline,
classes=det_class_names,
modality=input_modality,
test_mode=True
),
test=dict(
type=dataset_type,
data_root=dataset_root,
occ_gt_root=occ_gt_root,
ann_file=dataset_root + 'nuscenes_infos_test_sweep.pkl',
pipeline=test_pipeline,
classes=det_class_names,
modality=input_modality,
test_mode=True
),
)
optimizer = dict(
type='AdamW',
lr=5e-4,
paramwise_cfg=dict(
custom_keys={
'img_backbone': dict(lr_mult=0.1),
'sampling_offset': dict(lr_mult=0.1),
}),
weight_decay=0.01
)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
by_epoch=True,
step=[22, 24],
gamma=0.2
)
total_epochs = 24
batch_size = 8
# load pretrained weights
load_from = 'pretrain/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth'
revise_keys = [('backbone', 'img_backbone')]
# resume the last training
resume_from = None
# checkpointing
checkpoint_config = dict(interval=1, max_keep_ckpts=1)
# logging
log_config = dict(
interval=1,
hooks=[
dict(type='MyTextLoggerHook', interval=1, reset_flag=True),
dict(type='MyTensorboardLoggerHook', interval=500, reset_flag=True)
]
)
# evaluation
eval_config = dict(interval=total_epochs)
# other flags
debug = False
================================================
FILE: configs/r50_nuimg_704x256_8f_60e.py
================================================
_base_ = ['./r50_nuimg_704x256_8f.py']
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
by_epoch=True,
step=[48, 60],
gamma=0.2
)
total_epochs = 60
# evaluation
eval_config = dict(interval=total_epochs)
================================================
FILE: configs/r50_nuimg_704x256_8f_openocc.py
================================================
_base_ = ['./r50_nuimg_704x256_8f.py']
occ_gt_root = 'data/nuscenes/openocc_v2'
det_class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
occ_class_names = [
'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation', 'free'
]
_num_frames_ = 8
model = dict(
pts_bbox_head=dict(
class_names=occ_class_names,
transformer=dict(
num_classes=len(occ_class_names)),
loss_cfgs=dict(
loss_mask2former=dict(
num_classes=len(occ_class_names)
),
),
),
)
ida_aug_conf = {
'resize_lim': (0.38, 0.55),
'final_dim': (256, 704),
'bot_pct_lim': (0.0, 0.0),
'rot_lim': (0.0, 0.0),
'H': 900, 'W': 1600,
'rand_flip': False,
}
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names)),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=True),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'], # other keys: 'mask_camera'
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1, test_mode=True),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names)),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=False),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'],
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
data = dict(
workers_per_gpu=8,
train=dict(
pipeline=train_pipeline,
occ_gt_root=occ_gt_root
),
val=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root
),
test=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root
),
)
================================================
FILE: configs/r50_nuimg_704x256_8f_pano.py
================================================
_base_ = ['./r50_nuimg_704x256_8f.py']
occ_gt_root = 'data/nuscenes/occ3d_panoptic'
# For nuScenes we usually do 10-class detection
det_class_names = [
'car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'barrier',
'motorcycle', 'bicycle', 'pedestrian', 'traffic_cone'
]
occ_class_names = [
'others', 'barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation', 'free'
]
_num_frames_ = 8
model = dict(
pts_bbox_head=dict(
panoptic=True
)
)
ida_aug_conf = {
'resize_lim': (0.38, 0.55),
'final_dim': (256, 704),
'bot_pct_lim': (0.0, 0.0),
'rot_lim': (0.0, 0.0),
'H': 900, 'W': 1600,
'rand_flip': True,
}
bda_aug_conf = dict(
rot_lim=(-22.5, 22.5),
scale_lim=(1., 1.),
flip_dx_ratio=0.5,
flip_dy_ratio=0.5
)
train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=True),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names), inst_class_ids=[2, 3, 4, 5, 6, 7, 9, 10]),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=True),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'], # other keys: 'mask_camera'
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1, test_mode=True),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=False),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names), inst_class_ids=[2, 3, 4, 5, 6, 7, 9, 10]),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=False),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
dict(type='Collect3D', keys=['img', 'voxel_semantics', 'voxel_instances', 'instance_class_ids'],
meta_keys=('filename', 'ori_shape', 'img_shape', 'pad_shape', 'lidar2img', 'img_timestamp', 'ego2lidar'))
]
data = dict(
workers_per_gpu=8,
train=dict(
pipeline=train_pipeline,
occ_gt_root=occ_gt_root
),
val=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root
),
test=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root
),
)
================================================
FILE: gen_instance_info.py
================================================
import os
import tqdm
import glob
import pickle
import argparse
import numpy as np
import torch
import multiprocessing
from pyquaternion import Quaternion
from nuscenes.utils.data_classes import Box
from nuscenes.utils.geometry_utils import points_in_box
parser = argparse.ArgumentParser()
parser.add_argument('--nusc-root', default='data/nuscenes')
parser.add_argument('--occ3d-root', default='data/nuscenes/occ3d')
parser.add_argument('--output-dir', default='data/nuscenes/occ3d_panoptic')
parser.add_argument('--version', default='v1.0-trainval')
args = parser.parse_args()
token2path = {}
for gt_path in glob.glob(os.path.join(args.occ3d_root, '*/*/*.npz')):
token = gt_path.split('/')[-2]
token2path[token] = gt_path
occ_class_names = [
'others', 'barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation', 'free'
]
det_class_names = [
'car', 'truck', 'trailer', 'bus', 'construction_vehicle',
'bicycle', 'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
]
def convert_to_nusc_box(bboxes, lift_center=False, wlh_margin=0.0):
results = []
for q in range(bboxes.shape[0]):
bbox = bboxes[q].copy()
if lift_center:
bbox[2] += bbox[5] * 0.5
bbox_yaw = -bbox[6] - np.pi / 2
orientation = Quaternion(axis=[0, 0, 1], radians=bbox_yaw).inverse
box = Box(
center=[bbox[0], bbox[1], bbox[2]],
# 0.8 in pc range is roughly 2 voxels in the occ grid;
# enlarge the bbox to include voxels on the edge
size=[bbox[3]+wlh_margin, bbox[4]+wlh_margin, bbox[5]+wlh_margin],
orientation=orientation,
)
results.append(box)
return results
def meshgrid3d(occ_size, pc_range): # points in ego coord
W, H, D = occ_size
xs = torch.linspace(0.5, W - 0.5, W).view(W, 1, 1).expand(W, H, D) / W
ys = torch.linspace(0.5, H - 0.5, H).view(1, H, 1).expand(W, H, D) / H
zs = torch.linspace(0.5, D - 0.5, D).view(1, 1, D).expand(W, H, D) / D
xs = xs * (pc_range[3] - pc_range[0]) + pc_range[0]
ys = ys * (pc_range[4] - pc_range[1]) + pc_range[1]
zs = zs * (pc_range[5] - pc_range[2]) + pc_range[2]
xyz = torch.stack((xs, ys, zs), -1)
return xyz
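The normalization in `meshgrid3d` places each voxel center half a voxel above the minimum of the point-cloud range. A minimal pure-Python sanity check, using the `pc_range` and `occ_size` values from `process_add_instance_info`:

```python
# Same normalization as meshgrid3d: the center of voxel index 0 along an axis
# sits half a voxel above the minimum of the point-cloud range.
pc_range = [-40, -40, -1.0, 40, 40, 5.4]   # ranges used in process_add_instance_info
W, H, D = 200, 200, 16                     # occ_size used in process_add_instance_info

x0 = 0.5 / W * (pc_range[3] - pc_range[0]) + pc_range[0]
z0 = 0.5 / D * (pc_range[5] - pc_range[2]) + pc_range[2]
print(x0, z0)  # first voxel center: x = -39.8 m, z = -0.8 m (0.4 m voxels on both axes)
```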
def process_add_instance_info(sample):
point_cloud_range = [-40, -40, -1.0, 40, 40, 5.4]
occ_size = [200, 200, 16]
num_classes = 18
occ_gt_path = token2path[sample['token']]
occ_labels = np.load(occ_gt_path)
occ_gt = occ_labels['semantics']
gt_boxes = sample['gt_boxes']
gt_names = sample['gt_names']
bboxes = convert_to_nusc_box(gt_boxes)
instance_gt = np.zeros(occ_gt.shape).astype(np.uint8)
instance_id = 1
pts = meshgrid3d(occ_size, point_cloud_range).numpy()
# filter out free voxels to accelerate
valid_idx = np.where(occ_gt < num_classes - 1)
flatten_occ_gt = occ_gt[valid_idx]
flatten_inst_gt = instance_gt[valid_idx]
flatten_pts = pts[valid_idx]
instance_boxes = []
instance_class_ids = []
for i in range(len(gt_names)):
if gt_names[i] not in occ_class_names:
continue
occ_tag_id = occ_class_names.index(gt_names[i])
# Move box to ego vehicle coord system
bbox = bboxes[i]
bbox.rotate(Quaternion(sample['lidar2ego_rotation']))
bbox.translate(np.array(sample['lidar2ego_translation']))
mask = points_in_box(bbox, flatten_pts.transpose(1, 0))
# ignore voxels not belonging to this class
mask[mask] = (flatten_occ_gt[mask] == occ_tag_id)
# ignore voxels already occupied
mask[mask] = (flatten_inst_gt[mask] == 0)
# only instances with at least 1 voxel will be recorded
if mask.sum() > 0:
flatten_inst_gt[mask] = instance_id
instance_id += 1
# enlarge boxes to include voxels on the edge
new_box = bbox.copy()
new_box.wlh = new_box.wlh + 0.8
instance_boxes.append(new_box)
instance_class_ids.append(occ_tag_id)
# classes that should be viewed as one instance
all_class_ids_unique = np.unique(occ_gt)
for i, class_name in enumerate(occ_class_names):
if class_name in det_class_names or class_name == 'free' or i not in all_class_ids_unique:
continue
flatten_inst_gt[flatten_occ_gt == i] = instance_id
instance_id += 1
# post-process occupied voxels not yet covered by any instance
uncover_idx = np.where(flatten_inst_gt == 0)
uncover_pts = flatten_pts[uncover_idx]
uncover_inst_gt = np.zeros_like(uncover_pts[..., 0]).astype(np.uint8)
uncover_occ_gt = flatten_occ_gt[uncover_idx]
# uncover_inst_dist records the distance between each voxel and the center of its current nearest bbox
uncover_inst_dist = np.ones_like(uncover_pts[..., 0]) * 1e8
for i, box in enumerate(instance_boxes):
# important: non-background instance ids start from 1
inst_id = i + 1
class_id = instance_class_ids[i]
mask = points_in_box(box, uncover_pts.transpose(1, 0))
# mask out voxels not belonging to this class
mask[uncover_occ_gt != class_id] = False
dist = np.sum((box.center - uncover_pts) ** 2, axis=-1)
# ignore voxels already assigned to a closer box's instance, and voxels outside this box;
# `mask[dist >= uncover_inst_dist] = False` is correct because it only flips True to False,
# equivalent to the more explicit `mask[mask & (dist >= uncover_inst_dist)] = False`
mask[dist >= uncover_inst_dist] = False
# important: only voxels inside the box (mask = True) with no closer identical-class box update their dist
uncover_inst_dist[mask] = dist[mask]
uncover_inst_gt[mask] = inst_id
flatten_inst_gt[uncover_idx] = uncover_inst_gt
instance_gt[valid_idx] = flatten_inst_gt
# not using this checking function yet
# assert (instance_gt == 0).sum() - (occ_gt == num_classes-1).sum() < 100, "too many non-free voxels are not assigned to any instance in %s"%(occ_gt_path)
# global max_margin
# if max_margin < (instance_gt == 0).sum() - (occ_gt == num_classes-1).sum():
# print("###### new max margin: ", max(max_margin, (instance_gt == 0).sum() - (occ_gt == num_classes-1).sum()))
# max_margin = max(max_margin, (instance_gt == 0).sum() - (occ_gt == num_classes-1).sum())
# save to original path
data_split = occ_gt_path.split(os.path.sep)[-3:]
data_path = os.path.sep.join(data_split)
##### Warning: using args.xxx (a global variable) here is strongly discouraged
save_path = os.path.join(args.output_dir, data_path)
save_dir = os.path.split(save_path)[0]
if not os.path.exists(save_dir):
os.makedirs(save_dir)
if np.unique(instance_gt).shape[0] != instance_gt.max()+1:
print('warning: some instance masks are fully covered by later ones in %s' % save_dir)
# only semantic and mask information needs to be retained
retain_keys = ['semantics', 'mask_lidar', 'mask_camera']
new_occ_labels = {k: occ_labels[k] for k in retain_keys}
new_occ_labels['instances'] = instance_gt
np.savez_compressed(save_path, **new_occ_labels)
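The distance-based tie-breaking in `process_add_instance_info` can be seen in isolation: an uncovered voxel that falls inside two same-class boxes is assigned to the box whose center is closer. A minimal numpy sketch with hypothetical voxel positions, box centers, and a precomputed inside-box table standing in for `points_in_box`:

```python
import numpy as np

pts = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])      # two uncovered voxel centers
centers = np.array([[1.0, 0.0, 0.0], [2.5, 0.0, 0.0]])  # two same-class box centers
inside = np.array([[True, True], [True, True]])          # inside[i][j]: voxel j inside box i

inst_gt = np.zeros(len(pts), dtype=np.uint8)
inst_dist = np.full(len(pts), 1e8)  # squared distance to current nearest box center
for i, center in enumerate(centers):
    mask = inside[i].copy()
    dist = np.sum((center - pts) ** 2, axis=-1)
    # keep only voxels with no closer box assigned so far (flips True to False only)
    mask[dist >= inst_dist] = False
    inst_dist[mask] = dist[mask]
    inst_gt[mask] = i + 1  # non-background instance ids start from 1

print(inst_gt)  # voxel 0 -> instance 1 (closer center), voxel 1 -> instance 2
```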
def add_instance_info(sample_infos):
if not os.path.exists(args.output_dir):
os.makedirs(args.output_dir)
# use all CPU cores for multiprocessing
pool = multiprocessing.Pool(multiprocessing.cpu_count())
with tqdm.tqdm(total=len(sample_infos['infos'])) as pbar:
for _ in pool.imap(process_add_instance_info, sample_infos['infos']):
pbar.update(1)
pool.close()
pool.join()
if __name__ == '__main__':
if args.version == 'v1.0-trainval':
sample_infos = pickle.load(open(os.path.join(args.nusc_root, 'nuscenes_infos_train_sweep.pkl'), 'rb'))
add_instance_info(sample_infos)
sample_infos = pickle.load(open(os.path.join(args.nusc_root, 'nuscenes_infos_val_sweep.pkl'), 'rb'))
add_instance_info(sample_infos)
elif args.version == 'v1.0-test':
sample_infos = pickle.load(open(os.path.join(args.nusc_root, 'nuscenes_infos_test_sweep.pkl'), 'rb'))
add_instance_info(sample_infos)
else:
raise ValueError
================================================
FILE: gen_sweep_info.py
================================================
# Generate info files manually
import os
import mmcv
import tqdm
import pickle
import argparse
import numpy as np
from nuscenes import NuScenes
from pyquaternion import Quaternion
parser = argparse.ArgumentParser()
parser.add_argument('--data-root', default='data/nuscenes')
parser.add_argument('--version', default='v1.0-trainval')
args = parser.parse_args()
def get_cam_info(nusc, sample_data):
pose_record = nusc.get('ego_pose', sample_data['ego_pose_token'])
cs_record = nusc.get('calibrated_sensor', sample_data['calibrated_sensor_token'])
sensor2ego_translation = cs_record['translation']
ego2global_translation = pose_record['translation']
sensor2ego_rotation = Quaternion(cs_record['rotation']).rotation_matrix
ego2global_rotation = Quaternion(pose_record['rotation']).rotation_matrix
cam_intrinsic = np.array(cs_record['camera_intrinsic'])
sensor2global_rotation = sensor2ego_rotation.T @ ego2global_rotation.T
sensor2global_translation = sensor2ego_translation @ ego2global_rotation.T + ego2global_translation
return {
'data_path': os.path.join(args.data_root, sample_data['filename']),
'sensor2global_rotation': sensor2global_rotation,
'sensor2global_translation': sensor2global_translation,
'cam_intrinsic': cam_intrinsic,
'timestamp': sample_data['timestamp'],
}
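`get_cam_info` composes the two transforms in the row-vector convention, i.e. `p_global = p_sensor @ sensor2global_rotation + sensor2global_translation`. A minimal sketch (with hypothetical calibration values, not real nuScenes data) checking that this matches chaining sensor-to-ego and ego-to-global with column vectors:

```python
import numpy as np

def rot_z(a):
    """Rotation matrix about the z axis (column-vector convention)."""
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

# hypothetical calibration values
R_s2e, t_s2e = rot_z(np.pi / 2), np.array([1.0, 0.0, 0.5])
R_e2g, t_e2g = rot_z(np.pi / 4), np.array([10.0, 2.0, 0.0])

# composition as written in get_cam_info (row-vector convention)
R_s2g = R_s2e.T @ R_e2g.T
t_s2g = t_s2e @ R_e2g.T + t_e2g

p_sensor = np.array([1.0, 2.0, 3.0])
p_row = p_sensor @ R_s2g + t_s2g                    # row-vector form
p_col = R_e2g @ (R_s2e @ p_sensor + t_s2e) + t_e2g  # column-vector chaining
print(np.allclose(p_row, p_col))  # True: the two conventions agree
```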
def add_sweep_info(nusc, sample_infos):
for curr_id in tqdm.tqdm(range(len(sample_infos['infos']))):
sample = nusc.get('sample', sample_infos['infos'][curr_id]['token'])
cam_types = [
'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_BACK_RIGHT',
'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_FRONT_LEFT'
]
curr_cams = dict()
for cam in cam_types:
curr_cams[cam] = nusc.get('sample_data', sample['data'][cam])
for cam in cam_types:
sample_data = nusc.get('sample_data', sample['data'][cam])
sweep_cam = get_cam_info(nusc, sample_data)
sample_infos['infos'][curr_id]['cams'][cam].update(sweep_cam)
# remove unnecessary keys
for cam in cam_types:
del sample_infos['infos'][curr_id]['cams'][cam]['sensor2ego_translation']
del sample_infos['infos'][curr_id]['cams'][cam]['sensor2ego_rotation']
del sample_infos['infos'][curr_id]['cams'][cam]['ego2global_translation']
del sample_infos['infos'][curr_id]['cams'][cam]['ego2global_rotation']
sweep_infos = []
if sample['prev'] != '': # add sweep frames between two key frames
for _ in range(5):
sweep_info = dict()
for cam in cam_types:
if curr_cams[cam]['prev'] == '':
sweep_info = sweep_infos[-1]
break
sample_data = nusc.get('sample_data', curr_cams[cam]['prev'])
sweep_cam = get_cam_info(nusc, sample_data)
curr_cams[cam] = sample_data
sweep_info[cam] = sweep_cam
sweep_infos.append(sweep_info)
sample_infos['infos'][curr_id]['sweeps'] = sweep_infos
return sample_infos
if __name__ == '__main__':
nusc = NuScenes(args.version, args.data_root)
if args.version == 'v1.0-trainval':
sample_infos = pickle.load(open(os.path.join(args.data_root, 'nuscenes_infos_train.pkl'), 'rb'))
sample_infos = add_sweep_info(nusc, sample_infos)
mmcv.dump(sample_infos, os.path.join(args.data_root, 'nuscenes_infos_train_sweep.pkl'))
sample_infos = pickle.load(open(os.path.join(args.data_root, 'nuscenes_infos_val.pkl'), 'rb'))
sample_infos = add_sweep_info(nusc, sample_infos)
mmcv.dump(sample_infos, os.path.join(args.data_root, 'nuscenes_infos_val_sweep.pkl'))
elif args.version == 'v1.0-test':
sample_infos = pickle.load(open(os.path.join(args.data_root, 'nuscenes_infos_test.pkl'), 'rb'))
sample_infos = add_sweep_info(nusc, sample_infos)
mmcv.dump(sample_infos, os.path.join(args.data_root, 'nuscenes_infos_test_sweep.pkl'))
else:
raise ValueError
================================================
FILE: lib/dvr/dvr.cpp
================================================
// Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting
// Modified by Haisong Liu
#include <string>
#include <torch/extension.h>
#include <vector>
/*
* CUDA forward declarations
*/
std::vector<torch::Tensor> render_forward_cuda(torch::Tensor sigma,
torch::Tensor origin,
torch::Tensor points,
torch::Tensor tindex,
const std::vector<int> grid,
std::string phase_name);
std::vector<torch::Tensor>
render_cuda(torch::Tensor sigma, torch::Tensor origin, torch::Tensor points,
torch::Tensor tindex, std::string loss_name);
torch::Tensor init_cuda(torch::Tensor points, torch::Tensor tindex,
const std::vector<int> grid);
/*
* C++ interface
*/
#define CHECK_CUDA(x) \
TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
#define CHECK_CONTIGUOUS(x) \
TORCH_CHECK(x.is_contiguous(), #x " must be contiguous")
#define CHECK_INPUT(x) \
CHECK_CUDA(x); \
CHECK_CONTIGUOUS(x)
std::vector<torch::Tensor>
render_forward(torch::Tensor sigma, torch::Tensor origin, torch::Tensor points,
torch::Tensor tindex, const std::vector<int> grid,
std::string phase_name) {
CHECK_INPUT(sigma);
CHECK_INPUT(origin);
CHECK_INPUT(points);
CHECK_INPUT(tindex);
return render_forward_cuda(sigma, origin, points, tindex, grid, phase_name);
}
std::vector<torch::Tensor> render(torch::Tensor sigma, torch::Tensor origin,
torch::Tensor points, torch::Tensor tindex,
std::string loss_name) {
CHECK_INPUT(sigma);
CHECK_INPUT(origin);
CHECK_INPUT(points);
CHECK_INPUT(tindex);
return render_cuda(sigma, origin, points, tindex, loss_name);
}
torch::Tensor init(torch::Tensor points, torch::Tensor tindex,
const std::vector<int> grid) {
CHECK_INPUT(points);
CHECK_INPUT(tindex);
return init_cuda(points, tindex, grid);
}
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("init", &init, "Initialize");
m.def("render", &render, "Render");
m.def("render_forward", &render_forward, "Render (forward pass only)");
}
================================================
FILE: lib/dvr/dvr.cu
================================================
// Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting
// Modified by Haisong Liu
#include <torch/extension.h>
#include <stdio.h>
#include <cuda.h>
#include <cuda_runtime.h>
#include <vector>
#include <string>
#include <iostream>
#define MAX_D 1446 // 700 + 700 + 45 + 1
#define MAX_STEP 1000
enum LossType {L1, L2, ABSREL};
enum PhaseName {TEST, TRAIN};
template <typename scalar_t>
__global__ void init_cuda_kernel(
const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> points,
const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> tindex,
torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> occupancy) {
// batch index
const auto n = blockIdx.y;
// ray index
const auto c = blockIdx.x * blockDim.x + threadIdx.x;
// num of rays
const auto M = points.size(1);
const auto T = occupancy.size(1);
// we allocated more threads than num_rays
if (c < M) {
// ray end point
const auto t = tindex[n][c];
// invalid points
assert(T == 1 || t < T);
// if t < 0, it is a padded point
if (t < 0) return;
// time index for sigma
// when T = 1, we have a static sigma
const auto ts = (T == 1) ? 0 : t;
// grid shape
const int vzsize = occupancy.size(2);
const int vysize = occupancy.size(3);
const int vxsize = occupancy.size(4);
// assert(vzsize + vysize + vxsize <= MAX_D);
// end point
const int vx = int(points[n][c][0]);
const int vy = int(points[n][c][1]);
const int vz = int(points[n][c][2]);
//
if (0 <= vx && vx < vxsize &&
0 <= vy && vy < vysize &&
0 <= vz && vz < vzsize) {
occupancy[n][ts][vz][vy][vx] = 1;
}
}
}
template <typename scalar_t>
__global__ void render_forward_cuda_kernel(
const torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> sigma,
const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> origin,
const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> points,
const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> tindex,
// torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> pog,
torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> pred_dist,
torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> gt_dist,
torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> coord_index,
PhaseName train_phase) {
// batch index
const auto n = blockIdx.y;
// ray index
const auto c = blockIdx.x * blockDim.x + threadIdx.x;
// num of rays
const auto M = points.size(1);
const auto T = sigma.size(1);
// we allocated more threads than num_rays
if (c < M) {
// ray end point
const auto t = tindex[n][c];
// invalid points
// assert(t < T);
assert(T == 1 || t < T);
// time index for sigma
// when T = 1, we have a static sigma
const auto ts = (T == 1) ? 0 : t;
// if t < 0, it is a padded point
if (t < 0) return;
// grid shape
const int vzsize = sigma.size(2);
const int vysize = sigma.size(3);
const int vxsize = sigma.size(4);
// assert(vzsize + vysize + vxsize <= MAX_D);
// origin
const double xo = origin[n][t][0];
const double yo = origin[n][t][1];
const double zo = origin[n][t][2];
// end point
const double xe = points[n][c][0];
const double ye = points[n][c][1];
const double ze = points[n][c][2];
// locate the voxel where the origin resides
const int vxo = int(xo);
const int vyo = int(yo);
const int vzo = int(zo);
const int vxe = int(xe);
const int vye = int(ye);
const int vze = int(ze);
// NOTE: new
int vx = vxo;
int vy = vyo;
int vz = vzo;
// origin to end
const double rx = xe - xo;
const double ry = ye - yo;
const double rz = ze - zo;
double gt_d = sqrt(rx * rx + ry * ry + rz * rz);
// directional vector
const double dx = rx / gt_d;
const double dy = ry / gt_d;
const double dz = rz / gt_d;
// In which direction the voxel ids are incremented.
const int stepX = (dx >= 0) ? 1 : -1;
const int stepY = (dy >= 0) ? 1 : -1;
const int stepZ = (dz >= 0) ? 1 : -1;
// Distance along the ray to the next voxel border from the current position (tMaxX, tMaxY, tMaxZ).
const double next_voxel_boundary_x = vx + (stepX < 0 ? 0 : 1);
const double next_voxel_boundary_y = vy + (stepY < 0 ? 0 : 1);
const double next_voxel_boundary_z = vz + (stepZ < 0 ? 0 : 1);
// tMaxX, tMaxY, tMaxZ -- distance until next intersection with voxel-border
// the value of t at which the ray crosses the first vertical voxel boundary
double tMaxX = (dx!=0) ? (next_voxel_boundary_x - xo)/dx : DBL_MAX; //
double tMaxY = (dy!=0) ? (next_voxel_boundary_y - yo)/dy : DBL_MAX; //
double tMaxZ = (dz!=0) ? (next_voxel_boundary_z - zo)/dz : DBL_MAX; //
// tDeltaX, tDeltaY, tDeltaZ --
// how far along the ray we must move for the horizontal component to equal the width of a voxel
// the direction in which we traverse the grid
// can only be FLT_MAX if we never go in that direction
const double tDeltaX = (dx!=0) ? stepX/dx : DBL_MAX;
const double tDeltaY = (dy!=0) ? stepY/dy : DBL_MAX;
const double tDeltaZ = (dz!=0) ? stepZ/dz : DBL_MAX;
int3 path[MAX_D];
double csd[MAX_D]; // cumulative sum of sigma times delta
double p[MAX_D]; // alpha
double d[MAX_D];
// forward raymarching with voxel traversal
int step = 0; // total number of voxels traversed
int count = 0; // number of voxels traversed inside the voxel grid
double last_d = 0.0; // correct initialization
// voxel traversal raycasting
bool was_inside = false;
while (true) {
bool inside = (0 <= vx && vx < vxsize) &&
(0 <= vy && vy < vysize) &&
(0 <= vz && vz < vzsize);
if (inside) {
was_inside = true;
path[count] = make_int3(vx, vy, vz);
} else if (was_inside) { // was but no longer inside
// we know we are not coming back so terminate
break;
} /*else if (last_d > gt_d) {
break;
} */
/*else { // has not gone inside yet
// assert(count == 0);
// (1) when we have hit the destination but haven't gone inside the voxel grid
// (2) when we have traveled MAX_D voxels but haven't found one valid voxel
// handle intersection corner cases in case of infinite loop
bool hit = (vx == vxe && vy == vye && vz == vze); // this test seems brittle with corner cases
if (hit || step >= MAX_D)
break;
//if (last_d >= gt_d || step >= MAX_D) break;
} */
// _d is the distance the ray has traveled before escaping the current voxel cell
double _d = 0.0;
// voxel traversal
if (tMaxX < tMaxY) {
if (tMaxX < tMaxZ) {
_d = tMaxX;
vx += stepX;
tMaxX += tDeltaX;
} else {
_d = tMaxZ;
vz += stepZ;
tMaxZ += tDeltaZ;
}
} else {
if (tMaxY < tMaxZ) {
_d = tMaxY;
vy += stepY;
tMaxY += tDeltaY;
} else {
_d = tMaxZ;
vz += stepZ;
tMaxZ += tDeltaZ;
}
}
if (inside) {
// get sigma at the current voxel
const int3 &v = path[count]; // use the recorded index
const double _sigma = sigma[n][ts][v.z][v.y][v.x];
const double _delta = max(0.0, _d - last_d); // THIS TURNS OUT IMPORTANT
const double sd = _sigma * _delta;
if (count == 0) { // the first voxel inside
csd[count] = sd;
p[count] = 1 - exp(-sd);
} else {
csd[count] = csd[count-1] + sd;
p[count] = exp(-csd[count-1]) - exp(-csd[count]);
}
// record the traveled distance
d[count] = _d;
// count the number of voxels we have escaped
count ++;
}
last_d = _d;
step ++;
if (step > MAX_STEP) {
break;
}
}
// the total number of voxels visited should not exceed this number
assert(count <= MAX_D);
if (count > 0) {
// compute the expected ray distance
//double exp_d = 0.0;
double exp_d = d[count-1];
const int3 &v_init = path[count-1];
int x = v_init.x;
int y = v_init.y;
int z = v_init.z;
for (int i = 0; i < count; i++) {
//printf("%f\t%f\n",p[i], d[i]);
//exp_d += p[i] * d[i];
const int3 &v = path[i];
const double occ = sigma[n][ts][v.z][v.y][v.x];
if (occ > 0.5) {
exp_d = d[i];
x = v.x;
y = v.y;
z = v.z;
break;
}
}
//printf("%f\n",exp_d);
// add an imaginary sample at the end point, should gt_d exceed max_d
double p_out = exp(-csd[count-1]);
double max_d = d[count-1];
// if (gt_d > max_d)
// exp_d += (p_out * gt_d);
// p_out is the probability the ray escapes the voxel grid
//exp_d += (p_out * max_d);
if (train_phase == 1) {
gt_d = min(gt_d, max_d);
}
// write the rendered ray distance (exp_d)
pred_dist[n][c] = exp_d;
gt_dist[n][c] = gt_d;
coord_index[n][c][0] = double(x);
coord_index[n][c][1] = double(y);
coord_index[n][c][2] = double(z);
// // write occupancy
// for (int i = 0; i < count; i ++) {
// const int3 &v = path[i];
// auto & occ = pog[n][t][v.z][v.y][v.x];
// if (p[i] >= occ) {
// occ = p[i];
// }
// }
}
}
}
/*
* input shape
* sigma : N x T x H x L x W
* origin : N x T x 3
* points : N x M x 4
* output shape
* dist : N x M
*/
std::vector<torch::Tensor> render_forward_cuda(
torch::Tensor sigma,
torch::Tensor origin,
torch::Tensor points,
torch::Tensor tindex,
const std::vector<int> grid,
std::string phase_name) {
const auto N = points.size(0); // batch size
const auto M = points.size(1); // num of rays
const auto T = grid[0];
const auto H = grid[1];
const auto L = grid[2];
const auto W = grid[3];
const auto device = sigma.device();
const int threads = 1024;
const dim3 blocks((M + threads - 1) / threads, N);
//
// const auto dtype = points.dtype();
// const auto options = torch::TensorOptions().dtype(dtype).device(device).requires_grad(false);
// auto pog = torch::zeros({N, T, H, L, W}, options);
// perform rendering
auto gt_dist = -torch::ones({N, M}, device);
auto pred_dist = -torch::ones({N, M}, device);
auto coord_index = torch::zeros({N, M, 3}, device);
PhaseName train_phase;
if (phase_name.compare("test") == 0) {
train_phase = TEST;
} else if (phase_name.compare("train") == 0){
train_phase = TRAIN;
} else {
std::cout << "UNKNOWN PHASE NAME: " << phase_name << std::endl;
exit(1);
}
AT_DISPATCH_FLOATING_TYPES(sigma.type(), "render_forward_cuda", ([&] {
render_forward_cuda_kernel<scalar_t><<<blocks, threads>>>(
sigma.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
origin.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
points.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
tindex.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
// pog.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
pred_dist.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
gt_dist.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
coord_index.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
train_phase);
}));
cudaDeviceSynchronize();
// return {pog, pred_dist, gt_dist};
return {pred_dist, gt_dist, coord_index};
}
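Both `render_forward_cuda_kernel` and `render_cuda_kernel` walk each ray with the classic voxel-traversal scheme: at every step, advance along whichever axis has the smallest `tMax`, then bump that axis's `tMax` by its `tDelta`. A minimal 2D Python sketch of the same loop (hypothetical square grid, no densities, ray origin assumed to start inside the grid):

```python
import math

def traverse(xo, yo, xe, ye, size):
    """Return the grid cells a ray visits, using per-axis tMax/tDelta stepping (2D)."""
    gt_d = math.hypot(xe - xo, ye - yo)
    dx, dy = (xe - xo) / gt_d, (ye - yo) / gt_d  # unit direction
    vx, vy = int(xo), int(yo)                    # starting cell
    step_x = 1 if dx >= 0 else -1
    step_y = 1 if dy >= 0 else -1
    # distance along the ray to the next voxel border on each axis
    t_max_x = ((vx + (0 if step_x < 0 else 1)) - xo) / dx if dx != 0 else math.inf
    t_max_y = ((vy + (0 if step_y < 0 else 1)) - yo) / dy if dy != 0 else math.inf
    # distance along the ray between two consecutive borders on each axis
    t_delta_x = step_x / dx if dx != 0 else math.inf
    t_delta_y = step_y / dy if dy != 0 else math.inf
    cells = []
    while 0 <= vx < size and 0 <= vy < size:
        cells.append((vx, vy))
        if t_max_x < t_max_y:  # next border crossed is vertical
            vx += step_x
            t_max_x += t_delta_x
        else:                  # next border crossed is horizontal
            vy += step_y
            t_max_y += t_delta_y
    return cells

print(traverse(0.5, 0.5, 3.5, 0.5, 4))  # [(0, 0), (1, 0), (2, 0), (3, 0)]
```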
template <typename scalar_t>
__global__ void render_cuda_kernel(
const torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> sigma,
const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> origin,
const torch::PackedTensorAccessor32<scalar_t,3,torch::RestrictPtrTraits> points,
const torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> tindex,
// const torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> occupancy,
torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> pred_dist,
torch::PackedTensorAccessor32<scalar_t,2,torch::RestrictPtrTraits> gt_dist,
torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> grad_sigma,
// torch::PackedTensorAccessor32<scalar_t,5,torch::RestrictPtrTraits> grad_sigma_count,
LossType loss_type) {
// batch index
const auto n = blockIdx.y;
// ray index
const auto c = blockIdx.x * blockDim.x + threadIdx.x;
// num of rays
const auto M = points.size(1);
const auto T = sigma.size(1);
// we allocated more threads than num_rays
if (c < M) {
// ray end point
const auto t = tindex[n][c];
// invalid points
// assert(t < T);
assert(T == 1 || t < T);
// time index for sigma
// when T = 1, we have a static sigma
const auto ts = (T == 1) ? 0 : t;
// if t < 0, it is a padded point
if (t < 0) return;
// grid shape
const int vzsize = sigma.size(2);
const int vysize = sigma.size(3);
const int vxsize = sigma.size(4);
// assert(vzsize + vysize + vxsize <= MAX_D);
// origin
const double xo = origin[n][t][0];
const double yo = origin[n][t][1];
const double zo = origin[n][t][2];
// end point
const double xe = points[n][c][0];
const double ye = points[n][c][1];
const double ze = points[n][c][2];
// locate the voxel where the origin resides
const int vxo = int(xo);
const int vyo = int(yo);
const int vzo = int(zo);
//
const int vxe = int(xe);
const int vye = int(ye);
const int vze = int(ze);
// NOTE: new
int vx = vxo;
int vy = vyo;
int vz = vzo;
// origin to end
const double rx = xe - xo;
const double ry = ye - yo;
const double rz = ze - zo;
double gt_d = sqrt(rx * rx + ry * ry + rz * rz);
// directional vector
const double dx = rx / gt_d;
const double dy = ry / gt_d;
const double dz = rz / gt_d;
// In which direction the voxel ids are incremented.
const int stepX = (dx >= 0) ? 1 : -1;
const int stepY = (dy >= 0) ? 1 : -1;
const int stepZ = (dz >= 0) ? 1 : -1;
// Distance along the ray to the next voxel border from the current position (tMaxX, tMaxY, tMaxZ).
const double next_voxel_boundary_x = vx + (stepX < 0 ? 0 : 1);
const double next_voxel_boundary_y = vy + (stepY < 0 ? 0 : 1);
const double next_voxel_boundary_z = vz + (stepZ < 0 ? 0 : 1);
// tMaxX, tMaxY, tMaxZ -- distance until next intersection with voxel-border
// the value of t at which the ray crosses the first vertical voxel boundary
double tMaxX = (dx!=0) ? (next_voxel_boundary_x - xo)/dx : DBL_MAX; //
double tMaxY = (dy!=0) ? (next_voxel_boundary_y - yo)/dy : DBL_MAX; //
double tMaxZ = (dz!=0) ? (next_voxel_boundary_z - zo)/dz : DBL_MAX; //
// tDeltaX, tDeltaY, tDeltaZ --
// how far along the ray we must move for the horizontal component to equal the width of a voxel
// the direction in which we traverse the grid
// can only be FLT_MAX if we never go in that direction
const double tDeltaX = (dx!=0) ? stepX/dx : DBL_MAX;
const double tDeltaY = (dy!=0) ? stepY/dy : DBL_MAX;
const double tDeltaZ = (dz!=0) ? stepZ/dz : DBL_MAX;
int3 path[MAX_D];
double csd[MAX_D]; // cumulative sum of sigma times delta
double p[MAX_D]; // alpha
double d[MAX_D];
double dt[MAX_D];
// forward raymarching with voxel traversal
int step = 0; // total number of voxels traversed
int count = 0; // number of voxels traversed inside the voxel grid
double last_d = 0.0; // correct initialization
// voxel traversal raycasting
bool was_inside = false;
while (true) {
bool inside = (0 <= vx && vx < vxsize) &&
(0 <= vy && vy < vysize) &&
(0 <= vz && vz < vzsize);
if (inside) { // now inside
was_inside = true;
path[count] = make_int3(vx, vy, vz);
} else if (was_inside) { // was inside but no longer
// we know we are not coming back so terminate
break;
} else if (last_d > gt_d) {
break;
} /* else { // has not gone inside yet
// assert(count == 0);
// (1) when we have hit the destination but haven't gone inside the voxel grid
// (2) when we have traveled MAX_D voxels but haven't found one valid voxel
// handle intersection corner cases in case of infinite loop
// bool hit = (vx == vxe && vy == vye && vz == vze);
// if (hit || step >= MAX_D)
// break;
if (last_d >= gt_d || step >= MAX_D) break;
} */
// _d is the distance the ray has traveled before escaping the current voxel cell
double _d = 0.0;
// voxel traversal
if (tMaxX < tMaxY) {
if (tMaxX < tMaxZ) {
_d = tMaxX;
vx += stepX;
tMaxX += tDeltaX;
} else {
_d = tMaxZ;
vz += stepZ;
tMaxZ += tDeltaZ;
}
} else {
if (tMaxY < tMaxZ) {
_d = tMaxY;
vy += stepY;
tMaxY += tDeltaY;
} else {
_d = tMaxZ;
vz += stepZ;
tMaxZ += tDeltaZ;
}
}
if (inside) {
// get sigma at the current voxel
const int3 &v = path[count]; // use the recorded index
const double _sigma = sigma[n][ts][v.z][v.y][v.x];
const double _delta = max(0.0, _d - last_d); // THIS TURNS OUT IMPORTANT
const double sd = _sigma * _delta;
if (count == 0) { // the first voxel inside
csd[count] = sd;
p[count] = 1 - exp(-sd);
} else {
csd[count] = csd[count-1] + sd;
p[count] = exp(-csd[count-1]) - exp(-csd[count]);
}
// record the traveled distance
d[count] = _d;
dt[count] = _delta;
// count the number of voxels we have escaped
count ++;
}
last_d = _d;
step ++;
if (step > MAX_STEP) {
break;
}
}
// the total number of voxels visited should not exceed this number
assert(count <= MAX_D);
// WHEN THERE IS AN INTERSECTION BETWEEN THE RAY AND THE VOXEL GRID
if (count > 0) {
// compute the expected ray distance
double exp_d = 0.0;
for (int i = 0; i < count; i ++)
exp_d += p[i] * d[i];
// add an imaginary sample at the end point in case gt_d exceeds max_d
double p_out = exp(-csd[count-1]);
double max_d = d[count-1];
exp_d += (p_out * max_d);
gt_d = min(gt_d, max_d);
// write the rendered (expected) ray distance and the clamped ground truth
pred_dist[n][c] = exp_d;
gt_dist[n][c] = gt_d;
/* backward raymarching */
double dd_dsigma[MAX_D];
for (int i = count - 1; i >= 0; i --) {
// NOTE: probably need to double check again
if (i == count - 1)
dd_dsigma[i] = p_out * max_d;
else
dd_dsigma[i] = dd_dsigma[i+1] - exp(-csd[i]) * (d[i+1] - d[i]);
}
for (int i = count - 1; i >= 0; i --)
dd_dsigma[i] *= dt[i];
// option 2: cap at the boundary
for (int i = count - 1; i >= 0; i --)
dd_dsigma[i] -= dt[i] * p_out * max_d;
double dl_dd = 1.0;
if (loss_type == L1)
dl_dd = (exp_d >= gt_d) ? 1 : -1;
else if (loss_type == L2)
dl_dd = (exp_d - gt_d);
else if (loss_type == ABSREL)
dl_dd = (exp_d >= gt_d) ? (1.0/gt_d) : -(1.0/gt_d);
// apply chain rule
for (int i = 0; i < count; i ++) {
const int3 &v = path[i];
// NOTE: potential race conditions when writing gradients
grad_sigma[n][ts][v.z][v.y][v.x] += dl_dd * dd_dsigma[i];
// grad_sigma_count[n][ts][v.z][v.y][v.x] += 1;
}
}
}
}
/*
* input shape
* sigma : N x T x H x L x W
* origin : N x T x 3
* points : N x M x 4
* output shape
* dist : N x M
* loss : N x M
* grad_sigma : N x T x H x L x W
*/
std::vector<torch::Tensor> render_cuda(
torch::Tensor sigma,
torch::Tensor origin,
torch::Tensor points,
torch::Tensor tindex,
std::string loss_name) {
const auto N = points.size(0); // batch size
const auto M = points.size(1); // num of rays
const auto device = sigma.device();
const int threads = 1024;
const dim3 blocks((M + threads - 1) / threads, N);
// perform rendering
auto gt_dist = -torch::ones({N, M}, device);
auto pred_dist = -torch::ones({N, M}, device);
auto grad_sigma = torch::zeros_like(sigma);
// auto grad_sigma_count = torch::zeros_like(sigma);
LossType loss_type;
if (loss_name.compare("l1") == 0) {
loss_type = L1;
} else if (loss_name.compare("l2") == 0) {
loss_type = L2;
} else if (loss_name.compare("absrel") == 0) {
loss_type = ABSREL;
} else if (loss_name.compare("bce") == 0){
loss_type = L1; // bce is not implemented here; fall back to the L1 gradient
} else {
std::cout << "UNKNOWN LOSS TYPE: " << loss_name << std::endl;
exit(1);
}
AT_DISPATCH_FLOATING_TYPES(sigma.type(), "render_cuda", ([&] {
render_cuda_kernel<scalar_t><<<blocks, threads>>>(
sigma.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
origin.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
points.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
tindex.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
// occupancy.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
pred_dist.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
gt_dist.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
grad_sigma.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
// grad_sigma_count.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>(),
loss_type);
}));
cudaDeviceSynchronize();
// grad_sigma_count += (grad_sigma_count == 0);
// grad_sigma /= grad_sigma_count;
return {pred_dist, gt_dist, grad_sigma};
}
/*
* input shape
* origin : N x T x 3
* points : N x M x 3
* tindex : N x M
* output shape
* occupancy: N x T x H x L x W
*/
torch::Tensor init_cuda(
torch::Tensor points,
torch::Tensor tindex,
const std::vector<int> grid) {
const auto N = points.size(0); // batch size
const auto M = points.size(1); // num of rays
const auto T = grid[0];
const auto H = grid[1];
const auto L = grid[2];
const auto W = grid[3];
const auto dtype = points.dtype();
const auto device = points.device();
const auto options = torch::TensorOptions().dtype(dtype).device(device).requires_grad(false);
auto occupancy = torch::zeros({N, T, H, L, W}, options);
const int threads = 1024;
const dim3 blocks((M + threads - 1) / threads, N);
// initialize occupancy such that every voxel with one or more points is occupied
AT_DISPATCH_FLOATING_TYPES(points.type(), "init_cuda", ([&] {
init_cuda_kernel<scalar_t><<<blocks, threads>>>(
points.packed_accessor32<scalar_t,3,torch::RestrictPtrTraits>(),
tindex.packed_accessor32<scalar_t,2,torch::RestrictPtrTraits>(),
occupancy.packed_accessor32<scalar_t,5,torch::RestrictPtrTraits>());
}));
// synchronize
cudaDeviceSynchronize();
return occupancy;
}
================================================
FILE: loaders/__init__.py
================================================
from .pipelines import __all__
from .nuscenes_dataset import CustomNuScenesDataset
from .nuscenes_occ_dataset import NuSceneOcc
__all__ = [
'CustomNuScenesDataset', 'NuSceneOcc'
]
================================================
FILE: loaders/builder.py
================================================
from functools import partial
from mmcv.parallel import collate
from mmcv.runner import get_dist_info
from torch.utils.data import DataLoader
from mmdet.datasets.builder import worker_init_fn
from mmdet.datasets.samplers import DistributedGroupSampler, DistributedSampler, GroupSampler
def build_dataloader(dataset,
samples_per_gpu,
workers_per_gpu,
num_gpus=1,
dist=True,
shuffle=True,
seed=None,
**kwargs):
rank, world_size = get_dist_info()
if dist:
# DistributedGroupSampler will definitely shuffle the data to satisfy
# that images on each GPU are in the same group
if shuffle:
sampler = DistributedGroupSampler(
dataset, samples_per_gpu, world_size, rank, seed=seed)
else:
sampler = DistributedSampler(
dataset, world_size, rank, shuffle=False, seed=seed)
batch_size = samples_per_gpu
num_workers = workers_per_gpu
else:
sampler = GroupSampler(dataset, samples_per_gpu) if shuffle else None
batch_size = num_gpus * samples_per_gpu
num_workers = num_gpus * workers_per_gpu
init_fn = partial(
worker_init_fn, num_workers=num_workers, rank=rank,
seed=seed) if seed is not None else None
data_loader = DataLoader(
dataset,
batch_size=batch_size,
sampler=sampler,
num_workers=num_workers,
collate_fn=partial(collate, samples_per_gpu=samples_per_gpu),
pin_memory=False,
worker_init_fn=init_fn,
**kwargs)
return data_loader
================================================
FILE: loaders/ego_pose_dataset.py
================================================
import torch
import numpy as np
from pyquaternion import Quaternion
from torch.utils.data import Dataset
np.set_printoptions(precision=3, suppress=True)
def trans_matrix(T, R):
tm = np.eye(4)
tm[:3, :3] = R.rotation_matrix
tm[:3, 3] = T
return tm
# A helper dataset for RayIoU. It is NOT used during training.
class EgoPoseDataset(Dataset):
def __init__(self, data_infos):
super(EgoPoseDataset, self).__init__()
self.data_infos = data_infos
self.scene_frames = {}
for info in data_infos:
scene_name = info['scene_name']
if scene_name not in self.scene_frames:
self.scene_frames[scene_name] = []
self.scene_frames[scene_name].append(info)
def __len__(self):
return len(self.data_infos)
def get_ego_from_lidar(self, info):
ego_from_lidar = trans_matrix(
np.array(info['lidar2ego_translation']),
Quaternion(info['lidar2ego_rotation']))
return ego_from_lidar
def get_global_pose(self, info, inverse=False):
global_from_ego = trans_matrix(
np.array(info['ego2global_translation']),
Quaternion(info['ego2global_rotation']))
ego_from_lidar = trans_matrix(
np.array(info['lidar2ego_translation']),
Quaternion(info['lidar2ego_rotation']))
pose = global_from_ego.dot(ego_from_lidar)
if inverse:
pose = np.linalg.inv(pose)
return pose
def __getitem__(self, idx):
info = self.data_infos[idx]
ref_sample_token = info['token']
ref_lidar_from_global = self.get_global_pose(info, inverse=True)
ref_ego_from_lidar = self.get_ego_from_lidar(info)
scene_frame = self.scene_frames[info['scene_name']]
ref_index = scene_frame.index(info)
# NOTE: getting output frames
output_origin_list = []
for curr_index in range(len(scene_frame)):
# the reference frame itself has a zero offset
if curr_index == ref_index:
origin_tf = np.array([0.0, 0.0, 0.0], dtype=np.float32)
else:
# transform from the current lidar frame to global and then to the reference lidar frame
global_from_curr = self.get_global_pose(scene_frame[curr_index], inverse=False)
ref_from_curr = ref_lidar_from_global.dot(global_from_curr)
origin_tf = np.array(ref_from_curr[:3, 3], dtype=np.float32)
origin_tf_pad = np.ones([4])
origin_tf_pad[:3] = origin_tf # pad to [4]
origin_tf = np.dot(ref_ego_from_lidar[:3], origin_tf_pad.T).T # [3]
# origin
if np.abs(origin_tf[0]) < 39 and np.abs(origin_tf[1]) < 39:
output_origin_list.append(origin_tf)
# select 8 origins
if len(output_origin_list) > 8:
select_idx = np.round(np.linspace(0, len(output_origin_list) - 1, 8)).astype(np.int64)
output_origin_list = [output_origin_list[i] for i in select_idx]
output_origin_tensor = torch.from_numpy(np.stack(output_origin_list)) # [T, 3]
return (ref_sample_token, output_origin_tensor)
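`get_global_pose` builds `global_from_ego` and `ego_from_lidar` as 4x4 homogeneous transforms, composes them, and optionally inverts the product. A minimal numpy sketch of that pattern (a hand-written rotation matrix stands in for pyquaternion's `Quaternion`):

```python
import numpy as np

# hypothetical sketch of the pose composition in EgoPoseDataset.get_global_pose
def trans_matrix(T, R):
    tm = np.eye(4)
    tm[:3, :3] = R      # 3x3 rotation block
    tm[:3, 3] = T       # translation column
    return tm

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
global_from_ego = trans_matrix(np.array([10.0, -4.0, 1.5]), R)
# inverse=True in get_global_pose corresponds to np.linalg.inv(pose)
ego_from_global = np.linalg.inv(global_from_ego)
assert np.allclose(ego_from_global @ global_from_ego, np.eye(4))
```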
================================================
FILE: loaders/nuscenes_dataset.py
================================================
import os
import numpy as np
from mmdet.datasets import DATASETS
from mmdet3d.datasets import NuScenesDataset
from pyquaternion import Quaternion
@DATASETS.register_module()
class CustomNuScenesDataset(NuScenesDataset):
def collect_sweeps(self, index, into_past=60, into_future=0):
all_sweeps_prev = []
curr_index = index
while len(all_sweeps_prev) < into_past:
curr_sweeps = self.data_infos[curr_index]['sweeps']
if len(curr_sweeps) == 0:
break
all_sweeps_prev.extend(curr_sweeps)
all_sweeps_prev.append(self.data_infos[curr_index - 1]['cams'])
curr_index = curr_index - 1
all_sweeps_next = []
curr_index = index + 1
while len(all_sweeps_next) < into_future:
if curr_index >= len(self.data_infos):
break
curr_sweeps = self.data_infos[curr_index]['sweeps']
all_sweeps_next.extend(curr_sweeps[::-1])
all_sweeps_next.append(self.data_infos[curr_index]['cams'])
curr_index = curr_index + 1
return all_sweeps_prev, all_sweeps_next
def get_data_info(self, index):
info = self.data_infos[index]
sweeps_prev, sweeps_next = self.collect_sweeps(index)
ego2global_translation = info['ego2global_translation']
ego2global_rotation = info['ego2global_rotation']
lidar2ego_translation = info['lidar2ego_translation']
lidar2ego_rotation = info['lidar2ego_rotation']
ego2global_rotation = Quaternion(ego2global_rotation).rotation_matrix
lidar2ego_rotation = Quaternion(lidar2ego_rotation).rotation_matrix
input_dict = dict(
sample_idx=info['token'],
sweeps={'prev': sweeps_prev, 'next': sweeps_next},
timestamp=info['timestamp'] / 1e6,
ego2global_translation=ego2global_translation,
ego2global_rotation=ego2global_rotation,
lidar2ego_translation=lidar2ego_translation,
lidar2ego_rotation=lidar2ego_rotation,
)
if self.modality['use_camera']:
img_paths = []
img_timestamps = []
lidar2img_rts = []
for _, cam_info in info['cams'].items():
img_paths.append(os.path.relpath(cam_info['data_path']))
img_timestamps.append(cam_info['timestamp'] / 1e6)
# obtain lidar to image transformation matrix
lidar2cam_r = np.linalg.inv(cam_info['sensor2lidar_rotation'])
lidar2cam_t = cam_info['sensor2lidar_translation'] @ lidar2cam_r.T
lidar2cam_rt = np.eye(4)
lidar2cam_rt[:3, :3] = lidar2cam_r.T
lidar2cam_rt[3, :3] = -lidar2cam_t
intrinsic = cam_info['cam_intrinsic']
viewpad = np.eye(4)
viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic
lidar2img_rt = (viewpad @ lidar2cam_rt.T)
lidar2img_rts.append(lidar2img_rt)
input_dict.update(dict(
img_filename=img_paths,
img_timestamp=img_timestamps,
lidar2img=lidar2img_rts,
))
if not self.test_mode:
annos = self.get_ann_info(index)
input_dict['ann_info'] = annos
return input_dict
================================================
FILE: loaders/nuscenes_occ_dataset.py
================================================
import os
import mmcv
import glob
import torch
import numpy as np
from tqdm import tqdm
from mmdet.datasets import DATASETS
from mmdet3d.datasets import NuScenesDataset
from nuscenes.eval.common.utils import Quaternion
from nuscenes.utils.geometry_utils import transform_matrix
from torch.utils.data import DataLoader
from models.utils import sparse2dense
from .ray_metrics import main_rayiou, main_raypq
from .ego_pose_dataset import EgoPoseDataset
from configs.r50_nuimg_704x256_8f import occ_class_names as occ3d_class_names
from configs.r50_nuimg_704x256_8f_openocc import occ_class_names as openocc_class_names
@DATASETS.register_module()
class NuSceneOcc(NuScenesDataset):
def __init__(self, occ_gt_root, *args, **kwargs):
super().__init__(filter_empty_gt=False, *args, **kwargs)
self.occ_gt_root = occ_gt_root
self.data_infos = self.load_annotations(self.ann_file)
self.token2scene = {}
for gt_path in glob.glob(os.path.join(self.occ_gt_root, '*/*/*.npz')):
token = gt_path.split('/')[-2]
scene_name = gt_path.split('/')[-3]
self.token2scene[token] = scene_name
for i in range(len(self.data_infos)):
scene_name = self.token2scene[self.data_infos[i]['token']]
self.data_infos[i]['scene_name'] = scene_name
def collect_sweeps(self, index, into_past=150, into_future=0):
all_sweeps_prev = []
curr_index = index
while len(all_sweeps_prev) < into_past:
curr_sweeps = self.data_infos[curr_index]['sweeps']
if len(curr_sweeps) == 0:
break
all_sweeps_prev.extend(curr_sweeps)
all_sweeps_prev.append(self.data_infos[curr_index - 1]['cams'])
curr_index = curr_index - 1
all_sweeps_next = []
curr_index = index + 1
while len(all_sweeps_next) < into_future:
if curr_index >= len(self.data_infos):
break
curr_sweeps = self.data_infos[curr_index]['sweeps']
all_sweeps_next.extend(curr_sweeps[::-1])
all_sweeps_next.append(self.data_infos[curr_index]['cams'])
curr_index = curr_index + 1
return all_sweeps_prev, all_sweeps_next
def get_data_info(self, index):
info = self.data_infos[index]
sweeps_prev, sweeps_next = self.collect_sweeps(index)
ego2global_translation = info['ego2global_translation']
ego2global_rotation = info['ego2global_rotation']
lidar2ego_translation = info['lidar2ego_translation']
lidar2ego_rotation = info['lidar2ego_rotation']
ego2global_rotation_mat = Quaternion(ego2global_rotation).rotation_matrix
lidar2ego_rotation_mat = Quaternion(lidar2ego_rotation).rotation_matrix
input_dict = dict(
sample_idx=info['token'],
sweeps={'prev': sweeps_prev, 'next': sweeps_next},
timestamp=info['timestamp'] / 1e6,
ego2global_translation=ego2global_translation,
ego2global_rotation=ego2global_rotation_mat,
lidar2ego_translation=lidar2ego_translation,
lidar2ego_rotation=lidar2ego_rotation_mat,
)
ego2lidar = transform_matrix(lidar2ego_translation, Quaternion(lidar2ego_rotation), inverse=True)
input_dict['ego2lidar'] = [ego2lidar for _ in range(6)]
input_dict['occ_path'] = os.path.join(self.occ_gt_root, info['scene_name'], info['token'], 'labels.npz')
if self.modality['use_camera']:
img_paths = []
img_timestamps = []
lidar2img_rts = []
for _, cam_info in info['cams'].items():
img_paths.append(os.path.relpath(cam_info['data_path']))
img_timestamps.append(cam_info['timestamp'] / 1e6)
# obtain lidar to image transformation matrix
lidar2cam_r = np.linalg.inv(cam_info['sensor2lidar_rotation'])
lidar2cam_t = cam_info['sensor2lidar_translation'] @ lidar2cam_r.T
lidar2cam_rt = np.eye(4)
lidar2cam_rt[:3, :3] = lidar2cam_r.T
lidar2cam_rt[3, :3] = -lidar2cam_t
intrinsic = cam_info['cam_intrinsic']
viewpad = np.eye(4)
viewpad[:intrinsic.shape[0], :intrinsic.shape[1]] = intrinsic
lidar2img_rt = (viewpad @ lidar2cam_rt.T)
lidar2img_rts.append(lidar2img_rt)
input_dict.update(dict(
img_filename=img_paths,
img_timestamp=img_timestamps,
lidar2img=lidar2img_rts,
))
if not self.test_mode:
annos = self.get_ann_info(index)
input_dict['ann_info'] = annos
return input_dict
def evaluate(self, occ_results, runner=None, show_dir=None, **eval_kwargs):
occ_gts, occ_preds, inst_gts, inst_preds, lidar_origins = [], [], [], [], []
print('\nStarting Evaluation...')
sample_tokens = [info['token'] for info in self.data_infos]
for batch in DataLoader(EgoPoseDataset(self.data_infos), num_workers=8):
token = batch[0][0]
output_origin = batch[1]
data_id = sample_tokens.index(token)
info = self.data_infos[data_id]
occ_path = os.path.join(self.occ_gt_root, info['scene_name'], info['token'], 'labels.npz')
occ_gt = np.load(occ_path, allow_pickle=True)
gt_semantics = occ_gt['semantics']
occ_pred = occ_results[data_id]
sem_pred = torch.from_numpy(occ_pred['sem_pred']) # [B, N]
occ_loc = torch.from_numpy(occ_pred['occ_loc'].astype(np.int64)) # [B, N, 3]
data_type = self.occ_gt_root.split('/')[-1]
if data_type == 'occ3d' or data_type == 'occ3d_panoptic':
occ_class_names = occ3d_class_names
elif data_type == 'openocc_v2':
occ_class_names = openocc_class_names
else:
raise ValueError(f'unknown occupancy GT type: {data_type}')
free_id = len(occ_class_names) - 1
occ_size = list(gt_semantics.shape)
sem_pred, _ = sparse2dense(occ_loc, sem_pred, dense_shape=occ_size, empty_value=free_id)
sem_pred = sem_pred.squeeze(0).numpy()
if 'pano_inst' in occ_pred.keys():
pano_inst = torch.from_numpy(occ_pred['pano_inst'])
pano_sem = torch.from_numpy(occ_pred['pano_sem'])
pano_inst, _ = sparse2dense(occ_loc, pano_inst, dense_shape=occ_size, empty_value=0)
pano_sem, _ = sparse2dense(occ_loc, pano_sem, dense_shape=occ_size, empty_value=free_id)
pano_inst = pano_inst.squeeze(0).numpy()
pano_sem = pano_sem.squeeze(0).numpy()
sem_pred = pano_sem
gt_instances = occ_gt['instances']
inst_gts.append(gt_instances)
inst_preds.append(pano_inst)
lidar_origins.append(output_origin)
occ_gts.append(gt_semantics)
occ_preds.append(sem_pred)
if len(inst_preds) > 0:
results = main_raypq(occ_preds, occ_gts, inst_preds, inst_gts, lidar_origins, occ_class_names=occ_class_names)
results.update(main_rayiou(occ_preds, occ_gts, lidar_origins, occ_class_names=occ_class_names))
return results
else:
return main_rayiou(occ_preds, occ_gts, lidar_origins, occ_class_names=occ_class_names)
def format_results(self, occ_results, submission_prefix, **kwargs):
if submission_prefix is not None:
mmcv.mkdir_or_exist(submission_prefix)
for index, occ_pred in enumerate(tqdm(occ_results)):
info = self.data_infos[index]
sample_token = info['token']
save_path = os.path.join(submission_prefix, '{}.npz'.format(sample_token))
np.savez_compressed(save_path, occ_pred.astype(np.uint8))
print('\nFinished.')
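`evaluate()` densifies the sparse predictions with `models.utils.sparse2dense` before scoring. A hypothetical minimal version of that densification (the repo's implementation may differ, e.g. in batching and mask handling):

```python
import numpy as np

# hypothetical minimal sparse-to-dense scatter: unset voxels get the "free" class id
def sparse2dense_sketch(occ_loc, sem_pred, dense_shape, empty_value):
    dense = np.full(dense_shape, empty_value, dtype=sem_pred.dtype)
    dense[occ_loc[:, 0], occ_loc[:, 1], occ_loc[:, 2]] = sem_pred
    return dense

occ_loc = np.array([[0, 1, 2], [3, 3, 3]])   # [N, 3] voxel coordinates
sem_pred = np.array([4, 9])                  # [N] class labels
dense = sparse2dense_sketch(occ_loc, sem_pred, (4, 4, 4), empty_value=17)
# dense[0, 1, 2] == 4, dense[3, 3, 3] == 9, all other voxels == 17
```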
================================================
FILE: loaders/old_metrics.py
================================================
import os
import numpy as np
from sklearn.neighbors import KDTree
from termcolor import colored
from functools import reduce
from typing import Iterable
np.seterr(divide='ignore', invalid='ignore')
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
def pcolor(string, color, on_color=None, attrs=None):
"""
Produces a colored string for printing
Parameters
----------
string : str
String that will be colored
color : str
Color to use
on_color : str
Background color to use
attrs : list of str
Different attributes for the string
Returns
-------
string: str
Colored string
"""
return colored(string, color, on_color, attrs)
def getCellCoordinates(points, voxelSize):
return (points / voxelSize).astype(int)  # np.int was removed in NumPy 1.24
def getNumUniqueCells(cells):
M = cells.max() + 1
return np.unique(cells[:, 0] + M * cells[:, 1] + M ** 2 * cells[:, 2]).shape[0]
class Metric_mIoU():
def __init__(self,
save_dir='.',
num_classes=18,
use_lidar_mask=False,
use_image_mask=False,
):
if num_classes == 18:
self.class_names = [
'others','barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation','free'
]
elif num_classes == 2:
self.class_names = ['non-free', 'free']
self.save_dir = save_dir
self.use_lidar_mask = use_lidar_mask
self.use_image_mask = use_image_mask
self.num_classes = num_classes
self.point_cloud_range = [-40.0, -40.0, -1.0, 40.0, 40.0, 5.4]
self.occupancy_size = [0.4, 0.4, 0.4]
self.voxel_size = 0.4
self.occ_xdim = int((self.point_cloud_range[3] - self.point_cloud_range[0]) / self.occupancy_size[0])
self.occ_ydim = int((self.point_cloud_range[4] - self.point_cloud_range[1]) / self.occupancy_size[1])
self.occ_zdim = int((self.point_cloud_range[5] - self.point_cloud_range[2]) / self.occupancy_size[2])
self.voxel_num = self.occ_xdim * self.occ_ydim * self.occ_zdim
self.hist = np.zeros((self.num_classes, self.num_classes))
self.cnt = 0
def hist_info(self, n_cl, pred, gt):
"""
Build the confusion matrix.
non-empty classes: 0-16; free voxel class: 17
Args:
n_cl (int): number of occupancy classes
pred (1-d array): predicted occupancy labels
gt (1-d array): ground-truth occupancy labels
Returns:
tuple: (hist, number of correctly predicted labels, number of labeled samples)
"""
assert pred.shape == gt.shape
k = (gt >= 0) & (gt < n_cl) # exclude 255
labeled = np.sum(k)
correct = np.sum((pred[k] == gt[k]))
return (
np.bincount(
n_cl * gt[k].astype(int) + pred[k].astype(int), minlength=n_cl ** 2
).reshape(n_cl, n_cl),
correct,
labeled,
)
def per_class_iu(self, hist):
#return np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
result = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
result[hist.sum(1) == 0] = float('nan')
return result
def compute_mIoU(self, pred, label, n_classes):
hist = np.zeros((n_classes, n_classes))
new_hist, correct, labeled = self.hist_info(n_classes, pred.flatten(), label.flatten())
hist += new_hist
mIoUs = self.per_class_iu(hist)
# for ind_class in range(n_classes):
# print(str(round(mIoUs[ind_class] * 100, 2)))
# print('===> mIoU: ' + str(round(np.nanmean(mIoUs) * 100, 2)))
return round(np.nanmean(mIoUs) * 100, 2), hist
def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera):
self.cnt += 1
if self.use_image_mask:
masked_semantics_gt = semantics_gt[mask_camera]
masked_semantics_pred = semantics_pred[mask_camera]
elif self.use_lidar_mask:
masked_semantics_gt = semantics_gt[mask_lidar]
masked_semantics_pred = semantics_pred[mask_lidar]
else:
masked_semantics_gt = semantics_gt
masked_semantics_pred = semantics_pred
if self.num_classes == 2:
masked_semantics_pred = np.copy(masked_semantics_pred)
masked_semantics_gt = np.copy(masked_semantics_gt)
masked_semantics_pred[masked_semantics_pred < 17] = 0
masked_semantics_pred[masked_semantics_pred == 17] = 1
masked_semantics_gt[masked_semantics_gt < 17] = 0
masked_semantics_gt[masked_semantics_gt == 17] = 1
_, _hist = self.compute_mIoU(masked_semantics_pred, masked_semantics_gt, self.num_classes)
self.hist += _hist
def count_miou(self):
mIoU = self.per_class_iu(self.hist)
# assert cnt == num_samples, 'some samples are not included in the miou calculation'
print(f'===> per class IoU of {self.cnt} samples:')
for ind_class in range(self.num_classes-1):
print(f'===> {self.class_names[ind_class]} - IoU = ' + str(round(mIoU[ind_class] * 100, 2)))
print(f'===> mIoU of {self.cnt} samples: ' + str(round(np.nanmean(mIoU[:self.num_classes-1]) * 100, 2)))
# print(f'===> sample-wise averaged mIoU of {cnt} samples: ' + str(round(np.nanmean(mIoU_avg), 2)))
return round(np.nanmean(mIoU[:self.num_classes-1]) * 100, 2)
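`per_class_iu` reads each class IoU directly off the confusion matrix: the diagonal entry over the union of the corresponding row and column sums. A small numpy check of the formula (hypothetical counts):

```python
import numpy as np

# hypothetical 2-class confusion matrix; rows = ground truth, cols = prediction
hist = np.array([[3.0, 1.0],
                 [2.0, 4.0]])
iou = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
# class 0: 3 / (4 + 5 - 3) = 0.5; class 1: 4 / (6 + 5 - 4) = 4/7
```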
class Metric_FScore():
def __init__(self,
leaf_size=10,
threshold_acc=0.6,
threshold_complete=0.6,
voxel_size=[0.4, 0.4, 0.4],
range=[-40, -40, -1, 40, 40, 5.4],
void=[17, 255],
use_lidar_mask=False,
use_image_mask=False, ) -> None:
self.leaf_size = leaf_size
self.threshold_acc = threshold_acc
self.threshold_complete = threshold_complete
self.voxel_size = voxel_size
self.range = range
self.void = void
self.use_lidar_mask = use_lidar_mask
self.use_image_mask = use_image_mask
self.cnt=0
self.tot_acc = 0.
self.tot_cmpl = 0.
self.tot_f1_mean = 0.
self.eps = 1e-8
def voxel2points(self, voxel):
# occIdx = torch.where(torch.logical_and(voxel != FREE, voxel != NOT_OBSERVED))
# if isinstance(voxel, np.ndarray): voxel = torch.from_numpy(voxel)
mask = np.logical_not(reduce(np.logical_or, [voxel == self.void[i] for i in range(len(self.void))]))
occIdx = np.where(mask)
points = np.concatenate((occIdx[0][:, None] * self.voxel_size[0] + self.voxel_size[0] / 2 + self.range[0], \
occIdx[1][:, None] * self.voxel_size[1] + self.voxel_size[1] / 2 + self.range[1], \
occIdx[2][:, None] * self.voxel_size[2] + self.voxel_size[2] / 2 + self.range[2]),
axis=1)
return points
def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera ):
# for scene_token in tqdm(preds_dict.keys()):
self.cnt += 1
if self.use_image_mask:
semantics_gt[mask_camera == False] = 255
semantics_pred[mask_camera == False] = 255
elif self.use_lidar_mask:
semantics_gt[mask_lidar == False] = 255
semantics_pred[mask_lidar == False] = 255
else:
pass
ground_truth = self.voxel2points(semantics_gt)
prediction = self.voxel2points(semantics_pred)
if prediction.shape[0] == 0:
accuracy=0
completeness=0
fmean=0
else:
prediction_tree = KDTree(prediction, leaf_size=self.leaf_size)
ground_truth_tree = KDTree(ground_truth, leaf_size=self.leaf_size)
complete_distance, _ = prediction_tree.query(ground_truth)
complete_distance = complete_distance.flatten()
accuracy_distance, _ = ground_truth_tree.query(prediction)
accuracy_distance = accuracy_distance.flatten()
# evaluate completeness
complete_mask = complete_distance < self.threshold_complete
completeness = complete_mask.mean()
# evaluate accuracy
accuracy_mask = accuracy_distance < self.threshold_acc
accuracy = accuracy_mask.mean()
fmean = 2.0 / (1 / (accuracy+self.eps) + 1 / (completeness+self.eps))
self.tot_acc += accuracy
self.tot_cmpl += completeness
self.tot_f1_mean += fmean
def count_fscore(self,):
base_color, attrs = 'red', ['bold', 'dark']
print(pcolor('\n######## F score: {} #######'.format(self.tot_f1_mean / self.cnt), base_color, attrs=attrs))
return self.tot_f1_mean / self.cnt
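`Metric_FScore` combines accuracy and completeness with an eps-stabilized harmonic mean; the sketch below (hypothetical values) shows this matches the usual 2ac/(a+c) form up to eps:

```python
# hypothetical values for the harmonic-mean F-score used in Metric_FScore.add_batch
eps = 1e-8
accuracy, completeness = 0.8, 0.6
fmean = 2.0 / (1 / (accuracy + eps) + 1 / (completeness + eps))
# equivalent (up to eps) to 2 * accuracy * completeness / (accuracy + completeness)
```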
class Metric_mRecall():
def __init__(self,
save_dir='.',
num_classes=18,
pred_classes=2,
use_lidar_mask=False,
use_image_mask=False,
):
if num_classes == 18:
self.class_names = [
'others','barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation','free'
]
elif num_classes == 2:
self.class_names = ['non-free', 'free']
self.pred_classes = pred_classes
self.save_dir = save_dir
self.use_lidar_mask = use_lidar_mask
self.use_image_mask = use_image_mask
self.num_classes = num_classes
self.point_cloud_range = [-40.0, -40.0, -1.0, 40.0, 40.0, 5.4]
self.occupancy_size = [0.4, 0.4, 0.4]
self.voxel_size = 0.4
self.occ_xdim = int((self.point_cloud_range[3] - self.point_cloud_range[0]) / self.occupancy_size[0])
self.occ_ydim = int((self.point_cloud_range[4] - self.point_cloud_range[1]) / self.occupancy_size[1])
self.occ_zdim = int((self.point_cloud_range[5] - self.point_cloud_range[2]) / self.occupancy_size[2])
self.voxel_num = self.occ_xdim * self.occ_ydim * self.occ_zdim
self.hist = np.zeros((self.num_classes, self.pred_classes)) # n_cl, p_cl
self.cnt = 0
def hist_info(self, n_cl, p_cl, pred, gt):
"""
Build the confusion matrix.
non-empty classes: 0-16; free voxel class: 17
Args:
n_cl (int): number of ground-truth classes
p_cl (int): number of predicted classes
pred (1-d array): predicted occupancy labels
gt (1-d array): ground-truth occupancy labels
Returns:
tuple: (hist, number of correctly predicted labels, number of labeled samples)
"""
assert pred.shape == gt.shape
k = (gt >= 0) & (gt < n_cl) # exclude 255
labeled = np.sum(k)
correct = np.sum((pred[k] == gt[k]))
return (
np.bincount(
p_cl * gt[k].astype(int) + pred[k].astype(int), minlength=n_cl * p_cl
).reshape(n_cl, p_cl), # 18, 2
correct,
labeled,
)
def per_class_recall(self, hist):
return hist[:, 1] / hist.sum(1) ## recall
def compute_mRecall(self, pred, label, n_classes, p_classes):
hist = np.zeros((n_classes, p_classes))
new_hist, correct, labeled = self.hist_info(n_classes, p_classes, pred.flatten(), label.flatten())
hist += new_hist
mRecalls = self.per_class_recall(hist)
# for ind_class in range(n_classes):
# print(str(round(mIoUs[ind_class] * 100, 2)))
# print('===> mIoU: ' + str(round(np.nanmean(mIoUs) * 100, 2)))
return round(np.nanmean(mRecalls) * 100, 2), hist
def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera):
self.cnt += 1
if self.use_image_mask:
masked_semantics_gt = semantics_gt[mask_camera]
masked_semantics_pred = semantics_pred[mask_camera]
elif self.use_lidar_mask:
masked_semantics_gt = semantics_gt[mask_lidar]
masked_semantics_pred = semantics_pred[mask_lidar]
else:
masked_semantics_gt = semantics_gt
masked_semantics_pred = semantics_pred
if self.pred_classes == 2:
masked_semantics_pred = np.copy(masked_semantics_pred)
masked_semantics_gt = np.copy(masked_semantics_gt)
masked_semantics_pred[masked_semantics_pred < 17] = 1
masked_semantics_pred[masked_semantics_pred == 17] = 0 # 0 is free
_, _hist = self.compute_mRecall(masked_semantics_pred, masked_semantics_gt, self.num_classes, self.pred_classes)
self.hist += _hist
def count_mrecall(self):
mRecall = self.per_class_recall(self.hist)
# assert cnt == num_samples, 'some samples are not included in the miou calculation'
print(f'===> per class Recall of {self.cnt} samples:')
for ind_class in range(self.num_classes-1):
print(f'===> {self.class_names[ind_class]} - Recall = ' + str(round(mRecall[ind_class] * 100, 2)))
print(f'===> mRecall of {self.cnt} samples: ' + str(round(np.nanmean(mRecall[:self.num_classes-1]) * 100, 2)))
return round(np.nanmean(mRecall[:self.num_classes-1]) * 100, 2)
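`per_class_recall` takes the positive-prediction column over each row's total, i.e. TP / (TP + FN) per ground-truth class when `pred_classes == 2`. A quick numpy check (hypothetical counts):

```python
import numpy as np

# hypothetical [num_classes, 2] histogram; column 1 counts positive predictions
hist = np.array([[5.0, 5.0],
                 [2.0, 8.0]])
recall = hist[:, 1] / hist.sum(1)
# class 0: 5 / 10 = 0.5; class 1: 8 / 10 = 0.8
```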
# modified from https://github.com/open-mmlab/mmdetection3d/blob/main/mmdet3d/evaluation/functional/panoptic_seg_eval.py#L10
class Metric_Panoptic():
def __init__(self,
save_dir='.',
num_classes=18,
use_lidar_mask=False,
use_image_mask=False,
ignore_index: Iterable[int]=[],
):
"""
Args:
ignore_index (list): Class ids that will not be considered in PQ counting.
"""
if num_classes == 18:
self.class_names = [
'others','barrier', 'bicycle', 'bus', 'car', 'construction_vehicle',
'motorcycle', 'pedestrian', 'traffic_cone', 'trailer', 'truck',
'driveable_surface', 'other_flat', 'sidewalk',
'terrain', 'manmade', 'vegetation','free'
]
else:
raise ValueError(f'unsupported num_classes: {num_classes}')
self.save_dir = save_dir
self.num_classes = num_classes
self.use_lidar_mask = use_lidar_mask
self.use_image_mask = use_image_mask
self.ignore_index = ignore_index
self.id_offset = 2 ** 16
self.eps = 1e-5
self.min_num_points = 20
self.include = np.array(
[n for n in range(self.num_classes - 1) if n not in self.ignore_index],
dtype=int)
self.cnt = 0
# panoptic stuff
self.pan_tp = np.zeros(self.num_classes, dtype=int)
self.pan_iou = np.zeros(self.num_classes, dtype=np.double)
self.pan_fp = np.zeros(self.num_classes, dtype=int)
self.pan_fn = np.zeros(self.num_classes, dtype=int)
def add_batch(self,semantics_pred,semantics_gt,instances_pred,instances_gt,mask_lidar,mask_camera):
self.cnt += 1
if self.use_image_mask:
masked_semantics_gt = semantics_gt[mask_camera]
masked_semantics_pred = semantics_pred[mask_camera]
masked_instances_gt = instances_gt[mask_camera]
masked_instances_pred = instances_pred[mask_camera]
elif self.use_lidar_mask:
masked_semantics_gt = semantics_gt[mask_lidar]
masked_semantics_pred = semantics_pred[mask_lidar]
masked_instances_gt = instances_gt[mask_lidar]
masked_instances_pred = instances_pred[mask_lidar]
else:
masked_semantics_gt = semantics_gt
masked_semantics_pred = semantics_pred
masked_instances_gt = instances_gt
masked_instances_pred = instances_pred
self.add_panoptic_sample(masked_semantics_pred, masked_semantics_gt, masked_instances_pred, masked_instances_gt)
def add_panoptic_sample(self, semantics_pred, semantics_gt, instances_pred, instances_gt):
"""Add one sample of panoptic predictions and ground truths for
evaluation.
Args:
semantics_pred (np.ndarray): Semantic predictions.
semantics_gt (np.ndarray): Semantic ground truths.
instances_pred (np.ndarray): Instance predictions.
instances_gt (np.ndarray): Instance ground truths.
"""
# get instance_class_id from instance_gt
instance_class_ids = [self.num_classes - 1]
for i in range(1, instances_gt.max() + 1):
class_id = np.unique(semantics_gt[instances_gt == i])
# assert class_id.shape[0] == 1, "each instance must belong to only one class"
if class_id.shape[0] == 1:
instance_class_ids.append(class_id[0])
else:
instance_class_ids.append(self.num_classes - 1)
instance_class_ids = np.array(instance_class_ids)
instance_count = 1
final_instance_class_ids = []
final_instances = np.zeros_like(instances_gt) # empty space has instance id "0"
for class_id in range(self.num_classes - 1):
if np.sum(semantics_gt == class_id) == 0:
continue
if self.class_names[class_id] in ['car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'motorcycle', 'bicycle', 'pedestrian']:
# treat as instances
for instance_id in range(len(instance_class_ids)):
if instance_class_ids[instance_id] != class_id:
continue
final_instances[instances_gt == instance_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
else:
# treat as semantics
final_instances[semantics_gt == class_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
instances_gt = final_instances
# avoid zero (ignored label)
instances_pred = instances_pred + 1
instances_gt = instances_gt + 1
for cl in self.ignore_index:
# make a mask for this class
gt_not_in_excl_mask = semantics_gt != cl
# remove all other points
semantics_pred = semantics_pred[gt_not_in_excl_mask]
semantics_gt = semantics_gt[gt_not_in_excl_mask]
instances_pred = instances_pred[gt_not_in_excl_mask]
instances_gt = instances_gt[gt_not_in_excl_mask]
# for each class (except the ignored ones)
for cl in self.include:
# get a class mask
pred_inst_in_cl_mask = semantics_pred == cl
gt_inst_in_cl_mask = semantics_gt == cl
# get instance points in class (makes outside stuff 0)
pred_inst_in_cl = instances_pred * pred_inst_in_cl_mask.astype(int)
gt_inst_in_cl = instances_gt * gt_inst_in_cl_mask.astype(int)
# generate the areas for each unique instance prediction
unique_pred, counts_pred = np.unique(
pred_inst_in_cl[pred_inst_in_cl > 0], return_counts=True)
id2idx_pred = {id: idx for idx, id in enumerate(unique_pred)}
matched_pred = np.array([False] * unique_pred.shape[0])
# generate the areas for each unique instance gt_np
unique_gt, counts_gt = np.unique(
gt_inst_in_cl[gt_inst_in_cl > 0], return_counts=True)
id2idx_gt = {id: idx for idx, id in enumerate(unique_gt)}
matched_gt = np.array([False] * unique_gt.shape[0])
# generate intersection using offset
valid_combos = np.logical_and(pred_inst_in_cl > 0,
gt_inst_in_cl > 0)
id_offset_combo = pred_inst_in_cl[
valid_combos] + self.id_offset * gt_inst_in_cl[valid_combos]
unique_combo, counts_combo = np.unique(
id_offset_combo, return_counts=True)
# generate an intersection map
# count the intersections with over 0.5 IoU as TP
gt_labels = unique_combo // self.id_offset
pred_labels = unique_combo % self.id_offset
gt_areas = np.array([counts_gt[id2idx_gt[id]] for id in gt_labels])
pred_areas = np.array(
[counts_pred[id2idx_pred[id]] for id in pred_labels])
intersections = counts_combo
unions = gt_areas + pred_areas - intersections
ious = intersections.astype(float) / unions.astype(float)
tp_indexes = ious > 0.5
self.pan_tp[cl] += np.sum(tp_indexes)
self.pan_iou[cl] += np.sum(ious[tp_indexes])
matched_gt[[id2idx_gt[id] for id in gt_labels[tp_indexes]]] = True
matched_pred[[id2idx_pred[id]
for id in pred_labels[tp_indexes]]] = True
# count the FN
if len(counts_gt) > 0:
self.pan_fn[cl] += np.sum(
np.logical_and(counts_gt >= self.min_num_points,
~matched_gt))
# count the FP
if len(matched_pred) > 0:
self.pan_fp[cl] += np.sum(
np.logical_and(counts_pred >= self.min_num_points,
~matched_pred))
    def count_pq(self):
sq_all = self.pan_iou.astype(np.double) / np.maximum(
self.pan_tp.astype(np.double), self.eps)
rq_all = self.pan_tp.astype(np.double) / np.maximum(
self.pan_tp.astype(np.double) + 0.5 * self.pan_fp.astype(np.double)
+ 0.5 * self.pan_fn.astype(np.double), self.eps)
pq_all = sq_all * rq_all
# mask classes not occurring in dataset
mask = (self.pan_tp + self.pan_fp + self.pan_fn) > 0
sq_all[~mask] = float('nan')
rq_all[~mask] = float('nan')
pq_all[~mask] = float('nan')
# then do the REAL mean (no ignored classes)
sq = round(np.nanmean(sq_all[self.include]) * 100, 2)
rq = round(np.nanmean(rq_all[self.include]) * 100, 2)
pq = round(np.nanmean(pq_all[self.include]) * 100, 2)
print(f'===> per class sq, rq, pq of {self.cnt} samples:')
for ind_class in self.include:
print(f'===> {self.class_names[ind_class]} -' + \
f' sq = {round(sq_all[ind_class] * 100, 2)},' + \
f' rq = {round(rq_all[ind_class] * 100, 2)},' + \
f' pq = {round(pq_all[ind_class] * 100, 2)}')
print(f'===> sq of {self.cnt} samples: ' + str(sq))
print(f'===> rq of {self.cnt} samples: ' + str(rq))
print(f'===> pq of {self.cnt} samples: ' + str(pq))
return (pq, sq, rq)
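For reference, the SQ/RQ/PQ aggregation above follows the standard panoptic quality definition, PQ = SQ x RQ. A minimal standalone sketch with hypothetical counts for a single class (not tied to this evaluator):

```python
# Hypothetical accumulated counts for one class, for illustration only.
tp, fp, fn = 8, 2, 2      # matched pairs, unmatched predictions, unmatched GTs
iou_sum = 6.0             # summed IoU over the TP matches
eps = 1e-5

sq = iou_sum / max(tp, eps)                    # segmentation quality: mean TP IoU
rq = tp / max(tp + 0.5 * fp + 0.5 * fn, eps)   # recognition quality: F1-style score
pq = sq * rq
# sq = 0.75, rq = 0.8, pq = 0.6
```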
================================================
FILE: loaders/pipelines/__init__.py
================================================
from .loading import LoadMultiViewImageFromMultiSweeps, LoadOccGTFromFile
from .transforms import PadMultiViewImage, NormalizeMultiviewImage, PhotoMetricDistortionMultiViewImage
__all__ = [
'LoadMultiViewImageFromMultiSweeps', 'PadMultiViewImage', 'NormalizeMultiviewImage',
'PhotoMetricDistortionMultiViewImage', 'LoadOccGTFromFile'
]
================================================
FILE: loaders/pipelines/loading.py
================================================
import os
import mmcv
import torch
import numpy as np
from mmdet.datasets.builder import PIPELINES
from numpy.linalg import inv
from mmcv.runner import get_dist_info
from mmcv.parallel import DataContainer as DC
from mmdet.datasets.pipelines import to_tensor
from torchvision.transforms.functional import rotate
def compose_lidar2img(ego2global_translation_curr,
ego2global_rotation_curr,
lidar2ego_translation_curr,
lidar2ego_rotation_curr,
sensor2global_translation_past,
sensor2global_rotation_past,
cam_intrinsic_past):
R = sensor2global_rotation_past @ (inv(ego2global_rotation_curr).T @ inv(lidar2ego_rotation_curr).T)
T = sensor2global_translation_past @ (inv(ego2global_rotation_curr).T @ inv(lidar2ego_rotation_curr).T)
T -= ego2global_translation_curr @ (inv(ego2global_rotation_curr).T @ inv(lidar2ego_rotation_curr).T) + lidar2ego_translation_curr @ inv(lidar2ego_rotation_curr).T
lidar2cam_r = inv(R.T)
lidar2cam_t = T @ lidar2cam_r.T
lidar2cam_rt = np.eye(4)
lidar2cam_rt[:3, :3] = lidar2cam_r.T
lidar2cam_rt[3, :3] = -lidar2cam_t
viewpad = np.eye(4)
viewpad[:cam_intrinsic_past.shape[0], :cam_intrinsic_past.shape[1]] = cam_intrinsic_past
lidar2img = (viewpad @ lidar2cam_rt.T).astype(np.float32)
return lidar2img
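The `viewpad` trick above embeds the 3x3 camera intrinsics into a 4x4 matrix so it can be chained with homogeneous extrinsics. That embedding in isolation (the intrinsic values here are made up for illustration):

```python
import numpy as np

K = np.array([[1266.4, 0.0, 816.3],
              [0.0, 1266.4, 491.5],
              [0.0, 0.0, 1.0]])   # hypothetical nuScenes-like intrinsics
viewpad = np.eye(4)
viewpad[:K.shape[0], :K.shape[1]] = K
# the 4th row/column keep their identity values, so viewpad @ [x, y, z, 1]
# projects a point exactly as K @ [x, y, z] would
```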
@PIPELINES.register_module()
class LoadMultiViewImageFromMultiSweeps(object):
def __init__(self,
sweeps_num=5,
color_type='color',
test_mode=False):
self.sweeps_num = sweeps_num
self.color_type = color_type
self.test_mode = test_mode
self.train_interval = [4, 8]
self.test_interval = 6
try:
mmcv.use_backend('turbojpeg')
except ImportError:
mmcv.use_backend('cv2')
def load_offline(self, results):
cam_types = [
'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT',
'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_BACK_RIGHT'
]
if len(results['sweeps']['prev']) == 0:
for _ in range(self.sweeps_num):
for j in range(len(cam_types)):
results['img'].append(results['img'][j])
results['img_timestamp'].append(results['img_timestamp'][j])
results['filename'].append(results['filename'][j])
results['lidar2img'].append(np.copy(results['lidar2img'][j]))
if 'ego2lidar' in results:
results['ego2lidar'].append(results['ego2lidar'][0])
else:
if self.test_mode:
interval = self.test_interval
choices = [(k + 1) * interval - 1 for k in range(self.sweeps_num)]
elif len(results['sweeps']['prev']) <= self.sweeps_num:
pad_len = self.sweeps_num - len(results['sweeps']['prev'])
choices = list(range(len(results['sweeps']['prev']))) + [len(results['sweeps']['prev']) - 1] * pad_len
else:
max_interval = len(results['sweeps']['prev']) // self.sweeps_num
max_interval = min(max_interval, self.train_interval[1])
min_interval = min(max_interval, self.train_interval[0])
interval = np.random.randint(min_interval, max_interval + 1)
choices = [(k + 1) * interval - 1 for k in range(self.sweeps_num)]
for idx in sorted(list(choices)):
sweep_idx = min(idx, len(results['sweeps']['prev']) - 1)
sweep = results['sweeps']['prev'][sweep_idx]
if len(sweep.keys()) < len(cam_types):
sweep = results['sweeps']['prev'][sweep_idx - 1]
for sensor in cam_types:
results['img'].append(mmcv.imread(sweep[sensor]['data_path'], self.color_type))
results['img_timestamp'].append(sweep[sensor]['timestamp'] / 1e6)
results['filename'].append(os.path.relpath(sweep[sensor]['data_path']))
results['lidar2img'].append(compose_lidar2img(
results['ego2global_translation'],
results['ego2global_rotation'],
results['lidar2ego_translation'],
results['lidar2ego_rotation'],
sweep[sensor]['sensor2global_translation'],
sweep[sensor]['sensor2global_rotation'],
sweep[sensor]['cam_intrinsic'],
))
if 'ego2lidar' in results:
results['ego2lidar'].append(results['ego2lidar'][0])
return results
def load_online(self, results):
# only used when measuring FPS
assert self.test_mode
assert self.test_interval % 6 == 0
cam_types = [
'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_FRONT_LEFT',
'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_BACK_RIGHT'
]
if len(results['sweeps']['prev']) == 0:
for _ in range(self.sweeps_num):
for j in range(len(cam_types)):
results['img_timestamp'].append(results['img_timestamp'][j])
results['filename'].append(results['filename'][j])
results['lidar2img'].append(np.copy(results['lidar2img'][j]))
if 'ego2lidar' in results:
results['ego2lidar'].append(results['ego2lidar'][0])
else:
interval = self.test_interval
choices = [(k + 1) * interval - 1 for k in range(self.sweeps_num)]
for idx in sorted(list(choices)):
sweep_idx = min(idx, len(results['sweeps']['prev']) - 1)
sweep = results['sweeps']['prev'][sweep_idx]
if len(sweep.keys()) < len(cam_types):
sweep = results['sweeps']['prev'][sweep_idx - 1]
for sensor in cam_types:
# skip loading history frames
results['img_timestamp'].append(sweep[sensor]['timestamp'] / 1e6)
results['filename'].append(os.path.relpath(sweep[sensor]['data_path']))
results['lidar2img'].append(compose_lidar2img(
results['ego2global_translation'],
results['ego2global_rotation'],
results['lidar2ego_translation'],
results['lidar2ego_rotation'],
sweep[sensor]['sensor2global_translation'],
sweep[sensor]['sensor2global_rotation'],
sweep[sensor]['cam_intrinsic'],
))
if 'ego2lidar' in results:
results['ego2lidar'].append(results['ego2lidar'][0])
return results
def __call__(self, results):
if self.sweeps_num == 0:
return results
world_size = get_dist_info()[1]
if world_size == 1 and self.test_mode:
return self.load_online(results)
else:
return self.load_offline(results)
@PIPELINES.register_module()
class LoadOccGTFromFile(object):
def __init__(self, num_classes=18, inst_class_ids=[]):
self.num_classes = num_classes
self.inst_class_ids = inst_class_ids
def __call__(self, results):
occ_labels = np.load(results['occ_path'])
semantics = occ_labels['semantics'] # [200, 200, 16]
# mask_lidar = occ_labels['mask_lidar'].astype(np.bool_) # [200, 200, 16]
# mask_camera = occ_labels['mask_camera'].astype(np.bool_) # [200, 200, 16]
# results['mask_lidar'] = mask_lidar
# results['mask_camera'] = mask_camera
# instance GT
if 'instances' in occ_labels.keys():
instances = occ_labels['instances']
            instance_class_ids = [self.num_classes - 1]  # instance id 0 is always the free class
for i in range(1, instances.max() + 1):
class_id = np.unique(semantics[instances == i])
assert class_id.shape[0] == 1, "each instance must belong to only one class"
instance_class_ids.append(class_id[0])
instance_class_ids = np.array(instance_class_ids)
else:
instances = None
instance_class_ids = None
instance_count = 0
final_instance_class_ids = []
final_instances = np.ones_like(semantics) * 255 # empty space has instance id "255"
for class_id in range(self.num_classes - 1):
if np.sum(semantics == class_id) == 0:
continue
if class_id in self.inst_class_ids:
assert instances is not None, 'instance annotation not found'
# treat as instances
for instance_id in range(len(instance_class_ids)):
if instance_class_ids[instance_id] != class_id:
continue
final_instances[instances == instance_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
else:
# treat as semantics
final_instances[semantics == class_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
results['voxel_semantics'] = semantics
results['voxel_instances'] = final_instances
results['instance_class_ids'] = DC(to_tensor(final_instance_class_ids))
if results.get('rotate_bda', False):
semantics = torch.from_numpy(semantics).permute(2, 0, 1) # [16, 200, 200]
semantics = rotate(semantics, results['rotate_bda'], fill=255).permute(1, 2, 0) # [200, 200, 16]
results['voxel_semantics'] = semantics.numpy()
final_instances = torch.from_numpy(final_instances).permute(2, 0, 1) # [16, 200, 200]
final_instances = rotate(final_instances, results['rotate_bda'], fill=255).permute(1, 2, 0) # [200, 200, 16]
results['voxel_instances'] = final_instances.numpy()
if results.get('flip_dx', False):
results['voxel_semantics'] = results['voxel_semantics'][::-1, ...].copy()
results['voxel_instances'] = results['voxel_instances'][::-1, ...].copy()
if results.get('flip_dy', False):
results['voxel_semantics'] = results['voxel_semantics'][:, ::-1, ...].copy()
results['voxel_instances'] = results['voxel_instances'][:, ::-1, ...].copy()
return results
# https://github.com/HuangJunJie2017/BEVDet/blob/58c2587a8f89a1927926f0bdb6cde2917c91a9a5/mmdet3d/datasets/pipelines/loading.py#L1177
@PIPELINES.register_module()
class BEVAug(object):
def __init__(self, bda_aug_conf, classes, is_train=True):
self.bda_aug_conf = bda_aug_conf
self.is_train = is_train
self.classes = classes
def sample_bda_augmentation(self):
"""Generate bda augmentation values based on bda_config."""
if self.is_train:
rotate_bda = np.random.uniform(*self.bda_aug_conf['rot_lim'])
scale_bda = np.random.uniform(*self.bda_aug_conf['scale_lim'])
flip_dx = np.random.uniform() < self.bda_aug_conf['flip_dx_ratio']
flip_dy = np.random.uniform() < self.bda_aug_conf['flip_dy_ratio']
else:
rotate_bda = 0
scale_bda = 1.0
flip_dx = False
flip_dy = False
return rotate_bda, scale_bda, flip_dx, flip_dy
def bev_transform(self, rotate_angle, scale_ratio, flip_dx, flip_dy):
"""
Returns:
rot_mat: (3, 3)
"""
rotate_angle = torch.tensor(rotate_angle / 180 * np.pi)
rot_sin = torch.sin(rotate_angle)
rot_cos = torch.cos(rotate_angle)
rot_mat = torch.Tensor([[rot_cos, -rot_sin, 0], [rot_sin, rot_cos, 0],
[0, 0, 1]])
scale_mat = torch.Tensor([[scale_ratio, 0, 0], [0, scale_ratio, 0],
[0, 0, scale_ratio]])
flip_mat = torch.Tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
if flip_dx:
flip_mat = flip_mat @ torch.Tensor([[-1, 0, 0], [0, 1, 0],
[0, 0, 1]])
if flip_dy:
flip_mat = flip_mat @ torch.Tensor([[1, 0, 0], [0, -1, 0],
[0, 0, 1]])
rot_mat = flip_mat @ (scale_mat @ rot_mat)
return rot_mat
def __call__(self, results):
rotate_bda, scale_bda, flip_dx, flip_dy = self.sample_bda_augmentation()
bda_mat = torch.zeros(4, 4)
bda_mat[3, 3] = 1
# bda_rot: (3, 3)
bda_rot = self.bev_transform(rotate_bda, scale_bda, flip_dx, flip_dy)
bda_mat[:3, :3] = bda_rot
results['bda_mat'] = bda_mat
results['flip_dx'] = flip_dx
results['flip_dy'] = flip_dy
results['rotate_bda'] = rotate_bda
results['scale_bda'] = scale_bda
for i in range(len(results['ego2lidar'])):
results['ego2lidar'][i] = results['ego2lidar'][i] @ torch.inverse(bda_mat).numpy() # [4, 4] @ [4, 4]
return results
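`BEVAug.bev_transform` above composes rotation, scaling, and flips into one 3x3 matrix in the order `flip @ scale @ rot`. A minimal numpy sketch of the same composition (`bda_matrix` is a hypothetical helper, not part of this pipeline):

```python
import numpy as np

def bda_matrix(rotate_deg, scale, flip_dx, flip_dy):
    """Compose flip @ scale @ rot, mirroring BEVAug.bev_transform."""
    a = np.deg2rad(rotate_deg)
    rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0, 0.0, 1.0]])
    scl = np.diag([scale, scale, scale])
    flp = np.diag([-1.0 if flip_dx else 1.0,
                   -1.0 if flip_dy else 1.0,
                   1.0])
    return flp @ scl @ rot

# a pure x-flip (no rotation, unit scale) is just diag(-1, 1, 1)
```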
================================================
FILE: loaders/pipelines/transforms.py
================================================
import mmcv
import torch
import numpy as np
from PIL import Image
from numpy import random
from mmdet.datasets.builder import PIPELINES
@PIPELINES.register_module()
class PadMultiViewImage(object):
"""Pad the multi-view image.
There are two padding modes: (1) pad to a fixed size and (2) pad to the
minimum size that is divisible by some number.
    Added keys are "pad_shape", "pad_fixed_size" and "pad_size_divisor".
Args:
size (tuple, optional): Fixed padding size.
size_divisor (int, optional): The divisor of padded size.
pad_val (float, optional): Padding value, 0 by default.
"""
def __init__(self, size=None, size_divisor=None, pad_val=0):
self.size = size
self.size_divisor = size_divisor
self.pad_val = pad_val
# only one of size and size_divisor should be valid
assert size is not None or size_divisor is not None
assert size is None or size_divisor is None
def _pad_img(self, img):
if self.size_divisor is not None:
pad_h = int(np.ceil(img.shape[0] / self.size_divisor)) * self.size_divisor
pad_w = int(np.ceil(img.shape[1] / self.size_divisor)) * self.size_divisor
else:
pad_h, pad_w = self.size
pad_width = ((0, pad_h - img.shape[0]), (0, pad_w - img.shape[1]), (0, 0))
img = np.pad(img, pad_width, constant_values=self.pad_val)
return img
def _pad_imgs(self, results):
padded_img = [self._pad_img(img) for img in results['img']]
results['ori_shape'] = [img.shape for img in results['img']]
results['img'] = padded_img
results['img_shape'] = [img.shape for img in padded_img]
results['pad_shape'] = [img.shape for img in padded_img]
results['pad_fixed_size'] = self.size
results['pad_size_divisor'] = self.size_divisor
def __call__(self, results):
"""Call function to pad images, masks, semantic segmentation maps.
Args:
results (dict): Result dict from loading pipeline.
Returns:
dict: Updated result dict.
"""
self._pad_imgs(results)
return results
def __repr__(self):
repr_str = self.__class__.__name__
repr_str += f'(size={self.size}, '
repr_str += f'size_divisor={self.size_divisor}, '
repr_str += f'pad_val={self.pad_val})'
return repr_str
@PIPELINES.register_module()
class NormalizeMultiviewImage(object):
"""Normalize the image.
Added key is "img_norm_cfg".
Args:
mean (sequence): Mean values of 3 channels.
std (sequence): Std values of 3 channels.
        to_rgb (bool): Whether to convert the image from BGR to RGB,
            default is True.
"""
def __init__(self, mean, std, to_rgb=True):
self.mean = np.array(mean, dtype=np.float32).reshape(-1)
        self.std = 1 / np.array(std, dtype=np.float32).reshape(-1)  # store the reciprocal so __call__ can multiply instead of divide
self.to_rgb = to_rgb
def __call__(self, results):
"""Call function to normalize images.
Args:
results (dict): Result dict from loading pipeline.
Returns:
dict: Normalized results, 'img_norm_cfg' key is added into
result dict.
"""
normalized_imgs = []
for img in results['img']:
img = img.astype(np.float32)
if self.to_rgb:
img = img[..., ::-1]
img = img - self.mean
img = img * self.std
normalized_imgs.append(img)
results['img'] = normalized_imgs
results['img_norm_cfg'] = dict(
mean=self.mean,
std=self.std,
to_rgb=self.to_rgb
)
return results
def __repr__(self):
repr_str = self.__class__.__name__
repr_str += f'(mean={self.mean}, std={self.std}, to_rgb={self.to_rgb})'
return repr_str
@PIPELINES.register_module()
class PhotoMetricDistortionMultiViewImage:
"""Apply photometric distortion to image sequentially, every transformation
is applied with a probability of 0.5. The position of random contrast is in
second or second to last.
1. random brightness
2. random contrast (mode 0)
3. convert color from BGR to HSV
4. random saturation
5. random hue
6. convert color from HSV to BGR
7. random contrast (mode 1)
8. randomly swap channels
Args:
brightness_delta (int): delta of brightness.
contrast_range (tuple): range of contrast.
saturation_range (tuple): range of saturation.
hue_delta (int): delta of hue.
"""
def __init__(self,
brightness_delta=32,
contrast_range=(0.5, 1.5),
saturation_range=(0.5, 1.5),
hue_delta=18):
self.brightness_delta = brightness_delta
self.contrast_lower, self.contrast_upper = contrast_range
self.saturation_lower, self.saturation_upper = saturation_range
self.hue_delta = hue_delta
def __call__(self, results):
"""Call function to perform photometric distortion on images.
Args:
results (dict): Result dict from loading pipeline.
Returns:
dict: Result dict with images distorted.
"""
imgs = results['img']
new_imgs = []
for img in imgs:
ori_dtype = img.dtype
img = img.astype(np.float32)
# random brightness
if random.randint(2):
delta = random.uniform(-self.brightness_delta,
self.brightness_delta)
img += delta
# mode == 0 --> do random contrast first
# mode == 1 --> do random contrast last
mode = random.randint(2)
if mode == 1:
if random.randint(2):
alpha = random.uniform(self.contrast_lower,
self.contrast_upper)
img *= alpha
# convert color from BGR to HSV
img = mmcv.bgr2hsv(img)
# random saturation
if random.randint(2):
img[..., 1] *= random.uniform(self.saturation_lower,
self.saturation_upper)
# random hue
if random.randint(2):
img[..., 0] += random.uniform(-self.hue_delta, self.hue_delta)
img[..., 0][img[..., 0] > 360] -= 360
img[..., 0][img[..., 0] < 0] += 360
# convert color from HSV to BGR
img = mmcv.hsv2bgr(img)
# random contrast
if mode == 0:
if random.randint(2):
alpha = random.uniform(self.contrast_lower,
self.contrast_upper)
img *= alpha
# randomly swap channels
if random.randint(2):
img = img[..., random.permutation(3)]
new_imgs.append(img.astype(ori_dtype))
results['img'] = new_imgs
return results
def __repr__(self):
repr_str = self.__class__.__name__
repr_str += f'(\nbrightness_delta={self.brightness_delta},\n'
repr_str += 'contrast_range='
repr_str += f'{(self.contrast_lower, self.contrast_upper)},\n'
repr_str += 'saturation_range='
repr_str += f'{(self.saturation_lower, self.saturation_upper)},\n'
repr_str += f'hue_delta={self.hue_delta})'
return repr_str
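The hue channel produced by `mmcv.bgr2hsv` is circular, so the hue shift in `__call__` wraps values back into [0, 360). The wrap in isolation, with a made-up delta:

```python
import numpy as np

hue = np.array([350.0, 10.0])
hue += 30.0                 # a hypothetical random hue delta
hue[hue > 360] -= 360       # wrap around the circle
hue[hue < 0] += 360
# hue is now [20., 40.]
```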
@PIPELINES.register_module()
class RandomTransformImage(object):
def __init__(self, ida_aug_conf=None, training=True):
self.ida_aug_conf = ida_aug_conf
self.training = training
def __call__(self, results):
resize, resize_dims, crop, flip, rotate = self.sample_augmentation()
if len(results['lidar2img']) == len(results['img']):
for i in range(len(results['img'])):
img = Image.fromarray(np.uint8(results['img'][i]))
# resize, resize_dims, crop, flip, rotate = self._sample_augmentation()
img, ida_mat = self.img_transform(
img,
resize=resize,
resize_dims=resize_dims,
crop=crop,
flip=flip,
rotate=rotate,
)
results['img'][i] = np.array(img).astype(np.uint8)
results['lidar2img'][i] = ida_mat @ results['lidar2img'][i]
elif len(results['img']) == 6:
for i in range(len(results['img'])):
img = Image.fromarray(np.uint8(results['img'][i]))
# resize, resize_dims, crop, flip, rotate = self._sample_augmentation()
img, ida_mat = self.img_transform(
img,
resize=resize,
resize_dims=resize_dims,
crop=crop,
flip=flip,
rotate=rotate,
)
results['img'][i] = np.array(img).astype(np.uint8)
for i in range(len(results['lidar2img'])):
results['lidar2img'][i] = ida_mat @ results['lidar2img'][i]
else:
raise ValueError()
results['ori_shape'] = [img.shape for img in results['img']]
results['img_shape'] = [img.shape for img in results['img']]
results['pad_shape'] = [img.shape for img in results['img']]
return results
def img_transform(self, img, resize, resize_dims, crop, flip, rotate):
"""
https://github.com/Megvii-BaseDetection/BEVStereo/blob/master/dataset/nusc_mv_det_dataset.py#L48
"""
def get_rot(h):
return torch.Tensor([
[np.cos(h), np.sin(h)],
[-np.sin(h), np.cos(h)],
])
ida_rot = torch.eye(2)
ida_tran = torch.zeros(2)
# adjust image
img = img.resize(resize_dims)
img = img.crop(crop)
if flip:
img = img.transpose(method=Image.FLIP_LEFT_RIGHT)
img = img.rotate(rotate)
# post-homography transformation
ida_rot *= resize
ida_tran -= torch.Tensor(crop[:2])
if flip:
A = torch.Tensor([[-1, 0], [0, 1]])
b = torch.Tensor([crop[2] - crop[0], 0])
ida_rot = A.matmul(ida_rot)
ida_tran = A.matmul(ida_tran) + b
A = get_rot(rotate / 180 * np.pi)
b = torch.Tensor([crop[2] - crop[0], crop[3] - crop[1]]) / 2
b = A.matmul(-b) + b
ida_rot = A.matmul(ida_rot)
ida_tran = A.matmul(ida_tran) + b
ida_mat = torch.eye(4)
ida_mat[:2, :2] = ida_rot
ida_mat[:2, 2] = ida_tran
return img, ida_mat.numpy()
def sample_augmentation(self):
"""
https://github.com/Megvii-BaseDetection/BEVStereo/blob/master/dataset/nusc_mv_det_dataset.py#L247
"""
H, W = self.ida_aug_conf['H'], self.ida_aug_conf['W']
fH, fW = self.ida_aug_conf['final_dim']
if self.training:
resize = np.random.uniform(*self.ida_aug_conf['resize_lim'])
resize_dims = (int(W * resize), int(H * resize))
newW, newH = resize_dims
crop_h = int((1 - np.random.uniform(*self.ida_aug_conf['bot_pct_lim'])) * newH) - fH
crop_w = int(np.random.uniform(0, max(0, newW - fW)))
crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
flip = False
if self.ida_aug_conf['rand_flip'] and np.random.choice([0, 1]):
flip = True
rotate = np.random.uniform(*self.ida_aug_conf['rot_lim'])
else:
resize = max(fH / H, fW / W)
resize_dims = (int(W * resize), int(H * resize))
newW, newH = resize_dims
crop_h = int((1 - np.mean(self.ida_aug_conf['bot_pct_lim'])) * newH) - fH
crop_w = int(max(0, newW - fW) / 2)
crop = (crop_w, crop_h, crop_w + fW, crop_h + fH)
flip = False
rotate = 0
return resize, resize_dims, crop, flip, rotate
@PIPELINES.register_module()
class GlobalRotScaleTransImage(object):
def __init__(self,
rot_range=[-0.3925, 0.3925],
scale_ratio_range=[0.95, 1.05],
translation_std=[0, 0, 0]):
self.rot_range = rot_range
self.scale_ratio_range = scale_ratio_range
self.translation_std = translation_std
def __call__(self, results):
# random rotate
rot_angle = np.random.uniform(*self.rot_range)
self.rotate_z(results, rot_angle)
results["gt_bboxes_3d"].rotate(np.array(rot_angle))
# random scale
scale_ratio = np.random.uniform(*self.scale_ratio_range)
self.scale_xyz(results, scale_ratio)
results["gt_bboxes_3d"].scale(scale_ratio)
# TODO: support translation
return results
def rotate_z(self, results, rot_angle):
rot_cos = torch.cos(torch.tensor(rot_angle))
rot_sin = torch.sin(torch.tensor(rot_angle))
rot_mat = torch.tensor([
[rot_cos, -rot_sin, 0, 0],
[rot_sin, rot_cos, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1],
])
rot_mat_inv = torch.inverse(rot_mat)
for view in range(len(results['lidar2img'])):
results['lidar2img'][view] = (torch.tensor(results['lidar2img'][view]).float() @ rot_mat_inv).numpy()
def scale_xyz(self, results, scale_ratio):
scale_mat = torch.tensor([
[scale_ratio, 0, 0, 0],
[0, scale_ratio, 0, 0],
[0, 0, scale_ratio, 0],
[0, 0, 0, 1],
])
scale_mat_inv = torch.inverse(scale_mat)
for view in range(len(results['lidar2img'])):
results['lidar2img'][view] = (torch.tensor(results['lidar2img'][view]).float() @ scale_mat_inv).numpy()
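`img_transform` above folds resize/crop/flip/rotate into one 4x4 "ida" matrix that left-multiplies `lidar2img`. A sketch of just the resize+crop part under the same conventions (the helper name and values are illustrative, and flip/rotation are omitted):

```python
import numpy as np

def ida_resize_crop(resize, crop):
    """4x4 image-augmentation matrix for a uniform resize followed by a crop."""
    ida = np.eye(4)
    ida[:2, :2] *= resize                            # scale pixel coordinates
    ida[:2, 2] = -np.asarray(crop[:2], dtype=float)  # the crop shifts the origin
    return ida

# a pixel at (100, 200) after a 0.5x resize and a crop starting at (10, 20):
p = ida_resize_crop(0.5, (10, 20, 362, 148)) @ np.array([100.0, 200.0, 1.0, 1.0])
# p[:2] is [40., 80.]
```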
================================================
FILE: loaders/ray_metrics.py
================================================
# Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting
# Modified by Haisong Liu
import math
import copy
import numpy as np
import torch
from torch.utils.cpp_extension import load
from tqdm import tqdm
from prettytable import PrettyTable
from .ray_pq import Metric_RayPQ
dvr = load("dvr", sources=["lib/dvr/dvr.cpp", "lib/dvr/dvr.cu"], verbose=True, extra_cuda_cflags=['-allow-unsupported-compiler'])
_pc_range = [-40, -40, -1.0, 40, 40, 5.4]
_voxel_size = 0.4
# https://github.com/tarashakhurana/4d-occ-forecasting/blob/ff986082cd6ea10e67ab7839bf0e654736b3f4e2/test_fgbg.py#L29C1-L46C16
def get_rendered_pcds(origin, points, tindex, pred_dist):
pcds = []
for t in range(len(origin)):
mask = (tindex == t)
# skip the ones with no data
if not mask.any():
continue
_pts = points[mask, :3]
# use ground truth lidar points for the raycasting direction
v = _pts - origin[t][None, :]
d = v / np.sqrt((v ** 2).sum(axis=1, keepdims=True))
pred_pts = origin[t][None, :] + d * pred_dist[mask][:, None]
pcds.append(torch.from_numpy(pred_pts))
return pcds
def meshgrid3d(occ_size, pc_range):
W, H, D = occ_size
xs = torch.linspace(0.5, W - 0.5, W).view(W, 1, 1).expand(W, H, D) / W
ys = torch.linspace(0.5, H - 0.5, H).view(1, H, 1).expand(W, H, D) / H
zs = torch.linspace(0.5, D - 0.5, D).view(1, 1, D).expand(W, H, D) / D
xs = xs * (pc_range[3] - pc_range[0]) + pc_range[0]
ys = ys * (pc_range[4] - pc_range[1]) + pc_range[1]
zs = zs * (pc_range[5] - pc_range[2]) + pc_range[2]
xyz = torch.stack((xs, ys, zs), -1)
return xyz
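`meshgrid3d` returns the metric center of every voxel: normalized grid coordinates scaled into `pc_range`. Along one axis this reduces to the line below; with the default 200-cell, [-40, 40] m x-axis, the first and last centers sit half a voxel (0.2 m) inside the range:

```python
import numpy as np

W, pc_min, pc_max = 200, -40.0, 40.0
xs = np.linspace(0.5, W - 0.5, W) / W * (pc_max - pc_min) + pc_min
# xs[0] == -39.8 and xs[-1] == 39.8
```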
def generate_lidar_rays():
# prepare lidar ray angles
pitch_angles = []
for k in range(10):
angle = math.pi / 2 - math.atan(k + 1)
pitch_angles.append(-angle)
# nuscenes lidar fov: [0.2107773983152201, -0.5439104895672159] (rad)
while pitch_angles[-1] < 0.21:
delta = pitch_angles[-1] - pitch_angles[-2]
pitch_angles.append(pitch_angles[-1] + delta)
lidar_rays = []
for pitch_angle in pitch_angles:
for azimuth_angle in np.arange(0, 360, 1):
azimuth_angle = np.deg2rad(azimuth_angle)
x = np.cos(pitch_angle) * np.cos(azimuth_angle)
y = np.cos(pitch_angle) * np.sin(azimuth_angle)
z = np.sin(pitch_angle)
lidar_rays.append((x, y, z))
return np.array(lidar_rays, dtype=np.float32)
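Each generated ray is a unit direction vector, since the pitch/azimuth expressions above are plain spherical-to-Cartesian conversion. A quick standalone check of that property for an arbitrary angle pair:

```python
import math
import numpy as np

pitch, azimuth = -0.3, math.radians(45)   # arbitrary sample angles
ray = np.array([
    math.cos(pitch) * math.cos(azimuth),
    math.cos(pitch) * math.sin(azimuth),
    math.sin(pitch),
])
# |ray| == 1 up to floating-point error
```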
def process_one_sample(sem_pred, lidar_rays, output_origin, instance_pred=None, occ_class_names=None):
# lidar origin in ego coordinate
# lidar_origin = torch.tensor([[[0.9858, 0.0000, 1.8402]]])
T = output_origin.shape[1]
pred_pcds_t = []
free_id = len(occ_class_names) - 1
occ_pred = copy.deepcopy(sem_pred)
occ_pred[sem_pred < free_id] = 1
occ_pred[sem_pred == free_id] = 0
occ_pred = occ_pred.permute(2, 1, 0)
occ_pred = occ_pred[None, None, :].contiguous().float()
offset = torch.Tensor(_pc_range[:3])[None, None, :]
scaler = torch.Tensor([_voxel_size] * 3)[None, None, :]
lidar_tindex = torch.zeros([1, lidar_rays.shape[0]])
for t in range(T):
lidar_origin = output_origin[:, t:t+1, :] # [1, 1, 3]
lidar_endpts = lidar_rays[None] + lidar_origin # [1, 15840, 3]
output_origin_render = ((lidar_origin - offset) / scaler).float() # [1, 1, 3]
output_points_render = ((lidar_endpts - offset) / scaler).float() # [1, N, 3]
output_tindex_render = lidar_tindex # [1, N], all zeros
with torch.no_grad():
pred_dist, _, coord_index = dvr.render_forward(
occ_pred.cuda(),
output_origin_render.cuda(),
output_points_render.cuda(),
output_tindex_render.cuda(),
[1, 16, 200, 200],
"test"
)
pred_dist *= _voxel_size
pred_pcds = get_rendered_pcds(
lidar_origin[0].cpu().numpy(),
lidar_endpts[0].cpu().numpy(),
lidar_tindex[0].cpu().numpy(),
pred_dist[0].cpu().numpy()
)
coord_index = coord_index[0, :, :].int().cpu() # [N, 3]
pred_label = sem_pred[coord_index[:, 0], coord_index[:, 1], coord_index[:, 2]][:, None] # [N, 1]
pred_dist = pred_dist[0, :, None].cpu()
if instance_pred is not None:
pred_instance = instance_pred[coord_index[:, 0], coord_index[:, 1], coord_index[:, 2]][:, None] # [N, 1]
pred_pcds = torch.cat([pred_label.float(), pred_instance.float(), pred_dist], dim=-1)
else:
pred_pcds = torch.cat([pred_label.float(), pred_dist], dim=-1)
pred_pcds_t.append(pred_pcds)
pred_pcds_t = torch.cat(pred_pcds_t, dim=0)
return pred_pcds_t.numpy()
def calc_rayiou(pcd_pred_list, pcd_gt_list, occ_class_names):
thresholds = [1, 2, 4]
gt_cnt = np.zeros([len(occ_class_names)])
pred_cnt = np.zeros([len(occ_class_names)])
tp_cnt = np.zeros([len(thresholds), len(occ_class_names)])
for pcd_pred, pcd_gt in zip(pcd_pred_list, pcd_gt_list):
for j, threshold in enumerate(thresholds):
# L1
depth_pred = pcd_pred[:, 1]
depth_gt = pcd_gt[:, 1]
l1_error = np.abs(depth_pred - depth_gt)
tp_dist_mask = (l1_error < threshold)
            for i, cls in enumerate(occ_class_names):
                cls_id = i  # a class id is just its index in occ_class_names
cls_mask_pred = (pcd_pred[:, 0] == cls_id)
cls_mask_gt = (pcd_gt[:, 0] == cls_id)
gt_cnt_i = cls_mask_gt.sum()
pred_cnt_i = cls_mask_pred.sum()
if j == 0:
gt_cnt[i] += gt_cnt_i
pred_cnt[i] += pred_cnt_i
tp_cls = cls_mask_gt & cls_mask_pred # [N]
tp_mask = np.logical_and(tp_cls, tp_dist_mask)
tp_cnt[j][i] += tp_mask.sum()
iou_list = []
for j, threshold in enumerate(thresholds):
iou_list.append((tp_cnt[j] / (gt_cnt + pred_cnt - tp_cnt[j]))[:-1])
return iou_list
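The bookkeeping in calc_rayiou can be checked on toy numbers: for each class, a ray counts as a true positive only when the semantic label matches and the depth L1 error is under the threshold, and IoU is TP / (GT + pred - TP). A minimal torch-free sketch with made-up values (illustrative only, not from the dataset):

```python
# toy check of the per-class RayIoU bookkeeping: 4 rays, class 0 of interest
gt_cls   = [0, 0, 0, 1]      # class id per gt ray (1 = free)
pred_cls = [0, 0, 1, 1]
l1_err   = [0.5, 3.0, 0.1, 0.2]
threshold = 1.0

gt_cnt   = sum(c == 0 for c in gt_cls)                   # rays labeled 0 in gt
pred_cnt = sum(c == 0 for c in pred_cls)                 # rays labeled 0 in pred
tp = sum(g == 0 and p == 0 and e < threshold
         for g, p, e in zip(gt_cls, pred_cls, l1_err))   # label match AND close depth
iou = tp / (gt_cnt + pred_cnt - tp)
```

Only the first ray qualifies (matching label, error 0.5 < 1.0), giving IoU 1 / (3 + 2 - 1) = 0.25.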
def main_rayiou(sem_pred_list, sem_gt_list, lidar_origin_list, occ_class_names):
torch.cuda.empty_cache()
# generate lidar rays
lidar_rays = generate_lidar_rays()
lidar_rays = torch.from_numpy(lidar_rays)
pcd_pred_list, pcd_gt_list = [], []
for sem_pred, sem_gt, lidar_origins in tqdm(zip(sem_pred_list, sem_gt_list, lidar_origin_list), ncols=50):
sem_pred = torch.from_numpy(np.reshape(sem_pred, [200, 200, 16]))
sem_gt = torch.from_numpy(np.reshape(sem_gt, [200, 200, 16]))
pcd_pred = process_one_sample(sem_pred, lidar_rays, lidar_origins, occ_class_names=occ_class_names)
pcd_gt = process_one_sample(sem_gt, lidar_rays, lidar_origins, occ_class_names=occ_class_names)
# evaluate on non-free rays
valid_mask = (pcd_gt[:, 0].astype(np.int32) != len(occ_class_names) - 1)
pcd_pred = pcd_pred[valid_mask]
pcd_gt = pcd_gt[valid_mask]
assert pcd_pred.shape == pcd_gt.shape
pcd_pred_list.append(pcd_pred)
pcd_gt_list.append(pcd_gt)
iou_list = calc_rayiou(pcd_pred_list, pcd_gt_list, occ_class_names)
rayiou = np.nanmean(iou_list)
rayiou_0 = np.nanmean(iou_list[0])
rayiou_1 = np.nanmean(iou_list[1])
rayiou_2 = np.nanmean(iou_list[2])
table = PrettyTable([
'Class Names',
'RayIoU@1', 'RayIoU@2', 'RayIoU@4'
])
table.float_format = '.3'
for i in range(len(occ_class_names) - 1):
table.add_row([
occ_class_names[i],
iou_list[0][i], iou_list[1][i], iou_list[2][i]
], divider=(i == len(occ_class_names) - 2))
table.add_row(['MEAN', rayiou_0, rayiou_1, rayiou_2])
print(table)
torch.cuda.empty_cache()
return {
'RayIoU': rayiou,
'RayIoU@1': rayiou_0,
'RayIoU@2': rayiou_1,
'RayIoU@4': rayiou_2,
}
def main_raypq(sem_pred_list, sem_gt_list, inst_pred_list, inst_gt_list, lidar_origin_list, occ_class_names):
torch.cuda.empty_cache()
eval_metrics_pq = Metric_RayPQ(
occ_class_names=occ_class_names,
num_classes=len(occ_class_names),
thresholds=[1, 2, 4]
)
# generate lidar rays
lidar_rays = generate_lidar_rays()
lidar_rays = torch.from_numpy(lidar_rays)
for sem_pred, sem_gt, inst_pred, inst_gt, lidar_origins in \
tqdm(zip(sem_pred_list, sem_gt_list, inst_pred_list, inst_gt_list, lidar_origin_list), ncols=50):
sem_pred = torch.from_numpy(np.reshape(sem_pred, [200, 200, 16]))
sem_gt = torch.from_numpy(np.reshape(sem_gt, [200, 200, 16]))
inst_pred = torch.from_numpy(np.reshape(inst_pred, [200, 200, 16]))
inst_gt = torch.from_numpy(np.reshape(inst_gt, [200, 200, 16]))
pcd_pred = process_one_sample(sem_pred, lidar_rays, lidar_origins, instance_pred=inst_pred, occ_class_names=occ_class_names)
pcd_gt = process_one_sample(sem_gt, lidar_rays, lidar_origins, instance_pred=inst_gt, occ_class_names=occ_class_names)
# evaluate on non-free rays
valid_mask = (pcd_gt[:, 0].astype(np.int32) != len(occ_class_names) - 1)
pcd_pred = pcd_pred[valid_mask]
pcd_gt = pcd_gt[valid_mask]
assert pcd_pred.shape == pcd_gt.shape
sem_gt = pcd_gt[:, 0].astype(np.int32)
sem_pred = pcd_pred[:, 0].astype(np.int32)
instances_gt = pcd_gt[:, 1].astype(np.int32)
instances_pred = pcd_pred[:, 1].astype(np.int32)
# L1
depth_gt = pcd_gt[:, 2]
depth_pred = pcd_pred[:, 2]
l1_error = np.abs(depth_pred - depth_gt)
eval_metrics_pq.add_batch(sem_pred, sem_gt, instances_pred, instances_gt, l1_error)
torch.cuda.empty_cache()
return eval_metrics_pq.count_pq()
================================================
FILE: loaders/ray_pq.py
================================================
import numpy as np
from prettytable import PrettyTable
class Metric_RayPQ:
def __init__(self,
occ_class_names,
num_classes=18,
thresholds=[1, 2, 4]):
"""
Args:
occ_class_names (list[str]): Names of the occupancy classes.
num_classes (int): Number of classes, including the free class.
thresholds (list[int]): Depth error thresholds (in meters) for TP counting.
"""
if num_classes == 18 or num_classes == 17:
self.class_names = occ_class_names
else:
raise ValueError
self.num_classes = num_classes
self.id_offset = 2 ** 16
self.eps = 1e-5
self.thresholds = thresholds
self.min_num_points = 10
self.include = np.array(
[n for n in range(self.num_classes - 1)],
dtype=int)
self.cnt = 0
# panoptic stuff
self.pan_tp = np.zeros([len(self.thresholds), num_classes], dtype=int)
self.pan_iou = np.zeros([len(self.thresholds), num_classes], dtype=np.double)
self.pan_fp = np.zeros([len(self.thresholds), num_classes], dtype=int)
self.pan_fn = np.zeros([len(self.thresholds), num_classes], dtype=int)
def add_batch(self,semantics_pred,semantics_gt,instances_pred,instances_gt, l1_error):
self.cnt += 1
self.add_panoptic_sample(semantics_pred, semantics_gt, instances_pred, instances_gt, l1_error)
def add_panoptic_sample(self, semantics_pred, semantics_gt, instances_pred, instances_gt, l1_error):
"""Add one sample of panoptic predictions and ground truths for
evaluation.
Args:
semantics_pred (np.ndarray): Semantic predictions.
semantics_gt (np.ndarray): Semantic ground truths.
instances_pred (np.ndarray): Instance predictions.
instances_gt (np.ndarray): Instance ground truths.
l1_error (np.ndarray): Per-ray depth L1 errors, used to gate true positives by threshold.
"""
# get instance_class_id from instance_gt
instance_class_ids = [self.num_classes - 1]
for i in range(1, instances_gt.max() + 1):
class_id = np.unique(semantics_gt[instances_gt == i])
# assert class_id.shape[0] == 1, "each instance must belong to only one class"
if class_id.shape[0] == 1:
instance_class_ids.append(class_id[0])
else:
instance_class_ids.append(self.num_classes - 1)
instance_class_ids = np.array(instance_class_ids)
instance_count = 1
final_instance_class_ids = []
final_instances = np.zeros_like(instances_gt) # empty space has instance id "0"
for class_id in range(self.num_classes - 1):
if np.sum(semantics_gt == class_id) == 0:
continue
if self.class_names[class_id] in ['car', 'truck', 'construction_vehicle', 'bus', 'trailer', 'motorcycle', 'bicycle', 'pedestrian']:
# treat as instances
for instance_id in range(len(instance_class_ids)):
if instance_class_ids[instance_id] != class_id:
continue
final_instances[instances_gt == instance_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
else:
# treat as semantics
final_instances[semantics_gt == class_id] = instance_count
instance_count += 1
final_instance_class_ids.append(class_id)
instances_gt = final_instances
# avoid zero (ignored label)
instances_pred = instances_pred + 1
instances_gt = instances_gt + 1
for j, threshold in enumerate(self.thresholds):
tp_dist_mask = l1_error < threshold
# for each class (except the ignored ones)
for cl in self.include:
# get a class mask
pred_inst_in_cl_mask = semantics_pred == cl
gt_inst_in_cl_mask = semantics_gt == cl
# get instance points in class (makes outside stuff 0)
pred_inst_in_cl = instances_pred * pred_inst_in_cl_mask.astype(int)
gt_inst_in_cl = instances_gt * gt_inst_in_cl_mask.astype(int)
# generate the areas for each unique instance prediction
unique_pred, counts_pred = np.unique(
pred_inst_in_cl[pred_inst_in_cl > 0], return_counts=True)
id2idx_pred = {id: idx for idx, id in enumerate(unique_pred)}
matched_pred = np.array([False] * unique_pred.shape[0])
# generate the areas for each unique instance gt_np
unique_gt, counts_gt = np.unique(
gt_inst_in_cl[gt_inst_in_cl > 0], return_counts=True)
id2idx_gt = {id: idx for idx, id in enumerate(unique_gt)}
matched_gt = np.array([False] * unique_gt.shape[0])
# generate intersection using offset
valid_combos = np.logical_and(pred_inst_in_cl > 0,
gt_inst_in_cl > 0)
# add dist_mask
valid_combos = np.logical_and(valid_combos, tp_dist_mask)
id_offset_combo = pred_inst_in_cl[
valid_combos] + self.id_offset * gt_inst_in_cl[valid_combos]
unique_combo, counts_combo = np.unique(
id_offset_combo, return_counts=True)
# generate an intersection map
# count the intersections with over 0.5 IoU as TP
gt_labels = unique_combo // self.id_offset
pred_labels = unique_combo % self.id_offset
gt_areas = np.array([counts_gt[id2idx_gt[id]] for id in gt_labels])
pred_areas = np.array(
[counts_pred[id2idx_pred[id]] for id in pred_labels])
intersections = counts_combo
unions = gt_areas + pred_areas - intersections
ious = intersections.astype(float) / unions.astype(float)
tp_indexes = ious > 0.5
self.pan_tp[j][cl] += np.sum(tp_indexes)
self.pan_iou[j][cl] += np.sum(ious[tp_indexes])
matched_gt[[id2idx_gt[id] for id in gt_labels[tp_indexes]]] = True
matched_pred[[id2idx_pred[id]
for id in pred_labels[tp_indexes]]] = True
# count the FN
if len(counts_gt) > 0:
self.pan_fn[j][cl] += np.sum(
np.logical_and(counts_gt >= self.min_num_points,
~matched_gt))
# count the FP
if len(matched_pred) > 0:
self.pan_fp[j][cl] += np.sum(
np.logical_and(counts_pred >= self.min_num_points,
~matched_pred))
def count_pq(self):
sq_all = self.pan_iou.astype(np.double) / np.maximum(
self.pan_tp.astype(np.double), self.eps)
rq_all = self.pan_tp.astype(np.double) / np.maximum(
self.pan_tp.astype(np.double) + 0.5 * self.pan_fp.astype(np.double)
+ 0.5 * self.pan_fn.astype(np.double), self.eps)
pq_all = sq_all * rq_all
# mask classes not occurring in dataset
mask = (self.pan_tp + self.pan_fp + self.pan_fn) > 0
pq_all[~mask] = float('nan')
table = PrettyTable([
'Class Names',
'RayPQ@%d' % self.thresholds[0],
'RayPQ@%d' % self.thresholds[1],
'RayPQ@%d' % self.thresholds[2]
])
table.float_format = '.3'
for i in range(len(self.class_names) - 1):
table.add_row([
self.class_names[i],
pq_all[0][i], pq_all[1][i], pq_all[2][i],
], divider=(i == len(self.class_names) - 2))
table.add_row([
'MEAN',
np.nanmean(pq_all[0]), np.nanmean(pq_all[1]), np.nanmean(pq_all[2])
])
print(table)
return {
'RayPQ': np.nanmean(pq_all),
'RayPQ@1': np.nanmean(pq_all[0]),
'RayPQ@2': np.nanmean(pq_all[1]),
'RayPQ@4': np.nanmean(pq_all[2]),
}
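Two pieces of the metric above are easy to sanity-check in isolation: the panoptic-quality formula PQ = SQ × RQ used in count_pq, and the id_offset trick that packs (pred_id, gt_id) pairs into single integers so np.unique can count per-pair intersections. A small sketch with made-up counts:

```python
# toy check of the PQ formula: PQ = SQ * RQ
tp, fp, fn = 2, 1, 1
sum_iou = 1.5                           # summed IoU over the 2 matched segments
sq = sum_iou / tp                       # segmentation quality
rq = tp / (tp + 0.5 * fp + 0.5 * fn)    # recognition quality
pq = sq * rq

# the id_offset trick: pack (pred_id, gt_id) pairs into one integer,
# then recover them with % and // as add_panoptic_sample does
OFFSET = 2 ** 16
pred_ids = [3, 3, 7]
gt_ids = [5, 5, 9]
combos = [p + OFFSET * g for p, g in zip(pred_ids, gt_ids)]
pairs = {(c % OFFSET, c // OFFSET) for c in combos}   # unique (pred, gt) overlaps
```

Here SQ = 0.75, RQ = 2/3, so PQ = 0.5, and the two distinct (pred, gt) overlaps are recovered as (3, 5) and (7, 9).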
================================================
FILE: models/__init__.py
================================================
from .backbones import __all__
from .bbox import __all__
from .sparseocc import SparseOcc
from .sparseocc_head import SparseOccHead
from .sparseocc_transformer import SparseOccTransformer
from .loss_utils import *
__all__ = []
================================================
FILE: models/backbones/__init__.py
================================================
from .vovnet import VoVNet
__all__ = ['VoVNet']
================================================
FILE: models/backbones/vovnet.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import warnings
import torch.utils.checkpoint as cp
from collections import OrderedDict
from mmcv.runner import BaseModule
from mmdet.models.builder import BACKBONES
from torch.nn.modules.batchnorm import _BatchNorm
VoVNet19_slim_dw_eSE = {
'stem': [64, 64, 64],
'stage_conv_ch': [64, 80, 96, 112],
'stage_out_ch': [112, 256, 384, 512],
"layer_per_block": 3,
"block_per_stage": [1, 1, 1, 1],
"eSE": True,
"dw": True
}
VoVNet19_dw_eSE = {
'stem': [64, 64, 64],
"stage_conv_ch": [128, 160, 192, 224],
"stage_out_ch": [256, 512, 768, 1024],
"layer_per_block": 3,
"block_per_stage": [1, 1, 1, 1],
"eSE": True,
"dw": True
}
VoVNet19_slim_eSE = {
'stem': [64, 64, 128],
'stage_conv_ch': [64, 80, 96, 112],
'stage_out_ch': [112, 256, 384, 512],
'layer_per_block': 3,
'block_per_stage': [1, 1, 1, 1],
'eSE': True,
"dw": False
}
VoVNet19_eSE = {
'stem': [64, 64, 128],
"stage_conv_ch": [128, 160, 192, 224],
"stage_out_ch": [256, 512, 768, 1024],
"layer_per_block": 3,
"block_per_stage": [1, 1, 1, 1],
"eSE": True,
"dw": False
}
VoVNet39_eSE = {
'stem': [64, 64, 128],
"stage_conv_ch": [128, 160, 192, 224],
"stage_out_ch": [256, 512, 768, 1024],
"layer_per_block": 5,
"block_per_stage": [1, 1, 2, 2],
"eSE": True,
"dw": False
}
VoVNet57_eSE = {
'stem': [64, 64, 128],
"stage_conv_ch": [128, 160, 192, 224],
"stage_out_ch": [256, 512, 768, 1024],
"layer_per_block": 5,
"block_per_stage": [1, 1, 4, 3],
"eSE": True,
"dw": False
}
VoVNet99_eSE = {
'stem': [64, 64, 128],
"stage_conv_ch": [128, 160, 192, 224],
"stage_out_ch": [256, 512, 768, 1024],
"layer_per_block": 5,
"block_per_stage": [1, 3, 9, 3],
"eSE": True,
"dw": False
}
_STAGE_SPECS = {
"V-19-slim-dw-eSE": VoVNet19_slim_dw_eSE,
"V-19-dw-eSE": VoVNet19_dw_eSE,
"V-19-slim-eSE": VoVNet19_slim_eSE,
"V-19-eSE": VoVNet19_eSE,
"V-39-eSE": VoVNet39_eSE,
"V-57-eSE": VoVNet57_eSE,
"V-99-eSE": VoVNet99_eSE,
}
def dw_conv3x3(in_channels, out_channels, module_name, postfix, stride=1, kernel_size=3, padding=1):
"""3x3 convolution with padding"""
return [
(
'{}_{}/dw_conv3x3'.format(module_name, postfix),
nn.Conv2d(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=out_channels,
bias=False
)
),
(
'{}_{}/pw_conv1x1'.format(module_name, postfix),
nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, groups=1, bias=False)
),
('{}_{}/pw_norm'.format(module_name, postfix), nn.BatchNorm2d(out_channels)),
('{}_{}/pw_relu'.format(module_name, postfix), nn.ReLU(inplace=True)),
]
def conv3x3(in_channels, out_channels, module_name, postfix, stride=1, groups=1, kernel_size=3, padding=1):
"""3x3 convolution with padding"""
return [
(
f"{module_name}_{postfix}/conv",
nn.Conv2d(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=groups,
bias=False,
),
),
(f"{module_name}_{postfix}/norm", nn.BatchNorm2d(out_channels)),
(f"{module_name}_{postfix}/relu", nn.ReLU(inplace=True)),
]
def conv1x1(in_channels, out_channels, module_name, postfix, stride=1, groups=1, kernel_size=1, padding=0):
"""1x1 convolution with padding"""
return [
(
f"{module_name}_{postfix}/conv",
nn.Conv2d(
in_channels,
out_channels,
kernel_size=kernel_size,
stride=stride,
padding=padding,
groups=groups,
bias=False,
),
),
(f"{module_name}_{postfix}/norm", nn.BatchNorm2d(out_channels)),
(f"{module_name}_{postfix}/relu", nn.ReLU(inplace=True)),
]
class Hsigmoid(nn.Module):
def __init__(self, inplace=True):
super(Hsigmoid, self).__init__()
self.inplace = inplace
def forward(self, x):
return F.relu6(x + 3.0, inplace=self.inplace) / 6.0
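Hsigmoid implements the hard sigmoid relu6(x + 3) / 6 used by the eSE module below. A torch-free sketch of the same curve on scalars (illustration only):

```python
def hard_sigmoid(x):
    # relu6(x + 3) / 6: clamp x + 3 to [0, 6], then rescale to [0, 1]
    return min(max(x + 3.0, 0.0), 6.0) / 6.0

vals = [hard_sigmoid(x) for x in (-4.0, 0.0, 4.0)]
# saturates at 0 and 1, linear in between
```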
class eSEModule(nn.Module):
def __init__(self, channel, reduction=4):
super(eSEModule, self).__init__()
self.avg_pool = nn.AdaptiveAvgPool2d(1)
self.fc = nn.Conv2d(channel, channel, kernel_size=1, padding=0)
self.hsigmoid = Hsigmoid()
def forward(self, x):
inputs = x
x = self.avg_pool(x)
x = self.fc(x)
x = self.hsigmoid(x)
return inputs * x
class _OSA_module(nn.Module):
def __init__(self, in_ch, stage_ch, concat_ch, layer_per_block, module_name, SE=False, identity=False, depthwise=False, with_cp=False):
super(_OSA_module, self).__init__()
self.with_cp = with_cp
self.identity = identity
self.depthwise = depthwise
self.isReduced = False
self.layers = nn.ModuleList()
in_channel = in_ch
if self.depthwise and in_channel != stage_ch:
self.isReduced = True
self.conv_reduction = nn.Sequential(
OrderedDict(conv1x1(in_channel, stage_ch, "{}_reduction".format(module_name), "0"))
)
for i in range(layer_per_block):
if self.depthwise:
self.layers.append(nn.Sequential(OrderedDict(dw_conv3x3(stage_ch, stage_ch, module_name, i))))
else:
self.layers.append(nn.Sequential(OrderedDict(conv3x3(in_channel, stage_ch, module_name, i))))
in_channel = stage_ch
# feature aggregation
in_channel = in_ch + layer_per_block * stage_ch
self.concat = nn.Sequential(OrderedDict(conv1x1(in_channel, concat_ch, module_name, "concat")))
self.ese = eSEModule(concat_ch)
def _forward(self, x):
identity_feat = x
output = []
output.append(x)
if self.depthwise and self.isReduced:
x = self.conv_reduction(x)
for layer in self.layers:
x = layer(x)
output.append(x)
x = torch.cat(output, dim=1)
xt = self.concat(x)
xt = self.ese(xt)
if self.identity:
xt = xt + identity_feat
return xt
def forward(self, x):
if self.with_cp and self.training and x.requires_grad:
return cp.checkpoint(self._forward, x)
else:
return self._forward(x)
class _OSA_stage(nn.Sequential):
def __init__(self, in_ch, stage_ch, concat_ch, block_per_stage, layer_per_block, stage_num, SE=False, depthwise=False, with_cp=False):
super(_OSA_stage, self).__init__()
if stage_num != 2:  # downsample before every stage except stage 2
self.add_module("Pooling", nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))
if block_per_stage != 1:
SE = False
module_name = f"OSA{stage_num}_1"
self.add_module(
module_name, _OSA_module(in_ch, stage_ch, concat_ch, layer_per_block, module_name, SE, depthwise=depthwise, with_cp=with_cp)
)
for i in range(block_per_stage - 1):
if i != block_per_stage - 2: # keep eSE only for the last block of the stage
SE = False
module_name = f"OSA{stage_num}_{i + 2}"
self.add_module(
module_name,
_OSA_module(
concat_ch,
stage_ch,
concat_ch,
layer_per_block,
module_name,
SE,
identity=True,
depthwise=depthwise,
with_cp=with_cp
),
)
@BACKBONES.register_module()
class VoVNet(BaseModule):
def __init__(self, spec_name, input_ch=3, out_features=None, frozen_stages=-1, norm_eval=True, with_cp=False, pretrained=None, init_cfg=None):
"""
Args:
input_ch (int): Number of input channels.
out_features (list[str]): name of the layers whose outputs should
be returned in forward. Can be anything in "stem", "stage2" ...
"""
super(VoVNet, self).__init__(init_cfg)
self.frozen_stages = frozen_stages
self.norm_eval = norm_eval
if isinstance(pretrained, str):
warnings.warn('DeprecationWarning: pretrained is deprecated, '
'please use "init_cfg" instead')
self.init_cfg = dict(type='Pretrained', checkpoint=pretrained)
stage_specs = _STAGE_SPECS[spec_name]
stem_ch = stage_specs["stem"]
config_stage_ch = stage_specs["stage_conv_ch"]
config_concat_ch = stage_specs["stage_out_ch"]
block_per_stage = stage_specs["block_per_stage"]
layer_per_block = stage_specs["layer_per_block"]
SE = stage_specs["eSE"]
depthwise = stage_specs["dw"]
self._out_features = out_features
# Stem module
conv_type = dw_conv3x3 if depthwise else conv3x3
stem = conv3x3(input_ch, stem_ch[0], "stem", "1", 2)
stem += conv_type(stem_ch[0], stem_ch[1], "stem", "2", 1)
stem += conv_type(stem_ch[1], stem_ch[2], "stem", "3", 2)
self.add_module("stem", nn.Sequential((OrderedDict(stem))))
current_stirde = 4
self._out_feature_strides = {"stem": current_stirde, "stage2": current_stirde}
self._out_feature_channels = {"stem": stem_ch[2]}
stem_out_ch = [stem_ch[2]]
in_ch_list = stem_out_ch + config_concat_ch[:-1]
# OSA stages
self.stage_names = []
for i in range(4): # num_stages
name = "stage%d" % (i + 2) # stage 2 ... stage 5
self.stage_names.append(name)
self.add_module(
name,
_OSA_stage(
in_ch_list[i],
config_stage_ch[i],
config_concat_ch[i],
block_per_stage[i],
layer_per_block,
i + 2,
SE,
depthwise,
with_cp=with_cp
),
)
self._out_feature_channels[name] = config_concat_ch[i]
if i != 0:
self._out_feature_strides[name] = current_stirde = int(current_stirde * 2)
# initialize weights
# self._initialize_weights()
def _initialize_weights(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight)
def forward(self, x):
outputs = {}
x = self.stem(x)
if "stem" in self._out_features:
outputs["stem"] = x
for name in self.stage_names:
x = getattr(self, name)(x)
if name in self._out_features:
outputs[name] = x
return outputs
def _freeze_stages(self):
if self.frozen_stages >= 0:
m = getattr(self, 'stem')
m.eval()
for param in m.parameters():
param.requires_grad = False
for i in range(1, self.frozen_stages + 1):
m = getattr(self, f'stage{i+1}')
m.eval()
for param in m.parameters():
param.requires_grad = False
def train(self, mode=True):
"""Convert the model into training mode while keep normalization layer
freezed."""
super(VoVNet, self).train(mode)
self._freeze_stages()
if mode and self.norm_eval:
for m in self.modules():
# trick: eval() only has an effect on BatchNorm layers here
if isinstance(m, _BatchNorm):
m.eval()
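In _OSA_module, the concat 1x1 conv sees the block input plus every intermediate 3x3 output, so its input width is in_ch + layer_per_block * stage_ch. Checking that arithmetic against the V-99-eSE spec above (stem output 128, values copied from _STAGE_SPECS):

```python
spec = {  # V-99-eSE, as defined in _STAGE_SPECS
    "stage_conv_ch": [128, 160, 192, 224],
    "stage_out_ch": [256, 512, 768, 1024],
    "layer_per_block": 5,
}
stem_out = 128
# input channels to each stage's first OSA block
in_ch = [stem_out] + spec["stage_out_ch"][:-1]
# channels entering each stage's concat 1x1 conv
concat_in = [i + spec["layer_per_block"] * s
             for i, s in zip(in_ch, spec["stage_conv_ch"])]
# each is then projected down to the corresponding stage_out_ch
```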
================================================
FILE: models/bbox/__init__.py
================================================
from .assigners import __all__
from .coders import __all__
from .match_costs import __all__
================================================
FILE: models/bbox/assigners/__init__.py
================================================
from .hungarian_assigner_3d import HungarianAssigner3D
__all__ = ['HungarianAssigner3D']
================================================
FILE: models/bbox/assigners/hungarian_assigner_3d.py
================================================
import torch
from mmdet.core.bbox.builder import BBOX_ASSIGNERS
from mmdet.core.bbox.assigners import AssignResult
from mmdet.core.bbox.assigners import BaseAssigner
from mmdet.core.bbox.match_costs import build_match_cost
from ..utils import normalize_bbox
try:
from scipy.optimize import linear_sum_assignment
except ImportError:
linear_sum_assignment = None
@BBOX_ASSIGNERS.register_module()
class HungarianAssigner3D(BaseAssigner):
def __init__(self,
cls_cost=dict(type='ClassificationCost', weight=1.),
reg_cost=dict(type='BBoxL1Cost', weight=1.0),
iou_cost=dict(type='IoUCost', weight=0.0),
pc_range=None):
self.cls_cost = build_match_cost(cls_cost)
self.reg_cost = build_match_cost(reg_cost)
self.iou_cost = build_match_cost(iou_cost)
self.pc_range = pc_range
def assign(self,
bbox_pred,
cls_pred,
gt_bboxes,
gt_labels,
gt_bboxes_ignore=None,
code_weights=None,
with_velo=False):
assert gt_bboxes_ignore is None, \
'Only case when gt_bboxes_ignore is None is supported.'
num_gts, num_bboxes = gt_bboxes.size(0), bbox_pred.size(0)
# 1. assign -1 by default
assigned_gt_inds = bbox_pred.new_full((num_bboxes, ),
-1,
dtype=torch.long)
assigned_labels = bbox_pred.new_full((num_bboxes, ),
-1,
dtype=torch.long)
if num_gts == 0 or num_bboxes == 0:
# No ground truth or boxes, return empty assignment
if num_gts == 0:
# No ground truth, assign all to background
assigned_gt_inds[:] = 0
return AssignResult(
num_gts, assigned_gt_inds, None, labels=assigned_labels)
# 2. compute the weighted costs
# classification and bboxcost.
cls_cost = self.cls_cost(cls_pred, gt_labels)
# regression L1 cost
normalized_gt_bboxes = normalize_bbox(gt_bboxes)
if code_weights is not None:
bbox_pred = bbox_pred * code_weights
normalized_gt_bboxes = normalized_gt_bboxes * code_weights
if with_velo:
reg_cost = self.reg_cost(bbox_pred, normalized_gt_bboxes)
else:
reg_cost = self.reg_cost(bbox_pred[:, :8], normalized_gt_bboxes[:, :8])
# weighted sum of above two costs
cost = cls_cost + reg_cost
# 3. do Hungarian matching on CPU using linear_sum_assignment
cost = cost.detach().cpu()
cost = torch.nan_to_num(cost, nan=100.0, posinf=100.0, neginf=-100.0)
if linear_sum_assignment is None:
raise ImportError('Please run "pip install scipy" '
'to install scipy first.')
matched_row_inds, matched_col_inds = linear_sum_assignment(cost)
matched_row_inds = torch.from_numpy(matched_row_inds).to(
bbox_pred.device)
matched_col_inds = torch.from_numpy(matched_col_inds).to(
bbox_pred.device)
# 4. assign backgrounds and foregrounds
# assign all indices to backgrounds first
assigned_gt_inds[:] = 0
# assign foregrounds based on matching results
assigned_gt_inds[matched_row_inds] = matched_col_inds + 1
assigned_labels[matched_row_inds] = gt_labels[matched_col_inds]
return AssignResult(
num_gts, assigned_gt_inds, None, labels=assigned_labels)
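linear_sum_assignment finds the prediction-to-ground-truth pairing that minimizes the summed cost. To illustrate what it computes, here is a brute-force version over permutations (fine for tiny matrices; the assigner itself relies on SciPy's efficient implementation):

```python
from itertools import permutations

def brute_force_assign(cost):
    """Minimize total cost over all one-to-one pred -> gt assignments."""
    n_pred, n_gt = len(cost), len(cost[0])
    best_total, best_rows = None, None
    # each permutation picks a distinct prediction row for every gt column
    for rows in permutations(range(n_pred), n_gt):
        total = sum(cost[r][col] for col, r in enumerate(rows))
        if best_total is None or total < best_total:
            best_total, best_rows = total, rows
    return best_total, best_rows

# 3 predictions x 2 ground truths
cost = [[4.0, 1.0],
        [2.0, 8.0],
        [3.0, 5.0]]
total, rows = brute_force_assign(cost)
# gt 0 -> pred 1 (cost 2.0), gt 1 -> pred 0 (cost 1.0)
```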
================================================
FILE: models/bbox/coders/__init__.py
================================================
from .nms_free_coder import NMSFreeCoder
__all__ = ['NMSFreeCoder']
================================================
FILE: models/bbox/coders/nms_free_coder.py
================================================
import torch
from mmdet.core.bbox import BaseBBoxCoder
from mmdet.core.bbox.builder import BBOX_CODERS
from ..utils import denormalize_bbox
@BBOX_CODERS.register_module()
class NMSFreeCoder(BaseBBoxCoder):
"""Bbox coder for NMS-free detector.
Args:
pc_range (list[float]): Range of point cloud.
post_center_range (list[float]): Limit of the center.
Default: None.
max_num (int): Max number to be kept. Default: 100.
score_threshold (float): Threshold to filter boxes based on score.
Default: None.
num_classes (int): Number of classes. Default: 10.
"""
def __init__(self,
pc_range,
voxel_size=None,
post_center_range=None,
max_num=100,
score_threshold=None,
num_classes=10):
self.pc_range = pc_range
self.voxel_size = voxel_size
self.post_center_range = post_center_range
self.max_num = max_num
self.score_threshold = score_threshold
self.num_classes = num_classes
def encode(self):
pass
def decode_single(self, cls_scores, bbox_preds):
"""Decode bboxes.
Args:
cls_scores (Tensor): Outputs from the classification head, \
shape [num_query, cls_out_channels]. Note \
cls_out_channels should include the background class.
bbox_preds (Tensor): Outputs from the regression \
head with normalized coordinate format (cx, cy, w, l, cz, h, rot_sine, rot_cosine, vx, vy). \
Shape [num_query, 9].
Returns:
list[dict]: Decoded boxes.
"""
max_num = self.max_num
cls_scores = cls_scores.sigmoid()
scores, indexs = cls_scores.view(-1).topk(max_num)
labels = indexs % self.num_classes
bbox_index = torch.div(indexs, self.num_classes, rounding_mode='trunc')
bbox_preds = bbox_preds[bbox_index]
final_box_preds = denormalize_bbox(bbox_preds)
final_scores = scores
final_preds = labels
# use score threshold
if self.score_threshold is not None:
thresh_mask = final_scores > self.score_threshold
if self.post_center_range is not None:
limit = torch.tensor(self.post_center_range, device=scores.device)
mask = (final_box_preds[..., :3] >= limit[:3]).all(1)
mask &= (final_box_preds[..., :3] <= limit[3:]).all(1)
if self.score_threshold is not None:
mask &= thresh_mask
boxes3d = final_box_preds[mask]
scores = final_scores[mask]
labels = final_preds[mask]
predictions_dict = {
'bboxes': boxes3d,
'scores': scores,
'labels': labels
}
else:
raise NotImplementedError(
'Need to reorganize output as a batch, only '
'support post_center_range is not None for now!'
)
return predictions_dict
def decode(self, preds_dicts):
"""Decode bboxes.
Args:
all_cls_scores (Tensor): Outputs from the classification head, \
shape [nb_dec, bs, num_query, cls_out_channels]. Note \
cls_out_channels should include the background class.
all_bbox_preds (Tensor): Sigmoid outputs from the regression \
head with normalized coordinate format (cx, cy, w, l, cz, h, rot_sine, rot_cosine, vx, vy). \
Shape [nb_dec, bs, num_query, 9].
Returns:
list[dict]: Decoded boxes.
"""
all_cls_scores = preds_dicts['all_cls_scores'][-1]
all_bbox_preds = preds_dicts['all_bbox_preds'][-1]
batch_size = all_cls_scores.size()[0]
predictions_list = []
for i in range(batch_size):
predictions_list.append(self.decode_single(all_cls_scores[i], all_bbox_preds[i]))
return predictions_list
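decode_single takes top-k over the flattened [num_query, num_classes] score matrix, then recovers the class label with % and the query index with integer division. A plain-Python sketch of that index arithmetic (toy scores):

```python
# scores flattened from [num_query=2, num_classes=3]
num_classes = 3
flat_scores = [0.1, 0.9, 0.2,   # query 0
               0.8, 0.3, 0.05]  # query 1
k = 2
# top-k flat indices by score, mirroring cls_scores.view(-1).topk(max_num)
top = sorted(range(len(flat_scores)), key=lambda i: -flat_scores[i])[:k]
labels = [i % num_classes for i in top]       # class of each hit
bbox_index = [i // num_classes for i in top]  # which query it came from
```

The two highest scores sit at flat indices 1 and 3, decoding to (query 0, class 1) and (query 1, class 0).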
================================================
FILE: models/bbox/match_costs/__init__.py
================================================
from .match_cost import BBox3DL1Cost
__all__ = ['BBox3DL1Cost']
================================================
FILE: models/bbox/match_costs/match_cost.py
================================================
import torch
from mmdet.core.bbox.match_costs.builder import MATCH_COST
@MATCH_COST.register_module()
class BBox3DL1Cost(object):
"""BBox3DL1Cost.
Args:
weight (int | float, optional): loss_weight
"""
def __init__(self, weight=1.0):
self.weight = weight
def __call__(self, bbox_pred, gt_bboxes):
"""
Args:
bbox_pred (Tensor): Predicted boxes with normalized
coordinates. Shape [num_query, code_size].
gt_bboxes (Tensor): Ground truth boxes with normalized
coordinates. Shape [num_gt, code_size].
Returns:
torch.Tensor: bbox_cost value with weight
"""
bbox_cost = torch.cdist(bbox_pred, gt_bboxes, p=1)
return bbox_cost * self.weight
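torch.cdist with p=1 builds the pairwise L1 cost matrix between predictions and ground truths. The same computation in plain Python, for illustration:

```python
def l1_cost_matrix(preds, gts):
    # entry [i][j] = sum of |pred_i - gt_j| over box dimensions,
    # matching torch.cdist(bbox_pred, gt_bboxes, p=1)
    return [[sum(abs(p - g) for p, g in zip(pred, gt)) for gt in gts]
            for pred in preds]

cost = l1_cost_matrix([[0.0, 0.0], [1.0, 1.0]], [[0.5, 0.5]])
```

Both 2-d predictions are at L1 distance 1.0 from the single ground truth.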
@MATCH_COST.register_module()
class BBoxBEVL1Cost(object):
def __init__(self, weight, pc_range):
self.weight = weight
self.pc_range = pc_range
def __call__(self, bboxes, gt_bboxes):
pc_start = bboxes.new(self.pc_range[0:2])
pc_range = bboxes.new(self.pc_range[3:5]) - bboxes.new(self.pc_range[0:2])
# normalize the box center to [0, 1]
normalized_bboxes_xy = (bboxes[:, :2] - pc_start) / pc_range
normalized_gt_bboxes_xy = (gt_bboxes[:, :2] - pc_start) / pc_range
reg_cost = torch.cdist(normalized_bboxes_xy, normalized_gt_bboxes_xy, p=1)
return reg_cost * self.weight
@MATCH_COST.register_module()
class IoU3DCost(object):
def __init__(self, weight):
self.weight = weight
def __call__(self, iou):
iou_cost = - iou
return iou_cost * self.weight
================================================
FILE: models/bbox/utils.py
================================================
import torch
def normalize_bbox(bboxes):
cx = bboxes[..., 0:1]
cy = bboxes[..., 1:2]
cz = bboxes[..., 2:3]
w = bboxes[..., 3:4].log()
l = bboxes[..., 4:5].log()
h = bboxes[..., 5:6].log()
rot = bboxes[..., 6:7]
if bboxes.size(-1) > 7:
vx = bboxes[..., 7:8]
vy = bboxes[..., 8:9]
out = torch.cat([cx, cy, w, l, cz, h, rot.sin(), rot.cos(), vx, vy], dim=-1)
else:
out = torch.cat([cx, cy, w, l, cz, h, rot.sin(), rot.cos()], dim=-1)
return out
def denormalize_bbox(normalized_bboxes):
rot_sin = normalized_bboxes[..., 6:7]
rot_cos = normalized_bboxes[..., 7:8]
rot = torch.atan2(rot_sin, rot_cos)
cx = normalized_bboxes[..., 0:1]
cy = normalized_bboxes[..., 1:2]
cz = normalized_bboxes[..., 4:5]
w = normalized_bboxes[..., 2:3].exp()
l = normalized_bboxes[..., 3:4].exp()
h = normalized_bboxes[..., 5:6].exp()
if normalized_bboxes.size(-1) > 8:
vx = normalized_bboxes[..., 8:9]
vy = normalized_bboxes[..., 9:10]
out = torch.cat([cx, cy, cz, w, l, h, rot, vx, vy], dim=-1)
else:
out = torch.cat([cx, cy, cz, w, l, h, rot], dim=-1)
return out
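normalize_bbox stores sizes as logs and yaw as a (sin, cos) pair; denormalize_bbox inverts both with exp and atan2. A scalar round-trip check with toy width/yaw values:

```python
import math

w, rot = 1.8, 0.6  # box width (m) and yaw (rad), arbitrary toy values
# normalize: size stored as log, yaw as (sin, cos)
w_n, rs, rc = math.log(w), math.sin(rot), math.cos(rot)
# denormalize: exp and atan2 invert the encoding exactly
w_back = math.exp(w_n)
rot_back = math.atan2(rs, rc)
```

The (sin, cos) representation keeps yaw continuous across the ±π wrap-around, which is why the L1 regression cost is computed in this encoded space.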
def encode_bbox(bboxes, pc_range=None):
xyz = bboxes[..., 0:3].clone()
wlh = bboxes[..., 3:6].log()
rot = bboxes[..., 6:7]
if pc_range is not None:
xyz[..., 0] = (xyz[..., 0] - pc_range[0]) / (pc_range[3] - pc_range[0])
xyz[..., 1] = (xyz[..., 1] - pc_range[1]) / (pc_range[4] - pc_range[1])
xyz[..., 2] = (xyz[..., 2] - pc_range[2]) / (pc_range[5] - pc_range[2])
if bboxes.shape[-1] > 7:
vel = bboxes[..., 7:9].clone()
return torch.cat([xyz, wlh, rot.sin(), rot.cos(), vel], dim=-1)
else:
return torch.cat([xyz, wlh, rot.sin(), rot.cos()], dim=-1)
def decode_bbox(bboxes, pc_range=None):
xyz = bboxes[..., 0:3].clone()
wlh = bboxes[..., 3:6].exp()
rot = torch.atan2(bboxes[..., 6:7], bboxes[..., 7:8])
if pc_range is not None:
xyz[..., 0] = xyz[..., 0] * (pc_range[3] - pc_range[0]) + pc_range[0]
xyz[..., 1] = xyz[..., 1] * (pc_range[4] - pc_range[1]) + pc_range[1]
xyz[..., 2] = xyz[..., 2] * (pc_range[5] - pc_range[2]) + pc_range[2]
if bboxes.shape[-1] > 8:
vel = bboxes[..., 8:10].clone()
return torch.cat([xyz, wlh, rot, vel], dim=-1)
else:
return torch.cat([xyz, wlh, rot], dim=-1)
def bbox2occrange(bboxes, occ_size, query_cube_size=None):
"""
xyz in [0, 1]
wlh in [0, 1]
"""
xyz = bboxes[..., 0:3].clone()
if query_cube_size is not None:
wlh = torch.zeros_like(xyz)
wlh[..., 0] = query_cube_size[0]
wlh[..., 1] = query_cube_size[1]
wlh[..., 2] = query_cube_size[2]
else:
wlh = bboxes[..., 3:6]
wlh[..., 0] = wlh[..., 0] * occ_size[0]
wlh[..., 1] = wlh[..., 1] * occ_size[1]
wlh[..., 2] = wlh[..., 2] * occ_size[2]
xyz[..., 0] = xyz[..., 0] * occ_size[0]
xyz[..., 1] = xyz[..., 1] * occ_size[1]
xyz[..., 2] = xyz[..., 2] * occ_size[2]
xyz = torch.round(xyz)
low_bound = torch.round(xyz - wlh/2)
high_bound = torch.round(xyz + wlh/2)
return torch.cat((low_bound, high_bound), dim=-1).long()
def occrange2bbox(occ_range, occ_size, pc_range):
"""
Return: xyz in [0, 1], wlh in [0, pc_range_size)
"""
xyz = (occ_range[..., :3] + occ_range[..., 3:]).to(torch.float32) / 2
xyz[..., 0] /= occ_size[0]
xyz[..., 1] /= occ_size[1]
xyz[..., 2] /= occ_size[2]
wlh = (occ_range[..., 3:] - occ_range[..., :3]).to(torch.float32)
wlh[..., 0] *= (pc_range[3] - pc_range[0]) / occ_size[0]
wlh[..., 1] *= (pc_range[4] - pc_range[1]) / occ_size[1]
wlh[..., 2] *= (pc_range[5] - pc_range[2]) / occ_size[2]
return torch.cat((xyz, wlh), dim=-1)
================================================
FILE: models/checkpoint.py
================================================
# This page is completely copied from https://pytorch.org/docs/stable/_modules/torch/utils/checkpoint.html#checkpoint
# If you are using torch 2.0 or higher, you can safely delete this page and import the related functions from official PyTorch
import torch
import warnings
import weakref
from typing import Any, Iterable, List, Tuple
__all__ = [
"checkpoint", "checkpoint_sequential", "CheckpointFunction",
"check_backward_validity", "detach_variable", "get_device_states",
"set_device_states",
]
def detach_variable(inputs: Tuple[Any, ...]) -> Tuple[torch.Tensor, ...]:
if isinstance(inputs, tuple):
out = []
for inp in inputs:
if not isinstance(inp, torch.Tensor):
out.append(inp)
continue
x = inp.detach()
x.requires_grad = inp.requires_grad
out.append(x)
return tuple(out)
else:
raise RuntimeError(
"Only tuple of tensors is supported. Got Unsupported input type: ", type(inputs).__name__)
def check_backward_validity(inputs: Iterable[Any]) -> None:
if not any(inp.requires_grad for inp in inputs if isinstance(inp, torch.Tensor)):
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
# We can't know if the run_fn will internally move some args to different devices,
# which would require logic to preserve rng states for those devices as well.
# We could paranoically stash and restore ALL the rng states for all visible devices,
# but that seems very wasteful for most cases. Compromise: Stash the RNG state for
# the device of all Tensor args.
#
# To consider: maybe get_device_states and set_device_states should reside in torch/random.py?
def get_device_states(*args) -> Tuple[List[int], List[torch.Tensor]]:
# This will not error out if "arg" is a CPU tensor or a non-tensor type because
# the conditionals short-circuit.
fwd_gpu_devices = list({arg.get_device() for arg in args
if isinstance(arg, torch.Tensor) and arg.is_cuda})
fwd_gpu_states = []
for device in fwd_gpu_devices:
with torch.cuda.device(device):
fwd_gpu_states.append(torch.cuda.get_rng_state())
return fwd_gpu_devices, fwd_gpu_states
def set_device_states(devices, states) -> None:
for device, state in zip(devices, states):
with torch.cuda.device(device):
torch.cuda.set_rng_state(state)
def _get_autocast_kwargs():
gpu_autocast_kwargs = {"enabled": torch.is_autocast_enabled(),
"dtype": torch.get_autocast_gpu_dtype(),
"cache_enabled": torch.is_autocast_cache_enabled()}
cpu_autocast_kwargs = {"enabled": torch.is_autocast_cpu_enabled(),
"dtype": torch.get_autocast_cpu_dtype(),
"cache_enabled": torch.is_autocast_cache_enabled()}
return gpu_autocast_kwargs, cpu_autocast_kwargs
class CheckpointFunction(torch.autograd.Function):
@staticmethod
def forward(ctx, run_function, preserve_rng_state, *args):
check_backward_validity(args)
ctx.run_function = run_function
ctx.preserve_rng_state = preserve_rng_state
# Accommodates the (remote) possibility that autocast is enabled for cpu AND gpu.
ctx.gpu_autocast_kwargs, ctx.cpu_autocast_kwargs = _get_autocast_kwargs()
if preserve_rng_state:
ctx.fwd_cpu_state = torch.get_rng_state()
# Don't eagerly initialize the cuda context by accident.
# (If the user intends that the context is initialized later, within their
# run_function, we SHOULD actually stash the cuda state here. Unfortunately,
# we have no way to anticipate this will happen before we run the function.)
ctx.had_cuda_in_fwd = False
if torch.cuda._initialized:
ctx.had_cuda_in_fwd = True
ctx.fwd_gpu_devices, ctx.fwd_gpu_states = get_device_states(*args)
# Save non-tensor inputs in ctx, keep a placeholder None for tensors
# to be filled out during the backward.
ctx.inputs = []
ctx.tensor_indices = []
tensor_inputs = []
for i, arg in enumerate(args):
if torch.is_tensor(arg):
tensor_inputs.append(arg)
ctx.tensor_indices.append(i)
ctx.inputs.append(None)
else:
ctx.inputs.append(arg)
ctx.save_for_backward(*tensor_inputs)
with torch.no_grad():
outputs = run_function(*args)
return outputs
@staticmethod
def backward(ctx, *args):
if not torch.autograd._is_checkpoint_valid():
raise RuntimeError(
"Checkpointing is not compatible with .grad() or when an `inputs` parameter"
" is passed to .backward(). Please use .backward() and do not pass its `inputs`"
" argument.")
# Copy the list to avoid modifying original list.
inputs = list(ctx.inputs)
tensor_indices = ctx.tensor_indices
tensors = ctx.saved_tensors
# Fill in inputs with appropriate saved tensors.
for i, idx in enumerate(tensor_indices):
inputs[idx] = tensors[i]
# Stash the surrounding rng state, and mimic the state that was
# present at this time during forward. Restore the surrounding state
# when we're done.
rng_devices = []
if ctx.preserve_rng_state and ctx.had_cuda_in_fwd:
rng_devices = ctx.fwd_gpu_devices
with torch.random.fork_rng(devices=rng_dev
SYMBOL INDEX (353 symbols across 38 files)
FILE: gen_instance_info.py
function convert_to_nusc_box (line 39) | def convert_to_nusc_box(bboxes, lift_center=False, wlh_margin=0.0):
function meshgrid3d (line 63) | def meshgrid3d(occ_size, pc_range): # points in ego coord
function process_add_instance_info (line 77) | def process_add_instance_info(sample):
function add_instance_info (line 200) | def add_instance_info(sample_infos):
FILE: gen_sweep_info.py
function get_cam_info (line 18) | def get_cam_info(nusc, sample_data):
function add_sweep_info (line 40) | def add_sweep_info(nusc, sample_infos):
FILE: lib/dvr/dvr.cpp
function render_forward (line 39) | std::vector<torch::Tensor>
function render (line 51) | std::vector<torch::Tensor> render(torch::Tensor sigma, torch::Tensor ori...
function init (line 61) | torch::Tensor init(torch::Tensor points, torch::Tensor tindex,
function PYBIND11_MODULE (line 68) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
FILE: loaders/builder.py
function build_dataloader (line 9) | def build_dataloader(dataset,
FILE: loaders/ego_pose_dataset.py
function trans_matrix (line 8) | def trans_matrix(T, R):
class EgoPoseDataset (line 16) | class EgoPoseDataset(Dataset):
method __init__ (line 17) | def __init__(self, data_infos):
method __len__ (line 29) | def __len__(self):
method get_ego_from_lidar (line 32) | def get_ego_from_lidar(self, info):
method get_global_pose (line 38) | def get_global_pose(self, info, inverse=False):
method __getitem__ (line 50) | def __getitem__(self, idx):
FILE: loaders/nuscenes_dataset.py
class CustomNuScenesDataset (line 9) | class CustomNuScenesDataset(NuScenesDataset):
method collect_sweeps (line 11) | def collect_sweeps(self, index, into_past=60, into_future=0):
method get_data_info (line 34) | def get_data_info(self, index):
FILE: loaders/nuscenes_occ_dataset.py
class NuSceneOcc (line 19) | class NuSceneOcc(NuScenesDataset):
method __init__ (line 20) | def __init__(self, occ_gt_root, *args, **kwargs):
method collect_sweeps (line 35) | def collect_sweeps(self, index, into_past=150, into_future=0):
method get_data_info (line 58) | def get_data_info(self, index):
method evaluate (line 118) | def evaluate(self, occ_results, runner=None, show_dir=None, **eval_kwa...
method format_results (line 177) | def format_results(self, occ_results, submission_prefix, **kwargs):
FILE: loaders/old_metrics.py
function pcolor (line 12) | def pcolor(string, color, on_color=None, attrs=None):
function getCellCoordinates (line 35) | def getCellCoordinates(points, voxelSize):
function getNumUniqueCells (line 39) | def getNumUniqueCells(cells):
class Metric_mIoU (line 44) | class Metric_mIoU():
method __init__ (line 45) | def __init__(self,
method hist_info (line 76) | def hist_info(self, n_cl, pred, gt):
method per_class_iu (line 104) | def per_class_iu(self, hist):
method compute_mIoU (line 110) | def compute_mIoU(self, pred, label, n_classes):
method add_batch (line 120) | def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera):
method count_miou (line 143) | def count_miou(self):
class Metric_FScore (line 156) | class Metric_FScore():
method __init__ (line 157) | def __init__(self,
method voxel2points (line 181) | def voxel2points(self, voxel):
method add_batch (line 193) | def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera ):
method count_fscore (line 237) | def count_fscore(self,):
class Metric_mRecall (line 242) | class Metric_mRecall():
method __init__ (line 243) | def __init__(self,
method hist_info (line 276) | def hist_info(self, n_cl, p_cl, pred, gt):
method per_class_recall (line 304) | def per_class_recall(self, hist):
method compute_mRecall (line 307) | def compute_mRecall(self, pred, label, n_classes, p_classes):
method add_batch (line 317) | def add_batch(self,semantics_pred,semantics_gt,mask_lidar,mask_camera):
method count_mrecall (line 338) | def count_mrecall(self):
class Metric_Panoptic (line 351) | class Metric_Panoptic():
method __init__ (line 352) | def __init__(self,
method add_batch (line 393) | def add_batch(self,semantics_pred,semantics_gt,instances_pred,instance...
method add_panoptic_sample (line 412) | def add_panoptic_sample(self, semantics_pred, semantics_gt, instances_...
method count_pq (line 531) | def count_pq(self, ):
FILE: loaders/pipelines/loading.py
function compose_lidar2img (line 13) | def compose_lidar2img(ego2global_translation_curr,
class LoadMultiViewImageFromMultiSweeps (line 40) | class LoadMultiViewImageFromMultiSweeps(object):
method __init__ (line 41) | def __init__(self,
method load_offline (line 57) | def load_offline(self, results):
method load_online (line 111) | def load_online(self, results):
method __call__ (line 158) | def __call__(self, results):
class LoadOccGTFromFile (line 170) | class LoadOccGTFromFile(object):
method __init__ (line 171) | def __init__(self, num_classes=18, inst_class_ids=[]):
method __call__ (line 175) | def __call__(self, results):
class BEVAug (line 246) | class BEVAug(object):
method __init__ (line 247) | def __init__(self, bda_aug_conf, classes, is_train=True):
method sample_bda_augmentation (line 252) | def sample_bda_augmentation(self):
method bev_transform (line 266) | def bev_transform(self, rotate_angle, scale_ratio, flip_dx, flip_dy):
method __call__ (line 290) | def __call__(self, results):
FILE: loaders/pipelines/transforms.py
class PadMultiViewImage (line 10) | class PadMultiViewImage(object):
method __init__ (line 21) | def __init__(self, size=None, size_divisor=None, pad_val=0):
method _pad_img (line 29) | def _pad_img(self, img):
method _pad_imgs (line 40) | def _pad_imgs(self, results):
method __call__ (line 50) | def __call__(self, results):
method __repr__ (line 60) | def __repr__(self):
class NormalizeMultiviewImage (line 69) | class NormalizeMultiviewImage(object):
method __init__ (line 79) | def __init__(self, mean, std, to_rgb=True):
method __call__ (line 84) | def __call__(self, results):
method __repr__ (line 110) | def __repr__(self):
class PhotoMetricDistortionMultiViewImage (line 117) | class PhotoMetricDistortionMultiViewImage:
method __init__ (line 136) | def __init__(self,
method __call__ (line 146) | def __call__(self, results):
method __repr__ (line 207) | def __repr__(self):
class RandomTransformImage (line 219) | class RandomTransformImage(object):
method __init__ (line 220) | def __init__(self, ida_aug_conf=None, training=True):
method __call__ (line 224) | def __call__(self, results):
method img_transform (line 270) | def img_transform(self, img, resize, resize_dims, crop, flip, rotate):
method sample_augmentation (line 313) | def sample_augmentation(self):
class GlobalRotScaleTransImage (line 345) | class GlobalRotScaleTransImage(object):
method __init__ (line 346) | def __init__(self,
method __call__ (line 354) | def __call__(self, results):
method rotate_z (line 369) | def rotate_z(self, results, rot_angle):
method scale_xyz (line 384) | def scale_xyz(self, results, scale_ratio):
FILE: loaders/ray_metrics.py
function get_rendered_pcds (line 19) | def get_rendered_pcds(origin, points, tindex, pred_dist):
function meshgrid3d (line 37) | def meshgrid3d(occ_size, pc_range):
function generate_lidar_rays (line 51) | def generate_lidar_rays():
function process_one_sample (line 77) | def process_one_sample(sem_pred, lidar_rays, output_origin, instance_pre...
function calc_rayiou (line 138) | def calc_rayiou(pcd_pred_list, pcd_gt_list, occ_class_names):
function main_rayiou (line 175) | def main_rayiou(sem_pred_list, sem_gt_list, lidar_origin_list, occ_class...
function main_raypq (line 231) | def main_raypq(sem_pred_list, sem_gt_list, inst_pred_list, inst_gt_list,...
FILE: loaders/ray_pq.py
class Metric_RayPQ (line 5) | class Metric_RayPQ:
method __init__ (line 6) | def __init__(self,
method add_batch (line 36) | def add_batch(self,semantics_pred,semantics_gt,instances_pred,instance...
method add_panoptic_sample (line 40) | def add_panoptic_sample(self, semantics_pred, semantics_gt, instances_...
method count_pq (line 155) | def count_pq(self):
FILE: models/backbones/vovnet.py
function dw_conv3x3 (line 93) | def dw_conv3x3(in_channels, out_channels, module_name, postfix, stride=1...
function conv3x3 (line 117) | def conv3x3(in_channels, out_channels, module_name, postfix, stride=1, g...
function conv1x1 (line 137) | def conv1x1(in_channels, out_channels, module_name, postfix, stride=1, g...
class Hsigmoid (line 157) | class Hsigmoid(nn.Module):
method __init__ (line 158) | def __init__(self, inplace=True):
method forward (line 162) | def forward(self, x):
class eSEModule (line 166) | class eSEModule(nn.Module):
method __init__ (line 167) | def __init__(self, channel, reduction=4):
method forward (line 173) | def forward(self, x):
class _OSA_module (line 181) | class _OSA_module(nn.Module):
method __init__ (line 182) | def __init__(self, in_ch, stage_ch, concat_ch, layer_per_block, module...
method _forward (line 211) | def _forward(self, x):
method forward (line 234) | def forward(self, x):
class _OSA_stage (line 241) | class _OSA_stage(nn.Sequential):
method __init__ (line 242) | def __init__(self, in_ch, stage_ch, concat_ch, block_per_stage, layer_...
class VoVNet (line 276) | class VoVNet(BaseModule):
method __init__ (line 277) | def __init__(self, spec_name, input_ch=3, out_features=None, frozen_st...
method _initialize_weights (line 344) | def _initialize_weights(self):
method forward (line 349) | def forward(self, x):
method _freeze_stages (line 361) | def _freeze_stages(self):
method train (line 374) | def train(self, mode=True):
FILE: models/bbox/assigners/hungarian_assigner_3d.py
class HungarianAssigner3D (line 16) | class HungarianAssigner3D(BaseAssigner):
method __init__ (line 17) | def __init__(self,
method assign (line 27) | def assign(self,
FILE: models/bbox/coders/nms_free_coder.py
class NMSFreeCoder (line 9) | class NMSFreeCoder(BaseBBoxCoder):
method __init__ (line 20) | def __init__(self,
method encode (line 34) | def encode(self):
method decode_single (line 37) | def decode_single(self, cls_scores, bbox_preds):
method decode (line 90) | def decode(self, preds_dicts):
FILE: models/bbox/match_costs/match_cost.py
class BBox3DL1Cost (line 6) | class BBox3DL1Cost(object):
method __init__ (line 12) | def __init__(self, weight=1.0):
method __call__ (line 15) | def __call__(self, bbox_pred, gt_bboxes):
class BBoxBEVL1Cost (line 31) | class BBoxBEVL1Cost(object):
method __init__ (line 32) | def __init__(self, weight, pc_range):
method __call__ (line 36) | def __call__(self, bboxes, gt_bboxes):
class IoU3DCost (line 47) | class IoU3DCost(object):
method __init__ (line 48) | def __init__(self, weight):
method __call__ (line 51) | def __call__(self, iou):
FILE: models/bbox/utils.py
function normalize_bbox (line 4) | def normalize_bbox(bboxes):
function denormalize_bbox (line 23) | def denormalize_bbox(normalized_bboxes):
function encode_bbox (line 46) | def encode_bbox(bboxes, pc_range=None):
function decode_bbox (line 63) | def decode_bbox(bboxes, pc_range=None):
function bbox2occrange (line 79) | def bbox2occrange(bboxes, occ_size, query_cube_size=None):
function occrange2bbox (line 107) | def occrange2bbox(occ_range, occ_size, pc_range):
FILE: models/checkpoint.py
function detach_variable (line 15) | def detach_variable(inputs: Tuple[Any, ...]) -> Tuple[torch.Tensor, ...]:
function check_backward_validity (line 32) | def check_backward_validity(inputs: Iterable[Any]) -> None:
function get_device_states (line 44) | def get_device_states(*args) -> Tuple[List[int], List[torch.Tensor]]:
function set_device_states (line 58) | def set_device_states(devices, states) -> None:
function _get_autocast_kwargs (line 63) | def _get_autocast_kwargs():
class CheckpointFunction (line 74) | class CheckpointFunction(torch.autograd.Function):
method forward (line 77) | def forward(ctx, run_function, preserve_rng_state, *args):
method backward (line 114) | def backward(ctx, *args):
function checkpoint (line 167) | def checkpoint(function, *args, use_reentrant: bool = True, **kwargs):
function checkpoint_sequential (line 262) | def checkpoint_sequential(functions, segments, input, use_reentrant=True...
function _checkpoint_without_reentrant (line 342) | def _checkpoint_without_reentrant(function, preserve_rng_state=True, *ar...
FILE: models/csrc/msmv_sampling/msmv_sampling.cpp
function ms_deform_attn_cuda_c2345_forward (line 98) | at::Tensor ms_deform_attn_cuda_c2345_forward(
function ms_deform_attn_cuda_c23456_forward (line 152) | at::Tensor ms_deform_attn_cuda_c23456_forward(
function ms_deform_attn_cuda_c2345_backward (line 212) | std::vector<at::Tensor> ms_deform_attn_cuda_c2345_backward(
function ms_deform_attn_cuda_c23456_backward (line 283) | std::vector<at::Tensor> ms_deform_attn_cuda_c23456_backward(
function PYBIND11_MODULE (line 363) | PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
FILE: models/csrc/setup.py
function get_ext_modules (line 5) | def get_ext_modules():
FILE: models/csrc/wrapper.py
function msmv_sampling_pytorch (line 7) | def msmv_sampling_pytorch(mlvl_feats, sampling_locations, scale_weights):
class MSMVSamplingC2345 (line 35) | class MSMVSamplingC2345(torch.autograd.Function):
method forward (line 37) | def forward(ctx, feat_c2, feat_c3, feat_c4, feat_c5, sampling_location...
method backward (line 46) | def backward(ctx, grad_output):
class MSMVSamplingC23456 (line 58) | class MSMVSamplingC23456(torch.autograd.Function):
method forward (line 60) | def forward(ctx, feat_c2, feat_c3, feat_c4, feat_c5, feat_c6, sampling...
method backward (line 69) | def backward(ctx, grad_output):
function msmv_sampling (line 81) | def msmv_sampling(mlvl_feats, sampling_locations, scale_weights):
FILE: models/loss_utils.py
function get_voxel_decoder_loss_input (line 11) | def get_voxel_decoder_loss_input(voxel_semantics, occ_loc_i, seg_pred_i,...
function compute_scal_loss (line 34) | def compute_scal_loss(pred, gt, class_id, reverse=False, ignore_index=255):
class GeoScalLoss (line 85) | class GeoScalLoss(nn.Module):
method __init__ (line 86) | def __init__(self,
method forward (line 93) | def forward(self, pred, gt):
class SemScalLoss (line 102) | class SemScalLoss(nn.Module):
method __init__ (line 103) | def __init__(self,
method forward (line 116) | def forward(self, pred, gt):
function dice_loss (line 130) | def dice_loss(
function sigmoid_ce_loss (line 164) | def sigmoid_ce_loss(
function CE_ssc_loss (line 196) | def CE_ssc_loss(pred, target, class_weights=None, ignore_index=255):
function lovasz_grad (line 211) | def lovasz_grad(gt_sorted):
function lovasz_softmax (line 227) | def lovasz_softmax(probas, labels, classes='present', per_image=False, i...
function lovasz_softmax_flat (line 247) | def lovasz_softmax_flat(probas, labels, classes='present'):
function flatten_probas (line 279) | def flatten_probas(probas, labels, ignore=None):
class Mask2FormerLoss (line 311) | class Mask2FormerLoss(nn.Module):
method __init__ (line 312) | def __init__(self,
method forward (line 334) | def forward(self, mask_pred, class_pred, mask_gt, class_gt, indices, m...
method loss_masks (line 376) | def loss_masks(self, src_mask, tgt_mask, avg_factor=None, mask_camera=...
method loss_labels (line 394) | def loss_labels(self, src_class, tgt_class, empty_weight=None, avg_fac...
function mean (line 403) | def mean(l, empty=0):
FILE: models/matcher.py
function batch_dice_loss (line 12) | def batch_dice_loss(inputs: torch.Tensor, targets: torch.Tensor, mask_ca...
function batch_sigmoid_ce_loss (line 39) | def batch_sigmoid_ce_loss(inputs: torch.Tensor, targets: torch.Tensor, m...
class HungarianMatcher (line 84) | class HungarianMatcher(BaseModule):
method __init__ (line 92) | def __init__(self, cost_class: float = 1, cost_mask: float = 1, cost_d...
method forward (line 110) | def forward(self, mask_pred, class_pred, mask_gt, class_gt, mask_camera):
FILE: models/sparse_voxel_decoder.py
function index2point (line 11) | def index2point(coords, pc_range, voxel_size):
function point2bbox (line 22) | def point2bbox(coords, box_size):
function upsample (line 32) | def upsample(pre_feat, pre_coords, interval):
class SparseVoxelDecoder (line 55) | class SparseVoxelDecoder(BaseModule):
method __init__ (line 56) | def __init__(self,
method init_weights (line 106) | def init_weights(self):
method forward (line 110) | def forward(self, mlvl_feats, img_metas):
class SparseVoxelDecoderLayer (line 165) | class SparseVoxelDecoderLayer(BaseModule):
method __init__ (line 166) | def __init__(self,
method init_weights (line 211) | def init_weights(self):
method forward (line 218) | def forward(self, query_feat, query_bbox, mlvl_feats, img_metas):
FILE: models/sparsebev_head.py
class SparseBEVHead (line 14) | class SparseBEVHead(DETRHead):
method __init__ (line 15) | def __init__(self,
method _init_layers (line 48) | def _init_layers(self):
method init_weights (line 65) | def init_weights(self):
method forward (line 68) | def forward(self, mlvl_feats, img_metas):
method prepare_for_dn_input (line 117) | def prepare_for_dn_input(self, batch_size, init_query_bbox, label_enc,...
method prepare_for_dn_loss (line 218) | def prepare_for_dn_loss(self, mask_dict):
method dn_loss_single (line 233) | def dn_loss_single(self,
method calc_dn_loss (line 272) | def calc_dn_loss(self, loss_dict, preds_dicts, num_dec_layers):
method _get_target_single (line 295) | def _get_target_single(self,
method get_targets (line 323) | def get_targets(self,
method loss_single (line 343) | def loss_single(self,
method loss (line 399) | def loss(self,
method get_bboxes (line 457) | def get_bboxes(self, preds_dicts, img_metas, rescale=False):
FILE: models/sparsebev_sampling.py
function make_sample_points_from_bbox (line 7) | def make_sample_points_from_bbox(query_bbox, offset, pc_range):
function make_sample_points_from_mask (line 32) | def make_sample_points_from_mask(valid_map, pc_range, occ_size, num_poin...
function sampling_4d (line 83) | def sampling_4d(sample_points, mlvl_feats, scale_weights, lidar2img, ima...
FILE: models/sparsebev_transformer.py
class SparseBEVTransformer (line 16) | class SparseBEVTransformer(BaseModule):
method __init__ (line 17) | def __init__(self, embed_dims, num_frames=8, num_points=4, num_layers=...
method init_weights (line 28) | def init_weights(self):
method forward (line 31) | def forward(self, query_bbox, query_feat, mlvl_feats, attn_mask, img_m...
class SparseBEVTransformerDecoder (line 40) | class SparseBEVTransformerDecoder(BaseModule):
method __init__ (line 41) | def __init__(self, embed_dims, num_frames=8, num_points=4, num_layers=...
method init_weights (line 51) | def init_weights(self):
method forward (line 54) | def forward(self, query_bbox, query_feat, mlvl_feats, attn_mask, img_m...
class SparseBEVTransformerDecoderLayer (line 93) | class SparseBEVTransformerDecoderLayer(BaseModule):
method __init__ (line 94) | def __init__(self, embed_dims, num_frames=8, num_points=4, num_levels=...
method init_weights (line 136) | def init_weights(self):
method refine_bbox (line 144) | def refine_bbox(self, bbox_proposal, bbox_delta):
method forward (line 151) | def forward(self, query_bbox, query_feat, mlvl_feats, attn_mask, img_m...
class SparseBEVSelfAttention (line 184) | class SparseBEVSelfAttention(BaseModule):
method __init__ (line 185) | def __init__(self, embed_dims=256, num_heads=8, dropout=0.1, pc_range=...
method init_weights (line 197) | def init_weights(self):
method inner_forward (line 202) | def inner_forward(self, query_bbox, query_feat, pre_attn_mask=None):
method forward (line 224) | def forward(self, query_bbox, query_feat, pre_attn_mask=None):
method calc_bbox_dists (line 231) | def calc_bbox_dists(self, bboxes):
class SparseBEVSampling (line 245) | class SparseBEVSampling(BaseModule):
method __init__ (line 246) | def __init__(self, embed_dims=256, num_frames=4, num_groups=4, num_poi...
method init_weights (line 258) | def init_weights(self):
method inner_forward (line 263) | def inner_forward(self, query_bbox, query_feat, mlvl_feats, img_metas):
method forward (line 307) | def forward(self, query_bbox, query_feat, mlvl_feats, img_metas):
class AdaptiveMixing (line 314) | class AdaptiveMixing(nn.Module):
method __init__ (line 315) | def __init__(self, in_dim, in_points, n_groups=1, query_dim=None, out_...
method init_weights (line 341) | def init_weights(self):
method inner_forward (line 344) | def inner_forward(self, x, query):
method forward (line 376) | def forward(self, x, query):
class AdaptiveMixingPointOnly (line 383) | class AdaptiveMixingPointOnly(nn.Module):
method __init__ (line 384) | def __init__(self, in_dim, in_points, n_groups=1, query_dim=None, out_...
method init_weights (line 409) | def init_weights(self):
method inner_forward (line 412) | def inner_forward(self, x, query):
method forward (line 437) | def forward(self, x, query):
class DeformAggregation (line 444) | class DeformAggregation(nn.Module):
method __init__ (line 445) | def __init__(self, in_dim, in_points, n_groups=1, query_dim=None, out_...
method init_weights (line 467) | def init_weights(self):
method inner_forward (line 470) | def inner_forward(self, x, query):
method forward (line 488) | def forward(self, x, query):
FILE: models/sparseocc.py
class SparseOcc (line 13) | class SparseOcc(MVXTwoStageDetector):
method __init__ (line 14) | def __init__(self,
method extract_img_feat (line 48) | def extract_img_feat(self, img):
method extract_feat (line 60) | def extract_feat(self, img, img_metas=None):
method forward_pts_train (line 109) | def forward_pts_train(self, mlvl_feats, voxel_semantics, voxel_instanc...
method forward (line 119) | def forward(self, return_loss=True, **kwargs):
method forward_train (line 126) | def forward_train(self, img_metas=None, img=None, voxel_semantics=None...
method forward_test (line 130) | def forward_test(self, img_metas, img=None, **kwargs):
method simple_test_pts (line 155) | def simple_test_pts(self, x, img_metas, rescale=False):
method simple_test (line 160) | def simple_test(self, img_metas, img=None, rescale=False):
method simple_test_offline (line 167) | def simple_test_offline(self, img_metas, img=None, rescale=False):
method simple_test_online (line 171) | def simple_test_online(self, img_metas, img=None, rescale=False):
FILE: models/sparseocc_head.py
class SparseOccHead (line 19) | class SparseOccHead(nn.Module):
method __init__ (line 20) | def __init__(self,
method init_weights (line 45) | def init_weights(self):
method forward (line 49) | def forward(self, mlvl_feats, img_metas):
method loss (line 59) | def loss(self, voxel_semantics, voxel_instances, instance_class_ids, p...
method loss_single (line 62) | def loss_single(self, voxel_semantics, voxel_instances, instance_class...
method merge_occ_pred (line 131) | def merge_occ_pred(self, outs):
method merge_semseg (line 148) | def merge_semseg(self, mask_cls, mask_pred):
method merge_panoseg (line 160) | def merge_panoseg(self, mask_cls, mask_pred):
method merge_panoseg_single (line 176) | def merge_panoseg_single(self, mask_cls, mask_pred):
FILE: models/sparseocc_transformer.py
class SparseOccTransformer (line 16) | class SparseOccTransformer(BaseModule):
method __init__ (line 17) | def __init__(self,
method init_weights (line 60) | def init_weights(self):
method forward (line 64) | def forward(self, mlvl_feats, img_metas):
class MaskFormerOccDecoder (line 87) | class MaskFormerOccDecoder(BaseModule):
method __init__ (line 88) | def __init__(self,
method init_weights (line 121) | def init_weights(self):
method forward (line 124) | def forward(self, occ_preds, mlvl_feats, img_metas):
class MaskFormerOccDecoderLayer (line 146) | class MaskFormerOccDecoderLayer(BaseModule):
method __init__ (line 147) | def __init__(self,
method init_weights (line 175) | def init_weights(self):
method forward (line 181) | def forward(self, query_feat, valid_map, mask_pred, occ_preds, mlvl_fe...
method pred_segmentation (line 202) | def pred_segmentation(self, query_feat, mask_feat):
method inner_pred_segmentation (line 208) | def inner_pred_segmentation(self, query_feat, mask_feat):
class MaskFormerSelfAttention (line 217) | class MaskFormerSelfAttention(BaseModule):
method __init__ (line 218) | def __init__(self, embed_dims, num_heads, dropout=0.0):
method init_weights (line 224) | def init_weights(self):
method with_pos_embed (line 229) | def with_pos_embed(self, tensor, pos=None):
method inner_forward (line 232) | def inner_forward(self, query, mask = None, key_padding_mask = None,qu...
method forward (line 238) | def forward(self, query, mask = None, key_padding_mask = None,query_po...
class MaskFormerSampling (line 245) | class MaskFormerSampling(BaseModule):
method __init__ (line 246) | def __init__(self, embed_dims=256, num_frames=4, num_groups=4, num_poi...
method init_weights (line 259) | def init_weights(self, ):
method inner_forward (line 263) | def inner_forward(self, query_feat, valid_map, occ_loc, mlvl_feats, im...
method forward (line 293) | def forward(self, query_feat, valid_map, occ_loc, mlvl_feats, img_met...
FILE: models/utils.py
function conv3d_gn_relu (line 8) | def conv3d_gn_relu(in_channels, out_channels, kernel_size=1, stride=1):
function deconv3d_gn_relu (line 16) | def deconv3d_gn_relu(in_channels, out_channels, kernel_size=2, stride=2):
function sparse2dense (line 24) | def sparse2dense(indices, value, dense_shape, empty_value=0):
function generate_grid (line 38) | def generate_grid(n_vox, interval):
function batch_indexing (line 46) | def batch_indexing(batched_data: torch.Tensor, batched_indices: torch.Te...
function rotation_3d_in_axis (line 92) | def rotation_3d_in_axis(points, angles):
function inverse_sigmoid (line 123) | def inverse_sigmoid(x, eps=1e-5):
function pad_multiple (line 141) | def pad_multiple(inputs, img_metas, size_divisor=32):
function rgb_to_hsv (line 160) | def rgb_to_hsv(image: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
function hsv_to_rgb (line 215) | def hsv_to_rgb(image: torch.Tensor) -> torch.Tensor:
class GpuPhotoMetricDistortion (line 256) | class GpuPhotoMetricDistortion:
method __init__ (line 275) | def __init__(self,
method __call__ (line 285) | def __call__(self, imgs):
class DumpConfig (line 345) | class DumpConfig:
method __init__ (line 346) | def __init__(self):
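The index above lists `sparse2dense(indices, value, dense_shape, empty_value=0)` in models/utils.py. A minimal NumPy sketch of what a function with that signature typically does — scatter per-voxel values at sparse (x, y, z) indices into a dense grid — is shown below; the body is an assumption from the signature alone (the repo's version operates on torch tensors), only the parameter names come from the index:

```python
import numpy as np

def sparse2dense(indices, value, dense_shape, empty_value=0):
    """Scatter sparse voxel values into a dense grid (illustrative sketch).

    indices:     (N, 3) integer array of (x, y, z) voxel coordinates
    value:       (N,) or (N, C) array of per-voxel values/features
    dense_shape: (X, Y, Z) target grid size
    Returns the dense grid plus a boolean occupancy mask.
    """
    # Trailing feature dims of `value` (empty tuple for scalar values)
    dense = np.full((*dense_shape, *value.shape[1:]), empty_value,
                    dtype=value.dtype)
    mask = np.zeros(dense_shape, dtype=bool)
    x, y, z = indices[:, 0], indices[:, 1], indices[:, 2]
    dense[x, y, z] = value   # write only the occupied voxels
    mask[x, y, z] = True
    return dense, mask

# Three occupied voxels in a 4x4x4 grid
idx = np.array([[0, 0, 0], [1, 2, 3], [3, 3, 3]])
val = np.array([1.0, 2.0, 3.0])
dense, mask = sparse2dense(idx, val, (4, 4, 4))
```

This sparse-to-dense scatter is the usual bridge between a sparse voxel decoder's output and operations that need a full grid (e.g. visualization or dense losses).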
FILE: old_metrics.py
function main (line 10) | def main(args):
FILE: ray_metrics.py
function main (line 14) | def main(args):
FILE: timing.py
function main (line 17) | def main():
FILE: train.py
function main (line 20) | def main():
FILE: utils.py
function init_logging (line 16) | def init_logging(filename=None, debug=False):
function backup_code (line 30) | def backup_code(work_dir, verbose=False):
class MyTextLoggerHook (line 45) | class MyTextLoggerHook(TextLoggerHook):
method _log_info (line 46) | def _log_info(self, log_dict, runner):
method log (line 84) | def log(self, runner):
method after_train_epoch (line 121) | def after_train_epoch(self, runner):
class MyTensorboardLoggerHook (line 132) | class MyTensorboardLoggerHook(LoggerHook):
method __init__ (line 133) | def __init__(self, log_dir=None, interval=10, ignore_last=True, reset_...
method before_run (line 139) | def before_run(self, runner):
method log (line 146) | def log(self, runner):
method after_run (line 179) | def after_run(self, runner):
class MyWandbLoggerHook (line 185) | class MyWandbLoggerHook(LoggerHook):
method __init__ (line 225) | def __init__(self, log_dir=None, project_name=None, team_name=None, ex...
method import_wandb (line 244) | def import_wandb(self) -> None:
method before_run (line 253) | def before_run(self, runner) -> None:
method log (line 276) | def log(self, runner) -> None:
method after_run (line 296) | def after_run(self, runner) -> None:
FILE: val.py
function evaluate (line 18) | def evaluate(dataset, results):
function main (line 28) | def main():
FILE: viz_prediction.py
function occ2img (line 42) | def occ2img(semantics):
function main (line 59) | def main():
Condensed preview of 58 files, each entry showing path, character count, and a content snippet; the full structured content (434K chars) is available as a .json file.
[
{
"path": ".gitignore",
"chars": 513,
"preview": "# OS generated files\n.DS_Store\n.DS_Store?\n._*\n.Spotlight-V100\n.Trashes\nehthumbs.db\nThumbs.db\n\n# Compiled source\nbuild\nde"
},
{
"path": "LICENSE",
"chars": 11357,
"preview": " Apache License\n Version 2.0, January 2004\n "
},
{
"path": "README.md",
"chars": 10893,
"preview": "# SparseOcc\n\nThis is the official PyTorch implementation for our paper:\n\n> [**Fully Sparse 3D Panoptic Occupancy Predict"
},
{
"path": "configs/r50_nuimg_704x256_8f.py",
"chars": 6939,
"preview": "dataset_type = 'NuSceneOcc'\ndataset_root = 'data/nuscenes/'\nocc_gt_root = 'data/nuscenes/occ3d'\n\n# If point cloud range "
},
{
"path": "configs/r50_nuimg_704x256_8f_60e.py",
"chars": 273,
"preview": "_base_ = ['./r50_nuimg_704x256_8f.py']\n\nlr_config = dict(\n policy='step',\n warmup='linear',\n warmup_iters=500,\n"
},
{
"path": "configs/r50_nuimg_704x256_8f_openocc.py",
"chars": 2574,
"preview": "_base_ = ['./r50_nuimg_704x256_8f.py']\n\nocc_gt_root = 'data/nuscenes/openocc_v2'\n\ndet_class_names = [\n 'car', 'truck'"
},
{
"path": "configs/r50_nuimg_704x256_8f_pano.py",
"chars": 2802,
"preview": "_base_ = ['./r50_nuimg_704x256_8f.py']\n\nocc_gt_root = 'data/nuscenes/occ3d_panoptic'\n\n# For nuScenes we usually do 10-cl"
},
{
"path": "gen_instance_info.py",
"chars": 8738,
"preview": "import os\nimport tqdm\nimport glob\nimport pickle\nimport argparse\nimport numpy as np\nimport torch\nimport multiprocessing\nf"
},
{
"path": "gen_sweep_info.py",
"chars": 4163,
"preview": "# Generate info files manually\nimport os\nimport mmcv\nimport tqdm\nimport pickle\nimport argparse\nimport numpy as np\nfrom n"
},
{
"path": "lib/dvr/dvr.cpp",
"chars": 2568,
"preview": "// Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting\n// Modified by Haisong Liu\n\n#include <string>\n#"
},
{
"path": "lib/dvr/dvr.cu",
"chars": 26778,
"preview": "// Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting\n// Modified by Haisong Liu\n\n#include <torch/ext"
},
{
"path": "loaders/__init__.py",
"chars": 185,
"preview": "from .pipelines import __all__\nfrom .nuscenes_dataset import CustomNuScenesDataset\nfrom .nuscenes_occ_dataset import NuS"
},
{
"path": "loaders/builder.py",
"chars": 1708,
"preview": "from functools import partial\nfrom mmcv.parallel import collate\nfrom mmcv.runner import get_dist_info\nfrom torch.utils.d"
},
{
"path": "loaders/ego_pose_dataset.py",
"chars": 3222,
"preview": "import torch\nimport numpy as np\nfrom pyquaternion import Quaternion\nfrom torch.utils.data import Dataset\nnp.set_printopt"
},
{
"path": "loaders/nuscenes_dataset.py",
"chars": 3405,
"preview": "import os\nimport numpy as np\nfrom mmdet.datasets import DATASETS\nfrom mmdet3d.datasets import NuScenesDataset\nfrom pyqua"
},
{
"path": "loaders/nuscenes_occ_dataset.py",
"chars": 8093,
"preview": "import os\nimport mmcv\nimport glob\nimport torch\nimport numpy as np\nfrom tqdm import tqdm\nfrom mmdet.datasets import DATAS"
},
{
"path": "loaders/old_metrics.py",
"chars": 23417,
"preview": "import os\nimport numpy as np\nfrom sklearn.neighbors import KDTree\nfrom termcolor import colored\nfrom functools import re"
},
{
"path": "loaders/pipelines/__init__.py",
"chars": 345,
"preview": "from .loading import LoadMultiViewImageFromMultiSweeps, LoadOccGTFromFile\nfrom .transforms import PadMultiViewImage, Nor"
},
{
"path": "loaders/pipelines/loading.py",
"chars": 13433,
"preview": "import os\nimport mmcv\nimport torch\nimport numpy as np\nfrom mmdet.datasets.builder import PIPELINES\nfrom numpy.linalg imp"
},
{
"path": "loaders/pipelines/transforms.py",
"chars": 14088,
"preview": "import mmcv\nimport torch\nimport numpy as np\nfrom PIL import Image\nfrom numpy import random\nfrom mmdet.datasets.builder i"
},
{
"path": "loaders/ray_metrics.py",
"chars": 9975,
"preview": "# Acknowledgments: https://github.com/tarashakhurana/4d-occ-forecasting\n# Modified by Haisong Liu\nimport math\nimport cop"
},
{
"path": "loaders/ray_pq.py",
"chars": 8286,
"preview": "import numpy as np\nfrom prettytable import PrettyTable\n\n\nclass Metric_RayPQ:\n def __init__(self,\n occ"
},
{
"path": "models/__init__.py",
"chars": 236,
"preview": "from .backbones import __all__\r\nfrom .bbox import __all__\r\nfrom .sparseocc import SparseOcc\r\nfrom .sparseocc_head import"
},
{
"path": "models/backbones/__init__.py",
"chars": 49,
"preview": "from .vovnet import VoVNet\n\n__all__ = ['VoVNet']\n"
},
{
"path": "models/backbones/vovnet.py",
"chars": 12070,
"preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport warnings\nimport torch.utils.checkpoint as cp\nf"
},
{
"path": "models/bbox/__init__.py",
"chars": 91,
"preview": "from .assigners import __all__\nfrom .coders import __all__\nfrom .match_costs import __all__"
},
{
"path": "models/bbox/assigners/__init__.py",
"chars": 90,
"preview": "from .hungarian_assigner_3d import HungarianAssigner3D\n\n__all__ = ['HungarianAssigner3D']\n"
},
{
"path": "models/bbox/assigners/hungarian_assigner_3d.py",
"chars": 3757,
"preview": "import torch\n\nfrom mmdet.core.bbox.builder import BBOX_ASSIGNERS\nfrom mmdet.core.bbox.assigners import AssignResult\nfrom"
},
{
"path": "models/bbox/coders/__init__.py",
"chars": 69,
"preview": "from .nms_free_coder import NMSFreeCoder\n\n__all__ = ['NMSFreeCoder']\n"
},
{
"path": "models/bbox/coders/nms_free_coder.py",
"chars": 4047,
"preview": "import torch\n\nfrom mmdet.core.bbox import BaseBBoxCoder\nfrom mmdet.core.bbox.builder import BBOX_CODERS\nfrom ..utils imp"
},
{
"path": "models/bbox/match_costs/__init__.py",
"chars": 64,
"preview": "from .match_cost import BBox3DL1Cost\n\n__all__ = ['BBox3DL1Cost']"
},
{
"path": "models/bbox/match_costs/match_cost.py",
"chars": 1717,
"preview": "import torch\nfrom mmdet.core.bbox.match_costs.builder import MATCH_COST\n\n\n@MATCH_COST.register_module()\nclass BBox3DL1Co"
},
{
"path": "models/bbox/utils.py",
"chars": 3878,
"preview": "import torch \n\n\ndef normalize_bbox(bboxes):\n cx = bboxes[..., 0:1]\n cy = bboxes[..., 1:2]\n cz = bboxes[..., 2:3"
},
{
"path": "models/checkpoint.py",
"chars": 20520,
"preview": "# This page is completely copied from https://pytorch.org/docs/stable/_modules/torch/utils/checkpoint.html#checkpoint\n# "
},
{
"path": "models/csrc/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "models/csrc/msmv_sampling/msmv_sampling.cpp",
"chars": 14979,
"preview": "#include \"msmv_sampling.h\"\n\n#define MAX_POINT 32\n\nvoid ms_deformable_im2col_cuda_c2345(\n const float* feat_c2,\n co"
},
{
"path": "models/csrc/msmv_sampling/msmv_sampling.h",
"chars": 1676,
"preview": "#pragma once\n\n#include <torch/extension.h>\n\nat::Tensor ms_deform_attn_cuda_c2345_forward(\n const at::Tensor& feat_c2,"
},
{
"path": "models/csrc/msmv_sampling/msmv_sampling_backward.cu",
"chars": 17944,
"preview": "/*!\n * Modified from Deformable DETR\n */\n\n#include <cstdio>\n#include <iostream>\n#include <algorithm>\n#include <cstring>\n"
},
{
"path": "models/csrc/msmv_sampling/msmv_sampling_forward.cu",
"chars": 13094,
"preview": "/*!\n* Modified from Deformable DETR\n*/\n\n#include <cstdio>\n#include <algorithm>\n#include <cstring>\n#include <cuda_runtime"
},
{
"path": "models/csrc/setup.py",
"chars": 568,
"preview": "from setuptools import setup\nfrom torch.utils.cpp_extension import BuildExtension, CUDAExtension\n\n\ndef get_ext_modules()"
},
{
"path": "models/csrc/wrapper.py",
"chars": 3932,
"preview": "import torch\nimport torch.nn.functional as F\nfrom ._msmv_sampling_cuda import _ms_deform_attn_cuda_c2345_forward, _ms_de"
},
{
"path": "models/loss_utils.py",
"chars": 16416,
"preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom mmdet.models.builder import LOSSES, build_loss\nf"
},
{
"path": "models/matcher.py",
"chars": 6873,
"preview": "\"\"\"\nModified from https://github.com/facebookresearch/Mask2Former/blob/main/mask2former/modeling/matcher.py\n\"\"\"\nimport t"
},
{
"path": "models/sparse_voxel_decoder.py",
"chars": 8270,
"preview": "import torch\nimport torch.nn as nn\nfrom mmcv.runner import BaseModule\nfrom mmcv.cnn.bricks.transformer import FFN\nfrom ."
},
{
"path": "models/sparsebev_head.py",
"chars": 21328,
"preview": "import math\nimport torch\nimport torch.nn as nn\nfrom mmcv.runner import force_fp32\nfrom mmdet.core import multi_apply, re"
},
{
"path": "models/sparsebev_sampling.py",
"chars": 7107,
"preview": "import torch\nfrom .bbox.utils import decode_bbox\nfrom .utils import rotation_3d_in_axis, DUMP\nfrom .csrc.wrapper import "
},
{
"path": "models/sparsebev_transformer.py",
"chars": 19753,
"preview": "import torch\nimport torch.nn as nn\nimport numpy as np\nimport torch.nn.functional as F\nfrom mmcv.runner import BaseModule"
},
{
"path": "models/sparseocc.py",
"chars": 9351,
"preview": "import torch\nimport queue\nimport numpy as np\nfrom mmcv.runner import get_dist_info\nfrom mmcv.runner.fp16_utils import ca"
},
{
"path": "models/sparseocc_head.py",
"chars": 10215,
"preview": "import numpy as np\nimport torch\nimport torch.nn as nn\nfrom mmdet.models import HEADS\nfrom mmcv.runner import force_fp32,"
},
{
"path": "models/sparseocc_transformer.py",
"chars": 11856,
"preview": "import copy\nimport numpy as np\nimport torch\nimport torch.nn as nn\nfrom mmcv.runner import BaseModule\nfrom mmdet.models.u"
},
{
"path": "models/utils.py",
"chars": 12214,
"preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom numpy import random\nfrom mmcv.cnn.bricks import "
},
{
"path": "old_metrics.py",
"chars": 1467,
"preview": "import os\nimport glob\nimport torch\nimport argparse\nimport numpy as np\nfrom tqdm import tqdm\nfrom loaders.old_metrics imp"
},
{
"path": "ray_metrics.py",
"chars": 2421,
"preview": "import os\nimport glob\nimport mmcv\nimport argparse\nimport numpy as np\nimport torch\nfrom torch.utils.data import DataLoade"
},
{
"path": "timing.py",
"chars": 3256,
"preview": "import time\nimport utils\nimport logging\nimport argparse\nimport importlib\nimport torch\nimport torch.distributed\nimport to"
},
{
"path": "train.py",
"chars": 6354,
"preview": "import os\nimport utils\nimport shutil\nimport logging\nimport argparse\nimport importlib\nimport torch\nimport torch.distribut"
},
{
"path": "utils.py",
"chars": 10875,
"preview": "import os\nimport sys\nimport glob\nimport torch\nimport shutil\nimport logging\nimport datetime\nimport socket\nimport wandb\nfr"
},
{
"path": "val.py",
"chars": 3595,
"preview": "import os\nimport utils\nimport logging\nimport argparse\nimport importlib\nimport torch\nimport torch.distributed\nimport torc"
},
{
"path": "viz_prediction.py",
"chars": 4909,
"preview": "import os\nimport cv2\nimport utils\nimport logging\nimport argparse\nimport importlib\nimport torch\nimport numpy as np\nfrom t"
}
]
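Each entry in the array above is a JSON object with `path`, `chars`, and a truncated `preview` string. A short sketch of working with that structure, using a hand-copied slice of the listing (the full download has all 58 entries; the filename here is hypothetical):

```python
import json

# A minimal slice of the condensed preview above; paths and sizes are
# copied verbatim from the listing, previews shortened for brevity.
preview_json = '''[
  {"path": "lib/dvr/dvr.cu", "chars": 26778, "preview": "// Acknowledgments..."},
  {"path": "models/sparsebev_head.py", "chars": 21328, "preview": "import math..."},
  {"path": "models/backbones/__init__.py", "chars": 49, "preview": "from .vovnet import VoVNet"}
]'''

entries = json.loads(preview_json)

# Largest file by character count in this slice
largest = max(entries, key=lambda e: e["chars"])
print(f'{largest["path"]}: {largest["chars"]} chars')
```

With the full .json file, the same pattern (`json.load` on the downloaded file instead of `json.loads` on a string) gives a quick size-ranked map of the repository.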
About this extraction
This page contains the full source code of the MCG-NJU/SparseOcc GitHub repository, extracted and formatted as plain text: 58 files (409.0 KB, approximately 109.1k tokens) plus a symbol index of 353 extracted functions, classes, methods, constants, and types. The full output can be downloaded as a .txt file.
Extracted by GitExtract, built by Nikandr Surkov.